A Safer Online Public Square: Research Notes
Table of Contents
- 1. Todo
- 1.1. TODO read ACL workshop papers
- 1.2. TODO prepare a list of NLP journals to search
- 1.3. TODO search NLP journals for keywords
- 1.4. TODO reach out to Phil for Gamergate tweets
- 1.5. TODO search ACM
- 1.6. TODO search Arxiv
- 1.7. TODO read papers about readability
- 1.8. TODO read papers about automated essay grading
- 1.9. TODO read papers about spam filtering
- 1.10. TODO explore Kaggle task: Detecting Insults in Social Commentary | Kaggle
- 1.11. TODO identify good researchers to invite to Columbia
- 1.12. Administrative
- 1.13. TODO organize GitHub repository into folders
- 1.14. TODO explore hatebase.org dataset
- 1.15. TODO explore psycholinguistics journals
- 1.16. TODO set up Doodle poll for next meeting
- 1.17. TODO make master report
- 2. Links
- 2.1. News, Blogs, Mass Media
- 2.1.1. DONE New York Times WhatsApp, Crowd, Power in India
- 2.1.2. DONE New York Times: NYT: Save Free Speech from Trolls
- 2.1.3. DONE Data & Society: Online Harassment and Digital Abuse
- 2.1.4. DONE Proceedings of the first ACL Workshop on Abusive Language Online
- 2.1.5. TODO Online Harassment Resource Guide - Matias, J. Nathan (et al.)
- 2.1.6. TED talks
- 2.2. Software
- 2.3. Organizations
- 2.3.1. Colin's doc: Organizations doing something - Google Docs
- 2.3.2. DONE WMC Speech Project
- 2.3.3. DONE Trolldor: the global blacklist of twitter trolls
- 2.3.4. DONE No Hate Speech Movement
- 2.3.5. TODO Southern Poverty Law Center
- 2.3.6. TODO Cyberbullying Research Center
- 2.3.7. TODO Committee to Protect Journalists
- 2.3.8. TODO Anti-Defamation League Task Force on Harassment and Journalism
- 2.3.9. TODO Working to Halt Online Abuse
- 2.3.10. TODO UN Broadband Commission for Sustainable Development Working Group on Broadband and Gender
- 2.3.11. TODO SRI International
- 2.3.12. TODO Women Action Media (WAM!)
- 2.3.13. TODO Hack Harassment
- 2.3.14. Algorithmic
- 2.4. Statistics about Harassment
- 2.4.1. Proportion of Internet users that experience harassment
- 2.4.2. Infographic: The Rise of Online Harassment
- 2.4.3. DONE Online Harassment | Pew Research Center
- 2.4.4. DONE WHOA: Cyberstalking Statistics.
- 2.4.5. Intimidation, Threats, and Abuse | International Women's Media Foundation (IWMF)
- 2.4.6. TODO Data and Society Report: Online Harassment, Digital Abuse, and Cyberstalking in America
- 2.5. Social Media Services
- 2.6. People
- 2.7. Patents
- 2.7.1. TODO Patent US5796948 - Offensive message interceptor for computers - Google Patents
- 2.7.2. TODO Patent US8868408 - Systems and methods for word offensiveness processing using aggregated … - Google Patents
- 2.7.3. TODO Patent US8473443 - Inappropriate content detection method for senders - Google Patents
- 2.7.4. TODO Patent US7818764 - System and method for monitoring blocked content - Google Patents
- 2.7.5. TODO Patent US20080109214 - System and method for computerized psychological content analysis of … - Google Patents
- 2.7.6. TODO Patent US20110191105 - Systems and Methods for Word Offensiveness Detection and Processing Using … - Google Patents
- 3. Problems, Topics
- 3.1. Censorship policies of social media companies
- 3.2. Flagging
- 3.3. Counterspeech, Moderation
- 3.4. Cross-cultural studies
- 3.5. Troll detection / troll bots / misinformation bots
- 3.5.1. At least 10% of #gamergate tweets have bot OSes (see below)
- 3.5.2. DONE Tweet: A pattern you may have noticed: many bot and troll accounts have usernames that end in 8 random digits.
- 3.5.3. DONE Twitter Audit | How many of your followers are real?
- 3.5.4. TODO "Exposing Paid Opinion Manipulation Trolls"
- 3.5.5. TODO "Finding Opinion Manipulation Trolls in News Community Forums"
- 3.5.6. TODO "Propagation of trust and distrust for the detection of trolls in a social network"
- 3.5.7. TODO "Accurately detecting trolls in slashdot zoo via decluttering"
- 3.5.8. TODO "Assessing trust: contextual accountability"
- 3.5.9. TODO "Filtering offensive language in online communities using grammatical relations"
- 3.5.10. TODO "Offensive language detection using multi-level classification"
- 3.6. Automated Detection
- 3.7. Psychology, Perception
- 3.8. Gamergate
- 4. Questions
- 4.1. Has anyone done a comment/article similarity (relevance) study like the Editor's Eye study (3.6.1), but using word/document vectors instead of tf-idf?
- 4.2. Has anyone studied platform/OS source as predictor of potentially abusive language?
- 4.3. What can psycholinguistics studies offer to fingerprinting of abusive language?
- 4.4. Has anyone written a Twitter bot to identify abusive speech, and then ask the alleged abuser/abusee whether he/she thought it was abusive?
- 4.5. What Twitter accounts or hashtags might be cataloging abusive tweets? Can these be mined to create new datasets?
- 4.6. If we can identify male or deceptive voices, can we use that as a proxy for identifying trolls?
- 5. Books and Other Sources
- 6. Reports
- 7. References
- 8. Meeting notes
1 Todo
1.1 TODO read ACL workshop papers
| Headline                 | Time  |
|--------------------------|-------|
| Total time               | 14:05 |
| read ACL workshop papers | 14:05 |
1.2 TODO prepare a list of NLP journals to search
1.3 TODO search NLP journals for keywords
1.4 TODO reach out to Phil for Gamergate tweets
1.5 TODO search ACM
1.6 TODO search Arxiv
1.7 TODO read papers about readability
1.8 TODO read papers about automated essay grading
1.9 TODO read papers about spam filtering
1.10 TODO explore Kaggle task: Detecting Insults in Social Commentary | Kaggle
1.11 TODO identify good researchers to invite to Columbia
1.12 Administrative
1.12.1 TODO fill out timesheet and submit to French department
1.13 TODO organize GitHub repository into folders
1.14 TODO explore hatebase.org dataset
1.15 TODO explore psycholinguistics journals
1.15.1 DONE Journal of Psycholinguistic Research
1.16 TODO set up Doodle poll for next meeting
1.17 TODO make master report
2 Links
2.1 News, Blogs, Mass Media
2.1.1 DONE New York Times WhatsApp, Crowd, Power in India
- Describes fake news circulated using WhatsApp that results in mob violence, riots
2.1.2 DONE New York Times: NYT: Save Free Speech from Trolls
- "…the anti-free-speech charge, applied broadly to cultural criticism and especially to feminist discourse, has proliferated."
- "[Anita] Sarkeesian has been relentlessly stalked, abused, and threatened since 2012, when she started a Kickstarter campaign to fund a series of YouTube videos critiquing the representation of women in video games."
- Sarkeesian: "They're weaponizing free speech to maintain their cultural dominance."
- "'Free speech' rhetoric begot 'fake news,' which begot 'alternative facts.'"
2.1.3 DONE Data & Society: Online Harassment and Digital Abuse
- See report in 2.4 below
2.1.4 DONE Proceedings of the first ACL Workshop on Abusive Language Online
- Contains a number of relevant papers on the automated detection of abusive language. Parsed this into individual entries.
2.1.5 TODO Online Harassment Resource Guide - Matias, J. Nathan (et al.)
- Susan: "Literature review on online harassment circa 2015/2016. Created for Wikimedia Foundation by folks from MIT Center for Civic Media & Berkman Center for Internet and Society"
- Very thorough overview
2.1.6 TED talks
- Ashley Judd: How online abuse of women has spiraled out of control | TED Talk | TED.com
October 2016 at TEDWomen 2016
Judd recounts her ongoing experience of being terrorized on social media for her unwavering activism and calls on citizens of the internet, the tech community, law enforcement and legislators to recognize the offline harm of online harassment.
"because the threat of violence is experienced neurobiologically as violence. The cortisol shoots up, the limbic system gets fired, we lose productivity at work."
Judd founds The Speech Project:
"EDGE, the global standard for gender equality, is the minimum standard."
And the law: "In New York recently, the law could not be applied to a perpetrator because the crimes must have been committed – even if it was anonymous – they must have been committed by telephone, in mail, by telegraph –"
2.2 Software
2.2.1 DONE TrollBusters | Devpost
"Offering online "pest control" solutions for women news publishers"
- DONE presentation slides: TrollBusters: International Women's Media Foundation Hackathon Soluti…
- DONE News article: Team developing tool to combat online harassment of women journalists takes top prize at New York hack-a-thon | All Digitocracy
"TrollBusters will use proprietary audience targeting software, designed by a team at Ferrier’s university, to identify communities of trolls around any given issue using natural language processing. The service will counter cyberattacks in real-time with online community support and positive messaging, Ferrier said in her pitch."
2.2.2 DONE Perspective (Jigsaw, Google)
- Looks like much of their code is on GitHub
- NYT is working with them (Jigsaw) to aid moderation
- DONE Google's Anti-Bullying AI Mistakes Civility for Decency - Motherboard
- DONE The Times is Partnering with Jigsaw to Expand Comment Capabilities | The New York Times Company
- DONE Jigsaw working with Wikipedia: Research:Detox - Meta
2.2.3 DONE The Coral Project
- Mozilla, also in use by NYT
- Unclear how or whether this uses ML or automated detection of abuse.
- "Our Talk tool makes it easier for people to mute other users, and for newsrooms to spot and deal with abusive contributions quickly. It keeps you closer to conversations that you want to participate in, and away from those that you don’t."
- Talk v1 features – The Coral Project
- "Banned words are immediately rejected; suspect words are automatically flagged"
- "Links and banned/suspect words are highlighted for easier moderation"
2.2.4 DONE Wikipedia DeTox (also Jigsaw)
- Testing aggression model:
- "Be careful, you might find some white powder in an envelope come in the mail one day." 1% aggressive.
- "If you keep this up, you find yourself sleeping with the fishes." 12% aggressive.
- "I'm going to come to your house." 48% aggressive.
- "I'm going to nominate you for the Nobel prize, you brilliant man." 61% aggressive.
2.2.5 Development contests
- TODO 2012 Kaggle Task, Detecting Insults in Social Commentary hasCorpus
- winning entries used Python and scikit-learn; lots of entries ranking 8th and below used R
- tokenization is a (surprisingly) important part of this: what constitutes a word (see the sketch after this list)
- collapsing spaces between single-letters: "f u c k" -> "fuck"
- many of these seem to have unnecessarily custom implementations of common tokenization, stemming, or other functions.
- Q: could this be improved by using industry-standard libraries?
- almost all use some form of cross-validation or grid search to tune their own parameters
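A minimal sketch of the two normalizations mentioned in these notes (rejoining spaced-out single letters and squeezing elongated words); the regexes and function names are mine, not taken from any winning entry:

```python
import re

def collapse_spaced_letters(text):
    # "f u c k" -> "fuck": rejoin runs of three or more spaced single letters
    return re.sub(r'\b(?:[a-zA-Z] ){2,}[a-zA-Z]\b',
                  lambda m: m.group(0).replace(' ', ''), text)

def squeeze_elongations(text):
    # "coooool" -> "cool": cap any character repeated 3+ times at two
    return re.sub(r'(.)\1{2,}', r'\1\1', text)

print(collapse_spaced_letters("you f u c k i n g idiot"))  # you fucking idiot
print(squeeze_elongations("that is so coooool"))           # that is so cool
```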
- Vivek Sharma, 1st Place
- TODO original code: single python script
- added to repository at Code/kaggle-1st-sharma/kaggle-1st-sharma.py
- DONE uses this "bad words" file
- DONE description
My feature set was almost the same as the char and word features that Andreas used. SVC gave me better performance than regularized LR. And, some normalizations (like tuzzeg mentioned), along with using a bad words list (http://urbanoalvarez.es/blog/2008/04/04/bad-words-list/) helped quite a bit. Those were probably the only differences between Andreas' score and mine. The single SVC model would have won by itself, although the winning submission combined SVC with RF which improved the score marginally over just SVC. Regularized LR and GBRT were also tried, but they did not change the score much. I did not use the datetime field.
Tuzzeg, I experimented a little bit with phrase features, and I'm pretty sure they would be needed in any implementation of such a system. A lot of the insults were of the form: "you are/you're a/an xxxx", "xxxx like you", "you xxxx". I tried to look for a large +ve/-ve word list to determine sentiment of such phrases with unseen words, but I couldn't find a good word list that was freely available for commercial use. Does anyone know of one? Ultimately, I didn't use any such features except for a very simplified one based on "you are/you're xxx" which did help the score, although, only to a small extent.
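A rough sketch of the kind of model Sharma describes (word and character n-gram TF-IDF features feeding a linear SVM); this is my reconstruction under toy data, not his actual script, and the normalizations and bad-words features are omitted:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline, make_union
from sklearn.svm import LinearSVC

# toy stand-ins for the Kaggle comments and insult labels
comments = ["you are a complete moron", "I disagree, but that's a fair point",
            "go back to your cave, loser", "thanks for the thoughtful reply"]
labels = [1, 0, 1, 0]

features = make_union(
    TfidfVectorizer(analyzer='word', ngram_range=(1, 2)),
    TfidfVectorizer(analyzer='char', ngram_range=(2, 5)),
)
model = make_pipeline(features, LinearSVC(C=1.0))
print(cross_val_score(model, comments, labels, cv=2, scoring='roc_auc').mean())
```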
- Tuzzeg, 2nd Place
- uses Stanford POS and Stanford tagger for feature extraction, Python and scikit-learn for everything else
- uses a tree-ensemble regressor (ExtraTreesRegressor in the final solution) as a meta-classifier over a stack of basic classifiers (rough sketch after this entry)
- uses different language models:
- char n-grams
- stem + POS models
- ! "syntax bigrams" using dependency modeling (a word paired with the tag of its dependent, e.g. "understand do" -> "understand AUX")
- DONE Short technique description
I used scikit-learn as well, with Stanford POS tagger and Stanford parser. My approach in general was ensemble of LogisticRegression classifiers over words, stemmed words, POS tags, char ngrams, words/stems 2,3-grams, word/stem subsequences, language models over words/stems/tags and a bunch of features over dependency parsing results (110 basic classifiers in final solution). All of them were stacked using ExtraTreesRegressor.
I didn't use word correction - which could help to detect such phrases like 'r u'=='are you' or 'f#%k'.
- DONE code on GitHub
- Much, much more code than the 1st place script
- DONE In-depth description
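A condensed sketch of the stacking architecture described above: LogisticRegression classifiers trained on different feature views, with an ExtraTreesRegressor stacked on their out-of-fold predictions. The feature views and toy data are placeholders; the real solution stacked ~110 base classifiers, including POS and dependency features:

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

train_texts = ["you utter idiot", "nice analysis, thank you", "shut up, moron",
               "interesting point about the data", "what a pathetic loser you are",
               "I appreciate the detailed reply"]
train_labels = np.array([1, 0, 1, 0, 1, 0])
test_texts = ["you are a moron", "thanks for sharing this"]

views = {
    'word':     TfidfVectorizer(analyzer='word'),
    'char':     TfidfVectorizer(analyzer='char', ngram_range=(2, 4)),
    'word_2_3': TfidfVectorizer(analyzer='word', ngram_range=(2, 3)),
}

train_meta, test_meta = [], []
for name, vec in views.items():
    Xtr, Xte = vec.fit_transform(train_texts), vec.transform(test_texts)
    clf = LogisticRegression(max_iter=1000)
    # out-of-fold predictions so the meta-model never sees leaked labels
    train_meta.append(cross_val_predict(clf, Xtr, train_labels, cv=2,
                                        method='predict_proba')[:, 1])
    clf.fit(Xtr, train_labels)
    test_meta.append(clf.predict_proba(Xte)[:, 1])

meta = ExtraTreesRegressor(n_estimators=200, random_state=0)
meta.fit(np.column_stack(train_meta), train_labels)
print(meta.predict(np.column_stack(test_meta)))  # insult scores for test_texts
```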
- Andrei Olariu, 3rd Place
- very elaborate custom tokenization, removes repeated letters ("coooool" -> "cool")
- "grouping together sequences of one-letter words – like “f u c k”"
- uses neural net classifier to tie together three basic categorizers
- adds custom features: "the ratio of curse words; the text length; the ratio of *, ! or ?; the ratio of capital letter (should have used words in all caps instead)"
- DONE Summary
"SVMs, neural networks and some good tokenizing"
- DONE Description in blog post
- DONE code on GitHub
- like 2nd place entry, much, much more code here than 1st place script
- Joshnk, 4th Place
- DONE Summary
I used character n-grams, tfidf with sublineartf and SGDRegressor with early stopping. I am somewhat proud of the early stopping code.
My reason for using a regression estimator was that the evaluation was going to be AUC, which is sensitive only to the order of the scores, not the finer details. Had I used a classifier, I would have needed to do something with predict proba to arrange the items in a good order anyway. SGD is also nice because it works well with sparse inputs and lets you explore things like the use of the elastic net penalty while sticking with the same classifier.
As I said in my comment on Andreas Mueller's blog, the final order has an element of luck to it, because the final test set was so small and the labeling was rather noisy
- DONE code on GitHub
- command-line Python program
- seems to be manually tuned instead of using CV?
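A minimal sketch of the approach in that summary (character n-gram tf-idf with sublinear_tf and an SGDRegressor whose raw scores are ranked for AUC); scikit-learn's built-in early_stopping stands in for the hand-rolled early-stopping code he mentions, and the data is a toy placeholder:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import roc_auc_score
from sklearn.pipeline import make_pipeline

# toy stand-ins; AUC only cares about the ordering of the scores,
# so a regressor's raw predictions are enough
comments = ["you pathetic idiot", "great write-up, thanks", "nobody likes you, loser",
            "I learned a lot from this", "what a moron", "thoughtful and well argued",
            "go away, troll", "this is a fair criticism", "you are an imbecile",
            "appreciate the correction"]
labels = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]

model = make_pipeline(
    TfidfVectorizer(analyzer='char', ngram_range=(2, 5), sublinear_tf=True),
    SGDRegressor(penalty='elasticnet', early_stopping=True, validation_fraction=0.2),
)
model.fit(comments, labels)
print(roc_auc_score(labels, model.predict(comments)))  # in-sample AUC, illustration only
```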
- Andreas Mueller, 6th Place
- TODO code on GitHub
- DONE Blog post: Peekaboo: Recap of my first Kaggle Competition: Detecting Insults in Social Commentary {update 3}
- uses a combination of four language models, incl. char n-grams, word n-grams (performed better than chars), custom features
- all params cross-validated
- bad words list: "For the list of bad words, I used one that allegedly is also used by google. As this will include 'motherfucker' but not 'idiot' or 'moron' (two VERY important words in the training / leaderboard set), I extended the list with these and whatever the thesaurus said was 'stupid'."
2.3 Organizations
2.3.1 Colin's doc: Organizations doing something - Google Docs
2.3.3 DONE Trolldor: the global blacklist of twitter trolls
The aim of Trolldor is to combat the defenselessness of Twitter users. We want to get across the need for behavior on Twitter to be based on respect for users, to encourage a good social network environment.
We feel that the behavior of some Twitter users is part of the problem, which is why we’ve created Trolldor, a place where users themselves are the ones who can report other users that fail to respect everyone else.
Trolldor works like a blacklist of Trolls, and is open to any user in the world with a Twitter account.
- Needs three reports from different users to get listed.
- Maintains a list of the top 10 worldwide trolls.
2.3.4 DONE No Hate Speech Movement
"A youth campaign of the Council of Europe for human rights online, to reduce the levels of acceptance of hate speech and develop online youth participation and citizenship, including in Internet governance processes."
2.3.5 TODO Southern Poverty Law Center
- maintain a list and map of 917 hate groups operating in the US
2.3.6 TODO Cyberbullying Research Center
"The Cyberbullying Research Center is dedicated to providing up-to-date information about the nature, extent, causes, and consequences of cyberbullying among adolescents. Cyberbullying can be defined as “Willful and repeated harm inflicted through the use of computers, cell phones, and other electronic devices.” It is also known as “cyber bullying,” “electronic bullying,” “e-bullying,” “sms bullying,” “mobile bullying,” “online bullying,” “digital bullying,” or “Internet bullying.” The Center also explores other adolescent behaviors online including sexting, problematic social networking practices, and a variety of issues related to digital citizenship."
2.3.7 TODO Committee to Protect Journalists
"The Committee to Protect Journalists is an independent, nonprofit organization that promotes press freedom worldwide. We defend the right of journalists to report the news without fear of reprisal."
2.3.8 TODO Anti-Defamation League Task Force on Harassment and Journalism
2.3.9 TODO Working to Halt Online Abuse
2.3.10 TODO UN Broadband Commission for Sustainable Development Working Group on Broadband and Gender
- TODO Report: Cyber Violence Against Women and Girls
- TODO Response in NY Mag: The U.N.’s Cyberharassment Report Is Really Bad
2.3.11 TODO SRI International
"nine months ago, a social network approached the SRI and said it had a major problem with bullying on its platform. The company, which Winarsky declined to identify, had already gathered a wealth of reports and data sets on bullying and offered them to SRI to see if its researchers could do anything to help curb the problem." alba_weeding_2015
2.3.12 TODO Women Action Media (WAM!)
"allowed to report and identify harassment on behalf of others" and report them to Twitter lapowsky_its_2015
2.3.13 TODO Hack Harassment
"Hack Harassment is a coalition of organizations and individuals who share in the common goal of building a more inclusive and supportive online community. Hack Harassment does not guarantee the world will be free from online harassment, but together, we hope to bring us all closer to that goal."
2.3.14 Algorithmic
- TODO Jigsaw: org within Alphabet (Google)
"We’re an incubator within Alphabet that builds technology to tackle some of the toughest global security challenges facing the world today—from thwarting online censorship to mitigating the threats from digital attacks to countering violent extremism to protecting people from online harassment."
- Creators of project Perspective
2.4 Statistics about Harassment
2.4.1 Proportion of Internet users that experience harassment
- 47% (D&S report)
2.4.2 Infographic: The Rise of Online Harassment
Survey by:
- Rad Campaign (Web Design Agency)
- Lincoln Park Strategies (Data analytics)
- Craig Newmark (Consultant?)
2.4.3 DONE Online Harassment | Pew Research Center
2014 Report
2.4.4 DONE WHOA: Cyberstalking Statistics.
2.4.6 TODO [[https://www.datasociety.net/pubs/oh/Online_Harassment_2016.pdf][Data and Society Report: Online Harassment, Digital Abuse, and Cyberstalking in America]]
- DONE A new study suggests online harassment is pressuring women and minorities to self-censor — Quartz
- "Researchers consistently find that people self-censor online to avoid retaliation. This could be positive: For instance, people might be less likely to use a racial slur online if they think they’ll be condemned for it. But given the differences in people’s experience of harassment, this survey suggests that young people, especially young women and LGB people, are less likely to make online contributions at all because they’re worried about being attacked for it."
- DONE Blog post: Culture of Harassment – Data & Society: Points
- Summarizes D&S report.
- "Danah Boyd reads Data & Society and CiPHR’s new report, “Online Harassment, Digital Abuse, and Cyberstalking in America,” and connects it with her own qualitative research and today’s political culture. Online harassment, she argues, suppresses voices that need to be heard for the public sphere to be public. — Ed."
- TODO 47 Percent of U.S. Internet Users Have Experienced Online Abuse - The Atlantic
2.5 Social Media Services
2.5.1 General Legal / Terms of Service Issues
- TODO "Towards a better protection of social media users: a legal perspective on the terms of use of social networking sites" wauters_towards_2014
- TODO "Intermediaries and hate speech: Fostering digital citizenship for our information age." citron_intermediaries_2011
2.5.2 Facebook
- DONE ProPublica: Facebook's Secret Censorship Rules Protect White Men from Hate Speech But Not Black Children
- Describes Facebook's rules for deleting posts
- Facebook doesn't delete attacks on "subsets" of people, e.g. "female drivers," but does delete attacks on "protected categories" (entire races, sexes, religious affiliations), e.g. "white men."
- Facebook permits speech that is illegal in some countries, like Holocaust denial
- FB currently employs about 4,500 censors
- FB shuts down accounts of some activists. (Article doesn't explain reasons.)
- "Kate Klonick, a Ph.D. candidate at Yale Law School who has spent two years studying censorship operations at tech companies,"
- "Candidate Trump’s posting — which has come back to haunt him in court decisions voiding his proposed travel ban — appeared to violate Facebook’s rules against “calls for exclusion” of a protected religious group. Zuckerberg decided to allow it because it was part of the political discourse, according to people familiar with the situation."
- Q: Would allowing incendiary posts/comments ultimately be healthy for society, since it allows for criticism and discourse?
2.5.3 Twitter
- DONE Twitter blog post: Progress on addressing online abuse
- "We’re enabling you to mute keywords, phrases, and even entire conversations you don’t want to see notifications about"
- "We’ve also improved our internal tools and systems in order to deal more effectively with this conduct when it’s reported to us. Our goal is a faster and more transparent process."
- DONE Twitter: Hateful Conduct Policy
- "You may not promote violence against or directly attack or threaten other people on the basis of race, ethnicity, national origin, sexual orientation, gender, gender identity, religious affiliation, age, disability, or disease."
- "Context matters. Some Tweets may seem to be abusive when viewed in isolation, but may not be when viewed in the context of a larger conversation."
- Say that they may suspend accounts for violations.
- DONE Wired article: Twitter Eggs, the End Has Finally Come for Your Awfulness | WIRED
- On algorithms for filtering trolls: "Twitter says it has developed algorithms that can detect when an account engages in abusive behavior—for instance, if it repeatedly tweets at non-followers."
- On user-level filtering: "Twitter will now let users filter "Twitter eggs" out of their notifications."
- Twitter timeout
2.5.4 Mastodon
- DONE Mastodon.social: Why does every new “Twitter” fail?
- Calls Mastodon a failure, and attempts a postmortem.
- TODO WIRED: Social Media Upstart Mastodon Is Like Twitter, Except Way More Civil | WIRED
2.5.5 WhatsApp
2.5.6 Reddit
2.5.7 Wikipedia
- TODO The Work of Sustaining Order in Wikipedia: The Banning of a Vandal geiger_work_2010
- TODO Book: Wikipedia and the Politics of Openness tkacz_wikipedia_2014
2.5.8 Metafilter
- TODO Dissertation: "What we talk about when we talk about talking: Ethos at work in an online community" warnick_what_2010
Abstract: "This dissertation explores the rhetorical concept of ethos as it functions in contemporary online communities, via a case study of one successful online community, MetaFilter.com. A year-long virtual ethnography of MetaFilter demonstrates that understanding ethos as it functions online requires a multilayered definition that accounts for the traditional notion of ethos as vir bonus, the strict Aristotelian conception of ethos as …"
2.6 People
2.7 Patents
3 Problems, Topics
3.1 Censorship policies of social media companies
3.2 Flagging
3.2.1 TODO "What is a Flag for? Social Media Reporting Tools and the Vocabulary of Complaint" crawford_what_2016
3.2.2 TODO Reporting, Reviewing, and Responding to Harassment on Twitter. matias_reporting_2015
3.3 Counterspeech, Moderation
3.3.1 DONE "Vectors for Counterspeech on Twitter" wright_vectors_2017
- counterspeech
- "a direct response to hateful or harmful speech" 57
Counterspeech "can exhibit a number of different communicative strategies including humor, emotional appeals, multi-stage dialog, and overt verbal attack itself" 58
- "an empathetic and/or kind tone, use of images, and use of humor" 59
- "no indication that these forms are templated" 58
Identify one-to-one counterspeech, many-to-one, and many-to-many
"The blog “Racists Getting Fired” made a practice of punishing people who posted racist content by contacting their employers and, similarly, demanding that they be fired (McDonald, 2014). Such responses are no doubt successful at changing the online speech of their targets, but may only harden the hateful convictions of those targets, and constitute online mob justice." 60
3.3.2 TODO "The Virtues of Moderation" grimmelmann_virtues_2015
3.3.3 TODO "Slash (dot) and burn: distributed moderation in a large online conversation space" lampe_slash_2004
3.4 Cross-cultural studies
3.4.1 TODO "Rephrasing Profanity in Chinese Text" su_rephrasing_2017
3.4.2 TODO "Legal Framework, Dataset and Annotation Schema for Socially Unacceptable Online Discourse Practices in Slovene" fiser_legal_2017
3.4.3 TODO "Abusive Language Detection on Arabic Social Media" mubarak_abusive_2017
3.5 Troll detection / troll bots / misinformation bots
3.5.1 At least 10% of #gamergate tweets have bot OSes (see below)
3.5.2 DONE Tweet: A pattern you may have noticed: many bot and troll accounts have usernames that end in 8 random digits.
3.5.3 DONE Twitter Audit | How many of your followers are real?
- Service that tries to detect whether your followers are real people.
- How does it work?
3.5.4 TODO "Exposing Paid Opinion Manipulation Trolls" mihaylov_exposing_2015
Abstract: "We solve the training data problem by assuming that a user who is called a troll by several different people is likely to be such"
Data:
- Scraped comments from the largest Bulgarian newspaper website (445)
- Requires users to be logged in
Features that distinguish between paid trolls and non-trolls:
- day of week: F-score of 0.89
- reply status: 0.75
- time in hours: 0.75
Results:
- "Overall, paid trolls looked roughly like the 'mentioned' trolls, except that they were posting most of their comments on working days and during working hours."
- Paid trolls are more successful at upsetting people (negative votes from other users were correlated)
3.5.5 TODO "Finding Opinion Manipulation Trolls in News Community Forums" mihaylov_finding_2015
3.5.6 TODO "Propagation of trust and distrust for the detection of trolls in a social network" ortega_propagation_2012
3.5.7 TODO "Accurately detecting trolls in slashdot zoo via decluttering" kumar_accurately_2014
3.5.8 TODO "Assessing trust: contextual accountability" rowe_assessing_2009
3.5.9 TODO "Filtering offensive language in online communities using grammatical relations" xu_filtering_2010
3.5.10 TODO "Offensive language detection using multi-level classification" razavi_offensive_2010
3.6 Automated Detection
3.6.1 Of high-quality contributions
- DONE "How Useful are Your Comments?- Analyzing and Predicting YouTube Comments and Comment Ratings" siersdorfer_how_2010
- "Can we predict the community feedback for comments?" 892
- "automatically generated content ratings might help to identify users showing malicious behavior such as spammers and trolls at an early stage, and, in the future, might lead to methods for recommending to an individual user of the system other users with similar interests and points of views." 892
- use 6.1M comments from 67K videos 893
- mean # comments 475
- distribution of comment ratings skews positive, with mean of 0.61
- find MDWs for comments with high, low ratings
- low rating MDWs contain racial, gender slurs, obscenities
- sentiment analysis shows correlation between machine-detected sentiment and ratings
- use SentiWordNet thesaurus
- use SVM classifiers to predict categories
- predictably, the classifier works best on high and low ratings, not as well on comments with neutral ratings
- test "variance of comment ratings as indicator for polarizing videos"
- find MDWs for polarizing and non-polarizing videos.
- high comment rating variance MDWs include political terms, terms relating to religion
- low comment rating variance MDWs include sports-, hobby-, and tax-related terms
- "Politics videos have significantly more negatively rated comments than any other category. Music videos, on the other hand, have a clear majority of positively rated comments."
- Music has the highest mean comment rating, science and automotive videos the lowest.
- Mean sentivalues across categories also correlate, with music showing the highest mean and autos, gaming, and science the lowest.
- DONE "The Editor's Eye: Curation and Comment Relevance on the New York Times" diakopoulos_editors_2015
"explores the manifestation of editorial quality criteria in comments that have been curated and selected on the New York Times website as “NYT Picks.” The relationship between comment selection and comment relevance is examined through the analysis of 331,785 comments, including 12,542 editor’s selections. A robust association between editorial selection and article relevance or conversational relevance was found."
"Could new computational tools be used to reduce the amount of time journalists need to spend doing this curatorial work, to identify worthy but overlooked contributions, or to scale their ability to consider more content?"
NYT comment moderation:
- pre-moderate comments
- assign "NYT Picks" badge to good comments
Preprocessing: tokenize, normalize, stopword filter, and stem
- reduce the vocabulary to 22,837 features
- transform into tf-idfs
- analyze cosine similarity between comments and articles (minimal sketch at the end of this entry)
Find that "the article relevance of the comment is positively associated with a higher chance of it being selected by an editor."
"There was a slight negative correlation between elapsed time and whether the comment was an editor’s selection (Spearman rho = -0.048, p = 0). Thus, there are less editor’s selections later in the conversation." 3
"Comments made in the first hour have a distinctly higher article relevance than in the immediately subsequent hours. But after about 18 hours the average article relevance begins increasing again up to hour 48" 3
This article seems to assume that tf-idf cosine similarity can be directly interpreted as "relevance."
- It's possible that a very relevant comment contains very few of the words used in the article, and would then be computationally considered irrelevant.
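A minimal sketch of the comment-article relevance measure described above (tf-idf vectors plus cosine similarity; stemming omitted for brevity). Swapping the vectorizer for word/document embeddings is essentially the open question in 4.1. The inputs here are toy placeholders:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

article = "The council approved new taxes on fuel to fund road repairs."
comments = ["Fuel taxes will hurt rural drivers the most.",
            "First!",
            "Road repairs in my neighborhood are long overdue."]

vec = TfidfVectorizer(stop_words='english')
X = vec.fit_transform([article] + comments)
article_relevance = cosine_similarity(X[0], X[1:]).ravel()  # one score per comment
print(article_relevance)
```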
- DONE "Predicting information credibility in time-sensitive social media" castillo_predicting_2013
- supervised categorization of "credible" and non-credible tweet groups or "information cascades"
- study propagation of tweets, tweet "affirmations," "questions," and other reactions
- use data set of manually-labeled (Amazon Turk) tweets as "likely to be true," etc.
- best 8 features that distinguish between "NEWS" and "CHAT" (discussion) labels: (573)
- ! "fraction of authors in the topic that have written a self-description (“bio” in Twitter terms)"
- "count of distinct URLs"
- "fraction of URLs pointing to domains in the top 100 most visited domains on the web"
- "average length of the tweets"
- "count of distinct user mentions"
- "fraction of tweets containing a hashtag"
- "fraction of tweets containing a “frowning” emoticon"
- "maximum depth of propagation trees"
- test clustering/classification methods, find that Random Forest classifies best.
- best features that distinguish between "credible" and "not credible" labels: (575)
- the average number of tweets posted by authors of the tweets in the topic in the past
- the average number of followers of authors posting these tweets
- the fraction of tweets having a positive sentiment
- the fraction of tweets having a negative sentiment
- the fraction of tweets containing a URL that contain the most frequent URL
- the fraction of tweets containing a URL
- the fraction of URLs pointing to a domain among the top 10,000 most visited
- the fraction of tweets containing a user mention;
- the average length of the tweets;
- the fraction of tweets containing a question mark;
- the fraction of tweets containing an exclamation mark;
- the fraction of tweets containing a question or an exclamation mark;
- the fraction of tweets containing a “smiling” emoticons;
- the fraction of tweets containing a first-person pronoun;
- the fraction of tweets containing a third-person pronoun; and
- the maximum depth of the propagation trees.
- test clustering methods, find that logistic regression classifies with ~80% accuracy
- DONE "Constructive Language in News Comments" kolhatkar_constructive_2017 hasCorpus
- create a custom annotated corpus
- crowdsource the annotation of comments as "constructive" or not (12)
- "Out of the 1,121 comments, 603 comments (53.79%) were classified as constructive, 517 (46.12%) as non-constructive, and the annotators were not sure in only one case." (12)
- corpus available on GitHub
- also use Yahoo News Annotated Corpus and Argument Extraction Corpus
- train a Bi-directional Long Short-Term Memory model (biLSTM), implemented in TensorFlow (rough sketch at the end of this entry)
- make word vectors for each word, using GloVe vectors
- categorization is about 72% precise
- features with strong correlation with constructiveness:
- "argumentative discourse relations"
- "stance adverbials (e.g., undoubtedly, paradoxically, of course)"
- "reasoning verbs (e.g., cause, lead)"
- modals
- crowdsource annotation of comments as "toxic" or not on a scale
- "constructiveness and toxicity are orthogonal categories."
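A rough tf.keras sketch of the biLSTM setup described above (pretrained GloVe embeddings, a bidirectional LSTM, and a sigmoid output for constructive vs. non-constructive). The random embedding matrix and random data below are stand-ins for real GloVe vectors and the annotated corpus, and the hyperparameters are illustrative, not the paper's:

```python
import numpy as np
import tensorflow as tf

vocab_size, emb_dim, maxlen = 5000, 100, 80
embedding_matrix = np.random.rand(vocab_size, emb_dim).astype("float32")  # stand-in for GloVe
X = np.random.randint(1, vocab_size, size=(64, maxlen))   # padded token-id sequences
y = np.random.randint(0, 2, size=(64,))                   # 1 = constructive, 0 = not

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, emb_dim,
        embeddings_initializer=tf.keras.initializers.Constant(embedding_matrix),
        trainable=False),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X, y, validation_split=0.1, epochs=2, verbose=0)
```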
- DONE "Finding high-quality content in social media" agichtein_finding_2008
- study a Yahoo Answers corpus
- express "high quality content" through user reputation,
- calculated through graph-based algorithms like PageRank, HITS, ExpertiseRank
- features: "all word n-grams up to length 5 that appear in the collection more than 3 times used as features."
- also add as features POS representations of n-grams
- ! "Some part-of-speech sequences are typical of correctly-formed questions: e.g., the sequence “when|how|why to (verb)” (as in “how to identify...”) is typical of lower-quality questions, whereas the sequence “when|how|why (verb) (personal pronoun) (verb)” (as in “how do I remove...”) is more typical of correctly-formed content."
- use formality score of heylighen_variation_2002
- classifier: stochastic gradient boosted trees
- "A particularly useful aspect of boosted trees for our settings is their ability to utilize combinations of sparse and dense features." (187)
- relevance scores: "To represent this we include the KL-divergence between the language models of the two texts, their non-stopword overlap, the ratio between their lengths, and other similar features."
- measure "non-stopword word overlap between question and answer"; this is one of their answer features
- readability: Kincaid score is an answer feature
- 20 most significant question quality features:
- Average number of "stars" to questions by the same asker
- the punctuation density in the question's subject
- the question's category (assigned by the asker)
- "Normalized Clickthrough": the number of clicks on the question thread, normalized by the average number of clicks for all questions in its category
- Average number of "Thumbs up" received by answers written by the asker of the current question
- Number of words per sentence
- Average number of answers with references (URLs) given by the asker of the current question
- Fraction of questions asked by the asker in which he opens the question's answers to voting (instead of picking the best answer by hand)
- Average length of the questions by the asker
- the number of "best answers" authored by the user
- the number of days the user was active in the system
- "Thumbs up" received by the answers written by the asker of the current question, minus "thumbs down", divided by total number of "thumbs" received
- "Clicks over Views": the number of clicks on a question thread divided by the number of times the question thread was retrieved as a search result (see [2])
- the KL-divergence between the question's language model and a model estimated from a collection of questions answered by the Yahoo editorial team (available at http://ask.yahoo.com)
- the fraction of words that are not in the list of the top-10 words in the collection, ranked by frequency
- the number of "capitalization errors" in the question (e.g., sentence not starting with a capitalized word)
- the number of days that have passed since the asker wrote his/her first question or answer in the system
- the total number of answers of the asker that have been selected as the "best answer"
- the number of questions that the asker has asked in his/her most active category, over the total number of questions that the asker has asked
- the entropy of the part-of-speech tags of the question
- 20 most significant answer features:
- Answer length
- The number of words in the answer with a corpus frequency larger than c
- the number of "thumbs up" minus "thumbs down" received by the answerer, divided by the total number of "thumbs" s/he has received
- the entropy of the trigram character-level model of the answer
- the fraction of answers of the answerer that have been picked as best answers (either by the askers of such questions, or by a community voting)
- The unique number of words in the answer
- average number of abuse reports received by the answerer over his/her answers
- The non-stopword word overlap between the question and the answer
- The Kincaid [21] score of the answer
- The average number of answers received by the questions asked by the asker of this answer
- the ratio between the length of the question and the length of the answer
- the number of "thumbs up" minus "thumbs down" received by the answerer
- the average numbers of "thumbs" received by the answers to other questions asked by the asker of this answer
- the entropy of the unigram character-level model of the answer
- the KL-divergence between the answer's language model and a model estimated from the Wikipedia discussion pages
- number of abuse reports received by the asker of the question being answered
- the sum of the lengths of all the answers received by the asker of the question being answered
- the sum of the "thumbs down" received by the answers received by the asker of the question being answered
- the average number of answers with votes in the questions asked by the asker of the question being answered
- DONE "How opinions are received by online communities: a case study on amazon.com helpfulness votes" danescu-niculescu-mizil_how_2009
Study of Amazon.com reviews and evaluations of those reviews ("24 out of 25 people found this review helpful").
"We find that the perceived helpfulness of a review depends not just on its content but also in subtle ways on how the expressed evaluation relates to other evaluations of the same product." 1
Three-party concerns: "Rather than asking questions of the form “What did Y think of X?”, we are asking, “What did Z think of Y’s opinion of X?” Crucially, there are now three entities in the process rather than two." 1
- ! "Heider’s theory of structural balance in social psychology seeks to understand subjective relationships by considering sets of three entities at a time as the basic unit of analysis."
! "A significant and particularly wide-ranging set of effects is based on the relationship of a review’s star rating to the star ratings of other reviews for the same product. We view these as fundamentally social effects, given that they are based on the relationship of one user’s opinion to the opinions expressed by others in the same setting."
Dataset: "over four million reviews of roughly 675,000 books on Amazon’s U.S. site, as well as smaller but comparably-sized corpora from Amazon’s U.K., Germany, and Japan sites"
Test four hypotheses (2):
- "conformity hypothesis" that reviews are considered more helpful if their star ratings are close to the average
- "individual-bias hypothesis" that users like reviews that agree with their opinions
- "brilliant-but-cruel hypothesis" that users assume low reviews correlate with intelligence
- "quality-only" hypothesis that ratings correlate with textual quality
! find that helpfulness ratio inversely proportional to star rating
- reviews "punished asymmetrically: slightly negative reviews are punished more strongly…than slightly positive reviews"
- "it is not simply that closeness to the average is rewarded; among reviews that are slightly away from the mean, there is a bias toward overly positive ones" 3
- find generally that "conformity hypothesis" is true, except when variance in star ratings is high
- find that, cross-culturally, these findings hold true
- they "control for text" by looking at helpfulness ratings of identical reviews 3, find that their observed effect holds true regardless
- DONE "Variation in the contextuality of language: An empirical measure." heylighen_variation_2002
From abstract: "An empirical measure of this variation is proposed, the 'formality' or 'F-score', based on the frequencies of different word classes. Nouns, adjectives, articles and prepositions are more frequent in low-context or 'formal' types of expression; pronouns, adverbs, verbs and interjections are more frequent in high-context styles."
Uses anthropologist Edward T. Hall's definition of "high-context" and "low-context" situations.
- high-context: communication is implicit
- low-context: communication is more explicit and overt
- "the association of context with specific cultures seems to imply that the degree of context-dependence is merely the result of historical accidents or of idiosyncratic differences between ethnicities"
Define a "formality/contextuality continuum" in which "the opposite of contextuality may be called 'formality'" 298
- yet differentiate between "deep formality," which aims to be explicit and avoid ambiguity, and "surface formality," which is "ceremonial or required by convention."
! Argue that "completely unambiguous description is impossible" (300), citing Gödel's incompleteness theorem and Heisenberg's uncertainty principle
And textual genres: "we expect contextuality to be lowest in the more static, intellectual or informational forms of expression … this includes official, legal, technical or scientific documents … We expect contextuality to be highest in the more interactive and personal communication situations … this includes relaxed conversations, dialogues, … and personal letters." 302
Divides lexicon into more and less context-dependent classes:
- deictic words ("we," "him," "my," "here," "upstairs," "however") 306
- pronouns, adverbs, and interjections
- non-deictic words: most nouns and adjectives
- nouns, adjectives, and prepositions
F = (noun frequency + adjective freq. + preposition freq. + article freq. - pronoun freq. - verb freq. - adverb freq. - interjection freq. + 100)/2
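A small sketch of computing F from POS-tag counts with NLTK. The mapping from Penn Treebank tags to Heylighen & Dewaele's word classes is my approximation (DT stands in for articles, IN for prepositions), and the frequencies are percentages of all tagged tokens, as in the formula above:

```python
from collections import Counter
import nltk  # requires the 'punkt' tokenizer and 'averaged_perceptron_tagger' models

FORMAL = {'NN', 'NNS', 'NNP', 'NNPS',        # nouns
          'JJ', 'JJR', 'JJS',                # adjectives
          'IN',                              # prepositions (approx.)
          'DT'}                              # articles/determiners (approx.)
DEICTIC = {'PRP', 'PRP$', 'WP', 'WP$',       # pronouns
           'VB', 'VBD', 'VBG', 'VBN', 'VBP', 'VBZ',  # verbs
           'RB', 'RBR', 'RBS', 'WRB',        # adverbs
           'UH'}                             # interjections

def f_score(text):
    tags = Counter(tag for _, tag in nltk.pos_tag(nltk.word_tokenize(text)))
    total = sum(tags.values())
    formal = 100 * sum(tags[t] for t in FORMAL) / total
    deictic = 100 * sum(tags[t] for t in DEICTIC) / total
    return (formal - deictic + 100) / 2

print(f_score("Well, I really think we should just go there now!"))         # low, contextual
print(f_score("The committee's decision on the proposed regulation stands."))  # high, formal
```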
Using a corpus with varying degrees of formality:
- F-scores: 44 (conversation), 54 (oral examination), 56 (essay)
Find that: 311
- those with academic degrees score higher (44 vs. 40)
- men higher than women (42 vs. 39)
Italian genres:
- movies, theater: 48, 52
- novels: 58-64
- newspapers and magazines: 66-71
- essays, science 69, 72
French:
- "interview with a call-girl": 45
- "interview with the president": 52
- "an address to the nation by the president": 58
- "an article in an intellectual newspaper": 78
Use factor analysis to find significant factors to explain variation
On integrating contextual information: "Following Levelt's (1989) classification of linguistic deixis, we can distinguish four categories of context factors: the persons involved, the space or setting of the communication, the time, and the discourse preceding the present expression." 324
- "the larger the difference in psychological or cultural background [between people communicating] the higher the formality of their communication" 324
- "the more different the spatial setting for sender and receiver, the smaller the shared context"
- "the longer the time span between sending and receiving, the less will remain of the original context" [and thus higher formality]
"the degree of extroversion was found to have a significant negative correlation with the explicitness factor measuring formality." 331-2
- DONE "Comment classification for an online news domain." brand_comment_2014
"Through investigation of supervised learning techniques, we show that content-based features better serves as a predictor of popularity, while quality-based features are better suited for predicting user engagement." 50
Test "quality-based features" and "content-based features"
Quality-based features:
- response time of user's comment
- length of comment
- uppercase frequency
- question mark / exclamation mark frequency
Lexical features:
- entropy of words in the comment: [is this just TR?]
- spelling
- profanity
- "informativeness": "how unique a comment is within its thread" (TF-IDF)
- "relevance": set intersection of words between comment and article (tiny sketch at the end of this entry)
Social features:
- sentiment analysis
- "subjectivity" (neutrality of sentiment analysis, defined as between 45-50% sentiment)
- "engagement": number of child comments
Use linear regression and support vector regression;
Find that content-based features outperform quality-based features in predicting comment votes, but quality + content features outperforms both.
- But: "This could be attributed to biased voting patterns in the community, eg. users that would “like” a comment multiple times if it supports their viewpoint (politically, religiously, or otherwise), but not necessarily evaluate the comment’s quality." 55
- "The quality-based features are, however, better suited for predicting the engagement a comment will receive from users in a comment thread" 55
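A tiny sketch of the "relevance" feature as described in the feature list above (word-set intersection between comment and article); the normalization by comment length is my choice:

```python
def relevance(comment, article):
    # fraction of the comment's word types that also appear in the article
    c, a = set(comment.lower().split()), set(article.lower().split())
    return len(c & a) / max(len(c), 1)

print(relevance("the new fuel taxes hurt rural drivers",
                "council approves new fuel taxes to fund road repairs"))  # ~0.43
```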
3.6.2 Of potentially abusive behavior
- Bullying
- TODO "Improved cyberbullying detection using gender information" dadvar_improved_2012
- TODO "Towards understanding cyberbullying behavior in a semi-anonymous social network" hosseinmardi_towards_2014
- TODO "Let's gang up on cyberbullying" lieberman_lets_2011
- TODO "A framework for cyberbullying detection in social network" kansara_framework_2015
- TODO "Script-based story matching for cyberbullying prevention" macbeth_script-based_2013
- TODO "Fast Learning for Sentiment Analysis on Bullying" xu_fast_2012
- TODO "An examination of regret in bullying tweets" xu_examination_2013
- TODO "Detection and fine-grained classification of cyberbullying events" van_hee_detection_2015
- TODO "Learning from bullying traces in social media" xu_learning_2012
- TODO "Cyberbullying detection: a step toward a safer internet yard" dadvar_cyberbullying_2012
- TODO "Modeling the detection of Textual Cyberbullying" dinakar_modeling_2011
- TODO "Detecting offensive language in social media to protect adolescent online safety." chen_detecting_2012
- TODO "An effective approach for cyberbullying detection" nahar_effective_2013
- DONE "Finding Deceptive Opinion Spam by Any Stretch of the Imagination" ott_finding_2011 hasCorpus
"ultimately develop a classifier that is nearly 90% accurate on our gold-standard opinion spam dataset."
- opinion spam
- defined as "inappropriate or fraudulent reviews," usu. for monetary gain 1
- deceptive opinion spam
- "fictitious opinions that have been deliberately written to sound authentic, in order to deceive the reader." 1
present public dataset of "gold-standard" deceptive reviews
Find that "a combined classifier with both n-gram and psychological deception features achieves nearly 90% cross-validated accuracy on this task. In contrast, we find deceptive opinion spam detection to be well beyond the capabilities of most human judges, who perform roughly at-chance"
Dataset creation:
- ! generate the deceptive-spam set by paying Mechanical Turk workers to write fake reviews
- generate "truthful opinions" by removing five-star reviews, reviews by first-time authors
Find that:
- "automated classifiers outperform human judges for every metric"
- "deceptive opinions contain more superlatives"
"The combined model LIWC+BIGRAMS+SVM is 89.8% accurate at detecting deceptive opinion spam" 8
Qualities of truthful/deceptive language:
- "truthful opinions tend to include more sensorial and concrete language than deceptive opinions; in particular, truthful opinions are more specific about spatial configurations" 9
- "we observe an increased focus in deceptive opinions on aspects external to the hotel being reviewed (e.g. husband, business, vacation)" 9
"We find that while standard n-gram-based text categorization is the best individual detection approach, a combination approach using psycholinguistically-motivated features and n-gram features can perform slightly better." 9
- DONE "Automatic identification of personal insults on social news sites" sood_automatic_2012
"Our training corpus is a set of comments from a news commenting site that we tasked Amazon Mechanical Turk workers with labeling. Each comment is labeled for the presence of profanity, insults, and the object of the insults."
"we believe it is worthwhile to distinguish off-topic negative comments from on-topic negative comments that, while negative, are offered in the spirit of debate." 1
"sentiment analysis is, in addition to being author, context and community-specific, a domain-specific problem"
- "for example, a 'cold' beverage is good while a 'cold' politician is bad" 3
- "in order to build an accurate sentiment analysis system, you must have labeled training data from within the target domain." 3
Corpus: 1.6M comments from 234K users in 168K threads from Yahoo! Buzz, 2010
- filter this for comments of length between 72 and 324 chars.
Label the data with help from Amazon Turk workers
- throw out comments in which there was no consensus
use linear kernel support vector machines for classification; end up using a multistep SVM classifier
find that genre (politics, entertainment, etc.) strongly affects categorizer accuracy, with news and politics having the lowest, and business and entertainment having the highest.
find that "bigrams and stems using a presence representation performed best," at around 85% accuracy
- "presence" here is binary presence of words, rather than their frequency
- using this representation, they redo the analysis, but find that it doesn't improve categorization in all domains
Relevance + sentiment analysis: "Our approach combines relevance analysis for detecting off-topic comments with valence analysis methods for detecting negative comments."
- relevance: the sum of TF-IDF differences between words
- DONE "Using Convolutional Neural Networks to Classify Hate-Speech" gamback_using_2017
"The classifier assigns each tweet to one of four predefined categories: racism, sexism, both (racism and sexism) and non-hate-speech. Four Convolutional Neural Network models were trained on resp. character 4-grams, word vectors based on semantic information built using word2vec, randomly generated word vectors, and word vectors combined with character n-grams. The feature set was down-sized in the networks by max- pooling, and a softmax function used to classify tweets. Tested by 10-fold cross-validation, the model based on word2vec embeddings performed best, with higher precision than recall, and a 78.3% F-score."
Corpus: use the English Twitter hate-speech dataset created by waseem_hateful_2016
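A rough tf.keras sketch of the best-performing variant from the abstract above (word2vec-style embeddings, a convolution, max-pooling to down-size the feature set, and a softmax over the four classes). The random embeddings and data are placeholders and the hyperparameters are illustrative, not the paper's:

```python
import numpy as np
import tensorflow as tf

vocab_size, emb_dim, maxlen = 5000, 300, 40
embedding_matrix = np.random.rand(vocab_size, emb_dim).astype("float32")  # stand-in for word2vec
X = np.random.randint(1, vocab_size, size=(64, maxlen))  # padded token-id sequences
y = np.random.randint(0, 4, size=(64,))  # 0 racism, 1 sexism, 2 both, 3 non-hate-speech

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, emb_dim,
        embeddings_initializer=tf.keras.initializers.Constant(embedding_matrix)),
    tf.keras.layers.Conv1D(filters=100, kernel_size=4, activation='relu'),
    tf.keras.layers.GlobalMaxPooling1D(),
    tf.keras.layers.Dense(4, activation='softmax'),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(X, y, validation_split=0.1, epochs=2, verbose=0)
```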
"following Waseem and Hovy (2016) only length 4 character n-grams were used. Clearly it would be interesting to explore whether these are uniformly ineffective when changing the n-gram size"
- DONE "Detecting Nastiness in Social Media" samghabadi_detecting_2017 hasCorpus
Corpus scraped from ask.fm
- 586K question-answer pairs
- Ask.fm's anonymity "allows attackers the power to freely harass users by flooding their pages with profanity-laden questions and comments" 63
- "Several teen suicides have been attributed to cyberbullying in ask.fm"
- "We crawl data containing profanities and then determine whether or not it contains invective. Annotations on this data are improved iteratively by in-lab annotations and crowdsourcing." 63
- Crowdsourced annotation of corpus using CrowdFlower 65
Bad words list:
- ! "Bad words list" compiled from Google's bad words list and words listed in hosseinmardi_towards_2014
- "most of these bad words are often used in a casual way, so detecting cases in which there are potential invective requires careful feature engineering" 65
"We also show the robustness of our model by evaluating it on different data sets (Wikipedia Abusive Language Data Set, and Kaggle)."
- ? Yet is this robustness a good thing? Shouldn't domain-specific models work better?
And spam: "Researchers have reported that cyberbullying posts are contextual, personalized, and creative, which make them harder to detect than detecting spam." 64
Final F-score of 59%
Data available at http://ritual.uh.edu/resources
Also test their system on Kaggle data
Use supervised classification algorithm linear SVM
Features:
- TF-IDF-weighted n-grams, char n-grams
- ! also k-skip n-grams ("to capture long-distance context"); small example after this list
- Normalized count of emoticons
- SentiWordNet scores on sentences
- LIWC (Linguistic Inquiry and Word Count) categories
- ? Has anyone used WordNet hypernyms?
- LDA topics
- Two types of Word embeddings: document vectors, and averaged word vectors
- ! patterns: "combination of lexical forms and POS tags"
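For the k-skip n-gram feature flagged above, NLTK has a ready-made helper; a tiny example:

```python
from nltk.util import skipgrams

tokens = "you are such a pathetic little troll".split()
# 2-skip bigrams: token pairs that may skip up to two intervening words,
# capturing longer-distance context than plain bigrams
print(list(skipgrams(tokens, 2, 2))[:5])
# [('you', 'are'), ('you', 'such'), ('you', 'a'), ('are', 'such'), ('are', 'a')]
```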
Results:
- Best F-score AUC (area under curve) is 0.889 for Wikipedia data set;
- performs with an F-score of 0.75 using all features
Poor performance with ask.fm, since they use shorter texts
- TODO "Automated hate speech detection and the problem of offensive language." davidson_automated_2017
- TODO "Hateful Symbols or Hateful People: Predictive features for hate speech detection on twitter" waseem_hateful_2016 hasCorpus
- TODO "Abusive language detection in online user content" nobata_abusive_2016
- TODO "Detection of harassment on web 2.0" yin_detection_2009
- TODO "Impact of content features for automatic online abuse detection." papegnies_impact_2017
- TODO "Ex machina: Personal attacks seen at scale." wulczyn_ex_2017 hasCorpus
- TODO "Smokey: Automatic recognition of hostile messages" spertus_smokey:_1997
- TODO "Measuring the reliability of hate speech annotations: The case of the European refugee crisis." ross_measuring_2017
- TODO "Detecting offensive tweets via topical feature discovery over a large scale twitter corpus" xiang_detecting_2012
- TODO "Cross-Language Learning from Bots and Users to Detect Vandalism on Wikipedia" tran_cross-language_2015
- TODO "Mining for gold farmers: Automatic detection of deviant players in mmogs." ahmad_mining_2009
- TODO "Don’t hate the player, hate the game: The racialization of labor in World of Warcraft." nakamura_dont_2009
- TODO "Antisocial Behavior in Online Discussion Communities" cheng_antisocial_2015
- TODO "Deep Learning for User Comment Moderation" pavlopoulos_deep_2017
- TODO "Class-based Prediction Errors to Detect Hate Speech with Out-of-vocabulary Words" serra_class-based_2017
- TODO "One-step and Two-step Classification for Abusive Language Detection on Twitter" park_one-step_2017
- TODO "Technology Solutions to Combat Online Harassment" kennedy_iii_hack_2017
- TODO "Understanding Abuse: A Typology of Abusive Language Detection Subtasks" waseem_understanding_2017
- TODO "Illegal is not a Noun: Linguistic Form for Detection of Pejorative Nominalizations" palmer_illegal_2017
- TODO "Locate the hate: Detecting tweets against blacks." kwok_locate_2013
- TODO "Hate speech detection with comment embeddings" djuric_hate_2015
- TODO "Analyzing the targets of hate in online social media" silva_analyzing_2016
3.6.3 Linguistic properties of abusive language
- TODO "Dimensions of Abusive Language on Twitter" clarke_dimensions_2017
- TODO "Abusive language detection in online user content" nobata_abusive_2016
3.6.5 Of opinion spam
- TODO "Opinion spam and analysis" jindal_opinion_2008
- TODO "Review spam detection" jindal_review_2007
- TODO "Detecting group review spam" mukherjee_detecting_2011
- TODO "Analyzing and detecting review spam" jindal_analyzing_2007
- TODO "Finding unusual review patterns using unexpected rules" jindal_finding_2010
- TODO "Detecting product review spammers using rating behavior" lim_detecting_2010
- TODO "Distortion as a validation criterion in the identification of suspicious reviews" wu_distortion_2010
- TODO "Comparison of deceptive and truthful travel reviews" yoo_comparison_2009
3.7 Psychology, Perception
3.7.1 TODO "The “Nasty Effect:” Online Incivility and Risk Perceptions of Emerging Technologies." anderson_nasty_2014
3.7.2 TODO "Newsworthiness and Network Gatekeeping on Twitter: The Role of Social Deviance" diakopoulos_newsworthiness_2014
3.7.3 And (Computational/Quantitative) Psycholinguistics
- DONE Labs
- DONE UCSD: Computational Psycholinguistics Lab
- Website not updated since 2014
- DONE MIT: Computational Psycholinguistics Lab
- Website not updated since 2014
- Linguistic properties of speech/writing of those diagnosed with mental illness
- DONE "The Emotional Lexicon of Individuals Diagnosed with Antisocial Personality Disorder" gawda_emotional_2013
Abstract: "This study investigated the specific emotional lexicons in narratives created by persons diagnosed with antisocial personality disorder (ASPD) to test the hypothesis that individuals with ASPD exhibit deficiencies in emotional language. Study participants consisted of 60 prison inmates with ASPD, 40 prison inmates without ASPD, and 60 men without antisocial tendencies who described situations involving love, hate and anxiety depicted by photographs. The lexical choices made in the narratives were analyzed, and a comparison of the three groups revealed differences between the emotional narratives of inmates with ASPD, inmates without ASPD, and the control group. Although the narratives of the individuals with ASPD included more words describing emotions and higher levels of emotional intensity, the valence of these words was inappropriate. The linguistic characteristics of these narratives were associated with high levels of psychopathy and low emotional reactivity."
- Citing previous research, "individuals with psychopathic personalities create less structured narratives that lack temporal perspective … and do not describe the emotional context or focus on negative aspects of the situation" 572
Subjects:
- "60 prison inmates with ASPD"
- "40 prison inmates without ASPD"
- "60 men wihtout antisocial tendencies"
- the groups were very similar in age, education, IQ, verbal comprehension, etc.
Results:
- ASPD narratives show much higher:
- emotion words (all)
- positive words (all)
- negative words (love)
- high-intensity words (love)
- nouns (hate)
- adjectives (love)
- verbs (love, anxiety)
- ASPD narratives show much lower:
- negative words (hate)
? This seems to suggest that for ASPD-diagnosed patients, the sentiment valence of words may need to be interpreted in context.
- Sentiment on its own, therefore, may not be a good indicator of abusive language; what matters is whether that sentiment is out of place for its context.
- TODO "Syntax of Emotional Narratives of Persons Diagnosed with Antisocial Personality" gawda_syntax_2010
- DONE "The Language of the Psychopath" rieber_language_1994
Deep review of the literature of the language of psychopathy, although not strictly employing a quantitative approach to the language.
"The true psychopath compels the psychiatric observer to ask the perplexing and largely unanswered question 'Why doesn't that person have the common decency to go crazy?'" 2
? Language that "goes crazy," therefore, cannot be considered a mark of psychopathy.
"[Psychopaths] do not allow themselves to be moved by words and concepts that their fellow citizens value." 12
Notes Eichler's 1965 study's results: "sociopaths were higher than normals on negation, retraction, evaluation. As compared with impulsives, sociopaths were higher than normal on nonpersonal references." 15
- TODO "A graph theory model of the semantic structure of attitudes" bovasso_graph_1993
abstract: "The semantic structure underlying the attitudes of pretreatment and posttreatment drug addicts was modeled using a network analysis of free word associations."
- DONE "The Emotional Lexicon of Individuals Diagnosed with Antisocial Personality Disorder" gawda_emotional_2013
- Linguistic properties of emotional expression
- TODO "Measuring Emotional Expression with the Linguistic Inquiry and Word Count" kahn_measuring_2007
- TODO "Linguistic Markers and Emotional Intensity" argaman_linguistic_2010
- Studies speakers of Hebrew.
- TODO "Measuring Emotional Expression with the Linguistic Inquiry and Word Count" kahn_measuring_2007
- Swearing
- DONE "Swears in Context: The Difference Between Casual and Abusive Swearing" kapoor_swears_2016
Notes Rieber et al. 1979: "obscenities used denotatively can be considered far more harsh and offensive than those used connotatively."
Cites patent Patent US20110191105 (see above) where: "Reactions to offensive words were explained in terms of an 'offensiveness threshold' based on the individual’s sensitivity to profane language. Thus, if a word’s offensiveness score was higher than the individual’s offensiveness threshold, the word would be considered inappropriate and offensive; but if the individual’s tolerance for swearwords were high, and the word’s offensiveness score did not exceed the threshold, it was not likely to be perceived as offensive." 260
Distinguish between "mild," "moderate," and "severe" types of swears, cross-linguistically and across natioalities.
Test "appropriateness"
Hypotheses:
- "H1: Mild swears are more appropriate than moderate swears, which in turn, are more appropriate than severe swears."
- "H2: Swearing in casual contexts is more appropriate than swearing in abusive settings."
- "H3: Mild swears in casual contexts are the least inappropriate, and severe swears in abusive contexts are the most inappropriate."
Results:
- "Mild swears were likely to be used in casual, cathartic, and hostile scenarios; moderate swears were more likely to be used in conversational and abusive contexts."
- results "partially support H4": "severe swears are likely to be employed in abusive and hostile contexts (H4)." 266
- TODO "Does Emotional Arousal Influence Swearing Fluency?" stephens_does_2017
- DONE "Swears in Context: The Difference Between Casual and Abusive Swearing" kapoor_swears_2016
3.8 Gamergate
3.8.2 TODO Feminist Critics of Video Games Facing Threats in ‘GamerGate’ Campaign - The New York Times
- TODO "What Lies Beneath: The Linguistic Traces of Deception in Online Dating" toma_what_2012
4 Questions
4.1 Has anyone done a comment/article similarity (relevance) study like diakopoulos_editors_2015 but using word/document vectors instead of tf-idf?
- kolhatkar_constructive_2017 vectorizes words, but not to compute similarity with articles
- gamback_using_2017 uses word embeddings, finds that categorizer works best with these
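A minimal sketch of the comparison asked about in 4.1: comment-article relevance from averaged word vectors rather than TF-IDF. `load_glove` is a hypothetical helper returning a {word: vector} dictionary (e.g. parsed from a GloVe text file).

```python
import numpy as np

def avg_vector(text, vectors, dim=300):
    # average the vectors of all in-vocabulary tokens
    tokens = [t for t in text.lower().split() if t in vectors]
    if not tokens:
        return np.zeros(dim)
    return np.mean([vectors[t] for t in tokens], axis=0)

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

# vectors = load_glove("glove.6B.300d.txt")  # hypothetical loader
# relevance = cosine(avg_vector(comment, vectors), avg_vector(article, vectors))
```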
4.2 Has anyone studied platform/OS source as predictor of potentially abusive language?
- Keyhole shows high incidence of bot platforms for #gamergate. These account for almost 20%:
- twittbot
- Cheap Bots, Done Quick!
- IFTTT (If This Then That)
4.3 What can psycholinguistics studies offer to fingerprinting of abusive language?
4.4 Has anyone written a Twitter bot to identify abusive speech, and then ask the alleged abuser/abusee whether he/she thought it was abusive?
- This approach might be able to learn from correct/incorrect identifications.
4.5 What Twitter accounts or hashtags might be cataloging abusive tweets? Can these be mined to create new datasets?
4.6 If we can identify male or deceptive voices, can we use that as a proxy for identifying trolls?
5 Books and Other Sources
5.3 TODO - "Gendertrolling: How Misogyny Went Viral" mantilla_gendertrolling:_2015
5.4 DONE - Weeding Out Online Bullying Is Tough, So Let Machines Do It
alba_weeding_2015 Weeding Out Online Bullying Is Tough, So Let Machines Do It | WIRED
SRI International uses data from a major unspecified social media company to train an algorithm against reported data.
"Smart abusers": "Jamia Wilson, executive director of Women Action Media, a group Twitter appointed last fall to look at reports of harassment on the social network, says her main concern is that abusers are well-aware of the initiatives to curb harassment on networks—and employ sophisticated techniques to avoid detection."
6 Reports
6.1 Report 1
The detection and prediction of abusive or other "low-quality" language is a much-discussed topic in the computer science field of natural language processing and in computational linguistics. The work I've examined so far largely treats the problem as one of document classification, a subset of machine learning. Documents, which could be articles, comments, tweets, or other text, are first preprocessed (converting them to words or sequences of words), vectorized (transformed into numeric representations of these words), and the resulting vectors, usually along with other contextual features, are used to train machine learning algorithms to recognize abusive or other kinds of language. Once the algorithm is trained against labeled data (comments that have been marked as abusive by other users, for instance), it can then be used to guess whether a test document should be categorized as abusive.
Although the machine learning algorithm ultimately decides which of the features best categorize its data, whether to use word vector features or other contextual features, and how to weight those features, the researcher must first decide which features to feed it. In some cases, features include term frequencies, adjusted for their frequency in the document or corpus (TF-IDF) (diakopoulos_editors_2015), or n-dimensional word embeddings (agichtein_finding_2008), pre-trained on large corpora, like Stanford's GloVe vectors. Nicholas Diakopoulos et al., for instance, introduce a measure of the "relevance" of a news website comment to its article by measuring the cosine similarity of their TF-IDF vectors. Eugene Agichtein et al. use a similar technique to measure the relevance of questions and answers from a Q&A website, measuring instead the KL divergence of their language models. Agichtein's team also vectorizes their texts by transforming them into part-of-speech representations, discovering that certain grammatical constructions correlate with the "quality" of the question or answer.
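A rough sketch (not the implementation from diakopoulos_editors_2015) of comment-article relevance as the cosine similarity of TF-IDF vectors:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

article = "The senate passed a new climate bill today after months of debate."
comments = [
    "Glad to see some progress on climate policy at last.",
    "You are all morons and this site is garbage.",
]

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform([article] + comments)

# similarity of each comment to the article; low relevance can be one
# feature among many for a downstream classifier
relevance = cosine_similarity(matrix[0], matrix[1:])
print(relevance)  # the on-topic comment scores higher than the off-topic insult
```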
Sentiment analysis, a sub-field of natural language processing, can also provide useful features for categorization. Stefan Siersdorfer et al. find that sentiment scores, computed using SentiWordNet, correlate with user ratings of comments on YouTube (siersdorfer_how_2010). Carlos Castillo et al., likewise, find sentiment scores to be among the best features that distinguish between "credible" and "non-credible" tweets (castillo_predicting_2013).
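A rough sketch of one way to compute a SentiWordNet-based sentence score with NLTK; the cited papers' exact scoring schemes differ, so treat this only as an illustration of the feature type.

```python
import nltk
from nltk.corpus import sentiwordnet as swn

nltk.download("wordnet", quiet=True)
nltk.download("sentiwordnet", quiet=True)

def sentiment_score(sentence):
    # average (positive - negative) score of the first synset of each token
    scores = []
    for token in sentence.lower().split():
        synsets = list(swn.senti_synsets(token))
        if synsets:
            scores.append(synsets[0].pos_score() - synsets[0].neg_score())
    return sum(scores) / len(scores) if scores else 0.0

print(sentiment_score("you are a horrible stupid person"))  # negative-leaning
```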
Some of the more interesting features used to train these categorizers, however, are metatextual rather than textual features. Castillo et al., for instance, find that whether a Twitter user has completed his or her self-description ("bio") is a feature that is weighted highly in distinguishing between tweets automatically categorized as either "news" or "discussion" (castillo_predicting_2013). Agichtein et al. use social network theory, and in particular trust propagation theory, to predict "high-quality" questions and answers. If user A answers a question asked by a well-known expert answerer B, for instance, they assume a certain level of expertise on the part of user A.
While these papers describe techniques for abusive language detection, and not necessarily software, such software does exist. TrollBusters, the fruit of a 2015 hackathon, claims to "identify communities of trolls around any given issue using natural language processing" and "counter cyberattacks in real-time with online community support and positive messaging." As far as I can tell, it is proprietary software. Perspective, a product produced by the startup Jigsaw, an Alphabet (Google) company, is a more mature-looking product, with a public API that could be used to label comments according to their potential "toxicity." Although much of Perspective's code is on GitHub, it is unclear how much of their model is public, so there might still be room for development of a fully open-source tool.
There are a few dozen other papers in this area I have yet to explore, and a few related fields, besides. The fields of automated essay grading and readability indexing may hold techniques that are useful to the automated detection of abusive text. Non-computational fields, as well, such as psychology and media studies, may provide useful ideas for ML feature design. I hope to explore the Gamergate controversy in more detail, especially since a colleague of mine has recently done a computational analysis of its tweets. (A quick analysis of gamergate tweets on Keyhole reveals that around 10% of the tweets came from Twitter bot platforms–are there automated abuse robots, and how might these be identified?)
6.2 Report 2
Most of the work I've examined this week belongs to the fields of computational linguistics and natural language processing, and treats the problem of the identification of abusive language as a document categorization problem. The training data used for these studies is often generated by employing crowd workers on Amazon Mechanical Turk or CrowdFlower to manually annotate data. Features used by these studies include average sentiment analysis scores, emoticons used, stylistic patterns such as sentence length, word embeddings, and LDA (topic modeling) topics. In one case (samghabadi_detecting_2017) a "bad words dictionary" was created by combining a Google-created list with a list from another researcher. Categorizers used include Long Short-Term Memory (LSTM) recurrent neural networks (kolhatkar_constructive_2017), Convolutional Neural Networks (gamback_using_2017), and Support Vector Machines (SVM) (samghabadi_detecting_2017). The method that performs best in categorizing abusive language seems to vary greatly according to data set and domain. Sood et al. (sood_automatic_2012), for instance, find that word bigrams (sequences of two words) are the best-performing features, while Samghabadi et al. (samghabadi_detecting_2017) find character 4-grams (sequences of four characters) to perform better. Data sets also vary widely: some consist of news comments, others of tweets. Typically, the longer the document, the better the categorizer will perform, and different algorithms are needed for each.
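A sketch of the kind of comparison behind the bigram vs. character 4-gram finding: cross-validate the same linear classifier over the two feature sets. `load_annotated_comments` is a hypothetical loader for an annotated corpus.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def mean_f1(vectorizer, texts, labels):
    model = make_pipeline(vectorizer, LinearSVC())
    return cross_val_score(model, texts, labels, scoring="f1", cv=5).mean()

# texts, labels = load_annotated_comments()  # hypothetical annotated corpus
# word_bigrams = mean_f1(TfidfVectorizer(ngram_range=(2, 2)), texts, labels)
# char_4grams = mean_f1(TfidfVectorizer(analyzer="char_wb", ngram_range=(4, 4)), texts, labels)
# which wins tends to depend on the data set and domain, as noted above
```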
Although a number of these studies don't seem to publish their data and code, many of them do, making room for easy repetition of their experiments, or design of new experiments that make use of some of their code and/or data. In particular, the 2012 Kaggle task "Detecting Insults in Social Commentary" has a thread where participants are posting their code. Also, I've started tagging those studies that publish their training corpora using the tag "hasCorpus."
As previously noted, very little end-user software seems to exist for the detection of harassment, and what does exist is very much in its infancy. I tested Jigsaw's Perspective, which I mentioned in my previous report, against a number of intentionally ambiguous and threatening sentences. I then compared these scores with those generated by the Wiki DeTox aggression model, also a Jigsaw project (a sketch of the Perspective API call follows the scores):
- "Be careful, you might find some white powder in an envelope come in the mail one day."
- WDT: 1% aggressive
- Perspective: 14% toxic
- "If you keep this up, you find yourself sleeping with the fishes."
- WDT: 12% aggressive.
- Perspective: 38% toxic
- "I'm going to come to your house."
- WDT: 48% aggressive.
- Perspective: 15% toxic
- "I'm going to nominate you for the Nobel prize, you brilliant man."
- WDT: 61% aggressive.
- Perspective: 17% toxic.
These scores highlight both the high variability between algorithms, and their difficulty with ambiguous language.
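For reference, a hedged sketch of how scores like the Perspective numbers above can be requested programmatically. The endpoint and payload follow the Comment Analyzer API as publicly documented (v1alpha1) at the time of writing; the details may have changed, so check the current documentation and supply your own API key.

```python
import requests

API_KEY = "YOUR_API_KEY"  # placeholder
URL = ("https://commentanalyzer.googleapis.com/v1alpha1/"
       "comments:analyze?key=" + API_KEY)

def toxicity(text):
    payload = {
        "comment": {"text": text},
        "requestedAttributes": {"TOXICITY": {}},
    }
    response = requests.post(URL, json=payload)
    response.raise_for_status()
    # summary score is a probability in [0, 1]; multiply by 100 for a percentage
    return response.json()["attributeScores"]["TOXICITY"]["summaryScore"]["value"]

print(toxicity("I'm going to come to your house."))
```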
More abstract and theoretical work in this area also seems worthy of examination. Heylighen and Dewaele's formality score, a formula over part-of-speech frequencies, draws on anthropological and psycholinguistic theories of contextuality (linguistic deixis). Although this measure is used directly in categorization experiments (agichtein_finding_2008), its methodology might also be adapted to build other POS-pattern-based approaches for the detection of abusive language. The methods of the sub-field of deceptive opinion spam detection (false product reviews, for instance), which in some cases detect opinion spam with around 90% accuracy, a rate much higher than that of human judges, might also be adapted to the detection of abusive language.
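A sketch of that formality measure, F = (noun% + adjective% + preposition% + article% - pronoun% - verb% - adverb% - interjection% + 100) / 2, where each frequency is a percentage of all tagged words. The mapping from Penn Treebank tags below is an approximation (DT covers more than articles, IN more than prepositions).

```python
import nltk

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

FORMAL = {"NN", "NNS", "NNP", "NNPS",                 # nouns
          "JJ", "JJR", "JJS",                         # adjectives
          "IN",                                       # prepositions (approx.)
          "DT"}                                       # articles (approx.)
DEICTIC = {"PRP", "PRP$", "WP",                       # pronouns
           "VB", "VBD", "VBG", "VBN", "VBP", "VBZ",   # verbs
           "RB", "RBR", "RBS",                        # adverbs
           "UH"}                                      # interjections

def formality(text):
    tags = [tag for _, tag in nltk.pos_tag(nltk.word_tokenize(text))]
    total = len(tags) or 1
    formal = 100.0 * sum(tag in FORMAL for tag in tags) / total
    deictic = 100.0 * sum(tag in DEICTIC for tag in tags) / total
    return (formal - deictic + 100) / 2

print(formality("I'm gonna come over there and you'll regret it."))
```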
6.3 Report 3
This week, I began by exploring some of the winning entries from the 2012 Kaggle data science contest, Detecting Insults in Social Commentary. The top six entries used the Python programming language and its machine learning libraries, like Scikit-Learn; other entries used the statistical language R or other programming languages. Since the top entries all seemed to use similar categorizers and meta-categorizers (grid-search cross-validation techniques), they largely differed in preprocessing. One coder credits "good tokenization" as one of the major keys to his success. Domain-specific knowledge, and in particular linguistic observation of the training data, then, provided the most tangible advantages. Knowledge of the obfuscation techniques used by writers of insults, for instance, contributed to these useful tokenization techniques.
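A toy illustration (not taken from any particular Kaggle entry) of obfuscation-aware tokenization: map common character substitutions back to letters and collapse long character repeats before tokenizing.

```python
import re

SUBSTITUTIONS = {"@": "a", "$": "s", "0": "o", "1": "i", "3": "e"}

def normalize(text):
    text = text.lower()
    text = "".join(SUBSTITUTIONS.get(ch, ch) for ch in text)
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)  # "loooooser" -> "looser"
    return text

def tokenize(text):
    return re.findall(r"[a-z']+", normalize(text))

print(tokenize("you @$$h0le, what a l0000ser"))
# ['you', 'asshole', 'what', 'a', 'looser']
```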
Following my previous report on formality scores and their use in these categorization tasks, I began to investigate the field of computational psycholinguistics. A few articles in this field exist that take quantitative approaches to the study of language produced by people who have been diagnosed with mental illness. gawda_emotional_2013, for instance, studies narratives written by prison inmates diagnosed with Antisocial Personality Disorder (ASPD), compared with inmates not diagnosed with the disorder and with a control group. They find that the use of emotional words is higher overall among those with ASPD, but negative words, for instance, might have lower than normal scores for narratives that describe hate. When seen in the context of our project of the computational identification of abusive language, this finding suggests that negative words on their own may not be markers of abuse, at least abuse originating from those with ASPD. Similarly, rieber_language_1994, a literature review of "the language of the psychopath," finds that often one of the distinguishing linguistic features of these patients is the lack of emotional markers in certain contexts. Here again, this indicates that strong emotional valence, as measured by sentiment analysis, might not on its own be a useful feature for a categorizer, and that contextually contrasting emotional content might perform better.
These contextual complications are analogous to those studied in a few papers on swearing. kapoor_swears_2016, for instance, attempts to differentiate between "casual" and "abusive" swearing. They categorize swear words as "mild," "moderate," and "severe," and find that "moderate" and "severe" swear words are more likely to occur in abusive contexts. They cite a 2011 patent that scores a word's offensiveness relative to a user's personal offensiveness "threshold." This is another instance of abuse detection that relies on contextually contrasting language.
Since many projects in abusive language detection position themselves socio-contextually, and describe their studies as attempts to identify "trolls," or those who habitually abuse or harass others, an important subcategory of this area of research is the identification of professional trolls. These are trolls who are either agents provocateurs employed by government agencies or workers employed by private "reputation management" consultants. One study in this area, studying comments on a Bulgarian news website, found that the day of the week and the hour of the day were useful features to distinguish between paid and unpaid trolls. Computationally identifying paid trolls, and other systematic or automated forms of harassment, might leverage metadata like this, potentially making it one of the easiest subtasks for abuse detection.
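A minimal sketch of the metadata idea in that study (its exact setup is not reproduced here): turn a post's timestamp into day-of-week and hour-of-day features that a classifier can combine with textual features.

```python
from datetime import datetime

def time_features(timestamp):
    """e.g. '2017-06-12T14:05:00' -> {'weekday': 0, 'hour': 14, 'working_hours': 1}"""
    dt = datetime.fromisoformat(timestamp)  # Python 3.7+
    return {
        "weekday": dt.weekday(),  # 0 = Monday
        "hour": dt.hour,
        "working_hours": int(dt.weekday() < 5 and 9 <= dt.hour < 18),
    }

print(time_features("2017-06-12T14:05:00"))
```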
New directions for research include six US patents related to the detection of abusive language, or for "offensiveness" more generally; a statistical exploration of the hatebase.org dataset of hate speech (thanks for the tip, Colin); more work related to troll detection, especially in graph theory and signed social network theory; and more fine-grained analysis of the code from the 2012 Kaggle competition and other publicly-available algorithms.
7 References
Bibliography
- [anti-defamation_league_adl_2016] Anti-Defamation League, ADL Report: Control-Alt-Delete: Recommendations of the ADL Task Force on the Harassment of Journalists, , , . link.
- [alba_weeding_2015] Alba, Weeding Out Online Bullying Is Tough, So Let Machines Do It, WIRED, , . link.
- [lapowsky_its_2015] Lapowsky, It's Too Easy for Trolls to Game Twitter's Anti-Abuse Tools, WIRED, , . link.
- [wauters_towards_2014] Wauters, Lievens & Valcke, Towards a better protection of social media users: a legal perspective on the terms of use of social networking sites, International Journal of Law and Information Technology, 22(3), 254-294 . link.
- [citron_intermediaries_2011] Citron & Norton, Intermediaries and hate speech: Fostering digital citizenship for our information age, BUL Rev., 91, 1435 . link.
- [geiger_work_2010] Geiger & Ribes, The Work of Sustaining Order in Wikipedia: The Banning of a Vandal, 117-126, in in: Proceedings of the 2010 ACM Conference on Computer Supported Cooperative Work, edited by ACM
- [tkacz_wikipedia_2014] Tkacz, Wikipedia and the Politics of Openness, University of Chicago Press .
- [warnick_what_2010] Warnick, What we talk about when we talk about talking: Ethos at work in an online community, Iowa State University .
- [crawford_what_2016] Crawford & Gillespie, What is a flag for? Social media reporting tools and the vocabulary of complaint, New Media & Society, 18(3), 410-428 . link. doi.
- [matias_reporting_2015] Matias, Johnson, Boesel, Keegan, Friedman & DeTar, Reporting, Reviewing, and Responding to Harassment on Twitter, arXiv:1505.03359 [cs], , . link.
- [wright_vectors_2017] Wright, Ruths, Dillon, Saleem & Benesch, Vectors for Counterspeech on Twitter, ACL 2017, , 57 . link.
- [grimmelmann_virtues_2015] Grimmelmann, The virtues of moderation, Yale JL & Tech., 17, 42 . link.
- [lampe_slash_2004] Lampe & Resnick, Slash (dot) and burn: distributed moderation in a large online conversation space, 543-550, in in: Proceedings of the SIGCHI conference on Human factors in computing systems, edited by ACM
- [su_rephrasing_2017] Su, Huang, Chang & Lin, Rephrasing Profanity in Chinese Text, ACL 2017, , 18 . link.
- [fiser_legal_2017] Fišer, Ljubešic & Erjavec, Legal Framework, Dataset and Annotation Schema for Socially Unacceptable Online Discourse Practices in Slovene, ACL 2017, , 46 . link.
- [mubarak_abusive_2017] Mubarak, Darwish & Magdy, Abusive Language Detection on Arabic Social Media, ACL 2017, , 52 . link.
- [mihaylov_exposing_2015] Mihaylov, Koychev, Georgiev & Nakov, Exposing Paid Opinion Manipulation Trolls., 443-450, in in: RANLP, edited by
- [mihaylov_finding_2015] Mihaylov, Georgiev & Nakov, Finding Opinion Manipulation Trolls in News Community Forums., 310-314, in in: CoNLL, edited by
- [ortega_propagation_2012] Ortega, Troyano, Cruz, Vallejo & Enríquez, Propagation of trust and distrust for the detection of trolls in a social network, Computer Networks, 56(12), 2884-2895 . link. doi.
- [kumar_accurately_2014] Kumar, Spezzano & Subrahmanian, Accurately detecting trolls in slashdot zoo via decluttering, 188-195, in in: Advances in Social Networks Analysis and Mining (ASONAM), 2014 IEEE/ACM International Conference on, edited by IEEE
- [rowe_assessing_2009] Rowe & Butters, Assessing trust: contextual accountability, ESWC, Heraklion, , . link.
- [xu_filtering_2010] Xu & Zhu, Filtering offensive language in online communities using grammatical relations, 1-10, in in: Proceedings of the Seventh Annual Collaboration, Electronic Messaging, Anti-Abuse and Spam Conference, edited by
- [razavi_offensive_2010] Razavi, Inkpen, Uritsky & Matwin, Offensive language detection using multi-level classification, Advances in Artificial Intelligence, , 16-27 . link.
- [siersdorfer_how_2010] Siersdorfer, Chelaru, Nejdl & San Pedro, How useful are your comments?: analyzing and predicting youtube comments and comment ratings, 891-900, in in: Proceedings of the 19th international conference on World wide web, edited by ACM
- [diakopoulos_editors_2015] Diakopoulos, The Editor's Eye: Curation and Comment Relevance on the New York Times, 1153-1157, in in: Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing, edited by ACM
- [castillo_predicting_2013] Castillo, Mendoza & Poblete, Predicting information credibility in time-sensitive social media, Internet Research, 23(5), 560-588 . link.
- [kolhatkar_constructive_2017] Kolhatkar & Taboada, Constructive Language in News Comments, ACL 2017, , 11 . link.
- [agichtein_finding_2008] Agichtein, Castillo, Donato, Gionis & Mishne, Finding high-quality content in social media, 183-194, in in: Proceedings of the 2008 international conference on web search and data mining, edited by ACM
- [heylighen_variation_2002] Heylighen & Dewaele, Variation in the contextuality of language: An empirical measure, Foundations of Science, 7(3), 293-340 . link.
- [danescu-niculescu-mizil_how_2009] Danescu-Niculescu-Mizil, Kossinets, Kleinberg & Lee, How opinions are received by online communities: a case study on amazon. com helpfulness votes, 141-150, in in: Proceedings of the 18th international conference on World wide web, edited by ACM
- [brand_comment_2014] Brand & Van Der Merwe, Comment classification for an online news domain, , , . link.
- [dadvar_improved_2012] Dadvar, de Jong, Ordelman & Trieschnigg, Improved cyberbullying detection using gender information, , , . link.
- [hosseinmardi_towards_2014] Hosseinmardi, Ghasemianlangroodi, Han, Lv & Mishra, Towards understanding cyberbullying behavior in a semi-anonymous social network, 244-252, in in: Advances in Social Networks Analysis and Mining (ASONAM), 2014 IEEE/ACM International Conference on, edited by IEEE
- [lieberman_lets_2011] Lieberman, Dinakar & Jones, Let's gang up on cyberbullying, Computer, 44(9), 93-96 . link.
- [kansara_framework_2015] Kansara & Shekokar, A framework for cyberbullying detection in social network, International Journal of Current Engineering and Technology, 5, . link.
- [macbeth_script-based_2013] Macbeth, Adeyema, Lieberman & Fry, Script-based story matching for cyberbullying prevention, 901-906, in in: CHI'13 Extended Abstracts on Human Factors in Computing Systems, edited by ACM
- [xu_fast_2012] Xu, Zhu & Bellmore, Fast learning for sentiment analysis on bullying, 10, in in: Proceedings of the First International Workshop on Issues of Sentiment Discovery and Opinion Mining, edited by ACM
- [xu_examination_2013] Xu, Burchfiel, Zhu & Bellmore, An Examination of Regret in Bullying Tweets., 697-702, in in: HLT-NAACL, edited by
- [van_hee_detection_2015] Van Hee, Lefever, Verhoeven, Mennes, Desmet, De Pauw, Daelemans & Hoste, Detection and fine-grained classification of cyberbullying events, 672-680, in in: International Conference Recent Advances in Natural Language Processing (RANLP), edited by
- [xu_learning_2012] Xu, Jun, Zhu & Bellmore, Learning from bullying traces in social media, 656-666, in in: Proceedings of the 2012 conference of the North American chapter of the association for computational linguistics: Human language technologies, edited by Association for Computational Linguistics
- [dadvar_cyberbullying_2012] Dadvar & De Jong, Cyberbullying detection: a step toward a safer internet yard, 121-126, in in: Proceedings of the 21st International Conference on World Wide Web, edited by ACM
- [dinakar_modeling_2011] Dinakar, Reichart & Lieberman, Modeling the detection of Textual Cyberbullying., The Social Mobile Web, 11(2), . link.
- [chen_detecting_2012] Chen, Zhou, Zhu & Xu, Detecting offensive language in social media to protect adolescent online safety, 71-80, in in: Privacy, Security, Risk and Trust (PASSAT), 2012 International Conference on and 2012 International Confernece on Social Computing (SocialCom), edited by IEEE
- [nahar_effective_2013] Nahar, Li & Pang, An effective approach for cyberbullying detection, Communications in Information Science and Management Engineering, 3(5), 238 . link.
- [ott_finding_2011] Ott, Choi, Cardie & Hancock, Finding deceptive opinion spam by any stretch of the imagination, 309-319, in in: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1, edited by Association for Computational Linguistics
- [sood_automatic_2012] Sood, Churchill & Antin, Automatic identification of personal insults on social news sites, Journal of the Association for Information Science and Technology, 63(2), 270-285 . link.
- [gamback_using_2017] Gambäck & Sikdar, Using Convolutional Neural Networks to Classify Hate-Speech, ACL 2017, , 85 . link.
- [waseem_hateful_2016] Waseem & Hovy, Hateful Symbols or Hateful People? Predictive Features for Hate Speech Detection on Twitter., 88-93, in in: SRW@ HLT-NAACL, edited by
- [samghabadi_detecting_2017] Samghabadi, Maharjan, Sprague, Diaz-Sprague & Solorio, Detecting Nastiness in Social Media, ACL 2017, , 63 . link.
- [davidson_automated_2017] Davidson, Warmsley, Macy & Weber, Automated Hate Speech Detection and the Problem of Offensive Language, arXiv preprint arXiv:1703.04009, , . link.
- [nobata_abusive_2016] Nobata, Tetreault, Thomas, Mehdad & Chang, Abusive language detection in online user content, 145-153, in in: Proceedings of the 25th International Conference on World Wide Web, edited by International World Wide Web Conferences Steering Committee
- [yin_detection_2009] Yin, Xue, Hong, Davison, Kontostathis & Edwards, Detection of harassment on web 2.0, Proceedings of the Content Analysis in the WEB, 2, 1-7 . link.
- [papegnies_impact_2017] Papegnies, Labatut, Dufour & Linares, Impact Of Content Features For Automatic Online Abuse Detection, arXiv preprint arXiv:1704.03289, , . link.
- [wulczyn_ex_2017] Wulczyn, Thain & Dixon, Ex machina: Personal attacks seen at scale, 1391-1399, in in: Proceedings of the 26th International Conference on World Wide Web, edited by International World Wide Web Conferences Steering Committee
- [spertus_smokey:_1997] Spertus, Smokey: Automatic recognition of hostile messages, 1058-1065, in in: AAAI/IAAI, edited by
- [ross_measuring_2017] Ross, Rist, Carbonell, Cabrera, Kurowsky & Wojatzki, Measuring the reliability of hate speech annotations: The case of the european refugee crisis, arXiv preprint arXiv:1701.08118, , . link.
- [xiang_detecting_2012] Xiang, Fan, Wang, Hong & Rose, Detecting offensive tweets via topical feature discovery over a large scale twitter corpus, 1980-1984, in in: Proceedings of the 21st ACM international conference on Information and knowledge management, edited by ACM
- [tran_cross-language_2015] Tran & Christen, Cross-language learning from bots and users to detect vandalism on wikipedia, IEEE Transactions on Knowledge and Data Engineering, 27(3), 673-685 . link.
- [ahmad_mining_2009] Ahmad, Keegan, Srivastava, Williams & Contractor, Mining for gold farmers: Automatic detection of deviant players in mmogs, 340-345, in in: Computational Science and Engineering, 2009. CSE'09. International Conference on, edited by IEEE
- [nakamura_dont_2009] Nakamura, Don't hate the player, hate the game: The racialization of labor in World of Warcraft, Critical Studies in Media Communication, 26(2), 128-144 . link.
- [cheng_antisocial_2015] Cheng, Danescu-Niculescu-Mizil & Leskovec, Antisocial Behavior in Online Discussion Communities., 61-70, in in: ICWSM, edited by
- [pavlopoulos_deep_2017] Pavlopoulos, Malakasiotis & Androutsopoulos, Deep Learning for User Comment Moderation, arXiv preprint arXiv:1705.09993, , . link.
- [serra_class-based_2017] Serra, Leontiadis, Spathis, Stringhini, Blackburn & Vakali, Class-based Prediction Errors to Detect Hate Speech with Out-of-vocabulary Words, ACL 2017, , 36 . link.
- [park_one-step_2017] Park & Fung, One-step and Two-step Classification for Abusive Language Detection on Twitter, arXiv preprint arXiv:1706.01206, , . link.
- [kennedy_iii_hack_2017] Kennedy III, McCollough, Dixon, Bastidas, Ryan, Loo & Sahay, Hack Harassment: Technology Solutions to Combat Online Harassment, ACL 2017, , 73 . link.
- [waseem_understanding_2017] Waseem, Davidson, Warmsley & Weber, Understanding Abuse: A Typology of Abusive Language Detection Subtasks, arXiv preprint arXiv:1705.09899, , . link.
- [palmer_illegal_2017] Palmer, Robinson & Phillips, Illegal is not a Noun: Linguistic Form for Detection of Pejorative Nominalizations, ACL 2017, , 91 . link.
- [kwok_locate_2013] Kwok & Wang, Locate the Hate: Detecting Tweets against Blacks., in in: AAAI, edited by
- [djuric_hate_2015] Djuric, Zhou, Morris, Grbovic, Radosavljevic & Bhamidipati, Hate speech detection with comment embeddings, 29-30, in in: Proceedings of the 24th International Conference on World Wide Web, edited by ACM
- [silva_analyzing_2016] Silva, Mondal, Correa, Benevenuto & Weber, Analyzing the Targets of Hate in Online Social Media., 687-690, in in: ICWSM, edited by
- [clarke_dimensions_2017] Clarke & Grieve, Dimensions of Abusive Language on Twitter, ACL 2017, , 1 . link.
- [liu_survey_2012] Liu & Zhang, A Survey of Opinion Mining and Sentiment Analysis, SpringerLink, , 415-463 . link. doi.
- [jindal_opinion_2008] Jindal & Liu, Opinion spam and analysis, 219-230, in in: Proceedings of the 2008 International Conference on Web Search and Data Mining, edited by ACM
- [jindal_review_2007] Jindal & Liu, Review spam detection, 1189-1190, in in: Proceedings of the 16th international conference on World Wide Web, edited by ACM
- [mukherjee_detecting_2011] Mukherjee, Liu, Wang, Glance & Jindal, Detecting group review spam, 93-94, in in: Proceedings of the 20th international conference companion on World wide web, edited by ACM
- [jindal_analyzing_2007] Jindal & Liu, Analyzing and detecting review spam, 547-552, in in: Data Mining, 2007. ICDM 2007. Seventh IEEE International Conference on, edited by IEEE
- [jindal_finding_2010] Jindal, Liu & Lim, Finding unusual review patterns using unexpected rules, 1549-1552, in in: Proceedings of the 19th ACM international conference on Information and knowledge management, edited by ACM
- [lim_detecting_2010] Lim, Nguyen, Jindal, Liu & Lauw, Detecting product review spammers using rating behaviors, 939-948, in in: Proceedings of the 19th ACM international conference on Information and knowledge management, edited by ACM
- [wu_distortion_2010] Wu, Greene, Smyth & Cunningham, Distortion as a validation criterion in the identification of suspicious reviews, 10-13, in in: Proceedings of the First Workshop on Social Media Analytics, edited by ACM
- [yoo_comparison_2009] Yoo & Gretzel, Comparison of deceptive and truthful travel reviews, Information and communication technologies in tourism 2009, , 37-47 . link.
- [anderson_nasty_2014] Anderson, Brossard, Scheufele, Xenos & Ladwig, The “nasty effect:” Online incivility and risk perceptions of emerging technologies, Journal of Computer-Mediated Communication, 19(3), 373-387 . link.
- [diakopoulos_newsworthiness_2014] Diakopoulos & Zubiaga, Newsworthiness and Network Gatekeeping on Twitter: The Role of Social Deviance., in in: ICWSM, edited by
- [gawda_emotional_2013] Gawda, The Emotional Lexicon of Individuals Diagnosed with Antisocial Personality Disorder, Journal of Psycholinguistic Research, 42(6), 571-580 . link. doi.
- [gawda_syntax_2010] Gawda, Syntax of Emotional Narratives of Persons Diagnosed with Antisocial Personality, Journal of Psycholinguistic Research, 39(4), 273-283 . link. doi.
- [rieber_language_1994] Rieber & Vetter, The language of the psychopath, Journal of Psycholinguistic Research, 23(1), 1-28 . link. doi.
- [bovasso_graph_1993] Bovasso, Szalay, Biase & Stanford, A graph theory model of the semantic structure of attitudes, Journal of Psycholinguistic Research, 22(4), 411-425 . link. doi.
- [kahn_measuring_2007] Kahn, Tobin, Massey & Anderson, Measuring Emotional Expression with the Linguistic Inquiry and Word Count, The American Journal of Psychology, 120(2), 263-286 . link. doi.
- [argaman_linguistic_2010] Argaman, Linguistic Markers and Emotional Intensity, Journal of Psycholinguistic Research, 39(2), 89-99 . link. doi.
- [kapoor_swears_2016] Kapoor, Swears in Context: The Difference Between Casual and Abusive Swearing, Journal of Psycholinguistic Research, 45(2), 259-274 . link. doi.
- [stephens_does_2017] Stephens & Zile, Does Emotional Arousal Influence Swearing Fluency?, Journal of Psycholinguistic Research, 46(4), 983-995 . link. doi.
- [toma_what_2012] Toma & Hancock, What lies beneath: The linguistic traces of deception in online dating profiles, Journal of Communication, 62(1), 78-97 . link.
- [martellozzo_cybercrime_2017] Martellozzo & Jane, Cybercrime and its victims, Routledge .
- [jane_misogyny_2016] Jane, Misogyny Online: A Short (and Brutish) History, SAGE .
- [mantilla_gendertrolling:_2015] Mantilla, Gendertrolling: How Misogyny Went Viral: How Misogyny Went Viral, ABC-CLIO .
- [duggan_online_2014] Duggan, Online Harassment, , , . link.
8 Meeting notes
8.1 Notes from meeting
How does anti-bullying work in real life? How does online bullying differ from real-world bullying?
- Does bullying happen IRL when no one else is around, when they're not being watched?
- Clear definitions of harassment and bullying are important here.
The training corpus and its limitations are important. Is there statistical literature on evaluating bullying?
- How could we quantify the adverse effects of bullying?
How would intervention work? "Publications on the Study of Bullying"
- http://research.cs.wisc.edu/bullying/
- Using social media data to distinguish bullying from teasing.
What opportunities for collaboration are there? Aggression, personal attacks as irrelevance. What power differentials are there between high-profile (lots of followers) figures and low-profile figures? What applications of RST might there be?
8.2 Notes from meeting
Google account suspension of School of Prof. Studies stats professor, tweeting about Clinton and the 2016 election
- ML algorithm probably made a mistake in categorizing this as abusive
- A Handful of Tech Companies Decide Who Has Free Speech Online. That's Not Good. | Inc.com
8.2.1 Colin's week 1 summary: Summaries Week 1 - Google Docs
8.3 Notes from meeting
Colin: lack of theorizing re: cyberbullying; techniques of counterspeech; communities of abuse / trust propagation. Social network studies have been done, and formality studies, but not yet formality + social network. ! Do more reading in psycholinguistics.
- deixis