Notes for sentiment, etc - Stanford University

Computational Extraction of Social and Interactional Meaning from Speech
Dan Jurafsky and Mari Ostendorf
Lecture 5: Agreement, Citation, Propositional Attitude
Mari Ostendorf

Agreement, Citation, Propositional Attitude
  Agreement vs. disagreement with propositions (and people)
  How to make friends & influence people
    Tool for affiliation, indicator of influence
    Tool for distancing, indicator of factions or rifts in groups
  Important component of group problem solving

Speech Examples Revisited
  A: This's probably what the LDC uses. I mean they do a lot of transcription at the LDC.
  B: OK.
  A: I could ask my contacts at the LDC what it is they actually use.
  B: Oh! Good idea, great idea.

  A: After all these things, he raises hundreds of millions of dollars. I mean uh the fella
  B: but he never stops talking about it.
  A: but ok
  B: Aren't you supposed to y- I mean
  A: well that's a little- the Lord says
  B: Does charity mean something if you're constantly using it as a cudgel to beat your enemies over the- I'm better than you. I give money to charity.
  A: Well look, now I

Subgroups Example: Wikipedia Talk Page
  By including the "Haditha Massacre" in the Human Rights Abuse section, we are effectively convicting the Marines that are currently on trial. I think we need to wait until the trial is over. - UnregisteredUser1

  Disagree. All I see is the listing "Haditha killings (Under investigation)." Is the word Massacre used? If not, I believe it should be because this word fits every version of the story presented in the public, including Time, the US Marines, and the Iraqi Government. - RegisteredUser1

  I agree with RegisteredUser1, this is about (current) history, not law. Just because something hasn't been decided by a court doesn't mean it didn't happen. It should be enough in the article to just mention that the marines charged/suspected of the massacre have not yet been convicted. - RegisteredUser2

  I disagree, you cannot call it a human rights violation if its not stated what happened there. Also your statement "have not yet been convicted" is kind of the thing we are attempting to avoid. Without guilt or a better understanding of the situation I think its premature to put it in the human rights violation section. - RegisteredUser3

  Actually, as long as NPOV, WP:Verifiability are maintained you can call it a human rights violation even if it is untrue. As Wikipedia says "As counterintuitive as it may seem, the threshold for inclusion in Wikipedia is verifiability, not truth." Like it or not, as long as there are reputable sources calling it a massacre and/or a human rights violation then it can be included in the article. - RegisteredUser4

  Calling it a human rights violation in itself is POV. I also do not think anyone would appreciate you attempting to manipulate wiki policy for the sake of adding POV into an article. - RegisteredUser3

Influencing Example
  There is a guideline that we shouldn't semi-protect articles linked from front page, so as to allow new editors a chance to edit articles they are most likely to read. But in this case all we are doing is enabling a swarm of socks. Semi-protection is definitely needed in this instance, with an apology should a new, well-intentioned editor actually show up amidst the swarm and be prevented from editing. Semi-protect this sucker, or we'll never determine the appropriate course of action for this article. - RegUser2

  Even though semi-protection is defidentally good for what is nominally "my" side it's against policy and not appropriate. Please take it off. - RegUser3

  Is is absolutely not against policy. Wikipedia:Protection policy is very clear: For this article at this time, it's necessary. That's in perfect compliance with policy. - RegUser2

  Removing the image without discussion is aggressively bad editing (which I am often guilty of). It's not vandalism. sprotect is only for vandalism. - RegUser3

  Repeated violations of 3RR and using sockpuppets, together with admitting that the purpose of removing the image is to curry favour with one's god and not to improve Wikipedia, doesn't so much cross the line from bad editing to vandalism as pole vault it. - RegUser4

  Ok, my WP:AGF is falling. I still think sprotect is agressive, but not as badly as I did before. - RegUser3

  Influenced participant: alignment change

Online Political Discussion Forum
  Q: Gavin Newsom- I expected more from him when I supported him in the 2003 election. He showed himself as a family-man/Catholic, but he ended up being the exact oppisate, supporting abortion, and giving homosexuals marriage licenses. I love San Francisco, but I hate the people. Sometimes, the people make me want to move to Sacramento or DC to fix things up.
  R: And what is wrong with giving homosexuals the right to settle down with the person they love? What is it to you if a few limp-wrists get married in San Francisco? Homosexuals are people, too, who take out their garbage, pay their taxes, go to work, take care of their dogs, and what they do in their bedroom is none of your business.

Citations (from Teufel et al., 2006)
  Following Pereira et al. 93, we measure word similarity by the relative entropy or Kullback-Leibler (KL) distance, between the corresponding conditional distributions.
  His [Hindle's] notion of similarity seems to agree with our intuitions in many cases, but it is not clear how it can be used directly to construct word classes and corresponding models of association.

Overview
  Common threads
  Examples:
    Agreements & disagreements in meetings
    Agreements & disagreements in online discussions
    Citation function
  More common threads
  (Plus examples from unpublished UW studies on Wikipedia discussions.)

Common Threads
  Sentiment detection (sort of)
    Discussions: agreement/disagreement/neutral
    Citations: positive/negative/neutral (opt. contrast)
    Most studies detect person/paper as target, not the proposition per se
  Challenges
    Cultural bias & infrequent negatives
    Bag of words is not enough
    Identifying person/paper target of agreement (context can extend beyond the sentiment sentence)
  Computational modeling

Challenge: Cultural Bias
  English meetings: many more agreements than disagreements
  Mandarin wiki discussions: fewer explicit disagreements than in English
  Citations: several studies find that negative citations are rare (presumably because they are politically dangerous)
  People use positive words to soften the blow: "right but...", "yeah" with negative intonation

Challenge: Polarity Words in BOW
  Need to account for negation
    "agree" vs. "don't agree", "absolutely" vs. "absolutely not"
    BUT fewer than half the positive words in negative turns are lexically negated
  Some part-of-speech issues, e.g. "well"
  People include positive words to soften the blow
    dissenting turns have more positive words than negative
    "right" occurs 75 times in dissenting turns, 162 times in neutral turns & only 33 times in supporting turns
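
The negation point above can be illustrated with a minimal sketch of negation-aware polarity counting. The word lists, the one-token window, and the function name polarity_counts are illustrative assumptions, not the lexicons or rules used in the studies discussed here.

```python
import re

# Tiny illustrative lexicons (assumptions); a real system would use a
# curated polarity lexicon.
POSITIVE = {"agree", "absolutely", "right", "yeah", "good", "great"}
NEGATIVE = {"disagree", "wrong", "bad"}
NEGATORS = {"not", "don't", "dont", "never", "can't", "cannot"}


def polarity_counts(turn, window=1):
    """Count positive/negative cue words in a turn.

    A cue word's polarity is flipped when a negator occurs within `window`
    tokens before it or immediately after it (as in "absolutely not").
    """
    tokens = re.findall(r"[a-z']+", turn.lower())
    counts = {"pos": 0, "neg": 0}
    for i, tok in enumerate(tokens):
        if tok in POSITIVE:
            label = "pos"
        elif tok in NEGATIVE:
            label = "neg"
        else:
            continue
        before = tokens[max(0, i - window):i]
        after = tokens[i + 1:i + 2]
        if any(t in NEGATORS for t in before + after):
            label = "neg" if label == "pos" else "pos"
        counts[label] += 1
    return counts
```

On "I don't agree" the positive cue is flipped, but on "right but you can't say that" the cue "right" still counts as positive even though the turn dissents. This is the case the slide warns about: lexical negation handling alone does not catch softened disagreement.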

Polarity Word Trickiness (cont.)
  Positive negatives
    yeah larry i i want to correct something randi said of course right but but you you can't say that punching him in the back of the head is justified
  Negative positives
    Steph- vent away that sucks
    no you stick with what you're doing

Challenge: Identifying the Target
  Baseline: The target is the most recent speaker:
    67% accurate for Wiki discussions
    80% accurate for meetings
  Adding names doesn't help much (70% accurate for Wiki discussions)
  Target can be more than one person
  In political discussion forum (Abbott et al. 11), 82% of posts with quotes have quotes that can be linked to previous post
  Citation information often not in the same sentence as the citation (Teufel et al. 06).
  Chat: complication of asynchrony
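
The most-recent-speaker baseline mentioned above could be sketched as follows. This is a hypothetical illustration: the slides do not say whether the baseline skips the speaker's own earlier turns, so skipping them is an assumption here, as are the function names and the toy data.

```python
def most_recent_speaker_baseline(turns):
    """Predict each turn's target as the most recent *different* speaker.

    `turns` is a list of speaker ids in temporal order. Returns one
    prediction per turn, or None when no earlier other speaker exists.
    """
    predictions = []
    for i, speaker in enumerate(turns):
        target = None
        for prev in reversed(turns[:i]):
            if prev != speaker:  # assumption: a speaker does not target themselves
                target = prev
                break
        predictions.append(target)
    return predictions


def accuracy(predicted, gold):
    """Fraction of turns that have a gold target and are predicted correctly."""
    scored = [(p, g) for p, g in zip(predicted, gold) if g is not None]
    return sum(p == g for p, g in scored) / len(scored)


# Toy conversation (made up): the fourth turn, by C, actually responds to B.
turns = ["A", "B", "A", "C", "B"]
gold = [None, "A", "B", "B", "C"]
predicted = most_recent_speaker_baseline(turns)
```

Here the baseline gets three of the four scored turns right. It misses the turn whose target is not the immediately preceding speaker, which is the kind of case asynchronous chat makes more common.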

Speaker column: PubCoord Acct Secty Secty PubCoord PubCoord ProjMgr Secty PubCoord Acct PubCoord PubCoord Acct Secty Acct ProjMgr Acct Secty Acct PubCoord ProjMgr Acct Secty Acct PubCoord
Message column:
  Are we agreed on about 60 for soda?
  yeah, only ourselves are set apart, I think
  They can't take a bottle.
  Okay, I agree on 60 for soda
  Vote agreed
  Yeah, agree
  How much does ice cost?
  2.50 per pack
  how about 50, because project manager won't drink that much soda probably
  What is he a camel?
  and some folks won't drink any?
  lol
  no, some people dont like flavor, carbonation
  Shut up! Soda can be harsh
  or, OMG calories
  please stay on topic
  yeah, i dont like the carbonation
  Alright, I've identified two of you
  I was just going to say that...
  me too!
  so was that $50 for ice?
  actually, I guess I know who everyone is then
  What? ?

Speaker column: Acct Secty PubCoord Secty PubCoord ProjMgr Acct
Message column:
  no, 50 for pop
  oh
  No, 50 for soda is fine I guess
  please vote between 50 or 60
  I think maybe 10 for ice
  Yeah :/
  and someone already volunteered their cooler?

  PubCoord: Yessir
  Secty: *please vote between 50 or 60 for soda
  Secty: I vote 60
  PubCoord: 60
  ProjMgr: 50
  Acct: i vote 50
  ProjMgr: TIE!
  PubCoord: then?
  Secty: 50 it is
  Acct: g d it

  Acct: yeah, 55
  Secty: okay, 55
  Secty: so how much is left, accountant?

Computational Modeling - Review
  Standard text classification problem
    Extract feature vector -> apply model -> score classes
    Choose class with best score
  Popular models
    Naive Bayes
    Decision trees/forests vs. boostexter/icsiboost
    Maximum entropy
    New since Lec 5
      SVMs
      K-nearest neighbor (lazy learning or memory-based)
  Feature selection or regularization
  Evaluation: Classification accuracy or Macro F (mean of F measures)

Feature Extraction Noise Issue
  Both speech and text have noise challenges
    Speech: speech recognition errors (especially when there is overlapping speech)
    Online discussions: typos and funny spellings
      "defidentally good"
      "the exact oppisate"
  Not a big issue for edited text (e.g. most articles that would have citations)

Challenge: Skewed Priors
  A large percentage of sentences are neutral, and standard training algorithms emphasize the frequent classes
  Some solutions:
    Use development set to tune detection thresholds
    Random sampling using biased priors and bagging (classifier combination)

Detecting (Dis)Agreements in Meetings
  A: I could ask my contacts at the LDC what it is they actually use.
  B: Oh! Good idea, great idea.
  Adjacency pair speaker detection (given B, find A)
    Target detection for agreements & disagreements
    Also includes question/answer, offer/acceptance, etc.
  Classify B as agreement/disagreement/other
    (Backchannels modeled separately, but included in "other" for scoring.)
  Galley et al. 2004

Meeting Data
  ICSI Meeting corpus
    75 1-hour meetings, average of 6.5 participants/meeting
    Hand transcribed, audio automatically time aligned
    Hand labeled for adjacency pairs
  7 meetings pause-segmented into spurts
  Class distribution:
    Agree: 12%
    Disagree: 7%
    Other: 81%

Adjacency Pair Speaker Ranking
  Features (B given, A is candidate target)
    Structural: +/- overlap, # of speakers/spurts between A & B, etc.
    Duration: duration of overlap, duration of A, time between A & B, overlap with others, speaking rate
    Lexical: word counts, counts of shared words, cue word indicators, name indicator
    Dialog acts (oracle)
  Feature selection: incremental
  Classifier: Maximum entropy

Adjacency Pair Results
  Only small gain from oracle DA information: 91.3%

Agreement/Disagreement Classifier
  Features
    Structural: previous/next spurt same/diff
    Duration: spurt, silence & overlap duration, speech rate
    Lexical: similar to adjacency pairs, plus polarity word counts
    Label dependency: contextual tags (a speaker is likely to disagree with someone who disagrees with them)
  Classifier
    Conditional Markov model (Max Entropy Markov Model)
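
One possible reading of the label-dependency features above is sketched below: for a spurt by one speaker aimed at another, look up the most recent tag exchanged in each direction. The feature names and the (speaker, target, tag) history format are assumptions for illustration, not the feature set actually used by Galley et al.

```python
def contextual_tag_features(history, speaker, target):
    """Return the most recent earlier tags exchanged between two speakers.

    `history` is a list of (speaker, target, tag) triples for earlier
    spurts, oldest first, with tags such as "agree", "disagree", "other".
    """
    same_direction = None     # last tag `speaker` gave to `target`
    reverse_direction = None  # last tag `target` gave to `speaker`
    for spk, tgt, tag in history:
        if spk == speaker and tgt == target:
            same_direction = tag
        elif spk == target and tgt == speaker:
            reverse_direction = tag
    return {
        "prev_tag_same_dir": same_direction,
        "prev_tag_reverse_dir": reverse_direction,
    }


# Toy history (made up): B disagreed with A earlier, so a later spurt from
# A to B carries "disagree" as its reverse-direction context.
history = [("A", "B", "agree"), ("B", "A", "disagree"), ("C", "A", "other")]
features = contextual_tag_features(history, "A", "B")
```

In a conditional Markov model the history tags at test time would be the model's own earlier predictions rather than gold labels, which is why such features call for sequence decoding rather than independent per-spurt classification.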

Agreement/Disagreement Results

Detecting (Dis)Agreement in Online Discussions
  Task: label R in a Q-R (quote-response) pair as agreement/disagreement.
  Abbott et al., 2011

ARGUE Data
  110k forum posts (11k discussion threads, 2764 authors) from website
  Forums include: evolution, gun control, abortion, gay marriage, healthcare, death penalty
  Annotations by Mechanical Turkers with [-5,5] scale
    Disagree-agree (Krippendorff's α = 0.62)
    Other annotations had α < 0.5: attack, fact/emotion, sarcasm, nice/nasty

  8k good Q-R pairs annotated
  Sample & use (-1,1) threshold: gives 682 pairs for testing
  Class distribution: resampled to be balanced

(Dis)Agree Classifier Features
  MetaPost: author info, time between posts, # other quotes
  Unigram & Bigram counts, initial unigram/bigram/trigram
  Repeated punctuation (collapsed to ??, !!, ?!)
  LIWC measures
  Parse dependencies, POS-polarity opinion dependencies
  Tf-idf cosine distance to previous post
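
Two of the features listed above, repeated-punctuation collapsing and tf-idf cosine to the previous post, can be sketched in plain Python. The exact normalization and tf-idf weighting in Abbott et al. are not given in these notes, so the regular expression, the smoothed idf formula, and all function names below are assumptions.

```python
import math
import re
from collections import Counter


def collapse_punctuation(text):
    """Map runs of two or more '?'/'!' to one of the tokens ??, !!, ?!."""
    def repl(match):
        run = match.group(0)
        if "?" in run and "!" in run:
            return "?!"
        return run[0] * 2
    return re.sub(r"[?!]{2,}", repl, text)


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def tfidf_cosine(post, previous, corpus):
    """Cosine similarity between tf-idf vectors of two posts.

    `corpus` is a list of posts used only to estimate document frequencies.
    """
    docs = [set(tokenize(d)) for d in corpus]
    n = len(docs)

    def vector(text):
        tf = Counter(tokenize(text))
        # Smoothed idf (an assumption) so unseen words do not divide by zero.
        return {w: c * math.log((1 + n) / (1 + sum(w in d for d in docs)))
                for w, c in tf.items()}

    a, b = vector(post), vector(previous)
    dot = sum(a[w] * b.get(w, 0.0) for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

The slide names cosine distance; with this sketch it would be 1 - tfidf_cosine(post, previous, corpus).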

  Classifier: Naive Bayes & JRip (WEKA toolkit)
  Chi-squared feature selection, plus feature selection implicit in JRip (rule learner)

Sample (Dis)Agree Classifier

(Dis)Agree Classification Results
  JRip beats NB
  JRip Accuracy:
    Local features: 68%
    Other annotations: 81%
  Caveat: optimistic, since neutral cases are removed.

Classification of Citation Function
  Teufel et al., 2006
  Agreement, usage, compatibility (6) Weakness (4) Contrast neutral

Citation Study Data
  26 articles w/ 548 citations
  Kappa = 0.72 for 12 categories
  Class distribution: >67% neutral + neutral contrast, 4% negative, 19% usage

Citation Classifier Features
  Grammar of 1762 cue phrases (e.g. "as far as we are aware") from other work + 892 from this corpus
  185 POS patterns for recognizing agents (self-cites vs. others) w/ 20 manually acquired verb clusters
  Verb tense, voice, modality
  Sentence location in paragraph & section
  Classifier: K-nearest neighbor (WEKA toolkit)

Citation Classification Results
  K=0.75 for humans for these categories

Collected Observations re Features
  Phrase patterns and location-based n-grams are useful
  Structural features are useful
    Location of turn relative to other authors/speakers
    Location of sentence in turn & document
  Broader context (beyond target sentence) is useful

    Sequential patterns of disagreement
    Emotion context
  Simple cosine similarity is not so useful
  Prosodic features not being taken advantage of

More Challenges
  Explicit agreement & disagreement do not capture all the phenomena associated with alignment & distancing
    Implicit (dis)agreement via stating an opposite opinion
      A: The video is still an allegation
      B: The video is hard evidence
    or a rhetorical question
      A: Such a topic is far more broad than the current article but should certainly contain a link back to this one.
      B: How is the [[Iraq invasion controversy]] suggestion more broad?
    Support vs. attack
      Well, you have proven yoruself [sic] to be a man with no brain
      Steph- vent away that sucks
  These phenomena are hard for human annotators to label consistently (exception: citation labels?)
    Different studies may group or distinguish them

Example Wikipedia Talk Page

  The victims were teenagers, not children. Furthermore, the teenagers were throwing rocks and makeshift grenades at the soldiers. Second, the video is still an allegation. We should wait until the investigation is completed before putting it up. - RegisteredUser1

  The video is hard evidence. If this was 1945, you'd be telling us not to include any footage of the Nazi concentration camps until the Germans had concluded that they committed war crimes. As for your suggestions that those children *deserved* what happened because they allegedly throw rocks at soldiers carrying assault rifles, I find that as offensive as suggesting that America deserved the 9/11 attack because of its foreign policies. - AnonymousUser1

  THEY WEREN'T CHILDREN! The article makes NO mention of children whatsoever. So before you all let your emotions run wild over this: a) they weren't children b) they had hand grenades. - RegisteredUser1

  YES THEY WERE CHILDREN! Watch the video. The soldiers are clearly acting in hatred and blood-lust, not self-defense. Defending them is like defending a child molester or serial murderer. The video SHOWS children being assaulted. - AnonymousUser2

  A 14 year old is definitely a child. There's a reason we don't let 14 year-olds drink, vote, drive, "consent" to sex with adults, or sign legal agreements without a guardian. - RegisteredUser2

  At 14 you are definitely a teenager, not a child. 14 year olds can throw a grenade and shoot a rifle, and know the consequences of their actions. Furthermore 18 isn't the age of majority in Iraq so far as I know. In much of the world the drinking and driving ages are 14 and 16. The world is not centered upon our American beliefs, and it's high time that we started accepting that in ALL situations, not just the ones we deem acceptable. I'm absolutely sickened by the brainwashed vehemence and anti-US hatred expressed by so many so called "liberals" on Wikipedia. - RegisteredUser1

  In the English language the word adult is generally not used for people under the age of 18. If you want to use it differently you need to explain it in the article in order not to be misleading. Please calm down and do not personally attack others as "brainwashed" or spreading "hatred". - RegisteredUser4

Summary
  Why look for (dis)agreement, support, etc?
    Dissecting discussions for influence, subgroups, affiliation, successful problem solving, etc
    Understanding citation impact
  These tasks are closely related to sentiment detection, except that the target is often part of the problem
  Different ways of handling agreement vs. support
  The neutral class is huge: don't ignore it
  Computational advice:
    Many better alternatives to Naive Bayes
    Consider features beyond n-grams
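
As a small worked illustration of why the huge neutral class matters for evaluation, the sketch below compares classification accuracy with macro F (the mean of per-class F measures) for a classifier that always predicts the majority class. The toy labels are made up to mirror the meeting class distribution quoted earlier (Agree 12%, Disagree 7%, Other 81%); the function names are assumptions.

```python
def per_class_f1(gold, predicted, label):
    """F measure for one class, treating it as the positive class."""
    tp = sum(g == label and p == label for g, p in zip(gold, predicted))
    fp = sum(g != label and p == label for g, p in zip(gold, predicted))
    fn = sum(g == label and p != label for g, p in zip(gold, predicted))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold, predicted):
    """Unweighted mean of per-class F measures over the gold label set."""
    labels = sorted(set(gold))
    return sum(per_class_f1(gold, predicted, lab) for lab in labels) / len(labels)


def accuracy(gold, predicted):
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


# Toy data (made up) with an 12/7/81 split and a majority-class predictor.
gold = ["agree"] * 12 + ["disagree"] * 7 + ["other"] * 81
always_other = ["other"] * 100

acc = accuracy(gold, always_other)
macro = macro_f1(gold, always_other)
```

The majority-class predictor reaches 81% accuracy while its macro F is only about 0.30, since it never finds a single agreement or disagreement.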
