Developing a Spoken Tutorial Dialogue System


Word Sense and Subjectivity
Jan Wiebe, University of Pittsburgh
Rada Mihalcea, University of North Texas

Introduction
Growing interest in the automatic extraction of opinions, emotions, and sentiments in text (subjectivity)

Subjectivity Analysis: Applications
Opinion-oriented question answering: How do the Chinese regard the human rights record of the United States?
Product review mining: What features of the ThinkPad T43 do customers like and which do they dislike?
Review classification: Is a review positive or negative toward the movie?
Tracking emotions toward topics over time: Is anger ratcheting up or cooling down toward an issue or event?
Etc.

Introduction
Continuing interest in word sense
Sense annotated resources being developed for many languages (www.globalwordnet.org)
Active participation in evaluations such as SENSEVAL

Word Sense and Subjectivity
Though both are concerned with text meaning, they have mainly been investigated independently

Subjectivity Labels on Senses
S: Alarm, dismay, consternation (fear resulting from the awareness of danger)
O: Alarm, warning device, alarm system (a device that signals the occurrence of some undesirable event)

Subjectivity Labels on Senses
S: Interest, involvement -- (a sense of concern with and curiosity about someone or something; "an interest in music")
O: Interest -- (a fixed charge for borrowing money; usually a percentage of the amount borrowed; "how much interest do you pay on your mortgage?")

WSD using Subjectivity Tagging
Sense 4: a sense of concern with and curiosity about someone or something (S)
Sense 1: a fixed charge for borrowing money (O)
He spins a riveting plot which grabs and holds the reader's interest.
  Subjectivity Classifier: S
  WSD System: Sense 4 (Sense 1?)
The notes do not pay interest.
  Subjectivity Classifier: O
  WSD System: Sense 1 (Sense 4?)

Subjectivity Tagging using WSD
Sense 4: a sense of concern with and curiosity about someone or something (S)
Sense 1: a fixed charge for borrowing money (O)
He spins a riveting plot which grabs and holds the reader's interest.
  WSD System: Sense 4 (S)
  Subjectivity Classifier: S (O?)
The notes do not pay interest.
  WSD System: Sense 1 (O)
  Subjectivity Classifier: O (S?)

Goals
Explore interactions between word sense and subjectivity
  Can subjectivity labels be assigned to word senses?
    Manually
    Automatically
  Can subjectivity analysis improve word sense disambiguation?
  Can word sense disambiguation improve subjectivity analysis?
    Future work

Outline
Motivation and Goals
Assigning Subjectivity Labels to Word Senses
  Manually
  Automatically
Word Sense Disambiguation using Automatic Subjectivity Analysis
Conclusions

Prior Work on Subjectivity Tagging
Identifying words and phrases associated with subjectivity
  Think ~ private state; Beautiful ~ positive sentiment
  Hatzivassiloglou & McKeown 1997; Wiebe 2000; Kamps & Marx 2002; Turney 2002; Esuli & Sebastiani 2005; Etc.
Subjectivity classification of sentences, clauses, phrases, or word instances in context
  subjective/objective; positive/negative/neutral
  Riloff & Wiebe 2003; Yu & Hatzivassiloglou 2003; Dave et al 2003; Hu & Liu 2004; Kim & Hovy 2004; Etc.
Here: subjectivity labels are applied to word senses

Annotation Scheme
Assigning subjectivity labels to WordNet senses
  S: subjective
  O: objective
  B: both
Annotators are given the synset and its hypernym
  S: Alarm, dismay, consternation (fear resulting from the awareness of danger)
     Fear, fearfulness, fright (an emotion experienced in anticipation of some specific pain or danger (usually accompanied by a desire to flee or fight))

Subjective Sense Definition
When the sense is used in a text or conversation, we expect it to express subjectivity, and we expect the phrase/sentence containing it to be subjective.

Objective Senses: Observation
We don't necessarily expect phrases/sentences containing objective senses to be objective
  Would you actually be stupid enough to pay that rate of interest?
  Will someone shut that darn alarm off?
Subjective, but not due to interest or alarm

Objective Sense Definition
When the sense is used in a text or conversation, we don't expect it to express subjectivity and, if the phrase/sentence containing it is subjective, the subjectivity is due to something else.

Senses that are Both
Covers both subjective and objective usages
Example: absorb, suck, imbibe, soak up, sop up, suck up, draw, take in, take up (take in, also metaphorically; "The sponge absorbs water well"; "She drew strength from the Minister's words")

Annotated Data
64 words; 354 senses

Balanced subset [32 words; 138 senses]; 2 judges
The ambiguous nouns of the SENSEVAL-3 English Lexical Task [20 words; 117 senses]; 2 judges [Mihalcea, Chklovski & Kilgarriff, 2004]
Others [12 words; 99 senses]; 1 judge

Annotated Data: Agreement Study
64 words; 354 senses
Balanced subset [32 words; 138 senses]; 2 judges
  16 words have both S and O senses
  16 words do not (8 only S and 8 only O)
  All subsets balanced between nouns and verbs
  Uncertain tags also permitted

Inter-Annotator Agreement Results
Overall: Kappa=0.74; Percent Agreement=85.5%
Without the 12.3% of cases where a judge is U: Kappa=0.90; Percent Agreement=95.0%
16 words with S and O senses: Kappa=0.75
16 words with only S or O: Kappa=0.73
Comparable difficulty
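For readers unfamiliar with the two agreement figures reported above, the following is a minimal sketch of how percent agreement and Cohen's kappa are computed from two judges' labels. The label lists are invented for illustration and are not the study's annotations; the function names are likewise assumptions.

```python
from collections import Counter


def percent_agreement(labels_a, labels_b):
    """Fraction of items on which the two judges chose the same label."""
    return sum(a == b for a, b in zip(labels_a, labels_b)) / len(labels_a)


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(labels_a)
    observed = percent_agreement(labels_a, labels_b)
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability that both judges pick the same label
    # if each labels independently according to their own label frequencies.
    expected = sum(counts_a[l] * counts_b[l] for l in set(labels_a) | set(labels_b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both judges used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical sense labels (S, O, B) from two judges; illustration only.
judge1 = ["S", "S", "O", "O", "B", "O", "S", "O"]
judge2 = ["S", "O", "O", "O", "B", "O", "S", "S"]

print(percent_agreement(judge1, judge2))
print(cohens_kappa(judge1, judge2))
```

Kappa is lower than raw agreement because some matches would occur by chance, which is why the slides report both figures.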

Inter-Annotator Agreement Results
64 words; 354 senses
The ambiguous nouns of the SENSEVAL-3 English Lexical Task [20 words; 117 senses]
  2 judges
  U tags not permitted
  Even so, Kappa=0.71

Related Work
Unsupervised word-sense ranking algorithm of [McCarthy et al 2004]
  That task: approximate corpus frequencies of word senses
  Our task: predict a word-sense property (subjectivity)
Method for learning subjective adjectives of [Wiebe 2000]
  That task: label words
  Our task: label word senses

Overview
Main idea: assess the subjectivity of a word sense based on information about the subjectivity of a set of distributionally similar words in a corpus annotated with subjective expressions

MPQA Opinion Corpus
10,000 sentences from the world press annotated for subjective expressions [Wiebe et al., 2005]
www.cs.pitt.edu/mpqa

Subjective Expressions
Subjective expressions: opinions, sentiments, speculations, etc. (private states) expressed in language

Examples

His alarm grew.
The leaders roundly condemned the Iranian President's verbal assault on Israel.
He would be quite a catch.
That doctor is a quack.

Preliminaries: subjectivity of word w
Unannotated Corpus (BNC) -> [Lin 1998] -> DSW = {dsw1, ..., dswj}
Annotated Corpus (MPQA), where SE = subjective expression:

\[
\mathrm{subj}(w) = \frac{\#\mathrm{insts}(DSW)\ \text{in SE} \;-\; \#\mathrm{insts}(DSW)\ \text{not in SE}}{\#\mathrm{insts}(DSW)}
\]

subj(w) lies in [-1, 1], from highly objective to highly subjective

Subjectivity of word w: example
DSW = {dsw1, dsw2}
  dsw1inst1: +1
  dsw1inst2: -1
  dsw2inst1: +1

\[
\mathrm{subj}(w) = \frac{+1 - 1 + 1}{3} = \frac{1}{3}
\]

Subjectivity of word sense wi
Annotated Corpus (MPQA); subj(wi) lies in [-1, 1]
Rather than 1, add or subtract sim(wi,dswj)
  dsw1inst1: +sim(wi,dsw1)
  dsw1inst2: -sim(wi,dsw1)
  dsw2inst1: +sim(wi,dsw2)

\[
\mathrm{subj}(w_i) = \frac{+\mathrm{sim}(w_i, dsw_1) - \mathrm{sim}(w_i, dsw_1) + \mathrm{sim}(w_i, dsw_2)}{2 \cdot \mathrm{sim}(w_i, dsw_1) + \mathrm{sim}(w_i, dsw_2)}
\]

Method Step 1
Given word w
Find distributionally similar words [Lin 1998]
DSW = {dswj | j = 1 .. n}
Experiment with top 100 and 160
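The word-level and sense-level scores from the preliminaries can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the function names and the similarity values 0.6 and 0.4 are assumptions, and the annotated instances are represented simply as booleans (True when the instance lies inside a subjective expression).

```python
def subj_word(instances):
    """Word-level score: (#in SE - #not in SE) / #instances, in [-1, 1].

    instances: list of booleans, one per corpus instance of any word in DSW.
    """
    if not instances:
        return None  # no annotated instances, so no score can be assigned
    in_se = sum(instances)
    return (in_se - (len(instances) - in_se)) / len(instances)


def subj_sense(weighted_instances):
    """Sense-level score: like subj_word, but each instance counts sim(wi, dswj).

    weighted_instances: list of (sim, in_se) pairs, one per corpus instance.
    """
    total = sum(sim for sim, _ in weighted_instances)
    if total == 0:
        return None
    signed = sum(sim if in_se else -sim for sim, in_se in weighted_instances)
    return signed / total


# The slide's example: dsw1 has two instances (+1, -1), dsw2 has one (+1).
print(subj_word([True, False, True]))

# Same instances, weighted by hypothetical sim(wi,dsw1)=0.6, sim(wi,dsw2)=0.4.
print(subj_sense([(0.6, True), (0.6, False), (0.4, True)]))
```

With the hypothetical weights, the two dsw1 instances cancel and the score reduces to sim(wi,dsw2) divided by the total similarity mass, matching the form of the sense-level formula.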

Method Step 2
word w = Alarm; DSW1 = Panic; DSW2 = Detector
Sense w1 (fear resulting from the awareness of danger): sim(w1,panic), sim(w1,detector)
Sense w2 (a device that signals the occurrence of some undesirable event): sim(w2,panic), sim(w2,detector)

Method Step 2
Find the similarity between each word sense and each distributionally similar word

\[
\mathrm{sim}(ws_i, dsw_j) = \frac{\mathrm{wnss1}(ws_i, dsw_j)}{\sum_{i' \in \mathrm{senses}(w)} \mathrm{wnss1}(ws_{i'}, dsw_j)}
\]

\[
\mathrm{wnss1}(ws_i, dsw_j) = \max_{k \in \mathrm{senses}(dsw_j)} \mathrm{wnss}(ws_i, dsw_j^k)
\]

wnss can be any concept-based similarity measure between word senses; we use Jiang & Conrath 1997

Method Step 3
Input:
  word sense wi of word w
  DSW = {dswj | j = 1..n}
  sim(wi,dswj)

  MPQA Opinion Corpus
Output: subjectivity score subj(wi)

Method Step 3
totalsim = sum over j of #insts(dswj) * sim(wi,dswj)
subj = 0
for each dswj in DSW:
    for each instance k in insts(dswj):
        if k is in a subjective expression:
            subj += sim(wi,dswj)
        else:
            subj -= sim(wi,dswj)
subj(wi) = subj / totalsim

Method Optional Variation
if k is in a subjective expression:
    subj += sim(wi,dswj)
else:
    subj -= sim(wi,dswj)
[Diagram: senses w1, w2, w3, each shown with dsw1, dsw2, dsw3; labelled "Selected"]

Evaluation
Calculate subj scores for all word senses, and sort them
While 0 is a natural candidate for division between S and O, we perform the evaluation for different thresholds in [-1,+1]
Calculate the precision of the algorithm at different points of recall

Evaluation
Automatic assignment of subjectivity for 272 word senses (no DSW instances for 82 senses)
Baseline: random selection of S labels
  Number of assigned S labels matches number of S labels in the gold standard (recall = 1.0)

Evaluation: precision/recall curves
[Figure: precision (y-axis, 0 to 1) against recall (x-axis, 0 to 1) for three curves: baseline, selected, all. Number of distributionally similar words = 160]
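The threshold-based evaluation described above can be sketched as follows. This is a hedged illustration rather than the authors' evaluation script: the function names, the sense identifiers, and the scores are all invented, and the sketch assumes every gold-standard S sense has a score.

```python
def precision_recall_points(scores, gold_subjective):
    """Sweep the S/O threshold down the ranked list of sense scores.

    scores: dict mapping sense id -> subj score in [-1, 1].
    gold_subjective: set of sense ids labelled S in the gold standard.
    Returns a list of (threshold, precision, recall), one per rank.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    points, true_pos = [], 0
    for rank, sense in enumerate(ranked, start=1):
        if sense in gold_subjective:
            true_pos += 1
        points.append((scores[sense], true_pos / rank, true_pos / len(gold_subjective)))
    return points


def break_even_point(scores, gold_subjective):
    """Precision (= recall) when exactly as many senses are labelled S as in the gold standard."""
    k = len(gold_subjective)
    top_k = sorted(scores, key=scores.get, reverse=True)[:k]
    return sum(sense in gold_subjective for sense in top_k) / k


# Hypothetical scores and gold labels; illustration only.
scores = {"alarm#1": 0.8, "interest#4": 0.5, "alarm#2": 0.1,
          "catch#2": -0.2, "interest#1": -0.6}
gold = {"alarm#1", "interest#4", "catch#2"}

for threshold, precision, recall in precision_recall_points(scores, gold):
    print(f"threshold={threshold:+.1f} precision={precision:.2f} recall={recall:.2f}")
print("break-even:", break_even_point(scores, gold))
```

Precision and recall coincide when the number of senses labelled S equals the number of S senses in the gold standard, which is why the break-even point can be read off at that rank.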

Evaluation
Break-even point: point where precision and recall are equal

Algorithm             Number of DSW   Break-even point
similarity-all        100             0.41
similarity-selected   100             0.50
similarity-all        160             0.43
similarity-selected   160             0.50
baseline              -               0.27

Overview
Augment an existing WSD system with a feature reflecting the subjectivity of the context of the ambiguous word
Compare the performance of original and subjectivity-aware WSD systems
The ambiguous nouns of the SENSEVAL-3 English Lexical Task
SENSEVAL-3 data

Original WSD System
Integrates local and topical features:
  Local: context of three words to the left and right, their part-of-speech
  Topical: top five words occurring at least three times in the context of a word sense
  [Ng & Lee, 1996], [Mihalcea, 2002]
Naive Bayes classifier [Lee & Ng, 2003]

Automatic Subjectivity Classifier
Rule-based automatic sentence classifier from [Wiebe & Riloff 2005]
Included in OpinionFinder; available at: www.cs.pitt.edu/mpqa/
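To make the idea of adding a subjectivity feature concrete, here is a minimal sketch of a feature extractor combining the local and topical features listed above with a sentence-level subjectivity tag. It is an assumption-laden illustration: the function name, feature names, part-of-speech tags, and topical word list are hypothetical, and the slides do not specify how the original system encodes its features.

```python
def wsd_features(tokens, pos_tags, target_index, topical_words,
                 subjectivity=None, window=3):
    """Build a feature dict for one occurrence of an ambiguous word.

    tokens, pos_tags: the sentence and its part-of-speech tags (same length).
    target_index: position of the ambiguous word in tokens.
    topical_words: words assumed to be associated with some sense of the target.
    subjectivity: optional sentence tag from a subjectivity classifier, e.g. "S" or "O".
    """
    features = {}
    # Local features: words and POS tags up to `window` positions left and right.
    for offset in range(-window, window + 1):
        position = target_index + offset
        if offset != 0 and 0 <= position < len(tokens):
            features[f"word[{offset:+d}]"] = tokens[position].lower()
            features[f"pos[{offset:+d}]"] = pos_tags[position]
    # Topical features: presence of each topical word anywhere in the sentence.
    context = {token.lower() for token in tokens}
    for word in topical_words:
        features[f"topic={word}"] = word in context
    # The extra feature of the subjectivity-aware system.
    if subjectivity is not None:
        features["subjectivity"] = subjectivity
    return features


tokens = ["The", "notes", "do", "not", "pay", "interest", "."]
pos_tags = ["DT", "NNS", "VBP", "RB", "VB", "NN", "."]
basic = wsd_features(tokens, pos_tags, 5, ["loan", "notes"])
aware = wsd_features(tokens, pos_tags, 5, ["loan", "notes"], subjectivity="O")
print(basic)
print(aware)
```

The two feature dicts differ only in the `subjectivity` entry, which mirrors the comparison between the original and the subjectivity-aware systems; either dict could then be passed to a classifier such as Naive Bayes.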

Subjectivity Tagging for WSD
Used to tag sentences of the SENSEVAL-3 data that contain target nouns
[Diagram: Sentence i, Sentence j, and Sentence k, containing the target nouns "interest", "interest", and "atmosphere", are passed to the Subjectivity Classifier, which tags each sentence S or O]

WSD using Subjectivity Tagging
[Diagram: Sentence i (target noun "interest") goes to the Original WSD System, which chooses between Sense 4 and Sense 1. The same sentence goes to the Subjectivity Classifier, whose tag (S, O, or B) is passed to the Subjectivity Aware WSD System, which chooses between Sense 4 and Sense 1]
Sense 1: a fixed charge for borrowing money
Sense 4: a sense of concern with and curiosity about someone or something

Words with S and O Senses

                               Classifier
Word         Senses  Baseline  basic   + subj   basic vs. + subj
argument     5       49.4%     51.4%   54.1%    <
atmosphere   6       65.4%     65.4%   66.7%    <
difference   5       40.4%     54.4%   57.0%    <
difficulty   4       17.4%     47.8%   52.2%    <
image        7       36.5%     41.2%   43.2%    <
interest     7       41.9%     67.7%   68.8%    <
judgment     7       28.1%     40.6%   43.8%    <
plan         3       81.0%     81.0%   81.0%    =
sort         4       65.6%     66.7%   67.7%    <
source       9       40.6%     40.6%   40.6%    =
Average              46.6%     55.6%   57.5%

4.3% error reduction; significant (p < 0.05 paired t-test)

Words with Only O Senses

                                 Classifier
Word           Senses  Baseline  basic   + subj   basic vs. + subj
arm            6       82.0%     85.0%   84.2%    >
audience       4       67.0%     74.0%   74.0%    =
bank           10      62.6%     62.6%   62.6%    =
degree         7       60.9%     71.1%   71.1%    =
disc           4       38.0%     65.6%   66.4%    <
organization   7       64.3%     64.3%   64.3%    =
paper          7       25.6%     49.6%   48.0%    >
party          5       62.1%     62.9%   62.9%    =
performance    5       26.4%     34.5%   34.5%    =
shelter        5       44.9%     65.3%   65.3%    =
Average                53.3%     63.5%   63.3%

Conclusions

Can subjectivity labels be assigned to word senses?
  Manually
    Good agreement; Kappa=0.74
    Very good when uncertain cases removed; Kappa=0.90
  Automatically
    Method substantially outperforms baseline
    Showed feasibility of assigning subjectivity labels to the fine-grained level of word senses

Conclusions
Can subjectivity analysis improve word sense disambiguation?
  Improves performance, but mainly for words with both S and O senses (4.3% error reduction; significant (p < 0.05))
  Performance largely remains the same or degrades for words that don't
  Assign subjectivity labels to WordNet; WSD system should consult WordNet tags to decide when to pay attention to the contextual subjectivity feature.

Thank You

Refining WordNet
Semantic Richness
Find inconsistencies and gaps
  Verb assault: attack, round, assail, lash out, snipe, assault (attack in speech or writing; "The editors of the left-leaning paper attacked the new House Speaker")
  But no sense for the noun as in "His verbal assault was vicious"

Observation
MPQA corpus
  Corpus somewhat noisy for our task
  MPQA annotates subjective expressions
  Objective senses can appear in subjective expressions
Hypothesis: subjective senses tend to appear more often in subjective expressions than objective senses do, and so the appearance of words in subjective expressions is evidence of sense subjectivity

WSD using Subjectivity Tagging
Hypothesis: instances of subjective senses are more likely to be in subjective sentences, so sentence subjectivity is an informative feature for WSD of words with both subjective and objective senses

Subjective Sense Examples
He was boiling with anger
  Seethe, boil (be in an agitated emotional state; "The customer was seething with anger")
  Be (have the quality of being; (copula, used with an adjective or a predicate noun); "John is rich"; "This is not a good answer")

Subjective Sense Examples
What's the catch?
  Catch (a hidden drawback; "it sounds good but what's the catch?")
  Drawback (the quality of being a hindrance; "he pointed out all the drawbacks to my plan")
That doctor is a quack.
  Quack (an untrained person who pretends to be a physician and who dispenses medical advice)
  Doctor, doc, physician, MD, Dr., medico

Objective Sense Examples
The alarm went off
  Alarm, warning device, alarm system (a device that signals the occurrence of some undesirable event)
  Device (an instrumentality invented for a particular purpose; "the device is small enough to wear on your wrist"; "a device intended to conserve water")
The water boiled
  Boil (come to the boiling point and change from a liquid to vapor; "Water boils at 100 degrees Celsius")
  Change state, turn (undergo a transformation or a change of position or action)

Objective Sense Examples
He sold his catch at the market
  Catch, haul (the quantity that was caught; "the catch was only 10 fish")
  Indefinite quantity (an estimated quantity)
The duck's quack was loud and brief
  Quack (the harsh sound of a duck)
  Sound (the sudden occurrence of an audible event)
