Title: Supervised Categorization for Habitual versus Episodic Sentences
1. Supervised Categorization for Habitual versus Episodic Sentences
- Thomas Mathew
- tam52_at_georgetown.edu
- Graham Katz
- egk7_at_georgetown.edu
- Department of Linguistics
- Georgetown University
2. Introduction
- Habitual sentences state general facts
- Describe properties of a class
- Bears eat blackberries
- Describe a characteristic of a specific individual
- Angus Young wears school uniforms on stage
- Are stative, although the main verb can be dynamic
- Episodic sentences report on a finite number of specific events
- Mary ate a steak
- Angus Young wore a school uniform twice this week
- Why does the distinction matter?
- Event extraction
- Document summarization
3. Scope
- Determine automatically whether a sentence is habitual or episodic on the basis of sentence-internal information
- John smoked cigarettes when he was young (habitual)
- John smoked a cigarette this morning (episodic)
- Note: lexically stative predicates are excluded
- Italians like wine
- Do not exhibit habitual/specific ambiguity
4. Related Work
- Sangweon Suh (2006)
- Distinguish generic from specific NP reference in context
- Cats like tuna
- A cat ate the tuna
- Eric Siegel (1995), Michael Brent (1990)
- Determine whether a verb is stative or eventive
- He called his father
- He resembles his father
- On the basis of the distribution of verbs with overt features
- Siegel (1995) uses co-occurrence frequencies of 14 features
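To make the co-occurrence idea concrete, here is a minimal Python sketch of per-verb co-occurrence frequencies. It assumes each clause has already been reduced to its main verb and the set of overt features it shows; the feature names are hypothetical placeholders, not Siegel's actual 14 indicators.

```python
from collections import Counter, defaultdict

def cooccurrence_frequencies(clauses, features):
    """Relative frequency with which each verb co-occurs with each feature.

    `clauses` is an iterable of (verb, set_of_features_present) pairs.
    Returns {verb: {feature: rate}}, where
    rate = count(verb with feature) / count(verb).
    """
    verb_counts = Counter()
    pair_counts = defaultdict(Counter)
    for verb, present in clauses:
        verb_counts[verb] += 1
        for feature in present:
            if feature in features:  # ignore features we are not tracking
                pair_counts[verb][feature] += 1
    return {
        verb: {f: pair_counts[verb][f] / n for f in features}
        for verb, n in verb_counts.items()
    }
```

Each verb ends up with a vector of rates, one per feature. A verb that frequently appears in the progressive would be expected to lean eventive, since stative verbs tend to resist the progressive.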
5. Approach
- Supervised classification
- Built a training corpus
- Selected features for machine learning
- Evaluated features
- Applied machine learning algorithms
6. Annotation of Corpus
- Generated a set of 1,816 sentences with 72 verb types by
- Randomly selecting sentences from the Penn Treebank (WSJ and Brown)
- Ignoring sentences with a lexically stative predicate
- Adding all sentences in the Penn Treebank whose main verb was a morphological variant of a verb from the initial set
7. Annotation of Corpus
- Annotated each sentence as habitual or episodic by
- Checking for explicit attribution
- Frequency adverbs (usually, often): habitual
- Quantificational temporals (every night): habitual
- Habitual past (used to): habitual
- Definite temporals (yesterday): episodic
- Testing whether sentence meaning changed when the modifier "usually" was added
- No change in meaning indicated habitual
- Examining discourse context
- Assumed that categories cluster together within a discourse
- Applying intuitive semantic judgment
- Single event or habit
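The explicit-marker step of this procedure is mechanical enough to sketch in code. The minimal Python version below assumes marker lists drawn from the examples above, plus a few obvious additions that are hypothetical; sentences it cannot decide fall through to the manual steps (the "usually" test, discourse context, intuitive judgment).

```python
import re

# Hypothetical marker lists: the slide examples plus a few obvious additions.
HABITUAL_PATTERNS = [
    r"\b(usually|often|always|sometimes|rarely)\b",  # frequency adverbs
    r"\bevery (night|day|week|morning|time)\b",      # quantificational temporals
    r"\bused to\b",                                  # habitual past
]
EPISODIC_PATTERNS = [
    r"\b(yesterday|this morning|last night)\b",      # definite temporals
]

def explicit_marker_label(sentence):
    """Return 'habitual', 'episodic', or None if no explicit marker is found.

    None means the sentence must be decided by the later, manual steps.
    Habitual markers are checked first.
    """
    text = sentence.lower()
    if any(re.search(p, text) for p in HABITUAL_PATTERNS):
        return "habitual"
    if any(re.search(p, text) for p in EPISODIC_PATTERNS):
        return "episodic"
    return None
```

Surface matching is only a first pass. "The women indicated which family member usually did household chores" contains "usually" and is labeled habitual by this sketch, yet the deck annotates it as episodic, because the adverb sits in an embedded clause.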
8. Data
- Verbs varied significantly in lexical bias
- report: almost exclusively episodic; require: almost exclusively habitual
- Final step
- Eliminated highly biased lexical verbs
- Final data set
- 1,052 sentences
- 57 verb forms
- Baseline distribution
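The bias-based filtering step might look like the following minimal sketch. The 0.9 cut-off is a hypothetical value, since the slides do not give the threshold used to decide that a verb is "highly biased".

```python
from collections import Counter, defaultdict

def filter_biased_verbs(sentences, max_bias=0.9):
    """Drop sentences whose verb (nearly) always takes one category.

    `sentences` is a list of (verb, label) pairs. A verb's bias is the share
    of its sentences that carry its most frequent label. `max_bias` is a
    hypothetical threshold, not the value used in the study.
    """
    by_verb = defaultdict(Counter)
    for verb, label in sentences:
        by_verb[verb][label] += 1
    keep = {
        verb
        for verb, counts in by_verb.items()
        if max(counts.values()) / sum(counts.values()) <= max_bias
    }
    return [(verb, label) for verb, label in sentences if verb in keep]
```

Under this sketch a verb like report, almost exclusively episodic, would be removed, while a verb split between the two categories would stay.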
9. Features
- Selected 14 sentence-internal features
- Features that can be derived from the annotation scheme of the Penn Treebank
- Evaluated the features' relevance to classification
- Compared each feature's distribution by category against the baseline
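Comparing a feature's distribution by category against the baseline can be sketched as follows. For each value of a feature, the code computes the share of habitual sentences and sets it beside the corpus-wide habitual share. The row format and feature names are assumptions for illustration.

```python
from collections import Counter, defaultdict

def habitual_rate_by_value(rows, feature):
    """Compare each value of `feature` against the corpus-wide baseline.

    `rows` is a list of (features_dict, label) pairs, with label in
    {'habitual', 'episodic'}. Returns (baseline, {value: habitual_rate}).
    Values whose rate sits far from the baseline are candidate indicators.
    """
    labels = Counter(label for _, label in rows)
    baseline = labels["habitual"] / len(rows)
    per_value = defaultdict(Counter)
    for feats, label in rows:
        per_value[feats.get(feature)][label] += 1  # missing feature -> None
    rates = {
        value: counts["habitual"] / sum(counts.values())
        for value, counts in per_value.items()
    }
    return baseline, rates
```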
10. Tense
- Hungarian Radio saves its most politically outspoken broadcasts for around midnight (habitual)
- Mickie laughed (episodic)
11. Aspect
- Everyone else was running (episodic)
- The school has received letters from parents (episodic)
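Tense and aspect are examples of features recoverable from Penn Treebank part-of-speech tags. A rough, hypothetical extractor for a verb group might look like this. It is a heuristic sketch, not the feature extractor used in this work.

```python
# Penn Treebank tags that fix tense on the first finite verb or modal.
TENSE_BY_TAG = {"VBD": "past", "VBZ": "present", "VBP": "present", "MD": "modal"}
HAVE = {"have", "has", "had", "having"}
BE = {"be", "am", "is", "are", "was", "were", "been", "being"}

def tense_and_aspect(tagged):
    """Approximate (tense, perfect, progressive) for a verb group.

    `tagged` is a list of (word, tag) pairs, e.g.
    [("was", "VBD"), ("running", "VBG")].
    """
    words = [w.lower() for w, _ in tagged]
    tags = [t for _, t in tagged]
    # Tense comes from the first tag that carries it; None if there is none.
    tense = next((TENSE_BY_TAG[t] for t in tags if t in TENSE_BY_TAG), None)
    # Perfect: a form of "have" followed later by a past participle (VBN).
    perfect = any(w in HAVE and "VBN" in tags[i + 1:] for i, w in enumerate(words))
    # Progressive: a form of "be" followed later by a gerund/participle (VBG).
    progressive = any(w in BE and "VBG" in tags[i + 1:] for i, w in enumerate(words))
    return tense, perfect, progressive
```

Applied to the slide examples, "was running" yields past progressive and "has received" yields present perfect.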
12. Temporals
- Every time I closed my eyes, I saw gray eyes rushing at me with a knife (habitual)
- On Tuesday, Trellborg's directors announced plans to spin off two big divisions as separately quoted companies on Stockholm's stock exchange (episodic)
13. Subject Features
- Commands go only from an office to the man of nearest lower rank (habitual)
- The women indicated which family member usually did household chores (episodic)
14. Object Features
- Not surprisingly, he sometimes bites (habitual)
- In Los Angeles, in our lean years, we gave parties (habitual)
- Robert Bernstein, chairman and president of Random House Inc., announced his resignation from the publishing house he has run for 23 years (episodic)
15. Conditionals
- After all, gold prices soar when inflation is high (habitual)
16. Prepositional Features
- Anheuser-Busch announced its plan at the same time it reported third quarter net income rose a lower-than-anticipated 5.2% to $238.3 million (episodic)
- Treasury prices ended mixed in light trading (episodic)
- You've got blood on your cheek (episodic)
17. Feature Analysis Summary
- Reliable features for episodicity
- Less reliable features for habituality
18. Feature Limitations
- Problem areas
- Semantics of predicate arguments
- She was moving like a ballet dancer
- She was moving in café society as Lady Diana Harrington
- Semantics of predicate
- He is meeting a girl from Brooklyn
- He is seeing a girl from Brooklyn
- Sentence-external factors (discourse)
- John rarely ate fruit. He just ate oranges
- John didn't eat much at breakfast. He just ate oranges
- Sentences with dual-category membership
- Too rare to analyze statistically
- After all, in all five recessions since 1960, stocks declined
19. Machine Learning
- Considered three classifiers
- Rule-based
- Association Rule Classifier
- Decision Tree (J48) Classifier
- Probabilistic
- Naïve Bayes
- Evaluated against a baseline that blindly labels all sentences with the majority class (episodic)
- 73.1% overall precision
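The majority-class baseline is simple to compute. Every sentence gets the most frequent label, so overall precision equals that label's share of the data (73.1% episodic here). A minimal sketch, with toy labels standing in for the corpus:

```python
from collections import Counter

def majority_baseline(labels):
    """Label every sentence with the most frequent class and score it.

    Returns (majority_class, accuracy). Because every sentence receives the
    same label, overall precision equals the majority class's share.
    """
    counts = Counter(labels)
    majority, n = counts.most_common(1)[0]
    return majority, n / len(labels)
```

For example, `majority_baseline(["episodic", "episodic", "episodic", "habitual"])` returns `("episodic", 0.75)`.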
20. Association Rule Classifier
- Applied the Predictive Apriori algorithm (Scheffer 2004) for multivariate analysis
- Algorithm generates the n best feature patterns predicting a category
- Manually pruned results
- Only patterns selecting for episodicity > 85%
- Only patterns selecting for habituality > 80%
- If R1 ⊂ R2, discard R2
- If the sorted list R1, R2 … Rn has the same coverage for a category as R1, R2 … Rn+1, discard Rn+1
- Model
- 4 patterns cover 213 sentences, of which 173 are habitual
- 11 patterns cover 882 sentences, of which 735 are episodic
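The pruning steps can be sketched in Python as below. This assumes R1 ⊂ R2 means R1's antecedent (its set of feature values) is a proper subset of R2's, so R2 is a more specific, redundant pattern. It also reads the coverage rule as dropping any pattern that adds no newly covered sentences for its category. The `Rule` type and the feature strings are hypothetical, and Predictive Apriori itself, which generates the candidate rules, is not reimplemented here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    antecedent: frozenset  # feature values that must all be present (hypothetical encoding)
    label: str             # predicted category
    accuracy: float        # share of covered sentences that carry `label`
    covers: frozenset      # ids of the sentences the rule matches

MIN_ACCURACY = {"episodic": 0.85, "habitual": 0.80}

def prune(rules):
    # 1. Keep only sufficiently accurate patterns.
    kept = [r for r in rules if r.accuracy > MIN_ACCURACY[r.label]]
    # 2. If R1's antecedent is a proper subset of R2's (same label), drop R2.
    kept = [
        r2 for r2 in kept
        if not any(r1.label == r2.label and r1.antecedent < r2.antecedent for r1 in kept)
    ]
    # 3. Walk rules from most to least accurate; drop any rule that adds
    #    no new coverage to the rules already accepted for its label.
    result, covered = [], {}
    for r in sorted(kept, key=lambda rule: rule.accuracy, reverse=True):
        seen = covered.setdefault(r.label, frozenset())
        if r.covers - seen:
            result.append(r)
            covered[r.label] = seen | r.covers
    return result
```

By the slide's counts, the retained habitual patterns are correct 173/213 ≈ 81% of the time and the episodic patterns 735/882 ≈ 83% of the time.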
21. Association Rule-Based Classifier
22. Decision Tree (J48) Classifier
- Weka's implementation of C4.5
- Used ten-fold cross-validation for evaluation
- Model
- 2 patterns cover 184 sentences, of which 161 are habitual
- 2 patterns cover 829 sentences, of which 727 are episodic
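Ten-fold cross-validation partitions the data into ten folds, trains on nine of them, and tests on the held-out fold, rotating through all ten. The sketch below shows that loop in plain Python with a stand-in majority-class learner where J48 would go. It does not reimplement C4.5, and unlike Weka's default it does not stratify folds by class. It assumes at least k examples, so that no fold is empty.

```python
import random
from collections import Counter

def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(X, y, train, k=10, seed=0):
    """Average accuracy of `train` over k folds.

    `train(X_train, y_train)` must return a function that maps one
    example to a predicted label.
    """
    scores = []
    for fold in k_fold_indices(len(X), k, seed):
        test = set(fold)
        X_tr = [x for i, x in enumerate(X) if i not in test]
        y_tr = [lab for i, lab in enumerate(y) if i not in test]
        predict = train(X_tr, y_tr)
        correct = sum(predict(X[i]) == y[i] for i in fold)
        scores.append(correct / len(fold))
    return sum(scores) / len(scores)

def train_majority(X_train, y_train):
    """Stand-in learner: always predicts the training set's majority class."""
    majority = Counter(y_train).most_common(1)[0][0]
    return lambda x: majority
```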
23. Decision Tree (J48) Classifier
- Impact of feature groups (J48)
- All feature groups select roughly the same number of episodic sentences
- Variation lies mainly in the habitual and incorrectly classified sentences
24. Results
- Classifier Performance
- ¹ Not evaluated using an independent validation set
- Habituality recall
- Tense and the presence of a quantificational temporal are the best indicators of habituality
- However, neither provides sufficient coverage of habitual examples on its own
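Precision and recall for a single category can be computed as below. This is a minimal sketch with toy labels.

```python
def precision_recall(gold, predicted, positive):
    """Precision and recall for one category (e.g. 'habitual')."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, predicted))
    predicted_pos = sum(p == positive for p in predicted)
    actual_pos = sum(g == positive for g in gold)
    precision = tp / predicted_pos if predicted_pos else 0.0
    recall = tp / actual_pos if actual_pos else 0.0
    return precision, recall
```

Reading the J48 model counts as 184 sentences predicted habitual, 161 of them correctly, habitual precision would be 161/184 = 0.875. Recall also requires the total number of habitual sentences in the data, which these slides do not list.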
25. Conclusion
- Syntactic features are a viable basis for category disambiguation
- Identification of episodic sentences outperforms identification of habitual sentences
- There are more overt markers of habituality; however, more features show a bias toward episodicity
- Factors affecting performance
- Impact of the lexical verb and of sentence-external features
- Feature extraction is in some cases an approximation
- Annotation errors and inconsistency in the corpus
26. Future Work
- Impact of discourse
- Independently annotate the sentence, its predecessor, and its successor, each in isolation
- Weighting factor for ambiguous situations
- Annotate the sentence, its predecessor, and its successor with awareness of context
27. Questions?