Handling of missing values

Slides: 17

Provided by: rc072

Learn more at: http://www.lrec-conf.org

Transcript and Presenter's Notes

1

Handling of missing values
in lexical acquisition
Núria Bel
Universitat Pompeu Fabra

2
By Automatic Lexical Information Acquisition we ...

try to find how to build repositories of
language-dependent lexical information
automatically. Many technologies behind
applications (MT, IE, Automatic Summarization,
Sentiment Analysis, Opinion Mining, Question
Answering, etc.) need this information to work.

("paralelo" AST ALO
"paralel" ATR POST CL (PF-AS
PM-OS SF-A SM-O) FC (NPP) LY
AMENTE MC ("a") PLC (NG)
PRED (ESTAR SER) TA (OBJ-P REL)
AUTHOR "juan" DATE "31-Aug-99" SITE
"FB52")
("fiesta" NST ALO
"fiest" CL (PF-AS SF-A) GD
(F) KN MS PLC (NF) TYN
(ABS) AUTHOR "juan" DATE
"28-Aug-99" SITE "FB52")
Entries borrowed from the MT system Incyta (Metal
family).
3
Cue Based Lexical Acquisition

Differences in the distribution of certain
contexts separate words of different classes
(Harris, 1951).
For example, a mass noun such as mud occurs with
some but not with many (some mud / *many mud).
Words (types) can be represented in terms of a
collection of contexts. Whether or not a word
occurs in each of these contexts is taken as a
hint, or cue, that the word belongs to a
particular class.

4
Word occurrences are represented as vectors and
used to train a classifier.

@data
15,2,8,4,0,8,1,0,1,0,0,0,0,0
Each value is the number of times the word has
been observed in one of the defined contexts.
Non-occurrence in a particular context is as
informative as occurrence.
We use supervised classifiers (Support Vector
Machines, Decision Trees) to predict the class
(Abstract, Mass, etc.) of new words.
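
As an illustration of how such a vector could be built, here is a minimal Python sketch. The cue names in CUES and the function name cue_vector are hypothetical (the slides do not name the 14 contexts); the only thing taken from the slide is the idea of one count per defined context, written out as a comma-separated data row.

```python
from collections import Counter

# Hypothetical cue inventory: the slides use 14 contexts but do not name them.
CUES = ["det_some", "det_many", "bare_singular", "plural"]


def cue_vector(observed_contexts, cues=CUES):
    """Count how often a word type was seen in each defined context.

    Contexts that were never observed get an explicit 0, since
    non-occurrence is meant to be informative too.
    """
    counts = Counter(observed_contexts)
    return [counts[cue] for cue in cues]


def as_data_row(vector):
    """Render the vector as one comma-separated row, as in the @data section."""
    return ",".join(str(value) for value in vector)


# Toy observations for one word type (invented for illustration).
vector = cue_vector(["det_some", "det_some", "bare_singular"])
row = as_data_row(vector)
print(row)
```

Rows of this kind, plus a class label, would then be passed to whichever supervised classifier is used.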

5
Cues, classification and state-of-the-art results

Merlo and Stevenson (2001) selected very specific
cues for classifying verbs into a number of Levin
(1993) based verbal classes: animacy of the
subject, passives, ...
Baldwin (2005) used general features, such as the
POS tags of neighboring words, for type
classification.
Joanis et al. (2007) used the frequency of filled
syntactic positions or slots, tense and voice of
occurring verbs, etc., to describe the whole
system of English verbal classes.
The results are difficult to compare, but
accuracy is about 70%.

6
The problem: missing values
7
The Sparse data problem

Joanis and Stevenson (2003), Joanis et al. (2007)
and Korhonen et al. (2008) mention that they have
to face the problem of sparse data: many of the
types/words are low in frequency and provide very
little information.
Most words will appear very rarely (i.e. they
follow a Zipfian distribution) and therefore will
show few cues.
Yallop et al. (2005) calculated that in the
100M-word British National Corpus, from a total
of 124,120 distinct adjectives, 70,246 occur only
once.
The cues we can use as information are mutually
exclusive within a single occurrence: an
adjective can be both prenominal and postnominal,
but if it occurs only once, it will show only one
cue, the other ones being zero.
Even for types that occur more than once, the
optional nature and variety of the contexts of
occurrence give rise to missing values.
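
To put the Yallop et al. (2005) figures in proportion, the following small Python sketch computes the share of adjective types that occur only once. The function name hapax_share is ours; the two counts are the ones quoted above.

```python
def hapax_share(total_types, hapax_types):
    """Return the fraction of types that occur exactly once."""
    if total_types <= 0:
        raise ValueError("total_types must be positive")
    return hapax_types / total_types


# Counts quoted from Yallop et al. (2005) for adjectives in the BNC.
share = hapax_share(124_120, 70_246)
formatted = f"{share:.1%}"
print(formatted)  # 56.6%
```

So more than half of the adjective types can show at most one cue each.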

8
Zero values and learning

Zero values not only leave too little
information to decide on, but also add
uncertainty when learning from the data.
A zero value could indeed be a negative value,
i.e. the cue is that the context has not been
observed, but it could also be that the context
was simply not observed in the examined corpus,
for various reasons.
When there are many zero values, the cue loses
its predictive power because of this
uncertainty.
Katz (1987) and Baayen and Sproat (1996), among
others, acknowledged the importance of
preprocessing low-frequency events, and Joanis et
al. (2007) also decided to smooth the data, even
when working with more than 1000 occurrences per
verb in the BNC.

9
Our smoothing experiment: Harmonization based
on linguistic information
10
Intuitively: how likely is it that a 0 is just an
unobserved feature and not a true 0, given the
values of other observations?

To classify Abstract/Concrete nouns in English:
Cue 1 is suffix: -ness, -ism, ... for
Abstracts (Light 1996)
Cue 2 is determiners: such, little, much, ...
for Abstracts
Cue 3 is adjectives like big, small, ... for
Concrete
P(cue_1 = 1 | 0,1,0) =
P(abstract = yes | 0,1,0) · P(cue_1 = 1 | abstract = yes)
+ P(abstract = no | 0,1,0) · P(cue_1 = 1 | abstract = no)
We use the information of observed features to
assess the likelihood of a particular unobserved
cue.
Harmonization is substituting 0 values by the
likelihood of being 1 given the other cues
observed.
BUT
In order to get P(cue_1 = 1 | 0,1,0) we need to
have P(cue_n | class) for all cues in the
vector.

12
The challenge: how to get P(cue_n | class) with so
many 0s in the data? By estimating
P(cue_n | class) with linguistic information.

              Abstract  Concrete
Suffix = no   0.5       1.0
Suffix = yes  0.5       0.0
SC_Adj = no   1.0       0.5
SC_Adj = yes  0.0       0.5
The probability of having the suffix -ness given
that the noun is Concrete is 0.
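
The harmonization step can be sketched in Python as follows. This is only an illustration under stated assumptions, not the method as implemented for the experiments. The P(cue = yes | class) values are the ones in the table above, observed cues are set to 1 as in the harmonized vectors shown on the next slide, and each 0 is replaced by the posterior-weighted probability of the cue being 1. The assumptions are ours: a uniform class prior, conditional independence of cues given the class, and a class posterior computed only from the cues that were actually observed, since a 0 may just be unobserved.

```python
# P(cue = yes | class), taken from the table on the slide.
P_CUE_GIVEN_CLASS = {
    "suffix": {"abstract": 0.5, "concrete": 0.0},
    "sc_adj": {"abstract": 0.0, "concrete": 0.5},
}
CLASSES = ("abstract", "concrete")


def class_posterior(vector):
    """P(class | observed cues).

    Assumptions (ours, not stated on the slides): uniform prior, cues
    independent given the class, and only cues observed at least once
    are used as evidence.
    """
    prior = {cls: 1.0 / len(CLASSES) for cls in CLASSES}
    scores = {}
    for cls in CLASSES:
        score = prior[cls]
        for cue, count in vector.items():
            if count > 0:
                score *= P_CUE_GIVEN_CLASS[cue][cls]
        scores[cls] = score
    total = sum(scores.values())
    if total == 0:
        # Contradictory evidence under this table: fall back to the prior.
        return prior
    return {cls: score / total for cls, score in scores.items()}


def harmonize(vector):
    """Set observed cues to 1 and replace each 0 by P(cue = 1 | observed cues)."""
    posterior = class_posterior(vector)
    harmonized = {}
    for cue, count in vector.items():
        if count > 0:
            harmonized[cue] = 1.0
        else:
            harmonized[cue] = sum(
                posterior[cls] * P_CUE_GIVEN_CLASS[cue][cls] for cls in CLASSES
            )
    return harmonized


nothing_seen = harmonize({"suffix": 0, "sc_adj": 0})
suffix_seen = harmonize({"suffix": 1, "sc_adj": 0})
print(nothing_seen)
print(suffix_seen)
```

Under these assumptions, a word with no observed cues gets 0.25 for both cues, while a word seen with the suffix keeps a true 0 for SC_Adj, because the table gives Abstract nouns no probability of showing that cue.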

13
Harmonization effects in Spanish Mass experiment

agua (water)
Frequency:  0,3,0,1,0,1,1,0,1,0,0,1,1,0
Harmonized: 0,1,0,1,0,1,1,0,1,0,0,1,1,0

acero (steel)
Frequency:  1,2,0,0,0,2,1,1,2,0,0,0,0,0
Harmonized: 1,1,0.5,0.5,0.5,1,1,1,1,0,0,0,0,0

desabastecimiento (shortage)
Frequency:  0,0,0,0,0,0,1,0,0,0,0,0,0,0
Harmonized: 0.5,0.5,0.5,0.5,0.5,0.5,1,0.5,0.5,0,0,0,0,0

aceptabilidad (acceptability)
Frequency:  0,0,0,0,0,0,0,0,0,0,0,0,0,0
Harmonized: 0.02,0.02,0.02,0.02,0.02,0.02,0.02,0.02,0.02,0.47,0.47,0.47,0.47,0.47
14
Results of the experiments