Handling of missing values - PowerPoint PPT Presentation

Learn more at: http://www.lrec-conf.org

1
  • Handling of missing values
  • in lexical acquisition
  • Núria Bel
  • Universitat Pompeu Fabra

2
By Automatic Lexical Information Acquisition we ...
  • try to find how to build repositories of
    language-dependent lexical information
    automatically. Many technologies behind
    applications (MT, IE, Automatic Summarization,
    Sentiment Analysis, Opinion Mining, Question
    Answering, etc.) need this information to work.

("paralelo" AST ALO "paralel" ATR POST
  CL (PF-AS PM-OS SF-A SM-O) FC (NPP) LY AMENTE MC ("a")
  PLC (NG) PRED (ESTAR SER) TA (OBJ-P REL)
  AUTHOR "juan" DATE "31-Aug-99" SITE "FB52")
("fiesta" NST ALO "fiest"
  CL (PF-AS SF-A) GD (F) KN MS PLC (NF) TYN (ABS)
  AUTHOR "juan" DATE "28-Aug-99" SITE "FB52")
Entries borrowed from the MT system Incyta (Metal family).
3
Cue Based Lexical Acquisition
  • Differences in the distribution of certain
    contexts separate words of different classes
    (Harris, 1951).
  • For example: "some mud" is acceptable but
    "many mud" is not, which hints that "mud" is a
    mass noun.
  • Words (types) can be represented as a collection
    of contexts. Whether or not a word occurs in each
    context is taken as a hint, or cue, that the word
    belongs to a particular class.

4
Word occurrences are represented as vectors and
used to train a classifier.
  • @data
  • 15,2,8,4,0,8,1,0,1,0,0,0,0,0
  • Each value is the number of times the word has
    been observed in one of the defined contexts.
  • Non-occurrence in a particular context is as
    informative as occurrence.
  • We use supervised classifiers (Support Vector
    Machines, Decision Trees) to predict the class
    (Abstract, Mass, etc.) of new words.
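
As a rough illustration of how such a count vector could be built, here is a minimal Python sketch. The three cue patterns and the function name `cue_vector` are hypothetical, chosen only for this example; the slides do not list the actual cues or the tooling used. The sketch stops at producing the vector and does not train a classifier.

```python
import re

# Hypothetical cue patterns, for illustration only (not the cues used in
# the presentation). "{w}" is replaced by the target word.
CUES = {
    "much_w": r"\bmuch {w}\b",    # "much mud"
    "many_ws": r"\bmany {w}s\b",  # "many cars"
    "a_w": r"\ban? {w}\b",        # "a car"
}


def cue_vector(word, sentences):
    """Count how often `word` is observed in each cue context.

    Returns one count per cue, in the order of CUES.
    """
    counts = []
    for pattern in CUES.values():
        regex = re.compile(pattern.format(w=re.escape(word)), re.IGNORECASE)
        counts.append(sum(len(regex.findall(s)) for s in sentences))
    return counts


corpus = [
    "There was much mud on the road.",
    "Too much mud!",
    "A mud bath helps.",
    "Many cars passed.",
    "She bought a car.",
]

mud = cue_vector("mud", corpus)
car = cue_vector("car", corpus)

# One comma-separated line per word, as in the data row shown above.
print(",".join(map(str, mud)))
print(",".join(map(str, car)))
```

Note that the zero in the "mud" vector is exactly the kind of value the rest of the talk is about: it may mean the cue does not apply, or only that it was not seen in this small corpus.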

5
Cues, classification and state-of-the-art results
  • Merlo and Stevenson (2001) selected very specific
    cues for classifying verbs into a number of
    Levin (1993) based verbal classes: animacy of
    the subject, passives, ...
  • Baldwin (2005) used general features, such as the
    POS tags of neighboring words, for type
    classification.
  • Joanis et al. (2007) used the frequency of filled
    syntactic positions or slots, tense and voice of
    occurring verbs, etc., to describe the whole
    system of English verbal classes.
  • The results are difficult to compare, but they
    point to an accuracy of about 70%.

6
The problem: missing values
7
The Sparse data problem
  • Joanis and Stevenson (2003), Joanis et al. (2007)
    and Korhonen et al. (2008) mention that they have
    to face the problem of sparse data: many of the
    types/words are low in frequency and provide very
    little information.
  • Most words appear very rarely (i.e. they follow
    a Zipfian distribution) and therefore show few
    cues.
  • Yallop et al. (2005) calculated that in the
    100M-word British National Corpus, from a total
    of 124,120 distinct adjectives, 70,246 occur only
    once.
  • The cues we can use as information are mutually
    exclusive within a single occurrence: an
    adjective can be both prenominal and postnominal,
    but if it occurs only once it will show only one
    cue, and the other cues will have a zero value.
  • Even for types that occur more than once, the
    optional nature and variety of the contexts of
    occurrence give rise to missing values.
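
To put the Yallop et al. figures in proportion, the share of adjectives seen only once can be computed directly from the two counts quoted above. This is plain arithmetic on the numbers in the slide; the variable names are only for this example.

```python
# Counts quoted from Yallop et al. (2005) for the British National Corpus.
total_adjectives = 124_120
hapax_adjectives = 70_246  # adjectives that occur exactly once

hapax_share = hapax_adjectives / total_adjectives
print(f"{hapax_share:.1%} of distinct adjectives occur only once")
```

More than half of the adjective types therefore contribute a vector with at most one non-zero cue.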

8
Zero values and learning
  • Zero values mean not only that there is too
    little information to decide, but also that there
    is a further uncertainty when learning from the
    data.
  • A zero value could indeed be a negative value,
    i.e. the cue is precisely that the context has
    not been observed, but it could also be that the
    cue was simply not observed in the examined
    corpus, for various reasons.
  • When there are many zero values, the cue loses
    its predictive power because of this
    uncertainty.
  • Katz (1987) and Baayen and Sproat (1996), among
    others, acknowledged the importance of
    preprocessing low-frequency events, and Joanis et
    al. (2007) also decided to smooth the data, even
    though they worked with more than 1000
    occurrences per verb in the BNC.

9
Our smoothing experiment: harmonization based
on linguistic information
10
Intuitively: how likely is it that a 0 is just an
unobserved feature and not a true 0, given the
values of other observations?
  • To classify Abstract/Concrete nouns in English:
  • Cue 1 is suffixes such as -ness, -ism, ... for
    Abstracts (Light 1996)
  • Cue 2 is determiners such as such, little,
    much, ... for Abstracts
  • Cue 3 is adjectives like big, small, ... for
    Concrete
  • $P(\mathrm{cue}_1 = 1 \mid 0,1,0) =$
  • $P(\mathrm{abstract} = \mathrm{yes} \mid 0,1,0)\, P(\mathrm{cue}_1 = 1 \mid \mathrm{abstract} = \mathrm{yes})$
  • $+\, P(\mathrm{abstract} = \mathrm{no} \mid 0,1,0)\, P(\mathrm{cue}_1 = 1 \mid \mathrm{abstract} = \mathrm{no})$

11
  • We use the information from observed features to
    assess the likelihood of a particular unobserved
    cue.
  • Harmonization substitutes 0 values with the
    likelihood of their being 1, given the other
    cues observed.
  • BUT
  • In order to get $P(\mathrm{cue}_1 = 1 \mid 0,1,0)$
    we need to have $P(\mathrm{cue}_n \mid \mathrm{class})$
    for all cues in the vector.

12
The challenge: how to get $P(\mathrm{cue}_n \mid \mathrm{class})$
with so many 0s in the data? By estimating
$P(\mathrm{cue}_n \mid \mathrm{class})$ with linguistic information.
  •               Abstract   Concrete
  • Suffix=no     0.5        1.0
  • Suffix=yes    0.5        0.0
  • SC_Adj=no     1.0        0.5
  • SC_Adj=yes    0.0        0.5
  • The probability of a Concrete noun having the
    suffix -ness is 0.
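
The table and the formula from the earlier slide can be combined in a small sketch. This is not the authors' implementation: it assumes a uniform prior over the two classes, assumes cues are independent given the class, and treats every 0 as missing (so only cues observed as 1 count as evidence). The names `class_posterior` and `harmonize` are hypothetical. The probabilities are the Suffix=yes and SC_Adj=yes rows of the table.

```python
# P(cue = yes | class), copied from the Suffix=yes and SC_Adj=yes rows.
P_CUE_GIVEN_CLASS = {
    "suffix": {"abstract": 0.5, "concrete": 0.0},
    "sc_adj": {"abstract": 0.0, "concrete": 0.5},
}
CLASSES = ("abstract", "concrete")


def class_posterior(observed):
    """P(class | cues observed as 1).

    Assumptions of this sketch: uniform class prior, cues independent
    given the class, and zeros contribute no evidence.
    """
    prior = 1.0 / len(CLASSES)
    scores = {}
    for c in CLASSES:
        score = prior
        for cue, value in observed.items():
            if value:
                score *= P_CUE_GIVEN_CLASS[cue][c]
        scores[c] = score
    total = sum(scores.values())
    if total == 0:
        # Contradictory evidence under the table: fall back to the prior.
        return {c: prior for c in CLASSES}
    return {c: s / total for c, s in scores.items()}


def harmonize(observed):
    """Replace each 0 by P(cue = 1 | observed cues); keep observed cues as 1."""
    posterior = class_posterior(observed)
    result = {}
    for cue, value in observed.items():
        if value:
            result[cue] = 1.0
        else:
            result[cue] = sum(
                posterior[c] * P_CUE_GIVEN_CLASS[cue][c] for c in CLASSES
            )
    return result


print(harmonize({"suffix": 0, "sc_adj": 0}))  # no evidence at all
print(harmonize({"suffix": 1, "sc_adj": 0}))  # suffix seen, adjective cue not
```

Under these assumptions, a word with no observed cues gets a small non-zero value for both cues, while a word observed with an abstract suffix keeps a true 0 for the Concrete adjective cue, because the table rules that combination out.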

13
Harmonization effects in the Spanish Mass experiment
  • agua (water)
    Frequency:  0,3,0,1,0,1,1,0,1,0,0,1,1,0
    Harmonized: 0,1,0,1,0,1,1,0,1,0,0,1,1,0
  • acero (steel)
    Frequency:  1,2,0,0,0,2,1,1,2,0,0,0,0,0
    Harmonized: 1,1,0.5,0.5,0.5,1,1,1,1,0,0,0,0,0
  • desabastecimiento (shortage)
    Frequency:  0,0,0,0,0,0,1,0,0,0,0,0,0,0
    Harmonized: 0.5,0.5,0.5,0.5,0.5,0.5,1,0.5,0.5,0,0,0,0,0
  • aceptabilidad (acceptability)
    Frequency:  0,0,0,0,0,0,0,0,0,0,0,0,0,0
    Harmonized: 0.02,0.02,0.02,0.02,0.02,0.02,0.02,0.02,0.02,0.47,0.47,0.47,0.47,0.47
14
Results of the experiments
  •                 Spanish Mass      English Abstract
  • Experiment      DT      SVM       DT      SVM
  • Mean            74.2    63.8      57.8    61.0
  • Trimmed mean    77.5    67.4      55.6    61.0
  • Frequency       79.9    79.1      61.4    64.1
  • Harmonized      82.8    80.7      76.1    70.1
  • Baseline: 74.8, 61.5
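
To read the table more easily, the difference between the Harmonized and Frequency rows can be computed per column. This is simple arithmetic on the reported figures; the column labels are spelled out here for the example only.

```python
columns = [
    "Spanish Mass DT",
    "Spanish Mass SVM",
    "English Abstract DT",
    "English Abstract SVM",
]
# Rows copied from the results table.
frequency = [79.9, 79.1, 61.4, 64.1]
harmonized = [82.8, 80.7, 76.1, 70.1]

# Gain of the harmonized vectors over raw frequency vectors, in points.
gains = {
    col: round(h - f, 1)
    for col, f, h in zip(columns, frequency, harmonized)
}

for col, gain in gains.items():
    print(f"{col}: +{gain}")
```

Harmonized vectors score higher than frequency vectors in all four columns, with the largest difference for the English Abstract task with decision trees.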

15
Error analysis and future work
  • Harmonization neutralizes the frequency
    information that was used to filter noise.
  • Future work is about how to handle missing values
    and noise together.

16
  • Thanks for your attention!