Semantic Annotation for Interlingual Representation of Multilingual Texts

Transcript
1
Semantic Annotation for Interlingual
Representation of Multilingual Texts
  • Teruko Mitamura (CMU), Keith Miller (MITRE),
  • Bonnie Dorr (Maryland), David Farwell (NMSU),
    Nizar Habash (Columbia), Stephen Helmreich
    (NMSU), Eduard Hovy (ISI), Lori Levin (CMU), Owen
    Rambow (Columbia),
  • Flo Reeder (MITRE), Advaith Siddharthan
    (Columbia)
  • LREC 2004 Workshop: Beyond Named Entity
    Recognition
  • Semantic labelling for NLP tasks

2
(No Transcript)
3
IAMTC (Interlingua Annotation of Multilingual
Corpora) Project
  • Collaboration
  • New Mexico State University
  • University of Maryland
  • Columbia University
  • MITRE
  • Carnegie Mellon University
  • ISI, University of Southern California

4
Goals of IAMTC
  • Interlingua design
  • Three levels of depth
  • Annotation methodology
  • manuals, tools, evaluations
  • Annotated multi-parallel texts
  • Foreign language original and multiple English
    translations
  • Foreign languages: Arabic, French, Hindi,
    Japanese, Korean, Spanish

5
Getting at Meaning (Two translations of a Korean
original text)
  Translation 1:
  • Starting January 1st
  • of next year
  • customers of SK Telecom
  • can change their service company to
  • LG Telecom or KTF
  • Once a service company swap has been made,
  • customers
  • are not allowed to change
  • companies again
  • within the first three months,
  • although they can cancel
  • the change
  • anytime within 14 days
  • if problems
  • such as poor call quality
  • are experienced.
  Translation 2:
  • Starting on January 1
  • of next year,
  • SK Telecom subscribers
  • can switch to
  • less expensive LG Telecom or KTF.
  • The subscribers
  • cannot switch again
  • to another provider
  • for the first 3 months,
  • but they can cancel
  • the switch
  • in 14 days
  • if they are not satisfied with services
  • like voice quality.

6
Color Key
  • Black same meaning and same expression
  • Green small syntactic difference
  • Blue Lexical difference
  • Red Not contained in the other text
  • Purple Larger difference.
  • Need to use some inference to know that the
    meaning is the same

7
Getting at Meaning (Two translations of a
Japanese original text)
  Translation 1:
  • This year,
  • which has already seen
  • the announcement
  • of the birth
  • of Mitsubishi Chemical Corporation
  • as well as
  • the continuous
  • numbers of big mergers,
  • may
  • too
  • be recorded
  • as the "year of the merger"
  • for all we know.
  Translation 2:
  • This year,
  • too,
  • in addition to
  • the birth
  • of Mitsubishi Chemical,
  • which has already been announced,
  • other rather large-scale mergers
  • may continue,
  • and be recorded
  • as a "year of mergers."

More lexical similarity. More differences in
dependency relations.
8
Toward a Theory of Annotation
  • Recently, a sharp increase in the number of
    annotated resources being built
  • Penn Treebank, PropBank, many others
  • For annotation, one needs:
  • A theory behind the phenomena being annotated
  • Annotation termsets (even WordNet, FrameNet,
    VerbNet, HowNet)
  • A standard (?) annotation corpus (the same old
    Treebank?)
  • Annotation tools; they make an immense difference
  • A carefully considered annotation procedure
    (interleaving per text vs. per sentence, etc.)
  • Reconciliation and consistency checking
    procedures
  • Evaluation measures, appropriately defined

9
Corpus and Data
  • Initial corpus
  • 10 texts in each language
  • 2 translations of each into English
  • Interlingua designed for MT
  • Multiple English translations of the same source
    show translation divergences. Some phenomena:
  • Lexical level: word changes
  • Syntactic level: phrasing, thematization,
    nominalization
  • Semantic level: additional/different content
  • Discourse level: multi-clause structure, anaphora
  • Pragmatic level: speech acts, implicatures,
    style, interpersonal
  • Causes of divergence:
  • Genuine ambiguity/vagueness of source meaning
  • Translator error/reinterpretation

10
IL Development: Staged, deepening
  • IL0: a simple dependency tree gives structure
  • IL1: semantic annotations for nouns, verbs,
    adjectives, adverbs, and theta roles
  • Not yet semantic: buy vs. sell not distinguished;
    many remaining simplifications
  • Concept senses from ISI's Omega ontology
  • Theta roles from Dorr's LCS work
  • Elaborate annotation manuals
  • Tiamat annotation interface
  • Post-annotation reconciliation process and
    interface
  • Evaluation: scores for annotator agreement
  • IL2: comes next

11
Details of IL0
  • Deep syntactic dependency representation
  • Removes auxiliary verbs, determiners, and some
    function words
  • Normalizes passives, clefts, etc.
  • Includes syntactic roles (Subj, Obj)
  • Construction
  • Dependency parsed using Connexor (English)
  • Tapanainen and Järvinen, 1997
  • Hand-corrected
  • Extensive manual and instructions on the IAMTC
    Wiki website
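The deep-dependency representation described above can be pictured as a small tree of lemma/POS/role nodes. The sketch below is a hypothetical illustration of such a structure (the project itself used Connexor parses hand-corrected in TrEd, not this data type):

```python
# Hypothetical IL0-style node: lemma, part of speech, deep syntactic
# role, and dependents. Illustrative only; not the project's format.
from dataclasses import dataclass, field

@dataclass
class IL0Node:
    word: str                      # lemma of the content word
    pos: str                       # part of speech (V, N, PN, Pron, ...)
    role: str                      # deep syntactic role (Root, Subj, Obj, Mod)
    children: list = field(default_factory=list)

    def attach(self, child):
        self.children.append(child)
        return child

# Build a fragment of the slide-13 example sentence:
root = IL0Node("announced", "V", "Root")
subj = root.attach(IL0Node("Mohammed", "PN", "Subj"))
subj.attach(IL0Node("Sheikh", "PN", "Mod"))
mod = root.attach(IL0Node("at", "P", "Mod"))
obj = mod.attach(IL0Node("ceremony", "N", "Obj"))
obj.attach(IL0Node("inauguration", "N", "Mod"))

def show(node, depth=0):
    """Print the tree, one node per line, indented by depth."""
    print("  " * depth + f"{node.word} {node.pos} {node.role}")
    for c in node.children:
        show(c, depth + 1)

show(root)
```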

12
Example of IL0
Sheikh Mohammed, who is also the Defense Minister
of the United Arab Emirates, announced at the
inauguration ceremony that "we want to make Dubai
a new trading center"
TrEd (Pajas, 1998)
13
Example of IL0
  • Sheikh Mohammed, who is also the Defense Minister
    of the United Arab Emirates, announced at the
    inauguration ceremony that "we want to make Dubai
    a new trading center"
  • announced V Root
  • Mohammed PN Subj
  • Sheikh PN Mod
  • Defense_Minister PN Mod
  • who Pron Subj
  • also Adv Mod
  • of P Mod
  • UAE PN Obj
  • at P Mod
  • ceremony N Obj
  • inauguration N Mod

14
Details of IL1
  • Intermediate semantic representation
  • Annotations performed manually, by each annotator
    working alone
  • Associate open-class lexical items with Omega
    Ontology items
  • Replace syntactic relations by one of approx. 20
    semantic (theta) roles (from Dorr), e.g., AGENT,
    THEME, GOAL, INSTR
  • No treatment of prepositions, quantification,
    negation, time, modality, idioms, proper names,
    NP-internal structure
  • Nodes may receive more than one concept
  • Average: about 1.2 concepts per node
  • Manual under development; annotation tool built

15
Example of IL1
Sheikh Mohammed, who is also the Defense Minister
of the United Arab Emirates, announced at the
inauguration ceremony that we want to make Dubai
a new trading center
16
Example of IL1 internal representation
  • The study led them to ask the Czech government to
    recapitalize CSA at this level.
  • 3, lead, V, lead, Root, LEAD&lt;GET, GUIDE
  • 2, study, N, study, AGENT, SURVEY&lt;WORK, REPORT
  • 4, they, N, they, THEME, ---, ---
  • 6, ask, V, ask, PROPOSITION, ---, ---
  • 9, government, N, government, GOAL, AUTHORITIES,
    GOVERNMENTAL-ORGANIZATION
  • 8, Czech, Adj, Czech, MOD, CZECH&lt;CZECHOSLOVAKIA, ---
  • 11, recapitalize, V, recapitalize, PROP,
    CAPITALIZE&lt;SUPPLY, INVEST
  • 12, csa, N, csa, THEME, AIRLINE&lt;LINE, ---
  • 16, at, P, value_at, GOAL, ---, ---
  • 15, level, N, level, ---, DEGREE, MEASURE
  • 14, this, Det, this, ---, ---, ---

Semantic Roles
Concepts from the Omega Ontology
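The records above follow a regular comma-separated layout. A hypothetical parser for that layout (fields inferred from the slide: word index, word, POS, lexeme, theta role, then Omega concepts, where "<" appears to link a concept to its ontology parent, e.g. LEAD<GET):

```python
# Illustrative parser for IL1 annotation records; the field layout is
# inferred from the slide, not taken from project documentation.

def parse_il1(line: str) -> dict:
    fields = [f.strip() for f in line.split(",")]
    index, word, pos, lexeme, role, *concepts = fields
    return {
        "index": int(index),
        "word": word,
        "pos": pos,
        "lexeme": lexeme,
        "theta_role": role,
        # "---" marks an empty concept slot in the slide's notation.
        "concepts": [c for c in concepts if c and c != "---"],
    }

record = parse_il1("3, lead, V, lead, Root, LEAD<GET, GUIDE")
print(record["theta_role"], record["concepts"])
```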
17
Details of IL2 (in development)
  • Start capturing meaning:
  • Handle proper names: one of around 5 classes
    (PERSON, LOCATION, TIME, ORGANIZATION)
  • Conversives (buy vs. sell) at the FrameNet level
  • Non-literal language usage (open the door to
    customers vs. start doing business)
  • Extended paraphrases involving syntax, lexicon,
    grammatical features
  • Possible incorporation of other standardized
    notations for temporal and spatial expressions
  • Still excluded:
  • Quantification and negation
  • Discourse structure
  • Pragmatics

18
Omega ontology
  • Single set of all semantic terms, taxonomized and
    interconnected (http://omega.isi.edu)
  • Merger of existing ontologies and other
    resources:
  • Manually built top structure, from ISI
  • WordNet (110,000 nodes), from Princeton
  • Mikrokosmos (6,000 nodes), from NMSU
  • Penman Upper Model (300 nodes), from ISI
  • 1 million instances (people, locations), from ISI
  • TAP domain relations, from Stanford
  • Undergoing constant reconciliation and pruning
  • Used in several past projects (metadata formation
    for database integration, MT, QA, summarization)
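A taxonomized term set of this kind can be pictured as concepts linked by is-a edges, which is what the LEAD<GET notation on slide 16 suggests. A toy sketch (the data is invented for illustration, not actual Omega content):

```python
# Toy is-a taxonomy in the spirit of Omega; concept names and links
# are invented for illustration only.

PARENT = {            # child -> parent ("is-a") links
    "LEAD": "GET",
    "GET": "EVENT",
    "SURVEY": "WORK",
    "WORK": "EVENT",
    "EVENT": "THING-OR-EVENT",
}

def ancestors(concept: str) -> list:
    """Walk the is-a chain from a concept up to the top of the taxonomy."""
    chain = []
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain

print(ancestors("LEAD"))
```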

19
Dependency parser and Omega ontology
Omega (ISI): 110,000 concepts (WordNet,
Mikrokosmos, etc.), 1.1 million instances. URL:
http://omega.isi.edu
Dependency parser (Prague)
20
Tiamat annotation interface
For each new sentence:
Step 1: find Omega concepts for objects and events
(candidate concepts shown)
Step 2: select event frame (theta roles)
21
Evaluation webpage
22
Evaluation
  • Three approaches to evaluation:
  • Inter-annotator agreement: completed
  • Sentence generation from the extracted annotation
    structure: to be completed
  • Comparison of interlingual structures (graph
    comparisons): not planned
  • Inter-annotator agreement: Is the IL sufficiently
    defined to permit consistent annotation?
  • Impacts ontology and theta-role coverage and
    precision
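The slides do not name the agreement statistic used; Cohen's kappa is one standard choice for two annotators labeling the same items. A minimal sketch on toy theta-role data (labels illustrative):

```python
# Cohen's kappa: chance-corrected agreement between two annotators'
# label sequences. Illustrative; not necessarily the project's measure.
from collections import Counter

def cohens_kappa(a, b):
    assert len(a) == len(b) and a
    n = len(a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Expected chance agreement from each annotator's label distribution.
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[l] * cb[l] for l in set(a) | set(b)) / (n * n)
    return (observed - expected) / (1 - expected)

# Two annotators assigning theta roles to the same six nodes (toy data):
ann1 = ["AGENT", "THEME", "GOAL", "AGENT", "THEME", "AGENT"]
ann2 = ["AGENT", "THEME", "GOAL", "THEME", "THEME", "AGENT"]
print(round(cohens_kappa(ann1, ann2), 3))
```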

23
Annotation Issues
  • 1. Post-annotation consistency checking
  • Novice annotators may make inconsistent
    annotations within the same text.
  • Intra-annotator consistency checking procedure:
  • e.g., if two nodes in different sentences are
    co-indexed, then annotators must ensure that the
    two nodes carry the same meaning in the context
    of the two different sentences
  • 2. Post-annotation reconciliation

24
2. Post-annotation reconciliation
  • Question: How much can annotators be brought into
    agreement?
  • Procedure:
  • Annotator sees all annotations, votes
    Yes/Maybe/No on each
  • Annotators then discuss all differences
    (telephone conference)
  • Annotators then vote again, independently
  • We collapse all Yes and Maybe votes and compare
    them with the No votes to identify all serious
    disagreements
  • Results:
  • Annotators derive a common methodology
  • Small errors and oversights removed during
    discussion
  • Inter-annotator agreement improved
  • Serious problems of interpretation or error
    identified
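The vote-collapsing step above can be sketched as follows: Yes and Maybe are merged into "accept", and an annotation is flagged as a serious disagreement only when some annotators accept it and others vote No (identifiers and data are illustrative):

```python
# Sketch of the reconciliation vote-collapsing step described on the
# slide. Annotation ids and votes are invented for illustration.

def serious_disagreements(votes):
    """votes: {annotation_id: [vote, ...]} with votes in {"Y", "M", "N"}."""
    flagged = []
    for ann_id, vs in votes.items():
        has_accept = any(v in {"Y", "M"} for v in vs)  # Yes/Maybe collapsed
        has_reject = any(v == "N" for v in vs)
        if has_accept and has_reject:
            flagged.append(ann_id)
    return flagged

votes = {
    "node3:AGENT": ["Y", "M", "Y"],   # all accept -> agreement
    "node5:GOAL":  ["Y", "N", "M"],   # split -> serious disagreement
    "node7:THEME": ["N", "N", "N"],   # all reject -> agreement (on No)
}
print(serious_disagreements(votes))
```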

25
Annotation across Translations
  • Question: How different are the translations?
  • Procedure:
  • Annotator sees annotations across both
    translations, identifies differences of form and
    meaning
  • Annotator selects true meaning(s)
  • Results (work still in progress):
  • Impacts ontology richness/conciseness
  • Improvement in interlingua representation depth
  • Useful for IL2 design development
  • Observations:
  • This is very hard work
  • Methodology unclear: what is seen first, how to
    show alternatives, what to do with results

26
Principal problems to date
  • Proper nouns
  • Proposed solution: automatically tag with one of
    6 types (Person, Location, Org, DateTime, etc.)
  • Noun compounds
  • Alternatives: tag the head only; parse and tag
    the whole structure
  • Omega is too rich
  • Concepts hard to distinguish from one another
  • Granularity of concept selection
  • Light verbs
  • Proposed solution: rephrase to remove the light
    verb where possible (take a shower → shower),
    though not every case rephrases cleanly
  • Vagueness and ambiguity
  • Annotate all plausible senses (e.g., propose as
    both URGE and SUGGEST)
  • Idioms and metaphors
  • Proposed solution: ?

27
Discussion and conclusion
  • Results are encouraging
  • But more work must be done to solidify them
  • Outcomes: how have we done?
  • IL design: partly done; IL2 in the works
  • Annotation methodology, manuals, tools, evals:
    yes
  • Annotated parallel texts: approx. 150 done
  • Six texts, two translations, 10-12 annotators
  • Next steps:
  • Foreign-language annotation standards and tools
  • Development of IL2
  • Addressing coverage gaps (1/3 of open-class words
    marked as having no concept)
  • Generation of surface structure from deep
    structure
  • Is it possible?

28
Contact information
  • URLs and Wiki pages
  • Project website: http://aitc.aitcnet.org/nsf/iamtc/