Cross-lingual projection of Semantics - PowerPoint PPT Presentation

Slides: 25
Provided by: sebasti60


1
Cross-lingual projection of Semantics
  • Sebastian Pado
  • IGK Colloquium
  • Dec 16th 2004

2
Overview
  1. Background: Role Semantics
  2. Semantic Projection
  3. Current and Future Work

3
Framework: Role semantics
  • Predicate-argument structure
  • Theta roles: who did what to whom

[Figure: Peter (Agent) gives Mary (Recipient) a book (Theme)]
NB: No treatment of discourse relations,
modality, negation, etc.

4
Flavours of role semantics
  • Top-down approach: common, intuitively defined
    roleset for all verbs
  • give: is Mary Recipient, Goal, or Patient?
  • resemble: Subj vs. Obj?
  • Bottom-up approach: Frame Semantics
  • Frames: conceptual representation of a situation
    (Statement, Giving, Transaction)
  • Each frame is introduced by a target (say, give,
    buy)
  • Roles are frame-specific

5
Frame Semantics
  • An example frame: Giving
  • Targets: give, hand out, receive
  • Roles: Donor, Recipient, Theme
  • The Berkeley FrameNet Project
  • English frame lexicon
  • 200 frames, 2,500 words (V/N/Adj)
  • Typically 3-6 roles per frame
  • Corpus of 60,000 annotated instances

6
Frame Semantics: An Example

7
What do Role Semantics buy us?
  • Surface-independent representation
  • Solves the paraphrase problem
  • Peter gives the book to Mary
  • Mary receives the book from Peter
  • Flexible basis for QA, inference, etc.
  • Aljoscha Burchardt's PhD
  • Common cross-lingual semantic representation

8
Semantic Role Assignment
  • Task: automatic tagging of roles in free text
  • Important for NLP applications
  • Linking (syntax-semantics interface)
  • Statistical modelling (as classification)
  • Frames: semantically coherent sets of targets
  • Targets show linking idiosyncrasies:
  • give: Subj → Donor, DObj → Theme, to-PP/IObj → Recipient
  • get: Subj → Recipient, DObj → Theme, from-PP → Donor
  • Needs lots of training data
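The linking idiosyncrasies above are why role assignment is usually modelled as classification with the target as a feature. The following is a minimal sketch, assuming a toy majority-class classifier over hypothetical (target lemma, grammatical function) features rather than the statistical models actually used. The same function (Subj) maps to different roles for give and get, and a feature pair never seen in training gets no prediction at all.

```python
from collections import Counter, defaultdict

# Hypothetical toy training data: (target lemma, grammatical function, gold role).
# Real systems are trained on thousands of annotated instances (e.g. FrameNet).
TRAINING = [
    ("give", "Subj", "Donor"), ("give", "DObj", "Theme"),
    ("give", "IObj", "Recipient"), ("give", "to-PP", "Recipient"),
    ("get", "Subj", "Recipient"), ("get", "DObj", "Theme"),
    ("get", "from-PP", "Donor"), ("give", "Subj", "Donor"),
]


def train(instances):
    """Count how often each role occurs for a (target, grammatical function) pair."""
    counts = defaultdict(Counter)
    for target, gf, role in instances:
        counts[(target, gf)][role] += 1
    return counts


def classify(model, target, gf):
    """Predict the most frequent role for the feature pair; None if never seen."""
    dist = model.get((target, gf))  # .get() does not create a new entry
    if not dist:
        return None
    return dist.most_common(1)[0][0]


model = train(TRAINING)
print(classify(model, "give", "Subj"))  # Donor
print(classify(model, "get", "Subj"))   # Recipient
print(classify(model, "get", "IObj"))   # None: unseen feature pair
```

The unseen `("get", "IObj")` case shows in miniature why such taggers need lots of training data.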

9
Moving to another language
  • SALSA: manual creation and use of a German corpus
    with semantic annotation
  • Basis: TIGER newspaper corpus, 1.5M words
  • English frames (mostly) work for German
  • The frame concept is language-independent
  • But: annotation is slow and error-prone
  • Total effort > 10 person-years
  • Can we use the English data for German?

10
Overview
  1. Background: Role Semantics
  2. Semantic Projection
  3. Current and Future Work

11
Central idea: Semantic Projection
  • Find a large, parallel bilingual corpus
  • English/German part of EUROPARL (25M words)
  • Assign semantic roles on the English side
  • Train an automatic tagger on English data
  • Project semantics over to German
  • Step 1: find semantic equivalences via word
    alignment
  • Step 2: project the frame
  • Step 3: project the roles
  • Result: a large annotated German corpus
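The three projection steps can be sketched as follows. This is a minimal illustration, assuming a word alignment given as a set of (English index, German index) links and spans as half-open token ranges. The function names and annotation format are hypothetical, mapping a span to the smallest German span covering all aligned tokens is only one simple heuristic, and Theme/Goal are assumed FrameNet-style role labels for Arriving.

```python
def project_span(span, alignment):
    """Map a half-open English span (start, end) to the smallest German span
    covering every German token aligned to it; None if nothing is aligned."""
    start, end = span
    german = sorted(g for e, g in alignment if start <= e < end)
    if not german:
        return None
    return (german[0], german[-1] + 1)


def project(annotation, alignment):
    """Project a frame annotation {'frame', 'target', 'roles'} onto German."""
    target = project_span(annotation["target"], alignment)  # Step 2: frame
    if target is None:
        return None  # unaligned target: nothing to project
    roles = {}
    for role, span in annotation["roles"].items():  # Step 3: roles
        german_span = project_span(span, alignment)
        if german_span is not None:
            roles[role] = german_span
    return {"frame": annotation["frame"], "target": target, "roles": roles}


# English tokens: Peter(0) comes(1) home(2)
german = "Peter kommt nach Hause".split()
alignment = {(0, 0), (1, 1), (2, 2), (2, 3)}  # Step 1: word alignment (hand-made here)
annotation = {"frame": "Arriving", "target": (1, 2),
              "roles": {"Theme": (0, 1), "Goal": (2, 3)}}

result = project(annotation, alignment)
for role, (s, e) in result["roles"].items():
    print(role, " ".join(german[s:e]))
# Theme Peter
# Goal nach Hause
```

A covering span can also swallow unaligned German words lying between aligned ones, which is one way projected annotation becomes noisy.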

12
Projection Example
[Figure: frame Arriving annotated on "Peter comes home" and on its German translation "Peter kommt nach Hause"]
  • Three assumptions are needed to make this work

13
Assumption 1
  • Semantic representation is parallel

[Figure: frame Arriving annotated on "Peter comes home" and on its German translation "Peter kommt nach Hause"]

14
Semantic (im-)parallelism
  • Frame definitions are based on realisable roles
  • German and English are typologically similar
  • Mostly, the same frames are evoked
  • Aspect is problematic
  • Proper differences:
  • We finish by 12 o'clock → Activity_finish
  • Wir sind um 12 Uhr fertig ("We are done at 12 o'clock") → Activity_done_state
  • Same aspect, lexicalised differently:
  • I finish by saying
  • Abschliessend sage ich ("In conclusion, I say")

15
Assumption 2
  • There is always parallel lexical material that is
    semantically equivalent

[Figure: frame Arriving annotated on "Peter comes home" and on its German translation "Peter kommt nach Hause"]

16
(Im)parallelism of lexical material
  • We only need semantic parallelism, and only for
    targets and roles
  • Don't care about discourse, modality, etc.
  • Don't care about exact wording
  • Insights from translation science
  • Translation = recreation of text based on content
    and target-language norms
  • Frame structures capture propositional content
  • Specific register
  • Specific domain (no cultural differences)

17
Assumption 3
  • Word Alignment provides semantic equivalence

[Figure: frame Arriving annotated on "Peter comes home" and on its German translation "Peter kommt nach Hause"]

18
Word Alignment as Semantic Equivalence
  • Current word alignment models use co-occurrence
    to determine alignment
  • But co-occurrence ≠ semantic equivalence
  • decide: entscheiden / Entscheidung treffen ("make a decision")
  • insist: bestehen darauf (lit. "insist on it")
  • Problems: phrasal verbs, idioms, support verbs
    (Funktionsverbgefuege), noise proper
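To see why co-occurrence is not semantic equivalence, here is a minimal sketch of one simple association measure, the Dice coefficient over sentence-level co-occurrence counts, on a hypothetical four-sentence corpus. It stands in for, and is much simpler than, the statistical alignment models the slide refers to. It shows that the support verb treffen ("to meet"; in Entscheidung treffen, "to make") can score higher for decide than the one-word translation entscheiden.

```python
from collections import Counter

# Hypothetical toy parallel corpus (English, German).
PAIRS = [
    ("we decide", "wir entscheiden"),
    ("we must decide", "wir müssen eine Entscheidung treffen"),
    ("they decide today", "sie treffen heute eine Entscheidung"),
    ("we meet today", "wir treffen uns heute"),
]


def dice_scores(pairs):
    """Dice coefficient 2*c(e,g) / (c(e) + c(g)), where c counts the sentence
    pairs in which a word (or word pair) occurs."""
    c_en, c_de, c_both = Counter(), Counter(), Counter()
    for en, de in pairs:
        en_types, de_types = set(en.lower().split()), set(de.lower().split())
        c_en.update(en_types)
        c_de.update(de_types)
        c_both.update((e, g) for e in en_types for g in de_types)
    return {(e, g): 2 * n / (c_en[e] + c_de[g]) for (e, g), n in c_both.items()}


scores = dice_scores(PAIRS)
for word in ("entscheiden", "entscheidung", "treffen"):
    print(word, round(scores[("decide", word)], 2))
# entscheiden 0.5
# entscheidung 0.8
# treffen 0.67
```

The article eine also reaches 0.8 with decide here, another small illustration of noise.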

19
Overview
  1. Background: Role Semantics
  2. Semantic Projection
  3. Current and Future Work

20
Current Work (1)
  • Empirical assessment of assumptions
  • Manual annotation of parallel corpus sample
  • Independent annotation of German / English
  • Evaluation of semantic parallelism
  • Evaluation of lexical parallelism
  • Evaluation of automatic word alignment
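For the evaluation of automatic word alignment against the manual annotation, a minimal sketch computing precision, recall and F1 over alignment links might look like this. The link format is an assumption, and the predicted alignment below is invented. Published alignment evaluations often use the alignment error rate with separate sure and possible links instead.

```python
def evaluate_alignment(predicted, gold):
    """Precision, recall and F1 of predicted alignment links against gold links;
    a link is an (english_index, german_index) pair."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f1


# "Peter comes home" / "Peter kommt nach Hause"
gold = {(0, 0), (1, 1), (2, 2), (2, 3)}       # manual alignment
predicted = {(0, 0), (1, 1), (1, 2), (2, 3)}  # invented output: "nach" attached to "comes"
print(evaluate_alignment(predicted, gold))  # (0.75, 0.75, 0.75)
```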

21
Current Work (2)
  • Token-wise word alignment is too noisy
  • decide → treffen: should treffen evoke Deciding?
  • Instead: find reliable type equivalences
  • Statistics over the complete corpus, filtering
  • Removal of German collocations
  • Result: German frame lexicon
  • Target x can evoke frames a, b, c
  • Project a frame only if licensed by the German lexicon
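The type-level filtering step could look roughly like this. It is a sketch under several assumptions: token-level projections are given as hypothetical (frame, German lemma) pairs, "reliable" simply means seen at least `min_count` times, and collocation removal is a hand-made stoplist (`collocates`). Thresholds and data are illustrative only.

```python
from collections import Counter, defaultdict


def build_lexicon(observations, min_count=2, collocates=frozenset()):
    """Turn token-level (frame, German lemma) projections into a type-level
    lexicon lemma -> set of frames, keeping only frequent pairs and skipping
    lemmas listed as collocation parts (e.g. support verbs)."""
    counts = Counter(observations)
    lexicon = defaultdict(set)
    for (frame, lemma), n in counts.items():
        if n >= min_count and lemma not in collocates:
            lexicon[lemma].add(frame)
    return dict(lexicon)


def licensed(lexicon, lemma, frame):
    """Project a frame onto a German target only if the lexicon allows it."""
    return frame in lexicon.get(lemma, set())


observations = ([("Deciding", "entscheiden")] * 3 + [("Deciding", "Entscheidung")] * 2
                + [("Deciding", "treffen")] * 2 + [("Deciding", "heute")]
                + [("Arriving", "kommen")] * 4)
lexicon = build_lexicon(observations, collocates={"treffen"})
print(licensed(lexicon, "entscheiden", "Deciding"))  # True
print(licensed(lexicon, "treffen", "Deciding"))      # False: collocation part
print(licensed(lexicon, "heute", "Deciding"))        # False: too rare (noise)
```

The resulting dictionary is exactly the "target x can evoke frames a, b, c" view: each German lemma maps to the set of frames it may evoke.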

22
Current Work (3)
  • Projection of roles: find equivalences between
    constituents
  • Define pairwise similarities
  • Efficiently identify best match
  • Graph matching
  • Probabilistic model
  • Choice points:
  • Definition of similarities
  • Bijective correspondence, yes or no?
  • Implementation
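Role projection as constituent matching, with both answers to the bijectivity question, can be sketched as follows. The sketch assumes that similarity is the Jaccard overlap between the aligned image of an English role filler and a German candidate constituent. The constituents and spans are hypothetical, and the brute-force search stands in for the graph matching or probabilistic model on the slide; it is only feasible for small inputs.

```python
from itertools import permutations


def similarity(en_tokens, de_tokens, alignment):
    """Jaccard overlap between the German image of an English constituent
    (tokens aligned to it) and a German candidate constituent."""
    image = {g for e, g in alignment if e in en_tokens}
    union = image | de_tokens
    return len(image & de_tokens) / len(union) if union else 0.0


def match_greedy(roles, candidates, alignment):
    """No bijectivity: each role independently takes its best candidate."""
    return {role: max(candidates, key=lambda c: similarity(toks, candidates[c], alignment))
            for role, toks in roles.items()}


def match_bijective(roles, candidates, alignment):
    """Bijective: one-to-one assignment maximising total similarity (brute force)."""
    names = list(roles)
    if len(candidates) < len(names):
        raise ValueError("need at least as many candidates as roles")
    best = max(permutations(candidates, len(names)),
               key=lambda perm: sum(similarity(roles[r], candidates[c], alignment)
                                    for r, c in zip(names, perm)))
    return dict(zip(names, best))


# English: Peter(0) gives(1) Mary(2) a(3) book(4); German: Peter gibt Maria ein Buch
alignment = {(i, i) for i in range(5)}
roles = {"Recipient": {2}, "Theme": {3, 4}}
# Hypothetical candidates from a German parse with no separate constituent for "Maria"
candidates = {"Peter": {0}, "Maria ein Buch": {2, 3, 4}, "Buch": {4}}

print(match_greedy(roles, candidates, alignment))
# {'Recipient': 'Maria ein Buch', 'Theme': 'Maria ein Buch'}
print(match_bijective(roles, candidates, alignment))
# {'Recipient': 'Maria ein Buch', 'Theme': 'Buch'}
```

Without bijectivity both roles claim the same phrase. Enforcing it resolves the conflict, but only by settling for the partial match Buch. Which behaviour is preferable is precisely the open choice point.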

23
Future Work
  • Thorough evaluation
  • Filtering
  • Projection will be noisy
  • Training a German semantic tagger
  • Evaluation with respect to coverage and accuracy
  • Combination with manually annotated data (SALSA)
  • Using another language
  • English/French part of EUROPARL

24
Conclusion
  • Automatic creation of semantically annotated data
    for a new language
  • Projection of annotation from a known
    language using a word-aligned parallel corpus
  • Theory in place
  • Potential problems:
  • Semantics may diverge
  • Lexical material may diverge
  • Word alignment is noisy
  • Empirical evaluation underway