Title: Interlingua-based MT
1Interlingua-based MT
2Interlingua-based Machine Translation
Interlingua
- Syntactic transfer-based MT
- Couples the syntax of the two languages
- What if we abstract away the syntax?
- All that remains is meaning
- Meaning is the same across languages
- Simplicity: only N components (one analyzer/generator pair per language) are needed to translate among N languages
- Two small problems
- What is meaning?
- How do we represent meaning?
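The simplicity argument can be checked with a little arithmetic. The sketch below is only an illustration: it assumes that a transfer system needs one module per ordered language pair, and that an interlingua system needs one analyzer and one generator per language (so N per-language components amount to 2N modules when the two halves are counted separately). The function names are made up for this example.

```python
def transfer_modules(n: int) -> int:
    """One transfer module per ordered (source, target) language pair."""
    return n * (n - 1)


def interlingua_modules(n: int) -> int:
    """One analyzer plus one generator per language."""
    return 2 * n


for n in (2, 3, 10):
    print(f"{n} languages: transfer={transfer_modules(n)}, "
          f"interlingua={interlingua_modules(n)}")
```

The gap grows quadratically: adding a language to a transfer system means adding modules to and from every existing language, while an interlingua system adds one analyzer and one generator.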
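[Figure: MT pyramid. Direct MT maps Source directly to Target. Transfer-based MT goes from Source through Parsing to a Syntactic Structure, transfers it to a target Syntactic Structure, and reaches Target through Syntactic Generation. Interlingua sits at the top: Semantic Interpretation leads up to it and Semantic Generation leads down from it.]
[Figure: English, Spanish and Japanese analyzers all produce one Interlingual representation, from which English, Spanish and Japanese generators produce text.]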
3Example of Interlingua Machine Translation
Interlingua representation
4Ingredients of a semantic representation
- language neutral
- Syntactic variations should result in the same semantics
- sense of a word
- deep semantic role labels
- scope of quantifiers, adverbials, adjectives
- polarity information
- Distinguish between
- surface structure (syntactic structure) and
- deep structure (semantic structure) of sentences.
- Different forms of semantic representation
- logic formalisms
- ontology / semantic representation languages
- Case Frame Structures (Fillmore)
- Conceptual Dependency Theory (Schank)
- Description Logic (DL) and similar KR languages
- Ontologies
5Text Meaning Representation
- Lexicon has two components
- Syntactic part
- Semantic constraints part
- Given a sentence, the syntactic part analyzes the input syntactically, and the semantic constraints create semantic expressions that can be evaluated.
- Ontology specifies the type hierarchy
- Used for checking selectional restrictions
- Selectional restrictions are used for word-sense disambiguation
- e.g. an accident is an event; an organization has humans
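A minimal sketch of how a type hierarchy can support selectional-restriction checks. Everything in it is an assumption made for illustration: the concept names, the toy hierarchy, and the two senses of "make" are not taken from any real ontology or lexicon.

```python
# Toy type hierarchy: child concept -> parent concept (illustrative only).
IS_A = {
    "accident": "event",
    "manufacturing-activity": "event",
    "tool": "artifact",
    "artifact": "object",
    "human": "object",
    "event": "all",
    "object": "all",
}


def is_a(concept, ancestor):
    """True if `concept` equals `ancestor` or lies below it in the hierarchy."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = IS_A.get(concept)
    return False


# Hypothetical senses of "make", each with restrictions on its roles.
SENSES = {
    "make-v1": {"agent": "human", "theme": "artifact"},  # manufacture
    "make-v2": {"agent": "human", "theme": "event"},     # cause to happen
}


def compatible_senses(agent, theme):
    """Keep only the senses whose selectional restrictions are satisfied."""
    return [
        name
        for name, restriction in SENSES.items()
        if is_a(agent, restriction["agent"]) and is_a(theme, restriction["theme"])
    ]


print(compatible_senses("human", "tool"))
print(compatible_senses("human", "accident"))
```

Because a tool is an artifact but not an event, only the manufacturing sense survives for "make tools"; the reverse holds for an accident.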
6Constructing a Semantic Representation
- General approach
- Start with surface structure derived from parser.
- Map surface structure to semantic structure
- Use phrases as sub-structures.
- Find concepts and representations for central phrases (e.g. VP, NP, then PP)
- Assign phrases to appropriate roles around central concepts (e.g. bind PP into VP representation).
7Semantic Representation
- Semantic Representations are based on some form of (formal) Representation Language.
- Semantic Networks
- Conceptual Dependency Graphs
- Case Frames
- Ontologies
- DL and similar KR languages
- Important note: there is a difference between relations between text strings and referents in the world.
8Ontology (Interlingua) approach
- Ontology: a language-independent classification of objects, events, relations
- A Semantic Lexicon, which connects lexical items to nodes (concepts) in the ontology
- An analyzer that constructs Interlingua representations and selects an appropriate one
9Semantic Lexicon
- Provides a syntactic context for the appearance of the lexical item
- Provides a mapping for the lexical item to a node in the ontology (or more complex associations)
- Provides connections from the syntactic context to semantic roles and constraints on these roles
10Constructing an InterLingua Representation
- For each syntactic analysis
- Access all semantic mappings and contexts for each lexical item.
- Create all possible semantic representations.
- Test them for coherency of structure and content.
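The generate-and-test procedure above can be sketched as follows. This is an illustration under stated assumptions, not the analyzer's actual algorithm: the candidate senses, the dependency triples, and the tiny hierarchy are all invented for the example, and "coherency" is reduced to checking role constraints against an is-a hierarchy.

```python
from itertools import product

# Toy hierarchy (child -> parent); all names are illustrative assumptions.
IS_A = {
    "cancer-disease": "disease",
    "cancer-zodiac": "zodiac-sign",
    "specialist": "human",
}


def is_a(concept, ancestor):
    while concept is not None:
        if concept == ancestor:
            return True
        concept = IS_A.get(concept)
    return False


# All semantic mappings found in a hypothetical lexicon for each word.
CANDIDATES = {
    "diagnosis": [{"concept": "diagnose", "agent": "human", "theme": "disease"}],
    "cancer": [{"concept": "cancer-disease"}, {"concept": "cancer-zodiac"}],
    "specialist": [{"concept": "specialist"}],
}

# One syntactic analysis, reduced to (head, role, dependent) triples.
DEPENDENCIES = [("diagnosis", "theme", "cancer"), ("diagnosis", "agent", "specialist")]


def all_readings():
    """Create every combination of one sense per lexical item."""
    words = list(CANDIDATES)
    for combination in product(*(CANDIDATES[w] for w in words)):
        yield dict(zip(words, combination))


def coherent(reading):
    """A reading is coherent if every dependent satisfies its head's constraint."""
    for head, role, dependent in DEPENDENCIES:
        required = reading[head].get(role)
        if required is not None and not is_a(reading[dependent]["concept"], required):
            return False
    return True


coherent_readings = [r for r in all_readings() if coherent(r)]
print(coherent_readings)
```

Of the two readings generated here, only the one that treats "cancer" as a disease passes the test, since the zodiac sense cannot fill the theme of a diagnosis.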
11Basic Semantic Dependency - Example
Input: John makes tools
Syntactic Analysis:
  root     make
  cat      verb
  tense    present
  subject  root john, cat noun-proper
  object   root tool, cat noun, number plural
12Lexicon Entries for John and tool
John-n1
  syn-struc  root john, cat noun-proper
  sem-struc  human: name john, gender male
tool-n1
  syn-struc  root tool, cat n
  sem-struc  tool
13Ontological Representation - Example
Relevant extract from the specification of the ontological concept used to describe the appropriate meaning of make:
manufacturing-activity ...
  agent  human
  theme  artifact
14Semantic Dependency Component
The basic semantic dependency component of the Text Meaning Representation (TMR) for "John makes tools":
manufacturing-activity-7
  agent  human-3
  theme  set-1
    element      tool
    cardinality  > 1
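The path from syntactic analysis to this dependency structure can be sketched in a few lines. This is a simplified illustration: the lexicon entry for "make" (mapping subject to agent and object to theme) is an assumption, since the slides only show the entries for John and tool, and the instance numbers are simply counted from 1 rather than matching the ones in the example.

```python
from collections import defaultdict

# Syntactic analysis of "John makes tools", as on the example slide.
SYNTAX = {
    "root": "make", "cat": "verb", "tense": "present",
    "subject": {"root": "john", "cat": "noun-proper"},
    "object": {"root": "tool", "cat": "noun", "number": "plural"},
}

# Simplified lexicon; the "make" entry and its role mapping are hypothetical.
LEXICON = {
    "john": {"concept": "human"},
    "tool": {"concept": "tool"},
    "make": {"concept": "manufacturing-activity",
             "roles": {"subject": "agent", "object": "theme"}},
}

_counters = defaultdict(int)


def new_instance(concept):
    """Return a fresh instance name such as 'human-1'."""
    _counters[concept] += 1
    return f"{concept}-{_counters[concept]}"


def noun_meaning(phrase):
    """Plural nouns become a set with cardinality > 1; others a single instance."""
    concept = LEXICON[phrase["root"]]["concept"]
    if phrase.get("number") == "plural":
        return {"instance": new_instance("set"), "element": concept, "cardinality": "> 1"}
    return {"instance": new_instance(concept)}


def build_tmr(syntax):
    """Instantiate the verb's concept and attach its arguments as case roles."""
    verb = LEXICON[syntax["root"]]
    frame = {"instance": new_instance(verb["concept"])}
    for function, role in verb["roles"].items():
        if function in syntax:
            frame[role] = noun_meaning(syntax[function])
    return frame


tmr = build_tmr(SYNTAX)
print(tmr)
```

A real analyzer would also check the fillers against the ontological constraints on agent and theme before accepting the structure.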
15Semantic representation of try-v3
try-v3
  syn-struc
    root   try
    cat    v
    subj   root var1, cat n
    xcomp  root var2, cat v, form OR infinitive gerund
  sem-struc
    set-1
      element-type  refsem-1
      cardinality   > 1
    refsem-1
      sem     event
      agent   var1
      effect  refsem-2
    modality
      modality-type   epiteuctic
      modality-scope  refsem-2
      modality-value  < 1
    refsem-2
      value  var2
      sem    event
Meaning: a non-finished action whose outcome is unclear
16Why is Iraq developing weapons of mass
destruction?
17Word sense Disambiguation
- Methods
- Constraint checking
- make sure the constraints imposed on context are met
- Graph traversal
- is-a links are inexpensive
- other links are more expensive
- the cheapest structure is the most coherent one
- Hunter-gatherer processing
- find (hunt) and eliminate (kill) unlikely interpretations
- collect (gather) remaining interpretations
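The graph-traversal method can be illustrated with a shortest-path search over a weighted ontology graph. The sketch uses Dijkstra's algorithm as one plausible way to find the cheapest connection; the slides do not say which search is used. The graph, the link names, and the concrete costs (1 for is-a, 5 for anything else) are assumptions chosen only to make the example run.

```python
import heapq

# Toy ontology graph: concept -> [(neighbour, link type)]. Illustrative only.
LINKS = {
    "cancer-disease": [("disease", "is-a")],
    "disease": [("diagnose", "theme-of")],
    "cancer-zodiac": [("zodiac-sign", "is-a")],
    "zodiac-sign": [("horoscope", "part-of")],
    "horoscope": [("human", "about")],
    "human": [("diagnose", "agent-of")],
}

IS_A_COST = 1    # is-a links are inexpensive (assumed value)
OTHER_COST = 5   # all other links are more expensive (assumed value)


def path_cost(start, goal):
    """Cost of the cheapest path from start to goal (Dijkstra's algorithm)."""
    best = {start: 0}
    queue = [(0, start)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == goal:
            return cost
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbour, link in LINKS.get(node, []):
            step = IS_A_COST if link == "is-a" else OTHER_COST
            new_cost = cost + step
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                heapq.heappush(queue, (new_cost, neighbour))
    return float("inf")


def cheapest_sense(candidates, head_concept):
    """Pick the candidate sense with the cheapest link to the head concept."""
    return min(candidates, key=lambda sense: path_cost(sense, head_concept))


print(cheapest_sense(["cancer-zodiac", "cancer-disease"], "diagnose"))
```

In this toy graph the disease sense reaches "diagnose" in two steps, while the zodiac sense needs a long chain of expensive links, so the disease reading is judged more coherent.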
18Ontological Semantics: An example semantic
representation language
19Ontological semantics is a computationally
tractable theory of meaning in natural language
as well as a suite (OntoSem) of implemented NLP
programs and a set of static knowledge resources
that support these programs. Ontological
semantics deals directly with extraction,
representation and manipulation of text meaning.
OntoSem text analyzers produce interpreted
knowledge ready to be used in reasoning-heavy
applications that include question answering,
cross-document and cross-lingual text
summarization, machine translation and others.
Support of intelligent human-computer
interaction in domain- and task-oriented
environments is squarely within the purview of
ontological semantics.
20Ontological semantics concentrates on the content of
representations and is adaptable to a number of
different representation formats. Ontological
semantics is both a producer and a consumer of
knowledge: deriving text meaning is itself a
knowledge-intensive task.
21OntoSem
- is devoted to processing naturally occurring texts
- strives for high-quality results first, followed by concern for broad coverage
- expects unexpected inputs
- seeks quality heuristics of any provenance (knowledge-based or probabilistic, co-occurrence-based)
- does not grant syntax a privileged position among the providers of heuristics for semantic processing
- does not make a strong distinction between semantics and pragmatics
- is applicable to any natural language
22Ontological-semantic analyzers take natural
language texts as inputs and generate
machine-tractable text meaning representations
(TMRs) that form the basis of various reasoning
processes. Sample input sentence: "Iran, Iraq
and North Korea on Wednesday rejected an
accusation by President Bush that they are
developing weapons of mass destruction." The TMR
(presented graphically) for the above is as
follows.
23Output: A Text Meaning Representation (TMR)
This presentation is simplified; the system, in fact, derives much more from text.
- event instances are shown in ellipses
- object instances, in rectangles
- only case role and set membership relations are shown (as labels on links)
- numerical constraints can be fuzzy, as in the cardinality of SET-1226
24A pretty-printed fragment of the actual TMR
representation for sample input
25Ontological-semantic systems centrally rely on the following static knowledge resources
- a language-independent ontology that includes knowledge about types of entities in the world, e.g., ATHLETE, WELD or SPEED
- ontology-oriented lexicons (and onomasticons, or lexicons of proper names) for each natural language in the system, and
- a fact repository containing instances of ontological concepts, e.g., Andre Agassi (ATHLETE-3176) or the Apollo 13 mission (SPACEFLIGHT-142)
26A Sample Screen of the Ontology/Lexicon/Fact
Repository Browsing and Editing Environment
29(diagnosis
  (diagnosis-n1
    (cat n)
    (anno
      (def "")
      (ex "The diagnosis (of cancer) (by the specialist) was made quickly")
      (comments ""))
    (syn-struc
      ((root var0) (cat n)                                   ; diagnosis
       (pp-adjunct ((root of) (root var1) (cat prep) (opt )  ; of
                    (obj ((root var2) (cat n)))))            ; disease
       (pp-adjunct ((root by) (root var3) (cat prep) (opt )  ; by
                    (obj ((root var4) (cat n)))))))          ; someone
    (sem-struc
      (DIAGNOSE                                              ; the ontological mapping
        (agent (value var4))                                 ; the case roles
        (theme (value var2)))
      (var1 (null-sem ))                                     ; blocks compositional analysis of preps
      (var3 (null-sem ))))
)
30(cancer
  (cancer-n1
    (cat n)
    (anno (def "a disease") (ex "") (comments ""))
    (syn-struc
      ((n ((root var1) (cat n) (opt )))   ; animal part as modifier
       (root var0) (cat n)))              ; cancer
    (sem-struc
      (CANCER (location (value var1) (sem animal-part)))))
31(cancer-n2
    (cat n)
    (anno (def "a sign of the zodiac") (ex "") (comments ""))
    (syn-struc ((root var0) (cat n)))
    (sem-struc (CANCER-ZODIAC))))
32Currently Available Static Knowledge Sources for English
- Ontology of about 6,500 concepts (about 95,000 property-value pairs)
- English lexicon of about 40,000 entries
- Fact repository of about 20,000 facts (outside medical domain)
- English Onomasticon of about 350,000 entries
- Tokenization knowledge, morphological and syntactic grammars for a number of languages
33The analyzer's conceptual architecture
(in reality, not strictly pipelined)
[Figure: Processing Modules: Preprocessor, Syntactic Analyzer, Semantic Analyzer, with the TMR as output. Static Knowledge Resources: Grammar (Ecology, Morphology, Syntax); Lexicon and Onomasticon; Ontology and Fact Repository.]
34The basic (who did what to whom) semantic dependency is derived, in the general case, on the basis of
- lexical-semantic expectations (selectional restrictions) recorded in the ontology and the lexicon, and
- syntactic dependency derived from the results of syntactic analysis.
35The beginnings of system evaluation
- Run I: raw
- Run II: preprocessor output correct
- Run III: preprocessor and syntactic analysis output correct
36In addition to the basic semantic dependency,
TMRs also include parameterized information
provided by the microtheories of aspect, modality
(including speaker attitudes), time, style and
others. Most of these microtheories have been
implemented. All would benefit from further
work. We are also actively looking
into possibilities of borrowing some
microtheories -- either in toto or partially.
41FrameNet: Another example of semantic representation
- Frame Semantics (Fillmore 1976, 1977, ..)
- Frame: a conceptual structure or prototypical situation
- Frame elements (roles)
- Identify participants of the situation
- Are local to their frame
- Frame evoking elements (verbs, nouns, adjectives) introduce frames
- E.g. VERDICT
- [The jury]Judge convicted [him]Defendant on the counts of [theft]Charges.
- On Thursday [a jury]Judge found [the youth]Defendant [guilty of wounding Mr Lay]Finding
- Berkeley FrameNet Project
- Database of frames for core lexicon of English
- Current release: 610 frames, about 9000 lexical units
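A frame with its local roles and evoking elements can be modelled as a small data structure. The sketch below is hand-built for illustration and does not use the FrameNet database or its API; the class names, the lemma list, and the annotated spans are assumptions based on the VERDICT example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Frame:
    name: str
    elements: frozenset   # frame elements (roles), local to this frame
    evokers: frozenset    # lemmas of frame-evoking elements


@dataclass
class FrameInstance:
    frame: Frame
    target: str
    roles: dict = field(default_factory=dict)

    def fill(self, element, text):
        """Attach a text span to a role, rejecting roles foreign to the frame."""
        if element not in self.frame.elements:
            raise ValueError(f"{element} is not a role of {self.frame.name}")
        self.roles[element] = text


def evoke(frame, lemma):
    """Create a frame instance if the lemma is a known evoker of the frame."""
    if lemma not in frame.evokers:
        raise ValueError(f"{lemma} does not evoke {frame.name}")
    return FrameInstance(frame, lemma)


VERDICT = Frame(
    name="VERDICT",
    elements=frozenset({"Judge", "Defendant", "Charges", "Finding"}),
    evokers=frozenset({"convict", "find"}),
)

instance = evoke(VERDICT, "convict")
instance.fill("Judge", "The jury")
instance.fill("Defendant", "him")
instance.fill("Charges", "theft")
print(instance.roles)
```

Rejecting a role such as Inmates for a VERDICT instance is what "local to their frame" amounts to in this representation.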
42Types of Relations
- FrameNet Relations
- Frame hierarchy: inherits
- Subframes
- Contextual Relations between instantiated frames and roles
- Syntactic and/or semantic embedding
- Discourse relations
- Anaphoric relations
- Inferences
- On the basis of both
43A Case Study
- In the first trial in the world in connection
with the terrorist attacks of 11 September 2001,
the Higher Regional Court of Hamburg has passed
down the maximum sentence. Mounir al Motassadeq
will spend 15 years in prison. The 28-year-old
Moroccan was found guilty as an accessory to
murder in more than 3000 cases.
44FrameNet as a Net: Frame-to-Frame Relations
- Subframe relation
- Super frame represents complex event
- Subframes represent sub-events
- Subframes usually inherit some roles of the super frame
[Figure: Criminal process with its subframes Arrest, Arraignment, Trial, Sentencing]
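The subframe relation can be stored as a simple child-to-parent mapping, which is enough to ask whether two frames belong to the same complex event. This is a minimal sketch using the Criminal process example; the function names are hypothetical and the mapping covers only the frames named on the slide.

```python
# Subframe -> super frame, for the Criminal process example only.
SUBFRAME_OF = {
    "Arrest": "Criminal process",
    "Arraignment": "Criminal process",
    "Trial": "Criminal process",
    "Sentencing": "Criminal process",
}


def superframes(frame):
    """All super frames of `frame`, nearest first."""
    chain = []
    while frame in SUBFRAME_OF:
        frame = SUBFRAME_OF[frame]
        chain.append(frame)
    return chain


def common_superframe(frame_a, frame_b):
    """Nearest super frame shared by both frames, or None."""
    ancestors = set(superframes(frame_a))
    for candidate in superframes(frame_b):
        if candidate in ancestors:
            return candidate
    return None


print(common_superframe("Trial", "Sentencing"))
print(common_superframe("Trial", "Prison"))
```

Trial and Sentencing share the Criminal process super frame, so a frame relation holds between them; Prison is not part of this mapping, so no relation is found.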
45Local Roles
- In the first trial in the world in connection with [the [terrorist]Assailant attacks of [11 September 2001]Time]Case, [the Higher Regional Court of Hamburg]Court has passed down the [maximum]Type sentence.
46Local Roles
- [Mounir al Motassadeq]Inmates will spend [15 years]Duration in prison.
47Local Roles
- [The 28-year-old Moroccan]Defendant was found [guilty]Finding [as an accessory to [murder]FocalEntity in [more than 3000 cases]Victim]Charge.
48Unfilled Roles
Target     Frame       Frame role        Filler (given vs. induced)
trial      TRIAL       CASE              terrorist attacks (1)
                       CHARGE            accessory to murder (2)
                       COURT             Higher Regional Court (3)
                       DEFENDANT         ... 28-year-old Moroccan (4)
attacks    ATTACK      ASSAILANT         terrorist (5)
                       VICTIM            ... (6)
                       TIME (exth.)      11 September 2001 (7)
sentence   SENTENCING  CONVICT           Mounir al Motassadeq (8)
                       COURT             Higher Regional Court (9)
                       TYPE              ... maximum sentence (10)
prison     PRISON      INMATES           ... Mounir al Motassadeq (11)
                       DURATION (exth.)  15 years (12)
found      VERDICT     CASE              terrorist attacks (13)
                       CHARGE            accessory to murder (14)
                       DEFENDANT         28-year-old Moroccan (15)
                       FINDING           ... guilty (16)
accessory  ASSISTANCE  CO-AGENT          (17)
                       FOCAL_ENTITY      murder (18)
                       HELPER            ... 28-year-old Moroccan (19)
51Linking Frames and Roles in Context
- At the instance level
- given frame instances f1 (of type F1) and f2 (of type F2), where
- f1 and f2 stand in a contextual relation (syn, sem, discourse)
- frame types F1 and F2 stand in some frame relation
- => identify role instances (referents) of f1 and f2 (r1 ( r0) r2)
[Figure labels: inferred relation; frame relation; context-related instances]
52Linking Frames and Roles in Context
- In the first trial in the world in connection
with the terrorist attacks of 11 September 2001,
the Higher Regional Court of Hamburg has passed
down the maximum sentence.
[Figure labels: Criminal Process; Trial; Sentencing; Court; frame relation]
53Linking Frames and Roles in Context
- In the first trial (f1) in the world in
connection with the terrorist attacks of 11
September 2001, the Higher Regional Court of
Hamburg (r2) has passed down the maximum
sentence (f2).
[Figure labels: Criminal Process; Trial; Sentencing; Court; Functional Embedding; The Higher Regional Court of Hamburg; frame relation; context-related instances]
54Linking Frames and Roles in Context
In the first trial (f1) in the world in
connection with the terrorist attacks of 11
September 2001, the Higher Regional Court of
Hamburg (r2r0 r1) has passed down the maximum
sentence (f2).
[Figure labels: Criminal Process; Trial; Sentencing; Court; Functional Embedding; The Higher Regional Court of Hamburg; frame relation; context-related instances; inferred relation]
55Linking Frames and Roles in Context
- At the type level (more involved)
- If instances of frame roles f1 (of F1) and f2 (of F2) are often found co-referent within particular contextual relations
- => Hypothesize a frame relation between F1 and F2
[Figure labels: inferred relation; (no) frame relation; context-related instances]
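The type-level idea can be sketched as simple counting: collect role pairs that were found co-referent in context, and propose a role binding when a pair recurs often enough. The observations and the threshold below are invented for illustration; the slides only say "often" and give no numbers.

```python
from collections import Counter

# Each observation: (frame 1, role 1, frame 2, role 2) whose fillers were found
# co-referent within some contextual relation. Hypothetical data.
OBSERVATIONS = [
    ("Sentencing", "Convict", "Prison", "Inmates"),
    ("Sentencing", "Convict", "Prison", "Inmates"),
    ("Sentencing", "Convict", "Prison", "Inmates"),
    ("Trial", "Court", "Prison", "Duration"),
]


def hypothesize_role_bindings(observations, min_count=2):
    """Return role pairs seen co-referent at least `min_count` times."""
    counts = Counter(observations)
    return sorted(pair for pair, n in counts.items() if n >= min_count)


print(hypothesize_role_bindings(OBSERVATIONS))
```

With this data only the Convict/Inmates pair passes the threshold, which would suggest a new relation between the Sentencing and Prison frames.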
56Linking Frames and Roles in Context
... the Higher Regional Court of Hamburg has
passed down the maximum sentence. Mounir al
Motassadeq will spend 15 years in prison.
- New Frame Relation
- (Role Binding: Convict / Inmates)
[Figure labels: Sentencing (Convict); Prison (Inmates); Discourse Relation (Co-reference); inferred relations; (no) frame relation; context-related instances]
57Frame, Contextual, and Inferred Relations
[Figure: network of frames and roles under CRIMINAL PROCESS. Numbers in parentheses are sentence numbers.
- TRIAL (1): Defendant, Case, Charge, Court
- SENTENCING (1): Convict, Type, Court
- PRISON (2): Inmates, Duration
- VERDICT (3): Defendant, Case, Charge, Finding
- ASSISTANCE (3): Helper, Co_agent, Focal_entity
- KILLING (3): Killer, Victim
Legend: Subframe/FE; Contextual Relation; Inferred Relation]
58[Figure: the frame network with fillers. Frames: CRIMINAL PROCESS, TRIAL, SENTENCING, PRISON, VERDICT, ASSISTANCE, KILLING. Filled roles: Inmates (Motus.), Duration (15Y), Duration (maximum), Court (Hmbg.), Case (9/11), Defendant (the Moroccan), Charge (accessory), Goal (murder), Victim (3000). Unfilled roles: Defendant, Convict, Charge, Case, Helper, Co_agent, Killer. Legend: Hierarchy/Subframe/FE; Contextual Relations; Inference]
In the first trial .. the Higher Regional Court
.. has passed down the maximum sentence. Mounir
al Motassadeq will spend 15 years in prison. The
28-year-old Moroccan was found guilty as an
accessory to murder in .. 3000 cases.
59Statistical Semantic Role Labeling
60References
- Jurafsky, D. and J. H. Martin, Speech and Language Processing, Prentice-Hall, 2000. (Chapters 9 and 10)
- Helmreich, S., From Syntax to Semantics, Presentation in the 74.419 Course, November 2003.
- Nirenburg, S. and V. Raskin, Ontological Semantics, MIT Press, 2004.
- WordNet, http://wordnet.princeton.edu/
- Suggested Upper Merged Ontology (SUMO), http://www.ontologyportal.org/