Title: The CMU Babylon Interlingua
The CMU Babylon Interlingua
- Lori Levin, Alon Lavie, Donna Gates, Dorcas Wallace, Kay Peterson, Ahmed Badran
Outline
- Strengths of task-based interlingua
- How to evaluate an interlingua
Examples of Task-Oriented Sentences
- What is your given/family name?
- What is your nationality?
- Where were you born?
- How old are you?
- Do you have any identification?
- I sprained my ankle yesterday.
- The headache started three days ago.
Examples of Descriptive Sentences
- Yesterday I slipped on some rubble on my way to the market.
- The headache started after the soldiers searched my house and took all of our food.
Multilingual Translation with an Interlingua
- (Diagram: analyzers for French, German, English, Italian, Japanese, Chinese, Arabic, and Korean map each source sentence into the interlingua; generators for the same languages produce output from the interlingua.)
Advantages of Interlingua
- Avoid the n-squared problem for all-ways translation (a quick arithmetic sketch follows this list).
- Mono-lingual grammar development teams.
- Paraphrase
- Generate a new source-language sentence from the interlingua so that the user can confirm the meaning.
- Add a new language easily
- Get all-ways translation to all previous languages by adding one grammar for analysis and one grammar for generation.
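A back-of-the-envelope illustration of the n-squared point (my own arithmetic, written as a small Python sketch, not a figure from the slides): with n languages, direct transfer needs one system per ordered language pair, while an interlingua needs only one analysis grammar and one generation grammar per language.

# Illustrative arithmetic only: components needed for all-ways translation
# among n languages.
def direct_transfer_systems(n: int) -> int:
    # one dedicated translation system per ordered language pair
    return n * (n - 1)

def interlingua_grammars(n: int) -> int:
    # one analysis grammar plus one generation grammar per language
    return 2 * n

for n in (2, 4, 8):
    print(n, direct_transfer_systems(n), interlingua_grammars(n))
    # 2 languages: 2 vs 4; 4 languages: 12 vs 8; 8 languages: 56 vs 16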
Speech Acts: Speaker intention vs. literal meaning
- Can you pass the salt?
- Literal meaning: the speaker asks for information about the hearer's ability.
- Speaker intention: the speaker requests the hearer to perform an action.
Domain Actions: Extended, Domain-Specific Speech Acts
- give-information+experience+health-status
- It hurts.
- give-information+med-procedure+health-status
- I will examine the rash.
- request-information+personal-data
- What is your name?
Components of the Interchange Format
- speaker: c (client)
- speech act: give-information
- concept: experience+health-status
- argument: (experiencer=i, health-status=(pain, severity=severe, identifiability=no), body-location=leg)
- I have a severe pain in my leg. (A minimal data-structure sketch of this representation follows.)
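To make the structure of such an entry concrete, here is a minimal Python sketch (my own illustration; the Babylon system does not necessarily store IF this way) that holds the example above as nested data and renders it in the speaker:speech-act+concepts (argument=value, ...) shape.

# Hypothetical sketch of an IF-style entry; field names follow the slide,
# the rendering code is illustrative only.
if_entry = {
    "speaker": "c",  # c = client (the patient)
    "speech_act": "give-information",
    "concepts": ["experience", "health-status"],
    "args": {
        "experiencer": "i",
        "health-status": {"value": "pain", "severity": "severe",
                          "identifiability": "no"},
        "body-location": "leg",
    },
}

def render_args(args):
    # Render an argument dictionary as (key=value, ...) text, nesting once.
    parts = []
    for key, val in args.items():
        if isinstance(val, dict):
            inner = ", ".join(f"{k}={v}" for k, v in val.items())
            parts.append(f"{key}=({inner})")
        else:
            parts.append(f"{key}={val}")
    return "(" + ", ".join(parts) + ")"

def render_if(entry):
    head = entry["speaker"] + ":" + "+".join([entry["speech_act"]] + entry["concepts"])
    return head + " " + render_args(entry["args"])

print(render_if(if_entry))
# c:give-information+experience+health-status (experiencer=i,
#   health-status=(value=pain, severity=severe, identifiability=no),
#   body-location=leg)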
Domain Action Interlingua and Lexical Semantic Interlingua
- And how will you be paying for this?
- Domain Action representation:
- a:request-information+payment (method=question)
- Lexical Semantic representation:
- predicate: pay
- time: future
- agent: hearer
- product: distance=proximate, type=demonstrative
- manner: question
- The lexical semantic representation won't work for
- What method of payment will you use?
Formulaic Utterances
- Good night.
- tisbaH cala xEr
- Literal gloss: waking up on good.
- Romanization of Arabic from CallHome Egypt.
Same intention, different syntax
- rigly bitiwgacny
- my leg hurts
- candy wagac fE rigly
- I have pain in my leg
- rigly bitiClimny
- my leg hurts
- fE wagac fE rigly
- there is pain in my leg
- rigly bitinqaH calya
- my leg bothers on me
- Romanization of Arabic from CallHome Egypt.
Language Independence
- Comes from representing speaker intention rather
than literal meaning for formulaic and
task-oriented sentences.
IF-Tagged Data
- Babylon Medical: 327 patient sentences tagged
- Other Medical Data: 580 doctor-patient sentences tagged
Current Babylon IF Specification
- Speech acts: 75
- DA concepts: 132
- Arguments: 359 (155 embedded)
- Values: 8548 (276 classes)
Complementary Approaches
- Domain actions: limited to task-oriented sentences.
- Lexical semantics: less appropriate for formulaic speech acts that should not be translated literally.
Disadvantages of Interlingua
- Meaning is arbitrarily deep.
- What level of detail do you stop at?
- If it is too simple, meaning will be lost in translation.
- If it is too complex, analysis and generation will be too difficult.
- If it is too complex, it cannot be used reliably at different research sites.
- Has to be applicable to all languages.
- The L word: linguistics.
- Human development time.
Evaluation of Interlinguas
- Reliability
- Analysis and generation grammars can be written by different people and at different sites.
- Measured by intercoder agreement and cross-site evaluation.
- Corollary: keep it simple.
- Expressivity
- Must be detailed enough to represent the different meanings in the domain.
- Measured by no-tag rate and end-to-end performance.
- Scalability
- Must scale up without a loss of reliability.
- Measured by coverage rate.
Comparison of two interlinguas
- I would like to make a reservation for the fourth through the seventh of July.
- C-STAR II, 1997-1999:
- c:request-action+reservation+temporal+hotel
- (time=(start-time=md4, end-time=(md7, july)))
- NESPOLE, 2000-2002:
- c:give-information+disposition+reservation+accommodation
- (disposition=(who=i, desire), reservation-spec=(reservation, identifiability=no), accommodation-spec=hotel, object-time=(start-time=(md4), end-time=(md7, month=7, incl-excl=inclusive)))
Comparison of four databases (travel domain, role playing, spontaneous speech)
Same data, different interlingua
- C-STAR II English database tagged with C-STAR II interlingua: 2278 sentences
- C-STAR II English database tagged with NESPOLE interlingua: 2564 sentences
- NESPOLE English database tagged with NESPOLE interlingua: 1446 sentences
- Only about 50% of the vocabulary overlaps with the C-STAR database.
- Combined database tagged with NESPOLE interlingua: 4010 sentences
Significantly larger domain
Example of a failure of reliability
- Input: 3:00, right?
- Interlingua: verify (time=3:00)
- Poor choice of speech act name: does it mean that the speaker is confirming the time or requesting verification from the user?
- Output: 3:00 is right.
Measuring Reliability
- Intercoder agreement: how often do human experts assign the same interlingua representation? (A minimal calculation sketch follows.)
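As an illustration only (the slides do not give the agreement formula), pairwise exact-match agreement over a doubly annotated sample could be computed as below; a chance-corrected statistic such as Cohen's kappa would be a common alternative.

# Hypothetical sketch: exact-match intercoder agreement between two annotators
# who each assigned one IF representation per sentence.
def intercoder_agreement(tags_a, tags_b):
    assert len(tags_a) == len(tags_b), "annotators must tag the same sentences"
    matches = sum(a == b for a, b in zip(tags_a, tags_b))
    return matches / len(tags_a)

coder1 = ["c:give-information+experience+health-status",
          "c:request-information+personal-data",
          "c:verify+temporal"]
coder2 = ["c:give-information+experience+health-status",
          "c:request-information+personal-data",
          "c:give-information+temporal"]
print(f"agreement = {intercoder_agreement(coder1, coder2):.2f}")  # 0.67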
Measuring Reliability: Cross-site evaluations
- Compare performance of
- Analyzer → interlingua → generator
- Where the analyzer and generator are built at the same site (or by the same person)
- Where the analyzer and generator are built at different sites (or by different people who may not know each other)
- C-STAR II interlingua: comparable end-to-end performance within sites and across sites.
- Around 60% acceptable translations from speech recognizer output.
- NESPOLE interlingua: cross-site end-to-end performance is lower.
Measuring Expressivity
- No-tag rate (a minimal calculation sketch follows this list)
- Can a human expert assign an interlingua representation to each sentence?
- C-STAR II no-tag rate: 7.3%
- NESPOLE no-tag rate: 2.4%
- About 300 more sentences of the C-STAR English database were covered (with the NESPOLE interlingua).
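For illustration (the data layout is my assumption), the no-tag rate is simply the fraction of sentences for which the expert could not assign any IF representation:

# Hypothetical no-tag rate computation: `tags` maps each sentence to its IF
# representation, or None when no suitable representation exists.
def no_tag_rate(tags):
    untagged = sum(1 for rep in tags.values() if rep is None)
    return untagged / len(tags)

tags = {
    "It hurts.": "c:give-information+experience+health-status",
    "What is your name?": "c:request-information+personal-data",
    "Well, you know how it goes.": None,  # no suitable domain action
}
print(f"no-tag rate = {no_tag_rate(tags):.1%}")  # 33.3%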
Measuring Scalability: Coverage Rate
- What percent of the database is covered by the top n most frequent domain actions? (A small counting sketch follows.)
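A minimal sketch of how such a coverage figure could be computed from a tagged corpus (the counting scheme is my assumption, not taken from the slides):

from collections import Counter

# Hypothetical coverage-rate computation: what fraction of tagged sentences
# use one of the n most frequent domain actions?
def coverage_rate(domain_actions, n):
    counts = Counter(domain_actions)
    covered = sum(count for _, count in counts.most_common(n))
    return covered / len(domain_actions)

# Toy corpus: one domain action per tagged sentence.
corpus = (
    ["give-information+experience+health-status"] * 50
    + ["request-information+personal-data"] * 30
    + ["verify+temporal"] * 15
    + ["give-information+med-procedure+health-status"] * 5
)
for n in (1, 2, 3):
    print(f"top {n}: {coverage_rate(corpus, n):.0%} of sentences covered")
# top 1: 50%, top 2: 80%, top 3: 95%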
Measuring Scalability: Number of domain actions as a function of database size
- Sample sizes from 100 to 3000 sentences in increments of 25.
- Average number of unique domain actions over ten random samples for each sample size.
- Each sample includes a random selection of frequent and infrequent domain actions (a small sampling sketch follows).
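One way to reproduce this kind of growth curve (my own sketch; the sample sizes and the ten repetitions follow the slide, everything else is assumed):

import random

# Hypothetical sketch: average number of unique domain actions seen in random
# samples of increasing size.
def da_growth_curve(tagged_corpus, sizes, repetitions=10, seed=0):
    rng = random.Random(seed)
    curve = []
    for size in sizes:
        uniques = [len(set(rng.sample(tagged_corpus, size)))
                   for _ in range(repetitions)]
        curve.append((size, sum(uniques) / repetitions))
    return curve

# tagged_corpus would hold one domain action per sentence, e.g. from the
# combined 4010-sentence database; sizes = range(100, 3001, 25).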
Conclusions
- An interlingua based on domain actions is suitable for task-oriented dialogue:
- Reliable
- Good coverage
- It is possible to evaluate an interlingua for:
- Reliability
- Expressivity
- Scalability