1
Simultaneous Multilingual Search for Translingual
Information Retrieval
  • Kristen Parton (1)
  • Kathleen McKeown (1)
  • James Allan (2)
  • Enrique Henestroza (1)

(1) Columbia University
(2) University of Massachusetts Amherst
2
Motivation: Cross-Lingual IR
  • User needs to search documents in other languages

Query in User Language: stereotypes of Arabs
Documents
Search Results in Document Language(s): [source-language text not recoverable]
3
Task Redefinition: Translingual IR
  • User needs to search documents in other languages
    and get back translated results

Query in User Language: stereotypes of Arabs
Documents
Search Results in User Language: Queen Rania Al-Abdullah discusses stereotypes of Arabs
4
Task Redefinition: Translingual IR
  • User needs to search documents in other languages
    and get back translated results
  • For translingual applications, integrating CLIR
    and result translation can improve both relevance
    and translation quality

5
Outline
  • Approaches to CLIR
  • SMLIR for Translingual IR
  • Query-Directed MT Post-Editing
  • System Evaluation
  • Conclusions and Future Work

6
Approaches to CLIR
  • Map query and/or documents to common
    representation

Schwarzenegger
??? ?? ?????????? ???? ????? ????????? ??
???????
???? ?? ???????? ?? ???? ???? ?????? ??????????
?????? ...
...??? ???? ????? ????? ????? ?????????? ??????
????????? .
Doc1
Doc2
Doc3
7
Approaches to CLIR
  • Map query and/or documents to common
    representation
  • Document translation (DT) with pre-translation
    query expansion

Schwarzenegger Schwarznegger Schwartzenegger ...
The failure of all proposals made by
Schwarzenegger in a referendum
It should be mentioned that wArznjr is also a
nasseer of the Olympic Movement
besides the star and the governor of the state
of California Arnold Schwarznegger .
Doc1
Doc2
Doc3
8
Approaches to CLIR
  • Map query and/or documents to common
    representation
  • Document translation (DT) with pre-translation
    query expansion
  • Query translation (QT) with post-translation
    query expansion

Schwarzenegger Schwarznegger Schwartzenegger ...
?????????? ???????? ????????? ??????????
??? ?? ?????????? ???? ????? ????????? ??
???????
???? ?? ???????? ?? ???? ???? ?????? ??????????
?????? ...
...??? ???? ????? ????? ????? ?????????? ??????
????????? .
Doc1
Doc2
Doc3
10
Query Translation vs. Document Translation
  • Trade-offs
  • Translation resources
  • Approximate DT [Oard 00, Chen 04]
  • Translation quality
  • Handling synonymy
  • Hybrid methods
  • [McCarley 99, Chen & Gey 04]: Run QT and DT
    searches, merge results, and rerank
  • [Wang & Oard 06]: Use bidirectional word
    alignments to capture information from QT and DT

11
Hybrid Merged Method
  • Merge and re-rank results of two searches
    [McCarley 99]
  • DT: query run against indexed document translations
  • QT: translated query run against indexed source
    documents
  • Problems
  • Different document lengths, query lengths
  • Raw IR scores not comparable across queries
  • Many ways to re-rank, merge searches

Merged Results
Doc2 Doc3 Doc1
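
The merge step described on this slide can be sketched as follows. This is a minimal illustration, not the method of [McCarley 99]: it assumes min-max normalization of each run's raw scores and a weighted sum, which is only one of the "many ways to re-rank, merge searches". The function names, the `dt_weight` parameter, and the scores are all made up for the example.

```python
from collections import defaultdict


def min_max_normalize(scores):
    """Rescale one run's raw scores to [0, 1] so runs become comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def merge_runs(dt_scores, qt_scores, dt_weight=0.5):
    """Merge a DT run and a QT run by a weighted sum of normalized scores.

    A document missing from one run simply contributes 0 for that run.
    Returns document ids, best first (ties broken by id).
    """
    merged = defaultdict(float)
    for doc, s in min_max_normalize(dt_scores).items():
        merged[doc] += dt_weight * s
    for doc, s in min_max_normalize(qt_scores).items():
        merged[doc] += (1.0 - dt_weight) * s
    return sorted(merged, key=lambda d: (-merged[d], d))


# Hypothetical raw scores (e.g. log-likelihoods) from the two searches.
dt_run = {"Doc1": -5.2, "Doc2": -4.1, "Doc3": -4.6}
qt_run = {"Doc1": -9.0, "Doc2": -7.5, "Doc3": -7.9}
merged_ranking = merge_runs(dt_run, qt_run)
```

Even in this toy version, the choice of normalization and of `dt_weight` changes the final order, which is the tuning burden the slide lists under "Problems".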
12
Outline
  • Approaches to CLIR
  • SMLIR for Translingual IR
  • Query-Directed MT Post-Editing
  • System Evaluation
  • Conclusions and Future Work

13
Simultaneous Multilingual IR (SMLIR)
  • Indexed document = source document + translation
  • Query = original query + query translations
    (expansions)

Query
?????????? ???????? ????????? ??????????
Schwarzenegger Schwarznegger
It should be mentioned that wArznjr is also a
nasseer of the Olympic Movement
besides the star and the governor of the state
of California Arnold Schwarznegger .
The failure of all proposals made by
Schwarzenegger in a referendum
???? ?? ???????? ?? ???? ???? ?????? ??????????
?????? ...
...??? ???? ????? ????? ????? ?????????? ??????
????????? .
??? ?? ?????????? ???? ????? ????????? ??
???????
Doc1
Doc2
Doc3
14
Simultaneous Multilingual IR (SMLIR)
  • Multilingual (probabilistic) structured queries
  • Treat query term and its translations as synonyms
  • SMLIR Hybrid vs. Merged Hybrid
  • No need for re-ranking or raw score normalization
  • Single index, one search
  • Query time comparable to Merged in practice
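
As a rough illustration of "treat query term and its translations as synonyms", the sketch below builds a structured query string using the Indri query language operators `#combine`, `#syn`, and `#wsyn`. The slides do not show the actual queries, so the exact query shape is an assumption; the helper names are hypothetical, and `zh_variant_a` is a placeholder token standing in for a source-language spelling.

```python
def syn_group(term, translations):
    """Wrap a query term and its translations in one #syn operator,
    so a match on any variant counts as a match on the term."""
    variants = [term] + [t for t in translations if t != term]
    if len(variants) == 1:
        return term
    return "#syn( " + " ".join(variants) + " )"


def wsyn_group(weighted_variants):
    """Probabilistic variant: #wsyn takes weight/term pairs.
    weighted_variants is a list of (term, weight) tuples."""
    parts = " ".join(f"{w:.2f} {t}" for t, w in weighted_variants)
    return "#wsyn( " + parts + " )"


def smlir_query(terms_with_translations):
    """Build one query over a single index that holds both the source
    text and its translation for every document."""
    groups = [syn_group(t, tr) for t, tr in terms_with_translations]
    return "#combine( " + " ".join(groups) + " )"


query = smlir_query([
    ("schwarzenegger", ["schwarznegger", "zh_variant_a"]),
    ("referendum", []),
])
```

Because every variant is scored inside one operator against one index, there is a single ranked list and nothing to normalize or merge afterwards. Multi-word names would need a phrase operator around each variant, which this sketch omits.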

15
Outline
  • Approaches to CLIR
  • SMLIR for Translingual IR
  • Query-Directed MT Post-Editing
  • System Evaluation
  • Conclusions and Future Work

16
Relevance Lost in Translation
  • Statistical MT makes mistakes
  • Bad translations of relevant documents may be
    perceived as irrelevant
  • Detection: IR match in source language but not in
    document translation → bad translation?
  • Correction: Replace bad translation with query
    term

It was the Iraqi sajidah AlryAwy had stopped
Sajida al-Rishawi
????? ????????
????? ???????? ????? ???????? ????? ...
17
Query-Directed MT Post-Editing
  • Use the query translation and word alignments to
    rewrite incorrect machine translation (MT)
  • Considerations: errors in query translation,
    incorrect word alignments

It was the Iraqi sajidah AlryAwy had stopped
It was the Iraqi Sajida al-Rishawi had stopped
Sajida al-Rishawi
????? ????????
????? ???????? ????? ???????? ????? ...
Translated document with word alignments
Edited translation
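
The detect-and-rewrite idea on the last two slides might look roughly like the sketch below. It is not the authors' algorithm: the function name, the token-level alignment format (pairs of source index and MT index), and the conservative skip rules are assumptions, and `NAME_A`/`NAME_B` and `w0`, `w1`, `w4` are placeholders for source-language tokens.

```python
def post_edit(mt_tokens, src_tokens, alignment, src_name, query_name, synonyms=()):
    """Rewrite the MT output where the source matches the translated
    query name but the aligned MT words are not an accepted name form.

    alignment:  iterable of (src_index, mt_index) pairs
    src_name:   query name in the source language, as a token list
    query_name: query name in the user's language, as a token list
    synonyms:   other accepted user-language forms (token lists)
    """
    n = len(src_name)
    accepted = {tuple(t.lower() for t in name)
                for name in [query_name, *synonyms]}
    spans = []
    for start in range(len(src_tokens) - n + 1):
        if list(src_tokens[start:start + n]) != list(src_name):
            continue  # no source-side match at this position
        src_span = set(range(start, start + n))
        mt_idx = sorted({m for s, m in alignment if s in src_span})
        if not mt_idx:
            continue  # name deleted by MT: not handled
        lo, hi = mt_idx[0], mt_idx[-1]
        if mt_idx != list(range(lo, hi + 1)):
            continue  # aligned MT words are not contiguous: skip
        if any(lo <= m <= hi and s not in src_span for s, m in alignment):
            continue  # span also covers other source words: skip
        current = tuple(t.lower() for t in mt_tokens[lo:hi + 1])
        if current not in accepted:
            spans.append((lo, hi))

    edited = list(mt_tokens)
    limit = len(edited)
    # Splice right to left so earlier indices stay valid.
    for lo, hi in sorted(set(spans), reverse=True):
        if hi >= limit:
            continue  # overlaps a span already rewritten
        edited[lo:hi + 1] = list(query_name)
        limit = lo
    return edited


mt = "It was the Iraqi sajidah AlryAwy had stopped".split()
src = ["w0", "w1", "NAME_A", "NAME_B", "w4"]
align = [(0, 0), (0, 1), (1, 2), (1, 3), (2, 4), (3, 5), (4, 6), (4, 7)]
edited = post_edit(mt, src, align, ["NAME_A", "NAME_B"], ["Sajida", "al-Rishawi"])
```

Skipping every uncertain case (missing, non-contiguous, or shared alignments) is one way to guard against the "incorrect word alignments" consideration above, at the cost of leaving some bad translations unedited.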
18
Outline
  • Approaches to CLIR
  • SMLIR for Translingual IR
  • Query-Directed MT Post-Editing
  • System Evaluation
  • Conclusions and Future Work

19
Experiment Setup
  • Part of DARPA GALE question-answering task
  • WHERE HAS [UN Secretary General Kofi Annan] BEEN
    AND WHEN?
  • Multilingual: English, Chinese, Arabic
  • Multimodal: speech, text
  • Multigenre: formal, informal
  • Evaluation Corpus
  • 102,859 Chinese documents
  • Translated into English using RWTH statistical
    machine translation system
  • Searches run using Indri (Lemur) IR system
  • Relevance judgments
  • 145 queries, 8,785 documents judged
  • A document is Relevant or Not Relevant for a
    query
  • Judgments based on Chinese text, by Chinese
    native speakers

20
Evaluation Points
  • Query Translation Strategies
  • English query → Chinese query
  • Run SMLIR searches, evaluate results
  • Cross-lingual IR Approaches
  • Using Chinese and/or English query, search over
    Chinese and/or translated documents
  • Machine Translation Post-Editing
  • Detect errors in result translations
  • Rewrite translations

21
Query Translation for SMLIR
  • GALE queries are name-centric
  • Statistical machine translation (SMT) failed to
    translate many names in corpus
  • Wikipedia for name translation [Ferrandez et al.
    07]
  • Generated by humans, edited by humans
  • Contains slang, name variations, common
    misspellings
  • Noisy, some intentional spam
  • Large variation in quantity/quality by language

22
User-Generated Synonyms and Translations
23
Query Translation Strategies for SMLIR
  • MT dictionary: probabilistic translation
    dictionary derived from word alignments
  • Wikipedia: for name translations; not
    probabilistic
  • Combination did not help?
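
To make the difference between the two resources concrete, here is a small sketch of how their candidates could be pooled for one query term. The slide itself suggests the combination did not help, so this only illustrates the data shapes: the function, the `min_prob` cutoff, and the fixed `wiki_weight` are assumptions, and the `zh_*` strings are placeholder tokens.

```python
def translation_candidates(term, mt_dict, wiki_titles, min_prob=0.1, wiki_weight=1.0):
    """Pool translations of `term` from two resources.

    mt_dict:     {term: {translation: probability}}, from word alignments
    wiki_titles: {term: [translation, ...]}, carries no probabilities
    Returns (translation, weight) pairs, highest weight first.
    """
    # Keep MT dictionary entries above a probability cutoff.
    candidates = {t: p for t, p in mt_dict.get(term, {}).items() if p >= min_prob}
    # Wikipedia entries carry no probability, so assign a fixed weight.
    for t in wiki_titles.get(term, []):
        candidates[t] = max(candidates.get(t, 0.0), wiki_weight)
    return sorted(candidates.items(), key=lambda kv: (-kv[1], kv[0]))


mt_dict = {"annan": {"zh_a": 0.7, "zh_b": 0.05}}
wiki_titles = {"annan": ["zh_a", "zh_c"]}
pooled = translation_candidates("annan", mt_dict, wiki_titles)
```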

24
CLIR Evaluation
  • SMLIR significantly outperforms all other approaches
  • DT significantly better than QT
  • Poor performance of QT degrades Merged

25
Results: Query-Directed SMT Post-Editing
  • Post-Editing
  • Detect possible incorrect name translations
  • If translated name is not a synonym of query,
    rewrite name
  • Very conservative algorithm; does not handle
    deletions
  • Experiment
  • 127 queries, top 10 documents
  • 28 queries triggered post-editing
  • 15% of name matches were rewritten
  • Evaluation
  • 101 rewrites examined: 93% Acceptable, 6% Not
    Acceptable

26
Conclusions
  • SMLIR: Novel and effective approach for
    integrating document and query translation in
    CLIR
  • Query-directed SMT post-editing shows promise
  • More sophisticated editing possible, beyond just
    names
  • Future work: evaluating whole system for
    end-to-end question answering
  • Combining CLIR and machine translation can
    improve both search relevance and translation
    accuracy

27
Thank you!
  • This work was supported in part by the Defense
    Advanced Research Projects Agency (DARPA) under
    contract number HR0011-06-C-0023, in part by an
    NSF Graduate Research Fellowship, and in part by
    the Center for Intelligent Information Retrieval
    at the University of Massachusetts.
  • Thanks very much to Bob Armstrong for making the
    annotation happen. Thanks also to Mark Smucker
    and Giridhar Kumaran for help with INDRI
    interface and corpus issues, and Ben Carterette
    for help with estimated MAP.  We would also like
    to thank the members of the NIGHTINGALE machine
    translation team for translation data, especially
    Nizar Habash and Mahmoud Ghoneim.

Questions? kristen_at_cs.columbia.edu