Clustering - PowerPoint PPT Presentation

About This Presentation
Title:

Clustering

Description:

Dictionary-based Methods ... Dictionary-based methods for cross-lingual information ... Querying across languages: A dictionary-based approach to multilingual ...

Slides: 13
Provided by: wcr

Transcript and Presenter's Notes

Title: Clustering


1
CS529 Proposal Presentation
Instructor: Prof. Frieder
Resolving Ambiguity Using Dictionary-based Methods in CLIR
Presented by: Dongmei Jia
April 2004
2
Dictionary-based Methods
  • Dictionaries used in CLIR: bilingual
    machine-readable dictionaries (MRDs)
  • The biggest issue: ambiguity introduced during
    query translation
  • caused by the addition of extraneous terms to
    the query
  • failure to translate technical terms
  • failure to translate phrases, or translating
    them poorly

3
Resolving Ambiguity
  • Query expansion via local feedback, as applied
    in the papers "Dictionary-based methods for
    cross-lingual information retrieval" and
  • "Resolving Ambiguity for Cross-Language
    Retrieval"
  • A method by which a query is modified by
    adding terms found in docs known to be
    relevant to the query
  • Assume the top retrieved documents are relevant
  • Can be applied pre-translation and
    post-translation
  • But there is one problem
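
The local feedback idea on this slide can be sketched in a few lines. This is only an illustration under simple assumptions: documents are already tokenized, the ranking comes from some earlier retrieval run, and expansion terms are picked by raw frequency. The function name `expand_query`, its parameters, and the toy documents are hypothetical and do not come from the cited papers.

```python
from collections import Counter

def expand_query(query_terms, ranked_docs, top_docs=2, top_terms=3):
    """Local feedback: assume the top-ranked docs are relevant and
    add their most frequent non-query terms to the query."""
    counts = Counter()
    for doc in ranked_docs[:top_docs]:
        counts.update(t for t in doc if t not in query_terms)
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return list(query_terms) + [term for term, _ in ranked[:top_terms]]

# Toy ranking (hypothetical data): each doc is a list of tokens.
ranked_docs = [
    ["lyme", "disease", "tick", "antibiotic"],
    ["lyme", "tick", "bite", "prevention"],
    ["disease", "heart", "diet"],
]
expanded = expand_query(["lyme", "disease"], ranked_docs)
print(expanded)
```

Running the same routine on the source-language collection before translation, or on the target-language collection after translation, gives the pre- and post-translation variants mentioned above.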

4
Problem 1 Statement
  • Some initial query terms may have only one
    translation, while other terms may have many
    translations in the target language
  • When we retrieve documents with such a
    translated query, we may not get the truly
    relevant docs
  • Applying query expansion techniques afterwards
    may then fail to improve effectiveness, or may
    even decrease it

5
TREC Example
  • TREC topic example
  • Number: 441
  • Title: Lyme disease
  • Description: How do you prevent and treat
    Lyme disease?
  • The term "Lyme" may have only one translation
    in the bilingual dictionary, but terms such as
    "prevent" and "treat" may have many meanings.
  • "Lyme" is the more important query term: it is
    the search key

6
Proposed Strategy 1
  • Give query terms different weights
  • Increase the weight of search keys that have
    only a few translations, since they are usually
    the most important words of a query; decrease
    the weight of terms that have many translations
  • In relevance feedback, rank the retrieved docs
    based not only on tf-idf but also on the weights
    of the query terms
  • The top N terms selected for query expansion
    should then reduce ambiguity more effectively
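
One possible reading of this weighting strategy, as a minimal sketch: each translation inherits a weight of 1/n, where n is the number of translations of its source term, and a document score is the weighted sum of its tf-idf values. The 1/n formula, the function names, the toy dictionary entries, and the tf-idf numbers are all assumptions made for illustration; the slides do not fix a formula.

```python
def weighted_translation(query_terms, dictionary):
    """Translate word by word; each translation gets weight 1/n,
    where n is the number of translations of its source term."""
    weighted = {}
    for term in query_terms:
        # Terms missing from the dictionary pass through untranslated.
        translations = dictionary.get(term) or [term]
        weight = 1.0 / len(translations)
        for t in translations:
            weighted[t] = max(weighted.get(t, 0.0), weight)
    return weighted

def weighted_score(doc_tfidf, weights):
    """Combine per-term tf-idf values with the query-term weights."""
    return sum(w * doc_tfidf.get(t, 0.0) for t, w in weights.items())

# Hypothetical English-Spanish entries, for illustration only.
dictionary = {
    "lyme": ["lyme"],
    "prevent": ["prevenir", "impedir", "evitar"],
    "treat": ["tratar", "curar", "invitar", "considerar"],
}
weights = weighted_translation(["lyme", "prevent", "treat"], dictionary)

# Hypothetical tf-idf values per target-language term.
doc_a = {"lyme": 2.0, "tratar": 1.0}
doc_b = {"prevenir": 2.0, "tratar": 2.0, "evitar": 1.0}
score_a = weighted_score(doc_a, weights)
score_b = weighted_score(doc_b, weights)
print(score_a, score_b)
```

With plain tf-idf sums, doc_b (5.0) would outrank doc_a (3.0); with the weights, the document that actually mentions the search key "lyme" ranks first.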

7
Problem 2 Statement
  • The same paper reports that combining pre- and
    post-translation expansion is most effective and
    improves precision and recall
  • However, the authors examined only a single
    language pair, English to Spanish, and relied on
    the Collins English-Spanish electronic dictionary
  • Problem: specific technical query terms can
    rarely be translated with a general dictionary,
    which causes ambiguity.

8
Proposed Approach 2
  • Use one or more technical dictionaries together
    with the general one
  • Pros
  • They can translate unique terms that are not
    found in general dictionaries
  • The terms in a specialized dictionary are often
    unambiguous
  • In the same TREC example, the term "Lyme" may
    occur only in the medical dictionary and not in
    the general one.
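
A minimal sketch of combining dictionaries, assuming that technical dictionaries are consulted first and the general dictionary serves as a fallback. That lookup order is an assumption; the slides only say the dictionaries are used together. The function name and the dictionary entries are hypothetical.

```python
def translate_term(term, technical_dicts, general_dict):
    """Look the term up in each technical dictionary first,
    then fall back to the general dictionary."""
    for tech in technical_dicts:
        if term in tech:
            return tech[term]
    # An empty list means the term could not be translated at all.
    return general_dict.get(term, [])

# Hypothetical entries, for illustration only.
medical = {"lyme": ["lyme"]}
general = {"disease": ["enfermedad"], "treat": ["tratar", "curar"]}

with_medical = [translate_term(t, [medical], general) for t in ["lyme", "disease"]]
without_medical = [translate_term(t, [], general) for t in ["lyme", "disease"]]
print(with_medical, without_medical)
```

Without the medical dictionary the search key "lyme" has no translation, which is the failure case described in Problem 2.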

9
Further Disambiguation
  • A co-occurrence statistics model can be used for
    phrase translation, together with a phrase MRD
  • It may reduce ambiguity by inferring the
    correct translation of phrases that were not
    translated via the phrase dictionary
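
The co-occurrence idea can be illustrated with a simple stand-in: among all combinations of candidate translations for a phrase, pick the one whose words appear together most often in target-language documents. Raw document co-occurrence counts are used here as an assumption; the cited work may use a different association measure. The function names and the toy corpus are hypothetical.

```python
from itertools import product

def cooccurrence(a, b, corpus):
    """Number of documents in which both words appear."""
    return sum(1 for doc in corpus if a in doc and b in doc)

def best_translation(candidates, corpus):
    """candidates holds one list of candidate translations per source
    word; return the combination with the highest pairwise co-occurrence."""
    def score(combo):
        return sum(
            cooccurrence(a, b, corpus)
            for i, a in enumerate(combo)
            for b in combo[i + 1:]
        )
    return max(product(*candidates), key=score)

# Hypothetical target-language corpus: each doc is a set of tokens.
corpus = [
    {"tratar", "enfermedad", "medico"},
    {"invitar", "fiesta"},
    {"tratar", "enfermedad"},
]
# Candidates for the two words of "treat disease" (illustrative only).
best = best_translation([["tratar", "invitar"], ["enfermedad"]], corpus)
print(best)
```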

10
Test Environment
  • It may consist of:
  • TREC cross-language topics and documents
  • A general MRD for English-Spanish translation
  • One or more technical MRDs for English-Spanish
    translation
  • An English-Spanish phrase dictionary
  • A retrieval system, e.g. INQUERY

11
Evaluation
  • Compare average precision at different recall
    levels for: word-by-word translation (WBW,
    baseline) / query expansion / query expansion
    with the weighting strategy / monolingual
    retrieval (best case)
  • Compare average precision at different recall
    levels for: WBW using only the general bilingual
    MRD (baseline) / general MRD with technical MRD /
    general MRD with technical MRD and phrase MRD /
    general MRD with technical MRD, phrase MRD and
    co-occurrence / monolingual retrieval (best case)
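
For reference, precision at fixed recall levels can be computed as below. This is a sketch of the standard interpolated precision measure, assuming a single ranked list and a known set of relevant documents; the function name, the choice of three recall levels, and the toy data are illustrative.

```python
def interpolated_precision(ranking, relevant, recall_levels=(0.0, 0.5, 1.0)):
    """Interpolated precision: at each recall level, take the maximum
    precision observed at any recall greater than or equal to that level."""
    hits = 0
    points = []  # (recall, precision) recorded at each relevant document
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))
    return [
        max((p for r, p in points if r >= level), default=0.0)
        for level in recall_levels
    ]

# Hypothetical run: two relevant documents, found at ranks 1 and 3.
precisions = interpolated_precision(["d1", "d2", "d3", "d4"], {"d1", "d3"})
average = sum(precisions) / len(precisions)
print(precisions, average)
```

Computing this per configuration (baseline, expansion, weighting, monolingual) and averaging over topics gives the curves to compare.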

12
References
  • Lisa Ballesteros and W. Bruce Croft.
    Dictionary-based methods for cross-lingual
    information retrieval. Proceedings of the 7th
    International DEXA Conference on Database and
    Expert Systems Applications.
  • David A. Hull and Gregory Grefenstette. Querying
    across languages: A dictionary-based approach to
    multilingual information retrieval. Proceedings
    of the 19th International Conference on Research
    and Development in Information Retrieval, pages
    49-57, 1996.
  • L. Ballesteros and W. B. Croft. Resolving
    Ambiguity for Cross-Language Retrieval.
    Proceedings of ACM SIGIR, pages 64-71, 1998.
  • M. Aljlayl and O. Frieder. Effective
    Arabic-English Cross-Language Information
    Retrieval via Machine Readable Dictionaries and
    Machine Translation.
  • M. Aljlayl, O. Frieder, and D. Grossman. On
    Bidirectional English-Arabic Search. Journal of
    the American Society of Information Science and
    Technology, 53(13), November 2002.
  • Lisa Ballesteros and W. Bruce Croft. Phrasal
    translation and query expansion techniques for
    cross-language information retrieval.
  • Leah S. Larkey and Margaret E. Connell.
    Structured Queries, Language Modeling, and
    Relevance Modeling in Cross-Language Information
    Retrieval.
  • David A. Grossman and Ophir Frieder. Information
    Retrieval: Algorithms and Heuristics.