Cross-Language Information Retrieval - PowerPoint PPT Presentation

About This Presentation

Description:

... se//0.31 demande//0.24 demander//0.08 peut//0.07 merveilles//0.04 question//0.02 savoir//0.02 on//0.02 bien//0.01 merveille//0.01 pourrait//0.01 Unidirectional: ...

Slides: 26

Provided by: Prefer1021

Learn more at: https://courses.ischool.berkeley.edu

Transcript and Presenter's Notes

1
Cross-Language Information Retrieval

Applied Natural Language Processing
October 29, 2009
Douglas W. Oard

2
What Do People Search For?

Searchers often don't clearly understand
The problem they are trying to solve
What information is needed to solve the problem
How to ask for that information
The query results from a clarification process
Dervin's sense-making

[Figure: Need, Gap, Bridge]
3
Taylor's Model of Question Formation
Q1 Visceral Need
Q2 Conscious Need
Intermediated Search
Q3 Formalized Need
Q4 Compromised Need (Query)
4
Design Strategies

Foster human-machine synergy
Exploit complementary strengths
Accommodate shared weaknesses
Divide-and-conquer
Divide task into stages with well-defined interfaces
Continue dividing until problems are easily solved
Co-design related components
Iterative process of joint optimization

5
Human-Machine Synergy

Machines are good at
Doing simple things accurately and quickly
Scaling to larger collections in sublinear time
People are better at
Accurately recognizing what they are looking for
Evaluating intangibles such as quality
Both are pretty bad at
Mapping consistently between words and concepts

6
Process/System Co-Design
7
Supporting the Search Process
[Figure: Source Selection; Choose]
8
Supporting the Search Process
[Figure: Source Selection]
9
Search Component Model
[Figure: an Information Need becomes a Query through Query Formulation; Query Processing and Document Processing each apply a Representation Function, yielding a Query Representation and a Document Representation; a Comparison Function turns the pair into a Retrieval Status Value; Human Judgment of the results determines Utility]
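The machine side of the component model can be read as a small pipeline: one representation function applied to both query and documents, then a comparison function that yields a retrieval status value (RSV). A minimal sketch, assuming a lowercase bag-of-words representation and an inner-product comparison; both choices, and the names `represent`, `retrieval_status_value`, and `rank`, are illustrative assumptions, not the model's prescribed components.

```python
from collections import Counter


def represent(text: str) -> Counter:
    # Hypothetical representation function: a lowercase bag of words
    return Counter(text.lower().split())


def retrieval_status_value(query_rep: Counter, doc_rep: Counter) -> float:
    # Hypothetical comparison function: inner product of term counts
    return float(sum(n * doc_rep[term] for term, n in query_rep.items()))


def rank(query: str, documents: list) -> list:
    # Query and documents pass through the same representation function,
    # then the comparison function assigns each document an RSV
    q = represent(query)
    scored = [(retrieval_status_value(q, represent(d)), d) for d in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


docs = ["ancient wonders of the world", "modern world news", "wonders of nature"]
for rsv, doc in rank("ancient wonders", docs):
    print(rsv, doc)
```

Human judgment and utility stay outside the code: the RSV only orders the documents, and deciding which ones are actually useful remains with the searcher.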
10
Relevance

Relevance relates a topic and a document
Duplicates are equally relevant, by definition
Constant over time and across users
Pertinence relates a task and a document
Accounts for quality, complexity, language, …
Utility relates a user and a document
Accounts for prior knowledge

11
Okapi Term Weights
TF component
IDF component
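The slide's TF and IDF formulas did not survive extraction. For reference, the widely used Okapi BM25 form of the two components is below; the notation is assumed rather than copied from the slide: tf_{t,d} is the count of term t in document d, df_t is the number of documents containing t, N is the number of documents, dl_d and avdl are the document's length and the average document length, and k_1 and b are tuning constants.

```latex
% Standard Okapi BM25 components (notation assumed, not taken from the slide)
w_{\mathrm{TF}}(t,d) = \frac{(k_1 + 1)\,\mathrm{tf}_{t,d}}
  {k_1\left((1 - b) + b\,\dfrac{\mathrm{dl}_d}{\mathrm{avdl}}\right) + \mathrm{tf}_{t,d}}
\qquad
w_{\mathrm{IDF}}(t) = \log \frac{N - \mathrm{df}_t + 0.5}{\mathrm{df}_t + 0.5}
```

The TF component saturates as tf grows (it approaches k_1 + 1), and b controls how strongly long documents are penalized.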
12
A Ranking Function Okapi BM25
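A BM25 score sums, over the query terms, the product of a saturating TF component and an IDF component. A minimal sketch of that ranking function, assuming whitespace-tokenized documents and the commonly used defaults k1 = 1.2 and b = 0.75 (assumptions; the slide's own constants are not recoverable). The function name `bm25_score` and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def bm25_score(query_terms, doc_terms, doc_freq, num_docs, avg_doc_len, k1=1.2, b=0.75):
    # k1=1.2 and b=0.75 are commonly used defaults, assumed here
    tf = Counter(doc_terms)
    dl = len(doc_terms)
    # Document-length normalization shared by every term in this document
    norm = k1 * ((1 - b) + b * dl / avg_doc_len)
    score = 0.0
    for term in query_terms:
        if tf[term] == 0 or term not in doc_freq:
            continue  # a query term absent from the document adds nothing
        tf_part = (k1 + 1) * tf[term] / (norm + tf[term])  # saturating TF component
        df = doc_freq[term]
        idf_part = math.log((num_docs - df + 0.5) / (df + 0.5))  # negative if df > N/2
        score += tf_part * idf_part
    return score


# Hypothetical toy corpus
docs = [
    "wonders of the ancient world".split(),
    "ancient history of rome".split(),
    "world cup news".split(),
    "cooking with fresh herbs".split(),
]
doc_freq = Counter(term for d in docs for term in set(d))
avg_len = sum(len(d) for d in docs) / len(docs)
scores = [bm25_score(["wonders", "rome"], d, doc_freq, len(docs), avg_len) for d in docs]
print([round(s, 3) for s in scores])
```

Each of the first two documents contains one query term once, with the same DF, so the shorter second document scores higher: that difference comes entirely from the b-controlled length normalization.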
13
Estimating TF and DF for Query Terms
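Translation probabilities P(fi|e1) and target-language statistics:

Term   P(fi|e1)   TF(fi)   DF(fi)
f1     0.4        20       50
f2     0.3        5        40
f3     0.2        2        30
f4     0.1        50       200

Estimated TF(e1) = 0.4×20 + 0.3×5 + 0.2×2 + 0.1×50 = 14.9
Estimated DF(e1) = 0.4×50 + 0.3×40 + 0.2×30 + 0.1×200 = 58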
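The slide's arithmetic is a translation-probability-weighted sum: the TF (or DF) of a query-language term e is estimated as the sum over its translations f of P(f|e) times the TF (or DF) of f. A minimal sketch reproducing the slide's example; the function name `psq_estimate` is hypothetical.

```python
def psq_estimate(translation_probs, stats):
    # Hypothetical helper: probability-weighted sum of target-language
    # statistics (TF in one document, or DF in the collection)
    return sum(p * stats[f] for f, p in translation_probs.items())


# Values from the slide: P(fi | e1) and the TF / DF of each translation fi
p_f_given_e1 = {"f1": 0.4, "f2": 0.3, "f3": 0.2, "f4": 0.1}
tf = {"f1": 20, "f2": 5, "f3": 2, "f4": 50}
df = {"f1": 50, "f2": 40, "f3": 30, "f4": 200}

tf_e1 = psq_estimate(p_f_given_e1, tf)
df_e1 = psq_estimate(p_f_given_e1, df)
print(round(tf_e1, 2), round(df_e1, 2))
```

The estimated TF and DF can then be plugged into an ordinary term-weighting formula such as BM25, as if e1 itself had been counted in the target-language documents.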
14
Learning to Translate

Lexicons
Phrase books, bilingual dictionaries, …
Large text collections
Translations (parallel)
Similar topics (comparable)
Similarity
Similar pronunciation, similar users
People

15
[Figure: the Rosetta Stone, with the same text in Hieroglyphic, Demotic, and Greek]
16
Statistical Machine Translation
Señora Presidenta , había pedido a la administración del Parlamento que garantizase
Madam President , I had asked the administration to ensure that
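Statistical translation models such as IBM Model 1 (the "M1" in the experiment setup) learn word translation probabilities from sentence-aligned text with expectation-maximization. A minimal sketch on a hypothetical two-sentence Spanish-English corpus; it illustrates Model 1 EM only and is not GIZA++'s implementation (no HMM or Model 4 stages, no smoothing). The function name `train_ibm_model1` is hypothetical.

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=10):
    # EM training of IBM Model 1 translation probabilities t(f | e).
    # pairs: list of (foreign_tokens, english_tokens) sentence pairs.
    # A NULL token on the English side lets foreign words align to nothing.
    foreign_vocab = {f for fs, _ in pairs for f in fs}
    uniform = 1.0 / len(foreign_vocab)
    t = defaultdict(lambda: uniform)  # start from uniform t(f | e)
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        for fs, es in pairs:
            es = ["NULL"] + es
            for f in fs:
                # E-step: share each foreign word among all English words
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    share = t[(f, e)] / z
                    count[(f, e)] += share
                    total[e] += share
        # M-step: renormalize expected counts into probabilities
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


# Hypothetical toy parallel corpus
pairs = [
    ("la casa".split(), "the house".split()),
    ("la flor".split(), "the flower".split()),
]
t = train_ibm_model1(pairs)
print(round(t[("casa", "house")], 3), round(t[("la", "house")], 3))
```

After the first iteration "casa" and "la" are tied as translations of "house"; because "la" is better explained by "the" (it appears in both sentences), EM gradually shifts probability mass for "house" toward "casa".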
17
Bidirectional Translation
wonders of ancient world (CLEF Topic 151)
18
Experiment Setup

Test collections
Document processing
Stemming, accent-removal (CLEF French)
Word segmentation, encoding conversion (TREC Chinese)
Stopword removal (all collections)
Training statistical translation models (GIZA++)

Source                          CLEF01-03    TREC-5,6
Query language                  English      English
Document language               French       Chinese
Number of topics                151          54
Number of documents             87,191       139,801
Avg. number of relevant docs    23           95
Languages                   English-French           English-Chinese
Parallel corpus             Europarl                 FBIS et al.
Number of sentence pairs    672,247                  1,583,807
Models (iterations)         M1(10), HMM(5), M4(5)    M1(10)
19
(No Transcript)
20
Pruning Translations
Candidate translations (probability):
f1 (0.32) f2 (0.21) f3 (0.11) f4 (0.09) f5 (0.08) f6 (0.05) f7 (0.04) f8 (0.03) f9 (0.03) f10 (0.02) f11 (0.01) f12 (0.01)

Cumulative probability threshold    Translations kept
0.0                                 f1
0.1                                 f1
0.2                                 f1
0.3                                 f1
0.4                                 f1 f2
0.5                                 f1 f2
0.6                                 f1 f2 f3
0.7                                 f1 f2 f3 f4
0.8                                 f1 f2 f3 f4 f5
0.9                                 f1 f2 f3 f4 f5 f6 f7
1.0                                 f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 f11 f12
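The slide's example follows a simple rule: sort the translations by probability and keep them until their cumulative probability reaches the threshold, always keeping at least the top one. A minimal sketch that reproduces the example; the function name is hypothetical, the small tolerance `eps` is an assumption added to absorb floating-point rounding, and the slide does not say whether the kept probabilities are renormalized afterwards, so the sketch does not renormalize.

```python
def prune_by_cumulative_probability(translations, threshold, eps=1e-9):
    # Hypothetical helper. Sort by descending probability; Python's sort is
    # stable, so tied translations keep their listed order
    ranked = sorted(translations, key=lambda pair: pair[1], reverse=True)
    kept, cumulative = [], 0.0
    for word, prob in ranked:
        kept.append(word)
        cumulative += prob
        # eps (an assumption) absorbs floating-point rounding in the sum
        if cumulative >= threshold - eps:
            break
    return kept  # probabilities are not renormalized (assumption)


translations = [("f1", 0.32), ("f2", 0.21), ("f3", 0.11), ("f4", 0.09),
                ("f5", 0.08), ("f6", 0.05), ("f7", 0.04), ("f8", 0.03),
                ("f9", 0.03), ("f10", 0.02), ("f11", 0.01), ("f12", 0.01)]
for i in range(11):
    threshold = i / 10
    kept = prune_by_cumulative_probability(translations, threshold)
    print(f"{threshold:.1f}", " ".join(kept))
```

Low thresholds keep only the dominant translation, which sharpens the query but risks dropping correct alternatives; a threshold of 1.0 keeps every candidate, including the low-probability noise.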
21
Unidirectional without Synonyms (PSQ)
CLEF French
TREC-5,6 Chinese