Title: Statistical Machine Translation Models for Personalized Search
1 Statistical Machine Translation Models for Personalized Search
- Rohini U
- AOL India RD, Bangalore India
- Rohini.uppuluri_at_corp.aol.com
- Vamshi Ambati
- Language Technologies Institute
- Carnegie Mellon University Pittsburgh, USA
- vamshi_at_cs.cmu.edu
- Vasudeva Varma
- SIEL, LTRC, IIIT Hyderabad, India
- vv_at_iiit.ac.in
2 Agenda
- Introduction
- Related Work
- Background
- User Profile as Translation Model
- Personalized Search
- Learning User Profile
- Re-ranking
- Experiments
- Conclusions and Future Work
3 Introduction
- Current Web search engines
- Provide users with documents relevant to their information need
- Issues
- Information overload
- Must cater to hundreds of millions of users
- Terabytes of data
- Poor description of the information need
- Short queries: difficult to understand
- Word ambiguities
- Users only see the top few results
- Relevance
- Subjective: depends on the user
- One size fits all?
4 Continued
- Search is not a solved problem!
- Poorly described information need
- Java (Java island / Java programming language)
- Jaguar (cat / car)
- Lemur (animal / Lemur toolkit)
- SBH (State Bank of Hyderabad / Syracuse Behavioral Health care)
- Given prior information
- "I am into biology": best guess for Jaguar?
- Past queries "information retrieval" and "language modeling": best guess for Lemur?
5 Review of Personalized Search
- Personalized Search
- Query logs
- Machine learning
- Language modeling
- Community based
- Others
6 Statistical Language Modeling based Approaches: Introduction
- Statistical language modeling: the task of estimating a probability distribution that captures the statistical regularities of natural language
- Applied to a number of problems: speech, machine translation, IR, summarization
7 Statistical Language Modeling based Approaches: Background
- [Figure labels: User Information Need, Ideal Document, Query Formulation Model, Query (example query: Lemur)]
- Given a query, which document is most likely to be the ideal document?
- P(Q/D) = P(q1 ... qn/D) = ∏_i P(qi/D)
- In spite of the progress, not much work has been done to capture, model, and integrate user context!
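The query-likelihood score P(Q/D) on this slide can be sketched in a few lines. This is only an illustration, not the authors' implementation: whitespace tokenization, add-one smoothing, the `vocab_size` parameter, and the two example documents are all assumptions introduced here.

```python
import math
from collections import Counter

def query_log_likelihood(query, document, vocab_size=10000):
    """Return log P(Q/D) under a unigram model: the sum of log P(qi/D).

    Add-one smoothing and vocab_size are assumptions of this sketch;
    they keep unseen query words from giving a probability of zero.
    """
    doc_tokens = document.lower().split()
    counts = Counter(doc_tokens)
    total = len(doc_tokens)
    score = 0.0
    for q in query.lower().split():
        p = (counts[q] + 1) / (total + vocab_size)
        score += math.log(p)
    return score

# Hypothetical documents, used only to exercise the function.
docs = {
    "doc_animal": "lemur encyclopedia gives a brief description of this animal",
    "doc_toolkit": "the lemur toolkit for language modeling and information retrieval",
}
ranked = sorted(
    docs,
    key=lambda d: query_log_likelihood("lemur toolkit", docs[d]),
    reverse=True,
)
```

Note that this score depends only on the query and the document, so every user issuing the same query gets the same ranking.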
8 Noisy Channel based Approach: Motivation
- [Figure labels: Ideal Document, Query Generation Process (Noisy Channel), Retrieval]
9 Similar to Statistical Machine Translation
- Given an English sentence, translate it into French
- Given a query, retrieve documents closer to the ideal document
- Noisy channel 1: English sentence and French sentence, P(e/f)
- Noisy channel 2: Ideal document and query, P(q/w)
10 Learning User Profile
- User profile as a translation model
- Triples (qw, dw, p(qw/dw))
- Use statistical machine translation methods
- Learning the user profile: training a translation model
- In SMT, a translation model is trained
- From parallel texts
- Using the EM algorithm
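The EM training step can be illustrated with a small sketch in the style of IBM Model 1, the simplest word-translation model. The slides say only that EM is used, so the choice of Model 1, the omission of a NULL word, the function name `train_model1`, and the toy pairs below are all assumptions.

```python
from collections import defaultdict

def train_model1(pairs, iterations=10):
    """Estimate p(qw/dw) with EM, IBM Model 1 style (no NULL word).

    pairs: list of (query_tokens, doc_tokens) parallel texts.
    Returns a dict mapping (qw, dw) -> p(qw/dw).
    """
    query_vocab = {qw for q_tokens, _ in pairs for qw in q_tokens}
    # Uniform initialisation over the query vocabulary.
    t = defaultdict(lambda: 1.0 / len(query_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (qw, dw)
        total = defaultdict(float)  # expected count of dw
        for q_tokens, d_tokens in pairs:
            for qw in q_tokens:
                # E-step: share qw among the document words it co-occurs with.
                norm = sum(t[(qw, dw)] for dw in d_tokens)
                for dw in d_tokens:
                    frac = t[(qw, dw)] / norm
                    count[(qw, dw)] += frac
                    total[dw] += frac
        # M-step: renormalise the expected counts per document word.
        t = defaultdict(
            float,
            {(qw, dw): c / total[dw] for (qw, dw), c in count.items()},
        )
    return dict(t)

# Toy parallel texts (hypothetical): query words paired with snippet words.
pairs = [
    (["lemur"], ["lemur", "toolkit", "retrieval"]),
    (["lemur", "download"], ["toolkit", "download"]),
    (["language", "modeling"], ["retrieval", "language", "modeling"]),
]
profile = train_model1(pairs)
triples = [(qw, dw, p) for (qw, dw), p in profile.items()]
```

Each entry of `triples` has the (qw, dw, p(qw/dw)) shape named on the slide, and for a fixed document word the probabilities over query words sum to 1.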
11 Learning User Profile
- Extracting parallel texts
- From queries and the corresponding snippets from clicked documents
- Training a translation model
- GIZA: an open source toolkit widely used for training translation models in statistical machine translation research
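Extracting the parallel texts can be sketched as pairing each query with the snippet of every result the user clicked. The click-log record layout (`query`, `clicked_snippets`), the tokenizer, and the function names are hypothetical. The two line-aligned output files are a common starting format for word-alignment toolkits, but the exact input GIZA expects is not covered here.

```python
import re

def tokenize(text):
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())

def extract_parallel_texts(click_log):
    """Build (query_tokens, snippet_tokens) pairs from clicked results."""
    pairs = []
    for record in click_log:
        q = tokenize(record["query"])
        for snippet in record["clicked_snippets"]:
            s = tokenize(snippet)
            if q and s:  # skip empty sides, which carry no alignment signal
                pairs.append((q, s))
    return pairs

def write_parallel_files(pairs, query_path, snippet_path):
    """Write two line-aligned files: line i of each file forms one pair."""
    with open(query_path, "w", encoding="utf-8") as qf, \
            open(snippet_path, "w", encoding="utf-8") as sf:
        for q, s in pairs:
            qf.write(" ".join(q) + "\n")
            sf.write(" ".join(s) + "\n")

# Hypothetical click log for one user.
click_log = [
    {"query": "lemur",
     "clicked_snippets": ["The Lemur toolkit for language modeling"]},
    {"query": "language modeling",
     "clicked_snippets": ["Language models for information retrieval", ""]},
]
pairs = extract_parallel_texts(click_log)
```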
12 Sample User Profile
13 Reranking
- Recall: in general, LM for IR
- Noisy channel based approach
- Example query: lemur, scored with P(lemur/retrieval)
- D1 (Lemur encyclopedia brief): "Lemur - Encyclopedia gives a brief description of the physical traits of this animal."
- D4 (Lemur toolkit information retrieval): "The Lemur toolkit for language modeling and information retrieval is documented and made available for download."
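One way to turn a learned profile into a reranker is to score each document by how likely its words are to "translate" into the query words. The scoring formula below follows the general translation-model form, P(Q/D) = ∏_q Σ_w p(q/w) P(w/D). It is an assumption of this sketch rather than a formula given on the slides, as are the `floor` smoothing constant and the profile values.

```python
import math
from collections import Counter

def noisy_channel_score(query_tokens, doc_tokens, profile, floor=1e-6):
    """Return log P(Q/D) = sum over q of log( sum over w of p(q/w) * P(w/D) ).

    profile maps (qw, dw) -> p(qw/dw). floor is an assumed smoothing
    constant that keeps unseen query words from zeroing the score.
    """
    counts = Counter(doc_tokens)
    n = len(doc_tokens) or 1  # avoid division by zero on empty documents
    score = 0.0
    for q in query_tokens:
        p = sum(profile.get((q, w), 0.0) * c / n for w, c in counts.items())
        score += math.log(p + floor)
    return score

def rerank(query_tokens, results, profile):
    """results: list of (doc_id, doc_tokens) from the base retriever."""
    return sorted(
        results,
        key=lambda r: noisy_channel_score(query_tokens, r[1], profile),
        reverse=True,
    )

# Hypothetical profile of a user interested in information retrieval.
profile = {
    ("lemur", "retrieval"): 0.4,
    ("lemur", "toolkit"): 0.3,
    ("lemur", "animal"): 0.01,
}
results = [
    ("D1", "lemur encyclopedia gives a brief description of the "
           "physical traits of this animal".split()),
    ("D4", "the lemur toolkit for language modeling and "
           "information retrieval".split()),
]
reranked = rerank(["lemur"], results, profile)
```

With this profile the toolkit page moves ahead of the encyclopedia page for the query "lemur", which is the behaviour the slide example illustrates.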
14 Experiments
- Performed evaluation on explicit feedback data collected from 7 users
- Experiments
- Comparison with contextless ranking
- Comparison between different training models and contexts
15 Data and Set up
- Data
- Explicit feedback data collected from 7 users
- For each query, each user examined top 10 documents and identified top 10 documents
- Collected the top 10 results for all queries. Total: 3469 documents
- Set up
- Created a Lucene index over the 3469 documents
- For reranking, first retrieve the results using Lucene and then rerank them using the noisy channel approach
- We perform 10-fold cross-validation
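A 10-fold cross-validation split could be set up as in the sketch below. The slides do not say what unit is split, so splitting a user's queries, the shuffling seed, and the `k_fold_splits` helper are all assumptions.

```python
import random

def k_fold_splits(items, k=10, seed=0):
    """Yield (train, test) lists so that each item is tested exactly once."""
    items = list(items)
    random.Random(seed).shuffle(items)
    # Deal the shuffled items round-robin into k folds.
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

# Hypothetical query identifiers for one user.
queries = [f"query_{i}" for i in range(25)]
splits = list(k_fold_splits(queries))
```

In each fold a profile would be learned from `train` and the reranker evaluated on `test`, with the metric averaged over the folds.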
16 Data
17 Metrics
- Precision@n
- Number of relevant documents in the top n results / n
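The metric is straightforward to compute. The sketch below assumes a ranked list of document ids and a set of ids the user judged relevant; the ids shown are hypothetical.

```python
def precision_at_n(ranked_doc_ids, relevant_doc_ids, n):
    """Fraction of the top n ranked documents that are relevant."""
    if n <= 0:
        raise ValueError("n must be positive")
    top_n = ranked_doc_ids[:n]
    hits = sum(1 for doc_id in top_n if doc_id in relevant_doc_ids)
    # Divide by n, not len(top_n), so short result lists are penalised.
    return hits / n

p5 = precision_at_n(["D4", "D1", "D7", "D2", "D9"], {"D4", "D2"}, 5)
```

Here two of the top five results are relevant, so `p5` is 0.4.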
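18 Set up
- [Figure: Data is split into Train Data and Test Data; the User Profile Learner builds User Profiles from Train Data; the Reranker applies them to Test Data to produce Reranked Results]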
20 Results
21 Results
- I: Document Training and Document Testing
- II: Document Training and Snippet Testing
- III: Snippet Training and Document Testing
- IV: Snippet Training and Snippet Testing
22 Conclusions and Future Work
- Proposed a statistical MT based approach for modeling the user
- Captures richer context and relations between q and w
- In future
- N-gram based method: trigrams etc.
- Noisy channel based method: bigram