Statistical Machine Translation Models for Personalized Search - PowerPoint PPT Presentation

Slides: 27
Provided by: researc88
1
Statistical Machine Translation Models for
Personalized Search
  • Rohini U, AOL India R&D, Bangalore, India
    (Rohini.uppuluri_at_corp.aol.com)
  • Vamshi Ambati, Language Technologies Institute,
    Carnegie Mellon University, Pittsburgh, USA
    (vamshi_at_cs.cmu.edu)
  • Vasudeva Varma, SIEL, LTRC, IIIT Hyderabad, India
    (vv_at_iiit.ac.in)

2
Agenda
  • Introduction
  • Related Work
  • Background
  • User Profile as Translation Model
  • Personalized Search
  • Learning User Profile
  • Re-ranking
  • Experiments
  • Conclusions and Future Work

3
Introduction
  • Current Web search engines
  • Provide users with documents relevant to their
    information need
  • Issues
  • Information overload
  • Must cater to hundreds of millions of users
  • Terabytes of data
  • Poor description of the information need
  • Short queries are difficult to understand
  • Word ambiguities
  • Users only see the top few results
  • Relevance
  • Subjective: depends on the user
  • One size fits all?

4
Introduction (continued)
  • Search is not a solved problem!
  • Poorly described information need
  • Java (Java island / Java programming language)
  • Jaguar (cat / car)
  • Lemur (animal / Lemur toolkit)
  • SBH (State Bank of Hyderabad / Syracuse
    Behavioral Healthcare)
  • Given prior information
  • "I am into biology": best guess for Jaguar?
  • Past queries "information retrieval" and "language
    modeling": best guess for Lemur?

5
Review of Personalized Search
  • Approaches to personalized search
  • Query logs
  • Machine learning
  • Language modeling
  • Community based
  • Others

6
Statistical Language Modeling based Approaches
Introduction
  • Statistical language modeling: the task of
    estimating a probability distribution that captures
    the statistical regularities of natural language
  • Applied to a number of problems: speech, machine
    translation, IR, summarization

7
Statistical Language Modeling based Approaches
Background
[Diagram labels: User information need, Ideal Document,
Query Formulation Model, Query (example: Lemur)]
Given a query, which document is most likely to be the
ideal document?
$P(Q \mid D) = P(q_1 \ldots q_n \mid D) = \prod_{i=1}^{n} P(q_i \mid D)$
In spite of the progress, not much work has been done to
capture, model, and integrate user context!
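
The query-likelihood ranking on this slide can be illustrated with a minimal sketch. It assumes a unigram document model with add-one smoothing; the smoothing choice, the function name `query_log_likelihood`, and the toy documents are illustrative assumptions, not taken from the slides.

```python
import math
from collections import Counter


def query_log_likelihood(query, document, vocab_size):
    """Return log P(Q|D) = sum_i log P(q_i|D) under a unigram model.

    Add-one smoothing is an assumption made here so that a query word
    missing from the document does not zero out the whole product.
    """
    counts = Counter(document)
    total = len(document)
    score = 0.0
    for word in query:
        p = (counts[word] + 1) / (total + vocab_size)
        score += math.log(p)
    return score


# Toy collection (hypothetical text, for illustration only).
docs = {
    "doc_a": "lemur encyclopedia brief description animal".split(),
    "doc_b": "lemur toolkit language modeling information retrieval".split(),
}
vocab = {w for tokens in docs.values() for w in tokens}
query = "lemur retrieval".split()

# Rank documents by descending log P(Q|D).
ranked = sorted(
    docs,
    key=lambda d: query_log_likelihood(query, docs[d], len(vocab)),
    reverse=True,
)
```

Working in log space avoids numerical underflow when the product runs over many query words.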
8
Noisy Channel based Approach: Motivation

[Diagram labels: Ideal Document, Query Generation Process
(Noisy Channel), Retrieval]
9
Similar to Statistical Machine Translation
  • Given an English sentence, translate it into French
  • Given a query, retrieve documents closer to the ideal
    document

Noisy channel 1: English sentence, French sentence, $P(e \mid f)$
Noisy channel 2: Ideal document, Query, $P(q \mid w)$
10
Learning User Profile
  • User profile as a translation model
  • Triples $(q_w, d_w, p(q_w \mid d_w))$: a query word, a
    document word, and the probability of the query
    word given the document word
  • Use Statistical Machine Translation methods
  • Learning a user profile = training a translation
    model
  • In SMT, a translation model is trained
  • From parallel texts
  • Using the EM algorithm
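
To make the EM training step concrete, here is a minimal sketch in the style of IBM Model 1, estimating p(qw | dw) from (query, document text) pairs. It is an illustration under simplifying assumptions (no NULL word, uniform initialisation, a fixed number of iterations); the function name `train_model1` and the toy pairs are hypothetical and not the authors' implementation.

```python
from collections import defaultdict


def train_model1(pairs, iterations=10):
    """Estimate p(qw | dw) with IBM Model 1 style EM.

    pairs: list of (query_words, document_words) tuples.
    Returns a dict mapping (qw, dw) -> probability.
    """
    query_vocab = {qw for q, _ in pairs for qw in q}
    # Uniform initialisation over the query vocabulary.
    t = defaultdict(lambda: 1.0 / len(query_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected (qw, dw) alignment counts
        total = defaultdict(float)  # expected counts per document word
        for q, d in pairs:
            for qw in q:
                # E-step: spread qw's unit mass over the document words.
                norm = sum(t[(qw, dw)] for dw in d)
                for dw in d:
                    frac = t[(qw, dw)] / norm
                    count[(qw, dw)] += frac
                    total[dw] += frac
        # M-step: renormalise the expected counts per document word.
        t = defaultdict(
            float,
            {(qw, dw): c / total[dw] for (qw, dw), c in count.items()},
        )
    return dict(t)


# Toy training pairs: (query words, snippet words).
toy_pairs = [
    (["lemur", "retrieval"], ["lemur", "toolkit", "retrieval"]),
    (["lemur"], ["lemur", "toolkit"]),
]
profile = train_model1(toy_pairs, iterations=10)
```

The returned dictionary has the same shape as the profile triples on the slide: each key is a (query word, document word) pair and each value is the estimated probability.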

11
Learning User Profile
  • Extracting parallel texts
  • From queries and the corresponding snippets of
    clicked documents
  • Training a translation model
  • GIZA++, an open source toolkit widely used for
    training translation models in Statistical
    Machine Translation research
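
The extraction of parallel texts might look like the sketch below, which pairs each query with the snippet of a clicked document and emits two line-aligned lists. The record keys `"query"` and `"snippet"`, the tokenizer, and the function names are assumptions for illustration; the slides do not describe the log format or the preprocessing.

```python
import re


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (an assumed scheme)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_parallel_corpus(click_log):
    """Turn click records into two line-aligned lists of sentences.

    click_log: iterable of dicts with hypothetical keys "query" and "snippet".
    Returns (snippet_lines, query_lines); line i of each list forms one pair.
    """
    snippet_lines, query_lines = [], []
    for record in click_log:
        query = tokenize(record["query"])
        snippet = tokenize(record["snippet"])
        # Skip records where either side is empty after tokenization.
        if query and snippet:
            snippet_lines.append(" ".join(snippet))
            query_lines.append(" ".join(query))
    return snippet_lines, query_lines


log = [
    {"query": "Lemur toolkit",
     "snippet": "The Lemur toolkit for language modeling and information retrieval."},
]
snippet_lines, query_lines = build_parallel_corpus(log)
```

Writing each list to its own file, one sentence per line, gives the line-aligned layout that alignment toolkits generally expect as a starting point.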

12
Sample user profile
13
Reranking
  • Recall the general LM approach for IR
  • Noisy channel based approach
  • $P(Q \mid D) = \prod_{i=1}^{n} P(q_i \mid D)$

Example: query "lemur", profile entry $P(\text{lemur} \mid \text{retrieval})$
  • Terms: "Lemur encyclopedia brief" and "Lemur toolkit
    information retrieval"
  • Snippet: "Lemur - Encyclopedia gives a brief description of
    the physical traits of this animal."
  • Snippet: "The Lemur toolkit for language modeling and
    information retrieval is documented and made
    available for download."
  • Documents labelled in the figure: D1, D4
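
One way to use a learned profile for reranking is sketched below. It assumes a translation-based query likelihood, P(q | D) = sum over document words w of p(q | w) P(w | D), with P(w | D) taken as the relative frequency of w in D. The slides do not spell out the exact scoring formula, so this form, the `floor` back-off value, the function names, and the toy profile probabilities are all assumptions.

```python
import math
from collections import Counter


def translation_score(query, document, profile, floor=1e-6):
    """log P(Q|D) with P(q|D) = sum_w p(q|w) * P(w|D).

    profile maps (query_word, document_word) -> p(qw | dw).
    floor is an assumed back-off probability for query words the
    profile cannot explain, so the logarithm stays finite.
    """
    counts = Counter(document)
    length = len(document) or 1  # guard against empty documents
    score = 0.0
    for qw in query:
        p = sum(profile.get((qw, dw), 0.0) * c / length
                for dw, c in counts.items())
        score += math.log(max(p, floor))
    return score


def rerank(query, results, profile):
    """Sort (doc_id, tokens) pairs by descending translation score."""
    return sorted(results,
                  key=lambda r: translation_score(query, r[1], profile),
                  reverse=True)


# Hypothetical profile of a user interested in information retrieval.
toy_profile = {
    ("lemur", "retrieval"): 0.4,
    ("lemur", "toolkit"): 0.3,
    ("lemur", "lemur"): 0.2,
    ("lemur", "animal"): 0.01,
}
results = [
    ("D1", "lemur encyclopedia brief description physical traits animal".split()),
    ("D4", "lemur toolkit language modeling information retrieval".split()),
]
reranked = rerank(["lemur"], results, toy_profile)
```

With this toy profile the toolkit document moves ahead of the encyclopedia entry for the ambiguous query "lemur", which is the behaviour the example on the slide is meant to convey.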
14
Experiments
  • Performed evaluation on explicit feedback data
    collected from 7 users
  • Experiments
  • Comparison with Contextless Ranking
  • Comparison between different training models and
    contexts

15

Data and Set up
  • Data
  • Explicit feedback data collected from 7 users
  • For each query, each user examined the top 10
    documents and identified the relevant ones
  • Collected the top 10 results for all queries;
    3469 documents in total
  • Set up
  • Created a Lucene index over the 3469 documents
  • For reranking, first retrieve the results using
    Lucene and then rerank them using the noisy
    channel approach
  • We perform 10-fold cross-validation
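
A 10-fold split can be sketched as follows. The slides do not say how the folds were formed (per user, per query, or otherwise), so the round-robin assignment and the function name `k_fold_splits` are assumptions for illustration.

```python
def k_fold_splits(items, k=10):
    """Yield (train, test) lists; each item lands in exactly one test fold."""
    # Round-robin assignment: item i goes to fold i mod k.
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


# Hypothetical query identifiers for one user.
queries = [f"q{i}" for i in range(25)]
splits = list(k_fold_splits(queries, k=10))
```

In each round a profile would be learned from `train` and the reranker evaluated on `test`, with the reported score averaged over the rounds.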

16
Data
17
Metrics
  • Precision@n
  • (Number of relevant documents in the top n) / n
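
A minimal sketch of the metric, assuming relevance judgments are available as a set of document identifiers (the function name and the sample identifiers are illustrative):

```python
def precision_at_n(ranked_ids, relevant_ids, n):
    """Fraction of the top-n ranked documents that are relevant."""
    if n <= 0:
        raise ValueError("n must be positive")
    top = ranked_ids[:n]
    hits = sum(1 for doc_id in top if doc_id in relevant_ids)
    # Divide by n even when fewer than n results were returned.
    return hits / n


ranking = ["D4", "D1", "D7"]
relevant = {"D4", "D7"}
p_at_2 = precision_at_n(ranking, relevant, 2)
```

Dividing by n rather than by the number of returned results follows the definition on the slide and penalises short result lists.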

18
Set up
[Diagram labels: Data, Train Data, Test Data, User Profile
Learner, User Profiles, Reranker, Reranked Results]
19
(No Transcript)
20
Results
21
Results
  • I - Document Training and Document Testing
  • II - Document Training and Snippet Testing
  • III - Snippet Training and Document Testing
  • IV - Snippet Training and Snippet Testing
22
Conclusions and Future Work
  • Proposed a statistical MT based approach for
    user modeling
  • Captures richer context: relations between q and
    w
  • Future work
  • N-gram based methods: trigrams, etc.
  • Noisy channel based method: bigrams

23
  • Questions?

24
  • Thank you
