Research in IR at MS

1
Research in IR at MS
  • Microsoft Research (http://research.microsoft.com)
  • Adaptive Systems and Interaction - IR/UI
  • Machine Learning and Applied Statistics
  • Data Mining
  • Natural Language Processing
  • Collaboration and Education
  • MSR Cambridge
  • MSR Beijing
  • Microsoft Product Groups - many IR-related

2
IR Research at MSR
  • Improvements to representations and matching
    algorithms
  • New directions
  • User modeling
  • Domain modeling
  • Interactive interfaces
  • An example: Text categorization

3
Traditional View of IR

4
What's Missing?
5
Domain/Obj Modeling
  • Not all objects are equal - potentially big win
  • A priori importance
  • Information use (readware, collab filtering)
  • Inter-object relationships
  • Link structure / hypertext
  • Subject categories - e.g., text categorization,
    text clustering
  • Metadata
  • E.g., reliability, recency, cost -> combining

6
User/Task Modeling
  • Demographics
  • Task -- What's the user's goal?
  • e.g., Lumiere
  • Short and long-term content interests
  • e.g., Implicit queries
  • Interest model = f(content_similarity, time,
    interest)
  • e.g., Letizia, WebWatcher, Fab

7
Information Use
  • Beyond batch IR model (query -> results)
  • Consider larger task context
  • Knowledge management
  • Human attention is critical resource
  • Techniques for automatic information
    summarization, organization, discovery, filtering,
    mining, etc.
  • Advanced UIs and interaction techniques
  • E.g., tight coupling of search and browsing to
    support information management

8
The Broader View of IR
User Modeling
Domain Modeling
Information Use
9
Text Categorization Road Map
  • Text Categorization Basics
  • Inductive Learning Methods
  • Reuters Results
  • Future Plans

10
Text Categorization
  • Text Categorization: assign objects to one or
    more of a predefined set of categories using text
    features
  • Example Applications
  • Sorting new items into existing structures (e.g.,
    general ontologies, file folders, spam vs. not)
  • Information routing/filtering/push
  • Topic-specific processing
  • Structured browsing and search

11
Text Categorization - Methods
  • Human classifiers (e.g., Dewey, LCSH, MeSH,
    Yahoo!, CyberPatrol)
  • Hand-crafted knowledge engineered systems (e.g.,
    CONSTRUE)
  • Inductive learning of classifiers
  • (Semi-) automatic classification

12
Classifiers
  • A classifier is a function f(x) = conf(class)
  • from attribute vectors, x = (x1, x2, ..., xd)
  • to target values, confidence(class)
  • Example classifiers (sketched in code below)
  • if (interest AND rate) OR (quarterly),
    then confidence(interest) = 0.9
  • confidence(interest) = 0.3*interest + 0.4*rate +
    0.1*quarterly
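  A minimal Python sketch of the two example classifiers above, applied
  to binary word-presence features (the weights and rules are the ones
  shown on the slide; the helper names are illustrative):

    def rule_classifier(x):
        # if (interest AND rate) OR (quarterly), then confidence(interest) = 0.9
        if (x.get("interest") and x.get("rate")) or x.get("quarterly"):
            return 0.9
        return 0.0

    def linear_classifier(x):
        # confidence(interest) = 0.3*interest + 0.4*rate + 0.1*quarterly
        weights = {"interest": 0.3, "rate": 0.4, "quarterly": 0.1}
        return sum(w * x.get(term, 0) for term, w in weights.items())

    doc = {"interest": 1, "rate": 1, "quarterly": 0}
    print(rule_classifier(doc), linear_classifier(doc))  # 0.9 and ~0.7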

13
Inductive Learning Methods
  • Supervised learning to build classifiers
  • Labeled training data (i.e., examples of each
    category)
  • Learn classifier
  • Test effectiveness on new instances
  • Statistical guarantees of effectiveness
  • Classifiers easy to construct and update -
    requires only subject knowledge
  • Customizable for individuals, categories, and
    tasks

14
Text Representation
  • Vector space representation of documents
  • word1 word2 word3 word4 ...
  • Doc 1 = <1, 0, 3, 0, ...>
  • Doc 2 = <0, 1, 0, 0, ...>
  • Doc 3 = <0, 0, 0, 5, ...>
  • Text can have 10^7 or more dimensions
  • e.g., 100k web pages had 2.5 million distinct
    words
  • Mostly use simple words, binary weights, subset
    of features (see the sketch below)
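  A minimal sketch of the representation described above: sparse binary
  word-presence vectors over a fixed feature vocabulary (the vocabulary
  and document here are illustrative):

    def to_binary_vector(text, vocabulary):
        # Binary weights: 1 if the term occurs in the document, else 0.
        words = set(text.lower().split())
        return [1 if term in words else 0 for term in vocabulary]

    vocab = ["rate", "discount", "bundesbank", "wheat"]
    print(to_binary_vector("The Bundesbank left the discount rate unchanged", vocab))
    # [1, 1, 1, 0]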

15
Feature Selection
  • Word distribution - remove frequent and
    infrequent words based on Zipf's law:
    log(frequency) + log(rank) = constant
    (a pruning sketch follows below)
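  A sketch of the frequency-based pruning this implies: drop terms that
  are too common or too rare before further selection (the cutoff values
  are illustrative assumptions, not from the slides):

    from collections import Counter

    def prune_by_frequency(docs, min_df=3, max_df_ratio=0.5):
        # Document frequency of each term across the collection.
        df = Counter()
        for doc in docs:
            df.update(set(doc.lower().split()))
        n = len(docs)
        # Keep terms that are neither very infrequent nor very frequent.
        return {t for t, c in df.items() if c >= min_df and c / n <= max_df_ratio}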

16
Feature Selection (cont'd)
  • Fit to categories - use mutual information to
    select features which best discriminate category
    vs. not (see the scoring sketch below)
  • Designer features - domain specific, including
    non-text features
  • Use 100-500 best features from this process as
    input to learning methods
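  A hedged sketch of mutual-information scoring for a single term against
  one category (the smoothing constants are assumptions added to avoid
  log(0) on sparse counts); scoring every candidate term this way and
  keeping the top 100-500 matches the process above:

    import math

    def mutual_information(term, doc_sets, labels):
        # doc_sets: list of sets of terms; labels: 1 if the document is
        # in the category, 0 otherwise.
        n = len(doc_sets)
        counts = {(t, c): 0.0 for t in (0, 1) for c in (0, 1)}
        for doc, label in zip(doc_sets, labels):
            counts[(1 if term in doc else 0, label)] += 1
        mi = 0.0
        for t in (0, 1):
            for c in (0, 1):
                p_tc = (counts[(t, c)] + 0.25) / (n + 1.0)
                p_t = (counts[(t, 0)] + counts[(t, 1)] + 0.5) / (n + 1.0)
                p_c = (counts[(0, c)] + counts[(1, c)] + 0.5) / (n + 1.0)
                mi += p_tc * math.log(p_tc / (p_t * p_c))
        return mi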

17
Inductive Learning Methods
  • Find Similar
  • Decision Trees
  • Naïve Bayes
  • Bayes Nets
  • Support Vector Machines (SVMs)
  • All support
  • Probabilities - graded membership,
    comparability across categories
  • Adaptive - over time, across individuals

18
Find Similar
  • Aka, relevance feedback
  • Rocchio
  • Classifier parameters are a weighted combination
    of weights in positive and negative examples --
    centroid
  • New items classified using similarity to the
    centroid (see the sketch below)
  • Use all features, idf weights, ...
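  A minimal sketch of the Find Similar classifier described above: build
  a centroid from positive and negative training vectors and score new
  items by inner-product similarity to it (the beta/gamma weighting is an
  illustrative assumption):

    def rocchio_centroid(pos_vectors, neg_vectors, beta=1.0, gamma=0.5):
        # Weighted combination of positive and negative example weights.
        dims = len(pos_vectors[0])
        centroid = [0.0] * dims
        for v in pos_vectors:
            for i in range(dims):
                centroid[i] += beta * v[i] / len(pos_vectors)
        for v in neg_vectors:
            for i in range(dims):
                centroid[i] -= gamma * v[i] / len(neg_vectors)
        return centroid

    def find_similar_score(centroid, x):
        # Confidence for a new item = similarity to the centroid.
        return sum(c * xi for c, xi in zip(centroid, x))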

19
Decision Trees
  • Learn a sequence of tests on features, typically
    using top-down, greedy search
  • Binary (yes/no) or continuous decisions

(Tree diagram: binary tests on features f1 and f7, with yes/no branches;
a classification sketch follows below)
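  A tiny sketch of applying a learned sequence of binary tests like the
  one diagrammed above (the leaf outcomes are illustrative, not taken
  from the slide):

    def tree_classify(x):
        if x.get("f1"):          # test f1
            if x.get("f7"):      # then test f7
                return 1         # predicted in category
            return 0             # !f7 branch
        return 0                 # !f1 branch

    print(tree_classify({"f1": 1, "f7": 1}))  # 1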
20
Naïve Bayes
  • Aka, binary independence model
  • Maximize Pr(Class | Features)
  • Assume features are conditionally independent -
    math is easy, and it is surprisingly effective
    (see the sketch below)
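  A hedged sketch of the binary independence model: estimate per-class
  term-presence probabilities from training data, then score a new
  document by its log odds under the conditional-independence assumption
  (the Laplace smoothing is an added assumption):

    import math

    def train_nb(doc_sets, labels, vocabulary):
        pos = [d for d, y in zip(doc_sets, labels) if y == 1]
        neg = [d for d, y in zip(doc_sets, labels) if y == 0]

        def cond_probs(docs):
            # Pr(term present | class), smoothed.
            return {t: (sum(t in d for d in docs) + 1.0) / (len(docs) + 2.0)
                    for t in vocabulary}

        prior = (len(pos) + 1.0) / (len(doc_sets) + 2.0)
        return prior, cond_probs(pos), cond_probs(neg)

    def nb_log_odds(doc_set, model, vocabulary):
        # log Pr(class | features) - log Pr(not class | features),
        # treating each feature as independent given the class.
        prior, p_pos, p_neg = model
        score = math.log(prior / (1.0 - prior))
        for t in vocabulary:
            p1, p0 = (p_pos[t], p_neg[t]) if t in doc_set else (1 - p_pos[t], 1 - p_neg[t])
            score += math.log(p1 / p0)
        return score  # > 0 means the class is more probable than not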

21
Bayes Nets
  • Maximize Pr(Class | Features)
  • Does not assume independence of features -
    dependency modeling

22
Support Vector Machines
  • Vapnik (1979)
  • Binary classifiers that maximize margin
  • Find hyperplane separating positive and negative
    examples
  • Optimization for maximum margin
  • Classify new items using the sign of the
    separating hyperplane function (see the sketch
    below)
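  Once the maximum-margin hyperplane has been found, classifying a new
  item is just a thresholded linear function; a minimal sketch (with w
  and b standing for the learned weights and threshold):

    def svm_classify(w, b, x):
        # The separating hyperplane is w . x - b = 0; label new items by
        # which side of it they fall on.
        activation = sum(wi * xi for wi, xi in zip(w, x)) - b
        return (1 if activation >= 0 else -1), activation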

23
Support Vector Machines
  • Extendable to
  • Non-separable problems (Cortes & Vapnik, 1995)
  • Non-linear classifiers (Boser et al., 1992)
  • Good generalization performance
  • Handwriting recognition (LeCun et al.)
  • Face detection (Osuna et al.)
  • Text classification (Joachims)
  • Platt's Sequential Minimal Optimization (SMO)
    algorithm is very efficient

24
SVMs: Platt's SMO Algorithm
  • SMO is a very fast way to train SVMs
  • SMO works by analytically solving the smallest
    possible QP sub-problem (the core update is
    sketched below)
  • Substantially faster than chunking
  • Scales somewhere between linear and quadratic in
    the training set size
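  A hedged sketch of the analytic two-multiplier update at the core of
  SMO, in the standard form from Platt's description (the symbols E_i and
  eta come from that formulation, not from these slides). For a chosen
  pair of examples with prediction errors E_i = u_i - y_i:

    \eta = K(x_1, x_1) + K(x_2, x_2) - 2 K(x_1, x_2)
    \alpha_2^{new} = \alpha_2 + \frac{y_2 (E_1 - E_2)}{\eta},
        \text{ then clipped to } [L, H]
    \alpha_1^{new} = \alpha_1 + y_1 y_2 (\alpha_2 - \alpha_2^{new})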

25
Text Classification Process
text files
Index Server
word counts per file
Find similar
Feature selection
data set
Learning Methods
Support vector machine
Decision tree
Naïve Bayes
Bayes nets
test classifier
26
Reuters Data Set (21578 - ModApte split)
  • 9,603 training articles; 3,299 test articles
  • Example "interest" article:
  • 2-APR-1987 06:35:19.50
  • west-germany
  • b f BC-BUNDESBANK-LEAVES-CRE 04-02 0052
  • FRANKFURT, March 2
  • The Bundesbank left credit policies unchanged
    after today's regular meeting of its council, a
    spokesman said in answer to enquiries. The West
    German discount rate remains at 3.0 pct, and the
    Lombard emergency financing rate at 5.0 pct.
  • REUTER
  • Average article 200 words long

27
Reuters Data Set (21578 - ModApte split)
  • 118 categories
  • An article can be in more than one category
  • Learn 118 binary category distinctions
  • Most common categories (train, test)
  • Trade (369,119)
  • Interest (347, 131)
  • Ship (197, 89)
  • Wheat (212, 71)
  • Corn (182, 56)
  • Earn (2877, 1087)
  • Acquisitions (1650, 179)
  • Money-fx (538, 179)
  • Grain (433, 149)
  • Crude (389, 189)

28
Category: interest
(Decision tree for the category, branching on word features such as
rate, rate.t, lending, prime, discount, pct, and year, with 0/1
outcomes at each test)
29
Category Interest
  • Example SVM features (weight, term):
  • -0.71 dlrs
  • -0.35 world
  • -0.33 sees
  • -0.25 year
  • -0.24 group
  • -0.24 dlr
  • -0.24 january
  • 0.70 prime
  • 0.67 rate
  • 0.63 interest
  • 0.60 rates
  • 0.46 discount
  • 0.43 bundesbank
  • 0.43 baker

30
Accuracy Scores
  • Based on a contingency table (a = true positives,
    b = false positives, c = false negatives,
    d = true negatives)
  • Effectiveness measures for binary classification
  • error rate = (b + c) / n
  • accuracy = 1 - error rate
  • precision (P) = a / (a + b)
  • recall (R) = a / (a + c)
  • break-even = (P + R) / 2
  • F measure = 2PR / (P + R)
    (computed in the sketch below)
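  A small sketch computing the measures above directly from the four
  contingency-table counts:

    def effectiveness(a, b, c, d):
        # a = true positives, b = false positives,
        # c = false negatives, d = true negatives.
        n = a + b + c + d
        precision = a / (a + b)
        recall = a / (a + c)
        return {
            "error rate": (b + c) / n,
            "accuracy": (a + d) / n,
            "precision": precision,
            "recall": recall,
            "break-even": (precision + recall) / 2,
            "F": 2 * precision * recall / (precision + recall),
        }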

31
Reuters - Accuracy ((R+P)/2)
Recall: labeled in category among those stories
that are really in category
Precision: really in category among those
stories labeled in category
Break Even: (Recall + Precision) / 2
32
ROC for Category - Grain
33
ROC for Category - Earn
34
ROC for Category - Acq
35
ROC for Category - Money-Fx
36
ROC for Category - Grain
37
ROC for Category - Crude
38
ROC for Category - Trade
39
ROC for Category - Interest
40
ROC for Category - Ship
41
ROC for Category - Wheat
42
ROC for Category - Corn
43
SVM: Dumais et al. vs. Joachims
  • Top 10 Reuters categories, micro-averaged
    break-even point

44
Reuters - Sample Size (SVM)
(Chart: results for two training sample sets)
45
Reuters - Other Experiments
  • Simple words vs. NLP-derived phrases
  • NLP-derived phrases
  • factoids (April_8, Salomon_Brothers_International)
  • multi-word dictionary entries (New_York,
    interest_rate)
  • noun phrases (first_quarter, modest_growth)
  • No advantage for Find Similar, Naïve Bayes, SVM
  • Vary number of features
  • Binary vs. 0/1/2 features
  • No advantage of 0/1/2 for Decision Trees, SVM

46
Number of Features - NLP
47
Reuters Summary
  • Accurate classifiers can be learned automatically
    from training examples
  • Linear SVMs are efficient and provide very good
    classification accuracy
  • Best results for this test collection
  • Widely applicable, flexible, and adaptable
    representations

48
Text Classification Horizon
  • Other applications
  • OHSUMED, TREC, spam vs. not-spam, Web
  • Text representation enhancements
  • Use of hierarchical category structure
  • UI for semi-automatic classification
  • Dynamic interests

49
More Information
  • General stuff - http://research.microsoft.com/~sdumais
  • SMO - http://research.microsoft.com/~jplatt

50
(No Transcript)
51
Optimization Problem to Train SVMs
  • Let the desired output of the ith example be +1 or -1
  • Let the actual output be the SVM's output on that
    example
  • Margin (a standard reconstruction of the full
    optimization problem follows below)
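  A standard reconstruction of the optimization problem referred to above
  (hedged; the slide's original equations are not in the transcript).
  Writing u_i = w \cdot x_i - b for the actual output on example i, the
  margin of a correctly classified example is y_i u_i > 0, and training
  maximizes the margin by solving:

    \min_{w, b} \; \tfrac{1}{2} \|w\|^2
    \quad \text{subject to} \quad y_i (w \cdot x_i - b) \ge 1 \;\; \forall i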

52
Dual Quadratic Programming Problem
  • Equivalent dual problem: Lagrangian saddle point
  • One-to-one relationship between Lagrange
    multipliers αi and training examples
    (a standard form of the dual is given below)
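  A standard form of the dual problem referred to above (a hedged
  reconstruction; C is the soft-margin penalty from the non-separable
  extension):

    \max_{\alpha} \; \sum_i \alpha_i
        - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j K(x_i, x_j)
    \quad \text{subject to} \quad 0 \le \alpha_i \le C \;\; \forall i,
        \qquad \sum_i \alpha_i y_i = 0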