Title: Research in IR at MS
1. Research in IR at MS
- Microsoft Research (http://research.microsoft.com)
- Adaptive Systems and Interaction - IR/UI
- Machine Learning and Applied Statistics
- Data Mining
- Natural Language Processing
- Collaboration and Education
- MSR Cambridge
- MSR Beijing
- Microsoft Product Groups - many IR-related
2. IR Research at MSR
- Improvements to representations and matching algorithms
- New directions
- User modeling
- Domain modeling
- Interactive interfaces
- An example: Text categorization
3. Traditional View of IR
4. What's Missing?
5. Domain/Object Modeling
- Not all objects are equal - potentially big win
- A priori importance
- Information use (read wear, collaborative filtering)
- Inter-object relationships
- Link structure / hypertext
- Subject categories - e.g., text categorization, text clustering
- Metadata
- E.g., reliability, recency, cost -> combining
6. User/Task Modeling
- Demographics
- Task -- What's the user's goal?
- e.g., Lumiere
- Short- and long-term content interests
- e.g., Implicit queries
- Interest model = f(content similarity, time, interest)
- e.g., Letizia, WebWatcher, Fab
7. Information Use
- Beyond the batch IR model (query -> results)
- Consider the larger task context
- Knowledge management
- Human attention is a critical resource
- Techniques for automatic information summarization, organization, discovery, filtering, mining, etc.
- Advanced UIs and interaction techniques
- E.g., tight coupling of search and browsing to support information management
8. The Broader View of IR
User Modeling
Domain Modeling
Information Use
9. Text Categorization Road Map
- Text Categorization Basics
- Inductive Learning Methods
- Reuters Results
- Future Plans
10. Text Categorization
- Text Categorization: assign objects to one or more of a predefined set of categories using text features
- Example Applications
- Sorting new items into existing structures (e.g., general ontologies, file folders, spam vs. not)
- Information routing/filtering/push
- Topic-specific processing
- Structured browsing and search
11. Text Categorization - Methods
- Human classifiers (e.g., Dewey, LCSH, MeSH, Yahoo!, CyberPatrol)
- Hand-crafted knowledge-engineered systems (e.g., CONSTRUE)
- Inductive learning of classifiers
- (Semi-)automatic classification
12. Classifiers
- A classifier is a function f(x) -> confidence(class)
- from attribute vectors, x = (x1, x2, ..., xd)
- to target values, confidence(class)
- Example classifiers
- if (interest AND rate) OR (quarterly), then confidence(interest) = 0.9
- confidence(interest) = 0.3*interest + 0.4*rate + 0.1*quarterly
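To make the two example classifiers concrete, here is a minimal sketch in Python, assuming binary word-presence features; the feature names and weights mirror the examples above and are purely illustrative.

```python
# Minimal sketch of the two example classifiers above, assuming binary
# word-presence features extracted from a document (names are illustrative).

def features(text):
    words = set(text.lower().split())
    return {w: int(w in words) for w in ("interest", "rate", "quarterly")}

def rule_classifier(x):
    # if (interest AND rate) OR (quarterly), then confidence(interest) = 0.9
    return 0.9 if (x["interest"] and x["rate"]) or x["quarterly"] else 0.0

def linear_classifier(x):
    # confidence(interest) = 0.3*interest + 0.4*rate + 0.1*quarterly
    return 0.3 * x["interest"] + 0.4 * x["rate"] + 0.1 * x["quarterly"]

x = features("The Bundesbank left its discount rate and interest policy unchanged")
print(rule_classifier(x), linear_classifier(x))
```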
13. Inductive Learning Methods
- Supervised learning to build classifiers
- Labeled training data (i.e., examples of each category)
- Learn classifier
- Test effectiveness on new instances
- Statistical guarantees of effectiveness
- Classifiers are easy to construct and update -- requires only subject knowledge
- Customizable for individuals, categories, and tasks
14. Text Representation
- Vector space representation of documents
- word1 word2 word3 word4 ...
- Doc 1: <1, 0, 3, 0, ...>
- Doc 2: <0, 1, 0, 0, ...>
- Doc 3: <0, 0, 0, 5, ...>
- Text can have 10^7 or more dimensions
- e.g., 100k web pages had 2.5 million distinct words
- Mostly use simple words, binary weights, and a subset of features
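As a hedged illustration of this representation (the talk does not name a specific toolkit), scikit-learn's CountVectorizer with binary=True produces exactly these 0/1 document-by-word vectors:

```python
# A minimal sketch of the vector-space representation above using
# scikit-learn (an assumption; not the indexing tools used in the talk).
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "bank raised the discount rate",
    "wheat and corn exports fell",
    "quarterly earnings rose sharply",
]

# binary=True gives the simple 0/1 word-presence weights described on the slide
vectorizer = CountVectorizer(binary=True)
X = vectorizer.fit_transform(docs)          # documents x vocabulary, sparse 0/1 matrix
print(vectorizer.get_feature_names_out())
print(X.toarray())
```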
15. Feature Selection
- Word distribution - remove frequent and infrequent words based on Zipf's law:
log(frequency) + log(rank) = constant
16. Feature Selection (cont'd)
- Fit to categories - use mutual information to select features which best discriminate category vs. not (see the sketch after this list)
- Designer features - domain specific, including non-text features
- Use 100-500 best features from this process as input to learning methods
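A rough sketch of mutual-information feature selection for a single binary category follows; it assumes documents are represented as sets of words, and the function names are illustrative rather than taken from the talk.

```python
# A rough sketch of mutual-information feature selection for one binary
# category; each doc is a set of words, labels are 0/1 (names illustrative).
import math

def mutual_information(docs, labels, word):
    """MI between presence of `word` and the category label."""
    n = len(docs)
    counts = {(p, c): 0 for p in (0, 1) for c in (0, 1)}
    for doc, c in zip(docs, labels):
        counts[(int(word in doc), c)] += 1
    mi = 0.0
    for (p, c), n_pc in counts.items():
        if n_pc == 0:
            continue
        p_pc = n_pc / n
        p_p = sum(counts[(p, cc)] for cc in (0, 1)) / n
        p_c = sum(counts[(pp, c)] for pp in (0, 1)) / n
        mi += p_pc * math.log(p_pc / (p_p * p_c))
    return mi

def select_features(docs, labels, vocabulary, k=300):
    # Keep the k highest-MI words (e.g., 100-500) as the feature set.
    scored = sorted(vocabulary, key=lambda w: mutual_information(docs, labels, w), reverse=True)
    return scored[:k]
```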
17. Inductive Learning Methods
- Find Similar
- Decision Trees
- Naïve Bayes
- Bayes Nets
- Support Vector Machines (SVMs)
- All support
- Probabilities - graded membership, comparability across categories
- Adaptive - over time, across individuals
18. Find Similar
- Aka relevance feedback
- Rocchio
- Classifier parameters are a weighted combination of the weights in positive and negative examples -- a centroid
- New items are classified by their similarity to this centroid
- Use all features, idf weights, ...
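A minimal sketch of this centroid-style classifier, assuming documents are already term-weight vectors (numpy arrays); the beta/gamma weights on the positive and negative components are illustrative defaults, not values from the talk.

```python
# Rocchio-style "Find Similar": build a centroid from positive and negative
# examples, then score new items by similarity to it.
import numpy as np

def rocchio_centroid(pos, neg, beta=1.0, gamma=0.5):
    """Weighted combination of positive and negative example vectors."""
    return beta * np.mean(pos, axis=0) - gamma * np.mean(neg, axis=0)

def cosine(a, b):
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

pos = np.array([[1, 0, 3, 0], [2, 0, 1, 0]], dtype=float)   # in-category docs
neg = np.array([[0, 1, 0, 4]], dtype=float)                 # out-of-category docs
centroid = rocchio_centroid(pos, neg)
print(cosine(np.array([1.0, 0, 2, 0]), centroid))           # score a new item
```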
19. Decision Trees
- Learn a sequence of tests on features, typically using top-down, greedy search
- Binary (yes/no) or continuous decisions
- (Diagram: a small tree that first tests f1 / !f1 and then f7 / !f7)
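As a hedged sketch (the talk does not specify an implementation), a decision tree over binary word features can be learned with scikit-learn's top-down greedy DecisionTreeClassifier:

```python
# Learn a small decision tree over binary text features (toy data; feature
# names are illustrative, not from the Reuters experiments).
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[1, 0, 1, 0],   # rows: documents, columns: word-presence features
     [1, 1, 0, 0],
     [0, 0, 0, 1],
     [0, 1, 0, 1]]
y = [1, 1, 0, 0]     # 1 = in category, 0 = not

tree = DecisionTreeClassifier(max_depth=3)   # top-down greedy splitting
tree.fit(X, y)
print(export_text(tree, feature_names=["interest", "rate", "prime", "wheat"]))
```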
20. Naïve Bayes
- Aka the binary independence model
- Maximize Pr(Class | Features)
- Assume features are conditionally independent - the math is easy and surprisingly effective
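A small sketch of this model using scikit-learn's BernoulliNB (an assumption; the slide does not name a library), which fits the binary independence model under the conditional-independence assumption:

```python
# Naive Bayes over binary word features (toy data, illustrative only).
from sklearn.naive_bayes import BernoulliNB

X = [[1, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 1]]   # binary word features
y = [1, 1, 0, 0]                                    # in category / not

nb = BernoulliNB()
nb.fit(X, y)
# Pr(Class | Features) under the conditional-independence assumption
print(nb.predict_proba([[1, 0, 0]]))
```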
21. Bayes Nets
- Maximize Pr(Class | Features)
- Does not assume independence of features - dependency modeling
22. Support Vector Machines
- Vapnik (1979)
- Binary classifiers that maximize the margin
- Find the hyperplane separating positive and negative examples
- Optimization for maximum margin
- Classify new items by which side of the hyperplane they fall on
23. Support Vector Machines
- Extendable to
- Non-separable problems (Cortes &amp; Vapnik, 1995)
- Non-linear classifiers (Boser et al., 1992)
- Good generalization performance
- Handwriting recognition (LeCun et al.)
- Face detection (Osuna et al.)
- Text classification (Joachims)
- Platt's Sequential Minimal Optimization (SMO) algorithm is very efficient
24. SVMs: Platt's SMO Algorithm
- SMO is a very fast way to train SVMs
- SMO works by analytically solving the smallest possible QP sub-problem
- Substantially faster than chunking
- Scales somewhere between linear and quadratic in the training set size
25. Text Classification Process
- (Pipeline diagram) text files -> Index Server -> word counts per file -> Feature selection -> data set -> Learning methods (Find Similar, Decision tree, Naïve Bayes, Bayes nets, Support vector machine) -> test classifier
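The pipeline above can be sketched end to end roughly as follows, using scikit-learn as a stand-in for the indexing and learning tools in the diagram; the toy documents and labels are purely illustrative.

```python
# Word counts -> feature selection -> linear SVM -> classify a new item.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

docs = ["bank raised the discount rate", "interest rates were left unchanged",
        "wheat and corn exports fell", "grain harvest beat expectations"]
labels = [1, 1, 0, 0]   # 1 = "interest", 0 = not

pipeline = make_pipeline(
    CountVectorizer(binary=True),            # word counts per file (binary weights)
    SelectKBest(mutual_info_classif, k=5),   # keep the best features
    LinearSVC(),                             # linear SVM classifier
)
pipeline.fit(docs, labels)
print(pipeline.predict(["the bundesbank left its discount rate unchanged"]))
```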
26. Reuters Data Set (21578 - ModApte split)
- 9603 training articles; 3299 test articles
- Example "interest" article
- 2-APR-1987 06:35:19.50
- west-germany
- b f BC-BUNDESBANK-LEAVES-CRE 04-02 0052
- FRANKFURT, March 2
- The Bundesbank left credit policies unchanged after today's regular meeting of its council, a spokesman said in answer to enquiries. The West German discount rate remains at 3.0 pct, and the Lombard emergency financing rate at 5.0 pct.
- REUTER
- Average article is about 200 words long
27. Reuters Data Set (21578 - ModApte split)
- 118 categories
- An article can be in more than one category
- Learn 118 binary category distinctions
- Most common categories (train, test)
- Trade (369, 119)
- Interest (347, 131)
- Ship (197, 89)
- Wheat (212, 71)
- Corn (182, 56)
- Earn (2877, 1087)
- Acquisitions (1650, 179)
- Money-fx (538, 179)
- Grain (433, 149)
- Crude (389, 189)
28. Category: interest
- (Decision tree diagram testing features such as rate, rate.t, lending, prime, discount, pct, and year)
29. Category: Interest
- -0.71 dlrs
- -0.35 world
- -0.33 sees
- -0.25 year
- -0.24 group
- -0.24 dlr
- -0.24 january
- 0.70 prime
- 0.67 rate
- 0.63 interest
- 0.60 rates
- 0.46 discount
- 0.43 bundesbank
- 0.43 baker
30. Accuracy Scores
- Based on a contingency table (a = true positives, b = false positives, c = false negatives, d = true negatives; n = a+b+c+d)
- Effectiveness measures for binary classification
- error rate = (b+c)/n
- accuracy = 1 - error rate
- precision (P) = a/(a+b)
- recall (R) = a/(a+c)
- break-even = (P+R)/2
- F measure = 2PR/(P+R)
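A tiny sketch computing the measures above from contingency-table counts (the example counts are made up):

```python
# Compute the accuracy scores above from contingency-table counts
# (a = true positives, b = false positives, c = false negatives, d = true negatives).
def scores(a, b, c, d):
    n = a + b + c + d
    error_rate = (b + c) / n
    precision = a / (a + b)
    recall = a / (a + c)
    return {
        "error rate": error_rate,
        "accuracy": 1 - error_rate,
        "precision": precision,
        "recall": recall,
        "break-even": (precision + recall) / 2,
        "F measure": 2 * precision * recall / (precision + recall),
    }

print(scores(a=90, b=10, c=20, d=880))
```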
31. Reuters - Accuracy ((R+P)/2)
Recall: % labeled in category among those stories that are really in category
Precision: % really in category among those stories labeled in category
Break Even: (Recall + Precision) / 2
32. ROC for Category - Grain
Recall: % labeled in category among those stories that are really in category
Precision: % really in category among those stories labeled in category
33. ROC for Category - Earn
34. ROC for Category - Acq
35. ROC for Category - Money-Fx
36. ROC for Category - Grain
37. ROC for Category - Crude
38. ROC for Category - Trade
39. ROC for Category - Interest
40. ROC for Category - Ship
41. ROC for Category - Wheat
42. ROC for Category - Corn
43. SVM: Dumais et al. vs. Joachims
- Top 10 Reuters categories, micro-averaged break-even point
44. Reuters - Sample Size (SVM)
- (Learning curves for two random samples: sample set 1, sample set 2)
45. Reuters - Other Experiments
- Simple words vs. NLP-derived phrases
- NLP-derived phrases
- factoids (April_8, Salomon_Brothers_International)
- multi-word dictionary entries (New_York, interest_rate)
- noun phrases (first_quarter, modest_growth)
- No advantage for Find Similar, Naïve Bayes, SVM
- Vary number of features
- Binary vs. 0/1/2 features
- No advantage of 0/1/2 for Decision Trees, SVM
46. Number of Features - NLP
47. Reuters Summary
- Accurate classifiers can be learned automatically from training examples
- Linear SVMs are efficient and provide very good classification accuracy
- Best results for this test collection
- Widely applicable, flexible, and adaptable representations
48. Text Classification Horizon
- Other applications
- OHSUMED, TREC, spam vs. not-spam, Web
- Text representation enhancements
- Use of hierarchical category structure
- UI for semi-automatic classification
- Dynamic interests
49More Information
- General stuff -http//research.microsoft.com/sdum
ais - SMO -http//research.microsoft.com/jplatt
51. Optimization Problem to Train SVMs
- Let the desired output of the ith example be +1 or -1
- Let the actual output be the value the classifier produces for that example
- Maximize the margin between the two classes
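The equations on this slide are not in the transcript; the following is a hedged reconstruction of the standard linear-SVM training problem it refers to, where y_i is the desired output, w·x_i - b the actual output, and the margin being maximized is proportional to 1/||w||.

```latex
% Hard-margin primal problem (separable case); y_i \in \{+1, -1\}.
\min_{\mathbf{w},\,b}\ \tfrac{1}{2}\lVert\mathbf{w}\rVert^{2}
\quad\text{subject to}\quad
y_i\,(\mathbf{w}\cdot\mathbf{x}_i - b)\ \ge\ 1 \quad \text{for all } i.
```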
52. Dual Quadratic Programming Problem
- Equivalent dual problem: Lagrangian saddle point
- One-to-one relationship between Lagrange multipliers α and training examples
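Likewise, the dual problem shown on this slide is not in the transcript; a hedged reconstruction of the standard dual QP is below, with one Lagrange multiplier α_i per training example (the box constraint C appears in the non-separable case).

```latex
% Dual QP solved by SMO; one \alpha_i per training example (x_i, y_i).
\max_{\boldsymbol{\alpha}}\ \sum_i \alpha_i
  - \tfrac{1}{2}\sum_{i}\sum_{j} \alpha_i\,\alpha_j\, y_i\, y_j\,(\mathbf{x}_i\cdot\mathbf{x}_j)
\quad\text{subject to}\quad
\sum_i \alpha_i\, y_i = 0,\qquad 0 \le \alpha_i \le C.
```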