Title: Information Filtering
1. Information Filtering
- Evaluation of Filtering Systems
- IEEE Paper Contest Fall 2002
2.
- Introduction to information filtering
- What is filtering
- Other information-seeking processes
- Paradigms
- Profile modeling
- Evaluation of filtering systems
- Privacy in filtering systems
3. Other information-seeking processes
4. Filtering vs. Retrieval
[Figure: Filtering, Retrieval, and the Grand Challenge positioned by information-source change rate and information-need change rate]
5. Three subtasks of filtering
- Collection
  - Active
  - Passive
- Selection
- Display
  - Interactive
  - Non-interactive
[Diagram: Collection -> Selection -> Display]
6. Two paradigms of filtering systems
- Content-based
  - SIFT
  - InfoScope
- Social
  - Tapestry
  - Uses a client/server mechanism to generate a ranked list
  - GroupLens
  - Chicken-and-egg problem
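7. A typical filtering system
[Figure: an information need in the user interest space I is mapped by p to a profile in the representation space R; a document in the document space D is mapped by d to a representation in R. The comparison function c scores a profile against a representation in [0,1]; human judgment j scores an information need against a document in [0,1].]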
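The mappings in this model can be sketched in Python. This is only an illustration: it assumes a bag-of-words representation space and cosine similarity as the comparison function, neither of which the slide specifies. The names `represent`, `build_profile`, and `compare` and the sample texts are hypothetical stand-ins for d, p, and c.

```python
import math
from collections import Counter

def represent(text):
    """d: map a document to a bag-of-words vector (assumed representation)."""
    return Counter(text.lower().split())

def build_profile(need):
    """p: map a stated information need to a profile in the same space."""
    return represent(need)

def compare(profile, doc_vec):
    """c: cosine similarity, which lies in [0, 1] for non-negative counts."""
    dot = sum(weight * doc_vec[term] for term, weight in profile.items())
    norm_profile = math.sqrt(sum(w * w for w in profile.values()))
    norm_doc = math.sqrt(sum(w * w for w in doc_vec.values()))
    if norm_profile == 0 or norm_doc == 0:
        return 0.0
    return dot / (norm_profile * norm_doc)

profile = build_profile("information filtering evaluation")
on_topic = compare(profile, represent("evaluation of information filtering systems"))
off_topic = compare(profile, represent("recipes for summer salads"))
```

A filtering system would then select a document when the score exceeds some threshold; the human judgment j is what such a score is ultimately evaluated against.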
8. User modeling and machine learning
- User model
  - Explicit (like SIFT)
  - Implicit (in machine learning)
- User's behavior
- Elements of the environment
- Evidence of user's behavior
  - Explicit feedback
  - Implicit feedback (InfoScope)
9. Sources of implicit evidence about users' interests
- Read/Ignored
- Saved/Deleted
- Replied or not
- Reading time
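One way to turn these signals into a single interest estimate is a weighted sum. The sketch below is an assumption for illustration only: the field names, the weights, and the 60-second reading-time cap are hypothetical and are not taken from InfoScope or any other system named in these slides.

```python
from dataclasses import dataclass

@dataclass
class Observation:
    read: bool              # read (True) or ignored (False)
    saved: bool             # saved (True) or deleted (False)
    replied: bool           # replied or not
    reading_seconds: float  # time spent reading

def implicit_interest(obs, max_seconds=60.0):
    """Combine implicit signals into a score in [0, 1] (hypothetical weights)."""
    if not obs.read:
        return 0.0  # an ignored item gives no positive evidence
    time_signal = min(obs.reading_seconds, max_seconds) / max_seconds
    return 0.2 + 0.3 * obs.saved + 0.2 * obs.replied + 0.3 * time_signal

ignored = implicit_interest(Observation(False, False, False, 0.0))
skimmed = implicit_interest(Observation(True, False, False, 6.0))
engaged = implicit_interest(Observation(True, True, True, 90.0))
```

In a learning system these scores would serve as training labels in place of explicit relevance feedback.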
10. Machine learning approaches
- Rule induction
- Instance based
- Statistical classification
- Neural networks
- Genetic algorithms
- and more
11. Evaluation strategies
- Precision and recall
- Problems
  - Recall needs the total number of relevant documents.
  - Precision alone does not tell the whole story.
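For reference, both measures can be computed from the set of retrieved documents and the set of relevant ones. This is a minimal sketch with hypothetical document IDs. Note that recall needs the full relevant set, which is exactly the difficulty named above.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two sets of document IDs."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d2", "d4", "d7", "d8", "d9"}
precision, recall = precision_recall(retrieved, relevant)
```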
12. Utility functions
- Linear utility functions
  - $\mathrm{LF1} = 3R - 2N$ (retrieve if $p(\mathrm{rel}) > 0.4$)
  - $\mathrm{LF2} = 3R - N$ (retrieve if $p(\mathrm{rel}) > 0.25$)
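The probability thresholds follow from the expected gain of retrieving one more document. Reading the two functions as 3R - 2N and 3R - N, with R the number of relevant and N the number of non-relevant documents retrieved, retrieval pays off for LF1 when 3p - 2(1 - p) > 0, that is p > 0.4, and for LF2 when 3p - (1 - p) > 0, that is p > 0.25. A small sketch, with hypothetical function names and counts:

```python
def linear_utility(rel_retrieved, nonrel_retrieved, credit, penalty):
    """Linear utility: credit * R - penalty * N."""
    return credit * rel_retrieved - penalty * nonrel_retrieved

def retrieval_threshold(credit, penalty):
    """p(rel) at which the expected gain of retrieving a document is zero.

    Solves credit * p - penalty * (1 - p) = 0 for p.
    """
    return penalty / (credit + penalty)

lf1 = linear_utility(10, 5, credit=3, penalty=2)   # 3*10 - 2*5
lf2 = linear_utility(10, 5, credit=3, penalty=1)   # 3*10 - 1*5
t1 = retrieval_threshold(3, 2)
t2 = retrieval_threshold(3, 1)
```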
13. Major problems
- The average will be dominated by topics with large retrieved sets.
- It is difficult to compare performance across topics.
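A small numeric illustration of the first problem, using hypothetical per-topic counts and assuming the linear utility 3R - 2N: one topic with a very large retrieved set swamps the mean over all topics.

```python
from statistics import mean

# (relevant retrieved, non-relevant retrieved) per topic; hypothetical counts
topics = {
    "topic_a": (4, 1),
    "topic_b": (3, 2),
    "topic_c": (5, 2),
    "topic_d": (100, 400),   # one topic with a very large retrieved set
}

def utility(rel, nonrel):
    """Assumed linear utility 3R - 2N."""
    return 3 * rel - 2 * nonrel

scores = {name: utility(r, n) for name, (r, n) in topics.items()}
average_all = mean(list(scores.values()))
average_small = mean([s for name, s in scores.items() if name != "topic_d"])
```

Three of the four topics score positively, yet the overall average is strongly negative because of `topic_d` alone.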
14. Solutions
- Nonlinear utility functions
  - $\mathrm{NF1} = 6R^{0.5} - N$
  - $\mathrm{NF2} = 6R^{0.8} - N$
- Scaling
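Reading the two nonlinear functions as 6R^0.5 - N and 6R^0.8 - N (R relevant and N non-relevant documents retrieved), the credit for relevant documents grows more slowly than the penalty for non-relevant ones. The sketch below uses hypothetical counts to show the effect: multiplying both counts by 100 does not multiply the score by 100.

```python
def nonlinear_utility(rel_retrieved, nonrel_retrieved, exponent):
    """Nonlinear utility 6 * R**exponent - N."""
    return 6 * rel_retrieved ** exponent - nonrel_retrieved

nf1_small = nonlinear_utility(4, 2, exponent=0.5)       # 6*2 - 2
nf1_large = nonlinear_utility(400, 200, exponent=0.5)   # 6*20 - 200
```

This damps the influence of topics with large retrieved sets, at the price of the inconsistencies discussed on the following slides.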
15. Scaling
- Divide by the maximum utility score for each topic
- Problems
  - It is flawed by negative scores.
  - It is inconsistent with precision and recall.
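A sketch of this simple scaling under two assumptions that the slide does not state: the utility is the linear 3R - 2N, and the maximum for a topic is reached by retrieving every relevant document and nothing else. The counts are hypothetical.

```python
def lf1(rel, nonrel):
    """Assumed linear utility 3R - 2N."""
    return 3 * rel - 2 * nonrel

def scaled_utility(rel, nonrel, total_relevant):
    """Divide a topic's utility by the maximum attainable on that topic.

    Assumes total_relevant > 0.
    """
    max_utility = lf1(total_relevant, 0)   # every relevant doc, no noise
    return lf1(rel, nonrel) / max_utility

perfect = scaled_utility(10, 0, total_relevant=10)
noisy = scaled_utility(2, 50, total_relevant=10)
```

The scaled score is capped at 1 from above but has no lower bound, which is the negative-score flaw noted above.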
16.
- Suppose we have two systems X and Y where
- Precision(X) > Precision(Y)
- Recall(X) > Recall(Y)
- If U(X) and U(Y) are negative, or we use a nonlinear utility, we can have
- U(X) < U(Y)
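Both cases can be checked with small hypothetical counts. The sketch assumes the linear utility 3R - 2N and the nonlinear utility 6*sqrt(R) - N, and a topic with 200 relevant documents in total; each system is given as a pair (R, N).

```python
import math

def precision(rel, nonrel):
    return rel / (rel + nonrel)

def recall(rel, total_relevant):
    return rel / total_relevant

def lf1(rel, nonrel):
    """Assumed linear utility 3R - 2N."""
    return 3 * rel - 2 * nonrel

def nf1(rel, nonrel):
    """Assumed nonlinear utility 6 * sqrt(R) - N."""
    return 6 * math.sqrt(rel) - nonrel

TOTAL_RELEVANT = 200  # hypothetical

# Case 1: linear utility, both scores negative.
x1, y1 = (4, 26), (1, 9)
# Case 2: nonlinear utility, both scores positive.
x2, y2 = (100, 45), (25, 12)

case1_flipped = (precision(*x1) > precision(*y1)
                 and recall(x1[0], TOTAL_RELEVANT) > recall(y1[0], TOTAL_RELEVANT)
                 and lf1(*x1) < lf1(*y1))
case2_flipped = (precision(*x2) > precision(*y2)
                 and recall(x2[0], TOTAL_RELEVANT) > recall(y2[0], TOTAL_RELEVANT)
                 and nf1(*x2) < nf1(*y2))
```

In both cases X beats Y on precision and recall yet receives the lower utility.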
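17. A more sophisticated formula
- $U_s(S,T) = \dfrac{\max\bigl(U(S,T),\,U(s)\bigr) - U(s)}{\max U(T) - U(s)}$
  - $U(S,T)$ is the utility of system $S$ on topic $T$, $U(s)$ is the utility of retrieving $s$ non-relevant documents, and $\max U(T)$ is the maximum possible utility for topic $T$.
- Problem
  - The evaluation is highly dependent on the value of $s$.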
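The dependence on the floor parameter can be seen numerically. The sketch below takes the floor term as U(s), the utility of retrieving s non-relevant documents and nothing else, and assumes the linear utility 3R - 2N, so that U(s) = -2s and the maximum for a topic is 3 times its number of relevant documents. These are assumptions for illustration, and the counts are hypothetical.

```python
def scaled_utility(raw_utility, s, total_relevant):
    """(max(U(S,T), U(s)) - U(s)) / (max U(T) - U(s)) under the assumptions above."""
    floor = -2 * s                    # U(s): s non-relevant documents retrieved
    ceiling = 3 * total_relevant      # max U(T): all relevant, nothing else
    return (max(raw_utility, floor) - floor) / (ceiling - floor)

raw = 3 * 2 - 2 * 20                  # a system with R = 2, N = 20
with_s_10 = scaled_utility(raw, s=10, total_relevant=10)
with_s_50 = scaled_utility(raw, s=50, total_relevant=10)
```

The same run scores 0 with s = 10 and about 0.51 with s = 50, so the choice of s largely determines how a weak system is ranked.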
18. TREC-9: resorting to the good old friend
- Precision-oriented function
- $\mathrm{T9P} = \dfrac{\text{relevant retrieved docs}}{\max(\text{target},\ \text{retrieved docs})}$
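A direct transcription of this measure, as a sketch. The default target of 50 documents is an assumption, as are the example counts.

```python
def t9p(relevant_retrieved, retrieved, target=50):
    """Precision-oriented measure: relevant retrieved / max(target, retrieved)."""
    return relevant_retrieved / max(target, retrieved)

under_target = t9p(relevant_retrieved=20, retrieved=25)   # denominator is 50
over_target = t9p(relevant_retrieved=40, retrieved=80)    # denominator is 80
```

Below the target the measure penalizes a system for retrieving too little; above it, the measure reduces to ordinary precision.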
19. Privacy
- Privacy becomes an issue when a system collects information about its users.
- It is important in both commercial and personal applications.
20. Privacy in content-based filtering
- Preventing unauthorized access to profiles
  - Password
  - Encryption
- Preventing reconstruction of useful information about the user profile
  - Traffic analysis problem
21. Privacy in social filtering
- Using pseudonyms
- Encrypted transmission of annotations to authorized users
22. Resources
- A Conceptual Framework for Text Filtering
  - Douglas W. Oard, Gary Marchionini
- Information Filtering and Information Retrieval: Two Sides of the Same Coin?
  - Nicholas J. Belkin, W. Bruce Croft
- The TREC-7 Filtering Track Final Report
- The TREC-8 Filtering Track Final Report
  - David A. Hull, Stephen Robertson
- The TREC-9 Filtering Track Final Report
  - Ellen M. Voorhees
23. The End
Ershad Rahimikia M.S. Makarem