1Conceptual structures in modern information retrieval
Claudio Carpineto
Fondazione Ugo Bordoni, Roma
carpinet_at_fub.it
2Overview
- Keyword-based IR and early conceptual approaches
- Context and concepts in modern topical IR
- Emerging IR tasks requiring knowledge structures
- Research at FUB
- Conclusions
3Vector-based IR
4Term weighting
- tf.idf and the vector space model (Salton) were very popular in the 70s and 80s
- BM25 (Robertson) was the state of the art in the 90s
- Several recent term-weighting functions are based on statistical language modeling (Ponte, Lafferty)
- A new weighting framework based on deviation from randomness and information gain (FUB, UG)
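As an illustration of the term-weighting functions listed above, here is a minimal sketch of BM25 scoring. It assumes whitespace-tokenized text, the commonly used defaults k1 = 1.2 and b = 0.75, and a smoothed idf variant that stays non-negative. The function name and the toy documents are hypothetical and do not come from the slides.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against a tokenized query with BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed idf (an assumption; several idf variants are in use).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Toy collection (illustrative only).
docs = [
    "concept lattices for information retrieval".split(),
    "term weighting in the vector space model".split(),
    "retrieval of xml documents".split(),
]
scores = bm25_scores("term weighting".split(), docs)
best = max(range(len(docs)), key=lambda i: scores[i])
```

Only the second document contains the query terms, so it is the only one with a non-zero score.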
6Inherent limitations of keyword-based IR
- Vocabulary problem
- Relations are ignored
7Early approaches to conceptual IR
- n-grams (Salton 1975, Maarek 1989)
- parse tree (Dillon 1983, Metzler 1989)
- case relations (Fillmore 1968, Somers 1987)
- conceptual graphs (Dick 1991)
8Why early conceptual IR was not successful
- No best representation scheme
- Manual coding too costly
- Automated coding too hard
- Training required both for the indexer and the user
- Effectiveness not clearly demonstrated
- Retrieval task often not appropriate
9Overview
- Vector-based IR and early conceptual approaches
- Context and concepts in modern topical IR
- Emerging IR tasks requiring knowledge structures
- Research at FUB
- Conclusions
10Evolution of topical IR
- Very short queries
- Heterogeneous collections
- Unreliable sources
- Interactive sessions
11Model of modern topical IR
13Performance of retrieval feedback versus query difficulty
14Ranking based on interdocument similarity
- Cluster hypothesis (van Rijsbergen 1978)
- Approaches
- - Matching the query against document clusters (Willett 1988)
- - Matching the query against transformed document representations (GVSM, Wong 1987; LSI, Deerwester 1990)
- - Computing the conceptual distance between query and documents (Order-theoretical ranking, Carpineto 2000)
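The first approach above, matching the query against document clusters, can be sketched as follows. This is a minimal illustration, assuming the clusters are already given, documents are bags of words, and each cluster is represented by the centroid of its term-count vectors. All names and the toy data are hypothetical, and this is not the specific method of any cited work.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroid(vectors):
    """Average of a list of term-count vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: w / len(vectors) for t, w in total.items()}

def rank_by_cluster(query, clusters):
    """Give every document the similarity of the query to its cluster centroid."""
    q = Counter(query.split())
    ranked = []
    for name, docs in clusters.items():
        sim = cosine(q, centroid([Counter(d.split()) for d in docs]))
        ranked.extend((sim, name, d) for d in docs)
    return sorted(ranked, key=lambda item: item[0], reverse=True)

# Toy, pre-computed clusters (illustrative only).
clusters = {
    "lattices": ["concept lattice browsing", "galois lattice ordering"],
    "weighting": ["term weighting bm25", "tf idf weighting"],
}
ranking = rank_by_cluster("concept browsing", clusters)
```

In this toy example "galois lattice ordering" shares no term with the query, yet it receives a positive score because its cluster does. This is the effect the cluster hypothesis aims for.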
15Order-theoretical ranking
16Performance of order-theoretical ranking
- Better than hierarchic clustering and comparable to best matching on the whole collection
- Markedly better than both hierarchic clustering and best matching on non-matching relevant documents
- Order-theoretical ranking does not scale up well, but it is synergistic with best-matching document ranking
17Overview
- Vector-based IR and early conceptual approaches
- Context and concepts in modern topical IR
- Emerging IR tasks requiring knowledge structures
- Research at FUB
- Conclusions
18Question Answering
Task: Closed-class questions in unrestricted domains, with no guarantee of an answer and results possibly scattered over multiple documents
19Question Answering
- Approach
- - Recognize the type of query
- - Retrieve relevant documents
- - Find sought entities near question words
- - Fall back to best-matching passage retrieval in case of failure
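The four steps above can be sketched in a few lines. This is a deliberately simplified illustration, not the system described in the talk. It assumes the question type is given by the first word, answer types are recognized with hypothetical regular expressions, and "near question words" is approximated by co-occurrence within the same passage. The passages are toy data.

```python
import re

# Hypothetical answer-type patterns keyed by question word (illustrative only).
ANSWER_PATTERNS = {
    "when": re.compile(r"\b(?:1[0-9]{3}|20[0-9]{2})\b"),     # a four-digit year
    "who": re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b"),  # a capitalized name
}
STOPWORDS = {"the", "a", "an", "is", "was", "of", "in", "by"}

def tokens(text):
    """Lower-cased set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))

def answer(question, passages):
    # Step 1: recognize the question type from its first word.
    words = re.findall(r"\w+", question.lower())
    qtype, keywords = words[0], set(words[1:]) - STOPWORDS
    # Step 2: rank passages by keyword overlap with the question.
    ranked = sorted(passages, key=lambda p: len(keywords & tokens(p)), reverse=True)
    # Step 3: look for an entity of the expected type in passages that
    # share at least one keyword with the question.
    pattern = ANSWER_PATTERNS.get(qtype)
    if pattern:
        for p in ranked:
            match = pattern.search(p) if keywords & tokens(p) else None
            if match:
                return match.group(0)
    # Step 4: fall back to the best-matching passage.
    return ranked[0]

# Toy passages (illustrative only).
passages = [
    "Fondazione Ugo Bordoni is based in Roma.",
    "The vector space model was proposed by Gerard Salton.",
    "Latent semantic indexing was introduced in 1990.",
]
when_answer = answer("When was latent semantic indexing introduced?", passages)
who_answer = answer("Who proposed the vector space model?", passages)
fallback = answer("Where is Fondazione Ugo Bordoni based?", passages)
```

The third call has no pattern for "where", so it returns the best-matching passage rather than an extracted entity.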
20Web Information Retrieval
21Web Information Retrieval
Current tasks: named-entity finding task, topic distillation task
- Approach
- - Use of multiple methods
- - Combination of results via interpolation and normalization schemes
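One simple way to combine the results of multiple methods is to normalize each method's scores and then interpolate them linearly. The sketch below assumes min-max normalization and fixed interpolation weights. The slides do not say which schemes were actually used, and the scores and names are hypothetical.

```python
def min_max(scores):
    """Rescale a {doc: score} mapping to the [0, 1] range."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

def interpolate(runs, weights):
    """Linearly combine several normalized runs into one ranking."""
    combined = {}
    for run, weight in zip(runs, weights):
        for doc, s in min_max(run).items():
            combined[doc] = combined.get(doc, 0.0) + weight * s
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)

# Toy scores from two hypothetical methods on different scales.
content_scores = {"a": 12.0, "b": 7.0, "c": 2.0}
link_scores = {"a": 0.1, "b": 0.9, "c": 0.5}
fused = interpolate([content_scores, link_scores], [0.7, 0.3])
```

Normalization matters here because the two methods produce scores on different scales. Without it, the content scores would dominate regardless of the weights.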
22XML document retrieval
Goal: Use document structure to improve precision and recall of unstructured queries, e.g. "concerts this weekend at Sofia under 20 euros"
- Approaches
- - Automatic inference of query structure
- - Semi-automatic query annotation
- - Hybrid query languages
23Overview
- Vector-based IR and early conceptual approaches
- Context and concepts in modern topical IR
- Emerging IR tasks requiring knowledge structures
- Research at FUB
- Conclusions
24Recommender systems
Related keyword feature versus context-dependent query reformulation
27Combining text retrieval and text mining with concept lattices
Goal: Integration of multiple search strategies (querying, browsing, thesaurus climbing, bounding) into a single Web interface
28Conclusions
The use of conceptual structures surfaces in traditional topic relevance retrieval, and it is at the heart of many non-topical retrieval tasks.
Towards conceptual search
- Understands term meaning
- Adapts to the user
- Can translate between applications
- Explainable
- Capable of filtering and summarization