Title: Information Retrieval and Databases: Synergies and Syntheses
1 Information Retrieval and Databases: Synergies and Syntheses
- IDM Workshop Panel
- 15 Sep 2003
- Jayavel Shanmugasundaram
- Cornell University
2 10,000-foot view of Data Management
[Figure: two-axis diagram. Queries axis: Ranked Keyword Search to Complex and Structured. Data axis: Structured to Unstructured. Information Retrieval Systems and Database Systems are placed on the diagram.]
4 Applications
- Information discovery over structured databases
- Keyword search over relational databases
- DBXplorer (Agrawal et al.)
- DISCOVER (Hristidis et al.)
- BANKS (Hulgeri et al.)
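Keyword search over a relational database can be illustrated with a minimal sketch. The schema, data, and scoring below are hypothetical and are not taken from DBXplorer, DISCOVER, or BANKS; those systems also connect matching rows across tables through foreign keys, which this sketch does not attempt. It only scans every table and ranks rows by how many keywords they contain.

```python
import sqlite3


def keyword_search(conn, keywords):
    """Rank rows from every table by how many of the keywords they contain."""
    terms = [k.lower() for k in keywords]
    # Materialize the table list first so the catalog cursor is finished.
    tables = [name for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    hits = []
    for table in tables:
        for row in conn.execute(f'SELECT * FROM "{table}"'):
            # Treat the whole row as one lowercase text string.
            text = " ".join(str(value).lower() for value in row)
            score = sum(term in text for term in terms)
            if score:
                hits.append((score, table, row))
    # Highest score first; the sort is stable, so ties keep scan order.
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits


def demo():
    # Hypothetical two-table schema, used only for illustration.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE paper (id INTEGER PRIMARY KEY, title TEXT,
                            author_id INTEGER);
        INSERT INTO author VALUES (1, 'Lovelace'), (2, 'Hopper');
        INSERT INTO paper VALUES
            (1, 'Notes on the engine by Lovelace', 1),
            (2, 'Compilers', 2);
    """)
    return keyword_search(conn, ["lovelace", "engine"])


if __name__ == "__main__":
    for score, table, row in demo():
        print(score, table, row)
```

The user supplies only keywords and needs no knowledge of the schema, which is the point of information discovery over structured data.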
7 Applications
- Content management
- Mix of structured and unstructured data
- Database with date and time of accident (structured data) and accident description (unstructured data)
- Semi-structured data
- Scientific documents, Shakespeare's plays, ...
- Support flexible ranked keyword search interface over such data
- XRANK (Guo et al., SIGMOD 2003)
- XIRQL (Fuhr et al., SIGIR 2001)
8 XML Keyword Search
<workshop date="28 July 2000">
  <title> XML and Information Retrieval: A SIGIR 2000 Workshop </title>
  <editors> David Carmel, Yoelle Maarek, Aya Soffer </editors>
  <proceedings>
    <paper id="1">
      <title> XQL and Proximal Nodes </title>
      <author> Ricardo Baeza-Yates </author>
      <author> Gonzalo Navarro </author>
      <abstract> We consider the recently proposed language </abstract>
      <section name="Introduction">
        Searching on structured text is becoming more important with XML
      </section>
      <cite xmlns:xlink="http://www.acm.org/www8/paper/xmlql"> </cite>
    </paper>
- Most specific results (exploits structure!)
- Ranking at granularity of elements
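The two bullets above can be sketched in a few lines. This is a minimal illustration, not the algorithm of XRANK or XIRQL: it returns the deepest elements whose text contains all keywords (the "most specific" results) and orders them by depth, which is a placeholder assumption standing in for a real ranking function. The sample document is an abbreviated, closed version of the fragment on this slide.

```python
import re
import xml.etree.ElementTree as ET

SAMPLE = """
<workshop date="28 July 2000">
  <title>XML and Information Retrieval: A SIGIR 2000 Workshop</title>
  <proceedings>
    <paper id="1">
      <title>XQL and Proximal Nodes</title>
      <abstract>We consider the recently proposed language</abstract>
      <section name="Introduction">
        Searching on structured text is becoming more important with XML
      </section>
    </paper>
  </proceedings>
</workshop>
"""


def words(element):
    """Lowercase word set of all text at or below this element."""
    return set(re.findall(r"\w+", " ".join(element.itertext()).lower()))


def most_specific(root, keywords):
    """Return (depth, element) pairs for the deepest elements that
    contain every keyword, deepest first."""
    terms = {k.lower() for k in keywords}
    results = []

    def visit(element, depth):
        if not terms <= words(element):
            return False
        # Visit every child (no short-circuit) so all matches are found.
        child_hits = [visit(child, depth + 1) for child in element]
        if not any(child_hits):
            # No descendant contains all keywords: this one is most specific.
            results.append((depth, element))
        return True

    visit(root, 0)
    results.sort(key=lambda pair: pair[0], reverse=True)
    return results


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    for depth, element in most_specific(root, ["xql", "language"]):
        print(depth, element.tag)
```

With the keywords "xql" and "language", the title and the abstract each hold only one term, so the enclosing paper element is returned rather than the whole workshop document.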
10 Applications
- The Internet is enabling end-users to directly ask queries and explore results
- E.g., used car marketplace
- Find all bright red Ford Mustangs that cost less than 20% of the average price of cars in their class
- Characteristics of queries
- Keyword search (for ease of use)
- Complex query operations (information synthesis)
- Want to see ranked results!
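The used car query combines a keyword condition, structured predicates, an aggregate, and an ordering. A rough sketch with SQLite follows; the table, column names, and data are hypothetical, the keyword match is a plain substring test, and ordering by price is only a stand-in for a real relevance ranking.

```python
import sqlite3

QUERY = """
SELECT c.id, c.price
FROM cars AS c
WHERE c.make = 'Ford'
  AND c.model = 'Mustang'
  AND lower(c.description) LIKE '%bright red%'      -- keyword part
  AND c.price < 0.2 * (SELECT AVG(price)            -- complex part
                       FROM cars
                       WHERE car_class = c.car_class)
ORDER BY c.price                                    -- stand-in ranking
"""


def find_bargains():
    # Hypothetical listings, used only for illustration.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE cars (id INTEGER PRIMARY KEY, make TEXT, model TEXT,
                           car_class TEXT, price INTEGER, description TEXT);
        INSERT INTO cars VALUES
            (1, 'Ford', 'Mustang', 'sports', 2000, 'Bright red, runs well'),
            (2, 'Ford', 'Mustang', 'sports', 30000, 'Bright red, like new'),
            (3, 'Chevrolet', 'Corvette', 'sports', 40000, 'Silver'),
            (4, 'Ford', 'Mustang', 'sports', 1000, 'Dull blue'),
            (5, 'Ford', 'Mustang', 'sports', 3000, 'Bright red, needs work');
    """)
    return conn.execute(QUERY).fetchall()


if __name__ == "__main__":
    print(find_bargains())
```

In this data the average price in the class is 15200, so the threshold is 3040: listings 1 and 5 qualify, while listing 4 is cheap enough but fails the keyword condition.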
11 Towards Unifying DB and IR
- No standard query language for both DB and IR
- SQL and XQuery are mostly database query languages
- Currently developing TeXQuery, a full-text search extension to XQuery
- With S. Amer-Yahia, C. Botev, J. Robie
- Full composability of database and IR primitives, ranking
- Submitted to W3C committee on full-text extensions to XQuery
12 Summary
- Applications have a mix of structured (DB domain) and unstructured (IR domain) data
- Stark difference in how they can be processed
- Benefits of unifying DB and IR
- Ranked keyword search (information discovery) over both structured and unstructured data
- Complex queries over structured/semi-structured data
- A truly unified data store
- Need to generalize DB and IR techniques