Databases and Information Retrieval: Rethinking the Great Divide

1
Databases and Information Retrieval: Rethinking the Great Divide

SIGMOD Panel
14 Jun 2005
Jayavel Shanmugasundaram
Cornell University

2
10,000-Foot View of Data Management

[Figure: two-axis chart. Queries axis: Ranked Keyword Search, Complex and Structured. Data axis: Structured, Unstructured. Systems plotted: Information Retrieval Systems, Database Systems.]

3
Bridging the Great Divide

Option 1: Tie together existing DB and IR systems
  Example: Approaches based on SQL/MM
Option 2: Extend existing DB systems with IR functionality, or vice versa
  Example: Add searching and ranking to RDBMSs
Option 3: Design a new data management system from the ground up
  Example: Quark data management system

4
Why Option 1 Won't Work

[Figure: two-axis chart. Queries axis: Ranked Keyword Search, Complex and Structured. Data axis: Structured, Unstructured. Systems plotted: Information Retrieval Systems, Database Systems.]

5
Bridging the Great Divide

Option 1: Tie together existing DB and IR systems
  Example: Approaches based on SQL/MM
  Drawback: Not powerful enough
Option 2: Extend existing DB systems with IR functionality, or vice versa
  Example: Add searching and ranking to RDBMSs
Option 3: Design a new data management system from the ground up
  Example: Quark data management system

6
<workshop date="28 July 2000">
  <title> XML and Information Retrieval: A SIGIR 2000 Workshop </title>
  <editors> David Carmel, Yoelle Maarek, Aya Soffer </editors>
  <proceedings>
    <paper id="1">
      <title> XQL and Proximal Nodes </title>
      <author> Ricardo Baeza-Yates </author>
      <author> Gonzalo Navarro </author>
      <abstract> We consider the recently proposed language </abstract>
      <section name="Introduction">
        Searching on structured text is becoming more important with XML
      </section>
      <cite xmlns:xlink="http://www.acm.org/www8/paper/xmlql"> </cite>
    </paper>

Find relevant elements in important workshops between the years 1999 and 2001 that are about Ricardo and XML.
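
The query above mixes a structured condition (a year range on the workshop) with a keyword condition ("Ricardo" and "XML") and leaves open which element to return. A minimal Python sketch of one possible reading follows. Everything in it is an assumption for illustration: the closing tags that complete the fragment, the omission of the cite element, the helper names, plain substring matching, and the policy of returning the most specific matching elements. "Important" is not modeled at all.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed completion of the slide's fragment:
# closing tags are added and the <cite> element is left out.
DOC = """
<workshop date="28 July 2000">
  <title>XML and Information Retrieval: A SIGIR 2000 Workshop</title>
  <editors>David Carmel, Yoelle Maarek, Aya Soffer</editors>
  <proceedings>
    <paper id="1">
      <title>XQL and Proximal Nodes</title>
      <author>Ricardo Baeza-Yates</author>
      <author>Gonzalo Navarro</author>
      <abstract>We consider the recently proposed language</abstract>
      <section name="Introduction">Searching on structured text is becoming more important with XML</section>
    </paper>
  </proceedings>
</workshop>
"""

def text_of(elem):
    """All text in the element's subtree, lowercased."""
    return " ".join(elem.itertext()).lower()

def year_of(workshop):
    """Structured part: take the year as the last token of the date attribute."""
    return int(workshop.get("date").split()[-1])

def most_specific_matches(root, keywords):
    """Keyword part: elements whose subtree contains every keyword,
    skipping any element that has a child which also contains them all."""
    def contains_all(elem):
        text = text_of(elem)
        return all(k.lower() in text for k in keywords)

    hits = []
    for elem in root.iter():
        if contains_all(elem) and not any(contains_all(child) for child in elem):
            hits.append(elem)
    return hits

root = ET.fromstring(DOC)
results = []
if 1999 <= year_of(root) <= 2001:
    results = most_specific_matches(root, ["Ricardo", "XML"])

print([elem.tag for elem in results])  # ['paper']
```

Under this policy the paper is returned, because no single child of the paper contains both keywords. A different policy (for example, returning the enclosing workshop) would be equally defensible, which is the ambiguity the query illustrates.
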
7
Why Extending (R)DBMSs Won't Work

Violates many assumptions hardwired into current database systems
  Structured queries over structured fields, keyword search queries over text fields
    Is author name a structured or text field?
  Operators have precise, well-defined semantics
    Even the query result is not well-defined: do we return a paper or a workshop?
  Scoring is tacked on as a relational attribute
    How can this scoring generalize IR scoring?
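
To make the "structured or text field" question concrete, here is a small sketch using Python's built-in sqlite3 module. The table name, columns, and the 0/1 score are hypothetical and chosen only for illustration; the author and title values come from the XML example, and the year is taken from the workshop date. The same author column is queried once as a structured field (exact equality) and once as a text field (substring containment), with the score computed as just another column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per (author, paper) pair.
conn.execute(
    "CREATE TABLE paper_author (id INTEGER PRIMARY KEY, author TEXT, title TEXT, year INTEGER)"
)
conn.executemany(
    "INSERT INTO paper_author (author, title, year) VALUES (?, ?, ?)",
    [
        ("Ricardo Baeza-Yates", "XQL and Proximal Nodes", 2000),
        ("Gonzalo Navarro", "XQL and Proximal Nodes", 2000),
    ],
)

# Structured view of author: exact equality, so the keyword "Ricardo" matches nothing.
exact = conn.execute(
    "SELECT id FROM paper_author WHERE author = ?", ("Ricardo",)
).fetchall()

# Text view of author: substring containment. The "score" is a crude 0/1 value
# computed as an ordinary column and used only for ordering.
ranked = conn.execute(
    """
    SELECT author, (instr(lower(author), lower(?)) > 0) AS score
    FROM paper_author
    WHERE year BETWEEN 1999 AND 2001
    ORDER BY score DESC, author
    """,
    ("ricardo",),
).fetchall()

print(exact)   # []
print(ranked)  # [('Ricardo Baeza-Yates', 1), ('Gonzalo Navarro', 0)]
```

The score here is local to one row and one column. It has no notion of term statistics or of how a match in one element should contribute to the score of an enclosing element, which is the generalization the slide asks about.
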

8
Why Extending IR Systems Won't Work

IR systems provide little support for structured data
  No support for complex operators
    How can complex queries be evaluated?
  Scoring does not take structure into account
    How can scoring capture both structured and unstructured data?
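
As a rough illustration of the limitation, the sketch below builds a toy inverted index in Python over flattened text. The document ids, the whitespace tokenizer, and the function names are assumptions made for this example and do not describe any particular IR system. Once the workshop and the paper are flattened into bags of tokens, the keyword part of a query is easy to answer, but the year range and the paper-in-workshop relationship are no longer expressible.

```python
from collections import defaultdict

# Hypothetical flattened documents: element structure is discarded.
docs = {
    "paper-1": "XQL and Proximal Nodes Ricardo Baeza-Yates Gonzalo Navarro "
               "Searching on structured text is becoming more important with XML",
    "workshop": "28 July 2000 XML and Information Retrieval A SIGIR 2000 Workshop",
}

def tokenize(text):
    """Lowercase and split on whitespace (no stemming or punctuation handling)."""
    return text.lower().split()

# Inverted index: token -> set of document ids containing it.
index = defaultdict(set)
for doc_id, text in docs.items():
    for token in tokenize(text):
        index[token].add(doc_id)

def keyword_and(*terms):
    """Documents that contain every term."""
    postings = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*postings) if postings else set()

hits = keyword_and("Ricardo", "XML")  # keyword search works
year_hits = keyword_and("2000")       # the year is only a literal token

print(hits)       # {'paper-1'}
print(year_hits)  # {'workshop'}
```

The index can match the token "2000" but cannot evaluate "between 1999 and 2001", and nothing records that paper-1 belongs to the workshop dated 2000.
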

9
Bridging the Great Divide

Option 1: Tie together existing DB and IR systems
  Example: Approaches based on SQL/MM
  Drawback: Not powerful enough
Option 2: Extend existing DB systems with IR functionality, or vice versa
  Example: Add searching and ranking to RDBMSs
  Drawback: Shoehorns alien functionality into already complex systems
Option 3: Design a new data management system from the ground up
  Example: Quark data management system

10
Why Option 3 Will Work

Designed from the ground up around three principles
  Structural data independence
    Users can issue any query (complex and keyword) over any data (structured and unstructured)
  Generalized scoring
    Scoring works over any mix of structured and unstructured data (e.g., XRank over HTML and XML)
  Flexible query language
    Allows for arbitrary return results and scores (e.g., TeXQuery, the precursor to XQuery Full-Text; NEXI)
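
One way to picture scoring that takes structure into account is to score every element, letting a keyword match count for less the deeper it sits below the element being scored. The Python sketch below is a simplified illustration under stated assumptions: the decay factor, the function names, and the sum-with-AND combination are hypothetical and are not taken from XRank or Quark.

```python
import xml.etree.ElementTree as ET

DECAY = 0.5  # hypothetical per-level decay factor

def keyword_score(elem, keyword):
    """Best decayed score for one keyword within elem's subtree.
    A match in elem's own text scores 1.0; each level of nesting
    below elem multiplies the score by DECAY."""
    own_text = (elem.text or "").lower()
    best = 1.0 if keyword.lower() in own_text else 0.0
    for child in elem:
        best = max(best, DECAY * keyword_score(child, keyword))
    return best

def score(elem, keywords):
    """Sum of per-keyword scores, or 0.0 if any keyword is missing."""
    parts = [keyword_score(elem, k) for k in keywords]
    return sum(parts) if all(p > 0 for p in parts) else 0.0

# Trimmed-down version of the workshop example, used only as test data.
doc = ET.fromstring(
    "<workshop><title>XML and Information Retrieval</title>"
    "<proceedings><paper><author>Ricardo Baeza-Yates</author>"
    "<section>Searching on structured text with XML</section></paper>"
    "</proceedings></workshop>"
)

ranked = sorted(
    ((score(elem, ["Ricardo", "XML"]), elem.tag) for elem in doc.iter()),
    reverse=True,
)
print(ranked[:3])  # [(1.0, 'paper'), (0.625, 'workshop'), (0.5, 'proceedings')]
```

Every element gets a score, so the ranking decides whether a paper or a workshop is returned first. In this example the workshop outranks the proceedings because its own title also mentions XML.
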

11
Bridging the Great Divide