Databases and Information Retrieval: Rethinking the Great Divide

Provided by: jayavelsha

1
Databases and Information Retrieval: Rethinking the Great Divide
  • SIGMOD Panel
  • 14 Jun 2005
  • Jayavel Shanmugasundaram
  • Cornell University

2
10,000-Foot View of Data Management
[Figure: Database Systems and Information Retrieval Systems plotted by query type (ranked keyword search vs. complex and structured) and by data type (structured vs. unstructured).]

3
Bridging the Great Divide
  • Option 1: Tie together existing DB and IR systems
  • Example: Approaches based on SQL/MM
  • Option 2: Extend existing DB systems with IR
    functionality, or vice versa
  • Example: Add searching and ranking to RDBMSs
  • Option 3: Design a new data management system
    from the ground up
  • Example: Quark data management system

4
Why Option 1 Won't Work
[Figure: the same query-type vs. data-type diagram of Database Systems and Information Retrieval Systems as on slide 2.]

5
Bridging the Great Divide
  • Option 1: Tie together existing DB and IR systems
  • Example: Approaches based on SQL/MM
  • Drawback: Not powerful enough
  • Option 2: Extend existing DB systems with IR
    functionality, or vice versa
  • Example: Add searching and ranking to RDBMSs
  • Option 3: Design a new data management system
    from the ground up
  • Example: Quark data management system

6
<workshop date="28 July 2000">
  <title> XML and Information Retrieval: A SIGIR 2000 Workshop </title>
  <editors> David Carmel, Yoelle Maarek, Aya Soffer </editors>
  <proceedings>
    <paper id="1">
      <title> XQL and Proximal Nodes </title>
      <author> Ricardo Baeza-Yates </author>
      <author> Gonzalo Navarro </author>
      <abstract> We consider the recently proposed language </abstract>
      <section name="Introduction">
        Searching on structured text is becoming more important with XML
      </section>
      <cite xmlns:xlink="http://www.acm.org/www8/paper/xmlql"> </cite>
    </paper>
Query: Find relevant elements in important workshops between the years 1999 and 2001 that are about "Ricardo" and "XML"

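The query on this slide can be approximated in a few lines of Python. This is only a sketch under stated assumptions: the fragment is closed off so that it parses, keywords are matched as case-insensitive substrings of element text, "important" is not modeled at all, and the helper names (contains_all, most_specific_matches) are hypothetical. It returns the most specific elements that contain every keyword, which is one possible reading of "relevant elements".

```python
import xml.etree.ElementTree as ET

# Well-formed version of the slide's fragment (closing tags added for parsing).
DOC = """
<workshop date="28 July 2000">
  <title>XML and Information Retrieval: A SIGIR 2000 Workshop</title>
  <editors>David Carmel, Yoelle Maarek, Aya Soffer</editors>
  <proceedings>
    <paper id="1">
      <title>XQL and Proximal Nodes</title>
      <author>Ricardo Baeza-Yates</author>
      <author>Gonzalo Navarro</author>
      <abstract>We consider the recently proposed language</abstract>
      <section name="Introduction">Searching on structured text is becoming more important with XML</section>
    </paper>
  </proceedings>
</workshop>
"""

def contains_all(elem, keywords):
    """True if the text of elem's whole subtree mentions every keyword."""
    text = " ".join(elem.itertext()).lower()
    return all(k.lower() in text for k in keywords)

def most_specific_matches(root, keywords):
    """Elements that contain all keywords while none of their children do."""
    matches = []
    for elem in root.iter():
        if contains_all(elem, keywords) and not any(
            contains_all(child, keywords) for child in elem
        ):
            matches.append(elem)
    return matches

root = ET.fromstring(DOC)

# Structured predicate: the workshop year must lie between 1999 and 2001.
year = int(root.get("date").split()[-1])
results = most_specific_matches(root, ["Ricardo", "XML"]) if 1999 <= year <= 2001 else []

result_tags = [e.tag for e in results]
print(result_tags)  # ['paper']
```

No single leaf element mentions both keywords, so the answer is the enclosing paper rather than the workshop or an author. The result granularity is decided by the data, not fixed in advance by a schema.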
7
Why Extending (R)DBMSs Won't Work
  • Violates many assumptions hardwired into
    current database systems
  • Structured queries over structured fields,
    keyword search queries over text fields
  • Is author name a structured or text field?
  • Operators have precise, well-defined semantics
  • Even the query result is not well-defined: do we
    return a paper or a workshop?
  • Scoring is tacked on as a relational
    attribute
  • How can this scoring generalize IR scoring?
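To make these objections concrete, here is a minimal sketch that uses Python's built-in sqlite3 module. The table layout, the rows (only row 1 is taken from the earlier slide; rows 2 and 3 are invented for illustration), and the ad hoc score expression are all assumptions, not anything the slides prescribe.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE paper (
    id INTEGER PRIMARY KEY,
    workshop_year INTEGER,  -- structured field
    author TEXT,            -- structured or text? the schema has to pick one
    body TEXT               -- text field
);
-- Row 1 follows the slide's example; rows 2 and 3 are hypothetical.
INSERT INTO paper VALUES
    (1, 2000, 'Ricardo Baeza-Yates',
     'Searching on structured text is becoming more important with XML'),
    (2, 2000, 'Gonzalo Navarro', 'Ricardo and I discuss XML query languages'),
    (3, 2003, 'Ricardo Baeza-Yates', 'XML retrieval');
""")

# The query must spell out which columns may hold each keyword, and the
# "score" is just another computed column (a count of matching predicates).
QUERY = """
SELECT id,
       (author LIKE '%Ricardo%') + (body LIKE '%Ricardo%') + (body LIKE '%XML%')
           AS score
FROM paper
WHERE workshop_year BETWEEN 1999 AND 2001
  AND (author LIKE '%Ricardo%' OR body LIKE '%Ricardo%')
  AND body LIKE '%XML%'
ORDER BY score DESC, id
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # [(1, 2), (2, 2)]
```

The result can only ever be paper rows, because that is the table being queried. The score cannot tell a paper written by Ricardo from one that merely mentions him. Both limitations come from bolting keyword search onto a fixed relational schema.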

8
Why Extending IR Systems Won't Work
  • IR systems provide little support for structured
    data
  • No support for complex operators
  • How can complex queries be evaluated?
  • Scoring does not take structure into account
  • How can scoring capture both structured and
    unstructured data?
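The points above can be seen in a toy inverted index with TF-IDF style scoring. This is a sketch and not how any particular IR engine works: the documents are flattened strings (paper1 is assembled from the earlier slide's text, paper2 is hypothetical), tokenization is whitespace splitting, and the IDF formula is one common variant picked for brevity.

```python
import math
from collections import Counter, defaultdict

# Each document is a flat bag of words; element boundaries are gone.
DOCS = {
    "paper1": "XQL and Proximal Nodes Ricardo Baeza-Yates Gonzalo Navarro "
              "Searching on structured text is becoming more important with XML",
    "paper2": "A hypothetical second paper about relational query optimization",
}

def tokenize(text):
    return text.lower().split()

def build_index(docs):
    """Map each term to a Counter of {doc_id: term frequency}."""
    index = defaultdict(Counter)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term][doc_id] += 1
    return index

def search(index, n_docs, query):
    """Rank documents by the sum of tf * idf over the query terms."""
    scores = Counter()
    for term in tokenize(query):
        postings = index.get(term)
        if not postings:
            continue
        idf = math.log(1 + n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    return scores.most_common()

index = build_index(DOCS)
ranking = search(index, len(DOCS), "Ricardo XML")
print(ranking)
```

The index has no way to express "workshops between 1999 and 2001", to tell an author from a body term, or to return a section instead of a whole document. Those are the gaps in structured-data support that this slide lists.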

9
Bridging the Great Divide
  • Option 1: Tie together existing DB and IR systems
  • Example: Approaches based on SQL/MM
  • Drawback: Not powerful enough
  • Option 2: Extend existing DB systems with IR
    functionality, or vice versa
  • Example: Add searching and ranking to RDBMSs
  • Drawback: Shoehorns alien functionality into
    already complex systems
  • Option 3: Design a new data management system
    from the ground up
  • Example: Quark data management system

10
Why Option 3 Will Work
  • Designed from the ground up around three principles
  • Structural data independence
  • Users can issue any query (complex and keyword)
    over any data (structured and unstructured)
  • Generalized scoring
  • Scoring works over any mix of structured and
    unstructured data (e.g., XRank over HTML and XML)
  • Flexible query language
  • Allows for arbitrary return results and scores
    (e.g., TeXQuery, precursor to XQuery Full-Text,
    NEXI)
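One way to picture generalized scoring is a score that decays with nesting depth, so that a flat page and a deeply structured document are handled by the same function. The sketch below is loosely inspired by that idea only. It is not XRank's actual ranking formula, and the DECAY value, the function names, and the two sample documents are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

DECAY = 0.5  # assumed value, chosen only for illustration

def keyword_score(elem, keyword, depth=0):
    """Best decayed score for one keyword anywhere in elem's subtree.

    A hit in elem's own leading text scores DECAY**0 = 1; a hit k levels
    further down scores DECAY**k. Tail text is ignored in this sketch.
    """
    own_text = (elem.text or "").lower()
    best = DECAY ** depth if keyword.lower() in own_text else 0.0
    for child in elem:
        best = max(best, keyword_score(child, keyword, depth + 1))
    return best

def score(elem, keywords):
    """Sum of per-keyword scores; zero unless every keyword occurs."""
    parts = [keyword_score(elem, k) for k in keywords]
    return sum(parts) if all(p > 0 for p in parts) else 0.0

# Structured data: the keywords sit one level below the returned element.
nested = ET.fromstring(
    "<paper><author>Ricardo Baeza-Yates</author>"
    "<section>Searching on structured text with XML</section></paper>"
)
# Unstructured data: a flat page, so this reduces to plain keyword matching.
flat = ET.fromstring("<html>Ricardo writes about XML</html>")

nested_score = score(nested, ["Ricardo", "XML"])
flat_score = score(flat, ["Ricardo", "XML"])
print(nested_score, flat_score)  # 1.0 2.0
```

The same score function runs over both inputs with no schema knowledge. On the flat page every hit is at depth 0, so the decay term never applies and the function behaves like ordinary conjunctive keyword scoring.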

11
Bridging the Great Divide
  • Option 1: Tie together existing DB and IR systems
  • Example: Approaches based on SQL/MM
  • Drawback: Not powerful enough
  • Option 2: Extend existing DB systems with IR
    functionality, or vice versa
  • Example: Add searching and ranking to RDBMSs
  • Drawback: Shoehorns alien functionality into
    already complex systems
  • Option 3: Design a new data management system
    from the ground up
  • Example: Quark data management system
  • Most promising alternative!