AnHai Doan - PowerPoint PPT Presentation

About This Presentation
Title:

AnHai Doan

Slides: 6
Provided by: zam37
Learn more at: https://dsf.berkeley.edu

Transcript and Presenter's Notes

1
Managing Unstructured Data
  • AnHai Doan
  • University of Wisconsin-Madison

2
Unstructured Data ...
  • Appears in many forms
    • emails, Web pages, memos, call-center text
      records, etc.
  • Is pervasive
    • 80% of the world's data, and is growing
  • Managed by many players
    • SIGIR/WWW/KDD/AAAI, Google/Yahoo/Microsoft/IBM

We should work on it, or risk missing the
boat! But what sets us apart from the above guys?
3
Structure + System Focus!
  • Make it very easy to extract structures from raw
    data
    • in raw form → keyword search / bag analysis
    • many apps want to go beyond that; they want
      structure
    • we should encourage this → back to our
      playground
    • not just DB + IR, but DB + IR + IE
  • Instead of working on isolated research problems,
    let's build an end-to-end UDMS
    • should repeat what we did with System R / Ingres:
      a system blueprint, followed by 20 years of rapid
      progress
    • unifies and accelerates our research efforts
    • keeps work grounded, makes an impact
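
The contrast in the first bullet of this slide, between handling data in raw form (keyword search over a bag of words) and extracting structure from it, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the sample email, the field names, and the regular expressions are hypothetical and do not come from the presentation, and a real extraction system would be far more involved.

```python
import re
from collections import Counter

# Hypothetical sample text, invented for this sketch.
EMAIL = (
    "From: alice@example.com\n"
    "Subject: Meeting moved\n"
    "The review meeting is moved to 2024-03-15 in room 4310."
)

def bag_of_words(text):
    """Raw-form view: lowercase word tokens with their counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def keyword_match(text, query):
    """Keyword search: True if every query term occurs in the text."""
    bag = bag_of_words(text)
    return all(term.lower() in bag for term in query.split())

def extract_structure(text):
    """Structured view: pull a few typed fields out with regular expressions."""
    sender = re.search(r"^From:\s*(\S+@\S+)", text, re.MULTILINE)
    date = re.search(r"\b(\d{4}-\d{2}-\d{2})\b", text)
    room = re.search(r"\broom\s+(\d+)", text, re.IGNORECASE)
    return {
        "sender": sender.group(1) if sender else None,
        "date": date.group(1) if date else None,
        "room": room.group(1) if room else None,
    }

record = extract_structure(EMAIL)
```

Keyword search can only say whether the email mentions "meeting" and "room"; the extracted record can answer a question such as "which room is booked on a given date?", which is the kind of query the slide says many applications want.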

4
What Does this System Look Like?
  • Joe Hellerstein
  • Joe Six-Pack
  • Flexible modes of interaction
  • Extraction + Integration
  • Mass collaboration
  • Best-effort, pay-as-you-go, improving over time
  • Scale up to huge data (by running over clusters)
  • DB + IR + IE + II, in a best-effort, Web 2.0
    fashion

5
Broader Impacts
  • Great for many current applications
    • e-science, business, personal data, Web data,
      etc.
  • Great for many current research topics
    • IR, integration, PIM, data spaces
    • user interfaces, HCI, mashup
    • provenance, uncertainty
    • cluster management
    • query processing
    • monitoring, handling changes, pub/sub systems
  • Raises novel research issues
    • mass collab, best-effort, extraction, helping Joe
      Six-Pack
  • Helps define data management principles in broader
    contexts