Trio: A System for Data, Uncertainty, and Lineage

1
Trio: A System for Data, Uncertainty, and Lineage
  • Search "stanford trio"
  • http://i.stanford.edu/trio

2
People
  • Current
  • Jennifer Widom (faculty)
  • Omar Benjelloun (post-doc)
  • Parag Agrawal, Anish Das Sarma, Shubha Nabar
    (PhD)
  • Michi Mutsuzaki (MS)
  • Tomoe Sugihara (visitor)
  • Incoming
  • Martin Theobald (post-doc)
  • Raghu Murthy (MS)
  • Ander de Keijzer (visitor)
  • Alums
  • Alon Halevy, Ashok Chandra (visitors)
  • Chris Hayworth (MS)

3
Why Uncertainty and Lineage?
  • Many applications seem to need both
  • From a technical standpoint, it turns out that lineage...
  • Enables simple and consistent representation of
    uncertain data
  • Correlates uncertainty in query results with
    uncertainty in the input data
  • Can make computation over uncertain data more
    efficient

4
Trio Components
  • Data Model
  • ULDBs (Uncertainty-Lineage Databases)
  • Simple extension to relational model
  • Query Language
  • TriQL: simple extension to SQL, with well-defined
    semantics and intuitive behavior
  • System
  • Version 1: complete system and GUI built
    on top of a conventional DBMS

5
Running Example: Crime-Solving
  • Saw(witness,car) // may be uncertain
  • Drives(person,car) // may be uncertain
  • Suspects(person) = π_person(Saw ⋈ Drives)

6
Our Model for Uncertainty
  • 1. Alternatives
  • 2. ? (Maybe) Annotations
  • 3. Confidences

7
Our Model for Uncertainty
  • 1. Alternatives: uncertainty about value
  • 2. ? (Maybe) Annotations
  • 3. Confidences

Three possible instances

8
Our Model for Uncertainty
  • 1. Alternatives
  • 2. ? (Maybe): uncertainty about presence
  • 3. Confidences

Six possible instances

9
Our Model for Uncertainty
  • 1. Alternatives
  • 2. ? (Maybe) Annotations
  • 3. Confidences: weighted uncertainty

Six possible instances, each with a probability

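The three constructs can be sketched together as follows. This is a minimal illustration, not Trio code: the Betty tuple, its car, its 0.6 confidence, and the independence of the two tuples are assumptions made here. A tuple whose confidences sum to less than 1 is treated as a ? tuple, with the remainder being the probability that it is absent.

```python
from itertools import product

# Each uncertain tuple is a list of (alternative, confidence) pairs.
saw = [
    [(("Amy", "Honda"), 0.5), (("Amy", "Toyota"), 0.3), (("Amy", "Mazda"), 0.2)],
    [(("Betty", "Acura"), 0.6)],  # assumed '?' tuple: present with confidence 0.6
]

def choices(xtuple):
    """All ways to resolve one tuple: pick one alternative, or absence (None)."""
    options = list(xtuple)
    missing = 1.0 - sum(conf for _, conf in xtuple)
    if missing > 1e-9:
        options.append((None, missing))
    return options

def possible_instances(relation):
    """Enumerate (instance, probability) pairs, assuming independent tuples."""
    result = []
    for combo in product(*(choices(x) for x in relation)):
        instance = frozenset(alt for alt, _ in combo if alt is not None)
        prob = 1.0
        for _, conf in combo:
            prob *= conf
        result.append((instance, prob))
    return result

instances = possible_instances(saw)
for instance, prob in instances:
    print(sorted(instance), round(prob, 2))
```

Three alternatives for Amy combined with a present-or-absent Betty give the six possible instances, and their probabilities sum to 1.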
10
Models for Uncertainty
  • Our model (so far) is not especially new
  • We spent some time exploring the space of models
    for uncertainty [ICDE 06, journal]
  • Tension between understandability and
    expressiveness
  • Our model is understandable
  • But it is not complete, or even closed under
    common operations

11
Our Model is Not Closed
Suspects = π_person(Saw ⋈ Drives)
[Figure: result table with ? annotations] does not correctly capture the
possible instances in the result; the model CANNOT represent them

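Why the result falls outside the model can be checked by brute force. The sketch below uses assumed input data (the slide's tables are not preserved): it runs the query on every possible instance of the inputs and compares the outcome with a result made of three independent ? tuples.

```python
from itertools import product

# Assumed input data, for illustration only (no '?' tuples, no confidences).
saw = [[("Cathy", "Honda"), ("Cathy", "Mazda")]]
drives = [
    [("Jimmy", "Toyota"), ("Jimmy", "Mazda")],
    [("Billy", "Honda"), ("Billy", "Mazda")],
    [("Hank", "Honda")],
]

def suspects(saw_instance, drives_instance):
    """Ordinary relational query: project person from Saw joined with Drives on car."""
    return frozenset(
        person
        for _, seen_car in saw_instance
        for person, driven_car in drives_instance
        if seen_car == driven_car
    )

# Correct semantics: run the query on every possible instance of the inputs.
true_results = {
    suspects(s, d)
    for s in product(*saw)
    for d in product(*drives)
}

# Naive result: Jimmy, Billy, and Hank as three independent '?' tuples,
# which admits every subset of the three suspects.
people = ["Jimmy", "Billy", "Hank"]
naive_results = {
    frozenset(p for p, keep in zip(people, mask) if keep)
    for mask in product([False, True], repeat=3)
}

spurious = naive_results - true_results
print(sorted(sorted(r) for r in true_results))
print(sorted(sorted(r) for r in spurious))
```

With this data the naive table admits two instances that can never occur, both containing Jimmy and Hank together, because they depend on different cars having been seen.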
12
Lineage to the Rescue
  • Lineage
  • Captures where data came from
  • In Trio: a function λ from alternatives to other
    alternatives (or external sources)

13
Example with Lineage
Suspects = π_person(Saw ⋈ Drives)
λ(31) = (11,2), (21,2)
λ(32,1) = (11,1), (22,1); λ(32,2) = (11,2), (22,2)
λ(33) = (11,1), 23

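A small sketch of how lineage restricts the result to its correct possible instances. The IDs follow the slide, written as (tuple_id, alternative_index) pairs. Alternative (32,2) is taken to derive from (11,2) and (22,2), an assumption made so that both joined alternatives refer to the same car. The number of alternatives per base tuple is also assumed.

```python
from itertools import product

# Assumed base data: tuple 11 is in Saw; tuples 21, 22, 23 are in Drives.
base_alternatives = {11: [1, 2], 21: [1, 2], 22: [1, 2], 23: [1]}

# Lineage: each result alternative maps to the base alternatives it derives from.
lineage = {
    (31, 1): {(11, 2), (21, 2)},
    (32, 1): {(11, 1), (22, 1)},
    (32, 2): {(11, 2), (22, 2)},
    (33, 1): {(11, 1), (23, 1)},
}

def result_instance(choice):
    """Result alternatives present when `choice` maps each base tuple to one alternative."""
    chosen = set(choice.items())
    return frozenset(alt for alt, sources in lineage.items() if sources <= chosen)

tuple_ids = sorted(base_alternatives)
instances = {
    result_instance(dict(zip(tuple_ids, picks)))
    for picks in product(*(base_alternatives[t] for t in tuple_ids))
}
for inst in sorted(instances, key=sorted):
    print(sorted(inst))
```

A result alternative is present exactly when all of its lineage is present, so tuples 31 and 33 never appear in the same instance: they need different alternatives of tuple 11.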
14
Uncertainty-Lineage Databases (ULDBs)
  • 1. Alternatives
  • 2. ? (Maybe) Annotations
  • 3. Confidences
  • 4. Lineage
  • ULDBs are closed and complete [VLDB 06]

15
ULDBs: Lineage
  • Conjunctive lineage sufficient for most
    operations
  • Duplicate-elimination: disjunctive lineage
  • Difference: negative lineage
  • General case after multiple operations/queries:
    Boolean formula
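The forms of lineage listed above can be pictured as Boolean formulas over base alternatives. The representation below (nested tuples tagged "and", "or", "not") and all IDs are hypothetical choices for illustration, not Trio's internal format.

```python
def holds(formula, chosen):
    """Evaluate a lineage formula against the set of chosen base alternatives."""
    if isinstance(formula, tuple) and formula and formula[0] in ("and", "or", "not"):
        op, *args = formula
        if op == "and":
            return all(holds(a, chosen) for a in args)
        if op == "or":
            return any(holds(a, chosen) for a in args)
        return not holds(args[0], chosen)
    # A leaf is a base alternative such as (11, 1).
    return formula in chosen

# Join: conjunctive lineage.
joined = ("and", (11, 1), (22, 1))
# Duplicate elimination: disjunction over the merged derivations.
deduplicated = ("or", ("and", (11, 1), (22, 1)), ("and", (11, 2), (22, 2)))
# Difference: the left alternative is present and the right one is not.
difference = ("and", (41, 1), ("not", (42, 1)))

chosen = {(11, 2), (22, 2), (41, 1)}
print(holds(joined, chosen), holds(deduplicated, chosen), holds(difference, chosen))
```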

16
ULDBs: Interesting Questions
  • Data-minimality: extraneous alternatives,
    extraneous ?
  • Lineage-minimality: harder
  • Membership: tuple and table, some-instance and
    all-instances
  • Coexistence: multiple tuples
  • Extraction: remove tables, retain
    possible instances

17
Example: Extraneous Data
[Figure: example tables, with one item marked "extraneous"]

18
Example: Coexistence
[Figure: example tables, with items marked "Can't coexist"]

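One way to picture a coexistence test, as a sketch under simplifying assumptions: lineage is purely conjunctive, it points directly at base alternatives, and base tuples are independent. The function name and the IDs are hypothetical.

```python
def can_coexist(lineage_a, lineage_b):
    """Return False if the combined lineage needs two different alternatives
    of the same base tuple, which can never hold in one possible instance."""
    chosen = {}
    for tuple_id, alt in lineage_a | lineage_b:
        if chosen.setdefault(tuple_id, alt) != alt:
            return False
    return True

jimmy = {(11, 2), (21, 2)}
billy = {(11, 2), (22, 2)}
hank = {(11, 1), (23, 1)}

print(can_coexist(jimmy, hank))   # conflict on base tuple 11
print(can_coexist(jimmy, billy))  # both rely on alternative (11, 2)
```

Disjunctive or negative lineage would need a satisfiability check over the Boolean formula rather than this simple conflict test.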
19
Querying ULDBs: Semantics
  • Query Q on ULDB D
  • Implementation of Q (operational semantics):
    D → D plus Result
  • Possible instances of D: D1, D2, ..., Dn
  • Q on each instance: Q(D1), Q(D2), ..., Q(Dn)
  • D plus Result is a representation of the instances
    Q(D1), Q(D2), ..., Q(Dn)

20
Querying ULDBs: TriQL
  • Basic TriQL: SQL with new semantics
  • Obeys commutative diagram for uncertain data
  • Tracks lineage
  • Query results: new table or on-the-fly
  • Implemented TriQL also built-in predicates
    conf(), lineage(), lineage()

21
Additional TriQL Constructs
  • Language manual on web site
  • Horizontal subqueries
  • Refer to tuple alternatives as a relation
  • Unmerged (horizontal duplicates)
  • Flatten, GroupAlts
  • NoLineage, NoConf, NoMaybe
  • Query-specified confidences done
  • Data modification statements

22
Confidence Computation
  • Confidences computed on-demand based on lineage
  • Confidence of alternative A is a function of the
    confidences in λ(A)
  • Permits any query plan for data computation
  • Default probabilistic interpretation, but queries
    can override

SELECT person, min(conf(Saw), conf(Drives)) AS conf
FROM Saw, Drives
WHERE Saw.car = Drives.car

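The difference between a probabilistic default and the min() override in the query above can be sketched as follows. The confidence values and IDs are assumptions for illustration. Multiplying source confidences is only valid here because the lineage is conjunctive and the base tuples are taken to be independent.

```python
from math import prod

# Assumed confidences of base alternatives, keyed by (tuple_id, alternative_index).
base_conf = {(11, 1): 0.6, (11, 2): 0.4, (22, 1): 0.7, (22, 2): 0.3}

# Conjunctive lineage of two result alternatives (IDs are illustrative).
lineage = {
    (32, 1): [(11, 1), (22, 1)],
    (32, 2): [(11, 2), (22, 2)],
}

def probabilistic_conf(alt):
    """Probabilistic interpretation under independence: multiply source confidences."""
    return prod(base_conf[src] for src in lineage[alt])

def min_conf(alt):
    """Query-specified override, like min(conf(Saw), conf(Drives)) in the query."""
    return min(base_conf[src] for src in lineage[alt])

for alt in lineage:
    print(alt, round(probabilistic_conf(alt), 2), min_conf(alt))
```

Because both functions read only the lineage and the base confidences, they can run on demand after the data has been computed by any query plan.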
23
Trio System: Version 1
TrioExplorer (GUI client)
  • DDL commands
  • TriQL queries
  • Schema browsing
  • Table browsing
  • Explore lineage
  • On-demand confidence computation

Command-line client
Trio API and translator (Python)
  • Verticalize
  • Shared IDs for alternatives
  • Columns for confidence, ?

Standard SQL
  • Table types
  • Schema-level lineage structure

Standard relational DBMS
  • conf()
  • lineage()
  • lineage()

Encoded Data Tables
Trio Metadata
  • One per result table
  • Uses unique IDs

Trio Stored Procedures
Lineage Tables

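A rough picture of what a "verticalized" encoding in a conventional DBMS might look like, using SQLite for convenience. The table name, column names, and data are hypothetical and are not taken from Trio's actual schema. Each alternative becomes one ordinary row, alternatives of the same tuple share an ID, and confidence and ? become plain columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE saw_enc (
        aid     INTEGER PRIMARY KEY,  -- unique ID of one alternative
        xid     INTEGER NOT NULL,     -- ID shared by all alternatives of one tuple
        witness TEXT,
        car     TEXT,
        conf    REAL,                 -- confidence of the alternative
        maybe   INTEGER NOT NULL      -- 1 if the tuple carries a '?' annotation
    )
""")
conn.executemany(
    "INSERT INTO saw_enc VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1, "Amy", "Honda", 0.5, 0),
        (2, 1, "Amy", "Toyota", 0.3, 0),
        (3, 1, "Amy", "Mazda", 0.2, 0),
        (4, 2, "Betty", "Acura", 0.6, 1),
    ],
)

# Plain SQL over the encoding: regroup alternatives by their shared tuple ID.
summary = conn.execute(
    "SELECT xid, COUNT(*), MAX(maybe) FROM saw_enc GROUP BY xid ORDER BY xid"
).fetchall()
print(summary)
```

A lineage table for a result could be sketched the same way, with rows pairing a result alternative's unique ID with the IDs of its source alternatives.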
24
Current and Future Topics
  • Algorithms: confidence computation, coexistence,
    extraneous data
  • Minimize lineage traversal
  • Memoization
  • Batch operations
  • System
  • Full query language
  • More internal processing ?
  • Storage and indexing
  • Statistics and query optimization

25
Current and Future Topics
  • Top-K by confidence
  • Extend basic uncertainty model
  • Incomplete relations
  • Continuous uncertainty
  • Correlated uncertainty ?
  • External lineage, update lineage, versioning