An Extensible Query Browser - PowerPoint PPT Presentation

1 / 40
About This Presentation
Title:

An Extensible Query Browser

Description:

Support the implementation of database systems (prototypes) with new kinds of ... implemented on top of BerkeleyDB storage manager. written in C . Optimizer: ... – PowerPoint PPT presentation

Number of Views:24
Avg rating:3.0/5.0
Slides: 41
Provided by: lgivp2
Category:

less

Transcript and Presenter's Notes

Title: An Extensible Query Browser


1
SECONDO An Extensible DBMS Platform for Research
Prototyping and Teaching
Ralf Hartmut Güting Fernuniversität Hagen,
Germany
2
Introduction
  • Suppose you had invented a new DBMS data model
    and wished to implement it, how would you proceed
    ?
  • Our goals
  • Support the implementation of database systems
    (prototypes) with new kinds of data models, e.g.
    graph-oriented, sequence-oriented, temporal,
    semi-structured models, XML ...
  • Design a clean, generic, extensible system
    architecture under which a wide range of data
    models can be implemented.
  • Make the architecture modular, with well defined
    interfaces for modules, so that certain classes
    of components can be developed and exchanged
    easily (like e.g. a graphics card or a processor
    in a PC).

3
Introduction
  • Extensible DBMS. Two major directions
  • Fixed object model with ADT extensibility
  • e.g. relational or object-relational. Extensible
    by atomic data types and their operations.
    Extensibility to support this, e.g. by
  • data structures for data types
  • implementation of operations
  • index structures
  • join methods
  • optimization rules
  • cost functions
  • Examples Starburst, POSTGRES, Gral, Informix
    Universal Server
  • Well-structured with clear interfaces
  • But object model is fixed.

4
Introduction - Extensible DBMS
  • Toolkits
  • No fixed data model any more. Set of tools such
    as storage manager, optimizer generator is
    provided.
  • Examples GENESIS, EXODUS, DASDBS, SHORE,
    Volcano, PREDATOR
  • Great flexibility, arbitrary data models.
  • But little structure. Very hard for implementor.
  • SECONDO
  • No fixed data model, but a fixed meta-model for
    describing data models.
  • Great flexibility, arbitrary data models.
  • Well structured, with clear interfaces.

5
Introduction
  • The SECONDO vision
  • Develop a database system (frame) without a data
    model.
  • Describe the data model and query language to the
    system in a high-level specification language.
  • Describe a query processing subsystem to the
    (frame) system in the same language.
  • Describe also optimization rules in this
    language.
  • Design the query processing subsystem as a
    collection of algebras, and implement each
    algebra in an algebra module.
  • Design clean, simple, easy-to-use interfaces and
    tools for the implementation of algebra modules,
    so that
  • research groups anywhere can independently
    encapsulate and make available their latest
    research results (prototypes) in the form of
    algebra modules, and so
  • contribute to an open library of query
    processing modules.

6
Concepts Second-Order Signature
  • A formalism to describe
  • a descriptive algebra, defining a data model and
    query language,
  • an executable algebra, specifying a collection of
    data structures and operations capable of
    representing the data model and implementing the
    query language,
  • rules to enable a query optimizer to map
    descriptive algebra terms to executable algebra
    terms (query plans).

7
Concepts Second-Order Signature
  • Basic idea Use two coupled signatures. The first
    signature describes a type system, the second an
    algebra over the types generated by the first
    signature.
  • Example
  • DATA int, real, bool
  • DATA SET set
  • Terms of the first signature ( types of the type
    system)
  • int, real, bool, set(int), set(real), set(bool)
  • are sorts of the second.
  • data in DATA.
  • data data bool

8
Concepts Second-Order Signature
  • Specifying a Descriptive Algebra
  • kinds IDENT, DATA, TUPLE, REL
  • type constructors
  • DATA int, real, bool, string
  • (IDENT DATA) TUPLE tuple
  • TUPLE REL rel
  • Example term ( type, schema)
  • rel(tuple((name, string), (age, int))

9
Concepts Second-Order Signature Specifying a
Descriptive Algebra
  • operators
  • data in DATA.
  • data data bool _ _
  • rel rel(tuple) in REL.
  • rel (tuple bool) rel select _ _
  • tuple tuple(list) in TUPLE, attrname in IDENT,
  • member(attrname, attrtype, list).
  • tuple attrname attrtype attr ( _, _)
  • A query
  • people selectfun (p person) attr(p, age) gt 20

10
Concepts Second-Order Signature
  • Specifying an Executable Algebra
  • kinds IDENT, DATA, TUPLE, RELREP
  • type constructors
  • DATA int, real, bool, string
  • (IDENT DATA) TUPLE tuple
  • TUPLE RELREP srel, relrep

11
Concepts Second-Order Signature Specifying an
Executable Algebra
  • operators
  • ...
  • tuple in TUPLE.
  • relrep(tuple) stream(tuple) feed _
  • stream(tuple) (tuple bool)
    stream(tuple) filter _ _
  • stream(tuple) srel(tuple) consume _
  • A query plan
  • people feed filterfun (p person) attr(p, age)
    gt 20 consume

12
Concepts Second-Order Signature
  • Commands
  • A database is a pair (T, O) where T is a finite
    set of named types, and O a finite set of named
    objects. Six basic commands to manipulate a
    database
  • type ltidentifiergt lttype expressiongt
  • delete type ltidentifiergt
  • create ltidentifiergt lttype expressiongt
  • let ltidentifiergt ltvalue expressiongt
  • update ltidentifiergt ltvalue expressiongt
  • delete ltidentifiergt
  • query ltvalue expressiongt
  • Example
  • type city_rel srel(tuple((name, string), (pop,
    int), (country, string)))
  • create cities city_rel update cities ...
  • query cities feed filterattr(., pop) gt 1000000
    consume

13
SECONDO System Overview
  • Three major components
  • Secondo Kernel
  • implements specific data models
  • extensible by algebra modules
  • provides query processing over the implemented
    algebras
  • implemented on top of BerkeleyDB storage manager
  • written in C
  • Optimizer
  • core capability conjunctive query optimization
  • currently supports a relational model with an
    SQL-like language
  • written in PROLOG
  • GUI
  • extensible interface for an extensible DBMS like
    Secondo
  • extensible by viewers
  • sophisticated spatial / spatio-temporal viewer,
    extensible by data types
  • written in Java

14
SECONDO System Overview
  • Components work together
  • GUI sends executable query (query plan) to the
    kernel, displays result
  • GUI sends query to optimizer, receives plan,
    sends plan to kernel, displays result
  • optimizer sends commands and executable queries
    to kernel to get information about DB objects,
    e.g. selectivities

15
The SECONDO Kernel
  • Further Commands and Examples
  • Further commands for
  • creating, deleting databases
  • create database geo
  • inquiring about algebras, type constructors,
    operations, as well as types and objects in a
    database
  • list type constructors
  • list algebras
  • importing, exporting objects and databases
  • save rivers to rivers
  • restore database geo from geo
  • transaction management
  • commit transaction

16
The SECONDO Kernel - Further Commands and
Examples
  • Some example commands and queries
  • create x int
  • update x 7
  • let inc fun(nint) n 1
  • query secondo contains second

17
The SECONDO Kernel - Further Commands and Examples
  • Next example uses objects
  • Kreis rel(tuple(KName string, ..., Gebiet
    region))
  • kreis_Gebiet rtree(tuple(KName string, ...,
    Gebiet region))
  • magdeburg region
  • The following query finds neighbour counties of
    magdeburg
  • query
  • kreis_Gebiet Kreis windowintersectsbbox(magdebur
    g)
  • filter.Gebiet touches magdeburg
  • filternot(.KName contains "Magdeburg")
  • projectKName
  • consume
  • rel(Tuple) x rtree(Tuple) x rectangle -gt
    stream(Tuple) windowintersects _ _ _
  • region -gt rectangle bbox ( _ )
  • region x region -gt bool touches _ _

18
  • (E) Secondo gt query kreis_Gebiet Kreis
    windowintersectsbbox(magdeburg) filter.Gebiet
    touches magdeburg filternot(.KName contains
    "Magdeburg") projectKName consume
  • (E) Secondo -gt
  • (query
  • (consume
  • (project
  • (filter
  • (filter
  • (windowintersects
    kreis_Gebiet Kreis
  • (bbox magdeburg))
  • (fun
  • (tuple1 TUPLE)
  • (touches
  • (attr tuple1 Gebiet)
  • magdeburg)))
  • (fun
  • (tuple2 TUPLE)
  • (not
  • (contains

19
  • Analyze query ...
  • 112711 -gt elapsed time 000 minutes. Used CPU
    Time 0.17 seconds.
  • Execute ...
  • 112711 -gt elapsed time 000 minutes. Used CPU
    Time 0.22 seconds.
  • KName LK Schönebeck
  • KName LK Bördekreis
  • KName LK Ohre-Kreis
  • KName LK Jerichower Land
  • (E) Secondo gt

20
The SECONDO Kernel
  • System Architecture

Multi-Threaded Server
Single-Threaded User Interface
DDL Query Language
Command Manager
Query Compiler
Descriptive Algebra
Optimizer
Rules
Executable Algebra
Query Processor Catalog
Alg1
Alg2
Alg3
Algn
Tools
Operating System Storage Manager
21
The SECONDO Kernel System Architecture
  • Tools

Nested Lists
SecondoSMI
Tuple Manager
SOS Parser
DBArrays, FLOBs, LOBS
22
The SECONDO Kernel System Architecture
  • Structure of Algebra Modules
  • Each algebra module offers some type constructors
    and some operators. The module contains

Type constructor
Operator
TransformType Select Evaluate
TypeCheck
Create/Delete In/Out Open/Close Save/Clone
Representation DS
23
The SECONDO Kernel System Architecture
  • Some currently available algebra modules

24
The Secondo Kernel Query Processing
  • Representation of Types, Queries, Values
  • Nested lists are used extensively in the system,
    to represent
  • type expressions
  • value expressions (queries, query plans)
  • constants in queries
  • values of data types in the external
    representation (for exchange with other
    applications)
  • Type expressions
  • rel(tuple((name, string), (pop, int), (country,
    string))
  • (rel (tuple ((name string) (pop int) (country
    string))))
  • Queries
  • cities feed filterfun (c city) attr(c, pop) gt
    1000000 consume
  • (consume (filter (feed cities) (fun (c city) (gt
    (attr c pop) 1000000))))

25
The Secondo Kernel Query Processing
Representation of Types, Queries, Values
  • Values in external representation (e.g. list
    representation of a polygon)
  • ((1.0 3.8) (4.0 3.8) (2.5 6.0))
  • Generic constants in queries (lttype expressiongt
    ltvalue listgt)
  • (polygon ((1.0 3.8) (4.0 3.8) (2.5 6.0)))
  • Values in internal representation a single word
    of storage.

26
The Secondo Kernel Query Processing
  • Processing Type Expressions Kind Checking
  • Process commands
  • type ltidentifiergt lttype expressiongt
  • create ltidentifiergt lttype expressiongt
  • Check whether type constructors are applied
    correctly in the type expression.
  • (type city_rel
  • (rel (tuple ((name string) (pop int) (country
    string))))
  • )
  • DATA int, real, bool, string
  • (IDENT DATA) TUPLE tuple
  • TUPLE REL rel
  • Use TypeCheck function of each type constructor.
    System frame provides a function CheckKind for
    each kind that can be used to check
    quantification over kinds.

27
The Secondo Kernel Query Processing
  • Processing Value Expressions Type Checking and
    Evaluation
  • Process commands
  • update ltidentifiergt ltvalue expressiongt
  • let ltidentifiergt ltvalue expressiongt
  • query ltvalue expressiongt
  • Three steps
  • Annotating the Query
  • Building the Operator Tree
  • Evaluation

28
The Secondo Kernel Query Processing
Processing Value Expressions
  • Annotating the Query
  • (consume (filter (feed cities) (fun (c city) (gt
    (attr c pop) 1000000))))
  • Process nested list tree bottom-up, annotating
    each node with its type and other information.
    Includes type checking.
  • For each atom or sublist s, return a list of the
    form
  • ((s class ...) type)
  • e.g.
  • ((feed operator 2 1) none)
  • When an application of an operator to a list of
    arguments
  • (op arg1 ... argn)
  • is recognized, the operators TransformType
    function is called with the types of arg1, ...,
    argn to check whether argument types are correct
    and to return the result type.

29
The Secondo Kernel Query Processing
Processing Value Expressions
  • Building the Operator Tree
  • (consume (filter (feed cities) (fun (c city) (gt
    (attr c pop) 1000000))))

30
The Secondo Kernel Query Processing
Processing Value Expressions
  • Evaluation
  • function eval (t node) WORD
  • input a node t of the operator tree
  • output the value of the subtree rooted in t
  • method
  • if t is an object or indirect object then lookup
    the value and return it
  • else t is an operator node
  • for each subtree ti of t do
  • if the root of ti is marked as function (F) or
    stream (S)
  • then argi ti
  • else argi eval(ti)
  • end
  • end
  • call the operators evaluation function with
    argument vector arg and return its result
  • end
  • end eval

31
The Secondo System Query Processing
Processing Value Expressions
  • Evaluation
  • Operator evaluation functions all have the same
    generic interface. They call query processor
    primitives to evaluate arguments (subtrees) that
    are functions or streams
  • getArguments
  • request
  • open
  • request
  • close
  • received
  • Evaluation functions for stream operators return
    special values YIELD or CANCEL.

to evaluate a parameter function subtree
32
The Optimizer
  • Performs conjunctive query optimization given a
    set of relations and a set of selection or join
    predicates, find a good plan. Uses a new
    algorithm for this.
  • Based on shortest path search in a predicate
    order graph.

10
1/2
1/10
1/5
r
p
p
q
r
q
20
50
100
500
r
p
1000
50
10
q
q
p
r
500
200
100
q
p
r
  • Selectivities of predicates are determined in
    advance by evaluating selections and joins on
    small sample relations. Selectivities once
    determined are stored for future use.
  • Optimizer implements an SQL-like language in a
    notation adapted to PROLOG.

33
The Optimizer
  • Optimization algorithm
  • For given relations and predicates, construct the
    POG.
  • For each edge, construct plan edges. Controlled
    by optimization rules for selections and joins.
  • For given sizes of arguments and selectivities of
    predicates, assign sizes to all nodes
    (intermediate results). Also annotate edges of
    POG with selectivities.
  • For each plan edge, compute its cost. Based on
    sizes of arguments, selectivity along the edge,
    and cost function for each operator occurring in
    a plan edge.
  • Use algorithm of Dijkstra to find a shortest path
    from bottom to top node through the graph of plan
    edges. This is the plan.

34
The Optimizer Sample Query
  • 2 ?- sql select count() from orte as o, plz as
    p1, plz as p2 where oort p1ort, p2plz
    p1plz 7, (p2plz mod 5) 0, p1plz gt 30000,
    oort contains "o".
  • selectivity 0.000560748
  • selectivity 1.6377e-005
  • selectivity 0.700935
  • selectivity 0.31
  • Destination node 31 reached at iteration 6
  • Height of search tree for boundary is 4
  • The best plan is
  • Orte feed o filter(.Ort_o contains "o")
    loopjoinplz_Ort plz exactmatch.Ort_o p1
    filter(.PLZ_p1 gt 30000) loopjoinplz_PLZ plz
    exactmatch(.PLZ_p1 7) p2
    filter((.PLZ_p2 mod 5) 0) count
  • Estimated Cost 66818.7
  • Command succeeded, result
  • 273
  • Yes

35
The Graphical User Interface
  • extensible interface for an extensible DBMS like
    Secondo
  • extensible by viewers
  • sophisticated spatial / spatio-temporal viewer,
    extensible by data types

36
The Graphical User Interface
37
Teaching
  • Nice environment for teaching concepts of
    database systems
  • clean architecture
  • attractive mix of known concepts and
    implementation techniques and novel features for
    extensibility
  • source code accessible, in essential parts well
    documented
  • user manual, programmers guide available
  • can be used for bachelor, master theses
  • can be used for student group projects

38
Teaching
  • A Student Project on Extensible Database
    Systems
  • Duration one term, currently 28 students (7
    groups of 4 students)
  • Stage 1 (6 weeks) Solve individually a number of
    extension exercises
  • Write a small algebra containing data types for
    point, line segment and triangle with a few
    operations. Also implement some stream
    operations.
  • Add a few simple operations to the relational
    algebra, e.g. duplicate removal by hashing.
  • Learn to manage large objects by implementing a
    polygon type.
  • Make data types for point, segment, and triangle
    available as attribute types for relations.
  • An exercise with the storage management
    interface.
  • Write extensions of the optimizer, for example,
    rules to use an R-tree and a loopjoin. (Optional
    for students with knowledge of PROLOG. Required
    are programming capabilities in C and Java.)
  • Extend the GUI by writing a simple viewer. Also
    extend the Hoese-Viewer by display classes for
    point, segment and triangle

39
Teaching - A Student Project on Extensible
Database Systems
  • Stage 2 (6 weeks) As a group, build a more
    significant extension to SECONDO.
  • Examples
  • algebra for midi music
  • algebra for jpeg images
  • algebra for mp3 music

40
Summary
  • Clean architecture to implement database systems
    with new data models. Allows extensions to DBMS
    object model (rather than only new attribute
    types).
  • Clear separation between data model independent
    system frame and data model dependent parts.
  • Execution system described to system frame
    through second-order signature formalism,
    centered around type constructors and operators.
  • Execution system structured into algebra modules.
    Allows development of a library of query
    processing modules.
  • Nice environment for teaching architecture and
    implementation concepts of database systems.
Write a Comment
User Comments (0)
About PowerShow.com