Title: An Extensible Query Browser
1SECONDO An Extensible DBMS Platform for Research
Prototyping and Teaching
Ralf Hartmut Güting Fernuniversität Hagen,
Germany
2Introduction
- Suppose you had invented a new DBMS data model
and wished to implement it, how would you proceed
? - Our goals
- Support the implementation of database systems
(prototypes) with new kinds of data models, e.g.
graph-oriented, sequence-oriented, temporal,
semi-structured models, XML ... - Design a clean, generic, extensible system
architecture under which a wide range of data
models can be implemented. - Make the architecture modular, with well defined
interfaces for modules, so that certain classes
of components can be developed and exchanged
easily (like e.g. a graphics card or a processor
in a PC).
3Introduction
- Extensible DBMS. Two major directions
- Fixed object model with ADT extensibility
- e.g. relational or object-relational. Extensible
by atomic data types and their operations.
Extensibility to support this, e.g. by - data structures for data types
- implementation of operations
- index structures
- join methods
- optimization rules
- cost functions
- Examples Starburst, POSTGRES, Gral, Informix
Universal Server - Well-structured with clear interfaces
- But object model is fixed.
4Introduction - Extensible DBMS
- Toolkits
- No fixed data model any more. Set of tools such
as storage manager, optimizer generator is
provided. - Examples GENESIS, EXODUS, DASDBS, SHORE,
Volcano, PREDATOR - Great flexibility, arbitrary data models.
- But little structure. Very hard for implementor.
- SECONDO
- No fixed data model, but a fixed meta-model for
describing data models. - Great flexibility, arbitrary data models.
- Well structured, with clear interfaces.
5Introduction
- The SECONDO vision
- Develop a database system (frame) without a data
model. - Describe the data model and query language to the
system in a high-level specification language. - Describe a query processing subsystem to the
(frame) system in the same language. - Describe also optimization rules in this
language. - Design the query processing subsystem as a
collection of algebras, and implement each
algebra in an algebra module. - Design clean, simple, easy-to-use interfaces and
tools for the implementation of algebra modules,
so that - research groups anywhere can independently
encapsulate and make available their latest
research results (prototypes) in the form of
algebra modules, and so - contribute to an open library of query
processing modules.
6Concepts Second-Order Signature
- A formalism to describe
- a descriptive algebra, defining a data model and
query language, - an executable algebra, specifying a collection of
data structures and operations capable of
representing the data model and implementing the
query language, - rules to enable a query optimizer to map
descriptive algebra terms to executable algebra
terms (query plans). -
7Concepts Second-Order Signature
- Basic idea Use two coupled signatures. The first
signature describes a type system, the second an
algebra over the types generated by the first
signature. - Example
- DATA int, real, bool
- DATA SET set
- Terms of the first signature ( types of the type
system) - int, real, bool, set(int), set(real), set(bool)
- are sorts of the second.
- data in DATA.
- data data bool
8Concepts Second-Order Signature
- Specifying a Descriptive Algebra
- kinds IDENT, DATA, TUPLE, REL
- type constructors
- DATA int, real, bool, string
- (IDENT DATA) TUPLE tuple
- TUPLE REL rel
- Example term ( type, schema)
- rel(tuple((name, string), (age, int))
9Concepts Second-Order Signature Specifying a
Descriptive Algebra
- operators
- data in DATA.
- data data bool _ _
- rel rel(tuple) in REL.
- rel (tuple bool) rel select _ _
- tuple tuple(list) in TUPLE, attrname in IDENT,
- member(attrname, attrtype, list).
- tuple attrname attrtype attr ( _, _)
- A query
- people selectfun (p person) attr(p, age) gt 20
10Concepts Second-Order Signature
- Specifying an Executable Algebra
- kinds IDENT, DATA, TUPLE, RELREP
- type constructors
- DATA int, real, bool, string
- (IDENT DATA) TUPLE tuple
- TUPLE RELREP srel, relrep
11Concepts Second-Order Signature Specifying an
Executable Algebra
- operators
- ...
- tuple in TUPLE.
- relrep(tuple) stream(tuple) feed _
- stream(tuple) (tuple bool)
stream(tuple) filter _ _ - stream(tuple) srel(tuple) consume _
- A query plan
- people feed filterfun (p person) attr(p, age)
gt 20 consume
12Concepts Second-Order Signature
- Commands
- A database is a pair (T, O) where T is a finite
set of named types, and O a finite set of named
objects. Six basic commands to manipulate a
database - type ltidentifiergt lttype expressiongt
- delete type ltidentifiergt
- create ltidentifiergt lttype expressiongt
- let ltidentifiergt ltvalue expressiongt
- update ltidentifiergt ltvalue expressiongt
- delete ltidentifiergt
- query ltvalue expressiongt
- Example
- type city_rel srel(tuple((name, string), (pop,
int), (country, string))) - create cities city_rel update cities ...
- query cities feed filterattr(., pop) gt 1000000
consume
13SECONDO System Overview
- Three major components
- Secondo Kernel
- implements specific data models
- extensible by algebra modules
- provides query processing over the implemented
algebras - implemented on top of BerkeleyDB storage manager
- written in C
- Optimizer
- core capability conjunctive query optimization
- currently supports a relational model with an
SQL-like language - written in PROLOG
- GUI
- extensible interface for an extensible DBMS like
Secondo - extensible by viewers
- sophisticated spatial / spatio-temporal viewer,
extensible by data types - written in Java
14SECONDO System Overview
- Components work together
- GUI sends executable query (query plan) to the
kernel, displays result - GUI sends query to optimizer, receives plan,
sends plan to kernel, displays result - optimizer sends commands and executable queries
to kernel to get information about DB objects,
e.g. selectivities
15The SECONDO Kernel
- Further Commands and Examples
- Further commands for
- creating, deleting databases
- create database geo
- inquiring about algebras, type constructors,
operations, as well as types and objects in a
database - list type constructors
- list algebras
- importing, exporting objects and databases
- save rivers to rivers
- restore database geo from geo
- transaction management
- commit transaction
16The SECONDO Kernel - Further Commands and
Examples
- Some example commands and queries
- create x int
- update x 7
- let inc fun(nint) n 1
- query secondo contains second
-
17The SECONDO Kernel - Further Commands and Examples
- Next example uses objects
- Kreis rel(tuple(KName string, ..., Gebiet
region)) - kreis_Gebiet rtree(tuple(KName string, ...,
Gebiet region)) - magdeburg region
- The following query finds neighbour counties of
magdeburg - query
- kreis_Gebiet Kreis windowintersectsbbox(magdebur
g) - filter.Gebiet touches magdeburg
- filternot(.KName contains "Magdeburg")
- projectKName
- consume
- rel(Tuple) x rtree(Tuple) x rectangle -gt
stream(Tuple) windowintersects _ _ _ - region -gt rectangle bbox ( _ )
- region x region -gt bool touches _ _
-
18- (E) Secondo gt query kreis_Gebiet Kreis
windowintersectsbbox(magdeburg) filter.Gebiet
touches magdeburg filternot(.KName contains
"Magdeburg") projectKName consume - (E) Secondo -gt
- (query
- (consume
- (project
- (filter
- (filter
- (windowintersects
kreis_Gebiet Kreis - (bbox magdeburg))
- (fun
- (tuple1 TUPLE)
- (touches
- (attr tuple1 Gebiet)
- magdeburg)))
- (fun
- (tuple2 TUPLE)
- (not
- (contains
19- Analyze query ...
- 112711 -gt elapsed time 000 minutes. Used CPU
Time 0.17 seconds. - Execute ...
- 112711 -gt elapsed time 000 minutes. Used CPU
Time 0.22 seconds. - KName LK Schönebeck
- KName LK Bördekreis
- KName LK Ohre-Kreis
- KName LK Jerichower Land
- (E) Secondo gt
20The SECONDO Kernel
Multi-Threaded Server
Single-Threaded User Interface
DDL Query Language
Command Manager
Query Compiler
Descriptive Algebra
Optimizer
Rules
Executable Algebra
Query Processor Catalog
Alg1
Alg2
Alg3
Algn
Tools
Operating System Storage Manager
21The SECONDO Kernel System Architecture
Nested Lists
SecondoSMI
Tuple Manager
SOS Parser
DBArrays, FLOBs, LOBS
22The SECONDO Kernel System Architecture
- Structure of Algebra Modules
- Each algebra module offers some type constructors
and some operators. The module contains
Type constructor
Operator
TransformType Select Evaluate
TypeCheck
Create/Delete In/Out Open/Close Save/Clone
Representation DS
23The SECONDO Kernel System Architecture
- Some currently available algebra modules
24The Secondo Kernel Query Processing
- Representation of Types, Queries, Values
- Nested lists are used extensively in the system,
to represent - type expressions
- value expressions (queries, query plans)
- constants in queries
- values of data types in the external
representation (for exchange with other
applications) - Type expressions
- rel(tuple((name, string), (pop, int), (country,
string)) - (rel (tuple ((name string) (pop int) (country
string)))) - Queries
- cities feed filterfun (c city) attr(c, pop) gt
1000000 consume - (consume (filter (feed cities) (fun (c city) (gt
(attr c pop) 1000000))))
25The Secondo Kernel Query Processing
Representation of Types, Queries, Values
- Values in external representation (e.g. list
representation of a polygon) - ((1.0 3.8) (4.0 3.8) (2.5 6.0))
- Generic constants in queries (lttype expressiongt
ltvalue listgt) - (polygon ((1.0 3.8) (4.0 3.8) (2.5 6.0)))
- Values in internal representation a single word
of storage.
26The Secondo Kernel Query Processing
- Processing Type Expressions Kind Checking
- Process commands
- type ltidentifiergt lttype expressiongt
- create ltidentifiergt lttype expressiongt
- Check whether type constructors are applied
correctly in the type expression. - (type city_rel
- (rel (tuple ((name string) (pop int) (country
string)))) - )
- DATA int, real, bool, string
- (IDENT DATA) TUPLE tuple
- TUPLE REL rel
- Use TypeCheck function of each type constructor.
System frame provides a function CheckKind for
each kind that can be used to check
quantification over kinds.
27The Secondo Kernel Query Processing
- Processing Value Expressions Type Checking and
Evaluation - Process commands
- update ltidentifiergt ltvalue expressiongt
- let ltidentifiergt ltvalue expressiongt
- query ltvalue expressiongt
- Three steps
- Annotating the Query
- Building the Operator Tree
- Evaluation
28The Secondo Kernel Query Processing
Processing Value Expressions
- Annotating the Query
- (consume (filter (feed cities) (fun (c city) (gt
(attr c pop) 1000000)))) - Process nested list tree bottom-up, annotating
each node with its type and other information.
Includes type checking. - For each atom or sublist s, return a list of the
form - ((s class ...) type)
- e.g.
- ((feed operator 2 1) none)
- When an application of an operator to a list of
arguments - (op arg1 ... argn)
- is recognized, the operators TransformType
function is called with the types of arg1, ...,
argn to check whether argument types are correct
and to return the result type.
29The Secondo Kernel Query Processing
Processing Value Expressions
- Building the Operator Tree
- (consume (filter (feed cities) (fun (c city) (gt
(attr c pop) 1000000))))
30The Secondo Kernel Query Processing
Processing Value Expressions
- Evaluation
- function eval (t node) WORD
- input a node t of the operator tree
- output the value of the subtree rooted in t
- method
- if t is an object or indirect object then lookup
the value and return it - else t is an operator node
- for each subtree ti of t do
- if the root of ti is marked as function (F) or
stream (S) - then argi ti
- else argi eval(ti)
- end
- end
- call the operators evaluation function with
argument vector arg and return its result - end
- end eval
-
31The Secondo System Query Processing
Processing Value Expressions
- Evaluation
- Operator evaluation functions all have the same
generic interface. They call query processor
primitives to evaluate arguments (subtrees) that
are functions or streams - getArguments
- request
- open
- request
- close
- received
- Evaluation functions for stream operators return
special values YIELD or CANCEL. -
to evaluate a parameter function subtree
32The Optimizer
- Performs conjunctive query optimization given a
set of relations and a set of selection or join
predicates, find a good plan. Uses a new
algorithm for this. - Based on shortest path search in a predicate
order graph. -
10
1/2
1/10
1/5
r
p
p
q
r
q
20
50
100
500
r
p
1000
50
10
q
q
p
r
500
200
100
q
p
r
- Selectivities of predicates are determined in
advance by evaluating selections and joins on
small sample relations. Selectivities once
determined are stored for future use. - Optimizer implements an SQL-like language in a
notation adapted to PROLOG. -
33The Optimizer
- Optimization algorithm
- For given relations and predicates, construct the
POG. - For each edge, construct plan edges. Controlled
by optimization rules for selections and joins. - For given sizes of arguments and selectivities of
predicates, assign sizes to all nodes
(intermediate results). Also annotate edges of
POG with selectivities. - For each plan edge, compute its cost. Based on
sizes of arguments, selectivity along the edge,
and cost function for each operator occurring in
a plan edge. - Use algorithm of Dijkstra to find a shortest path
from bottom to top node through the graph of plan
edges. This is the plan.
34The Optimizer Sample Query
- 2 ?- sql select count() from orte as o, plz as
p1, plz as p2 where oort p1ort, p2plz
p1plz 7, (p2plz mod 5) 0, p1plz gt 30000,
oort contains "o". - selectivity 0.000560748
- selectivity 1.6377e-005
- selectivity 0.700935
- selectivity 0.31
- Destination node 31 reached at iteration 6
- Height of search tree for boundary is 4
- The best plan is
- Orte feed o filter(.Ort_o contains "o")
loopjoinplz_Ort plz exactmatch.Ort_o p1
filter(.PLZ_p1 gt 30000) loopjoinplz_PLZ plz
exactmatch(.PLZ_p1 7) p2
filter((.PLZ_p2 mod 5) 0) count - Estimated Cost 66818.7
- Command succeeded, result
- 273
- Yes
35The Graphical User Interface
- extensible interface for an extensible DBMS like
Secondo - extensible by viewers
- sophisticated spatial / spatio-temporal viewer,
extensible by data types
36The Graphical User Interface
37Teaching
- Nice environment for teaching concepts of
database systems - clean architecture
- attractive mix of known concepts and
implementation techniques and novel features for
extensibility - source code accessible, in essential parts well
documented - user manual, programmers guide available
- can be used for bachelor, master theses
- can be used for student group projects
38Teaching
- A Student Project on Extensible Database
Systems - Duration one term, currently 28 students (7
groups of 4 students) - Stage 1 (6 weeks) Solve individually a number of
extension exercises - Write a small algebra containing data types for
point, line segment and triangle with a few
operations. Also implement some stream
operations. - Add a few simple operations to the relational
algebra, e.g. duplicate removal by hashing. - Learn to manage large objects by implementing a
polygon type. - Make data types for point, segment, and triangle
available as attribute types for relations. - An exercise with the storage management
interface. - Write extensions of the optimizer, for example,
rules to use an R-tree and a loopjoin. (Optional
for students with knowledge of PROLOG. Required
are programming capabilities in C and Java.) - Extend the GUI by writing a simple viewer. Also
extend the Hoese-Viewer by display classes for
point, segment and triangle
39Teaching - A Student Project on Extensible
Database Systems
- Stage 2 (6 weeks) As a group, build a more
significant extension to SECONDO. - Examples
- algebra for midi music
- algebra for jpeg images
- algebra for mp3 music
40Summary
- Clean architecture to implement database systems
with new data models. Allows extensions to DBMS
object model (rather than only new attribute
types). - Clear separation between data model independent
system frame and data model dependent parts. - Execution system described to system frame
through second-order signature formalism,
centered around type constructors and operators. - Execution system structured into algebra modules.
Allows development of a library of query
processing modules. - Nice environment for teaching architecture and
implementation concepts of database systems.