Title: O-O, What Are They Doing
1- O-O, What Are They Doing
- to Relational Databases?
- (The Evolution of DB2 Universal Database)
Michael J. Carey IBM Almaden January 1999
2- Plan for Today's Presentation
- The relational DBMS revolution
- The object-relational DBMS evolution
- O-R features in DB2 Universal Database V5.2
- Some O-R implementation tradeoffs (V5.2)
- What lies ahead for DB2 UDB O-R databases?
- Questions (and possibly answers)
Please ask questions throughout...!
3- The Relational DBMS Revolution
- The pre-relational era (1970's)
- Graph-based data models
- Hierarchical (IMS), network (Codasyl)
- Low-level, navigational interfaces
- Labor-intensive and error-prone
- The relational era (1980's)
- Simple, abstract data model
- Database set of relations ("tables")
- 3 schema levels views, base tables, physical
schema - Algebra of set-oriented operations
- High-level, declarative interfaces
- SQL, QBE, et al
- Embedded languages, 4GLs
4- The Relational Model (in one slide)
- Employees and departments
Department
dno
name
10
Toy
20
Shoe
Employee
eno
name
salary
dept
1
Lou
10000000
10
7
Laura
150000
20
?
22
Mike
80000
20
select E.name, E.salary, D.no from Employee E,
Department D where E.salary lt 100000 and D.name
'Shoe' and E.dept D.dno
5- Relational Databases A Success Story
- The relational model has been a big success
- Simplicity has made research tractable
- Data independence yields productivity gains
- Both academia and industry have benefitted
- Relational DBMS "goodies" include
- Efficient query optimization and execution
- Well-defined transaction semantics and support
- Excellent multi-user performance and robustness
- Views for data independence, authorization
- Constraints, triggers, and stored procedures for
(shared) business rule capture/enforcement - "The" success story for parallel computing
6- We've Achieved Nirvana ... Right?
- The world is becoming increasingly complex
- New data types are appearing (e.g., multimedia)
- Real-world data doesn't fit neatly into tables
- Entities and relationships (vs. tables)
- Variance among entities (vs. homogeneity)
- Set-valued attributes (vs. normalization)
- Advanced applications bring complex data
- E.g., CAD/CAM data management, web data
management, geographic information management,
medical data management, (your favorite
application goes here) - So maybe objects are the answer...?
- Yes, if we can keep all the relational "goodies"!
7- The Object-Relational DBMS Evolution
- O-R extension 1 Abstract data types (ADTs)
- New column types and functions/methods
- E.g., text, image, audio, video, time series,
point, line, OLE... - For modeling new kinds of facts about enterprise
entities - Infrastructure for extenders/datablades/cartridges
- O-R extension 2 Row types
- Types and functions/methods for rows of tables
- Desirable features include references,
inheritance, methods, late binding, and
collection-valued attributes - For modeling enterprise entities with
relationships behavior - Infrastructure for DBMS-native object management
- Recent SQL3 merger Structured types
- Can use for types of columns and/or tables
8- "Not Your Father's Employee Type"
- Beyond name, rank, and serial number
- New attribute types
- Location (2-d point), job description (text),
photo (image), ... - Associated functions
- Distance(point, point), contains(text, string),
... - Beyond your basic employee record
- Employees come in different flavors
- Emp, RSM, Programmer, Manager, Temp, ...
- Employees have many known relationships
- Manager, department, projects, ...
- Employees have behavior
- Age(Emp), qualified(Emp, Job), hire(Emp), ...
An Employee is a simple "business object"
9The OSF Project at IBM Almaden
- OSF stands for "Object Strike Force"
- Semi-autonomous group "outside" UDB development
- Focus object-relational extensions for DB2 UDB
- Both near-term and longer-term interests
- Collaborate with our Toronto and Santa Teresa
labs - Significant activities to date
- Prototyped "row type" support for DB2 UDB
- Delivered in DB2 UDB Version 5.2 (9/98)
- Significantly revised SQL3 draft standard
- Working on next step plus future technologies
10DB2 Universal Database, Version 5
- DB2 for Common Servers (Version 2)
- User-defined column types (UDTs/distinct types)
- User-defined functions (UDFs)
- Binary/character large objects (BLOBs/CLOBs)
- Distinct types new data types for columns
- Ex create distinct type US_Dollar as Real with
comparisons - US_Dollar is an available UDT with functions ,
ltgt, lt, lt, gt, gt, US_Dollar(Real),
Real(US_Dollar) - User-defined functions associated operations
- create function CA_Tax (US_Dollar) returns
US_Dollar external name 'money!US_Dollar'
language C
11DB2 Universal Database, Version 5 (cont.)
- Lots of other interesting features as well, e.g.
- Constraints and triggers
- Recursive queries
- OLAP support (cube and rollup)
- Extenders (based on UDTs/UDFs)
- Wide range of hardware/software platforms
- PCs Windows95, NT, OS/2, SCO
- Unix workstations AIX, Solaris, HP/UX
- Parallel platforms SMPs, MPPs (e.g., SP2)
- Descended from Almaden's Starburst system
- Extensible query compiler (with rule-based query
rewrite and query optimizer components)
12New O-R Features in DB2 UDB V5.2
- Structured types and references
- Named types with attributes, O-O subtyping model
- Ref(T) for directly modelling relationships
- Typed tables and table hierarchies
- Oid (user-provided) plus a column per attribute
of T - Subtables for querying and managing subtype
instances - Query language extensions
- Substitutability for queries/updates (data
independence ) - Path expressions for querying relationships
easily - Functions/predicates for runtime type inquiries
- Object views (via a novel approach)
- Virtual table hierarchies for flexible access
control - Also facilitates O-O views of legacy tables
13- Employee and department tables in the (late) 90's
mgr
dept
14- Structured Types and References
- Create structured types (and references)
create type Person_t as ( name Varchar(40),
birthyear Integer ) create type Emp_t under
Person_t as ( salary Integer, dept
Ref(Dept_t) ) create type Exec_t under Emp_t as
( bonus Integer ) create type Student_t
under Person_t as ( major Varchar(20) )
15- Structured Types and References (cont.)
- Create structured types (cont).
create type Dept_t as ( name Varchar(20),
budget Integer, headcount Integer,
mgr Ref(Emp_t) )
- Okay, so I lied (a little) on the last slide...
alter type Emp_t add attribute dept Ref(Dept_t)
16- Typed Tables and Table Hierarchies
- Now create typed tables (and subtables)
create table person of Person_t (ref is oid
user generated) create table emp of Emp_t under
person (dept with options scope
dept) create table exec of Exec_t under
emp create table student of Student_t under
person create table dept of Dept_t (ref is
oid user generated, mgr with options scope
emp)
17- SQL Query Extensions (by example)
select E. from emp E where E.birthyear gt 1970
and E.salary gt 50000
- Data modification (insert update/delete)
insert into emp values (Emp_t('e100'), 'John
Smith', 1968, 65000, (select oid
from dept where name 'Database')) update
person set birthyear birthyear 1 where name
'John Smith'
select E.name, E.dept-gtname from emp E where
E.dept-gtmgr-gtdept-gtmgr-gtname 'Lou Gerstner'
18- Querying Table Hierarchies An Example
Dept
Person
name
birthyear
oid
P1
Harold
1970
P2
Carol
1958
Emp
name
birthyear
oid
dept
P3
Hamid
1956
_
P4
Lou
1940
1940
19- SQL Query Extensions (cont.)
- Support for type-dependent queries
select from only (emp) E where dept-gtbudget gt
10000000 select name from person P where
deref(oid) is of type (only Emp_t,
Student_t) select type_name(deref(E.oid)),
E. from outer (emp) E where e.oid Emp_t('e13')
20- Other Data Definition Features
- ref is for object id column
- Unique, user-generated (on insert)
- scope clause for reference columns
- Provides critical information to the query
optimizer - not null constraints
- Definable at any level of a table hierarchy
- Enforced for indicated table and its subtables
- unique constraints
- Root level (and columns) only
- create index for physical schema
- Unique or non-unique index on root table
- Non-unique index on subtable
21- Other Data Definition Features (cont.)
- Authorization model for table hierarchies
- grant and revoke on table or subtables
- Substitutability implicit subtable authorization
on columns inherited from an authorized
supertable - Ex 1 select privilege on person table
- Ex 2 update privilege on salary column of emp
table - Some operations require authorization everywhere
- deref function
- is of type predicate and type_xxx functions
- SQL3 also supports granting of table/subtable
privileges with hierarchy option
22- Typed views and view hierarchies
mgr
dept
- Requirements virtual table hierarchies
- Typed rows with (derived) object ids
- Views may be quite different from base data
- Support for interconnected "view schemas"
23- Create types for use in views
create type VPerson_t as ( name
Varchar(40) ) create type VEmp_t under
VPerson_t as ( dept Ref(VDept_t) ) create
type VStudent_t under VPerson_t as ( kind
Varchar(8) ) create type VDept_t as ( name
Varchar(20), mgr Ref(VEmp_t) )
24- Now create typed views (and subviews)
create view vperson of VPerson_t (ref is oid user
generated) as select VPerson_t(Varchar(oid)),
name from only (person) create view vemp of
VEmp_t under vperson (dept with options scope
vdept) as select VEmp_t(Varchar(oid)), name,
VDept_t(Varchar(dept)) from emp where salary
gt 0 create view vstudent of VStudent_t under
vperson as select VStudent_t(Varchar(oid)),
name, case when major like
'Engineer' then 'Geek'
else 'non-Geek' end from student create
view vdept of VDept_t ...
25O-R Implementation Issues/Tradeoffs
- Some guiding principles for DB2 UDB V5.2
- Performance must equal/exceed relational
equivalents - Design amenable to future plans w.r.t. type
evolution - Structured types must be supported in columns
(someday) - Localize initial changes to query compiler where
possible - Want "free" indexing, rewrites, optimization,
parallelization, ... - Influenced by discussions with a CAD/CAM vendor
- Information on existing approach and
installations - Requirements for efficiency of new products
- Let's look briefly at two areas
- Table hierarchy representation
- References and path query processing
26Implementing Table Hierarchies
- Implementation table approach
- One physical table per table hierarchy with
- Type tag column (to distinguish subtable rows)
- Object id column
- Columns for all columns of the root table and its
subtables - Vertical partitioning approach
- One physical root table with
- Type tag column
- Object id column
- Columns for each root table column
- N physical delta tables (one per subtable) with
- Object id column
- Columns for each column introduced by this
subtable
27Implementing Table Hierarchies (cont.)
- Horizontal partitioning approach
- N separate physical tables with
- Object id column
- Columns for every subtable column (inherited or
not) - So what did we do for UDB V5.2...?
- Vertical partitioning approach rejected quickly
- Too many joins to materialize subtable rows
- Multi-column constraints and indices problematic
- Horizontal partitioning approach rejected
eventually - Uniqueness issue for user-generated object ids
- Query complexity for multi-hierarchy join queries
- Ex select p.name, q.name from Person p, Person
q where ... - Implementation table approach taken for V5.2
- Appeared to give us the most "free" functionality
- Adopted despite row size (null columns) downside
28References and Path Expressions
- Reference values in tables should have a scope
- "Other end" info for query rewrite and join
optimization - Ditto for authorization checking (static vs.
dynamic) - Schema makes overly wide references unnecessary
- Uniqueness is hierarchy-relative, enforced with
an index - V5.2 self-references (object ids) are
user-generated - CAD/CAM vendor had "legacy references" in files
- Different users have different id generation
schemes - Loading cyclic data (e.g., emplt-gt dept) is messy
and slow - Ditto for creating objects from an object cache
29References and Path Expressions (cont.)
- Path expressions are logically equivalent to
subqueries
select E.name, E.dept-gtname, E.dept-gtmgr-gtname fr
om emp E where E.dept-gtheadcount gt 10
- Actual approach shared subquery generation (QGM)
- Compute common paths (prefixes) once to save work
- Not every SQL context accepts an actual subquery
- Also need to handle non-serializable locking
levels - Efficiency obtained through query rewrite, e.g.
- Subquery to outer-join transformation
- Outer-join to join transformation where possible
30Where We Are Today in UDB
- V5.2 of UDB contains new O-R features
- Structured types with inheritance
- Object tables and table hierarchies
- References and path expressions
- Object views and view hierarchies
- Moreover, so does the SQL3 standard
- Includes object views and user-defined references
- IBM, Oracle, Informix heading in same general
direction - Work continuing on O-R extensions
- Let's have a brief look...
31Additional Object Table Support
- Business rule mechanisms for typed tables
- Check constraints on tables/subtables
(w/inheritance) - Referential integrity constraints to and from
tables/subtables - Triggers on tables/subtables
- Object modeling and management extensions
- User-defined reference types (ref using)
- More flexible object view definitions
- Type and instance (i.e., row) evolution
- Structured types for attributes/columns
- Work in progress at IBM Santa Teresa Lab
- Functions/methods just around the corner as well
32Other Exploratory O-R Work
- Efficient support for collection types
- Multivalued attributes (e.g., Project.team)
- Flavors set, multiset, array, list, ...
- Need to integrate into SQL, support querying well
- Some experience from a first prototype
- Other activities (and open problems)
- Java mappings bindings for O-R data
- XML data-centric web sites ("d-commerce")
- Business object servers (caching/consistency)
- Heterogeneous data O-R database systems
- User-defined and/or external indexing
- Optimizer "hooks" for new data types
- Etc.!
33Partial List of UDB O-R Contributors
- Almaden Research Center
- Mike Carey, Don Chamberlin, Srinivasa Narayanan,
Bennet Vance C.M. Park Guido Moerkotte - Santa Teresa Lab
- Nelson Mattos
- Gene Fuh, Michelle Jou, Brian Tran
- Toronto Lab
- Doug Doole, Serge Rielau, Rick Swagerman
- Leo Lao, Walid Rjaibi, Calisto Zuzarte
- Cheryl Greene, various other consultants/hecklers
- And as for future versions of UDB
- Your name could appear here! (MS/PhD)
34 35- What About Object-Oriented DBMSs?
- OOPL DBMS OO-DBMS
- Commonly based on C or Smalltalk
- Persistence, collections, queries, versions, ...
- Lots of interesting and useful research results
- O-O data models and query languages
- O-O query processing, system architecture,
performance - Various products (O2, Objectstore, Versant,
Objectivity, ...) - No widespread commercial acceptance
- Many differences across systems (despite ODMG-93)
- Never really caught up to RDBMS techology
- Schema compilation, evolution painful
- Missing many of the relational "goodies"
- Single-language focus, lack of (relational) tools
36- Stonebraker Fellow Criteria (found on web)
- Industrial database researcher
- PhD from UC Berkeley
- Must agree with the following motto
- Databases are the answer...!
- What was the question again...?
- At least 6' tall
- Had a PhD thesis advisor with first name Mike
- Produced a PhD student with first name Mike