O-O, What Are They Doing - PowerPoint PPT Presentation

About This Presentation

Title:

O-O, What Are They Doing

Description:

Object views (via a novel approach) Virtual table hierarchies for flexible access control ... create table emp of Emp_t under person (dept with options scope dept) ... – PowerPoint PPT presentation

Number of Views:109

Avg rating:3.0/5.0

Slides: 37

Provided by: MikeC159

Learn more at: https://dsf.berkeley.edu

Category:

more less

Transcript and Presenter's Notes

Title: O-O, What Are They Doing

1

O-O, What Are They Doing
to Relational Databases?
(The Evolution of DB2 Universal Database)

Michael J. Carey IBM Almaden January 1999
2

Plan for Today's Presentation

The relational DBMS revolution
The object-relational DBMS evolution
O-R features in DB2 Universal Database V5.2
Some O-R implementation tradeoffs (V5.2)
What lies ahead for DB2 UDB O-R databases?
Questions (and possibly answers)

Please ask questions throughout...!
3

The Relational DBMS Revolution

The pre-relational era (1970's)
Graph-based data models
Hierarchical (IMS), network (Codasyl)
Low-level, navigational interfaces
Labor-intensive and error-prone
The relational era (1980's)
Simple, abstract data model
Database set of relations ("tables")
3 schema levels views, base tables, physical
schema
Algebra of set-oriented operations
High-level, declarative interfaces
SQL, QBE, et al
Embedded languages, 4GLs

The Relational Model (in one slide)

Employees and departments

Department
dno
name

10
Toy

20
Shoe
Employee
eno
name
salary
dept

1
Lou
10000000
10

7
Laura
150000
20
?

22
Mike
80000
20
select E.name, E.salary, D.no from Employee E,
Department D where E.salary lt 100000 and D.name
'Shoe' and E.dept D.dno
5

Relational Databases A Success Story

The relational model has been a big success
Simplicity has made research tractable
Data independence yields productivity gains
Both academia and industry have benefitted
Relational DBMS "goodies" include
Efficient query optimization and execution
Well-defined transaction semantics and support
Excellent multi-user performance and robustness
Views for data independence, authorization
Constraints, triggers, and stored procedures for
(shared) business rule capture/enforcement
"The" success story for parallel computing

We've Achieved Nirvana ... Right?

The world is becoming increasingly complex
New data types are appearing (e.g., multimedia)
Real-world data doesn't fit neatly into tables
Entities and relationships (vs. tables)
Variance among entities (vs. homogeneity)
Set-valued attributes (vs. normalization)
Advanced applications bring complex data
E.g., CAD/CAM data management, web data
management, geographic information management,
medical data management, (your favorite
application goes here)
So maybe objects are the answer...?
Yes, if we can keep all the relational "goodies"!

The Object-Relational DBMS Evolution

O-R extension 1 Abstract data types (ADTs)
New column types and functions/methods
E.g., text, image, audio, video, time series,
point, line, OLE...
For modeling new kinds of facts about enterprise
entities
Infrastructure for extenders/datablades/cartridges
O-R extension 2 Row types
Types and functions/methods for rows of tables
Desirable features include references,
inheritance, methods, late binding, and
collection-valued attributes
For modeling enterprise entities with
relationships behavior
Infrastructure for DBMS-native object management
Recent SQL3 merger Structured types
Can use for types of columns and/or tables

"Not Your Father's Employee Type"

Beyond name, rank, and serial number
New attribute types
Location (2-d point), job description (text),
photo (image), ...
Associated functions
Distance(point, point), contains(text, string),
...
Beyond your basic employee record
Employees come in different flavors
Emp, RSM, Programmer, Manager, Temp, ...
Employees have many known relationships
Manager, department, projects, ...
Employees have behavior
Age(Emp), qualified(Emp, Job), hire(Emp), ...

An Employee is a simple "business object"
9
The OSF Project at IBM Almaden

OSF stands for "Object Strike Force"
Semi-autonomous group "outside" UDB development
Focus object-relational extensions for DB2 UDB
Both near-term and longer-term interests
Collaborate with our Toronto and Santa Teresa
labs
Significant activities to date
Prototyped "row type" support for DB2 UDB
Delivered in DB2 UDB Version 5.2 (9/98)
Significantly revised SQL3 draft standard
Working on next step plus future technologies

10
DB2 Universal Database, Version 5

DB2 for Common Servers (Version 2)
User-defined column types (UDTs/distinct types)
User-defined functions (UDFs)
Binary/character large objects (BLOBs/CLOBs)
Distinct types new data types for columns
Ex create distinct type US_Dollar as Real with
comparisons
US_Dollar is an available UDT with functions ,
ltgt, lt, lt, gt, gt, US_Dollar(Real),
Real(US_Dollar)
User-defined functions associated operations
create function CA_Tax (US_Dollar) returns
US_Dollar external name 'money!US_Dollar'
language C

11
DB2 Universal Database, Version 5 (cont.)

Lots of other interesting features as well, e.g.
Constraints and triggers
Recursive queries
OLAP support (cube and rollup)
Extenders (based on UDTs/UDFs)
Wide range of hardware/software platforms
PCs Windows95, NT, OS/2, SCO
Unix workstations AIX, Solaris, HP/UX
Parallel platforms SMPs, MPPs (e.g., SP2)
Descended from Almaden's Starburst system
Extensible query compiler (with rule-based query
rewrite and query optimizer components)

12
New O-R Features in DB2 UDB V5.2

Structured types and references
Named types with attributes, O-O subtyping model
Ref(T) for directly modelling relationships
Typed tables and table hierarchies
Oid (user-provided) plus a column per attribute
of T
Subtables for querying and managing subtype
instances
Query language extensions
Substitutability for queries/updates (data
independence )
Path expressions for querying relationships
easily
Functions/predicates for runtime type inquiries
Object views (via a novel approach)
Virtual table hierarchies for flexible access
control
Also facilitates O-O views of legacy tables

A Simple Example

Employee and department tables in the (late) 90's

mgr
dept
14

Structured Types and References

Create structured types (and references)

create type Person_t as ( name Varchar(40),
birthyear Integer ) create type Emp_t under
Person_t as ( salary Integer, dept
Ref(Dept_t) ) create type Exec_t under Emp_t as
( bonus Integer ) create type Student_t
under Person_t as ( major Varchar(20) )
15

Structured Types and References (cont.)

Create structured types (cont).

create type Dept_t as ( name Varchar(20),
budget Integer, headcount Integer,
mgr Ref(Emp_t) )

Okay, so I lied (a little) on the last slide...

alter type Emp_t add attribute dept Ref(Dept_t)
16

Typed Tables and Table Hierarchies

Now create typed tables (and subtables)

create table person of Person_t (ref is oid
user generated) create table emp of Emp_t under
person (dept with options scope
dept) create table exec of Exec_t under
emp create table student of Student_t under
person create table dept of Dept_t (ref is
oid user generated, mgr with options scope
emp)
17

SQL Query Extensions (by example)

Substitutability

select E. from emp E where E.birthyear gt 1970
and E.salary gt 50000

Data modification (insert update/delete)

insert into emp values (Emp_t('e100'), 'John
Smith', 1968, 65000, (select oid
from dept where name 'Database')) update
person set birthyear birthyear 1 where name
'John Smith'

Path expressions

select E.name, E.dept-gtname from emp E where
E.dept-gtmgr-gtdept-gtmgr-gtname 'Lou Gerstner'
18

Querying Table Hierarchies An Example

Dept
Person
name
birthyear
oid
P1
Harold
1970
P2
Carol
1958
Emp
name
birthyear
oid
dept
P3
Hamid
1956
_
P4
Lou
1940
1940
19

SQL Query Extensions (cont.)

Support for type-dependent queries

select from only (emp) E where dept-gtbudget gt
10000000 select name from person P where
deref(oid) is of type (only Emp_t,
Student_t) select type_name(deref(E.oid)),
E. from outer (emp) E where e.oid Emp_t('e13')
20

Other Data Definition Features

ref is for object id column
Unique, user-generated (on insert)
scope clause for reference columns
Provides critical information to the query
optimizer
not null constraints
Definable at any level of a table hierarchy
Enforced for indicated table and its subtables
unique constraints
Root level (and columns) only
create index for physical schema
Unique or non-unique index on root table
Non-unique index on subtable

Other Data Definition Features (cont.)

Authorization model for table hierarchies
grant and revoke on table or subtables
Substitutability implicit subtable authorization
on columns inherited from an authorized
supertable
Ex 1 select privilege on person table
Ex 2 update privilege on salary column of emp
table
Some operations require authorization everywhere
deref function
is of type predicate and type_xxx functions
SQL3 also supports granting of table/subtable
privileges with hierarchy option

Object Views in DB2 UDB

Typed views and view hierarchies

mgr
dept

Requirements virtual table hierarchies
Typed rows with (derived) object ids
Views may be quite different from base data
Support for interconnected "view schemas"

Types For Object Views

Create types for use in views

create type VPerson_t as ( name
Varchar(40) ) create type VEmp_t under
VPerson_t as ( dept Ref(VDept_t) ) create
type VStudent_t under VPerson_t as ( kind
Varchar(8) ) create type VDept_t as ( name
Varchar(20), mgr Ref(VEmp_t) )
24

Typed View Hierarchies

Now create typed views (and subviews)

create view vperson of VPerson_t (ref is oid user
generated) as select VPerson_t(Varchar(oid)),
name from only (person) create view vemp of
VEmp_t under vperson (dept with options scope
vdept) as select VEmp_t(Varchar(oid)), name,
VDept_t(Varchar(dept)) from emp where salary
gt 0 create view vstudent of VStudent_t under
vperson as select VStudent_t(Varchar(oid)),
name, case when major like
'Engineer' then 'Geek'
else 'non-Geek' end from student create
view vdept of VDept_t ...
25
O-R Implementation Issues/Tradeoffs

Some guiding principles for DB2 UDB V5.2
Performance must equal/exceed relational
equivalents
Design amenable to future plans w.r.t. type
evolution
Structured types must be supported in columns
(someday)
Localize initial changes to query compiler where
possible
Want "free" indexing, rewrites, optimization,
parallelization, ...
Influenced by discussions with a CAD/CAM vendor
Information on existing approach and
installations
Requirements for efficiency of new products
Let's look briefly at two areas
Table hierarchy representation
References and path query processing

26
Implementing Table Hierarchies

Implementation table approach
One physical table per table hierarchy with
Type tag column (to distinguish subtable rows)
Object id column
Columns for all columns of the root table and its
subtables
Vertical partitioning approach
One physical root table with
Type tag column
Object id column
Columns for each root table column
N physical delta tables (one per subtable) with
Object id column
Columns for each column introduced by this
subtable

27
Implementing Table Hierarchies (cont.)

Horizontal partitioning approach
N separate physical tables with
Object id column
Columns for every subtable column (inherited or
not)
So what did we do for UDB V5.2...?
Vertical partitioning approach rejected quickly
Too many joins to materialize subtable rows
Multi-column constraints and indices problematic
Horizontal partitioning approach rejected
eventually
Uniqueness issue for user-generated object ids
Query complexity for multi-hierarchy join queries
Ex select p.name, q.name from Person p, Person
q where ...
Implementation table approach taken for V5.2
Appeared to give us the most "free" functionality
Adopted despite row size (null columns) downside

28
References and Path Expressions

Reference values in tables should have a scope
"Other end" info for query rewrite and join
optimization
Ditto for authorization checking (static vs.
dynamic)
Schema makes overly wide references unnecessary
Uniqueness is hierarchy-relative, enforced with
an index
V5.2 self-references (object ids) are
user-generated
CAD/CAM vendor had "legacy references" in files
Different users have different id generation
schemes
Loading cyclic data (e.g., emplt-gt dept) is messy
and slow
Ditto for creating objects from an object cache

29
References and Path Expressions (cont.)

Path expressions are logically equivalent to
subqueries

select E.name, E.dept-gtname, E.dept-gtmgr-gtname fr
om emp E where E.dept-gtheadcount gt 10

Actual approach shared subquery generation (QGM)
Compute common paths (prefixes) once to save work
Not every SQL context accepts an actual subquery
Also need to handle non-serializable locking
levels
Efficiency obtained through query rewrite, e.g.
Subquery to outer-join transformation
Outer-join to join transformation where possible

30
Where We Are Today in UDB

V5.2 of UDB contains new O-R features
Structured types with inheritance
Object tables and table hierarchies
References and path expressions
Object views and view hierarchies
Moreover, so does the SQL3 standard
Includes object views and user-defined references
IBM, Oracle, Informix heading in same general
direction
Work continuing on O-R extensions
Let's have a brief look...

31
Additional Object Table Support

Business rule mechanisms for typed tables
Check constraints on tables/subtables
(w/inheritance)
Referential integrity constraints to and from
tables/subtables
Triggers on tables/subtables
Object modeling and management extensions
User-defined reference types (ref using)
More flexible object view definitions
Type and instance (i.e., row) evolution
Structured types for attributes/columns
Work in progress at IBM Santa Teresa Lab
Functions/methods just around the corner as well

32
Other Exploratory O-R Work

Efficient support for collection types
Multivalued attributes (e.g., Project.team)
Flavors set, multiset, array, list, ...
Need to integrate into SQL, support querying well
Some experience from a first prototype
Other activities (and open problems)
Java mappings bindings for O-R data
XML data-centric web sites ("d-commerce")
Business object servers (caching/consistency)
Heterogeneous data O-R database systems
User-defined and/or external indexing
Optimizer "hooks" for new data types
Etc.!

33
Partial List of UDB O-R Contributors

Almaden Research Center
Mike Carey, Don Chamberlin, Srinivasa Narayanan,
Bennet Vance C.M. Park Guido Moerkotte
Santa Teresa Lab
Nelson Mattos
Gene Fuh, Michelle Jou, Brian Tran
Toronto Lab
Doug Doole, Serge Rielau, Rick Swagerman
Leo Lao, Walid Rjaibi, Calisto Zuzarte
Cheryl Greene, various other consultants/hecklers
And as for future versions of UDB
Your name could appear here! (MS/PhD)