Title: Object-Relational Database Systems:
1- Object-Relational Database Systems
- Evolution Beats Revolution
Michael J. Carey IBM Almaden Research Center
Smalltalk
Java
New Data Types
Queries
Navi- gation
Appl. Dev. Tools
SQL
C
2- The relational DBMS revolution
- Relational model and query language
- Why relational succeeded
- Why relational isn't enough, and some options
- The object-oriented DBMS revolution
- Object-oriented model(s) and query language(s)
- Why object-oriented "failed"
- Why wrappers will fail as well
- The object-relational DBMS evolution
- The object-relational model and query language
- Current products and examples
- Performance and other challenges
3- The Relational DBMS Revolution
- The pre-relational era (1970's)
- Graph-based data models
- Hierarchical model (e.g., IMS)
- Network model (e.g., Codasyl)
- Low-level, navigational interfaces
- Labor-intensive and error-prone
- The relational era (1980's)
- Simple, abstract data model
- Database set of relations ("tables")
- 3 schema levels views, base tables, physical
schema - Algebra of set-oriented operations
- High-level, declarative interfaces
- SQL, Quel, QBE
- Embedded languages, 4GLs
4- The Relational Model (by example)
- Employees and departments
Department
dno
name
10
Toy
20
Shoe
Employee
eno
name
salary
dept
1
Lou
10000000
10
7
Laura
150000
20
22
Mike
80000
20
?
select E.name, E.salary, D.no from Employee E,
Department D where E.salary lt 100000 and D.name
'Shoe' and E.dept D.dno
5- Relational DBMS "Goodies"
- Relational query processing
- Queries range over tables and/or views
- Programmers use a declarative language (SQL)
- Query optimizer picks the lowest-cost query plan
- Alternative access paths, join orders, join
methods, and so on (based on indices and database
characteristics) - Result data independence
- Support for (shared) business logic
- Integrity constraints
- Check constraints, referential integrity
constraints - Triggers, stored procedures, views, authorization
- Performance and robustness
- Buffering, locking, crash recovery, replication,
...
6- We've Achieved Nirvana ... Right?
- Relations are surely the answer!
- Simple, high-level model for programmers
- Easy to distribute data and parallelize queries
- But what was the question?
- Sometimes difficult to model "real world" data
- Entities and relationships (versus tables)
- Variance among entities (versus homogeneity)
- Set-valued attributes (versus normalization)
- Demanding new database applications
- New applications bring new data types
- Complex objects are problematic
- "A relational database is like a garage which
forces you to take your car apart and store the
pieces in little drawers..."
7- Throw in the towel
- OOPL your favorite file system
- Object-oriented DBMS
- Tightly integrated OOPL w/built-in DBMS
- Object-oriented client wrapper
- Loosely integrated OOPL relational DBMS
- Object-relational DBMS
- Newly integrated Relational model OO features
Which solution is the "right" one...?
8- Let's Examine the Problem Space
- Stonebraker's 4-quadrant model
Complex
Simple
Complex
9- The Object-Oriented DBMS Revolution
- Motivated by new database applications, e.g.
- Computer-aided engineering
- Document management
- Geographic data management
- Engineering applications were early drivers
- Complex data structures ("pointer spaghetti")
- Navigational data access required
- Tight coupling between applications and data
- Version management support needed
- Approach OOPL DBMS OO-DBMS
- Commonly based on C or Smalltalk
- Persistence, collections, versions, queries, ...
10- No OO "Ted Codd" Stepped Forward
- Object-Oriented Database System Manifesto
- Mandatory features
- Complex objects, identity, encapsulation
- Inheritance w/substitutability and late binding
- Computationally complete methods
- Extensible type system, persistence
- Secondary storage, concurrency and recovery
- Ad hoc queries
- Optional features
- Multiple inheritance, static type checking
- Distribution, long transactions, versions
- Individual choices
- Programming paradigm/language
- Details and uniformity of object model
11- Lots of research results
- Object data models and features
- OO query languages and processing techniques
- Client-server architectures and performance
- Significant commercial progress
- Important and innovative systems
- E.g., O2, ObjectStore, ODE
- Quite a few commercial product offerings
- GemStone, Objectivity, ObjectStore, Ontos, O2,
Matisse, Poet, Versant, others - The ODMG-93 standard (release 2.0)
- Consortium of OO-DBMS startups
- Three key parts ODL, OQL, C binding
12- But the Revolution "Failed" (0B)
- Lingering OO-DBMS differences
- Query power, API details, implementation twists
- Piecewise ODMG standard conformance (ex OQL!)
- Still behind R-DBMSs in important ways
- Codasyl-like schema compilation cycle
- Schema evolution painful, if supported
- Typically missing many useful "goodies"
- Support for multiple application languages
- Query optimization, views, authorization,
constraints, triggers, multi-user scalability and
robustness, ... - Other factors (niche market)
- SQL-based application building tools
- Architecturally biased towards "fat clients"
13- OO Client Wrappers are the Answer...
- Available from a number of vendors
- Persistence Software, Ontologic, HP, Next, ...
- Language-specific relational wrappers
- Proxy classes for C or Smalltalk (or Java)
- Mapping of row data into language objects
- Client-side (or middle-tier) object caching and
method execution - Why is this approach attractive?
- Can develop OO applications today, against
existing enterprise data, for "business objects"
14- Paradigm mismatch for querying
- C or Smalltalk for simple business logic
and navigation, against object-oriented schema - SQL for queries, against relational schema
- Choice forced for business logic rules
- Do on server, using DBMS facilities?
- Check constraints, referential integrity
constraints, triggers, stored procedures,
authorization - Do on client, using OO wrapper facilities?
- C or Smalltalk (or Java) programming
- This had better be a stop-gap solution
- R-DBMS could become a storage manager, throwing
away 20 years of successful RD!
15- The Object-Relational DBMS Evolution
- Third Generation Database System Manifesto
- Support rich object structures and rules
- Rich type system, inheritance, encapsulation
- Functions, optional unique ids, rules/trigggers
- Subsume second generation database systems
- High-level query-oriented interface
- Stored and virtual collections
- Updatable views
- Data model/performance feature separation
- Open to other subsystems (tools, middleware)
- Accessible from multiple languages
- Layered persistence-oriented language bindings
- SQL support ("intergalactic dataspeak")
- Query-shipping architecture
16- "Not Your Father's Employee Type"
- Beyond name, rank, and serial number
- Several new attribute types
- Location (2-d point), job description (text),
photo (image), ... - Associated functions
- Distance(point, point), contains(text, string),
... - Beyond your basic employee record
- Employees come in different flavors
- Emp, RSM, Programmer, Manager, Temp, ...
- Employees have many known relationships
- Manager, department, projects, ...
- Employees have behavior
- Age(Emp), qualified(Emp, Job), hire(Emp), ...
An Employee is a "business object"
17- Two Flavors of O-R Object Extensions
- Object extension 1 Abstract data types (ADTs)
- New column types and functions
- E.g.,text, image, audio, video, time series,
point, line, OLE... - For modeling new kinds of facts about enterprise
entities - Object extension 2 Row types
- Types and functions for rows of tables
- Includes inheritance, references, set-valued
attributes - For modeling business objects with relationships
behavior - Impact on schemas and query language SQL3
- Schemas tables at the top, OO richness within
- Queries extensions to support the added richness
- Structured types support both ADT and row type
object modeling needs (unified type system)
18- To define and use a "black box ADT", a user will
- Implement its internal structure and functions in
an external programming language (e.g., C/C,
Java) - Use the DDL to register the type with the DBMS
- Size of an instance of the type
- Input (constructor) and output functions
- Other functions and operators, including
signatures and linkable implementations - Costs and other properties for query optimizer
- Use the new type like a built-in data type
- Now available for defining columns of tables
- Functions and operators become available in
queries
19- Example Illustra Black Box ADT
- Point as a "black box ADT" (written in C)
create type Point ( internallength 16
-- typedef struct double x, double y
point input point_in -- for
reading in Point constants output point_out
-- for displaying Point
results ) create function point_in(Text)
returns Point as external name
'MI_HOME/functions/point.so' language C create
function point_out(Point) returns Text
as external name 'MI_HOME/functions/point.so' la
nguage C
20- Example Illustra Black Box ADT (cont.)
- Now we can put an end to "Pointless" queries...!
create function further_west(Point, Point)
returns Boolean as external name
'MI_HOME/functions/pointfuns.so' language
C select E1.name, E1.location from Emp E1, Emp
E2 where further_west(E1.location, E2.location)
and E2.name 'Mike' create binary operator
binding to further_west select E1.name,
E1.location from Emp E1, Emp E2 where E1.location
gtgt E2.location and E2.name 'Mike'
21- To define and use a "white box ADT", a user will
- Describe its internal structure using SQL3 DDL
- Attribute definitions are column-like
- Advantages heterogeneity, nulls, nesting,
constraints, ... - Implement its functions either directly in SQL or
in his/her favorite external programming language - Utilize system-generated accessors and mutators
- Finish explaining the type to the DBMS using DDL
- For query optimizer, as before
- Use the new type like a built-in data type
- In tables and queries, as before
- Note this is just a SQL3 structured type
definition that's primarily intended for use in
columns
22- Example DB2 UDB/OSF White Box ADT
- Point as a "white box ADT" (written in SQL3)
create type Point as ( x double, y
double, ) create function distance(p1 Point, p2
Point) returns Point language SQL inline not
variant return sqrt((p2..y-p1..y)(p2..y-p1..y)
(p2..x-p1..x)(p2..x-p1..x)) select
E.name from Emp E, City C where C.name 'San
Jose' and distance(E.location, C.center) lt 25
23- Of Extenders, Blades, and Cartridges
- High performance demands "deep" integration
- Optimizer must know about an ADT operator's...
- Execution cost (especially for expensive
functions) - Logical properties (e.g., transitivity, negator,
...) - Selectivity estimates (i.e., filtering/matching
power) - Relationship to access methods (both old and new)
- DBMS runtime must invoke functions efficiently
- Static vs. dynamic loading, fenced vs. unfenced
execution - Partnerships and third-party packages
- E.g., DB2's text, image, and spatial extenders
- Package contains types, functions, access
methods, optimizer information, and SQL DDL
statements for all of the above
24- To define and use a "row type", a user will
- Create the desired structured type using SQL3 DDL
- Columns, plus (optional) specification of a
supertype - Create functions/methods involving the type
- Arguments of the new type, w/overloading in the
case of methods - Create one or more tables of the indicated type
- Type hierarchy (if any) yields corresponding
table hierarchies
25- Example SQL3 Row Types (plus Sets...)
- Employees are people, so ...
create type Emp_t under Person_t as ( salary
Float, job_description Varchar(100), department
ref(Dept), projects set(ref(Project) ) create
table IBM_Emps of Emp_t under IBM_People (...)
create type Person_t as( name Varchar(20), birth
date Date) method age( ) returns Integer
language SQL create method age( ) for
Person_t return year(current date) -
year(birthdate) create table IBM_People of
Person_t (ref is self) (Note this is
approximate SQL3 syntax)
26- SQL's query constructs, extended with the ability
to access these features (a la SQL3 plus sets) - User-defined functions in queries (w/late method
binding) - Dereferencing of references (path expressions)
- Queries over nested collections (table
expressions) - For example, find unexplainable discrepencies
between employees' and managers' salaries
select E.name, E.manager-gtname,
display(E.photo) from IBM_Emps E where E.salary gt
E.department-gtmanager-gtsalary and
E.department-gtmanager-gtage( ) gt E.self-gtage(
) and not contains(E.job_description, "Java")
27- Other OR-Related Features
- Support for large objects
- Multimedia data types aren't small (e.g., video)
- Special handling required for efficiency
- Minimal copying, piecewise retrieval, optional
logging, movement to/from files, separate storage
area from other attributes - DB2 has blob, clob, and dbclob types (up to 2GB)
- Support for active data (triggers and
constraints) - Ex
create trigger me_too after insert on
IBM_Emps referencing new as newemp foreach row
mode db2sql when salary gt department-gtmanager-gtsal
ary begin atomic set newemp.department-gtmanager-gt
salary newemp.salary end
28- OR-DBMS Technology Status
- Many OR-DBMS research results
- Postgres, EXODUS, Starburst, ...
- OODB query processing research
- Commercial systems exist today
- IBM DB2 CS (V2.1) and CA-Ingres
- User-defined types functions, large objects,
triggers - Illustra, UniSQL/X
- Early providers of ADTs, row objects, inheritance
- IBM DB2 UDB, Informix, Oracle
- "Universal server" products contain subsets of
all this stuff - Standards right around the corner
- SQL3 is "hardening" and has an object part with
structured types, table hierachies, user-defined
functions and methods, object views, ....
29- Some OR-DBMS Performance Issues
- Bucky OR-DBMS benchmark from UW-Madison
- Based on a hypothetical university schema
- Exercised a range of OR-DBMS features
- Row types, inheritance, late binding, subtables
- Queries involving path expressions and/or sets
- ADTs (black or white box) and functions
- In Proc. 1997 ACM SIGMOD Conference
- Tested a first-generation OR-DBMS product
- OR versus relational simulation, same DB engine
- Showed benefits of (complex) ADTs, indexes on
functions - Indicated areas where query optimization needs
schema support scope for path expressions,
inverse relationships - Turned up bugs and performance problems (e.g.,
sets)
30OR Enterprise Scenerio (w/Challenges)
- Object-relational server managing the database
- ADTs w/inheritance and multi-language support
- Row types, integrated with all of SQL (OO views,
authorization, triggers, constraints, etc.) - High-function, OO, caching front-ends
- Support for desktop and middle-tier (web!)
applications - OR object model at all levels, for queries and
navigation - Clean bindings for OOPLs (Java, C, Smalltalk)
- Methods/queries running on client or server
- Likewise for triggers and constraints
- Business rules specified implemented once!
- In SQL ( OOPL), running where appropriate
31- Multi-Tier Integration Challenges
- Good mappings and interfaces to provide
object-relational objects to OOPLs - Java, C, Smalltalk, others
- Full query support in addition to navigation
- Challenges in querying and caching
- Intelligent querying over cache database
- Correct and efficient caching of view objects
- Update-related challenges
- Triggers and constraints of all types
- View objects (both directions)
- Method execution on client or server
- Java should be very useful here
32- Legacy Data Access Challenges
- Some data will live outside the OR-DBMS
- Older DBMSs (both relational pre-relational)
- Specialized data stores (documents, images, ...)
- Applications (i.e., legacy transactions)
- Object-relational middleware is the answer!
- Table functions can handle simple cases now
- Distributed OR query engine (a la DataJoiner) can
mediate between new applications and legacy data - Resulting appearance is that of an integrated OR
database, accessible via SQL3 APIs and OO tools
33- Front-End Integration Legacy Data Access
34- Relational DBMS era 1980's, early 1990's.
- Significantly raised the levels of abstraction
productivity - Only "real" parallel computing success story to
date, too! - Object DBMS era Should have been early 1990's...
- Never made it out of the (mainstream) starting
gate - Object-relational DBMS era You are there!
- Object enhancements to relational DBMSs
- ADTs (white box, black box) and functions
- Row types with inheritance, references, sets, ...
- Vastly reduces the "impedence mismatch" w/OOPLs
- Today's OO wrappers are an interim solution
- Possibilities abound for nice OO/OR tools
- Will have OR middleware as well as engines