Title: Databases and Information Systems 4
1Databases and Information Systems 4
- Richard Cooper (rich_at_dcs)
- and
- Tony Printezis (tony_at_dcs)
2The Fundamental Problem
- Database Systems have been very successful in providing good support for managing data which is fairly large and fairly complex
- What happens when
- the data gets very much larger
- the data gets very much more complex
3Contents of Course
- Week 1 (Richard)
- Introduction
- Overview of RDB/ORDB/OODB
- Week 2 (Richard)
- Orthogonal Persistence
- Object Oriented Database Systems
4Contents of Course
- Week 3 (Tony)
- Java Object Serialization
- The PJama API
- Week 4 (Tony)
- Object Caching and Object Faulting
- Pointer Swizzling
5Contents of Course 3
- Week 5 (Tony)
- Garbage Collection - Disk Behaviour
- Object Promotion
- Week 6 (Tony)
- Object Eviction
- Orthogonal Persistence for Java
6Contents of Course 4
- Week 7 (Tony)
- Store Organisation
- Garbage Collection
- Week 8 (Richard)
- Object Query Languages
- Transaction Models
7Contents of Course 5
- Week 9 (Richard)
- Transaction Models for Multi-Site Databases
- Schema Evolution
- Week 10
- Specialised Indexing (Ela)
- XML (Richard)
8Assumptions about Database Use
- As database systems evolved, it was assumed that
- 1. There was a central data store with lots of distributed users.
- 2. The data was relatively simple (largely alphanumeric).
- 3. The data was regular and complete.
- 4. There was a lot of data, but there was also an implicit limit to the size.
- 5. The users were either consumers or specialised creators
9The Real World
- Now we have
- data all over the place
- in all kinds of structures
- much of it is text
- even more of it is graphical or aural
- vast amounts of it
- some of it is missing or is structured differently in different places
- users with various kinds of interest/involvement
10When Data is Small
- You can get away with
- non-linear algorithms
- hand-crafted code and data
- an ad hoc structure
- implicit rules and informal conventions
11When Data Gets Large
- You must have
- linear or (better still) incremental algorithms
- systematic code and data management
- regular structures, frameworks and tools to support them
- explicit, visible and interpretable rules
12When Data is also Long Lived
- We have the hardware to keep data for a very long time
- and there are often laws forcing us to do so
- However, long-lived data tends to change
- new data is added
- it is restructured
- the software expected to handle it evolves
- Can you read a ten year old floppy???
13When Data is also Heterogeneous
- Information Systems increasingly must bring together data produced
- of different kinds (numeric and multi-media)
- separately (e.g. in merged companies)
- for different purposes
- using different technologies
- As though they were all designed to work together
14Large, Long-lived, Heterogeneous and Unstoppable
- Because the data supports continuous operations
- utilities, banking, airlines, public service
- You may not stop such systems if
- you want to change the hardware or software
- you want to change your database
- you want to change the application
- there are hardware or software failures
- there are operations which require exclusive
access
15This is the Reality we Live With
- There are lots of examples
- shared scientific data (e.g. genomic data)
- e-business
- governmental systems and health-care data
- computer aided design and manufacturing
- geographic information systems
- etc., etc.
16And There Are Many More Media for Data Access
- Not just a private network, but also
- the internet
- digital television
- mobile devices
- etc., etc.
17How To Cope 1
- Software Re-use
- not just small libraries such as Java APIs
- but large components, such as
- databases, payroll packages, GUI packages, etc.
- Standardised Frameworks
- CORBA, DCOM, EJB, .NET, XML
18How to Cope 2
- Generate code rather than write it
- since much code is repetitious and can be generated from
- a high-level notation or by reflecting over data
- Work incrementally
- revolution is never affordable
- plan and resource route for transition
- remember the users!
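As a small illustration of generating code by reflecting over data, the sketch below derives a create table statement from a sample record. The function name, the type mapping and the sample record are all hypothetical, chosen only for this example.

```python
# Hypothetical mapping from Python value types to SQL column types
SQL_TYPES = {int: "integer", str: "varchar(40)", float: "real"}

def create_table_sql(table_name, sample_record):
    """Generate a create table statement by reflecting over a sample record."""
    columns = ", ".join(
        f"{field} {SQL_TYPES[type(value)]}"
        for field, value in sample_record.items()
    )
    return f"create table {table_name} ({columns})"

# An illustrative record; the field names are assumptions
employee = {"staffNo": 17, "name": "jane", "salary": 21000.0}
ddl = create_table_sql("Employee", employee)
print(ddl)  # create table Employee (staffNo integer, name varchar(40), salary real)
```

The same reflection step could drive generated forms, loaders or queries, so one description of the data replaces several pieces of repetitious hand-written code.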
19The Fundamental Coping Device
- Effective high-level and complex standards for representing
- data (relations not enough)
- applications (regular, strict languages needed)
- distributed systems (CORBA, etc.)
- processes (UML, business processes, etc.)
- etc., etc.
20But also ...
- It may be necessary to create new storage techniques to fit new data structures
- It will be necessary to invent new storage structures to manage the new complexity
- There is a need for work at both
- the implementation level and
- the usability level
21Lecture 2
- New Requirements on DB Functions
- Why Relations Won't Do
- Extending Relations
- Historical and Deductive Databases
- Object Relational Databases
- Oracle Objects, SQL3, etc.
- Object Oriented Databases
- intro only
22New Applications with New Requirements
- 1. CAD, CAE, CIM
- 2. Computer Aided Software Engineering
- 3. Office Information Systems
- 4. Geographic Information Systems
- 5. Hypermedia Systems
- Data is large, often graphical, multiple
versions required, data is complex
23Requirements which carry over from Traditional Applications
- Efficient access to large amounts of data
- Recovery mechanisms
- Security mechanisms
- Data independence
- Distribution of data
24Requirements Modified by the New Applications I
- Transactions
- in traditional applications, these are short - milliseconds to book a seat
- in novel applications, they may be long - hours or days to edit a design
- in traditional applications they are competitive - don't book the same seat twice
- in novel applications they may be co-operative, e.g. collaboration on design development
25Requirements Modified by the New Applications II
- Integrity Constraints are much more important
- as the data is more semantically complex
- some of the semantics is best expressed as constraints
- User Interfaces play a greater rôle
- the data is manageable only if appropriately visualised
- complex operations must be made usable
26Requirements Modified by the New Applications III
- Data is organised differently
-                      Trad. Apps    Novel Apps
- Number of Objects    Large         Small
- Number of Types      Small         Large
- Object size          Small         Large/Huge
27New Requirements Made by the New Applications I
- Complex Data Structures
- Just sets of records won't do
- Object identity easier than primary keys
- Implicit references easier than foreign keys
- First Normal Form is a Killer!
- Multimedia Data Types
28New Requirements Made by the New Applications II
- The Database must hold Code
- to hold complex derived data
- to hold "active values"
- Multiple Versions
- We only want one bank account record at any time
- But many alternative designs
- Building configurations becomes a problem
29Can We Go On Using Relational DBMS?
- Only with increased mapping problems
- The RM only has two ways of relating two pieces of data
- They are in the same record.
- They are in two records connected by a foreign key.
30The Semantic Poverty of the RM
- The former is used for
- grouping attributes
- 1-1 relationships
- compound attributes
- connecting keys of M-N relationships
- The latter is used for
- multi-valued attributes
- sub-typing
- one-many attributes
31Other Problems with RDBs
- You can't do recursive queries
- e.g. "Return all the ancestors of X"
- Nor much support for constraints
- e.g. "All employees earn less than their boss"
- You can't add new operations
- e.g. "Return the volume of a building"
- Impedance mismatch
- if you have to use a PL, this has a different data model from SQL
32Three Approaches for Progress
- Start with a traditional DBMS and extend its modelling power (Object-Relational System)
- or
- Start with a rich data model and add DBMS facilities (Object Oriented DBMS)
- or
- Start with a Programming Language and add DBMS facilities (Persistent Programming Language)
- Manifesto Wars
33The Third-Generation Database System Manifesto I
- Three tenets
- Besides traditional data management services, third generation DBMSs will provide support for richer object structures and rules
- Third generation DBMSs must subsume second generation systems
- Third generation DBMSs must be open to other subsystems
34The Third-Generation Database System Manifesto II
- Thirteen Propositions
- Rich type system
- Inheritance
- Functions/encapsulation
- OIDs only if no primary key
- Rules (triggers and constraints) are important
- The query language should be central to all access
- Manual/Automatic Collections
- Update through views
- Performance and data model should be kept separate
- Multiple Prog. Languages
- Persistent extension of languages is good
- SQL is the de facto standard
- Network communication through queries and results
35The Object Oriented System Manifesto I
- Mandatory Features
- Complex Objects, Object Identity, Encapsulation
- Types and Classes, Inheritance, Late binding
- Ad hoc querying, Extensibility, Persistence
- Efficient storage, Concurrency, Recovery
- Computational completeness
- Disagreement
- Integrity constraints, DB Admin Tools, Views
- Schema Evolution Tools
36The Object Oriented System Manifesto II
- Optional Features
- Multiple inheritance, Type checking
- Distribution, Design Transactions, Versions
- Open Choices
- Programming paradigm, Type system, Uniformity
37The Third Manifesto
- The relational model is still important and OO features should be orthogonal
- Like
- relations, relational algebra up front
- integrity constraints, multiple and single inheritance
- computational completeness, static type checking
- Don't like
- SQL, object IDs and null values
38Two Extensions of RDBMS
- Historical DBMS
- keep all past states of the database
- Deductive DBMS
- derived data as well as base data
- uses a language like Prolog to add the derived
data
39Historical DBMS
- Old records are kept when they are deleted to answer queries like "give balance on 1/10/88?"
- Records have two extra fields - creation and deletion dates
- delete sets the deletion field
- insert sets the creation field
- update sets the deletion field and creates a new record
- Two notions of time
- when the data is valid and when it is entered
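The creation and deletion fields described above can be sketched with an in-memory SQLite table. The Account table, its column names and the helper functions are assumptions made for this illustration, and only one of the two notions of time (when the record was entered) is modelled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table Account (
        acctNo   integer,
        balance  integer,
        created  text,   -- date this version of the record was created
        deleted  text    -- date it was superseded or deleted; NULL while current
    )""")

def insert(acct_no, balance, today):
    # insert sets the creation field
    conn.execute("insert into Account values (?, ?, ?, NULL)",
                 (acct_no, balance, today))

def delete(acct_no, today):
    # delete only sets the deletion field; the old record is kept
    conn.execute("update Account set deleted = ? "
                 "where acctNo = ? and deleted is null", (today, acct_no))

def update(acct_no, balance, today):
    # update sets the deletion field and creates a new record
    delete(acct_no, today)
    insert(acct_no, balance, today)

def balance_on(acct_no, date):
    # the version current on `date` was created on or before it
    # and had not yet been deleted
    row = conn.execute(
        "select balance from Account where acctNo = ? and created <= ? "
        "and (deleted is null or deleted > ?)",
        (acct_no, date, date)).fetchone()
    return row[0] if row else None

insert(1, 100, "1988-09-01")
update(1, 250, "1988-10-15")
delete(1, "1989-01-01")
print(balance_on(1, "1988-10-01"))  # 100
```

Dates are stored as ISO strings so that plain string comparison orders them correctly; a real historical DBMS would manage these fields itself rather than leave them to application code.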
40Deductive DBMS (DDB)
- A DDB is made up of two kinds of component
- facts are simple base assertions - i.e. records
- father( jane, john ) mother( jill, jane)
- rules are ways of deriving more facts
- grandfather( C, G ) :- parent( C, P ), father( P, G )
- parent( C, P ) :- father( C, P ), etc.
- Queries are rules with variables to be filled in
- grandfather( X, john )? - who are john's grandchildren
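A minimal sketch of how the facts and rules above derive new facts, written as set comprehensions rather than in a logic language. The second parent rule (via mother) is assumed to be what the slide's "etc." stands for.

```python
# Base facts, stored as (child, parent) pairs as on the slide
father = {("jane", "john")}
mother = {("jill", "jane")}

def parent():
    # parent(C, P) :- father(C, P).
    # parent(C, P) :- mother(C, P).   (assumed from the slide's "etc.")
    return father | mother

def grandfather():
    # grandfather(C, G) :- parent(C, P), father(P, G).
    return {(c, g) for (c, p) in parent() for (p2, g) in father if p == p2}

def grandchildren_of(g):
    # the query grandfather(X, g)?
    return sorted(c for (c, g2) in grandfather() if g2 == g)

print(grandchildren_of("john"))  # ['jill']
```

The derived facts are recomputed from the base facts on every query, which is the simplest strategy; a deductive DBMS may instead store or incrementally maintain them.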
41Object-Relational Databases
- Also known as
- Extended relational databases
- Complex object databases
- Main features
- get rid of First Normal Form
- add methods to tables
- Main examples
- Oracle 8/i onwards, SQL3, Informix
42The Main Additions to RDBs
- User defined abstract data types
- Row types so that one value can include a nested complex value
- Collection types for domains
- Inclusion of user-defined functions defined on types
- Inheritance
- Multimedia data types and large objects
43SQL3 (Evolving Standard)
- This is a massive extension to SQL and has
- computational completeness
- row types
- user-defined types
- user-defined procedures, functions and operators
- type constructors for arrays, sets, lists and multisets
- support for large objects - BLOBs and CLOBs
- recursion
44Row Types in SQL3
- A row type is a sequence of field name/type pairs - i.e. the type of a row of a table
- In SQL3 it can also be the domain of a column
- create table Branch( branchNo longInt,
- address row( street varchar(20),
- city varchar(20) ) )
- Row types can be named
- create row type EmpRT( Ename varchar(35), age integer )
- create table Employee of type EmpRT
45User-Defined Types (UDTs) in SQL3
- These are a means of defining new domain types in SQL3, e.g.
- create type StaffNumberType as varchar(5) final
- More generally a UDT is an abstract data type with
- (non First Normal Form) fields
- constructor methods
- observer and mutator (get and set) methods
- general methods
46UDT Example
- create type PersonType as
- ( private dateOfBirth Date,
- public fname VARCHAR(15) not null,
- public lname VARCHAR(15) not null,
- function age(p PersonType) returns integer
- return /* code to calculate age */
- end )
- ref is system generated // see later
- instantiable // if not, only subtypes are
- not final // can have sub-types
47Subtypes and Supertypes
- Given a type, we can create a subtype, e.g.
- create type StaffType under PersonType as
- ( staffNo varchar(6), etc.
- This works by creating an extra attribute which refers to a PersonType value
- This also works at the table level
- create table Manager under Staff( MgrStartDate Date)
- This creates a table with all the columns of Staff duplicated and all manager records in both tables
48References
- In SQL3 it is possible to set up OID style references.
- On slide 46 we said that PersonType had system-generated references, so we can do
- create table Branch as
- ( branchNo integer,
- address addressType,
- manager ref(PersonType)
- ..... )
- In this, the value is a system-generated OID
49Collection Types
- SQL3 supports four collection types
- ARRAY - one dimensional fixed length array
- LIST - ordered and allows duplicates
- SET - unordered and does not allow duplicates
- MULTISET - unordered and allows duplicates
- E.g. if PersonType has an attribute
- nextOfKin set(PersonType)
- The following makes sense
- select fName, lName, count(NextOfKin)
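As a rough analogy only, the four collection kinds map onto familiar in-memory containers. The sample values are made up, and Python's tuple is merely a stand-in for a fixed-length array.

```python
from collections import Counter

next_of_kin_list = ["ann", "bob", "ann"]          # LIST: ordered, duplicates allowed
next_of_kin_set = set(next_of_kin_list)           # SET: unordered, no duplicates
next_of_kin_multiset = Counter(next_of_kin_list)  # MULTISET: unordered, duplicates counted
next_of_kin_array = tuple(next_of_kin_list)       # ARRAY: stand-in for a fixed-length array

# count(nextOfKin) corresponds to the number of elements in the collection
counts = {
    "list": len(next_of_kin_list),
    "set": len(next_of_kin_set),
    "multiset": sum(next_of_kin_multiset.values()),
}
print(counts)  # {'list': 3, 'set': 2, 'multiset': 3}
```

The choice of collection type therefore changes the answer to a count query over the same inserted values, which is why the column's declared type matters.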
50Triggers
- Triggers are pieces of code which act when some condition is met. Each trigger defines
- the event and whether to act before or after it occurs
- whether to operate on each row or only once
- what to do
- create trigger MailNewStaffNextOfKin
- after insert on Staff referencing new row as ST
- begin
- insert into StaffToMail values ( select P.name, P.address
- from Person where ST.nextOFKin1 ST.staffNo )
- end
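A runnable sketch of the same idea using SQLite's trigger dialect, which differs from SQL3 (the new row is called new, and there is no referencing clause). The table layouts and the join condition are assumptions for illustration, not the schema behind the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table Person (personNo integer primary key, name text, address text);
    create table Staff (staffNo integer primary key, nextOfKin integer);
    create table StaffToMail (name text, address text);

    -- event: after insert on Staff; granularity: for each row;
    -- action: copy the next of kin's details into StaffToMail
    create trigger MailNewStaffNextOfKin
    after insert on Staff
    for each row
    begin
        insert into StaffToMail
        select P.name, P.address from Person P
        where P.personNo = new.nextOfKin;
    end;
""")

conn.execute("insert into Person values (7, 'Jill', '1 High St')")
conn.execute("insert into Staff values (100, 7)")   # fires the trigger
mailing_list = conn.execute("select name, address from StaffToMail").fetchall()
print(mailing_list)  # [('Jill', '1 High St')]
```

The application only inserts into Staff; the mailing-list row appears as a side effect, which is the sense in which the database itself holds code.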
51Large Objects
- Large objects are increasingly important and there are two kinds
- Binary Large Objects (BLOBs)
- Character Large Objects (CLOBs)
- You can
- Concatenate them and do "substring" operations
- Overlay and trim them
- Return the length
52Recursion
- SQL3 permits linearly recursive queries, such as
- with recursive AllManagers( staffNo, managerStaffNo) as
- (select staffNo, managerStaffNo
- from Staff
- union
- select in.staffNo, out.managerStaffNo
- from AllManagers in, Staff out
- where in.managerStaffNo = out.staffNo )
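The query can be tried out in SQLite, which supports linearly recursive common table expressions. The three-row Staff table is invented for the example, and the aliases are renamed a and s in this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table Staff (staffNo integer, managerStaffNo integer)")
# 3 reports to 2, 2 reports to 1, and 1 has no manager
conn.executemany("insert into Staff values (?, ?)", [(3, 2), (2, 1), (1, None)])

rows = conn.execute("""
    with recursive AllManagers(staffNo, managerStaffNo) as (
        select staffNo, managerStaffNo from Staff
        union
        select a.staffNo, s.managerStaffNo
        from AllManagers a, Staff s
        where a.managerStaffNo = s.staffNo
    )
    select staffNo, managerStaffNo from AllManagers
    where managerStaffNo is not null
    order by staffNo, managerStaffNo
""").fetchall()
print(rows)  # [(2, 1), (3, 1), (3, 2)]
```

The first select supplies the direct managers and the second repeatedly climbs one level, so staff member 3 ends up paired with both 2 and 1. The final filter drops the rows produced by climbing past the top of the hierarchy.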
53Objects in Oracle
- The object option in Oracle8 provides, among other things
- user-defined data types
- the use of objects directly by use of the ref keyword
- collection types including variable length arrays
- multimedia data types
54User Defined Types
- UDTs have a name, attributes and methods
- create type Person as object
- ( name varchar2(30),
- address varchar2(40),
- member function getName return varchar2(30) )
- Constructor methods - as usual
- Comparison methods - to help order objects
- General methods
55Ref Types
- Attributes with object types have their domains declared using ref
- create type Person as OBJECT
- ( name VARCHAR2(30),
- spouse ref Person )
- For an object P of type Person, you can then do
- P.spouse.name // to get the name of P's spouse
56Collection Types
- There are two collection types
- Arrays (called VARRAYs)
- create type Prices as varray(10) of number(1,2)
- Tables (called nested tables)
- create type PersonTable as table of Person
- Now we can have columns whose domains are either
of the above
57Object Views
- An object view is a virtual table of objects
- useful to evolve relational applications into object applications
- create table Person (NINum varchar2(9),
- Name varchar2(30), Age number)
- create view OldView with object oid (NINum) as
- select NINum, Name, Age from Person
- where Age > 40
- Update through views permitted where sensible
58Comparing ORDBs and OODBs
- ORDBs are better for
- integrating a pre-existing RDB
- traditional DBMS facilities (security, recovery etc.)
- OODBs are better for
- advanced transactions, navigational queries
- schema evolution
- integrating a programming language
59Lecture 3 - Orthogonal Persistence
- Why Orthogonal Persistence is important
- What Orthogonal Persistence is
- Principles of Orthogonal Persistence
- How to achieve Orthogonal Persistence
- Examples of Persistence Mechanisms
60The Problem
- Traditional data intensive programming requires programmers to be distracted trying to arrange storage for the data
- Fortran programs - files
- Cobol - Network Databases
- C, etc. - Relations
- This distraction slows productivity
61Too Many Mappings!
62Defining Persistence
- Persistence is the length of time for which a piece of data (including program) continues to exist.
- from until the end of the block it was declared in
- to outliving the program which constructed it
- Most systems provide different persistence mechanisms for different data.
- Often systems only permit some data long term persistence - e.g. JOS (Java Object Serialization).
63Orthogonal Persistence
- is the automatic management of data so that it may
- outlive an individual program execution
- automatically moving to and from backing store
- be used concurrently by more than one program
- not just storing a heap image - e.g. LISP, SmallTalk
- dynamic binding of names and types
- be used by successive program versions
- requires an evolution mechanism
64Principles of OP
- Data of any type (including multimedia and code fragments) should have an equal right to all levels of persistence
- All of the data is stored completely
- The data retains its structure when stored
- The code is the same whatever the persistence of
its data
65Why is this Important?
- Every departure from these rules creates an irregularity that the programmer has to work around
- data types which cannot be stored in the same way as everything else
- rebuilding incomplete structures
- dealing with referential integrity problems
- different code for transient and persistent data
66Other Benefits of OP
- Only one persistence technique to learn
- Avoids extra code which obscures the application logic
- Permits code re-use
- But how does the programmer assign a persistence level variously to the data?
- Any data can persist but for this application which should?
67Mechanisms for Indicating Persistence
- Explicit write statements - not in the spirit of OP
- Persistence indicated by class or type
- ODMG supports this
- The E language had "Shadow" classes - one for each real class
- Persistence indicated at object declaration or at object creation
- some OODBs do this
- Persistence by reachability
- this will be our favourite, you'll see!
68Persistent Class Examples
- Classes declared to be persistent
- persistent class Person
- early ODMG proposal
- class Person implements Serializable
- Java - native code can't play
- class Person : public d_Object
- ODMG proposal for C++
69Persistent Object Examples
- persistent Person P
- Person P = new Person(MyDB)
- Person P is created in the database
- Person Q = new Person( P )
- Person Q is created in the database "near to" Person P.
70Persistence by Reachability
- Some objects are explicitly stored - persistent roots
- Any other object which is pointed to by a root is automatically stored as well
- Objects pointed to by those objects are also stored
- in fact, the transitive closure of references from the roots is stored
- This is similar to Garbage Collection
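The transitive closure of references from a set of roots can be computed with the same kind of traversal a garbage collector uses for marking. The Node class and the object names below are hypothetical, and the sketch only decides which objects would be stored; it does not write them anywhere.

```python
class Node:
    def __init__(self, name, *children):
        self.name = name
        self.children = list(children)

def reachable(roots):
    """Return the names of all objects in the transitive closure of the roots."""
    seen = {}                  # id(object) -> object, so shared or cyclic
    stack = list(roots)        # structures are visited only once
    while stack:
        obj = stack.pop()
        if id(obj) in seen:
            continue
        seen[id(obj)] = obj
        stack.extend(obj.children)
    return {o.name for o in seen.values()}

leaf1, leaf2 = Node("leaf1"), Node("leaf2")
tree = Node("root", Node("inner", leaf1), leaf2)
orphan = Node("orphan")        # not referenced from any root

stored = reachable([tree])
print(sorted(stored))  # ['inner', 'leaf1', 'leaf2', 'root']
```

Only the root is named explicitly; the inner node and leaves are stored because they are reachable, and the orphan is not stored because nothing reachable points to it.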
71Example of Reachability
[Figure: a tree in memory and the database. Explicit storage of the tree root drags the rest of the tree in as well.]
72Using Persistence by Reachability
- The data must be organised around the idea of persistent roots and their transitive closures
- Note this is not new
- An RDB has each relation as a root whose transitive closure is the set of records
- ORDB and OODB databases can be organised the same way
- Except other structures may now be used - e.g. a tree
73History of OP
- 1978 - Identified by Atkinson
- 1978 - 1982 - Search for a suitable language
- 1983 - 1988 - PS-algol
- 1988 - 1995 - Napier88
- 1985 - present, ideas gradually appear in commercial systems
- 1995 - 2000 - PJama, Persistent Java
74What the Research has Entailed
- Identification of language with suitable properties
- regularity, popularity
- Identification of necessary techniques
- store organisation, memory management, organising the movement of data
- Implementation of those techniques efficiently
75Lecture 4
- Persistent Programming Languages
- What is a suitable language?
- Some examples
- Object Oriented Database Systems
- Features
- Examples
- The Object Data Management Group Standard
76A Suitable Language to Make Persistent
- A persistent programming language is one which accords with the principles of persistence (slide 64)
- In building a persistent language other aspects of a language are desirable
- regularity and small number of constructs
- since irregularities and more constructs increase the number of aspects that the persistence layer must cope with
77PS-algol
- This added persistence to S-algol, a simple and regular form of algol, at St. Andrews
- complex object structure, but object domains were all of the same type
- procedures are first-class objects, which means an object can have a piece of code as a component, there are variables which hold procedures, etc.
- databases as objects in which you can enter name, value pairs to be persistent roots
- persistence by reachability from those
- anything can persist
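The idea of a database as a set of name, value pairs acting as persistent roots can be imitated with Python's pickle module. This is only an analogy, not how PS-algol worked internally: the byte string stands in for the persistent store, and the record layout is invented for the example.

```python
import pickle

# A "database" here is just a dictionary of named persistent roots
db = {}

jane = {"name": "jane"}
john = {"name": "john", "spouse": jane}
jane["spouse"] = john          # cyclic reference between the two records

db["boss"] = john              # only john is entered as a named root

image = pickle.dumps(db)       # stands in for writing to the persistent store
restored = pickle.loads(image) # stands in for a later program run

boss = restored["boss"]
print(boss["spouse"]["name"])  # jane
```

Jane was never entered into the database by name, but she persists because she is reachable from the root, and the cycle between the two records survives the round trip.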
78Napier88
- More powerful version of PS-algol developed at St Andrews and Glasgow
- complex objects but now the domains are typed
- single procedure to return the persistent store as the sole persistent root object
- databases inserted immediately below this
- abstract data types and other type constructors
- hyper-programming allows programming directly against the database
- image data type
79Persistent Java
- PJama was developed in Glasgow from 1995 onwards
- Allows Java objects to be bound into the persistent store and retrieved
- Much more on this in subsequent lectures
80Object Oriented Databases
- An Object Oriented Database has the following features
- Objects can persist
- Object identifiers and references
- Encapsulation of data and methods
- Inheritance
- Dynamic binding of code to data
81Example
82Problems with OODBs
- They are hard to implement
- Adding concurrency, distribution, efficiency, reliability and querying to an OO system is difficult
- They use different persistence mechanisms
- They use different OO models
- and different OO languages
- They have been produced by small, unstable
companies
83Differences in Object Models
- Are scalars objects?
- Can properties be public?
- if not how is the optimiser going to work?
- Are there other information hiding controls? - e.g. friends
- Multiple or single inheritance
- What can be made persistent and how?
84History of OODBMs
- First products in the field use Smalltalk/Own Language
- 1986/7 - GemStone and Vbase
- Big companies toy with the idea
- 1987 - DEC (Trellis/Owl) and Hewlett-Packard (IRIS)
- C++ Products in Late Eighties
- Ontos, Versant, Objectivity, ObjectStore
- Other models 1990 onwards
- O2, POET, UniSQL, Jasmine, etc.
85Gemstone/J
- Started as persistent Smalltalk
- Switch now to Java
- Distributed Java Beans and EJBs
- Servlets and JSP
- CORBA
- etc.
- OQL, Transactions, etc.
86Jasmine
- From Computer Associates (INGRES RDB)
- Studio for application development
- Java
- Multimedia classes
- Authoring tools
- Web development facilities
87POET
- Java and C++
- OQL
- Targeted at small applications
- Transactions and Locking
- Schema versions
- Event Notification
- Security and Authorisation
- Object factory - putting objects into RDBs
88The Object Data Management Group (ODMG)
- Set up by Rick Cattell at Sun and the main OODB vendors
- voting members - Sun, POET, Objectivity, Excelon
- reviewer members - CERN, Versant, CA, NEC and Micro Data Base Systems
- academic members
- membership always changing!
89What are the ODMG Doing?
- an architecture for OODBMS
- a logical data model expressed as a class hierarchy
- a data definition language, ODL
- a data interchange format, OIF
- a query language, OQL
- a number of Object Manipulation Languages (OMLs)
- bindings to Java, C++ and SmallTalk
90ODMG - OO Features Appropriate for Databases
- Special Treatment of Literal Values
- A DB cannot afford to make an integer an object
- Separate Provision for Relationships
- Most OO models are not very good at relationships
- ODMG provides for automatically maintained relationships - i.e. when one side changes so does the other
- Domain Types - date and time domains
- Objects for Database Management
- databases, transactions, locks, sessions, schemata
- Metadata Management
91The ODMG Data Model
- The data model is defined in terms of a number of types which include
- Interfaces - describe the abstract behaviour of objects
- Classes - describe the abstract behaviour and state of objects
- Collections - sets, bags, lists, arrays, dictionaries
- Constructed Types - enumerations, structures and unions
- Objects (with identity) and Literals (no identity)
92The Type Hierarchy
93Example (ODL)
- struct Address { int house; String road; ... }
- defines a complex literal (not an object)
- interface Person { String name; int age; ... }
- defines an uninstantiable object structure
- class Employee : Person { int StaffNo; Dept d; ... }
- defines an instantiable object structure
- ":" is inheritance which can be multiple
94Relationships
- Attributes and relationships are distinguished
- class Employee : Person {
- attribute int StaffNo;
- relationship Dept d inverse Dept::Employees; ... }
- class Dept {
- relationship set<Employee> Employees inverse Employee::d; ... }
- Relationships can have automatically maintained inverses
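What an automatically maintained inverse means can be sketched in ordinary object code. The class and attribute names follow the slide, but the property-based mechanism is an assumption for illustration; an ODMG system would generate the equivalent from the ODL declaration.

```python
class Dept:
    def __init__(self, name):
        self.name = name
        self.employees = set()     # inverse side, Dept::Employees

class Employee:
    def __init__(self, staff_no):
        self.staff_no = staff_no
        self._d = None

    @property
    def d(self):
        return self._d

    @d.setter
    def d(self, dept):
        # changing one side of the relationship updates the other side too
        if self._d is not None:
            self._d.employees.discard(self)
        self._d = dept
        if dept is not None:
            dept.employees.add(self)

sales, research = Dept("Sales"), Dept("Research")
e = Employee(42)
e.d = sales
e.d = research                     # moving e also removes it from sales

print(e in sales.employees, e in research.employees)  # False True
```

This sketch maintains the inverse only when the change is made from the Employee side; a full implementation would also intercept additions and removals made on the Dept side.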
95Extents
- The extent of a type is the set of instances of that type in the database
- The extent of a subtype is a subset of the extent of the supertype
- The DB designer can request that the extent of a class is maintained automatically
- A particular implementation may include indexes and keys
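An automatically maintained extent can be sketched by having a common base class record every instance it creates. The Persistent base class and the extent method are hypothetical names, and the registry is in memory only.

```python
class Persistent:
    _instances = []                # every object created, in creation order

    def __init__(self):
        Persistent._instances.append(self)

    @classmethod
    def extent(cls):
        # the extent of a class is every instance of it,
        # including instances of its subclasses
        return [o for o in Persistent._instances if isinstance(o, cls)]

class Person(Persistent):
    def __init__(self, name):
        super().__init__()
        self.name = name

class Employee(Person):
    pass

alice, bob = Person("alice"), Employee("bob")
person_names = [p.name for p in Person.extent()]
employee_names = [p.name for p in Employee.extent()]
print(person_names, employee_names)  # ['alice', 'bob'] ['bob']
```

The extent of Employee is a subset of the extent of Person, as the slide states. Note that this registry keeps every object alive indefinitely, which a real system would have to reconcile with deletion and garbage collection.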