Query Optimization in Object Databases

About This Presentation

Title:

Query Optimization in Object Databases

Description:

Query Optimizers transform query in query plans composed of low level operations ... Side effect introduced through cursors. 20. G. Gardarin. The OFL Language ... – PowerPoint PPT presentation

Number of Views:106

Avg rating:3.0/5.0

Slides: 86

Provided by: georgesg8

Category:

more less

Transcript and Presenter's Notes

Title: Query Optimization in Object Databases

1
Query Optimization in Object Databases

Georges GARDARIN

Laboratoire PRiSM/UVSQ
G. Gardarin
2
1. Introduction

Object models provide ADTs, inheritance, complex
structures, relationships and object identity
Query Optimizers transform query in query plans
composed of low level operations to evaluate on
the object collections
New techniques are required for supporting the
object-oriented features

3
Outline

Object Query Languages
Complex Object Algebra
Operator Algorithms
Query Plan Transformations
Cost Models
Search Strategies
Open Problems

4
Overview

Presentation of the various topics of Query
Processing in OODBMSs
Topics are not independent
Operators depend on data structures (index)
Search strategies depend on cost model
An optimizer has to consider all aspects
Complex piece of software
Has to be extensible for additional features
data types
access methods
operators

5
Vocabulary

Collection a set, list, array or bag of objects
Query a user query in high level language
Predicate a term of a query criteria
Qualification a logical expression of
predicates
Operator a low level accessor to 1 or several
collections
Annotation the selected algorithm for executing
an operator
Query plan a program of annotated operator
Cluster a group of related objects stored
together in a bucket
Index an accelerator by value of an attribute
Path index an accelerator by values along a path

6
2. Object Query Languages

Extension of SQL with
user defined functions in predicates and results
user defined comparison predicates
path expressions to traverse relationships
flattening, grouping and degrouping operators
automatic scan of inheritance hierarchies
Two standards are under construction
The object standard of ODMG (OQL)
The object-relational standard of ISO/ANSI (SQL3)

7
Database Example (1)
Vehicle
Maker
Number
Color
Company
String
String
President
Name
City
Employee
String
String
Ssn
BirthDate
Name
Float
String
8
Query Example

Object identity
SELECT E.Name, C.Name
FROM Employee E, Company C
WHERE C.President E
Paths and method
SELECT Number
FROM Vehicle
WHERE Color "Red"
AND Vehicle.Maker.City "Paris"
AND Vehicle.Maker.President.age() lt 50

9
Database Example (2)
Company
Employs
Name
City
Person
Owns
Name
Age
Vehicle
Number
Power
10
Qualified Path Expression

OQL form
SELECT C.Name, P.Name, V.Number
FROM C IN Companies, P IN C.Employs, V IN P.Owns
WHERE C.City"Paris" AND P.Agelt30 AND V.Powgt10
Direct form
SELECT C.Name, P.Name, V.Number
FROM Companies C, Persons P, Vehicles V
WHERE
CCity"Paris".Employs.PAgelt30.Owns.VPowgt10

11
Exercice Queries

Express in OQL, then with qualified path
expressions, a set of given queries.

12
3. Complex Object Algebra

Generalization of relational algebra
Set-oriented processing of objects
A set of operations on collections of objects
generating collections of objects
Different types of collections
class extent, set, bag, list, array
Any query can be expressed as a complex object
algebra expression
Logical algebra annotated for execution

13
The LORA Algebra

Finance and Gardarin 1991

LoraOp
SearchOp
UpdateOp
TransactOp
GroupOp
Filter Map
SetOp
Join
Sort
RJoin
VJoin
- RemoveDup - Aggregate - Nest - Unnest
- Union - Intersect - Difference
14
Main Operator Signatures

JOINCol,Exp,Col gtCol
OUTER_JOIN Col,Exp,Col gt Col
SORT Col, Exp gt Col
AGG Col, Exp, Exp gt Col
NEST Col, Nest_Exp gt Col
UNNEST Col, Nest_exp gtCol
RDUPLICATE Col, Exp gt Col
FILTER Col, Qual gtCol
MAP Col, Exp,Qual gt Col

MINUS Col, Col gt Col
DIVIDE Col, Col gtCol
UNION Col, Col gt Col
OUTER_UNION Col, Col gt Col
INTERSECT Col, Colgt Col
UPDATE Col, Col, Ident, Assignement gtCol
INSERT Col, Col gt Col
DELETE Col, Col, Ident gt Col

15
Algebraic Tree
SELECT Number FROM Vehicle WHERE Color
"Red" AND Vehicle.Maker.City "Paris" AND
Vehicle.Maker.President.age() lt 50
Filter(,Number)
RJoin(Maker)
Filter(City"Paris",)
RJoin(President)
Filter(Color"Red")
Filter(age()lt50,)
Vehicle
Company
Employee
16
The ENCORE Algebra (1)

Shaw and Zdonik 1990
Select(InputCollection, p)
s?(s in InputCollection) ? p(s)
Image(InputCollection, f T)
f(s)?s in InputCollection
Project(InputCollection,lt(A1, f1), ...,(An, fn)gt)
ltA1 f1(s), ...,An fn(s)gt?(s in
InputCollection)

17
The ENCORE Algebra (2)

Nest(InputCollection,Ai)
ltA1 s.A1, ...,Ai t, ...,An s.Angt??r ?s
(r in t ? s in InputCollection ? s.Ai r)
UnNest(InputCollection,Ai)
ltA1 s.A1, ...,Ai t, ...,An s.Angt?s in
InputCollection???t in Ai
Flatten(InputCollection)
r??t in InputCollection ? r in t
DupEliminate (InputCollection)
Coalesce (InputCollection, Ai)

18
The ENCORE Algebra (3)

OJoin(InputCollection1,InputCollection2,A1,A2,p)
ltA1 s, A2 rgt?s in InputCollection1 ? r in
InputCollection2 ? p(s,r)
Set-oriented operations
Union
Intersection
Difference
with set membership based on object identity

19
The OFL Operators

Gardarin Machucca 1995
Navigational traversal often interesting
Existential quantification
Better control of query plans, smaller
granularity
Mixing navigational and set-oriented traversal
Based on Bachus functional approach
Processing of collections of objects
Side effect introduced through cursors

20
The OFL Language

Definition Abstract Collection
A container of objects encapsulated by a finite
set of behavioral and traversal functions.
Constructions
Composition f.g (x) f(g(x))
Path expressions f0.f1....fn(x)
Conditional If_Then_Else (p, f1, f2) (x)
Iteration While (p,f)
Sequence Sequence(f1, f2, , fn)

21
Collection Traversal in OFL

Quantified function ????Apply to all
A second order function of signature ForAll(C, p,
f) that applies a function f to all objects of a
collection C satisfying a predicate p.
Quantified function ?????Apply to any
A second order function of signature ForAny(C, p,
f) that applies a function f to any object of a
collection C satisfying a predicate p.
Iterator and Annotations
Each quantified function works on an iterator
Set-oriented or navigational traversal is selected

22
Database Example
23
Translating Query in OFL

SELECT p.lastname
FROM p in Person
WHERE exists v in p.owner v.color "Red"
ForAll(Person P, null,
ForAny(Owner(P) V, StringEqual(Color(V),
"Red"), LastName(P) ) )

24
A More Complex Query

OQL Query
SELECT tuple(p.lastname, v.price, c.partlabel)
FROM p in Person, v in p.owner, c in v.composed
WHERE p.age 16 and v.pricec.price
OFL translation
ForAll(Find(AgeIndex,16) P, null,
ForAll(Owner(P) V, null,
ForAll(Composed(V)T,IntegerEqual(Price(V),Price(T
)),
Tuple (LastName(P), Price(V), PartLabel(T))))))

25
Further Operators

Recursive operators
FixPoint(ResultCollection, InitializationExpressio
n,RecursivePredicate, RecursiveExpression,
FinalExpression)
gives the OFL program
Sequence(OFLInitializationExpression,
While(OFLRecursivePredicate,
OFLRecursiveExpression),
OFLFinalExpression)

26
Exercice Algebra

Write in OFL the definition of LORA operations
exemple
Join(InputCollection1,InputCollection2,ResultColle
ction, JoinPredicate, ProjectionExpression)
gives the OFL program
ForAll(InputCollection1,null,
ForAll(InputCollection2, OFLJoinPredicate,
InsertResultCollection(ResultCollection,
OFLProjectionExpression))))

27
4. Algebraic Operator Algorithms

Classical relational operators still valid ...
Filtering with a predicate (Restriction)
Sequential scan
Index scan, clustered or non-clustered
Value-based join
Nested loop join
iterate on the outer collection and compare each
outer object with each object in the inner
collection
Merge join
sort on join fields the two collections and then
merge
Hash join
hash the outer collection on join fields, scan
the inner table and probe the hashed collection

28
Path Traversals

Paths may involve multiple collections
Each collection can be qualified by predicates

29
Depth-First-Fetch

Depth-First-Fetch (DFF) is the natural algorithm
for evaluating a path expression.
It follows the path from the root to the target
collection, using a depth first graph traversal
algorithm.
The corresponding operator is an n-ary operator
denoted DFF.
Advantages
no intermediate results, simple pointer chasing
result are assembled one at a time allowing
pipeline
efficient when the memory size is large enough to
avoid swapping of objects

30
Breadth-First-Fetch

Breadth-First-Fetch (BFF) traversal processes the
tree of objects using a Forward Join (FJ)
algorithm which is based on pointer chasing
between two collections.
Successive binary joins of collections are
performed from the source collection to the
target, following the path in a forward order.
Advantages
no multiple fetch of objects
requires the construction of hashed support table
to memorize FJ results

31
Reverse-Breadth-First-Fetch

Reverse-Breadth-First-Fetch (RBFF) performs a
sequence of binary joins between two neighbor
collections to traverse the path, but it proceeds
in the reverse order of the path.
Thus, each join is called a Reverse Join (RJ).
The join criterion is the member-ship of the
second collection object identifier to the first
collection pointer attribute values.
Advantages
efficient when predicate(s) in last collection(s)
selective
requires supporting tables and value-based joins

32
Illustration of BFF RBFF
33
Further Algorithms

Various algorithms are available for each
operator
Combination of operators can be applied
to traverse long paths
to derive new algorithms
hash both sort buckets merge buckets
limited breadth-first-fetch
Cost is dependent of many factors
physical organization of objects
size of collections
selectivity of predicates
available memory size
possible degree of parallelism

34
Exercice Algorithms

List all the possible annotated query plans to
process the query
SELECT C.Name, P.Name, V.Number
FROM C IN Companies, P IN C.Employs, V IN P.Owns
WHERE C.City"Paris" AND P.Agelt30 AND V.Powgt10
over the database schema

Company
Employs
Name
City
Person
Owns
String
String
Name
Age
Vehicle
String
Int
Number
Power
Int
Int
35
5. Query Plan Transformations

Query rewrite Algebraic rewrite of query tree
semantic transformations based on properties of
data types and integrity constraints
syntactic transformation based on properties of
operators
Query planning Selection of best algorithms
annotation of logical operators with selected
algorithms
cost of an annotated algorithm often dependent of
result of previous algorithm
e.g., no sort needed if result sorted
Query rewrite and query planning are not
independent

36
Extensible Optimizers

Closed Optimizer
set of operators and transformations fixed
heuristic-based or cost-based selection of plans
efficient but hard to modify and extend
e.g., Oracle 7.3, SQL Server 10, ...
Extensible Optimizer
extensible set of operators and transformations
rule-based generation of query plans
selection of "best" plan using a search strategy
e.g., Exodus, Starbust and DB2 CS, Esprit EDS
IDEA, Illustra, ...

37
Rewrite Rule Base
From Gardarin, Finance DKE 93
Query Plan

Common Expression Detection
Syntactic
Optimization
Semantic
Optimization
Predicate
Simplification

Cost model Heuristics
Modular Rule Base
STRATEGY
Optimized Query Plan
38
Syntactic Rewrite Rules (1)

Restrict through Union Pushing Rule
Restrict(Union(C1,C2)) ltgt
Union(Restrict(C1,C2))
Restrictions through Super Class Pushing Rule
Restrict (Super(C1,C2)) ltgt Super(Restrict(C1),Re
strict(C2))

39
Syntactic Rewrite Rules (2)

Join Commutativity
Join (C1,C2) ltgt
Join (C2,C1)
Join Associativity
(C1 Join C2) Join C3) ltgt
C1 Join (C2 Join C3)
Restrict through Join Pushing Rule
Restrict(Join(C1,C2)) ltgt
Join(Restrict(C1),Restrict(C2))

40
Planning Rules

Join method choice
JoinNL (C1,C2) ltgt JoinSM (C1,C2)
JoinHP (C1,C2) ltgt JoinSM (C1,C2)
Depth First Fetch introduction
DFF(C1,C2,C3) ltgt Join(C1,Join(C2,C3))
Index Scan introduction
Scan(C1,P) ltgt Scan(IScan(C1,I(P)),PI(P))

41
Semantic Rules

Integrity constraints
Type(x) Square ltgt
Type(x) Polygon and large(x) long(x)
User function properties
draw(xy) draw(x) draw(y)

42
What Rule Language ?

Rules are often complex to express
Conditions on qualifications, operators, results,
...
Proposed rule languages
C rewriting procedure Exodus, Starbust
if ltC proceduregt is true then ltC proceduregt
Practical but hard to extend optimizer
Side effective rule language Finance91
WHEN ltQuery Expressiongt IF ltConditiongt
THEN ltQuery Expression'gt UNDER ltActiongt
Complex to implement for pattern matching
OQL Query equivalence Florescu95
Parametrized Query Parametrized Query
Lack of generality (e.g., query planning not
possible)

43
Choice of Best Query Plan
Algebraic Tree
Database Schema
Query Plans
Transformation Rule base
Query Plan Generator
Search Strategy
Cost Model
action ? cost ? float goak
??boolean
"Best" Query Plan
44
Exercice Rules

Given a linear path expression from collection 1
to i, determine the number of distinct query
plans to process the query, assuming that 3
algorithms are available to process any path
expression (DFF, BFF, RBFF)
Give the rule base to generate those plans

........
Ci
C1
Ci-1
C2
45
6. Cost Models

Extension of relational cost model to handle
Object identifiers
Path indexes
Object linking and embedding
Clustering
Takes into account CPU cost and I/O cost
CPU cost ? Number of examined objects
I/O cost ? Number of pages read

46
Collection Parameters

C number of pages of collection C
C number of objects of collection C
Ci number of pages of cluster i of collection
C
Ci number of objects of cluster i of
collection C
SC average object size in collection C
SProj average size of projection result
M available memory size
Sel(Qual) selectivity of qualification Qual
Sel(Pred) selectivity of indexed predicate Pred

47
I/O Scan Formulas

Sequential scan
I/O cost I/OScan I/OResult
I/OScan C
I/OResult Sel(Qual)CSProj/SC - M if gt 0 else
0
Unclustered index scan
I/O cost I/OIndex I/OHitI/OResult
I/OIndex Blevel(I)
I/OHit Yao(C,C,Sel(Pred)C)
Clustered index scan
I/O cost I/OIndex I/OHitI/OResult
I/OHit Sel(Pred)C

48
Object Clustering

Clustering par classe
Regroupement de toutes les instances d'une même
classe dans un même fichier
Clustering par composition
Regroupement d' un objet d'une classe avec un ou
plusieurs de ses objets composants.
Placement adapté aux parcours de chemin
Clustering aléatoire
les objets sont placés dans l'ordre de leur
création, dans un espace unique.

49
Clustered Collection Cases
Cluster objects on disk Reduce the
number of IOs
(Placement trees are represented by directed
graphs)

Default clustering
Simple clustering

COMPANY
50
More Clustering Cases

Conjunctive clustering
Disjunctive clustering

COMPANY
PROPOSAL
5
10
COMMAND
51
Clustering Statistics on Partitions
How many pages will have to be loaded to scan the
collection A ?
IIAII Cardinality IAI Number of disk blocks

Can be maintained by the system
Can be evaluated

SA Average Size of Object Sp Available Page
Size DA,B Average Number of distinct
references .....
52
Clustering Statistics Example

ICOMPANYI

IPRODUCTI
53
Yao Number of Clustered Block Hits

Yao function Yao( IICII , ICI , k number of
selected objects )
returns the number of block hits
Yao' function sum Yao functions applied on each
involved cluster
Given a clustered collection C and p the number
of partitions to be scanned, we have
where ki is the number of objects to be selected
in cluster i

54
Yao Example

x in Companies, x.assetgt100 000 kF
x in Companies, x.assetgt100 000 kF and
x.product.yearlt1980
x in Companies, x.assetgt100 000 kF and
x.product.yearlt1980
and x.product.command.Nlt1000

Yao( ) Yao( ) Yao( ) Yao(
)
Yao( ) Yao( )
Yao( )
55
I/O Join Formulas

Nested loops
C1 C1C2
Merge join
cost(sort(C1)) cost(sort(C2))
cost(merge(C1,C2)) cost(Result)
?C1logC1 ?C2logC2
C1C2 ...
Hash join
cost(hash(C1)) cost(scan(C2))
C2cost(probe(C1)) cost(Result)
C1 C2 ...

56
Parameters for Links

fanC1,C2 average number of references from a C1
object to a C2 object
DC1,C2 number of distinct references from a C1
object to a C2 object
XC1,C2 number of C1 objects having no reference
to C2 object
ZC1,C2 average number of distinct references
from C1 objects having at least one reference to
C2 object
ZC1,C2 DC1,C2 C1 / (C1- XC1,C2 )

57
I/O Path Traversal Formulas

Cost of DFF Gardarin, Gruser, Tang 96
Large memory (no swap)
Small memory (worst case)
DFF is efficient with large memory

58
Exercice Cost Model

Compare the I/O costs of DFF, BFF and RBFF
Discuss the advantage of each of them according
to memory size and predicate selectivity

59
7. Search Strategies
from Lanzelotte 1992
Search
Strategy
Randomized
Enumerative
Genetic Algorithm
Iterative Improvement
Exhaustive Search
Augmentation Heuristic
Tabu Search
Simulated Annealing
Mixte (2 phases)
60
Exhaustive Search

Function Exhaustive(Query)
p Parse(Query) // Set the initial plan
S p // S is the set of all investigated
plans
while not StopCond()
p' Transform (p) // Apply a
transformation rule
if p' ? S then
? p p'
Insert (S, p') // Maintain the set of
investigated plans
return Optimal(S) // Select best plan among
all generated plans
?

61
Illustration of ES
Parse(Query)
r2
r1
r3
r4
r5
r6
r7
r9
r8
r10
SELECT MINIMAL COST PLAN
Up to StopCond (Exhausted time)
r11
62
Classical Ameliorations

Reduce search space and control rule selection
Select profitable/best move at each step
Introduce a gain estimator for each rule
Apply only rules with best estimators
Avoids loops and applying rules in both
directions
Such approaches find only a local minimum
Risk of fall into a hole
Iterative improvement minimize risk

63
Iterative Improvement Scheme

II randomly chooses an initial processing tree.
It then accepts only those downhill moves.
This is called local optimization.
When the local condition is reached, II picks up
a new random state, and performs local
optimization from that state.
The process is repeated until a stopping
condition is met.
The global minimum is the best local minimum
found till now.

64
Iterative Improvement Procedure

Procedure II()
p Initialize() // set an initial state, i.e.,
pick a random PT for evaluating the query
OptimalPlan p // Initialize optimal plan
while not(stopping_condition) do // Loop for
global optimization (on various initial states)
while not(local_condition) do // Loop for
local optimization
p move(p) // Apply a valid transformation
to p
if (Cost(p)ltCost(p)) then p p // Keep
plan if less costly
if cost(p)ltcost(OptimalPlan) then OptimalPlan
p // Select optimal plan
p RandomPlan // Move to next random
selected plan
Return(OptimalPlan)

65
Illustration of I.I.
Parse(Query)
Rand(Parse(Query))
Rand(Rand((Parse(Query)))
Profitable r'1
Profitable r"1
Profitable r1
Profitable r"2
Profitable r2
SELECT MINIMAL COST PLAN
66
Simulated Annealing Scheme

SA also starts at a random processing tree and
generates the next state by applying a
transformation rule on the current processing
tree.
Differently from II, SA accepts both downhill and
uphill moves. Uphill moves are allowed with the
probability
The parameter temperature decreases when the
inner block reaches an equilibrium point. Thus
the uphill moves are being accepted with less and
less probability.
When a stopping condition is satisfied, the best
traversed plan is selected as optimal.

67
Simulated Annealing Procedure

Procedure SA()
p Initialize() // set an initial state, i.e.,
pick a random PT for evaluating the query
OptimalPlan p // Initialize optimal plan
TT0 // Initialize temperature
while not(stopping_condition) do // Loop for
global optimization (on various initial states)
while not(equilibrium) do // Loop for local
optimization
p move(p) // Apply a valid transformation
to p
deltacost(p)-cost(p) // Compute
differential cost
if (deltalt0) then p p // If cost reduced
pick new plan
if (deltagt0) then p p with probability
e-delta/T // If cost increased, accept if hot
if cost(p)ltcost(OptimalPlan) then OptimalPlan
p // Maintain optimal plan
Treduce(T) // Reduce temperature
Return(OptimalPlan)

68
Illustration of S.A.
Cost
Parse(Query)
Profitable r1
Non Profit. r2
Profitable r3
Profitable r4
Profitable r6
Non Profit. r5
SELECTED PLAN
Moves
69
Tabu Search Scheme

TS is a general meta-heuristic procedure for
global optimization, which performs an aggressive
exploration of the state space ( best possible
move, with restriction list)
TS starts from a randomly generated initial
state, and repeatedly performs moves from a state
to a neighbor one.
At each iteration the procedure generates a
subset V of the set N(S) of the neighbors of the
current state S and select the best.
The subset does not contain any state which is
recorded in the Tabu list. This avoids the
cycling or at least reduces its probability. The
Tabu list is updated each time the current state
is updated. This forbids moves which should bring
back to a previous explored state.

70
Tabu Search Procedure

Procedure TS()
p Initialize() // set an initial state, i.e.,
pick a random PT for evaluating the query
OptimalPlan p // Initialize optimal plan
T // initialize Tabu list
while not(stopping condition) do // global
loop
generate the set VN(S)-T by applying move(S)
// All move accepted except tabu
choose the best solution p V // Pick best
move
T (T-(oldest)) ??p // Update the tabu list
by removing oldest plans and adding picked
if cost(p)ltcost(OptimalPlan) then OptimalPlan
p // Maintain optimal plan
return(OptimalPlan)

71
Comparison of Strategies
Cost of Best Plan
II is the best with good random sampling Tabu
looks attractive
Tabu
SA
Tabu
II Join exchange
II Swap

200 400 600 800 1000 1200 1400

Number of moves
72
Genetic Algorithm

Genetic Algorithm (GA) is a non-gradient
optimization algorithm used for the search of
local extremes (minimum or maximum) of functions
with many variables and functional extremes.
These functions are usually defined on very
complex and discrete domains.
The basic idea of GA is to use principles of
evolution of organisms in nature.
Instead of working on one particular solution at
a time, it considers a population of solutions.

73
GA Principle
Initialisation
Mutation
Crossover
Evaluation
Sort
Selection
Terminate
No
Yes
74
GA Phases

Initialization - randomly generate an initial
small population of solutions (i.e., processing
trees) from the whole search space.
Mutation - choose one solution (i.e., processing
tree) from the population, and apply
transformation rules to it.
Crossover - randomly choose two solutions from
the population, and exchange their common
subtrees in order to generate two new processing
trees.
Evaluation - for each solution, evaluate the
value of its fitness function (i.e., cost
function),
Sort - sort all solutions according to their cost
values.
Selection - choose certain number of the best
solutions from the result of Sort as the parents
of the next generation.
Termination - check termination criteria for
stopping the optimization.

75
Gene Base for 5 collections
76
Mutation Operator
77
Crossover operator
78
Improved GA
79
GA Procedure

Procedure GA()
Generate the initial population PopuBasePopu
// Initialize the base population at random
Sort(Popu) // Sort population of PTs on
increasing cost
OptimalPlan Popu0 // Keep the best
traversed plan
While not (stopping_condition) do
Percent 0
While Percent lt Part BasePopu do // Apply
Crossover to Part of the population
p1 PopuRandom(BasePopu) // Randomly
choose p1 and p2 from Popu
p2 PopuRandom(BasePopu)
Crossover(p1, p2) // Apply Crossover if
possible
Percent Percent 2
For the rest in Popu do Mutation // Apply
Mutation for the rest of the population
For (i0 i lt NewPopu i) do
evaluate(Popui) // Compute cost for new
population
Sort(Popu) // Sort population of PTs on
increasing cost
if (Popu0 lt OptimalPlan) then OptimalPlan
Popu0 // Keep the best traversed plan
// Optional replacement of the worst elements
Percent 0
i BasePopu // Initialize for replacement
While Percent lt Repl BasePopu do // Apply
replacement to Repl of the population

G. Gardarin
80
8. Open Problems

Efficient control of rule applications
What is the best strategy (Genetic ?)
Simple but accurate rule gain estimator
(Priorities ?)
Estimation of method costs
Statistics keep average and variations at each
call
Revelation the user provide a cost estimate
attribute
Disencapsulation understand the method code
Problem late binding complexifies the
estimation
Querying bulk types
Collections may be list, array, trees or matrices
Ordered collections may requires additional
operators
Cost model should be extended to capture
aggregates

G. Gardarin
81
Exercice Strategies

Discuss the advantages and inconvenients of each
search strategy
Compare them in case of a small rule base and a
large rule base

G. Gardarin
82
Conclusion

The Query Optimizer is a key component of a DBMS
Relational techniques can be generalized
Object algebra
Cost model
Search strategies
New techniques are required for
Extensibility of data types
Optimizing path expressions
Optimizing method calls
Optimizing bulk data types
Path index maintenance and access
Not much is done on new features in OODBMS

G. Gardarin
83
For More Informations (1)

Finance, Gardarin, IEEE DE 91
LORA Algebra and EDS extensible optimizer
Finance, Gardarin, DKE 93
Rule Language for extensible optimizer
Gardarin,Gruser, Tang, VLDB95
Cost model for OODBs Scan, DFF, validation on
O2
Gardarin,Gruser, Tang, VLDB96
Analytical experimental comparisons of
DFF,BFF,RBFF
Gardarin, in Advances in OO DB Systems,
Springer94
Object rule language, optimisation of recursive
updates extension of LORA to recursion

G. Gardarin
84
For More Informations (2)

Mitchell, Zdonik, Dayal, in Advances in OO DB
Systems , Springer V. 94
Optimization of OO query languages Problems and
Approaches
Cluet, Delobel, SIGMOD 92
A General Framework for the Optimization of
Object-Oriented Queries
Kemper, Moerkotte, VLDB 90
Advance Query Processing in An Object Bases Using
Access Support Relations
Lanzelotte, Valduriez, VLDB 91
Extending the Search Strategy in a Query Optimizer

G. Gardarin
85
Path Index

Multi-index Gemstone
les chemins sont des séquences de variables
appartenant à la structure des objets.
Les index sont définis pour chaque lien du chemin
les index représentent des identifiants d'objet
ou les valeurs des variables.
Implémentés comme des B trees.
Nested index Bertino 89
une seule entrée définie pour toute la longueur
d'un chemin.
Accès au début seulement en connaissant la fin du
chemin.
Utile seulement si l'on connait parfaitement le
chemin et qu'il est utilisé souvent.
Difficile à maintenir.
Path index Bertino 89 Kemper 90
associe la fin du chemin avec tous les suffixes
du chemin
il fonctionne avec des sous-chemins
Implémenté comme des relations
chaque colonne d'un tuple correspond à un pas du
chemin
chaque champs du tuple contient un identifiant
d'objet ou une valeur.