Title: Query Optimization in Object Databases
1Query Optimization in Object Databases
Laboratoire PRiSM/UVSQ
G. Gardarin
21. Introduction
- Object models provide ADTs, inheritance, complex
structures, relationships and object identity - Query Optimizers transform query in query plans
composed of low level operations to evaluate on
the object collections - New techniques are required for supporting the
object-oriented features
3Outline
- Object Query Languages
- Complex Object Algebra
- Operator Algorithms
- Query Plan Transformations
- Cost Models
- Search Strategies
- Open Problems
4Overview
- Presentation of the various topics of Query
Processing in OODBMSs - Topics are not independent
- Operators depend on data structures (index)
- Search strategies depend on cost model
- An optimizer has to consider all aspects
- Complex piece of software
- Has to be extensible for additional features
- data types
- access methods
- operators
5Vocabulary
- Collection a set, list, array or bag of objects
- Query a user query in high level language
- Predicate a term of a query criteria
- Qualification a logical expression of
predicates - Operator a low level accessor to 1 or several
collections - Annotation the selected algorithm for executing
an operator - Query plan a program of annotated operator
- Cluster a group of related objects stored
together in a bucket - Index an accelerator by value of an attribute
- Path index an accelerator by values along a path
62. Object Query Languages
- Extension of SQL with
- user defined functions in predicates and results
- user defined comparison predicates
- path expressions to traverse relationships
- flattening, grouping and degrouping operators
- automatic scan of inheritance hierarchies
- Two standards are under construction
- The object standard of ODMG (OQL)
- The object-relational standard of ISO/ANSI (SQL3)
7Database Example (1)
Vehicle
Maker
Number
Color
Company
String
String
President
Name
City
Employee
String
String
Ssn
BirthDate
Name
Float
String
8Query Example
- Object identity
- SELECT E.Name, C.Name
- FROM Employee E, Company C
- WHERE C.President E
- Paths and method
- SELECT Number
- FROM Vehicle
- WHERE Color "Red"
- AND Vehicle.Maker.City "Paris"
- AND Vehicle.Maker.President.age() lt 50
9Database Example (2)
Company
Employs
Name
City
Person
Owns
Name
Age
Vehicle
Number
Power
10Qualified Path Expression
- OQL form
- SELECT C.Name, P.Name, V.Number
- FROM C IN Companies, P IN C.Employs, V IN P.Owns
- WHERE C.City"Paris" AND P.Agelt30 AND V.Powgt10
- Direct form
- SELECT C.Name, P.Name, V.Number
- FROM Companies C, Persons P, Vehicles V
- WHERE
- CCity"Paris".Employs.PAgelt30.Owns.VPowgt10
11Exercice Queries
- Express in OQL, then with qualified path
expressions, a set of given queries.
123. Complex Object Algebra
- Generalization of relational algebra
- Set-oriented processing of objects
- A set of operations on collections of objects
generating collections of objects - Different types of collections
- class extent, set, bag, list, array
- Any query can be expressed as a complex object
algebra expression - Logical algebra annotated for execution
13The LORA Algebra
- Finance and Gardarin 1991
LoraOp
SearchOp
UpdateOp
TransactOp
GroupOp
Filter Map
SetOp
Join
Sort
RJoin
VJoin
- RemoveDup - Aggregate - Nest - Unnest
- Union - Intersect - Difference
14Main Operator Signatures
- JOINCol,Exp,Col gtCol
- OUTER_JOIN Col,Exp,Col gt Col
- SORT Col, Exp gt Col
- AGG Col, Exp, Exp gt Col
- NEST Col, Nest_Exp gt Col
- UNNEST Col, Nest_exp gtCol
- RDUPLICATE Col, Exp gt Col
- FILTER Col, Qual gtCol
- MAP Col, Exp,Qual gt Col
- MINUS Col, Col gt Col
- DIVIDE Col, Col gtCol
- UNION Col, Col gt Col
- OUTER_UNION Col, Col gt Col
- INTERSECT Col, Colgt Col
- UPDATE Col, Col, Ident, Assignement gtCol
- INSERT Col, Col gt Col
- DELETE Col, Col, Ident gt Col
15Algebraic Tree
SELECT Number FROM Vehicle WHERE Color
"Red" AND Vehicle.Maker.City "Paris" AND
Vehicle.Maker.President.age() lt 50
Filter(,Number)
RJoin(Maker)
Filter(City"Paris",)
RJoin(President)
Filter(Color"Red")
Filter(age()lt50,)
Vehicle
Company
Employee
16The ENCORE Algebra (1)
- Shaw and Zdonik 1990
- Select(InputCollection, p)
- s?(s in InputCollection) ? p(s)
- Image(InputCollection, f T)
- f(s)?s in InputCollection
- Project(InputCollection,lt(A1, f1), ...,(An, fn)gt)
- ltA1 f1(s), ...,An fn(s)gt?(s in
InputCollection)
17The ENCORE Algebra (2)
- Nest(InputCollection,Ai)
- ltA1 s.A1, ...,Ai t, ...,An s.Angt??r ?s
(r in t ? s in InputCollection ? s.Ai r) - UnNest(InputCollection,Ai)
- ltA1 s.A1, ...,Ai t, ...,An s.Angt?s in
InputCollection???t in Ai - Flatten(InputCollection)
- r??t in InputCollection ? r in t
- DupEliminate (InputCollection)
- Coalesce (InputCollection, Ai)
18The ENCORE Algebra (3)
- OJoin(InputCollection1,InputCollection2,A1,A2,p)
- ltA1 s, A2 rgt?s in InputCollection1 ? r in
InputCollection2 ? p(s,r) - Set-oriented operations
- Union
- Intersection
- Difference
- with set membership based on object identity
19The OFL Operators
- Gardarin Machucca 1995
- Navigational traversal often interesting
- Existential quantification
- Better control of query plans, smaller
granularity - Mixing navigational and set-oriented traversal
- Based on Bachus functional approach
- Processing of collections of objects
- Side effect introduced through cursors
20The OFL Language
- Definition Abstract Collection
- A container of objects encapsulated by a finite
set of behavioral and traversal functions. - Constructions
- Composition f.g (x) f(g(x))
- Path expressions f0.f1....fn(x)
- Conditional If_Then_Else (p, f1, f2) (x)
- Iteration While (p,f)
- Sequence Sequence(f1, f2, , fn)
21Collection Traversal in OFL
- Quantified function ????Apply to all
- A second order function of signature ForAll(C, p,
f) that applies a function f to all objects of a
collection C satisfying a predicate p. - Quantified function ?????Apply to any
- A second order function of signature ForAny(C, p,
f) that applies a function f to any object of a
collection C satisfying a predicate p. - Iterator and Annotations
- Each quantified function works on an iterator
- Set-oriented or navigational traversal is selected
22Database Example
23Translating Query in OFL
- SELECT p.lastname
- FROM p in Person
- WHERE exists v in p.owner v.color "Red"
- ForAll(Person P, null,
- ForAny(Owner(P) V, StringEqual(Color(V),
"Red"), LastName(P) ) )
24A More Complex Query
- OQL Query
- SELECT tuple(p.lastname, v.price, c.partlabel)
- FROM p in Person, v in p.owner, c in v.composed
- WHERE p.age 16 and v.pricec.price
- OFL translation
- ForAll(Find(AgeIndex,16) P, null,
- ForAll(Owner(P) V, null,
- ForAll(Composed(V)T,IntegerEqual(Price(V),Price(T
)), - Tuple (LastName(P), Price(V), PartLabel(T))))))
25Further Operators
- Recursive operators
- FixPoint(ResultCollection, InitializationExpressio
n,RecursivePredicate, RecursiveExpression,
FinalExpression) - gives the OFL program
- Sequence(OFLInitializationExpression,
- While(OFLRecursivePredicate,
OFLRecursiveExpression), - OFLFinalExpression)
26Exercice Algebra
- Write in OFL the definition of LORA operations
- exemple
- Join(InputCollection1,InputCollection2,ResultColle
ction, JoinPredicate, ProjectionExpression) - gives the OFL program
- ForAll(InputCollection1,null,
- ForAll(InputCollection2, OFLJoinPredicate,
- InsertResultCollection(ResultCollection,
- OFLProjectionExpression))))
274. Algebraic Operator Algorithms
- Classical relational operators still valid ...
- Filtering with a predicate (Restriction)
- Sequential scan
- Index scan, clustered or non-clustered
- Value-based join
- Nested loop join
- iterate on the outer collection and compare each
outer object with each object in the inner
collection - Merge join
- sort on join fields the two collections and then
merge - Hash join
- hash the outer collection on join fields, scan
the inner table and probe the hashed collection
28Path Traversals
- Paths may involve multiple collections
- Each collection can be qualified by predicates
29Depth-First-Fetch
- Depth-First-Fetch (DFF) is the natural algorithm
for evaluating a path expression. - It follows the path from the root to the target
collection, using a depth first graph traversal
algorithm. - The corresponding operator is an n-ary operator
denoted DFF. - Advantages
- no intermediate results, simple pointer chasing
- result are assembled one at a time allowing
pipeline - efficient when the memory size is large enough to
avoid swapping of objects
30Breadth-First-Fetch
- Breadth-First-Fetch (BFF) traversal processes the
tree of objects using a Forward Join (FJ)
algorithm which is based on pointer chasing
between two collections. - Successive binary joins of collections are
performed from the source collection to the
target, following the path in a forward order. - Advantages
- no multiple fetch of objects
- requires the construction of hashed support table
to memorize FJ results
31Reverse-Breadth-First-Fetch
- Reverse-Breadth-First-Fetch (RBFF) performs a
sequence of binary joins between two neighbor
collections to traverse the path, but it proceeds
in the reverse order of the path. - Thus, each join is called a Reverse Join (RJ).
The join criterion is the member-ship of the
second collection object identifier to the first
collection pointer attribute values. - Advantages
- efficient when predicate(s) in last collection(s)
selective - requires supporting tables and value-based joins
32Illustration of BFF RBFF
33Further Algorithms
- Various algorithms are available for each
operator - Combination of operators can be applied
- to traverse long paths
- to derive new algorithms
- hash both sort buckets merge buckets
- limited breadth-first-fetch
- Cost is dependent of many factors
- physical organization of objects
- size of collections
- selectivity of predicates
- available memory size
- possible degree of parallelism
34Exercice Algorithms
- List all the possible annotated query plans to
process the query - SELECT C.Name, P.Name, V.Number
- FROM C IN Companies, P IN C.Employs, V IN P.Owns
- WHERE C.City"Paris" AND P.Agelt30 AND V.Powgt10
- over the database schema
Company
Employs
Name
City
Person
Owns
String
String
Name
Age
Vehicle
String
Int
Number
Power
Int
Int
355. Query Plan Transformations
- Query rewrite Algebraic rewrite of query tree
- semantic transformations based on properties of
data types and integrity constraints - syntactic transformation based on properties of
operators - Query planning Selection of best algorithms
- annotation of logical operators with selected
algorithms - cost of an annotated algorithm often dependent of
result of previous algorithm - e.g., no sort needed if result sorted
- Query rewrite and query planning are not
independent
36Extensible Optimizers
- Closed Optimizer
- set of operators and transformations fixed
- heuristic-based or cost-based selection of plans
- efficient but hard to modify and extend
- e.g., Oracle 7.3, SQL Server 10, ...
- Extensible Optimizer
- extensible set of operators and transformations
- rule-based generation of query plans
- selection of "best" plan using a search strategy
- e.g., Exodus, Starbust and DB2 CS, Esprit EDS
IDEA, Illustra, ...
37Rewrite Rule Base
From Gardarin, Finance DKE 93
Query Plan
- Common Expression Detection
- Syntactic
- Optimization
- Semantic
- Optimization
- Predicate
- Simplification
Cost model Heuristics
Modular Rule Base
STRATEGY
Optimized Query Plan
38Syntactic Rewrite Rules (1)
- Restrict through Union Pushing Rule
- Restrict(Union(C1,C2)) ltgt
- Union(Restrict(C1,C2))
- Restrictions through Super Class Pushing Rule
- Restrict (Super(C1,C2)) ltgt Super(Restrict(C1),Re
strict(C2))
39Syntactic Rewrite Rules (2)
- Join Commutativity
- Join (C1,C2) ltgt
- Join (C2,C1)
- Join Associativity
- (C1 Join C2) Join C3) ltgt
- C1 Join (C2 Join C3)
- Restrict through Join Pushing Rule
- Restrict(Join(C1,C2)) ltgt
- Join(Restrict(C1),Restrict(C2))
40Planning Rules
- Join method choice
- JoinNL (C1,C2) ltgt JoinSM (C1,C2)
- JoinHP (C1,C2) ltgt JoinSM (C1,C2)
- Depth First Fetch introduction
- DFF(C1,C2,C3) ltgt Join(C1,Join(C2,C3))
- Index Scan introduction
- Scan(C1,P) ltgt Scan(IScan(C1,I(P)),PI(P))
41Semantic Rules
- Integrity constraints
- Type(x) Square ltgt
- Type(x) Polygon and large(x) long(x)
- User function properties
- draw(xy) draw(x) draw(y)
42What Rule Language ?
- Rules are often complex to express
- Conditions on qualifications, operators, results,
... - Proposed rule languages
- C rewriting procedure Exodus, Starbust
- if ltC proceduregt is true then ltC proceduregt
- Practical but hard to extend optimizer
- Side effective rule language Finance91
- WHEN ltQuery Expressiongt IF ltConditiongt
- THEN ltQuery Expression'gt UNDER ltActiongt
- Complex to implement for pattern matching
- OQL Query equivalence Florescu95
- Parametrized Query Parametrized Query
- Lack of generality (e.g., query planning not
possible)
43Choice of Best Query Plan
Algebraic Tree
Database Schema
Query Plans
Transformation Rule base
Query Plan Generator
Search Strategy
Cost Model
action ? cost ? float goak
??boolean
"Best" Query Plan
44Exercice Rules
- Given a linear path expression from collection 1
to i, determine the number of distinct query
plans to process the query, assuming that 3
algorithms are available to process any path
expression (DFF, BFF, RBFF) - Give the rule base to generate those plans
........
Ci
C1
Ci-1
C2
456. Cost Models
- Extension of relational cost model to handle
- Object identifiers
- Path indexes
- Object linking and embedding
- Clustering
- Takes into account CPU cost and I/O cost
- CPU cost ? Number of examined objects
- I/O cost ? Number of pages read
46Collection Parameters
- C number of pages of collection C
- C number of objects of collection C
- Ci number of pages of cluster i of collection
C - Ci number of objects of cluster i of
collection C - SC average object size in collection C
- SProj average size of projection result
- M available memory size
- Sel(Qual) selectivity of qualification Qual
- Sel(Pred) selectivity of indexed predicate Pred
47I/O Scan Formulas
- Sequential scan
- I/O cost I/OScan I/OResult
- I/OScan C
- I/OResult Sel(Qual)CSProj/SC - M if gt 0 else
0 - Unclustered index scan
- I/O cost I/OIndex I/OHitI/OResult
- I/OIndex Blevel(I)
- I/OHit Yao(C,C,Sel(Pred)C)
- Clustered index scan
- I/O cost I/OIndex I/OHitI/OResult
- I/OHit Sel(Pred)C
48Object Clustering
- Clustering par classe
- Regroupement de toutes les instances d'une même
classe dans un même fichier - Clustering par composition
- Regroupement d' un objet d'une classe avec un ou
plusieurs de ses objets composants. - Placement adapté aux parcours de chemin
- Clustering aléatoire
- les objets sont placés dans l'ordre de leur
création, dans un espace unique.
49Clustered Collection Cases
Cluster objects on disk Reduce the
number of IOs
(Placement trees are represented by directed
graphs)
- Default clustering
- Simple clustering
COMPANY
50More Clustering Cases
- Conjunctive clustering
- Disjunctive clustering
COMPANY
PROPOSAL
5
10
COMMAND
51Clustering Statistics on Partitions
How many pages will have to be loaded to scan the
collection A ?
IIAII Cardinality IAI Number of disk blocks
- Can be maintained by the system
- Can be evaluated
SA Average Size of Object Sp Available Page
Size DA,B Average Number of distinct
references .....
52Clustering Statistics Example
ICOMPANYI
IPRODUCTI
53Yao Number of Clustered Block Hits
- Yao function Yao( IICII , ICI , k number of
selected objects ) - returns the number of block hits
- Yao' function sum Yao functions applied on each
involved cluster - Given a clustered collection C and p the number
of partitions to be scanned, we have - where ki is the number of objects to be selected
in cluster i
54Yao Example
- x in Companies, x.assetgt100 000 kF
- x in Companies, x.assetgt100 000 kF and
x.product.yearlt1980 - x in Companies, x.assetgt100 000 kF and
x.product.yearlt1980 - and x.product.command.Nlt1000
Yao( ) Yao( ) Yao( ) Yao(
)
Yao( ) Yao( )
Yao( )
55I/O Join Formulas
- Nested loops
- C1 C1C2
- Merge join
- cost(sort(C1)) cost(sort(C2))
cost(merge(C1,C2)) cost(Result) - ?C1logC1 ?C2logC2
C1C2 ... - Hash join
- cost(hash(C1)) cost(scan(C2))
C2cost(probe(C1)) cost(Result) - C1 C2 ...
56Parameters for Links
- fanC1,C2 average number of references from a C1
object to a C2 object - DC1,C2 number of distinct references from a C1
object to a C2 object - XC1,C2 number of C1 objects having no reference
to C2 object - ZC1,C2 average number of distinct references
from C1 objects having at least one reference to
C2 object - ZC1,C2 DC1,C2 C1 / (C1- XC1,C2 )
57I/O Path Traversal Formulas
- Cost of DFF Gardarin, Gruser, Tang 96
- Large memory (no swap)
- Small memory (worst case)
- DFF is efficient with large memory
58Exercice Cost Model
- Compare the I/O costs of DFF, BFF and RBFF
- Discuss the advantage of each of them according
to memory size and predicate selectivity
597. Search Strategies
from Lanzelotte 1992
Search
Strategy
Randomized
Enumerative
Genetic Algorithm
Iterative Improvement
Exhaustive Search
Augmentation Heuristic
Tabu Search
Simulated Annealing
Mixte (2 phases)
60Exhaustive Search
- Function Exhaustive(Query)
- p Parse(Query) // Set the initial plan
- S p // S is the set of all investigated
plans - while not StopCond()
- p' Transform (p) // Apply a
transformation rule - if p' ? S then
- ? p p'
- Insert (S, p') // Maintain the set of
investigated plans -
-
- return Optimal(S) // Select best plan among
all generated plans - ?
61Illustration of ES
Parse(Query)
r2
r1
r3
r4
r5
r6
r7
r9
r8
r10
SELECT MINIMAL COST PLAN
Up to StopCond (Exhausted time)
r11
62Classical Ameliorations
- Reduce search space and control rule selection
- Select profitable/best move at each step
- Introduce a gain estimator for each rule
- Apply only rules with best estimators
- Avoids loops and applying rules in both
directions - Such approaches find only a local minimum
- Risk of fall into a hole
- Iterative improvement minimize risk
63Iterative Improvement Scheme
- II randomly chooses an initial processing tree.
- It then accepts only those downhill moves.
- This is called local optimization.
- When the local condition is reached, II picks up
a new random state, and performs local
optimization from that state. - The process is repeated until a stopping
condition is met. - The global minimum is the best local minimum
found till now.
64Iterative Improvement Procedure
- Procedure II()
- p Initialize() // set an initial state, i.e.,
pick a random PT for evaluating the query - OptimalPlan p // Initialize optimal plan
- while not(stopping_condition) do // Loop for
global optimization (on various initial states) - while not(local_condition) do // Loop for
local optimization - p move(p) // Apply a valid transformation
to p - if (Cost(p)ltCost(p)) then p p // Keep
plan if less costly -
- if cost(p)ltcost(OptimalPlan) then OptimalPlan
p // Select optimal plan - p RandomPlan // Move to next random
selected plan -
- Return(OptimalPlan)
65Illustration of I.I.
Parse(Query)
Rand(Parse(Query))
Rand(Rand((Parse(Query)))
Profitable r'1
Profitable r"1
Profitable r1
Profitable r"2
Profitable r2
SELECT MINIMAL COST PLAN
66Simulated Annealing Scheme
- SA also starts at a random processing tree and
generates the next state by applying a
transformation rule on the current processing
tree. - Differently from II, SA accepts both downhill and
uphill moves. Uphill moves are allowed with the
probability - The parameter temperature decreases when the
inner block reaches an equilibrium point. Thus
the uphill moves are being accepted with less and
less probability. - When a stopping condition is satisfied, the best
traversed plan is selected as optimal.
67Simulated Annealing Procedure
- Procedure SA()
- p Initialize() // set an initial state, i.e.,
pick a random PT for evaluating the query - OptimalPlan p // Initialize optimal plan
- TT0 // Initialize temperature
- while not(stopping_condition) do // Loop for
global optimization (on various initial states) - while not(equilibrium) do // Loop for local
optimization - p move(p) // Apply a valid transformation
to p - deltacost(p)-cost(p) // Compute
differential cost - if (deltalt0) then p p // If cost reduced
pick new plan - if (deltagt0) then p p with probability
e-delta/T // If cost increased, accept if hot - if cost(p)ltcost(OptimalPlan) then OptimalPlan
p // Maintain optimal plan -
- Treduce(T) // Reduce temperature
-
- Return(OptimalPlan)
68Illustration of S.A.
Cost
Parse(Query)
Profitable r1
Non Profit. r2
Profitable r3
Profitable r4
Profitable r6
Non Profit. r5
SELECTED PLAN
Moves
69Tabu Search Scheme
- TS is a general meta-heuristic procedure for
global optimization, which performs an aggressive
exploration of the state space ( best possible
move, with restriction list) - TS starts from a randomly generated initial
state, and repeatedly performs moves from a state
to a neighbor one. - At each iteration the procedure generates a
subset V of the set N(S) of the neighbors of the
current state S and select the best. - The subset does not contain any state which is
recorded in the Tabu list. This avoids the
cycling or at least reduces its probability. The
Tabu list is updated each time the current state
is updated. This forbids moves which should bring
back to a previous explored state.
70Tabu Search Procedure
- Procedure TS()
- p Initialize() // set an initial state, i.e.,
pick a random PT for evaluating the query - OptimalPlan p // Initialize optimal plan
- T // initialize Tabu list
- while not(stopping condition) do // global
loop - generate the set VN(S)-T by applying move(S)
// All move accepted except tabu - choose the best solution p V // Pick best
move - T (T-(oldest)) ??p // Update the tabu list
by removing oldest plans and adding picked - if cost(p)ltcost(OptimalPlan) then OptimalPlan
p // Maintain optimal plan -
- return(OptimalPlan)
71Comparison of Strategies
Cost of Best Plan
II is the best with good random sampling Tabu
looks attractive
Tabu
SA
Tabu
II Join exchange
II Swap
- 200 400 600 800 1000 1200 1400
Number of moves
72Genetic Algorithm
- Genetic Algorithm (GA) is a non-gradient
optimization algorithm used for the search of
local extremes (minimum or maximum) of functions
with many variables and functional extremes. - These functions are usually defined on very
complex and discrete domains. - The basic idea of GA is to use principles of
evolution of organisms in nature. - Instead of working on one particular solution at
a time, it considers a population of solutions.
73GA Principle
Initialisation
Mutation
Crossover
Evaluation
Sort
Selection
Terminate
No
Yes
74GA Phases
- Initialization - randomly generate an initial
small population of solutions (i.e., processing
trees) from the whole search space. - Mutation - choose one solution (i.e., processing
tree) from the population, and apply
transformation rules to it. - Crossover - randomly choose two solutions from
the population, and exchange their common
subtrees in order to generate two new processing
trees. - Evaluation - for each solution, evaluate the
value of its fitness function (i.e., cost
function), - Sort - sort all solutions according to their cost
values. - Selection - choose certain number of the best
solutions from the result of Sort as the parents
of the next generation. - Termination - check termination criteria for
stopping the optimization.
75Gene Base for 5 collections
76Mutation Operator
77Crossover operator
78Improved GA
79GA Procedure
- Procedure GA()
- Generate the initial population PopuBasePopu
// Initialize the base population at random - Sort(Popu) // Sort population of PTs on
increasing cost - OptimalPlan Popu0 // Keep the best
traversed plan - While not (stopping_condition) do
- Percent 0
- While Percent lt Part BasePopu do // Apply
Crossover to Part of the population - p1 PopuRandom(BasePopu) // Randomly
choose p1 and p2 from Popu - p2 PopuRandom(BasePopu)
- Crossover(p1, p2) // Apply Crossover if
possible - Percent Percent 2
- For the rest in Popu do Mutation // Apply
Mutation for the rest of the population - For (i0 i lt NewPopu i) do
evaluate(Popui) // Compute cost for new
population - Sort(Popu) // Sort population of PTs on
increasing cost - if (Popu0 lt OptimalPlan) then OptimalPlan
Popu0 // Keep the best traversed plan - // Optional replacement of the worst elements
- Percent 0
- i BasePopu // Initialize for replacement
- While Percent lt Repl BasePopu do // Apply
replacement to Repl of the population
G. Gardarin
808. Open Problems
- Efficient control of rule applications
- What is the best strategy (Genetic ?)
- Simple but accurate rule gain estimator
(Priorities ?) - Estimation of method costs
- Statistics keep average and variations at each
call - Revelation the user provide a cost estimate
attribute - Disencapsulation understand the method code
- Problem late binding complexifies the
estimation - Querying bulk types
- Collections may be list, array, trees or matrices
- Ordered collections may requires additional
operators - Cost model should be extended to capture
aggregates
G. Gardarin
81Exercice Strategies
- Discuss the advantages and inconvenients of each
search strategy - Compare them in case of a small rule base and a
large rule base
G. Gardarin
82Conclusion
- The Query Optimizer is a key component of a DBMS
- Relational techniques can be generalized
- Object algebra
- Cost model
- Search strategies
- New techniques are required for
- Extensibility of data types
- Optimizing path expressions
- Optimizing method calls
- Optimizing bulk data types
- Path index maintenance and access
- Not much is done on new features in OODBMS
G. Gardarin
83For More Informations (1)
- Finance, Gardarin, IEEE DE 91
- LORA Algebra and EDS extensible optimizer
- Finance, Gardarin, DKE 93
- Rule Language for extensible optimizer
- Gardarin,Gruser, Tang, VLDB95
- Cost model for OODBs Scan, DFF, validation on
O2 - Gardarin,Gruser, Tang, VLDB96
- Analytical experimental comparisons of
DFF,BFF,RBFF - Gardarin, in Advances in OO DB Systems,
Springer94 - Object rule language, optimisation of recursive
updates extension of LORA to recursion
G. Gardarin
84For More Informations (2)
- Mitchell, Zdonik, Dayal, in Advances in OO DB
Systems , Springer V. 94 - Optimization of OO query languages Problems and
Approaches - Cluet, Delobel, SIGMOD 92
- A General Framework for the Optimization of
Object-Oriented Queries - Kemper, Moerkotte, VLDB 90
- Advance Query Processing in An Object Bases Using
Access Support Relations - Lanzelotte, Valduriez, VLDB 91
- Extending the Search Strategy in a Query Optimizer
G. Gardarin
85Path Index
- Multi-index Gemstone
- les chemins sont des séquences de variables
appartenant à la structure des objets. - Les index sont définis pour chaque lien du chemin
- les index représentent des identifiants d'objet
ou les valeurs des variables. - Implémentés comme des B trees.
- Nested index Bertino 89
- une seule entrée définie pour toute la longueur
d'un chemin. - Accès au début seulement en connaissant la fin du
chemin. - Utile seulement si l'on connait parfaitement le
chemin et qu'il est utilisé souvent. - Difficile à maintenir.
- Path index Bertino 89 Kemper 90
- associe la fin du chemin avec tous les suffixes
du chemin - il fonctionne avec des sous-chemins
- Implémenté comme des relations
- chaque colonne d'un tuple correspond à un pas du
chemin - chaque champs du tuple contient un identifiant
d'objet ou une valeur.
G. Gardarin