Query Optimization in Object Databases - PowerPoint PPT Presentation

1 / 85
About This Presentation
Title:

Query Optimization in Object Databases

Description:

Query Optimizers transform query in query plans composed of low level operations ... Side effect introduced through cursors. 20. G. Gardarin. The OFL Language ... – PowerPoint PPT presentation

Number of Views:106
Avg rating:3.0/5.0
Slides: 86
Provided by: georgesg8
Category:

less

Transcript and Presenter's Notes

Title: Query Optimization in Object Databases


1
Query Optimization in Object Databases
  • Georges GARDARIN

Laboratoire PRiSM/UVSQ
G. Gardarin
2
1. Introduction
  • Object models provide ADTs, inheritance, complex
    structures, relationships and object identity
  • Query Optimizers transform query in query plans
    composed of low level operations to evaluate on
    the object collections
  • New techniques are required for supporting the
    object-oriented features

3
Outline
  • Object Query Languages
  • Complex Object Algebra
  • Operator Algorithms
  • Query Plan Transformations
  • Cost Models
  • Search Strategies
  • Open Problems

4
Overview
  • Presentation of the various topics of Query
    Processing in OODBMSs
  • Topics are not independent
  • Operators depend on data structures (index)
  • Search strategies depend on cost model
  • An optimizer has to consider all aspects
  • Complex piece of software
  • Has to be extensible for additional features
  • data types
  • access methods
  • operators

5
Vocabulary
  • Collection a set, list, array or bag of objects
  • Query a user query in high level language
  • Predicate a term of a query criteria
  • Qualification a logical expression of
    predicates
  • Operator a low level accessor to 1 or several
    collections
  • Annotation the selected algorithm for executing
    an operator
  • Query plan a program of annotated operator
  • Cluster a group of related objects stored
    together in a bucket
  • Index an accelerator by value of an attribute
  • Path index an accelerator by values along a path

6
2. Object Query Languages
  • Extension of SQL with
  • user defined functions in predicates and results
  • user defined comparison predicates
  • path expressions to traverse relationships
  • flattening, grouping and degrouping operators
  • automatic scan of inheritance hierarchies
  • Two standards are under construction
  • The object standard of ODMG (OQL)
  • The object-relational standard of ISO/ANSI (SQL3)

7
Database Example (1)
Vehicle
Maker
Number
Color
Company
String
String
President
Name
City
Employee
String
String
Ssn
BirthDate
Name
Float
String
8
Query Example
  • Object identity
  • SELECT E.Name, C.Name
  • FROM Employee E, Company C
  • WHERE C.President E
  • Paths and method
  • SELECT Number
  • FROM Vehicle
  • WHERE Color "Red"
  • AND Vehicle.Maker.City "Paris"
  • AND Vehicle.Maker.President.age() lt 50

9
Database Example (2)
Company
Employs
Name
City
Person
Owns
Name
Age
Vehicle
Number
Power
10
Qualified Path Expression
  • OQL form
  • SELECT C.Name, P.Name, V.Number
  • FROM C IN Companies, P IN C.Employs, V IN P.Owns
  • WHERE C.City"Paris" AND P.Agelt30 AND V.Powgt10
  • Direct form
  • SELECT C.Name, P.Name, V.Number
  • FROM Companies C, Persons P, Vehicles V
  • WHERE
  • CCity"Paris".Employs.PAgelt30.Owns.VPowgt10

11
Exercice Queries
  • Express in OQL, then with qualified path
    expressions, a set of given queries.

12
3. Complex Object Algebra
  • Generalization of relational algebra
  • Set-oriented processing of objects
  • A set of operations on collections of objects
    generating collections of objects
  • Different types of collections
  • class extent, set, bag, list, array
  • Any query can be expressed as a complex object
    algebra expression
  • Logical algebra annotated for execution

13
The LORA Algebra
  • Finance and Gardarin 1991

LoraOp
SearchOp
UpdateOp
TransactOp
GroupOp
Filter Map
SetOp
Join
Sort
RJoin
VJoin
- RemoveDup - Aggregate - Nest - Unnest
- Union - Intersect - Difference
14
Main Operator Signatures
  • JOINCol,Exp,Col gtCol
  • OUTER_JOIN Col,Exp,Col gt Col
  • SORT Col, Exp gt Col
  • AGG Col, Exp, Exp gt Col
  • NEST Col, Nest_Exp gt Col
  • UNNEST Col, Nest_exp gtCol
  • RDUPLICATE Col, Exp gt Col
  • FILTER Col, Qual gtCol
  • MAP Col, Exp,Qual gt Col
  • MINUS Col, Col gt Col
  • DIVIDE Col, Col gtCol
  • UNION Col, Col gt Col
  • OUTER_UNION Col, Col gt Col
  • INTERSECT Col, Colgt Col
  • UPDATE Col, Col, Ident, Assignement gtCol
  • INSERT Col, Col gt Col
  • DELETE Col, Col, Ident gt Col

15
Algebraic Tree
SELECT Number FROM Vehicle WHERE Color
"Red" AND Vehicle.Maker.City "Paris" AND
Vehicle.Maker.President.age() lt 50
Filter(,Number)
RJoin(Maker)
Filter(City"Paris",)
RJoin(President)
Filter(Color"Red")
Filter(age()lt50,)
Vehicle
Company
Employee
16
The ENCORE Algebra (1)
  • Shaw and Zdonik 1990
  • Select(InputCollection, p)
  • s?(s in InputCollection) ? p(s)
  • Image(InputCollection, f T)
  • f(s)?s in InputCollection
  • Project(InputCollection,lt(A1, f1), ...,(An, fn)gt)
  • ltA1 f1(s), ...,An fn(s)gt?(s in
    InputCollection)

17
The ENCORE Algebra (2)
  • Nest(InputCollection,Ai)
  • ltA1 s.A1, ...,Ai t, ...,An s.Angt??r ?s
    (r in t ? s in InputCollection ? s.Ai r)
  • UnNest(InputCollection,Ai)
  • ltA1 s.A1, ...,Ai t, ...,An s.Angt?s in
    InputCollection???t in Ai
  • Flatten(InputCollection)
  • r??t in InputCollection ? r in t
  • DupEliminate (InputCollection)
  • Coalesce (InputCollection, Ai)

18
The ENCORE Algebra (3)
  • OJoin(InputCollection1,InputCollection2,A1,A2,p)
  • ltA1 s, A2 rgt?s in InputCollection1 ? r in
    InputCollection2 ? p(s,r)
  • Set-oriented operations
  • Union
  • Intersection
  • Difference
  • with set membership based on object identity

19
The OFL Operators
  • Gardarin Machucca 1995
  • Navigational traversal often interesting
  • Existential quantification
  • Better control of query plans, smaller
    granularity
  • Mixing navigational and set-oriented traversal
  • Based on Bachus functional approach
  • Processing of collections of objects
  • Side effect introduced through cursors

20
The OFL Language
  • Definition Abstract Collection
  • A container of objects encapsulated by a finite
    set of behavioral and traversal functions.
  • Constructions
  • Composition f.g (x) f(g(x))
  • Path expressions f0.f1....fn(x)
  • Conditional If_Then_Else (p, f1, f2) (x)
  • Iteration While (p,f)
  • Sequence Sequence(f1, f2, , fn)

21
Collection Traversal in OFL
  • Quantified function ????Apply to all
  • A second order function of signature ForAll(C, p,
    f) that applies a function f to all objects of a
    collection C satisfying a predicate p.
  • Quantified function ?????Apply to any
  • A second order function of signature ForAny(C, p,
    f) that applies a function f to any object of a
    collection C satisfying a predicate p.
  • Iterator and Annotations
  • Each quantified function works on an iterator
  • Set-oriented or navigational traversal is selected

22
Database Example
23
Translating Query in OFL
  • SELECT p.lastname
  • FROM p in Person
  • WHERE exists v in p.owner v.color "Red"
  • ForAll(Person P, null,
  • ForAny(Owner(P) V, StringEqual(Color(V),
    "Red"), LastName(P) ) )

24
A More Complex Query
  • OQL Query
  • SELECT tuple(p.lastname, v.price, c.partlabel)
  • FROM p in Person, v in p.owner, c in v.composed
  • WHERE p.age 16 and v.pricec.price
  • OFL translation
  • ForAll(Find(AgeIndex,16) P, null,
  • ForAll(Owner(P) V, null,
  • ForAll(Composed(V)T,IntegerEqual(Price(V),Price(T
    )),
  • Tuple (LastName(P), Price(V), PartLabel(T))))))

25
Further Operators
  • Recursive operators
  • FixPoint(ResultCollection, InitializationExpressio
    n,RecursivePredicate, RecursiveExpression,
    FinalExpression)
  • gives the OFL program
  • Sequence(OFLInitializationExpression,
  • While(OFLRecursivePredicate,
    OFLRecursiveExpression),
  • OFLFinalExpression)

26
Exercice Algebra
  • Write in OFL the definition of LORA operations
  • exemple
  • Join(InputCollection1,InputCollection2,ResultColle
    ction, JoinPredicate, ProjectionExpression)
  • gives the OFL program
  • ForAll(InputCollection1,null,
  • ForAll(InputCollection2, OFLJoinPredicate,
  • InsertResultCollection(ResultCollection,
  • OFLProjectionExpression))))

27
4. Algebraic Operator Algorithms
  • Classical relational operators still valid ...
  • Filtering with a predicate (Restriction)
  • Sequential scan
  • Index scan, clustered or non-clustered
  • Value-based join
  • Nested loop join
  • iterate on the outer collection and compare each
    outer object with each object in the inner
    collection
  • Merge join
  • sort on join fields the two collections and then
    merge
  • Hash join
  • hash the outer collection on join fields, scan
    the inner table and probe the hashed collection

28
Path Traversals
  • Paths may involve multiple collections
  • Each collection can be qualified by predicates

29
Depth-First-Fetch
  • Depth-First-Fetch (DFF) is the natural algorithm
    for evaluating a path expression.
  • It follows the path from the root to the target
    collection, using a depth first graph traversal
    algorithm.
  • The corresponding operator is an n-ary operator
    denoted DFF.
  • Advantages
  • no intermediate results, simple pointer chasing
  • result are assembled one at a time allowing
    pipeline
  • efficient when the memory size is large enough to
    avoid swapping of objects

30
Breadth-First-Fetch
  • Breadth-First-Fetch (BFF) traversal processes the
    tree of objects using a Forward Join (FJ)
    algorithm which is based on pointer chasing
    between two collections.
  • Successive binary joins of collections are
    performed from the source collection to the
    target, following the path in a forward order.
  • Advantages
  • no multiple fetch of objects
  • requires the construction of hashed support table
    to memorize FJ results

31
Reverse-Breadth-First-Fetch
  • Reverse-Breadth-First-Fetch (RBFF) performs a
    sequence of binary joins between two neighbor
    collections to traverse the path, but it proceeds
    in the reverse order of the path.
  • Thus, each join is called a Reverse Join (RJ).
    The join criterion is the member-ship of the
    second collection object identifier to the first
    collection pointer attribute values.
  • Advantages
  • efficient when predicate(s) in last collection(s)
    selective
  • requires supporting tables and value-based joins

32
Illustration of BFF RBFF
33
Further Algorithms
  • Various algorithms are available for each
    operator
  • Combination of operators can be applied
  • to traverse long paths
  • to derive new algorithms
  • hash both sort buckets merge buckets
  • limited breadth-first-fetch
  • Cost is dependent of many factors
  • physical organization of objects
  • size of collections
  • selectivity of predicates
  • available memory size
  • possible degree of parallelism

34
Exercice Algorithms
  • List all the possible annotated query plans to
    process the query
  • SELECT C.Name, P.Name, V.Number
  • FROM C IN Companies, P IN C.Employs, V IN P.Owns
  • WHERE C.City"Paris" AND P.Agelt30 AND V.Powgt10
  • over the database schema

Company
Employs
Name
City
Person
Owns
String
String
Name
Age
Vehicle
String
Int
Number
Power
Int
Int
35
5. Query Plan Transformations
  • Query rewrite Algebraic rewrite of query tree
  • semantic transformations based on properties of
    data types and integrity constraints
  • syntactic transformation based on properties of
    operators
  • Query planning Selection of best algorithms
  • annotation of logical operators with selected
    algorithms
  • cost of an annotated algorithm often dependent of
    result of previous algorithm
  • e.g., no sort needed if result sorted
  • Query rewrite and query planning are not
    independent

36
Extensible Optimizers
  • Closed Optimizer
  • set of operators and transformations fixed
  • heuristic-based or cost-based selection of plans
  • efficient but hard to modify and extend
  • e.g., Oracle 7.3, SQL Server 10, ...
  • Extensible Optimizer
  • extensible set of operators and transformations
  • rule-based generation of query plans
  • selection of "best" plan using a search strategy
  • e.g., Exodus, Starbust and DB2 CS, Esprit EDS
    IDEA, Illustra, ...

37
Rewrite Rule Base
From Gardarin, Finance DKE 93
Query Plan
  • Common Expression Detection
  • Syntactic
  • Optimization
  • Semantic
  • Optimization
  • Predicate
  • Simplification

Cost model Heuristics
Modular Rule Base
STRATEGY
Optimized Query Plan
38
Syntactic Rewrite Rules (1)
  • Restrict through Union Pushing Rule
  • Restrict(Union(C1,C2)) ltgt
  • Union(Restrict(C1,C2))
  • Restrictions through Super Class Pushing Rule
  • Restrict (Super(C1,C2)) ltgt Super(Restrict(C1),Re
    strict(C2))

39
Syntactic Rewrite Rules (2)
  • Join Commutativity
  • Join (C1,C2) ltgt
  • Join (C2,C1)
  • Join Associativity
  • (C1 Join C2) Join C3) ltgt
  • C1 Join (C2 Join C3)
  • Restrict through Join Pushing Rule
  • Restrict(Join(C1,C2)) ltgt
  • Join(Restrict(C1),Restrict(C2))

40
Planning Rules
  • Join method choice
  • JoinNL (C1,C2) ltgt JoinSM (C1,C2)
  • JoinHP (C1,C2) ltgt JoinSM (C1,C2)
  • Depth First Fetch introduction
  • DFF(C1,C2,C3) ltgt Join(C1,Join(C2,C3))
  • Index Scan introduction
  • Scan(C1,P) ltgt Scan(IScan(C1,I(P)),PI(P))

41
Semantic Rules
  • Integrity constraints
  • Type(x) Square ltgt
  • Type(x) Polygon and large(x) long(x)
  • User function properties
  • draw(xy) draw(x) draw(y)

42
What Rule Language ?
  • Rules are often complex to express
  • Conditions on qualifications, operators, results,
    ...
  • Proposed rule languages
  • C rewriting procedure Exodus, Starbust
  • if ltC proceduregt is true then ltC proceduregt
  • Practical but hard to extend optimizer
  • Side effective rule language Finance91
  • WHEN ltQuery Expressiongt IF ltConditiongt
  • THEN ltQuery Expression'gt UNDER ltActiongt
  • Complex to implement for pattern matching
  • OQL Query equivalence Florescu95
  • Parametrized Query Parametrized Query
  • Lack of generality (e.g., query planning not
    possible)

43
Choice of Best Query Plan
Algebraic Tree
Database Schema
Query Plans
Transformation Rule base
Query Plan Generator
Search Strategy
Cost Model
action ? cost ? float goak
??boolean
"Best" Query Plan
44
Exercice Rules
  • Given a linear path expression from collection 1
    to i, determine the number of distinct query
    plans to process the query, assuming that 3
    algorithms are available to process any path
    expression (DFF, BFF, RBFF)
  • Give the rule base to generate those plans

........
Ci
C1
Ci-1
C2
45
6. Cost Models
  • Extension of relational cost model to handle
  • Object identifiers
  • Path indexes
  • Object linking and embedding
  • Clustering
  • Takes into account CPU cost and I/O cost
  • CPU cost ? Number of examined objects
  • I/O cost ? Number of pages read

46
Collection Parameters
  • C number of pages of collection C
  • C number of objects of collection C
  • Ci number of pages of cluster i of collection
    C
  • Ci number of objects of cluster i of
    collection C
  • SC average object size in collection C
  • SProj average size of projection result
  • M available memory size
  • Sel(Qual) selectivity of qualification Qual
  • Sel(Pred) selectivity of indexed predicate Pred

47
I/O Scan Formulas
  • Sequential scan
  • I/O cost I/OScan I/OResult
  • I/OScan C
  • I/OResult Sel(Qual)CSProj/SC - M if gt 0 else
    0
  • Unclustered index scan
  • I/O cost I/OIndex I/OHitI/OResult
  • I/OIndex Blevel(I)
  • I/OHit Yao(C,C,Sel(Pred)C)
  • Clustered index scan
  • I/O cost I/OIndex I/OHitI/OResult
  • I/OHit Sel(Pred)C

48
Object Clustering
  • Clustering par classe
  • Regroupement de toutes les instances d'une même
    classe dans un même fichier
  • Clustering par composition
  • Regroupement d' un objet d'une classe avec un ou
    plusieurs de ses objets composants.
  • Placement adapté aux parcours de chemin
  • Clustering aléatoire
  • les objets sont placés dans l'ordre de leur
    création, dans un espace unique.

49
Clustered Collection Cases
Cluster objects on disk Reduce the
number of IOs
(Placement trees are represented by directed
graphs)
  • Default clustering
  • Simple clustering

COMPANY
50
More Clustering Cases
  • Conjunctive clustering
  • Disjunctive clustering

COMPANY
PROPOSAL
5
10
COMMAND
51
Clustering Statistics on Partitions
How many pages will have to be loaded to scan the
collection A ?
IIAII Cardinality IAI Number of disk blocks
  • Can be maintained by the system
  • Can be evaluated

SA Average Size of Object Sp Available Page
Size DA,B Average Number of distinct
references .....
52
Clustering Statistics Example

ICOMPANYI

IPRODUCTI
53
Yao Number of Clustered Block Hits
  • Yao function Yao( IICII , ICI , k number of
    selected objects )
  • returns the number of block hits
  • Yao' function sum Yao functions applied on each
    involved cluster
  • Given a clustered collection C and p the number
    of partitions to be scanned, we have
  • where ki is the number of objects to be selected
    in cluster i

54
Yao Example
  • x in Companies, x.assetgt100 000 kF
  • x in Companies, x.assetgt100 000 kF and
    x.product.yearlt1980
  • x in Companies, x.assetgt100 000 kF and
    x.product.yearlt1980
  • and x.product.command.Nlt1000

Yao( ) Yao( ) Yao( ) Yao(
)
Yao( ) Yao( )
Yao( )
55
I/O Join Formulas
  • Nested loops
  • C1 C1C2
  • Merge join
  • cost(sort(C1)) cost(sort(C2))
    cost(merge(C1,C2)) cost(Result)
  • ?C1logC1 ?C2logC2
    C1C2 ...
  • Hash join
  • cost(hash(C1)) cost(scan(C2))
    C2cost(probe(C1)) cost(Result)
  • C1 C2 ...

56
Parameters for Links
  • fanC1,C2 average number of references from a C1
    object to a C2 object
  • DC1,C2 number of distinct references from a C1
    object to a C2 object
  • XC1,C2 number of C1 objects having no reference
    to C2 object
  • ZC1,C2 average number of distinct references
    from C1 objects having at least one reference to
    C2 object
  • ZC1,C2 DC1,C2 C1 / (C1- XC1,C2 )

57
I/O Path Traversal Formulas
  • Cost of DFF Gardarin, Gruser, Tang 96
  • Large memory (no swap)
  • Small memory (worst case)
  • DFF is efficient with large memory

58
Exercice Cost Model
  • Compare the I/O costs of DFF, BFF and RBFF
  • Discuss the advantage of each of them according
    to memory size and predicate selectivity

59
7. Search Strategies
from Lanzelotte 1992
Search
Strategy
Randomized
Enumerative
Genetic Algorithm
Iterative Improvement
Exhaustive Search
Augmentation Heuristic
Tabu Search
Simulated Annealing
Mixte (2 phases)
60
Exhaustive Search
  • Function Exhaustive(Query)
  • p Parse(Query) // Set the initial plan
  • S p // S is the set of all investigated
    plans
  • while not StopCond()
  • p' Transform (p) // Apply a
    transformation rule
  • if p' ? S then
  • ? p p'
  • Insert (S, p') // Maintain the set of
    investigated plans
  • return Optimal(S) // Select best plan among
    all generated plans
  • ?

61
Illustration of ES
Parse(Query)
r2
r1
r3
r4
r5
r6
r7
r9
r8
r10
SELECT MINIMAL COST PLAN
Up to StopCond (Exhausted time)
r11
62
Classical Ameliorations
  • Reduce search space and control rule selection
  • Select profitable/best move at each step
  • Introduce a gain estimator for each rule
  • Apply only rules with best estimators
  • Avoids loops and applying rules in both
    directions
  • Such approaches find only a local minimum
  • Risk of fall into a hole
  • Iterative improvement minimize risk

63
Iterative Improvement Scheme
  • II randomly chooses an initial processing tree.
  • It then accepts only those downhill moves.
  • This is called local optimization.
  • When the local condition is reached, II picks up
    a new random state, and performs local
    optimization from that state.
  • The process is repeated until a stopping
    condition is met.
  • The global minimum is the best local minimum
    found till now.

64
Iterative Improvement Procedure
  • Procedure II()
  • p Initialize() // set an initial state, i.e.,
    pick a random PT for evaluating the query
  • OptimalPlan p // Initialize optimal plan
  • while not(stopping_condition) do // Loop for
    global optimization (on various initial states)
  • while not(local_condition) do // Loop for
    local optimization
  • p move(p) // Apply a valid transformation
    to p
  • if (Cost(p)ltCost(p)) then p p // Keep
    plan if less costly
  • if cost(p)ltcost(OptimalPlan) then OptimalPlan
    p // Select optimal plan
  • p RandomPlan // Move to next random
    selected plan
  • Return(OptimalPlan)

65
Illustration of I.I.
Parse(Query)
Rand(Parse(Query))
Rand(Rand((Parse(Query)))
Profitable r'1
Profitable r"1
Profitable r1
Profitable r"2
Profitable r2
SELECT MINIMAL COST PLAN
66
Simulated Annealing Scheme
  • SA also starts at a random processing tree and
    generates the next state by applying a
    transformation rule on the current processing
    tree.
  • Differently from II, SA accepts both downhill and
    uphill moves. Uphill moves are allowed with the
    probability
  • The parameter temperature decreases when the
    inner block reaches an equilibrium point. Thus
    the uphill moves are being accepted with less and
    less probability.
  • When a stopping condition is satisfied, the best
    traversed plan is selected as optimal.

67
Simulated Annealing Procedure
  • Procedure SA()
  • p Initialize() // set an initial state, i.e.,
    pick a random PT for evaluating the query
  • OptimalPlan p // Initialize optimal plan
  • TT0 // Initialize temperature
  • while not(stopping_condition) do // Loop for
    global optimization (on various initial states)
  • while not(equilibrium) do // Loop for local
    optimization
  • p move(p) // Apply a valid transformation
    to p
  • deltacost(p)-cost(p) // Compute
    differential cost
  • if (deltalt0) then p p // If cost reduced
    pick new plan
  • if (deltagt0) then p p with probability
    e-delta/T // If cost increased, accept if hot
  • if cost(p)ltcost(OptimalPlan) then OptimalPlan
    p // Maintain optimal plan
  • Treduce(T) // Reduce temperature
  • Return(OptimalPlan)

68
Illustration of S.A.
Cost
Parse(Query)
Profitable r1
Non Profit. r2
Profitable r3
Profitable r4
Profitable r6
Non Profit. r5
SELECTED PLAN
Moves
69
Tabu Search Scheme
  • TS is a general meta-heuristic procedure for
    global optimization, which performs an aggressive
    exploration of the state space ( best possible
    move, with restriction list)
  • TS starts from a randomly generated initial
    state, and repeatedly performs moves from a state
    to a neighbor one.
  • At each iteration the procedure generates a
    subset V of the set N(S) of the neighbors of the
    current state S and select the best.
  • The subset does not contain any state which is
    recorded in the Tabu list. This avoids the
    cycling or at least reduces its probability. The
    Tabu list is updated each time the current state
    is updated. This forbids moves which should bring
    back to a previous explored state.

70
Tabu Search Procedure
  • Procedure TS()
  • p Initialize() // set an initial state, i.e.,
    pick a random PT for evaluating the query
  • OptimalPlan p // Initialize optimal plan
  • T // initialize Tabu list
  • while not(stopping condition) do // global
    loop
  • generate the set VN(S)-T by applying move(S)
    // All move accepted except tabu
  • choose the best solution p V // Pick best
    move
  • T (T-(oldest)) ??p // Update the tabu list
    by removing oldest plans and adding picked
  • if cost(p)ltcost(OptimalPlan) then OptimalPlan
    p // Maintain optimal plan
  • return(OptimalPlan)

71
Comparison of Strategies
Cost of Best Plan
II is the best with good random sampling Tabu
looks attractive
Tabu
SA
Tabu
II Join exchange
II Swap
  • 200 400 600 800 1000 1200 1400

Number of moves
72
Genetic Algorithm
  • Genetic Algorithm (GA) is a non-gradient
    optimization algorithm used for the search of
    local extremes (minimum or maximum) of functions
    with many variables and functional extremes.
  • These functions are usually defined on very
    complex and discrete domains.
  • The basic idea of GA is to use principles of
    evolution of organisms in nature.
  • Instead of working on one particular solution at
    a time, it considers a population of solutions.

73
GA Principle
Initialisation
Mutation
Crossover
Evaluation
Sort
Selection
Terminate
No
Yes
74
GA Phases
  • Initialization - randomly generate an initial
    small population of solutions (i.e., processing
    trees) from the whole search space.
  • Mutation - choose one solution (i.e., processing
    tree) from the population, and apply
    transformation rules to it.
  • Crossover - randomly choose two solutions from
    the population, and exchange their common
    subtrees in order to generate two new processing
    trees.
  • Evaluation - for each solution, evaluate the
    value of its fitness function (i.e., cost
    function),
  • Sort - sort all solutions according to their cost
    values.
  • Selection - choose certain number of the best
    solutions from the result of Sort as the parents
    of the next generation.
  • Termination - check termination criteria for
    stopping the optimization.

75
Gene Base for 5 collections
76
Mutation Operator
77
Crossover operator
78
Improved GA
79
GA Procedure
  • Procedure GA()
  • Generate the initial population PopuBasePopu
    // Initialize the base population at random
  • Sort(Popu) // Sort population of PTs on
    increasing cost
  • OptimalPlan Popu0 // Keep the best
    traversed plan
  • While not (stopping_condition) do
  • Percent 0
  • While Percent lt Part BasePopu do // Apply
    Crossover to Part of the population
  • p1 PopuRandom(BasePopu) // Randomly
    choose p1 and p2 from Popu
  • p2 PopuRandom(BasePopu)
  • Crossover(p1, p2) // Apply Crossover if
    possible
  • Percent Percent 2
  • For the rest in Popu do Mutation // Apply
    Mutation for the rest of the population
  • For (i0 i lt NewPopu i) do
    evaluate(Popui) // Compute cost for new
    population
  • Sort(Popu) // Sort population of PTs on
    increasing cost
  • if (Popu0 lt OptimalPlan) then OptimalPlan
    Popu0 // Keep the best traversed plan
  • // Optional replacement of the worst elements
  • Percent 0
  • i BasePopu // Initialize for replacement
  • While Percent lt Repl BasePopu do // Apply
    replacement to Repl of the population

G. Gardarin
80
8. Open Problems
  • Efficient control of rule applications
  • What is the best strategy (Genetic ?)
  • Simple but accurate rule gain estimator
    (Priorities ?)
  • Estimation of method costs
  • Statistics keep average and variations at each
    call
  • Revelation the user provide a cost estimate
    attribute
  • Disencapsulation understand the method code
  • Problem late binding complexifies the
    estimation
  • Querying bulk types
  • Collections may be list, array, trees or matrices
  • Ordered collections may requires additional
    operators
  • Cost model should be extended to capture
    aggregates

G. Gardarin
81
Exercice Strategies
  • Discuss the advantages and inconvenients of each
    search strategy
  • Compare them in case of a small rule base and a
    large rule base

G. Gardarin
82
Conclusion
  • The Query Optimizer is a key component of a DBMS
  • Relational techniques can be generalized
  • Object algebra
  • Cost model
  • Search strategies
  • New techniques are required for
  • Extensibility of data types
  • Optimizing path expressions
  • Optimizing method calls
  • Optimizing bulk data types
  • Path index maintenance and access
  • Not much is done on new features in OODBMS

G. Gardarin
83
For More Informations (1)
  • Finance, Gardarin, IEEE DE 91
  • LORA Algebra and EDS extensible optimizer
  • Finance, Gardarin, DKE 93
  • Rule Language for extensible optimizer
  • Gardarin,Gruser, Tang, VLDB95
  • Cost model for OODBs Scan, DFF, validation on
    O2
  • Gardarin,Gruser, Tang, VLDB96
  • Analytical experimental comparisons of
    DFF,BFF,RBFF
  • Gardarin, in Advances in OO DB Systems,
    Springer94
  • Object rule language, optimisation of recursive
    updates extension of LORA to recursion

G. Gardarin
84
For More Informations (2)
  • Mitchell, Zdonik, Dayal, in Advances in OO DB
    Systems , Springer V. 94
  • Optimization of OO query languages Problems and
    Approaches
  • Cluet, Delobel, SIGMOD 92
  • A General Framework for the Optimization of
    Object-Oriented Queries
  • Kemper, Moerkotte, VLDB 90
  • Advance Query Processing in An Object Bases Using
    Access Support Relations
  • Lanzelotte, Valduriez, VLDB 91
  • Extending the Search Strategy in a Query Optimizer

G. Gardarin
85
Path Index
  • Multi-index Gemstone
  • les chemins sont des séquences de variables
    appartenant à la structure des objets.
  • Les index sont définis pour chaque lien du chemin
  • les index représentent des identifiants d'objet
    ou les valeurs des variables.
  • Implémentés comme des B trees.
  • Nested index Bertino 89
  • une seule entrée définie pour toute la longueur
    d'un chemin.
  • Accès au début seulement en connaissant la fin du
    chemin.
  • Utile seulement si l'on connait parfaitement le
    chemin et qu'il est utilisé souvent.
  • Difficile à maintenir.
  • Path index Bertino 89 Kemper 90
  • associe la fin du chemin avec tous les suffixes
    du chemin
  • il fonctionne avec des sous-chemins
  • Implémenté comme des relations
  • chaque colonne d'un tuple correspond à un pas du
    chemin
  • chaque champs du tuple contient un identifiant
    d'objet ou une valeur.

G. Gardarin
Write a Comment
User Comments (0)
About PowerShow.com