NiagaraCQ - PowerPoint PPT Presentation

Slides: 50
Provided by: amir7
Transcript and Presenter's Notes

Title: NiagaraCQ


1
NiagaraCQ
  • A Scalable Continuous Query System for Internet
    Databases

2
Outline
  1. Problem
  2. NiagaraCQ
  3. Selection Placement Strategies
  4. Dynamic Regrouping Algorithm

3
Problem
There is no scalable and efficient system that supports persistent queries,
which allow users to receive new results as they become available.
Example: "Notify me whenever the price of Dell stock drops by more than 5%
and the price of Intel stock remains unchanged over the next three months."
4
NiagaraCQ
  • Supports continuous queries
      • Change-based queries
      • Timer-based queries
  • Scalability
  • Performance
  • Suited to the Internet
  • User interface - high-level query language

5
Command Language
  • Create a continuous query:
      CREATE CQ_name
        XML-QL query
        DO action
        START start_time EVERY time_interval
        EXPIRE expiration_time
  • Delete a continuous query:
      DELETE CQ_name

6
Expression Signature
An expression signature represents the same syntax structure, but possibly
different constant values, in different queries.

Where <Quotes> <Quote> <Symbol>INTC</> </> </> element_as $g
  in "http://www.cs.wisc.edu/db/quotes.xml"
construct $g

Where <Quotes> <Quote> <Symbol>MSFT</> </> </> element_as $g
  in "http://www.cs.wisc.edu/db/quotes.xml"
construct $g
7
Expression Signature (2)
Quotes.Quote.Symbol = constant, in quotes.xml
8
Query Plan
[Figure: two separate query plans. Each reads quotes.xml with a File Scan, applies a Select (Symbol = "INTC" or Symbol = "MSFT"), and fires Trigger Action I or Trigger Action J respectively.]
9
Group Signature
Common expression signature of all queries in the
group
Quotes.Quote.Symbol = constant, in quotes.xml
10
Group Constant Table
Constant_value    Destination_buffer
INTC              Dest. I
MSFT              Dest. J

11
Group Plan
[Figure: group plan. A File Scan of quotes.xml is joined with the constant table file on Symbol = Constant_value; a Split operator routes the join output to Trigger Action I, Trigger Action J, and so on.]
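The group plan above can be sketched in a few lines of Python. This is only an illustration, assuming in-memory lists in place of quotes.xml and the constant table file; the function name `run_group_plan` and the sample prices are hypothetical.

```python
from collections import defaultdict

def run_group_plan(quotes, constant_table):
    """Join scanned tuples with the constant table, then split by destination."""
    # Index the constant table by constant value (assumed in-memory layout).
    destinations_by_constant = defaultdict(list)
    for constant_value, destination in constant_table:
        destinations_by_constant[constant_value].append(destination)

    buffers = defaultdict(list)
    for quote in quotes:  # File Scan over the quotes
        # Join on Symbol = Constant_value
        for destination in destinations_by_constant.get(quote["Symbol"], []):
            buffers[destination].append(quote)  # Split into destination buffers
    return dict(buffers)

quotes = [
    {"Symbol": "INTC", "Price": 30.0},
    {"Symbol": "MSFT", "Price": 60.0},
    {"Symbol": "AOL", "Price": 45.0},
]
constant_table = [("INTC", "Dest.I"), ("MSFT", "Dest.J")]
buffers = run_group_plan(quotes, constant_table)
```

A single scan of the data serves every query in the group; the AOL tuple matches no constant and is dropped.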
12
Incremental Grouping Algorithm
  1. Group optimizer traverses the query plan bottom
    up.
  2. Matches the query's expression signature with the
    signatures of existing groups.

[Figure: plan of the new query. quotes.xml is read by a File Scan, followed by Select Symbol = "AOL" and a Trigger Action.]
13
Incremental Grouping Algorithm (2)
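  • Group optimizer breaks the query plan into two
    parts.
      • The lower part is removed.
      • The upper part is added onto the group plan.
  • Adds the constant to the constant table.

[Figure: the same query plan (File Scan of quotes.xml, Select Symbol = "AOL", Trigger Action), split into a lower and an upper part.]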
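A minimal sketch of the incremental grouping step for selection queries follows. It assumes a query is a small dictionary and that its signature is the query with the constant abstracted away; `GroupOptimizer`, `signature`, and the field names are hypothetical, not NiagaraCQ's actual interfaces.

```python
def signature(query):
    """Expression signature: the query structure without its constant."""
    return (query["path"], query["op"], query["source"])

class GroupOptimizer:
    def __init__(self):
        # group signature -> constant table {constant_value: [destination, ...]}
        self.groups = {}

    def add_query(self, name, query):
        sig = signature(query)
        # Reuse the group with a matching signature, or start a new one.
        constant_table = self.groups.setdefault(sig, {})
        # Only the constant and the destination buffer are added.
        constant_table.setdefault(query["constant"], []).append("Dest." + name)
        return sig

optimizer = GroupOptimizer()
for name, symbol in [("I", "INTC"), ("J", "MSFT"), ("K", "AOL")]:
    optimizer.add_query(name, {
        "path": "Quotes.Quote.Symbol",
        "op": "=",
        "constant": symbol,
        "source": "quotes.xml",
    })
```

All three queries land in one group, so adding the AOL query costs one new row in the constant table rather than a new scan and selection.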
14
Pipeline Approach
  • Tuples are pipelined from the output of one
    operator into the input of the next operator.
  • Disadvantages
      • Doesn't work for grouping timer-based queries.
      • Split operator may become a bottleneck.
      • Not all parts of the plan need to be executed at
        each invocation.

15
Intermediate Files
16
Intermediate Files (2)
  • Advantages
      • Intermediate files and data sources are monitored
        uniformly.
      • Each query is scheduled independently.
      • The potential bottleneck problem of the pipelined
        approach is avoided.
  • Disadvantages
      • Extra disk I/Os.
      • Split operator becomes a blocking operator.

17
Virtual Intermediate Files
Where <Quotes> <Quote> <Change_ratio>$c</> </> </> element_as $g
  in "quotes.xml", $c > 0.05
construct $g

Where <Quotes> <Quote> <Change_ratio>$c</> </> </> element_as $g
  in "quotes.xml", $c > 0.15
construct $g

Signature: Quotes.Quote.Change_Ratio > constant, in quotes.xml
The results of the two queries overlap.
18
Virtual Intermediate Files (2)
  • All outputs from split operator are stored in one
    real intermediate file.
  • This file has index on the range attribute.
  • Virtual intermediate files store a value range.
  • Modification of virtual intermediate files can
    trigger upper-level queries.
  • The value range is used to retrieve data from the
    real intermediate file.
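The idea can be sketched as follows, assuming a sorted in-memory list stands in for the indexed real intermediate file and that every virtual file has a range of the form "greater than a constant". The class and method names are hypothetical.

```python
import bisect

class RealIntermediateFile:
    """Single store for all split output, kept sorted on the range attribute."""
    def __init__(self):
        self._keys = []  # sorted range-attribute values (plays the role of the index)
        self._rows = []  # rows in the same order as _keys

    def insert(self, key, row):
        pos = bisect.bisect_right(self._keys, key)
        self._keys.insert(pos, key)
        self._rows.insert(pos, row)

    def scan_greater_than(self, low):
        start = bisect.bisect_right(self._keys, low)
        return self._rows[start:]

class VirtualIntermediateFile:
    """Stores only a value range; the data stays in the real file."""
    def __init__(self, real_file, low):
        self.real_file = real_file
        self.low = low  # the range is (low, +infinity)

    def read(self):
        return self.real_file.scan_greater_than(self.low)

real = RealIntermediateFile()
for symbol, ratio in [("A", 0.07), ("B", 0.20), ("C", 0.10)]:
    real.insert(ratio, {"Symbol": symbol, "Change_ratio": ratio})

over_5 = VirtualIntermediateFile(real, 0.05)
over_15 = VirtualIntermediateFile(real, 0.15)
```

The tuple for B satisfies both predicates but is stored once, which is the point of avoiding one physical file per overlapping query.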

19
Event Detection
  • Types of events
      • Data-source change
      • Timer
  • Types of data sources
      • Push-based
      • Pull-based

20
Timer-based
  • Timer events are stored in an event list, sorted
    in time order.
  • Each entry stores query ids.
  • Query will be fired if its data source has been
    modified since its last firing time.
  • After a timer event, the next firing times are
    calculated and the queries are added into the
    corresponding entries.
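A small sketch of such an event list, assuming a binary heap ordered by firing time with one entry per query (the slide groups query ids per entry; this is a simplification). `TimerEventList` and its arguments are hypothetical names.

```python
import heapq

class TimerEventList:
    def __init__(self):
        self._heap = []  # entries: (firing_time, query_id, interval)

    def schedule(self, query_id, start_time, interval):
        heapq.heappush(self._heap, (start_time, query_id, interval))

    def fire_due(self, now, last_modified, last_fired):
        """Process timer events due at `now`; return the ids of queries to run."""
        fired = []
        while self._heap and self._heap[0][0] <= now:
            firing_time, query_id, interval = heapq.heappop(self._heap)
            # Fire only if the data source changed since the last firing.
            if last_modified[query_id] > last_fired.get(query_id, float("-inf")):
                fired.append(query_id)
                last_fired[query_id] = now
            # Compute the next firing time and re-insert the entry.
            heapq.heappush(self._heap, (firing_time + interval, query_id, interval))
        return fired

events = TimerEventList()
events.schedule("q1", start_time=10, interval=10)
events.schedule("q2", start_time=10, interval=5)
last_modified = {"q1": 8, "q2": 0}
last_fired = {"q2": 5}

first = events.fire_due(10, last_modified, last_fired)   # q1 changed, q2 did not
second = events.fire_due(15, last_modified, last_fired)  # nothing changed
last_modified["q2"] = 18
third = events.fire_due(20, last_modified, last_fired)   # q2 changed at time 18
```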

21
Incremental Evaluation
  • Queries are invoked only on changed data.
  • For each file, NiagaraCQ keeps a delta file.
  • Queries are run over delta files.
  • Incremental evaluation of join operators requires
    complete data files.
  • A timestamp is added to each tuple in order to
    support timer-based queries.
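The following sketch illustrates these points under simple assumptions: a delta file is a list of (timestamp, row) pairs, and only one side of the join has changed. The function names and sample data are hypothetical.

```python
def changes_between(delta_file, last_fire, current_fire):
    """Rows whose timestamp falls in (last_fire, current_fire]."""
    return [row for ts, row in delta_file if last_fire < ts <= current_fire]

def incremental_select(delta_rows, predicate):
    # A selection needs only the changed rows.
    return [row for row in delta_rows if predicate(row)]

def incremental_join(delta_left, full_right, key):
    # A join needs the complete other input, not just its delta.
    index = {}
    for row in full_right:
        index.setdefault(row[key], []).append(row)
    return [(l, r) for l in delta_left for r in index.get(l[key], [])]

delta_quotes = [
    (1, {"Symbol": "INTC", "Price": 95}),
    (5, {"Symbol": "MSFT", "Price": 101}),
    (9, {"Symbol": "AOL", "Price": 80}),
]
profiles = [
    {"Symbol": "MSFT", "Name": "Microsoft"},
    {"Symbol": "AOL", "Name": "America Online"},
]

new_rows = changes_between(delta_quotes, last_fire=1, current_fire=9)
selected = incremental_select(new_rows, lambda q: q["Price"] > 90)
joined = incremental_join(new_rows, profiles, "Symbol")
```

The timestamps let a timer-based query pick up exactly the changes that arrived since its own last firing, independent of when other queries ran.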

22
Memory Caching
  • Query plans - using LRU policy that favors
    frequently fired queries.
  • Data files - favors the delta files.
  • Event list - only a time window is kept in memory.

23
System Architecture
24
Continuous Query Processing
[Figure: interaction among the Continuous Query Manager (CQM), Event Detector (ED), Data Manager (DM), and Query Engine (QE). The numbered steps are:]
  1. CQM adds continuous queries with file and timer
    information to enable ED to monitor the events.
  2. ED asks DM to monitor changes to files.
  3. When a timer event happens, ED asks DM for the last
    modified time of files.
  4. DM informs ED of changes to push-based data
    sources.
  5. If file changes and timer events are satisfied,
    ED provides CQM with a list of firing CQs.
  6. CQM invokes QE to execute firing CQs.
  7. File scan operator calls DM to retrieve selected
    documents.
  8. DM only returns changes between last fire time
    and current fire time.
25
Selection Placement Strategies
Where <Quotes><Quote><Symbol>$s</> <Price>$p</></> element_as $g </>
  in "quotes.xml", $p > 90
  <Companies><Company><Symbol>$s</></> element_as $t </>
  in "profiles.xml"
construct $g, $t

Where <Quotes><Quote><Symbol>$s</> <Price>$p</></> element_as $g </>
  in "quotes.xml", $p > 100
  <Companies><Company><Symbol>$s</></> element_as $t </>
  in "profiles.xml"
construct $g, $t
26
Expression Signatures
Selection: Quotes.Quote.Price > constant, in quotes.xml
Join: Symbol = Symbol, between quotes.xml and profiles.xml
27
Where to place the selection operator ?
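  • Below the join - PushDown
      • $(\sigma_1 R \bowtie S) \cup (\sigma_2 R \bowtie S) \cup \dots \cup (\sigma_n R \bowtie S)$
  • Above the join - PullUp
      • $\sigma_1(R \bowtie S) \cup \sigma_2(R \bowtie S) \cup \dots \cup \sigma_n(R \bowtie S)$
  • PullUp achieves an average 10-fold performance
    improvement over PushDown.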
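The two placements return the same answers and differ only in how much work is shared. A minimal sketch, assuming tiny in-memory relations and a nested-loop join; the data and function names are illustrative.

```python
def join(r, s, key):
    return [(a, b) for a in r for b in s if a[key] == b[key]]

def push_down(r, s, predicates, key):
    # One selection and one join per query.
    return [join([a for a in r if p(a)], s, key) for p in predicates]

def pull_up(r, s, predicates, key):
    # The join is computed once and shared; selections run on its output.
    joined = join(r, s, key)
    return [[pair for pair in joined if p(pair[0])] for p in predicates]

quotes = [
    {"Symbol": "INTC", "Price": 95},
    {"Symbol": "MSFT", "Price": 105},
    {"Symbol": "AOL", "Price": 80},
]
profiles = [
    {"Symbol": "INTC", "Name": "Intel"},
    {"Symbol": "MSFT", "Name": "Microsoft"},
    {"Symbol": "AOL", "Name": "America Online"},
]
predicates = [lambda q: q["Price"] > 90, lambda q: q["Price"] > 100]

push_results = push_down(quotes, profiles, predicates, "Symbol")
pull_results = pull_up(quotes, profiles, predicates, "Symbol")
```

With n queries, `push_down` runs n joins while `pull_up` runs one, at the price of joining tuples (here AOL) that no query selects.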

28
PushDown - Query Plan
[Figure: PushDown query plan. Select Price > 90 is applied to quotes.xml, and its output is joined with profiles.xml.]
29
PushDown - Group Plans
30
PullUp - Group Plans
31
PullUp Vs. PushDown
  • Advantages of PullUp
      • Only one join group and one selection group
      • Maintains a single intermediate file
  • Disadvantages of PullUp
      • Irrelevant tuples are joined
      • Very large intermediate file
      • Changes in profiles.xml affect the intermediate
        file (file_k), adding maintenance overhead

32
Filtered PullUp
[Figure: Filtered PullUp plan. A Selection (Price > 90) filters quotes.xml before the grouped Join with profiles.xml; the grouped selection plan then runs on the join output.]
33
Filtered PullUp Vs. PullUp
  • Only relevant tuples are joined
  • Reduces the size of the intermediate file
  • Reduces the cost of PullUp by 75%
  • Added complexity: the selection predicate may need to
    be dynamically modified (e.g., when a query with price > 70 is added)

34
Dynamic Re-grouping
  • Let $Q_1 = A \bowtie B \bowtie C$ and $Q_2 = B \bowtie C$ be two
    continuous queries submitted sequentially.
  • Incremental grouping algorithm chooses a plan
    $((A \bowtie B) \bowtie C)$ for $Q_1$.
  • Neither of these groups can be used for $Q_2$.

[Figure: query graphs over the nodes ABC, AB, and BC.]
35
Dynamic Re-grouping (2)
  • Existing queries are not regrouped with new
    grouping opportunities introduced by subsequent
    queries.
  • Reduction in the overall performance - queries
    are continuously being added and removed.
  • Naive regrouping algorithm: periodically perform
    a global query optimization
      • Expensive
      • Redundant work (already done by the incremental
        optimizer)

36
Data Structures
  • A query graph: a directed acyclic graph, with each
    node representing an existing join expression in
    the group plan.
  • Node:
      char* query;           // ASCII query plan
      SIG_TYPE sig;          // signature of the query string
      int final_node_count;  // number of users that require this query
                             // (0: non-final node; > 0: final node)
      list<Child> children;  // children of this node, where Child = (Node, weight)
      list<Node> parents;    // parents of this node
      float updateFreq;      // update frequency of this node
      float cost;            // the cost of computing this node
      // The following fields are used only for dynamic regrouping
      int reference_count;   // reference count
      bool visited;          // records whether purgeSiblings has been
                             // performed on this node

37
Data Structures (2)
  • A group table: an array of hash tables.
  • The i-th hash table holds queries with query length
    (number of joins) i.
  • Hash table entry: a mapping from a query string
    to the corresponding node in the graph.

[Figure: array of hash tables, each entry pointing to a node.]
38
Data Structures (3)
  • A query log: an array of vectors.
  • Stores new nodes that have been added since the
    last regrouping.
  • Cleared after regrouping.

[Figure: array of vectors, each element pointing to a node.]
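The group table and query log can be sketched together in Python. This is an illustration only: `QueryGraph`, its `add` method, and the reduced `Node` (without cost, signature, or update frequency) are assumptions, not the original implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    query: str
    final_node_count: int = 0
    children: list = field(default_factory=list)
    parents: list = field(default_factory=list)

class QueryGraph:
    def __init__(self, max_joins):
        # group_table[i]: query string -> node, for queries with i joins
        self.group_table = [dict() for _ in range(max_joins + 1)]
        # query_log[i]: nodes with i joins added since the last regrouping
        self.query_log = [[] for _ in range(max_joins + 1)]

    def add(self, query, joins, final=False):
        table = self.group_table[joins]
        node = table.get(query)
        if node is None:
            node = Node(query)
            table[query] = node
            self.query_log[joins].append(node)
        if final:
            node.final_node_count += 1  # one more user requires this query
        return node

    def clear_log(self):
        for level in self.query_log:
            level.clear()

graph = QueryGraph(max_joins=2)
abc = graph.add("ABC", joins=2, final=True)
ab = graph.add("AB", joins=1)
bc = graph.add("BC", joins=1, final=True)
abc_again = graph.add("ABC", joins=2, final=True)  # second user, same query
```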
39
Incremental Grouping Algorithm
  • Top-down local exhaustive search
  • If the query exists, increases the final node
    count by 1.
  • Else
      • Enumerates all possible sub-queries in a top-down
        manner and probes the group table to check
        whether a sub-query node exists.
      • Computes the minimal cost of using existing
        sub-query nodes.
      • Computes the minimal cost without using existing
        sub-query nodes.
      • The least costly plan is chosen.

40
Dynamic Regrouping Algorithm
  • Phase 1: construct links among existing nodes
    and new nodes.
  • Phase 2: find a minimal-weight solution from the
    current solution by removing redundant nodes.

[Figure: query graph with nodes ABC, BC, and AB.]
41
Phase 1: constructing links among existing nodes
and new nodes
  • Main idea: for any pair of nodes in the graph,
    if one node is a sub-query of the other, a link
    is created between them if it did not exist
    before.
  • Relationships are only evaluated between existing
    nodes and nodes added since last regrouping.
  • The difference of levels between a parent and a
    child is always 1.

42
Phase 1 - Algorithm
  Bottom-up:
    for each node in the level-i query log
      if node has parents in the level-(i+1) group table
        connect node to parents
      if node has children in the level-(i-1) group table
        connect node to children
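The loop above can be made concrete with a small sketch. It assumes each join query is modelled as a frozenset of relation names, so that "is a sub-query of" becomes a proper-subset test; this modelling, like the function name `build_links`, is an assumption for illustration.

```python
def build_links(group_table, query_log, links):
    """group_table[i]: set of nodes with i joins; query_log[i]: new nodes with i joins.

    Adds (parent, child) pairs to `links` for every new node and returns it.
    """
    for level in range(len(query_log)):  # bottom-up
        for node in query_log[level]:
            if level + 1 < len(group_table):
                for parent in group_table[level + 1]:
                    if node < parent:  # node is a sub-query of parent
                        links.add((parent, node))
            if level >= 1:
                for child in group_table[level - 1]:
                    if child < node:  # child is a sub-query of node
                        links.add((node, child))
    return links

AB, BC, ABC = frozenset("AB"), frozenset("BC"), frozenset("ABC")
group_table = [set(), {AB, BC}, {ABC}]  # indexed by number of joins
query_log = [[], [BC], []]              # BC was added since the last regrouping
links = build_links(group_table, query_log, {(ABC, AB)})
```

Only BC, the new node, is compared against its neighbouring levels; the existing ABC-AB link is left alone.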

43
Phase 2: a greedy algorithm for level-wise graph
minimization
  • Main idea: traverse the query graph
    level by level and attempt to remove
    redundant nodes one level at a time.
  • Starts from the second level from the top.
  • A subset of the level-i nodes is retained such that
      • every node at level i+1 has at least one child in
        this set, and
      • these nodes have a minimum total cost.
  • Nodes that are not selected are removed
    permanently.

44
Phase 2 - Algorithm
MinimizeGraph()
  for each level L in group-table
      // L ranges from (maximum number of joins - 1) down to 1
    for each node N in the level-L group table
      InitializeSet(N)
    for each node N in finalSet
      PurgeSiblings(N)
    while (remain set is not empty)
      scan each node R in the remain set
        if (R's reference count = 0)
          remove R from the remain set
          deleteNode(R)
        else if (R.cost / R.reference_count < Current_minimum)
          M = R
          Current_minimum = R.cost / R.reference_count
      // scan
      remove M from the remain set
      PurgeSiblings(M)
    // while
  // for each level
// MinimizeGraph

InitializeSet(Node N)
  if N is a final node
    add N into final_set
  else
    add N into the remain_set
  N.reference_count = number of parents of N
  N.visited = false

PurgeSiblings(Node N)
  for each parent P of N
    if (!P.visited)
      decrease the reference count of N's siblings
        of the same parent P by 1
      P.visited = true
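A runnable Python rendering of the pseudocode above, offered as a sketch. The `Node` class, the `link` helper, the cost values, and the decision to reset the current minimum on every pass are assumptions that the slide does not spell out.

```python
import math

class Node:
    def __init__(self, name, cost, final=False):
        self.name, self.cost, self.final = name, cost, final
        self.parents, self.children = [], []
        self.reference_count = 0
        self.visited = False

def link(parent, child):
    parent.children.append(child)
    child.parents.append(parent)

def purge_siblings(node):
    for parent in node.parents:
        if not parent.visited:
            # The parent is now covered by `node`; its other children are
            # needed by one parent fewer.
            for sibling in parent.children:
                if sibling is not node:
                    sibling.reference_count -= 1
            parent.visited = True

def delete_node(node, level_nodes):
    for parent in node.parents:
        parent.children.remove(node)
    for child in node.children:
        child.parents.remove(node)
    level_nodes.remove(node)

def minimize_graph(levels):
    """levels[i]: list of nodes with i joins; processes levels top-1 down to 1."""
    for level in range(len(levels) - 2, 0, -1):
        final_set, remain = [], []
        for node in levels[level]:  # InitializeSet
            (final_set if node.final else remain).append(node)
            node.reference_count = len(node.parents)
            node.visited = False
        for node in final_set:
            purge_siblings(node)
        while remain:
            best, best_ratio = None, math.inf
            for node in list(remain):  # scan
                if node.reference_count == 0:
                    remain.remove(node)
                    delete_node(node, levels[level])
                elif node.cost / node.reference_count < best_ratio:
                    best, best_ratio = node, node.cost / node.reference_count
            if best is None:
                break
            remain.remove(best)
            purge_siblings(best)

# Q1 = ABC and Q2 = BC are user queries; AB exists only as an intermediate node.
abc = Node("ABC", cost=20, final=True)
ab = Node("AB", cost=10)
bc = Node("BC", cost=4, final=True)
link(abc, ab)
link(abc, bc)
levels = [[], [ab, bc], [abc]]
minimize_graph(levels)
```

In this example BC must be kept because a user asked for it, so it covers ABC and the intermediate node AB becomes redundant and is deleted.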
45
Cost Analysis
  • $N$: number of queries
  • The number of nodes is proportional to the number of
    queries: $CN$
  • Each query contains no more than 10 joins.
  • Each level contains about $CN/10$ nodes.

46
Cost Analysis Phase 1
  • Regrouping frequencies: $R$ or $KR$
  • With frequency $R$:
      • $N/R$: number of regroupings
      • $CR$: number of nodes that will be joined with
        existing nodes
      • $mCR$: number of nodes after $m-1$ regroupings
      • $m(CR)^2$: number of comparisons for the $m$-th
        regrouping (ignoring a constant reduction)

47
Cost Analysis Phase 1 (2)
  • Total number of comparisons, frequency $R$:
      • $(CR)^2 + 2(CR)^2 + \dots + (N/R)(CR)^2$
      • $= N(N+R)C^2/2 = O(N^2)$
  • Total number of comparisons, frequency $KR$:
      • $(CKR)^2 + \dots + (N/(KR))(CKR)^2$
      • $= N(N+KR)C^2/2$
  • The ratio:
      • $\frac{N(N+KR)C^2/2}{N(N+R)C^2/2} = \frac{N+KR}{N+R}$
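The closed forms can be checked numerically. A small sketch with arbitrarily chosen values for N, R, C, and K, picked so that R and KR divide N evenly:

```python
def total_comparisons(n, r, c):
    """Sum of m * (c*r)^2 over the n/r regroupings, m = 1 .. n/r."""
    return sum(m * (c * r) ** 2 for m in range(1, n // r + 1))

def closed_form(n, r, c):
    return n * (n + r) * c ** 2 / 2

N, R, C, K = 1000, 10, 2, 5

sum_r = total_comparisons(N, R, C)
sum_kr = total_comparisons(N, K * R, C)
ratio = sum_kr / sum_r
```

For these values the ratio is 1050/1010, about 1.04: regrouping five times less often raises the total Phase 1 comparison count only slightly when N is much larger than KR.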

48
Cost Analysis Phase 2
  • Worst case: each pass removes one node.
  • Cost for a level:
      • $(CN/10) + (CN/10 - 1) + \dots + 1$
      • $= CN(CN+10)/200 = O(N^2)$
  • Purge siblings:
      • $(CN/10) \times (CN/10) = (CN)^2/100 = O(N^2)$
  • All 9 levels: $O(N^2)$

49
References
  • NiagaraCQ: A Scalable Continuous Query System for
    Internet Databases
      • http://www.cs.wisc.edu/niagara/papers/NiagaraCQ.pdf
  • Design and Evaluation of Alternative Selection
    Placement Strategies in Optimizing Continuous
    Queries
      • http://www.cs.wisc.edu/niagara/papers/Icde02.pdf
  • Dynamic Re-grouping of Continuous Queries
      • http://www.cs.wisc.edu/niagara/papers/507.pdf