Title: NiagaraCQ
1NiagaraCQ
- A Scalable Continuous Query System for Internet Databases
2Outline
- Problem
- NiagaraCQ
- Selection Placement Strategies
- Dynamic Regrouping Algorithm
3Problem
Lack of a scalable and efficient system that
supports persistent queries, which allow users to
receive new results when they become available.
Example: "Notify me whenever the price of Dell
stock drops by more than 5% and the price of
Intel stock remains unchanged over the next three
months."
4NiagaraCQ
- Supports continuous queries
- Change-based queries
- Timer-based queries
- Scalability
- Performance
- Adequate to the Internet
- User Interface - high level query language
5Command Language
- Create continuous query
- CREATE CQ_name
- XML-QL query
- DO action
- START start_time EVERY time_interval
- EXPIRE expiration_time
- Delete continuous query
- DELETE CQ_name
6Expression Signature
Represents the same syntax structure, but possibly
different constant values, in different queries.
Query 1:
  Where <Quotes> <Quote> <Symbol>INTC</> </> </> element_as $g
  in "http://www.cs.wisc.edu/db/quotes.xml"
  construct $g
Query 2:
  Where <Quotes> <Quote> <Symbol>MSFT</> </> </> element_as $g
  in "http://www.cs.wisc.edu/db/quotes.xml"
  construct $g
7Expression Signature (2)
Quotes.Quote.Symbol = constant in quotes.xml
8Query Plan
Figure: two separate query plans over quotes.xml.
- Query 1: File Scan (quotes.xml) -> Select Symbol = INTC -> Trigger Action I
- Query 2: File Scan (quotes.xml) -> Select Symbol = MSFT -> Trigger Action J
9Group Signature
Common expression signature of all queries in the
group
Quotes.Quote.Symbol = constant in quotes.xml
10Group Constant Table
Constant_value    Destination_buffer
INTC              Dest. I
MSFT              Dest. J
11Group Plan
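Figure: group plan.
- File Scan (quotes.xml) and File Scan (Constant Table) feed a Join on Symbol = Constant_value.
- A Split operator routes the join output to the destination buffers.
- Trigger Action I, Trigger Action J, ... consume the destination buffers.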
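The group plan above can be sketched in a few lines of Python. This is an illustrative sketch only, not NiagaraCQ code: the function name `run_group_plan`, the dictionary representation of the constant table, and the sample quotes are assumptions made for the example, and it simplifies by giving each constant exactly one destination buffer.

```python
def run_group_plan(quotes, constant_table):
    """Evaluate one group plan: file scan, join with the constant table, then split."""
    buffers = {dest: [] for dest in constant_table.values()}
    for quote in quotes:                              # file scan over quotes.xml
        dest = constant_table.get(quote["Symbol"])    # join: Symbol = Constant_value
        if dest is not None:
            buffers[dest].append(quote)               # split into the destination buffer
    return buffers


# Hypothetical sample data standing in for quotes.xml.
quotes = [
    {"Symbol": "INTC", "Price": 31.5},
    {"Symbol": "MSFT", "Price": 64.0},
    {"Symbol": "AOL", "Price": 52.25},
]
constant_table = {"INTC": "I", "MSFT": "J"}   # Constant_value -> Destination_buffer
buffers = run_group_plan(quotes, constant_table)
```

The file is scanned once for all queries in the group; a quote whose symbol is not in the constant table (AOL here) reaches no buffer.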
12Incremental Grouping Algorithm
- Group optimizer traverses the query plan bottom up.
- Matches the query's expression signature with the
signatures of existing groups.
Figure: new query plan, File Scan (quotes.xml) -> Select Symbol = AOL -> Trigger Action.
13Incremental Grouping Algorithm (2)
- Group optimizer breaks the query plan into two parts.
  - Lower part: removed.
  - Upper part: added onto the group plan.
- Adds the constant to the constant table.
Figure: new query plan, File Scan (quotes.xml) -> Select Symbol = AOL -> Trigger Action.
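The signature-matching step can be illustrated with a small Python sketch. It is a simplification under stated assumptions: a signature is modelled as a plain tuple, each group is reduced to its constant table, and the function name `add_query` and the destination label "K" are hypothetical.

```python
def add_query(groups, signature, constant, destination):
    """Add a selection query to the group with a matching signature.

    Creates a new group when no signature matches. Returns True when an
    existing group was reused.
    """
    reused = signature in groups
    constant_table = groups.setdefault(signature, {})
    constant_table[constant] = destination    # add the constant to the constant table
    return reused


# Signature modelled as (path, operator, data source); an assumption for this sketch.
SYMBOL_SIG = ("Quotes.Quote.Symbol", "=", "quotes.xml")
groups = {SYMBOL_SIG: {"INTC": "I", "MSFT": "J"}}

# The new query "Symbol = AOL" matches the existing group signature.
reused = add_query(groups, SYMBOL_SIG, "AOL", "K")
```

Only the constant and its destination are new; the file scan and join of the existing group plan are shared by the added query.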
14Pipeline Approach
- Tuples are pipelined from the output of one
operator into the input of the next operator.
- Disadvantages
- Doesn't work for grouping timer-based queries.
- Split operator may become a bottleneck.
- Not all parts of the plan should be executed.
15Intermediate Files
16Intermediate Files (2)
- Advantages
- Intermediate files and data sources are monitored
uniformly.
- Each query is scheduled independently.
- The potential bottleneck problem of the pipelined
approach is avoided.
- Disadvantages
- Extra disk I/Os.
- Split operator becomes a blocking operator.
17Virtual Intermediate Files
Query 1:
  Where <Quotes> <Quote> <Change_ratio>$c</> </> </> element_as $g
  in "quotes.xml", $c > 0.05
  construct $g
Query 2:
  Where <Quotes> <Quote> <Change_ratio>$c</> </> </> element_as $g
  in "quotes.xml", $c > 0.15
  construct $g
Group signature: Quotes.Quote.Change_Ratio > constant in quotes.xml
Overlap: every quote that satisfies $c > 0.15 also satisfies $c > 0.05, so the outputs of the two queries overlap.
18Virtual Intermediate Files (2)
- All outputs from split operator are stored in one
real intermediate file.
- This file has an index on the range attribute.
- Virtual intermediate files store a value range.
- Modification of virtual intermediate files can
trigger upper-level queries.
- The value range is used to retrieve data from the
real intermediate file.
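A minimal Python sketch of the idea, assuming range predicates of the form "attribute > constant" as in the two queries above. The class names, the sorted-list index, and the sample rows are assumptions made for illustration; they are not taken from NiagaraCQ.

```python
import bisect


class RealIntermediateFile:
    """Single real file holding all split outputs, indexed on the range attribute."""

    def __init__(self, rows, attr):
        self.rows = sorted(rows, key=lambda row: row[attr])
        self.keys = [row[attr] for row in self.rows]   # sorted keys act as the index

    def read_greater_than(self, low):
        start = bisect.bisect_right(self.keys, low)    # first key strictly above low
        return self.rows[start:]


class VirtualIntermediateFile:
    """Stores only a value range; data is fetched from the real file on demand."""

    def __init__(self, real_file, low):
        self.real_file = real_file
        self.low = low

    def read(self):
        return self.real_file.read_greater_than(self.low)


# Hypothetical rows; Change_ratio is the range attribute.
real = RealIntermediateFile(
    [{"Symbol": "A", "Change_ratio": 0.02},
     {"Symbol": "B", "Change_ratio": 0.20},
     {"Symbol": "C", "Change_ratio": 0.07}],
    attr="Change_ratio",
)
over_5 = VirtualIntermediateFile(real, 0.05)    # query with c > 0.05
over_15 = VirtualIntermediateFile(real, 0.15)   # query with c > 0.15
```

Here `over_5.read()` returns the rows for C and B while `over_15.read()` returns only B, so the overlapping tuple is stored once in the real file.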
19Event Detection
- Types of Events
- Data-source change
- Timer
- Types of data sources
- Push-based
- Pull-based
20Timer-based
- Timer events are stored in an event list, sorted
in time order.
- Each entry stores query ids.
- A query will be fired if its data source has been
modified since its last firing time.
- After a timer event, the next firing times are
calculated and the queries are added into the
corresponding entries.
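The event list described above might be sketched as follows. This is an illustrative Python sketch, not the NiagaraCQ implementation: the class and method names are hypothetical, times are plain numbers, intervals are assumed to be positive, and the last-modified times are passed in as a dictionary rather than obtained from a data manager.

```python
import heapq


class TimerEventList:
    """Event list kept in time order; each entry holds the ids of queries due then."""

    def __init__(self):
        self._times = []        # min-heap of distinct firing times
        self._entries = {}      # firing time -> list of query ids
        self._queries = {}      # query id -> (data source, interval)
        self._last_fired = {}   # query id -> last firing time

    def _schedule(self, time, query_id):
        if time not in self._entries:
            self._entries[time] = []
            heapq.heappush(self._times, time)
        self._entries[time].append(query_id)

    def add_query(self, query_id, source, start, interval):
        self._queries[query_id] = (source, interval)
        self._last_fired[query_id] = float("-inf")
        self._schedule(start, query_id)

    def advance(self, now, last_modified):
        """Process every timer event up to `now`; return ids of the queries that fire."""
        fired = []
        while self._times and self._times[0] <= now:
            time = heapq.heappop(self._times)
            for query_id in self._entries.pop(time):
                source, interval = self._queries[query_id]
                # Fire only if the source changed since this query last fired.
                if last_modified.get(source, float("-inf")) > self._last_fired[query_id]:
                    fired.append(query_id)
                    self._last_fired[query_id] = time
                # Compute the next firing time and add the query to that entry.
                self._schedule(time + interval, query_id)
        return fired
```

For example, a query scheduled at time 10 with interval 10 over a source last modified at time 5 fires at 10, is skipped at 20 because nothing changed, and fires again at 30 if the source was modified at 25.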
21Incremental Evaluation
- Queries are invoked only on changed data.
- For each file, NiagaraCQ keeps a delta file.
- Queries are run over delta files.
- Incremental evaluation of join operators requires
complete data files.
- A time stamp is added to each tuple in order to
support timer-based queries.
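A small Python sketch of time-stamped delta files, under the assumption that a delta file is an append-only list of (timestamp, tuple) pairs. The class name `DeltaFile` and its methods are hypothetical.

```python
class DeltaFile:
    """Append-only record of changes to one data file, each with a time stamp."""

    def __init__(self):
        self.entries = []   # (timestamp, tuple) pairs in arrival order

    def append(self, timestamp, row):
        self.entries.append((timestamp, row))

    def changes_between(self, last_fire, now):
        """Tuples added after the last firing time and up to the current one."""
        return [row for ts, row in self.entries if last_fire < ts <= now]


delta = DeltaFile()
delta.append(3, {"Symbol": "INTC", "Price": 30.0})
delta.append(8, {"Symbol": "INTC", "Price": 31.5})
delta.append(12, {"Symbol": "MSFT", "Price": 64.0})

# A timer-based query that last fired at time 5 and fires again at time 10
# sees only the change recorded at time 8.
changed = delta.changes_between(5, 10)
```

The time stamps let queries with different firing schedules read different slices of the same delta file.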
22Memory Caching
- Query plans: cached using an LRU policy that favors
frequently fired queries.
- Data files: caching favors the delta files.
- Event list: only a time window is kept in memory.
23System Architecture
24Continuous Query Processing
Components: Continuous Query Manager (CQM), Event Detector (ED), Query Engine (QE), Data Manager (DM).
1. CQM adds continuous queries with file and timer
information to enable ED to monitor the events.
2. ED asks DM to monitor changes to files.
3. When a timer event happens, ED asks DM the last
modified time of files.
4. DM informs ED of changes to push-based data
sources.
5. If file changes and timer events are satisfied,
ED provides CQM with a list of firing CQs.
6. CQM invokes QE to execute firing CQs.
7. File scan operator calls DM to retrieve selected
documents.
8. DM only returns changes between last fire time
and current fire time.
25Selection Placement Strategies
Query 1:
  Where <Quotes><Quote><Symbol>$s</> <Price>$p</></> element_as $g </>
  in "quotes.xml", $p > 90
  <Companies><Company><Symbol>$s</></> element_as $t </>
  in "profiles.xml"
  construct $g, $t
Query 2:
  Where <Quotes><Quote><Symbol>$s</> <Price>$p</></> element_as $g </>
  in "quotes.xml", $p > 100
  <Companies><Company><Symbol>$s</></> element_as $t </>
  in "profiles.xml"
  construct $g, $t
26Expressions Signatures
Selection signature: Quotes.Quote.Price > constant in quotes.xml
Join signature: Symbol = Symbol between quotes.xml and profiles.xml
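27Where to place the selection operator?
- Below the join: PushDown
- $(\sigma_1 R \bowtie S) \cup (\sigma_2 R \bowtie S) \cup \dots \cup (\sigma_n R \bowtie S)$
- Above the join: PullUp
- $\sigma_1(R \bowtie S) \cup \sigma_2(R \bowtie S) \cup \dots \cup \sigma_n(R \bowtie S)$
- PullUp achieves an average 10-fold performance
improvement over PushDown.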
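The two placements compute the same answers and differ only in how much join work is shared. The Python sketch below illustrates this on in-memory lists; the function names, the nested-loop join, and the sample data are assumptions for the example, and the sketch says nothing about the relative performance reported above.

```python
def join_on_symbol(quotes, profiles):
    """Nested-loop join on Symbol = Symbol."""
    return [(q, p) for q in quotes for p in profiles if q["Symbol"] == p["Symbol"]]


def push_down(quotes, profiles, thresholds):
    """Selection below the join: one selection and one join per query."""
    return {t: join_on_symbol([q for q in quotes if q["Price"] > t], profiles)
            for t in thresholds}


def pull_up(quotes, profiles, thresholds):
    """Selection above the join: one shared join, then one selection per query."""
    joined = join_on_symbol(quotes, profiles)
    return {t: [(q, p) for q, p in joined if q["Price"] > t] for t in thresholds}


# Hypothetical sample data standing in for quotes.xml and profiles.xml.
quotes = [{"Symbol": "A", "Price": 50}, {"Symbol": "B", "Price": 95},
          {"Symbol": "C", "Price": 120}]
profiles = [{"Symbol": "A", "Name": "Alpha"}, {"Symbol": "B", "Name": "Beta"},
            {"Symbol": "C", "Name": "Gamma"}]
thresholds = [90, 100]

same_results = push_down(quotes, profiles, thresholds) == pull_up(quotes, profiles, thresholds)
```

With PullUp the join runs once regardless of the number of queries, at the price of joining tuples (such as the quote for A) that no query selects.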
28PushDown - Query Plan
Figure: PushDown query plan. Select Price > 90 is applied to quotes.xml, and its output is joined (Join) with profiles.xml.
29PushDown - Groups Plans
30PullUp - Groups Plans
31PullUp Vs. PushDown
- PullUp advantages
- Only one join group and one selection group
- Maintains a single intermediate file
- PullUp disadvantages
- Irrelevant tuples are joined
- Very large intermediate file
- Changes in profiles.xml affect the intermediate
file (file_k): maintenance overhead.
32Filtered PullUp
Figure: Filtered PullUp plan. Recoverable labels: Grouped Join Plan, Join, Selection Price > 90, inputs quotes.xml and profiles.xml.
33Filtered PullUp Vs. PullUp
- Only relevant tuples are joined
- Reduces the size of the intermediate file
- Reduces the cost of PullUp by 75%
- Complexity: the selection predicate may need to
be dynamically modified (e.g., when a query with Price > 70 is added)
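A Python sketch of the filtering idea, assuming all queries in the group use predicates of the form Price > constant so that the weakest predicate is the smallest constant. The function names and sample data are hypothetical.

```python
def join_on_symbol(quotes, profiles):
    """Nested-loop join on Symbol = Symbol."""
    return [(q, p) for q in quotes for p in profiles if q["Symbol"] == p["Symbol"]]


def filtered_pull_up(quotes, profiles, thresholds):
    """Filter with the weakest predicate, join once, then select per query."""
    weakest = min(thresholds)                                # e.g. Price > 90
    relevant = [q for q in quotes if q["Price"] > weakest]   # filter below the join
    intermediate = join_on_symbol(relevant, profiles)        # single shared join
    results = {t: [(q, p) for q, p in intermediate if q["Price"] > t]
               for t in thresholds}
    return intermediate, results


# Hypothetical sample data standing in for quotes.xml and profiles.xml.
quotes = [{"Symbol": "A", "Price": 50}, {"Symbol": "B", "Price": 95},
          {"Symbol": "C", "Price": 120}]
profiles = [{"Symbol": "A", "Name": "Alpha"}, {"Symbol": "B", "Name": "Beta"},
            {"Symbol": "C", "Name": "Gamma"}]

intermediate, results = filtered_pull_up(quotes, profiles, [90, 100])
```

With thresholds 90 and 100 the intermediate result holds two joined tuples instead of three. Adding a query with Price > 70 lowers the weakest constant, which is the dynamic modification of the filter predicate noted in the last bullet.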
34Dynamic Re-grouping
- Let Q1 = $A \bowtie B \bowtie C$ and Q2 = $B \bowtie C$ be two
continuous queries submitted sequentially.
- Incremental grouping algorithm chooses a plan
$(A \bowtie B) \bowtie C$ for Q1.
- Neither of these groups can be used for Q2.
Figure: query graphs over the nodes ABC, AB, and BC.
35Dynamic Re-grouping (2)
- Existing queries are not regrouped with new
grouping opportunities introduced by subsequent
queries.
- Reduction in the overall performance: queries
are continuously being added and removed.
- Naive regrouping algorithm: periodically perform
a global query optimization.
  - Expensive
  - Redundant work (already done by incremental optimization)
36Data Structures
- A query graph: a directed acyclic graph, with each
node representing an existing join expression in
the group plan.
- Node
  - char* query  // ASCII query plan
  - SIG_TYPE sig  // signature of the query string
  - int final_node_count  // number of users that require this query; 0: non-final node, > 0: final node
  - list<Child> children  // children of this node, where Child = (Node, weight)
  - list<Node> parents  // parents of this node
  - float updateFreq  // update frequency of this node
  - float cost  // the cost for computing this node
  - // The following fields are used only for dynamic regrouping
  - int reference_count  // reference count
  - bool visited  // a flag that records whether purgeSiblings has been performed on this node
37Data Structures (2)
- A group table: array of hash tables.
- i-th hash table: queries with query length
(number of joins) i.
- Hash table entry: mapping from a query string
to the corresponding node in the graph.
Figure: Array -> Hash table -> Node.
38Data Structures (3)
- A query log: array of vectors.
- Stores new nodes that have been added since the
last regrouping.
- Cleared after regrouping.
Figure: Array -> Vector -> Node.
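The group table and the query log can be sketched together in Python. This is a reduced illustration: the `Node` class keeps only a few of the fields listed earlier, and the class `QueryGraphIndex` with its method names is an assumption for the example.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    query: str                      # query string identifying the join expression
    final_node_count: int = 0       # number of users that require this query
    children: list = field(default_factory=list)
    parents: list = field(default_factory=list)


class QueryGraphIndex:
    def __init__(self, max_joins):
        # Index i holds the queries with i joins.
        self.group_table = [dict() for _ in range(max_joins + 1)]   # query string -> Node
        self.query_log = [[] for _ in range(max_joins + 1)]         # nodes added since last regrouping

    def lookup(self, query, joins):
        return self.group_table[joins].get(query)

    def add(self, query, joins, final=False):
        """Return the node for `query`, creating and logging it if it is new."""
        node = self.lookup(query, joins)
        if node is None:
            node = Node(query)
            self.group_table[joins][query] = node
            self.query_log[joins].append(node)
        if final:
            node.final_node_count += 1
        return node

    def clear_log(self):
        """Called after regrouping."""
        for level in self.query_log:
            level.clear()


index = QueryGraphIndex(max_joins=10)
first = index.add("A B C", joins=2, final=True)
second = index.add("A B C", joins=2, final=True)   # same query from another user
```

Submitting the same query twice yields one node with `final_node_count` 2 and a single query-log entry.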
39Incremental Grouping Algorithm
- Top-down local exhaustive search
- If the query exists, increases the final node
count by 1.
- Else
  - Enumerates all possible sub-queries in a top-down
  manner and probes the group table to check
  whether a sub-query node exists.
  - Computes the minimal cost of using existing
  sub-query nodes.
  - Computes the minimal cost without using existing
  sub-query nodes.
  - The least-costly plan is chosen.
40Dynamic Regrouping Algorithm
- Phase 1: constructing links among existing nodes
and new nodes.
- Phase 2: find a minimal-weighted solution from the
current solution by removing redundant nodes.
Figure: query graph with nodes ABC, BC, and AB.
41Phase 1: constructing links among existing nodes and new nodes
- Main idea: for any pair of nodes in the graph,
if one node is a sub-query of another node, a link
is created between them if it did not exist
before.
- Relationships are only evaluated between existing
nodes and nodes added since last regrouping.
- The difference of levels between a parent and a
child is always 1.
42Phase 1 - Algorithm
- bottom-up
- for each node in level i query log
  - if node has parents in level i+1 group table
    - connect node to parent
  - if node has children in level i-1 group table
    - connect node to children
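The linking pass can be sketched in Python under a simplifying assumption: each join expression is modelled as a frozenset of relation names, so "is a sub-query of" becomes a proper-subset test and a node's level equals its number of relations minus one. The function name and data layout are hypothetical.

```python
def link_new_nodes(group_table, query_log, links):
    """Phase 1 sketch: link nodes added since the last regrouping to their neighbours.

    group_table: level -> set of all nodes at that level (existing and new)
    query_log:   level -> list of nodes added since the last regrouping
    links:       set of (parent, child) pairs, extended in place
    """
    for level in sorted(query_log):                        # bottom-up
        for node in query_log[level]:
            for parent in group_table.get(level + 1, ()):  # candidates one level up
                if node < parent:                          # node is a sub-query of parent
                    links.add((parent, node))
            for child in group_table.get(level - 1, ()):   # candidates one level down
                if child < node:
                    links.add((node, child))
    query_log.clear()                                      # log is cleared after regrouping


ABC, AB, BC = frozenset("ABC"), frozenset("AB"), frozenset("BC")
group_table = {1: {AB, BC}, 2: {ABC}}
query_log = {1: [BC]}            # BC was added by a later query
links = {(ABC, AB)}              # link chosen earlier by incremental grouping
link_new_nodes(group_table, query_log, links)
```

After the call, `links` also contains (ABC, BC), which gives Phase 2 the option of computing ABC from BC instead of AB. Adding to a set makes the "if it did not exist before" check implicit.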
43Phase 2: a greedy algorithm for level-wise graph minimization
- Main idea: traverse the query graph
level-by-level and attempt to remove any
redundant nodes, one level at a time.
- Starts from the second level from the top.
- A subset of level-i nodes is retained such that:
  - Every node at level i+1 has at least one child in
  this set.
  - These nodes have a minimum total cost.
- Nodes that are not selected are removed
permanently.
44Phase 2 - Algorithm
MinimizeGraph() {
  for each level L in group-table {   // L ranging from the maximum number of joins - 1 to 1
    for each node N in the level-L group table
      InitializeSet(N)
    for each node N in final_set
      PurgeSiblings(N)
    while (remain set is not empty) {
      scan each node R in the remain set {
        if (R's reference count == 0) {
          remove R from the remain set
          deleteNode(R)
        } else if (R.cost / R.reference_count < Current_minimum) {
          M = R
          Current_minimum = R.cost / R.reference_count
        }
      } // scan
      remove M from the remain set
      PurgeSiblings(M)
    } // while
  } // for each level
} // MinimizeGraph

InitializeSet(Node N) {
  if N is a final node
    add N into final_set
  else
    add N into the remain_set
  N.reference_count = number of parents of N
  N.visited = false
}

PurgeSiblings(Node N) {
  for each parent P of N
    if (!P.visited) {
      decrease the reference count of N's siblings of same parent P by 1
      P.visited = true
    }
}
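For readers who prefer runnable code, the following Python sketch processes a single level in the spirit of the pseudocode above. It is an interpretation, not the authors' implementation: the `Node` class is reduced to the fields the algorithm touches, the minimum is recomputed on every pass, and nodes whose reference count reaches zero are deleted before the next choice is made.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    query: str
    cost: float
    final_node_count: int = 0
    parents: list = field(default_factory=list)
    children: list = field(default_factory=list)
    reference_count: int = 0
    visited: bool = False


def link(parent, child):
    parent.children.append(child)
    child.parents.append(parent)


def purge_siblings(node):
    """Each uncovered parent of `node` is now covered, so its other children lose a reference."""
    for parent in node.parents:
        if not parent.visited:
            for sibling in parent.children:
                if sibling is not node:
                    sibling.reference_count -= 1
            parent.visited = True


def delete_node(node):
    for parent in node.parents:
        parent.children.remove(node)
    for child in node.children:
        child.parents.remove(node)
    node.parents, node.children = [], []


def minimize_level(level_nodes):
    """Keep final nodes plus a greedily chosen cover of the parents; delete the rest."""
    final_set, remain = [], []
    for node in level_nodes:                       # InitializeSet
        (final_set if node.final_node_count > 0 else remain).append(node)
        node.reference_count = len(node.parents)
        node.visited = False
    for node in final_set:
        purge_siblings(node)
    kept = list(final_set)
    while remain:
        for node in [r for r in remain if r.reference_count == 0]:
            remain.remove(node)                    # no uncovered parent needs it
            delete_node(node)
        if not remain:
            break
        best = min(remain, key=lambda r: r.cost / r.reference_count)
        remain.remove(best)
        kept.append(best)
        purge_siblings(best)
    return kept


# Example from the slides: Q1 = ABC built from AB; Q2 = BC arrives later.
abc = Node("ABC", cost=10.0, final_node_count=1)
ab = Node("AB", cost=4.0)
bc = Node("BC", cost=4.0, final_node_count=1)
link(abc, ab)
link(abc, bc)
kept = minimize_level([ab, bc])
```

Because BC is a final node it must be kept, and it already covers ABC, so AB becomes redundant and is removed; ABC is left with BC as its only child.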
45Cost Analysis
- N: number of queries
- Number of nodes is proportional to the number of
queries: CN
- Each query contains no more than 10 joins.
- Each level contains about CN/10 nodes.
46Cost Analysis Phase 1
- R or KR: regrouping frequencies (regrouping after every R or KR queries)
- With frequency R:
- N/R: number of regroupings
- CR: number of nodes that will be joined with
existing nodes.
- mCR: number of nodes after m-1 regroupings.
- $m(CR)^2$: number of comparisons for the m-th
regrouping (ignoring a constant reduction).
47Cost Analysis Phase 1 (2)
- Total number of comparisons, frequency R:
- $(CR)^2 + 2(CR)^2 + \dots + (N/R)(CR)^2$
- $= N(N+R)C^2/2 = O(N^2)$
- Total number of comparisons, frequency KR:
- $(CKR)^2 + \dots + (N/(KR))(CKR)^2$
- $= N(N+KR)C^2/2$
- The ratio:
- $\frac{N(N+KR)C^2/2}{N(N+R)C^2/2} = \frac{N+KR}{N+R}$
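The closed forms above can be checked numerically. The short Python sketch below sums the per-regrouping comparison counts and compares the total with $N(N+R)C^2/2$; the parameter values are arbitrary, and R and KR are assumed to divide N evenly.

```python
def phase1_comparisons(n, r, c):
    """Sum m * (C*R)^2 over the N/R regroupings (assumes r divides n)."""
    per_batch = (c * r) ** 2
    return sum(m * per_batch for m in range(1, n // r + 1))


def closed_form_doubled(n, r, c):
    """Twice the closed form N(N+R)C^2/2, kept doubled to stay in integers."""
    return n * (n + r) * c * c


n, c, r, k = 10_000, 2, 10, 5
total_r = phase1_comparisons(n, r, c)
total_kr = phase1_comparisons(n, k * r, c)
ratio = total_kr / total_r          # should equal (N + KR) / (N + R)
```

For these values the totals match the closed forms exactly and the ratio equals (N + KR)/(N + R) = 10050/10010.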
48Cost Analysis Phase 2
- Worst case: each pass removes one node.
- Cost for a level:
- $(CN/10) + (CN/10 - 1) + \dots + 1$
- $= CN(CN+10)/200 = O(N^2)$
- Purge siblings:
- $(CN/10 \times CN/10) = (CN)^2/100 = O(N^2)$
- All 9 levels: $O(N^2)$
49References
- NiagaraCQ: A Scalable Continuous Query System for
Internet Databases
- http://www.cs.wisc.edu/niagara/papers/NiagaraCQ.pdf
- Design and Evaluation of Alternative Selection
Placement Strategies in Optimizing Continuous
Queries
- http://www.cs.wisc.edu/niagara/papers/Icde02.pdf
- Dynamic Re-grouping of Continuous Queries
- http://www.cs.wisc.edu/niagara/papers/507.pdf