NiagaraCQ - PowerPoint PPT Presentation

Slides: 50

Provided by: amir7

Transcript and Presenter's Notes

1
NiagaraCQ

A Scalable Continuous Query System for Internet
Databases

2
Outline

Problem
NiagaraCQ
Selection Placement Strategies
Dynamic Regrouping Algorithm

3
Problem
Lack of a scalable and efficient system that supports persistent queries, which allow users to receive new results when they become available.
Example: "Notify me whenever the price of Dell stock drops by more than 5% and the price of Intel stock remains unchanged over the next three months."
4
NiagaraCQ

Supports continuous queries
Change-based queries
Timer-based queries
Scalability
Performance
Suited to the Internet
User interface: high-level query language

5
Command Language

Create continuous query
CREATE CQ_name
XML-QL query
DO action
START start_time EVERY time_interval
EXPIRE expiration_time
Delete continuous query
DELETE CQ_name
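
As a rough illustration of the START/EVERY/EXPIRE clauses, the Python sketch below turns a timer-based CREATE statement into its sequence of firing times. The `ContinuousQuery` class and its field names are hypothetical. Treating EXPIRE as inclusive is an assumption; the slides do not specify it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ContinuousQuery:
    """Hypothetical in-memory form of CREATE ... DO ... START ... EVERY ... EXPIRE."""
    name: str           # CQ_name
    query: str          # the XML-QL query text
    action: str         # the DO action
    start: datetime     # START start_time
    every: timedelta    # EVERY time_interval
    expire: datetime    # EXPIRE expiration_time

    def firing_times(self):
        """Yield each scheduled firing time from START up to EXPIRE (assumed inclusive)."""
        t = self.start
        while t <= self.expire:
            yield t
            t += self.every
```

A query that starts on 1 January with a one-day interval and expires on 3 January would fire three times under this sketch.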

6
Expression Signature
Represents expressions that have the same syntactic structure, but possibly different constant values, in different queries. For example:
Where <Quotes><Quote><Symbol>INTC</></></>
element_as $g in "http://www.cs.wisc.edu/db/quotes.xml"
construct $g

Where <Quotes><Quote><Symbol>MSFT</></></>
element_as $g in "http://www.cs.wisc.edu/db/quotes.xml"
construct $g
7
Expression Signature (2)
Quotes.Quote.Symbol = constant in quotes.xml
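
Here is a minimal sketch of computing an expression signature. It assumes a selection predicate can be reduced to a (path, operator, constant, source) tuple. The `Predicate` type and the `signature` and `render` functions are hypothetical names. Abstracting the constant away lets the INTC and MSFT queries share one signature.

```python
from typing import NamedTuple


class Predicate(NamedTuple):
    path: str       # e.g. "Quotes.Quote.Symbol"
    op: str         # comparison operator, e.g. "="
    constant: str   # query-specific constant, e.g. "INTC"
    source: str     # data source, e.g. "quotes.xml"


def signature(p: Predicate) -> tuple:
    """Replace the constant with a placeholder so structurally identical predicates match."""
    return (p.path, p.op, "constant", p.source)


def render(sig: tuple) -> str:
    """Print a signature in the slide's notation."""
    path, op, const, source = sig
    return f"{path} {op} {const} in {source}"
```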
8
Query Plan
File Scan quotes.xml → Select Symbol=INTC → Trigger Action I
File Scan quotes.xml → Select Symbol=MSFT → Trigger Action J
9
Group Signature
Common expression signature of all queries in the group:
Quotes.Quote.Symbol = constant in quotes.xml
10
Group Constant Table
Constant_value    Destination_buffer
INTC              Dest. I
MSFT              Dest. J

11
Group Plan
File Scan quotes.xml → Join with the Constant Table (file) on Symbol = Constant_value → Split → Trigger Action I, Trigger Action J, ...
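
The group plan can be approximated in a few lines of Python. This sketch rests on three assumptions:
- Scanned tuples are plain dicts.
- The Join with the constant table is implemented as a hash lookup, which is only one possible choice.
- A constant may map to a list of destination buffers, in case several queries share it.

`run_group_plan` is a hypothetical name.

```python
from collections import defaultdict


def run_group_plan(tuples, constant_table):
    """Sketch of File Scan -> Join(Symbol = Constant_value) -> Split.

    tuples: records scanned from quotes.xml (plain dicts here).
    constant_table: Constant_value -> list of destination buffer names.
    Returns destination buffer -> tuples routed to it.
    """
    buffers = defaultdict(list)
    for t in tuples:                                  # File Scan
        for dest in constant_table.get(t["Symbol"], []):  # Join via hash lookup
            buffers[dest].append(t)                   # Split: route to each buffer
    return dict(buffers)
```

Tuples whose symbol has no entry in the constant table (DELL in the usage below) are dropped by the join, so no trigger action sees them.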
12
Incremental Grouping Algorithm

The group optimizer traverses the new query's plan bottom-up.
It matches the query's expression signature with the signatures of existing groups.

New query plan: File Scan quotes.xml → Select Symbol=AOL → Trigger Action
13
Incremental Grouping Algorithm (2)

The group optimizer breaks the query plan into two parts:
Lower part (File Scan quotes.xml, Select Symbol=AOL): removed.
Upper part (Trigger Action): added onto the group plan.
It adds the constant (AOL) to the constant table.
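
Here is a hedged sketch of the grouping step for simple selection queries. The optimizer keys groups by expression signature, so a matching query contributes only its constant and its destination buffer. `GroupOptimizer` and the `Dest. <id>` buffer naming are illustrative assumptions. A real group plan would also attach the query's trigger action to the split output.

```python
from collections import defaultdict


class GroupOptimizer:
    """Hypothetical sketch: group selection queries by expression signature."""

    def __init__(self):
        # signature -> group constant table (constant -> destination buffers)
        self.groups = {}

    def add_query(self, query_id, path, op, constant, source):
        sig = (path, op, source)                 # constant abstracted away
        table = self.groups.get(sig)
        if table is None:                        # no matching group: create one
            table = self.groups[sig] = defaultdict(list)
        # The lower part of the plan (File Scan, Select) is dropped; only the
        # constant and this query's destination buffer are recorded.
        table[constant].append(f"Dest. {query_id}")
        return sig
```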
14
Pipeline Approach

Tuples are pipelined from the output of one
operator into the input of the next operator.
Disadvantages
Does not work for grouping timer-based queries.
Split operator may become a bottleneck.
The whole group plan is executed, even though not all of its parts need to be.

15
Intermediate Files
16
Intermediate Files (2)

Advantages
Intermediate files and data sources are monitored
uniformly.
Each query is scheduled independently.
The potential bottleneck problem of the pipelined
approach is avoided.
Disadvantages
Extra disk I/Os.
Split operator becomes a blocking operator.

17
Virtual Intermediate Files
Where <Quotes><Quote><Change_ratio>$c</></></>
element_as $g in "quotes.xml", $c > 0.05
construct $g

Where <Quotes><Quote><Change_ratio>$c</></></>
element_as $g in "quotes.xml", $c > 0.15
construct $g

Signature: Quotes.Quote.Change_Ratio > constant in quotes.xml
The two selection ranges overlap: every tuple with $c > 0.15 also satisfies $c > 0.05.
18
Virtual Intermediate Files (2)

All outputs from split operator are stored in one
real intermediate file.
This file has an index on the range attribute.
Virtual intermediate files store a value range.
Modification of virtual intermediate files can
trigger upper-level queries.
The value range is used to retrieve data from the
real intermediate file.
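
The sketch below keeps one real intermediate file and lets each virtual file hold only its value range. It assumes the range attribute is a number (Change_ratio here) and that a sorted list can stand in for the index. Class names are hypothetical; the standard `bisect` module provides the range lookup.

```python
import bisect
import math


class RealIntermediateFile:
    """One shared file for all split outputs, kept sorted on the range attribute."""

    def __init__(self):
        self.keys = []   # sorted Change_ratio values (stand-in for the index)
        self.rows = []   # tuples, in the same order as keys

    def insert(self, key, row):
        i = bisect.bisect_right(self.keys, key)
        self.keys.insert(i, key)
        self.rows.insert(i, row)

    def range_scan(self, low, high):
        """Return rows with low < key <= high."""
        lo = bisect.bisect_right(self.keys, low)
        hi = bisect.bisect_right(self.keys, high)
        return self.rows[lo:hi]


class VirtualIntermediateFile:
    """Stores only a value range; data is fetched from the real file on demand."""

    def __init__(self, real, low, high=math.inf):
        self.real, self.low, self.high = real, low, high

    def covers(self, key):
        """Would a change to a tuple with this key modify (and trigger) this virtual file?"""
        return self.low < key <= self.high

    def read(self):
        return self.real.range_scan(self.low, self.high)
```

The $c > 0.05 and $c > 0.15 queries become two virtual files over the same real file. Their overlapping tuples are stored only once.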

19
Event Detection

Types of Events
Data-source change
Timer
Types of data sources
Push-based
Pull-based

20
Timer-based

Timer events are stored in an event list, sorted
in time order.
Each entry stores query ids.
A query is fired only if its data source has been modified since its last firing time.
After a timer event, the next firing times are
calculated and the queries are added into the
corresponding entries.
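
The event list can be sketched with a binary heap of firing times plus a map from each time to its query ids. Three details here are assumptions, not specified by the slides:
- Times are plain numbers.
- The `last_modified` map stands in for asking the DM.
- The class is named `TimerEventList`, a hypothetical name.

```python
import heapq
from collections import defaultdict


class TimerEventList:
    """Sketch of the timer-based event list, sorted by firing time."""

    def __init__(self):
        self.heap = []                    # distinct firing times, earliest first
        self.entries = defaultdict(set)   # firing time -> query ids
        self.interval = {}                # query id -> EVERY interval
        self.source = {}                  # query id -> data source
        self.last_fired = {}              # query id -> last firing time

    def register(self, qid, source, start, interval):
        self.interval[qid], self.source[qid] = interval, source
        self.last_fired[qid] = float("-inf")
        self._schedule(qid, start)

    def _schedule(self, qid, t):
        if t not in self.entries:         # first query at this time: new heap entry
            heapq.heappush(self.heap, t)
        self.entries[t].add(qid)

    def next_event(self, last_modified):
        """Pop the earliest timer event; return (time, ids of queries to fire)."""
        t = heapq.heappop(self.heap)
        fired = []
        for qid in sorted(self.entries.pop(t)):
            modified = last_modified.get(self.source[qid], float("-inf"))
            if modified > self.last_fired[qid]:   # source changed since last firing
                fired.append(qid)
                self.last_fired[qid] = t
            self._schedule(qid, t + self.interval[qid])  # compute next firing time
        return t, fired
```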

21
Incremental Evaluation

Queries are invoked only on changed data.
For each file, NiagaraCQ keeps a delta file.
Queries are run over delta files.
Incremental evaluation of join operators requires complete data files.
A timestamp is added to each tuple in order to support timer-based queries.
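
Here is a small sketch of incremental evaluation for a selection-only query. Each delta tuple carries a timestamp, and a firing reads only the tuples that arrived between the last and the current firing times. `DeltaFile` and `evaluate_selection` are hypothetical. Joins are deliberately left out because they also need the complete data files.

```python
from dataclasses import dataclass, field


@dataclass
class DeltaFile:
    """Hypothetical delta file: (timestamp, tuple) pairs in arrival order."""
    rows: list = field(default_factory=list)

    def append(self, ts, row):
        self.rows.append((ts, row))

    def changes_between(self, last_fire, now):
        """Tuples added after the last firing time, up to the current firing time."""
        return [row for ts, row in self.rows if last_fire < ts <= now]


def evaluate_selection(delta, last_fire, now, predicate):
    """Evaluate a selection over the delta only, not over the whole file."""
    return [row for row in delta.changes_between(last_fire, now) if predicate(row)]
```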

22
Memory Caching

Query plans: an LRU policy that favors frequently fired queries.
Data files: favors the delta files.
Event list: only a time window is kept in memory.
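
For the query-plan cache, a plain LRU policy can be sketched with `collections.OrderedDict`. Because each firing touches a plan, frequently fired queries tend to stay resident. The capacity and the `load_plan` callback are assumptions, and no extra frequency weighting is modelled.

```python
from collections import OrderedDict


class PlanCache:
    """Plain LRU cache for query plans (sketch; capacity is an assumed parameter)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.plans = OrderedDict()

    def get(self, qid, load_plan):
        if qid in self.plans:
            self.plans.move_to_end(qid)        # mark as most recently used
            return self.plans[qid]
        plan = load_plan(qid)                  # cache miss: load (hypothetical callback)
        self.plans[qid] = plan
        if len(self.plans) > self.capacity:
            self.plans.popitem(last=False)     # evict the least recently used plan
        return plan
```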

23
System Architecture
24
Continuous Query Processing
Components: Continuous Query Manager (CQM), Event Detector (ED), Query Engine (QE), Data Manager (DM).
1. CQM adds continuous queries with file and timer information to enable ED to monitor the events.
2. ED asks DM to monitor changes to files.
3. When a timer event happens, ED asks DM for the last modified time of files.
4. DM informs ED of changes to push-based data sources.
5. If file changes and timer events are satisfied, ED provides CQM with a list of firing CQs.
6. CQM invokes QE to execute firing CQs.
7. File scan operator calls DM to retrieve selected documents.
8. DM only returns changes between the last fire time and the current fire time.
25
Selection Placement Strategies
Where <Quotes><Quote><Symbol>$s</><Price>$p</></> element_as $g</> in "quotes.xml", $p > 90,
<Companies><Company><Symbol>$s</></> element_as $t</> in "profiles.xml"
construct $g, $t

Where <Quotes><Quote><Symbol>$s</><Price>$p</></> element_as $g</> in "quotes.xml", $p > 100,
<Companies><Company><Symbol>$s</></> element_as $t</> in "profiles.xml"
construct $g, $t
26
Expression Signatures
Quotes.Quote.Price > constant in quotes.xml
Symbol = Symbol (join of quotes.xml and profiles.xml)
27
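Where to place the selection operator?
(R: quotes.xml; S: profiles.xml; σ_i: the selection predicate of query i)

Below the join (PushDown):
$(\sigma_1 R \bowtie S) \cup (\sigma_2 R \bowtie S) \cup \dots \cup (\sigma_n R \bowtie S)$
Above the join (PullUp):
$\sigma_1(R \bowtie S) \cup \sigma_2(R \bowtie S) \cup \dots \cup \sigma_n(R \bowtie S)$
PullUp achieves an average 10-fold performance improvement over PushDown.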
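
The sketch below evaluates both placements on toy in-memory relations, with R as quotes and S as profiles, to show that they produce the same per-query results. Relation contents and function names are illustrative. The sketch only shows that PullUp computes the join once while PushDown computes one join per query. It says nothing about the measured 10-fold figure.

```python
def join(r, s):
    """Join quotes (r) and profiles (s) on Symbol with a nested loop."""
    return [{**q, **p} for q in r for p in s if q["Symbol"] == p["Symbol"]]


def push_down(r, s, thresholds):
    """PushDown: one selection below the join, and one join, per query."""
    return {t: join([q for q in r if q["Price"] > t], s) for t in thresholds}


def pull_up(r, s, thresholds):
    """PullUp: a single shared join, then one selection per query above it."""
    joined = join(r, s)
    return {t: [row for row in joined if row["Price"] > t] for t in thresholds}
```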

28
PushDown - Query Plan
quotes.xml → Select Price>90 → Join with profiles.xml
29
PushDown - Group Plans
30
PullUp - Group Plans
31
PullUp Vs. PushDown

Advantages
Only one join group and one selection group.
Maintains a single intermediate file.
Disadvantages
Irrelevant tuples are joined.
Very large intermediate file.
Changes in profiles.xml affect the intermediate file (file_k), adding maintenance overhead.

32
Filtered PullUp
quotes.xml → Selection Price>90 → Grouped Join Plan (Join with profiles.xml)
33
Filtered PullUp Vs. PullUp

Only relevant tuples are joined.
Reduces the size of the intermediate file.
Reduces the cost of PullUp by 75%.
Complexity: the selection predicate may need to be dynamically modified (e.g., when a query with price > 70 is added).
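
Here is a sketch of Filtered PullUp, assuming every grouped query has the form Price > threshold. The filter below the join uses the smallest threshold, so adding a Price > 70 query loosens it. `FilteredPullUp` and its methods are hypothetical names.

```python
class FilteredPullUp:
    """Sketch: a filter below the shared join, per-query selections above it."""

    def __init__(self):
        self.thresholds = {}   # query id -> price threshold

    def add_query(self, qid, threshold):
        # A new, lower threshold dynamically loosens the filter predicate.
        self.thresholds[qid] = threshold

    def filter_threshold(self):
        """Loosest predicate of all grouped queries: Price > min threshold."""
        return min(self.thresholds.values())

    def run(self, quotes, profiles):
        f = self.filter_threshold()
        relevant = [q for q in quotes if q["Price"] > f]           # filter below the join
        joined = [{**q, **p} for q in relevant for p in profiles
                  if q["Symbol"] == p["Symbol"]]                   # shared join
        return {qid: [row for row in joined if row["Price"] > t]   # grouped selection
                for qid, t in self.thresholds.items()}
```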

34
Dynamic Re-grouping

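Let Q1 = A ⋈ B ⋈ C and Q2 = B ⋈ C be two continuous queries submitted in that order.
The incremental grouping algorithm chooses the plan ((A ⋈ B) ⋈ C) for Q1.
Neither of its groups (A ⋈ B, A ⋈ B ⋈ C) can be used for Q2.

(Figure: query graphs before and after regrouping, with nodes ABC, AB, and BC)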
35
Dynamic Re-grouping (2)

Existing queries are not regrouped to exploit new grouping opportunities introduced by subsequent queries.
Overall performance degrades as queries are continuously added and removed.
Naive regrouping algorithm: periodically perform a global query optimization.
Expensive.
Redundant work (already done by the incremental optimizer).

36
Data Structures

A query graph: a directed acyclic graph, with each node representing an existing join expression in the group plan.
Node
    char* query             // ASCII query plan
    SIG_TYPE sig            // signature of the query string
    int final_node_count    // number of users that require this query
                            // 0: non-final node, > 0: final node
    list<Child> children    // children of this node, where Child = <Node, weight>
    list<Node> parents      // parents of this node
    float updateFreq        // update frequency of this node
    float cost              // the cost of computing this node
    // The following fields are used only for dynamic regrouping
    int reference_count     // reference count
    bool visited            // records whether PurgeSiblings has been performed on this node

37
Data Structures (2)

A group table: an array of hash tables.
The i-th hash table holds the queries with query length (number of joins) i.
A hash table entry maps a query string to the corresponding node in the graph.

(Figure: array → hash table → node)
38
Data Structures (3)

A query log: an array of vectors.
Stores the new nodes that have been added since the last regrouping.
Cleared after regrouping.

(Figure: array → vector → node)
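
The group table and query log might be modelled as below, assuming queries are identified by plain strings. `GroupTable` and its method names are hypothetical, and the node objects are opaque here.

```python
class GroupTable:
    """Sketch: one hash table per query length, plus a per-level query log."""

    def __init__(self, max_joins):
        # levels[i]: query string -> node, for queries with i joins
        self.levels = [dict() for _ in range(max_joins + 1)]
        # query_log[i]: nodes at level i added since the last regrouping
        self.query_log = [[] for _ in range(max_joins + 1)]

    def lookup(self, query, joins):
        return self.levels[joins].get(query)

    def add(self, query, joins, node):
        self.levels[joins][query] = node
        self.query_log[joins].append(node)   # remembered until the next regrouping

    def clear_log(self):
        """Called after regrouping; the group table itself is kept."""
        for vec in self.query_log:
            vec.clear()
```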
39
Incremental Grouping Algorithm

Top-down local exhaustive search:
If the query already exists, its final node count is increased by 1.
Else:
Enumerate all possible sub-queries in a top-down manner and probe the group table to check whether each sub-query node exists.
Compute the minimal cost of using existing sub-query nodes.
Compute the minimal cost without using existing sub-query nodes.
Choose the least-costly plan.
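
The search can be sketched for join queries represented as sets of relation names. The cost model is a toy assumption: each join that must be newly computed costs 1. Only plans that reuse a single existing sub-query node are considered. This is therefore a simplification of the algorithm above, not a faithful reproduction. `add_join_query` is a hypothetical name.

```python
from itertools import combinations


def add_join_query(group_table, query):
    """Toy incremental grouping for a join query.

    group_table: frozenset(relations) -> final_node_count.
    Returns (reused sub-query or None, cost of the chosen plan).
    """
    q = frozenset(query)
    if q in group_table:                        # identical query already exists
        group_table[q] += 1
        return q, 0
    best_sub, best_cost = None, len(q) - 1      # plan that reuses nothing
    for size in range(len(q) - 1, 1, -1):       # top-down: larger sub-queries first
        for sub in combinations(sorted(q), size):
            s = frozenset(sub)
            if s in group_table:               # probe the group table
                cost = len(q) - len(s)         # joins still needed on top of s
                if cost < best_cost:
                    best_sub, best_cost = s, cost
    group_table[q] = 1                          # new final node for this query
    return best_sub, best_cost
```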

40
Dynamic Regrouping Algorithm

Phase 1: construct links among existing nodes and new nodes.
Phase 2: find a minimal-weight solution from the current solution by removing redundant nodes.

(Figure: query graph with nodes ABC, AB, and BC)
41
Phase 1: Constructing links among existing nodes and new nodes

Main idea: for any pair of nodes in the graph, if one node is a sub-query of the other, a link is created between them if it did not exist before.
Relationships are only evaluated between existing nodes and nodes added since the last regrouping.
The difference in level between a parent and a child is always 1.

42
Phase 1 - Algorithm

Bottom-up:
for each node in the level-i query log
    if node has parents in the level-(i+1) group table
        connect node to its parents
    if node has children in the level-(i-1) group table
        connect node to its children
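
Phase 1 can be sketched with nodes as frozensets of relation names. This treats "sub-query" as "proper subset of relations", a simplifying assumption that ignores join predicates. `group_table`, `query_log`, and `links` are hypothetical stand-ins for the data structures described earlier.

```python
def phase1(group_table, query_log, links):
    """Link new nodes to parents/children one level away.

    A node with k relations sits at level k - 1 (its number of joins).
    group_table[i]: all nodes at level i; query_log[i]: new nodes at level i.
    links: set of (parent, child) pairs, updated in place.
    """
    for i in sorted(query_log):                         # bottom-up over levels
        for node in query_log[i]:
            for parent in group_table.get(i + 1, ()):   # candidate parents
                if node < parent:                       # proper subset = sub-query
                    links.add((parent, node))           # a set ignores existing links
            for child in group_table.get(i - 1, ()):    # candidate children
                if child < node:
                    links.add((node, child))
    return links
```

On the Q1/Q2 example, the new node BC gets linked under ABC, which makes the regrouping in Phase 2 possible.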

43
Phase 2: A greedy algorithm for level-wise graph minimization

Main idea: traverse the query graph level by level and attempt to remove any redundant nodes one level at a time.
Starts from the second level from the top.
Retain a subset of the level-i nodes such that:
every node at level i+1 has at least one child in this subset;
the subset has minimum total cost.
Nodes that are not selected are removed permanently.

44
Phase 2 - Algorithm
MinimizeGraph()
    for each level L in the group table    // L ranges from (maximum number of joins - 1) down to 1
        for each node N in the level-L group table
            InitializeSet(N)
        for each node N in final_set
            PurgeSiblings(N)
        while (remain_set is not empty)
            Current_minimum = infinity
            for each node R in remain_set    // scan
                if (R.reference_count == 0)
                    remove R from remain_set
                    deleteNode(R)
                else if (R.cost / R.reference_count < Current_minimum)
                    M = R
                    Current_minimum = R.cost / R.reference_count
            // end scan
            if (a node M was selected in this scan)
                remove M from remain_set
                PurgeSiblings(M)
        // end while
    // end for each level
// end MinimizeGraph

InitializeSet(Node N)
    if N is a final node
        add N into final_set
    else
        add N into remain_set
    N.reference_count = number of parents of N
    N.visited = false

PurgeSiblings(Node N)
    for each parent P of N
        if (!P.visited)
            decrease the reference count of N's siblings under the same parent P by 1
            P.visited = true
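
The following Python sketch follows the MinimizeGraph, InitializeSet, and PurgeSiblings pseudocode under these simplifying assumptions:
- A boolean `final` flag stands in for final_node_count > 0.
- Node costs are given.
- `levels` is a hypothetical dict from level (number of joins) to the node list at that level.

```python
import math
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    """Simplified query-graph node: only the fields Phase 2 needs."""
    name: str
    cost: float = 0.0
    final: bool = False                 # stands in for final_node_count > 0
    parents: list = field(default_factory=list)
    children: list = field(default_factory=list)
    reference_count: int = 0
    visited: bool = False
    deleted: bool = False


def link(parent, child):
    parent.children.append(child)
    child.parents.append(parent)


def delete_node(n):
    """Remove n from the graph permanently."""
    for p in n.parents:
        p.children.remove(n)
    for c in n.children:
        c.parents.remove(n)
    n.parents, n.children, n.deleted = [], [], True


def purge_siblings(n):
    """n is kept, so each not-yet-covered parent of n is now covered."""
    for p in n.parents:
        if not p.visited:
            for sibling in p.children:  # includes n itself, which is harmless
                sibling.reference_count -= 1
            p.visited = True


def minimize_graph(levels):
    top = max(levels)
    for lvl in range(top - 1, 0, -1):   # second level from the top, downwards
        nodes = levels.get(lvl, [])
        final_set, remain = [], []
        for n in nodes:                 # InitializeSet
            (final_set if n.final else remain).append(n)
            n.reference_count = len(n.parents)
            n.visited = False
        for n in final_set:
            purge_siblings(n)
        while remain:
            best, best_ratio = None, math.inf
            for r in list(remain):      # scan
                if r.reference_count <= 0:   # no uncovered parent needs r
                    remain.remove(r)
                    delete_node(r)
                elif r.cost / r.reference_count < best_ratio:
                    best, best_ratio = r, r.cost / r.reference_count
            if best is not None:        # keep the cheapest node per newly covered parent
                remain.remove(best)
                purge_siblings(best)
        levels[lvl] = [n for n in nodes if not n.deleted]
    return levels
```

The sketch differs from the pseudocode in two small ways. It resets the running minimum on each pass, and it guards against a pass in which every remaining node is deleted. On the Q1/Q2 example from the Dynamic Re-grouping slides (ABC final, AB non-final, BC final), it removes AB and keeps BC as the only child of ABC.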
45
Cost Analysis

N: number of queries.
The number of nodes is proportional to the number of queries: CN.
Each query contains no more than 10 joins.
Each level contains about CN/10 nodes.

46
Cost Analysis Phase 1

R or KR: regrouping frequencies.
With frequency R:
$N/R$: number of regroupings.
$CR$: number of new nodes that will be linked with existing nodes.
$mCR$: number of nodes after $m-1$ regroupings.
$m(CR)^2$: number of comparisons for the m-th regrouping (ignoring a constant reduction).

47
Cost Analysis Phase 1 (2)

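Total number of comparisons with frequency R:
$$(CR)^2 + 2(CR)^2 + \dots + \frac{N}{R}(CR)^2 = \frac{1}{2}\cdot\frac{N}{R}\left(\frac{N}{R} + 1\right)(CR)^2 = \frac{N(N+R)C^2}{2} = O(N^2)$$
Total number of comparisons with frequency KR:
$$(CKR)^2 + 2(CKR)^2 + \dots + \frac{N}{KR}(CKR)^2 = \frac{N(N+KR)C^2}{2}$$
The ratio:
$$\frac{N(N+KR)C^2/2}{N(N+R)C^2/2} = \frac{N+KR}{N+R}$$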
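
The closed form can be checked numerically. The sketch assumes R divides N and counts $m(CR)^2$ comparisons for the m-th regrouping, as defined on the previous slide. Function names are hypothetical.

```python
import math


def phase1_comparisons(n, c, r):
    """Sum m*(c*r)**2 over the n/r regroupings (assumes r divides n)."""
    return sum(m * (c * r) ** 2 for m in range(1, n // r + 1))


def phase1_closed_form(n, c, r):
    """N(N+R)C^2/2."""
    return n * (n + r) * c ** 2 / 2
```

For example, take N = 1000 and C = 3. Regrouping every 40 queries instead of every 10 gives 4,680,000 versus 4,545,000 comparisons. That is a ratio of 1040/1010 (about 3% more), while performing 25 regroupings instead of 100.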

48
Cost Analysis Phase 2

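Worst case: each pass removes one node.
Cost for a level:
$$\frac{CN}{10} + \left(\frac{CN}{10} - 1\right) + \dots + 1 = \frac{CN(CN+10)}{200} = O(N^2)$$
Purge siblings:
$$\frac{CN}{10} \cdot \frac{CN}{10} = \frac{(CN)^2}{100} = O(N^2)$$
All 9 levels: $O(N^2)$.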

49
References

NiagaraCQ: A Scalable Continuous Query System for Internet Databases
http://www.cs.wisc.edu/niagara/papers/NiagaraCQ.pdf
Design and Evaluation of Alternative Selection Placement Strategies in Optimizing Continuous Queries
http://www.cs.wisc.edu/niagara/papers/Icde02.pdf
Dynamic Re-grouping of Continuous Queries
http://www.cs.wisc.edu/niagara/papers/507.pdf