1
XML processing in DHT networks
Serge Abiteboul, Ioana Manolescu, Neoklis Polyzotis, Nicoleta Preda, Chong Sun
INRIA-Saclay, UC Santa Cruz
2
Outline

Topic
KadoP System
Overview of DHT
Query evaluation
Optimization techniques
DPP: Distributed Posting Partitioning
Structural Bloom Filters
Conclusion

3
Topic

Querying large volumes of content in a P2P network for a community of users
Focus on indexing
Content: XML
P2P network: structured, built around a DHT
XML indexing in DHT networks

4
Example: the EDOS distribution system

A system for managing a Linux distribution (Mandriva)
System releases
about 10,000 software packages and their metadata (XML)
Community of open-source developers: thousands
Functionalities
Publish/update releases
Query the metadata
Retrieve packages

5
The KadoP system
6
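DHT: a P2P indexing infrastructure
[Figure: a Pastry ring over 2^4 identifiers, up to ID15. Pointers in the finger table of peer x: ID0 = x mod 2^4, ID1 = (x + 2^0) mod 2^4, ID2 = (x + 2^1) mod 2^4, ID4 = (x + 2^2) mod 2^4, ID8 = (x + 2^3) mod 2^4. Arrows trace look-up(K) from client ID0 and look-up(K) from client ID1.]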

Use a ring
each peer takes an ID in the space modulo 2^N
each peer stores (K, Object) pairs, for K satisfying
ID(peer) ≤ K < ID(next peer)

Which API?
locate(K) → Peer IP
get(K) → Object
put(K, Object)
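The ring and its three calls can be sketched as follows. This is a minimal single-process illustration, not KadoP or Pastry code: the class name `Ring`, the helper `owner`, the use of SHA-1 to map keys onto the ring, and the peer IDs are all assumptions made for the example.

```python
import hashlib
from bisect import bisect_right


class Ring:
    """Toy DHT ring (hypothetical): a peer stores the keys K with ID(peer) <= K < ID(next peer)."""

    def __init__(self, peer_ids, bits=4):
        self.size = 2 ** bits                      # identifier space: 0 .. 2^N - 1
        self.peers = sorted(p % self.size for p in peer_ids)
        self.store = {p: {} for p in self.peers}   # per-peer (K, Object) pairs

    def key_id(self, key):
        # Assumption: keys are hashed onto the ring with SHA-1.
        digest = hashlib.sha1(key.encode("utf-8")).digest()
        return int.from_bytes(digest, "big") % self.size

    def owner(self, kid):
        # Largest peer ID <= kid; index -1 wraps around to the last peer on the ring.
        return self.peers[bisect_right(self.peers, kid) - 1]

    def locate(self, key):
        return self.owner(self.key_id(key))        # stands in for "Peer IP"

    def put(self, key, obj):
        self.store[self.locate(key)][key] = obj

    def get(self, key):
        return self.store[self.locate(key)].get(key)


ring = Ring([0, 1, 2, 4, 8, 15])
ring.put("author", ["posting-1", "posting-2"])
```

A real DHT resolves `locate` over the network by following finger-table pointers; here `owner` is only a local binary search over the sorted peer IDs.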

7
Advantages and disadvantages

Advantages
Availability and reliability
No centralization (no bottleneck), and replication
Scalability
Scalable solution for keyword queries
Disadvantages
Difficult to maintain the structure
Not suited to a transient population of peers

8
XML query processing in KadoP

Query evaluation
Step 1.
Given an XQuery Q, decompose Q into tree pattern queries
Evaluate each tree pattern query using the DHT index to identify a set of candidate peers P that can provide answers
Step 2.
Ship Q to the peers in P and evaluate it there

9
Indexing XML documents
[Figure: Doc.xml drawn as a tree with elements A to G and the text node John; each node is annotated with its start and end positions and its level]
X ancestor of Y ⇔ start(X) < start(Y) < end(X)
X parent of Y ⇔ X ancestor of Y and level(X) = level(Y) - 1
Posting: (peer, doc, start, end, level)
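The two structural tests can be written directly over postings. The sketch below assumes the ancestor test reads start(X) < start(Y) < end(X), and that two postings are only comparable inside the same document of the same peer. The type name `Posting`, the function names and the sample positions are hypothetical.

```python
from typing import NamedTuple


class Posting(NamedTuple):
    peer: str
    doc: str
    start: int
    end: int
    level: int


def is_ancestor(x, y):
    # Same document, and y starts strictly inside x's (start, end) span.
    return (x.peer, x.doc) == (y.peer, y.doc) and x.start < y.start < x.end


def is_parent(x, y):
    # A parent is an ancestor exactly one level above.
    return is_ancestor(x, y) and x.level == y.level - 1


# Hypothetical postings for three nested elements of one document.
a = Posting("p1", "d1", 1, 8, 1)
b = Posting("p1", "d1", 2, 6, 2)
d = Posting("p1", "d1", 3, 4, 3)
```

Both tests are constant-time comparisons, which is what lets structural joins run over posting lists without opening the documents.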
10
XML indexing in DHT

Publish the postings via the DHT
put(k, postings), where k is a label or a keyword
Remark: all the postings for author accumulate at the same peer

[Figure: peers p1 and p2 call put(author, (p1, d2, start, end, lev)) and put(author, (p2, d2, start, end, lev)) on the DHT; the posting list for author is held at p(author)]
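The accumulation of postings under one key can be illustrated with a small stand-in for the index. This is a sketch under assumptions: `PostingIndex` is a hypothetical in-memory class that replaces the DHT, `put` appends rather than overwrites, and the posting values are invented for the example.

```python
from collections import defaultdict


class PostingIndex:
    """In-memory stand-in for the DHT index (hypothetical): one posting list per key."""

    def __init__(self):
        self.lists = defaultdict(list)

    def put(self, key, posting):
        # Append semantics: publishing never overwrites earlier postings.
        self.lists[key].append(posting)

    def get(self, key):
        # Postings are returned sorted by (peer, doc, start, end, level).
        return sorted(self.lists[key])


index = PostingIndex()
index.put("author", ("p2", "d2", 5, 6, 3))   # published by peer p2
index.put("author", ("p1", "d2", 3, 4, 2))   # published by peer p1
index.put("title", ("p1", "d2", 1, 2, 2))
```

Every posting published under "author" ends up in the same list, whichever peer published it, so a frequent label produces a long list at a single peer.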
11
Some technical issues

Goal: manage millions of documents with thousands of peers
First experiments were a disaster
First improvements
Replace the DHT's file-system index storage with storage in a database (Berkeley DB)
Extend the DHT API with Append, in addition to Read/Write
Extend the DHT API with a streaming exchange of postings
With these changes, KadoP scaled but was still slow, for example because of long posting lists

12
Optimization
13
Main issue: long postings

Transferring long posting lists hurts performance
Bad response time
Parallelization: Distributed Posting Partitioning (DPP)
Communication load
Bloom filters: Structural Bloom Filters

[Figure: peer p(Name) holding the long posting list for Name]
14
DPP structure
[Figure: the long posting list for Name at p(Name), ordered by (p,d) and cut at (p1,d1), (p2,d2), (p3,d3) and (p4,d4)]

DPP structure
Split and distribute postings according to conditions
Each condition is an interval, e.g. C1 = [(p1,d1), (p2,d2)]
Any two conditions cover disjoint intervals
Some kind of B-tree for postings
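The splitting rule can be sketched as follows, assuming postings are tuples (peer, doc, start, end, level) ordered by (peer, doc). The function name `partition`, the `max_block` parameter and the sample postings are hypothetical; the slides do not say how KadoP chooses block boundaries.

```python
from itertools import groupby


def partition(postings, max_block):
    """Split a posting list into blocks, each tagged with its (peer, doc) interval.

    Postings of one (peer, doc) pair are never split across blocks,
    so the resulting intervals are disjoint.
    """
    blocks, current = [], []
    for _, group in groupby(sorted(postings), key=lambda p: (p[0], p[1])):
        group = list(group)
        if current and len(current) + len(group) > max_block:
            blocks.append(current)
            current = []
        current.extend(group)
    if current:
        blocks.append(current)
    # Condition of a block: closed interval [first (peer, doc), last (peer, doc)].
    return [((b[0][:2], b[-1][:2]), b) for b in blocks]


postings = [
    ("p1", "d1", 1, 2, 1), ("p1", "d1", 3, 4, 2),
    ("p1", "d2", 1, 2, 1),
    ("p2", "d1", 1, 4, 1), ("p2", "d1", 2, 3, 2),
]
blocks = partition(postings, max_block=2)
conditions = [cond for cond, _ in blocks]
```

Each block can then be stored at a different peer, with only the list of conditions kept at the peer in charge of the label.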

[Figure: p(Name) holding the conditions C1, C2, C3 and C4]
15
Query processing (no DPP)
[Figure: the index query (index-Q) over article, abstract, author, database and Ullman; the posting lists of these terms are sent to the QP-peer]
Pipelined transfer of postings to the query processing (QP) peer
Holistic twig-join algorithm to compute the result in parallel at the QP peer
16
Query processing with DPP
[Figure: at p(client), the conditions C1 to C5 of abstract and XML, sorted according to (p,d); each condition points to a different peer]
Fetch the conditions C1-C5 from p(abstract) and p(XML)
Prune intervals
Transfer and compute in parallel the join for each sub-interval
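The pruning step can be illustrated with a sweep over the two sorted lists of conditions: only pairs of blocks whose (p,d) intervals overlap can contribute to the join, and a block that overlaps nothing is never transferred. This is a sketch under assumptions; the function name `overlapping_pairs` and the sample intervals are hypothetical.

```python
def overlapping_pairs(conds_a, conds_b):
    """Return index pairs (i, j) such that conds_a[i] and conds_b[j] overlap.

    Each list holds disjoint closed intervals (lo, hi) over (peer, doc), sorted by lo.
    """
    pairs, i, j = [], 0, 0
    while i < len(conds_a) and j < len(conds_b):
        (lo_a, hi_a), (lo_b, hi_b) = conds_a[i], conds_b[j]
        if hi_a < lo_b:        # a-block ends before the b-block starts
            i += 1
        elif hi_b < lo_a:      # b-block ends before the a-block starts
            j += 1
        else:
            pairs.append((i, j))
            # Advance the interval that ends first; the other may overlap more blocks.
            if hi_a <= hi_b:
                i += 1
            else:
                j += 1
    return pairs


conds_abstract = [(("p1", "d1"), ("p1", "d9")), (("p3", "d1"), ("p3", "d5"))]
conds_xml = [
    (("p1", "d5"), ("p2", "d2")),
    (("p2", "d3"), ("p2", "d9")),
    (("p3", "d2"), ("p4", "d1")),
]
pairs = overlapping_pairs(conds_abstract, conds_xml)
```

In this example the second block of `conds_xml` overlaps no block of `conds_abstract`, so it is pruned; the two remaining pairs can be joined independently, for instance on different peers.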
17
Experiments

Platform
Grid5000: platform for research in P2P systems
Geographically distributed across 6 sites in France
KadoP tested on more than 100 machines
1000 logical peers
Conclusions in brief
Good performance
KadoP scales very nicely
Issue: does not support a high churn of peers (index copying)

18
Query response time
Q: article//author//Ullman
19
Optimization

(b) Structural Bloom Filters
Ancestor Bloom Filter
Also in the paper: Descendant Bloom Filter

20
Using an Ancestor Bloom Filter
Query: a//b
Compute the Bloom Filter of the a-postings and send it to p(b)
At p(b), compute the b-postings that have an a-ancestor (and some more)
Send them to p(a), which can compute the answer
[Figure: through the DHT, p(a) holds L(a) and p(b) holds L(b); p(b) returns F(b, ABF(a))]
21
Technique: dyadic intervals
Dyadic intervals
2^3: [1,8]
2^2: [1,4] [5,8]
2^1: [1,2] [3,4] [5,6] [7,8]
2^0: [1,1] [2,2] [3,3] [4,4] [5,5] [6,6] [7,7] [8,8]
[Figure: positions 1 to 8 on an axis, with ap drawn from its start to its end and bp below it]

Dyadic covers
D(ap) = {[1,4], [5,6], [7,7]}
ap is an ancestor of bp if
∃ω ∈ D(ap) (start(bp) ∈ ω)
Here 3 ∈ [1,4], so the answer is yes!
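The dyadic cover and the membership test can be sketched as follows, assuming positions are numbered from 1 and the position space is [1, 2^height]. The function names `dyadic_cover`, `dyadic_containers` and `covers` are hypothetical.

```python
def dyadic_cover(start, end):
    """Smallest list of dyadic intervals whose union is exactly [start, end]."""
    cover, lo = [], start
    while lo <= end:
        size = 1
        # Double the block while it stays aligned at lo and fits inside [lo, end].
        while (lo - 1) % (2 * size) == 0 and lo + 2 * size - 1 <= end:
            size *= 2
        cover.append((lo, lo + size - 1))
        lo += size
    return cover


def dyadic_containers(pos, height):
    """The dyadic intervals over [1, 2**height] that contain pos, one per level."""
    out = []
    for j in range(height + 1):
        size = 2 ** j
        lo = ((pos - 1) // size) * size + 1
        out.append((lo, lo + size - 1))
    return out


def covers(a_start, a_end, b_start, height=3):
    # b_start lies in [a_start, a_end] iff one of its containers is in the cover.
    cover = set(dyadic_cover(a_start, a_end))
    return any(w in cover for w in dyadic_containers(b_start, height))
```

For the example on the slide, `dyadic_cover(1, 7)` returns `[(1, 4), (5, 6), (7, 7)]` and `covers(1, 7, 3)` is true because [1,4] contains 3. The point of the encoding is that a range test becomes a handful of exact-match look-ups, which is the kind of test a Bloom filter supports.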

22
Ancestor Bloom Filter (simplified)

Publication: ∀d, ∀ap in d, ∀ω ∈ D(ap)
Insert a trace in the Bloom Filter
Say T[h(d,ω)] = 1 for some hash function h
Test for bp in d:
for each dyadic interval ω s.t. start(bp) ∈ ω, test if T[h(d,ω)] = 1
If one test is positive, conclude that bp in d is a solution
False positives because of hash collisions
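The simplified filter can be sketched end to end as follows. This is an illustration under assumptions, not the KadoP implementation: it uses a single hash function built from SHA-1, a table of `m` bits, positions in [1, 2^height], and hypothetical names (`AncestorBloomFilter`, `publish`, `test`). The two dyadic helpers are repeated so that the block is self-contained.

```python
import hashlib


def dyadic_cover(start, end):
    """Smallest list of dyadic intervals whose union is exactly [start, end]."""
    cover, lo = [], start
    while lo <= end:
        size = 1
        while (lo - 1) % (2 * size) == 0 and lo + 2 * size - 1 <= end:
            size *= 2
        cover.append((lo, lo + size - 1))
        lo += size
    return cover


def dyadic_containers(pos, height):
    """The dyadic intervals over [1, 2**height] that contain pos, one per level."""
    sizes = [2 ** j for j in range(height + 1)]
    return [(((pos - 1) // s) * s + 1, ((pos - 1) // s) * s + s) for s in sizes]


class AncestorBloomFilter:
    def __init__(self, m=1024, height=3):
        self.m = m
        self.height = height
        self.table = [0] * m                       # the bit table T

    def _h(self, doc, interval):
        # Assumption: one hash function h(d, w), derived from SHA-1.
        data = f"{doc}|{interval[0]}|{interval[1]}".encode("utf-8")
        return int.from_bytes(hashlib.sha1(data).digest()[:8], "big") % self.m

    def publish(self, doc, start, end):
        # One trace per dyadic interval in the cover of the a-posting.
        for w in dyadic_cover(start, end):
            self.table[self._h(doc, w)] = 1

    def test(self, doc, start):
        # Positive if any dyadic interval containing start has its bit set.
        return any(self.table[self._h(doc, w)] == 1
                   for w in dyadic_containers(start, self.height))


abf = AncestorBloomFilter()
abf.publish("d", 1, 7)                             # an a-posting spanning [1, 7]
b_postings = [("d", 3), ("d", 6), ("d", 7)]        # (doc, start) of b-postings
survivors = [b for b in b_postings if abf.test(*b)]
```

The filter never rejects a b-posting that starts inside a published a-posting, so there are no false negatives. A posting outside every a-posting can still pass when its hashed intervals collide with set bits, which is why the surviving postings are sent on for an exact check.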

23
Query evaluation strategies
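[Figure: two evaluation plans for a query over a, b, c and d, hosted at p(a), p(b), p(c) and p(d).
Ancestor Bloom Reducer: ABF(a) goes from p(a) to p(b), giving b = F(b, ABF(a)); ABF(b) goes from p(b) to p(c) and p(d), giving c = F(c, ABF(b)) and d = F(d, ABF(b)).
Descendant Bloom Reducer: DBF(c) and DBF(d) go from p(c) and p(d) to p(b), which filters b with both; DBF(b) then goes to p(a).]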
24
Performance
25
Conclusion
26
Related work

Very active area
DHT-based platforms for XML data management
Locating data sources (Galanis et al., VLDB03)
XPath lookup queries in P2P networks (Bonifati et al., WIDM04)
Other DHT-based systems for data management
PIER query processor (Huebsch et al., CIDR05)
Indexing in P2P networks (Aberer et al., VLDB05)
Dyadic intervals
Maintenance of dynamic intervals (Gilbert et al., VLDB02)

27
Contribution

Two optimization techniques for index processing
Distributed Posting Partitioning
Structural Bloom Filters
A full system for P2P XML indexing
As opposed to a simulation
Lots of engineering details that are important
for performance
Extensively tested for performance
Tested with a real application, EDOS

28
On-going and future work

New indexing techniques
Trading-off precision for performance
Publish summarizations of documents
Index/transfer postings at a coarse level of
detail
Index views (query caching)?
Query optimizer for KadoP
This is standard distributed query processing
Use standard optimization techniques, e.g., the OptiMax ActiveXML optimizer (demo at ICDE08)
Develop what is specific to KadoP: the cost model

29

Thank you

30
Indexing time