1
gStore: Answering SPARQL Queries Via Subgraph Matching
  • Lei Zou¹, Jinghui Mo¹, Lei Chen², M. Tamer Özsu³, Dongyan Zhao¹

¹Peking University, ²Hong Kong University of Science and Technology, ³University of Waterloo
2
Outline
  • Background & Related Work
  • Overview of gStore
  • Encoding Technique
  • VS-tree & Query Algorithm
  • Experiments
  • Conclusions

4
Semantic Web
Semantic Web Technologies is a collection of
standard technologies to realize a Web of Data.
5
RDF Data Model
[Figure: RDF triples whose components are URIs or literals]
6
RDF Graph
[Figure: RDF graph with entity vertices and literal vertices]
7
SPARQL Queries
SPARQL Query:
Select ?name Where {
  ?m <hasName> ?name.
  ?m <BornOnDate> "1809-02-12".
  ?m <DiedOnDate> "1865-04-15". }
Query Graph
8
Subgraph Match vs. SPARQL Queries
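The correspondence can be illustrated with a small sketch: the SPARQL query from the previous slide is treated as a query graph, and its answers are the subgraph matches in the RDF graph. This is a brute-force illustration only, not the gStore algorithm. The entity identifier "y:Abraham_Lincoln" and the function names are assumptions made for the example.

```python
from itertools import product

# Tiny RDF graph as (subject, predicate, object) triples.
# "y:Abraham_Lincoln" is a hypothetical entity identifier.
TRIPLES = [
    ("y:Abraham_Lincoln", "hasName", "Abraham Lincoln"),
    ("y:Abraham_Lincoln", "BornOnDate", "1809-02-12"),
    ("y:Abraham_Lincoln", "DiedOnDate", "1865-04-15"),
    ("y:Abraham_Lincoln", "DiedIn", "y:Washington_D.C"),
]

# Query graph: terms starting with "?" are variables.
QUERY = [
    ("?m", "hasName", "?name"),
    ("?m", "BornOnDate", "1809-02-12"),
    ("?m", "DiedOnDate", "1865-04-15"),
]


def is_var(term):
    return term.startswith("?")


def match(query, triples):
    """Brute-force subgraph matching: try every variable assignment."""
    variables = sorted({t for s, _, o in query for t in (s, o) if is_var(t)})
    terms = sorted({t for s, _, o in triples for t in (s, o)})
    edges = set(triples)
    results = []
    for values in product(terms, repeat=len(variables)):
        binding = dict(zip(variables, values))
        # A binding is a match if every query edge maps onto a data edge.
        if all((binding.get(s, s), p, binding.get(o, o)) in edges
               for s, p, o in query):
            results.append(binding)
    return results


answers = [b["?name"] for b in match(QUERY, TRIPLES)]
print(answers)  # ['Abraham Lincoln']
```

The enumeration is exponential in the number of variables, which is why finding matches over a large graph needs an index.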
9
Naïve Triple Store
SPARQL Query:
Select ?name Where {
  ?m <hasName> ?name.
  ?m <BornOnDate> "1809-02-12".
  ?m <DiedOnDate> "1865-04-15". }
Too many self-joins
SQL:
Select T3.Object
From T as T1, T as T2, T as T3
Where T1.Predicate = 'BornOnDate' and T1.Object = '1809-02-12'
  and T2.Predicate = 'DiedOnDate' and T2.Object = '1865-04-15'
  and T3.Predicate = 'hasName'
  and T1.Subject = T2.Subject and T2.Subject = T3.Subject
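A minimal runnable sketch of the naive triple store, using Python's built-in sqlite3 module. The table T with columns Subject, Predicate and Object follows the slide; the rows and entity identifiers are illustrative assumptions. The query returns T3.Object because ?name is the object of the hasName triple.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (Subject TEXT, Predicate TEXT, Object TEXT)")
conn.executemany("INSERT INTO T VALUES (?, ?, ?)", [
    # Hypothetical entity identifiers; one row per RDF triple.
    ("y:Abraham_Lincoln", "hasName", "Abraham Lincoln"),
    ("y:Abraham_Lincoln", "BornOnDate", "1809-02-12"),
    ("y:Abraham_Lincoln", "DiedOnDate", "1865-04-15"),
    ("y:Charles_Darwin", "hasName", "Charles Darwin"),
    ("y:Charles_Darwin", "BornOnDate", "1809-02-12"),
    ("y:Charles_Darwin", "DiedOnDate", "1882-04-19"),
])

# Three triple patterns become a three-way self-join on the same table.
rows = conn.execute("""
    SELECT T3.Object
    FROM T AS T1, T AS T2, T AS T3
    WHERE T1.Predicate = 'BornOnDate' AND T1.Object = '1809-02-12'
      AND T2.Predicate = 'DiedOnDate' AND T2.Object = '1865-04-15'
      AND T3.Predicate = 'hasName'
      AND T1.Subject = T2.Subject AND T2.Subject = T3.Subject
""").fetchall()
names = [row[0] for row in rows]
print(names)  # ['Abraham Lincoln']
```

Every additional triple pattern in the SPARQL query adds one more copy of T to the join.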
10
Existing Solutions
  • Three categories of solutions have been proposed to speed up query processing:
  • 1. Property Table: Jena (K. Wilkinson et al., SWDB 03)
  • 2. Vertically Partitioned Solution: SW-store (D. J. Abadi et al., VLDB 07)
  • 3. Exhaustive Indexing: RDF-3X (T. Neumann et al., VLDB 08), Hexastore (C. Weiss et al., VLDB 08)

11
Existing Solutions-Property Table
SPARQL Query:
Select ?name Where {
  ?m <hasName> ?name.
  ?m <BornOnDate> "1809-02-12".
  ?m <DiedOnDate> "1865-04-15". }
Reduces the number of join steps
SQL:
Select People.hasName From People
Where People.BornOnDate = '1809-02-12' and People.DiedOnDate = '1865-04-15'
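The same query against a property table can be sketched with sqlite3. The People table and its column names follow the slide; the Subject column and the rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per entity, one column per property (Subject column is assumed).
conn.execute(
    "CREATE TABLE People (Subject TEXT PRIMARY KEY, hasName TEXT, "
    "BornOnDate TEXT, DiedOnDate TEXT)"
)
conn.executemany("INSERT INTO People VALUES (?, ?, ?, ?)", [
    ("y:Abraham_Lincoln", "Abraham Lincoln", "1809-02-12", "1865-04-15"),
    ("y:Charles_Darwin", "Charles Darwin", "1809-02-12", "1882-04-19"),
])

# All three triple patterns are answered by a single-table selection.
names = [row[0] for row in conn.execute(
    "SELECT People.hasName FROM People "
    "WHERE People.BornOnDate = '1809-02-12' "
    "AND People.DiedOnDate = '1865-04-15'"
)]
print(names)  # ['Abraham Lincoln']
```

No join is needed here because all three properties of ?m sit in the same row.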
12
Existing Solutions-Vertically Partitioned
Solution
Fast Merge Join
13
Existing Solutions- Exhaustive-Indexing
Range query & Merge Join
  • Each SPARQL query statement can be translated into one range query.
  • SPARQL Query:
  • Select ?name Where { ?m <hasName> ?name. ?m <BornOnDate> "1809-02-12". ?m <DiedOnDate> "1865-04-15". }
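A rough sketch of the idea, assuming the index is kept as sorted lists of permuted triples: each triple pattern becomes one range lookup, and the sorted results are intersected by a merge join. Only two of the possible permutations are built here, and the helper names and data rows are hypothetical; this is not the RDF-3X or Hexastore implementation.

```python
from bisect import bisect_left

TRIPLES = [  # (subject, predicate, object); identifiers are hypothetical
    ("y:Abraham_Lincoln", "hasName", "Abraham Lincoln"),
    ("y:Abraham_Lincoln", "BornOnDate", "1809-02-12"),
    ("y:Abraham_Lincoln", "DiedOnDate", "1865-04-15"),
    ("y:Charles_Darwin", "hasName", "Charles Darwin"),
    ("y:Charles_Darwin", "BornOnDate", "1809-02-12"),
    ("y:Charles_Darwin", "DiedOnDate", "1882-04-19"),
]

# Two of the permutation indexes, each a sorted list of tuples.
POS = sorted((p, o, s) for s, p, o in TRIPLES)  # predicate, object, subject
PSO = sorted((p, s, o) for s, p, o in TRIPLES)  # predicate, subject, object


def range_query(index, first, second):
    """Return the third components of all entries prefixed by (first, second)."""
    lo = bisect_left(index, (first, second))
    hi = bisect_left(index, (first, second + "\0"))
    return [entry[2] for entry in index[lo:hi]]


def merge_join(left, right):
    """Intersect two sorted lists in a single linear pass."""
    i = j = 0
    out = []
    while i < len(left) and j < len(right):
        if left[i] == right[j]:
            out.append(left[i])
            i += 1
            j += 1
        elif left[i] < right[j]:
            i += 1
        else:
            j += 1
    return out


born = range_query(POS, "BornOnDate", "1809-02-12")
died = range_query(POS, "DiedOnDate", "1865-04-15")
people = merge_join(born, died)
names = [n for m in people for n in range_query(PSO, "hasName", m)]
print(names)  # ['Abraham Lincoln']
```

Because each range comes back already sorted on the join variable, the join needs no extra sorting step.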

14
Some Limitations
  • Difficult to handle wildcard queries.
  • Difficult to handle updates.

15
Outline
  • Background & Related Work
  • Overview of gStore
  • Encoding Technique
  • VS-tree & Query Algorithm
  • Experiments
  • Conclusions

16
Intuition of gStore
Finding Matches over a Large Graph is not a
trivial task.
17
Preliminaries
[Figure: RDF graph with entity vertices and literal vertices]
18
Preliminaries
  • RDF graph

19
Preliminaries
  • Query Graph

20
Preliminaries
  • match

21
Preliminaries
  • Problem definition

22
Storage Schema in gStore
Encode all neighbors of a vertex into a bit string, called its signature.
23
Encoding Technique (1)
  • |eSig(e).e| = M.
  • We employ m different string hash functions H_i (i = 1, ..., m).
  • For each hash function H_i, we set the (H_i(eLabel) MOD M)-th bit in eSig(e).e to 1.
  • Encoding eSig(e).n is done the same way:
  • |eSig(e).n| = N.
  • We employ n different hash functions.
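The encoding steps above can be sketched as follows. The slides do not specify the hash functions, so salted SHA-256 stands in for H_i. The values M = 12, N = 16 and two hash functions per part are assumptions. Splitting the neighbor label into 3-grams follows the example on the next slide. Counting bits from the least significant end is a convention chosen for this sketch.

```python
import hashlib


def make_hash(seed):
    """Return a string hash function; salted SHA-256 stands in for H_i."""
    def h(text):
        digest = hashlib.sha256(f"{seed}:{text}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")
    return h


def encode(labels, length, hashes):
    """Set bit (H_i(label) MOD length) for every label and hash function."""
    sig = 0
    for label in labels:
        for h in hashes:
            sig |= 1 << (h(label) % length)
    return sig


def ngrams(text, n=3):
    """Split a label into overlapping n-grams (whole text if shorter than n)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)] or [text]


M, N = 12, 16  # assumed signature lengths
EDGE_HASHES = [make_hash(f"e{i}") for i in range(2)]  # m = 2 (assumed)
NODE_HASHES = [make_hash(f"n{i}") for i in range(2)]  # n = 2 (assumed)


def edge_signature(edge_label, neighbor_label):
    """Return (eSig(e).e, eSig(e).n) as integers of M and N bits."""
    e_part = encode([edge_label], M, EDGE_HASHES)
    n_part = encode(ngrams(neighbor_label), N, NODE_HASHES)
    return e_part, n_part


e_part, n_part = edge_signature("hasName", "Abraham Lincoln")
print(format(e_part, f"0{M}b"), format(n_part, f"0{N}b"))
```

The edge-label part has at most m bits set, while the neighbor part grows denser as the label gets longer.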

24
Encoding Technique (2)
3-grams of "Abraham Lincoln": Abr, bra, rah, aha, ...
3-gram hashes: 0000 0010 0000 0000, 1000 0000 0000 0000, 0000 0000 0100 0000, 0000 0000 0000 0001
OR of the 3-gram hashes: 1000 0010 0100 0001

Edge                                   Edge-label bits   Neighbor bits
(hasName, Abraham Lincoln)             0010 0000 0000    1000 0010 0100 0001
(BornOnDate, 1809-02-12)               0100 0000 0000    0100 0010 0100 1000
(DiedOnDate, 1865-04-15)               0000 1000 0000    0000 0010 0100 0000
(DiedIn, y:Washington_D.C)             0000 0010 0000    1000 0010 0100 0001
OR of all edges (vertex signature)     0110 1010 0000    1100 0010 0100 1001
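The OR step on this slide can be checked mechanically. The sketch below takes the bit strings from the slide and combines them. Pairing each 12-bit edge-label string with the 16-bit string that follows it in the transcript is an assumption; the OR result is the same for any pairing.

```python
from functools import reduce
from operator import or_


def bits(text):
    """Parse a spaced bit string such as '0010 0000 0000' into an int."""
    return int(text.replace(" ", ""), 2)


def show(value, width):
    """Format an int as a bit string in groups of four."""
    raw = format(value, f"0{width}b")
    return " ".join(raw[i:i + 4] for i in range(0, width, 4))


# (edge label, neighbor label) -> (edge-label bits, neighbor bits)
EDGE_SIGNATURES = {
    ("hasName", "Abraham Lincoln"): ("0010 0000 0000", "1000 0010 0100 0001"),
    ("BornOnDate", "1809-02-12"): ("0100 0000 0000", "0100 0010 0100 1000"),
    ("DiedOnDate", "1865-04-15"): ("0000 1000 0000", "0000 0010 0100 0000"),
    ("DiedIn", "y:Washington_D.C"): ("0000 0010 0000", "1000 0010 0100 0001"),
}

label_part = reduce(or_, (bits(e) for e, _ in EDGE_SIGNATURES.values()))
neighbor_part = reduce(or_, (bits(n) for _, n in EDGE_SIGNATURES.values()))

print(show(label_part, 12))     # 0110 1010 0000
print(show(neighbor_part, 16))  # 1100 0010 0100 1001
```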
25
Encoding Technique (3)
26
Encoding Technique (4)
27
Encoding Technique (5)
28
Outline
  • Background & Related Work
  • Overview of gStore
  • Encoding Technique
  • VS-tree & Query Algorithm
  • Experiments
  • Conclusions

29
A Straightforward Solution (1)
[Figure: candidate lists for query vertices u1 and u2. L1: 001, 004, 006. L2: 002, 003, 006.]
30
A Straightforward Solution (2)
Large join space!
[Figure: the candidate lists L1 (001, 004, 006) and L2 (002, 003, 006) are joined.]
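A minimal sketch of this filter-and-join approach. It assumes a data vertex is a candidate for a query vertex when its signature contains every bit set in the query signature. The 4-bit signatures and the data edges are hypothetical values chosen so that the candidate lists come out as on the slide. Edge labels are ignored for brevity.

```python
from itertools import product

# Hypothetical vertex signatures and directed data edges.
SIGNATURES = {
    "001": 0b0101, "002": 0b0010, "003": 0b1010,
    "004": 0b1001, "005": 0b1100, "006": 0b0011,
}
EDGES = {("001", "002"), ("004", "005"), ("006", "003")}

# Hypothetical query: one edge u1 -> u2.
QUERY_SIGNATURES = {"u1": 0b0001, "u2": 0b0010}
QUERY_EDGES = [("u1", "u2")]


def candidates(query_sig):
    """Filter step: keep vertices whose signature covers the query signature."""
    return sorted(v for v, sig in SIGNATURES.items()
                  if sig & query_sig == query_sig)


lists = {u: candidates(sig) for u, sig in QUERY_SIGNATURES.items()}
order = list(lists)

# Join step: every combination of candidates must be checked.
join_space = list(product(*(lists[u] for u in order)))
matches = []
for combo in join_space:
    mapping = dict(zip(order, combo))
    if all((mapping[a], mapping[b]) in EDGES for a, b in QUERY_EDGES):
        matches.append(mapping)

print(lists)            # {'u1': ['001', '004', '006'], 'u2': ['002', '003', '006']}
print(len(join_space))  # 9
print(matches)          # [{'u1': '001', 'u2': '002'}, {'u1': '006', 'u2': '003'}]
```

The join space is the product of the list sizes, so it grows quickly with the number of query vertices.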
31
VS-tree
32
VS-Tree query definition
33
Pruning Technique
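Reduced join space!
[Figure: pruning the candidate lists of query vertices u1 and u2 (vertex IDs shown: 001, 004, 006 and 002, 003, 006; bit string shown: 10010).]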
34
Query Algorithm-Top-Down
35
Optimized method
  • Too many super edges
  • Which level should the search start from?
  • No brute-force enumeration

36
VS-Tree Insert
  • In a plain S-tree (signature tree), the insertion criterion depends only on the Hamming distance between the signature of u and the signature of the tree node.
  • In the VS-tree, the criterion depends on both the node signatures and G's structure.

37
Updates- Insertion in G
38
Updates- Insertion in VS-tree
39
VS-Tree split
  • The B+1 entries of the node are partitioned into two new nodes, where B is the maximal fanout of a node in the VS-tree.
  • 1. We find the two entries with the maximal Hamming distance between them and use them as the two seed nodes.
  • 2. We associate each remaining entry with the nearest seed node, according to Equation 1.
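The two split steps can be sketched as follows. Equation 1 is not reproduced in this transcript, so plain Hamming distance between signatures is used as the nearness measure; that choice, the entry names, and the 8-bit signatures are all assumptions made for illustration.

```python
from itertools import combinations


def hamming(a, b):
    """Number of bit positions in which two signatures differ."""
    return bin(a ^ b).count("1")


def split(entries):
    """Split a dict of entry name -> signature into two groups of names."""
    names = sorted(entries)
    # Step 1: the two entries farthest apart become the seeds.
    seed_a, seed_b = max(
        combinations(names, 2),
        key=lambda pair: hamming(entries[pair[0]], entries[pair[1]]),
    )
    groups = {seed_a: [seed_a], seed_b: [seed_b]}
    # Step 2: every remaining entry joins its nearest seed.
    for name in names:
        if name in groups:
            continue
        nearest = min(
            (seed_a, seed_b),
            key=lambda seed: hamming(entries[name], entries[seed]),
        )
        groups[nearest].append(name)
    return groups[seed_a], groups[seed_b]


# An overflowing node with B + 1 = 5 entries (B = 4 is assumed).
ENTRIES = {
    "e1": 0b11110000,
    "e2": 0b11100000,
    "e3": 0b00001111,
    "e4": 0b00000111,
    "e5": 0b11111000,
}
print(split(ENTRIES))  # (['e1', 'e2', 'e5'], ['e3', 'e4'])
```

The signature of each new node would then be the bitwise OR of its entries' signatures.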

40
VS-Tree deletion
  • Similar to split
  • If some node d has fewer than b entries, where b is the minimal fanout of a node in the VS-tree, then d is deleted and its entries are reinserted into the VS-tree.

41
Updates- Deletion in VS-tree
To be deleted
42
Which Level To Begin
  • We introduce the pruning power of GI with regard to Q, denoted as P(Q, GI).

43
Estimate P(Q,GI)
44
Finding Valid Child States
  • We propose a DFS strategy to find all valid child states of J.
  • The DFS over G starts from some vertex vi.

46
Outline
  • Background & Related Work
  • Overview of gStore
  • Encoding Technique
  • VS-tree & Query Algorithm
  • Experiments
  • Conclusions

47
Datasets
Dataset   Triples      Size
Yago      20 million   3.1 GB
DBLP      8 million    0.8 GB
48
Offline Performance
49
Exact Queries
50
Wildcard Queries
51
Outline
  • Background & Related Work
  • Overview of gStore
  • Encoding Technique
  • VS-tree & Query Algorithm
  • Experiments
  • Conclusions

52
Conclusions
  • Vertex Encoding Technique
  • An Efficient Index Structure: VS-tree
  • A Novel Filtering Technique.

53
Q/A
Thank You!