Lei Zou1, Jinghui Mo1, Lei Chen2, M. Tamer - PowerPoint PPT Presentation

About This Presentation
Title:

Lei Zou1, Jinghui Mo1, Lei Chen2, M. Tamer

Description:

gStore: Answering SPARQL Queries Via Subgraph Matching Lei Zou1, Jinghui Mo1, Lei Chen2, M. Tamer Özsu3, Dongyan Zhao1 1Peking University, 2Hong Kong University of ... – PowerPoint PPT presentation

Slides: 41
Provided by: vldbOrg2
Learn more at: https://www.vldb.org

Transcript and Presenter's Notes

1
gStore: Answering SPARQL Queries Via Subgraph Matching
  • Lei Zou1, Jinghui Mo1, Lei Chen2, M. Tamer
    Özsu3, Dongyan Zhao1

1Peking University, 2Hong Kong University of
Science and Technology, 3University of Waterloo
2
Outline
  • Background & Related Work
  • Overview of gStore
  • Encoding Technique
  • VS-tree & Query Algorithm
  • Experiments
  • Conclusions

4
Semantic Web
Semantic Web technologies are a collection of
standard technologies for realizing a Web of Data.
5
RDF Data Model
URI
Literals
URI
6
RDF Graph
Literal Vertex
Entity Vertex
7
SPARQL Queries
SPARQL Query:
  Select ?name Where {
    ?m <hasName> ?name.
    ?m <BornOnDate> "1809-02-12".
    ?m <DiedOnDate> "1865-04-15". }
Query Graph
8
Subgraph Match vs. SPARQL Queries
9
Naïve Triple Store
SPARQL Query:
  Select ?name Where {
    ?m <hasName> ?name.
    ?m <BornOnDate> "1809-02-12".
    ?m <DiedOnDate> "1865-04-15". }
Too many self-joins
SQL:
  Select T3.Object
  From T as T1, T as T2, T as T3
  Where T1.Predicate = 'BornOnDate' and T1.Object = '1809-02-12'
    and T2.Predicate = 'DiedOnDate' and T2.Object = '1865-04-15'
    and T3.Predicate = 'hasName'
    and T1.Subject = T2.Subject and T2.Subject = T3.Subject
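The self-join query above can be tried out with a minimal sketch like the following. It assumes a single triple table T(Subject, Predicate, Object) in an in-memory SQLite database; the table layout, the variable names, and the handful of rows are illustrative toy data, not the store or dataset used in the talk.

```python
import sqlite3

# Assumption: one three-column triple table, as in the naive triple store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (Subject TEXT, Predicate TEXT, Object TEXT)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?)",
    [
        # Toy rows for illustration only.
        ("y:Abraham_Lincoln", "hasName", "Abraham Lincoln"),
        ("y:Abraham_Lincoln", "BornOnDate", "1809-02-12"),
        ("y:Abraham_Lincoln", "DiedOnDate", "1865-04-15"),
        ("y:Charles_Darwin", "hasName", "Charles Darwin"),
        ("y:Charles_Darwin", "BornOnDate", "1809-02-12"),
        ("y:Charles_Darwin", "DiedOnDate", "1882-04-19"),
    ],
)

# One alias of T per triple pattern, joined on the shared variable ?m.
NAIVE_SQL = """
SELECT T3.Object
FROM T AS T1, T AS T2, T AS T3
WHERE T1.Predicate = 'BornOnDate' AND T1.Object = '1809-02-12'
  AND T2.Predicate = 'DiedOnDate' AND T2.Object = '1865-04-15'
  AND T3.Predicate = 'hasName'
  AND T1.Subject = T2.Subject AND T2.Subject = T3.Subject
"""
names = [row[0] for row in conn.execute(NAIVE_SQL)]
print(names)
```

Each additional triple pattern in the SPARQL query adds one more alias of T and one more join condition, which is where the self-joins come from.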
10
Existing Solutions
  • Three categories of solutions have been proposed to
    speed up query processing:
  • 1. Property Table:
    Jena [K. Wilkinson et al., SWDB 03]
  • 2. Vertically Partitioned Solution:
    SW-store [D. J. Abadi et al., VLDB 07]
  • 3. Exhaustive Indexing:
    RDF-3x [T. Neumann et al., VLDB 08],
    Hexastore [C. Weiss et al., VLDB 08]

11
Existing Solutions: Property Table
SPARQL Query:
  Select ?name Where {
    ?m <hasName> ?name.
    ?m <BornOnDate> "1809-02-12".
    ?m <DiedOnDate> "1865-04-15". }
Fewer join steps
SQL:
  Select People.hasName
  From People
  Where People.BornOnDate = '1809-02-12'
    and People.DiedOnDate = '1865-04-15'
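As a rough illustration of why a property table needs fewer joins, the sketch below stores one row per person with one column per property, again in in-memory SQLite. The People schema mirrors the column names in the SQL above; the Subject column and the rows are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: one wide table per entity class, one column per property.
conn.execute(
    "CREATE TABLE People ("
    "Subject TEXT PRIMARY KEY, hasName TEXT, BornOnDate TEXT, DiedOnDate TEXT)"
)
conn.executemany(
    "INSERT INTO People VALUES (?, ?, ?, ?)",
    [
        # Toy rows for illustration only.
        ("y:Abraham_Lincoln", "Abraham Lincoln", "1809-02-12", "1865-04-15"),
        ("y:Charles_Darwin", "Charles Darwin", "1809-02-12", "1882-04-19"),
    ],
)

# All three triple patterns share the subject ?m, so a single scan of one
# table answers the query without any join.
PROPERTY_TABLE_SQL = """
SELECT People.hasName
FROM People
WHERE People.BornOnDate = '1809-02-12'
  AND People.DiedOnDate = '1865-04-15'
"""
people_names = [row[0] for row in conn.execute(PROPERTY_TABLE_SQL)]
print(people_names)
```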
12
Existing Solutions: Vertically Partitioned Solution
Fast Merge Join
13
Existing Solutions: Exhaustive Indexing
Range query Merge Join
  • Each SPARQL query statement can be translated
    into one range query.
  • SPARQL Query:
  • Select ?name Where { ?m <hasName> ?name.
    ?m <BornOnDate> "1809-02-12".
    ?m <DiedOnDate> "1865-04-15". }
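The idea of turning each statement into a range query can be sketched as follows. This is a simplified stand-in, not the RDF-3x or Hexastore implementation: it keeps only one sorted permutation, (predicate, object, subject), in a Python list, and the helper name range_scan and the toy triples are assumptions. The final step uses a set intersection for brevity where the slide names a merge join.

```python
from bisect import bisect_left

# Toy triples (subject, predicate, object), for illustration only.
triples = [
    ("y:Abraham_Lincoln", "hasName", "Abraham Lincoln"),
    ("y:Abraham_Lincoln", "BornOnDate", "1809-02-12"),
    ("y:Abraham_Lincoln", "DiedOnDate", "1865-04-15"),
    ("y:Charles_Darwin", "hasName", "Charles Darwin"),
    ("y:Charles_Darwin", "BornOnDate", "1809-02-12"),
    ("y:Charles_Darwin", "DiedOnDate", "1882-04-19"),
]

# One sorted permutation of the triples; an exhaustive index keeps all orderings.
pos_index = sorted((p, o, s) for s, p, o in triples)

def range_scan(index, prefix):
    """Return the contiguous run of entries whose leading fields equal `prefix`."""
    k = len(prefix)
    start = bisect_left(index, prefix)  # first entry >= prefix
    end = start
    while end < len(index) and index[end][:k] == prefix:
        end += 1
    return index[start:end]

# Each statement with bound fields becomes one range over the sorted index.
born = {s for _, _, s in range_scan(pos_index, ("BornOnDate", "1809-02-12"))}
died = {s for _, _, s in range_scan(pos_index, ("DiedOnDate", "1865-04-15"))}
named = range_scan(pos_index, ("hasName",))

# Combine the three ranges on the shared subject ?m.
result = sorted(o for _, o, s in named if s in born & died)
print(result)
```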

14
Some Limitations
  • Difficult to handle wildcard queries.
  • Difficult to handle updates.

15
Outline
  • Background & Related Work
  • Overview of gStore
  • Encoding Technique
  • VS-tree & Query Algorithm
  • Experiments
  • Conclusions

16
Intuition of gStore
Finding Matches over a Large Graph is not a
trivial task.
17
Preliminaries
Literal Vertex
Entity Vertex
18
Storage Schema in gStore
Encode all neighbors of a vertex into a bit string,
called its signature.
19
Encoding Technique (1)
Neighbor label "Abraham Lincoln" is split into 3-grams: Abr, bra, rah, aha, ...
Each gram sets one bit:
  0000 0010 0000 0000
  1000 0000 0000 0000
  0000 0000 0100 0000
  0000 0000 0000 0001
  OR = 1000 0010 0100 0001
Each (edge label, neighbor) pair gets an edge-label bit string and a neighbor bit string:
  (hasName, Abraham Lincoln)       0010 0000 0000   1000 0010 0100 0001
  (BornOnDate, 1809-02-12)         0100 0000 0000   0100 0010 0100 1000
  (DiedOnDate, 1865-04-15)         0000 1000 0000   0000 0010 0100 0000
  (DiedIn, y:Washington_D.C)       0000 0010 0000   1100 0010 0100 1001
The rows are combined with OR.
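The encoding on this slide can be approximated with the sketch below. The 12-bit and 16-bit widths follow the bit strings shown on the slide; the hash function (MD5, reduced modulo the width) and all function names are assumptions for illustration, so the bits it produces will not match the slide's values. The last lines re-check the one OR that can be read off the slide directly.

```python
import hashlib

EDGE_BITS = 12   # width of the edge-label bit strings on the slide
NODE_BITS = 16   # width of the neighbor-label bit strings on the slide

def one_bit(text, width):
    """Map a string to a single set bit (stand-in hash, not the authors')."""
    digest = hashlib.md5(text.encode("utf-8")).digest()
    return 1 << (int.from_bytes(digest[:4], "big") % width)

def ngrams(text, n=3):
    """All contiguous n-character substrings of text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def encode_neighbor(label):
    """OR together one bit per 3-gram of the neighbor label."""
    sig = 0
    for gram in ngrams(label):
        sig |= one_bit(gram, NODE_BITS)
    return sig

def encode_edge(edge_label, neighbor_label):
    """Signature pair for one (edge label, neighbor) entry."""
    return one_bit(edge_label, EDGE_BITS), encode_neighbor(neighbor_label)

def encode_vertex(adjacent):
    """OR the signature pairs of all adjacent (edge label, neighbor) entries."""
    edge_sig = node_sig = 0
    for edge_label, neighbor_label in adjacent:
        e, n = encode_edge(edge_label, neighbor_label)
        edge_sig |= e
        node_sig |= n
    return edge_sig, node_sig

lincoln = [
    ("hasName", "Abraham Lincoln"),
    ("BornOnDate", "1809-02-12"),
    ("DiedOnDate", "1865-04-15"),
    ("DiedIn", "y:Washington_D.C"),
]
vertex_sig = encode_vertex(lincoln)

# Check using the slide's own gram bits for "Abraham Lincoln".
slide_grams = [
    0b0000_0010_0000_0000,
    0b1000_0000_0000_0000,
    0b0000_0000_0100_0000,
    0b0000_0000_0000_0001,
]
slide_or = 0
for g in slide_grams:
    slide_or |= g
print(format(slide_or, "016b"))
```

Because a vertex signature is an OR over all of its entries, it always contains every bit of each individual entry, which is the property a signature-based filter relies on.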
20
Encoding Technique (2)
21
Encoding Technique (3)
22
Outline
  • Background & Related Work
  • Overview of gStore
  • Encoding Technique
  • VS-tree & Query Algorithm
  • Experiments
  • Conclusions

23
A Straightforward Solution (1)
[Figure: query vertices u1 and u2 with candidate lists L1 and L2; vertex IDs shown: 001, 004, 006 and 002, 003, 006]
24
A Straightforward Solution (2)
Large join space!
[Figure: candidate lists L1 and L2 to be joined; vertex IDs shown: 001, 004, 006 and 002, 003, 006]
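A minimal sketch of such a straightforward approach, under stated assumptions: every data vertex has a signature, a data vertex is a candidate for a query vertex when it contains all of the query vertex's bits, and the candidate lists are then joined on the query edge. The 4-bit signatures, the edge set, and the names below are made up so that the lists come out shaped like the ones on the slide; the transcript does not give the underlying data.

```python
from itertools import product

# Made-up 4-bit signatures for six data vertices.
data_sig = {
    "001": 0b1011,
    "002": 0b0111,
    "003": 0b0110,
    "004": 0b1001,
    "005": 0b0100,
    "006": 0b1111,
}
# Made-up directed data edges.
data_edges = {("001", "002"), ("004", "005"), ("006", "003")}

def candidates(query_sig, signatures):
    """Scan every data vertex; keep those whose signature covers the query's."""
    return sorted(v for v, sig in signatures.items() if sig & query_sig == query_sig)

# Query: a single edge u1 -> u2 with made-up query signatures.
L1 = candidates(0b1001, data_sig)   # candidates for u1
L2 = candidates(0b0110, data_sig)   # candidates for u2

# The join considers every pair from L1 x L2 and keeps those linked by an edge.
pairs_examined = list(product(L1, L2))
matches = [(a, b) for a, b in pairs_examined if (a, b) in data_edges]
print(L1, L2, len(pairs_examined), matches)
```

The number of pairs grows with the product of the list sizes, which is the large join space the slide points at.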
25
VS-tree
26
Pruning Technique
Reduced join space!
[Figure: query vertices u1 and u2; bit string 10010; vertex IDs shown: 001, 004, 006 and 002, 003, 006]
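The transcript does not preserve how the VS-tree performs this pruning, so the sketch below swaps in a much simpler stand-in: a semi-join style filter that drops a candidate when it has no partner on the other side of the query edge. It is not the VS-tree algorithm; the lists, edges, and the prune helper are made-up illustrations of how shrinking candidate lists shrinks the join space.

```python
def prune(list1, list2, edges):
    """Keep only candidates with at least one partner across the edge u1 -> u2."""
    kept1 = [a for a in list1 if any((a, b) in edges for b in list2)]
    kept2 = [b for b in list2 if any((a, b) in edges for a in kept1)]
    return kept1, kept2

# Made-up candidate lists shaped like those on the slide, and made-up edges.
L1 = ["001", "004", "006"]
L2 = ["002", "003", "006"]
data_edges = {("001", "002"), ("004", "005"), ("006", "003")}

P1, P2 = prune(L1, L2, data_edges)
join_space_before = len(L1) * len(L2)
join_space_after = len(P1) * len(P2)
print(P1, P2, join_space_before, join_space_after)
```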
27
An Example for Pruning Effect
Query:
  ?x1 y:hasGivenName ?x5 .
  ?x1 y:hasFamilyName ?x6 .
  ?x1 rdf:type <wordnet_scientist_110560637> .
  ?x1 y:bornIn ?x2 .
  ?x1 y:hasAcademicAdvisor ?x4 .
  ?x2 y:locatedIn <Switzerland> .
  ?x3 y:locatedIn <Germany> .
  ?x4 y:bornIn ?x3 .
      Before Pruning   After Pruning
x1    810              810
x2    424              197
x3    66               66
x4    36187            6686
28
Query Algorithm: Top-Down
29
Outline
  • Background & Related Work
  • Overview of gStore
  • Encoding Technique
  • VS-tree & Query Algorithm
  • Experiments
  • Conclusions

30
Datasets
Dataset   Triples      Size
Yago      20 million   3.1 GB
DBLP      8 million    0.8 GB
31
Exact Queries
32
Wildcard Queries
33
Outline
  • Background & Related Work
  • Overview of gStore
  • Encoding Technique
  • VS-tree & Query Algorithm
  • Experiments
  • Conclusions

34
Conclusions
  • Vertex Encoding Technique
  • An Efficient Index Structure: VS-tree
  • A Novel Filtering Technique.

35
Q/A
Thank You!
zoulei_at_pku.edu.cn
36
Updates: Insertion in G
37
Updates: Insertion in VS-tree
38
Updates: Deletion in VS-tree
To be deleted
39
Framework in gStore
40
A Straightforward Solution (1)