SPARQLeR: Extended Sparql for Semantic Association Discovery

1
SPARQLeR: Extended Sparql for Semantic
Association Discovery
  • Krzysztof Kochut and Maciej Janik

ESWC 2007, Innsbruck, Austria, June 4, 2007
Work supported by the National Science Foundation
Grant No. IIS-0325464, entitled "SemDIS:
Discovering Complex Relationships in the Semantic
Web."
2
Paths in RDF
[Figure: an example RDF graph with edges labeled child, older, and works_for, showing three kinds of path: a directed path, an undirected path, and an undirected path with specific properties and directionality.]
3
Why are paths interesting ?
  • A path describes how entities are related.
  • Relationships on the path define meaning of this
    connection.
  • Entities on the path specify the content.
  • Do you have migraine? Try taking magnesium!
  • Path discovered by Dr. D. R. Swanson from partial
    information available in PubMed publications:
  • Stress can lead to loss of magnesium in the human
    body.
  • Migraine patients seem to be experiencing stress.
  • That is why migraine could lead to a loss of
    magnesium, so take magnesium to fight migraine!

Swanson, D. R. Migraine and Magnesium: Eleven
Neglected Connections. Perspectives in Biology
and Medicine, 31 (4), 526-557.
4
Formally, what is a simple path ?
  • Simple directed path between resources r_0 and r_n
    in a description base R:
  • a sequence r_0 p_1 r_1 p_2 r_2 ... p_{n-1} r_{n-1} p_n r_n
    (n > 0)
  • r_0 p_1 r_1, r_1 p_2 r_2, ..., r_{n-2} p_{n-1} r_{n-1},
    r_{n-1} p_n r_n (n > 0) are triples in R
  • all of the resources r_i (0 ≤ i ≤ n) in the path
    are distinct
  • Simple undirected path between resources r_0 and
    r_n in R:
  • a sequence r_0 p_1 r_1 p_2 r_2 ... p_{n-1} r_{n-1} p_n r_n
    (n > 0)
  • for each r_{i-1} p_i r_i (0 < i ≤ n) in the path,
    either r_{i-1} p_i r_i or r_i p_i r_{i-1} is a triple in R
  • all of the resources r_i (0 ≤ i ≤ n) in the path
    are distinct
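
The two definitions can be checked mechanically. The following is a minimal Python sketch, not the authors' code: the function name `is_simple_path`, the list encoding of a path, and the sample description base `R` are all assumptions made for illustration.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, property, object)


def is_simple_path(seq: List[str], triples: Set[Triple], directed: bool = True) -> bool:
    """Check a sequence [r0, p1, r1, ..., pn, rn] (n > 0) against a triple set."""
    # The sequence must alternate resources and properties and contain at least one hop.
    if len(seq) < 3 or len(seq) % 2 == 0:
        return False
    resources = seq[0::2]
    properties = seq[1::2]
    # All resources on a simple path are distinct.
    if len(set(resources)) != len(resources):
        return False
    for i, prop in enumerate(properties):
        subj, obj = resources[i], resources[i + 1]
        forward = (subj, prop, obj) in triples
        backward = (obj, prop, subj) in triples
        # A directed path needs the forward triple; an undirected one accepts either.
        if not (forward or (backward and not directed)):
            return False
    return True


# Hypothetical description base R, loosely modeled on the "child" / "works_for" figure.
R = {("a", "child", "b"), ("c", "child", "b"), ("c", "works_for", "d")}

print(is_simple_path(["a", "child", "b"], R))
print(is_simple_path(["a", "child", "b", "child", "c"], R))
print(is_simple_path(["a", "child", "b", "child", "c"], R, directed=False))
```

In this sample, the sequence a, child, b, child, c qualifies only as an undirected path, because the second hop exists in R as (c, child, b).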

5
Paths and SPARQL
  • SPARQL query can express only static graph
    patterns.
  • Some flexibility is introduced by an OPTIONAL
    part, but it does not solve path problems.
  • No support for flexible length path expressions.
  • Glycan biosynthesis pathway in biology has a
    specific pattern (properties), but its length may
    be unknown.
  • Path discovery may involve paths of unknown length
    and pattern, as in Dr. Swanson's example.

6
What do we need to discover paths?
  • Knowledge discovery needs more flexible patterns.
  • Patterns may be partially known or even unknown
    (unrestricted path).
  • Properties on the path, their order and
    directionality create a specific meaning.
  • Entities on the path provide content.
  • Relationships to entities outside of the path
    give an additional context.

7
Proposed extensions
  • A path may have a flexible length
  • For computational reasons, length is limited.
  • Constraints on properties
  • Specific properties must appear in the path.
  • Their order and directionality are meaningful.
  • They can form a repeating pattern.
  • Constraints on resources
  • Specific resources must be on the path.
  • They can be anywhere on the path or at specific
    positions.

8
SPARQLeR
  • Extension of SPARQL for semantic association
    discovery.
  • Seamlessly integrated into the SPARQL syntax.
  • Graph patterns incorporating simple paths with
    constraints.
  • Constraints are based on regular expressions over
    properties.

9
What is a path in SPARQLeR ?
  • Path is a meta-property that connects two
    resources.
  • Defined as a sequence of interleaving properties
    and resources.
  • Starts and ends with properties (endpoint
    resources are not included).
  • A path of length 1 is a sequence with just one
    property.
    <rdf:Class rdf:about="http://meta.org/rdf-meta-schema#Path">
      <rdfs:isDefinedBy rdf:resource="http://meta.org/rdf-meta-schema"/>
      <rdfs:subClassOf rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#Property"/>
      <rdfs:subClassOf rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#Seq"/>
      <rdfs:label>Path</rdfs:label>
      <rdfs:comment>The class of RDFMS paths.</rdfs:comment>
    </rdf:Class>

10
Path patterns in SPARQLeR
  • Meta-property: a concept similar to a property
  • Resource -property-> Resource
  • Resource -path-> Resource
  • Path as a sequence
  • Test if a resource is in the path:
  • rdfs:member
  • Test if a resource is at a specific position in
    the path:
  • rdf:_2, rdf:_4, ...
  • SPARQLeR-specific path properties
  • Test all resources or all properties in the path:
  • rdfms:entityResource and rdfms:propertyResource
  • Example: all resources on a path must be of type
    foo:Person
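
To make the sequence view concrete, here is a small Python sketch of a path stored as an interleaved list that starts and ends with a property. It is an illustration only: the helper names and the sample data are hypothetical, and the mapping of each helper to rdfs:member, rdf:_n, rdfms:entityResource, and rdfms:propertyResource is an assumption about their intended meaning.

```python
from typing import Dict, List, Optional

# A path in the sense of these slides: [p1, r1, p2, r2, ..., pn].
Path = List[str]


def path_length(path: Path) -> int:
    """Number of properties (hops); a single property has length 1."""
    return (len(path) + 1) // 2


def has_member(path: Path, item: str) -> bool:
    """Analogue of `%path rdfs:member item`."""
    return item in path


def element_at(path: Path, n: int) -> Optional[str]:
    """Analogue of `%path rdf:_n ?x` (1-based); None when out of range."""
    return path[n - 1] if 1 <= n <= len(path) else None


def entity_resources(path: Path) -> List[str]:
    """Resources sit at even 1-based positions (rdf:_2, rdf:_4, ...)."""
    return path[1::2]


def property_resources(path: Path) -> List[str]:
    """Properties sit at odd 1-based positions (rdf:_1, rdf:_3, ...)."""
    return path[0::2]


def all_entities_of_type(path: Path, types: Dict[str, str], cls: str) -> bool:
    """Every resource on the path must have the given type."""
    return all(types.get(r) == cls for r in entity_resources(path))


# Hypothetical sample path and type assignments.
path = ["foo:knows", "ex:bob", "foo:worksWith", "ex:carol", "foo:knows"]
types = {"ex:bob": "foo:Person", "ex:carol": "foo:Person"}

print(path_length(path), element_at(path, 2), all_entities_of_type(path, types, "foo:Person"))
```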

11
Path pattern anatomy
[Figure: anatomy of a path pattern, drawn over properties p1, p2, and p3.]
12
Path types in SPARQLeR
  • Directionality of relationships in the path
    defines its specific semantics.
  • SPARQLeR allows definition of the following path
    types:
  • As defined in graph theory:
  • Directed
  • Undirected
  • SPARQLeR-specific extension:
  • Defined directionality path (includes directed
    path)

13
Directionality of properties in path
  • Defined directionality paths
  • Neither directed nor undirected
  • Each property in a path has a specified
    directionality
  • Example: a simple graph with the p relationship
  • (a) X p Y, directed path
  • (b) X p Y, undirected path
  • (c) X ( p p^-1 ) Y, directional path

[Figure: panels (a), (b), and (c) illustrating the three cases on a graph with nodes X and Y and edges labeled p.]
14
Inverse property operator
  • In standard SPARQL there is no need for an inverse
    property operator:
  • Pattern syntax is based on individual statements,
    so it is easy to reverse direction.
  • Defining path constraints requires the inverse
    operator:
  • A path expression defines constraints on
    properties, not on individual statements.
  • Without the inverse property operator some path
    constraints would be impossible to express (as
    shown in the previous example).

15
RegExp in path constraints
  • Path constraints on properties are based on
    regular expressions
  • Uses syntax similar to lex
  • Easy for grep users
  • Examples
  • a c d a (bc) a
  • abc c? d ( b a-1 ) c

16
Path constraints in SPARQLeR
  • Defined as regular path expressions
  • Can specify patterns of properties in the path
  • Directionality requirements need the inverse
    operator: - (minus), as in -p
  • Supported regular expressions:
  • p (single property)
  • -p (the inverse of p)
  • [p1 p2 ... pn] (class of properties)
  • -[p1 p2 ... pn] (class of inverse properties)
  • [^p1 p2 ... pn] (complement of properties)
  • -[^p1 p2 ... pn] (inverse of complement of
    properties)

. (wildcard), x|y (alternative), xy
(concatenation), x* (Kleene star), x+ (one or more
repetitions), (x) (match a path matched by x)
17
Path constraints (contd)
  • Class of properties and inverse operator
  • Complement operator can be applied only to
    defined properties, not their inverses
  • Inverse operator
  • Not allowed inside class of properties
  • Inverses set created from defined properties
  • Example:
  • properties: q, r, s, t
  • [^rt] → q, s
  • -[^qr] → t^-1, s^-1 (inverses)
  • (st t) → q, r, q^-1, r^-1, s^-1
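
The class, complement, and inverse operators can be read as set operations over the defined properties. The Python sketch below checks the first two example results under that reading. The function names and the "-p" string encoding of an inverse property are assumptions for illustration, not SPARQLeR internals.

```python
from typing import Iterable, Set


def inverse(prop: str) -> str:
    """Toggle the leading '-' that marks an inverse property."""
    return prop[1:] if prop.startswith("-") else "-" + prop


def complement(props: Iterable[str], defined: Iterable[str]) -> Set[str]:
    """Complement of a property class, taken over the defined properties only."""
    return set(defined) - set(props)


def inverse_class(props: Iterable[str]) -> Set[str]:
    """Inverse operator applied to a whole class of properties."""
    return {inverse(p) for p in props}


# The four defined properties used in the example.
DEFINED = {"q", "r", "s", "t"}

# Complement of the class {r, t}.
print(sorted(complement({"r", "t"}, DEFINED)))
# Inverse of the complement of {q, r}.
print(sorted(inverse_class(complement({"q", "r"}, DEFINED))))
```

Taking the complement before the inverse matches the rule that the complement operator applies only to defined properties, with the set of inverses then created from the result.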

18
Integrating paths into SPARQL
  • Path variable binds a path
  • Name begins with % instead of ?
  • Simple patterns: path between two resources
  • SELECT ?prop WHERE { <r> ?prop <s> }
  • SELECT %path WHERE { <r> %path <s> }
  • Single source path
  • SELECT %path, ?res WHERE { <r> %path ?res }

19
Integrating paths into SPARQL
  • Resources on the path
  • SELECT %path WHERE { <r> %path <s> . %path
    rdfs:member <e> }
  • SELECT %path WHERE { <r> %path <s> . %path rdf:_1
    <p> }
  • Listing path elements: the list operator
  • SELECT list(%path) WHERE { <r> %path <s> }

20
Expressing path constraints
  • Bounded path length
  • only constants allowed
  • FILTER(length(%path) < 5)
  • FILTER(length(%path) > 3 && length(%path) < 7)

21
Expressing path constraints
  • Constraints added as a regular expression filter
    (existing syntax in SPARQL)
  • regex( pathvariable, pathexpr, pathflags )
  • FILTER(regex(%path, ".foo:prop.", "uis"))
  • Flags: i (instances), s (schema), l (literals),
    h (match using hierarchy), d (set
    directionality), u (undirected)
  • Default flags: d, i

22
Some examples
    SELECT list(%path), ?res WHERE {
      <r> %path ?res .
      %path rdfs:member ?x .
      ?x foo:locatedIn wiki:Europe
      FILTER(regex(%path, "foo:prop")) }

    SELECT list(%path) WHERE {
      <r> %path <s> .
      %path rdfms:entityResource ?x .
      ?x rdf:type foo:Person
      FILTER(regex(%path, "(foo:prop|foo:rel)", "u")) }

    SELECT list(%path) WHERE {
      <r> %path <s>
      FILTER(length(%path) < 6 && length(%path) > 4 &&
             regex(%path, "(foo:prop -foo:rel)")) }

23
SPARQLeR Prototype Implementation
  • Prototype implementation is based on BRAHMS
    RDF/S main memory storage.
  • Path search based on a bi-directional BFS for
    simple paths.
  • Checking of path constraints in regex is
    implemented as a simulation of DFAs.

Janik, M. and Kochut, K. BRAHMS: A WorkBench RDF
Store And High Performance Memory System for
Semantic Association Discovery. ISWC 2005.
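
Checking a path constraint by simulating a DFA can be sketched as follows. This is an illustrative Python sketch, not the prototype's code: the class name `PathDFA` is hypothetical, the automaton is written by hand rather than compiled from a regex, and the expression it encodes, "(foo:prop -foo:rel)+", is an assumed example.

```python
from typing import Dict, Iterable, Optional, Set, Tuple


class PathDFA:
    """Deterministic automaton whose alphabet is directed property names."""

    def __init__(self, start: int, accepting: Set[int], delta: Dict[Tuple[int, str], int]):
        self.start = start
        self.accepting = accepting
        self.delta = delta

    def step(self, state: int, symbol: str) -> Optional[int]:
        """Follow one transition; None means the automaton is stuck."""
        return self.delta.get((state, symbol))

    def accepts(self, properties: Iterable[str]) -> bool:
        """True if the whole property sequence satisfies the constraint."""
        state: Optional[int] = self.start
        for symbol in properties:
            state = self.step(state, symbol)
            if state is None:
                return False
        return state in self.accepting

    def is_viable_prefix(self, properties: Iterable[str]) -> bool:
        """True if no transition is missing so far.

        This is a sound pruning test only when every state can still reach
        an accepting state, which holds for the automaton below.
        """
        state: Optional[int] = self.start
        for symbol in properties:
            state = self.step(state, symbol)
            if state is None:
                return False
        return True


# Hand-built DFA for the assumed expression "(foo:prop -foo:rel)+".
dfa = PathDFA(
    start=0,
    accepting={2},
    delta={(0, "foo:prop"): 1, (1, "-foo:rel"): 2, (2, "foo:prop"): 1},
)

print(dfa.accepts(["foo:prop", "-foo:rel"]))
print(dfa.accepts(["foo:prop"]))
print(dfa.is_viable_prefix(["foo:prop"]))
```

The prefix test is what allows a search to discard a partial path early, before it is known whether the path reaches the other endpoint.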
24
Implementation details
  • Each path expression (FILTER regex) is translated
    into a DFA.
  • For a path between two resources, partial
    constraints are checked while building the search
    trie from both endpoints, using forward and
    reverse DFAs.
  • When a path is connected, the forward DFA is used
    to check the full path constraint.
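
The idea of checking partial constraints while the search grows can be illustrated with a simplified sketch. Note the swap: this is a single-direction breadth-first search from the source only, not the bi-directional search with forward and reverse DFAs described above. The function name `find_paths`, the dictionary encoding of the DFA, and the sample triples are assumptions.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, property, object)


def find_paths(triples: List[Triple], source: str, target: str, max_hops: int,
               start: int, accepting: Set[int],
               delta: Dict[Tuple[int, str], int]) -> List[List[str]]:
    """Enumerate simple paths source..target whose properties the DFA accepts."""
    # Each triple can be walked forward (p) or against its direction (-p).
    adjacency: Dict[str, List[Tuple[str, str]]] = {}
    for subj, prop, obj in triples:
        adjacency.setdefault(subj, []).append((prop, obj))
        adjacency.setdefault(obj, []).append(("-" + prop, subj))

    found: List[List[str]] = []
    # Queue entries: (current node, DFA state, [r0, p1, r1, ...] so far).
    queue = deque([(source, start, [source])])
    while queue:
        node, state, seq = queue.popleft()
        hops = len(seq) // 2
        if node == target and hops > 0:
            if state in accepting:
                found.append(seq)
            continue  # a simple path cannot pass through the target again
        if hops >= max_hops:
            continue
        for symbol, neighbor in adjacency.get(node, []):
            if neighbor in seq[0::2]:
                continue  # keep the path simple: no repeated resources
            next_state = delta.get((state, symbol))
            if next_state is None:
                continue  # partial constraint already violated: prune
            queue.append((neighbor, next_state, seq + [symbol, neighbor]))
    return found


# Hypothetical two-reaction chain: A -> rxn1 -> B -> rxn2 -> C.
triples = [
    ("rxn1", "has_substrate", "A"), ("rxn1", "has_product", "B"),
    ("rxn2", "has_substrate", "B"), ("rxn2", "has_product", "C"),
]
# DFA for repeated "-has_substrate has_product" pairs.
delta = {(0, "-has_substrate"): 1, (1, "has_product"): 0}

paths = find_paths(triples, "A", "C", max_hops=4, start=0, accepting={0}, delta=delta)
print(paths)
```

Each reaction contributes two hops here, one against the direction of its substrate property and one along its product property, so a chain of two reactions needs a bound of at least four hops.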

25
Experiments: biology pathway
  • Biosynthesis paths in biology (glycomics)
  • How is a specific glycopeptide created from a
    basic structure?
  • Find the pathway between dolichol phosphate and
    glycopeptide G00009
  • The path has 15 reactions (30 hops, as each
    reaction is represented by its substrates and
    products)
  • Only an undirected path connects the endpoint
    resources, but a specific directionality pattern
    is present
  • RDF representation: sample reactions in
    the path

26
Experiments: biology pathway
  • Functionality test - proof of concept
  • N-glycan biosynthesis pathway

    SELECT list(%path) WHERE {
      glyco:dolichol_phosphate %path glyco:glyco_peptide_G00009 .
      %path rdfs:member enzyo:R05969
      FILTER ( length(%path) < 30 && regex(%path,
        "((-glyco:has_acceptor_substrate -glyco:has_reactant) glyco:has_product)" ) ) }

Ontology: GlycO
Length: 30 hops (consists of 15 reactions)
Search time: milliseconds (less than 1 tick)
Courtesy of Dr. Alison Vandersall-Nairn,
University of Georgia
27
Experiments
  • Scalability
  • Modified DBLP datasets in RDF (added random
    citations)
  • Test on increasing dataset (adding older years of
    publications)
  • Search for cited publications (transitive)
    PREFIX opus: <http://lsdis.cs.uga.edu/projects/semdis/opus#>
    SELECT ?end_publication WHERE {
      <http://dblp.uni-trier.de/rec/bibtex/journals/ai/Huber06> %path ?end_publication
      FILTER ( length(%path) < 26 && regex(%path,
        "(opus:cites_publication)" ) ) }

B. Aleman-Meza et al. Semantic Analytics on
Social Networks: Experiences in Addressing the
Problem of Conflict of Interest Detection.
WWW 2006.
28
Experiments: dataset characteristics
29
Experiments: results for single source paths
Search paths up to length 26
30
Experiments: results for two endpoint paths
31
More complex uses of path expressions
  • Discover connecting paths with a shared node
  • Path between A and B, length up to 4
  • Path between C and D, length up to 4
  • Both paths have a shared resource

A %path_1 B, length(%path_1) < 4
C %path_2 D, length(%path_2) < 4
%path_1 rdfs:member ?x . %path_2 rdfs:member ?x
Potential subgraph discovery
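
The shared-node pattern amounts to a join between two sets of candidate paths on a common resource. Below is a small Python sketch under stated assumptions: paths are given as interleaved lists [p1, r1, ..., pn] that exclude the endpoints, and the function names and sample paths are hypothetical.

```python
from typing import Iterator, List, Set, Tuple

Path = List[str]  # [p1, r1, p2, r2, ..., pn], endpoints not included


def length(path: Path) -> int:
    """Number of properties (hops) on the path."""
    return (len(path) + 1) // 2


def resources(path: Path) -> Set[str]:
    """Intermediate resources, i.e. the candidates for the shared ?x."""
    return set(path[1::2])


def intersecting(paths_ab: List[Path], paths_cd: List[Path],
                 bound: int) -> Iterator[Tuple[Path, Path, Set[str]]]:
    """Yield pairs of paths shorter than `bound` that share a resource."""
    for p1 in paths_ab:
        if length(p1) >= bound:
            continue
        for p2 in paths_cd:
            if length(p2) >= bound:
                continue
            shared = resources(p1) & resources(p2)
            if shared:
                yield p1, p2, shared


# Hypothetical candidate paths between A and B, and between C and D.
paths_ab = [["p", "x", "q"], ["p", "y", "q", "z", "p"]]
paths_cd = [["r", "x", "r"], ["r", "w", "r"]]

pairs = list(intersecting(paths_ab, paths_cd, bound=4))
print(pairs)
```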
32
SPARQLeR summary
  • Path expressions
  • use of regular expressions over properties
  • Flexible path specification
  • Undirected
  • Defined directionality paths
  • Directed
  • Length restricted
  • Complex path patterns
  • Test of resources and properties on the path
  • Intersecting paths

33
Conclusion and future work
  • SPARQLeR extension fits seamlessly into the
    current SPARQL syntax.
  • Performance of path queries is acceptable (if
    defined expression is highly selective).
  • Optimization of path queries, complex expressions
    and multiple paths in query.
  • Inclusion of context.

34
SPARQLeR Krys Kochut, Maciej Janik
  • Thank you

35
Predicate Vs. Statement expressions
  • Predicate alphabet
  • p
  • -p
  • _ (wildcard)
  • simplicity
  • Statement alphabet
  • s p o
  • _ p o
  • s _ o
  • s p _
  • _ _ o
  • _ p _
  • s _ _
  • _ _ _

Additional rules: which statement pattern can
be connected with which one