Title: The graph-based data model
1 - The graph-based data model
- Storing and manipulating data in distributed graphs
- (Using RDF and Jena to put the SparQL in your smile, and the Twinkle in your eye, and D2R too)
- Michael Grobe
- Biomedical Applications Group
- Research Technologies
- University Information Technology Services
- Indiana University
2 - This presentation in perspective
- This is actually one of a series of presentations on Linked Data Web and graph database technologies:
- - Introduction to ontologies
- - This presentation on RDF, Jena, SparQL, etc.
- - OWL and inference over ontologies
- - Using graph technologies in bioinformatics research
- In general, these topics appear simple, but are fraught with complications, limitations, and qualifications, especially when the casual user attempts to compare them with relational data approaches to the same or similar problems.
- In addition, this is a pretty big elephant surrounded by a lot of blind men.
- As a result, this presentation is a survey of concrete examples of basic components used to manipulate data stored as graphs, or appearing to have been stored in graphs. It will use the Gene Ontology for some of these examples.
3 - Table of Contents
- Using graphs to represent data
- Using RDF to represent graphs
- Jena: a Java class library for manipulating RDF
- Using SparQL graph templates to query RDF
- Using Twinkle to make SparQL queries
- Using iSPARQL graphical graph templates to query RDF
- Exposing relational data as RDF
- Thinking of SparQL queries as SQL
- Table of Non-contents
- OWL and inference over ontologies
- Using the Semantic Web in bioinformatics research
4 - Using graphs to represent data
- Here are 2 graphs that represent 2 kinds of information associated with 4 different persons.
- [Figure: Graph 1, Person ages; Graph 2, Favorite friends]
5 - Using graphs to represent data
- Here the 2 graphs are combined, using named edges, to represent 2 kinds of information associated with the same 4 persons.
- [Figure: Graph 3, Person ages (age) and favorite friends (fav)]
- Read these links as "Smith has age 21" or "Jones has favorite friend Smith" to make them more sentence-like. Each arc is like the predicate of a sentence, connecting a subject with an object. (Note that a subject may have any number of arcs of each type.)
6 - Using graphs to represent data
- Data is sometimes represented using so-called blank nodes to help cluster attributes together.
- [Figure: Graph 4, blank nodes linking a name, an age, and a favorite friend via arcs named name, age, and fav]
- Blank nodes are useful for specifying lists of items, but are discouraged within the Semantic Web. Use (dereferenceable) URIs (like http://www.iu.edu/) whenever possible.
7 - Using URIs and URLs to represent data
- Now, if it hadn't already happened, someone could come up with the idea of using URLs to point to Web documents that describe the exact meaning of each edge.
- For example, some popular magazine could publish its definition of "favorite friend" on a page like
- http://CelebrityMagazine.com/fav
- and other documents could define BFF, long-time-friend, family-friend, etc. And, in fact, these definitions could themselves refer to other definitions, like some superset of relationships such as
- http://SomeCelebrityMagazine.com/personal_relationships
- or the personal_relationships file itself could include a collection of definitions, including "favorite friend", or "fav", that we might refer to as
- http://SomeCelebrityMagazine.com/personal_relationships#fav
- using the "#" convention for targeting a specific location within a URL.
- Of course, for a lot of applications this would all be unnecessary: some URI could just be used to indicate an edge type known to the file creator.
8 - Using RDF-XML to serialize graphs
- Graphs can be serialized, or represented in a textual format. When graphs are serialized, each connection is represented by 3 components, a so-called RDF triple. Each triple is composed of a subject, a predicate, and an object, where each edge between a pair of entities becomes a named predicate.
- Each subject is represented as
- - a blank node, such as _:2, or
- - a URI, like http://fake.host.edu/smith
- Each object is represented as
- - a blank node,
- - a literal value, such as "value"^^type, where type is some URI that defines a data type, as in "21"^^xsd:integer, or
- - a URI
- Each predicate is represented as
- - a URI, like http://fake.host.edu/contact-schema#fav, or an abbreviated URI like example:age, which represents a URI that will be expanded by substituting a value for the string "example". If the
9 - Graph 3 as a set of 12 triples (3 for each person)
    -------------------------------------
    Subject   Predicate      Object
    -------------------------------------
    Blake     example:fav    Blake
    Blake     example:age    "12"
    Blake     example:name   "Blake"

    Jones     example:fav    Smith
    Jones     example:age    "35"
    Jones     example:name   "Jones"

    George    example:fav    Smith
    George    example:age    "21"
    George    example:name   "George"

    Smith     example:fav    Jones
    Smith     example:age    "21"
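As a rough illustration, the table above can be held in memory as a list of subject/predicate/object records and scanned like any other collection. The sketch below is plain Java with no Jena; the `Triple` record, the `Graph3` class and its `fansOf` helper are hypothetical names, and Smith's `example:name` triple, implied by the 3-per-person count, is included.

```java
import java.util.List;

// Hypothetical plain-Java stand-in for an RDF triple (not a Jena class).
record Triple(String s, String p, String o) {}

class Graph3 {
    // Graph 3 as 12 triples: 4 persons x 3 predicates (fav, age, name).
    // Literal objects keep their quotes so they are distinguishable from node names.
    static List<Triple> triples() {
        return List.of(
            new Triple("Blake",  "example:fav",  "Blake"),
            new Triple("Blake",  "example:age",  "\"12\""),
            new Triple("Blake",  "example:name", "\"Blake\""),
            new Triple("Jones",  "example:fav",  "Smith"),
            new Triple("Jones",  "example:age",  "\"35\""),
            new Triple("Jones",  "example:name", "\"Jones\""),
            new Triple("George", "example:fav",  "Smith"),
            new Triple("George", "example:age",  "\"21\""),
            new Triple("George", "example:name", "\"George\""),
            new Triple("Smith",  "example:fav",  "Jones"),
            new Triple("Smith",  "example:age",  "\"21\""),
            new Triple("Smith",  "example:name", "\"Smith\""));
    }

    // Counts the persons whose example:fav arc points at the given person.
    static long fansOf(List<Triple> graph, String person) {
        return graph.stream()
            .filter(t -> t.p().equals("example:fav") && t.o().equals(person))
            .count();
    }
}
```

For example, `Graph3.fansOf(Graph3.triples(), "Smith")` walks every triple once and counts the two `example:fav` arcs (from Jones and George) that end at Smith.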
10 - Two ways to represent the Graph 3 triples using RDF-XML
- Properties encoded as XML elements
    <rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:example="http://fake.host.edu/example-schema#">
      <example:Person>
        <example:name>Smith</example:name>
        <example:age>21</example:age>
        <example:fav>Jones</example:fav>
      </example:Person>
    </rdf:RDF>
- Properties encoded as XML attributes
    <rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:example="http://fake.host.edu/example-schema#">
      <rdf:Description example:name="Smith"
                       example:age="21"
                       example:fav="Jones">
      </rdf:Description>
    </rdf:RDF>
11 - Representing URIs
- In work with RDF you will see URIs abbreviated in several ways, using namespace, PREFIX and ENTITY definitions, depending on the context:
    xmlns:lib="http://some.host.edu/directory"
- or
    PREFIX lib: <http://some.host.edu/directory>
- or
    <!ENTITY lib "http://some.host.edu/directory">
- If the namespace abbreviations in the first (XML elements) example above are expanded, then Smith is actually being represented as
    <rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <http://fake.host.edu/example-schema#Person>
        <http://fake.host.edu/example-schema#name>
          Smith
        </http://fake.host.edu/example-schema#name>
        <http://fake.host.edu/example-schema#age>
          21
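Prefix expansion is essentially string substitution. The helper below is a hypothetical sketch, not part of any RDF library; it assumes the `example` namespace ends in `#` and leaves unknown prefixes (including full `http:` URIs) untouched.

```java
import java.util.Map;

class PrefixExpander {
    // Hypothetical prefix table; the "example" namespace is assumed to end in '#'.
    static final Map<String, String> PREFIXES = Map.of(
        "rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
        "example", "http://fake.host.edu/example-schema#");

    // Expands "prefix:local" into namespace + local. Input without a colon,
    // or with an unknown prefix (such as "http"), is returned unchanged.
    static String expand(String qname) {
        int colon = qname.indexOf(':');
        if (colon < 0) return qname;
        String ns = PREFIXES.get(qname.substring(0, colon));
        return ns == null ? qname : ns + qname.substring(colon + 1);
    }
}
```

Here `PrefixExpander.expand("example:age")` yields `http://fake.host.edu/example-schema#age`, the same full URI that appears as the element name in the expanded form above.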
12 - Graph 3 using resources to represent each person
- Persons are modeled as resources by replacing the strings for each node identifier with URIs
    -------------------------------------------------------------------------------
    Subject                          Predicate      Object
    -------------------------------------------------------------------------------
    <http://fake.host.edu/blake>     example:fav    <http://fake.host.edu/blake>
    <http://fake.host.edu/blake>     example:age    "12"
    <http://fake.host.edu/blake>     example:name   "Blake"

    <http://fake.host.edu/jones>     example:fav    <http://fake.host.edu/smith>
    <http://fake.host.edu/jones>     example:age    "35"
    <http://fake.host.edu/jones>     example:name   "Jones"

    <http://fake.host.edu/george>    example:fav    <http://fake.host.edu/smith>
    <http://fake.host.edu/george>    example:age    "21"
    <http://fake.host.edu/george>    example:name   "George"
13 - Representing entries in Graph 3 as resources
- Format 1
    <rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:example="http://fake.host.edu/example-schema#">
      <example:Person rdf:about="http://fake.host.edu/smith">
        <example:name>Smith</example:name>
        <example:age>21</example:age>
        <example:fav rdf:resource="http://fake.host.edu/jones" />
      </example:Person>
    </rdf:RDF>
- Format 2
    <rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:example="http://fake.host.edu/example-schema#">
      <rdf:Description rdf:about="http://fake.host.edu/smith"
                       example:name="Smith"
                       example:age="21">
        <example:fav rdf:resource="http://fake.host.edu/jones" />
      </rdf:Description>
    </rdf:RDF>
- Note that the resource URI references in this example are not real documents: they are not dereferenceable.
14 - A person record using FOAF (from Obitko)
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:foaf="http://xmlns.com/foaf/0.1/"
             xmlns="http://www.example.org/joe/contact.rdf#">
      <foaf:Person rdf:about="http://www.example.org/joe/contact.rdf#joesmith">
        <foaf:mbox rdf:resource="mailto:joe.smith@example.org"/>
        <foaf:homepage rdf:resource="http://www.example.org/joe/"/>
        <foaf:family_name>Smith</foaf:family_name>
        <foaf:givenname>Joe</foaf:givenname>
      </foaf:Person>
    </rdf:RDF>
15 - RDF summary and implications for the Semantic Web
- A graph may be represented as a collection of triples.
- RDF-XML representations of graphs will contain URIs that
- - serve to identify and/or reference syntactic elements (they define tag names), and
- - identify and/or name resources: subjects, predicates and/or objects.
- Such URIs may be imaginary, or may provide the addresses of actual, dereferenceable Web documents in possibly remote locations.
- This can result in a Gigantic Global Graph, usually known as the Linked Data Web or the Semantic Web, with RDF as one of the W3C's Semantic Web architectural levels.
- "If HTML and the Web make all online documents look like one huge book, RDF, schema, and inference languages will make all the data in the world look like one huge database." (TimBL)
- Editor's note: Here TimBL is using the term "schema" to refer to an RDF schema, which defines RDF triples much more loosely than a relational database schema defines a collection of tables in a database.
16 - RDF graphs may be interrogated
- - by physical inspection (for anyone willing to read XML),
- - by writing programs that read RDF files, construct the represented graphs internally, and then
- - - access graph triples in sequential order,
- - - select triples according to specified content, and/or
- - - apply SparQL queries and access the results in sequential order,
- - using command-line tools that apply SparQL queries,
- - using GUI interfaces accepting SparQL queries
- - - written in text, or
- - - represented graphically, or
- - via URLs carrying form data, or SOAP requests, to SparQL endpoints.
17 - How to query an RDF graph using Jena
- The Java-based Jena package from HP Labs allows users to manipulate and query graphs, import/export RDF, etc.
- You can write a program that uses Jena classes to
- - retrieve and parse an RDF file containing a graph or a collection of graphs,
- - store it in memory, and then
- - examine each triple in turn, examine one component (say, the subject) of each triple in turn, or examine only triples that meet specified criteria.
- For example, one might examine each stored triple searching for a specific reference URI, or for a specific literal value.
- One might look for persons of a specific age, such as "21"^^xsd:integer, in the object position of each triple.
- Jena also provides support for inference using rule sets, and for querying via SparQL.
18 - Jena example
- In Jena, RDF nodes can have type Resource, URI Resource, Literal, or anonymous (a slight extension to standard RDF).
- A Jena model is created by a factory:
    Model m = ModelFactory.createDefaultModel();
- A Jena ontology model is a model along with a reasoner (sic):
    OntModel m = ModelFactory.createOntologyModel();
- Jena can
- - read in an RDF serialized graph (from a file, URL, etc.),
- - write a serialized model to a file or STDOUT, and
- - perform standard operations on the model. For example, given the populated models m and n, Jena can then do
    Model x = m.add( n );  // Union: adds all of n's statements to m and returns m
                           // (m.union( n ) would instead return a new model)
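19 - Reading and writing a model in Jena
    // Jena 2.x package names
    import com.hp.hpl.jena.rdf.model.Model;
    import com.hp.hpl.jena.rdf.model.ModelFactory;
    import com.hp.hpl.jena.util.FileManager;
    import java.io.InputStream;

    String inputFileName = "Some-GO-entries-diddled.rdf";
    Model model = ModelFactory.createDefaultModel();
    InputStream in = FileManager.get().open( inputFileName );
    if ( in == null ) {
        throw new IllegalArgumentException( "File: " + inputFileName + " not found" );
    }
    model.read( in, "" );                   // "" is the base URI for resolving relative URIs
    model.write( System.out, "N-TRIPLE" );  // or "RDF/XML" or "RDF/XML-ABBREV",
                                            // yielding N-Triple, RDF/XML, or abbreviated RDF/XML output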
20 - Canonical process to examine each triple in a model
    StmtIterator iterator = model.listStatements();  // statements are triples
    while ( iterator.hasNext() ) {
        Statement statement = iterator.nextStatement();
        Resource subject = statement.getSubject();
        Property predicate = statement.getPredicate();
        // The object may be a Resource or a Literal, so it is kept in an
        // RDFNode, the common supertype of Resource and Literal.
        RDFNode object = statement.getObject();
        // Now process the object; here it is just printed.
        System.out.print( subject.toString() );
        System.out.print( " " + predicate.toString() + " " );
        if ( object instanceof Resource ) {
            // it's a resource
            System.out.print( object.toString() );
        } else {
            // it's a literal, so quote it
            System.out.print( "\"" + object.toString() + "\"" );
        }
        System.out.println( " ." );
    }
21 - Statement iterators for accessing selected components
- There are several methods for creating iterators over a model:
- - Some simply list the components of each triple:
    model.listSubjects()
    model.listObjects()
- - Some compare a specific component with a specified value, as in
    model.listSubjectsWithProperty( Property p, RDFNode o )
  (which will get you a collection of subjects possessing property/predicate p with the specific value o)
- - Some compare all components against specific values, in 2 steps:
- - - define a selector possessing specific values s, p and o, where null or (RDFNode) null matches anything:
    Selector selector = new SimpleSelector( subject, predicate, object );
- - - then list the statements it matches:
    StmtIterator iterator = model.listStatements( selector );
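To show what a selector with null wildcards does, here is a plain-Java analogue over an in-memory list. It is not Jena's `SimpleSelector`; `Triple`, `Selector` and `SelectorDemo.listStatements` are hypothetical names chosen to echo the Jena ones.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical plain-Java triple (not a Jena class).
record Triple(String s, String p, String o) {}

// Analogue of a selector: each non-null component must match exactly,
// and a null component acts as a wildcard that matches anything.
record Selector(String s, String p, String o) {
    boolean test(Triple t) {
        return (s == null || s.equals(t.s()))
            && (p == null || p.equals(t.p()))
            && (o == null || o.equals(t.o()));
    }
}

class SelectorDemo {
    // Returns every triple in the "model" that the selector accepts.
    static List<Triple> listStatements(List<Triple> model, Selector sel) {
        return model.stream().filter(sel::test).collect(Collectors.toList());
    }
}
```

A selector such as `new Selector(null, "example:age", "21")` then plays the role of "persons of age 21", while `new Selector(null, null, null)` returns every triple.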
22 - SparQL: a graph-based query language
- SparQL is a language that lets users query RDF graphs . . . using graph patterns (written in N3) containing variables.
- The query engine returns an exhaustive list of variable bindings, one for each way the pattern can be matched against triples in the graph by substituting values for its variables (a.k.a. query by example, QBE).
- "This process is not always intuitive, and/or SQL has perverted the minds of a generation of programmers" (J. Random Guy somewhere on the Web).
- SparQL is implemented in Jena through the ARQ package, and queries may be made from within Java programs (McCarthy, 2005), or via a SparQL client distributed with Jena. The process to make a query is to
- - build a query in a .rq file, and
- - execute the query using
    sparql --query filename.rq
  or
    sparql.bat --query filename.rq
- SparQL does not do inference (except when used within Jena against an ontology model).
23 - A SparQL example
- This SparQL example query simply asks for a list of the first 10 triples in the file specified in the FROM clause:
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX example: <http://fake.host.edu/example-schema#>
    select $s $p $o
    from <http://kongo.uits.iupui.edu:8546/rdf-example-1.rdf>
    where {
      $s $p $o .
    }
    LIMIT 10
- $s, $p, and $o are variable names that will each be assigned a value as the query is satisfied. Variable names may also start with ?.
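Matching a single triple pattern amounts to trying every triple and recording the variable bindings that make it fit. The sketch below is a toy matcher, not ARQ; it assumes terms are plain strings and treats `?x` and `$x` as the same variable, as SparQL does. A variable used twice in one pattern must bind to the same value both times.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java triple (not a Jena/ARQ class).
record Triple(String s, String p, String o) {}

class PatternMatcher {
    // SparQL variables start with '?' or '$'; both spellings name one variable.
    static boolean isVar(String term) {
        return term.startsWith("?") || term.startsWith("$");
    }

    // Matches one triple pattern against every triple, returning one binding
    // map (variable name without its prefix -> value) per matching triple.
    static List<Map<String, String>> match(List<Triple> graph,
                                           String ps, String pp, String po) {
        List<Map<String, String>> results = new ArrayList<>();
        for (Triple t : graph) {
            Map<String, String> b = new HashMap<>();
            if (bind(b, ps, t.s()) && bind(b, pp, t.p()) && bind(b, po, t.o())) {
                results.add(b);
            }
        }
        return results;
    }

    // A constant must equal the value; a variable binds it, or, if it is
    // already bound in this pattern, must have been bound to the same value.
    private static boolean bind(Map<String, String> b, String term, String value) {
        if (!isVar(term)) return term.equals(value);
        String previous = b.putIfAbsent(term.substring(1), value);
        return previous == null || previous.equals(value);
    }
}
```

With the Graph 3 triples, `match(graph, "$s", "$p", "$o")` returns one binding per triple (the LIMIT would then keep the first 10), while `match(graph, "?x", "example:fav", "$x")` finds only people who are their own favorite friend.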
24 - SparQL: a graph-based query language
- The basic, partial syntax of a SparQL query is based on N3 (Turtle) and is similar to
    BASE <some URI against which relative FROM and PREFIX entries will be resolved>
    PREFIX prefix_abbreviation: <some_URI>
    SELECT some_variable_list
    FROM <some_RDF_source>
    WHERE {
      some_triple_pattern .
      ...
    }
- Notes
- - the < and > characters are required literally,
- - the BASE and PREFIX entries are optional, and BASE applies to relative
25 - Querying Graph 3 format 1 using SparQL
- Here's a reminder of one of the representations used to store Graph 3, here stored in a file named rdf-example-1.rdf
    <?xml version="1.0" encoding="UTF-8"?>
    <rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:example="http://fake.host.edu/example-schema#">
      <example:Person rdf:about="http://fake.host.edu/smith">
        <example:name>Smith</example:name>
        <example:age>21</example:age>
        <example:fav rdf:resource="http://fake.host.edu/jones" />
      </example:Person>
    </rdf:RDF>
26 - A SparQL query against the first data representation
    C:\Jena-2.5.7\Jena-2.5.7\bat> cat query-example-1.rq
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX example: <http://fake.host.edu/example-schema#>
    select *
    from <http://kongo.uits.iupui.edu:8546/smiht-format-1.rdf>
    where {
      $s $p $o .
    }

    C:\Jena-2.5.7\Jena-2.5.7\bat> sparql.bat --query query-example-1.rq
    ------------------------------------------------------------------------------
    s                              p              o
    <http://fake.host.edu/smith>   example:fav    <http://fake.host.edu/jones>
    <http://fake.host.edu/smith>   example:age    "21"
27 - Querying Graph 3 format 2 using SparQL
- Here's a reminder of the other representation of Graph 3, stored in a file named rdf-example-2.rdf
    <?xml version="1.0" encoding="UTF-8"?>
    <rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:example="http://fake.host.edu/example-schema#">
      <example:Person rdf:about="http://fake.host.edu/smith"
                      example:name="Smith"
                      example:age="21">
        <example:fav rdf:resource="http://fake.host.edu/jones" />
      </example:Person>
    </rdf:RDF>
28 - The same SparQL query against the second data representation
    C:\Jena-2.5.7\Jena-2.5.7\bat> cat query-example-2.rq
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX example: <http://fake.host.edu/example-schema#>
    select *
    from <http://kongo.uits.iupui.edu:8546/smith-format-2.rdf>
    where {
      $s $p $o .
    }

    C:\Jena-2.5.7\Jena-2.5.7\bat> sparql.bat --query query-example-2.rq
    ------------------------------------------------------------------------------
    s                              p              o
    <http://fake.host.edu/smith>   example:fav    <http://fake.host.edu/jones>
    <http://fake.host.edu/smith>   example:age    "21"
29 - A distributed SparQL query against 4 separate RDF files
- The next query searches 4 dereferenceable files holding live data in the first representation format above:
    C:\Jena-2.5.7\Jena-2.5.7\bat> cat query-example-all.rq
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX example: <http://fake.host.edu/example-schema#>
    select *
    from <http://kongo.uits.iupui.edu:8546/smith>
    from <http://kongo.uits.iupui.edu:8546/jones>
    from <http://kongo.uits.iupui.edu:8546/george>
    from <http://kongo.uits.iupui.edu:8546/blake>
    where {
      $s $p $o .
    }
30 - Results of the distributed SparQL query
    C:\Jena-2.5.7\Jena-2.5.7\bat> sparql.bat --query query-example-all.rq
    -------------------------------------------------------------------------------------------
    s                                      p              o
    <http://kongo.uits.iupui.edu/blake>    example:fav    <http://kongo.uits.iupui.edu/blake>
    <http://kongo.uits.iupui.edu/blake>    example:age    "12"
    <http://kongo.uits.iupui.edu/blake>    example:name   "Blake"
    <http://kongo.uits.iupui.edu/blake>    rdf:type       example:Person
    <http://kongo.uits.iupui.edu/jones>    example:fav    <http://kongo.uits.iupui.edu/smith>
    <http://kongo.uits.iupui.edu/jones>    example:age    "35"
    <http://kongo.uits.iupui.edu/jones>    example:name   "Jones"
    <http://kongo.uits.iupui.edu/jones>    rdf:type       example:Person
    <http://kongo.uits.iupui.edu/george>   example:fav    <http://kongo.uits.iupui.edu/smith>
    <http://kongo.uits.iupui.edu/george>   example:age    "21"
    <http://kongo.uits.iupui.edu/george>   example:name   "George"
    <http://kongo.uits.iupui.edu/george>   rdf:type       example:Person
31 - The magic of ontologies
There are many definitions of ontology, but in very general terms, an ontology may be thought of as a taxonomy of objects (or concepts) based on a particular relationship between pairs of those objects (or concepts).

A common example of a taxonomy is an evolutionary tree, in which individual species are related on the basis of evolutionary descent. That is, one species of each pair connected by an edge descended from the other. (Actually, it's the members of the species who evolve, but . . .) Within such structures no member is considered to have descended from more than one immediate species.

Within an ontology, however, an object or concept may have more than one immediate parent, and no circular sub-graphs are allowed, so the resulting structure is a Directed Acyclic Graph (DAG).

An ontology can be represented by a special RDF graph. It is special in that the predicates convey transitivity: if A is a descendant of B, and B is a descendant of C, then A is a descendant of C. IS_A and PART_OF relationships are commonly used to build ontologies.
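Transitivity means that only the direct is_a edges need to be stored; every further ancestor is implied. A minimal sketch of computing those implied ancestors, assuming the DAG is held as a hypothetical map from each term to its direct parents (the class name and map layout are illustrative, not taken from any GO tool):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class IsAClosure {
    // Breadth-first walk up the is_a edges from the given term. Because a
    // term may have several parents (a DAG, not a tree), the "seen" set
    // keeps an ancestor reachable by two routes from being visited twice.
    static Set<String> ancestors(Map<String, List<String>> parents, String term) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> todo = new ArrayDeque<>(parents.getOrDefault(term, List.of()));
        while (!todo.isEmpty()) {
            String t = todo.pollFirst();
            if (seen.add(t)) {
                todo.addAll(parents.getOrDefault(t, List.of()));
            }
        }
        return seen;
    }
}
```

For a toy DAG where A is_a B, A is_a C, and both B and C are_a D, the ancestors of A are B, C and D, with D reported once even though two paths lead to it.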
32 - Here is a portion of the GO is_a DAG (Ashburner, 2004) for molecular function (example: chromatin binding is_a DNA binding)
[Figure: portion of the GO molecular function is_a DAG]
(Note that this diagram shows some genes, but the Gene Ontology is actually a taxonomy of terms that can be used to describe or annotate genes, rather than a taxonomy of genes.)
33 - Here's the first entry (of the 26K) in the GO text version (with all three parts intermixed)
    [Term]
    id: GO:0000001
    name: mitochondrion inheritance
    namespace: biological_process
    def: "The distribution of mitochondria, including the mitochondrial genome, into daughter cells after mitosis or meiosis, mediated by interactions between mitochondria and the cytoskeleton." [GOC:mcc, PMID:10873824, PMID:11389764]
    synonym: "mitochondrial inheritance" EXACT []
    is_a: GO:0048308 ! organelle inheritance
    is_a: GO:0048311 ! mitochondrion distribution
- You can also get the GO as RDF XML, or as a MySQL database. A portion of the molecular function extract on the previous page is shown in RDF XML on the next page.
34 - Part of the molecular function extract in RDF XML
    <?xml version="1.0" encoding="UTF-8"?>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:go="http://www.geneontology.org/dtds/go.dtd#">
      <!-- Note: "all" is like the root. -->
      <go:term rdf:about="http://www.geneontology.org/go#all">
        <go:accession>all</go:accession>
        <go:name>all</go:name>
        <go:definition>This term is the most general term possible</go:definition>
      </go:term>
      <go:term rdf:about="http://www.geneontology.org/go#GO:0003674">
        <go:accession>GO:0003674</go:accession>
        <go:name>molecular_function</go:name>
        <go:synonym>GO:0005554</go:synonym>
        <go:synonym>molecular function</go:synonym>
        <go:definition>Elemental activities, such as catalysis or binding, describing the actions of a gene product at the molecular level. A given gene product may exhibit one or more molecular functions.</go:definition>
        <go:is_a rdf:resource="http://www.geneontology.org/go#all" />
      </go:term>
    </rdf:RDF>
35 - Find parents of GO:0004003 in the example GO subset
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX go: <http://www.geneontology.org/dtds/go.dtd#>
    select *
    from <http://discern.uits.iu.edu:8421/Some-GO-entries-diddled.rdf>
    where {
      <http://www.geneontology.org/go#GO:0004003> go:is_a $parent .
    }
- Result
    C:\Jena-2.5.7\bat> sparql.bat --query GO-paths-from-4003.rq
    -----------------------------------------------
    parent
36 - Find all 3-element paths up from GO:0004003
    PREFIX go: <http://www.geneontology.org/dtds/go.dtd#>
    select *
    from <http://discern.uits.iu.edu:8421/Some-GO-entries-diddled.rdf>
    where {
      <http://www.geneontology.org/go#GO:0004003> go:is_a $a .
      $a go:is_a $b .
      $b go:is_a $c .
    }
- Note that, given a table showing the GO DAG, you can get this result within SQL using multiple joins, but you can't find N-element paths, for arbitrary N, in either language (unless you use inference within SparQL).
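The three chained patterns behave like three nested joins over the is_a edges. As a rough plain-Java illustration (hypothetical class name, toy data in place of the GO subset), each loop level below corresponds to one triple pattern; a fixed number of loops is exactly why a fixed-length pattern cannot express paths of arbitrary length.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class ThreeHopPaths {
    // isA maps each term to its direct parents. Every (a, b, c) produced
    // satisfies: start is_a a, a is_a b, b is_a c. Paths shorter than
    // 3 hops produce nothing, just as the SparQL pattern would not match.
    static List<List<String>> paths(Map<String, List<String>> isA, String start) {
        List<List<String>> out = new ArrayList<>();
        for (String a : isA.getOrDefault(start, List.of())) {      // start go:is_a $a
            for (String b : isA.getOrDefault(a, List.of())) {      // $a go:is_a $b
                for (String c : isA.getOrDefault(b, List.of())) {  // $b go:is_a $c
                    out.add(List.of(a, b, c));
                }
            }
        }
        return out;
    }
}
```

Because a term can have several parents, the number of 3-element paths can exceed the number of distinct ancestors at distance 3, just as a SQL self-join would return one row per path.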
37 - Find all 3-element paths up from GO:0004003 using Twinkle
38 - Query dbpedia for entries about Goethe
- (using Virtuoso iSparql text query)
- Note that the predicate bif:contains is a Virtuoso Built-In Function that searches back-end text indexes. It might be possible to search using a standard SparQL regex FILTER, but it would be much slower.
39 - The same query using the iSparql graphical QBE interface
- Here is the same query in graphical form, as constructed using the iSparql QBE interface.
- Components can be dragged and dropped from the menu at the top of the window. The whole interactive window is shown on the next page.
40 - The same query within the whole iSparql QBE window
41 - Results from the iSparql text and/or QBE queries
42 - Possible applications for ontologies
- Suppose uniprot.org provides a list of 89K proteins, their mappings to NCBI Gene IDs, and their GO annotations (which it does), and perhaps a small subset looks like
- - XXX GO:00003682
- - YYY GO:00003682
- - ZZZ GO:00008026
- - AAA GO:00008096
- And suppose go.org links GO IDs with GO category names, which it does,
- And suppose I have a list of researchers and their various areas of interest, like
- - Smith studies gene XXX
- - Jones studies nucleic acid binding
- - etc.
- Then . . . what kinds of questions can I ask that would have been difficult before, like
43 - Optional clauses in SparQL queries
- SparQL has more features than presented so far.
- Here are some clauses permitted following the where clause:
- - order by followed by one or more sort keys, each written as a variable, ASC( variable ), or DESC( variable ).
- - limit n: return at most n results.
- - offset n: skip the first n results, so output starts with result n+1.
44 - Optional clauses in SparQL queries
- Permitted within where clauses:
- - FILTER restricts the variable matches produced by the surrounding triple patterns to those satisfying the filter expression, as in
    $s $p $date FILTER ( $date > "2005-01-01T00:00:00Z"^^xsd:dateTime )
  or
    $s $p $d FILTER ( xsd:dateTime( $d ) < xsd:dateTime( "2005-01-01T00:00:00Z" ) )
  or
    ?s ?p ?name FILTER regex( ?name, "smi", some_flag )
- - UNION: where clauses may be constructed as
    { triple_pattern_1 } UNION { triple_pattern_2 }
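The FILTER, order by, offset and limit steps can be mimicked over a list of already-computed bindings. This is a plain-Java sketch, not how ARQ evaluates queries; it assumes each binding is a map with a `name` entry, and it treats the regex flag as `"i"` (case-insensitive), which is an assumption about `some_flag`.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class SolutionModifiers {
    // Roughly: ... FILTER regex(?name, regex, "i") } ORDER BY ASC(?name)
    //          OFFSET offset LIMIT limit
    // Matcher.find() gives regex()'s "match anywhere in the string" behavior.
    static List<String> run(List<Map<String, String>> rows, String regex,
                            int offset, int limit) {
        Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        return rows.stream()
            .map(r -> r.get("name"))
            .filter(n -> n != null && p.matcher(n).find())  // FILTER
            .sorted(Comparator.naturalOrder())               // order by ASC
            .skip(offset)                                    // offset: drop the first n
            .limit(limit)                                    // limit: keep at most n
            .collect(Collectors.toList());
    }
}
```

The order of the calls matters: results are filtered, then sorted, and only then trimmed by offset and limit, so "offset 1 limit 1" returns the second result of the sorted list rather than an arbitrary one.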
45 - A relational view of the Semantic Web (Newman, 2007)
- If certain requirements normally imposed upon SQL are relaxed (specifically, type constraints on joined fields), there are strong similarities between the operations applied to relational and graph-based models. For example:
- - triple_pattern . triple_pattern
    approximates an untyped join, as demonstrated on the next slide
- - filter
    approximates an SQL conditional (a selection in the WHERE clause)
- - union
    approximates an outer union
- - optional
    approximates a left outer join( R, S ), which is join( R, S ) unioned with anti-join( R, S ), where an anti-join is R minus the semi-join of R and S, and a semi-join is a join followed by a projection onto R's columns.
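The join-plus-anti-join definition of optional can be written out directly over binding maps. The following is a sketch with hypothetical names, not Jena code; two bindings are treated as compatible when they agree on every variable they share.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LeftOuterJoin {
    // Left outer join of binding tables r and s: every compatible pair is
    // merged (the join), and each row of r compatible with no row of s is
    // kept as-is, with s's variables left unbound (the anti-join).
    static List<Map<String, String>> optional(List<Map<String, String>> r,
                                              List<Map<String, String>> s) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> left : r) {
            boolean matched = false;
            for (Map<String, String> right : s) {
                if (compatible(left, right)) {
                    Map<String, String> merged = new HashMap<>(left);
                    merged.putAll(right);
                    out.add(merged);
                    matched = true;
                }
            }
            if (!matched) out.add(new HashMap<>(left));  // anti-join part
        }
        return out;
    }

    // Compatible: no shared variable is bound to different values.
    static boolean compatible(Map<String, String> a, Map<String, String> b) {
        for (Map.Entry<String, String> e : a.entrySet()) {
            String v = b.get(e.getKey());
            if (v != null && !v.equals(e.getValue())) return false;
        }
        return true;
    }
}
```

With persons {Smith, Jones} on the left and a single favorite-friend binding for Smith on the right, the result has two rows: Smith with `fav` bound, and Jones with `fav` unbound, which is what an optional clause for `fav` would report.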
46 - A relational view of the Semantic Web (Newman, 2007)
- Here we look at the triple pattern used to find the 3-hop paths towards the GO root node:
    select $a $b $c where {
      <http://www.geneontology.org/go#GO:0004003> go:is_a $a .
      $a go:is_a $b .
      $b go:is_a $c .
    }
- which is roughly equivalent to the following SQL query:
    select
      a.parent_id, b.parent_id, c.parent_id
    from
      GO.molecular_function_DAG a
      join
      GO.molecular_function_DAG b
47 - Publishing relational data as virtual RDF stores
- So far we have accessed RDF mostly from free-standing files. However, legacy relational databases can be published as RDF stores on the Semantic Web by using gateways like D2R and Virtuoso (commercial).
- The D2R approach requires 2 steps:
- - interrogate the database via JDBC, using generate-mapping to build a configuration (mapping) file from the relational table definitions, and then
- - start the D2R server with the mapping file.
- Notes
- - Each table row becomes a separate resource/graph.
- - Primary keys (if any) become resource identifiers, and
- - rows in linked tables identified by foreign keys may be merged into the entity (?).
- The D2R utility dump-rdf can also convert an entire table into RDF form for access in a single SparQL query.
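The row-to-resource idea in the notes above can be sketched as follows. This is a hypothetical illustration, not D2R's code: it fills a `@@TABLE.COLUMN@@` URI pattern, like those in D2R mapping files, with a row's values to name the row's resource, and turns each non-key column into a triple with an invented `vocab:TABLE_COLUMN` predicate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical plain-Java triple (not a D2R or Jena class).
record Triple(String s, String p, String o) {}

class RowToRdf {
    private static final Pattern COLUMN_REF = Pattern.compile("@@([^@]+)@@");

    // Replaces each @@SCHEMA.TABLE.COLUMN@@ with that column's value from
    // the row; the column name is taken as the part after the last '.'.
    static String fillPattern(String uriPattern, Map<String, String> row) {
        Matcher m = COLUMN_REF.matcher(uriPattern);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            String qualified = m.group(1);
            String column = qualified.substring(qualified.lastIndexOf('.') + 1);
            m.appendReplacement(sb, Matcher.quoteReplacement(row.get(column)));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    // One row becomes one resource (its URI built from the primary key via
    // the pattern); every non-key column becomes one property triple.
    static List<Triple> map(String table, String uriPattern, String pkColumn,
                            Map<String, String> row) {
        String subject = fillPattern(uriPattern, row);
        String predicatePrefix = "vocab:" + table.replace('.', '_') + "_";
        List<Triple> out = new ArrayList<>();
        for (Map.Entry<String, String> col : row.entrySet()) {
            if (!col.getKey().equals(pkColumn)) {
                out.add(new Triple(subject, predicatePrefix + col.getKey(), col.getValue()));
            }
        }
        return out;
    }
}
```

The design choice mirrors the notes: the primary key only names the resource, so it does not reappear as a separate property triple.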
48 - Accessing data via a SparQL Endpoint
- Since the D2R server makes a SparQL endpoint available, one can execute queries via HTTP requests like
    http://kongo.uits.iupui.edu:6700/sparql?query=select ?s ?p ?o where { ?s ?p ?o . } limit 10
- The D2R server also provides a Web form that can be used to interrogate its content using SparQL. This interface is based on an AJAX component called SNORQL, and is available at
    http://kongo.uits.iupui.edu:6700/sparql
- The D2R server also provides an interface for users to browse its back-end data. To use it, just point a Web browser at
    http://kongo.uits.iupui.edu:6700
49 - Portion of a D2R-server mapping file for CLSD
    @prefix map: <file:/C:/d2r-server-0.4/mapping-clsd2-GO-DGN.n3#> .
    @prefix d2rq: <http://www.wiwiss.fu-berlin.de/suhl/bizer/D2RQ/0.1#> .

    map:database a d2rq:Database;
        d2rq:jdbcDriver "com.ibm.db2.jcc.DB2Driver";
        d2rq:jdbcDSN "jdbc:db2://libra45.uits.iu.edu:50000/clsd2";
        d2rq:username "account";
        d2rq:password "password";
        .

    # Table DISEASE_GENE_NET.GENES
    map:DISEASE_GENE_NET_GENES a d2rq:ClassMap;
        d2rq:dataStorage map:database;
        d2rq:uriPattern "DISEASE_GENE_NET.GENES/@@DISEASE_GENE_NET.GENES.GENE_ID@@";
        d2rq:class vocab:DISEASE_GENE_NET_GENES;
        .

    map:DISEASE_GENE_NET_GENES__label a d2rq:PropertyBridge;
        d2rq:belongsToClassMap map:DISEASE_GENE_NET_GENES;
50 - Triple stores
- There exist so-called triple stores that can use back-end data storage engines, like MySQL, to house RDF data and process queries.
- For example, Sesame is a triple store that can use several different kinds of back ends: DBMSs (originally PostgreSQL), simple RDF files, and/or other, network-accessed triple stores, like Sesame itself.
- Sesame also demonstrates a generic architecture for RDF and RDFS storage and query processing, and does not require keeping the whole graph in memory when processing requests.
- Jena can also employ back-end database management systems.
- There are also some graph-based data management systems, like Neo4j, that can be used to store raw graph-structured data. In fact, Neo4j has at least one overlay product that uses Neo4j to manage RDF.
- Neo4j may work well for data collections running into the billions of nodes, since it does not require its whole graph to be memory-resident (although it works better with larger memory), and it is quite fast.
51 - References
- Ashburner, M., et al., Gene Ontology: tool for the unification of biology, Nature Genetics, 2000.
- Berners-Lee, Tim, Linked Data, 2006. http://www.w3.org/DesignIssues/LinkedData.html
- Bizer, Chris, The D2RQ Plattform - Treating Non-RDF Databases as Virtual RDF Graphs. http://www4.wiwiss.fu-berlin.de/bizer/d2rq/
- Bizer, Chris, Richard Cyganiak, Tom Heath, How to Publish Linked Data on the Web, 2007. http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/
- Cyganiak, Richard, A Relational Algebra for SPARQL, HP Labs, 2005. http://www.hpl.hp.com/techreports/2005/HPL-2005-170.pdf
- Davis, Ian, An Introduction to RDF. http://research.talis.com/2005/rdf-intro/
- Dodds, Leigh, Introducing SparQL: Querying the Semantic Web, 2005. http://www.xml.com/lpt/a/1628
- McBride, Brian, An Introduction to RDF and the Jena RDF API, 2007. http://jena.sourceforge.net/tutorial/RDF_API/index.html