Title: CSCI 586 Paper Presentation
1. CSCI 586 Paper Presentation
- Efficient RDF Storage and Retrieval in Jena 2
- Wilkinson, Sayers, Kuno, Reynolds
- By Nazim Pethani
2. Jena
- Semantic Web Programmers' Toolkit (Java)
- Offers a simple abstraction of RDF graphs as its internal interface
- Jena 2: second generation, conforms to the revised RDF specification
- Jena 2 addresses performance issues:
- - too many joins
- - single statement table
- - query optimization
- - reification storage bloat
- Paper focuses on persistent storage in Jena 2
3. Jena (contd.)
- Provides a rich API for manipulating RDF graphs
- Graphs stored in memory or databases
- 2 key architectural goals of Jena 2:
- - Multiple, flexible presentations of RDF graphs
- - Simple minimalist view (triples)
- First is layered on top of second
- Triples come from memory, a database, or are virtual (inference)
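The layering described above can be sketched as follows. This is only an illustration in Python, not Jena's Java API: the class names `TripleGraph` and `ResourceView` and the `ex:` identifiers are hypothetical. The minimalist view offers add, delete and find over triples, and the richer presentation is written purely in terms of it.

```python
from typing import Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]


class TripleGraph:
    """Minimalist view (hypothetical name): a graph is a set of triples."""

    def __init__(self) -> None:
        self._triples = set()

    def add(self, triple: Triple) -> None:
        self._triples.add(triple)

    def delete(self, triple: Triple) -> None:
        self._triples.discard(triple)

    def find(self, s: Optional[str] = None, p: Optional[str] = None,
             o: Optional[str] = None) -> Iterator[Triple]:
        # None acts as a wildcard for that position of the pattern.
        for t in sorted(self._triples):
            if ((s is None or t[0] == s) and (p is None or t[1] == p)
                    and (o is None or t[2] == o)):
                yield t


class ResourceView:
    """Richer presentation (hypothetical name) layered on the triple view."""

    def __init__(self, graph: TripleGraph, uri: str) -> None:
        self.graph = graph
        self.uri = uri

    def set(self, prop: str, value: str) -> None:
        self.graph.add((self.uri, prop, value))

    def get(self, prop: str) -> List[str]:
        return [o for _, _, o in self.graph.find(s=self.uri, p=prop)]
```

Because `ResourceView` only calls `add` and `find`, the same view would work whether the triples sit in memory, in a database, or are produced by inference.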
4. Architecture
5. Storage Schema: Storing Arbitrary RDF Statements
- Jena 1:
- - Two different database schemas: one for relational databases and one for BerkeleyDB
- - Relational database schema was normalized
- - BerkeleyDB schema was denormalized
- - Graphs stored using BerkeleyDB were accessed faster
6. Storage Schema: Storing Arbitrary RDF Statements (contd.)
7. Storage Schema: Storing Arbitrary RDF Statements (contd.)
- Jena 2:
- - Schema trades off space for time
- - Denormalized
- - Resource URIs and literal values stored directly in the statement table (unless their length exceeds a threshold, in which case they are stored in separate tables)
- - Prefixes used for differentiation (e.g. database references from literals and URIs)
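A minimal sketch of this encoding, using SQLite from Python for illustration. The table names, the `THRESHOLD` value, and the tag strings (`Uv`, `Lv`, `Ur`, `Lr`) are assumptions made for this example and are not taken from the slides. Short values are stored inline in the statement table, long values go to a side table, and a prefix on each cell tells inline values apart from references.

```python
import sqlite3

THRESHOLD = 20  # assumed value; the slides only say the threshold is configurable


def make_store():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE statements (subj TEXT, prop TEXT, obj TEXT)")
    conn.execute(
        "CREATE TABLE long_values (id INTEGER PRIMARY KEY, value TEXT UNIQUE)")
    return conn


def encode(conn, value, is_uri):
    kind = "U" if is_uri else "L"
    if len(value) <= THRESHOLD:
        return f"{kind}v::{value}"  # short value stored inline
    # Long value: store it once in the side table, keep only a reference.
    row = conn.execute(
        "SELECT id FROM long_values WHERE value = ?", (value,)).fetchone()
    if row is None:
        ref = conn.execute(
            "INSERT INTO long_values (value) VALUES (?)", (value,)).lastrowid
    else:
        ref = row[0]
    return f"{kind}r::{ref}"


def decode(conn, cell):
    tag, _, rest = cell.partition("::")
    if tag[1] == "v":  # inline value
        return rest
    return conn.execute(
        "SELECT value FROM long_values WHERE id = ?", (int(rest),)).fetchone()[0]


def add_statement(conn, s, p, o, o_is_uri=False):
    conn.execute(
        "INSERT INTO statements VALUES (?, ?, ?)",
        (encode(conn, s, True), encode(conn, p, True), encode(conn, o, o_is_uri)))
```

Only cells whose tag marks a reference need a second lookup, so short values are read straight from the statement table.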
8. Storage Schema: Storing Arbitrary RDF Statements (contd.)
9. Storage Schema: Storing Arbitrary RDF Statements (contd.)
- Advantage: many find operations can be performed without a join (less time required)
- Disadvantage: more database space consumed. Addressed by:
- - Compressing common prefixes (replacing them with database references)
- - Storing long values only once (configurable threshold)
- - Not storing the property URI
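To make the join argument concrete, here is a small sketch that answers the same find against both layouts in SQLite. The table and function names are hypothetical, and the schemas are simplified stand-ins rather than the actual Jena 1 or Jena 2 schemas. The normalized lookup needs three joins to turn ids back into values, while the denormalized lookup reads one table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized layout: statements hold only ids into a shared value table.
CREATE TABLE node (id INTEGER PRIMARY KEY, value TEXT UNIQUE);
CREATE TABLE stmt_norm (subj INTEGER, prop INTEGER, obj INTEGER);
-- Denormalized layout: statements hold the values themselves.
CREATE TABLE stmt_denorm (subj TEXT, prop TEXT, obj TEXT);
""")


def node_id(value):
    conn.execute("INSERT OR IGNORE INTO node (value) VALUES (?)", (value,))
    return conn.execute(
        "SELECT id FROM node WHERE value = ?", (value,)).fetchone()[0]


def add(s, p, o):
    conn.execute("INSERT INTO stmt_norm VALUES (?, ?, ?)",
                 (node_id(s), node_id(p), node_id(o)))
    conn.execute("INSERT INTO stmt_denorm VALUES (?, ?, ?)", (s, p, o))


def find_objects_normalized(s, p):
    sql = """SELECT o.value FROM stmt_norm t
             JOIN node s ON s.id = t.subj
             JOIN node p ON p.id = t.prop
             JOIN node o ON o.id = t.obj
             WHERE s.value = ? AND p.value = ?
             ORDER BY o.value"""
    return [row[0] for row in conn.execute(sql, (s, p))]


def find_objects_denormalized(s, p):
    sql = ("SELECT obj FROM stmt_denorm WHERE subj = ? AND prop = ? "
           "ORDER BY obj")
    return [row[0] for row in conn.execute(sql, (s, p))]
```

The price is visible in the data: the denormalized table repeats each URI in every row that mentions it, which is the space cost the mitigations above are aimed at.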
10. Storage Schema: Optimizing for Common Statement Patterns
- Common patterns arise from RDF specifications (rdf:predicate, rdf:object, ...) or user data
- Revised RDF specification allows multiple reified instances of any statement
- Jena 2 property table: stores subject-value pairs related by a particular property. It stores all instances of the property in the graph
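A property table can be sketched like this. The table name `prop_title`, the property `ex:title`, and the routing dictionary are all hypothetical choices for the example. Every statement with the chosen property is routed to a two-column table of subject-value pairs, so the property URI itself is never stored in a row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE statements (subj TEXT, prop TEXT, obj TEXT)")
# Property table for one (hypothetical) property: no prop column is needed.
conn.execute("CREATE TABLE prop_title (subj TEXT, value TEXT)")

# Maps a property URI to the table holding all of its instances.
PROPERTY_TABLES = {"ex:title": "prop_title"}


def add(s, p, o):
    table = PROPERTY_TABLES.get(p)
    if table is not None:
        conn.execute(f"INSERT INTO {table} VALUES (?, ?)", (s, o))
    else:
        conn.execute("INSERT INTO statements VALUES (?, ?, ?)", (s, p, o))


def find_by_property(p):
    """Return (subject, value) pairs for property p, sorted."""
    table = PROPERTY_TABLES.get(p)
    if table is not None:
        rows = conn.execute(
            f"SELECT subj, value FROM {table} ORDER BY subj, value")
    else:
        rows = conn.execute(
            "SELECT subj, obj FROM statements WHERE prop = ? "
            "ORDER BY subj, obj", (p,))
    return [tuple(row) for row in rows]
```

Since the property table holds all instances of its property, a find on that property never has to look at the general statement table.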
11. Jena 2 Persistence Architecture: Specialized Graph Interface
- Graph interface at the higher level supports the usual operations of add, delete and find
- Each logical graph is implemented using a list of specialized graphs (individually optimized)
- Any operation on the entire logical graph is processed by invoking individual operations on each specialized graph in turn
- Results are combined and returned as the result for the entire graph
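The dispatch described above might look roughly like this. The class names and the rule that a specialized graph declares which triples it accepts are assumptions for the sketch, not the Jena 2 interface. The logical graph walks its list of specialized graphs for every operation and combines what they return.

```python
def matches(triple, pattern):
    # A pattern is a triple in which None is a wildcard.
    return all(want is None or want == got
               for got, want in zip(triple, pattern))


class SpecializedGraph:
    """Stores only the triples its `accepts` predicate claims (hypothetical)."""

    def __init__(self, accepts):
        self._accepts = accepts
        self._triples = []

    def add(self, triple):
        if not self._accepts(triple):
            return False
        self._triples.append(triple)
        return True

    def delete(self, triple):
        if triple in self._triples:
            self._triples.remove(triple)
            return True
        return False

    def find(self, pattern):
        return [t for t in self._triples if matches(t, pattern)]


class LogicalGraph:
    """One logical graph implemented by a list of specialized graphs."""

    def __init__(self, graphs):
        self.graphs = list(graphs)

    def add(self, triple):
        # Offer the triple to each specialized graph in turn.
        for graph in self.graphs:
            if graph.add(triple):
                return
        raise ValueError("no specialized graph accepted the triple")

    def delete(self, triple):
        for graph in self.graphs:
            graph.delete(triple)

    def find(self, pattern):
        results = []
        for graph in self.graphs:
            results.extend(graph.find(pattern))
        return results
```

Placing a catch-all graph last in the list guarantees that every triple has a home, while earlier graphs can claim the patterns they are optimized for.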
12. Jena 2 Persistence Architecture: Specialized Graph Interface
13. Jena 2 Persistence Architecture: Specialized Graph Interface
- Optimization:
- - An operation need not use all graphs
- - Find can be done lazily, continuing only if more information is needed
- - Overhead of running the operation over the database is amortized across several graphs
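Lazy find can be illustrated with Python generators. This is a sketch of the idea only: `CountingGraph` and `lazy_find` are hypothetical names, and the counter exists just to show which graphs were actually queried. A specialized graph is queried only when the consumer asks for a result that earlier graphs could not supply.

```python
from itertools import chain


class CountingGraph:
    """In-memory stand-in for a specialized graph that counts its queries."""

    def __init__(self, triples):
        self.triples = list(triples)
        self.queries = 0

    def find(self, pattern):
        # Generator: the body below runs only once a result is requested.
        self.queries += 1
        for triple in self.triples:
            if all(w is None or w == g for g, w in zip(triple, pattern)):
                yield triple


def lazy_find(graphs, pattern):
    # The generator expression defers each graph.find call until the
    # previous graph's results have been exhausted.
    return chain.from_iterable(graph.find(pattern) for graph in graphs)


first = CountingGraph([("ex:doc1", "ex:title", "Jena 2")])
second = CountingGraph([("ex:doc2", "ex:title", "RDF Storage")])

results = lazy_find([first, second], (None, "ex:title", None))
one_result = next(results)  # satisfied by the first graph alone
```

After the single `next` call, `first.queries` is 1 and `second.queries` is 0. The second graph is queried only if the caller keeps iterating.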
14. Jena 2 Persistence Architecture: Database Driver
- Generic driver implementation for SQL databases
- Engine-specific drivers for other databases
- Engine-specific drivers override general methods as necessary
- Drivers are responsible for tasks like database initialization, table creation and deletion, etc.
- Drivers map Java objects to the database encoding
- Drivers use static and dynamically generated SQL for data manipulation
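The driver arrangement can be sketched as a small class hierarchy. It is written in Python for brevity even though Jena's drivers are Java classes, and every class, method, and column type here is a hypothetical stand-in. The generic driver supplies static SQL for inserts and builds find SQL dynamically from the bound positions of a pattern. The engine-specific driver overrides only what differs.

```python
class GenericSQLDriver:
    """Generic behaviour shared by SQL engines (hypothetical sketch)."""

    text_type = "VARCHAR(250)"  # assumed default column type

    def create_table_sql(self, table):
        cols = ", ".join(f"{name} {self.text_type}"
                         for name in ("subj", "prop", "obj"))
        return f"CREATE TABLE {table} ({cols})"

    def drop_table_sql(self, table):
        return f"DROP TABLE {table}"

    def insert_sql(self, table):
        # Static SQL: the same statement text for every insert.
        return f"INSERT INTO {table} VALUES (?, ?, ?)"

    def find_sql(self, table, s=None, p=None, o=None):
        # Dynamically generated SQL: only bound positions are constrained.
        clauses, params = [], []
        for column, value in (("subj", s), ("prop", p), ("obj", o)):
            if value is not None:
                clauses.append(f"{column} = ?")
                params.append(value)
        sql = f"SELECT subj, prop, obj FROM {table}"
        if clauses:
            sql += " WHERE " + " AND ".join(clauses)
        return sql, params


class ExampleEngineDriver(GenericSQLDriver):
    """Engine-specific driver: overrides only what differs (hypothetical)."""

    text_type = "TEXT"
```

`ExampleEngineDriver` inherits `insert_sql` and `find_sql` unchanged, which mirrors the slide's point that engine-specific drivers override general methods only as necessary.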
15. Jena 2 Persistence Architecture: Configuration and Meta-graphs
- Configuration parameters specified as RDF statements (Jena 1 used a configuration file)
- Analogous to storing metadata for relational databases in tables
- Default graphs provided with parameters
- Meta-graph contains metadata about each logical graph and can be queried but not modified
- Meta-graph contains configuration parameters and other metadata such as driver version, list of graphs stored, ...
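As an illustration of configuration held as statements, the sketch below keeps parameters as triples in a read-only meta-graph. The class name and all `ex:` property names are invented for the example and are not the vocabulary Jena 2 actually uses. Reading a parameter is an ordinary find. Modifying the meta-graph is rejected.

```python
class ReadOnlyMetaGraph:
    """Meta-graph that can be queried but not modified (hypothetical)."""

    def __init__(self, triples):
        self._triples = tuple(triples)

    def find(self, s=None, p=None, o=None):
        return [t for t in self._triples
                if (s is None or t[0] == s)
                and (p is None or t[1] == p)
                and (o is None or t[2] == o)]

    def add(self, triple):
        raise PermissionError("the meta-graph can be queried but not modified")


def config_value(meta, subject, prop, default=None):
    """Look up one configuration parameter with an ordinary find."""
    hits = meta.find(s=subject, p=prop)
    return hits[0][2] if hits else default


# Hypothetical metadata: driver version, stored graphs, one per-graph parameter.
meta = ReadOnlyMetaGraph([
    ("ex:database", "ex:driverVersion", "2.0"),
    ("ex:database", "ex:hasGraph", "ex:graphA"),
    ("ex:database", "ex:hasGraph", "ex:graphB"),
    ("ex:graphA", "ex:longValueThreshold", "250"),
])
```

Listing the stored graphs is then `meta.find(s="ex:database", p="ex:hasGraph")`, the same operation an application would use on its own data.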
16. Jena Query Processing: Find Processing
- Pattern to be evaluated is passed to each specialized graph handler
- Searching is done individually on each graph and a completion flag is set at the end
- Results are concatenated and returned to the application
- Each specialized graph issues a separate query for the pattern (a single query might be unwieldy for large databases)
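One possible reading of the completion flag is sketched below: each handler returns its matches together with a flag saying whether it has fully answered the pattern, and the loop stops once a handler reports completion. The handler factories and the `log` list are hypothetical and exist only to make the control flow visible.

```python
def make_property_handler(prop, pairs, log):
    """Handler for a table holding every statement with property `prop`."""
    def handler(pattern):
        log.append("property:" + prop)
        s, p, o = pattern
        if p is not None and p != prop:
            return [], False
        hits = [(subj, prop, val) for subj, val in pairs
                if (s is None or s == subj) and (o is None or o == val)]
        # Complete only when the pattern names exactly this property.
        return hits, p == prop
    return handler


def make_general_handler(triples, log):
    """Handler for the general statement table; never claims completion."""
    def handler(pattern):
        log.append("general")
        hits = [t for t in triples
                if all(w is None or w == g for g, w in zip(t, pattern))]
        return hits, False
    return handler


def find_all(handlers, pattern):
    results = []
    for handler in handlers:
        hits, complete = handler(pattern)
        results.extend(hits)  # results are concatenated in handler order
        if complete:
            break             # remaining graphs need not be searched
    return results
```

With this reading, a find on a property that has its own table touches one handler, while a find with a wildcard property visits every handler and concatenates the results.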
17. Jena Query Processing: RDQL Processing
- Jena 1 converted RDQL queries into a pipeline of find patterns connected by join variables, which was evaluated in a nested fashion
- Jena 2 tries to push the join into the database engine. The goal is to convert patterns to a single query to be evaluated by the database engine
- In cases where a single query is not possible, a combination method is used
- Queries in Jena 2 may span graphs
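The two evaluation strategies can be compared on a toy query with two patterns that share a join variable, `(?doc ex:author ?a)` and `(?a ex:name ?n)`. The sketch uses SQLite from Python; the table layout, data, and function names are hypothetical, and the SQL is hand-written rather than produced by an RDQL translator.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE statements (subj TEXT, prop TEXT, obj TEXT)")
conn.executemany("INSERT INTO statements VALUES (?, ?, ?)", [
    ("ex:doc1", "ex:author", "ex:alice"),
    ("ex:doc2", "ex:author", "ex:bob"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:bob", "ex:name", "Bob"),
])


def find(s=None, p=None, o=None):
    """One find pattern = one database query."""
    clauses, params = [], []
    for column, value in (("subj", s), ("prop", p), ("obj", o)):
        if value is not None:
            clauses.append(f"{column} = ?")
            params.append(value)
    sql = "SELECT subj, prop, obj FROM statements"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return conn.execute(sql, params).fetchall()


def names_nested():
    # Nested pipeline: the second pattern is re-run for each binding of ?a.
    out = []
    for doc, _, author in find(p="ex:author"):
        for _, _, name in find(s=author, p="ex:name"):
            out.append((doc, name))
    return sorted(out)


def names_single_query():
    # Join pushed into the database: one query, joined on ?a.
    sql = """SELECT t1.subj, t2.obj
             FROM statements t1
             JOIN statements t2 ON t2.subj = t1.obj
             WHERE t1.prop = 'ex:author' AND t2.prop = 'ex:name'
             ORDER BY t1.subj, t2.obj"""
    return [tuple(row) for row in conn.execute(sql)]
```

Both functions return the same bindings, but the nested version issues one query for the first pattern plus one per author, whereas the single-query version leaves the join to the database engine.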
18. Miscellaneous Topics (in work)
- Jena 2 Performance Toolkit: utility programs (data generator, benchmark suite, RDF data analysis tool)
- Jena Transaction Management: richer transaction interface
- Bulk Load: reduction in time to load persistent graphs (denormalized schema, JDBC2, query optimization)
19. Conclusion
- Jena 2's denormalized schema is faster than the normalized schema of Jena 1
- Increased database size (might be offset by several techniques being implemented, such as URI prefix compression and property class tables for reification)
- Comprehensive study of RDQL still to be performed
- Lots of future work expected in this field. Jena 2, though more efficient than Jena 1, still has a long way to go