CSCI 586 Paper Presentation - PowerPoint PPT Presentation

Provided by: dennyz
Transcript and Presenter's Notes


1
CSCI 586 Paper Presentation
  • Efficient RDF Storage and Retrieval in Jena 2
  • Wilkinson, Sayers, Kuno, Reynolds
  • By Nazim Pethani

2
Jena
  • Semantic Web programmers' toolkit (Java)
  • Offers a simple abstraction of RDF graphs as its
    internal interface
  • Jena 2: second generation, conforms to the revised
    RDF specification
  • Jena 2 addresses performance issues
  • - too many joins
  • - single statement table
  • - query optimization
  • - reification storage bloat
  • Paper focuses on persistent storage in Jena 2

3
Jena (contd.)
  • Provides a rich API for manipulating RDF graphs
  • Graphs stored in memory or databases
  • 2 key architectural goals of Jena 2
  • - Multiple, flexible presentations of RDF graphs
  • - Simple minimalist view (triples)
  • The first is layered on top of the second
  • Triples can come from memory, a database, or be
    virtual (inference)

4
Architecture
5
Storage Schema Storing Arbitrary RDF Statements
  • Jena 1: two different database schemas, one for
    relational databases and one for BerkeleyDB
  • - Relational database schema was normalized
  • - BerkeleyDB schema was denormalized
  • - Graphs stored using BerkeleyDB were accessed faster

6
Storage Schema Storing Arbitrary RDF Statements
(contd.)
7
Storage Schema Storing Arbitrary RDF Statements
(contd.)
  • Jena 2: schema trades off space for time
  • - Denormalized
  • - Resource URIs and literal values stored directly
    in the statement table (unless their length exceeds
    a threshold, in which case they are stored in
    separate tables)
  • - Prefixes used for differentiation (e.g. database
    references from literals and URIs)
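
The storage rule on this slide can be sketched in a few lines. This is an illustrative Python/sqlite3 sketch, not Jena's Java code or its actual schema: the table names, the `U`/`L`/`R` prefixes, and the threshold of 20 characters are all assumptions chosen for the example.

```python
import sqlite3

THRESHOLD = 20  # hypothetical limit; the slide only says "a threshold"

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE long_values (id INTEGER PRIMARY KEY, value TEXT);
    CREATE TABLE statements (subj TEXT, prop TEXT, obj TEXT);
""")

def encode(kind, value):
    """Return the column text for a URI ('U') or literal ('L') value.

    Short values are stored inline with a kind prefix. Long values go to
    long_values and the column holds a prefixed reference ('R') instead.
    """
    if len(value) <= THRESHOLD:
        return f"{kind}:{value}"
    row_id = conn.execute(
        "INSERT INTO long_values (value) VALUES (?)", (f"{kind}:{value}",)
    ).lastrowid
    return f"R:{row_id}"

def decode(text):
    """Invert encode(): resolve a reference, then split off the kind prefix."""
    if text.startswith("R:"):
        (text,) = conn.execute(
            "SELECT value FROM long_values WHERE id = ?", (int(text[2:]),)
        ).fetchone()
    kind, _, value = text.partition(":")
    return kind, value

def add(subj, prop, kind, obj):
    """Store one statement; subject and property are always URIs here."""
    conn.execute(
        "INSERT INTO statements VALUES (?, ?, ?)",
        (encode("U", subj), encode("U", prop), encode(kind, obj)),
    )

add("ex:doc1", "dc:title", "L", "Jena 2")
add("ex:doc1", "dc:description", "L", "x" * 50)  # longer than THRESHOLD

rows = conn.execute(
    "SELECT subj, prop, obj FROM statements ORDER BY rowid"
).fetchall()
decoded = [tuple(decode(col) for col in row) for row in rows]
```

The prefix is what lets a reader of the statement table tell an inline literal, an inline URI, and a reference to the long-value table apart without consulting any other table.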

8
Storage Schema Storing Arbitrary RDF Statements
(contd.)
9
Storage Schema Storing Arbitrary RDF Statements
(contd.)
  • Advantage: possible to perform many find
    operations without a join (less time required)
  • Disadvantage: more database space consumed
  • Addressed by
  • - Compressing common prefixes (replacing them by
    database references)
  • - Storing long values only once (configurable
    threshold)
  • - Not storing the property URI
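
To make the join argument concrete, the sketch below runs the same find against a normalized and a denormalized layout. It uses Python with sqlite3 purely as a stand-in; the single `vals` table and all table and column names are simplifying assumptions, not the actual Jena 1 or Jena 2 schemas.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized layout: statements hold ids into a shared value table.
    CREATE TABLE vals (id INTEGER PRIMARY KEY, text TEXT UNIQUE);
    CREATE TABLE stmt_norm (subj INTEGER, prop INTEGER, obj INTEGER);
    -- Denormalized layout: statements hold the text itself.
    CREATE TABLE stmt_denorm (subj TEXT, prop TEXT, obj TEXT);
""")

def value_id(text):
    """Return the id of text in vals, inserting it on first use."""
    conn.execute("INSERT OR IGNORE INTO vals (text) VALUES (?)", (text,))
    return conn.execute(
        "SELECT id FROM vals WHERE text = ?", (text,)
    ).fetchone()[0]

triples = [
    ("ex:doc1", "dc:title", "Jena 2"),
    ("ex:doc1", "dc:creator", "ex:hp"),
    ("ex:doc2", "dc:title", "RDF Primer"),
]
for s, p, o in triples:
    conn.execute(
        "INSERT INTO stmt_norm VALUES (?, ?, ?)",
        (value_id(s), value_id(p), value_id(o)),
    )
    conn.execute("INSERT INTO stmt_denorm VALUES (?, ?, ?)", (s, p, o))

# find(ex:doc1, ?, ?) on the normalized layout: three joins to recover text.
norm = conn.execute("""
    SELECT s.text, p.text, o.text
    FROM stmt_norm t
    JOIN vals s ON s.id = t.subj
    JOIN vals p ON p.id = t.prop
    JOIN vals o ON o.id = t.obj
    WHERE s.text = ?
    ORDER BY p.text
""", ("ex:doc1",)).fetchall()

# The same find on the denormalized layout: one table, no join.
denorm = conn.execute(
    "SELECT subj, prop, obj FROM stmt_denorm WHERE subj = ? ORDER BY prop",
    ("ex:doc1",),
).fetchall()
```

Both queries return the same two statements. The denormalized table repeats `ex:doc1` and `dc:title` in every row that uses them, which is the space cost the slide lists as the disadvantage.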

10
Storage Schema Optimizing for Common Statement
Patterns
  • Common patterns arise from RDF specifications
    (rdf:predicate, rdf:object ..) or user data
  • Revised RDF specification allows multiple reified
    instances of any statement
  • Jena 2: a property table stores subject-value pairs
    related by a particular property. It stores all
    instances of the property in the graph
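
A property table can be sketched as a two-column table whose name implies the property. The Python/sqlite3 code below is an assumption-laden illustration: the `PROPERTY_TABLES` mapping, the table names, and the routing in `add` are hypothetical and do not reproduce Jena's configuration mechanism.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE statements (subj TEXT, prop TEXT, obj TEXT);
    -- One table per chosen property; the property URI is implied by the table.
    CREATE TABLE prop_dc_title (subj TEXT, obj TEXT);
""")

# Hypothetical configuration: which properties get their own table.
PROPERTY_TABLES = {"dc:title": "prop_dc_title"}

def add(subj, prop, obj):
    """Route a statement to its property table, else to the general table."""
    table = PROPERTY_TABLES.get(prop)
    if table is not None:
        conn.execute(f"INSERT INTO {table} VALUES (?, ?)", (subj, obj))
    else:
        conn.execute(
            "INSERT INTO statements VALUES (?, ?, ?)", (subj, prop, obj)
        )

def find_by_property(prop):
    """Return (subject, value) pairs for every instance of prop."""
    table = PROPERTY_TABLES.get(prop)
    if table is not None:
        sql = f"SELECT subj, obj FROM {table} ORDER BY subj"
        return conn.execute(sql).fetchall()
    sql = "SELECT subj, obj FROM statements WHERE prop = ? ORDER BY subj"
    return conn.execute(sql, (prop,)).fetchall()

add("ex:doc1", "dc:title", "Jena 2")
add("ex:doc2", "dc:title", "RDF Primer")
add("ex:doc1", "dc:creator", "ex:hp")

titles = find_by_property("dc:title")
creators = find_by_property("dc:creator")
general_rows = conn.execute("SELECT COUNT(*) FROM statements").fetchone()[0]
```

Because every `dc:title` statement lands in `prop_dc_title`, that table holds all instances of the property and the property URI itself is never stored per row.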

11
Jena 2 Persistence Architecture Specialized
Graph Interface
  • Graph interface at higher level supports usual
    operations of add, delete and find
  • Each logical graph implemented by using a list of
    specialized graphs (individually optimized)
  • Any operation on the entire logical graph is
    processed by invoking individual operations on
    each specialized graph in turn
  • Results are combined and returned as result for
    entire graph
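
The dispatch described on this slide might look roughly like the following. It is a plain-Python sketch rather than Jena's Java interfaces: the class names, the `accepts` predicate, and the rule that the first accepting graph stores a triple are assumptions made for illustration.

```python
def matches(pattern, triple):
    """A pattern is a triple in which None acts as a wildcard."""
    return all(p is None or p == v for p, v in zip(pattern, triple))


class SpecializedGraph:
    """Stores only the triples its accepts() predicate allows."""

    def __init__(self, accepts):
        self.accepts = accepts
        self.triples = []

    def add(self, triple):
        if not self.accepts(triple):
            return False
        self.triples.append(triple)
        return True

    def delete(self, triple):
        if triple in self.triples:
            self.triples.remove(triple)
            return True
        return False

    def find(self, pattern):
        return [t for t in self.triples if matches(pattern, t)]


class LogicalGraph:
    """Implements add, delete and find over a list of specialized graphs."""

    def __init__(self, specialized):
        self.specialized = specialized

    def add(self, triple):
        # Offer the triple to each graph in turn; stop at the first taker.
        return any(g.add(triple) for g in self.specialized)

    def delete(self, triple):
        return any(g.delete(triple) for g in self.specialized)

    def find(self, pattern):
        # Query every specialized graph and combine the results.
        results = []
        for g in self.specialized:
            results.extend(g.find(pattern))
        return results


titles = SpecializedGraph(lambda t: t[1] == "dc:title")
general = SpecializedGraph(lambda t: True)
graph = LogicalGraph([titles, general])

graph.add(("ex:doc1", "dc:title", "Jena 2"))
graph.add(("ex:doc1", "dc:creator", "ex:hp"))
found = graph.find(("ex:doc1", None, None))
```

The caller sees one graph; which specialized graph holds a given triple is decided entirely by list order and the `accepts` predicates.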

12
Jena 2 Persistence Architecture Specialized
Graph Interface
13
Jena 2 Persistence Architecture Specialized
Graph Interface
  • Optimization
  • - Operation need not use all graphs
  • - Find operation can be done in a lazy fashion,
    continuing only if more information is needed
  • - Overhead of running the operation over the
    database is amortized across several graphs
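
Lazy find maps naturally onto generators. The sketch below is an assumption about how such laziness could be expressed in Python, not a description of Jena's iterator classes; the graph names and the `calls` log exist only to show which graphs were actually queried.

```python
calls = []  # records which specialized graphs were actually queried


def make_graph(name, triples):
    """Build a find function for one (hypothetical) specialized graph."""
    def find(pattern):
        calls.append(name)  # runs only when the first result is requested
        for t in triples:
            if all(p is None or p == v for p, v in zip(pattern, t)):
                yield t
    return find


def lazy_find(graphs, pattern):
    """Yield matches graph by graph, touching a graph only when needed."""
    for find in graphs:
        yield from find(pattern)


graphs = [
    make_graph("property-table", [("ex:doc1", "dc:title", "Jena 2")]),
    make_graph("general", [
        ("ex:doc1", "dc:creator", "ex:hp"),
        ("ex:doc2", "dc:title", "RDF Primer"),
    ]),
]

results = lazy_find(graphs, ("ex:doc1", None, None))
first = next(results)            # only the first graph has been queried
calls_after_first = list(calls)
rest = list(results)             # draining the iterator reaches the second
```

A caller that needs a single match stops after `next(results)` and the second graph is never queried.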

14
Jena 2 Persistence Architecture Database Driver
  • Generic driver implementation for SQL databases
  • Engine-specific drivers for other databases
  • Engine-specific drivers override general methods
    as necessary
  • Drivers responsible for tasks like database
    initialization, table creation and deletion, etc.
  • Drivers map Java objects to database encoding
  • Drivers use static and dynamically generated SQL
    for data manipulation
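
The generic-plus-override arrangement is ordinary subclassing. The Python sketch below is hypothetical throughout: the class names, the `VARCHAR(250)` and `TEXT` column types, and the `U:`/`L:` encoding are illustrative assumptions, not Jena's driver classes or any real engine's requirements.

```python
class URI(str):
    """Marker type for resource URIs; plain str values are literals."""


class GenericSQLDriver:
    """Generic driver sketch: portable SQL plus value encoding."""

    # Static SQL: fixed text, reused for every insert.
    INSERT_SQL = "INSERT INTO statements (subj, prop, obj) VALUES (?, ?, ?)"

    def text_type(self):
        return "VARCHAR(250)"

    def create_table_sql(self, table):
        # Dynamically generated SQL: table name and column type vary.
        col = self.text_type()
        return f"CREATE TABLE {table} (subj {col}, prop {col}, obj {col})"

    def drop_table_sql(self, table):
        return f"DROP TABLE {table}"

    def encode(self, value):
        # Map an application object to its stored form (assumed encoding).
        kind = "U" if isinstance(value, URI) else "L"
        return f"{kind}:{value}"


class ExampleEngineDriver(GenericSQLDriver):
    """Hypothetical engine-specific driver: overrides only what differs."""

    def text_type(self):
        return "TEXT"


generic = GenericSQLDriver()
specific = ExampleEngineDriver()
generic_ddl = generic.create_table_sql("statements")
specific_ddl = specific.create_table_sql("statements")
```

The subclass changes one method and inherits table creation, deletion, and encoding unchanged.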

15
Jena 2 Persistence Architecture Configuration
and Meta-graphs
  • Configuration parameters specified as RDF
    statements (Jena 1 used a configuration file)
  • Analogous to storing metadata for relational
    databases in tables
  • Default graphs provided with parameters
  • Meta-graph contains metadata about each logical
    graph and can be queried but not modified
  • Meta-graph contains configuration parameters and
    other metadata such as driver version, list of
    graphs stored ..
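
A queryable but read-only meta-graph could be sketched as below. The property names (`cfg:driverVersion`, `cfg:graph`, `cfg:longValueThreshold`), the values, and the `MetaGraph` class are invented for illustration; the slides do not give Jena's actual configuration vocabulary.

```python
class MetaGraph:
    """Read-only collection of (subject, property, value) statements."""

    def __init__(self, statements):
        self._statements = tuple(statements)

    def find(self, subj=None, prop=None, obj=None):
        """Return matching statements; None acts as a wildcard."""
        pattern = (subj, prop, obj)
        return [
            s for s in self._statements
            if all(p is None or p == v for p, v in zip(pattern, s))
        ]

    def add(self, statement):
        raise PermissionError("the meta-graph can be queried but not modified")


# Hypothetical configuration and metadata, expressed as statements.
meta = MetaGraph([
    ("sys:store", "cfg:driverVersion", "2.0"),
    ("sys:store", "cfg:graph", "ex:graphA"),
    ("sys:store", "cfg:graph", "ex:graphB"),
    ("ex:graphA", "cfg:longValueThreshold", "250"),
])

graphs = [o for _, _, o in meta.find(prop="cfg:graph")]
threshold = meta.find(subj="ex:graphA", prop="cfg:longValueThreshold")
```

Configuration is read with the same find operation used for ordinary data, which is the analogy to relational catalogs the slide draws.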

16
Jena Query Processing Find Processing
  • Pattern to be evaluated is passed to each
    specific graph handler
  • Searching is done individually on graphs and a
    completion flag is set at the end
  • Results are concatenated and returned to the
    application
  • Each specialized graph issues a separate query
    for the pattern (Single query might be unwieldy
    for large databases)

17
Jena Query Processing RDQL Processing
  • Jena 1 converted RDQL queries into a pipeline of
    find patterns connected by join variables which
    was evaluated in a nested fashion
  • Jena 2 tries to push the join into the database
    engine. The goal is to convert patterns to a
    single query to be evaluated by the database
    engine
  • In cases where a single query is not possible, a
    combination method is used
  • Queries in Jena 2 may span graphs
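
The difference between the two strategies can be shown on a two-pattern query, written here informally as (?doc dc:creator ?who) (?who foaf:name ?name). The Python/sqlite3 sketch below is an illustration under assumptions: the table layout, sample data, and `find` helper are hypothetical, and no RDQL parsing is attempted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE statements (subj TEXT, prop TEXT, obj TEXT)")
conn.executemany("INSERT INTO statements VALUES (?, ?, ?)", [
    ("ex:doc1", "dc:creator", "ex:alice"),
    ("ex:doc2", "dc:creator", "ex:bob"),
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:bob", "foaf:name", "Bob"),
])

queries_issued = 0


def find(subj=None, prop=None, obj=None):
    """One find pattern becomes one SQL query; None is a wildcard."""
    global queries_issued
    queries_issued += 1
    sql = "SELECT subj, prop, obj FROM statements WHERE 1=1"
    args = []
    for col, val in (("subj", subj), ("prop", prop), ("obj", obj)):
        if val is not None:
            sql += f" AND {col} = ?"
            args.append(val)
    return conn.execute(sql, args).fetchall()


# Nested evaluation: the join on ?who happens in application code, with
# one inner find per binding produced by the outer find.
nested = []
for doc, _, who in find(prop="dc:creator"):
    for _, _, name in find(subj=who, prop="foaf:name"):
        nested.append((doc, name))
nested_query_count = queries_issued

# Pushed-down evaluation: both patterns become one self-join in SQL.
single = conn.execute("""
    SELECT a.subj, b.obj
    FROM statements a
    JOIN statements b ON b.subj = a.obj
    WHERE a.prop = 'dc:creator' AND b.prop = 'foaf:name'
""").fetchall()
```

On this data the nested pipeline issues three queries (one outer, two inner) where the pushed-down form issues one, and the gap grows with the number of outer bindings.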

18
Miscellaneous Topics (in work)
  • Jena 2 Performance Toolkit: utility programs
    (data generator, benchmark suite, RDF data
    analysis tool)
  • Jena Transaction Management: richer transaction
    interface
  • Bulk Load: reduction in time to load persistent
    graphs (denormalized schema, JDBC2, query
    optimization)

19
Conclusion
  • Jena 2's denormalized schema is faster than the
    normalized schema of Jena 1
  • Increased database size (might be offset by
    several techniques being implemented such as URI
    prefix compression and property class tables for
    reification)
  • Comprehensive study of RDQL still to be performed
  • Lots of future work expected in this field. Jena
    2, though more efficient than Jena 1, still has a
    long way to go