Transitioning Legacy Applications to Ontologies: A Hands-on Tutorial

1
Transitioning Legacy Applications to Ontologies
A Hands-on Tutorial

Heterogeneous knowledge repositories for storing legacy content: requirements, scalability, and applicability
Atanas Ilchev (Ontotext Lab, Sirma Group)
Zlatina Marinova (Ontotext Lab, Sirma Group)
Atanas Kiryakov (Ontotext Lab, Sirma Group)

2
Outline of the Tutorial

3
Challenges and Innovation

Semi-automatic transitioning of an application to SOA requires access to:
Data used/managed by the application
Data relevant to the application
All this data should be:
Viewed within an integrated data model
Understood by the machine, which is necessary for aligning/linking information from different sources
Challenges:
Integration of ontologies and KBs with annotations, which requires reasoning
Storing documents and their content annotations together
Character-level annotations easily grow in volume
Combining full-text search (FTS) with structured queries

4
Challenges and Innovation

Innovation:
Efficient support of meta-data on statement level
Grouping statements into manageable groups for the purposes of:
Management of sets of statements which correspond to a single construct in a higher-level language (e.g. WSML)
Transaction tracking and management
Efficient integration of structured data sources in RDF
Definition of access rights and security policies

5
Challenges and Innovation
6
HKS Implementation
7
HKS Implementation

8
HKS Implementation

Heterogeneous query:
SPARQL supported by TRREE
Sesame 2.0 query evaluation
Optimizations based on TRREE indices
Full-text search:
Sesame matches string patterns against literals
Problem for TRREE disk storage: one I/O operation for each match, extremely slow
Solution -> LIKE iterators intersect their result set with the result set of the iterators over the query triple patterns
Implementation difficulties: building the full-text index
Prefix tree: easy to implement, useless for suffix search
Lucene for full-text indexing: fast prefix search, slow suffix search (no adequate indexing) -> full scan or adding OR over all possible prefixes
Suffix array algorithm (chosen approach): lexicographically sorted strings and substrings
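
The suffix-array idea can be sketched in a few lines of Python. This is an illustrative toy, not the HKS or TRREE implementation: the function names `build_suffix_index` and `find_containing` are hypothetical, the index is built naively by sorting every suffix of every literal, and matching is case-insensitive by assumption. Because all suffixes that start with a pattern are adjacent in the sorted list, a substring (and therefore suffix) search reduces to one binary search plus a short scan.

```python
from bisect import bisect_left


def build_suffix_index(literals):
    """Return a lexicographically sorted list of (suffix, literal_id) pairs.

    Naive construction for illustration only; real suffix arrays store
    offsets instead of copying the suffix text.
    """
    entries = []
    for literal_id, text in enumerate(literals):
        lowered = text.lower()
        for start in range(len(lowered)):
            entries.append((lowered[start:], literal_id))
    entries.sort()
    return entries


def find_containing(index, pattern):
    """Return the ids of all literals that contain `pattern` as a substring."""
    pattern = pattern.lower()
    # (pattern,) sorts just before every (suffix, id) whose suffix
    # starts with pattern, so bisect_left lands on the start of the run.
    pos = bisect_left(index, (pattern,))
    matches = set()
    while pos < len(index) and index[pos][0].startswith(pattern):
        matches.add(index[pos][1])
        pos += 1
    return matches
```

The returned id set plays the role of the LIKE iterator's result: it could be intersected with the ids produced by the triple-pattern iterators (for example `find_containing(index, "patent") & ids_from_triple_patterns`, where the second set is a hypothetical input) instead of testing every literal individually.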

9
ORDI SG

HKS is developed on the basis of the second generation (SG) of the ORDI framework:
Feasible integration of different structured data sources, including RDBMS.
Backward compatibility with the existing RDF specifications and the SPARQL query language.
Transactional operations over the model (under development).
Efficient processing and storage of meta-data or context information.
Grouping statements into manageable groups for the purposes of:
Definition of access rights and signing.
Management of sets of statements which correspond to a single construct in a higher-level language.
Transaction tracking and management.
Easy management of data from several sources within one and the same repository (or computational environment), such as data imported from different files (e.g. several ontologies).
ORDI is open-source, available at SourceForge: http://ordi.sourceforge.net
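
To make the statement-grouping idea concrete, here is a minimal in-memory sketch. It is not the ORDI API: the class `GroupedStore`, its methods, and the group names are all hypothetical, and the only assumption carried over from the slide is that a statement may belong to several groups at once (for example one group per transaction and one per data source).

```python
from collections import defaultdict


class GroupedStore:
    """Toy statement store; names are illustrative, not the ORDI API."""

    def __init__(self):
        self.statements = set()          # (subject, predicate, object) triples
        self.groups = defaultdict(set)   # group name -> triples in that group

    def add(self, triple, groups=()):
        """Store a triple and register it in zero or more groups."""
        self.statements.add(triple)
        for name in groups:
            self.groups[name].add(triple)

    def members(self, group):
        """Return a copy of the triples registered in a group."""
        return set(self.groups.get(group, ()))

    def remove_group(self, group):
        """Delete every statement in a group, e.g. to undo one transaction."""
        doomed = self.groups.pop(group, set())
        self.statements -= doomed
        for remaining in self.groups.values():
            remaining -= doomed
        return len(doomed)
```

With such a structure, undoing a transaction or dropping everything imported from one file becomes a single operation on a group rather than a search over individual statements.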

10
TRREE

TRREE stands for Triple Reasoning and Rule
Entailment Engine
TRREE implements storage, indexing, inference and
query evaluation
Native support for ORDI SG data model
TRREE performs reasoning through forward-chaining and total materialization
The inferred closure is generated and kept up to date
Deletion invalidates the inferred closure, so it has to be computed again -> slow delete
No reasoning is done at query time -> fast query evaluation, possible optimizations as in RDBMS
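
The trade-off described above can be illustrated with a small forward-chaining sketch. It is a toy under stated assumptions, not TRREE's algorithm or rule set: the functions `materialize` and `delete` are hypothetical, triples are plain Python tuples, and only two RDFS-style rules are applied (transitivity of `subClassOf` and propagation of `type` along `subClassOf`).

```python
def materialize(explicit):
    """Return the explicit triples plus everything the two rules derive."""
    closure = set(explicit)
    changed = True
    while changed:  # repeat until a full pass adds nothing new (fixpoint)
        changed = False
        sub = [(s, o) for s, p, o in closure if p == "subClassOf"]
        new = set()
        # Rule 1: A subClassOf B, B subClassOf C  =>  A subClassOf C
        for a, b in sub:
            for c, d in sub:
                if b == c:
                    new.add((a, "subClassOf", d))
        # Rule 2: x type A, A subClassOf B  =>  x type B
        for s, p, o in closure:
            if p == "type":
                for a, b in sub:
                    if o == a:
                        new.add((s, "type", b))
        if not new <= closure:
            closure |= new
            changed = True
    return closure


def delete(explicit, triple):
    """Remove an explicit triple and recompute the whole closure."""
    remaining = set(explicit) - {triple}
    return remaining, materialize(remaining)
```

Once the closure is materialized, answering a query such as "is `ann` of type `Agent`?" is a plain set lookup with no reasoning involved, while `delete` has to rebuild the closure from the remaining explicit triples, which mirrors the slow-delete, fast-query behaviour noted on the slide.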

11
Scalability and Efficiency

TRREE is the basic engine of OWLIM: SwiftOWLIM and BigOWLIM
SwiftOWLIM and BigOWLIM benchmarks show that:
SwiftOWLIM can scale to 10 million statements on a desktop PC (32-bit)
SwiftOWLIM can load 6.8 million statements (LUBM(50)) in 2 minutes, at an average speed of 57K statements/sec.
SwiftOWLIM can load 80 million statements (LUBM(600)) on a server in less than 65 minutes
BigOWLIM loads 130 million statements (LUBM(1000)) in 11 hours on a desktop machine
BigOWLIM can process 1 billion statements (LUBM(8000)) in 34 hours on an entry-level server
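
As a quick sanity check, the average rates implied by these figures can be computed directly. The sketch below treats the rounded numbers from the slide as exact, which is an assumption; the 65-minute figure is an upper bound, so the rate derived from it is a lower bound, and the helper name `average_rate` is illustrative.

```python
def average_rate(statements, seconds):
    """Average throughput in statements per second."""
    return statements / seconds


# (statements, elapsed seconds), taken from the figures quoted above
runs = {
    "SwiftOWLIM, LUBM(50)": (6.8e6, 2 * 60),
    "SwiftOWLIM, LUBM(600)": (80e6, 65 * 60),
    "BigOWLIM, LUBM(1000)": (130e6, 11 * 3600),
    "BigOWLIM, LUBM(8000)": (1e9, 34 * 3600),
}

rates = {name: average_rate(n, t) for name, (n, t) in runs.items()}

for name, rate in rates.items():
    print(f"{name}: {rate / 1000:.1f}K statements/sec")
```

The first entry comes out at roughly 56.7K statements/sec, consistent with the 57K quoted on the slide; the other rates are derived here and do not appear in the original.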

12
Demo

Scenario:
Previously stored 37 688 annotated documents with over 3M annotations
Run the following queries:
Get the first 100 entries relevant to a document (100 bindings obtained in 7 ms)
Get the first 100 annotations of type "Person" with offset between 100 and 190 (100 bindings obtained in 9 ms)
Get the first 100 annotations which have a feature containing the word "Patent" (100 bindings obtained in 6 ms)
Get the content of the first 100 US patents (100 bindings obtained in 2 ms)
http://rascalli.sirma.bg:8081/sar-web/