Transitioning Legacy Applications to Ontologies: A Hands-on Tutorial

1
Transitioning Legacy Applications to Ontologies
A Hands-on Tutorial
  • Heterogeneous knowledge repositories for storing
    legacy content: requirements, scalability, and
    applicability
  • Atanas Ilchev (Ontotext Lab, Sirma Group)
  • Zlatina Marinova (Ontotext Lab, Sirma Group)
  • Atanas Kiryakov (Ontotext Lab, Sirma Group)

2
Outline of the Tutorial
  • Transitioning Web Services And Applications
    Towards Ontologies
  • Tools For Learning Domain Ontologies
  • Heterogeneous Knowledge Repositories
  • Semantic Annotation and Search Of Software
    Artefacts
  • Tools For Transitioning Databases To Ontologies
  • Representing Software Models And Database Schemas
    In Ontologies

3
Challenges and Innovation
  • Semi-automatic transitioning of an application to
    SOA requires access to:
      • Data used/managed by the application
      • Data relevant to the application
  • All this data should be:
      • Viewed within an integrated data model
      • Understood by the machine, which is necessary
        for aligning/linking information from
        different sources
  • Challenges:
      • Integration of ontologies and KBs with
        annotations that require reasoning
      • Storing documents and their content annotations
        together
      • Character-level annotations easily grow in volume
      • Combining full-text search (FTS) with
        structured queries

4
Challenges and Innovation
  • Innovation:
      • Efficient support of meta-data on statement level
      • Grouping statements into manageable groups for
        the purposes of:
          • Management of sets of statements which
            correspond to a single construct in a
            higher-level language (e.g. WSML)
          • Transaction tracking and management
      • Efficient integration of structured data sources
        in RDF
      • Definition of access rights and security policies

5
Challenges and Innovation

6
HKS Implementation

7
HKS Implementation
  • TRREE and TRREE Adapter
      • Native support for context and triplesets
      • Sesame 2.0 compliant
  • OWL Repository
      • Ontology-level and entity-level methods
      • No translation needed
  • Semantic Annotations and Document Repository
      • SAR data service
      • Encoding based on the PROTON KM schema
      • Document-level metadata is stored in the
        semantic repository to enable querying
      • Document storage: file based
  • SAWSDL
      • Web service description annotations
      • Stored through SAR
      • Working on translation based on the WSDL2RDF
        specification
  • WSMO4RDF

8
HKS Implementation
  • Heterogeneous query
      • SPARQL supported by TRREE
      • Sesame 2.0 query evaluation
      • Optimizations based on TRREE indices
  • Full-text search
      • Sesame matches string patterns against literals
      • Problem for TRREE disk storage: one I/O
        operation for each match, extremely slow
      • Solution -> LIKE iterators intersect their result
        set with the result set of the iterators over
        the query triple patterns
  • Implementation difficulties: building the
    full-text index
      • Prefix tree: easy to implement, useless for
        suffix search
      • Lucene for full-text indexing: fast prefix
        search, slow suffix search (no adequate
        indexing) -> full scan or adding OR over all
        possible prefixes
      • Suffix array algorithm (chosen approach):
        lexicographically sorted strings and substrings
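
The suffix-array idea and the LIKE-iterator intersection can be illustrated with a small Python sketch. It is not the TRREE implementation: the function names, the sample literals, and the `pattern_matches` set (standing in for the result of evaluating the query triple patterns) are all assumptions made for illustration. The index is a naive sorted list of every suffix of every literal, so a substring query becomes a prefix lookup by binary search.

```python
from bisect import bisect_left

def build_suffix_index(literals):
    """Return a sorted list of (suffix, literal_id) pairs, one per suffix."""
    entries = []
    for lit_id, text in enumerate(literals):
        lowered = text.lower()
        for start in range(len(lowered)):
            entries.append((lowered[start:], lit_id))
    entries.sort()  # lexicographic order makes prefix lookups a binary search
    return entries

def find_substring(index, needle):
    """Ids of literals containing needle (each occurrence is a prefix of a suffix)."""
    needle = needle.lower()
    pos = bisect_left(index, (needle, -1))
    hits = set()
    while pos < len(index) and index[pos][0].startswith(needle):
        hits.add(index[pos][1])
        pos += 1
    return hits

# Hypothetical literals stored in the repository.
literals = ["Patent office", "US patents granted", "Semantic repository", "Ontotext Lab"]
index = build_suffix_index(literals)

# Hypothetical ids of literals bound by the structured part of the query.
pattern_matches = {1, 2, 3}

# The LIKE-style result is intersected with the triple-pattern result.
like_matches = find_substring(index, "patent")
result = like_matches & pattern_matches
print(sorted(result))
```

A production index would store suffix offsets rather than copies of the suffix strings; the copies are kept here only to keep the sketch short.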

9
ORDI SG
  • HKS is developed on the basis of the second
    generation (SG) of the ORDI framework
  • Feasible integration of different structured data
    sources including RDBMS
  • Backward compatibility with the existing RDF
    specifications and SPARQL query language.
  • Transactional operations over the model (under
    development).
  • Efficient processing and storage of meta-data or
    context information.
  • Grouping statements into manageable groups for
    the purposes of:
      • Definition of access rights and signing.
      • Management of sets of statements which
        correspond to a single construct in a
        higher-level language.
      • Transaction tracking and management.
  • Easy management of data from several sources
    within one and the same repository (or
    computational environment), for example when
    data is imported from different files (e.g.
    several ontologies).
  • ORDI is open-source, available at SourceForge
  • http://ordi.sourceforge.net
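
As a rough illustration of grouping statements, the Python sketch below keeps a set of group names for each statement, so that all statements produced by one higher-level construct, one transaction, or one source file can be selected or removed together. This is not the ORDI SG API: `Statement`, `GroupedStore`, the group names, and the file names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Statement:
    subject: str
    predicate: str
    obj: str
    source: str  # hypothetical: the file the statement was imported from

class GroupedStore:
    """Toy store that tracks which named groups each statement belongs to."""

    def __init__(self):
        self.membership = {}  # Statement -> set of group names

    def add(self, stmt, *groups):
        self.membership.setdefault(stmt, set()).update(groups)

    def in_group(self, group):
        return {s for s, gs in self.membership.items() if group in gs}

    def from_source(self, source):
        return {s for s in self.membership if s.source == source}

    def drop_group(self, group):
        # in_group returns a fresh set, so deleting while looping is safe.
        for stmt in self.in_group(group):
            del self.membership[stmt]

store = GroupedStore()
# Two statements that together encode one higher-level construct.
concept = [
    Statement("ex:Person", "rdf:type", "ex:Concept", "people.wsml"),
    Statement("ex:Person", "ex:hasAttribute", "ex:name", "people.wsml"),
]
for stmt in concept:
    store.add(stmt, "construct:Person", "tx:1")
store.add(Statement("ex:City", "rdf:type", "ex:Concept", "places.wsml"), "tx:1")

# Removing the construct removes all of its statements in one step.
store.drop_group("construct:Person")
remaining = store.from_source("places.wsml")
```

Because a statement can belong to several groups at once, the same mechanism can serve the construct-management, transaction-tracking, and per-source cases listed above.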

10
TRREE
  • TRREE stands for Triple Reasoning and Rule
    Entailment Engine
  • TRREE implements storage, indexing, inference and
    query evaluation
  • Native support for ORDI SG data model
  • TRREE performs reasoning through forward-chaining
    and total materialization
      • The inferred closure is generated and
        kept up to date
      • Deletion invalidates the inferred closure, so
        it must be computed again -> slow delete
      • No reasoning is done at query time -> fast query
        evaluation, possible optimizations as in RDBMS
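
The trade-off above can be sketched in Python with a toy store that materializes the closure of two RDFS-style rules (subclass transitivity and type inheritance). It is a minimal sketch, not TRREE's rule engine: the class `MaterializedStore`, the rule set, and the `ex:` names are assumptions, and for brevity the closure is recomputed from scratch on every update, whereas a real engine would add inferences incrementally and only pay the full cost on deletion.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"

def materialize(explicit):
    """Forward-chain two RDFS-style rules to a fixpoint and return the closure."""
    closure = set(explicit)
    changed = True
    while changed:
        new = set()
        for (s, p, o) in closure:
            for (s2, p2, o2) in closure:
                if o != s2 or p2 != SUBCLASS:
                    continue
                if p == SUBCLASS:      # A sub B, B sub C => A sub C
                    new.add((s, SUBCLASS, o2))
                elif p == TYPE:        # x type A, A sub B => x type B
                    new.add((s, TYPE, o2))
        changed = not new <= closure
        closure |= new
    return closure

class MaterializedStore:
    def __init__(self):
        self.explicit = set()
        self.closure = set()

    def add(self, triple):
        self.explicit.add(triple)
        self.closure = materialize(self.explicit)  # naive full recomputation

    def delete(self, triple):
        # The old closure may rest on the deleted triple, so it is rebuilt.
        self.explicit.discard(triple)
        self.closure = materialize(self.explicit)

    def query(self, s=None, p=None, o=None):
        # Pure lookup over the materialized closure: no reasoning at query time.
        return {t for t in self.closure
                if s in (None, t[0]) and p in (None, t[1]) and o in (None, t[2])}

store = MaterializedStore()
store.add(("ex:Person", SUBCLASS, "ex:Agent"))
store.add(("ex:Agent", SUBCLASS, "ex:Thing"))
store.add(("ex:alice", TYPE, "ex:Person"))
types_before = store.query(s="ex:alice", p=TYPE)

store.delete(("ex:Agent", SUBCLASS, "ex:Thing"))
types_after = store.query(s="ex:alice", p=TYPE)
```

Here `types_before` contains the inferred types `ex:Agent` and `ex:Thing` in addition to the explicit `ex:Person`; after the deletion the rebuilt closure no longer contains `ex:Thing`.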

11
Scalability and Efficiency
  • TRREE is the basic engine of OWLIM: SwiftOWLIM
    and BigOWLIM
  • SwiftOWLIM and BigOWLIM benchmarks show that:
      • SwiftOWLIM can scale to 10 million statements
        on a desktop PC (32-bit)
      • SwiftOWLIM can load 6.8 million statements
        (LUBM(50)) in 2 minutes at an average speed of
        57K statements/sec
      • SwiftOWLIM can load 80 million statements
        (LUBM(600)) on a server in less than 65 minutes
      • BigOWLIM loads 130 million statements
        (LUBM(1000)) in 11 hours on a desktop machine
      • BigOWLIM can process 1 billion statements
        (LUBM(8000)) in 34 hours on an entry-level server
12
Demo
  • Scenario
      • Previously stored 37 688 annotated documents
        with over 3M annotations
  • Run the following queries:
      • Get the first 100 entries relevant to a document
        (100 bindings obtained in 7 ms)
      • Get the first 100 annotations of type "Person"
        with offset between 100 and 190 (100 bindings
        obtained in 9 ms)
      • Get the first 100 annotations which have a
        feature containing the word "Patent" (100
        bindings obtained in 6 ms)
      • Get the content of the first 100 US patents (100
        bindings obtained in 2 ms)
  • http://rascalli.sirma.bg:8081/sar-web/

13
Questions
  • Thank you for your attention!
  • Questions?
  • http://www.ontotext.com/
  • Slides available at
    http://www.tao-project.eu/resources/eswc08-tao-tutorial.html