Enhanced Semantic Access to Software Artefacts

Tamara Polajnar. University of Sheffield NLP. 21.10.08.

1
Enhanced Semantic Access to Software Artefacts
  • Danica Damljanovic and Kalina Bontcheva

2
Outline
  • Motivation
  • The GATE case study
  • Semantic-based prototype
  • Data collection
  • Automatic content augmentation
  • Storing implicit annotations
  • Querying using text-based queries
  • Example
  • Conclusion and Future work

3
Motivation
  • Large software frameworks
    • hard to maintain: never enough documentation
    • hard to find specific information
  • significant learning curve for
    • new developers working on software extensions
    • software engineers who integrate relevant parts into their applications

4
Can semantic technologies help?
[Figure: software documentation scattered across web sites, source code, forum posts and papers]

5
The GATE case study
  • GATE (gate.ac.uk)
    • open-source General Architecture for Text Engineering
    • development team of over 15 people at present, over 30 over the years
  • documentation about GATE software
    • dispersed on the Web: not easy to find by new/existing developers/users
    • no unified interface: Google, gate.ac.uk, gmane mailing list search, etc.

6
The GATE case study: requirements
  • Automatic generation of reference pages from the ontology
    • provide users with a single point of access to all knowledge, continuously kept up to date
    • generate a web page automatically
    • shown on its own or alongside the ontology tree, where the searched concept is selected

7
Semantic-based prototype
[Figure: architecture diagram with the labels "software documentation", "learn domain ontology", "annotate content", "store", "semantic repository" and "text-based query"]

8
Semantic-based prototype: detailed view
[Figure: detailed architecture diagram with the labels "semantic repository", "Content Augmentation Service", "annotations" and "Content Augmentation Index"]

9
Data collection
  • Downloaded around 10,000 software artefacts about GATE
    • source code
    • source documentation
    • GATE manual
    • forum posts
    • publications

10
Annotate content

11
Export annotations
  • Merge document metadata and annotations into the OWL file using an information-extraction ontology
    • PROTON KM (http://proton.semanticweb.org/2005/04/protonkm)

12
Information-extraction ontology
  • Document class
    • resourceType property refers to the type of the document
    • informationResourceIdentifier property refers to the URL of the annotated document
  • Mention class
    • occursIn Document
    • hasStartOffset and hasEndOffset store the position of the annotation
    • (new) refersAnything preserves the URI of the resource to which the mention refers
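
The Document and Mention classes can be pictured as plain records. The Python sketch below is illustrative only: the field names mirror the property names on the slide, while the class layout, the `to_triples` helper and the `ex:` identifiers are assumptions made for this example and are not the exported OWL format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    uri: str                              # identifier of the document instance
    resource_type: str                    # resourceType, e.g. "Source Code"
    information_resource_identifier: str  # URL of the annotated document


@dataclass(frozen=True)
class Mention:
    uri: str                # identifier of the mention instance
    occurs_in: Document     # occursIn
    has_start_offset: int   # hasStartOffset
    has_end_offset: int     # hasEndOffset
    refers_anything: str    # refersAnything: URI of the resource referred to


def to_triples(mention):
    """Flatten a mention into (subject, property, value) triples."""
    return [
        (mention.uri, "rdf:type", "protonkm:Mention"),
        (mention.uri, "protonkm:occursIn", mention.occurs_in.uri),
        (mention.uri, "protonkm:hasStartOffset", mention.has_start_offset),
        (mention.uri, "protonkm:hasEndOffset", mention.has_end_offset),
        (mention.uri, "gate:refersAnything", mention.refers_anything),
    ]


# Hypothetical identifiers, used only to exercise the sketch.
doc = Document("ex:doc1", "Source Code", "http://example.org/Foo.java.html")
mention = Mention("ex:mention1", doc, 404, 409, "ex:SomeConcept")
triples = to_triples(mention)
```

Keeping the mention separate from the document means one document record can be shared by any number of mentions, each carrying only its own offsets and target URI.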

13
Export annotations

14
Document class
  <rdf:Description rdf:about="gateid_ee7ba66b-cd71-4993-9635-777b24f46372">
    <rdf:type rdf:resource="http://proton.semanticweb.org/2005/04/protont#Document"/>
    <protont:informationResourceIdentifier>
      http://gate.ac.uk/gate/doc/java2html/gate/creole/gazetteer/FlexibleGazetteer.java.html
    </protont:informationResourceIdentifier>
    <protonkm:resourceType>
      Source Code
    </protonkm:resourceType>
  </rdf:Description>

15
Mention class
  <rdf:Description rdf:about="gatemention_0c45b1dc-efab-48a2-8242-bb78c1ddd3b5">
    <rdf:type rdf:resource="http://proton.semanticweb.org/2005/04/protonkm#Mention"/>
    <protonkm:occursIn rdf:resource="gateid_ee7ba66b-cd71-4993-9635-777b24f46372"/>
    <protonkm:hasStartOffset> 404 </protonkm:hasStartOffset>
    <protonkm:hasEndOffset> 409 </protonkm:hasEndOffset>
    <gate:refersAnything rdf:resource="http://gate.ac.uk/ns/gate-ontology#NA"/>
  </rdf:Description>
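
A consumer of the exported file would need to read mention descriptions like the one on this slide back in. The following is a minimal sketch using only the Python standard library. The sample XML is a simplified stand-in, not the real export: the `urn:example:` identifiers and the `gate` namespace URI are hypothetical, and the hash-terminated PROTON KM namespace is an assumption based on the URL shown on the slide.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
KM = "http://proton.semanticweb.org/2005/04/protonkm#"  # assumed namespace form
GATE = "http://example.org/gate-export#"                # hypothetical namespace

MENTION_XML = f"""
<rdf:RDF xmlns:rdf="{RDF}" xmlns:protonkm="{KM}" xmlns:gate="{GATE}">
  <rdf:Description rdf:about="urn:example:mention-1">
    <rdf:type rdf:resource="{KM}Mention"/>
    <protonkm:occursIn rdf:resource="urn:example:doc-1"/>
    <protonkm:hasStartOffset> 404 </protonkm:hasStartOffset>
    <protonkm:hasEndOffset> 409 </protonkm:hasEndOffset>
    <gate:refersAnything rdf:resource="urn:example:concept"/>
  </rdf:Description>
</rdf:RDF>
"""


def read_mentions(xml_text):
    """Return one dict per rdf:Description that is typed as a Mention."""
    root = ET.fromstring(xml_text)
    mentions = []
    for desc in root.findall(f"{{{RDF}}}Description"):
        rdf_type = desc.find(f"{{{RDF}}}type")
        if rdf_type is None or rdf_type.get(f"{{{RDF}}}resource") != KM + "Mention":
            continue  # skip documents and anything else in the file
        mentions.append({
            "mention": desc.get(f"{{{RDF}}}about"),
            "document": desc.find(f"{{{KM}}}occursIn").get(f"{{{RDF}}}resource"),
            # int() tolerates the padding spaces around the offsets
            "start": int(desc.findtext(f"{{{KM}}}hasStartOffset")),
            "end": int(desc.findtext(f"{{{KM}}}hasEndOffset")),
            "refers_to": desc.find(f"{{{GATE}}}refersAnything").get(f"{{{RDF}}}resource"),
        })
    return mentions


mentions = read_mentions(MENTION_XML)
```

With the offsets parsed as integers, the annotated span can be recovered by slicing the text of the document named in `document`.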

16
Access knowledge using text-based queries
  • QuestIO (Question-based Interface to Ontologies)
    • keyword-based queries
    • full-blown questions

17
QuestIO: Text-based query >> SeRQL
Java Class for parameters for processing resources in ANNIC?
  • select c0, "inverseProperty", p1, c2, "inverseProperty", p3, c4, "inverseProperty", p5, i6
  • from {c0} rdf:type {<http://gate.ac.uk/ns/gate-ontology#JavaClass>},
    {c2} p1 {c0},
    {c2} rdf:type {<http://gate.ac.uk/ns/gate-ontology#ResourceParameter>},
    {c4} p3 {c2},
    {c4} rdf:type {<http://gate.ac.uk/ns/gate-ontology#ProcessingResource>},
    {i6} p5 {c4},
    {i6} rdf:type {<http://gate.ac.uk/ns/gate-ontology#GATEPlugin>}
  • where p1=<http://gate.ac.uk/ns/gate-ontology#parameterHasType>
    and p3=<http://gate.ac.uk/ns/gate-ontology#hasRunTimeParameter>
    and p5=<http://gate.ac.uk/ns/gate-ontology#containsResource>
    and i6=<http://gate.ac.uk/ns/gate-ontology#annic>
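
The query on this slide has a regular shape: a chain of typed variables joined by property variables, with the properties and the final instance pinned in the where clause. The sketch below shows how such a chain could be rendered as SeRQL text. It is not QuestIO's actual algorithm: the `build_serql` function, its parameters and the variable naming (`n0`, `p1`, `n1`, ...) are assumptions made for illustration, and the hash-terminated namespace is a reconstruction of the URL on the slide.

```python
GATE_NS = "http://gate.ac.uk/ns/gate-ontology#"  # assumed namespace form


def build_serql(start_class, hops, instance=None):
    """Render a chain of classes and properties as a SeRQL select query.

    start_class: local name of the class of the first variable.
    hops: list of (property, class) local names; each hop introduces a new
          variable linked to the previous one as {new} property {previous}.
    instance: optional local name that pins the last variable to one resource.
    """
    nodes = ["n0"]
    select = ["n0"]
    paths = [f"{{n0}} rdf:type {{<{GATE_NS}{start_class}>}}"]
    conditions = []
    for i, (prop, cls) in enumerate(hops, start=1):
        prev, edge, node = nodes[-1], f"p{i}", f"n{i}"
        paths.append(f"{{{node}}} {edge} {{{prev}}}")
        paths.append(f"{{{node}}} rdf:type {{<{GATE_NS}{cls}>}}")
        conditions.append(f"{edge} = <{GATE_NS}{prop}>")
        select += [edge, node]
        nodes.append(node)
    if instance is not None:
        conditions.append(f"{nodes[-1]} = <{GATE_NS}{instance}>")
    query = "select " + ", ".join(select) + "\nfrom " + ",\n     ".join(paths)
    if conditions:
        query += "\nwhere " + " and ".join(conditions)
    return query


# The chain from the slide: JavaClass <- ResourceParameter <- ProcessingResource <- GATEPlugin
query = build_serql(
    "JavaClass",
    [
        ("parameterHasType", "ResourceParameter"),
        ("hasRunTimeParameter", "ProcessingResource"),
        ("containsResource", "GATEPlugin"),
    ],
    instance="annic",
)
```

Printing `query` gives a select/from/where statement with the same structure as the slide, apart from the variable names and the omitted "inverseProperty" markers.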

18
An example

19
Demo
  • http://gate.ac.uk/document-search

20
Evaluation on coverage and correctness
  • 36 questions extracted from the GATE list
  • 22 out of 36 questions were answerable (the answer was in the knowledge base)
    • 12 correctly answered (54.5%)
    • 6 partially correct answers (27.3%)
    • system failed to create a SeRQL query or created a wrong one for 4 questions (18.2%)
  • Total score
    • 68% correctly answered
    • 32% did not answer at all or did not answer correctly
  • In a similar evaluation AquaLog correctly answered 58%.
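
The figures on this slide can be checked with a few lines of arithmetic. The per-category shares are taken over the 22 answerable questions. The slide does not say how the total score of 68 was computed; the sketch assumes that a partially correct answer counts as half a correct one, which is an inference that happens to reproduce the reported number.

```python
answerable = 22                      # questions whose answer was in the knowledge base
correct, partial, failed = 12, 6, 4  # counts reported on the slide


def pct(n):
    """Share of the answerable questions, in percent with one decimal."""
    return round(100 * n / answerable, 1)


shares = {
    "correct": pct(correct),  # 54.5
    "partial": pct(partial),  # 27.3
    "failed": pct(failed),    # 18.2
}

# Assumption: each partially correct answer is weighted 0.5.
total_score = round(100 * (correct + 0.5 * partial) / answerable)  # 68
remainder = 100 - total_score                                      # 32
```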

21
Comparison with AquaLog
  • removed 6 questions not supported by AquaLog
    • 1 conjunction query: What are the run parameters of POS Tagger and Sentence splitter?
    • 1 query with brackets: Does GATE have a coreference resolution component (PR)?
    • 1 query starting with How many...
    • 3 queries that are not full-blown questions, e.g. I cannot get Wordnet plugin to work.

22
Future Work
  • optimise query execution time: migrate from SeRQL >> SPARQL
  • include simple ontology-driven data in the interface
  • evaluation to follow
    • user-centric evaluation with GATE users
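
For the planned move from SeRQL to SPARQL, the same kind of class-and-property chain can be written as SPARQL triple patterns. This is only a rough sketch of what a generated query might look like: the `build_sparql` helper, its parameters and the variable names are hypothetical, and the hash-terminated `gate` namespace is an assumption.

```python
GATE_NS = "http://gate.ac.uk/ns/gate-ontology#"  # assumed namespace form


def build_sparql(start_class, hops, instance=None):
    """Render a chain of (property, class) hops as a SPARQL SELECT query."""
    select = ["?n0"]
    patterns = [f"?n0 a gate:{start_class} ."]
    last = "?n0"
    for i, (prop, cls) in enumerate(hops, start=1):
        node = f"?n{i}"
        # Same direction as the SeRQL paths: the new node points at the previous one.
        patterns.append(f"{node} gate:{prop} {last} .")
        patterns.append(f"{node} a gate:{cls} .")
        select.append(node)
        last = node
    if instance is not None:
        patterns.append(f"FILTER ({last} = gate:{instance})")
    body = "\n  ".join(patterns)
    return (
        f"PREFIX gate: <{GATE_NS}>\n"
        f"SELECT {' '.join(select)}\n"
        f"WHERE {{\n  {body}\n}}"
    )


sparql = build_sparql(
    "JavaClass",
    [
        ("parameterHasType", "ResourceParameter"),
        ("hasRunTimeParameter", "ProcessingResource"),
        ("containsResource", "GATEPlugin"),
    ],
    instance="annic",
)
```

Because the properties are already known when the query is built, they appear directly in the patterns here, so no property variables or equality conditions on them are needed.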

23
Thank you!
  • Questions?