Ricardo Gacitua1, Pete Sawyer1, Paul Rayson1, Scott Piao2 - PowerPoint PPT Presentation

1
A Framework to Experiment with Different NLP Techniques
  • Ricardo Gacitua (1), Pete Sawyer (1), Paul Rayson (1), Scott Piao (2)
  • (1) Computing Department, Lancaster University, Lancaster, UK
  • (2) School of Computer Science, Manchester University, UK

Workshop: Issues in Ontology Development and Use, Nottingham, UK, 2007
2
Index
  • Context
  • Problems
  • Research Question
  • Objectives
  • Framework
  • Brief Demo: OntoLancs Workbench
  • Further Work
3
Context
Focus
Most initiatives for Ontology Learning
combine techniques to find concepts and
relationships between them.
4
Context
However, researchers have realised that the output of the ontology learning process is far from perfect (Cimiano, 2005).
Philipp Cimiano, Johanna Völker, Rudi Studer. Ontologies on Demand? A Description of the State-of-the-Art, Applications, Challenges and Trends for Ontology Learning from Text. Information, Wissenschaft und Praxis 57 (6-7), 315-320. October 2006. (See the special issue for more contributions related to the Semantic Web.)
5
Problem
6
Research Question
  • Can shallow semantic analysis of the kind
    enabled by semantic tagging, together with a
    range of other statistical NLP techniques,
    identify key domain concepts?
  • Can it do so with sufficient confidence in
    the correctness and completeness of the result?

7
Background
8
A Flexible Framework
  • Phase 1: Part-of-Speech (POS) and semantic annotation of the corpus. Domain texts are tagged morpho-syntactically and semantically.
  • Phase 2: Extraction of concepts. The domain terminology is extracted from the tagged domain corpus by identifying a list of candidate domain terms. The system provides a set of statistical and linguistic techniques which an ontology engineer can combine.
  • Phase 3: Domain ontology construction. Concepts extracted during the previous phase are added to a concept hierarchy.
  • Phase 4: Domain ontology editing. The bootstrap ontology is turned into OWL. It is then processed using an ontology editor (Protégé) to manage the versioning of the domain ontology and to modify or improve it. An existing DAML ontology can be used as a reference and to calculate precision and recall.
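
The precision and recall calculation against a reference ontology mentioned on this slide can be sketched as a set comparison of concept labels. This is a minimal illustration, not the OntoLancs implementation: the function name `precision_recall`, the label normalisation, and the example concept lists are all assumptions made for the example.

```python
def precision_recall(extracted, reference):
    """Compare extracted concept labels with those of a reference ontology.

    Hypothetical helper: both arguments are plain lists of concept labels.
    """
    # Normalise labels so that case and surrounding whitespace do not matter
    # (an assumption; a real comparison might also match synonyms).
    extracted_set = {label.strip().lower() for label in extracted}
    reference_set = {label.strip().lower() for label in reference}
    matched = extracted_set & reference_set

    # Precision: share of extracted concepts that appear in the reference.
    precision = len(matched) / len(extracted_set) if extracted_set else 0.0
    # Recall: share of reference concepts that were extracted.
    recall = len(matched) / len(reference_set) if reference_set else 0.0
    return precision, recall


# Made-up example data, for illustration only.
extracted = ["Player", "match", "referee", "grass"]
reference = ["player", "match", "referee", "team", "goal"]
p, r = precision_recall(extracted, reference)
print(f"precision={p:.2f} recall={r:.2f}")
```

Exact label matching is the simplest possible choice here; it penalises an extracted concept that differs from the reference only in spelling or inflection.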

9
Preliminary Results
Some researchers use different text processing techniques such as stopword filtering, lemmatization or stemming.
Stopword filtering: Bloehdorn et al., 2006
Lemmatization: Buitelaar and Ramaka, 2005
Stemming: Kietz et al., 2000
  • S. Bloehdorn, P. Cimiano and A. Hotho. Learning Ontologies to Improve Text Clustering and Classification. In Proc. of GFKL, 2005.
  • P. Buitelaar and S. Ramaka. Unsupervised Ontology-based Semantic Tagging for Knowledge Markup. In Proc. of the Workshop on Learning in Web Search at the International Conference on Machine Learning, Bonn, Germany, August 2005.
  • J. Kietz et al. A Method for Semi-automatic Ontology Acquisition from a Corporate Intranet. In Proc. of EKAW-2000, France, 2000.

From the preliminary experiments, we can conclude that the lemmatization technique (Group 3) produces better results than the stemming technique (Group 2) for the domain concept acquisition process.
Our results are consistent with other studies. For instance, Alkula [3] suggests that lemmatization may be a better approach than stemming.
[3] Alkula, R. 2001. From Plain Character Strings to Meaningful Words: Producing Better Full Text Databases for Inflectional and Compounding Languages with Morphological Analysis Software. Inf. Retr. 4, 3-4 (Sep. 2001), 195-208.
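
The difference between the two techniques compared above can be illustrated with a toy example. The sketch below is an assumption-laden illustration only: `naive_stem` is a crude suffix stripper (not the Porter algorithm or whatever stemmer the experiments used), and `LEMMAS` is a hypothetical four-entry lexicon standing in for a real morphological analyser.

```python
def naive_stem(word):
    """Strip the first matching suffix. A toy rule set, not a real stemmer."""
    for suffix in ("ies", "ing", "es", "s"):
        # Keep at least three characters so very short words are untouched.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Hypothetical miniature lemma lexicon; a real lemmatizer uses a full
# dictionary or morphological analysis.
LEMMAS = {
    "ontologies": "ontology",
    "studies": "study",
    "classes": "class",
    "mice": "mouse",
}


def lemmatize(word):
    """Return the dictionary form if known, else the word unchanged."""
    return LEMMAS.get(word, word)


for word in ["ontologies", "studies", "classes", "mice"]:
    print(f"{word}: stem={naive_stem(word)} lemma={lemmatize(word)}")
```

With these toy rules, stemming turns "ontologies" into the non-word "ontolog" and leaves the irregular plural "mice" untouched, while the lemma lookup returns "ontology" and "mouse". This shows one plausible reason why lemmas can be more usable as candidate concept labels, though the slides do not state the cause of the observed difference.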
10
Brief Demo
Ontology Framework
11
Conclusions
Main challenge
Our research project addresses an important challenge of ontology research: how to evaluate quantitatively the usefulness and accuracy of both individual techniques and combinations of techniques when they are applied to ontology learning.
Our ontology learning environment is unique in providing not only a framework for integrating linguistic techniques, but also the possibility of using it as an experimental platform for identifying the most effective technique or combination of techniques.
12
Further Work
Our Project
OntoLancs: A Flexible Framework for Ontology Learning
Future Work
13
The End
OntoLancs, Computing Department, Lancaster University, UK, 2006
14
Text2Onto vs. OntoLancs
Text2Onto defines user interaction as a core aspect, whereas our framework supports running algorithms in an unsupervised mode.
Our framework provides a graphical workflow engine to support the composition of complex ensemble techniques.
Like Text2Onto, our framework uses a plug-in-based structure. In contrast, however, it can include techniques from existing linguistic and ontology tools by using their Java APIs.
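
The idea of composing individual techniques into an ensemble can be pictured as chaining small, interchangeable steps. The sketch below is only an analogy in Python: OntoLancs itself is described as a graphical workflow engine with Java plug-ins, and the names `compose`, `stopword_filter`, `frequency_filter` and the stopword list are hypothetical.

```python
from collections import Counter

# Illustrative stopword list; a real list would be much longer.
STOPWORDS = {"the", "of", "a", "is", "and", "in"}


def stopword_filter(tokens):
    """Drop tokens that appear in the stopword list."""
    return [t for t in tokens if t not in STOPWORDS]


def frequency_filter(min_count):
    """Build a step that keeps tokens occurring at least min_count times."""
    def step(tokens):
        counts = Counter(tokens)
        return [t for t in tokens if counts[t] >= min_count]
    return step


def compose(*steps):
    """Chain steps so that the output of each feeds the next."""
    def run(tokens):
        for step in steps:
            tokens = step(tokens)
        return tokens
    return run


# One possible combination: stopword filtering, then raw frequency filtering.
pipeline = compose(stopword_filter, frequency_filter(2))
tokens = "the player passes the ball and the player scores".split()
print(sorted(set(pipeline(tokens))))
```

Because every step takes and returns a token list, steps can be reordered or swapped without changing the others, which is what makes it practical to compare different combinations.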
15
Techniques included in OntoLancs
  1. Grouping by POS
  2. Raw Frequency Filtering
  3. POS Filtering
  4. Lemmatization
  5. Stemming
  6. StopWord Filtering
  7. Frequency Profiling
  8. Syntactic Pattern Co-occurrences
  9. Window-based Collocations
  10. Semantic Filter (soon)
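
As an illustration of one technique in the list above, frequency profiling is commonly done by comparing word frequencies in a domain corpus with those in a reference corpus using the log-likelihood statistic. The slides do not say which statistic OntoLancs uses, so the choice of log-likelihood, the function names and the tiny corpora below are all assumptions.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood for one term.

    a, b: term counts in the domain and reference corpus.
    c, d: total token counts of the domain and reference corpus.
    """
    # Expected counts if the term were equally likely in both corpora.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def profile(domain_tokens, reference_tokens):
    """Rank terms that are relatively more frequent in the domain corpus."""
    dom, ref = Counter(domain_tokens), Counter(reference_tokens)
    c, d = len(domain_tokens), len(reference_tokens)
    scores = {}
    for term, a in dom.items():
        b = ref[term]
        # Keep only terms overused in the domain corpus.
        if a / c > b / d:
            scores[term] = log_likelihood(a, b, c, d)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Made-up miniature corpora, for illustration only.
domain = "the ontology editor loads the ontology".split()
reference = "the cat sat on the mat near the door".split()
for term, score in profile(domain, reference):
    print(f"{term}: {score:.2f}")
```

Terms with high scores are candidates for domain terminology, while words such as "the", which are about equally frequent in both corpora, drop out.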