Background Knowledge for Ontology Construction - PowerPoint PPT Presentation


About This Presentation

Title:

Background Knowledge for Ontology Construction


Slides: 12

Provided by: www2006Org


Transcript and Presenter's Notes


1
Background Knowledge for Ontology Construction

Blaž Fortuna,
Marko Grobelnik,
Dunja Mladenic,
Institute Jožef Stefan, Slovenia

2
Bag-of-words

Documents are encoded as vectors.
Each element of a vector corresponds to the frequency
of one word.
Each word can also be weighted according to
its importance.

There are various ways of selecting word
weights. In our paper we propose a method to
learn them!

Example: term frequency × word weight = weighted frequency

word          frequency   weight   weighted
computer      2           0.9      1.8
mathematics   2           0.8      1.6
are           1           0.01     0.01
and           4           0.01     0.04
science       3           0.9      2.7

Important words (high weights): computer, mathematics, science
Noise words (low weights): are, and
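
The weighting in the example above can be reproduced with a few lines of Python. This is only a minimal sketch: the helper name weight_document and the fallback weight of 1.0 for words without a learned weight are assumptions made for illustration, not part of the slides.

```python
# Term frequencies and word weights copied from the slide's example.
term_freq = {"computer": 2, "mathematics": 2, "are": 1, "and": 4, "science": 3}
word_weight = {"computer": 0.9, "mathematics": 0.8, "are": 0.01, "and": 0.01, "science": 0.9}


def weight_document(tf, weights, default=1.0):
    """Scale each term frequency by its word weight (hypothetical helper).

    Words without a weight fall back to `default`, which is an assumption.
    """
    return {word: count * weights.get(word, default) for word, count in tf.items()}


weighted = weight_document(term_freq, word_weight)
for word, value in weighted.items():
    print(f"{word:12s} {value:.2f}")
```

Noise words such as "and" end up with a small value even when they are frequent, which is the effect the slide illustrates.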

3
SVM Feature selection

Input:
Set of documents
Set of categories
Each document is assigned a subset of categories
Output:
Ranking of words according to importance
Intuition:
A word is important if it discriminates documents
according to categories.

Basic algorithm:
Learn a linear SVM classifier for each of the
categories.
A word is important if it is important for
classification into any of the categories.
Reference:
Brank J., Grobelnik M., Milic-Frayling N.,
Mladenic D. Feature selection using support
vector machines.
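
The basic algorithm can be sketched in plain Python. Several things here are assumptions rather than details from the slides: the toy documents and vocabulary are made up, the SVM is trained with a simple full-batch subgradient descent on the hinge loss instead of a production solver, and "important for any category" is read as taking the maximum absolute normal-vector component over all categories.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Full-batch subgradient descent on the L2-regularised hinge loss.

    X: list of feature lists, y: labels in {-1, +1}. Returns (normal, bias).
    A stand-in for a real SVM solver, used only to keep the sketch self-contained.
    """
    n = len(X[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # only margin violators contribute to the subgradient
                for j in range(n):
                    grad_w[j] -= yi * xi[j] / len(X)
                grad_b -= yi / len(X)
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def rank_words(vocab, docs, doc_categories):
    """Rank words by the largest |normal component| over all categories (assumed rule)."""
    categories = sorted(set().union(*doc_categories))
    score = [0.0] * len(vocab)
    for cat in categories:
        # One-vs-rest labels: +1 if the document belongs to the category.
        y = [1 if cat in cats else -1 for cats in doc_categories]
        normal, _ = train_linear_svm(docs, y)
        score = [max(s, abs(c)) for s, c in zip(score, normal)]
    return sorted(zip(vocab, score), key=lambda pair: -pair[1])


# Made-up toy data: word counts per document and the categories of each document.
vocab = ["computer", "algebra", "and", "the"]
docs = [[2, 0, 1, 1], [3, 0, 2, 1], [0, 2, 1, 1], [0, 3, 2, 1]]
doc_categories = [{"cs"}, {"cs"}, {"math"}, {"math"}]

ranking = rank_words(vocab, docs, doc_categories)
for word, value in ranking:
    print(f"{word:10s} {value:.3f}")
```

On this toy data the two discriminative words are ranked above "and" and "the", which occur equally often in both categories.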

4
Word weight learning

Algorithm:
Calculate a linear SVM classifier for each category.
Calculate word weights for each category from the SVM
normal vectors. The weight for the i-th word and j-th
category is (formula not in transcript).
Final word weights are calculated separately for
each document.

The word weight learning method is based on SVM
feature selection.
Besides ranking the words, it also assigns them
weights based on the SVM classifier.
Notation:
N: number of documents
x1, ..., xN: documents
C(xi): set of categories for document xi
n: number of words
w1, ..., wn: word weights
nj1, ..., njn: SVM normal vector for the j-th
category
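
The flow of the algorithm (per-category weights from SVM normal vectors, then per-document weights) can be illustrated as follows. The exact formulas are not available in this transcript, so this sketch substitutes two loudly assumed rules: the per-category weight of a word is the absolute value of its normal-vector component, and the final weight for a document is the sum of the per-category weights over the document's categories C(x). The normal vectors below are made-up numbers, not trained values.

```python
def category_word_weights(normals):
    """ASSUMED rule: weight of word i for category j is |n_ji|."""
    return {cat: [abs(c) for c in normal] for cat, normal in normals.items()}


def document_word_weights(doc_cats, cat_weights, n_words):
    """ASSUMED rule: sum the per-category weights over the document's categories."""
    weights = [0.0] * n_words
    for cat in doc_cats:
        for i, value in enumerate(cat_weights[cat]):
            weights[i] += value
    return weights


vocab = ["computer", "algebra", "and"]
# Made-up SVM normal vectors, one per category (illustrative only).
normals = {"cs": [0.5, -0.4, 0.0], "math": [-0.5, 0.6, 0.1]}

cat_weights = category_word_weights(normals)
weights_cs = document_word_weights({"cs"}, cat_weights, len(vocab))
weights_both = document_word_weights({"cs", "math"}, cat_weights, len(vocab))
print(dict(zip(vocab, weights_cs)))
print(dict(zip(vocab, weights_both)))
```

Because the final weights depend on C(x), two documents with different categories get different weight vectors, which matches the statement that final weights are calculated separately for each document.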

5
OntoGen system

System for semi-automatic ontology construction.
Why semi-automatic? The system only gives
suggestions to the user; the user always makes
the final decision.
The system is data-driven and can scale to large
collections of documents.
The current version focuses on the construction of topic
ontologies; the next version will be able to deal
with more general ontologies.
It can import and export RDF.

There is a big divide between unsupervised and
fully supervised construction tools.
Both approaches have weak points:
it is difficult to obtain the desired results using
unsupervised methods, e.g. because of limited background
knowledge;
manual tools (e.g. Protégé, OntoStudio) are time
consuming, and the user needs to know the entire domain.
We combined these two approaches in order to
eliminate these weaknesses:
the user guides the construction process,
the system helps the user with suggestions based
on the document collection.

http://kt.ijs.si/blazf/examples/ontogen.html
6
How does OntoGen help?

By identifying the topics and
relations between them
using k-means clustering:
cluster of documents => topic
documents are assigned to clusters =>
subject-of relation
We can repeat clustering on a subset of documents
assigned to a specific topic => identifies
subtopics and the subtopic-of relation
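
The clustering step can be sketched with a small k-means in plain Python. The slides only say "k-means clustering"; the use of cosine similarity, seeding with the first k documents, and the toy vectors are all assumptions made to keep this sketch short and deterministic.

```python
import math


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def cosine(a, b):
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def kmeans(docs, k, iterations=20):
    """Plain k-means with cosine similarity.

    ASSUMPTION: seeds are simply the first k documents, for determinism.
    Returns (assignment, centroids); each cluster plays the role of a topic.
    """
    centroids = [list(d) for d in docs[:k]]
    assignment = [0] * len(docs)
    for _ in range(iterations):
        # Assign each document to its most similar centroid.
        assignment = [max(range(k), key=lambda c: cosine(d, centroids[c])) for d in docs]
        # Recompute each centroid as the mean of its member documents.
        for c in range(k):
            members = [d for d, a in zip(docs, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Made-up bag-of-words vectors over the vocabulary (computer, algebra, and).
docs = [[2, 0, 1], [0, 3, 1], [3, 0, 1], [0, 2, 1]]
assignment, centroids = kmeans(docs, k=2)
print(assignment)

# Subtopics: repeat the clustering on the documents of one topic only.
topic0_docs = [d for d, a in zip(docs, assignment) if a == 0]
sub_assignment, _ = kmeans(topic0_docs, k=2)
print(sub_assignment)
```

The second call mirrors the last point above: clustering only the documents of one topic yields candidate subtopics for that topic.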

By naming the topics
using the centroid vector:
The centroid vector of a given topic is the average
document from this topic (normalised sum of the
topic's documents).
The most descriptive keywords for a given topic are
the words with the highest weights in the
centroid vector.
using a linear SVM classifier:
An SVM classifier is trained to separate documents
of the given topic from the other documents in
the context.
Words that are found most important for the
classification are selected as keywords for the
topic.
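
The centroid-based naming can be sketched as follows. The helper names and the toy document vectors are illustrative assumptions; only the definition of the centroid (normalised sum of the topic's documents) and the keyword rule (highest centroid weights) come from the slide.

```python
import math


def centroid(docs):
    """Normalised sum of the topic's document vectors."""
    total = [sum(col) for col in zip(*docs)]
    norm = math.sqrt(sum(x * x for x in total))
    return [x / norm for x in total] if norm else total


def top_keywords(vocab, vector, k=2):
    """Return the k words with the highest weights in the vector."""
    ranked = sorted(zip(vocab, vector), key=lambda pair: -pair[1])
    return [word for word, _ in ranked[:k]]


vocab = ["computer", "algebra", "and", "science"]
# Made-up weighted bag-of-words vectors for the documents of one topic.
topic_docs = [[1.8, 0.0, 0.04, 2.7], [0.9, 0.0, 0.02, 0.9]]

keywords = top_keywords(vocab, centroid(topic_docs))
print(keywords)
```

The SVM-based variant would rank words by the classifier's normal vector instead of the centroid, using the same top-k selection.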

7
(No Transcript)
8
Topic ontology of Yahoo! Finances
9
Background knowledge in OntoGen

All of the methods in OntoGen are based on the
bag-of-words representation.
By using different word weights we can tune
these methods according to the user's needs.
The user needs to group the documents into
categories. This can be done efficiently using
active learning.
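
The slides do not say which active learning strategy is used. As one common possibility, the sketch below shows uncertainty sampling with a linear classifier: the system asks the user to label the document closest to the decision boundary. The classifier parameters and documents are made-up values, and the function names are hypothetical.

```python
def decision_value(normal, bias, doc):
    """Signed score of a document under a linear classifier."""
    return sum(n * x for n, x in zip(normal, doc)) + bias


def most_uncertain(normal, bias, unlabeled):
    """Index of the unlabeled document closest to the hyperplane.

    ASSUMPTION: uncertainty sampling; the slides only mention "active learning".
    """
    return min(
        range(len(unlabeled)),
        key=lambda i: abs(decision_value(normal, bias, unlabeled[i])),
    )


# Made-up classifier for one category and a pool of unlabeled documents.
normal, bias = [0.5, -0.5, 0.0], 0.0
unlabeled = [[3, 0, 1], [1, 1, 2], [0, 2, 1]]

query_index = most_uncertain(normal, bias, unlabeled)
print(query_index, unlabeled[query_index])
```

After the user labels the selected document, the classifier would be retrained and the query repeated, so that few labels are needed to group the documents.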

10
Influence of background knowledge