Title: Ontology Learning
1Ontology Learning
- David Salz
- david.salz_at_snafu.de
2Motivation (1)
Motivation
- Modeling / maintaining ontologies by hand is
- Slow
- Expensive
- The problem with the Semantic Web is not using the meta-data but providing it!
- Automated help is necessary
- There is plenty of knowledge in the form of
natural language text available on the web
3Motivation (2)
Motivation
- Are hand-made ontologies really better than machine-made ones?
- Modeling and maintaining ontologies by hand is also
- Biased
- Error-prone
4Motivation (3)
Motivation
- Goal
- Discover ontologies by analyzing natural language documents
- With or without the help of human experts
- Save time and work
5Machine Learning
Machine Learning
6Learning
Machine Learning
- Learning
- "the alteration of behavior as a result of individual experience. When an organism can perceive and change its behavior, it is said to learn."
- "Learning." Encyclopædia Britannica. Retrieved June 28, 2003, from Encyclopædia Britannica Premium Service. http://www.britannica.com/eb/article?eu=48642
7Machine Learning
Machine Learning
- Problem: x → f(x)
- Function f is too difficult to compute or unknown
- Solution: learning f (or an approximation of f)
- Hypothesis is updated with experience
[Diagram: x → Hypothesis → approximation of f(x)]
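The idea of a hypothesis that is updated with experience can be sketched in a few lines. This is only an illustration under stated assumptions: the target `f`, the linear form of the hypothesis, the learning rate and the function name `learn` are all made up for the example; in a real setting f is unknown and only seen through examples.

```python
# Hypothetical target function. In practice f is unknown and only
# observed through examples (x, f(x)).
def f(x):
    return 2.0 * x + 1.0

def learn(examples, rate=0.05, epochs=200):
    """Fit a linear hypothesis h(x) = w*x + b, updating it after each example."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in examples:
            error = (w * x + b) - y   # how far the current hypothesis is off
            w -= rate * error * x     # adjust the hypothesis using this example
            b -= rate * error
    return w, b

examples = [(x, f(x)) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]
w, b = learn(examples)
print(f"learned hypothesis: h(x) = {w:.3f} * x + {b:.3f}")
```

The learned `w` and `b` should end up close to the coefficients of the hidden `f`, which is what "learning an approximation of f" means here.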
8Machine Learning
Machine Learning
- System is fed with training data
- Supervised learning
- A human teacher gives feedback
- Or the training data is pre-classified
- Unsupervised learning
- Training data is raw; the system must discover patterns
- We will concentrate on unsupervised learning
9Problems: What does the system learn?
Machine Learning
- Training Data
- What you want
- What the system really learns
10Problems: Overfitting (1)
Machine Learning
- Training Data
- What you want
- What the system really learns
11Problems: Overfitting (2)
Machine Learning
- Learner adapts too well to the training data
- System won't work on new examples
- Remember: the goal of learning is generalization from the training data
- Reasons
- Learning time too long
- Training data too special or inconsistent
12Evaluation
Machine Learning
- It is important to evaluate the quality of discovered knowledge
- Evaluation must be done with independent test data, NOT with the training data
- If necessary, split the available data into training data and test data beforehand
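Splitting the available data beforehand into a part used for learning and a held-out part used only for evaluation might look like this. A minimal sketch; the function name `split_data`, the 25% test fraction and the toy document list are assumptions for illustration.

```python
import random

def split_data(items, test_fraction=0.25, seed=0):
    """Shuffle a copy of the data and split it into (training, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)   # fixed seed keeps the split repeatable
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical document identifiers.
documents = [f"doc{i}" for i in range(20)]
train, test = split_data(documents)
print(len(train), "training documents,", len(test), "test documents")
```

The learner only ever sees `train`; `test` is touched once, at evaluation time.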
13Machine Learning
14Text Processing
Text Processing
15Text Processing
Text Processing
to offer
to wish
16Text Processing
Text Processing
- Challenges
- Recognize different grammatical forms
- Identify Names
- Identify compounds
- Identify important keywords / unimportant fillers
- Solutions
- Existing ontologies
- Dictionaries
- Grammars
17Learning Taxonomies
Learning Taxonomies
18Taxonomies
Learning Taxonomies
- "structures that provide a way of classifying things - living organisms, products, books - into a series of hierarchical groups to make them easier to identify, study, or locate."
- Jean Graef, 'Managing taxonomies strategically'
19Lexico-Syntactic Patterns (1)
Learning Taxonomies
- Use regular expressions to find patterns that describe a semantic relation
- X is a(n) Y
- An apple is a fruit
- X1, X2, ... (and | or) other Y
- Apples, pears, cherries and other fruit
- Y, in particular X
- Fruit, in particular apples
- X1, X2, ... and Xn are examples of Y
- Apples, pears and plums are examples of fruit
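The four patterns above can be tried out with ordinary regular expressions. This is a rough sketch, not a robust extractor: it assumes single-word terms, plain English sentences and no preprocessing, and all names (`extract_is_a`, the pattern constants, the toy text) are made up for the example.

```python
import re

W = r"\w+"                       # a single-word term
LIST = rf"(?:{W},\s*)*{W}"       # e.g. "apples, pears, cherries"

IS_A = re.compile(rf"\ban?\s+({W})\s+is\s+an?\s+({W})", re.I)
AND_OTHER = re.compile(rf"\b({LIST}),?\s+(?:and|or)\s+other\s+({W})", re.I)
IN_PARTICULAR = re.compile(rf"\b({W}),\s+in\s+particular\s+({W})", re.I)
EXAMPLES_OF = re.compile(
    rf"\b({LIST})\s+and\s+({W})\s+are\s+examples\s+(?:of|for)\s+({W})", re.I
)

def split_terms(text):
    """Split a comma-separated term list into lowercase terms."""
    return [t.strip().lower() for t in text.split(",") if t.strip()]

def extract_is_a(text):
    """Return a set of (hyponym, hypernym) pairs found by the four patterns."""
    pairs = set()
    for m in IS_A.finditer(text):
        pairs.add((m.group(1).lower(), m.group(2).lower()))
    for m in AND_OTHER.finditer(text):
        for term in split_terms(m.group(1)):
            pairs.add((term, m.group(2).lower()))
    for m in IN_PARTICULAR.finditer(text):
        # Here the general term comes first, the specific one second.
        pairs.add((m.group(2).lower(), m.group(1).lower()))
    for m in EXAMPLES_OF.finditer(text):
        for term in split_terms(m.group(1)) + [m.group(2).lower()]:
            pairs.add((term, m.group(3).lower()))
    return pairs

text = (
    "An apple is a fruit. "
    "Apples, pears, cherries and other fruit are sold here. "
    "Fruit, in particular apples, is healthy. "
    "Apples, pears and plums are examples of fruit."
)
pairs = extract_is_a(text)
print(sorted(pairs))
```

Note that "apple" and "apples" come out as different terms: without recognizing grammatical forms first, the patterns cannot tell that they are the same concept.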
20Lexico-Syntactic Patterns (2)
Learning Taxonomies
Operating Systems
Unix
21Lexico-Syntactic Patterns (3)
Learning Taxonomies
- No method is ever 100% accurate
- Possible solutions
- Verifying by hand
- Statistical approaches (don't believe everything you've seen only once or twice)
- Better patterns / better preprocessing
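The statistical idea of not trusting a relation seen only once or twice can be sketched as a simple frequency filter. The threshold of 3, the function name `frequent_pairs` and the observed pairs are assumptions for illustration only.

```python
from collections import Counter

def frequent_pairs(observed_pairs, min_count=3):
    """Keep only (hyponym, hypernym) pairs seen at least min_count times."""
    counts = Counter(observed_pairs)
    return {pair for pair, n in counts.items() if n >= min_count}

# Hypothetical pattern matches collected from a corpus.
observed = (
    [("unix", "operating system")] * 5
    + [("apple", "fruit")] * 3
    + [("price", "fruit")] * 1      # a one-off, probably a bad match
)
reliable = frequent_pairs(observed)
print(sorted(reliable))
```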
22Lexico-Syntactic Patterns (4)
Learning Taxonomies
- Advantages and Disadvantages
- (+) quite accurate
- (+) works on small set of data
- (-) bad scaling
- (-) patterns have to be created somehow
- by hand
- through other automatic techniques
23Statistical Clustering (1)
Learning Taxonomies
- Form clusters of similar words
- Split or merge the clusters to form a hierarchy
Windows, Unix, Solaris, OS, Operating System
Apple, Cherry, Banana, Fruit, price
Windows, Unix, Solaris
OS, Operating System
24Statistical Clustering (2): Forming Clusters
Learning Taxonomies
- Similarity measure: a mathematical way to express how similar two words are
- Possible similarity measures between words
- Words that often appear in the same sentence / paragraph / text
- Words that often appear with the same verb
- Exploit existing ontologies, if possible!
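One possible similarity measure of the first kind (words that often appear in the same sentence) is the overlap of the sets of sentences in which two words occur. The sketch below uses the Jaccard coefficient for that overlap; this choice, the function names and the toy sentences are assumptions, not a measure prescribed by the slides.

```python
def sentence_sets(sentences):
    """Map each word to the set of sentence indices in which it occurs."""
    occurrences = {}
    for i, sentence in enumerate(sentences):
        for word in set(sentence.lower().split()):
            occurrences.setdefault(word, set()).add(i)
    return occurrences

def similarity(a, b, occurrences):
    """Jaccard overlap of the sentence sets of two words (0.0 to 1.0)."""
    sa, sb = occurrences.get(a, set()), occurrences.get(b, set())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

# Hypothetical mini corpus, one sentence per entry, no punctuation.
sentences = [
    "unix and solaris are operating systems",
    "windows and unix run on servers",
    "apple and cherry are fruit",
    "cherry and banana are sweet fruit",
]
occ = sentence_sets(sentences)
print(similarity("unix", "solaris", occ))
print(similarity("unix", "cherry", occ))
```

Words that never share a sentence get similarity 0.0, words that always occur together get 1.0.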
25Statistical Clustering (3)
Learning Taxonomies
- Building a hierarchy from clusters
- Top-down: start with one big cluster and split it
- Bottom-up: start with one-word clusters and join them
26Statistical Clustering (4): Splitting and Joining Clusters
Learning Taxonomies
- Use the similarity measure between words to calculate
- the similarity of two clusters: join the two most similar clusters
- or
- the coherence of a cluster: split the most incoherent cluster
27Statistical Clustering (5): Splitting and Joining Clusters
Learning Taxonomies
- Computing similarity / coherence of two clusters
- Single linkage: similarity of the two most similar objects counts
- Complete linkage: similarity of the two least similar objects counts
- Group-average: average similarity of all objects is calculated
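The three linkage criteria and the bottom-up strategy can be sketched together. The word similarities below are invented numbers, the function names are hypothetical, and group-average is read here as the average over all pairs between the two clusters, which is one common interpretation.

```python
from itertools import combinations

# Hypothetical word similarities; unlisted pairs default to 0.1.
SIM = {
    frozenset(("unix", "solaris")): 0.9,
    frozenset(("unix", "windows")): 0.6,
    frozenset(("solaris", "windows")): 0.5,
    frozenset(("apple", "cherry")): 0.8,
}

def sim(a, b):
    return 1.0 if a == b else SIM.get(frozenset((a, b)), 0.1)

def single_linkage(c1, c2, sim):
    """Similarity of the two most similar objects."""
    return max(sim(a, b) for a in c1 for b in c2)

def complete_linkage(c1, c2, sim):
    """Similarity of the two least similar objects."""
    return min(sim(a, b) for a in c1 for b in c2)

def group_average(c1, c2, sim):
    """Average similarity over all pairs between the two clusters."""
    values = [sim(a, b) for a in c1 for b in c2]
    return sum(values) / len(values)

def agglomerate(words, sim, linkage):
    """Bottom-up: start with one-word clusters, repeatedly join the two most similar."""
    clusters = [frozenset([w]) for w in words]
    merges = []
    while len(clusters) > 1:
        c1, c2 = max(
            combinations(clusters, 2),
            key=lambda pair: linkage(pair[0], pair[1], sim),
        )
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
        merges.append(c1 | c2)   # the order of merges describes the hierarchy
    return merges

words = ["unix", "solaris", "windows", "apple", "cherry"]
merges = agglomerate(words, sim, single_linkage)
for cluster in merges:
    print(sorted(cluster))
```

Swapping `single_linkage` for `complete_linkage` or `group_average` in the call changes only how cluster similarity is computed, not the merging loop.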
28Statistical Clustering (6)
Learning Taxonomies
- Advantages and Disadvantages
- (+) good scaling
- (-) less precise than symbolic approach
- (-) finds relations between words, but cannot name the relation; naming must be done by hand
29Learning Relations
Learning Relations
30Relation Learning
Learning Relations
- Assume we have a taxonomy
- We want to use that taxonomy to discover other relations between the concepts
31Transactions (1)
Learning Relations
- A transaction is a set of concepts that occurred together
"Unix development started in the late 1960s at Bell Labs"
T = {Unix, development, Bell Labs}
32Transactions (2)
Learning Relations
- Extend the transaction to include all parent concepts
[Diagram: Operating System is the parent of Unix; Company is the parent of Bell Labs]
T = {Unix, development, Bell Labs, Operating System, Company}
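Extending a transaction with all parent concepts amounts to walking up the taxonomy from each concept. A minimal sketch, assuming the taxonomy is stored as a child-to-parent dictionary; the names `PARENT`, `ancestors` and `extend_transaction` are hypothetical.

```python
# Hypothetical taxonomy: child concept -> parent concept.
PARENT = {"Unix": "Operating System", "Bell Labs": "Company"}

def ancestors(concept, parent):
    """All concepts above `concept` in the taxonomy, nearest first."""
    result = []
    while concept in parent:
        concept = parent[concept]
        result.append(concept)
    return result

def extend_transaction(transaction, parent):
    """Add every ancestor of every concept to the transaction."""
    extended = set(transaction)
    for concept in transaction:
        extended.update(ancestors(concept, parent))
    return extended

t = {"Unix", "development", "Bell Labs"}
extended = extend_transaction(t, PARENT)
print(sorted(extended))
```

Concepts without an entry in the taxonomy, such as "development" here, are simply kept as they are.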
33Association Rules (1)
Learning Relations
- An association rule is an expression of the form X → Y
- Like Bell Labs → Unix
- We don't know what exactly the relation is at first; we only want to find out if there is a relation
34Association Rules (2)
Learning Relations
- We consider all pairs of concepts from our transaction as association rules
- Unix → Bell Labs
- Bell Labs → Unix
- Company → Unix
- Company → Operating System
- ...
- Which of these rules are the best?
35Support and Confidence
Learning Relations
- Support of an association rule X → Y
- Percentage of transactions that contain both X and Y
- Confidence of a rule X → Y
- Percentage of transactions in which Y appears out of those in which X appears
- You can use confidence and support to compare rules and select the best candidates
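Both measures follow directly from their definitions. In the sketch below the four transactions are invented for illustration, and values are returned as fractions between 0 and 1 rather than percentages.

```python
def support(x, y, transactions):
    """Fraction of all transactions that contain both x and y."""
    both = sum(1 for t in transactions if x in t and y in t)
    return both / len(transactions)

def confidence(x, y, transactions):
    """Among the transactions containing x, the fraction that also contain y."""
    with_x = [t for t in transactions if x in t]
    if not with_x:
        return 0.0
    return sum(1 for t in with_x if y in t) / len(with_x)

# Hypothetical transactions, already extended with parent concepts.
transactions = [
    {"Unix", "Bell Labs", "Operating System", "Company"},
    {"Unix", "Operating System"},
    {"Bell Labs", "Company"},
    {"Windows", "Microsoft", "Operating System", "Company"},
]
print(support("Bell Labs", "Unix", transactions))
print(confidence("Bell Labs", "Unix", transactions))
```

Support is symmetric in X and Y, while confidence is not: the confidence of Bell Labs → Unix and of Unix → Bell Labs can differ.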
36Selecting the Results
Learning Relations
- Select the rules with the best support or confidence
- E.g. a user-defined threshold for support
- Present the result to the user
- User gives names to useful rules
- Bell Labs developed Unix
- Prune rules if you have more general rules with better confidence
- More general: Bell Labs developed Operating System
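Selection by a support threshold followed by pruning could be sketched as follows. Several things here are assumptions for the sake of a runnable example: the taxonomy, the transactions, the threshold of 0.25, the decision to skip pairs that are already related in the taxonomy, and the reading of "more general" as replacing either side of a rule by one of its ancestors.

```python
from itertools import permutations

# Hypothetical taxonomy: child concept -> parent concept.
PARENT = {
    "Unix": "Operating System",
    "Plan 9": "Operating System",
    "Bell Labs": "Company",
}

def ancestors(concept):
    result = set()
    while concept in PARENT:
        concept = PARENT[concept]
        result.add(concept)
    return result

def support(x, y, transactions):
    return sum(x in t and y in t for t in transactions) / len(transactions)

def confidence(x, y, transactions):
    with_x = [t for t in transactions if x in t]
    return sum(y in t for t in with_x) / len(with_x) if with_x else 0.0

def candidate_rules(transactions, min_support):
    """All rules x -> y above the support threshold, mapped to their confidence."""
    concepts = set().union(*transactions)
    rules = {}
    for x, y in permutations(sorted(concepts), 2):
        if y in ancestors(x) or x in ancestors(y):
            continue   # already related in the taxonomy, nothing new to learn
        if support(x, y, transactions) >= min_support:
            rules[(x, y)] = confidence(x, y, transactions)
    return rules

def prune(rules):
    """Drop a rule if a more general rule has strictly better confidence."""
    kept = {}
    for (x, y), conf in rules.items():
        more_general = [
            (gx, gy)
            for gx in {x} | ancestors(x)
            for gy in {y} | ancestors(y)
            if (gx, gy) != (x, y) and (gx, gy) in rules
        ]
        if not any(rules[g] > conf for g in more_general):
            kept[(x, y)] = conf
    return kept

# Hypothetical transactions, already extended with parent concepts.
transactions = [
    {"Unix", "Bell Labs", "Operating System", "Company"},
    {"Plan 9", "Bell Labs", "Operating System", "Company"},
    {"Unix", "Operating System"},
    {"Bell Labs", "Company"},
]
rules = candidate_rules(transactions, min_support=0.25)
kept = prune(rules)
for (x, y), conf in sorted(kept.items()):
    print(f"{x} -> {y}: confidence {conf:.2f}")
```

With this toy data, Bell Labs → Unix passes the support threshold but is pruned in favour of the more general Bell Labs → Operating System. The remaining rules would then be shown to the user for naming.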
37References
- Maedche, Pekar, Staab: Ontology Learning Part One: On Discovering Taxonomic Relations from the Web
- http://www.ontoprise.de/documents/web-intelligence-book.pdf
- Maedche, Staab: Discovering Conceptual Relations from Text
- Berendt, Hotho, Stumme: Towards Semantic Web Mining
- Maedche: Development and Application of Ontologies (Tutorial)
38The End
Thank you for your attention! Questions /
Comments?