Using Term-matching Algorithms for the Annotation of Geo-services

Transcript and Presenter's Notes

1
Using Term-matching Algorithms for the Annotation of Geo-services
Semantic Web Service Interoperability for Geospatial Decision Making (FP6-026514)

Miha Grcar (1), Eva Klien (2)
(1) Jožef Stefan Institute, Slovenia
(2) Institute for Geoinformatics, Germany

2
Introduction and motivation

Geo-data
Provided by geo-services
Information about geographical features such as rivers, lakes, roads, quarries, geological structure
Geo-services
Web-based services
Defined by Open GIS Consortium (OGC)
Web Feature Services (WFS)
Spatial filtering
Common interface (syntactically)
HTTP/XML-based
Semantic incompatibility (interoperability issue)
Synonymy (e.g. Aegirite and Acmite are the same mineral)
Data structured differently
Multilinguality (e.g. river and fleuve are the same thing)
European project SWING: Semantic Web Service Interoperability for Geospatial Decision Making
STREP in the 6th Framework Programme
http://www.swing-project.org/

This is what we are trying to solve
3
Outline of the talk

Geo-service annotation
Automating the annotation
Text mining
Web as the source of documents
Evaluation
Preliminary evaluation
Larger-scale evaluation
Conclusions and future work

4
Geo-service annotation
Facilitates discovery and composition
How to establish this bridge?
Real-world entities
Spatial information objects
5
Geo-service annotation
Domain ontology
WFS
6
Automating the annotation

Term matching is the main building block
Using text mining techniques for term matching
Bag-of-words representation of documents, document similarity
Clustering and classification
Visualization techniques
Using the Web as the source of documents for text mining
Search engines
On-line encyclopedias
Dictionaries, thesauruses
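
The bag-of-words representation and document similarity named above can be sketched as follows. This is a minimal illustration using plain term counts and cosine similarity; the slides do not say which weighting scheme or similarity measure was used, so both are assumptions, and the three toy documents are invented.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Map a document to a sparse term-frequency vector (word -> count)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * b[word] for word, weight in a.items() if word in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy documents, invented for illustration only.
quarry = bag_of_words("A quarry is an open pit from which stone is extracted")
mine = bag_of_words("An open pit mine is a pit from which ore is extracted")
law = bag_of_words("Legislation is law enacted by a parliament")

sim_related = cosine_similarity(quarry, mine)
sim_unrelated = cosine_similarity(quarry, law)
print(sim_related > sim_unrelated)
```

Word order is discarded, so two documents score as similar only when they share vocabulary. Function words such as "is" and "a" still contribute here; a real pipeline would likely remove stop words or down-weight them.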

7
Automating the annotation
Geo-service
Domain ontology
Schema
open-pit mine
D_Quarry
D_Legislation
Similarity?
Classifier
8
One possible source of the documents
9
Preliminary evaluation

Dataset: 150 mineral names together with their synonyms
Train a classifier to distinguish between mineral names

10
Preliminary evaluation

Aegirite
Alalite
Allanite
Classifier
Synonym
Diopside
Diopside

Zincblende
Zinc-spinel
Zinc vitriol
11
Preliminary evaluation

Aegirite
Alalite
Allanite
Sort and recommend to the user
Classifier

Zincblende
Zinc-spinel
Zinc vitriol
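
The "sort and recommend" step might look roughly like the sketch below: documents collected for each mineral name are averaged into a class centroid, and the documents collected for a synonym are compared against every centroid. The function names, the toy texts, and the use of cosine similarity are all assumptions made for illustration, not the authors' implementation.

```python
import math
import re
from collections import defaultdict


def normalize(vec):
    """Scale a sparse vector to Euclidean length 1 (an empty vector stays empty)."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {w: v / norm for w, v in vec.items()} if norm else {}


def vectorize(text):
    """Turn a document into a unit-length bag-of-words vector."""
    counts = defaultdict(float)
    for word in re.findall(r"[a-z]+", text.lower()):
        counts[word] += 1.0
    return normalize(counts)


def centroid(vectors):
    """Component-wise mean of the vectors, rescaled to unit length."""
    total = defaultdict(float)
    for vec in vectors:
        for word, value in vec.items():
            total[word] += value / len(vectors)
    return normalize(total)


def recommend(query_docs, training_docs):
    """Return (label, score) pairs sorted by similarity to the query, best first."""
    query = centroid([vectorize(d) for d in query_docs])
    scores = {}
    for label, docs in training_docs.items():
        proto = centroid([vectorize(d) for d in docs])
        scores[label] = sum(v * proto.get(w, 0.0) for w, v in query.items())
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy texts standing in for Web documents (invented for illustration).
training_docs = {
    "Diopside": [
        "diopside is a green pyroxene mineral containing calcium and magnesium",
        "green diopside crystals are a calcium magnesium silicate",
    ],
    "Sphalerite": [
        "sphalerite is the chief ore of zinc",
        "zinc sulfide ore sphalerite is also called zincblende",
    ],
}
ranking = recommend(
    ["alalite is a pale green variety of pyroxene rich in calcium"],
    training_docs,
)
for label, score in ranking:
    print(label, round(score, 3))
```

The user would then be shown the labels in this order and pick the correct match from near the top of the list.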
12
Preliminary evaluation
Sort order
13
Larger-scale evaluation

Datasets
STINET Thesaurus (STINET: Scientific and Technical Information Network)
16,000 terms interlinked with broader-than, narrower-than, used-in-combination-for, used-alone-for (2 more)
We took 1,000 term-pairs for each of the narrower-than and used-alone-for relations
GEMET (General Multilingual Environmental Thesaurus)
6,000 terms interlinked with broader-than and related-to
We took 1,000 term-pairs for each of the two relations
Tourism ontology
710 concepts interlinked with is-a
A set of instances (mostly named entities) belonging to the concepts
We took 1,000 named entities and their corresponding concepts, and the entire structure defined by the is-a relation
WordNet (lexical database for the English language)
115,000 synsets (i.e. sets of synonymous words) interlinked with hypernymy, meronymy, entailment, cause for verbs (6 more)
We took 1,000 word-pairs for each of 9 selected relations
We also considered the inverted relations for 3 selected relations (e.g. consists-of is the inverse of part-of)
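
Building one such dataset could be sketched as below: draw a fixed-size sample of term pairs for a relation, and obtain the inverted relation by swapping each pair. The slides do not say how the 1,000 pairs were selected, so uniform random sampling is an assumption, and the example pairs are invented.

```python
import random


def sample_pairs(pairs, n, seed=0):
    """Draw up to n term pairs uniformly at random (seeded for repeatability)."""
    rng = random.Random(seed)
    return rng.sample(pairs, min(n, len(pairs)))


def invert(pairs):
    """Swap the two sides of every pair, e.g. part-of becomes consists-of."""
    return [(right, left) for left, right in pairs]


# Invented (part, whole) pairs for a part-of relation.
part_of = [("wheel", "car"), ("page", "book"), ("roof", "house")]
consists_of = invert(part_of)
subset = sample_pairs(part_of, 2)
print(consists_of)
```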

14
Larger-scale evaluation

Examples
GEMET
traffic infrastructure broader-than road network
mineral resource related-to mineral deposit
STINET
numerical methods and procedures used-alone-for gauss-seidel method
potassium narrower-than alkali metals
Tourism ontology
gliding field is-a sports institution
Warsaw instance-of city
WordNet
do drugs causes trip out
snore entails sleep
modify hypernym-of Europeanize
Cretaceous period instance-of geological period
shuffling meronym-of card game
rum meronym-of rum cocktail
housewife synonym-for homemaker

15
Larger-scale evaluation

Experimental setting
Classification algorithm
k-NN
Centroid classifier
Quotes
Yes: exact occurrence
No: co-occurrence
We ran experiments on 18 datasets, with 4 different settings on each dataset; this means roughly 4 x 18,000 term-pairs altogether
We measured accuracy on the top 1, 3, 5, 10, 20, and 40 recommended items
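
Two parts of this setting can be sketched in a few lines: the "quotes" switch when forming a search query, and accuracy measured on the top k recommended items. Both helper names are hypothetical, the reading of quotes as an exact-phrase query is an assumption about the search engine's behaviour, and the rankings are invented rather than taken from the experiments.

```python
def make_query(term, quotes):
    """Quoted: ask for the exact phrase. Unquoted: the words merely co-occur."""
    return f'"{term}"' if quotes else term


def top_k_accuracy(rankings, gold, k):
    """Fraction of test items whose correct label is among the first k recommendations."""
    hits = sum(1 for ranked, truth in zip(rankings, gold) if truth in ranked[:k])
    return hits / len(gold)


# Invented rankings for three test terms, best recommendation first.
rankings = [
    ["Diopside", "Augite", "Allanite"],
    ["Augite", "Allanite", "Diopside"],
    ["Allanite", "Diopside", "Augite"],
]
gold = ["Diopside", "Allanite", "Augite"]

for k in (1, 2, 3):
    print(k, top_k_accuracy(rankings, gold, k))
```

Accuracy can only stay the same or grow as k increases, which is why reporting it at several cut-offs shows how far down the list a user would have to look.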

16
Larger-scale evaluation
Synonymy
Meronymy
Verbs
17
Larger-scale evaluation
Hyper-/Hyponymy
Class membership (instance-of)...
18
Conclusions

The term's lexical category (e.g. verb vs. noun) has the largest impact on the accuracy
The dataset has a much larger impact on the accuracy than the choice of the classifier
General vs. specific vocabulary (works better for specific vocabulary or named entities)
Semantics of the relation (works best for synonymy)
The centroid classifier is faster and slightly more accurate
Quotes are useful on datasets that contain technical expressions (e.g. STINET)
Inverting the relation has no major impact on the results

19
Future work