Using Term-matching Algorithms for the Annotation of Geo-services

Slides: 21
Provided by: mih95

1
Using Term-matching Algorithms for the Annotation
of Geo-services
Semantic Web Service Interoperability
for Geospatial Decision Making (FP6-026514)
  • Miha Grcar1, Eva Klien2
  • 1Jožef Stefan Institute, Slovenia
  • 2Institute for Geoinformatics, Germany

2
Introduction and motivation
  • Geo-data
  • Provided by geo-services
  • Information about geographical features such as
    rivers, lakes, roads, quarries, geological
    structure
  • Geo-services
  • Web-based services
  • Defined by Open GIS Consortium (OGC)
  • Web Feature Services (WFS)
  • Spatial filtering
  • Common interface (syntactically)
  • HTTP/XML-based
  • Semantic incompatibility (interoperability issue)
  • Synonymy (e.g. Aegirite and Acmite are the
    same mineral)
  • Data structured differently
  • Multilinguality (e.g. river and fleuve are the
    same thing)
  • European project SWING: Semantic Web Service
    Interoperability for Geospatial Decision Making
  • STREP in the 6th Framework Programme
  • http://www.swing-project.org/

This is what we are trying to solve
3
Outline of the talk
  • Geo-service annotation
  • Automating the annotation
  • Text mining
  • Web as the source of documents
  • Evaluation
  • Preliminary evaluation
  • Larger-scale evaluation
  • Conclusions and future work

4
Geo-service annotation
Facilitates discovery and composition
How to establish this bridge?
Real world entities
Spatial information objects
5
Geo-service annotation
Domain ontology
WFS
6
Automating the annotation
  • Term matching is the main building block
  • Using text mining techniques for term matching
  • Bag-of-words representation of documents,
    document similarity
  • Clustering and classification
  • Visualization techniques
  • Using the Web as the source of documents for text
    mining
  • Search engines
  • On-line encyclopedias
  • Dictionaries, thesauruses
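
A minimal sketch of the bag-of-words representation and document similarity named above, assuming plain term-frequency weights and cosine similarity (the slides do not state the exact weighting). The three short texts are hypothetical stand-ins for documents retrieved from the Web.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Lowercase the text, split on non-letters, and count term frequencies."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)

# Hypothetical snippets; real documents would come from a search engine,
# an on-line encyclopedia, or a dictionary.
doc_quarry = bag_of_words("A quarry is an open-pit mine for stone and gravel")
doc_mine = bag_of_words("An open-pit mine extracts stone or minerals from the surface")
doc_law = bag_of_words("Legislation is law enacted by a parliament")

sim_related = cosine_similarity(doc_quarry, doc_mine)
sim_unrelated = cosine_similarity(doc_quarry, doc_law)
print(sim_related > sim_unrelated)
```

Documents about related terms share more vocabulary, so their vectors point in a more similar direction; this is the signal that term matching relies on.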

7
Automating the annotation
Geo-service
Domain ontology
Schema
open-pit mine
DQuarry
DLegislation
Similarity?
Classifier
8
One possible source of the documents
9
Preliminary evaluation
  • Dataset: 150 mineral names together with their
    synonyms
  • Train a classifier to distinguish between mineral
    names

10
Preliminary evaluation
  • Dataset: 150 mineral names together with their
    synonyms
  • Train a classifier to distinguish between mineral
    names

Aegirite
Alalite
Allanite
Classifier
Synonym
Diopside
Diopside


Zincblende
Zinc-spinel
Zinc vitriol
11
Preliminary evaluation
  • Dataset: 150 mineral names together with their
    synonyms
  • Train a classifier to distinguish between mineral
    names

Aegirite
Alalite
Allanite
Sort and recommend to the user
Classifier


Zincblende
Zinc-spinel
Zinc vitriol
12
Preliminary evaluation
Sort order
13
Larger-scale evaluation
  • Datasets
  • STINET Thesaurus (STINET: Scientific and
    Technical Information Network)
  • 16,000 terms interlinked with broader-than,
    narrower-than, used-in-combination-for,
    used-alone-for (2 more)
  • We took 1,000 term-pairs for each of the
    narrower-than and used-alone-for relations
  • GEMET (General Multilingual Environmental
    Thesaurus)
  • 6,000 terms interlinked with broader-than and
    related-to
  • We took 1,000 term-pairs for each of the two
    relations
  • Tourism ontology
  • 710 concepts interlinked with is-a
  • A set of instances (mostly named entities)
    belonging to the concepts
  • We took 1,000 named entities and their
    corresponding concepts, and the entire structure
    defined by the is-a relation
  • WordNet (lexical database for the English
    language)
  • 115,000 synsets (i.e. sets of synonymous words)
    interlinked with hypernymy, meronymy, entailment,
    cause for verbs (6 more)
  • We took 1,000 word-pairs for each of 9 selected
    relations
  • We also considered the inverted relations for 3
    selected relations (e.g. consists-of is inverse
    of part-of)
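
A small sketch of how an inverted relation could be derived from sampled pairs, assuming each pair is stored as a (left, right) tuple. The `INVERSE` table and the `invert` helper are hypothetical names for illustration, and the sample pairs are the meronymy examples from the slides, treated here as part-of pairs.

```python
# Assumed mapping between a relation and its inverse (part-of / consists-of
# is the example given on the slide).
INVERSE = {"part-of": "consists-of", "consists-of": "part-of"}

def invert(pairs, relation):
    """Swap both sides of every pair and return the inverse relation name."""
    return INVERSE[relation], [(right, left) for left, right in pairs]

part_of_pairs = [("shuffling", "card game"), ("rum", "rum cocktail")]
inverse_name, inverse_pairs = invert(part_of_pairs, "part-of")
print(inverse_name, inverse_pairs)
```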

14
Larger-scale evaluation
  • Examples
  • GEMET
  • traffic infrastructure broader-than road network
  • mineral resource related-to mineral deposit
  • STINET
  • numerical methods and procedures used-alone-for
    gauss-seidel method
  • potassium narrower-than alkali metals
  • Tourism ontology
  • gliding field is-a sports institution
  • Warsaw instance-of city
  • WordNet
  • do drugs causes trip out
  • snore entails sleep
  • modify hypernym-of Europeanize
  • Cretaceous period instance-of geological period
  • shuffling meronym-of card game
  • rum meronym-of rum cocktail
  • housewife synonym-for homemaker

15
Larger-scale evaluation
  • Experimental setting
  • Classification algorithm
  • k-NN
  • Centroid classifier
  • Quotes
  • Yes: exact occurrence
  • No: co-occurrence
  • We ran experiments on 18 datasets, with 4 different
    settings on each dataset; this means roughly 4 x
    18,000 term-pairs altogether
  • We measured accuracy on the top 1, 3, 5, 10, 20, and 40
    recommended items
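
A rough sketch of a centroid classifier and of accuracy on the top-k recommended items, under stated assumptions: documents are whitespace-tokenized bags of words, each candidate term is represented by the normalized average of its document vectors, and ranking uses the dot product of unit vectors (cosine similarity). The function names and the toy documents are hypothetical, not the setup used in the experiments.

```python
import math
from collections import Counter

def normalize(vec):
    """Scale a sparse vector to unit length (empty if it has no weight)."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}

def centroid(docs):
    """Average of unit-length bag-of-words vectors, re-normalized."""
    total = Counter()
    for doc in docs:
        for term, weight in normalize(Counter(doc.lower().split())).items():
            total[term] += weight
    return normalize(total)

def rank_labels(query_docs, centroids):
    """Order candidate labels by similarity to the query's centroid."""
    q = centroid(query_docs)
    scores = {label: sum(q.get(t, 0.0) * w for t, w in c.items())
              for label, c in centroids.items()}
    return sorted(scores, key=scores.get, reverse=True)

def top_k_accuracy(rankings, truths, k):
    """Fraction of queries whose correct label is among the first k items."""
    hits = sum(1 for ranked, truth in zip(rankings, truths) if truth in ranked[:k])
    return hits / len(truths)

# Toy document sets (hypothetical); one set per candidate term.
training = {
    "quarry": ["open pit mine stone extraction", "stone quarry gravel pit"],
    "legislation": ["law enacted by parliament", "legal act and regulation law"],
    "river": ["flowing water stream river", "river water flows to the sea"],
}
centroids = {label: centroid(docs) for label, docs in training.items()}

queries = [["open pit mine for gravel"], ["stream of flowing water"]]
truths = ["quarry", "river"]
rankings = [rank_labels(q, centroids) for q in queries]
print(top_k_accuracy(rankings, truths, 1))
```

A centroid classifier compares each query against one vector per candidate, whereas k-NN compares it against every stored document, which is one plausible reason for a speed difference between the two.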

16
Larger-scale evaluation
Synonymy
Meronymy
Verbs
17
Larger-scale evaluation
Hyper-/Hyponymy
Class membership (instance-of)...
18
Conclusions
  • The term's lexical category (e.g. verb vs. noun) has
    the largest impact on the accuracy
  • The dataset has a much larger impact on the
    accuracy than the choice of the classifier
  • General vs. specific vocabulary (works better for
    specific vocabulary or named entities)
  • Semantics of the relation (works best for
    synonymy)
  • The centroid classifier is faster and slightly
    more accurate
  • Quotes are useful on datasets that contain
    technical expressions (e.g. STINET)
  • Inverting the relation has no major impact on the
    results
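
The difference between the quoted and unquoted settings can be illustrated with a simplified matcher, assuming a quoted term must occur as an exact phrase while an unquoted term only needs all of its words to co-occur somewhere in the document. The `matches` helper and both sample sentences are hypothetical; a real search engine applies its own matching rules.

```python
import re

def tokens(text):
    """Lowercase alphanumeric tokens; hyphens and punctuation act as separators."""
    return re.findall(r"[a-z0-9]+", text.lower())

def matches(document, term, quoted):
    """Exact phrase occurrence if quoted, otherwise word co-occurrence."""
    doc = tokens(document)
    words = tokens(term)
    if not words:
        return False
    if quoted:
        n = len(words)
        return any(doc[i:i + n] == words for i in range(len(doc) - n + 1))
    return all(w in doc for w in words)

on_topic = "The Gauss-Seidel method is an iterative method"
off_topic = "Seidel refined a method first described by Gauss"

print(matches(on_topic, "gauss-seidel method", quoted=True))
print(matches(off_topic, "gauss-seidel method", quoted=True))
print(matches(off_topic, "gauss-seidel method", quoted=False))
```

For multi-word technical expressions, the unquoted form also accepts documents that merely mention the individual words, which adds noise to the document set.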

19
Future work
  • Try SVM
  • Clean up the document sets
  • Active learning
  • Clustering, removing irrelevant clusters
  • Both techniques require interaction with the user
  • Visualize the term space
  • Latent Semantic Analysis (LSA), Multi-Dimensional
    Scaling (MDS)
  • Force-directed layout
  • Use WordNet to infer relations between arbitrary
    words
  • Input: two words
  • Process: detect the corresponding synsets and
    explore inter-relations
  • Output: most probable relations (according to
    WordNet)
  • Deal with the multilinguality issue
  • Kernel Canonical Correlation Analysis (KCCA)
  • Machine translation
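
The input/process/output steps proposed above for inferring relations between two words could be sketched as follows. This is an illustration only: `SYNSETS` and `RELATIONS` are a hypothetical hand-made miniature, not the WordNet database or any WordNet API, the synset identifiers are invented, and the word pairs are the examples shown earlier in the talk.

```python
# Hypothetical miniature lexical database: word -> set of synset identifiers.
SYNSETS = {
    "housewife": {"housewife.n.01"},
    "homemaker": {"housewife.n.01"},
    "snore": {"snore.v.01"},
    "sleep": {"sleep.v.01"},
    "modify": {"modify.v.01"},
    "Europeanize": {"europeanize.v.01"},
}

# Hypothetical directed relations between synsets.
RELATIONS = {
    ("snore.v.01", "sleep.v.01"): "entails",
    ("modify.v.01", "europeanize.v.01"): "hypernym-of",
}

def infer_relations(word_a, word_b):
    """Detect the synsets of both words and list the relations linking them."""
    found = []
    for synset_a in sorted(SYNSETS.get(word_a, ())):
        for synset_b in sorted(SYNSETS.get(word_b, ())):
            if synset_a == synset_b:
                found.append("synonym-for")  # both words share a synset
            elif (synset_a, synset_b) in RELATIONS:
                found.append(RELATIONS[(synset_a, synset_b)])
    return found

print(infer_relations("housewife", "homemaker"))
print(infer_relations("snore", "sleep"))
```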

20
Thank you...
  • ...for your attention
  • Any questions?