1
Multi-Concept Alignment and Evaluation
  • Shenghui Wang, Antoine Isaac,
  • Lourens van der Meij, Stefan Schlobach
  • Ontology Matching Workshop
  • Oct. 11th, 2007

2
Introduction: Multi-Concept Alignment
  • Mappings involving combinations of concepts
  • o1:FruitsAndVegetables → (o2:Fruits OR o2:Vegetables)
  • Also referred to as multiple or complex mappings
  • Problem: only a few matching tools consider it
  • Cf. Euzenat & Shvaiko

3
Why is MCA a Difficult Problem?
  • Much larger search space: O1 × O2 → 2^O1 × 2^O2
  • How to measure similarity between sets of
    concepts?
  • Based on which information and strategies?
  • "Fruits" and "Vegetables" vs. "Fruits and
    Vegetables" together
  • Formal frameworks for MCA?
  • Representation primitives: owl:intersectionOf? skosm:AND?
  • Semantics: does A skos:broader (skosm:AND B C) entail
    A broader B and A broader C?
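The blow-up in the first bullet can be made concrete: single-concept matching searches |O1| · |O2| candidate pairs, while multi-concept matching must consider pairs of concept *sets*, i.e. 2^|O1| · 2^|O2|. A tiny sketch (the vocabulary sizes are made up, not the real GTT/Brinkman sizes):

```python
# Candidate-pair counts for single- vs. multi-concept matching.
# Vocabulary sizes are illustrative only.
n1, n2 = 10, 12                 # |O1|, |O2|

single = n1 * n2                # concept-to-concept pairs
multi = (2 ** n1) * (2 ** n2)   # set-to-set pairs (all subsets)

print(single)                   # 120
print(multi)                    # 4194304
```

Even for these toy sizes, the multi-concept space is already four million candidates, which is why exhaustive search is off the table.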

4
Agenda
  • The multi-concept alignment problem
  • The Library case and the need for MCA
  • Generating MCAs for the Library case
  • Evaluating MCAs in the Library case
  • Conclusion

5
Yet MCA is needed in real-life problems
  • KB collections (cf. OAEI slides)
  • Scenario: re-annotation of GTT-indexed books by
    Brinkman concepts

6
Yet MCA is needed in real-life problems
  • Books can be indexed by several concepts
  • With post-coordination, co-occurrence matters
  • G1:History, G2:the Netherlands in GTT
  • → a book about Dutch history
  • The granularity of the two vocabularies differs
  • → B1:Netherlands History (a single Brinkman concept)
  • The alignment should associate combinations of concepts

7
Agenda
  • The multi-concept alignment problem
  • The Library case and the need for MCA
  • Generating MCAs for the Library case
  • Evaluating MCAs in the Library case
  • Conclusion

8
MCA for Annotation Translation: Approach
  • Produce similarity measures between individual
    concepts
  • Sim(A, B) = x
  • Group concepts based on their similarity
  • {G1, B1, G2, G3, B2}
  • Create conversion rules
  • {G1, G2, G3} → {B1, B2}
  • Extract a deployable alignment

9
MCA Creation: Similarity Measures
  • KB scenario has dually indexed books
  • Brinkman and GTT concepts co-occur
  • Instance-based alignment techniques can be used
  • Between concepts from the same vocabulary,
    similarity mirrors possible combinations!

10
MCA Creation: 2 Similarity Measures
  • Jaccard overlap measure applied to concept
    extensions
  • Latent Semantic Analysis
  • Computation of a similarity matrix
  • Filtering of noise due to insufficient data
  • Similarity between concepts across vocabularies
    and within vocabularies
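The Jaccard measure on concept extensions can be sketched directly from the dually-indexed books: a concept's extension is the set of books annotated with it, and similarity is the overlap of two extensions. A minimal sketch (book and concept identifiers are hypothetical, not real GTT/Brinkman entries):

```python
# Each book maps to the set of concepts annotating it (from either vocabulary).
# All identifiers below are made up for illustration.
books = {
    "b1": {"gtt:History", "gtt:Netherlands", "br:NetherlandsHistory"},
    "b2": {"gtt:History", "br:NetherlandsHistory"},
    "b3": {"gtt:Netherlands", "br:Geography"},
}

def extension(concept):
    """The set of books annotated with this concept."""
    return {b for b, concepts in books.items() if concept in concepts}

def jaccard(c1, c2):
    """Overlap of the two concept extensions."""
    e1, e2 = extension(c1), extension(c2)
    union = e1 | e2
    return len(e1 & e2) / len(union) if union else 0.0

# Works across vocabularies ...
print(jaccard("gtt:History", "br:NetherlandsHistory"))      # 1.0
# ... and within one vocabulary, which is what licenses concept combinations.
print(round(jaccard("gtt:History", "gtt:Netherlands"), 3))  # 0.333
```

The within-vocabulary scores are what the aggregation step uses to decide which concepts belong together in a combination.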

11
MCA Creation: 2 Concept Aggregation Methods
  • Simple Ranking
  • For a concept, take the top k similar concepts
  • Gather GTT concepts and Brinkman ones
  • Clustering
  • Partitioning concepts into similarity-based
    clusters
  • Gather concepts
  • Global approach: the most relevant combinations
    should be selected
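The simple-ranking aggregation above can be sketched as: for a seed concept, keep its top-k most similar concepts from either vocabulary, then split the resulting group into its GTT side and its Brinkman side to form a rule candidate. A sketch assuming similarities are already computed (all identifiers and scores are made up):

```python
# sim[(a, b)] -> similarity score between two concepts; values are illustrative.
sim = {
    ("gtt:History", "gtt:Netherlands"): 0.6,
    ("gtt:History", "br:NetherlandsHistory"): 0.9,
    ("gtt:History", "br:Geography"): 0.1,
}

def top_k_group(seed, k=2):
    """The seed concept plus its k most similar concepts, any vocabulary."""
    neighbours = sorted(
        ((b, s) for (a, b), s in sim.items() if a == seed),
        key=lambda pair: pair[1], reverse=True)
    return {seed} | {c for c, _ in neighbours[:k]}

def to_rule(group):
    """Split a mixed group into a GTT -> Brinkman conversion rule."""
    gtt = {c for c in group if c.startswith("gtt:")}
    brinkman = {c for c in group if c.startswith("br:")}
    return gtt, brinkman

# Yields ({gtt:History, gtt:Netherlands}, {br:NetherlandsHistory}),
# i.e. the Dutch-history rule from the Library example.
print(to_rule(top_k_group("gtt:History")))
```

Clustering differs only in the grouping step: instead of per-concept top-k lists, it partitions all concepts into similarity-based clusters before the same GTT/Brinkman split.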

12
Generated Rules
  • Clustering generated far fewer rules
  • But with more concepts per rule

13
Agenda
  • The multi-concept alignment problem
  • The Library case and the need for MCA
  • Generating MCAs for the Library case
  • Evaluating MCAs in the Library case
  • Conclusion

14
Evaluation Method: Data Sets
  • Training and evaluation sets from the dually-indexed
    books
  • 2/3 training, 1/3 testing
  • Two training sets (samples)
  • Random
  • Rich: books that have at least 8 annotations
    (over both thesauri)
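The split described above might look like the following sketch: shuffle the dually-indexed books, hold out one third for testing, and optionally restrict training to the "rich" sample (at least 8 annotations over both thesauri). The book data and the random seed are assumptions:

```python
import random

# Hypothetical dually-indexed books: id -> total number of annotations
# over both thesauri. Counts are made up for illustration.
books = {f"b{i}": n for i, n in enumerate([3, 9, 12, 4, 8, 10, 2, 15, 6])}

def split(book_ids, seed=0):
    """2/3 training, 1/3 testing (seed chosen arbitrarily)."""
    ids = sorted(book_ids)
    random.Random(seed).shuffle(ids)
    cut = (2 * len(ids)) // 3
    return ids[:cut], ids[cut:]

train, test = split(books)

# "Rich" training sample: keep only books with at least 8 annotations.
rich_train = [b for b in train if books[b] >= 8]

print(len(train), len(test))  # 6 3
```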

15
Evaluation Method: Applying Rules
  • Rules: Gr1 → Br1, Gr2 → Br2, Gr3 → Br3
  • Gt: a book's GTT annotation set
  • Several configurations for firing rules
  • 1. Gt = Gr (exact match)
  • 2. Gt ⊇ Gr (containment)
  • 3. Gt ∩ Gr ≠ ∅ (overlap)
  • 4. ALL (every rule fires)
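Assuming the four firing configurations are exact match (Gt = Gr), containment (Gr ⊆ Gt), overlap (Gt ∩ Gr ≠ ∅), and ALL, the decision can be sketched as a single predicate (concept names are made up):

```python
# One conversion rule has antecedent Gr (GTT concepts) and consequent Br
# (Brinkman concepts). A book's existing GTT annotation is the set Gt.

def fires(gt, gr, strategy):
    """Decide whether a rule with antecedent gr fires on annotation gt.

    Strategies, as reconstructed from the slide:
      1 -- exact match: Gt == Gr
      2 -- containment: Gr is a subset of Gt
      3 -- overlap:     Gt and Gr share at least one concept
      4 -- ALL:         every rule fires unconditionally
    """
    if strategy == 1:
        return gt == gr
    if strategy == 2:
        return gr <= gt
    if strategy == 3:
        return bool(gt & gr)
    return True  # strategy 4: ALL

gt = {"gtt:History", "gtt:Netherlands"}   # hypothetical book annotation
gr = {"gtt:History"}                      # hypothetical rule antecedent
print([fires(gt, gr, s) for s in (1, 2, 3, 4)])  # [False, True, True, True]
```

The strategies are ordered from strictest to loosest, which is what makes the precision/recall tradeoff in the results slides possible.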

16
Evaluation Measures
  • Precision and recall for matched books
  • Books that were given at least one good Brinkman
    annotation
  • Pb, Rb
  • Precision and recall for annotation translation
  • Averaged over books
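The two families of measures above can be sketched as: per book, compare the generated Brinkman annotations against the manual ones, averaging precision and recall over books; the book-level measure counts a book as matched if at least one generated annotation is correct. Data and names are illustrative:

```python
# For each book: (generated Brinkman concepts, manual Brinkman concepts).
# All identifiers and annotations are made up for illustration.
results = {
    "b1": ({"br:NetherlandsHistory"}, {"br:NetherlandsHistory"}),
    "b2": ({"br:Geography", "br:History"}, {"br:History"}),
    "b3": ({"br:Art"}, {"br:Music"}),
}

def annotation_scores(results):
    """Precision/recall of translated annotations, averaged over books."""
    precisions, recalls = [], []
    for generated, manual in results.values():
        correct = generated & manual
        precisions.append(len(correct) / len(generated) if generated else 0.0)
        recalls.append(len(correct) / len(manual) if manual else 0.0)
    n = len(results)
    return sum(precisions) / n, sum(recalls) / n

def matched_book_fraction(results):
    """Fraction of books given at least one good Brinkman annotation."""
    matched = sum(1 for generated, manual in results.values() if generated & manual)
    return matched / len(results)

print(annotation_scores(results))     # precision 0.5, recall 2/3
print(matched_book_fraction(results)) # 2/3 of books matched
```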

17
Results for ALL Strategy
18
Results: Rich vs. Random Training Set
  • The rich sample does not improve the results much
  • Bias towards richly annotated books
  • Jaccard performance goes down
  • LSA does better
  • Statistical corrections allow simple grouping
    techniques to cope with data complexity

19
Results for Clustering
20
Results: Jaccard vs. LSA
  • For strategies 3 and ALL, LSA outperforms Jaccard
  • For strategies 1 and 2, Jaccard outperforms LSA
  • Simple similarity is better at finding explicit
    similarities
  • Those really occurring in the books
  • LSA is better at finding potential similarities

21
Results using LSA
22
Results: Clustering vs. Ranking
  • Clustering performs better on strategies 1 and 2
  • Its rules match existing annotations better
  • It has better precision
  • Ranking has higher recall but lower precision
  • The classical tradeoff (ranking keeps noise)

23
Agenda
  • The multi-concept alignment problem
  • The Library case and the need for MCA
  • Generating MCAs for the Library case
  • Evaluating MCAs in the Library case
  • Conclusion

24
Conclusions
  • There is an important problem: multi-concept
    alignment
  • Not extensively dealt with in the current literature
  • Needed by applications
  • We have first approaches to create such
    alignments
  • And to deploy them!
  • We hope that further research will improve the
    situation (with our deployer hat on)
  • Better alignments
  • More precise frameworks (methodology research)

25
Conclusions: Performances
  • Evaluation shows mixed results
  • Performances are generally very low
  • These techniques cannot be used alone
  • Note the dependence on requirements
  • Settings where a manual indexer chooses among several
    candidates allow for lower precision
  • Note the indexing variability
  • OAEI has demonstrated that manual evaluation
    partly compensates for the bias of automatic evaluation

26
Thanks!