Title: Multi-Concept Alignment and Evaluation
1. Multi-Concept Alignment and Evaluation
- Shenghui Wang, Antoine Isaac,
- Lourens van der Meij, Stefan Schlobach
- Ontology Matching Workshop
- Oct. 11th, 2007
2. Introduction: Multi-Concept Alignment
- Mappings involving combinations of concepts
- o1:FruitsAndVegetables ≡ (o2:Fruits OR o2:Vegetables)
- Also referred to as multiple or complex mappings
- Problem: only a few matching tools consider it
- Cf. Euzenat & Shvaiko
3. Why is MCA a Difficult Problem?
- Much larger search space: O1 × O2 → 2^O1 × 2^O2 (a concrete count follows this list)
- How to measure similarity between sets of concepts?
- Based on which information and strategies?
- "Fruits and vegetables" vs. "Fruits" and "Vegetables" together
- Formal frameworks for MCA?
- Representation primitives
- owl:intersectionOf? skosm:AND?
- Semantics
- A skos:broader (skosm:AND B C) ⇒ A skos:broader B ∧ A skos:broader C ?
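To make the blow-up concrete, a back-of-the-envelope count, writing n = |O1| = |O2| (the vocabulary size is illustrative, not from the slides):

```latex
% One-to-one matching explores concept pairs; MCA explores pairs of
% concept sets, i.e. elements of the two powersets.
|O_1 \times O_2| = n^2
\qquad\text{vs.}\qquad
|2^{O_1} \times 2^{O_2}| = 2^n \cdot 2^n = 4^n
% Already for n = 100: 10^4 candidate pairs, but
% 4^{100} \approx 1.6 \times 10^{60} candidate pairs of concept sets.
```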
4. Agenda
- The multi-concept alignment problem
- The Library case and the need for MCA
- Generating MCAs for the Library case
- Evaluating MCAs in the Library case
- Conclusion
5. Yet MCA is needed in real-life problems
- KB collections (cf. OAEI slides)
- Scenario: re-annotation of GTT-indexed books with Brinkman concepts
6. Yet MCA is needed in real-life problems
- Books can be indexed by several concepts
- With post-coordination, co-occurrence matters
- G1:History, G2:the Netherlands in GTT
- → a book about Dutch history
- Granularity of the two vocabularies differs
- → a single pre-coordinated Brinkman concept B1:Netherlands-History
- Alignment should associate combinations of concepts
7. Agenda
- The multi-concept alignment problem
- The Library case and the need for MCA
- Generating MCAs for the Library case
- Evaluating MCAs in the Library case
- Conclusion
8. MCA for Annotation Translation: Approach
- Produce similarity measures between individual concepts
- Sim(A, B) = x
- Grouping concepts based on their similarity
- {G1, B1, G2, G3, B2}
- Creating conversion rules
- {G1, G2, G3} → {B1, B2}
- Extraction of deployable alignment (a sketch of the pipeline follows this list)
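A minimal Python sketch of the four steps above; `similarity` and `aggregate` are placeholders for the concrete measures (Jaccard, LSA) and grouping methods (ranking, clustering) detailed on the next slides:

```python
# Hypothetical pipeline sketch: the concrete similarity measure and
# grouping method are passed in as functions.

def build_conversion_rules(gtt_concepts, brinkman_concepts, similarity, aggregate):
    """Return conversion rules of the form ({G1, G2, G3}, {B1, B2})."""
    concepts = gtt_concepts | brinkman_concepts
    # 1. Similarity between individual concepts, Sim(A, B),
    #    both across and inside the two vocabularies.
    sim = {(a, b): similarity(a, b) for a in concepts for b in concepts if a != b}
    # 2. Group concepts by similarity, e.g. {G1, B1, G2, G3, B2}.
    groups = aggregate(sim)
    # 3. Split each mixed group into a rule: GTT side -> Brinkman side.
    rules = []
    for group in groups:
        g, b = group & gtt_concepts, group & brinkman_concepts
        if g and b:
            rules.append((g, b))
    return rules
```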
9. MCA Creation: Similarity Measures
- KB scenario has dually indexed books
- Brinkman and GTT concepts co-occur
- Instance-based alignment techniques can be used
- Between concepts from the same vocabulary, similarity mirrors possible combinations!
10. MCA Creation: 2 Similarity Measures
- Jaccard overlap measure applied on concept extensions (sketched after this list)
- Latent Semantic Analysis
- Computation of similarity matrix
- Filter noise due to insufficient data
- Similarity between concepts across vocabularies and inside vocabularies
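A sketch of the instance-based Jaccard measure, assuming each concept's extension is the set of book ids annotated with it (identifiers are illustrative):

```python
# Instance-based similarity from dual indexing: a concept's extension is
# the set of books annotated with it, so the same measure works across
# vocabularies (GTT vs. Brinkman) and inside a single vocabulary.

def jaccard(ext_a: set, ext_b: set) -> float:
    """|A ∩ B| / |A ∪ B| over book extensions."""
    union = ext_a | ext_b
    return len(ext_a & ext_b) / len(union) if union else 0.0

# Toy extensions (book ids per concept):
ext = {
    "gtt:History":     {1, 2, 3, 5},
    "gtt:Netherlands": {2, 3, 4},
    "bri:NL-History":  {2, 3, 5},
}
print(jaccard(ext["gtt:History"], ext["bri:NL-History"]))  # 0.75
```

LSA, by contrast, builds a concept-by-book matrix and smooths it with a truncated SVD before computing similarities, which filters noise from sparse data and surfaces combinations the raw co-occurrence counts miss.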
11. MCA Creation: 2 Concept Aggregation Methods
- Simple Ranking
- For a concept, take the top k similar concepts
- Gather GTT concepts and Brinkman ones (see the sketch after this list)
- Clustering
- Partitioning concepts into similarity-based clusters
- Gather concepts
- Global approach: the most relevant combinations should be selected
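A sketch of the simple-ranking method, compatible with the `aggregate` placeholder of the earlier pipeline sketch; k and the similarity dictionary layout are illustrative:

```python
# Simple ranking: each concept seeds one group made of itself plus its
# top-k most similar concepts, mixing GTT and Brinkman members.
# Clustering would instead partition the whole concept set (e.g.
# hierarchical clustering on 1 - sim), yielding fewer, larger groups,
# which matches the rule counts reported on the next slide.

def rank_aggregate(sim, k=3):
    concepts = {c for pair in sim for c in pair}
    groups = []
    for c in concepts:
        neighbours = sorted((x for x in concepts if x != c),
                            key=lambda x: sim.get((c, x), 0.0),
                            reverse=True)[:k]
        groups.append({c, *neighbours})
    return groups
```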
12. Generated Rules
- Clustering generated far fewer rules
- Each rule involving more concepts
13. Agenda
- The multi-concept alignment problem
- The Library case and the need for MCA
- Generating MCAs for the Library case
- Evaluating MCAs in the Library case
- Conclusion
14. Evaluation Method: Data Sets
- Training and evaluation sets from dually-indexed books
- 2/3 training, 1/3 testing
- Two training sets (samples)
- Random
- Rich: books that have at least 8 annotations (in both thesauri)
15. Evaluation Method: Applying Rules
[Diagram: a test book's GTT annotation set Gt is matched against rules Gr1 → Br1, Gr2 → Br2, Gr3 → Br3]
- Several configurations for firing rules (sketched after this list):
- 1. Gt = Gr
- 2. Gt ⊇ Gr
- 3. Gt ∩ Gr ≠ ∅
- 4. ALL (every rule fires)
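A sketch of the four firing configurations; the set conditions for strategies 1–3 are my reading of the garbled slide notation, ordered strictest to laxest:

```python
# Fire rules Gr -> Br against a test book's GTT annotation set Gt.

def fires(strategy, Gt, Gr):
    if strategy == 1:
        return Gt == Gr        # exact match
    if strategy == 2:
        return Gt >= Gr        # Gt ⊇ Gr: book covers the antecedent
    if strategy == 3:
        return bool(Gt & Gr)   # Gt ∩ Gr ≠ ∅: any overlap
    return True                # 4: ALL rules fire

def translate(Gt, rules, strategy):
    """Union of the Brinkman consequents of every rule that fires."""
    brinkman = set()
    for Gr, Br in rules:
        if fires(strategy, set(Gt), set(Gr)):
            brinkman |= set(Br)
    return brinkman
```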
16. Evaluation Measures
- Precision and recall for matched books (Pb, Rb)
- Books that were given at least one good Brinkman annotation
- Precision and recall for annotation translation
- Averaged over books (both measure families are sketched below)
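A sketch following the slide's wording; the exact normalisations (e.g. whether Pb divides by all test books or only by books that received some output) are assumptions:

```python
# books: list of (produced, gold) Brinkman annotation sets per test book.

def evaluate(books):
    # Book-level (Pb, Rb): a book is "matched" if it received at least
    # one correct Brinkman annotation.
    matched = sum(1 for prod, gold in books if prod & gold)
    with_output = sum(1 for prod, _ in books if prod) or 1
    Pb = matched / with_output     # assumed: among books with output
    Rb = matched / len(books)

    # Annotation-level: per-book precision/recall, averaged over books.
    precisions, recalls = [], []
    for prod, gold in books:
        correct = len(prod & gold)
        precisions.append(correct / len(prod) if prod else 0.0)
        recalls.append(correct / len(gold) if gold else 0.0)
    Pa = sum(precisions) / len(books)
    Ra = sum(recalls) / len(books)
    return Pb, Rb, Pa, Ra
```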
17. Results for ALL Strategy
18. Results: Rich vs. Random Training Set
- Rich does not improve the results much
- Bias towards richly annotated books
- Jaccard performance goes down
- LSA does better
- Statistical corrections allow simple grouping techniques to cope with data complexity
19. Results for Clustering
20. Results: Jaccard vs. LSA
- For strategies 3 and ALL, LSA outperforms Jaccard
- For strategies 1 and 2, Jaccard outperforms LSA
- Simple similarity is better at finding explicit similarities
- Combinations really occurring in books
- LSA is better at finding potential similarities
21. Results using LSA
22. Results: Clustering vs. Ranking
- Clustering performs better on strategies 1 and 2
- It matches existing annotations better
- It has better precision
- Ranking has higher recall but lower precision
- Classical tradeoff (ranking keeps noise)
23. Agenda
- The multi-concept alignment problem
- The Library case and the need for MCA
- Generating MCAs for the Library case
- Evaluating MCAs in the Library case
- Conclusion
24. Conclusions
- There is an important problem: multi-concept alignment
- Not extensively dealt with in the current literature
- Needed by applications
- We have first approaches to create such alignments
- And to deploy them!
- We hope that further research will improve the situation (with our deployer hat on)
- Better alignments
- More precise frameworks (methodology research)
25. Conclusions: Performance
- Evaluation shows mixed results
- Performance is generally very low
- These techniques cannot be used alone
- Note the dependence on requirements
- Settings where a manual indexer chooses among several candidates allow for lower precision
- Note indexing variability
- OAEI has demonstrated that manual evaluation somewhat compensates for the bias of automatic evaluation
26. Thanks!