Clustering Concept Hierarchies from Text

1
Clustering Concept Hierarchies from Text
Philipp Cimiano, Andreas Hotho, Steffen Staab
Institute AIFB, University of Karlsruhe
LREC 2004
2
Outline
  • Motivation & Task
  • Formal Concept Analysis
  • Attribute Extraction
  • Evaluation
  • Conclusion & Further Work

3
Motivation
  • Applications of concept hierarchies
  • Information Retrieval
  • ML-based IE systems
  • - generalize semantically over words
  • Word Sense Disambiguation (WordNet)
  • Syntax-Semantic Interface
  • - more concise rules

4
Knowledge Acquisition Bottleneck
  • Ontology Development
  • difficult to agree
  • time-consuming task
  • Possible solution: acquisition from text
  • adaptivity (domain)
  • speed (vs. hand-coding)
  • robustness (social aspects)

5
Task
  • Given a set of terms (concepts) C relevant for a
    certain domain, can we derive a concept hierarchy
    <_C, i.e. a partial order between these concepts,
    from text?
  • Clustering approach
  • Formal Concept Analysis
  • (set theoretical/conceptual clustering)
  • Hierarchical Agglomerative Clustering (bottom-up)
  • Bi-Section-KMeans (top-down)

6
Formal Context
            book  rent  drive  ride  join
apartment    X     X
car          X     X      X
motor-bike   X     X      X     X
excursion    X                        X
trip         X                        X
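
As a rough illustration of how Formal Concept Analysis turns such a context into concepts, the following sketch enumerates all (extent, intent) pairs by brute force. The object/attribute assignments are an assumption read off the table above (the column alignment was lost in the transcript), and the helper names are hypothetical rather than taken from the talk.

```python
from itertools import combinations

# Formal context as read from the slide's table (assumed alignment):
# object -> set of attributes (verbs the object co-occurs with).
context = {
    "apartment": {"book", "rent"},
    "car": {"book", "rent", "drive"},
    "motor-bike": {"book", "rent", "drive", "ride"},
    "excursion": {"book", "join"},
    "trip": {"book", "join"},
}


def common_attributes(objects):
    """Attributes shared by every object in `objects` (all attributes if empty)."""
    shared = set().union(*context.values())
    for obj in objects:
        shared &= context[obj]
    return shared


def objects_having(attributes):
    """Objects that carry every attribute in `attributes`."""
    return {obj for obj, attrs in context.items() if attributes <= attrs}


def formal_concepts():
    """Brute-force enumeration: close every subset of objects.

    Exponential in the number of objects, so only suitable for toy contexts.
    """
    concepts = set()
    objs = list(context)
    for size in range(len(objs) + 1):
        for subset in combinations(objs, size):
            intent = common_attributes(subset)
            extent = objects_having(intent)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


concepts = formal_concepts()
for extent, intent in sorted(concepts, key=lambda c: -len(c[0])):
    print(sorted(extent), sorted(intent))
```

Ordering the resulting concepts by inclusion of their extents yields the lattice shown on the next slide; practical FCA tools use more efficient algorithms than this subset enumeration.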
7
FCA Lattice
8
Partial Order
[Figure: concept hierarchy over the nodes bookable, rentable, joinable, driveable, rideable, apartment, trip, excursion, car, bike]
9
Extracting Attributes
  • extract syntactic dependencies from text
  • verb/object, verb/subject, verb/PP relations
  • car: drive_obj, crash_subj, sit_in, ...
  • LoPar, a trainable statistical left-corner
    parser
  • training
  • parsing
  • tgrep
  • syntactic relations
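
The step from syntactic relations to attributes can be sketched as follows. The input triples are hypothetical stand-ins for what a parser plus tgrep patterns might deliver; the naming scheme (`verb_obj`, `verb_subj`, `verb_preposition`) follows the `drive_obj, crash_subj, sit_in` example above, and everything else is an assumption for illustration.

```python
from collections import defaultdict

# Hypothetical extraction output: (verb, relation, noun) triples.
# `relation` is "obj", "subj", or the preposition of a verb/PP relation.
triples = [
    ("drive", "obj", "car"),
    ("crash", "subj", "car"),
    ("sit", "in", "car"),
    ("rent", "obj", "apartment"),
    ("drive", "obj", "car"),
]


def attribute_name(verb, relation):
    """Build an attribute label such as 'drive_obj' or 'sit_in'."""
    return f"{verb}_{relation}"


def extract_attributes(triples):
    """Count, for each noun, how often it occurs with each attribute."""
    counts = defaultdict(lambda: defaultdict(int))
    for verb, relation, noun in triples:
        counts[noun][attribute_name(verb, relation)] += 1
    return counts


counts = extract_attributes(triples)
print(dict(counts["car"]))
```

The per-noun counts are the raw material for the weighting step on the next slide.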

10
Weighting
  • Parser's output
  • erroneous
  • not all verb/object pairs significant
  • Weighting
  • Resnik
  • threshold t
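
To make the idea of thresholded weighting concrete, here is a minimal sketch. It does not implement the Resnik measure named on the slide; it uses the plain conditional probability P(noun | attribute) as a simpler stand-in. The counts, the function names, and the choice of `>=` for the threshold comparison are assumptions.

```python
from collections import Counter


def conditional_weights(pair_counts):
    """Estimate P(noun | attribute) from (noun, attribute) co-occurrence counts."""
    attr_totals = Counter()
    for (_noun, attr), count in pair_counts.items():
        attr_totals[attr] += count
    return {
        (noun, attr): count / attr_totals[attr]
        for (noun, attr), count in pair_counts.items()
    }


def filter_pairs(pair_counts, t):
    """Keep only the pairs whose weight reaches the threshold t."""
    weights = conditional_weights(pair_counts)
    return {pair for pair, weight in weights.items() if weight >= t}


# Hypothetical counts, for illustration only.
pair_counts = {
    ("car", "drive_obj"): 8,
    ("bike", "drive_obj"): 1,
    ("excursion", "drive_obj"): 1,
    ("trip", "book_obj"): 5,
    ("car", "book_obj"): 5,
}

kept = filter_pairs(pair_counts, t=0.2)
print(sorted(kept))
```

Only the surviving pairs would be entered as crosses in the formal context, so rare or spurious pairs such as ("excursion", "drive_obj") drop out.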

11
Evaluation (Mädche et al. 02)
  • Compare the computed taxonomy against a
    hand-crafted standard

12
Evaluation (Part II)
13
Domains & Corpora
  • Tourism (118 million tokens)
  • http://www.all-in-all.de/english
  • http://www.lonelyplanet.com
  • British National Corpus (BNC)
  • handcrafted tourism ontology (289 concepts)
  • Finance (185 million tokens)
  • Reuters news from 1987
  • GETESS finance ontology (1178 concepts)

14
Results
  • For all domains and algorithms
  • Subj+Obj+PP > other combinations
  • PP > Obj > Subj
  • Best result: F=56.98% (Tourism, FCA)
  • F=56.12% (Reuters, HC)
  • Human Performance [Mädche 02]
  • 4 subjects (1 ontology engineer)
  • 12 tourism ontologies (ca. 300 concepts)
  • average TO of 56.35% (47-87%)
  • assumption of an LR of 100%
  • => Human F-Measure 72.08%
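
The human F-Measure can be reproduced with a short calculation, assuming F is the harmonic mean of LR and TO. The transcript does not preserve the formula, so this is an assumption, but it yields exactly the value reported on the slide.

```python
def f_measure(lr, to):
    """Harmonic mean of LR and TO (both given in percent)."""
    return 2 * lr * to / (lr + to)


# Values from the slide: LR assumed to be 100%, average TO of 56.35%.
human_f = f_measure(100.0, 56.35)
print(round(human_f, 2))  # 72.08
```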

15
Comparison of Approaches (tourism domain)
16
Other Weighting Measures (tourism domain, FCA)
17
Conclusion
  • novel approach to learning concept hierarchies by
    FCA
  • reasonable results compared to top-down and
    bottom-up approaches
  • using Subj+Obj+PP syntactic dependencies
    outperforms other combinations
  • use of weighting measure has some impact
  • proposed a systematic evaluation method

18
Further Work
  • General Features
  • Levin classes
  • Top Levels
  • Polysemy
  • first experiments
  • evaluation