Christine Preisach, Steffen Rendle and Lars Schmidt-Thieme - PowerPoint PPT Presentation

1
Relational Classification Using Automatically
Extracted Relations by Record Linkage
  • Christine Preisach, Steffen Rendle and Lars
    Schmidt-Thieme
  • Information Systems and Machine Learning Lab
    (ISMLL)
  • University of Hildesheim
  • Germany

2
Outline
  • Motivation
  • Relation Extraction and Multi-Relational
    Classification Framework
  • Relation Extraction
  • Multi-Relational Classification
  • Evaluation
  • Conclusion

3
Motivation
  • Example

Publication  Title                                      Author      Conference  Category
1            Classification of scientific publications  John Smith  ICDM        Data Mining
2            Classification of Hypertext                John Smith  KDD         Data Mining
3            Hierarchical Clustering                    Dan Miller  ICDM        Data Mining

4
Motivation
  • Traditional classifiers take only local
    attributes like keywords, title and abstract into
    account
  • Assumption: instances are independent
  • But this assumption does not hold
  • Instances can be related to other documents by
    authorship, citations, the same conference, etc.
  • These relations should be exploited and combined
    in order to improve classification accuracy.
  • But manual extraction of relations by experts is
    expensive
  • Hence: automatic extraction of relations from noisy
    attributes.

5
Relation Extraction and Relational Classification
Framework
  • Relation Extraction Component
  • Extraction of relations from objects
    with noisy attributes
  • Multi-Relational Classification Component
  • Use extracted relations instead of, or
    in addition to, local attributes for
    classification

6
Relation Extraction
  • Pairwise feature extraction
  • From noisy attributes with
    several similarity measures (e.g.
    TFIDF, cosine similarity, Levenshtein)
  • Probabilistic pairwise decision model
  • Use the extracted similarities as features for a
    probabilistic classifier and build a model on
    the training data
  • Then apply it to unknown pairs
  • Collective decision model
  • If the relation is an equivalence relation, use
    constrained clustering (e.g. HAC) with the
    pairwise decision model as a learned similarity
    measure to turn the pairwise decisions into a
    binary relation
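
The pairwise feature extraction step can be illustrated with a minimal sketch. It assumes records are dictionaries of strings; the function names (`levenshtein_sim`, `jaccard_sim`, `cosine_sim`, `pair_features`) are hypothetical and not from the slides. The cosine similarity here uses raw term counts rather than TFIDF weights, to keep the example dependency-free.

```python
import math
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def levenshtein_sim(a: str, b: str) -> float:
    """Edit distance rescaled to a similarity in [0, 1]."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def jaccard_sim(a: str, b: str) -> float:
    """Overlap of the two token sets."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def cosine_sim(a: str, b: str) -> float:
    """Cosine of raw term-count vectors (no TFIDF weighting in this sketch)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0


def pair_features(rec_a: dict, rec_b: dict, attributes: list) -> list:
    """One similarity value per (attribute, measure) for a pair of records."""
    feats = []
    for attr in attributes:
        x, y = rec_a.get(attr, ""), rec_b.get(attr, "")
        feats.extend([levenshtein_sim(x, y), jaccard_sim(x, y), cosine_sim(x, y)])
    return feats
```

The resulting feature vectors would then be passed to whichever probabilistic classifier serves as the pairwise decision model; that classifier is not shown here.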

7
Relation Extraction
Collective Decision Model
[Figure with three panels: Initialisation, Must Links, Cannot Links]

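
The collective decision step can be sketched as constrained agglomerative clustering. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses average linkage, a caller-supplied similarity function `sim` standing in for the learned pairwise decision model, and a hypothetical `threshold` below which merging stops.

```python
from itertools import combinations


def constrained_hac(items, sim, must_links, cannot_links, threshold):
    """Average-linkage agglomerative clustering with pairwise constraints.

    items: list of hashable ids; sim(a, b) returns a similarity in [0, 1].
    """
    # Initialisation: every item starts in its own cluster.
    clusters = [{x} for x in items]

    def find(x):
        return next(c for c in clusters if x in c)

    # Must links: merge the clusters of each constrained pair up front.
    for a, b in must_links:
        ca, cb = find(a), find(b)
        if ca is not cb:
            clusters.remove(cb)
            ca |= cb

    # Cannot links: any merge that would join such a pair is forbidden.
    forbidden = {frozenset(p) for p in cannot_links}

    def allowed(ca, cb):
        return not any(frozenset((a, b)) in forbidden for a in ca for b in cb)

    def avg_sim(ca, cb):
        return sum(sim(a, b) for a in ca for b in cb) / (len(ca) * len(cb))

    # Repeatedly merge the most similar allowed pair of clusters.
    while True:
        best, best_pair = threshold, None
        for ca, cb in combinations(clusters, 2):
            if allowed(ca, cb):
                s = avg_sim(ca, cb)
                if s >= best:
                    best, best_pair = s, (ca, cb)
        if best_pair is None:
            break
        ca, cb = best_pair
        clusters.remove(cb)
        ca |= cb
    return clusters
```

Each resulting cluster is one equivalence class, so two items are in the extracted relation exactly when they share a cluster.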
8
Multi-Relational Classification
  • Relational classification problem
  • Make use of additional information of related
    objects (i.e. their classes or attributes)
  • Propositionalize the relational data, e.g. with a
    function of the neighborhood of each object
    (formula lost in extraction)

9
Multi-Relational Classification
  • Algorithm
  • 1. For each relation R1, ..., Rm:
  • (a) Build an undirected weighted graph
    for the relation
  • (b) Perform relational classification
    simultaneously for all instances in the test set
  • (c) Output a probability distribution
  • 2. Apply ensemble classification to the
    resulting probability distributions of these
    relations
  • 3. Output the final classification

10
Multi-Relational Classification
  • Simple Relational Methods
  • Probabilistic Relational Neighbor Classifier
    (EPRN) (Macskassy and Provost, 2003)
  • The update formula (lost in extraction) involves a
    normalization factor, an edge weight and an
    iteration index
  • EPRN2HOP
  • Additionally takes the neighbors of the direct
    neighbors into account if the direct neighborhood
    is small
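
Since the slide's formula did not survive, the following is only a sketch of a generic weighted-vote relational neighbor update, not the exact EPRN definition. It assumes each unlabeled node's class distribution is the normalized, weight-scaled sum of its neighbors' distributions from the previous iteration. The function name, the uniform initialisation and the fixed iteration count are assumptions.

```python
def relational_neighbor(graph, labels, classes, iterations=10):
    """Iterative weighted-vote classification on an undirected weighted graph.

    graph: {node: {neighbor: weight}}, with every node present as a key.
    labels: {node: class} for the labeled (training) nodes.
    """
    uniform = {c: 1.0 / len(classes) for c in classes}
    probs = {}
    for node in graph:
        if node in labels:
            # Labeled nodes keep a fixed one-hot distribution.
            probs[node] = {c: 1.0 if c == labels[node] else 0.0 for c in classes}
        else:
            probs[node] = dict(uniform)

    for _ in range(iterations):
        updated = {}
        # All unlabeled nodes are updated simultaneously from the previous state.
        for node, nbrs in graph.items():
            if node in labels:
                updated[node] = probs[node]
                continue
            scores = {c: sum(w * probs[n][c] for n, w in nbrs.items())
                      for c in classes}
            z = sum(scores.values())  # normalization factor
            updated[node] = ({c: s / z for c, s in scores.items()}
                             if z > 0 else dict(uniform))
        probs = updated
    return probs
```

A two-hop variant in the spirit of EPRN2HOP could extend `nbrs` with the neighbors' neighbors whenever `len(nbrs)` falls below some cutoff; the cutoff value is not given in the slides.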

11
Multi-Relational Classification
  • Aggregation-based Relational Learning Methods
  • Use aggregation functions in order to
    propositionalize the set-valued attribute
  • Use aggregated values as attributes for
    traditional machine learning methods
  • We used Logistic Regression as the classifier
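
A minimal sketch of aggregation-based propositionalization, assuming the set-valued attribute is the set of class labels in an object's neighborhood. The function name and the two aggregation modes (`"count"` and `"ratio"`) are illustrative choices, not taken from the slides.

```python
from collections import Counter


def aggregate_neighbor_classes(neighbors, labels, classes, mode="ratio"):
    """Turn the classes of an object's neighbors into a fixed-length vector.

    neighbors: iterable of neighbor ids; labels: {id: class} for known objects.
    Neighbors without a known label are ignored.
    """
    counts = Counter(labels[n] for n in neighbors if n in labels)
    total = sum(counts.values())
    if mode == "count":
        return [float(counts[c]) for c in classes]
    # "ratio": share of labeled neighbors that belong to each class.
    return [counts[c] / total if total else 0.0 for c in classes]
```

The resulting vectors have one entry per class and can be used as ordinary attributes by a propositional learner such as logistic regression.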

12
Ensemble Classification
  • Methods which combine different models
  • Increases classification accuracy
  • Usage
  • Combine results achieved by relational
    classification for different relations
  • Combine results of relational and local models
  • Voting
  • Stacking
  • Use a meta-classifier to learn a model on the
    results of different models
  • Build new instances
  • Apply cross validation
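
The two ensemble strategies can be sketched as follows, assuming each base model (one per relation, or a local text model) outputs a dictionary mapping classes to probabilities. All function names are hypothetical. For stacking, only the construction of the meta-level instance is shown; the meta-classifier and the cross-validation loop that produces the base predictions are left out.

```python
from collections import Counter


def average_vote(distributions):
    """Average the class probabilities of several models."""
    n = len(distributions)
    return {c: sum(d[c] for d in distributions) / n for c in distributions[0]}


def majority_vote(distributions):
    """Each model votes for its most probable class; return the winner."""
    votes = Counter(max(d, key=d.get) for d in distributions)
    return votes.most_common(1)[0][0]


def stacking_instance(distributions, classes):
    """Concatenate all models' probabilities into one meta-level feature vector."""
    return [d[c] for d in distributions for c in classes]
```

In a stacking setup, one such vector would be built per training object from held-out predictions, paired with the object's true class, and used to train the meta-classifier.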

13
Evaluation
  • Data
  • CompuScience data set
  • 147 571 scientific papers
  • 77 topics (categories)
  • Relations: authors, reviewers, journals
  • Cora deduplication data set
  • 1 295 citations
  • 112 unique publications
  • Relation: SamePaper
  • Cora data set
  • 3298 papers
  • 12 categories
  • Relations: conferences, authors, citations

14
Evaluation: Relation Extraction
F1 measure for finding the SamePaper relation on Cora

Evaluation set   Single linkage   Complete linkage   Average linkage
Xtst             0.90             0.74               0.92
X                0.92             0.71               0.93

Pairwise feature extraction with TFIDF,
Levenshtein, Jaccard, Cosine on all attributes
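
One common way to score an extracted equivalence relation is F1 over record pairs. The sketch below assumes that reading; the slides do not say whether F1 was computed over pairs or in some other way. Function names are hypothetical.

```python
from itertools import combinations


def cluster_pairs(clusters):
    """All unordered pairs of items that share a cluster."""
    return {frozenset(p) for c in clusters for p in combinations(sorted(c), 2)}


def pairwise_f1(predicted_clusters, true_clusters):
    """Harmonic mean of pair-level precision and recall."""
    pred = cluster_pairs(predicted_clusters)
    true = cluster_pairs(true_clusters)
    tp = len(pred & true)  # pairs linked in both clusterings
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(true) if true else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, predicting `[{1, 2, 3}, {4}]` against a truth of `[{1, 2}, {3, 4}]` gives one correct pair out of three predicted and two true pairs, so precision is 1/3, recall is 1/2 and F1 is 0.4.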
15
Evaluation: Multi-Relational Classification
3-fold cross validation on CompuScience for the
Author, Reviewer and Journal relations
  • The ensemble of relational and content-based text
    classification achieved a significantly higher
    F-measure than the pure text classifier

16
Evaluation
  • Multi-Relational Classification using
    automatically extracted relations
  • 50/50 splits, 10 runs

17
Conclusion and Future Work
  • Summary
  • Presented framework for relation extraction and
    multi-relational classification
  • Automatic relation extraction with record linkage
  • Relational classification using each extracted
    relation for classification and fusing the
    results with ensemble methods
  • Future Work
  • Evaluate our framework on different data sets and
    relations
  • Evaluate the relational classifiers' quality
    depending on the quality of the extracted
    relations

18
Thank you
  • Questions?
  • www.ismll.uni-hildesheim.de
  • Christine Preisach
  • preisach_at_ismll.uni-hildesheim.de
  • Steffen Rendle
  • srendle_at_ismll.uni-hildesheim.de
  • Lars Schmidt-Thieme
  • schmidt-thieme_at_ismll.uni-hildesheim.de