Thesaurus-Based Automatic Keyphrase Indexing



1
Thesaurus-Based Automatic Keyphrase Indexing
Olena Medelyan and Ian H. Witten
Digital Library Lab, Department of Computer Science
The University of Waikato, New Zealand
Agenda
  • Indexing Task
      • Keyphrases and vocabularies
      • Existing approaches
  • KEA
      • How does it work
      • Computing features
  • Evaluation
      • Standard evaluation
      • Indexing consistency
  • Examples

KEA: Keyphrase Extraction Algorithm
2
Keyphrases
3
Keyphrases
4
Keyphrases
5
Controlled Vocabulary
  • FAO's domain-specific thesaurus Agrovoc (FAO: United Nations Food and Agriculture Organization)
  • 17,000 descriptors, i.e. allowed index terms
  • 11,000 non-descriptors that are linked to descriptors, e.g. Obesity → Overweight

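The descriptor / non-descriptor distinction can be sketched as a simple lookup. This is only an illustration: the class name `ControlledVocabulary` and the tiny term set are hypothetical, and only the Obesity → Overweight link comes from the slide.

```python
class ControlledVocabulary:
    """Toy thesaurus (hypothetical): descriptors are the allowed index terms,
    non-descriptors are entry terms that point to exactly one descriptor."""

    def __init__(self, descriptors, non_descriptors):
        self.descriptors = {d.lower() for d in descriptors}
        # non-descriptor -> descriptor it is linked to
        self.use = {nd.lower(): d.lower() for nd, d in non_descriptors.items()}

    def to_descriptor(self, term):
        """Return the allowed index term for `term`, or None if it is unknown."""
        key = term.lower()
        if key in self.descriptors:
            return key
        return self.use.get(key)


vocab = ControlledVocabulary(
    descriptors=["Overweight", "Food consumption"],  # illustrative subset
    non_descriptors={"Obesity": "Overweight"},       # link taken from the slide
)
print(vocab.to_descriptor("Obesity"))
```

Whatever surface form appears in a document, the indexer (human or automatic) ends up assigning the descriptor, which is what makes index terms comparable across documents.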
6
Manual Indexing
  • Professional indexer
      • reads the document
      • determines the topics of the document
      • assigns keyphrases from controlled vocabularies
  • Time-consuming, expensive
      • Assigning metadata for 1 catalogue entry: 2h
  • Low indexing consistency
  • In digital libraries: practically impossible!

7
Automatic Indexing
  • Existing approaches
  • KEA++ = KEA + Controlled Vocabulary

Keyphrase Extraction
Select significant n-grams or NPs (noun phrases) according to their characteristics
  • Easy and fast implementation
  • Not much training required
  • Restriction to syntax
  • Low quality phrases
  • No consistency

8
Agenda
  • Indexing Task
      • Keyphrases and vocabularies
      • Existing approaches
  • KEA
      • How does it work
      • Computing features
  • Evaluation
      • Standard evaluation
      • Indexing consistency

9
How KEA Works
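[Flowchart]
  • Input: documents (DOCs) and controlled vocabulary (CV)
  • Extract candidates by pseudo-phrase matching, e.g. predatory birds → bird predat
  • Resulting candidates: bird predat, aquacult, fisheri, ...
  • Compute features
  • Training? Yes: compute model from manual keyphrases (manual KEYs) with Naïve Bayes → MODEL
  • Training? No: compute probabilities with the MODEL → automatic keyphrases (automatic KEYs)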
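The candidate extraction step can be sketched as follows. This assumes that a pseudo-phrase is formed by lowercasing, dropping stopwords, stemming, and sorting the stems, which reproduces the slide's example (predatory birds → bird predat). The stemmer here is a deliberately tiny stand-in with hand-picked suffix rules, not the stemmer KEA uses; all function names and the stopword list are hypothetical.

```python
import re

STOPWORDS = {"of", "the", "and", "in", "for"}  # illustrative list only
# Toy suffix rules chosen to cover the slide's examples; NOT a real stemmer.
SUFFIX_RULES = [("ies", "i"), ("ory", ""), ("ure", ""), ("s", "")]


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: len(word) - len(suffix)] + replacement
    return word


def pseudo_phrase(phrase):
    """Normalise a phrase: lowercase, drop stopwords, stem, sort the stems."""
    words = re.findall(r"[a-z]+", phrase.lower())
    stems = [toy_stem(w) for w in words if w not in STOPWORDS]
    return " ".join(sorted(stems))


def extract_candidates(text, vocabulary, max_len=3):
    """Return vocabulary terms whose pseudo-phrase matches some n-gram of the text."""
    index = {pseudo_phrase(term): term for term in vocabulary}
    words = re.findall(r"[a-z]+", text.lower())
    found = set()
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            key = pseudo_phrase(" ".join(words[i:i + n]))
            if key in index:
                found.add(index[key])
    return found


vocabulary = ["predatory birds", "aquaculture", "fisheries"]
text = "Predatory bird damage in fisheries and aquaculture"
print(pseudo_phrase("predatory birds"))  # bird predat
print(sorted(extract_candidates(text, vocabulary)))
```

Because stems are sorted, word order and inflection in the document do not matter: "Predatory bird" in the text still maps to the vocabulary term "predatory birds".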
10
KEA's Features
  • TF×IDF: specific for a given document
  • First Occurrence: in the beginning/end
  • Phrase Length ? 2 words
  • Node Degree: related to other phrases in the doc
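The four features can be sketched as below. The formulas are assumptions, since the slide names the features but does not define them: TF×IDF as relative frequency times negative log document frequency, first occurrence as the fraction of the document preceding the first hit, and node degree as the number of thesaurus links to other candidates of the same document. The `related` mapping and the sample link are hypothetical.

```python
import math


def tfidf(count_in_doc, doc_size, docs_with_phrase, n_docs):
    """Relative frequency in the document times -log2 of the document frequency."""
    tf = count_in_doc / doc_size
    idf = -math.log2(docs_with_phrase / n_docs)
    return tf * idf


def first_occurrence(words_before_first_hit, doc_size):
    """Position of the first appearance as a fraction of the document (0 = start)."""
    return words_before_first_hit / doc_size


def phrase_length(term):
    """Number of words in the term."""
    return len(term.split())


def node_degree(term, candidates, related):
    """Number of thesaurus links from `term` to other candidates of the same document."""
    return sum(
        1 for other in related.get(term, ()) if other in candidates and other != term
    )


# Hypothetical thesaurus link, for illustration only.
related = {"overweight": {"body weight", "nutritional disorders"}}
candidates = {"overweight", "body weight", "taxes"}

print(tfidf(count_in_doc=4, doc_size=1000, docs_with_phrase=10, n_docs=160))
print(first_occurrence(50, 1000))
print(phrase_length("body weight"))
print(node_degree("overweight", candidates, related))
```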

11
Agenda
  • Indexing Task
      • Keyphrases and vocabularies
      • Existing approaches
  • KEA
      • How does it work
      • Computing features
  • Evaluation
      • Standard evaluation
      • Indexing consistency
  • Examples

12
Evaluation
  • 10-fold cross validation on a 200 document set
    ()
  • Concept matching (Agrovoc links are taken into account), e.g. predatory birds
    and noxious birds count as a match
  • Indexing Consistency
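Indexing consistency between two term sets can be sketched with Rolling's measure, 2C / (A + B), where C is the number of shared terms and A and B are the sizes of the two sets. This is one common choice; the slide does not state which formula was used, and the term sets below are made up for illustration.

```python
def rolling_consistency(terms_a, terms_b):
    """Rolling's measure: twice the overlap divided by the total number of terms."""
    a, b = set(terms_a), set(terms_b)
    if not a and not b:
        return 1.0  # two empty sets agree trivially
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical term sets for one document.
indexer_terms = {"overweight", "food consumption", "taxes", "prices"}
system_terms = {"overweight", "food consumption", "taxes", "price fixing"}

print(rolling_consistency(indexer_terms, system_terms))  # 0.75
```

The same function applies to a pair of human indexers, so a system's consistency with humans can be compared with the consistency humans achieve among themselves.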

13
Example
The Growing Global Obesity Problem: Some Policy Options to Address It

Match    | ? 2 Indexers        | KEA
Exact    | overweight          | overweight
         | food consumption    | food consumption
         | taxes               | taxes
         | developed countries | developed countries
Similar  | prices              | price fixing
         | price policies      | controlled prices
         | diets               | body weight
         | fiscal policies     | nutrition policies
No match | feeding habits      | saturated fat
         | food intake         | nutritional requirements
14
The Global Obesity Problem
[Figure: Agrovoc terms assigned by Indexers 1 to 6 and by KEA]
Terms shown: energy value, public health, nutritional disorders, regulations,
weight reduction, nutrient excesses, developing countries, disease control,
nutritional requirements, diet, dietary guidelines, nutrition status,
nutrition programs, developed countries, body weight, feeding habits,
meal patterns, nutrition surveillance, overweight, food policies, price fixing,
nutritional physiology, price formation, controlled prices, saturated fat,
food intake, overeating, human nutrition, nutrition policies, price policies,
foods, food consumption, fiscal policies, policies, prices, direct taxation,
urbanization, globalization, taxes