Semantic Video Classification Based on Subtitles and Domain Terminologies - PowerPoint PPT Presentation

1
Semantic Video Classification Based on Subtitles
and Domain Terminologies
  • Polyxeni Katsiouli, Vassileios Tsetsos, Stathes
    Hadjiefthymiades
  • Pervasive Computing Research Group
  • Communication Networks Laboratory
  • Department of Informatics and Telecommunications
  • University of Athens, Greece
  • KAMC 07 @ Genoa, Italy

polina@di.uoa.gr
b.tsetsos@di.uoa.gr
shadj@di.uoa.gr
2
Outline
  • The Polysema Platform
  • Introduction - Motivation
  • Video Categorization Method
  • Experimental Evaluation
  • Conclusions - Future Work

3
Polysema platform
  • Develops an end-to-end platform for iTV services
  • Semantics-related research focuses on the
    development of:
      • semantics extraction techniques for automatic
        annotation of audiovisual content,
      • a personalization framework for iTV services with
        Semantic Web (SW) technologies,
      • a tool with a GUI for video annotation and MPEG-7
        metadata creation

http://polysema.di.uoa.gr
4
Introduction - Motivation
  • Multimedia databases are becoming popular
  • Most video classification methods are based on
    visual/audio signal processing
  • Text processing is more lightweight than
    visual/audio processing
  • High-level semantics are more closely related to
    human language than to visual features
  • Subtitles capture the semantics of the
    corresponding video

5
Step 1: Text Preprocessing
  • Subtitles are segmented into sentences
  • A Part of Speech Tagger is applied to each
    sentence
  • Stop words (e.g., to, him) are removed based
    on a stop words list
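
The preprocessing step can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation: the stop-word list is a tiny made-up sample, the sentence splitter is a naive regular expression, and the part-of-speech tagging step is omitted because it needs an external tagger.

```python
import re

# Tiny illustrative stop-word list (an assumption; the slides do not say which list was used).
STOP_WORDS = {"a", "an", "the", "to", "him", "of", "in", "and", "is", "are", "it"}

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def remove_stop_words(sentence):
    """Lowercase the sentence, keep alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def preprocess(subtitle_text):
    """Return one list of content words per sentence."""
    return [remove_stop_words(s) for s in split_sentences(subtitle_text)]
```

For example, `preprocess("The lion hunts in the savannah. It is a predator!")` yields `[["lion", "hunts", "savannah"], ["predator"]]`.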

6
Step 2: Keyword Extraction
  • We used the TextRank algorithm to extract
    keywords
  • TextRank:
      • represents the text as a graph,
      • applies to the vertices a ranking algorithm based
        on Google's PageRank,
      • sorts vertices in decreasing rank order,
      • extracts the top-ranked vertices for
        further processing

TextRank
Mihalcea, R., Tarau, P.: TextRank: Bringing Order
into Texts. In: Proceedings of the Conference on
Empirical Methods in Natural Language Processing
(EMNLP 2004), Barcelona, Spain, July 2004
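
A rough sketch of the TextRank idea for keyword extraction is given below. It is a simplified illustration, not the configuration used in the experiments: the co-occurrence window, damping factor, iteration count, and the function name `textrank_keywords` are all assumptions chosen for the example.

```python
from collections import defaultdict

def textrank_keywords(tokens, window=2, damping=0.85, iterations=50, top_n=3):
    """Rank words with a PageRank-style iteration over a co-occurrence graph."""
    # Build an undirected graph: words are linked when they co-occur
    # within `window` consecutive positions.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # Iterate the ranking formula a fixed number of times.
    scores = {w: 1.0 for w in neighbors}
    for _ in range(iterations):
        scores = {
            w: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[w])
            for w in neighbors
        }

    # Sort by decreasing score (alphabetical order breaks ties) and keep the top words.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_n]
```

On a toy token list such as `["lion", "hunts", "zebra", "lion", "eats", "zebra", "lion", "sleeps"]`, the best-connected words ("lion", then "zebra") come out on top.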
7
Step 3: Word Sense Disambiguation
  • Words have many possible meanings, called senses
  • A Word Sense Disambiguation (WSD) algorithm is
    applied to determine the correct sense of each
    word
  • WSD:
      • is based on the lexical database WordNet,
      • is a variation of Lesk's WSD algorithm

WSD
Banerjee, S., Pedersen, T.: An Adapted Lesk
Algorithm for Word Sense Disambiguation Using
WordNet. In: Proceedings of the 3rd
International Conference on Intelligent Text
Processing and Computational Linguistics
(CICLing-02), Mexico City, Mexico (2002)
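
The core intuition behind Lesk-style disambiguation is sketched below: pick the sense whose dictionary gloss shares the most words with the context. This is the simplified Lesk variant over a made-up two-sense inventory, not the adapted algorithm of Banerjee and Pedersen (which also uses glosses of related WordNet synsets); `TOY_SENSES` and its sense identifiers are hypothetical.

```python
# Hypothetical sense inventory standing in for WordNet glosses.
TOY_SENSES = {
    "bank": {
        "bank.finance": "an institution that accepts deposits and lends money",
        "bank.river": "sloping land beside a river or lake",
    },
}

def simplified_lesk(word, context_tokens, inventory):
    """Return the sense id whose gloss overlaps most with the context words."""
    context = set(context_tokens)
    best_sense, best_overlap = None, -1
    for sense_id, gloss in inventory[word].items():
        overlap = len(context & set(gloss.split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense_id, overlap
    return best_sense
```

With the context `["fish", "swim", "near", "river", "bank"]` the river sense wins, while `["deposits", "money", "bank"]` selects the finance sense.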
8
Step 4: WordNet Domains Extraction (1/2)
WordNet Domains:
  • augment WordNet with domain labels
  • a taxonomy of 200 domain labels
  • synsets have been annotated with at least one
    domain label

WN domains
http://wndomains.itc.it/wordnetdomains.html
9
Step 4: WordNet Domains Extraction (2/2)
  • For each video:
      • Extract the WordNet domains for each keyword's
        sense
      • Calculate the occurrence frequency of each domain
        label
      • Sort domain labels in decreasing order of
        occurrence frequency
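
The counting and sorting described on this slide might look like the sketch below. The sense identifiers and the `TOY_SENSE_DOMAINS` mapping are invented for illustration and merely stand in for a real WordNet Domains lookup; ties are broken alphabetically, which is also an assumption.

```python
from collections import Counter

# Hypothetical sense -> domain labels mapping (stand-in for WordNet Domains).
TOY_SENSE_DOMAINS = {
    "lion#1": ["animals"],
    "beetle#1": ["animals", "entomology"],
    "larva#1": ["entomology", "biology"],
}

def ranked_domains(keyword_senses, sense_domains):
    """Count domain labels over all keyword senses and sort by decreasing frequency."""
    counts = Counter()
    for sense in keyword_senses:
        counts.update(sense_domains.get(sense, []))
    # Alphabetical order breaks ties so the result is deterministic.
    return [d for d, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]
```

For the three toy senses above, `ranked_domains` returns `["animals", "entomology", "biology"]`.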

10
Step 5: Correspondences between categories and WN
domains
  • For each category label:
      • Look up in WordNet the senses related to it
        (including senses related through hypernym and
        hyponym relations)
      • Obtain the corresponding WordNet domains
      • Calculate the occurrence score for each domain
      • Sort domains in decreasing occurrence order

Example

Category   WordNet domains
animals    animals, biology, entomology
war        military, history
science    medicine, biology, mathematics
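
A hedged sketch of how a category label could be mapped to an ordered domain list follows. The three lookup tables are made-up stand-ins for WordNet (label senses and hypernym/hyponym neighbours) and WordNet Domains, and plain occurrence counts are used as the "occurrence score", which is an assumption.

```python
from collections import Counter

# Made-up stand-ins for WordNet and WordNet Domains lookups.
LABEL_SENSES = {"war": ["war#1"]}
RELATED_SENSES = {"war#1": ["battle#1", "combat#1"]}  # hypernyms/hyponyms
SENSE_DOMAINS = {
    "war#1": ["military", "history"],
    "battle#1": ["military"],
    "combat#1": ["military"],
}

def category_domains(label):
    """Collect the senses of a label plus related senses, then rank their domains."""
    senses = list(LABEL_SENSES.get(label, []))
    for sense in list(senses):
        senses.extend(RELATED_SENSES.get(sense, []))
    counts = Counter(d for s in senses for d in SENSE_DOMAINS.get(s, []))
    return [d for d, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]
```

With this toy data, `category_domains("war")` gives `["military", "history"]`, matching the "war" row of the example table.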
11
Step 6: Category label assignment
  • Compare the ordered list of WN domains of
    each video with the ordered list of WN
    domains of each category

Example

WN domains of a video            Category
animals, entomology, biology     animals
biology, mathematics, physics    science

Category   WordNet domains
animals    animals, biology, entomology
war        military, history
science    medicine, biology, mathematics
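
The slides do not spell out how the two ordered lists are compared. One simple possibility, shown here purely as an assumption and not as the authors' scoring rule, is a rank-weighted overlap: a shared domain contributes more when it sits near the top of both lists.

```python
def overlap_score(video_domains, category_domains):
    """Rank-weighted overlap between two ordered domain lists (illustrative rule)."""
    score = 0.0
    for i, domain in enumerate(video_domains):
        if domain in category_domains:
            j = category_domains.index(domain)
            score += 1.0 / ((i + 1) * (j + 1))
    return score

def assign_category(video_domains, categories):
    """Return the category whose domain list scores highest against the video."""
    return max(sorted(categories), key=lambda c: overlap_score(video_domains, categories[c]))

# Category table taken from the example on the slide.
CATEGORIES = {
    "animals": ["animals", "biology", "entomology"],
    "war": ["military", "history"],
    "science": ["medicine", "biology", "mathematics"],
}
```

Under this rule, a video with domains `["animals", "entomology", "biology"]` is assigned "animals", and one with `["biology", "mathematics", "physics"]` is assigned "science".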
12
Experimental Evaluation (1/2)
  • 36 subtitle files of documentaries

Statistical information of files (average values)

duration (min:sec)   # of words   # of non-stop words   # of keywords   # of WN domains
41:48                4442         2000                  350             53
  • Classified under the categories: geography,
    animals, history, war, technology, science,
    accidents, music, transportation, people,
    religious, politics, arts

13
Experimental Evaluation (2/2)
  • Classifiers
      • Proposed method
      • Proposed method in which Step 6 has been
        replaced with Spearman's footrule distance
      • J4.8
          • decision tree classifier
          • supervised approach
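
Spearman's footrule distance sums, over all items, the absolute difference between an item's positions in two rankings. The sketch below applies it to domain lists; because a video and a category need not share the same domains, a missing item is given the rank just past the end of the list, which is an assumption made for this example rather than something stated on the slide.

```python
def footrule_distance(list_a, list_b):
    """Sum of absolute rank differences over the union of both lists."""
    def rank(lst, item):
        # Items absent from a list get the rank just past its end (assumption).
        return lst.index(item) if item in lst else len(lst)

    items = set(list_a) | set(list_b)
    return sum(abs(rank(list_a, x) - rank(list_b, x)) for x in items)
```

Identical lists have distance 0, and a smaller distance means a closer match: `["animals", "entomology", "biology"]` is at distance 2 from the "animals" list `["animals", "biology", "entomology"]` but at distance 8 from the "war" list `["military", "history"]`.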

14
Conclusions - Future Work
  • Conclusions
      • A novel approach that is based only on text and
        uses natural language processing techniques
      • No training phase is required (unsupervised
        approach)
  • Future Work
      • Application of the method on a per-video-segment
        basis
      • Definition of domain knowledge closer to
        movie classification
      • Performance comparison with other unsupervised
        approaches

15
Thank you!
Questions???
http://p-comp.di.uoa.gr