Building Classifiers in Environments with Multimodal Inputs

1
Building Classifiers in Environments with
Multimodal Inputs

Lee Wee Sun
Singapore-MIT Alliance
Department of Computer Science
National University of Singapore

2
Classifiers & Multimodal Inputs
A classifier takes an input (e.g. an image) and
outputs a class (e.g. person k in a database).
In certain environments, we have more than one
type of input corresponding to the same object,
e.g. image and voice of the same person.
3
Classifiers & Multimodal Inputs
[Figure: examples labeled Wong Weng Fai and Albert Einstein]
Multimodal inputs can help improve
classification accuracy
This talk: Multimodal inputs can help us build
(learn) classifiers without the help of a teacher.
4
Outline

Co-training
Sufficient conditions
Conditional Independence
Learning from noise
Potential applications in distance interaction
and education

5
Supervised Learning

Standard method for training classifiers.
A teacher labels a set of examples.
Machine learning algorithm learns from labeled
examples.
Aims to do well in future unseen examples

[Figure: examples labeled Lee or Goh, plus one unlabeled example marked ?]
6
Co-training

Blum & Mitchell 1998
Motivating application: classifying web pages
Web pages have two descriptions
Content of the page itself and content of the
links (anchor text) to the page
Algorithm
Build two classifiers, one on page content, one
on link contents
Iteratively make the two classifiers agree with
each other
Slight advantage in one classifier creates a
virtuous cycle
Iterations eventually result in two good
classifiers
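
The loop above can be sketched in a few lines. This is a minimal illustration and not the algorithm exactly as published: it assumes a small labeled seed set (one example per class), a toy one-dimensional nearest-centroid classifier for each view, and synthetic data in which the two views are independent given the class. All function names (`fit_centroids`, `predict`, `co_train`, `sample_pairs`) are hypothetical.

```python
import random

def fit_centroids(values, labels):
    """Toy 1-D nearest-centroid classifier: return {label: mean feature value}."""
    sums, counts = {}, {}
    for v, y in zip(values, labels):
        sums[y] = sums.get(y, 0.0) + v
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(centroids, value):
    """Return (label, margin); margin is how much closer the best centroid is."""
    ranked = sorted(centroids, key=lambda y: abs(value - centroids[y]))
    best, runner_up = ranked[0], ranked[1]
    margin = abs(value - centroids[runner_up]) - abs(value - centroids[best])
    return best, margin

def fit_view(labeled, view):
    """Train the classifier that only sees one view (0 or 1) of each example."""
    return fit_centroids([x[view] for x, _ in labeled], [y for _, y in labeled])

def co_train(seed_labeled, unlabeled, rounds=10, per_round=4):
    """Each view's classifier labels its most confident unlabeled examples,
    which then become training data for both classifiers."""
    labeled = list(seed_labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        for view in (0, 1):
            model = fit_view(labeled, view)
            pool.sort(key=lambda x: predict(model, x[view])[1], reverse=True)
            confident, pool = pool[:per_round], pool[per_round:]
            labeled += [(x, predict(model, x[view])[0]) for x in confident]
    return fit_view(labeled, 0), fit_view(labeled, 1)

def sample_pairs(n, rng):
    """Synthetic (view_a, view_b) pairs; the views are independent given the class."""
    data = []
    for _ in range(n):
        y = rng.choice([0, 1])
        center = 2.0 if y == 1 else -2.0
        data.append(((rng.gauss(center, 1.0), rng.gauss(center, 1.0)), y))
    return data

def accuracy(model, data, view):
    hits = sum(predict(model, x[view])[0] == y for x, y in data)
    return hits / len(data)

rng = random.Random(0)
seed = [((-2.0, -2.0), 0), ((2.0, 2.0), 1)]          # assumed seed: one label per class
unlabeled = [x for x, _ in sample_pairs(200, rng)]   # true labels are discarded
test_set = sample_pairs(200, rng)

model_a, model_b = co_train(seed, unlabeled)
acc_a = accuracy(model_a, test_set, 0)
acc_b = accuracy(model_b, test_set, 1)
print(acc_a, acc_b)
```

Because each classifier hands the other examples chosen from a different view, the newly labeled examples look like unbiased samples from the receiving classifier's point of view, which is where the conditional independence condition comes in.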

7
Conditional Independence

Sufficient conditions
Possible to classify correctly with either input
mode (or at least one of the modes)
The two input modes are conditionally independent
given the class.

Identity
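$P(\mathrm{Image} \mid \mathrm{Identity}, \mathrm{Voice}) = P(\mathrm{Image} \mid \mathrm{Identity})$
$P(\mathrm{Voice} \mid \mathrm{Identity}, \mathrm{Image}) = P(\mathrm{Voice} \mid \mathrm{Identity})$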
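
An equivalent way to read this condition, added here as a short worked step, is that the joint distribution of the two modes factorizes once the identity is known. Applying the chain rule and then the conditional independence of Image from Voice given Identity:

$$
\begin{aligned}
P(\mathrm{Image}, \mathrm{Voice} \mid \mathrm{Identity})
  &= P(\mathrm{Image} \mid \mathrm{Identity}, \mathrm{Voice})\, P(\mathrm{Voice} \mid \mathrm{Identity}) \\
  &= P(\mathrm{Image} \mid \mathrm{Identity})\, P(\mathrm{Voice} \mid \mathrm{Identity})
\end{aligned}
$$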
8
Conditional Independence

Each observation is an image-voice pair,
corresponding to a link in the bipartite graph
If conditional independence exists, an image
instance is paired with any voice instance
according to the distribution of voice instances,
regardless of what the image is.

[Figure: bipartite graph linking Image instances to Voice instances of the same person]
9
Conditional Independence
What if conditional independence does not hold?

[Figure: bipartite graph linking Image instances to Voice instances of the same person]
10
Learning with Noise

Sufficient to be able to classify correctly using
one of the input modes.
The other input mode gives a noisy label.
If conditional independence condition holds,
noise is independent classification noise.
Can be learned using any method that can tolerate
independent classification noise.
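
A small simulation can make this concrete. It is a sketch under stated assumptions, not the method from the talk: the image is reduced to a single number, the voice-derived label is wrong 30% of the time independently of the image, and a nearest-centroid rule stands in for "any method that can tolerate independent classification noise". The names `sample`, `fit_centroids` and `predict` are hypothetical.

```python
import random

def sample(n, rng, noise_rate):
    """Return (image_feature, true_identity, noisy_label) triples.

    Assumption for illustration: the image feature is drawn from N(+2, 1)
    for identity 1 and N(-2, 1) for identity 0, and the voice-derived label
    is flipped with probability noise_rate, independently of the image.
    """
    rows = []
    for _ in range(n):
        identity = rng.choice([0, 1])
        feature = rng.gauss(2.0 if identity == 1 else -2.0, 1.0)
        flipped = rng.random() < noise_rate
        rows.append((feature, identity, 1 - identity if flipped else identity))
    return rows

def fit_centroids(features, labels):
    """Mean image feature for each (possibly noisy) label."""
    totals, counts = {0: 0.0, 1: 0.0}, {0: 0, 1: 0}
    for x, y in zip(features, labels):
        totals[y] += x
        counts[y] += 1
    return {y: totals[y] / counts[y] for y in totals}

def predict(centroids, x):
    """Label of the nearest centroid."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))

rng = random.Random(1)
train = sample(2000, rng, noise_rate=0.3)
test = sample(1000, rng, noise_rate=0.3)

# Train only on the noisy labels; the true identities are never shown to the learner.
model = fit_centroids([f for f, _, _ in train], [noisy for _, _, noisy in train])

# Fraction of training labels that happen to be correct (about 1 - noise_rate).
label_quality = sum(t == noisy for _, t, noisy in train) / len(train)
# Accuracy of the learned image classifier against the true identities.
clean_accuracy = sum(predict(model, f) == t for f, t, _ in test) / len(test)
print(label_quality, clean_accuracy)
```

In this toy setting the noise pulls both centroids toward each other by the same amount, so the decision boundary stays near the midpoint and the learned classifier can be more accurate than the labels it was trained on.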

11
Applications

Learning to classify web pages from content of
the page and anchor text of links pointing to the
page (Blum & Mitchell 1998)
Learning to classify named entities from spelling
and appositive (Collins & Singer 1999)
e.g. Goh Chok Tong, Prime Minister of Singapore
...
Learning to classify web images from image
features and text on a web page
Word Sense Disambiguation from discourse topic
features and collocation features

12
Potential Applications

Distance interaction
Automatically learn the identity of the students
in the class
Usually have both video and voice of the same
person at the same time during interaction.

Identity
13
Potential Applications

Automatic transcribing of lectures
Usually write on the board and speak at the same
time
Written word spoken around the same time and vice
versa
Adapt to the person's speech as well as
handwriting

14
Potential Applications

Automatic grading of certain types of exercises
For example, "True or False, and Justify"
True or False: one mode, easy to automatically
grade
Justify: natural language answer, other mode,
hard to automatically grade
True or False answers used as noisy labels for
learning to grade the Justify part.

15
Conclusions

Multimodal inputs common in distance interaction
and education
Exploiting multimodal inputs can improve
performance of classifier
Multimodal inputs often allow learning without
human teacher
Reduces effort in building classifiers
Useful for building tools for distance
interaction and education
These tools are actually quite general, e.g. face
recognition, speaker recognition, speech
recognition, handwriting recognition, natural
language processing.
Distance interaction is a useful testbed for
building tools that adapt to humans