Title: Building Classifiers in Environments with Multimodal Inputs
- Lee Wee Sun
- Singapore-MIT Alliance
- Department of Computer Science
- National University of Singapore
2. Classifiers and Multimodal Inputs
A classifier takes an input (e.g. an image) and outputs a class (e.g. person k in a database).
In certain environments, we have more than one type of input corresponding to the same object (e.g. the image and the voice of the same person).
3. Classifiers and Multimodal Inputs
[Figure: example inputs labeled Wong Weng Fai and Albert Einstein]
Multimodal inputs can help improve classification accuracy.
This talk: multimodal inputs can help us build (learn) classifiers without the help of a teacher.
4. Outline
- Co-training
- Sufficient conditions
- Conditional Independence
- Learning from noise
- Potential applications in distance interaction
and education
5. Supervised Learning
- Standard method for training classifiers.
- A teacher labels a set of examples.
- A machine learning algorithm learns from the labeled examples.
- The aim is to do well on future, unseen examples.
[Figure: training inputs labeled Lee and Goh, and a new input whose label (?) must be predicted]
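As a concrete illustration of learning from teacher-labeled examples, here is a minimal sketch of a 1-nearest-neighbour classifier. The two-number feature vectors are hypothetical stand-ins for features extracted from face images, and nearest neighbour is just one simple learning algorithm chosen for illustration, not one the talk prescribes.

```python
import math

# Hypothetical teacher-labeled training set: (feature vector, identity).
# The 2-D features stand in for measurements extracted from face images.
train = [
    ((1.0, 1.2), "Lee"),
    ((0.9, 1.0), "Lee"),
    ((3.1, 2.9), "Goh"),
    ((3.0, 3.2), "Goh"),
]


def predict(x, examples):
    """Label x with the identity of the closest labeled example (1-NN)."""
    _, label = min(examples, key=lambda ex: math.dist(x, ex[0]))
    return label


# A new, unseen input (the "?" in the figure) gets its nearest neighbour's label.
print(predict((2.8, 3.0), train))  # Goh
```

The teacher's effort is entirely in building `train`; the rest of the talk is about obtaining such labels without a teacher.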
6. Co-training
- Blum & Mitchell 1998
- Motivating application: classifying web pages
- Web pages have two descriptions
  - The content of the page itself, and the content of the links (anchor text) pointing to the page
- Algorithm
  - Build two classifiers, one on page content and one on link content
  - Iteratively make the two classifiers agree with each other
  - A slight advantage in one classifier creates a virtuous cycle
  - The iterations eventually result in two good classifiers
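The co-training loop can be sketched as follows. This is a simplified sketch under stated assumptions, not Blum and Mitchell's exact procedure: each example is assumed to have two hypothetical one-dimensional views (standing in for page content and anchor text), each view uses a nearest-centroid rule, and in each round each view labels the unlabeled example it is most confident about and adds it to the shared training set that the other view also learns from.

```python
def fit_view(labeled, view):
    """Fit a nearest-centroid rule on one view of the labeled examples.

    labeled: list of ((view0, view1), label) with labels 0 and 1.
    Returns (predict, confidence) functions of a single feature value.
    """
    means = {}
    for c in (0, 1):
        vals = [x[view] for x, y in labeled if y == c]
        means[c] = sum(vals) / len(vals)

    def predict(v):
        return min(means, key=lambda c: abs(v - means[c]))

    def confidence(v):
        # Margin: how much closer v is to one centroid than to the other.
        return abs(abs(v - means[0]) - abs(v - means[1]))

    return predict, confidence


def co_train(labeled, unlabeled, rounds):
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        for view in (0, 1):
            if not pool:
                break
            predict, confidence = fit_view(labeled, view)
            # This view labels the pool example it is most confident about...
            best = max(pool, key=lambda x: confidence(x[view]))
            pool.remove(best)
            # ...and the newly labeled example is shared with the other view.
            labeled.append((best, predict(best[view])))
    predictors = [fit_view(labeled, view)[0] for view in (0, 1)]
    return labeled, predictors


# Hypothetical data: class 1 has large values in both views, class 0 small ones.
seed = [((0.0, 0.1), 0), ((1.0, 0.9), 1)]
unlabeled = [(0.1, 0.2), (0.2, 0.0), (0.9, 1.0), (0.8, 0.8), (0.15, 0.05), (0.95, 0.85)]
labeled, (page_clf, link_clf) = co_train(seed, unlabeled, rounds=3)
print(len(labeled), page_clf(0.05), link_clf(0.95))  # 8 0 1
```

Because each view's confident labels become training data for the other view, a head start in one view propagates to both, which is the virtuous cycle described on the slide.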
7. Conditional Independence
- Sufficient conditions:
  - It is possible to classify correctly using either input mode (or at least one of the modes).
  - The two input modes are conditionally independent given the class:
$$P(\text{Image} \mid \text{Identity}, \text{Voice}) = P(\text{Image} \mid \text{Identity})$$
$$P(\text{Voice} \mid \text{Identity}, \text{Image}) = P(\text{Voice} \mid \text{Identity})$$
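A short derivation shows what the conditional-independence conditions on this slide buy us: by the chain rule plus the first condition, image and voice factorize given the identity.

$$
\begin{aligned}
P(\text{Image}, \text{Voice} \mid \text{Identity})
  &= P(\text{Image} \mid \text{Identity}, \text{Voice})\, P(\text{Voice} \mid \text{Identity}) \\
  &= P(\text{Image} \mid \text{Identity})\, P(\text{Voice} \mid \text{Identity})
\end{aligned}
$$

Conversely, dividing the factorized form by $P(\text{Voice} \mid \text{Identity})$ (where it is nonzero) recovers the first condition, and dividing by $P(\text{Image} \mid \text{Identity})$ recovers the second, so the two conditions together are equivalent to this factorization.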
8. Conditional Independence
- Each observation is an image-voice pair, corresponding to a link in the bipartite graph.
- If conditional independence holds, an image instance is paired with any voice instance of the same person according to that person's distribution of voice instances, regardless of what the image is.
[Figure: bipartite graph linking Image instances to Voice instances of the same person]
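The bipartite-graph picture can be checked numerically. The sketch below is illustrative and all names and numbers in it are hypothetical: it takes a joint distribution over (identity, image instance, voice instance) triples and tests whether P(image, voice | identity) = P(image | identity) P(voice | identity) for every triple. In the independent joint, each image of a person is linked to every voice of that person in proportion to that person's voice distribution; in the dependent joint, each image is only ever paired with one particular voice, so the condition fails.

```python
from collections import defaultdict
from itertools import product


def is_cond_independent(joint, tol=1e-9):
    """joint maps (identity, image, voice) -> probability.

    Returns True if P(image, voice | identity) equals
    P(image | identity) * P(voice | identity) for every combination.
    """
    p_id = defaultdict(float)
    p_id_img = defaultdict(float)
    p_id_voice = defaultdict(float)
    for (who, img, voice), p in joint.items():
        p_id[who] += p
        p_id_img[who, img] += p
        p_id_voice[who, voice] += p
    images = {img for _, img, _ in joint}
    voices = {voice for _, _, voice in joint}
    for who in p_id:
        for img, voice in product(images, voices):
            lhs = joint.get((who, img, voice), 0.0) / p_id[who]
            rhs = (p_id_img[who, img] / p_id[who]) * (p_id_voice[who, voice] / p_id[who])
            if abs(lhs - rhs) > tol:
                return False
    return True


# Hypothetical people, each with two image instances and two voice instances.
people = {"Lee": (["i1", "i2"], ["v1", "v2"]), "Goh": (["i3", "i4"], ["v3", "v4"])}

# Independent: every image of a person links to every voice of that person.
independent = {}
for who, (imgs, voices) in people.items():
    for img, voice in product(imgs, voices):
        independent[who, img, voice] = 0.5 * (1 / len(imgs)) * (1 / len(voices))

# Dependent: each image is only ever paired with one particular voice.
dependent = {
    ("Lee", "i1", "v1"): 0.25, ("Lee", "i2", "v2"): 0.25,
    ("Goh", "i3", "v3"): 0.25, ("Goh", "i4", "v4"): 0.25,
}

print(is_cond_independent(independent))  # True
print(is_cond_independent(dependent))    # False
```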
9. Conditional Independence
[Figure: bipartite graph linking Image instances to Voice instances of the same person]
- What if conditional independence does not hold?
10. Learning with Noise
- It is sufficient to be able to classify correctly using one of the input modes.
- The other input mode then provides noisy labels.
- If the conditional independence condition holds, the noise is independent classification noise (independent of the input, given the class).
- The classifier can then be learned using any method that tolerates independent classification noise.
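The following sketch makes the noise argument concrete with hypothetical numbers. It assumes a fixed, imperfect voice classifier and, as conditional independence requires, that the voice clip paired with an image is drawn from P(voice | identity) regardless of which image it is. It computes exactly how often the voice-derived label is wrong for each image: the error rate is the same for every image of the same person, so the noise depends on the class but not on the particular image. Taking the most likely noisy label per image is one simple noise-tolerant rule, used here only for illustration.

```python
# Hypothetical P(voice clip | person): the clip accompanying an image of this
# person is drawn independently of the image (conditional independence).
voice_dist = {
    "Lee": {"v1": 0.7, "v2": 0.2, "v3": 0.1},
    "Goh": {"v3": 0.8, "v4": 0.2},
}
# True identity of each face image; the learner never sees this directly.
image_identity = {"i1": "Lee", "i2": "Lee", "i3": "Goh"}
# An imperfect, already-trained voice classifier: wrong on v2 and on Lee's v3.
voice_classifier = {"v1": "Lee", "v2": "Goh", "v3": "Goh", "v4": "Goh"}


def noisy_label_dist(image):
    """P(noisy label = c | image), where the label comes from the paired voice."""
    who = image_identity[image]
    dist = {}
    for voice, p in voice_dist[who].items():
        label = voice_classifier[voice]
        dist[label] = dist.get(label, 0.0) + p
    return dist


def noise_rate(image):
    """Probability that the voice-derived label for this image is wrong."""
    return 1.0 - noisy_label_dist(image).get(image_identity[image], 0.0)


def majority_label(image):
    """Most likely noisy label; with two classes, correct whenever noise < 50%."""
    dist = noisy_label_dist(image)
    return max(dist, key=dist.get)


for img in image_identity:
    print(img, round(noise_rate(img), 2), majority_label(img))
# i1 0.3 Lee
# i2 0.3 Lee
# i3 0.0 Goh
```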
11. Applications
- Learning to classify web pages from the content of the page and the anchor text of links pointing to the page (Blum & Mitchell 1998)
- Learning to classify named entities from spelling and appositive (Collins & Singer 1999)
  - e.g. "Goh Chok Tong, Prime Minister of Singapore"
- ...
- Learning to classify web images from image features and the text on the web page
- Word sense disambiguation from discourse topic features and collocation features
12. Potential Applications
- Distance interaction
  - Automatically learn the identities of the students in the class
  - We usually have both the video and the voice of the same person at the same time during interaction
13. Potential Applications
- Automatic transcription of lectures
  - Lecturers usually write on the board and speak at the same time
  - A written word is usually spoken around the same time, and vice versa
  - Adapt to the person's speech as well as handwriting
14. Potential Applications
- Automatic grading of certain types of exercises
  - For example, "True or False, and Justify" questions
  - True or False: one mode, easy to grade automatically
  - Justify: a natural language answer, the other mode, hard to grade automatically
  - The True or False answers are used as noisy labels for learning to grade the Justify part
15. Conclusions
- Multimodal inputs are common in distance interaction and education
- Exploiting multimodal inputs can improve the performance of classifiers
- Multimodal inputs often allow learning without a human teacher
- This reduces the effort of building classifiers
- Useful for building tools for distance interaction and education
- These tools are actually quite general, e.g. face recognition, speaker recognition, speech recognition, handwriting recognition, natural language processing
- Distance interaction is a useful testbed for building tools that adapt to humans