1
KI2 - 3
Bayesian Learning: Application to Text Classification. Example: spam filtering
Marius Bulacu, prof. dr. Lambert Schomaker
Kunstmatige Intelligentie / RuG
2
Founders of Probability Theory
Pierre Fermat (1601-1665, France)
Blaise Pascal (1623-1662, France)
They laid the foundations of probability theory in a correspondence about a dice game.
3
Prior, Joint and Conditional Probabilities
P(A): prior probability of A
P(B): prior probability of B
P(A, B): joint probability of A and B
P(A | B): conditional (posterior) probability of A given B
P(B | A): conditional (posterior) probability of B given A
4
Probability Rules
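The body of this slide is an image that did not survive the transcript; as a sketch, and consistent with the formulas used on the following slides, the standard rules are:

P(¬A) = 1 - P(A)
P(A ∨ B) = P(A) + P(B) - P(A, B)                (sum rule)
P(A, B) = P(A | B) P(B) = P(B | A) P(A)         (product rule)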
5
Statistical Independence
  • Two random variables A and B are independent iff
  • P(A, B) = P(A) P(B)
  • P(A | B) = P(A)
  • P(B | A) = P(B)

knowing the value of one variable does not yield
any information about the value of the other
6
Statistical Dependence - Bayes
Thomas Bayes (1702-1761, England)
His "Essay towards solving a Problem in the Doctrine of Chances" was published in the Philosophical Transactions of the Royal Society of London in 1764.
7
Bayes Theorem
P(A ∧ B) = P(A | B) P(B) = P(B | A) P(A)

⟹ P(A | B) = P(B | A) P(A) / P(B)
8
Bayes Theorem - Causality
Diagnostic: P(Cause | Effect) = P(Effect | Cause) P(Cause) / P(Effect)
Pattern Recognition: P(Class | Feature) = P(Feature | Class) P(Class) / P(Feature)
9
Bayes Formula and Classification
P(class | data) = P(data | class) P(class) / P(data)
P(class): prior probability of the class, before seeing anything
P(data | class): conditional likelihood of the data given the class
P(class | data): posterior probability of the class, after seeing the data
P(data): unconditional probability of the data
10
Medical example
p(disease) = 0.002
p(test | disease) = 0.97
p(test | ¬disease) = 0.04

p(test) = p(test | disease) p(disease) + p(test | ¬disease) p(¬disease)
        = 0.97 × 0.002 + 0.04 × 0.998 = 0.00194 + 0.03992 = 0.04186

p(disease | test) = p(test | disease) p(disease) / p(test)
                  = 0.97 × 0.002 / 0.04186 = 0.00194 / 0.04186 = 0.046

p(¬disease | test) = p(test | ¬disease) p(¬disease) / p(test)
                   = 0.04 × 0.998 / 0.04186 = 0.03992 / 0.04186 = 0.953
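The arithmetic above can be verified with a short Python snippet; this is just a numeric restatement of the slide's example (variable names are chosen for illustration):

  # Numeric check of the medical example (values taken from the slide).
  p_disease = 0.002
  p_test_given_disease = 0.97        # P(test | disease)
  p_test_given_not_disease = 0.04    # P(test | ¬disease)

  # Total probability of a positive test result.
  p_test = (p_test_given_disease * p_disease
            + p_test_given_not_disease * (1 - p_disease))      # 0.04186

  # Bayes' theorem: posterior probability of disease given a positive test.
  p_disease_given_test = p_test_given_disease * p_disease / p_test
  print(round(p_test, 5), round(p_disease_given_test, 3))      # 0.04186 0.046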
11
MAP Classification
  • To minimize the probability of misclassification, assign a new input x to the class with the Maximum A Posteriori probability, e.g. assign x to class C1 if:
  • p(C1 | x) > p(C2 | x)  ⟺  p(x | C1) p(C1) > p(x | C2) p(C2)
  • Therefore we must impose a decision boundary where the two posterior probability distributions cross each other (see the sketch after this list).
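A minimal sketch of this decision rule in Python; the two Gaussian class-conditional densities and the priors are invented illustration values, not taken from the slides:

  # MAP decision between two classes: pick the class maximizing p(x | C) * p(C).
  from math import exp, pi, sqrt

  def gauss(x, mu, sigma):
      # 1-D Gaussian density used as an assumed class-conditional likelihood
      return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

  priors = {"C1": 0.7, "C2": 0.3}                      # assumed prior probabilities
  likelihoods = {"C1": lambda x: gauss(x, 0.0, 1.0),   # assumed p(x | C1)
                 "C2": lambda x: gauss(x, 2.0, 1.0)}   # assumed p(x | C2)

  def map_class(x):
      # argmax over classes of the posterior-proportional score p(x | C) * p(C)
      return max(priors, key=lambda c: likelihoods[c](x) * priors[c])

  print(map_class(0.4))   # near the C1 mean, so this prints "C1"

The decision boundary sits exactly where the two scores p(x | C1) p(C1) and p(x | C2) p(C2) are equal.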

12
Maximum Likelihood Classification
  • When the prior class distributions are not known, or for equal (non-informative) priors,
  • p(x | C1) p(C1) > p(x | C2) p(C2)
  • becomes
  • p(x | C1) > p(x | C2)
  • Therefore assign the input x to the class with the Maximum Likelihood of having generated it.

13
Continuous Features
  • Two methods for dealing with continuous-valued features (see the sketch after this list):
  • Binning: divide the range of continuous values into a discrete number of bins, then apply the discrete methodology.
  • Mixture of Gaussians: make an assumption regarding the functional form of the PDF (linear combination of Gaussians) and derive the corresponding parameters (means and standard deviations).
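A minimal sketch of the binning approach; the synthetic training data and the number of bins are illustrative assumptions:

  # Binning: estimate a class-conditional PDF as a normalized histogram,
  # then look up the bin probability for a new feature value.
  import numpy as np

  samples = np.random.normal(5.0, 3.0, 1000)   # made-up training values for one class
  edges = np.linspace(-10.0, 20.0, 101)        # 100 bins over the feature range

  counts, edges = np.histogram(samples, bins=edges)
  pdf = counts / counts.sum()                  # estimated P(bin | class)

  def likelihood(x):
      i = np.searchsorted(edges, x) - 1        # index of the bin containing x
      return pdf[i] if 0 <= i < len(pdf) else 0.0

  print(likelihood(4.2))                       # estimated P(x falls in this bin | class)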

14
Accumulation of Evidence
p(C | X, Y) ∝ p(X, Y, C) = p(C) p(X, Y | C) = p(C) p(X | C) p(Y | C, X) = ... = p(C) p(X | C) p(Y | C, X) p(Z | C, X, Y)
(in the original figure, p(C) is marked as the prior and each accumulated product acts as the new prior for the next factor)
  • Bayesian inference allows for integrating prior
    knowledge about the world (beliefs being
    expressed in terms of probabilities) with new
    incoming data.
  • Different forms of data (possibly
    incommensurable) can be fused towards the final
    decision using the common currency of
    probability.
  • As new data arrives, the latest posterior becomes the new prior for interpreting the next input (a minimal sequential-update sketch follows below).
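A minimal sequential-update sketch under a naive independence assumption; the likelihood tables are invented for illustration, and the class names echo the insurance example later in the talk:

  # Sequential Bayesian updating: after each piece of evidence, the normalized
  # posterior is recycled as the prior for interpreting the next piece.
  prior = {"risk": 0.5, "safe": 0.5}

  evidence = [
      {"risk": 0.8, "safe": 0.3},   # assumed P(observation X | class)
      {"risk": 0.6, "safe": 0.7},   # assumed P(observation Y | class)
  ]

  belief = dict(prior)
  for lik in evidence:
      unnorm = {c: belief[c] * lik[c] for c in belief}   # prior * likelihood
      z = sum(unnorm.values())                           # evidence term P(data)
      belief = {c: v / z for c, v in unnorm.items()}     # posterior -> new prior

  print(belief)   # roughly {'risk': 0.70, 'safe': 0.30}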

15
Example: temperature classification
Classes: C (Cold), N (Normal), W (Warm), H (Hot)
[Figure: class-conditional likelihoods P(x | C), P(x | N), P(x | W), P(x | H) and the unconditional P(x), the likelihood of the x values]
16
Bayes "probability blow-up"
[Figure: posterior probabilities P(C | x), P(N | x), P(W | x), P(H | x) for the same classes C (Cold), N (Normal), W (Warm), H (Hot)]
17
[Figure: input likelihood P(x | C) and output posterior P(C | x)]
Even with an irregular PDF shape for P(x | C), the Bayesian output P(C | x) = P(x | C) P(C) / P(x) has a nice plateau.
18
Puzzle
  • So if Bayes is optimal and can be used for
    continuous data too, why has it become popular so
    late, i.e., much later than neural networks?

19
Why Bayes has become popular so late
  • Note: the example was 1-dimensional
  • A PDF (histogram) with 100 bins for one dimension will cost 10000 bins for two dimensions, etc.
  • → Ncells = Nbins^ndims (a one-line check follows below)
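A one-line check of this growth; the bin count is from the slide, the dimensionalities are chosen for illustration:

  # Number of histogram cells grows exponentially with the number of dimensions.
  n_bins = 100
  for n_dims in (1, 2, 3, 4):
      print(n_dims, n_bins ** n_dims)   # 100, 10000, 1000000, 100000000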

20
Why Bayes has become popular so late
  • → Ncells = Nbins^ndims
  • Yes, but you could use n-dimensional theoretical distributions (Gauss, Weibull, etc.) instead of empirically measured PDFs

21
Why Bayes has become popular so late
  • Use theoretical distributions instead of empirically measured PDFs?
  • Still, the dimensionality is a problem:
  • 20 samples needed to estimate a 1-dim. Gaussian PDF
  • 400 samples needed to estimate a 2-dim. Gaussian!, etc.
  • Massive amounts of labeled data are needed to estimate probabilities reliably!

22
Labeled (ground truthed) data
0.1   0.54  0.53  0.874  8.455  0.001  0.111  risk
0.2   0.59  0.01  0.974  8.40   0.002  0.315  risk
0.11  0.4   0.3   0.432  7.455  0.013  0.222  safe
0.2   0.64  0.13  0.774  8.123  0.001  0.415  risk
0.1   0.17  0.59  0.813  9.451  0.021  0.319  risk
0.8   0.43  0.55  0.874  8.852  0.011  0.227  safe
0.1   0.78  0.63  0.870  8.115  0.002  0.254  risk
...
Example: client evaluation in insurance (one feature vector with a class label per row)
23
Success of speech recognition
  • massive amounts of data
  • increased computing power
  • cheap computer memory
  • allowed for the use of Bayes in hidden Markov models for speech recognition
  • similarly (but more slowly), the application of Bayes in script recognition

24
  • Global Structure
  • year
  • title
  • date
  • date and number of entry (Rappt)
  • redundant lines between paragraphs
  • jargon-words
  • Notificatie
  • Besluit fiat
  • imprint with page number
  • → XML model

25
Local probabilistic structure: P("Novb 16" is a date | sticks out to the left, is left of "Rappt") = ?
26
Naive Bayes: Conditional Independence
  • Naive Bayes assumes the attributes (features) are independent given the class:
  • p(X, Y | C) = p(X | C) p(Y | C)
  • or
  • p(x1, ..., xn | C) = ∏i p(xi | C)
  • It often works surprisingly well in practice despite its manifest simplicity.

27
Accumulation of Evidence + Independence
28
The Naive Bayes Classifier
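The body of this slide is an image that did not survive the transcript; consistent with slides 26 and 29, the classifier it presumably shows is the MAP rule under the naive independence assumption:

c_MAP = argmax over classes cj of  P(cj) ∏i P(xi | cj)

i.e. choose the class whose prior, multiplied by the per-attribute likelihoods, is largest.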
29
Learning to Classify Text
  • Representation: each electronic document is represented by the set of words that it contains, under the independence assumptions:
  • - the order of words does not matter
  • - co-occurrences of words do not matter
  • i.e. each document is represented as a bag of words
  • Learning: estimate from the training dataset of documents
  • - the prior class probability P(ci)
  • - the conditional likelihood of a word wj given the document class ci: P(wj | ci)
  • Classification: maximum a posteriori (MAP), as in the sketch after this list
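A minimal sketch of this learn-by-counting scheme; the toy corpus, the add-one (Laplace) smoothing and all names are illustrative assumptions, not part of the slides:

  # Multinomial naive Bayes over bags of words, trained by counting.
  from collections import Counter
  from math import log

  train = [("win cash now", "spam"), ("cheap credit card", "spam"),
           ("workshop paper deadline", "ham"), ("meeting notes attached", "ham")]

  class_docs = Counter(c for _, c in train)              # document counts per class
  word_counts = {c: Counter() for c in class_docs}
  for text, c in train:
      word_counts[c].update(text.split())

  vocab = {w for counts in word_counts.values() for w in counts}

  def classify(text):
      scores = {}
      for c in class_docs:
          score = log(class_docs[c] / len(train))        # log prior P(c)
          total = sum(word_counts[c].values())
          for w in text.split():
              # log likelihood P(w | c) with add-one smoothing over the vocabulary
              score += log((word_counts[c][w] + 1) / (total + len(vocab)))
          scores[c] = score
      return max(scores, key=scores.get)                 # MAP class

  print(classify("cheap cash now"))       # -> "spam"
  print(classify("paper deadline"))       # -> "ham"

Replacing the toy corpus with labeled spam/ham e-mails, as on the next slides, turns this sketch into the spam filter discussed in the talk.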

30
Learning to Classify e-mail
  • Is this e-mail spam? e-mail → {spam, ham}
  • Each word represents an attribute characterizing the e-mail.
  • Estimate the class priors p(spam) and p(ham) from the training data, as well as the class-conditional likelihoods for all the encountered words.
  • For a new e-mail, assuming naive Bayes conditional independence, compute the MAP hypothesis.

31
Spam filtering
Example of regular mail
From acd@essex.ac.uk Mon Nov 10 19:23:44 2003
Return-Path: <alan@essex.ac.uk>
Received: from serlinux15.essex.ac.uk (serlinux15.essex.ac.uk 155.245.48.17) by tcw2.ppsw.rug.nl (8.12.8/8.12.8) with ESMTP id hAAIecHC008727; Mon, 10 Nov 2003 19:40:38 +0100

Apologies for multiple postings.
> 2nd  C a l l   f o r   P a p e r s
>
> DAS 2004
> Sixth IAPR International Workshop on
> Document Analysis Systems
> September 8-10, 2004
> Florence, Italy
> http://www.dsi.unifi.it/DAS04
>
> Note: There are two main additions with respect to the previous CFP
> 1) DASDL data are now available on the workshop web site
> 2) Proceedings will be published by Springer Verlag in LNCS series
32
Spam filtering
Example of spam
From: "Easy Qualify" <mbulacu@netaccessproviders.net>
To: bulacu@hotmail.com
Subject: Claim your Unsecured Platinum Card - 75OO dollar limit
Date: Tue, 28 Oct 2003 17:12:07 -0400

mbulacu - Tuesday, Oct 28, 2003

Congratulations, you have been selected for an Unsecured Platinum Credit Card / $7500 starting credit limit. This offer is valid even if you've had past credit problems or even no credit history. Now you can receive a $7,500 unsecured Platinum Credit Card that can help build your credit. And to help get your card to you sooner, we have been authorized to waive any employment or credit verification.
33
Conclusions
  • Effective: about 90% correct classification
  • Could be applied to any text classification problem
  • Needs to be polished

34
Summary
  • Bayesian inference allows for integrating prior knowledge about the world (beliefs expressed in terms of probabilities) with new incoming data.
  • Inductive bias of Naive Bayes: attributes are independent.
  • Although this assumption is often violated, it provides a very efficient tool that is often used (e.g. text classification, spam filtering).
  • Applicable to discrete or continuous data.