1
KI2 - 3
Bayesian Learning: Application to Text Classification. Example: spam filtering
Marius Bulacu, prof. dr. Lambert Schomaker
Kunstmatige Intelligentie / RuG
2
Founders of Probability Theory
Pierre Fermat (1601-1665, France)
Blaise Pascal (1623-1662, France)
They laid the foundations of probability theory in a correspondence about a dice game.
3
Prior, Joint and Conditional Probabilities
P(A): prior probability of A
P(B): prior probability of B
P(A, B): joint probability of A and B
P(A | B): conditional (posterior) probability of A given B
P(B | A): conditional (posterior) probability of B given A
4
Probability Rules
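The body of this slide is an image that did not survive the transcript; as a sketch, and consistent with the formulas used on the following slides, the standard rules are:

P(¬A) = 1 - P(A)
P(A ∨ B) = P(A) + P(B) - P(A, B)                (sum rule)
P(A, B) = P(A | B) P(B) = P(B | A) P(A)         (product rule)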
5
Statistical Independence
  • Two random variables A and B are independent iff
  • P(A, B) = P(A) P(B)
  • P(A | B) = P(A)
  • P(B | A) = P(B)

knowing the value of one variable does not yield
any information about the value of the other
6
Statistical Dependence - Bayes
Thomas Bayes (1702-1761, England)
His "Essay towards solving a Problem in the Doctrine of Chances" was published in the Philosophical Transactions of the Royal Society of London in 1764.
7
Bayes Theorem
P(A ∧ B) = P(A | B) P(B) = P(B | A) P(A)

⟹ P(A | B) = P(B | A) P(A) / P(B)
8
Bayes Theorem - Causality
Diagnostic: P(Cause | Effect) = P(Effect | Cause) P(Cause) / P(Effect)
Pattern Recognition: P(Class | Feature) = P(Feature | Class) P(Class) / P(Feature)
9
Bayes Formula and Classification
P(class | data) = P(data | class) P(class) / P(data)
P(class): prior probability of the class, before seeing anything
P(data | class): conditional likelihood of the data given the class
P(class | data): posterior probability of the class, after seeing the data
P(data): unconditional probability of the data
10
Medical example
p(disease) = 0.002
p(test | disease) = 0.97
p(test | ¬disease) = 0.04

p(test) = p(test | disease) p(disease) + p(test | ¬disease) p(¬disease)
        = 0.97 × 0.002 + 0.04 × 0.998 = 0.00194 + 0.03992 = 0.04186

p(disease | test) = p(test | disease) p(disease) / p(test)
                  = 0.97 × 0.002 / 0.04186 = 0.00194 / 0.04186 = 0.046

p(¬disease | test) = p(test | ¬disease) p(¬disease) / p(test)
                   = 0.04 × 0.998 / 0.04186 = 0.03992 / 0.04186 = 0.953
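The arithmetic above can be verified with a short Python snippet; this is just a numeric restatement of the slide's example (variable names are chosen for illustration):

  # Numeric check of the medical example (values taken from the slide).
  p_disease = 0.002
  p_test_given_disease = 0.97        # P(test | disease)
  p_test_given_not_disease = 0.04    # P(test | ¬disease)

  # Total probability of a positive test result.
  p_test = (p_test_given_disease * p_disease
            + p_test_given_not_disease * (1 - p_disease))      # 0.04186

  # Bayes' theorem: posterior probability of disease given a positive test.
  p_disease_given_test = p_test_given_disease * p_disease / p_test
  print(round(p_test, 5), round(p_disease_given_test, 3))      # 0.04186 0.046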
11
MAP Classification
  • To minimize the probability of misclassification, assign a new input x to the class with the Maximum A Posteriori probability, e.g. assign x to class C1 if:
  • p(C1 | x) > p(C2 | x)  ⟺  p(x | C1) p(C1) > p(x | C2) p(C2)
  • Therefore we must impose a decision boundary where the two posterior probability distributions cross each other (see the sketch after this list).
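A minimal sketch of this decision rule in Python; the two Gaussian class-conditional densities and the priors are invented illustration values, not taken from the slides:

  # MAP decision between two classes: pick the class maximizing p(x | C) * p(C).
  from math import exp, pi, sqrt

  def gauss(x, mu, sigma):
      # 1-D Gaussian density used as an assumed class-conditional likelihood
      return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

  priors = {"C1": 0.7, "C2": 0.3}                      # assumed prior probabilities
  likelihoods = {"C1": lambda x: gauss(x, 0.0, 1.0),   # assumed p(x | C1)
                 "C2": lambda x: gauss(x, 2.0, 1.0)}   # assumed p(x | C2)

  def map_class(x):
      # argmax over classes of the posterior-proportional score p(x | C) * p(C)
      return max(priors, key=lambda c: likelihoods[c](x) * priors[c])

  print(map_class(0.4))   # near the C1 mean, so this prints "C1"

The decision boundary sits exactly where the two scores p(x | C1) p(C1) and p(x | C2) p(C2) are equal.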

12
Maximum Likelihood Classification
  • When the prior class distributions are not known, or for equal (non-informative) priors,
  • p(x | C1) p(C1) > p(x | C2) p(C2)
  • becomes
  • p(x | C1) > p(x | C2)
  • Therefore assign the input x to the class with the Maximum Likelihood of having generated it.

13
Continuous Features
  • Two methods for dealing with continuous-valued features (see the sketch after this list):
  • Binning: divide the range of continuous values into a discrete number of bins, then apply the discrete methodology.
  • Mixture of Gaussians: make an assumption regarding the functional form of the PDF (linear combination of Gaussians) and derive the corresponding parameters (means and standard deviations).
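A minimal sketch of the binning approach; the synthetic training data and the number of bins are illustrative assumptions:

  # Binning: estimate a class-conditional PDF as a normalized histogram,
  # then look up the bin probability for a new feature value.
  import numpy as np

  samples = np.random.normal(5.0, 3.0, 1000)   # made-up training values for one class
  edges = np.linspace(-10.0, 20.0, 101)        # 100 bins over the feature range

  counts, edges = np.histogram(samples, bins=edges)
  pdf = counts / counts.sum()                  # estimated P(bin | class)

  def likelihood(x):
      i = np.searchsorted(edges, x) - 1        # index of the bin containing x
      return pdf[i] if 0 <= i < len(pdf) else 0.0

  print(likelihood(4.2))                       # estimated P(x falls in this bin | class)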

14
Accumulation of Evidence
p(C | X, Y) ∝ p(X, Y, C) = p(C) p(X, Y | C) = p(C) p(X | C) p(Y | C, X) = ... = p(C) p(X | C) p(Y | C, X) p(Z | C, X, Y)
(in the original figure, p(C) is marked as the prior and each accumulated product acts as the new prior for the next factor)
  • Bayesian inference allows for integrating prior
    knowledge about the world (beliefs being
    expressed in terms of probabilities) with new
    incoming data.
  • Different forms of data (possibly
    incommensurable) can be fused towards the final
    decision using the common currency of
    probability.
  • As new data arrives, the latest posterior becomes the new prior for interpreting the next input (a minimal sequential-update sketch follows below).
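A minimal sequential-update sketch under a naive independence assumption; the likelihood tables are invented for illustration, and the class names echo the insurance example later in the talk:

  # Sequential Bayesian updating: after each piece of evidence, the normalized
  # posterior is recycled as the prior for interpreting the next piece.
  prior = {"risk": 0.5, "safe": 0.5}

  evidence = [
      {"risk": 0.8, "safe": 0.3},   # assumed P(observation X | class)
      {"risk": 0.6, "safe": 0.7},   # assumed P(observation Y | class)
  ]

  belief = dict(prior)
  for lik in evidence:
      unnorm = {c: belief[c] * lik[c] for c in belief}   # prior * likelihood
      z = sum(unnorm.values())                           # evidence term P(data)
      belief = {c: v / z for c, v in unnorm.items()}     # posterior -> new prior

  print(belief)   # roughly {'risk': 0.70, 'safe': 0.30}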

15
Example: temperature classification
Classes: C (Cold), N (Normal), W (Warm), H (Hot)
[Figure: class-conditional likelihoods P(x | C), P(x | N), P(x | W), P(x | H) and the unconditional P(x), the likelihood of the x values]
16
Bayes "probability blow-up"
[Figure: posterior probabilities P(C | x), P(N | x), P(W | x), P(H | x) for the same classes C (Cold), N (Normal), W (Warm), H (Hot)]
17
[Figure: input likelihood P(x | C) and output posterior P(C | x)]
Even with an irregular PDF shape for P(x | C), the Bayesian output P(C | x) = P(x | C) P(C) / P(x) has a nice plateau.
18
Puzzle
  • So if Bayes is optimal and can be used for
    continuous data too, why has it become popular so
    late, i.e., much later than neural networks?

19
Why Bayes has become popular so late
  • Note: the example was 1-dimensional
  • A PDF (histogram) with 100 bins for one dimension will cost 10000 bins for two dimensions, etc.
  • → Ncells = Nbins^ndims (a one-line check follows below)
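A one-line check of this growth; the bin count is from the slide, the dimensionalities are chosen for illustration:

  # Number of histogram cells grows exponentially with the number of dimensions.
  n_bins = 100
  for n_dims in (1, 2, 3, 4):
      print(n_dims, n_bins ** n_dims)   # 100, 10000, 1000000, 100000000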

20
Why Bayes has become popular so late
  • → Ncells = Nbins^ndims
  • Yes, but you could use n-dimensional theoretical distributions (Gauss, Weibull, etc.) instead of empirically measured PDFs

21
Why Bayes has become popular so late
  • Use theoretical distributions instead of empirically measured PDFs?
  • Still, the dimensionality is a problem:
  • 20 samples needed to estimate a 1-dim. Gaussian PDF
  • 400 samples needed to estimate a 2-dim. Gaussian!, etc.
  • Massive amounts of labeled data are needed to estimate probabilities reliably!

22
Labeled (ground truthed) data
0.1   0.54  0.53  0.874  8.455  0.001  0.111  risk
0.2   0.59  0.01  0.974  8.40   0.002  0.315  risk
0.11  0.4   0.3   0.432  7.455  0.013  0.222  safe
0.2   0.64  0.13  0.774  8.123  0.001  0.415  risk
0.1   0.17  0.59  0.813  9.451  0.021  0.319  risk
0.8   0.43  0.55  0.874  8.852  0.011  0.227  safe
0.1   0.78  0.63  0.870  8.115  0.002  0.254  risk
...
Example: client evaluation in insurance (one feature vector with a class label per row)
23
Success of speech recognition
  • massive amounts of data
  • increased computing power
  • cheap computer memory
  • allowed for the use of Bayes in hidden Markov models for speech recognition
  • similarly (but more slowly), the application of Bayes in script recognition

24
  • Global Structure
  • year
  • title
  • date
  • date and number of entry (Rappt)
  • redundant lines between paragraphs
  • jargon-words
  • Notificatie
  • Besluit fiat
  • imprint with page number
  • → XML model

25
Local probabilistic structure: P("Novb 16" is a date | sticks out to the left, is left of "Rappt") = ?
26
Naive Bayes: Conditional Independence
  • Naive Bayes assumes the attributes (features) are independent given the class:
  • p(X, Y | C) = p(X | C) p(Y | C)
  • or
  • p(x1, ..., xn | C) = ∏i p(xi | C)
  • It often works surprisingly well in practice despite its manifest simplicity.

27
Accumulation of Evidence + Independence
28
The Naive Bayes Classifier
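The body of this slide is an image that did not survive the transcript; consistent with slides 26 and 29, the classifier it presumably shows is the MAP rule under the naive independence assumption:

c_MAP = argmax over classes cj of  P(cj) ∏i P(xi | cj)

i.e. choose the class whose prior, multiplied by the per-attribute likelihoods, is largest.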
29
Learning to Classify Text
  • Representation: each electronic document is represented by the set of words that it contains, under the independence assumptions:
  • - the order of words does not matter
  • - co-occurrences of words do not matter
  • i.e. each document is represented as a bag of words
  • Learning: estimate from the training dataset of documents
  • - the prior class probability P(ci)
  • - the conditional likelihood of a word wj given the document class ci: P(wj | ci)
  • Classification: maximum a posteriori (MAP), as in the sketch after this list
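A minimal sketch of this learn-by-counting scheme; the toy corpus, the add-one (Laplace) smoothing and all names are illustrative assumptions, not part of the slides:

  # Multinomial naive Bayes over bags of words, trained by counting.
  from collections import Counter
  from math import log

  train = [("win cash now", "spam"), ("cheap credit card", "spam"),
           ("workshop paper deadline", "ham"), ("meeting notes attached", "ham")]

  class_docs = Counter(c for _, c in train)              # document counts per class
  word_counts = {c: Counter() for c in class_docs}
  for text, c in train:
      word_counts[c].update(text.split())

  vocab = {w for counts in word_counts.values() for w in counts}

  def classify(text):
      scores = {}
      for c in class_docs:
          score = log(class_docs[c] / len(train))        # log prior P(c)
          total = sum(word_counts[c].values())
          for w in text.split():
              # log likelihood P(w | c) with add-one smoothing over the vocabulary
              score += log((word_counts[c][w] + 1) / (total + len(vocab)))
          scores[c] = score
      return max(scores, key=scores.get)                 # MAP class

  print(classify("cheap cash now"))       # -> "spam"
  print(classify("paper deadline"))       # -> "ham"

Replacing the toy corpus with labeled spam/ham e-mails, as on the next slides, turns this sketch into the spam filter discussed in the talk.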

30
Learning to Classify e-mail
  • Is this e-mail spam? e-mail → {spam, ham}
  • Each word represents an attribute characterizing the e-mail.
  • Estimate the class priors p(spam) and p(ham) from the training data, as well as the class-conditional likelihoods for all the encountered words.
  • For a new e-mail, assuming naive Bayes conditional independence, compute the MAP hypothesis.

31
Spam filtering
Example of regular mail
From acd@essex.ac.uk Mon Nov 10 19:23:44 2003
Return-Path: <alan@essex.ac.uk>
Received: from serlinux15.essex.ac.uk (serlinux15.essex.ac.uk 155.245.48.17) by tcw2.ppsw.rug.nl (8.12.8/8.12.8) with ESMTP id hAAIecHC008727; Mon, 10 Nov 2003 19:40:38 +0100

Apologies for multiple postings.
> 2nd  C a l l   f o r   P a p e r s
>
> DAS 2004
> Sixth IAPR International Workshop on
> Document Analysis Systems
> September 8-10, 2004
> Florence, Italy
> http://www.dsi.unifi.it/DAS04
>
> Note: There are two main additions with respect to the previous CFP
> 1) DASDL data are now available on the workshop web site
> 2) Proceedings will be published by Springer Verlag in LNCS series
32
Spam filtering
Example of spam
From: "Easy Qualify" <mbulacu@netaccessproviders.net>
To: bulacu@hotmail.com
Subject: Claim your Unsecured Platinum Card - 75OO dollar limit
Date: Tue, 28 Oct 2003 17:12:07 -0400

mbulacu - Tuesday, Oct 28, 2003

Congratulations, you have been selected for an Unsecured Platinum Credit Card / $7500 starting credit limit. This offer is valid even if you've had past credit problems or even no credit history. Now you can receive a $7,500 unsecured Platinum Credit Card that can help build your credit. And to help get your card to you sooner, we have been authorized to waive any employment or credit verification.
33
Conclusions
  • Effective: about 90% correct classification
  • Could be applied to any text classification problem
  • Needs to be polished

34
Summary
  • Bayesian inference allows for integrating prior knowledge about the world (beliefs expressed in terms of probabilities) with new incoming data.
  • Inductive bias of Naive Bayes: attributes are independent.
  • Although this assumption is often violated, it provides a very efficient tool that is often used (e.g. text classification, spam filtering).
  • Applicable to discrete or continuous data.