Title: Bayes
1 Bayes' Theorem
2 Remember Language ID?
- Let p(X) = probability of text X in English
- Let q(X) = probability of text X in Polish
- Which probability is higher?
- (we'd also like a bias toward English since it's more likely a priori; ignore that for now)
- Horses and Łukasiewicz are on the curriculum.
- p(x1=h, x2=o, x3=r, x4=s, x5=e, x6=s, ...)
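The letter-by-letter probability above can be sketched concretely. A minimal sketch, assuming made-up character-unigram tables (the numbers are illustrative, not real English or Polish statistics):

```python
# Score "horses" under two hypothetical character-unigram models.
# The probability tables are invented for illustration only.
p_english = {"h": 0.06, "o": 0.08, "r": 0.06, "s": 0.07, "e": 0.12}
p_polish  = {"h": 0.01, "o": 0.08, "r": 0.05, "s": 0.04, "e": 0.09}

def score(text, model):
    """p(x1, x2, ...) as a product of per-character probabilities."""
    prob = 1.0
    for ch in text:
        prob *= model.get(ch, 1e-6)  # tiny floor for unseen characters
    return prob

print(score("horses", p_english))  # higher under the English model
print(score("horses", p_polish))
```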
3 Bayes' Theorem
- p(A | B) = p(B | A) p(A) / p(B)
- Easy to check by removing syntactic sugar
- Use 1: Converts p(B | A) to p(A | B)
- Use 2: Updates p(A) to p(A | B)
- Stare at it so you'll recognize it later
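The identity is easy to check numerically as well. A small sketch on an arbitrary made-up joint distribution:

```python
# Verify p(A|B) = p(B|A) p(A) / p(B) on a tiny made-up joint distribution.
joint = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}  # p(A=a, B=b)

# Marginals by summing out the other variable.
p_A = {a: sum(p for (a2, b), p in joint.items() if a2 == a) for a in (0, 1)}
p_B = {b: sum(p for (a2, b2), p in joint.items() if b2 == b) for b in (0, 1)}

direct = joint[(1, 1)] / p_B[1]                         # p(A=1 | B=1) by definition
via_bayes = (joint[(1, 1)] / p_A[1]) * p_A[1] / p_B[1]  # p(B|A) p(A) / p(B)
assert abs(direct - via_bayes) < 1e-12
```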
4 Language ID
- Given a sentence x, I suggested comparing its prob in different languages
- p(SENT=x | LANG=english)  (i.e., p_english(SENT=x))
- p(SENT=x | LANG=polish)  (i.e., p_polish(SENT=x))
- p(SENT=x | LANG=xhosa)  (i.e., p_xhosa(SENT=x))
- But surely for language ID we should compare
- p(LANG=english | SENT=x)
- p(LANG=polish | SENT=x)
- p(LANG=xhosa | SENT=x)
5 Language ID
- For language ID we should compare
- p(LANG=english | SENT=x)
- p(LANG=polish | SENT=x)
- p(LANG=xhosa | SENT=x)
- For ease, multiply by p(SENT=x) and compare
- p(LANG=english, SENT=x)
- p(LANG=polish, SENT=x)
- p(LANG=xhosa, SENT=x)
- Must know prior probabilities; then rewrite as
- p(LANG=english) p(SENT=x | LANG=english)
- p(LANG=polish) p(SENT=x | LANG=polish)
- p(LANG=xhosa) p(SENT=x | LANG=xhosa)
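A minimal sketch of this comparison; the priors and likelihoods below are made-up numbers (chosen so the joint probabilities come out to the values used on the later slides):

```python
# Compare p(LANG) * p(SENT=x | LANG) across languages.
# Priors and likelihoods are invented illustrative numbers.
prior = {"english": 0.7, "polish": 0.2, "xhosa": 0.1}          # p(LANG)
likelihood = {"english": 1e-5, "polish": 4e-5, "xhosa": 5e-5}  # p(SENT=x | LANG)

joint = {lang: prior[lang] * likelihood[lang] for lang in prior}  # p(LANG, SENT=x)
best = max(joint, key=joint.get)
print(best, joint[best])
```

Note that the winner need not have the highest likelihood: xhosa assigns x the highest likelihood here, but its low prior drags the joint down.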
6 Let's try it!
First we pick a random LANG, then we roll a random SENT with the LANG dice.
- p(SENT=x | LANG=english)
- p(SENT=x | LANG=polish)
- p(SENT=x | LANG=xhosa)
[Diagram: the three likelihoods, annotated "best", "best", and "best compromise"; summing the total over all ways of getting SENT=x gives p(SENT=x).]
600.465 - Intro to NLP - J. Eisner
7 Let's try it!
First we pick a random LANG, then we roll a random SENT with the LANG dice.
- p(LANG=english, SENT=x) = 0.000007
- p(LANG=polish, SENT=x) = 0.000008  (best compromise)
- p(LANG=xhosa, SENT=x) = 0.000005
- p(SENT=x): total probability of getting SENT=x one way or another! Add up: 0.000020
- Normalize (divide by a constant so they'll sum to 1): given the evidence SENT=x, the possible languages sum to 1, and the best one is the one with the highest posterior.
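The normalization step above can be sketched directly with the slide's numbers:

```python
# Normalize the joint probabilities from the slide into posteriors.
joint = {"english": 0.000007, "polish": 0.000008, "xhosa": 0.000005}
p_sent = sum(joint.values())                       # p(SENT=x): add up all ways
posterior = {lang: p / p_sent for lang, p in joint.items()}

# Given the evidence SENT=x, the possible languages now sum to 1,
# and polish has the highest posterior (0.000008 / 0.000020 = 0.40).
assert abs(sum(posterior.values()) - 1.0) < 1e-12
print(posterior)
```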
8 Let's try it!
- p(LANG=english, SENT=x) = 0.000007
- p(LANG=polish, SENT=x) = 0.000008  (best compromise)
- p(LANG=xhosa, SENT=x) = 0.000005
- p(SENT=x) = 0.000020: total over all ways of getting x
9 General Case (noisy channel)
- Noisy channel: mess up a into b
- a → b
- Examples: language → text, text → speech, spelled → misspelled, English → French
- Maximize p(A=a | B=b) = p(A=a) p(B=b | A=a) / p(B=b)
  = p(A=a) p(B=b | A=a) / Σ_a′ p(A=a′) p(B=b | A=a′)
10 Language ID
- For language ID we should compare
- p(LANG=english | SENT=x)
- p(LANG=polish | SENT=x)
- p(LANG=xhosa | SENT=x)
- For ease, multiply by p(SENT=x) and compare
- p(LANG=english, SENT=x)
- p(LANG=polish, SENT=x)
- p(LANG=xhosa, SENT=x)
- which we find as follows (we need prior probs!)
- p(LANG=english) p(SENT=x | LANG=english)
- p(LANG=polish) p(SENT=x | LANG=polish)
- p(LANG=xhosa) p(SENT=x | LANG=xhosa)
11 General Case (noisy channel)
- Want most likely A to have generated evidence B
- p(A=a1 | B=b)
- p(A=a2 | B=b)
- p(A=a3 | B=b)
- For ease, multiply by p(B=b) and compare
- p(A=a1, B=b)
- p(A=a2, B=b)
- p(A=a3, B=b)
- which we find as follows (we need prior probs!)
- p(A=a1) p(B=b | A=a1)
- p(A=a2) p(B=b | A=a2)
- p(A=a3) p(B=b | A=a3)
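The general recipe can be sketched as one small decoder; `prior` and `channel` here are placeholders with made-up numbers, to be filled in for a real task:

```python
# Generic noisy-channel decoder sketch: pick the a maximizing p(a) * p(b | a).
def decode(b, prior, channel):
    """Return argmax_a p(A=a) * p(B=b | A=a), plus its posterior p(a | b)."""
    joint = {a: prior[a] * channel[a].get(b, 0.0) for a in prior}
    total = sum(joint.values())        # p(B=b), summed over all a
    best = max(joint, key=joint.get)
    return best, joint[best] / total

# Tiny made-up example: two candidate sources, one observed b.
prior = {"a1": 0.6, "a2": 0.4}
channel = {"a1": {"b": 0.1}, "a2": {"b": 0.3}}
print(decode("b", prior, channel))
```

Multiplying by p(B=b) does not change which a wins, which is why comparing joints is as good as comparing posteriors.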
12 Speech Recognition
- For baby speech recognition we should compare
- p(MEANING=gimme | SOUND=uhh)
- p(MEANING=changeme | SOUND=uhh)
- p(MEANING=loveme | SOUND=uhh)
- For ease, multiply by p(SOUND=uhh) and compare
- p(MEANING=gimme, SOUND=uhh)
- p(MEANING=changeme, SOUND=uhh)
- p(MEANING=loveme, SOUND=uhh)
- which we find as follows (we need prior probs!)
- p(MEAN=gimme) p(SOUND=uhh | MEAN=gimme)
- p(MEAN=changeme) p(SOUND=uhh | MEAN=changeme)
- p(MEAN=loveme) p(SOUND=uhh | MEAN=loveme)
13 Life or Death!
Does Epitaph have hoof-and-mouth disease? He tested positive. Oh no! False positive rate is only 5%.
- p(hoof) = 0.001, so p(¬hoof) = 0.999
- p(positive test | ¬hoof) = 0.05  (false pos)
- p(negative test | hoof) = x ≈ 0  (false neg)
- so p(positive test | hoof) = 1 − x ≈ 1
- What is p(hoof | positive test)?
- Don't panic: still very small! < 1/51 for any x
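The arithmetic can be checked directly; a small sketch with the slide's numbers (the sample values of x are just for illustration):

```python
# p(hoof | positive test) via Bayes, for unknown false-negative rate x.
P_HOOF = 0.001                 # prior p(hoof)
P_POS_GIVEN_NOT = 0.05         # false positive rate, p(positive | no hoof)

def posterior(x):
    """x = false negative rate, p(negative test | hoof)."""
    num = P_HOOF * (1.0 - x)                        # p(hoof) p(positive | hoof)
    den = num + (1.0 - P_HOOF) * P_POS_GIVEN_NOT    # p(positive test)
    return num / den

# The posterior is largest when the test never misses sick animals (x = 0),
# and even then it is only about 0.0196, roughly 1/51.
assert all(posterior(x) <= posterior(0.0) for x in (0.0, 0.25, 0.5, 0.9))
print(posterior(0.0))
```

The prior is so small (0.001) that even a fairly accurate test leaves the posterior tiny: the 5% false positives among the 99.9% healthy animals swamp the true positives.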