Title: Preface
2 Preface
The question: can a single statistical methodology address natural language processing problems that share a similar goal, namely the selection among competing entities?
- Examples
- Competing documents in information retrieval (Information Retrieval)
- Senses of a word in the context in which it appears (Word Sense Disambiguation)
- Competition among words to form collocations (co-occurring words)
3 Preface
- Statistics is the field that has been applied with the greatest success to Natural Language Processing
- Examples
- Information retrieval systems (Information Retrieval, IR)
- Disambiguating the sense of a word (Word Sense Disambiguation, WSD)
- Forming co-occurring words (Collocation)
- But also
- Text categorization (Text Categorization)
- Text simplification (Text Simplification)
4 Preface
- Aim of the thesis
- To demonstrate the application of a single statistical methodology to the research areas above
- Specifically, the development of systems for
- Finding co-occurring words (collocations) in natural language texts,
- Retrieving information based on a user's query (information retrieval), and
- Disambiguating the sense of a word from its context (word sense disambiguation).
5 Preface
- Information Retrieval is a branch of Natural Language Processing concerned with developing algorithms and models for retrieving information from various text collections (Internet, document repositories).
- With the recognition of quantitative methods in natural language processing, statistical methods became the dominant approach to developing information retrieval systems.
6 Preface
- Word Sense Disambiguation is the field concerned with disambiguating the sense of a word within its context
- Statistical methods are regarded as the exclusive tools for developing sense disambiguation systems.
- Such systems are very useful and support machine translation and text understanding
7 Preface
- Collocations
- This is the task of finding co-occurring words (collocations), that is, words that appear together very often and form a new semantic whole whose meaning differs from the meanings of its constituent parts.
- For example, the Greek phrase
- Γερό Ποτήρι
8 Motivation
- Natural Language Processing is undeniably a promising scientific field.
- All of the problems mentioned above are extremely hard, and solving them is expected to have a catalytic effect on the applications of Computational Linguistics and especially on the field of Artificial Intelligence
- To date, many methods and systems have been proposed in the international literature for solving such problems, but in a fragmentary way.
- Because the problems are treated separately from one another, a different method is developed for each problem
- As a result, algorithms and techniques that work for one area of Natural Language Processing cannot be applied to another.
9 The Idea
- Most natural language processing problems share a common characteristic: the selection among competing entities for some specific goal.
- Examples
- Competing documents in information retrieval, which compete with respect to the goal of relevance to a user's query; competing senses in word sense disambiguation; or competing word pairs for forming collocations.
- This thesis highlights this characteristic and responds with a single statistical methodology for solving the problems above, contributing to the practical exploitation of scientific knowledge.
10 The Methodology
- In statistics, goodness-of-fit tests are very well established; they test how well the data fit an underlying theoretical hypothesis that we assume governs them.
- The thesis uses the chi-square goodness-of-fit statistical test (Chi-square Goodness of Fit Statistical Test) to assess how relevant each competing entity is to the goal.
- More specifically, a null hypothesis is formulated stating that the competing entities exhibit no particular behavior toward the goal beyond random behavior. This is the theoretical hypothesis made about the data
11 The Methodology
- The actual behavior of each competing entity is recorded from the real data, which establishes a discrepancy between the actual behavior and the behavior implied by the theoretical hypothesis.
- This discrepancy is quantified with the help of the χ² distribution, and the resulting quantity can serve as a measure of the competing entity's relevance to the goal (ranking criterion).
12 What Follows
- First, we present an introduction to the statistical models used in natural language processing, as well as the measures for evaluating the effectiveness of these systems
- Next comes the application of statistical tests to information retrieval (Information Retrieval). Within the same statistical framework, we present a system for retrieving textual information from document repositories based on a user's query.
- We then present statistical methods for discovering co-occurring words (Collocations) in Greek texts and establish a way of applying statistical tests in this area
13 What Follows
- Finally, we apply the statistical tests to word sense disambiguation (Word Sense Disambiguation). A statistical system is developed for disambiguating the sense of a word from its context, using the electronic lexicon WordNet as the lexicographic resource.
- The conclusions drawn from evaluating the methods we developed on experimental test data are that these statistical systems prove robust and capable of giving better results than classical methods
14 Introduction
- Statistics is the branch of mathematics that has been used most widely in Natural Language Processing (NLP)
- The rapid development of computing in recent years and the availability of large volumes of text in digital form created the conditions for the recognition of quantitative methods in NLP
- With the recognition of quantitative methods in natural language processing, statistical methods became the dominant approach to developing information retrieval systems
15 Introduction
- Statistical methods are regarded as the exclusive tools for developing systems for information retrieval (Information Retrieval), word sense disambiguation (Word Sense Disambiguation), text categorization, collocation discovery, and so on
- These problems are recognized as computationally complex problems in natural language processing, and solving them is expected to have a catalytic effect on the development of the field of Computational Linguistics
16 - Statistical Models in Natural Language Processing
- Research on statistical natural language processing systems is concerned with developing algorithms and systems for representing, storing, organizing, processing, and accessing items of information.
- The first attempts at representing and retrieving information began with information retrieval systems. Although the field traditionally dealt only with text retrieval and document search, today there is strong interest in other forms of information as well.
- Representing information in a computable form plays a decisive role in developing natural language processing systems.
17 Information Representation Models
- Depending on the nature of the process that represents a text as a set of keywords, the most important information representation models fall into the following broad categories
- Boolean models
- Vector models
- Probabilistic models
18 Information Representation Models
- Boolean models
- The Boolean model is the simplest model; it is based on set theory and Boolean algebra
- Information is represented as a sequence of the digits 0 and 1, where 1 denotes the presence of a term and 0 its absence
- It suffers from several drawbacks, for example the difficulty, in Information Retrieval, of having the user express a query as a Boolean expression
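The presence/absence representation described on this slide can be sketched in a few lines. This is only an illustrative sketch: the three documents, their terms, and the helper name `matching` are hypothetical and do not come from the slides.

```python
# Hypothetical toy collection: each document is reduced to its set of terms,
# which is equivalent to a 0/1 incidence vector over the vocabulary.
docs = {
    "d1": {"statistical", "language", "model"},
    "d2": {"information", "retrieval", "model"},
    "d3": {"statistical", "information", "retrieval"},
}


def matching(term):
    """Return the ids of the documents whose incidence bit for `term` is 1."""
    return {doc_id for doc_id, terms in docs.items() if term in terms}


# Boolean query: statistical AND retrieval AND NOT model
result = (matching("statistical") & matching("retrieval")) - matching("model")
print(sorted(result))
```

The sketch also shows the drawback noted above: the user has to phrase the information need as an explicit AND/OR/NOT expression, and a document either matches or does not, with no ranking.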
19 Information Representation Models
- The vector model
- The vector model [1, 2] was the first model applied to information retrieval.
- According to the vector model, each term kj in a piece of textual information is characterized by a positive, non-zero real number called its weight, which expresses the importance of the term in determining the semantics of the text
20 The Vector Model in Information Retrieval
- In Information Retrieval
- We can represent a document dj as a vector (w1j, w2j, ..., wtj), where t is the number of terms
- A query q is represented as (w1q, w2q, ..., wtq)
21 The Vector Model in Information Retrieval
- We can then use the cosine of the angle between the two vectors to measure the similarity between the two pieces of information
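The cosine measure can be sketched as follows. This is a minimal sketch assuming the document and the query are given as equal-length lists of term weights; the function name `cosine_similarity` is illustrative only.

```python
import math


def cosine_similarity(doc_weights, query_weights):
    """Cosine of the angle between a document vector and a query vector."""
    dot = sum(d * q for d, q in zip(doc_weights, query_weights))
    norm_d = math.sqrt(sum(d * d for d in doc_weights))
    norm_q = math.sqrt(sum(q * q for q in query_weights))
    if norm_d == 0 or norm_q == 0:
        return 0.0  # convention chosen here: an all-zero vector matches nothing
    return dot / (norm_d * norm_q)


# Hypothetical weights over a three-term vocabulary.
print(cosine_similarity([0.5, 0.0, 1.2], [1.0, 0.0, 1.0]))
```

Because the cosine depends only on the angle, a long document is not favored merely for containing more words.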
22 Weights in the Semantics of the Text
- Two factors play a decisive role in determining the weight of a term
- The frequency of the term in the text of the document
- The number of documents in which the term occurs
- We can therefore combine them into a single weight
Tf-idf schemes
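One common way to combine the two factors is sketched below. The slides do not preserve the exact weighting formula, so the variant used here (raw term frequency times log(N / df)) is an assumption, as are the function name and the toy data.

```python
import math
from collections import Counter


def tf_idf_weights(docs):
    """Return one {term: weight} dict per document.

    Assumed variant: weight = tf * log(N / df), where tf is the raw count of
    the term in the document, N the number of documents, and df the number of
    documents that contain the term.
    """
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in docs:
        tf = Counter(tokens)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


# Hypothetical two-document collection.
print(tf_idf_weights([["a", "b", "a"], ["a", "c"]]))
```

A term that occurs in every document gets weight zero under this variant, which reflects the intuition that it does not help to tell documents apart.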
23 Probabilistic Models
- In probabilistic models the occurrence of a term is modeled as an event and is assigned a probability.
- The higher the probability of occurrence of a term, the more important its role in determining the semantics of the information.
24 Probabilistic Models
- Recently a new approach, language modeling (Language Modeling), has been proposed alongside the traditional vector models and the other probabilistic models.
- It has been applied successfully in Information Retrieval systems [8, 9, 10, 11].
- A statistical language model is a probabilistic mechanism for generating text.
25 Probabilistic Models
- The origins of language models go back to the time of Shannon [12], who formulated his well-known theory in the field of communications (source channel perspective)
- Shannon studied how well simple n-gram models can predict natural text
- They have been applied successfully in Speech Recognition
26 Probabilistic Models
- The language model was first applied to the processing of textual information by Ponte and Croft in 1998, in Information Retrieval [8].
- In the classical probabilistic Information Retrieval models [3, 5, 13, 14], a probability mass has to be distributed over a huge space of possible values (outcomes) for each term (unigram language model)
- This is extremely difficult. Most of the time the only evidence available is the terms of the query
27 Probabilistic Models
- Ponte and Croft [8] tackled the issue with a reverse approach. Using a smoothed version of the unigram language model, they proposed a method for assigning a likelihood score from the document to the query.
- This approach is known as the language modeling approach
- A language model is viewed as a noisy channel (or translation channel) that maps documents to queries
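The idea of scoring a document by the likelihood it assigns to the query can be sketched as follows. This is a generic query-likelihood sketch with linear interpolation (Jelinek-Mercer) smoothing, which is swapped in here for illustration; it is not Ponte and Croft's exact estimator, and the function name, the parameter `lam`, and the toy data are assumptions.

```python
import math
from collections import Counter


def query_log_likelihood(query, doc, collection, lam=0.5):
    """Log-probability of the query under a smoothed unigram model of the document.

    Assumption: every query term occurs at least once in the collection,
    otherwise the smoothed probability is zero and math.log raises ValueError.
    """
    doc_counts = Counter(doc)
    coll_counts = Counter(collection)
    score = 0.0
    for term in query:
        p_doc = doc_counts[term] / len(doc)            # maximum-likelihood estimate
        p_coll = coll_counts[term] / len(collection)   # background model
        score += math.log(lam * p_doc + (1 - lam) * p_coll)
    return score


# Hypothetical two-document collection.
doc1 = ["statistical", "retrieval", "model"]
doc2 = ["greek", "word", "senses"]
collection = doc1 + doc2
query = ["retrieval", "model"]
print(query_log_likelihood(query, doc1, collection))
print(query_log_likelihood(query, doc2, collection))
```

Smoothing with the collection model keeps a document from receiving probability zero just because it lacks one query term.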
28 - Evaluation Measures
- Evaluation measures for Natural Language Processing systems
29 Evaluation Measures
- We describe the evaluation measures that we will use in Information Retrieval and in sense disambiguation systems.
- These measures also apply more generally to Natural Language Processing systems
30 Evaluation Measures for NLP Systems
- Precision and Recall
- We explain the concepts from the viewpoint of Information Retrieval and then generalize.
- Suppose a query q is submitted to the Information Retrieval system.
- Let R be the set of documents relevant to this query and A the set of documents returned by the system
31 Evaluation Measures for NLP Systems
- In addition, let Ra be the number of documents in the intersection (Intersection) of R and A
- Recall = Ra / |R| and Precision = Ra / |A|
32 Evaluation Measures for NLP Systems
- That is, for a processing system
- Precision is the proportion of successes among all the answers given by the system
- Recall is the proportion of successes among all the correct answers that exist.
- We usually plot the Precision versus Recall curve
- Specifically at fixed Recall levels
- 0%, 10%, 20%, ..., 100%
- We then speak of Precision versus Recall at 11 Recall Points
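The two measures and the 11-point curve can be sketched as follows. The interpolation rule used here (precision at a recall level is the maximum precision observed at any recall greater than or equal to that level) is the usual convention and is assumed rather than taken from the slides; the function names and the example ranking are hypothetical.

```python
def precision_recall(relevant, retrieved):
    """Set-based precision and recall for one query."""
    hits = len(set(relevant) & set(retrieved))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def eleven_point_precision(relevant, ranking):
    """Interpolated precision at recall levels 0.0, 0.1, ..., 1.0."""
    relevant = set(relevant)
    points = []  # (recall, precision) measured after each relevant hit
    hits = 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))
    levels = [i / 10 for i in range(11)]
    return [
        max((p for r, p in points if r >= level - 1e-12), default=0.0)
        for level in levels
    ]


# Hypothetical judged ranking: d1 and d3 are the relevant documents.
curve = eleven_point_precision({"d1", "d3"}, ["d1", "d2", "d3", "d4"])
print(curve)
```

Averaging such curves over all queries gives the 11-point summary that is typically reported for a system.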
33 Application of Statistical Tests to Information Retrieval
34 Application of Statistical Tests to Information Retrieval
The Basic Idea.
- In most of the models used for Information Retrieval, we are interested in estimating how well the document model fits the user's information need (query model).
- Statistics, on the other hand, offers well-established techniques for estimating how well one model fits another
F?????? ???/??? Statistical Tests in Natural Language Processing
35 The Basic Idea.
- Goodness-of-fit statistical tests are well-known methods for assessing the hypothesis of how well a theoretical model describes a data set.
- As the central position of the thesis, we develop an Information Retrieval technique that relies on the chi-square goodness-of-fit test to estimate how well the document model fits the user's information need
36 Application of Statistical Tests to Information Retrieval
- Besides proving particularly effective, this technique is also flexible.
- It can be adapted to other problems wherever the notion of estimating fit arises, for example word sense disambiguation.
37 Application of Statistical Tests to Information Retrieval
Implementation
The logic is simple. We formulate a basic hypothesis about the data, known as the null hypothesis
According to it, we assume that there is no particular relationship or link between the query and a given document, other than that the terms of the query may appear in that document purely by chance
To assess this hypothesis we run a chi-square statistical test (Goodness of Fit Statistical Test), and with the help of this test we estimate the relevance of the document to the user's query.
38 The method was evaluated on the official TREC data for testing the effectiveness of Information Retrieval systems
Its effectiveness is consistently above that of the classical tf-idf schemes and the OKAPI method
Advantages
- A non-parametric method for Information Retrieval
- Simple Information Retrieval formulas result
- Alternative ways of modeling documents and queries
39 Introduction to Statistical Language Models
- Vector models (Vector Space models)
- Probabilistic models (Probabilistic models)
- Language Modeling Approach
40 - Vector model. Proposed by Salton [2] in 1972. It models documents and queries as vectors and uses vector metrics to estimate relevance. It is still in use today.
- Probabilistic model. Proposed by Robertson and Sparck-Jones [3] in 1975. It uses the probability of occurrence of a term instead of the frequency used in the vector model, and estimates the relevance of the query to the document using distributions
- Variants
- Naïve Bayesian Networks [13]
- Inquery Retrieval System [14]
- OKAPI system
41 Language Modeling Approach
- Proposed in 1998 by Ponte and Croft [8]
- It uses statistical language models in the same way they are used in Speech Recognition; they date back to the time of Shannon and his noisy channel model [12].
- These systems perform well but have the drawback of being parametric and requiring parameter estimation on training data
- Variants
- Hidden Markov Models [48, 11]
- Translation Models [10]
42 Our Own Approach: Goodness of Fit (GOF) Retrieval
- To rank the documents we rely on the chi-square statistical test
- The chi-square test describes how well a hypothesis (the null hypothesis), which we assume the data obey, fits the data
- More specifically, we formulate the null hypothesis that all the terms of the query are distributed randomly across the documents
- We measure the frequency of each term in the document (observed) and compare it with the null hypothesis (expected).
- If the difference is large, this is evidence of an association between the query and the document.
43 Goodness-of-Fit Statistical Tests
- Statistical problems usually reduce to a test for choosing one of two alternative hypotheses: the null hypothesis (null Hypothesis) H0, which states that the sample follows the assumed underlying distribution, and the alternative H1, which states that it does not.
- A statistical test is considered powerful if the probability of accepting H0 is small when H0 is false.
44 The Chi-Square Test
- The most important and best-known statistical test is the χ² test, proposed by Pearson [33] (Pearson's chi-squared test).
- The statistic used to compute it is the following:
$$X^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} \qquad (2.1)$$
where Oi is the observed frequency and Ei the frequency expected under the null hypothesis.
The test statistic of equation 2.1 follows the χ² distribution with k-c degrees of freedom, where k is the number of classes into which the data are categorized and c is the number of estimated parameters of the distribution assumed to govern the data.
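Pearson's statistic, the sum over classes of (Oi - Ei)² / Ei, can be sketched directly. The function name and the example counts below are hypothetical.

```python
def chi_square_statistic(observed, expected):
    """Pearson's statistic: sum over classes of (O_i - E_i)^2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts in four classes; the null hypothesis expects 25 in each.
observed = [18, 22, 20, 40]
expected = [25, 25, 25, 25]
print(chi_square_statistic(observed, expected))
```

Every expected frequency must be non-zero, since it appears in the denominator.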
45 The Chi-Square Test (continued)
- Using a statistical package or the tables of the χ² distribution, we compute the p-value for the χ² value calculated from the previous equation.
- If the p-value is very small (typically below a chosen significance level) we reject the null hypothesis; otherwise we accept it.
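When no statistical package is at hand, the p-value can be approximated with the standard library alone. The sketch below evaluates the upper tail of the χ² distribution through the power series of the regularized lower incomplete gamma function; the function name `chi2_p_value` is illustrative, and the series is only meant for the moderate values that appear in small examples.

```python
import math


def chi2_p_value(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a = df / 2.0
    z = x / 2.0
    # Series: P(a, z) = z^a e^(-z) / Gamma(a) * sum_{n>=0} z^n / (a (a+1) ... (a+n))
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


alpha = 0.05  # significance level, chosen here only for illustration
p = chi2_p_value(12.32, 3)
print(p, "reject H0" if p < alpha else "do not reject H0")
```

The familiar table values are reproduced to about three decimal places: 5.991 with 2 degrees of freedom and 3.841 with 1 degree of freedom both give a p-value close to 0.05.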
46 Information Retrieval Method Using the χ² Test Statistic
- The essence of the proposed method is to compare the observed frequencies of the query terms in the document with those expected under the assumed hypothesis of random distribution.
- With the help of the χ² test statistic, this comparison quantifies a discrepancy, which can then be used as a criterion for ranking the relevance of the query to the document.
47 Information Retrieval Method Using the χ² Test Statistic (continued)
- The null hypothesis is rejected when the χ² value computed from Pearson's equation 2.1 is larger than the value obtained from the tables of the χ² distribution for a significance level α (usually α = 0.05, for 95% confidence)
- That is, the larger the computed χ² value, the stronger the evidence for rejecting the null hypothesis and hence for an association (relatedness) between query and document
48 Information Retrieval Method Using the χ² Test Statistic (continued)
- Therefore, as far as our technique for measuring the relevance between query and document is concerned, we can use the computed χ² value itself, without actually being interested in rejecting the null hypothesis
- The documents with the largest corresponding χ² values are placed at the top of the returned ranked list of relevant documents
52 Information Retrieval Method Using the χ² Test Statistic (continued)
- Advantages
- The main advantage is that the proposed method is not parametric. In other methods, such as KL-Divergence, the resulting model requires estimating the parameters of the distribution on training data
- A simple retrieval formula (Retrieval formula) results
- We can try many alternative retrieval formulas simply by changing our basic hypothesis about the data (that is, the randomness model)
53 The Comparison Models
- We describe two popular Information Retrieval models against which we will compare the proposed χ² GOF method:
- The OKAPI method, from the well-known tf-idf schemes
- KL-Divergence, from the Language Modeling approach to Information Retrieval
54 Tf-idf Schemes, OKAPI Retrieval Formula
- Tf-idf schemes are also known as vector space models and were first proposed by Salton in 1971 [2].
- According to this model, each term ki in a document dj is associated with a positive weight wij, which expresses how important the term is for determining the semantics of the document and hence its importance to the retrieval system
- Likewise, each term of the query is associated with a corresponding weight
57 Tf-idf, OKAPI Retrieval Formula (continued)
- To make the scheme more competitive, we use a variant of the weights related to the one given in formula (2.8): the OKAPI-TF formula, also known as the BM25 formula for best matching (Best matching OKAPI retrieval formula [49]).
- Although the OKAPI TF formula was designed for use with the OKAPI probabilistic model, it has been shown to give better retrieval results when used with the vector model [66]
59 KL-Divergence
- KL-Divergence [40] is a particularly effective method that extends the language modeling approach in the area of Information Retrieval
- It is a parametric method. The basic idea is to estimate a language model for the document and a language model for the query, and to compare them using the Kullback-Leibler Divergence
60 KL-Divergence (continued)
- Although KL-Divergence is not a true distance (it is not symmetric and does not satisfy the triangle inequality), it is a very good measure of the similarity between two distributions.
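The lack of symmetry is easy to see numerically. The sketch below assumes two discrete distributions given as equal-length lists of probabilities, with q strictly positive wherever p is positive; the function name and the example distributions are hypothetical.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) for discrete distributions.

    Assumption: q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl_divergence(p, q))  # D(p || q)
print(kl_divergence(q, p))  # D(q || p), a different value
```

The divergence is zero only when the two distributions coincide, which is why a smaller value indicates a closer match between a query model and a document model.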
61 KL-Divergence (continued)
The second term on the right-hand side is a constant that depends on the query, or rather the entropy of the query model; it does not depend on the document and can therefore be omitted.
In the same formula, the relevance of document d to query q depends on the estimate of the query model p(w|q) and of the document language model p(w|d)
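The formula this slide refers to is not preserved in the transcript. One standard way of writing the KL-divergence ranking function that is consistent with the description (a second right-hand term that depends only on the query) is given below as an assumption; the symbols \theta_q and \theta_d for the query and document models are introduced here for illustration.

```latex
-D(\theta_q \,\|\, \theta_d)
  = \sum_{w} p(w \mid q)\,\log p(w \mid d)
  \;-\; \sum_{w} p(w \mid q)\,\log p(w \mid q)
```

Under this form the second sum is the entropy of the query model, so ranking documents by \sum_{w} p(w \mid q)\,\log p(w \mid d) alone gives the same ordering.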
64 Evaluation of the χ² Information Retrieval System
- In traditional Information Retrieval systems the documents in the collection remain fixed, while new queries are submitted to the system, which is asked to return the most relevant documents.
- This is known as Ad-hoc Retrieval.
- On this task we test the effectiveness of the proposed χ²-GOF method and compare it with the OKAPI and KL-Divergence methods on the same problem
65 Description of the TREC Evaluation Data
- A document collection that has been used for years to evaluate Information Retrieval systems is the TIPSTER/TREC collection [44]
- Because of its large size, it is regarded today as the standard reference test collection for the area of Information Retrieval
- The creation of the collection was started by Donna Harman, a director at the National Institute of Standards and Technology (NIST), who had the idea of organizing an annual competition for Information Retrieval systems under the name TREC (Text Retrieval Conference)
67 Description of the TREC Data (continued)
- Because these collections were created under the DARPA-funded TIPSTER program, they are also referred to as the TIPSTER or TIPSTER/TREC test Collection
- The TREC Collection grows steadily year after year. Today it is distributed for a fee on 6 CD-ROM disks, each containing roughly 1 gigabyte of compressed text
- Sources of the texts
68 Description of the TREC Data (continued)
Sample Document in the Collection
- All the documents in the collection are tagged with SGML for easy parsing
70 Description of the TREC Data (continued)
- The TREC collection also contains a set of queries, that is, requests that express some information need, with which a new algorithm can be tested for its effectiveness.
- In TREC terminology such a query is called a topic
- An example of a topic is the following
71 Description of the TREC Data (continued)
Sample topic
72 Performance Comparison with the tf-idf Schemes: OKAPI Method
- To get a better picture of the retrieval capability of the proposed χ²-GOF method, we chose to test its effectiveness on 3 large subcollections of the TREC collection.
- We also compare the proposed method, on the same collection, with the OKAPI method, which is considered classic for Information Retrieval
73 Performance Comparison with the tf-idf Schemes: OKAPI Method (continued)
- Table 2.1 shows the statistics of these collections
74 Performance Comparison with the tf-idf Schemes: OKAPI Method (continued)
- As queries we used topics 351-400, which were used at the TREC-7 conference
- We ran two experiments with these topics. In the first we used only the titles from the text of the query, and in the second we used a longer version of the queries
- For these experiments we applied no preprocessing to the texts, such as tokenization or stemming, nor did we apply any stopword list (articles, conjunctions, adverbs, etc.). Instead, we took into account every word of every document in the collection, without exception
86 Comparison with KL-Divergence and OKAPI on the TREC Collection
- To better assess the capabilities of the proposed χ²-GOF method, we ran a larger experiment on the whole of TREC CDs 4 and 5, this time also comparing with the KL-Divergence method
- The statistics of the collection are shown below
89 Characteristics and Advantages of the Proposed χ²-GOF Method
- Although the proposed method uses only raw frequencies for retrieval, it consistently outperforms the OKAPI BM25 retrieval method
- Nevertheless, in both cases, TREC-7 and TREC-8, KL-Divergence has the best effectiveness
- That method, however, has the drawback of being parametric and requires parameter estimation over the entire collection.
90 Characteristics and Advantages of the Proposed χ²-GOF Method
- Simplicity is one of the main advantages of the proposed method. The computed χ²-GOF value improves effectiveness and makes it possible to find documents that approximate the conditions of the query
- The method lets us decide whether there is a statistically significant relationship between query and document
- In addition, within the framework of statistical testing it lets us try alternative retrieval formulas simply by changing the basic hypothesis about the data
91 Statistical Assessment of the Effectiveness of the Compared Algorithms
- The performance of the compared algorithms in these experiments appears to differ
- To assess this more formally as well, we run a paired t-test
- The paired t-test is used to check whether the population means of two samples are equal
- In our case we use as samples the average precision values obtained at the 11 points from the experiments we ran
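The paired t statistic itself is straightforward to compute. The sketch below returns the statistic and its degrees of freedom; turning it into a p-value still requires the t distribution, from tables or a statistical package. The function name and the three sample values are hypothetical and are not the precision values from the experiments.

```python
import math
import statistics


def paired_t_statistic(sample_a, sample_b):
    """t statistic and degrees of freedom for paired samples.

    Assumption: the paired differences are not all identical, otherwise the
    standard deviation is zero and the division fails.
    """
    if len(sample_a) != len(sample_b):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(sample_a, sample_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1


# Hypothetical precision values of two systems at the same recall points.
t, df = paired_t_statistic([0.5, 0.6, 0.7], [0.4, 0.4, 0.6])
print(t, df)
```

Pairing matters because both systems are measured at the same recall points, so the test works on the per-point differences rather than on two independent samples.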
92 Statistical Assessment of the Effectiveness of the Compared Algorithms
- Running the paired t-test for the χ²-GOF and OKAPI models, the returned probability (p-value) is 0.0326 for topics 351-400 and 0.00010608 for topics 401-450
- We therefore conclude that the average precisions obtained for the χ²-GOF and OKAPI models differ with 96.74% and 99.98% confidence for topics 351-400 and 401-450 respectively
- Similarly, comparing the χ²-GOF and KL-Divergence models, we find returned probabilities of 0.0004 for topics 351-400 and 0.0018 for topics 401-450
93 Changing the Basic Hypothesis about the Data
- In the Divergence from Randomness work of Amati [67], a basic randomness model is proposed for the distribution of terms across the documents. According to it, the processes that distribute the terms can be viewed as random drawings (Random Drawings) from an urn that contains the available terms.
- Following this proposal, we changed the randomness model from that of the uniform distribution to the binomial model (Binomial model)
- According to this model, the occurrence of a single term i in a document d is treated as a Bernoulli process with probability p = 1/N, where N is the number of documents
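Under such a Bernoulli process, the probability of seeing a given number of occurrences in one document follows the binomial distribution. In the sketch below, treating the total number of occurrences of the term in the collection as the number of trials is an assumption in the spirit of the urn description; the slides state only the success probability p = 1/N. The function name and the numbers are hypothetical.

```python
import math


def binomial_tf_probability(tf, total_occurrences, n_docs):
    """Probability that a document receives exactly `tf` of the term's occurrences.

    Assumption: each of the `total_occurrences` tokens of the term lands in a
    given document independently with probability p = 1 / n_docs.
    """
    p = 1.0 / n_docs
    return (
        math.comb(total_occurrences, tf)
        * p ** tf
        * (1 - p) ** (total_occurrences - tf)
    )


# Hypothetical term with 10 occurrences spread over 4 documents.
F, N = 10, 4
probs = [binomial_tf_probability(tf, F, N) for tf in range(F + 1)]
print(probs[0], F / N)  # chance of zero occurrences, and the expected count
```

The expected count F / N computed this way can play the role of the expected frequency when the observed frequency of a query term in a document is compared against the null hypothesis.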
95 Changing the Basic Hypothesis about the Data
- To compare the effectiveness of the two different hypotheses, we ran a retrieval experiment on the FBIS collection from CD 5 of the TREC collection
97 Conclusions
- We presented a method for applying the χ²-GOF test statistic to Information Retrieval
- The method proves robust and effective, performing well both on short queries and on more verbose ones
- It has the advantage of letting us model documents and queries under many different hypotheses.
- Some other hypotheses about the data, such as normality (Normality), Weibull, etc., may be good alternative hypotheses.
- It would also be worth trying other test statistics, such as Kolmogorov-Smirnov and Anderson-Darling.
98 - Statistical Methods for Finding Collocations
99 Collocations
- Words that occur together very often
- They are a common feature of natural languages and can appear both in plain natural language text and in technical and scientific text
- A collocation can be a combination of words (or phrases) that appears very often in the language in a way that seems semantically natural given the context, even though composing the individual meanings of its parts in isolation leads to a semantic content unrelated to the context
100 Collocations
- In languages with a rich inflectional system, such as Greek, collocations appear in two forms:
  - Rigid: the words Χρηματιστήριο and αξία, as in "Χρηματιστήριο Αξιών" (stock exchange).
  - Loose: the words στρώνομαι and δουλειά, as in
    - "Στρώνομαι στην δουλειά" (I get down to work)
    - "Η δουλειά μου στρώνει" (my work is getting on track)
101 Collocations
- There are many definitions of collocations, since different researchers have focused on different characteristics.
- Firth [55]
  - "Collocations of a given word are statements of the habitual or customary places of the word"
- Benson and Morton [50]
  - "An arbitrary and recurrent word combination"
  - "Recurrent" means that these combinations appear frequently in a given context.
102 Collocations
- Smadja [64]
  - Identifies four characteristics of collocations that are useful for computational applications.
  - Collocations are arbitrary: they do not correspond to any syntactic or semantic variation.
  - Collocations are domain-dependent: handling texts in a given field therefore requires clear knowledge of its terminology and of its domain-dependent collocations.
  - Collocations are recurrent, as defined above.
  - Collocations are cohesive lexical clusters: the appearance of one or more of the words often implies the appearance of the remaining words as well.
103 Collocations
- According to Manning and Schutze [60], collocations are characterized by limited compositionality.
- A natural-language expression is compositional if the meaning of the expression can be predicted from the composition of the meanings of its parts.
- Example: the expression
  - Γερό ποτήρι (literally "sturdy glass", used for a heavy drinker)
  - ?????
- Another characteristic of collocations is the absence of valid synonyms [59, 60].
- Example: baggage and luggage are synonyms,
  - but we say emotional, historical or psychological baggage.
104 The Usefulness of Collocations
- They matter for a significant number of applications, such as:
  - Natural Language Generation: requires the right combination of words.
  - Machine Translation: collocations are hard to translate from one language to another, e.g. "clear road" -> "Ελεύθερος δρόμος".
  - Text Simplification: replacing difficult words with simple ones requires knowledge of collocations.
  - Computational Lexicography: collocations are needed to characterize many lexical entries.
105 The Collocation Extraction Technique
- We present two methods for producing collocations.
- In the first case we apply the well-tested mean and variance method.
- In the second we establish the use of the χ² statistical test for collocation extraction.
106 The Collocation Extraction Technique
- The traditional approach to collocation extraction is the lexicographic approach.
- According to Benson and Morton [50], the parts that make up a collocation (the collocates) cannot be handled separately. Collocation extraction is therefore not predictable: collocations must first be collected manually and then entered into dictionaries.
107 The Collocation Extraction Technique
- More recently, statistics has been applied to collocation extraction.
- Choueka [52] tried to extract collocations using N-grams, combinations of 2 to 4 words, with a very simple criterion: frequency of occurrence.
- Unfortunately this choice does not always lead to the best results; in English, for example, the most frequent bigrams are "of the", "in the", "to the".
108 The Collocation Extraction Technique
- To overcome this problem, Justeson and Katz [58] proposed selecting only those bigrams that form phrases.
- They used part-of-speech filters:
  - AN, NN, AAN, ANN, where A stands for adjective and N for noun.
- Although the method is extremely simple, the authors reported a significant improvement in the results.
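The part-of-speech filter above can be sketched as follows. This is an illustration only: the tag letters follow the slide (A, N), "O" is a hypothetical catch-all for every other part of speech, and the tagged sentence is invented for the example.

```python
from typing import List, Tuple

# Tag patterns listed above: A = adjective, N = noun.
ALLOWED_PATTERNS = {"AN", "NN", "AAN", "ANN"}

def candidate_phrases(tagged: List[Tuple[str, str]]) -> List[Tuple[str, ...]]:
    """Return every 2- or 3-word sequence whose tag string is an allowed pattern."""
    found = []
    for n in (2, 3):
        for i in range(len(tagged) - n + 1):
            window = tagged[i:i + n]
            pattern = "".join(tag for _, tag in window)
            if pattern in ALLOWED_PATTERNS:
                found.append(tuple(word for word, _ in window))
    return found

# Hypothetical pre-tagged sentence; "O" marks any other part of speech.
sentence = [("the", "O"), ("strong", "A"), ("man", "N"), ("opened", "O"),
            ("the", "O"), ("stock", "N"), ("exchange", "N")]
phrases = candidate_phrases(sentence)
```

Frequent function-word pairs such as "of the" never match a pattern, so they drop out before any frequency ranking is applied.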
109 The Collocation Extraction Technique
- The frequency-based method works very well for noun phrases. However, many collocations involve words with much more flexible relationships between them.
- The mean and variance method [64] overcomes this problem by computing the signed distances between the collocates and finding the spread of these signed distances.
110 The Collocation Extraction Technique
- The mean and variance approach appears reasonable and is simple: we look for distributions with small spread.
- An alternative frequency-based method is mutual information [53].
- The term mutual information originates in Information Theory and is roughly a measure of how much one word tells us about another.
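The slides do not spell out the formula, so the sketch below assumes the commonly used pointwise form, log2 of p(w1, w2) / (p(w1) p(w2)), with maximum-likelihood probabilities estimated from counts. The function name and the counts are hypothetical.

```python
from math import log2

def pointwise_mutual_information(c12: int, c1: int, c2: int, n: int) -> float:
    """log2( p(w1,w2) / (p(w1) * p(w2)) ) using maximum-likelihood estimates.

    c12: count of the pair, c1 / c2: counts of each word, n: total bigrams.
    """
    p12 = c12 / n
    p1 = c1 / n
    p2 = c2 / n
    return log2(p12 / (p1 * p2))

# Hypothetical counts: the pair occurs 20 times in one million bigrams.
pmi_associated = pointwise_mutual_information(20, 100, 200, 1_000_000)
# Pair count equal to what independence predicts gives a score near zero.
pmi_independent = pointwise_mutual_information(1, 1000, 1000, 1_000_000)
```

A score near zero means the pair occurs about as often as independence would predict; large positive scores suggest association, although estimates from very small counts are unreliable.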
111 The proposed χ² statistical test method
- The frequency-based method has a weakness: it fails in the presence of outliers (bigrams with very high frequency).
- We present an alternative approach based on the χ² statistical test.
- We also give an alternative formula for computing the χ² statistic for the case of extracting bigrams from a corpus.
112 The proposed χ² statistical test method
- The χ² test is a well-defined statistical approach that assesses whether an event is the result of chance.
- This is one of the most general problems in statistics and is usually framed as hypothesis testing.
- In our case we want to see whether two words occur together more often than chance would predict.
113 The proposed χ² statistical test method
- We state the null hypothesis (H0) that there is no association between the two words beyond their occurring together by chance.
- We compute the probability (p0) that the event would have if H0 were true.
- If p0 is small, typically below a predefined significance level (p0 < 0.005 or p0 < 0.001), we reject H0; otherwise we retain it.
114 The proposed χ² statistical test method
- In statistics more generally, such probabilities for rejecting or retaining the null hypothesis are computed with Student's t-test (t-statistic), which assumes normally distributed samples.
- We chose the χ² statistical test because it does not assume that the data follow the normal distribution (it is distribution-free), which is much more appropriate for words in text.
115 Applying the method and comparison with the Mean and Variance method
- In the rest of this section we:
  - describe the two methods in more detail;
  - give experimental results from applying them to a corpus of Greek texts.
116 The Mean and Variance method
117 Applying the method and comparison with the Mean and Variance method
- Let us look at an example of computing the mean and the deviation.
- Consider the Greek sentences for the words χτύπησε (knocked) and πόρτα (door).
118 The Mean and Variance method
- We can compute the mean and the variance of the distances of the word χτύπησε relative to the word πόρτα.
119 The Mean and Variance method
- The mean and the spread help us find collocations by looking for the pairs with the lowest spread.
- The lower the variance of the distances within a pair of words, the stronger the evidence that the pair forms a collocation.
- A sharply peaked distribution of distances is strong evidence. We illustrate this with two distributions built from real data from the evaluation corpus of the method.
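A minimal sketch of the mean and variance computation, assuming tokenized sentences and a symmetric window. The English sentences are hypothetical stand-ins for the χτύπησε / πόρτα example, and the helper name and window size are illustrative choices, not the ones used in the experiments.

```python
from statistics import mean, stdev
from typing import List

def signed_offsets(tokens: List[str], w1: str, w2: str, window: int = 5) -> List[int]:
    """Signed distances (position of w2 minus position of w1) for every
    co-occurrence of w1 and w2 at most `window` words apart in one sentence."""
    offsets = []
    for i, tok in enumerate(tokens):
        if tok != w1:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i and tokens[j] == w2:
                offsets.append(j - i)
    return offsets

# Hypothetical sentences, processed one at a time so pairs never cross a boundary.
sentences = [
    "she knocked on his door".split(),
    "they knocked at the door".split(),
    "he knocked on the big door".split(),
]
offsets = [d for s in sentences for d in signed_offsets(s, "knocked", "door")]
offset_mean = mean(offsets)
offset_spread = stdev(offsets)  # sample standard deviation
```

A small spread around a non-zero mean, as here, is the pattern the method treats as evidence of a collocation.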
122 The chi-square method
- In 1900 Karl Pearson proposed a statistic, the χ² statistic, which compares observed counts with expected counts when the possible outcomes of an experiment are divided into mutually exclusive categories:
  $\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$
  Here Σ denotes the sum, taken over all possible outcomes of the experiment, with O_i the observed and E_i the expected count for outcome i.
123 The chi-square method
- Expected and observed frequencies can be tested within the framework of hypothesis testing.
- If the data are divided into mutually exclusive categories and we state a null hypothesis about the data, then:
  - the expected value is the value for each category if the null hypothesis is true;
  - the observed value for each category comes from the sample data.
124 The chi-square method
- To make the application of the method clearer, we give an example.
- Suppose we have a linguistic corpus and want to extract collocations.
- We define a collocational window of 10 words and count the frequency of the word pair Ισχυρός (strong) and Άνδρας (man).
125 The chi-square method
- We obtain the following counts:
  - 10 occurrences of the pair (Ισχυρός, Άνδρας) in the corpus
  - 1000 bigrams where the second word is Άνδρας and the first is not Ισχυρός
  - 500 bigrams where the first word is Ισχυρός and the second is not Άνδρας
  - 1,500,000 bigrams containing neither of the two words, given the collocational window
126 The chi-square method
In this case it is useful to organize the counts in a contingency table.
127 The chi-square method
- Using maximum likelihood estimates, we can compute the probability of occurrence of the pair that follows from the null hypothesis.
The null hypothesis is that the occurrences of Ισχυρός and Άνδρας are independent.
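Using the counts of the example above, the maximum likelihood estimates under independence can be worked out as in the sketch below. The variable names and the layout (rows for the first word, columns for the second) are assumptions made for illustration.

```python
# Counts from the example: first word Ισχυρός ("strong"), second word Άνδρας ("man").
a11 = 10          # strong followed by man
a12 = 500         # strong, second word not man
a21 = 1000        # first word not strong, second word man
a22 = 1_500_000   # neither word
n = a11 + a12 + a21 + a22

# Maximum likelihood estimates of the marginal probabilities.
p_strong = (a11 + a12) / n
p_man = (a11 + a21) / n

# Under the null hypothesis of independence the joint probability is the product.
p_pair_h0 = p_strong * p_man
expected_pairs = n * p_pair_h0
```

With these counts the expected number of pairs under independence is well below one, compared with the 10 pairs actually observed.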
128 The chi-square method
- Next we compute the χ² value from equation 3.7.
- From the tables of the χ² distribution we find the critical value for a given significance level (usually α = 0.05).
- If the computed χ² value is larger than the critical value, we can reject the null hypothesis that the words Ισχυρός and Άνδρας occur independently.
- Large values of the χ² statistic are therefore strong evidence that the pair forms a collocation.
129 The chi-square method
- For a 2x2 contingency table, the χ² statistic can be computed with the following formula:
  $\chi^2 = \frac{N\,(a_{11}a_{22} - a_{12}a_{21})^2}{(a_{11}+a_{12})(a_{11}+a_{21})(a_{12}+a_{22})(a_{21}+a_{22})}$
  where a_ij are the entries of the 2x2 contingency table and N is the sum of these entries.
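As a sketch, the standard closed form of the χ² statistic for a 2x2 table, N(a11·a22 − a12·a21)² divided by the product of the four marginal totals, can be applied to the counts of the Ισχυρός / Άνδρας example. The function name is hypothetical; the critical value 3.841 is the tabulated χ² value for 1 degree of freedom at α = 0.05.

```python
def chi_square_2x2(a11: int, a12: int, a21: int, a22: int) -> float:
    """Chi-square statistic of a 2x2 contingency table (closed form)."""
    n = a11 + a12 + a21 + a22
    numerator = n * (a11 * a22 - a12 * a21) ** 2
    denominator = (a11 + a12) * (a11 + a21) * (a12 + a22) * (a21 + a22)
    return numerator / denominator

CRITICAL_VALUE_05 = 3.841  # chi-square, 1 degree of freedom, alpha = 0.05

# Counts from the example: pair, first word only, second word only, neither.
chi2 = chi_square_2x2(10, 500, 1000, 1_500_000)
reject_independence = chi2 > CRITICAL_VALUE_05
```

For these counts the statistic is far above the critical value, so independence is rejected and the pair is kept as a collocation candidate.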
130 Experimental results
- Several text files in Modern Greek were available to us in electronic form from various sources.
- An initial morphological step, part-of-speech tagging, marked the part of speech and the lemma of every word in the corpus.
- Unfortunately, our preprocessing could not provide lemmas for verbs and adverbs.
131 Experimental results
The distribution of lemmas in the corpus is shown in the table below.
132 Experimental results
- The only bigram combination we can test is (Adjective, Noun), since the other parts of speech are not included.
- We define a collocational window 10 words long, punctuation marks included.
- Variance analysis: from the corpus we compute the distances and the standard deviation for all (Adjective, Noun) bigram combinations.
133 Variance analysis
138 Analysis of the χ² test
- We state the null hypothesis of statistical independence between the two words that make up the sample.
- This means that the two words occur independently of each other within the sample, in which they are randomly distributed.
- We compute the χ² statistic as described above. The larger its value, the stronger the evidence for rejecting the null hypothesis.
141 Conclusions
- In this chapter we applied the χ² test to identify word pairs that may form collocations.
- The method outperforms classical variance analysis, which fails in the presence of outliers.
- It also outperforms other methods that have been applied over the years, such as the t-test, the likelihood ratio (LL) test and mutual information, because these methods have the drawback of assuming a parametric distribution of the data.
142 Conclusions
- In addition, the mutual information (MI) method compares the joint probability p(w1, w2) and requires the individual probabilities p(w1) and p(w2) to occur in some way in the sample, which does not give a realistic picture when frequencies are low.
- Many common bigrams were found in the top ranks of both the variance method and the χ² test; in every case, however, it is up to expert linguists to evaluate these findings.
143 The χ² Statistical Test in Word Sense Disambiguation
144 Sense Disambiguation
- The overwhelming majority of words that appear in natural-language texts are polysemous: they take different meanings in different linguistic contexts.
- For example, the English word bank can mean a financial bank in one context and a river bank in another.
- Within the same framework of the χ² statistical test, and with the help of the electronic dictionary WordNet, we develop a method for disambiguating the sense of a word that appears in a context.
145 Sense Disambiguation
- The method works as follows.
- We augment the context in which the word to be disambiguated appears with related concepts (Related Synsets) from the electronic dictionary WordNet. We treat the augmented context as a statistical sample.
- We study the distribution, within this sample, of the related concepts of each sense of the word to be disambiguated.
146 Sense Disambiguation
- We state the null hypothesis that all related concepts, i.e. the Related Synsets from WordNet, are normally distributed in the sample.
- Using the χ² goodness-of-fit statistical test, we try to identify the sense whose Related Synsets deviate from this hypothesis.
- We select that sense as the correct sense of the word to be disambiguated.
147 Word Disambiguation and WordNet
- Assigning the correct sense to a word (the target word) within the context formed by its surrounding words is the task of word sense disambiguation systems.
- It is recognized as one of the hardest problems in natural language processing.
- Many systems have been proposed over the years, most of them relying on statistical processing methods.
148 Word Disambiguation and WordNet
- The first systems relied on empirical rules and, using small dictionaries (sense inventories), disambiguated a small number of cases [16, 17, 18].
- Today the availability of large electronic dictionaries such as WordNet gives a major boost to the development of demanding word disambiguation applications [20, 21, 22].
- In many respects, the fact that the various concepts are linked to each other by a large number of semantic and lexical relations makes WordNet a valuable resource for representing the knowledge network.
149 Word Disambiguation and WordNet
- Using definitions from WordNet and texts from the Internet, Mihalcea and Moldovan [23] gathered lexical information for disambiguating polysemous words.
- Montoyo and Palomar [24] presented a word disambiguation method based on the semantic classes of WordNet as well as on the dictionary definitions.
- The work of Banerjee and Pedersen [25] presents an adaptation of the Lesk algorithm [16] that relies on the WordNet definitions.
150 Word Disambiguation and WordNet
- Beyond definitions, a great deal of work has also used the hierarchy relation of WordNet (the hypernymy/hyponymy relation).
- Resnik [21] disambiguated lexical items using the semantic similarity between two words, choosing the common ancestor with the highest information content (the most informative subsumer), where information content is defined as a function of the number of subsumed terms.
- Leacock and Chodorow [26] proposed a measure of semantic similarity that counts the length of the path between the two nodes of the hierarchy.
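A path-length measure of this kind can be sketched on a toy hierarchy. The hierarchy below is hypothetical and is not taken from WordNet; the scoring uses the commonly cited form of the Leacock-Chodorow measure, -log(len / (2·D)), where len counts the nodes on the shortest path and D is the maximum depth of the hierarchy. Both conventions are assumptions, since the slides only mention path length.

```python
from math import log
from typing import Dict, List, Optional

# Hypothetical toy is-a hierarchy (child -> parent); not taken from WordNet.
PARENT: Dict[str, Optional[str]] = {
    "entity": None,
    "institution": "entity",
    "landform": "entity",
    "bank_financial": "institution",
    "school": "institution",
    "bank_river": "landform",
}

def path_to_root(node: str) -> List[str]:
    """Nodes from `node` up to the root, inclusive."""
    path = [node]
    while PARENT[path[-1]] is not None:
        path.append(PARENT[path[-1]])
    return path

def path_length(a: str, b: str) -> int:
    """Number of nodes on the path from a to b through their lowest common ancestor."""
    up_a, up_b = path_to_root(a), path_to_root(b)
    common = next(n for n in up_a if n in up_b)
    return up_a.index(common) + up_b.index(common) + 1

def lch_similarity(a: str, b: str, max_depth: int) -> float:
    """Assumed Leacock-Chodorow form: -log(path_length / (2 * max_depth))."""
    return -log(path_length(a, b) / (2.0 * max_depth))

MAX_DEPTH = 3  # nodes on the longest root-to-leaf chain of the toy hierarchy
sim_close = lch_similarity("bank_financial", "school", MAX_DEPTH)
sim_far = lch_similarity("bank_financial", "bank_river", MAX_DEPTH)
```

Shorter paths give higher scores, so in this toy example the financial sense of bank ends up closer to school than to the river sense.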
151 The statistical test approach
- In the same spirit of applying statistical tests, we present an approach to word disambiguation systems based on a different view of how to estimate the relatedness between the context of a word and each of its senses separately.
- In WordNet, each lexical entry represents a concept and is expressed by a set of synonymous words called a Synset.
152 The statistical test approach
- The figure shows what entries in WordNet look like (random order; the number in parentheses at the start is the sequential entry number in WordNet).
153 The statistical test approach
- The entries in WordNet are thus called synsets. Each synset is unique and is conventionally written in braces.
- E.g. {city, metropolis, urban center}, {man, adult male}, etc.
- The major difference between WordNet and conventional dictionaries is the relations between Synsets.
- Each Synset is related to other Synsets through various relations. The figure below gives a picture of the situation in WordNet.
155 The statistical test approach
- We call these related Synsets "Related Synsets".
- WordNet contains quite a few such relations.
- There are 32 such relations, distributed over the main parts of speech (a large share concerns nouns and verbs, but there are of course relations for adjectives and adverbs as well).
- (See the thesis for a detailed presentation.)
156 Using Related Synsets for Word Sense Disambiguation
- Senses
- Every lexical form in a language appears with many senses in different sentences (contexts). For example, the word bank can mean, among other things, a financial institution in one context and the bank of a river in another.
- Since senses in WordNet are expressed as Synsets, the word bank simply appears in more than one Synset.
- Below are the 10 Synsets in which the word bank appears, one for each of its 10 senses in WordNet.
157
1. (883) depository financial institution, bank, banking concern, banking company -- (a financial institution that accepts deposits and channels the money into lending activities; "he cashed a check at the bank"; "that bank holds the mortgage on my home")
2. (99) bank -- (sloping land (especially the slope beside a body of water); "they pulled the canoe up on the bank"; "he sat on the bank of the river and watched the currents")
3. (76) bank -- (a supply or stock held in reserve for future use (especially in emergencies))
4. (54) bank, bank building -- (a building in which commercial banking is transacted; "the bank is on the corner of Nassau and Witherspoon")
5. (7) bank -- (an arrangement of similar objects in a row or in tiers; "he operated a bank of switches")
6. (6) savings bank, coin bank, money box, bank -- (a container (usually with a slot in the top) for keeping money at home; "the coin bank was empty")
7. (3) bank -- (a long ridge or pile; "a huge bank of earth")
8. (1) bank -- (the funds held by a gambling house or the dealer in some gambling games; "he tried to break the bank at Monte Carlo")
9. (1) bank, cant, camber -- (a slope in the turn of a road or track; the outside is higher than the inside in order to reduce the effects of centrifugal force)
10. bank -- (a flight maneuver; aircraft tips laterally about its longitudinal axis (especially in turning); "the plane w