Title: Comments from Pre-submission Presentation
1. Comments from Pre-submission Presentation
- 1. Q: Check why kNN is so much lower than SVM on the Reuters and 20 Newsgroups corpora?
- A: Refer to the following four references: Joachims 98, Debole 03 (STM), Dumais 98 (Inductive), Yang 99 (Re-examination).
2. Results on the Reuters Corpus (Joachims 98, Debole 03, Dumais 98)

Joachims 98:
               Bayes   Rocchio  C4.5   kNN   SVM (linear)  SVM (poly)  SVM (rbf)
Micro-BEP (%)  69.84   79.14    77.78  82.5  84.2          86          86

Debole 03:
           kNN   SVM (linear)
Micro-F1   85.4  92.0

Dumais 98:
            NBayes  DT    SVM (linear)
Micro-BEP   81.5    88.4  92.0
3. Yang 99 Re-examination: Significance Tests
- Micro-level analysis (s-test)
  - SVM > kNN >> {LLSF, NNet} >> NB
- Macro-level analysis
  - {SVM, kNN, LLSF} >> {NB, NNet}
- Error-rate based comparison
  - {SVM, kNN} > LLSF > NNet >> NB
4. Comments from Pre-submission Presentation
- 2. Explain the relation between BEP and F1 in Chap 7
- Add a reference
5. Breakeven Point (1)
- BEP was first proposed by Lewis (1992). Later, he himself pointed out that BEP is not a good effectiveness measure, because:
- 1. there may be no parameter setting that yields the breakeven; in this case the final BEP value, obtained by interpolation, is artificial
- 2. having P = R is not necessarily desirable, and it is not clear that a system that achieves a high BEP can be tuned to score high on other effectiveness measures
6. Breakeven Point (2)
- Yang (1999, Re-examination) also noted that when P and R are not close enough for any value of the parameters, the interpolated breakeven may not be a reliable indicator of effectiveness.
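For reference, a standard formulation of these points (textbook definitions, not taken from the original slides), written in LaTeX:

  \mathrm{BEP} = P(\theta^{\ast}) = R(\theta^{\ast})
  \qquad \text{for a threshold } \theta^{\ast} \text{ at which precision equals recall,}

  \mathrm{BEP}_{\mathrm{interp}} = \frac{P(\theta') + R(\theta')}{2},
  \qquad \theta' = \arg\min_{\theta}\,\lvert P(\theta) - R(\theta)\rvert .

Note that when P = R exactly, F1 = 2PR/(P + R) reduces to P, so a true (non-interpolated) breakeven coincides with F1 at that operating point; the interpolated value carries no such guarantee.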
7. Comments from Pre-submission Presentation
- 3. Adding more qualitative analysis would be better
8. Analysis and Proposal: Empirical Observation
Comparison of the idf, rf and chi2 values of four features in two categories of the Reuters corpus (a sketch of how such values are computed follows the table):

                Category 00_acq              Category 03_earn
feature      idf      rf      chi2       idf      rf      chi2
acquir       3.553    4.368   850.66     3.553    1.074   81.50
stake        4.201    2.975   303.94     4.201    1.082   31.26
payout       4.999    1.000   10.87      4.999    7.820   44.68
dividend     3.567    1.033   46.63      3.567    4.408   295.46
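A minimal sketch of how such values can be computed from a 2x2 term/category contingency table. The definitions below (idf = log(N/df), chi2 from the contingency counts, and rf = log2(2 + tp/max(1, fp)) as in the tf.rf literature) and the toy counts are assumptions for illustration, not the actual Reuters statistics behind the table:

    import math

    def idf(N, df):
        # Inverse document frequency: log of (collection size / document frequency).
        return math.log(N / df)

    def rf(tp, fp):
        # Relevance frequency: tp = positive-category docs containing the term,
        # fp = other docs containing the term (rf form assumed from the tf.rf literature).
        return math.log2(2 + tp / max(1, fp))

    def chi2(tp, fp, fn, tn):
        # Chi-square from the 2x2 contingency table:
        # tp = in category & contains term, fp = not in category & contains term,
        # fn = in category & lacks term,    tn = not in category & lacks term.
        N = tp + fp + fn + tn
        den = (tp + fn) * (fp + tn) * (tp + fp) * (fn + tn)
        return N * (tp * tn - fn * fp) ** 2 / den if den else 0.0

    # Hypothetical counts for one term in one category (illustrative only):
    tp, fp, fn, tn = 120, 30, 80, 9770
    N = tp + fp + fn + tn
    print(idf(N, tp + fp), rf(tp, fp), chi2(tp, fp, fn, tn))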
9. Comments from Pre-submission Presentation
- 4. Chap 7: remove Joachims' results; using a quotation is fine
10. Comments from Pre-submission Presentation
- 5. Tone down "best" claims
- e.g. "to our knowledge (experience, understanding)"
- Pay attention to this usage when giving presentations
11. Introduction: Other Text Representations
- Word senses (meanings) (Kehagias 2001)
  - the same word assumes different meanings in different contexts
- Term clustering (Lewis 1992)
  - group words with a high degree of pairwise semantic relatedness
- Semantic and syntactic representation (Scott & Matwin 1999)
  - relationships between words, i.e. phrases, synonyms and hypernyms
12. Introduction: Other Text Representations
- Latent Semantic Indexing (Deerwester 1990)
  - a feature reconstruction technique (a toy sketch follows this list)
- Combination approach (Peng 2003)
  - combines two types of indexing terms, i.e. words and 3-grams
- In general, these higher-level representations did not show good performance in most cases
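As a toy illustration of LSI as feature reconstruction (a sketch under the standard truncated-SVD formulation; the matrix and the rank k are made-up examples, not from the thesis):

    import numpy as np

    # Toy term-document matrix: rows = terms, columns = documents.
    X = np.array([[2, 0, 1, 0],
                  [1, 1, 0, 0],
                  [0, 3, 0, 1],
                  [0, 0, 2, 2]], dtype=float)

    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    k = 2                                        # number of latent dimensions kept
    X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # rank-k reconstruction of X
    docs_k = np.diag(s[:k]) @ Vt[:k, :]          # documents in the latent space
    print(X_k.round(2))
    print(docs_k.round(2))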
13. Literature Review: Knowledge-based Representation
- Theme Topic Mixture Model, a graphical model (Keller 2004)
- Using keywords from summarization (Li 2003)
14. Literature Review 2: How to Weight a Term (Feature)
- Salton 1988 elaborated three considerations:
- 1. term occurrences closely represent the content of a document
- 2. other factors with discriminating power pick out the relevant documents from the irrelevant ones
- 3. consider the effect of document length
15. Literature Review 2: How to Weight a Term (Feature)
- 1. Term Frequency Factor (the common variants are sketched after this list)
  - Binary representation (1 for present, 0 for absent)
  - Term frequency (tf): the number of times a term occurs in a document
  - log(tf): a log operation to scale down the effect of unfavourably high term frequencies
  - Inverse term frequency (ITF)
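A minimal sketch of the first three term frequency variants; the exact log form (1 + log(tf)) is an assumption since several variants appear in the literature, and ITF is omitted because its definition differs across papers:

    import math

    def binary_weight(tf):
        # Binary representation: 1 if the term is present, 0 otherwise.
        return 1 if tf > 0 else 0

    def raw_tf(tf):
        # Raw term frequency: the number of occurrences in the document.
        return tf

    def log_tf(tf):
        # Logarithmic tf: dampens unfavourably high term frequencies
        # (the 1 + log(tf) variant is assumed here).
        return 1 + math.log(tf) if tf > 0 else 0.0

    # Example: a term occurring 12 times in a document
    print(binary_weight(12), raw_tf(12), log_tf(12))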
16. Literature Review 2: How to Weight a Term (Feature)
- 2. Collection Frequency Factor (the standard formulas are given after this list)
  - idf: the most commonly used factor
  - Probabilistic idf, a.k.a. term relevance weight
  - Feature selection metrics: chi2, information gain, gain ratio, odds ratio, etc.
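For reference, the textbook forms of the first two collection frequency factors (standard definitions, with N the number of documents in the collection and n_i the number of documents containing term t_i), in LaTeX:

  \mathrm{idf}(t_i) = \log \frac{N}{n_i},
  \qquad
  \mathrm{idf}_{\mathrm{prob}}(t_i) = \log \frac{N - n_i}{n_i}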
17. Literature Review 2: How to Weight a Term (Feature)
- 3. Normalization Factor
  - Combine the above two factors by multiplication
  - To eliminate the length effect, cosine normalization is used to limit the term weights to the range (0, 1) (the formula is given below)
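Written out (the standard cosine-normalized tf.idf weight, assembled from the description above; the notation is assumed, not taken from the slides):

  w_{ik} = \frac{\mathrm{tf}_{ik} \cdot \mathrm{idf}_{k}}
                {\sqrt{\sum_{j=1}^{|V|} \left(\mathrm{tf}_{ij} \cdot \mathrm{idf}_{j}\right)^{2}}}

where w_{ik} is the normalized weight of term k in document i and |V| is the vocabulary size.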