Title: Classification algorithm overview
1. Classification algorithm overview
- LING 572
- Fei Xia
- Week 2, 1/9/06
2. New time for lab sessions
- Time: 3-4pm on Thursday (right after class), starting from this week.
- Location: MGH 271
- Please feel free to bring your laptop
- Who did not receive the announcement?
3. Assignments
- Handed out every Thursday, explained at the lab session on the same day.
- Due at 11pm on the Saturday of the following week.
- Ex: Hw2 is handed out on 1/11 and due 1/20.
- ESubmit is ready for Hw1-Hw3. Give it a try.
4. Last time
- Course overview
- Mathematical foundation
- Basic concepts in the classification problem
5. Questions?
6. Important concepts
- Instance, InstanceList
- Labeled data, unlabeled data
- Training data, test data
- Feature, feature template
- Feature vector
- Attribute-value table
- Trainer, classifier
- Training stage, test stage
7. Training stage
- Estimate parameters
- Trainer: InstanceList → Classifier
- Mallet:
- NaiveBayesTrainer t = new NaiveBayesTrainer(parameters);
- Classifier c = t.train(instanceList);
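Mallet itself is a Java toolkit, but the idea of what a trainer computes can be sketched in a few lines. The following Python snippet is a minimal, illustrative Naive Bayes parameter estimator — the toy data, feature names, and add-one smoothing are my assumptions, not Mallet's actual implementation:

```python
from collections import Counter, defaultdict

# Toy labeled data: each instance is (feature_vector, class_label).
# Features and labels are invented for illustration.
train = [
    ({"rain": 1, "cold": 1}, "winter"),
    ({"sun": 1, "hot": 1}, "summer"),
    ({"rain": 1}, "winter"),
    ({"sun": 1}, "summer"),
]

def train_naive_bayes(instances, smoothing=1.0):
    """Estimate the model parameters p(c) and p(f|c) from labeled data."""
    class_counts = Counter()
    feat_counts = defaultdict(Counter)
    vocab = set()
    for features, label in instances:
        class_counts[label] += 1
        for f, v in features.items():
            feat_counts[label][f] += v
            vocab.add(f)
    n = sum(class_counts.values())
    p_c = {c: class_counts[c] / n for c in class_counts}
    p_f_given_c = {}
    for c in class_counts:
        total = sum(feat_counts[c].values()) + smoothing * len(vocab)
        p_f_given_c[c] = {f: (feat_counts[c][f] + smoothing) / total
                          for f in vocab}
    return p_c, p_f_given_c

p_c, p_f_given_c = train_naive_bayes(train)
```

The two returned tables are exactly the "model parameters stored in a classifier" discussed later in these slides.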
8. Input to learners: attribute-value table
9. Output of learners: a classifier
- A classifier:
- f(x) = y, where x is the input and y ∈ C
- f(x) = {(ci, scorei)}, where ci ∈ C
- A classifier fills out a decision matrix.
10. Testing stage
- Input: new instances.
- Output: a decision matrix.
- Task: find the best solution for new instances.
- Classifier: instance → classification
- Mallet:
- Classification cl = c.classify(instance);
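As a rough sketch of decoding, the snippet below scores each class for a new instance and returns the ranked (ci, scorei) pairs that fill one row of the decision matrix. The parameter values are hand-set for illustration and do not come from any real trained model:

```python
import math

# Hypothetical Naive Bayes parameters (hand-set for illustration).
p_c = {"winter": 0.5, "summer": 0.5}
p_f_given_c = {
    "winter": {"rain": 0.6, "sun": 0.1},
    "summer": {"rain": 0.1, "sun": 0.6},
}

def classify(features, p_c, p_f_given_c):
    """Score each class by log p(c) + sum_f log p(f|c);
    return (class, score) pairs, best first."""
    scores = {}
    for c in p_c:
        s = math.log(p_c[c])
        for f in features:
            # Tiny floor for unseen features (an illustrative choice).
            s += math.log(p_f_given_c[c].get(f, 1e-6))
        scores[c] = s
    return sorted(scores.items(), key=lambda kv: -kv[1])

ranking = classify({"rain"}, p_c, p_f_given_c)
```

The top-ranked pair gives f(x) = y; keeping the whole ranking gives the f(x) = {(ci, scorei)} view from the previous slide.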
11. Evaluation
- Precision = TP/(TP+FP)
- Recall = TP/(TP+FN)
- F-score = 2PR/(P+R)
- Accuracy = (TP+TN)/(TP+TN+FP+FN)
- F-score or Accuracy?
- Why F-score?
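The four metrics above can be computed directly from confusion-matrix counts; the counts in this sketch are made up to show why the distinction matters — with a dominant negative class, accuracy looks high while F-score reveals the weaker positive-class performance:

```python
def evaluate(tp, tn, fp, fn):
    """Compute precision, recall, F-score, and accuracy
    from confusion-matrix counts, as defined on the slide."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return precision, recall, f_score, accuracy

# Invented counts: many true negatives, few positives.
p, r, f, a = evaluate(tp=8, tn=80, fp=2, fn=10)
```

Here accuracy is 0.88 but the F-score is only about 0.57, which is the usual argument for reporting F-score on skewed data.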
12. Steps for using a classifier
13. Steps for using a classifier
- Convert the task into a classification problem (optional)
- Split the data into training/test/validation sets
- Convert the data into an attribute-value table
- Training
- Decoding
- Evaluation
14. Important subtasks (for you)
- Converting the data into an attribute-value table
- Define feature types
- Feature selection
- Convert an instance into a feature vector
- Understanding the training/decoding algorithms of the various learners.
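The "instance → feature vector" subtask is often done by instantiating feature templates. A minimal sketch for a tagging-style instance, with templates and input invented for illustration:

```python
def extract_features(words, i):
    """Instantiate three feature templates for position i:
    current word, previous word, and a 2-character suffix."""
    feats = {}
    feats["w=" + words[i]] = 1
    feats["prev=" + (words[i - 1] if i > 0 else "<s>")] = 1
    feats["suf2=" + words[i][-2:]] = 1
    return feats

# One row of the attribute-value table for the word "barks".
fv = extract_features(["the", "dog", "barks"], 2)
```

Each template ("current word", "previous word", ...) expands into many concrete features; the resulting sparse vectors are the rows of the attribute-value table.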
15. Notation
16. How learners differ?
17. How learners differ?
- Modeling
- Training stage
- Test (decoding) stage
18. Modeling
- What to optimize: given data x, find the class c that maximizes
- P(x, c)
- P(c | x)
- P(x | c)
- Decomposition
- Which variable conditions on which variable?
- What independence assumptions?
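The relationship between these three quantities can be checked on a toy joint distribution. The numbers below are invented; the point is that maximizing P(x, c) and P(c | x) over c always agrees, since P(c | x) = P(x, c)/P(x) and P(x) does not depend on c:

```python
# A hand-set joint distribution P(x, c) over x in {x1, x2}, c in {A, B}.
joint = {
    ("x1", "A"): 0.3, ("x1", "B"): 0.1,
    ("x2", "A"): 0.2, ("x2", "B"): 0.4,
}
# Marginal P(x), obtained by summing the joint over classes.
p_x = {x: sum(v for (xx, c), v in joint.items() if xx == x)
       for x in ("x1", "x2")}

def argmax_joint(x):
    """Pick the class maximizing P(x, c)."""
    return max(("A", "B"), key=lambda c: joint[(x, c)])

def argmax_posterior(x):
    """Pick the class maximizing P(c | x) = P(x, c) / P(x)."""
    return max(("A", "B"), key=lambda c: joint[(x, c)] / p_x[x])
```

Maximizing P(x | c) = P(x, c)/P(c), by contrast, can pick a different class, because P(c) does depend on c — which is why the choice of modeling objective matters.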
19. An example of different modeling
20. Two types of parameters
- Model parameters: ones learned during training. They are stored in a classifier.
- Ex: Naïve Bayes
- p(ci) and p(fk | ci)
- Internal (non-model) parameters: ones used to initialize a trainer, select features, etc.
- Ex: iteration number, threshold for feature selection, Gaussian prior for MaxEnt, etc.
21. How learners differ?
- Modeling
- What function to optimize?
- How does the decomposition work?
- What kinds of assumptions are made?
- How many types of model parameters?
- How many internal (or non-model) parameters?
- How to handle multi-class problems?
- How to handle non-binary features?
- ...
22. How learners differ? (cont)
- Training: how to estimate parameters?
- Decoding: how to find the best solution?
- Weaknesses and strengths
- Simplicity (conceptual)
- Efficiency at training
- Efficiency at testing time
- Handling multi-class
- Theoretical validity
- Prediction accuracy
- Stability and robustness
- Interpretability
- Scalability
- Output topN
- Sparse data problem (e.g., split data)
- ...
23. A comparison chart (from Thorsten Joachims' 2006 slides)
24. Topics not covered in this course
- Classification algorithms:
- Neural networks
- SVM
- ...
- Other ML methods:
- Many sequence labeling models: CRF
- Graphical models
- ...
- Theoretical properties
- Incremental induction
- ...
25. Assumptions made in this course
- Attribute-value table:
- All attribute values are known.
- All the rows are available at the beginning. No incremental induction.
- Evaluation:
- Different types of misclassification errors are equally important. If not, the evaluation metrics should be changed.
26. Using classification algorithms for non-classification problems
- Regression problem: the target attribute has an infinite number of values.
- Sequence labeling problems:
- POS tagging
- Parsing
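One common reduction is to cast sequence labeling as one classification decision per position, feeding earlier decisions back in as features. The stand-in classifier below is a tiny hand-written rule set, invented purely to show the shape of the reduction:

```python
def classify_word(word, prev_tag):
    """A stand-in classifier: tag one word given the previous tag.
    The rules here are toy assumptions, not a trained model."""
    if word in ("the", "a"):
        return "DET"
    if prev_tag == "DET":
        return "NOUN"
    return "VERB"

def tag(words):
    """POS tagging as left-to-right per-word classification."""
    tags, prev = [], "<s>"
    for w in words:
        prev = classify_word(w, prev)
        tags.append(prev)
    return tags

tags = tag(["the", "dog", "barks"])
```

Greedy left-to-right decoding like this is the simplest option; sequence models such as CRFs (not covered here) instead search over whole tag sequences.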