Title: Anne Jorstad
1. Leaf Classification from Local Boundary Analysis
- Anne Jorstad
- AMSC 664
- University of Maryland
- Spring 2008
- Final Report
Advisor: Dr. David Jacobs, Computer Science
2. Background
- Electronic Field Guide for Plants
3. Background
- Current System
- Inner-Distance Shape Context (IDSC)
- Measures the length of the shortest path between two points that stays entirely within the figure
- Good for detecting similarities between deformable structures
4. Background
- Current System
- All shape information is compared at a global level, with no specific consideration of edge types
Cephalanthus occidentalis (smooth boundary)
Carpinus caroliniana (serrated boundary)
5. Problem Statement
Use local boundary information to make
classification decisions that complement the
existing system.
6. The Algorithm
- Input
- Capture boundary curve
7. The Algorithm: Wavelets
- Discrete wavelet transform
- In: vector of points
- Out: two vectors, each half the original length
- Approximation coefficients
- general spatial information
- Detail coefficients
- local detail information
- Repeat for multiple scales
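The transform described above can be sketched in a few lines. The slides do not say which wavelet was used, so this sketch assumes the Haar wavelet, the simplest choice. The function names `haar_dwt` and `multiscale_details` and the sample values are illustrative only.

```python
import math


def haar_dwt(signal):
    """One level of the Haar discrete wavelet transform (an assumed wavelet).

    Returns (approximation, detail); each is half the length of `signal`.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    # Scaled pairwise sums carry the general spatial information.
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    # Scaled pairwise differences carry the local detail information.
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def multiscale_details(signal, scales):
    """Repeat the transform on the approximation, keeping the details of each scale."""
    details = []
    approx = list(signal)
    for _ in range(scales):
        approx, d = haar_dwt(approx)
        details.append(d)
    return approx, details


# Made-up x-coordinates of 8 boundary points.
x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
approx, details = multiscale_details(x, scales=3)
```

Each pass halves the length, so an 8-point input yields detail vectors of length 4, 2 and 1. A boundary curve would be handled by transforming its x- and y-coordinate sequences separately.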
8. The Algorithm: Wavelets
- Model leaf by its detail coefficients over several scales
[Figure: input boundary curve and its successive approximations, continually subtracting out detail information]
9. The Algorithm: Data
- Forget leaves
- Each boundary point becomes a vector of its detail coefficients across scales
- Lose one degree of freedom in preserving rotation invariance
- For 3 wavelet scales, each leaf is 2000 5-D points
- Combine data for all leaves
- (number of leaves) × 2000 5-D points
- Group all points into meaningful clusters
10. The Algorithm: Clustering
- Goal: Sort points into buckets to get a unique distribution for each leaf species
- K-Means Clustering
- Group all points into 36 representative clusters
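A minimal sketch of k-means (Lloyd's algorithm) for the clustering step, written with the standard library only. The helper names, the `initial_centers` parameter, and the tiny 2-D data set are illustrative assumptions. In the presentation the points are 5-D and k is 36.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest_center(point, centers):
    """Index of the center closest to `point`."""
    return min(range(len(centers)), key=lambda k: squared_distance(point, centers[k]))


def kmeans(points, k, iterations=50, seed=0, initial_centers=None):
    """Lloyd's algorithm: alternate assignment and mean-update steps."""
    if initial_centers is None:
        rng = random.Random(seed)
        initial_centers = rng.sample(points, k)
    centers = [list(c) for c in initial_centers]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest_center(p, centers)].append(p)
        # Update step: each center moves to the mean of its group.
        new_centers = []
        for idx, group in enumerate(groups):
            if group:
                dim = len(group[0])
                new_centers.append(
                    [sum(p[d] for p in group) / len(group) for d in range(dim)]
                )
            else:
                new_centers.append(centers[idx])  # keep the center of an empty group
        if new_centers == centers:
            break  # assignments no longer change
        centers = new_centers
    return centers


# Two well-separated made-up groups of 2-D points.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centers = kmeans(points, k=2, initial_centers=[points[0], points[-1]])
```

K-means only finds a local optimum, so the result depends on the starting centers. Implementations commonly run several random restarts and keep the best result.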
11. The Algorithm: Distribution Comparison
- The distribution of an individual leaf's 2000 points over the 36 clusters represents the leaf
[Figure: Leaf image and corresponding histogram for (a) Corylus americana, (b) Corylus americana, different example, (c) Asimina triloba]
12. The Algorithm: Distribution Comparison
- Compare distributions between leaves using the chi-squared distance
[Equation: chi-squared distance between two histograms, with its symbol definitions]
- Smallest distance defines best match
- New leaf is assigned the species of the closest match
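The comparison step can be sketched as follows. The exact formula from the slide is not preserved here, so the sketch assumes the common form of the chi-squared histogram distance, 0.5 * sum_k (h_k - g_k)^2 / (h_k + g_k). The function names and the 4-bin example data are made up for illustration (the presentation uses 36 bins).

```python
def histogram(labels, num_clusters):
    """Normalized histogram: fraction of a leaf's points falling in each cluster."""
    counts = [0] * num_clusters
    for label in labels:
        counts[label] += 1
    return [c / len(labels) for c in counts]


def chi_squared_distance(h, g):
    """Assumed form: 0.5 * sum_k (h_k - g_k)^2 / (h_k + g_k).

    Bins that are empty in both histograms are skipped to avoid dividing by zero.
    """
    total = 0.0
    for hk, gk in zip(h, g):
        if hk + gk > 0:
            total += (hk - gk) ** 2 / (hk + gk)
    return 0.5 * total


def nearest_neighbor(query, training):
    """`training` is a list of (species, histogram); return the closest species."""
    species, _ = min(training, key=lambda item: chi_squared_distance(query, item[1]))
    return species


# Hypothetical training histograms over 4 clusters.
training = [
    ("species_A", [0.7, 0.1, 0.1, 0.1]),
    ("species_B", [0.1, 0.1, 0.1, 0.7]),
]
# Cluster labels of a new leaf's boundary points (made up).
query = histogram([0, 0, 0, 1, 2, 3, 0, 0], num_clusters=4)
match = nearest_neighbor(query, training)
```

With this form, identical histograms have distance 0 and normalized histograms with no overlapping bins have distance 1.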
13. Validation
- Training data: 20 species, 10 examples of each
- → 200 leaves
- 10 serrated species
- 10 smooth species
14. Validation
- Test data: same 20 species, 5 new examples of each
- Nearest-Neighbor Classification
- Species classification: 46% correct
- Serration classification: 100% correct
- Closest match was to a species with the appropriate serration
Local serration information IS being captured!
16. Combining Results
- Original IDSC results on same data set
- Species correct: 62%
- Serration correct if species wrong: 53%
- No better than chance
- How to combine wavelet distances with IDSC distances?
18. Naïve Bayes Classification
- From Bayes' rule
- Can now calculate all relevant probabilities from training data
19. Naïve Bayes Classification
- Wavelet distances → binary serration value
- Add small linear smoothing term
- IDSC distances → species ranked in order from nearest to farthest
- Add Gaussian smoothing term
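The formula from the slides is not preserved here, so what follows is only the generic naïve Bayes form, written under assumed notation: c is a candidate species, s is the binary serration value derived from the wavelet distances, and r is the rank information derived from the IDSC distances. The precise definition of these terms in the original system may differ.

```latex
P(c \mid s, r) \;=\; \frac{P(s, r \mid c)\, P(c)}{P(s, r)}
\;\propto\; P(s \mid c)\, P(r \mid c)\, P(c)
```

The equality is Bayes' rule. The proportionality drops the denominator, which is the same for every species, and applies the naïve assumption that s and r are conditionally independent given the species. Each factor on the right can be estimated from counts in the training data, and smoothing keeps unseen combinations from receiving probability zero.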
20. Validation Results
- Test on same 20 species, 5 examples of each
- Adding serration information has improved overall
classification results!
21. Full Data Set
- 245 species, 7481 leaves
- Binary serration assignment no longer makes sense
22. Linear Optimization
- Find best linear weighting of distances
- Train over previous training set
[Figure: % correct vs. alpha]
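One way to picture the search for a linear weighting is a simple sweep over alpha. The exact form of the weighting is not given in the slides. This sketch assumes the convex combination alpha * d_wavelet + (1 - alpha) * d_IDSC and picks the alpha with the highest nearest-neighbor accuracy. All names and numbers are hypothetical.

```python
def combined_distance(d_wavelet, d_idsc, alpha):
    """Assumed weighting: alpha on the wavelet distance, (1 - alpha) on IDSC."""
    return alpha * d_wavelet + (1.0 - alpha) * d_idsc


def accuracy(alpha, queries, training_species):
    """Fraction of queries whose nearest training leaf has the true species.

    Each query is (true_species, wavelet_distances, idsc_distances), with one
    distance per training leaf, in the same order as `training_species`.
    """
    correct = 0
    for true_species, dw, di in queries:
        combined = [combined_distance(w, i, alpha) for w, i in zip(dw, di)]
        best = min(range(len(combined)), key=combined.__getitem__)
        if training_species[best] == true_species:
            correct += 1
    return correct / len(queries)


def best_alpha(queries, training_species, steps=100):
    """Grid search over alpha in [0, 1]; returns the first maximizer."""
    candidates = [i / steps for i in range(steps + 1)]
    return max(candidates, key=lambda a: accuracy(a, queries, training_species))


# Toy data: two training leaves and two queries (made-up distances).
training_species = ["A", "B"]
queries = [
    ("A", [0.2, 0.8], [0.6, 0.4]),  # wavelet distance favors A, IDSC favors B
    ("B", [0.4, 0.6], [0.9, 0.1]),  # wavelet distance favors A, IDSC favors B
]
alpha = best_alpha(queries, training_species)
```

In this toy data each measure alone classifies only one of the two queries correctly, while an intermediate alpha classifies both. The two distance types may lie on different scales, so in practice they would likely be normalized before mixing.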
23. Full Data Set
- Nearest-Neighbor Classification over all 7481 leaves
- Wavelet alone: 20% correct
- IDSC alone: 54% correct
- Combined: 64% correct
24. In Practice
- Electronic field guide displays top 5, 10 or 20 matches
- Calculate % correct in top n matches, for n = 1, ..., 20
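The top-n measure can be sketched as below. The slides do not say whether the displayed matches are individual leaves or distinct species, so this sketch assumes distinct species. The function names and data are illustrative.

```python
def top_n_species(distances, training_species, n):
    """First n distinct species, ordered by increasing distance to the query."""
    order = sorted(range(len(distances)), key=distances.__getitem__)
    ranked = []
    for idx in order:
        species = training_species[idx]
        if species not in ranked:
            ranked.append(species)
        if len(ranked) == n:
            break
    return ranked


def top_n_accuracy(queries, training_species, n):
    """Fraction of queries whose true species appears among the top n matches."""
    hits = sum(
        1
        for true_species, distances in queries
        if true_species in top_n_species(distances, training_species, n)
    )
    return hits / len(queries)


# Made-up example: four training leaves from three species, two queries.
training_species = ["A", "A", "B", "C"]
queries = [
    ("B", [0.1, 0.2, 0.3, 0.9]),  # true species is only the second-best match
    ("C", [0.5, 0.6, 0.7, 0.2]),  # true species is the best match
]
curve = [top_n_accuracy(queries, training_species, n) for n in (1, 2, 3)]
```

Plotting such a curve for n = 1, ..., 20 shows how often the right species would appear on the field guide's screen for each list length.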
25. In Practice
[Figure: % correct vs. number of matches considered]
26. In Practice
- Need results in near real-time
- Otherwise no benefit over paper field guides
- Running time
- Preprocessing (several hours)
- Determine cluster centers
- Determine distributions for each leaf
- On the spot (0.92 seconds)
- Calculate single distribution
- Compare to all distributions in system
27. Conclusions
- Wavelets do capture local serration information
- Wavelet + IDSC classification does a better overall job than the original IDSC alone
- Calculations can be done in real time to make the system realistic to use