Title: Minimal Loss Hashing for Compact Binary Codes
1. Minimal Loss Hashing for Compact Binary Codes
- Mohammad Norouzi
- David Fleet
- University of Toronto
2-4. Near Neighbor Search
5. Similarity-Preserving Binary Hashing
- Why binary codes?
- Sub-linear search using hash indexing (even exhaustive linear search is fast)
- Binary codes are storage-efficient
6. Similarity-Preserving Binary Hashing
The hash function is a thresholded linear projection, $b(\mathbf{x}; W) = \mathrm{thr}(W\mathbf{x})$: the $k$-th bit is on iff $\mathbf{w}_k^T \mathbf{x} > 0$, where $\mathbf{w}_k$ is the $k$-th row of $W$. Random projections of this form are used by locality-sensitive hashing (LSH) and related techniques [Indyk & Motwani '98; Charikar '02; Raginsky & Lazebnik '09].
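As a concrete sketch (not the authors' code), the hash function in NumPy; drawing W from a Gaussian gives the LSH-style random-projection baseline, whereas MLH learns W:

```python
import numpy as np

def hash_codes(X, W):
    """b(x; W) = thr(W x): bit k of the code is 1 iff w_k . x > 0."""
    return (X @ W.T > 0).astype(np.uint8)

# LSH-style baseline: random Gaussian projections (MLH learns W instead)
rng = np.random.default_rng(0)
q, d = 32, 512                        # 32-bit codes for 512-D GIST descriptors
W = rng.standard_normal((q, d))
X = rng.standard_normal((5, d))       # five example descriptors
codes = hash_codes(X, W)              # shape (5, 32), entries in {0, 1}
```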
7. Learning Binary Hash Functions
- Reasons to learn hash functions
- to find more compact binary codes
- to preserve general similarity measures
- Previous work
- boosting [Shakhnarovich et al. '03]
- neural nets [Salakhutdinov & Hinton '07; Torralba et al. '07]
- spectral methods [Weiss et al. '08]
- loss-based methods [Kulis & Darrell '09]
8. Formulation
9. Loss Function
Similar items should map to nearby hash codes; dissimilar items should map to very different codes.
10. Hinge Loss
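The loss formula on this slide is an image in the transcript; a reconstruction of the hinge-like loss in the paper's form, where $m$ is the Hamming distance between the two codes, $s \in \{0,1\}$ the similarity label, $\rho$ a distance threshold, and $\lambda$ the relative penalty for dissimilar pairs:

$$
\ell_{\rho}(m, s) \;=\;
\begin{cases}
\max(m - \rho + 1,\; 0) & \text{if } s = 1 \text{ (similar pair)}\\
\lambda\,\max(\rho - m + 1,\; 0) & \text{if } s = 0 \text{ (dissimilar pair)}
\end{cases}
$$

Similar pairs incur no loss when their codes are within $\rho$ bits; dissimilar pairs incur no loss when they are more than $\rho$ bits apart.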
11. Empirical Loss
- Good
- incorporates quantization and Hamming distance
- Not so good
- discontinuous, non-convex objective function
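Written out (again reconstructing the slide's image), the empirical loss sums the hinge-like loss over the labeled training pairs $\mathcal{S}$:

$$
\mathcal{L}(W) \;=\; \sum_{(i,j) \in \mathcal{S}} \ell_{\rho}\!\left(\bigl\| b(\mathbf{x}_i; W) - b(\mathbf{x}_j; W) \bigr\|_H,\; s_{ij}\right)
$$

where $\|\cdot\|_H$ is Hamming distance. The thresholding inside $b(\cdot\,; W)$ is what makes this objective discontinuous and non-convex in $W$.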
12. We minimize an upper bound on the empirical loss, inspired by structural SVM formulations [Taskar et al. '03; Tsochantaridis et al. '04; Yu & Joachims '09].
13. Bound on Loss
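The inequality itself is shown as an image on the slide; in the structured-prediction form it reads, for a pair $(\mathbf{x}, \mathbf{z})$ with label $s$ and codes ranging over $\{0,1\}^q$:

$$
\ell_{\rho}\!\left(\| b(\mathbf{x}) - b(\mathbf{z}) \|_H,\, s\right)
\;\le\;
\max_{\mathbf{g},\mathbf{h} \in \{0,1\}^q}\!\left[\,\ell_{\rho}\!\left(\|\mathbf{g}-\mathbf{h}\|_H,\, s\right) + \mathbf{g}^{T} W \mathbf{x} + \mathbf{h}^{T} W \mathbf{z}\,\right]
\;-\; \max_{\mathbf{g} \in \{0,1\}^q} \mathbf{g}^{T} W \mathbf{x}
\;-\; \max_{\mathbf{h} \in \{0,1\}^q} \mathbf{h}^{T} W \mathbf{z}
$$

The bound holds because $b(\mathbf{x}; W) = \operatorname{argmax}_{\mathbf{b} \in \{0,1\}^q} \mathbf{b}^T W \mathbf{x}$, so the first max dominates the value attained at the true codes while the two subtracted terms equal their scores.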
14. Bound on Loss
- Remarks
- piecewise linear in W
- convex-concave in W
- relates to structural SVMs with latent variables [Yu & Joachims '09]
15. Bound on Empirical Loss
- Loss-adjusted inference (sketched below)
- Exact
- Efficient
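A minimal Python sketch (function names and parameter defaults are illustrative, not from the paper) of why loss-adjusted inference is exact and efficient: the loss depends on $(\mathbf{g}, \mathbf{h})$ only through their Hamming distance $m$, so for each $m$ one picks the $m$ bits where disagreeing pays most, then maximizes over $m$ in $O(q \log q)$:

```python
import numpy as np

def hinge_loss(m, s, rho=4, lam=0.5):
    """Hinge-like loss on Hamming distance m for a pair with similarity label s."""
    if s == 1:
        return max(m - rho + 1, 0)       # similar pairs should be within rho bits
    return lam * max(rho - m + 1, 0)     # dissimilar pairs should be beyond rho bits

def loss_adjusted_inference(u, v, s, rho=4, lam=0.5):
    """Exactly maximize loss(g, h, s) + g.u + h.v over g, h in {0,1}^q.

    u = W x and v = W z are the projections of the two inputs."""
    q = len(u)
    agree = np.maximum(u + v, 0.0)       # best per-bit score if g_k == h_k
    differ = np.maximum(u, v)            # best per-bit score if g_k != h_k
    gain = differ - agree                # gain from letting bit k disagree
    order = np.argsort(-gain)            # most profitable disagreements first
    base = agree.sum()
    best_val, best_m = -np.inf, 0
    cum = 0.0
    for m in range(q + 1):               # m = number of disagreeing bits
        val = base + cum + hinge_loss(m, s, rho, lam)
        if val > best_val:
            best_val, best_m = val, m
        if m < q:
            cum += gain[order[m]]
    # reconstruct the maximizing codes g, h
    disagree = np.zeros(q, dtype=bool)
    disagree[order[:best_m]] = True
    g = np.where(disagree, u >= v, (u + v) > 0).astype(np.int8)
    h = np.where(disagree, v > u, (u + v) > 0).astype(np.int8)
    return g, h
```

In training, $(\mathbf{u}, \mathbf{v}) = (W\mathbf{x}, W\mathbf{z})$, and the returned loss-adjusted codes drive the update on the next slide.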
16. Perceptron-like Learning
[McAllester et al. '10]
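A sketch of the resulting perceptron-like step on one training pair, reusing loss_adjusted_inference from the previous sketch (step size and parameter values are illustrative assumptions):

```python
import numpy as np

def mlh_update(W, x, z, s, eta=1e-2, rho=4, lam=0.5):
    """One perceptron-like step on the convex-concave bound for a pair (x, z, s)."""
    u, v = W @ x, W @ z
    g, h = loss_adjusted_inference(u, v, s, rho, lam)  # loss-adjusted codes
    bx = (u > 0).astype(np.int8)                       # current code b(x; W)
    bz = (v > 0).astype(np.int8)                       # current code b(z; W)
    # subgradient of the bound w.r.t. W: move W away from the loss-adjusted
    # codes and toward the codes it currently assigns
    grad = np.outer(g - bx, x) + np.outer(h - bz, z)
    return W - eta * grad
```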
17. Experiment: Euclidean ANN
Similarity based on Euclidean distance
- Datasets
- LabelMe (GIST)
- MNIST (pixels)
- PhotoTourism (SIFT)
- Peekaboom (GIST)
- Nursery (8D attributes)
- 10D Uniform
18. Experiment: Euclidean ANN
- 22K LabelMe
- 512D GIST
- 20K training
- 2K testing
- 1% of pairs are similar
- Evaluation (see the helper below)
- Precision = hits / number of items retrieved
- Recall = hits / number of similar items
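For concreteness, the two metrics as a small helper (a hypothetical function, not the evaluation code used in the paper):

```python
def precision_recall(retrieved, relevant):
    """Precision and recall for one query, given sets of item ids."""
    hits = len(retrieved & relevant)             # retrieved items that are similar
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```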
19. Techniques of interest
- MLH: minimal loss hashing (this work)
- LSH: locality-sensitive hashing [Charikar '02]
- SH: spectral hashing [Weiss, Torralba & Fergus '09]
- SIKH: shift-invariant kernel hashing [Raginsky & Lazebnik '09]
- BRE: binary reconstructive embedding [Kulis & Darrell '09]
20-22. Euclidean LabelMe, 32 bits
23-24. Euclidean LabelMe, 64 bits
25. Euclidean LabelMe, 128 bits
26. Euclidean LabelMe, 256 bits
27. Experiment: Semantic ANN
- Semantic similarity measure based on annotations (object labels) from the LabelMe database
- 512D GIST, 20K training, 2K testing
- Techniques of interest
- MLH: minimal loss hashing
- NN: nearest neighbor in GIST space
- NNCA: multilayer network with RBM pre-training and nonlinear NCA fine-tuning [Torralba et al. '09; Salakhutdinov & Hinton '07]
28-29. Semantic LabelMe
33. Summary
- A formulation for learning binary hash functions, based on
- structured prediction with latent variables
- a hinge-like loss function for similarity search
- Experiments show that with minimal loss hashing
- binary codes can be made more compact
- semantic similarity based on human labels can be preserved