Title: Fast Algorithms for Analyzing Massive Data
Slide 1: Fast Algorithms for Analyzing Massive Data
- Alexander Gray
- Georgia Institute of Technology
- www.fast-lab.org
Slide 2: The FASTlab: Fundamental Algorithmic and Statistical Tools Laboratory (www.fast-lab.org)
- Alexander Gray: Assoc. Prof., Applied Math & CS; PhD, CS
- Arkadas Ozakin: Research Scientist, Math & Physics; PhD, Physics
- Dongryeol Lee: PhD student, CS & Math
- Ryan Riegel: PhD student, CS & Math
- Sooraj Bhat: PhD student, CS
- Nishant Mehta: PhD student, CS
- Parikshit Ram: PhD student, CS & Math
- William March: PhD student, Math & CS
- Hua Ouyang: PhD student, CS
- Ravi Sastry: PhD student, CS
- Long Tran: PhD student, CS
- Ryan Curtin: PhD student, EE
- Ailar Javadi: PhD student, EE
- Anita Zakrzewska: PhD student, CS
- 5-10 MS students and undergraduates
Slide 3: 7 tasks of machine learning / data mining
- Querying: spherical range-search O(N), orthogonal range-search O(N), nearest-neighbor O(N), all-nearest-neighbors O(N^2)
- Density estimation: mixture of Gaussians, kernel density estimation O(N^2), kernel conditional density estimation O(N^3)
- Classification: decision tree, nearest-neighbor classifier O(N^2), kernel discriminant analysis O(N^2), support vector machine O(N^3), Lp SVM
- Regression: linear regression, LASSO, kernel regression O(N^2), Gaussian process regression O(N^3)
- Dimension reduction: PCA, non-negative matrix factorization, kernel PCA O(N^3), maximum variance unfolding O(N^3), Gaussian graphical models, discrete graphical models
- Clustering: k-means, mean-shift O(N^2), hierarchical (friends-of-friends) clustering O(N^3)
- Testing and matching: MST O(N^3), bipartite cross-matching O(N^3), n-point correlation 2-sample testing O(N^n), kernel embedding
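To make the quadratic costs in the list above concrete, here is a minimal sketch of the naive all-nearest-neighbors baseline that the O(N^2) entry refers to: every point is compared against every other point. This is an illustrative toy (names are made up), not the lab's tree-based algorithm.

```python
import numpy as np

def all_nearest_neighbors_naive(X):
    # For each point, compute the distance to every other point and
    # keep the closest: Theta(N^2) distance evaluations overall.
    N = len(X)
    nn = np.empty(N, dtype=int)
    for i in range(N):
        d = np.sum((X - X[i]) ** 2, axis=1)  # squared distances to all N points
        d[i] = np.inf                        # a point is not its own neighbor
        nn[i] = np.argmin(d)
    return nn

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
nn = all_nearest_neighbors_naive(X)
```

Each of the N queries does O(N) work, which is exactly the scaling that the tree-based methods on the later slides reduce.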
Slide 5: 7 tasks of machine learning / data mining (with recent FASTlab methods)
- Querying: spherical range-search O(N), orthogonal range-search O(N), nearest-neighbor O(N), all-nearest-neighbors O(N^2)
- Density estimation: mixture of Gaussians, kernel density estimation O(N^2), kernel conditional density estimation O(N^3), submanifold density estimation [Ozakin & Gray, NIPS 2010] O(N^3), convex adaptive kernel estimation [Sastry & Gray, AISTATS 2011] O(N^4)
- Classification: decision tree, nearest-neighbor classifier O(N^2), kernel discriminant analysis O(N^2), support vector machine O(N^3), Lp SVM, non-negative SVM [Guan et al., 2011]
- Regression: linear regression, LASSO, kernel regression O(N^2), Gaussian process regression O(N^3)
- Dimension reduction: PCA, non-negative matrix factorization, kernel PCA O(N^3), maximum variance unfolding O(N^3), Gaussian graphical models, discrete graphical models, rank-preserving maps [Ouyang & Gray, ICML 2008] O(N^3), isometric separation maps [Vasiloglou, Gray, & Anderson, MLSP 2009] O(N^3), isometric NMF [Vasiloglou, Gray, & Anderson, MLSP 2009] O(N^3), functional ICA [Mehta & Gray, 2009], density preserving maps [Ozakin & Gray, in prep] O(N^3)
- Clustering: k-means, mean-shift O(N^2), hierarchical (friends-of-friends) clustering O(N^3)
- Testing and matching: MST O(N^3), bipartite cross-matching O(N^3), n-point correlation 2-sample testing O(N^n), kernel embedding
Computational Problem!
Slide 7: The 7 Giants of Data (computational problem types)
[Gray, Indyk, Mahoney, Szalay, in National Acad. of Sci. Report on Analysis of Massive Data, in prep]
- Basic statistics: means, covariances, etc.
- Generalized N-body problems: distances, geometry
- Graph-theoretic problems: discrete graphs
- Linear-algebraic problems: matrix operations
- Optimizations: unconstrained, convex
- Integrations: general dimension
- Alignment problems: dynamic programming, matching
Slide 8: 7 general strategies
- Divide and conquer / indexing (trees)
- Function transforms (series)
- Sampling (Monte Carlo, active learning)
- Locality (caching)
- Streaming (online)
- Parallelism (clusters, GPUs)
- Problem transformation (reformulations)
Slide 9: 1. Divide and conquer
- Fastest approach for:
  - nearest neighbor, range search (exact) O(log N) [Bentley, 1970]; all-nearest-neighbors (exact) O(N) [Gray & Moore, NIPS 2000; Ram, Lee, March, & Gray, NIPS 2010]; anytime nearest neighbor (exact) [Ram & Gray, SDM 2012]; max inner product [Ram & Gray, under review]
  - mixture of Gaussians [Moore, NIPS 1999]; k-means [Pelleg & Moore, KDD 1999]; mean-shift clustering O(N) [Lee & Gray, AISTATS 2009]; hierarchical clustering (single linkage, friends-of-friends) O(N log N) [March & Gray, KDD 2010]
  - nearest neighbor classification [Liu, Moore, & Gray, NIPS 2004]; kernel discriminant analysis O(N) [Riegel & Gray, SDM 2008]
  - n-point correlation functions O(N^(log n)) [Gray & Moore, NIPS 2000; Moore et al., Mining the Sky 2000]; multi-matcher jackknifed npcf [March & Gray, under review]
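The tree-based divide and conquer behind these results can be illustrated with a textbook Bentley-style kd-tree for exact nearest-neighbor search. This is a minimal teaching sketch (all function names are invented here), not the FASTlab implementation: descend toward the query's side of each split, and visit the far side only when the splitting plane could hide a closer point.

```python
import math, random

def build_kdtree(points, depth=0):
    # Leaves hold at most one point; internal nodes split on one axis.
    if len(points) <= 1:
        return {"leaf": points}
    axis = depth % len(points[0])                  # cycle through dimensions
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {"axis": axis, "split": points[mid][axis],
            "left": build_kdtree(points[:mid], depth + 1),
            "right": build_kdtree(points[mid:], depth + 1)}

def nn_search(node, q, best=None):
    # Visit the near child first; prune the far child whenever the
    # splitting plane is farther away than the best distance so far.
    if "leaf" in node:
        for p in node["leaf"]:
            d = math.dist(p, q)
            if best is None or d < best[0]:
                best = (d, p)
        return best
    near, far = ((node["left"], node["right"])
                 if q[node["axis"]] < node["split"]
                 else (node["right"], node["left"]))
    best = nn_search(near, q, best)
    if best is None or abs(q[node["axis"]] - node["split"]) < best[0]:
        best = nn_search(far, q, best)
    return best

random.seed(1)
pts = [(random.random(), random.random()) for _ in range(500)]
tree = build_kdtree(pts)
q = (0.5, 0.5)
d_tree, p_tree = nn_search(tree, q)
```

The pruning test is what turns an O(N) scan into roughly O(log N) per query on low-dimensional data, while still returning the exact answer.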
Slide 10: 3-point correlation
VIRGO simulation data, N = 75,000,000 (biggest previous: 20K)
- naive: 5x10^9 sec. (150 years)
- multi-tree: 55 sec. (exact)
Observed scaling: n=2: O(N); n=3: O(N^(log 3)); n=4: O(N^2)
Slide 11: 3-point correlation (10^6 points, galaxy simulation data)
Naive O(N^n) (estimated) vs. single-bandwidth [Gray & Moore 2000; Moore et al. 2000] vs. new multi-bandwidth [March & Gray, in prep 2010]; each speedup is relative to the previous column.
- 2-point corr., 100 matchers: naive 2.0x10^7 s; single 352.8 s (56,000x); multi 4.96 s (71.1x)
- 3-point corr., 243 matchers: naive 1.1x10^11 s; single 891.6 s (1.23x10^8 x); multi 13.58 s (65.6x)
- 4-point corr., 216 matchers: naive 2.3x10^14 s; single 14530 s (1.58x10^10 x); multi 503.6 s (28.8x)
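For reference, the simplest of these statistics, a single-matcher 2-point count, is just a double loop over pairs. This naive O(N^2) baseline (helper names are illustrative) is what the single- and multi-tree algorithms in the timings above replace:

```python
import math, random

def two_point_count(points, r_lo, r_hi):
    # Count unordered pairs whose separation falls inside the matcher
    # bin [r_lo, r_hi): N(N-1)/2 distance evaluations.
    count = 0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if r_lo <= math.dist(points[i], points[j]) < r_hi:
                count += 1
    return count

random.seed(0)
pts = [(random.random(), random.random(), random.random())
       for _ in range(300)]
c = two_point_count(pts, 0.1, 0.2)
```

Higher-order n-point counts nest further loops, giving the O(N^n) naive cost in the table; tree algorithms prune whole blocks of pairs (or triples) whose bounding boxes cannot satisfy the matcher.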
Slide 12: 2. Function transforms
- Fastest approach for:
  - kernel estimation (low-ish dimension): dual-tree fast Gauss transforms (multipole/Hermite expansions) [Lee, Gray, & Moore, NIPS 2005; Lee & Gray, UAI 2006]
  - KDE and GP (kernel density estimation, Gaussian process regression) (high-D): random Fourier functions [Lee & Gray, in prep]
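The random-Fourier-functions idea can be sketched in the style of Rahimi and Recht's random features (this is the generic construction, not the cited Lee & Gray algorithm, and all names below are illustrative): draw frequencies from the Gaussian kernel's spectral density so that an inner product of finite feature maps approximates the kernel.

```python
import numpy as np

def random_fourier_features(X, D, gamma, rng):
    # For the kernel exp(-gamma * ||x - y||^2), sample frequencies
    # w ~ N(0, 2*gamma*I) plus uniform phases; then z(x).z(y) is an
    # unbiased Monte Carlo estimate of the kernel value.
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, D))
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))
Z = random_fourier_features(X, 4000, gamma=0.5, rng=rng)
K_approx = Z @ Z.T                       # D-dimensional approximation
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K_exact = np.exp(-0.5 * sq)              # exact Gaussian kernel matrix
err = float(np.abs(K_approx - K_exact).max())
```

The payoff is computational: kernel sums become ordinary D-dimensional linear algebra, with error shrinking like 1/sqrt(D) independent of the input dimension, which is why this route suits high-D KDE and GP regression.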
Slide 13: 3. Sampling
- Fastest approach for (approximate):
  - PCA: cosine trees [Holmes, Gray, & Isbell, NIPS 2008]
  - kernel estimation: bandwidth learning [Holmes, Gray, & Isbell, NIPS 2006; Holmes, Gray, & Isbell, UAI 2007]; Monte Carlo multipole method (with SVD trees) [Lee & Gray, NIPS 2009]
  - nearest-neighbor: distance-approximate spill trees with random projections [Liu, Moore, Gray, & Yang, NIPS 2004]; rank-approximate [Ram, Ouyang, & Gray, NIPS 2009]
- Rank-approximate NN:
  - best meaning-retaining approximation criterion in the face of high-dimensional distances
  - more accurate than LSH
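In the same spirit, though far simpler than the cited methods, a kernel sum can be estimated from a random subsample, trading a controlled 1/sqrt(m) error for a large speedup. A minimal sketch (names are hypothetical, 1-D for brevity):

```python
import math, random

def kde_at(q, points, h):
    # Exact Gaussian KDE value at query q: an O(N) sum per query.
    return sum(math.exp(-((q - p) ** 2) / (2 * h * h)) for p in points) / len(points)

def kde_at_sampled(q, points, h, m, rng):
    # Unbiased Monte Carlo estimate from m sampled points; the standard
    # error shrinks as 1/sqrt(m), independent of N.
    sample = [points[rng.randrange(len(points))] for _ in range(m)]
    return sum(math.exp(-((q - p) ** 2) / (2 * h * h)) for p in sample) / m

rng = random.Random(0)
points = [rng.gauss(0.0, 1.0) for _ in range(20000)]
exact = kde_at(0.0, points, h=0.5)
approx = kde_at_sampled(0.0, points, h=0.5, m=2000, rng=rng)
```

The cited work goes further by combining such sampling with tree and SVD structure so the error can be bounded adaptively rather than on average.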
Slide 14: 3. Sampling (cont.)
- Active learning: the sampling can depend on previous samples
- Linear classifiers: rigorous framework for pool-based active learning [Sastry & Gray, AISTATS 2012]
- Empirically allows a reduction in the number of objects that require labeling
- Theoretical rigor: unbiasedness
Slide 15: 4. Caching
- Fastest approach for (using disk):
  - nearest-neighbor, 2-point: disk-based tree algorithms in Microsoft SQL Server [Riegel, Aditya, Budavari, & Gray, in prep]
- Builds a kd-tree on top of the built-in B-trees
- Fixed-pass algorithm to build the kd-tree
No. of points   MLDB (dual-tree)   Naive
40,000          8 sec              159 sec
200,000         43 sec             3480 sec
2,000,000       297 sec            80 hours
10,000,000      29 min 27 sec      74 days
20,000,000      58 min 48 sec      280 days
40,000,000      112 min 32 sec     2 years
Slide 16: 5. Streaming / online
- Fastest approach for (approximate, or streaming):
  - online learning / stochastic optimization: just use the current sample to update the gradient
  - SVM (squared hinge loss): stochastic Frank-Wolfe [Ouyang & Gray, SDM 2010]
  - SVM, LASSO, et al.: noise-adaptive stochastic approximation [Ouyang & Gray, in prep, on arXiv]; accelerated non-smooth SGD [Ouyang & Gray, under review]
    - faster than SGD
    - solves the step-size problem
    - beats all existing convergence rates
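The "just use the current sample" update is easiest to see on a toy model. A minimal SGD sketch for 1-D least squares (illustrative only, not the cited algorithms; names are made up):

```python
import random

def sgd_linear(data, lr=0.1, epochs=30, seed=0):
    # Plain SGD for y ~ w*x under squared loss: each update touches only
    # the current sample, so one pass over the stream is O(N).
    rng = random.Random(seed)
    w = 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            grad = 2 * (w * x - y) * x   # gradient of (w*x - y)^2 in w
            w -= lr * grad
    return w

data = [(x / 10, 3.0 * (x / 10)) for x in range(1, 21)]  # noiseless y = 3x
w = sgd_linear(data)
```

The step-size problem mentioned above is visible even here: too large an `lr` diverges, too small crawls; the cited noise-adaptive methods remove that tuning burden with provable rates.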
Slide 17: 6. Parallelism
- Fastest approach for (using many machines):
  - KDE, GP, n-point: distributed trees [Lee & Gray, SDM 2012], 6000 cores; [March et al., in prep for Gordon Bell Prize 2012], 100K cores?
    - each process owns the global tree and its local tree
    - first log p levels built in parallel; each process determines where to send data
    - asynchronous averaging; provable convergence
  - SVM, LASSO, et al.: distributed online optimization [Ouyang & Gray, in prep, on arXiv]
    - provable theoretical speedup for the first time
Slide 18: 7. Transformations between problems
- Change the problem type:
  - linear algebra on kernel matrices -> N-body inside conjugate gradient [Gray, TR 2004]
  - Euclidean graphs -> N-body problems [March & Gray, KDD 2010]
  - HMM as graph -> matrix factorization [Tran & Gray, in prep]
- Optimizations: reformulate the objective and constraints:
  - maximum variance unfolding: SDP via Burer-Monteiro convex relaxation [Vasiloglou, Gray, & Anderson, MLSP 2009]
  - Lq SVM, 0 < q < 1: DC programming [Guan & Gray, CSDA 2011]
  - L0 SVM: mixed integer nonlinear program via perspective cuts [Guan & Gray, under review]
  - do reformulations automatically [Agarwal et al., PADL 2010; Bhat et al., POPL 2012]
- Create new ML methods with desired computational properties:
  - density estimation trees: nonparametric density estimation, O(N log N) [Ram & Gray, KDD 2011]
  - local linear SVMs: nonlinear classification, O(N log N) [Sastry & Gray, under review]
  - discriminative local coding: nonlinear classification, O(N log N) [Mehta & Gray, under review]
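The first transformation above only requires writing conjugate gradient against a matrix-vector product callback: the kernel matrix is never formed, and the matvec is where a fast N-body-style summation plugs in. A generic sketch (standard CG; the dense 2x2 matvec below stands in for the fast summation, and all names are illustrative):

```python
def conjugate_gradient(matvec, b, tol=1e-12, max_iter=100):
    # Standard CG for a symmetric positive definite system A x = b.
    # A is touched only through matvec(p), so any fast summation that
    # computes A*p (e.g. a tree-based kernel sum) can be substituted.
    x = [0.0] * len(b)
    r = list(b)                       # residual for the zero initial guess
    p = list(r)
    rs = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        Ap = matvec(p)
        alpha = rs / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = sum(ri * ri for ri in r)
        if rs_new < tol:
            break
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

# Toy s.p.d. system: A = [[4, 1], [1, 3]], b = [1, 2].
A = [[4.0, 1.0], [1.0, 3.0]]
matvec = lambda v: [sum(a * vi for a, vi in zip(row, v)) for row in A]
x = conjugate_gradient(matvec, [1.0, 2.0])
```

With a kernel matrix, each iteration's matvec is exactly a weighted kernel summation, i.e. a generalized N-body problem, so an O(N)-ish summation turns an O(N^3) solve into a handful of fast passes.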
Slide 19: Software
- For academic use only: MLPACK
  - open source, C++, written by students
  - data must fit in RAM (distributed version in progress)
- For institutions: Skytree Server
  - first commercial-grade high-performance machine learning server
  - fastest, biggest ML available: up to 10,000x faster than existing solutions (on one machine)
  - v.12, April 2012-ish: distributed, streaming
  - connects to stats packages, Matlab, DBMS, Python, etc.
  - www.skytreecorp.com
- Colleagues: email me to try it out: agray@cc.gatech.edu