Title: Spatial Big Data Challenges Intersecting Cloud Computing and Mobility
1. Spatial Big Data Challenges Intersecting Cloud Computing and Mobility
- Shashi Shekhar
- McKnight Distinguished University Professor
- Department of Computer Science and Engineering
- University of Minnesota
- www.cs.umn.edu/shekhar
2. Spatial Databases: Representative Projects
3. Why cloud computing for spatial data?
- Geospatial Intelligence (Dr. M. Pagels, DARPA, 2006)
- Estimated at 140 terabytes per day, 150 petabytes annually
- Annual volume is 150x historical content of the entire internet
- Analyze daily data as well as historical data
4. Eco-Routing
- Minimize fuel consumption and GHG (greenhouse gas) emissions
- rather than proxies, e.g., distance, travel time
- avoid congestion, idling at red lights, turns, elevation changes, etc.
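The difference between routing on a proxy (distance) and routing on fuel directly can be sketched with an ordinary shortest-path search in which only the edge cost changes. This is a minimal illustration, not the method from the talk: the tiny road graph, the node names, and the `km` and `liters` edge attributes are all made-up assumptions.

```python
import heapq

def best_route(graph, source, target, cost):
    """Dijkstra's algorithm. graph[u] is a list of (v, attrs) pairs, where
    attrs is a dict of non-negative edge costs; `cost` selects which one to minimize."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, attrs in graph.get(u, []):
            nd = d + attrs[cost]
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None, float("inf")
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], dist[target]

# Hypothetical network: the route via B is shorter, but (say) congested and
# hilly, so it burns more fuel than the longer route via C.
roads = {
    "A": [("B", {"km": 2.0, "liters": 0.40}), ("C", {"km": 3.0, "liters": 0.20})],
    "B": [("D", {"km": 2.0, "liters": 0.40})],
    "C": [("D", {"km": 3.0, "liters": 0.20})],
}

shortest_path, shortest_km = best_route(roads, "A", "D", cost="km")
eco_path, eco_liters = best_route(roads, "A", "D", cost="liters")
```

On this toy graph the two objectives pick different routes, which is the point of optimizing fuel rather than a proxy.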
5. Real-time and Historic Travel-time, Fuel Consumption, GPS Tracks
6. Eco-Routing Research Challenges
- Frames of Reference
- Absolute to moving-object based (Lagrangian)
- Data model of Lagrangian graphs
- Conceptual: generalize time-expanded graph
- Logical: Lagrangian abstract data types
- Physical: clustering, index, Lagrangian routing algorithms
- Flexible Architecture
- Allow inclusion of new algorithms, e.g., GPS-track mining
- Merge solutions from different algorithms
- Geo-sensing of events
- e.g., volunteered geographic information (e.g., OpenStreetMap)
- social unrest (Ushahidi), flash mobs
- Geo-Prediction
- e.g., predict track of a hurricane or a vehicle
- Challenges: auto-correlation, non-stationarity
- Geo-privacy
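The time-expanded graph named in the data-model bullets can be illustrated with a small sketch. It shows only the standard construction (one copy of each node per time step, plus wait edges), not the Lagrangian generalization the slide proposes. The function names, the integer time steps, and the toy `travel_time` function are assumptions made for illustration.

```python
def time_expanded_graph(edges, travel_time, horizon):
    """Build a time-expanded graph over time steps 0..horizon.
    edges: list of (u, v) road segments.
    travel_time(u, v, t): integer number of steps needed when departing u at time t."""
    nodes = {n for edge in edges for n in edge}
    teg = {}
    for t in range(horizon):
        for n in nodes:
            # Wait edge: stay at the same place for one time step.
            teg.setdefault((n, t), []).append((n, t + 1))
        for u, v in edges:
            arrive = t + travel_time(u, v, t)
            if arrive <= horizon:
                teg[(u, t)].append((v, arrive))
    return teg

def earliest_arrival(teg, source, target, start=0):
    """Earliest time at which target is reachable from (source, start), or None."""
    seen = {(source, start)}
    stack = [(source, start)]
    while stack:
        node = stack.pop()
        for nxt in teg.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    times = [t for n, t in seen if n == target]
    return min(times) if times else None

def toy_travel_time(u, v, t):
    # Hypothetical congestion: segment A->B is slow only for departures at t = 0.
    if (u, v) == ("A", "B") and t == 0:
        return 4
    return 1

teg = time_expanded_graph([("A", "B"), ("B", "C")], toy_travel_time, horizon=6)
arrival = earliest_arrival(teg, "A", "C")
```

In this toy case, leaving immediately reaches C at time 5, while waiting one step at A reaches C at time 3. This is the kind of time-dependent behavior a static graph cannot express.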
7. Cloud Computing and Spatial Big Data
- Motivation
- Case Study 1: Simpler to Parallelize
- Case Study 2: Harder
- Case Study 3: Hardest
- Wrap up
8. Simpler: Land-cover Classification
- Multiscale Multigranular Image Classification into land-cover categories
Inputs
Output at 2 Scales
9. Parallelization Choice
- 1. Initialize parameters and memory
- 2. for each Spatial Scale
- 3.   for each Quad
- 4.     for each Class
- 5.       Calculate Quality Measure
- 6.     end for Class
- 7.   end for Quad
- 8. end for Spatial Scale
- 9. Post-processing
Input: 64 x 64 image (Plymouth County, MA), 4 classes (All, Woodland, Vegetated, Suburban)
Language: UPC
Platform: Cray X1 (1-8 processors)
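One possible parallelization choice for the loop nest above is to treat each quad as an independent task (parallelizing step 3) and keep the class loop sequential inside the task. The sketch below illustrates that choice in Python rather than UPC. The `quality` function is a placeholder assumption (fraction of pixels already carrying a label), not the quality measure used in the original study, and a thread pool stands in for the processors.

```python
from concurrent.futures import ThreadPoolExecutor

def quads(size, scale):
    """Yield (row, col) origins of the scale x scale quads tiling a size x size image."""
    for r in range(0, size, scale):
        for c in range(0, size, scale):
            yield r, c

def quality(image, origin, scale, cls):
    """Placeholder quality measure: fraction of pixels in the quad labeled cls."""
    r0, c0 = origin
    hits = sum(image[r][c] == cls
               for r in range(r0, r0 + scale)
               for c in range(c0, c0 + scale))
    return hits / (scale * scale)

def classify_quad(image, origin, scale, classes):
    """Steps 4-6: sequential loop over classes for one quad; return the best class."""
    scores = {cls: quality(image, origin, scale, cls) for cls in classes}
    return origin, max(scores, key=scores.get)

def classify(image, scales, classes, workers=4):
    """Steps 2-8, with the quad loop (step 3) farmed out to a pool of workers."""
    size = len(image)
    result = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for scale in scales:
            futures = [pool.submit(classify_quad, image, origin, scale, classes)
                       for origin in quads(size, scale)]
            result[scale] = dict(f.result() for f in futures)
    return result

# Toy 4 x 4 image: top half labeled 0, bottom half labeled 1.
toy_image = [[0] * 4, [0] * 4, [1] * 4, [1] * 4]
labels = classify(toy_image, scales=[2, 4], classes=[0, 1])
```

Parallelizing at the quad level yields many equal-sized, independent tasks, which is why this case sits at the simpler end of the spectrum. Parallelizing the class loop instead would give only as many tasks as there are classes.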
10. Harder: Parallelizing Vector GIS
- (1/30) second response-time constraint on Range Query
- Parallel processing necessary since the best sequential computer cannot meet the requirement
- Blue rectangle: a range query; polygon colors show processor assignment
11. Data-Partitioning Approach
- Initial Static Partitioning
- Run-Time dynamic load-balancing (DLB)
- Platforms: Cray T3D (Distributed), SGI Challenge (Shared Memory)
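The combination of static partitioning and run-time load balancing listed above can be sketched as follows. Part of the data is assigned to workers up front, and the rest sits in a shared pool that workers draw from once their static share is done. This is an illustrative shared-memory sketch using Python threads. The bounding-box representation, the round-robin static assignment, and the `pool_fraction` parameter are assumptions, not the scheme used on the Cray T3D or SGI Challenge.

```python
import queue
import threading

def intersects(box, query):
    """Axis-aligned overlap test; boxes are (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = box
    qxmin, qymin, qxmax, qymax = query
    return xmin <= qxmax and qxmin <= xmax and ymin <= qymax and qymin <= ymax

def range_query(boxes, query, workers=4, pool_fraction=0.25):
    """Return sorted indices of boxes overlapping the query rectangle."""
    n_pool = int(len(boxes) * pool_fraction)
    pool = queue.Queue()
    for i in range(n_pool):
        pool.put(i)  # shared pool used for dynamic load balancing
    static_ids = range(n_pool, len(boxes))
    # Static partitioning: round-robin assignment of the remaining boxes.
    partitions = [list(static_ids[w::workers]) for w in range(workers)]
    hits = [[] for _ in range(workers)]

    def work(w):
        for i in partitions[w]:
            if intersects(boxes[i], query):
                hits[w].append(i)
        while True:  # static share finished: pull extra work from the pool
            try:
                i = pool.get_nowait()
            except queue.Empty:
                break
            if intersects(boxes[i], query):
                hits[w].append(i)

    threads = [threading.Thread(target=work, args=(w,)) for w in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(i for h in hits for i in h)

# Ten unit squares along the diagonal; box i spans [i, i + 1] in both axes.
unit_boxes = [(i, i, i + 1, i + 1) for i in range(10)]
found = range_query(unit_boxes, (2.5, 2.5, 5.5, 5.5))
```

The `pool_fraction` knob makes the trade-off visible: a larger pool balances load better but adds synchronization on the shared queue, while a smaller pool relies more on the quality of the static partition.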
12. DLB Pool-Size Choice is Challenging!
13. Hardest: Location Prediction
Nest locations
Distance to open water
Vegetation durability
Water depth
14. Ex. 3: Hardest to Parallelize
15. Cloud Computing and Spatial Big Data
- Motivation: Spatial Big Data in National Security, Eco-routing
- Case Study 1: Simpler to Parallelize
- Map-reduce is okay
- Should it provide spatial declustering services?
- Can query-compiler generate map-reduce parallel code?
- Case Study 2: Harder
- Need dynamic load balancing beyond map-reduce
- Case Study 3: Hardest
- Need new computer science, e.g.,
- Eco-routing algorithms
- determinant of large matrix
- Parallel formulation of evacuation route planning
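To make the "map-reduce is okay" remark for Case Study 1 concrete, the sketch below shows the general shape of a map-reduce job keyed by spatial grid cell. It is a single-process illustration of the programming model only. The point format `(x, y, label)`, the `cell_size` parameter, and the majority-vote reducer are assumptions, not part of the original case study.

```python
from collections import Counter, defaultdict

def map_point(point, cell_size):
    """Map step: emit (grid cell, label) for one labeled point (x, y, label)."""
    x, y, label = point
    yield (int(x // cell_size), int(y // cell_size)), label

def reduce_cell(labels):
    """Reduce step: majority label among the points that fell into one cell."""
    return Counter(labels).most_common(1)[0][0]

def map_reduce(points, cell_size):
    groups = defaultdict(list)  # stands in for the shuffle phase
    for point in points:
        for key, value in map_point(point, cell_size):
            groups[key].append(value)
    return {key: reduce_cell(values) for key, values in groups.items()}

samples = [
    (0.5, 0.5, "Woodland"), (1.5, 0.2, "Woodland"), (0.1, 1.9, "Suburban"),
    (2.5, 2.5, "Suburban"), (3.9, 2.1, "Vegetated"), (3.0, 3.0, "Suburban"),
]
cell_labels = map_reduce(samples, cell_size=2)
```

Because each cell is reduced independently, the work distributes easily. The harder case studies do not fit this pattern as well, since their per-key work is uneven or not independent.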
16. Acknowledgments
- HPC Resources, Research Grants
- Army High Performance Computing Research Center (AHPCRC)
- Minnesota Supercomputing Institute (MSI)
- Spatial Database Group Members
- Mete Celik, Sanjay Chawla, Vijay Gandhi, Betsy George, James Kang, Baris M. Kazar, QingSong Lu, Sangho Kim, Sivakumar Ravada
- USDOD
- Douglas Chubb, Greg Turner, Dale Shires, Jim Shine, Jim Rodgers
- Richard Welsh (NCS, AHPCRC), Greg Smith
- Academic Colleagues
- Vipin Kumar
- Kelley Pace, James LeSage
- Junchang Ju, Eric D. Kolaczyk, Sucharita Gopal