An Overview of Map-Reduce Research - PowerPoint PPT Presentation

About This Presentation
Description:

Title: A Method and System for Automated Document Sanitization IN8-2006-0697 Venkatesan Chakaravarthy Himanshu Gupta Prasan Roy Mukesh Mohania IBM India Research Lab – PowerPoint PPT presentation

Slides: 8
Provided by: IBMU684

Transcript and Presenter's Notes



1

An Overview of Map-Reduce Research
2
Main Themes
  • Designing Efficient Algorithms on Map-Reduce
  • Extensions on Map-Reduce
  • Modeling Map-Reduce Computation

3
Limitations
  • Selective Access To Data
  • High Communication Cost
  • Redundant and Wasteful Processing
  • Lack of Early Termination
  • Lack of Iteration
  • Quick Retrieval of Approximate Results
  • Load Balancing
  • Lack of Real-time and Interactive Processing
  • Lack of Support for n-way Operations
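Several of these limitations, high communication cost in particular, are easier to discuss with the basic data flow in view. The following is a minimal in-memory sketch, not Hadoop's API: `run_mapreduce`, `wc_map` and `wc_reduce` are hypothetical names chosen for illustration. It hash-partitions every intermediate key-value pair to one of `num_reducers` reduce tasks and counts how many pairs cross from the map side to the reduce side.

```python
from collections import defaultdict


def run_mapreduce(records, map_fn, reduce_fn, num_reducers):
    """Simulate one map-reduce round in memory (hypothetical helper).

    Returns the reduce output and the number of intermediate (key, value)
    pairs shuffled from map tasks to reduce tasks.
    """
    # One dict per reduce task: key -> list of values routed to that task.
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    shuffled = 0
    for record in records:
        for key, value in map_fn(record):
            # Hash partitioning: each intermediate pair goes to exactly one task.
            partitions[hash(key) % num_reducers][key].append(value)
            shuffled += 1
    output = {}
    for partition in partitions:
        for key, values in partition.items():
            output[key] = reduce_fn(key, values)
    return output, shuffled


def wc_map(line):
    """Word-count map: emit (word, 1) for every word in the line."""
    for word in line.split():
        yield word, 1


def wc_reduce(word, counts):
    """Word-count reduce: sum the partial counts for one word."""
    return sum(counts)


lines = ["a b a", "b c", "a"]
counts, shuffled = run_mapreduce(lines, wc_map, wc_reduce, num_reducers=2)
print(sorted(counts.items()))  # [('a', 3), ('b', 2), ('c', 1)]
print(shuffled)                # 6 pairs crossed from map to reduce
```

The sketch also runs exactly one round: an iterative algorithm would have to call it repeatedly and re-read its input each time, which is the pattern behind "Lack of Iteration".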

4
Extensions On Map-Reduce
  • Interactive Processing: Streaming, Pipelining, In-Memory Processing, Pre-computation
    Examples: Dremel, Tenzing, BlinkDB, M3R, Shark
  • Data Access: Indexing, Partitioning, Co-location, Data Layout
    Examples: Co-Hadoop(), Hadoop, HAIL, LIAH, Llama, Cheetah
  • Avoidance of Redundant Processing: Batch Processing of Queries, Result Materialization, Incremental Processing, Result Sharing
    Examples: ReStore, InCoop, MRShare
  • Processing n-way Operations: Spatial / Temporal Joins, Additional MR Phase, Redistribution of Keys, Record Duplication
    Examples: Controlled-Replicate(), RCCIS()
  • Iterative Processing: Looping, Caching, Pipelining, Recursion, Incremental Processing
    Examples: HaLoop, ReDoop, InCoop
  • Query Optimization: Parameter Tuning, Plan Refinement, Operator Reordering, Code Analysis, Data Flow Optimization
    Examples: HadoopDB, Clydesdale, Starfish, AQUA, Adaptive-MR()
  • Processing Industry-Specific Data: Spatio-Temporal Data, Geo-Spatial Data, Agriculture / Oil and Gas / Energy
    Examples: BLAST(), Spatial-Hadoop, Hadoop-GIS
  • Fair Work Allocation: Batching, Sampling, Re-partitioning
    Examples: Skew-Tune, Skew-Reduce, Themis
  • Early Termination: Sorting, Sampling
    Examples: EARL, RanKloud
() Contributed by IBM
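Fair Work Allocation lists sampling and re-partitioning. One common way to combine the two is sampling-based range partitioning: sort a small random sample of the keys and use its quantiles as the boundaries between reduce tasks. The sketch below shows that generic technique, not the specific algorithm of Skew-Tune, Skew-Reduce or Themis; the helper names and the synthetic skewed data are hypothetical, chosen only for illustration.

```python
import bisect
import random


def sample_split_points(keys, num_reducers, sample_size, seed=0):
    """Choose num_reducers - 1 range boundaries as quantiles of a random sample."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    step = len(sample) / num_reducers
    return [sample[int(i * step)] for i in range(1, num_reducers)]


def loads(keys, num_reducers, partition):
    """Count how many records each reduce task would receive."""
    counts = [0] * num_reducers
    for key in keys:
        counts[partition(key)] += 1
    return counts


rng = random.Random(42)
keys = [rng.random() ** 3 for _ in range(10_000)]  # synthetic keys skewed toward 0
R = 4  # number of reduce tasks

# Naive: equal-width ranges over [0, 1), ignoring the data distribution.
naive = loads(keys, R, lambda k: min(int(k * R), R - 1))

# Sampled: range boundaries placed at the sample's quantiles.
splits = sample_split_points(keys, R, sample_size=1_000)
sampled = loads(keys, R, lambda k: bisect.bisect_right(splits, k))

print("equal-width:", naive)
print("sampled:    ", sampled)
```

With keys concentrated near 0, equal-width ranges send most records to the first reduce task, while quantile boundaries should leave each task close to a quarter of the input. One caveat of this design: range boundaries cannot split a single very frequent key, so such heavy hitters typically need separate handling.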
5
Designing Efficient Algorithms on Map-Reduce
  • Joins
  • Multi-way Joins
  • Similarity Joins
  • Theta Joins
  • Spatial Joins
  • Interval Joins
  • Entity Resolution
  • Graph Algorithms
  • Machine Learning
  • Computational Geometry
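Joins lead the list above. As a baseline for the specialized variants, here is a minimal in-memory sketch of the standard reduce-side (repartition) equi-join: the map side tags every tuple with its source relation and emits it under its join key, and the reduce side pairs R-tuples with S-tuples that share that key. The relations R(A, B) and S(B, C) and all function names are hypothetical; a real Hadoop job would express the same logic in its own Mapper and Reducer classes.

```python
from collections import defaultdict


def join_map(relation, tuples, key_index):
    """Map: emit (join key, (relation tag, tuple)) for every tuple."""
    for t in tuples:
        yield t[key_index], (relation, t)


def join_reduce(tagged_tuples):
    """Reduce: combine every R-tuple with every S-tuple that shares the key."""
    r_side = [t for tag, t in tagged_tuples if tag == "R"]
    s_side = [t for tag, t in tagged_tuples if tag == "S"]
    for r in r_side:
        for s in s_side:
            yield r + s[1:]  # (a, b) + (c,) -> (a, b, c); drops S's copy of b


r_tuples = [(1, "x"), (2, "y"), (3, "x")]     # R(A, B)
s_tuples = [("x", 10), ("y", 20), ("z", 30)]  # S(B, C)

# The shuffle: group all tagged tuples by join key.
mapped = list(join_map("R", r_tuples, key_index=1)) + list(
    join_map("S", s_tuples, key_index=0)
)
groups = defaultdict(list)
for key, tagged in mapped:
    groups[key].append(tagged)

result = sorted(row for tagged in groups.values() for row in join_reduce(tagged))
print(result)  # [(1, 'x', 10), (2, 'y', 20), (3, 'x', 10)]
```

Every tuple is shuffled exactly once here, so this plan's communication cost is |R| + |S|. Several of the harder cases in the list (multi-way, theta and similarity joins) cannot always be answered by sending each tuple to a single reduce task keyed on equality, which is where replication, and its communication cost, comes in.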

6
Modeling Computation on Map-Reduce
  • Two main cost components:
    • Time spent in communication from map tasks to reduce tasks
    • Time spent in computation within reduce tasks
  • These two components involve a trade-off: using more reduce tasks
    lowers the computation per task but typically requires replicating more
    input across tasks, which raises communication
  • Given an analytics problem, its input data, and the number of reduce
    tasks, what is the minimum communication cost that a map-reduce
    algorithm for that problem and input data must incur?
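For one concrete problem the trade-off can be written down exactly. Consider a cross product (the worst case of a theta join) of relations R and S on k = g² reduce tasks arranged as a g × g grid, a simple partitioning similar in spirit to the 1-Bucket-Theta approach; this is an illustrative choice, not one taken from this deck. Each tuple is replicated g = √k times, so communication is √k(|R| + |S|), while each task compares about |R||S|/k pairs. The function and parameter names below are hypothetical.

```python
import math


def grid_cross_product_cost(r_size, s_size, num_reducers):
    """Return (communication, max pairs per reduce task) for a g x g grid plan.

    Assumes num_reducers = g * g. Each R-tuple is sent to all g tasks in one
    grid row and each S-tuple to all g tasks in one grid column, so every
    (r, s) pair meets at exactly one reduce task.
    """
    g = math.isqrt(num_reducers)
    if g * g != num_reducers:
        raise ValueError("this sketch assumes a square number of reduce tasks")
    communication = g * (r_size + s_size)  # total tuples shuffled
    r_per_task = -(-r_size // g)           # ceiling division
    s_per_task = -(-s_size // g)
    return communication, r_per_task * s_per_task


for k in (1, 4, 16, 64):
    comm, pairs = grid_cross_product_cost(1_000, 1_000, k)
    print(f"k={k:>2}  communication={comm:>6}  pairs per task={pairs:>9}")
```

For |R| = |S| = 1,000, moving from 1 to 64 reduce tasks raises communication from 2,000 to 16,000 shuffled tuples while cutting the per-task work from 1,000,000 to 15,625 pairs; the question on this slide asks how low the first number can go for a given problem and number of reduce tasks.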

7
Survey References
  • A Survey of Large-Scale Analytical Query Processing in MapReduce
    Christos Doulkeridis and Kjetil Nørvåg
    The VLDB Journal, 23(3), 2014
  • Distributed Data Management Using MapReduce
    Feng Li, Beng Chin Ooi, M. Tamer Özsu and Sai Wu
    ACM Computing Surveys, 46(3), 2014