Cluster Computing and Datalog - PowerPoint PPT Presentation

File properties:
  • Title: CS206 --- Electronic Commerce
  • Author: Jeff Ullman
  • Last modified by: Jeff
  • Created: 3/23/2002 8:14:09 PM
  • Presentation format: On-screen Show

Learn more at: http://datalog20.org

Transcript and Presenter's Notes


1
Cluster Computing and Datalog
  • Recursion Via Map-Reduce
  • Seminaïve Evaluation
  • Re-engineering Map-Reduce for Recursion

2
Acknowledgements
  • Joint work with Foto Afrati
  • Alkis Polyzotis and Vinayak Borkar contributed to
    the architecture discussions.

3
Implementing Datalog via Map-Reduce
  • Joins are straightforward to implement as a round
    of map-reduce.
  • Likewise, union/duplicate-elimination is a round
    of map-reduce.
  • But implementation of a recursion can thus take
    many rounds of map-reduce.
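
As a rough illustration of the first two bullets, the sketch below simulates one map-reduce round that joins r(W,X) with s(X,Y) on X in plain Python. The function name `map_reduce_join` and the sample data are assumptions made for this example; nothing here is a real map-reduce API.

```python
from collections import defaultdict

def map_reduce_join(r, s):
    """Join r(W,X) with s(X,Y) on X in one simulated map-reduce round."""
    # Map phase: key every tuple by its X-value, keeping the two
    # relations apart (index 0 holds r-side values, index 1 s-side values).
    groups = defaultdict(lambda: ([], []))
    for w, x in r:
        groups[x][0].append(w)
    for x, y in s:
        groups[x][1].append(y)
    # Reduce phase: each key pairs its r-side values with its s-side values.
    # Collecting into a set also performs duplicate elimination.
    return {(w, x, y)
            for x, (ws, ys) in groups.items()
            for w in ws
            for y in ys}

r = {(1, 2), (3, 2), (4, 5)}
s = {(2, 7), (2, 8), (6, 9)}
joined = map_reduce_join(r, s)
```

A recursion would have to repeat a round like this until no new tuples appear, which is where the many rounds come from.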

4
Seminaïve Evaluation
  • Specific combination of joins and unions.
  • Example: chain rule
  • q(W,Z) :- r(W,X) & s(X,Y) & t(Y,Z)
  • Let r, s, t = "old" relations; r′, s′, t′ =
    incremental relations.
  • Simplification: assume |r′| = a|r|, etc.

5
A 3-Way Join Using Map-Reduce
  • q(W,Z) :- r(W,X) & s(X,Y) & t(Y,Z)
  • Use k compute nodes.
  • Give X and Y shares to determine the reduce-task
    that gets each tuple.
  • Optimum strategy replicates r and t, not s, using
    communication s + 2√(krt), where r, s, t also
    denote the sizes of the relations.
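
The communication figure on this slide can be checked numerically. The sketch below assumes X gets share b and Y gets share c with b·c = k, so that each r-tuple is sent to c reduce tasks, each t-tuple to b, and each s-tuple to one. The function names and the sample sizes are illustrative assumptions, and the shares are treated as real numbers rather than rounded to integers.

```python
import math

def communication(r, s, t, b, c):
    """Tuples shipped when X gets share b and Y gets share c (b * c = k).

    r(W,X) lacks Y, so each r-tuple goes to all c buckets of Y;
    t(Y,Z) lacks X, so each t-tuple goes to all b buckets of X;
    s(X,Y) has both attributes and goes to exactly one reduce task.
    """
    return r * c + s + t * b

def optimal_shares(r, t, k):
    """Real-valued shares minimizing r*c + t*b subject to b*c = k."""
    b = math.sqrt(k * r / t)
    c = math.sqrt(k * t / r)
    return b, c

r, s, t, k = 1000, 5000, 4000, 64   # sample relation sizes and node count
b, c = optimal_shares(r, t, k)
best = communication(r, s, t, b, c)
```

With these shares both replicated terms equal √(krt), which is where the 2√(krt) comes from.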

6
Seminaïve Evaluation (2)
  • Need to compute sum (union) of seven terms
    (joins): r′st + rs′t + rst′ + r′s′t + r′st′ +
    rs′t′ + r′s′t′
  • Obvious method for computing a round of
    seminaïve evaluation:
  • Replicate r and r′; replicate t and t′; do not
    replicate s or s′.
  • Communication = (1+a)(s + 2√(krt))
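
To make the seven terms concrete, here is a small sketch that evaluates one seminaïve round on in-memory sets. It assumes the seven terms are exactly the joins that use at least one incremental relation; `chain_join`, `seminaive_round`, and the sample tuples are names and data invented for this example.

```python
from itertools import product

def chain_join(r, s, t):
    """q(W,Z) :- r(W,X) & s(X,Y) & t(Y,Z), computed naively on sets."""
    return {(w, z)
            for (w, x) in r
            for (x2, y) in s if x == x2
            for (y2, z) in t if y == y2}

def seminaive_round(old, delta):
    """Union of the seven joins that use at least one incremental relation.

    old and delta are (r, s, t) triples of sets.
    """
    result = set()
    for choice in product((0, 1), repeat=3):   # 0 = old, 1 = incremental
        if any(choice):                        # skip the all-old term rst
            rels = [(old, delta)[c][i] for i, c in enumerate(choice)]
            result |= chain_join(*rels)
    return result

old = ({(1, 2)}, {(2, 3)}, {(3, 4)})
delta = ({(5, 2)}, {(2, 6)}, {(6, 7)})
new_q = seminaive_round(old, delta)
full = chain_join(*(o | d for o, d in zip(old, delta)))
```

Together with the old result rst, the seven terms reproduce the join of the full relations, so only the seven need to be computed in each round.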

7
Seminaïve Evaluation (3)
  • There are many other ways we might use k nodes to
    do the same task.
  • Example: one group of nodes does (r+r′)s′(t+t′);
    a second group does r′s(t+t′); the third group
    does rst′.
  • Theorem: no grouping does better than the obvious
    method for this example.
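
As a sanity check on the example grouping, the sketch below expands three groups symbolically and confirms that they cover each of the seven seminaïve terms exactly once. It assumes the groups are (r + r′)s′(t + t′), r′s(t + t′), and rst′, which is the only placement of primes that partitions the seven terms; the labels `OLD` and `INC` are made up for this illustration.

```python
from itertools import product

OLD, INC = "old", "inc"
BOTH = (OLD, INC)

# Each group lists, per relation (r, s, t), which versions it joins.
groups = [
    (BOTH, (INC,), BOTH),       # (r + r')s'(t + t')
    ((INC,), (OLD,), BOTH),     # r's(t + t')
    ((OLD,), (OLD,), (INC,)),   # rst'
]

# Expand every group into its individual three-way join terms.
covered = [term for g in groups for term in product(*g)]

# The seven terms: every old/incremental choice except all-old.
seven_terms = {c for c in product(BOTH, repeat=3) if INC in c}
```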

8
Networks of Processes for Recursions
  • Is it possible to do a recursion without multiple
    rounds of map-reduce and their associated
    communication cost?
  • Note: tasks do not have to be Map or Reduce
    tasks; they can have other behaviors.

9
Example: Very Simple Recursion
  • p(X,Y) :- e(X,Z) & p(Z,Y)
  • p(X,Y) :- p0(X,Y)
  • Use k compute nodes.
  • Hash Y-values to one of k buckets, h(Y).
  • Each node gets a complete copy of e.
  • p0 is distributed among the k nodes, with p0(x,y)
    going to node h(y).

10
Example: Continued
  • p(X,Y) :- e(X,Z) & p(Z,Y)
  • Each node applies the recursive rule and
    generates new tuples p(x,y).
  • Key point: since new tuples have a Y-value that
    hashes to the same node, no communication is
    necessary.
  • Duplicates are eliminated locally.
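
The communication-free evaluation described on these two slides can be simulated in a few lines. This is only a sketch under simple assumptions: the k nodes are list entries in one process, `h` is a stand-in hash built on Python's `hash`, and the edge data is invented.

```python
def h(y, k):
    """Hypothetical hash function: bucket for a Y-value."""
    return hash(y) % k

def run_node(e, p0_part):
    """One node: apply p(X,Y) :- e(X,Z) & p(Z,Y) locally to a fixpoint."""
    p = set(p0_part)
    frontier = set(p0_part)
    while frontier:
        derived = {(x, y) for (x, z) in e for (z2, y) in frontier if z == z2}
        frontier = derived - p      # local duplicate elimination
        p |= frontier
    return p

def distributed_closure(e, p0, k):
    """Partition p0 by h(y); every node holds all of e and works alone."""
    parts = [set() for _ in range(k)]
    for (x, y) in p0:
        parts[h(y, k)].add((x, y))  # p0(x,y) goes to node h(y)
    return [run_node(e, part) for part in parts]

e = {(1, 2), (2, 3), (3, 4)}
nodes = distributed_closure(e, p0=e, k=3)
closure = set().union(*nodes)
```

Every derived tuple keeps the Y-value of the p-tuple it came from, so it stays on the node that produced it.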

11
Harder Case of Recursion
  • Consider a recursive rule
  • p(X,Y) :- p(X,Z) & p(Z,Y)
  • Responsibility divided among compute nodes by
    hashing Z-values.
  • Node n gets tuple p(a,b) if either h(a) = n or
    h(b) = n.
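
A minimal simulation of this scheme, assuming the nodes exchange tuples through a single in-process queue, might look as follows. The names `h`, `send`, and `nonlinear_closure` are illustrative, and a real system would run the nodes concurrently rather than in one loop.

```python
from collections import deque

def h(v, k):
    """Hypothetical hash function on attribute values."""
    return hash(v) % k

def nonlinear_closure(p0, k):
    """Simulate k nodes evaluating p(X,Y) :- p(X,Z) & p(Z,Y)."""
    received = [set() for _ in range(k)]   # tuples each node remembers
    queue = deque()                        # (node, tuple) messages in flight

    def send(tup):
        a, b = tup
        for n in {h(a, k), h(b, k)}:       # node n gets p(a,b) if h(a)=n or h(b)=n
            queue.append((n, tup))

    for tup in p0:
        send(tup)
    while queue:
        n, (a, b) = queue.popleft()
        if (a, b) in received[n]:
            continue                       # duplicate: already remembered
        received[n].add((a, b))
        for (c, d) in list(received[n]):
            # Join only on Z-values this node is responsible for.
            if h(b, k) == n and c == b:
                send((a, d))               # p(a,b) & p(b,d) gives p(a,d)
            if h(a, k) == n and d == a:
                send((c, b))               # p(c,a) & p(a,b) gives p(c,b)
    return set().union(*received)
```

Unlike the simple recursion, a derived tuple p(a,d) generally hashes to other nodes, so tuples do travel between nodes here.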

12
Compute Node for h(Z) = n
[Figure: node for h(Z) = n. Remember all received
tuples (eliminate duplicates).]

13
Comparison with Iteration
  • Advantage: lets us avoid some communication of
    data that would be needed in iterated map-reduce
    rounds.
  • Disadvantage: tasks run longer, more likely to
    fail.

14
Node Failures
  • To cope with failures, map-reduce implementations
    rely on each task getting its input at the
    beginning, and on output not being consumed
    elsewhere until the task completes.
  • But recursions can't work that way.
  • What happens if a node fails after some of its
    output has been consumed?

15
Node Failures (2)
  • Actually, there is no problem!
  • We restart the tasks of the failed node at
    another node.
  • The replacement task will send some data that the
    failed task also sent.
  • But each node remembers tuples to eliminate
    duplicates anyway.

16
Node Failures (3)
  • But the "no problem" conclusion is highly
    dependent on the Datalog assumption that it is
    computing sets.
  • Argument would fail if we were computing bags or
    aggregations of the tuples produced.
  • Similar problems for other recursions, e.g.,
    PDEs.
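
The difference between sets and bags under a restart can be shown with a toy receiver. In this sketch the two message lists stand for what a failed task sent and what its replacement sends again; the function names and data are assumptions made for the example.

```python
from collections import Counter

def deliver(messages, state):
    """Receiver with set semantics: remembering tuples makes resends harmless."""
    for tup in messages:
        state.add(tup)
    return state

def deliver_bag(messages, counts):
    """Receiver with bag semantics: every resend inflates a multiplicity."""
    counts.update(messages)
    return counts

sent_before_failure = [("a", "b"), ("a", "c")]
sent_by_replacement = [("a", "b"), ("a", "c"), ("a", "d")]  # resends two tuples

as_set = deliver(sent_by_replacement, deliver(sent_before_failure, set()))
as_bag = deliver_bag(sent_by_replacement,
                     deliver_bag(sent_before_failure, Counter()))
```

The set ends up the same as if no failure had occurred, while the bag counts the resent tuples twice.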

17
Extension of Map-Reduce Architecture for Recursion
  • Necessarily, all tasks need to operate in rounds.
  • The master controller learns of all input files
    that are part of the round-i input to task T and
    records that T has received these files.

18
Extension (2)
  • Suppose some task S fails, and it never supplies
    the round-(i+1) input to T.
  • A replacement S′ for S is restarted at some other
    node.
  • The master knows that T has received up to round
    i from S, so it ignores the first i output
    files from S′.

19
Extension (3)
  • Master knows where all the inputs ever received
    by S are from, so it can provide those to S′.
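
The bookkeeping on these three slides could be sketched as below. This is a hypothetical, in-memory model: the `Master` class, its method names, and the file names are all invented for illustration and do not describe any existing map-reduce implementation.

```python
from collections import defaultdict

class Master:
    """Hypothetical master controller tracking per-round file delivery."""

    def __init__(self):
        self.delivered = defaultdict(int)   # (source, target) -> rounds delivered
        self.inputs_of = defaultdict(list)  # target -> files it has ever received

    def deliver(self, source, target, round_no, filename):
        """Pass a round's output file to its target unless already received."""
        if round_no <= self.delivered[(source, target)]:
            return False                    # target already has this round
        self.delivered[(source, target)] = round_no
        self.inputs_of[target].append(filename)
        return True

    def restart(self, failed, replacement):
        """Rename the failed task in the records; return inputs to replay."""
        for (src, tgt), rounds in list(self.delivered.items()):
            new_key = (replacement if src == failed else src,
                       replacement if tgt == failed else tgt)
            if new_key != (src, tgt):
                del self.delivered[(src, tgt)]
                self.delivered[new_key] = rounds
        self.inputs_of[replacement] = self.inputs_of.pop(failed, [])
        return list(self.inputs_of[replacement])

m = Master()
m.deliver("S", "T", 1, "S-round1")
m.deliver("S", "T", 2, "S-round2")
m.deliver("U", "S", 1, "U-round1")
replay = m.restart("S", "S2")               # S fails; S2 replaces it
```

After the restart, the first two output files of S2 to T are ignored and delivery resumes with round 3.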

20
Checkpointing and State
  • Another approach is to design tasks so that they
    can periodically write a state file, which is
    replicated elsewhere.
  • Tasks take input and state.
  • Initially, state is empty.
  • Master can restart a task from some state and
    feed it only inputs received after that state was
    written.
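
A toy version of restart-from-state, assuming a task's state is simply the set of tuples it has received and that a state file can be modeled as an immutable copy, is sketched here. `CheckpointedTask` and the input batches are hypothetical.

```python
class CheckpointedTask:
    """Hypothetical task whose state is the set of tuples received so far."""

    def __init__(self, state=None):
        # Initially, state is empty; a restarted task begins from a saved state.
        self.state = set() if state is None else set(state)

    def receive(self, tuples):
        self.state |= set(tuples)

    def checkpoint(self):
        """Return a copy of the state, standing in for a replicated state file."""
        return frozenset(self.state)

inputs = [[(1, 2)], [(2, 3)], [(3, 4)]]     # input files, in arrival order

task = CheckpointedTask()
task.receive(inputs[0])
task.receive(inputs[1])
saved, saved_at = task.checkpoint(), 2      # state written after two inputs
task.receive(inputs[2])                     # later input, then the node fails

# The master restarts the task from the saved state and feeds only later inputs.
restarted = CheckpointedTask(saved)
for batch in inputs[saved_at:]:
    restarted.receive(batch)
```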

21
Example: Checkpointing
  • p(X,Y) :- p(X,Z) & p(Z,Y)
  • Two groups of tasks:
  • Join tasks: hash on Z, using h(Z).
  • Like tasks from previous example.
  • Eliminate-duplicates tasks: hash on X and Y,
    using h(X,Y).
  • Receive tuples from join tasks.
  • Distribute truly new tuples to join tasks.
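
The interplay of the two groups of tasks can be simulated in one process. In this sketch, `h` and `h2` are stand-in hash functions, each task's state is a Python set, and a single queue models tuples travelling from join tasks to eliminate-duplicates tasks; all of these are assumptions for illustration.

```python
from collections import deque

def h(v, k):
    """Hypothetical hash on a single value (join tasks)."""
    return hash(v) % k

def h2(x, y, k):
    """Hypothetical hash on a pair (eliminate-duplicates tasks)."""
    return hash((x, y)) % k

def two_rank_closure(p0, k):
    join_state = [set() for _ in range(k)]   # p(x,y) with h(x) or h(y) == task
    dedup_state = [set() for _ in range(k)]  # p(x,y) with h2(x,y) == task
    to_dedup = deque(p0)                     # tuples headed to dup-elim tasks
    while to_dedup:
        x, y = to_dedup.popleft()
        seen = dedup_state[h2(x, y, k)]
        if (x, y) in seen:
            continue                         # not truly new: drop it
        seen.add((x, y))
        for n in {h(x, k), h(y, k)}:         # distribute to the join tasks
            join_state[n].add((x, y))
            for (c, d) in list(join_state[n]):
                if h(y, k) == n and c == y:
                    to_dedup.append((x, d))  # p(x,y) & p(y,d)
                if h(x, k) == n and d == x:
                    to_dedup.append((c, y))  # p(c,x) & p(x,y)
    return set().union(*dedup_state)

closure = two_rank_closure({(1, 2), (2, 3), (3, 4)}, k=3)
```

Because duplicates are filtered before they reach the join tasks, each join task sees every tuple at most once.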

22
Example (2)
[Figure: two ranks of tasks. Dup-elim tasks: state
has p(x,y) if h(x,y) is right. Join tasks: state
has p(x,y) if h(x) or h(y) is right.]

23
Example: Details
  • Each task writes buffer files locally, one for
    each of the tasks in the other rank.
  • The two ranks of tasks are run on different racks
    of nodes, to minimize the probability that tasks
    in both ranks will fail at the same time.

24
Example: Details (2)
  • Periodically, each task writes its state (tuples
    received so far) incrementally and lets the
    master controller replicate it.
  • Problem: the controller can't be too eager to
    pass output files to their input, or files become
    tiny.

25
Future Research
  • There is work to be done on optimization, using
    map-reduce or similar facilities, for restricted
    SQL such as Datalog, Datalog with negation, and
    Datalog with aggregation.
  • Check out Hive, PIG, as well as work on multiway
    join optimization.

26
Future Research (2)
  • Almost everything is open about recursive Datalog
    implementation under map-reduce or similar
    systems.
  • Seminaïve evaluation in general case.
  • Architectures for managing failures.
  • Clustera and Hyrax are interesting examples of
    (nonrecursive) extension of map-reduce.
  • When can we avoid communication as with
    p(X,Y) :- e(X,Z) & p(Z,Y)?