Hardening Functions for Large-Scale Distributed Computations

Slides: 33
Provided by: dougs96

1
Hardening Functions for Large-Scale Distributed
Computations
  • Doug Szajda
  • Barry Lawson
  • Jason Owen

2
Large-Scale Distributed Computations
  • Easily parallelizable, compute intensive
  • Divide into independent tasks to be executed on
    participant PCs
  • Significant results collected by supervisor
  • Participants may receive credits
  • Money, e-cash, ISP fees, fame and glory

3
Examples
  • SETI@home
  • Finding Martians
  • Folding@home
  • Protein folding
  • GIMPS (Entropia)
  • Mersenne Prime search
  • United Devices, IBM, DOD Smallpox study
  • DNA sequencing
  • Graphics
  • Exhaustive Regression
  • Genetic Algorithms
  • Data Mining
  • Monte Carlo simulation

4
The Problem
  • Code is executing in untrusted environments
  • Results may be corrupted either intentionally or
    unintentionally
  • Significant results may be withheld
  • Cheating: claiming credit for work not performed

5
An Obvious Solution
  • Assign Tasks Redundantly
  • Collusion may seem unlikely, but:
  • Firms solicit participants from groups such as
    alumni associations and large corporations
  • Processor cycles are primary resource
  • Some problems can tolerate some bad results

6
Related Work
  • Historical roots in result checking and
    self-correcting programs
  • Golle and Mironov (2001)
  • Golle and Stubblebine (2001)
  • Monrose, Wyckoff, Rubin (1999)

7
Related Work
  • Body of literature on protecting mobile agents
    from malicious hosts
  • Sander and Tschudin, Vigna, Hohl, and others
  • Syverson (1998)

8
Adversary
  • Assumed to be intelligent
  • Can decompile, analyze, modify code
  • Understands task algorithms and measures used to
    prevent corruption
  • Motivation may not be obvious...
  • I.e. gaining credits may not be important
  • E.g. business competitor
  • But does not wish to be caught

9
Our Approach
  • Hardening functions
  • Verb, not adjective
  • Does not guarantee resulting computation returns
    correct results
  • Does not prevent an adversary from disrupting a
    computation
  • Significantly increases likelihood that abnormal
    activity will be detected

10
The Model
  • Computation is evaluation of algorithm f: D -> R
    for every input value x in D
  • Tasks created by partitioning D into subsets Di
  • Each task assigned filter function Gi
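
A minimal sketch of this model, assuming the data space D is a finite sequence and the filter G_i is a predicate on computed values. The names `partition` and `run_task`, and the toy `f` and `g`, are illustrative and do not come from the slides.

```python
from typing import Callable, List, Sequence, Tuple


def partition(domain: Sequence[int], num_tasks: int) -> List[Sequence[int]]:
    """Split the data space D into at most num_tasks contiguous subsets D_i."""
    size = -(-len(domain) // num_tasks)  # ceiling division
    return [domain[i:i + size] for i in range(0, len(domain), size)]


def run_task(f: Callable[[int], int],
             subset: Sequence[int],
             g: Callable[[int], bool]) -> List[Tuple[int, int]]:
    """Evaluate f on every x in D_i; return only the pairs the filter accepts."""
    return [(x, f(x)) for x in subset if g(f(x))]


# Toy computation (assumption): f(x) = x^2 mod 7, "significant" means f(x) == 0.
f = lambda x: (x * x) % 7
g = lambda value: value == 0

tasks = partition(range(20), 4)
significant = [pair for d_i in tasks for pair in run_task(f, d_i, g)]
print(significant)
```

Only the filtered pairs travel back to the supervisor, which is why the choice of filter matters for what a participant can infer.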

11
Two General Classes
  • Non-sequential
  • Computed values of f in task are independent
  • Sequential
  • Participant given single value x_0 and asked to
    compute first m elements of sequence x_n = f(x_{n-1})

12
Hardening Non-sequentials
  • Plant each task's data set with values r_i such
    that the following hold:
  • Supervisor knows f(r_i) for each i
  • Participant cannot distinguish r_i from other data
    values, regardless of number of tasks a
    participant completes

13
Hardening Non-sequentials
  • Participants do not know number of r_i in data
    space
  • For some known proportion of the r_i, f(r_i) is a
    significant result
  • Nice but not necessary: same set of r_i can be
    used for several tasks
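
The planting and checking steps might look like the sketch below. It is a toy under stated assumptions: `harden_task`, `verify_task`, the function `f` and the significance test are all hypothetical, and the toy `f` makes no attempt at the indistinguishability the slides require.

```python
import random


def harden_task(data, ringers, rng):
    """Mix ringer inputs into a task's data and shuffle so position reveals nothing."""
    mixed = list(data) + list(ringers)
    rng.shuffle(mixed)
    return mixed


def verify_task(returned, known, is_significant):
    """Accept a task only if every significant ringer came back with the known value.

    known:    dict mapping each ringer r_i to the precomputed f(r_i)
    returned: dict mapping inputs to values the participant reported as significant
    """
    return all(returned.get(r) == value
               for r, value in known.items()
               if is_significant(value))


# Toy stand-ins (assumptions): f(x) = x mod 50, significant when f(x) == 0.
def f(x):
    return x % 50


def is_significant(value):
    return value == 0


known = {150: f(150), 250: f(250), 137: f(137)}  # only some ringers are significant
task = harden_task(range(1, 40), known, random.Random(0))

honest = {x: f(x) for x in task if is_significant(f(x))}
lazy = {}  # a participant that skipped the work and reported nothing

print(verify_task(honest, known, is_significant))
print(verify_task(lazy, known, is_significant))
```

A participant who skips inputs cannot tell which omissions will be noticed, since the number and identity of the ringers are hidden.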

14
Difficulties
  • r_i are indistinguishable only if they generate
    truly significant results
  • What is indistinguishable in theory may not be in
    practice
  • E.g. DES key search: tasks are given ciphertext C
    and subset K_i of key space, told to decrypt C
    with each key k in K_i and return any key that
    generates plausible plaintext

15
Even Filter Function Can Be Revealing...
  • E.g. Traveling Salesperson with five precomputed
    circuits of length 100, 105, 102, 113, 104
  • Return any circuit whose length is any of the
    above or less than 100
  • Return the ten best circuits found
  • Return any circuit with length less than 120
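
The three candidate filters above can be written out to see what each one exposes to a participant who reads the task code. This is an illustrative sketch; the function names are not from the slides.

```python
# Lengths of the five precomputed circuits (the ringers) from the slide.
RINGER_LENGTHS = {100, 105, 102, 113, 104}


def filter_exact(length):
    """Return a circuit if it matches a ringer length or beats the best one.

    The ringer lengths appear literally in the filter, so anyone who
    inspects the code learns exactly which results are being checked.
    """
    return length in RINGER_LENGTHS or length < 100


def ten_best(lengths):
    """Return the ten shortest circuit lengths found; no ringer value appears."""
    return sorted(lengths)[:10]


def filter_threshold(length):
    """Return any circuit shorter than 120; every ringer passes, none is named."""
    return length < 120


print(all(filter_threshold(r) for r in RINGER_LENGTHS))
print(filter_exact(103), filter_threshold(103))
```

The last two filters hide the ringers at the price of returning more results than the supervisor strictly needs.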

16
Optimization Problems
  • Designate small proportion of tasks as initial
    distribution
  • Distribute each of these tasks redundantly
  • Check returned values; handle non-matches
    appropriately
  • Retain k best results and use them as ringers for
    remaining tasks
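
A rough sketch of these four steps, assuming each task returns a single cost to be minimized and each initial task is assigned exactly twice. The helper names and the sample values are hypothetical.

```python
import heapq
import random


def choose_initial(task_ids, p, rng):
    """Designate a proportion p of the tasks as the initial distribution."""
    count = max(1, round(p * len(task_ids)))
    return rng.sample(list(task_ids), count)


def matched_results(copy_a, copy_b):
    """Keep results whose two redundant copies agree; the rest need follow-up."""
    return {t: copy_a[t] for t in copy_a if copy_b.get(t) == copy_a[t]}


def pick_ringers(verified, k):
    """Retain the k best (here: smallest-cost) verified results as ringers."""
    return heapq.nsmallest(k, verified.items(), key=lambda item: item[1])


initial = choose_initial(range(100), 0.1, random.Random(0))

# Made-up results from two participants per task; task 2 does not match.
copy_a = {1: 50, 2: 70, 3: 40}
copy_b = {1: 50, 2: 71, 3: 40}
verified = matched_results(copy_a, copy_b)
print(pick_ringers(verified, 1))
```

How a non-match is resolved (reassigning the task, recomputing at the supervisor) is left open here, as it is on the slide.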

17
Collusion
  • If task in initial distribution is assigned to
    colluding adversaries, supervisor will initially
    miss this
  • Honest participants not in initial distribution
    will eventually return results that do not match
  • Supervisor can then determine which participants
    have been dishonest

18
Size of Initial Distribution
  • Probability that at least k of n best results are
    in proportion p of space is

For 10^9 inputs, the best 10^5 results are in the top 0.01%
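
The formula from this slide did not survive in the transcript. One plausible reading, offered only as an assumption, is a binomial tail: if each of the n best results independently falls in the sampled proportion p with probability p (reasonable when n is much smaller than the data space), then the probability of at least k hits can be computed as follows. The function name is illustrative.

```python
from math import comb


def prob_at_least_k(n, k, p):
    """Binomial tail: P(at least k of the n best results lie in proportion p).

    Assumes the n results fall in the sampled proportion independently,
    each with probability p. This is an assumed model, not the slide's formula.
    """
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


# Example: chance that at least 5 of the 100 best results land in 10% of the space.
print(prob_at_least_k(100, 5, 0.10))
```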
19
Caveat
  • Previous figures assume
  • n, k much less than size of data space
  • proportion of incorrect results is small
  • Probability should be adjusted to reflect
    expected number of incorrect results returned in
    initial distribution

20
The Good
  • No precomputing required
  • Hardening is achieved at fraction of cost of
    simple redundancy
  • Ringers can be used for multiple tasks
  • Additional good results can be used as ringers
  • Collusion resistant since ringers can be combined
    in many ways

21
The Bad
  • Assuming tasks require equal time, cost of
    compute job is at least doubled...
  • But, by running multiple projects concurrently,
    overall throughput rates can be reduced to factor
    of 1p times rate of unmodified job
  • In some cases, implementation details can give
    away identities of ringers (or require
    significant changes to app)

22
Sequential Computations
  • Seeding the data is impractical
  • Often the validity of returned results can only
    be checked by performing the entire task
  • Ex.: Mersenne primes
  • nth Mersenne number, M_n, is 2^n - 1
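
The slide does not name a primality test. The Lucas-Lehmer test, the standard test for Mersenne numbers, is used here as an illustration of a sequential computation: each value depends on the previous one, and only the final value says anything about the answer.

```python
def lucas_lehmer(p):
    """Return True if M_p = 2**p - 1 is prime, for an odd prime exponent p.

    Iterates s_n = s_{n-1}**2 - 2 (mod M_p) starting from s_0 = 4;
    M_p is prime exactly when s_{p-2} is 0.
    """
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


print([p for p in (3, 5, 7, 11, 13, 17, 19) if lucas_lehmer(p)])
```

A supervisor handed only the final residue has no cheaper way to confirm it than rerunning all p - 2 iterations, which is the difficulty the slide points at.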

23
The Strategy
  • Share the work of computing N tasks among K
    participants
  • K > N is a very small proportion of total number
    of participants in computation
  • Assume
  • Each task requires roughly m iterations
  • K/N < 2, else simple redundancy is cheaper

24
The Algorithm
  • Divide tasks into S segments, each containing
    roughly J = m/S iterations
  • Each participant in group is given an initial
    value and computes first J iterations using this
    value
  • When J iterations complete, results returned to
    supervisor

25
The Algorithm
  • Supervisor checks correctness of redundantly
    assigned subtasks
  • Supervisor permutes N values and assigns these
    values to K participants as initial value for
    next segment
  • Repeat until all S segments completed
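
The loop described on these two slides can be simulated roughly as below. This is a sketch under several assumptions not stated in the slides: N < K <= 2N, the K - N extra participants duplicate randomly chosen tasks, a cheater corrupts a result by adding 1, and the supervisor simply keeps the first copy when copies disagree. All names are hypothetical.

```python
import random


def iterate(f, x, steps):
    """Apply f to x the given number of times."""
    for _ in range(steps):
        x = f(x)
    return x


def run_segmented(f, initial_values, num_participants, segments,
                  steps_per_segment, rng, cheaters=frozenset()):
    """Share N sequential tasks among K participants, one segment at a time.

    Returns (final_values, flagged); flagged lists the (segment, task)
    pairs whose redundantly computed copies disagreed.
    """
    values = list(initial_values)
    n = len(values)
    flagged = []
    for seg in range(segments):
        # Every task gets one participant; the K - N extras repeat random tasks.
        assignment = list(range(n)) + rng.sample(range(n), num_participants - n)
        rng.shuffle(assignment)  # assignment[j] is the task given to participant j
        results = {}
        for participant, task in enumerate(assignment):
            out = iterate(f, values[task], steps_per_segment)
            if participant in cheaters:
                out += 1  # assumed form of corruption
            results.setdefault(task, []).append(out)
        for task, outs in results.items():
            if len(set(outs)) > 1:
                flagged.append((seg, task))
            values[task] = outs[0]  # input to the next segment
    return values, flagged


f = lambda x: (x * x + 1) % 1000003
start = [1, 2, 3, 4]  # N = 4 tasks
final, flagged = run_segmented(f, start, num_participants=6, segments=5,
                               steps_per_segment=10, rng=random.Random(1))
print(final, flagged)
```

With honest participants nothing is flagged and the final values equal a straight run of S * J iterations. A wrong value that happens not to be duplicated is carried into the next segment unnoticed.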

26
The Numbers
  • If K/N < 2, each task assigned to no more than
    two participants, and adversary cheats in L (of
    S) segments, then in absence of collusion

27
Probabilities
28
Redundancy vs. P values (figure panels: L = 1, L = 2)
29
Advantages
  • Far fewer task compute cycles than simple
    redundancy
  • Values need not be precomputed
  • Method is relatively collusion resistant (unless
    supervisor picks an entire group of colluding
    participants)
  • Method is tunable
  • Can also be applied to non-sequential case

30
Disadvantages
  • Increased coordination and communication costs
    for supervisor
  • Need for synchronization increases time cost of
    job
  • Dial-up connectivity
  • Sporadic task execution (owners using PCs)

31
Disadvantages
  • Strategy does not protect well against adversary
    who cheats once
  • Cheating damage can be magnified
  • Propagation of undetected incorrect results

32
Conclusions
  • Presented two strategies for hardening
    distributed metacomputations
  • Non-sequential: seed data with ringers
  • Sequential: share N tasks among K > N
    participants
  • Small increase in average execution time of
    modified task
  • Overall computing costs significantly less than
    redundantly assigning every task