The ConCert Project: Peter Lee, Carnegie Mellon University

1
The ConCert Project
Peter Lee
Carnegie Mellon University
MRG Workshop
May 2002
2
ConCert Project
  • Collaborators
  • Karl Crary
  • Robert Harper
  • Peter Lee
  • Frank Pfenning
  • Began in fall of 2001

3
Peter Lee
  • ConCert project
  • Special J PCC system (with George Necula)
  • Temporal logic PCC (with Andrew Bernard)
  • Undergrad CS education

4
Some initial feedback
  • PCC for small environments?
  • kJava
  • PCC vs TAL?
  • A look at Special J?
  • Examples: games?
  • Type-directed vs theorem prover
  • Proofs vs oracle strings

5
Grid computing
  • The Internet as a power grid
  • Share resources
  • Especially when they would go idle otherwise

6
Grid computing projects
  • Some well-known examples
  • SETI@home, GIMPS, Folding@home
  • Each is a project unto itself.
  • Donating hosts explicitly choose to participate
  • Lots of popular and govt interest

7
Grid computing today
  • Many more solutions than applications!
  • Common run-time systems
  • Frameworks for building grid applications
  • Etc

8
Grid computing today
  • Many more problems than solutions!
  • How to program the grid?
  • How to exploit the grid efficiently?

9
SETI@home
  • Over 3M donating hosts
  • Each running the same app!
  • Donors must download updates!
  • A central receiving point
  • Uses 30% of Berkeley's available bandwidth to
    the Internet!

10
Why is it like this?
  • Distributed programming is hard
  • A SIMD model is restrictive, but simplifies
    programming and management
  • Donors need only establish a trust relationship
    with the central receiver, but not other donors
  • Even so, user wants to control when/if new code
    is installed

11
Our thesis
  • PL technology can provide a grid computing
    environment in which
  • Developers can write richly distributed
    applications
  • The developer's utilization of donated resources
    is completely transparent to the donor
  • But the donor is confident that her specified
    safety, security, and privacy policies will not
    be violated

12
Issues: Trust
  • Hosts must run foreign code
  • Today: on a case-by-case basis
  • Explicit intervention / attention required
  • Is it a virus?
  • Safety: won't crash my machine
  • Resource usage: won't soak up cycles, memory
  • Privacy: won't reveal secrets

13
Issues: Reliability
  • Application developers must ensure that hosts
    play nice
  • Hosts could maliciously provide bad results
  • Current methods based on redundancy and
    randomization to avoid collusion
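
The redundancy-and-randomization approach mentioned above can be sketched roughly as follows. This is an illustration only, not code from any grid project: `run_redundantly`, the majority rule, and the toy hosts are all assumptions made for the example.

```python
import random
from collections import Counter

def run_redundantly(work_unit, hosts, copies=3, rng=None):
    """Send one work unit to `copies` randomly chosen hosts and accept
    the answer that a strict majority of them report."""
    rng = rng or random.Random()
    chosen = rng.sample(hosts, copies)   # random assignment makes collusion harder to arrange
    answers = [host(work_unit) for host in chosen]
    answer, votes = Counter(answers).most_common(1)[0]
    if votes * 2 <= copies:
        raise RuntimeError("no majority answer; reschedule the work unit")
    return answer

# Hypothetical hosts: two honest ones and one that always returns a bad result.
honest = lambda n: n * n
liar = lambda n: -1
```

The cost is visible in the signature: every work unit is computed `copies` times, which is the overhead that result certification (later in the talk) tries to avoid.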

14
Issues: Programming
  • How to write grid applications?
  • Model of parallelism?
  • Massively parallel
  • No shared resources
  • Frequent failures
  • Language? Run-time environment?
  • Portability across platforms
  • How to write grid code?

15
Issues: Implementation
  • What is a grid framework?
  • Establishing and maintaining a grid
  • Distribution of work, load balancing, scheduling,
    updating
  • Fault recovery
  • Many different applications with different
    characteristics

16
Issues: Applications
  • Can we get work done?
  • How effectively can we exploit the resources of
    the grid?
  • Amortizing overhead
  • Are problems of interest amenable to grid
    solutions?
  • Depth > 1 feasible?

17
The ConCert Project
18
The ConCert Project
  • Trustless Grid Computing
  • General framework for grid computing
  • Trust model based on code certification
  • Advanced languages for grid computing
  • Applications of trustless grid computing

19
Trustless Grid Computing
  • Minimize trust relationships among developers and
    donors
  • Avoid explicit intervention by host owners for
    running a grid application
  • Adopt a policy-based framework
  • Donors state policies for running grid
    applications
  • Developers prove compliance

20
Trustless Grid Computing
  • Example policies
  • Type- and memory safety: no memory overruns, no
    violations of abstraction boundaries.
  • Resource bounds: limitations on memory and CPU
    usage.
  • Authentication: only from .edu, only from Robert
    Harper, only if pre-negotiated.

21
Trustless Grid Computing
  • Compliance is a matter of proof!
  • Policies are a property of the code
  • Host wishes to know that the code complies with
    its policies
  • Certified binary = object code plus proof of
    compliance with host policies
  • Burden of proof is on the developer

22
Minimizing Trust
23
Certified code
  • Proof-carrying code (PCC) and typed assembly
    language (TAL)
  • Supporting x86 architecture
  • Use of types to ensure safety
  • Generated automatically by certifying compiler
  • MLton -> Popcorn for experimentation

24
Certifying compilers
  • Special J: JVML
  • Generates x86 machine code
  • Formal proof of safety represented via oracle
    string
  • PopCorn: safe C dialect
  • Also generates x86 code
  • Certificate consists of type annotations on
    assembly code

25
Properties
  • Type safety
  • Resources
  • Currently primitive
  • Based on Necula/Lee count checking approach
    (LNCS '98)
  • See also Crary/Weirich (POPL '00)
  • Information flow?

26
The Framework
27
ConCert framework
  • Each host runs a steward
  • Locator: building the grid
  • Conductor: serving work
  • Player: performing work
  • Inspired by Cilk/NOW (Leiserson, et al.)
  • Work-stealing model
  • Dataflow-like scheduling

28
The Steward
  • The steward is parameterized by the host policy
  • But currently it is fixed to be TAL safety
  • Declarative formalism for policies and proofs

29
Symmetric architecture
  • All users who install a steward can
  • Distribute their code
  • Run others' code
  • Work stealing
  • Steward donates resources by requesting work
  • Developers do not make (unsolicited) requests for
    cycles

30
The Locator
  • Peer-to-peer discovery protocol
  • Based on Gnutella ping-pong protocol
  • Hosts send pings, receive pongs
  • Start with well-known neighbors
  • Generalizes file sharing to cycle sharing
  • State willingness to contribute cycles, rather
    than music files

31
The Conductor
  • Serves work to grid hosts
  • Implements dataflow scheduling
  • Unit of work: chord
  • Entirely passive
  • Components
  • Listener on well-known port
  • Scheduler to manage dependencies

32
The Player
  • Executes chords on behalf of a host
  • Stolen from a host via its conductor
  • Sends result back to host
  • Ensures compliance with host policy

33
Programming
34
A programming model
  • An ML interface for grid programming
  • Task abstraction
  • Synchronization
  • Applications use this interface
  • Maps down to chords at the grid level
  • Currently only simulated

35
A programming model
    signature Task = sig
      type 'r task
      val inject : ('e -> 'r) * 'e -> 'r task
      val enable : 'r task -> unit
      val forget : 'r task -> unit
      val status : 'r task -> status
      val sync   : 'r task -> 'r
      val relax  : 'r task list -> 'r * 'r task list
    end

36
Example: Mergesort
    fun mergesort (l) =
      let
        val (lt, md, rt) = partition ((length l) div 3, l)
        val t1 = inject (mergesort, lt)
        val t2 = inject (mergesort, md)
        val t3 = inject (mergesort, rt)
        val (a, rest) = relax [t1, t2, t3]
        val (b, [last]) = relax rest
      in
        merge (merge (a, b), sync last)
      end
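
To experiment with the shape of this program without a grid, the task operations can be simulated sequentially. The Python sketch below is an assumption-laden stand-in, not the ConCert interface: `Task`, `inject`, `sync`, and `relax` mimic the ML signature, `relax` always picks the first task (a grid would pick whichever finishes first), and the base case and `merge` are filled in because the slide omits them.

```python
class Task:
    """Sequential stand-in for a grid task: the work runs when first synced."""
    def __init__(self, fn, arg):
        self.fn, self.arg, self.done, self.value = fn, arg, False, None

def inject(fn, arg):
    return Task(fn, arg)

def sync(task):
    if not task.done:
        task.value, task.done = task.fn(task.arg), True
    return task.value

def relax(tasks):
    """Return the result of one task plus the tasks still outstanding."""
    return sync(tasks[0]), tasks[1:]

def merge(xs, ys):
    out, i, j = [], 0, 0
    while i < len(xs) and j < len(ys):
        if xs[i] <= ys[j]:
            out.append(xs[i]); i += 1
        else:
            out.append(ys[j]); j += 1
    return out + xs[i:] + ys[j:]

def mergesort(l):
    if len(l) <= 1:                      # base case, not shown on the slide
        return list(l)
    k = max(1, len(l) // 3)              # three-way split, as on the slide
    lt, md, rt = l[:k], l[k:2 * k], l[2 * k:]
    t1, t2, t3 = inject(mergesort, lt), inject(mergesort, md), inject(mergesort, rt)
    a, rest = relax([t1, t2, t3])
    b, (last,) = relax(rest)
    return merge(merge(a, b), sync(last))
```

Merging the first two results as soon as they are available, and only then waiting on the third, is what lets a real scheduler overlap merging with the slowest subtask.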

37
Tasks and chords
  • A task is the application-level unit of
    parallelism
  • A chord is the grid-level unit of work
  • Tasks spawn chords at synch points
  • Each synch creates a chord
  • Dependencies determined by the form of the synch

38
Chords
  • A task is broken into chords
  • A chord is the unit of work
  • Chords form nodes in an and-or dataflow network
  • Conductor schedules chords for stealing
  • Ensures dependencies are met
  • Collects results, updates dependencies

39
Chords
  • A chord is essentially a closure
  • Code for the chord
  • Bindings for free variables
  • Arguments to the chord
  • Type information / proof of compliance
  • Representation splits code from data
  • Facilitates code sharing
  • Reduces network traffic
  • MD5 hash as a code pointer

40
Chord scheduling
[Figure: chord states Done, Done, Wait, Wait]
41
Chord scheduling
[Figure: chord states Done, Wait, Wait, Ready]
42
Failures
  • Simple fail-stop model
  • Processors fail explicitly, rather than
    maliciously
  • Timeouts for slow or dead hosts
  • Assume chords are repeatable
  • No hidden state in or among chords
  • Easily met in a purely functional setting

43
Applications
44
Applications
  • Goals
  • Expose current shortcomings
  • Make framework more robust and stable
  • Better understand programmer requirements
  • Design a programming environment

45
Application: Ray tracing
  • GML language from ICFP01 programming contest
  • Simple graphics rendering language
  • Implemented in PopCorn
  • Generates TAL binaries
  • Depth-1 and-dependencies only!
  • Divide work into regions
  • One chord per region

46
Application: Theorem proving
  • Fragment of linear logic
  • Sufficient to model Petri net reachability
  • Stresses and/or dependencies, depth > 1
  • Focusing strategy to control parallelism
  • Currently uses grid simulator

47
And-or parallelism
  • AND-parallelism
  • OR-parallelism

48
Focusing
[Figure: sequential implementation vs. parallel implementation, annotated "Use parallelism here"]
49
Managing communication
  • There are multiple ways to prove some subgoals
  • The way a subgoal is proven may affect the
    provability of other subgoals
  • Need communication?

50
Multiple results
  • Our approach
  • Each subtask returns a continuation that will
    attempt to prove the subgoal a different way (if
    requested)
  • In essence, each subtask returns all possible
    results
  • Needs the ability to register code on the
    network without starting it

51
Reliability
52
Malice aforethought
  • What about malicious hosts?
  • Deliberately spoof answers
  • Example: a theorem prover (TP) that always answers yes
  • What about malicious failures?
  • Arbitrary bad behavior by hosts

53
Result certification
  • Prove authenticity of answers?
  • Application computes answer plus a certificate of
    authenticity
  • Example: GCD(m,n) returns (d,k,l) such that
    d = km + ln and d | m and d | n
  • Example: TP computes a formal proof of the
    theorem!
  • Cf. Blum's self-checking programs
  • Probabilistic methods for many problems

54
Other Activities
55
Summary
56
Summary
  • ConCert: a trustless approach to grid computing
  • Hosts don't trust applications
  • Applications don't trust hosts
  • Lots of good research opportunities!
  • Compilers, languages
  • Systems, applications
  • Algorithms, semantics

57
Other activities
  • Oracle strings for Twelf
  • Proof irrelevance
  • TILT -> TALT

58
Students
  • Framework
  • Tom Murphy, Margaret Delap
  • Applications
  • Jason Liszka, Tom Murphy, Evan Chang
  • Certification
  • Andrew Bernard, Leaf Petersen, Jason Reed
  • Language design
  • Derek Dreyer, Aleks Nanevski

59
Project URL
  • http://www.cs.cmu.edu/concert