TACC Retrospective: Contributions, Non-Contributions, and What We Really Learned

1
TACC Retrospective: Contributions,
Non-Contributions, and What We Really Learned
  • Armando Fox, University of California, Berkeley
  • fox_at_cs.berkeley.edu


2
Vision: The Content You Want
  • What do above apps have in common?
  • Adapt (collect, filter, transform) existing
    content
  • according to client constraints
  • respecting network limitations
  • according to per-user preferences
  • But: lack of a unified framework for designing apps
    that exploit this observation

3
Contributions
  • TACC, a model for structuring services
  • Transformation, Aggregation, Caching,
    Customization of Internet content
  • Scalable TACC server
  • Based on clusters of commodity PCs
  • Easy to author industrial strength services
  • Scalable Network Service (SNS) platform maps app
    semantics onto cluster-based availability
    mechanisms
  • Experience with real users
  • 15,000 today at UCB

4
What's TACC?
  • Transformation (local, one-to-one)
  • TranSend, Anonymizer
  • Aggregation (nonlocal, many-to-one)
  • Search engines, crawlers, newswatchers
  • Caching
  • Both original and locally-generated content
  • Customization
  • Per user for content generation
  • Per device data delivery, content packaging

5
TACC Example: TranSend
  • Transparent HTTP proxy
  • On-the-fly, lossy compression of specific MIME
    types (GIF, JPG...)
  • Cache both original and transformed content
  • User specifies aggressiveness and refinement
    UI
  • Parameters to HTML image transformers
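
The TranSend behavior listed above (transform only specific MIME types, let the user choose an aggressiveness, cache both originals and transformed versions) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not TranSend's actual code: `shrink`, `serve`, `TRANSFORMERS`, and the two cache dictionaries are hypothetical names, and `shrink` only models the size reduction of lossy compression by dropping bytes.

```python
from typing import Callable, Dict, Tuple


def shrink(data: bytes, aggressiveness: float) -> bytes:
    """Stand-in for a lossy image transformer (assumption, not real compression).

    Keeps a fraction of the bytes so that higher aggressiveness yields
    smaller output, which is all this sketch needs.
    """
    keep = max(1, int(len(data) * (1.0 - aggressiveness)))
    return data[:keep]


# Only these MIME types are transformed; everything else passes through.
TRANSFORMERS: Dict[str, Callable[[bytes, float], bytes]] = {
    "image/gif": shrink,
    "image/jpeg": shrink,
}

original_cache: Dict[str, bytes] = {}
transformed_cache: Dict[Tuple[str, float], bytes] = {}


def serve(url: str, mime: str, body: bytes, aggressiveness: float) -> bytes:
    """Return the version of `body` to send to the client."""
    original_cache.setdefault(url, body)            # cache the original
    transformer = TRANSFORMERS.get(mime)
    if transformer is None:
        return body                                 # untouched MIME type
    key = (url, aggressiveness)                     # one entry per user setting
    if key not in transformed_cache:
        transformed_cache[key] = transformer(original_cache[url], aggressiveness)
    return transformed_cache[key]


small = serve("http://example.com/a.gif", "image/gif", b"x" * 100, 0.75)
page = serve("http://example.com/", "text/html", b"<p>hi</p>", 0.75)
```

Keying the transformed cache on both URL and aggressiveness lets users with different settings share the cached original while each still gets a version matching their preference.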


6
Top Gun Wingman
  • PalmPilot web browser
  • Intermediate-form page layout
  • Image scaling and transcoding
  • Controlled by layout engine
  • Device-specific ADU marshalling
  • Including client versioning
  • Originals and device-specific pages cached

[Figure labels: html, ADU]
7
Application Partitioning
  • Client competence
  • Styled text, images, widgets are fine
  • Bitmaps unnecessary
  • Client responsiveness
  • Scrolling, etc. shouldn't require roundtrip to
    server
  • Client independence
  • Very late conversion to client-specific format

8
TACC Conceptual Data Flow
[Figure labels: To Internet, FE, User request]
  • Front end accepts RPC-like user requests
  • User's customization profile retrieved
  • Original data fetched from cache or Internet
  • Aggregation/transformation workers operate on
    data according to customization profile
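
The four steps above can be sketched as a small request pipeline. This is an illustration only, not the TACC server's real interface: `FrontEnd`, `handle_request`, the dictionary-backed profile store and cache, and the `fetch` callback are all hypothetical names introduced for the example.

```python
from typing import Callable, Dict, List

# A worker takes content plus the user's profile and returns new content.
Worker = Callable[[bytes, dict], bytes]


class FrontEnd:
    """Hypothetical front end that follows the four steps on the slide."""

    def __init__(self, profiles: Dict[str, dict],
                 fetch: Callable[[str], bytes],
                 workers: List[Worker]) -> None:
        self.profiles = profiles          # user id -> customization profile
        self.fetch = fetch                # stand-in for an Internet fetch
        self.workers = workers            # aggregation/transformation chain
        self.cache: Dict[str, bytes] = {}

    def handle_request(self, user: str, url: str) -> bytes:
        # Step 1 is the call itself: an RPC-like request arrives.
        profile = self.profiles.get(user, {})     # step 2: retrieve profile
        if url not in self.cache:                 # step 3: cache or Internet
            self.cache[url] = self.fetch(url)
        data = self.cache[url]
        for worker in self.workers:               # step 4: run the workers
            data = worker(data, profile)
        return data


def truncate(data: bytes, profile: dict) -> bytes:
    """Toy transformation: cut content to the user's size limit, if any."""
    return data[: profile.get("max_bytes", len(data))]


fetch_log: List[str] = []


def fake_fetch(url: str) -> bytes:
    fetch_log.append(url)                 # record each trip to the origin
    return b"original content for " + url.encode()


fe = FrontEnd({"alice": {"max_bytes": 8}}, fake_fetch, [truncate])
result_alice = fe.handle_request("alice", "http://example.com/")
result_guest = fe.handle_request("guest", "http://example.com/")
```

Both requests share one cached original, and only the per-user profile changes what each user receives.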

9
TACC Model Summary
  • Mostly stateless, composable workers
  • Unifies previously ad hoc applications under one
    framework
  • Encourages re-use through modularization
  • Composition enables both new services and new
    clients
  • TACC breakdown provides unified way to think
    about app structure
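
One way to picture "composable workers" is as plain functions chained into a new function. The sketch below assumes workers are stateless string-to-string functions. `compose`, `collapse_whitespace`, and `highlight_keyword` are hypothetical names, not part of the TACC API.

```python
from functools import reduce
from typing import Callable

Worker = Callable[[str], str]


def compose(*workers: Worker) -> Worker:
    """Chain workers left to right into one new worker."""
    return lambda content: reduce(lambda acc, w: w(acc), workers, content)


def collapse_whitespace(content: str) -> str:
    """Toy transformation: squeeze runs of whitespace to single spaces."""
    return " ".join(content.split())


def highlight_keyword(keyword: str) -> Worker:
    """Build a worker that marks every occurrence of `keyword`."""
    return lambda content: content.replace(keyword, f"*{keyword}*")


# A new "service" is a composition of existing workers.
service = compose(collapse_whitespace, highlight_keyword("TACC"))
output = service("The   TACC model\n unifies apps")
```

Because each worker keeps no state between calls, any worker can appear in any chain, which is what makes reuse across services and clients cheap.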

10
Services Should Be Easy To Write
  • Rapid prototyping
  • Insulate workers from mundane details
  • Easy to incorporate existing/legacy code
  • Few assumptions about code structure
  • Must support variety of languages
  • May be fragile
  • Composition to leverage existing code

11
Building a TACC Server
  • Challenge: Scalable Network Service (SNS)
    requirements
  • Scalability to 100Ks of users with high
    availability
  • Cost-effective to deploy and administer
  • But, services should remain easy to write
  • Server provides some bug robustness
  • Server provides availability
  • Server handles load balancing and scaling
  • Preserve modularity (componentwise
    upgradability) when deploying

12
Layered Model of Internet Services
[Figure labels: httpd, etc.; TACC; Scalable Network Svc]
  • TACC Layer
  • Programming model based on composable building
    blocks
  • SNS Layer: large virtual server
  • Implements SNS requirements
  • Cluster computing for hardware F/T and
    incremental scaling
  • Exploit TACC model semantics for software F/T
  • SNS layer is reusable and isolated from TACC
  • Application content orthogonal to SNS
    mechanisms
  • Key to making apps easy to write

13
Why Use a Cluster?
  • Incremental scalability, low cost components
  • High availability through hardware redundancy
  • Goals
  • Demonstrate that clusters and TACC fit well
    together
  • Separate SNS from TACC

14
Cluster-Based TACC Server
  • Component replication for scaling and
    availability
  • High-bandwidth, low-latency interconnect
  • Incremental scaling: commodity PCs

[Figure labels: User Profile Database, Caches, Front Ends, Workers, Load Balancing and Fault Tolerance, Administration Interface]
15
Starfish Availability: LB Death
  • FE detects via broken pipe/timeout, restarts LB

[Figure labels: C, FE, LB/FT]
16
Starfish Availability: LB Death
  • FE detects via broken pipe/timeout, restarts LB
  • New LB announces itself (multicast), contacted by
    workers, gradually rebuilds load tables
  • If partition heals, extra LBs commit suicide
  • FEs operate using cached LB info during failure
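
The recovery sequence above relies on the load table being soft state: a restarted LB starts empty and relearns everything from worker reports, while front ends keep using their last cached answer. A minimal sketch of that idea follows. `LoadTable`, `FrontEnd`, and `pick_worker` are hypothetical names, time is passed in explicitly, and no real multicast is involved.

```python
from typing import Dict, Optional, Tuple


class LoadTable:
    """Soft-state table: entries vanish unless workers keep refreshing them."""

    def __init__(self, ttl: float) -> None:
        self.ttl = ttl
        # worker name -> (reported load, time of last report)
        self.entries: Dict[str, Tuple[int, float]] = {}

    def report(self, worker: str, load: int, now: float) -> None:
        """Called whenever a worker's periodic load report arrives."""
        self.entries[worker] = (load, now)

    def least_loaded(self, now: float) -> Optional[str]:
        """Drop expired entries, then return the least-loaded live worker."""
        self.entries = {w: (load, seen)
                        for w, (load, seen) in self.entries.items()
                        if now - seen <= self.ttl}
        if not self.entries:
            return None
        return min(self.entries, key=lambda w: self.entries[w][0])


class FrontEnd:
    """Keeps the LB's last answer and reuses it while the LB is unavailable."""

    def __init__(self) -> None:
        self.cached_choice: Optional[str] = None

    def pick_worker(self, lb: Optional[LoadTable], now: float) -> Optional[str]:
        if lb is not None:
            choice = lb.least_loaded(now)
            if choice is not None:
                self.cached_choice = choice
        return self.cached_choice         # possibly stale, by design


lb = LoadTable(ttl=5.0)
lb.report("w1", load=3, now=0.0)
lb.report("w2", load=1, now=0.0)
fe = FrontEnd()
before_crash = fe.pick_worker(lb, now=1.0)          # fresh answer from the LB
during_crash = fe.pick_worker(None, now=2.0)        # LB dead: cached answer
new_lb = LoadTable(ttl=5.0)                         # restarted LB starts empty
while_rebuilding = fe.pick_worker(new_lb, now=3.0)  # still the cached answer
new_lb.report("w1", load=0, now=4.0)                # workers re-register
after_rebuild = fe.pick_worker(new_lb, now=4.5)
```

No recovery protocol is needed in this sketch: the new table becomes correct simply because workers keep reporting.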

[Figure labels: C, FE, LB/FT]
18
Fault Recovery Latency
[Figure: plot with axis label "Task queue length"]
19
Behavior in the Large
  • TranSend: 160 image transformations/sec, 10
    Ultra-1 servers
  • Peak seen during UCB traces on 700-modem bank:
    15/sec
  • Amortized hardware cost < $0.35/user/month (one
    $5K PC serving 15,000 subscribers)
  • Wingman: factor of 6-8 worse
  • Administration: one undergraduate, part-time

20
Building a Big System
  • Restartable, atomic workers
  • Read-only data from other origin server(s)
  • Orthogonal separation of scalability/availability
    from application content
  • Multiple lines of defense
  • App modules agree to obey semantics compatible
    with these mechanisms
  • Common-case failure behavior compatible with
    users' Internet experience
  • Enables reuse of whole workers, however diverse

21
Availability and Scalability Summary
  • Pervasive strategy: timeout, retry, restart
  • Transient failures usually invisible to user
  • Process peers watch each other
  • Mostly stateless workers, xact support possible
  • Simplicity from exploiting soft state
  • Piggyback status info on multicast beacons
  • Use of stale LB info fine in practice
  • Starfish availability works in practice
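
The "timeout, retry, restart" strategy can be sketched as a loop that discards a failed worker and retries the request on a fresh one. This is a hedged illustration, not the SNS implementation: `FlakyWorker` and `call_with_restart` are hypothetical, and failures are simulated with exceptions rather than real process deaths or timers.

```python
from typing import Callable, List


class FlakyWorker:
    """Stand-in for a stateless worker; the first instance started is broken."""

    instances_started = 0

    def __init__(self) -> None:
        FlakyWorker.instances_started += 1
        self.healthy = FlakyWorker.instances_started > 1

    def process(self, data: str) -> str:
        if not self.healthy:
            raise BrokenPipeError("worker died")   # simulated crash
        return data.upper()


def call_with_restart(make_worker: Callable[[], FlakyWorker], data: str,
                      max_attempts: int = 3) -> str:
    """Retry a request, replacing the worker after each failure.

    This is safe only because the worker is stateless and the request is
    restartable: nothing is lost by throwing the old instance away.
    """
    worker = make_worker()
    errors: List[Exception] = []
    for _ in range(max_attempts):
        try:
            return worker.process(data)
        except (BrokenPipeError, TimeoutError) as exc:
            errors.append(exc)
            worker = make_worker()     # restart: fresh instance, no recovery
    raise RuntimeError(f"gave up after {max_attempts} attempts: {errors}")


result = call_with_restart(FlakyWorker, "hello")
```

The caller sees only a slightly slower successful response, which is the sense in which transient failures stay invisible to the user.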

22
Service Authoring
  • Keyword highlighting: < 1 day
  • Wingman: 2-3 weeks
  • Various apps from graduate seminar projects
  • Safe worker upload
  • Annotate the Web
  • Channel aggregators

23
New Services By Composition
  • Compose existing services to create a new one
  • 2.5 hours to implement
  • Composes with TranSend or Wingman

[Figure labels: Internet, TranSend, Metasearch]
24
Experience With Real Users
  • Transparent enhancements
  • Minimal downtime
  • Low administration cost
  • Multicast-based administration GUI
  • Virtually no dedicated resources at UCB
  • Overflow pool of 100 UltraSPARC servers
  • Users don't mind relying on middleware proxy

25
Why Now?
  • Internet's critical mass
  • Commercial push for many device types (transistor
    curves)
  • Cluster computing economically viable
  • A good time for infrastructural services

26
Related Work
  • Transformational proxy services: WBI, Strands
  • Application partitioning: Wit, InfoPad, PARC
    Ubiquitous Computing
  • Computing in the infrastructure: Active Networks
  • Soft state for simplicity and robustness:
    Microsoft Tiger, multicast routing protocols

27
Summary of Contributions
  • TACC, a composition-based Internet services
    programming model
  • captures rich variety of apps
  • one view of customization
  • No-hassle deployment on a cluster
  • Automatic and robust partial-failure handling
  • Availability and scaling strategies work in
    practice
  • New apps are easy to write, deploy, debug
  • SNS behaviors are free
  • Compose existing services to enable new clients

28
Non-Contributions (a/k/a Future Work)
  • Accidental contributions
  • Legacy code glue
  • Cheap test rig for next project (prototyping path
    discovery a bare bones cluster OS)
  • Non-contributions
  • Fair resource allocation over cluster
  • Built-in security abstractions
  • Rich state management abstractions

29
What We Really Learned
  • Design for failure
  • It will fail anyway
  • End-to-end argument applied to availability
  • Orthogonality is even better than layering
  • Narrow interface vs. no interface
  • A great way to manage system complexity
  • The price of orthogonality
  • Techniques: refreshable soft state,
    watchdogs/timeouts, sandboxing

30
How About State Management?
  • Transactional apps?
  • APIs are there, but you have to roll your own
    consistency
  • Groupware apps with group state?
  • One way: distributed, F/T group state like SRM!
  • Keeps state management orthogonal to SNS layer

The Moral: Consistency, Availability,
Partition-resilience: pick at most 2
31
Future Work
  • TACC as test rig for Ninja
  • Taxonomy of app structure and platforms
  • What is the big picture of different types of
    Internet services, and where does TACC fit in?
  • Joint work with Dr. Murray Mazer at the Open
    Group Research Institute
  • Apply lessons to reliable distributed systems
  • Formalize programming model
  • Finish writing thesis