Title: EECE 411: Design of Distributed Software Applications (or Distributed Systems 101)
1EECE 411: Design of Distributed Software Applications
(or Distributed Systems 101)
- Matei Ripeanu
- http://www.ece.ubc.ca/matei
2Today's Objectives
- Class mechanics
- http://www.ece.ubc.ca/matei/EECE411/
- Understand real-world applications in terms of:
- Motivation and objectives
- Resource requirements (compute/storage/network resources)
- Architecture (the distributed systems part)
- Examples: P2P applications
- Start thinking of computer networks from the
perspective of a networked application
3P2P Definition(s)
- Def 1: A class of applications that takes
advantage of resources (storage, cycles,
content, human presence) available at the
edges of the Internet.
- Edges are often turned on/off, without permanent IP
addresses
- Def 2: A class of decentralized,
self-organizing distributed systems, in which
all or most communication is symmetric.
- Lots of other definitions that fit in between
- Lots of (P2P?) systems that fit nowhere
4P2P Impact: Widespread adoption
- Skype
- 560M registered users (Q2 2010)
- 120M active, 8M paying
- 15M users online
- Number of users for file-sharing applications
(estimate: www.slyck.com, Sept 06)
- P2P design techniques are now mainstream!
5P2P Impact (2): Huge resource users
- P2P-generated traffic now dominates the Internet
load (30-50% of the traffic)
- Cornell.edu (March 02): 60% P2P
- Internet2 traffic statistics
6P2P Impact (3): Demonstrate that small,
volatile, non-proprietary resources can be
efficiently harnessed
- Resources: CPU, storage space
- Also: network bandwidth, availability, user
attention and expertise
- BOINC statistics
7P2P Impact (4): Social / Business
- Ability to aggregate resources at large scale is
disruptive
- May force companies to change their business
models
- Digital content production and distribution
- Telecommunications companies
- New collaboration models
- Crowd-sourcing!
8Roadmap
- Definitions
- Impact
- Applications
- Mechanisms
- A case study
9Applications (1): Number crunching
- Examples: BOINC, Folding@home, SETI@home, etc.
- Application characteristics
- Massive parallelism
- Low bandwidth/computation ratio
- Error tolerance
- Users do donate real resources
- However
- Centralized
- Cheating!
- Approach suitable only for a particular class of
problems
- How to extend the model to problems that are not
massively parallel?
$1.5M / year for power only
10Applications (2): Online content distribution
(files, streaming)
- The killer application to date
- Too many to list them all
- BitTorrent, FastTrack (KaZaA, KazaaLite, iMesh),
Gnutella (LimeWire, BearShare)
- Two independent problems
- Distributed index
- Fast content download
- Environment: unreliable, non-cooperative
11Applications (3): Performance evaluation
- Poor online performance is costly
- $25 billion per year (Zona Research)
- 28% of attempted online purchases fail (BCG)
- Slow page download is the primary reason for
abandoning a transaction
- User expectations for page download are around 4
seconds
- Performance evaluation and monitoring require
multiple vantage points
- Connectivity statistics
- Routing errors
- Evaluate Web-site performance from the end-user
perspective
12Measurements: The Performance Blind Spot
[Diagram: the network landscape between a site's back-end infrastructure and its users. Labels: Web server, App server, Database, Firewall, T1, Corporate Network, Corporate User, ISP, Local ISP, Regional Network, Backbone, Enterprise Provider, Major Provider, 3rd-party content, Consumer User]
Component Testing
Datacenter Monitoring
- BMC
- Mercury Interactive
- Tivoli
- ProactiveNet
- HP OpenView
- Computer Associates
Consumer User
- Keynote Systems
- Mercury Interactive
- BMC/SiteAngel
- Service Metrics
Critical to estimate end-to-end performance
Slide source: www.porivo.com
13Measurements: End-to-end Performance
[Diagram: the same network landscape as on the previous slide, labeled with Component Testing, Datacenter Monitoring, Consumer User, and End-to-end Web Performance Testing]
Slide source: www.porivo.com
14More applications
- Backup storage (HiveNet, OceanStore)
- Crowd-sourcing
- Spam filtering
- Anonymous email
- Censorship-resistant publishing systems
(Eternity, Freenet)
15Roadmap
- Definitions
- Impact
- Applications
- Mechanisms
- A Case Study
16Mechanisms
- To obtain a resilient system
- use redundancy for data and services
- integrate multiple components with uncorrelated
failure curves
- To reduce cost and improve the QoS delivered
- move service delivery closer to the user
- integrate multiple clients with uncorrelated
demand curves
- (lower over-provisioning at resource providers)
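The two "uncorrelated curves" arguments above can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes fully independent failures and uses made-up demand numbers; the function names `availability` and `provisioning` are hypothetical and not part of the course material.

```python
def availability(p_up: float, replicas: int) -> float:
    """Probability that at least one of `replicas` copies is up,
    ASSUMING each copy is up with probability p_up independently
    (i.e., failures are uncorrelated)."""
    return 1.0 - (1.0 - p_up) ** replicas


def provisioning(demands):
    """Compare two ways of sizing capacity for several clients.

    demands: list of per-client demand curves (equal-length lists,
             one value per time slot).
    Returns (sum_of_peaks, peak_of_sum):
      sum_of_peaks - capacity if every client is provisioned separately
      peak_of_sum  - capacity if all clients share one provider
    """
    sum_of_peaks = sum(max(curve) for curve in demands)
    peak_of_sum = max(sum(slot) for slot in zip(*demands))
    return sum_of_peaks, peak_of_sum


if __name__ == "__main__":
    # Redundancy: each extra independent replica shrinks the failure probability.
    for k in (1, 2, 3):
        print(k, "replica(s):", round(availability(0.9, k), 6))

    # Made-up demand curves that peak at different times of day.
    day_client = [1, 1, 8, 9, 2, 1]
    night_client = [9, 8, 1, 1, 2, 7]
    print(provisioning([day_client, night_client]))
```

If failures (or demand peaks) are correlated, neither gain materializes, which is why the slide stresses "uncorrelated".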
17Example (I): Cooperative Web serving
- Problem: flash crowds!
[Diagram labels: Origin Server (www.matei.com), DNS server (dnssrv), DNS Query, Resolver, Browser; www.matei.com resolves to 216.165.108.10]
18Example (I): Cooperative Web serving
- DNS redirection: return a proxy, preferably one
near the client
- Cooperative Web caching: fetch data from nearby
[Diagram labels: Origin Server, HTTP proxies (httpprx), DNS server (dnssrv), Resolver, Browser, akamai.cnn.com, 216.165.108.10]
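A minimal sketch of the DNS redirection idea, under stated assumptions: the proxy addresses (from the 192.0.2.0/24 documentation range), the region names, and the latency table are all invented for illustration, and the address from the slide is used here as a stand-in for the origin server. A real deployment would estimate proximity from measurements rather than a static table.

```python
# Address shown on the slide; treating it as the origin server is an assumption.
ORIGIN = "216.165.108.10"

# Hypothetical proxies and assumed latency estimates (ms) per client region.
PROXY_LATENCY_MS = {
    "192.0.2.10": {"vancouver": 5, "boston": 70},
    "192.0.2.20": {"vancouver": 75, "boston": 8},
}


def resolve(name: str, client_region: str, alive: set) -> str:
    """Answer a DNS query for `name` with the address of the live proxy
    that is (estimated to be) closest to the client.
    Falls back to the origin server if no suitable proxy is known."""
    candidates = [
        (latency[client_region], ip)
        for ip, latency in PROXY_LATENCY_MS.items()
        if ip in alive and client_region in latency
    ]
    if not candidates:
        return ORIGIN
    # min() over (latency, ip) pairs picks the lowest latency.
    return min(candidates)[1]


if __name__ == "__main__":
    everyone = set(PROXY_LATENCY_MS)
    print(resolve("www.matei.com", "vancouver", everyone))
    print(resolve("www.matei.com", "boston", everyone))
    # With all proxies down, clients are sent to the origin server.
    print(resolve("www.matei.com", "boston", set()))
```

Because the redirection happens at name-resolution time, the browser needs no changes: it simply connects to whatever address it is given.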
19Roadmap
- Definitions
- Impact
- Uses and Examples
- Mechanisms
- A case study
- File sharing: the Gnutella network, BitTorrent
20Basic Primitives for File Sharing
- Join: How do I begin participating?
- Publish: How do I advertise my file(s)?
- Search: How do I find a file?
- Fetch: How do I retrieve a file?
- Lots of different solutions for each of these
four primitives.
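One way to keep the four primitives apart is to write them down as an interface. The sketch below is hypothetical: the class and method names are invented, and the toy implementation uses a single shared index only because it is the simplest possible design to show, not because the slides prescribe it.

```python
from abc import ABC, abstractmethod


class FileSharingPeer(ABC):
    """The four primitives as an abstract interface (names are illustrative)."""

    @abstractmethod
    def join(self, bootstrap): ...

    @abstractmethod
    def publish(self, filename, data): ...

    @abstractmethod
    def search(self, filename): ...

    @abstractmethod
    def fetch(self, filename, source): ...


class CentralIndex:
    """Hypothetical shared directory: filename -> list of peers holding it."""

    def __init__(self):
        self.entries = {}


class CentralIndexPeer(FileSharingPeer):
    """Toy realization: one central index, direct peer-to-peer fetch."""

    def __init__(self, name):
        self.name = name
        self.files = {}
        self.index = None

    def join(self, bootstrap):
        # Joining just means learning where the index is.
        self.index = bootstrap

    def publish(self, filename, data):
        self.files[filename] = data
        self.index.entries.setdefault(filename, []).append(self)

    def search(self, filename):
        return list(self.index.entries.get(filename, []))

    def fetch(self, filename, source):
        data = source.files[filename]
        self.files[filename] = data
        return data


if __name__ == "__main__":
    index = CentralIndex()
    alice, bob = CentralIndexPeer("alice"), CentralIndexPeer("bob")
    alice.join(index)
    bob.join(index)
    alice.publish("song.mp3", b"abc")
    holders = bob.search("song.mp3")
    print([p.name for p in holders], bob.fetch("song.mp3", holders[0]))
```

Other designs answer the same four questions differently, which is what makes a side-by-side comparison of systems possible.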
21What makes these systems interesting?
- Large scale
- Self-organizing networks
- Fast growth
- Gnutella: more than 50x during the first half of
2001; 50x again from 2001 to 2006
- Open architecture, simple and flexible protocols
- Interesting mix of social and technical issues
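22Gnutella search mechanism
[Diagram: peers at Boston, Chicago, MIT, UBC, and Calgary; one peer holds "Beatles: Yellow Submarine"; the query message is labeled "Q: Beatles"]
- Search steps
- A node initiates a search for Yellow Submarine
- It sends the query message to all its neighbors
- Neighbors forward the message
- A node that has the file initiates a reply message
- The reply message is back-propagated
- File download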
23Gnutella Overview
- Join: on startup, a client contacts a few other
nodes; these become its neighbors
- Publish: no need
- Search
- Flooding: pass the query to neighbors, who pass the
query in turn to their own neighbors, and so
on...
- Back-propagation in case of success
- Fetch: get the file directly from the peer (HTTP)
- Note: this was the original design. Later the
network moved to a two-layer structure
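Flooding and back-propagation can be sketched as a breadth-first traversal that remembers where each query copy came from. This is a simplified model, not the Gnutella wire protocol: the overlay links between the five sites are invented, and the hop limit (`ttl`) and duplicate suppression are assumptions added so that the flood terminates.

```python
from collections import deque


def flood_search(neighbors, files, start, wanted, ttl=3):
    """Flood a query from `start`; return one reverse path per hit.

    neighbors: dict node -> list of neighbor nodes (the overlay)
    files:     dict node -> set of file names stored at that node
    ttl:       maximum number of hops the query may travel (assumed)
    Each returned path runs from the node holding the file back to `start`,
    i.e., the route along which the reply is back-propagated.
    """
    came_from = {start: None}          # node -> neighbor it got the query from
    queue = deque([(start, ttl)])
    hits = []
    while queue:
        node, hops_left = queue.popleft()
        if node != start and wanted in files.get(node, ()):
            path, cur = [], node
            while cur is not None:     # walk the reverse path
                path.append(cur)
                cur = came_from[cur]
            hits.append(path)
        if hops_left == 0:
            continue                   # query expires here
        for nb in neighbors.get(node, ()):
            if nb not in came_from:    # drop duplicate copies of the query
                came_from[nb] = node
                queue.append((nb, hops_left - 1))
    return hits


if __name__ == "__main__":
    # Hypothetical overlay using the site names from the search slide.
    overlay = {
        "UBC": ["Calgary", "Chicago"],
        "Calgary": ["UBC"],
        "Chicago": ["UBC", "Boston"],
        "Boston": ["Chicago", "MIT"],
        "MIT": ["Boston"],
    }
    stored = {"MIT": {"Yellow Submarine"}}
    print(flood_search(overlay, stored, "UBC", "Yellow Submarine", ttl=3))
    print(flood_search(overlay, stored, "UBC", "Yellow Submarine", ttl=2))
```

The second call illustrates the cost of bounding the flood: a file that exists in the network but lies beyond the hop limit is simply not found.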
24BitTorrent
- Ingredients
- A seed node that has the file
- A .torrent meta-file is built for the file
- A web server (usually) to index torrents
- A tracker node is associated with each file
- Identified in the .torrent
- File is split into fixed-size segments (e.g.,
256KB)
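A rough sketch of how a file might be cut into fixed-size segments and described by a meta-file. The dictionary layout and field names below are simplified stand-ins, not the real .torrent encoding; the per-segment SHA-1 digests are included on the assumption that peers need some way to check segments received from untrusted sources, and the tracker URL is a placeholder.

```python
import hashlib

SEGMENT_SIZE = 256 * 1024  # 256KB, the example size from the slide


def split_segments(data: bytes, size: int = SEGMENT_SIZE):
    """Cut `data` into consecutive segments of `size` bytes (last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def build_metafile(name: str, data: bytes, tracker_url: str, size: int = SEGMENT_SIZE):
    """Build a simplified, hypothetical meta-file for one file."""
    return {
        "tracker": tracker_url,          # which tracker serves this file
        "name": name,
        "length": len(data),
        "segment_length": size,
        "segment_hashes": [hashlib.sha1(s).hexdigest()
                           for s in split_segments(data, size)],
    }


def verify_segment(meta, index: int, segment: bytes) -> bool:
    """Check a downloaded segment against the digest in the meta-file."""
    return hashlib.sha1(segment).hexdigest() == meta["segment_hashes"][index]


if __name__ == "__main__":
    payload = bytes(600 * 1024)                      # 600KB of zeros
    meta = build_metafile("demo.bin", payload, "http://tracker.example/announce")
    pieces = split_segments(payload)
    print(len(pieces), [len(p) for p in pieces])
    print(verify_segment(meta, 0, pieces[0]), verify_segment(meta, 2, b"corrupt"))
```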
25How does it work
26Overview system components
32BitTorrent Overview
- Join: nothing
- just find a server/community
- Publish: create tracker, spread .torrent file
- Search
- for file (not included in the protocol)
- the community is supposed to provide search tools
- for segments: exchange segment ID maps with
other peers
- Fetch: exchange segments with other peers (HTTP)
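The segment-level search and fetch steps can be sketched as peers comparing segment ID maps and pulling what they lack. This is a toy model under several assumptions: time advances in synchronous rounds, every peer can reach every other peer, each peer fetches one segment per round, and the "lowest missing ID first" choice is a placeholder for whatever selection policy a real client uses.

```python
def exchange_round(peers) -> int:
    """One synchronous round of segment exchange.

    peers: dict peer name -> set of segment IDs it currently holds.
    Each peer looks at the maps advertised at the start of the round and
    fetches one segment it lacks from some peer that has it.
    Returns the number of segments transferred.
    """
    advertised = {name: set(have) for name, have in peers.items()}
    transferred = 0
    for name, have in peers.items():
        wanted = set()
        for other, their_map in advertised.items():
            if other != name:
                wanted |= their_map - advertised[name]
        if wanted:
            have.add(min(wanted))      # placeholder selection policy
            transferred += 1
    return transferred


def run_swarm(peers, num_segments: int) -> int:
    """Repeat rounds until every peer is complete or no progress is possible.
    Returns the number of rounds executed."""
    complete = set(range(num_segments))
    rounds = 0
    while any(have != complete for have in peers.values()):
        if exchange_round(peers) == 0:
            break                      # some segment is held by nobody
        rounds += 1
    return rounds


if __name__ == "__main__":
    swarm = {"seed": {0, 1, 2, 3}, "a": {0}, "b": {3}}
    print(run_swarm(swarm, 4), swarm)
```

Note that peers "a" and "b" can serve each other as soon as they hold different segments, so the seed is not the only source.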
33Gnutella vs. BitTorrent: Discussion
- System properties
- Reliability?
- Scalability?
- Fairness?
- Overheads?
- Quality of Service
- Search coverage for content?
- Ability to download content fast?
- Ability to survive flash crowds?
The rest of this course: how to build
(distributed) systems with desirable properties.
34Assignment 0
- To do: subscribe to the mailing list
36Gnutella -- Network Resilience
[Figure panels: Topology; Random, 30% die; Targeted, 4% die]
from Saroiu et al., MMCN 2002
37Gnutella Query distribution
- Highly heterogeneous distribution for query
popularity
- similar to Web pages popularity
- => caching will work well
from Kunwadee et al., 2002
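Why skewed popularity helps caching can be shown with a back-of-the-envelope calculation. The sketch assumes a Zipf-like law, where the item of rank r is requested with weight 1/r^s; that specific law and the numbers used are assumptions for illustration, not measurements from the cited study.

```python
def top_k_hit_fraction(num_items: int, cache_size: int, exponent: float = 1.0) -> float:
    """Fraction of queries answered by a cache holding the `cache_size`
    most popular items, when the item of rank r has weight 1 / r**exponent.
    exponent = 0 gives uniform popularity; larger values give more skew."""
    weights = [1.0 / rank ** exponent for rank in range(1, num_items + 1)]
    return sum(weights[:cache_size]) / sum(weights)


if __name__ == "__main__":
    # Cache 1% of 1000 distinct queries under different popularity skews.
    for s in (0.0, 1.0):
        print("exponent", s, "->", round(top_k_hit_fraction(1000, 10, s), 4))
```

With uniform popularity a cache of 1% of the items answers 1% of the queries; with the assumed skew the same cache answers well over a third of them.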
38Gnutella Topology issues (1)
39Gnutella Topology Mismatch