1. Functionally Homogeneous Clustering: A New Architecture for Scalable Data-intensive Internet Services
- Yasushi Saito
- yasushi_at_cs.washington.edu
- University of Washington, Department of Computer Science and Engineering, Seattle, WA
2. Goals
- Use cheap, unreliable hardware components to build scalable data-intensive Internet services.
- Data-intensive Internet services: email, BBS, calendar, etc.
- Three facets of scalability:
  - Performance: linear increase with system size.
  - Manageability: react to changes automatically.
  - Availability: survive failures gracefully.
3. Contributions
- Functionally homogeneous clustering:
  - Dynamic data and function distribution.
  - Exploitation of application semantics.
- Three techniques:
  - Naming and automatic recovery.
  - High-throughput optimistic replication.
  - Load balancing.
- Email as the first target application.
- Evaluation of the architecture using Porcupine.
4. Presentation Outline
- Introduction:
  - What are data-intensive Internet services?
  - Existing solutions and their problems.
- Functionally homogeneous clustering.
- Challenges and solutions:
  - Scaling performance.
  - Reacting to failures and recoveries.
  - Deciding on data placement.
- Conclusion.
5. Data-intensive Internet Services
- Examples: email, Usenet, BBS, calendar, Internet collaboration (photobook, equill.com, crit.org).
- Growing rapidly as demand for personal services grows.
- High update frequency.
- Low access locality.
- Web techniques (caching, stateless data transformation) not effective.
- Weak data consistency requirements.
- Well-defined, structured data access paths.
- Embarrassingly parallel.
- ⇒ RDB is overkill.
6. Rationale for Email
- Email as the first target application.
- Most important among data-intensive services.
- Service concentration (Hotmail, AOL, ...).
- ⇒ Practical demands.
- The most update-intensive.
- No access locality.
- ⇒ Challenging application.
- Prototype implementation: the Porcupine email server.
7. Conventional Solutions: Big Iron
- Just buy a big machine.
- Pros: easy deployment, easy management.
- Cons: limited scalability, single failure domain, really expensive.
8. Conventional Solutions: Clustering
- Connect many small machines.
- Pros: cheap, incremental scalability, natural failure boundaries.
- Cons: software and managerial complexity.
9. Existing Cluster Solutions
- Static partitioning: assign data and functions to nodes statically.
- Management problems:
  - Manual data partitioning.
- Performance problems:
  - No dynamic load balancing.
- Availability problems:
  - Limited fault tolerance.
10. Presentation Outline
- Introduction.
- Functionally homogeneous clustering:
  - Key concepts.
  - Key techniques: recovery, replication, load balancing.
  - Basic operations and data structures.
- Challenges and solutions.
- Evaluation.
- Conclusion.
11. Functionally Homogeneous Clustering
- Clustering is the way to go.
- Static function and data partitioning leads to the problems above.
- So, make everything dynamic:
  - Any node can handle any task (client interaction, user management, etc.).
  - Any node can store any piece of data (email messages, user profiles).
12. Advantages
- Advantages:
  - Better load balance; hot-spot dispersion.
  - Support for heterogeneous clusters.
  - Automatic reconfiguration and task redistribution upon node failure/recovery.
  - Easy node addition/retirement.
- Results:
  - Better performance.
  - Better manageability.
  - Better availability.
13. Challenges
- Dynamic function distribution:
  - Solution: run every function on every node.
- Dynamic data distribution:
  - How are data named and located?
  - How are data placed?
  - How do data survive failures?
14. Key Techniques and Relationships
- [Diagram] Framework: functional homogeneity. Techniques: load balancing, name DB with reconfiguration, replication. Goals: manageability, performance, availability.
15. Porcupine Overview
- [Diagram] Per-node components: replication manager, mail map, email msgs, user profile.
16. Receiving Email in Porcupine
- Functional layers involved: protocol handling, user lookup, load balancing, data store (replication).
- [Diagram] Example flow across nodes A, B, C, D (front end chosen by DNS-RR selection; sketched in code below):
  1. Send mail to bob.
  2. Who manages bob? ⇒ A.
  3. Verify bob.
  4. OK, bob has msgs on C, D, E.
  5. Pick the best nodes to store the new msg ⇒ C, D.
  6. Store msg.
  7. Store msg.
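The following is a minimal, hedged sketch of the delivery path above, with each step stubbed out as a print statement; the function names, node labels, and return values are illustrative assumptions rather than Porcupine's actual interfaces.

```c
/* Hedged sketch of the delivery flow on the slide above.
 * Function names, node labels, and return values are illustrative. */
#include <stdio.h>

/* Step 2: consult the replicated user map to find the node that
 * manages this user's profile and mail map. */
static char lookup_manager(const char *user) {
    printf("who manages %s? -> A\n", user);
    return 'A';
}

/* Steps 3-4: the manager verifies the user and returns the mail map
 * (the set of nodes already holding the user's mailbox fragments). */
static int fetch_mail_map(char manager, const char *user, char *nodes) {
    printf("node %c: verified %s; msgs on C, D\n", manager, user);
    nodes[0] = 'C';
    nodes[1] = 'D';
    return 2;
}

/* Step 5: the load balancer picks the best node(s) for the new msg. */
static char pick_storage_node(const char *nodes, int n) {
    return n > 0 ? nodes[0] : '?';
}

int main(void) {
    const char *user = "bob";   /* step 1: "send mail to bob" arrives via SMTP */
    char nodes[8];
    char manager = lookup_manager(user);
    int n = fetch_mail_map(manager, user, nodes);
    char target = pick_storage_node(nodes, n);
    printf("steps 6-7: store msg on node %c (and its replicas)\n", target);
    return 0;
}
```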
17Basic Data Structures
bob
hash(bob) 2
User map
B
C
A
C
A
B
A
C
B
C
A
C
A
B
A
C
Mail map / user profile
bob A,C
suzy A,C
joe B
ann B
Bobs MSGs
Suzys MSGs
Bobs MSGs
Joes MSGs
Anns MSGs
Suzys MSGs
Mailbox storage
A
B
C
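A minimal sketch of the user-map lookup described above, assuming an 8-bucket map and a toy string hash (so bob may land in a different bucket than in the slide's example); neither is Porcupine's actual scheme.

```c
/* Sketch of the user-map / mail-map lookup described above.
 * Bucket count, node names, and the hash are illustrative. */
#include <stdio.h>

#define NUM_BUCKETS 8

/* User map: bucket -> manager node, replicated on every node. */
static const char user_map[NUM_BUCKETS] = {
    'B', 'C', 'A', 'C', 'A', 'B', 'A', 'C'
};

/* Toy string hash; any stable hash works as long as all nodes agree. */
static unsigned hash_user(const char *user) {
    unsigned h = 0;
    while (*user)
        h = h * 31 + (unsigned char)*user++;
    return h % NUM_BUCKETS;
}

int main(void) {
    const char *user = "bob";
    unsigned bucket = hash_user(user);
    /* The node found here manages bob's soft state: his profile and
     * mail map (the list of nodes holding his mailbox fragments). */
    printf("%s -> bucket %u -> manager node %c\n",
           user, bucket, user_map[bucket]);
    return 0;
}
```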
18. Presentation Outline
- Overview.
- Functionally homogeneous clustering.
- Challenges and solutions:
  - Scaling performance.
  - Reacting to failures and recoveries:
    - Recovering the name space.
    - Replicating on-disk data.
  - Load balancing.
- Evaluation.
- Conclusion.
19. Scaling Performance
- The user map distributes user-management responsibility evenly across nodes.
- Load balancing distributes data-storage responsibility evenly across nodes.
- The workload is very parallel.
- ⇒ Scalable performance.
20. Measurement Environment
- Porcupine email server:
  - Linux 2.2.7, glibc 2.1.1, ext2.
  - 50,000 lines of C code.
- 30-node cluster of not-quite-all-identical PCs.
- 100 Mb/s Ethernet, 1 Gb/s hubs.
- Performance is disk-bound.
- Homogeneous configuration.
- Synthetic load:
  - Modeled after the UW CSE server.
  - Mixture of SMTP and POP sessions.
21. Porcupine Performance
- [Graph] POP performance, no email replication; annotated data points: 68m/day and 25m/day.
22. Presentation Outline
- Overview.
- Functionally homogeneous clustering.
- Challenges and solutions:
  - Scaling performance.
  - Reacting to failures and recoveries:
    - Recovering the name space.
    - Replicating on-disk data.
  - Load balancing.
- Evaluation.
- Conclusion.
23. How Do Computers Fail?
- Large clusters are unreliable.
- Assumption: live nodes respond correctly in bounded time, most of the time.
- The network can partition.
- Nodes can become very slow temporarily.
- Nodes can fail (and may never recover).
- Byzantine failures are excluded.
24. Recovery Goals and Strategies
- Goals:
  - Maintain function after unusual failures.
  - React to changes quickly.
  - Graceful performance degradation / improvement.
- Strategy: two complementary mechanisms.
  - Make data soft as much as possible.
  - Hard state (email messages, user profiles) ⇒ optimistic fine-grain replication.
  - Soft state (user map, mail map) ⇒ reconstruction after a configuration change.
25Soft-state Recovery Overview
2. Distributed disk scan
1. Membership protocol Usermap recomputation
B
A
A
B
A
B
A
B
B
A
A
B
A
B
A
B
A
bob A,C
bob A,C
bob A,C
suzy A,B
suzy
B
A
A
B
A
B
A
B
B
A
A
B
A
B
A
B
B
joe C
joe C
joe C
ann B
ann
suzy A,B
C
suzy A,B
suzy A,B
ann B,C
ann B,C
ann B,C
Timeline
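A hedged sketch of the two recovery steps above: recomputing the user map for a new membership, then rebuilding mail-map entries from a local disk scan. The bucket count, assignment policy, and the hard-coded "scan" are illustrative assumptions, not Porcupine's actual code.

```c
/* Sketch of soft-state recovery: recompute the user map, then rebuild
 * mail-map entries by scanning local mailbox storage. */
#include <stdio.h>

#define NUM_BUCKETS 8
#define MAX_NODES   4

static const char *live_nodes[MAX_NODES];
static int num_live;
static const char *user_map[NUM_BUCKETS];   /* bucket -> manager node */

/* Step 1: after the membership protocol reports the live set,
 * deterministically reassign buckets to live nodes. */
static void recompute_user_map(void) {
    for (int b = 0; b < NUM_BUCKETS; b++)
        user_map[b] = live_nodes[b % num_live];
}

static unsigned hash_user(const char *u) {
    unsigned h = 0;
    while (*u) h = h * 31 + (unsigned char)*u++;
    return h % NUM_BUCKETS;
}

/* Step 2: each node scans its on-disk mailboxes and pushes one
 * mail-map entry per local mailbox fragment to the user's manager.
 * Here the "scan" is a hard-coded list and the "push" is a printf. */
static void disk_scan_and_report(const char *this_node) {
    const char *local_mailboxes[] = { "bob", "suzy" };  /* found on disk */
    for (int i = 0; i < 2; i++) {
        const char *user = local_mailboxes[i];
        const char *manager = user_map[hash_user(user)];
        printf("%s -> tell manager %s: %s has a fragment on %s\n",
               this_node, manager, user, this_node);
    }
}

int main(void) {
    /* Membership after node C fails: only A and B remain. */
    live_nodes[0] = "A"; live_nodes[1] = "B"; num_live = 2;
    recompute_user_map();
    disk_scan_and_report("A");
    return 0;
}
```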
26. Cost of Soft-state Recovery
- Data bucketing allows fast discovery.
- Cost of a bucket scan ⇒ O(U).
- Fraction of buckets scanned per change ⇒ O(1/N).
- Frequency of changes ⇒ O(N/MTBF).
- Total cost: O(U) x O(1/N) x O(N/MTBF) = O(U/MTBF).
- [Graph] U = 5 million per node.
27. How Does Porcupine React to Configuration Changes?
- [Graph] (See breakdown.)
28. Soft-state Recovery: Summary
- Scalable, reliable recovery.
- Quick, constant-cost recovery.
- Recovers soft state after any type/number of failures.
- No residual references to dead nodes.
- Proven correct: the soft state eventually and correctly reflects the contents on disk.
29. Replicating Hard State
- Goals:
  - Keep serving hard state (email msgs, user profiles) after unusual failures.
  - Per-object replica-site selection.
  - Space and computational efficiency.
  - Dynamic addition/removal of replicas.
- Strategy: exploit application semantics.
  - Be optimistic.
  - Whole-state transfer; Thomas write rule.
30-37. Example: Update Propagation (animation)
- [Diagram sequence] An update to an object replicated on {A, B, C}:
  - The initiating node timestamps the new contents (3:10pm) and creates an update record holding the timestamp, target set {A, B, C}, and ack set.
  - The new contents and timestamp propagate to the other replicas; each applies them and returns an ack (Ack 3:10pm).
  - Once acks from the whole target set are collected, the update is retired (Retire 3:10pm); the update records are discarded and only the new contents and replica set remain.
38. Replica Addition and Removal
- [Diagram] A issues an update to delete C: the new replica set {A, B} is propagated like any other update, with its own timestamp (3:10pm), target set, and ack set.
- Unified treatment of updates to contents and to the replica set.
39. What If Updates Conflict?
- Apply the Thomas write rule (sketched in code below):
  - The newest update always wins.
  - An older update is canceled by being overwritten with the newer update.
- The same rule applies to replica addition/deletion.
- But there are some subtleties...
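A small sketch of applying an update under the Thomas write rule as described above: each replica keeps only the newest timestamped state, and stale updates are simply dropped. The struct layout and the use of time_t timestamps are illustrative assumptions; Porcupine's actual timestamps and data structures may differ.

```c
/* Hedged sketch of Thomas-write-rule update application. */
#include <stdio.h>
#include <time.h>

struct replica_obj {
    time_t timestamp;          /* timestamp of the newest applied update */
    char   contents[256];      /* whole-object state (whole-state transfer) */
    char   replica_set[16];    /* e.g., "ABC"; updated by the same rule */
};

/* Apply an incoming update only if it is newer than what we hold;
 * older updates are discarded because the newer state already
 * supersedes them.  Returns 1 if applied, 0 if ignored. */
static int apply_update(struct replica_obj *obj, time_t ts,
                        const char *contents, const char *replica_set) {
    if (ts <= obj->timestamp)
        return 0;                       /* stale update: newest wins */
    obj->timestamp = ts;
    snprintf(obj->contents, sizeof obj->contents, "%s", contents);
    snprintf(obj->replica_set, sizeof obj->replica_set, "%s", replica_set);
    return 1;
}

int main(void) {
    struct replica_obj mailbox = { 0, "", "ABC" };
    /* Two updates arrive out of order: the later (3:10pm) one wins. */
    apply_update(&mailbox, 1000 /* "3:10pm" */, "msg v2", "ABC");
    apply_update(&mailbox,  900 /* "3:05pm" */, "msg v1", "ABC");
    printf("contents=%s replicas=%s\n", mailbox.contents, mailbox.replica_set);
    return 0;
}
```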
40. Node Discovery Protocol
- [Diagram sequence] Animation showing how pending updates (3:10pm, 3:20pm) are reconciled when a node such as D is discovered: the newer update is applied (Apply 3:20pm update) and missing replicas are re-added to the target set (Add targets: C), with the new replica set, target set, and ack set shown at each step.
41. Replication Space Overhead
- [Graph] Spool size = 2 GB; average email msg = 4.7 KB.
42. How Efficient is Replication?
43. Replication: Summary
- Flexibility:
  - Any object can be stored on any node.
  - Dynamic replica-set changes are supported.
- Simplicity and efficiency:
  - Two-phase propagation/retirement.
  - Unified contents and replica-set updates.
- Proven correct:
  - All live replicas agree on the newest contents, regardless of concurrent updates and failures, as long as the network does not partition for long periods.
44. Presentation Outline
- Overview.
- Functionally homogeneous clustering.
- Challenges and solutions:
  - Reacting to failures and recoveries:
    - Soft-state namespace recovery.
    - Replication.
  - Load balancing.
- Conclusion.
45. Distributing Incoming Workload
- Goals:
  - Minimize voodoo parameter tuning.
  - Handle skewed configurations.
  - Handle skewed workloads.
  - Lightweight.
  - Reconcile affinity and load balance.
- Strategy: local, spread-based load balancing (sketched in code below).
  - Spread: soft limit on the size of a user's mail map.
  - Load: measure of pending disk I/O requests.
- [Diagram] Given the user's mail map and cached per-node loads:
  1. Add nodes if |mail map| < spread.
  2. Pick the least-loaded node(s) from the set.
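A minimal sketch of the spread-based selection above: widen the candidate set while the user's mail map is below the spread limit, then pick the least-loaded candidate by pending disk I/O. The node count, load values, and data layout are illustrative assumptions.

```c
/* Hedged sketch of spread-based load balancing. */
#include <stdio.h>

#define NUM_NODES 4

/* Cached load per node: number of pending disk I/O requests. */
static int pending_io[NUM_NODES] = { 7, 2, 9, 4 };

/* Pick a storage node for a new message.
 * mailmap[] : nodes already holding fragments of this user's mailbox
 * map_size  : current mail-map size
 * spread    : soft limit on mail-map size */
static int pick_node(const int *mailmap, int map_size, int spread) {
    int candidates[NUM_NODES], n = 0;

    if (map_size < spread) {
        /* Step 1: the mail map is still small, so any node is a
         * candidate (choosing a new one may grow the map toward spread). */
        for (int i = 0; i < NUM_NODES; i++)
            candidates[n++] = i;
    } else {
        /* Mail map already at the spread limit: stay within it to keep
         * the number of files touched per mailbox small. */
        for (int i = 0; i < map_size; i++)
            candidates[n++] = mailmap[i];
    }

    /* Step 2: pick the least-loaded candidate. */
    int best = candidates[0];
    for (int i = 1; i < n; i++)
        if (pending_io[candidates[i]] < pending_io[best])
            best = candidates[i];
    return best;
}

int main(void) {
    int mailmap[] = { 0, 2 };      /* bob's msgs currently on nodes 0 and 2 */
    int node = pick_node(mailmap, 2, 2);
    printf("store new msg on node %d (load %d)\n", node, pending_io[node]);
    return 0;
}
```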
46. Choosing the Optimal Spread Limit
- Trade-off:
  - Larger spread ⇒ more nodes to choose from ⇒ better load balance.
  - Smaller spread ⇒ fewer files to access ⇒ better overall throughput.
- Spread = 2 is optimal for a uniform configuration.
- Spread > 2 (e.g., 4) for a heterogeneous configuration.
47. How Well Does Porcupine Support Heterogeneous Clusters?
- [Graph] Annotated data points: 16.8m/day (25) vs. 0.5m/day (0.8).
48. Presentation Outline
- Overview.
- Functionally homogeneous clustering.
- Challenges and solutions.
- Evaluation.
- Conclusion:
  - Summary.
  - Future directions.
49. Conclusions
- Cheap, fast, available, and manageable clusters can be built for data-intensive Internet services.
- The key ideas can be extended beyond mail:
  - Dynamic data and function distribution.
  - Automatic reconfiguration.
  - High-throughput, optimistic replication.
  - Load balancing.
  - Exploiting application semantics.
  - Use of soft state.
  - Optimism.
50. Future Directions
- Geographical distribution.
- Running multiple services.
- Software reuse.
51-54. Example: Replica Removal (backup, animation)
- [Diagram sequence] Removing C from the replica set {A, B, C}: the new replica set {A, B} is propagated as a timestamped update (3:10pm) with its own target set and ack set; once all acks arrive, the update is retired and C no longer appears in the replica set.
55-57. Example: Updating Contents, Propagation, and Retirement (backup, animation)
- [Diagram sequence] An update to an object with replica set {A, B, C}: the initiating node creates an update record (which exists only during update propagation) holding the new contents, timestamp (3:10pm), target set, and ack set; the contents propagate to each replica, acks are collected, and Retire 3:10pm messages remove the update records everywhere.
58. Example: Final State (backup)
- The algorithm is quiescent after update retirement:
  - New contents are absent from the update record.
  - Contents are read directly from the replica.
  - The update is stored only during propagation.
  - ⇒ Computational and space efficiency.
- [Diagram] Final state: A, B, and C all hold the new contents and replica set {A, B, C}, with no update records left.
59. Handling Long-term Failures
- The algorithm maintains consistency among the remaining replicas.
- But updates destined for dead nodes get stuck and clog nodes' disks.
- Solution: erase dead nodes' names from replica sets and update records after a grace period.
60. Replication Space Overhead
- [Graph] ≈ 6-17 MB for replica sets and update records, vs. ≈ 2000 MB for email msgs.
61. Scaling to a Large User Population
- A large user population increases the memory requirement.
- Recovery cost grows linearly with the per-node user population.
62. Rebalancing
- Load balancing may cause suboptimal data distribution after node addition/retirement.
- Resources are wasted at night (traffic drops to 1/2 to 1/5 of daytime levels).
- Rebalancer (sketched in code below):
  - Runs around midnight.
  - Adds replicas for under-replicated objects.
  - Removes replicas for over-replicated objects.
  - Deletes objects without owners.
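A hedged sketch of one rebalancer pass as described above: walk the objects, compare each replica count with a target, and schedule additions, removals, or orphan deletion. The object table, target replica count, and print-stub actions are illustrative assumptions.

```c
/* Sketch of a nightly rebalancer pass. */
#include <stdio.h>

#define TARGET_REPLICAS 2

struct object {
    const char *name;
    int replica_count;
    int has_owner;      /* 0 if the owning user no longer exists */
};

static void rebalance(struct object *objs, int n) {
    for (int i = 0; i < n; i++) {
        struct object *o = &objs[i];
        if (!o->has_owner) {
            printf("delete orphan %s\n", o->name);
        } else if (o->replica_count < TARGET_REPLICAS) {
            printf("add %d replica(s) of %s\n",
                   TARGET_REPLICAS - o->replica_count, o->name);
        } else if (o->replica_count > TARGET_REPLICAS) {
            printf("remove %d replica(s) of %s\n",
                   o->replica_count - TARGET_REPLICAS, o->name);
        }
    }
}

int main(void) {
    struct object objs[] = {
        { "bob/msg1",  1, 1 },   /* under-replicated */
        { "suzy/msg7", 3, 1 },   /* over-replicated  */
        { "joe/msg2",  2, 0 },   /* orphan           */
    };
    rebalance(objs, 3);
    return 0;
}
```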