Transcript and Presenter's Notes

Title: Scaleability


1
Scaleability
  • Jim Gray
  • Gray@Microsoft.com
  • (with help from Gordon Bell, George Spix,
    Catharine van Ingen)

Course schedule (Mon-Fri):
  9:00  - Overview, TP mons, Log, Files & Buffers, B-tree
  11:00 - Faults, Lock Theory, ResMgr, COM, Access Paths
  1:30  - Tolerance, Lock Techniq, CICS & Inet, Corba, Groupware
  3:30  - T Models, Queues, Adv TM, Replication, Benchmark
  7:00  - Party, Workflow, Cyberbrick, Party
2
A peta-op business app?
  • P&G and friends pay for the web (like they paid
    for broadcast television): no new money, but
    given Moore, traditional advertising revenues can
    pay for all of our connectivity - voice, video,
    data (presuming we figure out how to allow
    them to brand the experience.)
  • Advertisers pay for impressions and the ability to
    analyze same.
  • A terabyte sort a minute to one a second.
  • Bisection bandwidth of 20 GB/s to 200 GB/s.
  • Really a tera-op business app (today's portals)

3
Scaleability: Scale Up and Scale Out
  • Grow Up with SMP: 4xP6 is now standard
  • Grow Out with Cluster: Cluster has inexpensive parts
  • Cluster of PCs
4
There'll be Billions & Trillions Of Clients
  • Every device will be intelligent
  • Doors, rooms, cars
  • Computing will be ubiquitous

5
Billions Of Clients Need Millions Of Servers
  • All clients networked to servers
  • May be nomadic or on-demand
  • Fast clients want faster servers
  • Servers provide
  • Shared Data
  • Control
  • Coordination
  • Communication

[Diagram: trillions of mobile and fixed clients connect to millions of servers and super-servers]
6
Thesis: Many little beat few big

[Chart: the spectrum from mainframe, mini, micro, nano to pico processor ($1 million, $100 K, $10 K price points; 10 pico-second RAM; 1 M SPECmarks, 1 TFLOP; 10^6 clocks to bulk RAM; event-horizon on chip; VM reincarnated; multi-program cache, on-chip SMP) and disk form factors 14", 9", 5.25", 3.5", 2.5", 1.8" spanning 1 MB to 100 TB]
  • Smoking, hairy golf ball
  • How to connect the many little parts?
  • How to program the many little parts?
  • Fault tolerance & Management?

7
4 B PCs (1 Bips, .1 GB DRAM, 10 GB disk, 1 Gbps
Net, B=G): The Bricks of Cyberspace
  • Cost $1,000
  • Come with
  • NT
  • DBMS
  • High speed Net
  • System management
  • GUI / OOUI
  • Tools
  • Compatible with everyone else
  • CyberBricks

8
Computers shrink to a point
Kilo Mega Giga Tera Peta Exa Zetta Yotta
  • Disks: 100x in 10 years, 2 TB 3.5" drive
  • Shrink it to 1" and that is 200 GB
  • Disk is super computer!
  • This is already true of printers and terminals

9
Super Server: 4T Machine
  • Array of 1,000 4B machines
  • 1 Bips processors
  • 1 B B (1 GB) DRAM
  • 10 B B (10 GB) disks
  • 1 Bbps comm lines
  • 1 TB tape robot
  • A few megabucks
  • Challenge
  • Manageability
  • Programmability
  • Security
  • Availability
  • Scaleability
  • Affordability
  • As easy as a single system

Cyber Brick: a 4B machine
Future servers are CLUSTERS of processors, discs.
Distributed database techniques make clusters work.
10
Cluster Vision: Buying Computers by the Slice
  • Rack & Stack
  • Mail-order components
  • Plug them into the cluster
  • Modular growth without limits
  • Grow by adding small modules
  • Fault tolerance
  • Spare modules mask failures
  • Parallel execution data search
  • Use multiple processors and disks
  • Clients and servers made from the same stuff
  • Inexpensive built with commodity CyberBricks

11
Systems 30 Years Ago
  • MegaBuck per Mega Instruction Per Second (mips)
  • MegaBuck per MegaByte
  • Sys Admin & Data Admin per MegaBuck

12
Disks of 30 Years Ago
  • 10 MB
  • Failed every few weeks

13
1988: IBM DB2 + CICS Mainframe, 65 tps
  • IBM 4391
  • Simulated network of 800 clients
  • $2M computer
  • Staff of 6 to do benchmark

2 x 3725 network controllers
Refrigerator-sized CPU
16 GB disk farm 4 x 8 x .5GB
14
1987: Tandem Mini @ 256 tps
  • 14 M$ computer (Tandem)
  • A dozen people ($1.8M/y)
  • False floor, 2 rooms of machines

Admin expert
32 node processor array
Performance expert
Hardware experts
Simulate 25,600 clients
Network expert
Auditor
Manager
40 GB disk array (80 drives)
OS expert
DB expert
15
1997, 9 years later: 1 Person and 1 box = 1,250 tps
  • 1 Breadbox ~ 5x the 1987 machine room
  • 23 GB is hand-held
  • One person does all the work
  • Cost/tps is 100,000x less: 5 micro-dollars per
    transaction

4 x 200 MHz cpu, 1/2 GB DRAM, 12 x 4 GB disk
Hardware expert, OS expert, Net expert, DB expert, App expert (all one person)
3 x 7 x 4 GB disk arrays
16
What Happened? Where did the 100,000x come from?
  • Moore's law: 100X (at most)
  • Software improvements: 10X (at most)
  • Commodity Pricing: 100X (at least)
  • Total: 100,000X
  • 100x from commodity
  • (DBMS was $100K to start, now $1K to start
  • IBM 390 MIPS is $7.5K today
  • Intel MIPS is $10 today
  • Commodity disk is $50/GB vs $1,500/GB
  • ...

17
Web server farms, server consolidation, $/sqft
http://www.exodus.com (charges by Mbps times sqft)
Standard package: full height, fully populated with
3.5" disks; HP, DELL, Compaq are trading places
wrt the rack-mount lead. PoPC: Celeron NLX shoeboxes,
1000 nodes in 48 (24x2) sq ft.
$650K from Arrow (3-yr warranty!), on-chip at-speed L2.
18
Application Taxonomy

Technical:
  General purpose, non-parallelizable codes (PCs have it!)
  Vectorizable
  Vectorizable & //able (Supers & small DSMs)
  Hand tuned, one-of: MPP coarse grain, MPP embarrassingly // (Clusters of PCs)
Commercial:
  Database
  Database/TP
  Web Host
  Stream Audio/Video

If central control & rich then IBM or large SMPs, else PC Clusters.
19
Peta Scale Computing

10x every 5 years, 100x every 10 (1000x in 20 if SC).
Except: memory & IO bandwidth.
20
"I think there is a world market for maybe five computers."

Thomas Watson Senior, Chairman of IBM, 1943
21
Microsoft.com: 150 x 4 nodes: a crowd
22
HotMail (a year ago): 400 Computers, a Crowd (now
2x bigger)
23
DB Clusters (crowds)
  • 16-node Cluster
  • 64 cpus
  • 2 TB of disk
  • Decision support
  • 45-node Cluster
  • 140 cpus
  • 14 GB DRAM
  • 4 TB RAID disk
  • OLTP (Debit Credit)
  • 1 B tpd (14 k tps)

24
The Microsoft TerraServer Hardware
  • Compaq AlphaServer 8400
  • 8 x 400 MHz Alpha cpus
  • 10 GB DRAM
  • 324 x 9.2 GB StorageWorks Disks
  • 3 TB raw, 2.4 TB of RAID5
  • STK 9710 tape robot (4 TB)
  • WindowsNT 4 EE, SQL Server 7.0

25
TerraServer: Lots of Web Hits
  • A billion web hits!
  • 1 TB, largest SQL DB on the Web
  • 100 Qps average, 1,000 Qps peak
  • 877 M SQL queries so far

26
TerraServer Availability
  • Operating for 13 months
  • Unscheduled outage: 2.9 hrs
  • Scheduled outage: 2.0 hrs (software upgrades)
  • Availability: 99.93% overall uptime
  • No NT failures (ever)
  • One SQL 7 Beta 2 bug
  • One major operator-assisted outage

27
Backup / Restore

28
Windows NT Versus UNIX: Best Results on an SMP
Semi-log plot shows a 3x (2-year) lead by UNIX.
Does not show the Oracle/Alpha Cluster at 100,000
tpmC. All these numbers are off-scale huge (40,000
active users?).
29
TPC-C Improvements (MS SQL): 250%/year on Price,
100%/year on Performance; bottleneck is the 3 GB address
space
40% hardware, 100% software, 100% PC Technology
30
UNIX (dis) Economy Of Scale
31
Two different pricing regimes (these are late 1998
prices)
32
Storage Latency: How far away is the data?
33
Thesis: Performance = Storage Accesses, not
Instructions Executed
  • In the old days we counted instructions and
    IOs
  • Now we count memory references
  • Processors wait most of the time

[Chart: where the time goes - clock ticks used by AlphaSort components: Disc Wait, Sort, OS, Memory Wait]
34
Storage Hierarchy (10 levels)
  • Registers, Cache L1, L2
  • Main (1, 2, 3 if NUMA).
  • Disk (1 (cached), 2)
  • Tape (1 (mounted), 2)

35
Today's Storage Hierarchy: Speed & Capacity vs
Cost Tradeoffs

[Charts: Size vs Speed and Price vs Speed. Typical system capacity (bytes) and $/MB plotted against access time, 10^-9 to 10^3 seconds, for cache, main memory, secondary storage (disc), and online, nearline, and offline tape]
36
Meta-Message: Technology Ratios Are Important
  • If everything gets faster & cheaper at the
    same rate THEN nothing really changes.
  • Things getting MUCH BETTER:
  • communication speed & cost: 1,000x
  • processor speed & cost: 100x
  • storage size & cost: 100x
  • Things staying about the same:
  • speed of light (more or less constant)
  • people (10x more expensive)
  • storage speed (only 10x better)

37
Storage Ratios Changed
  • 10x better access time
  • 10x more bandwidth
  • 4,000x lower media price
  • DRAM/DISK media price ratio changed: 100:1 to 10:1 to 50:1

38
The Pico Processor
1 M SPECmarks
10^6 clocks/fault to bulk RAM
Event-horizon on chip. VM reincarnated.
Multi-program cache.
Terror Bytes!
39
Bottleneck Analysis
  • Drawn to linear scale

Theoretical Bus Bandwidth: 422 MBps (66 MHz x 64
bits)
Memory Read/Write: 150 MBps
MemCopy: 50 MBps
Disk R/W: 9 MBps
40
Bottleneck Analysis
  • NTFS Read/Write:
  • 18 Ultra 3 SCSI disks on 4 strings (2x4 and 2x5), 3 PCI
    64
  • 155 MBps unbuffered read (175 raw)
  • 95 MBps unbuffered write
  • Good, but 10x down from our UNIX brethren (SGI,
    SUN)


155 MBps
41
PennySort
  • Hardware
  • 266 Mhz Intel PPro
  • 64 MB SDRAM (10ns)
  • Dual Fujitsu DMA 3.2GB EIDE disks
  • Software
  • NT workstation 4.3
  • NT 5 sort
  • Performance
  • sort 15 M 100-byte records (1.5 GB)
  • Disk to disk
  • elapsed time 820 sec
  • cpu time 404 sec

42
Penny Sort Ground Ruleshttp//research.microsoft.
com/barc/SortBenchmark
  • How much can you sort for a penny.
  • Hardware and Software cost
  • Depreciated over 3 years
  • 1M system gets about 1 second,
  • 1K system gets about 1,000 seconds.
  • Time (seconds) SystemPrice () / 946,080
  • Input and output are disk resident
  • Input is
  • 100-byte records (random data)
  • key is first 10 bytes.
  • Must create output file and fill with sorted
    version of input file.
  • Daytona (product) and Indy (special) categories
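The budget rule above is simple enough to check in a few lines. A minimal sketch follows, assuming only the 3-year depreciation rule stated in the bullets; the function and variable names are illustrative, not from the deck.

    // Sketch of the PennySort budget rule (assumes the 3-year depreciation
    // stated above; names are illustrative).
    #include <cstdio>

    // Seconds of a system costing `priceDollars` that one penny can buy,
    // given 3 years = 94,608,000 seconds of depreciation.
    double pennyBudgetSeconds(double priceDollars) {
        const double threeYears = 3.0 * 365 * 24 * 3600;   // 94,608,000 s
        return 0.01 * threeYears / priceDollars;            // = 946,080 / price
    }

    int main() {
        std::printf("$1M system: %.2f s\n", pennyBudgetSeconds(1e6));  // ~0.95 s
        std::printf("$1K system: %.0f s\n", pennyBudgetSeconds(1e3));  // ~946 s
        return 0;
    }

Running it reproduces the two bullets: roughly one second for a million-dollar system and roughly a thousand seconds for a thousand-dollar one.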

43
How Good is NT5 Sort?
  • CPU and IO not overlapped.
  • System should be able to sort 2x more
  • RAM has spare capacity
  • Disk is space saturated (1.5 GB in, 1.5 GB out on a
    3 GB drive.) Need an extra 3 GB drive or a >6 GB
    drive


[Chart: Disk, CPU, Fixed RAM]
44
Sandia/Compaq/ServerNet/NT Sort
  • Sort 1.1 Terabyte (13 Billion records) in 47
    minutes
  • 68 nodes (dual 450 MHz processors), 543 disks,
    $1.5 M
  • 1.2 GBps network rap (2.8 GBps pap)
  • 5.2 GBps of disk rap (same as pap)
  • (rap = real application performance, pap = peak
    advertised performance)

45
SP sort
  • 2-4 GBps!

46
Progress on Sorting: NT now leads both price and
performance
  • Speedup comes from Moore's law: 40%/year
  • Processor/Disk/Network arrays: 60%/year (this is
    a software speedup).

47
Recent Results
  • NOW Sort: 9 GB on a cluster of 100 UltraSparcs
    in 1 minute
  • MilleniumSort: 16x Dell NT cluster: 100 MB in
    1.18 sec (Datamation)
  • Tandem/Sandia Sort: 68 CPU ServerNet: 1 TB in
    47 minutes
  • IBM SPsort
  • 408 nodes, 1952 cpus
  • 2168 disks
  • 17.6 minutes = 1057 sec
  • (all for 1/3 of $94M;
  • slice price is $64K for 4 cpus, 2 GB ram, 6 x 9 GB
    disks + interconnect)

48
Data Gravity: Processing Moves to Transducers
  • Move Processing to data sources
  • Move to where the power (and sheet metal) is
  • Processor in
  • Modem
  • Display
  • Microphones (speech recognition) cameras
    (vision)
  • Storage Data storage and analysis
  • System is distributed (a cluster/mob)

49
SAN: Standard Interconnect
Gbps SAN: 110 MBps
  • LAN faster than memory bus?
  • 1 GBps links in lab.
  • $100 port cost soon
  • Port is computer
  • Winsock: 110 MBps (10% cpu utilization at each
    end)

PCI: 70 MBps
UW SCSI: 40 MBps
FW SCSI: 20 MBps
SCSI: 5 MBps
50
Disk Node
  • has magnetic storage (100 GB?)
  • has processor & DRAM
  • has SAN attachment
  • has execution environment

Applications
Services
DBMS
File System
RPC, ...
SAN driver
Disk driver
OS Kernel
51
end

52
Standard Storage Metrics
  • Capacity:
  • RAM: MB and $/MB: today at 10 MB and $100/MB
  • Disk: GB and $/GB: today at 10 GB and $200/GB
  • Tape: TB and $/TB: today at .1 TB and $25K/TB
    (nearline)
  • Access time (latency):
  • RAM: 100 ns
  • Disk: 10 ms
  • Tape: 30 second pick, 30 second position
  • Transfer rate:
  • RAM: 1 GB/s
  • Disk: 5 MB/s - - - Arrays can go to 1 GB/s
  • Tape: 5 MB/s - - - striping is problematic
53
New Storage Metrics: Kaps, Maps, SCAN?
  • Kaps: How many KB objects served per second
  • The file server, transaction processing metric
  • This is the OLD metric.
  • Maps: How many MB objects served per second
  • The Multi-Media metric
  • SCAN: How long to scan all the data
  • The data mining and utility metric
  • And
  • Kaps/$, Maps/$, TBscan/$

54
For the Record (good 1998 devices packaged in a system:
http://www.tpc.org/results/individual_results/Dell/dell.6100.9801.es.pdf)
X 14
55
For the Record (good 1998 devices packaged in a system:
http://www.tpc.org/results/individual_results/Dell/dell.6100.9801.es.pdf)
X 14
56
How To Get Lots of Maps, SCANs
At 10 MB/s it takes 1.2 days to scan a terabyte;
1,000-way parallel gives a 100 second SCAN.
  • Parallelism: use many little devices in parallel
  • Beware of the media myth
  • Beware of the access time myth

Parallelism: divide a big problem into many
smaller ones to be solved in parallel (see the sketch below).
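The scan arithmetic on this slide is worth making explicit. A minimal sketch, using the slide's 10 MB/s device rate, a 1 TB target, and 1,000-way parallelism; names are illustrative.

    // Scan-time arithmetic: one 10 MB/s device versus 1,000 of them
    // working in parallel on a terabyte.
    #include <cstdio>

    int main() {
        const double bytes   = 1e12;    // ~1 TB to scan
        const double perDisk = 10e6;    // 10 MB/s per device
        const double devices = 1000;    // degree of parallelism
        double serialSec   = bytes / perDisk;      // ~100,000 s
        double parallelSec = serialSec / devices;  // ~100 s
        std::printf("serial:   %.0f s (%.1f days)\n", serialSec, serialSec / 86400.0);
        std::printf("parallel: %.0f s\n", parallelSec);
        return 0;
    }

The serial figure comes out near 1.2 days, the parallel one near 100 seconds, matching the slide.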
57
The Disk Farm On a Card
  • The 1 TB disc card
  • An array of discs
  • Can be used as
  • 100 discs
  • 1 striped disc
  • 10 Fault Tolerant discs
  • ....etc
  • LOTS of accesses/second
  • bandwidth

14"
Life is cheap, it's the accessories that cost ya.
Processors are cheap, it's the peripherals that cost
ya (a $10K disc card).
58
Tape Farms for Tertiary Storage: Not Mainframe
Silos

[Diagram: a $10K robot holds 14 tapes (500 GB at 5 MB/s, $20/GB, 30 Maps, 27 hr scan); 100 robots give $1M, 50 TB, $50/GB, 3K Maps]
Scan in 27 hours. Many independent tape robots (like
a disc farm).
59
Tape & Optical: Beware of the Media Myth
Optical is cheap: $200/platter,
2 GB/platter => $100/GB (2x
cheaper than disc). Tape is cheap: $30/tape,
20 GB/tape => $1.5/GB (100x
cheaper than disc).
60
Tape & Optical Reality: Media is 10% of System
Cost
Tape needs a robot ($10K ... $3M): 10 ...
1000 tapes (at 20 GB each) => $20/GB ... $200/GB
(1x-10x cheaper than disc). Optical needs a
robot ($100K): 100 platters = 200 GB (TODAY)
=> $400/GB (more expensive than magnetic disc).
Robots have poor access times. Not good for the
Library of Congress (25 TB). Data motel: data
checks in but it never checks out!
61
The Access Time Myth
  • The Myth: seek or pick time dominates
  • The reality: (1) Queuing dominates,
  • (2) Transfer dominates BLOBs,
  • (3) Disk seeks are often short
  • Implication: many cheap servers are better than
    one fast expensive server
  • shorter queues
  • parallel transfer
  • lower cost/access and cost/byte
  • This is now obvious for disk arrays
  • This will be obvious for tape arrays

62
What To Do About HIGH Availability
  • Need a remote MIRRORED site to tolerate environmental
    failures (power, net, fire, flood) and operations
    failures
  • Replicate changes across the net
  • Failover servers across the net (some
    distance)
  • Allows software upgrades, site moves, fires, ...
  • Tolerates operations errors, heisenbugs, ...

[Diagram: a client and two mirrored servers exchanging state changes across >100 feet or >100 miles]
63
Scaleup Has Limits (chart courtesy of Catharine
van Ingen)
  • Vector Supers: 10x supers
  • 3 Gflops/cpu
  • bus/memory: 20 GBps
  • IO: 1 GBps
  • Supers: 10x PCs
  • 300 Mflops/cpu
  • bus/memory: 2 GBps
  • IO: 1 GBps
  • PCs are slow
  • 30 Mflops/cpu
  • and bus/memory: 200 MBps
  • and IO: 100 MBps

64
TOP500 Systems by Vendor (courtesy of Larry Smarr,
NCSA)

[Chart: number of TOP500 systems by vendor (CRI, SGI, IBM, Convex, HP, Sun, TMC, Intel, DEC, Japanese vector machines, other), from Jun-93 through Jun-98; y-axis 0 to 500 systems]

TOP500 Reports: http://www.netlib.org/benchmark/top500.html
65
NCSA Super Cluster
http://access.ncsa.uiuc.edu/CoverStories/SuperCluster/super.html
  • National Center for Supercomputing
    Applications, University of Illinois @ Urbana
  • 512 Pentium II cpus, 2,096 disks, SAN
  • Compaq + HP + Myricom + WindowsNT
  • A Super Computer for $3M
  • Classic Fortran/MPI programming
  • DCOM programming model

66
Avalon: Alpha Clusters for Science
http://cnls.lanl.gov/avalon/
  • 140 Alpha Processors (533 MHz)
  • x 256 MB, 3 GB disk each
  • Fast Ethernet switches
  • 45 GBytes RAM, 550 GB disk
  • Linux...
  • 10 real Gflops for $313,000
  • => 34 real Mflops/k$
  • on 150 benchmark Mflops/k$
  • Beowulf project is Parent
  • http://www.cacr.caltech.edu/beowulf/naegling.html
  • 114 nodes, $2K/node
  • Scientists want cheap mips.

67
Your Tax Dollars At Work: ASCI for Stockpile
Stewardship
  • Intel/Sandia: 9000 x 1-node PPro
  • LLNL/IBM: 512 x 8 PowerPC (SP2)
  • LANL/Cray: ?
  • Maui Supercomputer Center
  • 512 x 1-node SP2

68
Observations
  • Uniprocessor RAP << PAP
  • real app performance << peak advertised
    performance
  • Growth has slowed (Bell Prize)
  • 1987: 0.5 GFLOPS
  • 1988: 1.0 GFLOPS (1 year)
  • 1990: 14 GFLOPS (2 years)
  • 1994: 140 GFLOPS (4 years)
  • 1997: 604 GFLOPS
  • 1998: 1600 G__OPS (4 years)

69
Two Generic Kinds of computing
  • Many little
  • embarrassingly parallel
  • Fit RPC model
  • Fit partitioned data and computation model
  • Random works OK
  • OLTP, File Server, Email, Web,..
  • Few big
  • sometimes not obviously parallel
  • Do not fit RPC model (BIG rpcs)
  • Scientific, simulation, data mining, ...

70
Many Little Programming Model
  • many small requests
  • route requests to data
  • encapsulate data with procedures (objects)
  • three-tier computing
  • RPC is a convenient/appropriate model
  • Transactions are a big help in error handling
  • Auto partition (e.g. hash data and computation)
  • Works fine.
  • Software CyberBricks

71
Object Oriented Programming: Parallelism From Many
Little Jobs
  • Gives location transparency
  • ORB/web/tpmon multiplexes clients to servers
  • Enables distribution
  • Exploits embarrassingly parallel apps
    (transactions)
  • HTTP and RPC (dcom, corba, rmi, iiop, ...) are the
    basis

TP mon / ORB / web server
72
Few Big Programming Model
  • Finding parallelism is hard
  • Pipelines are short (3x 6x speedup)
  • Spreading objects/data is easy, but getting
    locality is HARD
  • Mapping big job onto cluster is hard
  • Scheduling is hard
  • coarse grained (job) and fine grain (co-schedule)
  • Fault tolerance is hard

73
Kinds of Parallel Execution
[Diagram: Pipeline parallelism, where any sequential program feeds the next sequential program, and Partition parallelism, where outputs are split N ways and inputs merge M ways, each partition running an ordinary sequential program]
74
Why Parallel Access To Data?
At 10 MB/s it takes 1.2 days to scan a terabyte;
1,000-way parallel gives a 100 second SCAN.
BANDWIDTH
Parallelism: divide a big problem into many
smaller ones to be solved in parallel.
75
Why are Relational Operators Successful for
Parallelism?
The relational data model: uniform operators on
uniform data streams, closed under composition.
Each operator consumes 1 or 2 input streams; each
stream is a uniform collection of data. Sequential
data in and out: pure dataflow. Partitioning some
operators (e.g. aggregates, non-equi-join, sort, ...)
requires innovation => AUTOMATIC PARALLELISM
76
Database Systems Hide Parallelism
  • Automate system management via tools
  • data placement
  • data organization (indexing)
  • periodic tasks (dump / recover / reorganize)
  • Automatic fault tolerance
  • duplex failover
  • transactions
  • Automatic parallelism
  • among transactions (locking)
  • within a transaction (parallel execution)

77
SQL: a Non-Procedural Programming Language
  • SQL is a functional programming language that
    describes the answer set.
  • Optimizer picks best execution plan
  • Picks data flow web (pipeline),
  • degree of parallelism (partitioning)
  • other execution parameters (process placement,
    memory,...)

[Diagram: planning and execution components: GUI, Schema, Optimizer, Plan, Executors, Rivers, Execution Monitor]
78
Partitioned Execution
Spreads computation and IO among processors.

Partitioned data gives NATURAL parallelism.
79
N x M way Parallelism
N inputs, M outputs, no bottlenecks.
Partitioned Data, Partitioned and Pipelined Data Flows.
80
Automatic Parallel Object Relational DB
Select image
from landsat
where date between 1970 and 1990
  and overlaps(location, Rockies)
  and snow_cover(image) > .7

Temporal, Spatial, and Image predicates.
Assign one process per processor/disk: find the
images with the right date and location, analyze the
image, and if > 70% snow, return it.

[Diagram: the Landsat table (date, location, image) with dates 1/2/72 ... 4/8/95 and locations 33N 120W ... 34N 120W is partitioned across the processes; each applies the date, location, and image tests and contributes to the Answer]
81
Data Rivers: Split + Merge Streams
Producers add records to the river; consumers
consume records from the river. Purely sequential
programming: the river does the flow control and
buffering, and does the partition and merge of data
records. River = Split/Merge in Gamma =
Exchange operator in Volcano / SQL Server.
A minimal sketch follows.
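Since the paragraph above is the whole programming model, here is a minimal single-process sketch of a river under stated assumptions: one bounded buffer per consumer, hash partitioning on the record key, and blocking flow control. Class, field, and type names are illustrative; a real exchange operator (the Gamma split table, the Volcano exchange) also handles end-of-stream and runs across nodes, but the shape is the same: sequential code on both sides, routing and buffering in the middle.

    // Illustrative data river: producers just put(), consumers just get().
    #include <condition_variable>
    #include <cstddef>
    #include <deque>
    #include <functional>
    #include <mutex>
    #include <string>
    #include <vector>

    struct Record { std::string key; std::string payload; };

    class River {
    public:
        River(size_t consumers, size_t capacity)
            : queues_(consumers), cap_(capacity) {}

        // Producer side: route the record to one partition; block if full.
        void put(const Record& r) {
            size_t p = std::hash<std::string>{}(r.key) % queues_.size();
            std::unique_lock<std::mutex> lk(m_);
            notFull_.wait(lk, [&] { return queues_[p].size() < cap_; });
            queues_[p].push_back(r);
            notEmpty_.notify_all();
        }

        // Consumer side: each consumer drains only its own partition.
        Record get(size_t p) {
            std::unique_lock<std::mutex> lk(m_);
            notEmpty_.wait(lk, [&] { return !queues_[p].empty(); });
            Record r = queues_[p].front();
            queues_[p].pop_front();
            notFull_.notify_all();
            return r;
        }

    private:
        std::vector<std::deque<Record>> queues_;  // one bounded buffer per consumer
        size_t cap_;
        std::mutex m_;
        std::condition_variable notFull_, notEmpty_;
    };

The point of the sketch is that neither the producer nor the consumer code mentions parallelism at all; the routing decision and the back-pressure live entirely inside the river.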
82
Generalization Object-oriented Rivers
  • Rivers transport sub-class of record-set (
    stream of objects)
  • record type and partitioning are part of subclass
  • Node transformers are data pumps
  • an object with river inputs and outputs
  • do late-binding to record-type
  • Programming becomes data flow programming
  • specify the pipelines
  • Compiler/Scheduler does data partitioning and
    transformer placement

83
NT Cluster Sort as a Prototype
  • Using
  • data generation and
  • sort as a prototypical app
  • The "Hello world" of distributed processing
  • goal: easy install & execute

84
Remote Install
  • Add Registry entry to each remote node.

RegConnectRegistry() RegCreateKeyEx()
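The slide names the two Win32 registry calls it uses; below is a hedged sketch of how they fit together to add an entry on a remote node. The machine name \\node17 and the key path SOFTWARE\ClusterSort are hypothetical placeholders, and error handling is kept minimal.

    // Illustrative remote-install step: create a registry key on a remote node.
    #include <windows.h>
    #include <stdio.h>

    int main() {
        HKEY hRemote = NULL, hKey = NULL;
        // Connect to HKEY_LOCAL_MACHINE on the remote node (name is hypothetical).
        LONG rc = RegConnectRegistry(TEXT("\\\\node17"), HKEY_LOCAL_MACHINE, &hRemote);
        if (rc != ERROR_SUCCESS) { printf("connect failed: %ld\n", rc); return 1; }

        // Create (or open) the application's key on that node (path is hypothetical).
        DWORD disposition = 0;
        rc = RegCreateKeyEx(hRemote, TEXT("SOFTWARE\\ClusterSort"), 0, NULL,
                            REG_OPTION_NON_VOLATILE, KEY_WRITE, NULL,
                            &hKey, &disposition);
        if (rc == ERROR_SUCCESS) RegCloseKey(hKey);
        RegCloseKey(hRemote);
        return rc == ERROR_SUCCESS ? 0 : 1;
    }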
85
Cluster Startup & Execution
  • Setup
  • MULTI_QI struct
  • COSERVERINFO struct
  • CoCreateInstanceEx()
  • Retrieve remote object handle
  • from MULTI_QI struct
  • Invoke methods as usual
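A hedged sketch of the startup sequence these bullets describe, using the COM structures and call named on the slide (COSERVERINFO, MULTI_QI, CoCreateInstanceEx). The node name and the CLSID/IID parameters are placeholders, and the sketch assumes COM has already been initialized with CoInitializeEx.

    // Illustrative cluster startup: activate a worker object on a remote node.
    #include <objbase.h>

    HRESULT StartRemoteWorker(const CLSID& clsidWorker, const IID& iidWorker,
                              IUnknown** ppWorker) {
        COSERVERINFO server = {};                           // which node to activate on
        server.pwszName = const_cast<wchar_t*>(L"node17");  // hypothetical node name

        MULTI_QI qi = {};                                   // which interface we want back
        qi.pIID = &iidWorker;

        HRESULT hr = CoCreateInstanceEx(clsidWorker, NULL, CLSCTX_REMOTE_SERVER,
                                        &server, 1, &qi);
        if (SUCCEEDED(hr) && SUCCEEDED(qi.hr)) {
            *ppWorker = qi.pItf;   // retrieve the remote object handle
            return S_OK;           // caller invokes methods on it as usual
        }
        return FAILED(hr) ? hr : qi.hr;
    }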

86
Cluster Sort Conceptual Model
  • Multiple Data Sources
  • Multiple Data Destinations
  • Multiple nodes
  • Disks -> Sockets -> Disk -> Disk

[Diagram: source nodes each hold a mix of A, B, and C records; after the exchange over sockets, one destination disk holds all the As, another all the Bs, another all the Cs]
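One way to read the picture above is that every source decides, per record, which destination owns that record's key range. A minimal sketch of such a routing function, under the assumption of a range partition on the first key byte (the deck does not specify the actual partitioning scheme):

    // Hypothetical range-partition routing for the conceptual model above:
    // map a sort key to one of `nodes` destinations by its first byte, so each
    // destination ends up owning one contiguous slice of the key space
    // (assuming roughly uniform keys, as in the sort benchmark).
    #include <cstddef>
    #include <string>

    size_t DestinationNode(const std::string& key, size_t nodes) {
        unsigned char first = key.empty() ? 0 : static_cast<unsigned char>(key[0]);
        return (static_cast<size_t>(first) * nodes) / 256;
    }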
87
How Do They Talk to Each Other?
  • Each node has an OS
  • Each node has local resources: a federation.
  • Each node does not completely trust the others.
  • Nodes use RPC to talk to each other
  • CORBA? DCOM? IIOP? RMI?
  • One or all of the above.
  • Huge leverage in high-level interfaces.
  • Same old distributed system story.

[Diagram: each node's protocol stack: Applications over datagrams, streams, RPC, or ?, over VIA (VIAL/VIPL), over the wire(s)]