Computer Technology Forecast - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Computer Technology Forecast


1
Computer Technology Forecast
  • Jim Gray
  • Microsoft Research
  • Gray@Microsoft.com
  • http://research.Microsoft.com/~Gray

2
Reality Check
  • Good news
  • In the limit, processing, storage, and network are free
  • Processing and network are infinitely fast
  • Bad news
  • Most of us live in the present.
  • People are getting more expensive. Management/programming cost exceeds hardware cost.
  • Speed of light is not improving.
  • WAN prices have not changed much in the last 8 years.

3
Interesting Topics
  • I'll talk about server-side hardware
  • What about client hardware?
  • Displays, cameras, speech, ...
  • What about Software?
  • Databases, data mining, PDB, OODB
  • Objects / class libraries
  • Visualization
  • Open Source movement

4
How Much Information Is There?
  • Soon everything can be recorded and indexed
  • Most data will never be seen by humans
  • Precious resource: human attention. Auto-summarization and auto-search are the key technology. www.lesk.com/mlesk/ksg97/ksg.html

(Graphic: a log scale from kilo to yotta bytes, marking a book, a photo, a movie, all LoC books as words, all books as multimedia, and everything recorded.)
5
Moore's Law
  • Performance/Price doubles every 18 months
  • 100x per decade (quick check below)
  • Progress in next 18 months = ALL previous progress
  • New storage = sum of all old storage (ever)
  • New processing = sum of all old processing
  • E. coli doubles every 20 minutes!

(Graphic captioned "15 years ago".)
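
A quick check of the doubling arithmetic on this slide (a minimal Python sketch; the 18-month doubling period is the slide's premise, the rest is arithmetic):

```python
# Moore's law arithmetic: performance/price doubles every 18 months.
DOUBLING_MONTHS = 18

# Growth over a decade: 2 ** (120 / 18) is roughly 100x.
decade_factor = 2 ** (120 / DOUBLING_MONTHS)
print(f"growth per decade: {decade_factor:.0f}x")   # ~102x

# If shipments double each period, this period's shipment roughly equals
# the sum of everything shipped before it: 2**n vs (2**n - 1).
n = 10
new_shipment = 2 ** n
all_previous = sum(2 ** i for i in range(n))
print(new_shipment, all_previous)                   # 1024 vs 1023
```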
6
Trends: ops/s/$ Had Three Growth Phases
  • 1890-1945
  • Mechanical
  • Relay
  • 7-year doubling
  • 1945-1985
  • Tube, transistor, ...
  • 2.3 year doubling
  • 1985-2000
  • Microprocessor
  • 1.0 year doubling

7
What's a Balanced System?
(Diagram: system bus and two PCI buses.)
8
Storage capacity beating Moore's law
  • $5k/TB today (raw disk)

9
Cheap Storage
  • Disks are getting cheap
  • $7k/TB disks (25 x 40 GB disks @ $230 each)

10
Cheap Storage or Balanced System
  • Low-cost storage (2 x $1.5k servers): $7K/TB = 2 x ($1K system + 8 x 60 GB disks + 100 Mb Ethernet)
  • Balanced server ($7k / 0.5 TB)
  • 2 x 800 MHz ($2k)
  • 256 MB ($400)
  • 8 x 60 GB drives ($3K)
  • Gbps Ethernet switch ($1.5k)
  • $14k/TB, $28K/RAIDed TB (arithmetic below)
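
The $/TB figures above follow from the listed component prices; below is a minimal arithmetic sketch (assuming "RAIDed" here means simple mirroring, which doubles the cost per usable TB):

```python
# Component prices from this slide; a ~0.5 TB balanced server.
components = {
    "2 x 800 MHz CPUs": 2_000,
    "256 MB RAM": 400,
    "8 x 60 GB drives": 3_000,
    "Gbps Ethernet switch": 1_500,
}
cost = sum(components.values())        # $6,900, i.e. roughly $7k
capacity_tb = 8 * 60 / 1000            # 0.48 TB
per_tb = cost / capacity_tb            # ~$14k per raw TB
print(f"${cost:,} for {capacity_tb} TB -> ${per_tb:,.0f}/TB, ${2 * per_tb:,.0f}/mirrored TB")
```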

11
The Absurd Disk
  • 2.5 hr scan time (poor sequential access)
  • 1 aps / 5 GB (VERY cold data)
  • It's a tape!

(Disk: 1 TB, 100 MB/s, 200 Kaps; arithmetic check below.)
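
The scan-time and access-density claims follow from the 1 TB / 100 MB/s / 200 Kaps numbers; a minimal check:

```python
# 1 TB disk, 100 MB/s sequential, ~200 accesses per second.
capacity_bytes = 1e12
bandwidth_bps = 100e6                  # bytes per second
accesses_per_sec = 200

scan_hours = capacity_bytes / bandwidth_bps / 3600
gb_per_aps = capacity_bytes / 1e9 / accesses_per_sec
print(f"full scan: {scan_hours:.1f} h")    # ~2.8 h (the slide quotes 2.5 h)
print(f"1 aps per {gb_per_aps:.0f} GB")    # very cold data
```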
12
Hot Swap Drives for Archive or Data Interchange
  • 25 MBps write (so can write N x 60 GB in 40 minutes)
  • 60 GB/overnite
  • N x 2 MB/second
  • @ $19.95/nite

13
240 GB, $2k (now); 300 GB by year end
  • 4 x 60 GB IDE (2 hot pluggable)
  • ($1,100)
  • SCSI-IDE bridge
  • 200k
  • Box
  • 500 MHz CPU
  • 256 MB SRAM
  • Fan, power, Enet
  • $700
  • Or 8 disks/box: 600 GB for $3K (or 300 GB RAID)

14
Hot Swap Drives for Archive or Data Interchange
  • 25 MBps write (so can write N x 74 GB in 3 hours)
  • 74 GB/overnite
  • N x 2 MB/second
  • @ $19.95/nite

15
It's Hard to Archive a Petabyte. It takes a LONG time to restore it.
  • At 1GBps it takes 12 days!
  • Store it in two (or more) places online (on
    disk?). A geo-plex
  • Scrub it continuously (look for errors)
  • On failure,
  • use other copy until failure repaired,
  • refresh lost copy from safe copy.
  • Can organize the two copies differently
    (e.g. one by time, one by space)
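
The 12-day figure above is just a petabyte divided by a gigabyte per second; a quick check:

```python
# Restoring 1 PB at 1 GB/s.
petabyte_bytes = 1e15
rate_bps = 1e9                         # 1 GB per second
days = petabyte_bytes / rate_bps / 86_400
print(f"{days:.1f} days")              # ~11.6 days, roughly the 12 quoted above
```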

16
Disk vs Tape
  • Disk
  • 60 GB
  • 30 MBps
  • 5 ms seek time
  • 3 ms rotate latency
  • $7/GB for drive, $3/GB for ctlrs/cabinet
  • 4 TB/rack
  • 1 hour scan
  • Tape
  • 40 GB
  • 10 MBps
  • 10 sec pick time
  • 30-120 second seek time
  • $2/GB for media, $8/GB for drive+library
  • 10 TB/rack
  • 1 week scan

Guestimates: CERN: 200 TB, 3480 tapes; 2 col = 50 GB; Rack = 1 TB, 20 drives
The price advantage of tape is narrowing, and the performance advantage of disk is growing. At $10K/TB, disk is competitive with nearline tape.
17
Trends: Gilder's Law: 3x bandwidth/year for 25 more years
  • Today
  • 10 Gbps per channel
  • 4 channels per fiber = 40 Gbps
  • 32 fibers/bundle = 1.2 Tbps/bundle (check below)
  • In lab: 3 Tbps/fiber (400 x WDM)
  • In theory: 25 Tbps per fiber
  • 1 Tbps = USA 1996 WAN bisection bandwidth
  • Aggregate bandwidth doubles every 8 months!
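
Relating the figures on this slide (a minimal sketch; the per-channel and per-bundle numbers come from the bullets above):

```python
# Aggregate bandwidth doubling every 8 months, expressed per year:
per_year = 2 ** (12 / 8)
print(f"{per_year:.2f}x per year")             # ~2.83x, close to the 3x headline

# One bundle today: 10 Gbps/channel * 4 channels/fiber * 32 fibers/bundle
bundle_tbps = 10e9 * 4 * 32 / 1e12
print(f"{bundle_tbps:.2f} Tbps per bundle")    # ~1.3 Tbps (slide rounds to 1.2)
```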

18
Sense of scale
  • How fat is your pipe?
  • Fattest pipe on MS campus is the WAN!

(Graphic: 300 MBps = OC48 = G2 or memcpy(); 94 MBps coast to coast; 90 MBps PCI; 20 MBps disk / ATM / OC3)
19
(Map: Arlington, VA to Redmond/Seattle, WA via New York and San Francisco, CA: 5626 km, 10 hops. Participants: Information Sciences Institute, Microsoft, Qwest, University of Washington, Pacific Northwest Gigapop, HSCC (high speed connectivity consortium), DARPA.)
20
The Path
  • DC -> SEA
  • C:\> tracert -d 131.107.151.194
  • Tracing route to 131.107.151.194 over a maximum of 30 hops
  • 0                                               ------- DELL 4400 Win2K WKS, Arlington Virginia, ISI Alteon GbE
  • 1    16 ms   <10 ms   <10 ms  140.173.170.65    ------- Juniper M40 GbE, Arlington Virginia, ISI Interface ISIe
  • 2   <10 ms   <10 ms   <10 ms  205.171.40.61     ------- Cisco GSR OC48, Arlington Virginia, Qwest DC Edge
  • 3   <10 ms   <10 ms   <10 ms  205.171.24.85     ------- Cisco GSR OC48, Arlington Virginia, Qwest DC Core
  • 4   <10 ms   <10 ms    16 ms  205.171.5.233     ------- Cisco GSR OC48, New York, New York, Qwest NYC Core
  • 5    62 ms    63 ms    62 ms  205.171.5.115     ------- Cisco GSR OC48, San Francisco, CA, Qwest SF Core
  • 6    78 ms    78 ms    78 ms  205.171.5.108     ------- Cisco GSR OC48, Seattle, Washington, Qwest Sea Core
  • 7    78 ms    78 ms    94 ms  205.171.26.42     ------- Juniper M40 OC48, Seattle, Washington, Qwest Sea Edge
  • 8    78 ms    79 ms    78 ms  208.46.239.90     ------- Juniper M40 OC48

21
PetaBumps
  • 751 Mbps for 300 seconds (28 GB)
  • single-thread, single-stream tcp/ip, desktop-to-desktop, out-of-the-box performance
  • 5626 km x 751 Mbps = 4.2e15 bit-meters/second = 4.2 Peta bmps (checked below)
  • Multi-stream is 952 Mbps = 5.2 Peta bmps
  • 4470 byte MTUs were enabled on all routers.
  • 20 MB window size
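
The Peta-bmps figure is throughput times distance; a quick check of the single-stream number (the multi-stream figure is computed the same way):

```python
# Internet2 land-speed metric: throughput times distance (bit-meters per second).
distance_m = 5626e3                    # the 5626 km route above
single_stream_bps = 751e6
print(f"{distance_m * single_stream_bps:.2e} bit-meters/s")  # ~4.2e15 = 4.2 Peta bmps
```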

22
(No Transcript)
23
The Promise of SAN/VIA: 10x in 2 years
http://www.ViArch.org/
  • Yesterday
  • 10 MBps (100 Mbps Ethernet)
  • 20 MBps tcp/ip saturates 2 cpus
  • round-trip latency 250 µs
  • Now
  • Wires are 10x faster: Myrinet, Gbps Ethernet, ServerNet, ...
  • Fast user-level communication
  • tcp/ip: 100 MBps, 10% cpu
  • round-trip latency is 15 µs
  • 1.6 Gbps demoed on a WAN

24
Pointers
  • The single-stream submission: http://research.microsoft.com/~gray/papers/Windows2000_I2_land_Speed_Contest_Entry_(Single_Stream_mail).htm
  • The multi-stream submission: http://research.Microsoft.com/~gray/papers/Windows2000_I2_land_Speed_Contest_Entry_(Multi_Stream_mail).htm
  • The code: http://research.Microsoft.com/~gray/papers/speedy.htm, speedy.h, speedy.c, and a PowerPoint presentation about it: http://research.Microsoft.com/~gray/papers/Windows2000_WAN_Speed_Record.ppt

25
Networking
  • WANs are getting faster than LANs: G8 = OC192 = 8 Gbps is standard
  • Link bandwidth improves 4x per 3 years
  • Speed of light (60 ms round trip in US)
  • Software stacks have always been the problem.

Time = SenderCPU + ReceiverCPU + bytes / bandwidth
This has been the problem
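
A minimal sketch of the cost model above. The per-message and per-byte CPU constants are illustrative assumptions, not measured values; they only show why the software stack, rather than the wire, tends to dominate:

```python
# Time = SenderCPU + ReceiverCPU + bytes / bandwidth
def transfer_time(nbytes, bandwidth_bps=100e6,
                  per_msg_cpu_s=100e-6,      # assumed fixed cost per send/receive
                  per_byte_cpu_s=10e-9):     # assumed copy/checksum cost per byte
    cpu = 2 * per_msg_cpu_s + 2 * per_byte_cpu_s * nbytes   # sender + receiver
    wire = nbytes / bandwidth_bps
    return cpu, wire

cpu_s, wire_s = transfer_time(64 * 1024)     # 64 KB over a 100 MB/s link
print(f"cpu {cpu_s * 1e6:.0f} us vs wire {wire_s * 1e6:.0f} us")   # cpu dominates
```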
26
Rules of Thumb in Data Engineering
  • Moore's law -> an address bit per 18 months.
  • Storage grows 100x/decade (except 1000x last decade!)
  • Disk data of 10 years ago now fits in RAM (iso-price).
  • Device bandwidth grows 10x/decade, so need parallelism
  • RAM:disk:tape price is 1:10:30, going to 1:10:10
  • Amdahl's speedup law: S/(S+P)
  • Amdahl's IO law: a bit of IO per instruction/second (a TBps / 10 Tops! 50,000 disks / 10 teraOP = 100 M Dollars) (worked out below)
  • Amdahl's memory law: a byte per instruction/second (going to 10) (1 TB RAM per Tops = 1 TeraDollars)
  • PetaOps anyone?
  • Gilder's law: aggregate bandwidth doubles every 8 months.
  • 5-minute rule: cache disk data that is reused within 5 minutes.
  • Web rule: cache everything!
  • http://research.Microsoft.com/~gray/papers/MS_TR_99_100_Rules_of_Thumb_in_Data_Engineering.doc
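
The parenthetical numbers in the Amdahl bullets work out as below (a sketch; the 25 MB/s per disk is an assumption consistent with the earlier disk slides, not stated in the bullet):

```python
# Amdahl's IO law: about one bit of IO per instruction per second.
ops = 10e12                            # a 10 teraOP system
io_bytes_per_sec = ops * 1 / 8         # 10 Tbps of IO = 1.25 TB/s
disk_bps = 25e6                        # assumed ~25 MB/s per disk (see the disk slides)
print(f"{io_bytes_per_sec / disk_bps:,.0f} disks")   # ~50,000 disks

# Amdahl's memory law: about one byte of RAM per instruction per second.
print(f"{ops / 1e12:.0f} TB RAM")      # 10 TB for 10 teraOPS (1 TB per teraOP)
```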

27
Dealing With TeraBytes (Petabytes) Requires Parallelism
  • parallelism: use many little devices in parallel

28
Parallelism Must Be Automatic
  • There are thousands of MPI programmers.
  • There are hundreds-of-millions of people using
    parallel database search.
  • Parallel programming is HARD!
  • Find design patterns and automate them.
  • Data search/mining has parallel design patterns.

29
Scalability Up and Out
30
Everyone scales out. What's the Brick?
  • $1M/slice
  • IBM S390?
  • Sun E10000?
  • $100K/slice
  • HPUX/AIX/Solaris/IRIX/EMC
  • $10K/slice
  • Utel / Wintel 4x
  • $1K/slice
  • Beowulf / Wintel 1x

31
Terminology for scaleability
  • Farms of servers
  • Clones: identical
  • Scaleability + availability
  • Partitions
  • Scaleability
  • Packs
  • Partition availability via fail-over
  • GeoPlex for disaster tolerance.

32
(No Transcript)
33
Unpredictable Growth
  • The TerraServer Story
  • We expected 5 M hits per day
  • We got 50 M hits on day 1
  • We peak at 15-20 M hpd on a hot day
  • Average 5 M hpd after 1 year
  • Most of us cannot predict demand
  • Must be able to deal with NO demand
  • Must be able to deal with HUGE demand

34
An Architecture for Internet Services?
  • Need to be able to add capacity
  • New processing
  • New storage
  • New networking
  • Need continuous service
  • Online change of all components (hardware and
    software)
  • Multiple service sites
  • Multiple network providers
  • Need great development tools
  • Change the application several times per year.
  • Add new services several times per year.

35
Premise Each Site is a Farm
  • Buy computing by the slice (brick):
  • Rack of servers + disks.
  • Grow by adding slices
  • Spread data and computation to new slices
  • Two styles:
  • Clones: anonymous servers
  • Parts+Packs: Partitions fail over within a pack
  • In both cases, remote farm for disaster recovery

36
Clones: Availability + Scalability
  • Some applications are
  • Read-mostly
  • Low consistency requirements
  • Modest storage requirement (less than 1 TB)
  • Examples
  • HTML web servers (IP sprayer/sieve replication)
  • LDAP servers (replication via gossip)
  • Replicate app at all nodes (clones)
  • Spray requests across nodes (see the sketch below).
  • Grow by adding clones
  • Fault tolerance: stop sending to that clone.
  • Growth: add a clone.
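
A minimal sketch of the spray-requests-across-clones idea, with round-robin spraying, stop-sending-on-failure, and growth by adding a clone. The class and names are illustrative, not from the deck:

```python
import itertools

class CloneSprayer:
    """Round-robin request spraying across identical clones."""
    def __init__(self, clones):
        self.clones = list(clones)          # e.g. ["web1", "web2", "web3"]
        self._rr = itertools.cycle(self.clones)

    def _rebuild(self):
        self._rr = itertools.cycle(self.clones)

    def add_clone(self, clone):             # growth: add a clone
        self.clones.append(clone)
        self._rebuild()

    def remove_clone(self, clone):          # failure: stop sending to that clone
        self.clones.remove(clone)
        self._rebuild()

    def route(self, request):
        return next(self._rr), request      # (chosen clone, request)

sprayer = CloneSprayer(["web1", "web2", "web3"])
print(sprayer.route("GET /"))
sprayer.remove_clone("web2")                # web2 failed: spray around it
print(sprayer.route("GET /"), sprayer.route("GET /"))
```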

37
Two Clone Geometries
  • Shared-Nothing: exact replicas
  • Shared-Disk (state stored in server)

38
Facilities Clones Need
  • Automatic replication
  • Applications (and system software)
  • Data
  • Automatic request routing
  • Spray or sieve
  • Management
  • Who is up?
  • Update management & propagation
  • Application monitoring.
  • Clones are very easy to manage
  • Rule of thumb: 100s of clones per admin

39
Partitions for Scalability
  • Clones are not appropriate for some apps:
  • Stateful apps do not replicate well
  • High update rates do not replicate well
  • Examples
  • Email / chat / ...
  • Databases
  • Partition state among servers (a sketch follows below)
  • Scalability (online)
  • Partition split/merge
  • Partitioning must be transparent to client.
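
A minimal sketch of partitioning state by key so that the mapping stays transparent to clients. Hash partitioning is used here only for illustration; the deck does not prescribe a scheme, and real systems often use range or directory partitioning so that split/merge moves only the affected keys:

```python
import hashlib

class PartitionedStore:
    """Partition state (e.g. mailboxes) across servers by hashing the key."""
    def __init__(self, servers):
        self.servers = list(servers)                 # e.g. ["mail1", "mail2"]
        self.data = {s: {} for s in self.servers}    # stand-in for each server's store

    def _server_for(self, key):
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        return self.servers[h % len(self.servers)]

    def put(self, key, value):                       # client never names a server
        self.data[self._server_for(key)][key] = value

    def get(self, key):
        return self.data[self._server_for(key)].get(key)

store = PartitionedStore(["mail1", "mail2", "mail3"])
store.put("alice", "inbox-a")
store.put("bob", "inbox-b")
print(store._server_for("alice"), store.get("alice"))
```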


40
Partitioned/Clustered Apps
  • Mail servers
  • Perfectly partitionable
  • Business Object Servers
  • Partition by set of objects.
  • Parallel Databases
  • Transparent access to partitioned tables
  • Parallel Query

41
Packs for Availability
  • Each partition may fail (independently of others)
  • Partitions migrate to a new node via fail-over
  • Fail-over in seconds
  • Pack: the nodes supporting a partition
  • VMS Cluster
  • Tandem Process Pair
  • SP2 HACMP
  • Sysplex
  • WinNT MSCS (wolfpack)
  • Cluster-in-a-box is now commodity
  • Partitions typically grow in packs (see the sketch below).
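
A minimal sketch of the pack idea: a partition has a primary within a small pack of nodes and fails over to a surviving pack member. The structure is illustrative, not a specific product's API:

```python
class Pack:
    """A small set of nodes that jointly host one partition."""
    def __init__(self, partition, nodes):
        self.partition = partition
        self.nodes = list(nodes)       # e.g. ["nodeA", "nodeB"]
        self.primary = self.nodes[0]   # current owner of the partition

    def fail(self, node):
        """Node failure: migrate the partition to a surviving pack member."""
        self.nodes.remove(node)
        if node == self.primary:
            if not self.nodes:
                raise RuntimeError(f"partition {self.partition} has no surviving node")
            self.primary = self.nodes[0]   # fail-over within the pack
        return self.primary

pack = Pack("mailboxes-A-F", ["nodeA", "nodeB"])
print(pack.primary)          # nodeA serves the partition
print(pack.fail("nodeA"))    # nodeA dies: nodeB takes over in seconds
```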

42
What Parts+Packs Need
  • Automatic partitioning (in dbms, mail, files, ...)
  • Location transparent
  • Partition split/merge
  • Grow without limits (100x10TB)
  • Simple failover model
  • Partition migration is transparent
  • MSCS-like model for services
  • Application-centric request routing
  • Management
  • Who is up?
  • Automatic partition management (split/merge)
  • Application monitoring.

43
Partitions and Packs
  • Packs for availability

44
GeoPlex: Farm pairs
  • Two farms
  • Changes from one sent to the other
  • When one farm fails, the other provides service
  • Masks:
  • Hardware/software faults
  • Operations tasks (reorganize, upgrade, move)
  • Environmental faults (power fail)

45
Services on Clones & Partitions
  • Application provides a set of services
  • If cloned
  • Services are on subset of clones
  • If partitioned
  • Services run at each partition
  • System load balancing routes requests to
  • Any clone
  • The correct partition.
  • Routes around failures.

46
Cluster Scenarios: 3-tier systems
(Diagram: a simple web site: front end, SQL temp state, web file store, SQL database.)
47
Cluster Scale Out Scenarios
The FARM: Clones and Packs of Partitions
(Diagram: web clients, load balance, cloned front ends (firewall, sprayer, web server), SQL temp state, web file store A.)
48
Terminology
  • Terminology for scaleability:
  • Farms of servers
  • Clones: identical
  • Scaleability + availability
  • Partitions
  • Scaleability
  • Packs
  • Partition availability via fail-over
  • GeoPlex for disaster tolerance.

49
What we have been doing with SDSS
  • Helping move the data to SQL
  • Database design
  • Data loading
  • Experimenting with queries on a 4 M object DB
  • 20 questions like "find gravitational lens candidates"
  • Queries use parallelism; most run in a few seconds (auto parallel)
  • Some run in hours (neighbors within 1 arcsec)
  • EASY to ask questions.
  • Helping with an outreach website: SkyServer
  • Personal goal: Try data-mining techniques to re-discover Astronomy

50
References (.doc or .pdf)
  • Technology forecast: http://research.microsoft.com/~gray/papers/MS_TR_99_100_Rules_of_Thumb_in_Data_Engineering.doc
  • Gbps experiments: http://research.microsoft.com/~gray/
  • Disk experiments ($10K/TB): http://research.microsoft.com/~gray/papers/Win2K_IO_MSTR_2000_55.doc
  • Scaleability Terminology: http://research.microsoft.com/~gray/papers/MS_TR_99_85_Scalability_Terminology.doc