Computer Technology Forecast - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Computer Technology Forecast


1
Computer Technology Forecast
  • Jim Gray
  • Microsoft Research
  • Gray@Microsoft.com
  • http://research.Microsoft.com/~Gray

2
Reality Check
  • Good news
  • In the limit, processing, storage, and network are free
  • Processing and network are infinitely fast
  • Bad news
  • Most of us live in the present.
  • People are getting more expensive. Management/programming cost exceeds hardware cost.
  • Speed of light is not improving.
  • WAN prices have not changed much in the last 8 years.

3
Interesting Topics
  • I'll talk about server-side hardware
  • What about client hardware?
  • Displays, cameras, speech, ...
  • What about Software?
  • Databases, data mining, PDB, OODB
  • Objects / class libraries
  • Visualization
  • Open Source movement

4
How Much Information Is There?
  • Soon everything can be recorded and indexed
  • Most data will never be seen by humans
  • Precious resource: human attention. Auto-summarization and auto-search are the key technology. www.lesk.com/mlesk/ksg97/ksg.html

(Graphic: a log scale from kilo to yotta bytes, marking a book, a photo, a movie, all LoC books as words, all books as multimedia, and everything recorded.)
5
Moore's Law
  • Performance/Price doubles every 18 months
  • 100x per decade (quick check below)
  • Progress in next 18 months = ALL previous progress
  • New storage = sum of all old storage (ever)
  • New processing = sum of all old processing
  • E. coli doubles every 20 minutes!

(Graphic captioned "15 years ago".)
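
A quick check of the doubling arithmetic on this slide (a minimal Python sketch; the 18-month doubling period is the slide's premise, the rest is arithmetic):

```python
# Moore's law arithmetic: performance/price doubles every 18 months.
DOUBLING_MONTHS = 18

# Growth over a decade: 2 ** (120 / 18) is roughly 100x.
decade_factor = 2 ** (120 / DOUBLING_MONTHS)
print(f"growth per decade: {decade_factor:.0f}x")   # ~102x

# If shipments double each period, this period's shipment roughly equals
# the sum of everything shipped before it: 2**n vs (2**n - 1).
n = 10
new_shipment = 2 ** n
all_previous = sum(2 ** i for i in range(n))
print(new_shipment, all_previous)                   # 1024 vs 1023
```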
6
Trends: ops/s/$ Had Three Growth Phases
  • 1890-1945
  • Mechanical
  • Relay
  • 7-year doubling
  • 1945-1985
  • Tube, transistor, ...
  • 2.3 year doubling
  • 1985-2000
  • Microprocessor
  • 1.0 year doubling

7
What's a Balanced System?
(Diagram: system bus and two PCI buses.)
8
Storage capacity beating Moore's law
  • $5k/TB today (raw disk)

9
Cheap Storage
  • Disks are getting cheap
  • $7k/TB disks (25 x 40 GB disks @ $230 each)

10
Cheap Storage or Balanced System
  • Low-cost storage (2 x $1.5k servers): $7K/TB = 2 x ($1K system + 8 x 60 GB disks + 100 Mb Ethernet)
  • Balanced server ($7k / 0.5 TB)
  • 2 x 800 MHz ($2k)
  • 256 MB ($400)
  • 8 x 60 GB drives ($3K)
  • Gbps Ethernet switch ($1.5k)
  • $14k/TB, $28K/RAIDed TB (arithmetic below)
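
The $/TB figures above follow from the listed component prices; below is a minimal arithmetic sketch (assuming "RAIDed" here means simple mirroring, which doubles the cost per usable TB):

```python
# Component prices from this slide; a ~0.5 TB balanced server.
components = {
    "2 x 800 MHz CPUs": 2_000,
    "256 MB RAM": 400,
    "8 x 60 GB drives": 3_000,
    "Gbps Ethernet switch": 1_500,
}
cost = sum(components.values())        # $6,900, i.e. roughly $7k
capacity_tb = 8 * 60 / 1000            # 0.48 TB
per_tb = cost / capacity_tb            # ~$14k per raw TB
print(f"${cost:,} for {capacity_tb} TB -> ${per_tb:,.0f}/TB, ${2 * per_tb:,.0f}/mirrored TB")
```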

11
The Absurd Disk
  • 2.5 hr scan time (poor sequential access)
  • 1 aps / 5 GB (VERY cold data)
  • It's a tape!

(Disk: 1 TB, 100 MB/s, 200 Kaps; arithmetic check below.)
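
The scan-time and access-density claims follow from the 1 TB / 100 MB/s / 200 Kaps numbers; a minimal check:

```python
# 1 TB disk, 100 MB/s sequential, ~200 accesses per second.
capacity_bytes = 1e12
bandwidth_bps = 100e6                  # bytes per second
accesses_per_sec = 200

scan_hours = capacity_bytes / bandwidth_bps / 3600
gb_per_aps = capacity_bytes / 1e9 / accesses_per_sec
print(f"full scan: {scan_hours:.1f} h")    # ~2.8 h (the slide quotes 2.5 h)
print(f"1 aps per {gb_per_aps:.0f} GB")    # very cold data
```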
12
Hot Swap Drives for Archive or Data Interchange
  • 25 MBps write (so can write N x 60 GB in 40 minutes)
  • 60 GB/overnite
  • N x 2 MB/second
  • @ $19.95/nite

13
240 GB, $2k (now); 300 GB by year end
  • 4 x 60 GB IDE (2 hot pluggable)
  • ($1,100)
  • SCSI-IDE bridge
  • 200k
  • Box
  • 500 MHz CPU
  • 256 MB SRAM
  • Fan, power, Enet
  • $700
  • Or 8 disks/box: 600 GB for $3K (or 300 GB RAID)

14
Hot Swap Drives for Archive or Data Interchange
  • 25 MBps write (so can write N x 74 GB in 3 hours)
  • 74 GB/overnite
  • N x 2 MB/second
  • @ $19.95/nite

15
It's Hard to Archive a Petabyte. It takes a LONG time to restore it.
  • At 1GBps it takes 12 days!
  • Store it in two (or more) places online (on
    disk?). A geo-plex
  • Scrub it continuously (look for errors)
  • On failure,
  • use other copy until failure repaired,
  • refresh lost copy from safe copy.
  • Can organize the two copies differently
    (e.g. one by time, one by space)
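
The 12-day figure above is just a petabyte divided by a gigabyte per second; a quick check:

```python
# Restoring 1 PB at 1 GB/s.
petabyte_bytes = 1e15
rate_bps = 1e9                         # 1 GB per second
days = petabyte_bytes / rate_bps / 86_400
print(f"{days:.1f} days")              # ~11.6 days, roughly the 12 quoted above
```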

16
Disk vs Tape
  • Disk
  • 60 GB
  • 30 MBps
  • 5 ms seek time
  • 3 ms rotate latency
  • $7/GB for drive, $3/GB for ctlrs/cabinet
  • 4 TB/rack
  • 1 hour scan
  • Tape
  • 40 GB
  • 10 MBps
  • 10 sec pick time
  • 30-120 second seek time
  • $2/GB for media, $8/GB for drive+library
  • 10 TB/rack
  • 1 week scan

Guestimates: CERN: 200 TB, 3480 tapes; 2 col = 50 GB; Rack = 1 TB, 20 drives
The price advantage of tape is narrowing, and the performance advantage of disk is growing. At $10K/TB, disk is competitive with nearline tape.
17
Trends: Gilder's Law: 3x bandwidth/year for 25 more years
  • Today
  • 10 Gbps per channel
  • 4 channels per fiber = 40 Gbps
  • 32 fibers/bundle = 1.2 Tbps/bundle (check below)
  • In lab: 3 Tbps/fiber (400 x WDM)
  • In theory: 25 Tbps per fiber
  • 1 Tbps = USA 1996 WAN bisection bandwidth
  • Aggregate bandwidth doubles every 8 months!
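
Relating the figures on this slide (a minimal sketch; the per-channel and per-bundle numbers come from the bullets above):

```python
# Aggregate bandwidth doubling every 8 months, expressed per year:
per_year = 2 ** (12 / 8)
print(f"{per_year:.2f}x per year")             # ~2.83x, close to the 3x headline

# One bundle today: 10 Gbps/channel * 4 channels/fiber * 32 fibers/bundle
bundle_tbps = 10e9 * 4 * 32 / 1e12
print(f"{bundle_tbps:.2f} Tbps per bundle")    # ~1.3 Tbps (slide rounds to 1.2)
```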

18
Sense of scale
  • How fat is your pipe?
  • Fattest pipe on MS campus is the WAN!

(Graphic: 300 MBps = OC48 = G2 or memcpy(); 94 MBps coast to coast; 90 MBps PCI; 20 MBps disk / ATM / OC3)
19
(Map: Arlington, VA to Redmond/Seattle, WA via New York and San Francisco, CA: 5626 km, 10 hops. Participants: Information Sciences Institute, Microsoft, Qwest, University of Washington, Pacific Northwest Gigapop, HSCC (high speed connectivity consortium), DARPA.)
20
The Path
  • DC -> SEA
  • C:\> tracert -d 131.107.151.194
  • Tracing route to 131.107.151.194 over a maximum of 30 hops
  • 0                                               ------- DELL 4400 Win2K WKS, Arlington Virginia, ISI Alteon GbE
  • 1    16 ms   <10 ms   <10 ms  140.173.170.65    ------- Juniper M40 GbE, Arlington Virginia, ISI Interface ISIe
  • 2   <10 ms   <10 ms   <10 ms  205.171.40.61     ------- Cisco GSR OC48, Arlington Virginia, Qwest DC Edge
  • 3   <10 ms   <10 ms   <10 ms  205.171.24.85     ------- Cisco GSR OC48, Arlington Virginia, Qwest DC Core
  • 4   <10 ms   <10 ms    16 ms  205.171.5.233     ------- Cisco GSR OC48, New York, New York, Qwest NYC Core
  • 5    62 ms    63 ms    62 ms  205.171.5.115     ------- Cisco GSR OC48, San Francisco, CA, Qwest SF Core
  • 6    78 ms    78 ms    78 ms  205.171.5.108     ------- Cisco GSR OC48, Seattle, Washington, Qwest Sea Core
  • 7    78 ms    78 ms    94 ms  205.171.26.42     ------- Juniper M40 OC48, Seattle, Washington, Qwest Sea Edge
  • 8    78 ms    79 ms    78 ms  208.46.239.90     ------- Juniper M40 OC48

21
PetaBumps
  • 751 Mbps for 300 seconds (28 GB)
  • single-thread, single-stream tcp/ip, desktop-to-desktop, out-of-the-box performance
  • 5626 km x 751 Mbps = 4.2e15 bit-meters/second = 4.2 Peta bmps (checked below)
  • Multi-stream is 952 Mbps = 5.2 Peta bmps
  • 4470 byte MTUs were enabled on all routers.
  • 20 MB window size
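
The Peta-bmps figure is throughput times distance; a quick check of the single-stream number (the multi-stream figure is computed the same way):

```python
# Internet2 land-speed metric: throughput times distance (bit-meters per second).
distance_m = 5626e3                    # the 5626 km route above
single_stream_bps = 751e6
print(f"{distance_m * single_stream_bps:.2e} bit-meters/s")  # ~4.2e15 = 4.2 Peta bmps
```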

22
(No Transcript)
23
The Promise of SAN/VIA: 10x in 2 years
http://www.ViArch.org/
  • Yesterday
  • 10 MBps (100 Mbps Ethernet)
  • 20 MBps tcp/ip saturates 2 cpus
  • round-trip latency 250 µs
  • Now
  • Wires are 10x faster: Myrinet, Gbps Ethernet, ServerNet, ...
  • Fast user-level communication
  • tcp/ip: 100 MBps, 10% cpu
  • round-trip latency is 15 µs
  • 1.6 Gbps demoed on a WAN

24
Pointers
  • The single-stream submission: http://research.microsoft.com/~gray/papers/Windows2000_I2_land_Speed_Contest_Entry_(Single_Stream_mail).htm
  • The multi-stream submission: http://research.Microsoft.com/~gray/papers/Windows2000_I2_land_Speed_Contest_Entry_(Multi_Stream_mail).htm
  • The code: http://research.Microsoft.com/~gray/papers/speedy.htm, speedy.h, speedy.c, and a PowerPoint presentation about it: http://research.Microsoft.com/~gray/papers/Windows2000_WAN_Speed_Record.ppt

25
Networking
  • WANs are getting faster than LANs: G8 = OC192 = 8 Gbps is standard
  • Link bandwidth improves 4x per 3 years
  • Speed of light (60 ms round trip in US)
  • Software stacks have always been the problem.

Time = SenderCPU + ReceiverCPU + bytes / bandwidth
This has been the problem
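
A minimal sketch of the cost model above. The per-message and per-byte CPU constants are illustrative assumptions, not measured values; they only show why the software stack, rather than the wire, tends to dominate:

```python
# Time = SenderCPU + ReceiverCPU + bytes / bandwidth
def transfer_time(nbytes, bandwidth_bps=100e6,
                  per_msg_cpu_s=100e-6,      # assumed fixed cost per send/receive
                  per_byte_cpu_s=10e-9):     # assumed copy/checksum cost per byte
    cpu = 2 * per_msg_cpu_s + 2 * per_byte_cpu_s * nbytes   # sender + receiver
    wire = nbytes / bandwidth_bps
    return cpu, wire

cpu_s, wire_s = transfer_time(64 * 1024)     # 64 KB over a 100 MB/s link
print(f"cpu {cpu_s * 1e6:.0f} us vs wire {wire_s * 1e6:.0f} us")   # cpu dominates
```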
26
Rules of Thumb in Data Engineering
  • Moore's law -> an address bit per 18 months.
  • Storage grows 100x/decade (except 1000x last decade!)
  • Disk data of 10 years ago now fits in RAM (iso-price).
  • Device bandwidth grows 10x/decade, so need parallelism
  • RAM:disk:tape price is 1:10:30, going to 1:10:10
  • Amdahl's speedup law: S/(S+P)
  • Amdahl's IO law: a bit of IO per instruction/second (a TBps / 10 Tops! 50,000 disks / 10 teraOP = 100 M Dollars) (worked out below)
  • Amdahl's memory law: a byte per instruction/second (going to 10) (1 TB RAM per Tops = 1 TeraDollars)
  • PetaOps anyone?
  • Gilder's law: aggregate bandwidth doubles every 8 months.
  • 5-minute rule: cache disk data that is reused within 5 minutes.
  • Web rule: cache everything!
  • http://research.Microsoft.com/~gray/papers/MS_TR_99_100_Rules_of_Thumb_in_Data_Engineering.doc
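
The parenthetical numbers in the Amdahl bullets work out as below (a sketch; the 25 MB/s per disk is an assumption consistent with the earlier disk slides, not stated in the bullet):

```python
# Amdahl's IO law: about one bit of IO per instruction per second.
ops = 10e12                            # a 10 teraOP system
io_bytes_per_sec = ops * 1 / 8         # 10 Tbps of IO = 1.25 TB/s
disk_bps = 25e6                        # assumed ~25 MB/s per disk (see the disk slides)
print(f"{io_bytes_per_sec / disk_bps:,.0f} disks")   # ~50,000 disks

# Amdahl's memory law: about one byte of RAM per instruction per second.
print(f"{ops / 1e12:.0f} TB RAM")      # 10 TB for 10 teraOPS (1 TB per teraOP)
```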

27
Dealing With TeraBytes (Petabytes) Requires Parallelism
  • parallelism: use many little devices in parallel

28
Parallelism Must Be Automatic
  • There are thousands of MPI programmers.
  • There are hundreds-of-millions of people using
    parallel database search.
  • Parallel programming is HARD!
  • Find design patterns and automate them.
  • Data search/mining has parallel design patterns.

29
Scalability Up and Out
30
Everyone scales out. What's the Brick?
  • $1M/slice
  • IBM S390?
  • Sun E10000?
  • $100K/slice
  • HPUX/AIX/Solaris/IRIX/EMC
  • $10K/slice
  • Utel / Wintel 4x
  • $1K/slice
  • Beowulf / Wintel 1x

31
Terminology for scaleability
  • Farms of servers
  • Clones: identical
  • Scaleability + availability
  • Partitions
  • Scaleability
  • Packs
  • Partition availability via fail-over
  • GeoPlex for disaster tolerance.

32
(No Transcript)
33
Unpredictable Growth
  • The TerraServer Story
  • We expected 5 M hits per day
  • We got 50 M hits on day 1
  • We peak at 15-20 M hpd on a hot day
  • Average 5 M hpd after 1 year
  • Most of us cannot predict demand
  • Must be able to deal with NO demand
  • Must be able to deal with HUGE demand

34
An Architecture for Internet Services?
  • Need to be able to add capacity
  • New processing
  • New storage
  • New networking
  • Need continuous service
  • Online change of all components (hardware and
    software)
  • Multiple service sites
  • Multiple network providers
  • Need great development tools
  • Change the application several times per year.
  • Add new services several times per year.

35
Premise Each Site is a Farm
  • Buy computing by the slice (brick):
  • Rack of servers + disks.
  • Grow by adding slices
  • Spread data and computation to new slices
  • Two styles:
  • Clones: anonymous servers
  • Parts+Packs: Partitions fail over within a pack
  • In both cases, remote farm for disaster recovery

36
Clones: Availability + Scalability
  • Some applications are
  • Read-mostly
  • Low consistency requirements
  • Modest storage requirement (less than 1 TB)
  • Examples
  • HTML web servers (IP sprayer/sieve replication)
  • LDAP servers (replication via gossip)
  • Replicate app at all nodes (clones)
  • Spray requests across nodes (see the sketch below).
  • Grow by adding clones
  • Fault tolerance: stop sending to that clone.
  • Growth: add a clone.
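
A minimal sketch of the spray-requests-across-clones idea, with round-robin spraying, stop-sending-on-failure, and growth by adding a clone. The class and names are illustrative, not from the deck:

```python
import itertools

class CloneSprayer:
    """Round-robin request spraying across identical clones."""
    def __init__(self, clones):
        self.clones = list(clones)          # e.g. ["web1", "web2", "web3"]
        self._rr = itertools.cycle(self.clones)

    def _rebuild(self):
        self._rr = itertools.cycle(self.clones)

    def add_clone(self, clone):             # growth: add a clone
        self.clones.append(clone)
        self._rebuild()

    def remove_clone(self, clone):          # failure: stop sending to that clone
        self.clones.remove(clone)
        self._rebuild()

    def route(self, request):
        return next(self._rr), request      # (chosen clone, request)

sprayer = CloneSprayer(["web1", "web2", "web3"])
print(sprayer.route("GET /"))
sprayer.remove_clone("web2")                # web2 failed: spray around it
print(sprayer.route("GET /"), sprayer.route("GET /"))
```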

37
Two Clone Geometries
  • Shared-Nothing: exact replicas
  • Shared-Disk (state stored in server)

38
Facilities Clones Need
  • Automatic replication
  • Applications (and system software)
  • Data
  • Automatic request routing
  • Spray or sieve
  • Management
  • Who is up?
  • Update management & propagation
  • Application monitoring.
  • Clones are very easy to manage
  • Rule of thumb: 100s of clones per admin

39
Partitions for Scalability
  • Clones are not appropriate for some apps:
  • Stateful apps do not replicate well
  • High update rates do not replicate well
  • Examples
  • Email / chat / ...
  • Databases
  • Partition state among servers (a sketch follows below)
  • Scalability (online)
  • Partition split/merge
  • Partitioning must be transparent to client.
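
A minimal sketch of partitioning state by key so that the mapping stays transparent to clients. Hash partitioning is used here only for illustration; the deck does not prescribe a scheme, and real systems often use range or directory partitioning so that split/merge moves only the affected keys:

```python
import hashlib

class PartitionedStore:
    """Partition state (e.g. mailboxes) across servers by hashing the key."""
    def __init__(self, servers):
        self.servers = list(servers)                 # e.g. ["mail1", "mail2"]
        self.data = {s: {} for s in self.servers}    # stand-in for each server's store

    def _server_for(self, key):
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        return self.servers[h % len(self.servers)]

    def put(self, key, value):                       # client never names a server
        self.data[self._server_for(key)][key] = value

    def get(self, key):
        return self.data[self._server_for(key)].get(key)

store = PartitionedStore(["mail1", "mail2", "mail3"])
store.put("alice", "inbox-a")
store.put("bob", "inbox-b")
print(store._server_for("alice"), store.get("alice"))
```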


40
Partitioned/Clustered Apps
  • Mail servers
  • Perfectly partitionable
  • Business Object Servers
  • Partition by set of objects.
  • Parallel Databases
  • Transparent access to partitioned tables
  • Parallel Query

41
Packs for Availability
  • Each partition may fail (independently of others)
  • Partitions migrate to a new node via fail-over
  • Fail-over in seconds
  • Pack: the nodes supporting a partition
  • VMS Cluster
  • Tandem Process Pair
  • SP2 HACMP
  • Sysplex
  • WinNT MSCS (wolfpack)
  • Cluster-in-a-box is now commodity
  • Partitions typically grow in packs (see the sketch below).
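
A minimal sketch of the pack idea: a partition has a primary within a small pack of nodes and fails over to a surviving pack member. The structure is illustrative, not a specific product's API:

```python
class Pack:
    """A small set of nodes that jointly host one partition."""
    def __init__(self, partition, nodes):
        self.partition = partition
        self.nodes = list(nodes)       # e.g. ["nodeA", "nodeB"]
        self.primary = self.nodes[0]   # current owner of the partition

    def fail(self, node):
        """Node failure: migrate the partition to a surviving pack member."""
        self.nodes.remove(node)
        if node == self.primary:
            if not self.nodes:
                raise RuntimeError(f"partition {self.partition} has no surviving node")
            self.primary = self.nodes[0]   # fail-over within the pack
        return self.primary

pack = Pack("mailboxes-A-F", ["nodeA", "nodeB"])
print(pack.primary)          # nodeA serves the partition
print(pack.fail("nodeA"))    # nodeA dies: nodeB takes over in seconds
```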

42
What Parts+Packs Need
  • Automatic partitioning (in dbms, mail, files, ...)
  • Location transparent
  • Partition split/merge
  • Grow without limits (100x10TB)
  • Simple failover model
  • Partition migration is transparent
  • MSCS-like model for services
  • Application-centric request routing
  • Management
  • Who is up?
  • Automatic partition management (split/merge)
  • Application monitoring.

43
Partitions and Packs
  • Packs for availability

44
GeoPlex: Farm pairs
  • Two farms
  • Changes from one sent to the other
  • When one farm fails, the other provides service
  • Masks:
  • Hardware/software faults
  • Operations tasks (reorganize, upgrade, move)
  • Environmental faults (power fail)

45
Services on Clones & Partitions
  • Application provides a set of services
  • If cloned
  • Services are on subset of clones
  • If partitioned
  • Services run at each partition
  • System load balancing routes requests to
  • Any clone
  • The correct partition.
  • Routes around failures.

46
Cluster Scenarios: 3-tier systems
(Diagram: a simple web site: front end, SQL temp state, web file store, SQL database.)
47
Cluster Scale Out Scenarios
The FARM: Clones and Packs of Partitions
(Diagram: web clients, load balance, cloned front ends (firewall, sprayer, web server), SQL temp state, web file store A.)
48
Terminology
  • Terminology for scaleability:
  • Farms of servers
  • Clones: identical
  • Scaleability + availability
  • Partitions
  • Scaleability
  • Packs
  • Partition availability via fail-over
  • GeoPlex for disaster tolerance.

49
What we have been doing with SDSS
  • Helping move the data to SQL
  • Database design
  • Data loading
  • Experimenting with queries on a 4 M object DB
  • 20 questions like "find gravitational lens candidates"
  • Queries use parallelism; most run in a few seconds (auto parallel)
  • Some run in hours (neighbors within 1 arcsec)
  • EASY to ask questions.
  • Helping with an outreach website: SkyServer
  • Personal goal: Try data-mining techniques to re-discover Astronomy

50
References (.doc or .pdf)
  • Technology forecast: http://research.microsoft.com/~gray/papers/MS_TR_99_100_Rules_of_Thumb_in_Data_Engineering.doc
  • Gbps experiments: http://research.microsoft.com/~gray/
  • Disk experiments ($10K/TB): http://research.microsoft.com/~gray/papers/Win2K_IO_MSTR_2000_55.doc
  • Scaleability Terminology: http://research.microsoft.com/~gray/papers/MS_TR_99_85_Scalability_Terminology.doc