Title: Data Centric Computing
1. Data Centric Computing
Yotta Zetta Exa Peta Tera Giga Mega Kilo
- Jim Gray
- Microsoft Research
- Research.Microsoft.com/Gray/talks
- FAST 2002
- Monterey, CA, 14 Oct 1999
2. Put Everything in Future (Disk) Controllers (it's not if, it's when?)
- Jim Gray, Microsoft Research
- http://Research.Microsoft.com/Gray/talks
- FAST 2002, Monterey, CA, 14 Oct 1999
Acknowledgements
- Dave Patterson explained this to me long ago
- Leonard Chung, Kim Keeton, Erik Riedel, Catharine Van Ingen helped me sharpen these arguments
3. First Disk, 1956
- IBM 305 RAMAC
- 4 MB
- 50 x 24" disks
- 1200 rpm
- 100 ms access
- $35k/y rent
- Included computer & accounting software (tubes, not transistors)
4. 10 years later
1.6 meters
5. Disk Evolution
Kilo Mega Giga Tera Peta Exa Zetta Yotta
- Capacity: 100x in 10 years; 1 TB 3.5" drive in 2005; 20 GB as 1" micro-drive
- System on a chip
- High-speed SAN
- Disk replacing tape
- Disk is super computer!
6. Disks are becoming computers
- Smart drives
- Camera with micro-drive
- Replay / Tivo / Ultimate TV
- Phone with micro-drive
- MP3 players
- Tablet
- Xbox
- Many more
[Figure: smart-drive stack - Applications (Web, DBMS, Files) over OS over Disk Ctlr (1 GHz cpu, 1 GB RAM), with Comm via Infiniband, Ethernet, radio]
7. Data Gravity: Processing Moves to Transducers
smart displays, microphones, printers, NICs, disks
- Processing decentralized
- Moving to data sources
- Moving to power sources
- Moving to sheet metal
- ? The end of computers ?
8. It's Already True of Printers: Peripheral = CyberBrick
- You buy a printer
- You get:
- several network interfaces
- a Postscript engine
- cpu,
- memory,
- software,
- a spooler (soon)
- and a print engine.
9. The (absurd?) consequences of Moore's Law
- 256-way NUMA?
- Huge main memories: now 500 MB - 64 GB, then 10 GB - 1 TB
- Huge disks: now 20-200 GB 3.5" disks, then 0.1 - 1 TB disks
- Petabyte storage farms
- (that you can't back up or restore).
- Disks >> tapes
- Small disks: one platter, one inch, 10 GB
- SAN convergence: 1 GBps point-to-point is easy
- 1 GB RAM chips
- MAD at 200 Gbpsi
- Drives shrink one quantum
- 10 GBps SANs are ubiquitous
- 1 bips cpus for $10
- 10 bips cpus at high end
10. The Absurd Design?
- Further segregate processing from storage
- Poor locality
- Much useless data movement
- Amdahl's laws: bus: 10 B/ips; io: 1 b/ips
[Figure: 100 TB of disks with 10 TBps of aggregate bandwidth feeding 1 Tips of processors across a 100 GBps link]
11. What's a Balanced System? (40 disk arms / cpu)
12. Amdahl's Balance Laws Revised
- Laws right, just need interpretation (imagination?)
- Balanced System Law: a system needs 8 MIPS/MBpsIO, but the instruction rate must be measured on the workload.
- Sequential workloads have low CPI (clocks per instruction); random workloads tend to have higher CPI.
- Alpha (the MB/MIPS ratio) is rising from 1 to 6. This trend will likely continue.
- One random IO per 50k instructions.
- Sequential IOs are larger: one sequential IO per 200k instructions.
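A minimal sketch of these balance rules as arithmetic (the 1,000-MIPS example processor is an assumption for illustration):

    # Amdahl's revised balance rules from this slide:
    #   8 MIPS per MBps of IO; 1 random IO per 50k instructions;
    #   1 (larger) sequential IO per 200k instructions.
    def balanced_io(mips):
        ips = mips * 1e6                 # instructions per second
        return ips / 50e3, ips / 200e3, mips / 8

    rand_ios, seq_ios, io_mbps = balanced_io(1000)   # a 1 bips processor
    print(f"{rand_ios:,.0f} random IO/s or {seq_ios:,.0f} sequential IO/s, "
          f"balanced by {io_mbps:.0f} MBps of IO")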
13. Observations re TPC-C, TPC-H systems
- More than ½ the hardware cost is in disks
- Most of the mips are in the disk controllers
- 20 mips/arm is enough for TPC-C
- 50 mips/arm is enough for TPC-H
- Need 128 MB to 256 MB/arm
- Ref:
- Gray & Shenoy, Rules of Thumb
- Keeton, Riedel, Uysal PhD theses.
- ? The end of computers ?
14. TPC systems
- Normalize for CPI (clocks per instruction)
- TPC-C has about 7 ins/byte of IO
- TPC-H has 3 ins/byte of IO
- TPC-H needs ½ as many disks: sequential vs random
- Both use 9 GB 10 krpm disks (need arms, not bytes)
15. TPC systems: What's alpha (MB/MIPS)?
- Hard to say:
- Intel: 32-bit addressing (4 GB limit). Known CPI.
- IBM, HP, Sun have 64 GB limit. Unknown CPI.
- Look at both, guess CPI for IBM, HP, Sun
- Alpha is between 1 and 6

                 Mips                Memory   Alpha
    Amdahl       1                   1        1
    tpcC Intel   8 x 262 = 2 Gips    4 GB     2
    tpcH Intel   8 x 458 = 4 Gips    4 GB     1
    tpcC IBM     24 cpus ? 12 Gips   64 GB    6
    tpcH HP      32 cpus ? 16 Gips   32 GB    2
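The Alpha column can be re-derived from the table's own Gips and memory figures:

    # alpha = MB/MIPS = GB/Gips, using the (Gips, GB) pairs above
    systems = {"tpcC Intel": (2, 4), "tpcH Intel": (4, 4),
               "tpcC IBM": (12, 64), "tpcH HP": (16, 32)}
    for name, (gips, gb) in systems.items():
        print(f"{name}: alpha = {gb / gips:.1f}")   # 2.0, 1.0, 5.3, 2.0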
16. When each disk has 1 bips, no need for cpu
17. Implications
Conventional:
- Offload device handling to NIC/HBA
- Higher-level protocols: I2O, NASD, VIA, IP, TCP
- SMP and cluster parallelism is important.
Radical:
- Move app to NIC/device controller
- Higher-higher level protocols: CORBA / COM.
- Cluster parallelism is VERY important.
18. Interim Step: Shared Logic
- Brick with 8-12 disk drives
- 200 mips/arm (or more)
- 2 x Gbps Ethernet
- General-purpose OS (except NetApp)
- $10k/TB to $50k/TB
- Shared:
- sheet metal
- power
- support/config
- security
- network ports
Examples: Snap 1 TB (12 x 80 GB) NAS; NetApp 0.5 TB (8 x 70 GB) NAS; Maxtor 2 TB (12 x 160 GB) NAS
19. Next step in the Evolution
- Disks become supercomputers
- Controller will have 1 bips, 1 GB RAM, 1 GBps net
- and a disk arm.
- Disks will run full-blown app/web/db/os stack
- Distributed computing
- Processors migrate to transducers.
20. Gordon Bell's Seven Price Tiers
- $10: wrist-watch computers
- $100: pocket/palm computers
- $1,000: portable computers
- $10,000: personal computers (desktop)
- $100,000: departmental computers (closet)
- $1,000,000: site computers (glass house)
- $10,000,000: regional computers (glass castle)
Super-server: costs more than $100,000
Mainframe: costs more than $1M; must be an array of processors, disks, tapes, comm ports
21. Bell's Evolution of Computer Classes
Technology enables two evolutionary paths:
1. constant performance, decreasing cost
2. constant price, increasing performance
[Chart annotations: 1.26x/yr = 2x per 3 yrs = 10x per decade (price decline 1/1.26 = 0.8x/yr); 1.6x/yr = 4x per 3 yrs = 100x per decade (1/1.6 = 0.62x/yr)]
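The chart's growth arithmetic checks out:

    # 1.26x/yr compounds to ~2x in 3 years and ~10x in a decade;
    # 1.6x/yr compounds to ~4x in 3 years and ~100x in a decade.
    for rate in (1.26, 1.6):
        print(f"{rate}x/yr -> {rate**3:.1f}x per 3 yrs, {rate**10:.0f}x per decade")
    # 1.26x/yr -> 2.0x per 3 yrs, 10x per decade
    # 1.6x/yr  -> 4.1x per 3 yrs, 110x per decade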
22. NAS vs SAN
High-level interfaces are better
- Network Attached Storage
- File servers
- Database servers
- Application servers
- (it's a slippery slope, as Novell showed)
- Storage Area Network
- A lower life form
- Block server: get block / put block
- Wrong abstraction level (too low level)
- Security is VERY hard to understand
- (who can read that disk block?)
SCSI and iSCSI are popular.
23. How Do They Talk to Each Other?
- Each node has an OS
- Each node has local resources: a federation.
- Each node does not completely trust the others.
- Nodes use RPC to talk to each other
- WebServices/SOAP? CORBA? COM? RMI?
- One or all of the above.
- Huge leverage in high-level interfaces.
- Same old distributed-system story.
[Figure: two application stacks talking across a SAN; each side layers datagrams, streams, RPC, and "?" over SIO above the SAN]
24. Basic Argument for x-Disks
- Future disk controller is a super-computer:
- 1 bips processor
- 256 MB dram
- 1 TB disk plus one arm
- Connects to SAN via high-level protocols
- RPC, HTTP, SOAP, COM, Kerberos, Directory Services, ...
- Commands are RPCs
- management, security, ...
- Services file/web/db requests
- Managed by general-purpose OS with good dev environment
- Move apps to disk to save data movement
- need programming environment in controller
25. The Slippery Slope
- If you add function to the server
- then you add more function to the server.
- Function gravitates to data.
Nothing (sector server) -> Something (fixed-app server) -> Everything (app server)
26. Why Not a Sector Server? (let's get physical!)
- Good idea, that's what we have today.
- But:
- cache added for performance
- sector remap added for fault tolerance
- error reporting and diagnostics added
- SCSI commands (reserve, ...) are growing
- sharing problematic (space mgmt, security, ...)
- Slipping down the slope to a 1-D block server
27. Why Not a 1-D Block Server? Put A LITTLE on the Disk Server
- Tried and true design
- HSC - VAX cluster
- EMC
- IBM Sysplex (3980?)
- But look inside:
- has a cache
- has space management
- has error reporting & management
- has RAID 0, 1, 2, 3, 4, 5, 10, 50, ...
- has locking
- has remote replication
- has an OS
- Security is problematic
- Low-level interface moves too many bytes
28. Why Not a 2-D Block Server? Put A LITTLE on the Disk Server
- Tried and true design
- Cedar -> NFS
- file server, cache, space, ...
- Open file is many fewer msgs
- Grows to have:
- directories & naming
- authentication & access control
- RAID 0, 1, 2, 3, 4, 5, 10, 50, ...
- locking
- backup/restore/admin
- cooperative caching with client
29. Why Not a File Server? Put a Little on the 2-D Block Server
- Tried and true design
- NetWare, Windows, Linux, NetApp, Cobalt, SNAP, ... WebDAV
- Yes, but look at NetWare:
- file interface grew
- became an app server
- mail, DB, web, ...
- NetWare had a primitive OS
- hard to program, so optimized the wrong thing
30. Why Not Everything? Allow Everything on the Disk Server (thin clients)
- Tried and true design
- Mainframes, minis, ...
- Web servers, ...
- Encapsulates data
- Minimizes data moves
- Scaleable
- It is where everyone ends up.
- All the arguments against are short-term.
31. The Slippery Slope
- If you add function to the server
- then you add more function to the server.
- Function gravitates to data.
Nothing (sector server) -> Something (fixed-app server) -> Everything (app server)
32. Disk Node
- has magnetic storage (1 TB?)
- has processor & DRAM
- has SAN attachment
- has execution environment
[Figure: disk-node software stack - Applications / Services / DBMS / File System / RPC, ... / SAN driver / Disk driver / OS Kernel]
33. Hardware
- Homogeneous machines lead to quick response through reallocation
- HP desktop machines, 320 MB RAM, 3U high, 4 x 100 GB IDE drives
- $4k/TB (street), 2.5 processors/TB, 1 GB RAM/TB
- 3 weeks from ordering to operational
Slide courtesy of Brewster Kahle, @ Archive.org
34. Disk as Tape
- Tape is unreliable, specialized, slow, low density, not improving fast, and expensive
- Using removable hard drives to replace tape's function has been successful
- When a tape is needed, the drive is put in a machine and it is online. No need to copy from tape before it is used.
- Portable, durable, fast, media cost ~ raw tapes, dense. Unknown longevity, suspected good.
Slide courtesy of Brewster Kahle, @ Archive.org
35. Disk As Tape: What format?
- Today I send NTFS/SQL disks.
- But that is not a good format for Linux.
- Solution: ship NFS/CIFS/ODBC servers (not disks)
- Plug disk into LAN.
- DHCP, then file or DB server via standard interface.
- Web Service in the long term
36. Some Questions
- Will the disk folks deliver?
- What is the product?
- How do I manage 1,000 nodes (disks)?
- How do I program 1,000 nodes (disks)?
- How does RAID work?
- How do I backup a PB?
- How do I restore a PB?
37. Will the disk folks deliver? Maybe! Hard Drive Unit Shipments
Source: DiskTrend/IDC
Not a pretty picture (lately)
38. Most Disks are Personal
- 85% of disks are desktop/mobile (not SCSI)
- Personal media is AT LEAST 50% of the problem.
- How to manage your shoebox of:
- Documents
- Voicemail
- Photos
- Music
- Videos
39. What is the Product? (see next section on media management)
- Concept: Plug it in and it works!
- Music/Video/Photo appliance (home)
- Game appliance
- PC
- File server appliance
- Data archive/interchange appliance
- Web appliance
- Email appliance
- Application appliance
- Router appliance
[Figure: appliance with only network and power connections]
40. Auto-Manage Storage
- 1980 rule of thumb:
- a DataAdmin per 10 GB, a SysAdmin per mips
- 2000 rule of thumb:
- a DataAdmin per 5 TB
- a SysAdmin per 100 clones (varies with app).
- Problem:
- 5 TB is $50k today, $5k in a few years.
- Admin cost >> storage cost !!!!
- Challenge:
- automate ALL storage admin tasks
41. How do I manage 1,000 nodes?
- You can't manage 1,000 x (for any x).
- They manage themselves.
- You manage exceptional exceptions.
- Auto-manage:
- Plug & Play hardware
- auto load-balance placement of storage & processing
- simple parallel programming model
- fault masking
- Some positive signs:
- few admins at Google (10k nodes, 2 PB), Yahoo! (? nodes, 0.3 PB), Hotmail (10k nodes, 0.3 PB)
42. How do I program 1,000 nodes?
- You can't program 1,000 x (for any x).
- They program themselves.
- You write embarrassingly parallel programs (see the sketch below)
- Examples: SQL, Web, Google, Inktomi, HotMail, ...
- PVM and MPI prove it must be automatic (unless you have a PhD)!
- Auto-parallelism is ESSENTIAL
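For flavor, a minimal embarrassingly parallel sketch (the shard files and word-count task are hypothetical, standing in for a per-node job):

    # Embarrassingly parallel: one independent task per input, with no
    # communication between tasks; the pool handles the parallelism.
    from multiprocessing import Pool

    def count_words(path):                    # hypothetical per-node task
        with open(path) as f:
            return path, sum(len(line.split()) for line in f)

    if __name__ == "__main__":
        shards = [f"shard{i}.txt" for i in range(1000)]   # one input per node
        with Pool() as pool:
            for path, words in pool.imap_unordered(count_words, shards):
                print(path, words)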
43. Plug & Play Software
- RPC is standardizing (SOAP/HTTP, COM, RMI/IIOP)
- Gives huge TOOL LEVERAGE
- Solves the hard problems:
- naming,
- security,
- directory service,
- operations, ...
- Commoditized programming environments
- FreeBSD, Linux, Solaris tools
- NetWare tools
- WinCE, WinNT tools
- JavaOS tools
- Apps gravitate to data.
- General-purpose OS on a dedicated ctlr can run apps.
44. It's Hard to Archive a Petabyte. It takes a LONG time to restore it.
- At 1 GBps it takes 12 days!
- Store it in two (or more) places online (on disk?): a geo-plex
- Scrub it continuously (look for errors)
- On failure:
- use other copy until failure repaired,
- refresh lost copy from safe copy.
- Can organize the two copies differently (e.g. one by time, one by space)
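The 12-day figure is straight division:

    # 1 PB at 1 GBps: 10**15 B / 10**9 B/s = 10**6 s
    print(1e15 / 1e9 / 86400)    # ~11.6 days, i.e. the slide's "12 days"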
45. Disk vs Tape

               Disk                                Tape
    Capacity   160 GB                              100 GB
    Bandwidth  25 MBps                             10 MBps
    Latency    5 ms seek + 3 ms rotate             30 sec pick + many-minute seek
    Cost       $2/GB drive + $1/GB ctlrs/cabinet   $5/GB media + $10/GB drive+library
    Density    4 TB/rack                           10 TB/rack

Guesstimates: CERN: 200 TB, 3480 tapes, 2 col = 50 GB; rack = 1 TB = 20 drives
The price advantage of tape is narrowing, and the performance advantage of disk is growing
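Summing the table's cost columns per gigabyte:

    disk = 2 + 1      # $/GB: drive + ctlrs/cabinet
    tape = 5 + 10     # $/GB: media + drive/library
    print(f"disk ${disk}/GB vs tape ${tape}/GB: tape costs {tape / disk:.0f}x more")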
46. I'm a disk bigot
- I hate tape, tape hates me.
- Unreliable hardware
- Unreliable software
- Poor human factors
- Terrible latency, bandwidth
- Disk
- Much easier to use
- Much faster
- Cheaper!
- But needs new concepts
47. Disk as Tape: Challenges
- Offline disk (safe from virus)
- Trivialize Backup/Restore software
- Things never change
- Just object versions
- Snapshot for continuous change (databases)
- RAID in a SAN
- (cross-disk journaling)
- Massive replication (a la Farsite)
48. Summary
- Disks will become supercomputers
- Compete in Linux appliance space
- Build best NAS software (compete with NetApp, ...)
- Auto-manage huge storage farms: FarSite, SQL autoAdmin, ...
- Build world's best disk-based backup system, including geoplex (compete with Veritas, ...)
- Push faster on 64-bit
49. Storage capacity beating Moore's law
- $2k/TB today (raw disk)
- $1k/TB by end of 2002
50. Trends: Magnetic Storage Densities
- Amazing progress
- Ratios have changed:
- capacity grows 60%/y
- access speed grows 10x more slowly
51. Trends: Density Limits
- The end is near!
- Products: 23 Gbpsi; lab: 50 Gbpsi; limit: 60 Gbpsi
- But the limit keeps rising: there are alternatives (NEMS, fluorescent? holographic, DNA?)
[Figure: bit density (b/µm² and Gb/in², roughly 0.6 to 3,000) vs time, 1990-2008, plotting CD, DVD, and ODD against the wavelength limit, and magnetic recording against the superparamagnetic limit]
Figure adapted from Franco Vitaliano, "The NEW new media: the growing attraction of nonmagnetic storage", Data Storage, Feb 2000, pp 21-32, www.datastorage.com
52. CyberBricks
- Disks are becoming supercomputers.
- Each disk will be a file server, then a SOAP server
- Multi-disk bricks are transitional
- Long term, a brick will have an OS per disk.
- Systems will be built from bricks.
- There will also be:
- network bricks
- display bricks
- camera bricks
- ...
53. Data Centric Computing
Yotta Zetta Exa Peta Tera Giga Mega Kilo
- Jim Gray
- Microsoft Research
- Research.Microsoft.com/Gray/talks
- FAST 2002
- Monterey, CA, 14 Oct 1999
54. Communications Excitement!!
[Figure: a 2x2 grid of Point-to-Point vs Broadcast against Immediate vs Time-Shifted: conversation and money (point-to-point, immediate); lecture and concert (broadcast, immediate); mail (point-to-point, time-shifted); book and newspaper (broadcast, time-shifted); network and database at the center]
It's ALL going electronic. Information is being stored for analysis (so, ALL database). Analysis & automatic processing are being added.
Slide borrowed from Craig Mundie
55. Information Excitement!
- But comm just carries information
- Real value added is:
- information capture & render: speech, vision, graphics, animation, ...
- information storage & retrieval,
- information analysis
56. Information At Your Fingertips
- All information will be in an online database (somewhere)
- You might record everything you
- read: 10 MB/day, 400 GB/lifetime (5 disks today)
- hear: 400 MB/day, 16 TB/lifetime (2 disks/year today)
- see: 1 MB/s, 40 GB/day, 1.6 PB/lifetime (150 disks/year, maybe someday)
- Data storage, organization, and analysis is the challenge:
- text, speech, sound, vision, graphics, spatial, time
- Information at Your Fingertips:
- make it easy to capture
- make it easy to store, organize, analyze
- make it easy to present, access
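The lifetime figures are simple rate arithmetic; a minimal check, assuming roughly a 100-year lifetime and about 12 waking hours/day for "see" (both assumptions, not from the slide):

    # read: 10 MB/day; hear: 400 MB/day; see: 1 MB/s for ~12 h/day
    day_mb = {"read": 10, "hear": 400, "see": 1 * 3600 * 12}
    for what, mb in day_mb.items():
        lifetime_tb = mb * 365 * 100 / 1e6   # MB/day over ~100 years, in TB
        print(f"{what}: {mb / 1e3:.2f} GB/day, {lifetime_tb:,.1f} TB/lifetime")
    # read ~0.4 TB (400 GB), hear ~15 TB, see ~1,577 TB (~1.6 PB)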
57. How much information is there?
- Soon everything can be recorded and indexed
- Most bytes will never be seen by humans.
- Data summarization, trend detection, anomaly detection are key technologies
- See Mike Lesk, How much information is there: http://www.lesk.com/mlesk/ksg97/ksg.html
- See Lyman & Varian, How much information: http://www.sims.berkeley.edu/research/projects/how-much-info/
[Figure: the byte-scale ladder (kilo ... yotta, and milli down to yocto) locating a book, a photo, a movie, all LoC books (words), all books multimedia, and "everything recorded"]
58. Why Put Everything in Cyberspace?
- Low rent: min $/byte
- Shrinks time: now or later
- Shrinks space: here or there
- Automate processing: knowbots
[Figure: point-to-point OR broadcast; immediate OR time-delayed; locate, process, analyze, summarize]
59. Disk Storage Cheaper than Paper
- File cabinet: cabinet (4 drawer) $250 + paper (24,000 sheets) $250 + space (2' x 3' @ $10/ft2) $180 = $700 total, about 3 cents/sheet
- Disk: disk (160 GB) $300; as ASCII: 100 m pages, 0.0001 cents/sheet (10,000x cheaper)
- As image: 1 m photos, 0.03 cents/sheet (100x cheaper)
- Store everything on disk
60. Gordon Bell's MainBrain: Digitize Everything. A BIG shoebox?
- Scans: 20 k pages tiff @ 300 dpi = 1 GB
- Music: 2 k tracks = 7 GB
- Photos: 13 k images = 2 GB
- Video: 10 hrs = 3 GB
- Docs: 3 k (ppt, word, ...) = 2 GB
- Mail: 50 k messages = 1 GB
- Total: 16 GB
61. Gary Starkweather
- Scan EVERYTHING
- 400 dpi TIFF
- 70k pages = 14 GB
- OCR all scans (98% recognition accuracy)
- All indexed (5-second access to anything)
- All on his laptop.
62. Q: What happens when the personal terabyte arrives?
- A: Things will run SLOWLY... unless we add good software
63. Summary
- Disks will morph to appliances
- Main barriers to this happening:
- lack of cool apps
- cost of information management
64. The Absurd Disk
- 2.5 hr scan time (poor sequential access)
- 1 aps / 5 GB (VERY cold data)
- It's a tape!
[Figure: a 1 TB disk, 100 MB/s, 200 Kaps]
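Both headline numbers follow from the drive parameters:

    print(1e12 / 100e6 / 3600)   # 1 TB at 100 MB/s: ~2.8 hr to scan it
    print(1e12 / 5e9)            # 1 access/sec per 5 GB over 1 TB: only 200 aps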
65. Crazy Disk Ideas
- Disk farm on a card: surface-mount disks
- Disk (magnetic store) on a chip (micro-machines in silicon)
- Full apps (e.g. SAP, Exchange/Notes, ...) in the disk controller (a processor with 128 MB dram)
The Innovator's Dilemma: When New Technologies Cause Great Firms to Fail, Clayton M. Christensen. ISBN 0875845851
66. The Disk Farm On a Card
- The 500 GB disc card
- An array of discs
- Can be used as:
- 100 discs
- 1 striped disc
- 50 fault-tolerant discs
- ... etc
- LOTS of accesses/second
- and bandwidth
[Figure: a 14" card]
67. Trends & promises: NEMS (Nano Electro Mechanical Systems)
(http://www.nanochip.com/; also Cornell, IBM, CMU, ...)
- 250 Gbpsi by using a tunneling electron microscope
- Disk replacement:
- capacity: 180 GB now, 1.4 TB in 2 years
- transfer rate: 100 MB/sec R/W
- latency: 0.5 msec
- power: 23 W active, 0.05 W standby
- $10k/TB now, $2k/TB in 2004
68. Trends: Gilder's Law: 3x bandwidth/year for 25 more years
- Today:
- 40 Gbps per channel (?)
- 12 channels per fiber (WDM): 500 Gbps
- 32 fibers/bundle: 16 Tbps/bundle
- In lab: 3 Tbps/fiber (400 x WDM)
- In theory: 25 Tbps per fiber
- 1 Tbps = USA 1996 WAN bisection bandwidth
- Aggregate bandwidth doubles every 8 months!
1 fiber = 25 Tbps
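The bundle arithmetic, checked:

    per_fiber = 40e9 * 12            # 40 Gbps x 12 WDM channels ~ 480 Gbps ("500")
    per_bundle = per_fiber * 32      # x 32 fibers ~ 15.4 Tbps ("16 Tbps")
    print(per_fiber / 1e9, "Gbps/fiber;", per_bundle / 1e12, "Tbps/bundle")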
69. Technology Drivers: What if Networking Was as Cheap As Disk IO?
- TCP/IP:
- Unix/NT: 100% cpu @ 40 MBps
- Disk:
- Unix/NT: 8% cpu @ 40 MBps
70. SAN: Standard Interconnect
- LAN faster than memory bus?
- 1 GBps links in lab.
- $100 port cost soon
- Port is computer

    Gbps Ethernet   110 MBps
    PCI             70 MBps
    UW SCSI         40 MBps
    FW SCSI         20 MBps
    SCSI            5 MBps
71. Building a Petabyte Store
- EMC: $500k/TB = $500M/PB; plus FC switches = $800M/PB
- TPC-C SANs (Dell, 18 GB disks): $62M/PB
- Dell local SCSI, 3ware: $20M/PB
- Do it yourself: $5M/PB
72. The Cost of Storage (heading for $1K/TB soon)
73. Cheap Storage or Balanced System
- Low-cost storage (2 x $1.5k servers): $6K/TB
- 2 x ($1K system + 8 x 80 GB disks + 100 Mb Ethernet)
- Balanced server ($7k / 0.5 TB):
- 2 x 800 MHz ($2k)
- 256 MB ($400)
- 8 x 80 GB drives ($2K)
- Gbps Ethernet switch ($1k)
- $11k/TB, $22K per RAIDed TB
74. 320 GB, $2k (now)
- 4 x 80 GB IDE (2 hot-pluggable)
- ($1,000)
- SCSI-IDE bridge
- $200
- Box:
- 500 MHz cpu
- 256 MB SRAM
- fan, power, Enet
- $700
- Or 8 disks/box: 640 GB for $3K (or 300 GB RAID)
75. (no transcript)
76. Hot-Swap Drives for Archive or Data Interchange
- 25 MBps write (so can write N x 160 GB in 3 hours)
- 160 GB/overnite = N x 4 MB/second
- @ $19.95/nite
77. Data delivery costs $1/GB today
- Rent for big customers: $300 per megabit per second per month
- Improved 3x in last 6 years (!).
- That translates to $1/GB at each end.
- You can mail a 160 GB disk for $20.
- That's 16x cheaper.
- If overnight, it's 3 MBps.
3 x 160 GB = ½ TB
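A quick check that $300 per Mbps per month is about $1/GB (assuming a 30-day month):

    mbps_month_gb = 1e6 / 8 * 86400 * 30 / 1e9   # GB moved by 1 Mbps in a month
    print(mbps_month_gb, 300 / mbps_month_gb)    # ~324 GB, ~$0.93/GB at each end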
78. Data on Disk Can Move to RAM in 8 years
[Chart: disk vs RAM price per byte over time; ratio roughly 30:1, converging over ~6 years]
79. Storage Latency: How Far Away is the Data?
[Figure: the latency ladder, in clocks, with distance/time analogies]

    Registers           1      My Head       1 min
    On-chip cache       2      This Room
    On-board cache      10     This Campus   10 min
    Memory              100    Springfield   1.5 hr
    Disk                10^6   Pluto         2 years
    Tape/Optical robot  10^9   Andromeda     2,000 years
80. More Kaps and Kaps/$, but...
- Disk accesses got much less expensive: better disks, cheaper disks!
- But disk arms are expensive: the scarce resource
- 1-hour scan, vs 5 minutes in 1990
81. Backup: 3 scenarios
- Disaster Recovery: preservation through replication
- Hardware Faults: different solutions for different situations
- clusters,
- load balancing,
- replication,
- tolerate machine/disk outages
- (avoided RAID and expensive, low-volume solutions)
- Programmer Error: versioned duplicates (no deletes)
82. Online Data
- Can build 1 PB of NAS disk for $5M today
- Can SCAN (read or write) the entire PB in 3 hours.
- Operate it as a data pump: continuous sequential scan
- Can deliver 1 PB for $1M over the Internet
- Access charge is $300/Mbps bulk rate
- Need to geoplex data (store it in two places).
- Need to filter/process data near the source,
- to minimize network costs.
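The scan and delivery claims line up with the per-drive numbers on slide 45 (the 160 GB / 25 MBps drive figures are taken from there; the drive count is derived, not stated):

    drives = 1e15 / 160e9        # ~6,250 drives to hold 1 PB
    agg = drives * 25e6          # ~156 GBps aggregate bandwidth
    print(1e15 / agg / 3600)     # ~1.8 hours raw scan; ~3 hours with overhead
    print(1e15 / 1e9 * 1)        # 1e6 GB x $1/GB ~ $1M to deliver 1 PB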