Jim Gray Talk at University of Tokyo

Provided by: LLNL

1
Jim Gray: Talk at University of Tokyo
  • Personal views on the PITAC report: invest in long-term research
  • Preview of Turing lecture: 10 long-term research problems
    • Bush: summarize info in cyberspace
    • Turing: intelligent computers
    • 7 9s: build systems that are always up, and prove it
  • 5-Minute rule
    • For disks
    • For tapes
  • Sorting progress
    • PennySort
    • Terabyte Sort (!)
  • Slides will be at http://research.Microsoft.com/Gray/talks

2
Presidential Advisory Committee on High Performance Computing and Communications, Information Technologies, and the Next Generation Internet
Information Technology

http://www.ccic.gov/ac/interim/ or
http://research.microsoft.com/Gray/papers/PITAC_Interim_Report_8_98.doc

3
Charter for the Committee: provide an independent assessment of
  • High-Performance Computing and Communications (HPCC)
    • Progress
    • Balance among research components
  • Next Generation Internet initiative
    • Progress
    • Balance
  • IT research and development
    • Maintain United States leadership in IT and applications

4
Committee Members
  • Co-Chairs
    • Bill Joy, Sun Microsystems
    • Ken Kennedy, Rice University
  • Members
    • Eric Benhamou, 3Com
    • Vinton Cerf, MCI
    • Ching-chih Chen, Simmons
    • David Cooper, LLNL
    • Steve Dorfman, Hughes
    • David Dorman, PointCast
    • Bob Ewald, SGI
    • David Farber, U. of Pennsylvania
    • Sherri Fuller, U. of Washington
    • Hector Garcia-Molina, Stanford
    • Susan Graham, UC Berkeley
    • Jim Gray, Microsoft
    • Danny Hillis, Disney, Inc.
    • John Miller, Montana State Univ.
    • David Nagel, AT&T
    • Raj Reddy, Carnegie Mellon
    • Ted Shortliffe, Stanford
    • Larry Smarr, U. of Illinois at Urbana-Champaign
    • Joe Thompson, Miss. State U.
    • Les Vadasz, Intel
    • Andy Viterbi, Qualcomm
    • Steve Wallach, Centerpoint
    • Irving Wladawsky-Berger, IBM

5
My Summary of the Report
  • 1/3 of US economic growth since 1992 was in the IT sector. IT is key to our health, wealth, and safety.
  • Created $400B of wealth in the last 3 years (!!)
  • Federal IT research funding of twenty years ago created the boom.
  • Federal IT research funding for the last decade has been flat (in constant dollars).
  • Research funding is increasingly near-term applied development.
  • The committee recommends:
    • Increase long-term research funding in
      • Software design and implementation technologies
      • Technologies to scale the Next Generation Internet to 6 billion users
      • Tools, algorithms, and systems for high-performance computing
    • Spend a billion dollars over the next 5 years on Lewis and Clark style "expeditions" into cyberspace.

6
Myths
  • Now that IT is a big business, industry will do long-term research.
    • FACT: industry spends LITTLE on long-term research.
    • It is not in their best interest.
  • IT research means buying computers for scientists.
    • FACT: computer science research is different from the application of computers to some discipline.

7
Research Priorities
  • Findings
    • Total federal information technology R&D investment is inadequate
    • Federal IT R&D is excessively focused on near-term problems
  • Recommendations
    • Create a strategic initiative in long-term IT R&D
    • Increase the investment for research in software, scalable information infrastructure, high-end computing, and socio-economic and workforce impacts

8
Software Research
  • Findings
    • Demand for software far exceeds the nation's ability to produce it
    • The nation depends on fragile software
    • Technologies to build reliable and secure software are inadequate
    • The nation is under-investing in fundamental software research
  • Recommendations
    • Fund more fundamental research in software development methods and component technologies
    • Sponsor a national library of software components
    • Make software research a substantive component of every major IT research initiative
    • Support research in human-computer interfaces and interaction
    • Make fundamental software research an absolute priority

9
Scalable Information Infrastructure
  • Findings
    • The Internet has grown well beyond the intent of its original designers
    • Our nation's dependence on the information infrastructure is increasing daily
    • We cannot safely extend what we currently know to more complex systems
    • Learning how to build large-scale, highly reliable and secure systems requires research
  • Recommendations
    • Increase funding in research and development of core software and communications technologies aimed directly at the challenge of scaling the information infrastructure
    • Expand the Next Generation Internet test beds to include additional industry partnerships in order to foster the rapid commercialization and deployment of enabling technologies

10
High-End Computing
  • Findings: HEC is
    • essential for science and engineering research
    • an element of United States national security
    • ripe for new applications
    • suppliers suffer from unusual market pressures
  • Research and Development Recommendations
    • Fund innovative technologies and architectures
    • Fund HEC software (parallel programming)
    • Aim for a real-application petaops by 2010 through both hardware and software strategies
    • Fund HEC systems for science and engineering research

11
Social, Economic, and Workforce Recommendations
  • Expand research on the social and economic impacts of information technology diffusion and adoption
  • Expand initiatives to increase IT literacy, access, and research capabilities
  • Address the shortage of high-technology workers
    • Programs to re-train stale IT workers
    • Encourage participation by women and minorities
    • Short-term increase in immigration of skilled IT workers

12
Conclusions
  • IT is an essential foundation for commerce, education, health care, environmental stewardship, and national security
  • IT will dramatically transform the way we communicate, learn, deal with information, and conduct research
  • IT will transform the nature of work, the nature of commerce, the product design cycle, the practice of health care, and the government itself
  • The total Federal IT R&D investment is inadequate
  • Federal IT R&D is excessively focused on near-term problems
  • The U.S. government must
    • Create a strategic initiative in long-term IT R&D
    • Establish an effective structure for managing and coordinating IT

14
Vannevar Bush: Memex
  • Memex: proposed putting all information online (1945)
  • It will happen
  • Result: InfoGlut. Too much information in the shoebox
  • Challenge
    • Organize the information
    • Give answers as good as an expert in the field
    • Anticipate questions and so inform the subscriber
    • Protect personal privacy: a hacker cannot get access to your personal information without your consent

15
Turing's Test (1951): Intelligent Machines
  • Computers helped with the 4-color problem end game
  • Computers (and people) won the world chess championship
  • Computers will likely be our 5th brain
  • Augment our intelligence
    • See for us, hear for us, read for us
    • Prosthetic eyes, ears, voices, arms, legs
  • Probably computers will be intelligent like plants and animals.
  • Perhaps computers can be intelligent like people
    • Pass the Turing Test (easy/impossible?) (70%, 5 minutes, B can lie)
    • Translating telephone (as good as a human translator)
    • Read a textbook and pass the written exam
    • Pass a graduate programming class
    • Pass a graduate literature class
    • Radical: download someone.

16
Dependable Systems
  • Build a system used by millions of people each day.
  • Then
    • Prove that it does what it is supposed to do (code matches spec).
    • Prove that it delivers 99.99999% (7 9s) availability (about 1 hr of downtime per millennium).
    • Prove that it cannot be hacked for less than $1B (Y2K dollars).
  • Then build the system automatically from the specification.
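The availability target can be checked with a short sketch. It assumes a 365.25-day year and treats "N nines" as an unavailability of 10^-N. The helper name `downtime_hours` is hypothetical.

```python
# Sketch: expected downtime implied by "N nines" of availability.
# Assumes a 365.25-day year; "N nines" means unavailability = 10**-N.
HOURS_PER_YEAR = 365.25 * 24


def downtime_hours(nines: int, period_years: float) -> float:
    """Expected downtime in hours over `period_years` at N-nines availability."""
    unavailability = 10.0 ** (-nines)
    return unavailability * period_years * HOURS_PER_YEAR


if __name__ == "__main__":
    for n in (3, 5, 7):
        per_year_min = downtime_hours(n, 1) * 60
        per_millennium_hr = downtime_hours(n, 1000)
        print(f"{n} nines: {per_year_min:.2f} min/year, "
              f"{per_millennium_hr:.2f} hr/millennium")
```

Seven nines works out to about 0.88 hours per millennium, or roughly 3 seconds per year. That agrees with the slide's "1 hr per millennium."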

18
Storage Hierarchy (9 levels)
  • Cache 1, 2
  • Main (1, 2, 3 if NUMA)
  • Disk (1 (cached), 2)
  • Tape (1 (mounted), 2)

19
Meta-Message: Technology Ratios Are Important
  • If everything gets faster & cheaper at the same rate, THEN nothing really changes.
  • Things getting MUCH BETTER
    • communication speed & cost: 1,000x
    • processor speed & cost: 100x
    • storage size & cost: 100x
  • Things staying about the same
    • speed of light (more or less constant)
    • people (10x more expensive)
    • storage speed (only 10x better)

20
Today's Storage Hierarchy: Speed & Capacity vs. Cost Tradeoffs
[Figure: two panels, "Size vs Speed" (typical system size in bytes) and "Price vs Speed" ($/MB), each plotted against access time in seconds (10^-9 to 10^3) for cache, main memory, online/secondary disc, and nearline and offline tape.]
21
Storage Ratios Changed
  • 10x better access time
  • 10x more bandwidth
  • 4,000x lower media price
  • DRAM/DISK: 100:1 to 10:1 to 50:1

22
Thesis: Performance = Storage Accesses, not Instructions Executed
  • In the old days we counted instructions and I/Os
  • Now we count memory references
  • Processors wait most of the time

[Figure: Where the time goes: clock ticks used by AlphaSort components (sort, OS, disc wait, memory wait).]
23
The Pico Processor
  • 1 M SPECmarks
  • 10^6 clocks per fault to bulk RAM
  • Event horizon on chip
  • VM reincarnated
  • Multi-program cache
  • Terror Bytes!
24
Storage Latency: How Far Away is the Data?
(clock ticks to reach the data, and the same distance at human scale)
  • Registers: 1 (My Head, 1 min)
  • On-chip cache: 2 (This Room)
  • On-board cache: 10 (This Campus, 10 min)
  • Memory: 100 (Sacramento, 1.5 hr)
  • Disk: 10^6 (Pluto, 2 years)
  • Tape/optical robot: 10^9 (Andromeda, 2,000 years)
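The analogy behind this slide scales one clock tick to one minute of human time. The sketch below applies that scale to the tick counts shown on the slide. The 365.25-day year and the helper name `human_time` are assumptions for illustration.

```python
# Sketch of the slide's analogy: one clock tick is scaled up to one minute.
MINUTES_PER_YEAR = 365.25 * 24 * 60

# Clock ticks to reach data at each level, as read off the slide.
LEVELS = [
    ("Registers", 1),
    ("On-chip cache", 2),
    ("On-board cache", 10),
    ("Memory", 100),
    ("Disk", 10**6),
    ("Tape/optical robot", 10**9),
]


def human_time(ticks: int) -> str:
    """Render `ticks` clock ticks as human time, at 1 tick == 1 minute."""
    minutes = float(ticks)
    if minutes < 60:
        return f"{minutes:.0f} min"
    if minutes < MINUTES_PER_YEAR:
        return f"{minutes / 60:.1f} hr"
    return f"{minutes / MINUTES_PER_YEAR:,.1f} years"


if __name__ == "__main__":
    for name, ticks in LEVELS:
        print(f"{name:>20}: {ticks:>13,} ticks ~ {human_time(ticks)}")
```

On this scale disk is about 1.9 years away and a tape robot about 1,900 years away. Memory comes to 100 minutes (about 1.7 hours), which the slide gives as 1.5 hr.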
25
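The 5 Minute Rule Derived
  • M = cost of keeping one RAM page, per second
  • A = cost of one disk access
  • RI = Reference Interval: time between accesses to a page
  • Lifetime = depreciation period of the hardware (it cancels out)

$$M = \frac{\text{RAM \$/MB}}{\text{PagesPerMB} \times \text{Lifetime}}, \qquad A = \frac{\text{DiskPrice}}{\text{AccPerSec} \times \text{Lifetime}}$$

Breakeven: holding a page in RAM for one reference interval costs as much as re-reading it from disk, M × RI = A, so

$$RI = \frac{A}{M} = \frac{\text{DiskPrice} \times \text{PagesPerMB}}{\text{RAM \$/MB} \times \text{AccPerSec}}$$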
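A small sketch can evaluate the break-even reference interval. Pages referenced more often than this interval are cheaper to keep in RAM. The function name and every price below are hypothetical, chosen only to show the arithmetic. Pages per MB is taken as 1024 / (page size in KB).

```python
# Sketch of the break-even reference interval RI = A / M.
# All inputs used below are hypothetical illustration values.


def break_even_seconds(disk_price: float, accesses_per_sec: float,
                       ram_price_per_mb: float, page_kb: float) -> float:
    """RI = (PagesPerMB / AccPerSec) * (DiskPrice / RAM $/MB)."""
    pages_per_mb = 1024 / page_kb
    technology_term = pages_per_mb / accesses_per_sec
    economic_term = disk_price / ram_price_per_mb
    return technology_term * economic_term


if __name__ == "__main__":
    # Hypothetical: $2,000 disk doing 64 random accesses/s, RAM at $15/MB, 8 KB pages.
    ri = break_even_seconds(2000, 64, 15, 8)
    print(f"break-even reference interval: {ri:.0f} s (~{ri / 60:.1f} min)")
```

With these inputs the interval is about 267 seconds, or roughly 4.4 minutes. Doubling the page size halves the technology term and therefore the interval.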
26
The Five Minute Rule: Observations
  • Break even has two terms
    • (1) Technology term: PagesPerMB / DiskAccPerSec (8 KB pages, 80 accesses/s)
    • (2) Economic term: DiskPrice / RAM_MB_Price (400 : 4 = 100:1)
  • Economic term trends down
  • Technology term trends up to compensate
  • Still at 5 minutes for random access, 1 minute for sequential

27
[Figure: chart showing that the best index page size is 16KB.]
28
Standard Storage Metrics
  • Capacity
    • RAM: MB and $/MB; today at 10 MB and $100/MB
    • Disk: GB and $/GB; today at 10 GB and $200/GB
    • Tape: TB and $/TB; today at 0.1 TB and $25k/TB (nearline)
  • Access time (latency)
    • RAM: 100 ns
    • Disk: 10 ms
    • Tape: 30 second pick, 30 second position
  • Transfer rate
    • RAM: 1 GB/s
    • Disk: 5 MB/s (arrays can go to 1 GB/s)
    • Tape: 5 MB/s (striping is problematic)

29
New Storage Metrics: Kaps, Maps, SCAN?
  • Kaps: how many KB objects served per second
    • The file server, transaction processing metric
    • This is the OLD metric.
  • Maps: how many MB objects served per second
    • The multi-media metric
  • SCAN: how long to scan all the data
    • The data mining and utility metric
  • And
    • Kaps/$, Maps/$, TBscan/$
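The three metrics can be estimated from the capacity, latency, and transfer-rate figures on the "Standard Storage Metrics" slide. The sketch below assumes a deliberately simplified model: one request at a time, with no queuing, caching, or striping, and service time = access time + object size / transfer rate. The `Device` class is hypothetical. Dividing each metric by a device price would give the per-dollar variants.

```python
# Sketch: Kaps, Maps, and SCAN from a simplified device model.
from dataclasses import dataclass


@dataclass
class Device:
    """Simplified device model: one request at a time, no queuing or caching."""
    name: str
    capacity_bytes: float
    access_time_s: float          # latency before the transfer starts
    transfer_bytes_per_s: float   # sustained transfer rate

    def _service_time(self, object_bytes: float) -> float:
        return self.access_time_s + object_bytes / self.transfer_bytes_per_s

    def kaps(self) -> float:
        """KB-sized objects served per second."""
        return 1 / self._service_time(1e3)

    def maps(self) -> float:
        """MB-sized objects served per second."""
        return 1 / self._service_time(1e6)

    def scan_seconds(self) -> float:
        """Time to read every byte sequentially."""
        return self.capacity_bytes / self.transfer_bytes_per_s


if __name__ == "__main__":
    # Figures from the storage-metrics slide; tape access = 30 s pick + 30 s position.
    disk = Device("disk", 10e9, 10e-3, 5e6)
    tape = Device("tape", 0.1e12, 60.0, 5e6)
    for d in (disk, tape):
        print(f"{d.name}: {d.kaps():.3g} Kaps, {d.maps():.3g} Maps, "
              f"SCAN {d.scan_seconds() / 3600:.2f} h")
```

Under this model a disk delivers about 98 Kaps and about 5 Maps. A tape serves well under one object per second but still scans its 0.1 TB in under 6 hours.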

30
For the Record (good 1998 devices packaged in system
http://www.tpc.org/results/individual_results/Dell/dell.6100.9801.es.pdf)
X 14
31
For the Record (good 1998 devices packaged in system
http://www.tpc.org/results/individual_results/Dell/dell.6100.9801.es.pdf)
X 14
32
How To Get Lots of Maps, SCANs
  • Parallelism: use many little devices in parallel
  • Beware of the media myth
  • Beware of the access time myth

At 10 MB/s: 1.2 days to scan
1,000x parallel: 100 second SCAN
Parallelism: divide a big problem into many smaller ones to be solved in parallel.
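The SCAN arithmetic on this slide can be reproduced with a few lines. The sketch assumes a 1 TB data set, which is the size that makes "1.2 days at 10 MB/s" work out. It also assumes perfect parallel speedup with the data split evenly across devices. The function name is hypothetical.

```python
# Sketch of the SCAN arithmetic: a 1 TB data set (assumed) read serially
# versus split evenly across many devices with perfect speedup (assumed).
TB = 1e12


def scan_seconds(total_bytes: float, bytes_per_sec_per_device: float,
                 devices: int = 1) -> float:
    """Time to read `total_bytes` spread evenly over `devices` devices."""
    return total_bytes / (bytes_per_sec_per_device * devices)


if __name__ == "__main__":
    serial = scan_seconds(1 * TB, 10e6)
    parallel = scan_seconds(1 * TB, 10e6, devices=1000)
    print(f"1 device:      {serial / 86400:.2f} days")
    print(f"1,000 devices: {parallel:.0f} s")
```

One device needs 10^5 seconds (about 1.16 days), and a thousand of them need 100 seconds.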
33
The Disk Farm On a Card
  • The 1 TB disc card
  • An array of discs
  • Can be used as
    • 100 discs
    • 1 striped disc
    • 10 fault-tolerant discs
    • ...etc.
  • LOTS of accesses/second and bandwidth

[Figure: 14" disc card]
Life is cheap, it's the accessories that cost ya.
Processors are cheap, it's the peripherals that cost ya (a $10k disc card).
34
Tape Farms for Tertiary Storage, Not Mainframe Silos
  • One $10K robot: 14 tapes, 500 GB, 5 MB/s, $20/GB, 30 Maps, 27 hr scan
  • 100 robots: $1M, 50 TB, $50/GB, 3K Maps
  • Scan in 27 hours: many independent tape robots (like a disc farm)
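The farm figures follow from multiplying one robot's figures by 100. The one exception is scan time, which stays constant because each robot scans only its own tapes. The sketch below assumes the robots are fully independent. The `TapeRobot` and `farm_summary` names are hypothetical.

```python
# Sketch: scaling out independent tape robots, using the per-robot
# figures from the slide; full independence between robots is assumed.
from dataclasses import dataclass


@dataclass
class TapeRobot:
    price_dollars: float
    capacity_gb: float
    maps: float
    mb_per_s: float


def farm_summary(robot: TapeRobot, count: int) -> dict:
    """Aggregate price, capacity, and Maps; scan time stays per-robot."""
    scan_hours = robot.capacity_gb * 1000 / robot.mb_per_s / 3600
    return {
        "price_dollars": robot.price_dollars * count,
        "capacity_tb": robot.capacity_gb * count / 1000,
        "maps": robot.maps * count,
        # Identical robots scan their own tapes concurrently, so the farm's
        # scan time equals one robot's scan time.
        "scan_hours": scan_hours,
    }


if __name__ == "__main__":
    print(farm_summary(TapeRobot(10_000, 500, 30, 5), 100))
```

The result is $1M, 50 TB, and 3,000 Maps, with a scan time of about 27.8 hours.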
35
Tape & Optical: Beware of the Media Myth
Optical is cheap: $200/platter, 2 GB/platter, $100/GB (2x cheaper than disc)
Tape is cheap: $30/tape, 20 GB/tape, $1.5/GB (100x cheaper than disc)
36
Tape & Optical Reality: Media is 10% of System Cost
Tape needs a robot ($10K ... $3M)
10 ... 1000 tapes (at 20 GB each): $20/GB ... $200/GB (1x-10x cheaper than disc)
Optical needs a robot ($100K)
100 platters = 200 GB (TODAY): $400/GB (more expensive than mag disc)
Robots have poor access times
Not good for Library of Congress (25 TB)
Data motel: data checks in but it never checks out!
37
The Access Time Myth
  • The Myth: seek or pick time dominates
  • The reality:
    • (1) Queuing dominates
    • (2) Transfer dominates BLOBs
    • (3) Disk seeks are often short
  • Implication: many cheap servers are better than one fast, expensive server
    • shorter queues
    • parallel transfer
    • lower cost/access and cost/byte
  • This is now obvious for disk arrays
  • This will be obvious for tape arrays
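Why queuing can outweigh device speed is easiest to see with a textbook queueing model. The sketch uses an M/M/1 queue (Poisson arrivals, exponential service times). That model is an assumption here, not something the slide specifies. The load and service times are hypothetical, and so is the function name.

```python
# Sketch using an M/M/1 queue (Poisson arrivals, exponential service),
# a modeling assumption; load and service times are hypothetical.


def mm1_response_time(service_time_s: float, arrival_rate_per_s: float) -> float:
    """Mean response time (queue wait + service) of an M/M/1 server."""
    utilization = arrival_rate_per_s * service_time_s
    if utilization >= 1:
        raise ValueError("server is saturated (utilization >= 1)")
    return service_time_s / (1 - utilization)


if __name__ == "__main__":
    load = 180.0  # requests/s
    # One fast server: 5 ms per request, carrying all of the load.
    fast = mm1_response_time(0.005, load)
    # Ten cheaper servers: 10 ms per request, each carrying a tenth of the load.
    cheap = mm1_response_time(0.010, load / 10)
    print(f"one fast server:   {fast * 1000:.1f} ms")
    print(f"ten cheap servers: {cheap * 1000:.1f} ms")
```

At 90% utilization the fast server spends 45 of its 50 ms waiting in the queue. The ten slower servers each run at 18% utilization and respond in about 12 ms.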

39
Penny Sort Ground Rules
http://research.microsoft.com/barc/SortBenchmark
  • How much can you sort for a penny?
    • Hardware and software cost
    • Depreciated over 3 years
    • A $1M system gets about 1 second
    • A $1K system gets about 1,000 seconds
    • Time (seconds) = 946,080 / SystemPrice ($)
  • Input and output are disk resident
  • Input is
    • 100-byte records (random data)
    • key is first 10 bytes
  • Must create output file and fill with sorted version of input file
  • Daytona (product) and Indy (special) categories
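The 946,080 constant is one cent's share of a 3-year depreciation period. Three 365-day years are 94,608,000 seconds, and dividing by 100 cents per dollar gives 946,080. The sketch below computes the time budget for a given system price. The function name is hypothetical.

```python
# Sketch of the PennySort time budget: one cent's share of a system
# depreciated over 3 years of 365 days.
SECONDS_PER_3_YEARS = 3 * 365 * 24 * 3600  # 94,608,000


def pennysort_budget_seconds(system_price_dollars: float) -> float:
    """Seconds of run time that cost one cent for a system of the given price."""
    return SECONDS_PER_3_YEARS / 100 / system_price_dollars


if __name__ == "__main__":
    for price in (1_000, 1_000_000):
        print(f"${price:,} system: {pennysort_budget_seconds(price):,.2f} s")
```

A $1,000 system gets about 946 seconds and a $1M system about 0.95 seconds, matching the rules above.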

40
PennySort
  • Hardware
    • 266 MHz Intel PPro
    • 64 MB SDRAM (10ns)
    • Dual Fujitsu DMA 3.2GB EIDE disks
  • Software
    • NT Workstation 4.3
    • NT 5 sort
  • Performance
    • sort 15 M 100-byte records (1.5 GB)
    • disk to disk
    • elapsed time 820 sec
    • CPU time 404 sec

41
How Good is NT5 Sort?
  • CPU and IO not overlapped.
  • System should be able to sort 2x more
  • RAM has spare capacity
  • Disk is space saturated (1.5 GB in, 1.5 GB out on a 3 GB drive). Need an extra 3 GB drive or a 6 GB drive


[Figure: chart with Disk, CPU, Fixed, and RAM components]
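The "2x" estimate can be reconstructed from the PennySort run's elapsed and CPU times. The sketch assumes that, with no overlap at all, every non-CPU second is disk I/O. It also assumes that perfect overlap would bound the run by the slower of the two resources. The function name is hypothetical.

```python
# Sketch of the "2x" estimate for the PennySort run, assuming the non-CPU
# part of the elapsed time is all disk I/O (no overlap at all).
ELAPSED_S = 820
CPU_S = 404
RECORDS = 15_000_000
RECORD_BYTES = 100


def overlap_estimate(elapsed_s: float, cpu_s: float) -> tuple[float, float]:
    """Return (io_seconds, ideal_speedup) if CPU and I/O overlapped perfectly."""
    io_s = elapsed_s - cpu_s            # time not spent computing
    overlapped_s = max(cpu_s, io_s)     # full overlap: bounded by the slower resource
    return io_s, elapsed_s / overlapped_s


if __name__ == "__main__":
    io_s, speedup = overlap_estimate(ELAPSED_S, CPU_S)
    mb_per_s = RECORDS * RECORD_BYTES / ELAPSED_S / 1e6
    print(f"I/O time ~{io_s} s, ideal speedup ~{speedup:.2f}x, "
          f"current rate ~{mb_per_s:.2f} MB/s")
```

CPU (404 s) and I/O (416 s) are nearly balanced, so full overlap would approach a 2x speedup from the current rate of about 1.8 MB/s.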
42
Sort Speed Doubles Every Year
43
Recent Results
  • NOW Sort: 9 GB on a cluster of 100 UltraSPARCs in 1 minute
  • MilleniumSort: 16x Dell NT cluster, 100 MB in 1.8 sec (Datamation)
  • Tandem/Sandia Sort: 68-CPU ServerNet, 1 TB in 47 minutes
  • Rumor of IBM Sort: 7,000-CPU Blue Pacific, 1 TB in 1,024 seconds (17 minutes), 10 M records/s (1 GB/s)
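The throughput behind each result can be computed from the data size and time. The sketch assumes decimal units (1 GB = 10^9 bytes) and the benchmark's 100-byte records. The dictionary and function names are hypothetical.

```python
# Sketch: throughput implied by each sort result, assuming decimal units
# (1 GB = 1e9 bytes) and the benchmark's 100-byte records.
RESULTS = {
    "NOW Sort": (9e9, 60),
    "MilleniumSort": (100e6, 1.8),
    "Tandem/Sandia": (1e12, 47 * 60),
    "IBM (rumored)": (1e12, 1024),
}


def throughput(nbytes: float, seconds: float,
               record_bytes: int = 100) -> tuple[float, float]:
    """Return (MB/s, records/s) for sorting `nbytes` in `seconds`."""
    return nbytes / seconds / 1e6, nbytes / record_bytes / seconds


if __name__ == "__main__":
    for name, (nbytes, secs) in RESULTS.items():
        mb_s, rps = throughput(nbytes, secs)
        print(f"{name:>14}: {mb_s:8.1f} MB/s, {rps / 1e6:5.2f} M records/s")
```

The rumored IBM run works out to about 977 MB/s and 9.8 M records/s, consistent with "10 Mrps (1GBps)". NOW Sort comes to 150 MB/s.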
