Title: What I have been Doing
1. What I have been Doing
- Peta Bumps
- $10k/TB
- Scaleable Computing
- Sloan Digital Sky Survey
2. Sense of scale
- How fat is your pipe?
- Fattest pipe on MS campus is the WAN!
- 300 MBps: OC48 = G2, or memcpy() (see the timing sketch below)
- 94 MBps: Coast to Coast
- 90 MBps: PCI
- 20 MBps: disk / ATM / OC3
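As a rough point of comparison for the 300 MBps memcpy() figure, here is a minimal local timing sketch; the 64 MB buffer and 16 repetitions are illustrative assumptions, and this is not the speedy.c benchmark code.

```c
/* Rough memcpy() bandwidth estimate for comparison with the pipe
 * numbers above.  Buffer size and repeat count are arbitrary. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

int main(void)
{
    const size_t len  = 64u * 1024 * 1024;    /* 64 MB test buffer */
    const int    reps = 16;
    char *src = malloc(len), *dst = malloc(len);
    if (!src || !dst) return 1;
    memset(src, 0xAB, len);                   /* touch pages before timing */

    clock_t start = clock();
    for (int i = 0; i < reps; i++)
        memcpy(dst, src, len);
    double secs = (double)(clock() - start) / CLOCKS_PER_SEC;

    printf("memcpy: %.0f MBps\n", (double)len * reps / secs / 1e6);
    free(src); free(dst);
    return 0;
}
```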
3. Redmond/Seattle, WA
[Map: the Arlington, VA to Seattle route via New York and San Francisco, CA: 5626 km, 10 hops]
Partners: Information Sciences Institute, Microsoft, Qwest, University of Washington, Pacific Northwest Gigapop, HSCC (high speed connectivity consortium), DARPA
4. The Path
- DC -> SEA
- C:\> tracert -d 131.107.151.194
- Tracing route to 131.107.151.194 over a maximum of 30 hops
  0                                            DELL 4400 Win2K WKS - Arlington Virginia, ISI Alteon GbE
  1   16 ms  <10 ms  <10 ms  140.173.170.65    Juniper M40 GbE - Arlington Virginia, ISI Interface ISIe
  2  <10 ms  <10 ms  <10 ms  205.171.40.61     Cisco GSR OC48 - Arlington Virginia, Qwest DC Edge
  3  <10 ms  <10 ms  <10 ms  205.171.24.85     Cisco GSR OC48 - Arlington Virginia, Qwest DC Core
  4  <10 ms  <10 ms   16 ms  205.171.5.233     Cisco GSR OC48 - New York, New York, Qwest NYC Core
  5   62 ms   63 ms   62 ms  205.171.5.115     Cisco GSR OC48 - San Francisco, CA, Qwest SF Core
  6   78 ms   78 ms   78 ms  205.171.5.108     Cisco GSR OC48 - Seattle, Washington, Qwest Sea Core
  7   78 ms   78 ms   94 ms  205.171.26.42     Juniper M40 OC48 - Seattle, Washington, Qwest Sea Edge
  8   78 ms   79 ms   78 ms  208.46.239.90     Juniper M40 OC48
5. 750 mbps over 5000 km (957 mbps multi-stream)
4e15 bit meters per second = 4 Peta bmps ("peta bumps")
Single-stream tcp/ip throughput
- Information Sciences Institute
- Microsoft
- Qwest
- University of Washington
- Pacific Northwest Gigapop
- HSCC (high speed connectivity consortium)
- DARPA
5 Peta bmps multi-stream
6. PetaBumps
- 751 mbps for 300 seconds (28 GB)
- single-thread, single-stream tcp/ip, desktop-to-desktop, out-of-the-box performance
- 5626 km x 751 Mbps = 4.2e15 bit meters / second = 4.2 Peta bmps (arithmetic spelled out below)
- Multi-stream is 952 mbps = 5.2 Peta bmps
- 4470 byte MTUs were enabled on all routers.
- 20 MB window size
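For reference, the bit-meter figure is just distance times single-stream throughput: 5,626 km x 751 Mbps = 5.626e6 m x 7.51e8 bit/s ≈ 4.2e15 bit meters per second, i.e. 4.2 Peta bmps.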
8. Pointers
- The single-stream submission: http://research.microsoft.com/gray/papers/Windows2000_I2_land_Speed_Contest_Entry_(Single_Stream_mail).htm
- The multi-stream submission: http://research.microsoft.com/gray/papers/Windows2000_I2_land_Speed_Contest_Entry_(Multi_Stream_mail).htm
- The code: http://research.microsoft.com/gray/papers/speedy.htm (speedy.h, speedy.c)
- And a PowerPoint presentation about it: http://research.microsoft.com/gray/papers/Windows2000_WAN_Speed_Record.ppt
9. What I have been Doing
- Peta Bumps
- $10k/TB
- Scaleable Computing
- Sloan Digital Sky Survey
10. TPC-C: high-performance clusters
- Standard transaction processing benchmark
- Mix of 5 simple transaction types (see the illustrative sketch after this list)
- Database scales with workload
- Measures a balanced system
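To make the workload mix concrete, here is a minimal, hypothetical sketch of choosing among the five TPC-C transaction types; the weights follow the commonly cited mix (roughly 45/43/4/4/4), are illustrative rather than taken from the talk, and this is not benchmark code.

```c
/* Illustrative only: pick one of the five TPC-C transaction types
 * with roughly the standard mix weights. */
#include <stdio.h>
#include <stdlib.h>

static const char *kinds[]   = { "New-Order", "Payment", "Order-Status",
                                 "Delivery", "Stock-Level" };
static const int   weights[] = { 45, 43, 4, 4, 4 };   /* sums to 100 */

static const char *pick_transaction(void)
{
    int r = rand() % 100, acc = 0;
    for (int i = 0; i < 5; i++) {
        acc += weights[i];
        if (r < acc)
            return kinds[i];
    }
    return kinds[0];
}

int main(void)
{
    for (int i = 0; i < 10; i++)          /* sample ten transactions */
        printf("%s\n", pick_transaction());
    return 0;
}
```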
11. Scalability Successes
- Single Site Clusters
- Billions of transactions per day
- Tera-Ops and Peta-Bytes (10k-node clusters)
- Micro-dollar/transaction
- Hardware and software advances
- TPC and Sort examples (2x/year)
- Many other examples
12. Progress since Jan 99: Running out of gas?
- 50% better peak perf (not 2x)
- 2x better price/performance
- At a cost ceiling: systems cost $7M-$13M
- June 98 result was a hero effort (off-scale good!) (Compaq/Alpha/Oracle: 96 cpu, 8-node cluster, 102,542 tpmC @ $139/tpmC, 5/5/98)
Outa gas?
13. 2/17/00: Back on schedule!!
- First proof point of commoditized scale-out
- 1.7x better performance, 3x better price/performance: $4M vs $7M-$13M
- Much more to do, but a great start!
14. Sort benchmark records
Penny sort:
- 4.5 GB (45 M records) in 886 seconds on a $1010 Win2K/Intel system. HMsort: doc (74KB), pdf (32KB). Brad Helmkamp, Keith McCready, Stenograph LLC
Minute sort:
- 7.6 GB in 60 seconds. Ordinal Nsort, SGI 32-cpu Origin, IRIX
- 21.8 GB (218 M records) in 56.51 sec. NOW HPVMsort, 64 nodes, WinNT: pdf (170KB). Luis Rivera, Xianan Zhang, Andrew Chien, UCSD
TeraByte sort:
- 49 minutes. David Cossock, Sam Fineberg, Pankaj Mehra, John Peck. 68x2 Compaq Tandem, Sandia Labs
- 1057 seconds. SPsort: 1952-node SP cluster, 2168 disks. Jim Wyllie: SPsort.pdf (80KB)
Datamation sort:
- 1 M records in 0.998 seconds: doc (703KB) or pdf (50KB). Mitsubishi DIAPRISM Hardware Sorter with HP 4 x 550MHz Xeon PC server, 32 SCSI disks, Windows NT4. Shinsuke Azuma, Takao Sakuma, Tetsuya Takeo, Takaaki Ando, Kenji Shirai, Mitsubishi Electric Corp.
15. What's a Balanced System?
[Diagram: a system bus and two PCI buses]
16. Rules of Thumb in Data Engineering
- Moore's law -> an address bit per 18 months.
- Storage grows 100x/decade (except 1000x last decade!)
- Disk data of 10 years ago now fits in RAM (iso-price).
- Device bandwidth grows 10x/decade, so need parallelism.
- RAM:disk:tape price is 1:10:30, going to 1:10:10.
- Amdahl's speedup law: S/(S+P) (see the sketch after this list).
- Amdahl's IO law: a bit of IO per instruction/second (a TBps per 10 Tops! 50,000 disks per 10 teraOps: 100 M dollars)
- Amdahl's memory law: a byte per instruction/second (going to 10) (1 TB RAM per TOP: 1 TeraDollar)
- PetaOps anyone?
- Gilder's law: aggregate bandwidth doubles every 8 months.
- 5-minute rule: cache disk data that is reused within 5 minutes.
- Web rule: cache everything!
- http://research.microsoft.com/gray/papers/MS_TR_99_100_Rules_of_Thumb_in_Data_Engineering.doc
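As a quick illustration of the Amdahl speedup bullet above: with serial work S and parallel work P, the speedup on N processors is (S+P)/(S+P/N), bounded by (S+P)/S. The 10% serial fraction and the processor counts below are made-up numbers, not figures from the talk.

```c
/* Amdahl's law sketch: speedup on N processors for a job with
 * serial part S and parallel part P; the limit is (S+P)/S. */
#include <stdio.h>

static double amdahl_speedup(double s, double p, int n)
{
    return (s + p) / (s + p / n);
}

int main(void)
{
    double s = 0.1, p = 0.9;                 /* 10% serial, 90% parallel */
    int procs[] = { 2, 8, 64, 1024 };
    for (int i = 0; i < 4; i++)
        printf("N=%4d  speedup=%5.1f  (limit %.1f)\n",
               procs[i], amdahl_speedup(s, p, procs[i]), (s + p) / s);
    return 0;
}
```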
17. Cheap Storage
- Disks are getting cheap
- $7k/TB: disks (25 x 40 GB disks @ $230 each)
18. Cheap Storage or Balanced System
- Low-cost storage (2 x $1.5k servers): $10K/TB = 2 x ($1K system + 8 x 70GB disks + 100Mb Ethernet)
- Balanced server ($9k / 0.5 TB)
  - 2 x 800MHz ($2k)
  - 256 MB ($500)
  - 8 x 73 GB drives ($4K)
  - Gbps Ethernet + switch ($1.5k)
  - $18k/TB, $36K/RAIDED TB
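For reference on the balanced-server numbers: 8 x 73 GB ≈ 0.58 TB, so roughly $9k per half-TB server works out to about $18k per raw TB, and about double that (≈$36K/TB) if the RAID doubles the raw capacity needed (mirroring).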
19. 160 GB, $2k (now); 300 GB by year end
- 4 x 40 GB IDE (2 hot-pluggable)
- ($1,100)
- SCSI-IDE bridge
- 200k
- Box
- 500 MHz cpu
- 256 MB SRAM
- Fan, power, Enet
- $700
- Or 8 disks/box: 600 GB for $3K (or 300 GB RAID)
20. Hot-Swap Drives for Archive or Data Interchange
- 25 MBps write (so can write N x 74 GB in 3 hours)
- 74 GB/overnite = N x 2 MB/second @ $19.95/nite
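The per-drive figure follows from the overnight window: assuming roughly a 10-hour delivery window, 74 GB / 36,000 s ≈ 2 MB/s sustained, so N shipped drives are worth about N x 2 MB/second of bandwidth.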
21. Doing Studies of IO Bandwidth
- SCSI vs. IDE bandwidth
- 15-30 MBps sequential (see the read-bandwidth sketch after this list)
- SCSI: 10k rpm, 110 kaps @ $600
- IDE: 7.2k rpm, 80 kaps @ $250
- Get 2 disks for the price of 1
- More bandwidth for reads
- RAID
- $10K/RAIDed TB by 2001
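For the sequential-bandwidth numbers above, here is a minimal sequential-read sketch; the file path and 1 MB request size are illustrative assumptions, and a real study would use a file larger than RAM (or unbuffered IO) so the page cache is not what gets measured.

```c
/* Minimal sequential-read bandwidth check: stream a file in 1 MB
 * requests and report MB per second of elapsed time. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    const char *path = (argc > 1) ? argv[1] : "testfile.dat";
    const size_t req = 1 << 20;                /* 1 MB per read() */
    char *buf = malloc(req);
    int fd = open(path, O_RDONLY);
    if (fd < 0 || !buf) { perror("open"); return 1; }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    long long total = 0;
    ssize_t n;
    while ((n = read(fd, buf, req)) > 0)       /* stream the whole file */
        total += n;
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    if (secs <= 0) secs = 1e-9;
    printf("%lld bytes in %.2f s = %.1f MBps\n", total, secs, total / secs / 1e6);
    close(fd); free(buf);
    return 0;
}
```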
22. What I have been Doing
- Peta Bumps
- $10k/TB
- Scaleable Computing
- Sloan Digital Sky Survey
23. The Sloan Digital Sky Survey
A project run by the Astrophysical Research Consortium (ARC):
- The University of Chicago
- Princeton University
- The Johns Hopkins University
- The University of Washington
- Fermi National Accelerator Laboratory
- US Naval Observatory
- The Japanese Participation Group
- The Institute for Advanced Study
SLOAN Foundation, NSF, DOE, NASA
Goal: to create a detailed multicolor map of the Northern Sky over 5 years, with a budget of approximately $80M
Data size: 40 TB raw, 1 TB processed
24. Scientific Motivation
- Create the ultimate map of the Universe: the Cosmic Genome Project!
- Study the distribution of galaxies: What is the origin of fluctuations? What is the topology of the distribution?
- Measure the global properties of the Universe: How much dark matter is there?
- Local census of the galaxy population: How did galaxies form?
- Find the most distant objects in the Universe: What are the highest quasar redshifts?
25. First Light Images
Telescope first light: May 9th, 1998
Equatorial scans
26. The First Stripes
Camera: 5-color imaging of >100 square degrees
Multiple scans across the same fields
Photometric limits as expected
27. SDSS Data Flow
28. SDSS Data Products
- Object catalog: 400 GB, parameters of >10^8 objects
- Redshift catalog: 1 GB, parameters of 10^6 objects
- Atlas images: 1.5 TB, 5-color cutouts of >10^8 objects
- Spectra: 60 GB, in a one-dimensional form
- Derived catalogs: 20 GB, clusters, QSO absorption lines
- 4x4 pixel all-sky map: 60 GB, heavily compressed
All raw data saved in a tape vault at Fermilab
29. Distributed Implementation
[Architecture diagram: a User Interface and Analysis Engine in front of a Master SX Engine with an Objectivity Federation; the Master fans out to Slave nodes, each running Objectivity over RAID storage]
30. What We Have Been Doing
- Helping move the data to SQL
- Database design
- Data loading
- Experimenting with queries on a 4 M object DB
- 20 questions, like "find gravitational lens candidates"
- Queries use parallelism; most run in a few seconds (auto-parallel)
- Some run in hours (neighbors within 1 arcsec; see the geometry sketch after this list)
- EASY to ask questions.
- Helping with an outreach website: SkyServer
- Personal goal: try data-mining techniques to re-discover Astronomy
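To make the 1-arcsec neighbor test concrete, here is a minimal sketch of the spherical geometry behind it; the helper name and the sample coordinates are made up for illustration, and this is not the SQL or SDSS code itself.

```c
/* Hypothetical helper: angular separation of two objects given
 * (ra, dec) in degrees, using the haversine form, which stays
 * accurate at arcsecond scales. */
#include <math.h>
#include <stdio.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif
#define DEG2RAD (M_PI / 180.0)

/* returns the separation in arcseconds */
static double separation_arcsec(double ra1, double dec1, double ra2, double dec2)
{
    double dra  = (ra2 - ra1) * DEG2RAD;
    double ddec = (dec2 - dec1) * DEG2RAD;
    double a = sin(ddec / 2) * sin(ddec / 2) +
               cos(dec1 * DEG2RAD) * cos(dec2 * DEG2RAD) *
               sin(dra / 2) * sin(dra / 2);
    return 2.0 * asin(sqrt(a)) / DEG2RAD * 3600.0;
}

int main(void)
{
    /* two made-up positions about 0.7 arcsec apart in declination */
    double sep = separation_arcsec(180.0, 2.0, 180.0, 2.0 + 0.7 / 3600.0);
    printf("separation = %.3f arcsec -> neighbors within 1 arcsec: %s\n",
           sep, sep < 1.0 ? "yes" : "no");
    return 0;
}
```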
31. What I have been Doing
- Peta Bumps
- $10k/TB
- Scaleable Computing
- Sloan Digital Sky Survey