Building Peta Byte Data Stores - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Building Peta Byte Data Stores


1
Building Peta Byte Data Stores
  • Jim Gray
  • Microsoft Research
  • Research.Microsoft.com/Gray

2
The Asilomar Report on Database Research
Phil Bernstein, Michael Brodie, Stefano Ceri, David DeWitt, Mike Franklin, Hector Garcia-Molina, Jim Gray, Jerry Held, Joe Hellerstein, H. V. Jagadish, Michael Lesk, Dave Maier, Jeff Naughton, Hamid Pirahesh, Mike Stonebraker, and Jeff Ullman. September 1998.
  • the field needs to radically broaden its
    research focus to attack the issues of capturing,
    storing, analyzing, and presenting the vast array
    of online data.
  • -- broadening the definition of database
    management to embrace all the content of the Web
    and other online data stores, and rethinking our
    fundamental assumptions in light of technology
    shifts.
  • encouraging more speculative and long-range
    work, moving conferences to a poster format, and
    publishing all research literature on the Web.
  • http://research.microsoft.com/gray/Asilomar_DB_98.html

3
So, how are we doing?
  • Capture, store, analyze, present terabytes?
  • Making web data accessible?
  • Publishing on the web (CoRR?)
  • Posters-Workshops vs Conferences-Journals?

4
Outline
  • Technology
  • $1M/PB: store everything online (twice!)
  • End-to-end high-speed networks
  • Gigabit to the desktop
  • So You can store everything,
  • Anywhere in the world
  • Online everywhere
  • Research driven by apps
  • TerraServer
  • National Virtual Astronomy Observatory.

5
Reality Check
  • Good news
  • In the limit, processing, storage, and network are free
  • Processing and network are infinitely fast
  • Bad news
  • Most of us live in the present.
  • People are getting more expensive. Management/programming cost exceeds hardware cost.
  • Speed of light not improving.
  • WAN prices have not changed much in last 8 years.

6
How Much Information Is There?
(Figure: scale of recorded information - Kilo, Mega, Giga, Tera, Peta, Exa, Zetta, Yotta; everything recorded.)
  • Soon everything can be recorded and indexed
  • Most data will never be seen by humans
  • Precious resource: human attention. Auto-summarization and auto-search are the key technology. www.lesk.com/mlesk/ksg97/ksg.html

(Figure: examples on the scale - a book, a photo, a movie, all LoC books (words), all books multimedia. Prefixes: 10^-24 yocto, 10^-21 zepto, 10^-18 atto, 10^-15 femto, 10^-12 pico, 10^-9 nano, 10^-6 micro, 10^-3 milli.)
7
Trends: ops/s/$ Had Three Growth Phases
  • 1890-1945
  • Mechanical
  • Relay
  • 7-year doubling
  • 1945-1985
  • Tube, transistor,..
  • 2.3 year doubling
  • 1985-2000
  • Microprocessor
  • 1.0 year doubling

8
Storage capacity beating Moore's law
  • $4k/TB today (raw disk)

9
Cheap Storage and/or Balanced System
  • Low cost storage (2 x $3k servers): ~$6K/TB
    2 x (800 MHz, 256 MB, 8 x 80 GB disks, 100 MbE)
  • Balanced server ($5k / 0.64 TB)
  • 2 x 800 MHz ($2k)
  • 512 MB
  • 8 x 80 GB drives ($2.4K)
  • Gbps Ethernet switch ($500/port)
  • ~$10k/TB, ~$20K per RAIDed TB (rough check below)
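A rough back-of-envelope check on the balanced-server figure (using the reconstructed dollar amounts above): the parts sum to about $5k, and

$\$5\mathrm{k} \div 0.64\ \mathrm{TB} \approx \$8\mathrm{k/TB}$

so with the switch port included the raw cost is roughly $10k/TB, and mirroring the data doubles it to about $20K per protected TB.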

10
Hot Swap Drives for Archive or Data Interchange
  • 35 MBps write (so can write N x 80 GB in 40 minutes; arithmetic below)
  • 80 GB overnight
  • N x 3 MB/second
  • @ $19.95/night
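A rough check of the timings above from the stated drive rates:

$80\ \mathrm{GB} \div 35\ \mathrm{MB/s} \approx 2300\ \mathrm{s} \approx 38\ \mathrm{min}$ per drive (the "40 minutes" above), and at 3 MB/s, $80\ \mathrm{GB} \div 3\ \mathrm{MB/s} \approx 7.4\ \mathrm{h}$, i.e. overnight.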

11
The Absurd Disk
  • 2.5 hr scan time (poor sequential access)
  • 1 access per second / 5 GB (VERY cold data)
  • It's a tape!

(Figure: a hypothetical 1 TB drive at 100 MB/s and 200 Kaps.)
12
Disk vs Tape
  • Disk
  • 80 GB
  • 35 MBps
  • 5 ms seek time
  • 3 ms rotate latency
  • $4/GB for drive, $3/GB for controllers/cabinet
  • 4 TB/rack
  • 1 hour scan
  • Tape
  • 40 GB
  • 10 MBps
  • 10 sec pick time
  • 30-120 second seek time
  • $2/GB for media, $8/GB for drive/library
  • 10 TB/rack
  • 1 week scan

(Guesstimates: CERN: 200 TB, 3480 tapes, 2 col = 50 GB; rack = 1 TB, 12 drives.)
The price advantage of tape is narrowing, and the performance advantage of disk is growing. At $10K/TB, disk is competitive with nearline tape.
13
It's Hard to Archive a Petabyte: It takes a LONG time to restore it.
  • At 1 GBps it takes 12 days! (arithmetic below)
  • Store it in two (or more) places online (on
    disk?). A geo-plex
  • Scrub it continuously (look for errors)
  • On failure,
  • use other copy until failure repaired,
  • refresh lost copy from safe copy.
  • Can organize the two copies differently
    (e.g. one by time, one by space)
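The 12-day figure is just the stated numbers multiplied out:

$1\ \mathrm{PB} \div 1\ \mathrm{GB/s} = 10^{6}\ \mathrm{s} \approx 11.6\ \mathrm{days} \approx 12\ \mathrm{days}$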

14
Next step in the Evolution
  • Disks become supercomputers
  • Controller will have 1 bips (billion instructions/sec), 1 GB RAM, 1 GBps net
  • And a disk arm.
  • Disks will run full-blown app/web/db/os stack
  • Distributed computing
  • Processors migrate to transducers.

15
Terabyte (Petabyte) Processing Requires Parallelism
  • Parallelism: use many little devices in parallel

16
Parallelism Must Be Automatic
  • There are thousands of MPI programmers.
  • There are hundreds-of-millions of people using
    parallel database search.
  • Parallel programming is HARD!
  • Find design patterns and automate them.
  • Data search/mining has parallel design patterns.

17
Gilder's Law: 3x bandwidth/year for 25 more years
  • Today
  • 10 Gbps per channel
  • 4 channels per fiber: 40 Gbps
  • 32 fibers/bundle: 1.2 Tbps/bundle (arithmetic below)
  • In lab: 3 Tbps/fiber (400 x WDM)
  • In theory: 25 Tbps per fiber
  • 1 Tbps: USA 1996 WAN bisection bandwidth
  • Aggregate bandwidth doubles every 8 months!

(Figure: 1 fiber = 25 Tbps.)
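The per-bundle figure above is the per-channel rate multiplied out (a rough check):

$10\ \mathrm{Gbps} \times 4\ \mathrm{channels} = 40\ \mathrm{Gbps}$ per fiber, and $40\ \mathrm{Gbps} \times 32\ \mathrm{fibers} \approx 1.3\ \mathrm{Tbps}$ per bundle, i.e. the ~1.2 Tbps quoted.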
18
Sense of scale
(Figure: 300 MBps - OC48, G2, or memcpy().)
  • How fat is your pipe?
  • Fattest pipe on MS campus is the WAN!

(Figure: 20 MBps - disk / ATM / OC3; 90 MBps - PCI; 94 MBps - coast to coast.)
19
(Map: Redmond/Seattle WA to San Francisco CA, Arlington VA, and New York - 5626 km, 10 hops. Participants: Information Sciences Institute, Microsoft, Qwest, University of Washington, Pacific Northwest Gigapop, HSCC (high speed connectivity consortium), DARPA.)
20
Outline
  • Technology
  • $1M/PB: store everything online (twice!)
  • End-to-end high-speed networks
  • Gigabit to the desktop
  • So You can store everything,
  • Anywhere in the world
  • Online everywhere
  • Research driven by apps
  • TerraServer
  • National Virtual Astronomy Observatory.

21
Interesting Apps
  • EOS/DIS
  • TerraServer
  • Sloan Digital Sky Survey

Kilo 10^3, Mega 10^6, Giga 10^9, Tera 10^12 (today, we are here), Peta 10^15, Exa 10^18
22
The Challenge -- EOS/DIS
  • Antarctica is melting -- 77% of fresh water liberated
  • sea level rises 70 meters
  • Chico and Memphis are beach-front property
  • New York, Washington, SF, LA, London, Paris
  • Let's study it! Mission to Planet Earth
  • EOS: Earth Observing System ($17B → $10B)
  • 50 instruments on 10 satellites 1999-2003
  • Landsat (added later)
  • EOS DIS: Data Information System
  • 3-5 MB/s raw, 30-50 MB/s processed
  • 4 TB/day (arithmetic below)
  • 15 PB by year 2007
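A rough check on the volumes quoted above, using the stated processed rate:

$50\ \mathrm{MB/s} \times 86{,}400\ \mathrm{s/day} \approx 4.3\ \mathrm{TB/day} \approx 1.5\ \mathrm{PB/year}$, so on the order of a decade of operation accumulates the quoted 15 PB.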

23
The Process Flow
  • Data arrives and is pre-processed.
  • instrument data is calibrated, gridded, and averaged
  • Geophysical data is derived
  • Users ask for stored data OR to analyze and
    combine data.
  • Can make the pull-push split dynamically

Pull Processing
Push Processing
Other Data
24
Key Architecture Features
  • 2N data center design
  • Scaleable OR-DBMS
  • Emphasize Pull vs Push processing
  • Storage hierarchy
  • Data Pump
  • Just in time acquisition

25
2N data center design
  • duplex the archive (for fault tolerance)
  • let anyone build an extract (the N)
  • Partition data by time and by space (store 2 or 4
    ways).
  • Each partition is a free-standing OR-DBMS (similar to Tandem, Teradata designs).
  • Clients and Partitions interact via standard
    protocols
  • HTTP + XML

26
Data Pump
  • Some queries require reading ALL the data (for
    reprocessing)
  • Each Data Center scans ALL the data every 2 days.
  • Data rate: 10 PB/day = 10 TB/node/day = 120 MB/s per node (arithmetic below)
  • Compute on demand for small jobs
  • less than 100 M disk accesses
  • less than 100 TeraOps.
  • (less than 30 minute response time)
  • For BIG JOBS, scan the entire 15 PB database
  • Queries (and extracts) snoop this data pump.
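The per-node rate quoted above is the daily per-node volume spread over a day (it also implies on the order of 1,000 nodes for the 10 PB/day aggregate):

$10\ \mathrm{TB} \div 86{,}400\ \mathrm{s} \approx 116\ \mathrm{MB/s} \approx 120\ \mathrm{MB/s}$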

27
Just-in-time acquisition 30
  • Hardware prices decline 20%-40%/year
  • So buy at the last moment
  • Buy the best commodity product that day
  • Depreciate over 3 years so that the facility is fresh.
  • (After 3 years, cost is 23% of original; arithmetic below.) At a 60% decline, the cost peak is $10M.
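The 23% figure is consistent with the 40%/year decline quoted above:

$(1 - 0.40)^{3} = 0.6^{3} \approx 0.22$, so after three years the same capacity costs roughly 22-23% of its original price.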

(Chart: EOS DIS disk storage size and cost, 1994-2008, assuming a 40% price decline/year; data need in TB vs. storage cost in $M; 2 PB @ $100M.)
28
Problems
  • Management (and HSM)
  • Design and Meta-data
  • Ingest
  • Data discovery, search, and analysis
  • Auto Parallelism
  • reorg-reprocess

29
What this system taught me
  • Traditional storage metrics
  • KAPS: KB objects accessed per second
  • $/GB: storage cost
  • New metrics
  • MAPS: megabyte objects accessed per second
  • SCANS: time to scan the archive
  • Admin cost dominates (!!)
  • Auto parallelism is essential.

30
Outline
  • Technology
  • $1M/PB: store everything online (twice!)
  • End-to-end high-speed networks
  • Gigabit to the desktop
  • So You can store everything,
  • Anywhere in the world
  • Online everywhere
  • Research driven by apps
  • TerraServer
  • National Virtual Astronomy Observatory.

31
Microsoft TerraServer http://TerraServer.Microsoft.com/
  • Build a multi-TB SQL Server database
  • Data must be
  • 1 TB
  • Unencumbered
  • Interesting to everyone everywhere
  • And not offensive to anyone anywhere
  • Loaded
  • 1.5 M place names from Encarta World Atlas
  • 7 M sq km of USGS DOQs (1-meter resolution)
  • 10 M sq km of USGS topo maps (2 m)
  • 1 M sq km from the Russian Space Agency (2 m)
  • On the web (world's largest atlas)
  • Sell images with commerce server.

32
Background
  • Earth is 500 tera-square-meters (Tm²)
  • USA is 10 Tm²
  • 100 Tm² of land lies between 70°N and 70°S
  • We have pictures of 9% of it
  • 7 Tm² from USGS
  • 1 Tm² from the Russian Space Agency
  • Compress 5:1 (JPEG) to 1.5 TB.
  • Slice into 10 KB chunks (200x200 pixels); tile arithmetic below
  • Store chunks in DB
  • Navigate with
  • Encarta Atlas
  • globe
  • gazetteer
  • Someday
  • multi-spectral image
  • of everywhere
  • once a day / hour
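A rough tile count implied by the compressed size and chunk size above:

$1.5\ \mathrm{TB} \div 10\ \mathrm{KB/tile} \approx 1.5 \times 10^{8}$ tiles, each stored as a row in the database.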

33
TerraServer 4.0 Configuration
3 Active Database Servers
SQL\Inst1 - Topo Relief Data
SQL\Inst2 Aerial Imagery
SQL\Inst3 Aerial Imagery
Logical Volume Structure
One rack per database. All volumes triple mirrored (3x). MetaData on 15k rpm 18.2 GB drives; image data on 10k rpm 72.8 GB drives. 2 spare volumes allocated per cluster. 6 additional 339 GB volumes to be added by year end (2 per DB server).
34
TerraServer 4.0 Schema
35
File System Config
  • Use StorageWorks to form 28 RAID5 sets. Each RAID set has 11 disks (16 spare drives).
  • Use NTFS to form 4 x 595 GB NT volumes, each striped over 7 RAID sets on 7 controllers.
  • DB is a File Group of 80 x 20,000 MB files (1.5 TB); arithmetic below.
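Checking the configuration arithmetic from the figures above:

$80 \times 20{,}000\ \mathrm{MB} = 1.6\ \mathrm{TB}$ of database files (the ~1.5 TB quoted); $28 \times 11 = 308$ data disks plus 16 spares is 324 drives; $4 \times 595\ \mathrm{GB} \approx 2.4\ \mathrm{TB}$ of formatted volume space.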

36
BAD OLD Load
37
Load Process
(Diagram: load process. TerraCutter reads image files at the Executive Briefing Center, Redmond WA; TerraScale reads 4 images and writes 1, feeding three 2 TB databases at the Internet Data Center, Tukwila WA, over the corporate network.)
38
After a Year
(Chart: TerraServer daily traffic, June 22, 1998 through June 22, 1999 - sessions, hit count, page views, DB queries, and images; 0 to 30M per day.)
  • 15 TB of data (raw), 3B records
  • 2.3 billion hits
  • 2.0 billion DB queries
  • 1.7 billion images sent (2 TB of download)
  • 368 million page views
  • 99.93% DB availability
  • 4th design now online
  • Built and operated by a team of 4 people

39
TerraServer Activity
40
TerraServer.Microsoft.NET A Web Service
Before .NET
With .NET
41
TerraServer Recent/Current Effort
  • Added USGS Topographic maps (4 TB)
  • High availability (4 node cluster with failover)
  • Integrated with Encarta Online
  • The other 25% of the US DOQs (photos)
  • Adding digital elevation maps
  • Open architecture: publish SOAP interfaces.
  • Adding multi-layer maps (with UC Berkeley)
  • Geo-Spatial extension to SQL Server

42
Thank You!
43
Outline
  • Technology
  • $1M/PB: store everything online (twice!)
  • End-to-end high-speed networks
  • Gigabit to the desktop
  • So You can store everything,
  • Anywhere in the world
  • Online everywhere
  • Research driven by apps
  • TerraServer
  • National Virtual Astronomy Observatory.

44
Astronomy is Changing (and so are other sciences)
  • Astronomers have a few PB
  • Doubles every 2 years.
  • Data is public after 2 years.
  • So everyone has ½ the data
  • Some people have 5% more private data
  • So, it's a nearly level playing field
  • Most accessible data is public.

45
(inter) National Virtual Observatory
  • Almost all astronomy datasets will be online
  • Some are big (>> 10 TB)
  • Total is a few Petabytes
  • Bigger datasets coming
  • Data is public
  • Scientists can mine these datasets
  • Computer Science challenge: organize these datasets and provide easy access to them.

46
The Sloan Digital Sky Survey (slides by Alex Szalay)
A project run by the Astrophysical Research Consortium (ARC): The University of Chicago, Princeton University, The Johns Hopkins University, The University of Washington, Fermi National Accelerator Laboratory, US Naval Observatory, The Japanese Participation Group, The Institute for Advanced Study. Funded by the SLOAN Foundation, NSF, DOE, NASA.
Goal: to create a detailed multicolor map of the Northern Sky over 5 years, with a budget of approximately $80M.
Data size: 40 TB raw, 1 TB processed.
47
Features of the SDSS
Special 2.5m telescope, located at Apache Point, NM: 3-degree field of view, zero-distortion focal plane.
Two surveys in one: photometric survey in 5 bands; spectroscopic redshift survey.
Huge CCD mosaic: 30 CCDs 2K x 2K (imaging), 22 CCDs 2K x 400 (astrometry).
Two high-resolution spectrographs: 2 x 320 fibers with 3 arcsec diameter; R ~ 2000 resolution with 4096 pixels; spectral coverage from 3900 Å to 9200 Å.
Automated data reduction: over 70 man-years of development effort (Fermilab and collaboration scientists).
Very high data volume: expect over 40 TB of raw data, about 3 TB of processed catalogs. Data made available to the public.
48
Apache Point Observatory
Located in New Mexico, near White Sands National Monument.
Special 2.5m telescope: 3-degree field of view, zero-distortion focal plane, wind screen moved separately.
49
Scientific Motivation
Create the ultimate map of the Universe → the Cosmic Genome Project!
Study the distribution of galaxies → What is the origin of fluctuations? What is the topology of the distribution?
Measure the global properties of the Universe → How much dark matter is there?
Local census of the galaxy population → How did galaxies form?
Find the most distant objects in the Universe → What are the highest quasar redshifts?
50
Cosmology Primer
The Universe is expanding: the galaxies move away from us and spectral lines are redshifted.
v = H0 r (Hubble's law)
The fate of the universe depends on the balance between gravity and the expansion velocity:
Ω = ρ/ρ_critical; if Ω < 1, the Universe expands forever (if ρ > ρ_critical, it recollapses).
Most of the mass in the Universe is dark matter, and it may be cold (CDM).
P(k): the power spectrum.
The spatial distribution of galaxies is correlated, due to small ripples in the early Universe.
51
The Naught Problem
What are the global parameters of the Universe?
H0, the Hubble constant: 55-75 km/s/Mpc
Ω0, the density parameter: 0.25-1
Λ0, the cosmological constant: 0-0.7
Their values are still quite uncertain today... Goal: measure these parameters with an accuracy of a few percent.
High Precision Cosmology!
52
The Cosmic Genome Project
The SDSS will create the ultimate map of the Universe, with much more detail than any other measurement before.
53
Area and Size of Redshift Surveys
54
The Topology of the Local Universe
Measure the Topology of the Universe
Does it consist of walls and voids
or is it randomly distributed?
55
Finding the Most Distant Objects
Intermediate and high redshift QSOs
Multicolor selection function.
Luminosity functions and spatial clustering.
High redshift QSOs (z > 5).
56
The Photometric Survey
Northern Galactic Cap: 5 broad-band filters (u', g', r', i', z'); limiting magnitudes (22.3, 23.3, 23.1, 22.3, 20.8); drift scan of 10,000 square degrees; 55 sec exposure time; 40 TB of raw imaging data → pipeline → 100,000,000 galaxies + 50,000,000 stars; calibration to 2% at r' = 19.8; only done in the best seeing (20 nights/yr); pixel size is 0.4 arcsec, astrometric precision is 60 milliarcsec.
Southern Galactic Cap: multiple scans (> 30 times) of the same stripe.
Continuous data rate of 8 MB/sec.
57
Survey Strategy
Overlapping 2.5-degree-wide stripes. Avoiding the Galactic Plane (dust). Multiple exposures on the three Southern stripes.
58
The Spectroscopic Survey
Measure redshifts of objects → distance.
SDSS Redshift Survey: 1 million galaxies, 100,000 quasars, 100,000 stars.
Two high-throughput spectrographs: spectral range 3900-9200 Å; 640 spectra simultaneously; R ~ 2000 resolution.
Automated reduction of spectra. Very high sampling density and completeness. Objects in other catalogs also targeted.
59
First Light Images
Telescope first light: May 9th, 1998
Equatorial scans
60
The First Stripes
Camera: 5-color imaging of > 100 square degrees. Multiple scans across the same fields. Photometric limits as expected.
61
NGC 6070
62
The First Quasars
Three of the four highest redshift quasars have
been found in the first SDSS test data !
63
SDSS Data Flow
64
Data Processing Pipelines
65
SDSS Data Products
Object catalog: 400 GB - parameters of > 10^8 objects
Redshift catalog: 2 GB - parameters of 10^6 objects
Atlas images: 1.5 TB - 5-color cutouts of > 10^9 objects
Spectra: 60 GB - 10^6 spectra in one-dimensional form
Derived catalogs: 60 GB - clusters, QSO absorption lines
4x4-pixel all-sky map: 1 TB - heavily compressed, 5 x 10^5
All raw data saved in a tape vault at Fermilab
66
Concept of the SDSS Archive
Science Archive (products accessible to users)
Operational Archive (raw + processed data)
67
Parallel Query Implementation
  • Getting 200 MBps/node through SQL today
  • 4 GB/s on 20 node cluster.

(Diagram: a User Interface and Analysis Engine talk to a Master SX Engine, which federates the query across slave DBMS nodes, each with its own RAID storage.)
68
Who will be using the archive?
Power users: sophisticated, with lots of resources; research is centered around the archive data; a moderate number of very intensive queries; mostly statistical, with large output sizes.
General astronomy public: frequent but casual lookup of objects/regions; the archives help their research but are not central to it; a large number of small queries; a lot of cross-identification requests.
Wide public: browsing a Virtual Telescope can have large public appeal; needs special packaging; could be a very large number of requests.
69
How will the data be analyzed?
The data are inherently multidimensional → positions, colors, size, redshift.
Improved classifications result in complex N-dimensional volumes → complex constraints, not ranges.
Spatial relations will be investigated → nearest neighbors, other objects within a radius.
Data mining, finding the needle in the haystack → separate the typical from the rare, recognize patterns in the data.
Output size can be prohibitively large for intermediate files → import output directly into analysis tools.
70
Summary
SDSS combines astronomy, physics, and computer science. It promises to fundamentally change our view of the universe: high-precision cosmology. It will serve as the standard astronomy reference for several decades. The virtual universe can be explored by both scientists and the public. A new paradigm in astronomy.
71
Designing and Mining Multi-Terabyte Astronomy Archives: The Sloan Digital Sky Survey (SDSS)
http://www.sdss.org/
  • Scan 10,000 sq. degrees (50%) of the northern sky.
  • 200,000,000 objects.
  • 100 dimensions.
  • 40 TB of raw data.
  • 1 TB of catalog data.

Alex S. Szalay, Peter J. Kunszt, Ani Thakar (The
Johns Hopkins University)Jim Gray, Don Slutz
(Microsoft Research)Robert J. Brunner (Calif.
Institute of Technology)
72
Astronomical Growth of Collected Data
  • Data gathering rate doubles every 20 months (Moore's Law here too).
  • Several orders of magnitude more data now!
  • SDSS telescope has a 120-million-pixel CCD array
  • 55 second photometric exposure.
  • 8 MB/sec data rate.
  • 0.4 arc-sec pixel size.
  • Also Spectroscopic Survey of 1 million objects.

73
Major Changes in Astronomy
  • Visual observation → photographic plates → massive scans of the sky collecting terabytes.
  • A Practice Scan of the SDSS Telescope Discovered
    3 of the 4 most Distant Quasars!
  • SDSS plus other Surveys will yield a Digital Sky
  • Telescope Quality Data available Online.
  • Spatial Data Mining will find new objects.
  • New research areas - Study Density Fluctuations.

74
Different Kind of Spatial Data
  • All objects lie on the surface of the celestial sphere
  • Position a point by 2 spherical angles (RA, DEC).
  • Positioning by Cartesian x, y, z makes it easier to search within 1 arc-minute (see the sketch below).
  • Hierarchy of spherical triangles for indexing (HTM).
  • SDSS tree is 5 levels deep: 8 x 4^5 = 8192 triangles
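A minimal T-SQL sketch of the Cartesian trick above, in the spirit of the SQL Server experiments later in the talk; the table and column names (Objects, obj_id, ra, dec) are illustrative assumptions, not the SDSS schema:

  -- Convert (ra, dec) in degrees into a unit vector on the sphere.
  SELECT obj_id,
         COS(RADIANS(dec)) * COS(RADIANS(ra)) AS cx,
         COS(RADIANS(dec)) * SIN(RADIANS(ra)) AS cy,
         SIN(RADIANS(dec))                    AS cz
  FROM   Objects
  -- Two objects are within 1 arc-minute of each other exactly when the dot
  -- product of their unit vectors exceeds COS(RADIANS(1.0/60.0)).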

75
Experiment with Relational DBMS
  • See if SQL's good indexing and scanning compensates for poor object support.
  • Leverage Fast/Big/Cheap Commodity Hardware.
  • Ported 40 GB Sample Database (from SDSS Sample
    Scan) to SQL Server 2000
  • Building public web site and data server

76
20 Astronomy Queries
  • Implemented a spatial access extension to SQL (HTM)
  • Implemented the 20 astronomy queries in SQL (see the paper for details).
  • 15M rows, 378 columns, 30 GB. Can scan it in 8 minutes (disk-I/O limited).
  • Many queries run in seconds
  • Create covering indexes on queried columns.
  • Create a Neighbors table listing objects within 1 arc-minute (5 neighbors on average) for spatial joins (sketch below).
  • Install some more disks!
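A sketch of what that pre-computed Neighbors table might look like; the column names follow the lens query on the next slide, while the types and key are assumptions:

  -- Pre-computed pairs of objects that lie within 1 arc-minute of each other.
  CREATE TABLE neighbors (
      UObj_id           BIGINT NOT NULL,  -- an object
      neighbor_UObj_id  BIGINT NOT NULL,  -- an object within 1 arc-minute of it
      PRIMARY KEY (UObj_id, neighbor_UObj_id)
  )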

77
Query to Find Gravitational Lenses
Find all objects within 1 arc-minute of each other whose colors are very similar (the colors u-g, g-r, and r-i agree to within 0.05 mag).
1 arc-minute
78
SQL Query to Find Gravitational Lenses
  • select count(*)
    from sxTag T, sxTag U, neighbors N
    where T.UObj_id = N.UObj_id
      and U.UObj_id = N.neighbor_UObj_id
      and N.UObj_id < N.neighbor_UObj_id        -- no dups
      and T.u>0 and T.g>0 and T.r>0 and T.i>0
      and U.u>0 and U.g>0 and U.r>0 and U.i>0
      and ABS((T.u-T.g)-(U.u-U.g)) < 0.05       -- similar color
      and ABS((T.g-T.r)-(U.g-U.r)) < 0.05
      and ABS((T.r-T.i)-(U.r-U.i)) < 0.05
  • Finds 5223 objects, executes in 6 minutes.

79
SQL Results so far.
  • Have run 17 of the 20 queries so far.
  • Most queries are I/O bound, scanning at 80 MB/sec on 4 disks in 6 minutes (at the PCI bus limit)
  • Covering indexes reduce execution to < 30 secs.
  • Common to get grid distributions:
    select convert(int, ra*30)/30.0,   -- ra bucket
           convert(int, dec*30)/30.0,  -- dec bucket
           count(*)                    -- bucket count
    from Galaxies
    where (u-g) > 1 and r < 21.5
    group by convert(int, ra*30)/30.0, convert(int, dec*30)/30.0

80
Distribution of Galaxies
81
Outline
  • Technology
  • $1M/PB: store everything online (twice!)
  • End-to-end high-speed networks
  • Gigabit to the desktop
  • So You can store everything,
  • Anywhere in the world
  • Online everywhere
  • Research driven by apps
  • TerraServer
  • National Virtual Astronomy Observatory.