Title: Features of the SDSS
1Features of the SDSS
- Special 2.5m telescope, at Apache Point, NM
- 3 degree field of view
- Zero distortion focal plane
- Two surveys in one
- Photometric survey in 5 bands: 200 million objects
- Spectroscopic redshift survey: 1 million distances
- Automated data reduction
- Over 120 man-years of development (Fermilab + collaboration scientists)
- Very high data volume
- Expect over 40 TB of raw data
- About 2 TB processed catalogs
- Data made available to the public
2Data Processing Pipelines
3SDSS Data Products
- Object catalog: 500 GB, parameters of >10^8 objects
- Redshift catalog: 1 GB, parameters of 10^6 objects
- Atlas images: 1500 GB, 5 color cutouts of >10^8 objects
- Spectra: 60 GB, in a one-dimensional form
- Derived catalogs: 20 GB, clusters, QSO absorption lines
- 4x4 pixel all-sky map: 60 GB, heavily compressed
- Corrected frames: 15 TB
- All raw data (40 TB) saved at Fermilab
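As a quick consistency check on the sizes listed above, the following sketch adds them up. It assumes 1 TB = 1000 GB, and the dictionary and function names are illustrative only. Leaving out the corrected frames, the products total roughly 2 TB, in line with the "about 2 TB processed catalogs" figure quoted earlier.

```python
# Sizes as listed on the slide, in gigabytes.
# Assumption: 1 TB = 1000 GB (the slide does not say which convention is used).
PRODUCT_SIZES_GB = {
    "Object catalog": 500,
    "Redshift catalog": 1,
    "Atlas images": 1500,
    "Spectra": 60,
    "Derived catalogs": 20,
    "4x4 pixel all-sky map": 60,
    "Corrected frames": 15_000,
}


def total_gb(sizes, exclude=()):
    """Sum the sizes, skipping any product names listed in `exclude`."""
    return sum(size for name, size in sizes.items() if name not in exclude)


# Everything except the pixel-level corrected frames.
catalog_like_gb = total_gb(PRODUCT_SIZES_GB, exclude=("Corrected frames",))
# All listed data products, still excluding the 40 TB of raw data.
all_products_gb = total_gb(PRODUCT_SIZES_GB)

print(f"Without corrected frames: {catalog_like_gb} GB")
print(f"All listed products:      {all_products_gb} GB")
```

The corrected frames dominate the total, which is why they are treated separately from the catalogs here.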
4Accessing the Data
- Few fixed access patterns
- one cannot build indices for all possible queries
- worst case scenario is linear scan of the whole table
- Increasingly large differences between
- Random access
- Sequential I/O
- Often much faster to scan than to seek
- Good layout of data => more sequential I/O
- Geometric indexing, partitioning in storage
- Using Objectivity/DB
- Ported to MS SQL Server (w. Jim Gray)
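The scan-versus-seek point above can be illustrated with a toy cost model. This is a minimal sketch: the disk parameters (10 ms per seek, 50 MB/s sequential throughput, 1 KB records) are assumed values chosen for illustration, not figures from the SDSS archive, and all function names are hypothetical.

```python
def scan_seconds(n_records, record_bytes, throughput_bytes_per_s):
    """Time to read every record sequentially."""
    return n_records * record_bytes / throughput_bytes_per_s


def seek_seconds(n_hits, seek_s, record_bytes, throughput_bytes_per_s):
    """Time to fetch n_hits records, paying one seek plus one transfer each."""
    return n_hits * (seek_s + record_bytes / throughput_bytes_per_s)


def break_even_fraction(seek_s, record_bytes, throughput_bytes_per_s):
    """Fraction of the table at which random access costs as much as a full scan."""
    transfer_s = record_bytes / throughput_bytes_per_s
    return transfer_s / (seek_s + transfer_s)


# Assumed, illustrative disk parameters (not taken from the slides).
SEEK_S = 0.010            # 10 ms per random access
THROUGHPUT = 50e6         # 50 MB/s sequential read
RECORD_BYTES = 1000       # 1 KB per object record
N_RECORDS = 100_000_000   # order of the object catalog, 10^8 rows

full_scan = scan_seconds(N_RECORDS, RECORD_BYTES, THROUGHPUT)
one_percent = seek_seconds(N_RECORDS // 100, SEEK_S, RECORD_BYTES, THROUGHPUT)
fraction = break_even_fraction(SEEK_S, RECORD_BYTES, THROUGHPUT)

print(f"Full scan:             {full_scan:.0f} s")
print(f"Random access to 1%:   {one_percent:.0f} s")
print(f"Break-even fraction:   {fraction:.4%}")
```

With these assumed numbers, a query touching more than about 0.2% of the records is cheaper to answer by scanning. That is the motivation for laying data out so that related objects sit next to each other on disk.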
5SDSS in GriPhyN
- Two Tier2 Nodes (FNAL, JHU)
- testing framework on real data in different scenarios
- FNAL node
- massive reprocessing of images
- full regeneration of catalogs from the images (on disk)
- gravitational lensing, finer morphological classification
- Image coaddition, differencing
- JHU node
- catalog calculations, integrated with database
- tasks require lots of data, can be run in parallel
- various statistical calculations, likelihood analyses
- power spectra, correlation functions, Monte-Carlo
- Public access
- creating virtual data for NVO services (implemented later)
6The SDSS Southern Survey
- Scanning a single stripe on the sky >30 times over
- Coaddition => extra depth
- Differencing => time dimension
- Multiple ways to combine the stripes
- Rerun the pipelines with custom parameters
- Build a new object catalog
- Perform particular science analysis (lensing map)
- On the right timescale to try GriPhyN framework
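Coaddition and differencing, as listed above, can be sketched on toy data. This is an illustration only, not the SDSS pipeline: images are plain lists of pixel values, the exposures are assumed to be already aligned and on the same flux scale, and the depth estimate assumes uncorrelated noise that averages down as 1/sqrt(N). All names are hypothetical.

```python
import math


def coadd(exposures):
    """Pixel-wise mean of equally sized, pre-aligned exposures."""
    n = len(exposures)
    return [sum(pixels) / n for pixels in zip(*exposures)]


def difference(exposure, reference):
    """Pixel-wise difference; non-zero residuals flag variable or moving sources."""
    return [a - b for a, b in zip(exposure, reference)]


def depth_gain_mag(n_exposures):
    """Gain in limiting magnitude, assuming noise drops as 1/sqrt(N)."""
    return 2.5 * math.log10(math.sqrt(n_exposures))


# Three toy exposures of the same three pixels; the middle pixel brightens in the last one.
exposures = [
    [10.0, 12.0, 11.0],
    [12.0, 12.0, 10.0],
    [11.0, 18.0, 12.0],
]

stacked = coadd(exposures)
residual = difference(exposures[2], stacked)

print("Coadd:   ", stacked)
print("Residual:", residual)
print(f"Depth gain for 30 scans: {depth_gain_mag(30):.2f} mag")
```

Under the stated noise assumption, 30 scans of the stripe buy a little under two magnitudes of extra depth, while the residuals supply the time dimension.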
7Large Scale Statistical Analysis
- Galaxy distribution has non-trivial clustering patterns
- Reflects conditions in the early universe
- Spatial statistical tools to be run on object catalog, applying many different cuts to the data
- Spatial power spectrum
- Correlation functions
- These algorithms typically scale as N^2 or N^3 with the number of objects!
- Some of the analyses will partition well (likelihood), others will not (pair counts)
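The quadratic scaling noted above is easiest to see in a brute-force pair count, the basic ingredient of a two-point correlation function. The sketch below is a naive illustration with hypothetical names, not the estimator used in the survey. It visits every pair once and bins pairs by separation.

```python
import math
from itertools import combinations


def pair_counts(points, bin_edges):
    """Histogram of pair separations, visiting all N*(N-1)/2 pairs."""
    counts = [0] * (len(bin_edges) - 1)
    for p, q in combinations(points, 2):
        r = math.dist(p, q)
        for i in range(len(counts)):
            if bin_edges[i] <= r < bin_edges[i + 1]:
                counts[i] += 1
                break
    return counts


def n_pairs(n_objects):
    """Number of distinct pairs among n_objects."""
    return n_objects * (n_objects - 1) // 2


# Four toy points in 3D and three separation bins.
points = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (3, 0, 0)]
edges = [0.0, 1.5, 2.5, 4.0]

print(pair_counts(points, edges))
print(f"Pairs among 10^8 objects: {n_pairs(10**8):.3e}")
```

A pair can straddle any boundary used to split the catalog, so each partition needs data from its neighbours. This is one way to see why pair counts partition less cleanly than per-object likelihood calculations.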
8Trends in Astronomy
- Future dominated by detector improvements
- Moore's Law growth in CCD capabilities
- Gigapixel arrays on the horizon
- Improvements in computing and storage will track growth in data volume
- Investment in software is critical, and growing
Figure: Total area of 3m+ telescopes in the world in m^2, and total number of CCD pixels in megapixels, as a function of time. Growth over 25 years is a factor of 30 in glass, 3000 in pixels.
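The two growth factors quoted above can be turned into doubling times. This is a small sketch that assumes smooth exponential growth over the 25 years, which is an idealization; the function name is illustrative.

```python
import math


def doubling_time_years(growth_factor, span_years):
    """Doubling time implied by a total growth factor, assuming exponential growth."""
    return span_years * math.log(2) / math.log(growth_factor)


SPAN_YEARS = 25
glass = doubling_time_years(30, SPAN_YEARS)      # telescope collecting area
pixels = doubling_time_years(3000, SPAN_YEARS)   # CCD pixel count

print(f"Glass doubles every  {glass:.1f} years")
print(f"Pixels double every  {pixels:.1f} years")
```

Under that assumption the pixel count doubles roughly every two years, comparable to the pace usually associated with Moore's Law, while collecting area doubles only about every five years.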
9VO - The challenges
- Large number of new surveys
- multi-TB in size, 100 million objects or more
- individual archives planned, or under way
- Multi-wavelength view of the sky
- coverage at more than 13 wavelengths within 5 years
- Size of the archived data
- 40,000 square degrees is 2 trillion pixels
- One band: 4 Terabytes
- Multi-wavelength: 10-100 Terabytes
- Time dimension: 10 Petabytes
- Current techniques inadequate
- Scalable hardware/networking requirements
- Transition to the new astronomy
MACHO, 2MASS, DENIS, SDSS, DPOSS, GSC-II, VISTA, COBE, MAP, NVSS, FIRST, GALEX, ROSAT, OGLE, ...
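The archive-size figures quoted for the all-sky case imply a pixel scale and a storage cost per pixel, which the following sketch works out. It assumes square pixels, 1 Terabyte = 10^12 bytes, and no compression; the function names are illustrative.

```python
import math

ARCSEC_PER_DEGREE = 3600


def pixel_scale_arcsec(area_sq_deg, n_pixels):
    """Side length of one (square) pixel in arcseconds."""
    area_sq_arcsec = area_sq_deg * ARCSEC_PER_DEGREE ** 2
    return math.sqrt(area_sq_arcsec / n_pixels)


def bytes_per_pixel(total_bytes, n_pixels):
    """Average storage per pixel for one band."""
    return total_bytes / n_pixels


AREA_SQ_DEG = 40_000   # from the slide
N_PIXELS = 2e12        # 2 trillion pixels, from the slide
ONE_BAND_BYTES = 4e12  # 4 Terabytes, assuming 1 TB = 10^12 bytes

scale = pixel_scale_arcsec(AREA_SQ_DEG, N_PIXELS)
per_pixel = bytes_per_pixel(ONE_BAND_BYTES, N_PIXELS)

print(f"Implied pixel scale: {scale:.2f} arcsec")
print(f"Implied storage:     {per_pixel:.0f} bytes per pixel")
```

The figures are consistent with roughly half-arcsecond pixels stored at two bytes each, so each additional band or epoch at that resolution adds several Terabytes.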