Title: Mining the Sky: The World-Wide Telescope
1 Mining the Sky: The World-Wide Telescope
- Jim Gray
- Microsoft Research
- Collaborating with
- Alex Szalay, Peter Kunszt, Ani Thakar @ JHU
- Robert Brunner, Roy Williams @ Caltech
- George Djorgovski, Julian Bunn @ Caltech
2 Outline
- The revolution in Computational Science
- The Virtual Observatory Concept
- World-Wide Telescope
- The Sloan Digital Sky Survey DB technology
3 Computational Science: The Third Science Branch is Evolving
- In the beginning science was empirical.
- Then theoretical branches evolved.
- Now, we have computational branches.
- Has primarily been simulation
- Growth area: data analysis/visualization of peta-scale instrument data.
- Analysis & visualization tools
- Help both simulation and instruments.
- Are primitive today.
4 Computational Science
- Traditional Empirical Science
- Scientist gathers data by direct observation
- Scientist analyzes data
- Computational Science
- Data captured by instruments, or data generated by simulator
- Processed by software
- Placed in a database / files
- Scientist analyzes database / files
5 Exploring Parameter Space: Manual or Automatic Data Mining
- There is LOTS of data
- People cannot examine most of it.
- Need computers to do analysis.
- Manual or Automatic Exploration
- Manual: person suggests hypothesis, computer checks hypothesis
- Automatic: computer suggests hypothesis, person evaluates significance
- Given an arbitrary parameter space:
- Data Clusters
- Points between Data Clusters
- Isolated Data Clusters
- Isolated Data Groups
- Holes in Data Clusters
- Isolated Points
Nichol et al. 2001. Slide courtesy of and adapted from Robert Brunner @ Caltech.
6 Challenge to Data Miners: Rediscover Astronomy
- Astronomy needs deep understanding of physics.
- But, some was discovered as variable correlation, then explained with physics.
- Famous example: Hertzsprung-Russell Diagram, star luminosity vs color (temperature)
- Challenge 1 (the student test): How much of astronomy can data mining discover?
- Challenge 2 (the Turing test): Can data mining discover NEW correlations?
7 What's needed? (not drawn to scale)
8 Data Mining: Science vs Commerce
- Data in files: FTP a local copy/subset, ASCII or binary.
- Each scientist builds own analysis toolkit
- Analysis is a tcl script of the toolkit on local data.
- Some simple visualization tools: x vs y
- Data in a database
- Standard reports for standard things.
- Report writers for non-standard things
- GUI tools to explore data.
- Decision trees
- Clustering
- Anomaly finders
9 But... some science is hitting a wall: FTP and GREP are not adequate
- You can GREP 1 MB in a second
- You can GREP 1 GB in a minute
- You can GREP 1 TB in 2 days
- You can GREP 1 PB in 3 years.
- Oh!, and 1 PB ~ 10,000 disks
- At some point you need indices to limit search, and parallel data search and analysis
- This is where databases can help
- You can FTP 1 MB in 1 sec
- You can FTP 1 GB / min (~ $1/GB)
- ... 2 days and $1K
- ... 3 years and $1M
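The scaling argument on this slide can be checked with a few lines of arithmetic. The sketch below is an illustration only: it assumes a constant sequential rate of exactly 1 GB per minute (taken from the second bullet) and decimal units, and the helper names are hypothetical. The slide's "2 days" and "3 years" are rounded order-of-magnitude figures, so the computed values come out somewhat smaller, but the conclusion is the same: a single sequential scan stops being practical somewhere between a terabyte and a petabyte, and spreading the scan over many disks brings it back into range.

```python
# Back-of-envelope scan times, assuming a fixed sequential rate.
MB, GB, TB, PB = 10**6, 10**9, 10**12, 10**15
RATE = GB / 60  # assumption: 1 GB per minute, roughly 16.7 MB/s


def scan_seconds(num_bytes, bytes_per_second):
    """Seconds needed to read num_bytes once at a constant rate."""
    return num_bytes / bytes_per_second


def parallel_scan_seconds(num_bytes, bytes_per_second, num_disks):
    """Same scan, with the data spread evenly over num_disks read in parallel."""
    return scan_seconds(num_bytes, bytes_per_second) / num_disks


tb_hours = scan_seconds(TB, RATE) / 3600
pb_years = scan_seconds(PB, RATE) / (3600 * 24 * 365)
# Assumption: the 10,000 disks from the slide are all scanned at once.
pb_parallel_minutes = parallel_scan_seconds(PB, RATE, 10_000) / 60

print(f"1 TB serial scan:   {tb_hours:.1f} hours")
print(f"1 PB serial scan:   {pb_years:.1f} years")
print(f"1 PB on 10k disks:  {pb_parallel_minutes:.0f} minutes")
```

Indices attack the same problem from the other side: instead of reading faster, they reduce the number of bytes that must be read at all.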
10 Why is Science Behind?
- Inertia
- Science started earlier (Fortran, ...)
- Science culture works (no big incentive to change)
- Energy
- Commerce is about profit: better answers translate to better profits
- So companies build tools.
- Impedance Mismatch
- Databases don't accommodate analysis packages
- Scientists' analysis needs to be inside the DBMS.
11 Goal: Easy Data Publication & Access
- Augment FTP with data query: return intelligent data subsets
- Make it easy to
- Publish: record structured data
- Find
- Find data anywhere in the network
- Get the subset you need
- Explore datasets interactively
- Realistic goal
- Make it as easy as publishing/reading web sites today.
12 Web Services: The Key?
- Web SERVER
- Given a URL + parameters
- Returns a web page (often dynamic)
- Web SERVICE
- Given an XML document (SOAP msg)
- Returns an XML document
- Tools make this look like an RPC.
- F(x,y,z) returns (u, v, w)
- Distributed objects for the web.
- naming, discovery, security, ...
- Internet-scale distributed computing
[Diagram: your program sends http to a Web Server and gets back a web page; your program sends soap to a Web Service and gets back an object in XML, which becomes data in your address space.]
13 Data Federations of Web Services
- Massive datasets live near their owners
- Near the instrument's software pipeline
- Near the applications
- Near data knowledge and curation
- Super Computer centers become Super Data Centers
- Each Archive publishes a web service
- Schema documents the data
- Methods on objects (queries)
- Scientists get personalized extracts
- Uniform access to multiple Archives
- A common global schema
Federation
14 Grid and Web Services Synergy
- I believe the Grid will have many web services
- IETF standards provide
- Naming
- Authorization / Security / Privacy
- Distributed Objects
- Discovery, Definition, Invocation, Object Model
- Higher level services: workflow, transactions, DB, ...
- Synergy: commercial Internet & Grid tools
15 Outline
- The revolution in Computational Science
- The Virtual Observatory Concept
- World-Wide Telescope
- The Sloan Digital Sky Survey DB technology
16 Why Astronomy Data?
- It has no commercial value
- No privacy concerns
- Can freely share results with others
- Great for experimenting with algorithms
- It is real and well documented
- High-dimensional data (with confidence intervals)
- Spatial data
- Temporal data
- Many different instruments from many different places and many different times
- Federation is a goal
- The questions are interesting
- How did the universe form?
- There is a lot of it (petabytes)
17 Time and Spectral Dimensions: The Multiwavelength Crab Nebula
Crab star 1054 AD
X-ray, optical, infrared, and radio views of the nearby Crab Nebula, which is now in a state of chaotic expansion after a supernova explosion first sighted in 1054 A.D. by Chinese astronomers.
Slide courtesy of Robert Brunner @ Caltech.
18 Even in optical, images are very different
Optical and Near-Infrared Galaxy Image Mosaics
BJ RF IN J H K
One object in 6 different color bands
Slide courtesy of Robert Brunner @ Caltech.
19 Astronomy Data Growth
- In the old days astronomers took photos.
- Starting in the 1960s they began to digitize.
- New instruments are digital (100s of GB/nite)
- Detectors are following Moore's law.
- Data avalanche: doubles every 2 years
[Figure: total area of 3m+ telescopes in the world (m^2) and total number of CCD pixels (megapixels) as a function of time. Growth over 25 years is a factor of 30 in glass, 3000 in pixels. Courtesy of Alex Szalay.]
20 Universal Access to Astronomy Data
- Astronomers have a few Petabytes now.
- 1 pixel (byte) / sq arc second ~ 4 TB
- Multi-spectral, temporal, ... -> 1 PB
- They mine it looking for new (kinds of) objects, or more of interesting ones (quasars), density variations in 400-D space, correlations in 400-D space
- Data doubles every 2 years.
- Data is public after 2 years.
- So, 50% of the data is public.
- Some have private access to 5% more data.
- So: 50% vs 55% access for everyone
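The "50% is public" step follows directly from the two preceding bullets. A minimal sketch, assuming smooth exponential growth: if the archive doubles every T years and data is released after an embargo of E years, then the public share at any moment is the archive size E years ago divided by its size now, which is 2^(-E/T). The function name is hypothetical.

```python
def public_fraction(doubling_years, embargo_years):
    """Share of all data that is older than the embargo period,
    assuming the total volume doubles every doubling_years."""
    return 0.5 ** (embargo_years / doubling_years)


# The slide's case: doubles every 2 years, public after 2 years.
slide_case = public_fraction(2, 2)

# A shorter embargo (hypothetical 1 year) would expose more of the data.
one_year_embargo = public_fraction(2, 1)

print(f"public share, 2-year embargo: {slide_case:.0%}")
print(f"public share, 1-year embargo: {one_year_embargo:.0%}")
```

With a 2-year doubling time and a 2-year embargo the exponent is 1, which gives the one-half on the slide.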
21 The Age of Mega-Surveys
- Large number of new surveys
- multi-TB in size, 100 million objects or more
- Data publication an integral part of the survey
- Software bill a major cost in the survey
- The next generation mega-surveys are different
- top-down design
- large sky coverage
- sound statistical plans
- well controlled/documented data processing
- Each survey has a publication plan
- Federating these archives
- -> Virtual Observatory
MACHO 2MASS DENIS SDSS PRIME DPOSS GSC-II COBE
MAP NVSS FIRST GALEX ROSAT OGLE ...
Slide courtesy of Alex Szalay, modified by Jim
22 Data Publishing and Access
- But...
- How do I get at that 50% of the data?
- Astronomers have a culture of publishing.
- FITS files and many tools. http://fits.gsfc.nasa.gov/fits_home.html
- Encouraged by NASA.
- FTP what you need.
- But, data details are hard to document. Astronomers want to do it but it is VERY hard. (What programs were used? What were the processing steps? How were errors treated?)
- And by the way, few astronomers have a spare petabyte of storage in their pocket.
- THESIS: Challenging problems are publishing data and providing good query & visualization tools
23 Virtual Observatory
http://www.astro.caltech.edu/nvoconf/ http://www.voforum.org/
- Premise: Most data is (or could be) online
- So, the Internet is the world's best telescope:
- It has data on every part of the sky
- In every measured spectral band: optical, x-ray, radio...
- As deep as the best instruments (2 years ago).
- It is up when you are up. The seeing is always great (no working at night, no clouds, no moons, no...).
- It's a smart telescope: links objects and data to literature on them.
24 Demo of VirtualSky
- Roy Williams @ Caltech: Palomar data with links to NED.
- Shows multiple themes, shows link to other sites (NED, VizieR, SIMBAD, ...)
- http://virtualsky.org/servlet/Page?T3S21P1X0Y0W4F1
- And
- NED @ http://nedwww.ipac.caltech.edu/index.html
25 Demo of Sky Server
- Alex Szalay of Johns Hopkins built SkyServer (based on TerraServer design).
- http://skyserver.sdss.org/
26 Virtual Observatory Challenges
- Size: multi-Petabyte
- 40,000 square degrees is 2 Trillion pixels
- One band (at 1 sq arcsec): 4 Terabytes
- Multi-wavelength: 10-100 Terabytes
- Time dimension: >> 10 Petabytes
- Need auto parallelism tools
- Unsolved MetaData problem
- Hard to publish data & programs
- How to federate Archives
- Hard to find/understand data & programs
- Current tools inadequate
- New analysis & visualization tools
- Data Federation is problematic
- Transition to the new astronomy
- Sociological issues
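The size bullets above can be reproduced with simple arithmetic once a pixel scale and pixel depth are chosen. The slide does not state either, so the values below are assumptions: one reading that recovers both "2 Trillion pixels" and "4 Terabytes" is a pixel 0.5 arcseconds on a side stored in 2 bytes. At one pixel per square arcsecond the same sky is closer to half a trillion pixels. The function and constant names are hypothetical.

```python
ARCSEC_PER_DEGREE = 3600


def sky_pixels(square_degrees, pixel_arcsec):
    """Number of square pixels of side pixel_arcsec needed to tile the area."""
    square_arcsec = square_degrees * ARCSEC_PER_DEGREE ** 2
    return square_arcsec / pixel_arcsec ** 2


SKY_SQ_DEG = 40_000  # from the slide (the full sphere is about 41,253)

pixels_at_1_arcsec = sky_pixels(SKY_SQ_DEG, 1.0)
pixels_at_half_arcsec = sky_pixels(SKY_SQ_DEG, 0.5)   # assumed pixel scale
one_band_bytes = pixels_at_half_arcsec * 2            # assumed 2 bytes/pixel

print(f"1.0 arcsec pixels: {pixels_at_1_arcsec:.3e}")
print(f"0.5 arcsec pixels: {pixels_at_half_arcsec:.3e}")
print(f"one band, 2 B/pixel: {one_band_bytes / 1e12:.2f} TB")
```

Multiplying the single-band figure by the number of bands, and then by the number of epochs, gives the multi-wavelength and time-dimension estimates in the same way.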
27 Steps to Virtual Observatory Prototype
- Get SDSS and Palomar data online
- Alex Szalay, Jan Vandenberg, Ani Thakar, ...
- Roy Williams, Robert Brunner, Julian Bunn, ...
- Do local queries and crossID matches to expose
- Schema, Units, ...
- Dataset problems
- Typical use scenarios.
- Define a set of Astronomy Objects and methods.
- Based on UDDI, WSDL, SOAP.
- Started this with TerraService http://TerraService.net/ ideas.
- Working with Caltech (Brunner, Williams, Djorgovski, Bunn) and JHU (Szalay et al) on this
- Each archive is a web service
- Move crossID app to web-service base
28 Virtual Observatory and Education
- The Virtual Observatory can be used to
- Teach astronomy: make it interactive, demonstrate ideas and phenomena
- Teach computational science skills
29 Outline
- The revolution in Computational Science
- The Virtual Observatory Concept
- World-Wide Telescope
- The Sloan Digital Sky Survey DB technology
30 Sloan Digital Sky Survey http://www.sdss.org/
- For the last 12 years a group of astronomers has been building a telescope (with funding from Sloan Foundation, NSF, and a dozen universities). $90M.
- Y2000: engineer, calibrate, commission; now public data.
- 5% of the survey, 600 sq degrees, 15 M objects, 60 GB, ½ TB raw.
- This data includes most of the known high-z quasars.
- It has a lot of science left in it but...
- New: the data is arriving
- 250 GB/nite (20 nights per year) = 5 TB/y.
- 100 M stars, 100 M galaxies, 1 M spectra.
- http://www.sdss.org/ and http://www.sdss.jhu.edu/
31 Two kinds of SDSS data in an SQL DB (objects and images all in DB)
- 15M Photo Objects, ~400 attributes
50K Spectra with ~30 lines/spectrum
32 Spatial Data Access: SQL extension (Szalay, Kunszt, Brunner) http://www.sdss.jhu.edu/htm
- Added Hierarchical Triangular Mesh (HTM) table-valued function for spatial joins.
- Every object has a 20-deep Mesh ID.
- Given a spatial definition: routine returns up to 10 covering triangles.
- Spatial query is then up to 10 range queries.
- Very fast: 10,000 triangles / second / cpu.
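The step from "covering triangles" to "range queries" relies on the mesh IDs being hierarchical: each level of subdivision appends two bits, so all descendants of a triangle occupy one contiguous block of deeper IDs. The sketch below illustrates that idea on a sorted list of integers. It is not the SkyServer implementation: the function names, the depth convention (depth counts subdivisions below a starting triangle), and the toy data are all assumptions.

```python
import bisect


def descendant_range(htm_id, depth, target_depth=20):
    """Inclusive ID range, at target_depth, of all descendants of a triangle.

    Assumes a child ID is the parent ID with two extra low-order bits.
    """
    shift = 2 * (target_depth - depth)
    lo = htm_id << shift
    hi = ((htm_id + 1) << shift) - 1
    return lo, hi


def range_query(sorted_ids, lo, hi):
    """All IDs in [lo, hi], found by binary search on a sorted list
    (a stand-in for an index range scan)."""
    left = bisect.bisect_left(sorted_ids, lo)
    right = bisect.bisect_right(sorted_ids, hi)
    return sorted_ids[left:right]


def spatial_query(sorted_ids, cover, target_depth=20):
    """Run one range query per covering triangle and concatenate the hits.

    cover is a list of (htm_id, depth) pairs, as a cover routine might return.
    """
    hits = []
    for htm_id, depth in cover:
        lo, hi = descendant_range(htm_id, depth, target_depth)
        hits.extend(range_query(sorted_ids, lo, hi))
    return hits


# Toy example with a 2-level-deep mesh instead of 20 levels.
toy_ids = [100, 128, 130, 143, 144, 200]
toy_hits = spatial_query(toy_ids, cover=[(8, 0)], target_depth=2)
print(toy_hits)
```

Because covering triangles can extend past the requested region, a real query would still apply an exact geometric test to the candidates returned by the range scans.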
33 Data Loading
- JavaScript of DB loader (DTS)
- Web ops interface & workflow system
- Data ingest and scrubbing is major effort
- Test data quality
- Chase down bugs / inconsistencies
- Other major task is data documentation
- Explain the data
- Explain the schema and functions.
- If we supported users,
34 Scenario Design
- Astronomers proposed 20 questions
- Typical of things they want to do
- Each would require a week of programming in tcl / C / FTP
- Goal: make it easy to answer questions
- DB and tools design motivated by this goal
- Implemented utility procedures
- JHU built GUI for Linux clients
35 The 20 Queries
- Q11: Find all elliptical galaxies with spectra that have an anomalous emission line.
- Q12: Create a gridded count of galaxies with u-g>1 and r<21.5 over 60<declination<70, and 200<right ascension<210, on a grid of 2', and create a map of masks over the same grid.
- Q13: Create a count of galaxies for each of the HTM triangles which satisfy a certain color cut, like 0.7u-0.5g-0.2i<1.25 and r<21.75, output it in a form adequate for visualization.
- Q14: Find stars with multiple measurements and have magnitude variations >0.1. Scan for stars that have a secondary object (observed at a different time) and compare their magnitudes.
- Q15: Provide a list of moving objects consistent with an asteroid.
- Q16: Find all objects similar to the colors of a quasar at 5.5<redshift<6.5.
- Q17: Find binary stars where at least one of them has the colors of a white dwarf.
- Q18: Find all objects within 30 arcseconds of one another that have very similar colors: that is, where the color ratios u-g, g-r, r-I are less than 0.05m.
- Q19: Find quasars with a broad absorption line in their spectra and at least one galaxy within 10 arcseconds. Return both the quasars and the galaxies.
- Q20: For each galaxy in the BCG data set (brightest color galaxy), in 160<right ascension<170, -25<declination<35, count of galaxies within 30" of it that have a photoz within 0.05 of that galaxy.
- Q1: Find all galaxies without unsaturated pixels within 1' of a given point of ra=75.327, dec=21.023
- Q2: Find all galaxies with blue surface brightness between 23 and 25 mag per square arcseconds, and -10<super galactic latitude (sgb)<10, and declination less than zero.
- Q3: Find all galaxies brighter than magnitude 22, where the local extinction is >0.75.
- Q4: Find galaxies with an isophotal surface brightness (SB) larger than 24 in the red band, with an ellipticity>0.5, and with the major axis of the ellipse having a declination of between 30 and 60 arc seconds.
- Q5: Find all galaxies with a de Vaucouleurs profile (r^¼ falloff of intensity on disk) and the photometric colors consistent with an elliptical galaxy. The de Vaucouleurs profile
- Q6: Find galaxies that are blended with a star, output the deblended galaxy magnitudes.
- Q7: Provide a list of star-like objects that are 1% rare.
- Q8: Find all objects with unclassified spectra.
- Q9: Find quasars with a line width >2000 km/s and 2.5<redshift<2.7.
- Q10: Find galaxies with spectra that have an equivalent width in Hα >40Å (Hα is the main hydrogen spectral line.)
Also some good queries at http://www.sdss.jhu.edu/ScienceArchive/sxqt/sxQT/Example_Queries.html
36 An easy one: Q7 Provide a list of star-like objects that are 1% rare.
- Found 14,681 buckets, first 140 buckets have 99%, time 62 seconds
- CPU bound: 226 k records/second (2 cpu), 250 KB/s.
select cast((u-g) as int) as ug,
       cast((g-r) as int) as gr,
       cast((r-i) as int) as ri,
       cast((i-z) as int) as iz,
       count(*) as Population
from stars
group by cast((u-g) as int), cast((g-r) as int),
         cast((r-i) as int), cast((i-z) as int)
order by count(*)
37 An Easy One: Q15 Provide a list of moving objects consistent with an asteroid.
- Sounds hard but there are 5 pictures of the object at 5 different times (color filters) and so can see velocity.
- Image pipeline computes velocity.
- Computing it from the 5 color x,y would also be fast
- Finds 1,303 objects in 3 minutes, 140 MBps. (could go 2x faster with more disks)
select objId,
       dbo.fGetUrlEq(ra,dec) as url,               -- return object ID & url
       sqrt(power(rowv,2)+power(colv,2)) as velocity
from photoObj                                      -- check each object.
where (power(rowv,2) + power(colv,2))              -- square of velocity
      between 50 and 1000                          -- huge values = error
38 Q15 Fast Moving Objects
- Find near earth asteroids
SELECT r.objID as rId, g.objId as gId,
       dbo.fGetUrlEq(g.ra, g.dec) as url
FROM PhotoObj r, PhotoObj g
WHERE r.run = g.run and r.camcol = g.camcol
  and abs(g.field-r.field) < 2                     -- nearby
  -- the red selection criteria
  and ((power(r.q_r,2) + power(r.u_r,2)) > 0.111111)
  and r.fiberMag_r between 6 and 22
  and r.fiberMag_r < r.fiberMag_g
  and r.fiberMag_r < r.fiberMag_i
  and r.parentID = 0
  and r.fiberMag_r < r.fiberMag_u
  and r.fiberMag_r < r.fiberMag_z
  and r.isoA_r/r.isoB_r > 1.5
  and r.isoA_r > 2.0
  -- the green selection criteria
  and ((power(g.q_g,2) + power(g.u_g,2)) > 0.111111)
  and g.fiberMag_g between 6 and 22
  and g.fiberMag_g < g.fiberMag_r
  and g.fiberMag_g < g.fiberMag_i
  and g.fiberMag_g < g.fiberMag_u
  and g.fiberMag_g < g.fiberMag_z
  and g.parentID = 0
  and g.isoA_g/g.isoB_g > 1.5
  and g.isoA_g > 2.0
  -- the matchup of the pair
  and sqrt(power(r.cx-g.cx,2) + power(r.cy-g.cy,2) + power(r.cz-g.cz,2)) * (10800/PI()) < 4.0
  and abs(r.fiberMag_r-g.fiberMag_g) < 2.0
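The "matchup of the pair" test in the query measures the straight-line (chord) distance between two positions and multiplies by 10800/π. A short sketch of why that works, assuming cx, cy, cz are the components of a unit vector pointing at the object (the slide does not say so explicitly): for small separations the chord length equals the angle in radians, and 10800/π = 180·60/π converts radians to arcminutes. The helper names are hypothetical.

```python
import math


def unit_vector(ra_deg, dec_deg):
    """Cartesian unit vector for a position given in degrees."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra),
            math.cos(dec) * math.sin(ra),
            math.sin(dec))


def chord_arcmin(a, b):
    """Chord length between two unit vectors, scaled to arcminutes.

    Matches the SQL expression sqrt(dx^2 + dy^2 + dz^2) * (10800/PI());
    accurate as an angle only when the separation is small.
    """
    chord = math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))
    return chord * (10800 / math.pi)


# Two points on the celestial equator, one arcminute apart in RA.
first = unit_vector(185.0, 0.0)
second = unit_vector(185.0 + 1.0 / 60.0, 0.0)
separation = chord_arcmin(first, second)
print(f"{separation:.6f} arcmin")
```

Working with unit vectors avoids trigonometric calls inside the join predicate, which matters when the test runs once per candidate pair.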
42 Performance (on current SDSS data)
- Run times on $15k COMPAQ Server (2 cpu, 1 GB, 8 disk)
- Some take 10 minutes
- Some take 1 minute
- Median ~ 22 sec.
- GHz processors are fast!
- (10 mips/IO, 200 ins/byte)
- 2.5 m rec/s/cpu
~1000 IO/cpu sec, ~70 MB IO/cpu sec
43 Summary of Queries
- All have fairly short SQL programs -- a substantial advance over (tcl, C)
- Many are sequential one-pass and two-pass over data
- Covering indices make scans run fast
- Table valued functions are wonderful but limitations are painful.
- Counting, Binning, Histograms VERY common
- Spatial indices helpful,
- Materialized view (Neighbors) helpful.
44 Call to Action
- If you do data visualization: we need you (and we know it).
- If you do databases: here is some data you can practice on.
- If you do distributed systems: here is a federation you can practice on.
- If you do data mining: here are datasets to test your algorithms.
- If you do astronomy educational outreach: here is a tool for you.
- The astronomers are very good, and very smart, and a pleasure to work with, and the questions are cosmic, so ...
45
46 HTM and SQL
- Spatial spec in http://www.sdss.jhu.edu/htm/
- List of triangles out (about 10-20 range queries)
- Table valued function, then geometry rejects false positives
Use SkyServerV3
GO
-- show an HTM ID
select dbo.fHTM_To_String(dbo.fHTM_Lookup('J2000 20 185 0'))
GO
-- show triangles covering a circle
select dbo.fHTM_To_String(HTMIDstart) as start,
       dbo.fHTM_To_String(HTMIDend) as stop
from dbo.fHTM_Cover('CIRCLE J2000 12 185 0 5 ')
GO
-- Show the spatial join
declare @shift real
set @shift = CONVERT(int,POWER(4.,20-12))   -- 4 = 2^2 and 2 bits per htm level
select ObjID
from PhotoObj as P,
     dbo.fHTM_Cover('CIRCLE J2000 12 185 0 1 ') as C
where P.htmID between C.HTMIDstart*@shift and C.HTMIDend*@shift
GO
-- show a user-level function.
select ObjID from dbo.fGetNearbyObjEq(185,0,1)
47 A Hard One: Q14 Find stars with multiple measurements that have magnitude variations >0.1.
- This should work, but SQL Server does not allow
table values to be piped to table-valued
functions.
48 A Hard One, Second Try: Q14 Find stars with multiple measurements that have magnitude variations >0.1.
- Write a program with a cursor, ran for 2 days
-------------------------------------------------------------------------------
-- Table-valued function that returns the binary stars within a certain radius
-- of another (in arc-minutes) (typically 5 arc seconds).
-- Returns the ID pairs and the distance between them (in arcseconds).
create function BinaryStars(@MaxDistanceArcMins float)
returns @BinaryCandidatesTable table(
    S1_object_ID bigint not null,              -- Star 1
    S2_object_ID bigint not null,              -- Star 2
    distance_arcSec float)                     -- distance between them
as begin
  declare @star_ID bigint, @binary_ID bigint   -- Star's ID and binary ID
  declare @ra float, @dec float                -- Star's position
  declare @u float, @g float, @r float, @i float, @z float  -- Star's colors
  ---------------- Open a cursor over stars and get position and colors
  declare star_cursor cursor for
      select object_ID, ra, dec, u, g, r, i, z from Stars
  open star_cursor
  while (1=1)                                  -- for each star
  begin                                        -- get its attributes
    fetch next from star_cursor into @star_ID, @ra, @dec, @u, @g, @r, @i, @z
    if (@@fetch_status = -1) break             -- end if no more stars
    insert into @BinaryCandidatesTable         -- insert its binaries
    select @star_ID, S1.object_ID,             -- return stars pairs
           sqrt(N.DotProd)/PI()*10800          -- and distance in arc-seconds
    from getNearbyObjEq(@ra, @dec,             -- Find objects nearby S.
                        @MaxDistanceArcMins) as N,  -- call them N.
         Stars as S1                           -- S1 gets N's color values
    where @star_ID < N.Object_ID               -- S1 different from S
      and N.objType = dbo.PhotoType('Star')    -- S1 is a star
      and N.object_ID = S1.object_ID           -- join stars to get colors of S1=N
      and (abs(@u-S1.u) > 0.1                  -- one of the colors is different.
        or abs(@g-S1.g) > 0.1
        or abs(@r-S1.r) > 0.1
        or abs(@i-S1.i) > 0.1
        or abs(@z-S1.z) > 0.1
          )
  end                                          -- end of loop over all stars
  -------------- Looped over all stars, close cursor and exit.
  close star_cursor                            --
  deallocate star_cursor
  return                                       -- return table
end                                            -- end of BinaryStars
GO
select * from dbo.BinaryStars(.05)
49 A Hard One, Third Try: Q14 Find stars with multiple measurements that have magnitude variations >0.1.
- Use pre-computed neighbors table.
- Ran in 2 minutes, found 48k pairs.
-- Plan 2: Use the precomputed neighbors table
select top 100 S.object_ID, S1.object_ID,       -- return star pairs and distance
       str(N.Distance_mins * 60,6,1) as DistArcSec
from Star S,                                    -- S is a star
     Neighbors N,                               -- N within 3 arcsec (10 pixels) of S.
     Star S1                                    -- S1 = N has the color attributes
where S.Object_ID = N.Object_ID                 -- connect S and N.
  and S.Object_ID < N.Neighbor_Object_ID        -- S1 different from S
  and N.Neighbor_objType = dbo.fPhotoType('Star')  -- S1 is a star (an optimization)
  and N.Distance_mins < .05                     -- the 3 arcsecond test
  and N.Neighbor_object_ID = S1.Object_ID       -- N = S1
  and ( abs(S.u-S1.u) > 0.1                     -- one of the colors is different.
     or abs(S.g-S1.g) > 0.1
     or abs(S.r-S1.r) > 0.1
     or abs(S.i-S1.i) > 0.1
     or abs(S.z-S1.z) > 0.1
      )
-- Found 48,425 pairs (out of 4.4 m stars) in 121 sec.
50 The Pain of Going Outside SQL (it's fortunate that all the queries are single statements)
- Use a cursor
- No cpu parallelism
- CPU bound
- 6 MBps, 2.7 k rps
- 5,450 seconds (10x slower)
- Count parent objects
- 503 seconds for 14.7 M objects in 33.3 GB
- 66 MBps
- IO bound (30% of one cpu)
- 100 k records/cpu sec
declare @count int
declare @sum int
set @sum = 0
declare PhotoCursor cursor for select nChild from sxPhotoObj
open PhotoCursor
while (1=1)
begin
  fetch next from PhotoCursor into @count
  if (@@fetch_status = -1) break
  set @sum = @sum + @count
end
close PhotoCursor
deallocate PhotoCursor
print 'Sum is '+cast(@sum as varchar(12))
select count(*) from sxPhotoObj where nChild > 0
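The contrast on this slide is between pulling rows out one at a time and letting the engine aggregate them in a single statement. The sketch below reproduces the shape of that comparison with Python's built-in sqlite3 module standing in for SQL Server. Everything in it is an assumption made for illustration: the table and column names, the synthetic data, and the choice of SQLite, which will not show the parallelism effects the slide reports.

```python
import sqlite3

# In-memory stand-in for the photo object table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE photo_obj (obj_id INTEGER PRIMARY KEY, n_child INTEGER)"
)
conn.executemany(
    "INSERT INTO photo_obj (n_child) VALUES (?)",
    [(n % 4,) for n in range(1000)],  # synthetic child counts 0..3
)


def sum_with_cursor(connection):
    """Row-at-a-time loop: every value crosses into the client program."""
    total = 0
    for (n_child,) in connection.execute("SELECT n_child FROM photo_obj"):
        total += n_child
    return total


def sum_set_based(connection):
    """Single statement: the engine does the aggregation itself."""
    (total,) = connection.execute(
        "SELECT SUM(n_child) FROM photo_obj"
    ).fetchone()
    return total


def count_parents(connection):
    """Analogue of the slide's one-line count of parent objects."""
    (count,) = connection.execute(
        "SELECT COUNT(*) FROM photo_obj WHERE n_child > 0"
    ).fetchone()
    return count


print(sum_with_cursor(conn), sum_set_based(conn), count_parents(conn))
```

Both routes give the same answer; the point of the slide is that only the single-statement form leaves the database free to choose its own scan strategy and to run it in parallel.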
51 Reflections on the 20 Queries
- Data loading/scrubbing is labor intensive & tedious
- AUTOMATE!!!
- This is 5% of the data, and some queries take 10 minutes.
- But this is not tuned (disk bound).
- All queries benefit from parallelism (both disk and cpu) (if you can state the query inside SQL).
- Parallel database machines will do well on this:
- Hash machines
- Data pumps
- See paper in word or pdf on my web site.
- Conclusion: SQL answered the questions. Once you get the answers, you need visualization
52 Astronomy Data Characteristics
- Lots of it (petabytes)
- Hundreds of dimensions per object
- Cross-correlation is challenging because
- Multi-resolution
- Time varying
- Data is dirty (cosmic rays, airplanes)
53 SkyServer as a WebServer: WSDL + SOAP, just add details ?
- Archive ss = new VOService(SkyServer);
- Attributes A = ss.GetObjects(ra,dec,radius)
- ?? What are the objects (attributes)?
- ?? What are the methods (GetObjects()...)?
- ?? Is the query language SQL or XQuery or what?
54 SDSS: what I have been doing
- Work with Alex Szalay, Don Slutz, and others to define 20 canonical queries and 10 visualization tasks.
- Working with Alex Szalay on building Sky Server and making the data public (send out 80GB SQL DBs)
55 What Next? (after the data is online, after the web servers)
- How to federate the Archives to make a VO?
- Send XML: a non-answer, equivalent to "send Unicode"
- Bytes is the wrong abstraction: publish Methods on Objects.
56 Survey Cross-Identification
- Billions of Sources
- High Source Densities
- Multi-Wavelength: Radio to γ-Ray
- All Sky: Thousands of Sq. Degrees
- Computational Challenge
- Probabilistic Associations
- Optimized Likelihood Ratios
- A Priori Astrophysical Knowledge Important
- Secondary Parameters
- Temporal Variability
- Dynamic & Static Associations
- User-Defined Cross-Identification Algorithms
Optical-Infrared-Radio Quasar-Environment Survey
Radio Survey Cross-Identification Steep Spectrum
Sources
Optical-Infrared-X-Ray Serendipitous Chandra
Identification
Slide courtesy of Robert Brunner @ Caltech.
57 Data Federation: A Computational Challenge
- 2MASS vs. DPOSS Cross-identification
- 2MASS J < 15
- DPOSS IN < 18