The Information Avalanche: Reducing Information Overload

1
The Information Avalanche: Reducing Information Overload
  • Jim Gray
  • Microsoft Research
  • Onassis Foundation Science Lecture Series
  • http://www.forth.gr/onassis/lectures/2002-07-15/index.html
  • Heraklion, Crete, Greece, 15-19 July 2002

2
Thesis
  • Most new information is digital (and old information is being digitized)
  • A Computer Science Grand Challenge:
    • Capture, organize, summarize, and visualize this information
    • Optimize human attention as a resource
    • Improve information quality

3
Information Avalanche
  • The situation: a census of the data
    • We can record everything
    • Everything is a LOT!
  • The good news
    • Changes science, education, medicine, entertainment, ...
    • Shrinks time and space
    • Can augment human intelligence
  • The bad news
    • The end of privacy
    • Cyber crime / cyber terrorism
    • Monoculture
  • The technical challenges
    • Amplify human intellect
    • Organize, summarize, and prioritize information
    • Make programming easy

4
How much information is there?
[Figure: byte scale from kilo through mega, giga, tera, peta, exa, and zetta to yotta]
  • Soon everything can be recorded and indexed
  • Most bytes will never be seen by humans.
  • Data summarization, trend detection, and anomaly detection are key technologies
  • See Mike Lesk, "How much information is there?": http://www.lesk.com/mlesk/ksg97/ksg.html
  • See Lyman and Varian, "How Much Information?": http://www.sims.berkeley.edu/research/projects/how-much-info/

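[Figure: examples placed on the scale, from a book, a photo, and a movie up to all Library of Congress books (words), all books and multimedia, and everything recorded. Matching small-unit prefixes by exponent: 24 yocto, 21 zepto, 18 atto, 15 femto, 12 pico, 9 nano, 6 micro, 3 milli]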
5
Information Census (Lesk, Varian, Lyman)
  • 10 exabytes
  • 90% digital
  • > 55% personal
  • Print: .003% of bytes (5 TB/y), but text has the lowest entropy
  • Email is (10 B messages/day) 4 PB/y and is 20% text (estimate by Gray)
  • WWW is 50 TB; the deep web is 50 PB
  • Growth: 50%/y

6
7
Storage capacity beating Moore's law
  • Improvements: capacity 60%/y, bandwidth 40%/y, access time 16%/y
  • $1000/TB today
  • $100/TB in 2007

8
Disk Storage Cheaper than Paper
  • File cabinet: cabinet (4 drawer) $250, paper (24,000 sheets) $250, space (2x3 @ $10/ft²) $180; total $700, i.e. $0.03/sheet
  • Disk: disk (160 GB) $200; ASCII: 500 M pages, 2e-7 $/sheet (10,000x cheaper)
  • Image: 1 M photos, 3e-4 $/photo (100x cheaper)
  • Store everything on disk

9
Why Put Everything in Cyberspace?
  • Low rent: min $/byte
  • Shrinks time: now or later
  • Shrinks space: here or there
  • Automate processing: knowbots
[Figure: point-to-point or broadcast; immediate or time-delayed; locate, process, analyze, summarize]
10
Storage trends
  • Right now, it's affordable to buy 100 GB/year
  • In 5 years you can afford to buy 1 TB/year! (assuming storage doubles every 18 months)
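The 1 TB figure is compound growth: 5 years is 60 months, about 3.3 doublings at 18 months each, and 2^3.3 is roughly 10. A minimal Python sketch of that arithmetic, assuming (as the slide does) that capacity per dollar doubles every 18 months; the function name is illustrative, not from the talk:

```python
def affordable_capacity_gb(current_gb: float, years: float,
                           doubling_months: float = 18.0) -> float:
    """Capacity the same annual budget buys after `years`, if GB per dollar
    doubles every `doubling_months` (the slide's assumption)."""
    doublings = years * 12.0 / doubling_months
    return current_gb * 2.0 ** doublings


if __name__ == "__main__":
    gb = affordable_capacity_gb(100, 5)   # 100 GB/year today, 5 years out
    print(f"{gb:.0f} GB/year")            # 1008 GB/year, i.e. about 1 TB
```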

11
Trying to fill a terabyte in a year
12
Memex: "As We May Think", Vannevar Bush, 1945
  • "A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility."
  • "Yet if the user inserted 5000 pages of material a day it would take him hundreds of years to fill the repository, so that he can be profligate and enter material freely."

13
Gordon Bell's MainBrain: Digitize Everything (a BIG shoebox?)
  • Scans: 20 k pages, TIFF @ 300 dpi: 1 GB
  • Music: 2 k tracks: 7 GB
  • Photos: 13 k images: 2 GB
  • Video: 10 hrs: 3 GB
  • Docs: 3 k (ppt, word, ...): 2 GB
  • Mail: 100 k messages: 3 GB
  • Total: 18 GB

14
Gary Starkweather
  • Scan EVERYTHING
  • 400 dpi TIFF
  • 70k pages: 14 GB
  • OCR all scans (98% recognition accuracy)
  • All indexed (5 second access to anything)
  • All on his laptop.

15
Access!
16
50% personal: what about the other 50%?
  • Business
    • Wal-Mart online: 1 PB and growing
    • Paradox: most transaction systems have mere PBs
    • Have to go to image/data monitoring for big data
  • Government
    • Online government is a big thrust (cheaper, better, ...)
  • Science

17
Instruments: CERN LHC, Petabytes per Year
  • Looking for the Higgs particle
  • Sensors: 1000 GB/s (1 TB/s)
  • Events: 75 GB/s
  • Filtered: 5 GB/s
  • Reduced: 0.1 GB/s, 2 PB/y
  • Data pyramid: 100 GB : 1 TB : 100 TB : 1 PB : 10 PB

18
LHC Requirements (2005- )
  • 1E9 events p.a. @ 1 MB/ev = 1 PB/year/expt
  • Reconstructed: 100 TB/recon/year/expt
  • Send to Tier1 regional centres
    • > 400 TB/year to RAL?
  • Keep one set of derivatives on disk
    • and the rest on tape
  • But the UK plans a Tier1 clone
  • Many data clones

Source: John Gordon, IT Department, CLRC/RAL, CUF Meeting, October 2000
19
Science Data Volume: ESO/STECF Science Archive
  • 100 TB archive
  • Similar at Hubble, Keck, SDSS, ...
  • 1 PB aggregate

20
Data Pipeline: NASA
  • Level 0: raw data (data stream)
  • Level 1: calibrated data (measured values)
  • Level 1A: calibrated, normalized (flux/magnitude/...)
  • Level 2: derived data (metrics, e.g. vegetation index)
  • Data volume
    • Level 0 ≈ Level 1 ≈ Level 1A << Level 2
    • Level 2 >> Level 1 because
      • MANY data products
      • Must keep all published data editions (versions)

EOSDIS Core System Information for Scientists, http://observer.gsfc.nasa.gov/sec3/ProductLevels.html
21
TerraServer: http://TerraService.net/
  • 3 x 2 TB databases
  • 18 TB disk, tri-plexed (6 TB)
  • 3+1 cluster
  • 99.96% uptime
  • 1 B page views, 5 B DB queries
  • Now a .NET web service

22
Image Data
  • USGS topo maps
  • USGS aerial photos (DOQ)
  • All in the database: 200x200 pixel tiles, compressed
  • Spatial access: z-transform B-tree
  • Encarta Virtual Globe: 1 km resolution, 100% world coverage
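The slide names a z-transform B-tree for spatial access but does not give TerraServer's key layout. The sketch below assumes one common z-order scheme, Morton bit interleaving of tile (x, y) coordinates, with a sorted Python list and `bisect` standing in for the B-tree; `z_key` and `z_decode` are hypothetical names:

```python
from bisect import bisect_left, bisect_right


def z_key(x: int, y: int, bits: int = 16) -> int:
    """Morton (z-order) key: bit i of x -> bit 2i, bit i of y -> bit 2i+1."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


def z_decode(key: int, bits: int = 16) -> tuple[int, int]:
    """Inverse of z_key: recover the (x, y) tile coordinates."""
    x = y = 0
    for i in range(bits):
        x |= ((key >> (2 * i)) & 1) << i
        y |= ((key >> (2 * i + 1)) & 1) << i
    return x, y


# A sorted list of keys stands in for the B-tree index on tile keys.
tile_index = sorted(z_key(x, y) for x in range(8) for y in range(8))

# An aligned 4x4 block of tiles occupies one contiguous key range,
# so a single range scan returns the whole neighbourhood.
lo, hi = z_key(4, 4), z_key(7, 7)
block = tile_index[bisect_left(tile_index, lo):bisect_right(tile_index, hi)]
print(sorted(z_decode(k) for k in block))
```

Only power-of-two-aligned blocks map to a single range; an arbitrary viewing window decomposes into a few such ranges, each answered by one range scan.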
23
Hardware
  • 8 Compaq DL360 Photon web servers
  • 4 Compaq ProLiant 8500 DB servers: SQL\Inst1, SQL\Inst2, SQL\Inst3, and a spare
  • Fiber SAN switches
  • One SQL database per rack; each rack contains 4.5 TB
  • 261 total drives / 13.7 TB total
  • Metadata stored on 101 GB fast, small disks (18 x 18.2 GB)
  • Imagery data stored on 4 339 GB slow, big disks (15 x 73.8 GB)
  • To add 90 x 72.8 GB disks in Feb 2001 to create an 18 TB SAN
24
TerraServer Lessons Learned
  • Hardware is five 9s (with clustering)
  • Software is five 9s (with clustering)
  • Admin is four 9s (offline maintenance)
  • Network is three 9s (mistakes, environment)
  • Simple designs are best
  • 10 TB DB is the management limit: 1 PB = 100 x 10 TB DBs; this is 100x better than 5 years ago
  • Minimize use of tape
    • Backup to disk (snapshots)
    • Portable disk TBs
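Each extra nine cuts downtime tenfold: five 9s is about 5 minutes a year, three 9s almost 9 hours. A small Python sketch converting the slide's figures to minutes per year; the serial-composition step (the service is up only when hardware, software, admin, and network are all up, failing independently) is an assumption, not a claim from the talk:

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60          # 525,960 minutes


def downtime_minutes(availability: float) -> float:
    """Expected minutes of downtime per year at a given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def nines(n: int) -> float:
    """Availability with n nines: 1 - 10**-n."""
    return 1.0 - 10.0 ** (-n)


# Per-layer figures from the slide.
layers = {"hardware": nines(5), "software": nines(5),
          "admin": nines(4), "network": nines(3)}

for name, a in layers.items():
    print(f"{name:>8}: {downtime_minutes(a):7.1f} min/year")

# Assumption: all layers must be up at once (independent, in series).
serial = 1.0
for a in layers.values():
    serial *= a
print(f"  serial: {downtime_minutes(serial):7.1f} min/year")
print(f"  99.96%: {downtime_minutes(0.9996):7.1f} min/year")   # TerraServer's uptime
```

Under that assumption the combined figure is close to the network's alone.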

25
Sensor Applications
  • Earth observation
    • 15 PB by 2007
  • Medical images & information, health monitoring
    • Potential 1 GB/patient/y → 1 EB/y
  • Video monitoring
    • 1E8 video cameras @ 1E5 Bps → 10 TB/s → 100 EB/y → filtered???
  • Airplane engines
    • 1 GB sensor data/flight
    • 100,000 engine hours/day
    • 30 PB/y
  • Smart dust: ?? EB/y

http://robotics.eecs.berkeley.edu/pister/SmartDust/
http://www-bsac.eecs.berkeley.edu/shollar/macro_motes/macromotes.html
26
What do they do with the data? (Business, government, science; more later in the talk)
  • Look for anomalies
    • 1, 2, 1, 2, 1, 1, 1, 2, -5, 1, 0, 2, ...
  • Look for trends and patterns
    • 1, 2, 3, 4, 5, ...
  • Look for correlations
    • ln(x) = ln(y) + c ln(z)
  • Look at summaries, then drill down to details
  • LOTS of histograms
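Two of the listed tasks can be made concrete on the slide's own sequences. A minimal Python sketch: anomalies via a median/MAD "modified z-score" (the 0.6745 scale factor and 3.5 cutoff follow the common Iglewicz and Hoaglin convention, a choice made here rather than in the talk), and trends via an ordinary least-squares slope against position:

```python
from statistics import median


def robust_outliers(values, threshold=3.5):
    """Indices whose modified z-score (median/MAD based) exceeds threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Degenerate case: most values identical; flag anything different.
        return [i for i, v in enumerate(values) if v != med]
    return [i for i, v in enumerate(values)
            if abs(0.6745 * (v - med) / mad) > threshold]


def trend_slope(values):
    """Least-squares slope of values against their index 0, 1, 2, ..."""
    n = len(values)
    xbar = (n - 1) / 2
    ybar = sum(values) / n
    sxy = sum((i - xbar) * (v - ybar) for i, v in enumerate(values))
    sxx = sum((i - xbar) ** 2 for i in range(n))
    return sxy / sxx


anomaly_series = [1, 2, 1, 2, 1, 1, 1, 2, -5, 1, 0, 2]
trend_series = [1, 2, 3, 4, 5]
print(robust_outliers(anomaly_series))   # [8], the -5
print(trend_slope(trend_series))         # 1.0
```

The median and MAD are used instead of mean and standard deviation so the anomaly itself does not inflate the yardstick used to detect it.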

27
Premise: Grid Computing
  • Store exabytes once or twice (for redundancy)
  • Access them from anywhere
  • Implies huge archive/data centers
  • Supercomputer centers become super data centers
  • Examples: Google, Yahoo!, Hotmail, CERN, Fermilab, SDSC

28
Bandwidth: 3x bandwidth/year for 25 more years
  • Today
    • 40 Gbps per channel (?)
    • 12 channels per fiber (WDM): 500 Gbps
    • 32 fibers/bundle: 16 Tbps/bundle
  • In lab: 3 Tbps/fiber (400 x WDM)
  • In theory: 25 Tbps per fiber
  • 1 Tbps = USA 1996 WAN bisection bandwidth
  • Aggregate bandwidth doubles every 8 months!

[Figure: 1 fiber = 25 Tbps]
29
Underlying Theme
  • Digital everything
    • From words and numbers to sights and sounds
  • New devices
    • From isolated to adaptive, synchronized, and connected
  • Automation
    • From dumb to Web services
    • From manual to self-tuning, self-organizing, and self-maintaining
    • Beyond reliability to availability
  • One interconnected network
    • From stand-alone/basic connectivity to always wired (and wireless)
    • Everything over IP

30
Information Avalanche
  • The situation: a census of the data
    • We can record everything
    • Everything is a LOT!
  • The good news
    • Changes science, education, medicine, entertainment, ...
    • Shrinks time and space
    • Can augment human intelligence
  • The bad news
    • The end of privacy
    • Cyber crime / cyber terrorism
    • Monoculture
  • The technical challenges
    • Amplify human intellect
    • Organize, summarize, and prioritize information
    • Make programming easy

31
Online Science
  • All literature online
  • All data online
  • All instruments online
  • Great analysis tools.

32
Online Education
  • All literature online
  • All lectures online
  • Interactive and time-shifted education
  • Just-in-time education
  • Available to everyone everywhere
  • Economic model is not understood (who pays?)
  • One model: society pays

33
Online Business
  • Frictionless economy
  • Near-perfect information
  • Very efficient
  • Fully customized products
  • Example: Wal-Mart / Dell
  • Traditional business: 1-10 inventory turns/y
  • eBusiness: 100-500 turns/y (no inventory)
  • VERY efficient, huge economic advantage
  • Your customers & suppliers loan you money!

34
Online Medicine
  • Traditional medicine
  • Can monitor your health continuously
  • Instant diagnosis
  • Personalized drugs
  • New Biology
  • DNA is software
  • solve each disease
  • Huge impact on agriculture too

35
Cyber-Space Shrinks Time and Distance
  • Everyone is always connected
  • Can get information they want
  • Can communicate with friends & family
  • Everything is online
  • You never miss a meeting/game/party/movie (you can always watch it)
  • You never forget anything (it's there somewhere)

36
Sustainable Society
  • Year 2050: 9 B people living at Europe's standard of living
  • 100M people in a city?
  • Environment can't sustain it
  • More efficient cities/transportation/...
  • 20% consume 60% now; if 100% consume at 1/3 of current levels, net consumption is unchanged
  • Need to reduce energy/water/metal consumption 3x in the developed world

37
CyberSpace (Data) and Tools Can Augment Human Intelligence
  • See next talk (12 CS challenges)
  • MyMainBrain is a personal example: improved memory
  • Data mining tools are promising

38
Information Avalanche
  • The situation: a census of the data
    • We can record everything
    • Everything is a LOT!
  • The good news
    • Changes science, education, medicine, entertainment, ...
    • Shrinks time and space
    • Can augment human intelligence
  • The bad news
    • The end of privacy
    • Cyber crime / cyber terrorism
    • Monoculture
  • The technical challenges
    • Amplify human intellect
    • Organize, summarize, and prioritize information
    • Make programming easy

39
The End Of Privacy
  • You can find out all about me.
  • Organizations can precisely track us
  • Credit cards, email, cellphone,
  • Animals have tags in them; I will probably get a tag (eventually). (I already carry a dozen ID smart cards.)
  • "You have no privacy, get over it." Scott McNealy

40
The Centralization of Power
  • Computers enable an Orwellian future (1984)
  • The government can know everything you ever
  • Buy
  • Say
  • Hear
  • See/Read/
  • Where you are (phone company already knows)
  • Who you see and talk to
  • OK now, but what if Nero/Hitler/Stalin/.. comes
    to power?

41
Cyber Crime
  • You can steal my identity
  • Sell my house
  • Accumulate huge debts
  • Make a video of me doing terrible things.
  • You can steal on a grand scale
  • Now trillions of dollars are online.
  • A LARGE honey-pot for criminals.

42
Cyber Terrorism
  • It is easier to attack/destroy than to steal.
  • Viruses, data corruption, data modification
  • Denial of Service
  • Hijacking and then destroying equipment
  • Utilities (water, energy, transportation)
  • Production (factories)

43
Monoculture
  • Radio, TV, movies, and the Internet are making the world more homogeneous.
  • ½ the world has never made a phone call
  • But this is changing fast (they want to make
    phone calls!)
  • The wired world enables communities to form very
    easily e.g. Sanskrit scholars.
  • But the community has to speak a common language.

44
Information Clutter
  • Most mail is junk mail
  • Most eMail will soon be junk mail
  • 30% of Hotmail, 75% of my mail (130 messages/day).
  • Telemarketing wastes peoples time.
  • Creates info-glut
  • You have 50,000 new mail messages
  • Need systems and interfaces to filter,
    summarize, prioritize information

45
Information Avalanche
  • The situation: a census of the data
    • We can record everything
    • Everything is a LOT!
  • The good news
    • Changes science, education, medicine, entertainment, ...
    • Shrinks time and space
    • Can augment human intelligence
  • The bad news
    • The end of privacy
    • Cyber crime / cyber terrorism
    • Monoculture
  • The technical challenges
    • Amplify human intellect
    • Organize, summarize, and prioritize information
    • Make programming easy

46
Technical Challenges
  • Storing information
  • Organizing information
  • Summarizing information
  • Visualizing information
  • Make programming easy

47
The Personal Terabyte (All Your Stuff Online): So you've got it, now what do you do with it?
  • Probably not accessed very often, but TREASURED (what's the one thing you would save in a fire?)
  • Can you find anything?
  • Can you organize that many objects?
  • Once you find it, will you know what it is?
  • Once you've found it, could you find it again?
  • Research goal: have GOOD answers for all these questions

48
Bell, Gemmell, Lueder: MyLifeBits Guiding Principles
  • Freedom from strict hierarchy
    • Full-text search & collections
  • Many visualizations
    • "Don't metaphor me in"
  • Annotations add value
    • So make them easy!
  • Keep the links when you author
    • "Transclusion"
  • Everything goes in a database

49
How will we find it? Put everything in the DB (and index it)
  • Need DBMS features: consistency, indexing, pivoting, queries, speed/scalability, backup, replication. If you don't use one, you're creating one!
  • Simple logical structure
    • Blob and link is all that is inherent
    • Additional properties (facets = extra tables) and methods on those tables (encapsulation)
  • More than a file system
    • Unifies data and metadata
    • Simpler to manage
    • Easier to subset and reorganize
    • Set-oriented access
    • Allows online updates
    • Automatic indexing, replication
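A minimal sketch of "blob and link" plus facets as extra tables, using Python's built-in `sqlite3`. The table and column names (`item`, `link`, `photo_facet`) and the sample rows are hypothetical; MyLifeBits' actual schema is not shown in the talk:

```python
import sqlite3

SCHEMA = """
CREATE TABLE item (
    id      INTEGER PRIMARY KEY,
    content BLOB                -- the object itself: photo, mail, document, ...
);
CREATE TABLE link (             -- typed, directed links between items
    src  INTEGER NOT NULL REFERENCES item(id),
    dst  INTEGER NOT NULL REFERENCES item(id),
    kind TEXT    NOT NULL,
    PRIMARY KEY (src, dst, kind)
);
-- One facet = one extra table of properties for the items that have them.
CREATE TABLE photo_facet (
    item_id  INTEGER PRIMARY KEY REFERENCES item(id),
    taken_on TEXT,
    place    TEXT
);
CREATE INDEX photo_by_date ON photo_facet(taken_on);
"""


def demo():
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany("INSERT INTO item(id, content) VALUES (?, ?)",
                   [(1, b"<jpeg bytes>"), (2, b"<email bytes>")])
    db.execute("INSERT INTO photo_facet VALUES (1, '2002-07-15', 'Heraklion')")
    db.execute("INSERT INTO link VALUES (2, 1, 'attachment')")
    # Set-oriented query: photos from July 2002 and the items linking to them.
    return db.execute("""
        SELECT p.item_id, p.place, l.src
        FROM photo_facet AS p LEFT JOIN link AS l ON l.dst = p.item_id
        WHERE p.taken_on BETWEEN '2002-07-01' AND '2002-07-31'
    """).fetchall()


print(demo())   # [(1, 'Heraklion', 2)]
```

Adding a new kind of metadata means adding another facet table, leaving `item` and `link` unchanged.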

50
How do we represent it to the outside world?
    <?xml version="1.0" encoding="utf-8" ?>
    <DataSet xmlns="http://WWT.sdss.org/">
      <xs:schema id="radec" xmlns="" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
        <xs:element name="radec" msdata:IsDataSet="true">
          <xs:element name="Table">
            <xs:element name="ra" type="xs:double" minOccurs="0" />
            <xs:element name="dec" type="xs:double" minOccurs="0" />
            ...
      </xs:schema>
      <diffgr:diffgram xmlns:msdata="urn:schemas-microsoft-com:xml-msdata" xmlns:diffgr="urn:schemas-microsoft-com:xml-diffgram-v1">
        <radec xmlns="">
          <Table diffgr:id="Table1" msdata:rowOrder="0">
            <ra>184.028935351008</ra>
            <dec>-1.12590950121524</dec>
          </Table>
          ...
          <Table diffgr:id="Table10" msdata:rowOrder="9">
            <ra>184.025719033547</ra>
            <dec>-1.21795827920186</dec>
          </Table>
        </radec>
      </diffgr:diffgram>
    </DataSet>
  • File metaphor too primitive: just a blob
  • Table metaphor too primitive: just records
  • Need metadata describing data context
    • Format
    • Provenance (author/publisher/citations/...)
    • Rights
    • History
    • Related documents
  • In a standard format
    • XML and XML Schema
    • DataSet is a great example of this
    • World is now defining standard schemas

[Figure labels: schema; data or diffgram]
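Because the answer is self-describing XML, a client needs only a generic parser. A sketch with Python's `xml.etree.ElementTree`, run on a trimmed copy of the diffgram rows shown on this slide (the inline schema is omitted); `radec_rows` is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET

DIFFGR = "urn:schemas-microsoft-com:xml-diffgram-v1"

SAMPLE = """<?xml version="1.0" encoding="utf-8"?>
<DataSet xmlns="http://WWT.sdss.org/">
  <diffgr:diffgram xmlns:msdata="urn:schemas-microsoft-com:xml-msdata"
                   xmlns:diffgr="urn:schemas-microsoft-com:xml-diffgram-v1">
    <radec xmlns="">
      <Table diffgr:id="Table1" msdata:rowOrder="0">
        <ra>184.028935351008</ra><dec>-1.12590950121524</dec>
      </Table>
      <Table diffgr:id="Table10" msdata:rowOrder="9">
        <ra>184.025719033547</ra><dec>-1.21795827920186</dec>
      </Table>
    </radec>
  </diffgr:diffgram>
</DataSet>"""


def radec_rows(xml_bytes: bytes) -> list[tuple[str, float, float]]:
    """Return (row id, ra, dec) for every <Table> row in the diffgram."""
    root = ET.fromstring(xml_bytes)
    rows = []
    # <radec xmlns=""> resets the default namespace, so rows are plain "Table".
    for table in root.iter("Table"):
        rows.append((table.get(f"{{{DIFFGR}}}id"),
                     float(table.findtext("ra")),
                     float(table.findtext("dec"))))
    return rows


print(radec_rows(SAMPLE.encode("utf-8")))
```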
51
There is a problem
Niklaus Wirth: Algorithms + Data Structures = Programs
  • GREAT!!!!
    • XML documents are portable objects
    • XML documents are complex objects
    • WSDL defines the methods on objects (the class)
  • But will all the implementations match?
    • Think of UNIX or SQL or C or ...
  • This is a work in progress.

52
PhotoServer: Managing Photos
  • Load all photos into the database
  • Annotate the photos
  • View by various attributes
  • Do similarity search
  • Use XML for interchange
  • Use dbObject, Template for access

[Diagram components: SQL, templates, XML data; IIS; JScript; XML datasets, MIME data; templates, schema; SQL (for XML)]
53
How Similarity Search Works
  • For each picture, the loader
    • Inserts thumbnails
    • Extracts 270 features into a blob
  • When looking for a similar picture
    • Scan all photos comparing features (dot product of vectors)
    • Sort by similarity
  • Feature blob is an array
    • Today I fake the array with functions and cast: cast(substring(feature,72,8) as float)
    • When SQL Server gets C# I will not have to fake it.
    • And it will run 100x faster (compiled managed code).
  • Idea pioneered by IBM Research; we use a variant by MS Beijing Research.

[Figure: feature descriptions such as "no black squares, 20% orange, etc." and "many black squares, 10% orange, etc.", with match scores of 72% and 27%]
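A sketch of the scan-and-sort loop in Python, assuming features are stored as consecutive 8-byte little-endian floats (the slide's `substring(feature,72,8)` suggests 8-byte slices, but the exact layout is an assumption). The feature vectors below are made up; the talk uses 270 real image features:

```python
import struct

N_FEATURES = 270  # feature count from the slide; feature meanings are not modelled


def pack_features(features: list[float]) -> bytes:
    """Store a feature vector as a blob of little-endian 8-byte floats."""
    return struct.pack(f"<{len(features)}d", *features)


def feature_at(blob: bytes, i: int) -> float:
    """Read feature i (0-based): slice 8 bytes out of the blob and decode a float."""
    return struct.unpack_from("<d", blob, 8 * i)[0]


def similarity(a: bytes, b: bytes) -> float:
    """Dot product of two feature blobs."""
    return sum(feature_at(a, i) * feature_at(b, i) for i in range(len(a) // 8))


def most_similar(query: bytes, photos: dict[str, bytes]) -> list[tuple[float, str]]:
    """Scan every photo, score it against the query, and sort best-first."""
    scored = [(similarity(query, blob), photo_id) for photo_id, blob in photos.items()]
    return sorted(scored, reverse=True)


def demo() -> list[tuple[float, str]]:
    # Hypothetical vectors: "near" is a scaled copy of the query, "far" is disjoint.
    query = [1.0 if i % 3 == 0 else 0.0 for i in range(N_FEATURES)]
    near = [0.9 * v for v in query]
    far = [1.0 - v for v in query]
    photos = {"near": pack_features(near), "far": pack_features(far)}
    return most_similar(pack_features(query), photos)


print(demo())
```

If the feature vectors are normalized to unit length when loaded, this dot product equals cosine similarity.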
54
Key Observations
  • Data
    • XML data sets are a universal way to represent answers
    • XML data sets minimize round trips: 1 request/response
  • Search
    • It is BEST to index
    • You can put objects and attributes in a row (SQL puts big blobs off-page)
    • If you can't index, you can extract attributes and quickly compare
    • SQL can scan at 2M records/CPU/second
    • Sequential scans are embarrassingly parallel.

55
What about Big Data
  • Talked about organizing personal data
  • What about BIG data?
  • Most of the following slides inspired by (or even copied from)
    • Alex Szalay, JHU, and
    • George Djorgovski, Caltech

56
Data → Knowledge?
  • Exponential growth of data volume, complexity, quality
  • But SLOW growth of knowledge & understanding
  • Why? Methodology bottleneck; human wetware limitations
  • Need AI-assisted discovery

Adapted from slides by Alex Szalay and George Djorgovski
57
What's needed? (not drawn to scale)
58
How Are Discoveries Made? (adapted from a slide by George Djorgovski)
  • Conceptual discoveries: e.g., relativity, QM, brane world, inflation
    • Theoretical, may be inspired by observations
  • Phenomenological discoveries: e.g., dark matter, QSOs, GRBs, CMBR, extrasolar planets, obscured universe
    • Empirical, inspire theories, can be motivated by them

[Figure: new technical capabilities, observational discoveries, theory]
Phenomenological discoveries: explore parameter space; make new connections (e.g., multi-wavelength). Understanding of complex phenomena requires complex, information-rich data (and simulations?).
59
Data Mining in the Image Domain: Can We Discover New Types of Phenomena Using Automated Pattern Recognition?
  • Every object detection algorithm has its biases and limitations
  • Effective parametrization of source morphologies and environments
  • Multiscale analysis
  • (Also in the time/light-curve domain)
60
Exploration of Parameter Spaces in the Catalog Domain (Source Attributes)
  • Clustering analysis (supervised and unsupervised)
    • How many different types of objects are there?
    • Are there any rare or new types, outliers?
  • Multivariate correlation search
    • Are there significant, nontrivial correlations present in the data?

Clusters vs. correlations:
  • Science → correlations
  • Correlations → reduction of the statistical dimensionality
61
New Science from Multivariate Correlations
[Figure: two data sets, both with data dimension $D_D = 2$: one with statistical dimension $D_S = 2$, one with $D_S = 1$; axes $x_i$, $x_j$, $x_k$; example: Fundamental Plane of E-galaxies]
If $D_S < D_D$, then multivariate correlations are present: $f(x_i, x_j, \ldots)$.
Correlations objectively define types of objects, e.g., TFR → normal spirals, FP → normal ellipticals, and can lead to some new insights.
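One standard way to estimate $D_S$ is principal component analysis: count the directions of the covariance matrix that carry non-negligible variance. A stdlib-only Python sketch for the $D_D = 2$ case in the figure; the relative tolerance `rel_tol` is a hypothetical knob, and real data would need a noise-aware threshold:

```python
import math


def covariance_2d(xs, ys):
    """Population covariance entries (sxx, syy, sxy) of paired samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / n
    syy = sum((y - my) ** 2 for y in ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    return sxx, syy, sxy


def eigenvalues_2x2(sxx, syy, sxy):
    """Eigenvalues of the symmetric matrix [[sxx, sxy], [sxy, syy]], largest first."""
    mean = (sxx + syy) / 2
    r = math.sqrt(((sxx - syy) / 2) ** 2 + sxy * sxy)
    return mean + r, mean - r


def statistical_dimension(xs, ys, rel_tol=1e-3):
    """Count principal axes carrying more than rel_tol of the total variance."""
    lams = eigenvalues_2x2(*covariance_2d(xs, ys))
    total = sum(lams)
    return sum(1 for lam in lams if lam > rel_tol * total)


xs = list(range(10))
print(statistical_dimension(xs, [2 * x + 1 for x in xs]))   # 1: points on a line
print(statistical_dimension([0, 1, 0, 1], [0, 0, 1, 1]))    # 2: no correlation
```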
62
The Curse of Hyper-dimensionality
But $D_D \gg 1$, $D_S \gg 1$: data complexity → multidimensionality → discoveries. But the bad news is the computational cost of clustering analysis:
  • K-means: $K \times N \times I \times D$
  • Expectation Maximization: $K \times N \times I \times D^2$
  • Monte Carlo cross-validation: $M \times K_{max}^2 \times N \times I \times D^2$
where $N$ = no. of data vectors (~1e12), $D$ = no. of data dimensions (~1e4), $K$ = no. of clusters chosen, $K_{max}$ = max no. of clusters tried, $I$ = no. of iterations, $M$ = no. of Monte Carlo trials/partitions.
→ Exascale computing and/or better algorithms.
Some dimensionality reduction methods do exist (e.g., PCA, class prototypes, hierarchical methods, etc.), but more work is needed.
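The $K \times N \times I \times D$ term comes from the assignment step of Lloyd's k-means, which compares every point with every center in every dimension on each iteration. A Python sketch that counts those terms to confirm the formula, then extrapolates to the slide's N and D; K = 10, I = 100, and 1e9 terms/second are hypothetical values:

```python
import random


def kmeans(points, k, iterations):
    """Lloyd's k-means, counting the squared-difference terms it evaluates."""
    centers = [list(p) for p in points[:k]]      # naive initialisation: first k points
    labels = [0] * len(points)
    terms = 0
    for _ in range(iterations):
        # Assignment step: each of N points vs each of K centers over D dimensions.
        for n, p in enumerate(points):
            best, best_dist = 0, float("inf")
            for c, center in enumerate(centers):
                dist = 0.0
                for x, m in zip(p, center):
                    dist += (x - m) ** 2
                    terms += 1
                if dist < best_dist:
                    best, best_dist = c, dist
            labels[n] = best
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels, terms


random.seed(0)
N, D, K, I = 200, 3, 4, 5
pts = [tuple(random.random() for _ in range(D)) for _ in range(N)]
_, _, terms = kmeans(pts, K, I)
print(terms, K * N * I * D)          # 12000 12000

# Extrapolate to N ~ 1e12, D ~ 1e4 with hypothetical K = 10, I = 100,
# on a hypothetical machine evaluating 1e9 terms per second.
seconds = 10 * 1e12 * 100 * 1e4 / 1e9
print(f"~{seconds / 3.156e7:.0f} years")   # ~317 years
```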
63
The Curse of Hyper-dimensionality
  • Visualization!
  • A fundamental limitation of human perception: $D_{MAX}$ = 3? 5? (NB: We can certainly understand mathematically much higher dimensionalities, but cannot really visualize them; our own neural nets are powerful pattern recognition tools)
  • Interactive visualization is a key part of the data mining process
  • Some methodology exists, but much more is needed

[Figure: interaction loop between the DM algorithm, visualization, and the user]
64
Online Multivariate Analysis Challenges
  • Data heterogeneity, biases, selection effects
  • Non-Gaussianity of clusters (data models)
  • Non-trivial topology of clustering
  • Useful vs. useless parameters

Outlier population, or a non-Gaussian tail?
65
Useful vs. Useless Parameters
Clusters (classes) and correlations may
exist/separate in some parameter subspaces, but
not in others
[Figure: the data projected onto parameter axes $x_i$, $x_j$, $x_m$, $x_n$]
66
Optimal Statistics (following slides adapted from Alex Szalay)
  • Statistics algorithms scale poorly
    • Correlation functions: N², likelihood techniques: N³
    • Even if data and computers grow at the same rate, computers can do at most N log N algorithms
  • Possible solutions?
  • Assumes infinite computational resources
  • Assumes that the only source of error is statistical
  • Cosmic variance: we can only observe the Universe from one location (finite sample size)
  • Solutions require a combination of statistics and CS
  • New algorithms: not worse than N log N

67
Clever Data Structures
  • Heavy use of tree structures
  • Initial cost NlnN
  • Large speedup later
  • Tree-codes for correlations (A. Moore et al. 2001)
  • Fast, approximate heuristic algorithms
  • No need to be more accurate than cosmic variance
  • Fast CMB analysis by Szapudi et al. (2001)
    • N log N instead of N³ => 1 day instead of 10 million years
  • Take cost of computation into account
  • Controlled level of accuracy
  • Best result in a given time, given our computing
    resources
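The speedup from ordered data structures shows up even in one dimension. A Python sketch of pair counting (the raw ingredient of a correlation function), comparing an O(N²) double loop with sort plus binary search at O(N log N). This is a 1D stand-in for the idea, not the kd-tree codes of Moore et al.; integer positions keep the comparison exact:

```python
import random
from bisect import bisect_right


def pairs_within_brute(xs: list[int], r: int) -> int:
    """O(N^2): test every pair."""
    n = len(xs)
    return sum(1 for i in range(n) for j in range(i + 1, n) if abs(xs[i] - xs[j]) <= r)


def pairs_within_sorted(xs: list[int], r: int) -> int:
    """O(N log N): sort once, then binary-search the end of each point's window."""
    s = sorted(xs)
    # For s[i], partners j > i satisfy s[j] <= s[i] + r; bisect_right finds the end.
    return sum(bisect_right(s, x + r) - (i + 1) for i, x in enumerate(s))


random.seed(1)
positions = [random.randrange(1_000_000) for _ in range(2_000)]
print(pairs_within_sorted(positions, 500), pairs_within_brute(positions, 500))
```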

68
Angular Clustering with Photo-z
  • w(θ) by Peebles and Groth
  • The first example of publishing and analyzing
    large data
  • Samples based on rest-frame quantities
  • Strictly volume limited samples
  • Largest angular correlation study to date
  • Very clear detection of
  • Luminosity and color dependence
  • Results consistent with 3D clustering

T. Budavari, A. Connolly, I. Csabai, I. Szapudi,
A. Szalay, S. Dodelson, J. Frieman, R. Scranton,
D. Johnston and the SDSS Collaboration
69
The Samples
2800 square degrees in 10 stripes, data in custom DB
  • All: 50M
  • m_r < 21: 15M
  • 10 stripes: 10M
  • 0.1 < z < 0.3, -20 > M_r: 2.2M
  • 0.1 < z < 0.5, -21.4 > M_r: 3.1M
  • -20 > M_r > -21: 1182k
  • -21 > M_r > -23: 931k
  • -21 > M_r > -22: 662k
  • -22 > M_r > -23: 269k
70
The Stripes
  • 10 stripes over the SDSS area, covering about
    2800 square degrees
  • About 20% lost due to bad seeing
  • Masks: seeing, bright stars, etc.
  • Images generated from query by web service

71
The Masks
  • Stripe 11 masks
  • Masks are derived from the database
  • Search and intersect extended objects with
    boundaries

72
The Analysis
  • eSpICE: I. Szapudi, S. Colombi and S. Prunet
  • Integrated with the database by T. Budavari
  • Extremely fast processing (N log N)
    • 1 stripe with about 1 million galaxies is processed in 3 mins
    • Usual figure was 10 min for 10,000 galaxies => 70 days
  • Each stripe processed separately for each cut
  • 2D angular correlation function computed
  • w(θ): average with rejection of pixels along the scan
    • Flat-field vector causes mock correlations
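The "70 days" follows from quadratic scaling: 100 times more galaxies means 100² = 10,000 times more work. A small Python check, assuming the older code's cost grows as N² (the slide does not state the exponent):

```python
def scaled_minutes(base_minutes: float, base_n: int, n: int,
                   exponent: float = 2.0) -> float:
    """Extrapolate a runtime measured at base_n to n, assuming cost ~ N**exponent."""
    return base_minutes * (n / base_n) ** exponent


minutes = scaled_minutes(10, 10_000, 1_000_000)   # 10 min for 10,000 galaxies
print(f"{minutes:,.0f} min = {minutes / (60 * 24):.1f} days")   # 100,000 min = 69.4 days
```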

73
Angular Correlations I.
  • Luminosity dependence 3 cuts
  • -20 > M > -21
  • -21 > M > -22
  • -22 > M > -23

74
Angular Correlations II.
  • Color Dependence
  • 4 bins by rest-frame SED type

75
If there's time
  • Better user interfaces: 0 TaskGalary.MPG
  • Organizing photos: 1 Digital Photo.mpg
  • Organizing newsgroups: 2 Communities.mpg
  • Enhancing meetings: 3 flows.mpg
  • Attentional interfaces: 4 Side Show.mpg

76
Thesis
  • Most new information is digital (and old information is being digitized)
  • A Computer Science Grand Challenge:
    • Capture, organize, summarize, and visualize this information
    • Optimize human attention as a resource
    • Improve information quality

77
Information Avalanche
  • The situation: a census of the data
    • We can record everything
    • Everything is a LOT!
  • The good news
    • Changes science, education, medicine, entertainment, ...
    • Shrinks time and space
    • Can augment human intelligence
  • The bad news
    • The end of privacy
    • Cyber crime / cyber terrorism
    • Monoculture
  • The technical challenges
    • Amplify human intellect
    • Organize, summarize, and prioritize information
    • Make programming easy