Title: Mining Solar Images to Support Astrophysics Research
1Mining Solar Images to Support Astrophysics
Research
- Olfa Nasraoui
- Computer Engineering and Computer Science
- University of Louisville
- Olfa.nasraoui_at_louisville.edu
- In collaboration with
- Joan Schmelz
- Department of Physics
- University of Memphis
- jschmelz_at_memphis.edu
- Acknowledgement: team members who worked on this project
  - Nurcan Durak, Sofiane Sellah, Heba Elgazzar, Carlos Rojas (Univ. of Louisville)
  - Jonatan Gomez and Fabio Gonzalez (National Univ. of Colombia)
  - Jennifer Roames, Kaouther Nasraoui (Univ. of Memphis)
NASA-AISRP PI Meeting, Univ. Maryland, Oct. 3-5, 2006
2Outline
- Motivations and Goals of the Project
- Sources of Data
- Methodology
- Results on EIT Data
- Upcoming Plans
3Motivations (1): The Coronal Heating Problem
- The question of why the solar corona is so hot has remained one of the most exciting astronomy puzzles for the last 60 years.
- Temperature increases very steeply
  - from 6000 degrees in the photosphere (visible surface of the Sun)
  - to a few million degrees in the corona (region 500 kilometers above the photosphere).
- Even though the Sun is hotter on the inside than it is on the outside,
- the outer atmosphere of the Sun (the corona) is indeed hotter than the underlying photosphere!
- Measurements of the temperature distribution along the coronal loop length can be used to support or eliminate various classes of coronal temperature models.
- Scientific analysis requires data observed by instruments such as EIT, TRACE, and SXT.
4Motivations (2): Finding Needles in Haystacks (manually)
- The biggest obstacle to completing the coronal temperature analysis task is collecting the right data (manually).
- The search for interesting images (with coronal loops) is by far the most time-consuming aspect of this coronal temperature analysis.
- Currently, this process is performed manually.
- It is therefore extremely tedious, and hinders the progress of science in this field.
- The next-generation "EIT", called MAGRITE, scheduled for launch in a few years on NASA's Solar Dynamics Observatory, should be able to take
  - as many images in about four days
  - as were taken by EIT over 6 years!
  - and will no doubt need state-of-the-art techniques to sift through the massive data to support scientific discoveries
5Goals of the project: Finding Needles in Haystacks (automatically)
- Develop an image retrieval system based on Data Mining
  - to quickly sift through data sets downloaded from online solar image databases
  - and automatically discover the rare but interesting images containing solar loops, which are essential in studies of the Coronal Heating Problem
- Publish mined knowledge on the web in an easily exchangeable format for astronomers.
6Sources of Data
- EIT: Extreme UV Imaging Telescope aboard the NASA/European Space Agency spacecraft called SOHO (Solar and Heliospheric Observatory)
  - http://umbra.nascom.nasa.gov/eit
- TRACE: NASA's Transition Region And Coronal Explorer
  - http://vestige/lmsal.com/TRACE.SXT
- SXT: Soft X-ray Telescope database on the Japanese spacecraft Yohkoh
  - http://ydac.mssl.ucl.ac.uk/ydac/sxt/sfm-cal-top.html
7Samples of Data: EIT
16Steps
- Sample Image Acquisition and Labeling
  - images with and without solar loops, 1020 x 1022, 2 MB / image
- Image Preprocessing, Block Extraction, and Feature Extraction
- Building and Evaluating Classification Models
  - At block level (is a block a loop or no-loop block?)
    - 10-fold cross validation
    - Train, then test on independent set 10 times, average results
  - At image level (does an image contain a loop block?)
    - Use model learned from training data
    - One global model, or 1 model/solar cycle
    - Test on independent set of images from different solar cycles
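The block-level evaluation (10-fold cross validation, averaging the results) can be sketched as follows. This is a minimal illustration, not the project's code: the function names are hypothetical, and the majority-class "classifier" is only a stand-in so the example is self-contained.

```python
import random

def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(features, labels, train, predict, k=10, seed=0):
    """Train on k-1 folds, test on the held-out fold, and average the accuracy."""
    folds = k_fold_indices(len(labels), k, seed)
    accuracies = []
    for held_out in folds:
        test_set = set(held_out)
        train_idx = [i for i in range(len(labels)) if i not in test_set]
        model = train([features[i] for i in train_idx],
                      [labels[i] for i in train_idx])
        correct = sum(predict(model, features[i]) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / k

# Hypothetical stand-in classifier: always predict the majority training class.
def train_majority(xs, ys):
    return max(set(ys), key=ys.count)

def predict_majority(model, x):
    return model
```

With rare loop blocks, such a baseline already scores high accuracy, which is one reason to look at per-class results as well as the average. The sketch assumes at least k records, so that no fold is empty.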
17Step 1. Sample Image Acquisition and Labeling
- Used for
  - downloading training images to use as examples for the learning stage
  - marking the blocks containing interesting loops
- Marking data is added as metadata in the header of the training image
19Step 2. Image Preprocessing and Block Extraction
- Despeckling (to clean noise) and Gradient Transformation (to bring out the edges)
- Phase I (loops out of solar disk): divide area outside solar disk into blocks with an optimal size (to maximize overlap with marked areas over all training images)
- Use each block as one data record to extract individual data attributes for learning and testing
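A rough sketch of this preprocessing step is given below. The slides do not say which filters were used, so the 3x3 median filter (despeckling), the central-difference gradient magnitude, the non-overlapping tiling, and all function names are assumptions made for illustration. Images are plain lists of rows.

```python
import math

def median_despeckle(img):
    """3x3 median filter; border pixels are copied unchanged."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = sorted(img[y + dy][x + dx]
                            for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            out[y][x] = window[4]  # middle of the 9 sorted values
    return out

def gradient_magnitude(img):
    """Central-difference gradient magnitude; border pixels are set to 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y][x + 1] - img[y][x - 1]) / 2.0
            gy = (img[y + 1][x] - img[y - 1][x]) / 2.0
            out[y][x] = math.hypot(gx, gy)
    return out

def off_disk_blocks(h, w, block, cx, cy, radius):
    """Top-left corners of non-overlapping blocks whose centre is outside the disk."""
    corners = []
    for top in range(0, h - block + 1, block):
        for left in range(0, w - block + 1, block):
            by, bx = top + block / 2.0, left + block / 2.0
            if math.hypot(bx - cx, by - cy) > radius:
                corners.append((top, left))
    return corners
```

Here the block size is a free parameter; the slides choose it to maximize overlap with the marked areas over all training images, which this sketch does not attempt.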
20Step 2 (contd.): Block Extraction and Labeling
- Starting with marked (labeled) image
  - A mark: rectangle that includes a real loop
- Extract several out-of-disk blocks from each image
- Label each block automatically based on overlap with marked blocks
  - class 1: Loops
  - class 2: No loop
- Hence, will generate several positive and negative examples
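The automatic labeling by overlap might look like the sketch below. The rectangle representation `(top, left, bottom, right)`, the function names, and the 50% overlap threshold are assumptions; the slides do not state the overlap criterion that was actually used.

```python
def overlap_area(a, b):
    """Intersection area of two rectangles given as (top, left, bottom, right)."""
    height = min(a[2], b[2]) - max(a[0], b[0])
    width = min(a[3], b[3]) - max(a[1], b[1])
    return max(height, 0) * max(width, 0)

def label_block(block, marks, min_fraction=0.5):
    """Return 1 (loop) if any mark covers at least min_fraction of the block, else 2."""
    block_area = (block[2] - block[0]) * (block[3] - block[1])
    for mark in marks:
        if overlap_area(block, mark) >= min_fraction * block_area:
            return 1
    return 2
```

For example, with a single mark `(0, 0, 10, 10)`, the block `(0, 0, 8, 8)` lies entirely inside the mark and is labeled class 1, while `(8, 8, 16, 16)` overlaps it by only 4 of its 64 pixels and is labeled class 2.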
22Difficult classification problem: Loops come in
different sizes, shapes, intensities, etc.
23Hardly distinguishable regions without
interesting loops
- Inconsistencies in labeling are common
- Subjectivity, quality of data
24Even at the edge level, classification is challenging
- Which block is NOT a loop block?
25Even at the edge level, classification is challenging
- Which block is NOT a loop block?
26Defective and Asymmetric nature of Loop Shapes
27Features inside each block - applied on the
original intensity levels
- Statistical Features
- Mean
- Standard Deviation
- Smoothness
- Third Moment
- Uniformity
- Entropy
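The six statistics listed above can be computed from a block's intensity histogram. The slide gives only their names, so the definitions below (the usual histogram-based texture descriptors, with smoothness taken as 1 - 1/(1 + normalized variance)) are an assumption, as is the function name.

```python
import math

def texture_features(block, levels=256):
    """Histogram-based statistics of a block of integer intensities in [0, levels)."""
    pixels = [v for row in block for v in row]
    n = len(pixels)
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1
    p = [count / n for count in hist]  # normalized histogram

    mean = sum(z * pz for z, pz in enumerate(p))
    variance = sum((z - mean) ** 2 * pz for z, pz in enumerate(p))
    third = sum((z - mean) ** 3 * pz for z, pz in enumerate(p))
    # Variance is normalized by (levels - 1)^2 so smoothness stays in [0, 1).
    norm_var = variance / (levels - 1) ** 2
    return {
        "mean": mean,
        "std": math.sqrt(variance),
        "smoothness": 1.0 - 1.0 / (1.0 + norm_var),
        "third_moment": third,
        "uniformity": sum(pz * pz for pz in p),
        "entropy": -sum(pz * math.log2(pz) for pz in p if pz > 0),
    }
```

A constant block gives zero standard deviation and entropy and uniformity 1; a block split evenly between two gray levels gives uniformity 0.5 and entropy 1 bit.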
28Features inside each block - applied on edges
- Hough-based Features
  - First apply Hough transform
    - Image space → Hough Space (H.S.)
    - Pixel → parameter combination for a given shape
    - All pixels → vote for several parameter combinations
  - Extract peaks from H.S.
  - Then construct features based on H.S.
- Peak detection is very challenging
  - Many false peaks (noise)
  - Bin splitting (peaks are split)
- Biggest problem: size of Hough accumulator array
  - Every pixel votes for all possible curves that go through this pixel
  - Combinatorial explosion as we add more parameters
- Solution: we feed the Hough space into a stream clustering algorithm to detect peaks
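To make the voting idea concrete, here is a minimal Hough transform for the simplest shape, a straight line in (rho, theta) form. The slides target curved loop shapes with more parameters; straight lines, the function names, and the bin sizes are assumptions chosen to keep the sketch short.

```python
import math
from collections import Counter

def hough_lines(edge_pixels, n_theta=180, rho_step=1.0):
    """Accumulate votes in (rho, theta) space for a list of (x, y) edge pixels."""
    votes = Counter()
    for x, y in edge_pixels:
        # Each pixel votes for every line rho = x cos(theta) + y sin(theta) through it.
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            votes[(round(rho / rho_step), t)] += 1
    return votes

def top_peaks(votes, count=1):
    """Naive peak picking: the `count` most-voted accumulator cells."""
    return votes.most_common(count)
```

For 20 pixels on the horizontal line y = 5, the cell (rho = 5, theta = 90 degrees) collects all 20 votes, but so do the neighboring cells at 89 and 91 degrees, a small instance of the peak-detection ambiguity noted above. Each extra shape parameter adds a dimension to the accumulator, which is the combinatorial explosion the stream clustering step is meant to avoid.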
29Stream clustering (published in SIAM Data Mining, 2006): eliminate need to store Hough accumulator array by processing it in 1 pass
- Input: initial scales s0, max. no. of clusters
- Output: running (real-time) synopsis of clusters in input stream
- Repeat until end of stream
  - Input next data point x
  - For each cluster in current synopsis
    - Perform Chebyshev test (test for compatibility without any assumptions on distributions, but requires robust scale estimates)
    - If x passes Chebyshev test Then
      - Update cluster parameters: centroid, scale
  - If no cluster or x fails all Chebyshev tests Then
    - Create new cluster (c = x, s = s0)
  - Perform pairwise Chebyshev tests to merge compatible clusters
    - densest cluster absorbs merged cluster
    - Centroid updated
  - Eliminate clusters with low density
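The loop above can be illustrated with a heavily simplified one-pass sketch. It is not the published algorithm: the plain running averages used for centroid and scale (rather than robust estimates), the parameters `alpha` and `min_weight`, and all names are assumptions. The compatibility test uses the Chebyshev bound P(d^2 >= s / alpha) <= alpha, where s estimates the mean squared distance to the centroid.

```python
class Cluster:
    def __init__(self, point, scale):
        self.centroid = list(point)
        self.scale = scale    # running estimate of mean squared distance
        self.weight = 1.0     # number of absorbed points (density proxy)

def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def passes_chebyshev(cluster, point, alpha):
    """Distribution-free test: accept if d^2 is within scale / alpha."""
    return sq_dist(cluster.centroid, point) <= cluster.scale / alpha

def absorb(cluster, point):
    """Incrementally update centroid and scale with one new point."""
    d2 = sq_dist(cluster.centroid, point)
    cluster.weight += 1.0
    w = 1.0 / cluster.weight
    cluster.centroid = [c + w * (p - c) for c, p in zip(cluster.centroid, point)]
    cluster.scale += w * (d2 - cluster.scale)

def merge_compatible(clusters, alpha):
    """Merge pairs whose centroids pass each other's test; the denser one survives."""
    kept = []
    for c in sorted(clusters, key=lambda c: c.weight, reverse=True):
        for k in kept:
            if (passes_chebyshev(k, c.centroid, alpha)
                    and passes_chebyshev(c, k.centroid, alpha)):
                total = k.weight + c.weight
                k.centroid = [(k.weight * a + c.weight * b) / total
                              for a, b in zip(k.centroid, c.centroid)]
                # Simplification: ignores the spread between the two centroids.
                k.scale = (k.weight * k.scale + c.weight * c.scale) / total
                k.weight = total
                break
        else:
            kept.append(c)
    return kept

def stream_cluster(stream, s0, max_clusters, alpha=0.1, min_weight=2.0):
    """One pass over the stream; returns the clusters that gathered enough points."""
    clusters = []
    for point in stream:
        hits = [c for c in clusters if passes_chebyshev(c, point, alpha)]
        for c in hits:
            absorb(c, point)
        if not hits:
            clusters.append(Cluster(point, s0))
        clusters = merge_compatible(clusters, alpha)
        if len(clusters) > max_clusters:
            # Keep only the densest clusters to respect the budget.
            clusters.sort(key=lambda c: c.weight, reverse=True)
            clusters = clusters[:max_clusters]
    return [c for c in clusters if c.weight >= min_weight]
```

Only the cluster synopsis is kept in memory, never the full set of points, which is what allows the Hough votes to be consumed as a stream instead of being stored in an accumulator array.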
30Examples of results
- Ability to learn (in 1 pass) cluster locations and sizes (scales) from very noisy data
- Clusters of different
  - densities,
  - sizes,
  - shapes
31Examples of clustering 2-D Hough space
32Spatial Features
33Curvature Features
34Curvature Features (figure labels: l, d)
35Step 3. Classification
37Block-based results
- 150 solar images from 1996, 1997, 2000, 2001, 2004, and 2005
- 403 Loop blocks
- 7950 No-loop blocks
38Loop Mining Tool
40Image Based Testing Results
41Upcoming Plans
- Construction of better shape features from outputs of clustering of Hough space
- TRACE data sets
  - Started
- Online learning
  - Users can change the label after seeing results
  - System adapts learned models online
- Use testing tool for labeling
  - Can help at least as a filter
- Improve demo and downloadable tools