Mining Solar Images to Support Astrophysics Research

1
Mining Solar Images to Support Astrophysics
Research

Olfa Nasraoui
Computer Engineering & Computer Science
University of Louisville
Olfa.nasraoui_at_louisville.edu
In collaboration with
Joan Schmelz
Department of Physics
University of Memphis
jschmelz_at_memphis.edu
Acknowledgement: team members who worked on this
project
Nurcan Durak, Sofiane Sellah, Heba Elgazzar,
Carlos Rojas (Univ. of Louisville)
Jonatan Gomez and Fabio Gonzalez (National Univ.
of Colombia)
Jennifer Roames, Kaouther Nasraoui (Univ. of
Memphis)

NASA-AISRP PI Meeting, Univ. Maryland, Oct. 3-5,
2006
2
Outline

Motivations & Goals of the Project
Sources of Data
Methodology
Results on EIT Data
Upcoming Plans

3
Motivations (1): The Coronal Heating Problem

The question of why the solar corona is so hot
has remained one of the most exciting puzzles in
astronomy for the last 60 years.
Temperature increases very steeply
from 6000 degrees in the photosphere (visible
surface of the Sun)
to a few million degrees in the corona (region
500 kilometers above the photosphere).
Even though the Sun is hotter on the inside than
it is on the outside,
the outer atmosphere of the Sun (the corona) is
indeed hotter than the underlying photosphere!
Measurements of the temperature distribution
along the coronal loop length can be used to
support or eliminate various classes of coronal
temperature models.
Scientific analysis requires data observed by
instruments such as EIT, TRACE, and SXT.

4
Motivations (2): Finding Needles in Haystacks
(manually)

The biggest obstacle to completing the coronal
temperature analysis task is collecting the right
data (manually).
The search for interesting images (with coronal
loops) is by far the most time-consuming aspect
of this coronal temperature analysis.
Currently, this process is performed manually.
It is therefore extremely tedious, and hinders
the progress of science in this field.
The next-generation "EIT", called MAGRITE,
scheduled for launch in a few years on NASA's
Solar Dynamics Observatory, should be able to
take
as many images in about four days
as were taken by EIT over 6 years,
and will no doubt need state-of-the-art
techniques to sift through the massive data to
support scientific discoveries.

5
Goals of the Project: Finding Needles in
Haystacks (automatically)

Develop an image retrieval system based on Data
Mining
to quickly sift through data sets downloaded from
online solar image databases
and automatically discover the rare but
interesting images containing solar loops, which
are essential in studies of the Coronal Heating
Problem.
Publish mined knowledge on the web in an
easily exchangeable format for astronomers.

6
Sources of Data

EIT: Extreme UV Imaging Telescope aboard the
NASA/European Space Agency spacecraft called SOHO
(Solar and Heliospheric Observatory)
http://umbra.nascom.nasa.gov/eit
TRACE: NASA's Transition Region And Coronal
Explorer
http://vestige/lmsal.com/TRACE.SXT
SXT: Soft X-ray Telescope database on the
Japanese spacecraft Yohkoh
http://ydac.mssl.ucl.ac.uk/ydac/sxt/sfm-cal-top.html

7
Samples of Data: EIT

16
Steps

Step 1. Sample Image Acquisition and Labeling
   images with and without solar loops, 1020 X 1022,
   2 MB / image
Step 2. Image Preprocessing, Block Extraction, and
   Feature Extraction
Step 3. Building & Evaluating Classification Models
   At block level (is a block a loop or no-loop
   block?)
      10-fold cross validation
      Train, then test on independent set 10 times,
      average results
   At image level (does an image contain a loop
   block?)
      Use model learned from training data
      One global model, or 1 model/solar cycle
      Test on independent set of images from different
      solar cycles
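
The block-level evaluation and the image-level decision listed above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's code: `train_fn` and `predict_fn` are hypothetical placeholders for whichever classifier is used, the shuffled strided fold assignment is an assumption, and the "any block is a loop block" rule is read from the question "does an image contain a loop block?". It assumes at least as many records as folds.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Split range(n_samples) into k shuffled, near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(records, labels, train_fn, predict_fn, k=10, seed=0):
    """Train on k-1 folds, test on the held-out fold, average the accuracies."""
    accuracies = []
    for held_out in k_fold_indices(len(records), k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(records)) if i not in held]
        # train_fn / predict_fn are hypothetical stand-ins for the classifier.
        model = train_fn([records[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct = sum(predict_fn(model, records[i]) == labels[i]
                      for i in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / len(accuracies)


def image_has_loop(block_predictions):
    """Assumed image-level rule: one predicted loop block flags the image."""
    return any(p == "loop" for p in block_predictions)
```

Each block appears in exactly one test fold, so every labeled block is used for testing once and for training nine times.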

17
Step 1. Sample Image Acquisition and Labeling

Used for
downloading training images to use as examples for
the learning stage
Marking the blocks containing interesting loops
Marking data is added as metadata in the header of
the training image

19
Step 2. Image Preprocessing and Block Extraction

Despeckling (to clean noise) and Gradient
Transformation (to bring out the edges)
Phase I (loops outside the solar disk): divide
the area outside the solar disk into blocks with an
optimal size (to maximize overlap with marked areas
over all training images)
Use each block as one data record to extract
individual data attributes for learning and
testing
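
As a rough sketch of the gradient transformation and the out-of-disk block extraction described above: the central-difference gradient, the fixed square tiling, and the parameter names (`cx`, `cy`, `radius`, `block`) are all assumptions made for illustration, and the despeckling step and the search for an optimal block size are not shown.

```python
import math


def gradient_magnitude(image):
    """Central-difference gradient magnitude of a 2-D grid of intensities."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Neighbours are clamped to the image at the borders.
            left = image[r][max(c - 1, 0)]
            right = image[r][min(c + 1, cols - 1)]
            up = image[max(r - 1, 0)][c]
            down = image[min(r + 1, rows - 1)][c]
            out[r][c] = math.hypot((right - left) / 2.0, (down - up) / 2.0)
    return out


def blocks_outside_disk(width, height, cx, cy, radius, block):
    """Top-left corners of square blocks lying entirely outside the disk."""
    corners = []
    for top in range(0, height - block + 1, block):
        for left in range(0, width - block + 1, block):
            # Closest point of the block to the disk centre.
            nearest_x = min(max(cx, left), left + block)
            nearest_y = min(max(cy, top), top + block)
            if math.hypot(nearest_x - cx, nearest_y - cy) >= radius:
                corners.append((left, top))
    return corners
```

Each returned corner identifies one block, which would then be treated as one data record for feature extraction.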

20
Step 2 (cont'd). Block Extraction & Labeling

Starting with a marked (labeled) image
A mark: a rectangle that includes a real loop
Extract several out-of-disk blocks from each
image
Label each block automatically based on overlap
with marked blocks
class 1: Loop
class 2: No loop
Hence, each image will generate several positive &
negative examples
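
The automatic labeling by overlap might look like the sketch below. The rectangle format `(left, top, right, bottom)` and the 50% overlap threshold are assumptions for illustration; the slides do not state the criterion that was actually used.

```python
def overlap_fraction(block, mark):
    """Fraction of the block's area covered by the marked rectangle."""
    width = min(block[2], mark[2]) - max(block[0], mark[0])
    height = min(block[3], mark[3]) - max(block[1], mark[1])
    if width <= 0 or height <= 0:
        return 0.0  # rectangles do not intersect
    block_area = (block[2] - block[0]) * (block[3] - block[1])
    return (width * height) / block_area


def label_block(block, marks, threshold=0.5):
    """Return class 1 (Loop) or class 2 (No loop) for one block.

    threshold is an assumed cut-off, not a value from the slides.
    """
    best = max((overlap_fraction(block, m) for m in marks), default=0.0)
    return 1 if best >= threshold else 2
```

For example, `label_block((0, 0, 10, 10), [(5, 0, 15, 10)])` gives class 1 because half of the block is covered by the mark, while a block far from every mark gets class 2.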

22
Difficult classification problem: Loops come in
different sizes, shapes, intensities, etc.

23
Hardly distinguishable regions without
interesting loops

Inconsistencies in labeling are common
Subjectivity, quality of data

24
Challenging even at the edge level

Which block is NOT a loop block?

25
Challenging even at the edge level

Which block is NOT a loop block?

26
Defective and Asymmetric nature of Loop Shapes
27
Features inside each block - applied on the
original intensity levels

Statistical Features
Mean
Standard Deviation
Smoothness
Third Moment
Uniformity
Entropy

28
Features inside each block - applied on edges

Hough-based Features
First apply Hough transform
Image space → Hough Space (H.S.)
Pixel → parameter combination for a given shape
All pixels → vote for several parameter
combinations
Extract peaks from H.S.
Then construct features based on H.S.
Peak detection is very challenging
Many false peaks (noise)
Bin splitting (peaks are split)
Biggest problem: size of Hough accumulator array
Every pixel votes for all possible curves that go
through this pixel
Combinatorial explosion as we add more parameters
Solution: we feed the Hough space into a stream
clustering algorithm to detect peaks
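
To make the voting step concrete, here is a minimal sketch of a Hough transform for the simplest case, straight lines parameterized as rho = x cos(theta) + y sin(theta). Straight lines are used purely for illustration; the slides do not say which curve family was fitted to loops, and the bin sizes here are arbitrary assumptions.

```python
import math
from collections import Counter


def hough_lines(edge_pixels, n_theta=180, rho_step=1.0):
    """Let every edge pixel vote for each (theta bin, rho bin) it lies on."""
    accumulator = Counter()
    for x, y in edge_pixels:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            accumulator[(t, round(rho / rho_step))] += 1
    return accumulator


def strongest_peak(accumulator):
    """Return (theta bin, rho bin, votes) of one highest-voted cell."""
    (t, rho_bin), votes = accumulator.most_common(1)[0]
    return t, rho_bin, votes
```

Even in this two-parameter case each pixel casts `n_theta` votes, and neighbouring cells often tie or split a peak between them. With more parameters per curve the accumulator grows as the product of the bin counts, which is the storage problem the slide refers to.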

29
Stream clustering (published in SIAM Data Mining,
2006): eliminates the need to store the Hough
accumulator array by processing it in 1 pass

Input: initial scale s0, max. no. of clusters
Output: running (real-time) synopsis of clusters
in input stream
Repeat until end of stream
  Input next data point x
  For each cluster in current synopsis
    Perform Chebyshev test (test for compatibility
    without any assumptions on distributions, but
    requires robust scale estimates)
    If x passes Chebyshev test Then
      Update cluster parameters: centroid, scale
  If no cluster or x fails all Chebyshev tests Then
    Create new cluster (c = x, s = s0)
  Perform pairwise Chebyshev tests to merge
  compatible clusters
    densest cluster absorbs merged cluster
    Centroid updated
  Eliminate clusters with low density
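
The loop above can be approximated by the simplified one-pass sketch below. It is a stand-in, not the published algorithm: the running-mean updates, the use of the point count as a density proxy, the bound `k` on the Chebyshev test, and pruning low-weight clusters only at the end of the stream are all assumptions, and the robust scale estimation mentioned on the slide is not implemented.

```python
def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


class Cluster:
    def __init__(self, point, scale):
        self.centroid = list(point)
        self.scale = scale    # variance-like scale, starts at s0
        self.weight = 1.0     # points absorbed so far (density proxy)


def passes_chebyshev(d2, scale, k):
    # Chebyshev: at most 1/k^2 of any distribution lies beyond k std devs.
    return d2 <= k * k * scale


def merge_compatible(clusters, k):
    merged = True
    while merged:
        merged = False
        for i, a in enumerate(clusters):
            for b in clusters[i + 1:]:
                d2 = sq_dist(a.centroid, b.centroid)
                if passes_chebyshev(d2, a.scale, k) and passes_chebyshev(d2, b.scale, k):
                    # The heavier cluster absorbs the lighter one.
                    big, small = (a, b) if a.weight >= b.weight else (b, a)
                    total = big.weight + small.weight
                    big.centroid = [(big.weight * p + small.weight * q) / total
                                    for p, q in zip(big.centroid, small.centroid)]
                    big.weight = total
                    clusters.remove(small)
                    merged = True
                    break
            if merged:
                break
    return clusters


def stream_cluster(stream, s0, max_clusters, k=3.0, min_weight=2.0):
    clusters = []
    for x in stream:
        matched = False
        for c in clusters:
            d2 = sq_dist(x, c.centroid)
            if passes_chebyshev(d2, c.scale, k):
                c.weight += 1
                eta = 1.0 / c.weight
                c.centroid = [m + eta * (xi - m) for m, xi in zip(c.centroid, x)]
                c.scale += eta * (d2 - c.scale)  # running mean of squared distances
                matched = True
        if not matched:
            clusters.append(Cluster(x, s0))
        clusters = merge_compatible(clusters, k)
        if len(clusters) > max_clusters:
            clusters.sort(key=lambda c: c.weight, reverse=True)
            del clusters[max_clusters:]
    # Assumed pruning rule: drop clusters that never attracted a second point.
    return [c for c in clusters if c.weight >= min_weight]
```

Only the cluster synopsis is kept in memory, so the votes can be consumed as they are generated instead of being stored in a full accumulator array.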

30
Examples of results

Ability to learn (in 1 pass) cluster locations
and sizes (scales) from very noisy data
Clusters of different
densities,
sizes,
shapes

31
Examples of clustering 2-D Hough space
32
Spatial Features
33
Curvature Features
34
Curvature Features
(Figure labels: l, d)
35
Step 3. Classification
37
Block-based results
150 solar images from 1996, 1997, 2000, 2001,
2004 & 2005
403 Loop blocks, 7950 No-loop blocks
38
Loop Mining Tool
40
Image-Based Testing Results
41
Upcoming Plans