Title: Feature mining paradigms for scientific data
1- Feature mining paradigms for scientific data
Ming Jiang, Tat-Sang Choy, Sameep Mehta, Matt Coatney, Steve Barr, Kaden Hazzard, David Richie, Srinivasan Parthasarathy, Raghu Machiraju, David Thompson, John Wilkins, Boyd Gatlin
2- Numerical simulation is replacing experimentation
- The datasets generated are enormous
- This challenges our ability to explore and comprehend the generated data
- Some degree of automation must therefore be incorporated into the data exploration process for large-scale datasets
3- Knowledge discovery and data mining (KDD) refers to the overall process of discovering new patterns or building models from a given dataset.
The ultimate goal of data mining is prediction.
4- The process of data mining consists of the following stages:
- Initial exploration
- Model building or pattern identification
- Validation
- Deployment
5- Traditional data mining techniques for scientific data
- They focus on the business and marketing domains
- When the data is represented as graphs, the problem of finding frequent patterns in the dataset becomes that of discovering subgraphs that occur frequently throughout the entire set of graphs
- Such an abstract approach cannot exploit many of the inherent physical and dynamical properties of flow fields
6- A general framework: generalized feature mining
- 1. Event detection
  - Physically derived attributes (such as swirl in a CFD simulation) are monitored for temporal events at multiple time scales
- 2. Feature mining
  - Locating, characterizing, and tracking features in unsteady phenomena
- 3. Interaction discovery
  - Traditional data mining techniques are applied to the processed features rather than to the unprocessed raw data
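The event-detection stage can be sketched as follows, assuming the monitored attribute (for example, total swirl) is a 1D time series and that an "event" is a time step where a trailing moving average rises above a threshold. The window sizes stand in for the time scales; the event rule, the threshold, and the names `moving_average` and `detect_events` are hypothetical illustrations, not the authors' method.

```python
from typing import Dict, List, Sequence


def moving_average(values: Sequence[float], window: int) -> List[float]:
    """Trailing moving average; the first window - 1 entries average fewer samples."""
    out: List[float] = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the sample leaving the window
        out.append(running / min(i + 1, window))
    return out


def detect_events(values: Sequence[float], windows: Sequence[int],
                  threshold: float) -> Dict[int, List[int]]:
    """For each time scale (window size), list the time steps where the
    smoothed attribute rises above the threshold (hypothetical event rule)."""
    events: Dict[int, List[int]] = {}
    for w in windows:
        smooth = moving_average(values, w)
        events[w] = [t for t in range(1, len(smooth))
                     if smooth[t - 1] <= threshold < smooth[t]]
    return events


# A brief spike at t = 3 and a sustained rise starting at t = 8.
swirl = [0, 0, 0, 5, 0, 0, 0, 0, 3, 3, 3, 3]
print(detect_events(swirl, windows=[1, 4], threshold=1.5))
```

At the finest scale both the spike and the sustained rise register as events; with a window of 4 the spike is smoothed away and only the sustained rise survives, which is the kind of distinction that monitoring at multiple time scales is meant to expose.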
7- A feature is a pattern occurring in a dataset that is the manifestation of correlations among various components of the data.
For instance, in computational fluid dynamics (CFD), common examples of features are vortices, shock waves, and recirculation zones.
8- Feature mining techniques are based on a classification or segmentation of the data
- The purpose is to identify the parts of the dataset that belong to the features
- Feature detection algorithms are highly application-specific
- Feature extraction can achieve data reductions of up to 1:10000 while still permitting the visualization of essential parts of the flow
- The extracted features can be used for visually browsing through the datasets, or they can be combined with other types of visualization
9- Two feature mining paradigms
10- Point Classification Paradigm
- Local feature detection at every point
- Point-based binary classification (verification)
- Aggregation of similarly classified points
- Denoising to eliminate weak features
- Ranking based on saliency of derived attributes
- Shape and dynamical attributes characterization
- Spatial and temporal feature tracking
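The first five stages of the point classification paradigm can be sketched on a 2D grid of per-point detector scores. This is a minimal sketch assuming a scalar score per point, a fixed threshold for the binary classification, 4-connectivity for aggregation, region size as the denoising criterion, and mean score as the saliency measure; all of these choices and the function names are hypothetical, and shape characterization and tracking are omitted.

```python
from collections import deque
from typing import List, Tuple

Grid = List[List[float]]
Region = List[Tuple[int, int]]


def classify_points(score: Grid, threshold: float) -> List[List[bool]]:
    """Point-based binary classification: each point is verified on its own."""
    return [[s > threshold for s in row] for row in score]


def aggregate(mask: List[List[bool]]) -> List[Region]:
    """Group 4-connected points that were classified as feature points."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    regions: List[Region] = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, region = deque([(r, c)]), []
            while queue:  # breadth-first flood fill
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(region)
    return regions


def point_classification(score: Grid, threshold: float,
                         min_size: int) -> List[Tuple[float, Region]]:
    regions = aggregate(classify_points(score, threshold))
    kept = [reg for reg in regions if len(reg) >= min_size]  # denoising
    # Ranking: mean detector score stands in for a saliency measure.
    ranked = [(sum(score[y][x] for y, x in reg) / len(reg), reg) for reg in kept]
    ranked.sort(key=lambda item: item[0], reverse=True)
    return ranked


score = [[0.9, 0.8, 0.0, 0.0],
         [0.7, 0.0, 0.0, 0.6],
         [0.0, 0.0, 0.0, 0.0],
         [0.0, 0.95, 0.9, 0.0]]
for saliency, region in point_classification(score, threshold=0.5, min_size=2):
    print(round(saliency, 3), sorted(region))
```

The isolated point at row 1, column 3 passes the point test but is discarded as a weak feature because its aggregate is too small.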
11- Aggregate Classification Paradigm
- Local feature detection at every point
- Aggregation of contiguous candidate points
- Binary classification (verification) of aggregates
- Denoising to eliminate weak features
- Ranking based on saliency of derived attributes
- Shape and dynamical attributes characterization
- Spatial and temporal feature tracking
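The difference from the point paradigm is the order of the middle stages: candidates are aggregated first, and the binary classification is applied to each aggregate as a whole. A minimal sketch, assuming a permissive per-point candidate threshold and a caller-supplied region-level test; the stand-in test `has_strong_peak` is purely hypothetical (the slides that follow use swirling streamlines for this role).

```python
from typing import Callable, List, Set, Tuple

Point = Tuple[int, int]


def aggregate_candidates(candidates: Set[Point]) -> List[List[Point]]:
    """Aggregate contiguous (4-connected) candidate points before any verification."""
    remaining = set(candidates)
    regions: List[List[Point]] = []
    while remaining:
        stack = [remaining.pop()]
        region: List[Point] = []
        while stack:  # depth-first flood fill
            y, x = stack.pop()
            region.append((y, x))
            for nb in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                if nb in remaining:
                    remaining.remove(nb)
                    stack.append(nb)
        regions.append(sorted(region))
    return sorted(regions)


def aggregate_classification(score: List[List[float]], candidate_threshold: float,
                             verify: Callable[[List[Point]], bool],
                             min_size: int) -> List[List[Point]]:
    # Local detection: a permissive per-point test yields candidates only.
    candidates = {(y, x) for y, row in enumerate(score)
                  for x, s in enumerate(row) if s > candidate_threshold}
    regions = aggregate_candidates(candidates)
    # Binary classification (verification) of each aggregate as a whole.
    verified = [reg for reg in regions if verify(reg)]
    return [reg for reg in verified if len(reg) >= min_size]  # denoising


score = [[0.3, 0.9, 0.3],
         [0.0, 0.0, 0.0],
         [0.3, 0.3, 0.0]]


def has_strong_peak(region: List[Point]) -> bool:
    """Hypothetical region-level test: the aggregate must contain one strong point."""
    return max(score[y][x] for y, x in region) >= 0.8


print(aggregate_classification(score, 0.2, has_strong_peak, min_size=2))
```

The weak points (0, 0) and (0, 2) are kept because the aggregate they belong to passes verification, something a purely point-based test could not decide.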
12- Point Classification: Vortex Detection
A vortex exists when instantaneous streamlines
mapped onto a plane normal to the vortex core
exhibit a roughly circular or spiral pattern,
when viewed from a reference frame moving with
the center of the vortex.
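The definition above is stated in terms of streamline patterns, which is not a point-local test. A commonly used local proxy, swapped in here because the slides do not say which point test the authors use, is the complex-eigenvalue criterion: a grid point is classified as vortical when the velocity gradient tensor there has complex eigenvalues. A minimal 2D sketch, assuming a uniform grid indexed `[y][x]` and central differences; the function names are hypothetical.

```python
from typing import List, Tuple

Field = List[List[float]]


def velocity_gradient(u: Field, v: Field, i: int, j: int,
                      h: float) -> Tuple[float, float, float, float]:
    """Central differences on a uniform grid (spacing h), indexed [y][x]."""
    dudx = (u[i][j + 1] - u[i][j - 1]) / (2 * h)
    dudy = (u[i + 1][j] - u[i - 1][j]) / (2 * h)
    dvdx = (v[i][j + 1] - v[i][j - 1]) / (2 * h)
    dvdy = (v[i + 1][j] - v[i - 1][j]) / (2 * h)
    return dudx, dudy, dvdx, dvdy


def has_complex_eigenvalues(dudx: float, dudy: float, dvdx: float, dvdy: float) -> bool:
    """The 2x2 tensor [[dudx, dudy], [dvdx, dvdy]] has complex eigenvalues
    exactly when its discriminant tr^2 - 4 det is negative."""
    tr = dudx + dvdy
    det = dudx * dvdy - dudy * dvdx
    return tr * tr - 4.0 * det < 0.0


def classify_vortex_points(u: Field, v: Field, h: float = 1.0) -> List[Tuple[int, int]]:
    """Point-based binary classification of interior grid points."""
    rows, cols = len(u), len(u[0])
    return [(i, j) for i in range(1, rows - 1) for j in range(1, cols - 1)
            if has_complex_eigenvalues(*velocity_gradient(u, v, i, j, h))]


# Solid-body rotation (u, v) = (-y, x) swirls everywhere; pure strain (x, -y) never does.
n = 5
rot_u = [[-(i - 2.0) for j in range(n)] for i in range(n)]
rot_v = [[(j - 2.0) for j in range(n)] for i in range(n)]
strain_u = [[(j - 2.0) for j in range(n)] for i in range(n)]
strain_v = [[-(i - 2.0) for j in range(n)] for i in range(n)]
print(len(classify_vortex_points(rot_u, rot_v)), classify_vortex_points(strain_u, strain_v))
```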
13- Point Classification: Shock Detection
A shock is a compression wave that may occur in
fluid flows when the velocity of the fluid
exceeds the local speed of sound.
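The detection criterion uses the normal Mach number

$$M_n = \frac{\mathbf{v}}{c} \cdot \frac{\nabla p}{\lVert \nabla p \rVert}$$

where $c$ is the local acoustic speed, $\nabla p / \lVert \nabla p \rVert$ is the normalized pressure gradient, and $\mathbf{v}$ is the velocity vector.
A shock exists in regions where $M_n$ changes from greater than unity to less than unity in the direction of the flow.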
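14- Point Classification: Shock Detection
The change of $M_n$ in the direction of the flow is a local criterion: it can be evaluated at each point from the local velocity, acoustic speed, and pressure gradient.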
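The criterion can be sketched in 1D, where the dot product with the normalized pressure gradient reduces to multiplying $u/c$ by the sign of $dp/dx$. This is a minimal sketch assuming a uniform grid, flow in the $+x$ direction, and $M_n = 0$ wherever the pressure gradient vanishes; the function names and the test data are hypothetical.

```python
import math
from typing import List, Sequence


def normal_mach(u: Sequence[float], c: Sequence[float], p: Sequence[float],
                h: float = 1.0) -> List[float]:
    """1D form of M_n = (v / c) . (grad p / |grad p|) on a uniform grid."""
    n = len(p)
    mn: List[float] = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        dpdx = (p[hi] - p[lo]) / ((hi - lo) * h)  # one-sided at the ends
        if dpdx == 0.0:
            mn.append(0.0)  # normalized gradient undefined; treat as no shock (assumption)
        else:
            mn.append(u[i] / c[i] * math.copysign(1.0, dpdx))
    return mn


def shock_locations(u: Sequence[float], c: Sequence[float], p: Sequence[float],
                    h: float = 1.0) -> List[int]:
    """Indices i where M_n drops from above 1 to below 1 between i and i + 1,
    assuming the flow moves in the +x direction."""
    mn = normal_mach(u, c, p, h)
    return [i for i in range(len(mn) - 1) if mn[i] > 1.0 and mn[i + 1] < 1.0]


u = [2.0] * 4 + [0.5] * 4            # supersonic upstream, subsonic downstream
c = [1.0] * 8                        # local acoustic speed
compression = [1.0] * 4 + [4.0] * 4  # pressure jumps up across the shock
expansion = compression[::-1]        # same velocities, pressure falling instead
print(shock_locations(u, c, compression), shock_locations(u, c, expansion))
```

The normalized pressure gradient is what separates the two cases: with the same velocity jump, only the compression produces a positive $M_n$ crossing unity, so the expansion is not flagged.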
15- Aggregate Classification: Vortex Core Detection
Direction labeling assigns labels to vectors according to the direction ranges in which they point.
A fully labeled triangular cell, in this case, corresponds to a property we call direction-spanning.
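Direction labeling and the full-label test for a triangular cell can be sketched in 2D as below. The slides only say "direction ranges"; splitting the circle into three equal 120° sectors starting at the $+x$ axis is an assumption, as are the names `direction_label` and `is_direction_spanning`.

```python
import math
from typing import Sequence, Tuple

Vec2 = Tuple[float, float]


def direction_label(vec: Vec2, num_labels: int = 3) -> int:
    """Index of the equal-width angular range (starting at the +x axis) that vec points into."""
    angle = math.atan2(vec[1], vec[0]) % (2.0 * math.pi)
    return min(int(angle / (2.0 * math.pi / num_labels)), num_labels - 1)


def is_direction_spanning(cell: Sequence[Vec2]) -> bool:
    """A triangular cell is fully labeled when its three vertex vectors carry all three labels."""
    return {direction_label(v, 3) for v in cell} == {0, 1, 2}


swirl_cell = [(1.0, 1.0), (-1.0, 1.0), (0.0, -1.0)]  # labels 0, 1, 2
uniform_cell = [(1.0, 0.0), (1.0, 0.1), (1.0, 0.2)]  # all label 0
print(is_direction_spanning(swirl_cell), is_direction_spanning(uniform_cell))
```

A cell whose vertex vectors point into every direction range is the kind of cell the swirl around a core produces, while a cell in nearly uniform flow carries a single label.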
16- 2D vortex core region detection
17- 3D vortex core region detection
18- 3D vortex core region detection
19- 3D vortex core region detection
Point-based vortex detection algorithm applied
to a delta wing with vortex burst.
20- Aggregate Classification: Vortex Core Detection
The algorithm segments candidate vortex core regions by aggregating points identified in the detection step. We then classify (or verify) these candidate core regions based on the existence of swirling streamlines surrounding them.
(a) Detected core regions for blunt fin/flat plate
21- Aggregate Classification: Geometric Verification
Project each streamline tangent vector onto the swirl plane normal to the candidate core tangent vector.
Then check whether the resulting tangent profile spans a full angular measure of 2π around the core.
22- Aggregate Classification: Geometric Verification
Checking for swirling streamlines is a global (or aggregate) approach to feature classification (or verification) because swirling is measured with respect to the entire core region, not just individual points within the core region.
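The swirl test can be sketched by projecting streamline tangents onto the plane normal to the core direction and accumulating the signed angle between successive projections. This is a minimal sketch assuming a single straight core direction, streamlines sampled densely enough that each step turns by less than π, and a hypothetical acceptance threshold `min_angle` of 2π; none of these names come from the source.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def swept_angle(tangents: Sequence[Vec3], core_dir: Vec3) -> float:
    """Signed angle swept by streamline tangents after projection onto the
    swirl plane normal to core_dir."""
    norm = math.sqrt(dot(core_dir, core_dir))
    axis = (core_dir[0] / norm, core_dir[1] / norm, core_dir[2] / norm)
    proj = []
    for t in tangents:
        k = dot(t, axis)  # component along the core tangent, removed by the projection
        proj.append((t[0] - k * axis[0], t[1] - k * axis[1], t[2] - k * axis[2]))
    total = 0.0
    for a, b in zip(proj, proj[1:]):
        total += math.atan2(dot(cross(a, b), axis), dot(a, b))  # signed step angle about the axis
    return total


def has_swirling_streamline(tangents: Sequence[Vec3], core_dir: Vec3,
                            min_angle: float = 2.0 * math.pi) -> bool:
    """Hypothetical acceptance rule: the profile must sweep at least min_angle."""
    return abs(swept_angle(tangents, core_dir)) >= min_angle


# Tangents of a helix wound 1.25 times around the z-axis, versus a straight streamline.
steps = [k * 2.5 * math.pi / 40 for k in range(41)]
helix = [(-math.sin(s), math.cos(s), 0.2) for s in steps]
straight = [(1.0, 0.0, 0.5)] * 41
print(has_swirling_streamline(helix, (0.0, 0.0, 1.0)),
      has_swirling_streamline(straight, (0.0, 0.0, 1.0)))
```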
23- Aggregate Classification: Geometric Verification
Vortices can bend and twist in various ways, so we introduce a local alignment process, based on the probed direction vectors, to accommodate vortex cores with nontrivial curvature. The process individually rotates each direction vector to align with the z-axis and then applies the same transformation to the streamline. In tangent space, the transformed tangent vectors are projected onto the xy-plane to create a tangent profile.
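The alignment step can be sketched with Rodrigues' rotation formula: for each probed core direction, build the rotation that carries it onto $+z$, apply that same rotation to the paired streamline tangent, and keep the $(x, y)$ components. The pairing of one core direction with one tangent sample, and the names `rotate_to_z` and `tangent_profile`, are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def rotate_to_z(d: Vec3, v: Vec3) -> Vec3:
    """Apply to v the rotation that carries the unit vector d onto +z (Rodrigues' formula)."""
    axis = _cross(d, (0.0, 0.0, 1.0))
    s = math.sqrt(_dot(axis, axis))  # sin of the rotation angle
    c = d[2]                         # cos of the rotation angle (d . z)
    if s < 1e-12:
        # d is parallel (no rotation) or antiparallel (half-turn about x) to z.
        return v if c > 0 else (v[0], -v[1], -v[2])
    k = (axis[0] / s, axis[1] / s, axis[2] / s)
    kxv, kv = _cross(k, v), _dot(k, v)
    return (v[0] * c + kxv[0] * s + k[0] * kv * (1 - c),
            v[1] * c + kxv[1] * s + k[1] * kv * (1 - c),
            v[2] * c + kxv[2] * s + k[2] * kv * (1 - c))


def tangent_profile(directions: Sequence[Vec3],
                    tangents: Sequence[Vec3]) -> List[Tuple[float, float]]:
    """Align each probed core direction with z, rotate the paired streamline
    tangent the same way, and keep its (x, y) projection."""
    profile = []
    for d, t in zip(directions, tangents):
        n = math.sqrt(_dot(d, d))
        r = rotate_to_z((d[0] / n, d[1] / n, d[2] / n), t)
        profile.append((r[0], r[1]))
    return profile


# A core segment pointing along +x with tangents swirling around it.
steps = [k * 2 * math.pi / 16 for k in range(16)]
dirs = [(1.0, 0.0, 0.0)] * 16
tans = [(0.2, -math.sin(s), math.cos(s)) for s in steps]
print([round(math.hypot(x, y), 6) for x, y in tangent_profile(dirs, tans)])
```

After alignment the projected tangents all lie on the unit circle around the origin, whereas projecting the raw tangents straight onto the xy-plane would give the points $(0.2, -\sin s)$, which do not revolve around anything.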
24- Aggregate Classification: Geometric Verification
Without the transformation, the tangent vectors can point in various directions without any particular order. With the transformation, the tangent vectors uniformly revolve around the z-axis, forming a cone with its apex at the origin of tangent space.
25- Aggregate Classification: Geometric Verification
Even though the vortex cross section is highly elliptical, the tangent vectors projected into the local swirling plane (gray vectors, bottom panel) still satisfy the swirling criterion.
26- Aggregate Classification: Geometric Verification
28- Characterizing features using shape attributes
An ellipsoid provides a good indication of position, orientation, and size, and is a good approximation of shape.
The swirling region of a vortex can be characterized using a sequence of elliptical frusta.
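One way to recover position, orientation, and size for an end ellipse is a moment-based fit: the centroid gives the position, the principal axes of the 2x2 covariance matrix give the orientation, and the eigenvalues give the semi-axes. This technique is swapped in for illustration; the slides do not state how the ellipses are fitted. The sketch assumes the streamline crossings are spread evenly in angle around the ellipse boundary, so the variance along each principal axis equals half the squared semi-axis; `fit_ellipse` and the sample points are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def fit_ellipse(points: Sequence[Point]) -> Tuple[Point, float, float, float]:
    """Moment-based ellipse fit: returns (center, major-axis angle, a, b).

    Assumes the points are spread evenly in angle around the ellipse boundary,
    so the variance along each principal axis equals (semi-axis)^2 / 2."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    sxx = sum((p[0] - cx) ** 2 for p in points) / n
    syy = sum((p[1] - cy) ** 2 for p in points) / n
    sxy = sum((p[0] - cx) * (p[1] - cy) for p in points) / n
    # Closed-form eigenvalues of the 2x2 covariance matrix.
    mean = (sxx + syy) / 2
    spread = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    lam_major, lam_minor = mean + spread, max(mean - spread, 0.0)
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (cx, cy), angle, math.sqrt(2 * lam_major), math.sqrt(2 * lam_minor)


# Hypothetical crossings on an ellipse: center (1, 2), a = 3, b = 1, rotated 30 degrees.
phi = math.radians(30)
pts = []
for k in range(64):
    t = 2 * math.pi * k / 64
    x, y = 3 * math.cos(t), math.sin(t)
    pts.append((1 + x * math.cos(phi) - y * math.sin(phi),
                2 + x * math.sin(phi) + y * math.cos(phi)))
print(fit_ellipse(pts))
```

If the crossings instead filled the interior uniformly, the scale factor would change from 2 to 4 (the variance of a uniformly filled ellipse along an axis is a quarter of the squared semi-axis).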
29- Shape characterization process
- Finest level of approximation to the swirling region: each elliptical frustum is oriented along a segment of the vortex core, and the ellipses at each end of the frustum are fitted to the set of streamlines intersecting the swirling plane
- The next level of elliptical frusta is produced by merging every contiguous pair of frusta at the current level; the merge preserves their volumes in the new frustum while averaging the rest of their shape attributes
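The merging hierarchy can be sketched with each frustum reduced to a few attributes: merging a contiguous pair adds heights and volumes and averages orientation and aspect ratio. The attribute set, carrying an unpaired last frustum up unchanged, and averaging orientation on the doubled angle (ellipse axes are only defined modulo π) are assumptions; `Frustum`, `merge_pair`, and `build_hierarchy` are hypothetical names. The volume helper assumes semi-axes that vary linearly between the two end ellipses, which integrates to $\frac{\pi h}{6}(2a_1b_1 + a_1b_2 + a_2b_1 + 2a_2b_2)$.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Frustum:
    height: float  # length of the core segment the frustum covers
    volume: float
    angle: float   # orientation of the ellipse major axis in the swirl plane (mod pi)
    aspect: float  # minor/major semi-axis ratio


def elliptic_frustum_volume(a1: float, b1: float, a2: float, b2: float, h: float) -> float:
    """Volume when the semi-axes vary linearly from (a1, b1) to (a2, b2) over height h."""
    return math.pi * h * (2 * a1 * b1 + a1 * b2 + a2 * b1 + 2 * a2 * b2) / 6


def merge_pair(f: Frustum, g: Frustum) -> Frustum:
    """Volumes and heights add; orientation and aspect ratio are averaged."""
    angle = 0.5 * math.atan2(math.sin(2 * f.angle) + math.sin(2 * g.angle),
                             math.cos(2 * f.angle) + math.cos(2 * g.angle))
    return Frustum(f.height + g.height, f.volume + g.volume, angle, (f.aspect + g.aspect) / 2)


def next_level(level: List[Frustum]) -> List[Frustum]:
    merged = [merge_pair(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
    if len(level) % 2:
        merged.append(level[-1])  # an unpaired last frustum is carried up unchanged
    return merged


def build_hierarchy(finest: List[Frustum]) -> List[List[Frustum]]:
    levels = [finest]
    while len(levels[-1]) > 1:
        levels.append(next_level(levels[-1]))
    return levels


finest = [Frustum(1.0, elliptic_frustum_volume(2, 1, 2, 1, 1.0), 0.1 * k, 0.5) for k in range(5)]
levels = build_hierarchy(finest)
print([len(level) for level in levels], levels[-1][0].volume)
```

Because volumes only ever add, the single frustum at the coarsest level encloses the same total volume as the finest-level sequence.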
31- This article proposes two paradigms for detecting significant features.
The idea behind both approaches is to exploit the physics of the problem at hand to develop highly discriminating, application-dependent feature detection algorithms, and then to use available data mining algorithms to classify, cluster, and categorize the identified features.