Automated classification of rainfall systems using statistical characterization

Slides: 130

Provided by: gpetr

Transcript and Presenter's Notes

1
Automated classification of rainfall systems
using statistical characterization

Michael Baldwin
University of Oklahoma
CIMMS

2
Motivation

Verification and predictability
e.g., Ebert and McBride (2000)
Climatology
e.g., Houze et al. (1990)
Forecasting
e.g., Doswell et al. (1996)
Diagnosis of ensemble forecasts
e.g., Elmore et al. (2002)

3
Subjective classification

From Houze et al. (1990)
Leading line-trailing stratiform
Symmetric/asymmetric
Unclassifiable class

4
Verification of detailed forecasts
[Figure: observed 1-h precipitation and two forecasts]
Forecast 1: RMSE 3.4, MAE 0.97, ETS 0.06
Forecast 2: RMSE 1.7, MAE 0.64, ETS 0.00

12-h forecasts of 1-h precipitation valid 00Z 24 Apr 2003
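The scores above compare forecast and observed values grid point by grid point. A minimal Python sketch of RMSE, MAE, and the equitable threat score (ETS) follows; it assumes flattened grids and a hypothetical 1 mm event threshold for ETS (the slide does not state one). The toy case shows a rain band forecast with the right shape but displaced by two grid boxes scoring worse on all three measures than a forecast of no rain.

```python
import math

def verification_scores(forecast, observed, threshold=1.0):
    """Return (RMSE, MAE, ETS) for two equal-length lists of grid values (mm).

    threshold: hypothetical rain/no-rain event threshold used for ETS.
    """
    n = len(observed)
    errors = [f - o for f, o in zip(forecast, observed)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n

    pairs = list(zip(forecast, observed))
    hits = sum(1 for f, o in pairs if f >= threshold and o >= threshold)
    misses = sum(1 for f, o in pairs if f < threshold and o >= threshold)
    false_alarms = sum(1 for f, o in pairs if f >= threshold and o < threshold)
    # Hits expected by chance given the forecast and observed event frequencies
    hits_random = (hits + misses) * (hits + false_alarms) / n
    denom = hits + misses + false_alarms - hits_random
    ets = (hits - hits_random) / denom if denom else 0.0
    return rmse, mae, ets

observed = [0, 0, 10, 10, 0, 0, 0, 0]   # 1-D slice through a rain band
displaced = [0, 0, 0, 0, 10, 10, 0, 0]  # right shape, shifted two boxes
no_rain = [0] * 8
print(verification_scores(displaced, observed))
print(verification_scores(no_rain, observed))
```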

5
Outline

Training
48 cases from Aug-Nov 2000 (target data set)
Histogram analysis
Correlogram analysis
Automation
Classification
Object identification
Validation
100 cases from 2002 data

6
Classify people?
7
Classification terminology

Objects: things you wish to classify
Individuals, cases, subjects, entities
Attributes: descriptions of the objects
Variables, features, descriptors,
characteristics, properties
Two types of automated classification
Supervised
Unsupervised

8
Object classes are known ahead of time: where
should new objects go?
9
Using cluster analysis to discover classes
10
What if you know the classes, but don't know how
to characterize the objects in such a way that an
automated classification will agree with your
classification?

First, build a training data set
Second, put together a set of trial attributes
that might be useful in a classification
Third, do a lot of unsupervised classification
experiments using combinations of trial
attributes
Next, build a supervised classification procedure
from the essential trial attributes

11
Target data set - training

48 cases
Summer-Fall of 2000
Fixed domain size

12
Target data set - training

Cases selected by hand
Populate data set with typical rainfall systems

13
Target data set - training

NCEP Stage IV radar-gauge analyses
1-h accumulation
128 x 128 grid of 4-km boxes

14
Target data set - training

Variety of phenomena, geographical locations,
times

15
Expert (subjective) classification

Two-level classification hierarchy
Rain systems
Two-class: Convective, Non-convective
Three-class: Linear and Cellular (convective), Stratiform (non-convective)

16
Subjective classification based upon objective
criteria

Considering intensity and degree of alignment or
linear organization
Significant fraction > 5 mm/hr: Convective
Otherwise: Stratiform
Bounding box around convective region with aspect
ratio > 3:1: Linear
Otherwise: Cellular
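These criteria are concrete enough to sketch as code. The Python version below is a sketch, not the expert procedure itself: the 10% value standing in for "significant fraction" is a hypothetical choice, the fraction is taken over rainy grid boxes, and the bounding box is axis-aligned, so a diagonal line would need a rotated box.

```python
def classify_by_rules(rain, conv_rate=5.0, min_conv_fraction=0.1, min_aspect=3.0):
    """Three-class label following the slide's criteria.

    rain: 2-D list of 1-h accumulations (mm, i.e. mm/hr).
    min_conv_fraction: hypothetical stand-in for "significant fraction".
    """
    rainy = [(i, j) for i, row in enumerate(rain) for j, v in enumerate(row) if v > 0]
    conv = [(i, j) for i, j in rainy if rain[i][j] > conv_rate]
    if not rainy or len(conv) / len(rainy) < min_conv_fraction:
        return "stratiform"
    # Axis-aligned bounding box around the convective grid boxes
    rows = [i for i, _ in conv]
    cols = [j for _, j in conv]
    height = max(rows) - min(rows) + 1
    width = max(cols) - min(cols) + 1
    aspect = max(height, width) / min(height, width)
    return "linear" if aspect > min_aspect else "cellular"
```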

17
Intensity-related attributes

Compact way to describe how much rain fell
Histogram analysis
Parameters of a statistical distribution fit to
histogram will be used as attributes

18
Attributes: gamma distribution parameters

Gamma PDF depends on 2 parameters: shape (α) and
scale (β)
$f(x; \alpha, \beta) = \frac{(x/\beta)^{\alpha-1} \exp(-x/\beta)}{\beta\,\Gamma(\alpha)}, \quad x \ge 0,\ \alpha, \beta > 0$

[Figure panels: shape; scale]

19
Example: shape, scale (2 moments)
[Figure: clusters 1-4 in shape-scale space]

Percent correct: 3-class 63.8%, 2-class 97.8%

20
Spatial organization related attributes
[Figure: separation vector h from tail point x_t to head point x_h]

Geostatistics
Measure aspects of spatial field as a function of
separation vector h
Variogram
Covariance
Correlogram
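A sample correlogram value for one separation vector h can be computed directly from pairs of grid boxes (tail x, head x + h). The Python sketch below assumes a plain 2-D list on a regular grid and uses only pairs that fall inside the domain; evaluating it over many (dy, dx) lags gives the correlogram surface whose contours are summarized later. In the synthetic east-west band, correlation stays high along the band and drops across it.

```python
import math

def correlogram(field, dy, dx):
    """Sample correlation between z(x) and z(x + h) for h = (dy, dx) grid boxes."""
    ny, nx = len(field), len(field[0])
    tails, heads = [], []
    for i in range(ny):
        for j in range(nx):
            ih, jh = i + dy, j + dx
            if 0 <= ih < ny and 0 <= jh < nx:  # keep pairs inside the domain
                tails.append(field[i][j])
                heads.append(field[ih][jh])
    n = len(tails)
    mt, mh = sum(tails) / n, sum(heads) / n
    cov = sum((t - mt) * (h - mh) for t, h in zip(tails, heads)) / n
    st = math.sqrt(sum((t - mt) ** 2 for t in tails) / n)
    sh = math.sqrt(sum((h - mh) ** 2 for h in heads) / n)
    if st == 0 or sh == 0:
        return float("nan")  # correlation undefined for a constant field
    return cov / (st * sh)

# Synthetic east-west rain band: rows 4-5 rain, the rest dry
band = [[10.0 if i in (4, 5) else 0.0 for _ in range(10)] for i in range(10)]
print(correlogram(band, 0, 2))  # along the band
print(correlogram(band, 2, 0))  # across the band
```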

21
Synthetic data

Similar to Wood and Brown (1986), who used Doppler
velocity data
[Figure panels: synthetic rainfall; correlogram]

22
Synthetic data

Degree of linear organization related to shape of
correlation contours

23
Image processing
24
Image processing

Thresholding, connected component labeling, edge
detection
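Thresholding and connected component labeling can be sketched with a breadth-first search. This Python version assumes 4-connectivity (edge neighbours only) and a rain/no-rain threshold of 0 mm; both are assumptions, since the slide does not say which were used. Edge detection is not shown.

```python
from collections import deque

def label_components(rain, threshold=0.0):
    """Label 4-connected regions where rain > threshold.

    Returns (labels, count): labels is a grid of ints, 0 = no rain,
    1..count = region number in scan order.
    """
    ny, nx = len(rain), len(rain[0])
    labels = [[0] * nx for _ in range(ny)]
    count = 0
    for i in range(ny):
        for j in range(nx):
            if rain[i][j] > threshold and labels[i][j] == 0:
                count += 1  # new, unlabeled rain region found
                labels[i][j] = count
                queue = deque([(i, j)])
                while queue:
                    ci, cj = queue.popleft()
                    for ni, nj in ((ci + 1, cj), (ci - 1, cj), (ci, cj + 1), (ci, cj - 1)):
                        if (0 <= ni < ny and 0 <= nj < nx
                                and rain[ni][nj] > threshold and labels[ni][nj] == 0):
                            labels[ni][nj] = count
                            queue.append((ni, nj))
    return labels, count
```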

25
Example: shape, scale, 0.6 contour eccentricity
[Figure: clusters 1-5]

Percent correct: 3-class 76.1%, 2-class 100%

26
Essential attributes

gamma scale parameter (β) and correlogram contour
eccentricity (a/b)
No clear advantage to standardization; these
attributes have approximately the same range
Addition of contour area and/or gamma shape parameter (α)
did not improve classification
Expand the number of correlation contours to 0.2,
0.4, 0.6, and 0.8

27
Example: scale, eccentricity of 0.2, 0.4, 0.6,
0.8 contours
[Figure: clusters 1-5]

Percent correct: 3-class 90.5%, 2-class 100%

28
Automated classification

Now using supervised classification
Find cluster means from best HCA results
Any new object will be classified by the nearest
of these 5 cluster means
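The supervised step reduces to a nearest-centroid rule. In the Python sketch below the five cluster means, the two attributes (β and one a/b value), and the class attached to each cluster are hypothetical placeholders; the presentation does not list the actual means. Each cluster carries the dominant class found for it in the HCA step.

```python
import math

# Hypothetical cluster means in (beta, a/b) attribute space, each tagged with
# the dominant class of its cluster. Not the values from the presentation.
CLUSTER_MEANS = {
    "cluster 1": ((1.0, 1.2), "stratiform"),
    "cluster 2": ((4.0, 1.5), "cellular"),
    "cluster 3": ((6.0, 2.0), "cellular"),
    "cluster 4": ((4.0, 4.0), "linear"),
    "cluster 5": ((7.0, 6.0), "linear"),
}

def classify_object(attributes, cluster_means=CLUSTER_MEANS):
    """Return the class of the cluster mean nearest (Euclidean) to attributes."""
    nearest = min(cluster_means, key=lambda k: math.dist(attributes, cluster_means[k][0]))
    return cluster_means[nearest][1]

print(classify_object((4.2, 3.6)))
```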

29
Automated rainfall object identification

Contiguous regions of measurable rainfall
(similar to CRA Ebert and McBride (2000))

30
Connected component labeling
31
Expand area by 15, connect regions that are
within 20km, relabel
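The "connect regions within 20 km, relabel" step can be sketched with a union-find over labeled regions. The Python sketch below assumes 4-km grid boxes (so 20 km is 5 grid lengths), measures separation as the closest pixel-to-pixel distance, and brute-forces all pixel pairs, which is fine for illustration but slow for large objects. The area expansion step is not reproduced, since its exact form is not given here.

```python
import math

def merge_nearby(regions, max_km=20.0, box_km=4.0):
    """Merge labeled regions whose closest grid boxes are within max_km.

    regions: list of pixel lists [(row, col), ...], one per labeled region.
    Returns the merged regions as pixel lists (new label = list position).
    """
    parent = list(range(len(regions)))

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]  # path halving
            k = parent[k]
        return k

    limit = max_km / box_km  # 20 km on a 4-km grid = 5 grid lengths
    for a in range(len(regions)):
        for b in range(a + 1, len(regions)):
            gap = min(math.dist(p, q) for p in regions[a] for q in regions[b])
            if gap <= limit:
                parent[find(a)] = find(b)

    merged = {}
    for k, pixels in enumerate(regions):
        merged.setdefault(find(k), []).extend(pixels)
    return list(merged.values())
```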
32
Object analysis

Extract features

33
2002 data: 799,014 objects
Validation

Looking for power-law regimes on a log-log plot
Small: meso-γ, < (50 km)²
65.6% (524,224; 60/h)
Medium: meso-β, (50-200 km)²
30.4% (242,914; 28/h)
Large: meso-α, > (200 km)²
4% of total (31,876; 3.7/h)
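The size regimes amount to area thresholds. The Python sketch below assumes each object's area is its count of 4-km Stage IV grid boxes times 16 km², and puts objects exactly on a threshold in the larger class, an arbitrary choice. The second half re-derives the percentages and hourly rates from the slide's counts and the 8,679 analysis hours reported for 2002.

```python
def size_regime(n_boxes, box_km=4.0):
    """Size regime from object area; thresholds (50 km)^2 and (200 km)^2."""
    area_km2 = n_boxes * box_km ** 2
    if area_km2 < 50.0 ** 2:
        return "small (meso-gamma)"
    if area_km2 < 200.0 ** 2:
        return "medium (meso-beta)"
    return "large (meso-alpha)"

# Object counts for 2002 from the slide, and the 8,679 analysis hours
counts = {"small": 524224, "medium": 242914, "large": 31876}
total = sum(counts.values())  # 799,014
hours = 8679
for name, count in counts.items():
    print(f"{name}: {100 * count / total:.1f}% of objects, {count / hours:.1f} per hour")
```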

34
Example of object sizes
[Figure: examples of small, medium, and large objects]

35
Distribution of object centers of mass

From NOAA (2002, 2003)

36
Random sample of large objects to validate
classification procedure

100 cases, classified by hand (stratiform,
linear, cellular) and by automated procedure
Training data set consisted of large objects
Overwhelming majority of small, medium objects
have nearly identical attributes (low scale,
eccentricity values)
Large objects will have least amount of
uncertainty in parameter estimation
Large objects have wide range of attribute values

37
Validation of automated classification procedure

85% correct in three-class (linear, cellular,
stratiform)
89% correct in two-class (convective,
non-convective)

38
Classification of 2002 data
[Figure panels: all, small, medium, and large objects]

39
Summary

Developed an automated rainfall system
classification procedure
Using statistically-based characteristics of
intensity and degree of linear organization
Validated against random sample of 2002 data

40
Future work

Do these attributes allow more refined classes,
such as leading-line trailing stratiform,
symmetric/asymmetric?
Very large (synoptic scale) objects contain
multiple classes of rainfall systems
Why are 99% of small objects stratiform?
Apply this to forecast and observed rain for
verification purposes

41
Verification
[Figure: observed field and two forecasts, each annotated with its attribute values]
β = 7.8; a/b at 0.2, 0.4, 0.6, 0.8 contours: 3.6, 3.1, 4.5, 3.6
β = 3.1; a/b: 2.6, 2.0, 2.1, 2.8
β = 1.6; a/b: 10.7, 7.5, 4.3, 2.8

12-h forecasts of 1-h precipitation valid 00Z 24 Apr 2003

42
(No Transcript)
43
Cluster analysis - data matrix

Objects are columns
Attributes are rows
Cluster based on the similarity between objects
(column vectors)

[Figure: data matrix; element (i, j) holds the ith attribute of the jth object]

44
(No Transcript)
45
Subjective classification

Three main classes
Linear
Cellular
Stratiform

46
Subjective classification

Three main classes
Linear
Cellular
Stratiform

47
Subjective classification

Three main classes
Linear
Cellular
Stratiform

48
Objective classification

Hierarchical cluster analysis (HCA)
Similar to ensemble data analysis by
Alhamed et al. (2002), SAMEX
Yussouf et al. (2003), New England
Analysis of similarity of objects
Similarity: correlation
Dissimilarity: distance
Clusters are groups of similar objects
Optimal clusters minimize within-cluster
variation and maximize between-cluster variation

49
Cluster analysis: Ward's method

Ward (1963), based on variance conservation law
Agglomerative clustering algorithm
Step 1: Place each object in a separate cluster
Step 2: Compute within-cluster variance for every
possible merger of two clusters
Step 3: Merge the two clusters that increase the
within-cluster variance the least
Repeat steps 2 and 3 until all objects are in one
cluster
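The three steps can be sketched directly in Python. Merging clusters A and B raises the within-cluster sum of squares by n_A n_B / (n_A + n_B) times the squared distance between their centroids, so step 2 only needs centroids. This naive version is O(n³) per merge pass and stops once n_clusters remain, which is equivalent to cutting the tree at that level rather than merging all the way to one cluster.

```python
def ward_clusters(points, n_clusters):
    """Agglomerative clustering with Ward's criterion.

    points: list of equal-length attribute tuples.
    Returns clusters as lists of point indices once n_clusters remain.
    """
    clusters = [[k] for k in range(len(points))]  # Step 1: one object per cluster

    def centroid(members):
        return [sum(points[m][d] for m in members) / len(members) for d in range(len(points[0]))]

    def merge_cost(a, b):
        # Increase in total within-cluster sum of squares if a and b merge
        ca, cb = centroid(a), centroid(b)
        gap = sum((x - y) ** 2 for x, y in zip(ca, cb))
        return len(a) * len(b) / (len(a) + len(b)) * gap

    while len(clusters) > n_clusters:
        # Step 2: evaluate every possible merger of two clusters
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: merge_cost(clusters[ij[0]], clusters[ij[1]]))
        # Step 3: merge the pair that increases within-cluster variance the least
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0), (20.0, 0.0)]
print(ward_clusters(points, 3))
```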

50
Rainfall distribution

Distribution of rainfall amounts is highly
positively skewed, non-negative
Heavy rain is a rare event
Considered using
Weibull (Wilks 1989)
Two-parameter kappa (Mielke 1973)
Gamma (Wilks 1990)

51
Gamma distribution

Gamma PDF depends on 2 parameters α, β
$f(x; \alpha, \beta) = \frac{(x/\beta)^{\alpha-1} \exp(-x/\beta)}{\beta\,\Gamma(\alpha)}, \quad x \ge 0,\ \alpha, \beta > 0$

[Figure panels: modify β (scale) parameter; modify α (shape) parameter]

52
Continuous spectrum of objects
53
Parameter estimation

For 2 parameters, a set of 2 equations is
typically used relating the population and sample
moments (e.g., 1st and 2nd)
$\bar{x} = \alpha\beta$
$s^2 = \alpha\beta^2$
Familiar method of moments
Resulting distribution fits 1st and 2nd moments
but not higher-order moments
Wilks (1990) discusses problems with method of
moments estimates, particularly for small values
of α
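Solving the two moment equations gives β = s²/x̄ and α = x̄²/s². A minimal Python sketch follows, assuming the input holds only the nonzero rain amounts (how zeros were handled is not stated here) and using the sample variance for s².

```python
import statistics

def gamma_method_of_moments(values):
    """Gamma shape (alpha) and scale (beta) from mean = alpha*beta, var = alpha*beta**2."""
    mean = statistics.fmean(values)
    var = statistics.variance(values, xbar=mean)  # sample variance s^2
    beta = var / mean
    alpha = mean / beta  # equivalently mean**2 / var
    return alpha, beta

print(gamma_method_of_moments([1.0, 2.0, 3.0, 4.0, 5.0]))
```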

54
Maximum likelihood estimation

Find parameters that make observed data most
likely
Assuming independent, identically distributed
data
Likelihood function becomes product of
likelihoods for each observed value
Wilks (1990) used this on rainfall time series;
we have spatial data, which are correlated
Want to use an estimation method that can take
serial correlation into account
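Exact gamma maximum-likelihood estimates require solving ln α − ψ(α) = D numerically, where D is ln x̄ minus the mean of ln x. The Python sketch below swaps in Thom's closed-form approximation to that solution instead and, like the likelihood described above, treats the values as independent, so it does not address the spatial correlation concern. The synthetic check uses independent draws with known shape 2 and scale 3.

```python
import math
import random
import statistics

def gamma_mle_thom(values):
    """Approximate maximum-likelihood gamma fit using Thom's (1958) estimator.

    Assumes strictly positive, independent values; spatial correlation
    is not accounted for.
    """
    mean = statistics.fmean(values)
    d = math.log(mean) - statistics.fmean(math.log(v) for v in values)
    alpha = (1 + math.sqrt(1 + 4 * d / 3)) / (4 * d)
    beta = mean / alpha
    return alpha, beta

# Check on synthetic independent data with known shape 2 and scale 3
rng = random.Random(1)
sample = [rng.gammavariate(2.0, 3.0) for _ in range(20000)]
print(gamma_mle_thom(sample))
```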

55
Method of validating objective classification

In order to convert HCA to classification,
subjective decision is required
Kalkstein et al. (1987) suggest calling it
automated instead of objective
Cut the tree to get 3-5 clusters plus at most
6 outliers
Count up number of cases in each class
Determine dominant class for each cluster
Percent correct: number of cases in their cluster's
dominant class divided by (total number of cases
minus outliers)

56
Cluster membership

Cluster 1: 5 lines, 5 cells, 0 stratiform
Cluster 2: 8 lines, 10 cells, 0 stratiform
Cluster 3: 1 line, 0 cells, 11 stratiform
Cluster 4: 4 lines, 3 cells, 0 stratiform
2-class: 46 correct cases / 47 total (48 - 1
outlier), 97.8% correct
3-class: 30 correct / 47 total, 63.8% correct
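The scoring rule described above can be checked against these counts. In the Python sketch below each cluster is a dictionary of class counts with the outlier already removed; for the 2-class score, lines and cells are pooled as convective. It reproduces 30/47 (63.8%) and 46/47 (97.87%, shown above as 97.8).

```python
def percent_correct(clusters, class_groups=None):
    """Percent of cases that fall in their cluster's dominant class.

    clusters: one {class: count} dict per cluster, outliers already removed.
    class_groups: optional mapping from fine to coarse classes.
    """
    correct = total = 0
    for counts in clusters:
        if class_groups:
            pooled = {}
            for label, n in counts.items():
                key = class_groups[label]
                pooled[key] = pooled.get(key, 0) + n
            counts = pooled
        correct += max(counts.values())  # cases in the dominant class
        total += sum(counts.values())
    return 100.0 * correct / total

# Cluster membership from the slide (48 cases minus 1 outlier)
clusters = [
    {"linear": 5, "cellular": 5, "stratiform": 0},
    {"linear": 8, "cellular": 10, "stratiform": 0},
    {"linear": 1, "cellular": 0, "stratiform": 11},
    {"linear": 4, "cellular": 3, "stratiform": 0},
]
two_class = {"linear": "convective", "cellular": "convective", "stratiform": "non-convective"}
print(f"3-class: {percent_correct(clusters):.1f}%")
print(f"2-class: {percent_correct(clusters, two_class):.1f}%")
```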

57
α and β for all 48 cases
[Figure: α vs. β for each case, convective and non-convective marked]

58
Performance of cluster analysis using gamma
distribution attributes

Slight variation in performance as the number of
moments increases and with changes in θ

59
Spatial organization related attributes
[Figure: separation vector h from tail point x_t to head point x_h]

Geostatistics
Measure aspects of spatial field as a function of
separation vector h
Variogram
Covariance
Correlogram

60
Example variogram

Germann and Joss (2001) 1-D
Harris et al. (2001) 1-D structure function

61
Example covariance
62
Example correlogram

Kessler and Russo (1963)
Kessler (1966)
Zawadzki (1973)

63
Attributes: summary measures of correlation
contours

Approximations to the area and eccentricity of an
ellipse
Lengths of major, minor axes (a, b) found using
image processing techniques
Area: ab
Eccentricity: a/b
1.0 for a circle, larger for flatter ellipses
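The slide does not say which image processing technique gives the axis lengths. One common choice, sketched here in Python as an assumption, fits an ellipse with the same second moments as the pixels inside a contour: the semi-axes are 2√λ for the eigenvalues λ of the coordinate covariance matrix, which is exact for a filled ellipse. A 21 × 5 box of pixels gives a/b = 4.2, and a digitized disc gives a/b = 1.

```python
import math

def ellipse_attributes(pixels):
    """Moment-based ellipse fit to the pixels inside one correlation contour.

    pixels: list of (row, col) grid positions. Returns (a, b, area, a/b),
    where a >= b are semi-axis lengths in grid units and area is the
    slide's a*b measure.
    """
    n = len(pixels)
    mr = sum(r for r, _ in pixels) / n
    mc = sum(c for _, c in pixels) / n
    # Each pixel is treated as a unit square, which adds 1/12 to each variance
    srr = sum((r - mr) ** 2 for r, _ in pixels) / n + 1 / 12
    scc = sum((c - mc) ** 2 for _, c in pixels) / n + 1 / 12
    src = sum((r - mr) * (c - mc) for r, c in pixels) / n
    # Eigenvalues of the 2x2 covariance matrix [[srr, src], [src, scc]]
    half_trace = (srr + scc) / 2
    root = math.sqrt(max(half_trace ** 2 - (srr * scc - src ** 2), 0.0))
    a = 2 * math.sqrt(half_trace + root)
    b = 2 * math.sqrt(half_trace - root)
    return a, b, a * b, a / b
```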

64
Image processing
65
Image processing

Threshold
Connected component labeling

66
Image processing

Thresholding, connected component labeling, edge
detection

67
Example: α, β, 0.6 contour a/b
[Figure: clusters 1-5]

Percent correct: 3-class 76.1%, 2-class 100%

68
Percent correct

Question as to whether/how attributes should be
standardized
Test every combination of 2, 3, and 4 attributes

69
Essential attributes

β and a/b
No clear advantage to standardization; these
attributes have approximately the same range
Addition of ab and/or α did not improve
classification
Expand the number of correlation contours to 0.2,
0.4, 0.6, and 0.8

70
Example: β, a/b of 0.2, 0.4, 0.6, 0.8 contours
[Figure: clusters 1-5]

Percent correct: 3-class 90.5%, 2-class 100%

71
Examples of hybrid cases
72
Automated classification

Now using partitional clustering
Find cluster means from best HCA results
Object class is that of the nearest (Euclidean
distance) of these 5 cluster means

73
Automated rainfall object identification

Contiguous regions of measurable rainfall
(similar to CRA Ebert and McBride (2000))

74
Connected component labeling
75
Expand area by 15, connect regions that are
within 20km, relabel
76
Object analysis

Extract features

77
Attributes from example objects
78
Summary stats for 2002 data

799,014 objects, 8,679 hours (99.1% of year)

79
Size regimes

Looking for power-law regimes on a log-log plot
Small: meso-γ, < (50 km)²
65.6% (524,224; 60/h)
Medium: meso-β, (50-200 km)²
30.4% (242,914; 28/h)
Large: meso-α, > (200 km)²
4% of total (31,876; 3.7/h)

80
Data reduction - feature extraction

Find useful features that represent the data with
a relatively small number of variables or
dimensions
Determine which attributes are essential (those
that help to discriminate) by experiment
Automate the computation of essential attributes

81
Random sample of large objects to validate
classification procedure

100 cases, classified subjectively (stratiform,
linear, cellular) and by automated procedure
Target data set consisted of large objects
Overwhelming majority of small, medium objects
have nearly identical attributes (low β, low a/b)
Large objects will have least amount of
uncertainty in parameter estimation
Large objects have wide range of attribute values

82
Log(density) of attributes

Drop a regular grid (in log-log space)
Count up number of objects in each gridbox
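Counting objects on a regular grid in log-log space can be sketched in Python as below. The grid spacing (cells_per_decade) is a hypothetical parameter, and attribute values must be positive for the logarithm.

```python
import math
from collections import Counter

def log_density(xs, ys, cells_per_decade=4):
    """log10 of object counts on a regular grid in (log10 x, log10 y) space.

    cells_per_decade is a hypothetical grid spacing; all values must be > 0.
    Returns {(ix, iy): log10(count)} for non-empty cells only.
    """
    counts = Counter(
        (math.floor(math.log10(x) * cells_per_decade),
         math.floor(math.log10(y) * cells_per_decade))
        for x, y in zip(xs, ys)
    )
    return {cell: math.log10(n) for cell, n in counts.items()}
```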

83
Diurnal cycle
84
Monthly distribution
85
Spatial distribution
86
α - β
87
β - a/b (0.4 contour)
88
Validation of automated classification procedure

85% correct in three-class (linear, cellular,
stratiform)
89% correct in two-class (convective,
non-convective)

89
Summary

Developed an automated rainfall system
classification procedure
Using statistically-based characteristics of
intensity and degree of organization
Validated against random sample of 2002 data

90
Future work

Further refinement of classification scheme
Diagnosis of ensemble forecast systems
Feature tracking
Climatological studies
Apply to forecast and observed rainfall for
verification purposes

91
Classification procedure

Categorizing entities based on their similarity
to other members of a class
A taxonomy, considering rainfall systems as
objects in their entirety
General, automated procedure using 1h accumulated
precipitation analyses
Universally applicable to rainfall systems
observable by NCEP Stage IV analysis system

92
Method of determining essential attributes

Classify a set of rainfall patterns (target data
set) both objectively and subjectively
If results of an objective classification agree
with subjective classification, then attributes
are considered essential
We will then have a set of attributes that describe
rainfall patterns in a manner consistent with an
expert analyst

93
Feature extraction requires tools to manipulate
data

Multivariate statistical analysis
Parameter estimation
Geostatistics
Image processing
Pattern recognition

94
Verification

Main motivation for this work
Verify forecasts using an object-oriented
approach (e.g., Somerville 1977, Williamson 1981,
Neilley 1993)
In order to do this, must first be able to
locate, analyze, and characterize objects in an
automated fashion
An automated classification procedure is needed

95
Previous classification methods

Subjective
e.g., Maddox (1980), Bluestein and Jain (1985),
Houze et al. (1990), Doswell et al. (1996),
Parker and Johnson (2000)
Rain rate threshold
Johnson and Hamilton (1988)
Agglomerative image segmentation
Lakshmanan (2001)

96
Subjective classification

From Parker and Johnson (2000)

97
Previous classification methods

Analysis of local peaks
e.g., Churchill and Houze (1984), Steiner et al.
(1995), Mohr and Zipser (1996), Biggerstaff and
Listemaa (2000)
Drop size distribution
Yuter and Houze (1997), Rao et al. (2001)
Cloud model analysis
Xu (1995), Lang et al. (2003)

98
Analysis of local peaks

Micro-classification
From Biggerstaff and Listemaa (2000)

99
Why segment convective/stratiform?
[Figure: convective and stratiform regions]

Tropical rainfall
Vertical latent heating estimation
MCS parameterization (Alexander and Cotton 1998)

[Figure: heating rate and divergence profiles, adapted from Houze (1997)]

100
Development of an automated classification
procedure required

Macro-classification approach
Want to classify entire system rather than
separate regions within a system
Use subjective methods as a guide

101
Outline of talk

Introduction
Process of developing classification procedure
Analysis of target data set
Automating the procedure
Analysis of 2002 data
Validation of automated procedure
Future work

102
(No Transcript)
103
Results from using beta, a/b for 0.2, 0.4, 0.6,
0.8 contours

PCA scores

104
PCA
105
Include a discussion of future work or remaining
issues

Various ways to verify forecasts using the
attribute vector
Generalized Euclidean distance (how to determine
weight matrix?)
Marginal distributions
Mean errors given a certain range of 1 or more
attributes
Joint distribution of errors given .

106
Summary

Developing an events-oriented verification
approach by characterizing forecasts and
observations
Cluster analysis on gamma distribution parameters
successfully discriminated convective/non-convective
events
Future work involves finding attributes that
describe the spatial organization of rainfall

107
Ideal set of attributes

Small set of numbers
Easy to compute (minimize CPU time)
Able to characterize important aspects of
meteorological phenomena
Discriminate among different significant and
interesting phenomena
Easy to explain to meteorologists

108
Synthetic data
109
α
β
θ
110
Weight matrix

Correlation can be taken into account by
modifying the error covariance matrix estimate
Iterative solution
First guess of θ using A = I
Use this to estimate Σ, invert to get A
Iterate until convergence is reached

111
Aspects of rainfall systems

Intensity
Degree of organization (mode)
Orientation
Location

112
Subjective classification - MCS

Bluestein and Jain (1985)
Bluestein et al. (1987)
Houze et al. (1990)
Blanchard (1990)
Geerts (1998)
Parker and Johnson (2000)

113
Agglomerative

Lakshmanan (2001)
Region-growing

114
Segment convective and stratiform: analysis of
local peaks

Churchill and Houze (1984)
Steiner et al. (1995)
Biggerstaff and Listemaa (2000)
Mohr and Zipser (1996)
Looking for reflectivity/satellite brightness
pixels that stand out above the crowd
Micro-classification, dividing a system into
convective/stratiform regions

115
Segment convective and stratiform: drop size
distribution

Yuter and Houze (1997)
Rao et al. (2001)
Again, micro-classification
Do not have routine access to this kind of data

116
Segment convective and stratiform: cloud model
analysis

Lang et al. (2003)
Xu (1995)
Looking at vertical motion, cloud/rain water
mixing ratio
Do not have access to this kind of information in
real atmosphere

117
Rain rate

6 mm/hr (Johnson and Hamilton 1988)
20 mm/hr (Churchill and Houze 1984)

118
Traditional verification methods

Compute statistics based upon matching pairs of
forecast and observed variables at the same set
of points in space/time

119
Subjective classification is not gospel truth

Ask 5 economists, you'll get 6 opinions
A 4th class, "catdog", could be classified as either
lines or cells depending on whom you ask
Should not punish the objective classification if
experts would have valid disagreements
Going to compute a more realistic percent correct by
not punishing if catdogs are called lines or
cells

120
Rand statistic

What you are trying to do is the general problem
of comparing partitions between cluster
solutions. Although people often use simple
misclassification rates or Cohen's kappa, the
best approach is to use the adjusted Rand
statistic, which has been developed to do exactly
what you want. For more information on the
adjusted Rand, you should refer to
Hubert, L., and Arabie, P. (1985). Comparing
partitions. Journal of Classification, 2,
193-218.
I don't think I can use this since I have a
different number of subjective and objective
classes
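For reference, the adjusted Rand index of Hubert and Arabie (1985) is computed from the contingency table of the two labelings, so the partitions need not have the same number of groups; 3 subjective classes can be compared with 4 or 5 clusters. A minimal Python sketch follows; the labels in the example are hypothetical.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Hubert and Arabie (1985) adjusted Rand index for two labelings of the same cases."""
    n = len(labels_a)
    # Pairs of cases grouped together in both partitions, and in each one
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / comb(n, 2)
    maximum = (together_a + together_b) / 2
    return (together_both - expected) / (maximum - expected)

# Hypothetical example: 3 subjective classes vs 4 automated clusters
subjective = ["linear", "linear", "cellular", "cellular", "stratiform", "stratiform"]
automated = [1, 1, 2, 3, 4, 4]
print(adjusted_rand_index(subjective, automated))
```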

121
Scores

Can produce single scores (like Euclidean
distance), but should we?
How do you weigh errors in various attributes?
Again, this must be user-specific, like the
question of forecast value

122
Determining weights for generalized Euclidean
distance

Could this be taken from Gerrity's response to
Gandin and Murphy?
Inverse of the attribute observed frequency
covariance matrix?
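One way to read the second question: weight attribute differences by the inverse of the covariance matrix of observed attributes, which gives the Mahalanobis form of generalized Euclidean distance. The Python sketch below assumes that choice; it estimates the covariance from a sample of observed attribute vectors and solves a linear system rather than forming the inverse explicitly. The attribute values in the example are hypothetical.

```python
import math

def covariance(samples):
    """Sample covariance matrix (n - 1 denominator) of attribute vectors."""
    n, dim = len(samples), len(samples[0])
    means = [sum(s[d] for s in samples) / n for d in range(dim)]
    return [
        [sum((s[i] - means[i]) * (s[j] - means[j]) for s in samples) / (n - 1) for j in range(dim)]
        for i in range(dim)
    ]

def solve(matrix, rhs):
    """Solve matrix @ z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z

def generalized_distance(x, y, cov):
    """sqrt((x - y)^T cov^{-1} (x - y)), with cov^{-1} as the weight matrix."""
    diff = [a - b for a, b in zip(x, y)]
    weighted = solve(cov, diff)  # cov^{-1} (x - y) without an explicit inverse
    return math.sqrt(sum(d * w for d, w in zip(diff, weighted)))

# Hypothetical observed (beta, a/b) vectors used to estimate the weights
observed_attrs = [(2.0, 1.5), (4.0, 3.0), (6.0, 2.0), (3.0, 4.5), (5.0, 3.5)]
print(generalized_distance((4.5, 2.0), (3.0, 3.5), covariance(observed_attrs)))
```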

123
Problems

Cluster analysis finds groups with similar
attributes
My target data set might happen to have clusters
that are not representative of those that may or
may not be found in the real atmosphere
Classification is single-valued: either the case
is in one class or another. Might need fuzzy
classification
Cell/line class in the transition zone between
more definitive line and cell cases
What does the difference between a 30% cell and
a 50% cell mean in practical terms? (cost/loss
for misclassification/misforecasting)

124
Process of developing classification procedure