Automated classification of rainfall systems using statistical characterization - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Automated classification of rainfall systems using statistical characterization


1
Automated classification of rainfall systems
using statistical characterization
  • Michael Baldwin
  • University of Oklahoma
  • CIMMS

2
Motivation
  • Verification and predictability
  • e.g., Ebert and McBride (2000)
  • Climatology
  • e.g., Houze et al. (1990)
  • Forecasting
  • e.g., Doswell et al. (1996)
  • Diagnosis of ensemble forecasts
  • e.g., Elmore et al. (2002)

3
Subjective classification
  • From Houze et al. (1990)
  • Leading line-trailing stratiform
  • Symmetric/asymmetric
  • Unclassifiable class

4
Verification of detailed forecasts
[Figure: observed field and two forecast panels, with RMSE 3.4, MAE 0.97, ETS 0.06
and RMSE 1.7, MAE 0.64, ETS 0.00]
  • 12h forecasts of 1h precipitation valid 00Z 24
    Apr 2003

5
Outline
  • Training
  • 48 cases from Aug-Nov 2000 (target data set)
  • Histogram analysis
  • Correlogram analysis
  • Automation
  • Classification
  • Object identification
  • Validation
  • 100 cases from 2002 data

6
Classify people?
7
Classification terminology
  • Objects: things you wish to classify
  • Individuals, cases, subjects, entities
  • Attributes: descriptions of the objects
  • Variables, features, descriptors,
    characteristics, properties
  • 2 types of automated classification
  • Supervised
  • Unsupervised

8
Object classes are known ahead of time: where
should new objects go?
9
Using cluster analysis to discover classes
10
What if you know the classes, but don't know how
to characterize the objects in such a way that an
automated classification will agree with your
classification?
  • First, build a training data set
  • Second, put together a set of trial attributes
    that might be useful in a classification
  • Third, do a lot of unsupervised classification
    experiments using combinations of trial
    attributes
  • Next, build a supervised classification procedure
    from the essential trial attributes

11
Target data set - training
  • 48 cases
  • Summer-Fall of 2000
  • Fixed domain size

12
Target data set - training
  • Cases selected by hand
  • Populate data set with typical rainfall systems

13
Target data set - training
  • NCEP Stage IV radar-gage analyses
  • 1h accumulation
  • 128 x 128 4km grid boxes

14
Target data set - training
  • Variety of phenomena, geographical locations,
    times

15
Expert (subjective) classification
  • Two-level classification hierarchy
  • Three-class and two-class
  • Linear
  • Cellular
  • Stratiform

[Diagram: rain systems split into convective and non-convective]
16
Subjective classification based upon objective
criteria
  • Considering intensity and degree of alignment or
    linear organization (a sketch follows this list)
  • Significant fraction > 5 mm/hr: Convective
  • Otherwise: Stratiform
  • Bounding box around convective region with aspect
    ratio > 3:1: Linear
  • Otherwise: Cellular
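The criteria in this list can be written as a short procedure. Below is a minimal Python sketch applying the 5 mm/hr and 3:1 thresholds from the slide to a 2-D hourly rain array; the exact "significant fraction" cutoff is not given here, so frac_thresh is a hypothetical placeholder.

```python
import numpy as np

def classify_rain_system(rain, conv_thresh=5.0, frac_thresh=0.10, aspect_thresh=3.0):
    """Classify a 2-D hourly rain field (mm/hr) as linear, cellular, or stratiform.
    conv_thresh (5 mm/hr) and aspect_thresh (3:1) come from the slide;
    frac_thresh, the "significant fraction" cutoff, is an assumed placeholder."""
    wet = rain > 0.0
    convective = rain > conv_thresh
    if wet.sum() == 0 or convective.sum() / wet.sum() < frac_thresh:
        return "stratiform"
    # Bounding box around the convective region, then its aspect ratio
    rows = np.flatnonzero(convective.any(axis=1))
    cols = np.flatnonzero(convective.any(axis=0))
    height = rows.max() - rows.min() + 1
    width = cols.max() - cols.min() + 1
    aspect = max(height, width) / min(height, width)
    return "linear" if aspect > aspect_thresh else "cellular"
```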

17
Intensity-related attributes
  • Compact way to describe how much rain fell
  • Histogram analysis
  • Parameters of a statistical distribution fit to
    histogram will be used as attributes

18
Attributes: gamma distribution parameters
  • Gamma PDF depends on 2 parameters, shape (a) and
    scale (b) (a sketch of this PDF follows below):
    $f(x;a,b) = \frac{(x/b)^{a-1}\, e^{-x/b}}{b\,\Gamma(a)}$,
    $x \ge 0$, $a, b > 0$

[Figure: gamma PDF with the shape and scale parameters indicated]
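As a cross-check on the formula above, here is a minimal Python sketch that evaluates the same two-parameter gamma density; gamma_pdf is an illustrative helper, not code from the talk.

```python
import numpy as np
from math import gamma as gamma_func

def gamma_pdf(x, a, b):
    """Gamma PDF as written above:
    f(x; a, b) = (x/b)**(a - 1) * exp(-x/b) / (b * Gamma(a)), x >= 0, a, b > 0."""
    x = np.asarray(x, dtype=float)
    return (x / b) ** (a - 1.0) * np.exp(-x / b) / (b * gamma_func(a))
```

Varying a changes the shape of the curve and varying b stretches its scale, as in the figure.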
19
Example: shape, scale (2 moments)
[Scatter plot of cases in (shape, scale) space, grouped into Clusters 1-4]
  • Percent correct: 3-class 63.8%, 2-class 97.8%

20
Spatial organization-related attributes
[Diagram: head point x_h and tail point x_t separated by vector h]
  • Geostatistics
  • Measure aspects of spatial field as a function of
    separation vector h
  • Variogram
  • Covariance
  • Correlogram (a sketch of a correlogram estimator follows this list)
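One way to estimate a correlogram for a gridded rain field is via the FFT; the talk does not say which estimator was used, so the sketch below is an assumption (and it implies periodic boundaries unless the field is zero-padded).

```python
import numpy as np

def correlogram_2d(field):
    """2-D autocorrelation of a rain field as a function of the lag vector h."""
    f = np.asarray(field, dtype=float)
    f = f - f.mean()
    F = np.fft.fft2(f)
    acov = np.fft.ifft2(F * np.conj(F)).real / f.size  # autocovariance at each lag
    acov = np.fft.fftshift(acov)                       # put h = (0, 0) at the center
    return acov / acov.max()                           # normalize so rho(0, 0) = 1
```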

21
Synthetic data
  • Similar to Wood and Brown (1986) with Doppler
    velocity data
  [Panels: synthetic rainfall field and its correlogram]

22
Synthetic data
  • Degree of linear organization related to shape of
    correlation contours

23
Image processing
24
Image processing
  • Thresholding, connected component labeling, edge
    detection (a sketch follows)
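A minimal sketch of these three steps using scipy.ndimage (an assumption about tooling); edge detection here is a simple morphological gradient.

```python
import numpy as np
from scipy import ndimage

def contour_regions(corr, level=0.6):
    """Threshold a correlogram at a correlation level, label the connected
    components, and extract the region edges."""
    binary = corr >= level                            # thresholding
    labels, nlab = ndimage.label(binary)              # connected component labeling
    edges = binary & ~ndimage.binary_erosion(binary)  # simple edge detection
    return labels, nlab, edges
```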

25
Example: shape, scale, 0.6-contour eccentricity
[Scatter plot of cases grouped into Clusters 1-5]
  • Percent correct: 3-class 76.1%, 2-class 100%

26
Essential attributes
  • gamma scale parameter (b) and correlogram contour
    eccentricity (a/b)
  • No clear advantage to standardization, these
    attributes have approximately the same range
  • Addition of contour area and/or the gamma shape
    parameter did not improve classification
  • Expand the number of correlation contours to 0.2,
    0.4, 0.6, and 0.8

27
Example: scale, eccentricity at the 0.2, 0.4, 0.6, and
0.8 contours
[Scatter plot of cases grouped into Clusters 1-5]
  • Percent correct: 3-class 90.5%, 2-class 100%

28
Automated classification
  • Now using supervised classification
  • Find cluster means from best HCA results
  • Any new object is assigned to the class of the
    nearest of these 5 cluster means (a sketch follows this list)
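A minimal sketch of that assignment step; the array names (cluster_means, class_names) are illustrative.

```python
import numpy as np

def classify_object(attributes, cluster_means, class_names):
    """cluster_means: (5, n_attributes) array of means from the HCA;
    class_names: the class assigned to each cluster mean."""
    d = np.linalg.norm(cluster_means - np.asarray(attributes, dtype=float), axis=1)
    return class_names[int(np.argmin(d))]
```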

29
Automated rainfall object identification
  • Contiguous regions of measurable rainfall
    (similar to CRA; Ebert and McBride 2000)

30
Connected component labeling
31
Expand area by 15%, connect regions that are
within 20 km, relabel (a sketch follows)
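A sketch of the identification steps on these slides using scipy.ndimage (an assumption about tooling). The slide's expansion recipe (15%, 20 km) is approximated here by a binary dilation of about half the connection distance on the 4-km grid; treat the details as illustrative.

```python
import numpy as np
from scipy import ndimage

def identify_objects(rain, min_rain=0.1, connect_km=20.0, dx_km=4.0):
    """Label contiguous regions of measurable rain, bridge regions closer than
    connect_km, and relabel. min_rain is an assumed 'measurable' threshold."""
    wet = rain >= min_rain
    labels, n = ndimage.label(wet)                    # connected component labeling
    radius = max(1, int(round(0.5 * connect_km / dx_km)))
    bridged = ndimage.binary_dilation(wet, iterations=radius)
    merged, n_merged = ndimage.label(bridged)         # relabel the expanded regions
    return np.where(wet, merged, 0), n_merged         # original wet pixels, merged IDs
```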
32
Object analysis
  • Extract features

33
2002 data: 799,014 objects
Validation
  • Looking for power-law regimes on a log-log plot
  • Small
  • meso-γ (< 50 km)²
  • 65.6% of total (524,224 objects; 60/h)
  • Medium
  • meso-β (50-200 km)²
  • 30.4% of total (242,914 objects; 28/h)
  • Large
  • meso-α (> 200 km)²
  • 4% of total (31,876 objects; 3.7/h)

34
Example of object sizes
[Maps of example objects labeled large, small, and medium]
35
Distribution of object centers of mass
  • From NOAA (2002, 2003)

36
Random sample of large objects to validate
classification procedure
  • 100 cases, classified by hand (stratiform,
    linear, cellular) and by automated procedure
  • Training data set consisted of large objects
  • Overwhelming majority of small, medium objects
    have nearly identical attributes (low scale,
    eccentricity values)
  • Large objects will have least amount of
    uncertainty in parameter estimation
  • Large objects have wide range of attribute values

37
Validation of automated classification procedure
  • 85% correct in three-class (linear, cellular,
    stratiform)
  • 89% correct in two-class (convective,
    non-convective)

38
Classification of 2002 data
[Class distributions for all, small, medium, and large objects]
39
Summary
  • Developed an automated rainfall system
    classification procedure
  • Using statistically-based characteristics of
    intensity and degree of linear organization
  • Validated against random sample of 2002 data

40
Future work
  • Do these attributes allow more refined classes,
    such as leading-line trailing stratiform,
    symmetric/asymmetric?
  • Very large (synoptic scale) objects contain
    multiple classes of rainfall systems
  • Why are 99% of small objects stratiform?
  • Apply this to forecast and observed rain for
    verification purposes

41
Verification
[Forecast and observed panels with attribute values:
  b = 7.8; a/b at the 0.2, 0.4, 0.6, 0.8 contours = 3.6, 3.1, 4.5, 3.6
  observed: b = 3.1; a/b = 2.6, 2.0, 2.1, 2.8
  b = 1.6; a/b = 10.7, 7.5, 4.3, 2.8]
  • 12h forecasts of 1h precipitation valid 00Z 24
    Apr 2003

42
(No Transcript)
43
Cluster analysis - data matrix
  • Objects are columns
  • Attributes are rows
  • Cluster based on the similarity between objects
    (column vectors)

[Data matrix diagram: rows are attributes (i-th attribute), columns are objects (j-th object)]
44
(No Transcript)
45
Subjective classification
  • Three main classes
  • Linear
  • Cellular
  • Stratiform

46
Subjective classification
  • Three main classes
  • Linear
  • Cellular
  • Stratiform

47
Subjective classification
  • Three main classes
  • Linear
  • Cellular
  • Stratiform

48
Objective classification
  • Hierarchical cluster analysis (HCA)
  • Similar to ensemble data analysis by
  • Alhamed et al. (2002), SAMEX
  • Yussouf et al. (2003), New England
  • Analysis of similarity of objects
  • Similarity: correlation
  • Dissimilarity: distance
  • Clusters are groups of similar objects
  • Optimal clusters minimize within-cluster
    variation and maximize between-cluster variation

49
Cluster analysis Wards method
  • Ward (1963), based on a variance conservation law
  • Agglomerative clustering algorithm (a sketch
    follows this list)
  • Step 1: Place each object in a separate cluster
  • Step 2: Compute the within-cluster variance for every
    possible merger of two clusters
  • Step 3: Merge the two clusters that increase the
    within-cluster variance the least
  • Repeat steps 2 and 3 until all objects are in one
    cluster
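A minimal sketch of this agglomerative procedure using SciPy's implementation of Ward's method (an assumption about tooling), with attrs an (n_objects, n_attributes) array of trial attributes.

```python
from scipy.cluster.hierarchy import linkage, fcluster

def ward_clusters(attrs, n_clusters=5):
    """Agglomerative clustering: at each step, merge the pair of clusters that
    increases the within-cluster variance the least (Ward's criterion)."""
    Z = linkage(attrs, method="ward")
    return fcluster(Z, t=n_clusters, criterion="maxclust")
```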

50
Rainfall distribution
  • Distribution of rainfall amounts is highly
    positively skewed, non-negative
  • Heavy rain is a rare event
  • Considered using
  • Weibull (Wilks 1989)
  • Two-parameter kappa (Mielke 1973)
  • Gamma (Wilks 1990)

51
Gamma distribution
  • Gamma PDF depends on 2 parameters, a and b:
    $f(x;a,b) = \frac{(x/b)^{a-1}\, e^{-x/b}}{b\,\Gamma(a)}$,
    $x \ge 0$, $a, b > 0$

Modify b (scale) parameter
Modify a (shape) parameter
52
Continuous spectrum of objects
53
Parameter estimation
  • For 2 parameters, a set of 2 equations is
    typically used relating the population and sample
    moments (e.g., 1st and 2nd):
    $\bar{x} = ab$, $s^2 = ab^2$
  • The familiar method of moments (a sketch follows
    this list)
  • Resulting distribution fits the 1st and 2nd moments
    but not higher-order moments
  • Wilks (1990) discusses problems with method-of-
    moments estimates, particularly for small values
    of a
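Solving the two moment equations for a and b gives a = x̄²/s² and b = s²/x̄; a minimal sketch:

```python
import numpy as np

def gamma_method_of_moments(x):
    """Match the sample mean and variance to xbar = a*b and s^2 = a*b^2."""
    xbar, s2 = np.mean(x), np.var(x, ddof=1)
    a = xbar ** 2 / s2   # shape
    b = s2 / xbar        # scale
    return a, b
```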

54
Maximum likelihood estimation
  • Find parameters that make observed data most
    likely
  • Assuming independent, identically distributed
    data
  • Likelihood function becomes product of
    likelihoods for each observed value
  • Wilks (1990) used this on rainfall time series;
    we have spatial data, which are correlated
  • Want to use an estimation method that can take
    serial correlation into account (an i.i.d. ML
    sketch follows this list)
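For reference, a minimal sketch of an i.i.d. maximum likelihood gamma fit using scipy.stats (an assumption about tooling); as noted above, it does not account for spatial correlation.

```python
from scipy import stats

def gamma_mle(x):
    """Maximum likelihood gamma fit with the location fixed at zero,
    assuming independent, identically distributed data."""
    a, _loc, b = stats.gamma.fit(x, floc=0.0)
    return a, b
```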

55
Method of validating objective classification
  • In order to convert HCA to a classification, a
    subjective decision is required
  • Kalkstein et al. (1987) suggest calling it
    "automated" instead of "objective"
  • Cut the tree so we get 3-5 clusters plus at most
    6 outliers
  • Count up the number of cases in each class
  • Determine the dominant class for each cluster
  • Number of cases correctly in dominant classes
    divided by total number of cases minus outliers
    is the percent correct (a sketch follows this list)
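A minimal sketch of that percent-correct calculation; the argument names are illustrative.

```python
import numpy as np
from collections import Counter

def percent_correct(cluster_ids, expert_labels, outlier_clusters=()):
    """For each cluster, count members of its dominant expert class as correct;
    divide by the total number of cases minus the outliers."""
    correct, n_outliers = 0, 0
    for c in np.unique(cluster_ids):
        members = [lab for cid, lab in zip(cluster_ids, expert_labels) if cid == c]
        if c in outlier_clusters:
            n_outliers += len(members)
            continue
        correct += Counter(members).most_common(1)[0][1]
    return 100.0 * correct / (len(expert_labels) - n_outliers)
```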

56
Cluster membership
  • Cluster 1: 5 lines, 5 cells, 0 stratiform
  • Cluster 2: 8 lines, 10 cells, 0 stratiform
  • Cluster 3: 1 line, 0 cells, 11 stratiform
  • Cluster 4: 4 lines, 3 cells, 0 stratiform
  • 2-class: 46 correct cases / 47 total (48 minus 1
    outlier) = 97.8% correct
  • 3-class: 30 correct / 47 total = 63.8% correct

57
a and b for all 48 cases
[Scatter plot with convective (conv) and non-convective (non-conv) cases indicated]
58
Performance of cluster analysis using gamma
distribution attributes
  • Slight variation in performance as number of
    moments increase and with changes in q

59
Spatial organization related attributes
[Diagram: head point x_h and tail point x_t separated by vector h]
  • Geostatistics
  • Measure aspects of spatial field as a function of
    separation vector h
  • Variogram
  • Covariance
  • Correlogram

60
Example variogram
  • Germann and Joss (2001) 1-D
  • Harris et al. (2001) 1-D structure function

61
Example covariance
62
Example correlogram
  • Kessler and Russo (1963)
  • Kessler (1966)
  • Zawadzki (1973)

63
Attributes: summary measures of correlation
contours
  • Approximations to the area and eccentricity of an
    ellipse (a sketch follows this list)
  • Lengths of major, minor axes (a, b) found using
    image processing techniques
  • Area: ab
  • Eccentricity: a/b
  • 1.0 for a circle, larger for flatter ellipses
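The slide leaves the image processing unspecified; one common choice is to use the second moments of the contour region's pixel coordinates, as in the sketch below (a moment-based assumption, with region_mask a boolean image of one labeled contour region).

```python
import numpy as np

def ellipse_summary(region_mask):
    """Approximate major/minor axis lengths (a, b) from the eigenvalues of the
    pixel-coordinate covariance, then return area ~ a*b and eccentricity ~ a/b."""
    ys, xs = np.nonzero(region_mask)
    cov = np.cov(np.stack([ys, xs]).astype(float))
    evals = np.sort(np.linalg.eigvalsh(cov))[::-1]   # largest eigenvalue first
    a, b = 2.0 * np.sqrt(evals)                      # axis-length proxies
    return a * b, a / b
```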

64
Image processing
65
Image processing
  • Threshold
  • Connected component labeling

66
Image processing
  • Thresholding, connected component labeling, edge
    detection

67
Example: a, b, 0.6-contour a/b
[Scatter plot of cases grouped into Clusters 1-5]
  • Percent correct: 3-class 76.1%, 2-class 100%

68
Percent correct
  • Question as to whether/how attributes should be
    standardized
  • Test every combination of 2, 3, and 4 attributes

69
Essential attributes
  • b and a/b
  • No clear advantage to standardization, these
    attributes have approximately the same range
  • Addition of ab and/or a did not improve
    classification
  • Expand the number of correlation contours to 0.2,
    0.4, 0.6, and 0.8

70
Example: b and a/b at the 0.2, 0.4, 0.6, 0.8 contours
[Scatter plot of cases grouped into Clusters 1-5]
  • Percent correct: 3-class 90.5%, 2-class 100%

71
Examples of hybrid cases
72
Automated classification
  • Now using partitional clustering
  • Find cluster means from best HCA results
  • The object's class is that of the nearest (Euclidean
    distance) of these 5 cluster means

73
Automated rainfall object identification
  • Contiguous regions of measurable rainfall
    (similar to CRA Ebert and McBride (2000))

74
Connected component labeling
75
Expand area by 15%, connect regions that are
within 20 km, relabel
76
Object analysis
  • Extract features

77
Attributes from example objects
78
Summary stats for 2002 data
  • 799,014 objects, 8,679 hours (99.1% of the year)

79
Size regimes
  • Looking for power-law regimes on a log-log plot
  • Small
  • meso-γ (< 50 km)²
  • 65.6% of total (524,224 objects; 60/h)
  • Medium
  • meso-β (50-200 km)²
  • 30.4% of total (242,914 objects; 28/h)
  • Large
  • meso-α (> 200 km)²
  • 4% of total (31,876 objects; 3.7/h)

80
Data reduction - feature extraction
  • Find useful features that represent the data with
    a relatively small number of variables or
    dimensions
  • Determine which attributes are essential (those
    that help to discriminate) by experiment
  • Automate the computation of essential attributes

81
Random sample of large objects to validate
classification procedure
  • 100 cases, classified subjectively (stratiform,
    linear, cellular) and by automated procedure
  • Target data set consisted of large objects
  • Overwhelming majority of small, medium objects
    have nearly identical attributes (low b, low a/b)
  • Large objects will have least amount of
    uncertainty in parameter estimation
  • Large objects have wide range of attribute values

82
Log(density) of attributes
  • Drop a regular grid (in log-log space)
  • Count up number of objects in each gridbox

83
Diurnal cycle
84
Monthly distribution
85
Spatial distribution
86
[Attribute density plot: a vs. b]
87
[Attribute density plot: b vs. a/b at the 0.4 contour]
88
Validation of automated classification procedure
  • 85% correct in three-class (linear, cellular,
    stratiform)
  • 89% correct in two-class (convective,
    non-convective)

89
Summary
  • Developed an automated rainfall system
    classification procedure
  • Using statistically-based characteristics of
    intensity and degree of organization
  • Validated against random sample of 2002 data

90
Future work
  • Further refinement of classification scheme
  • Diagnosis of ensemble forecast systems
  • Feature tracking
  • Climatological studies
  • Apply to forecast and observed rainfall for
    verification purposes

91
Classification procedure
  • Categorizing entities based on their similarity
    to other members of a class
  • A taxonomy, considering rainfall systems as
    objects in their entirety
  • General, automated procedure using 1h accumulated
    precipitation analyses
  • Universally applicable to rainfall systems
    observable by NCEP Stage IV analysis system

92
Method of determining essential attributes
  • Classify a set of rainfall patterns (target data
    set) both objectively and subjectively
  • If results of an objective classification agree
    with subjective classification, then attributes
    are considered essential
  • We will then have a set of attributes that describe
    rainfall patterns in a manner consistent with an
    expert analyst

93
Feature extraction requires tools to manipulate
data
  • Multivariate statistical analysis
  • Parameter estimation
  • Geostatistics
  • Image processing
  • Pattern recognition

94
Verification
  • Main motivation for this work
  • Verify forecasts using an object-oriented
    approach (e.g., Somerville 1977, Williamson 1981,
    Neilley 1993)
  • In order to do this, must first be able to
    locate, analyze, and characterize objects in an
    automated fashion
  • An automated classification procedure is needed

95
Previous classification methods
  • Subjective
  • e.g., Maddox (1980), Bluestein and Jain (1985),
    Houze et al. (1990), Doswell et al. (1996),
    Parker and Johnson (2000)
  • Rain rate threshold
  • Johnson and Hamilton (1988)
  • Agglomerative image segmentation
  • Lakshmanan (2001)

96
Subjective classification
  • From Parker and Johnson (2000)

97
Previous classification methods
  • Analysis of local peaks
  • e.g., Churchill and Houze (1984), Steiner et al.
    (1995), Mohr and Zipser (1996), Biggerstaff and
    Listemaa (2000)
  • Drop size distribution
  • Yuter and Houze (1997), Rao et al. (2001)
  • Cloud model analysis
  • Xu (1995), Lang et al. (2003)

98
Analysis of local peaks
  • Micro-classification
  • From Biggerstaff and Listemaa (2000)

99
Why segment convective/stratiform?
  • Tropical rainfall
  • Vertical latent heating estimation
  • MCS parameterization (Alexander and Cotton 1998)

[Figure: heating rate and divergence profiles for convective and stratiform regions; adapted from Houze (1997)]
100
Development of an automated classification
procedure required
  • Macro-classification approach
  • Want to classify entire system rather than
    separate regions within a system
  • Use subjective methods as a guide

101
Outline of talk
  • Introduction
  • Process of developing classification procedure
  • Analysis of target data set
  • Automating the procedure
  • Analysis of 2002 data
  • Validation of automated procedure
  • Future work

102
(No Transcript)
103
Results from using beta, a/b for 0.2, 0.4, 0.6,
0.8 contours
  • PCA scores

104
PCA
105
Include a discussion of future work or remaining
issues
  • Various ways to verify forecasts using the
    attribute vector
  • Generalized Euclidean distance (how to determine
    weight matrix?)
  • Marginal distributions
  • Mean errors given a certain range of 1 or more
    attributes
  • Joint distribution of errors given .

106
Summary
  • Developing an events-oriented verification
    approach by characterizing forecasts and
    observations
  • Cluster analysis on gamma distribution parameters
    successfully discriminated convective/non-convective
    events
  • Future work involves finding attributes that
    describe the spatial organization of rainfall

107
Ideal set of attributes
  • Small set of numbers
  • Easy to compute (minimize CPU time)
  • Able to characterize important aspects of
    meteorological phenomena
  • Discriminate among different significant and
    interesting phenomena
  • Easy to explain to meteorologists

108
Synthetic data
109
[Synthetic data examples with parameters a, b, and q indicated]
110
Weight matrix
  • Correlation can be taken into account by
    modifying the error covariance matrix estimate
  • Iterative solution
  • First guess of q using AI
  • Use this to estimate S, invert to get A
  • Iterate until convergence is reached

111
Aspects of rainfall systems
  • Intensity
  • Degree of organization (mode)
  • Orientation
  • Location

112
Subjective classification - MCS
  • Bluestein and Jain (1985)
  • Bluestein et al. (1987)
  • Houze et al. (1990)
  • Blanchard (1990)
  • Geerts (1998)
  • Parker and Johnson (2000)

113
Agglomerative
  • Lakshmanan (2001)
  • Region-growing

114
Segment convective and stratiform Analysis of
local peaks
  • Churchill and Houze (1984)
  • Steiner et al. (1995)
  • Biggerstaff and Listemaa (2000)
  • Mohr and Zipser (1996)
  • Looking for reflectivity/satellite brightness
    pixels that stand out above the crowd
  • Micro-classification, dividing a system into
    convective/stratiform regions

115
Segment convective and stratiform Drop size
distribution
  • Yuter and Houze (1997)
  • Rao et al. (2001)
  • Again, micro-classification
  • Do not have routine access to this kind of data

116
Segment convective and stratiform Cloud model
analysis
  • Lang et al. (2003)
  • Xu (1995)
  • Looking at vertical motion, cloud/rain water
    mixing ratio
  • Do not have access to this kind of information in
    real atmosphere

117
Rain rate
  • 6 mm/hr (Johnson and Hamilton 1988)
  • 20 mm/hr (Churchill and Houze 1984)

118
Traditional verification methods
  • Compute statistics based upon matching pairs of
    forecast and observed variables at the same set
    of points in space/time

119
Subjective not gospel truth
  • Ask 5 economists and you'll get 6 opinions
  • A 4th class, "catdog", could be classified as either
    lines or cells depending on who you ask
  • Should not punish the objective classification if
    experts would have valid disagreements
  • Going to compute a more realistic percent correct by
    not punishing cases where catdogs are called lines or
    cells

120
Rand statistic
  • What you are trying to do is the general problem
    of comparing partitions between cluster
    solutions. Although people often use simple
    misclassification rates or Cohen's kappa, the
    best approach is to use the adjusted Rand
    statistic, which has been developed to do exactly
    what you want. For more information on the
    adjusted Rand, you should refer to:
  • Hubert, L., and Arabie, P. (1985). Comparing
    partitions. Journal of Classification, 2,
    193-218.
  • I don't think I can use this since I have a
    different number of subjective and objective
    classes (a sketch of computing the statistic
    follows below)
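For reference, a minimal sketch of computing the adjusted Rand statistic with scikit-learn (an assumption about tooling); the labels are purely illustrative.

```python
from sklearn.metrics import adjusted_rand_score

# Two partitions of the same five objects (illustrative labels only)
subjective = ["linear", "cellular", "stratiform", "linear", "cellular"]
automated = [1, 2, 3, 1, 2]
print(adjusted_rand_score(subjective, automated))
```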

121
Scores
  • Can produce single scores (like Euclidean
    distance), but should we?
  • How do you weigh errors in various attributes?
  • Again, this must be user-specific, like the value
    question

122
Determining weights for generalized Euclidean
distance
  • Could this be taken from Gerrity's response to
    Gandin and Murphy?
  • Inverse of the attribute observed frequency
    covariance matrix? (a sketch follows this list)
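A minimal sketch of the second option in this list: weight the attribute differences by the inverse of an attribute covariance matrix estimated from a sample of objects (a Mahalanobis-type distance); the names are illustrative.

```python
import numpy as np

def generalized_distance(forecast_attrs, observed_attrs, attr_sample):
    """attr_sample: (n_objects, n_attributes) array used to estimate the
    covariance; its inverse serves as the weight matrix."""
    W = np.linalg.inv(np.cov(attr_sample, rowvar=False))
    d = np.asarray(forecast_attrs, dtype=float) - np.asarray(observed_attrs, dtype=float)
    return float(np.sqrt(d @ W @ d))
```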

123
Problems
  • Cluster analysis finds groups with similar
    attributes
  • My target data set might happen to have clusters
    that are not representative of those that may or
    may not be found in the real atmosphere
  • Classification is single-valued; either the case
    is in one class or another. Might need fuzzy
    classification
  • Cell/line class in the transition zone between
    more definitive line and cell cases
  • What does the difference between a 30% cell and
    a 50% cell mean in practical terms? (cost/loss
    for misclassification/misforecasting)

124
Process of developing classification procedure
  • Knowledge Discovery in Databases Process
  • Understand goals/application of end-user
  • Create target data set
  • Preprocess data set
  • Data reduction
  • Choose data mining task
  • Choose data mining algorithm
  • Execute data mining
  • Interpret mined patterns, possibly repeat steps
  • Consolidate discovered knowledge
