Title: On Predictive Modeling for Claim Severity
1. On Predictive Modeling for Claim Severity
- Glenn Meyers
- ISO Innovative Analytics
- CARe Seminar
- June 6-7, 2005
2. Problems with Experience Rating for Excess of Loss Reinsurance
- Use submission claim severity data
- Relevant, but
- Not credible
- Not developed
- Use industry distributions
- Credible, but
- Not relevant (???)
3. General Problems with Fitting Claim Severity Distributions
- Parameter uncertainty
- Fitted parameters of the chosen model are estimates subject to sampling error.
- Model uncertainty
- We might choose the wrong model. There is no particular reason that the models we choose are appropriate.
- Loss development
- Complete claim settlement data is not always available.
4. Outline of Remainder of Talk
- Quantifying Parameter Uncertainty
- Likelihood ratio test
- Incorporating Model Uncertainty
- Use Bayesian estimation with likelihood functions
- Uncertainty in excess layer loss estimates
- Bayesian estimation with prior models based on data reported to a statistical agent
- Reflects insurer heterogeneity
- Develops losses
5. How the Paper is Organized
- Start with classical hypothesis testing.
- Likelihood ratio test
- Calculate a confidence region for parameters.
- Calculate a confidence interval for a function of the parameters.
- For example, the expected loss in a layer
- Introduce a prior distribution of parameters.
- Calculate the predictive mean for a function of the parameters.
6. The Likelihood Ratio Test
- Test statistic: 2 x [ln L(MLE) - ln L(H0)], asymptotically χ²-distributed with degrees of freedom equal to the number of parameters restricted under H0.
7. The Likelihood Ratio Test
8. An Example: The Pareto Distribution
- Simulate a random sample of size 1000
- α = 2.000, θ = 10,000
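The simulation can be sketched in Python, assuming the Pareto (Lomax) parameterization common in actuarial work, with survival function S(x) = (θ/(x+θ))^α; the function name and seed are illustrative, not from the talk.

```python
import numpy as np

def simulate_pareto(n, alpha, theta, seed=0):
    """Draw n claims from a Pareto (Lomax) distribution with survival
    function S(x) = (theta / (x + theta))**alpha, by inverse transform."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(size=n)
    # Solve S(x) = u for x:  x = theta * (u**(-1/alpha) - 1)
    return theta * (u ** (-1.0 / alpha) - 1.0)

claims = simulate_pareto(1000, alpha=2.0, theta=10_000)
```

With α = 2 the mean exists (θ/(α-1) = 10,000) but the variance is infinite, which is why the tail fitting in the rest of the talk matters.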
9. Hypothesis Testing Example
- Significance level: 5%
- χ² critical value: 5.991
- H0: (θ, α) = (10000, 2)
- H1: (θ, α) ≠ (10000, 2)
- 2 x ln LR = 2(-10034.660 + 10035.623) = 1.207
- 1.207 < 5.991: accept H0
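The test can be sketched end to end, assuming the same Lomax parameterization; a crude grid search stands in for a proper MLE routine, and the simulated sample means the numbers will differ from the slide's.

```python
import numpy as np

def pareto_loglik(alpha, theta, x):
    """Log-likelihood of a Pareto (Lomax) sample:
    ln f(x) = ln(alpha) + alpha*ln(theta) - (alpha+1)*ln(x+theta)."""
    return np.sum(np.log(alpha) + alpha * np.log(theta)
                  - (alpha + 1.0) * np.log(x + theta))

rng = np.random.default_rng(0)
x = 10_000 * (rng.uniform(size=1000) ** (-1 / 2.0) - 1.0)  # alpha=2, theta=10000

# Crude grid-search MLE (the true parameter point sits on the grid,
# so the LR statistic below cannot be materially negative).
alphas = np.linspace(1.5, 2.5, 101)
thetas = np.linspace(6_000, 14_000, 81)
ll = np.array([[pareto_loglik(a, t, x) for t in thetas] for a in alphas])

# Likelihood ratio statistic for H0: (theta, alpha) = (10000, 2)
lr = 2.0 * (ll.max() - pareto_loglik(2.0, 10_000, x))
reject = lr > 5.991   # chi-square(2) critical value at the 5% level
```

Because the data were simulated under H0, the statistic should usually fall below the critical value, mirroring the "accept H0" outcome on the slide.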
10. Hypothesis Testing Example
- Significance level: 5%
- χ² critical value: 5.991
- H0: (θ, α) = (10000, 1.7)
- H1: (θ, α) ≠ (10000, 1.7)
- 2 x ln LR = 2(-10034.660 + 10045.975) = 22.631
- 22.631 > 5.991: reject H0
11. Confidence Region
- The X% confidence region corresponds to the hypothesis test at significance level 1 - X%.
- It is the set of all parameters (θ, α) for which the corresponding H0 is not rejected.
- For the 95% confidence region:
- (10000, 2.0) is in.
- (10000, 1.7) is out.
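The region can be traced numerically by keeping every grid point whose likelihood ratio statistic against the grid maximum stays below the χ² cutoff. This sketch reuses the simulated-sample setup above; 5.991 and 1.386 are the χ²(2) critical values at the 95% and 50% levels.

```python
import numpy as np

def pareto_loglik(alpha, theta, x):
    # Lomax log-likelihood, as in the earlier sketch
    return np.sum(np.log(alpha) + alpha * np.log(theta)
                  - (alpha + 1.0) * np.log(x + theta))

rng = np.random.default_rng(0)
x = 10_000 * (rng.uniform(size=1000) ** (-1 / 2.0) - 1.0)

alphas = np.linspace(1.5, 2.5, 101)
thetas = np.linspace(6_000, 14_000, 81)
ll = np.array([[pareto_loglik(a, t, x) for t in thetas] for a in alphas])

# A point is inside the region if its H0 would not be rejected.
lr = 2.0 * (ll.max() - ll)
inside_95 = lr <= 5.991   # outer ring (chi-square(2), 95%)
inside_50 = lr <= 1.386   # inner ring (chi-square(2), 50%)
```

Plotting the two boolean masks over the (θ, α) grid reproduces the nested "outer ring / inner ring" picture on the next slide.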
12. Confidence Region
Outer ring: 95%, inner ring: 50%
13. Grouped Data
- Data grouped into four intervals:
- 562 under 5,000
- 181 between 5,000 and 10,000
- 134 between 10,000 and 20,000
- 123 over 20,000
- Same data as before; only less information is given.
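Grouped data can be fit through a multinomial likelihood on the interval counts: each claim contributes the log of the probability of the cell it fell in. A sketch using the four cells from the slide and the same Lomax cdf (the parameterization is an assumption):

```python
import numpy as np

counts = np.array([562, 181, 134, 123])                 # cell counts from the slide
bounds = np.array([0.0, 5_000.0, 10_000.0, 20_000.0, np.inf])

def pareto_cdf(x, alpha, theta):
    return 1.0 - (theta / (x + theta)) ** alpha

def grouped_loglik(alpha, theta):
    """Multinomial log-likelihood of the interval counts."""
    probs = np.diff(pareto_cdf(bounds, alpha, theta))   # interval probabilities
    return np.sum(counts * np.log(probs))
```

Maximizing this surface over (α, θ) gives the grouped-data confidence region of the next slide; it is wider than the ungrouped one because the interval counts carry less information.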
14. Confidence Region for Grouped Data
Outer ring: 95%, inner ring: 50%
15. Confidence Region for Ungrouped Data
Outer ring: 95%, inner ring: 50%
16. Estimation with Model Uncertainty: The COTOR Challenge, November 2004
- COTOR published 250 claims
- Distributional form not revealed to participants
- Participants were challenged to estimate the cost of a $5M excess of $5M layer
- Estimate a confidence interval for the pure premium
17. You want to fit a distribution to 250 Claims
- Knee-jerk first reaction: plot a histogram.
18. This will not do! Take logs
- And fit some standard distributions.
19. Still looks skewed. Take double logs.
- And fit some standard distributions.
20. Still looks skewed. Take triple logs.
- Still some skewness.
- Lognormal and gamma fits look somewhat better.
21. Candidate 1: Quadruple lognormal
22. Candidate 2: Triple loggamma
23. Candidate 3: Triple lognormal
24. All three cdfs are within the confidence interval for the quadruple lognormal.
25. Elements of Solution
- Three candidate models:
- Quadruple lognormal
- Triple loggamma
- Triple lognormal
- Parameter uncertainty within each model
- Construct a series of models consisting of:
- One of the three models
- Parameters within a broad confidence interval for each model
- 7,803 possible models
26. Steps in Solution
- Calculate the likelihood (given the data) for each model.
- Use Bayes' Theorem to calculate the posterior probability of each model.
- Each model has equal prior probability.
27. Steps in Solution
- Calculate the layer pure premium for the $5M excess of $5M layer for each model.
- The expected pure premium is the average of the model layer pure premiums, weighted by posterior probability.
- The second moment of the pure premium is the posterior-probability weighted average of the squared model layer pure premiums.
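The two weighted averages can be sketched directly. The log-likelihoods and layer pure premiums below are made-up placeholders standing in for the 7,803 models:

```python
import numpy as np

# Hypothetical per-model inputs (stand-ins for the 7,803 models):
log_lik  = np.array([-1203.4, -1204.1, -1202.8, -1205.0])     # ln L(data | model)
layer_pp = np.array([10_500.0, 12_000.0, 9_800.0, 14_200.0])  # model layer pure premium

# Equal priors, so posterior weights are proportional to the likelihoods.
# Subtract the max log-likelihood before exponentiating, for stability.
w = np.exp(log_lik - log_lik.max())
posterior = w / w.sum()

mean_pp = np.sum(posterior * layer_pp)         # predictive mean
second  = np.sum(posterior * layer_pp ** 2)    # predictive second moment
sd_pp   = np.sqrt(second - mean_pp ** 2)       # predictive standard deviation
```

The max-subtraction trick matters in practice: with thousands of models, raw likelihoods underflow to zero.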
28. CDF of Layer Pure Premium
- The probability that the layer pure premium ≤ x equals the sum of the posterior probabilities of the models whose layer pure premium is ≤ x.
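The predictive cdf is a step function over the model set; the posterior weights and pure premiums here are illustrative:

```python
import numpy as np

posterior = np.array([0.05, 0.40, 0.35, 0.20])   # hypothetical posterior probabilities
layer_pp  = np.array([9_800.0, 10_500.0, 12_000.0, 14_200.0])

def predictive_cdf(x):
    """P(layer pure premium <= x): sum of posterior probabilities of
    the models whose layer pure premium is <= x."""
    return posterior[layer_pp <= x].sum()
```

Evaluating this function over a range of x values produces the histogram of the predictive pure premium shown two slides later.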
29. Numerical Results
30. Histogram of Predictive Pure Premium
31. Example with Insurance Data
- Continue with Bayesian estimation
- Liability insurance claim severity data
- Prior distributions derived from models based on individual insurer data
- Prior models reflect the maturity of the claim data used in the estimation
32. Initial Insurer Models
- Selected 20 insurers
- Claim counts in the thousands
- Fit a mixed exponential distribution to the data of each insurer
- Initial fits had volatile tails
- Truncation issues
- Do small claims predict the likelihood of large claims?
33. Initial Insurer Models
34. Low Truncation Point
35. High Truncation Point
36. Selections Made
- Truncation point: 100,000
- Family of cdfs that has the correct behavior
- Admittedly, the definition of "correct" is debatable, but
- The choices are transparent!
37. Selected Insurer Models
38. Selected Insurer Models
39. Each model consists of
- 1: The claim severity distribution for all claims settled within 1 year
- 2: The claim severity distribution for all claims settled within 2 years
- 3: The claim severity distribution for all claims settled within 3 years
- 4: The ultimate claim severity distribution for all claims
- 5: The ultimate limited average severity curve
40. Three Sample Insurers: Small, Medium and Large
- Each has three years of data
- Calculate likelihood functions:
- Most recent year, with distribution 1 on the prior slide
- 2nd most recent year, with distribution 2 on the prior slide
- 3rd most recent year, with distribution 3 on the prior slide
- Use Bayes' theorem to calculate the posterior probability of each model
41. Formulas for Posterior Probabilities
- Model m assigns probability p_i(m) to severity cell i; the insurer reports n_i claims in cell i.
- Likelihood(m) is proportional to the product over cells of p_i(m)^(n_i).
- Using Bayes' Theorem, Posterior(m) is proportional to Likelihood(m) times the prior probability of model m, normalized to sum to 1 over all models.
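These formulas amount to a multinomial likelihood per report year, combined across years and run through Bayes' theorem. Everything below (model count, cell probabilities, claim counts) is synthetic, standing in for the statistical agent's models and one insurer's data:

```python
import numpy as np

rng = np.random.default_rng(1)
n_models, n_years, n_cells = 5, 3, 4

# cell_probs[m, y, i]: model m's probability that a claim observed at
# maturity y falls in severity cell i (each row sums to 1 by construction).
cell_probs = rng.dirichlet(np.ones(n_cells), size=(n_models, n_years))
# counts[y, i]: the insurer's observed claims in cell i at maturity y.
counts = rng.integers(0, 50, size=(n_years, n_cells))

# ln Likelihood(m) = sum over years and cells of n_{y,i} * ln p_i(m, y)
log_lik = np.einsum('yc,myc->m', counts, np.log(cell_probs))

# Equal priors; normalize after shifting by the max for stability.
w = np.exp(log_lik - log_lik.max())
posterior = w / w.sum()
```

Summing over the three maturities is what lets immature data contribute: each year's counts are compared against the severity distribution appropriate to that year's settlement lag.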
42. Results (taken from the paper)
43. Formulas for Ultimate Layer Pure Premium
- Use item 5 on the model slide (three slides back), the ultimate limited average severity curve, to calculate the ultimate layer pure premium.
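For a mixed exponential severity model (the family fit earlier), the ultimate layer pure premium per claim is the difference of two limited average severities. The weights and means below are invented for illustration:

```python
import numpy as np

# Hypothetical mixed-exponential severity model (weights sum to 1).
weights = np.array([0.6, 0.3, 0.1])
means   = np.array([5_000.0, 50_000.0, 500_000.0])

def limited_average_severity(d):
    """E[min(X, d)] for a mixed exponential:
    sum_i w_i * m_i * (1 - exp(-d / m_i))."""
    return np.sum(weights * means * (1.0 - np.exp(-d / means)))

# Ultimate pure premium for the $5M excess of $5M layer, per claim:
layer_pp = (limited_average_severity(10_000_000.0)
            - limited_average_severity(5_000_000.0))
```

The posterior-weighted average of this quantity over the candidate models gives the predictive layer pure premium, exactly as in the COTOR example.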
44. Results
- All insurers were simulated from the same population.
- The posterior standard deviation decreases with insurer size.
45. Possible Extensions
- Obtain models for individual insurers
- Obtain data for the insurer of interest
- Calculate the likelihood, Pr{data | model}, for each insurer's model
- Use Bayes' Theorem to calculate the posterior probability of each model
- Calculate the statistic of choice using the models and posterior probabilities
- e.g., loss reserves