Title: STAT131/171 W4L2 Modelling Variation: Introduction to modelling and GOF
1 STAT131/171 W4L2 Modelling Variation
Introduction to modelling and goodness of fit (GOF)
- by Anne Porter
- alp_at_uow.edu.au
2 Activity: Let's play beat the butcher
- Morning radio, 6am-7am, weekdays
- Contestant telephones in to play
- Contestant has to say stop before the gong rings to win the meat
- Radio personality reads the meat items (2 slices of scotch fillet, 3kg mince, and so on) until the gong is reached
3 The list
Let's play: all stand, I'll read, you sit when you have enough meat. Last ones standing before the gong win.
1) Three kilos scotch fillet
2) 1 chicken
3) 3 kilos of sausages
4) 12 chicken kebabs
5) 12 lamb kebabs
6) 3 livers
7) 1 kg bacon
8) lamb chops
9) 2kg salmon rissoles
4 How might you increase your chances of winning? What information would be useful before you play again?
- What is the maximum and minimum number of items ever read out?
- What is the voice pattern over the gonged items?
- What is the average number of items read out before the gong?
- What is the frequency of gongs over time for each item?
5 Frequency distribution of the number of items before the gong
What is a more informative way of presenting the data, so that we can choose the best place to stop?
6 Relative frequency table
What is a better way of presenting this
information so it is easier to use?
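A relative frequency is each frequency divided by the total number of games played. The following is a minimal Python sketch of that step. The counts in `hypothetical_freq` are made-up illustration values, not the lecture's table, and the function name is an assumption.

```python
# Hypothetical counts: number of items before the gong -> number of games.
# These are illustration values only, NOT the data shown in the lecture.
hypothetical_freq = {4: 2, 5: 3, 6: 5, 7: 6, 8: 4}


def relative_frequencies(freq):
    """Return each frequency as a proportion of the total number of games."""
    games = sum(freq.values())
    return {x: count / games for x, count in sorted(freq.items())}


if __name__ == "__main__":
    for x, proportion in relative_frequencies(hypothetical_freq).items():
        print(x, proportion)
```

The proportions add to 1, so they can be read directly as estimated chances of the gong sounding after each number of items.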
7 Cumulative frequency
What will be the median number of items before the gong?
It is the (n+1)/2 th value: with n = 100 games this is the 50.5th value, which is 8.
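The (n+1)/2 rule can be sketched in Python by walking up the cumulative frequencies. The counts below are hypothetical illustration values (the lecture's table is not reproduced here), and the function name is an assumption. When (n+1)/2 is not a whole number, as with 50.5, the sketch averages the two middle values.

```python
# Hypothetical counts: number of items before the gong -> number of games.
# Illustration values only, NOT the lecture's data.
hypothetical_freq = {4: 2, 5: 3, 6: 5, 7: 6, 8: 4}


def median_from_frequencies(freq):
    """Median of a frequency table using the (n + 1) / 2 position rule."""
    n = sum(freq.values())
    lower_pos = (n + 1) // 2   # 1-based position of the lower middle value
    upper_pos = (n + 2) // 2   # equals lower_pos when n is odd

    def value_at(position):
        cumulative = 0
        for x in sorted(freq):
            cumulative += freq[x]      # cumulative frequency up to x
            if cumulative >= position:
                return x

    return (value_at(lower_pos) + value_at(upper_pos)) / 2


if __name__ == "__main__":
    # n = 20, so the median sits at position 10.5, between 6 and 7.
    print(median_from_frequencies(hypothetical_freq))  # 6.5
```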
8 Frequency distribution of the number of items
before the gong
What is the average number of items read before
the gong?
9-10 What do we do to calculate the mean number of items before the gong?
Multiply each number of items by its frequency, add the products to get the total number of items read before the gong, then divide by the number of games played.
11-12 Calculate the mean
784/100 = 7.84 items before the gong
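The multiply, add and divide recipe can be sketched in Python as below. The frequency table is a hypothetical illustration (the lecture's table is not reproduced here), and the function name is an assumption.

```python
# Hypothetical counts: number of items before the gong -> number of games.
# Illustration values only, NOT the lecture's data.
hypothetical_freq = {4: 2, 5: 3, 6: 5, 7: 6, 8: 4}


def mean_from_frequencies(freq):
    """Mean of a frequency table: sum of (value * frequency) / total frequency."""
    total_items = sum(x * count for x, count in freq.items())
    games = sum(freq.values())
    return total_items / games


if __name__ == "__main__":
    # (4*2 + 5*3 + 6*5 + 7*6 + 8*4) / 20 = 127 / 20
    print(mean_from_frequencies(hypothetical_freq))
```

With the lecture's totals, 784 items over 100 games, the same final division gives 7.84.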
13-14 Will your stopping strategy be the same for this set of data?
Why not?
For these values of x we have a much smaller spread.
15 In the long run, what should be the probability of stopping at each number if stopping at random?
16-17 P(X = x) and number expected for each item for the random stopping model
Does it appear that the data fit the random stopping model? Why so?
Number expected differs from number observed.
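Under a random stopping model every cell is equally likely, so P(X = x) is one over the number of cells and the expected count is the number of games times that probability. The sketch below assumes the ten cells are x = 2, 3, ..., 11; this range is an assumption, chosen because it agrees with the ten cells and the E(X) = 6.5 quoted elsewhere in the lecture. The function name is also an assumption.

```python
def random_stopping_model(values, games):
    """Return {x: (P(X = x), expected count)} when every x is equally likely."""
    values = list(values)
    probability = 1 / len(values)
    expected = games / len(values)
    return {x: (probability, expected) for x in values}


if __name__ == "__main__":
    # Assumed cells: 2 to 11 items before the gong, over 100 games.
    model = random_stopping_model(range(2, 12), games=100)
    for x, (p, e) in model.items():
        print(x, p, e)
```

Each of the ten cells then has probability 0.1 and an expected count of 10 games, which is the column the observed frequencies are compared against.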
18 Bar chart: compare observed and expected frequencies
19 Measuring the difference between O and E
How do we measure (compare, calculate) the difference between observed and expected?
20 P(X = x) and number expected for each item for the random stopping model
How might we calculate the difference between observed and expected?
If the data fit, will this be big or small?
Small
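21-23 Calculating χ²
χ² = Σ (O - E)² / E, summed over all cells, where O is the observed frequency and E is the expected frequency in each cell.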
24 Model Fit Using χ²
- Calculate χ²
- And see if it is too large for the data to be considered to fit the model
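Assuming the statistic being calculated is Pearson's chi-square, the sum over all cells of (O - E)² / E, the calculation can be sketched in Python as follows. The observed counts are hypothetical illustration values, not the lecture's data (for which the statistic is 65.6), and the function name is an assumption.

```python
def chi_square_statistic(observed, expected):
    """Pearson's chi-square: sum of (O - E)^2 / E over all cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


if __name__ == "__main__":
    # Hypothetical observed counts for 10 cells over 100 games (NOT the lecture's data).
    hypothetical_observed = [4, 6, 8, 10, 12, 14, 16, 12, 10, 8]
    # Random stopping model: 100 games spread evenly over 10 cells.
    expected = [10] * 10
    print(chi_square_statistic(hypothetical_observed, expected))
```

A value near zero means observed and expected counts agree closely; the larger the value, the worse the fit.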
25 Model Fit Informal: Is χ² too big?
- If χ² > d + 2√(2d)
- where d = g - p - 1
- g is the number of cells
- p is the number of parameters estimated from the data
- then there is evidence the data do not fit the model
For our example, g = 10 cells, therefore d = 10 - 0 - 1 = 9 and d + 2√(2d) = 9 + 2√18 ≈ 17.49.
Decision: as 65.6 > 17.49, there is evidence that the data do not fit the random stopping model.
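The threshold 17.49 quoted above matches d + 2√(2d) with d = 9. This is a common informal rule: a chi-square distribution with d degrees of freedom has mean d and standard deviation √(2d), so the threshold is the mean plus two standard deviations. Assuming that is the rule intended here, a short Python check looks like this (the function names are assumptions):

```python
import math


def informal_threshold(cells, estimated_params=0):
    """Informal cut-off d + 2 * sqrt(2d), with d = cells - estimated_params - 1."""
    d = cells - estimated_params - 1
    return d + 2 * math.sqrt(2 * d)


def informal_lack_of_fit(chi_square, cells, estimated_params=0):
    """True when the statistic exceeds the informal cut-off."""
    return chi_square > informal_threshold(cells, estimated_params)


if __name__ == "__main__":
    print(round(informal_threshold(10), 2))   # 17.49
    print(informal_lack_of_fit(65.6, 10))     # True
```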
26 Model Fit Formal
- Decision: if calculated χ² > critical value of χ² (from tables), then there is evidence of lack of fit
- α = 0.05 (typical, and the value we will use)
- df = number of cells - number of estimated parameters - 1
- df = 10 - 0 - 1 = 9
27 Model Fit Formal
- Decision: as calculated χ² = 65.6 > critical value of 16.919 found in the tables, there is evidence of lack of fit between the data and the random stopping model.
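Comparing the statistic with a tabled critical value is equivalent to comparing its upper-tail probability with 0.05. The sketch below approximates that tail probability using only the Python standard library, by integrating the chi-square density with Simpson's rule. It is an illustration rather than a replacement for tables or a statistics package, it assumes df is at least 3 so the density is well behaved at zero, and the function names are assumptions.

```python
import math


def chi2_pdf(x, df):
    """Density of the chi-square distribution with df degrees of freedom."""
    if x <= 0:
        return 0.0
    half = df / 2
    return x ** (half - 1) * math.exp(-x / 2) / (2 ** half * math.gamma(half))


def chi2_upper_tail(x, df, steps=2000):
    """Approximate P(chi-square > x) by Simpson's rule on [0, x] (assumes df >= 3)."""
    if x <= 0:
        return 1.0
    h = x / steps                      # steps must be even for Simpson's rule
    total = chi2_pdf(0, df) + chi2_pdf(x, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * chi2_pdf(i * h, df)
    return max(0.0, 1 - total * h / 3)


if __name__ == "__main__":
    # The tabled critical value 16.919 should leave about 0.05 in the upper tail.
    print(chi2_upper_tail(16.919, 9))
    # The calculated statistic 65.6 lies far beyond it.
    print(chi2_upper_tail(65.6, 9) < 0.05)
```

The first line printed should be close to 0.05, which is what the table entry 16.919 for 9 degrees of freedom represents.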
28 Lack of fit
Looking at the table, we can see most lack of fit occurs for items 2, 3, 8 and 9.
Lots of meat before the gong.
29 Sampling Distributions
- We will explore how these types of sampling distributions, such as χ², are generated in our lecture on sampling distributions.
- We will also explore how we choose a value of α
- We will look at using the data to estimate parameters later
30 Model fit approaches
- Use a bar chart to compare observed and expected frequencies
- Compare observed and expected frequencies
- Calculate and use χ²
- Informally
- Formally
- χ² assumes that the expected count in each cell is at least 5
- If not, combine cells. Other literature uses other rules; there is a debate over this.
- (Check the Utts and Heckard (2004) definition)
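One possible way to combine cells is sketched below in Python: adjacent cells are merged from left to right until each merged cell has an expected count of at least 5, and any small remainder at the end is folded into the last merged cell. This particular merging order is an assumption made for illustration (other groupings are equally defensible), as are the function name and the example counts.

```python
def combine_small_cells(observed, expected, minimum=5):
    """Merge adjacent cells until every merged cell has expected count >= minimum."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    merged_obs, merged_exp = [], []
    run_obs = run_exp = 0
    pending = False  # True while a partly built cell has not been closed
    for o, e in zip(observed, expected):
        run_obs += o
        run_exp += e
        pending = True
        if run_exp >= minimum:
            merged_obs.append(run_obs)
            merged_exp.append(run_exp)
            run_obs = run_exp = 0
            pending = False
    if pending:
        if merged_obs:
            # Fold the small remainder into the last completed cell.
            merged_obs[-1] += run_obs
            merged_exp[-1] += run_exp
        else:
            merged_obs.append(run_obs)
            merged_exp.append(run_exp)
    return merged_obs, merged_exp


if __name__ == "__main__":
    # Hypothetical counts for six cells (illustration only).
    obs, exp = combine_small_cells([1, 3, 9, 12, 4, 1], [2, 4, 10, 10, 3, 1])
    print(obs, exp)  # [4, 9, 17] [6, 10, 14]
```

After combining, the number of cells used in the degrees of freedom is the number of merged cells, here 3 rather than 6.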
31 Mean (expected value, E(X)) for the random stopping model
32 Expected value for the random stopping model is?
E(X) = 6.5
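One way to arrive at 6.5 is shown below. It assumes the ten cells are x = 2, 3, ..., 11, each with P(X = x) = 1/10. That range is an assumption, chosen because it is the set of ten consecutive values consistent with the stated answer.

```latex
E(X) = \sum_{x=2}^{11} x \, P(X = x)
     = \frac{1}{10}\,(2 + 3 + \dots + 11)
     = \frac{65}{10}
     = 6.5
```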
33 Spread of the Population Model
We will leave calculation of these until a little later, on a simpler example.
34 What have we been doing?
- We have been looking at the centre, spread, outliers and shape of samples of data
- With a view to improving decision making.
- Why are we concerned with looking at models?
35 Describing characteristics of Data
- We collect data on samples
- Time in seconds until two species of flies released together mate
- The number of lost articles found in a large municipal office
- The average carbohydrate content per 100 g serve in a sample of different species
- The number of items of meat read before the gong
36 Improving our decisions
- Looking at
- The shape of the distribution
- Centre
- Spread
- Whether or not the data fit some model
- May even look at outliers, points not fitting the
model
37 Describing Batches of Data
- Comparing midterm marks from the different versions of the test.
- Are the papers completed in a similar manner?
38 What we are really looking at is NOT
- The mating behaviour of these particular flies
- Past lost articles
- Or last year's exam papers
- Or the last 100 games of beat the butcher
We are interested in them because they
may suggest a model for the characteristics of
the data in general. This involves Probability
Models. We shall continue to explore probability
models in future lectures.