Computers, One- and Two-Sample Analyses, and Chi-squared Tests

Provided by: Morgan75

1
Computers, One- and Two-Sample Analyses, and
Chi-squared Tests
2
Using personal computers in this course

Your use will be important and extensive
We will be using two main things
Excel - to do statistical analyses
World Wide Web - to obtain course data and
information

3
Using Excel

Excel is a spreadsheet program - the interface
is divided into a large number of rectangular
cells
Each cell can contain one of four things
A number
A label
A formula
A built-in Excel function

4
Using Excel, cont.

There are two additional parts of Excel that we
will be using extensively
The Data Analysis Tools (does the statistical
number crunching for us)
The Solver optimization routine (solves linear
programming problems for us)
If you own your own computer, you should make
sure you have these

5
Checking for important add-ins

From the Tools menu, select Add-Ins . . .
then make sure the following are on the list
Analysis ToolPak
Analysis ToolPak - VBA
Solver Add-In
and that they are checked
Get my handout if these three are not on your list

6
Using the course Web site

The URL is on the course syllabus
Five parts
Homework assignments (which problems will be due
on which Thursdays)
Homework answers
Exam answers
Recommended practice problems
Access to the course data sets

7
Accessing the course data sets

Right-click on the data set link (or Shift + left-click)
Select the Save As . . . option
Expand the compressed file once you have
downloaded it

8
Using Excel, cont.

Saving a file - use Save As (not Save) under
the File menu
Printing and page setup
Opening text files (like the MBS files) -
learning to use the Text Import Wizard (Don't
forget to remove the end-of-file character!)

9
Using Excel, cont.

Excel's built-in statistical functions - for the
moment, two of many
AVERAGE( Some range )
STDEV( Some range )
Excel's Data Analysis Tools - again, two of
several
Descriptive Statistics (calculates lots of
different numerical summaries of the data)
Histogram
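
For readers who want to check the Excel results outside the spreadsheet, here is a minimal Python sketch of the same summaries. The function names, the bin convention, and the `ages` data are made up for illustration; Excel's Histogram tool may assign values to bins differently.

```python
import statistics
from collections import Counter

def describe(values):
    """Return a few of the numerical summaries discussed on this slide."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),    # counterpart of AVERAGE(range)
        "stdev": statistics.stdev(values),  # counterpart of STDEV(range), n - 1 version
        "min": min(values),
        "max": max(values),
    }

def histogram(values, bin_width):
    """Count values per bin; each bin is labelled by its lower edge (an assumption)."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

ages = [23, 25, 31, 35, 35, 42, 47, 51]  # hypothetical data
summary = describe(ages)
bins = histogram(ages, 10)
```

Here `summary` holds the numerical summaries and `bins` maps each bin's lower edge to its count.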

10
A Diagram of the Statistical Process
The Population: the individuals have at least
one characteristic, e.g., age, which varies over
them. Hence, this characteristic has a
probability distribution.
11
Let's Look at Your Results

What are the values of your three statistics?
What sampling method did you use to obtain your
sample?
Did you sample with or without replacement?

12
I Now Have Three Sets of Values

Why are they different?
Which ones are correct? Why?
In this process, what is the
Population?
Sample outcome?
Statistic?
Parameter?

13
Summary

Statistics are random variables -- as such they
have a probability distribution. Like all
probability distributions, this distribution has
a mean
a standard deviation
a characteristic shape

14
Summary, cont.

The probability distribution of a statistic is
called the sampling distribution
Be careful to distinguish the sampling
distribution from the population distribution
(See, for example, Fig. 6.10 on p. 255 of MBS)

15
Summary, cont.

Statisticians know the most about the sampling
distribution of the sample mean
The mean of this sampling distribution is the
same as the mean of the population distribution
The standard deviation of this sampling
distribution is the standard deviation of the
population distribution divided by the square
root of the sample size.
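
The two facts above can be checked by simulation. The sketch below is only an illustration: the population (the integers 1 to 100), the sample size, and the function names are assumptions, and sampling is done with replacement.

```python
import math
import random
import statistics

def standard_error(sigma, n):
    """Standard deviation of the sampling distribution of the sample mean."""
    return sigma / math.sqrt(n)

def simulate_sample_means(population, n, repetitions, seed=0):
    """Draw many samples of size n (with replacement) and return their means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(repetitions)]

population = list(range(1, 101))       # hypothetical population
mu = statistics.fmean(population)      # population mean
sigma = statistics.pstdev(population)  # population standard deviation
means = simulate_sample_means(population, n=25, repetitions=4000)
# statistics.fmean(means) should land close to mu, and
# statistics.stdev(means) close to standard_error(sigma, 25)
```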

16
Samp. Dist. Of Mean, cont.

If the sample size is large enough, no matter
what the shape of the population distribution,
the sampling distribution is approximately
Normally distributed. (This is the Central Limit
Theorem.)
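
A rough way to see the Central Limit Theorem at work is to sample from a clearly non-Normal population and check how often the sample mean falls where a Normal curve says it should. This is a sketch under assumptions: the exponential population, the sample size of 40, and the function name are chosen only for illustration.

```python
import math
import random

def fraction_within_normal_band(n, repetitions=5000, seed=1):
    """Fraction of sample means within 1.96 standard errors of the population mean."""
    rng = random.Random(seed)
    mu, sigma = 1.0, 1.0  # mean and s.d. of an exponential population with rate 1
    half_width = 1.96 * sigma / math.sqrt(n)
    hits = 0
    for _ in range(repetitions):
        xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        if abs(xbar - mu) <= half_width:
            hits += 1
    return hits / repetitions

coverage = fraction_within_normal_band(40)
```

If the sampling distribution is approximately Normal, `coverage` should come out near 0.95 even though the population itself is strongly skewed.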

18
Samp. Dist. Of Mean, cont.

If the population distribution is Normal, no
matter what the sample size, the sampling
distribution is
Normally distributed if the value of σ is known
Student's t distributed if the value of σ is
unknown

19
The Sampling Distribution of the Sample Mean,
cont.
Sampling distribution: Normal
Sampling distribution: Student's t
Punt!
20
Samp. Dist. Of Mean, cont.

Note what is not known
The sampling distribution of the sample mean when
the population distribution is non-Normal and the
sample size is small
The sampling distribution of any statistic other
than the sample mean. (You will learn about some
of these as you further study statistics.)

21
A Diagram of the Statistical Process
The Population: the individuals have at least
one characteristic, e.g., age, which varies over
them. Hence, this characteristic has a
probability distribution.
22
Overview of Statistical Inference

Closing the gap between statistics, whose
numerical values we know, and parameters, whose
numerical values we want to know

23
Recall Two Important Concepts

Statistics vary in value from sample to sample
Statistics rarely exactly coincide with their
corresponding parameter
The lack of coincidence means more complex
techniques will be needed for inferring parameter
values

24
Two Kinds of Statistical Inference

Confidence intervals
Hypothesis testing
Each is based on the notion that a statistic is a
random variable
and consequently heavily depends on the sampling
distribution of the statistic

25
Bala -- What is the difference between hypothesis
testing and confidence intervals?
26
Confidence Intervals

You have no idea what the value of μ is
Use x̄ as the best estimate
Construct a confidence interval around x̄
which is likely to contain μ
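
As an illustration, the interval can be computed as follows for the simplest case, where the population standard deviation is known and the sampling distribution of the sample mean is Normal. The function name and the numbers are hypothetical; with an unknown standard deviation a t value would replace the z value.

```python
import math
from statistics import NormalDist

def z_confidence_interval(xbar, sigma, n, confidence=0.95):
    """Interval centred on the sample mean, assuming sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # e.g. about 1.96 for 95%
    margin = z * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin

low, high = z_confidence_interval(xbar=50.0, sigma=10.0, n=25)
```

With these made-up inputs the interval runs from roughly 46.08 to 53.92.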

27
Hypothesis Testing

You have some belief about the value of μ
Then ask the question - If that value of μ is
correct, how likely would it be that the value of
x̄ which I have obtained would occur?

28
Hypothesis Testing, cont.

If it is likely that this value of x̄ would occur,
given μ, we conclude that the data are consistent
with our belief about μ
If it is not likely that this value of x̄ would
occur, given μ, we conclude that our belief
about μ is probably not correct

29
Three Ways to do Hypothesis Testing

Method 1: Standardize the value of the statistic
and compare this to the z-value or t-value which
is the boundary of the rejection region

30
Three Ways, cont.

Method 2: Convert the boundary (z or t value)
of the rejection region to the scale of the
statistic
and compare this value to the value of the
statistic

31
Three Ways, cont.

Method 3: Using the value of the statistic,
calculate the p-value and compare this p-value to
the value of α
Remember: The p-value must always match the
rejection region!

32
Rules when using the p-value

If p-value > α, accept H0
If p-value < α, reject H0
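
The three methods always lead to the same decision. The sketch below shows this for one special case only: an upper-tail z test with a known population standard deviation. The function name and the numbers are made up for illustration.

```python
import math
from statistics import NormalDist

def upper_tail_z_test(xbar, mu0, sigma, n, alpha=0.05):
    """Run an upper-tail z test three ways and report each decision."""
    std_normal = NormalDist()
    se = sigma / math.sqrt(n)
    z = (xbar - mu0) / se
    z_crit = std_normal.inv_cdf(1 - alpha)  # boundary of the rejection region

    method1 = z > z_crit                    # Method 1: compare standardized values
    xbar_crit = mu0 + z_crit * se           # Method 2: boundary on the statistic's scale
    method2 = xbar > xbar_crit
    p_value = 1 - std_normal.cdf(z)         # Method 3: upper-tail p-value
    method3 = p_value < alpha

    return {"z": z, "p_value": p_value, "reject": (method1, method2, method3)}

result = upper_tail_z_test(xbar=52.5, mu0=50.0, sigma=10.0, n=64)
```

Here the standardized statistic is 2.0, so all three entries of `result["reject"]` agree that H0 is rejected at the 0.05 level.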

33
Steps in Hypothesis Testing

Determine the null and alternative hypotheses
Specify the value of α
Calculate the standardized value of the statistic

34
Steps in Hypothesis Testing, cont.

Draw a picture of the sampling distribution which
includes
Identification of the shape ( z or t )
Location of the rejection region (Upper tail,
lower tail, or both)
Position of the standardized statistic (Above or
below the mean)
Use the standardized value of the statistic to
calculate the p-value

35
Steps in Hypothesis Testing, cont.

Compare the p-value to the value of α to draw a
conclusion (Reject or accept the null hypothesis)
Important - Translate your statistical conclusion
into the practical terms of the problem

36
Using Excel to Calculate p-values

To calculate Normal probabilities, use
NORMSDIST( z )
To calculate t probabilities, use
TDIST( t, d.f., tails )
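
For comparison outside Excel, the two probabilities can be approximated in plain Python. This is a sketch, not Excel's own algorithm: the helper names are hypothetical, and the t tail area is obtained by numerically integrating the t density with Simpson's rule. Like TDIST, the `tdist` helper expects a non-negative t and returns the upper-tail area times the number of tails.

```python
import math
from statistics import NormalDist

def normsdist(z):
    """Cumulative standard Normal probability P(Z <= z)."""
    return NormalDist().cdf(z)

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def tdist(t, df, tails, steps=2000):
    """Tail probability P(T > t) multiplied by tails (1 or 2), for t >= 0."""
    if t < 0 or tails not in (1, 2):
        raise ValueError("t must be >= 0 and tails must be 1 or 2")
    h = t / steps  # steps is even, as Simpson's rule requires
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    return tails * (0.5 - area_0_to_t)

p_normal_upper = 1 - normsdist(1.96)   # upper-tail Normal probability
p_t_two_tail = tdist(2.228, 10, 2)     # two-tail t probability, 10 d.f.
```

Note that `normsdist` returns the area to the left of z, so an upper-tail p-value is one minus that value.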

37
Comparing the Means of Two Populations

Three ways to do the analysis
When the samples are independent of each other
Both samples large (30 or more)
One or both samples small (less than 30)
When the samples can be paired - this situation
reduces to a one-sample test

38
Comparing Two Means with Independent Samples

Statistic
Sampling distribution
Normal if both samples are 30 or more (p. 370 of
MBS)
t if one or both samples smaller than 30 (pp.
373-374 of MBS)
Important - maintain order between your
hypotheses and your statistic (pp. 13-14 of the
coursepack)

39
Comparing Two Means with Independent Samples,
cont.

In Excel, for the large sample test
Use VAR( data range ) to calculate the sample
variance of each data set
Write down these variances!
Then use the Data Analysis Tool
z-Test Two-Sample for Means
For the small sample test use
t-Test Two-Sample Assuming Equal Variances
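
The test statistics behind these two tools can be sketched in a few lines of Python. The function names are hypothetical, the large-sample version plugs the sample variances into the z formula, and the small-sample version pools the variances on the assumption that the two populations have equal variances.

```python
import math
import statistics

def two_sample_z(sample1, sample2, hypothesized_diff=0.0):
    """Large-sample z statistic for the difference between two means."""
    n1, n2 = len(sample1), len(sample2)
    var1, var2 = statistics.variance(sample1), statistics.variance(sample2)
    diff = statistics.mean(sample1) - statistics.mean(sample2)
    return (diff - hypothesized_diff) / math.sqrt(var1 / n1 + var2 / n2)

def pooled_t(sample1, sample2, hypothesized_diff=0.0):
    """Small-sample t statistic and d.f., assuming equal population variances."""
    n1, n2 = len(sample1), len(sample2)
    var1, var2 = statistics.variance(sample1), statistics.variance(sample2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    diff = statistics.mean(sample1) - statistics.mean(sample2)
    t = (diff - hypothesized_diff) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df
```

Both functions subtract the second sample from the first, so the hypotheses must be written in the same order.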

40
Comparing Two Means when the data can be paired

Think of pairing data like you would think of a
pair of shoes
Most aspects the same (size, color, style, etc.)
One aspect different (left vs. right)

41
Comparing Two Means using paired data, cont.

Calculate the difference between the paired
values
Analyze these differences as you would analyze a
one-sample set of data
Again, be careful about maintaining order
between your calculations of the differences and
your hypotheses!

42
Comparing Two Means using paired data, cont.

In Excel - Data Analysis Tool
t-Test Paired Two Sample for Means
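
The paired analysis reduces to a one-sample t statistic on the differences, which can be sketched as below. The function name and argument names are hypothetical.

```python
import math
import statistics

def paired_t(first, second, hypothesized_mean_diff=0.0):
    """One-sample t statistic and d.f. computed on the paired differences."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(first, second)]  # order: first minus second
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    t = (statistics.mean(diffs) - hypothesized_mean_diff) / se
    return t, n - 1
```

Because the differences are taken as first minus second, swapping the two arguments flips the sign of t, which is why the order must match the hypotheses.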

43
The Crest toothpaste test

Pairs of identical twins living in the same
household were recruited
One twin was given Crest without stannous
fluoride, the other Crest with stannous fluoride
A paired test was conducted
P&G spent $1 M promoting the results, and
Crest grew from a 3 share brand to the market
leader at about a 35 share

45
Chi-squared Analyses of Categorical Data

We will discuss two kinds of chi-squared analyses
Analysis of contingency tables - testing two
variables to see whether they are related to one
another
Goodness-of-fit tests - checking some data
against a probability distribution

46
Analysis of Contingency Tables

Hypotheses
H0: There is no relationship between the two
variables
Ha: There is a relationship between the two
variables

47
Analysis of Contingency Tables, cont.

Using the probability law
P(A and B) = P(A) P(B)
if A and B are independent,
create a table of Expected Counts
Compare the table of actual data (Observed) with
the Expected table using the chi-squared statistic
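
Applying the independence law to counts gives, for each cell, expected count = (row total × column total) / grand total. A minimal sketch, with a made-up 2 × 2 table and a hypothetical function name:

```python
def expected_counts(observed):
    """Expected cell counts under independence of the row and column variables."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

observed = [[20, 30], [30, 20]]  # hypothetical observed table
expected = expected_counts(observed)
```

For this table every row and column total is 50 out of 100, so every expected count is 25.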

48
The Chi-squared Statistic

χ² = Σ (Observed - Expected)² / Expected, summed over all cells of the table,
is chi-squared distributed with
(rows - 1) × (columns - 1) d.f.
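
A short sketch of the calculation, assuming the usual form of the statistic (the sum over all cells of the squared difference between observed and expected counts, divided by the expected count). The function names and the tables are hypothetical.

```python
def chi_squared_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over every cell."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

def degrees_of_freedom(table):
    """(rows - 1) times (columns - 1) for a rectangular table."""
    return (len(table) - 1) * (len(table[0]) - 1)

observed = [[20, 30], [30, 20]]           # hypothetical observed counts
expected = [[25.0, 25.0], [25.0, 25.0]]   # expected counts under independence
statistic = chi_squared_statistic(observed, expected)
df = degrees_of_freedom(observed)
```

Each of the four cells contributes 25 / 25 = 1, so the statistic is 4 with 1 d.f.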

49
Analysis of Contingency Tables, cont.

Small values of the chi-squared statistic mean
the counts in the Observed table match those in
the Expected table. This means that H0 is
accepted.
Large values of the chi-squared statistic mean
the counts in the two tables don't match. This
means H0 is rejected.

50
Analysis of Contingency Tables using Excel

There is no Data Analysis Tool to do this
Hence, we create a spreadsheet which mimics the
calculations we do by hand

51
Analysis using Excel, cont.

Start with the table of observed data, then
Calculate a table of expected values,
Calculate a table of chi-squared values,
Calculate the overall chi-squared value,

52
Analysis using Excel, cont.

Calculate the p-value of the test, using either
CHIDIST(χ² value, d.f.)
or
CHITEST(obs. range, exp. range)
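
The upper-tail probability that CHIDIST reports can also be sketched in plain Python for whole-number d.f. This is not Excel's implementation: the function name is hypothetical and the calculation uses the closed-form series for the chi-squared tail area. CHITEST goes one step further by working out the statistic and d.f. from the two ranges before finding the same tail area.

```python
import math

def chidist(x, df):
    """Upper-tail probability P(chi-squared > x) for integer d.f."""
    if x < 0 or df < 1:
        raise ValueError("x must be >= 0 and df must be a positive integer")
    y = x / 2
    if df % 2 == 0:
        # even d.f.: exp(-y) * (1 + y + y^2/2! + ... + y^(df/2 - 1)/(df/2 - 1)!)
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= y / k
            total += term
        return math.exp(-y) * total
    # odd d.f.: start from the 1 d.f. case, then add the half-integer terms
    total = math.erfc(math.sqrt(y))
    for k in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * y ** (k - 0.5) / math.gamma(k + 0.5)
    return total

p_value = chidist(4.0, 1)  # tail area beyond a statistic of 4 with 1 d.f.
```

A statistic of 4 with 1 d.f. lies just beyond the 5% critical value of 3.841, so `p_value` is a little below 0.05.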

53
Goodness-of-fit Tests

Similar to contingency table analysis, except the
hypotheses postulate a probability distribution
H0: The data come from a specified probability
distribution
Ha: The data do not come from the specified
probability distribution
For this analysis
d.f. = categories - 1
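
A goodness-of-fit calculation follows the same pattern as the contingency table, except that the expected counts come from the hypothesized probabilities. The sketch below uses a hypothetical function name and made-up die-roll counts.

```python
def goodness_of_fit(observed_counts, hypothesized_probs):
    """Chi-squared statistic and d.f. for counts against hypothesized probabilities."""
    if len(observed_counts) != len(hypothesized_probs):
        raise ValueError("need one probability per category")
    n = sum(observed_counts)
    expected = [n * p for p in hypothesized_probs]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed_counts, expected))
    return statistic, len(observed_counts) - 1

rolls = [8, 12, 10, 10, 9, 11]  # hypothetical counts of faces 1 to 6 in 60 rolls
statistic, df = goodness_of_fit(rolls, [1 / 6] * 6)
```

With 60 rolls of a fair die each expected count is 10, giving a statistic of 1.0 with 5 d.f.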

54
Goodness-of-Fit Example
55
Quality Control at M&M's
