Title: Computers, One- and Two-Sample Analyses, and Chi-squared Tests
1Computers, One- and Two-Sample Analyses, and
Chi-squared Tests
2Using personal computers in this course
- Your use will be important and extensive
- We will be using two main things
- Excel - to do statistical analyses
- World Wide Web - to obtain course data and
information
3Using Excel
- Excel is a spreadsheet program: the interface
is divided into a large number of rectangular
cells
- Each cell can contain one of four things
- A number
- A label
- A formula
- A built-in Excel function
4Using Excel, cont.
- There are two additional parts of Excel that we
will be using extensively
- The Data Analysis Tools (does the statistical
number crunching for us)
- The Solver optimization routine (solves linear
programming problems for us)
- If you own your own computer, you should make
sure you have these
5Checking for important add-ins
- From the Tools menu, select Add-Ins . . .
then make sure the following are on the list
- Analysis ToolPak
- Analysis ToolPak - VBA
- Solver Add-In
- and that they are checked
- Get my handout if these three are not on your list
6Using the course Web site
- The URL is on the course syllabus
- Five parts
- Homework assignments (which problems will be due
on which Thursdays)
- Homework answers
- Exam answers
- Recommended practice problems
- Access to the course data sets
7Accessing the course data sets
- Right click on the data set link (or Shift Left
click)
- Select the Save As . . . option
- Expand the compressed file once you have
downloaded it
8Using Excel, cont.
- Saving a file: use Save As (not Save) under
the File menu
- Printing and page setup
- Opening text files (like the MBS files):
learning to use the Text Import Wizard (Don't
forget to remove the End-of-file character!)
9Using Excel, cont.
- Excel's built-in statistical functions: for the
moment, two of many
- AVERAGE( Some range )
- STDEV( Some range )
- Excel's Data Analysis Tools: again, two of
several
- Descriptive Statistics (calculates lots of
different numerical summaries of the data)
- Histogram
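For comparison outside Excel, here is a minimal Python sketch of the two worksheet functions named above. It is not part of the course materials; the data values are made up for illustration, and the standard-library functions `statistics.mean` and `statistics.stdev` are used as rough counterparts of AVERAGE and STDEV.

```python
import statistics

# Hypothetical data, standing in for a range of worksheet cells.
ages = [23, 31, 27, 45, 38, 29, 34]

# Counterpart of AVERAGE( range ): the arithmetic mean.
average = statistics.mean(ages)

# Counterpart of STDEV( range ): the sample standard deviation,
# which divides by n - 1 rather than n.
std_dev = statistics.stdev(ages)

print(f"mean = {average:.3f}, sample std. dev. = {std_dev:.3f}")
```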
10A Diagram of the Statistical Process
The Population: The individuals have at least
one characteristic, e.g., age, which varies over
them. Hence, this characteristic has a
probability distribution
11Let's Look at Your Results
- What are the values of your three statistics?
- What sampling method did you use to obtain your
sample?
- Did you sample with or without replacement?
12I Now Have Three Sets of Values
- Why are they different?
- Which ones are correct? Why?
- In this process, what is the
- Population?
- Sample outcome?
- Statistic?
- Parameter?
13Summary
- Statistics are random variables; as such, they
have a probability distribution. Like all
probability distributions, this distribution has
- a mean
- a standard deviation
- a characteristic shape
14Summary, cont.
- The probability distribution of a statistic is
called the sampling distribution
- Be careful to distinguish the sampling
distribution from the population distribution
(See, for example, Fig. 6.10 on p. 255 of MBS)
15Summary, cont.
- Statisticians know the most about the sampling
distribution of the sample mean
- The mean of this sampling distribution is the
same as the mean of the population distribution
- The standard deviation of this sampling
distribution is the standard deviation of the
population distribution divided by the square
root of the sample size.
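The two facts above can be checked by simulation. The following is a sketch under assumed settings, not taken from the course: the population (uniform on 0 to 1), the sample size, the number of repeated samples, and the seed are all arbitrary choices.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

n = 25               # sample size (arbitrary choice)
num_samples = 5000   # number of repeated samples (arbitrary choice)

# Assumed population: uniform on [0, 1], which has mean 0.5 and
# standard deviation sqrt(1/12).
pop_mean = 0.5
pop_sd = math.sqrt(1 / 12)

# Draw many samples and record the mean of each one.
sample_means = [
    statistics.fmean([random.random() for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)

print(f"mean of sample means: {mean_of_means:.4f} (population mean {pop_mean})")
print(f"std. dev. of sample means: {sd_of_means:.4f} "
      f"(population sd / sqrt(n) = {pop_sd / math.sqrt(n):.4f})")
```

The two printed pairs should agree closely, though not exactly, because the simulation itself is a sample.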
16Samp. Dist. Of Mean, cont.
- If the sample size is large enough, no matter
what the shape of the population distribution,
the sampling distribution is approximately
Normally distributed. (This is the Central Limit
Theorem.)
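A rough way to see the Central Limit Theorem at work is to sample from a clearly non-Normal population and check a Normal-based prediction. The sketch below assumes an exponential population, a sample size of 50, and a fixed seed; all three are illustrative choices rather than anything specified in the course.

```python
import math
import random
import statistics

random.seed(2)  # fixed seed so the run is repeatable

n = 50               # sample size (arbitrary choice)
num_samples = 4000   # number of repeated samples (arbitrary choice)

# Assumed population: exponential with rate 1, which is strongly
# right-skewed; its mean and standard deviation are both 1.
pop_mean, pop_sd = 1.0, 1.0
std_error = pop_sd / math.sqrt(n)

sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

# If the sampling distribution is roughly Normal, about 95% of the sample
# means should lie within 1.96 standard errors of the population mean.
within = sum(abs(m - pop_mean) <= 1.96 * std_error for m in sample_means)
fraction_within = within / num_samples

print(f"fraction within 1.96 standard errors: {fraction_within:.3f}")
```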
18Samp. Dist. Of Mean, cont.
- If the population distribution is Normal, no
matter what the sample size, the sampling
distribution is
- Normally distributed if the value of σ is known
- Student's t distributed if the value of σ is
unknown
19The Sampling Distribution of the Sample Mean,
cont.
Sampling Distribution: Normal
Sampling Distribution: Student's t
Punt!
20Samp. Dist. Of Mean, cont.
- Note what is not known
- The sampling distribution of the sample mean when
the population distribution is non-Normal and the
sample size is small
- The sampling distribution of any statistic other
than the sample mean. (You will learn about some
of these as you further study statistics.)
21A Diagram of the Statistical Process
The Population: The individuals have at least
one characteristic, e.g., age, which varies over
them. Hence, this characteristic has a
probability distribution
22Overview of Statistical Inference
- Closing the gap between statistics, whose
numerical values we know, and parameters, whose
numerical values we want to know
23Recall Two Important Concepts
- Statistics vary in value from sample to sample
- Statistics rarely exactly coincide with their
corresponding parameter
- The lack of coincidence means more complex
techniques will be needed for inferring parameter
values
24Two Kinds of Statistical Inference
- Confidence intervals
- Hypothesis testing
- Each is based on the notion that a statistic is a
random variable
- and consequently heavily depends on the sampling
distribution of the statistic
25Bala -- What is the difference between hypothesis
testing and confidence intervals?
26Confidence Intervals
- You have no idea what the value of μ is
- Use x̄ (the sample mean) as the best estimate
- Construct a confidence interval around x̄
which is likely to contain μ
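As an illustration of building an interval around the sample mean, here is a sketch of a large-sample 95% confidence interval. It assumes the sample is large enough for the Normal approximation with the sample standard deviation standing in for the population value; with a small sample a Student's t multiplier would replace the z multiplier. The data are generated by an arbitrary formula and are purely hypothetical.

```python
import math
import statistics
from statistics import NormalDist

# Hypothetical sample of 36 values (made up by an arbitrary formula).
sample = [50 + (i * 7) % 11 for i in range(36)]

n = len(sample)
x_bar = statistics.mean(sample)   # best single estimate of the population mean
s = statistics.stdev(sample)      # sample standard deviation

confidence = 0.95
# z multiplier that leaves (1 - confidence) / 2 in each tail.
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

margin = z_star * s / math.sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"sample mean = {x_bar:.3f}")
print(f"{confidence:.0%} confidence interval: ({lower:.3f}, {upper:.3f})")
```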
27Hypothesis Testing
- You have some belief about the value of μ
- Then ask the question: If that value of μ is
correct, how likely would it be that the value of
x̄ which I have obtained would occur?
28Hypothesis Testing, cont.
- If it is likely that this value of x̄ would occur,
given μ, we conclude that our belief about μ
must be correct
- If it is not likely that this value of x̄ would
occur, given μ, we conclude that our belief
about μ must not be correct
29Three Ways to do Hypothesis Testing
- Method 1: Standardize the value of the statistic
and compare this to the z-value or t-value which
is the boundary of the rejection region
30Three Ways, cont.
- Method 2: Convert the boundary (z or t value)
of the rejection region to the scale of the
statistic and compare this value to the value of the
statistic
31Three Ways, cont.
- Method 3: Using the value of the statistic,
calculate the p-value and compare this p-value to
the value of α
- Remember: The p-value must always match the
rejection region!
32Rules when using the p-value
- If p-value > α, accept H0
- If p-value < α, reject H0
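The three methods always lead to the same decision, which a small calculation can confirm. The sketch below uses a hypothetical upper-tail z test; the hypothesized mean, the known population standard deviation, the sample size, the sample mean, and the significance level are all invented numbers.

```python
import math
from statistics import NormalDist

# Hypothetical upper-tail z test: H0: mu = 100 versus Ha: mu > 100.
mu_0 = 100.0    # hypothesized population mean
sigma = 15.0    # population standard deviation, assumed known
n = 36          # sample size
x_bar = 105.0   # observed sample mean
alpha = 0.05    # significance level

std_error = sigma / math.sqrt(n)
std_normal = NormalDist()

# Method 1: standardize the statistic and compare it with the z boundary.
z = (x_bar - mu_0) / std_error
z_boundary = std_normal.inv_cdf(1 - alpha)
reject_1 = z > z_boundary

# Method 2: convert the z boundary to the scale of the sample mean.
x_boundary = mu_0 + z_boundary * std_error
reject_2 = x_bar > x_boundary

# Method 3: compute the upper-tail p-value and compare it with alpha.
p_value = 1 - std_normal.cdf(z)
reject_3 = p_value < alpha

print(f"z = {z:.3f}, boundary on the mean scale = {x_boundary:.3f}, "
      f"p-value = {p_value:.4f}")
print("decisions (reject H0?):", reject_1, reject_2, reject_3)
```

Because the rejection region here is the upper tail, the p-value is also an upper-tail probability; a two-tailed test would double it.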
33Steps in Hypothesis Testing
- Determine the null and alternative hypotheses
- Specify the value of α
- Calculate the standardized value of the statistic
34Steps in Hypothesis Testing, cont.
- Draw a picture of the sampling distribution which
includes
- Identification of the shape ( z or t )
- Location of the rejection region (Upper tail,
lower tail, or both)
- Position of the standardized statistic (Above or
below the mean)
- Use the standardized value of the statistic to
calculate the p-value
35Steps in Hypothesis Testing, cont.
- Compare the p-value to the value of α to draw a
conclusion (Reject or accept the null hypothesis)
- Important: Translate your statistical conclusion
into the practical terms of the problem
36Using Excel to Calculate p-values
- To calculate Normal probabilities, use
- NORMSDIST( z )
- To calculate t probabilities, use
- TDIST( t, d.f., tails )
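For readers who want to check these worksheet results elsewhere, the sketch below gives rough Python counterparts. The function names `norm_s_dist` and `t_dist` are hypothetical, chosen only to echo the Excel names. It relies on NORMSDIST returning the cumulative (lower-tail) probability and TDIST returning a tail probability for a non-negative t. Since the standard library has no Student's t distribution, the t tail is approximated here by Simpson's rule, which is a convenience choice and not how Excel computes it.

```python
import math
from statistics import NormalDist


def norm_s_dist(z):
    """Cumulative standard Normal probability P(Z <= z), like NORMSDIST( z )."""
    return NormalDist().cdf(z)


def t_pdf(t, df):
    """Density of Student's t with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(t * t / df))


def t_dist(t, df, tails, steps=2000):
    """Tail probability of Student's t, like TDIST( t, d.f., tails ), t >= 0."""
    if t < 0 or tails not in (1, 2):
        raise ValueError("t must be non-negative and tails must be 1 or 2")
    # Simpson's rule for the area under the density between 0 and t
    # (steps must be even).
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    upper_tail = 0.5 - total * h / 3
    return tails * upper_tail


print(f"P(Z <= 1.96) = {norm_s_dist(1.96):.4f}")
print(f"two-tailed t probability, t = 2.228, 10 d.f.: {t_dist(2.228, 10, 2):.4f}")
```

Note that an upper-tail Normal p-value is 1 minus the cumulative value, whereas the t function already returns a tail probability.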
37Comparing the Means of Two Populations
- Three ways to do the analysis
- When the samples are independent of each other
- Both samples large (30 or more)
- One or both samples small (less than 30)
- When the samples can be paired - this situation
reduces to a one-sample test
38Comparing Two Means with Independent Samples
- Statistic: the difference between the two sample means, x̄1 - x̄2
- Sampling distribution
- Normal if both samples are 30 or more (p. 370 of
MBS)
- t if one or both samples smaller than 30 (pp.
373-374 of MBS)
- Important: maintain order between your
hypotheses and your statistic (pp. 13-14 of the
coursepack)
39Comparing Two Means with Independent Samples,
cont.
- In Excel, for the large sample test
- Use VAR( data range ) to calculate the sample
variance of each data set
- Write down these variances!
- Then use the Data Analysis Tool
- z-Test Two-Sample for Means
- For the small sample test use
- t-Test Two-Sample Assuming Equal Variances
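The large-sample calculation can be sketched in Python as follows. This is an illustration of the usual two-sample z statistic with the sample variances standing in for the population variances, not a reproduction of Excel's tool; the function name `two_sample_z` and both data sets are hypothetical.

```python
import math
import statistics
from statistics import NormalDist


def two_sample_z(sample_1, sample_2, hypothesized_diff=0.0):
    """Large-sample z statistic and two-tailed p-value for mean_1 - mean_2."""
    n1, n2 = len(sample_1), len(sample_2)
    var_1 = statistics.variance(sample_1)   # same quantity as VAR( range )
    var_2 = statistics.variance(sample_2)
    std_error = math.sqrt(var_1 / n1 + var_2 / n2)
    # Order matters: the difference is sample 1 minus sample 2, so the
    # hypotheses must be written in the same order.
    diff = statistics.mean(sample_1) - statistics.mean(sample_2)
    z = (diff - hypothesized_diff) / std_error
    p_two_tail = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_tail


# Made-up data: 30 observations in each group.
group_1 = [1, 2, 3] * 10
group_2 = [2, 3, 4] * 10

z, p_value = two_sample_z(group_1, group_2)
print(f"z = {z:.3f}, two-tailed p-value = {p_value:.6f}")
```

The sign of z follows the order of subtraction, which is why the order of the hypotheses and of the statistic have to agree.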
40Comparing Two Means when the data can be paired
- Think of pairing data like you would think of a
pair of shoes
- Most aspects the same (size, color, style, etc.)
- One aspect different (left vs. right)
41Comparing Two Means using paired data, cont.
- Calculate the difference between the paired
values
- Analyze these differences as you would analyze a
one-sample set of data
- Again, be careful about maintaining order
between your calculations of the differences and
your hypotheses!
42Comparing Two Means using paired data, cont.
- In Excel - Data Analysis Tool
- t-Test Paired Two Sample for Means
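The reduction of a paired comparison to a one-sample analysis can be sketched as below. The measurements are made up, and the variable names are illustrative; the null hypothesis assumed here is that the mean difference is zero.

```python
import math
import statistics

# Made-up paired measurements: each position is one matched pair.
before = [12, 15, 11, 14, 13]
after = [13, 17, 14, 16, 15]

# Step 1: one difference per pair, always in the same order (after - before).
differences = [a - b for a, b in zip(after, before)]

# Step 2: treat the differences as a single sample and test H0: mean diff = 0.
n = len(differences)
mean_diff = statistics.mean(differences)
sd_diff = statistics.stdev(differences)
t_stat = mean_diff / (sd_diff / math.sqrt(n))
df = n - 1

print(f"differences = {differences}")
print(f"t = {t_stat:.3f} with {df} d.f.")
```

The resulting t value is compared with a Student's t distribution on n - 1 degrees of freedom, exactly as in a one-sample test.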
43The Crest toothpaste test
- Pairs of identical twins living in the same
household were recruited
- One twin was given Crest without stannous
fluoride, the other Crest with stannous fluoride
- A paired test was conducted
- P&G spent $1 M promoting the results, and
Crest grew from a 3 share brand to the market
leader at about a 35 share
45Chi-squared Analyses of Categorical Data
- We will discuss two kinds of chi-squared analyses
- Analysis of contingency tables: testing two
variables to see whether they are related to one
another
- Goodness-of-fit tests: checking some data
against a probability distribution
46Analysis of Contingency Tables
- Hypotheses
- H0: There is no relationship between the two
variables
- Ha: There is a relationship between the two
variables
47Analysis of Contingency Tables, cont.
- Using the probability law
- P(A and B) = P(A) × P(B)
- if A and B are independent
- create a table of Expected Counts
- Compare the table of actual data (Observed) with
the Expected table using the chi-squared statistic
48The Chi-squared Statistic
- The statistic χ² = Σ (Observed - Expected)² / Expected,
summed over all cells of the table,
is chi-squared distributed with
- (rows - 1) × (columns - 1) d.f.
49Analysis of Contingency Tables, cont.
- Small values of the chi-squared statistic mean
the counts in the Observed table match those in
the Expected table. This means that H0 is
accepted.
- Large values of the chi-squared statistic mean
the counts in the two tables don't match. This
means H0 is rejected.
50Analysis of Contingency Tables using Excel
- There is no Data Analysis Tool to do this
- Hence, we create a spreadsheet which mimics the
calculations we do by hand
51Analysis using Excel, cont.
- Start with the table of observed data, then
- Calculate a table of expected values,
- Calculate a table of chi-squared values,
- Calculate the overall chi-squared value,
52Analysis using Excel, cont.
- Calculate the p-value of the test, using either
- CHIDIST(χ² value, d.f.)
- or
- CHITEST(obs. range, exp. range)
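The spreadsheet steps (observed table, expected table, cell-by-cell chi-squared values, overall value, p-value) can be mirrored in a short Python sketch. The observed counts are invented, and `chi2_sf` is a hypothetical helper that plays the role of CHIDIST for whole-number degrees of freedom using a standard closed-form series; it is one convenient way to get the tail probability, not Excel's own routine.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-squared probability for integer d.f., like CHIDIST."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even d.f.: exp(-x/2) times the sum of (x/2)^k / k! for k < d.f./2.
        term = total = 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    # Odd d.f.: two-tailed Normal probability plus a finite correction series.
    chi = math.sqrt(x)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    normal_pdf = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    return math.erfc(chi / math.sqrt(2)) + 2 * normal_pdf * total


# Made-up observed counts: 2 rows by 3 columns.
observed = [[30, 20, 10],
            [20, 30, 40]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence: row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Cell-by-cell contributions, then the overall chi-squared value.
contributions = [[(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
                 for obs_row, exp_row in zip(observed, expected)]
chi_squared = sum(sum(row) for row in contributions)

df = (len(observed) - 1) * (len(observed[0]) - 1)
p_value = chi2_sf(chi_squared, df)

print(f"chi-squared = {chi_squared:.3f}, d.f. = {df}, p-value = {p_value:.5f}")
```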
53Goodness-of-fit Tests
- Similar to contingency table analysis, except the
hypotheses postulate a probability distribution
- H0: The data come from a specified probability
distribution
- Ha: The data do not come from the specified
probability distribution
- For this analysis
- d.f. = categories - 1
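A goodness-of-fit calculation follows the same pattern, with the expected counts coming from the hypothesized distribution. In the sketch below the category probabilities and the observed counts are invented for illustration, and `chi2_sf` is a hypothetical helper (a closed-form series for whole-number degrees of freedom) standing in for CHIDIST.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-squared probability for integer d.f., like CHIDIST."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even d.f.: exp(-x/2) times the sum of (x/2)^k / k! for k < d.f./2.
        term = total = 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    # Odd d.f.: two-tailed Normal probability plus a finite correction series.
    chi = math.sqrt(x)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    normal_pdf = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    return math.erfc(chi / math.sqrt(2)) + 2 * normal_pdf * total


# H0: the data come from this (hypothetical) distribution over 4 categories.
hypothesized_probs = [0.4, 0.3, 0.2, 0.1]
observed = [45, 25, 20, 10]   # made-up counts

total_count = sum(observed)
expected = [p * total_count for p in hypothesized_probs]

chi_squared = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1        # d.f. = categories - 1
p_value = chi2_sf(chi_squared, df)

print(f"chi-squared = {chi_squared:.3f}, d.f. = {df}, p-value = {p_value:.3f}")
```

With these invented counts the p-value is large, so the data give no reason to reject the hypothesized distribution.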
54Goodness-of-Fit Example
55Quality Control at M&Ms