Computers, One- and Two-Sample Analyses, and Chi-squared Tests - PowerPoint PPT Presentation

Slides: 56
Provided by: Morgan75

Transcript and Presenter's Notes


1
Computers, One- and Two-Sample Analyses, and
Chi-squared Tests
2
Using personal computers in this course
  • Your use will be important and extensive
  • We will be using two main things
  • Excel - to do statistical analyses
  • World Wide Web - to obtain course data and
    information

3
Using Excel
  • Excel is a spreadsheet program - the interface
    is divided into a large number of rectangular
    cells
  • Each cell can contain one of four things
  • A number
  • A label
  • A formula
  • A built-in Excel function

4
Using Excel, cont.
  • There are two additional parts of Excel that we
    will be using extensively
  • The Data Analysis Tools (does the statistical
    number crunching for us)
  • The Solver optimization routine (solves linear
    programming problems for us)
  • If you own your own computer, you should make
    sure you have these

5
Checking for important add-ins
  • From the Tools menu, select Add-Ins . . .
    then make sure the following are on the list
  • Analysis ToolPak
  • Analysis ToolPak - VBA
  • Solver Add-In
  • and that they are checked
  • Get my handout if these three are not on your list

6
Using the course Web site
  • The URL is on the course syllabus
  • Five parts
  • Homework assignments (which problems will be due
    on which Thursdays)
  • Homework answers
  • Exam answers
  • Recommended practice problems
  • Access to the course data sets

7
Accessing the course data sets
  • Right click on the data set link (or Shift Left
    click)
  • Select the Save As . . . option
  • Expand the compressed file once you have
    downloaded it

8
Using Excel, cont.
  • Saving a file - use Save As (not Save) under
    the File menu
  • Printing and page setup
  • Opening text files (like the MBS files) -
    learning to use the Text Import Wizard (Don't
    forget to remove the End-of-file character!)

9
Using Excel, cont.
  • Excel's built-in statistical functions - for the
    moment, two of many
  • AVERAGE( Some range )
  • STDEV( Some range )
  • Excel's Data Analysis Tools - again, two of
    several
  • Descriptive Statistics (calculates lots of
    different numerical summaries of the data)
  • Histogram
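
For readers working outside Excel, a rough Python equivalent of AVERAGE and STDEV might look like the sketch below. The data values are made up for illustration; `statistics.stdev` uses the n - 1 (sample) divisor, as Excel's STDEV does.

```python
import statistics

# Hypothetical data standing in for "Some range" in the Excel formulas.
ages = [23, 31, 27, 45, 38, 29]

sample_mean = statistics.mean(ages)   # plays the role of AVERAGE(range)
sample_sd = statistics.stdev(ages)    # plays the role of STDEV(range), n - 1 divisor

print(f"mean = {sample_mean:.3f}, sd = {sample_sd:.3f}")
```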

10
A Diagram of the Statistical Process
The Population: The individuals have at least
one characteristic, e.g., age, which varies over
them. Hence, this characteristic has a
probability distribution.
11
Let's Look at Your Results
  • What are the values of your three statistics?
  • What sampling method did you use to obtain your
    sample?
  • Did you sample with or without replacement?

12
I Now Have Three Sets of Values
  • Why are they different?
  • Which ones are correct? Why?
  • In this process, what is the
  • Population?
  • Sample outcome?
  • Statistic?
  • Parameter?

13
Summary
  • Statistics are random variables -- as such they
    have a probability distribution. Like all
    probability distributions, this distribution has
  • a mean
  • a standard deviation
  • a characteristic shape

14
Summary, cont.
  • The probability distribution of a statistic is
    called the sampling distribution
  • Be careful to distinguish the sampling
    distribution from the population distribution
    (See, for example, Fig. 6.10 on p. 255 of MBS)

15
Summary, cont.
  • Statisticians know the most about the sampling
    distribution of the sample mean
  • The mean of this sampling distribution is the
    same as the mean of the population distribution
  • The standard deviation of this sampling
    distribution is the standard deviation of the
    population distribution divided by the square
    root of the sample size.
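
The two facts above can be checked exactly on a tiny example. The sketch below assumes a made-up population of four values and lists every possible sample of size 2 drawn with replacement, which is the setting in which the "divide by the square root of n" rule holds exactly.

```python
import itertools
import math
import statistics

# Hypothetical tiny population, so every possible sample can be listed.
population = [2, 4, 6, 8]
n = 2  # sample size

pop_mean = statistics.mean(population)
pop_sd = statistics.pstdev(population)  # population (divide-by-N) standard deviation

# All ordered samples of size n drawn with replacement: 4**2 = 16 of them.
sample_means = [statistics.mean(s) for s in itertools.product(population, repeat=n)]

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.pstdev(sample_means)

print(mean_of_means, pop_mean)                  # the two means agree
print(sd_of_means, pop_sd / math.sqrt(n))       # the two standard deviations agree
```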

16
Samp. Dist. Of Mean, cont.
  • If the sample size is large enough, no matter
    what the shape of the population distribution,
    the sampling distribution is approximately
    Normally distributed. (This is the Central Limit
    Theorem.)
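
A small simulation can make the Central Limit Theorem concrete. The sketch below assumes a strongly skewed exponential population and a sample size of 40 (both arbitrary choices for illustration) and checks that roughly 95% of the simulated sample means fall within 1.96 standard errors of the population mean, as a Normal sampling distribution would predict.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

n = 40          # sample size (assumed "large enough" for this illustration)
reps = 5000     # number of simulated samples

# Exponential population with rate 1: skewed, with mean 1 and standard deviation 1.
pop_mean, pop_sd = 1.0, 1.0
std_error = pop_sd / math.sqrt(n)

sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(reps)
]

# Fraction of sample means within 1.96 standard errors of the population mean.
within = sum(abs(m - pop_mean) <= 1.96 * std_error for m in sample_means) / reps
print(f"fraction within 1.96 standard errors: {within:.3f}")
```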

17
(No Transcript)
18
Samp. Dist. Of Mean, cont.
  • If the population distribution is Normal, no
    matter what the sample size, the sampling
    distribution is
  • Normally distributed if the value of σ is known
  • Student's t distributed if the value of σ is
    unknown

19
The Sampling Distribution of the Sample Mean,
cont.
Sampling Distribution: Normal
Sampling Distribution: Student's t
Punt!
20
Samp. Dist. Of Mean, cont.
  • Note what is not known
  • The sampling distribution of the sample mean when
    the population distribution is non-Normal and the
    sample size is small
  • The sampling distribution of any statistic other
    than the sample mean. (You will learn about some
    of these as you further study statistics.)

21
A Diagram of the Statistical Process
The Population: The individuals have at least
one characteristic, e.g., age, which varies over
them. Hence, this characteristic has a
probability distribution.
22
Overview of Statistical Inference
  • Closing the gap between statistics, whose
    numerical values we know, and parameters, whose
    numerical values we want to know

23
Recall Two Important Concepts
  • Statistics vary in value from sample to sample
  • Statistics rarely exactly coincide with their
    corresponding parameter
  • The lack of coincidence means more complex
    techniques will be needed for inferring parameter
    values

24
Two Kinds of Statistical Inference
  • Confidence intervals
  • Hypothesis testing
  • Each is based on the notion that a statistic is a
    random variable
  • and consequently heavily depends on the sampling
    distribution of the statistic

25
Bala -- What is the difference between hypothesis
testing and confidence intervals?
26
Confidence Intervals
  • You have no idea what the value of μ is
  • Use x̄ as the best estimate
  • Construct a confidence interval around x̄
    which is likely to contain μ
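
As a rough illustration of the idea, the sketch below builds a 95% interval around a sample mean. The data and the value of the population standard deviation are invented, and the population is assumed Normal with known standard deviation so that the Normal sampling distribution applies.

```python
import math
import statistics
from statistics import NormalDist

# Hypothetical sample; sigma is assumed known purely for illustration.
sample = [52.1, 48.3, 50.7, 49.5, 51.2, 47.8, 50.4, 49.9]
sigma = 1.5
confidence = 0.95

x_bar = statistics.mean(sample)                       # best estimate of the mean
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)    # about 1.96 for 95%
margin = z * sigma / math.sqrt(len(sample))

lower, upper = x_bar - margin, x_bar + margin
print(f"x-bar = {x_bar:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```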

27
Hypothesis Testing
  • You have some belief about the value of μ
  • Then ask the question - If that value of μ is
    correct, how likely would it be that the value of
    x̄ which I have obtained would occur?

28
Hypothesis Testing, cont.
  • If it is likely that value of x̄ would occur,
    given μ, we conclude that our belief about μ
    must be correct
  • If it is not likely that value of x̄ would
    occur, given μ, we conclude that our belief
    about μ must not be correct

29
Three Ways to do Hypothesis Testing
  • Method 1: Standardize the value of the statistic
    and compare this to the z-value or t-value which
    is the boundary of the rejection region

30
Three Ways, cont.
  • Method 2: Convert the boundary (z or t value)
    of the rejection region to the scale of the
    statistic
    and compare this value to the value of the
    statistic

31
Three Ways, cont.
  • Method 3: Using the value of the statistic,
    calculate the p-value and compare this p-value to
    the value of α
  • Remember: The p-value must always match the
    rejection region!

32
Rules when using the p-value
  • If p-value > α, accept H0
  • If p-value < α, reject H0
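
The three methods should always lead to the same decision. The sketch below walks through all three for a made-up upper-tail z-test; the hypothesized mean, standard deviation, sample size, and sample mean are all invented for illustration.

```python
import math
from statistics import NormalDist

# Hypothetical upper-tail test: H0: mu = 100 versus Ha: mu > 100.
mu_0 = 100.0
sigma = 15.0      # population standard deviation, assumed known
n = 36
x_bar = 105.0     # observed sample mean (made up)
alpha = 0.05

std_error = sigma / math.sqrt(n)
z_crit = NormalDist().inv_cdf(1 - alpha)    # boundary of the rejection region

# Method 1: standardize the statistic and compare with the boundary z-value.
z_stat = (x_bar - mu_0) / std_error
reject_1 = z_stat > z_crit

# Method 2: convert the boundary to the scale of the statistic.
x_crit = mu_0 + z_crit * std_error
reject_2 = x_bar > x_crit

# Method 3: compute the upper-tail p-value and compare it with alpha.
p_value = 1 - NormalDist().cdf(z_stat)
reject_3 = p_value < alpha

print(z_stat, z_crit, x_crit, p_value)
print(reject_1, reject_2, reject_3)
```

Note that the p-value here is an upper-tail probability because the rejection region is in the upper tail; a two-tailed test would double it.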

33
Steps in Hypothesis Testing
  • Determine the null and alternative hypotheses
  • Specify the value of α
  • Calculate the standardized value of the statistic

34
Steps in Hypothesis Testing, cont.
  • Draw a picture of the sampling distribution which
    includes
  • Identification of the shape ( z or t )
  • Location of the rejection region (Upper tail,
    lower tail, or both)
  • Position of the standardized statistic (Above or
    below the mean)
  • Use the standardized value of the statistic to
    calculate the p-value

35
Steps in Hypothesis Testing, cont.
  • Compare the p-value to the value of α to draw a
    conclusion (Reject or accept the null hypothesis)
  • Important - Translate your statistical conclusion
    into the practical terms of the problem

36
Using Excel to Calculate p-values
  • To calculate Normal probabilities, use
  • NORMSDIST( z )
  • To calculate t probabilities, use
  • TDIST( t, d.f., tails )
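
For comparison, the sketch below gives rough standard-library Python stand-ins for these two Excel functions. The names `normsdist` and `tdist` are chosen here only to mirror the Excel names; the t probability is obtained by numerically integrating the t density with Simpson's rule, which is an approximation rather than Excel's own algorithm.

```python
import math
from statistics import NormalDist


def normsdist(z):
    """Cumulative standard Normal probability P(Z <= z), like NORMSDIST(z)."""
    return NormalDist().cdf(z)


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def tdist(t, df, tails, steps=2000):
    """Tail probability for t >= 0, like TDIST(t, d.f., tails).

    Integrates the density from 0 to t with Simpson's rule (steps must be even),
    then subtracts from 0.5 to get the upper tail.
    """
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    upper_tail = 0.5 - total * h / 3
    return tails * upper_tail


print(normsdist(1.96))       # about 0.975
print(tdist(2.228, 10, 2))   # about 0.05
```

NORMSDIST returns the area to the left of z, so an upper-tail p-value is 1 minus that value, while TDIST returns tail areas directly.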

37
Comparing the Means of Two Populations
  • Three ways to do the analysis
  • When the samples are independent of each other
  • Both samples large (30 or more)
  • One or both samples small (less than 30)
  • When the samples can be paired - this situation
    reduces to a one-sample test

38
Comparing Two Means with Independent Samples
  • Statistic
  • Sampling distribution
  • Normal if both samples are 30 or more (p. 370 of
    MBS)
  • t if one or both samples smaller than 30 (pp.
    373-374 of MBS)
  • Important - maintain order between your
    hypotheses and your statistic (pp. 13-14 of the
    coursepack)

39
Comparing Two Means with Independent Samples,
cont.
  • In Excel, for the large sample test
  • Use VAR( data range ) to calculate the sample
    variance of each data set
  • Write down these variances!
  • Then use the Data Analysis Tool
  • z-Test Two-Sample for Means
  • For the small sample test use
  • t-Test Two-Sample Assuming Equal Variances

40
Comparing Two Means when the data can be paired
  • Think of pairing data like you would think of a
    pair of shoes
  • Most aspects the same (size, color, style, etc.)
  • One aspect different (left vs. right)

41
Comparing Two Means using paired data, cont.
  • Calculate the difference between the paired
    values
  • Analyze these differences as you would analyze a
    one-sample set of data
  • Again, be careful about maintaining order
    between your calculations of the differences and
    your hypotheses!

42
Comparing Two Means using paired data, cont.
  • In Excel - Data Analysis Tool
  • t-Test Paired Two Sample for Means

43
The Crest toothpaste test
  • Pairs of identical twins living in the same
    household were recruited
  • One twin was given Crest without stannous
    fluoride, the other Crest with stannous fluoride
  • A paired test was conducted
  • P&G spent $1 M promoting the results, and
    Crest grew from a 3 share brand to the market
    leader at about a 35 share

44
(No Transcript)
45
Chi-squared Analyses of Categorical Data
  • We will discuss two kinds of chi-squared analyses
  • Analysis of contingency tables - testing two
    variables to see whether they are related to one
    another
  • Goodness-of-fit tests - checking some data
    against a probability distribution

46
Analysis of Contingency Tables
  • Hypotheses
  • H0: There is no relationship between the two
    variables
  • Ha: There is a relationship between the two
    variables

47
Analysis of Contingency Tables, cont.
  • Using the probability law
  • P(A and B) = P(A) P(B)
  • if A and B are independent
  • create a table of Expected Counts
  • Compare the table of actual data (Observed) with
    the Expected table using the chi-squared statistic

48
The Chi-squared Statistic
  • χ² = Σ (Observed - Expected)² / Expected, summed over all cells of the table,
  • is Chi-squared distributed with
  • (rows - 1) × (columns - 1) d.f.
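
The sketch below computes the statistic and its degrees of freedom for an invented table, assuming the usual Pearson form: the sum over all cells of (observed - expected) squared, divided by expected.

```python
def chi_squared_statistic(observed, expected):
    """Sum of (O - E)^2 / E over every cell of two same-shaped tables."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical observed counts with the expected counts implied by independence.
observed = [[30, 20, 10], [20, 30, 40]]
expected = [[20.0, 20.0, 20.0], [30.0, 30.0, 30.0]]

chi_sq = chi_squared_statistic(observed, expected)
df = (len(observed) - 1) * (len(observed[0]) - 1)   # (rows - 1) * (columns - 1)
print(f"chi-squared = {chi_sq:.3f}, d.f. = {df}")
```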

49
Analysis of Contingency Tables, cont.
  • Small values of the chi-squared statistic mean
    the counts in the Observed table match those in
    the Expected table. This means that H0 is
    accepted.
  • Large values of the chi-squared statistic mean
    the counts in the two tables don't match. This
    means H0 is rejected.

50
Analysis of Contingency Tables using Excel
  • There is no Data Analysis Tool to do this
  • Hence, we create a spreadsheet which mimics the
    calculations we do by hand

51
Analysis using Excel, cont.
  • Start with the table of observed data, then
  • Calculate a table of expected values,
  • Calculate a table of chi-squared values,
  • Calculate the overall chi-squared value,

52
Analysis using Excel, cont.
  • Calculate the p-value of the test, using either
  • CHIDIST(χ² value, d.f.)
  • or
  • CHITEST(obs. range, exp. range)
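
Outside Excel, the upper-tail probability that CHIDIST returns can be approximated with the standard library alone. The function name `chidist` below simply mirrors the Excel name; the sketch assumes a non-negative statistic and whole-number degrees of freedom.

```python
import math


def chidist(x, df):
    """Upper-tail chi-squared probability P(X > x) for integer df, like CHIDIST(x, d.f.).

    Starts from the closed form for 1 or 2 degrees of freedom and applies
    Q(x | k + 2) = Q(x | k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    """
    if df % 2:                       # odd d.f.: start from 1
        k, q = 1, math.erfc(math.sqrt(x / 2))
    else:                            # even d.f.: start from 2
        k, q = 2, math.exp(-x / 2)
    while k < df:
        q += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q


print(chidist(5.991, 2))    # about 0.05
print(chidist(16.667, 2))   # p-value for a statistic of 16.667 with 2 d.f.
```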

53
Goodness-of-fit Tests
  • Similar to contingency table analysis, except the
    hypotheses postulate a probability distribution
  • H0: The data come from a specified probability
    distribution
  • Ha: The data do not come from a specified
    probability distribution
  • For this analysis
  • d.f. = categories - 1
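
A goodness-of-fit calculation might be sketched as follows. The counts and the claimed proportions are invented, and the statistic is assumed to take the same sum of (observed - expected) squared over expected form used for contingency tables.

```python
# Hypothetical counts of items in four categories, and the proportions claimed under H0.
observed = [18, 22, 29, 31]
claimed_probs = [0.25, 0.25, 0.25, 0.25]

n = sum(observed)
expected = [n * p for p in claimed_probs]   # expected count in each category

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1   # categories - 1

print(f"expected = {expected}, chi-squared = {chi_sq:.3f}, d.f. = {df}")
```

With 3 d.f. the 5% critical value is about 7.815, so a statistic of this size would not lead to rejecting H0 at α = 0.05.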

54
Goodness-of-Fit Example
55
Quality Control at M&Ms