Introductory Statistical Concepts - PowerPoint PPT Presentation

Slides: 119
Provided by: SusanH46
1
Introductory Statistical Concepts
  • F. Michael Speed, Ph.D.
  • Department of Statistics
  • Texas A&M University

2
Disclaimer
  • I am not an expert SAS programmer.
  • Nothing that I say is confirmed or denied by
    Texas A&M University.

3
Why Are We Here?
  • Deming
  • To Learn
  • To Have Fun
  • Question: Who was Deming?

4
Poll: What type of organization do you work for?
  • Business
  • Government
  • Education
  • Nonprofit
  • Other

5
Purpose of These Lectures
  • A review of the statistical concepts used in most
    of the SAS Analytics Lecture Series.
  • We will look at questions such as the following:
  • What is the nature of statistical analyses?
  • Why are population parameters so important?
  • What is really being tested when you see a
    p-value?
  • Why does regression handle missing data so well?
  • What are residual analyses?

6
Descriptive Statistics
7
The Population
8
Learning Outcomes
  • You will learn
  • basic statistical concepts
  • the definition of mean, median, mode and standard
    deviation
  • the difference between populations and samples
  • the difference between parameters and estimates
  • about confidence intervals
  • how to test a statistical hypothesis
  • how to run a regression analysis

9
Parameters
  • Characteristics of the variable of interest
  • It is how we describe the variable of interest
  • Parameters are unknown

10
Parameters(Characteristics)
  • Central Tendency
  • Mode
  • Median
  • Mean
  • Measures of Variability
  • Range
  • Variance
  • Standard Deviation

11
Variability
  • Change in the Data

12
What Is an Index?
How SUNNY is SUNNY? The UV Index
13
Air Quality Index: What Does It Mean?
14
DOW JONES INDUSTRIAL AVERAGE INDEX
What does 10,971.16 really mean? Which is
better: a DJIA of 10,000 or a DJIA of 12,000?
15
Variability Index
  • A Simple One
  • Find the Largest Value
  • Find the Smallest Value
  • Let Range = R = Largest - Smallest

16
A More Complex Variation Index
  • The Standard Deviation
  • Statisticians use this index to indicate
    variability
  • You will see it written as
  • Widely available from SAS, Excel, and other
    statistical packages

17
Details of the More Complex Index
  • Example: Suppose that we observe the following
    three numbers:
  • 1 4 7
  • The mean of these numbers is
  • (1 + 4 + 7)/3 = 4
  • We now subtract the mean from each number, square
    each difference, and add the squares:
  • (1-4)(1-4) + (4-4)(4-4) + (7-4)(7-4) = 18
  • The Standard Deviation = sqrt(18/2) = 3
    (the divisor is n - 1 = 2)
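The hand calculation above can be checked in SAS. The following is a minimal sketch; the data set name three and the variable name x are made up for illustration. PROC MEANS uses the n - 1 divisor by default, so it should report a mean of 4, a standard deviation of 3, and a range of 6.

```sas
/* Hypothetical data set holding the three numbers from the slide. */
data three;
  input x @@;
  datalines;
1 4 7
;
run;

/* N, mean, standard deviation (divisor n-1), and range = largest - smallest. */
proc means data=three n mean std range;
  var x;
run;
```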

18
What does this Mean?
  • By itself, it may be confusing to some.
  • Comparing populations, we can use it to say which
    population varies the most.
  • Let us look at an applet.

19
Using Graphs to Determine Variability
  • Box Plot

20
Describe What Is Happening
You are giving the parameters of the picture
21
Example Using SAS
22
Distributions
23
Known Distribution
  • With a known distribution, we know the following
  • the shape
  • the mean
  • the variability (standard deviation)
  • and/or some other information

24
Classical Distributions-Normal
25
Normal-Overlay
26
Classical Distributions-Uniform
27
Uniform-Overlay
28
Classical Distributions-Chi-Square
29
Survey
  • The following are called parameters of the
    population
  • mean, median, mode
  • variance, standard deviation, range,
    inter-quartile range (IQR)
  • In general, are these known or unknown?
  • Known: yes (select using your seat indicator)
  • Unknown: no (select using your seat indicator)

30
Generate a Sample from a Known Distribution
  • Why?
  • This is a simulation.
  • It helps us to understand a process or analyses.
  • It helps to see if we are getting expected
    results.
  • It is fun.

31
MPG Example
  • Suppose we want to simulate mpg for a car that
    weighs 3000 lbs.
  • Let us assume that the mean mpg = 24.
  • Let us assume that the standard deviation = 1 mpg.
  • We will generate a number from the normal
    distribution with mean = 0 and standard
    deviation = 1.
  • We will then add that number (which may be
    negative) to 24.

32
MPG Composition
Observed = Essential Part +/- Leftovers
Essential Part = 24
Leftovers ~ N(0,1)

Let us generate 1000 mpg.
33
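SAS Code 3
data mpg1;
  do i = 1 to 1000;
    mean = 24; lo = normal(-1);   /* essential part; leftover from N(0,1) */
    mpg = mean + lo; output;      /* observed = essential part + leftover */
  end;
run;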
34
Simulated MPG
35
MPG-Histogram
Compare with true values!
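One way to make that comparison is sketched below. It assumes the data set mpg1 and the variable mpg created by SAS Code 3; the sample mean and standard deviation should land close to, but not exactly on, the true values 24 and 1.

```sas
/* Sample statistics to set against the true mean (24) and std (1). */
proc means data=mpg1 n mean std min max;
  var mpg;
run;

/* Histogram of the simulated mpg with the true N(24,1) curve overlaid. */
proc univariate data=mpg1;
  histogram mpg / normal(mu=24 sigma=1);
run;
```

Rerunning SAS Code 3 gives a different sample each time, so the estimates move a little from run to run while the population values stay fixed.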
36
Simulated Sample
  • In this example, we simulated taking a sample of
    size 1000 from one population of cars weighing
    3000 pounds with a normal distribution with
    mean = 24 and standard deviation = 1.
  • You can practice this after class.

37
After Class Practice
  • Simulate 1000 data points for each of the
    following five populations. Run and explore your
    data.

38
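SAS Code 4 Generate a Normal with Mean 0 and
Standard Deviation of s = 1.5
data mpg1;
  s = 1.5; mean = 24;
  do i = 1 to 1000;
    lo = s*normal(-1);            /* leftover with standard deviation s */
    mpg = mean + lo; output;      /* observed = essential part + leftover */
  end;
run;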
39
Simulating Data SAS_Code4.sas
  • This demonstration illustrates how to simulate
    data for a given population.

40
View/Application Share Demo Simulation

41
Summary
43
Section 1.2
  • Populations and Samples

44
Objectives
  • Understand the relationships between
  • populations and samples
  • parameters and estimates.
  • Look at an overview of hypothesis testing.

45
Population
Mean, Variance, Median, Mode, Distribution,
Parameters
46
Example
  • Mpg of American-made cars that weigh between 2000
    and 3500 pounds and were built in the 1970s.
  • Parameters: mean, variance, and so on
  • In general, we do not know the parameters.

47
Purpose of Statistical Analyses
  • Estimate the parameters. (Make guesses.)
  • Example: What is the population mean?
  • Test hypotheses about the parameters. (Ask
    questions.)
  • Example: Is the population mean = 30 mpg?

48
Role of Samples
  • Taking a sample of the population enables you to
  • make estimates of the population parameters
  • answer the questions about the population
    parameters.

49
Population and Sample
Population: Mean, Variance, Median, Mode, Distribution,
Parameters
Sample: Sample mean, Sample variance
Inference: Estimates, Test of hypotheses
50
Example: cars_american
  • This is a sample of American-made cars that weigh
    between 2000 and 3500 pounds and that were built
    in the 1970s.
  • We are interested in the mpg.
  • Use summary statistics to analyze the data.
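A minimal sketch of the summary statistics step, assuming the sample is stored in a SAS data set named cars_american with a numeric variable named mpg (the variable name is an assumption; the slides give only the data set name):

```sas
/* Sample estimates of the population parameters for mpg. */
proc means data=cars_american n mean median std min max;
  var mpg;
run;
```

Each number printed here is an estimate computed from the sample, not the unknown population parameter itself.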

51
Results of Summary Statistics
52
Results of Histogram
53
Results of Histogram
54
Sampling Distribution Applet sampling_dist
  • This demonstration illustrates how to estimate
    and plot the sampling distribution of various
    statistics.

55
View/Application Share Demo Sampling
Distributions Applet

56
http://www.ruf.rice.edu/~lane/stat_sim/sampling_dist/index.h...

57
Confidence Intervals on the Population Mean
  • Level of Comfort
  • 50%: 21.57 to 22.21
  • 95%: 20.96 to 22.82
  • 99.9%: 20.30 to 23.48

What does this mean?
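Intervals of this kind can be requested from PROC MEANS with the CLM keyword, where ALPHA= is one minus the confidence level. The sketch below assumes the cars_american data set with a variable named mpg; it is not taken from the original demonstration, so the output need not match the numbers above exactly.

```sas
/* 50% confidence limits for the population mean (alpha = 1 - 0.50). */
proc means data=cars_american n mean stderr clm alpha=0.50;
  var mpg;
run;

/* 95% confidence limits. */
proc means data=cars_american n mean stderr clm alpha=0.05;
  var mpg;
run;

/* 99.9% confidence limits. */
proc means data=cars_american n mean stderr clm alpha=0.001;
  var mpg;
run;
```

All three intervals are centered on the same sample mean. Asking for more comfort (a higher confidence level) makes the interval wider.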
58
Test That the Population Mean = 30 mpg
  • Use t-test → One Sample t-test
  • Requirements for running this test:
  • Large n (n > 35)
  • Or leftovers are normal
  • What is the p-value or sig value?
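In SAS code rather than menus, a one-sample t-test of this hypothesis might look like the following sketch. The variable name mpg is an assumption; H0= sets the hypothesized population mean.

```sas
/* One-sample t-test of H0: population mean mpg = 30. */
proc ttest data=cars_american h0=30 alpha=0.05;
  var mpg;
run;
```

The p-value appears in the output column labeled Pr > |t|. The population parameter being tested is the mean mpg, and the hypothesis is that it equals 30.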

59
Testing Mean = 30
60
Conclusions of the Test
  • Choose an alpha level, usually alpha = .05.
  • If sig < alpha, then reject.
  • Otherwise, fail to reject.

61
Sig and p-values
  • When you see a sig value or p-value
  • You know that some hypothesis is being tested.
  • You know whether or not the hypothesis is being
    rejected.
  • You probably do not know what the hypothesis
    really is.
  • Ask yourself these questions
  • What are the population parameters being tested?
  • How is what is being tested related to those
    parameters?

62
Requirements for Doing This Test
  • Large n → n > 35
  • Or leftovers are normally distributed.
  • Use Histogram to test for normality.
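A sketch of a normality check in SAS, again assuming a variable named mpg in cars_american. For the one-sample test the leftovers are mpg minus the mean, so they have the same shape as mpg itself. The NORMAL option adds formal tests alongside the pictures.

```sas
proc univariate data=cars_american normal;   /* NORMAL requests normality tests */
  var mpg;
  histogram mpg / normal;                     /* histogram with fitted normal curve */
  qqplot mpg / normal(mu=est sigma=est);      /* points near the line suggest normality */
run;
```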

63
Testing Hypotheses
  • This demonstration illustrates the testing of
    hypotheses using the data set cars_american.

64
View/Application Share Demo Testing Hypotheses

66
Populations-Which Ones are Similar?
67
Populations-Which Ones are Similar?
  • Take samples.

68
Take Samples
  • Use the samples to answer this question:
  • Which populations are similar?
  • Statistical translation:
  • "Which populations are similar?" is the same as
    asking
  • Do the populations have the same
  • distribution?
  • mean?
  • variance?

69
Background/Requirements
  • Before we jump into the analysis, we must ask the
    following questions
  • How many populations are there?
  • How many population parameters are we interested
    in and what are they?
  • What tests do we want to do, and what are the
    requirements for doing those?
  • Are we using everything we know?

70
Example
  • Suppose that we are interested in the mpg of
    American and European cars. How many populations
    are there?

American Cars: Mpg Distribution, Mean, Variance
European Cars: Mpg Distribution, Mean, Variance
71
Poll: How many populations are there?
  • One - MPG
  • Two - American and European
  • Depends on the sample size

72
Parameters
73
Analyses
  • We want to look at the distributions.
  • We want to estimate the parameters.
  • We want to answer these questions
  • Are the population means the same?
  • Are the population variances the same?

74
Example: Our Data Set car_am_eu
  • Suppose that we are interested in the mpg of
    American and European cars.

American Cars: Mpg Distribution, Mean, Variance → Sample
European Cars: Mpg Distribution, Mean, Variance → Sample
75
Results from the Sample
76
Results
77
Box Plots
American
European
78
Histograms
American
European
79
Poll: Are the populations the same?
  • Yes
  • No

80
Conclusion Based on Sample Numbers and Graphs
  • Easy -- Based on the samples, the populations are
    different (no statistical jargon).
  • But I must have a p-value for my boss, for my
    paper, and so on.

81
Formal Tests
  • The classical approach to determining whether two
    populations are the same is to test whether the
    two population means are equal.
  • But first we check whether the two population
    variances are equal.

82
Formal Tests
  • We use t-test → Two Sample.

Test 2
Test 1
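Both formal tests come out of a single PROC TTEST run. The sketch below assumes that car_am_eu has a numeric variable mpg and a grouping variable, here called origin, that separates American from European cars. Both variable names are assumptions.

```sas
/* Two-sample t-test comparing mean mpg between the two populations. */
proc ttest data=car_am_eu;
  class origin;   /* hypothetical variable with two levels: American, European */
  var mpg;
run;
```

The output includes an Equality of Variances table (the folded F test), which addresses whether the two population variances are equal. The t-test table lists a Pooled row, used when the variances can be treated as equal, and a Satterthwaite row, used when they cannot.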
83
Comparing Two Populations
  • This demonstration shows how to compare two
    populations using the data set car_am_eu.

84
View/Application Share Demo Comparing Two
populations

85
Example
  • Run summary statistics.
  • Ask for histogram and box plot.
  • What do you get?

data temp1; x = 1; output; run;
86
Summary
  • The building blocks of analyzing data using
    statistical techniques are
  • population parameters
  • population parameters
  • population parameters.
  • Population parameters are very important.

88
Section 1.3
  • Simple Linear Regression

89
Objectives
  • Identify the following
  • the population parameters
  • the appropriate model
  • number of populations sampled
  • the correct hypotheses
  • what should be tested for normality
  • what equal variances means.

90
MPG Example
Weight 3000
Weight 2600
Weight 2300
Weight 2900
Take a sample of size 1 from each population!
91
Data
  • We would be in deep trouble with one sample from
    each population.
  • We have eight unknown population parameters.
  • Can you name them?
  • But what do we know?

92
Survey
  • Name the population parameters.

93
Essential Part and Leftovers
  • We want to model the data as follows
  • MPG = Essential Part + Leftover
  • or
  • MPG = Mean + Leftover

94
Know or Assumptions
  • First, we know that
  • Second, each population mean is related to weight
    by the following
  • The population means fall on a straight line!!
  • How many unknowns are there now?

95
Poll: How many unknowns are there?
  • 1
  • 2
  • 3
  • 4
  • 5
  • n

96
Graph
97
Observed, Essential Part, Leftover
98
The Official Regression Model
  • or
  • or
  • or
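One common way to write the simple linear regression model, using a and b for the intercept and slope as on the least-squares slide, is sketched below. The subscript i and the symbols for the leftover and the variance are notation assumed here, not taken from the slides.

```latex
\begin{align*}
\text{MPG}_i &= \text{Mean}_i + \text{Leftover}_i \\
\text{MPG}_i &= a + b \cdot \text{Weight}_i + \varepsilon_i \\
\varepsilon_i &\sim N(0, \sigma^2), \quad \text{independently for each car } i
\end{align*}
```

Under this form the unknowns are a, b, and the common variance.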

99
Main Assumptions
  • The means of the populations fall on a straight
    line.
  • All of the variances are equal (σ²).
  • The errors are known to be normal with mean 0
    and variance σ².

100
Assumptions for Simple Linear Regression
Appendix A
  • This demonstration illustrates the fundamental
    concepts of simple linear regression.

101
View/Application Share Demo Linear.doc

102
How Can We Estimate the Unknown Parameters?
  • The Principle of Least Squares
  • or
  • or
  • Now, choose a and b so that
    is as small as possible.
  • or
  • Minimize .
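In standard notation, with x for weight and y for mpg (symbols assumed here), the principle of least squares and its well-known solution can be written as:

```latex
\[
\min_{a,\,b} \; \sum_{i=1}^{n} \bigl( y_i - a - b\,x_i \bigr)^2
\]
\[
\hat{b} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{a} = \bar{y} - \hat{b}\,\bar{x}
\]
```

Each term in the sum is a squared leftover: the observed mpg minus the value on the line.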

103
David Lane's Applet
  • http://www.ruf.rice.edu/~lane/stat_sim/reg_by_eye/

104
Regression Applet Reg_by_eye
  • This demonstration references David Lane's applet
    at Rice University.

105
View/Application Share Demo David Lane's Applet

106
http://www.ruf.rice.edu/~lane/stat_sim/reg_by_eye

107
Output of SAS Regression cars_american
  • This demonstration illustrates the use of
    regression in SAS Enterprise Guide.
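The demonstration uses SAS Enterprise Guide menus. A code equivalent might look like the sketch below, which assumes cars_american has numeric variables named mpg and weight.

```sas
/* Simple linear regression of mpg on weight. */
proc reg data=cars_american;
  model mpg = weight;
run;
quit;
```

The Parameter Estimates table gives the least-squares estimates of the intercept and the slope, and Root MSE estimates the common standard deviation of the leftovers.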

108
View/Application Share Demo Output of SAS
Regression

109
OUTPUT_0
110
OUTPUT
111
OUTPUT_1
112
OUTPUT_2
113
OUTPUT_3
114
OUTPUT_4
115
Missing Values
  • Suppose that we want to estimate the mean mpg
    when weight = 2500.
  • Predicted (Estimated) Mean MPG = 44.05 -
    .0078*weight
  • Why does this work?
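Plugging in weight = 2500 gives 44.05 - .0078*2500 = 24.55 mpg. The sketch below shows that arithmetic and one way to get the same kind of prediction from PROC REG: append a row with the weight of interest and a missing mpg. The row is left out of the fit because its response is missing, but it still receives a predicted value. The data set names new_car, cars_plus, and pred_out and the variable pred_mpg are made up for illustration.

```sas
/* Hand calculation from the fitted line on the slide. */
data _null_;
  weight = 2500;
  pred = 44.05 - 0.0078*weight;
  put pred= 6.2;               /* writes pred=24.55 to the log */
run;

/* A car with the weight of interest and a missing response. */
data new_car;
  weight = 2500;
  mpg = .;
run;

data cars_plus;
  set cars_american new_car;
run;

/* The missing-mpg row is not used to fit the line but is still scored. */
proc reg data=cars_plus;
  model mpg = weight;
  output out=pred_out p=pred_mpg;
run;
quit;

proc print data=pred_out;
  where mpg = .;
  var weight pred_mpg;
run;
```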



116
Survey
  • Can anyone explain why this works?

117
Conclusion
  • Simple linear regression is very powerful.
  • But it is based on assumptions (what we know).
  • We need to check assumptions (residual analyses).
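A minimal sketch of a residual analysis, assuming the same mpg and weight variables in cars_american. The names resid_out, pred_mpg, and resid are hypothetical. The residuals are the estimated leftovers, so they are what should be checked for normality and equal variance.

```sas
/* Fit the line and save predicted values and residuals. */
proc reg data=cars_american;
  model mpg = weight;
  output out=resid_out p=pred_mpg r=resid;
run;
quit;

/* Are the leftovers roughly normal? */
proc univariate data=resid_out normal;
  var resid;
  histogram resid / normal;
run;

/* Is the spread of the leftovers about the same along the line? */
proc sgplot data=resid_out;
  scatter x=pred_mpg y=resid;
  refline 0 / axis=y;
run;
```

A curved pattern in the scatter plot would cast doubt on the straight-line assumption, and a funnel shape would cast doubt on equal variances.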
