Title: Introductory Statistical Concepts
1Introductory Statistical Concepts
- F. Michael Speed, Ph.D.
- Department of Statistics
- Texas A&M University
2Disclaimer
- I am not an expert SAS programmer.
- Nothing that I say is confirmed or denied by
Texas A&M University.
3Why Are We Here?
- Deming
- To Learn
- To Have Fun
- Question: Who was Deming?
4Poll What type of organization do you work for?
- Business
- Government
- Education
- Nonprofit
- Other
5Purpose of These Lectures
- A review of the statistical concepts used in most
of the SAS Analytics Lecture Series.
- We will look at questions such as the following:
- What is the nature of statistical analyses?
- Why are population parameters so important?
- What is really being tested when you see a
p-value?
- Why does regression handle missing data so well?
- What are residual analyses?
6Descriptive Statistics
7 The Population
8Learning Outcomes
- You will learn
- basic statistical concepts
- the definition of mean, median, mode, and standard
deviation
- the difference between populations and samples
- the difference between parameters and estimates
- about confidence intervals
- how to test a statistical hypothesis
- how to run a regression analysis
9Parameters
- Characteristics of the variable of interest
- It is how we describe the variable of interest
- Parameters are unknown
10Parameters(Characteristics)
- Central Tendency
- Mode
- Median
- Mean
- Measures of Variability
- Range
- Variance
- Standard Deviation
11Variability
12What is an Index ?
How SUNNY is SUNNY? The UV Index
13Air Quality Index: What Does It Mean?
14DOW JONES INDUSTRIAL AVERAGE INDEX
What does 10,971.16 really mean? Which is
better: a DJIA of 10,000 or a DJIA of 12,000?
15Variability Index
- A Simple One
- Find the Largest Value
- Find the Smallest Value
- Let Range = R = Largest - Smallest
16A More Complex Variation Index
- The Standard Deviation
- Statisticians use this index to indicate
variability.
- You will see it written as σ (population) or s (sample).
- Widely available from SAS, Excel, and other
statistical packages
17Details of the More Complex Index
- Example: Suppose that we observe the following
three numbers: 1, 4, 7
- The mean of these numbers is
- (1 + 4 + 7)/3 = 4
- We now subtract the mean from each number,
square the result, and add the squares
- (1-4)(1-4) + (4-4)(4-4) + (7-4)(7-4) = 18
- The Standard Deviation = sqrt(18/2) = 3
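The arithmetic on this slide follows the usual sample standard deviation formula, sketched below. The symbols x_i, x-bar, n, and s are notation introduced here for illustration; the divisor 2 in the example is n - 1 with n = 3.

```latex
s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}
  = \sqrt{\frac{(1-4)^2 + (4-4)^2 + (7-4)^2}{3 - 1}}
  = \sqrt{\frac{18}{2}} = 3
```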
18What does this Mean?
- By itself, it may be confusing to some.
- Comparing populations, we can use it to say which
population varies the most.
- Let us look at an applet.
19Using Graphs to Determine Variability
20Describe What Is Happening
You are giving the parameters of the picture
21Example Using SAS
22Distributions
23Known Distribution
- With a known distribution, we know the following
- the shape
- the mean
- the variability (standard deviation)
- and/or some other information
24Classical Distributions-Normal
25Normal-Overlay
26Classical Distributions-Uniform
27Uniform-Overlay
28Classical Distributions-Chi-Square
29Survey
- The following are called parameters of the
population:
- mean, median, mode
- variance, standard deviation, range,
inter-quartile range (IQR)
- In general, are these known or unknown?
- Known: yes (select using your seat indicator)
- Unknown: no (select using your seat indicator)
30Generate a Sample from a Known Distribution
- Why?
- This is a simulation.
- It helps us to understand a process or analyses.
- It helps to see if we are getting expected
results.
- It is fun.
31MPG Example
- Suppose we want to simulate mpg for a car that
weighs 3000 lbs.
- Let us assume that the mean mpg = 24.
- Let us assume that the standard deviation = 1 mpg.
- We will generate a number from the normal
distribution with mean 0 and standard
deviation = 1.
- We will then add that number (which may be negative) to 24.
32MPGComposition
Observed = Essential Part (24) +/- Leftovers N(0,1)
Let us generate 1000 mpg values.
33SAS Code 3
data mpg1;
  do i = 1 to 1000;
    mean = 24; lo = normal(-1); /* leftover: N(0,1) draw, clock-based seed */
    mpg = mean + lo; output;
  end;
run;
34Simulated MPG
35MPG-Histogram
Compare with true values!
36Simulated Sample
- In this example, we simulated taking a sample of
size 1000 from one population of cars weighing
3000 pounds with a normal distribution with
mean = 24 and standard deviation = 1.
- You can practice this after class.
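For after-class practice, the following is a minimal sketch of the same simulation followed by a check against the true values. It swaps in CALL STREAMINIT and the RAND function so that the run is repeatable; the seed 12345 and the data set name mpg_check are arbitrary choices made for this illustration, not part of the lecture code.

```sas
/* Sketch: simulate 1000 mpg values from N(24, 1) with a fixed seed. */
/* The seed and the data set name mpg_check are illustrative only.  */
data mpg_check;
  call streaminit(12345);          /* fixed seed so results repeat */
  do i = 1 to 1000;
    mpg = rand('normal', 24, 1);   /* mean 24, standard deviation 1 */
    output;
  end;
run;

/* Compare the sample mean and standard deviation with 24 and 1. */
proc means data=mpg_check n mean std min max;
  var mpg;
run;
```

The sample mean and standard deviation should land close to 24 and 1, but not exactly on them, because they are estimates computed from a sample.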
37After Class Practice
- Simulate 1000 data points for each of the
following five populations. Run and explore your
data.
38SAS Code 4 Generate a Normal with Mean 0 and
Standard Deviation of s = 1.5
data mpg1; s = 1.5; mean = 24;
  do i = 1 to 1000;
    lo = s * normal(-1); /* leftover: mean 0, standard deviation s */
    mpg = mean + lo; output;
  end;
run;
39Simulating Data SAS_Code4.sas
- This demonstration illustrates how to simulate
data for a given population.
40View/Application Share Demo Simulation
41Summary
43Section 1.2
44Objectives
- Understand the relationships between
- populations and samples
- parameters and estimates.
- Look at an overview of hypotheses testing.
45Population
Mean, Variance, Median, Mode, Distribution,
Parameters
46Example
- Mpg of American-made cars that weigh between 2000
and 3500 pounds and were built in the 1970s.
- Parameters: mean, variance, and so on
- In general, we do not know the parameters.
47Purpose of Statistical Analyses
- Estimate the parameters. (Make guesses.)
- Example: What is the population mean?
- Test hypotheses about the parameters. (Ask
questions.)
- Example: Is the population mean = 30 mpg?
48Role of Samples
- Taking a sample of the population enables you to
- make estimates of the population parameters
- answer the questions about the population
parameters.
49Population and Sample
Population: Mean, Variance, Median, Mode, Distribution, Parameters
Sample: Sample mean, Sample variance, S
Inference: Estimates, Test of hypotheses
50Example cars_american
- This is a sample of American-made cars that weigh
between 2000 and 3500 pounds and that were built
in the 1970s.
- We are interested in the mpg.
- Use summary statistics to analyze the data.
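Outside of the point-and-click interface, the summary statistics could be requested with code along these lines. This is a sketch only: it assumes the data set cars_american is available in the WORK library and that the variable of interest is named mpg, which the slides do not confirm.

```sas
/* Sketch: summary statistics for mpg.                        */
/* Assumes WORK.CARS_AMERICAN exists with a numeric var mpg.  */
proc means data=cars_american n mean median std min max;
  var mpg;
run;

/* Histogram of mpg with a fitted normal curve overlaid. */
proc univariate data=cars_american;
  var mpg;
  histogram mpg / normal;
run;
```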
51Results of Summary Statistics
52Results of Histogram
53 Results of Histogram
54Sampling Distribution Applet sampling_dist
- This demonstration illustrates how to estimate
and plot the sampling distribution of various
statistics.
55View/Application Share Demo Sampling
Distributions Applet
56http://www.ruf.rice.edu/~lane/stat_sim/sampling_dist/index.h...
57Confidence Intervals on the Population Mean
- Level of Comfort
- 50%: 21.57 to 22.21
- 95%: 20.96 to 22.82
- 99.9%: 20.30 to 23.48
What does this mean?
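Intervals like these can be requested from PROC MEANS with the CLM keyword, as in the sketch below. The data set cars_american and the variable name mpg are assumed here; ALPHA= is one minus the confidence level, so the three levels of comfort correspond to alpha = 0.5, 0.05, and 0.001.

```sas
/* Sketch: 95% confidence limits for the population mean mpg. */
/* Assumes WORK.CARS_AMERICAN with a numeric variable mpg.    */
proc means data=cars_american n mean std clm alpha=0.05;
  var mpg;
run;

/* Rerun with alpha=0.5 for the 50% interval and alpha=0.001 for 99.9%. */
proc means data=cars_american n mean std clm alpha=0.001;
  var mpg;
run;
```

A higher level of comfort gives a wider interval: to be more confident that the interval covers the population mean, it has to cover more ground.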
58Test That the Population Mean = 30 mpg
- Use t-test → One Sample t-test
- Requirements for running this test:
- Large n: n > 35
- Or leftovers are normal
- What is the p-value or sig value?
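In code, the one-sample test might be sketched with PROC TTEST as follows. The data set cars_american and the variable name mpg are assumptions; H0=30 sets the hypothesized population mean.

```sas
/* Sketch: test H0: population mean mpg = 30.              */
/* Assumes WORK.CARS_AMERICAN with a numeric variable mpg. */
proc ttest data=cars_american h0=30 alpha=0.05;
  var mpg;
run;
```

The p-value appears in the output column labeled Pr > |t|.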
59Testing Mean = 30
60Conclusions of the Test
- Choose an alpha level, usually alpha = .05.
- If sig < alpha, then reject.
- Otherwise, fail to reject.
61Sig and p-values
- When you see a sig value or p-value:
- You know that some hypothesis is being tested.
- You know whether or not the hypothesis is being
rejected.
- You probably do not know what the hypothesis
really is.
- Ask yourself these questions:
- What are the population parameters being tested?
- How is what is being tested related to those
parameters?
62Requirements for Doing This Test
- Large n: n > 35
- Or leftovers are normally distributed.
- Use a histogram to check for normality.
63Testing Hypotheses
- This demonstration illustrates the testing of
hypotheses using the data set cars_american.
64View/Application Share Demo Testing Hypotheses
66Populations-Which Ones are Similar?
67Populations-Which Ones are Similar?
68Take Samples
- Use the samples to answer this question:
- Which populations are similar?
- Statistical translations
- "Which populations are similar?" is the same as
asking:
- Are the following the same?
- distribution?
- mean?
- variance?
69Background/Requirements
- Before we jump into the analysis, we must ask the
following questions:
- How many populations are there?
- How many population parameters are we interested
in and what are they?
- What tests do we want to do, and what are the
requirements for doing those?
- Are we using everything we know?
70Example
- Suppose that we are interested in the mpg of
American and European cars. How many populations
are there?
American Cars: Mpg, Distribution, Mean, Variance
European Cars: Mpg, Distribution, Mean, Variance
71Poll How many populations are there?
- One: MPG
- Two: American and European
- Depends on the sample size
72Parameters
73Analyses
- We want to look at the distributions.
- We want to estimate the parameters.
- We want to answer these questions
- Are the population means the same?
- Are the population variances the same?
74Example Our Data Set car_am_eu
- Suppose that we are interested in the mpg of
American and European cars.
American Cars: Mpg, Distribution, Mean, Variance (Sample)
European Cars: Mpg, Distribution, Mean, Variance (Sample)
75Results from the Sample
76Results
77Box Plots
American
European
78Histograms
American
European
79Poll Are the populations the same?
- Yes
- No
80Conclusion Based on Sample Numbers and Graphs
- Easy: based on the samples, the populations are
different (no statistical jargon).
- But I must have a p-value for my boss, for my
paper, and so on.
81Formal Tests
- The classical approach in determining whether two
populations are the same is to test to see
whether the two population means are equal.
- But first we check to see whether the two
population variances are equal.
82Formal Tests
- We use t-test → Two Sample.
Test 2
Test 1
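Both tests can be obtained from a single PROC TTEST call, sketched below. The data set name car_am_eu comes from the slides, but the grouping variable origin (with two levels, for example American and European) and the response name mpg are hypothetical names used only for illustration.

```sas
/* Sketch: compare mpg between two populations.                     */
/* Assumes WORK.CAR_AM_EU with a two-level variable origin and a    */
/* numeric variable mpg; both variable names are hypothetical here. */
proc ttest data=car_am_eu alpha=0.05;
  class origin;
  var mpg;
run;
```

The output includes an Equality of Variances table (a folded F test), which addresses the variance question, and both pooled and Satterthwaite t tests for the means. The pooled row is the one that relies on equal variances.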
83Comparing Two Populations
- This demonstration shows how to compare two
populations using the data set car_am_eu.
84View/Application Share Demo Comparing Two
populations
85Example
- Run summary statistics.
- Ask for histogram and box plot.
- What do you get?
data temp1; x = 1; output; run;
86Summary
- The building blocks of analyzing data using
statistical techniques are
- population parameters
- population parameters
- population parameters.
- Population parameters are very important.
88Section 1.3
89Objectives
- Identify the following
- the population parameters
- the appropriate model
- number of populations sampled
- the correct hypotheses
- what should be tested for normality
- what equal variances means.
90MPG Example
Populations: Weight 2300, Weight 2600, Weight 2900, Weight 3000
Take a sample of size 1 from each population!
91Data
- We would be in deep trouble with a sample of size 1 from
each population.
- We have eight unknown population parameters.
- Can you name them?
- But what do we know?
92Survey
- Name the population parameters.
93Essential Part and Leftovers
- We want to model the data as follows:
- MPG = Essential Part + Leftover
- or
- MPG = Mean + Leftover
94Know or Assumptions
- First, we know that the population variances are all equal.
- Second, each population mean is related to weight
by the following: Mean = a + b × Weight
- The population means fall on a straight line!!
- How many unknowns are there now?
95Poll How many unknowns are there?
- 1
- 2
- 3
- 4
- 5
- n
96Graph
97Observed, Essential Part, Leftover
98The Official Regression Model
99Main Assumptions
- The means of the populations fall on a straight
line.
- All of the variances are equal (σ²).
- The errors are known to be normal with mean 0
and variance σ².
100Assumptions for Simple Linear Regression
Appendix A
- This demonstration illustrates the fundamental
concepts of simple linear regression.
101View/Application Share Demo Linear.doc
102How Can We Estimate the Unknown Parameters?
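- The Principle of Least Squares
- MPG = Essential Part + Leftover
- or
- MPG = a + b × Weight + Leftover
- or
- Leftover = MPG - (a + b × Weight)
- Now, choose a and b so that the sum of the squared
leftovers is as small as possible.
- or
- Minimize Σ (MPG - a - b × Weight)².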
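For reference, the least squares problem has a closed-form solution, sketched here. The symbols x_i (weight), y_i (mpg), and the hats marking estimates are notation introduced for this illustration.

```latex
Q(a, b) = \sum_{i=1}^{n} (y_i - a - b\,x_i)^2

\hat{b} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{a} = \bar{y} - \hat{b}\,\bar{x}
```

These values come from setting the partial derivatives of Q with respect to a and b equal to zero.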
103David Lane's Applet
- http://www.ruf.rice.edu/~lane/stat_sim/reg_by_eye/
104Regression Applet Reg_by_eye
- This demonstration references David Lane's applet
at Rice University.
105View/Application Share Demo David Lane's Applet
106http://www.ruf.rice.edu/~lane/stat_sim/reg_by_eye
107Output of SAS Regression cars_american
- This demonstration illustrates the use of
regression in SAS Enterprise Guide.
108View/Application Share Demo Output of SAS
Regression
109OUTPUT_0
110OUTPUT
111OUTPUT_1
112OUTPUT_2
113OUTPUT_3
114OUTPUT_4
115Missing Values
- Suppose that we want to estimate the mean mpg
when weight = 2500.
- Predicted (Estimated) Mean MPG = 44.05 - .0078 × weight
- Why does this work?
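Plugging in weight = 2500 gives 44.05 - .0078 × 2500 = 24.55 mpg. One common way to get such a prediction from SAS is to append a row with the weight of interest and a missing mpg, then let PROC REG score it. The sketch below assumes the data set cars_american has variables named mpg and weight; the names new_obs, cars_plus, pred, and mpg_hat are made up for this illustration.

```sas
/* Sketch: predict mean mpg at weight = 2500 by adding a row with */
/* a missing response. Assumes WORK.CARS_AMERICAN has mpg, weight. */
data new_obs;
  weight = 2500;
  mpg = .;                 /* missing: the row is not used in the fit */
run;

data cars_plus;
  set cars_american new_obs;
run;

proc reg data=cars_plus;
  model mpg = weight;
  output out=pred p=mpg_hat;   /* predicted values for every row */
run;
quit;

proc print data=pred;
  where mpg = .;               /* show only the appended row */
  var weight mpg_hat;
run;
```

The row with missing mpg does not affect the estimates of the intercept and slope, but it still receives a predicted value because its weight is known.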
116Survey
- Can anyone explain why this works?
117Conclusion
- Simple linear regression is very powerful.
- But it is based on assumptions (what we know).
- We need to check assumptions (residual analyses).
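A residual analysis could be sketched as follows. The code assumes the data set cars_american with variables mpg and weight; the output data set resid_out and the variables mpg_hat and resid are names chosen for this illustration.

```sas
/* Sketch: save predicted values and residuals from the regression. */
/* Assumes WORK.CARS_AMERICAN has numeric variables mpg and weight. */
proc reg data=cars_american;
  model mpg = weight;
  output out=resid_out p=mpg_hat r=resid;
run;
quit;

/* Residuals versus predicted: look for curvature or a funnel shape. */
proc sgplot data=resid_out;
  scatter x=mpg_hat y=resid;
  refline 0 / axis=y;
run;

/* Check whether the leftovers look normal. */
proc univariate data=resid_out normal;
  var resid;
  histogram resid / normal;
run;
```

Curvature in the scatter plot argues against the straight-line assumption, a funnel shape argues against equal variances, and a badly skewed histogram argues against normal leftovers.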