Introductory Statistical Concepts - PowerPoint PPT Presentation

Slides: 89
Provided by: SusanH73

Transcript and Presenter's Notes

Title: Introductory Statistical Concepts


1
Introductory Statistical Concepts
2
Disclaimer
  • I am not an expert SAS programmer.
  • Nothing that I say is confirmed or denied by
    Texas A&M University.

3
Why Are We Here?
  • Deming
  • To Learn
  • To Have Fun
  • Question: Who was Deming?

4
Poll: What type of organization do you work for?
  • Business
  • Government
  • Education
  • Nonprofit
  • Other

5
Purpose of These Lectures
  • A review of the statistical concepts used in most
    of the SAS Analytics Lecture Series.
  • We will look at questions such as the following:
  • What is the nature of statistical analyses?
  • Why are population parameters so important?
  • What is really being tested when you see a
    p-value?
  • Why does regression handle missing data so well?
  • What are residual analyses?

6
Descriptive Statistics
7
The Population
8
Learning Outcomes
  • You will learn
  • basic statistical concepts
  • the definition of mean, median, mode and standard
    deviation
  • the difference between populations and samples
  • the difference between parameters and estimates
  • about confidence intervals
  • how to test a statistical hypothesis
  • how to run a regression analysis

9
Parameters
  • Characteristics of the variable of interest
  • It is how we describe the variable of interest
  • Parameters are unknown

10
Parameters(Characteristics)
  • Central Tendency
  • Mode
  • Median
  • Mean
  • Measures of Variability
  • Range
  • Variance
  • Standard Deviation

11
Variability
  • Change in the Data

12
What is an Index?
How SUNNY is SUNNY? The UV Index
13
Air Quality Index: What Does It Mean?
14
DOW JONES INDUSTRIAL AVERAGE INDEX
What does 10,971.16 really mean? Which is
better: a DJIA of 10,000 or a DJIA of 12,000?
15
Variability Index
  • A Simple One
  • Find the Largest Value
  • Find the Smallest Value
  • Let Range = R = Largest - Smallest

16
A More Complex Variation Index
  • The Standard Deviation
  • Statisticians use this index to indicate
    variability
  • You will see it written as
  • Widely available from SAS, Excel, and other
    statistical packages

17
Details of the More Complex Index
  • Example: Suppose that we observe the following
    three numbers:
  • 1, 4, 7
  • The mean of these numbers is
  • (1 + 4 + 7)/3 = 4
  • We now subtract the mean from each number,
    square it, and add up the results:
  • (1-4)(1-4) + (4-4)(4-4) + (7-4)(7-4) = 18
  • The Standard Deviation = sqrt(18/2) = 3
    (the divisor is n - 1 = 2)
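
The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch of the same calculation; the variable names are chosen here for illustration and are not part of the lecture.

```python
import math

values = [1, 4, 7]

# Mean: (1 + 4 + 7) / 3
mean = sum(values) / len(values)

# Sum of squared deviations from the mean
sum_sq = sum((x - mean) ** 2 for x in values)

# Sample standard deviation: divide by n - 1, then take the square root
std_dev = math.sqrt(sum_sq / (len(values) - 1))

# The simpler variability index: largest minus smallest
value_range = max(values) - min(values)

print(mean, sum_sq, std_dev, value_range)  # 4.0 18.0 3.0 6
```

Dividing by n - 1 rather than n is the usual convention for a sample, and it is what produces the 18/2 in the slide.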

18
What does this Mean?
  • By itself, it may be confusing to some.
  • Comparing populations, we can use it to say which
    population varies the most.
  • Let us look at an applet.

19
Using Graphs to Determine Variability
  • Box Plot

20
Distributions
21
Known Distribution
  • With a known distribution, we know the following
  • the shape
  • the mean
  • the variability (standard deviation)
  • and/or some other information

22
Classical Distributions-Normal
23
Normal-Overlay
24
Classical Distributions-Uniform
25
Survey
  • The following are called parameters of the
    population:
  • mean, median, mode
  • variance, standard deviation, range,
    inter-quartile range (IQR)
  • In general, are these known or unknown?
  • Known: yes (select using your seat indicator)
  • Unknown: no (select using your seat indicator)

26
MPG-Histogram
Compare with true values!
27
Simulated Sample
  • In this example, we simulated taking a sample of
    size 1000 from one population of cars weighing
    3000 pounds, whose mpg follows a normal
    distribution with mean = 24 and standard
    deviation = 1.
  • You can practice this after class.
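
One way to practice this outside of SAS is sketched below in Python. It assumes only what the slide states (sample size 1000, normal distribution, mean 24, standard deviation 1); the seed value is an arbitrary choice so that the run is repeatable.

```python
import random
import statistics

random.seed(0)  # arbitrary seed, only so the run is repeatable

# Draw 1000 simulated mpg values from a normal population
# with mean 24 and standard deviation 1.
sample = [random.gauss(24, 1) for _ in range(1000)]

sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)  # divides by n - 1

print(f"sample mean = {sample_mean:.2f} (true value 24)")
print(f"sample sd   = {sample_sd:.2f} (true value 1)")
```

The estimates should land close to the true values but not exactly on them, which is the point of comparing a sample with a known population.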

28
Section 1.2
  • Populations and Samples

29
Objectives
  • Understand the relationships between
  • populations and samples
  • parameters and estimates.
  • Look at an overview of hypothesis testing.

30
Population
Mean, Variance, Median, Mode, Distribution,
Parameters
31
Example
  • Mpg of American-made cars that weigh between 2000
    and 3500 pounds and were built in the 1970s.
  • Parameters: mean, variance, and so on
  • In general, we do not know the parameters.

32
Purpose of Statistical Analyses
  • Estimate the parameters. (Make guesses.)
  • Example: What is the population mean?
  • Test hypotheses about the parameters. (Ask
    questions.)
  • Example: Is the population mean = 30 mpg?

33
Role of Samples
  • Taking a sample of the population enables you to
  • make estimates of the population parameters
  • answer the questions about the population
    parameters.

34
Population and Sample
Population parameters: Mean, Variance, Median, Mode,
Distribution
Sample: Sample mean, Sample variance, S
Inference: Estimates, Tests of hypotheses
35
Example: cars_american
  • This is a sample of American-made cars that weigh
    between 2000 and 3500 pounds and that were built
    in the 1970s.
  • We are interested in the mpg.
  • Use summary statistics to analyze the data.

36
Results of Summary Statistics
37
Results of Histogram
38
Results of Histogram
39
Sampling Distribution Applet sampling_dist
  • This demonstration illustrates how to estimate
    and plot the sampling distribution of various
    statistics.
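
The idea behind the applet can be sketched in Python as follows. The population values are reused from the simulated-sample slide; the sample size, the number of repeated samples, and the seed are hypothetical choices made here for illustration.

```python
import random
import statistics

random.seed(1)  # arbitrary seed for repeatability

POP_MEAN, POP_SD = 24, 1   # population values from the simulated-sample slide
SAMPLE_SIZE = 25           # hypothetical sample size
REPLICATIONS = 2000        # hypothetical number of repeated samples

def one_sample_mean():
    """Draw one sample from the population and return its mean."""
    sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(SAMPLE_SIZE)]
    return statistics.mean(sample)

# The collection of sample means approximates the sampling distribution
sample_means = [one_sample_mean() for _ in range(REPLICATIONS)]

center = statistics.mean(sample_means)
spread = statistics.stdev(sample_means)
theory = POP_SD / SAMPLE_SIZE ** 0.5  # sigma / sqrt(n)

print(f"mean of sample means = {center:.3f}")
print(f"sd of sample means   = {spread:.3f} (theory: {theory:.3f})")
```

The sample means cluster around the population mean, and their spread is much smaller than the spread of individual cars.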

40
Demo: Sampling Distributions Applet

41
http://www.ruf.rice.edu/lane/stat_sim/sampling_dist/index.h...

42
Confidence Intervals on the Population Mean
  • Level of Comfort
  • 50%: 21.57 to 22.21
  • 95%: 20.96 to 22.82
  • 99.9%: 20.30 to 23.48

What does this mean?
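
A rough Python sketch shows why the interval widens as the level of comfort rises. The mpg values below are hypothetical and are not the lecture data, and the sketch uses a normal approximation; a statistics package would typically use the t distribution, which gives slightly wider intervals for small samples.

```python
import statistics
from statistics import NormalDist

# Hypothetical mpg sample, for illustration only (not the lecture data)
mpg = [21.0, 23.5, 19.8, 22.4, 24.1, 20.6, 22.9, 21.7, 23.0, 20.2,
       22.1, 21.4, 23.8, 19.5, 22.6, 21.9, 20.9, 23.2, 22.0, 21.3]

n = len(mpg)
mean = statistics.mean(mpg)
std_err = statistics.stdev(mpg) / n ** 0.5  # standard error of the mean

def confidence_interval(level):
    """Normal-approximation interval: mean +/- z * standard error."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return mean - z * std_err, mean + z * std_err

for level in (0.50, 0.95, 0.999):
    low, high = confidence_interval(level)
    print(f"{level:.1%}: {low:.2f} to {high:.2f}")
```

Every interval is centered on the sample mean; asking for more confidence only stretches it further in both directions.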
43
Test That the Population Mean = 30 mpg
  • Use t-test -> One Sample t-test
  • Requirements for running this test:
  • Large n (n > 35)
  • Or leftovers are normal
  • What is the p-value or sig value?

44
Testing Mean = 30
45
Conclusions of the Test
  • Choose an alpha level, usually alpha = .05.
  • If sig < alpha, then reject.
  • Otherwise, fail to reject.
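
The decision rule can be sketched in Python as below. The data are simulated and hypothetical, and the p-value uses a normal approximation to the t distribution, which is reasonable only for large n; this is not the exact calculation a statistics package would report.

```python
import random
import statistics
from statistics import NormalDist

random.seed(2)  # arbitrary seed for repeatability

# Hypothetical sample of 50 mpg values (n > 35, the large-n requirement)
mpg = [random.gauss(22, 3) for _ in range(50)]

HYPOTHESIZED_MEAN = 30
ALPHA = 0.05

n = len(mpg)
std_err = statistics.stdev(mpg) / n ** 0.5
t_stat = (statistics.mean(mpg) - HYPOTHESIZED_MEAN) / std_err

# Two-sided p-value from the normal approximation to the t distribution
p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))

# The rule from the slide: reject when sig < alpha
decision = "reject" if p_value < ALPHA else "fail to reject"
print(f"t = {t_stat:.2f}, p = {p_value:.4g}, decision: {decision}")
```

Here the hypothesis being tested is about a population parameter (the mean mpg equals 30), not about the sample itself.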

46
Sig and p-values
  • When you see a sig value or p-value:
  • You know that some hypothesis is being tested.
  • You know whether or not the hypothesis is being
    rejected.
  • You probably do not know what the hypothesis
    really is.
  • Ask yourself these questions:
  • What are the population parameters being tested?
  • How is what is being tested related to those
    parameters?

47
Requirements for Doing This Test
  • Large n -> n > 35
  • Or leftovers are normally distributed.
  • Use a histogram to check for normality.

48
Populations-Which Ones are Similar?
49
Populations-Which Ones are Similar?
  • Take samples.

50
Take Samples
  • Use the samples to answer this question:
  • Which populations are similar?
  • Statistical translations:
  • "Which populations are similar?" is the same as
    asking
  • Are the following the same:
  • distribution?
  • mean?
  • variance?

51
Background/Requirements
  • Before we jump into the analysis, we must ask the
    following questions:
  • How many populations are there?
  • How many population parameters are we interested
    in and what are they?
  • What tests do we want to do, and what are the
    requirements for doing those?
  • Are we using everything we know?

52
Example
  • Suppose that we are interested in the mpg of
    American and European cars. How many populations
    are there?

American Cars: Mpg, Distribution, Mean, Variance
European Cars: Mpg, Distribution, Mean, Variance
53
Poll: How many populations are there?
  • One - MPG
  • Two - American and European
  • Depends on the sample size

54
Parameters
Population 1: American Cars     Population 2: European Cars
Variable of interest: mpg       Variable of interest: mpg
Distribution: Normal?           Distribution: Normal?
Mean                            Mean
Variance                        Variance
55
Analyses
  • We want to look at the distributions.
  • We want to estimate the parameters.
  • We want to answer these questions:
  • Are the population means the same?
  • Are the population variances the same?

56
Example: Our Data Set car_am_eu
  • Suppose that we are interested in the mpg of
    American and European cars.

American Cars: Mpg, Distribution, Mean, Variance -> Sample
European Cars: Mpg, Distribution, Mean, Variance -> Sample
57
Results from the Sample
58
Results
59
Box Plots
American
European
60
Histograms
American
European
61
Poll: Are the populations the same?
  • Yes
  • No

62
Conclusion Based on Sample Numbers and Graphs
  • Easy: Based on the samples, the populations are
    different. No statistical jargon.
  • But I must have a p-value for my boss, for my
    paper, and so on.

63
Formal Tests
  • The classical approach to determining whether two
    populations are the same is to test whether the
    two population means are equal.
  • But first we check to see whether the two
    population variances are equal.
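
The two checks can be sketched in Python as below. The mpg values are hypothetical and invented for illustration, and the sketch stops at the test statistics: the p-values would come from the F and t distributions, which a statistics package supplies.

```python
import statistics

# Hypothetical mpg samples, for illustration only
american = [18.0, 21.5, 19.2, 22.8, 20.1, 17.6, 23.0, 19.9, 21.2, 20.7]
european = [27.4, 30.1, 25.8, 29.5, 31.2, 26.6, 28.9, 30.4]

n1, n2 = len(american), len(european)
mean1, mean2 = statistics.mean(american), statistics.mean(european)
var1, var2 = statistics.variance(american), statistics.variance(european)

# Variance check: ratio of the sample variances (larger over smaller),
# to be compared against an F distribution.
f_ratio = max(var1, var2) / min(var1, var2)

# Means check: pooled two-sample t statistic, which assumes that the
# two population variances are equal.
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
t_stat = (mean1 - mean2) / (pooled_var * (1 / n1 + 1 / n2)) ** 0.5

print(f"variance ratio F = {f_ratio:.2f}")
print(f"pooled t = {t_stat:.2f} with {n1 + n2 - 2} degrees of freedom")
```

The variance check comes first because the pooled t statistic is only appropriate when the equal-variance assumption is plausible.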

64
Formal Tests
  • We use t-test -> Two Sample.

Test 2
Test 1
65
Section 1.3
  • Simple Linear Regression

66
Objectives
  • Identify the following:
  • the population parameters
  • the appropriate model
  • number of populations sampled
  • the correct hypotheses
  • what should be tested for normality
  • what equal variances means.

67
MPG Example
Take a sample of size 1 from each population!
Weight = 3000
Weight = 2600
Weight = 2300
Weight = 2900
68
Data
  • We would be in deep trouble with a sample of
    size 1 from each population.
  • We have eight unknown population parameters.
  • Can you name them?
  • But what do we know?

69
Survey
  • Name the population parameters.

70
Essential Part and Leftovers
  • We want to model the data as follows:
  • MPG = Essential Part + Leftover
  • or
  • MPG = Mean + Leftover

71
Know or Assumptions
  • First, we know that all of the population
    variances are equal.
  • Second, each population mean is related to weight
    by the following:
  • Mean MPG = a + b × Weight
  • The population means fall on a straight line!!
  • How many unknowns are there now?

72
Poll: How many unknowns are there?
  • 1
  • 2
  • 3
  • 4
  • 5
  • n

73
Graph
74
Observed, Essential Part, Leftover
75
The Official Regression Model
  • MPG = Essential Part + Leftover
  • or
  • MPG = Mean + Leftover
  • or
  • MPG = a + b × Weight + Leftover

76
Main Assumptions
  • The means of the populations fall on a straight
    line.
  • All of the variances are equal (to a common
    value, σ²).
  • The errors are known to be normal with mean 0
    and variance σ².

77
Assumptions for Simple Linear Regression
Appendix A
  • This demonstration illustrates the fundamental
    concepts of simple linear regression.

78
Demo: Linear.doc

79
How Can We Estimate the Unknown Parameters?
  • The Principle of Least Squares
  • Leftover = MPG - (a + b × Weight)
  • Now, choose a and b so that the sum of the
    squared leftovers is as small as possible.
  • or
  • Minimize Σ (MPG - a - b × Weight)².
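
For a straight line, the least squares choice of a and b has a closed form, sketched below in Python. The function name is chosen here for illustration; the weights are the four from the MPG Example slide, and the mpg values paired with them are invented, not lecture data.

```python
def least_squares(x, y):
    """Return (a, b) minimizing sum((y_i - a - b * x_i) ** 2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx          # slope
    a = y_bar - b * x_bar    # intercept
    return a, b

# Weights from the MPG Example slide; mpg values are hypothetical
weights = [2300, 2600, 2900, 3000]
mpgs = [26.5, 23.4, 21.8, 20.3]

a, b = least_squares(weights, mpgs)

# Leftovers: observed value minus the fitted essential part
leftovers = [y - (a + b * x) for x, y in zip(weights, mpgs)]
sse = sum(e ** 2 for e in leftovers)

print(f"a = {a:.2f}, b = {b:.4f}, sum of squared leftovers = {sse:.3f}")
```

Any other choice of a and b would give a larger sum of squared leftovers for the same data.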

80
OUTPUT_0
81
OUTPUT
82
OUTPUT_1
83
OUTPUT_2
84
OUTPUT_3
85
OUTPUT_4
86
Missing Values
  • Suppose that we want to estimate the mean mpg
    when weight = 2500.
  • Predicted (Estimated) Mean MPG = 44.05 -
    .0078 × weight
  • Why does this work?
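
Plugging weight = 2500 into the fitted equation gives the estimate. The sketch below uses the coefficients from the slide; the function name is chosen here for illustration.

```python
INTERCEPT = 44.05   # from the fitted equation on the slide
SLOPE = -0.0078     # change in mean mpg per pound of weight

def predicted_mean_mpg(weight):
    """Estimated mean mpg for cars of the given weight (in pounds)."""
    return INTERCEPT + SLOPE * weight

print(f"{predicted_mean_mpg(2500):.2f}")  # 24.55
```

The estimate relies on the straight-line assumption: if the population means fall on a line, the fitted line supplies a mean for a weight such as 2500 even when no car of that weight was sampled.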



87
Survey
  • Can anyone explain why this works?

88
Conclusion
  • Simple linear regression is very powerful.
  • But it is based on assumptions (what we know).
  • We need to check assumptions (residual analyses).