Statistics Overview

Provided by: radar5

1
Statistics Overview
  • Lecture Notes
  • Prob-Stat 3350
  • August 24, 2006

2
Questions statistics can answer
  • What distribution or population was this data
    sample (likely) drawn from?
  • Data Analysis
  • Numeric and Graphical Summaries
  • Are two samples (likely) drawn from different
    populations?
  • Group comparison questions: Do 2 groups differ
    on some variable? If so, how different are they?
  • Are two numeric variables from a single sample
    related (correlated)?
  • If so, how strongly are they related?
  • Are two categorical variables from a single
    sample dependent (Chi-Square)?

3
Consulting Project
  • We will perform the following tasks
  • Write a good survey
  • Collect a representative sample of 10% of the NGCSU
    population
  • Analyze data
  • Understand demographics of Rec Sports
    constituencies and their needs/wants
  • Test hypotheses about how different groups will
    use the new Rec Center
  • Advise the NGCSU administration on how to best
    implement and market the new Rec Center

4
Data Analysis
  • Numerics
  • Mean, Median, Mode
  • Standard Deviation (Variance), Z-Scores
  • Percentiles, Quartiles (5-Number-Summary)
  • Graphics
  • (Relative) Frequency Tables and Histograms
  • Pie Charts
  • Stem-and-Leaf Plots
  • Box-and-Whisker Plots

5
Data Analysis Example
  • Researchers are investigating bone loss by
    nursing moms. Since breast-feeding moms secrete
    calcium into their milk, researchers worry that
    the calcium may be coming from their bones. They
    measure the percent change in mineral content in
    the spines of 47 nursing mothers.

6
Data Table
7
So What Now?
  • How can we describe what's happening?
  • Central Tendencies
  • What is happening for the typical or average
    mom in this study?
  • Spread (Variance)
  • Position
  • How do individuals compare to the group?
  • Is the data Normal? What statistical tests are
    appropriate?

8
Numerics
  • Central Tendencies
  • Mean = -3.588
  • Median = -3.8
  • Mode: most useless statistic ever invented!!
  • Spread or Variance
  • Std Dev = 2.5056126
  • Range = 10.5 (Min = -8.3, Max = 2.2)
  • Quartiles
  • Q1 = -5.3
  • Q3 = -2.1

9
Things to Notice
  • Mean and Median are Different
  • Data is Skewed: Slightly Right
  • Mean = -3.588
  • Median = -3.8
  • Skewed Data Can Be Problematic
  • Middle of Data Set
  • Middle Half: Between 2.1% and 5.3% Loss
  • Middle Two-Thirds: From 1% to 6% Loss
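
The intervals quoted above can be rechecked from the summary numbers alone. The following is a minimal sketch that uses only the values reported on the Numerics slide; the variable names are illustrative, and reading "mean above median" as a right-skew hint is a rule of thumb rather than a formal test.

```python
# Summary values as reported on the Numerics slide
# (percent change in spinal mineral content).
mean, median, std_dev = -3.588, -3.8, 2.5056126
minimum, maximum = -8.3, 2.2
q1, q3 = -5.3, -2.1

data_range = maximum - minimum  # spread of the whole sample
iqr = q3 - q1                   # width of the middle half
# Roughly the middle two-thirds, if the data are bell-shaped.
one_sd_low, one_sd_high = mean - std_dev, mean + std_dev

# Mean above the median hints at a longer right tail (heuristic only).
skew_hint = "right" if mean > median else "left" if mean < median else "none"

print(f"range = {data_range:.1f}")
print(f"IQR = {iqr:.1f}")
print(f"within 1 SD: {one_sd_low:.2f} to {one_sd_high:.2f}")
print(f"skew hint: {skew_hint}")
```

The one-standard-deviation interval runs from about -6.09 to -1.08, which matches the "1% to 6% loss" description of the middle two-thirds.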

10
No Clear Picture of the Data
  • Numerics are Vital
  • Used in Computations
  • Limited Diagnostics Capability
  • Check Graphics to See Shape of Data
  • Histogram
  • Frequency Tables
  • Relative Frequency Tables
  • Stem-and-Leaf Plot
  • Box-Plot

11
Frequency Tables
  • Histograms are a Vital Analytical Tool
  • Built on Frequency Tables (or Relative Frequency
    Tables)
  • Show the Shape of a Distribution
  • Standard Properties for Histograms
  • Avoids Lying with Statistics
  • Bars must touch and be equally spaced, with no gaps
    in the x- or y-axis (no colors!)
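
As a rough illustration of how a histogram is built on a frequency table, here is a small sketch with equal-width classes. The sample values are hypothetical (the study's data table is not reproduced in these notes), and `frequency_table` is a helper name made up for this example.

```python
from collections import Counter


def frequency_table(values, start, width, n_bins):
    """Count values in n_bins equal-width classes [lo, hi) beginning at start."""
    counts = Counter()
    for v in values:
        idx = int((v - start) // width)
        if 0 <= idx < n_bins:
            counts[idx] += 1
    total = sum(counts.values())
    rows = []
    for i in range(n_bins):
        lo = start + i * width
        # (class lower bound, class upper bound, frequency, relative frequency)
        rows.append((lo, lo + width, counts[i], counts[i] / total))
    return rows


# Hypothetical values, for illustration only (not the study's data).
sample = [-7.5, -6.1, -5.2, -4.8, -4.1, -3.9, -3.3, -2.7, -2.2, -1.0, 0.4, 1.8]
table = frequency_table(sample, start=-8, width=2, n_bins=6)

# Text histogram: one row per class, every class the same width, none skipped.
for lo, hi, count, rel in table:
    print(f"[{lo:>3}, {hi:>3})  {count:>2}  {rel:.2f}  {'#' * count}")
```

Empty classes are still printed, which is the text equivalent of leaving no gaps in the x-axis.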

12
Frequency Table
13
Histogram
14
Stem-and-Leaf Plot
15
Box Plot

16
Box-and-Whisker Plots
  • Graphical Representation of the
  • 5-Number Summary
  • Min
  • Q1
  • Median
  • Q3
  • Max
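
A minimal sketch of computing the 5-number summary follows. The data are hypothetical, and the quartile convention is an assumption: textbooks and software differ, and this sketch uses the "inclusive" method of Python's `statistics.quantiles`, which may not match the rule used in class.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the 'inclusive' quartile method."""
    data = sorted(values)
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    return data[0], q1, med, q3, data[-1]


# Hypothetical values, for illustration only.
sample = [-7.5, -6.1, -5.2, -4.1, -3.8, -3.3, -2.2, -1.0, 1.8]
summary = five_number_summary(sample)
print(summary)
```

A box plot draws the box from Q1 to Q3, a line at the median, and whiskers out to the min and max.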

17
So, What Do We Know?
  • Generally Bell-Shaped Data
  • Some anomalies
  • Some skew
  • Typical nursing moms lose from 1% to 6% of the mineral
    content from their spine
  • These are scores within 1 std. dev. of mean
  • Captures two thirds of all scores
  • Within 1 s.d. is "average"

18
Discovering Distributions
  • Main use for
  • Histograms
  • Stem Plots
  • Discover underlying distribution
  • Bell-Shaped?
  • Many common statistical tests assume normality
  • Test results may not be valid if the data were not
    sampled from a normal distribution

19
Histograms
  • Bar Graphs with special properties.
  • Must be flat, 2-D bar graph
  • All bars must be same width
  • Bars must be touching
  • Axes must include origin (no gaps!)
  • Provides a standard look for all research tables
    and graphs that is not deceptive
  • Standard Excel Bar Graphs are often misleading!

20
Ex: Stem-and-Leaf Data
USA
21
Ex: Stem-and-Leaf Plot
1. Is the data sample from a normal distribution?
2. Find the median.
22
Marriage Misinformation
  • By the way, this statistic is almost always used
    to mislead people:
  • "Half of all marriages (in the U.S.) end in
    divorce."
  • While essentially true, this statistic needs a
    closer look.
  • If you graduate from college and marry your college
    sweetheart, is it really just a coin flip whether
    you'll still be married in 20 years?

23
Numerics
  • Basic
  • Mean, Median, Mode
  • Standard Deviation
  • Quartiles
  • Range, 5-Number-Summary
  • Advanced
  • Z-Scores (Standardized Scores)
  • Percentiles (p-values!!)

24
Central Tendencies
  • Mean or Average
  • VERY sensitive to OUTLIERS
  • Good if data is
  • Nearly symmetric (no skew)
  • Bad if data is
  • Skewed (has outliers on only one side)
  • Median
  • Compare to mean
  • If mean ≈ median, data is likely symmetric
  • If not, check for outliers/skew
  • Misleading to quote it as the average
  • Mode
  • Limited usefulness
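
To see how sensitive the mean is to a single outlier compared with the median, consider this small sketch. The income figures are hypothetical and chosen only to make the effect obvious.

```python
from statistics import mean, median

# Hypothetical annual incomes in dollars, for illustration only.
incomes = [28_000, 31_000, 35_000, 38_000, 42_000]
with_outlier = incomes + [1_000_000]  # one extreme value on the right

print("no outlier:   mean =", mean(incomes), " median =", median(incomes))
print("with outlier: mean =", round(mean(with_outlier)), " median =", median(with_outlier))
```

Without the outlier the mean and median nearly agree. Adding one large value drags the mean far to the right while the median barely moves.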

Excel Example
25
Skewed Distributions
  • Left: Test Scores
  • Can't get higher than 100
  • Outliers only lie to the left and drag the mean left
  • Right: Income Levels
  • Can't earn less than $0 per year
  • Outliers only lie to the right and drag the mean right

26
Example: Skewed Right
[Figure: right-skewed distribution with the Median and Mean marked]
Mean moves right.
27
Example: Skewed Left
[Figure: left-skewed distribution with the Mean and Median marked]
Mean moves left.
28
What to do?
  • Often it's best to report both
  • Median and Average
  • With skewed data
  • be sure to report Median
  • With near-perfect bell-curve
  • report Average only

29
Dispersion
  • Standard Deviation: Major Indicator of Spread
  • Variance = (Standard Deviation)^2
  • Only use it to calculate Std. Dev.
  • 5-Number Summary
  • Range = Max - Min
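
The relationships above can be checked numerically. This is a small sketch on hypothetical data; it assumes the sample (n - 1) versions of standard deviation and variance, which is what `statistics.stdev` and `statistics.variance` compute.

```python
from statistics import stdev, variance

# Hypothetical data, for illustration only.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

sd = stdev(sample)        # sample standard deviation (divides by n - 1)
var = variance(sample)    # sample variance
value_range = max(sample) - min(sample)

print(f"std dev  = {sd:.3f}")
print(f"variance = {var:.3f}  (std dev squared = {sd ** 2:.3f})")
print(f"range    = {value_range}")
```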

30
Example: Interpreting Z-Scores
  • IQ scores have the N(100,16) distribution.
  • Compute z-scores for the following individuals
    and describe their meaning.
  • Joe's IQ is 92
  • Jacki's IQ is 125
  • Jake's IQ is 139
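
One way to work this example is sketched below. It reads N(100,16) as mean 100 and standard deviation 16, which is an assumption about the notation (it is consistent with the 84/116/132 cutoffs used on the Empirical Rule slide). The helper name `z_score` is illustrative.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev


IQ_MEAN, IQ_SD = 100, 16  # assumption: N(100,16) means mean 100, std dev 16

scores = {"Joe": 92, "Jacki": 125, "Jake": 139}
z_scores = {name: z_score(iq, IQ_MEAN, IQ_SD) for name, iq in scores.items()}

for name, z in z_scores.items():
    side = "above" if z > 0 else "below"
    print(f"{name}: z = {z:+.2f} ({abs(z):.2f} std. dev. {side} the mean)")
```

Joe sits half a standard deviation below the mean, Jacki about 1.56 above it, and Jake about 2.44 above it.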

31
Empirical Rule
  • IQ scores have the N(100,16) distribution.
  • Which of the following categories includes the
    most people?
  • IQs less than 84
  • IQs between 84 and 116
  • IQs higher than 132
  • IQs less than 100
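
The four categories can be compared with a short sketch. It again assumes N(100,16) means mean 100 and standard deviation 16, and it uses the exact normal curve from Python's `statistics.NormalDist` rather than the rounded 68-95-99.7 figures, so the shares differ slightly from the rule-of-thumb values.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=16)  # assumption: mean 100, std dev 16

shares = {
    "less than 84": iq.cdf(84),                        # below mean - 1 SD
    "between 84 and 116": iq.cdf(116) - iq.cdf(84),    # within 1 SD of the mean
    "higher than 132": 1 - iq.cdf(132),                # above mean + 2 SD
    "less than 100": iq.cdf(100),                      # below the mean
}

for label, share in shares.items():
    print(f"IQs {label}: {share:.1%}")

largest = max(shares, key=shares.get)
print("largest category:", largest)
```

The band within one standard deviation of the mean holds about 68% of people, more than the 50% who fall below the mean.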

32
Example: Z-Score Comparisons
  • Joey and Janie want to know who is taller. For
    males, the average height is 69.5 inches with a
    standard deviation of 2.9 inches. For females,
    the average height is 64.1 inches with a standard
    deviation of 2.8 inches. If Joey is 73 inches
    tall and Janie is 69 inches tall, who is
    (normally speaking) taller?
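
A sketch of the comparison, using the means and standard deviations given above and standardizing each height against its own group:

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev


joey_z = z_score(73, mean=69.5, std_dev=2.9)   # compared with males
janie_z = z_score(69, mean=64.1, std_dev=2.8)  # compared with females

print(f"Joey:  z = {joey_z:.2f}")
print(f"Janie: z = {janie_z:.2f}")
print("Relatively taller:", "Janie" if janie_z > joey_z else "Joey")
```

Joey is taller in inches, but Janie is further above the average for her group (z of about 1.75 versus about 1.21).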

33
Clicker Question 1
  • Compute your standardized height
  • Males: N(69.5, 2.9)
  • Females: N(64.1, 2.8)
  • Use your clicker to record your height category
  • Well Below Average (Z < -1.5)
  • Below Average (-1.5 < Z < -0.5)
  • Average (-0.5 < Z < 0.5)
  • Above Average (0.5 < Z < 1.5)
  • Well Above Average (1.5 < Z)
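
The category lookup can be sketched as a small function. The slide uses strict inequalities and does not say where a Z of exactly -1.5, -0.5, 0.5, or 1.5 belongs; placing those ties in the higher category is an assumption of this sketch, and the 70-inch height is a hypothetical example.

```python
def height_category(z):
    """Map a standardized height to the clicker categories.

    Assumption: a z exactly on a cutoff goes to the higher category.
    """
    if z < -1.5:
        return "Well Below Average"
    if z < -0.5:
        return "Below Average"
    if z < 0.5:
        return "Average"
    if z < 1.5:
        return "Above Average"
    return "Well Above Average"


# Hypothetical example: a 70-inch male, using Males N(69.5, 2.9).
z = (70 - 69.5) / 2.9
print(f"z = {z:.2f} -> {height_category(z)}")
```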

34
Clicker Question 2
  • If your life worked out perfectly for the next 20
    or so years, how many children would you have by
    age 40?
  • Clicker categories
  • 0
  • 1
  • 2
  • 3
  • 4 or more

35
Clicker Question 3
  • Since 6 AM yesterday (Wednesday), how many times
    have you brushed your teeth?
  • Clicker categories
  • 0
  • 1
  • 2
  • 3
  • 4 or more