Review of Top 10 Concepts in Statistics - PowerPoint PPT Presentation

Slides: 119
Provided by: Lapt161
Transcript and Presenter's Notes

Title: Review of Top 10 Concepts in Statistics


1
Review of Top 10 Concepts in Statistics
  • NOTE: This PowerPoint file is not an
    introduction, but rather a checklist of topics to
    review

2
Top Ten #1
  • Descriptive Statistics

3
Measures of Central Location
  • Mean
  • Median
  • Mode

4
Mean
  • Population mean: µ = Σx/N = (5 + 1 + 6)/3 = 12/3 = 4
  • Algebra: Σx = Nµ = 3 × 4 = 12
  • Sample mean: x-bar = Σx/n
  • Example: the number of hours spent on the
    Internet: 4, 8, and 9
  • x-bar = (4 + 8 + 9)/3 = 7 hours
  • Do NOT use if the number of observations is small
    or with extreme values
  • Ex: Do NOT use if 3 houses were sold this week,
    and one was a mansion

5
Median
  • Median = middle value
  • Example: 5, 1, 6
  • Step 1: Sort data: 1, 5, 6
  • Step 2: Middle value = 5
  • When there is an even number of observations,
    the median is computed by averaging the two
    observations in the middle.
  • OK even if there are extreme values
  • Home sales: 100K, 200K, 900K, so
  • mean = 400K, but median = 200K

6
Mode
  • Mode = most frequent value
  • Ex: female, male, female
  • Mode = female
  • Ex: 1, 1, 2, 3, 5, 8
  • Mode = 1
  • It may not be a very good measure, see the
    following example

7
Measures of Central Location - Example
  • Sample: 0, 0, 5, 7, 8, 9, 12, 14, 22, 23
  • Sample mean: x-bar = Σx/n = 100/10 = 10
  • Median = (8 + 9)/2 = 8.5
  • Mode = 0
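
The three measures for this sample can be checked with a short Python sketch. It uses only the standard-library statistics module; the variable names are illustrative and not part of the original slides.

```python
import statistics

# Sample from the slide
sample = [0, 0, 5, 7, 8, 9, 12, 14, 22, 23]

sample_mean = statistics.mean(sample)      # sum of values divided by n
sample_median = statistics.median(sample)  # n is even, so the two middle values are averaged
sample_mode = statistics.mode(sample)      # most frequent value

print("mean:", sample_mean)
print("median:", sample_median)
print("mode:", sample_mode)
```

Note how the mode (0) sits far from the mean and median here, which is why the mode can be a poor measure of central location.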

8
Relationship
  • Case 1: if probability distribution is symmetric
    (ex. bell-shaped, normal distribution),
  • Mean = Median = Mode
  • Case 2: if distribution is positively skewed (to
    the right) (ex. incomes of employees in a large
    firm: a large number of relatively low-paid workers
    and a small number of high-paid executives),
  • Mode < Median < Mean

9
Relationship contd
  • Case 3: if distribution is negatively skewed (to
    the left) (ex. the time taken by students to write
    exams: few students hand in their exams early and
    the majority of students turn in their exams at
    the end of the exam),
  • Mean < Median < Mode

10
Dispersion: Measures of Variability
  • How much spread of data
  • How much uncertainty
  • Measures
  • Range
  • Variance
  • Standard deviation

11
Range
  • Range = Max - Min ≥ 0
  • But range is affected by unusual values
  • Ex: Santa Monica has a high of 105 degrees and a
    low of 30 once a century, but range would be
    105 - 30 = 75

12
Standard Deviation (SD)
  • Better than range because all data used
  • Population SD = square root of variance = sigma (σ)
  • SD ≥ 0

13
Empirical Rule
  • Applies to mound or bell-shaped curves
  • Ex normal distribution
  • About 68% of data within one SD of mean
  • About 95% of data within two SD of mean
  • About 99.7% of data within three SD of mean
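
As a rough check of the empirical rule, the sketch below computes the share of a standard normal distribution that lies within k standard deviations of the mean. It assumes an exactly normal distribution; the helper name share_within is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Probability that a normal value lies within k SDs of its mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.4f}")
```

The exact two-SD share is slightly above 95%, which is why tables use 1.96 rather than 2 for a 95% interval.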

14
Standard Deviation = Square Root of Variance
15
Sample Standard Deviation
16
Standard Deviation
  • Total variation = 34
  • Sample variance = 34/4 = 8.5
  • Sample standard deviation
  • = square root of 8.5 = 2.9

17
Measures of Variability - Example
  • The hourly wages earned by a sample of five
    students are
  • 7, 5, 11, 8, and 6
  • Range = 11 - 5 = 6
  • Variance
  • Standard deviation
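
The variance and standard deviation values did not survive in this transcript. A minimal sketch of the calculation, assuming the slide intended the sample formulas (dividing by n - 1, as in the previous slide), is:

```python
import statistics

wages = [7, 5, 11, 8, 6]  # hourly wages of the five students

wage_range = max(wages) - min(wages)
sample_variance = statistics.variance(wages)  # sum of squared deviations / (n - 1)
sample_sd = statistics.stdev(wages)           # square root of the sample variance

print("range:", wage_range)
print("variance:", round(sample_variance, 2))
print("standard deviation:", round(sample_sd, 2))
```

With a mean of 7.4, the squared deviations sum to 21.2, so the sample variance is 21.2/4 = 5.3 and the standard deviation is about 2.30.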

18
Coefficient of Variation (CV)
  • CV = Standard Deviation / Mean
  • Relative measure of spread
  • Example
  • Country 1: CV = 100/1000 = 0.10
  • Country 2: CV = 200/4000 = 0.05
  • Although Country 2 has higher standard
    deviation, Country 1 has higher relative spread

19
Graphical Tools
  • Line chart: trend over time
  • Scatter diagram: relationship between two
    variables
  • Bar chart: frequency for each category
  • Histogram: frequency for each class of measured
    data (graph of frequency distribution)
  • Box plot: graphical display based on quartiles,
    which divide data into 4 parts

20
Top Ten #2
  • Hypothesis Testing

21
H0: Null Hypothesis
  • Population mean = µ
  • Population proportion = p
  • A statement about the value of a population
    parameter
  • Never include a sample statistic (such as x-bar)
    in the hypothesis

22
HA or H1: Alternative Hypothesis
  • ONE TAIL ALTERNATIVE
  • Right tail: µ > number (smog check)
  • p > fraction (defectives)
  • Left tail: µ < number (weight in box of crackers)
  • p < fraction (unpopular President's
    approval low)

23
One-Tailed Tests
  • A test is one-tailed when the alternate
    hypothesis, H1 or HA, states a direction, such as
  • H1: The mean yearly salary earned by full-time
    employees is more than 45,000. (µ > 45,000)
  • H1: The average speed of cars traveling on the
    freeway is less than 75 miles per hour. (µ < 75)
  • H1: Less than 20 percent of the customers pay
    cash for their gasoline purchase. (p < 0.2)

24
Two-Tail Alternative
  • Population mean not equal to number (too hot or
    too cold)
  • Population proportion not equal to fraction
    (alcohol too weak or too strong)

25
Two-Tailed Tests
  • A test is two-tailed when no direction is
    specified in the alternate hypothesis
  • H1: The mean amount of time spent on the
    Internet is not equal to 5 hours. (µ ≠ 5)
  • H1: The mean price for a gallon of gasoline is
    not equal to 2.54. (µ ≠ 2.54)

26
Reject Null Hypothesis (H0) If
  • Absolute value of test statistic > critical
    value
  • Reject H0 if |Z value| > critical Z
  • Reject H0 if |t value| > critical t
  • Reject H0 if p-value < significance level (alpha)
  • Note that direction of inequality is reversed!
  • Reject H0 if very large difference between sample
    statistic and population parameter in H0

Test statistic: A value, determined from sample
information, used to determine whether or not to
reject the null hypothesis.
Critical value: The dividing point between the
region where the null hypothesis is rejected and
the region where it is not rejected.
27
Example: Smog Check
  • H0: µ = 80
  • HA: µ > 80
  • If test statistic = 2.2 and critical value = 1.96,
    reject H0, and conclude that the population mean
    is likely > 80
  • If test statistic = 1.6 and critical value =
    1.96, do not reject H0, and reserve judgment
    about H0
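
The critical-value rule can be sketched as a small Python function. The function name and the tail argument are illustrative assumptions, not part of the original slides; the critical value is passed as a positive number taken from a Z or t table.

```python
def reject_h0(test_statistic, critical_value, tail="right"):
    """Return True if H0 is rejected under the critical-value rule.

    critical_value is the positive table value; tail names the
    direction stated in the alternative hypothesis.
    """
    if tail == "right":
        return test_statistic > critical_value
    if tail == "left":
        return test_statistic < -critical_value
    if tail == "two":
        return abs(test_statistic) > critical_value
    raise ValueError("tail must be 'right', 'left', or 'two'")

# Smog check example: HA is right-tailed (µ > 80)
print(reject_h0(2.2, 1.96))  # test statistic beyond the critical value
print(reject_h0(1.6, 1.96))  # test statistic short of the critical value
```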

28
Type I vs. Type II Error
  • Alpha (α) = P(type I error) = significance level
    = probability that you reject a true null hypothesis
  • Beta (β) = P(type II error) = probability you do
    not reject the null hypothesis, given H0 is false
  • Ex: H0: Defendant innocent
  • α = P(jury convicts innocent person)
  • β = P(jury acquits guilty person)

29
Type I vs. Type II Error
30
Example: Smog Check
  • H0: µ = 80
  • HA: µ > 80
  • If p-value = 0.01 and alpha = 0.05, reject H0,
    and conclude that the population mean is likely >
    80
  • If p-value = 0.07 and alpha = 0.05, do not reject
    H0, and reserve judgment about H0

31
Test Statistic
  • When testing for the population mean from a large
    sample and the population standard deviation is
    known, the test statistic is given by
  • z = (x-bar - µ) / (σ/√n)

32
Example
  • The processors of Best Mayo indicate on the
    label that the bottle contains 16 ounces of mayo.
    The standard deviation of the process is 0.5
    ounces. A sample of 36 bottles from last hour's
    production showed a mean weight of 16.12 ounces
    per bottle. At the .05 significance level, can
    we conclude that the mean amount per bottle is
    greater than 16 ounces?

33
Example contd
  • 1. State the null and the alternative hypotheses
  • H0: µ = 16, H1: µ > 16
  • 2. Select the level of significance. In this
    case, we selected the .05 significance level.
  • 3. Identify the test statistic. Because we know
    the population standard deviation, the test
    statistic is z.
  • 4. State the decision rule.
  • Reject H0 if z > 1.645 (= z0.05)

34
Example contd
  • 5. Compute the value of the test statistic
  • 6. Conclusion Do not reject the null hypothesis.
    We cannot conclude the mean is greater than 16
    ounces.
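
The computation in step 5 is missing from this transcript. A minimal sketch of the z test for the Best Mayo example, using the numbers given in the problem statement and the standard-library NormalDist for the critical value, is:

```python
import math
from statistics import NormalDist

mu_0 = 16.0    # hypothesized mean (ounces)
sigma = 0.5    # known population standard deviation
n = 36         # sample size
x_bar = 16.12  # sample mean
alpha = 0.05   # significance level, right-tailed test

z = (x_bar - mu_0) / (sigma / math.sqrt(n))
z_critical = NormalDist().inv_cdf(1 - alpha)  # table value is about 1.645
reject = z > z_critical

print(f"z = {z:.2f}, critical z = {z_critical:.3f}, reject H0: {reject}")
```

The test statistic works out to 0.12/(0.5/6) = 1.44, which is below 1.645, consistent with the conclusion in step 6.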

35
Example: Using Empirical Rule in Hypothesis
Testing
  • H0: µ = 80
  • HA: µ ≠ 80
  • The following information is given
  • Computed test statistic (z) = -1.14
  • Significance level (alpha) = 0.05
  • Should you reject H0?

36
Example contd
  • Two-tail HA with alpha = 0.05 has the same
    critical Z value as a 95% confidence interval
  • Empirical rule, if bell shaped
  • 95% of data within two standard deviations of
    mean
  • That is, 95% of normal curve between Z = -2.0 and
    Z = 2.0
  • Computed test statistic (Z = -1.14) is between
    -2.0 and 2.0, so do NOT reject H0

37
Top Ten #3
  • Confidence Intervals: Mean and Proportion

38
Confidence Interval
  • A confidence interval is a range of values within
    which the population parameter is expected to
    occur.

39
Factors for Confidence Interval
  • The factors that determine the width of a
    confidence interval are
  • The sample size, n
  • The variability in the population, usually
    estimated by standard deviation.
  • The desired level of confidence.

40
Confidence Interval: Mean
  • Use normal distribution (Z table) if
  • population standard deviation (sigma) known and
    either (1) or (2)
  • (1) Normal population
  • (2) Sample size > 30

41
Confidence Interval: Mean
  • If normal table, then
  • x-bar ± z (σ/√n)

42
Normal Table
  • Tail = .5(1 - confidence level)
  • NOTE! Different statistics texts have different
    normal tables
  • This review uses the tail of the bell curve
  • Ex: 95% confidence: tail = .5(1 - .95) = .025
  • Z.025 = 1.96

43
Example
  • n = 49, Σx = 490, σ = 2, 95% confidence
  • 9.44 < µ < 10.56

44
Another Example
  • One of the SOM professors wants to estimate the mean
    number of hours worked per week by students. A
    sample of 49 students showed a mean of 24 hours.
    It is assumed that the population standard
    deviation is 4 hours. What is the population
    mean?

45
Another Example contd
  • 95 percent confidence interval for the
    population mean.

The confidence limits range from 22.88 to 25.12.
We estimate with 95 percent confidence that the
average number of hours worked per week by
students lies between these two values.
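
Both Z-interval examples can be reproduced with a short sketch. The function name z_interval is illustrative; it assumes a known population standard deviation and uses the tail convention from the Normal Table slide.

```python
import math
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence=0.95):
    """Confidence interval for a mean when sigma is known."""
    tail = 0.5 * (1 - confidence)          # area in one tail
    z = NormalDist().inv_cdf(1 - tail)     # about 1.96 for 95% confidence
    margin = z * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# First example: n = 49, sum of x = 490 (so x-bar = 10), sigma = 2
first_low, first_high = z_interval(490 / 49, 2, 49)

# Second example: 49 students, mean of 24 hours, sigma = 4
hours_low, hours_high = z_interval(24, 4, 49)

print(round(first_low, 2), round(first_high, 2))
print(round(hours_low, 2), round(hours_high, 2))
```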
46
Confidence Interval: Mean, t distribution
  • Use if normal population but population standard
    deviation (σ) not known
  • If you are given the sample standard deviation
    (s), use t table, assuming normal population
  • If one population, n-1 degrees of freedom

47
Confidence Interval Mean t distribution
48
Confidence Interval: Proportion
  • Use if success or failure
    (ex: defective or not defective,
    satisfactory or unsatisfactory)
  • Normal approximation to binomial OK if
  • (n)(p) > 5 and (n)(1-p) > 5, where
  • n = sample size
  • p = population proportion
  • NOTE: NEVER use the t table if proportion!!

49
Confidence Interval: Proportion
  • Ex: 8 defectives out of 100, so p = .08 and
  • n = 100, 95% confidence

50
Confidence Interval: Proportion
  • A sample of 500 people who own their house
    revealed that 175 planned to sell their homes
    within five years. Develop a 98% confidence
    interval for the proportion of people who plan to
    sell their house within five years.
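
The interval formulas for the two proportion examples are missing from this transcript. The sketch below assumes the usual normal-approximation interval, p ± z √(p(1-p)/n), with the sample proportion standing in for p; the function and variable names are illustrative.

```python
import math
from statistics import NormalDist

def proportion_interval(successes, n, confidence):
    """Normal-approximation confidence interval for a proportion."""
    p_hat = successes / n                   # sample proportion
    tail = 0.5 * (1 - confidence)
    z = NormalDist().inv_cdf(1 - tail)      # Z value, never t, for a proportion
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# 8 defectives out of 100, 95% confidence
defect_low, defect_high = proportion_interval(8, 100, 0.95)

# 175 of 500 homeowners plan to sell, 98% confidence
sell_low, sell_high = proportion_interval(175, 500, 0.98)

print(round(defect_low, 3), round(defect_high, 3))
print(round(sell_low, 3), round(sell_high, 3))
```

Under these assumptions the intervals come out to roughly .027 to .133 for the defectives and roughly .30 to .40 for the home sellers.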

51
Interpretation
  • If 95% confidence, then 95% of all confidence
    intervals will include the true population
    parameter
  • NOTE! Never use the term probability when
    estimating a parameter!! (Ex: Do NOT say
    "Probability that population mean is between 23
    and 32 is .95" because a parameter is not a random
    variable. In fact, the population mean is a fixed
    but unknown quantity.)

52
Point vs. Interval Estimate
  • Point estimate: statistic (single number)
  • Ex: sample mean, sample proportion
  • Each sample gives a different point estimate
  • Interval estimate: range of values
  • Ex: Population mean = sample mean ± error
  • Parameter = statistic ± error

53
Width of Interval
  • Ex: sample mean = 23, error = 3
  • Point estimate = 23
  • Interval estimate = 23 ± 3, or (20, 26)
  • Width of interval = 26 - 20 = 6
  • Wide interval: point estimate unreliable

54
Wide Confidence Interval If
  • (1) small sample size (n)
  • (2) large standard deviation
  • (3) high confidence level (ex: 99% confidence
    interval wider than 95% confidence interval)
  • If you want narrow interval, you need a large
    sample size or small standard deviation or low
    confidence level.

55
Top Ten #4
  • Linear Regression

56
Linear Regression
  • Regression equation: y-hat = b0 + b1x
  • y-hat = dependent variable = predicted value
  • x = independent variable
  • b0 = y-intercept = predicted value of y if x = 0
  • b1 = slope = regression coefficient
  • = change in y per unit change in x

57
Slope vs. Correlation
  • Positive slope (b1 > 0): positive correlation
    between x and y (y increases if x increases)
  • Negative slope (b1 < 0): negative correlation (y
    decreases if x increases)
  • Zero slope (b1 = 0): no correlation (predicted
    value for y is mean of y), no linear relationship
    between x and y

58
Simple Linear Regression
  • Simple: one independent variable, one dependent
    variable
  • Linear: graph of regression equation is a straight
    line

59
Example
  • y = salary (female manager, in thousands of
    dollars)
  • x = number of children
  • n = number of observations

60
Given Data
61
Totals
62
Slope (b1) = -6.5
  • Method of Least Squares formulas not on BUS 302
    exam
  • b1 = -6.5 given

Interpretation: If one female manager has 1 more
child than another, salary is 6,500 lower; that
is, salary of female managers is expected to
decrease by 6.5 (in thousands of dollars) per
child
63
Intercept (b0)
  • b0 = 44.33 - (-6.5)(2.33) = 59.5
  • If number of children is zero, expected salary is
    59,500

64
Regression Equation
65
Forecast Salary If 3 Children
  • 59.5 - 6.5(3) = 40
  • 40,000 expected salary
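
The intercept and forecast can be sketched in a few lines of Python. This assumes that 44.33 and 2.33 in the intercept calculation are the sample means of salary and number of children (the data table itself is not in this transcript); the function name is illustrative.

```python
slope = -6.5        # b1, given on the slide (thousands of dollars per child)
mean_salary = 44.33 # assumed to be the mean of y
mean_children = 2.33  # assumed to be the mean of x

# Least-squares intercept: b0 = y-bar - b1 * x-bar
intercept = mean_salary - slope * mean_children

def predict_salary(children, b0=59.5, b1=-6.5):
    """Predicted salary in thousands of dollars, using the rounded coefficients."""
    return b0 + b1 * children

print(round(intercept, 1))
print(predict_salary(3))
```

The unrounded intercept is 59.475, which the slide rounds to 59.5 before forecasting.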

66
Standard Error of Estimate
67
Standard Error of Estimate
68
Standard Error of Estimate
Actual salary typically 1,900 away from expected
salary
69
Coefficient of Determination
  • R2 = % of total variation in y that can be
    explained by variation in x
  • Measure of how closely the linear regression line
    fits the points in a scatter diagram
  • R2 = 1: max. possible value: perfect linear
    relationship between y and x (straight line)
  • R2 = 0: min. value: no linear relationship

70
Sources of Variation (V)
  • Total V = Explained V + Unexplained V
  • SS = Sum of Squares = V
  • Total SS = Regression SS + Error SS
  • SST = SSR + SSE
  • SSR = Explained V, SSE = Unexplained V

71
Coefficient of Determination
  • R2 = SSR/SST
  • R2 = 197/200.5 = .98
  • Interpretation: 98% of total variation in salary
    can be explained by variation in number of
    children

72
0 ≤ R2 ≤ 1
  • 0: No linear relationship since SSR = 0
    (explained variation = 0)
  • 1: Perfect relationship since SSR = SST
    (unexplained variation = SSE = 0), but does not
    prove cause and effect

73
R = Correlation Coefficient
  • Case 1: slope (b1) < 0
  • R < 0
  • R is negative square root of coefficient of
    determination

74
Our Example
  • Slope = b1 = -6.5
  • R2 = .98
  • R = -.99
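
The link between the sums of squares, R2, and the sign of R can be sketched as follows, using SSR = 197 and SST = 200.5 from the example. The variable names are illustrative.

```python
import math

ssr = 197.0   # regression (explained) sum of squares
sst = 200.5   # total sum of squares
slope = -6.5  # b1 from the salary example

sse = sst - ssr        # error (unexplained) sum of squares
r_squared = ssr / sst  # coefficient of determination

# R takes the sign of the slope: negative root if b1 < 0, positive if b1 > 0
r = math.copysign(math.sqrt(r_squared), slope)

print(f"SSE = {sse}, R2 = {r_squared:.2f}, R = {r:.4f}")
```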

75
Case 2: Slope > 0
  • R is positive square root of coefficient of
    determination
  • Ex: R2 = .49
  • R = .70
  • R has no interpretation
  • R overstates relationship

76
Caution
  • Nonlinear relationship (parabola, hyperbola, etc)
    can NOT be measured by R2
  • In fact, you could get R2 = 0 with a nonlinear
    graph on a scatter diagram

77
Summary: Correlation Coefficient
  • Case 1: If b1 > 0, R is the positive square root
    of the coefficient of determination
  • Ex1: y = 4 + 3x, R2 = .36, R = .60
  • Case 2: If b1 < 0, R is the negative square root
    of the coefficient of determination
  • Ex2: y = 80 - 10x, R2 = .49, R = -.70
  • NOTE! Ex2 has the stronger relationship, as
    measured by coefficient of determination

78
Extreme Values
  • R = 1: perfect positive correlation
  • R = -1: perfect negative correlation
  • R = 0: zero correlation

79
MS Excel Output
Correlation Coefficient (-0.9912): Note that you
need to change the sign because the sign of the
slope (b1) is negative (-6.5)
Coefficient of Determination
Standard Error of Estimate
Regression Coefficient
80
Top Ten #5
  • Expected Value

81
Expected Value
  • Expected Value = E(x) = Σ x P(x)
  • = x1 P(x1) + x2 P(x2) + ...
  • Expected value is a weighted average, also a
    long-run average

82
Example
  • Find the expected age at high school graduation
    if 11 were 17 years old, 80 were 18 years old,
    and 5 were 19 years old
  • Step 1: 11 + 80 + 5 = 96

83
Step 2
84
Another Example of E(x)
  • A news rack has 2 papers left. In the past, on 20%
    of days you sold both papers, while on 50% of days
    you sold one paper. Find the expected number of
    papers sold.
  • Answer
  • First, find P(0) = 1 - 0.20 - 0.50 = 0.30
  • E(X) = 0(0.30) + 1(0.50) + 2(0.20) = 0.9
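
Both expected-value examples can be sketched with one helper. For the graduation example, this assumes 11, 80, and 5 are counts of students (96 in total), so each probability is a count divided by 96; Step 2 is missing from the transcript, so that part is a reconstruction rather than the original slide.

```python
def expected_value(distribution):
    """E(x) = sum of x * P(x) over a dict mapping value -> probability."""
    return sum(x * p for x, p in distribution.items())

# News rack: number of papers sold and its probability
papers = {0: 0.30, 1: 0.50, 2: 0.20}
expected_papers = expected_value(papers)

# Graduation age: convert counts to probabilities first
counts = {17: 11, 18: 80, 19: 5}
total = sum(counts.values())  # Step 1: 11 + 80 + 5 = 96
age_distribution = {age: count / total for age, count in counts.items()}
expected_age = expected_value(age_distribution)

print(round(expected_papers, 2))
print(round(expected_age, 2))
```

Under the counts assumption the expected graduation age is 1722/96, a little under 18.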

85
Top Ten #6
  • What Distribution to Use?

86
Use Binomial Distribution If
  • Random variable (x) is number of successes in n
    trials
  • Each trial is success or failure
  • Independent trials
  • Constant probability of success (p) on each trial
  • Sampling with replacement (in practice, people
    may use binomial w/o replacement, but theory is
    with replacement)

87
Success vs. Failure
  • The binomial experiment can result in only one of
    two possible outcomes
  • Male vs. Female
  • Defective vs. Non-defective
  • Yes or No
  • Pass (8 or more right answers) vs. Fail (fewer
    than 8)
  • Buy drink (21 or over) vs. Cannot buy drink

88
Binomial Is Discrete
  • Integer values
  • 0, 1, 2, ..., n
  • Binomial is often skewed, but may be symmetric

89
Example of Binomial
  • If 60% of all voters in a precinct are Democrats,
    find the probability that a sample of 3 voters
    has (a) all Democrats (b) no Democrats
  • Answer for (a)
  • P(3) = (0.6)(0.6)(0.6)
  • Answer for (b)
  • P(0) = (0.4)(0.4)(0.4)
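
The two answers are special cases of the binomial probability formula. A minimal sketch, with an illustrative function name and using math.comb from the standard library:

```python
import math

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success probability p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

n_voters = 3
p_democrat = 0.6

p_all = binomial_pmf(3, n_voters, p_democrat)   # (0.6)(0.6)(0.6)
p_none = binomial_pmf(0, n_voters, p_democrat)  # (0.4)(0.4)(0.4)

print(round(p_all, 3), round(p_none, 3))
```

The same function gives the probabilities for 1 or 2 Democrats, and the four probabilities sum to 1.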

90
Normal Distribution
  • Continuous, bell-shaped, symmetric
  • Mean = median = mode
  • Measurement (dollars, inches, years)
  • Cumulative probability under normal curve: use Z
    table if you know population mean and population
    standard deviation
  • Sample mean: use Z table if you know population
    standard deviation and either normal population
    or n > 30

91
t Distribution
  • Continuous, mound-shaped, symmetric
  • Applications similar to normal
  • More spread out than normal
  • Use t if normal population but population
    standard deviation not known
  • Degrees of freedom: df = n - 1 if estimating the
    mean of one population
  • t approaches z as df increases

92
Normal or t Distribution?
  • Use t table if normal population but population
    standard deviation (σ) is not known
  • If you are given the sample standard deviation
    (s), use t table, assuming normal population

93
Top Ten #7
  • P-value

94
P-value
  • P-value = probability of getting a sample
    statistic as extreme as (or more extreme than) the
    sample statistic you got from your sample, given
    that the null hypothesis is true

95
P-value Example: one-tail test
  • H0: µ = 40
  • HA: µ > 40
  • Sample mean = 43
  • P-value = P(sample mean > 43, given H0 true)
  • Meaning: probability of observing a sample mean
    as large as 43 when the population mean is 40
  • How to use it: Reject H0 if p-value < α
    (significance level)

96
Two Cases
  • Suppose α = .05
  • Case 1: suppose p-value = .02, then reject H0
    (unlikely H0 is true; you believe population mean
    > 40)
  • Case 2: suppose p-value = .08, then do not reject
    H0 (H0 may be true; you have reason to believe
    that the population mean may be 40)

97
P-value Example: two-tail test
  • H0: µ = 70
  • HA: µ ≠ 70
  • Sample mean = 72
  • If two-tails, then P-value =
  • 2 × P(sample mean > 72) = 2(.04) = .08
  • If α = .05, p-value > α, so do not reject H0
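
The p-value rule for one-tail and two-tail tests can be sketched as below. The function names are illustrative; the one-tail probability of .04 is taken from the slide, not computed.

```python
def two_tailed_p_value(one_tail_probability):
    """Double the one-tail area; a probability cannot exceed 1."""
    return min(1.0, 2 * one_tail_probability)

def reject_h0_by_p_value(p_value, alpha):
    """Reject H0 only when the p-value is below the significance level."""
    return p_value < alpha

alpha = 0.05

# Two-tail example: P(sample mean > 72) = .04
p_two_tail = two_tailed_p_value(0.04)
print(p_two_tail, reject_h0_by_p_value(p_two_tail, alpha))

# One-tail cases from the earlier slide: p-value .02 and .08
print(reject_h0_by_p_value(0.02, alpha))
print(reject_h0_by_p_value(0.08, alpha))
```

Note that the same one-tail area of .04 would lead to rejection in a one-tail test at α = .05 but not in the two-tail test.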

98
Top Ten #8
  • Variation Creates Uncertainty

99
No Variation
  • Certainty, exact prediction
  • Standard deviation = 0
  • Variance = 0
  • All data exactly the same
  • Example: all workers in minimum wage job

100
High Variation
  • Uncertainty, unpredictable
  • High standard deviation
  • Ex 1: Workers in downtown L.A. have variation
    between CEOs and garment workers
  • Ex 2: New York temperatures in spring range from
    below freezing to very hot

101
Comparing Standard Deviations
  • Temperature Example
  • Beach city: small standard deviation (single
    temperature reading close to mean)
  • High Desert city: high standard deviation (hot
    days, cool nights in spring)

102
Standard Error of the Mean
  • Standard deviation of sample mean
  • = standard deviation / square root of n
  • Ex: standard deviation = 10, n = 4, so standard
    error of the mean = 10/2 = 5
  • Note that 5 < 10, so standard error < standard
    deviation.
  • As n increases, standard error decreases.
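
A short sketch of the standard error formula shows how it shrinks as n grows. The function name is illustrative; the sample sizes beyond n = 4 are added only to show the trend.

```python
import math

def standard_error(standard_deviation, n):
    """Standard error of the mean = standard deviation / square root of n."""
    return standard_deviation / math.sqrt(n)

for n in (4, 16, 100):
    print(n, standard_error(10, n))
```

Quadrupling the sample size only halves the standard error, because n enters through its square root.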

103
Sampling Distribution
  • Expected value of sample mean = population mean,
    but an individual sample mean could be smaller or
    larger than the population mean
  • Population mean is a constant parameter, but
    sample mean is a random variable
  • Sampling distribution is distribution of sample
    means

104
Example
  • Mean age of all students in the building is
    population mean
  • Each classroom has a sample mean
  • Distribution of sample means from all classrooms
    is sampling distribution

105
Sampling
  • Sampling Distribution concepts assume
    probability sample
  • Probability sample requires calculation of
    probability of being in the sample
  • Probability sample more accurate than judgment or
    convenience sample

106
Central Limit Theorem (CLT)
  • If population standard deviation is known,
    sampling distribution of sample means is
    approximately normal if n > 30
  • CLT applies even if original population is skewed
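
A small simulation can illustrate the CLT with a skewed population. This is only a sketch: the exponential population (mean 1, standard deviation 1), the seed, and the number of replications are all assumptions chosen for illustration, not part of the original slides.

```python
import math
import random
import statistics

random.seed(0)        # fixed seed so the run is repeatable

n = 30                # size of each sample
replications = 2000   # number of samples drawn

# Each entry is the mean of one sample from a right-skewed population
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(replications)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
theoretical_se = 1.0 / math.sqrt(n)  # population SD / square root of n

print(f"mean of sample means: {mean_of_means:.3f}")
print(f"SD of sample means:   {sd_of_means:.3f}")
print(f"theoretical SE:       {theoretical_se:.3f}")
```

The sample means cluster around the population mean with a spread close to the standard error, even though individual observations are strongly skewed.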

107
Top Ten #9
  • Population vs. Sample

108
Population
  • Collection of all items (all light bulbs made at
    factory)
  • Parameter: measure of population
  • (1) population mean (average number of hours in
    life of all bulbs)
  • (2) population proportion (% of all bulbs that
    are defective)

109
Sample
  • Part of population (bulbs tested by inspector)
  • Statistic: measure of sample = estimate of
    parameter
  • (1) sample mean (average number of hours in life
    of bulbs tested by inspector)
  • (2) sample proportion (% of bulbs in sample that
    are defective)

110
Top Ten #10
  • Qualitative vs. Quantitative

111
Levels of Measurement
  • I. QUALITATIVE
  • Nominal: No order (Ex: color)
  • Ordinal: Order important (Ex: good, fair, poor)
  • II. QUANTITATIVE
  • Interval: Order important (Ex: temp, shoe size)
  • Ratio: Order important AND ratio meaningful (Ex:
    20/hr twice as good as 10/hr)

112
Qualitative
  • Categorical data
  • success vs. failure
  • ethnicity
  • marital status
  • color
  • zip code
  • 4 star hotel in tour guide

113
Qualitative
  • If you need an average, do not calculate the
    mean
  • However, you can compute the mode (average
    person is married, buys a blue car made in
    America)

114
Quantitative
  • Two cases
  • Case 1: discrete
  • Case 2: continuous

115
Discrete
  • (1) integer values (0, 1, 2, ...)
  • (2) example: binomial
  • (3) finite number of possible values
  • (4) counting
  • (5) number of brothers
  • (6) number of cars arriving at gas station

116
Continuous
  • Real numbers, such as decimal values (22.22)
  • Examples: Z, t
  • Infinite number of possible values
  • Measurement
  • Miles per gallon, distance, duration of time

117
Graphical Tools
  • Pie chart or bar chart: qualitative
  • Joint frequency table: qualitative (relate
    marital status vs. zip code)
  • Scatter diagram: quantitative (distance from CSUN
    vs. duration of time to reach CSUN)

118
Hypothesis Testing / Confidence Intervals
  • Quantitative: Mean
  • Qualitative: Proportion