Those who don - PowerPoint PPT Presentation

About This Presentation
Title:

Those who don

Description:

courses.ischool.berkeley.edu – PowerPoint PPT presentation

Slides: 40
Provided by: Rash82
Transcript and Presenter's Notes

1
Those who don't know statistics are condemned to
reinvent it. (David Freedman)
2
All you ever wanted to know about the histogram
and more ...
3
Distribution of number of graphics on web pages
(N = 1873)
Mean 17.93
Median 16.00
Std. Dev 17.92
N 1873
Graphic Count
4
Horizontal Scale
5
Distribution of redundant links on web pages (N = 1861)
Mean 22.1
Median 14
Std. Dev 37.33
N 1861.00
6
Plotting a histogram: endpoint convention, plot
frequencies, make equal intervals, etc.
7
Frequency Table
Convention: include the left endpoint in the
class interval
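The left-endpoint convention can be sketched in a few lines. This is a minimal illustration, not the procedure used for the slides: the function name `frequency_table`, the interval width, and the font counts are all assumptions made up for the example. Each class interval is [left, left + width), so a value sitting exactly on a boundary is counted in the interval that starts there.

```python
from collections import Counter


def frequency_table(values, width, start=0):
    """Count values per class interval [left, left + width).

    The left endpoint is included and the right endpoint is excluded.
    Intervals with no values are simply omitted from the result.
    """
    counts = Counter()
    for v in values:
        # Floor division maps a value on a boundary to the interval it starts.
        left = start + ((v - start) // width) * width
        counts[left] += 1
    total = len(values)
    # Each row: (left endpoint, right endpoint, frequency, proportion)
    return [(left, left + width, n, n / total) for left, n in sorted(counts.items())]


# Hypothetical font counts per page (not data from the slides).
fonts = [1, 2, 2, 3, 4, 4, 4, 5, 6, 8]
table = frequency_table(fonts, width=2)
for left, right, n, p in table:
    print(f"[{left}, {right}): frequency={n}, proportion={p:.2f}")
```

The proportion column is the frequency divided by N, which is what turns a frequency histogram into a probability histogram.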
8
Frequency/Probability
9
Number of fonts used on a web page
Frequency/Probability
10
Cleaning up a histogram: getting rid of outliers
11
Distribution of word count (N = 1903)
Mean 393.2
Median 223
Std. Dev 725.24
Minimum 0
Maximum 20,357
12
Distribution of word count (N = 1897), top six
removed
Mean 368.0
Median 223
Std. Dev 474.04
Minimum 0
Maximum 4132
13
Distribution of word count (N = 1873)
Mean 333.4
Median 220
Std. Dev 360.30
Minimum 0
Maximum 4132
WORDCNT2
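The effect of dropping the largest values, as in the word-count slides above, can be reproduced on a small list. This is only a sketch: the helper names `summarize` and `drop_largest` and the word counts are hypothetical, chosen to show that the mean and SD move a lot while the median barely moves.

```python
import statistics


def summarize(values):
    """Return the summary statistics shown on the slides for a list."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),  # sample SD, divides by N-1
        "max": max(values),
    }


def drop_largest(values, k):
    """Return the values in sorted order with the k largest removed."""
    return sorted(values)[: len(values) - k]


# Hypothetical word counts with one extreme page (not data from the slides).
word_counts = [0, 120, 180, 220, 230, 300, 450, 20000]

before = summarize(word_counts)
after = summarize(drop_largest(word_counts, 1))

for name, stats in (("before", before), ("after", after)):
    print(name, {key: round(value, 1) for key, value in stats.items()})
```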
14
What can histograms tell you
15
Distribution of link count on good and bad web pages
Good Sites
Bad Sites
16
Making inferences from histograms: incidence of
riots and temperature
Horizontal axis: temperature
17
Mean and Median
  • Mean is the arithmetic average; median is the 50%
    point
  • Mean is the point where the graph balances
  • Mean shifts around; median does not shift much and
    is more stable
  • Computing the median: for odd-numbered N, find the
    middle number
  • For even-numbered N, interpolate between the middle
    2, e.g. if they are 7 and 9, then 8 is the median
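The odd/even rule for the median can be written out directly. The sketch below is illustrative: the function names and the example lists are made up, apart from the 7-and-9 case taken from the slide.

```python
def mean(values):
    """Arithmetic average: sum divided by count."""
    return sum(values) / len(values)


def median(values):
    """Middle value for odd N; midpoint of the two middle values for even N."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([3, 7, 9]))       # odd N: the middle number, 7
print(median([3, 7, 9, 12]))   # even N: halfway between 7 and 9, 8.0
print(mean([3, 7, 9, 12]))     # 7.75
```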
18
The instability of means and standard deviations
19
Add two numbers: watch the mean, median, SD
20
Add one outlier...
21
Standard Deviation: a measure of spread
22
Same mean, different spread
23
The Standard Deviation
24
  • The SD says how far away numbers on a list are
    from their average.
  • Most entries on the list will be somewhere around
    one SD away from the average.
  • Very few will be more than two or three SDs away.
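One way to check this rule of thumb on a list is to count how many entries fall within one, two, and three SDs of the average. The sketch below uses a hypothetical list of scores and a hypothetical helper `fraction_within`; the SD is the sample SD (N-1 in the denominator).

```python
import statistics


def fraction_within(values, k):
    """Fraction of entries no more than k SDs away from the average."""
    avg = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD, divides by N-1
    inside = [v for v in values if abs(v - avg) <= k * sd]
    return len(inside) / len(values)


# Hypothetical scores with one unusually large entry.
scores = [2, 4, 4, 5, 5, 5, 6, 6, 7, 8, 9, 15]

for k in (1, 2, 3):
    print(f"within {k} SD: {fraction_within(scores, k):.0%}")
```

On this list most entries land within one SD, only the largest entry is more than two SDs out, and nothing is beyond three.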

25
Understanding the standard deviation
  • Let's start with a list: 1, 2, 2, 3

Histogram is symmetric about 2; 2 is the mean, with
50% to the left of 2 and 50% to the right
26
  • List 1, 2, 2, 3: Average 2, SD 0.8
  • List 1, 2, 2, 5: Average 2.5, SD 1.73
  • List 1, 2, 2, 7: Average 3, SD 2.71

27
Computing the standard deviation
  • List: 20, 10, 15, 15
  • Average: 15
  • Find deviations from average:
  • 5, -5, 0, 0
  • Square the deviations and add them up:
  • (5)² + (-5)² + (0)² + (0)² = 50
  • Divide by N-1: 50/3 = 16.67
  • Take the square root: √16.67 = 4.08
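The same steps translate line by line into code. This is a minimal sketch with a hypothetical function name `sample_sd`; it follows the N-1 convention used on this slide, which also reproduces the SDs quoted for the lists ending in 5 and 7 on the earlier slide.

```python
import math


def sample_sd(values):
    """Standard deviation computed with the steps from the slide."""
    n = len(values)
    average = sum(values) / n
    deviations = [v - average for v in values]      # e.g. 5, -5, 0, 0
    sum_of_squares = sum(d * d for d in deviations)  # e.g. 50
    variance = sum_of_squares / (n - 1)              # e.g. 50/3
    return math.sqrt(variance)


print(round(sample_sd([20, 10, 15, 15]), 2))  # 4.08
print(round(sample_sd([1, 2, 2, 5]), 2))      # 1.73
print(round(sample_sd([1, 2, 2, 7]), 2))      # 2.71
```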

28
Properties of the standard deviation
  • The standard deviation is in the same units as
    the mean
  • The sample standard deviation tends to underestimate
    the population SD in small samples (therefore as a
    measure of spread it is slightly biased)
  • In normally distributed data, about 68% of the sample
    lies within 1 SD of the mean

29
Properties of the Normal Probability Curve
  • The graph is symmetric about the mean (the part
    to the right is a mirror image of the part to the
    left)
  • The total area under the curve equals 100%
  • Curve is always above horizontal axis
  • Appears to stop after a certain point (the curve
    gets really low)

30
  • The graph is symmetric about the mean
  • The total area under the curve equals 100%
  • Within 1 SD of the mean: 68%
  • Within 2 SD of the mean: 95%
  • Within 3 SD of the mean: 99.7%
  • You can disregard the rest of the curve
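These percentages can be recovered from the normal curve itself. As a sketch (the function name `area_within` is made up for the example), the area within k SDs of the mean equals erf(k / √2), which Python's standard `math.erf` evaluates directly.

```python
import math


def area_within(k):
    """Area under the normal curve within k SDs of the mean (0 to 1)."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    # Prints 68.3%, 95.4%, 99.7%; the slide rounds the first two to 68 and 95.
    print(f"within {k} SD: {area_within(k):.1%}")
```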

31
Distribution of judges' ratings for the Webby
Awards
Mean 6.3
Median 6.3
Std. Dev 1.98
N 1867.00
Skewness -.43
Kurtosis -.201
32
It is a remarkable fact that many histograms in
real life tend to follow the Normal Curve. For
such histograms, the mean and SD are good summary
statistics. The average pins down the center,
while the SD gives the spread. For histograms
that do not follow the Normal Curve, the mean
and SD are not good summary statistics. What
happens when the histogram is not normal ...
33
Distribution of word count on web pages
Std. Dev 384.83
Mean 348.3
Mean - 3 SD: 3 SD = 384 × 3 = 1152, so Mean - 1152 is negative
about 30 sample had negative number of links
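The arithmetic behind this slide is easy to check. The sketch below uses the mean and SD reported on the slide; the helper `normal_share_below` is a hypothetical name, and the second figure is what a normal curve with that mean and SD would imply, not a count taken from the data. A word count cannot be negative, which is why the normal summary fits this histogram poorly.

```python
import math

mean, sd = 348.3, 384.83  # values reported on the slide

# Three SDs below the mean lands far below zero.
lower = mean - 3 * sd


def normal_share_below(x, mean, sd):
    """Share of a normal curve with this mean and SD that lies below x."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))


share_negative = normal_share_below(0, mean, sd)

print(f"mean - 3 SD = {lower:.1f}")
print(f"normal-curve share below zero = {share_negative:.2f}")
```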
34
When SD is influenced by outliers, use the interquartile
range: 75th percentile - 25th percentile
Note: A percentile is a score below which a
certain percentage of the sample falls
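A short sketch of the interquartile range, using `statistics.quantiles` from the Python standard library (Python 3.8+). The data are hypothetical, and different software uses slightly different percentile interpolation rules, so exact quartile values can differ between packages.

```python
import statistics


def iqr(values):
    """Interquartile range: 75th percentile minus 25th percentile."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


# Hypothetical word counts, without and with one extreme page.
data = [100, 150, 200, 220, 250, 300, 400]
with_outlier = data + [20000]

print("IQR:", iqr(data), "->", iqr(with_outlier))
print("SD: ", round(statistics.stdev(data), 1), "->",
      round(statistics.stdev(with_outlier), 1))
```

Adding the extreme page changes the IQR only modestly, while the SD grows by more than an order of magnitude.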
35
Measures of Normality
  • Visual examination
  • Skewness: a measure of symmetry

Figure panels: Symmetric, Positively Skewed, Negatively Skewed
36
Kurtosis: does it cluster in the middle?
Kurtosis is based on a distribution's tail.
  • Distributions with a large tail: leptokurtic
  • Distributions with a small tail: platykurtic
  • Distributions with a normal tail: mesokurtic

Figure panels: Large tail, Small tail, Normal tail
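Both measures can be computed from central moments. The sketch below uses the plain moment-based definitions, with kurtosis reported as excess kurtosis (normal curve = 0). This is an assumption about convention: statistical packages often apply small-sample corrections, so their values will differ slightly from these. The function names and the short lists are illustrative only.

```python
def central_moment(values, order):
    """Average of (value - mean) raised to the given power."""
    avg = sum(values) / len(values)
    return sum((v - avg) ** order for v in values) / len(values)


def skewness(values):
    """Zero for a symmetric list, positive for a long right tail."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Zero for a normal curve, positive for heavier tails."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3


print(skewness([1, 2, 2, 3]))   # symmetric list
print(skewness([1, 2, 2, 7]))   # long right tail, positively skewed
print(skewness([1, 6, 6, 7]))   # long left tail, negatively skewed
print(excess_kurtosis([1, 2, 2, 3]))
```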
37
Positively Skewed and Leptokurtic: Word Count
Mean 393.2
Median 223
Std. Dev 725.24
Skewness 13.62
Kurtosis 321.84
N 1903.00
38
Distribution of word count (N = 1897), top six
removed
Kurtosis 16.40
Skewness 3.49
Mean 368.0
Median 223
Std. Dev 474.04
N 1897.00
39
Degrees of Freedom
  • The number of independent pieces of information
    remaining after estimating one or more parameters
  • Example: List 1, 2, 3, 4; Average 2.5
  • For the average to remain the same, three of the
    numbers can be anything you want; the fourth is fixed
  • New List: 1, 5, 2.5, __; Average 2.5
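Why the last number is fixed follows from the definition of the average: the N numbers must sum to N times the average. A minimal sketch, with a hypothetical helper name `last_value`:

```python
def last_value(free_values, target_average):
    """Value the final entry must take so the list has the target average."""
    n = len(free_values) + 1                 # the free values plus the fixed one
    return n * target_average - sum(free_values)


# Original list: with 1, 2, 3 chosen freely, the fourth entry must be 4.
print(last_value([1, 2, 3], 2.5))    # 4.0

# New list from the slide: 1, 5, 2.5 chosen freely, average still 2.5.
print(last_value([1, 5, 2.5], 2.5))  # 1.5
```

With four numbers and one estimated average, only three entries are free to vary, which is the N-1 that appears in the SD formula.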