Title: 45-733: lecture 8 (chapter 7)
Samples from populations
- There is some population we are interested in
- Families in the US
- Products coming off our assembly line
- Consumers in our product's market segment
- Employees
Samples from populations
- We are interested in some quantitative information (called variables) about these populations
- Income of families in the US
- Defects in products coming off our assembly line
- Perception of consumers of our product
- Productivity of our employees
Samples from populations
- All the information (accessible to statistics) about a quantity in a population is contained in its distribution function
- Real-world distribution functions are complicated things
- In real life, we usually know little or nothing about the distribution functions of the variables we are interested in
Samples from populations
- Because distribution functions are complex, we only try to find out about certain aspects of them (parameters)
- Average income of families in the US
- Rate of defects coming off our production line
- Percentage of customers who view our product favorably
- Average pieces/hour finished by a worker
Samples from populations
- Of course, we do not begin by knowing even these quantities
- One possibility is to measure the whole population
- Allows us to answer any question about the distribution or parameters, using the techniques of chapter 2
- However, this is almost always expensive and often infeasible
Samples from populations
- Instead, we take a sample
- Taking a sample
- We select only a few of the members of the population
- We measure the variables of interest for those members we select
- Examples
- Phone survey
- Take 1 out of each 10,000 units off our production line
Samples from populations
- The whole of statistics is figuring out what we can learn about the population from a sample
- What can we say about the distribution of a variable from the information in a sample?
- What can we say about the parameters we are interested in from our sample?
- How good is the information in our sample about the population?
Samples and statistics
- As a practical matter, we are usually interested in using our sample to say something about a parameter of the distribution we care about
- To get at this parameter, we construct a variable called an estimator or statistic
Sample and estimator
- An estimate is an informed guess at the value of a parameter
- An estimator is an algorithm or rule for turning samples into informed guesses about the value of a parameter
- An estimator is an algorithm for turning samples into estimates
Sample and estimator
- Example
- We are benchmarking our compensation policies for our salesforce
- Therefore, we are interested in how much salespeople who work in similar jobs for similar companies are paid
- Naturally, they are not all paid the same
- There is a distribution of salaries among these salespeople
Sample and estimator
- Example
- We don't need or want to know exactly how much each and every one of these comparable people is paid
- We don't need or want to know the exact distribution of pay for this job
Sample and estimator
- Example
- We do need and want to know some basic facts about pay in this job. For example:
- What is the mean salary?
- What is the median salary?
- What is the standard deviation of salary?
- What is the 25th percentile of salary?
- What is the 75th percentile of salary?
- How is salary related to:
- Experience?
- Typical hours? Travel requirements?
- Job responsibilities? Etc.
Sample and estimator
- Example
- Each of these things can be regarded as a parameter, either of the distribution of salaries or of the joint distribution of salary and other variables
- Let's focus on mean salary
- We take a sample of salaries s1, s2, ..., sn
- How can we get an estimate of E(s) = μs?
Sample and estimator
- Example
- Let's focus on mean salary, E(s) = μs
- There is a TRUE value of μs
- This value is fixed (non-random)
- It is just a number, like 47,432.81
- We wish to know it
- Knowing it exactly would be nice
- If we can't know it exactly, a good guess would be useful
Sample and estimator
- Example
- Let's focus on mean salary
- We take a sample of salaries s1, s2, ..., sn
- S-bar is an estimator
- S-bar tells us what to do with a sample to turn it into a guess at the (population) mean salary
Sample and estimator
- Example
- Let's focus on mean salary
- We take a sample of salaries s1, s2, ..., sn
- S-bar is an estimator
- S-bar is a random variable with a distribution function of its own
- The distribution of s-bar depends on the distribution of the underlying s
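To make this point concrete, here is a minimal simulation sketch in Python. It assumes, purely for illustration (these numbers are not from the lecture), that salaries are normally distributed with mean 60 and standard deviation 15 thousand; repeating the sampling experiment many times traces out the distribution of s-bar.

```python
# Sketch: the sampling distribution of s-bar under an assumed salary distribution.
import random

random.seed(0)
MU, SIGMA, N = 60.0, 15.0, 6      # assumed population mean/sd (thousands), sample size

def draw_sample(n):
    """Draw one sample of n salaries from the assumed population."""
    return [random.gauss(MU, SIGMA) for _ in range(n)]

def s_bar(sample):
    """The estimator: the sample mean."""
    return sum(sample) / len(sample)

# Repeat the sampling experiment many times; each repetition yields one value of s-bar.
s_bars = [s_bar(draw_sample(N)) for _ in range(10_000)]

mean_of_s_bar = sum(s_bars) / len(s_bars)
var_of_s_bar = sum((x - mean_of_s_bar) ** 2 for x in s_bars) / len(s_bars)
print(mean_of_s_bar)   # close to MU = 60: s-bar is centered on the true mean
print(var_of_s_bar)    # close to SIGMA**2 / N = 37.5: its spread depends on the distribution of s
```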
Sample and estimator
- Example
- Let's focus on mean salary
- Suppose our sample is (in thousands) 55, 62, 43, 77, 89, 61
- Then our estimate would be s-bar = (55 + 62 + 43 + 77 + 89 + 61)/6 = 64.5
Sample and estimator
- Example
- Let's focus on mean salary
- Suppose our sample is (in thousands) 45, 52, 33, 67, 79, 51
- Then our estimate would be s-bar = (45 + 52 + 33 + 67 + 79 + 51)/6 = 54.5
Sample and estimator
- Example
- Let's focus on mean salary
- In both cases, the estimator is the sample mean, s-bar = (s1 + s2 + ... + sn)/n
- But in one case, the estimate is 64.5 and in the other example, the estimate is 54.5
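A small sketch of this slide's arithmetic: one estimator (the sample-mean rule), applied to the two samples above, produces the two different estimates.

```python
# Same estimator, different samples, different estimates.

def sample_mean(sample):
    """The estimator: the rule that averages whatever sample it is given."""
    return sum(sample) / len(sample)

sample_1 = [55, 62, 43, 77, 89, 61]   # first sample (in thousands)
sample_2 = [45, 52, 33, 67, 79, 51]   # second sample (in thousands)

print(sample_mean(sample_1))   # 64.5
print(sample_mean(sample_2))   # 54.5
```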
Sample and estimator
- A key distinction: estimator vs. estimate
- An estimate is a guess, based on a sample, at the value of a parameter
- It is a number, not random
- It is different for each sample, and depends on the sample
- An estimator is an algorithm, a rule, a formula for turning a sample into an estimate
- It is a random variable
- Its distribution depends only on the distribution of the underlying variable
- It is exactly the same from sample to sample
Sample and estimator
- Review
- We wish to know about (some quantity) in a population
- The distribution of the quantity = complete knowledge
- A parameter of the distribution = a summary of the info in the distribution
- An estimate is a guess at a parameter based on the information in a sample
- An estimator is a way of turning samples into guesses
All estimators are created equal?
- NOT!
- What makes for a good estimator?
- What makes for a good guess?
- Being exactly right all the time (can't be done)
- Being close to right, making few/small mistakes
- Being right on average
- Improving as the sample size grows
All estimators are created equal?
- There is a parameter we want to know; let's call it θ. It has a true value that we don't know.
- We have an estimator, call it θ1-hat, which has some distribution.
- We have another estimator, call it θ2-hat, which has some (other) distribution
- How can we know which of these two is better than the other?
All estimators are created equal?
- Some examples of estimators for E(s) = μs
- The sample mean
All estimators are created equal?
- Some examples of estimators for E(s) = μs
- The sample mean plus one
All estimators are created equal?
- Some examples of estimators for E(s) = μs
- The first observation
All estimators are created equal?
- Some examples of estimators for E(s) = μs
- Roll a die and use the number of spots
All estimators are created equal?
- Some examples of estimators for E(s) = μs
- Seven
All estimators are created equal?
- Some examples of estimators for E(s) = μs
- It should be clear that the sample mean is the best of these estimators
- We want to develop objective criteria for evaluating estimators which allow us to conclude that, for example, the sample mean is the best of these estimators
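A minimal simulation sketch comparing these five rules, again assuming an illustrative normal population with true mean μs = 60 and standard deviation 15 (in thousands). It previews the criteria developed below by showing where each estimator lands on average and how far it tends to be from the truth.

```python
# Sketch: comparing the five example estimators of the population mean.
# Assumed population: normal with true mean 60 and sd 15 (illustrative only).
import random

random.seed(0)
TRUE_MEAN, SD, N, REPS = 60.0, 15.0, 6, 10_000

estimators = {
    "sample mean":          lambda s: sum(s) / len(s),
    "sample mean plus one": lambda s: sum(s) / len(s) + 1,
    "first observation":    lambda s: s[0],
    "die roll":             lambda s: random.randint(1, 6),
    "seven":                lambda s: 7.0,
}

# For each estimator, average its value and its squared error over many samples.
for name, estimator in estimators.items():
    values = []
    for _ in range(REPS):
        sample = [random.gauss(TRUE_MEAN, SD) for _ in range(N)]
        values.append(estimator(sample))
    avg = sum(values) / REPS
    mse = sum((v - TRUE_MEAN) ** 2 for v in values) / REPS
    print(f"{name:22s} average value = {avg:6.2f}   mean squared error = {mse:8.2f}")
```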
All estimators are created equal?
- Consider the distribution of the sample mean
All estimators are created equal?
- Compared to the distribution of the second estimator, μs,2-hat
All estimators are created equal?
- Why do we like the distribution of the sample mean better?
- It is centered on the true value, μs
- The estimator (the random variable) is more often close to the truth, μs
All estimators are created equal?
- Consider the distribution of the sample mean
All estimators are created equal?
- Compare to the distribution of the first observation
All estimators are created equal?
- Why do we like the distribution of the sample mean better?
- Now, both are centered on the true value, μs
- The sample mean is more often close to the truth, μs
- Now, because it has smaller variance
All estimators are created equal?
- Consider the distribution of the sample mean
All estimators are created equal?
- Compare to the distribution of "seven"
All estimators are created equal?
- Why do we like the distribution of the sample mean better?
- The sample mean is centered on the true value, μs, no matter what the true value is
- The estimator "seven" is only centered on the true value if the true value μs happens to be 7
- Similarly, the sample mean is close to the true value more often unless the true value is very close to seven
All estimators are created equal?
- Recall
- In general, we are trying to estimate a parameter whose value we do not know, θ
- We have a proposed estimator, θ1-hat
- We have another proposed estimator, θ2-hat
- We want to know which is better
- So, we need some criteria to use to compare estimators
All estimators are created equal?
- The simplest criterion
- An estimator is good if it is always right
- But a parameter is just a fixed number, like 62.
- An estimator is a random variable, so it can take on many values
- So, practically no estimator will be good by this criterion.
- We must lower our standards!
All estimators are created equal?
- Bias and unbiasedness
- Since estimators are random variables, we can think about their expectations
- We are going to say that an estimator θ-hat of θ is unbiased if E(θ-hat) = θ
All estimators are created equal?
- Bias and unbiasedness
- An estimator is unbiased if it is (always) right on average
- An unbiased estimator is not systematically wrong
All estimators are created equal?
- Bias and unbiasedness
- The bias of an estimator is defined as Bias(θ-hat) = E(θ-hat) - θ
- Obviously, an unbiased estimator has a bias equal to zero
All estimators are created equal?
- Bias and unbiasedness
- The sample mean is unbiased
- The sample mean plus one is biased
- The sample mean plus one has a bias of 1
- This is why we like the sample mean better than the sample mean plus one
- Sample mean is better than sample mean plus one on the biasedness criterion
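A minimal simulation sketch of the bias calculation, again assuming an illustrative normal population with true mean 60: averaging each estimator over many samples approximates its expectation, so its bias is roughly that average minus the true mean.

```python
# Sketch: estimating the bias of "sample mean" and "sample mean plus one".
import random

random.seed(0)
TRUE_MEAN, SD, N, REPS = 60.0, 15.0, 6, 100_000

sum_mean, sum_mean_plus_one = 0.0, 0.0
for _ in range(REPS):
    sample = [random.gauss(TRUE_MEAN, SD) for _ in range(N)]
    s_bar = sum(sample) / N
    sum_mean += s_bar
    sum_mean_plus_one += s_bar + 1

bias_mean = sum_mean / REPS - TRUE_MEAN
bias_mean_plus_one = sum_mean_plus_one / REPS - TRUE_MEAN
print(round(bias_mean, 2))            # approximately 0: the sample mean is unbiased
print(round(bias_mean_plus_one, 2))   # approximately 1: the "plus one" rule is biased
```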
All estimators are created equal?
- Some unbiased estimators
- The sample mean for the population mean
- The sample variance for the population variance
- The sample proportion for the population proportion
All estimators are created equal?
- Some biased estimators
- The sample standard deviation for the population standard deviation
- The sample median for the population median
- Sample percentiles for population percentiles
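A simulation sketch of the contrast between the last two slides, assuming an illustrative normal population: the sample variance computed with the usual n-1 denominator averages out to the population variance, while its square root (the sample standard deviation) averages out to a bit less than the population standard deviation.

```python
# Sketch: sample variance (n-1 denominator) is unbiased; sample sd is not.
import random

random.seed(0)
SIGMA, N, REPS = 15.0, 6, 100_000     # illustrative population sd and sample size

sum_var, sum_sd = 0.0, 0.0
for _ in range(REPS):
    sample = [random.gauss(60.0, SIGMA) for _ in range(N)]
    mean = sum(sample) / N
    s2 = sum((x - mean) ** 2 for x in sample) / (N - 1)   # sample variance
    sum_var += s2
    sum_sd += s2 ** 0.5                                    # sample standard deviation

print(round(sum_var / REPS, 1))   # close to SIGMA**2 = 225: unbiased
print(round(sum_sd / REPS, 1))    # noticeably below SIGMA = 15: biased downward
```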
All estimators are created equal?
- Variance (efficiency)
- Suppose we are comparing two unbiased estimators, θ1-hat and θ2-hat
- We say that θ1-hat is more efficient than θ2-hat if Var(θ1-hat) < Var(θ2-hat)
All estimators are created equal?
- Variance (efficiency)
- We like the sample mean better than the first observation because its variance is lower
All estimators are created equal?
- Variance (efficiency)
- When we are talking about a group of unbiased estimators, the best estimator is the one with the least variance
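A simulation sketch of the efficiency comparison, under the same illustrative normal population: both the sample mean and the first observation are unbiased for μs, but the sample mean's variance is about σ²/n while the first observation's is the full σ².

```python
# Sketch: the sample mean is more efficient than the first observation.
import random

random.seed(0)
MU, SIGMA, N, REPS = 60.0, 15.0, 6, 100_000

means, firsts = [], []
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    means.append(sum(sample) / N)   # estimator 1: the sample mean
    firsts.append(sample[0])        # estimator 2: the first observation

def variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

print(round(variance(means), 1))    # about SIGMA**2 / N = 37.5
print(round(variance(firsts), 1))   # about SIGMA**2 = 225
```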
All estimators are created equal?
- Mean squared error
- Consider these two estimators
All estimators are created equal?
- Mean squared error
- We might like θ1-hat more than θ2-hat even though θ1-hat is biased and θ2-hat is not
- We might like θ1-hat better because it is near the true value of the parameter more often, even though it is biased.
All estimators are created equal?
- Mean squared error
- To formalize this, we develop the mean squared error: MSE(θ-hat) = E[(θ-hat - θ)^2]
All estimators are created equal?
- Mean squared error
- The mean squared error is just the average squared mistake that the estimator makes
- So, even though θ1-hat is biased and θ2-hat is not, we might like θ1-hat better since MSE(θ1-hat) < MSE(θ2-hat)
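A simulation sketch of this trade-off under the illustrative normal population. Here θ1-hat is taken, purely for illustration, to be the sample mean plus a small constant 0.5 (biased, low variance), and θ2-hat is the first observation (unbiased, high variance); the biased rule still wins on mean squared error because its variance is so much smaller.

```python
# Sketch: a biased estimator can have a smaller MSE than an unbiased one.
import random

random.seed(0)
MU, SIGMA, N, REPS = 60.0, 15.0, 6, 100_000

def mse(values, truth):
    return sum((v - truth) ** 2 for v in values) / len(values)

theta1, theta2 = [], []
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    theta1.append(sum(sample) / N + 0.5)   # biased (bias = 0.5), low variance
    theta2.append(sample[0])               # unbiased, high variance

print(round(mse(theta1, MU), 1))   # about SIGMA**2 / N + 0.25 = 37.75
print(round(mse(theta2, MU), 1))   # about SIGMA**2 = 225
```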
All estimators are created equal?
- Mean squared error and bias
- There is a relationship between mean squared error and bias: MSE(θ-hat) = Var(θ-hat) + [Bias(θ-hat)]^2
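A short derivation sketch of that relationship, writing m = E(θ-hat) for the estimator's expectation:

```latex
\begin{aligned}
\mathrm{MSE}(\hat\theta)
  &= E\big[(\hat\theta - \theta)^2\big] \\
  &= E\big[(\hat\theta - m + m - \theta)^2\big] \\
  &= E\big[(\hat\theta - m)^2\big] + 2(m-\theta)\,E\big[\hat\theta - m\big] + (m - \theta)^2 \\
  &= \operatorname{Var}(\hat\theta) + \big[\mathrm{Bias}(\hat\theta)\big]^2
     \qquad\text{since } E\big[\hat\theta - m\big] = 0 .
\end{aligned}
```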