STATISTICS FOR MANAGERS

Learn more at: https://www.econ.upf.edu

1
STATISTICS FOR MANAGERS
  • LECTURE 3
  • LOOKING AT DATA AND MAKING INFERENCES

2
1. LOOKING AT DATA
  • Central part of statistics: describing/summarizing data
  • Take into account that data come in different
    types, for example:
  • Sales
  • Security rating
  • Sector

3
1.1 TYPES OF DATA
  • Qualitative/categorical:
  • Attribute (nominal) data
  • Ranked (ordinal) data
  • Quantitative/numerical
  • Different types of data require different
    treatment
  • One can use:
  • Graphical summaries
  • Numerical summaries

4
1.2 QUALITATIVE DATA
  • Graphical summaries:
  • Pie chart
  • Bar chart
  • Ordered bar chart
  • Numerical summaries:
  • Frequency tables
  • Percentage tables
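
As a minimal sketch of the two numerical summaries above, the snippet below builds a frequency table and a percentage table for categorical data. The sector labels and the function names are made up for illustration and are not from the lecture.

```python
from collections import Counter


def frequency_table(values):
    """Return {category: count}, most frequent category first."""
    return dict(Counter(values).most_common())


def percentage_table(values):
    """Return {category: percentage of all observations}."""
    total = len(values)
    return {cat: 100.0 * count / total
            for cat, count in frequency_table(values).items()}


# Hypothetical sector labels for eight companies (illustrative data only)
sectors = ["Tech", "Energy", "Tech", "Retail", "Tech", "Energy", "Retail", "Tech"]
freq = frequency_table(sectors)
pct = percentage_table(sectors)
```

Sorting by count is also the ordering an ordered bar chart would use.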

5
1.3 QUANTITATIVE DATA
  • Graphical summaries:
  • Run chart. Example: stock prices
  • Histogram. Example: tick data
  • Box plot
  • Numerical summaries:
  • Arithmetic mean
  • Median
  • Standard deviation
  • Quartiles

6
1.3.1 RUN CHART
  • For data collected over time (time series)
  • X-axis: date or index of the data point
  • Y-axis: numerical value of the data point
  • Things to look for:
  • Trends
  • Seasonality
  • Cycles
  • Outliers

7
1.3.1 RUN CHART (cont.)

8
1.3.1 RUN CHART (cont.)

9
1.3.2 HISTOGRAM
  • Determine the range of data
  • Decompose into bins of equal width
  • Count how many data points fall within each bin
  • Construct a bar chart based on these counts
  • Only problem: one has to choose the bin width
  • Allows one to judge:
  • Center/location
  • Spread/variation
  • Symmetry
  • Outliers
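
The construction steps above can be sketched in a few lines. This is only an illustration: the data, the number of bins, and the function name are assumptions, and a plotting library would normally draw the bars.

```python
def histogram_counts(data, n_bins):
    """Split the range of the data into n_bins equal-width bins and count points."""
    lo, hi = min(data), max(data)
    if lo == hi:
        raise ValueError("all values are equal, so the range has zero width")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in data:
        # The maximum sits on the right edge, so it is assigned to the last bin
        i = min(int((x - lo) / width), n_bins - 1)
        counts[i] += 1
    edges = [lo + k * width for k in range(n_bins + 1)]
    return edges, counts


# Illustrative data with one value far from the rest
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]
edges, counts = histogram_counts(data, 4)
```

The counts are the bar heights; changing `n_bins` changes the bin width and can change the visual impression considerably.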

10
1.3.2 HISTOGRAM (cont.)

11
1.3.3 BOX PLOT
  • Packs a lot of information into a single plot
  • A box extends from Q1 to Q3
  • A line inside the box indicates the median
  • Whiskers extend down and up to the most extreme
    observations that are not outliers
  • Outliers are denoted by asterisks
  • Can compare data sets by lining up their box
    plots

12
1.3.3 BOX PLOT (cont.)

13
1.3.4 LOCATION
  • Mean: sum up all the data and divide by the
    number of data points
  • Median:
  • Sort all the data from smallest to largest
  • Take the middle one (for an odd number of data
    points)
  • Take the average of the middle two (for an even
    number of data points)
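
The two recipes above translate almost word for word into code. The sketch below is for illustration only; in practice a spreadsheet or a statistics library does the same job.

```python
def mean(data):
    """Sum up all the data and divide by the number of points."""
    return sum(data) / len(data)


def median(data):
    """Middle value of the sorted data (average of the middle two if n is even)."""
    s = sorted(data)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


odd_example = median([9, 2, 4])       # sorted: 2, 4, 9
even_example = median([1, 2, 3, 10])  # middle two: 2 and 3
```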

14
1.3.4 LOCATION (cont.)
  • Mean versus median:
  • The median is more robust than the mean. This
    means that it is less affected by extreme
    observations
  • As a function of the symmetry of the data:
  • Skewed to the left: mean < median
  • Symmetric: mean approximately equal to median
  • Skewed to the right: mean > median
  • For skewed data the median is a more typical
    observation
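
A small numerical illustration of robustness, using made-up income figures (in thousands) and Python's standard `statistics` module: adding one extreme observation moves the mean a lot and the median only slightly.

```python
import statistics

# Hypothetical incomes in thousands; the last list adds one extreme value
base = [20, 22, 25, 27, 30]
with_extreme = base + [200]

mean_shift = statistics.mean(with_extreme) - statistics.mean(base)
median_shift = statistics.median(with_extreme) - statistics.median(base)
```

With the extreme value included the data are skewed to the right, and the mean ends up well above the median.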

15
1.3.5 SPREAD
  • Standard deviation:
  • Measures a typical deviation from the mean
  • Do not bother to compute it by hand. Let Excel or
    any other program do it for you.
  • Inter-quartile range (IQR):
  • Q1 is the median of the bottom half of the data
  • Q3 is the median of the top half of the data
  • IQR = Q3 - Q1
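
A minimal sketch of both measures of spread. The quartiles follow the "median of each half" definition above; for an odd number of points this sketch leaves the middle value out of both halves, which is one common convention and an assumption here (software packages use several slightly different quartile rules).

```python
import statistics


def quartiles(data):
    """Q1 and Q3 as the medians of the bottom and top halves (needs >= 2 points)."""
    s = sorted(data)
    half = len(s) // 2
    lower = s[:half]
    upper = s[-half:]  # for odd n the middle value is in neither half
    return statistics.median(lower), statistics.median(upper)


data = [1, 2, 3, 4, 5, 6, 7, 8]
q1, q3 = quartiles(data)
iqr = q3 - q1
sd = statistics.stdev(data)  # sample standard deviation (divides by n - 1)
```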

16
1.3.6 OUTLIER DETECTION
  • Graphically:
  • Use a histogram
  • Look for points away from the rest
  • Numerically:
  • Points more than 3 standard deviations away from
    the mean
  • Points more than 1.5 × IQR below Q1 or above Q3
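
Both numerical rules can be sketched as follows. The data are invented, and the quartiles use the "median of each half" convention (middle value excluded for odd n), which is an assumption of this sketch.

```python
import statistics


def outliers_3sd(data):
    """Points more than 3 standard deviations away from the mean."""
    m = statistics.mean(data)
    sd = statistics.stdev(data)
    return [x for x in data if abs(x - m) > 3 * sd]


def outliers_iqr(data):
    """Points more than 1.5 * IQR below Q1 or above Q3."""
    s = sorted(data)
    half = len(s) // 2
    q1 = statistics.median(s[:half])
    q3 = statistics.median(s[-half:])
    iqr = q3 - q1
    return [x for x in data if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]


# Illustrative data: 40 is clearly away from the rest
data = [10, 11, 12, 12, 13, 13, 14, 15, 40]
by_sd = outliers_3sd(data)
by_iqr = outliers_iqr(data)
```

In this small sample only the IQR rule flags the value 40. The extreme point inflates the standard deviation itself, so the 3 standard deviation rule can miss outliers when there are few observations.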

17
2. SAMPLING
  • All statistical information is based on data
  • The process of collecting data is called sampling
  • It is important to do it right
  • Not everybody seems to understand this importance

18
2. GENERAL SITUATION
  • We study a population
  • Can be a population in the strict sense but it
    could also be an experiment
  • We are interested in certain characteristics of
    the population (parameter)
  • Want to learn as much as possible about the
    parameter

19
2. EXAMPLES
  • Population of Beijing
  • What is the average income?
  • What percentage speak Cantonese?
  • What percentage have Internet access?
  • What is the average price per square meter?
    (300,000 euros buy only 174 square meters.)
  • What percentage of people have a DVD player?

20
2. BASIC PROBLEM
  • Most populations are very large, or even infinite
  • Hence it is typically impossible to determine a
    parameter exactly (or at least infeasible from a
    cost perspective)
  • But it is possible to learn something about a
    parameter
  • By collecting a sample from the population we can
    obtain information
  • But the quality of the information can only be as
    good as the quality of the sample

21
2. GOOD SAMPLE, IS THIS HARD?
  • The sample has to be representative of the
    population
  • In collecting data, we must not favor (or
    disfavor) any particular segment of the
    population
  • If we do, we get biased samples
  • Biased samples yield biased estimates
  • Example of biased samples: samples collected over
    the Internet

22
2. NO VOLUNTEERS PLEASE
  • A sample into which people have entered by their
    own choice is called a voluntary response (or
    self-selected) sample.
  • This typically happens when polls are posted on
    the Internet, on TV, etc.
  • The scheme favors people with strong opinions.
  • The resulting sample is rarely representative of
    the population.
  • As so often, you get what you pay for! (although
    sometimes they do pay for it!)

23
2. HOW TO DO IT RIGHT
  • Analogy:
  • Have one ball per member of the population
  • Put all the balls in a big urn
  • Mix well
  • Take out n balls
  • The result is called a simple random sample
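
The urn analogy corresponds to drawing without replacement, with every member equally likely to be picked. A minimal sketch using Python's standard `random` module; the population of ID numbers and the seed are assumptions for illustration.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct members, each subset of size n being equally likely."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(population, n)


# Hypothetical population: ID numbers 1..1000 (one "ball" per member)
population = list(range(1, 1001))
sample = simple_random_sample(population, 10, seed=42)
```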

24
2. DO IT RIGHT...
  • There are other ways to get representative
    samples
  • Stratified sampling
  • Systematic sampling
  • Cluster sampling (multistage)
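
For illustration, two of these schemes can be sketched as below. The function names, the equal sampling fraction per stratum, and the example populations are assumptions of this sketch, not details from the lecture; cluster sampling is not shown.

```python
import random


def systematic_sample(population, n, rng):
    """Pick a random start, then take every k-th member (k = population size // n)."""
    step = len(population) // n
    start = rng.randrange(step)
    return [population[start + i * step] for i in range(n)]


def stratified_sample(strata, fraction, rng):
    """Draw a simple random sample of the same fraction within each stratum."""
    sample = []
    for members in strata.values():
        k = round(len(members) * fraction)
        sample.extend(rng.sample(members, k))
    return sample


rng = random.Random(1)  # fixed seed so the draws are repeatable
sys_sample = systematic_sample(list(range(100)), 10, rng)

# Hypothetical strata: 60 members in group A, 40 in group B
strata = {"A": list(range(60)), "B": list(range(100, 140))}
strat_sample = stratified_sample(strata, 0.1, rng)
```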

25
2. ... BUT AN ESTIMATE IS JUST THAT
  • We can estimate a parameter from the sample (a
    mean or a proportion)
  • ... but an estimate is not equal to the
    parameter!
  • ... because a sample is not equal to the
    population
  • We must be aware of sampling error
  • Many people are not!
  • They sell us estimates as if they were
    parameters. Shame on them. We will do it right.
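
Sampling error can be made visible with a small simulation. Everything here is invented for illustration: a population in which exactly 30% have some attribute, samples of size 100, and 200 repetitions.

```python
import random


def sample_proportion(population, n, rng):
    """Share of 1s in a simple random sample of size n."""
    sample = rng.sample(population, n)
    return sum(sample) / n


# Hypothetical population: 10,000 people, exactly 30% have the attribute (coded 1)
population = [1] * 3000 + [0] * 7000
true_p = sum(population) / len(population)

rng = random.Random(0)  # fixed seed so the run is repeatable
estimates = [sample_proportion(population, 100, rng) for _ in range(200)]
average_estimate = sum(estimates) / len(estimates)
```

The individual estimates scatter around the true value of 0.3; any single one of them is an estimate, not the parameter.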

26
3. BASIC ESTIMATION
  • General estimation:
  • We are interested in a population parameter
  • We collect a random sample
  • In a first step we estimate the parameter. This
    is usually straightforward.
  • In a second step, we deal with the sampling
    error.
  • This requires more work but it is worthwhile.

27
3.1. ESTIMATING A PROPORTION
  • We are interested in a population proportion p
  • We collect a random sample of size n
  • We compute the sample proportion $\hat{p}$
  • This is a natural estimator for p,
  • but due to sampling error it is not equal to the
    true parameter p
  • Goal: quantify the sampling uncertainty contained
    in the estimator of p
  • Intuition: the larger n, the smaller the
    uncertainty.

28
3.1. ESTIMATING A PROPORTION
  • From probability theory we know that the central
    limit theorem applies, under some assumptions
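  • For n large, with probability 95% the
    population proportion p will lie between
    $\hat{p} - 1.96\sqrt{\hat{p}(1-\hat{p})/n}$ and
    $\hat{p} + 1.96\sqrt{\hat{p}(1-\hat{p})/n}$,
    where $\hat{p}$ is the sample proportion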
  • For the interval to be trusted we require a
    sufficiently large sample; a common rule of thumb
    is $n\hat{p} \ge 10$ and $n(1-\hat{p}) \ge 10$

29
3.1. ESTIMATING A PROPORTION
  • A confidence interval has the following general
    form:
  • CI = estimator ± constant × std error (SE)
  • = estimator ± margin of error (ME)
  • For a proportion:
  • SE = $\sqrt{\hat{p}(1-\hat{p})/n}$
  • The SE does not depend on the confidence level,
    but the ME does because of the constant, which is
    often abbreviated as z
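
As a worked sketch of the general form "estimator ± z × SE" for a proportion, the function below uses the standard large-sample standard error sqrt(p_hat * (1 - p_hat) / n). The survey counts (120 "yes" answers out of 400) and the function name are made up for illustration.

```python
import math


def proportion_ci(successes, n, z=1.96):
    """Large-sample confidence interval for a population proportion."""
    p_hat = successes / n                    # sample proportion (the estimator)
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error
    me = z * se                              # margin of error
    return p_hat - me, p_hat + me


# Hypothetical survey: 120 of 400 respondents say yes, 95% level (z = 1.96)
lower, upper = proportion_ci(120, 400)
```

Here the estimate is 0.30 and the interval runs from roughly 0.255 to 0.345.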

30
3.1. ESTIMATING A PROPORTION
  • How is the ME affected by its various inputs?
  • ME = $z\sqrt{\hat{p}(1-\hat{p})/n}$
  • As the confidence level increases, the ME goes up
  • As the estimator moves towards 0.5, the ME goes up
  • As n increases, the ME goes down
  • We control the confidence level and n, but not the
    estimator of p
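
The three effects can be checked numerically. This sketch assumes the standard margin of error for a proportion, z * sqrt(p_hat * (1 - p_hat) / n); the input values are arbitrary.

```python
import math


def margin_of_error(p_hat, n, z=1.96):
    """Margin of error of the large-sample CI for a proportion."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


baseline = margin_of_error(0.5, 100)                   # z = 1.96, p_hat = 0.5, n = 100
higher_confidence = margin_of_error(0.5, 100, z=2.58)  # larger constant z
away_from_half = margin_of_error(0.2, 100)             # estimator further from 0.5
larger_sample = margin_of_error(0.5, 400)              # four times the sample size
```

Quadrupling n only halves the margin of error, because n enters through a square root.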

31
3.1. ESTIMATING A PROPORTION
  • Want a CI with a specified level and a specified
    ME? How large a sample size n is needed?
  • Use the formula for the ME and solve for n
  • ME = $z\sqrt{\hat{p}(1-\hat{p})/n}$
  • Solution: $n = z^2\,\hat{p}(1-\hat{p})/\mathrm{ME}^2$
  • Catch-22: we have not collected the sample yet,
    and therefore the estimate for p is not available
    yet. Solutions:
  • 1. Worst-case scenario: estimator = 0.5
  • 2. Use a guess based on previous information
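
Solving the margin-of-error formula for n gives n = z^2 * p * (1 - p) / ME^2, rounded up to the next whole number. The sketch below applies it with both workarounds; the target margin of 3 percentage points and the prior guess of 0.2 are invented for illustration.

```python
import math


def sample_size_for_proportion(me, p_guess=0.5, z=1.96):
    """Smallest n whose margin of error is at most `me`, given a guess for p."""
    return math.ceil(z ** 2 * p_guess * (1 - p_guess) / me ** 2)


worst_case = sample_size_for_proportion(0.03)                # p_guess = 0.5
with_prior = sample_size_for_proportion(0.03, p_guess=0.2)   # hypothetical earlier study
```

The worst-case choice of 0.5 can only overstate the required sample size, which is why it is the safe default when no earlier information exists.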

32
WHAT CONFIDENCE LEVEL?
  • You may want a confidence level other than 95%.
  • Most common: 90%, 95% and 99%.
  • The formula for the CI stays the same
  • You only change the constant 1.96
  • A higher confidence level gives a wider interval
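
The constant for any confidence level can be looked up from the standard normal distribution. A minimal sketch using `NormalDist` from Python's standard `statistics` module (available from Python 3.8); the function name is an assumption.

```python
from statistics import NormalDist


def z_constant(confidence):
    """Two-sided constant z such that P(-z < Z < z) = confidence for standard normal Z."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


z_values = {level: z_constant(level) for level in (0.90, 0.95, 0.99)}
```

The exact values are about 1.645, 1.960 and 2.576; tables usually show them rounded or truncated to two decimals.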

Conf. level   90%    95%    99%
Constant z    1.64   1.96   2.58

33
3.2. ESTIMATING A MEAN
  • We are interested in a population mean and use as
    estimator the sample mean $\bar{x}$.
  • CI = estimator ± constant × std error (SE)
  • = estimator ± margin of error (ME)
  • For a mean:
  • SE = $s/\sqrt{n}$, where s is the sample standard
    deviation
  • CI = $\bar{x} \pm z\,s/\sqrt{n}$
  • Rule of thumb: need more than 50 observations to
    trust this interval
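
A minimal sketch of the same recipe for a mean, assuming the standard large-sample interval "sample mean ± z × s / sqrt(n)" with s the sample standard deviation. The 70 observations are artificial and only serve to have more than 50 data points.

```python
import math
import statistics


def mean_ci(data, z=1.96):
    """Large-sample confidence interval for a population mean."""
    n = len(data)
    x_bar = statistics.mean(data)   # sample mean (the estimator)
    s = statistics.stdev(data)      # sample standard deviation
    me = z * s / math.sqrt(n)       # margin of error
    return x_bar - me, x_bar + me


# Artificial data: the values 100..106, each appearing ten times (n = 70)
data = [100 + (i % 7) for i in range(70)]
lower, upper = mean_ci(data)
```

For these data the sample mean is 103 and the margin of error is about 0.47.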

34
3.2. ESTIMATING A MEAN
  • How is the margin of error (ME) affected by its
    various inputs?
  • ME = $z\,s/\sqrt{n}$
  • As the confidence level increases, the ME goes up
  • As s increases, the ME goes up
  • As n increases, the ME goes down
  • We control n and the confidence level, but not s.

35
3.2. ESTIMATING A MEAN
  • Want a CI with a specified level and a specified
    ME? How large a sample size n is needed?
  • Use the formula for the ME and solve for n
  • ME = $z\,s/\sqrt{n}$
  • Solution: $n = (z\,s/\mathrm{ME})^2$
  • Catch-22: we have not collected the sample yet,
    and therefore the estimate s is not available
    yet. In this case there is no worst-case
    scenario. Use a guess based on previous
    information
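
Solving ME = z * s / sqrt(n) for n gives n = (z * s / ME)^2, rounded up. In the sketch below the guessed standard deviation of 15 and the target margins are invented for illustration.

```python
import math


def sample_size_for_mean(me, s_guess, z=1.96):
    """Smallest n whose margin of error is at most `me`, given a guess for s."""
    return math.ceil((z * s_guess / me) ** 2)


# Hypothetical prior guess: standard deviation of about 15
n_for_me_2 = sample_size_for_mean(2, s_guess=15)
n_for_me_1 = sample_size_for_mean(1, s_guess=15)
```

Halving the target margin of error roughly quadruples the required sample size.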

36
3.3. HYPOTHESIS TESTING
  • If you care about whether a parameter is equal to
    a certain prespecified value, there is an
    alternative to hypothesis testing:
  • Just check whether the prespecified value is
    contained in the confidence interval

37
3.3. HYPOTHESIS TESTING
  • If the prespecified value is contained in the CI:
  • It is one of the (many) plausible values
  • So we can only make a weak positive statement
  • If the prespecified value is not contained in
    the CI:
  • It is not one of the plausible values
  • We can make a strong negative statement
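
The check described above amounts to a single comparison. The sketch below reuses the standard large-sample interval for a proportion; the survey counts and the two prespecified values are invented for illustration.

```python
import math


def proportion_ci(successes, n, z=1.96):
    """Large-sample confidence interval for a population proportion."""
    p_hat = successes / n
    me = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - me, p_hat + me


def is_plausible(value, ci):
    """True if the prespecified value lies inside the confidence interval."""
    lower, upper = ci
    return lower <= value <= upper


# Hypothetical survey: 120 of 400 say yes; 95% CI is roughly (0.255, 0.345)
ci = proportion_ci(120, 400)
claim_30_percent = is_plausible(0.30, ci)  # inside: one of many plausible values
claim_40_percent = is_plausible(0.40, ci)  # outside: not a plausible value
```

Note the asymmetry: 0.30 being inside the interval does not show that p equals 0.30, since values such as 0.26 are inside as well, whereas 0.40 being outside is strong evidence against it.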

38
3.3. CI PERFECT SUBSTITUTE
  • We wonder whether a parameter is equal to a
    prespecified value
  • The technique of hypothesis testing gives a
    yes-or-no answer (at a certain level of
    significance)
  • We can get the same answer from a confidence
    interval (at the corresponding level of
    confidence)
  • ... but in addition we get the range of all
    plausible values! This is valuable info.
  • Moral: a confidence interval tends to be safer
    and more informative than hypothesis testing

39
3.4. CAVEAT
  • Our confidence intervals are simple, yet
    powerful.
  • But you cannot use them blindly!
  • Two conditions to trust them:
  • We need a large sample
  • We need a random sample
  • Data collected over time are usually NOT a
    random sample: today's data point is usually
    related to yesterday's data point
  • Stock returns are an exception to this rule.
  • Small samples and time series are for the pros!

40
3.5. WHAT ABOUT OTHER PARAMETERS
  • We have covered confidence intervals for:
  • Population proportions
  • Population means
  • Both are based on the CLT
  • There are other interesting parameters:
  • Population median
  • Population standard deviation
  • etc.
  • Unfortunately they cannot be handled by the CLT
  • CIs can be constructed, but the corresponding
    techniques are more difficult.