Economics 173 Business Statistics - PowerPoint PPT Presentation

Slides: 33
Provided by: sba461
Learn more at: http://www.econ.uiuc.edu
Transcript and Presenter's Notes


1
Economics 173 Business Statistics
  • Lecture 2
  • Fall, 2001
  • Professor J. Petry
  • http://www.cba.uiuc.edu/jpetry/Econ_173_fa01/

2
Numerical Descriptive Measures
  • Measures of central location: arithmetic mean,
    median, mode, (geometric mean)
  • Measures of variability: range, variance,
    standard deviation, (coefficient of variation)
  • Measures of association: covariance, coefficient
    of correlation

3
Arithmetic mean
Measures of Central Location
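  • This is the most popular and useful measure of
    central location.

Mean = (sum of the measurements) / (number of measurements)
Sample mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$, where $n$ is the sample size
Population mean: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$, where $N$ is the population size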
4
  • Example

The mean of the sample of six measurements 7, 3,
9, -2, 4, 6 is given by
$\bar{x} = \frac{7 + 3 + 9 - 2 + 4 + 6}{6} = \frac{27}{6} = 4.5$
  • Example

Calculate the mean of 212, -46, 52, -14, 66
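
The two mean examples can be checked with a short Python sketch. The variable names are illustrative, and the second list uses the values exactly as they are printed in the transcript.

```python
# Sample of six measurements from the worked example.
sample = [7, 3, 9, -2, 4, 6]
sample_mean = sum(sample) / len(sample)  # sum of measurements / sample size

# Values of the follow-up exercise, as printed in the transcript.
exercise = [212, -46, 52, -14, 66]
exercise_mean = sum(exercise) / len(exercise)

print(sample_mean)    # 4.5
print(exercise_mean)  # 54.0
```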
5
The median
  • The median of a set of measurements is the value
    that falls in the middle when the measurements
    are arranged in order of magnitude.

Odd number of observations
Salaries: 26, 26, 28, 29, 30, 32, 60
First, sort the salaries. Then, locate the value
in the middle: the median is 29.

Even number of observations
A salary of 31 is added: 26, 26, 28, 29, 30, 31, 32, 60
First, sort the salaries. Then, locate the values
in the middle.
There are two middle values, 29 and 30! The median
is their average, 29.5.
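
A minimal sketch of the same calculation, using `statistics.median` from the Python standard library. It sorts the data internally and averages the two middle values when the count is even. The list names are illustrative.

```python
from statistics import median

# Seven salaries as listed on the slide (odd number of observations).
salaries = [26, 26, 28, 29, 30, 32, 60]
median_odd = median(salaries)

# One more salary (31) is added, giving an even number of observations.
salaries_even = salaries + [31]
median_even = median(salaries_even)

print(median_odd)   # 29
print(median_even)  # 29.5
```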
6
The mode
  • The mode of a set of measurements is the value
    that occurs most frequently.
  • A set of data may have one mode (or modal class),
    or two or more modes.

[Figure: the modal class]
7
  • Example
  • The manager of a men's store observes the waist
    sizes (in inches) of trousers sold yesterday: 31,
    34, 36, 33, 28, 34, 30, 34, 32, 40.
  • What is the modal value?
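
One way to answer the question is to count how often each waist size occurs. The sketch below uses `collections.Counter` and `statistics.multimode` (Python 3.8+), which returns every value tied for the highest count, so it also covers data with two or more modes.

```python
from collections import Counter
from statistics import multimode

# Waist sizes (inches) of trousers sold yesterday, from the example.
waist_sizes = [31, 34, 36, 33, 28, 34, 30, 34, 32, 40]

counts = Counter(waist_sizes)      # frequency of each size
modes = multimode(waist_sizes)     # list of all most-frequent values

print(counts[34])  # 3
print(modes)       # [34]
```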

8
Relationship among Mean, Median, and Mode
  • If a distribution is symmetrical, the mean,
    median and mode coincide.
  • If a distribution is not symmetrical, and is skewed
    to the left or to the right, the three measures
    differ.

[Figure: a positively skewed distribution (skewed to the right), with the mode, median, and mean marked]
9
[Figures: a negatively skewed distribution (skewed to the left) and a positively skewed distribution (skewed to the right), each with the mean, median, and mode marked]
10
Measures of variability (Looking beyond the
average)
  • Measures of central location fail to tell the
    whole story about the distribution.
  • A question of interest still remains unanswered:

How typical is the average value of all the
measurements in the data set?
or
How spread out are the measurements about the
average value?
11
Observe two hypothetical data sets
Low variability data set
The average value provides a good representation
of the values in the data set.
High variability data set
The same average value does not provide as good a
representation of the values in the data set as
before.
12
The range
  • The range of a set of measurements is the
    difference between the largest and smallest
    measurements.
  • Its major advantage is the ease with which it can
    be computed.
  • Its major shortcoming is its failure to provide
    information on the dispersion of the values
    between the two end points.

But how do all the measurements spread out?
The range cannot assist in answering this question.
Range = largest measurement - smallest measurement
13
The variance
  • This measure of dispersion reflects the values of
    all the measurements.
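  • The variance of a population of N measurements
    x1, x2, ..., xN having a mean $\mu$ is defined as
    $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
  • The variance of a sample of n measurements x1,
    x2, ..., xn having a mean $\bar{x}$ is defined as
    $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$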

14
Consider two small populations
Population A: 8, 9, 10, 11, 12
Population B: 4, 7, 10, 13, 16
The mean of both populations is 10, but
measurements in B are much more dispersed than
those in A. Thus, a measure of dispersion is needed
that agrees with this observation.
Let us start by calculating the sum of deviations
A: 8-10 = -2, 9-10 = -1, 10-10 = 0, 11-10 = +1, 12-10 = +2
B: 4-10 = -6, 7-10 = -3, 10-10 = 0, 13-10 = +3, 16-10 = +6
The sum of deviations is zero in both
cases; therefore, another measure is needed.
15
The sum of deviations is zero in both
cases; therefore, another measure is needed.
The sum of squared deviations is used in
calculating the variance.
16
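Let us calculate the variance of the two
populations
$\sigma_A^2 = \frac{(8-10)^2 + (9-10)^2 + (10-10)^2 + (11-10)^2 + (12-10)^2}{5} = 2$
$\sigma_B^2 = \frac{(4-10)^2 + (7-10)^2 + (10-10)^2 + (13-10)^2 + (16-10)^2}{5} = 18$
Why is the variance defined as the average
squared deviation? Why not use the sum of squared
deviations as a measure of dispersion instead?
After all, the sum of squared deviations
increases in magnitude when the dispersion of a
data set increases!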
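
The population variances of A (8, 9, 10, 11, 12) and B (4, 7, 10, 13, 16) can be verified with a small sketch. The helper `population_variance` is a hypothetical name written here for illustration; it divides by N, following the population definition.

```python
def population_variance(values):
    """Average squared deviation from the mean (divides by N)."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

pop_a = [8, 9, 10, 11, 12]
pop_b = [4, 7, 10, 13, 16]

# Deviations from the mean always sum to zero, so they cannot measure spread.
dev_sum_a = sum(x - 10 for x in pop_a)
dev_sum_b = sum(x - 10 for x in pop_b)

var_a = population_variance(pop_a)
var_b = population_variance(pop_b)

print(dev_sum_a, dev_sum_b)  # 0 0
print(var_a, var_b)          # 2.0 18.0
```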
17
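Which data set has a larger dispersion?
Data set A: the values 1 and 3, each occurring 5 times (mean 2)
Data set B: the values 1 and 5 (mean 3)
Let us calculate the sum of squared deviations
for both data sets
$\text{Sum}_A = (1-2)^2 + \dots + (1-2)^2 + (3-2)^2 + \dots + (3-2)^2 = 10$ (each term 5 times)
$\text{Sum}_B = (1-3)^2 + (5-3)^2 = 8$
$\text{Sum}_A > \text{Sum}_B$!
However, when calculated on a per-observation
basis (variance), the data set dispersions are
properly ranked: $\sigma_A^2 = 10/10 = 1$ and $\sigma_B^2 = 8/2 = 4$.
Data set B is more dispersed around the mean.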
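
The comparison on this slide can be reproduced in a few lines. The helper name `sum_squared_deviations` is an illustrative choice; the data are the two sets described on the slide (1 and 3 five times each, versus 1 and 5).

```python
def sum_squared_deviations(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)

data_a = [1] * 5 + [3] * 5   # ten observations, mean 2
data_b = [1, 5]              # two observations, mean 3

ss_a = sum_squared_deviations(data_a)
ss_b = sum_squared_deviations(data_b)

# Dividing by the number of observations ranks the dispersions properly.
var_a = ss_a / len(data_a)
var_b = ss_b / len(data_b)

print(ss_a, ss_b)    # 10.0 8.0
print(var_a, var_b)  # 1.0 4.0
```

The raw sum grows with the number of observations, which is why A looks more dispersed until the sum is divided by the count.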
18
  • Example
  • Find the mean and the variance of the following
    sample of measurements (in years).
  • 3.4, 2.5, 4.1, 1.2, 2.8, 3.7
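  • Solution

$\bar{x} = \frac{3.4 + 2.5 + 4.1 + 1.2 + 2.8 + 3.7}{6} = \frac{17.7}{6} = 2.95$ (years)
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{(3.4 - 2.95)^2 + \dots + (3.7 - 2.95)^2}{6 - 1} = 1.075$ (years squared)

A shortcut formula
$s^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right] = \frac{1}{5}\left[3.4^2 + 2.5^2 + \dots + 3.7^2 - \frac{(17.7)^2}{6}\right] = 1.075$ (years squared)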
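
As a rough check, the definitional formula and the shortcut formula can be computed side by side. The variable names below are illustrative; both results should agree with the 1.075 shown on the slide, up to floating-point rounding.

```python
data = [3.4, 2.5, 4.1, 1.2, 2.8, 3.7]  # sample measurements, in years
n = len(data)

x_bar = sum(data) / n

# Definition: sum of squared deviations divided by n - 1.
s2_definition = sum((x - x_bar) ** 2 for x in data) / (n - 1)

# Shortcut: (sum of squares - (sum)^2 / n) divided by n - 1.
s2_shortcut = (sum(x ** 2 for x in data) - sum(data) ** 2 / n) / (n - 1)

print(round(x_bar, 2))          # 2.95
print(round(s2_definition, 3))  # 1.075
print(round(s2_shortcut, 3))    # 1.075
```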
19
  • The standard deviation of a set of measurements
    is the square root of the variance of the
    measurements.
  • Example
  • Rates of return over the past 10 years for two
    mutual funds are shown below. Which one has a
    higher level of risk?
  • Fund A: 8.3, -6.2, 20.9, -2.7, 33.6, 42.9, 24.4,
    5.2, 3.1, 30.05
  • Fund B: 12.1, -2.8, 6.4, 12.2, 27.8, 25.3, 18.2,
    10.7, -1.3, 11.4

20
  • Solution
  • Let's use the Excel printout that is run from the
    Descriptive Statistics sub-menu.

Fund A should be considered riskier because its
standard deviation is larger
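
The Excel printout is not reproduced in this transcript. As a stand-in, the sketch below computes the sample standard deviations with `statistics.stdev`, using the fund returns exactly as they are printed above; it is not the slide's own output.

```python
from statistics import mean, stdev

# Rates of return as printed in the transcript.
fund_a = [8.3, -6.2, 20.9, -2.7, 33.6, 42.9, 24.4, 5.2, 3.1, 30.05]
fund_b = [12.1, -2.8, 6.4, 12.2, 27.8, 25.3, 18.2, 10.7, -1.3, 11.4]

# stdev() is the sample standard deviation (divides by n - 1).
sd_a = stdev(fund_a)
sd_b = stdev(fund_b)

print(f"Fund A: mean={mean(fund_a):.2f}, sd={sd_a:.2f}")
print(f"Fund B: mean={mean(fund_b):.2f}, sd={sd_b:.2f}")
print("Riskier fund:", "A" if sd_a > sd_b else "B")
```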
21
Interpreting Standard Deviation
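  • The standard deviation can be used to compare the
    variability of several distributions and to make
    a statement about the general shape of a
    distribution.
  • The empirical rule: If a sample of measurements
    has a mound-shaped distribution, the interval
    $(\bar{x} - s, \bar{x} + s)$ contains approximately 68% of the measurements,
    $(\bar{x} - 2s, \bar{x} + 2s)$ contains approximately 95% of the measurements,
    $(\bar{x} - 3s, \bar{x} + 3s)$ contains virtually all of the measurements.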

22
  • Example
  • The durations of 30 long-distance telephone calls
    are shown next. Check the empirical rule for
    this set of measurements.
  • Solution
  • First check if the histogram has an approximate
    mound shape.

23
  • Calculate the mean and the standard deviation:
    Mean = 10.26, Standard deviation = 4.29.
  • Calculate the intervals

Interval          Empirical Rule    Actual percentage
5.97, 14.55       68%               70%
1.68, 18.84       95%               96.7%
-2.61, 23.13      100%              100%
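
The three intervals follow directly from the reported mean and standard deviation. The sketch below recomputes them; the call durations themselves are not in the transcript, so the actual percentages cannot be recomputed here.

```python
mean_duration = 10.26  # mean reported on the slide
sd_duration = 4.29     # standard deviation reported on the slide

# Intervals mean +/- k standard deviations for k = 1, 2, 3.
intervals = [
    (round(mean_duration - k * sd_duration, 2),
     round(mean_duration + k * sd_duration, 2))
    for k in (1, 2, 3)
]

for k, (low, high) in zip((1, 2, 3), intervals):
    print(f"mean +/- {k}s: ({low}, {high})")
```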
24
Measures of Association
  • Two numerical measures are presented for
    describing the linear relationship between two
    variables depicted in a scatter diagram.
  • Covariance - is there any pattern to the way two
    variables move together?
  • Correlation coefficient - how strong is the
    linear relationship between two variables?

25
The covariance
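Population covariance: $\mathrm{COV}(X,Y) = \frac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{N}$
Sample covariance: $\mathrm{cov}(X,Y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
$\mu_x$ ($\mu_y$) is the population mean of the variable X
(Y), and $\bar{x}$ ($\bar{y}$) is the corresponding sample mean.
N is the population size. n is the sample size.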
26
  • If the two variables move in the same direction
    (both increase or both decrease), the covariance
    is a large positive number.
  • If the two variables move in two opposite
    directions, (one increases when the other one
    decreases), the covariance is a large negative
    number.
  • If the two variables are unrelated, the
    covariance will be close to zero.

27
The coefficient of correlation
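  • This coefficient answers the question: How strong
    is the association between X and Y?

Population coefficient of correlation: $\rho = \frac{\mathrm{COV}(X,Y)}{\sigma_x \sigma_y}$
Sample coefficient of correlation: $r = \frac{\mathrm{cov}(X,Y)}{s_x s_y}$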

28
$\rho$ or $r$ = +1: strong positive linear relationship, COV(X,Y) > 0
$\rho$ or $r$ = 0: no linear relationship, COV(X,Y) = 0
$\rho$ or $r$ = -1: strong negative linear relationship, COV(X,Y) < 0
29
  • If the two variables are very strongly positively
    related, the coefficient value is close to 1
    (strong positive linear relationship).
  • If the two variables are very strongly negatively
    related, the coefficient value is close to -1
    (strong negative linear relationship).
  • No straight line relationship is indicated by a
    coefficient close to zero.
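
The three cases can be illustrated with a small sketch. The data sets are hypothetical and chosen only to produce coefficients near +1, -1, and 0; the helper functions are illustrative names that use the sample (n - 1) definitions.

```python
from math import sqrt

def sample_covariance(xs, ys):
    """Sample covariance: sum of cross-deviations divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Sample coefficient of correlation r = cov(X,Y) / (s_x * s_y)."""
    s_x = sqrt(sample_covariance(xs, xs))  # cov(X,X) is the variance of X
    s_y = sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (s_x * s_y)

# Hypothetical data, not taken from the slides.
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]    # rises with x
y_down = [10, 8, 6, 4, 2]  # falls as x rises
y_none = [3, 1, 4, 1, 3]   # no linear pattern

r_up = correlation(x, y_up)
r_down = correlation(x, y_down)
r_none = correlation(x, y_none)

print(round(r_up, 3), round(r_down, 3), round(r_none, 3))
```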

30
  • Example
  • Compute the covariance and the coefficient of
    correlation to measure how advertising
    expenditure and sales level are related to one
    another.

31
  • Use the procedure below to obtain the required
    summations

Similarly, $s_y = 8.839$
32
  • Excel printout
  • Interpretation
  • The covariance (10.2679) indicates that
    advertisement expenditure and sales level are
    positively related
  • The coefficient of correlation (.797) indicates
    that there is a strong positive linear
    relationship between advertisement expenditure
    and sales level.
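
The reported figures can be tied together through the relation r = cov(X,Y) / (s_x s_y). The advertising and sales data are not in this transcript, so the sketch below only back-solves the standard deviation of X implied by the three reported numbers. This is an arithmetic consistency check, not the slide's own calculation, and `s_x_implied` is a derived value rather than a quoted one.

```python
cov_xy = 10.2679  # covariance reported in the interpretation
r = 0.797         # coefficient of correlation reported in the interpretation
s_y = 8.839       # standard deviation of Y reported on the previous slide

# Rearranging r = cov / (s_x * s_y) gives s_x = cov / (r * s_y).
s_x_implied = cov_xy / (r * s_y)

# Plugging the implied s_x back in should reproduce r.
r_check = cov_xy / (s_x_implied * s_y)

print(f"implied s_x = {s_x_implied:.4f}")
print(f"r recomputed = {r_check:.3f}")
```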

[Excel output: covariance matrix and correlation matrix]