Title: Handout 1
Summarizing and Describing Data (Graphical)
Pareto Chart
- Consider a company that sells business bags.
Some items generate more revenue than others.
By ranking the items by revenue, the company
learns which items it should emphasize (in terms
of cost management, etc.). A Pareto chart is
useful for this purpose.
Pareto Chart Example
The following data is in our IM Folder. Open the
Excel file Bag sales.
A Procedure to Make a Pareto Chart
- Compute the revenue for each item
- Compute the total revenue
- Sort the data according to the revenue
- Compute the percentage of revenue for each item
- Compute the cumulative percentage of revenue
- Make the Pareto Chart
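The steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the item names, prices, and unit counts are hypothetical (they are not the contents of the Bag sales file), and revenue is assumed to be unit price times units sold. The resulting table holds the numbers a Pareto chart would plot, with bars for revenue and a line for the cumulative percentage.

```python
# Hypothetical (item, unit price, units sold) records; NOT the Bag sales data.
sales = [
    ("Business bag (black)", 120.0, 50),
    ("Business bag (brown)", 110.0, 40),
    ("OA bag", 80.0, 45),
    ("Suitcase", 200.0, 10),
    ("Name card folder", 15.0, 60),
]


def pareto_table(records):
    """Return rows of (item, revenue, percent, cumulative percent),
    ordered from the largest revenue to the smallest."""
    # Step 1: revenue for each item (assumed here to be price * units).
    revenues = [(item, price * units) for item, price, units in records]
    # Step 2: total revenue.
    total = sum(revenue for _, revenue in revenues)
    # Step 3: sort the data by revenue, largest first.
    revenues.sort(key=lambda row: row[1], reverse=True)
    # Steps 4 and 5: percentage and cumulative percentage of revenue.
    rows = []
    cumulative = 0.0
    for item, revenue in revenues:
        percent = 100.0 * revenue / total
        cumulative += percent
        rows.append((item, revenue, percent, cumulative))
    return rows


pareto_rows = pareto_table(sales)
for item, revenue, percent, cumulative in pareto_rows:
    print(f"{item:<22} {revenue:>8.0f} {percent:6.1f}% {cumulative:6.1f}%")
```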
Pareto Chart Example
From the Pareto Chart example, we can learn
- Black business bags, brown business bags, and OA
bags account for more than 70% of total revenue.
- These items require a lot of inventory.
- There is too much reliance on a small number of
items. More marketing effort is needed for
suitcases and name card folders.
Pareto Chart Example 2: Visualizing Revenue by
Clients
- Use Pivot Table
- Pareto Chart
Example (Sales by Clients spreadsheet)
This data is part of the Sales by clients data
stored in our Applied Stat Folder. From this
data, we would like to make (1) a table that
ranks revenue by client, and (2) a Pareto chart.
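A pivot table in a spreadsheet sums the revenue for each client. The same idea can be sketched in Python as below. The client names and amounts are hypothetical, not the Sales by clients data, and the function name `rank_revenue_by_client` is introduced only for this illustration.

```python
from collections import defaultdict

# Hypothetical (client, revenue) transactions; NOT the Sales by clients data.
transactions = [
    ("Client A", 500.0),
    ("Client B", 1200.0),
    ("Client A", 300.0),
    ("Client C", 700.0),
    ("Client B", 400.0),
]


def rank_revenue_by_client(rows):
    """Sum revenue per client, then rank clients from highest to lowest."""
    totals = defaultdict(float)
    for client, revenue in rows:
        totals[client] += revenue  # the "pivot" step: one total per client
    return sorted(totals.items(), key=lambda pair: pair[1], reverse=True)


ranking = rank_revenue_by_client(transactions)
for rank, (client, revenue) in enumerate(ranking, start=1):
    print(rank, client, revenue)
```

The ranked totals can then be turned into percentages and cumulative percentages in the same way as for the bag sales data.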
Revenue Ranking Table (Example of the Use of
Pivot Table)
Pareto Chart Example 2
Histogram and frequency table
- Example
- Visualizing your clients' age range using a
histogram.
Histogram Example
From the histogram, we can learn that
- Clients aged between 35 and 45 are the primary
clients.
- It is important to maintain the satisfaction
of these clients.
- Providing new services for other age ranges
could increase the client base.
Making Histogram and Frequency Table
- Open the data Clients list, which is stored in
our Applied Stat Folder. This is the data for the
histogram shown in the previous slides.
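As a rough sketch of what a frequency table contains, the Python code below counts how many ages fall into each class. The ages, the starting value of 25, and the class width of 10 are assumptions chosen for illustration; they are not taken from the Clients list file.

```python
# Hypothetical client ages; NOT the Clients list data.
ages = [27, 31, 36, 38, 41, 42, 44, 47, 52, 58, 39, 35]


def frequency_table(values, start, width, n_bins):
    """Count the values in n_bins classes of equal width.

    Class i covers start + i*width up to, but not including,
    start + (i+1)*width. Values outside all classes are ignored.
    """
    counts = [0] * n_bins
    for value in values:
        index = (value - start) // width
        if 0 <= index < n_bins:
            counts[index] += 1
    return [
        (start + i * width, start + (i + 1) * width, counts[i])
        for i in range(n_bins)
    ]


age_table = frequency_table(ages, start=25, width=10, n_bins=4)
for low, high, count in age_table:
    # A text histogram: one star per client in the class.
    print(f"[{low}, {high}): {'*' * count} ({count})")
```

A histogram is the bar chart of these counts, with one bar per class.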
Numerical Measure of data summary (I)
- Difference between Population and Sample
- Mean (Average)
- Median
Difference between Population and Sample
- Population
- A population is the complete set of all items in
which an investigator is interested.
Examples of Populations
- Names of all registered voters in the United
States.
- Incomes of all families living in Daytona Beach.
- Grade point averages of all the students in your
university.
- A major objective of statistics is to make an
inference about the population. For example: What
is the average income of all families living in
Daytona Beach?
- Often, collecting the data for the population is
costly or impossible. Therefore, we often collect
data for only a part of the population. Such data
is called a sample.
A Sample
- Sample
- A sample is an observed subset of population
values.
Numerical Measure of Summarizing Data 1-1: Mean
(Average)
- How to compute the mean (average)
- Understanding the mathematical notation of the
mean (average)
- Cautionary notes for the use of the mean
1-2 How to compute the mean
- Sum all the data, then divide the sum by the
number of observations.
- We use the term sample size to mean the number
of observations.
1-3 Computing the mean: an example
- This is sample data on the ages of your
business clients. Compute the mean age of your
clients in this sample.
- Note that this is a typical data format that we
will encounter in this course. It has the
observation id (Client ID), and the value of the
variable of interest (age) for each observation.
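The computation can be sketched in Python as below. The (Client ID, age) pairs are hypothetical values in the data format just described; the sample from the slide is not reproduced here.

```python
# Hypothetical (Client ID, age) observations; the slide's sample is not reproduced.
clients = [(1, 34), (2, 41), (3, 29), (4, 52), (5, 44)]


def sample_mean(values):
    """Sum all the data, then divide by the number of observations."""
    n = len(values)  # the sample size
    if n == 0:
        raise ValueError("the mean of an empty sample is undefined")
    return sum(values) / n


# Keep only the variable of interest (age); the id is not used in the mean.
client_ages = [age for _, age in clients]
mean_age = sample_mean(client_ages)
print(mean_age)
```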
2-1 Understanding the mathematical notation of
the mean
This is one of the most common formats of data
that we deal with. The first column has the
observation id, and the second column has the
value for each observation. (The observation id
is often omitted.) In the previous example,
variable X is the age of the clients. Observation
id 1 means that this is the first customer in your
customer list, and x1 is the age of that customer.
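2-2 Understanding the mathematical notation of
the mean
When a data set is given in this format, the
sample mean of the variable X, denoted by
$\bar{x}$, is given by
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
The notation $\sum_{i=1}^{n} x_i$ is the summation
notation. This is simply the sum from x1 to xn:
$$\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n$$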
2-3 Sample Mean and Population Mean
- Most often we use sample data. For example, if
we want to know the popularity rating of the
current government, we may use data from 10,000
interviews. This is just a part of the whole
voting population.
- Though not often, we may have the data from the
whole population.
2-4 Sample Mean and Population Mean
- Later, it will become convenient to distinguish
the sample mean from the population mean. Thus we
will use different notation for the sample mean
and the population mean.
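2-5 Notations for the sample mean and the
population mean
For the sample mean, we use the following notation,
where lower case n denotes the sample size:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
For the population mean, we use µ, and we use
upper case N to denote the population size (the
number of observations in the population):
$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$$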
3-1 Cautionary note
- Mean (average) is not necessarily the center
of the data
3-2 Example
- The average Japanese household saving in
2005 was ¥17,280,000.
- This figure may make you think, "Well, if I do
not have this much in savings, I am not normal."
- Now, take a look at the histogram of
household saving in the next slide.
The mean may not be the center of the data: an
example
- One may think that the average is the normal
household. However, you can see that a lot of
households have savings much less than the
average. The average saving is very high because
a few households have huge savings.
- In such a case, the median can give you a better
sense of a normal household. The definition of
the median is given in the next slide.
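To see how a few very large values pull the mean upward, consider the small Python sketch below. The savings figures are made up for illustration (in units of ¥10,000) and are not the actual household survey data. The code uses `statistics.mean` and `statistics.median` from the Python standard library.

```python
import statistics

# Hypothetical household savings in units of 10,000 yen; NOT the survey data.
# Most households have modest savings; two have very large savings.
savings = [100, 200, 300, 400, 500, 600, 800, 1000, 3000, 20000]

mean_saving = statistics.mean(savings)
median_saving = statistics.median(savings)

# Count how many households have savings below the mean.
below_mean = sum(1 for s in savings if s < mean_saving)

print("mean:", mean_saving)
print("median:", median_saving)
print("households below the mean:", below_mean, "of", len(savings))
```

In this made-up sample most households fall below the mean, while the median sits among the typical households.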
4-1 Median
- Sort the data in ascending order. Then the
median is the value in the middle (the middle
observation).
When the number of observations is an even number,
there is no middle observation. In such a
case, take the average of the two middle numbers.
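The definition above translates directly into a few lines of Python. This is a minimal sketch; the two samples are hypothetical ages, not the contents of the exercise files.

```python
def median(values):
    """Return the median: sort the data, then take the middle value.

    With an even number of observations there is no middle observation,
    so the average of the two middle values is returned instead.
    """
    if not values:
        raise ValueError("the median of an empty sample is undefined")
    ordered = sorted(values)  # ascending order
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


# Hypothetical samples with an odd and an even number of observations.
odd_sample = [41, 29, 52, 34, 44]
even_sample = [41, 29, 52, 34, 44, 38]

print(median(odd_sample))
print(median(even_sample))
```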
4-2 Median Exercise
- Open the file Computation of median A. This
data contains the ages of a company's clients.
Find the median age of this sample.
- Open the file Computation of median B. This
data contains the revenue of bag sales. Find the
median of this sample.
Japanese Household saving revisited
Corresponding chapters
- This lecture note covers the following topics of
the textbook.
- 1.1 Sampling
- Example 2.6 Pareto Diagram
- 2.4 Arithmetic Mean, Median