Title: DA 812 Quantitative Research Methods II
1. DA 812 Quantitative Research Methods II
- Lecturer: Arthur Dryver, Ph.D.
2. Contact Information
- My name is Arthur Dryver, Ph.D.
- Lecturer at NIDA
- Office: 518, Building 2
- Office phone: 02-727-3084
- E-mail: arthur_at_adryver-consulting.com
- Website: http://as.nida.ac.th/dryver/courses
- Office hours
- Monday and Wednesday, 1-4 PM. Appointment required.
- Hours other than the pre-specified office hours are fine, given prior notice.
3. Some of My Background
- Work experience
- Lecturer at NIDA from October 2003 to present
- Previously I worked for 4 years in consulting
- Scorex (an Experian company), AnaBus and PwC
- Analysis of data from various industries. Experience handling the analysis of data with millions of records and several files involved.
- Education
- The Pennsylvania State University
- Ph.D. in Statistics, 1999
- Dissertation topic: Adaptive Sampling Strategies
- Rice University
- BA in Mathematical Sciences/Statistics, 1993
4. Requirements and Expectations
- Requirements
- I will teach in English.
- Feel free to ask questions.
- Should I speak too fast, please let me know.
- You may remind me to speak more slowly more than once; I know I sometimes speak fast.
- All work must be done in English.
- I cannot read Thai.
- Expectations
- By the end of the course you should have a good understanding of how to do quantitative research.
- What statistics to do when.
5. Getting To Know You
- This is a class of approximately 10 people. As such I would like to get to know you better. In this manner I can make the course more suited to your individual needs.
- Please let me know the following:
- Name
- Area of interest
- Your present work, if working
- Where you are in your research
- Experience with SPSS
- Expectations from this course
- Please write the above on a piece of paper and hand it in.
- In addition to the above, write down how comfortable you are with English and then with Statistics.
6. Why is Statistics Important to You?
- Research often requires the collection of data. Example:
- The new underground: determining the opinion of the general public.
- Happy with the new train, or not.
- Even answering a question as simple as the above requires statistical knowledge, from design to analysis.
- How to collect data.
- Analysis
- The percent that are happy with the new train. Percent is a statistic.
- Perhaps a confidence interval is desired as well, which means more statistical analysis.
- Take 20 minutes to look over the articles passed out and think about what they have in common.
7. I Know Most Non-Statisticians Don't Like Statistics
- Old saying: "Kill two birds with one stone."
- This means accomplish two things with one action.
- Think about this class in relation to your work.
- Think about how it can answer the questions that arise at work, and for your dissertation.
- You will have to read articles for class; try to do it in your area. Get your research started now, if it hasn't been.
- Basically, I am recommending that you kill two birds with one stone.
- If you do so, I believe you will find this class considerably more beneficial, interesting and enjoyable.
8. Another Old Saying
- G.I.G.O.
- Garbage In, Garbage Out
- What does this mean, and what does it mean to us?
- Bad information often leads to bad results/answers/plans of action.
- The decisions we make are the result of the information we have. Poor information leads to poor decisions.
- Data is a major part of the information that goes into statistics. For this reason we will first discuss data collection, or sampling.
- Bad data can lead to misleading statistics, which can lead to misleading beliefs and bad plans of action.
9. What is a Population?
- First, before we discuss sampling: what is a population when used in reference to statistics?
- All people or units of direct interest to the study. Examples:
- A study is designed to determine the percent of females in Bangkok. The population is all people in Bangkok.
- A study is designed to determine the percent of males over 20 years old living in Bangkok that have jobs. The population is all males over 20 years old living in Bangkok.
- A study is designed to determine the average income of working people in Thailand. The population is all people working in Thailand.
10. What is a Sample?
- A sample is a smaller group selected for the study.
- Examples of a sample:
- A study is designed to determine the percent of females in Bangkok. The population is all people in Bangkok. The sample might be 1000 people in Bangkok.
- A study is designed to determine the percent of males over 20 years old living in Bangkok that have jobs. The population is all males over 20 years old living in Bangkok. The sample might be 200 males over 20 years old living in Bangkok.
- A study is designed to determine the average income of working people in Thailand. The population is all people working in Thailand. The sample might be 5000 people living in Bangkok.
- This would be a poor sample, since your results would represent Bangkok, not Thailand, and all of Thailand is of interest in the study.
- Remember G.I.G.O.
- It is always desirable to obtain a sample that represents the population of interest.
11. Why Do We Sample?
- A sample of the entire population is called a census. A census would be the most reliable.
- Sampling is necessary for many reasons.
- Often it is not feasible to survey everyone. Most populations are too large to have a census. Not feasible due to:
- Financial: not enough money to employ enough people for the survey.
- Time: there is a time constraint and it would take too long.
- Often people refuse to be surveyed, making it impossible to include them in the study (a type of non-response, to be discussed later).
- The main reason tends to be financial.
- "The larger the sample, the better" tends to be true, though.
12. Simple Random Sampling
- A simple random sample: every individual in the population being studied has an equal probability of selection (more precisely, every possible sample of size n is equally likely to be chosen). Example:
- A professor wishes to learn about his class; the class is made up of six students. Thus the six students make up the population. He decides to take a simple random sample of 1 student. He numbers the students from 1 to 6. He rolls a six-sided die to select a student. This is an example of a simple random sample of size one.
- A population will be numbered from 1 to N (population size). A random number generator can then be used to select a sample of size n.
- Lower case n often denotes sample size, while upper case N denotes population size.
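The numbering-and-random-selection idea above can be sketched in a few lines of Python. This is only an illustration: the function name `simple_random_sample`, the seed, and the second population size (500) are assumptions made for the example, not part of the course material.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Draw unit numbers from 1..N without replacement, each subset equally likely."""
    rng = random.Random(seed)  # an optional seed makes the draw repeatable
    units = range(1, population_size + 1)  # units are numbered 1 to N
    return sorted(rng.sample(units, sample_size))

# The professor's class: N = 6 students, n = 1 (equivalent to rolling a die).
one_student = simple_random_sample(population_size=6, sample_size=1)

# A larger, purely illustrative draw: n = 10 units out of N = 500.
ten_units = simple_random_sample(population_size=500, sample_size=10, seed=812)
print(one_student, ten_units)
```

Sampling is done without replacement, so no unit can be selected twice.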
13. Obtaining Data
- Often it is difficult/costly to obtain data.
- Often surveys are taken on non-representative samples.
- Many researchers use convenience sampling: a sample taken from what is convenient. Example:
- A survey handed out to someone's friends and used as data to generalize to all of Bangkok.
- Think about G.I.G.O. before sampling.
- This course is not on sampling, so I will not go into more detail.
- Should you have further questions on sampling, you may schedule an appointment with me.
14. Descriptive Statistics
- Although for complicated multivariate statistics it is often necessary to use SPSS, SAS or S-Plus, for descriptive statistics and many basic statistics Excel can be very useful.
- I want to cover this with Excel, as many people are familiar with Excel and will often receive data in Excel, etc.
- Excel can be the first step for looking at the data when you want an answer fast.
- Imagine a survey of 30 students where the responses are 1 to 5, representing strongly disagree to strongly agree with the statement made.
15. Descriptive Statistics With Excel - The Data
16. Descriptive Statistics With Excel - Add-Ins
If you do not have Data Analysis under Tools, you will have to click Add-Ins... to add it.
17. Descriptive Statistics With Excel - Analysis ToolPak
Check Analysis ToolPak. I tend to check everything.
18. Descriptive Statistics With Excel - Data Analysis
Click on Descriptive Statistics.
Histograms are often useful as well.
19. Descriptive Statistics With Excel
Enter the input: highlight the desired data, including titles.
Labels are in the first row (Question 1).
We want summary statistics.
Last, click OK.
20. Descriptive Statistics With Excel - Output, Unformatted
21. Descriptive Statistics With Excel - Output, Minor Formatting
22. Descriptive Statistics With Excel - Histogram
Do one question at a time.
The Bin Range
Click Chart Output for a chart.
23. Descriptive Statistics With Excel - Histogram Chart, Minimal Formatting
No one responded 1 for question 1. Why???
Another old saying: "A picture is worth a thousand words." Although descriptive statistics are very useful, for many non-statisticians a picture such as this can be more informative. This is something to consider when deciding how to present your results. Important: know your audience!
24. Back to Descriptive Statistics With Excel
- The key descriptive statistics, in my opinion, are the following:
- Mean - The average of the observations.
- Median - The 50th percentile of the observations.
- Mode - The most common observation.
- Standard deviation - Measures how spread out the observations are.
- Minimum - The minimum.
- Maximum - The maximum.
- Count - The number of observations.
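For readers who want to check Excel's output another way, here is a minimal sketch that computes the same key statistics with Python's standard `statistics` module. The 30 responses are made up for illustration (they are not the course data set), and `stdev` is the sample standard deviation, which divides by n - 1.

```python
import statistics

# Hypothetical answers from 30 students to one question
# (1 = strongly disagree ... 5 = strongly agree). Not the course data.
responses = [2, 3, 3, 4, 4, 4, 5, 5, 3, 4,
             2, 4, 5, 3, 4, 4, 3, 5, 4, 2,
             3, 4, 4, 5, 3, 4, 2, 4, 3, 4]

summary = {
    "mean": statistics.mean(responses),
    "median": statistics.median(responses),
    "mode": statistics.mode(responses),
    "standard deviation": statistics.stdev(responses),  # sample version (n - 1)
    "minimum": min(responses),
    "maximum": max(responses),
    "count": len(responses),
}
for name, value in summary.items():
    print(f"{name}: {value}")
```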
25. Descriptive Statistics
- The initial use of certain descriptive statistics, such as:
- Count
- Minimum
- Maximum
- Mode
- Mean
- Median
- Standard deviation
- Variance
- Many descriptive statistics can be very useful for getting a preliminary understanding of the data and for error checking.
- This will be discussed further on the next several slides.
26. Descriptive Statistics - Count
- The count
- The number of data points.
- Missing values are not included in the count.
- Gives an overall view of how well the data is populated and might indicate possible data issues. Example: a survey of 300 people.
- Most questions have approximately 295 respondents, with only 5 people not responding on average.
- Question number 6 has a count of 170, meaning 130 people did not respond. Question 6 is about personal income. People are apparently uncomfortable mentioning income. Perhaps there is a non-response bias, making the results for this question unreliable. One scenario is that the less wealthy people did not respond, leading to non-response bias. There are many possible explanations for the lack of responses to this question; regardless, the results may be unreliable.
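The count check described above can be sketched as follows. The data are placeholders built to mirror the slide's numbers (295 and 170 answers out of 300); the function name `count_report` and the 90% threshold are assumptions chosen for the example.

```python
# Hypothetical survey of 300 people; None marks a missing answer.
survey = {
    "Question 5": [3] * 295 + [None] * 5,
    "Question 6": [40000] * 170 + [None] * 130,  # personal income, placeholder values
}

def count_report(columns, threshold=0.9):
    """Return {question: (count, flagged)}; flagged means the response rate is below threshold."""
    report = {}
    for name, values in columns.items():
        count = sum(v is not None for v in values)  # missing values are not counted
        report[name] = (count, count / len(values) < threshold)
    return report

report = count_report(survey)
for name, (count, flagged) in report.items():
    print(name, count, "check for non-response" if flagged else "ok")
```

A low count does not say why people skipped the question; it only tells you where to look.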
27. Descriptive Statistics - Minimum and Maximum
- The maximum and minimum
- Very important to check.
- Outliers
- Data may contain observations that are either extremely small or extremely large relative to most of the data. These data points/observations are called outliers.
- Outliers can be the result of an error (measurement, entry or other error).
- Many times there are data entry errors; these two statistics are very helpful for noticing errors.
- Example: a survey with possible responses ranging from 1 to 5, but you see a maximum of 11. How? A typo: someone typed 11 instead of typing 1, pressing Enter, and typing another 1. Entering data too fast.
- Default values. For example, a value of 99. Often 99 is a default value representing:
- Not applicable, or
- Missing.
- Sometimes Minimum = Maximum. Such a variable lends almost no useful information. What is its purpose? Example:
- Gender: Male = 1, Female = 0
- Gender = 1 for the entire dataset: the entire dataset is comprised of information on males.
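These minimum and maximum checks are easy to automate. The sketch below is one possible way to do it; the function name `check_column`, the made-up data, and the treatment of 99 as the only default code are assumptions for illustration.

```python
def check_column(values, valid_min, valid_max, default_codes=(99,)):
    """Summarise one column: range, out-of-range entries, default codes, constant or not."""
    observed = [v for v in values if v not in default_codes]  # set default codes aside
    return {
        "minimum": min(observed),
        "maximum": max(observed),
        "out_of_range": [v for v in observed if not valid_min <= v <= valid_max],
        "default_codes_found": sum(v in default_codes for v in values),
        "constant": min(observed) == max(observed),  # minimum == maximum: no information
    }

# Hypothetical 1-5 survey answers containing a typo (11) and a default code (99).
question_1 = [4, 5, 3, 11, 2, 99, 4, 3]
# Hypothetical gender column (Male = 1, Female = 0) in a dataset of males only.
gender = [1, 1, 1, 1, 1]

q1 = check_column(question_1, valid_min=1, valid_max=5)
g = check_column(gender, valid_min=0, valid_max=1)
print(q1)
print(g)
```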
28. Descriptive Statistics - Mode
- Mode is the most common data point.
- This will often indicate default values.
- Again, default values. For example, a value of 99. Often 99 is a default value representing:
- Not applicable, or
- Missing.
- In my opinion the mode is best for learning about possible default values and for data checking.
- I personally prefer a histogram to determine what values have high frequencies. It is more informative than the mode, which only reveals the most common answer.
29. Descriptive Statistics - Mean and Median
- The mean: simply the average of the data.
- The mean is only accurate and useful after removing default values.
- Unexpectedly high or low means can be the result of default values.
- Also, outliers can have a strong effect on the mean.
- Often in statistics we want to compare the means of two or more groups.
- Sometimes the mean represents a percent. Example:
- Gender: Male = 1, Female = 0. The average for gender is really the proportion of males in the sample (multiply by 100 for the percent).
- Median: the 50th percentile. The midpoint of the data. It is not affected by outliers, unlike the mean. For this reason it is often good to look at the median as well as the mean.
- When the data is symmetrical, the mean and the median should be approximately equal.
An example of symmetrical data.
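Two of the points above, that the mean of a 0/1 variable is a proportion and that default values distort the mean far more than the median, can be seen in a small sketch. Both data sets are invented for illustration, and 99 is assumed to be the default code.

```python
import statistics

# Hypothetical gender column coded Male = 1, Female = 0.
gender = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]
share_male = statistics.mean(gender)  # mean of a 0/1 variable = proportion of 1s
print(f"Percent male: {share_male * 100:.0f}%")

# Hypothetical ages in which 99 is a default code for "missing".
ages = [21, 23, 22, 99, 24, 99, 22]
cleaned = [a for a in ages if a != 99]  # remove default values before averaging

print("Mean with default values:", statistics.mean(ages))
print("Mean after removing them:", statistics.mean(cleaned))
print("Median with default values:", statistics.median(ages))
print("Median after removing them:", statistics.median(cleaned))
```

The mean roughly doubles when the two 99s are left in, while the median moves by only one year.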
30. Descriptive Statistics - Standard Deviation, Skewness and Kurtosis
- Standard deviation - How spread out the data is.
- Skewness - Departures from 0 indicate a lack of symmetry. If the data is symmetrical, the skewness should equal approximately zero.
- Kurtosis - Departures from 0 indicate a lack of normality: the data does not follow a normal distribution if the kurtosis is very different from 0. (The kurtosis Excel reports is excess kurtosis, which is 0 for a normal distribution.)
- Graphical displays will be given to help illustrate the above statistics.
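As a rough illustration of what skewness and kurtosis measure, the sketch below uses the simple moment-based definitions. This is an assumption made for clarity: Excel's SKEW and KURT apply small-sample adjustments, so their values will differ somewhat from these, especially for small data sets. The two data sets are invented.

```python
import statistics

def skewness_and_excess_kurtosis(data):
    """Moment-based skewness and excess kurtosis (no small-sample adjustment)."""
    n = len(data)
    mean = statistics.fmean(data)
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    # Subtracting 3 makes a normal distribution score 0 on kurtosis.
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3

symmetric = [1, 2, 3, 4, 5]         # illustrative data, symmetric about 3
right_skewed = [1, 1, 2, 2, 3, 10]  # illustrative data with a long right tail

skew_sym, kurt_sym = skewness_and_excess_kurtosis(symmetric)
skew_right, kurt_right = skewness_and_excess_kurtosis(right_skewed)
print(skew_sym, kurt_sym)
print(skew_right, kurt_right)
```

The symmetric data give a skewness of zero, while the long right tail gives a positive skewness.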
31. The next few slides were taken from Statistics for Managers Using Microsoft Excel, 3rd Edition
- Chapter 3
- Numerical Descriptive Measures
32. Mean (Arithmetic Mean)
- Mean (arithmetic mean) of data values
- Sample mean (n = sample size)
- Population mean (N = population size)
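The formulas themselves are not reproduced in this text. Assuming the standard definitions, with X_i denoting the i-th observation (the symbols are my choice of notation), the sample and population means are:

```latex
\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i
\qquad\text{(sample mean)}
\qquad\qquad
\mu = \frac{1}{N}\sum_{i=1}^{N} X_i
\qquad\text{(population mean)}
```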
33. Mean (Arithmetic Mean) (continued)
- The most common measure of central tendency
- Affected by extreme values (outliers)
[Figure: two dot plots. On an axis from 0 to 10: Mean = 5. On an axis extended to 14: Mean = 6.]
34. Median
- Robust measure of central tendency
- Not affected by extreme values
- In an ordered array, the median is the middle number
- If n or N is odd, the median is the middle number
- If n or N is even, the median is the average of the two middle numbers
[Figure: two dot plots. On an axis from 0 to 10: Median = 5. On an axis extended to 14: Median = 5.]
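The odd/even rule can be written out directly. In the sketch below the data values are assumptions: the points plotted on the slide are not recoverable, so the lists were chosen only so that the mean moves from 5 to 6 while the median stays at 5, matching the labels on the slides.

```python
import statistics

def median(values):
    """Middle value if the count is odd, average of the two middle values if even."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2

without_outlier = [1, 3, 5, 7, 9]  # illustrative values
with_outlier = [1, 3, 5, 7, 14]    # same values, but the largest is now extreme
even_count = [1, 3, 5, 7]          # even n: average of the two middle values

for data in (without_outlier, with_outlier, even_count):
    print(data, "mean:", statistics.mean(data), "median:", median(data))
```

Replacing 9 with 14 pulls the mean up by one unit, while the median does not move.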
35. Mode
- A measure of central tendency
- Value that occurs most often
- Not affected by extreme values
- Used for either numerical or categorical data
- There may be no mode
- There may be several modes
[Figure: two dot plots. On an axis from 0 to 6: no mode. On an axis from 0 to 14: Mode = 9.]
36. Variance
- Important measure of variation
- Shows variation about the mean
- Sample variance
- Population variance
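The variance formulas are not reproduced in this text. Assuming the standard definitions (sample variance with the n - 1 divisor, population variance with N; the notation is my choice), they are:

```latex
S^2 = \frac{\sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2}{n - 1}
\qquad\text{(sample variance)}
\qquad\qquad
\sigma^2 = \frac{\sum_{i=1}^{N} \left(X_i - \mu\right)^2}{N}
\qquad\text{(population variance)}
```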
37. Standard Deviation
- Most important measure of variation
- Shows variation about the mean
- Has the same units as the original data
- Sample standard deviation
- Population standard deviation
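The formulas are not reproduced in this text. Assuming the standard definitions (the notation is my choice), the standard deviation is the square root of the corresponding variance, which is why it has the same units as the data:

```latex
S = \sqrt{\frac{\sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2}{n - 1}}
\qquad\text{(sample standard deviation)}
\qquad\qquad
\sigma = \sqrt{\frac{\sum_{i=1}^{N} \left(X_i - \mu\right)^2}{N}}
\qquad\text{(population standard deviation)}
```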
38. Comparing Standard Deviations
[Figure: three dot plots, each on an axis from 11 to 21]
- Data A: Mean = 15.5, s = 3.338
- Data B: Mean = 15.5, s = 0.9258
- Data C: Mean = 15.5, s = 4.57
39. Coefficient of Variation
- Measures relative variation
- Always in percentage (%)
- Shows variation relative to the mean
- Is used to compare two or more sets of data measured in different units
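The formula is not reproduced in this text. Assuming the standard definition, the coefficient of variation is the sample standard deviation S expressed as a percentage of the sample mean:

```latex
CV = \left(\frac{S}{\bar{X}}\right) \cdot 100\%
```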
40. Comparing Coefficient of Variation
- Stock A
- Average price last year: 50
- Standard deviation: 5
- Stock B
- Average price last year: 100
- Standard deviation: 5
- Coefficient of variation
- Stock A: CV = (5 / 50) x 100% = 10%
- Stock B: CV = (5 / 100) x 100% = 5%
41. Shape of a Distribution
- Describes how data is distributed
- Measures of shape
- Symmetric or skewed
- Left-skewed: Mean < Median < Mode
- Symmetric: Mean = Median = Mode
- Right-skewed: Mode < Median < Mean
42. Quartiles
- Split ordered data into 4 quarters (25% of the data in each)
- Position of the i-th quartile
- Q1 and Q3 are measures of noncentral location
- Q2 = median, a measure of central tendency
Data in ordered array: 11 12 13 16 16 17 18 21 22
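The position formula itself is not reproduced in this text. The sketch below assumes the common rule that the i-th quartile sits at position i(n + 1)/4 in the ordered array, averaging the two neighbours when the position falls halfway between them. Python's `statistics.quantiles` with `method="exclusive"` follows this rule; other software may use a different convention and give slightly different quartiles.

```python
import statistics

data = [11, 12, 13, 16, 16, 17, 18, 21, 22]  # the ordered array from the slide (n = 9)

# method="exclusive" places the i-th quartile at position i * (n + 1) / 4
# and interpolates when the position is not a whole number.
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
print(q1, q2, q3)  # 12.5 16.0 19.5
```

With n = 9 the positions are 2.5, 5 and 7.5, so Q1 is halfway between 12 and 13, Q2 is the fifth value (16), and Q3 is halfway between 18 and 21.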
43. Exploratory Data Analysis
- Box-and-whisker plot
- Graphical display of data using the 5-number summary: X smallest, Q1, median (Q2), Q3, X largest
[Figure: box-and-whisker plot on an axis from 4 to 12]
44. Distribution Shape and Box-and-Whisker Plot
[Figure: box-and-whisker plots for left-skewed, symmetric and right-skewed distributions]
45. Pitfalls in Numerical Descriptive Measures
- Data analysis is objective
- Should report the summary measures that best meet the assumptions about the data set
- Data interpretation is subjective
- Should be done in a fair, neutral and clear manner
46. Ethical Considerations
- Numerical descriptive measures
- Should document both good and bad results
- Should be presented in a fair, objective and neutral manner
- Should not use inappropriate summary measures to distort facts