DA 812 Quantitative Research Methods II

1
DA 812 Quantitative Research Methods II
  • Lecturer: Arthur Dryver, Ph.D.

2
Contact Information
  • My name is Arthur Dryver, Ph.D.
  • Lecturer at NIDA
  • Office: 518 Building 2
  • Office Phone: 02-727-3084
  • E-mail: arthur_at_adryver-consulting.com
  • Website: http://as.nida.ac.th/dryver/courses
  • Office Hours
  • Monday and Wednesday 1-4 PM, appointment
    required.
  • Hours other than the pre-specified office hours are
    fine, given prior notice.

3
Some of My Background
  • Work Experience
  • Lecturer at NIDA from October 2003 to present
  • Previously I worked for 4 years in
    consulting
  • Scorex (an Experian company), AnaBus and PwC
  • Analysis of data from various industries.
    Experience handling the analysis of data with
    millions of records and several files involved.
  • Education
  • The Pennsylvania State University
  • Ph.D. in Statistics, 1999
  • Dissertation Topic: Adaptive Sampling Strategies
  • Rice University
  • BA in Mathematical Sciences/Statistics, 1993

4
Requirements and Expectations
  • Requirements
  • I will teach in English
  • Feel free to ask questions
  • If I speak too fast, please let me know
  • You may remind me to speak slower more than once;
    I know I sometimes speak fast.
  • All work must be done in English
  • I cannot read Thai.
  • Expectations
  • By the end of the course you should have a good
    understanding of how to do quantitative research.
  • Which statistics to use, and when.

5
Getting To Know You
  • This is a class of approximately 10 people. As
    such I would like to get to know you better. In
    this manner I can make the course more suited to
    your individual needs.
  • Please let me know the following
  • Name
  • Area of interest
  • Your present work, if working
  • Where you are in your research
  • Experience with SPSS
  • Expectations from this course
  • Please write the above on a piece of paper and
    hand it in.
  • In addition to the above, write down how
    comfortable you are with English and with
    statistics.

6
Why is Statistics Important to You?
  • Research often requires the collection of data.
    Example:
  • The new underground: determining the opinion of
    the general public.
  • Happy with the new train, or not?
  • Answering even a question as simple as the one above
    requires statistical knowledge, from design to
    analysis.
  • How to collect data.
  • Analysis
  • The percent that are happy with the new train.
    A percent is a statistic.
  • Perhaps a confidence interval is desired as well,
    which requires more statistical analysis.
  • Take 20 minutes to look over the articles passed
    out and think about what they have in common.

7
I Know Most Non-Statisticians Don't Like
Statistics
  • Old saying: "Kill two birds with one stone."
  • This means accomplish two things with one action.
  • Think about this class in relation to your work.
  • Think about how it can answer the questions that
    arise at work, and for your dissertation.
  • You will have to read articles for class; try to
    do it in your area. Get your research started
    now, if it hasn't been.
  • Basically, I am recommending that you kill two
    birds with one stone.
  • If you do so, I believe you will find this class
    considerably more beneficial, interesting and
    enjoyable.

8
Another Old Saying
  • G.I.G.O.
  • Garbage In Garbage Out
  • What does this mean and what does it mean to us?
  • Bad information often leads to bad
    results/answers/plans of action.
  • The decisions we make are the result of the
    information we have. Poor information leads to
    poor decisions.
  • Data is a major part of the information that goes
    into statistics. For this reason we will first
    discuss data collection or sampling.
  • Bad data can lead to misleading statistics,
    which can lead to misleading beliefs and bad plans
    of action.

9
What is a Population?
  • First, before we discuss sampling, what is a
    population when used in reference to
    statistics?
  • All people or units of direct interest to the
    study. Examples:
  • A study is designed to determine the percent of
    females in Bangkok. The population is all people
    in Bangkok.
  • A study is designed to determine the percent of
    males over 20 years old living in Bangkok that
    have jobs. The population is all males over 20
    years old living in Bangkok.
  • A study is designed to determine the average
    income of working people in Thailand. The population
    is all people working in Thailand.

10
What is a Sample?
  • A sample is a smaller group selected for the
    study.
  • Examples of a sample
  • A study is designed to determine the percent of
    females in Bangkok. The population is all people
    in Bangkok. The sample might be 1000 people in
    Bangkok.
  • A study is designed to determine the percent of
    males over 20 years old living in Bangkok that
    have jobs. The population is all males over 20
    years old living in Bangkok. The sample might be
    200 males over 20 years old living in Bangkok.
  • A study is designed to determine the average
    income of working people in Thailand. The population
    is all people working in Thailand. The sample might
    be 5000 people living in Bangkok.
  • This would be a poor sample, since your results
    would represent Bangkok not Thailand and all of
    Thailand is of interest in the study.
  • Remember G.I.G.O.
  • It is always desirable to obtain a sample that
    represents the population of interest.

11
Why Do We Sample?
  • A sample of the entire population is called a
    census. A census would be the most reliable.
  • Sampling is necessary for many reasons.
  • Often it is not feasible to survey everyone.
    Most populations are too large for a census.
    A census may not be feasible due to
  • Finances: there is not enough money to employ enough
    people for the survey.
  • Time: there is a time constraint and a census would
    take too long.
  • Often people refuse to be surveyed, making it
    impossible to include them in the study (a type
    of nonresponse, to be discussed later).
  • The main reason tends to be financial.
  • That said, a larger sample tends to be
    better.

12
Simple Random Sampling
  • A simple random sample: every individual in the
    population being studied has an equal probability
    of selection (more precisely, every possible sample
    of size n is equally likely). Example:
  • A professor wishes to learn about his class, the
    class is made up of six students. Thus the six
    students make up the population. He decides to
    take a simple random sample of 1 student. He
    numbers the students from 1 to 6. He rolls a six
    sided die to select a student. This is an
    example of a simple random sample of size one.
  • In general, the population is numbered from 1 to N
    (the population size), and a random number
    generator is used to select a sample
    of size n.
  • Lower case n often denotes sample size while
    upper case N denotes population size.
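
The selection step described on this slide can be sketched in Python. This is only an illustration: the function name `simple_random_sample`, the seed, and the second example (n = 10 from N = 500) are assumptions and do not come from the slides.

```python
import random


def simple_random_sample(population_size, sample_size, seed=None):
    """Draw unit labels 1..N without replacement; every subset of size n is equally likely."""
    rng = random.Random(seed)  # the seed only makes the illustration repeatable
    units = range(1, population_size + 1)
    return sorted(rng.sample(units, sample_size))


# The professor's class: N = 6 students, n = 1 selected (like rolling a six-sided die).
chosen = simple_random_sample(population_size=6, sample_size=1, seed=812)
print("Selected student:", chosen)

# A larger hypothetical case: n = 10 units from a population of N = 500.
sample = simple_random_sample(population_size=500, sample_size=10, seed=812)
print("Selected units:", sample)
```

Sampling without replacement is used here so that no unit can be selected twice.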

13
Obtaining Data
  • Often it is difficult/costly to obtain data.
  • Often surveys are taken on non-representative
    samples.
  • Many researchers use convenience sampling, a sample
    taken from what is convenient. Example:
  • A survey handed out to someone's friends and used
    as data to generalize to all of Bangkok.
  • Think about G.I.G.O. before sampling.
  • This course is not on sampling so I will not go
    into more details.
  • Should you have further questions on sampling you
    may schedule an appointment with me.

14
Descriptive Statistics
  • Although complicated multivariate statistics
    often require SPSS, SAS or S-Plus,
    Excel can be very useful for descriptive
    statistics and many basic statistics.
  • I want to cover this with Excel, as many people
    are familiar with Excel and will often receive
    data in Excel.
  • Excel can be the first step for looking at the
    data when you want some answers fast.
  • Imagine a survey of 30 students where responses
    are 1 to 5, representing strongly disagree to
    strongly agree with the statement made.

15
Descriptive Statistics With Excel: The Data
16
Descriptive Statistics With Excel: Add-Ins
If you do not have Data Analysis under Tools,
you will have to click Add-Ins... to add it.
17
Descriptive Statistics With Excel: Analysis
ToolPak
Check Analysis ToolPak. I tend to check
everything.
18
Descriptive Statistics With Excel - Data Analysis
Click on Descriptive Statistics.
Histograms are often useful as well.
19
Descriptive Statistics With Excel
Enter the input: highlight the desired data, including
titles.
Labels are in the first row (Question 1).
We want summary statistics.
Last, click OK.
20
Descriptive Statistics With Excel: Output
Unformatted
21
Descriptive Statistics With Excel - Output, Minor
Formatting
22
Descriptive Statistics With Excel - Histogram
Do one question at a time.
The Bin Range
Click Chart Output for a chart.
23
Descriptive Statistics With Excel - Histogram
Chart, Minimal Formatting
No one responded 1 for question 1. Why?
Another old saying: "A picture is worth a
thousand words." Although descriptive
statistics are very useful, for many
non-statisticians a picture such as this can be
more informative. This is something to consider when
deciding how to present your results. Important:
know your audience!
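
For readers without Excel at hand, the same frequency count can be sketched in Python. The 30 responses below are invented for illustration (the slide's actual data is not in the transcript); they are chosen so that, as on the slide, nobody answered 1.

```python
from collections import Counter

# Hypothetical responses from 30 students (1 = strongly disagree ... 5 = strongly agree).
responses = [2, 3, 3, 4, 4, 4, 5, 5, 3, 4,
             2, 4, 5, 3, 4, 4, 5, 2, 3, 4,
             5, 4, 3, 4, 2, 5, 4, 3, 4, 5]

bins = [1, 2, 3, 4, 5]           # plays the role of Excel's Bin Range
frequency = Counter(responses)   # a value that never occurs has frequency 0

# One text bar per bin, so an empty bin (here: 1) is still visible.
for value in bins:
    print(f"{value} | {'#' * frequency[value]} ({frequency[value]})")
```

Listing every bin explicitly, rather than only the values that occur, is what makes an unused response option stand out.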
24
Back to Descriptive Statistics With Excel
  • The key descriptive statistics in my opinion are
    the following
  • Mean - The average of the observations.
  • Median - The 50th percentile of the
    observations.
  • Mode - The most common observation.
  • Standard deviation - Measures how spread out
    observations are.
  • Minimum - The minimum.
  • Maximum - The maximum.
  • Count - The number of observations.
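
As a rough cross-check of the Excel summary output, the key statistics listed above can be computed with Python's standard `statistics` module. The responses are hypothetical; only the choice of statistics comes from the slide.

```python
import statistics

# Hypothetical responses from 30 students on a 1-5 scale.
responses = [2, 3, 3, 4, 4, 4, 5, 5, 3, 4,
             2, 4, 5, 3, 4, 4, 5, 2, 3, 4,
             5, 4, 3, 4, 2, 5, 4, 3, 4, 5]

summary = {
    "count": len(responses),
    "mean": statistics.mean(responses),
    "median": statistics.median(responses),
    "mode": statistics.mode(responses),
    "stdev": statistics.stdev(responses),  # sample standard deviation (divides by n - 1)
    "min": min(responses),
    "max": max(responses),
}

for name, value in summary.items():
    print(f"{name:>6}: {value}")
```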

25
Descriptive Statistics
  • The initial use of certain descriptive statistics
    such as
  • Count
  • Minimum
  • Maximum
  • Mode
  • Mean
  • Median
  • Standard deviation
  • Variance
  • Many descriptive statistics can be very useful
    for getting a preliminary understanding of the data
    and for error checking.
  • This will be discussed further on the next several
    slides.

26
Descriptive Statistics - Count
  • The Count
  • The number of data points.
  • Missing Values are not included in the count.
  • Gives an overall view of how well the data is
    populated and might indicate possible data
    issues. Example: a survey of 300 people
  • Most questions have approximately 295
    respondents, with only 5 people not responding on
    average.
  • Question number 6 has a count of 170, meaning 130
    people did not respond. Question 6 is about
    personal income. People are apparently
    uncomfortable mentioning income. Perhaps there
    is a response bias, making the results for this
    question unreliable. One scenario is that the
    less wealthy people did not respond, leading to
    response bias. There are many possible
    explanations for the lack of responses to this
    question; regardless, the results may be unreliable.
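
A minimal sketch of this check, assuming missing answers are stored as `None`. The question names and the ten respondents are made up for illustration; the slide's 300-person survey is not reproduced.

```python
# Hypothetical answers from 10 respondents; None marks a question left blank.
survey = {
    "Q1": [4, 5, 3, 4, None, 2, 4, 5, 3, 4],
    "Q6_income": [None, 30000, None, None, 45000, None, 28000, None, None, 52000],
}


def response_count(values):
    """Count the non-missing answers; missing values are left out of the count."""
    return sum(1 for v in values if v is not None)


counts = {question: response_count(values) for question, values in survey.items()}

for question, values in survey.items():
    n = counts[question]
    print(f"{question}: count = {n}, missing = {len(values) - n}")
```

A question whose count is far below the others, like the income question here, is the signal to look for before trusting its results.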

27
Descriptive Statistics Minimum and Maximum
  • The maximum and minimum
  • Very important to check.
  • Outliers
  • Data may contain observations that are either
    extremely small or extremely large relative to
    most of the data. These data points/observations
    are called outliers.
  • Outliers can be the result of an error
    (measurement, entry or other error).
  • Data entry errors are common, and these two
    statistics are very helpful for noticing them.
  • Example: a survey with possible responses ranging
    from 1 to 5, but you see a maximum of 11. How? A typo:
    someone entering data too fast typed 11 instead of
    typing 1, pressing Enter and typing another 1.
  • Default values. For example, a value of 99. Often
    99 is a default value representing
  • Not applicable, or
  • Missing.
  • Sometimes Minimum = Maximum. Such a variable lends
    almost no useful information. What is its
    purpose? Example:
  • Gender: Male = 1, Female = 0
  • Gender = 1 for the entire dataset: the entire dataset
    is comprised of information on males.
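
The checks on this slide can be sketched as follows. The valid range of 1 to 5 and the default code 99 follow the slide's examples; the column of values and the variable names are hypothetical.

```python
VALID_RANGE = (1, 5)   # responses are meant to run from 1 to 5
DEFAULT_CODES = {99}   # assumed code for "not applicable" or "missing"

# Hypothetical column with one typo (11) and two default values (99).
column = [3, 4, 11, 2, 99, 5, 4, 99, 1, 3]

print("raw min =", min(column), "raw max =", max(column))


def in_range(value):
    return VALID_RANGE[0] <= value <= VALID_RANGE[1]


# Values that are neither valid responses nor known default codes are likely errors.
suspect = [(row, v) for row, v in enumerate(column)
           if v not in DEFAULT_CODES and not in_range(v)]
defaults = [row for row, v in enumerate(column) if v in DEFAULT_CODES]
clean = [v for v in column if in_range(v)]

print("suspect entries (row, value):", suspect)
print("rows holding default codes:", defaults)
print("min/max after cleaning:", min(clean), max(clean))

# A variable whose minimum equals its maximum is constant for the whole dataset.
gender = [1, 1, 1, 1]
is_constant = min(gender) == max(gender)
print("gender is constant:", is_constant)
```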

28
Descriptive Statistics - Mode
  • Mode is the most common data point.
  • This will often indicate default values.
  • Again, default values. For example, a value of 99.
    Often 99 is a default value representing
  • Not applicable, or
  • Missing.
  • In my opinion the mode is most useful for spotting
    possible default values and for data checking.
  • I personally prefer a histogram to determine which
    values have high frequencies. It is more
    informative than the mode, which only reveals the
    most common answer.

29
Descriptive Statistics Mean and Median
  • The mean is simply the average of the data.
  • The mean is only accurate and useful after
    removing default values.
  • Unexpectedly high or low means can be the result
    of default values.
  • Also, outliers can have a strong effect on the
    mean.
  • Often in statistics we want to compare the means
    of two or more groups.
  • Sometimes the mean represents a percent.
    Example:
  • Gender: Male = 1, Female = 0. The average for
    gender is really the proportion of males in the
    sample (multiply by 100 for the percent).
  • Median: the 50th percentile, the midpoint of
    the data. Unlike the mean, it is largely unaffected
    by outliers. For this reason it is often good to
    look at the median as well as the mean.
  • When the data is symmetrical, the mean and the
    median should be approximately equal.
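
A small sketch of the two points above: an outlier pulls the mean much more than the median, and the mean of a 0/1 variable is a proportion. All numbers are invented for illustration.

```python
import statistics

incomes = [20, 22, 25, 27, 30]   # hypothetical incomes, in thousands
with_outlier = incomes + [400]   # the same data plus one extreme value

mean_before = statistics.mean(incomes)
median_before = statistics.median(incomes)
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

print("without outlier: mean =", mean_before, "median =", median_before)
print("with outlier:    mean =", round(mean_after, 2), "median =", median_after)

# Coding Male = 1, Female = 0 makes the mean the proportion of males.
gender = [1, 0, 0, 1, 1, 0, 1, 1]
proportion_male = statistics.mean(gender)
print("proportion of males:", proportion_male)
```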

An example of symmetrical data.
30
Descriptive Statistics Standard Deviation,
Skewness and Kurtosis
  • Standard deviation - How spread out the data is.
  • Skewness - Departures from 0 indicate a lack of
    symmetry. If the data is symmetrical, the
    skewness should be approximately zero.
  • Kurtosis - Excel reports excess kurtosis, which is 0
    for a normal distribution. A kurtosis very
    different from 0 indicates that the data does not
    follow a normal distribution.
  • Graphical displays will be given to help
    illustrate the above statistics.
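
The two shape statistics can be sketched in Python. The formulas below are the bias-adjusted sample versions that, to my understanding, match Excel's SKEW and KURT worksheet functions; treat that correspondence as an assumption. The function names and data are illustrative only.

```python
import statistics


def sample_skewness(data):
    """Bias-adjusted sample skewness; close to 0 for symmetric data."""
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in data)


def sample_excess_kurtosis(data):
    """Bias-adjusted sample excess kurtosis; a normal distribution gives about 0."""
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    fourth = sum(((x - mean) / s) ** 4 for x in data)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth - correction


symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]

print("skewness (symmetric):   ", sample_skewness(symmetric))
print("skewness (right-skewed):", sample_skewness(right_skewed))
print("excess kurtosis (symmetric):", sample_excess_kurtosis(symmetric))
```

Both functions need at least 4 observations, since the kurtosis formula divides by n - 3.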

31
The next few slides were taken from Statistics
for Managers Using Microsoft Excel, 3rd Edition
  • Chapter 3
  • Numerical Descriptive Measures

32
Mean (Arithmetic Mean)
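  • Mean (arithmetic mean) of data values
  • Sample mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$, where $n$ is the sample size
  • Population mean: $\mu = \frac{\sum_{i=1}^{N} X_i}{N}$, where $N$ is the population size
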
33
Mean (Arithmetic Mean)
  • The most common measure of central tendency
  • Affected by extreme values (outliers)

[Figure: two dot plots, one on an axis from 0 to 10 with Mean = 5, and one on an axis from 0 to 14 that includes an extreme value, with Mean = 6]
34
Median
  • Robust measure of central tendency
  • Not affected by extreme values
  • In an ordered array, the median is the middle
    number
  • If n or N is odd, the median is the middle number
  • If n or N is even, the median is the average of
    the two middle numbers

[Figure: the same two dot plots, on axes from 0 to 10 and from 0 to 14; Median = 5 in both]
35
Mode
  • A measure of central tendency
  • Value that occurs most often
  • Not affected by extreme values
  • Used for either numerical or categorical data
  • There may be no mode
  • There may be several modes

[Figure: two dot plots, on axes from 0 to 6 and from 0 to 14; one data set has no mode, the other has Mode = 9]
36
Variance
  • Important measure of variation
  • Shows variation about the mean
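  • Sample variance: $S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
  • Population variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$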

37
Standard Deviation
  • Most important measure of variation
  • Shows variation about the mean
  • Has the same units as the original data
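  • Sample standard deviation: $S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$
  • Population standard deviation: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$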
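
To make the difference between the sample (n - 1) and population (N) versions concrete, here is a short sketch. The data set and function names are hypothetical; the standard `statistics` module is used only as a cross-check.

```python
import math
import statistics


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def population_variance(values):
    """Sum of squared deviations from the population mean, divided by N."""
    big_n = len(values)
    mu = sum(values) / big_n
    return sum((x - mu) ** 2 for x in values) / big_n


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values with mean 5

s2 = sample_variance(data)
sigma2 = population_variance(data)

print("sample variance:    ", s2, " sample std dev:    ", math.sqrt(s2))
print("population variance:", sigma2, " population std dev:", math.sqrt(sigma2))

# The standard library implements the same two definitions.
print(statistics.variance(data), statistics.pvariance(data))
```

The standard deviation is the square root of the variance, which is why it has the same units as the original data.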

38
Comparing Standard Deviations
[Figure: three dot plots on a common axis from 11 to 21]
Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = 0.9258
Data C: Mean = 15.5, s = 4.57
39
Coefficient of Variation
  • Measures relative variation
  • Always in percentage (%)
  • Shows variation relative to mean
  • Is used to compare two or more sets of data
    measured in different units
  • $CV = \left(\frac{S}{\bar{X}}\right) \cdot 100\%$

40
Comparing Coefficient of Variation
  • Stock A
  • Average price last year: 50
  • Standard deviation: 5
  • Stock B
  • Average price last year: 100
  • Standard deviation: 5
  • Coefficient of variation
  • Stock A: CV = (5 / 50) · 100% = 10%
  • Stock B: CV = (5 / 100) · 100% = 5%
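
The comparison can be sketched in a few lines, using the prices and standard deviations given on the slide. The function name `coefficient_of_variation` is an assumption for illustration.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return std_dev * 100 / mean


cv_a = coefficient_of_variation(std_dev=5, mean=50)    # Stock A
cv_b = coefficient_of_variation(std_dev=5, mean=100)   # Stock B

print(f"Stock A: CV = {cv_a}%")
print(f"Stock B: CV = {cv_b}%")
```

Both stocks have the same standard deviation, but relative to its average price Stock A varies twice as much as Stock B.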

41
Shape of a Distribution
  • Describes how data is distributed
  • Measures of shape
  • Symmetric or skewed

Left-Skewed: Mean < Median < Mode
Symmetric: Mean = Median = Mode
Right-Skewed: Mode < Median < Mean
42
Quartiles
  • Split ordered data into 4 quarters (25% of the data in each)
  • Position of the i-th quartile: $Q_i$ is at position $\frac{i(n+1)}{4}$ in the ordered array
  • $Q_1$ and $Q_3$ are measures of noncentral location
  • $Q_2$ = median, a measure of central tendency

Data in ordered array: 11 12 13 16 16 17 18 21 22
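
As a sketch, the quartiles of the slide's ordered array can be computed from the position i(n + 1)/4. Interpolating between neighbouring values when the position is not a whole number is an assumption here; the textbook may use a different rounding rule for positions that are not exactly halfway. The function name `quartile` is illustrative.

```python
import statistics

data = [11, 12, 13, 16, 16, 17, 18, 21, 22]  # ordered array from the slide


def quartile(ordered, i):
    """Value at 1-based position i(n + 1)/4, interpolating between neighbours."""
    n = len(ordered)
    position = i * (n + 1) / 4
    lower = int(position)          # 1-based rank at or just below the position
    fraction = position - lower
    if lower < 1:
        return ordered[0]
    if lower >= n:
        return ordered[-1]
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])


quartiles = [quartile(data, i) for i in (1, 2, 3)]
print("Q1, Q2, Q3 =", quartiles)

# The standard library's default ("exclusive") method uses the same positions.
print(statistics.quantiles(data, n=4))
```

With n = 9 the positions are 2.5, 5 and 7.5, so Q1 and Q3 fall halfway between two observations and Q2 is the fifth observation, the median.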
43
Exploratory Data Analysis
  • Box-and-whisker plot
  • Graphical display of data using 5-number summary

[Figure: box-and-whisker plot on an axis from 4 to 12, marking X smallest, the median and X largest]
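
The 5-number summary behind a box-and-whisker plot can be sketched as below. The data values are hypothetical (the plotted data is not in the transcript), and the quartile method is Python's default, which may differ slightly from the textbook's rule.

```python
import statistics


def five_number_summary(values):
    """Return (smallest, Q1, median, Q3, largest) for a list of numbers."""
    ordered = sorted(values)
    q1, q2, q3 = statistics.quantiles(ordered, n=4)
    return ordered[0], q1, q2, q3, ordered[-1]


values = [9, 4, 12, 6, 8, 5, 10, 7, 11]  # hypothetical data between 4 and 12

summary = five_number_summary(values)
labels = ["X smallest", "Q1", "Median", "Q3", "X largest"]
for label, value in zip(labels, summary):
    print(f"{label:>10}: {value}")
```

The box of the plot spans Q1 to Q3 with a line at the median, and the whiskers extend to the smallest and largest values.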
44
Distribution Shape and Box-and-Whisker Plot
[Figure: box-and-whisker plots for left-skewed, symmetric and right-skewed distributions]
45
Pitfalls in Numerical Descriptive Measures
  • Data analysis is objective
  • Should report the summary measures that best meet
    the assumptions about the data set
  • Data interpretation is subjective
  • Should be done in a fair, neutral and clear manner

46
Ethical Considerations
  • Numerical descriptive measures
  • Should document both good and bad results
  • Should be presented in a fair, objective and
    neutral manner
  • Should not use inappropriate summary measures to
    distort facts