Title: Descriptive statistics
1 Descriptive statistics
- V506 Class 2
- September 3, 2009
2 Overview
- Describing variables
- Measures of central tendency
- Measures of variation
- Using SPSS for descriptive statistics
3 Describing variables
- Have a variable with observations on a (possibly large) number of cases
- Idea is to produce a number of summary measures that characterize those data
- Focus here is on
  - Central tendency
  - Variation
4 Measures of central tendency
5 Mean
- Sum of the values divided by the number of cases
6 Summation notation
- The $y_i$ ($y_1, y_2, \ldots, y_n$) are the n values of the variable Y
- The sum of the values is then denoted as $\sum_{i=1}^{n} y_i$
7 Calculating the mean for high temperatures
- Add values
- Number of cases
- Calculate mean
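The three steps above (add the values, count the cases, divide) can be sketched in Python. The temperatures below are hypothetical stand-ins, since the lecture's actual data are not reproduced in this text, and the variable names are illustrative only.

```python
# Hypothetical daily high temperatures; NOT the lecture's data.
highs = [32, 32, 35, 40, 45, 52, 60]

total = sum(highs)   # add values: the sum of y_i for i = 1..n
n = len(highs)       # number of cases
mean = total / n     # calculate mean: sum divided by number of cases

print(total, n, round(mean, 2))  # 296 7 42.29
```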
8 Median
- The median represents the middle of the ordered sample data
- When the sample size is odd, the median is the middle value
- When the sample size is even, the median is the midpoint (mean) of the two middle values
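A minimal sketch of the odd/even rule, assuming a hand-written helper named `median` (hypothetical) and hypothetical data. Python's standard `statistics.median` applies the same rule.

```python
def median(values):
    """Middle of the ordered data; mean of the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd n: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: midpoint of the two middle values

odd_sample = [45, 32, 60, 35, 40, 32, 52]   # hypothetical, n = 7
even_sample = [45, 32, 60, 35, 40, 52]      # hypothetical, n = 6

print(median(odd_sample))   # 40
print(median(even_sample))  # 42.5
```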
9 Calculating the median for high temperatures
10 Mode
- The mode is the value that occurs most frequently
- It is the least useful (and least used) of the
three measures of central tendency
11 Calculating the mode for high temperatures
mode = 32
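As a rough illustration, the mode can be found by counting how often each value occurs. The helper `modes` and both data sets are hypothetical. It returns a list because data can have several values tied for most frequent.

```python
from collections import Counter

def modes(values):
    """Return every value tied for the highest frequency, sorted."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

highs = [32, 32, 35, 40, 45, 52, 60]   # hypothetical data
print(modes(highs))                    # [32]

# Counting needs no ordering or arithmetic, so labels work as well
print(modes(["red", "blue", "red", "green", "blue"]))  # ['blue', 'red']
```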
12 Measures of central tendency and levels of measurement
- Mean assumes numerical values and requires interval data
- Median requires ordering of values and can be used with both interval and ordinal data
- Mode only involves determination of most common value and can be used with interval, ordinal, and nominal data
13 Comparison of mean and median
- Mean
  - Uses all of the data
  - Has desirable statistical properties
  - Affected by extreme high or low values (outliers)
  - May not best characterize skewed distributions
- Median
  - Not affected by outliers
  - May better characterize skewed distributions
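A small sketch of the outlier point, using hypothetical values and Python's standard `statistics` module. Adding one extreme high value pulls the mean far upward while the median barely moves.

```python
from statistics import mean, median

values = [30, 32, 35, 38, 40]    # hypothetical, roughly symmetric
with_outlier = values + [400]    # same data plus one extreme high value

print(mean(values), median(values))              # mean and median agree here
print(mean(with_outlier), median(with_outlier))  # mean jumps, median shifts slightly
```

With the extreme value included, the mean lies well above the median, in the direction of the longer tail.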
14 The mean and median and the distribution of the data
- For symmetric distributions, the mean and the median are the same
- For skewed distributions, the mean lies in the direction of the skew (the longer tail) relative to the median
15 Distribution shapes
[Figure: three distribution shapes, labeled positively skewed, symmetric, and negatively skewed]
16 Comparison of mean and median
17 Measures of variation
- Range
- Variance and standard deviation
- Interquartile range
18 Range
- Range is the difference between the minimum and maximum values
19 Calculating the range for high temperatures
range = 60 - 32 = 28
20 Variance and standard deviation
- The variance $s^2$ is the sum of the squared deviations from the mean divided by the number of cases minus 1
- The standard deviation $s$ is the square root of the variance
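Written in the summation notation introduced earlier, the two definitions above look like this. The symbol $\bar{y}$ for the sample mean is an assumption. This text names only $s^2$ and $s$.

```latex
s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1},
\qquad
s = \sqrt{s^2}
```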
21 Why squared? Why n - 1?
- Why square differences between data values and mean?
  - Gives positive values
  - Gives more weight to larger differences
  - Has desirable statistical properties
- Why n - 1 for sample variance?
  - Dividing by n underestimates population variance
  - Dividing by n - 1 gives unbiased estimate of population variance
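One way to see the n - 1 point is to enumerate every possible sample from a tiny population and average the two versions of the variance. This is only a sketch under stated assumptions: the population is hypothetical, samples have size 2, and sampling is with replacement.

```python
from itertools import product

population = [1, 2, 3, 4]   # hypothetical tiny population
N = len(population)
mu = sum(population) / N
pop_variance = sum((y - mu) ** 2 for y in population) / N

def sum_sq_dev(sample):
    """Sum of squared deviations from the sample's own mean."""
    m = sum(sample) / len(sample)
    return sum((y - m) ** 2 for y in sample)

n = 2
samples = list(product(population, repeat=n))   # all 16 ordered samples of size 2

avg_div_n = sum(sum_sq_dev(s) / n for s in samples) / len(samples)
avg_div_n_minus_1 = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)

print(pop_variance)        # 1.25
print(avg_div_n)           # 0.625 (too small on average)
print(avg_div_n_minus_1)   # 1.25  (matches the population variance on average)
```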
22 Variance versus standard deviation
- Standard deviation is in same units as variable, more readily interpreted
- Standard deviation is measure of absolute deviation
- Variance has properties making it useful for certain statistical analyses
23 Calculating the variance and standard deviation for high temperatures
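The calculation can be sketched step by step as follows. The data are hypothetical (the lecture's temperatures are not reproduced in this text), and the variable names are illustrative.

```python
import math

highs = [32, 32, 35, 40, 45, 52, 60]   # hypothetical data, not the lecture's

n = len(highs)
mean = sum(highs) / n
squared_devs = [(y - mean) ** 2 for y in highs]   # squared deviations from the mean
variance = sum(squared_devs) / (n - 1)            # divide by number of cases minus 1
std_dev = math.sqrt(variance)                     # square root of the variance

print(round(variance, 2), round(std_dev, 2))
```

The standard library's `statistics.variance` and `statistics.stdev` use the same n - 1 divisor and can serve as a cross-check.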
24 Interpretation of standard deviation
- If distribution of data approximately bell shaped, then
  - About 68% of the data fall within one standard deviation of the mean
  - About 95% of the data fall within two standard deviations of the mean
  - Nearly all of the data fall within three standard deviations of the mean
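These percentages can be checked against an exact normal distribution. Treating "bell shaped" as exactly normal is an assumption made for this sketch. It uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later).

```python
from statistics import NormalDist

z = NormalDist()   # standard normal: mean 0, standard deviation 1

for k in (1, 2, 3):
    share = z.cdf(k) - z.cdf(-k)   # proportion within k standard deviations of the mean
    print(k, round(share * 100, 1))
```

The printed shares are roughly 68.3%, 95.4%, and 99.7%, consistent with the rule of thumb above.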
25 Coefficient of variation
- Coefficient of variation (also sometimes called coefficient of dispersion)
- Measure of relative variation
- Use to compare variation of distributions with different units relative to their means
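The formula is not shown in this text. The usual definition is the standard deviation divided by the mean, often reported as a percentage. A minimal sketch under that assumption, with a hypothetical helper name and hypothetical data:

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (sensible only for a positive mean)."""
    return stdev(values) / mean(values)

heights_cm = [150, 160, 170, 180, 190]   # hypothetical
weights_kg = [50, 60, 70, 80, 90]        # hypothetical

print(round(coefficient_of_variation(heights_cm), 3))
print(round(coefficient_of_variation(weights_kg), 3))
```

Both lists have the same standard deviation numerically, but the weights vary more relative to their mean, so their coefficient of variation is larger.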
26 Interquartile range
- Difference between upper (third) and lower (first) quartiles
- Quartiles divide data into four equal groups
  - Lower (first) quartile is 25th percentile
  - Middle (second) quartile is 50th percentile and is the median
  - Upper (third) quartile is 75th percentile
27 Calculating the interquartile range for high temperatures
interquartile range = 52 - 35 = 17
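A sketch of the same calculation in Python, using `statistics.quantiles` on hypothetical data. Software packages use different interpolation conventions for quartiles, so the values here need not match what SPSS or a hand calculation reports for the same data. The `method="inclusive"` choice is just one such convention.

```python
from statistics import quantiles

highs = [32, 32, 35, 40, 45, 52, 60]   # hypothetical data, not the lecture's

# n=4 asks for the three cut points that split the data into four groups
q1, q2, q3 = quantiles(highs, n=4, method="inclusive")
iqr = q3 - q1   # upper quartile minus lower quartile

print(q1, q2, q3, iqr)
```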
28 Interquartile range and outliers
- Value can be considered to be an outlier if it falls more than 1.5 times the interquartile range above the upper quartile or more than 1.5 times the interquartile range below the lower quartile
- Example for high temperatures
  - Interquartile range is 17
  - 1.5 times interquartile range is 25.5
  - Outliers would be values
    - Above 52 + 25.5 = 77.5 (none)
    - Below 35 - 25.5 = 9.5 (none)
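The cutoff arithmetic can be written as a small function. The name `outlier_fences` is hypothetical. The quartiles (35 and 52) come from the interquartile range example, and the minimum and maximum (32 and 60) are assumed from the range example.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower, upper) cutoffs: k times the IQR below Q1 and above Q3."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

lower, upper = outlier_fences(35, 52)   # quartiles from the high-temperature example
print(lower, upper)                     # 9.5 77.5

# Smallest and largest values taken from the range example (assumed min 32, max 60)
print(32 < lower or 60 > upper)         # False, so no outliers
```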
29 Comparison of range, standard deviation, and interquartile range
- Sensitivity to extreme values
  - Range extremely sensitive
  - Standard deviation very sensitive
  - Interquartile range not sensitive
- Standard deviation
  - Has desirable statistical properties
  - Suggests numbers of cases in different intervals for bell-shaped distributions
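The sensitivity ranking can be illustrated by replacing one value with an extreme one and recomputing all three measures. The data and the helper `spread` are hypothetical, and the quartile convention (`method="inclusive"`) is an arbitrary choice for this sketch.

```python
from statistics import quantiles, stdev

def spread(values):
    """Return (range, standard deviation, interquartile range)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return max(values) - min(values), stdev(values), q3 - q1

base = [32, 32, 35, 40, 45, 52, 60]   # hypothetical data
extreme = base[:-1] + [120]           # largest value replaced by an extreme one

print(spread(base))
print(spread(extreme))
```

In this example the range and the standard deviation both grow sharply, while the interquartile range is unchanged.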
30 Typical work session with SPSS
- Create working folder on C: drive using Windows Explorer
- Download SPSS .sav file to working folder
- Open .sav file in SPSS
- Select procedure from Analyze menu
- Select variable(s), specify options, run
- Print output, save output, or copy output to other documents
31 Entering new data
- Done in Data Editor window
- Click on Variable View tab to specify variable name, type, and other information
- Click on Data View tab to enter values in spreadsheet-like window
  - Variables are columns
  - Cases are rows
32 Saving data
- Use File, Save command while in Data Editor window to save data as SPSS data file
- Saves file with .sav extension
- Use File, Save As to save with new filename, as with any Windows program
33 Using existing SPSS datasets
- Use File, Open, Data command to open existing SPSS data file
- Opens new Data Editor window with data from new data file
34 Descriptive statistics using Descriptives
- Use Analyze, Descriptive Statistics, Descriptives to run procedure
- Select variables for which descriptive statistics are to be computed
- Use Options to select statistics to be computed
- Quick way to get basic descriptive statistics
- Compact display of results for multiple variables
- Limited number of statistics available
35 Descriptive statistics using Frequencies
- Use Analyze, Descriptive Statistics, Frequencies to run procedure
- Select variables
- For variables taking on large numbers of different values, uncheck Display frequency tables box
- Must select Statistics to specify which statistics to compute; default is none
- Wide range of statistics, including percentiles
36 Descriptive statistics using Explore
- Use Analyze, Descriptive Statistics, Explore to run procedure
- Select variables
- Click Statistics button if only statistics desired and not graphs
- Gives good assortment of statistics quickly, without need to specify choices
37 Saving SPSS output
- Use File, Save while in SPSS Output Viewer window to save output file
- File is saved with .spo extension
- Use File, Open, Output to load existing saved output file
38 Modifying SPSS output
- Can delete output by selecting it in outline in left panel and pressing Delete; keeps things organized after running something incorrectly
- Use Insert, New Text command to open box for adding text to output after currently selected location
- Allows adding notes, comments, answers to output
39 Printing SPSS output
- Use File, Print while in SPSS Output Viewer window to print output
- Selecting output in outline in left panel allows only selected output to be printed
- Can use normal Windows Shift-Click and Control-Click in selecting output
40 Copying SPSS output to other documents
- Select item to be copied, either in outline in left panel or in output itself
- Generally works best to copy one object at a time
- Choose the Edit, Copy command
- Can right-click on object and select command from pop-up menu
- Paste the output into the target document
- In some cases, using Paste Special to paste output in another format works better, e.g., as Windows Metafile into PowerPoint