Data analysis: Explore - PowerPoint PPT Presentation

About This Presentation

Slides: 39
Provided by: test323
Learn more at: https://www.unodc.org


Transcript and Presenter's Notes



1
Data analysis: Explore
GAP Toolkit 5
Training in basic drug abuse data
management and analysis
Training session 9
2
Objectives
  • To define a standard set of descriptive
    statistics used to analyse continuous variables
  • To examine the Explore facility in SPSS
  • To introduce the analysis of a continuous
    variable according to values of a categorical
    variable, an example of bivariate analysis
  • To introduce further SPSS Help options
  • To reinforce the use of SPSS syntax

3
SPSS Descriptive Statistics
  • Analyse/Descriptive Statistics/Frequencies
  • Analyse/Descriptive Statistics/Explore
  • Analyse/Descriptive Statistics/Descriptives

4
Exercise: continuous variable
  • Generate a set of standard summary statistics for
    the continuous variable Age

5
Explore Age
6
Explore Descriptive Statistics
Descriptives
7
Exercise: Help
  • What's This?
  • Results Coach
  • Case Studies

8
Measures of central tendency
  • The most commonly used measures are:
    • Mode
    • Median
    • Mean
    • 5 per cent trimmed mean

9
The mode
  • The mode is the most frequently occurring value
    in a dataset
  • Suitable for the nominal level of measurement and
    above
  • Example:
    • The mode of the variable for the first most
      frequently used drug is Alcohol, with 717 cases,
      approximately 46 per cent of valid responses
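The same calculation can be sketched in Python. The list `first_drug` below is hypothetical illustration data, not the data behind the 717 Alcohol cases. `Counter.most_common` gives the most frequent value and its count, and `statistics.multimode` lists every value tied for the top count, which shows when a variable has more than one mode.

```python
from collections import Counter
from statistics import multimode

# Hypothetical responses for a nominal variable (illustration only)
first_drug = ["Alcohol", "Cannabis", "Alcohol", "Heroin", "Alcohol", "Cannabis"]

counts = Counter(first_drug)
mode_value, mode_count = counts.most_common(1)[0]
share = mode_count / len(first_drug)  # proportion of valid responses

print(mode_value, mode_count, f"{share:.0%}")  # Alcohol 3 50%
print(multimode(first_drug))                   # ['Alcohol']
```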

10
Bimodal
  • Describes a distribution in which two categories
    have a large number of cases
  • Example:
    • The distribution of Employment is bimodal: the
      employment and unemployment categories have
      similar numbers of cases, and more cases than
      any other category

11
The median
  • The middle value when the data are ordered from
    low to high is the median
  • Half the data values lie below the median and
    half above
  • Because the data have to be ordered, the median is
    not suitable for nominal data, but it is suitable
    for ordinal levels of measurement and above

12
Example: median
  • Seizures of opium in Germany, 1994-1998
    (kilograms)
  • Source: United Nations (2000). World Drug Report
    2000 (United Nations publication, Sales No.
    GV.E.00.0.10).

13
  • Sort the seizure data in ascending order
  • The middle value is the median: the median annual
    seizure of opium for Germany between 1994 and
    1998 was 42 kilograms

Rank:         1    2    3    4    5
Kilograms:   15   36   42   45  286
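A minimal Python sketch of the same steps, using the five annual seizure values (kilograms) listed on the mean slide: sort, then take the middle value. `statistics.median` does the sorting itself and, for an even number of values, averages the two middle ones.

```python
from statistics import median

# Annual opium seizures in Germany, 1994-1998 (kilograms)
seizures_kg = [36, 15, 45, 42, 286]

ranked = sorted(seizures_kg)         # [15, 36, 42, 45, 286]
middle = ranked[len(ranked) // 2]    # middle value of an odd-sized list
print(middle)                        # 42
print(median(seizures_kg))           # 42 (statistics.median sorts internally)
```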
14
The mean
  • Add the values in the data set and divide by the
    number of values
  • The mean is only truly applicable to interval and
    ratio data, as it involves adding the values
  • It is sometimes applied to ordinal data or
    ordinal scales constructed from a number of
    Likert scales, but this requires the assumption
    that the difference between the values in the
    scale is the same, e.g. between 1 and 2 is the
    same as between 5 and 6

15
Example: mean
  • Seizures of opium in Germany, 1994-1998
  • Sample size: n = 5
  • Sum: 36 + 15 + 45 + 42 + 286 = 424
  • Mean: 424 / 5 = 84.8 kilograms
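The same arithmetic in Python, first by hand and then with the standard library's `statistics.mean`:

```python
from statistics import mean

# Annual opium seizures in Germany, 1994-1998 (kilograms)
seizures_kg = [36, 15, 45, 42, 286]

total = sum(seizures_kg)    # 424
n = len(seizures_kg)        # 5
print(total / n)            # 84.8
print(mean(seizures_kg))    # 84.8
```

The mean (84.8 kg) is about twice the median (42 kg) because the single large value of 286 kg pulls it upwards.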

16
The 5 per cent trimmed mean
  • The 5 per cent trimmed mean is the mean
    calculated on the data set with the top 5 per
    cent and bottom 5 per cent of values removed
  • An estimator that is more resistant to outliers
    than the mean
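A sketch of a trimmed mean in Python. Five per cent of a small sample is usually a fraction of a case (0.25 of a case when n = 5), so this version assumes the boundary values are down-weighted by the fraction trimmed rather than rounded away. Other software, SPSS included, may handle the fractional case differently, so treat the result as illustrative. The function name `trimmed_mean` is hypothetical.

```python
def trimmed_mean(values, proportion=0.05):
    """Mean after trimming `proportion` of the cases from each end.

    Assumption: a fractional case at either end is kept with a
    reduced weight instead of being rounded in or out.
    """
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("need at least one value")
    lower = proportion * n   # amount trimmed from the bottom (in cases)
    upper = n - lower        # position where the top trim starts
    total = 0.0
    for i, x in enumerate(data):
        # Weight = how much of case i (spanning positions i..i+1) survives
        weight = max(0.0, min(i + 1, upper) - max(i, lower))
        total += weight * x
    return total / (upper - lower)

seizures_kg = [36, 15, 45, 42, 286]
print(trimmed_mean(seizures_kg))                  # 77.5
print(trimmed_mean(seizures_kg, proportion=0.0))  # 84.8 (the ordinary mean)
```

With this weighting the 5 per cent trimmed mean of the seizure data is 77.5 kg, between the mean (84.8 kg) and the median (42 kg).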

17
95 per cent confidence interval for the mean
  • An indication of the expected error (precision)
    when estimating the population mean with the
    sample mean
  • In repeated sampling, intervals calculated in this
    way around the sample mean would contain the
    population mean in about 95 out of every 100
    samples
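A minimal Python sketch of the usual t-based interval, mean ± t × (standard deviation / √n), applied to the five seizure values. The critical value 2.776 for 4 degrees of freedom is hard-coded from standard t tables because the Python standard library has no t distribution; the name `T_CRIT_DF4` is hypothetical.

```python
import math
from statistics import mean, stdev

seizures_kg = [36, 15, 45, 42, 286]
n = len(seizures_kg)

# Two-sided 95% critical value of Student's t with n - 1 = 4 degrees of
# freedom, taken from standard t tables (hard-coded assumption)
T_CRIT_DF4 = 2.776

m = mean(seizures_kg)
se = stdev(seizures_kg) / math.sqrt(n)   # standard error of the mean
lower, upper = m - T_CRIT_DF4 * se, m + T_CRIT_DF4 * se
print(f"95% CI: {lower:.1f} to {upper:.1f} kg")
```

With only five values and one extreme case the interval is very wide, and its lower limit even falls below zero, a reminder that this interval assumes roughly normal data.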

18
Measures of dispersion
  • The range
  • The inter-quartile range
  • The variance
  • The standard deviation

19
The range
  • A measure of the spread of the data
  • Range = maximum - minimum
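For the opium seizure values used on the median and mean slides, the range is 286 - 15 = 271 kilograms; a one-line Python check:

```python
seizures_kg = [36, 15, 45, 42, 286]   # kilograms, Germany 1994-1998

data_range = max(seizures_kg) - min(seizures_kg)
print(data_range)   # 271
```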

20
Quartiles
  • 1st quartile: 25 per cent of the values lie below
    the value of the 1st quartile and 75 per cent
    above
  • 2nd quartile: the median; 50 per cent of the
    values lie below and 50 per cent above
  • 3rd quartile: 75 per cent of the values lie below
    and 25 per cent above

21
Inter-quartile range
  • IQR = 3rd quartile - 1st quartile
  • The inter-quartile range measures the spread or
    range of the mid 50 per cent of the data
  • Ordinal level of measurement or above
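A Python sketch of the quartiles and inter-quartile range for the five seizure values. `statistics.quantiles` with `method="exclusive"` (its default) interpolates at position (n + 1)p in the sorted data. Packages differ in how they interpolate between ranked values, and SPSS can also report Tukey's hinges, so quartiles for a sample this small may not match SPSS output exactly.

```python
from statistics import quantiles

seizures_kg = [36, 15, 45, 42, 286]   # kilograms, Germany 1994-1998

# Cut points dividing the sorted data into four equal-probability groups
q1, q2, q3 = quantiles(seizures_kg, n=4, method="exclusive")
iqr = q3 - q1

print(q1, q2, q3)   # 25.5 42.0 165.5
print(iqr)          # 140.0
```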

22
Variance
  • The average squared difference from the mean
    (for a sample, the sum of squared differences is
    divided by n - 1 rather than n)
  • Measured in units squared
  • Requires interval or ratio levels of measurement

23
Standard deviation
  • The square root of the variance
  • Returns the units to those of the original
    variable

24
Example: standard deviation and variance
Seizures of opium in Germany, 1994-1998
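A Python sketch of the variance and standard deviation for the five seizure values, computed by hand and then with the `statistics` module. It divides the sum of squared deviations by n - 1, the usual sample convention (and, as far as we know, the one SPSS uses).

```python
import math
from statistics import mean, variance, stdev

seizures_kg = [36, 15, 45, 42, 286]   # kilograms, Germany 1994-1998
m = mean(seizures_kg)

# Sum of squared deviations from the mean, divided by n - 1
squared_devs = [(x - m) ** 2 for x in seizures_kg]
var_by_hand = sum(squared_devs) / (len(seizures_kg) - 1)

print(round(var_by_hand, 1))            # 12787.7 (kilograms squared)
print(round(variance(seizures_kg), 1))  # 12787.7
print(round(stdev(seizures_kg), 1))     # 113.1 (kilograms)
print(math.isclose(math.sqrt(var_by_hand), stdev(seizures_kg)))  # True
```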
25
Distribution or shape of the data
  • The normal distribution
  • Skewness
    • Positive or right-hand skewed
    • Negative or left-hand skewed
  • Kurtosis
    • Platykurtic
    • Mesokurtic
    • Leptokurtic
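A Python sketch of skewness and kurtosis coefficients. The formulas are the bias-adjusted sample versions (often written G1 and G2), which, to our understanding, are the ones SPSS reports; kurtosis is given as excess kurtosis, so a normal distribution scores about 0. The function names `skewness` and `kurtosis` are hypothetical.

```python
from statistics import mean, stdev

def skewness(values):
    """Adjusted skewness G1: > 0 right-hand skew, < 0 left-hand skew."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 values")
    m, s = mean(values), stdev(values)
    if s == 0:
        raise ValueError("all values are identical")
    z3 = sum(((x - m) / s) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * z3

def kurtosis(values):
    """Excess kurtosis G2: about 0 mesokurtic, > 0 leptokurtic, < 0 platykurtic."""
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 values")
    m, s = mean(values), stdev(values)
    if s == 0:
        raise ValueError("all values are identical")
    z4 = sum(((x - m) / s) ** 4 for x in values)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * z4
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

seizures_kg = [36, 15, 45, 42, 286]   # kilograms, Germany 1994-1998
print(skewness(seizures_kg), kurtosis(seizures_kg))
```

For the seizure data both coefficients are positive: the single large value gives a right-hand (positive) skew and a leptokurtic shape. For the symmetric values 1 to 5, skewness is 0 and kurtosis is -1.2 (platykurtic).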

26
The normal distribution
  • Symmetrical data: the mean, the median and the
    mode coincide

27
Right-hand skew (+)
  • Right-hand skew: the extreme large values drag
    the mean towards them

28
Left-hand skew (-)
  • Left-hand skew: the extreme small values drag the
    mean towards them

29
Bivariate analysis
  • Continuous Dependent Variable
  • Categorical Independent Variable
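A sketch of the idea in Python: split the continuous dependent variable (age) by the categories of the independent variable (gender) and summarise each group, roughly what Explore produces for `age BY gender`. The records are hypothetical, for illustration only.

```python
from collections import defaultdict
from statistics import mean, median, stdev

# Hypothetical (gender, age) records, for illustration only
records = [("Male", 23), ("Female", 31), ("Male", 35), ("Female", 27),
           ("Male", 41), ("Female", 22), ("Male", 29)]

# Group the continuous dependent variable (age) by the categorical
# independent variable (gender)
ages_by_gender = defaultdict(list)
for gender, age in records:
    ages_by_gender[gender].append(age)

# Summary statistics for each category
for gender, ages in sorted(ages_by_gender.items()):
    print(f"{gender}: n={len(ages)}, mean={mean(ages):.1f}, "
          f"median={median(ages)}, sd={stdev(ages):.1f}")
```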

30
Explore
31
Explore Options button
32
Explore Plots button
33
Explore Statistics button
34
Descriptives
35
Male Female
36
Boxplot of Age vs Gender
(labelled: outlier, median, inter-quartile range)
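A Python sketch of how boxplot outliers can be flagged. It assumes the box runs between Tukey's hinges and follows the convention SPSS uses for boxplots, as we understand it: cases 1.5 to 3 box lengths beyond the edge of the box are outliers, and cases more than 3 box lengths beyond it are extremes. The function names `tukey_hinges` and `flag_outliers` are hypothetical.

```python
import math

def tukey_hinges(values):
    """Lower and upper hinges (Tukey), found by counting in from each end."""
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("need at least one value")
    depth = (math.floor((n + 1) / 2) + 1) / 2   # 1-based depth of each hinge
    i = math.floor(depth) - 1                    # 0-based index from the bottom
    if depth == int(depth):
        return data[i], data[n - 1 - i]
    # Half-integer depth: average the two neighbouring values
    return ((data[i] + data[i + 1]) / 2,
            (data[n - 1 - i] + data[n - 2 - i]) / 2)

def flag_outliers(values):
    """Return (value, label) pairs for outliers and extremes."""
    lower, upper = tukey_hinges(values)
    box = upper - lower                          # box length
    flags = []
    for x in values:
        distance = max(lower - x, x - upper, 0)  # how far outside the box
        if distance > 3 * box:
            flags.append((x, "extreme"))
        elif distance > 1.5 * box:
            flags.append((x, "outlier"))
    return flags

seizures_kg = [36, 15, 45, 42, 286]   # kilograms, Germany 1994-1998
print(tukey_hinges(seizures_kg))   # (36, 45)
print(flag_outliers(seizures_kg))  # [(15, 'outlier'), (286, 'extreme')]
```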
37
Syntax: Explore
  EXAMINE VARIABLES=age BY gender
    /ID=id
    /PLOT BOXPLOT HISTOGRAM
    /COMPARE GROUP
    /STATISTICS DESCRIPTIVES
    /CINTERVAL 95
    /MISSING LISTWISE
    /NOTOTAL.

38
Summary
  • Measures of central tendency
  • Measures of variation
  • Quantiles
  • Measures of shape
  • Bivariate analysis for a categorical independent
    variable and continuous dependent variable
  • Histograms
  • Boxplots