Quantitative Data Preparation Alasdair Crockett, ESDS Data Services Manager - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Quantitative Data Preparation Alasdair Crockett, ESDS Data Services Manager


1
Quantitative Data Preparation
Alasdair Crockett, ESDS Data Services Manager
2
What characterises a good quantitative dataset?
  • i) Accurate data
  • ii) Well labelled data
  • iii) Well documented data
  • iv) Data that can be stored in user-friendly
    dissemination formats, but can also be archived
    in a future-proof preservation format

3
Accuracy of data: validation checks
  • Computer aided surveys (CAPI, CATI or CAWI)
      • These are the most accurate way of gathering
        survey data, but the software (e.g. Blaise) and
        hardware (e.g. a laptop for every interviewer)
        may be beyond project resources.
      • Computer aided surveys allow one to build in as
        many logical checks, on question routing and
        responses, as is possible at the point of data
        creation.
  • Non computer aided surveys
      • Less control over initial responses, but checks
        can be performed:
          • At the point of data entry/transcription, if
            data entry software is used. However, there are
            few cheap data entry packages around.
          • The only feasible option may be to enter data
            without checks directly into a spreadsheet style
            interface (e.g. Excel worksheet, SPSS data view),
            and perform validation checks afterwards, via
            command files in statistical packages or Visual
            Basic code in Excel or Access.
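
The kind of after-the-fact validation described above can be sketched in a few lines. The slides mention command files in statistical packages or Visual Basic; Python is used here purely for illustration. The records, the variable names `employed` and `hours`, and the rule that hours are only asked of respondents with `employed == 1` are all hypothetical assumptions, not taken from any real survey.

```python
# Hypothetical records as they might be typed into a spreadsheet.
# Variable names and codes are illustrative assumptions.
records = [
    {"id": 1, "p1sex": 1, "employed": 1, "hours": 38},
    {"id": 2, "p1sex": 3, "employed": 2, "hours": None},  # invalid sex code
    {"id": 3, "p1sex": 2, "employed": 2, "hours": 40},    # routing error
    {"id": 4, "p1sex": 2, "employed": 1, "hours": 400},   # out of range
]

VALID_SEX = {1, 2, -8, -9}


def validate(record):
    """Return a list of problems found in one record."""
    problems = []
    # Response check: the code must be one of the permitted values.
    if record["p1sex"] not in VALID_SEX:
        problems.append("p1sex: invalid code")
    # Routing check (assumed rule): hours is only asked when employed == 1.
    if record["employed"] != 1 and record["hours"] is not None:
        problems.append("hours: answered but respondent routed past it")
    # Range check: a week has 168 hours, so anything outside 0-168 is an error.
    if record["hours"] is not None and not 0 <= record["hours"] <= 168:
        problems.append("hours: out of range")
    return problems


# Keep only the records that have at least one problem.
report = {}
for record in records:
    problems = validate(record)
    if problems:
        report[record["id"]] = problems

for record_id, problems in report.items():
    print(record_id, problems)
```

Each rule mirrors a check that a computer aided survey could have enforced at the point of data creation; here the same rules are simply applied to the whole file afterwards.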

4
An example of data seemingly untouched by the
human eye
  • Originating error in text variables
      • Occupation: sole trader
      • Description of Occupation: purveyor of seafood
  • Propagated error in derived numeric variables
  • Respondent was coded under the Standard
    Industrial Classification (SIC) code relating to food
    retailers:
  • 52.2 Retail sale of food, beverages and tobacco
    in specialised stores

5
Labelling of data I
  • All variables should be named. Variable names
    should not exceed 8 characters where possible, as
    the most common format for disseminating data is
    SPSS, and older versions of SPSS limit variable
    names to 8 characters.
  • All variables should be labelled. Labels should
    be brief (preferably < 80 characters), but
    precise, and should always make explicit the unit of
    measurement for continuous (interval) variables.
    Where possible, all variable labels should
    reference the question number (and if necessary
    the questionnaire). For example, the variable
    q11bhexc might have the label "q11b hours spent
    taking physical exercise in a typical week". This
    gives the unit of measurement and a reference to
    the question number (q11b), so the user can
    quickly and easily cross-reference to it.
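
These naming and labelling rules lend themselves to a mechanical check. The following is a minimal sketch, not a standard tool: only `q11bhexc` and its label come from the slide, while the other two variables are hypothetical, and the pattern used to detect a question number (a label beginning with "q", digits and an optional letter) is an assumed convention. Whether a label states the unit of measurement still needs a human reader.

```python
import re

# Only q11bhexc is from the slide; the other entries are hypothetical.
variable_labels = {
    "q11bhexc": "q11b hours spent taking physical exercise in a typical week",
    "householdincome": "annual income",
    "q3age": "",
}


def check_variable(name, label, max_name=8, max_label=80):
    """Return a list of labelling problems for one variable."""
    problems = []
    if len(name) > max_name:
        problems.append(f"name exceeds {max_name} characters")
    if not label:
        problems.append("label missing")
        return problems
    if len(label) >= max_label:
        problems.append(f"label is not under {max_label} characters")
    # Assumed convention: labels start with a question number such as "q11b".
    if not re.match(r"q\d+[a-z]?\b", label):
        problems.append("label does not start with a question number")
    return problems


problems_by_variable = {
    name: check_variable(name, label) for name, label in variable_labels.items()
}
for name, problems in problems_by_variable.items():
    print(name, problems or "ok")
```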

6
Labelling of data II
  • For categorical variables, all codes (values)
    should be given a brief label (preferably < 60
    characters). For example, p1sex (gender of person
    1) might have these value labels: 1 = male, 2 =
    female, -8 = don't know, -9 = not answered.
  • Where possible, all such labelling should be
    created and supplied to the UKDA as part of the
    data file itself. This is the expectation with
    data supplied in one of the three major
    statistical packages - SPSS, STATA or SAS.
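
A quick way to confirm that every code in a categorical variable has a value label is to compare the codes found in the data with the labelled codes. This is a sketch only: the `p1sex` labels follow the example above, while the data values (including the stray code 3) are made up for illustration.

```python
# Value labels follow the p1sex example; the data values are hypothetical.
value_labels = {
    "p1sex": {1: "male", 2: "female", -8: "don't know", -9: "not answered"},
}
data = {"p1sex": [1, 2, 2, -9, 3, 1]}


def unlabelled_codes(data, value_labels):
    """Map each variable to the sorted codes that occur without a label."""
    missing = {}
    for variable, values in data.items():
        labelled = set(value_labels.get(variable, {}))
        unlabelled = sorted(set(values) - labelled)
        if unlabelled:
            missing[variable] = unlabelled
    return missing


missing = unlabelled_codes(data, value_labels)
print(missing)
```

Any code reported here is either a data entry error or a category that still needs a label before the file is deposited.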

7
Documentation
  • Core documentation
      • Questionnaire.
      • Methodology: details of sample design, response
        rate, etc.
      • Codebook, i.e. a comprehensive list of
        variable names, variable descriptions, code names
        and variable formatting information. This is
        essential if the package being used for data
        management does not allow this sort of variable
        and code labelling to be stored within the data
        file.
      • Technical report describing the research
        project.
  • Other useful documentation that is seldom
    supplied
      • Code used to create derived variables or check
        data (e.g. SPSS, STATA or SAS command files).
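
Where the data management package cannot store labels in the data file, a plain-text codebook can be generated from a simple description of the variables. The sketch below assumes a hypothetical metadata structure (the `name`, `label`, `format` and `values` keys are illustrative, not a UKDA format) and reuses the two example variables from the labelling slides.

```python
# Hypothetical metadata structure; the keys are illustrative assumptions.
variables = [
    {
        "name": "p1sex",
        "label": "gender of person 1",
        "format": "numeric",
        "values": {1: "male", 2: "female", -8: "don't know", -9: "not answered"},
    },
    {
        "name": "q11bhexc",
        "label": "q11b hours spent taking physical exercise in a typical week",
        "format": "numeric",
        "values": {},
    },
]


def codebook_lines(variables):
    """Build one header line per variable, then one line per labelled code."""
    lines = []
    for var in variables:
        lines.append(f"{var['name']}  {var['label']}  [{var['format']}]")
        for code, text in var["values"].items():
            lines.append(f"    {code} = {text}")
    return lines


codebook = codebook_lines(variables)
print("\n".join(codebook))
```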

8
Good and bad data documentation formats
  • For full details for all types of data see:
  • http://www.data-archive.ac.uk/depositingData/howtoDeposit.asp#format