Title: Data Management (1)
1Data Management (1)
Application of Information and Communication Technology to Production and Dissemination of Official Statistics
10 May to 11 July 2006
- M Q Hasan
- Lecturer/ Statistician
- UN Statistical Institute for Asia and the Pacific
- Chiba, Japan
- Email: hasan_at_unsiap.or.jp
2Overview
- Data management
- Data management planning
- Data management procedures
- Data management software
- Hands-on experience
- References
3Data management and the NSO
- Data management during production
  - Individual case
- Data management after production
  - Individual case
- Data management
  - All cases, long term
4Data management
- Management of data files
- Management of files during analysis
- Management of files afterwards
5Data management
- Management of data files
  - Labeling data files
  - Documentation
6Data management
- Management of files during analysis
  - Version management
  - Subset data
  - Arrange files in different folders
  - Index files
7Data management
- Management of files afterwards
  - Pass them to the system administrator for future reference
8DATA MANAGEMENT
9These will lead to
- Production of credible data
- Design of a robust, efficient, flexible and accessible storage system
- Efficient procedure for sharing data with others
10Data management before and during data processing
11During DP Planning
- Define the relevant aspects of a dataset.
- Formulate a data preservation strategy.
- Design an access procedure.
12Defining the relevant aspects of a dataset
- File format and file structure
- Naming files
- Creation and naming of variables
- Variable labels
13Defining the relevant aspects of a dataset
- Choose the file structure according to the available computing resources and the experience of the data processors.
14Defining the relevant aspects of a dataset
- Documentation
- Assign responsibility for logging all processing activities:
  - Problems encountered
  - How problems are to be solved
  - Major decisions taken
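The logging responsibility described on this slide could be supported by a simple structured log. The following is a minimal sketch only: the field names, the CSV layout, and the sample entry are assumptions made for illustration and are not prescribed by the course material.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields
from datetime import date


@dataclass
class LogEntry:
    """One row of a data processing log (field names are illustrative)."""
    logged_on: str    # ISO date the entry was written
    processor: str    # person responsible for the entry
    activity: str     # processing step carried out
    problem: str      # problem encountered, empty if none
    resolution: str   # how the problem is to be solved
    decision: str     # major decision taken, empty if none


def write_log(entries, stream):
    """Write log entries as CSV so the log can be stored with the dataset."""
    writer = csv.DictWriter(stream, fieldnames=[f.name for f in fields(LogEntry)])
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))


# Made-up entry, for illustration only.
entries = [
    LogEntry(date(2006, 5, 10).isoformat(), "processor A", "data entry check",
             "duplicate household IDs",
             "duplicates verified against questionnaires",
             "keep the first keyed record"),
]
buffer = io.StringIO()
write_log(entries, buffer)
log_text = buffer.getvalue()
```

In practice the stream would be a file kept alongside the data, so that the log travels with the dataset it describes.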
15DP Documentation
- Can be time consuming.
- Should contain all information about the data, such as survey method, sample information, time of collection, information about variables, missing values, etc.
- Should start well before actual data processing.
- Follow standards.
- Preferably one file with references to other files.
16DP Documentation
- Title: Child labour in Portugal: Social characterization of school-age children and their families, 1998.
- Subtitle: Child labour in Portugal, 1998.
- Alternative title: SIMPOC Portugal survey, 1998.
- Parallel title: Trabalho Infantil em Portugal: Caracterização social dos menores em idade escolar e suas famílias, 1998.
17DP Documentation
- Keywords. National survey, child, economic activity, child labour, household, household chores, etc.
- Abstract. Purpose, nature, and scope of the child labour data collection; special characteristics of the contents, etc.
- Time period covered. If the data were collected in 1999 and one question was "Did you work last year?", the time period should be 1998-99.
18DP Documentation
- Date of collection. Date(s) when the data were collected.
- Country. Name of the country where the survey was conducted.
- Geographic coverage. Total geographic scope of the data.
- Geographic unit. Lowest level of geographic aggregation covered by the data, for example province, state, or district.
- Unit of analysis. For most child labour surveys, the basic unit of analysis or observation is the individual person.
19DP Documentation
- Time method. Panel, cross-sectional, trend, time-series, etc.
- Data collector. Body responsible for administering the questionnaire or interview, or for compiling the data, e.g. the NSO.
- Frequency of data collection. For example, first-time.
- Sampling procedure. Reference to sampling documents.
20DP Documentation
- Mode of data collection. CAPI, CATI, etc.
- Type of research instrument. Structured, semi-structured, open-ended questions, etc.
- Actions to minimize losses. E.g. follow-up visits, supervisory checks, historical matching, etc.
- Control operations. Methods used to facilitate data control.
21DP Documentation
- Weighting. Reference to appropriate document.
- Cleaning operation. E.g. consistency checking, wild code checking, etc.
- Response rate. Percentage of sample members who provided information.
- Estimates of sampling error. Indication of how precisely one can estimate a population value from a given sample.
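The cleaning operations and the response rate named on this slide can be illustrated with a short sketch. The variable names, codes, records, the consistency rule, and the counts below are all made up for the example; an actual survey would take them from its own questionnaire and edit specifications.

```python
def wild_codes(records, variable, valid_codes):
    """Return (row number, value) pairs whose value is not a valid code."""
    return [(i, rec[variable]) for i, rec in enumerate(records, start=1)
            if rec[variable] not in valid_codes]


def inconsistent(records, rule):
    """Return row numbers of records that fail a logical consistency rule."""
    return [i for i, rec in enumerate(records, start=1) if not rule(rec)]


def response_rate(responding, sampled):
    """Percentage of sample members who provided information."""
    return 100.0 * responding / sampled


# Made-up records: sex coded 1/2, age in years, attends_school coded 1/2.
records = [
    {"sex": 1, "age": 9, "attends_school": 1},
    {"sex": 3, "age": 12, "attends_school": 2},   # wild code for sex
    {"sex": 2, "age": 3, "attends_school": 1},    # fails the rule below
]

bad_sex = wild_codes(records, "sex", {1, 2})

# Hypothetical rule: children under 5 should not be recorded as attending school.
bad_school = inconsistent(
    records, lambda r: not (r["age"] < 5 and r["attends_school"] == 1))

rate = response_rate(responding=940, sampled=1000)
```

Reporting row numbers instead of correcting values automatically keeps the decision with the data processor, who can record it in the processing log.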
22DP Documentation
- Location. Say where the data are currently stored (e.g. a national statistics office).
- Availability status. Provide a statement of data availability.
- Extent of data. Number of physical files that exist in a dataset.
- Completeness of dataset. Describe whether any items of collected information were not included in the data file.
23DP Documentation
- Access authority. Contact person or organization that controls access to the data collection.
- Data use statement. Reference to the terms of use for the data collection, if any.
- Citation requirement. Specify any text that should be cited in publications based on analysis of the data.
24DP Documentation
- File contents. Short description of the file(s).
- File structure. E.g. hierarchical, rectangular, relational, etc.
- Record or record group. Describe the record groupings for hierarchical or relational files.
- Label (of record). Detailed information for each record group.
- Dimensions (of record). Physical characteristics of the record, such as number of variables per record, number of cases, etc.
25DP Documentation
- Overall case count. Number of cases or observations.
- Overall variable count. Number of variables.
- Data format. Delimited format, free format, software dependent, etc.
- Missing data. Provide information such as whether missing data codes are standardized across the collection, whether missing data are the result of merging, etc.
- Software. Identify the software used to create the file, including the software version number.
- Version statement. Version statement for the data file.
26DP Documentation
- List of variables, with the following for each:
  - whether the variable is a weight and, if not, the reference weight variable for this variable
  - question ID for the variable
  - which format has been used (e.g. SAS, SPSS)
  - the number of decimal places in the variable
  - whether the options are discrete or continuous
  - which record type this variable belongs to
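The variable-level items listed above map naturally onto one record per variable. The sketch below is one possible layout, not a standard: the class, its field names, the checks, and the three sample variables are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VariableEntry:
    """Variable-level documentation; all field names are illustrative."""
    name: str
    question_id: str
    is_weight: bool
    weight_variable: Optional[str]   # reference weight when not itself a weight
    storage_format: str              # e.g. "SAS" or "SPSS"
    decimals: int
    measurement: str                 # "discrete" or "continuous"
    record_type: str


def check_codebook(entries):
    """Return a list of human-readable problems found in the variable list."""
    weights = {e.name for e in entries if e.is_weight}
    problems = []
    for e in entries:
        if e.measurement not in ("discrete", "continuous"):
            problems.append(f"{e.name}: measurement must be discrete or continuous")
        if e.decimals < 0:
            problems.append(f"{e.name}: negative number of decimals")
        if not e.is_weight and e.weight_variable not in weights:
            problems.append(f"{e.name}: reference weight variable is not documented")
    return problems


# Made-up variables; "wt_hours" is deliberately missing from the list.
codebook = [
    VariableEntry("wt_person", "", True, None, "SPSS", 4, "continuous", "person"),
    VariableEntry("age", "Q3", False, "wt_person", "SPSS", 0, "discrete", "person"),
    VariableEntry("hours", "Q17", False, "wt_hours", "SPSS", 1, "continuous", "person"),
]
problems = check_codebook(codebook)
```

A check of this kind catches documentation gaps, such as a reference to an undocumented weight, before the files are handed over for storage.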
27DP
Conversion of data files to other formats as required
- Data are usually generated in a package-specific format
- Convert data into other formats, if possible
- Convert data into ASCII and generate a codebook
- Reload the ASCII data using the same codebook
- Recheck the data
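The convert, reload, and recheck steps above can be sketched as a round trip through delimited ASCII text. This is a simplified illustration under stated assumptions: the codebook is reduced to a mapping from variable name to type, the comma-delimited layout is one choice among several, and the records are made up.

```python
import csv
import io


def export_ascii(records, codebook):
    """Write records as comma-delimited ASCII, columns ordered by the codebook."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(list(codebook))
    for rec in records:
        writer.writerow([rec[name] for name in codebook])
    text = buffer.getvalue()
    text.encode("ascii")  # raises UnicodeEncodeError if the data are not pure ASCII
    return text


def reload_ascii(text, codebook):
    """Reload the ASCII text, converting each column to its codebook type."""
    converters = {"int": int, "float": float, "str": str}
    reader = csv.DictReader(io.StringIO(text))
    return [{name: converters[kind](row[name]) for name, kind in codebook.items()}
            for row in reader]


# Hypothetical codebook: variable name -> type used when reloading.
codebook = {"hh_id": "str", "age": "int", "weight": "float"}
records = [
    {"hh_id": "0001", "age": 9, "weight": 1.25},
    {"hh_id": "0002", "age": 12, "weight": 0.8},
]

ascii_text = export_ascii(records, codebook)
reloaded = reload_ascii(ascii_text, codebook)
round_trip_ok = reloaded == records   # the "recheck" step
```

The household identifier shows why the codebook matters: without the `str` type, reloading would turn `0001` into the number 1 and lose the leading zeros.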
28DATA MANAGEMENT
Storage of all files
- Possible list/type of files
  - Data in a package-specific format
  - Data in ASCII with the necessary data dictionary
  - Public use data
  - Public use data in ASCII with the necessary data dictionary
  - Final documentation
  - Questionnaire
29DATA MANAGEMENT
Storage of all files
- Possible list/type of files (contd.)
  - Logical rules for consistency checks
  - Computer program files
  - Interviewer and/or supervisor instruction manual
  - Coding file(s)
  - Sampling and weight files
30DATA MANAGEMENT
Storage of all files
- Group them by version, type, etc.
- Create an index file for each sub-directory.
- In the index file, add a short description of each file according to its contents.
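One way to create an index file for each sub-directory is sketched below. The index file name `INDEX.txt`, the tab-separated layout, and the placeholder text for undescribed files are assumptions for illustration; descriptions are looked up by file name only, so files sharing a name in different folders would share a description.

```python
from pathlib import Path

INDEX_NAME = "INDEX.txt"  # hypothetical name for the index file


def write_index(directory, descriptions):
    """Write an index listing each file in one directory with a short description."""
    directory = Path(directory)
    lines = []
    for path in sorted(directory.iterdir()):
        if path.is_file() and path.name != INDEX_NAME:
            note = descriptions.get(path.name, "NO DESCRIPTION YET")
            lines.append(f"{path.name}\t{note}")
    index_path = directory / INDEX_NAME
    index_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return index_path


def index_tree(root, descriptions):
    """Create one index file in root and in every sub-directory below it."""
    root = Path(root)
    folders = [root] + [p for p in sorted(root.rglob("*")) if p.is_dir()]
    return [write_index(folder, descriptions) for folder in folders]
```

A call such as `index_tree("survey_1998", {"survey.csv": "Final data, ASCII"})` would then refresh every index under that folder; the folder name here is only an example.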
31DATA MANAGEMENT
Formulating a data preservation strategy
- Hardware
- Automation software
- Directory structure
35DATA MANAGEMENT
Designing an access procedure
- Access policy
  - Safekeeping person: system administrator
  - Contact person: supervisor
  - Content modifying authority: supervisor
- Finalize the access conditions for each file
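The access policy above could be written down in a machine-readable form so that files lacking an access condition are easy to spot. In this sketch the three roles come from the slide, while the dictionary keys, the file names, and the condition labels "restricted" and "public" are made up for illustration.

```python
# Roles taken from the slide; the keys are illustrative.
ACCESS_POLICY = {
    "safekeeping": "system administrator",
    "contact": "supervisor",
    "modify_content": "supervisor",
}

# Hypothetical access condition recorded for each file.
ACCESS_CONDITIONS = {
    "survey_final.sav": "restricted",
    "survey_public.csv": "public",
}


def may_modify(role):
    """Only the content modifying authority may change file contents."""
    return role == ACCESS_POLICY["modify_content"]


def files_without_condition(file_names):
    """List files whose access condition has not been finalized yet."""
    return sorted(name for name in file_names if name not in ACCESS_CONDITIONS)


pending = files_without_condition(
    ["survey_final.sav", "survey_public.csv", "codebook.txt"])
```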
36DATA DISSEMINATION
Data type
- Micro data
- Aggregate tables
- Executive summary
- Reports
37DATA DISSEMINATION
Methods
- Online: direct access through the internet in real time
- Offline: available on request
38DATA MANAGEMENT
Designing an access procedure
- Backup policy
  - During data processing
    - Data processor's responsibility
  - After finalization of data and documentation
    - System administrator's responsibility
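Whoever is responsible for a backup needs a way to confirm that the copy matches the original. The sketch below copies a file and compares SHA-256 checksums; the function names and the flat backup-folder layout are assumptions, not part of the policy described on the slide.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_file(source, backup_dir):
    """Copy a file into backup_dir and verify the copy against the original."""
    source = Path(source)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / source.name
    shutil.copy2(source, target)   # copy2 also preserves file timestamps
    if sha256_of(source) != sha256_of(target):
        raise IOError(f"backup of {source.name} does not match the original")
    return target
```

Using a dated backup folder, for example one per processing day, would keep earlier versions available during data processing.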