Title: Data documentation guidelines
1Data documentation guidelines
- Ole Olsen, Helene Feveile
- National Institute of Occupational Health, Denmark
- European Conference on Quality in Survey Statistics
- Cardiff, 24-26 April 2006
2Outline of talk
- 1. Motivate research colleagues
- (and management)
- 2. Offer guidelines
- 3. How do other institutions cope with the
challenge?
3Newspaper headlines I
- Figures from Interpol reveal that Sweden is among the most violent countries.
- 7 September 2003
- By JAKOB RUBIN
- As neighbours we often think of Sweden as a peaceful country
- Forget it.
- Our neighbouring country is among the most violent European countries and even outrivals the USA, statistics show
4Newspaper text
- Swedish criminologist comments that the Interpol figures must be wrong
- Nevertheless the article rambles on
- Interpol for some reason has come to include attempted murder
5Newspaper headlines II
- Danish school children shirk (skive) twice as often, a new study shows
- New laws and regulations suggested
- Much discussion
- It turned out that the questionnaire was wrongly translated (last two weeks vs. last four weeks)
- Could something similar happen at XXX?
6Could this happen at XXX?
- "Exactly what data are we analysing right now?"
- The raw, exciting and JUST arrived data?
- The cleaned data?
- The partly cleaned data?
- The "final" dataset?
- The real final dataset?
7Could this happen at XXX?
- ... And was it ...
- All returned questionnaires?
- Only those without too many missing?
- Those my colleague said were the "final" ones?
- From the T-drive??
8Could this happen at XXX?
- And when do you ask yourself?
- When explaining why the press release does not fit with the data?
- When writing the article and trying to verify the (according to memory) very exciting findings?
- When the (otherwise very positive) peer reviewer asks for just one additional piece of information?
- When preparing for follow-up analysis?
9Could this happen at XXX?
- "Unpredictable" events might occur
- The student who cleaned the data leaves
- The statistician takes a job in the medical industry
- The project manager suffers a severe concussion
- Etc.
10How to prevent all this? Data documentation!!
12Yellow front page
13.IH
15Mail from the library (when it is ready again!?)
17Highlights from our manual
- 3.8.1. Preparation of a database
- 3.8.2. Documentation
- Read it twice
- First, when the project starts up
- Second, as a checklist while the project is
making progress
18Highlights from our manual
- 3.8.2. Documentation
- Purpose
- Chain of events
- Content
19Highlights from our manual
- 3.8.2. Documentation
- Purpose
- Any skilled researcher without prior knowledge about the project should be able to use the dataset based on only the data documentation
- Special warning
- No statistical analyses before data processing and documentation have been finalized!
- No matter how tempting this is!
20Highlights from our manual
- 3.8.2. Documentation
- Chain of events
- Begin data documentation when the project starts
- Create ONE file with documents and print-outs for
documentation of data processing and creation of
the final database
21Highlights from our manual
- 3.8.2. Documentation
- Content
- 1. Purpose
- 2. Design and study population
- 3. Research group
- 4. Overview of variables
- 5. Description of structure of data set
- 6. Analysis of non-response
- 7. Copy of questionnaire with variable names inserted
- 8. Description of data-processing
- 9. Description of all lasting recodings and new variables
- 10. Variable list (proc contents) and formats.
- 11. Marginal distributions
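Items 10 and 11 of the content list can be produced mechanically from the final dataset. The slides refer to SAS (proc contents); the block below is only a minimal Python sketch of the same idea, not the authors' procedure. The toy records, the variable names (`id`, `sex`, `shift_work`) and the helper functions `variable_list` and `marginal_distribution` are all hypothetical and chosen for illustration.

```python
from collections import Counter

# Hypothetical toy records standing in for a cleaned survey dataset.
ROWS = [
    {"id": 1, "sex": "F", "shift_work": "yes"},
    {"id": 2, "sex": "M", "shift_work": "no"},
    {"id": 3, "sex": "F", "shift_work": None},  # None marks a missing answer
    {"id": 4, "sex": "F", "shift_work": "no"},
]


def variable_list(rows):
    """Return one entry per variable: name, observed types, missing count."""
    names = list(rows[0].keys())
    summary = []
    for name in names:
        values = [row.get(name) for row in rows]
        present = [v for v in values if v is not None]
        types = sorted({type(v).__name__ for v in present})
        summary.append(
            {"name": name, "types": types, "n_missing": len(values) - len(present)}
        )
    return summary


def marginal_distribution(rows, name):
    """Count how often each value of one variable occurs, missing included."""
    return Counter(
        "(missing)" if row.get(name) is None else row[name] for row in rows
    )


if __name__ == "__main__":
    # Item 10: variable list.
    for entry in variable_list(ROWS):
        print(entry)
    # Item 11: marginal distributions, one table per variable.
    print(dict(marginal_distribution(ROWS, "sex")))
    print(dict(marginal_distribution(ROWS, "shift_work")))
```

Printing these tables into the ONE documentation file at the moment the final database is created ties the documentation to a specific version of the data.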
22Responsibility
- The project manager is responsible for data documentation
- (the data manager is not responsible!)
- (even though part of the work is done by the data manager)
- The research director is co-responsible for allocation of sufficient
- time
- resources
23Are we re-inventing the wheel?
- Are all these efforts needed?
27Survey methods in community medicine
- ...
- 24. Collecting the data 231
- 25. Processing the data 235
- 26. Interpreting the findings 245
- ...
- 25. Processing the data 235
- Coding and data entry 236
- Data processing 237
- Statistical analysis 238
28 Data processing
- 5th ed, 1999
- A good data set does not present surprises when
it is used. Measures that may be taken ...
29For discussion
- Are the efforts needed?
- Level of ambition
- Allocation of time and resources
- Documentation
- on paper
- or electronically
- Responsibility
- Other comments
31Value of Flow Diagrams
- Design and Setting
- Analysis of 270 reports of RCTs published in 1998 in top-four medical journals
- Conclusions
- Flow diagrams are associated with improved quality of reporting of randomized controlled trials.
- The structure of current flow diagrams is less than ideal.
- We propose a revised flow diagram
- Egger, Jüni, and Bartlett, for the CONSORT Group. JAMA. 2001;285:1996-1999
35Introduction to Survey Quality
- Preface
- ... very few survey workers are academically
trained in survey research ...... academic
training in survey methods is lagging behind
...... the goal of the book is to address the
need for a nontechnical comprehensive
introduction ...
36Introduction to Survey Quality
- "Since a critical role of the survey industry is
to provide input to worlds leaders for decision
making, it is imperative that the data generated
be of such quality that they can serve as a basis
for informed decisions. - The methods to assure good quality should be
known and accesible to all serious survey
organizations. - Today, this is unfortunately not always the case,
which is our primary motive and purpose for
writing this book."