Title: Janez
1Janez Štebe DDI Experience in ADP (2002)
- Arhiv družboslovnih podatkov (ADP)
- University of Ljubljana
- E-mail
- arhiv.podatkov_at_uni-lj.si
- URL
- http://www.adp.fdv.uni-lj.si
- MOST (UNESCO) and GESIS workshop, Berlin, 22-24
February 2002
3Topics of a presentation
- A brief history of technical standards and their influence on data archive organisation
- The adoption of DDI in 1999
- Advantages and disadvantages of using an existing but still emerging standard
- What are XML and DDI?
- Quick look inside DDI DTD document structure
- DDI XML Codebooks production line in ADP
- Discussion
4A brief history of data archives' technical standards (Tanenbaum, Taylor 1990)
- Late 1950s: IBM cards
- Easily reproduced and recycled: the advent of data archives (DA)
- 1960s: electronic computers, the end of storage standards
- The task of data conversion and interchange: DA matured
5Beginning of the WWW era in the early 1990s (DDI Committee, 2001)
- CSSDA electronic codebook specification
- OSIRIS Codebook Dictionary (SRC,ICPSR)
- Standard study description
- But lack of coordination resulted in incompatible catalogues
6Midwife function (Scheuch, 1990)
- The role of ZA in the late 1960s, when five new archives were established in Europe:
- offers to share experience, especially of past errors
- technical information on data storage and retrieval
7Situation in 1997, when ADP was established
- Multiplicity of classificatory languages, search techniques and standards for documenting data (DDI Committee, 2001)
- Every organisation adopted its own dialect of existing standards
- The CESSDA IDC functioned as a lone example of still-living integrating efforts
8But... DDI was under discussion
- March 1999: the DDI Beta version became operable
- ADP applied for a grant which secured six months of intensive learning and practice in producing its own XML codebooks
- Results:
- Successful implementation of the first ten XML codebooks
- Enhancement of the production line for routine codebook production.
92000 - 2001
- Preparation of our own XSL for XML Codebook presentation on the internet
- March 2000: DDI DTD Version 1.0 was published
- Machine conversion of DDI DTD Beta XML Codebooks into version 1.0
- Continuing production of XML Codebooks
10NESSTAR
- Meanwhile, the NESSTAR tool was being refined in parallel; it promises to add functionality to a growing collection of XML codebooks
- End of 2001: configuration of the ADP NESSTAR server catalogue
12Advantages and disadvantages of using an existing but still emerging standard
- No need to (re)invent local cataloguing rules
- Cooperation in document production (sharing documents between sites)
- A danger of being left alone if others do not adopt the same standard
- Less capability to add specific emphasis according to local needs
13+ / -
- Use of existing and emerging software tools suitable for the standard environment
- Virtual catalogue
- Conversion tools from SPSS and CAI software files
- Dependency on others' timetables for tool production
- E.g. NESSTAR was late in fully adopting the UTF-8 convention, which was crucial for us
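Why UTF-8 support is crucial for Slovene text can be shown with a small Python sketch. The sample name is taken from the title slide; everything else is illustrative. It shows that a Slovene letter such as Š cannot be written in Latin-1 at all, and that UTF-8 bytes read with the wrong codec are silently corrupted rather than rejected.

```python
# Slovene letters such as Š fall outside ISO-8859-1 (Latin-1).
name = "Štebe"

# UTF-8 stores Š as two bytes.
utf8_bytes = name.encode("utf-8")

# Reading those bytes with the wrong codec raises no error;
# it quietly yields different characters.
misread = utf8_bytes.decode("latin-1")

# Latin-1 has no code for Š, so encoding the name fails outright.
try:
    name.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# Decoding with the right codec restores the original text.
roundtrip = utf8_bytes.decode("utf-8")

print(repr(misread), latin1_ok, roundtrip)
```

A tool that assumes a single-byte encoding therefore damages such codebooks without warning.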
14What is XML?
- XML is to a document's intellectual content what HTML is to the physical structure of that document (Thomas, Block 2001)
15Why XML?
- XML documents can be produced without professional or expert knowledge (user-friendly)
- It is ready for preparing presentations in multiple formats, e.g. printed book, internet etc.
- It can be filled in by different authors, each with specialist knowledge of their subject area. All obey the same content structure.
16DDI DTD <> XML?
- DTD: XML Document Type Definition
- DDI DTD: a special Data Documentation Initiative XML Codebook definition
- A Codebook XML document must be well-formed and valid
17Well-formed
- Any XML document, e.g. HTML, can be well-formed in accordance with the XML syntax
- Main features: <tags> must be closed</tags>
- Case-sensitive (UPPER/lower) naming
- Only one <tag-name ID="id-entry"> per document (each ID value must be unique; strictly speaking, a validating parser checks this)
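The first two rules above can be tried out with any XML parser. The following is a minimal sketch using Python's standard library; the helper name `is_well_formed` and the sample strings are invented for illustration. The parser rejects an unclosed tag and a closing tag whose case does not match. It does not check ID uniqueness, because the standard-library parser is non-validating.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts `text` as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Every opening tag has a matching closing tag.
closed = "<codeBook><docDscr>text</docDscr></codeBook>"

# <docDscr> is never closed.
unclosed = "<codeBook><docDscr>text</codeBook>"

# Names are case-sensitive: </DocDscr> does not close <docDscr>.
wrong_case = "<codeBook><docDscr>text</DocDscr></codeBook>"

for sample in (closed, unclosed, wrong_case):
    print(is_well_formed(sample))
```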
18Valid = well-formed +
- Conforms to a specific DTD
- Example: an underlined path calls ...
- <!DOCTYPE codeBook SYSTEM "CONFIG10/CODEBOOK-EN.DTD">
- <codeBook>
- <docDscr> ...
19... a file "CONFIG10/CODEBOOK-EN.DTD" (content of the file) ...
<!ELEMENT codeBook (docDscr , stdyDscr , fileDscr , dataDscr , otherMat) >
<!ATTLIST codeBook a.global >
...
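To give a feel for what "conforming to the DTD" means, here is a rough Python sketch that checks only the order of the top-level sections against the sequence quoted above. It is not DTD validation: Python's standard library has no validating parser, and the occurrence indicators of the content model are not visible in this transcript, so repeated or missing sections are not judged. The function name `sections_in_order` is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Sequence copied from the content model quoted above. How often each
# section may occur is not visible here, so only the order is checked.
SECTION_ORDER = ["docDscr", "stdyDscr", "fileDscr", "dataDscr", "otherMat"]


def sections_in_order(xml_text: str) -> bool:
    """True if all children of <codeBook> are known sections in DTD order."""
    root = ET.fromstring(xml_text)
    if root.tag != "codeBook":
        return False
    last = 0
    for child in root:
        if child.tag not in SECTION_ORDER:
            return False
        position = SECTION_ORDER.index(child.tag)
        if position < last:
            return False
        last = position
    return True


good = "<codeBook><docDscr/><stdyDscr/><dataDscr/></codeBook>"
swapped = "<codeBook><dataDscr/><stdyDscr/></codeBook>"
unknown = "<codeBook><stdyDscr/><notes/></codeBook>"

for sample in (good, swapped, unknown):
    print(sections_in_order(sample))
```

In real production work a validating XML editor or parser does this job against the full DTD.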
20What does it all mean?
- You do not have to look into the machine-readable codebook.DTD file to fill in an XML Codebook
- An XML editor helps to check well-formedness and document validity
- It helps in choosing appropriate elements in accordance with the DTD while editing
- A human-readable Tag Library consists of element definitions with practical examples. It gives you guidance on the type and form of information
21Let's look
- Inside DDI DTD document structure...
22Integrates different levels of information in the same document
- docDscr (XML document and sources description)
- stdyDscr (overall study-level references)
- fileDscr (physical data files)
- dataDscr (variables)
- otherMat (additional material for variables documentation)
23It specifies both...
- The content of a catalogue, suitable as input to a virtual catalogue of different sites, produced on various platforms.
- The content of a codebook (variable descriptions), suitable as input to a virtual library of all individual measurements in the studies in a collection
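The two uses can be sketched in Python: the same codebook yields one study-level record for a catalogue and one record per variable for a variable library. The sample document is invented and heavily abbreviated. The element names follow the DDI codebook structure as far as this sketch assumes it; check them against the Tag Library before relying on them.

```python
import xml.etree.ElementTree as ET

# Invented, heavily abbreviated codebook for illustration only.
SAMPLE = """<codeBook>
  <stdyDscr><citation><titlStmt>
    <titl>Example Survey 2001</titl>
  </titlStmt></citation></stdyDscr>
  <dataDscr>
    <var ID="V1" name="AGE"><labl>Age of respondent</labl></var>
    <var ID="V2" name="SEX"><labl>Sex of respondent</labl></var>
  </dataDscr>
</codeBook>"""


def catalogue_entry(root):
    """Study-level view: one record per codebook."""
    return {"title": root.findtext("stdyDscr/citation/titlStmt/titl")}


def variable_entries(root):
    """Variable-level view: one record per <var> element."""
    return [
        {"id": var.get("ID"),
         "name": var.get("name"),
         "label": var.findtext("labl")}
        for var in root.iterfind("dataDscr/var")
    ]


root = ET.fromstring(SAMPLE)
print(catalogue_entry(root))
print(variable_entries(root))
```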
24A dilemma of Library vs. Data service concept (Scheuch, 1990)
- Library: the unit of storage is the study
- Data service: the unit of storage is the variable
25In a DDI DTD XML codebook you can integrate
meta-information about...
- Intellectual content of a study
- Its scope
- Methodological details
- Retrieval and dissemination policies
- File location and format
26(...) References to accompanying documents, e.g.
- Reports on methodology,
- Publications,
- Classification lists,
- Questionnaires and similar,
- Computer syntax files,
- Tables of results,
- etc.
27(...) Hyperlink cross-references inside and outside the document
- The use of ID and IDREF(S) attributes
- The use of URI attributes
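How ID and IDREFS attributes tie parts of a document together can be sketched as follows. The sample document and the helper `check_references` are invented. Treating `files` as the referencing attribute on `var` is an assumption of this sketch. A validating parser performs these checks itself; the code only makes the rule visible.

```python
import xml.etree.ElementTree as ET

# Invented sample: V2 points at a file description that does not exist.
SAMPLE = """<codeBook>
  <fileDscr ID="F1"/>
  <dataDscr>
    <var ID="V1" name="AGE" files="F1"/>
    <var ID="V2" name="SEX" files="F9"/>
  </dataDscr>
</codeBook>"""


def check_references(root, ref_attr="files"):
    """Return (duplicate IDs, references that point at no ID)."""
    seen, duplicates = set(), []
    for elem in root.iter():
        ident = elem.get("ID")
        if ident is None:
            continue
        if ident in seen:
            duplicates.append(ident)
        seen.add(ident)
    dangling = []
    for elem in root.iter():
        # An IDREFS value is a whitespace-separated list of IDs.
        for target in (elem.get(ref_attr) or "").split():
            if target not in seen:
                dangling.append((elem.get("ID"), target))
    return duplicates, dangling


duplicates, dangling = check_references(ET.fromstring(SAMPLE))
print(duplicates, dangling)
```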
28To sum up
- XML is similar to HTML in that it is
- Easy to use,
- Broadly accessible,
- Hyper-textual
- In addition it has
- A structure of document content that both computers and humans can read and understand
29DDI XML Codebooks production line in ADP
- First step
- Basic information about a new data set file, depositor, and accompanying material is first entered in the ADP Inventory book (an Access database)
- After choosing the best-suited predefined XML DDI Codebook template, we extract the information from the Access database into the draft XML Codebook
- The resulting codebook is moved to an Internet catalogue for quick information about the new study; viewing is supported by the referenced XSL through IE5 or better.
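The first step can be pictured with a small Python sketch. It is not ADP's actual export routine: the record fields, the stylesheet name `codebook.xsl` and the function `draft_codebook` are hypothetical. The element nesting is an assumption that should be checked against the Tag Library. It shows inventory data being poured into a draft codebook that carries a stylesheet reference for browser viewing.

```python
import xml.etree.ElementTree as ET

# Hypothetical inventory record; the real Access table layout is not given.
record = {
    "study_id": "ADP-0001",
    "title": "Example Survey 2001",
    "depositor": "Example Institute",
}


def draft_codebook(rec, stylesheet="codebook.xsl"):
    """Build a draft codebook string that points at an XSL stylesheet."""
    root = ET.Element("codeBook")
    ET.SubElement(root, "docDscr")
    citation = ET.SubElement(ET.SubElement(root, "stdyDscr"), "citation")
    titl_stmt = ET.SubElement(citation, "titlStmt")
    ET.SubElement(titl_stmt, "titl").text = rec["title"]
    ET.SubElement(titl_stmt, "IDNo").text = rec["study_id"]
    dist_stmt = ET.SubElement(citation, "distStmt")
    ET.SubElement(dist_stmt, "depositr").text = rec["depositor"]
    # The processing instruction tells a browser which XSL to apply.
    header = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<?xml-stylesheet type="text/xsl" href="{stylesheet}"?>\n'
    )
    return header + ET.tostring(root, encoding="unicode")


xml_text = draft_codebook(record)
print(xml_text)
```

The empty sections are then completed in the later steps.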
30Second step: full study description
- The depositor is asked to fill in an MS Word form containing elements corresponding to the DDI DTD study description section
- The draft XML Codebook from the previous step is edited with the XMetaL XML editor. Missing pieces of information are added manually
31Third step: codebook data description generated from the SPSS data file
- The final SPSS data file, if fully labelled, is converted with the NSD XML Generator into the XML data description section of the DDI Codebook and integrated with the previous study description
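What such a conversion produces can be sketched in Python. This is not the NSD XML Generator and does not read SPSS files: the `variables` list is a hypothetical stand-in for the labels stored in a fully labelled file, and `build_data_dscr` is an invented helper. The sketch shows why full labelling matters, since variable and value labels are all the converter has to work with.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the labels stored in a fully labelled SPSS file.
variables = [
    {"name": "SEX", "label": "Sex of respondent",
     "values": {1: "Male", 2: "Female"}},
    {"name": "AGE", "label": "Age in years", "values": {}},
]


def build_data_dscr(var_defs):
    """Turn variable and value labels into a <dataDscr> element."""
    data_dscr = ET.Element("dataDscr")
    for number, var_def in enumerate(var_defs, start=1):
        var = ET.SubElement(
            data_dscr, "var", ID=f"V{number}", name=var_def["name"]
        )
        ET.SubElement(var, "labl").text = var_def["label"]
        # One <catgry> per labelled code value.
        for code, text in var_def["values"].items():
            catgry = ET.SubElement(var, "catgry")
            ET.SubElement(catgry, "catValu").text = str(code)
            ET.SubElement(catgry, "labl").text = text
    return data_dscr


data_dscr = build_data_dscr(variables)
print(ET.tostring(data_dscr, encoding="unicode"))
```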
32Fourth step: codebook data description with full question text
- For the most important data sets, the full question text is entered into the data description (dataDscr) section from the original questionnaire text file
- or
- by using a conversion tool from CAI computer-readable files to DDI XML files.
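Adding question text to an existing data description might look like the following Python sketch. The `questions` mapping stands in for text taken from a questionnaire file or a CAI export; its format and the helper `add_question_text` are hypothetical. The DTD fixes where `qstn` may appear among the children of `var`, so a real document should be validated after editing.

```python
import xml.etree.ElementTree as ET

# Invented data description as it might look after the third step.
data_dscr = ET.fromstring(
    '<dataDscr>'
    '<var ID="V1" name="SEX"><labl>Sex</labl></var>'
    '<var ID="V2" name="AGE"><labl>Age</labl></var>'
    '</dataDscr>'
)

# Hypothetical mapping from variable name to questionnaire wording.
questions = {"AGE": "How old are you?"}


def add_question_text(section, question_by_name):
    """Attach <qstn><qstnLit> to each <var> that has question text."""
    added = 0
    for var in section.iterfind("var"):
        text = question_by_name.get(var.get("name"))
        if text:
            qstn = ET.SubElement(var, "qstn")
            ET.SubElement(qstn, "qstnLit").text = text
            added += 1
    return added


added = add_question_text(data_dscr, questions)
print(added, ET.tostring(data_dscr, encoding="unicode"))
```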
33Finally: NESSTAR
- The final two documents, the Slovene and English language DDI XML Codebooks, are converted into a NESSTAR-compliant format and, together with the data file, published in a NESSTAR catalogue.
34Human- and computer-readable
[Diagram relating human-readable and computer-readable components. Labels: Codebook.xsl; IE Explorer view; printed codebook; Tag Library; NESSTAR Catalogue, Data Explorer; Codebook.dtd; stdyDscr form filled in by depositor; original paper documents; Codebook.xml (XML editor); SPSS data labels, CAI quest.; docDscr, stdyDscr, fileDscr, dataDscr, otherMat ...; free-text documents; conversion tools]
35Common issues in DDI XML codebooks production
- XML editors do not necessarily support Unicode
- The use of entities in an XML document helps to standardise document production and makes it faster and easier to translate into English
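The entity mechanism can be illustrated with a short Python sketch. The entity names, their replacement text and the simplified element nesting are invented for illustration. Recurring phrases are declared once and referenced wherever needed, so an English version can be produced by swapping the declarations rather than editing every occurrence.

```python
import xml.etree.ElementTree as ET

# Entity names and replacement text are invented for illustration;
# the element nesting is simplified.
CODEBOOK = """<!DOCTYPE codeBook [
  <!ENTITY archive "Arhiv družboslovnih podatkov (ADP)">
  <!ENTITY mode "Face-to-face interview">
]>
<codeBook>
  <stdyDscr>
    <distrbtr>&archive;</distrbtr>
    <collMode>&mode;</collMode>
  </stdyDscr>
</codeBook>"""

# The parser replaces each &name; reference with its declared text.
root = ET.fromstring(CODEBOOK)
distributor = root.findtext("stdyDscr/distrbtr")
mode = root.findtext("stdyDscr/collMode")

print(distributor)
print(mode)
```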
36Conclusions
- The DDI DTD is receiving growing attention in the community, which guarantees the production of new tools for enhancing its use
- Despite continuing developments and overlapping archival standards, DDI 1.0 as today's technology promises the longevity of XML Codebook 1.0 documents
- The Slovene ADP has taken its experience with DDI as guidance for its organisation.
37Main references
- DDI Committee (2001) The Data Documentation Initiative (DDI) Version 1.1: The New Specification for Social Science Metadata. Project Description.
- Data Documentation Initiative. A Project of a Social Science Community. (2002) http://www.icpsr.umich.edu/DDI
- Scheuch, Erwin K. (1990) From a data archive to an infrastructure for the social sciences. International Social Science Journal, No. 123, pp. 93-111.
- Tanenbaum, Eric and Marcia Taylor (1990) Developing social science archives. International Social Science Journal, No. 124?.
- Thomas, Wendy L. and William C. Block (2001) An Introduction to the Data Documentation Initiative (DDI). ICPSR OR Meeting 2001. http://www.icpsr.umich.edu/DDI/PAPERS/