Janez - PowerPoint PPT Presentation

1
Janez Štebe: DDI Experience in ADP (2002)
  • Arhiv družboslovnih podatkov (ADP)
  • University of Ljubljana
  • E-mail: arhiv.podatkov@uni-lj.si
  • URL: http://www.adp.fdv.uni-lj.si
  • MOST (UNESCO) and GESIS workshop, Berlin, 22-24
    February 2002

2
(No Transcript)
3
Topics of the presentation
  • A brief history of technical standards and their
    influence on the organisation of data archives
  • The adoption of DDI in 1999
  • Advantages and disadvantages of using an existing
    but still emerging standard
  • What are XML and DDI?
  • A quick look inside the DDI DTD document structure
  • The DDI XML codebook production line in ADP
  • Discussion
  • Discussion

4
A brief history of data archives' technical
standards (Tanenbaum and Taylor, 1990)
  • Late 1950s: IBM punched cards
  • Easily reproduced and recycled: the advent of data
    archives (DA)
  • 1960s: electronic computers, and the end of storage
    standards
  • With the task of data conversion and interchange,
    data archives matured

5
The beginning of the WWW era in the early 1990s (DDI
Committee, 2001)
  • CSSDA electronic codebook specification
  • OSIRIS Codebook Dictionary (SRC, ICPSR)
  • Standard Study Description
  • But a lack of coordination resulted in
    incompatible catalogues

6
Midwife function (Scheuch, 1990)
  • The role of the ZA in the late 1960s, when five new
    archives were established in Europe
  • Offered to share experience, especially of past
    errors
  • Technical information on data storage and
    retrieval

7
The situation in 1997, when ADP was established
  • A multiplicity of classificatory languages, search
    techniques and standards for documenting data
    (DDI Committee, 2001)
  • Every organisation adopted its own dialect of
    existing standards
  • The CESSDA IDC functioned as the lone example of
    a still-living integration effort

8
But... DDI was under discussion
  • March 1999: the DDI Beta version became operational
  • ADP applied for a grant which secured six months
    of intensive learning and practice in producing
    its own XML codebooks
  • Results:
  • Successful implementation of the first ten XML
    codebooks
  • Enhancement of the production line for routine
    codebook production

9
2000 - 2001
  • Preparation of our own XSL stylesheet for
    presenting XML codebooks on the Internet
  • March 2000: DDI DTD Version 1.0 was published
  • Machine conversion of DDI DTD Beta XML codebooks
    into version 1.0
  • Continuing production of XML codebooks

10
NESSTAR
  • Meanwhile, the NESSTAR tool was being refined in
    parallel; it promises to add functionality to a
    growing collection of XML codebooks
  • End of 2001: configuration of the ADP NESSTAR
    server catalogue

11
(No Transcript)
12
Advantages and disadvantages of using an existing
but still emerging standard
  • No need to (re)invent local cataloguing rules
  • Cooperation in document production (sharing
    documents between sites)
  • The danger of being left alone if others do not
    adopt the same standard
  • Less scope to add specific emphasis according to
    local needs

13
+ / -
  • Use of existing and emerging software tools
    suitable for the standard environment
  • Virtual catalogue
  • Conversion tools from SPSS and CAI software files
  • Dependency on others' timetables for tool
    production
  • E.g. NESSTAR was late in fully adopting the UTF-8
    convention, which was crucial for us
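To make the UTF-8 point concrete: Slovene letters with a caron (č, š, ž) fall outside ISO-8859-1 (Latin-1), so a tool limited to Latin-1 cannot store them, while UTF-8 covers them. The Python sketch below only tests which encodings can represent a sample string; the sample text and the choice of cp1250 (the Windows Central European code page) as a comparison are illustrative assumptions, not part of the ADP setup.

```python
# Slovene letters with a caron (č, š, ž) are why full UTF-8 support mattered.
SLOVENE_SAMPLE = "Arhiv družboslovnih podatkov, Janez Štebe"


def encodable(text: str, encoding: str) -> bool:
    """Return True if every character of text can be encoded with the codec."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


for enc in ("latin-1", "cp1250", "utf-8"):
    print(f"{enc:8} {encodable(SLOVENE_SAMPLE, enc)}")

# A caron letter such as "š" (U+0161) takes two bytes in UTF-8.
print(len("š".encode("utf-8")))
```

On this sample Latin-1 fails while cp1250 and UTF-8 succeed; only UTF-8, however, also covers scripts beyond Central Europe.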

14
What is XML?
  • XML is to a document's intellectual content what
    HTML is to the physical structure of that
    document (Thomas and Block, 2001)

15
Why XML?
  • XML documents can be produced without professional
    or expert knowledge (user-friendly)
  • It is ready for preparing presentations in multiple
    formats, e.g. a printed book, the Internet, etc.
  • It can be filled in by different authors, each with
    specialist knowledge of their subject area. All
    obey the same content structure.
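The multiple-format point can be illustrated with one source document rendered two ways. ADP used an XSL stylesheet for this; the Python sketch below substitutes two small rendering functions for XSL to keep the example self-contained. The sample codebook and its contents are hypothetical; only the element names (titl, var, labl) follow DDI.

```python
import html
import xml.etree.ElementTree as ET

# One hypothetical source document; element names follow DDI conventions.
SOURCE = """<codeBook>
  <stdyDscr><citation><titlStmt><titl>Example study</titl></titlStmt></citation></stdyDscr>
  <dataDscr>
    <var name="V1"><labl>Age</labl></var>
    <var name="V2"><labl>Sex</labl></var>
  </dataDscr>
</codeBook>"""


def variables(root: ET.Element) -> list:
    """Collect (name, label) pairs for every variable."""
    return [(var.get("name"), var.findtext("labl")) for var in root.iter("var")]


def to_text(root: ET.Element) -> str:
    """Render a plain-text (printable) view."""
    lines = [root.findtext("stdyDscr/citation/titlStmt/titl")]
    lines += [f"  {name}: {label}" for name, label in variables(root)]
    return "\n".join(lines)


def to_html(root: ET.Element) -> str:
    """Render an HTML (web) view of the same content."""
    title = html.escape(root.findtext("stdyDscr/citation/titlStmt/titl"))
    items = "".join(
        f"<li>{html.escape(name)}: {html.escape(label)}</li>"
        for name, label in variables(root)
    )
    return f"<h1>{title}</h1><ul>{items}</ul>"


root = ET.fromstring(SOURCE)
print(to_text(root))
print(to_html(root))
```

Both views read the same elements, so a correction made once in the XML appears in every output format.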

16
DDI DTD <> XML?
  • DTD = XML Document Type Definition
  • DDI DTD = a special Data Documentation Initiative
    XML codebook definition
  • A codebook XML document must be well-formed and
    valid

17
Well-formed
  • Any markup document, e.g. HTML, can be made
    well-formed in accordance with the XML syntax
  • Main features: <tags> must be closed </tags>
  • Case-sensitive (UPPER/lower case) naming
  • Only one <tag-name ID="id-entry"> per document,
    i.e. each ID value must be unique
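A minimal way to test the first rules is to hand a document to a non-validating XML parser, which rejects unclosed tags and case mismatches. Strictly speaking, ID uniqueness is checked only by a validating parser that knows from the DTD which attributes are of type ID, so the helper below approximates it by assuming the attribute is literally named ID, as in DDI. The sketch uses Python's standard xml.etree.ElementTree; the sample strings are illustrative.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """True if text parses as XML (syntax only, no DTD validation)."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


def duplicate_ids(text: str) -> set:
    """Return ID values used more than once.

    Assumes the ID-typed attribute is literally named "ID", as in DDI codebooks.
    """
    seen, duplicates = set(), set()
    for element in ET.fromstring(text).iter():
        value = element.get("ID")
        if value is None:
            continue
        if value in seen:
            duplicates.add(value)
        seen.add(value)
    return duplicates


print(is_well_formed("<codeBook><docDscr/></codeBook>"))  # True
print(is_well_formed("<codeBook><docDscr></codeBook>"))   # False: docDscr not closed
print(is_well_formed("<codeBook></CodeBook>"))            # False: case mismatch
print(duplicate_ids('<codeBook><var ID="V1"/><var ID="V1"/></codeBook>'))  # {'V1'}
```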

18
Valid: well-formed, plus
  • Conforms to a specific DTD
  • Example: the underlined path calls ...
  • <!DOCTYPE codeBook SYSTEM "CONFIG10/CODEBOOK-EN.DTD">
  • <codeBook>
  • <docDscr> ...

19
... a file "CONFIG10/CODEBOOK-EN.DTD" (content of
the file) ...
<!ELEMENT codeBook
  (docDscr , stdyDscr , fileDscr , dataDscr , otherMat) >
<!ATTLIST codeBook
  %a.global; >
...
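Python's standard library does not validate against a DTD (normally the job of a validating parser or XML editor). As a rough sketch of what validation means at the top level, the code below checks the children of codeBook against the sequence shown on the slide. It handles only a flat sequence with optional ?, * and + indicators; the transcript shows no indicators, so each section is required exactly once here, which may be stricter than the published DDI DTD.

```python
import re
import xml.etree.ElementTree as ET

# Content model of codeBook as shown in the transcript (no occurrence indicators
# are visible there; the published DTD may use ?, * or +).
CODEBOOK_MODEL = "(docDscr , stdyDscr , fileDscr , dataDscr , otherMat)"


def model_to_regex(model: str) -> re.Pattern:
    """Turn a flat DTD sequence such as '(a, b?, c*)' into a regex over child names."""
    pieces = []
    for part in model.strip("() ").split(","):
        part = part.strip()
        name = part.rstrip("?*+")
        indicator = part[len(name):]  # "", "?", "*" or "+"
        pieces.append(f"(?:{re.escape(name)} ){indicator}")
    return re.compile("".join(pieces))


def conforms(xml_text: str, model: str = CODEBOOK_MODEL, root_name: str = "codeBook") -> bool:
    """Check the root element name and the order of its children against the model."""
    root = ET.fromstring(xml_text)
    if root.tag != root_name:
        return False
    child_names = "".join(f"{child.tag} " for child in root)
    return model_to_regex(model).fullmatch(child_names) is not None


good = "<codeBook><docDscr/><stdyDscr/><fileDscr/><dataDscr/><otherMat/></codeBook>"
bad = "<codeBook><stdyDscr/><docDscr/><fileDscr/><dataDscr/><otherMat/></codeBook>"
print(conforms(good), conforms(bad))  # True False
```

Passing a different model string, e.g. "(docDscr?, stdyDscr, fileDscr, dataDscr, otherMat)", makes a section optional without changing the code.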
20
What does it all mean?
  • You do not have to read the machine-readable
    codebook DTD file in order to fill in an XML codebook
  • An XML editor helps check well-formedness and
    document validity
  • It helps in choosing appropriate elements in
    accordance with the DTD while editing
  • A human-readable Tag Library consists of
    element definitions with practical examples. It
    gives guidance on the type and form of information

21
Let's look
  • Inside the DDI DTD document structure...

22
It integrates different levels of information in the
same document
  • docDscr (XML document and sources description)
  • stdyDscr (overall study-level references)
  • fileDscr (physical data files)
  • dataDscr (variables)
  • otherMat (additional material for variable
    documentation)
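The five sections can be generated as an empty skeleton in document order, which is roughly what a codebook template starts from. This Python sketch uses xml.etree.ElementTree; the comment placed inside each section only repeats the short descriptions from the list and is not DDI content.

```python
import xml.etree.ElementTree as ET

# Top-level DDI codebook sections in document order, with short annotations.
SECTIONS = {
    "docDscr": "XML document and sources description",
    "stdyDscr": "overall study-level references",
    "fileDscr": "physical data files",
    "dataDscr": "variables",
    "otherMat": "additional material for variable documentation",
}


def codebook_skeleton() -> ET.Element:
    """Build an empty codeBook with one child per section, each annotated by a comment."""
    root = ET.Element("codeBook")
    for tag, meaning in SECTIONS.items():
        section = ET.SubElement(root, tag)
        section.append(ET.Comment(meaning))
    return root


skeleton = codebook_skeleton()
ET.indent(skeleton)  # pretty-printing; available from Python 3.9
print(ET.tostring(skeleton, encoding="unicode"))
```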

23
It specifies both...
  • The content of a catalogue: suitable as input to a
    virtual catalogue of different sites, produced on
    various platforms
  • The content of a codebook (variable descriptions):
    suitable as input to a virtual library of all
    individual measurements in the studies of a
    collection

24
The dilemma of the library vs. the data service
concept (Scheuch, 1990)
  • The unit of storage is the study
  • The unit of storage is the variable
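Because a DDI codebook carries both levels, the same documents can feed either kind of catalogue. In this Python sketch a catalogue with the study as the unit has one entry per codebook, while one with the variable as the unit has one entry per var element across all codebooks, which is what makes a search such as "all variables labelled Age" possible. The two codebooks are hypothetical; the paths (stdyDscr/citation/titlStmt/titl and IDNo, var, labl) follow DDI element names.

```python
import xml.etree.ElementTree as ET

# Two tiny, hypothetical codebooks using DDI element names.
CODEBOOKS = [
    """<codeBook><stdyDscr><citation><titlStmt>
         <titl>Study A</titl><IDNo>A1</IDNo></titlStmt></citation></stdyDscr>
       <dataDscr><var name="V1"><labl>Age</labl></var>
                 <var name="V2"><labl>Sex</labl></var></dataDscr></codeBook>""",
    """<codeBook><stdyDscr><citation><titlStmt>
         <titl>Study B</titl><IDNo>B1</IDNo></titlStmt></citation></stdyDscr>
       <dataDscr><var name="Q1"><labl>Age</labl></var></dataDscr></codeBook>""",
]


def study_catalogue(docs):
    """Study as the unit of storage: one (ID, title) entry per codebook."""
    entries = []
    for doc in docs:
        root = ET.fromstring(doc)
        entries.append((root.findtext("stdyDscr/citation/titlStmt/IDNo"),
                        root.findtext("stdyDscr/citation/titlStmt/titl")))
    return entries


def variable_catalogue(docs):
    """Variable as the unit of storage: one (study ID, name, label) entry per var."""
    entries = []
    for doc in docs:
        root = ET.fromstring(doc)
        study_id = root.findtext("stdyDscr/citation/titlStmt/IDNo")
        for var in root.iter("var"):
            entries.append((study_id, var.get("name"), var.findtext("labl")))
    return entries


print(study_catalogue(CODEBOOKS))
# [('A1', 'Study A'), ('B1', 'Study B')]
print([entry for entry in variable_catalogue(CODEBOOKS) if entry[2] == "Age"])
# [('A1', 'V1', 'Age'), ('B1', 'Q1', 'Age')]
```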

25
In a DDI DTD XML codebook you can integrate
meta-information about...
  • Intellectual content of a study
  • Its scope
  • Methodological details
  • Retrieval and dissemination policies
  • File location and format

26
(continued) References to accompanying documents, e.g.
  • Reports on methodology,
  • Publications,
  • Classification lists,
  • Questionnaires and similar,
  • Computer syntax files,
  • Tables of results,
  • etc.

27
(continued) Hyperlink cross-references inside and
outside the document
  • The use of ID and IDREF(S) attributes
  • The use of URI attributes
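Inside a document, an IDREF attribute points at the element carrying the matching ID value, while a URI attribute can point outside the document. The Python sketch below builds an index of ID values and resolves space-separated references, flagging ones that point nowhere. The fragment is hypothetical, including the use of an attribute named files on var to refer to a fileDscr and the example.org URI; the Tag Library documents which reference attributes the DTD actually defines.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the "files" attribute on var is assumed to hold
# space-separated IDREFS pointing at fileDscr IDs.
DOC = """<codeBook>
  <fileDscr ID="F1" URI="http://example.org/data/study1.sav"/>
  <dataDscr>
    <var ID="V1" name="V1" files="F1"><labl>Age</labl></var>
    <var ID="V2" name="V2" files="F1 F9"><labl>Sex</labl></var>
  </dataDscr>
</codeBook>"""


def id_index(root: ET.Element) -> dict:
    """Map every ID value in the document to its element."""
    return {el.get("ID"): el for el in root.iter() if el.get("ID") is not None}


def resolve_idrefs(root: ET.Element, attr: str) -> dict:
    """For each element carrying attr, resolve its space-separated references.

    Returns {source ID: [(reference, target element or None), ...]}.
    """
    index = id_index(root)
    resolved = {}
    for el in root.iter():
        refs = el.get(attr)
        if refs is not None:
            resolved[el.get("ID")] = [(ref, index.get(ref)) for ref in refs.split()]
    return resolved


root = ET.fromstring(DOC)
for source, targets in resolve_idrefs(root, "files").items():
    for ref, target in targets:
        where = target.get("URI") if target is not None else "dangling reference"
        print(source, ref, where)
```

Here V2's second reference, F9, has no matching ID and is reported as dangling; a validating parser would reject the document for the same reason.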

28
To sum up
  • XML is similar to HTML in that it is
  • Easy to use,
  • Broadly accessible,
  • Hypertextual
  • In addition, it has
  • A computer- and human-readable and understandable
    structure of document content

29
The DDI XML codebook production line in ADP
  • First step
  • Basic information about a new data set file, its
    depositor, and accompanying material is first
    entered in the ADP Inventory book (an MS Access
    database)
  • After choosing the best-suited predefined DDI XML
    codebook template, we extract the information from
    the Access database into the draft XML codebook
  • The resulting codebook is moved to an Internet
    catalogue for quick information about the new
    study; viewing is supported by a referenced XSL
    stylesheet in Internet Explorer 5 or later.
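The first step can be sketched as filling a predefined template from one inventory record. In this hypothetical Python version a dictionary stands in for a row exported from the Access database, and the field names and mapping are assumptions rather than ADP's actual schema. The xml-stylesheet processing instruction is the standard way to associate an XSL stylesheet with an XML document so that a browser can render it; the stylesheet name codebook.xsl is a placeholder.

```python
import xml.etree.ElementTree as ET

# Hypothetical predefined template (element names follow DDI).
TEMPLATE = """<codeBook>
  <stdyDscr>
    <citation>
      <titlStmt><titl/><IDNo/></titlStmt>
      <distStmt><depositr/></distStmt>
    </citation>
  </stdyDscr>
</codeBook>"""

# Hypothetical mapping from inventory fields to template paths.
FIELD_TO_PATH = {
    "title": "stdyDscr/citation/titlStmt/titl",
    "study_id": "stdyDscr/citation/titlStmt/IDNo",
    "depositor": "stdyDscr/citation/distStmt/depositr",
}


def draft_codebook(record: dict, xsl_href: str = "codebook.xsl") -> str:
    """Fill the template from one inventory record and link an XSL stylesheet."""
    root = ET.fromstring(TEMPLATE)
    for field, path in FIELD_TO_PATH.items():
        root.find(path).text = record.get(field, "")
    body = ET.tostring(root, encoding="unicode")
    # The xml-stylesheet processing instruction tells the browser which XSL to apply.
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<?xml-stylesheet type="text/xsl" href="{xsl_href}"?>\n'
        f"{body}"
    )


row = {"title": "Example study", "study_id": "EX-001", "depositor": "Example depositor"}
print(draft_codebook(row))
```

Keeping the field-to-path mapping in one table means a second template only needs a different TEMPLATE string, not new code.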

30
Second step: full study description
  1. The depositor is asked to fill in an MS Word form
    containing elements corresponding to the DDI DTD
    study description section
  2. The draft XML codebook from the previous step is
    edited with the XMetaL XML editor. Missing pieces
    of information are added manually

31
Third step: codebook data description generated
from the SPSS data file
  1. The final SPSS data file, if fully labelled, is
    converted with the NSD XML Generator into the XML
    data description section of the DDI codebook and
    integrated with the study description from the
    previous step
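The internals of the NSD XML Generator are not described here, but the general idea of turning a fully labelled SPSS file into a data description can be sketched: each variable becomes a var carrying its variable label, and each value label becomes a catgry. In this Python sketch the labels are hypothetical dictionaries standing in for what would be read from the .sav file (reading it is outside the sketch), and the integration step replaces any existing dataDscr or, failing that, places the new one before otherMat.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata as it might be read from a fully labelled SPSS file.
VARIABLE_LABELS = {"V1": "Sex of respondent", "V2": "Age in years"}
VALUE_LABELS = {"V1": {1: "Male", 2: "Female"}}


def data_description(var_labels: dict, value_labels: dict) -> ET.Element:
    """Build a dataDscr section: one var per variable, one catgry per value label."""
    data_dscr = ET.Element("dataDscr")
    for name, label in var_labels.items():
        var = ET.SubElement(data_dscr, "var", ID=name, name=name)
        ET.SubElement(var, "labl").text = label
        for value, value_label in sorted(value_labels.get(name, {}).items()):
            catgry = ET.SubElement(var, "catgry")
            ET.SubElement(catgry, "catValu").text = str(value)
            ET.SubElement(catgry, "labl").text = value_label
    return data_dscr


def integrate(codebook: ET.Element, data_dscr: ET.Element) -> None:
    """Put data_dscr into the codebook, replacing an old dataDscr if present."""
    old = codebook.find("dataDscr")
    if old is not None:
        position = list(codebook).index(old)
        codebook.remove(old)
    else:
        # Keep document order: dataDscr goes before otherMat if that exists.
        other = codebook.find("otherMat")
        position = list(codebook).index(other) if other is not None else len(codebook)
    codebook.insert(position, data_dscr)


codebook = ET.fromstring("<codeBook><stdyDscr/><otherMat/></codeBook>")
integrate(codebook, data_description(VARIABLE_LABELS, VALUE_LABELS))
print(ET.tostring(codebook, encoding="unicode"))
```

Variables without value labels (V2 here) simply get no catgry elements, which is why the file needs to be fully labelled for a complete description.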

32
Step 4: codebook data description with full
question text
  • For the most important data sets, the full
    question text is entered into the dataDscr section
    from the original questionnaire text file
  • or
  • by using a conversion tool from computer-readable
    CAI files to DDI XML files.
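Adding question text to an existing data description can be sketched as merging a name-to-text mapping into the var elements. This Python version assumes the questions have already been extracted into a dictionary (from a questionnaire text file or a CAI export, both outside the sketch) and places each qstn/qstnLit right after the variable's labl, following the DDI ordering in which qstn comes after labl and before catgry. The sample variable and question are hypothetical.

```python
import xml.etree.ElementTree as ET


def add_question_text(data_dscr: ET.Element, questions: dict) -> int:
    """Insert <qstn><qstnLit>text</qstnLit></qstn> after each matching var's labl.

    Variables that already have a qstn element are left unchanged.
    Returns the number of variables updated.
    """
    updated = 0
    for var in list(data_dscr.iter("var")):
        text = questions.get(var.get("name"))
        if text is None or var.find("qstn") is not None:
            continue
        qstn = ET.Element("qstn")
        ET.SubElement(qstn, "qstnLit").text = text
        labl = var.find("labl")
        position = list(var).index(labl) + 1 if labl is not None else 0
        var.insert(position, qstn)
        updated += 1
    return updated


data_dscr = ET.fromstring(
    '<dataDscr><var name="V2"><labl>Age in years</labl>'
    "<catgry><catValu>99</catValu><labl>No answer</labl></catgry></var></dataDscr>"
)
questions = {"V2": "How old were you on your last birthday?"}  # hypothetical text
print(add_question_text(data_dscr, questions))  # 1
print([child.tag for child in data_dscr.find("var")])  # ['labl', 'qstn', 'catgry']
```

Skipping variables that already carry a qstn makes the merge safe to rerun after more questionnaire text has been added.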

33
Finally: NESSTAR
  • The two final documents, the Slovene and English
    DDI XML codebooks, are converted into a
    NESSTAR-compliant format and, together with the
    data file, published in a NESSTAR catalogue.

34
[Diagram: human- and computer-readable components of the codebook production line. Labels: original paper documents; free-text documents; stdyDscr form filled in by depositor; SPSS data labels, CAI questionnaires; conversion tools; Codebook.xml (XML editor) with sections docDscr, stdyDscr, fileDscr, dataDscr, otherMat...; Codebook.dtd; Tag Library; Codebook.xsl; Internet Explorer view; printed codebook; NESSTAR Catalogue / Data Explorer]
35
Common issues in DDI XML codebook production
  1. XML editors do not necessarily support Unicode
  2. The use of entities in XML documents helps
    standardise document production and makes it
    faster and easier to translate into English
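The entity technique can be sketched as follows: recurring phrases are written once as entity references in the codebook body, and each language version supplies its own declarations in the internal DTD subset, so swapping the declarations translates every occurrence at once. Python's xml.etree.ElementTree (via expat) expands internal entities declared this way. The entity name archive is hypothetical; the two values are the Slovene and English names of ADP.

```python
import xml.etree.ElementTree as ET

# Codebook body written once; the recurring phrase is an entity reference.
BODY = (
    "<codeBook><stdyDscr><citation><distStmt>"
    "<distrbtr>&archive;</distrbtr>"
    "</distStmt></citation></stdyDscr></codeBook>"
)

# Per-language entity values (the entity name "archive" is hypothetical).
ENTITIES = {
    "sl": {"archive": "Arhiv družboslovnih podatkov"},
    "en": {"archive": "Social Science Data Archives"},
}


def build(lang: str) -> str:
    """Prepend an internal DTD subset declaring the entities for one language."""
    declarations = "".join(
        f'<!ENTITY {name} "{value}">' for name, value in ENTITIES[lang].items()
    )
    return f"<!DOCTYPE codeBook [{declarations}]>{BODY}"


for lang in ("sl", "en"):
    root = ET.fromstring(build(lang))
    print(lang, root.findtext("stdyDscr/citation/distStmt/distrbtr"))
```

Entity values in this sketch are inserted verbatim, so they must not contain unescaped quotes, & or %.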

36
Conclusions
  • The DDI DTD is receiving growing attention in the
    community, which guarantees the production of new
    tools to enhance its use
  • Despite continuing developments and overlapping
    archival standards, DDI 1.0, as today's technology,
    promises longevity for XML Codebook 1.0 documents
  • The Slovene ADP has taken its experience with DDI
    as guidance for its organisation.

37
Main references
  • DDI Committee (2001). The Data Documentation
    Initiative (DDI) Version 1.1: The New
    Specification for Social Science Metadata.
    Project Description.
  • Data Documentation Initiative: A Project of a
    Social Science Community (2002).
    http://www.icpsr.umich.edu/DDI
  • Scheuch, Erwin K. (1990). From a data archive to
    an infrastructure for the social sciences.
    International Social Science Journal, No. 123,
    pp. 93-111.
  • Tanenbaum, Eric and Marcia Taylor (1990).
    Developing social science archives. International
    Social Science Journal, No. 124(?).
  • Thomas, Wendy L. and William C. Block (2001). An
    Introduction to the Data Documentation Initiative
    (DDI). ICPSR OR Meeting 2001.
    http://www.icpsr.umich.edu/DDI/PAPERS/