Title: Susanne Dobratz
Session 3a: Using XML for Archiving ETDs
- Susanne Dobratz
- dobratz_at_rz.hu-berlin.de
Agenda
- Introduction
- Overview of XML-based ETD projects worldwide
- Report from a DTD for ETDs workshop held in May 2000 in Berlin
- Susanne Dobratz
- Experiences with an XML project at UIowa
- Patty Strabala (University of Iowa)
- The Cyberthèses Project
- Guylaine Beaudry (U of Montreal),
- Viviane Bouletreau (U Lyon 2),
- Gabriela Ortuzar (U of Chile)
Why XML for ETDs?
- Preservation / Archiving
- long term preservation for 10 ... years
- using standardised document formats (Unicode)
- easy reconversion into new presentation or print formats
- inclusion of multimedia objects
- Reusability
- of data / documents / content!
- Information extraction (citation index, automated cataloging)
- Generating new products from one original resource, e.g. a catalog of ETDs in geology with abstracts
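Automated cataloging, one of the reuse scenarios listed above, amounts to reading fields out of the tagged front matter. The Python sketch below assumes a small ETD that uses element names appearing in these slides (front, title, author, given, surname, degree); the abstract element and all sample values are hypothetical and not taken from any particular DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical ETD fragment; <abstract> and the sample values are invented.
SAMPLE_ETD = """
<etd>
  <front>
    <title>Der Blutkrebs</title>
    <author><given>Max</given><surname>Mustermann</surname></author>
    <degree>Dr. med.</degree>
    <abstract>Short summary of the thesis.</abstract>
  </front>
</etd>
"""

def catalog_record(xml_text: str) -> dict:
    """Extract a flat catalog record from the front matter of an ETD."""
    root = ET.fromstring(xml_text)
    front = root.find("front")
    if front is None:
        raise ValueError("document has no <front> element")

    def text_of(path: str) -> str:
        # findtext returns the default ("") when no element matches the path
        return front.findtext(path, default="").strip()

    return {
        "title": text_of("title"),
        "author": f"{text_of('author/surname')}, {text_of('author/given')}",
        "degree": text_of("degree"),
        "abstract": text_of("abstract"),
    }

if __name__ == "__main__":
    print(catalog_record(SAMPLE_ETD))
```

Because the fields are addressed by element name, the same function works for every ETD that follows the DTD, which is what makes catalogs of many theses cheap to produce.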
Archivability
A document in PDF
A document in XML
<front>
  <school>
    <p>Aus dem Institut für Pathologie (Rudolf-Virchow-Haus)</p>
    <p>des Universitätsklinikum der Charité</p>
    <p>der Humboldt-Universität zu Berlin</p>
    <p>Direktor Prof. Dr. Mustermann</p>
  </school>
  <submission>Dissertation</submission>
  <title>Der Blutkrebs</title> ...
</front>
Encoding content: MathML example
<apply>
  <plus/>
  <apply>
    <power/>
    <ci>x</ci>
    <cn>2</cn>
  </apply>
  <apply>
    <times/>
    <cn>4</cn>
    <ci>x</ci>
  </apply>
  <cn>4</cn>
</apply>
$x^2 + 4x + 4 = 0$
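Because content MathML records the operations (plus, power, times) rather than their visual layout, a program can compute with it. The following Python sketch evaluates the fragment above for a given value of x; it handles only the three operators used here and, like the slide fragment, ignores the MathML namespace.

```python
import xml.etree.ElementTree as ET
from math import prod

# Content-MathML fragment from the slide: x^2 + 4x + 4 (no namespace, as on the slide)
EXPR = """<apply>
  <plus/>
  <apply><power/><ci>x</ci><cn>2</cn></apply>
  <apply><times/><cn>4</cn><ci>x</ci></apply>
  <cn>4</cn>
</apply>"""

def evaluate(node: ET.Element, env: dict[str, float]) -> float:
    """Recursively evaluate a tiny subset of content MathML."""
    if node.tag == "cn":                      # numeric constant
        return float(node.text)
    if node.tag == "ci":                      # identifier, looked up in env
        return env[node.text.strip()]
    if node.tag == "apply":                   # first child is the operator
        op, *args = list(node)
        values = [evaluate(arg, env) for arg in args]
        if op.tag == "plus":
            return sum(values)
        if op.tag == "times":
            return prod(values)
        if op.tag == "power":
            base, exponent = values
            return base ** exponent
        raise ValueError(f"unsupported operator: {op.tag}")
    raise ValueError(f"unsupported element: {node.tag}")

ROOT = ET.fromstring(EXPR)

if __name__ == "__main__":
    print(evaluate(ROOT, {"x": -2.0}))  # 0.0, since x = -2 solves x^2 + 4x + 4 = 0
```

A presentation-only encoding of the same formula would tell a program where to draw the superscript, but not that it denotes a power.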
Why XML for ETDs? (cont.)
- Retrieval
- using document structure and semantic tags
- detailed search
- automated cataloging
- Highly structured information has a high value
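A structured search restricts a query to one element instead of the whole text. The sketch below assumes a hypothetical collection with two invented records; the element paths are illustrative and not those of a particular search facility.

```python
import xml.etree.ElementTree as ET

# Two invented ETD records; note the second title contains the first author's surname.
COLLECTION = """
<collection>
  <etd><front><title>Der Blutkrebs</title>
    <author><given>Max</given><surname>Mustermann</surname></author></front></etd>
  <etd><front><title>Das Mustermann-Verfahren</title>
    <author><given>Erika</given><surname>Musterfrau</surname></author></front></etd>
</collection>
"""

def search(xml_text: str, path: str, term: str) -> list[str]:
    """Return titles of ETDs whose element at `path` contains `term`.

    The term is matched only inside the chosen element, so a surname
    is not confused with the same word occurring in a title.
    """
    root = ET.fromstring(xml_text)
    hits = []
    for etd in root.iter("etd"):
        for elem in etd.findall(path):
            if term.lower() in "".join(elem.itertext()).lower():
                hits.append(etd.findtext("front/title", default=""))
                break  # one hit per record is enough
    return hits

if __name__ == "__main__":
    print(search(COLLECTION, "front/author/surname", "Mustermann"))
```

A plain full-text search for "Mustermann" would match both records; restricting it to front/author/surname returns only the first.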
Example: Structured Search Facility
Understanding XML
- Extensible Markup Language
- Basic philosophy: strictly separate content, structure and layout
- Child of SGML
- Standard Generalised Markup Language (ISO standard, 1986)
- well-formed versus valid documents (valid documents need a DTD)
- Developed by SGML experts (Charles Goldfarb and others)
- 1996: first proposal
- XML 1.0: first W3C Recommendation, 10 February 1998
- Aims
- enable better usability of SGML on the Web
- vendor-independent visualisation and processing of structured information
- more interoperability between WWW applications
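The difference between well-formed and valid documents can be tried out directly. Python's standard xml.etree.ElementTree parser only checks well-formedness (matching, properly nested tags), so the helper below says nothing about validity against a DTD.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as well-formed XML.

    No DTD validation happens here: a document that passes may still
    be invalid with respect to its document type.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

if __name__ == "__main__":
    print(is_well_formed("<front><title>Der Blutkrebs</title></front>"))  # True
    print(is_well_formed("<front><title>Der Blutkrebs</front></title>"))  # False: tags overlap
```

Checking validity against a DTD needs a validating parser, for example the DTD class of the third-party lxml library.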
What is a document?
- Content
- provided information (text, tune, picture)
- Structure
- sequencing and ordering of information
- Layout
- Visualisation of the content and structure of the document
- Metadata
- description of documents
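Keeping content and structure apart from layout means one marked-up document can be rendered in several ways. In practice this is the job of a style language such as XSL; the Python sketch below only imitates the idea with two hypothetical layouts, each a mapping from element name to a formatting function.

```python
from html import escape
from typing import Callable
import xml.etree.ElementTree as ET

# One piece of content with its structure (element names from the slides).
DOC = ET.fromstring(
    "<front><title>Der Blutkrebs</title>"
    "<submission>Dissertation</submission></front>"
)

Layout = dict[str, Callable[[str], str]]

# Two hypothetical layouts for the same elements.
HTML_LAYOUT: Layout = {
    "title": lambda s: f"<h1>{escape(s)}</h1>",
    "submission": lambda s: f"<p><em>{escape(s)}</em></p>",
}
TEXT_LAYOUT: Layout = {
    "title": lambda s: s.upper(),
    "submission": lambda s: f"({s})",
}

def render(doc: ET.Element, layout: Layout) -> str:
    """Apply a layout to each child element; the content itself is unchanged."""
    return "\n".join(layout[child.tag](child.text or "") for child in doc)

if __name__ == "__main__":
    print(render(DOC, HTML_LAYOUT))
    print(render(DOC, TEXT_LAYOUT))
```

Changing the layout never touches DOC, which is exactly the property that makes an XML original easy to reconvert later.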
First Step: Content analysis of a dissertation
- Doctoral candidate
- personal information
- thesis information
- index terms
- persons
- places, etc
- technical terms
- bibliography
- title, URL, journal
- <author>
- <given> first name <surname> last name
- <title> <approvals> <degree> <school>
- <indx> flag
- type="person"
- type="place"
- <term>
- <bibliography>
- <articletitle> <worktitle> <url>
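Once index terms are tagged, an index of persons or places can be generated instead of compiled by hand. The sketch assumes the index tags take the form <indx type="person"> and <indx type="place">, which is a reading of the tag list above rather than a confirmed DTD; the sample sentence is invented for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Invented body text using the assumed <indx type="..."> and <term> markup.
BODY = (
    '<p><indx type="person">Rudolf Virchow</indx> taught '
    '<term>pathology</term> in <indx type="place">Berlin</indx>.</p>'
)

def index_terms(xml_text: str) -> dict[str, list[str]]:
    """Group tagged index entries by kind (person, place, term)."""
    root = ET.fromstring(xml_text)
    index = defaultdict(list)
    for elem in root.iter("indx"):
        kind = elem.get("type", "unknown")  # untyped entries go to "unknown"
        index[kind].append("".join(elem.itertext()).strip())
    for elem in root.iter("term"):
        index["term"].append("".join(elem.itertext()).strip())
    return dict(index)

if __name__ == "__main__":
    print(index_terms(BODY))
```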
First Step: Structure analysis of a dissertation
- <front>
- <body>
- <chapter>, <mm>, <table>
- <back>
- <bibliography>
- <appendix>
- <acknowledgement>, <vita>, <declaration>
- Title page
- Main part
- Chapters, Multimedia Objects, Schemes, Tables
- Appendix
- Bibliography
- Definitions, Program source, ...
- Acknowledgements, Vita, Declaration
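A DTD states the order front, body, back declaratively, and a validating parser enforces it. To illustrate what that constraint means, the hand-written Python check below tests the same rule on a parsed document; the root element name etd is an assumption, and a real project would rely on the DTD instead.

```python
import xml.etree.ElementTree as ET

EXPECTED_ORDER = ["front", "body", "back"]

def check_structure(xml_text: str) -> list[str]:
    """Return a list of problems; an empty list means the skeleton is fine."""
    root = ET.fromstring(xml_text)
    parts = [child.tag for child in root]
    problems = []
    if parts != EXPECTED_ORDER:
        problems.append(f"top-level parts are {parts}, expected {EXPECTED_ORDER}")
    body = root.find("body")
    if body is not None and body.find("chapter") is None:
        problems.append("<body> contains no <chapter>")
    return problems

if __name__ == "__main__":
    print(check_structure("<etd><front/><body><chapter/></body><back/></etd>"))  # []
    print(check_structure("<etd><body/><front/></etd>"))
```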
How can XML be used? (Terms)
- XML-Documents (Instances of a document type)
- e.g. a dissertation
- Document Type Definition (DTD)
- e.g. TEI (Text Encoding Initiative)
- eXtensible Stylesheet Language (XSL)
- eXtensible Linking Language (XLL)
- Special Definitions
- RDF (Resource Description Framework)
- MathML (Mathematical Markup Language)
- SVG (Scalable Vector Graphics)
Understanding XML
[Diagram: the SGML family beside the XML family. Labels: TEI P3 (Text Encoding Initiative), SGML, DSSSL, HyTime, XML, XSL (XSLT, XSL-FO, XPath), XML Linking (XLink, XPointer, XPath, XML Base), HTML link syntax, (X)HTML, CSS; connector label "is DTD of".]
Problem: Most authors don't write in XML
- Conversion from native document formats is needed
- Microsoft Winword
- Corel WordPerfect
- Adobe Framemaker
- LaTeX for natural sciences
- different approaches around the world
- different DTDs
- different conversion strategies
Example: Dissertation in Word
Structured Dissertation in Word (style sheet)
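With a structured Word style sheet, each paragraph style corresponds to one element of the target DTD, so conversion is largely a table lookup. The sketch below assumes the paragraphs have already been exported as (style name, text) pairs; the style names in STYLE_TO_ELEMENT are hypothetical, not those of the Berlin style sheet, and the output is a flat element list without the front/body/back nesting a real converter would build.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from Word paragraph-style names to DTD elements.
STYLE_TO_ELEMENT = {
    "DissTitle": "title",
    "DissSubmission": "submission",
    "DissChapterHeading": "heading",
    "Normal": "p",
}

def paragraphs_to_xml(paragraphs: list[tuple[str, str]]) -> str:
    """Convert (style name, text) pairs into XML elements.

    Paragraphs with an unknown style become <p> and are flagged with an
    attribute so that an editor can review them by hand.
    """
    root = ET.Element("etd")
    for style, text in paragraphs:
        tag = STYLE_TO_ELEMENT.get(style)
        elem = ET.SubElement(root, tag or "p")
        if tag is None:
            elem.set("unmapped-style", style)
        elem.text = text
    return ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    print(paragraphs_to_xml([("DissTitle", "Der Blutkrebs"), ("Normal", "Text.")]))
```

Flagging unmapped styles rather than dropping them reflects the main weakness of this strategy: it only works as well as authors stick to the style sheet.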
Dissertation in SGML
<ETD>
<FRONT>
<SCHOOL><P>Aus dem Institut f&uuml;r Physiologie der Humboldt-Universit&auml;t zu Berlin</P>
<P>Direktor Prof. Dr. P.B. Persson</P></SCHOOL>
<SUBMISSION>DISSERTATION</SUBMISSION>
<TITLE>"Untersuchungen zur Entwicklung der kardiorespiratorischen Interaktion anhand gemeinsamer Rhythmen von Atmung und Herzaktion. Longitudinalstudie der ersten sechs Lebensmonate gesunder S&auml;uglinge."</TITLE>
<DEGREE>zur Erlangung des akademischen Grades doctor medicinae (Dr. med.)</DEGREE>
<MAJOR>vorgelegt der Medizinischen Fakult&auml;t der Humboldt-Universit&auml;t zu Berlin</MAJOR>
<AUTHOR>von Herrn <GIVEN>Peter</GIVEN> <SURNAME>Aikele</SURNAME> <SUFFIX>geb. am 27.04.1967 in Dresden</SUFFIX></AUTHOR>
<DEAN>Dekan Prof. Dr. M. Dietel</DEAN>
<APPROVALS><NAME>1. Prof. Dr. E. Schubert</NAME>
...
Dissertation in HTML
Dissertation in PDF
Building a Document Type Definition
DiML1.DTD
HUTPubl.DTD
XML projects around the world
- DTD workshop May 2000 in Berlin
- Virginia Tech
- Technical University of Helsinki (Oulu)
- University of Oslo
- Swedish Univ. of Agricultural Sciences Uppsala (Linköping)
- University of Iowa
- University of Michigan at Ann Arbor
- Université Lyon 2
- Université Montréal
- Cambridge University UK (LaTeX)
- Humboldt-University Berlin
There is a community!
Different strategies for conversion
Need for cooperation
- Interoperability
- Identify standards
- International retrieval
- Reducing costs for development of tools
More examples: Graphics on the Web
- Graphics formats commonly used on the Web (GIF, JPEG, PNG) are pixel-based
- Zooming leads to ugly raster images
- Solution: vector-based formats described by mathematical functions
- Reusable content for the sciences -> content markup
Scalable Vector Graphics
- W3C Working Draft, 3 March 2000
- http://www.w3.org/TR/SVG/
- description of two-dimensional graphic objects
- contains 3 different graphic object types
- vector graphics (rectangles, circles, lines, curves)
- text
- images
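The three object types can be seen in a small hand-built SVG file. This Python sketch writes a rectangle, a circle, a line and a curve, plus a text object and an image reference. It uses the SVG namespace of the later SVG 1.0 Recommendation rather than the March 2000 draft DTD, and photo.png is a hypothetical file name.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
XLINK_NS = "http://www.w3.org/1999/xlink"

def make_svg() -> str:
    """Return a small SVG document containing all three object types."""
    svg = ET.Element("svg", {
        "xmlns": SVG_NS, "xmlns:xlink": XLINK_NS,
        "width": "200", "height": "120", "viewBox": "0 0 200 120",
    })
    # 1. Vector graphics: geometry is given by coordinates, so it scales cleanly
    ET.SubElement(svg, "rect", {"x": "10", "y": "10", "width": "80", "height": "50", "fill": "#75C5F0"})
    ET.SubElement(svg, "circle", {"cx": "140", "cy": "35", "r": "25", "fill": "#DD137B"})
    ET.SubElement(svg, "line", {"x1": "10", "y1": "75", "x2": "190", "y2": "75", "stroke": "black"})
    ET.SubElement(svg, "path", {"d": "M10 110 Q100 60 190 110", "fill": "none", "stroke": "black"})
    # 2. Text: remains real characters, unlike text baked into a bitmap
    label = ET.SubElement(svg, "text", {"x": "10", "y": "95"})
    label.text = "Test picture for SVG"
    # 3. Pictures: an external raster image referenced by URL (hypothetical file)
    ET.SubElement(svg, "image", {"x": "150", "y": "80", "width": "40", "height": "30",
                                 "xlink:href": "photo.png"})
    return ET.tostring(svg, encoding="unicode")

if __name__ == "__main__":
    print(make_svg())
```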
SVG example
Produced in Corel Draw 9, exported to SVG
SVG example
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE svg SYSTEM "svg-19991203.dtd">
<!-- Creator: CorelDRAW -->
<svg xml:space="preserve" width="6.34153in" height="4.09944in"
     style="shape-rendering:geometricPrecision; text-rendering:geometricPrecision; image-rendering:optimizeQuality"
     viewBox="-3637 0 6341 4099">
 <defs>
  <style type="text/css">
   <![CDATA[
    .str0 {stroke:#1F1A17;stroke-width:3}
    .str1 {stroke:#169F62;stroke-width:111}
    .fil4 {fill:none}
    .fil1 {fill:#75C5F0}
    .fil2 {fill:#DD137B}
    .fil3 {fill:#FFF500}
    .fil0 {fill:#FFFB9D}
   ]]>
  </style>
 </defs>
 <g>
  <title>Ebene 1</title>
  <ellipse class="fil0 str0" cx="-1275" cy="2697" rx="2361" ry="1401"/>
  <rect class="fil1 str0" x="-3262" y="2" width="5540" height="2428"/>
  <path class="fil2 str0" d="M1656 1367l523 328 523 327 -199 530 -200 531 -647 0 -646 0 -200 -531 -200 -530 523 -327 523 -328z"/>
  <text x="-2756" y="836" class="fil3" style="font:normal 500 Garamond">Test picture for SVG</text>
  <text x="-2756" y="1399" class="fil3" style="font:normal 500 Garamond">in Corel Draw 9</text>
  ...
 </g>
</svg>
Produced in Corel Draw 9, exported to SVG
Extensible 3D (X3D)
- Successor of VRML97
- http://www.web3d.org/x3d.html
- planned to become an ISO standard in 2002
- Component-based approach (profiles)
- scalable, thin clients, DOM support
- core model (X3D-Core profile)
- minimum set of viewer functions
- 3D geometries, animations, interactions, rendering
- X3D-VRML profile
- compatibility with VRML97
- user-specific profiles (GeoVRML)
X3D example
#VRML V2.0 utf8
Transform {
  children [
    NavigationInfo { headlight FALSE } # We'll add our own light
    DirectionalLight {        # First child
      direction 0 0 -1        # Light illuminating the scene
    }
    Transform {               # Second child - a red sphere
      translation 3 0 1
      children [
        Shape {
          geometry Sphere { radius 2.3 }
          appearance Appearance {
            material Material { diffuseColor 1 0 0 }   # Red
          }
        }
      ]
    }
  ]
}
As VRML
X3D example
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE X3D PUBLIC
  "http://www.web3D.org/TaskGroups/x3d/translation/x3d-compromise.dtd" [
  <!ENTITY % Vrml97Profile "INCLUDE">
  <!ENTITY % CoreProfile "IGNORE">
  <!ENTITY % X3dExtensions "IGNORE">
  <!ENTITY % GeoVrmlProfile "IGNORE">
]>
<X3D>
  <Scene>
    <Transform>
      <children>
        <NavigationInfo headlight="false" avatarSize="0.25 1.6 0.75" type="&#34;EXAMINE&#34;"/>
        <DirectionalLight/>
        <Transform translation="3.0 0.0 1.0">
          <children>
            <Shape>
              <geometry>
              ...
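The translation from VRML to this XML form is largely mechanical: each node becomes an element and each simple field becomes an attribute. The Python sketch below rebuilds the red-sphere scene from the VRML example in that style. The children and geometry wrapper elements follow the draft DTD excerpt above; everything below geometry (Sphere, appearance, Material) is an assumption, since the slide breaks off there, and later X3D encodings replaced such wrapper elements with a containerField attribute.

```python
import xml.etree.ElementTree as ET

def red_sphere_scene() -> ET.Element:
    """Build the red-sphere scene from the VRML example as X3D-style XML."""
    x3d = ET.Element("X3D")
    scene = ET.SubElement(x3d, "Scene")
    # Transform { children [ ... ] }: the 'children' field becomes a wrapper element
    outer = ET.SubElement(ET.SubElement(scene, "Transform"), "children")
    # Simple fields become attributes; the boolean is written in lower case
    ET.SubElement(outer, "NavigationInfo", headlight="false")
    ET.SubElement(outer, "DirectionalLight", direction="0 0 -1")
    inner = ET.SubElement(outer, "Transform", translation="3 0 1")
    shape = ET.SubElement(ET.SubElement(inner, "children"), "Shape")
    # Assumed structure below <geometry>; the slide ends at that point
    ET.SubElement(ET.SubElement(shape, "geometry"), "Sphere", radius="2.3")
    appearance = ET.SubElement(ET.SubElement(shape, "appearance"), "Appearance")
    ET.SubElement(ET.SubElement(appearance, "material"), "Material", diffuseColor="1 0 0")
    return x3d

if __name__ == "__main__":
    tree = red_sphere_scene()
    ET.indent(tree)  # pretty-printing, available since Python 3.9
    print(ET.tostring(tree, encoding="unicode"))
```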
SMIL
- Synchronized Multimedia Integration Language: synchronizes multimedia on the Web using an XML-based language
- Aim: enable simple authoring of TV-like multimedia presentations, such as training courses, on the Web
- HTML-like, easy to learn
- A SMIL presentation can contain streaming audio, streaming video, images, text or any other media type
- http://www.w3.org/AudioVideo/
- SMIL Boston Working Draft (25 Feb. 2000)
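A SMIL presentation is an XML document in which <par> plays its children at the same time and <seq> plays them one after another. The sketch below builds a minimal training-course style presentation: a video and an audio track in parallel with a sequence of slide images. The file names are hypothetical, and the head/layout section that positions media on screen is omitted.

```python
import xml.etree.ElementTree as ET

def smil_presentation(video: str, audio: str, slides: list[str], seconds_per_slide: int) -> str:
    """Build a minimal SMIL document: video, audio and slides side by side."""
    smil = ET.Element("smil")
    body = ET.SubElement(smil, "body")
    par = ET.SubElement(body, "par")     # children of <par> play simultaneously
    ET.SubElement(par, "video", src=video)
    ET.SubElement(par, "audio", src=audio)
    seq = ET.SubElement(par, "seq")      # children of <seq> play one after another
    for src in slides:
        ET.SubElement(seq, "img", src=src, dur=f"{seconds_per_slide}s")
    return ET.tostring(smil, encoding="unicode")

if __name__ == "__main__":
    print(smil_presentation("lecture.rm", "talk.rm", ["slide1.png", "slide2.png"], 10))
```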
MusicML
- DTD for sheet music (March 5, 1998)
- No standard
- http://195.108.47.160/3.0/musicml/index.html
- Connection Factory, Dutch enterprise
- MusicML document basic structure
<sheetmusic>
  <musicrow size="two">
  </musicrow>
  <musicrow size="two">
  </musicrow>
</sheetmusic>
MusicML example
<?xml version="1.0"?>
<!DOCTYPE sheetmusic SYSTEM "music.dtd">
<sheetmusic>
  <musicrow size="two">
    <entrysegment>
      <entrypart cleff="bass" rythm="threequarter" position="one">
        <molkruis level="zero" name="g" notetype="flat"/>
        <molkruis level="zero" name="b" notetype="sharp"/>
      </entrypart>
      <entrypart cleff="treble" rythm="threequarter" position="two">
      </entrypart>
    </entrysegment>
    <segment>
      <subsegment position="one">
        <chord>
          <note beat="quarter" name="f" level="zero"/>
          <note beat="quarter" name="b" level="zero"/>
          <note beat="quarter" name="d" level="plus1"/>
          <note beat="quarter" name="f" level="plus1"/>
          ...
MML (2)
- DTD for Music (July 1999)
- no standard
- University of Pretoria
- http://is.up.ac.za/mml/index.htm
- Document basic structure
- Basic element modules
- Organization elements
- Time elements
- Frequency elements
- Lyric elements
- Notation elements
MML (2) example
<lyric verse="1">
<upbeat> Now
<bar 1> that the sun doth </bar>
<bar 2> shine no more, And </bar>
<bar 3> day hath <squash 2> reached </squash> its </bar>
<bar 4> close, They </bar>
<bar 5> calmly sleep who </bar>
<bar 6> wept before, The </bar>
<bar 7> wearied find re </bar>
<bar 8> pose. </bar>
</lyric>
<upbeat>A E </upbeat>
<bar 1>3E A 3E B 3E8 CF8 3G C </bar>
<bar 2>A D 3G D (3G E)2 </bar>
<bar 3>A D 3E E (3E C)2 </bar>
<bar 4>3E2 B2 R B E </bar>
<bar 5>B8 CA8 3Gs B A8 CGn 3Fn A </bar>
<bar 6>3(Gs F8E8) 3(E8 4AFs8) 3Gs B A E </bar>
<bar 7>A8 D3Gs8 A C A B. 3G B8 </bar>
<bar 8>(E A)2. </bar>
</div>
Conclusion
- There are standard DTDs for specific subjects and media types
- For interoperability, we should use those standard DTDs
- Share tools for authoring, browsing and conversion
What can we do?
- Repository of used DTDs, documentation, conversion tools
- http://dochost.rz.hu-berlin.de/epdiss/dtd-workshop
- Please e-mail me: dobratz_at_rz.hu-berlin.de
- Help provide guidelines for universities that are interested in starting with XML but don't know how
- Dos and don'ts
- Provide examples
- Provide Tools
Thank You!
- Susanne Dobratz
- dobratz_at_rz.hu-berlin.de
- http://dochost.rz.hu-berlin.de/epdiss/dtd-workshop