Susanne Dobratz - PowerPoint PPT Presentation

Title: Susanne Dobratz


1
Session 3a: Using XML for Archiving ETDs
  • Susanne Dobratz
  • dobratz_at_rz.hu-berlin.de

2
Agenda
  • Introduction
  • Overview of XML-based ETD projects worldwide
  • Report from a "DTD for ETDs" workshop held in May
    2000 in Berlin
  • Susanne Dobratz
  • Experiences with an XML project at UIowa
  • Patty Strabala (University of Iowa)
  • The Cyberthèses Project
  • Guylaine Beaudry (U of Montreal),
  • Viviane Bouletreau (U Lyon 2),
  • Gabriela Ortuzar (U of Chile)

3
Why XML for ETDs?
  • Preservation / Archiving
  • long-term preservation for 10 ... years
  • using standardised document formats, Unicode
  • easy reconversion into new presentation or print
    formats
  • inclusion of multimedia objects
  • Reusability
  • Of data / documents / content!!!!
  • Information extraction (citation index, automated
    cataloging)
  • Generating new products from one original
    resource, e.g. a catalog of ETDs with abstracts in
    geology

4
Archivability
A document in PDF
5
A document in XML
  <front>
    <school>
      <p>Aus dem Institut für Pathologie (Rudolf-Virchow-Haus) </p>
      <p>des Universitätsklinikum der Charité</p>
      <p>der Humboldt-Universität zu Berlin</p>
      <p>Direktor Prof. Dr. Mustermann</p>
    </school>
    <submission>Dissertation</submission>
    <title>Der Blutkrebs</title> ...
  </front>
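
A minimal sketch of how front matter marked up like this could feed automated cataloging. The element names follow the slide; the shortened sample text, the function name `catalog_record`, and the choice of Python's standard `xml.etree.ElementTree` parser are assumptions made for illustration, not part of the presented project.

```python
import xml.etree.ElementTree as ET

# Shortened sample modelled on the slide (illustrative content only).
FRONT = """\
<front>
  <school>
    <p>Aus dem Institut für Pathologie (Rudolf-Virchow-Haus)</p>
    <p>der Humboldt-Universität zu Berlin</p>
  </school>
  <submission>Dissertation</submission>
  <title>Der Blutkrebs</title>
</front>
"""


def catalog_record(front_xml: str) -> dict:
    """Collect a few cataloging fields from a <front> element (hypothetical helper)."""
    front = ET.fromstring(front_xml)
    school_lines = [p.text.strip() for p in front.iterfind("school/p") if p.text]
    return {
        "title": front.findtext("title", default="").strip(),
        "type": front.findtext("submission", default="").strip(),
        "school": " ".join(school_lines),
    }


record = catalog_record(FRONT)
print(record)
```

Because each field is addressed by element name, the same record could be produced for every thesis that follows the same DTD, which is the reuse argument made on the "Why XML for ETDs?" slide.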

6
Encoding content: MathML example
  <apply>
    <plus/>
    <apply>
      <power/>
      <ci>x</ci>
      <cn>2</cn>
    </apply>
    <apply>
      <times/>
      <cn>4</cn>
      <ci>x</ci>
    </apply>
    <cn>4</cn>
  </apply>

x^2 + 4x + 4 = 0
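
Content markup of this kind can be processed, not just displayed. The following is a rough sketch, assuming only the three operators used in the example (`plus`, `times`, `power`); the function name `evaluate` and the use of Python's `xml.etree.ElementTree` are illustrative choices, and real MathML documents carry a namespace that this sketch ignores.

```python
import math
import xml.etree.ElementTree as ET

# The expression x^2 + 4x + 4 in MathML content markup (no namespace, for brevity).
MATHML = """\
<apply>
  <plus/>
  <apply><power/><ci>x</ci><cn>2</cn></apply>
  <apply><times/><cn>4</cn><ci>x</ci></apply>
  <cn>4</cn>
</apply>
"""


def evaluate(node, env):
    """Evaluate a tiny subset of MathML content markup for the variable values in env."""
    if node.tag == "cn":                      # numeric constant
        return float(node.text)
    if node.tag == "ci":                      # identifier, looked up in env
        return env[node.text.strip()]
    if node.tag == "apply":                   # first child is the operator
        operator, *operands = list(node)
        values = [evaluate(child, env) for child in operands]
        if operator.tag == "plus":
            return sum(values)
        if operator.tag == "times":
            return math.prod(values)
        if operator.tag == "power":
            base, exponent = values
            return base ** exponent
        raise ValueError(f"unsupported operator: {operator.tag}")
    raise ValueError(f"unsupported element: {node.tag}")


expression = ET.fromstring(MATHML)
print(evaluate(expression, {"x": -2.0}))
print(evaluate(expression, {"x": 1.0}))
```

The same tree could equally be searched or converted to another notation, which is the point of encoding the meaning of a formula rather than its appearance.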
7
Why XML for ETDs? (cont.)
  • Retrieval
  • using document structure and semantic tags
  • detailed search
  • automated cataloging
  • Highly structured information has a high value

8
Example: Structured Search Facility
9
Understanding XML
  • Extensible Markup Language
  • Basic philosophy: strictly separate content,
    structure, and layout
  • Child of SGML
  • Standard Generalized Markup Language (ISO
    standard, 1986)
  • well-formed versus valid documents; valid
    documents need a DTD
  • Developed by SGML experts (Charles Goldfarb and
    others)
  • 1996: first proposal
  • XML 1.0: first W3C Recommendation, 10 February
    1998
  • Aims
  • Make SGML more usable on the Web
  • vendor-independent visualisation and processing
    of structured information
  • more interoperability between WWW applications
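
The difference between well-formed and valid can be illustrated with a short sketch. It checks well-formedness only, using Python's standard `xml.etree.ElementTree`, which is a non-validating parser; checking validity against a DTD would need a validating parser, which is not shown here. The helper name `is_well_formed` is an assumption for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if text parses as XML: one root, tags properly nested and closed."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<title>Der Blutkrebs</title>"))          # properly closed
print(is_well_formed("<title>Der Blutkrebs"))                  # missing end tag
print(is_well_formed("<front><title>x</front></title>"))       # wrongly nested
```

A document can pass this check and still be invalid, for example when it uses an element that its DTD does not declare.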

10
What is a document?
  • Content
  • provided information (text, tune, picture)
  • Structure
  • sequencing and ordering of information
  • Layout
  • Visualisation of content and structure of document
  • Metadata
  • description of documents

11
First Step: Content analysis of a dissertation
  • Doctoral candidate: <author>
  • personal information: <given> first name <surname> surname
  • thesis information: <title> <approvals> <degree> <school>
  • index terms: <indxflag
  • persons: type=person
  • places, etc.: type=place
  • technical terms: <term>
  • bibliography: <bibliography>
  • title, URL, journal: <articletitle> <worktitle> <url>

12
First Step: Structure analysis of a dissertation
  • Title page: <front>
  • Main part: <body>
  • Chapters, Multimedia Objects, Schemes, Tables:
    <chapter>, <mm>, <table>
  • Appendix: <back>
  • Bibliography: <bibliography>
  • Definitions, Program source / ....: <appendix>
  • Acknowledgements, Vita, Declaration:
    <acknowledgement>, <vita>, <declaration>

13
How can XML be used? Terms
  • XML documents (instances of a document type)
  • e.g. a dissertation
  • Document Type Definition (DTD)
  • e.g. TEI (Text Encoding Initiative)
  • eXtensible Stylesheet Language (XSL)
  • eXtensible Linking Language (XLL)
  • Special Definitions
  • RDF (Resource Description Framework)
  • MathML (Mathematical Markup Language)
  • SVG (Scalable Vector Graphics)

14
Understanding XML
[Diagram relating SGML, DSSSL, HyTime, TEI P3 (Text Encoding Initiative), XML, XSL (XSLT, XSL-FO, XPath), XML Linking (XLink, XPointer, XPath, XBase), HTML link syntax, (X)HTML, and CSS; one edge is labelled "is DTD of"]
15
Problem: Most authors don't write in XML
  • Conversion from native document formats is needed
  • Microsoft Word
  • Corel WordPerfect
  • Adobe FrameMaker
  • LaTeX for natural sciences
  • different approaches around the world
  • different DTDs
  • different conversion strategies
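
One possible conversion strategy, sketched under stated assumptions: if authors apply named paragraph styles consistently in their word processor, each style can be mapped to an element. The style names, the `head` element, the `etd` root, and the function `paragraphs_to_xml` are all hypothetical; the input is assumed to be a list of (style name, text) pairs already extracted from the native file, and that extraction step is not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from paragraph style names to element names.
STYLE_TO_TAG = {"Title": "title", "Heading 1": "chapter", "Body Text": "p"}


def paragraphs_to_xml(paragraphs):
    """Build a small ETD tree from (style_name, text) pairs."""
    etd = ET.Element("etd")
    front = ET.SubElement(etd, "front")
    body = ET.SubElement(etd, "body")
    chapter = None
    for style, text in paragraphs:
        tag = STYLE_TO_TAG.get(style)
        if tag is None:
            # Unstyled or unknown paragraphs need manual attention.
            raise ValueError(f"no mapping for style {style!r}")
        if tag == "title":
            ET.SubElement(front, "title").text = text
        elif tag == "chapter":
            # A heading opens a new chapter; later paragraphs go inside it.
            chapter = ET.SubElement(body, "chapter")
            ET.SubElement(chapter, "head").text = text
        else:
            parent = chapter if chapter is not None else body
            ET.SubElement(parent, "p").text = text
    return etd


doc = paragraphs_to_xml([
    ("Title", "Der Blutkrebs"),
    ("Heading 1", "Introduction"),
    ("Body Text", "First paragraph."),
])
print(ET.tostring(doc, encoding="unicode"))
```

The sketch fails loudly on an unmapped style, which reflects the practical dependency of this approach on authors using the style sheet consistently.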

16
Example: Dissertation in Word
17
Structured Dissertation in Word (style sheet)
18
<ETD>
<FRONT>
<SCHOOL><P>Aus dem Institut f&uuml;r Physiologie der Humboldt-Universit&auml;t zu Berlin</P>
<P>Direktor Prof. Dr. P.B. Persson</P></SCHOOL>
<SUBMISSION>DISSERTATION</SUBMISSION>
<TITLE>"Untersuchungen zur Entwicklung der kardiorespiratorischen Interaktion anhand gemeinsamer Rhythmen von Atmung und Herzaktion. Longitudinalstudie der ersten sechs Lebensmonate gesunder S&auml;uglinge."</TITLE>
<DEGREE>zur Erlangung des akademischen Grades doctor medicinae (Dr. med.)</DEGREE>
<MAJOR>vorgelegt der Medizinischen Fakult&auml;t der Humboldt-Universit&auml;t zu Berlin</MAJOR>
<AUTHOR>von Herrn <GIVEN>Peter</GIVEN> <SURNAME>Aikele</SURNAME> <SUFFIX>geb. am 27.04.1967 in Dresden</SUFFIX></AUTHOR>
<DEAN>Dekan Prof. Dr. M. Dietel</DEAN>
<APPROVALS>
<NAME>1. Prof. Dr. E. Schubert</NAME>
Dissertation in SGML
19
Dissertation in HTML
20
Dissertation in PDF
21
Building a Document Type Definition
DiML1.DTD
HUTPubl.DTD
22
XML projects around the world
  • DTD workshop May 2000 in Berlin
  • Virginia Tech
  • Technical University of Helsinki (Oulu)
  • University of Oslo
  • Swedish Univ. of Agricultural Sciences Uppsala
    (Linköping)
  • University of Iowa
  • University of Michigan at Ann Arbor
  • Université Lyon 2
  • Université de Montréal
  • Cambridge University UK (LaTeX)
  • Humboldt-University Berlin

23
There is a community!
Different strategies for conversion
24
Need for cooperation
  • Interoperability
  • Identify standards
  • International retrieval
  • Reducing costs for development of tools

25
More examples: Graphics on the web
  • Graphic formats used on the web for 2D and 3D
    (gif, jpeg, png) are pixel based
  • Zooming leads to ugly raster images
  • Solution: use vector-based formats described by
    mathematical functions
  • reusable content for sciences -> content markup

26
Scalable Vector Graphics
  • W3C Working Draft, 3 March 2000
  • http://www.w3.org/TR/SVG/
  • description of two-dimensional graphic objects
  • contains 3 different graphic object types
  • vector graphics (rectangles, circles, lines,
    curves)
  • text
  • pictures
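
Because SVG is XML, a graphic can be generated with any XML tool. The sketch below builds a small drawing with two vector shapes and a text object using Python's standard `xml.etree.ElementTree`; the sizes, colours, and the function name `make_svg` are arbitrary illustrative choices, and the namespace shown is the one used by the final SVG Recommendation rather than the 2000 working draft. Embedded pictures, the third object type, are left out.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def make_svg() -> str:
    """Return a small SVG document containing a rectangle, an ellipse, and a text label."""
    svg = ET.Element("svg", {
        "xmlns": SVG_NS,
        "width": "200",
        "height": "120",
        "viewBox": "0 0 200 120",
    })
    # Vector shapes are described by coordinates, so they scale without pixelation.
    ET.SubElement(svg, "rect", {
        "x": "10", "y": "10", "width": "180", "height": "60", "fill": "#75C5F0",
    })
    ET.SubElement(svg, "ellipse", {
        "cx": "100", "cy": "80", "rx": "70", "ry": "30", "fill": "#FFFB9D",
    })
    # Text stays text: it remains searchable and selectable.
    label = ET.SubElement(svg, "text", {"x": "20", "y": "40", "fill": "#DD137B"})
    label.text = "Test picture for SVG"
    return ET.tostring(svg, encoding="unicode")


svg_text = make_svg()
print(svg_text)
```

Saving the returned string to a file with the extension `.svg` gives a picture that can be zoomed without the raster artefacts described on the previous slide.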

27
SVG example
Produced in Corel Draw 9, exported to SVG
28
SVG example
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE svg SYSTEM "svg-19991203.dtd">
<!-- Creator: CorelDRAW -->
<svg xml:space="preserve" width="6.34153in" height="4.09944in"
  style="shape-rendering:geometricPrecision; text-rendering:geometricPrecision; image-rendering:optimizeQuality"
  viewBox="-3637 0 6341 4099">
<defs>
<style type="text/css">
<![CDATA[
  .str0 {stroke:#1F1A17;stroke-width:3}
  .str1 {stroke:#169F62;stroke-width:111}
  .fil4 {fill:none}
  .fil1 {fill:#75C5F0}
  .fil2 {fill:#DD137B}
  .fil3 {fill:#FFF500}
  .fil0 {fill:#FFFB9D}
]]>
</style>
</defs>
<g>
<title>Ebene 1</title>
<ellipse class="fil0 str0" cx="-1275" cy="2697" rx="2361" ry="1401"/>
<rect class="fil1 str0" x="-3262" y="2" width="5540" height="2428"/>
<path class="fil2 str0" d="M1656 1367l523 328 523 327 -199 530 -200 531 -647 0 -646 0 -200 -531 -200 -530 523 -327 523 -328z"/>
<text x="-2756" y="836" class="fil3" style="font:normal 500 Garamond">Test picture for SVG</text>
<text x="-2756" y="1399" class="fil3" style="font:normal 500 Garamond">in Corel Draw 9</text>
...
</g></svg>
Produced in Corel Draw 9, exported to SVG
29
Extensible 3D (X3D)
  • Successor of VRML97
  • http://www.web3d.org/x3d.html
  • planned as an ISO standard in 2002
  • Component based approach (profiles)
  • scalable, thin clients, DOM support
  • core model (X3D-Core profile)
  • minimum set of viewer functions
  • 3D geometries, animations, interactions,
    rendering
  • X3D-VRML profile
  • compatibility to VRML97
  • user specific profiles (GeoVRML)

30
X3D example
  #VRML V2.0 utf8
  Transform {
    children [
      NavigationInfo { headlight FALSE }  # We'll add our own light
      DirectionalLight {                  # First child
        direction 0 0 -1                  # Light illuminating the scene
      }
      Transform {                         # Second child - a red sphere
        translation 3 0 1
        children [
          Shape {
            geometry Sphere { radius 2.3 }
            appearance Appearance {
              material Material { diffuseColor 1 0 0 }  # Red
As VRML
31
X3D example
  <?xml version="1.0" encoding="utf-8" ?>
  <!DOCTYPE X3D PUBLIC
    "http://www.web3D.org/TaskGroups/x3d/translation/x3d-compromise.dtd"
  [
    <!ENTITY % Vrml97Profile "INCLUDE">
    <!ENTITY % CoreProfile "IGNORE">
    <!ENTITY % X3dExtensions "IGNORE">
    <!ENTITY % GeoVrmlProfile "IGNORE">
  ]>
  <X3D>
    <Scene>
      <Transform>
        <children>
          <NavigationInfo headlight="false" avatarSize=" 0.25 1.6 0.75" type="&#34;EXAMINE&#34;"/>
          <DirectionalLight/>
          <Transform translation="3.0 0.0 1.0">
            <children>
              <Shape>
                <geometry>

32
SMIL
  • Synchronize Multimedia on the web using an XML
    based language
  • To enable simple authoring of TV-like multimedia
    presentations such as training courses on the Web
  • HTML-like, easy to learn
  • A SMIL presentation can contain components of
    streaming audio, streaming video, images, text
    or any other media type
  • http://www.w3.org/AudioVideo/
  • SMIL Boston Working Draft (25 Feb. 2000)

33
MusicML
  • DTD for sheet music (March 5, 1998)
  • No standard
  • http://195.108.47.160/3.0/musicml/index.html
  • Connection Factory, Dutch enterprise
  • MusicML document basic structure
    <sheetmusic>
      <musicrow size="two">
      </musicrow>
      <musicrow size="two">
      </musicrow>
    </sheetmusic>

34
MusicML example
  <?XML version="1.0"?>
  <!DOCTYPE sheetmusic SYSTEM "music.dtd">
  <sheetmusic>
    <musicrow size="two">
      <entrysegment>
        <entrypart cleff="bass" rythm="threequarter" position="one">
          <molkruis level="zero" name="g" notetype="flat"/>
          <molkruis level="zero" name="b" notetype="sharp"/>
        </entrypart>
        <entrypart cleff="treble" rythm="threequarter" position="two">
        </entrypart>
      </entrysegment>
      <segment>
        <subsegment position="one">
          <chord>
            <note beat="quarter" name="f" level="zero"/>
            <note beat="quarter" name="b" level="zero"/>
            <note beat="quarter" name="d" level="plus1"/>
            <note beat="quarter" name="f" level="plus1"/>

35
MML (2)
  • DTD for Music (July 1999)
  • no standard
  • University of Pretoria
  • http://is.up.ac.za/mml/index.htm
  • Document basic structure
  • Basic element modules
  • Organization elements
  • Time elements
  • Frequency elements
  • Lyric elements
  • Notation elements

36
MML (2) example
  <lyric verse="1">
    <upbeat> Now
    <bar 1> that the sun doth </bar>
    <bar 2> shine no more, And </bar>
    <bar 3> day hath <squash 2> reached </squash> its </bar>
    <bar 4> close, They </bar>
    <bar 5> calmly sleep who </bar>
    <bar 6> wept before, The </bar>
    <bar 7> wearied find re </bar>
    <bar 8> pose. </bar>
  </lyric>
  <upbeat>A E </upbeat>
  <bar 1>3E A 3E B 3E8 CF8 3G C </bar>
  <bar 2>A D 3G D (3G E)2 </bar>
  <bar 3>A D 3E E (3E C)2 </bar>
  <bar 4>3E2 B2 R B E </bar>
  <bar 5>B8 CA8 3Gs B A8 CGn 3Fn A </bar>
  <bar 6>3(Gs F8E8) 3(E8 4AFs8) 3Gs B A E </bar>
  <bar 7>A8 D3Gs8 A C A B. 3G B8 </bar>
  <bar 8>(E A)2. </bar>
  </div>

37
Conclusion
  • There are standard DTDs for specific subjects and
    media types
  • For interoperability issues we should use those
    standard DTDs
  • share tools for authoring, browsing and conversion

38
What can we do!
  • Repository of used DTDs, documentation,
    conversion tools
  • http://dochost.rz.hu-berlin.de/epdiss/dtd-workshop
  • Please email me! dobratz_at_rz.hu-berlin.de
  • Help provide guidelines for universities that are
    interested in starting with XML but don't know how!
  • Dos and Don'ts
  • Provide examples
  • Provide Tools

39
Thank You!
  • Susanne Dobratz
  • dobratz_at_rz.hu-berlin.de
  • http://dochost.rz.hu-berlin.de/epdiss/dtd-workshop