OSIS - PowerPoint PPT Presentation

Provided by: stevenj75
Transcript and Presenter's Notes

Title: OSIS


1
OSIS: A Closer Look
  • Steven J. DeRose, Ph.D.
  • Chair, Bible Technologies Group
  • http://www.bibletechnologies.net
  • sderose@acm.org
  • November 22, 2002

2
Why have a standard? (first, for publishers)
  • Can reduce the costs of
  • Editing and publication process
  • Software purchase, training, maintenance
  • Rekeying, scanning, and conversion
  • Lets texts survive when your WP or typesetting
    program goes obsolete
  • Facilitates multi-format, multi-platform delivery
    and distribution
  • Enables use of generic tools

3
Why have a standard? (next, for users)
  • Lets you obtain the same texts regardless of what
    reading and other tools you use
  • Because the publisher does no more work to
    support 10, than to support 1
  • Helps texts survive when your book-reading
    software goes obsolete
  • Reduced costs
  • Better, more reliable resources
  • Enables communities of interest
  • Shared notes, collaborative study, ...

4
The medium picture
  • [Diagram: XML/OSIS text at the hub, linked to
    Typeset, OCR, WPs, Other XML, XHTML, HTML, PDF,
    Braille, Open eBook, Palmtops, and Cell delivery;
    callout: "Cost savings usually start here"]
  • 4 + 7 convertors instead of 4 × 7 (and reality is
    bigger)
5
The basic principle: Descriptive markup
  • WPs only see "huge, bold, space before"
  • Now find/reformat all chapter headings
  • Expensive to apply a house style or look/feel
  • Hard to create diverse forms
  • Web, paper, and braille publication
  • A perfect user could use stylesheets
  • But interfaces make inconsistent work easier
  • Instead say what kind of portion each is
  • A formatter applies rules by kind
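
To make "a formatter applies rules by kind" concrete, here is a minimal sketch. The kind names and both rule tables are hypothetical, invented for illustration; they are not part of OSIS or of any particular formatter.

```python
# Hypothetical rule tables: one per output medium, keyed by the kind of unit.
WEB_RULES = {
    "chapter-heading": lambda text: f"<h2>{text}</h2>",
    "paragraph": lambda text: f"<p>{text}</p>",
}
PLAIN_RULES = {
    "chapter-heading": lambda text: text.upper(),
    "paragraph": lambda text: text,
}


def render(units, rules):
    """Format each (kind, text) unit by looking up the rule for its kind."""
    return "\n".join(rules[kind](text) for kind, text in units)


# The content says only what each portion is, never how it should look.
document = [
    ("chapter-heading", "Genesis 1"),
    ("paragraph", "In the beginning God created the heaven and the earth."),
]

web_output = render(document, WEB_RULES)
plain_output = render(document, PLAIN_RULES)
```

Changing the look of every chapter heading then means editing one rule, not hunting through the text for everything that happens to be huge and bold.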

6
Why should I separate out the formatting?
  • It speeds your work
  • You can use a stylesheet from someone else, and
    not have to do any manual formatting
  • Typesetter can enhance formatting without risking
    corrupting your content
  • Therefore, less time wasted reviewing galleys
  • Multiple formats from the same source
  • Print, braille, Web, etc.
  • House styles for different journals
  • Last-minute changes are safer, cheaper
  • Especially crucial for Bible publishing

7
Why not just use HTML?
  • HTML is nice but lacks
  • Units like poem, chapter, verse, inscription
  • Ways to annotate for meaning, grammar, etc
  • Support for reference systems "Matt 1:1"
  • Multi-purpose tags like <b>, <i>, etc.
  • Are hard to tease apart when you need to
  • HTML limitations encourage using tables to force
    layout, making re-use infeasible
  • And..

8
Compare
  • <item><desc>Cashmere sweater</desc><price
    unit='yen'>120000</price></item><item>
    <desc>Socks</desc><price unit='yen'>1000</price>
    </item>
  • versus
  • <br>Cashmere sweater, 120000<br>Socks, 1000

9
Why is the markup better?
  • When relations are marked, an indexer can match
    price with item
  • If not, there is no reliable way
  • (there are lots of ways one might guess)
  • A search for "Cashmere and 1000" hits
  • Needlessly annoying the searcher
  • How many false hits have you had like this?
  • Markup is not just about formatting
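
The false hit can be reproduced in a few lines. This is a sketch only: the `<catalog>` wrapper element and both search functions are assumptions added so the fragment parses and the comparison is visible; the item data comes from the earlier slide.

```python
import xml.etree.ElementTree as ET

# <catalog> is a hypothetical wrapper so the two items share a single root.
marked_up = (
    "<catalog>"
    "<item><desc>Cashmere sweater</desc><price unit='yen'>120000</price></item>"
    "<item><desc>Socks</desc><price unit='yen'>1000</price></item>"
    "</catalog>"
)
flat = "Cashmere sweater, 120000 Socks, 1000"


def flat_search(text, word, price):
    """Naive search: both terms appear somewhere in the text."""
    return word in text and price in text


def structured_search(xml_text, word, price):
    """Return descriptions of items whose own desc and price both match."""
    root = ET.fromstring(xml_text)
    return [
        item.findtext("desc")
        for item in root.iter("item")
        if word in item.findtext("desc") and item.findtext("price") == price
    ]


false_hit = flat_search(flat, "Cashmere", "1000")              # True: a false hit
true_hits = structured_search(marked_up, "Cashmere", "1000")   # no item matches
sock_hits = structured_search(marked_up, "Socks", "1000")      # the socks match
```

The flat text cannot say which price belongs to which item, so the naive search reports a match that no single item satisfies.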

10
How do you spell XML?
  • The Extensible Markup Language
  • HTML on steroids (sort of)
  • Key features
  • Intrinsic support for Unicode
  • Ability to create your own units
  • Ability to validate how they are used
  • (no chapters inside footnotes, etc.)
  • Very easy for computers to process
  • Separates formatting (remember earlier)

11
OSIS and XML
  • OSIS is an application of XML
  • XML specifies the syntax
  • OSIS specifies a lexicon for our genre
  • Life would be easy if natural languages were
    that simple!
  • There are many other lexica for XML
  • Humanities: Text Encoding Initiative (TEI)
  • Closely related to OSIS

12
What is OSIS, really?
  • OSIS defines
  • A set of XML element types
  • p, verse, inscription, note, ...
  • Certain attributes for those types
  • type="devotional"
  • A standard form for Biblical references
  • A consistent way to write them down
  • A way to specify within-verse locations
  • A way to refer to editions and translations, or
    to refer to a passage generically

13
Concept: a hierarchy
  • [Diagram: element tree rooted at osis, which
    contains osisText. Node labels: header; work
    osisWork="KJV" with identifier, title, language;
    div type="book"; div type="chapter"; p; verse;
    verse osisID="Gen.1.3"; note; inscription; text
    content]
14
What's under the covers?
  • All of this is represented by inserting markers
    ("tags") into the text
  • Like HTML but more consistent
  • All starts and ends are explicit
  • Three kinds
  • Start tags: <p>
  • End tags: </p>
  • Empty tags: <milestone/>
  • <p>Jesus wept.</p> is an element.

15
What else is there?
  • Elements can contain other elements
  • <div type="chapter"> <verse>In the
    beginning...</verse> <verse>And the
    Word...</verse>...</div>
  • Many elements can also contain text
  • Some elements require or prohibit others
  • No <div> inside <abbr>
  • An empty tag just marks a point
  • <milestone type="pb"/>
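
A small sketch of how a generic XML parser sees this nesting, using Python's standard `xml.etree.ElementTree`. The fragment is a trimmed, assumed variant of the slide's example with a page-break milestone placed between the two verses.

```python
import xml.etree.ElementTree as ET

sample = (
    '<div type="chapter">'
    "<verse>In the beginning...</verse>"
    '<milestone type="pb"/>'
    "<verse>And the Word...</verse>"
    "</div>"
)

chapter = ET.fromstring(sample)

# Elements contain other elements: list the chapter's children in order.
children = [child.tag for child in chapter]

# Many elements also contain text.
verse_texts = [verse.text for verse in chapter.findall("verse")]

# An empty tag just marks a point: it has no text and no children.
milestone = chapter.find("milestone")
milestone_is_empty = milestone.text is None and len(milestone) == 0
```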

16
Attributes
  • Usually modify a whole element
  • Appear only inside start tags
  • <name type="nonhuman">Baal</name>
  • <div type="chapter"></div>
  • <verse osisID="Rev.22.21">
  • <q who="God">
  • <transChange type="added">

17
The full set of (68) tags
  • abbr 
  • actor 
  • caption 
  • castGroup 
  • castItem 
  • castList 
  • catchWord 
  • cell 
  • closer 
  • contributor 
  • coverage 
  • creator 
  • date 
  • description 
  • div 
  • divineName 
  • figure 
  • foreign 
  • head 
  • header 
  • hi 
  • identifier 
  • index 
  • inscription 
  • item 
  • label 
  • language 
  • lg 
  • list 
  • mentioned 
  • milestone 
  • milestoneEnd 
  • milestoneStart
  • name 
  • note
  • osis 
  • osisText 
  • publisher 
  • rdg 
  • reference 
  • refSystem 
  • relation 
  • revisionDesc
  • rights 
  • role 
  • roleDesc 
  • row 
  • salute 
  • seg 
  • signed 
  • source
  • speaker
  • speech 
  • table 
  • teiHeader 
  • title 
  • transChange
  • type 
  • verse 
  • work 

18
Don't panic
  • A lot of these get used once each, in the header,
    almost as a ritual
  • You can paste a sample header and fill it in
  • About a dozen form the Dublin Core set for
    cataloging and identification info
  • Most of the rest fall into nice groups
  • The hard parts (later) include
  • Milestones
  • Quotes when they cross verses/paragraphs

19
Three major pieces to OSIS
  • The markup elements and their attributes
  • Defined by a schema
  • The standardized reference system
  • Partly defined in the schema
  • Partly defined in grammar and prose
  • The authority system
  • A way to declare formal/normalized names
  • Declaration portion still in process

20
Basic OSIS markup
  • (What's in a name?)

21
Sample markup
  • <div type="testament">
  • <div type="book" osisID="Gen">
  • <div type="chapter" osisID="Gen.1">
  • <verse osisID="Gen.1.1">In the beginning God
    created the heaven and the earth.</verse>
  • <verse osisID="Gen.1.2">And the earth was without
    form, and void; and darkness was upon the face of
    the deep. And the Spirit of God moved upon the
    face of the waters.</verse>
  • <verse osisID="Gen.1.3">And God said, Let there
    be light: and there was light.</verse>
  • <verse osisID="Gen.1.31">And God saw every thing
    that he had made, and, behold, it was very good.
    And the evening and the morning were the sixth
    day. <note type="x-StudyNote"> And the
    evening... Heb. And the evening was, and the
    morning was etc.</note></verse>
  • </div></div></div>
  • </osisText></osis>
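
Because every unit carries an osisID, generic tools can navigate the text. Here is a minimal sketch using Python's standard `xml.etree.ElementTree` on a shortened copy of the sample; the helper `find_verse` is hypothetical, not an OSIS tool.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample markup (two verses only).
sample = """
<div type="testament">
  <div type="book" osisID="Gen">
    <div type="chapter" osisID="Gen.1">
      <verse osisID="Gen.1.1">In the beginning God created the heaven and the earth.</verse>
      <verse osisID="Gen.1.3">And God said, Let there be light: and there was light.</verse>
    </div>
  </div>
</div>
"""

root = ET.fromstring(sample.strip())

# Every element that carries an osisID, in document order.
ids = [el.get("osisID") for el in root.iter() if el.get("osisID")]


def find_verse(tree, osis_id):
    """Return the text of the verse with the given osisID, or None."""
    for verse in tree.iter("verse"):
        if verse.get("osisID") == osis_id:
            return verse.text
    return None


light = find_verse(root, "Gen.1.3")
```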

22
Big generic elements
  • div  Testament, book, chap, section
  • type the type of division, as above
  • divTitle optional display title
  • title  Title of any div
  • list  Genealogies and other lists
  • label 
  • item 
  • table  Mainly for appendixes, etc.
  • row 
  • cell 

23
Book/chapter/verse
  • Large units all use the <div> element
  • It has a type attribute, with values
  • appendix
  • book
  • chapter
  • concordance
  • glossary
  • As with most attributes you can add new values if
    they start with "x-"
  • <div type='x-toronto-thing'>
  • We expect to add more div types in time
  • <verse osisID="Rev.3.20">

Note: There are no separate tags for testament,
book, or chapter
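
The "x-" extension rule is easy to check mechanically. A minimal sketch, assuming the set below: it holds only the five values listed on this slide, while the real schema defines more (other slides use "testament" and "commentary", for instance). The function name is hypothetical.

```python
# Only the values shown on this slide; the actual schema lists more.
STANDARD_DIV_TYPES = {"appendix", "book", "chapter", "concordance", "glossary"}


def is_allowed_div_type(value, standard=STANDARD_DIV_TYPES):
    """Accept a standard value, or any extension that starts with 'x-'."""
    is_extension = value.startswith("x-") and len(value) > 2
    return value in standard or is_extension


checks = {
    "chapter": is_allowed_div_type("chapter"),
    "x-toronto-thing": is_allowed_div_type("x-toronto-thing"),
    "toronto-thing": is_allowed_div_type("toronto-thing"),
}
```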
24
Small items
  • abbr  <abbr expansion="">
  • divineName  <divineName>The Lord
  • foreign  <foreign lang="">Talitha
  • hi  Emphasis in notes/comm
  • inscription  Mene, mene, tekel, parsin
  • mentioned  The name <mentioned>Peter
  • name  Destroyed the <name
    type="nonhuman">Baals</name>
  • p  The ubiquitous paragraph
  • q  Quotations (more later)

25
Genre-specific elements
  • Epistolary: salute, closer
  • <closer>I, Paul, sign this with my own
    hand.</closer>
  • Illustrations: figure
  • May contain caption, note, index
  • Poetry: lg, l
  • Also used for other line-oriented text
  • lg (line group) can be nested
  • Drama: speech, speaker
  • speaker ok in speech cell closer div inscription
    l p q salute verse
  • who attribute can point to a castItem in the
    header

26
Inscription
  • <verse osisID="Dan.5.25">This is the
    inscription that was written <inscription>Mene,
    Mene, Tekel, Parsin<note type="">Aramaic UPARSIN
    (that is, AND PARSIN)</note></inscription></verse>
  • How many inscriptions can you think of?

27
About the source/target layout
  • <milestone>
  • Use to mark point events
  • page and column breaks of a source manuscript
  • Intended screen breaks for display
  • Types: column, footer, header, line, page, screen
  • Note: Do not confuse with milestoneStart and
    milestoneEnd, which stand in for several other
    elements when they must cross verse/p boundaries
    in certain ways.

28
About the text itself
  • transChange  Changed in translation
  • Types: added, amplified, changed, deleted, moved
  • rdg  Variant readings
  • Used only within notes (for now)
  • <note>Some ancient mss <rdg>kiss the
    Son</rdg></note>
  • seg  (extensions)
  • w  word-level linguistics
  • Attributes: POS, morph, lemma, gloss, src, xlit

29
Attributes of all elements (all are optional)
  • Name           Type             Meaning
  • osisRef        osisRefType
  • annotateWork   anything         I am about W
  • annotateType   osisAnnotation   My relation to W
  • ews            anything
  • ID             xs:ID            For Web to link to
  • lang           languageType     language, writing system
  • osisID         osisIDType       reference to here
  • resp           anything         responsible person
  • splitID        anything         (later)
  • type           anything
  • subType        anything
  • n              anything         name/num of unit

30
The reference system
  • (I am named, therefore I am)

31
Header overview
  • Purpose
  • Identify the file as an XML file
  • Identify the file as using the OSIS schema
  • Say whether it's one text or a collection
  • Identify and declare names for
  • The work itself (title, author, etc)
  • Other works referenced
  • Verse reference systems used
  • Characters in the text (<castList>)

32
Header sample
  • <?xml version="1.0" encoding="UTF-8" ?>
  • <osis
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="osisCore.1.1.xsd">
  • <osisText osisIDWork="KJV"
    osisRefWork="defaultReferenceScheme">
  • <header>
  • <work osisWork="KJV">
  • <title>King James Version of 1769</title>
  • <identifier type="OSIS">KJV</identifier>
  • <language>en</language>
  • <refSystem>Bible.KJV</refSystem></work>
  • <work osisWork="defaultReferenceScheme">
  • <refSystem>Bible.KJV</refSystem></work>
  • </header>
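
As a rough illustration of what the header gives software, the sketch below parses the sample (with the closing `</osisText>` and `</osis>` tags added so it is well-formed) and collects the declared works. The check that `osisIDWork` and `osisRefWork` both name a declared work is an assumed reading of how these attributes relate to the `<work>` declarations, not a rule quoted from the schema.

```python
import xml.etree.ElementTree as ET

doc = """<?xml version="1.0" encoding="UTF-8"?>
<osis xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="osisCore.1.1.xsd">
  <osisText osisIDWork="KJV" osisRefWork="defaultReferenceScheme">
    <header>
      <work osisWork="KJV">
        <title>King James Version of 1769</title>
        <identifier type="OSIS">KJV</identifier>
        <language>en</language>
        <refSystem>Bible.KJV</refSystem>
      </work>
      <work osisWork="defaultReferenceScheme">
        <refSystem>Bible.KJV</refSystem>
      </work>
    </header>
  </osisText>
</osis>
"""

# Parse from bytes so the encoding declaration is honored.
root = ET.fromstring(doc.encode("utf-8"))
osis_text = root.find("osisText")

# Map each declared work name to its child elements (tag -> text).
works = {
    work.get("osisWork"): {child.tag: child.text for child in work}
    for work in osis_text.find("header").findall("work")
}

# Assumed sanity check: both work names used on osisText are declared.
names_declared = (
    osis_text.get("osisIDWork") in works
    and osis_text.get("osisRefWork") in works
)
```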

33
Other header elements
  • osisCorpus 
  • Use inside ltosisgt when there will be several
    texts in one document, as for a polyglot
  • osisCorpus can have its own header
  • osisCorpus then contains osisText elements
  • teiHeader
  • Allows including a fuller TEI-style header
  • work uses the standard "Dublin Core" tags to give
    catalog/bibliography info

34
Dublin Core
  • title  The title of the work or collection
  • creator  The primary author
  • contributor  Other contributors (set 'role')
  • identifier  ISBN or similar unique ID of work
  • date  Publication date
  • language  Primary language of the work
  • rights  Statement of permissions/rights
  • publisher  Name of the publisher
  • description  An abstract or precis of the work
  • format  What representation (OSIS)
  • coverage  Intended audience and scope
  • relation 
  • source  If derived from another work
  • subject  LCSH or similar subject descr
  • type
  • refSystem  (OSIS only, not in D.C.)

35
Identifying parts of the work
  • osisID must be specified on any element that has
    a canonical reference
  • <verse osisID="Luk.3.10">
  • <p osisID="Rev.3.20">
  • <div type="chapter" osisID="Luk.3">
  • 3-letter book names, periods to separate
  • HTML <a name=""> available as well
  • More useful in notes/commentary, not Bible
  • Back-of-book index entries
  • <index level1="Idols" level2="burning of"
    level3="by Hezekiah">
  • <index level1="False gods" see="Idols">

36
When it won't come out even
  • If several verses are translated as (say) a p
  • Put all the appropriate osisIDs on the p
  • <p osisID="Matt.1.1 Matt.1.2">
  • If a verse is split across paragraphs
  • Tag each part; use splitID to number them
  • <p><verse osisID="1Pe.1.3" splitID="1"></verse></p>
    <p><verse osisID="1Pe.1.3" splitID="2"></verse></p>
  • milestoneStart, milestoneEnd
  • Used to mark units that cross boundaries
  • abbr closer div foreign l lg q salute seg signed
    speech verse
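
A sketch of how software might reassemble text by osisID under these two conventions. The element text is placeholder wording, and treating splitID values as integers that give the order of the parts is an assumption based on the "1" and "2" in the example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Placeholder text; only the osisID/splitID pattern follows the slide.
sample = """
<div type="chapter">
  <p osisID="Matt.1.1 Matt.1.2">Two verses rendered as one paragraph.</p>
  <p><verse osisID="1Pe.1.3" splitID="1">first part, </verse></p>
  <p><verse osisID="1Pe.1.3" splitID="2">second part.</verse></p>
</div>
"""


def collect_by_osis_id(root):
    """Map each osisID to its text, joining split parts in splitID order."""
    parts = defaultdict(list)
    for el in root.iter():
        ids = el.get("osisID")
        if ids is None:
            continue
        order = int(el.get("splitID", "0"))  # assumption: numeric splitIDs
        for osis_id in ids.split():          # one element may carry several IDs
            parts[osis_id].append((order, el.text or ""))
    return {
        osis_id: "".join(text for _, text in sorted(chunks))
        for osis_id, chunks in parts.items()
    }


by_id = collect_by_osis_id(ET.fromstring(sample.strip()))
```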

37
References
  • Reference to other places/works
  • <note>See also <reference
    osisRef="Mat.1.1">Matthew</reference> for a similar
    theme.</note>
  • div, figure, note, and reference can also
    directly refer
  • <div type="commentary" osisRef="Luk.3.10">
  • This identifies the passage this commentary div
    is about.
  • HTML <a href=""> also available
  • (more useful in notes/commentary, not Bible)

38
Reference syntax
  • NIV.Heb:Psa.42.1-Psa.43.12@cp12
  • [Diagram labels: edition (NIV), refsystem (Heb),
    book (Psa), chapter (42), verse (1), grain type
    (cp: 'code point', character), grain value (12)]
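
A rough sketch of pulling a reference apart into the labeled pieces. The punctuation is an assumption: the example is read here as `NIV.Heb:Psa.42.1-Psa.43.12@cp12`, with a period between edition and refsystem, a colon before the passage, a hyphen for ranges, and `@` before the grain. The pattern and function names are hypothetical and do not reproduce the formal OSIS grammar.

```python
import re

# Assumed shape: [edition[.refsystem]:]start[-end][@graintype grainvalue]
REF = re.compile(
    r"^(?:(?P<edition>[^.:]+)(?:\.(?P<refsystem>[^:]+))?:)?"
    r"(?P<start>[^-@]+)(?:-(?P<end>[^@]+))?"
    r"(?:@(?P<grain_type>[A-Za-z]+)(?P<grain_value>\d+))?$"
)


def parse_point(point):
    """Split 'Psa.42.1' into book, chapter, and verse (missing parts are None)."""
    book, *rest = point.split(".")
    chapter = int(rest[0]) if len(rest) > 0 else None
    verse = int(rest[1]) if len(rest) > 1 else None
    return {"book": book, "chapter": chapter, "verse": verse}


def parse_reference(ref):
    """Parse a reference string into its labeled components."""
    match = REF.match(ref)
    if match is None:
        raise ValueError(f"unrecognized reference: {ref!r}")
    fields = match.groupdict()
    fields["start"] = parse_point(fields["start"])
    if fields["end"] is not None:
        fields["end"] = parse_point(fields["end"])
    if fields["grain_value"] is not None:
        fields["grain_value"] = int(fields["grain_value"])
    return fields


full = parse_reference("NIV.Heb:Psa.42.1-Psa.43.12@cp12")
short = parse_reference("Gen.1.1")
```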
39
Notes
  • Notes are placed right where they are referenced
    in the text.
  • Notes have several types
  • allusion alternative background citation
    devotional exegesis explanation study translation
    enumeration variant
  • Additional types must start with "x-"
  • catchWord -- marks referenced text cited within a
    note
  • <note><catchWord>hello</catchWord> may also be
    translated "goodbye" here.</note>
  • rdg -- marks alternate readings

40
On to the authority system
  • The name is the thing, and the true name is the
    true thing. To know the name is to control the
    thing. -- Ursula LeGuin

41
Cast-lists
  • To declare cast of characters
  • Provides a formal ID for each
  • Can refer to ID from <speaker>, <q>, etc.
  • castList 
  • castGroup 
  • castItem 
  • actor 
  • role 
  • roleDesc 

42
The authority system
  • Only supported for castList at present
  • We intend to provide
  • A schema for declaring sets of formal names
  • A way to invoke such lists in documents
  • Standard name sets for
  • Bible versions
  • Versification schemes
  • People, places, etc. in the Bible
  • Journals, classical literature, and other works
    commonly cited in Biblical studies

43
OSIS in practice
  • Tourist to police officer: "Can you tell me how
    to get to Carnegie Hall?"
  • Officer to tourist: "Practice, practice,
    practice."

44
How do I know if the markup is correct?
  • 5 levels of 'correct'
  • SLipshod
  • Only well-formed
  • Valid
  • Accurate
  • Complete
  • SL: no check required
  • O: load in IE 5
  • V: xp, XMetal, and other true validators
  • A: requires human proofreading and interpretation
  • C: there is always more that could be marked up
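
The "only well-formed" level can also be checked with any XML parser, not just a browser. Below is a minimal sketch using Python's standard `xml.etree.ElementTree`; the function name is hypothetical. It does not reach the "valid" level, because that parser does not check a document against the OSIS schema; a schema-aware validator is still needed for that.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(xml_text):
    """Return None if xml_text is well-formed, else the parser's message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)
    return None


# Well-formed: every start tag has a matching end tag.
ok = well_formedness_error("<p>Jesus wept.</p>")

# Not well-formed: the div is closed with a mismatched end tag.
mismatched = well_formedness_error(
    '<div type="chapter"><verse>In the beginning...</verse></chapter>'
)
```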

45
Tools vs. today
  • Today we will use the raw form
  • Experts will need to know this
  • Users should have protective software
  • Some XML editing programs
  • SoftQuad XMetal -- 300
  • Open Office -- free, very promising
  • Some generic-enough HTML editors
  • BBEdit, emacs, Netscape Communicator

46
Getting to OSIS
  • The cleaner your data, the easier it is
  • Data is seldom as clean as you think it is
  • Structured formats (USFM, XSEM, LGM, ThML) are
    the easiest sources
  • Tools
  • Perl/awk/sed/cc and the like
  • XSLT if coming from XML
  • BTG has sponsored development of several
    convertors.
  • BTG will maintain a repository of utilities

47
Getting your OSIS XML to display in IE
  • Make sure the document is at least WF
  • Name it filename.xml
  • Refer to a stylesheet if you want formatting
    instead of just an outline view
  • <?xml version="1.0"?>
    <!DOCTYPE osis>
    <?xml-stylesheet href="mystyle.css" type="text/css"?>
    <osis
    xmlns="http://www.bibletechnologies.org/namespaces/OSIS-1.1">
    <header>

48
Getting your OSIS printed
  • Most typesetting programs now import XML
  • OSIS converts easily to most relevant XML
    schemas, using XSLT
  • Word processors are also gaining ability to
    import arbitrary XML
  • Typesetting firms, esp. for journals, are
    starting to accept XML as well.

49
Near-term concerns of OSIS
  • Linguistic annotation
  • Formal name lists for people, places,
    translations, etc.
  • Connecting text to multimedia
  • Greater support for secondary genres
  • Tool development and conformance

50
How you can help
  • Find the best place to apply OSIS in your
    organization, and do it.
  • Join a Working Group
  • Send feedback, feature requests, etc.
  • Join a Working Group
  • Convert or create OSIS texts
  • Join a Working Group
  • Create a converter for your current format
  • Join a Working Group
  • Tell your friends and colleagues
  • Join a Working Group

51
For more information
  • Web
  • http://www.bibletechnologies.org
  • http://www.bibletechnologieswg.org
  • Some contacts
  • Steve DeRose sderose@acm.org
  • Kees de Blois deblois@zeelandnet.nl
  • Patrick Durusau pdurusau@emory.edu
  • Kirk Lowery klowery@wts.edu
  • Mike Perez MPerez@forministry.com