Title: OSIS
1OSIS A Closer Look
- Steven J. DeRose, Ph.D.
- Chair, Bible Technologies Group
- http://www.bibletechnologies.net
- sderose@acm.org
- November 22, 2002
2Why have a standard? (first, for publishers)
- Can reduce the costs of
- Editing and publication process
- Software purchase, training, maintenance
- Rekeying, scanning, and conversion
- Lets texts survive when your WP or typesetting program goes obsolete
- Facilitates multi-format, multi-platform delivery and distribution
- Enables use of generic tools
3Why have a standard? (next, for users)
- Lets you obtain the same texts regardless of what reading and other tools you use
- Because the publisher does no more work to support 10, than to support 1
- Helps texts survive when your book-reading software goes obsolete
- Reduced costs
- Better, more reliable resources
- Enables communities of interest
- Shared notes, collaborative study, ...
4The medium picture
[Diagram: XML/OSIS text as the hub between the surrounding formats: XHTML, Typeset, OCR, Braille, HTML, PDF, WPs, Open eBook, Other XML, Palmtops, Cell delivery]
- Cost savings usually start here
- 4 + 7 convertors instead of 4 × 7 (and reality is bigger)
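The convertor count can be checked with a few lines of arithmetic. This is a minimal sketch that takes the slide's 4 and 7 as counts of source formats and delivery formats (an assumption about what the diagram showed); the function name is made up for illustration.

```python
def converters_needed(sources, targets):
    """Return (via_hub, point_to_point) converter counts."""
    # With a hub format, each source converts into it once
    # and each target is produced from it once.
    via_hub = sources + targets
    # Without a hub, every source needs its own path to every target.
    point_to_point = sources * targets
    return via_hub, point_to_point

print(converters_needed(4, 7))  # (11, 28)
```

The gap widens as formats are added, which is the point of "reality is bigger".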
5The basic principle: Descriptive markup
- WPs only see "huge, bold, space before"
- Now find/reformat all chapter headings
- Expensive to apply a house style or look/feel
- Hard to create diverse forms
- Web, paper, and braille publication
- A perfect user could use stylesheets
- But interfaces make inconsistent work easier
- Instead say what kind of portion each is
- A formatter applies rules by kind
6Why should I separate out the formatting?
- It speeds your work
- You can use a stylesheet from someone else, and not have to do any manual formatting
- Typesetter can enhance formatting without risking corrupting your content
- Therefore, less time wasted reviewing galleys
- Multiple formats from the same source
- Print, braille, Web, etc.
- House styles for different journals
- Last-minute changes are safer, cheaper
- Especially crucial for Bible publishing
7Why not just use HTML?
- HTML is nice but lacks
- Units like poem, chapter, verse, inscription
- Ways to annotate for meaning, grammar, etc
- Support for reference systems "Matt 1:1"
- Multi-purpose tags like <b>, <i>, etc.
- Are hard to tease apart when you need to
- HTML limitations encourage using tables to force layout, making re-use infeasible
- And...
8Compare
- <item> <desc>Cashmere sweater</desc> <price unit='yen'>120000</price></item>
  <item> <desc>Socks</desc> <price unit='yen'>1000</price></item>
- versus
- <br>Cashmere sweater, 120000<br>Socks, 1000
9Why is the markup better?
- When relations are marked, an indexer can match price with item
- If not, there is no reliable way
- (there are lots of ways one might guess)
- A search for "Cashmere" and "1000" hits
- Needlessly annoying the searcher
- How many false hits have you had like this?
- => Markup is not just about formatting
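The false hit is easy to reproduce. The sketch below uses Python's standard xml.etree.ElementTree; the wrapping `<items>` element and both function names are assumptions added for illustration (an XML document needs a single root), not part of the slide's example.

```python
import xml.etree.ElementTree as ET

marked_up = """<items>
<item><desc>Cashmere sweater</desc><price unit='yen'>120000</price></item>
<item><desc>Socks</desc><price unit='yen'>1000</price></item>
</items>"""
flat = "<br>Cashmere sweater, 120000<br>Socks, 1000"

def flat_search(text, *terms):
    """Naive search: true if every term occurs anywhere in the text."""
    return all(term in text for term in terms)

def structured_search(xml_text, desc_term, price):
    """Return descriptions of items whose own desc and price both match."""
    root = ET.fromstring(xml_text)
    return [
        item.findtext("desc")
        for item in root.iter("item")
        if desc_term in item.findtext("desc", "") and item.findtext("price") == price
    ]

print(flat_search(flat, "Cashmere", "1000"))             # True: a false hit
print(structured_search(marked_up, "Cashmere", "1000"))  # []
print(structured_search(marked_up, "Socks", "1000"))     # ['Socks']
```

Because each price sits inside the same item element as its description, the structured search can tell which price belongs to which item.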
10How do you spell XML?
- The Extensible Markup Language
- HTML on steroids (sort of)
- Key features
- Intrinsic support for Unicode
- Ability to create your own units
- Ability to validate how they are used
- (no chapters inside footnotes, etc.)
- Very easy for computers to process
- Separates formatting (remember earlier)
11OSIS and XML
- OSIS is an application of XML
- XML specifies the syntax
- OSIS specifies a lexicon for our genre
- Life would be easy if natural languages were that simple!
- There are many other lexica for XML
- Humanities: Text Encoding Initiative
- Closely related to OSIS
12What is OSIS, really?
- OSIS defines
- A set of XML element types
- p, verse, inscription, note, ...
- Certain attributes for those types
- type="devotional"
- A standard form for Biblical references
- A consistent way to write them down
- A way to specify within-verse locations
- A way to refer to editions and translations, or
to refer to a passage generically
13Concept: a hierarchy
[Diagram: tree of nested OSIS elements. Node labels: osis; osisText; header; work osisWork="KJV"; identifier; title; language; div type="book"; div type="chapter"; p; p; verse; verse osisID="Gen.1.3"; verse; text content; note; text content; inscription]
14What's under the covers?
- All of this is represented by inserting markers ("tags") into the text
- Like HTML but more consistent
- All starts and ends are explicit
- Three kinds
- Start tags: <p>
- End tags: </p>
- Empty tags: <milestone/>
- <p>Jesus wept.</p> is an element.
15What else is there?
- Elements can contain other elements
- <div type="chapter"> <verse>In the beginning...</verse> <verse>And the Word...</verse>...</div>
- Many elements can also contain text
- Some elements require or prohibit others
- No <div> inside <abbr>
- An empty tag just marks a point
- <milestone type="pb"/>
16Attributes
- Usually modify a whole element
- Appear only inside start tags
- <name type="nonhuman">Baal</name>
- <div type="chapter"></div>
- <verse osisID="Rev.22.21">
- <q who="God">
- <transChange type="added">
17The full set of (68) tags
- a
- abbr
- actor
- caption
- castGroup
- castItem
- castList
- catchWord
- cell
- closer
- contributor
- coverage
- creator
- date
- description
- div
- divineName
- figure
- foreign
- head
- header
- hi
- identifier
- index
- inscription
- item
- l
- label
- language
- lg
- list
- mentioned
- milestone
- milestoneEnd
- milestoneStart
- name
- note
- osis
- osisText
- p
- publisher
- q
- rdg
- reference
- refSystem
- relation
- revisionDesc
- rights
- role
- roleDesc
- row
- salute
- seg
- signed
- source
- speaker
- speech
- table
- teiHeader
- title
- transChange
- type
- verse
- w
- work
18Don't panic
- A lot of these get used once each, in the header, almost as a ritual
- You can paste a sample header and fill it in
- About a dozen form the Dublin Core set for cataloging and identification info
- Most of the rest fall into nice groups
- The hard parts (later) include
- Milestones
- Quotes when they cross verses/paragraphs
19Three major pieces to OSIS
- The markup elements and their attributes
- Defined by a schema
- The standardized reference system
- Partly defined in the schema
- Partly defined in grammar and prose
- The authority system
- A way to declare formal/normalized names
- Declaration portion still in process
20Basic OSIS markup
21Sample markup
- <div type="testament">
- <div type="book" osisID="Gen">
- <div type="chapter" osisID="Gen.1">
- <verse osisID="Gen.1.1">In the beginning God created the heaven and the earth.</verse>
- <verse osisID="Gen.1.2">And the earth was without form, and void; and darkness was upon the face of the deep. And the Spirit of God moved upon the face of the waters.</verse>
- <verse osisID="Gen.1.3">And God said, Let there be light: and there was light.</verse>
- <verse osisID="Gen.1.31">And God saw every thing that he had made, and, behold, it was very good. And the evening and the morning were the sixth day. <note type="x-StudyNote"> And the evening... Heb. And the evening was, and the morning was etc.</note></verse>
- </div></div></div>
- </osisText></osis>
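Generic XML tools can read markup like this without knowing anything about Bibles. A minimal sketch using Python's standard xml.etree.ElementTree, on a shortened copy of the sample; the opening osis and osisText tags are supplied here so the fragment parses, and the header is left out.

```python
import xml.etree.ElementTree as ET

sample = """<osis><osisText>
<div type="testament">
<div type="book" osisID="Gen">
<div type="chapter" osisID="Gen.1">
<verse osisID="Gen.1.1">In the beginning God created the heaven and the earth.</verse>
<verse osisID="Gen.1.3">And God said, Let there be light: and there was light.</verse>
</div></div></div>
</osisText></osis>"""

root = ET.fromstring(sample)

# Every <div> carries its kind in the type attribute.
div_types = [div.get("type") for div in root.iter("div")]

# Map each verse's osisID to its text.
verses = {v.get("osisID"): v.text for v in root.iter("verse")}

print(div_types)          # ['testament', 'book', 'chapter']
print(verses["Gen.1.3"])  # And God said, Let there be light: and there was light.
```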
22Big generic elements
- div: Testament, book, chap, section
- type: the type of division, as above
- divTitle: optional display title
- title: Title of any div
- list: Genealogies and other lists
- label
- item
- table: Mainly for appendixes, etc.
- row
- cell
23Book/chapter/verse
- Large units all use the <div> element
- It has a type attribute, with values
- appendix
- book
- chapter
- concordance
- glossary
- As with most attributes you can add new values if they start with "x-"
- <div type='x-toronto-thing'>
- We expect to add more div types in time
- <verse osisID="Rev.3.20">
Note: There are no separate tags for testament, book, or chapter
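The "x-" extension rule is simple enough to check mechanically. In this sketch the set of known values holds only the five types listed on this slide, not the schema's full list (so a value such as "testament" would be rejected here), and the function name is hypothetical.

```python
# Only the values shown on the slide; the real schema defines more.
KNOWN_DIV_TYPES = {"appendix", "book", "chapter", "concordance", "glossary"}

def is_acceptable_div_type(value, known=KNOWN_DIV_TYPES):
    """Accept a predefined type, or any extension value starting with 'x-'."""
    return value in known or (value.startswith("x-") and len(value) > 2)

print(is_acceptable_div_type("book"))             # True
print(is_acceptable_div_type("x-toronto-thing"))  # True
print(is_acceptable_div_type("toronto-thing"))    # False
```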
24Small items
- abbr: <abbr expansion="">
- divineName: <divineName>The Lord
- foreign: <foreign lang="">Talitha
- hi: Emphasis in notes/comm
- inscription: Mene, mene, tekel, parsin
- mentioned: The name <mentioned>Peter
- name: Destroyed the <name type="nonhuman">Baals</name>
- p: The ubiquitous paragraph
- q: Quotations (more later)
25Genre-specific elements
- Epistolary: salute, closer
- <closer>I, Paul, sign this with my own hand.</closer>
- Illustrations: figure
- May contain caption, note, index
- Poetry: lg, l
- Also used for other line-oriented text
- lg (line group) can be nested
- Drama: speech, speaker
- speaker ok in: speech cell closer div inscription l p q salute verse
- who attribute can point to a castItem in the header
26Inscription
- <verse osisID="Dan.5.25">This is the inscription that was written: <inscription>Mene, Mene, Tekel, Parsin<note type="">Aramaic UPARSIN (that is, AND PARSIN)</note></inscription></verse>
- How many inscriptions can you think of?
27About the source/target layout
- <milestone>
- Use to mark point events
- page and column breaks of a source manuscript
- Intended screen breaks for display
- Types: column footer header line page screen
- Note: Do not confuse with milestoneStart and milestoneEnd, which stand in for several other elements when they must cross verse/p boundaries in certain ways.
28About the text itself
- transChange: Changed in translation
- Types: added amplified changed deleted moved
- rdg: Variant readings
- Used only within notes (for now)
- <note>Some ancient mss <rdg>kiss the Son</rdg></note>
- seg: (extensions)
- w: word-level linguistics
- Attributes: POS, morph, lemma, gloss, src, xlit
29Attributes of all elements (all are optional)
- Name          Type            Meaning
- osisRef       osisRefType
- annotateWork  anything        I am about W
- annotateType  osisAnnotation  My relation to W
- ews           anything
- ID            xs:ID           For Web to link to
- lang          languageType    language, wr sys
- osisID        osisIDType      reference to here
- resp          anything        responsible person
- splitID       anything        (later)
- type          anything
- subType       anything
- n             anything        name/num of unit
30The reference system
- (I am named, therefore I am)
31Header overview
- Purpose
- Identify the file as an XML file
- Identify the file as using the OSIS schema
- Say whether it's one text or a collection
- Identify and declare names for
- The work itself (title, author, etc)
- Other works referenced
- Verse reference systems used
- Characters in the text <castList>
32Header sample
- <?xml version="1.0" encoding="UTF-8" ?>
- <osis xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="osisCore.1.1.xsd">
- <osisText osisIDWork="KJV" osisRefWork="defaultReferenceScheme">
- <header>
- <work osisWork="KJV">
- <title>King James Version of 1769</title>
- <identifier type="OSIS">KJV</identifier>
- <language>en</language>
- <refSystem>Bible.KJV</refSystem></work>
- <work osisWork="defaultReferenceScheme">
- <refSystem>Bible.KJV</refSystem></work>
- </header>
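As a rough illustration of how software might read such a header, the sketch below collects each declared work and its child elements using Python's standard xml.etree.ElementTree. The closing osisText and osis tags are added so the fragment parses on its own; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

header_doc = """<?xml version="1.0" encoding="UTF-8" ?>
<osis xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="osisCore.1.1.xsd">
<osisText osisIDWork="KJV" osisRefWork="defaultReferenceScheme">
<header>
<work osisWork="KJV">
<title>King James Version of 1769</title>
<identifier type="OSIS">KJV</identifier>
<language>en</language>
<refSystem>Bible.KJV</refSystem></work>
<work osisWork="defaultReferenceScheme">
<refSystem>Bible.KJV</refSystem></work>
</header>
</osisText>
</osis>"""

# Parse from bytes so the encoding declaration is honored.
root = ET.fromstring(header_doc.encode("utf-8"))
osis_text = root.find("osisText")

# One dictionary per declared work: child element name -> text.
works = {
    work.get("osisWork"): {child.tag: child.text for child in work}
    for work in osis_text.iter("work")
}

print(osis_text.get("osisIDWork"))      # KJV
print(works["KJV"]["title"])            # King James Version of 1769
print(works["defaultReferenceScheme"])  # {'refSystem': 'Bible.KJV'}
```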
33Other header elements
- osisCorpus
- Use inside <osis> when there will be several texts in one document, as for a polyglot
- osisCorpus can have its own header
- osisCorpus then contains osisText elements
- teiHeader
- Allows including a fuller TEI-style header
- Work uses the standard "Dublin Core" tags to give catalog/bibliography info
34Dublin Core
- title: The title of the work or collection
- creator: The primary author
- contributor: Other contributors (set 'role')
- identifier: ISBN or similar unique ID of work
- date: Publication date
- language: Primary language of the work
- rights: Statement of permissions/rights
- publisher: Name of the publisher
- description: An abstract or precis of the work
- format: What representation (OSIS)
- coverage: Intended audience and scope
- relation
- source: If derived from another work
- subject: LCSH or similar subject descr
- type
- refSystem: (OSIS only, not in D.C.)
35Identifying parts of the work
- osisID must be specified on any element that has a canonical reference
- <verse osisID="Luk.3.10">
- <p osisID="Rev.3.20">
- <div type="chapter" osisID="Luk.3">
- 3-letter book names, periods to separate
- HTML <a name=""> available as well
- More useful in notes/commentary, not Bible
- Back-of-book index entries
- <index level1="Idols" level2="burning of" level3="by Hezekiah">
- <index level1="False gods" see="Idols">
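Because the parts of an osisID are separated by periods, it splits apart with almost no code. A minimal sketch, assuming plain book.chapter.verse identifiers with numeric chapters and verses, no work prefix and no within-verse location; the function name is hypothetical. It also accepts several space-separated IDs in one attribute value.

```python
def parse_osis_ids(value):
    """Split an osisID attribute value into (book, chapter, verse) tuples."""
    refs = []
    for osis_id in value.split():          # several IDs may share one attribute
        parts = osis_id.split(".")
        book = parts[0]
        chapter = int(parts[1]) if len(parts) > 1 else None
        verse = int(parts[2]) if len(parts) > 2 else None
        refs.append((book, chapter, verse))
    return refs

print(parse_osis_ids("Luk.3.10"))     # [('Luk', 3, 10)]
print(parse_osis_ids("Luk.3"))        # [('Luk', 3, None)]
print(parse_osis_ids("Rev.3.20 Rev.3.21"))
```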
36When it won't come out even
- If several verses are translated as (say) a p
- Put all the appropriate osisIDs on the p
- <p osisID="Matt.1.1 Matt.1.2">
- If a verse is split across paragraphs
- Tag each part; use splitID to number them
- <p><verse osisID="1Pe.1.3" splitID="1"></verse></p> <p><verse osisID="1Pe.1.3" splitID="2"></verse></p>
- milestoneStart, milestoneEnd
- Used to mark units that cross boundaries
- abbr closer div foreign l lg q salute seg signed speech verse
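A reader program that wants whole verses back can regroup the pieces by osisID and order them by splitID. This is a sketch only: it assumes splitID values are integers, the verse text is placeholder wording rather than a real translation, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

doc = """<div type="chapter" osisID="1Pe.1">
<p><verse osisID="1Pe.1.3" splitID="1">first part, </verse></p>
<p><verse osisID="1Pe.1.3" splitID="2">second part</verse></p>
</div>"""

def join_split_verses(root):
    """Rebuild each verse's text from its splitID-numbered pieces."""
    parts = defaultdict(list)
    for verse in root.iter("verse"):
        order = int(verse.get("splitID", "1"))   # unsplit verses count as part 1
        parts[verse.get("osisID")].append((order, verse.text or ""))
    return {
        osis_id: "".join(text for _, text in sorted(pieces))
        for osis_id, pieces in parts.items()
    }

print(join_split_verses(ET.fromstring(doc)))
# {'1Pe.1.3': 'first part, second part'}
```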
37References
- Reference to other places/works
- <note>See also <reference osisRef="Mat.1.1">Matthew</reference> for a similar theme.</note>
- div, figure, note, and reference can also directly refer
- <div type="commentary" osisRef="Luk.3.10">
- This identifies the passage this commentary div is about.
- HTML <a href=""> also available
- (more useful in notes/commentary, not Bible)
38Reference syntax
[Diagram: the reference below, with callouts labeling its parts: refsystem, edition, book, chapter, verse, grain type, grain value, 'code point', character]
NIV.Heb:Psa.42.1-Psa.43.12@cp:12
39Notes
- Notes are placed right where they are referenced in the text.
- Notes have several types
- allusion alternative background citation devotional exegesis explanation study translation enumeration variant
- Additional types must start with "x-"
- catchWord -- marks referenced text cited within a note
- <note><catchWord>hello</catchWord> may also be translated "goodbye" here.</note>
- rdg -- marks alternate readings
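Since notes sit inline, software that wants only the running text has to step around them. A minimal sketch with Python's standard xml.etree.ElementTree, using a shortened form of the Gen.1.31 sample; the helper name is hypothetical.

```python
import xml.etree.ElementTree as ET

verse = ET.fromstring(
    '<verse osisID="Gen.1.31">And the evening and the morning were the sixth day.'
    '<note type="x-StudyNote">Heb. And the evening was, and the morning was etc.</note>'
    '</verse>'
)

def text_without_notes(elem):
    """Concatenate an element's text, skipping the content of any <note>."""
    pieces = [elem.text or ""]
    for child in elem:
        if child.tag != "note":
            pieces.append(text_without_notes(child))
        pieces.append(child.tail or "")   # text after the child still belongs to elem
    return "".join(pieces)

# Collect the notes separately, with their types.
notes = [(n.get("type"), "".join(n.itertext())) for n in verse.iter("note")]

print(text_without_notes(verse))
print(notes)
```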
40On to the authority system
- The name is the thing, and the true name is the
true thing. To know the name is to control the
thing. -- Ursula LeGuin
41Cast-lists
- To declare cast of characters
- Provides a formal ID for each
- Can refer to ID from <speaker>, <q>, etc.
- castList
- castGroup
- castItem
- actor
- role
- roleDesc
42The authority system
- Only supported for castList at present
- We intend to provide
- A schema for declaring sets of formal names
- A way to invoke such lists in documents
- Standard name sets for
- Bible versions
- Versification schemes
- People, places, etc. in the Bible
- Journals, classical literature, and other works
commonly cited in Biblical studies
43OSIS in practice
- Tourist to police officer: Can you tell me how to get to Carnegie Hall?
- Officer to tourist: Practice, practice, practice.
44How do I know if the markup is correct?
- 5 levels of 'correct'
- SLipshod
- Only well-formed
- Valid
- Accurate
- Complete
- SL: no check required
- O: Load in IE 5
- V: xp, xmetal, and other true validators
- A: requires human proofreading and interpretation
- C: there is always more that could be marked up
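The "only well-formed" level can also be checked from a script. The sketch below uses Python's standard xml.etree.ElementTree parser, which rejects mismatched or unclosed tags; it does not check validity against the OSIS schema, which needs a schema-aware validator. The function name is illustrative.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

good = '<div type="chapter"><verse>In the beginning...</verse></div>'
bad = '<div type="chapter"><verse>In the beginning...</verse></chapter>'

print(is_well_formed(good))  # True
print(is_well_formed(bad))   # False: the end tag does not match the start tag
```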
45Tools vs. today
- Today we will use the raw form
- Experts will need to know this
- Users should have protective software
- Some XML editing programs
- SoftQuad XMetal -- 300
- Open Office -- free, very promising
- Some generic-enough HTML editors
- BBEdit, emacs, Netscape Communicator
46Getting to OSIS
- The cleaner your data, the easier it is
- Data is seldom as clean as you think it is
- Structured formats (USFM, XSEM, LGM, ThML) are the easiest sources
- Tools
- Perl/awk/sed/cc and the like
- XSLT if coming from XML
- BTG has sponsored development of several convertors.
- BTG will maintain a repository of utilities
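For clean, line-oriented data a convertor can be very small. The sketch below is in Python rather than Perl or awk, and assumes a made-up input format of one verse per line ("Book chapter:verse text"); real sources are seldom this tidy, and the function name is hypothetical.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Hypothetical input format: "Gen 1:3 And God said, ..."
LINE = re.compile(r"^(\w+) (\d+):(\d+) (.*)$")

def line_to_verse(line):
    """Convert one 'Book c:v text' line into an OSIS-style verse element."""
    match = LINE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized line: {line!r}")
    book, chapter, verse, text = match.groups()
    osis_id = f"{book}.{chapter}.{verse}"
    # escape() protects &, < and > in the text; quoteattr() quotes the attribute.
    return f"<verse osisID={quoteattr(osis_id)}>{escape(text)}</verse>"

print(line_to_verse("Gen 1:3 And God said, Let there be light: and there was light."))
```

Raising an error on unrecognized lines, instead of skipping them, helps surface the places where the data is less clean than expected.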
47Getting your OSIS XML to display in IE
- Make sure the document is at least well-formed (WF)
- Name it filename.xml
- Refer to a stylesheet if you want formatting instead of just an outline view
- <?xml version="1.0"?>
  <!DOCTYPE osis>
  <?xml-stylesheet href="mystyle.css" type="text/css"?>
  <osis xmlns="http://www.bibletechnologies.org/namespaces/OSIS-1.1">
  <header>
48Getting your OSIS printed
- Most typesetting programs now import XML
- OSIS converts easily to most relevant XML schemas, using XSLT
- Word processors are also gaining ability to import arbitrary XML
- Typesetting firms, esp. for journals, are starting to accept XML as well.
49Near-term concerns of OSIS
- Linguistic annotation
- Formal name lists for people, places, translations, etc.
- Connecting text to multimedia
- Greater support for secondary genres
- Tool development and conformance
50How you can help
- Find the best place to apply OSIS in your organization, and do it.
- Join a Working Group
- Send feedback, feature requests, etc.
- Join a Working Group
- Convert or create OSIS texts
- Join a Working Group
- Create a converter for your current format
- Join a Working Group
- Tell your friends and colleagues
- Join a Working Group
51For more information
- Web
- http://www.bibletechnologies.org
- http://www.bibletechnologieswg.org
- Some contacts
- Steve DeRose: sderose@acm.org
- Kees de Blois: deblois@zeelandnet.nl
- Patrick Durusau: pdurusau@emory.edu
- Kirk Lowery: klowery@wts.edu
- Mike Perez: MPerez@forministry.com