Title: Genericity and flexibility in the NIR schema
1. Genericity and flexibility in the NIR schema
- Fabio Vitali, University of Bologna
2. Genericity
- Defined as the capacity of a document structure to correctly deal with an open set of document types,
  - adapting when possible,
  - extending when not possible
- Ideally, a generic document structure allows the correct and precise description of document types that did not exist when the structure was invented
- Genericity does not imply a lack of specificity, but the existence of a number of tricks to invent specificity when none existed in advance
3. Flexibility
- Defined as the capacity of a document structure to correctly deal with an open set of uses of the documents,
  - describing when possible,
  - deducing when not possible
- Ideally, a flexible document structure allows the correct and precise application of documents in situations and for uses that did not exist when the document was marked up.
- Flexibility does not mean a lack of precise support for specific features, but the existence of tricks to invent precise support for specific features when none existed in advance
4. The size of the problem
- Italian citizens are affected by thousands of norms expressed at three different levels
  - Local documents (organization, municipality, region)
  - National (laws, decrees, etc.)
  - International (EU, treaties, etc.)
- These comprise more than 200 different types of documents, from Legge to Bando del Duce del Fascismo to Direttiva della Comunità Europea, etc.
- Each has a different name, use and degree of success, but most share the same overall structure.
- We have identified three structures
  - Strictly hierarchical (documento articolato)
  - Partially hierarchical (documento semiarticolato)
  - Without any visible structure (documento NIR)
5. Shared and peculiar vocabulary
- One of the issues we have to face is that, although the overall structure is the same, names and order of containment may vary concretely and frequently.
- E.g.
  - Articolo, article, 1.
  - Part, title, book and head appear in different orders in different types of documents
  - Section only appears in some documents
- Solutions
  - Lack of prescription (plenty for editors, none for authors)
  - Abundance of description (72 elements for norms, 132 metadata elements, 30 XHTML elements for exceptions)
  - Generic elements (16 elements)
6. Applications
- This is an incomplete list of applications (i.e., uses) for a NIR document
  - Sophisticated on-screen display (open with respect to devices, operating systems, browsers, skins, additional information, user wishes)
  - Sophisticated print on paper (ditto)
  - Support for references (through hypertext links)
  - Reflection (providing management information about itself even outside of document management systems)
  - Support for consolidation (e.g., answering questions such as "What was the enacted text on 2002/03/05?" as well as "Which modification norm caused the existence of this fragment, and when?", etc.)
  - Sophisticated search (e.g., all laws signed by Berlusconi)
  - Support for provisions (i.e., formal description of types and arguments of individual norm snippets),
    - from authoritative sources, professionals and amateurs.
  - Etc.
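
The two consolidation questions (what was the enacted text on a given day, and which modifying norm produced a fragment and when) can be read as a lookup over dated versions of that fragment. The following is a minimal Python sketch under that reading, not a NIR mechanism: the `Version` record, its field names and the sample history are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Version:
    """One dated state of a text fragment (hypothetical record, not a NIR structure)."""
    in_force_from: date  # day on which this wording became the enacted text
    text: str            # wording of the fragment
    modified_by: str     # identifier of the norm that introduced this wording


def version_on(history: List[Version], day: date) -> Optional[Version]:
    """Return the version in force on `day`, or None if the fragment did not exist yet."""
    candidates = [v for v in history if v.in_force_from <= day]
    # The latest version that had already entered into force wins.
    return max(candidates, key=lambda v: v.in_force_from, default=None)


# Invented sample history for a single fragment.
history = [
    Version(date(2001, 1, 15), "Original wording.", "original-act"),
    Version(date(2002, 6, 1), "Amended wording.", "modifying-act-1"),
]

enacted = version_on(history, date(2002, 3, 5))
```

Here `enacted.text` answers the first question and `enacted.modified_by` together with `enacted.in_force_from` answers the second.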
7. Design issues for NIR (1)
- Data structure rather than application
  - Norme In Rete knows about applications, but is not dependent on any use of the data and is not targeted towards any specific application (except presentation)
- Rigorous distinction of roles
  - The author of a norm is the legislator; the provider of the actual XML document is the editor. The legislator is GOD (his decisions cannot be discussed), but He only speaks through the text of the norms.
  - The editor can add a large quantity of information, which has no official status. The very act of adding tags is an editorial operation, subjective and open to discussion.
  - In fact, any addition coming from editors (structure identification, notes, comments, interpretation) happens outside of the document content (in markup structures or in special metadata sections)
  - Nonetheless, the editor can and must provide markup that is as precise and specific as possible
8. Design issues for NIR (2)
- Complexity of the access to texts
  - Many editors, many publishing systems, many copies in different stages of evolution
  - There is no authoritative source of XML documents (only of printed documents).
  - A web site could fail to update a law to the latest version
  - The use of URNs makes it possible to refer to the text of a law without identifying a single existing authoritative source.
- Support for description and prescription
  - Tagging of existing texts can only be descriptive (supporting any possible mess that the legislator may have put in)
  - Support for legal drafting can be provided, suggesting or enforcing legal drafting rules during writing.
9. Design issues for NIR (3)
- Everything has a reliable name
  - Every legal structure needs to be referenceable and accessible.
  - References need to be unambiguous, universal, definitive.
  - URNs for whole documents,
  - Mandatory id attributes for substructures and spans
  - XPointers for even smaller entities.
- Specific support for multiple interpretations
  - Disposizioni (law provisions) can be identified and specified on the text.
  - Multiple different interpretations of the same text must be allowed
  - So they can be placed outside of the main document.
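
Reliable names are what make external interpretations workable: if every fragment carries an id, several editors can attach different readings to the same fragment without touching the document. The sketch below illustrates the idea in Python with the standard `xml.etree.ElementTree` module. The element names, the ids, the URN-like string, the `#` fragment convention and the interpretation records are all invented for illustration and are not taken from the NIR schema.

```python
import xml.etree.ElementTree as ET

# Invented document: element names, ids and the URN-like string are illustrative only.
DOCUMENT = """
<law urn="urn:example:law:2000-01-01;1">
  <article id="art1">
    <clause id="art1-com1">First clause.</clause>
    <clause id="art1-com2">Second clause.</clause>
  </article>
</law>
"""

# Interpretations live outside the document and point at ids inside it.
# Two editors may disagree about the same fragment.
INTERPRETATIONS = [
    {"editor": "editor-a", "target": "art1-com1", "provision": "obligation"},
    {"editor": "editor-b", "target": "art1-com1", "provision": "definition"},
    {"editor": "editor-a", "target": "art1-com2", "provision": "permission"},
]


def index_by_id(root):
    """Map every id attribute to the element that carries it."""
    return {el.get("id"): el for el in root.iter() if el.get("id") is not None}


def reference(root, fragment_id):
    """Build a document-plus-fragment reference string (assumed '#' convention)."""
    return f"{root.get('urn')}#{fragment_id}"


def interpretations_of(fragment_id):
    """All stand-off interpretations attached to one fragment."""
    return [i for i in INTERPRETATIONS if i["target"] == fragment_id]


root = ET.fromstring(DOCUMENT)
ids = index_by_id(root)
```

Because the interpretations only store a target id, the document text stays exactly as the legislator wrote it, and a dangling interpretation can be detected by checking its target against `ids`.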
10. Design issues for NIR (6)
- Clean separation between objective properties and interpretation
  - Objective properties can be marked by low-level editors, while interpretation requires experts and high-level editors.
  - Objective (manifest) properties include identification of boundaries (articles, clauses, etc.) and official facts about texts (publication dates, etc.)
  - Interpretation includes identification of dates, identification of the normative content of the texts (provisions), application of modifications.
  - Objective properties need to be added when marking up the document rather than later on (which is more expensive). Subjective properties can be added at any time.
11. Flexibility through variability
- High description, low prescription level
  - Mostly no constraint on the content, much more on the metacontent
- Systematic extensions for local purposes
- Clear distinction between:
  - Mandatory structures (very few)
  - Recommended structures
  - Optional structures
- Extension model
12. Generic elements
- Most NormeInRete elements are organized into four categories (or, rather, design patterns)
  - Containers (hierarchies and separators)
  - Blocks (containers of text with vertical arrangement)
  - Inline (containers of text with horizontal arrangement)
  - Properties (atomic values outside of the main document flow, e.g., metadata, signatures, etc.)
- Each category also provides a generic element, which can be used whenever there is no specific element of that category to be used. The name attribute makes it possible to provide detail about the additional element
  - <nir:part> = <nir:container name="part">
  - <nir:block type="foobar"> = <nir:foobar> with block content model
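
The correspondence between a specific element and the generic element of its category carrying a name attribute (for instance a `part` element and a `container` element named "part") can be made concrete with a small rewriting function. This is a rough Python sketch using the standard `xml.etree.ElementTree` module; the `CATEGORY_OF` table is a tiny assumed excerpt, namespaces are left out for brevity, and none of it is taken from actual NIR tooling.

```python
import xml.etree.ElementTree as ET

# Assumed, partial mapping from specific element names to the generic element
# of their category; the real vocabulary is much larger.
CATEGORY_OF = {
    "part": "container",
    "article": "container",
}


def to_generic(element):
    """Rewrite specific elements, in place, as generic element plus name attribute."""
    for el in element.iter():
        if el.tag in CATEGORY_OF:
            el.set("name", el.tag)
            el.tag = CATEGORY_OF[el.tag]
    return element


def same_markup(a, b):
    """Structural equality: tag, attributes, text and children, recursively."""
    return (
        a.tag == b.tag
        and a.attrib == b.attrib
        and (a.text or "").strip() == (b.text or "").strip()
        and len(a) == len(b)
        and all(same_markup(x, y) for x, y in zip(a, b))
    )


specific = ET.fromstring("<part><article>Text</article></part>")
generic = ET.fromstring(
    '<container name="part"><container name="article">Text</container></container>'
)
```

Calling `same_markup(to_generic(specific), generic)` compares the two spellings after normalization, which is one way an application could treat them as the same markup.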
13. Genericity and description
- The risk is of using generic elements while ignoring description completely.
- Wrong
  - Description costs little
  - Description allows identification (given two containers without a name attribute, are they the same type of element, or are they two different containers?)
  - Description does not have to be subjected to policies, but if any exist, they can be enforced.
- In NIR the name attribute is mandatory for all generic elements, and an equivalence exists between a named element and the generic element of the same category with the appropriate name attribute
14. The Schemas for NIR documents
- 3 different DTDs
  - Strict rules (prescriptive: legal drafting)
  - Loose rules (descriptive: existing norms)
  - Light rules (support for the most common cases: simple, everyday norms)
- They are intercompatible
  - The vocabulary is exactly the same
  - All light documents are also loose
  - All strict documents are also loose
15. The overall structure of the NIR DTD
[Diagram of the DTD files: light.dtd, strict.dtd, loose.dtd; globals.dtd, text.dtd, norms.dtd, meta.dtd, proprietary.dtd; ISO entity sets isodia.pen, isogrk3.pen, isolat1.pen, isolat2.pen, isonum.pen, isopub.pen, isotech.pen]
16. An example of descriptive markup
- NIR Complete
  - <!ELEMENT book (num, heading?, (part | title | head | article)+)>
- NIR Flexible
  - <!ELEMENT book (num?, heading?, (part | title | head | section | paragraph | article | partition | container)+)>
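
One way to see why a document valid under the stricter content model remains valid under the looser one is to treat each content model as a regular expression over the sequence of child element names. The Python sketch below does this for the two `book` declarations. It assumes that the grouped choice of children repeats one or more times, which the slide text does not state explicitly; the helper names `model` and `valid` are invented for illustration.

```python
import re


def model(required_num, children):
    """Compile a content model of the shape (num, heading?, (a|b|...)+) into a
    regex over space-terminated child names. `required_num` selects num vs num?."""
    num = "num " if required_num else "(?:num )?"
    choice = "|".join(re.escape(c) for c in children)
    return re.compile(rf"{num}(?:heading )?(?:(?:{choice}) )+")


# Assumed reading of the two declarations for `book`.
COMPLETE = model(True, ["part", "title", "head", "article"])
FLEXIBLE = model(False, ["part", "title", "head", "section", "paragraph",
                         "article", "partition", "container"])


def valid(content_model, children):
    """True if the sequence of child element names matches the content model."""
    return content_model.fullmatch("".join(c + " " for c in children)) is not None


strict_doc = ["num", "heading", "part", "part"]
loose_doc = ["heading", "section", "article"]
```

With these definitions `strict_doc` is accepted by both models, while `loose_doc` (no number, a section among the children) is accepted only by the flexible one.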
17. An example of generic markup
- <!ELEMENT hierarchy (l1 | l2 | l3 | l4 | l5 | l6)+>
- <!ELEMENT l1 (num?, tit?, (block | l2 | l3 | l4 | l5 | l6)+)>
- <!ELEMENT l2 (num?, tit?, (block | l3 | l4 | l5 | l6)+)>
- <!ELEMENT l3 (num?, tit?, (block | l4 | l5 | l6)+)>
- <!ELEMENT l4 (num?, tit?, (block | l5 | l6)+)>
- <!ELEMENT l5 (num?, tit?, (block | l6)+)>
- <!ELEMENT l6 (num?, tit?, (block)+)>
- All generic elements have a mandatory name attribute.
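
The generic hierarchy encodes one rule: a level may contain blocks and deeper levels, never an equal or shallower one. A short Python sketch of a checker for that rule, which also reports generic levels lacking the mandatory name attribute, is given below. It uses the standard `xml.etree.ElementTree` module; the function name, the message wording and the sample documents are invented for illustration, and only the `l1` to `l6` elements are checked.

```python
import re
import xml.etree.ElementTree as ET

LEVEL = re.compile(r"l([1-6])")


def nesting_errors(element, parent_level=0):
    """Return messages for levels nested inside an equal or deeper level,
    and for generic levels that lack a name attribute."""
    errors = []
    for child in element:
        match = LEVEL.fullmatch(child.tag)
        if match is None:
            continue  # num, tit, block, ...: not a hierarchy level
        level = int(match.group(1))
        if level <= parent_level:
            errors.append(f"l{level} inside l{parent_level}")
        if child.get("name") is None:
            errors.append(f"l{level} without name attribute")
        errors += nesting_errors(child, level)
    return errors


good = ET.fromstring(
    '<hierarchy><l1 name="book"><num>1</num>'
    '<l3 name="article"><block name="text">...</block></l3></l1></hierarchy>'
)
bad = ET.fromstring(
    '<hierarchy><l2 name="part"><l1 name="book"><block name="text">...</block>'
    '</l1><l3><block name="text">...</block></l3></l2></hierarchy>'
)
```

Skipping a level, as `good` does by placing an `l3` directly inside an `l1`, is accepted, since the content models list every deeper level as a possible child.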
18. Examples of extensibility
- Editor footnotes
- Editor inline notes
- Global vs. proprietary metadata
- Additional arguments to provisions
- XHTML elements for typographical properties without semantic justification (e.g., bold for emphasis)
- <span class="foobar"> and <inline name="foobar"> for specifying the inline element foobar
- <div class="foobar"> and <block name="foobar"> for specifying the block element foobar
19. Conclusions
- Genericity and flexibility do not need to come at the expense of detail and appropriate description.
- The temptation to choose the easiest road is strong.
- It must be resisted. Markup time is the best time to record all available information about a document, information that in the future could be hard to find or to associate with the content
- The document should come out of the markup session as complete and as rich in information as possible.
- The genericity and flexibility mechanisms of Norme In Rete should help in that.