Schemas for XML - PowerPoint PPT Presentation

1
Schemas for XML
  • By Norman Walsh
  • From http://www.xml.com/pub/1999/07/schemas/index.html

2
Introduction
  • Schemas will have a broad impact on the future of
    XML for two reasons:
  • first, because they will define what it means for
    an XML document to be valid, and
  • second, because they are a radical departure from
    Document Type Definitions (DTDs), the existing
    schema mechanism inherited from SGML.

3
What is a Schema?
  • A schema is a model for describing the structure
    of information.
  • A term borrowed from the database world to
    describe the structure of data in relational
    tables.
  • In the context of XML, a schema describes a model
    for a whole class of documents.
  • The model describes the possible arrangement of
    tags and text in a valid document.
  • A schema might also be viewed as an agreement on
    a common vocabulary for a particular application
    that involves exchanging documents.

4
What is a Schema?
  • Is this a valid postal address?

<address>
  <name>Namron H. Slaw</name>
  <street>256 Eight Bit Lane</street>
  <city>East Yahoo</city>
  <state>MA</state>
  <zip>12481-6326</zip>
</address>
  • Mentally, you compare the address with a schema
    that you have in your head for addresses.

5
What is a Schema?
  • In schemas, models are described in terms of
    constraints.
  • A constraint defines what can appear in any given
    context.
  • There are two kinds of constraints that you can give:
  • content model constraints describe the order and
    sequence of elements, and
  • datatype constraints describe valid units of data.

6
What is a Schema?
  • For example, a schema might describe a valid
    <address> with the content model constraint that:
  • it consists of a <name> element, followed by
  • one or more <street> elements, followed by
  • exactly one <city>, <state>, and <zip> element.
  • The content of a <zip> might have a further
    datatype constraint that it consist of either a
    sequence of exactly five digits or a sequence of
    five digits, followed by a hyphen, followed by a
    sequence of exactly four digits. No other text is
    a valid ZIP code.
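
The two kinds of constraint above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how a schema processor works: the function name is_valid_address and the trick of flattening child tags into a string are choices made here, and only the standard library is used.

```python
import re
import xml.etree.ElementTree as ET

# Content model: name, one or more street, then city, state, zip (in order).
CONTENT_MODEL = re.compile(r"name (street )+city state zip ")
# Datatype: five digits, optionally followed by a hyphen and four digits.
ZIP_CODE = re.compile(r"[0-9]{5}(-[0-9]{4})?")


def is_valid_address(xml_text: str) -> bool:
    """Return True if xml_text satisfies both informal constraints."""
    try:
        address = ET.fromstring(xml_text)
    except ET.ParseError:
        return False  # not even well-formed
    if address.tag != "address":
        return False
    # Flatten the child tags into a string so a regex can check their order.
    child_tags = "".join(child.tag + " " for child in address)
    if not CONTENT_MODEL.fullmatch(child_tags):
        return False
    zip_text = (address.find("zip").text or "").strip()
    return ZIP_CODE.fullmatch(zip_text) is not None


GOOD = """<address><name>Namron H. Slaw</name>
<street>256 Eight Bit Lane</street><city>East Yahoo</city>
<state>MA</state><zip>12481-6326</zip></address>"""

# Two <state> elements and a non-numeric <zip>: both constraints are violated.
BAD = """<address><name>Namron H. Slaw</name>
<street>256 Eight Bit Lane</street><city>East Yahoo</city>
<state>MA</state><state>CT</state><zip>blue</zip></address>"""

print(is_valid_address(GOOD))  # True
print(is_valid_address(BAD))   # False
```

The content model and the datatype are checked separately, which mirrors the distinction drawn above: the first looks only at which elements appear and in what order, the second only at the text inside one element.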

7
What is a Schema?
  • The purpose of a schema is to allow machine
    validation of document structure.
  • The following is not valid according to the
    informal schema.

<address>
  <name>Namron H. Slaw</name>
  <street>256 Eight Bit Lane</street>
  <city>East Yahoo</city>
  <state>MA</state>
  <state>CT</state>
  <zip>blue</zip>
</address>
8
What is a Schema?
  • The ability to test the validity of documents is
    going to be an important aspect of large web
    applications that are receiving and sending
    information to and from lots of sources.
  • If you're receiving XML transactions over the
    web, you don't want to process the content into
    your database if it's not in the proper schema.
  • The earlier and more easily you can catch this sort
    of error, the better off you will be.

9
Limitations of DTD
  • XML inherited DTDs from SGML.
  • DTDs can be used to define content models (the
    valid order and nesting of elements) and, to a
    limited extent, the datatypes of attributes, but
    they have a number of obvious limitations:
  • different (non-XML) syntax
  • no support for namespaces
  • extremely limited datatyping
  • a complex and fragile extension mechanism based
    on little more than string substitution

10
Limitations of DTD
  • The worst thing about the DTD extension mechanism
    (parameter entities) is that it does not really
    make relationships explicit.
  • Two elements defined to have the same content
    models are not the same thing in any explicit
    way.
  • Likewise, a group of attributes defined as a
    parameter entity and reused are not logically a
    group, they're just "coincidentally" a group.

11
Limitations of DTD
  • XML Schemas overcome these limitations and are
    much more expressive than DTDs.
  • The additional expressiveness will allow web
    applications to exchange XML data much more
    robustly without relying on ad hoc validation
    tools.

12
Limitations of DTD
  • In the short term, DTDs still have a number of
    advantages:
  • Widespread tools support. All SGML tools and many
    XML tools can process DTDs.
  • Widespread deployment. A large number of document
    types are already defined using DTDs: HTML,
    XHTML, DocBook, TEI, J2008, CALS, etc.
  • Widespread expertise and many years of practical
    application.

13
Features of Schema
  • Richer datatypes:
  • booleans, numbers, dates and times, URIs,
    integers, decimal numbers, real numbers,
    intervals of time, etc.
  • In addition to these simple, predefined types,
    there will be facilities for creating other types
    and aggregate types.
  • User-defined types:
  • define your own named datatype.
  • For example, you might define a "PostalAddress"
    datatype and then define two elements,
    "ShippingAddress" and "BillingAddress" to be of
    that type.

14
Features of Schema
  • Attribute grouping.
  • It's not uncommon to have several attributes that
    "go together". Attribute grouping allows the
    schema author to make this relationship explicit.
  • In DTDs, the grouping can be achieved with a
    parameter entity, simplifying the process of
    authoring a DTD, but the information is not
    passed on to the processor.
  • Refinable archetypes, or "inheritance".
  • DTD content models are closed.
  • Open content models are at the other extreme.
  • Refinable content models are in the middle.

15
Features of Schema
  • Namespace support.
  • Since the introduction of Namespaces in XML,
    validation has become much more difficult.
  • In fact, until the XML Schema work is completed,
    it just is not practical to validate documents
    that use namespaces.

16
Validity
  • Reasons why you need to validate documents:
  • You're doing electronic commerce and you want to
    know that the purchase order you just received is
    exactly what you expect.
  • (B2B) If you receive a record from your partner's
    database via XML, you want to be sure that it's
    valid before you hand it off to the conversion
    tool that will insert it into your database.
  • The XML document you're constructing is going to
    control some overnight batch process and you want
    to make sure that the instructions you're sending
    are ones the processor is going to understand.
  • You have 1,000 XML documents that you want
    to publish on a CD-ROM. You want to be confident
    that your stylesheet will present each of them
    correctly without proofing each and every one by
    hand.

17
Validity
  • Using a schema and a validating parser offers one
    standard way to test your documents.
  • Valid documents can still be semantically wrong:
  • you can submit a purchase order that asks for a
    hundred boxes of staples when you meant to ask
    for ten, but checking validity catches a lot of
    "obvious" errors.

18
Validity
  • Every document can be classified in one of four
    ways:
  • If it is not well-formed, it is not XML.
  • If an XML document does not identify a schema to
    which it claims to conform, then it is simply
    well-formed.
  • If a schema is associated with a document, and
    the document does not fit within the model
    described by that schema, it is well-formed but
    not valid.
  • If a schema is associated with a document, and
    the document does not violate any of the
    constraints of that schema, it is well-formed and
    valid.
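
The four cases can be read as a small decision procedure. The sketch below assumes a schema can be represented as a function that accepts a parsed document and returns True or False; classify and toy_schema are hypothetical names introduced here, and toy_schema is a deliberately tiny stand-in, not a real schema.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Optional


def classify(xml_text: str,
             schema_check: Optional[Callable[[ET.Element], bool]] = None) -> str:
    """Place a document in one of the four categories."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return "not XML"  # not well-formed
    if schema_check is None:
        return "well-formed"  # no schema is associated with the document
    if schema_check(root):
        return "well-formed and valid"
    return "well-formed but not valid"


def toy_schema(root: ET.Element) -> bool:
    """Hypothetical schema: the root is <address> and it has a <zip> child."""
    return root.tag == "address" and root.find("zip") is not None


print(classify("<address><zip>12481</zip>"))                         # not XML
print(classify("<address><zip>12481</zip></address>"))               # well-formed
print(classify("<address><city>X</city></address>", toy_schema))     # well-formed but not valid
print(classify("<address><zip>12481</zip></address>", toy_schema))   # well-formed and valid
```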

19
Content Model Validity
  • Content model validity tests whether the order
    and nesting of tags is correct.
  • In XML Schema syntax, the content model of an
    address could be described like this:

<elementType name="address">
  <sequence>
    <elementTypeRef name="name" minOccur="1" maxOccur="1"/>
    <elementTypeRef name="street" minOccur="1" maxOccur="2"/>
    <elementTypeRef name="city" minOccur="1" maxOccur="1"/>
    <elementTypeRef name="state" minOccur="1" maxOccur="1"/>
    <elementTypeRef name="zip" minOccur="1" maxOccur="1"/>
    <elementTypeRef name="country" minOccur="0" maxOccur="1"/>
  </sequence>
</elementType>
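
To make the minOccur and maxOccur bounds concrete, here is a rough Python sketch of how a sequence like the one above could be checked against a document. It is an assumption-laden illustration, not the algorithm of any schema processor: the model is hand-copied into a list of tuples, and the greedy matching only works because no two neighbouring entries share a name.

```python
import xml.etree.ElementTree as ET

# (element name, minOccur, maxOccur), in the order given on the slide.
ADDRESS_MODEL = [
    ("name", 1, 1),
    ("street", 1, 2),
    ("city", 1, 1),
    ("state", 1, 1),
    ("zip", 1, 1),
    ("country", 0, 1),
]


def matches_sequence(element, model):
    """Check the element's children against an ordered list of bounds."""
    tags = [child.tag for child in element]
    pos = 0
    for name, min_occur, max_occur in model:
        count = 0
        # Consume consecutive children with this name, up to maxOccur.
        while pos < len(tags) and tags[pos] == name and count < max_occur:
            pos += 1
            count += 1
        if count < min_occur:
            return False
    # A leftover child is either unexpected or exceeds some maxOccur.
    return pos == len(tags)


doc = ET.fromstring(
    "<address><name>N</name><street>1</street><street>2</street>"
    "<city>C</city><state>MA</state><zip>12481</zip></address>"
)
print(matches_sequence(doc, ADDRESS_MODEL))  # True
```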
20
Datatype Validity
  • Datatype validity is the ability to test whether
    specific units of information are of the correct
    type and fall within the specified legal values.
  • For example, if I am writing a schema for catalog
    order forms, I should be able to express the
    constraint that the quantity ordered is greater
    than zero.
  • The ability to express datatype validity in a
    schema is one of the really new features of XML
    Schema.
  • Although database schemas have always had this
    ability, XML DTDs do not.
  • DTDs have extremely limited datatyping.
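
Without datatype validation in the schema language, a check such as "quantity greater than zero" has to live in application code. A minimal Python sketch of that kind of hand-written check follows; parse_quantity is a hypothetical helper, not part of any XML library.

```python
def parse_quantity(text: str) -> int:
    """Parse an order quantity; raise ValueError unless it is an integer > 0."""
    value = int(text.strip())  # int() raises ValueError for non-integer text
    if value <= 0:
        raise ValueError(f"quantity must be greater than zero, got {value}")
    return value


print(parse_quantity("3"))  # 3

for bad in ("0", "-2", "three"):
    try:
        parse_quantity(bad)
    except ValueError as err:
        print("rejected:", err)
```

Moving this rule into the schema means every application that receives the document gets the same check from a validating parser instead of reimplementing it.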

21
Syntax
  • At bottom, a schema describes the content of
    elements and attributes.
  • Example: The name Element Type

<elementType name="name">
  <mixed/>
</elementType>
22
Syntax
  • Example: A ZIP Code Datatype and the ZIP Element
    Type

<datatype name="zipCode">
  <basetype name="string"/>
  <lexicalRepresentation>
    <lexical>99999</lexical>
    <lexical>99999-9999</lexical>
  </lexicalRepresentation>
</datatype>

(5 digits, or 5 digits, a hyphen, and 4 digits)

<elementType name="zip">
  <datatypeRef name="zipCode"/>
</elementType>
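
One way to read the two lexical entries is as "pictures" in which each 9 stands for one digit and every other character stands for itself, which matches the slide's note about 5 digits or 5 digits, a hyphen, and 4 digits. Under that assumption, the pictures translate directly into a regular expression. The helper names below are hypothetical.

```python
import re


def picture_to_regex(picture: str) -> str:
    """Translate a picture such as '99999-9999' into regex source text."""
    return "".join("[0-9]" if ch == "9" else re.escape(ch) for ch in picture)


def make_checker(pictures):
    """Build a predicate that accepts text matching any one of the pictures."""
    pattern = re.compile("|".join(picture_to_regex(p) for p in pictures))
    return lambda text: pattern.fullmatch(text) is not None


is_zip_code = make_checker(["99999", "99999-9999"])

print(is_zip_code("12481"))       # True
print(is_zip_code("12481-6326"))  # True
print(is_zip_code("blue"))        # False
```
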
23
Syntax
  • Example: An Address in Schema Notation

<elementType name="address">
  <sequence>
    <elementTypeRef name="company" minOccur="0" maxOccur="1"/>
    <elementTypeRef name="name" minOccur="1" maxOccur="1"/>
    <elementTypeRef name="street" minOccur="1" maxOccur="2"/>
    <elementTypeRef name="city" minOccur="1" maxOccur="1"/>
    <elementTypeRef name="state" minOccur="1" maxOccur="1"/>
    <elementTypeRef name="zip" minOccur="1" maxOccur="1"/>
  </sequence>
</elementType>

<!ELEMENT address (company?, name, street, city, state, zip)>
24
Syntax
  • Example: An Address in DTD Notation

<!ELEMENT address (company?, name, street, city, state, zip)>

  • Example: An Address with Parameter Entities

<!ENTITY % address "company?, name, street, city, state, zip">
<!ELEMENT billing.address (%address;)>
<!ELEMENT shipping.address (%address;)>
25
Syntax
  • Example: An Address Archetype in Schema

<archetype name="address" model="refinable">
  <sequence>
    <elementTypeRef name="company" minOccur="0" maxOccur="1"/>
    <elementTypeRef name="name" minOccur="1" maxOccur="1"/>
    <elementTypeRef name="street" minOccur="1" maxOccur="2"/>
    <elementTypeRef name="city" minOccur="1" maxOccur="1"/>
    <elementTypeRef name="state" minOccur="1" maxOccur="1"/>
    <elementTypeRef name="zip" minOccur="1" maxOccur="1"/>
  </sequence>
</archetype>
<elementType name="billing.address">
  <archetypeRef name="address"/>
</elementType>
<elementType name="shipping.address">
  <archetypeRef name="address"/>
</elementType>
26
Syntax
  • Significant advantages of an archetype (from the
    previous example):
  • The archetype is refinable. This means that I can
    derive new, related address types from it. I
    could create, for example, a return address that
    included everything in an address but added an
    element to hold the RMA (return merchandise
    authorization) number.
  • The relationship that a billing.address is an
    address and a shipping.address is an address is
    explicit.
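
Both advantages can be mimicked in a short Python sketch: a derived model that keeps a link to its base makes the "is an address" relationship something a program can query. Everything here is an assumption for illustration, including the Archetype class, the choice to append new elements at the end, and the element name rma; the draft schema syntax for refinement is not shown in these slides.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Particle = Tuple[str, int, int]  # (element name, minOccur, maxOccur)


@dataclass(frozen=True)
class Archetype:
    name: str
    particles: Tuple[Particle, ...]
    base: Optional["Archetype"] = None

    def refine(self, name: str, *extra: Particle) -> "Archetype":
        """Derive a new archetype that keeps this model and appends to it."""
        return Archetype(name, self.particles + extra, base=self)

    def is_a(self, other: "Archetype") -> bool:
        """Walk the chain of bases to test the explicit 'is a' relationship."""
        current: Optional[Archetype] = self
        while current is not None:
            if current is other:
                return True
            current = current.base
        return False


address = Archetype("address", (
    ("company", 0, 1), ("name", 1, 1), ("street", 1, 2),
    ("city", 1, 1), ("state", 1, 1), ("zip", 1, 1),
))
# Hypothetical refinement: everything in an address plus an RMA number.
return_address = address.refine("return.address", ("rma", 1, 1))

print(return_address.is_a(address))  # True
print(address.is_a(return_address))  # False
```

With parameter entities there is nothing comparable to the base link: after string substitution, the processor sees two unrelated content models.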

Implicit:

<!ELEMENT billing.address (company?, name, street, city, state, zip)>
<!ELEMENT shipping.address (company?, name, street, city, state, zip)>
27
Example: A Purchase Order

<!DOCTYPE purchase.order SYSTEM "po.dtd">
<purchase.order>
  <date>16 June 1967</date>
  <billing.address>
    <name>Namron H. Slaw</name>
    <street>256 Eight Bit Lane</street>
    <city>East Yahoo</city>
    <state>MA</state>
    <zip>12481-6326</zip>
  </billing.address>
  <items>
    <item>
      <quantity>3</quantity>
      <product.number>248</product.number>
      <description>Decorative Widget, Red, Large</description>
      <unitcost>19.95</unitcost>
    </item>
    <item>
      <quantity>1</quantity>
      <product.number>1632</product.number>
      <description>Packed electron storage container, AA, 4-pack</description>
      <unitcost>4.95</unitcost>
    </item>
  </items>
</purchase.order>
28
<!DOCTYPE schema SYSTEM "o/reference/w3c/schema/structures.dtd">
<schema>
  <archetype name="address" model="refinable">
    <sequence>
      <elementTypeRef name="company" minOccur="0" maxOccur="1"/>
      <elementTypeRef name="name" minOccur="1" maxOccur="1"/>
      <elementTypeRef name="street" minOccur="1" maxOccur="2"/>
      <elementTypeRef name="city" minOccur="1" maxOccur="1"/>
      <elementTypeRef name="state" minOccur="1" maxOccur="1"/>
      <elementTypeRef name="zip" minOccur="1" maxOccur="1"/>
    </sequence>
  </archetype>
  <elementType name="billing.address">
    <archetypeRef name="address"/>
  </elementType>
  <elementType name="shipping.address">
    <archetypeRef name="address"/>
  </elementType>
29
  <elementType name="items">
    <elementTypeRef name="item" minOccur="1"/>
  </elementType>
  <elementType name="item">
    <sequence>
      <elementTypeRef name="quantity" minOccur="1" maxOccur="1"/>
      <elementTypeRef name="product.number" minOccur="1" maxOccur="1"/>
      <elementTypeRef name="description" minOccur="1" maxOccur="1"/>
      <elementTypeRef name="unitcost" minOccur="1" maxOccur="1"/>
    </sequence>
  </elementType>
  <elementType name="purchase.order">
    <sequence>
      <elementTypeRef name="date" minOccur="1" maxOccur="1"/>
      <elementTypeRef name="billing.address" minOccur="1" maxOccur="1"/>
      <elementTypeRef name="shipping.address" minOccur="0" maxOccur="1"/>
      <elementTypeRef name="items" minOccur="1" maxOccur="1"/>
    </sequence>
  </elementType>
30
  <elementType name="company">
    <mixed/>
  </elementType>
  <elementType name="name">
    <mixed/>
  </elementType>
  <elementType name="street">
    <mixed/>
  </elementType>
  <elementType name="city">
    <mixed/>
  </elementType>
  <elementType name="state">
    <mixed/>
  </elementType>
  <datatype name="zipCode">
    <basetype name="string"/>
    <lexicalRepresentation>
      <lexical>99999</lexical>
      <lexical>99999-9999</lexical>
    </lexicalRepresentation>
  </datatype>
  <elementType name="zip">
    <datatypeRef name="zipCode"/>
  </elementType>
  <elementType name="product.number">
    <mixed/>
  </elementType>
  <elementType name="description">
    <mixed/>
  </elementType>
31
  <datatype name="quantityType">
    <basetype name="integer"/>
    <minExclusive>0</minExclusive>
  </datatype>
  <elementType name="quantity">
    <datatypeRef name="quantityType"/>
  </elementType>
  <datatype name="currency">
    <basetype name="decimal"/>
    <precision>8</precision>
    <scale>2</scale>
  </datatype>
  <elementType name="unitcost">
    <datatypeRef name="currency"/>
  </elementType>
  <elementType name="date">
    <datatypeRef name="dateTime"/>
  </elementType>
</schema>
32
Conclusion
  • Schemas greatly improve on DTDs.
  • Certain kinds of applications can be made more
    interoperable by XML Schema. For example:
  • exchanging information between databases, and
    e-commerce.
  • DTDs are well understood and they do offer a good
    way to describe the structure of a document for
    interchange.
  • It will take some time before XML Schemas are as
    well understood.