Title: SDO and XML
1 SDO and XML
Frank Budinsky, IBM, Jan 7, 2008
2 Agenda
- Scenario
- XML Fidelity Issues
- Tolerance Issues
- SDO and XPath
- Type System
- Summary
3 Scenario: SDO over XML
- WS/SOA programmer who wishes to read/write/modify an XML document using dynamic SDO
  - The XML document conforms to a predefined XML Schema, for example a Business Object or Message
  - (Note: schema-less XML does not appear to be a common use case)
  - The XSD is usually defined by a third party
  - The XML document has a natural physical serialization as XML
  - The programmer understands XML and XML Schema, and has knowledge of the particular schema in question
- SDO requirements
  - XML Fidelity: API and model must support all valid XML schemas
  - Naturalness: API, model and behavior must seem natural to an XML-savvy programmer
  - Performance: API must not inject features that prevent high-performance implementations
  - Tolerance: must be able to tolerate some degree of erroneous data
4 Scenario: SDO over non-XML
- WS/SOA programmer who wishes to read/write/modify non-XML data using dynamic SDO
  - The non-XML data conforms to a predefined XML Schema, for example a Business Object or Message
  - The XSD is usually defined by a third party (see next slide)
  - The data does not have a natural physical serialization as XML. It comes from data sources that have other native serializations, e.g. COBOL data structures (DFDL)
  - The programmer understands XML and XML Schema, and has knowledge of the particular schema in question
- SDO requirements
  - XML Fidelity: API and model must support all valid XML schemas
  - Naturalness: API, model and behavior must seem natural to an XML-savvy programmer
  - Performance: API must not inject features that prevent high-performance implementations
  - Tolerance: must be able to tolerate some degree of erroneous data
5 XML Schema types
- XSD has become the standard way to define data structures shared by industry-specific applications
- Industrial schemas
  - OTA (Travel): http://www.opentravel.org/
    - OpenTravel's primary activity is to develop and maintain a library of Extensible Markup Language (XML) schemas for use by the travel industry
  - HL7 (Health Care): http://www.hl7.org/
    - Develop coherent, extendible standards that permit structured, encoded health care information of the type required to support patient care, to be exchanged between computer applications while preserving meaning
  - OAGIS (B2B): http://www.openapplications.org/
  - UBL (B2B): http://docs.oasis-open.org/ubl/os-UBL-2.0/UBL-2.0.html
  - ACORD (Insurance): http://www.acord.org/
  - Parlay (Telco)
  - SWIFT (Financial)
  - IFX (Financial)
  - OFX (Financial)
  - PIDXML (Petroleum Trading)
- Thousands of types
  - Performance: importing and building this many types is an issue
  - Usability: only a small subset of the types is used, and it is hard to locate the types in use or the root type
7 Names containing "." characters
- The "." character is commonly used in element names of standard industry schemas
- But "." is used in SDO path as the delimiter for the index of a many-valued property
  - foo.0 is used to access the first foo property, not a property named foo.0
- Possible solutions
  - Clarify the name.index semantics (still leaves a broken corner case)
  - Deprecate the name.index syntax
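
The ambiguity can be made concrete with a small sketch. This is not SDO code: `resolve_step` is a hypothetical helper, and the property names are made up for illustration. It lists every way a single path step could be read when the type may also declare a property whose name contains a ".".

```python
import re

def resolve_step(step, property_names):
    """Return every plausible reading of one path step.

    Each reading is (property_name, index); index is None when the step
    is taken as a plain property name. Hypothetical helper, not SDO API.
    """
    readings = []
    # Reading 1: the whole step is a property name (e.g. a property "foo.0").
    if step in property_names:
        readings.append((step, None))
    # Reading 2: name.index, i.e. the index-th value of a many-valued property.
    m = re.fullmatch(r"(.+)\.(\d+)", step)
    if m and m.group(1) in property_names:
        readings.append((m.group(1), int(m.group(2))))
    return readings
```

With only a `foo` property the step `foo.0` has one reading; once the schema also contributes a property literally named `foo.0`, the same step has two, which is the corner case that clarifying the semantics alone would not remove.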
8 Duplicates: attributes and elements
- Example (2 foo properties)
    <xsd:complexType name="DT1">
      <xsd:sequence>
        <xsd:element name="foo" type="xsd:string"/>
      </xsd:sequence>
      <xsd:attribute name="foo" type="xsd:int"/>
    </xsd:complexType>
- Should we use @foo to differentiate?
  - Breaking change?
  - SDOPath vs. XPath issue
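
A minimal sketch of the XPath-style distinction, using Python's standard `xml.etree.ElementTree` rather than SDO. The instance document and the `get_by_path` helper are assumptions made for illustration; the helper treats a leading "@" as "attribute" and a bare name as "child element".

```python
import xml.etree.ElementTree as ET

def get_by_path(element, step):
    """Resolve one step: '@name' reads an attribute, 'name' reads a child element's text."""
    if step.startswith("@"):
        return element.get(step[1:])
    return element.findtext(step)

# Hypothetical instance of DT1: an attribute and an element both named foo.
instance = ET.fromstring('<dt1 foo="42"><foo>forty-two</foo></dt1>')

attribute_value = get_by_path(instance, "@foo")
element_value = get_by_path(instance, "foo")
```

Without the "@" marker, a path of just `foo` has no way to say which of the two properties is meant.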
9 Duplicates: different namespaces
- Example (2 custNum properties)
    <schema targetNamespace="tns1" elementFormDefault="qualified" ... >
      <complexType name="CustomerBase">
        <sequence>
          <element name="custNum" type="string"/>
        </sequence>
      </complexType>
    </schema>
    <schema targetNamespace="tns2" elementFormDefault="qualified" ... >
      <import namespace="tns1" ... />
      <complexType name="Customer">
        <complexContent>
          <extension base="tns1:CustomerBase">
            <sequence>
              <element name="custNum" type="string"/>
            </sequence>
          </extension>
        </complexContent>
      </complexType>
    </schema>
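
To illustrate why the local name alone is not enough here, the sketch below parses a hypothetical instance of Customer with Python's standard `xml.etree.ElementTree`. The root element name `customer` and the element values are assumptions; only the two namespaces and the `custNum` name come from the schemas above.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance: the inherited custNum is in tns1, the extension's in tns2.
INSTANCE = """
<c:customer xmlns:c="tns2" xmlns:b="tns1">
  <b:custNum>base-001</b:custNum>
  <c:custNum>ext-002</c:custNum>
</c:customer>
"""

root = ET.fromstring(INSTANCE)

def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on qualified tags."""
    return tag.rsplit("}", 1)[-1]

# Looking up by local name finds both properties ...
by_local_name = [child.text for child in root if local_name(child.tag) == "custNum"]

# ... while namespace-qualified names tell them apart.
by_qualified_name = {
    "tns1": root.findtext("{tns1}custNum"),
    "tns2": root.findtext("{tns2}custNum"),
}
```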
10 Duplicates in model group
    <xs:complexType name="IVL_INT">
      <xs:complexContent>
        <xs:extension base="SXCM_INT">
          <xs:choice minOccurs="0">
            <xs:sequence>
              <xs:element name="low" minOccurs="1" maxOccurs="1" type="IVXB_INT"/>
              <xs:choice minOccurs="0">
                <xs:element name="width" minOccurs="0" maxOccurs="1" type="INT"/>
                <xs:element name="high" minOccurs="0" maxOccurs="1" type="IVXB_INT"/>
              </xs:choice>
            </xs:sequence>
            <xs:element name="high" minOccurs="1" maxOccurs="1" type="IVXB_INT"/>
            <xs:sequence>
              <xs:element name="width" minOccurs="1" maxOccurs="1" type="INT"/>
              <xs:element name="high" minOccurs="0" maxOccurs="1" type="IVXB_INT"/>
            </xs:sequence>
            <xs:sequence>
              <xs:element name="center" minOccurs="1" maxOccurs="1" type="INT"/>
              <xs:element name="width" minOccurs="0" maxOccurs="1" type="INT"/>
            </xs:sequence>
          </xs:choice>
        </xs:extension>
      </xs:complexContent>
    </xs:complexType>
- HL7 types are defined in UML, including semantic constraints
- Published XSD is generated, represents both the type and the constraints
  - Results in some XSDs that are difficult to consume into SDO
- Example
  - IVL_INT
    - low: IVXB_INT
    - width: INT
    - high: IVXB_INT
    - center: INT
  - Semantic constraints (restrictions)
    - low and center cannot both be set
    - high and center cannot both be set
    - low, high, and width cannot all be set
11 Duplicates in model group (cont)
- Unfortunately, this XSD also imposes some ordering constraints
  - Valid instances according to the XSD:
    - empty, <low>, <low><width>, <low><high>, <high>, <width>, <width><high>, <center>, <center><width>
  - Note that <width><low>, for example, is invalid, even though it is valid to set both the width and low properties at the type level
- The SDO Type needs to be sequenced=true, so that the user can access and control the order of the set properties
  - This is unfortunate, since the original type was not intended to specify XML ordering
  - It's a side effect of using XSD as the standard interchange language
- Bottom line: SDO must support this type in a (clean) standard way
  - If SDO can do better than other XML binding technologies, it would really strengthen the SDO story
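
The list of valid instances can be checked mechanically. The sketch below is an illustration, not part of any SDO implementation: it hand-translates the IVL_INT content model into a regular expression over child element names (that translation is an assumption of this sketch) and then enumerates every ordering of the four elements that the expression accepts.

```python
import re
from itertools import permutations

# Hand-written translation of the IVL_INT choice into a regular expression
# over space-separated child element names:
#   (low (width | high)? | high | width high? | center width?)?
CONTENT_MODEL = re.compile(
    r"^(?:low(?: width| high)?|high|width(?: high)?|center(?: width)?)?$"
)

def is_valid(children):
    """True if the ordered child element names satisfy the content model."""
    return CONTENT_MODEL.match(" ".join(children)) is not None

def valid_orders(names=("low", "width", "high", "center")):
    """Enumerate every ordered selection of distinct names that is valid."""
    return [
        combo
        for n in range(len(names) + 1)
        for combo in permutations(names, n)
        if is_valid(combo)
    ]
```

`is_valid(("low", "width"))` holds while `is_valid(("width", "low"))` does not, which is why an unordered set of properties is not enough to serialize this type correctly.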
12 Duplicates: anonymous types
- Example (2 person types)
    <xsd:complexType name="complextype1">
      <xsd:sequence>
        <xsd:element name="person">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="first-name" type="xsd:string"/>
              <xsd:element name="last-name" type="xsd:string"/>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
    <xsd:complexType name="complextype2">
      <xsd:sequence>
        <xsd:element name="person">
          <xsd:complexType>
            <xsd:sequence>
              ...
13 Special value property
- A ComplexType extending a SimpleType uses a special value property to access the simple content
  - This can conflict with value attributes
- Example (2 value properties)
    <select name="product">
      <option value="1">Toaster</option>
    </select>
14 Duplicate null TNS definitions
- OTA has many conflicting null-namespace definitions, intended to be included in the correct context
- Example (2 Address types)
  - Address1.xsd
      <!-- This schema is created for reuse only and is intended to be included in a non-null tns schema -->
      <schema> <!-- null tns -->
        <complexType name="Address"> <!-- definition 1 --> </complexType>
      </schema>
  - Address2.xsd
      <!-- This schema is created for reuse only and is intended to be included into a non-null tns schema -->
      <schema> <!-- null tns -->
        <complexType name="Address"> <!-- definition 2 --> </complexType>
      </schema>
- Other XSD files include these in the correct context
  - Customer1.xsd
      <schema targetNamespace="tns1">
        <include schemaLocation="Address1.xsd"/>
        <complexType name="Customer1">
          <sequence><element name="address" type="Address"/></sequence>
        </complexType>
      </schema>
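
One way to picture the problem is a type registry keyed by (namespace, name). The sketch below is hypothetical: `include_into` and the registry are not SDO or XSD APIs, and the second including schema with namespace `tns2` is an assumption, since only Customer1.xsd is shown. It registers each no-namespace definition under the namespace of the schema that includes it, and fails if the same key is defined twice.

```python
def include_into(registry, target_namespace, no_namespace_types):
    """Register types from a no-namespace schema under the including schema's namespace."""
    for name, definition in no_namespace_types.items():
        key = (target_namespace, name)
        if key in registry:
            raise ValueError(f"duplicate type definition: {key}")
        registry[key] = definition

address1 = {"Address": "definition 1"}  # stands in for Address1.xsd
address2 = {"Address": "definition 2"}  # stands in for Address2.xsd

registry = {}
include_into(registry, "tns1", address1)  # as Customer1.xsd does
include_into(registry, "tns2", address2)  # assumed second including schema
```

Resolved in the including context the two Address types coexist; a tool that loads both files on their own, under the null namespace, hits the duplicate.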
15 xsi:nil with attributes
- Example (intRange is a nillable property that can include attributes)
    <xsd:element name="query">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element name="intRange" type="IntegerRange" nillable="true"/>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>
    <xsd:complexType name="IntegerRange">
      <xsd:attribute name="min" type="xsd:integer"/>
      <xsd:attribute name="max" type="xsd:integer"/>
    </xsd:complexType>
- The following instance is valid, but cannot be represented in SDO using the value null
    <query>
      <intRange min="1" xsi:nil="true"/>
    </query>
- Maybe add XMLHelper.isNil(DataObject object)?
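
The sketch below shows, with Python's standard `xml.etree.ElementTree`, why a plain null cannot carry this instance: the element is nil and still has attribute data. The `is_nil` function is only a stand-in for the proposed XMLHelper.isNil idea, not an existing SDO method, and the instance adds the xsi namespace declaration that the slide omits.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"
INSTANCE = f'<query xmlns:xsi="{XSI}"><intRange min="1" xsi:nil="true"/></query>'

def is_nil(element):
    """True if the element carries xsi:nil="true" (or "1")."""
    return element.get(f"{{{XSI}}}nil") in ("true", "1")

int_range = ET.fromstring(INSTANCE).find("intRange")

# Attributes other than the xsi:* ones are real data that a null value would lose.
data_attributes = {
    name: value
    for name, value in int_range.attrib.items()
    if not name.startswith(f"{{{XSI}}}")
}
```

Keeping the object and asking a separate "is it nil?" question preserves `min="1"`, whereas replacing the property value with null would drop it.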
16 Other fidelity issues
- Handling of xsd:redefine
- Substitution groups (element control and roundtrip)
- Support for soapenc:Array
- Attribute-properties in a sequenced type
- More?
18 Error Tolerance
- SDO needs to define its behavior when loading invalid instances. It can either
  1. Completely fail to load the invalid instance
  2. Ignore the invalid portion of the instance (possibly recording the error) and load the remainder of the XML. If the instance is later serialized, the invalid portion of the XML will have been discarded
  3. Same as 2, but retain and continue to serialize the invalid portion of the XML
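
The three behaviors can be sketched as modes of a toy loader. Everything here is an assumption made for illustration: the mode names `fail`, `discard`, and `retain`, the `load`/`save` functions, and the simplification that "invalid" means "a child element whose name is not known". It uses Python's standard `xml.etree.ElementTree`, not an SDO implementation.

```python
import xml.etree.ElementTree as ET

MODES = ("fail", "discard", "retain")  # options 1, 2 and 3 above

def load(xml_text, known_names, mode):
    """Load child values; treat children with unknown names as the invalid portion."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    root = ET.fromstring(xml_text)
    values, retained = {}, []
    for child in root:
        if child.tag in known_names:
            values[child.tag] = child.text
        elif mode == "fail":
            raise ValueError(f"unexpected element <{child.tag}>")
        elif mode == "retain":
            retained.append(child)  # kept aside so it can be written back
        # mode == "discard": dropped (a real loader might record the error)
    return root.tag, values, retained

def save(tag, values, retained):
    """Serialize the loaded values, followed by any retained invalid elements."""
    root = ET.Element(tag)
    for name, text in values.items():
        ET.SubElement(root, name).text = text
    root.extend(retained)
    return ET.tostring(root, encoding="unicode")
```

Note that this sketch writes retained elements after the valid ones, so even option 3 raises a further question about whether the original position of the invalid content must be preserved.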
19 Example: missing NS prefix
- Assume element sayHello in a schema with elementFormDefault="qualified"
- What if we try to load an XML instance that looks like this?
    <tns:sayHello xmlns:tns="http://QuickTest/HelloWorld">
      <input1>hello</input1>
    </tns:sayHello>
- This is a common error (input1 is not qualified)
- Possible solution
  - Lax mode option: when a property with the same name, but a different namespace, exists, the element will be associated with it
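
A minimal sketch of the lax-mode idea, assuming properties are identified by (namespace, local name) pairs. The `match_property` function and its `lax` flag are hypothetical, not an SDO load option; the instance is the one from this slide, parsed with Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

TNS = "http://QuickTest/HelloWorld"
INSTANCE = f'<tns:sayHello xmlns:tns="{TNS}"><input1>hello</input1></tns:sayHello>'

def match_property(tag, properties, lax=False):
    """Match an ElementTree tag ('{ns}local' or 'local') to a (namespace, local) property."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
    else:
        namespace, local = "", tag
    if (namespace, local) in properties:
        return (namespace, local)
    if lax:
        # Fall back to the local name, but only when the match is unambiguous.
        candidates = [p for p in properties if p[1] == local]
        if len(candidates) == 1:
            return candidates[0]
    return None

properties = {(TNS, "input1")}  # what the qualified schema declares
child = ET.fromstring(INSTANCE)[0]

strict_match = match_property(child.tag, properties)
lax_match = match_property(child.tag, properties, lax=True)
```

Restricting the fallback to a single candidate is a design choice of this sketch: if two namespaces declare the same local name, guessing would be worse than reporting the error.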
20 Example: Unresolved declaration type
- What should be the expected behavior if XSDHelper.define() is called with the following schema?
    <xsd:complexType name="T8">
      <xsd:sequence>
        <xsd:element name="neElement" type="ne:NonExistentType"/>
      </xsd:sequence>
      <xsd:attribute name="neAttribute" type="ne:NonExistentType"/>
    </xsd:complexType>
- Possible solution: unresolved type mapped to sdo:Object?
21 Tolerance approach
- Should the SDO loader tolerate XML with
  - Missing namespace prefix
  - Unresolved xsi:type
  - Incompatible content
  - Problems validating xsd:any processContents="strict"
  - Unresolved declaration type
  - Etc.
- Should SDO 3.0 or just implementations address these issues?
- Load options (e.g., lax, strict)?
23 SDOPath and XPath
- SDO Path is basically a syntactic (but not semantic) subset of XPath
  - The SDOPath name.0 syntax is the exception
- SDOPath has several limitations
  - attribute vs. element
  - mixed and simple text
  - namespace qualification
  - other XPath features (e.g. some standard functions)?
- Do we need both?
  - Seems to be lots of discussion about where to use XPath vs SDOPath (e.g., ChangeSummary)
- Should SDO Path be a proper subset of XPath?
24 Example: mixed and simple text
- Given the following XML
    <letterBody>
      <salutation>Dear Mr.<name>Robert Smith</name>.</salutation>
      Your order of <quantity>1</quantity> <productName>Baby Monitor</productName> shipped from our warehouse on
      <shipDate>1999-05-21</shipDate>. ....
    </letterBody>
- Using a proper XPath expression, the string "shipped from our warehouse on" in the above XML can be accessed as follows
  - /letterBody/text()[2]
- Should SDOPath also support this?
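
The effect of the text() step can be emulated with Python's standard `xml.etree.ElementTree`, whose limited path syntax does not itself offer text(). The `text_nodes` helper is hypothetical and rests on one loud assumption: whitespace-only text between elements is ignored, which is what makes the warehouse sentence the second text node.

```python
import xml.etree.ElementTree as ET

LETTER = """<letterBody>
<salutation>Dear Mr.<name>Robert Smith</name>.</salutation>
Your order of <quantity>1</quantity> <productName>Baby
Monitor</productName> shipped from our warehouse on
<shipDate>1999-05-21</shipDate>. ....
</letterBody>"""

def text_nodes(element):
    """Direct text children of an element, in document order.

    ElementTree stores them as element.text plus each child's .tail.
    Whitespace-only pieces are dropped (an assumption of this sketch).
    """
    pieces = [element.text] + [child.tail for child in element]
    return [piece.strip() for piece in pieces if piece and piece.strip()]

letter_texts = text_nodes(ET.fromstring(LETTER))
second_text = letter_texts[1]  # counterpart of /letterBody/text()[2]
```

If whitespace-only text nodes were counted instead, the index of the same string would shift, so any SDOPath support for this would also need to define how such whitespace is treated.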
26 SDO (2.1) Type System
- SDO should ideally provide a standard way of accessing the remaining XSD metadata
- Possibly add some additional properties (or instance properties) to Type and Property interfaces
  - Only for non XSD specific features
- Some metadata is XSD specific
  - some XSD facet constraints
  - content model (choice, sequence, all, mixed, etc)
  - how many xsd:anys and where they appear in the sequence
  - xsd:key/xsd:keyref metadata
  - xsd:restriction specifics
  - etc.
27 Example: Appinfo on Model Group
- Annotation on sequence element
    <xsd:complexType name="T7">
      <xsd:sequence>
        <xsd:annotation> <!-- this is on the sequence instead of the type -->
          <xsd:appinfo source="http://www.example.com/metadata/annotation">
            <p1:example xmlns:p1="http://example">
              <p1:picture>forest.jpg</p1:picture>
            </p1:example>
          </xsd:appinfo>
        </xsd:annotation>
        <xsd:element name="A" type="xsd:string"/>
        <xsd:element name="B" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
- How do SDO users access this annotation?
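
For comparison, this is how a generic XML API reaches the annotation. The sketch uses Python's standard `xml.etree.ElementTree`; it wraps the type in an `xsd:schema` element and gives every schema element the `xsd` prefix, both of which are assumptions added so the fragment parses on its own.

```python
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2001/XMLSchema"
SCHEMA = f"""<xsd:schema xmlns:xsd="{XSD}">
  <xsd:complexType name="T7">
    <xsd:sequence>
      <xsd:annotation>
        <xsd:appinfo source="http://www.example.com/metadata/annotation">
          <p1:example xmlns:p1="http://example">
            <p1:picture>forest.jpg</p1:picture>
          </p1:example>
        </xsd:appinfo>
      </xsd:annotation>
      <xsd:element name="A" type="xsd:string"/>
      <xsd:element name="B" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>"""

NS = {"xsd": XSD, "p1": "http://example"}

t7 = ET.fromstring(SCHEMA).find("xsd:complexType[@name='T7']", NS)

# Nothing is attached to the type itself ...
annotation_on_type = t7.find("xsd:annotation", NS)

# ... the appinfo hangs off the sequence (the model group).
appinfo_on_sequence = t7.find("xsd:sequence/xsd:annotation/xsd:appinfo", NS)
picture = appinfo_on_sequence.findtext("p1:example/p1:picture", namespaces=NS)
```

An SDO Type for T7 has no obvious counterpart to the sequence node, so there is no natural place from which to expose this appinfo.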
28 XSD Type System for SDO
- Observations
  - Need to have a 1-1 mapping between XSD types and SDO types
  - Must not force XML users to learn about SDO types
  - Must provide full XSD fidelity (e.g., appinfo on model group vs. type)
  - Should still try and square the circle, so that non-XML users aren't presented with all the complexity of XSD types
- Possible solution
  - SDO 3 provides an API to access an extended XSD type system; some options:
    - SchemaXXX interfaces from XMLBeans (more abstract)?
    - XSModel from Xerces (less abstract)?
    - Some new standard XSD model?
    - Implementation dependent?
30 Summary
- Accessing XML (XSD defined data) is a very significant SDO use case
  - SDO currently supports XML, but not well enough
- XML Schema (XSD) is the canonical definition of the standard data structures being used in many (most) industries
  - SDO needs to provide full support for XSD-defined types to play in this environment
- If SDO can support all of XSD/XML in a natural way and also provide a simplified object-based PM, we believe the industry will embrace SDO!
  - If not, then some other XML-based API will push SDO aside
- More information
  - www.oasis-open.org/apps/org/workgroup/sdo/download.php/26722/SDO_XML_Issues.doc