Title: DFAS XML Best Practices Version 1.0
1 DFAS XML Best Practices, Version 1.0
Defense Finance and Accounting Service
2 Introduction
- The best practices contained herein
  - were developed during two years of research and development efforts by the DFAS Data Architecture (DFAS-DTB) XML team
  - are the result of a trial-and-error learning approach
  - draw on other XML best practices developed by government and industry groups
  - are not policies
3 DFAS XML Team
- Mike Lubash, DFAS Data Architecture
- Nauman Malik, XMLCG
- Bruce Peat, eProcessSolutions
- Kit Lueder, Mitre
- Charlie Clark, EMI
4 Short-Term versus Long-Term
- Problem: What is the best way to manage deployment of XML at the enterprise level?
- Solution: Develop and deploy over time, with short-term and long-term solutions (as explained on the next slide)
- Consequences
  - Pros
    - Effective approach for an evolutionary process
  - Cons
    - Limitations will exist in the short-term solutions (addressed in the long term)
    - A registry will cost money and requires a mature infrastructure to be in place (which currently doesn't exist)
5 Short-Term versus Long-Term
6 XML Validation Levels
- Problem: At what levels should XML documents be checked?
- Solution: Four levels of XML checking (as explained on the next slide)
  - Business requirements of each organization will dictate the level of checking
- Consequences
  - Pros
    - Business requirements dictate the level of checking
    - Business requirements dictate the resources allocated to checking
  - Cons
    - Potential for errors exists at lower levels of checking
7 XML Validation Levels
8 Why XML Schema
- Problem: How do we validate data, create and enforce structure, communicate and collaborate with trading partners, capture basic metadata, etc.?
- Solution: W3C's XML Schema
  - Business-centric methodology dictates the use of open, non-proprietary standards
- Consequences: next slide
- Alternatives
- Not recommended by W3C
- DTD
- RELAX
- Schematron
- SOX
- XDR
9 Pros: Requirements met by XML Schema
- As DFAS's preferred mechanism for managing our information assets (information resource), XML Schema is used
  - to use open standards, such as those from the W3C
  - to validate data
  - to establish and communicate our XML accounting business vocabulary and model
  - to establish a mechanism of collaboration
  - to create reusable components (via datatypes) for heterogeneous environments spanning multiple trading partners
  - to encapsulate document structure
  - to capture structure, optionality, cardinality, enumerated code lists, etc.
  - to aid precise communication among our technical, functional and customer stakeholders to deliver value to our customers
  - in the short term, to capture basic metadata
10 Cons: Limitations of and supplements to XML Schema
- Limitations of and requirements unmet by XML Schema
  - Forces early commitment to tag names
  - Does not allow IF-THEN logic
  - Does not allow extensions of enumerated lists
  - Does not allow value pairing (multi-fields), e.g. 21 = Dept. of Army
  - Lacks formal mechanisms for defining
    - metadata
    - business rules
    - context
    - constraints
    - code lists
- Therefore, in the long term, XML Schema will be supplemented with possible additional mechanisms such as
  - Business-centric methodology
  - AssemblyDocs / OASIS TC - Content Assembly Mechanism (CAM)
  - A registry
11 UID
- Problem: How do we uniquely identify DFAS XML artifacts?
- Solution: UIDs
  - Our UID will be in the following form (discussion on next slide)
    - Steward.ArtifactName.Version.FileType
  - For example
    - DFAS.USSGLAccountType.2002-12-17.xsd
  - UID = file name
  - Not specified for instance documents
- Consequences
  - Pros
    - Business friendly and technically identifiable
  - Cons
    - This approach to constructing UIDs is uncommon in industry; random generation is more common
    - There are competing methods for creating UIDs
      - But they do not all aid business communication (technical implementations)
12 UID Components
- Steward
  - Registration authority that controls the UID to assure there are no conflicts
  - For artifacts produced at DFAS, <Steward> is simply set to DFAS
  - Reference <dc:publisher> in Dublin Core Element Set v1.1
- ArtifactName
  - Name of the quasi root, for example, USSGLAccountType
- Version
  - Date of creation or last modification, for example, 2002-12-17
  - Reference <dc:date> in Dublin Core Element Set v1.1
- FileType
  - Internet Media or MIME types, for example, xml, xsl, xsd, dtd, etc.
  - Reference <dc:format> in Dublin Core Element Set v1.1
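The four-part UID form above can be sketched in code. This is only an illustration, not DFAS tooling: the names ArtifactUID, parse_uid and build_uid are hypothetical, and the pattern assumes each part is alphanumeric and the version is a YYYY-MM-DD date with an optional same-day letter suffix. File names that omit the version are not handled.

```python
import re
from typing import NamedTuple


class ArtifactUID(NamedTuple):
    """Hypothetical container for the four UID components."""
    steward: str
    artifact_name: str
    version: str
    file_type: str


# Assumed shape: Steward.ArtifactName.Version.FileType, where Version is
# YYYY-MM-DD optionally followed by one lowercase letter.
_UID_PATTERN = re.compile(
    r"^(?P<steward>[A-Za-z0-9]+)\."
    r"(?P<artifact_name>[A-Za-z0-9]+)\."
    r"(?P<version>\d{4}-\d{2}-\d{2}[a-z]?)\."
    r"(?P<file_type>[a-z]+)$"
)


def parse_uid(uid):
    """Split a UID (which is also the file name) into its components."""
    match = _UID_PATTERN.match(uid)
    if match is None:
        raise ValueError(f"not a Steward.ArtifactName.Version.FileType UID: {uid!r}")
    return ArtifactUID(**match.groupdict())


def build_uid(steward, artifact_name, version, file_type):
    """Join the four components back into a UID / file name."""
    return ".".join((steward, artifact_name, version, file_type))


uid = parse_uid("DFAS.USSGLAccountType.2002-12-17.xsd")
print(uid.steward, uid.artifact_name, uid.version, uid.file_type)
```

Because the UID equals the file name, a parser like this could in principle be pointed at a directory listing to cross-reference artifacts by steward, name, or version.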
13 The Role of Files
- Problem: What is the optimal vehicle and/or storage medium for XML artifacts?
- Solution: Physical flat files, each of which contains exactly one quasi root XML artifact and possibly other dependent XML artifacts
  - The UID of the quasi root artifact equals the filename
- Consequences
  - Pros
    - ease of configuration management
    - discreteness
    - simple design
    - compactness of size
    - allows correspondence between filename and UID: an efficient cross-referencing mechanism
  - Cons
    - many files to manage
    - inclusion list in assemblies and/or transaction schemas can get long
    - tools don't easily generate documentation for multiple discrete files
14 Types of XML Schema Artifacts
- Problem: What are the various types of XML Schema artifacts?
- Solution: The following types (which are also valid values for the Embedded Metadata Section (EMS) element <dc:type>)
  - SelfContained
    - Artifacts that are not dependent on the import or inclusion of any external resources
  - CodeList
    - Artifacts that contain a code list of domain values in enumerated list form
    - Do not contain a version in their file name; otherwise, updating could cause a chain reaction of failures in including artifacts
  - Assembly
    - Composed of one or more artifacts, which can include SelfContaineds, CodeLists, and other Assemblies
  - Transaction
    - Defines the root element of an XML instance document
    - Includes and/or imports one or more artifacts, which can include SelfContaineds, CodeLists, Assemblies, and other Transactions
15 Types of XML Schema Artifacts (continued)
- Consequences
  - Pros
    - Easy identification and grouping of artifacts by type
    - Delineates roles for personnel responsible for developing artifacts
  - Cons
    - Incomplete list based on current requirements (updating expected in future)
    - No precedent available for this type of work
    - Choices of types can be considered arbitrary by outside bodies
16 elementFormDefault
- Problem: What should the value of the XML Schema attribute elementFormDefault be set to in transaction schemas?
- Solution
  - elementFormDefault="qualified"
  - This will necessitate that instance documents contain elements that are qualified via prefixes or default namespaces (see discussion on namespaces in later slides)
  - This attribute is not to be used in non-transaction schema artifacts
- Consequences
  - Pros
    - Identifies elements contained inside DFAS instance documents as DFAS-owned
  - Cons
    - Bulk added to instance documents by added prefixes, unless the DFAS namespace is defaulted (recommended approach whenever possible)
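To illustrate what "qualified via prefixes or default namespaces" means in practice, the following sketch parses two small instance documents with Python's standard ElementTree module. The prefix d and the trimmed-down document content are assumptions made for the example. Both spellings resolve to the same namespace-qualified element names, and the defaulted form avoids the added bulk of prefixes.

```python
import xml.etree.ElementTree as ET

DFAS_NS = "http://www.dfas.mil/DFAS"

# Same content, qualified two ways: via a default namespace and via a
# prefix ("d" is an arbitrary prefix chosen for this illustration).
DEFAULTED = f'<FactsATB xmlns="{DFAS_NS}"><ATBAccountDetails/></FactsATB>'
PREFIXED = (
    f'<d:FactsATB xmlns:d="{DFAS_NS}">'
    "<d:ATBAccountDetails/>"
    "</d:FactsATB>"
)


def qualified_names(xml_text):
    """Return every element name in '{namespace}local' form."""
    root = ET.fromstring(xml_text)
    return [element.tag for element in root.iter()]


for name in qualified_names(DEFAULTED):
    print(name)

# Both documents carry the same qualified names; only the bulk differs.
print(qualified_names(DEFAULTED) == qualified_names(PREFIXED))
print(len(DEFAULTED), len(PREFIXED))
```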
17 attributeFormDefault
- Problem: What should the value of the XML Schema attribute attributeFormDefault be set to in transaction schemas?
- Solution
  - This attribute should not be included in any schemas
  - By inaction, the default value (attributeFormDefault="unqualified") will be chosen, which is the desired result
- Consequences
  - Pros
    - Relieves developers of having to concern themselves with this attribute, the implications of which are minimal to none
  - Cons
    - None
18 Example of file usage and inclusion
CodeList Artifact: DFAS.CustodialNoncustodialCodeType.2002-08-27.xsd
    <?xml version="1.0" encoding="UTF-8"?>
    <xs:schema targetNamespace="http://www.dfas.mil/DFAS"
               xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:simpleType name="CustodialNoncustodialCodeType">
        <xs:restriction base="xs:string">
          <xs:enumeration value="S"/>
          <xs:enumeration value="A"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:schema>
Transaction Schema: DFAS.ATB.2002-08-27.xsd
    <?xml version="1.0" encoding="UTF-8"?>
    <xs:schema targetNamespace="http://www.dfas.mil/DFAS"
               xmlns:xs="http://www.w3.org/2001/XMLSchema"
               xmlns="http://www.dfas.mil/DFAS">
      <!-- Begin import of reusable components -->
      <xs:include schemaLocation="DFAS.BudgetSubfunctionCodeType.2002-08-27.xsd"/>
      <xs:include schemaLocation="DFAS.CustodialNoncustodialCodeType.2002-08-27.xsd"/>
      <xs:include schemaLocation="DFAS.TradingPartnerCodeType.2002-08-27.xsd"/>
      <!-- End import of reusable components -->
      <xs:element name="FactsATB">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="ATBAccountDetails" maxOccurs="unbounded">
              <xs:complexType>
                <xs:all>
                  <xs:element name="CustodialNoncustodialCode" type="CustodialNoncustodialCodeType" minOccurs="0"/>
                  <xs:element name="BudgetSubfunctionCode" type="BudgetSubfunctionCodeType" minOccurs="0"/>
                  <xs:element name="TradingPartnerCode" type="TradingPartnerCodeType" minOccurs="0"/>
                </xs:all>
              </xs:complexType>
            </xs:element>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>
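Since each included file name is also a UID, the inclusion list of a transaction schema doubles as a dependency list. The sketch below shows one way such a list could be read out with Python's standard ElementTree module. The function name included_files is hypothetical, and the embedded schema is a shortened copy of the transaction schema from this slide with the element declarations left out.

```python
import xml.etree.ElementTree as ET

XS_NS = "http://www.w3.org/2001/XMLSchema"

# Shortened copy of the transaction schema: only the inclusion list is kept.
TRANSACTION_XSD = """
<xs:schema targetNamespace="http://www.dfas.mil/DFAS"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://www.dfas.mil/DFAS">
  <xs:include schemaLocation="DFAS.BudgetSubfunctionCodeType.2002-08-27.xsd"/>
  <xs:include schemaLocation="DFAS.CustodialNoncustodialCodeType.2002-08-27.xsd"/>
  <xs:include schemaLocation="DFAS.TradingPartnerCodeType.2002-08-27.xsd"/>
</xs:schema>
"""


def included_files(xsd_text):
    """List the schemaLocation of every top-level xs:include, in order."""
    root = ET.fromstring(xsd_text)
    return [
        include.get("schemaLocation")
        for include in root.findall(f"{{{XS_NS}}}include")
    ]


for filename in included_files(TRANSACTION_XSD):
    print(filename)
```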
19 Attributes versus Elements
- Problem: What are guidelines to help developers determine whether to use elements or attributes to store data?
- Solution: Keep the guidelines simple (rules of thumb) and avoid making official policy in this area
  - Key guideline: The number of attributes SHOULD be minimized
  - Some additional guidelines are
    - Attributes can be used to provide additional metadata required to better understand the business value of an element
    - Attributes can only be used to describe information units that cannot or will not be further extended or subdivided (elements are better suited for information that will be)
- Consequences
  - Pros
    - Assistance provided in an area of potential confusion for XML developers
  - Cons
    - Can potentially limit developer creativity
20 Reusable XML Schema Artifacts: Elements, Named Datatypes, Named Groups
- Problem: What are the guidelines to help developers determine whether to use elements, named datatypes, or named groups when creating XML Schema artifacts?
- Solution: Keep the guidelines simple (rules of thumb) and avoid making policy in this area
  - Guidelines follow on next 2 slides
- Consequences
  - Pros
    - Assistance provided in an area of potential confusion for XML developers
  - Cons
    - Can potentially limit developer creativity
21 Reusable XML Schema Artifacts: Guidelines
- If the hiding of the namespace of elements in instance documents is important, use named datatypes
  - Named datatypes may be instantiated in the form of either elements or attributes
  - Element versus attribute decision can be delayed
- If it is important to not have the container element show up in the instance document, use named groups
- When in doubt, make it a named datatype
  - Growing industry trend (X12, HR-XML, OASIS, etc.)
- Limit the number of successive derivations of a named datatype (by extension or restriction)
22 Reusable XML Schema Artifacts: Guidelines (continued)
- An alternative to named datatypes and named groups is elements that bind to anonymous types
  - The elements can then be referenced using the ref attribute of <xs:element>
  - This approach is increasingly against industry convention
  - Pros and cons discussion can be referenced at this website: http://www.xfront.org (Best Practices)
- Design Approaches (Roger Costello, Mitre, http://www.xfront.org)
  - Russian Doll, Salami Slice and Venetian Blind
- Simplicity via simpleType
  - The Embedded Metadata Section has largely eliminated the need for many attributes
    - For example: UID, version, DoDClassWord
  - However, the ability is lost to extract information from attributes using standard APIs such as SAX/DOM; RDF APIs can make up for that loss
  - For SelfContaineds, complexTypes are rarely called for (unless a depth of hierarchy is needed, which is usually the case with Assemblies)
- DFAS preferred choice: Declare named datatypes and bind elements to them as needed
23 Substitution Groups
- Problem: How do we implement generic structures that allow for interchangeable SelfContaineds and Assemblies?
- Solution: Substitution Groups
- Consequences
  - Pros
    - Can provide plug-n-play capability for reusable components
    - Support in some industry implementations
  - Cons
    - Not to be used for aliases
      - Wide use for this purpose is a key indicator that the business semantic issues are not being addressed and a technical workaround has been pursued
    - Inside <all> model groups, substitution groups can cause problems
      - Since maxOccurs=1 applies automatically inside <all> groups, if both the element and its substitute need to be used, e.g. DepartmentCode and TradingPartnerCode, a validation error will result
24 Namespaces
- Problem: What are the issues surrounding XML namespaces and what guidelines should be followed?
- Solution
  - Conflict resolution and collision of names are best handled by business adjudication and not technical workarounds
  - Guidelines regarding namespaces are on the following slides
- Consequences
  - Pros
    - Namespaces allow for collections of XML Schema components
    - Namespaces allow for disambiguation among XML Schema components
    - Namespaces allow for easy identification of collections of XML Schema components
  - Cons
    - Internally to the organization, proliferation of namespaces can encourage a stove-pipe mentality instead of collaborative development
    - Namespaces can be cryptic and difficult to understand
    - Namespaces are presently not handled consistently by XML parsers
25 Guidelines: Namespaces (targetNamespace)
- We encourage that all XML Schemas specific to an organization have the same targetNamespace
  - http://www.dfas.mil/DFAS is the namespace that is to be used for all DFAS XML Schema artifacts
- Situations where the targetNamespace of DFAS artifacts is not http://www.dfas.mil/DFAS
  - If the artifact is promoted to the Enterprise namespace in the DoD XML Registry
  - If the artifact is established as a general artifact (not specific to any particular domain)
26 Guidelines: Namespaces (default namespaces)
- Use of the default namespace is discouraged
  - If used, however, it is recommended that it be set to the targetNamespace
  - In general, a default namespace in XML Schemas can potentially cause problems when including schemas have a different default namespace than the included schemas; therefore, strong caution is advised
- The default namespace should not be set to the XML Schema specification's namespace (i.e. http://www.w3.org/2001/XMLSchema)
  - If this guideline is ignored, problems with collisions can arise if the including XSD's targetNamespace overrides the included XSD's default namespace (which is set to XML Schema's); the composite document will then not parse
  - The recommended prefix for the XML Schema namespace is xs, as in xmlns:xs="http://www.w3.org/2001/XMLSchema"
27 Guidelines: Namespaces (general)
- Declare all namespaces on the root element
- Do not use more than one prefix per namespace per XML document
- Inside instance documents
  - All elements should be qualified via a default or prefixed namespace
  - Default namespaces do not pose the threat inside instance documents that they do in schemas, so their use is left to the discretion of the developer
  - That namespace will be http://www.dfas.mil/DFAS to correlate with the targetNamespace of all DFAS transaction schemas
  - For example
    <FactsATB xmlns="http://www.dfas.mil/DFAS" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.dfas.mil/DFAS DFAS.FactsATB.2002-11-22.xsd">
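The "one prefix per namespace" guideline lends itself to an automated check. The sketch below is one possible approach using the start-ns events of Python's standard ElementTree iterparse. The function names and the two sample documents are made up for the illustration, and a default namespace declaration is reported as the empty prefix.

```python
import io
import xml.etree.ElementTree as ET
from collections import defaultdict


def prefixes_by_namespace(xml_text):
    """Map each namespace URI to the set of prefixes declared for it."""
    found = defaultdict(set)
    source = io.BytesIO(xml_text.encode("utf-8"))
    # "start-ns" events carry a (prefix, uri) pair per namespace declaration.
    for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        found[uri].add(prefix)
    return dict(found)


def namespaces_with_multiple_prefixes(xml_text):
    """Return only the namespaces that violate the one-prefix guideline."""
    return {
        uri: sorted(prefixes)
        for uri, prefixes in prefixes_by_namespace(xml_text).items()
        if len(prefixes) > 1
    }


# Hypothetical documents: the first follows the guideline, the second
# declares two prefixes (a and b) for the same namespace.
FOLLOWS_GUIDELINE = (
    '<FactsATB xmlns="http://www.dfas.mil/DFAS" '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"/>'
)
BREAKS_GUIDELINE = (
    '<a:FactsATB xmlns:a="http://www.dfas.mil/DFAS">'
    '<b:ATBAccountDetails xmlns:b="http://www.dfas.mil/DFAS"/>'
    "</a:FactsATB>"
)

print(namespaces_with_multiple_prefixes(FOLLOWS_GUIDELINE))
print(namespaces_with_multiple_prefixes(BREAKS_GUIDELINE))
```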
28 Configuration Management: Artifact Versioning
- Problem: How should artifacts be versioned?
- Solution
  - Version numbers of artifacts will be based on the creation (or last modification) date and will be in the form YYYY-MM-DD
  - For multiple releases in one day, a lowercase letter will be appended in alphabetical order (handles up to 27 releases on any given day)
    - For example: 1st release 2002-09-09, 2nd release 2002-09-09a, 3rd release 2002-09-09b, etc.
  - The version number will be placed both
    - inside the physical filename of the artifact (does not apply to CodeLists)
    - and inside the Embedded Metadata Section of the containing XSD in an element called <dc:date>
- Consequences
  - Pros
    - The date readily and simply identifies the version of the artifact
    - Scheme allows for each artifact to be versioned separately
  - Cons
    - Doesn't seem to be a commonplace method of versioning
    - It may be hard to keep track of the version of each different low-level element
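The date-plus-letter scheme can be written down as a small function. This is a sketch under the rules stated on this slide only: the name next_version and the idea of passing in the set of versions already released are assumptions made for the example. One unsuffixed release plus 26 lettered releases gives the 27 releases per day mentioned above.

```python
import string
from datetime import date


def next_version(release_date, existing_versions):
    """Return the next free version string for the given day.

    The first release of a day is the bare date (YYYY-MM-DD); later
    releases append a, b, c, ... in alphabetical order.
    """
    base = release_date.isoformat()
    candidates = [base] + [base + letter for letter in string.ascii_lowercase]
    for candidate in candidates:
        if candidate not in existing_versions:
            return candidate
    raise ValueError(f"all {len(candidates)} versions for {base} are taken")


released = set()
for _ in range(3):
    version = next_version(date(2002, 9, 9), released)
    released.add(version)
    print(version)
```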
29 Configuration Management: Instance Document and Schemas
- Problem: How should instance documents be associated with schemas?
- Solution
  - Instance documents are versioned by their schemas
    - The structure of the instance documents is versioned, not their content
  - One of the following options is chosen
    - (1) Validation option: a schema URL is placed in the instance document; the parser will invoke validation and linkage is documented
      - Examples
        <FactsATB xmlns="http://www.dfas.mil/DFAS" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.dfas.mil/DFAS DFAS.USSGLAccountType.2002-08-13.xsd">
        <FactsATB xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="DFAS.USSGLAccountType.2002-08-13.xsd">
    - (2) No validation option (but XSD available): an attribute called citation contains the filename and the location of the instance document's corresponding schema; validation will not be invoked by the parser, however, linkage is documented
      - Example
        <FactsATB citation="http://www.disa.mil/DFAS.USSGLAccountType.2002-08-13.xsd">
    - (3) Non XML Schema option: the version of the alternate schema method is cited
      - Example
        <FactsATB citation="2002-08-13">
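A consumer of instance documents could determine which of the three options was used by inspecting the root element's attributes. The sketch below uses Python's standard ElementTree module. The function name schema_reference and the returned labels are hypothetical. Options (2) and (3) both use the citation attribute, so this sketch does not try to tell them apart.

```python
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"


def schema_reference(instance_xml):
    """Return (kind, value) describing how the instance cites its schema."""
    root = ET.fromstring(instance_xml)

    # Option (1): xsi:schemaLocation holds whitespace-separated
    # "namespace location" pairs; report the first location.
    location = root.get(f"{{{XSI_NS}}}schemaLocation")
    if location is not None:
        parts = location.split()
        return ("validation", parts[1] if len(parts) >= 2 else None)

    # Option (1), no-namespace variant.
    no_namespace = root.get(f"{{{XSI_NS}}}noNamespaceSchemaLocation")
    if no_namespace is not None:
        return ("validation", no_namespace)

    # Options (2) and (3): linkage is documented, validation not invoked.
    citation = root.get("citation")
    if citation is not None:
        return ("citation", citation)

    return ("none", None)


VALIDATED = (
    '<FactsATB xmlns="http://www.dfas.mil/DFAS" '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:schemaLocation="http://www.dfas.mil/DFAS '
    'DFAS.USSGLAccountType.2002-08-13.xsd"/>'
)
CITED = '<FactsATB citation="2002-08-13"/>'

print(schema_reference(VALIDATED))
print(schema_reference(CITED))
```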
30 Configuration Management: Instance Document and Schemas (continued)
- Consequences
  - Pros
    - A creative approach to versioning instance documents: ties them to their respective schemas
  - Cons
    - Doesn't seem to be a commonplace method of versioning
31 Naming of Tags
- Problem: How should XML tags be named and what are the surrounding issues?
- Solution
  - Guidelines on the next 5 slides
- Consequences
  - Pros
    - Consistent approach to naming tags
  - Cons
    - Difficult to enforce guidelines
32 Guidelines: Naming of Tags
- All XML tag names should fully exploit the inherent hierarchical structure of XML, thus reducing redundancy of terms in the tag name and allowing for tag reuse. For example
    <USSGLDetails>
      <AccountNumberCode>1110</AccountNumberCode>
      <DebitCreditCode>D</DebitCreditCode>
      <CustodialNoncustodialCode>A</CustodialNoncustodialCode>
      <FederalNonfederalCode>N</FederalNonfederalCode>
      <Amount>21159598.29</Amount>
    </USSGLDetails>
- is preferred over
    <USSGL>
      <USSGLAccountNumberCode>1110</USSGLAccountNumberCode>
      <USSGLDebitCreditCode>D</USSGLDebitCreditCode>
      <USSGLCustodialNoncustodialCode>A</USSGLCustodialNoncustodialCode>
      <USSGLFederalNonfederalCode>N</USSGLFederalNonfederalCode>
      <USSGLAmount>21159598.29</USSGLAmount>
    </USSGL>
33 Guidelines: Naming of Tags (continued)
- All XML tag names should align with commonly used business terms, including
  - Registration of business acronyms prior to use in accordance with DFAS Extensible Markup Language Registration Policy, e.g. DFAS = Defense Finance and Accounting Service, DoD = Department of Defense
  - Use of abbreviations as registered, e.g. if "Dept" was registered as a short business term for "department", then <Dept/> is preferred over <Department/>
- The tag name shall be in singular form unless the word exists in plural form only, e.g. for singular <Account/>, not <Accounts/>; for plural <Scissors/>
34 Guidelines: Naming of Tags (continued)
- For collections of the same item, the tag name must end with "List", e.g. <USSGLAccountNumberList> for a generic listing of accounts. For example
    <USSGLAccountNumberList>
      <USSGLAccountNumber>1010</USSGLAccountNumber>
      <USSGLAccountNumber>1110</USSGLAccountNumber>
      <USSGLAccountNumber>1310</USSGLAccountNumber>
      <USSGLAccountNumber>1520</USSGLAccountNumber>
    </USSGLAccountNumberList>
- Exception: If the collection is properly named and has a specific, registered business meaning, e.g. United States Standard General Ledger Chart of Accounts, then use <USSGLChartOfAccounts/> instead of <AccountList/>.
35 Guidelines: Naming of Tags (continued)
- In order to enforce a consistent capitalization and naming convention across all newly created DFAS XML, "Upper Camel Case" (UCC) and "Lower Camel Case" (LCC) capitalization styles are preferred. UCC style capitalizes the first character of each word and compounds the name. LCC style capitalizes the first character of each word except the first word. To date, there exists no public standard for this convention. These rules do not apply to XML created at DFAS prior to the creation of this guideline, nor do they demand modification of externally created XML, such as industry consortia XML, for example, HR-XML, XBRL, etc.
  - It is preferred that XML element names use the UCC convention, for example <AnnualReport>
  - It is preferred that XML attribute names use the LCC convention, for example <AnnualReport fiscalYear="2001">
  - It is preferred that XML named datatypes use the UCC convention, for example <xs:complexType name="FiscalYearType">
  - It is preferred that XML named groups use the UCC convention, for example <xs:group name="FACTSAccountsGroup">
36 Guidelines: Naming of Tags (continued)
- Where acronyms are used, the capitalization shall remain for elements and attributes, for example <DFASGuidelines/>
  - Note that this is an exception to the previously discussed rule concerning word boundaries in UCC and LCC
- Underscores (_), periods (.) and dashes (-) should not be used for word boundaries
  - Don't use <Header.Manifest/>, <Stock_Quote_5/>, <Commercial-Transaction/>
  - Use <HeaderManifest/>, <StockQuote5/>, <CommercialTransaction/> instead
- Tag names should be concise but not at the expense of expressiveness.
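The capitalization and word-boundary guidelines are hard to enforce by hand, so a simple lint check may help. The sketch below is an approximation, not an official rule set: it only verifies the first character and the absence of separator characters, and it cannot tell where word boundaries fall or whether an acronym is registered. The function names are hypothetical.

```python
import re

# UCC: starts with an uppercase letter, then letters and digits only.
# Acronyms keep their capitalization, so DFASGuidelines is accepted.
_UPPER_CAMEL = re.compile(r"^[A-Z][A-Za-z0-9]*$")

# LCC: same, but the first character is lowercase.
_LOWER_CAMEL = re.compile(r"^[a-z][A-Za-z0-9]*$")


def is_upper_camel_case(name):
    """Check an element, named datatype, or named group name."""
    return _UPPER_CAMEL.match(name) is not None


def is_lower_camel_case(name):
    """Check an attribute name."""
    return _LOWER_CAMEL.match(name) is not None


for tag in ("HeaderManifest", "Header.Manifest", "Stock_Quote_5", "DFASGuidelines"):
    print(tag, is_upper_camel_case(tag))
print("fiscalYear", is_lower_camel_case("fiscalYear"))
```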
37 Naming of Datatypes and Groups
- Problem: How should datatypes and groups be named?
- Solution
  - The name of the datatype will end in "Type" (even if the business term ends in "Type")
    - For example: USSGLAccountType
  - The name of the group will end in "Group" (even if the business term ends in "Group")
    - For example: USSGLAccountGroup
- Consequences
  - Pros
    - Consistent approach to naming
    - Allows for disambiguation from other major XML Schema components such as elements, attributes, datatypes, groups, etc.
    - Leads to ease of recognition and identification
  - Cons
    - Difficult to enforce guidelines
38 Multi-field Approaches (XML Schemas)
- Problem: How should multi-fields be handled in XML Schemas?
- Solution
  - Short-term approach
    - Make use of the XML Schema enumeration mechanism to capture code lists
    - Use Dublin Core / RDF metadata for capturing relationships or mappings to other code list values
  - Long-term approach
    - Involves the use of a registry
- Consequences (of short-term approach)
  - Pros
    - Makes use of currently available technology
  - Cons
    - Can easily get out of date: configuration management issues
39 Multi-field Approaches (Instance Documents)
- Problem: How should multi-fields be handled in instance documents?
- Solution
  - Preferred approach
    <Organization>
      <IDNumber code="34" IDSource="DNB">10-495-9618</IDNumber>
    </Organization>
- Consequences
  - Pros
    - Makes use of currently available technology
  - Cons
    - Relationship is artificially tied between element, attributes, and content
40 Multi-field Approaches (Instance Documents) (continued)
- Alternative approaches
    <Organization>
      <DNB>10-495-9618</DNB>
    </Organization>
- or
    <Organization>
      <IDCode>34</IDCode>
      <IDSource>DNB</IDSource>
      <IDNumber>10-495-9618</IDNumber>
    </Organization>
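To show how the preferred attribute-based form and the all-elements alternative carry the same multi-field information, the sketch below reads both into one common shape with Python's standard ElementTree module. The function name read_identifier and the dictionary keys are made up for the illustration. The single-element <DNB> alternative is not handled because it does not carry the code value.

```python
import xml.etree.ElementTree as ET

# Preferred approach: code and source travel as attributes of IDNumber.
PREFERRED = (
    "<Organization>"
    '<IDNumber code="34" IDSource="DNB">10-495-9618</IDNumber>'
    "</Organization>"
)

# Alternative: every field is its own child element.
ALL_ELEMENTS = (
    "<Organization>"
    "<IDCode>34</IDCode>"
    "<IDSource>DNB</IDSource>"
    "<IDNumber>10-495-9618</IDNumber>"
    "</Organization>"
)


def read_identifier(organization_xml):
    """Return the code, source and number of an Organization identifier."""
    organization = ET.fromstring(organization_xml)
    id_number = organization.find("IDNumber")
    if id_number is None:
        raise ValueError("Organization has no IDNumber element")
    # Prefer the attributes; fall back to sibling elements if absent.
    code = id_number.get("code") or organization.findtext("IDCode")
    source = id_number.get("IDSource") or organization.findtext("IDSource")
    return {"code": code, "source": source, "number": id_number.text}


print(read_identifier(PREFERRED))
print(read_identifier(PREFERRED) == read_identifier(ALL_ELEMENTS))
```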
41 XML Schema Content Models
- Problem: What are the recommendations concerning XML Schema content models?
- Solution
  - Recommendations for content models
    - Mixed
      - No for data
      - Yes for documents
    - Any - Yes (future expansion usage)
      - Trading Partner data specific
    - Recursive - Use with caution
- Consequences
  - Pros
    - Mixed, Any, and Recursive content models allow for specification of numerous data structures
  - Cons
    - All 3 can potentially lead to data management nightmares; caution is advised, especially for recursive models
42 Default / Fixed Values
- Problem: What are the recommendations concerning default and fixed values for elements and attributes?
- Solution
  - Recommendations
    - Use as needed
- Consequences
  - Pros
    - Can be used for elements or attributes
    - Can potentially simplify the instance document by shifting burden to its schema
  - Cons
    - Fixed values can be likened to hard-coding data values, a practice unpopular in software engineering
43 DoD XML Registry: Suggested Refinements
- Creation of Named Datatypes and Named Groups categories
- Searchable aliases
- For ease of reuse, XSD format should be used for
  - elements
  - attributes
  - code lists / domain values
- Configuration Management
44 Thank you!
- mike.lubash_at_dfas.mil
- amalik_at_xmlcg.com
- kit_at_mitre.org
- peat_at_erols.com
- charlie.clark_at_dfas.mil