Title: Comparing RDBMS to XML
1. Comparing RDBMS to XML
- Understanding the Differences
- David Plotkin
- Data Quality Manager
- Wells Fargo Consumer Credit Group
- David.Plotkin_at_WellsFargo.com
2. What we will cover
- XML from a Data Practitioner's standpoint
- Relational databases and schemas
- A comparison of relational and XML architecture
- What each technology is good at and when to use it
- Relational metadata and XML metadata
- How to navigate a relational schema vs. an XML schema
- Gathering data (queries) in a database vs. an XML document
- Differences in flexibility between the two
- Cost considerations
- Data integrity considerations
- So... what do we do?
3. XML: A Data Practitioner's View (1)
- XML is a specification for designing tag-based languages.
- The specification allows for
- Metadata (XML Schemas) that define
- Valid data structures
- Defining of user data types
- Valid lists of values, ranges, and patterns
- Optionality/cardinality
- Elements and Attributes
- Reusability of data, data types, and schemas
- Creation of instance documents based on an XML Schema
4. XML: A Data Practitioner's View (2)
- The potential exists for
- Industries to agree on an XML-based language for data exchange.
- Exchange of XML instance documents between trading partners
- An entire industry has grown up around
- Providing XML tools (and repositories)
- Ongoing development of standards
- XML has gained very wide acceptance!
5. XML Content
- Here is an easy-to-understand sample of XML
[XML listing; the element tags were lost in extraction. Recoverable content: encoding "UTF-8"; namespace "http://www.w3.org/2001/XMLSchema-instance"; xsi:noNamespaceSchemaLocation "PatientSearchResponse.xsd"; attributes Source "CentralPatient", Target "Store042", MsgTypeCode "PSRS", MsgTypeDesc "PatientSearchResponse"; FoundFlag "true"; values Maria, Montes, 1951-11-05, F, 1969 Ygnacio Valley Road, Walnut Creek, CA, 94597, 109993345.]
6. The XML Document (1)
- An XML document is
- A self-contained set of structured (tagged) data
- Decoupled from the source systems
- Tagged data in the document does not change when the source system data changes
- The document represents a snapshot of the source system data at a particular moment in time.
- Packaged with the metadata (tags) for a specific business transaction
7. The XML Document (2)
- An XML Document has
- Semantic content
- Values (tagged data)
- Metadata (the tags)
- Order
- Hierarchy
- Structure
8. The XML Schema
- An XML document is usually (but not always) validated by an XML Schema.
- The XML Schema provides the information on whether the XML document followed the rules set up in the XML Schema.
- An XML Schema is an agreement between the sender and the receiver of a document as to the structure of that document.
9. Elements vs. Attributes
- Elements
- Basic building blocks of XML
- Contain content which can be a structure
- Attributes
- Specify additional information about an element.
- Contain only simple type content
- Some data could be either an Element or an
Attribute (so you need standards on how to decide
which to use).
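As a rough illustration of the distinction above, the following Python sketch builds a small document with the standard library's ElementTree. The root element name `PatientSearchRequest` and the grouping element `Address` are assumptions made for this example (the slides show only a schema file name); the attribute and child names follow the sample data used elsewhere in this deck.

```python
import xml.etree.ElementTree as ET

# Attributes carry simple values attached to an element.
# The root element name is hypothetical.
root = ET.Element(
    "PatientSearchRequest",
    {"Source": "Store599", "Target": "CentralPatient"},
)

# Elements carry content, and that content can itself be a structure.
ET.SubElement(root, "FirstName").text = "Maria"
address = ET.SubElement(root, "Address")  # hypothetical grouping element
ET.SubElement(address, "CityAddress").text = "Walnut Creek"
ET.SubElement(address, "StateCode").text = "CA"

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Note that `StateCode` could just as easily have been written as an attribute of `Address`; that is the kind of choice a naming and design standard needs to settle.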
10. Element and Attribute in XML Schema
- Element and Attribute declaration
[XML Schema listing; most markup was lost in extraction. Recoverable content: element declarations or references for FirstName, LastName, Phone, Birthdate, Gender, StreetAddress, StateCode, ZipCode, and SSN, each with minOccurs "0" (one further entry with minOccurs "0" is illegible); SafetyCapDate of type xsd:date; required attributes Target (type xsd:string), MsgTypeCode (type MsgTypeCodeType), and MsgTypeDesc (type xsd:string), plus one required xsd:string attribute whose name is illegible.]
11. Element and Attribute: XML Document
- Element and Attribute content
[XML listing; the element tags were lost in extraction. Recoverable content: xsi:noNamespaceSchemaLocation "PatientSearchRequest.xsd"; attributes Source "Store599", Target "CentralPatient", MsgTypeCode "PSRQ", MsgTypeDesc "PatientSearchRequest"; values Maria, Montes, 1951-11-05, F, 1969 Ygnacio Valley Road, Walnut Creek, CA, 94597 (ZipCode), 561-88-9208, 2001-05-22 (SafetyCapDate).]
12. Simple data types in an XML Schema
- Comes with atomic simple data types
- Integer, boolean, date, decimal, string, etc.
- You can build user-defined simple data types
- Built on the included atomic data types
- Allows declaration of
- Valid values, ranges, patterns, length, total digits
- And more
- Attributes or Elements can be of a simple data type (either atomic or user-defined).
13. Simple data type examples
[XML Schema listing; most markup was lost in extraction. Recoverable content: a simple type that restricts xsd:integer (builds on an atomic simple data type) with a facet value of "7"; a simple type that restricts xsd:string to the values "M" and "F"; RelationshipCodeType, which restricts xsd:string to "self", "spouse", "dependent", and "other"; and SevenPlacePositiveInteger, which builds on a custom simple data type and adds xsd:minInclusive value "0".]
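To make the effect of these restrictions concrete, here is a minimal Python sketch that imitates them with plain functions. It is not XML Schema validation (Python's standard library has none); it only mirrors the rules by hand. The function names are hypothetical, and reading the "7" facet as a total-digits limit is an assumption based on the type name SevenPlacePositiveInteger.

```python
import re

# Enumeration facets: only the listed values are valid.
GENDER_CODES = {"M", "F"}
RELATIONSHIP_CODES = {"self", "spouse", "dependent", "other"}


def is_valid_code(value: str, allowed: set) -> bool:
    """Imitate an enumeration restriction on a string type."""
    return value in allowed


def is_seven_place_positive_integer(text: str) -> bool:
    """Imitate an integer limited to 7 digits (assumed) and minInclusive 0."""
    if re.fullmatch(r"[+-]?[0-9]+", text) is None:
        return False  # not an integer literal at all
    value = int(text)
    # Leading zeros do not count toward the digit limit.
    return value >= 0 and len(str(value)) <= 7


print(is_valid_code("spouse", RELATIONSHIP_CODES))
print(is_seven_place_positive_integer("12345678"))
```

The second type builds on the first in the same way the slide describes: the digit limit comes from the base type, and the non-negative rule is layered on top.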
14. Complex data types in XML Schema
- Builds a structure of Elements.
- Each subelement is either a simple data type or another structure of Elements.
- Only Elements can be of a complex data type.
- Can be named and reusable or anonymous and used only by a single Element.
- Can be an extension or restriction of another complex type.
15. Complex data type examples
[XML Schema listing; most markup was lost in extraction. Recoverable content: the declaration of a named complex data type containing StreetAddress, CityAddress, and StateCode; the association of an Element with the named complex data type AddressType; AddressWithCountryType, a new complex data type that extends an existing complex data type with a sequence containing CountryCode (type xsd:string); and PatientInsurance, an element with an anonymous complex data type that references TPMembership with minOccurs "0" and maxOccurs "unbounded".]
16. Using the XML Schema
[Diagram: Source database -> Data -> Extract program -> XML Document -> Network -> Parse program -> Data -> Target database, with an XML Schema shown at both the extract end and the parse end.]
17. Reusing XML Schemas
- XML Schemas can build on each other to provide reusability.
[Diagram labels: Statecode.xsd, Base Definitions.xsd, Patient Search Request.xsd, Patient Search Response.xsd, Patient Update Request.xsd]
18. An XML Schema example
[XML Schema listing; most markup was lost in extraction. Recoverable content, in order: PatientID, FirstName, LastName, Birthdate, Gender, an illegible element of type xsd:boolean, StreetAddress, CityAddress, StateCode, ZipCode, two illegible entries with minOccurs "0", HIPAANotifInd (xsd:boolean), SafetyCapInd (xsd:boolean), SafetyCapDate (xsd:date), StatusCode, Doctor (minOccurs "0"), Phone (maxOccurs "unbounded"), PatDrugAllergy (minOccurs "0", maxOccurs "unbounded"), OtherDrugTaken (minOccurs "0", maxOccurs "unbounded"), PrivacyInd (xsd:boolean).]
19. The Structure of an XML Schema
- Elements in an XML Schema are hierarchical.
- To expand the hierarchy with this tool (Tibco's XML Authority), click here.
20. Expanding an Element
This is the result you get: you can now see the elements that make up the structure of the OtherDrugTaken element.
21. Managing XML Schemas
- Avoid chaos by managing XML metadata across the Enterprise
- Create reusable base definition schemas
- Create and document
- Widely used elements (with their attributes)
- Complex data types
- Simple data types
- Keep track of which schemas use other schemas
- Keep track of which documents are validated by which schemas (XML repository).
22. XML Architecture
This hierarchy is useful for starting with a patient and finding all the information about them, such as a list of their prescriptions and when the prescriptions were filled.
[Diagram nodes: Patient, Insurance, Prescription, Doctor, Drug, Fills, Doctor, Drug, Claim]
23. XML Hierarchy Revisited
- The hierarchy can change depending on what the XML document is used for.
- Each version of the hierarchy serves a different purpose.
- This version of the hierarchy is useful for starting with a Prescription and finding all the information about it, including the Patient and Drug.
[Diagram nodes: Prescription, Drug, Fills, Doctor, Patient, Drug, Claim]
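The idea that the same data can be arranged under different roots can be sketched in Python. The element names below echo the diagram nodes, but the exact nesting, the attribute names, and the values `RX1`, `RX2`, `ExampleDrug`, and `OtherDrug` are assumptions made for this example, since the original diagrams did not survive.

```python
import xml.etree.ElementTree as ET

# Patient-centric document: start from a patient, find the prescriptions.
patient_centric = ET.fromstring(
    "<Patient name='Maria Montes'>"
    "<Prescription id='RX1'><Drug>ExampleDrug</Drug></Prescription>"
    "<Prescription id='RX2'><Drug>OtherDrug</Drug></Prescription>"
    "</Patient>"
)


def to_prescription_centric(patient):
    """Rebuild the same data with Prescription above Patient."""
    root = ET.Element("Prescriptions")  # hypothetical wrapper element
    for rx in patient.findall("Prescription"):
        new_rx = ET.SubElement(root, "Prescription", {"id": rx.get("id")})
        ET.SubElement(new_rx, "Drug").text = rx.findtext("Drug")
        # The patient is now repeated under every prescription.
        ET.SubElement(new_rx, "Patient", {"name": patient.get("name")})
    return root


prescription_centric = to_prescription_centric(patient_centric)
print(ET.tostring(prescription_centric, encoding="unicode"))
```

The patient's name now appears once per prescription: a hierarchy chosen for one purpose tends to introduce redundancy that a normalized relational design would avoid.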
24. Database Architecture
- Database architecture is relational
- Normalized to eliminate data redundancy
- Join on any two columns that have the same data type.
- Foreign keys can enforce data integrity
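The last two points can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch: the table and column names are invented for the example, and SQLite stands in for whatever RDBMS is actually in use (note that SQLite enforces foreign keys only after the `PRAGMA` shown).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable enforcement
conn.execute("CREATE TABLE patient (patient_id INTEGER PRIMARY KEY, last_name TEXT)")
conn.execute(
    """
    CREATE TABLE prescription (
        rx_id INTEGER PRIMARY KEY,
        patient_id INTEGER NOT NULL REFERENCES patient(patient_id),
        drug_name TEXT
    )
    """
)
conn.execute("INSERT INTO patient VALUES (1, 'Montes')")
conn.execute("INSERT INTO prescription VALUES (100, 1, 'ExampleDrug')")

# Join the two tables on the columns they share.
rows = conn.execute(
    """
    SELECT p.last_name, r.drug_name
    FROM patient AS p
    JOIN prescription AS r ON r.patient_id = p.patient_id
    """
).fetchall()
print(rows)

# A prescription for a patient that does not exist is rejected.
try:
    conn.execute("INSERT INTO prescription VALUES (101, 999, 'OtherDrug')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```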
25. Relational Metadata: the Schema
- Relational metadata is stored in the database
- Database control tables fully define the structure of the database.
- Without the DBMS metadata the contents of the database are worthless.
- Completely self-contained (not reusable)
- Tables are structured; each column is a bucket for a specific kind of data
- In most databases, the metadata does not include descriptions, so a Data Dictionary is necessary.
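As one concrete case of metadata living inside the database, the sketch below reads SQLite's own catalog from Python. The `patient` table is invented for the example, and other DBMS products expose their control tables under different names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patient ("
    "patient_id INTEGER PRIMARY KEY, last_name TEXT NOT NULL, birthdate TEXT)"
)

# sqlite_master is SQLite's catalog of tables, indexes, and views.
tables = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]

# PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk) per column.
columns = [
    (row[1], row[2], bool(row[3]))
    for row in conn.execute("PRAGMA table_info(patient)")
]

print(tables)
for name, data_type, not_null in columns:
    print(name, data_type, not_null)
```

The catalog reports names, types, and constraints, but nothing about what `birthdate` means to the business, which is why a separate Data Dictionary is still needed.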
26. XML Metadata: the Document
- Metadata built into the document
- Every element has a tag to tell you where the data is stored in the document.
- Descriptive tags give structure to the document and tell you what the data means (sort of).
- "Sort of" because a tag gives only a name, which has meaning only to someone who already understands what the element or attribute means.
- Document cannot be parsed for storage on its own. What else is needed?
27. XML Metadata: the Schema
- An XML Schema (or DTD) is needed to
- Provide standardization (basis of agreement)
- Allow meaningful parsing and data storage
- Specify agreement on document structure
- A data dictionary is still necessary to provide definitions for Elements and Attributes
- Without an XML Schema, a document is essentially only good for transmitting blocks of data for humans to read.
28. Comparing XML to RDBMS Metadata
- An XML Schema establishes the valid structure of an XML document, just as a database schema establishes the valid structure of a database.
29. What are relational databases good at?
- Data Storage
- General purpose data storage and retrieval
- Used for many purposes, such as queries and analysis
- Generalized view of data for shared use
- Ideally shared across business units or the Enterprise
- Works well to store the contents of an XML document.
30. What is XML good at?
- Data Exchange
- Exchange of data in a document
- Usually designed for a specific communication
- Works well to move data between databases
- Important when source or target database is outside your firewall.
- You usually don't have direct access to such databases.
- Works well (with style sheets or XSLT) to display data on the web because browsers are inherently designed to display documents.
31. When do I use each technology?
- RDBMS
- Store data
- Query data
- Mine data
- Create generalized reports
- XML
- Transmit data
- Exchange data with outside agencies
- Replace flat files
- Create specific reports
- Which one?
- Use them together
- Each for its own purpose
- Build an infrastructure that
- Creates XML documents (using XML Schemas) from database contents.
- Parses XML documents to store their contents in a database
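The infrastructure described in the last two bullets can be sketched end to end with Python's standard library. Everything named here (`extract_to_xml`, `parse_into_db`, the `PatientList` wrapper element, the table layout) is hypothetical, and the sketch skips schema validation because the standard library does not provide it; a real pipeline would validate the document against the agreed XML Schema before loading it.

```python
import sqlite3
import xml.etree.ElementTree as ET

DDL = "CREATE TABLE patient (patient_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"


def extract_to_xml(conn):
    """Extract program: read rows from the source database into an XML document."""
    root = ET.Element("PatientList")
    query = "SELECT patient_id, first_name, last_name FROM patient ORDER BY patient_id"
    for patient_id, first_name, last_name in conn.execute(query):
        patient = ET.SubElement(root, "Patient")
        ET.SubElement(patient, "PatientID").text = str(patient_id)
        ET.SubElement(patient, "FirstName").text = first_name
        ET.SubElement(patient, "LastName").text = last_name
    return ET.tostring(root, encoding="unicode")


def parse_into_db(xml_text, conn):
    """Parse program: store the contents of the document in the target database."""
    for patient in ET.fromstring(xml_text).findall("Patient"):
        conn.execute(
            "INSERT INTO patient VALUES (?, ?, ?)",
            (
                int(patient.findtext("PatientID")),
                patient.findtext("FirstName"),
                patient.findtext("LastName"),
            ),
        )


source = sqlite3.connect(":memory:")
source.execute(DDL)
source.execute("INSERT INTO patient VALUES (10988453, 'Maria', 'Montes')")

document = extract_to_xml(source)  # this text is what would cross the network

target = sqlite3.connect(":memory:")
target.execute(DDL)
parse_into_db(document, target)
print(target.execute("SELECT * FROM patient").fetchall())
```

Each technology does its own job here: the databases store and query, and the XML document carries a snapshot between them.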
32. Relationships: RDBMS
- RDBMS
- Explicit, table to table: if a relationship is not declared by a foreign key, it doesn't exist.
33. Relationships: XML
[XML listing; the element tags were lost in extraction. Recoverable values, in document order: 10988453; Maria; Montes; 1951-11-05; F; false; 1969 Ygnacio Valley Road; Walnut Creek; CA; 94597; true; false; 2000-01-01; A; 45569009; 925 5556964; RS; 43325564; 2001-01-01; 432678945.]
- XML
- Implied: positioning
- Part-of relationship implied by positioning
- Parallel elements have same relationships as sibling elements
34. Querying XML vs. RDBMS
- Relational
- Easy to build queries by
- Navigating the database joins or
- Creating joins on any pair of tables that share matching columns
- XML
- Difficult to build queries because
- Must stick to the structure of the document
- Pointer-based navigation is restrictive
35. Navigating in XML for Queries
- Must follow the structure of the document.
- Example: DOM Node navigation
- getFirstChild
- getLastChild
- getPreviousSibling
- getNextSibling
- getParentNode
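For a feel of what this pointer-style navigation looks like, here is a small sketch using Python's `xml.dom.minidom`. The method names on the slide are the Java-style DOM accessors; minidom exposes the same links as the properties `firstChild`, `lastChild`, `previousSibling`, `nextSibling`, and `parentNode`. The sample document is invented for the example and is written without whitespace so that every child node is an element.

```python
from xml.dom import minidom

doc = minidom.parseString(
    "<Patient>"
    "<FirstName>Maria</FirstName>"
    "<LastName>Montes</LastName>"
    "<Gender>F</Gender>"
    "</Patient>"
)

patient = doc.documentElement
first = patient.firstChild   # <FirstName>
last = patient.lastChild     # <Gender>
middle = first.nextSibling   # <LastName>

# Every step follows a link from the current node; there is no way to
# jump straight to the data without walking the document structure.
print(first.tagName, middle.tagName, last.tagName)
print(middle.previousSibling.tagName, middle.parentNode.tagName)
print(middle.firstChild.data)  # the text node inside <LastName>
```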
36. Structure Flexibility: Relational vs. XML
- Relational: Rigid structure
- Lots of work to change the structure because of foreign keys, views, stored procs, and triggers
- Plus programs that assume the database looks a certain way!
- XML: Flexible structure
- Easy to change the structure of documents to account for new data, data type changes
- Everything is text so it is easy to implement
- Or is it?
37. XML Flexibility
- Changing the structure of an XML document is easy.
- Changing the XML Schema is easy too, but have you considered that you must
- Get agreement from business partners
- Change the programs that create documents from the database and parse documents for the database.
- Version your XML Schemas if you want to be able to open and parse historical documents (do you keep them?).
38. Data Integrity: Relational vs. XML
- Both provide ways to control data types, ranges, patterns, valid values, and optionality.
- But XML does not provide data validation via look-up as relational does with foreign keys.
- However, this lack does make the XML Schema a less rigid structure than a database schema.
39. Cost Considerations
- People always ask: which is cheaper to implement?
- XML is essentially all text, so it is easy to create, change and manage.
- XML doesn't need the infrastructure that an RDBMS does.
- XML tools are cheap ($99!)
- But does this cost difference really matter?
- The two technologies do different things.
- You can't use one as a replacement for the other.
- If you need it, you need it.
- So, XML is cheaper and it doesn't really matter.
40. So what do we do?
- Do you remember the previous slide?
- The two technologies do different things
- They complement each other
- RDBMS stores data in a generally usable way
- XML exchanges data in a specific way and is easy to display on the web.
- They both present challenges for metadata management.
- So... we use them both for the appropriate purpose.
41. Thank you!