Understanding XML and Its Impact on the Enterprise - PowerPoint PPT Presentation

Provided by: Cla50

Transcript and Presenter's Notes

1
Understanding XML and Its Impact on the Enterprise
2
Outline
  • Why XML is the cornerstone of the Semantic Web
  • Why XML has achieved widespread adoption and
    continues to expand to new areas of information
    processing
  • How XML works and the mechanics of related
    standards like namespaces and XML schema
  • The impact of XML on the enterprise
  • Why XML itself is not enough

3
Introduction
  • Currently, primary use of XML is for data
    exchange between internal and external
    organizations (interoperability)
  • XML may become the primary syntax for all
    enterprise data as XQuery and XML Schema achieve
    greater maturity and adoption

4
Why Is XML a Success?
  • XML creates application-independent documents and
    data
  • It has a standard syntax for meta data
  • It has a standard structure for both document and
    data
  • XML is not a new technology (not a 1.0 release)

5
XML creates application-independent documents and
data
  • Plaintext in human-readable form
  • Binary formats lock into applications for the
    life of your data
  • Encoding XML as text allows any program to open
    and read the file
  • By using an open, standard syntax and verbose
    descriptions of the meaning of data, XML is
    readable and understandable by everyone, not
    just the application and person that produced it
  • Critical underpinning of the Semantic Web,
    because you cannot predict the variety of
    software agents and systems that will need to
    consume data on Web
  • XML can be searched as easily as Web pages

6
XML has a standard syntax for meta data
  • Meta data: data about data (the meaning of data
    values)
  • Data is the raw context-specific value and the
    meta data denotes the meaning or purpose of those
    values
  • XML standardizes a simple, text-based method for
    encoding meta data
  • XML provides a simple yet robust mechanism for
    encoding semantic information, or the meaning of
    data

7
Comparing Data to Meta Data
<Name>Joe Smith</Name>
<Address>222 Happy Lane</Address>
<City>Sierra Vista</City>
<State>AZ</State>
<Zip>85645</Zip>
8
XML has a standard structure for both documents
and data
  • XML standardizes a structure suitable to express
    semantic information for both documents and data
    fields
  • The structure XML uses is a hierarchy or tree
    structure
  • This allows the user to decompose a concept into
    its component parts in a recursive manner

9
person.xml
    <person>
      <name>
        <first_name>Alan</first_name>
        <last_name>Turing</last_name>
      </name>
      <profession>Computer Scientist</profession>
      <profession>Mathematician</profession>
      <profession>Cryptographer</profession>
    </person>
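
The hierarchy in person.xml can be made visible by walking the parsed document recursively. The sketch below is an illustration rather than part of the original slides; it assumes Python and its standard xml.etree.ElementTree module, and the helper name outline is invented for this example.

```python
import xml.etree.ElementTree as ET

PERSON_XML = """
<person>
  <name>
    <first_name>Alan</first_name>
    <last_name>Turing</last_name>
  </name>
  <profession>Computer Scientist</profession>
  <profession>Mathematician</profession>
  <profession>Cryptographer</profession>
</person>
"""

def outline(element, depth=0):
    """Return one indented line per element, visiting the tree depth-first."""
    text = (element.text or "").strip()
    label = element.tag + (": " + text if text else "")
    lines = ["  " * depth + label]
    for child in element:
        # Each child is itself a (sub)tree, so the same function applies.
        lines.extend(outline(child, depth + 1))
    return lines

root = ET.fromstring(PERSON_XML)
print("\n".join(outline(root)))
```

Each level of indentation in the printed outline corresponds to one level of nesting in the document, which is the recursive decomposition the previous slide describes.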

10
A Tree Diagram for person.xml
11
XML is not a new technology
  • SGML (Standard Generalized Markup Language)
  • Its precursor, GML, was invented in 1969; SGML
    became an ISO standard in 1986
  • XML is an abbreviated version of SGML
  • Omits the more complex and less-used parts of SGML
  • Easier to define new document types
  • Easier to write programs that handle XML documents
  • Better suited to delivery and interoperability
    over the WWW
  • XML is more "SGML--" than "HTML++"
  • XML is SGML for the Web

12
What is XML?
  • XML: eXtensible Markup Language
  • XML is NOT a set of tags that you can apply to
    documents
  • XML is a set of syntax rules for creating
    semantically rich markup languages in a
    particular domain (eXtensible)
  • XML does not define the tag (element) names: YOU
    DO
  • XML is NOT a programming language like C
  • XML is NOT a network transport protocol like
    HTTP, FTP
  • XML is NOT a database
  • A database can contain XML data, but the database
    itself is not an XML document
  • You can store XML data into a database or
    retrieve XML data from a database, but you need
    to run software written in a real programming
    language such as C and Java

13
Why Should Documents Be Well-Formed and Valid?
  • A well-formed XML document complies with all the
    W3C syntax rules of XML like naming, nesting, and
    attribute quoting
  • This guarantees that an XML processor can parse
    the document (break it into identifiable
    components) without error
  • A valid XML document references and satisfies a
    schema
  • A schema is a separate document whose purpose is
    to define the legal elements, attributes, and
    structure of an XML instance document. Ex. DTD,
    XML Schema
  • Think of a schema as defining the legal
    vocabulary, number, and placement of elements and
    attributes in your markup language
  • A schema defines a particular type or class of
    documents

14
Why Should Documents Be Well-Formed and Valid?
  • W3C-compliant XML processors check for
    well-formedness but may not check for validity
  • Validation is often a feature that can be turned
    on or off in an XML parser
  • Validation is time-consuming and not always
    necessary
  • It is generally best to perform validation either
    as part of document creation or immediately after
    creation
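
As a rough illustration of the well-formedness check, the sketch below feeds three small documents to a parser and reports which ones it accepts. It assumes Python's standard xml.etree.ElementTree module, which checks well-formedness only and does not validate against a schema; the function name is_well_formed and the sample strings are made up for this example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if an XML parser can parse the text without error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

good = "<book title='More Java Pitfalls'><pages>453</pages></book>"
bad_nesting = "<book><title>More Java Pitfalls</book></title>"  # tags overlap
bad_quoting = "<book title=More>text</book>"                    # unquoted attribute

for sample in (good, bad_nesting, bad_quoting):
    print(is_well_formed(sample), sample)
```

Validity is a separate question: a document that passes this check may still violate its schema, which is why validation is usually a distinct, optional step.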

15
XML NameSpace
  • http://www.w3.org/TR/REC-xml-names

16
Motivation
  • XML is extensible
  • But, extensibility does not come free
  • Extensibility must be managed to avoid conflicts
  • Namespaces are a solution to help manage XML
    extensibility
  • Example: two people extend the same document in
    incompatible ways
  • bookmark.xml
  • star_rating.xml
  • pa_rating.xml
  • star_pa_rating.xml

17
Motivation (Cont.)
  • Problems in star_pa_rating.xml
  • Software designed to operate with the PA rating
    would be lost
  • What should it do with a 4-star rating?
  • Ignore it?
  • There is no way to differentiate the PA rating
    from the star rating
  • Brute-force solution: use different names for the
    two ratings
  • qa_pa_rating.xml

18
Motivation (Cont.)
  • Documents that contain multiple markup (meta
    data) vocabularies pose problems of recognition
    and collision.
  • Software modules need to be able to recognize the
    tags and attributes which they are designed to
    process, even in the face of "collisions"
    occurring when markup intended for some other
    software package uses the same element type or
    attribute name.
  • These considerations require that document
    constructs should have universal names, whose
    scope extends beyond their containing document.
    This specification describes a mechanism, XML
    namespaces, which accomplishes this.

19
Declaration
  • <element xmlns:prefix="namespace_uri">
  • <title xmlns:dc="http://purl.org/dc">
  • Default namespace
  • <element xmlns="namespace_uri">
  • <title xmlns="http://purl.org/dc">
  • Example: namespace_bookmark.xml
  • <qa:rating>5 stars</qa:rating>
  • A prefix is added before each element name
  • A colon separates the prefix and the name
  • Elements in the default namespace need no prefix.

URI is unique!!!
20
Definition
  • An XML namespace is a collection of names,
    identified by a URI reference, which are used in
    XML documents as element types and attribute
    names.
  • URI references which identify namespaces are
    considered identical when they are exactly the
    same character-for-character.
  • Note that URI references which are not identical
    in this sense may in fact be functionally
    equivalent.
  • Examples include URI references which differ only
    in case, or which are in external entities which
    have different effective base URIs.

21
Names from XML NameSpaces
  • Names from XML namespaces may appear as qualified
    names, which contain a single colon, separating
    the name into a namespace prefix and a local
    part.
  • The prefix, which is mapped to a URI reference,
    selects a namespace.
  • The combination of the universally managed URI
    namespace and the document's own namespace
    produces identifiers that are universally unique.
  • Mechanisms are provided for prefix scoping and
    defaulting.
  • An attribute-based syntax described below is used
    to declare the association of the namespace
    prefix with a URI reference
  • Software which supports this namespace proposal
    must recognize and act on these declarations and
    prefixes.
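
To see how a prefix selects a namespace, the sketch below parses a document that mixes two rating vocabularies. It is an illustration under stated assumptions: it uses Python's standard xml.etree.ElementTree module, and the two http://example.org URIs and the sample rating values are hypothetical stand-ins, not taken from the example files named on the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs, used only to tell the two vocabularies apart.
DOC = """
<bookmark xmlns:star="http://example.org/star-rating"
          xmlns:pa="http://example.org/pa-rating">
  <star:rating>4 stars</star:rating>
  <pa:rating>PG-13</pa:rating>
</bookmark>
"""

root = ET.fromstring(DOC)

# ElementTree expands each qualified name to "{namespace-uri}local-part",
# so the two <rating> elements no longer collide.
tags = [child.tag for child in root]
print(tags)

# A consumer looks elements up by URI; the prefix it picks locally ("s")
# does not have to match the prefix used in the document ("star").
ns = {"s": "http://example.org/star-rating"}
star = root.find("s:rating", ns)
print(star.text)
```

Software written for one vocabulary can therefore pick out its own elements and ignore the rest, which is the recognition problem the Motivation slides raise.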

22
Namespaces and Schemas
  • Namespaces are not fully compatible with DTDs
  • The current markup definition languages, like XML
    Schema, fully support namespaces
  • <xsd:schema
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">

23
XML Schema
  • http://www.w3c.org/XML/Schema

24
XML Schema
  • A definition language that enables you to
    constrain conforming XML documents to a specific
    vocabulary and a specific hierarchical structure
  • Element types, attribute types, complex types
  • Two types of documents: a schema document and
    multiple instance documents that conform to the
    schema
  • A schema definition is a blueprint (template) of
    a type and each instance is an incarnation of
    that template
  • Two roles that a schema can play:
  • Template for a form generator to generate
    instances of a document type
  • Validator to ensure the accuracy of documents

25
Schema and Instances
26
XML Schema (Cont.)
  • Both the schema document and the instance
    document use XML syntax (tags, elements, and
    attributes)
  • Each instance document must declare which schema
    it adheres to
  • Use a special attribute attached to the root
    element, called "xsi:noNamespaceSchemaLocation"
    or "xsi:schemaLocation"
  • Which one depends on whether your vocabulary is
    defined in the context of a namespace
  • XML Schemas allow validation of instances to
    ensure the accuracy of field values and document
    structure at the time of creation
  • Field types, legal element and attribute names,
    correct number of children, and required
    attributes

27
What do Schemas Look Like?
  • An XML Schema uses XML syntax to declare a set of
    simple and complex type declarations
  • A type is a named template that can hold one or
    more values
  • Simple types hold one value
  • Complex types are composed of multiple simple
    types
  • A type has two key characteristics: a name and a
    legal set of values
  • Simple type: an element declaration that includes
    its name and value constraints
  • <xsd:element name="author" type="xsd:string" />
  • <author>Mike Daconta</author>

28
Common XML Schema Primitive Data Types
You can define custom data types
29
Define Custom Data Types
<xsd:simpleType name="skuType">
  <xsd:restriction base="xsd:string">
    <xsd:pattern value="\d{3}-[A-Z]{2}"/>
  </xsd:restriction>
</xsd:simpleType>

<xsd:simpleType name="stateType">
  <xsd:restriction base="xsd:string">
    <xsd:enumeration value="AK"/>
    <xsd:enumeration value="AL"/>
    <xsd:enumeration value="AR"/> ...
  </xsd:restriction>
</xsd:simpleType>

<xsd:simpleType name="poIdType">
  <xsd:restriction base="xsd:integer">
    <xsd:minExclusive value="10000"/>
    <xsd:maxExclusive value="100000"/>
  </xsd:restriction>
</xsd:simpleType>
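
The three restrictions above each constrain a value in a different way: by pattern, by enumeration, and by range. The sketch below mirrors them by hand to show what a validator would accept or reject. It is only an approximation written in plain Python (no schema processor is involved), and the function names are invented for this example.

```python
import re

# Hand-written stand-ins for the three XML Schema restrictions.
SKU_PATTERN = re.compile(r"\d{3}-[A-Z]{2}")
STATES = {"AK", "AL", "AR"}  # the slide elides the remaining values

def is_sku(value):
    """xsd:pattern must match the whole value, hence fullmatch."""
    return SKU_PATTERN.fullmatch(value) is not None

def is_state(value):
    """xsd:enumeration: the value must be one of the listed literals."""
    return value in STATES

def is_po_id(value):
    """xsd:minExclusive / xsd:maxExclusive: both bounds are excluded."""
    try:
        number = int(value)
    except ValueError:
        return False
    return 10000 < number < 100000

print(is_sku("926-AA"), is_sku("926-aa"))
print(is_state("AK"), is_state("ZZ"))
print(is_po_id("10000"), is_po_id("10001"))
```

Note that the exclusive bounds mean 10000 itself is rejected while 10001 is accepted.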
30
What do Schemas Look Like? (Cont.)
  • Complex type: an element that either contains
    other elements or has attached attributes
  • <xsd:element name="book">
      <xsd:complexType>
        <xsd:attribute name="title" type="xsd:string" />
        <xsd:attribute name="pages" type="xsd:int" />
      </xsd:complexType>
    </xsd:element>
  • <book title="More Java Pitfalls" pages="453" />

31
What do Schemas Look Like? (Cont.)
  • Another example of Complex type
  • <xsd:element name="product">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element name="description" type="xsd:string"
                       minOccurs="0" maxOccurs="1" />
          <xsd:element name="category" type="xsd:string"
                       minOccurs="1" maxOccurs="unbounded" />
        </xsd:sequence>
        <xsd:attribute name="id" type="xsd:ID" />
        <xsd:attribute name="title" type="xsd:string" />
        <xsd:attribute name="price" type="xsd:decimal" />
      </xsd:complexType>
    </xsd:element>

32
What do Schemas Look Like? (Cont.)
  • <product id="P01" title="Wonder Teddy" price="49.99">
      <description>The best selling teddy bear of the
        year.</description>
      <category>toys</category>
      <category>stuffed animals</category>
    </product>
  • <product id="P02" title="RC Racer" price="89.99">
      <category>toys</category>
      <category>electronic</category>
      <category>radio-controlled</category>
    </product>
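
The minOccurs and maxOccurs settings in the product schema translate into simple counting rules. The sketch below checks those rules by hand for a parsed product element. It is an illustration only: it uses Python's standard xml.etree.ElementTree module rather than a real schema validator, the function name check_product is invented, and the "P03" product is a hypothetical instance added to show a failure.

```python
import xml.etree.ElementTree as ET

def check_product(element):
    """Return a list of problems; an empty list means every check passed."""
    tags = [child.tag for child in element]
    problems = []
    if tags.count("description") > 1:       # minOccurs="0" maxOccurs="1"
        problems.append("description may appear at most once")
    if tags.count("category") < 1:          # minOccurs="1" maxOccurs="unbounded"
        problems.append("category must appear at least once")
    if "description" in tags and tags[0] != "description":
        # xsd:sequence fixes the order: description first, then categories.
        problems.append("description must come before any category")
    return problems

rc_racer = ET.fromstring(
    '<product id="P02" title="RC Racer" price="89.99">'
    "<category>toys</category>"
    "<category>electronic</category>"
    "<category>radio-controlled</category>"
    "</product>"
)
# Hypothetical invalid instance: no category element at all.
no_category = ET.fromstring('<product id="P03" title="Mystery Box" price="9.99"/>')

print(check_product(rc_racer))
print(check_product(no_category))
```

The RC Racer instance passes even though it has no description, because that element is optional; the hypothetical P03 instance fails because at least one category is required.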

33
What do Schemas Look Like? (Cont.)
  • Let's look at a more complex schema: po.xsd

34
Purchase Order Schema
35
Reusability
  • Basic reusability mechanisms address the problems
    of using existing assets in multiple places.
  • Element references
  • Content model groups
  • Attribute groups
  • Schema includes
  • Schema imports
  • Advanced reusability mechanisms address the
    problems of modifying existing assets to serve
    needs that are perhaps different from what they
    were originally designed for
  • Exploit object-oriented idea
  • Extension and Restrictions

36
Is Validation Worth the Trouble?
  • Validation, and the tool support for it, is still
    evolving
  • Until the schema languages mature, validation
    will be a frustrating process that requires
    testing with multiple tools
  • Validation is a critical component of your data
    management process, because
  • XML is intended to be shared and processed by a
    large number and variety of applications
  • A source document may be broken up into XML
    fragments and parts reused, so the cost of errors
    in XML must be multiplied across all the programs
    and partners that rely on that data
  • The chief difficulties with validation: data
    types, namespace support, and type inheritance

37
Document Object Model (DOM)
38
What is the DOM?
  • The DOM is a platform- and language-neutral data
    model and application programming interface (API)
    that will allow programs to dynamically
    manipulate the content, structure and style of
    XML and HTML documents
  • DOM is an object-oriented data model that uses
    objects to represent an XML or HTML document
  • Status of DOM
  • DOM Level 1: W3C Recommendation, 1 Oct. 1998.
  • DOM Level 2: W3C Recommendation, 13 Nov. 2000.
  • DOM Level 3: W3C Candidate Recommendation, 7 Nov.
    2003

39
The DOM structure model
  • DOM is a set of classes that allow you to
    create a tree of objects in memory that represent
    a manipulable version of an XML or HTML document
  • DOM presents documents as a hierarchy of Node
    objects that also implement other, more
    specialized interfaces
  • Everything in an XML document is a node object
  • Some types of nodes may have child nodes of
    various types, and others are leaf nodes that
    cannot have anything below them in the document
    structure.

40
Class and Objects
41
A DOM as A Tree of Nodes
42
A DOM as A Tree of Subclasses
43
The DOM Interface
  • The DOM has many interfaces to handle various
    node objects.
  • Every interface has its Attributes and
    Methods.
  • Compare with Object Oriented Programming (OOP).

44
The DOM Interface Hierarchy
  • Fundamental interfaces: DOMImplementation,
    NamedNodeMap, DOMException, NodeList, Node,
    Document, CharacterData, Comment, Attr, Text,
    Element
  • Extended interfaces: DocumentType, CDATASection,
    Notation, Entity, EntityReference,
    ProcessingInstruction
45
The Simple Hierarchy of An XML Document
[Diagram: a tree rooted at Document, with NodeLists of Nodes that are specialized as Element, Comment, Text, and Attr]
46
The Hierarchy of An XML Document
    <?xml version="1.0" encoding="big5"?>
    <MemberData>
      <UserName>claven</UserName>
      <RealName>???</RealName>
      <TELData>
        <TEL>03-5712121</TEL>
        <Ext>12345</Ext>
      </TELData>
      <Addr type="Office">?????????</Addr>
    </MemberData>

47
The Simple Hierarchy of an XML Document
Document
  Element (root: MemberData)
    Element (UserName)
    Element (RealName)
    Element (TELData)
      Element (TEL)
      Element (Ext)
    Element (Addr)
      Attr (type)
48
(No Transcript)
49
The Relation Graph
[Diagram relating three kinds of programs to the DOM and its output: Web client-side programs (e.g. JavaScript), Web server-side programs (e.g. ASP), and console programs (e.g. C, Java)]
50
An Example: The Most Frequently Used Interface, Node
  • Attributes
  • childNodes: returns the child nodes in a NodeList
  • nodeName: returns the name of the node
  • nodeValue: returns the value of the node
  • firstChild, lastChild, previousSibling,
    nextSibling, etc.
  • Methods
  • insertBefore, replaceChild, removeChild,
    appendChild, etc.
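
The Node attributes and methods listed above can be tried out with any DOM implementation. The sketch below is one such illustration, assuming Python's standard xml.dom.minidom module; it reuses the TELData fragment from the earlier MemberData example, and the Fax element and its number are hypothetical additions.

```python
from xml.dom.minidom import parseString

doc = parseString("<TELData><TEL>03-5712121</TEL><Ext>12345</Ext></TELData>")
root = doc.documentElement

# Attributes: nodeName, childNodes, firstChild, nodeValue
print(root.nodeName)
print([node.nodeName for node in root.childNodes])
tel = root.firstChild
print(tel.firstChild.nodeValue)  # the Text node under TEL holds the value

# Methods: appendChild, insertBefore, removeChild
fax = doc.createElement("Fax")   # hypothetical element, not in the slides
fax.appendChild(doc.createTextNode("03-0000000"))
root.insertBefore(fax, root.lastChild)  # Fax now sits between TEL and Ext
root.removeChild(tel)
print(root.toxml())
```

Note that an element's text is not stored on the element itself: it lives in a child Text node, which is why the value is read from tel.firstChild.nodeValue.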

51
DOM in Programming Languages
  • Actually, most programming languages support DOM.
  • Java, C++, C#, VB.NET, etc.
  • Almost all of these programming languages supply
    more convenient attributes and methods than the
    standard W3C DOM.

52
Impact of XML on Enterprise IT
  • Data exchange and interoperability
  • By agreeing on a standard schema, organizations
    can produce text documents that can be
    validated, transmitted, and parsed by any
    application regardless of hardware or operating
    system
  • The next Electronic Data Interchange (EDI)
  • Easy data exchange is the enabling technology
    behind ebusiness and Enterprise Application
    Integration
  • Ebusiness
  • B2B revolves around the exchange of business
    messages to conduct business transactions
  • Web services and Web service registries will
    increase the B2B trend by making it even easier
    to deploy such solutions

53
Impact of XML on Enterprise IT (Cont.)
  • Enterprise Application Integration (EAI)
  • EAI is the assembling of legacy applications,
    databases, and systems to work together to
    support integrated Web views, e-commerce, and ERP
  • Open Applications Group
    (http://www.openapplications.org) defines
    standards for application integration
  • EAI has proven to be the killer app for Web
    services
  • Enterprise IT architectures
  • Bridge between J2EE and .NET
  • XSLT, XML configuration files, XML-RDBMS mapping,
    native XML databases

54
Impact of XML on Enterprise IT (Cont.)
  • Content Management Systems (CMS)
  • CMS is a Web-based system to manage the
    production and distribution of content to
    intranet and Internet sites
  • XML separates raw content from its presentation →
    REUSE
  • Content can be transformed on the fly via XSLT to
    browsers or wireless clients
  • The ability to tailor content to user groups on
    the fly will continue to drive the use of XML for
    CMS systems
  • Knowledge management and e-learning
  • XML is driving the future of knowledge management
    in terms of knowledge representation (RDF),
    taxonomies, and ontologies
  • XML is fostering e-learning with standard formats
    like the Instructional Management System (IMS)
    XML standards (http://www.imsproject.org)

55
Impact of XML on Enterprise IT (Cont.)
  • Portals and data integration
  • A portal is a customizable, multipaned view
    tailored to support a specific community of users
  • XML is supported via standard transformation
    portlets that use XSLT to generate specific
    presentations of content, syndication of content,
    and the integration of Web services
  • A portlet is a dynamically pluggable application
    that generates content for one pane in a portal
  • Syndication is the reuse of content from another
    site
  • The most popular format for syndication is an
    XML-based format called the Resource Description
    Framework Site Summary (RSS)

56
Impact of XML on Enterprise IT (Cont.)
  • Customer relationship management (CRM)
  • CRM systems enable an organization's sales and
    marketing staffs to understand, track, inform,
    and service their customers
  • CRM involves portals, CMS, data integration, and
    databases
  • XML is becoming the glue to tie all these systems
    together to enable the sales force or customers
    (directly) to access information when they want
    and wherever they are
  • Databases and data mining
  • All the major DB vendors support XML translation
    between relational tables and XML schemas
  • XML as a native data type
  • Native XML databases for the storage and
    retrieval of XML
  • XQuery

57
Impact of XML on Enterprise IT (Cont.)
  • Collaboration technologies and peer-to-peer (P2P)
  • Collaboration technologies allow individuals to
    interact and participate in joint activities from
    disparate locations over networks
  • P2P is a specific decentralized collaboration
    protocol
  • XML is being used for collaboration at the
    protocol level, for supporting interoperable
    tools, configuring the collaboration experience,
    and capturing shared content
  • Open source JXTA project (http://www.jxta.org)

58
Why Meta Data Is Not Enough
  • XML meta data is a form of description
  • XML describes the purpose or meaning of raw data
    values via a text format to more easily enable
    exchange, interoperability, and application
    independence
  • Meta data increases the fidelity and granularity
    of our data
  • The current state of meta data is that we attach
    words (or labels) to our data values to describe
    it
  • How about sentences and paragraphs?
  • The motivation for providing richer data
    description is to move data processing from being
    tediously preplanned and mechanistic to dynamic,
    just-in-time, and adaptive

59
Why Meta Data Is Not Enough (Cont.)
  • Scenario
  • The more computers understand, the more
    effectively they can handle complex tasks
  • We have not yet invented all the ways a
    semantically aware computing system can drive new
    business and decrease your operation costs
  • But to get there, we must push beyond simple meta
    data modeling to knowledge modeling and standard
    knowledge processing
  • Simple meta data → semantic levels → rule
    languages → inference engines

60
Evolution in Data Fidelity
61
Semantic Levels
  • Evolution of data fidelity required for
    semantically aware applications
  • Level 1 (Things): XML Schema
  • Describe singular concepts or objects
  • Capture and process meta data about isolated data
    classes
  • Level 2 (Knowledge about Things): RDF and
    taxonomies
  • Enable you to model statements both about
    relationships between Level 1 objects and about
    how those objects operate
  • Level 3 (Worlds): ontologies
  • High-fidelity, closed-world models allow you to
    know your customer better, respond faster,
    rapidly set up new business partners, improve
    efficiencies, and reduce operation costs

62
Rules and Logic
  • The semantic levels of information provide the
    input for software systems
  • The operations that a software system uses to
    manipulate the semantic information will be
    standardized into one or more rule languages
  • A rule specifies an action if certain conditions
    are met
  • If (x) then y

63
Inference Engines
  • Applying rules and logic to our semantic data
    requires standard, embeddable inference engines
  • These programs will execute a set of rules on a
    specific instance of data using an ontology
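
To make the "If (x) then y" idea concrete, the sketch below applies a handful of rules to a set of facts until nothing new can be derived, a technique usually called forward chaining. It is a toy illustration in plain Python, not a standard rule language or inference engine, and every fact and rule name in it is hypothetical.

```python
def forward_chain(facts, rules):
    """Apply "if conditions then conclusion" rules until nothing new is derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            # A rule fires when all of its conditions are already known.
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known

# Hypothetical rules: (set of conditions, conclusion)
RULES = [
    (frozenset({"is_teddy_bear"}), "is_stuffed_animal"),
    (frozenset({"is_stuffed_animal"}), "is_toy"),
    (frozenset({"is_toy", "price_under_50"}), "eligible_for_gift_promo"),
]

facts = {"is_teddy_bear", "price_under_50"}
derived = forward_chain(facts, RULES)
print(sorted(derived - facts))
```

The first two rules play the role of a small taxonomy, and the third combines a derived fact with a given one, which hints at why richer semantic levels give an engine more to work with.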

64
Why Meta Data Is Not Enough (Cont.)
  • So, meta data is a starting point for semantic
    representation and processing
  • The rise of meta data is related to the ability
    to reuse meta data between organizations and
    systems
  • XML provides the best universal syntax to do that