Title: XML 101 Presentation
1XML 101 Presentation
2Objectives
- XML
- XML Basics
- Community Standards
- Core Components
- Registry and Repository
- XML Advanced
- CRC
- File Structure
- Processing Flows
3What Is XML?
- Extensible Markup Language
- XML is a metalanguage
- Allows trading partners to develop markup languages
- Open source
- Free
- Technology is Non-proprietary
- Supported by World Wide Web Consortium (W3C)
4Flat File Example
- 0405298927382 03 1979Sally ASmith
0203INDL2222222 FP 417
Halper Road Fort Wayne - IN46807Allen USA2197446947
sally.smith@veryspeedy.net 1221784902
5XML Example
- <Student>
- <Index>
- <SSN>298927382</SSN>
- <BirthDate>1967-08-13</BirthDate>
- </Index>
- <Name>
- <FirstName>SALLY</FirstName>
- <MiddleInitial>A</MiddleInitial>
- <LastName>SMITH</LastName>
- </Name>
- </Student>
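As a minimal sketch of how a program might read the record above, the following uses Python's standard-library `xml.etree.ElementTree`. The function name `read_student` and the dictionary keys are illustrative assumptions, not part of any Common Record tooling.

```python
import xml.etree.ElementTree as ET

# The Student record from the slide, indented to show its hierarchy.
STUDENT_XML = """
<Student>
  <Index>
    <SSN>298927382</SSN>
    <BirthDate>1967-08-13</BirthDate>
  </Index>
  <Name>
    <FirstName>SALLY</FirstName>
    <MiddleInitial>A</MiddleInitial>
    <LastName>SMITH</LastName>
  </Name>
</Student>
"""


def read_student(xml_text):
    """Return selected fields of a Student record as a dict.

    findtext() returns None for an element that is absent, so an
    omitted data element does not raise an error.
    """
    root = ET.fromstring(xml_text)
    return {
        "ssn": root.findtext("Index/SSN"),
        "birth_date": root.findtext("Index/BirthDate"),
        "first_name": root.findtext("Name/FirstName"),
        "middle_initial": root.findtext("Name/MiddleInitial"),
        "last_name": root.findtext("Name/LastName"),
    }


student = read_student(STUDENT_XML)
print(student["first_name"], student["last_name"])
```

Unlike the flat file, the fields are addressed by name and position in the hierarchy rather than by column offset.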
6XML Advantages
- XML is hierarchical in nature and can represent more complex relationships
- XML blocks can be repeated, making information sharing more flexible
- XML schemas define advanced relationships that are not possible in standard flat files
7XML Advantages
- Easy to understand and use
- Already has large base of users and support tools
- Web browsers understand XML
- Wide industry interest and support
8XML Advantages
- The entire document or portions of the record can be transmitted
- Data elements can be omitted
- Additional information can be added easily
- Schemas can reference other schemas
- Data files are machine and human readable
- You don't need to read it, but you could
9Why use XML for Educational records?
- Cost savings
- Off-the-shelf tools are coming to market
- Technology-neutral
- Joins different databases or systems
- Smaller institutions can adopt
10
<STUDENTID type="SSN"> 123456789 </STUDENTID>
<DEMOGRAPHIC>
  <BIRTH_DATE type="DATE"> 19740823 </BIRTH_DATE>
  <GENDER> M </GENDER>
</DEMOGRAPHIC>
<GRADE_REPORT>
  <SESSION Code="199901">
    <LABEL> SPRING SESSION </LABEL>
    <YEAR type="CCYY"> 1999 </YEAR>
    <COURSE index="1">
      <CREDIT type="hours"> 4 </CREDIT>
      <GRADE> A </GRADE>
      <CODE> SPN 406 </CODE>
      <COURSE_TITLE> SPANISH I </COURSE_TITLE>
    </COURSE>
    <COURSE index="2">
      <CREDIT type="hours"> 3 </CREDIT>
      <GRADE> B </GRADE>
      <CODE> HIS 302 </CODE>
      <COURSE_TITLE> TX HISTORY </COURSE_TITLE>
    </COURSE>
  </SESSION>
</GRADE_REPORT>
web.xsl
pda.xsl
edi.xsd
11XML Predefined Special Characters
- XML predefines five entity references
- &lt; the less-than sign (<)
- &amp; the ampersand (&)
- &gt; the greater-than sign (>)
- &quot; the straight double quotation mark (")
- &apos; the apostrophe, or straight single quotation mark (')
12XML Predefined Special Characters
- Example
- <AddressLine>203 O&apos;REILLY LANE</AddressLine>
- (O'Reilly Lane)
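In practice the predefined characters are usually escaped by a library rather than by hand. The sketch below uses Python's standard-library `xml.sax.saxutils`; the helper names `to_xml_text` and `from_xml_text` are hypothetical. By default `escape` handles only `&`, `<`, and `>`, so the two quotation-mark entities are passed in explicitly.

```python
from xml.sax.saxutils import escape, unescape

# escape()/unescape() cover &, < and > on their own; the quote
# characters are supplied as extra mappings.
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}
QUOTE_CHARS = {"&quot;": '"', "&apos;": "'"}


def to_xml_text(value):
    """Replace the five predefined characters with entity references."""
    return escape(value, QUOTE_ENTITIES)


def from_xml_text(value):
    """Turn the five predefined entity references back into characters."""
    return unescape(value, QUOTE_CHARS)


line = "<AddressLine>" + to_xml_text("203 O'REILLY LANE") + "</AddressLine>"
print(line)
```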
13XML Terms
- Field Types
- Complex Elements
- <Name>
- Named Complex Elements
- <PermanentAddress>
- Complex Elements with Attributes
- <Disbursement Number="1">
- Simple Elements
- <DisbursementAmount>
14XML Terms
- DTD: Master listing of all the elements, including where and how they need to be placed in the documents
- Schema: An XML application that can describe the allowed content of documents
- Validation: Process of checking the structural validity of a document
- Instance Document: A listing of all possible tags
- XML Example Document: A listing of tags with example data
- XSL, XSLT: Converts an XML document into another specified format
- Parser: Tool that reads the document and divides it into individual elements, attributes, and other pieces
- Namespace: Simple method for qualifying element and attribute names
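To illustrate the Parser and Namespace terms, here is a minimal Python sketch in which two elements share the local name `Name` but are kept apart by their namespaces. Both namespace URIs, the prefixes, and the element content are hypothetical and chosen only for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespaces: one for student data, one for award data.
DOC = """
<rec:Record xmlns:rec="urn:example:record" xmlns:aid="urn:example:aid">
  <rec:Name>SALLY SMITH</rec:Name>
  <aid:Name>Pell Grant</aid:Name>
</rec:Record>
"""

# Prefix-to-URI map used when searching; the prefixes here are local
# to this code and need not match those in the document.
NS = {"rec": "urn:example:record", "aid": "urn:example:aid"}

root = ET.fromstring(DOC)
student_name = root.findtext("rec:Name", namespaces=NS)
award_name = root.findtext("aid:Name", namespaces=NS)

# The parser stores each tag as {namespace-URI}local-name.
print(root.tag)
print(student_name, "/", award_name)
```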
15Community Standard Common Record
- ORIGINAL VISION: Use XML Technology to create financial aid data packet standards.
[Diagram labels: PELL, FFEL, DL, ISIR, Financial Aid Office]
16Community Standard Common Record
- REVISED VISION: Use XML Technology to create higher education or cross-industry data packet standards.
[Diagram labels: FirstName (shown four times), Educational Institution, Admissions, Registrars, Financial Aid]
17Common Record Expansion: Convergence of Standards
18Importance of Standards
- Standards facilitate faster, better, and cheaper (every year it becomes more expensive to upgrade systems, and every organization is expected to do more with less)
- Standards make training and cross-training easier
- Reduce change for the sake of change
19Common Record Initiatives
- Core Components Dictionary PESC
- Schemas
- Common Record
- Common Record: COD (August 2001)
- Common Record: CommonLine (July 2003)
- Common Record: ISIR (Final Draft published)
- Admissions/Registrar
- Academic Transcript (July 2004)
- K-12 Academic Transcript
- XML Framework
- Registry and Repository (Summer 2004)
20TIP
- CR - Common Record - Standard XML names (tags) and formats (schema) for exchanging data within Higher Education.
- COD - Common Origination and Disbursement - FSA process for originating and disbursing Pell Grants and Direct Loans using Common Record.
21Vision/Where are we going?
Members of the community have come together to
build an XML standard for higher education.
22Common Record FSA's XML Framework
- XML Strategic Assessment and Enterprise Vision
- XML Technical Reference and Usage Guidelines
- XML Core Component Dictionaries
- XML Registry and Repository
- XML Framework Communications Strategy
- XML ISIR Performance Test and SAIG Capacity Plan
23Core Components
- Contains all elements used in schemas
- Each element has a tag name, definition, minimum
and maximum length, data type, field type,
format, valid field values, and an indication of
which business processes use that element.
24Core Components
- Field Number - Lists the Common Record field number
- Tag Name - The assigned XML tag name - <DisbursementDate>
- Definition - The mutually agreed upon definition of the element - This element indicates the current scheduled or actual disbursement date for the disbursement.
25Core Components
- Minimum and maximum field lengths (if applicable) - some field and data types do not require lengths
- For example, Complex Elements do not require a minimum or a maximum length, and Simple Elements defined as Dates do not require a maximum length
26Core Components
- Data Types -
- Date
- Date/Time
- Integer
- Decimal
- String
- Boolean
27Core Components
- Field Types
- Complex Elements
- Named Complex Elements
- Complex Elements with Attributes
- Simple Elements
28Core Components
- Field Formats
- CCYY-MM-DD
- 999999999.99
- CCYY-MM-DDTHH:mm:ss.ff
29Core Components
- Valid Field Values
- Alphanumeric
- 0-999999999.99
- Word Values: Citizen, EligibleNonCitizen, NotEligible
- Code Values: 1 = U.S. citizen (or U.S. national), 2 = Eligible noncitizen, 3 = Not eligible
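The field formats and valid field values described for Core Components could be checked in code roughly as follows. This is a minimal Python sketch, not part of the Core Components Dictionary: the function names are hypothetical, the pairing of code values with word values is assumed from the order in which the slide lists them, and the limit of two decimal places is an assumption read from the 999999999.99 format.

```python
import re
from datetime import date

# Assumed pairing of the code values with the word values.
CITIZENSHIP_BY_CODE = {
    "1": "Citizen",
    "2": "EligibleNonCitizen",
    "3": "NotEligible",
}


def is_valid_date(text):
    """Return True when text is a real calendar date written as CCYY-MM-DD."""
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", text) is None:
        return False
    try:
        # date() rejects impossible dates such as February 30.
        date(int(text[0:4]), int(text[5:7]), int(text[8:10]))
    except ValueError:
        return False
    return True


def is_valid_amount(text):
    """Return True for 0 through 999999999.99 with at most two decimals."""
    return re.fullmatch(r"\d{1,9}(\.\d{1,2})?", text) is not None


def citizenship_word(code):
    """Translate a code value to its word value; raise ValueError if unknown."""
    try:
        return CITIZENSHIP_BY_CODE[code]
    except KeyError:
        raise ValueError("not a valid citizenship code: %r" % (code,))


print(is_valid_date("2004-07-15"), is_valid_amount("1500.00"), citizenship_word("2"))
```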
30Core Components
- Stakeholder Processes
- Common Origination and Disbursement
- Meteor
- Award Notification
- Loan Counseling
- CommonLine
- CAM
- ISIR
- Admissions/Registrars
- K through 12
31XML Registry and Repository - Login
32Landing Page
33Search
34Search Results
35Core Component View - Base
36Core Component View - Details
37Core Component View - Details
38Core Component View - Associations
39Knowledge Management - Info
40XML Usage at FSA
- Expanded use of XML by FSA
- More schools and software vendors are becoming COD Full Participants.
- EdExpress has incorporated XML support.
- The CPS ISIR will be implemented as an XML Schema.
41XML Usage in the Community
- Expanded use of XML by the Financial Aid Community
- The Common Record: CommonLine XML Schemas have been completed and are in the process of being implemented.
- The Postsecondary Academic Transcript XML Schema has been completed.
- Meteor, ELM, and Mapping Your Future are using and providing support for XML.
42XML Schema Design Best Practices
- Overview
- Russian Doll Design
- Salami Slice Design
- Venetian Blind Design
43XML Schema Design Best Practices: Overview
- While FSA and the Financial Aid Community have developed a number of XML Schemas that are in use today, individual schools and vendors may find a use for developing their own schemas for internal data exchange and processing.
- XML Schema Design Best Practices provides information on the three design patterns commonly used to create XML Schemas.
- Each design pattern has its own pros and cons and may be used depending on the situation.
- An understanding of these different design patterns will help schools and vendors provide feedback on future Schema development efforts by FSA and the Community.
44XML Schema Design Best Practices: Russian Doll Design
- The Russian Doll Design defines objects nested within each other.
- Elements created using this methodology will have Schemas that are very similar to the instance documents.
- Limits the reusability of Schema designs.
- Facilitates hiding namespaces.
- Can prevent namespace issues like name collisions.
45Example Russian Doll Design Schema Snippet
- <xsd:element name="Movie">
- <xsd:complexType>
- <xsd:sequence>
- <xsd:element name="Title" type="xsd:string"/>
- <xsd:element name="Director" type="xsd:string"/>
- <xsd:element name="Genre" type="xsd:string"/>
- <xsd:element name="ReleaseYear" type="xsd:gYear"/>
- </xsd:sequence>
- </xsd:complexType>
- </xsd:element>
46XML Schema Design Best Practices: Salami Slice Design
- The Salami Slice Design defines all objects as children of the schema's root element.
- Elements created using this methodology make object reuse very easy.
- Mapping between the Schema and an instance document will not be as straightforward.
- Automated validation of instance documents against Schemas is not affected.
- Allows the reuse of elements, so Schema designers must be cognizant of possible namespace issues like name collisions.
47Example Salami Slice Design Schema Snippet
- <xsd:element name="Movie">
- <xsd:complexType>
- <xsd:sequence>
- <xsd:element ref="Title"/>
- <xsd:element ref="Director"/>
- <xsd:element ref="Genre"/>
- <xsd:element ref="ReleaseYear"/>
- </xsd:sequence>
- </xsd:complexType>
- </xsd:element>
- <xsd:element name="Title" type="xsd:string"/>
- <xsd:element name="Director" type="xsd:string"/>
- <xsd:element name="Genre" type="xsd:string"/>
- <xsd:element name="ReleaseYear" type="xsd:gYear"/>
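Because a schema is itself an XML document, the way a Salami Slice schema maps onto an instance document can be traced by following each `ref` to its global declaration. The Python sketch below does this for the Movie snippet; the enclosing `xsd:schema` element is added here so the snippet parses on its own, and the function names are hypothetical. It inspects the schema only and does not validate instance documents.

```python
import xml.etree.ElementTree as ET

# ElementTree writes namespaced tags as {URI}local-name.
XSD = "{http://www.w3.org/2001/XMLSchema}"

SALAMI_SLICE = """
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Movie">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="Title"/>
        <xsd:element ref="Director"/>
        <xsd:element ref="Genre"/>
        <xsd:element ref="ReleaseYear"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="Title" type="xsd:string"/>
  <xsd:element name="Director" type="xsd:string"/>
  <xsd:element name="Genre" type="xsd:string"/>
  <xsd:element name="ReleaseYear" type="xsd:gYear"/>
</xsd:schema>
"""


def global_elements(schema_text):
    """Map each top-level element name to its type (None if anonymous)."""
    schema = ET.fromstring(schema_text)
    return {e.get("name"): e.get("type") for e in schema.findall(XSD + "element")}


def resolve_refs(schema_text, element_name):
    """List (child name, type) pairs for an element by following its refs."""
    schema = ET.fromstring(schema_text)
    declared = global_elements(schema_text)
    parent = next(
        e for e in schema.findall(XSD + "element") if e.get("name") == element_name
    )
    refs = [e.get("ref") for e in parent.iter(XSD + "element") if e.get("ref")]
    return [(name, declared[name]) for name in refs]


for child, type_name in resolve_refs(SALAMI_SLICE, "Movie"):
    print(child, type_name)
```

Every element is declared at the top level, which is what makes reuse easy and also what exposes each name to possible collisions.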
48XML Schema Design Best Practices: Venetian Blind Design
- The Venetian Blind Design leverages the design advantages of both the Russian Doll and Salami Slice Designs.
- Facilitates reuse while also hiding namespace complexities (by creating type definitions).
- Instead of actually creating elements and referencing them, a Schema designer would create a type and reference that when creating their elements.
49Example Venetian Blind Design Schema Snippet
- <xsd:simpleType name="TitleType">
- <xsd:restriction base="xsd:string"/>
- </xsd:simpleType>
- <xsd:simpleType name="DirectorType">
- <xsd:restriction base="xsd:string"/>
- </xsd:simpleType>
- <xsd:simpleType name="GenreType">
- <xsd:restriction base="xsd:string"/>
- </xsd:simpleType>
- <xsd:simpleType name="ReleaseYearType">
- <xsd:restriction base="xsd:gYear"/>
- </xsd:simpleType>
- <xsd:complexType name="MovieType">
- <xsd:sequence>
- <xsd:element name="Title" type="TitleType"/>
- <xsd:element name="Director" type="DirectorType"/>
- <xsd:element name="Genre" type="GenreType"/>
- <xsd:element name="ReleaseYear" type="ReleaseYearType"/>
- </xsd:sequence>
- </xsd:complexType>
50XML Supporting Technologies
- Development Tools
- XML Java Parsers
- XSLT
- XPath
51XML Supporting Technologies: Development Tools
- There are many categories of tools and many different tools available to support XML development. The following are representative Integrated Development Environment (IDE) tools:
- Sonic Stylus Studio provides support for authoring XQuery, XSLT stylesheets, XML schema, and related XML documents.
- Altova XMLSpy is an XML Development Environment that can be used for designing, editing, and implementing XML. It provides a graphical view of schemas and instance documents. In addition, XMLSpy provides an integrated XML instance document validator.
- Tibco TurboXML is an IDE for developing and managing XML assets. It provides support for creating, validating, converting, and managing XML schemas, XML files, and DTDs.
52XML Supporting Technologies: XML Java Parsers
- Parsers allow you to read in XML documents and provide access to the information stored in them.
- There are three general categories of parsers:
- All-in-Memory Parsers load the entire XML document into memory and provide a tree-like view of the document. As a result, the entire document must be processed before you can access any piece of the document at all.
- DOM
- Push Parsers hide the interaction with the actual document and push the tokens to the user through callback methods. The parser reads the XML stream, and when it encounters an element (or entity, etc.), it generates an event. It is up to the application to handle those events, usually via a callback or an event handler class.
- SAX
- Pull Parsers make the application developer responsible for the parsing loop, pulling elements (or entities, etc.) out of the XML stream explicitly.
- Currently, the two most widespread parser APIs are DOM and SAX.
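The pull category has no named example on the slide, so here is a minimal sketch of the idea. The slide concerns Java parsers; this sketch instead uses Python's standard-library `xml.dom.pulldom` to show the same pattern, in which the application owns the parsing loop. The `Award` wrapper element and the amounts are invented for illustration.

```python
from xml.dom import pulldom

# Hypothetical document; Disbursement, Number and DisbursementAmount
# are names used elsewhere in this presentation.
DOC = """
<Award>
  <Disbursement Number="1"><DisbursementAmount>1500.00</DisbursementAmount></Disbursement>
  <Disbursement Number="2"><DisbursementAmount>1250.50</DisbursementAmount></Disbursement>
</Award>
"""


def disbursement_amounts(xml_text):
    """Return {disbursement number: amount text} by pulling events in a loop."""
    amounts = {}
    events = pulldom.parseString(xml_text)
    for event, node in events:  # the application drives the loop
        if event == pulldom.START_ELEMENT and node.tagName == "Disbursement":
            # Pull the rest of this element into a small DOM subtree.
            events.expandNode(node)
            amount = node.getElementsByTagName("DisbursementAmount")[0]
            amounts[node.getAttribute("Number")] = amount.firstChild.data
    return amounts


print(disbursement_amounts(DOC))
```

Only the elements the application asks for are expanded, so the whole document never has to be held in memory at once.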
53XML Java Parsers: Document Object Model (DOM)
- The Document Object Model (DOM) API is associated with all-in-memory parsers.
- DOM can read in an XML stream, can optionally validate it against a schema or DTD, and when it's done parsing, it provides a hierarchical tree view of the document.
- Using DOM, developers can manipulate the document in any number of ways, including addition, modification, and removal of nodes and text content.
- A DOM instance can also be serialized to an XML stream.
- There is a great deal of overhead to using DOM.
- DOM stores everything, including attribute values, as a string. Thus, if you have Boolean or numerical attributes, DOM wastes space storing them as strings.
- DOM will usually be outperformed by a streaming parser, because most DOM implementations are now built on top of a streaming parser.
- The tree structure, whose creation and maintenance cause these performance issues, is also what makes DOM so useful in the first place.
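The read, manipulate, and serialize cycle described above can be sketched as follows. The slide concerns Java; this sketch uses Python's standard-library `xml.dom.minidom`, which follows the same W3C DOM style of API. The function name `add_middle_initial` is hypothetical.

```python
from xml.dom import minidom

DOC = (
    "<Student><Name>"
    "<FirstName>SALLY</FirstName><LastName>SMITH</LastName>"
    "</Name></Student>"
)


def add_middle_initial(xml_text, initial):
    """Parse into a tree, insert a MiddleInitial node, and serialize."""
    dom = minidom.parseString(xml_text)  # whole document is loaded
    name = dom.getElementsByTagName("Name")[0]
    last = name.getElementsByTagName("LastName")[0]

    # Build the new node and place it before LastName.
    middle = dom.createElement("MiddleInitial")
    middle.appendChild(dom.createTextNode(initial))
    name.insertBefore(middle, last)

    # Serialize the root element back to an XML string.
    return dom.documentElement.toxml()


print(add_middle_initial(DOC, "A"))
```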
54XML Java Parsers: Simple API for XML (SAX)
- The SAX API is associated with push parsers.
- SAX allows users to define a set of handler objects through which it notifies them when interesting events occur during the sequential parsing of a document.
- SAX reads XML content from a stream and then passes that content on to the application through the event handler interfaces. SAX implementations generally have very little overhead, which in turn usually leads to good parsing performance.
- If it is not implemented properly, the event-handling code for dealing with complex or deeply nested documents can become very convoluted and difficult to read and maintain.
- SAX doesn't define an object model of its own. Therefore, in most cases the developer will have to define their own data structures to store the data.
- Once the SAX implementation has commenced the parsing process, the only way to terminate the process is to throw an exception, which is less than ideal.
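The handler pattern, and the use of an exception to stop early, can be sketched as below. The slide concerns Java; this sketch uses Python's standard-library `xml.sax`, which follows the same callback model. The class names `StopParsing` and `FirstLastNameHandler` and the sample document are hypothetical.

```python
import xml.sax


class StopParsing(xml.sax.SAXException):
    """Raised by the handler to end parsing once the data is found."""


class FirstLastNameHandler(xml.sax.ContentHandler):
    """Collects the text of the first LastName element it sees."""

    def __init__(self):
        super().__init__()
        self.in_last_name = False
        self.chunks = []  # SAX has no object model, so we keep our own state
        self.last_name = None

    def startElement(self, name, attrs):
        if name == "LastName":
            self.in_last_name = True

    def characters(self, content):
        # Text may arrive in several pieces, so accumulate it.
        if self.in_last_name:
            self.chunks.append(content)

    def endElement(self, name):
        if name == "LastName":
            self.last_name = "".join(self.chunks)
            raise StopParsing("found the first LastName")


def first_last_name(xml_text):
    """Return the first LastName in the document, or None if there is none."""
    handler = FirstLastNameHandler()
    try:
        xml.sax.parseString(xml_text.encode("utf-8"), handler)
    except StopParsing:
        pass  # the exception is the only way to stop early
    return handler.last_name


DOC = (
    "<Students>"
    "<Student><LastName>SMITH</LastName></Student>"
    "<Student><LastName>JONES</LastName></Student>"
    "</Students>"
)
print(first_last_name(DOC))
```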
55XML Supporting Technologies: XSLT
- XSLT provides developers with a higher-level language to access and transform XML streams.
- The stylesheet provides rules for mapping input streams to output streams.
56XML Supporting Technologies: XPath
- XPath is a non-XML language used to identify parts of XML documents. It does this by viewing the hierarchical structure of an XML document as a tree of nodes and returning results based on the position of a node, its type, or its content.
- XSLT uses XPath.
- Example
- //Disbursement[@Number="1"]/DisbursementDate
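An expression of this kind, selecting the DisbursementDate of the Disbursement whose Number attribute is 1, can be tried with Python's standard-library `xml.etree.ElementTree`, which supports a limited subset of XPath. In this minimal sketch the `Award` wrapper element and the dates are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document built around the element names in the example.
DOC = """
<Award>
  <Disbursement Number="1"><DisbursementDate>2004-09-01</DisbursementDate></Disbursement>
  <Disbursement Number="2"><DisbursementDate>2005-01-15</DisbursementDate></Disbursement>
</Award>
"""

root = ET.fromstring(DOC)

# ElementTree paths are relative to the element searched, so the
# leading "//" of the full XPath expression is written ".//" here.
matches = root.findall(".//Disbursement[@Number='1']/DisbursementDate")
dates = [element.text for element in matches]
print(dates)
```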
57XML References
- Additional information on XML can be found in the following references:
- www.w3c.org
- www.ebxml.org
- www.oasis-open.org
- www.xml.com
- XML Journal <http://sys-con.com/xml/>
- www.xfront.com
- www.pesc.org
58Questions?
- We appreciate your feedback and comments. We can be reached at:
- Name: Holly A. Hyland
- Phone: 202.377.3710
- Email: Holly.Hyland@ed.gov
- Name: Tim Bornholtz
- Phone: 202.377.3465
- Email: Tim.Bornholtz@ed.gov