Title: The Case for Conceptual Modeling for XML
1. The Case for Conceptual Modeling for XML
- Arijit Sengupta
- Information Systems and Operations Management
- Raj Soin College of Business
Joint work with E. Wilde, ETH Zurich
2. Agenda
- Introduction
- Motivation for Conceptual Modeling
- Existing Models
- Tools for Schema Design
- XER: an Extensible ER Model for XML
- Conclusions
3. Introduction
- Three phases of database design
  - Conceptual design
    - Developing an abstract idea of the data objects and their interactions
  - Logical design
    - Mapping from the conceptual design to the data structure appropriate for the underlying database model (e.g., relational, OO, OR)
  - Physical design
    - Laying out the data on the disk with access structures and indexes
4. Benefits of Conceptual Design
- Projects without a strong conceptual design are more likely to fail
- Design is one of the most important aspects of project and business process quality management standards
  - ISO 9000
  - CMM
- Literature on the relational model shows benefits of conceptual design for user performance
5. Why Conceptual Design for XML?
- Designs are typically network structured, whereas the XML representation is hierarchical
- Directly creating XML structures in schema languages is difficult and error-prone
- Conceptual models are not tied to a logical structure, so they can be used to create potentially better logical designs
- Improves presentation capabilities
6. What exactly are we modeling?
- Structure of XML documents

    <?xml version="1.0"?>
    <PAPER name="XER-JAIS05">
      <TITLE>XER - Extensible ER</TITLE>
      <SECTION title="Intro">
        <PARA>This is a <EMPH>logical</EMPH>
          model of <CITE label="XML">XML</CITE>.
        </PARA>
      </SECTION>
      <SECTION title="Conclusion">
        <PARA>Does sound very logical.</PARA>
      </SECTION>
      <BIBLIOGRAPHY>
        <BIBITEM label="XML">XML, 2004.
          W3C Recommendation</BIBITEM>
      </BIBLIOGRAPHY>
    </PAPER>
7. XML Structures with a DTD

    <!ELEMENT PAPER (TITLE, SECTION*, BIBLIOGRAPHY?)>
    <!ATTLIST PAPER name ID #REQUIRED>
    <!ELEMENT TITLE (#PCDATA)>
    <!ELEMENT SECTION (#PCDATA | CITE | PARA | EMPH)*>
    <!ATTLIST SECTION title CDATA #REQUIRED>
    <!ELEMENT BIBLIOGRAPHY (BIBITEM+)>
    <!ELEMENT BIBITEM (#PCDATA)>
    <!ATTLIST BIBITEM label ID #IMPLIED>
    <!ELEMENT CITE EMPTY>
    <!ATTLIST CITE label IDREF #REQUIRED>
    <!ELEMENT EMPH (#PCDATA)>
    <!ELEMENT PARA (#PCDATA)>
8. XML Structures using XML Schema

    <?xml version="1.0"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="PAPER">
        <xs:complexType>
          <xs:sequence>
            <xs:element ref="TITLE" />
            <xs:sequence minOccurs="0" maxOccurs="unbounded">
              <xs:element ref="SECTION" />
            </xs:sequence>
            <xs:sequence minOccurs="0">
              <xs:element ref="BIBLIOGRAPHY" />
            </xs:sequence>
          </xs:sequence>
          <xs:attribute name="name" type="xs:ID" use="required"/>
        </xs:complexType>
      </xs:element>
9. An XML Schema - continued

      <xs:element name="TITLE" type="xs:string" />
      <xs:element name="SECTION">
        <xs:complexType mixed="true">
          <xs:choice minOccurs="0" maxOccurs="unbounded">
            <xs:element ref="CITE" />
            <xs:element ref="PARA" />
            <xs:element ref="EMPH" />
          </xs:choice>
          <xs:attribute name="title" type="xs:string" use="required" />
        </xs:complexType>
      </xs:element>
      <xs:element name="BIBLIOGRAPHY">
        <xs:complexType>
          <xs:sequence>
            <xs:element maxOccurs="unbounded" ref="BIBITEM" />
          </xs:sequence>
        </xs:complexType>
      </xs:element>
10. And more

      <xs:element name="BIBITEM">
        <xs:complexType>
          <xs:simpleContent>
            <xs:extension base="xs:string">
              <xs:attribute name="label" type="xs:ID" />
            </xs:extension>
          </xs:simpleContent>
        </xs:complexType>
      </xs:element>
      <xs:element name="CITE">
        <xs:complexType>
          <xs:attribute name="label" type="xs:IDREF" use="required" />
        </xs:complexType>
      </xs:element>
      <xs:element name="EMPH" type="xs:string" />
      <xs:element name="PARA" type="xs:string" />
    </xs:schema>
11. Motivation: Conceptualizing XML Schema
- Highly textual content
- More syntactic verbiage than semantic
- Highly tied to Schema structure, including case sensitivity (good for ensuring structural stability, but bad for design purposes!)
- Based more on XML syntax than on concepts about the structure
12. Conceptual vs. Physical Models
- For the relational setting, it is well established that conceptual models improve user performance over textual models
  - Reduce number of errors (Batra, 1990)
  - Improve query performance (Batra, 1993)
  - Generalized theory for improved performance with higher level models (Chan, 1997)
- Can we extend this for XML?
13. A conceptual model
[Figure: conceptual model diagram. Labels: SECTION, CITE, PARA, EMPH, @title; relationships "has", "cites", "contains"; cardinalities (1,1), (0,1), (0,M), (1,M).]
14. XML Modeling Challenges
- XML is not relational! (Robie, XML03)
- XML objects are inherently ordered
- XML structures can be complex, including groups of repeating and alternating structures
- Heterogeneity: the same element can have different structures in different contexts
- Mixed content: an element can have both data and structure
- XML does not have a direct way of supporting many-many relationships
- XML-related concepts such as namespaces, pointers, links
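The ordering and mixed-content challenges can be seen directly on the PARA element of the sample paper. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the helper name `mixed_content_parts` is an illustrative assumption and not part of XER.

```python
import xml.etree.ElementTree as ET

# PARA from the sample paper: character data interleaved with child elements.
PARA_XML = (
    '<PARA>This is a <EMPH>logical</EMPH> '
    'model of <CITE label="XML">XML</CITE>.</PARA>'
)

def mixed_content_parts(elem):
    """Return the content of elem as an ordered list of text chunks and child tags.

    Text chunks appear as ("text", value); children as ("element", tag).
    """
    parts = []
    if elem.text:
        parts.append(("text", elem.text))
    for child in elem:
        parts.append(("element", child.tag))
        # ElementTree stores the text that follows a child on that child, as its tail.
        if child.tail:
            parts.append(("text", child.tail))
    return parts

para = ET.fromstring(PARA_XML)
for kind, value in mixed_content_parts(para):
    print(kind, repr(value))
```

The order of the list matters: swapping two entries yields a different document, which is the kind of information a conceptual model for XML has to be able to capture.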
15. Conceptualization of Information
- Chen's levels of information representation
  - Concepts of objects and their associations in human minds
  - Structures of such information represented by data, not necessarily in a specific form
  - Structures with a distinct form of data without any specific access path
  - Structures of data with access-path dependency
16. XML modeling requirements
- Functional requirements (have to have)
  - Structure and access-path independence
  - Reflection of users' mental model
  - Modeling of constraints
  - Formal basis
- Non-functional requirements (optional)
  - Graphical representation
  - Textual nuances (mixed/open/reusable components)
  - XML-specific characteristics (namespaces, parameter entities, references/links)
17. State of the Art in XML Models
[Side scale: complexity, from less (top) to more (bottom)]
- ER-based models
  - Start with ER, extend to handle XML complexity
  - Psaila '00, '03
- UML-based models
  - Start with UML, simplify to match XML
  - Conrad '00, Routledge '02
- Hierarchical models
- Others
  - XGrammar (Mani '01)
  - Semantic Network (Feng '02)
18. ERX (Psaila 2000, 2003)
- Entity Relationship for XML
- Based on the ER model, with several modifications
- Annotations on attributes for required (R), optional/implied (I), as well as ordering (O/U)
- Allows relationships, arities, generalization/specialization
- Concept of interface for the purpose of integrating a structure with a style
- No support for involved XML characteristics like mixed content
19. An ERX Diagram
20. Conrad (2000) and UML
- Representing XML structure using UML
- Logical representation, since most XML structural variations can be handled by UML
- Elements represented as objects, attributes as variables
- Conrad's UML is restricted to DTDs; later work by Routledge (2002) incorporates Schema capabilities
- Added syntactic sugar to UML to represent concepts like primary keys, links, references
- Links between objects annotated with the type of XML content model (sequence/choice)
21. A (partial) Conrad UML
22. Modeling with Semantic Networks
- Feng, Chang & Dillon (2002)
  - Set of atomic and complex nodes representing real-world objects
  - Set of directed edges representing the semantic relationships between objects
  - Set of labels denoting the different types of semantic relationships, such as aggregation, generalization, etc.
  - Set of constraints defined over nodes and edges to constrain these relationships
23. A sample Semantic Net for Papers
24. Some interesting observations
- All modeling techniques are networks and not hierarchies
- Most do not assume that an eventual hierarchy is being represented
- Other observations?
25. Tools for Creating Schema
- There isn't really a conceptual modeling tool for XML
- There are, however, graphical interfaces for creating schema
- Typically based on either a relational structure, tree structure, or a hybrid structure
- Generation of schema from the graphical representation (typically not stored internally)
26. Schema Designer in Visual Studio
http://www.microsoft.com/vstudio
27. An XML Schema Tree in XML Authority
http://www.tibco.com
28. An XML HyperModel
http://www.xmlmodeling.com/
29. XER: eXtensible Entity Relationship
- I pronounce it as "Cher"; some like "sure"
- Based on the well-accepted ER model
- A model that is faithful to the standard
- Includes visual constructs for all basic XML structural nuances
- Includes advanced schema concepts such as data types, participation constraints
- Intended to be a starting point for an XML design: you should start with XER
30. XER and XML: Conflicting terminology
- An XER entity is different from an XML entity (e.g., &lt;)
- An XER attribute is different from an XML attribute (e.g., <img src="a.jpg"/>)
- The rest of the presentation will use XER terminology; we will prefix with XML if we need to refer to the XML terms (e.g., XML attribute)
31. Element Normal Form
- Canonical view of XML without using XML attributes
- XML attributes are represented as elements prefixed with a special symbol (e.g., @)
- Special considerations need to be made for ID and IDREF attributes
- With no loss of generality, ENF allows us to work with XML with only element content
- We can prove that XML and ENF XML are equivalent
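The attribute-to-element rewriting behind ENF can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the function name `to_enf` is hypothetical, "@" is used as the special prefix, and ID/IDREF attributes receive no special handling here even though the slide notes they need it.

```python
import xml.etree.ElementTree as ET

def to_enf(elem):
    """Return a copy of elem in which every XML attribute becomes a child element.

    Each attribute name="value" is emitted as a leading child tagged "@name".
    "@name" is not a legal XML name, so the result is a conceptual tree only
    and is not meant to be serialized back to XML.
    """
    out = ET.Element(elem.tag)
    out.text = elem.text
    out.tail = elem.tail
    for name, value in elem.attrib.items():
        attr_child = ET.SubElement(out, "@" + name)
        attr_child.text = value
    for child in elem:
        out.append(to_enf(child))
    return out

section = ET.fromstring('<SECTION title="Intro"><PARA>Some text.</PARA></SECTION>')
enf = to_enf(section)
print([child.tag for child in enf])   # ['@title', 'PARA']
```

After the rewrite the tree carries only element content, so a modeling notation needs a single construct for both kinds of component.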
32. XER Constructs
- Entity
  - Basic XER object: ordered by default, unordered and mixed representations allowed
- Attribute
  - Components of an entity; may display data types, repetition constraints
- Relationship
  - Provides connections between entities; shows participation constraints
- Generalization
  - ISA relationships; models variations of structures
- Other constructs
33. XER Entities
- Represented as rectangles, with the name of the entity on top
- Contains a list of XER attributes
- Always ordered by default (i.e., a sequence)
- IDs are underlined
- XML attributes are indicated with @
- May indicate repetition
- May indicate data types
34. XER Entity → XML Schema

    <!ELEMENT person (FirstName, LastName, PhoneNumber)>
    <!ATTLIST person SSN ID #REQUIRED
                     status CDATA #IMPLIED>
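The mapping from an ordered XER entity to DTD declarations can be sketched as follows. The classes `XerAttribute` and `XerEntity` and the function `entity_to_dtd` are hypothetical names chosen for this illustration; the sketch covers only plain sequences of child elements and simple XML attributes, not repetition, data types, or mixed content.

```python
from dataclasses import dataclass, field

@dataclass
class XerAttribute:
    name: str
    is_xml_attribute: bool = False   # drawn with "@" in an XER entity
    dtd_type: str = "CDATA"          # only used for XML attributes
    required: bool = True            # only used for XML attributes

@dataclass
class XerEntity:
    name: str
    attributes: list = field(default_factory=list)

def entity_to_dtd(entity):
    """Emit DTD declarations for an ordered (sequence) XER entity."""
    children = [a.name for a in entity.attributes if not a.is_xml_attribute]
    content = "(" + ", ".join(children) + ")" if children else "EMPTY"
    lines = ["<!ELEMENT " + entity.name + " " + content + ">"]
    specs = []
    for a in entity.attributes:
        if a.is_xml_attribute:
            use = "#REQUIRED" if a.required else "#IMPLIED"
            specs.append(a.name + " " + a.dtd_type + " " + use)
    if specs:
        lines.append("<!ATTLIST " + entity.name + " " + " ".join(specs) + ">")
    return "\n".join(lines)

person = XerEntity("person", [
    XerAttribute("SSN", is_xml_attribute=True, dtd_type="ID"),
    XerAttribute("status", is_xml_attribute=True, required=False),
    XerAttribute("FirstName"),
    XerAttribute("LastName"),
    XerAttribute("PhoneNumber"),
])
print(entity_to_dtd(person))
```

For the person entity this prints an ELEMENT declaration with the three child elements in order, followed by an ATTLIST declaring SSN as a required ID and status as implied CDATA.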
35. Entity Variations
- Entities with mixed content
  - Only one possibility with DTDs: (#PCDATA | A | B)*
  - Many possibilities with schema
- Entities with no order between attributes
  - xs:all in schema
  - Explicitly listing all possibilities in DTD
[Figure: example entity. Labels: Para, Bold, Italics, Footnote.]
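To see why "explicitly listing all possibilities" is costly in a DTD, here is a small sketch that builds such a content model by enumerating every ordering. The function name `unordered_content_model` and the child names used below are assumptions for illustration only.

```python
from itertools import permutations

def unordered_content_model(children):
    """Build a DTD content model that accepts the given children in any order.

    Every permutation becomes one alternative, so n children yield n! branches.
    """
    alternatives = ["(" + ", ".join(p) + ")" for p in permutations(children)]
    return "(" + " | ".join(alternatives) + ")"

# Hypothetical child names, chosen only to show the growth in size.
print(unordered_content_model(["title", "author", "year"]))
```

Three children already produce six alternatives, and four produce twenty-four. A model written this naively may also be flagged as non-deterministic by some validating processors, since several branches begin with the same element; factoring out common prefixes avoids that, at the price of an even less readable declaration.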
36. Heterogeneity
- Same XML element may have different structure
- Modeled as generalization/specialization structures
- Same modeling constructs as ER generalization
- Sub-entities could be inline or linked outside
- Super entity may have its own attributes
[Figure: generalization diagram. Labels: ITEM, @itemno, BOOK, pages, author (), @ISBN, VIDEO, title, actor ().]
37. XER Generalization → Schema
38. XER Relationships
- Similar to ER relationships
- Primarily 1-1 and 1-M in XML
- Source is one attribute of one entity
- May show source line in a heavier style
- Participation constraints
[Figure: relationship diagram. Labels: BOOK, Chapter, title, @chapno, author(), abstract, @isbn, section (); relationship "has"; cardinalities (1,1), (1,M).]
39. XER Relationships → Schema
40. Many-Many Relationships
- In DTDs, IDs and IDREFs are used
- IDREFs are similar to foreign keys, although source IDs are not known
- Better design in XML Schema with xs:key
- Global nature of IDREFs introduces complexity, requires user interaction
[Figure: many-many relationship diagram. Labels: CITATION, BIBITEM, PAPER, @label; relationship "citation" (twice); cardinalities (1,1), (0,1), (0,M), (1,M).]
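The foreign-key analogy and its limits can be illustrated with a small check over a trimmed version of the sample paper. This is a sketch under assumptions: the function name `dangling_citations` is hypothetical, and the second citation (label "ER") is added here only to produce a dangling reference.

```python
import xml.etree.ElementTree as ET

# Trimmed sample paper; the CITE with label "ER" is a hypothetical addition.
PAPER_XML = """<PAPER name="XER-JAIS05">
  <SECTION title="Intro">
    <PARA>A model of <CITE label="XML">XML</CITE> and <CITE label="ER">ER</CITE>.</PARA>
  </SECTION>
  <BIBLIOGRAPHY>
    <BIBITEM label="XML">XML, 2004. W3C Recommendation</BIBITEM>
  </BIBLIOGRAPHY>
</PAPER>"""

def dangling_citations(root):
    """Return CITE labels that match no BIBITEM label, in document order.

    The pairing of CITE with BIBITEM is designer knowledge hard-coded here;
    a DTD only states that CITE/@label is an IDREF to some ID in the document.
    """
    known = {item.get("label") for item in root.iter("BIBITEM")}
    return [c.get("label") for c in root.iter("CITE") if c.get("label") not in known]

root = ET.fromstring(PAPER_XML)
print(dangling_citations(root))   # ['ER']
```

Because IDs are global, a DTD validator would also accept a CITE whose label equals the PAPER name. Deciding which element type an IDREF is meant to target is the step that needs a designer.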
41. Advanced XER constructs
- Relationships with higher arity
  - Source entity is important
  - Conceptually no different from standard higher arity relationships
- Weak entities
  - Most entities in XML are actually weak entities; they cannot exist without the parent's existence
  - Applications may explicitly indicate such
- Aggregations
  - A relationship may be promoted to an entity for better visibility and for incorporating IDs
42. Translation between conceptual model (XER) and logical model (XML Schema)
- Up-translation (reverse translation)
  - From DTD/Schema, generate XER
  - Mostly automatic, IDREFs need designer interaction
- Down-translation (forward translation)
  - From XER, generate DTD/Schema
  - Fully automated
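As a rough illustration of up-translation, the sketch below reads simple DTD declarations and lists, for each element, the components an XER entity would show. It is not the XSLT-based translation of the XER project: the function name `dtd_to_entities` is hypothetical, and the regular expressions handle only flat comma-separated content models and attribute definitions given as name, type, default triples.

```python
import re

ELEMENT_RE = re.compile(r"<!ELEMENT\s+(\w+)\s+\(([^)]*)\)\s*>")
ATTLIST_RE = re.compile(r"<!ATTLIST\s+(\w+)\s+([^>]*)>")

def dtd_to_entities(dtd):
    """Map each declared element to a list of XER-style component names.

    XML attributes are listed first, prefixed with "@"; child elements follow
    in declaration order.
    """
    entities = {}
    for name, model in ELEMENT_RE.findall(dtd):
        entities[name] = [c.strip() for c in model.split(",") if c.strip()]
    for name, body in ATTLIST_RE.findall(dtd):
        tokens = body.split()
        # Each attribute definition is assumed to be a (name, type, default) triple.
        xml_attrs = ["@" + tokens[i] for i in range(0, len(tokens) - 2, 3)]
        entities[name] = xml_attrs + entities.get(name, [])
    return entities

PERSON_DTD = """
<!ELEMENT person (FirstName, LastName, PhoneNumber)>
<!ATTLIST person SSN ID #REQUIRED status CDATA #IMPLIED>
"""
print(dtd_to_entities(PERSON_DTD))
```

The sketch treats every attribute alike. Deciding what an IDREF attribute points to, and therefore which relationship to draw, is the part that still needs designer interaction.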
43. The XER-XML Cycle
[Figure: cycle diagram. Labels: XER/Dia, Up-Translation XSLT, Down-Translation XSLT, Existing XML Schema, Optimized XML Schema, Existing XML Application, Existing XML Documents.]
44. A comparison of everything
45. Results and Contributions
- ER is XER!
- Proofs for equivalence and completeness
- Translations are information preserving
  - DTD → XER → DTD will generate the original DTD
- Proof-of-concept implementation to show effectiveness of methodology
- Empirical analysis follows theory, shows improvement in user performance
46. Future work
- Show relationship between XER and XNF (XML Normal Form)
- Show relationship between XER and design quality
- Empirical study on quality of models
- Include support for namespaces and other advanced XML constructs
47. References
- Chen, P. (1976) "The Entity-Relationship model - Toward a unified view of data," ACM Transactions on Database Systems (TODS) (1) 1, pp. 9-36.
- Conrad, R., Scheffner, D., Freytag, J. (2000) "XML conceptual modeling using UML," International Conference on Conceptual Modeling, 2000, LNCS 1920, pp. 558-571.
- Feng, L., Chang, E., Dillon, T. (2002) "A semantic network-based design methodology for XML documents," ACM Transactions on Information Systems (20) 4, pp. 390-421.
- Psaila, G. (2000) "ERX: A conceptual model for XML documents," ACM Symposium on Applied Computing (SAC 2000), 2000.
- Routledge, N., Bird, L., Goodchild, A. (2002) "UML and XML schema," Thirteenth Australasian Database Conference, 2002.
- Sengupta, A., Mohan, S., Doshi, R. (2003) "XER: Extensible entity relationship modeling," XML 2003, Philadelphia, 2003.
48. Questions?
- Arijit Sengupta
- Information Systems and Operations Management
- Raj Soin College of Business
- Wright State University, Dayton, Ohio
- arijit.sengupta@wright.edu
- http://www.wright.edu/arijit.sengupta