The Case for Conceptual Modeling for XML

Description:

Joint work with E. Wilde, ETH Zurich. 10/2/09. 2. ISOM 2003-2005 Arijit Sengupta. Conclusion ... Three phases of database design: Conceptual design ... – PowerPoint PPT presentation

1
The Case for Conceptual Modeling for XML
  • Arijit Sengupta
  • Information Systems and Operations Management
  • Raj Soin College of Business

Joint work with E. Wilde, ETH Zurich
2
Agenda
  • Introduction
  • Motivation for Conceptual Modeling
  • Existing Models
  • Tools for Schema Design
  • XER: an Extensible ER Model for XML
  • Conclusions

3
Introduction
  • Three phases of database design
    • Conceptual design
      • Developing an abstract idea of the data objects
        and their interactions
    • Logical design
      • Mapping from the conceptual design to the data
        structure appropriate for the underlying database
        model (e.g., relational, OO, OR)
    • Physical design
      • Laying out the data on the disk with access
        structures and indexes

4
Benefits of Conceptual Design
  • Projects without a strong conceptual design are
    more likely to fail
  • Design is one of the most important aspects of
    project and business process quality management
    standards
    • ISO 9000
    • CMM
  • Literature on the relational model shows that
    conceptual design benefits user performance

5
Why Conceptual Design for XML?
  • Designs are typically network-structured, whereas
    the XML representation is hierarchical
  • Directly creating XML structures in schema
    languages is difficult and error-prone
  • Conceptual models are not tied to a logical
    structure, so can be used to create potentially
    better logical designs
  • Improves presentation capabilities

6
What exactly are we modeling?
  • Structure of XML documents
    <?xml version="1.0"?>
    <PAPER name="XER-JAIS05">
      <TITLE>XER - Extensible ER</TITLE>
      <SECTION title="Intro">
        <PARA>This is a <EMPH>logical</EMPH>
          model of <CITE label="XML">XML</CITE>.
        </PARA>
      </SECTION>
      <SECTION title="Conclusion">
        <PARA>Does sound very logical.</PARA>
      </SECTION>
      <BIBLIOGRAPHY>
        <BIBITEM label="XML">XML, 2004.
          W3C Recommendation</BIBITEM>
      </BIBLIOGRAPHY>
    </PAPER>

7
XML Structures with a DTD
<!ELEMENT PAPER (TITLE, SECTION*, BIBLIOGRAPHY?)>
<!ATTLIST PAPER name ID #REQUIRED>
<!ELEMENT TITLE (#PCDATA)>
<!ELEMENT SECTION (#PCDATA | CITE | PARA | EMPH)*>
<!ATTLIST SECTION title CDATA #REQUIRED>
<!ELEMENT BIBLIOGRAPHY (BIBITEM+)>
<!ELEMENT BIBITEM (#PCDATA)>
<!ATTLIST BIBITEM label ID #IMPLIED>
<!ELEMENT CITE EMPTY>
<!ATTLIST CITE label IDREF #REQUIRED>
<!ELEMENT EMPH (#PCDATA)>
<!ELEMENT PARA (#PCDATA)>
8
XML Structures using XML Schema
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="PAPER">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="TITLE"/>
        <xs:sequence minOccurs="0" maxOccurs="unbounded">
          <xs:element ref="SECTION"/>
        </xs:sequence>
        <xs:sequence minOccurs="0">
          <xs:element ref="BIBLIOGRAPHY"/>
        </xs:sequence>
      </xs:sequence>
      <xs:attribute name="name" type="xs:ID" use="required"/>
    </xs:complexType>
  </xs:element>
9
An XML Schema - continued
  <xs:element name="TITLE" type="xs:string"/>
  <xs:element name="SECTION">
    <xs:complexType mixed="true">
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element ref="CITE"/>
        <xs:element ref="PARA"/>
        <xs:element ref="EMPH"/>
      </xs:choice>
      <xs:attribute name="title" type="xs:string" use="required"/>
    </xs:complexType>
  </xs:element>
  <xs:element name="BIBLIOGRAPHY">
    <xs:complexType>
      <xs:sequence>
        <xs:element maxOccurs="unbounded" ref="BIBITEM"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
10
And more
  <xs:element name="BIBITEM">
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base="xs:string">
          <xs:attribute name="label" type="xs:ID"/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>
  <xs:element name="CITE">
    <xs:complexType>
      <xs:attribute name="label" type="xs:IDREF" use="required"/>
    </xs:complexType>
  </xs:element>
  <xs:element name="EMPH" type="xs:string"/>
  <xs:element name="PARA" type="xs:string"/>
</xs:schema>
11
Motivation: Conceptualizing XML Schema
  • Highly textual content
  • More syntactic verbiage than semantic content
  • Highly tied to Schema structure, including case
    sensitivity (good for ensuring structural
    stability, but bad for design purposes!)
  • Based more on XML syntax than on concepts
    about the structure

12
Conceptual Vs Physical models
  • For the relational setting, it is well
    established that conceptual models improve
    user performance over textual models
    • Reduce number of errors (Batra, 1990)
    • Improve query performance (Batra, 1993)
    • Generalized theory for improved performance with
      higher level models (Chan, 1997)
  • Can we extend this for XML?

13
A conceptual model
[Diagram: conceptual model of the entities SECTION, PARA, EMPH and CITE, with relationships "has", "contains" and "cites", an @title attribute, and cardinality labels 0,1; 1,1; 0,M; 1,M]
14
XML Modeling Challenges
  • XML is not relational! (Robie, XML03)
  • XML objects are inherently ordered.
  • XML structures can be complex, including groups
    of repeating and alternating structures
  • Heterogeneity: the same element can have
    different structures in different contexts
  • Mixed content: an element can have both data and
    structure
  • XML does not have a direct way of supporting
    many-many relationships
  • XML related concepts such as namespaces,
    pointers, links

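The ordering and mixed-content challenges on this slide can be made concrete with a small sketch. It uses Python's standard xml.etree.ElementTree on the PARA element from the sample paper; the helper name mixed_content is illustrative and not part of the presentation.

```python
import xml.etree.ElementTree as ET

# The PARA element from the sample paper: text interleaved with child elements.
PARA_XML = (
    '<PARA>This is a <EMPH>logical</EMPH> '
    'model of <CITE label="XML">XML</CITE>.</PARA>'
)


def mixed_content(elem):
    """Return the element's content as an ordered list of text runs and child tags."""
    parts = []
    if elem.text:
        parts.append(elem.text)          # text before the first child
    for child in elem:
        parts.append(f"<{child.tag}>")   # the child element itself
        if child.tail:
            parts.append(child.tail)     # text following that child
    return parts


para = ET.fromstring(PARA_XML)
print(mixed_content(para))
# ['This is a ', '<EMPH>', ' model of ', '<CITE>', '.']
```

Swapping any two items in that list yields a different document, which is why a relational-style set of columns does not capture the structure without extra ordering information.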
15
Conceptualization of Information
  • Chen's levels of information representation
    • Concepts of objects and their associations in
      human minds
    • Structures of such information represented by
      data, not necessarily in a specific form
    • Structures with a distinct form of data without
      any specific access path
    • Structures of data with access-path dependency

16
XML modeling requirements
  • Functional requirements (have to have)
    • Structure and access-path independence
    • Reflection of the user's mental model
    • Modeling of constraints
    • Formal basis
  • Non-functional requirements (optional)
    • Graphical representation
    • Textual nuances (mixed/open/reusable components)
    • XML-specific characteristics (namespaces,
      parameter entities, references/links)

17
State of the Art in XML Models
  • ER-based models
    • Start with ER, extend to handle XML complexity
    • Psaila '00, '03
  • UML-based models
    • Start with UML, simplify to match XML
    • Conrad '00, Routledge '02
  • Hierarchical models
  • Others
    • XGrammar (Mani '01)
    • Semantic Network (Feng '02)

[Side arrow labeled "Complexity", running from less (top) to more (bottom)]
18
ERX (Psaila 2000, 2003)
  • Entity Relationship for XML
  • Based on the ER Model, with several modifications
  • Annotations on attributes for required(R),
    optional/implied (I) as well as ordering (O/U)
  • Allows relationships, arities,
    generalization/specialization
  • Concept of interface for the purpose of
    integrating a structure with a style
  • No support for involved XML characteristics like
    mixed content

19
An ERX Diagram
20
Conrad (2000) and UML
  • Representing XML structure using UML
  • Logical representation since most XML structural
    variations can be handled by UML
  • Elements represented as objects, attributes as
    variables
  • Conrad's UML approach is restricted to DTDs; later work by
    Routledge (2002) incorporates Schema capabilities
  • Added syntactic sugar to UML to represent
    concepts like primary keys, links, references
  • Links between objects annotated with the type of
    XML content model (sequence/choice)

21
A (partial) Conrad UML
22
Modeling with Semantic Networks
  • Feng, Chang & Dillon (2002)
  • Set of atomic and complex nodes representing real
    world objects.
  • Set of directed edges representing the semantic
    relationships between objects.
  • Set of labels denoting the different types of
    semantic relationships such as aggregation,
    generalization etc.
  • Set of constraints defined over nodes and edges
    to constrain these relationships.

23
A sample Semantic Net for Papers
24
Some interesting observations
  • All modeling techniques are networks and not
    hierarchies
  • Most do not assume that an eventual hierarchy is
    being represented
  • Other observations?

25
Tools for Creating Schema
  • There isn't really a conceptual modeling tool for
    XML
  • There are, however, graphical interfaces for
    creating schema
  • Typically based on either a relational structure,
    tree structure, or a hybrid structure
  • Generation of schema from the graphical
    representation (typically not stored internally)

26
Schema Designer in Visual Studio
http://www.microsoft.com/vstudio
27
An XML Schema Tree in XML Authority
http://www.tibco.com
28
An XML HyperModel
http://www.xmlmodeling.com/
29
XER: eXtensible Entity Relationship
  • I pronounce it as "Cher"; some prefer "sure"
  • Based on the well-accepted ER model
  • A model that is faithful to the standard
  • Includes visual constructs for all basic XML
    structural nuances
  • Includes advanced schema concepts such as data
    types, participation constraints
  • Intended to be a starting point for an XML
    design: you should start with XER

30
XER and XML: Conflicting terminology
  • An XER entity is different from an XML entity
    (e.g., &lt;)
  • An XER attribute is different from an XML attribute
    (e.g., src in <img src="a.jpg"/>)
  • Rest of the presentation will use XER
    terminology, we will prefix with XML if we need
    to refer to the XML terms (e.g., XML attribute)

31
Element Normal Form
  • Canonical View of XML without using XML
    attributes.
  • XML attributes are represented as elements
    prefixed with a special symbol (e.g., @).
  • Special considerations need to be made for ID and
    IDREF attributes
  • With no loss of generality, ENF allows us to work
    with XML with only element content
  • We can prove that XML ≡ ENF XML

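A minimal sketch of the Element Normal Form idea, assuming a nested-tuple representation chosen purely for illustration (the slides do not specify one). Each XML attribute becomes a child whose name is prefixed with @, and a reverse function shows that the original element can be recovered. ID and IDREF attributes are treated like any other attribute here, so the special handling the slide mentions is not covered; to_enf and from_enf are hypothetical names.

```python
import xml.etree.ElementTree as ET


def to_enf(elem):
    """Convert an element to a nested (tag, children) tuple with no attributes.

    Each XML attribute becomes a leading child whose tag is '@' + name.
    Children are either strings (text) or nested (tag, children) tuples.
    """
    children = [("@" + name, [value]) for name, value in elem.attrib.items()]
    if elem.text:
        children.append(elem.text)
    for child in elem:
        children.append(to_enf(child))
        if child.tail:
            children.append(child.tail)
    return (elem.tag, children)


def from_enf(node):
    """Rebuild an ElementTree element from the tuple form produced by to_enf."""
    tag, children = node
    elem = ET.Element(tag)
    last = None  # most recently appended child element, used for tail text
    for item in children:
        if isinstance(item, str):
            if last is None:
                elem.text = (elem.text or "") + item
            else:
                last.tail = (last.tail or "") + item
        elif item[0].startswith("@"):
            elem.set(item[0][1:], item[1][0])   # '@name' child back to an attribute
        else:
            last = from_enf(item)
            elem.append(last)
    return elem


SOURCE = '<SECTION title="Intro"><PARA>Some text.</PARA></SECTION>'
enf = to_enf(ET.fromstring(SOURCE))
print(enf)
# ('SECTION', [('@title', ['Intro']), ('PARA', ['Some text.'])])
print(ET.tostring(from_enf(enf), encoding="unicode") == SOURCE)
# True
```

The tuple form is used instead of real elements because a name starting with @ is not a legal XML element name, so the ENF view here is a working representation rather than a serializable document.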
32
XER Constructs
  • Entity
    • Basic XER object; ordered by default, unordered
      and mixed representations allowed
  • Attribute
    • Components of an entity; may display data types,
      repetition constraints
  • Relationship
    • Provides connections between entities; shows
      participation constraints
  • Generalization
    • ISA relationships; model variations of
      structures
  • Other constructs

33
XER Entities
  • Represented as rectangles, with the name of the
    entity on top
  • Contains a list of XER attributes
  • Always ordered by default (i.e., a sequence)
  • IDs are underlined
  • XML attributes are indicated with @
  • May indicate repetition
  • May indicate data types

34
XER Entity → XML Schema
<!ELEMENT person (FirstName, LastName, PhoneNumber)>
<!ATTLIST person SSN    ID    #REQUIRED
                 status CDATA #IMPLIED>
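The person entity on this slide can be turned into DTD declarations mechanically. The sketch below assumes a hypothetical in-memory entity description (XerEntity and entity_to_dtd are illustrative names, not the authors' tooling) and covers only the ordered-sequence case with XML attributes.

```python
from dataclasses import dataclass, field


@dataclass
class XerEntity:
    """Hypothetical in-memory form of an XER entity (not the authors' format)."""
    name: str
    attributes: list                                     # ordered XER attributes
    xml_attributes: list = field(default_factory=list)   # (name, type, default)


def entity_to_dtd(entity):
    """Emit DTD declarations for one entity, treated as an ordered sequence."""
    lines = [f"<!ELEMENT {entity.name} ({', '.join(entity.attributes)})>"]
    if entity.xml_attributes:
        decls = [f"{n} {t} {d}" for n, t, d in entity.xml_attributes]
        lines.append(f"<!ATTLIST {entity.name} " + " ".join(decls) + ">")
    return "\n".join(lines)


person = XerEntity(
    name="person",
    attributes=["FirstName", "LastName", "PhoneNumber"],
    xml_attributes=[("SSN", "ID", "#REQUIRED"), ("status", "CDATA", "#IMPLIED")],
)
print(entity_to_dtd(person))
# <!ELEMENT person (FirstName, LastName, PhoneNumber)>
# <!ATTLIST person SSN ID #REQUIRED status CDATA #IMPLIED>
```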
35
Entity Variations
  • Entities with mixed content
    • Only one possibility with DTDs: (#PCDATA | A | B)*
    • Many possibilities with schema
  • Entities with no order between attributes
    • xs:all in schema
    • Explicitly listing all possibilities in DTD

[Diagram: entity Para with components Bold, Italics and Footnote]
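Listing every ordering by hand, as the DTD option on this slide requires, quickly becomes impractical. The following sketch generates such a content model for a list of distinct child names; unordered_content_model is an illustrative name. The alternatives are factored by first child so that the model stays deterministic, which the XML 1.0 specification asks of content models for compatibility.

```python
def unordered_content_model(names):
    """Build a DTD content model that accepts the given children in any order.

    The result is factored on the first child, e.g. (A, ((B, C) | (C, B))),
    so that no two alternatives start with the same element name.
    """
    if not names:
        raise ValueError("at least one child name is required")
    if len(names) == 1:
        return names[0]
    branches = []
    for i, first in enumerate(names):
        rest = names[:i] + names[i + 1:]
        branches.append(f"({first}, {unordered_content_model(rest)})")
    return "(" + " | ".join(branches) + ")"


print(unordered_content_model(["A", "B"]))
# ((A, B) | (B, A))
print(unordered_content_model(["A", "B", "C"]))
# ((A, ((B, C) | (C, B))) | (B, ((A, C) | (C, A))) | (C, ((A, B) | (B, A))))
```

The size of the generated model grows factorially with the number of children, which illustrates why xs:all in XML Schema is the more convenient option.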
36
Heterogeneity
  • Same XML element may have different structure
  • Modeled as generalization/specialization
    structures
  • Same modeling constructs as ER generalization
  • Sub-entities could be inline or linked outside
  • Super entity may have its own attributes

[Diagram: generalization of ITEM into BOOK and VIDEO; labels shown: @itemno, pages, author (), @ISBN, title, actor ()]

37
XER Generalization → Schema
38
XER Relationships
  • Similar to ER relationships
  • Primarily 1-1 and 1-M in XML
  • Source is one attribute of one entity
  • May show source line in a heavier style
  • Participation constraints

[Diagram: entities BOOK and Chapter joined by the relationship "has" (cardinalities 1,1 and 1,M); labels shown: @isbn, title, author(), abstract, Chapter, @chapno, title, section ()]
39
XER Relationships → Schema

40
Many-Many Relationships
  • In DTDs, IDs and IDREFs are used
  • IDREFs are similar to foreign keys, although
    source IDs are not known
  • Better design in XML schema with xs:key
  • Global nature of IDREFs introduces complexity,
    requires user interaction

[Diagram: entities PAPER, CITATION and BIBITEM connected by two "citation" relationships; labels shown: @label, with cardinalities 1,1; 1,1; 0,1; 1,M; 0,M; 1,M]
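The global nature of IDREFs noted on this slide can be illustrated with a short check. The sketch assumes the ID and IDREF declarations from the sample paper DTD (name on PAPER and label on BIBITEM as IDs, label on CITE as an IDREF) and hard-codes them, since Python's standard library does not read DTDs; dangling_idrefs is an illustrative name.

```python
import xml.etree.ElementTree as ET

DOC = """
<PAPER name="XER-JAIS05">
  <SECTION title="Intro">
    <PARA>See <CITE label="XML"/> and <CITE label="ERX"/>.</PARA>
  </SECTION>
  <BIBLIOGRAPHY>
    <BIBITEM label="XML">XML, 2004. W3C Recommendation</BIBITEM>
  </BIBLIOGRAPHY>
</PAPER>
"""

# Assumed from the sample DTD: which (element, attribute) pairs are ID / IDREF.
ID_ATTRS = {("PAPER", "name"), ("BIBITEM", "label")}
IDREF_ATTRS = {("CITE", "label")}


def dangling_idrefs(root):
    """Return IDREF values that match no ID anywhere in the document."""
    ids = {
        elem.get(attr)
        for elem in root.iter()
        for (tag, attr) in ID_ATTRS
        if elem.tag == tag and elem.get(attr) is not None
    }
    return [
        elem.get(attr)
        for elem in root.iter()
        for (tag, attr) in IDREF_ATTRS
        if elem.tag == tag
        and elem.get(attr) is not None
        and elem.get(attr) not in ids
    ]


root = ET.fromstring(DOC)
print(dangling_idrefs(root))
# ['ERX']
```

Note that a CITE with label="XER-JAIS05" would also pass this check, even though that ID belongs to PAPER rather than to a BIBITEM. An IDREF only has to match some ID in the document, which is the looseness that a key scoped with xs:key can avoid.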
41
Advanced XER constructs
  • Relationships with higher arity
    • Source entity is important
    • Conceptually no different from standard higher
      arity relationships
  • Weak entities
    • Most entities in XML are actually weak entities:
      they cannot exist without their parent
    • Applications may explicitly indicate such
  • Aggregations
    • A relationship may be promoted to an entity for
      better visibility and to incorporate IDs

42
Translation between conceptual model (XER) and
logical model (XML Schema)
  • Up-translation (reverse translation)
    • From DTD/Schema, generate XER
    • Mostly automatic; IDREFs need designer
      interaction
  • Down-translation (forward translation)
    • From XER, generate DTD/Schema
    • Fully automated

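As a rough illustration of the up-translation direction, the sketch below reads flat sequence declarations from a DTD into a simple entity table. The presentation describes its translations as XSLT-based, so this Python version is only an assumption-laden stand-in: parse_flat_elements is a hypothetical name, and choices, nested groups, mixed content and ATTLIST declarations are deliberately skipped.

```python
import re

# Matches declarations of the form <!ELEMENT name (child, child*, ...)>
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+(\w+)\s+\(([^()]*)\)\s*>")


def parse_flat_elements(dtd_text):
    """Map each element with a flat sequence model to its (child, occurrence) list.

    Only comma-separated children with an optional ?, * or + suffix are handled;
    choice groups and #PCDATA models are skipped.
    """
    entities = {}
    for name, model in ELEMENT_DECL.findall(dtd_text):
        if "|" in model or "#PCDATA" in model:
            continue
        children = []
        for part in model.split(","):
            part = part.strip()
            if not part:
                continue
            occurrence = part[-1] if part[-1] in "?*+" else ""
            children.append((part.rstrip("?*+"), occurrence))
        entities[name] = children
    return entities


DTD = """
<!ELEMENT PAPER (TITLE, SECTION*, BIBLIOGRAPHY?)>
<!ELEMENT TITLE (#PCDATA)>
<!ELEMENT BIBLIOGRAPHY (BIBITEM+)>
<!ELEMENT CITE EMPTY>
"""
print(parse_flat_elements(DTD))
# {'PAPER': [('TITLE', ''), ('SECTION', '*'), ('BIBLIOGRAPHY', '?')], 'BIBLIOGRAPHY': [('BIBITEM', '+')]}
```

The occurrence suffixes map naturally onto participation constraints (for example, * to 0,M and ? to 0,1), while IDREF attributes would still need a designer to say which entity they point to.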
43
The XER-XML Cycle
[Diagram: the XER-XML cycle. Existing XML Schema → Up-Translation XSLT → XER/Dia → Down-Translation XSLT → Optimized XML Schema; also shown: Existing XML Application, Existing XML Documents]
44
A comparison of everything
45
Results and Contributions
  • ER is XER!
  • Proofs for equivalence and completeness
  • Translations are information preserving
  • DTD → XER → DTD will generate the original DTD
  • Proof-of-concept implementation to show
    effectiveness of methodology
  • Empirical Analysis follows theory, shows
    improvement in user performance

46
Future work
  • Show relationship between XER and XNF (XML
    Normal Form)
  • Show relationship between XER and design quality
  • Empirical study on quality of models
  • Include support for namespaces and other
    advanced XML constructs.

47
References
  • Chen, P. (1976) "The Entity-Relationship model -
    Toward a unified view of data," ACM Transactions on
    Database Systems (TODS) (1) 1, pp. 9-36.
  • Conrad, R., Scheffner, D., Freytag, J. (2000)
    "XML conceptual modeling using UML," International
    Conference on Conceptual Modeling, LNCS 1920, pp.
    558-571.
  • Feng, L., Chang, E., Dillon, T. (2002) "A
    semantic network-based design methodology for XML
    documents," ACM Transactions on Information
    Systems (20) 4, pp. 390-421.
  • Psaila, G. (2000) "ERX: A conceptual model for XML
    documents," ACM Symposium on Applied Computing
    (SAC 2000).
  • Routledge, N., Bird, L., Goodchild, A. (2002)
    "UML and XML schema," Thirteenth Australasian
    Database Conference.
  • Sengupta, A., Mohan, S., Doshi, R. (2003) "XER:
    Extensible entity relationship modeling," XML
    2003, Philadelphia.

48
Questions?
  • Arijit Sengupta
  • Information Systems and Operations Management
  • Raj Soin College of Business
  • Wright State University, Dayton, Ohio
  • arijit.sengupta@wright.edu
  • http://www.wright.edu/arijit.sengupta