Normal Forms, XML, and Schemas - PowerPoint PPT Presentation

1 / 34
About This Presentation
Title:

Normal Forms, XML, and Schemas

Description:

For every relation scheme R and for every X A that holds over R, either A X (it is trivial) ,or ... B) if no schema Rj, 1 j i contains a candidate key for R ... – PowerPoint PPT presentation

Number of Views:32
Avg rating:3.0/5.0
Slides: 35
Provided by: zack4
Category:
Tags: xml | forms | normal | schemas

less

Transcript and Presenter's Notes

Title: Normal Forms, XML, and Schemas


1
Normal Forms, XML, and Schemas
  • Zachary G. Ives
  • University of Pennsylvania
  • CIS 550 Database Information Systems
  • October 13, 2005

Some slide content courtesy of Susan Davidson
Raghu Ramakrishnan
2
Announcements
  • Homework 3 due Wednesday _at_ noon
  • We will post a solution set shortly afterwards
  • (No late turn-ins!)
  • Midterm Thursday 10/20
  • Project groups (and optionally proposals) due
    10/20
  • We are aiming for a recitation section for the
    RSS project around 10/24-10/26

3
Two Important Normal Forms
  • Boyce-Codd Normal Form (BCNF). For every
    relation scheme R and for every X ? A that holds
    over R,
  • either A ? X (it is trivial) ,or
  • or X is a superkey for R
  • Third Normal Form (3NF). For every relation
    scheme R and for every X ? A that holds over R,
  • either A ? X (it is trivial), or
  • X is a superkey for R, or
  • A is a member of some key for R

4
Normal Forms Compared
  • BCNF is preferable, but sometimes in conflict
    with the goal of dependency preservation
  • Its strictly stronger than 3NF
  • Lets see algorithms to obtain
  • A BCNF lossless join decomposition
    (nondeterministic)
  • A 3NF lossless join, dependency preserving
    decomposition

5
BCNF Decomposition Algorithm(from Korth et al.
our book gives a recursive version)
result R compute F while there is a
relation schema Ri in result that isnt in
BCNF let A ? B be a nontrivial FD on Ri s.t.
A ? Ri is not in F and A and B are
disjoint result (result Ri) ? (Ri - B),
(A,B)
i.e., A doesnt form a key
6
An Example
  • Given the schema
  • Stuff(sid, name, serno, classroom, cid, fid,
    prof)
  • And FDs
  • sid ? name serno ? classroom, cid, fid
  • fid ? prof
  • Find the Boyce-Codd Normal Form for this schema
  • What if instead
  • sid ? name classroom, cid ? serno fid ? prof
    serno ? cid

7
3NF Decomposition Algorithm
Let F be a minimal cover i0 for each FD A ? B
in F if none of the schemas Rj, 1? j ? i,
contains AB increment i
Ri (A, B) if no schema Rj, 1 ? j ? i
contains a candidate key for R
increment i Ri any candidate key
for R return (R1, , Ri)
Build dep.-preserving decomp.
Ensurelosslessdecomp.
8
An Example
  • Given the schema
  • Stuff(sid, name, serno, classroom, cid, fid,
    prof)
  • And FDs
  • sid ? name serno ? classroom, cid, fid
  • fid ? prof
  • Find the Third Normal Form for this schema
  • What if instead
  • sid ? name classroom, cid ? serno fid ? prof
    serno ? cid

9
Summary of Normalization
  • We can always decompose into 3NF and get
  • Lossless join
  • Dependency preservation
  • But with BCNF
  • We are only guaranteed lossless joins
  • The algorithm is nondeterministic, so there is
    not a unique decomposition for a given schema R
  • BCNF is stronger than 3NF every BCNF schema is
    also in 3NF

10
Normalization Is Good Or Is It?
  • In some cases, we might not mind redundancy, if
    the data isnt directly updated
  • Reports (people like to see breakdowns by
    semester, department, course, etc.)
  • Warehouses (archived copies of data for doing
    complex analysis)
  • Data sharing (sometimes we may export data into
    object-oriented or hierarchical formats)

11
XML A Semi-Structured Data Model
12
Why XML?
  • XML is the confluence of several factors
  • The Web needed a more declarative format for data
  • Documents needed a mechanism for extended tags
  • Database people needed a more flexible
    interchange format
  • Lingua franca of data
  • Its parsable even if we dont know what it
    means!
  • Original expectation
  • The whole web would go to XML instead of HTML
  • Todays reality
  • Not so But XML is used all over under the
    covers

13
Why DB People Like XML
  • Can get data from all sorts of sources
  • Allows us to touch data we dont own!
  • This was actually a huge change in the DB
    community
  • Interesting relationships with DB techniques
  • Useful to do relational-style operations
  • Leverages ideas from object-oriented,
    semistructured data
  • Blends schema and data into one format
  • Unlike relational model, where we need schema
    first
  • But too little schema can be a drawback, too!

14
XML Anatomy
Processing Instr.
  • lt?xml version"1.0" encoding"ISO-8859-1" ?gt
  • ltdblpgt
  • ltmastersthesis mdate"2002-01-03"
    key"ms/Brown92"gt
  •   ltauthorgtKurt P. Brownlt/authorgt
  •   lttitlegtPRPL A Database Workload
    Specification Languagelt/titlegt
  •   ltyeargt1992lt/yeargt
  •   ltschoolgtUniv. of Wisconsin-Madisonlt/schoolgt
  •   lt/mastersthesisgt
  • ltarticle mdate"2002-01-03" key"tr/dec/SRC1997-
    018"gt
  •   lteditorgtPaul R. McJoneslt/editorgt
  •   lttitlegtThe 1995 SQL Reunionlt/titlegt
  •   ltjournalgtDigital System Research Center
    Reportlt/journalgt
  •   ltvolumegtSRC1997-018lt/volumegt
  •   ltyeargt1997lt/yeargt
  •   lteegtdb/labs/dec/SRC1997-018.htmllt/eegt
  •   lteegthttp//www.mcjones.org/System_R/SQL_Reunio
    n_95/lt/eegt
  •   lt/articlegt

Open-tag
Element
Attribute
Close-tag
15
Well-Formed XML
  • A legal XML document fully parsable by an XML
    parser
  • All open-tags have matching close-tags (unlike so
    many HTML documents!), or a special
  • lttag/gt shortcut for empty tags (equivalent to
    lttaggtlt/taggt
  • Attributes (which are unordered, in contrast to
    elements) only appear once in an element
  • Theres a single root element
  • XML is case-sensitive

16
XML as a Data Model
  • XML information set includes 7 types of nodes
  • Document (root)
  • Element
  • Attribute
  • Processing instruction
  • Text (content)
  • Namespace
  • Comment
  • XML data model includes this, plus typing info,
    plus order info and a few other things

17
XML Data Model Visualized(and simplified!)
attribute
root
p-i
element
Root
text
dblp
?xml
mastersthesis
article
mdate
mdate
key
key
author
title
year
school
2002
editor
title
year
journal
volume
ee
ee
2002
1992
1997
The
ms/Brown92
tr/dec/
PRPL
Digital
db/labs/dec
Univ.
Paul R.
Kurt P.
SRC
http//www.
18
What Does XML Do?
  • Serves as a document format (super-HTML)
  • Allows custom tags (e.g., used by MS Word,
    openoffice)
  • Supplement it with stylesheets (XSL) to define
    formatting
  • Data exchange format (must agree on terminology)
  • Marshalling and unmarshalling data in SOAP and
    Web Services

19
XML as a Super-HTML(MS Word)
  • lth1 class"Section1"gtlta name"_top /gtCIS 550
    Database and Information Systemslt/h1gt
  • lth2 class"Section1"gtFall 2004lt/h2gt
  • ltp class"MsoNormal"gt
  • ltplacegt311 Townelt/placegt, Tuesday/Thursday
  • lttime Hour"13" Minute"30"gt130PM
    300PMlt/timegt
  • lt/pgt

20
XML Easily Encodes Relations
Student-course-grade
sid serno exp-grade
1 570103 B
23 550103 A
  • ltstudent-course-gradegt
  • lttuplegtltsidgt1lt/sidgtltsernogt570103lt/sernogtltexp-grad
    egtBlt/exp-gradegtlt/tuplegt
  • lttuplegtltsidgt23lt/sidgtltsernogt550103lt/sernogtltexp-gra
    degtAlt/exp-gradegtlt/tuplegt
  • lt/student-course-gradegt

21
But XML is More FlexibleNon-First-Normal-Form
(NF2)
  • ltparentsgt
  • ltparent nameJean gt
  • ltsongtJohnlt/songt
  • ltdaughtergtJoanlt/daughtergt
  • ltdaughtergtJilllt/daughtergt
  • lt/parentgt
  • ltparent nameFenggt
  • ltdaughtergtFelicitylt/daughtergt
  • lt/parentgt

Coincides with semi-structured data, invented
by DB people at Penn and Stanford
22
XML and Code
  • Web Services (.NET, recent Java web service
    toolkits) are using XML to pass parameters and
    make function calls
  • Why?
  • Easy to be forwards-compatible
  • Easy to read over and validate (?)
  • Generally firewall-compatible
  • Drawbacks? XML is a verbose and inefficient
    encoding!
  • XML is used to represent
  • SOAP the envelope that data is marshalled
    into
  • XML Schema gives some typing info about
    structures being passed
  • WSDL the IDL (interface def language)
  • UDDI provides an interface for querying about
    web services

23
Integrating XML What If We Have Multiple
Sources with the Same Tags?
  • Namespaces allow us to specify a context for
    different tags
  • Two parts
  • Binding of namespace to URI
  • Qualified names
  • ltroot xmlnshttp//www.first.com/aspace
    xmlnsothernsgt
  • lttag xmlnsmynshttp//www.fictitious.com/mypath
    gt
  • ltthistaggtis in the default namespace
    (aspace)lt/thistaggt
  • ltmynsthistaggtis in mynslt/mynsthistaggtltotherns
    thistaggtis a different tag in othernslt/othernsthi
    staggt
  • lt/taggt
  • lt/rootgt

24
XML Isnt Enough on Its Own
  • Its too unconstrained for many cases!
  • How will we know when were getting garbage?
  • How will we query?
  • How will we understand what we got?
  • We also need
  • Some idea of the structure
  • Our focus next
  • Presentation, in some cases XSL(T)
  • Well talk about this soon
  • Some way of interpreting the tags?
  • Well talk about this later in the semester

25
Structural ConstraintsDocument Type Definitions
(DTDs)
  • The DTD is an EBNF grammar defining XML structure
  • XML document specifies an associated DTD, plus
    the root element
  • DTD specifies children of the root (and so on)
  • DTD defines special significance for attributes
  • IDs special attributes that are analogous to
    keys for elements
  • IDREFs references to IDs
  • IDREFS a nasty hack that represents a list of
    IDREFs

26
An Example DTD
  • Example DTD
  • lt!ELEMENT dblp((mastersthesis article))gt
  • lt!ELEMENT mastersthesis(author,title,year,school,c
    ommitteemember)gt
  • lt!ATTLIST mastersthesis(mdate CDATA REQUIRED ke
    y ID REQUIRED
  • advisor CDATA IMPLIEDgt
  • lt!ELEMENT author(PCDATA)gt
  • Example use of DTD in XML file
  • lt?xml version"1.0" encoding"ISO-8859-1" ?gt
  • lt!DOCTYPE dblp SYSTEM my.dtd"gt
  • ltdblpgt

27
Representing Graphs and Links in XML
  • lt?xml version"1.0" encoding"ISO-8859-1" ?gt
  • lt!DOCTYPE graph SYSTEM special.dtd"gt
  • ltgraphgt
  • ltauthor idauthor1gt
  • ltnamegtJohn Smithlt/namegt
  • lt/authorgt
  • ltarticlegt
  • ltauthor refauthor1 /gt lttitlegtPaper1lt/titlegt
  • lt/articlegt
  • ltarticlegt
  • ltauthor refauthor1 /gt lttitlegtPaper2lt/titlegt
  • lt/articlegt

28
Graph Data Model
Root
graph
?xml
!DOCTYPE
article
article
author
id
title
title
author
author
name
Paper1
author1
ref
Paper2
ref
John Smith
author1
author1
29
Graph Data Model
Root
graph
?xml
!DOCTYPE
article
article
author
id
title
title
author
author
name
Paper1
author1
ref
Paper2
ref
John Smith
30
DTDs Arent Expressive Enough
  • DTDs capture grammatical structure, but have some
    drawbacks
  • Not themselves in XML inconvenient to build
    tools for them
  • Dont capture database datatypes domains
  • IDs arent a good implementation of keys
  • Why not?
  • No way of defining OO-like inheritance

31
XML Schema
  • Aims to address the shortcomings of DTDs
  • XML syntax
  • Can define keys using XPaths
  • Type subclassing thats more complex than in a
    programming language
  • Programming languages dont consider order of
    member variables!
  • Subclassing by extension and by restriction
  • And, of course, domains and built-in datatypes

32
Basics of XML Schema
  • Need to use the XML Schema namespace (generally
    named xsd)
  • simpleTypes are a way of restricting domains on
    scalars
  • Can define a simpleType based on integer, with
    values within a particular range
  • complexTypes are a way of defining
    element/attribute structures
  • Basically equivalent to !ELEMENT, but more
    powerful
  • Specify sequence, choice between child elements
  • Specify minOccurs and maxOccurs (default 1)
  • Must associate an element/attribute with a
    simpleType, or an element with a complexType

33
Simple Schema Example
  • ltxsdschema xmlnsxsd"http//www.w3.org/2001/XMLS
    chema"gt
  • ltxsdelement namemastersthesis"
    typeThesisType"/gt
  • ltxsdcomplexType nameThesisType"gt
  • ltxsdattribute namemdate" type"xsddate"/gt
  • ltxsdattribute namekey" type"xsdstring"/gt
  • ltxsdattribute nameadvisor" type"xsdstring"/gt
  • ltxsdsequencegt
  • ltxsdelement nameauthor" typexsdstring"/gt
  • ltxsdelement nametitle" typexsdstring"/gt
  • ltxsdelement nameyear" typexsdinteger"/gt
  • ltxsdelement nameschool" typexsdstring/gt
  • ltxsdelement namecommitteemember"
    typeCommitteeType minOccurs0"/gt
  • lt/xsdsequencegt
  • lt/xsdcomplexTypegt
  • lt/xsdschemagt

34
Designing an XML Schema/DTD
  • Not as formalized as relational data design
  • We can still use ER diagrams to break into
    entity, relationship sets
  • ER diagrams have extensions for aggregation
    treating smaller diagrams as entities and for
    composite attributes
  • Note that often we already have our data in
    relations and need to design the XML schema to
    export them!
  • Generally orient the XML tree around the
    central objects
  • Big decision element vs. attribute
  • Element if it has its own properties, or if you
    might have more than one of them
  • Attribute if it is a single property or perhaps
    not!
Write a Comment
User Comments (0)
About PowerShow.com