Title: Introduction to XML and RSS
1Introduction to XML and RSS
2Types of data
- Structured
- Semi-structured
3Structured Data
- data is organized in entities
- similar entities are grouped together (tables)
- entities in the same group have the same descriptions (attributes)
4Current Database World
- Structure
- Relational Database Management System (RDBMS)
- everything is a relation
- Query language: SQL
- Software: MS Access, Oracle
5Example of a table (patients)
6Example of a group of tables
8World of Web Data
- Easy document exchange
- Unstructured (or poorly structured) data
- Everything is a document
- No standard for query languages
9World of Web Data
- Example
- An organization A publishes financial data on its web pages (HTML), generated from a DBMS.
- A second organization B wants to run some financial analyses but can access only the web data.
- Diagram labels: A, B, HTML, RDBMS
10Semi-structured Data
- data can be of any type
- not necessarily following any format
- does not follow any rules
- is not predictable
- examples include
- text
- video
- sound
- images
11Characteristics of Semi-Structured Data
- structure is irregular: missing or additional attributes (labels)
- parts of the data lack structure, e.g., images
- some parts may yield little structure, e.g., plain text
12Semi-structured Data ( Contd)
- Data that is inherently self-describing and does not conform to an explicit and fixed schema is known as semi-structured data
- information is contained within the data itself
13Semi-structured Data ( Contd)
- The structure of the data is rapidly and dynamically changing
- It includes data as found in several application areas such as Web Information Systems and Digital Libraries
14Example of Semi-Structured Data
- name: Peter Wood
- email: ptw@dcs.bbk.ac.uk, p.wood@bbk.ac.uk
- ------------------------------------------------------------------
- name:
  - first name: Mark
  - last name: Levene
- email: mark@dcs.bbk.ac.uk
- ------------------------------------------------------------------
- name: Alex Smith
- affiliation: StFX
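A minimal Python sketch of the three records above, assuming nested dictionaries as the representation. The variable name `people` and the helper `display_name` are illustrative and do not come from the slides. It shows that each record carries its own labels and that no two records share the same set of fields.

```python
# The three records from the slide, stored as plain dictionaries.
# Each record describes itself through its own keys; there is no shared schema.
people = [
    {"name": "Peter Wood",
     "email": ["ptw@dcs.bbk.ac.uk", "p.wood@bbk.ac.uk"]},
    {"name": {"first name": "Mark", "last name": "Levene"},
     "email": "mark@dcs.bbk.ac.uk"},
    {"name": "Alex Smith", "affiliation": "StFX"},
]


def display_name(record):
    """Return a printable name whether 'name' is a string or a nested record."""
    name = record["name"]
    if isinstance(name, dict):
        return f"{name['first name']} {name['last name']}"
    return name


names = [display_name(p) for p in people]
field_sets = [sorted(p.keys()) for p in people]
```

Any code that reads such data has to check what it finds, as `display_name` does, because the structure is only discovered by looking at each record.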
15IMDb A Motivating Example
- The Internet Movie Database is a classical example of a collection of semi-structured data
- Although the information pertaining to different movies may be essentially similar, their structure may be different!
- Let us consider an example movie database
16An Example Movie Database
17Irregularity In Structure
- Example: One movie may annotate information about the actors, choreographer, director and producer, while another movie may annotate additional information about the lyricist and the music director
18Irregularity In Structure
- The same kind of data may be typed differently
- For example, an actor's name may be represented as a string or as a tuple (first_name, last_name)
- Since data gets added to this database dynamically, the structure of the database as a whole also keeps changing dynamically
19Traditional Data Management
- Diagram labels: Universe of Discourse, Model of the UoD, Database, Query
20Post-Internet Data Management
- Diagram labels: Universe of Discourse, Retrieval?, Data, Query
21XML An Embodiment of Semistructured Data
- XML can be used to represent semistructured data
22What is XML?
- XML stands for EXtensible Markup Language
- XML is a markup language much like HTML
- XML was designed to describe data
- XML tags are not predefined. You must define your
own tags
23The main difference between XML and HTML
- XML and HTML were designed with different goals
- XML was designed to describe data and to focus on what data is.
- HTML was designed to display data and to focus on how data looks.
- It is important to understand that XML is not a replacement for HTML.
24XML does not DO anything
- Maybe it is a little hard to understand, but XML does not DO anything. XML was created to structure, store and send information.
- The note below has a header and a message body. It also has sender and receiver information. But still, this XML document does not DO anything. It is just pure information wrapped in XML tags. Someone must write a piece of software to send, receive or display it.
<note>
<to>John</to>
<from>Mary</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
25XML is free and extensible
- XML tags are not predefined. You must "invent" your own tags.
- The tags used to mark up HTML documents and the structure of HTML documents are predefined (like <p>, <h1>, etc.).
- XML allows authors to define their own tags and their own document structure.
- The tags in the example above (like <to> and <from>) are not defined in any XML standard. These tags are "invented" by the author of the XML document.
26XML is used to Exchange Data
- With XML, data can be exchanged between incompatible systems.
- In the real world, computer systems and databases contain data in incompatible formats. One of the most time-consuming challenges for developers has been to exchange data between such systems over the Internet.
- Since XML data is stored in plain text format, XML provides a software- and hardware-independent way of sharing data.
27XML can be used to Create new Languages
- XML is the basis of WML (the Wireless Markup Language), the markup language used with WAP (the Wireless Application Protocol).
- WML is used to mark up Internet applications for handheld devices like mobile phones.
- MathML, for writing mathematical formulas, and CML (Chemical Markup Language) are also written in XML.
28XML Syntax
- The syntax rules of XML are very simple and very strict. The rules are very easy to learn, and very easy to use.
- Because of this, creating software that can read and manipulate XML is very easy to do.
29All XML elements must have a closing tag
- With XML, it is illegal to omit the closing tag.
- In HTML some elements do not have to have a closing tag. The following code is legal in HTML:
- <p>This is a paragraph
- In XML all elements must have a closing tag, like this:
- <p>This is a paragraph</p>
30XML tags are case sensitive
- Unlike HTML, XML tags are case sensitive.
- With XML, the tag <Letter> is different from the tag <letter>.
- Opening and closing tags must therefore be written with the same case:
- <Message>This is incorrect</message>
- <message>This is correct</message>
31All XML elements must be properly nested
- Improper nesting of tags makes no sense to XML.
- In HTML some elements can be improperly nested within each other like this:
- <b><i>This text is bold and italic</b></i>
- In XML all elements must be properly nested within each other like this:
- <b><i>This text is bold and italic</i></b>
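The three rules above (closing tags, case sensitivity, nesting) can be checked with any XML parser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` module; the helper name `is_well_formed` is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


results = {
    "missing closing tag": is_well_formed("<p>This is a paragraph"),
    "closing tag present": is_well_formed("<p>This is a paragraph</p>"),
    "case mismatch": is_well_formed("<Message>This is incorrect</message>"),
    "improper nesting": is_well_formed("<b><i>bold and italic</b></i>"),
    "proper nesting": is_well_formed("<b><i>bold and italic</i></b>"),
}
```

Only the two entries that follow the rules come back as `True`; the parser rejects the other three outright instead of trying to repair them.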
32All XML documents must have a root element (tag)
- All XML documents must contain a single tag pair to define a root element.
- All other elements must be within this root element.
- All elements can have sub elements (child elements). Sub elements must be correctly nested within their parent element:
- <root>
-   <child>
-     <subchild>.....</subchild>
-   </child>
- </root>
33With XML, white space is preserved
- With XML, the white space in your document is not truncated.
- This is unlike HTML. With HTML, a sentence like this:
- Hello           my name is John,
- will be displayed like this:
- Hello my name is John,
- because HTML strips off the white space.
34Element Naming
- XML elements must follow these naming rules:
- Names can contain letters, numbers, and other characters
- Names must not start with a number or punctuation character
- Names must not start with the letters xml (or XML or Xml ..)
- Names cannot contain spaces
35Element Naming
- Any name can be used, no words are reserved, but the idea is to make names descriptive.
- XML documents often have a corresponding database, in which fields exist corresponding to elements in the XML document. A good practice is to use the naming rules of your database for the elements in the XML documents.
36Comments in XML
- The syntax for writing comments in XML is similar to that of HTML.
- <!-- This is a comment -->
37XML Attributes
- XML elements can have attributes in the start tag, just like HTML.
- Attributes are used to provide additional information about elements.
- In HTML (and in XML) attributes provide additional information about elements:
- <img src="computer.gif">
- <a href="demo.asp">
38XML Attributes
- Attribute values must always be enclosed in quotes
39XML Attributes Cont.
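- <?xml version="1.0"?>
- <note date=12/11/2002>
- <to>John</to>
- <from>Mary</from>
- </note>
- ------------------------------------------------------------------
- <?xml version="1.0"?>
- <note date="12/11/2002">
- <to>John</to>
- <from>Mary</from>
- </note>
- The error in the first document is that the date attribute in the note element is not quoted.
- The first line in the document is the XML declaration.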
40Use of Elements vs. Attributes
- Data can be stored in child elements or in attributes.
- Take a look at these examples:
- <person sex="female">
- <firstname>Anna</firstname>
- <lastname>Smith</lastname>
- </person>
- --------------------------------------------------
- <person>
- <sex>female</sex>
- <firstname>Anna</firstname>
- <lastname>Smith</lastname>
- </person>
- In the first example sex is an attribute. In the last, sex is a child element. Both examples provide the same information.
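The two forms are read differently by a program even though they carry the same information. A minimal sketch with Python's standard `xml.etree.ElementTree`, assuming the element names `person`, `firstname` and `lastname` (the slides name only `sex`):

```python
import xml.etree.ElementTree as ET

# Form 1: sex stored as an attribute of <person>.
as_attribute = ET.fromstring(
    '<person sex="female">'
    "<firstname>Anna</firstname><lastname>Smith</lastname>"
    "</person>"
)

# Form 2: sex stored as a child element of <person>.
as_child = ET.fromstring(
    "<person><sex>female</sex>"
    "<firstname>Anna</firstname><lastname>Smith</lastname>"
    "</person>"
)

sex_from_attribute = as_attribute.get("sex")   # attribute lookup
sex_from_child = as_child.findtext("sex")      # child element lookup
```

Both lookups return the same value, so the choice between the two forms is a design decision for the document author.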
41Errors in XML will stop the XML program
- The World Wide Web Consortium (W3C) XML specification states that a program should not continue to process an XML document if it finds a well-formedness error. The reason is that XML software should be easy to write, and that all XML documents should be compatible.
- With HTML it was possible to create documents with lots of errors (for example, a forgotten end tag). One of the main reasons that HTML browsers are so big and incompatible is that each has its own way of figuring out what a document should look like when it encounters an HTML error.
- With XML this should not be possible.
42XML and Browsers
- Netscape 6 or higher supports XML
- Internet Explorer 5.0 or higher supports XML
43Viewing XML Files
- If you open an XML document in IE, it will display the document with color coded root and child elements. A plus (+) or minus sign (-) to the left of the elements can be clicked to expand or collapse the element structure.
- If you want to view the raw XML source, you must select "View Source" from the browser menu.
- If an erroneous XML file is opened, the browser will report the error.
44Other Examples
- Viewing some XML documents will help you get the XML feeling.
- An XML CD catalog: a CD collection, stored as XML data.
- An XML plant catalog: a plant catalog from a plant shop, stored as XML data.
- A Simple Food Menu: a breakfast food menu from a restaurant, stored as XML data.
45Why does XML display like this?
- XML documents do not carry information about how to display the data.
- Since XML tags are "invented" by the author of the XML document, browsers do not know if a tag like <table> describes an HTML table or a dining table.
- Without any information about how to display the data, most browsers will just display the XML document as it is.
46The XML Rules (Summary)
- Single, unique root element
- Matching open/close tags
- Consistent capitalisation
- Correctly nested elements (no overlapping elements)
- Attribute values enclosed in quotes
- No repeating attributes in an element
-
-
- 3Months.com
- Web Development
-
- Wakefield st
- Wellington
- New Zealand
-
47Authoring XML Documents
- A basic XML document is an XML element that can, but might not, include nested XML elements.
- Example
-
-
- Second Chance
- Matthew Dunn
-
-
48Converting Relational Database to XML
- Example: Export the following data into XML and group books by store
- Relational Database
- Store (sid, name, phone)
- Book (bid, title, authors)
- StoreBook (sid, bid, price, stock)
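One way to do this export, sketched in Python with the standard `xml.etree.ElementTree` module. The slides give only the three table schemas, so the sample rows, the element names (`stores`, `store`, `book`, and so on) and the function name `export_by_store` are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample rows; only the column layout comes from the slides.
stores = [(1, "Campus Books", "555-0100"), (2, "Main Street Books", "555-0199")]
books = [(10, "Second Chance", "Matthew Dunn"), (11, "XML Basics", "A. Author")]
store_books = [(1, 10, 12.50, 3), (1, 11, 30.00, 1), (2, 10, 11.00, 5)]


def export_by_store(stores, books, store_books):
    """Build an XML tree with one <store> element per store and its books nested inside."""
    book_by_id = {bid: (title, authors) for bid, title, authors in books}
    root = ET.Element("stores")
    for sid, name, phone in stores:
        store_el = ET.SubElement(root, "store", sid=str(sid))
        ET.SubElement(store_el, "name").text = name
        ET.SubElement(store_el, "phone").text = phone
        # StoreBook is the join table: keep only the rows for this store.
        for sb_sid, bid, price, stock in store_books:
            if sb_sid != sid:
                continue
            title, authors = book_by_id[bid]
            book_el = ET.SubElement(store_el, "book", bid=str(bid))
            ET.SubElement(book_el, "title").text = title
            ET.SubElement(book_el, "authors").text = authors
            ET.SubElement(book_el, "price").text = f"{price:.2f}"
            ET.SubElement(book_el, "stock").text = str(stock)
    return root


root = export_by_store(stores, books, store_books)
xml_text = ET.tostring(root, encoding="unicode")
```

The join that a relational query would express with StoreBook becomes nesting in the XML: a book sold in two stores appears once under each store, with that store's price and stock.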
49Converting Relational Database to XML (Contd)
50Examples
- example of database
- Example of database converted to XML
51XML representation of a sample Movie Database
- <?xml version="1.0" standalone="yes"?>
-
-
-
- The Notebook
- Ryan Gosling
- Rachel McAdams
- Nick Cassavetes
-
-
- FRIENDS
- Seinfeld
-
52RSS (Really Simple Syndication)
- RSS is a family of web feed formats used to publish frequently updated digital content, such as blogs, news feeds or podcasts.
- Users of RSS content use programs called feed "readers" or "aggregators". The user "subscribes" to a feed by supplying the reader with a link to the feed. The reader can then check the user's subscribed feeds to see whether any of them have new content since the last time it checked and, if so, retrieve that content and present it to the user.
- RSS formats are specified in XML (a generic specification for data formats). RSS delivers its information as an XML file called an "RSS feed," "webfeed," "RSS stream," or "RSS channel".
53RSS Feed representation
- On Web pages, web feeds (RSS) are typically linked with the word "Subscribe", an orange square, or a rectangle with the letters XML or RSS.
- Many news aggregators such as msnbc.com publish subscription buttons for use on Web pages to simplify the process of adding news feeds.
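The check a feed reader performs on a subscribed feed can be sketched as follows, assuming an RSS 2.0 document (`rss` / `channel` / `item`). The feed content, the URLs and the function name `new_items` are made up for illustration, and a real reader would download the feed text from the subscribed link instead of using a string.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 feed; a real reader would fetch this text over HTTP.
feed_text = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Campus News</title>
    <link>http://www.example.com/</link>
    <description>A made-up feed for illustration</description>
    <item>
      <title>First story</title>
      <link>http://www.example.com/1</link>
    </item>
    <item>
      <title>Second story</title>
      <link>http://www.example.com/2</link>
    </item>
  </channel>
</rss>"""


def new_items(feed_text, seen_links):
    """Return (title, link) pairs for items whose link has not been seen before."""
    channel = ET.fromstring(feed_text).find("channel")
    return [
        (item.findtext("title"), item.findtext("link"))
        for item in channel.findall("item")
        if item.findtext("link") not in seen_links
    ]


# Pretend the reader already showed the first story on its last check.
unseen = new_items(feed_text, seen_links={"http://www.example.com/1"})
```

Here the item link is used to decide what counts as new, which is a simplification chosen for this sketch.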
54Podcasting
- A podcast is a media file that is distributed over the Internet using syndication feeds, for playback on portable media players and personal computers.
- The term "podcast" is derived from Apple's portable music player, the iPod.
- Though podcasters' web sites may also offer direct download or streaming of their content, a podcast is distinguished from other digital audio formats by its ability to be downloaded automatically, using software capable of reading feed formats such as RSS.
55Podcasting
- Podcasting is an automatic mechanism whereby multimedia computer files are transferred from a server to a client, which pulls down XML files containing the Internet addresses of the media files. In general, these files contain audio or video, but also could be images, text, PDF, or any file type.
- Example: StFX Podcasts
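In RSS 2.0, the Internet address of a media file is carried in an item's `enclosure` element. The sketch below pulls those addresses out of a feed; the feed content, the URLs and the function name `media_addresses` are made up for illustration, and the download step itself is left out.

```python
import xml.etree.ElementTree as ET

# A made-up podcast feed with one audio file and one PDF.
podcast_feed = """<rss version="2.0">
  <channel>
    <title>Example Podcast</title>
    <link>http://www.example.com/podcast</link>
    <description>A made-up podcast feed</description>
    <item>
      <title>Episode 1</title>
      <enclosure url="http://www.example.com/ep1.mp3"
                 length="1234567" type="audio/mpeg"/>
    </item>
    <item>
      <title>Show notes</title>
      <enclosure url="http://www.example.com/notes.pdf"
                 length="2048" type="application/pdf"/>
    </item>
  </channel>
</rss>"""


def media_addresses(feed_text):
    """Return (url, media type) for every enclosure found in the feed."""
    root = ET.fromstring(feed_text)
    return [(enc.get("url"), enc.get("type")) for enc in root.iter("enclosure")]


media = media_addresses(podcast_feed)
```

A podcast client would then fetch each address in `media` automatically, which is what separates a podcast from a plain download link.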
56XML Joke
- Question: When should I use XML?
- Answer: When you need a buzzword in your resume.