Title: Network publishing and mark-up languages
1 Network publishing and mark-up languages
2 p- versus e-form
- The share of documents that exist in e-form and are accessible over the network is growing fast.
- Some types of documents will always exist in paper form, or will co-exist in both forms.
- Numerous types of documents already function better in e-form, at least for some populations (or generations).
- An e-document can replace the paper form only if it is readable without limits of time and place.
3 Independence of place
- A document should be usable in the same way irrespective of the server's distance and of the user's hardware and software for reading.
- It is mainly a technological problem. We need
  - a reader with the size and weight of a book,
  - with a screen that has the visual characteristics of paper,
  - with an autonomous power supply,
  - with a wireless connection to the servers holding the documents.
- We already have all of that, but not in one reasonably priced appliance.
4 Independence of time
- A document must keep the same usability for as long as it has potential readers.
- It is mainly an organisational problem:
  - we need standard document formats that will be understood by future generations of software and hardware, and
  - we need a consensus to follow those standards.
- Such standard formats are defined with mark-up languages.
5 History of e-publishing
- At the beginning, the creators of e-documents were few, mostly producers of bibliographic databases.
- They independently developed the formats of their e-documents and the software for using them.
- This was easy because e-documents were simple ASCII files.
- Things became harder with the development of more complex, multimedia documents.
6 History of e-publishing
- Because of the lack of standardisation, new documents were not automatically usable on every operating system and brand of computer.
- Producers of e-documents were forced to develop different versions of their documents, and/or of the software for using them, for all major operating systems and brands of computers.
- This was economically unfeasible, and the only way out was standardisation.
- Standardisation of computers? Impossible. Standardisation of e-document formats.
7 Mark-up languages: introduction
- Mark-up languages must make possible
  - the transfer of documents between different types of computers and reading software,
  - simple and economical transport over networks,
  - longevity of documents (the problem of e-archiving).
- Mark-up languages enable us to mark the structure and/or the form (appearance) of documents.
8 Mark-up languages: introduction
- Mark-up languages are artificial languages, composed of
  - labels (tags) that divide a document into structural elements,
  - tags that describe the appearance of structural elements, and
  - a syntax that defines the appropriate use of tags.
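The role of syntax can be illustrated with a minimal sketch. It assumes a hypothetical toy notation, not any real mark-up language: `<name>` opens a structural element and `</name>` closes it. The function `is_well_formed` is an illustrative name; it only checks that tags are closed in the reverse order in which they were opened.

```python
import re

# Assumption: a toy tag notation with lowercase names and no attributes.
TAG = re.compile(r"<(/?)([a-z]+)>")


def is_well_formed(text: str) -> bool:
    """Return True if every opening tag is closed, in properly nested order."""
    stack = []
    for match in TAG.finditer(text):
        closing, name = match.group(1), match.group(2)
        if not closing:
            # Opening tag: remember which element is currently open.
            stack.append(name)
        elif not stack or stack.pop() != name:
            # Closing tag with nothing open, or closing the wrong element.
            return False
    # Anything still on the stack was opened but never closed.
    return not stack


if __name__ == "__main__":
    print(is_well_formed("<doc><title>A</title><para>B</para></doc>"))
    print(is_well_formed("<doc><title>A</doc></title>"))
```

A real mark-up language adds much more to its syntax (attributes, entities, rules for omitted tags), so this is only a picture of the idea that tags alone are not enough without rules for their use.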
9 Structure vs. appearance
- If a mark-up language defines only the structure of a document, then its appearance on screen or paper depends entirely on the software used to present the document.
- In that case the structure of the document is separated from the definitions of fonts, background colour, line spacing, etc. Such attributes are the domain of the printing-house (typesetting) mark-up languages.
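The separation of structure from appearance can be sketched as follows. The element names (`title`, `para`) and both renderers are hypothetical, chosen only for illustration: the same structural document is handed to two pieces of presentation software, and each decides the appearance on its own.

```python
import html

# A document marked up for structure only: (element, text) pairs.
# Assumption: "title" and "para" are made-up element names.
document = [
    ("title", "Network publishing"),
    ("para", "Mark-up languages describe structure."),
]


def render_plain(doc):
    """Plain-text presentation: titles in capitals, underlined with '='."""
    lines = []
    for element, text in doc:
        if element == "title":
            lines.append(text.upper())
            lines.append("=" * len(text))
        else:
            lines.append(text)
    return "\n".join(lines)


def render_html(doc):
    """HTML presentation: the same elements mapped to h1 and p."""
    tags = {"title": "h1", "para": "p"}
    return "\n".join(
        f"<{tags[element]}>{html.escape(text)}</{tags[element]}>"
        for element, text in doc
    )


if __name__ == "__main__":
    print(render_plain(document))
    print(render_html(document))
```

The document itself never mentions fonts or layout; changing the appearance means changing the renderer, not the document.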
10 RTF
- A very common type of e-document is one written with a word processor, e.g. MS Word for Windows.
- In such documents structure and appearance are inseparable.
- The result is a very limited ability to transfer documents between different types of computers or operating systems, and even between different generations of the same word processor.
- The more advanced a word processor is, the more closed a system it is.
11 RTF
- There is a strong need for transportability of word processor files.
- Developers of word processors agreed upon a common transport format, which is understood by most programs of that kind.
- This is RTF (Rich Text Format).
- RTF describes only a document's appearance.
12 RTF example
\rtf1\ansi\deff20\deflang1033\fonttbl\f4\froman\fcharset0\fprq2 Times New Roman
\f5\fswiss\fcharset0\fprq2 Arial\f20\fnil\fcharset0\fprq2 SLOHelvetica
\colortbl\red0\green0\blue0\red0\green0\blue255
... 20 to 30 lines with a general description of fonts and distances follow ...
\pard\plain \qr\sb40\sa40\tx357 \f20\fs20\lang2057 \fs18 Lecture Computer communications, Databases 2 \par
\pard\plain \s18\sb40\sa40\tx357 \b\f20\fs30\lang2057 \i\fs32 Predavanje Standardi za oznacevanje dokumentov \par
\pard\plain \s1\fi-360\li360\sb240\sa40\tx357 \b\f20\fs28\lang2057 1.\tab Reasons for standardisation of document descriptio
The example contains
- tags for types and colours of fonts (e.g. \fonttbl, \f20, \fs20, \colortbl),
- tags for text positioning (e.g. \qr, \li360, \sb40, \tx357).
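To see how much of such a file is appearance mark-up, the control words can be pulled out with a short script. This is a minimal sketch, not an RTF parser: it assumes that a control word is a backslash, lowercase letters and an optional (possibly negative) number, and it ignores groups, control symbols and escaped characters. The function name `control_words` is hypothetical.

```python
import re

# Assumption: simplified control-word shape, e.g. \pard, \fs20, \fi-360.
CONTROL_WORD = re.compile(r"\\([a-z]+)(-?\d+)?")


def control_words(rtf: str):
    """Return (name, parameter) pairs; parameter is None when absent."""
    return [
        (name, int(number) if number else None)
        for name, number in CONTROL_WORD.findall(rtf)
    ]


if __name__ == "__main__":
    sample = r"\pard\plain \qr\sb40\sa40\tx357 \f20\fs20\lang2057 \fs18 Lecture"
    for name, parameter in control_words(sample):
        print(name, parameter)
```

Run on the first paragraph of the example, the script finds ten control words and a single word of actual text, which shows that the mark-up concerns fonts and positioning rather than the structure of the document.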
13 PostScript and PDF
- PostScript (Adobe)
  - Mark-up (page description) language for driving laser printers.
  - Describes only a document's appearance, including images.
- PDF (Portable Document Format)
  - Preserves the original appearance of a document when it is viewed on screen, e.g. in a web browser.
  - Documents look the same on screen as on paper.
  - A simplified and extended variant of PostScript.
  - Describes appearance and only partly structure (hypertext links).
14 SGML
- SGML: Standard Generalized Markup Language.
- An international standard, adopted by ISO in 1986 (ISO 8879) and updated several times since then.
- A family of standards governing the mark-up of all known types of e-documents.
- Its strength is generality, because the logical structure and the appearance of a document are completely separated.
- Appearance is left to the software that presents documents on screen or paper.
15 SGML
- SGML divides an e-document into three parts:
  - Declaration, which describes the most general data about the document (e.g. Latin or Cyrillic script) and the symbols with special meaning for SGML.
  - Document Type Definition (DTD), which describes
    - the possible structural elements of the document,
    - their meaning,
    - the hierarchical relationships among structural elements, and
    - the tags that mark these structural elements.
  - Body of the document, marked with tags.
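The relationship between a DTD and a document body can be sketched with a toy content model. This is not SGML syntax and not a real DTD: it assumes a hypothetical dictionary that only lists which child elements each element may contain, whereas a real DTD also expresses order and number of occurrences. The element names and the function `violations` are made up for illustration.

```python
# Assumption: a DTD-like content model, element -> allowed child elements.
CONTENT_MODEL = {
    "lecture": {"title", "section"},
    "section": {"title", "para"},
    "title": set(),
    "para": set(),
}


def violations(node, model=CONTENT_MODEL):
    """Check a body given as nested (element, children) tuples.

    Returns a list of messages; an empty list means the body
    respects the hierarchy described by the content model.
    """
    name, children = node
    if name not in model:
        return [f"unknown element: {name}"]
    problems = []
    for child in children:
        child_name = child[0]
        if child_name not in model[name]:
            problems.append(f"{child_name} not allowed in {name}")
        # Descend so that nested elements are checked as well.
        problems.extend(violations(child, model))
    return problems


if __name__ == "__main__":
    body = ("lecture", [("title", []), ("section", [("title", []), ("para", [])])])
    print(violations(body))
    print(violations(("lecture", [("para", [])])))
```

The point of the split is that the same checking software can handle any document type once it is given the corresponding definition.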
16 HTML
- SGML is not a mark-up language in itself but a recipe for building mark-up languages for different types of documents.
- HTML is such a language, developed for web documents.
- It is relatively simple, which is why web publishing is so easy.
- In its original versions it mostly defines the structure and only partly the appearance of documents.
- Author: Tim Berners-Lee (early 1990s).
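Because HTML tags mostly mark structure, software can recover the outline of a page without knowing anything about its appearance. The following is a minimal sketch using `HTMLParser` from Python's standard library; the class `OutlineParser`, the helper `outline` and the sample page are hypothetical and only illustrate the idea.

```python
from html.parser import HTMLParser


class OutlineParser(HTMLParser):
    """Collect (tag, text) for headings h1-h3, ignoring all other mark-up."""

    HEADINGS = {"h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.outline = []
        self._current = None  # heading tag currently open, if any
        self._buffer = []     # text pieces seen inside that heading

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            self.outline.append((tag, "".join(self._buffer).strip()))
            self._current = None


def outline(html_text):
    parser = OutlineParser()
    parser.feed(html_text)
    parser.close()
    return parser.outline


if __name__ == "__main__":
    page = "<html><body><h1>Mark-up languages</h1><p>Text</p><h2>HTML</h2></body></html>"
    print(outline(page))
```

How the headings are finally displayed (font, size, colour) is left to the browser, which is the separation of structure and appearance described earlier.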
17 HTML
- The standardisation of HTML has been endangered since its birth.
- The big producers of web browsers, Microsoft and Netscape, try to impose their own tags and functionality to beat the competition.
- Documents written with some HTML editors cannot be read in browsers made by the competition.
- It is safer to use plain Notepad or Netscape's editor.