Title: Publishing on the WWW
1. Publishing on the WWW
- Miroslav Milinovic
- Croatian Academic and Research Network - CARNet
- Zagreb, Croatia
- <miro@srce.hr>
5th CEENet Workshop on Network Technology,
Budapest, Hungary, August 1999.
2. Content
- WWW - important concepts
- HTML standards
- Basic tags and concepts
- Cascading Style Sheets (CSS)
- Active Web pages
- Markup story
- Metadata
- Robots
- Internationalization
- Authoring Web pages
- Authoring and validation tools
- Promoting your WWW site
- What is a good design?
3. WWW - World Wide Web
- Distributed, multimedia information service based on hypertext
- Distributed
- information located on hosts around the world
- Multimedia
- information includes text, graphics, sound, video
- Hypertext
- hypertext techniques used to enable access to the information
4. WWW - important concepts
- WWW resources (documents) are prepared using a simple, standard markup language which defines the document's content, appearance and links to other documents
- Documents have a unique identifier
- the identifier depends on their location on a particular host
- Clients can communicate with any server
- using the correct protocol
5. WWW - important concepts
- HTML - HyperText Markup Language
- language for preparing WWW documents
- URL - Uniform Resource Locator
- resource address - unique identifier
- HTTP - HyperText Transfer Protocol
- defines communication between the WWW client and server
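To make the URL concept concrete, here is a minimal Python sketch that splits a URL into the parts a client uses to locate a resource. The host comes from the Hyperlinks slide. The path, query and fragment are invented for illustration.

```python
from urllib.parse import urlsplit

# Illustrative URL: host from the slides, everything after it is made up
url = "http://www.carnet.hr:80/docs/file.html?lang=hr#link_point"
parts = urlsplit(url)

print(parts.scheme)    # protocol the client must speak (here HTTP)
print(parts.hostname)  # host that stores the resource
print(parts.port)      # TCP port on that host (80 is the HTTP default)
print(parts.path)      # location of the resource on the host
print(parts.query)     # parameters passed to the server
print(parts.fragment)  # anchor inside the document (never sent to the server)
```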
6. How does the WWW work?
[Figure: authors write HTML files and place them on WWW servers on the Internet; users browse them.]
7. HTML
- SGML (Standard Generalized Markup Language)
- ISO standard
- HTML is an SGML application
- SGML Document Type Definition (DTD)
- <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN" "http://www.w3.org/TR/REC-html40/strict.dtd">
- standards
- HTML 1.0, 2.0, 3.0, 3.2, 4.0
- browser extensions (Netscape, MS IE, ...)
- other (VRML, DHTML, SMIL, MathML, CSS, XML, XSL, ...)
- XHTML 1.0 (in draft)
8. HTML file vs. Web page
[Figure: an HTML source file and the same Web page as displayed by a browser.]
9. HTML document
- an HTML document contains markup tags
- <H1> Example </H1>
- tags are case insensitive
- <H1> or <h1>
- attribute values may be case sensitive
- e.g. filenames
- tags are (usually) paired to denote the start and end of an element
- <H1> Example </H1>
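The case rules above can be observed with Python's standard html.parser module. It reports tag and attribute names in lower case but passes attribute values through unchanged. The sample markup is made up for this sketch.

```python
from html.parser import HTMLParser


class TagLister(HTMLParser):
    """Record every start and end tag the parser reports."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # tag and attribute names arrive lower-cased; values are untouched
        self.events.append(("start", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


parser = TagLister()
parser.feed('<H1>Example</h1><IMG SRC="Globe.JPG">')
parser.close()
for event in parser.events:
    print(event)
```

`<H1>` and `</h1>` are treated as the same element. The file name keeps its capitals because a server may treat `Globe.JPG` and `globe.jpg` as different files.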
10. HTML document
- . . . <tag attribute=value . . .> . . . </tag> . . .
- between the tags: text and/or other tags
- the tag pair with its content forms an element
11. Minimal HTML document
- <html>
- <head>
- <title> document title </title>
- </head>
- <body>
- document body - text . . .
- </body>
- </html>
12. Some formatting tags
- paragraph - <P> ... </P>
- line break - <BR>
- headings (n = 1, ..., 6) - <Hn> ... </Hn>
- preformatted text - <PRE> ... </PRE>
- centered text - <CENTER> ... </CENTER>
- horizontal line - <HR>
- remark (comment) - <!-- ... -->
13. Some formatting tags
- physical formatting tags
- bold - <B> ... </B>
- italic - <I> ... </I>
- underline - <U> ... </U>
- logical formatting tags
- emphasis - <EM> ... </EM>
- strong - <STRONG> ... </STRONG>
- code - <CODE> ... </CODE>
14. Lists
- bulleted list
- <UL>
- <LI> list element </LI>
- ...
- </UL>
- numbered list - <OL>
- definition list - <DL>, <DT>, <DD>
- ...
15. Hyperlinks
- <A HREF="url"> ... </A>
- hyperlink to any type of resource (not only HTML)
- other protocols can be used (not only HTTP)
- the resource is identified by a URL
- examples
- <A HREF="http://www.carnet.hr">CARNet</A>
- <A HREF="file.txt">text</A>
- to deal with (display) the resource, the browser may launch a helper application
16. Anchors
- <A NAME="anchor_name"> ... </A>
- sets up an anchor (point in the text to be linked to)
- is referenced as file_name#anchor_name
- example of an anchor
- <A NAME="link_point">text</A>
- example of a link to the anchor
- <A HREF="file.html#link_point">text</A>
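A browser resolves each HREF relative to the page that contains it, then separates the #anchor part. This Python sketch follows those steps for the link examples on these slides. The base page address is hypothetical.

```python
from urllib.parse import urldefrag, urljoin

# Hypothetical address of the page that contains the links
base = "http://www.carnet.hr/docs/index.html"

for href in ["file.txt", "file.html#link_point", "http://www.carnet.hr"]:
    absolute = urljoin(base, href)        # resolve relative to the current page
    target, anchor = urldefrag(absolute)  # document to fetch, anchor to jump to
    print(href, "->", target, "| anchor:", anchor or "(none)")
```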
17. Inline images
- <IMG attribute=value ...>
- some attributes
- SRC=url
- ALT=text
- ALIGN=align_value
- BORDER=n
- WIDTH=n
- HEIGHT=n
- example
- <IMG SRC="globe.jpg" ALT="globe" ALIGN="RIGHT">
18. Tables
- <TABLE attribute=value ...>
- <TR attribute=value ...> (table row)
- <TD attribute=value ...>
- text - data cell of the table
- </TD>
- ...
- </TR>
- ...
- </TABLE>
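Pages with tables are often generated rather than typed by hand. This minimal Python sketch emits the TABLE/TR/TD nesting shown above from a list of rows. BORDER=1 and the sample rows are assumptions chosen for illustration.

```python
from html import escape


def html_table(rows):
    """Build a <TABLE> from a list of rows, each a list of cell values."""
    lines = ["<TABLE BORDER=1>"]  # BORDER=1 is an arbitrary choice
    for row in rows:
        lines.append("<TR>")
        for cell in row:
            # escape <, > and & so cell text cannot break the markup
            lines.append(f"<TD>{escape(str(cell))}</TD>")
        lines.append("</TR>")
    lines.append("</TABLE>")
    return "\n".join(lines)


print(html_table([["Protocol", "Port"], ["HTTP", 80], ["FTP", 21]]))
```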
19. Frames
- <HTML>
- <HEAD>
- <TITLE>Frame Page</TITLE>
- </HEAD>
- <FRAMESET COLS="50,50">
- <FRAME SRC="PAGE_A.html">
- <FRAME SRC="PAGE_B.html">
- </FRAMESET>
- </HTML>
20. Fonts, colors, ...
- fonts
- <FONT attribute=value...>...</FONT>
- special chars
- <, >, &, @, national characters (e.g. Č, Ć, ...)
- code OR token (&#nnn; or &name;)
- colors
- attribute (tags BODY, FONT, ...)
- RGB, hex notation
- #rrggbb
- multimedia (EMBED, BGSOUND, ...)
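The Python sketch below shows both notations from this slide. It uses html.escape and html.unescape for the named "tokens" and numeric "codes". A small helper, hex_color, is a hypothetical name introduced here; it formats an RGB triple as #rrggbb.

```python
from html import escape, unescape


def numeric_ref(ch):
    """Numeric character reference (the "code" form) for one character."""
    return f"&#{ord(ch)};"


def hex_color(r, g, b):
    """#rrggbb notation for RGB components in the range 0..255."""
    return f"#{r:02x}{g:02x}{b:02x}"


print(escape("<B> & </B>", quote=False))  # named entities (the "token" form)
print(numeric_ref("Č"))                   # Unicode code point written in decimal
print(unescape("&lt; &#60; &#268;"))      # token and code forms decode alike
print(hex_color(255, 0, 0))               # pure red
```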
21. Additional features
- downloadable fonts
- Style Sheets
- active Web pages
- Metadata
- ...
- BE AWARE of constant development
- W3C - http://www.w3.org/
- HWG - http://www.hwg.org/
- IWA - http://www.iwanet.org/member/resources/
22. Cascading Style Sheets (CSS)
- mechanism for adding style to an HTML document
- designed to separate content from presentation
- cascading concept improves accessibility
- current recommendations (standards): CSS1, CSS2
- reference URLs
- http://www.w3.org/Style/
- http://www.htmlhelp.com/reference/css/
- "Hopefully, future Web innovations will emulate the example set by the Web Consortium in its work on CSS." - Jakob Nielsen
23. Cascading Style Sheets (CSS)
<STYLE TYPE="text/css"> css rules ... </STYLE>
<LINK REL="STYLESHEET" TYPE="text/css" HREF=".../my_style.css">
<TAG STYLE="css-rule; ...; css-rule">...</TAG>
24. Cascading Style Sheets (CSS)
H1 { font: 17pt "Arial CE"; font-weight: bold; color: red }
H2 { font: 15pt "Arial CE"; font-weight: bold; color: green }
P { font: 12pt "Courier New CE"; color: blue }
...
25. Active Web pages
- to enhance your site
- two-way interaction
- page animation
- better multimedia
- access to other systems
- browser intelligence
- desktop integration
26. Active Web pages
- techniques
- CGI - Common Gateway Interface
- SSI - Server Side Includes (.shtml)
- API - Application Programming Interface
- Cookies (making a browser remember)
- scripting languages (embedded in the HTML document)
- JavaScript, VBScript, ...
- Java (applets, servlets)
- ActiveX
- DHTML
28. Active Web pages
- common examples
- forms (feedback processing)
- special tags <FORM>, <INPUT>, <SELECT>, ...
- usually a CGI script is used to process a form
- active maps (clickable maps)
- special tags and attributes <MAP>, <IMG>, ...
- client-side or server-side (CGI scripts are used)
- database or other Internet service gateways
29. CGI
- the WWW server communicates with other programs (CGI scripts)
- CGI scripts should be in a separate directory defined in the WWW server's configuration file
- CGI scripts can be written in any programming language (shell script, Perl, C, ...)
- workload is on the server's side (be careful)
30. Calling a CGI script
<A HREF="program_url?parameter_list">
<IMG SRC="program_url?parameter_list">
<FORM ACTION="program_url?parameter_list">
program_url = http://server-name/cgi-bin-directory-name/program-name
parameter_list = par_1=val_1&...&par_n=val_n
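This is a minimal sketch of a CGI program in Python, one of the "any programming language" options. The server hands the parameter_list to the script in the QUERY_STRING environment variable. The script replies with an HTTP header, an empty line and the document. The parameter name "name" is hypothetical. The sketch parses the query with urllib.parse rather than Python's old cgi module, which was removed in Python 3.13.

```python
import os
import sys
from html import escape
from urllib.parse import parse_qs


def cgi_response(query_string):
    """Build a complete CGI response for a query like 'name=Ana&lang=hr'."""
    params = parse_qs(query_string)              # {'name': ['Ana'], 'lang': ['hr']}
    name = params.get("name", ["anonymous"])[0]  # hypothetical parameter "name"
    body = f"<HTML><BODY><P>Hello, {escape(name)}!</P></BODY></HTML>"
    # A CGI program writes HTTP header(s), an empty line, then the document
    return "Content-Type: text/html\n\n" + body


if __name__ == "__main__":
    # The WWW server passes the parameter_list in QUERY_STRING
    sys.stdout.write(cgi_response(os.environ.get("QUERY_STRING", "")))
```

Escaping the parameter before echoing it matters because the server runs the script on behalf of anyone who can type a URL.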
31. SSI, API, ...
- workload is on the server's side (be careful)
- SSI
- simple mechanism for generating pages on the fly
- .shtml
- API
- enables writing server extensions (plug-ins)
- not standardized (ISAPI, NSAPI, Apache API)
32. Cookies
- cookies.txt
- info about client-server communication
- enables browser intelligence
- server dependent
- browser dependent
- security ?
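The cookie round trip has two halves. The server sends a Set-Cookie header, and the browser stores the value and returns it in a Cookie header on later requests. This Python sketch uses the standard http.cookies module. The cookie names and values are hypothetical.

```python
from http.cookies import SimpleCookie

# Server side: ask the browser to remember a value
outgoing = SimpleCookie()
outgoing["visits"] = "3"
outgoing["visits"]["path"] = "/"
print(outgoing.output())  # header line the server sends

# Later request: the browser sends stored values back in a Cookie header
incoming = SimpleCookie()
incoming.load("visits=3; lang=hr")
print(incoming["visits"].value, incoming["lang"].value)
```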
33. Scripting languages
- JavaScript, VBScript, ...
- embedded in HTML source
- workload is on the client's side
- simple example
- <HTML>
- <HEAD>
- <SCRIPT LANGUAGE="JavaScript">
- document.write("Hello World!")
- </SCRIPT>
- </HEAD>
- <BODY>
- Example
- </BODY>
- </HTML>
34. Java
- object-oriented programming language
- platform independent
- programs transferred via the network and executed on the client side - applets
- Java programs executing on the server side - servlets
- special development tools (JDK, ...)
- http://www.javasoft.com
- http://www.javaworld.com
- http://www.gamelan.com
35. ActiveX
- platform designed by Microsoft
- special tools for programming with ActiveX
- can be used only on MS Windows 95 / NT ...
- workload is on client side
- not widely supported outside MS world
36. DHTML
- Dynamic HTML
- HTML + Style Sheets + Scripts
- extension to the HTML standard
- started by Microsoft and Netscape
- not only the <layer> tag
- enables users to activate (make dynamic) their pages
- DOM (Document Object Model)
- glue for DHTML
- platform-independent and language-independent interface
37. Markup story overview
- SGML
- the basic architecture behind all markup languages
- XML
- simplified SGML suitable for use on the WWW
- HTML
- an SGML application (DTD)
- XHTML - HTML written in XML
- Style Sheets
- add more presentation control
- CSS (for HTML)
- XSL (for XML)
38. XML, XSL
- XML (Extensible Markup Language)
- standard for structured documents on the Web, developed by W3C
- created as a subset of SGML optimised for the Web
- XML fits Web applications where HTML is insufficient
- XSL (Extensible Stylesheet Language)
- language for expressing stylesheets
- has two parts
- a language for transforming XML documents (XSLT)
- an XML vocabulary for specifying formatting semantics
39. CSS vs. XSL
40. XHTML
- challenges for HTML
- new kinds of browsers: digital TVs, handhelds, phones
- pressure to subset HTML for simple clients
- pressure to extend HTML for richer clients
- XHTML (Extensible HTML)
- HTML 4.0 (strict) written in XML
- modularised HTML for subsetting/combining with other tag-sets
- next generation forms
- XML requires
- case-sensitive tags (written in lower case)
- end tags, and a / added to empty tags
- attribute values in quotes
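An XML parser enforces the three requirements above and rejects anything that is not well-formed. This Python sketch uses the standard xml.etree.ElementTree parser to compare HTML 4.0-style markup with its XHTML form. The sample snippets and the helper name is_well_formed are made up for illustration.

```python
import xml.etree.ElementTree as ET

html_style = "<P>First paragraph<BR><IMG SRC=globe.jpg></P>"
xhtml_style = '<p>First paragraph<br /><img src="globe.jpg" alt="globe" /></p>'


def is_well_formed(markup):
    """True if an XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


print("HTML 4.0 style:", is_well_formed(html_style))  # unquoted value, unclosed <BR>
print("XHTML style:   ", is_well_formed(xhtml_style))
print("<p>...</P>:    ", is_well_formed("<p>Text</P>"))  # case matters in XML
```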
41. Other standards
- VRML (Virtual Reality Modeling Language)
- for modelling three-dimensional scenes
- MathML (Mathematical Markup Language)
- inclusion of mathematical expressions in Web pages
- XML application
- SMIL (Synchronized Multimedia Integration Language)
- XML-based language that allows authors to write interactive multimedia presentations
- ...
42. Metadata
- information about networked information
- no real standard (?)
- search engines make use of metadata (?)
- HTML has a META tag
- use metadata / META tag (with care)!
- useful URLs
- W3C - http://www.w3.org/Metadata/
- Dublin Core - http://purl.oclc.org/metadata/dublin_core/
43. META tag syntax
- two main types of META tag
- with NAME attribute
- used to specify information about the resource (AUTHOR, KEYWORDS, DESCRIPTION, TITLE, ...)
- <META NAME="value" CONTENT="value">
- with HTTP-EQUIV attribute
- used as the equivalent of an HTTP header
- <META HTTP-EQUIV="value" CONTENT="value">
- attribute CONTENT defines the actual metadata value
44. META tag - examples
- <HEAD>
- <TITLE>title text</TITLE>
- <META name="DESCRIPTION" content="short text">
- <META name="KEYWORDS" content="keyword list">
- </HEAD>
- <HEAD>
- <TITLE>title text</TITLE>
- <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-2">
- </HEAD>
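This is roughly how a search engine or robot might read META tags. The Python sketch below uses html.parser to collect NAME or HTTP-EQUIV keys with their CONTENT values. Lower-casing the keys is a choice made for this illustration, and the page is taken from the examples above.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect NAME / HTTP-EQUIV -> CONTENT pairs from META tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":  # tag names arrive lower-cased
            return
        attrs = dict(attrs)  # attribute names arrive lower-cased too
        key = attrs.get("name") or attrs.get("http-equiv")
        if key and "content" in attrs:
            self.meta[key.lower()] = attrs["content"]


page = """<HEAD><TITLE>title text</TITLE>
<META name="DESCRIPTION" content="short text">
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-2">
</HEAD>"""

collector = MetaCollector()
collector.feed(page)
collector.close()
print(collector.meta)
```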
45. Metadata - related work
- RDF - Resource Description Framework
- P3P - Platform for Privacy Preferences Project
- PICS - Platform for Internet Content Selection
- DSig - Digital Signatures
- GILS - Government (Global) Information Locator Service
46. Robots
- can place a heavy load on the network and server
- robots are expected to follow a code of ethics
- robot exclusion protocol
- ROBOTS META tag
- useful URL
- http://info.webcrawler.com/mak/projects/robots/robots.html
47. Robot Exclusion Protocol
- can be used by the web site administrator
- robots.txt file
- should be placed in the document root directory (at URL http://hostname/robots.txt)
- has special syntax
- example
- User-agent: *
- Disallow: /archives/
- Disallow: /working/
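This is how a well-behaved robot might apply such a file. The sketch uses Python's standard urllib.robotparser with the example rules, assuming the User-agent value is * (all robots). The robot name "ExampleBot" and the page paths are hypothetical, and "hostname" is the placeholder from the slide.

```python
from urllib.robotparser import RobotFileParser

# The example rules, as they would appear in http://hostname/robots.txt
rules = """\
User-agent: *
Disallow: /archives/
Disallow: /working/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())  # parse() takes an iterable of lines

for path in ["/index.html", "/archives/1998.html", "/working/draft.html"]:
    allowed = parser.can_fetch("ExampleBot", "http://hostname" + path)
    print(path, "allowed" if allowed else "disallowed")
```

A real robot would fetch the file with set_url() and read() before crawling. The protocol is voluntary, so it keeps polite robots out but does not protect the directories.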
48. ROBOTS META tag
- can be used by the web page author
- <META NAME="ROBOTS" CONTENT="content">
- content = ALL | NONE | directive ["," directive]
- directive = index | follow
- index = "INDEX" | "NOINDEX"
- follow = "FOLLOW" | "NOFOLLOW"
- default = INDEX, FOLLOW
- example
- <meta name="robots" content="index,nofollow">
49. Internationalization
- a real problem
- HTML 4.0 and HTTP 1.1
- UNICODE (not US-ASCII only)
- charset negotiation
- language negotiation
- META tag usage (overrides HTTP 1.0 limitations)
- <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-2">
50. Authoring Web pages
- We need
- authoring tools to create material
- HTML authoring
- tools for editing graphics (multimedia stuff)
- ...
- a WWW server - a place to put the material
- some publishing mechanism
- at least the ability to copy or FTP files to the right place on the server
51. Authoring HTML files
- HTML files can be created using
- simple editors
- specialized tools (HTML authoring tools)
- additional tools
- for creating (editing) multimedia stuff (graphics, audio, video)
- for HTML validation (validation tools)
- for developing Java code (JDK)
- ...
52. Simple editors
- Notepad, vi, emacs, joe, ...
- easy and cheap to start with
- no limitations in writing HTML
- not tied to (old) standards
- need (good) knowledge of HTML
- easy to make mistakes
- additional validation is necessary
53. Authoring tools
- HotMetal, HotDog, Netscape Composer, FrontPage, MS Office, Macromedia tools, ...
- commercial, shareware, freeware
- standalone or embedded in other programs
- provide an easy interface for HTML writers
- offer (limited) WYSIWYG
- automatic validation (stick to standard)
- contain other tools (image editing, ...)
54. Validation tools
- embedded in authoring tools
- standalone, available on the Internet
- W3C Validator; NetMechanic; Weblint; Kinder, Gentler HTML Validator
- check
- typing mistakes
- syntax errors
- conformance to a standard
- Validate your pages!
- Browsers sometimes do not forgive!
- improper nesting
- WWW site tune-up tools
- http://www.websitegarage.com/
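To show what "improper nesting" means to a validator, here is a toy Python checker built on html.parser. It is a sketch, not a substitute for the tools listed. It knows only a partial, illustrative list of empty elements. It is stricter than HTML 4.0, which lets some end tags (e.g. </P>, </LI>) be omitted. The names NestingChecker and check are hypothetical.

```python
from html.parser import HTMLParser

# Elements with no end tag in HTML 4.0 (partial list, for illustration only)
EMPTY = {"br", "hr", "img", "meta", "link", "input"}


class NestingChecker(HTMLParser):
    """Flag end tags that do not close the most recently opened element."""

    def __init__(self):
        super().__init__()
        self.stack = []  # currently open elements, innermost last
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag not in EMPTY:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in EMPTY:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            still_open = f"<{self.stack[-1]}>" if self.stack else "no element"
            self.errors.append(f"</{tag}> found while {still_open} is open")


def check(markup):
    checker = NestingChecker()
    checker.feed(markup)
    checker.close()
    return checker.errors + [f"<{tag}> never closed" for tag in checker.stack]


print(check("<B><I>bold italic</B></I>"))  # improper nesting
print(check("<B><I>bold italic</I></B>"))  # proper nesting
```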
56. Where to put the files?
- WWW server document tree
- > cd //htdocs/
- > chmod 775 .
- > vi index.html
- > ...(upload and/or edit files)...
- > chmod 664
- Home Pages
- > cd
- > chmod 711 .
- > mkdir public_html
- > chmod 755 public_html
- > ...(upload and/or edit files)...
- > chmod 644
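The octal modes in these commands are easier to check when they are written out as permission strings. This Python sketch uses the standard stat module to print each mode the way "ls -l" would. The labels describe what each command above applies to. On a directory, x lets the WWW server pass through (711 on the home directory), and r lets others list it or read files.

```python
import stat

# Octal modes from the commands above, with what each one is applied to
modes = [
    (0o775, "htdocs directory"),
    (0o664, "htdocs files"),
    (0o711, "home directory"),
    (0o755, "public_html directory"),
    (0o644, "home page files"),
]

for mode, what in modes:
    kind = stat.S_IFDIR if what.endswith("directory") else stat.S_IFREG
    print(f"{mode:o}  {stat.filemode(kind | mode)}  {what}")
```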
57. Promoting your WWW site
- Who knows that you have done a good job?
- register your site with the major search engines and catalogs
- one is not enough
- follow the rules
- use meta tags
- promotion tools
- http://www.submit-it.com
- http://www.register-it.com
- http://www.ambition.com/register
58. What is a good design?
- WWW is a new medium
- new way of publishing
- new rules of design
- prepare yourself before writing the actual page
- What do you want to say, and to whom?
- organize the material
- be concise and precise
- there is NO good design
- there is adequate design (for the selected purpose)
- good practices for WWW writers
59. What is a good design?
- good presentation
- no barriers between user and information
- useful content (quality)
- gives users something that they want
- effective information provision
- ease of access for all users
- efficient information provision
- economical use of (network) resources
60. What is a good design?
- be consistent
- layout
- library of icons, images, logo
- standard navigation bar (absolute, relative)
- metadata
- build a site, not an unrecognizable set of pages
- navigation, site map
- use multimedia in moderation
- do not overdo any of the style elements
- consult with users
61. What is a bad design?
- long pages, too much scrolling, boring text
- no structure
- no navigation
- What is this page about?
- complex (very long) URLs
- ...for ... click here
- ... best viewed with ...
- frames suck(?)
62. What is a bad design?
- overwhelmed with multimedia
- big images
- too many images and/or video clips
- obnoxious noise vs. background music
- boring or aggressive animation
- bad images (poor quality)
- colours, fonts, blink
- cool pages suck(?)
63. Good practices in writing HTML
- stick to the agreed HTML standards
- do not use private extensions
- do not use new tags as soon as they are announced
- think about all users
- (all) browsers and computer platforms
- with slow lines
- remember
- users do not like to scroll and read much
- do not use technology for technology's sake
64. Good and bad design
- final advice
- stop premature pages and bury dead ones
- future
- use CSS (the time has come)
- some useful URLs
- http://info.med.yale.edu/caim/manual/contents.html
- http://www.useit.com/
- http://www.glover.com/sucky.html/
65. Questions?