Markup Languages ML - PowerPoint PPT Presentation

About This Presentation
Title:

Markup Languages ML

Slides: 34
Provided by: yjs8
Category:
Tags: languages | markup

Transcript and Presenter's Notes

1
Markup Languages (ML)
[Title graphic: prefixes arranged around "ML": SSS, A, legal-X, SG, VOX, CP, X, DHT, G, HT, math, W, DS, C]
  • Yaakov J. Stein
  • Chief Scientist, RAD Data Communications

2
What do I do?
I digest, edit and produce documents
  • business letters
  • email
  • meeting summaries
  • proposals
  • reports
  • requirement specifications
  • project plans
  • web pages
  • research articles
  • review articles
  • books

3
What do others do?
  • Pretty much the same
  • US corporations produce >100 billion documents
    per year
  • 90% of a modern institution's information is in
    documents
  • >50% of a typical corporation's effort involves
    documents
  • That's why word processing SW
  • was expected to bring efficiency increases
  • But didn't!

4
Word processing?
  • PROs
  • makes nicer looking documents
  • expedites document sharing during creation
  • CONs
  • typically 30% of effort goes to formatting and reformatting
  • doesn't increase information accessibility
  • doesn't facilitate information mining

5
Databases?
  • The natural alternative to documents is
    databases
  • PROs
  • increase information accessibility
  • facilitate information mining
  • CONs
  • not human readable
  • format inflexible

6
The solution
  • What we really want is to write unconstrained
    text
  • but to have information retrieval as well!
  • Method 1: Automatic text analysis
  • AI program analyzes text
  • Recognizes document structure, sentence syntax
  • Performs gisting, facilitates information mining
  • Complete solution is equivalent to passing the
    Turing test
  • Method 2: Manual markup
  • Document author is responsible for marking
  • Clarifies document structure
  • Enables automated retrieval of selected
    information
  • Suggests presentation format

7
Why is text analysis hard?
  • The man cried FIRE !
  • The man cried FIRE the gun !
  • The man cried FIRE the gun maker !

8
Are MLs computer languages?
  • There are many different types of computer
    languages
  • procedural languages
  • for (n=0; n<10; n++)
  • if (n>5) printf("markup languages are fun!\n");
  • graphic languages
  • newpath
  • 0 0 moveto 0 1 lineto 1 1 lineto 1 0 lineto
  • closepath fill
  • database languages
  • SELECT book FROM biblio WHERE subject='DSP' AND
    author='STEIN'
  • logical languages
  • useful(DSP), useful(hardware), fun(DSP), fun(web)
  • interesting(X) if useful(X) and fun(X)
  • ?-interesting(X)
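
The logical-language example above states facts and a rule, then asks a query instead of giving step-by-step instructions. As a rough illustration only, the same facts and rule can be mimicked in Python with sets; the function name `interesting` mirrors the slide's predicate, and the set-based encoding is an assumption of this sketch, not how a logic-programming system works internally.

```python
# Facts from the slide, written as sets of atoms.
useful = {"DSP", "hardware"}
fun = {"DSP", "web"}


def interesting(useful_things, fun_things):
    """Rule from the slide: interesting(X) if useful(X) and fun(X)."""
    # Set intersection keeps exactly the atoms that satisfy both facts.
    return sorted(useful_things & fun_things)


if __name__ == "__main__":
    # Plays the role of the query ?-interesting(X)
    print(interesting(useful, fun))
```

The caller states what is wanted (things that are both useful and fun) and leaves the search to the rule, which is the sense in which such languages instruct the computer indirectly.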

9
They are!
  • Markup languages do not instruct computers directly,
    as procedural languages do,
  • but rather instruct them indirectly,
    as logical languages do
  • They do this by using
  • elements
  • attributes
  • entities
  • text

<BOOK SUBJECT="dsp">
  <TITLE FORMAT="short">DSP-CSP</TITLE>
  <AUTHOR>J. Stein</AUTHOR>
  This is a great book! &standard-disclaimer;
</BOOK>
(tags)

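
To make the four building blocks concrete, here is a minimal sketch that parses the BOOK example with Python's standard `xml.etree.ElementTree` and reports its elements, attributes, and text. It assumes that `standard-disclaimer` on the slide is an entity reference; the entity's replacement text below is invented purely for illustration.

```python
import xml.etree.ElementTree as ET

# The slide's BOOK example. The entity declaration and its replacement
# text are hypothetical; the slide only shows the entity's name.
DOC = """<!DOCTYPE BOOK [
  <!ENTITY standard-disclaimer "Opinions are my own.">
]>
<BOOK SUBJECT="dsp">
  <TITLE FORMAT="short">DSP-CSP</TITLE>
  <AUTHOR>J. Stein</AUTHOR>
  This is a great book! &standard-disclaimer;
</BOOK>"""


def describe(xml_text):
    """Return the root element's name, attributes, children, and free text."""
    root = ET.fromstring(xml_text)
    return {
        "element": root.tag,
        "attributes": dict(root.attrib),
        "children": [(c.tag, dict(c.attrib), c.text) for c in root],
        # Free text after the last child; the parser has already
        # expanded the entity into its replacement text.
        "text": " ".join(root[-1].tail.split()),
    }


if __name__ == "__main__":
    for key, value in describe(DOC).items():
        print(key, "=", value)
```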
10
Some markup element functions
  • Structural
  • Clarifies document structure
  • Delineates document parts
  • Descriptive (informative)
  • Indicates
  • Facilitates information retrieval
  • Presentational (display)
  • Presents information in nice format
  • Helps human readability
  • Referential (links, applications)
  • Provide hypertext links
  • Launch applications

11
Structural Markup
  • <HEADING>September 1, 2000</HEADING>
  • <GREETING>Dear Prof. Stein, </GREETING>
  • <BODY>
  • I would like to tell you how much I enjoyed
    reading your new text
  • Digital Signal Processing, A Computer Science
    Perspective.
  • I hope we will be able to meet at the next
    conference.
  • </BODY>
  • <SIGNATURE>
  • Sincerely,
  • Dee Espy
  • </SIGNATURE>

12
Descriptive Markup
  • <DATE>September 1, 2000</DATE>
  • Dear <PERSON>Prof. Stein,</PERSON>
  • I would like to tell you how much I enjoyed
    reading your new text
  • <BOOK>
  • Digital Signal Processing, A Computer Science
    Perspective.
  • </BOOK>
  • I hope we will be able to meet at the next
    <EVENT>conference.</EVENT>
  • Sincerely,
  • <PERSON>Dee Espy</PERSON>
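
Descriptive markup is what makes automated retrieval possible. The sketch below, using Python's standard `xml.etree.ElementTree`, pulls every PERSON or DATE out of the letter. The enclosing `<LETTER>` root element and the helper name `find_all` are assumptions added here so that the fragment parses as a single well-formed document.

```python
import xml.etree.ElementTree as ET

# The slide's letter, wrapped in a hypothetical <LETTER> root element.
LETTER = """<LETTER>
<DATE>September 1, 2000</DATE>
Dear <PERSON>Prof. Stein,</PERSON>
I would like to tell you how much I enjoyed reading your new text
<BOOK>Digital Signal Processing, A Computer Science Perspective.</BOOK>
I hope we will be able to meet at the next <EVENT>conference.</EVENT>
Sincerely,
<PERSON>Dee Espy</PERSON>
</LETTER>"""


def find_all(xml_text, tag):
    """Return the text of every element with the given descriptive tag."""
    root = ET.fromstring(xml_text)
    return [element.text.strip() for element in root.iter(tag)]


if __name__ == "__main__":
    print(find_all(LETTER, "PERSON"))
    print(find_all(LETTER, "DATE"))
```

No text analysis is needed to answer "who is mentioned in this letter?": the author's markup already carries that information.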

13
Presentational Markup
  • <RIGHT-JUSTIFY>September 1, 2000</RIGHT-JUSTIFY>
  • <BOLD>Dear Prof. Stein,</BOLD>
  • I would like to tell you how much I enjoyed
    reading your new text
  • <UNDERLINE>
  • Digital Signal Processing, A Computer Science
    Perspective.
  • </UNDERLINE>
  • I hope we will be able to meet at the next
  • <BLINK>conference.</BLINK>
  • Sincerely,
  • <IMAGE SRC="deesignature.jpg" ALIGN="left">
  • <FONT FACE="Times-Roman">Dee Espy</FONT>

14
Referential Markup
  • <today xlink:form="simple" href="date"
    actuate="auto">
  • Dear Prof. Stein,
  • I would like to tell you how much I enjoyed
    reading your new text
  • <A HREF="www.amazon.com/exec/obidos/ASIN/04712954">
  • Digital Signal Processing, A Computer Science
    Perspective.
  • </A>
  • I hope we will be able to meet at the next
  • <A HREF="conference">conference.</A>
  • Sincerely,
  • <IMAGE SRC="dee-signature.jpg" ALIGN="left">
  • <A HREF="mailto:dee@dee-epsy.net">Dee Espy</A>

15
Generalized Markup Language
  • William Tunnicliffe, Stanley Rice (1960s)
  • (independently) invent the idea of a structural markup
    language
  • Problem: need a different ML for each type of
    document
  • (letter, report, article, book,
    etc.)
  • Charles Goldfarb, Edward Mosher, Raymond Lorie
    (IBM, 1973)
  • invent Generalized Markup Language (GML)
  • Solution: use a metalanguage
  • Document Type Definition (DTD)
    defines tags
  • IBM marked up 90% of its documents with GML

16
With GML structure is evident
  • Library
  • Novels
  • Journals
  • Textbooks
  • Algebraic zoology
  • Botanical history
  • Computer poetry
  • DSP
  • DSP-CSP
  • DSP just for fun
  • Elementary QED
  • Title
  • Full: Digital Signal Processing:
    a Computer Science Perspective
  • Short: DSP-CSP
  • Author
  • Name: Jonathan (Y) Stein
  • Association: RAD Data Comm.
  • Publication
  • Publisher: John Wiley
  • Year: 2000
  • Location: New York
  • ISBN: 04712954

17
Standard Generalized Markup Language
  • Problems with GML
  • No validating parser
  • Not portable (between computer systems)
  • Solution
  • SGML
  • ANSI (1978)
  • ISO/IEC 8879 (1986) (Intl Org for Standardization
    / Intl Electrotechnical Commission)
  • JTC1/SC34/WG1 (WG 1 of SubCommittee 34 of Joint
    Technical Committee 1)
  • For presentation
  • Document Style Semantics and Specification
    Language (DSSSL)

18
SGML - cont.
  • If SGML is so good, why doesn't anyone use it?
  • Complexity
  • base standard >500 pages
  • SGML is a metalanguage
  • writing DTD is complex programming
  • marked up text is hard to read
  • DSSSL adds to complexity
  • Inflexibility - requires absolute conformity
  • assumes only one correct way to mark up
  • constrains author to dictated structure
  • not good at capturing author's structure

19
HyperText Markup Language
  • CERN (particle physics institute in Switzerland)
    was an early Internet adopter
  • Used extensively for collaboration (articles have
    long author lists)
  • Major problems with format incompatibility
  • only straight ASCII worked reliably
  • Tim Berners-Lee (computer specialist) defined
    requirements
  • simplicity (couldn't expect physicists to use
    SGML)
  • freedom (didn't need validation, let browser
    ignore bad markup)
  • needed hypertext links (including to documents
    over Internet)
  • presentational markup (papers must look nice -
    authors used to TeX)
  • Solution: HTML - a specific application of SGML
    (not a metalanguage)

20
HTML versions
  • HTML 1.0 (1989) Berners-Lee's original CERN version
  • hypertext, images, head/body structure,
    presentational markup
  • HTML 2.0 (1995) IETF standard - RFC 1866
  • added lists, forms, etc.
  • HTML 3.2 (1997) W3C recommendation (incorporates
    Netscape extensions)
  • added tables, applets, super/sub-scripts
  • HTML 4.0 (1997) W3C recommendation (and similar
    ISO/IEC 15445)
  • minimizes presentational markup
  • XHTML 1.0 (2000) present W3C recommendation
  • reformulates HTML in XML

21
HTML document structure
  • <HTML>
  • <HEAD>
  • global definitions such as
  • <TITLE>Web page title</TITLE>
  • </HEAD>
  • <BODY>
  • marked-up text
  • </BODY>
  • </HTML>
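
The head/body skeleton above can be inspected programmatically. As a minimal sketch, the class below uses Python's standard `html.parser.HTMLParser` to print the nesting of elements in a page; the class name `Outline` and the sample string `PAGE` are made up for this illustration. Note that this parser reports tag names in lower case.

```python
from html.parser import HTMLParser


class Outline(HTMLParser):
    """Record each start tag, indented by its nesting depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def handle_starttag(self, tag, attrs):
        self.lines.append("  " * self.depth + tag)
        self.depth += 1

    def handle_endtag(self, tag):
        self.depth -= 1


def outline(html_text):
    """Return the element outline of an HTML document as a list of lines."""
    parser = Outline()
    parser.feed(html_text)
    parser.close()
    return parser.lines


PAGE = (
    "<HTML><HEAD><TITLE>Web page title</TITLE></HEAD>"
    "<BODY>marked-up text</BODY></HTML>"
)

if __name__ == "__main__":
    print("\n".join(outline(PAGE)))
```

The sketch assumes every start tag has a matching end tag, as in the skeleton; real-world HTML often omits end tags, which this simple depth counter does not handle.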

22
Some HTML (body) elements
  • <H1>Level 1 Heading</H1>    Level 1 Heading
  • <H2>Level 2 Heading</H2>    Level 2 Heading
  • <H3>Level 3 Heading</H3>    Level 3 Heading
  • <EM> emphasized </EM>    emphasized
  • <P> Paragraph </P>    Paragraph
  • <A HREF="url">link</A>    link
  • <UL>
  • <LI> item 1 </LI>    • item 1
  • <LI> item 2 </LI>    • item 2
  • </UL>
  • <OL>
  • <LI> item 1 </LI>    1. item 1
  • <LI> item 2 </LI>    2. item 2
  • </OL>
  • <IMG SRC="url">

23
Problems with HTML
  • Presentational aspects have predominated
  • <B> bold text </B>
  • <BLINK> blinking text </BLINK>
  • <FONT COLOR="red"> red text </FONT>
  • Practically no descriptive markup
  • Search engines are reduced to flat text search
  • Search by topic only through keywords or portals
  • Not extensible
  • Can't add new tags
  • Unknown tags ignored
  • Links are relatively simple
  • Usually user action is required (except IMG)
  • Only full document (with offset) linkable
  • Link management is logistic nightmare

24
Not everything is HTML
  • Due to HTML limitations other tools are also
    used
  • Multimedia extensions
  • (dynamic) gif, jpg,
  • streaming audio
  • Common Gateway Interface
  • generate HTML on-the-fly
  • Perl, C,
  • Server Push - Server Pull
  • Javascript
  • Java

25
eXtensible Markup Language
  • Simplified (best parts of) SGML (subset of
    features)
  • Flexible content management tool
  • W3C recommendation(s)
  • Extensible - can add new elements (even without
    DTD)
  • Easy to create special purpose languages (with
    DTD/SCHEMA)
  • Includes HTML-like hypertext links
  • and extensions (XLINK, XPOINTER)
  • The future of the web !

26
XML - an Example
  • <?xml version="1.0" standalone="yes"?>
  • <bibliography>
  • <book isbn="04712954">
  • <title>Digital Signal Processing: a Computer
    Science Perspective</title>
  • <author>Jonathan (Y) Stein</author>
  • <publisher>John Wiley and Sons</publisher>
  • </book>
  • <article>
  • <title>False Alarm Reduction for ASR and
    OCR</title>
  • <author>Yaakov Stein</author>
  • <proceedings>Tenth AICVNN Symposium</proceedings>
  • <pages>195-200</pages>
  • </article>
  • ...
  • </bibliography>
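
Because the bibliography uses descriptive element names, a short program can query it much like a database. The sketch below uses Python's standard `xml.etree.ElementTree`; the helper names `titles_of` and `entries_by_author` are invented for this illustration, and the elided entries ("...") are simply left out.

```python
import xml.etree.ElementTree as ET

# The two entries shown on the slide; the elided ones are omitted.
BIBLIO = """<?xml version="1.0" standalone="yes"?>
<bibliography>
  <book isbn="04712954">
    <title>Digital Signal Processing: a Computer Science Perspective</title>
    <author>Jonathan (Y) Stein</author>
    <publisher>John Wiley and Sons</publisher>
  </book>
  <article>
    <title>False Alarm Reduction for ASR and OCR</title>
    <author>Yaakov Stein</author>
    <proceedings>Tenth AICVNN Symposium</proceedings>
    <pages>195-200</pages>
  </article>
</bibliography>"""


def titles_of(xml_text, kind):
    """Titles of all entries of one kind, e.g. 'book' or 'article'."""
    root = ET.fromstring(xml_text)
    return [entry.findtext("title") for entry in root.findall(kind)]


def entries_by_author(xml_text, name):
    """(kind, title) for every entry whose author contains the given name."""
    root = ET.fromstring(xml_text)
    return [
        (entry.tag, entry.findtext("title"))
        for entry in root
        if name in entry.findtext("author", default="")
    ]


if __name__ == "__main__":
    print(titles_of(BIBLIO, "article"))
    print(entries_by_author(BIBLIO, "Yaakov"))
```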

27
What can we do with an XML file?
  • Check if well-formed
  • Check if valid (against DTD or schema)
  • Display as-is in browser
  • Parse in special-purpose program (SAX, DOM)
  • Process (XSL) to XML, HTML, etc.
  • Display after processing
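
Two of the operations listed above can be sketched with Python's standard library alone: a well-formedness check (via `xml.etree.ElementTree`) and event-driven SAX parsing (via `xml.sax`). The helper names `is_well_formed`, `TagCounter`, and `count_tags` are invented for this illustration. Validation against a DTD or schema and XSL processing are not shown, since this sketch restricts itself to the standard library.

```python
import xml.sax
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """True if the text parses as XML; no DTD or schema is consulted."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


class TagCounter(xml.sax.ContentHandler):
    """SAX handler that counts how often each element name starts."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        self.counts[name] = self.counts.get(name, 0) + 1


def count_tags(xml_text):
    """Stream through the document with SAX and tally element names."""
    handler = TagCounter()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.counts


if __name__ == "__main__":
    print(is_well_formed("<a><b/></a>"))
    print(is_well_formed("<a><b></a>"))
    print(count_tags("<a><b/><b/></a>"))
```

SAX reports elements as events while reading, so it suits large files; a DOM-style parser builds the whole tree in memory first, which is more convenient for queries.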

28
Wireless Markup Language
  • Markup language element of Wireless Application
    Protocol
  • WAP forum (1997)
  • Ericsson, Motorola, Nokia, Unwired Planet
    (phone.com)
  • bring Internet to cellular phone users
  • re-use fundamental Internet concepts (TCP/IP,
    http, html, javascript)
  • but adapted to lower bandwidth
  • smaller screen
  • limited input facilities
  • limited computational resources
  • applications scale across transport options (GSM,
    TDMA, CDMA, 3G)
  • and device types (mobile phones, personal
    assistants)

29
WML Philosophy
  • Defined using XML
  • Transported in compressed binary (for bandwidth
    reduction)
  • Applications are modeled as decks of cards
  • Features
  • Actions (OK, navigation, help) can be performed
  • Hyperlinks (like in HTML)
  • String variables
  • Timers
  • WBMP images (black and white)
  • Select boxes, forms (for input)
  • wmlscript (like javascript)

30
WML structure
  • <?xml version="1.0"?>
  • <!DOCTYPE wml >
  • <wml>
  • <card>
  • <p>
  • text
  • </p>
  • <p>
  • text
  • </p>
  • </card>
  • <card>
  • ...
  • </card>
  • </wml>
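
Since WML is defined using XML, a deck can be read with any XML parser. The sketch below lists the cards in a deck and the paragraphs on each card, using Python's standard `xml.etree.ElementTree`. The card `id` values and paragraph text are made up for this illustration, and the DOCTYPE line is left out because the slide does not show its contents.

```python
import xml.etree.ElementTree as ET

# A deck following the slide's structure; ids and text are hypothetical.
DECK = """<?xml version="1.0"?>
<wml>
  <card id="first">
    <p>text</p>
    <p>text</p>
  </card>
  <card id="second">
    <p>more text</p>
  </card>
</wml>"""


def cards(deck_text):
    """Map each card's id to the list of its paragraph texts."""
    root = ET.fromstring(deck_text)
    return {
        card.get("id"): [p.text for p in card.findall("p")]
        for card in root.findall("card")
    }


if __name__ == "__main__":
    for card_id, paragraphs in cards(DECK).items():
        print(card_id, paragraphs)
```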

31
Some WML elements
  • <p> </p>    text
  • <a href="..."> </a>    hyperlink (anchor)
  • <do> </do>    action
  • <go href="..."/>    go to WML page
  • <timer>    trigger event
    (units: tenths of a second)
  • <input/>    input user text
  • <prev/>    return to previous page
  • $( )    value of variable
  • <img src="" />    display image
  • <postfield name="" value=""/>    set variable
  • <select> <option> <option> </select>    select box
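
WML refers to a variable's value with the form `$(name)`. As a simplified sketch of how a browser might substitute variables into card text, the function below handles only that parenthesized form; the function name `substitute` and the sample variable are invented here, and treating an unset variable as an empty string is an assumption of this sketch.

```python
import re

# Matches $(name) where name is made of letters, digits, or underscores.
VARIABLE = re.compile(r"\$\((\w+)\)")


def substitute(text, variables):
    """Replace each $(name) with its value; unset names become ''."""
    return VARIABLE.sub(lambda m: variables.get(m.group(1), ""), text)


if __name__ == "__main__":
    print(substitute("Hello $(user)!", {"user": "Dee"}))
```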

32
Some more markup languages
  • VML Vector (graphics) Markup Language
  • VoiceXML
  • SSML Speech Synthesis Markup Language
  • CPML Call Policy Markup Language
  • DSML Directory Services Markup Language
  • MathML Mathematical Markup Language
  • CML Chemical Markup Language
  • AML Astronomical Markup Language
  • LegalXML
  • BSML Bioinformatic Sequence Markup Language
  • GedML Genealogical Data Markup Language
  • FinXML Financial market Markup Language
  • ChessML
  • SDML Signed Document Markup Language
  • RELML Real Estate Listing Markup Language
  • etc. etc. etc. ...

33
Examples
  • HTML
  • html examples
  • XML
  • xml-file xsl-file xml
  • VML
  • vml-file
  • WML (get M3gate emulator)
  • wml examples