Title: Traditional Electronic Printing On The Internet
1Traditional Electronic Printing On The Internet
- William J. "Bill" McCalpin
- EDPP, CDIA, MIT, LIT
- Principal, MHE
2- Xplor 21st Global Conference and Exhibit
- Miami Beach, Florida
- October 30, 2000
3Printing Versus The Internet
4Printing Versus The Internet
- Electronic printing is a $125,000,000,000 (US)
industry worldwide (www.xplor.org)
- There are now an estimated 98,685,000 host
computers on the Internet (www.mids.org)
- Xplor International estimates that the production
of both paper documents and electronic documents
is still increasing
- So, for a while yet, we're living in a hybrid
world
5Printing Versus The Internet
- Customer service needs identical look and feel in
paper and electronic documents
- Regulatory agencies continue to have an interest
in document presentation
- Customers need a re-education process as
documents change media
- Hence, there are good reasons in the short run to
be concerned about presentation
6The Nature Of Print Streams
7EBCDIC Versus ASCII
- BCD - Binary Coded Decimal
- BCDIC - Binary Coded Decimal Interchange Code
- EBCDIC - IBM Extended Binary Coded Decimal
Interchange Code
- ASCII - American Standard Code for Information
Interchange
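To make the difference between the two encodings concrete, here is a minimal Python sketch that encodes the same string both ways. It assumes that Python's built-in `cp037` codec (EBCDIC, US/Canada) is an acceptable stand-in for "EBCDIC"; a real mainframe print stream may use a different EBCDIC code page.

```python
# Encode one string as ASCII and as EBCDIC and compare the byte values.
# Assumption: code page 037 (Python codec "cp037") stands in for "EBCDIC";
# actual mainframe data may use another EBCDIC code page.
text = "This is text"

ascii_bytes = text.encode("ascii")
ebcdic_bytes = text.encode("cp037")

print(ascii_bytes.hex().upper())   # ASCII byte values
print(ebcdic_bytes.hex().upper())  # EBCDIC byte values

# The same characters round-trip through either encoding.
assert ebcdic_bytes.decode("cp037") == ascii_bytes.decode("ascii")
```

The characters are identical; only the byte values that represent them differ, which is why any move between the two families starts with a code page translation.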
8EBCDIC Line Data
- EBCDIC encoded - 8 bit
- Record-oriented because of IBM OSs
- Carriage controls
- Machine carriage controls
- ANSI carriage controls
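As an illustration of how ANSI carriage controls drive the page, the following Python sketch turns records that start with an ANSI control character into ASCII text with print controls. It is a sketch under assumptions: records are already separated, the data is in EBCDIC code page 037, only the five most common ANSI controls are handled, and the function name `ansi_to_ascii` is made up for this example.

```python
# Interpret ANSI carriage controls in EBCDIC line data.
# Assumptions: the first byte of each record is an ANSI carriage control,
# records are already split (as on a record-oriented OS), and the data is in
# code page 037. ANSI controls describe the action taken BEFORE printing.
ANSI_ACTIONS = {
    " ": "\n",      # space one line, then print
    "0": "\n\n",    # space two lines, then print
    "-": "\n\n\n",  # space three lines, then print
    "+": "\r",      # no spacing: overprint the previous line
    "1": "\f",      # skip to channel 1 (top of a new page), then print
}

def ansi_to_ascii(records):
    """Convert EBCDIC records with ANSI carriage controls to an ASCII string."""
    out = []
    for record in records:
        line = record.decode("cp037")
        control, data = line[:1], line[1:]
        # Unknown controls fall back to single spacing in this sketch.
        out.append(ANSI_ACTIONS.get(control, "\n"))
        out.append(data.rstrip())
    return "".join(out)

records = [
    "1     This is text".encode("cp037"),
    " second line".encode("cp037"),
    "0after a blank line".encode("cp037"),
]
result = ansi_to_ascii(records)
print(repr(result))
```

Machine carriage controls would need a different table, since they are printer command codes rather than printable characters.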
9ASCII Line Data
- ASCII encoded - 7 bit
- Record orientation is not intrinsic to OS
- Text files use print controls to delimit records
- Common print controls
- x'0d' - carriage return
- x'0a' - line feed
- x'0c' - form feed
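Because the records are delimited only by these print controls, a reader has to find them itself. The sketch below splits ASCII line data into pages and lines in Python. The function name `split_pages` is hypothetical, and it assumes that a form feed starts a new page and that lines end with CR LF, a bare LF, or a bare CR.

```python
# Split ASCII line data into pages and records using the print controls.
# Assumptions: form feed (0x0C) starts a new page; lines end with CR LF,
# a bare LF, or a bare CR.
def split_pages(data: bytes):
    """Return a list of pages, each a list of text lines."""
    text = data.decode("ascii")
    pieces = text.split("\x0c")
    if pieces and pieces[0] == "":
        pieces = pieces[1:]  # data began with a form feed: no page precedes it
    pages = []
    for piece in pieces:
        lines = piece.replace("\r\n", "\n").replace("\r", "\n").split("\n")
        if lines and lines[-1] == "":
            lines.pop()  # drop the empty piece after the final line ending
        pages.append(lines)
    return pages

sample = b"\x0c     This is text\r\nsecond line\r\n\x0cpage two\r\n"
pages = split_pages(sample)
print(pages)
```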
10The EBCDIC Family Tree
- EBCDIC text
- 1403 data - EBCDIC records with a carriage
control
- LCDS - Line conditioned data stream
- 3800 Mod I
- 3211 data with Xerox DJDEs
- Others
- AFP, MODCA, and IPDS
11The ASCII Family Tree
- ASCII text
- ASCII text with print controls
- ASCII text with escape sequences
- Epson MX-80
- Xerox UDK (XES)
- QMS QUIC
- IBM PPDS
- HP PCL
- Xerox Metacode
- Print programming languages using ASCII
- Interpress
- PostScript
12Line Data And Conditioned Line Data
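- 1403, 3211, other EBCDIC line data streams,
including Xerox DJDE
- 3800 Mod I and other IBM data streams
- ASCII text files of all sorts
- EBCDIC record with an ANSI carriage control (text
row, then the high and low hex digit of each byte)
- 1     This is text
- F44444E88A48A4A8AA
- 100000389209203573
- ASCII text with print controls (FF, CR, and LF
shown as stacked letters, then the hex digits)
- F                 CL
- F     This is textRF
- 02222256672672767700
- C00000489309304584DA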
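The stacked hex rows on this slide are the traditional mainframe "vertical hex" display: for each byte, the high hex digit goes on one row and the low hex digit on the row beneath it. A minimal Python sketch that reproduces them follows. The helper name `vertical_hex` is made up for this example, and code page 037 is assumed for the EBCDIC record.

```python
def vertical_hex(data: bytes):
    """Return (high, low): the high and low hex digit of each byte as two rows."""
    digits = data.hex().upper()
    return digits[0::2], digits[1::2]

# EBCDIC record: ANSI carriage control "1", five blanks, then the text.
# Assumption: code page 037.
ebcdic_record = "1     This is text".encode("cp037")
# ASCII equivalent: form feed, five blanks, the text, then CR LF.
ascii_record = b"\x0c     This is text\r\n"

for record in (ebcdic_record, ascii_record):
    high, low = vertical_hex(record)
    print(high)
    print(low)
```

Reading a column top to bottom gives one byte: `F` over `1` is x'F1', the EBCDIC character "1", while `0` over `C` is x'0C', the ASCII form feed.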
13Print Data With Escape Sequences
- Epson and many other impact printers
- Xerox UDK (XES)
- QMS QUIC
- IBM PPDS
- HP PCL
- Xerox Metacode
- AFP, MODCA, and IPDS
- X01060001040002000154686973206973207465787401
- AMB 100 AMI 300 STO 0,90 SCFL 3 SVI 14 TRN This
is text
14Print Programming Languages
- %!PS-Adobe-2.0
- %%Title: Blue Book Program 7, on page 157
- %%EndComments
- /Times-Roman findfont 18 scalefont setfont
- 72 500 moveto
- (This is text) show
- ...
- Interpress
- PostScript (and PDF)
15The Nature Of Internet Formats
16Common Internet Formats
- The most commonly used data format on the
Internet is HTML - HyperText Markup Language
- The next expected wave on the Internet is XML
(eXtensible Markup Language) and its related
standards such as XSL, SVG, etc.
- As a secondary standard, PDF is widely used to
present static documents
17HTML
- HTML is an instance of SGML
- HTML has a set of 40 to 50 tags, which are
grammar based
- HTML tags have default presentation
characteristics, but these can be overridden with
CSS (Cascading Style Sheets)
18Sample HTML
- <!doctype html public "-//w3c//dtd html 4.0
transitional//en">
- <html>
- <h1>Poison Ivy Vineyards</h1>
- <p>Poison Ivy Vineyards is an experiment in
growing wine-quality grapes in a backyard in a
residential neighborhood in Richardson, Texas.
This website serves as a running diary of the
steps I took to create the vineyard and -
eventually - to make wine.</p>
- </html>
19XML
- XML is eXtensible Markup Language, which means
that you can make up the tags
- Since a browser can't know how to format the
tags, default formatting is in outline form
- Normally, you would use XSL (or CSS) to describe
how each tag is to be formatted
20Sample XML
- <NAME>William J. "Bill" McCalpin, EDPP, CDIA,
MIT, LIT</NAME>
- <JOBTITLE>Principal</JOBTITLE>
- <AFFILIATION>MHE</AFFILIATION>
- <ADDRESS>
- <STREET>1400 Cheyenne Dr.</STREET>
- <CITY>Richardson</CITY>
- <STATE>Texas</STATE>
- <ZIPCODE>75080</ZIPCODE>
- <EMAIL>mccalpin_at_mhe-consulting.com</EMAIL>
- </ADDRESS>
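Since the tags describe what the data is rather than how it looks, a program can pick values out by name. Here is a small Python sketch using the standard library's `xml.etree.ElementTree`. The slide shows a fragment with no single root element, so a hypothetical `<CONTACT>` wrapper is assumed here to make it well-formed.

```python
import xml.etree.ElementTree as ET

# Assumption: the slide shows a fragment, so a hypothetical <CONTACT> root
# element is added to make it a well-formed XML document.
doc = """<CONTACT>
<NAME>William J. "Bill" McCalpin, EDPP, CDIA, MIT, LIT</NAME>
<JOBTITLE>Principal</JOBTITLE>
<AFFILIATION>MHE</AFFILIATION>
<ADDRESS>
<STREET>1400 Cheyenne Dr.</STREET>
<CITY>Richardson</CITY>
<STATE>Texas</STATE>
<ZIPCODE>75080</ZIPCODE>
<EMAIL>mccalpin_at_mhe-consulting.com</EMAIL>
</ADDRESS>
</CONTACT>"""

root = ET.fromstring(doc)

# The tags carry meaning, not formatting: look values up by tag name.
city = root.findtext("ADDRESS/CITY")
zipcode = root.findtext("ADDRESS/ZIPCODE")
tags = [child.tag for child in root]

print(tags)
print(city, zipcode)
```

Nothing in the markup says how a city or a ZIP code should be displayed; that decision is left to a separate stylesheet.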
21Sample XSL
- This is an <emph>important</emph> point.
- <xsl:template match="emph">
- <fo:sequence font-weight="bold">
- <xsl:process-children/>
- </fo:sequence>
- </xsl:template>
22PDF
- PDF is Adobe's Portable Document Format
- PDF is a print stream, not an SGML instance
- PDF is similar to PostScript, but more portable,
because it carries its own resources
- PDF provides good fidelity, at a price
23Sample PDF
- %PDF-1.1
- ...
- 2 0 obj
- <<
- /CreationDate (D:19960809191047)
- /Producer (Acrobat Distiller 2.1 for Windows)
- /Creator (Adobe PageMaker 6.0)
- /Author (Doc)
- /Keywords ()
- /Title (bills)
- /Subject ()
- >>
- endobj
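The object in this sample is the document information dictionary, and its entries are plain enough to read with a few lines of code. The Python sketch below pulls out the key/value pairs with a regular expression and parses the creation date. It is deliberately naive: it assumes simple literal strings with no nested or escaped parentheses, and a date in the conventional `D:YYYYMMDDHHmmSS` form with no time zone suffix. A real PDF parser has to handle far more.

```python
import re
from datetime import datetime

# Assumptions: the dictionary holds only simple literal strings (no nested or
# escaped parentheses), and the date uses the D:YYYYMMDDHHmmSS form.
info_object = """2 0 obj
<<
/CreationDate (D:19960809191047)
/Producer (Acrobat Distiller 2.1 for Windows)
/Creator (Adobe PageMaker 6.0)
/Author (Doc)
/Keywords ()
/Title (bills)
/Subject ()
>>
endobj"""

# Each entry looks like "/Key (value)".
info = dict(re.findall(r"/(\w+)\s*\(([^)]*)\)", info_object))

# Skip the "D:" prefix and parse the 14 date and time digits.
created = datetime.strptime(info["CreationDate"][2:16], "%Y%m%d%H%M%S")

print(info["Producer"])
print(created.isoformat())
```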
24Limits Of Browsers
25A Normal HTML Page
26Default Font Increased
27Using Ghouly Solid
28Adjusting The Fonts
29Methods Of Moving Traditional Electronic Print To
The Internet
30Five Methods
- Conversion to PDF
- Rasterization to gif or jpeg
- Recomposition into HTML/XML
- Conversion to normal HTML/XML
- Translation to highly formatted HTML/XML
31Conversion to PDF
- This is a print stream to print stream conversion
- The output in PDF usually looks very similar to
the original printed document
- Many tools which create the PDF also add value,
such as hypertext links, bookmarking, et cetera,
to the PDF document
32Pros And Cons Of PDF
- Pros
- High fidelity to original document
- Reader is widespread and free
- Reasonably transportable
- Widely used in some circles (e.g., IRS)
- Cons
- PDF files tend to be large
- PDF documents are paper-size centric
- Browser requires a plug-in
33PDF Sample
- %PDF-1.1
- ...
- 2 0 obj
- <<
- /CreationDate (D:19960809191047)
- /Producer (Acrobat Distiller 2.1 for Windows)
- /Creator (Adobe PageMaker 6.0)
- /Author (Doc)
- /Keywords ()
- /Title (bills)
- /Subject ()
- >>
- endobj
34Sources For "To PDF"
- Composition Tools - create new PDF documents from
source code
- Transforms - translate existing formatted print
streams into PDF
- Larger Systems - composition or translation
capabilities inserted transparently into document
systems
- See Xplor Products and Services Reference Guide
35Rasterization to gif or jpeg
- The print stream is "rasterized", that is,
converted to a bit map format
- GIF (Graphics Interchange Format) - Invented by
CompuServe for graphics. Supports only 256
colors, or 8 bits.
- JPEG (Joint Photographic Experts Group) -
Specifically for more than 256 colors, with
better compression, but is lossy
- Excellent discussion of each at
http://www.efuse.com/Design/web_graphics_basics.html
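A quick calculation shows why resolution matters so much for a rasterized page. The Python sketch below computes the uncompressed size of a page bit map. The page size (US Letter, 8.5 x 11 inches) and the two resolutions are assumptions chosen for illustration; GIF or JPEG compression would shrink the actual files considerably.

```python
def raw_bitmap_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size in bytes of a page rasterized at the given resolution."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8

# Assumed example: a US Letter page (8.5 x 11 inches) at 8 bits per pixel,
# which is the 256-color limit of GIF.
screen_8bit = raw_bitmap_bytes(8.5, 11, 72, 8)   # typical screen resolution
print_8bit = raw_bitmap_bytes(8.5, 11, 300, 8)   # typical laser printer resolution

print(screen_8bit)
print(print_8bit)
```

Going from 72 to 300 dots per inch multiplies the pixel count by roughly 17, so an image sharp enough to print is a much heavier download than one sized for the screen.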
36Pros And Cons Of Rasterization
- Pros
- Image is exact copy of original document
- Image can be viewed on any browser which takes
gifs and jpegs
- Cons
- Resolution is hardcoded at one size
- There's no text to search
- Download is longer
- No correspondence of printed pages and HTML
pages
37Sample Rasterization
- This page was originally created in PDF, then
rasterized, and converted to a jpeg
38Recomposition into HTML/XML
- Data is extracted from a print stream
- Templates have been created in advance
- The extracted data is merged into the templates
- There may be fewer or more output pages in HTML
than were in the print stream
- Templates are built to be the most effective in
the browser window
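The extract-and-merge steps above can be sketched in a few lines of Python. Everything specific here is invented for illustration: the field positions in the print record, the field names, the sample customer, and the template text. A real bill would need its own layout map and templates.

```python
import html
from string import Template

# Assumptions: the field positions in the print record and the template text
# are invented for illustration.
FIELDS = {"account": (0, 10), "name": (10, 30), "balance": (30, 40)}

# The template is composed in advance, designed for the browser window.
PAGE_TEMPLATE = Template(
    "<html><body><h1>Account $account</h1>"
    "<p>$name owes $balance</p></body></html>"
)

def extract(record: str) -> dict:
    """Slice fixed-position fields out of one print record."""
    return {
        key: html.escape(record[start:end].strip())
        for key, (start, end) in FIELDS.items()
    }

def recompose(record: str) -> str:
    """Merge the extracted data into the pre-built HTML template."""
    return PAGE_TEMPLATE.substitute(extract(record))

record = "0012345678" + "Pat Example".ljust(20) + "5,000.00".rjust(10)
page = recompose(record)
print(page)
```

Note that the original page layout plays no part in the output: only the data survives, which is why the HTML page count can differ from the printed one.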
39Pros And Cons Of Recomposition
- Pros
- HTML/XML pages are well-suited for the browser
- HTML/XML is considered by some to be simpler than
PDF
- Cons
- HTML/XML pages don't necessarily match the
printed pages
- All pages (templates) must be pre-composed
40Sample Recomposition
- This document is a sample telephone bill which
has been divided into 11 HTML pages
- Note how the HTML pages are divided by subject,
not by page overflow
41Conversion to normal HTML/XML
- Both data and formatting information are
extracted from the print file
- Some formats easily correspond to an HTML tag,
e.g., a heading to <h1>
- More complex formatting can be approximated by
the use of table tags
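A rough Python sketch of this kind of mapping follows. The rules are assumptions made for the example, not a description of any particular product: a line flagged as a heading becomes `<h1>`, a line whose text is separated by runs of two or more spaces is treated as table columns, and anything else becomes a paragraph. The caller is expected to wrap consecutive rows in a `<table>` element.

```python
import html
import re

def line_to_html(line: str, is_heading: bool = False) -> str:
    """Map one print line to an approximate HTML equivalent."""
    text = line.strip()
    if is_heading:
        return f"<h1>{html.escape(text)}</h1>"
    # Assumption: two or more spaces separate columns on the printed page.
    cells = re.split(r"\s{2,}", text)
    if len(cells) > 1:
        tds = "".join(f"<td>{html.escape(c)}</td>" for c in cells)
        return f"<tr>{tds}</tr>"
    return f"<p>{html.escape(text)}</p>"

heading = line_to_html("   Statement of Account   ", is_heading=True)
row = line_to_html("Balance     Forward        5,000.00")
para = line_to_html("Thank you for your payment.")

print(heading)
print(row)
print(para)
```

The exact column positions are discarded, which is one reason the fidelity of this method is only approximate.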
42Pros And Cons of Conversion
- Pros
- HTML/XML pages look similar to printed pages
- Pages are in HTML/XML, not PDF or raster
- Cons
- Fidelity is approximate
- Reader can substantially alter the presentation
- Graphics may not be supported
43Sample Conversion
44Translation to highly formatted HTML/XML
- This method uses particular CSS commands to do
exact placement of text in the window of the
browser
- This is as close as XML gets (today) to being a
print stream
- Fonts are still subject to user override
45Pros And Cons Of Translation
- Pros
- Author has very good control over the
presentation of text
- Cons
- Much of the value of a tagged language is lost
- Portrait print pages still don't fit on landscape
browser windows
- May not work with all browsers
- Fonts can still be overridden
46Sample Translation
- <HTML>
- <HEAD>
- .ps9 {position:absolute;top:676px;left:454px;width:65px}
- .ps10 {position:absolute;top:676px;left:535px;width:66px}
- .ps11 {position:absolute;top:676px;left:1102px;width:70px}
- <SPAN CLASS="ps9"><NOBR>Balance</NOBR></SPAN>
- <SPAN CLASS="ps10"><NOBR>Forward</NOBR></SPAN>
- <SPAN CLASS="ps11"><NOBR>5,000.00</NOBR></SPAN>
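Output of this shape is straightforward to generate once each piece of text and its position have been recovered from the print stream. The Python sketch below builds one CSS rule and one SPAN per text item. The function name `positioned_html`, the tuple layout, and the class numbering starting at `ps1` are assumptions for this example; the sample values are taken from the slide.

```python
import html

def positioned_html(items):
    """Build CSS rules and SPANs that place each text item at fixed pixels.

    items: list of (top_px, left_px, width_px, text) tuples.
    """
    rules, spans = [], []
    for n, (top, left, width, text) in enumerate(items, start=1):
        name = f"ps{n}"  # hypothetical class naming: ps1, ps2, ...
        rules.append(
            f".{name} {{position:absolute;top:{top}px;left:{left}px;width:{width}px}}"
        )
        spans.append(f'<SPAN CLASS="{name}"><NOBR>{html.escape(text)}</NOBR></SPAN>')
    return rules, spans

rules, spans = positioned_html([
    (676, 454, 65, "Balance"),
    (676, 535, 66, "Forward"),
])

print("\n".join(rules))
print("\n".join(spans))
```

Every word gets its own coordinates, so the markup no longer says anything about what the text means, only where it goes.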
47William J. "Bill" McCalpin
- EDPP, CDIA, MIT, LIT
- Principal, MHE
- 1400 Cheyenne Dr.
- Richardson, Texas 75080-3921
- 972-231-3660 (v) 972-690-4521 (f)
- mccalpin_at_mhe-consulting.com