Text Processing in Unix - PowerPoint PPT Presentation

ITSW 2436/Kenneth R. Frazer

1
Text Processing in Unix
2
Where is MS Word??????
  • Unix has never had a standard word processing
    system
  • Although several word processors are now
    available (WordPerfect, StarOffice, ApplixWare)
    none has reached the status of a standard
  • Portability is an issue
  • Unix has traditionally relied upon text
    processors instead
  • roff
  • nroff/troff
  • TeX

3
Text Processing
  • Text processing is a two step process
  • First the document is created using a text editor
    such as vi
  • Document contains plain text and formatting codes
  • Second step is to print the document through a
    formatter, such as nroff
  • The formatter interprets the codes embedded in
    the document so that it is printed according to
    your specification
  • Note that the editor does not format, and the
    formatter is of no use for editing; they are two
    distinct operations
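
As a rough illustration of the first step, a source file might look like the sketch below: plain text interleaved with formatting codes. The file name letter.txt and its contents are made up for this example.

```roff
.\" letter.txt: hypothetical input file, typed in with vi or any other editor
.ce
A Short Note
.sp
This paragraph is plain text.
The formatter joins these short input lines
into filled output lines.
```

The second step would then be to run the file through the formatter, for example with nroff letter.txt.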

4
The Formatting Process
[Diagram: the source document ("The Great American Novel") passes through the formatter to produce the final document]
5
Formatter Characteristics
  • roff-like formatting systems describe actions you
    want to have performed on your text
  • indent
  • italicize
  • bold
  • They use very primitive, low-level commands
  • It would be nice instead to define the type of
    text (paragraph, footnote, heading) and
    externally specify the particular format you want
    applied to this type of text

6
A Programming Language
  • Like PostScript, nroff is a programming language
  • It has a set of preprocessors, like C
  • It has macros
  • It has a full set of registers
  • As with Unix, most people do not need to
    understand the full programming aspects in order
    to effectively use nroff, any more than you need
    to know C to use Unix

7
Macros
  • Macros translate high-level, result-oriented
    commands into low-level commands
  • Start paragraph
  • Start heading
  • Start section
  • There are two principal macro packages
  • -mm (memorandum macros)
  • -ms
  • -mm is the more robust of the two packages
  • While -ms was designed to do everything, -mm was
    designed for robustness
  • Many simple errors in -ms lead to mystifying
    results while -mm usually provides helpful error
    messages

8
Preprocessors
  • Although nroff is a powerful formatter, it can't
    do everything
  • You can set tabs but nroff can't analyze the
    widths of columns and automatically set widths
  • nroff has some ability for drawing lines and
    boxes but it is hard to do manually
  • Although nroff has access to mathematical
    symbols, building equations directly is too hard
  • Preprocessors are used to extend the abilities of
    nroff
  • Preprocessors are designed to address a specific
    typesetting specialty, such as equations or line
    drawings

9
  • The preprocessor translates portions of your
    document to nroff primitive commands and leaves
    the rest alone
  • Preprocessors work in "the Unix way"
  • They are filters
  • They perform one specific function rather than
    causing nroff to bloat
  • They are used with pipes
  • tbl mydoc | troff
  • Four common preprocessors
  • tbl - for managing tabular data
  • eqn (or neqn) - for typesetting mathematical
    equations
  • refer - for bibliographical references
  • pic - for creating line drawings

10
Basic nroff Commands
  • nroff contains two types of commands
  • dot commands
  • embedded commands
  • Dot Commands
  • Must stand alone on a line
  • Have a period (dot) as the first character on the
    line
  • Many dot commands accept one or more arguments to
    give them information about what to do
  • Example: .sp produces one extra (blank) line of
    output
  • .sp 3 produces three extra blank lines
  • At least one space or tab must separate the
    arguments
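
A minimal sketch of dot commands in use, assuming nothing beyond the .sp request described above. The surrounding text lines are invented for illustration.

```roff
.\" each dot command starts in column one and stands alone on its line
First line of text.
.sp
Second line, after one blank line.
.sp 3
Third line, after three blank lines.
```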

11
Embedded Commands
  • Embedded commands start with a backslash ( \ )
  • Often used to access a special character
  • Most accept an argument which must immediately
    follow the command without any intervening space
  • Some features, such as point size or font
    changes, can be done with either a dot or
    embedded command
  • In general, dot commands are used to control
    global features
  • Embedded commands used to control local features
  • .\" is used to protect remarks from being printed
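
The fragment below sketches how embedded commands might sit inside running text. The escapes shown (\(bu for a bullet, \f for fonts, \s for point size) are standard troff escapes rather than ones named on this slide, and point-size changes generally only have a visible effect under troff, not nroff.

```roff
.\" this remark is not printed
A bullet \(bu made with an embedded special-character escape.
Switch to \fIitalic\fR and back inside a line.
Change the point size \s+2locally\s0 with an escape,
.ps 12
or for all following text with a dot command.
```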

12
Spacing Commands
  • .sp controls vertical spacing.
  • By itself, .sp inserts a blank line
  • .sp num enters num blank lines
  • .sp numi adds num inches of blank space
  • Absolute movement is also possible
  • .sp |3i moves to the position 3 inches from the
    top of page
  • A negative value, as in .sp -2i, moves back up
    the page (here by 2 inches)
  • Note most macro packages turn on no-space mode
    after printing page headers so any top-of-page
    requests will be ignored. If you always want to
    produce the space, put a .rs on the line
    preceding the .sp command

13
  • .vs controls vertical spacing between lines
  • Default is generally about 20% more than the font
    point size
  • To change, use .vs Np, where N is the spacing in
    points, e.g. .vs 12p for 12-point spacing (12/72 inch)
  • .ls controls line spacing
  • .ls 2 causes double-spaced output

14
  • .ne tests to see how much space is left between
    the current location and the next trap, which
    generally signifies the end of text on the page
  • .ne 2v tests to see if there are at least two
    lines of space remaining on the page
  • If there isn't enough space, the page position is
    advanced to the next trap, which usually prints
    the footer and advances to the next page
  • .ne is used to avoid widows, where the first or
    last line of a paragraph is isolated on a
    different page from the rest of the paragraph
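
One common use of .ne, sketched here under the assumption that a heading should stay with the first lines of its section. The heading and text are placeholders.

```roff
.\" ask for 4 lines of room; if fewer remain, start on the next page
.ne 4v
.sp
Results
.sp
The measurements were taken over three days,
and the first lines of this paragraph stay with the heading.
```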

15
  • .in is used to control indenting
  • Measured in ems (approximately the width of an M)
  • .in 5m indents all following text by 5 ems
  • .ti controls temporary indent for the next line
  • .br causes a break in the text-filling process
  • Often used in header/return address portion of a
    letter to separate lines
  • Ken Frazer
  • .br
  • PO Box 1234
  • .br
  • Coppell, TX

16
  • .bp forces a new page
  • .ta sets the tab stops and requires arguments
  • .ta 8n 16n 24n 32n 40n sets tabs every 8
    character positions
  • Can also use other measurement units
  • .ta .5i 1i 1.5i 2i
  • Tabs remain in effect until changed so in theory,
    you only need to set them once
  • However, in practice you should set them every
    time you use them because the standard macro
    packages and preprocessors fiddle with them all
    the time

17
  • \u and \d are used to produce half-line motions
  • \| and \^ are narrow space codes
  • \| is 1/6 em, \^ is 1/12 em
  • \c is the end-of-line continuation mark
  • Normally an end-of-line character is converted to
    a space when the lines are stitched together.
    When a \c is used at the end of a line, the space
    is discarded so the following line is attached to
    the current line without an intervening space.
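
A short sketch of these escapes in context. The sample text is invented, and on a character terminal nroff can only approximate the narrow spaces and half-line motions, so the effect is clearest under troff.

```roff
.\" \c joins this line to the next without the usual space
The word under\c
stand is printed as one word.
.\" \| is a 1/6 em space, \^ is a 1/12 em space
A narrow space: 5\|kg and a narrower one: f\^(x)
.\" \u moves up half a line, \d moves back down
x\u2\d is a crude superscript.
```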

18
Filling and Adjusting
  • .na and .ad control the adjustment of the margins
  • .na (no adjust) causes nroff to stop adjusting
    margins. Words will still be collected to form
    an output line but intervening spaces will not be
    added to make the margins align.
  • .ad tells nroff to resume adjusting margins
  • .ce n centers the next n input lines (one line if
    n is omitted); .ce 0 ends centering early
  • Input lines following the .ce command are not
    filled
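
The requests above might be combined as in this sketch. The text is illustrative only.

```roff
.na
This paragraph is still filled, but it is not adjusted,
so the right margin stays ragged.
.ad
.ce 2
These two input lines
are each centered
.br
This paragraph is filled and adjusted again.
```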

19
  • .nf and .fi control the nroff fill mode
  • .nf causes nroff to stop collecting input lines
    to produce appropriately long output lines
  • .fi tells nroff to resume filling lines
  • .nf can be used instead of .br for producing
    letter headings
  • .nf
  • Ken Frazer
  • PO Box 1234
  • Coppell, TX
  • .fi
  • .sp 2
  • Dear Alice,

20
Fonts
  • .ft is used to switch fonts
  • Fonts are referred to by either number or by one-
    or two-character names
  • 1 is Times Roman
  • 2 is Times Roman italic
  • R is also Times Roman
  • I is also Times Roman italic
  • \fn is another way to switch fonts
  • \f1plain\f2italics\f3bold\f1 will produce
    plainitalicsbold, with "italics" set in italic
    and "bold" set in bold

21
Hyphenation
  • nroff will automatically hyphenate words at the
    end of a line, but like any automatic hyphenator,
    it sometimes makes mistakes
  • .nh turns off automatic hyphenation
  • .hy enables automatic hyphenation. It accepts a
    numeric argument that controls when hyphenation
    is used
  • .hy 2 disables hyphenation for the last line of a
    page
  • .hy 4 disables splitting off the last two
    characters of a word
  • .hy 8 disables splitting off the first two
    characters of a word

22
  • .hw allows you to specify how certain words
    should be hyphenated
  • .hw de-vice proc-ess cata-logue un-known
    trans-portable
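
A small sketch of the hyphenation requests together. The request that disables hyphenation is spelled .nh (no hyphenation); the word list and text are made up.

```roff
.\" turn automatic hyphenation off for a passage, then back on
.nh
No word in this passage will be split across lines.
.hy
.\" supply explicit break points for a word the hyphenator gets wrong
.hw data-base
```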

23
File Switching
  • nroff has two commands to switch from one input
    file to another
  • .so switches from the current file to the
    file named as an argument. When the inserted
    file is completely read, nroff returns to the
    original file and continues reading from the
    point just past the .so.
  • .nx switches from the current file to the file
    named as an argument. All processing stops when
    the new file is completely read. Any text in the
    original file that follows the .nx command will
    not be processed.
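
The difference between the two requests can be sketched with a master file. All file names here are hypothetical.

```roff
.\" book.txt: hypothetical master file
.so chapter1.txt
.so chapter2.txt
.\" reading resumes here after each included file ends
.nx appendix.txt
This line is never read, because .nx does not return.
```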

24
-ms Macros
  • -ms was the first widely used macro package
  • To use
  • nroff -ms inputfile | lpr

25
Commands
  • .NH n produces a numbered heading
  • .NH n
  • text
  • .LP
  • where n is the heading level, text is the
    heading text, .LP starts a new paragraph
  • .SH produces a section heading
  • .SH
  • text
  • .LP

26
Paragraph Commands
  • .PP starts a normal paragraph with first line
    indented
  • .LP starts a normal paragraph with all lines
    flush left
  • .IP label starts an indented paragraph. All lines
    are indented on the left and the optional label
    is printed to the left of the first line.
  • .XP starts an exdented paragraph. All lines
    except the first will be indented on the left.
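
Putting the heading and paragraph macros together, a small -ms input file might look like the sketch below. The file name report.ms and all of the text are placeholders.

```roff
.\" report.ms: hypothetical input, formatted with nroff -ms
.NH 1
Introduction
.LP
A flush-left paragraph follows the numbered heading.
.PP
This paragraph has its first line indented.
.IP 1.
An indented paragraph with the label "1." to its left.
.SH
Summary
.LP
An unnumbered section heading precedes this paragraph.
```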

27
Overall Document Format
  • .1C or .2C switches to one-column (the default)
    or two-column format
  • .DA date prints the date at the bottom of the
    page. The optional date argument overrides
    the current date.
  • .ND inhibits printing the date at the bottom of
    the page.
  • .OH 'L'C'R'
  • .EH 'L'C'R'
  • .OF 'L'C'R'
  • .EF 'L'C'R'
  • These macros produce Headers and Footers on Odd
    or Even pages. Each three-part header consists of
    text L for the left, C for the center, and R for
    the right. In a header or footer, % will print
    the page number.

28
Type Styles
  • .R
  • .I wd1 wd2
  • .B wd1 wd2
  • Switch font to Roman, italic, or bold. For .I or
    .B, if wd1 is supplied, it alone will be in italic
    or bold. If wd2 is supplied, it will follow wd1,
    without a separating space, and be in the
    surrounding font.
  • .SM
  • .NL
  • .LG
  • Switch to a smaller, normal-size, or larger
    typeface. .SM or .LG can be repeated to increase
    the size change.
  • .UL word
  • Underline a single word.

29
Displays and Footnotes
  • .DS x
  • text
  • .DE
  • Display text in no-fill mode. text will be moved
    to the following page, leaving a blank region, if
    it doesn't fit. Optional argument x may be L for a
    flush-left display, I for a slightly indented
    display, C for a line-by-line centered display, or
    B for a block centered display. The default is an
    indented display.
  • .LD, .ID, .CD
  • Display multipage text. .LD replaces .DS L, .ID
    replaces .DS I, and .CD replaces .DS C.

30
  • .KS
  • text
  • .KE
  • text will be moved to the following page if it
    doesn't fit. A blank space may be produced at the
    bottom of the current page.
  • .KF
  • text
  • .KE
  • text will float to the start of the following
    text page if it doesn't fit on the current page.
    Following text may be moved forward to fill the
    bottom of the page.

31
  • .EQ x n
  • text
  • .EN
  • text will be processed by the eqn preprocessor.
    Optional argument x may be I for an indented
    equation, L for a flush-left equation, or C for a
    centered equation. Centered is the default. An
    argument n may follow the equation type. It will
    be placed flush left to identify the equation.
  • .TS
  • text
  • .TE
  • text will be processed by the tbl preprocessor.

32
  • .RS
  • text
  • .RE
  • text will be shifted to the right.
  • .FS
  • text
  • .FE
  • text is a footnote that will be placed at the
    bottom of the page. Berkeley -ms allows \** to
    number footnotes automatically.

33
First Page Formats
  • .RP uses the AT&T Released Paper style
  • .TM uses the Berkeley Thesis style
  • .TL uses text for a title
  • .TL
  • text
  • .AU specifies an author's name (text) and
    optional address and phone number (loc and ext,
    respectively)
  • .AU loc ext
  • text

34
  • .AI specifies an author's institution
  • .AI
  • text
  • .AB is used for the abstract
  • .AB
  • text
  • .AE
  • .SG inserts the author's signature (name) in the
    text
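
The first-page macros are typically given in sequence at the top of the file. The sketch below assumes that ordering; the author, institution, and abstract text are invented.

```roff
.\" hypothetical cover material for an -ms document
.TL
Text Processing in Unix
.AU
A. N. Author
.AI
Example Institute
.AB
A two-sentence summary of the paper goes here.
.AE
.LP
The body of the paper starts here.
```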

35
Table of Contents
  • .XS n uses text as a TOC entry, with page number
    n
  • .XS n
  • text
  • .XE
  • .PX prints the table of contents
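
A minimal sketch of collecting and printing a contents entry. The page number 3 and the entry text are placeholders.

```roff
.\" record an entry at the point where the section starts
.XS 3
Introduction
.XE
.\" at the end of the document, print the collected entries
.PX
```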

36
Using the Preprocessors
  • Implemented as filters so piping is appropriate
  • tbl infile | pic | neqn | nroff | lpr

37
tbl Preprocessor
  • .TS
  • center box;
  • C S
  • RI L.
  • Text Preprocessors
  • .sp .3v
  • tbl	Tables of data
  • eqn	Equations
  • refer	References
  • pic	Line Drawings
  • .TE

Text Preprocessors
tbl    Tables of data
eqn    Equations
refer  References
pic    Line Drawings
38
eqn Preprocessor
  • .EQ
  • int e sup i omega t e sup -i omega t
  • over 2 pi
  • .EN

39
pic Preprocessor
  • .PS
  • box ht .4 wid .6
  • box ht .6 wid .8 with .c at last box.c
  • PC: box ht .3 wid 1 with .n at last box.s
  • "PC" at PC.c + (-.35,0)
  • box ht .15 wid .3 at PC.c
  • box same at PC.c + (.3,0)
  • .PE
