Unix Lecture 6 - PowerPoint PPT Presentation

Slides: 16
Provided by: hanaf
Learn more at: http://plaza.ufl.edu
Transcript and Presenter's Notes


1
Unix Lecture 6
  • Hana Filip

2
HW6 - Part II
  • solutions posted on my website
  • see syllabus

3
Text Processing: Command Line Utility Programs
  • sed
  • wc
  • awk
  • comm
  • cut
  • ex
  • iconv
  • join
  • paste
  • sort
  • tr
  • uniq
  • xargs

4
TextPro Lexicon File
  • Lexicon file core.text
  • Background
  • TextPro
  • An information extraction system used at SRI
    International, Menlo Park, CA
  • Developed by Doug Appelt

5
Copy machen.txt into your account
  • > cd ..
  • > cd c6932aab
  • > ls
  • machen.txt
  • > cp machen.txt c6932aad
  • > cd
  • > ls
  • machen.txt

6
Text Processing: Command Line Utility Programs
  • tr: translate or delete characters
  • Example 1: delete (-d) all the newline
    characters from machen.txt, and redirect the
    output to a file named machen-cont.txt.
  • cat machen.txt | tr -d "\n" > machen-cont.txt
  • Example 2: delete (-d) all characters from
    machen.txt except for alphabetical characters,
    newlines, and spaces, and redirect the output to
    a file named machen-alpha.txt. The -c flag
    complements the set, so everything not listed is deleted.
  • cat machen.txt | tr -c -d "[:alpha:]\n " > machen-alpha.txt
  • Try also (no space in the set, so spaces are deleted too)
  • cat machen.txt | tr -c -d "[:alpha:]\n" > machen-alpha.txt
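
The two tr deletions can be tried without machen.txt. This is a minimal sketch: the sample text and the variable names are made up for illustration, and the character class is written in its full POSIX form, [:alpha:].

```shell
# Hypothetical two-line sample standing in for machen.txt.
sample='Er macht 2 Fehler,
sie macht 0 Fehler!'

# -d deletes every character in the set; here, only the newline.
joined=$(printf '%s\n' "$sample" | tr -d '\n')

# -c complements the set, so -c -d deletes everything EXCEPT
# letters, newlines, and spaces.
letters=$(printf '%s\n' "$sample" | tr -c -d '[:alpha:]\n ')

printf '%s\n' "$joined"
printf '%s\n' "$letters"
```

Digits and punctuation disappear from the second result, but the spaces around them stay, which is why double spaces can show up in the output.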

7
Text Processing: Command Line Utility Programs
  • tr can be used to make a wordlist from a text
    by replacing every space with a newline:
  • cat machen.txt | tr " " "\n" | less
  • cat machen.txt | tr " " "\012" | less
  • (\012 is the octal code of the newline character,
    so the two commands are equivalent)
  • We can combine the command above with the delete
    functionality of tr to make a wordlist without
    unwanted characters:
  • cat machen.txt | tr " " "\n" | tr -c -d "[:alpha:]\n" > lex
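
The wordlist pipeline can be checked on a single line of text. This is a sketch under stated assumptions: the sample sentence and variable names are invented, and the character class is spelled [:alpha:].

```shell
# Hypothetical sample sentence.
text='the cat, the dog'

# Turn every space into a newline: one word per line.
with_escape=$(printf '%s\n' "$text" | tr ' ' '\n')

# \012 is the octal code for newline, so this gives the same result.
with_octal=$(printf '%s\n' "$text" | tr ' ' '\012')

# Add the cleanup step: keep only letters and newlines.
words=$(printf '%s\n' "$text" | tr ' ' '\n' | tr -c -d '[:alpha:]\n')

printf '%s\n' "$words"
```

The comma after "cat" survives the first step and is removed only by the second tr.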

8
Text Processing: Command Line Utility Programs
  • sort: prints the lines of its input, or the
    concatenation of all files listed in its argument
    list, in sorted order. (The -r flag reverses
    the sort order.)
  • sort -r movie_characters
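
A small sketch of sort with and without -r. The three names are invented sample data, not the contents of the lecture's movie_characters file.

```shell
# Hypothetical sample lines.
names='Rick
Ilsa
Victor'

ascending=$(printf '%s\n' "$names" | sort)
descending=$(printf '%s\n' "$names" | sort -r)

printf '%s\n' "$ascending"
printf '%s\n' "$descending"
```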

9
Text Processing: Command Line Utility Programs
  • uniq: takes a text file and outputs it with
    adjacent identical lines collapsed to one
  • it is a kind of filter program
  • typically it is used after sort, because only
    adjacent duplicates are collapsed
  • cat machen.txt | tr " " "\n" | tr -c -d "[:alpha:]\n" | sort | uniq > lex
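
Why uniq usually follows sort can be seen on a four-word list. This is a minimal sketch with made-up sample words.

```shell
# Hypothetical word list with a non-adjacent duplicate ("the").
words='the
cat
the
dog'

# Without sort, the two "the" lines are not adjacent, so nothing collapses.
unsorted=$(printf '%s\n' "$words" | uniq)

# After sort, duplicates sit next to each other and uniq removes them.
sorted=$(printf '%s\n' "$words" | sort | uniq)

printf '%s\n' "$sorted"
```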

10
Text Processing: Command Line Utility Programs
  • sed: stream editor
  • a special editor for automatically modifying
    files
  • a find and replace program: it reads text from
    standard input and writes the result to standard
    output (normally the screen)
  • the sed search pattern is a regular expression,
    essentially the same as a grep regular expression
    (see references)
  • often used in a program to make changes in a file
11
Text Processing: Command Line Utility Programs
  • sed simple example 1
  • sed 's/United States/USA/' < usa-gaz.text > new-usa-gaz.text
  • s: substitute command
  • /../../: delimiters
  • United States: regular expression pattern string
  • USA: replacement string
  • < old_file > new_file: read from the old file, write to the new file
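
The substitute command can be tried on one line of input instead of a gazetteer file. This is a sketch with an invented sample line; it also shows the g flag, which makes sed replace every match on a line rather than only the first.

```shell
# Hypothetical input line containing the pattern twice.
line='United States and United States'

# Without g, only the first match on each line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's/United States/USA/')

# With g, every match on the line is replaced.
all=$(printf '%s\n' "$line" | sed 's/United States/USA/g')

printf '%s\n' "$first_only"
printf '%s\n' "$all"
```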

12
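Text Processing: Command Line Utility Programs
  • sed simple example 2
  • sed 's/\(United\) \(States\)/\2 \1/' < usa-gaz.text > usa-switch-gaz.text
  • switch two words around
  • \( start of a group (here, the onset of a word)
  • \) end of a group (here, the end of a word)
  • /../../ delimiters
  • \1 register 1 (the text matched by the first group)
  • \2 register 2 (the text matched by the second group)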
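
Swapping two words with groups and registers can be sketched on a single input line. The sketch assumes the two words are separated by exactly one space, which the pattern and the replacement both spell out.

```shell
# \(United\) is saved in register 1, \(States\) in register 2;
# the replacement emits them in the opposite order.
swapped=$(printf 'United States\n' | sed 's/\(United\) \(States\)/\2 \1/')

printf '%s\n' "$swapped"
```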

13
Text Processing: Command Line Utility Programs
  • multiple sed commands may also be stored in a
    script file. The "-f" option is used on the
    command line to access the commands in the
    script
  • sed -f sedscript.sed file
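
A script file holds one sed command per line. The following sketch builds a small one in a temporary directory and runs it with -f. The two substitutions and the sample input are made up for illustration, and mktemp is assumed to be available.

```shell
# Work in a throwaway directory (assumes mktemp is installed).
workdir=$(mktemp -d)

# Hypothetical script file: one sed command per line.
cat > "$workdir/sedscript.sed" <<'EOF'
s/United States/USA/
s/United Kingdom/UK/
EOF

# Hypothetical input file.
printf 'United States\nUnited Kingdom\n' > "$workdir/file"

# -f tells sed to read its commands from the script file.
result=$(sed -f "$workdir/sedscript.sed" "$workdir/file")

printf '%s\n' "$result"
rm -r "$workdir"
```

Every command in the script is applied, in order, to each input line.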

14
Text Processing: Command Line Utility Programs
  • sed 's/^/LexEntry /g;s/$/ ./' lex > newlex
  • ^ matches the beginning of the line
  • $ matches the end of the line
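
Anchoring a substitution at the start of a line (^) and at the end of a line ($) can be tried on two sample words. This is a minimal sketch: the input words are invented, and the two commands are joined with a semicolon.

```shell
# s/^/LexEntry / inserts text at the beginning of each line;
# s/$/ ./ appends text at the end of each line.
entries=$(printf 'cat\ndog\n' | sed 's/^/LexEntry /;s/$/ ./')

printf '%s\n' "$entries"
```

Because ^ and $ match positions rather than characters, nothing on the line is removed; text is only added.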

15
Text Processing: Command Line Utility Programs
shell script
  • #! /usr/local/bin/tcsh
  • # usage: make_lex filename1; make_lex filename1 filename2, ...
  • # first, make sure the user typed in at least one
    argument
  • if ($# < 1) then
  • echo "This script needs at least 1 argument."
  • echo "Exiting...(annoyed)"
  • exit 666
  • endif
  • foreach name ($*)
  • cat $name | tr " " "\n" | tr -c -d "[:alpha:]\n" | sort | uniq > mylex
  • sed 's/^/LexEntry /g;s/$/ ./' mylex > newlex
  • echo "Your new lexical file is called 'newlex'."
  • end
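
For readers without tcsh, the same pipeline can be sketched as a POSIX sh function. This is a hypothetical port, not the lecture's script: it prints to standard output instead of writing mylex and newlex, and it merges all input files into one lexicon instead of processing them one at a time. The function name, file names, and sample text are assumptions made for illustration.

```shell
# make_lex: hypothetical POSIX sh port of the tcsh script.
make_lex() {
    # Make sure the caller passed at least one file name.
    if [ "$#" -lt 1 ]; then
        echo "This script needs at least 1 argument." >&2
        return 1
    fi
    # cat concatenates every named file, so all inputs feed one word list.
    cat "$@" | tr ' ' '\n' | tr -c -d '[:alpha:]\n' | sort | uniq |
        sed 's/^/LexEntry /;s/$/ ./'
}

# Try it on two throwaway files (assumes mktemp is installed).
workdir=$(mktemp -d)
printf 'the cat, the dog\n' > "$workdir/a.txt"
printf 'a dog\n' > "$workdir/b.txt"
lexicon=$(make_lex "$workdir/a.txt" "$workdir/b.txt")
printf '%s\n' "$lexicon"
rm -r "$workdir"
```

Merging the inputs sidesteps a property of a per-file loop that writes to fixed output names: each pass overwrites the previous pass's output.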