Advanced File Processing - PowerPoint PPT Presentation

Provided by: til49
Title: Advanced File Processing


1
Guide To UNIX Using Linux Third Edition
  • Chapter 5
  • Advanced File Processing

2
Objectives
  • Use the pipe operator to redirect the output of
    one command to another command
  • Use the grep command to search for a specified
    pattern in a file
  • Use the uniq command to remove duplicate lines
    from a file
  • Use the comm and diff commands to compare two
    files
  • Use the wc command to count words, characters and
    lines in a file

3
Objectives (continued)
  • Use manipulation and transformation commands,
    which include sed, tr, and pr
  • Design a new file-processing application by
    creating, testing, and running shell scripts

4
Advancing Your File-Processing Skills
  • Selection commands focus on extracting specific
    information from files

5
Advancing Your File-Processing Skills (continued)
  • Manipulation and transformation commands alter
    and transform extracted information into useful
    and appealing formats

6
Advancing Your File-Processing Skills (continued)
7
Using the Selection Commands
  • Using the Pipe Operator
  • The pipe operator (|) redirects the output of
    one command to the input of another
  • An example would be to redirect the output of the
    ls command to the more command
  • The pipe operator can connect several commands on
    the same command line

8
Using the Pipe Operator
Using pipe operators and connecting commands is
useful when viewing directory information
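A minimal sketch of chaining commands with pipes. The scratch directory and file names are hypothetical, chosen only so the example is self-contained.

```shell
# Hypothetical scratch directory so the example does not touch your own files.
workdir=$(mktemp -d)
touch "$workdir/notes.txt" "$workdir/report.txt" "$workdir/data.csv"

# ls writes the file names, grep keeps those ending in .txt,
# wc -l counts them, and tr strips any padding wc adds.
txt_count=$(ls "$workdir" | grep '\.txt$' | wc -l | tr -d ' ')
echo "Text files: $txt_count"

rm -r "$workdir"
```

At an interactive prompt, the example from the slide would be typed as `ls -l | more`.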
9
Using the grep Command
  • Used to search for a specific pattern in a file,
    such as a word or phrase
  • grep's options and wildcard support allow for
    powerful search operations
  • You can increase grep's usefulness by combining
    it with other commands, such as head or tail
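The following sketch shows a plain search, two common options, and grep combined with head. The staff file and its contents are invented for illustration.

```shell
workdir=$(mktemp -d)
# Hypothetical data file: one name:role record per line.
printf '%s\n' 'alice:programmer' 'bob:analyst' \
              'carol:programmer' 'dave:Programmer' > "$workdir/staff.txt"

# Plain search: prints lines containing the pattern (case-sensitive).
grep 'programmer' "$workdir/staff.txt"

# -i ignores case and -c prints a count of matching lines instead of the lines.
match_count=$(grep -ic 'programmer' "$workdir/staff.txt")

# Piping into head keeps only the first match.
first_match=$(grep 'programmer' "$workdir/staff.txt" | head -n 1)

rm -r "$workdir"
```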

10
Using the uniq Command
  • Removes duplicate lines from a file
  • Compares only consecutive lines, so uniq needs
    sorted input to remove every duplicate
  • uniq has an option that allows you to generate
    output that contains a copy of each line that has
    a duplicate
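A short sketch of why sorting comes first, and of the -d option, which is presumably the option the slide refers to. The fruit list is made up.

```shell
workdir=$(mktemp -d)
printf '%s\n' pear apple pear banana apple pear > "$workdir/fruit.txt"

# uniq only collapses adjacent duplicates, so sort the input first.
unique_lines=$(sort "$workdir/fruit.txt" | uniq)

# -d prints one copy of each line that occurs more than once.
repeated_lines=$(sort "$workdir/fruit.txt" | uniq -d)

printf 'Unique:\n%s\nRepeated:\n%s\n' "$unique_lines" "$repeated_lines"
rm -r "$workdir"
```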

11
Using the uniq Command (continued)
12
Using the uniq Command (continued)
13
Using the comm Command
  • Used to identify duplicate lines in sorted files
  • Unlike uniq, it does not remove duplicates, and
    it works with two files rather than one
  • It compares the lines of file1 and file2, and
    produces three-column output
  • Column one contains lines found only in file1
  • Column two contains lines found only in file2
  • Column three contains lines found in both files
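As an illustration of the three columns, the sketch below uses comm's -1, -2, and -3 options, each of which suppresses the matching column. The two files are hypothetical and already sorted.

```shell
workdir=$(mktemp -d)
printf '%s\n' apple banana cherry > "$workdir/file1"
printf '%s\n' banana cherry date  > "$workdir/file2"

# Full three-column output.
comm "$workdir/file1" "$workdir/file2"

# Suppress two columns to isolate the third.
only_in_file1=$(comm -23 "$workdir/file1" "$workdir/file2")
only_in_file2=$(comm -13 "$workdir/file1" "$workdir/file2")
in_both=$(comm -12 "$workdir/file1" "$workdir/file2")

rm -r "$workdir"
```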

14
Using the diff Command
  • Attempts to determine the minimal changes needed
    to convert file1 to file2
  • The output displays the line(s) that differ
  • Codes in the output indicate that in order for
    the files to match, specific lines must be added
    or deleted
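A small sketch of reading diff output, using two invented files. In the default output format, a code such as 2d1 means a line must be deleted, 3a3 means a line must be added, and a code containing c means lines must be changed.

```shell
workdir=$(mktemp -d)
printf '%s\n' alpha beta gamma  > "$workdir/old.txt"
printf '%s\n' alpha gamma delta > "$workdir/new.txt"

# diff exits with status 1 when the files differ, so capture the status
# explicitly instead of treating it as a failure.
diff_status=0
diff_output=$(diff "$workdir/old.txt" "$workdir/new.txt") || diff_status=$?

# Lines starting with < come from the first file, > from the second.
printf '%s\n' "$diff_output"
rm -r "$workdir"
```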

15
Using the wc Command
  • Used to count the number of lines, words, and
    bytes or characters in text files
  • You may specify all three options in one issuance
    of the command
  • If you don't specify any options, you see counts
    of lines, words, and bytes (in that order)

16
Using the wc Command (continued)
The options for the wc command are -l for lines, -w
for words, and -c for bytes (-m counts characters)
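A quick sketch of the individual options on a made-up two-line file. Reading from standard input keeps the file name out of wc's output, and tr removes the padding some versions add.

```shell
workdir=$(mktemp -d)
printf 'one two three\nfour five\n' > "$workdir/sample.txt"

line_count=$(wc -l < "$workdir/sample.txt" | tr -d ' ')
word_count=$(wc -w < "$workdir/sample.txt" | tr -d ' ')
byte_count=$(wc -c < "$workdir/sample.txt" | tr -d ' ')

# With no options, wc prints all three counts followed by the file name.
wc "$workdir/sample.txt"
rm -r "$workdir"
```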
17
Using Manipulation and Transformation Commands
  • These commands are sed, tr, pr
  • Used to edit and transform the appearance of data
    before it is displayed or printed

18
Introducing the sed Command
  • sed is a UNIX/Linux editor that allows you to
    make global changes to large files
  • Minimum requirements are an input file and a
    command that lets sed know what actions to apply
    to the file
  • sed commands have two general forms
  • Specify an editing command on the command line
  • Specify a script file containing sed commands
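The two forms can be sketched as follows. The file names and the edits themselves are hypothetical; -f is the option that tells sed to read its commands from a script file.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'UNIX is a system' \
              'I use UNIX daily; UNIX is stable' > "$workdir/notes.txt"

# Form 1: the editing command is given on the command line.
# s/old/new/g replaces every occurrence on each line.
inline_result=$(sed 's/UNIX/Linux/g' "$workdir/notes.txt")

# Form 2: the commands live in a script file, one per line.
printf '%s\n' 's/UNIX/Linux/g' '1d' > "$workdir/edits.sed"
script_result=$(sed -f "$workdir/edits.sed" "$workdir/notes.txt")

rm -r "$workdir"
```

Neither form modifies notes.txt; sed writes the edited text to standard output.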

19
Translating Characters Using the tr Command
  • tr copies data from the standard input to the
    standard output, substituting or deleting
    characters specified by options and patterns
  • The patterns are strings and the strings are sets
    of characters
  • A popular use of tr is converting lowercase
    characters to uppercase
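A brief sketch of both uses: substituting one set of characters for another, and deleting a set with -d. The input strings are arbitrary.

```shell
# Substitute: each lowercase letter maps to its uppercase counterpart.
upper=$(printf 'hello, world\n' | tr '[:lower:]' '[:upper:]')

# Delete: -d removes every character in the given set.
no_digits=$(printf 'room 101\n' | tr -d '[:digit:]')

echo "$upper"
echo "$no_digits"
```

Because tr reads only standard input, a file is supplied with redirection or a pipe, for example `tr '[:lower:]' '[:upper:]' < names.txt`, where names.txt is a hypothetical file.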

20
Using the pr Command to Format Your Output
  • pr prints specified files on the standard output
    in paginated form
  • By default, pr formats the specified files into
    single-column pages of 66 lines
  • Each page has a five-line header containing the
    file name, its latest modification date, and
    current page, and a five-line trailer consisting
    of blank lines
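A sketch of the default page layout and two common options, assuming a pr that follows the defaults described above. The memo file and header text are invented.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'line one' 'line two' 'line three' > "$workdir/memo.txt"

# Default layout: header, body, and blank lines fill one 66-line page.
page_lines=$(pr "$workdir/memo.txt" | wc -l | tr -d ' ')

# -t omits the header and trailer, leaving only the file's own lines.
body_only=$(pr -t "$workdir/memo.txt")

# -h replaces the file name in the header with the given text.
header_hits=$(pr -h 'Staff Memo' "$workdir/memo.txt" | grep -c 'Staff Memo')

rm -r "$workdir"
```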

21
Designing a New File-Processing Application
  • The most important phase in developing a new
    application is the design
  • The design defines the information an application
    needs to produce
  • The design also defines how to organize this
    information into files, records, and fields,
    which are called logical structures

22
Designing Records
  • The first task is to define the fields in the
    records and produce a record layout
  • A record layout identifies each field by name and
    data type (numeric or nonnumeric)
  • Design the file record to store only those fields
    relevant to the record's primary purpose

23
Linking Files with Keys
  • Multiple files are joined by a key: a common
    field that each of the linked files shares
  • Another important task in the design phase is to
    plan a way to join the files
  • The flexibility to gather information from
    multiple files comprised of simple, short records
    is the essence of a relational database system
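To make the idea of a key concrete, here is a sketch that links two flat files on a shared ID field. The slides do not say which command performs the join; the standard join utility is used here as an assumption, and the file names, field layout, and records are all hypothetical.

```shell
workdir=$(mktemp -d)
# Each record is id:value, and both files are sorted on the id key.
printf '%s\n' '101:Alice' '102:Bob' '103:Carol' > "$workdir/programmers"
printf '%s\n' '101:Payroll' '103:Inventory'     > "$workdir/projects"

# -t: sets the field separator; by default join matches on the first field.
joined=$(join -t: "$workdir/programmers" "$workdir/projects")

printf '%s\n' "$joined"
rm -r "$workdir"
```

Programmer 102 has no project record, so that key does not appear in the joined output.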

24
(No Transcript)
25
Creating the Programmer and Project Files
  • With the basic design complete, you now implement
    your application design
  • UNIX/Linux file processing predominantly uses
    flat files
  • Working with these files is easy, because you can
    create and manipulate them with text editors like
    vi and Emacs

26
Creating the Programmer and Project Files
(continued)
27
Formatting Output
  • The awk command is used to prepare formatted
    output
  • For the purposes of developing a new
    file-processing application, we will focus
    primarily on the printf action of the awk command

Awk provides a shortcut to other UNIX/Linux
commands
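A minimal sketch of awk's printf action producing aligned columns from a colon-separated flat file. The records and column widths are illustrative only.

```shell
workdir=$(mktemp -d)
printf '%s\n' '101:Alice:Payroll' '103:Carol:Inventory' > "$workdir/report.txt"

# -F: splits each record on colons. %-6s and %-10s left-justify a field
# in a column of that width; BEGIN runs once before any input is read.
formatted=$(awk -F: '
  BEGIN { printf "%-6s %-10s %s\n", "ID", "Name", "Project" }
        { printf "%-6s %-10s %s\n", $1, $2, $3 }
' "$workdir/report.txt")

printf '%s\n' "$formatted"
rm -r "$workdir"
```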
28
Using a Shell Script to Implement the Application
  • Shell scripts should contain:
  • The commands to execute
  • Comments to identify and explain the script so
    that users or programmers other than the author
    can understand how it works
  • Use the pound (#) character to mark comments in a
    script file

29
Running a Shell Script
  • You can run a shell script in virtually any shell
    that you have on your system
  • The Bash shell accepts more variations in command
    structures than other shells
  • Run the script by typing sh followed by the name
    of the script, or make the script executable and
    type ./ prior to the script name
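The sketch below writes a small commented script and then runs it both ways. The script name hello.sh and its contents are hypothetical.

```shell
workdir=$(mktemp -d)

# Create the script. Lines starting with # are comments.
cat > "$workdir/hello.sh" <<'EOF'
#!/bin/sh
# hello.sh: hypothetical example script.
# Prints a greeting for the name given as the first argument,
# or for "world" when no argument is supplied.
echo "Hello, ${1:-world}"
EOF

# Method 1: pass the script name to sh.
via_sh=$(sh "$workdir/hello.sh" Linux)

# Method 2: make the script executable, then run it with ./
chmod +x "$workdir/hello.sh"
via_path=$(cd "$workdir" && ./hello.sh)

echo "$via_sh"
echo "$via_path"
rm -r "$workdir"
```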

30
Putting it All Together to Produce the Report
  • An effective way to develop applications is to
    combine many small scripts in a larger script
    file
  • Have the last script added to the larger script
    print a report indicating script functions and
    results
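One possible shape for this approach, sketched with invented names: two small scripts each do one job, and a larger script calls them in turn and prints the report. The file layout and script names are assumptions, not the chapter's actual application.

```shell
workdir=$(mktemp -d)
printf '%s\n' '102:Bob' '101:Alice' > "$workdir/programmers"

# Small script 1: count the records in a file.
cat > "$workdir/count.sh" <<'EOF'
#!/bin/sh
# count.sh: prints the number of records in the file named by $1.
wc -l < "$1" | tr -d ' '
EOF

# Small script 2: list the name field in sorted order.
cat > "$workdir/names.sh" <<'EOF'
#!/bin/sh
# names.sh: prints the second colon-separated field of $1, sorted.
cut -d: -f2 "$1" | sort
EOF

# Larger script: runs the small scripts and prints the report.
cat > "$workdir/report.sh" <<'EOF'
#!/bin/sh
# report.sh: combines count.sh and names.sh into one report.
dir=$(dirname "$0")
echo "Programmers on file: $(sh "$dir/count.sh" "$1")"
echo "Names:"
sh "$dir/names.sh" "$1"
EOF

report=$(sh "$workdir/report.sh" "$workdir/programmers")
printf '%s\n' "$report"
rm -r "$workdir"
```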

31
Chapter Summary
  • UNIX/Linux file-processing commands are (1)
    selection and (2) manipulation and transformation
    commands
  • uniq removes duplicate lines from a sorted file
  • comm compares lines common to file1 and file2
  • diff tries to determine the minimal set of
    changes needed to convert file1 into file2

32
Chapter Summary (continued)
  • tr copies data read from the standard input to
    the standard output, substituting or deleting
    characters specified
  • sed is a file editor designed to make global
    changes to large files
  • pr prints files on the standard output in
    paginated form

33
Chapter Summary (continued)
  • The design of a file-processing application
    reflects what the application needs to produce
  • Use record layout to identify each field by name
    and data type
  • Shell scripts should contain commands to execute
    programs and comments to identify and explain the
    programs