Workbook 8, and 9 - PowerPoint PPT Presentation

Slides: 122
Provided by: Delg161
Learn more at: http://csis.pace.edu

1
Workbook 8, and 9
Pace Center for Business and Technology
2
String Processing Tools
  • Key Concepts
  • The wc command counts the number of characters,
    words, and lines in a file. When applied to
    structured data, the wc command can become a
    versatile counting tool.
  • The cat command has options that allow
    representation of nonprinting characters such as
    NEWLINE.
  • The head and tail commands have options that
    allow you to print only a certain number of lines
    or a certain number of bytes (one byte usually
    correlates to one character) from a file.
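  • As a minimal sketch of the head and tail options
    described above, the following builds a throwaway
    file (its name and contents are made up for
    illustration) and prints selected lines and bytes
    from it. The -c switch is assumed to behave as it
    does in GNU coreutils.

```shell
# Throwaway sample file; the name and contents are illustrative only.
sample=$(mktemp)
printf 'alpha\nbravo\ncharlie\ndelta\n' > "$sample"

first_two=$(head -n 2 "$sample")    # the first two lines
last_line=$(tail -n 1 "$sample")    # the last line
first_bytes=$(head -c 5 "$sample")  # the first five bytes

printf '%s\n' "$first_two" "$last_line" "$first_bytes"
rm -f "$sample"
```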

3
Revisiting cat, head, and tail
  • Revisiting cat
  • We have been using the cat command to simply
    display the contents of files. Usually, the cat
    command generates a faithful copy of its input,
    without performing any edits or conversions. When
    called with one of the following command line
    switches, however, the cat command will indicate
    the presence of tabs, line feeds, and other
    control sequences, using the following
    conventions.
  • Using the -A command line switch, the whitespace
    structure of the file becomes evident, as tabs
    are replaced with ^I, and line feeds are
    decorated with $. E.g. cat -A /etc/hosts
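  • The conventions above can be tried on a small
    piece of made-up input instead of /etc/hosts.
    This sketch assumes the GNU version of cat, which
    provides the -A switch. The second line of input
    deliberately lacks a trailing line feed, so it is
    shown without a $ marker.

```shell
# One line containing a TAB, then a final line with no line feed.
shown=$(printf 'name\taddress\nlast' | cat -A)
printf '%s\n' "$shown"
```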

4
Revisiting head and tail
  • For example, the following file contains a list
    of four musicians.
  • Linux (and Unix) text files generally adhere to a
    convention that the last character of the file
    must be a line feed for the last line of text.
    Following the cat of the file musicians.mac,
    which does not contain any conventional Linux
    line feed characters, the bash prompt is not
    displayed in its usual location.

5
Revisiting head and tail
6
The wc (Word Count) Command
  • When used without any command line switches, wc
    will report the number of lines, words, and
    bytes, in that order. Command line switches can
    be combined to return any combination of
    character count, line count or word count.
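  • A minimal sketch of the individual wc switches,
    run on two lines of made-up text. The tr step
    only strips the padding that some wc versions
    add around the number.

```shell
text='one two
three'

lines=$(printf '%s\n' "$text" | wc -l | tr -d ' ')  # line count
words=$(printf '%s\n' "$text" | wc -w | tr -d ' ')  # word count
bytes=$(printf '%s\n' "$text" | wc -c | tr -d ' ')  # byte count

echo "lines=$lines words=$words bytes=$bytes"
```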

7
How To Recognize A Real Character
  • Text files are composed using an alphabet of
    characters. Some characters are visible, such as
    numbers and letters. Some characters are used for
    horizontal distance, such as spaces and TAB
    characters. Some characters are used for vertical
    movement, such as carriage returns and line
    feeds.
  • A line in a text file is a series of any
    character other than a NEWLINE (line feed)
    character and then a NEWLINE character.
    Additional lines in the file immediately follow
    the first line.
  • While a computer represents characters as
    numbers, the exact value used for each symbol
    varies depending on which alphabet has been
    chosen. The most common alphabet for English
    speakers is ASCII; Latin-1 (ISO 8859-1) extends
    it with accented characters for Western European
    languages. Different human languages are
    represented by different computer encoding
    rules, so the exact numeric value for a given
    character depends on the human language being
    recorded.

8
So, What Is A Word?
  • A word is a group of printing characters, such as
    letters and digits, surrounded by white space,
    such as space characters or horizontal TAB
    characters.
  • Notice that our definition of a word does not
    include any notion of meaning. Only the form of
    the word is important, not its semantics. As far
    as Linux is concerned, a line such as

9
Chapter 2.  Finding Text: grep
  • Key Concepts
  • grep is a command that prints lines that match a
    specified text string or pattern.
  • grep is commonly used as a filter to reduce
    output to only desired items.
  • grep -r will recursively grep files underneath a
    given directory.
  • grep -v prints lines that do NOT match a
    specified text string or pattern.
  • Many other command line switches allow users to
    specify grep's output format.

10
Searching Text File Contents using grep
  • In an earlier Lesson, we saw how the wc program
    can be used to count the characters, words and
    lines in text files. In this Lesson we introduce
    the grep program, a handy tool for searching text
    file contents for specific words or character
    sequences.
  • The name grep comes from the ed editor command
    g/re/p: globally search for a regular expression
    and print the matching lines. What, you may well
    ask, is a regular expression and why on earth
    should I want to search for one? We will provide
    a more formal definition of regular expressions
    in a later Lesson, but for now it is enough to
    know that a regular expression is simply a way of
    describing a pattern, or template, to match some
    sequence of characters. A simple regular
    expression would be Hello, which matches exactly
    five characters: H, e, two consecutive l
    characters, and a final o. More powerful search
    patterns are possible and we shall examine them
    in the next section.
  • The figure below gives the general form of the
    grep command line

11
Searching Text File Contents using grep
  • The following table summarizes some of grep's
    more commonly used command line switches. Consult
    the grep(1) man page (or invoke grep --help) for
    more.

12
Show All Occurrences of a String in a File
  • Under Linux, there are often several ways of
    accomplishing the same task. For example, to see
    if a file contains the word even, you could
    just visually scan the file
  • Reading the file, we see that the file does
    indeed contain the letters even. Using this
    method on a large file suffers because we could
    easily miss one word in a file of several
    thousand, or even several hundred thousand,
    words. We can use the grep tool to search through
    the file for us in an automatic search
  • Here we searched for a word using its exact
    spelling. Instead of just a literal string, the
    pattern argument can also be a general template
    for matching more complicated character
    sequences; we shall explore that in a later
    Lesson.

13
Searching in Several Files at Once
  • An easy way to search several files is just to
    name them on the grep command line
  • Perhaps we are more interested in just
    discovering which file mentions the word nine
    than actually seeing the line itself. Adding the
    -l switch to the grep line does just that

14
Searching Directories Recursively
  • Grep can also search all the files in a whole
    directory tree with a single command. This can be
    handy when working with a large number of files.
  • The easiest way to understand this is to see it
    in action. In the directory /etc/sysconfig are
    text files that contain much of the configuration
    information about a Linux system. The Linux name
    for the first Ethernet network device on a system
    is eth0, so you can find which file contains
    the configuration for eth0 by letting the grep -r
    command do the searching for you.

15
Searching Directories Recursively
  • Every file in /etc/sysconfig that mentions eth0
    is shown in the results.
  • We can further limit the files listed to only
    those referring to an actual device by filtering
    the grep -r output through a grep DEVICE
  • This shows a common use of grep as a filter to
    simplify the outputs of other commands.
  • If only the names of the files were of interest,
    the output can be simplified with the -l command
    line switch.
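  • Because /etc/sysconfig does not exist on every
    system, the following sketch reproduces the same
    three steps on a small scratch directory tree.
    The file names and contents are assumptions made
    for illustration, and -r is assumed to be
    available as it is in GNU grep.

```shell
# A scratch tree standing in for /etc/sysconfig (illustrative only).
root=$(mktemp -d)
mkdir -p "$root/network-scripts"
printf 'DEVICE=eth0\nONBOOT=yes\n' > "$root/network-scripts/ifcfg-eth0"
printf '# forward traffic from eth0\n' > "$root/notes"

# Every line below the directory that mentions eth0.
all=$(cd "$root" && grep -r eth0 . | sort)

# The same output, filtered through a second grep.
devices=$(cd "$root" && grep -r eth0 . | grep DEVICE)

# Only the names of the files that mention eth0.
files=$(cd "$root" && grep -rl eth0 . | sort)

printf '%s\n' "$all" "$devices" "$files"
rm -rf "$root"
```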

16
Inverting grep
  • By default, grep shows only the lines matching
    the search pattern. Usually, this is what you
    want, but sometimes you are interested in the
    lines that do not match the pattern. In these
    instances, the -v command line switch inverts
    grep's operation.
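  • A minimal sketch of the inversion, on three lines
    of made-up input.

```shell
menu='apple pie
cherry tart
apple crumble'

with_apple=$(printf '%s\n' "$menu" | grep apple)        # lines that match
without_apple=$(printf '%s\n' "$menu" | grep -v apple)  # lines that do not

printf '%s\n' "$with_apple" "$without_apple"
```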

17
Getting Line Numbers
  • Often you may be searching a large file that has
    many occurrences of the pattern. Grep will list
    each line containing one or more matches, but how
    is one to locate those lines in the original
    file? Using the grep -n command will also list
    the line number of each matching line.
  • The file /usr/share/dict/words contains a list of
    common dictionary words. Identify which line
    contains the word dictionary
  • You might also want to combine the -n switch with
    the -r switch when searching all the files below
    a directory

18
Limiting Matching to Whole Words
  • Remember the file containing our nursery rhyme
    earlier?
  • Suppose we wanted to retrieve all lines
    containing the word at. If we try the command
  • Do you see what happened? We matched the at
    string, whether it was an isolated word or part
    of a larger word. The grep command provides the
    -w switch to imply that the specified pattern
    should only match entire words.
  • The -w switch considers a sequence of letters,
    numbers, and underscore characters, surrounded by
    anything else, to be a word.
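  • The nursery rhyme itself is not reproduced in
    this transcript, so the sketch below uses three
    made-up lines to show the difference the -w
    switch makes.

```shell
rhyme='The cat sat at the door
Look at that
Pat the hat'

# Without -w, "at" is found inside cat, sat, that, Pat and hat as well.
plain_count=$(printf '%s\n' "$rhyme" | grep -c at)

# With -w, only lines holding "at" as a separate word are printed.
whole_words=$(printf '%s\n' "$rhyme" | grep -w at)

echo "$plain_count"
printf '%s\n' "$whole_words"
```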

19
Ignoring Case
  • The string Bob has a meaning quite different
    from the string bob. However, sometimes we want
    to find either one, regardless of whether the
    word is capitalized or not. The grep -i command
    solves just this problem.
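  • A minimal sketch, on made-up input, of the same
    search with and without the -i switch.

```shell
names='Bob
bob
BOB
alice'

exact=$(printf '%s\n' "$names" | grep bob)        # case-sensitive
any_case=$(printf '%s\n' "$names" | grep -i bob)  # case-insensitive

printf '%s\n' "$exact" "$any_case"
```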

20
Examples: Finding Simple Character Strings
  • Verify that your computer has the system account
    lp, used for the line printer tools. Hint: the
    file /etc/passwd contains one line for each user
    account on the system.

21
Chapter 3.  Introduction to Regular Expressions
  • Key Concepts
  • Regular expressions are a standard Unix syntax
    for specifying text patterns.
  • Regular expressions are understood by many
    commands, including grep, sed, vi, and many
    scripting languages.
  • Within regular expressions, . and [] are used to
    match characters.
  • Within regular expressions, *, +, and ? specify a
    number of consecutive occurrences.
  • Within regular expressions, ^ and $ specify the
    beginning and end of a line.
  • Within regular expressions, (, ), and | specify
    alternative groups.
  • The regex(7) man page provides complete details.

22
Introducing Regular Expressions
  • In the previous chapter you saw grep used to
    match either a whole word or part of a word. This
    by itself is very powerful, especially in
    conjunction with arguments like -i and -v, but it
    is not appropriate for all search scenarios. Here
    are some examples of searches that the grep usage
    you've learned so far would not be able to do.
  • First, suppose you had a file that looked like
    this

23
Introducing Regular Expressions
  • What if you wanted to pull out just the names of
    the people in people_and_pets.txt? A command like
    grep -w Name would match the 'Name' line for
    each person, but also the 'Name' line for each
    person's pet. How could we match only the 'Name'
    lines for people? Well, notice that the lines for
    pets' names are all indented, meaning that those
    lines begin with whitespace characters instead of
    text. Thus, we could achieve our goal if we had a
    way to say "Show me all lines that begin with
    'Name'".
  • Another example: suppose you and a friend both
    witnessed a hit-and-run car accident. You both
    got a look at the fleeing car's license plate and
    yet each of you recalls a slightly different
    number. You read the license number as "4I35VBB"
    but your friend read it as "413SV88". It seems
    that what you read as an 'I' in the second
    character, your friend read as a '1'. Similar
    differences appear in your interpretations of
    other parts of the license like '5' vs 'S' and
    'BB' vs '88'. The police, having taken both of
    your statements, now need to narrow down the
    suspects by querying their database of license
    plates for plates that might match what you saw.

24
Introducing Regular Expressions
  • One solution might be to do separate queries for
    "4I35VBB" and "413SV88" but doing so assumes that
    one of you is exactly right. What if the
    perpetrator's license number was actually
    "4135VB8"? In other words, what if you were right
    about some of the characters in question but your
    friend was right about others? It would be more
    effective if the police could query for a pattern
    that effectively said "Show me all license
    numbers that begin with a '4', followed by an 'I'
    or a '1', followed by a '3', followed by a '5' or
    an 'S', followed by a 'V', followed by two
    characters that are each either a 'B' or an '8'".
  • Query scenarios like these can be solved using
    regular expressions. While computer scientists
    sometimes use the term "regular expression" (or
    "regex" for short) to describe any method of
    describing complex patterns, in Linux and many
    programming languages the term refers to a very
    specific set of special characters used for
    solving problems like the above. Regular
    expressions are supported by a large number of
    tools including grep, vi, find and sed.

25
Introducing Regular Expressions
  • To introduce the usage of regular expressions,
    let's look at some solutions to the two problems
    introduced earlier. Don't worry if these seem a
    bit complicated; the remainder of the unit will
    start from scratch and cover regular expressions
    in great detail.
  • A regex that could solve the first problem, where
    we wanted to say "Show me all lines that begin
    with 'Name'" might look like this
  • ...that's it! Regular expressions are all about
    the use of special characters, called
    metacharacters, to represent advanced query
    parameters. The caret ("^"), as shown here, means
    "Lines that begin with...". Note, by the way,
    that the regular expression was put in
    single-quotes. This is a good habit to get into
    early on as it prevents bash from interpreting
    special characters that were meant for grep.
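  • The contents of people_and_pets.txt are not
    reproduced in this transcript, so the sketch
    below invents a few records in the layout the
    text describes (pet lines indented) and applies a
    caret-anchored pattern to them.

```shell
# Illustrative stand-in for people_and_pets.txt.
records='Name: Joe
Age: 35
Pet:
  Name: Fido
  Type: Dog
Name: Sue
Age: 29'

# -w Name matches the people and the indented pet line alike.
all_names=$(printf '%s\n' "$records" | grep -c -w Name)

# ^Name matches only lines that begin with Name.
owners=$(printf '%s\n' "$records" | grep '^Name')

echo "$all_names"
printf '%s\n' "$owners"
```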

26
Introducing Regular Expressions
  • Ok, so what about the second problem? That one
    involved a much more complicated query "Show me
    all license numbers that begin with a '4',
    followed by an 'I' or a '1', followed by a '3',
    followed by a '5' or an 'S', followed by a 'V',
    followed by two characters that are each either a
    'B' or an '8'". This could be represented by a
    regular expression that looks like this
  • Wow, that's pretty short considering how long it
    took to write out what we were looking for! There
    are only two types of regex metacharacters used
    here: square braces ('[]') and curly braces
    ('{}'). When two or more characters are shown
    within square braces it means "any one of these".
    So '[B8]' near the end of the expression means
    "'B' or '8'". When a number is shown within curly
    braces it means "this many of the preceding
    character". Thus, '[B8]{2}' means "two characters
    that are each either a 'B' or an '8'". Pretty
    powerful stuff!
  • Now that you've gotten a taste of what regular
    expressions are and how they can be used, let's
    start from scratch and cover them in depth.
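  • The license plate pattern itself did not survive
    in this transcript. The sketch below is a
    reconstruction from the prose description, run
    against a made-up list of plates; grep -E is used
    so that the curly braces are read as a repeat
    count.

```shell
# Invented plate list; only the first three fit the witnesses' description.
plates='4I35VBB
413SV88
4135VB8
4I35VBX
5135VB8'

suspects=$(printf '%s\n' "$plates" | grep -E '4[I1]3[5S]V[B8]{2}')
printf '%s\n' "$suspects"
```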

27
Regular Expressions, Extended Regular
Expressions, and the grep Command
  • As the Unix implementation of regular expression
    syntax has evolved, new metacharacters have been
    introduced. In order to preserve backward
    compatibility, commands usually choose to
    implement either basic regular expressions or
    extended regular expressions. In order to not
    become bogged down with the differences, this
    Lesson will introduce the extended syntax,
    summarizing differences at the end of the
    discussion.
  • One of the most common uses for regular
    expressions is specifying search patterns for the
    grep command. As was mentioned in the previous
    Lesson, there are three versions of the grep
    command. Reiterating, the three differ in how
    they interpret regular expressions.

28
Regular Expressions, Extended Regular
Expressions, and the grep Command
  • fgrep
  • The fgrep command is designed to be a "fast"
    grep. The fgrep command does not support regular
    expressions, but instead interprets every
    character in the specified search pattern
    literally. It is equivalent to grep -F.
  • grep
  • The grep command interprets each pattern using
    the original, basic regular expression syntax.
  • egrep
  • The egrep command interprets each pattern using
    extended regular expression syntax. It is
    equivalent to grep -E.
  • Because we are not yet making a distinction
    between the basic and extended regular expression
    syntax, the egrep command should be used whenever
    the search pattern contains regular expressions.

29
Anatomy of a Regular Expression
  • In our discussion of the grep program family, we
    were introduced to the idea of using a pattern to
    identify the file content of interest. Our
    examples were carefully constructed so that the
    pattern contained exactly the text for which we
    were searching. We were careful to use only
    literal characters in our regular expressions; a
    literal character matches only itself. So when we
    used hello as the regular expression, we were
    using a five-character regular expression
    composed only of literal characters. While this
    let us concentrate on learning how to operate the
    grep program, it didn't allow us to get a full
    appreciation of the power of regular expressions.
    Before we see regular expressions in use, we
    shall first see how they are constructed.

30
Anatomy of a Regular Expression
  • A regular expression is a sequence of:
  • Literal Characters: Literal characters match only
    themselves. Examples of literals are letters,
    digits and most special characters (see below for
    the exceptions).
  • Wildcards: Wildcard characters match any
    character. Within a regular expression, a period
    (.) matches any character, be it a space, a
    letter, a digit, punctuation, anything.
  • Modifiers: A modifier alters the meaning of the
    immediately preceding pattern character. For
    example, the expression ab*c matches the
    strings ac, abc, abbc, abbbc, and so on,
    because the asterisk (*) is a modifier that
    means "any number of (including zero)". Thus, our
    pattern means to match any sequence of characters
    consisting of one a, a (possibly empty) series
    of b characters, and a final c character.
  • Anchors: Anchors establish the context for the
    pattern, such as "the beginning of a line", or
    "the end of a word". For example, the expression
    cat would match any occurrence of the three
    letters, while ^cat would only match lines that
    begin with cat.

31
Taking Literals Literally
  • Literals are straightforward because each literal
    character in a regular expression matches one,
    and only one, copy of itself in the searched
    text. Uppercase characters are distinct from
    lowercase characters, so that A does not match
    a.
  • Wildcards
  • The "dot" wildcard
  • The character . is used as a placeholder, to
    match one of any character. In the following
    example, the pattern x..s matches any occurrence
    of the literal characters x and s, separated by
    exactly two other characters.
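  • The original example output is not part of this
    transcript. As a minimal sketch on a made-up word
    list, an x and an s separated by exactly two
    other characters can be searched for like this:

```shell
words='taxes
exits
boxes
sixths'

# x, then any two characters, then s.
hits=$(printf '%s\n' "$words" | grep 'x..s')
printf '%s\n' "$hits"
```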

32
Bracket Expressions: Ranges of Literal
Characters
  • Normally a literal character in a regex pattern
    matches exactly one occurrence of itself in the
    searched text. Suppose we want to search for the
    string hello regardless of how it is
    capitalized; we want to match Hello and HeLLo
    as well. How might we do that?
  • A regex feature called a bracket expression
    solves this problem neatly. A bracket expression
    is a range of literals enclosed in square
    brackets ([ and ]). For example, the regex
    pattern [Hh] is a character range that matches
    exactly one character: either an uppercase H or
    a lowercase h letter. Notice that it doesn't
    matter how large the set of characters within the
    range is; the set matches exactly one character,
    if it matches any at all. A bracket expression
    that matches the set of lowercase vowels could be
    written [aeiou] and would match exactly one
    vowel.
  • In the following example, bracket expressions are
    used to find words from the file
    /usr/share/dict/words. In the first case, the
    first five words that contain three consecutive
    (lowercase) vowels are printed. In the second
    case, the first 5 words that contain lowercase
    letters in the pattern of
    vowel-consonant-vowel-consonant-vowel-consonant
    are printed.

33
Bracket Expressions: Ranges of Literal
Characters
  • If the first character of a bracket expression is
    a ^, the interpretation is inverted, and the
    bracket expression will match any single
    occurrence of a character not included in the
    range. For example, the expression [^aeiou]
    would match any character that is not a lowercase
    vowel. The following example first lists words
    which contain three consecutive vowels, and
    secondly lists words which contain three
    consecutive consonant-vowel pairs.
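  • The original commands and their output are
    missing from this transcript. The sketch below
    applies the two kinds of bracket expression to a
    short made-up word list rather than to
    /usr/share/dict/words.

```shell
words='queue
banana
rhythm
beautiful
sky'

# Three consecutive lowercase vowels.
three_vowels=$(printf '%s\n' "$words" | grep '[aeiou][aeiou][aeiou]')

# Three consecutive pairs of (not a vowel)(vowel).
cv_pairs=$(printf '%s\n' "$words" |
  grep '[^aeiou][aeiou][^aeiou][aeiou][^aeiou][aeiou]')

printf '%s\n' "$three_vowels" "$cv_pairs"
```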

34
Range Expressions vs. Character Classes: Old
School and New School
  • Another way to express a character range is by
    giving the start- and end-letters of the
    sequence; this way [a-d] would match any
    character from the set a, b, c or d. A typical
    usage of this form would be [0-9] to represent
    any single digit, or [A-Z] to represent all
    capital letters.

35
Range Expressions vs. Character Classes: Old
School and New School
  • As an alternative to such quandaries, modern
    regular expressions make use of character
    classes. Character classes match any single
    character, using language specific conventions to
    decide if a given character is uppercase or
    lowercase, or if it should be considered part of
    the alphabet or punctuation. The following table
    lists some supported character classes, and the
    ASCII equivalent range expression, where
    appropriate.

36
Range Expressions vs. Character Classes: Old
School and New School
  • Character classes avoid problems you may run into
    when using regular expressions on systems that
    use different character encoding schemes where
    letters are ordered differently. For example,
    suppose you were to run the command
  • On a Red Hat Enterprise Linux system, this would
    match every word in the file, not just those that
    contain capital letters as one might assume. This
    is because under the UTF-8 locale that RHEL uses
    by default, characters are collated
    case-insensitively, so that [A-Z] is equivalent
    to [AaBbCc...], etc.

37
Range Expressions vs. Character Classes: Old
School and New School
  • On older systems, though, a different character
    encoding scheme is used where alphabetization is
    done case-sensitively. On such systems [A-Z]
    would be equivalent to [ABC...], etc. Character
    classes avoid this pitfall. You can run
  • on any system regardless of the encoding scheme
    being used and it will only match lines that
    contain capital letters.
  • For more details about the predefined range
    expressions, consult the grep manual page. For
    more information on character encoding schemes
    under Linux, refer back to chapter 8.3. To learn
    about how character encoding schemes are used to
    support other languages in Red Hat Enterprise
    Linux, begin with the locale manual page.
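  • The commands referred to above are missing from
    this transcript. As a minimal sketch on made-up
    input, the character classes [[:upper:]] and
    [[:digit:]] can be used inside a bracket
    expression like this:

```shell
# Lines that contain at least one capital letter.
upper=$(printf 'apple\nBanana\ncherry\ndate Palm\n' | grep '[[:upper:]]')

# Number of lines that contain at least one digit.
digits=$(printf 'room 101\nlobby\n' | grep -c '[[:digit:]]')

printf '%s\n' "$upper"
echo "$digits"
```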

38
Common Modifier Characters
  • We saw a common usage of a regex modifier in our
    earlier example ab*c to match an a and c
    character with some number of b letters in
    between. The * character changed the
    interpretation of the literal b character from
    matching exactly one letter to matching any
    number of b's.
  • Here is a list of some common modifier
    characters
  • b?: The question mark (?) means either one or
    none; the literal character is considered to be
    optional in the searched text. For example, the
    regex pattern ab?c matches the strings ac
    and abc, but not abbc.
  • b*: The asterisk (*) modifier means any number
    (including zero) of the preceding literal
    character. The regex pattern ab*c matches the
    strings ac, abc, abbc, and so on.

39
Common Modifier Characters
  • b+: The plus (+) modifier means one or more,
    so the regex pattern b+ matches a non-empty
    sequence of b's. The regex pattern ab+c matches
    the strings abc and abbc, but does not match
    ac.
  • b{m,n}: The brace modifier is used to specify a
    range of between m and n occurrences of the
    preceding character. The regex pattern ab{2,4}c
    would match abbc, abbbc, and abbbbc, but
    not abc or abbbbbc.
  • b{n}: With only one integer, the brace modifier
    is used to specify exactly n occurrences of the
    preceding character.
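  • The modifiers above can be compared side by side
    on a made-up list of strings. In this sketch the
    helper function matching is hypothetical, written
    only for this illustration; it uses grep -E for
    the extended syntax and -x so that a pattern must
    match a whole line.

```shell
samples='ac
abc
abbc
abbbbc
abbbbbc'

# Hypothetical helper: list, on one line, the samples a pattern matches in full.
matching() {
  printf '%s\n' "$samples" | grep -E -x "$1" | paste -s -d ' ' -
}

optional=$(matching 'ab?c')     # zero or one b
any=$(matching 'ab*c')          # zero or more b's
some=$(matching 'ab+c')         # one or more b's
ranged=$(matching 'ab{2,4}c')   # between two and four b's
exact=$(matching 'ab{2}c')      # exactly two b's

printf '%s\n' "$optional" "$any" "$some" "$ranged" "$exact"
```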

40
Common Modifier Characters
  • In the following example, egrep prints lines from
    /usr/share/dict/words that contain patterns which
    start with a (capital or lowercase) a, might or
    might not next have a (lowercase) b, but then
    definitely follow with a (lowercase) a.
  • The following example prints lines which contain
    patterns which start al, then use the .*
    expression to specify 0 or more occurrences of
    any character, followed by the pattern bra.

41
Common Modifier Characters
  • Notice we found variations on the words algebra
    and calibrate. For the former, the .* expression
    matched ge, while for the latter, it matched
    the letter i.
  • The expression .*, which is interpreted as "0
    or more of any character", shows up often in
    regex patterns, acting as the "stretchable glue"
    between two patterns of significance.
  • As a subtlety, we should note that the modifier
    characters are greedy: they always match the
    longest possible input string. For example, given
    the regex pattern

42
Anchored Searches
  • Four additional search modifier characters are
    available
  • ^foo: A caret (^) matches the beginning of a
    line. Our example ^foo matches the string foo
    only when it is at the beginning of a line.
  • foo$: A dollar sign ($) matches the end of a
    line. Our example foo$ matches the string foo
    only at the end of a line, immediately before the
    newline character.
  • \<foo\>: By themselves, the less than sign (<)
    and the greater than sign (>) are literals.
    Using the backslash character to escape them
    transforms them into meaning "beginning of a
    word" and "end of a word", respectively. Thus the
    pattern \<cat\> matches the word cat but not
    the word catalog.
  • You will frequently see both ^ and $ used
    together. The regex pattern ^foo$ matches a
    whole line that contains only foo and would not
    match that line if it contained any spaces.
  • The \< and \> are also usually used as pairs.
43
Anchored Searches
  • In the following example, the first search
    lists all lines that contain the letters ion
    anywhere on the line. The second search only
    lists lines which end in ion.
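  • The original commands are missing from this
    transcript. The sketch below shows the line
    anchors on a made-up word list, and the word
    anchors on three made-up lines. The \< and \>
    anchors are assumed to be available as they are
    in GNU grep.

```shell
words='ion
lion
lioness
nation
national'

anywhere=$(printf '%s\n' "$words" | grep -c 'ion')  # lines containing ion
at_end=$(printf '%s\n' "$words" | grep 'ion$')      # lines ending in ion
at_start=$(printf '%s\n' "$words" | grep '^ion')    # lines beginning with ion

# Word anchors: cat as a whole word, but not as part of catalog.
whole_word=$(printf 'cat\ncatalog\nthe cat sat\n' | grep '\<cat\>')

echo "$anywhere"
printf '%s\n' "$at_end" "$at_start" "$whole_word"
```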

44
Coming to Terms with Regex Grouping
  • The same way that you can use parentheses to
    group terms within a mathematical expression, you
    also use parentheses to collect regular
    expression pattern specifiers into groups. This
    lets the modifier characters ?, *, and +
    apply to groups of regex specifiers instead of
    only the immediately preceding specifier.
  • Suppose we need a regular expression to match
    either foo or foobar. We could write the
    regex as foo(bar)? and get the desired results.
    This lets the ? modifier apply to the whole
    string bar instead of only the preceding r
    character.
  • Grouping regex specifiers using parentheses
    becomes even more flexible when the pipe symbol
    (|) is used to separate alternative patterns.
    Using alternatives, we could rewrite our previous
    example as (foo|foobar). Writing this as
    foo|foobar is simpler and works just as well,
    because just like mathematics, regex specifiers
    have precedence. While you are learning, always
    enclose your groups in parentheses.
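  • A minimal sketch of grouping and alternatives on
    a made-up list of strings. The -x switch is used
    here so that each pattern has to match a whole
    line, which makes the differences easy to see.

```shell
lines='foo
foobar
foobaz
food'

# ? applies to the whole group (bar), not just to the letter r.
optional_group=$(printf '%s\n' "$lines" | grep -E -x 'foo(bar)?')

# The same two strings, written as alternatives.
alternatives=$(printf '%s\n' "$lines" | grep -E -x '(foo|foobar)')

# Alternatives inside a group: foo followed by either bar or baz.
inner_choice=$(printf '%s\n' "$lines" | grep -E -x 'foo(bar|baz)')

printf '%s\n' "$optional_group" "$alternatives" "$inner_choice"
```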

45
Coming to Terms with Regex Grouping
  • In the following example, the first search prints
    all lines from the file /usr/share/dict/words
    which contain four consecutive vowels (compare
    the syntax to that used when first introducing
    range expressions, above). The second search
    finds words that contain a double o or a double
    e, followed (somewhere) by a double e.

46
Escaping Meta-Characters
  • Sometimes you need to match a character that
    would ordinarily be interpreted as a regular
    expression wildcard or modifier character. To
    temporarily disable the special meaning of these
    characters, simply escape them using the
    backslash (\) character. For example, the regex
    pattern cat. would match the letters cat
    followed by any character, as in cats or catchup.
    To match only the letters cat. at the end of a
    sentence, use the regex pattern cat\. to
    disable interpreting the period as a wildcard
    character.
  • Note one distracting exception to this rule. When
    the backslash character precedes a < or >
    character, it enables the special interpretation
    (anchoring the beginning or ending of a word)
    instead of disabling the special interpretation.
    Shudder. It even gets worse - see the footnote at
    the bottom of the following table.
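  • A minimal sketch of the escaped and unescaped
    period, on three made-up lines of text.

```shell
text='I saw a cat.
Two cats ran.
catchup is a condiment'

# Unescaped: the period matches any character after cat.
wildcard_count=$(printf '%s\n' "$text" | grep -c 'cat.')

# Escaped: only a literal period after cat matches.
literal=$(printf '%s\n' "$text" | grep 'cat\.')

echo "$wildcard_count"
printf '%s\n' "$literal"
```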

47
Summary of Linux Regular Expression Syntax
  • The following table summarizes regular expression
    syntax, and identifies which components are found
    in basic regular expression syntax, and which are
    found only in the extended regular expression
    syntax.

48
Summary of Linux Regular Expression Syntax

49
Regular Expressions are NOT File Globbing
  • When first encountering regular expressions,
    students understandably confuse regular
    expressions with pathname expansion (file
    globbing). Both are used to match patterns in
    text. Both share similar metacharacters (*,
    ?, [...], etc.). However, they are
    distinctly different. The following table
    compares and contrasts regular expressions and
    file globbing.

50
Regular Expressions are NOT File Globbing
  • In the following example, the first argument is a
    regular expression, specifying text which starts
    with an l and ends .conf, while the second
    argument is a file glob which specifies all files
    in the /etc directory whose filename starts with
    l and ends .conf.
  • Take a close look at the second line of output.
    Why was it matched by the specified regular
    expression?
  • Why does the line containing the text krb5.conf
    match the expression? The l is found way back
    in the word default!
  • In a similar vein, when specifying regular
    expressions on the bash command line, care must
    be taken to quote or escape the regex
    meta-characters, lest they be expanded away by
    the bash shell with unexpected results. In all of
    the examples found in this discussion, the first
    argument to the egrep command is protected with
    single quotes for just this reason.
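  • The original command and its output are missing
    from this transcript. The sketch below recreates
    the contrast with made-up file names and text:
    the glob l*.conf must match a whole file name,
    while the regex l.*\.conf may match anywhere in
    a line, including an l that sits inside another
    word.

```shell
# Scratch directory with three empty files (illustrative names).
dir=$(mktemp -d)
touch "$dir/ld.so.conf" "$dir/krb5.conf" "$dir/lilo.txt"

# File glob, expanded by the shell: names starting with l and ending .conf.
globbed=$(cd "$dir" && ls l*.conf)

# Regular expression, quoted so the shell leaves it alone for grep.
listing='ld.so.conf
default realm file: krb5.conf
lilo.txt'
regexed=$(printf '%s\n' "$listing" | grep -E 'l.*\.conf')

printf '%s\n' "$globbed" "$regexed"
rm -rf "$dir"
```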

51
Where to Find More Information About Regular
Expressions
  • We have barely scratched the surface of the
    usefulness of regular expressions. The
    explanation we have provided will be adequate for
    your daily needs, but even so, regular
    expressions offer much more power, making even
    complicated text searches simple to perform.
  • For more online information about regular
    expressions, you should check
  • The regex(7) manual page.
  • The grep(1) manual page.

52
Examples
  • Regular Expression Modifiers

53
Workbook 9: Managing Processes
Pace Center for Business and Technology
54
Chapter 1.  An Introduction to Processes
  • Key Concepts
  • A process is an instance of a running executable,
    identified by a process id (pid).
  • Because Linux implements virtual memory, every
    process possesses its own distinct memory
    context.
  • A process has a uid and a collection of gids as
    credentials.
  • A process has a filesystem context, including a
    cwd, a umask, a root directory, and a collection
    of open files.
  • A process has a scheduling context, including a
    niceness value.
  • A process has a collection of environment
    variables.
  • The ps command can be used to examine all
    currently running processes.
  • The top command can be used to monitor all
    running processes.

55
Processes are How Things Get Done
  • Almost anything that happens in a Linux system
    happens as a process. If you are viewing this
    text in a web browser, that browser is running as
    a process. If you are typing at a bash shell's
    command line, that shell is running as a process.
    If you are using the chmod command to change a
    file's permissions, the chmod command operates as
    a separate process. Processes are how things get
    done, and the primary responsibility of the Linux
    kernel is to provide a place for processes to do
    their stuff without stepping on each other's
    toes.
  • A process is an instance of an executing
    program. In other operating systems, programs are
    often large, elaborate, graphical applications
    that take a noticeably long time to start up. In
    the Linux (and Unix) world, these types of
    programs exist as well, but so do a whole class
    of programs which usually have no counterpart in
    other operating systems. These programs are
    designed to be quick to start, specialized in
    function, and play well with others. On a Linux
    system, processes running these programs are
    constantly popping into and out of existence.

56
Processes are How Things Get Done
  • For example, consider the user maxwell performing
    the following command line.
  • In the split second that the command line took to
    execute, no fewer than four processes (ps, grep,
    bash, and date) were started, did their thing,
    and exited.

57
What is a Process?
  • By this point, you could well be tired of hearing
    the answer: a process is an instance of a running
    program. Here, however, we provide a more
    detailed list of the components that constitute a
    process.
  • Execution Context
  • Every process exists (at least to some extent)
    within the physical memory of the machine.
    Because Linux (and Unix) is designed to be a
    multiuser environment, the memory allocated to a
    process is protected, and no other process can
    access it. In its memory, a process loads a copy
    of its executable instructions, and stores any
    other dynamic information it is managing. A
    process also carries parameters associated with
    how often it gets the opportunity to access the
    CPU, such as its execution state and its niceness
    value (more on these soon).

58
What is a Process?
  • I/O Context
  • Every process interacts to some extent with the
    filesystem in order to read or write information
    that exists before or will exist after the
    lifespan of the process. Elements of a process's
    input/output context include the following.
  • Open File Descriptors
  • Almost every process is reading information from
    or writing information to external sources,
    usually both. In Linux, open file descriptors act
    as sources or sinks of information. Processes
    read information from or write information to
    file descriptors, which may be connected to
    regular files, device nodes, network sockets, or
    even each other as pipes (allowing interprocess
    communication).
  • Memory Mapped Files
  • Memory mapped files are files whose contents have
    been mapped directly into the process's memory.
    Rather than reading or writing to a file
    descriptor, the process just accesses the
    appropriate memory address. Memory maps are most
    often used to load a process's executable code,
    but may also be used for other types of
    non-sequential access to data.

59
What is a Process?
  • Filesystem Context
  • We have encountered several pieces of information
    related to the filesystem that processes
    maintain, such as the process's current working
    directory (for translating relative file
    references) and the process's umask (for setting
    permissions on newly created files).
  • Environment Variables
  • Every process maintains its own list of
    name-value pairs, referred to as environment
    variables, or collectively as the process's
    environment. Processes generally inherit their
    environment on startup, and may refer to it for
    information such as the user's preferred language
    or favorite editor.
  • Heritage Information
  • Every process is identified by a PID, or process
    id, which it is assigned when it is created. In a
    later Lesson, we will discover that every process
    has a clearly defined parent and possibly well
    defined children. A process's own identity, the
    identity of its children, and to some extent the
    identity of its siblings are maintained by the
    process.
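  • As a small sketch of the environment and heritage
    items above: a child process inherits exported
    variables, but not plain shell variables. The names
    EDITOR_CHOICE and SCRATCH_NOTE are hypothetical and
    exist only for this illustration.

```shell
# Heritage information of the current shell.
echo "this shell: pid=$$ parent pid=$PPID"

# EDITOR_CHOICE and SCRATCH_NOTE are made-up names for this sketch.
export EDITOR_CHOICE="vim"      # exported: part of the process environment
SCRATCH_NOTE="not exported"     # plain shell variable: stays in this shell

# A child process started with sh -c inherits only the environment.
child_view=$(sh -c 'echo "editor=${EDITOR_CHOICE:-unset} note=${SCRATCH_NOTE:-unset}"')
echo "child sees: $child_view"
```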

60
What is a Process?
  • Credentials
  • Every process runs under the context of a given
    user (or, more exactly, a given user id), and
    under the context of a collection of group ids
    (generally, all of the groups that the user
    belongs to). These credentials limit what
    resources a process can access, such as which
    files it can open or with which other processes
    it is allowed to communicate.
  • Resource Statistics and Limits
  • Every process also records statistics to track
    the extent to which system resources have been
    utilized, such as its memory size, its number of
    open files, its amount of CPU time, and others.
    The amount of many of these resources that a
    process is allowed to use can also be limited, a
    concept called resource limits.
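  • A shell is itself a process, so its credentials and
    one of its resource limits can be inspected directly.
    The following is a minimal sketch using the id
    command and the umask and ulimit shell builtins; the
    variable names are chosen only for this example.

```shell
my_uid=$(id -u)                 # numeric user id the shell runs under
my_gids=$(id -G)                # numeric ids of all groups it belongs to
my_umask=$(umask)               # mask applied to newly created files
open_file_limit=$(ulimit -n)    # soft limit on open file descriptors

echo "uid:             $my_uid"
echo "gids:            $my_gids"
echo "umask:           $my_umask"
echo "open file limit: $open_file_limit"
```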

61
Viewing Processes with the ps Command
  • We have already encountered the ps command many
    times. Now, we will attempt to familiarize
    ourselves with a broader selection of the many
    command line switches associated with it. A quick
    ps --help will display a summary of over 50
    different switches for customizing the ps
    command's behavior. To complicate matters,
    different versions of Unix have developed their
    own versions of the ps command, which do not use
    the same command line switch conventions. The
    Linux version of the ps command tries to be as
    accommodating as possible to people from
    different Unix backgrounds, and often there are
    multiple switches for any given option, some of
    which start with a conventional leading hyphen
    (-), and some of which do not.

62
Viewing Processes with the ps Command
  • Process Selection
  • By default, the ps command lists all processes
    started from a user's terminal. While reasonable
    when users connected to Unix boxes using serial
    line terminals, this behavior seems a bit
    minimalist when every terminal window within an X
    graphical environment is treated as a separate
    terminal. The following command line switches can
    be used to expand (or reduce) the processes which
    the ps command lists.

63
Output Selection
  • As implied by the initial paragraphs of this
    Lesson, there are many parameters associated with
    processes, too many to display in a standard
    terminal width of 80 columns. The following table
    lists common command line switches used to select
    what aspects of a process are listed.
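  • As one hedged illustration of output selection, the
    -o switch names the columns to print and -p selects
    a process by pid. The column list below is just one
    reasonable choice, not the contents of the table
    above.

```shell
# Show a handful of chosen columns for the current shell only.
ps -o pid,ppid,user,stat,ni,comm -p $$

# A trailing "=" suppresses the column header, which is convenient
# when the output is captured in a variable.
own_row=$(ps -o pid= -o stat= -o comm= -p $$)
echo "captured row: $own_row"
```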

64
Output Selection
  • Additionally, the following switches can be used
    to modify how the selected information is
    displayed.

65
Oddities of the ps Command
  • The ps command, probably more so than any other
    command in Linux, has oddities associated with
    its command line switches. In practice, users
    tend to experiment until they find combinations
    that work for them, and then stick to them. For
    example, the author prefers ps aux for a general
    purpose listing of all processes, while many
    people prefer ps -ef. The above tables should
    provide a reasonable "working set" for the
    novice.
  • The command line switches tend to fall into two
    categories, those with the traditional leading
    hyphen ("Unix98" style options), and those
    without ("BSD" style options). Often, a given
    functionality will be represented by one of each.
    When grouping multiple single letter switches,
    only switches of the same style can be grouped.
    For example, ps axf is the same as ps a x f, not
    ps a x -f.

66
Monitoring Processes with the top Command
  • The ps command displays statistics for specified
    processes at the instant that the command is run,
    providing a snapshot of an instant in time. In
    contrast, the top command is useful for
    monitoring the general state of affairs of
    processes on the machine.
  • The top command is intended to be run from within
    a terminal. It will replace the command line with
    a table of currently running processes, which
    updates every few seconds. The following
    demonstrates a user's screen after running the
    top command.

67
Monitoring Processes with the top Command
  • While the command is running, the keyboard is
    "live". In other words, the top command will
    respond to single key presses without waiting for
    a return key. The following table lists some of
    the more commonly used keys.

68
Monitoring Processes with the top Command
  • The last two commands, which either kill or renice
    a process, use concepts that we will cover in
    more detail in a later Lesson.
  • Although most often run without command line
    configuration, top does support the following
    command line switches.

69
Monitoring Processes with the gnome-system-monitor
Application
  • If running an X server, the GNOME desktop
    environment provides an application similar in
    function to top, with the benefits (and
    drawbacks) of a graphical application. The
    application can be started from the command line
    as gnome-system-monitor, or by selecting the
    System > Administration > System Monitor menu
    item.

70
Monitoring Processes with the gnome-system-monitor
Application
  • Like the top command, the System Monitor displays
    a list of processes running on the local machine,
    refreshing the list every few seconds. In its
    default configuration, the System Monitor
    provides a much simpler interface: it lists only
    the processes owned by the user who started the
    application, and reduces the number of columns to
    just the process's command, owner, Process ID,
    and simple measures of the process's Memory and
    CPU utilization. Processes may be sorted by any
    one of these fields by simply clicking on the
    column's title.

71
Monitoring Processes with the gnome-system-monitor
Application
  • When right-clicking on a process, a pop-up menu
    allows the user to perform many of the actions
    that top allowed, such as renicing or killing a
    process, though again with a simpler (and not as
    flexible) interface.

72
Monitoring Processes with the gnome-system-monitor
Application
  • The System Monitor may be configured by opening
    the Edit > Preferences menu selection. Within the
    Preferences dialog, the user may set the update
    interval (in seconds), and configure many more
    fields to be displayed.

73
Locating processes with the pgrep Command.
  • Often, users are trying to locate information
    about processes identified by the command they
    are running, or the user who is running them. One
    technique is to list all processes, and use the
    grep command to reduce the information. In the
    following, maxwell first looks for all instances
    of the sshd daemon, and then for all processes
    owned by the user maxwell.
  • While maxwell can find the information he needs,
    there are some unpleasant issues.
  • The approach is not exacting. Notice that, in the
    second search, a su process showed up, not
    because it was owned by maxwell, but because the
    word maxwell was one of its arguments.
  • Similarly, the grep command itself usually shows
    up in the output.
  • The compound command can be awkward to type.

74
Locating processes with the pgrep Command.
  • In order to address these issues, the pgrep
    command was created. Named pgrep for obvious
    reasons, the command allows users to quickly list
    processes by command name, user, terminal, or
    group.
  • pgrep [SWITCHES] [PATTERN]
  • Its optional argument, if supplied, is
    interpreted as an extended regular expression
    pattern to be matched against command names. The
    following command line switches may also be used
    to qualify the search.

75
Locating processes with the pgrep Command.
  • In addition, the following command line switches
    can be used to qualify the output formatting of
    the command.
  • For a complete list of switches, consult the
    pgrep(1) man page.
  • As a quick example, maxwell will repeat his two
    previous process listings, using the pgrep
    command.
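  • A minimal sketch of the pgrep approach, using a
    throwaway sleep process rather than maxwell's sshd
    example. The switches used here (-u for user, -x for
    an exact command-name match, -l to print the name
    next to the pid) are standard pgrep switches; the
    variable names are made up for the illustration.

```shell
# Start a long-running process to search for.
sleep 300 &
sleep_pid=$!

# Match on the command name only, restricted to processes owned by
# the current user. Unlike ps | grep, this never matches pgrep itself
# or a process that merely has "sleep" among its arguments.
found=$(pgrep -u "$(id -u)" -x sleep)
echo "pids of sleep processes: $found"

# -l adds the command name to each line of output.
pgrep -l -u "$(id -u)" -x sleep

# Clean up the throwaway process.
kill "$sleep_pid"
wait "$sleep_pid" 2>/dev/null || true
```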

76
Examples: Chapter 1.  An Introduction to Processes
  • Viewing All Processes with the "User Oriented"
    Format
  • In the following transcript, maxwell uses the ps
    -e u command to list all processes (-e) with the
    "user oriented" format (u).
  • The "user oriented" view displays the user who is
    running the process, the process id, and a rough
    estimate of the amount of CPU and memory the
    process is consuming, as well as the state of the
    process. (Process states will be discussed in the
    next Lesson).

77
Questions: Chapter 1.  An Introduction to
Processes
  • 1, 2, and 3

78
Chapter 2.  Process States
  • Key Concepts
  • In Linux, the first process, /sbin/init, is
    started by the kernel on bootup. All other
    processes are the result of a parent process
    duplicating itself, or forking.
  • A process begins executing a new command through
    a process called execing.
  • Often, new commands are run by a process (often a
    shell) first forking, and then execing. This
    mechanism is referred to as the fork and exec
    mechanism.
  • Processes can always be found in one of five well
    defined states: runnable, voluntarily sleeping,
    involuntarily sleeping, stopped, or zombie.
  • Process ancestry can be viewed with the pstree
    command.
  • When a process dies, it is the responsibility of
    the process's parent to collect its return code
    and resource usage information.
  • When a parent dies before its children, the
    orphaned children are inherited by the first
    process (usually /sbin/init).

79
A Process's Life Cycle
  • How Processes are Started
  • In Linux (and Unix), unlike many other operating
    systems, process creation and command execution
    are two separate concepts. Though usually a new
    process is created so that it can run a specified
    command (such as the bash shell creating a
    process to run the chmod command), processes can
    be created without running a new command, and new
    commands can be executed without creating a new
    process.
  • Creating a New Process (Forking)
  • New processes are created through a technique
    called forking.
    When a process forks, it creates a duplicate of
    itself. Immediately after a fork, the newly
    created process (the child) is an almost exact
    duplicate of the original process (the parent).
    The child inherits an identical copy of the
    original process's memory, any open files of the
    parent, and identical copies of any parameters of
    the parent, such as the current working directory
    or umask. About the only difference between the
    parent and the child is the child's heritage
    information (the child has a different process ID
    and a different parent process ID, for starters),
    and (for the programmers in the audience) the
    return value of the fork() system call.
  • As a quick aside for any programmers in the
    audience, a fork is usually implemented using a
    structure similar to the following.

80
A Process's Life Cycle
  • When a process wants to create a new process, it
    calls the fork() system call (with no arguments).
    Though only one process enters the fork() call,
    two processes return from it. For the newly
    created process (the child), the return value is
    0. For the original process (the parent), the
    return value is the process ID of the child. By
    branching on this value, the child may now go off
    to do whatever it was started to do (which often
    involves exec()ing, see next), and the parent can
    go on to do its own thing.
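  • The effect of a fork can also be observed from the
    shell, without writing C. The sketch below is a
    shell analogue, not the fork() code itself: a
    parenthesized subshell run in the background is
    created by forking, so it starts with a copy of the
    parent's variables, and changes made in the copy do
    not reach the parent. The variable names are
    hypothetical.

```shell
greeting="set by parent"

(
    # The forked child starts with a copy of the parent's memory.
    echo "child sees: $greeting"
    # Changing the copy does not affect the parent.
    greeting="changed by child"
    exit 7
) &
child_pid=$!          # the parent learns the child's process id

# The parent collects the child's return value.
child_status=0
wait "$child_pid" || child_status=$?

echo "parent still has: $greeting"
echo "child $child_pid exited with status $child_status"
```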

81
A Process's Life Cycle
  • Executing a New Command (Exec-ing)
  • New commands are run through a technique called
    execing (short for executing). When execing a new
    command, the
    current process wipes and releases most of its
    resources, and loads a new set of instructions
    from the command specified in the filesystem.
    Execution starts with the entry point of the new
    program.
  • After execing, the new command is still the same
    process. It has the same process ID, and many of
    the same parameters (such as its resource
    utilization, umask, current working directory,
    and others). It merely forgets its former
    command, and adopts the new one.
  • Again for any programmers, execs are performed
    through one of several variants of the execve()
    system call, such as the execl() library call.
  • The process enters the execl(...) call,
    specifying the new command to run. If all goes
    well, the execl(...) call never returns. Instead,
    execution picks up at the entry point (i.e.,
    main()) of the new program. If for some reason
    execl(...) does return, it must be an error (such
    as not being able to locate the command's
    executable in the filesystem).
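  • The shell's exec builtin gives a quick way to watch
    this from the command line. The following is a
    sketch using exec in place of the execl() call
    described above: the process id printed before and
    after the exec is the same, and a failed exec is
    reported as an error. The path /no/such/command is
    deliberately nonexistent.

```shell
# The outer sh prints its pid, then replaces itself with a second sh,
# which prints its own pid. Because exec does not create a new
# process, both lines show the same number.
out=$(sh -c 'echo $$; exec sh -c "echo \$\$"')
before=$(echo "$out" | sed -n 1p)
after=$(echo "$out" | sed -n 2p)
echo "pid before exec: $before, pid after exec: $after"

# If the command cannot be found, the exec fails. A subshell keeps
# the failure from ending the current shell.
exec_status=0
(exec /no/such/command) 2>/dev/null || exec_status=$?
echo "failed exec reported status $exec_status"
```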

82
A Process's Life Cycle
  • Combining the Two: Fork and Exec
  • Some programs may fork without
    execing. Examples include networking daemons, which
    fork a new child to handle a specific client
    connection, while the parent goes back to listen
    for new clients. Other programs might exec
    without forking. Examples include the login
    command, which becomes the user's login shell
    after successfully confirming a user's password.
    Most often, and for shells in particular,
    however, forking and execing go hand in hand.
    When running a command, the bash shell first
    forks a new bash shell. The child then execs the
    appropriate command, while the parent waits for
    the child to die, and then issues another prompt.
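  • The same three steps (fork, exec, wait) can be
    spelled out by hand. This is only a sketch of what
    the shell does for every command; the temporary file
    is an assumption of the example, used to carry the
    child's output back to the parent.

```shell
tmp=$(mktemp)

# Fork: the parenthesized subshell is a new process.
# Exec: inside it, exec replaces the subshell with a new command,
#       which reports its own pid and exits with status 5.
( exec sh -c 'echo $$; exit 5' ) > "$tmp" &
child_pid=$!

# Wait: the parent blocks until the child dies, then collects its status.
child_status=0
wait "$child_pid" || child_status=$?

reported_pid=$(cat "$tmp")
rm -f "$tmp"

echo "forked child pid:        $child_pid"
echo "pid seen by the command: $reported_pid"
echo "status collected:        $child_status"
```

    The two pids agree because the exec happened inside
    the already forked child.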

83
The Lineage of Processes (and the pstree Command)
  • Upon booting the system, one of the
    responsibilities of the Linux kernel is to start
    the first process (usually /sbin/init). All other
    processes are started because an already existing
    process forked.
  • Because every process except the first is created
    by forking, there exists a well defined lineage
    of parent child relationships among the
    processes. The first process started by the
    kernel starts off the family tree, which can be
    examined with the pstree command.
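  • Where pstree is not at hand, the upward half of the
    family tree can be traced with ps alone. The loop
    below is a rough sketch, not a replacement for
    pstree: it follows parent pids from the current
    shell toward the first process, and the 64 step cap
    is an arbitrary safety limit chosen for the example.

```shell
pid=$$
lineage=""
hops=0

# Follow the chain of parent pids until it runs out (ppid 0) or a
# process can no longer be inspected.
while [ -n "$pid" ] && [ "$pid" -ne 0 ] && [ "$hops" -lt 64 ]; do
    name=$(ps -o comm= -p "$pid") || break
    lineage="${lineage:+$lineage <- }${name}(${pid})"
    pid=$(ps -o ppid= -p "$pid" | tr -d ' ')
    hops=$((hops + 1))
done

echo "$lineage"
```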

84
How a Process Dies
  • When a process dies, it either dies normally by
    electing to exit, or abnormally as the result of
    receiving a signal. We here discuss a normally
    exiting process, postponing a discussion of
    signals until a later Lesson.
  • We have mentioned previously that processes leave
    behind a status code (also called return value)
    when they die, in the form of an integer. (Recall
    the bash shell, which uses the $? variable to
    store the return value of the previously run
    command.) When a process exits, all of its
    resources are freed, except the return code (and
    some resource utilization accounting
    information). It is the responsibility of the
    process's parent to collect this information, and
    free up the last remaining resources of the dead
    child. For example, when the bash shell forks and
    execs the chmod command, it is the parent bash
    shell's responsibility to collect the return
    value from the exited chmod command.
  • Orphans
  • If it is a parent's responsibility to clean up
    after their children, what happens if the parent
    dies before the child does? The child becomes an
    orphan. One of the special responsibilities of
    the first process started by the kernel is to
    "adopt" any orphans. (Notice that in the output
    of the pstree command, the first process has a
    disproportionately large number of children. Most
    of these were adopted as the orphans of other
    processes).
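  • An orphan is easy to produce on purpose. In this
    sketch a short-lived sh starts a sleep in the
    background and exits immediately, leaving the sleep
    without its original parent. Which process adopts it
    depends on the system (traditionally the first
    process, pid 1), so the example prints the new
    parent rather than assuming it.

```shell
# The intermediate sh starts sleep, prints sleep's pid, and exits.
# The redirections detach sleep from the command substitution's pipe.
orphan_pid=$(sh -c 'sleep 30 >/dev/null 2>&1 & echo $!')

# The original parent is gone, so ps reports the adopting process.
new_parent=$(ps -o ppid= -p "$orphan_pid" | tr -d ' ')
echo "orphan $orphan_pid now has parent $new_parent"

# Clean up the orphan.
kill "$orphan_pid"
```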

85
How a Process Dies
  • Zombies
  • In between the time when a process exits, freeing
    most of its resources, and the time when its
    parent collects its return value, freeing the
    rest of its resources, the child process is in a
    special state referred to as a Zombie. Every
    process passes through a transient zombie state.
    Usually, users need to be looking at just the
    right time (with the ps command, for example) to
    witness a zombie. They show up in the list of
    processes, but take up no memory, no CPU time, or
    any other system resources. They are just the
    shadow of a former process, waiting for their
    parent to come and finish them off.
  • Negligent Parents and Long Lived Zombies
  • Occasionally, parent processes can be negligent.
    They start child processes, but then never go
    back to clean up after them. When this happens
    (usually because of a programmer's error), the
    child can exit, enter the zombie state, and stay
    there. This is usually the case when users
    witness zombie processes using, for example, the
    ps command.
  • Getting rid of zombies is perhaps the most
    misunderstood basic Linux (and Unix) concept.
    Many people will say that there is no way to get
    rid of them, except by rebooting the machine.
    Using the clues discussed in this section, can
    you figure out how to get rid of long lived
    zombies?
  • You get rid of zombies by getting rid of the
    negligent parent. When the parent dies (or is
    killed), the now orphaned zombie gets adopted by
    the first process, which is almost always
    /sbin/init. /sbin/init is a very diligent parent,
    who always cleans up after its children
    (including adopted orphans).
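  • The whole story can be reproduced in a few lines.
    The sketch below builds a deliberately negligent
    parent: a shell starts a child that exits at once,
    then execs sleep, which never collects its
    children. The timings and variable names are
    assumptions of the example, and whether the zombie
    disappears at the end depends on the adopting
    process doing its job.

```shell
# The child (true) exits immediately; the parent becomes sleep and
# never waits for it, so the child lingers as a zombie.
sh -c 'true & exec sleep 20' &
parent_pid=$!
sleep 1     # give the child time to exit

# List the negligent parent's children: pid, parent pid, state, command.
zombie_line=$(ps -e -o pid= -o ppid= -o stat= -o comm= |
    awk -v p="$parent_pid" '$2 == p')
zombie_pid=$(echo "$zombie_line" | awk '{print $1}')
echo "while the parent lives: $zombie_line"

# Get rid of the zombie by getting rid of the negligent parent.
kill "$parent_pid"
wait "$parent_pid" 2>/dev/null || true
sleep 1

if ps -p "$zombie_pid" > /dev/null 2>&1; then
    echo "process $zombie_pid is still listed"
else
    echo "process $zombie_pid is gone"
fi
```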

86
The 5 Process States
  • The previous section discussed how processes are
    started, and how they die. While processes are
    alive they are always in one of five process
    states, which affect how and when they are
    allowed to have access to the CPU. The following
    lists each of the five states, along with the
    conventional letter that is used by the ps, top,
    and other commands to identify a process's
    current state.
  • Runnable (R)
  • Processes in the Runnable state are processes
    that, if given the opportunity to access the CPU,
    would take it. More formally, this is known as the
    Running state, but because only one process may
    be executing on a given CPU at any given time,
    only one of these processes per CPU will actually
    be "running" at any given instant. Because runnable processes
    are switched in and out of the CPU so quickly,
    however, the Linux system gives the appearance
    that all of the processes are running
    simultaneously.

87
The 5 Process States
  • Voluntary (Interruptible) Sleep (S)
  • As the name implies, a process which is in a
    voluntary sleep elected to be there. Usually,
    this is a process that has nothing to do until
    something interesting happens. A classic example
    is a networking daemon, such as the httpd process
    that implements a web server. In between requests
    from a client (web browser), the server has
    nothing to do, and elects to go to sleep. Another
    example would be the top command, which lists
    processes every five seconds. While it is waiting
    for five seconds to pass, it drops itself into a
    voluntary sleep. When something that the process
    is interested in happens (such as a web client
    making a request, or a five second timer expiring),
    the sleeping process is kicked back into the
    Runnable state.
  • Involuntary (Non-interruptible) Sleep (D)
  • Occasionally, two processes try to
    access the same system resource at the same time.
    For example, one process attempts to read from a
    block on a disk while that block is being written
    to because of another process. In these
    situations, the kernel forces the process into an
    involuntary sleep. The process did not elect to
    sleep, it would prefer to be runnable so it can
    get things done. When the resource is freed, the
    kernel will put the process back into the
    runnable state.
  • Although processes are constantly dropping into
    and out of involuntary sleeps, they usually do
    not stay there long. As a result, users do not
    usually witness processes in an involuntary sleep
    except on busy systems.

88
The 5 Process States
  • Stopped (Suspended) Processes (T)
  • Occasionally, users decide to
    suspend processes. Suspended processes will not
    perform any actions until they are restarted by
    the user. In the bash shell, the CTRL+Z key
    sequence can be used to suspend a process. In
    programming, debuggers often suspend the programs
    they are debugging when certain events happen
    (such as reaching a breakpoint).
  • Zombie Processes (Z)
  • As mentioned above, every dying process goes
    through a transient zombie state. Occasionally,
    however, some get stuck there. Zombie processes
    have finished executing, and have freed all of
    their memory and almost all of their resources.
    Because they are not consuming any resources,
    they are little more than an annoyance that can
    show up in process listings.

89
Viewing Process States
  • When viewing the output of commands such as ps
    and top, process states are usually listed under
    the heading STAT. The process is identified by
    one of the following letters.
  • Runnable - R
  • Sleeping - S
  • Stopped - T
  • Uninterruptible sleep - D
  • Zombie - Z
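  • Two of these letters can be observed with a
    throwaway process. In this sketch a background sleep
    is first seen sleeping, then suspended with the STOP
    signal (the scripted counterpart of CTRL+Z) and seen
    stopped. The one second pauses are an assumption to
    let each state settle.

```shell
sleep 60 &
job_pid=$!
sleep 1

# A process waiting on a timer is in a voluntary sleep.
state_sleeping=$(ps -o stat= -p "$job_pid" | tr -d ' ')
echo "while waiting: $state_sleeping"

# Suspend it, then look again.
kill -STOP "$job_pid"
sleep 1
state_stopped=$(ps -o stat= -p "$job_pid" | tr -d ' ')
echo "after STOP:    $state_stopped"

# Resume the process and then terminate it.
kill -CONT "$job_pid"
kill "$job_pid"
wait "$job_pid" 2>/dev/null || true
```

    The STAT column may carry extra characters after the
    first letter; the first letter is the state.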

90
Examples: Chapter 2.  Process States
  • Identifying Process States

91
Questions: Chapter 2.  Process States
  • 1, 2, and 4

92
Chapter 4.  Sending Signals
  • Key Concepts
  • Signals are a low level form of inter-process
    communication, which arise from a variety of
    sources, including the kernel, the terminal, and
    other processes.
  • Signals are distinguished by signal numbers,
    which have conventional symbolic names and uses.
    The symbolic names for signal numbers can be
    listed with the kill -l command.
  • The kill command sends signals to other
    processes.
  • Upon receiving a signal, a process may either
    ignore it, react in a kernel specified default
    manner, or implement a custom signal handler.
  • Conventionally, signal number 15 (SIGTERM) is
    used to request the termination of a process.
  • Signal number 9 (SIGKILL) terminates a process,
    and cannot be overridden.
  • The pkill and killall commands can be used to
    deliver signals to processes specified by command
    name, or the user who owns them.
  • Other utilities, such as top and the GNOME System
    Monitor can be used to deliver signals as well.

93
Signals
  • Linux (and Unix) uses signals to notify processes