Title: Workbook 8, and 9
1Workbook 8, and 9
Pace Center for Business and Technology
2String Processing Tools
- Key Concepts
- The wc command counts the number of characters,
words, and lines in a file. When applied to
structured data, the wc command can become a
versatile counting tool.
- The cat command has options that allow
representation of nonprinting characters such as
NEWLINE.
- The head and tail commands have options that
allow you to print only a certain number of lines
or a certain number of bytes (one byte usually
correlates to one character) from a file.
3Revisiting cat, head, and tail
- Revisiting cat
- We have been using the cat command to simply
display the contents of files. Usually, the cat
command generates a faithful copy of its input,
without performing any edits or conversions. When
called with one of the following command line
switches, however, the cat command will indicate
the presence of tabs, line feeds, and other control
sequences, using the following conventions.
- Using the -A command line switch, the whitespace
structure of the file becomes evident, as tabs
are replaced with ^I, and line feeds are
decorated with $. E.g. cat -A /etc/hosts
4Revisiting head and tail
- For example, the following file contains a list
of four musicians.
- Linux (and Unix) text files generally adhere to a
convention that the last character of the file
must be a line feed for the last line of text.
Following the cat of the file musicians.mac,
which does not contain any conventional Linux
line feed characters, the bash prompt is not
displayed in its usual location.
5Revisiting head and tail
6The wc (Word Count) Command
- When used without any command line switches, wc
will report on the number of characters, lines,
and words. Command line switches can be combined
to return any combination of character count,
line count or word count.
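The switches can be tried on a small file. This is a minimal sketch: the file name and its contents are made up for illustration. Note that -c counts bytes, which equals the character count only for single-byte text.

```shell
# Build a small sample file (hypothetical content) in a temporary directory.
tmpdir=$(mktemp -d)
printf 'one two three\nfour five\n' > "$tmpdir/sample.txt"

# No switches: line, word, and byte counts, followed by the file name.
wc "$tmpdir/sample.txt"

# One count at a time; reading from standard input omits the file name.
lines=$(wc -l < "$tmpdir/sample.txt")
words=$(wc -w < "$tmpdir/sample.txt")
chars=$(wc -c < "$tmpdir/sample.txt")
echo "lines=$lines words=$words chars=$chars"

rm -rf "$tmpdir"
```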
7How To Recognize A Real Character
- Text files are composed using an alphabet of
characters. Some characters are visible, such as
numbers and letters. Some characters are used for
horizontal distance, such as spaces and TAB
characters. Some characters are used for vertical
movement, such as carriage returns and line
feeds.
- A line in a text file is a series of any
character other than a NEWLINE (line feed)
character and then a NEWLINE character.
Additional lines in the file immediately follow
the first line.
- While a computer represents characters as
numbers, the exact value used for each symbol
varies depending on which alphabet has been
chosen. The most common alphabet for English
speakers is ASCII; Latin-1 (ISO 8859-1) is a
common 8-bit extension of it.
Different human languages are represented by
different computer encoding rules, so the exact
numeric value for a given character depends on
the human language being recorded.
8So, What Is A Word?
- A word is a group of printing characters, such as
letters and digits, surrounded by white space,
such as space characters or horizontal TAB
characters.
- Notice that our definition of a word does not
include any notion of meaning. Only the form of
the word is important, not its semantics. As far
as Linux is concerned, a line such as
9Chapter 2. Finding Text grep
- Key Concepts
- grep is a command that prints lines that match a
specified text string or pattern.
- grep is commonly used as a filter to reduce
output to only desired items.
- grep -r will recursively grep files underneath a
given directory.
- grep -v prints lines that do NOT match a
specified text string or pattern.
- Many other command line switches allow users to
specify grep's output format.
10Searching Text File Contents using grep
- In an earlier Lesson, we saw how the wc program
can be used to count the characters, words and
lines in text files. In this Lesson we introduce
the grep program, a handy tool for searching text
file contents for specific words or character
sequences.
- The name grep comes from the ed editor command
g/re/p: globally search for a regular expression
and print the matching lines. What, you may well
ask, is a regular expression and why on earth
should I want to search for one? We will provide
a more formal definition of regular expressions in
a later Lesson, but for now it is enough to know
that a regular expression is simply a way of
describing a pattern, or template, to match some
sequence of characters. A simple regular
expression would be Hello, which matches exactly
five characters: H, e, two consecutive l
characters, and a final o. More powerful search
patterns are possible and we shall examine them in
the next section.
- The figure below gives the general form of the
grep command line
11Searching Text File Contents using grep
- The following table summarizes some of grep's
more commonly used command line switches. Consult
the grep(1) man page (or invoke grep --help) for
more.
12Show All Occurrences of a String in a File
- Under Linux, there are often several ways of
accomplishing the same task. For example, to see
if a file contains the word even, you could
just visually scan the file.
- Reading the file, we see that the file does
indeed contain the letters even. This method
breaks down on a large file, because we could
easily miss one word in a file of several
thousand, or even several hundred thousand,
words. We can use the grep tool to search through
the file for us automatically.
- Here we searched for a word using its exact
spelling. Instead of just a literal string, the
pattern argument can also be a general template
for matching more complicated character
sequences; we shall explore that in a later
Lesson.
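A literal search of this kind can be sketched as follows. The file name and its contents are hypothetical stand-ins for the file used on the slide. Because grep matches character sequences rather than words, the line containing seven is reported too.

```shell
# Hypothetical sample file standing in for the one shown on the slide.
tmpdir=$(mktemp -d)
printf 'one two three\nseven eight nine\neven numbers are fun\n' > "$tmpdir/rhyme.txt"

# Print every line that contains the character sequence "even".
matches=$(grep even "$tmpdir/rhyme.txt")
printf '%s\n' "$matches"

rm -rf "$tmpdir"
```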
13Searching in Several Files at Once
- An easy way to search several files is just to
name them on the grep command line.
- Perhaps we are more interested in just
discovering which file mentions the word nine
than actually seeing the line itself. Adding the
-l switch to the grep line does just that.
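Both forms can be compared side by side. In this sketch the two file names and their contents are invented for illustration.

```shell
tmpdir=$(mktemp -d)
printf 'one two three\n' > "$tmpdir/a.txt"
printf 'seven eight nine\n' > "$tmpdir/b.txt"

# With several files named, each matching line is prefixed by its file name.
grep nine "$tmpdir/a.txt" "$tmpdir/b.txt"

# -l prints only the names of the files that contain a match.
files=$(grep -l nine "$tmpdir/a.txt" "$tmpdir/b.txt")
echo "$files"

rm -rf "$tmpdir"
```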
14Searching Directories Recursively
- Grep can also search all the files in a whole
directory tree with a single command. This can be
handy when working with a large number of files.
- The easiest way to understand this is to see it
in action. In the directory /etc/sysconfig are
text files that contain much of the configuration
information about a Linux system. The Linux name
for the first Ethernet network device on a system
is eth0, so you can find which file contains
the configuration for eth0 by letting the grep -r
command do the searching for you.
15Searching Directories Recursively
- Every file in /etc/sysconfig that mentions eth0
is shown in the results.
- We can further limit the files listed to only
those referring to an actual device by filtering
the grep -r output through a grep DEVICE.
- This shows a common use of grep as a filter to
simplify the outputs of other commands.
- If only the names of the files were of interest,
the output can be simplified with the -l command
line switch.
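The three steps can be sketched against a scratch directory instead of /etc/sysconfig, so that the commands run on any system. The directory layout and file contents below are assumptions made up for the example.

```shell
# Build a small directory tree (hypothetical stand-in for /etc/sysconfig).
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/conf/net"
printf 'DEVICE=eth0\nONBOOT=yes\n' > "$tmpdir/conf/net/ifcfg-eth0"
printf '# restart eth0 after resume\n' > "$tmpdir/conf/notes"
printf 'HOSTNAME=station\n' > "$tmpdir/conf/network"

# Search every file below the directory for eth0.
grep -r eth0 "$tmpdir/conf"

# Use a second grep as a filter to keep only the DEVICE lines.
device_lines=$(grep -r eth0 "$tmpdir/conf" | grep DEVICE)
echo "$device_lines"

# -l lists only the names of the matching files; count them.
file_count=$(grep -rl eth0 "$tmpdir/conf" | grep -c .)
echo "$file_count"

rm -rf "$tmpdir"
```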
16Inverting grep
- By default, grep shows only the lines matching
the search pattern. Usually, this is what you
want, but sometimes you are interested in the
lines that do not match the pattern. In these
instances, the -v command line switch inverts
grep's operation.
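A short sketch of the inverted search, using a made-up file of account names and shells:

```shell
tmpdir=$(mktemp -d)
printf 'root:/bin/bash\nlp:/sbin/nologin\nmaxwell:/bin/bash\n' > "$tmpdir/shells.txt"

# -v prints only the lines that do NOT contain "bash".
non_bash=$(grep -v bash "$tmpdir/shells.txt")
echo "$non_bash"

rm -rf "$tmpdir"
```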
17Getting Line Numbers
- Often you may be searching a large file that has
many occurrences of the pattern. Grep will list
each line containing one or more matches, but how
is one to locate those lines in the original
file? Using the grep -n command will also list
the line number of each matching line.
- The file /usr/share/dict/words contains a list of
common dictionary words. Identify which line
contains the word dictionary.
- You might also want to combine the -n switch with
the -r switch when searching all the files below
a directory.
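The effect of -n can be seen on a tiny word list. The four-word file below is a hypothetical substitute for /usr/share/dict/words, which is not installed on every system.

```shell
tmpdir=$(mktemp -d)
printf 'apple\nbanana\ndictionary\nzebra\n' > "$tmpdir/words"

# -n prefixes each matching line with its line number and a colon.
numbered=$(grep -n dictionary "$tmpdir/words")
echo "$numbered"    # prints 3:dictionary

rm -rf "$tmpdir"
```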
18Limiting Matching to Whole Words
- Remember the file containing our nursery rhyme
earlier?
- Suppose we wanted to retrieve all lines
containing the word at. If we try the command
- Do you see what happened? We matched the at
string, whether it was an isolated word or part
of a larger word. The grep command provides the
-w switch to imply that the specified pattern
should only match entire words.
- The -w switch considers a sequence of letters,
numbers, and underscore characters, surrounded by
anything else, to be a word.
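The difference can be sketched by counting matching lines with and without -w. The three lines of text are invented for the example, and -c reports the number of matching lines rather than the lines themselves.

```shell
tmpdir=$(mktemp -d)
printf 'the cat sat\nlook at that\nmeet at noon\n' > "$tmpdir/rhyme.txt"

# Without -w, "at" also matches inside cat, sat, and that.
loose=$(grep -c at "$tmpdir/rhyme.txt")

# With -w, only lines where "at" stands alone as a word are counted.
whole=$(grep -cw at "$tmpdir/rhyme.txt")
echo "loose=$loose whole=$whole"

rm -rf "$tmpdir"
```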
19Ignoring Case
- The string Bob has a meaning quite
different from the string bob. However,
sometimes we want to find either one, regardless
of whether the word is capitalized or not. The
grep -i command solves just this problem.
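A brief sketch of the case-insensitive search, using a made-up list of names:

```shell
tmpdir=$(mktemp -d)
printf 'Bob\nbob\nBOB\nalice\n' > "$tmpdir/names.txt"

# Case-sensitive by default: only the all-lowercase line matches.
exact=$(grep -c bob "$tmpdir/names.txt")

# -i ignores case, so Bob, bob, and BOB all match.
any_case=$(grep -ci bob "$tmpdir/names.txt")
echo "exact=$exact any_case=$any_case"

rm -rf "$tmpdir"
```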
20Examples: Finding Simple Character Strings
- Verify that your computer has the system account
lp, used for the line printer tools. Hint: the
file /etc/passwd contains one line for each user
account on the system.
21Chapter 3. Introduction to Regular Expressions
- Key Concepts
- Regular expressions are a standard Unix syntax
for specifying text patterns.
- Regular expressions are understood by many
commands, including grep, sed, vi, and many
scripting languages.
- Within regular expressions, . and [] are used to
match characters.
- Within regular expressions, +, *, and ? specify a
number of consecutive occurrences.
- Within regular expressions, ^ and $ specify the
beginning and end of a line.
- Within regular expressions, (, ), and | specify
alternative groups.
- The regex(7) man page provides complete details.
22Introducing Regular Expressions
- In the previous chapter you saw grep used to
match either a whole word or part of a word. By
itself this is very powerful, especially in
conjunction with arguments like -i and -v, but it
is not appropriate for all search scenarios. Here
are some examples of searches that the grep usage
you've learned so far would not be able to do.
- First, suppose you had a file that looked like
this
23Introducing Regular Expressions
- What if you wanted to pull out just the names of
the people in people_and_pets.txt? A command like
grep -w Name would match the 'Name' line for
each person, but also the 'Name' line for each
person's pet. How could we match only the 'Name'
lines for people? Well, notice that the lines for
pets' names are all indented, meaning that those
lines begin with whitespace characters instead of
text. Thus, we could achieve our goal if we had a
way to say "Show me all lines that begin with
'Name'".
- Another example: suppose you and a friend both
witnessed a hit-and-run car accident. You both
got a look at the fleeing car's license plate and
yet each of you recalls a slightly different
number. You read the license number as "4I35VBB"
but your friend read it as "413SV88". It seems
that what you read as an 'I' in the second
character, your friend read as a '1'. Similar
differences appear in your interpretations of
other parts of the license like '5' vs 'S' and
'BB' vs '88'. The police, having taken both of
your statements, now need to narrow down the
suspects by querying their database of license
plates for plates that might match what you saw.
24Introducing Regular Expressions
- One solution might be to do separate queries for
"4I35VBB" and "413SV88" but doing so assumes that
one of you is exactly right. What if the
perpetrator's license number was actually
"4135VB8"? In other words, what if you were right
about some of the characters in question but your
friend was right about others? It would be more
effective if the police could query for a pattern
that effectively said "Show me all license
numbers that begin with a '4', followed by an 'I'
or a '1', followed by a '3', followed by a '5' or
an 'S', followed by a 'V', followed by two
characters that are each either a 'B' or an '8'".
- Query scenarios like these can be solved using
regular expressions. While computer scientists
sometimes use the term "regular expression" (or
"regex" for short) to describe any method of
describing complex patterns, in Linux and many
programming languages the term refers to a very
specific set of special characters used for
solving problems like the above. Regular
expressions are supported by a large number of
tools including grep, vi, find and sed.
25Introducing Regular Expressions
- To introduce the usage of regular expressions,
let's look at some solutions to the two problems
introduced earlier. Don't worry if these seem a
bit complicated; the remainder of the unit will
start from scratch and cover regular expressions
in great detail.
- A regex that could solve the first problem, where
we wanted to say "Show me all lines that begin
with 'Name'" might look like this: '^Name'
- ...that's it! Regular expressions are all about
the use of special characters, called
metacharacters, to represent advanced query
parameters. The caret ("^"), as shown here, means
"Lines that begin with...". Note, by the way,
that the regular expression was put in
single-quotes. This is a good habit to get into
early on as it prevents bash from interpreting
special characters that were meant for grep.
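A sketch of the anchored search follows. The layout of people_and_pets.txt is an assumption, since the original file listing is not reproduced here: each person's Name line starts at the left margin and each pet's Name line is indented.

```shell
# Hypothetical layout: pet entries are indented beneath their owner.
tmpdir=$(mktemp -d)
printf 'Name: Joe\n    Name: Rex\nName: Ann\n    Name: Tom\n' > "$tmpdir/people_and_pets.txt"

# The caret anchors the match to the start of the line, so the
# indented (pet) lines are skipped. Single quotes protect ^ from bash.
people=$(grep '^Name' "$tmpdir/people_and_pets.txt")
printf '%s\n' "$people"

rm -rf "$tmpdir"
```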
26Introducing Regular Expressions
- Ok, so what about the second problem? That one
involved a much more complicated query: "Show me
all license numbers that begin with a '4',
followed by an 'I' or a '1', followed by a '3',
followed by a '5' or an 'S', followed by a 'V',
followed by two characters that are each either a
'B' or an '8'". This could be represented by a
regular expression that looks like this:
'4[I1]3[5S]V[B8]{2}'
- Wow, that's pretty short considering how long it
took to write out what we were looking for! There
are only two types of regex metacharacters used
here: square braces ('[]') and curly braces
('{}'). When two or more characters are shown
within square braces it means "any one of these".
So '[B8]' near the end of the expression means
"'B' or '8'". When a number is shown within curly
braces it means "this many of the preceding
character". Thus, '[B8]{2}' means "two characters
that are each either a 'B' or an '8'". Pretty
powerful stuff!
- Now that you've gotten a taste of what regular
expressions are and how they can be used, let's
start from scratch and cover them in depth.
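The license plate query can be tried against a small list of plates. The file and its five entries are invented for illustration, and grep -E is used so that the curly braces are read with the extended syntax.

```shell
# Hypothetical plate database: three plausible suspects and two others.
tmpdir=$(mktemp -d)
printf '4I35VBB\n413SV88\n4135VB8\n4I35VBC\n5I35VBB\n' > "$tmpdir/plates.txt"

# 4, then I or 1, then 3, then 5 or S, then V, then two of B or 8.
suspects=$(grep -E '4[I1]3[5S]V[B8]{2}' "$tmpdir/plates.txt")
printf '%s\n' "$suspects"

rm -rf "$tmpdir"
```

The plate 4135VB8, which neither witness reported exactly, is among the matches.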
27Regular Expressions, Extended Regular
Expressions, and the grep Command
- As the Unix implementation of regular expression
syntax has evolved, new metacharacters have been
introduced. In order to preserve backward
compatibility, commands usually choose to
implement either basic regular expressions or
extended regular expressions. In order to not
become bogged down with the differences, this
Lesson will introduce the extended syntax,
summarizing differences at the end of the
discussion.
- One of the most common uses for regular
expressions is specifying search patterns for the
grep command. As was mentioned in the previous
Lesson, there are three versions of the grep
command. Reiterating, the three differ in how
they interpret regular expressions.
28Regular Expressions, Extended Regular
Expressions, and the grep Command
- fgrep
- The fgrep command is designed to be a "fast"
grep. The fgrep command does not support regular
expressions, but instead interprets every
character in the specified search pattern
literally.
- grep
- The grep command interprets each pattern using
the original, basic regular expression syntax.
- egrep
- The egrep command interprets each pattern using
extended regular expression syntax.
- Because we are not yet making a distinction
between the basic and extended regular expression
syntax, the egrep command should be used whenever
the search pattern contains regular expressions.
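The three interpretations can be compared on one pattern. This sketch uses grep -F and grep -E, the option forms equivalent to fgrep and egrep. Recent GNU grep releases print a warning that the fgrep and egrep names are obsolescent. The sample file is made up for the example.

```shell
tmpdir=$(mktemp -d)
printf 'a+b\naab\nab\n' > "$tmpdir/sample.txt"

# Fixed strings (fgrep): every pattern character is literal.
fixed=$(grep -F 'a+b' "$tmpdir/sample.txt")

# Basic syntax (grep): an unescaped + is still a literal plus sign.
basic=$(grep 'a+b' "$tmpdir/sample.txt")

# Extended syntax (egrep): + means one or more of the preceding a.
extended=$(grep -E 'a+b' "$tmpdir/sample.txt")

printf 'fixed:\n%s\nbasic:\n%s\nextended:\n%s\n' "$fixed" "$basic" "$extended"
rm -rf "$tmpdir"
```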
29Anatomy of a Regular Expression
- In our discussion of the grep program family, we
were introduced to the idea of using a pattern to
identify the file content of interest. Our
examples were carefully constructed so that the
pattern contained exactly the text for which we
were searching. We were careful to use only
literal characters in our regular expressions; a
literal character matches only itself. So when we
used hello as the regular expression, we were
using a five-character regular expression
composed only of literal characters. While this
let us concentrate on learning how to operate the
grep program, it didn't allow us to get a full
appreciation of the power of regular expressions.
Before we see regular expressions in use, we
shall first see how they are constructed.
30Anatomy of a Regular Expression
- A regular expression is a sequence of:
- Literal Characters: Literal characters match only
themselves. Examples of literals are letters,
digits and most special characters (see below for
the exceptions).
- Wildcards: Wildcard characters match any
character. Within a regular expression, a period
(.) matches any character, be it a space, a
letter, a digit, punctuation, anything.
- Modifiers: A modifier alters the meaning of the
immediately preceding pattern character. For
example, the expression ab*c matches the
strings ac, abc, abbc, abbbc, and so on,
because the asterisk (*) is a modifier that
means any number of (including zero). Thus, our
pattern means to match any sequence of characters
consisting of one a, a (possibly empty) series
of b characters, and a final c character.
- Anchors: Anchors establish the context for the
pattern, such as "the beginning of a line", or
"the end of a word". For example, the expression
cat would match any occurrence of the three
letters, while ^cat would only match lines that
begin with cat.
31Taking Literals Literally
- Literals are straightforward because each literal
character in a regular expression matches one,
and only one, copy of itself in the searched
text. Uppercase characters are distinct from
lowercase characters, so that A does not match
a.
- Wildcards
- The "dot" wildcard
- The character . is used as a placeholder, to
match one of any character. In the following
example, the pattern matches any occurrence of
the literal characters x and s, separated by
exactly two other characters.
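A pattern of that shape is x..s. The sketch below runs it against a short made-up word list rather than the original example's input.

```shell
tmpdir=$(mktemp -d)
printf 'exits\nboxers\naxis\nboxes\n' > "$tmpdir/words"

# x, then exactly two characters of any kind, then s.
# axis and boxes have only one character between x and s.
dot_matches=$(grep 'x..s' "$tmpdir/words")
printf '%s\n' "$dot_matches"

rm -rf "$tmpdir"
```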
32Bracket Expressions Ranges of Literal
Characters
- Normally a literal character in a regex pattern
matches exactly one occurrence of itself in the
searched text. Suppose we want to search for the
string hello regardless of how it is
capitalized: we want to match Hello and HeLLo
as well. How might we do that?
- A regex feature called a bracket expression
solves this problem neatly. A bracket expression
is a range of literals enclosed in square
brackets ([ and ]). For example, the regex
pattern [Hh] is a character range that matches
exactly one character: either an uppercase H or
a lowercase h letter. Notice that it doesn't
matter how large the set of characters within the
range is; the set matches exactly one character,
if it matches any at all. A bracket expression
that matches the set of lowercase vowels could be
written [aeiou] and would match exactly one
vowel.
- In the following example, bracket expressions are
used to find words from the file
/usr/share/dict/words. In the first case, the
first five words that contain three consecutive
(lowercase) vowels are printed. In the second
case, the first five words that contain lowercase
letters in the pattern of
vowel-consonant-vowel-consonant-vowel-consonant
are printed.
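Bracket expressions of this kind can be sketched on a small made-up word list. The list is an assumption standing in for /usr/share/dict/words.

```shell
tmpdir=$(mktemp -d)
printf 'hello\nHello\nHELLO\nqueue\nbanana\nsky\n' > "$tmpdir/words"

# [Hh] matches exactly one character: H or h.
either_h=$(grep -c '[Hh]ello' "$tmpdir/words")

# Three bracket expressions in a row: three consecutive lowercase vowels.
three_vowels=$(grep '[aeiou][aeiou][aeiou]' "$tmpdir/words")
echo "either_h=$either_h three_vowels=$three_vowels"

rm -rf "$tmpdir"
```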
33Bracket Expressions Ranges of Literal
Characters
- If the first character of a bracket expression is
a ^, the interpretation is inverted, and the
bracket expression will match any single
occurrence of a character not included in the
range. For example, the expression [^aeiou]
would match any character that is not a vowel.
The following example first lists words which
contain three consecutive vowels, and secondly
lists words which contain three consecutive
consonant-vowel pairs.
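A sketch of the inverted form, again on an invented word list. Keep in mind that [^aeiou] matches any character that is not a lowercase vowel, including digits, spaces, and uppercase letters, not only consonants.

```shell
tmpdir=$(mktemp -d)
printf 'banana\nqueue\nrhythm\ntomato\n' > "$tmpdir/words"

# Three consecutive (non-vowel, vowel) pairs.
pairs=$(grep '[^aeiou][aeiou][^aeiou][aeiou][^aeiou][aeiou]' "$tmpdir/words")
printf '%s\n' "$pairs"

rm -rf "$tmpdir"
```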
34Range Expressions vs. Character Classes Old
School and New School
- Another way to express a character range is by
giving the start- and end-letters of the sequence;
this way [a-d] would match any character from
the set a, b, c or d. A typical usage of this
form would be [0-9] to represent any single
digit, or [A-Z] to represent all capital
letters.
35Range Expressions vs. Character Classes Old
School and New School
- As an alternative to such quandaries, modern
regular expressions make use of character classes.
Character classes match any single character,
using language specific conventions to decide if
a given character is uppercase or lowercase, or
if it should be considered part of the alphabet
or punctuation. The following table lists some
supported character classes, and the ASCII
equivalent range expression, where appropriate.
36Range Expressions vs. Character Classes Old
School and New School
- Character classes avoid problems you may run into
when using regular expressions on systems that
use different character encoding schemes where
letters are ordered differently. For example,
suppose you were to search a file with the range
expression [A-Z].
- On a Red Hat Enterprise Linux system, this would
match every word in the file, not just those that
contain capital letters as one might assume. This
is because in Unicode (UTF-8), the character
encoding scheme that RHEL uses, characters are
alphabetized case-insensitively, so that [A-Z] is
equivalent to AaBbCc...etc.
37Range Expressions vs. Character Classes Old
School and New School
- On older systems, though, a different character
encoding scheme is used where alphabetization is
done case-sensitively. On such systems [A-Z]
would be equivalent to ABC...etc. Character
classes avoid this pitfall. You can run a search
with the character class [[:upper:]]
on any system regardless of the encoding scheme
being used and it will only match lines that
contain capital letters.
- For more details about the predefined range
expressions, consult the grep manual page. For
more information on character encoding schemes
under Linux, refer back to chapter 8.3. To learn
about how character encoding schemes are used to
support other languages in Red Hat Enterprise
Linux, begin with the locale manual page.
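The POSIX character class for capital letters is written [:upper:] and is used inside a bracket expression, giving [[:upper:]]. A minimal sketch on an invented word list follows. The second command forces the C locale, where [A-Z] is plain ASCII order, purely for comparison.

```shell
tmpdir=$(mktemp -d)
printf 'alpha\nBeta\ngamma\nDELTA\n' > "$tmpdir/words"

# Character class: matches lines containing an uppercase letter.
upper=$(grep '[[:upper:]]' "$tmpdir/words")
printf '%s\n' "$upper"

# Range expression, evaluated in the C locale so that A-Z means ASCII A..Z.
c_range=$(LC_ALL=C grep -c '[A-Z]' "$tmpdir/words")
echo "$c_range"

rm -rf "$tmpdir"
```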
38Common Modifier Characters
- We saw a common usage of a regex modifier in our
earlier example ab*c to match an a and c
character with some number of b letters in
between. The * character changed the
interpretation of the literal b character from
matching exactly one letter to matching any
number of b's.
- Here is a list of some common modifier
characters:
- b? The question mark (?) means either one or
none: the literal character is considered to be
optional in the searched text. For example, the
regex pattern ab?c matches the strings ac
and abc, but not abbc.
- b* The asterisk (*) modifier means any number
(including zero) of the preceding literal
character. The regex pattern ab*c matches the
strings ac, abc, abbc, and so on.
39Common Modifier Characters
- b+ The plus (+) modifier means one or more,
so the regex pattern b+ matches a non-empty
sequence of b's. The regex pattern ab+c matches
the strings abc and abbc, but does not match
ac.
- b{m,n} The brace modifier is used to specify a
range of between m and n occurrences of the
preceding character. The regex pattern ab{2,4}c
would match abbc, abbbc, and abbbbc, but
not abc or abbbbbc.
- b{n} With only one integer, the brace modifier is
used to specify exactly n occurrences of the
preceding character.
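The four modifiers can be compared by counting how many lines of a small test file each pattern matches. The file contents are made up for the example, and grep -E selects the extended syntax.

```shell
tmpdir=$(mktemp -d)
printf 'ac\nabc\nabbc\nabbbbbc\n' > "$tmpdir/strings"

optional=$(grep -Ec 'ab?c' "$tmpdir/strings")     # zero or one b
star=$(grep -Ec 'ab*c' "$tmpdir/strings")         # any number of b's
plus=$(grep -Ec 'ab+c' "$tmpdir/strings")         # one or more b's
range=$(grep -Ec 'ab{2,4}c' "$tmpdir/strings")    # two to four b's
echo "optional=$optional star=$star plus=$plus range=$range"

rm -rf "$tmpdir"
```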
40Common Modifier Characters
- In the following example, egrep prints lines from
/usr/share/dict/words that contain patterns which
start with a (capital or lowercase) a, might or
might not next have a (lowercase) b, but then
definitely follow with a (lowercase) a.
- The following example prints lines which contain
patterns which start al, then use the .*
wildcard to specify 0 or more occurrences of any
character, followed by the pattern bra.
41Common Modifier Characters
- Notice we found variations on the words algebra
and calibrate. For the former, the .* expression
matched ge, while for the latter, it matched
the letter i.
- The expression .*, which is interpreted as "0
or more of any character", shows up often in
regex patterns, acting as the "stretchable glue"
between two patterns of significance.
- As a subtlety, we should note that the modifier
characters are greedy: they always match the
longest possible input string. For example, given
the regex pattern
42Anchored Searches
- Four additional search modifier characters are
available:
- ^foo A caret (^) matches the beginning of a
line. Our example ^foo matches the string foo
only when it is at the beginning of a line.
- foo$ A dollar sign ($) matches the end of a
line. Our example foo$ matches the string foo
only at the end of a line, immediately before the
newline character.
- \<foo\> By themselves, the less than sign (<)
and the greater than sign (>) are literals.
Using the backslash character to escape them
transforms them into meaning first of a word
and end of a word, respectively. Thus the
pattern \<cat\> matches the word cat but not
the word catalog.
- You will frequently see both ^ and $ used
together. The regex pattern ^foo$ matches a
whole line that contains only foo and would not
match that line if it contained any spaces.
- The \< and \> are also usually used as pairs.
43Anchored Searches
- In the following example, the first search
lists all lines that contain the letters ion
anywhere on the line. The second search only
lists lines which end in ion.
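The line anchors can be sketched on a short invented word list by counting the lines each pattern matches.

```shell
tmpdir=$(mktemp -d)
printf 'ion\nnation\nionic\ncation exchange\n' > "$tmpdir/words"

anywhere=$(grep -c 'ion' "$tmpdir/words")     # ion anywhere on the line
at_end=$(grep -c 'ion$' "$tmpdir/words")      # line ends in ion
at_start=$(grep -c '^ion' "$tmpdir/words")    # line begins with ion
whole=$(grep -c '^ion$' "$tmpdir/words")      # line is exactly ion
echo "anywhere=$anywhere at_end=$at_end at_start=$at_start whole=$whole"

rm -rf "$tmpdir"
```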
44Coming to Terms with Regex Grouping
- The same way that you can use parentheses to
group terms within a mathematical expression, you
also use parentheses to collect regular
expression pattern specifiers into groups. This
lets the modifier characters ?, *, and +
apply to groups of regex specifiers instead of
only the immediately preceding specifier.
- Suppose we need a regular expression to match
either foo or foobar. We could write the
regex as foo(bar)? and get the desired results.
This lets the ? modifier apply to the whole
string bar instead of only the preceding r
character.
- Grouping regex specifiers using parentheses
becomes even more flexible when the pipe symbol
(|) is used to separate alternative patterns.
Using alternatives, we could rewrite our previous
example as (foo|foobar). Writing this as
foo|foobar is simpler and works just as well,
because just like mathematics, regex specifiers
have precedence. While you are learning, always
enclose your groups in parentheses.
45Coming to Terms with Regex Grouping
- In the following example, the first search prints
all lines from the file /usr/share/dict/words
which contain four consecutive vowels (compare
the syntax to that used when first introducing
range expressions, above). The second search
finds words that contain a double o or a double
e, followed (somewhere) by a double e.
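Grouping and alternatives can be sketched on a small made-up word list. The patterns below are written from the descriptions above and are not copied from the original example. The ^ and $ anchors in the first two are added here so that only whole-line matches are counted.

```shell
tmpdir=$(mktemp -d)
printf 'foo\nfoobar\nfoobaz\nqueueing\nbookkeeper\ngreen\n' > "$tmpdir/words"

# ? applies to the whole group (bar), not just to the letter r.
optional_group=$(grep -Ec '^foo(bar)?$' "$tmpdir/words")

# The same set of lines expressed with alternatives.
alternatives=$(grep -Ec '^(foo|foobar)$' "$tmpdir/words")

# A double o or a double e, followed somewhere later by a double e.
doubles=$(grep -E '(oo|ee).*ee' "$tmpdir/words")
echo "optional_group=$optional_group alternatives=$alternatives doubles=$doubles"

rm -rf "$tmpdir"
```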
46Escaping Meta-Characters
- Sometimes you need to match a character that
would ordinarily be interpreted as a regular
expression wildcard or modifier character. To
temporarily disable the special meaning of these
characters, simply escape them using the
backslash (\) character. For example, the regex
pattern cat. would match the letters cat
followed by any character, as in cats or catchup.
To match only the letters cat. at the end of a
sentence, use the regex pattern cat\. to
disable interpreting the period as a wildcard
character.
- Note one distracting exception to this rule. When
the backslash character precedes a < or >
character, it enables the special interpretation
(anchoring the beginning or ending of a word)
instead of disabling the special interpretation.
Shudder. It even gets worse - see the footnote at
the bottom of the following table.
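The effect of escaping the period can be sketched on three invented lines of text.

```shell
tmpdir=$(mktemp -d)
printf 'I like my cat.\ncats and dogs\ncatchup\n' > "$tmpdir/text"

# Unescaped: the period matches any character after cat.
wildcard=$(grep -c 'cat.' "$tmpdir/text")

# Escaped: only a literal period after cat matches.
literal=$(grep 'cat\.' "$tmpdir/text")
echo "wildcard=$wildcard literal=$literal"

rm -rf "$tmpdir"
```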
47Summary of Linux Regular Expression Syntax
- The following table summarizes regular expression
syntax, and identifies which components are found
in basic regular expression syntax, and which are
found only in the extended regular expression
syntax.
49Regular Expressions are NOT File Globbing
- When first encountering regular expressions,
students understandably confuse regular
expressions with pathname expansion (file
globbing). Both are used to match patterns in
text. Both share similar metacharacters (*,
?, [...], etc.). However, they are
distinctly different. The following table
compares and contrasts regular expressions and
file globbing.
50Regular Expressions are NOT File Globbing
- In the following example, the first argument is a
regular expression, specifying text which starts
with an l and ends .conf, while the second
argument is a file glob which specifies all files
in the /etc directory whose filename starts with
l and ends .conf.
- Take a close look at the second line of output.
Why was it matched by the specified regular
expression?
- Why does the line containing the text krb5.conf
match the expression? The l is found way back
in the word default!
- In a similar vein, when specifying regular
expressions on the bash command line, care must
be taken to quote or escape the regex
meta-characters, lest they be expanded away by
the bash shell with unexpected results. In all of
the examples found in this discussion, the first
argument to the egrep command is protected with
single quotes for just this reason.
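The contrast can be sketched in a scratch directory. The file names and the text being searched are assumptions chosen to mirror the krb5.conf case described above. The glob is expanded by the shell against file names, while the quoted regular expression is applied by grep to each line of text.

```shell
tmpdir=$(mktemp -d)
touch "$tmpdir/ld.so.conf" "$tmpdir/lftp.conf" "$tmpdir/krb5.conf"

# File glob: the shell expands l*.conf to names that START with l.
glob_count=$(ls "$tmpdir"/l*.conf | grep -c .)

# Regex: an l ANYWHERE on the line, later followed by .conf, is enough,
# so the krb5.conf line matches because of the l in "default".
printf 'default = /etc/krb5.conf\npath = /etc/ld.so.conf\nplain text\n' > "$tmpdir/listing.txt"
regex_count=$(grep -Ec 'l.*\.conf' "$tmpdir/listing.txt")
echo "glob_count=$glob_count regex_count=$regex_count"

rm -rf "$tmpdir"
```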
51Where to Find More Information About Regular
Expressions
- We have barely scratched the surface of the
usefulness of regular expressions. The
explanation we have provided will be adequate for
your daily needs, but even so, regular
expressions offer much more power, making even
complicated text searches simple to perform.
- For more online information about regular
expressions, you should check:
- The regex(7) manual page.
- The grep(1) manual page.
52Examples
- Regular Expression Modifiers
53Workbook 9: Managing Processes
Pace Center for Business and Technology
54Chapter 1. An Introduction to Processes
- Key Concepts
- A process is an instance of a running executable,
identified by a process id (pid).
- Because Linux implements virtual memory, every
process possesses its own distinct memory
context.
- A process has a uid and a collection of gids as
credentials.
- A process has a filesystem context, including a
cwd, a umask, a root directory, and a collection
of open files.
- A process has a scheduling context, including a
niceness value.
- A process has a collection of environment
variables.
- The ps command can be used to examine all
currently running processes.
- The top command can be used to monitor all
running processes.
55Processes are How Things Get Done
- Almost anything that happens in a Linux system,
happens as a process. If you are viewing this
text in a web browser, that browser is running as
a process. If you are typing at a bash shell's
command line, that shell is running as a process.
If you are using the chmod command to change a
file's permissions, the chmod command operates as
a separate process. Processes are how things get
done, and the primary responsibility of the Linux
kernel is to provide a place for processes to do
their stuff without stepping on each other's
toes.
- A process is an instance of an executing
program. In other operating systems, programs are
often large, elaborate, graphical applications
that take a noticeably long time to start up. In
the Linux (and Unix) world, these types of
programs exist as well, but so do a whole class
of programs which usually have no counterpart in
other operating systems. These programs are
designed to be quick to start, specialized in
function, and play well with others. On a Linux
system, processes running these programs are
constantly popping into and out of existence.
56Processes are How Things Get Done
- For example, consider the user maxwell performing
the following command line.
- In the split second that the command line took to
execute, no fewer than four processes (ps, grep,
bash, and date) were started, did their thing,
and exited.
57What is a Process?
- By this point, you could well be tired of hearing
the answer: a process is an instance of a running
program. Here, however, we provide a more
detailed list of the components that constitute a
process.
- Execution Context
- Every process exists (at least to some extent)
within the physical memory of the machine.
Because Linux (and Unix) is designed to be a
multiuser environment, the memory allocated to a
process is protected, and no other process can
access it. In its memory, a process loads a copy
of its executable instructions, and stores any
other dynamic information it is managing. A
process also carries parameters associated with
how often it gets the opportunity to access the
CPU, such as its execution state and its niceness
value (more on these soon).
58What is a Process?
- I/O Context
- Every process interacts to some extent with the
filesystem in order to read or write information
that exists before or will exist after the
lifespan of the process. Elements of a process's
input/output context include the following.
- Open File Descriptors
- Almost every process is reading information from
or writing information to external sources,
usually both. In Linux, open file descriptors act
as sources or sinks of information. Processes
read information from or write information to
file descriptors, which may be connected to
regular files, device nodes, network sockets, or
even each other as pipes (allowing interprocess
communication).
- Memory Mapped Files
- Memory mapped files are files whose contents have
been mapped directly into the process's memory.
Rather than reading or writing to a file
descriptor, the process just accesses the
appropriate memory address. Memory maps are most
often used to load a process's executable code,
but may also be used for other types of
non-sequential access to data.
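The two I/O mechanisms above can be sketched in a few lines of Python, whose os and mmap modules wrap the underlying system calls. This is only an illustration: the function names pipe_round_trip and read_via_mmap are hypothetical, the pipe example assumes a message small enough to fit in the kernel's pipe buffer, and the memory map example assumes non-empty data.

```python
import mmap
import os
import tempfile


def pipe_round_trip(message):
    """Write bytes into one file descriptor (a sink) and read them back
    from another (a source) connected to it by a pipe."""
    read_fd, write_fd = os.pipe()
    try:
        os.write(write_fd, message)   # assumes message fits in the pipe buffer
        os.close(write_fd)            # closing the sink lets the reader see end-of-file
        write_fd = -1
        chunks = []
        while True:
            chunk = os.read(read_fd, 4096)
            if not chunk:
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        os.close(read_fd)
        if write_fd != -1:
            os.close(write_fd)


def read_via_mmap(data, offset, length):
    """Store data in a temporary file, map the file into memory, and
    fetch a slice by address-style indexing instead of read() calls."""
    with tempfile.TemporaryFile() as f:
        f.write(data)
        f.flush()
        # Length 0 maps the whole file; an empty file cannot be mapped.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            return mapped[offset:offset + length]


if __name__ == "__main__":
    print(pipe_round_trip(b"hello through a pipe"))
    print(read_via_mmap(b"0123456789", 3, 4))
```

In the pipe case both descriptors belong to the same process for simplicity; between two processes, one side would hold the read end and the other the write end.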
59What is a Process?
- Filesystem Context
- We have encountered several pieces of information
related to the filesystem that processes
maintain, such as the process's current working
directory (for translating relative file
references) and the process's umask (for setting
permissions on newly created files).
- Environment Variables
- Every process maintains its own list of
name-value pairs, referred to as environment
variables, or collectively as the process's
environment. Processes generally inherit their
environment on startup, and may refer to it for
information such as the user's preferred language
or favorite editor.
- Heritage Information
- Every process is identified by a PID, or process
id, which it is assigned when it is created. In a
later Lesson, we will discover that every process
has a clearly defined parent and possibly well
defined children. A process's own identity, the
identity of its children, and to some extent the
identity of its siblings are maintained by the
process.
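As a rough illustration of the filesystem context, environment, and heritage information described above, a process can inspect these values for itself. The sketch below uses Python's os module; the function name describe_context is hypothetical, and EDITOR and LANG are simply two commonly set environment variables that may well be absent on a given system.

```python
import os


def describe_context():
    """Collect a few pieces of per-process context for the current process."""
    # The umask can only be read by setting it, so set it and restore it.
    current_umask = os.umask(0)
    os.umask(current_umask)
    return {
        "pid": os.getpid(),                  # this process's own identity
        "ppid": os.getppid(),                # the identity of its parent
        "cwd": os.getcwd(),                  # used to resolve relative file references
        "umask": current_umask,              # applied to newly created files
        "editor": os.environ.get("EDITOR"),  # None if the variable is not set
        "lang": os.environ.get("LANG"),
    }


if __name__ == "__main__":
    for name, value in describe_context().items():
        print(name, "=", value)
```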
60What is a Process?
- Credentials
- Every process runs under the context of a given
user (or, more exactly, a given user id), and
under the context of a collection of group IDs
(generally, all of the groups that the user
belongs to). These credentials limit what
resources a process can access, such as which
files it can open or with which other processes
it is allowed to communicate.
- Resource Statistics and Limits
- Every process also records statistics to track
the extent to which system resources have been
utilized, such as its memory size, its number of
open files, its amount of CPU time, and others.
The amount of many of these resources that a
process is allowed to use can also be limited, a
concept called resource limits.
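A process can query its own credentials, resource statistics, and resource limits. The following is a minimal sketch using Python's os and resource modules; the function names are hypothetical, and the open-file limit is just one of several limits that could be inspected.

```python
import os
import resource


def credentials():
    """Return the user id and group ids the current process runs under."""
    return {
        "uid": os.getuid(),
        "gid": os.getgid(),
        "groups": sorted(os.getgroups()),
    }


def open_file_limit():
    """Return the (soft, hard) limit on the number of open files."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def cpu_seconds_used():
    """Return the CPU time (user plus system) consumed so far."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_utime + usage.ru_stime


if __name__ == "__main__":
    print(credentials())
    print("open file limit (soft, hard):", open_file_limit())
    print("CPU seconds used:", cpu_seconds_used())
```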
61Viewing Processes with the ps Command
- We have already encountered the ps command many
times. Now, we will attempt to familiarize
ourselves with a broader selection of the many
command line switches associated with it. A quick
ps --help will display a summary of over 50
different switches for customizing the ps
command's behavior. To complicate matters,
different versions of Unix have developed their
own versions of the ps command, which do not use
the same command line switch conventions. The
Linux version of the ps command tries to be as
accommodating as possible to people from
different Unix backgrounds, and often there are
multiple switches for any given option, some of
which start with a conventional leading hyphen
(-), and some of which do not.
62Viewing Processes with the ps Command
- Process Selection
- By default, the ps command lists all processes
started from a user's terminal. While reasonable
when users connected to Unix boxes using serial
line terminals, this behavior seems a bit
minimalist when every terminal window within an X
graphical environment is treated as a separate
terminal. The following command line switches can
be used to expand (or reduce) the processes which
the ps command lists.
63Output Selection
- As implied by the initial paragraphs of this
Lesson, there are many parameters associated with
processes, too many to display in a standard
terminal width of 80 columns. The following table
lists common command line switches used to select
what aspects of a process are listed.
64Output Selection
- Additionally, the following switches can be used
to modify how the selected information is
displayed.
65Oddities of the ps Command
- The ps command, probably more so than any other
command in Linux, has oddities associated with
its command line switches. In practice, users
tend to experiment until they find combinations
that work for them, and then stick to them. For
example, the author prefers ps aux for a general
purpose listing of all processes, while many
people prefer ps -ef. The above tables should
provide a reasonable "working set" for the
novice.
- The command line switches tend to fall into two
categories, those with the traditional leading
hyphen ("Unix98" style options), and those
without ("BSD" style options). Often, a given
functionality will be represented by one of each.
When grouping multiple single letter switches,
only switches of the same style can be grouped.
For example, ps axf is the same as ps a x f, not
ps a x -f.
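To make concrete what a listing of all processes involves, here is a rough Python approximation that reads the /proc filesystem, which is where the Linux ps command gets its data. It assumes a Linux system with /proc mounted, reports only three columns, and is no substitute for ps itself; the function name list_processes is hypothetical.

```python
import os


def list_processes():
    """Return a sorted list of (pid, owner_uid, command) for every process
    visible under /proc."""
    rows = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():       # only numeric directories are processes
            continue
        try:
            with open(f"/proc/{entry}/comm", errors="replace") as f:
                command = f.read().strip()
            owner_uid = os.stat(f"/proc/{entry}").st_uid
        except OSError:
            continue                  # the process exited while we were looking
        rows.append((int(entry), owner_uid, command))
    return sorted(rows)


if __name__ == "__main__":
    for pid, uid, command in list_processes():
        print(f"{pid:>7} {uid:>6} {command}")
```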
66Monitoring Processes with the top Command
- The ps command displays statistics for specified
processes at the instant that the command is run,
providing a snapshot of an instant in time. In
contrast, the top command is useful for
monitoring the general state of affairs of
processes on the machine.
- The top command is intended to be run from within
a terminal. It will replace the command line with
a table of currently running processes, which
updates every few seconds. The following
demonstrates a user's screen after running the
top command.
67Monitoring Processes with the top Command
- While the command is running, the keyboard is
"live". In other words, the top command will
respond to single key presses without waiting for
a return key. The following table lists some of
the more commonly used keys.
68Monitoring Processes with the top Command
- The last two commands, which either kill or renice
a process, use concepts that we will cover in
more detail in a later Lesson.
- Although most often run without command line
configuration, top does support the following
command line switches.
69Monitoring Processes with the gnome-system-monitor
Application
- If running an X server, the GNOME desktop
environment provides an application similar in
function to top, with the benefits (and
drawbacks) of a graphical application. The
application can be started from the command line
as gnome-system-monitor, or by selecting the
System > Administration > System Monitor menu
item.
70Monitoring Processes with the gnome-system-monitor
Application
- Like the top command, the System Monitor displays
a list of processes running on the local machine,
refreshing the list every few seconds. In its
default configuration, the System Monitor
provides a much simpler interface: it lists only
the processes owned by the user who started the
application, and reduces the number of columns to
just the process's command, owner, Process ID,
and simple measures of the process's Memory and
CPU utilization. Processes may be sorted by any
one of these fields by simply clicking on the
column's title.
71Monitoring Processes with the gnome-system-monitor
Application
- When right-clicking on a process, a pop-up menu
allows the user to perform many of the actions
that top allowed, such as renicing or killing a
process, though again with a simpler (and not as
flexible) interface.
72Monitoring Processes with the gnome-system-monitor
Application
- The System Monitor may be configured by opening
the Edit > Preferences menu selection. Within the
Preferences dialog, the user may set the update
interval (in seconds), and configure many more
fields to be displayed.
73Locating processes with the pgrep Command.
- Often, users are trying to locate information
about processes identified by the command they
are running, or the user who is running them. One
technique is to list all processes, and use the
grep command to reduce the information. In the
following, maxwell first looks for all instances
of the sshd daemon, and then for all processes
owned by the user maxwell.
- While maxwell can find the information he needs,
there are some unpleasant issues.
- The approach is not exacting. Notice that, in the
second search, a su process showed up, not
because it was owned by maxwell, but because the
word maxwell was one of its arguments.
- Similarly, the grep command itself usually shows
up in the output.
- The compound command can be awkward to type.
74Locating processes with the pgrep Command.
- In order to address these issues, the pgrep
command was created. Named pgrep for obvious
reasons, the command allows users to quickly list
processes by command name, user, terminal, or
group.
- pgrep [SWITCHES] [PATTERN]
- Its optional argument, if supplied, is
interpreted as an extended regular expression
pattern to be matched against command names. The
following command line switches may also be used
to qualify the search.
75Locating processes with the pgrep Command.
- In addition, the following command line switches
can be used to qualify the output formatting of
the command.
- For a complete list of switches, consult the
pgrep(1) man page.
- As a quick example, maxwell will repeat his two
previous process listings, using the pgrep
command.
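The idea behind pgrep, matching a pattern against command names only and optionally filtering by owner, can be sketched in Python. This is an approximation under stated assumptions: it reads /proc on Linux, uses Python's re syntax rather than true extended regular expressions, and treats the owner of the /proc entry as the owning user. The function name find_processes is hypothetical.

```python
import os
import re


def find_processes(pattern=None, uid=None):
    """Return the sorted PIDs whose command name matches pattern (searched
    anywhere in the name) and, if uid is given, that are owned by that uid."""
    regex = re.compile(pattern) if pattern is not None else None
    matches = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm", errors="replace") as f:
                command = f.read().strip()
            owner_uid = os.stat(f"/proc/{entry}").st_uid
        except OSError:
            continue                      # process vanished mid-scan
        if regex is not None and not regex.search(command):
            continue
        if uid is not None and owner_uid != uid:
            continue
        matches.append(int(entry))
    return sorted(matches)


if __name__ == "__main__":
    print("sshd processes:", find_processes("sshd"))
    print("my processes:", find_processes(uid=os.getuid()))
```

Because only the command name and the owner are examined, a process that merely mentions a user name in its arguments does not match, and there is no separate grep process to show up in the results.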
76Examples: Chapter 1. An Introduction to Processes
- Viewing All Processes with the "User Oriented"
Format
- In the following transcript, maxwell uses the ps
-e u command to list all processes (-e) with the
"user oriented" format (u).
- The "user oriented" view displays the user who is
running the process, the process id, and a rough
estimate of the amount of CPU and memory the
process is consuming, as well as the state of the
process. (Process states will be discussed in the
next Lesson).
77Questions: Chapter 1. An Introduction to
Processes
78Chapter 2 Process States
- Key Concepts
- In Linux, the first process, /sbin/init, is
started by the kernel on bootup. All other
processes are the result of a parent process
duplicating itself, or forking.
- A process begins executing a new command through
a technique called execing.
- Often, new commands are run by a process (often a
shell) first forking, and then execing. This
mechanism is referred to as the fork and exec
mechanism.
- Processes can always be found in one of five well
defined states: runnable, voluntarily sleeping,
involuntarily sleeping, stopped, or zombie.
- Process ancestry can be viewed with the pstree
command.
- When a process dies, it is the responsibility of
the process's parent to collect its return code
and resource usage information.
- When a parent dies before its children, the
orphaned children are inherited by the first
process (usually /sbin/init).
79A Process's Life Cycle
- How Processes are Started
- In Linux (and Unix), unlike many other operating
systems, process creation and command execution
are two separate concepts. Though usually a new
process is created so that it can run a specified
command (such as the bash shell creating a
process to run the chmod command), processes can
be created without running a new command, and new
commands can be executed without creating a new
process.
- Creating a New Process (Forking)
- New processes are created through a technique
called forking.
When a process forks, it creates a duplicate of
itself. Immediately after a fork, the newly
created process (the child) is an almost exact
duplicate of the original process (the parent).
The child inherits an identical copy of the
original process's memory, any open files of the
parent, and identical copies of any parameters of
the parent, such as the current working directory
or umask. About the only difference between the
parent and the child is the child's heritage
information (the child has a different process ID
and a different parent process ID, for starters),
and (for the programmers in the audience) the
return value of the fork() system call.
- As a quick aside for any programmers in the
audience, a fork is usually implemented using a
structure similar to the following.
80A Process's Life Cycle
- When a process wants to create a new process, it
calls the fork() system call (with no arguments).
Though only one process enters the fork() call,
two processes return from it. For the newly
created process (the child), the return value is
0. For the original process (the parent), the
return value is the process ID of the child. By
branching on this value, the child may now go off
to do whatever it was started to do (which often
involves exec()ing, see next), and the parent can
go on to do its own thing.
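The branching described above can be sketched as follows. The sketch uses Python, whose os.fork() is a thin wrapper around the fork() system call, rather than the C the original slide presumably showed. The function name fork_demo and the exit code 42 are arbitrary choices for illustration, and the parent waits for the child only so that the example cleans up after itself.

```python
import os


def fork_demo():
    """Fork once; return (child_pid, child_exit_code) in the parent."""
    pid = os.fork()               # one process enters, two return
    if pid == 0:
        # Child: fork() returned 0. Go off and do the child's work,
        # here simply exiting with a recognizable status code.
        os._exit(42)
    # Parent: fork() returned the process ID of the new child.
    _, status = os.waitpid(pid, 0)
    return pid, os.WEXITSTATUS(status)


if __name__ == "__main__":
    child_pid, exit_code = fork_demo()
    print("child", child_pid, "exited with status", exit_code)
```

The child uses os._exit() rather than returning, so that it never continues into code meant for the parent.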
81A Process's Life Cycle
- Executing a New Command (Exec-ing)
- New commands are run through a technique called
execing (short for executing). When execing a new
command, the
current process wipes and releases most of its
resources, and loads a new set of instructions
from the command specified in the filesystem.
Execution starts with the entry point of the new
program.
- After execing, the new command is still the same
process. It has the same process ID, and many of
the same parameters (such as its resource
utilization, umask, current working directory,
and others). It merely forgets its former
command, and adopts the new one.
- Again for any programmers, execs are performed
through one of several variants of the execve()
system call, such as the execl() library call.
- The process enters the execl(...) call,
specifying the new command to run. If all goes
well, the execl(...) call never returns. Instead,
execution picks up at the entry point (i.e.,
main()) of the new program. If for some reason
execl(...) does return, it must be an error (such
as not being able to locate the command's
executable in the filesystem).
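One way to observe that an exec keeps the same process ID is sketched below in Python, using os.execv() (a wrapper in the same exec family as execl()). A child is forked only so that the calling program survives; the child then execs a fresh Python interpreter that prints its own PID into a pipe. The function name and the use of sys.executable as the command to run are assumptions made for the illustration.

```python
import os
import sys


def pid_reported_after_exec():
    """Return (pid_of_child_before_exec, pid_reported_by_new_program)."""
    read_fd, write_fd = os.pipe()
    child_pid = os.fork()
    if child_pid == 0:
        try:
            os.close(read_fd)
            os.dup2(write_fd, 1)      # standard output now feeds the pipe
            os.execv(sys.executable,
                     [sys.executable, "-c", "import os; print(os.getpid())"])
        except OSError:
            pass
        # Only reached if the exec failed; a successful exec never returns.
        os._exit(127)
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        reported = reader.read().strip()
    os.waitpid(child_pid, 0)
    return child_pid, int(reported) if reported else None


if __name__ == "__main__":
    before, after = pid_reported_after_exec()
    print("PID before exec:", before, "PID seen by new program:", after)
```

Where C's execl() returns -1 on failure, Python's os.execv() raises OSError instead, which is why the failure path here sits in an except clause.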
82A Process's Life Cycle
- Combining the Two: Fork and Exec
- Some programs may fork without execing. Examples
include networking daemons, which
fork a new child to handle a specific client
connection, while the parent goes back to listen
for new clients. Other programs might exec
without forking. Examples include the login
command, which becomes the user's login shell
after successfully confirming a user's password.
Most often, and for shells in particular,
however, forking and execing go hand in hand.
When running a command, the bash shell first
forks a new bash shell. The child then execs the
appropriate command, while the parent waits for
the child to die, and then issues another prompt.
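The fork, exec, and wait sequence that a shell performs for each command can be sketched in Python as follows. This is a simplified illustration, not how bash is actually implemented; the function name run_command is hypothetical, and the exit code 127 for a command that cannot be executed follows the convention bash uses.

```python
import os
import sys


def run_command(argv):
    """Run argv the way a shell does: fork, exec in the child, wait in the
    parent. Returns the command's exit status."""
    pid = os.fork()
    if pid == 0:
        # Child: become the requested command.
        try:
            os.execvp(argv[0], argv)      # searches PATH for argv[0]
        except OSError:
            pass
        os._exit(127)                     # exec failed, e.g. command not found
    # Parent: wait for the child to die and collect its status.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return 128 + os.WTERMSIG(status)      # killed by a signal


if __name__ == "__main__":
    code = run_command([sys.executable, "-c", "print('hello from the child')"])
    print("exit status:", code)
```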
83The Lineage of Processes (and the pstree Command)
- Upon booting the system, one of the
responsibilities of the Linux kernel is to start
the first process (usually /sbin/init). All other
processes are started because an already existing
process forked.
- Because every process except the first is created
by forking, there exists a well defined lineage
of parent-child relationships among the
processes. The first process started by the
kernel starts off the family tree, which can be
examined with the pstree command.
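The lineage can also be followed upward from any one process by repeatedly looking up its parent. The Python sketch below does this by reading /proc/PID/stat, whose fields after the parenthesized command name begin with the state letter and the parent process ID. It assumes Linux with /proc mounted; the function names are hypothetical, and inside a container the chain may end before reaching the system's real first process.

```python
import os


def parent_of(pid):
    """Return the parent process ID recorded for pid in /proc/PID/stat."""
    with open(f"/proc/{pid}/stat", errors="replace") as f:
        data = f.read()
    # The command name is in parentheses and may itself contain spaces,
    # so split only what follows the last closing parenthesis.
    fields = data[data.rindex(")") + 2:].split()
    return int(fields[1])                 # fields[0] is the state letter


def ancestry(pid):
    """Return [pid, parent, grandparent, ...] up to the first process."""
    chain = [pid]
    while True:
        ppid = parent_of(chain[-1])
        if ppid == 0:                     # no further parent is visible
            return chain
        chain.append(ppid)


if __name__ == "__main__":
    print(" <- ".join(str(p) for p in ancestry(os.getpid())))
```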
84How a Process Dies
- When a process dies, it either dies normally by
electing to exit, or abnormally as the result of
receiving a signal. We here discuss a normally
exiting process, postponing a discussion of
signals until a later Lesson.
- We have mentioned previously that processes leave
behind a status code (also called return value)
when they die, in the form of an integer. (Recall
the bash shell, which uses the ? variable to
store the return value of the previously run
command.) When a process exits, all of its
resources are freed, except the return code (and
some resource utilization accounting
information). It is the responsibility of the
process's parent to collect this information, and
free up the last remaining resources of the dead
child. For example, when the bash shell forks and
execs the chmod command, it is the parent bash
shell's responsibility to collect the return
value from the exited chmod command.
- Orphans
- If it is a parent's responsibility to clean up
after their children, what happens if the parent
dies before the child does? The child becomes an
orphan. One of the special responsibilities of
the first process started by the kernel is to
"adopt" any orphans. (Notice that in the output
of the pstree command, the first process has a
disproportionately large number of children. Most
of these were adopted as the orphans of other
processes).
85How a Process Dies
- Zombies
- In between the time when a process exits, freeing
most of its resources, and the time when its
parent collects its return value, freeing the
rest of its resources, the child process is in a
special state referred to as a Zombie. Every
process passes through a transient zombie state.
Usually, users need to be looking at just the
right time (with the ps command, for example) to
witness a zombie. They show up in the list of
processes, but take up no memory, no CPU time, or
any other system resources. They are just the
shadow of a former process, waiting for their
parent to come and finish them off.
- Negligent Parents and Long Lived Zombies
- Occasionally, parent processes can be negligent.
They start child processes, but then never go
back to clean up after them. When this happens
(usually because of a programmer's error), the
child can exit, enter the zombie state, and stay
there. This is usually the case when users
witness zombie processes using, for example, the
ps command.
- Getting rid of zombies is perhaps the most
misunderstood basic Linux (and Unix) concept.
Many people will say that there is no way to get
rid of them, except by rebooting the machine.
Using the clues discussed in this section, can
you figure out how to get rid of long lived
zombies?
- You get rid of zombies by getting rid of the
negligent parent. When the parent dies (or is
killed), the now orphaned zombie gets adopted by
the first process, which is almost always
/sbin/init. /sbin/init is a very diligent parent,
who always cleans up after its children
(including adopted orphans).
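A zombie is easy to produce on purpose: fork a child that exits at once, and delay collecting its return value. The Python sketch below does this and then plays the part of a diligent parent by calling waitpid(), after which the zombie disappears. It assumes Linux with /proc mounted and that the parent has not arranged to ignore child-exit notifications; the function names are hypothetical.

```python
import os
import time


def state_of(pid):
    """Return the one-letter state of pid from /proc/PID/stat."""
    with open(f"/proc/{pid}/stat", errors="replace") as f:
        data = f.read()
    return data[data.rindex(")") + 2]


def zombie_demo():
    """Return (state_seen_before_reaping, still_listed_after_reaping)."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)                       # child exits immediately
    # The parent has not yet collected the return value, so the child
    # lingers as a zombie. Poll briefly until that state is visible.
    deadline = time.monotonic() + 5.0
    state = state_of(pid)
    while state != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
        state = state_of(pid)
    os.waitpid(pid, 0)                    # collect the return value
    return state, os.path.exists(f"/proc/{pid}")


if __name__ == "__main__":
    state, still_listed = zombie_demo()
    print("state before reaping:", state)
    print("still listed after reaping:", still_listed)
```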
86The 5 Process States
- The previous section discussed how processes are
started, and how they die. While processes are
alive they are always in one of five process
states, which affect how and when they are
allowed to have access to the CPU. The following
lists each of the five states, along with the
conventional letter that is used by the ps, top,
and other commands to identify a process's
current state.
- Runnable (R)
- Processes in the Runnable state are processes
that, if given the opportunity to access the CPU,
would take it. More formally, this is known as the
Running state, but because only one process may
be executing on a given CPU at any given time, only
one of these processes will actually be "running"
on that CPU at any given instant. Because runnable processes
are switched in and out of the CPU so quickly,
however, the Linux system gives the appearance
that all of the processes are running
simultaneously.
87The 5 Process States
- Voluntary (Interruptible) Sleep (S)
- As the name implies, a process which is in a
voluntary sleep elected to be there. Usually,
this is a process that has nothing to do until
something interesting happens. A classic example
is a networking daemon, such as the httpd process
that implements a web server. In between requests
from a client (web browser), the server has
nothing to do, and elects to go to sleep. Another
example would be the top command, which lists
processes every five seconds. While it is waiting
for five seconds to pass, it drops itself into a
voluntary sleep. When something that the process
is interested in happens (such as a web client
making a request, or a five second timer expiring),
the sleeping process is kicked back into the
Runnable state.
- Involuntary (Non-interruptible) Sleep (D)
- Occasionally, two processes try to
access the same system resource at the same time.
For example, one process attempts to read from a
block on a disk while that block is being written
to because of another process. In these
situations, the kernel forces the process into an
involuntary sleep. The process did not elect to
sleep; it would prefer to be runnable so it can
get things done. When the resource is freed, the
kernel will put the process back into the
runnable state.
- Although processes are constantly dropping into
and out of involuntary sleeps, they usually do
not stay there long. As a result, users do not
usually witness processes in an involuntary sleep
except on busy systems.
88The 5 Process States
- Stopped (Suspended) Processes (T)
- Occasionally, users decide to
suspend processes. Suspended processes will not
perform any actions until they are restarted by
the user. In the bash shell, the CTRL+Z key
sequence can be used to suspend a process. In
programming, debuggers often suspend the programs
they are debugging when certain events happen
(such as when a breakpoint is reached).
- Zombie Processes (Z)
- As mentioned above, every dying process goes
through a transient zombie state. Occasionally,
however, some get stuck there. Zombie processes
have finished executing, and have freed all of
their memory and almost all of their resources.
Because they are not consuming any resources,
they are little more than an annoyance that can
show up in process listings.
89Viewing Process States
- When viewing the output of commands such as ps
and top, process states are usually listed under
the heading STAT. The process is identified by
one of the following letters.
- Runnable - R
- Sleeping - S
- Stopped - T
- Uninterruptible sleep - D
- Zombie - Z
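The state letters can be observed directly. The Python sketch below forks a child that sleeps, reads its state from /proc (expected to be S), suspends it, and reads the state again (expected to be T). It assumes Linux with /proc mounted, and it suspends the child with the SIGSTOP signal as a stand-in for the CTRL+Z key sequence; the function names are hypothetical.

```python
import os
import signal
import time


def state_of(pid):
    """Return the one-letter state of pid from /proc/PID/stat."""
    with open(f"/proc/{pid}/stat", errors="replace") as f:
        data = f.read()
    return data[data.rindex(")") + 2]


def wait_for_state(pid, wanted, timeout=5.0):
    """Poll until pid reaches the wanted state; return the last state seen."""
    deadline = time.monotonic() + timeout
    state = state_of(pid)
    while state != wanted and time.monotonic() < deadline:
        time.sleep(0.01)
        state = state_of(pid)
    return state


def observe_states():
    """Return the states seen for a sleeping and then a suspended child."""
    pid = os.fork()
    if pid == 0:
        try:
            time.sleep(30)                # a voluntary sleep
        finally:
            os._exit(0)
    try:
        sleeping = wait_for_state(pid, "S")
        os.kill(pid, signal.SIGSTOP)      # suspend the child
        stopped = wait_for_state(pid, "T")
    finally:
        os.kill(pid, signal.SIGKILL)      # clean up the child
        os.waitpid(pid, 0)
    return sleeping, stopped


if __name__ == "__main__":
    print("own state:", state_of(os.getpid()))
    print("child states:", observe_states())
```

A process that reads its own state sees R, since it must be running in order to perform the read.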
90Examples: Chapter 2. Process States
- Identifying Process States
91Questions: Chapter 2. Process States
92Chapter 4. Sending Signals
- Key Concepts
- Signals are a low level form of inter-process
communication, which arise from a variety of
sources, including the kernel, the terminal, and
other processes.
- Signals are distinguished by signal numbers,
which have conventional symbolic names and uses.
The symbolic names for signal numbers can be
listed with the kill -l command.
- The kill command sends signals to other
processes.
- Upon receiving a signal, a process may either
ignore it, react in a kernel specified default
manner, or implement a custom signal handler.
- Conventionally, signal number 15 (SIGTERM) is
used to request the termination of a process.
- Signal number 9 (SIGKILL) terminates a process,
and cannot be overridden.
- The pkill and killall commands can be used to
deliver signals to processes specified by command
name, or the user who owns them.
- Other utilities, such as top and the GNOME System
Monitor can be used to deliver signals as well.
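Two of the concepts above, symbolic names for signal numbers and the delivery of a signal to another process, can be sketched with Python's signal module and os.kill(), which wraps the same system call the kill command uses. The sketch assumes Linux signal numbering and a child with the default reaction to SIGTERM; the function names are hypothetical.

```python
import os
import signal
import time


def signal_name(number):
    """Return the symbolic name for a signal number, e.g. 15 -> SIGTERM."""
    return signal.Signals(number).name


def terminate_child(sig):
    """Fork a sleeping child, send it sig, and return the number of the
    signal that terminated it (or None if it exited some other way)."""
    pid = os.fork()
    if pid == 0:
        try:
            time.sleep(60)                # idle until a signal arrives
        finally:
            os._exit(0)
    os.kill(pid, sig)                     # deliver the signal
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return os.WTERMSIG(status)
    return None


if __name__ == "__main__":
    print(15, "is", signal_name(15))
    print(9, "is", signal_name(9))
    print("child terminated by signal", terminate_child(signal.SIGTERM))
```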
93Signals
- Linux (and Unix) uses signals to notify processes