Title: EECS283, Fall 2002 Lecture 24: PERL Basics
1. EECS283, Fall 2002, Lecture 24: PERL Basics
- Andrew M Morgan
- Reading: Programming Perl
  - by Wall, Christiansen, Schwartz
  - (O'Reilly and Associates)
2. Scripts
- Oftentimes, you will hear someone say "Oh, I'll just write a quick script to do such-and-such..."
- More often than not, a script is a quick, thrown-together set of instructions to do a very specific operation
- Scripts can be quite complex, though, and may be used over and over, by many people
- Because scripts are meant to be a "quick solution", the language a script is written in should be "easy"
- Also, a script is usually not a compiled program
  - Many scripts are just a set of commands.
  - Rather than run a list of commands by hand, you can put them, in order, in a new file, which becomes your script. Then, running the script issues all the commands for you
  - Other scripts are written in interpreted languages: languages that are executed without being compiled
3. Intro To Perl
- One very popular (for good reason) scripting language is called PERL
- PERL: Practical Extraction and Report Language
- PERL is an interpreted language
  - You run the PERL interpreter, providing only the script file you wrote as input
- Consider the following "Hello World" programs

C++ Version:

    #include <iostream.h>
    int main()
    {
      cout << "Hello world!" << endl;
      return (0);
    }

PERL Version:

    print "Hello world!\n";
4. More On Perl
- PERL is not a strongly typed language
  - You need not declare variables
  - You need not do type checking for operations
- PERL has 3 basic data types
  - Scalars: a single, simple value, typically a number or string
  - Arrays of scalars: an ordered list of scalars
  - Hashes of scalars: an unordered collection of mappings from one value to another (think of the map in the STL), a.k.a. associative arrays
- All scalar variables in PERL start with the character '$'
- All array variables in PERL start with the character '@'
- All hash variables in PERL start with the character '%'
- Nonvariable identifiers do not start with any specific character
  - These include labels and filehandles
5. Perl Syntax
- PERL syntax generally looks a lot like C syntax
- Some important differences are
  - Curly braces are required, even if only one statement is needed in loops, conditionals, etc.
  - The "if..else if..else" structure is replaced with "if..elsif..else"
  - Variables must start with the appropriate character ($, @, %)
  - In addition to C style loops, there is also a "foreach" loop
    - Syntax: foreach $scalar (@list)
    - This loop iterates over each element of the list
    - At each iteration, the scalar is assigned the next value of the list
  - Can use "print" instead of "printf" to print to screen
    - Include variables inside the string to print, rather than dealing with conversion specifiers, etc.
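The differences listed above can be seen together in a short sketch. The helper name classify and the sample values are made up for illustration; they are not part of the lecture.

```perl
use strict;
use warnings;

# Hypothetical helper: classify a number using if / elsif / else.
# Note the braces around every branch, even single statements.
sub classify {
    my ($n) = @_;
    if ($n < 0) {
        return "negative";
    }
    elsif ($n == 0) {
        return "zero";
    }
    else {
        return "positive";
    }
}

my @values = (-2, 0, 7);   # sample data, made up for this sketch

# foreach assigns each element of @values to $v in turn.
foreach my $v (@values) {
    # $v is interpolated directly into the double-quoted string.
    print "$v is ", classify($v), "\n";
}
```

Here "my" and "use strict" are optional extras that make Perl complain about misspelled variable names; the slides' examples work without them.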
6. Scalars and Arrays, Example

    $scalarVar = 19;
    for ($i = 0; $i < 5; $i++)
    {
      # Braces are required in PERL, even with just 1 statement.
      # While this is an element of the array, it is a
      # scalar value, so it starts with '$'
      $myAry[$i] = 10.25 - $i;
    }
    for ($i = 0; $i < 5; $i++)
    {
      # Even in a string, a dollar sign represents a
      # variable and is replaced with its value
      print "index $i is $myAry[$i]\n";
    }
    $sum = 0;
    # Here, I am referring to the whole array - so
    # I start the name with '@'
    foreach $i (@myAry)
    {
      print "$i\n";
      $sum += $i;
    }

Output:

    index 0 is 10.25
    index 1 is 9.25
    index 2 is 8.25
    index 3 is 7.25
    index 4 is 6.25
    10.25
    9.25
    8.25
    7.25
    6.25
7. Special Variables
- PERL makes use of some "special variables"
- There are NUMEROUS special variables; I will discuss a few
  - STDOUT: Since this special variable does not start with a '$', '@', or '%', it is a filehandle: the standard output (screen)
  - STDIN: The standard input (keyboard)
  - @ARGV: Since this special variable starts with a '@', it is a list. It contains the values from the execution command line.
  - $_: Since this special variable starts with a '$', it is a scalar variable. This is the "default" variable.
    - The default variable is used in many PERL statements if no other variable is used.
    - Examples of this will follow
8. Dealing With Text Files
- One advantage of PERL is that it is easy to deal with text files
- Open a file and assign a file handle, using open()
  - Opening an input file: open(FILEHANDLE, "filename");
  - Opening an output file: open(FILEHANDLE, ">filename");
- Read in a full line using "< >"
  - Syntax: $line = <FILEHANDLE>;
  - This reads in all characters on a line, including the newline character and any leading whitespace
- Write out to a file using print
  - Syntax: print FILEHANDLE "string to print out";
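The open, read, and print steps above can be combined into one small routine. This is only a sketch: the helper name copy_lines is hypothetical, and the "or die" checks are an addition that the slides do not use, included because open() fails silently otherwise.

```perl
use strict;
use warnings;

# Hypothetical helper: copy every line of $inName to $outName and
# return the number of lines copied.
sub copy_lines {
    my ($inName, $outName) = @_;

    # open() returns false on failure; $! holds the system's error text.
    open(INFILE, $inName) or die "Cannot open $inName: $!\n";
    open(OUTFILE, ">$outName") or die "Cannot open $outName: $!\n";

    my $count = 0;
    while (my $line = <INFILE>) {   # one full line, newline included
        print OUTFILE $line;        # write it to the output file
        $count++;
    }

    close(INFILE);
    close(OUTFILE);
    return $count;
}
```

A call such as copy_lines("data.txt", "copy.txt") would then duplicate a file line by line (both file names are placeholders).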
9. 3 Scripts To Echo A File

    # This script echoes a file to the screen
    open(INF, "data.txt");
    while (<INF>)
    {
      print;
    }

    # This script echoes a file to the screen
    open(INF, "data.txt");
    while ($_ = <INF>)
    {
      print "$_";
    }

    # This script echoes a file to the screen
    open(INF, "data.txt");
    while ($line = <INF>)
    {
      print "$line";
    }

Contents of data.txt:

    this is a sample file
    nothing special about it - nothing
    it is a file for testing
    That's it!

Output of all three programs:

    this is a sample file
    nothing special about it - nothing
    it is a file for testing
    That's it!
10. Script To Read and Write Files

    # This script echoes odd numbered lines to the screen
    # and outputs even numbered lines to a file called newData.txt
    open(INFILE, "data.txt");         # Open input file
    open(OUTFILE, ">newData.txt");    # Open output file
    $line = <INFILE>;                 # Read the first line from the file
    while ($line)
    {
      $line2 = <INFILE>;              # Read in next line (even numbered)
      print "$line";                  # echo to screen
      print OUTFILE "$line2";         # print to file
      $line = <INFILE>;               # Read in next line (odd numbered)
    }

Contents of data.txt:

    1. This is line 1
    2. This is line 2
    3. And then there's line 3
    4. 4 is next
    5. blah
    6. ok, that's it

Output to screen:

    1. This is line 1
    3. And then there's line 3
    5. blah

Contents of newData.txt after running:

    2. This is line 2
    4. 4 is next
    6. ok, that's it
11. More Text Based Stuff
- Lines can be split into arrays easily, which makes for easy processing
  - chop removes the last character from a string (the newline in this case)
  - split separates a string into many strings delimited by another string

    open(INFILE, "datafile.txt");
    $sum = 0;
    while ($line = <INFILE>)
    {
      chop($line);
      @words = split(" ", $line);
      print "$line Elem 2: $words[2]\n";
      foreach $i (@words)
      {
        $sum += $i;
      }
    }
    print "Sum: $sum\n";

Contents of datafile.txt:

    1 2 3 4
    2 3 4 5
    30 40 50 60
    70 80 90 100

Output of program:

    1 2 3 4 Elem 2: 3
    2 3 4 5 Elem 2: 4
    30 40 50 60 Elem 2: 50
    70 80 90 100 Elem 2: 90
    Sum: 544
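One caveat with chop is that it removes the last character whatever it is. Perl also provides chomp, which removes only a trailing newline. The sketch below compares the two on made-up sample strings; it is an aside, not part of the lecture's script.

```perl
use strict;
use warnings;

my $withNewline    = "70 80 90 100\n";
my $withoutNewline = "70 80 90 100";    # e.g. a last line with no newline

chop(my $a1 = $withNewline);       # "70 80 90 100"
chop(my $a2 = $withoutNewline);    # "70 80 90 10"  (a digit is lost)
chomp(my $b1 = $withNewline);      # "70 80 90 100"
chomp(my $b2 = $withoutNewline);   # "70 80 90 100" (unchanged)

my @words = split(" ", $b2);       # ("70", "80", "90", "100")

print "chop:  '$a1' and '$a2'\n";
print "chomp: '$b1' and '$b2'\n";
print "Elem 2: $words[2]\n";
```

When a data file might lack a final newline, chomp is the safer choice for the loop shown on this slide.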
12. Strings In Perl
- There are several operators available for use on Perl scalars when treated as strings
  - .   The dot appends one string onto the end of another
  - eq  This operator checks if the two strings are "equal"
  - ne  This operator checks if the two strings are "not equal"
  - lt  This operator checks if the first string is "less than" the second
  - gt  This operator checks if the first string is "greater than" the second
- There are several functions that can be useful with strings too
  - length: This function returns the number of chars in the string
  - substr: Returns a portion of a string
    - Params: EXPR, OFFSET, LENGTH
    - EXPR: The string you are getting the substring out of
    - OFFSET: Reference point for the beginning of the substring. A negative value means to start that many characters in from the END of the string
    - LENGTH: How many characters the substring will have. A negative value means to leave that number of characters off the end of the string
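The OFFSET and LENGTH rules are easiest to check on a concrete string. The sample string below is made up for this sketch.

```perl
use strict;
use warnings;

my $s = "Hello, World";            # sample string, 12 characters

my $first = substr($s, 0, 5);      # offset 0, length 5           -> "Hello"
my $last  = substr($s, -5);        # start 5 chars from the END   -> "World"
my $mid   = substr($s, 7, -2);     # from offset 7, drop last 2   -> "Wor"
my $len   = length($s);            # 12

# String comparison operators use alphabetical (character) order.
my $isLess = ("apple" lt "banana") ? "yes" : "no";

print "$first $last $mid $len $isLess\n";   # Hello World Wor 12 yes
```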
13. Example With Strings

    $str = "Hi";
    $name = "Bob";
    $str = $str . " There";
    $str .= " " . $name;
    print "$str\n";
    if ($name eq "Bob") { print "NAME IS Bob\n"; }
    else { print "NAME IS NOT Bob\n"; }
    if ($name eq "Drew") { print "NAME IS Drew\n"; }
    else { print "NAME IS NOT Drew\n"; }
    # == compares numerically: both strings evaluate to 0,
    # so this test is (wrongly) true
    if ($name == "Drew") { print "NAME IS Drew\n"; }
    else { print "NAME IS NOT Drew\n"; }
    $len = length($name);
    print "Name is $len chars long\n";

Output:

    Hi There Bob
    NAME IS Bob
    NAME IS NOT Drew
    NAME IS Drew
    Name is 3 chars long

(Try not to program like this, with code on the same lines as the curly braces!)
14. Using substr For Substrings

    # Can use substr as an rvalue to return a substring
    $str = "Hi There, Bob!!";
    print "String: $str\n";
    $name = substr($str, 10, -2);
    print "Name: $name\n";

    $str = "Hi There, George!!";
    print "String: $str\n";
    $name = substr($str, 10, -2);
    print "Name: $name\n";

    # Can use substr as an lvalue to change that substring
    # to another (length is adjusted appropriately)
    print "String: $str\n";
    substr($str, 10, -2) = "Joe";
    print "New String: $str\n";

    # While the top two examples work for ANY name, this
    # last example only works for three letter names...
    print "String: $str\n";
    $name = substr($str, -5, 3);
    print "Name: $name\n";

Output:

    String: Hi There, Bob!!
    Name: Bob
    String: Hi There, George!!
    Name: George
    String: Hi There, George!!
    New String: Hi There, Joe!!
    String: Hi There, Joe!!
    Name: Joe
15. Data Type Conversions
- One cool, but somewhat dangerous, feature is automatic type conversion

    print "Enter an integer: ";
    $val = <STDIN>;          # Read in a line from the keyboard
    chop $val;
    print "Val: $val\n";     # Read in from keyboard as string
    $val += 15;              # $val is used as an integer now
    print "Val: $val\n";
    $val = $val . "24";      # A string again, as I append to it
    print "Val: $val\n";
    $val -= 3.5;             # Now, I treat it as a float
    print "Val: $val\n";

Output:

    Enter an integer: 8
    Val: 8
    Val: 23
    Val: 2324
    Val: 2320.5
16. List Information
- A list in PERL is very similar to an array in C/C++
- When using the list as a whole, the name starts with '@'
- When indexing into a list, the [] operator is used
- When using the [] operator, the result is not a list, but rather a scalar
  - Therefore, when indexing into an array to get a scalar, the name starts with '$'

    sort @myList;
    print "$myList[2]";

Since "sort" works on an entire array, and the square brackets are not used, the array "myList" starts with the '@' character. Since we index into this array to get a scalar value stored inside the array, the same name "myList" starts with '$' to denote that the result is a scalar.
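A minimal sketch of the '@' versus '$' rule, using made-up sample data. It also shows two related facts that are standard Perl but not on the slide: an array assigned to a scalar gives its length, and $#myList is the index of the last element.

```perl
use strict;
use warnings;

my @myList = (40, 10, 30);    # whole list: name starts with '@'

my $third     = $myList[2];   # one element is a scalar: name starts with '$'
my $count     = @myList;      # array used as a scalar gives its length
my $lastIndex = $#myList;     # index of the last element

print "third=$third count=$count lastIndex=$lastIndex\n";
```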
17. Fun With Lists
- Converting between lists and the scalars in them is easy in PERL

    $thirtyVal = 30;
    # Assign a list of scalars into a list
    @myList = (10, 0, $thirtyVal, 50, 20);
    # Loop over each element of the array @myList and
    # assign the "current" scalar to "$i"
    foreach $i (@myList)
    {
      print "$i ";
    }
    print "\n";
    print "@myList\n";          # print out whole list
    @myList = sort @myList;     # sort the list
    print "@myList\n";          # print out the sorted list
    # Assign list to individual scalars
    ($a, $b, $c, $d, $e) = @myList;
    print "$a $b $c $d $e\n";
    ($a, $b, $c, $d, $e) = reverse @myList;
    print "$a $b $c $d $e\n";

Output:

    10 0 30 50 20
    10 0 30 50 20
    0 10 20 30 50
    0 10 20 30 50
    50 30 20 10 0
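One thing the example above does not reveal is that sort compares elements as strings by default; the sample list just happens to come out in numeric order either way. The following sketch, with made-up data, shows the difference and the usual way to request a numeric sort.

```perl
use strict;
use warnings;

my @nums = (10, 9, 100, 1);                  # sample data

my @asStrings = sort @nums;                  # default: string comparison
my @asNumbers = sort { $a <=> $b } @nums;    # numeric comparison

print "@asStrings\n";   # 1 10 100 9
print "@asNumbers\n";   # 1 9 10 100
```

Inside the braces, $a and $b are the two elements being compared, and <=> is Perl's numeric comparison operator.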
18. Hash Information
- A hash is similar to the STL's "map"
  - Similar to an array, but the index values do not have to be integers
  - Order of insertion should not be important when using a hash
- When using the hash as a whole, the name starts with '%'
- When indexing into a hash, the {} operator is used
- When using the {} operator, a scalar results, so the hash name starts with a '$'
- The index values are called the "keys" of the hash
- The values of a hash at different indexes are called "values" of the hash

    keys %myHash;
    $foo = $myHash{"IndexStr"};

Since "keys" works on an entire hash, the name "myHash" starts with '%'. Since I index into myHash to get a scalar value, the name starts with '$' (both left and right side are scalars).
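A short sketch of these rules in use. The hash contents and the helper name age_of are made up for illustration, and the exists test is standard Perl that the slide does not cover.

```perl
use strict;
use warnings;

my %ages = ("Bob" => 31, "Drew" => 27);   # whole hash: name starts with '%'

$ages{"George"} = 45;                     # one entry is a scalar: '$' and {}

# Hypothetical helper: look up a name, with a fallback for missing keys.
sub age_of {
    my ($name) = @_;
    return exists $ages{$name} ? $ages{$name} : "unknown";
}

# keys returns the keys in no particular order, so sort them for printing.
foreach my $name (sort keys %ages) {
    print "$name is $ages{$name}\n";
}
print "Joe is ", age_of("Joe"), "\n";
```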
19. Fun With Hashes

    open(INFILE, "test.txt");
    while (<INFILE>)
    {
      chop;
      @chunks = split(':');     # delimiter assumed to be ':'
      $myHash{$chunks[1]} = $chunks[0];
    }
    @keyList = keys %myHash;       # Returns a list of all keys
    @valList = values %myHash;     # Returns a list of all values
    print "keys: @keyList\n";
    print "vals: @valList\n";
    @keyList = sort @keyList;
    foreach $i (@keyList)
    {
      print "$i $myHash{$i}\n";
    }

test.txt:

    one:eleven
    two:twelve
    three:thirteen
    four:fourteen

Screen output:

    keys: fourteen thirteen twelve eleven
    vals: four three two one
    eleven one
    fourteen four
    thirteen three
    twelve two
20. Regular Expressions, Replacement, Etc
- PERL allows you to easily search and replace text
- The binding operator, =~, is used if the operation is not being performed on the default variable
- Syntax: s/search pattern/replace pattern/flag
- For example, to replace the first instance of hello with goodbye in the string $string, use the following
  - $string =~ s/hello/goodbye/;
- To replace ALL instances of hello with goodbye in the string $string, use the "global" flag as follows
  - $string =~ s/hello/goodbye/g;
- To replace ALL instances of "two" with "three" in the default variable
  - s/two/three/g;
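The three forms above can be tried side by side. This is a sketch on made-up sample text; the "(my $copy = $string) =~ ..." idiom copies the string first so the original stays unchanged.

```perl
use strict;
use warnings;

my $string = "hello world, hello class";       # sample text

(my $first = $string) =~ s/hello/goodbye/;     # first match only
(my $all   = $string) =~ s/hello/goodbye/g;    # every match

# With no binding operator, s/// works on the default variable $_.
# foreach with no loop variable puts each element into $_ in turn.
my @lines = ("one two", "two two");
foreach (@lines) {
    s/two/three/g;          # modifies the elements of @lines in place
}

print "$first\n";           # goodbye world, hello class
print "$all\n";             # goodbye world, goodbye class
print "$lines[0] / $lines[1]\n";   # one three / three three
```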
21. Search And Replace, Example

    $initialStr = "abcdefgabcdefgabcdefg";
    print "Initial: $initialStr\n";

    $blah = $initialStr;
    $blah =~ s/bcd/dcb/;      # Change the first "bcd" to "dcb"
    print "\n$blah\n";

    $blah = $initialStr;
    $blah =~ s/bcd/dcb/g;     # Change ALL "bcd"s to "dcb"s instead
    print "\n$blah\n";

    $blah = $initialStr;
    $blah =~ s/bcd//g;        # Remove all "bcd"s
    print "\n$blah\n";

Output:

    Initial: abcdefgabcdefgabcdefg

    adcbefgabcdefgabcdefg

    adcbefgadcbefgadcbefg

    aefgaefgaefg
22. Special Patterns
- There are some special patterns that allow complex matching
  - \s  This matches any whitespace character
  - \S  This matches any non-whitespace character
  - .   This matches any character (except a newline)
- There are also some manipulators to change the behavior of a search
  - ^  Means to search only at the beginning of the text
  - +  After another character, means match 1 or more of that character
  - *  After another character, means match 0 or more of that character
  - ?  After another character, means match 0 or 1 time
- For a better explanation, see the example on the next page
23. Searching With Special Patterns

    $str = " Hello. How are you?";
    print "BEFORE: $str\n";
    $str =~ s/\s//g;       # Remove ALL whitespace
    print "AFTER: $str\n";

    $str = "Hello. How are you?";
    print "\nBEFORE: $str\n";
    $str =~ s/\s+//;       # Remove first chunk of whitespace
    print "AFTER: $str\n";

    $str = "Hello. How are you?";
    print "\nBEFORE: $str\n";
    $str =~ s/^\s+//g;     # Remove leading whitespace
    print "AFTER: $str\n";

    $str = " Hello. How are you?";
    print "\nBEFORE: $str\n";
    $str =~ s/^\s+//;      # Remove leading whitespace
    print "AFTER: $str\n";

    $str = "x1 x2 y3 x4 z5 x z6";
    print "\nBEFORE: $str\n";
    # Anytime the character x is followed by another character
    # and then a space, replace that pattern with "M "
    $str =~ s/x. /M /g;
    print "AFTER: $str\n";

Output:

    BEFORE:  Hello. How are you?
    AFTER: Hello.Howareyou?

    BEFORE: Hello. How are you?
    AFTER: Hello.How are you?

    BEFORE: Hello. How are you?
    AFTER: Hello. How are you?

    BEFORE:  Hello. How are you?
    AFTER: Hello. How are you?

    BEFORE: x1 x2 y3 x4 z5 x z6
    AFTER: M M y3 M z5 x z6
24. Executing Scripts
- A PERL script is just a text file containing the code
- PERL is interpreted; it does not need to be compiled into object code.
- You can execute a PERL script by running the interpreter and providing the filename.
  - perl myScript.perl
- You can also make your PERL script realize it is a PERL script, so that you can run it directly without having to run the interpreter yourself
  - The first line of your script should be
    - #!/path/to/interpreter
  - Your script's permissions should be changed so that it is considered executable
  - Now when you type the name of your program, the system knows the interpreter can be found at the path you provided, and your script will execute
25. Executing Scripts, Example

Script hello.perl:

    print "Hello World!\n";

Script hello2.perl:

    #!/usr/um/bin/perl
    print "Hello World!\n";

UNIX Session:

    25 temp - ls -l hello*
    -rw-------   1 help2801       24 Nov 29 12:20 hello.perl
    -rw-------   1 help2801       37 Nov 29 12:23 hello2.perl
    26 temp - chmod 700 hello2.perl
    27 temp - ls -l hello*
    -rw-------   1 help2801       24 Nov 29 12:20 hello.perl
    -rwx------   1 help2801       37 Nov 29 12:23 hello2.perl
    28 temp - ./hello.perl
    ./hello.perl: Permission denied.
    29 temp - perl hello.perl
    Hello World!
    30 temp - ./hello2.perl
    Hello World!
    31 temp -
26. Command Line Arguments
- Most PERL scripts do not take in user input from the keyboard
  - All input to the program is done via data files, or the command line
- PERL has a variable called "@ARGV"
  - This is obviously a list
  - It contains the values the user entered on the command line
- Determine how many params were given by treating the array as a scalar
  - $argc = @ARGV;
- Index directly into the array with [] notation
  - $val1 = $ARGV[0];
- Use shift to get values
  - $val1 = shift;
  - This actually takes the first param off the front of the list, reducing its size!
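The same counting and shifting works on any list, which makes it easy to try without a command line. In this sketch the helper add_args is hypothetical; it receives a copy of the argument list and applies the steps above to it.

```perl
use strict;
use warnings;

# Hypothetical helper: add exactly two values taken from an argument list.
sub add_args {
    my @args = @_;
    my $argc = @args;                 # array used as a scalar: its length
    $argc == 2 or die "Provide TWO params to add!\n";
    my $val1 = shift @args;           # takes the first value off the front
    my $val2 = shift @args;           # takes the next one; @args is now empty
    return $val1 + $val2;
}

# When run as a script with two command line values, add them.
print "Sum: ", add_args(@ARGV), "\n" if @ARGV == 2;
```

Saved under a placeholder name such as add.pl, running "perl add.pl 87 12" would print "Sum: 99".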
27. Command Line Arguments, Example

    #!/usr/um/bin/perl
    $argc = @ARGV;
    print "Num Args: $argc\n";
    $argc == 2 || die "Provide TWO params to add!\n";
    print "ORIG ARGV[0]: $ARGV[0]\n";
    $sum = $ARGV[0] + $ARGV[1];
    print "Sum: $sum\n";
    $val1 = shift;     # Takes value off front!!
    print "AFTER SHIFT ARGV[0]: $ARGV[0]\n";
    $val2 = shift;     # Takes value off front!!
    print "AFTER SHIFT ARGV[0]: $ARGV[0]\n";
    $sum2 = $val1 + $val2;
    print "Sum2: $sum2\n";
    $str = $val1 . $val2;
    print "Args appended: $str\n";

UNIX Session:

    50 temp - ./cmdLine.perl
    Num Args: 0
    Provide TWO params to add!
    51 temp - ./cmdLine.perl 87 72 12
    Num Args: 3
    Provide TWO params to add!
    52 temp - ./cmdLine.perl 87 12
    Num Args: 2
    ORIG ARGV[0]: 87
    Sum: 99
    AFTER SHIFT ARGV[0]: 12
    AFTER SHIFT ARGV[0]:
    Sum2: 99
    Args appended: 8712
    53 temp -
28. System Calls In PERL
- Two ways to perform system calls in PERL
- $retVal = system($cmdStr);
  - Executes the command described in $cmdStr, and returns the exit status of the process times 256 (weird, I know)
  - Like the C/C++ system() function, output that results is not available
- $output = `$cmdStr`;
  - Executes the command described in $cmdStr, and returns the output that the command sent to standard output
  - This allows you to parse the command's output and extract values of interest
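The "times 256" behavior is easier to see with a command that exits with a known status. In this sketch, $^X (the path of the running perl binary) is used as a convenient command, and "echo" is assumed to be available on the system; neither choice comes from the lecture.

```perl
use strict;
use warnings;

# Run a tiny one-line perl program that exits with status 3.
my $retVal     = system($^X, "-e", "exit 3");
my $exitStatus = $retVal >> 8;        # shift right 8 bits to undo "times 256"

# Back-ticks capture what the command wrote to standard output.
my $output = `echo hello`;
chomp($output);                       # drop the trailing newline

print "retVal: $retVal, exit status: $exitStatus\n";   # retVal: 768, exit status: 3
print "captured: $output\n";
```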
29. System Calls In PERL, Example

    $cmd = "ls -l";
    print "Using system - output will go to screen only!\n";
    $retVal = system($cmd);
    print "retVal: $retVal\n";
    print "Using back-ticks - output will go to return variable!\n";
    $listing = `$cmd`;
    @listingLines = split("\n", $listing);
    foreach $curLine (@listingLines)
    {
      @elems = split(/\s+/, $curLine);
      if (substr($curLine, 0, 1) ne "d")
      {
        if (substr($curLine, 3, 1) eq "x")
        {
          print " Executable non-directory: $elems[8]\n";
        }
      }
    }

Output:

    Using system - output will go to screen only!
    total 36
    drwx------    2 morgana  csestudents     4096 Mar 26 22:05 aDirectory
    -rwx------    1 morgana  csestudents       74 Mar 26 22:06 fin.pl
    -rw-------    1 morgana  csestudents     4492 Mar 26 22:05 img3.gif
    -rwx------    1 morgana  csestudents    16028 Mar 26 22:04 tryme
    -rw-------    1 morgana  csestudents     3288 Mar 26 22:04 tryme.cpp
    retVal: 0
    Using back-ticks - output will go to return variable!
     Executable non-directory: fin.pl
     Executable non-directory: tryme
30. Parallel Programming In PERL
- When a fork operation is performed, the current process running your PERL script splits itself into two unique processes
  - The original process continues to execute after the fork operation
  - A second (child) process also continues executing after the fork operation
- This implies that fork is called ONCE, but returns to two different processes
- The call to fork will return different values to the two processes
  - To the original (parent) process, fork returns the process ID of the new child process that was created
  - To the new (child) process, fork returns 0.
- You can check the return value in the script to specify which code is to be executed by the parent, and which code is to be executed by the child
31. More on fork
- To perform a fork, simply say
  - $pid = fork;
- After the fork operation, you should have an if..else

    if ($pid != 0)
    {
      # ... code to be executed by the parent
    }
    else
    {
      # ... code to be executed by the child
    }

- If the parent relies on the child completing before it can continue its own operation, a system call is more appropriate!
- You can also tell the parent to wait for its child process at a certain point
- Child processes that finish do NOT get removed from the system automatically; you need to reap the zombie processes
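A minimal sketch of the parent/child split, with the parent waiting for (and thereby reaping) its child. Calling waitpid with 0 as the second argument, which blocks until the child exits, and checking fork's result for failure are standard Perl but go beyond what the slide shows; the exit status 7 is arbitrary.

```perl
use strict;
use warnings;

my $childStatus;

my $pid = fork;
defined($pid) or die "fork failed: $!\n";   # fork returns undef on failure

if ($pid != 0) {
    # Parent: block until the child exits, which also reaps it.
    waitpid($pid, 0);
    $childStatus = $? >> 8;                 # the child's exit status
    print "Parent: child $pid exited with status $childStatus\n";
}
else {
    # Child: do some work, then exit so it never runs the parent's code.
    print "Child: running\n";
    exit(7);
}
```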
32. Example of Using fork

    #!/usr/bin/perl
    print "Starting (Prints ONCE)\n";
    $totalRuns = 20;
    $pid = fork;
    if ($pid != 0)
    {
      for ($i = 0; $i < $totalRuns / 2; $i++)
      {
        print "Parent running run number $i\n";
        for ($j = 0; $j < 100000; $j++)
        {
          $k = $j * $i;     # just waste some time
        }
      }
    }
    else
    {
      for ($i = $totalRuns / 2; $i < $totalRuns; $i++)
      {
        print "Child running run number $i\n";
        for ($j = 0; $j < 100000; $j++)
        {
          $k = $j * $i;     # just waste some time
        }
      }
    }
    print "Done (Prints TWICE)\n";

UNIX Session:

    13 temp - ./forkTest.pl
    Starting (Prints ONCE)
    Child running run number 10
    Parent running run number 0
    Parent running run number 1
    Child running run number 11
    Parent running run number 2
    Child running run number 12
    Parent running run number 3
    Parent running run number 4
    Child running run number 13
    Parent running run number 5
    Parent running run number 6
    Child running run number 14
    Parent running run number 7
    Child running run number 15
    Parent running run number 8
    Child running run number 16
    Child running run number 17
    Parent running run number 9
    Child running run number 18
    Done (Prints TWICE)
    14 temp - Child running run number 19
    Done (Prints TWICE)

Note: The UNIX prompt shows up in the middle of the output. Why?
33. More Functions To Deal With Processes
- The waitpid function checks to see if any of the process' children have finished (died), and if so, it "reaps the dead child process"
  - Note: When called with WNOHANG, the name is somewhat misleading, because it does not "wait" until a process dies to continue. If no child has died yet, waitpid returns right away with a value that is not a child's process ID (0 while children are still running, or -1 if there are no children to wait for), indicating that no processes were reaped
  - Obtain the WNOHANG constant by including the following line in your script
    - use POSIX ":sys_wait_h";
  - Pass in a specific process id to check if it is dead, or pass in "-1" to say "reap any child processes that have died"
  - Pass in WNOHANG as the second argument
- The sleep function causes a process to stop processing for a certain number of seconds
  - Usually used to allow child processes time to finish
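The polling pattern described above might look like the following sketch. The one-second sleeps and the $polls counter are made up for illustration; the loop simply keeps asking until waitpid reports that this particular child has been reaped.

```perl
use strict;
use warnings;
use POSIX ":sys_wait_h";    # provides the WNOHANG constant

my $pid = fork;
defined($pid) or die "fork failed: $!\n";

if ($pid == 0) {
    sleep(1);               # child: pretend to work, then finish
    exit(0);
}

# Parent: poll without blocking until the child has been reaped.
my $polls = 0;
while (waitpid($pid, WNOHANG) != $pid) {
    $polls++;
    sleep(1);               # give the child time to finish
}
print "Child $pid reaped after $polls poll(s)\n";
```

Between polls the parent is free to do other work instead of sleeping, which is the main reason to prefer WNOHANG over a blocking wait.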
34. Example 2 Using "fork"

    #!/usr/bin/perl
    use POSIX ":sys_wait_h";
    print "Starting (Prints ONCE)\n";
    $totalRuns = 20;
    $pid = fork;
    if ($pid != 0)
    {
      for ($i = 0; $i < $totalRuns / 2; $i++)
      {
        print "Parent running run number $i\n";
        for ($j = 0; $j < 100000; $j++)
        {
          $k = $j * $i;     # just waste some time
        }
      }
      # New code: now, the parent finishes ONLY after its child dies
      $reapedPid = waitpid(-1, WNOHANG);
      while ($reapedPid != $pid)
      {
        print "Waiting for child to finish!\n";
        sleep(1);
        $reapedPid = waitpid(-1, WNOHANG);
      }
    }
    else
    {
      for ($i = $totalRuns / 2; $i < $totalRuns; $i++)
      {
        print "Child running run number $i\n";
        for ($j = 0; $j < 100000; $j++)
        {
          $k = $j * $i;     # just waste some time
        }
      }
    }
    print "Done (Prints TWICE)\n";

UNIX Session:

    18 temp - ./forkTest.pl
    Starting (Prints ONCE)
    Child running run number 10
    Parent running run number 0
    Parent running run number 1
    Child running run number 11
    Parent running run number 2
    Child running run number 12
    Parent running run number 3
    Parent running run number 4
    Child running run number 13
    Parent running run number 5
    Parent running run number 6
    Child running run number 14
    Parent running run number 7
    Child running run number 15
    Parent running run number 8
    Child running run number 16
    Parent running run number 9
    Child running run number 17
    Child running run number 18
    Waiting for child to finish!
    Child running run number 19
    Done (Prints TWICE)
    Done (Prints TWICE)
    19 temp -
35. PERL's exec Function
- The exec function transforms a currently running process into a process executing something else
  - It literally replaces the current program being executed by the process with a different program at the user's request
  - When the program that is being executed by the process finishes, the process is done
  - That is, exec DOES NOT RETURN TO THE ORIGINAL PROCESS
- Often, after a process is forked, its child is exec'ed to a different program
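The fork-then-exec pattern can be sketched in a few lines. Here the child becomes a one-line perl program, started through $^X (the path of the running perl binary); that choice, the message text, and the exit status 5 are made up for this sketch.

```perl
use strict;
use warnings;

my $pid = fork;
defined($pid) or die "fork failed: $!\n";

if ($pid == 0) {
    # Child: replace this program with a different one.
    exec($^X, "-e", 'print "Hello from the new program\n"; exit 5;')
        or die "exec failed: $!\n";   # reached only if exec itself fails
}

# Parent: wait for the child, then report how the new program exited.
waitpid($pid, 0);
my $status = $? >> 8;
print "Parent: child exited with status $status\n";
```

Nothing written after exec in the child's branch runs on success, which is why the only statement following it is the failure check.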
36. Example of Using exec

    #!/usr/bin/perl
    use POSIX ":sys_wait_h";
    print "Starting (Prints ONCE)\n";
    $totalRuns = 20;
    $pid = fork;
    if ($pid != 0)
    {
      for ($i = 0; $i < $totalRuns / 2; $i++)
      {
        print "Parent running run number $i\n";
        for ($j = 0; $j < 100000; $j++)
        {
          $k = $j * $i;
        }
      }
      $reapedPid = waitpid(-1, WNOHANG);
      while ($reapedPid != $pid)
      {
        print "Waiting for child to finish!\n";
        sleep(1);
        $reapedPid = waitpid(-1, WNOHANG);
      }
    }
    else
    {
      $cmd = "system.pl";
      print "Child process becoming $cmd!\n";
      exec($cmd);
      print "This will never print!\n";
    }
    print "Done (Prints ONCE)\n";

UNIX Session:

    37 foo - ./execTest.pl
    Starting (Prints ONCE)
    Child process becoming system.pl!
    Using system - output will go to screen only!
    total 44
    drwx------    2 morgana  csestudents     4096 Mar 26 22:05 aDirectory
    -rwx------    1 morgana  csestudents      651 Apr  8 20:36 execTest.pl
    -rwx------    1 morgana  csestudents       74 Mar 26 22:06 fin.pl
    -rw-------    1 morgana  csestudents     4492 Mar 26 22:05 img3.gif
    -rwx------    1 morgana  csestudents      491 Apr  8 20:35 system.pl
    -rwx------    1 morgana  csestudents    16028 Mar 26 22:04 tryme
    -rw-------    1 morgana  csestudents     3288 Mar 26 22:04 tryme.cpp
    Parent running run number 0
    Parent running run number 1
    retVal: 0
    Using back-ticks - output will go to return variable!
     Executable non-directory: execTest.pl
     Executable non-directory: fin.pl
     Executable non-directory: system.pl
     Executable non-directory: tryme
    Parent running run number 2
    Parent running run number 3
    Parent running run number 4
    Parent running run number 5
    Parent running run number 6
    Parent running run number 7
    Parent running run number 8
    Parent running run number 9
    Done (Prints ONCE)
    38 foo -

The lines from "Using system" through the "Executable non-directory" entries are the output of the "system.pl" script, interleaved with the parent's output.