Title: Introduction to Perl Part I, II, and III
1Introduction to Perl Part I, II, and III
- By Bridget Thomson McInnes
- 20 January 2004
2What is Perl?
- Perl is a Portable Scripting Language
- No compiling is needed.
- Runs on Windows, UNIX and LINUX
- Fast and easy text processing capability
- Fast and easy file handling capability
- Written by Larry Wall
- Perl is the language for getting your job done.
3How to Access Perl
- Off the school network
- Located on the csdev machines at /usr/local/bin/perl
- To install at home
- www.perl.com Has rpm's for Linux
- www.activestate.com Has binaries for Windows
- Latest version (as of this talk) is 5.8
- To check if Perl is working and the version number
- perl -v
4Resources For Perl
- Books
- Learning Perl
- By Randal L. Schwartz and Tom Phoenix
- Published by O'Reilly
- Programming Perl
- By Larry Wall, Tom Christiansen and Jon Orwant
- Published by O'Reilly
- Web Site
- http://safari.oreilly.com
- Contains both Learning Perl and Programming Perl in ebook form
5Web Sources for Perl
- Web
- www.perl.com
- www.perldoc.com
- www.perl.org
- www.perlmonks.org
6The Basic Hello World Program
- Program
- #!/usr/local/bin/perl -w
- print "Hello World!\n";
- Save this as hello.pl
- Give it executable permissions
- chmod ug+x hello.pl
- Run it as follows
- ./hello.pl
7Hello World Observations
- .pl extension is optional but is commonly used
- The first line, #!/usr/local/bin/perl, tells UNIX where to find Perl
- -w switches on warnings: not required, but a really good idea
- Second line: brackets (parentheses) are not needed around the argument of the print function
8Numerical Literals
- Numerical Literals
- 6 Integer
- 12.6 Floating Point
- 1e10 Scientific Notation
- 6.4E-33 Scientific Notation
- 4_348_348 Underscores instead of commas
for long numbers
9String Literals
- String Literals
- "There is more than one way to do it!"
- 'Just don\'t create a file called -rf.'
- "Beauty?\nWhat's that?\n"
- ""
- "Real programmers can write assembly in any language."
- Quotes from Larry Wall
10Types of Variables
- Types of variables
- Scalar variables: $a, $b, $c
- Array variables: @array
- Hash variables: %hash
- File handles: STDIN, SRC, DEST
- Variables do not need to be declared
- Variable type (int, char, ...) is decided at run time
- $a = 5; # now an integer
- $a = "perl"; # now a string
11Operators on Scalar Variables
- Numeric and Logic Operators
- Typical: +, -, *, /, %, ++, --, +=, -=, *=, /=, ||, &&, ! etc.
- Not typical: ** for exponentiation
- String Operators
- Concatenation: . (similar to strcat)
- $first_name = "Larry";
- $last_name = "Wall";
- $full_name = $first_name . " " . $last_name;
12Equality Operators for Strings
- Equality/Inequality: eq and ne
- $language = "Perl";
- if ($language == "Perl") ... # Wrong!
- if ($language eq "Perl") ... # Correct
- Use eq / ne rather than == / != for strings
13Relational Operators for Strings
- Greater than
- Numeric: >   String: gt
- Greater than or equal to
- Numeric: >=  String: ge
- Less than
- Numeric: <   String: lt
- Less than or equal to
- Numeric: <=  String: le
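The difference between the numeric and string forms shows up as soon as the operands have different lengths. A minimal sketch (the variable names are illustrative, and strict with my is a stylistic choice the slides do not use):

```perl
use strict;
use warnings;

# Numeric operators compare values; string operators compare character by character.
my $numeric = (10 > 9)      ? "true" : "false";   # 10 is greater than 9
my $string  = ("10" gt "9") ? "true" : "false";   # "1" sorts before "9"

print "10 >  9 : $numeric\n";
print "10 gt 9 : $string\n";
```

The string comparison is false because "1" comes before "9", which is why the choice of operator matters.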
14String Functions
- Convert to upper case
- $name = uc($name);
- Convert only the first char to upper case
- $name = ucfirst($name);
- Convert to lower case
- $name = lc($name);
- Convert only the first char to lower case
- $name = lcfirst($name);
15A String Example Program
- #!/usr/local/bin/perl
- $var1 = "larry";
- $var2 = "moe";
- $var3 = "shemp";
- print ucfirst($var1); # Prints 'Larry'
- print uc($var2); # Prints 'MOE'
- print lcfirst(uc($var3)); # Prints 'sHEMP'
16Variable Interpolation
- Perl looks for variables inside strings and replaces them with their value
- $stooge = "Larry";
- print "$stooge is one of the three stooges.\n";
- Produces the output
- Larry is one of the three stooges.
- This does not happen when you use single quotes
- print '$stooge is one of the three stooges.\n';
- Produces the output
- $stooge is one of the three stooges.\n
17Character Interpolation
- List of character escapes that are recognized when using double quoted strings
- \n newline
- \t tab
- \r carriage return
- Common Example
- print "Hello\n"; # prints Hello and then a newline
18Numbers and Strings are Interchangeable
- If a scalar variable looks like a number and Perl needs a number, it will use it as a number
- $a = 42; # a number
- print $a + 18; # prints 60
- $b = "50"; # looks like a string, but ...
- print $b - 10; # will print 40!
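Which interpretation Perl picks depends on the operator, not on the variable. A short sketch, assuming a hypothetical variable $count, contrasts numeric subtraction with string concatenation on the same value:

```perl
use strict;
use warnings;

my $count = "50";          # a string that looks like a number
my $diff  = $count - 10;   # numeric operator: treated as the number 50
my $text  = $count . 10;   # string operator: 10 is treated as the string "10"

print "$diff\n";
print "$text\n";
```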
19If ... else ... statements
- Similar to C/C++, except the scope braces are REQUIRED!!
- if ( $os eq "Linux" ) {
-     print "Sweet!\n";
- }
- elsif ( $os eq "Windows" ) {
-     print "Time to move to Linux, buddy!\n";
- }
- else {
-     print "Hmm...!\n";
- }
20Unless ... else Statements
- Unless statements are the opposite of if ... else statements.
- unless ($os eq "Linux") {
-     print "Time to move to Linux, buddy!\n";
- }
- else {
-     print "Sweet!\n";
- }
- And again remember the braces are required!
21While Loop
- While loop: similar to C/C++, but again the braces are required!!
- Example
- $i = 0;
- while ( $i < 1000 ) {
-     print "$i\n";
-     $i++;
- }
22Until Loop
- The until loop evaluates its body repeatedly until a specific condition is met.
- Example
- $i = 0;
- until ($i == 1000) {
-     print "$i\n";
-     $i++;
- }
23For Loops
- Like C/C++
- Example
- for ( $i = 0; $i <= 1000; $i++ ) {
-     print "$i\n";
- }
- Another way to create a for loop
- Example
- for $i (0..1000) {
-     print "$i\n";
- }
24Moving around in a Loop
- Where you would use continue in C, use next.
- Where you would use break in C, use last.
- What is the output for the following code snippet?
- for ( $i = 0; $i < 10; $i++ ) {
-     if ($i == 1 || $i == 3) { next; }
-     if ($i == 5) { last; }
-     print "$i\n";
- }
25Answer
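Tracing the snippet by hand: 1 and 3 are skipped by next, and the loop stops at 5 because of last, so only 0, 2 and 4 are printed. The sketch below reproduces this, assuming the operators lost from the slide were <, == and ||. The @printed array is a hypothetical helper added only to record what reaches print.

```perl
use strict;
use warnings;

my @printed;                       # collects the values that reach print
for (my $i = 0; $i < 10; $i++) {
    next if $i == 1 || $i == 3;    # skip 1 and 3, continue with the next value
    last if $i == 5;               # leave the loop entirely at 5
    push @printed, $i;
    print "$i\n";
}
```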
26Arrays
- Array variable is denoted by the @ symbol
- @array = ( "Larry", "Curly", "Moe" );
- To access the whole array, use the whole array
- print "@array"; # prints Larry Curly Moe
- Notice that you do not need to loop through the whole array to print it. Perl does this for you
27Arrays cont
- To access one element of the array, use $
- Why? Because every element in the array is scalar
- print "$array[0]\n"; # prints Larry
- Question
- What happens if we access $array[3]?
- Answer: Nothing (the element is undefined; with -w, printing it produces a warning)
28Arrays cont ...
- To find the index of the last element in the array
- print $#array; # prints 2 in the previous example
- To find the number of elements in the array
- $array_size = @array;
- $array_size now has 3 in the above example because there are 3 elements in the array
29Sorting Arrays
- Perl has a built in sort function
- Two ways to sort
- Default: sorts in standard string comparison order
- sort LIST
- Usersub: create your own subroutine that returns an integer less than, equal to or greater than 0
- sort USERSUB LIST
- The <=> and cmp operators make creating sorting subroutines very easy
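The two ways of sorting give different results on numbers, because the default order compares the values as strings. A small sketch, using an inline comparison block in place of a named subroutine:

```perl
use strict;
use warnings;

my @numbers = (3, 10, 76, 23, 1, 54);

my @as_strings = sort @numbers;                 # default: string comparison
my @as_numbers = sort { $a <=> $b } @numbers;   # numeric comparison

print "@as_strings\n";   # 1 10 23 3 54 76
print "@as_numbers\n";   # 1 3 10 23 54 76
```

In the default order "10" and "23" come before "3", since only the leading characters are compared first.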
30Numerical Sorting Example
- #!/usr/local/bin/perl -w
- @unsortedArray = (3, 10, 76, 23, 1, 54);
- @sortedArray = sort numeric @unsortedArray;
- print "@unsortedArray\n"; # prints 3 10 76 23 1 54
- print "@sortedArray\n"; # prints 1 3 10 23 54 76
- sub numeric {
-     $a <=> $b;
- }
31String Sorting Example
- #!/usr/local/bin/perl -w
- @unsortedArray = ("Larry", "Curly", "moe");
- @sortedArray = sort { lc($a) cmp lc($b) } @unsortedArray;
- print "@unsortedArray\n"; # prints Larry Curly moe
- print "@sortedArray\n"; # prints Curly Larry moe
32Foreach
- Foreach allows you to iterate over an array
- Example
- foreach $element (@array) {
-     print "$element\n";
- }
- This is similar to
- for ($i = 0; $i <= $#array; $i++) {
-     print "$array[$i]\n";
- }
33Sorting with Foreach
- The sort function sorts the array and returns the list in sorted order.
- Example
- @array = ( "Larry", "Curly", "Moe" );
- foreach $element (sort @array) {
-     print "$element ";
- }
- Prints the elements in sorted order
- Curly Larry Moe
34Strings to Arrays split
- Split a string into words and put into an array
- @array = split( / /, "Larry Curly Moe" );
- # creates the same array as we saw previously
- Split into characters
- @stooge = split( //, "curly" );
- # array @stooge has 5 elements: c, u, r, l, y
35Split cont..
- Split on any character
- @array = split( /:/, "10:20:30:40" );
- # array has 4 elements: 10, 20, 30, 40
- Split on Multiple White Space
- @array = split( /\s+/, "this   is a    test" );
- # array has 4 elements: this, is, a, test
- More on \s later
36Arrays to Strings
- Array to space separated string
- @array = ("Larry", "Curly", "Moe");
- $string = join( " ", @array );
- # $string is "Larry Curly Moe"
- Array of characters to string
- @stooge = ("c", "u", "r", "l", "y");
- $string = join( "", @stooge );
- # $string is "curly"
37Joining Arrays cont
- Join with any character you want
- @array = ( 10, 20, 30, 40 );
- $string = join( ":", @array );
- # $string is "10:20:30:40"
- Join with multiple characters
- @array = ( 10, 20, 30, 40 );
- $string = join( "->", @array );
- # $string is "10->20->30->40"
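split and join are inverses of each other, which makes them convenient for re-delimiting a record. A minimal sketch, assuming ":" as the separator (the separator character did not survive in the slide text, so this is a guess) and hypothetical variable names:

```perl
use strict;
use warnings;

my $record = "10:20:30:40";
my @fields = split /:/, $record;    # ("10", "20", "30", "40")
my $joined = join "->", @fields;    # "10->20->30->40"

# \s+ splits on runs of white space rather than on each single space
my @words = split /\s+/, "this   is a    test";

print scalar(@fields), " fields: $joined\n";
print scalar(@words), " words\n";
```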
38Arrays as Stacks and Lists
- To append to the end of an array
- @array = ( "Larry", "Curly", "Moe" );
- push (@array, "Shemp");
- print $array[3]; # prints Shemp
- To remove the last element of the array (LIFO)
- $element = pop @array;
- print $element; # prints Shemp
- @array now has the original elements
- ("Larry", "Curly", "Moe")
39Arrays as Stacks and Lists
- To prepend to the beginning of an array
- @array = ( "Larry", "Curly", "Moe" );
- unshift @array, "Shemp";
- print $array[3]; # prints Moe
- print $array[0]; # prints Shemp
- To remove the first element of the array
- $element = shift @array;
- print $element; # prints Shemp
- The array now contains only
- "Larry", "Curly", "Moe"
40Hashes
- Hashes are like arrays: they store collections of scalars
- ... but unlike arrays, indexing is by name
- Two components to each hash entry
- Key, for example: name
- Value, for example: phone number
- Hashes are denoted with %
- Example: %phoneDirectory
- Elements are accessed using {} (like [] in arrays)
41Hashes continued ...
- Adding a new key-value pair
- $phoneDirectory{"Shirly"} = 7267975;
- Note the $ to specify scalar context!
- Each key can have only one value
- $phoneDirectory{"Shirly"} = 7265797;
- # overwrites previous assignment
- Multiple keys can have the same value
- Accessing the value of a key
- $phoneNumber = $phoneDirectory{"Shirly"};
42Hashes and Foreach
- Foreach works in hashes as well!
- foreach $person (keys %phoneDirectory) {
-     print "$person: $phoneDirectory{$person}\n";
- }
- Never depend on the order you put key/values in the hash! Perl has its own magic to make hashes amazingly fast!!
43Hashes and Sorting
- The sort function works with hashes as well
- Sorting on the keys
- foreach $person (sort keys %phoneDirectory) {
-     print "$person: $phoneDirectory{$person}\n";
- }
- This will print the phoneDirectory hash table in alphabetical order based on the name of the person, i.e. the key.
44Hash and Sorting cont...
- Sorting by value
- foreach $person (sort { $phoneDirectory{$a} <=> $phoneDirectory{$b} } keys %phoneDirectory) {
-     print "$person: $phoneDirectory{$person}\n";
- }
- Prints the person and their phone number in the order of their respective phone numbers, i.e. the value.
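Sorting by value works by sorting the keys while comparing the values they map to. The sketch below uses made-up directory entries (only Shirly's number appears in the slides) to show the resulting order:

```perl
use strict;
use warnings;

# Hypothetical directory entries, for illustration only
my %phoneDirectory = (
    Shirly => 7267975,
    Larry  => 5551234,
    Moe    => 9990000,
);

# Sort the keys by comparing the values they map to
my @by_number = sort { $phoneDirectory{$a} <=> $phoneDirectory{$b} }
                keys %phoneDirectory;

foreach my $person (@by_number) {
    print "$person: $phoneDirectory{$person}\n";
}
```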
45A Quick Program using Hashes
- Count the number of Republicans in an array
- %seen = (); # initialize hash to empty
- @politArray = ( "R", "R", "D", "I", "D", "R", "G" );
- foreach $politician (@politArray) {
-     $seen{$politician}++;
- }
- print "Number of Republicans: $seen{'R'}\n";
46Slightly more advanced program
- Count the number of parties represented, and by how much!
- %seen = (); # initialize hash to empty
- @politArray = ( "R", "R", "D", "I", "D", "R", "G" );
- foreach $politician (@politArray) {
-     $seen{$politician}++;
- }
- foreach $party (keys %seen) {
-     print "Party: $party. Num reps: $seen{$party}\n";
- }
47Command Line Arguments
- Command line arguments in Perl are extremely easy.
- @ARGV is the array that holds all arguments passed in from the command line.
- Example
- ./prog.pl arg1 arg2 arg3
- @ARGV would contain ('arg1', 'arg2', 'arg3')
- $#ARGV returns the index of the last command line argument, which is one less than the number of arguments passed.
- Remember: $#array is the index of the last element, not the size of the array!
48Quick Program with _at_ARGV
- Simple program called log.pl that takes in a number and prints the log base 2 of that number
- #!/usr/local/bin/perl -w
- $log = log($ARGV[0]) / log(2);
- print "The log base 2 of $ARGV[0] is $log.\n";
- Run the program as follows
- ./log.pl 8
- This will return the following
- The log base 2 of 8 is 3.
49Another Example Program
- You want to print the binary form of an integer
- #!/usr/local/bin/perl -w
- foreach $integer (@ARGV) {
-     # converts the integer to a 32 bit binary number
-     @binary = split(//, unpack("B32", pack("N", $integer)));
-     # Store the last 4 elements of @binary into @bits
-     @bits = @binary[28..$#binary];
-     # Print the integer and its binary form
-     print "$integer: @bits\n";
- }
50File Handles
- Very simple compared to C/C++!!!
- Are not prefixed with a symbol ($, @, %, etc.)
- Opening a File
- open (SRC, "my_file.txt");
- Reading from a File
- $line = <SRC>; # reads up to a newline character
- Closing a File
- close (SRC);
51File Handles cont...
- Opening a file for output
- open (DST, ">my_file.txt");
- Opening a file for appending
- open (DST, ">>my_file.txt");
- Writing to a file
- print DST "Printing my first line.\n";
- Safeguarding against opening a non existent file
- open (SRC, "file.txt") || die "Could not open file.\n";
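The slides use the two-argument form of open with bareword handles. A commonly recommended alternative, available since Perl 5.6, is the three-argument form with a lexical handle and $! in the error message. The sketch below is one way to write it, not the deck's own method, and the file name is hypothetical.

```perl
use strict;
use warnings;

my $path = "example_output.txt";   # hypothetical file name

# Write one line, then read it back
open(my $out, '>', $path) or die "Could not open $path for writing: $!\n";
print $out "Printing my first line.\n";
close($out);

open(my $in, '<', $path) or die "Could not open $path for reading: $!\n";
my $line = <$in>;
close($in);
chomp $line;

print "Read back: $line\n";
unlink $path;                      # clean up the example file
```

Keeping the mode separate from the file name avoids surprises when a name happens to begin with > or <.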
52File Test Operators
- Check to see if a file exists
- if ( -e "file.txt" ) {
-     # The file exists!
- }
- Other file test operators
- -r readable
- -x executable
- -d is a directory
- -T is a text file
53Quick Program with File Handles
- Program to copy a file to a destination file
- #!/usr/local/bin/perl -w
- open(SRC, "file.txt") || die "Could not open source file.\n";
- open(DST, ">newfile.txt");
- while ( $line = <SRC> ) {
-     print DST $line;
- }
- close SRC;
- close DST;
54Some Default File Handles
- STDIN: Standard Input
- $line = <STDIN>; # takes input from stdin
- STDOUT: Standard Output
- print STDOUT "File handling in Perl is sweet!\n";
- STDERR: Standard Error
- print STDERR "Error!!\n";
55The <> File Handle
- The empty file handle takes the command line file(s) or STDIN
- $line = <>;
- If program is run ./prog.pl file.txt, this will automatically open file.txt and read the first line.
- If program is run ./prog.pl file1.txt file2.txt, this will first read in file1.txt and then file2.txt ... you will not know when one ends and the other begins.
56The <> File Handle cont...
- If program is run ./prog.pl, the program will wait for you to enter text at the prompt, and will continue until you enter the EOF character
- CTRL-D in UNIX
57Example Program with STDIN
- Suppose you want to determine if you are one of the three stooges
- #!/usr/local/bin/perl
- %stooges = ( larry => 1, moe => 1, curly => 1 );
- print "Enter your name? ";
- $name = <STDIN>; chomp $name;
- if ($stooges{lc($name)}) {
-     print "You are one of the Three Stooges!!\n";
- } else {
-     print "Sorry, you are not a Stooge!!\n";
- }
58Chomp and Chop
- Chomp: function that deletes a trailing newline from the end of a string.
- $line = "this is the first line of text\n";
- chomp $line; # removes the new line character
- print $line; # prints "this is the first line of text" without returning
- Chop: function that chops off the last character of a string.
- $line = "this is the first line of text";
- chop $line;
- print $line; # prints "this is the first line of tex"
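The practical difference is that chomp is safe to call on a string that has no trailing newline, while chop always removes a character. A short sketch with hypothetical variable names:

```perl
use strict;
use warnings;

my $with_newline    = "first line\n";
my $without_newline = "first line";

chomp $with_newline;      # removes the trailing "\n"
chomp $without_newline;   # nothing to remove, string unchanged

my $chopped = "first line";
chop $chopped;            # always removes the last character

print "[$with_newline] [$without_newline] [$chopped]\n";
```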
59$_
- Perl default scalar value that is used when a variable is not explicitly specified.
- Can be used in
- For Loops
- File Handling
- Regular Expressions (discussed later)
60$_ and For Loops
- Example using $_ in a for loop
- @array = ( "Perl", "C", "Java" );
- for (@array) {
-     print $_ . " is a language I know.\n";
- }
- Output
- Perl is a language I know.
- C is a language I know.
- Java is a language I know.
61$_ and File Handles
- Example in using $_ when reading in a file
- while ( <> ) {
-     chomp $_; # remove the newline char
-     @array = split / /, $_; # split the line on white space and store the data in an array
- }
- Note
- The line read in from the file is automatically stored in the default scalar variable $_
62$_ and File Handling cont..
- Another example similar to the previous example
- while (<>) {
-     chomp; # removes trailing newline chars
-     @array = split / /; # splits the line on white space and stores the data in the array
- }
- Notes
- The functions chomp and split automatically perform their respective operations on $_.
63Example Program
- Count the number of words in a text and display the top 10 most frequent words.
- #!/usr/local/bin/perl
- %vocab = (); $counter = 0;
- while (<>) {
-     chomp;
-     foreach $element (split / /) { $vocab{$element}++; }
- }
- foreach $word (sort { $vocab{$b} <=> $vocab{$a} } keys %vocab) {
-     print "$word $vocab{$word}\n";
-     $counter++;
-     if ($counter == 10) { last; }
- }
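The same counting idea can be packaged as a subroutine so it can be tried on a few lines without an input file. This is a sketch under assumptions: top_words is a hypothetical helper, and the alphabetical tie-break is an addition that keeps the result stable when counts are equal.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the $limit most frequent words in @lines
sub top_words {
    my ($limit, @lines) = @_;
    my %vocab;
    foreach my $line (@lines) {
        # skip the empty field that leading white space would produce
        $vocab{$_}++ foreach grep { length } split /\s+/, $line;
    }
    # highest count first; ties broken alphabetically
    my @sorted = sort { $vocab{$b} <=> $vocab{$a} || $a cmp $b } keys %vocab;
    $limit = @sorted if $limit > @sorted;
    return map { "$_ $vocab{$_}" } @sorted[0 .. $limit - 1];
}

my @top = top_words(2, "the cat and the hat", "the cat sat");
print "$_\n" foreach @top;
```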
64Regular Expressions
- What are Regular Expressions .. a few definitions.
- Specifies a class of strings that belong to the formal / regular languages defined by regular expressions
- In other words, a formula for matching strings that follow a specified pattern.
- Some things you can do with regular expressions
- Parse the text
- Add and/or replace subsections of text
- Remove pieces of the text
65Regular Expressions cont..
- A regular expression characterizes a regular language
- Examples in UNIX (shell wildcards, a closely related pattern notation)
- ls *.c
- Lists all the files in the current directory that are postfixed '.c'
- ls *.txt
- Lists all the files in the current directory that are postfixed '.txt'
66Simple Example for ... Clarity
- In the simplest form, a regular expression is a string of characters that you are looking for
- We want to find all the words that contain the string 'ing' in our text.
- The regular expression we would use
- /ing/
67Simple Example cont...
- What would our program then look like?
- #!/usr/local/bin/perl
- while (<>) {
-     chomp;
-     @words = split / /, $_;
-     foreach $word (@words) {
-         if ($word =~ m/ing/) { print "$word\n"; }
-     }
- }
68Regular Expressions Types
- Regular expressions are composed of two types of characters
- Literals
- Normal text characters
- Like what we saw in the previous program ( /ing/ )
- Metacharacters
- Special characters
- Add a great deal of flexibility to your search
69Metacharacters
- Match more than just characters
- Match line position
- ^ start of a line ( caret )
- $ end of a line ( dollar sign )
- Match any characters in a list: [ ... ]
- Example
- /[Bb]ridget/ matches Bridget or bridget
- /Mc[Ii]nnes/ matches McInnes or Mcinnes
70Our Simple Example Revisited
- Now suppose we only want to match words that end in 'ing' rather than just contain 'ing'.
- How would we change our regular expression to accomplish this?
- Previous Regular Expression
- $word =~ m/ing/
- New Regular Expression
- $word =~ m/ing$/
71Ranges of Regular Expressions
- Ranges can be specified in Regular Expressions
- Valid Ranges
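- [A-Z] Upper Case Roman Alphabet
- [a-z] Lower Case Roman Alphabet
- [A-Za-z] Upper or Lower Case Roman Alphabet
- [A-F] Upper Case A through F Roman Characters
- Invalid or Misleading Ranges
- [a-Z] Not Valid (a comes after Z in ASCII)
- [A-z] Accepted by Perl, but it also matches the punctuation characters between Z and a in ASCII; use [A-Za-z] instead
- [F-A] Not Valid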
72Ranges cont ...
- Ranges of Digits can also be specified
- [0-9] Valid
- [9-0] Invalid
- Negating Ranges
- /[^0-9]/
- Match anything except a digit
- /[^a]/
- Match anything except an a
- /^[^A-Z]/
- Match any line that starts with something other than an upper case letter
- First ^: start of line
- Second ^: negation
73Our Simple Example Again
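- Now suppose we want to create a list of all the words in our text that do not end in 'ing'
- How would we change our regular expression to accomplish this?
- Previous Regular Expression
- $word =~ m/ing$/
- New Regular Expression
- $word !~ m/ing$/
- Note: a negated class such as m/[^ing]$/ is not equivalent; it only rejects words whose last character is i, n or g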
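One way to select the words that do not end in 'ing' is to negate the whole match with !~. The sketch below, with a made-up word list, contrasts that with a negated character class, which only looks at the last character:

```perl
use strict;
use warnings;

my @words = qw(running ring sang king jumped);

my @not_ing     = grep { $_ !~ /ing$/ }    @words;   # negate the whole match
my @class_trick = grep { $_ =~ /[^ing]$/ } @words;   # only tests the last character

print "@not_ing\n";       # sang jumped
print "@class_trick\n";   # jumped
```

The word sang does not end in 'ing', but the negated class rejects it anyway because it ends in g.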
74Literal Metacharacters
- Suppose that you actually want to look for all strings that equal '^' in your text
- Use the \ symbol
- /\^/ Regular expression to search for ^
- What does the following Regular Expression match?
- /[A-Z^]\^/
- Matches any line that contains ( A-Z or ^ ) followed by ^
75Patterns provided in Perl
- Some Patterns
- \d [0-9]
- \w [a-zA-Z0-9_]
- \s [ \r\t\n\f] (white space pattern)
- \D [^0-9]
- \W [^a-zA-Z0-9_]
- \S [^ \r\t\n\f]
- Example: 19\d\d
- Looks for any year in the 1900's
76Using Patterns in our Example
- Commonly words are not separated by just a single space but by tabs, returns, etc...
- Let's modify our split function to incorporate multiple white space
- #!/usr/local/bin/perl
- while (<>) {
-     chomp;
-     @words = split /\s+/, $_;
-     foreach $word (@words) {
-         if ($word =~ m/ing/) { print "$word\n"; }
-     }
- }
77Word Boundary Metacharacter
- Regular expression to match the start or the end of a 'word': \b
- Examples
- /Jeff\b/ Match Jeff but not Jefferson
- /Carol\b/ Match Carol but not Caroline
- /Rollin\b/ Match Rollin but not Rolling
- /\bform/ Match form or formation but not information
- /\bform\b/ Match form but neither information nor formation
78DOT Metacharacter
- The DOT metacharacter, '.', symbolizes any character except a new line
- /b.bble/
- Would possibly return bobble, babble, bubble
- /.oat/
- Would possibly return boat, coat, goat
- Note: remember '.*' usually means a bunch of anything; this can be handy but also can have hidden ramifications.
79PIPE Metacharacter
- The PIPE metacharacter, |, is used for alternation
- /Bridget (Thomson|McInnes)/
- Match Bridget Thomson or Bridget McInnes but NOT Bridget Thomson McInnes
- /B|bridget/
- Match B or bridget
- /^(B|b)ridget/
- Match Bridget or bridget at the beginning of a line
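Alternation applies to everything on either side of the bar unless parentheses limit it. A small sketch with an illustrative list of names shows the difference between the ungrouped and grouped forms:

```perl
use strict;
use warnings;

my @names = ("Bridget", "bridget", "Bob");

# Without grouping, the alternatives are "B" and "bridget"
my @loose = grep { /B|bridget/ } @names;

# With grouping and an anchor: B or b, followed by "ridget", at the start
my @tight = grep { /^(B|b)ridget/ } @names;

print "@loose\n";   # Bridget bridget Bob
print "@tight\n";   # Bridget bridget
```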
80Our Simple Example
- Now with our example, suppose that we want to get not only all words that end in 'ing' but also those that end in 'ed'.
- How would we change our regular expression to accomplish this?
- Previous Regular Expression
- $word =~ m/ing$/
- New Regular Expression
- $word =~ m/(ing|ed)$/
81The ? Metacharacter
- The metacharacter, ?, indicates that the character immediately preceding it occurs zero or one time
- Examples
- /worl?ds/
- Match either 'worlds' or 'words'
- /m?ethane/
- Match either 'methane' or 'ethane'
82The * Metacharacter
- The metacharacter, *, indicates that the character immediately preceding it occurs zero or more times
- Example
- /ab*c/ Match 'ac', 'abc', 'abbc', 'abbbc' etc...
- Matches any string that contains an a, possibly followed by a sequence of b's, and then a c.
- Sometimes called Kleene's star
83Our Simple Example again
- Now suppose we want to create a list of all the words in our text that end in 'ing' or 'ings'
- How would we change our regular expression to accomplish this?
- Previous Regular Expression
- $word =~ m/ing$/
- New Regular Expression
- $word =~ m/ings?$/
84Modifying Text
- Match
- Up to this point, we have only attempted to match a given regular expression
- Example: $variable =~ m/regex/
- Substitution
- Takes match one step further: if there is a match, then replace it with the given string
- Example: $variable =~ s/regex/replacement/
- $var =~ s/Thomson/McInnes/;
- $var =~ s/Bridgette/Bridget/;
85Substitution Example
- Suppose when we find all our words that end in 'ing' we want to replace the 'ing' with 'ed'.
- #!/usr/local/bin/perl -w
- while (<>) {
-     chomp $_;
-     @words = split /\s+/, $_;
-     foreach $word (@words) {
-         if ($word =~ s/ing$/ed/) { print "$word\n"; }
-     }
- }
86Special Variables Modified by a Match
- $&
- Copy of text matched by the regex
- $`
- A copy of the target text in front of the match
- $'
- A copy of the target text after the match
- $1, $2, $3, etc.
- The text matched by 1st, 2nd, etc., set of parentheses. Note: $0 is not included here
- $+
- A copy of the highest numbered $1, $2, $3, etc.
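A short sketch shows what the match variables hold after one successful match. The target string and the pattern are chosen only for illustration, and the values are copied into ordinary variables right away because the next match would overwrite them.

```perl
use strict;
use warnings;

my $text = "Bridget Thomson McInnes";
my ($before, $matched, $after, $first, $second);

if ($text =~ /(Thom)(son)/) {
    ($before, $matched, $after) = ($`, $&, $');   # text before, inside and after the match
    ($first, $second)           = ($1, $2);       # the two parenthesized groups
}

print "before=[$before] match=[$matched] after=[$after] 1=[$first] 2=[$second]\n";
```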
87Our Simple Example once again
- Now let's revise our program to find the words that end in 'ing' without splitting our line of text into an array of words
- #!/usr/local/bin/perl -w
- while (<>) {
-     chomp $_;
-     if ($_ =~ /([A-Za-z]*ing\b)/) { print "$&\n"; }
- }
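A single match reports only the first word ending in 'ing' on each line. To collect every such word, one option is the /g modifier in list context. This is a sketch of that variation, not part of the original program, and the sample line is made up.

```perl
use strict;
use warnings;

my $line = "singing and dancing in the morning";

# In list context, /g returns the captured text of every match on the line
my @ing_words = $line =~ /\b([A-Za-z]*ing)\b/g;

print "$_\n" foreach @ing_words;
```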
88Thank you