Title: Introduction to Perl Part II
1Introduction to Perl Part II
- By Bridget Thomson McInnes
- 22 January 2004
2File Handles
- Very simple compared to C/C++!!!
- Are not prefixed with a symbol ($, @, %, etc.)
- Opening a File
- open(SRC, "my_file.txt");
- Reading from a File
- $line = <SRC>;   # reads up to a newline character
- Closing a File
- close(SRC);
3File Handles cont...
- Opening a file for output
- open(DST, ">my_file.txt");
- Opening a file for appending
- open(DST, ">>my_file.txt");
- Writing to a file
- print DST "Printing my first line.\n";
- Safeguarding against opening a non-existent file
- open(SRC, "file.txt") || die "Could not open file.\n";
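The three open modes above can be tried in one short sketch. It uses the three-argument form of `open` with lexical file handles rather than the bareword handles on the slide, which is a substitution made here; `write_and_append` and the second line of text are hypothetical, and the file is a temporary one created with `File::Temp`.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# write_and_append() is a hypothetical helper, not part of the slides.
# It writes one line, appends a second, then reads both back.
sub write_and_append {
    my ($path) = @_;

    # '>' creates the file, or truncates it if it already exists.
    open(my $out, '>', $path) or die "Could not open $path for writing: $!\n";
    print $out "Printing my first line.\n";
    close($out);

    # '>>' keeps the existing contents and adds to the end.
    open(my $app, '>>', $path) or die "Could not open $path for appending: $!\n";
    print $app "Appending a second line.\n";
    close($app);

    # '<' opens for reading; in list context <$in> returns every line.
    open(my $in, '<', $path) or die "Could not open $path for reading: $!\n";
    my @lines = <$in>;
    close($in);
    return @lines;
}

my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
close($tmp_fh);
print write_and_append($tmp_path);
```

Including `$!` in the `die` message reports why the open failed, for example a missing file or a permission problem.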
4File Test Operators
- Check to see if a file exists
- if (-e "file.txt") {
-   # The file exists!
- }
- Other file test operators
- -r readable
- -x executable
- -d is a directory
- -T is a text file
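As a rough illustration, the file test operators listed above can be combined in one helper. `describe_path` and its labels are hypothetical names chosen for this sketch.

```perl
use strict;
use warnings;

# describe_path() is a hypothetical helper: it returns a label for each
# file test operator that succeeds on the given path.
sub describe_path {
    my ($path) = @_;
    return ('missing') unless -e $path;      # -e: the path exists
    my @facts;
    push @facts, 'readable'   if -r $path;   # -r: readable by this user
    push @facts, 'executable' if -x $path;   # -x: executable by this user
    push @facts, 'directory'  if -d $path;   # -d: is a directory
    push @facts, 'text'       if -T $path;   # -T: looks like a text file
    return @facts;
}

print join(', ', describe_path('.')), "\n";
```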
5Quick Program with File Handles
- Program to copy a file to a destination file
- #!/usr/local/bin/perl -w
- open(SRC, "file.txt") || die "Could not open source file.\n";
- open(DST, ">newfile.txt");
- while ( $line = <SRC> ) {
-   print DST $line;
- }
- close SRC;
- close DST;
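A variant of the copy program, sketched as a reusable subroutine. Unlike the slide, it checks both `open` calls and uses lexical file handles. `copy_file` and the demo file names are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# copy_file() is a hypothetical name. It copies $src to $dst line by line
# and returns the number of lines copied.
sub copy_file {
    my ($src, $dst) = @_;
    open(my $in,  '<', $src) or die "Could not open source file $src: $!\n";
    open(my $out, '>', $dst) or die "Could not open destination file $dst: $!\n";
    my $count = 0;
    while (my $line = <$in>) {
        print $out $line;
        $count++;
    }
    close($in);
    close($out) or die "Could not finish writing $dst: $!\n";
    return $count;
}

# Demo in a temporary directory so no real files are touched.
my $demo_dir = tempdir(CLEANUP => 1);
open(my $seed, '>', "$demo_dir/file.txt") or die "Could not create demo file: $!\n";
print $seed "line one\nline two\n";
close($seed);
print copy_file("$demo_dir/file.txt", "$demo_dir/newfile.txt"), " lines copied\n";
```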
6Some Default File Handles
- STDIN: Standard Input
- $line = <STDIN>;   # takes input from stdin
- STDOUT: Standard Output
- print STDOUT "File handling in Perl is sweet!\n";
- STDERR: Standard Error
- print STDERR "Error!!\n";
7The <> File Handle
- The empty file handle takes the command line file(s) or STDIN
- $line = <>;
- If the program is run as ./prog.pl file.txt, this will automatically open file.txt and read the first line.
- If the program is run as ./prog.pl file1.txt file2.txt, this will first read in file1.txt and then file2.txt ... you will not know when one ends and the other begins.
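If the boundary between files matters, Perl does expose it: `$ARGV` holds the name of the file `<>` is currently reading, and `eof` without parentheses is true at the end of each file. The sketch below assumes a hypothetical helper, `count_lines_per_file`, which localizes `@ARGV` so that `<>` can be exercised without real command line arguments.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# count_lines_per_file() is a hypothetical helper. It points <> at the
# given files by localizing @ARGV and counts the lines read from each one.
sub count_lines_per_file {
    my (@files) = @_;
    local @ARGV = @files;
    my %count;
    while (my $line = <>) {
        $count{$ARGV}++;               # $ARGV: name of the file being read
        if (eof) {                     # eof without parentheses: end of this file
            print "finished $ARGV\n";
        }
    }
    return %count;
}

# Demo with two small temporary files.
my $arg_dir = tempdir(CLEANUP => 1);
my @demo_files = ("$arg_dir/file1.txt", "$arg_dir/file2.txt");
foreach my $file (@demo_files) {
    open(my $fh, '>', $file) or die "Could not create $file: $!\n";
    print $fh "first\nsecond\n";
    close($fh);
}
my %lines = count_lines_per_file(@demo_files);
print "$_: $lines{$_} lines\n" foreach sort keys %lines;
```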
8The <> File Handle cont...
- If the program is run as ./prog.pl, the program will wait for you to enter text at the prompt, and will continue until you enter the EOF character
- CTRL-D in UNIX
9Example Program with STDIN
- Suppose you want to determine if you are one of the Three Stooges
- #!/usr/local/bin/perl
- %stooges = (larry => 1, moe => 1, curly => 1);
- print "Enter your name? ";
- $name = <STDIN>; chomp $name;
- if ($stooges{lc($name)}) {
-   print "You are one of the Three Stooges!!\n";
- } else {
-   print "Sorry, you are not a Stooge!!\n";
- }
10Chomp and Chop
- Chomp: function that deletes a trailing newline from the end of a string.
- $line = "this is the first line of text\n";
- chomp $line;   # removes the newline character
- print $line;   # prints "this is the first line of text" without moving to a new line
- Chop: function that chops off the last character of a string, whatever it is.
- $line = "this is the first line of text";
- chop $line;
- print $line;   # prints "this is the first line of tex"
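The difference shows most clearly when the string has no trailing newline: `chomp` leaves it alone, while `chop` still removes a character. A minimal sketch, with `chomped` and `chopped` as hypothetical wrappers that work on a copy of their argument:

```perl
use strict;
use warnings;

# chomped() and chopped() are hypothetical wrappers. Each modifies a copy
# of its argument and returns the result.
sub chomped { my ($s) = @_; chomp $s; return $s; }
sub chopped { my ($s) = @_; chop $s;  return $s; }

foreach my $line ("this is the first line of text\n",
                  "this is the first line of text") {
    print "chomp: [", chomped($line), "]\n";
    print "chop:  [", chopped($line), "]\n";
}
```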
11Regular Expressions
- What are Regular Expressions? A few definitions.
- Specifies a class of strings that belong to the formal / regular languages defined by regular expressions
- In other words, a formula for matching strings that follow a specified pattern.
- Some things you can do with regular expressions
- Parse the text
- Add and/or replace subsections of text
- Remove pieces of the text
12Regular Expressions cont..
- A regular expression characterizes a regular language
- Examples in UNIX (shell wildcard patterns)
- ls *.c
- Lists all the files in the current directory whose names end in '.c'
- ls *.txt
- Lists all the files in the current directory whose names end in '.txt'
13Simple Example for Clarity
- In the simplest form, a regular expression is a string of characters that you are looking for
- We want to find all the words that contain the string 'ing' in our text.
- The regular expression we would use
- /ing/
14Simple Example cont...
- What would our program then look like?
- #!/usr/local/bin/perl
- while(<>) {
-   chomp;
-   @words = split / /;
-   foreach $word (@words) {
-     if ($word =~ m/ing/) { print "$word\n"; }
-   }
- }
15Regular Expressions Types
- Regular expressions are composed of two types of characters
- Literals
- Normal text characters
- Like what we saw in the previous program ( /ing/ )
- Metacharacters
- Special characters
- Add a great deal of flexibility to your search
16Metacharacters
- Match more than just characters
- Match line position
- ^ start of a line ( caret )
- $ end of a line ( dollar sign )
- Match any characters in a list: [ ... ]
- Example
- /[Bb]ridget/ matches Bridget or bridget
- /Mc[Ii]nnes/ matches McInnes or Mcinnes
17Our Simple Example Revisited
- Now suppose we only want to match words that end in 'ing' rather than just contain 'ing'.
- How would we change our regular expression to accomplish this?
- Previous Regular Expression
- $word =~ m/ing/
- New Regular Expression
- $word =~ m/ing$/
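The effect of anchoring the pattern to the end of the word can be checked with a short sketch. `words_matching` and the sample sentence are hypothetical, chosen so that some words contain 'ing' without ending in it.

```perl
use strict;
use warnings;

# words_matching() is a hypothetical helper: it splits the text on white
# space and returns the words that match the given regular expression.
sub words_matching {
    my ($regex, $text) = @_;
    return grep { $_ =~ $regex } split ' ', $text;
}

my $sample = "the singer kept singing to the kingdom";
print "contain 'ing': ", join(' ', words_matching(qr/ing/,  $sample)), "\n";
print "end in 'ing':  ", join(' ', words_matching(qr/ing$/, $sample)), "\n";
```

Here 'singer' and 'kingdom' pass the first pattern but not the second, because the `$` requires 'ing' to sit at the end of the word.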
18Ranges of Regular Expressions
- Ranges can be specified in Regular Expressions
- Valid Ranges
- [A-Z] Upper Case Roman Alphabet
- [a-z] Lower Case Roman Alphabet
- [A-Za-z] Upper or Lower Case Roman Alphabet
- [A-F] Upper Case A through F Roman Characters
- [A-z] Valid but be careful: in ASCII it also matches [ \ ] ^ _ and `
- Invalid Ranges
- [a-Z] Not Valid
- [F-A] Not Valid
19Ranges cont ...
- Ranges of Digits can also be specified
- [0-9] Valid
- [9-0] Invalid
- Negating Ranges
- /[^0-9]/
- Match anything except a digit
- /[^a]/
- Match anything except an a
- /^[^A-Z]/
- Match anything that starts with a character other than an upper case letter
- First ^: start of line
- Second ^: negation
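A small sketch of ranges and negated ranges in action. `classify` and its labels are hypothetical; each label corresponds to one character class from the slides.

```perl
use strict;
use warnings;

# classify() is a hypothetical helper: it reports which of three
# character-class patterns match the given string.
sub classify {
    my ($s) = @_;
    my @facts;
    push @facts, 'has a letter'           if $s =~ /[A-Za-z]/;  # range
    push @facts, 'has a non-digit'        if $s =~ /[^0-9]/;    # negated range
    push @facts, 'starts with non-upper'  if $s =~ /^[^A-Z]/;   # anchor + negation
    return @facts;
}

foreach my $s ('Perl', 'perl5', '2004') {
    print "$s: ", join(', ', classify($s)), "\n";
}
```

Note that '2004' still matches `/^[^A-Z]/`: a digit is a character other than an upper case letter.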
20Our Simple Example Again
- Now suppose we want to create a list of all the words in our text that do not end in 'ing'
- How would we change our regular expression to accomplish this?
- Previous Regular Expression
- $word =~ m/ing$/
- New Regular Expression
- $word !~ m/ing$/
21Literal Metacharacters
- Suppose that you actually want to look for all strings that equal '^' in your text
- Use the \ symbol
- /\^/ Regular expression to search for ^
- What does the following Regular Expression match?
- /[A-Z^]\^/
- Matches any line that contains ( A-Z or ^ ) followed by ^
22Patterns provided in Perl
- Some Patterns
- \d [0-9]
- \w [a-zA-Z0-9_]
- \s [ \r\t\n\f] (white space pattern)
- \D [^0-9]
- \W [^a-zA-Z0-9_]
- \S [^ \r\t\n\f]
- Example: /19\d\d/
- Looks for any year in the 1900's
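As a sketch of the year example, the pattern below adds `\b` word boundaries on both sides so that four digits inside a longer number are not reported; that addition, the helper name `years_in_1900s`, and the sample text are all assumptions made for illustration.

```perl
use strict;
use warnings;

# years_in_1900s() is a hypothetical helper. In list context, a match
# with /g and one set of parentheses returns every captured substring.
sub years_in_1900s {
    my ($text) = @_;
    return $text =~ /\b(19\d\d)\b/g;
}

my $note = "Born in 1969, moved in 1987 and again in 2004; id 219999.";
print join(', ', years_in_1900s($note)), "\n";
```

Without the boundaries, the plain `/19\d\d/` from the slide would also match the '1999' inside '219999'.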
23Using Patterns in our Example
- Commonly words are not separated by just a single space but by tabs, returns, etc.
- Let's modify our split function to incorporate multiple white space
- #!/usr/local/bin/perl
- while(<>) {
-   chomp;
-   @words = split /\s+/, $_;
-   foreach $word (@words) {
-     if ($word =~ m/ing/) { print "$word\n"; }
-   }
- }
24Word Boundary Metacharacter
- Regular Expression to match the start or the end of a 'word': \b
- Examples
- /Jeff\b/ Match Jeff but not Jefferson
- /Carol\b/ Match Carol but not Caroline
- /Rollin\b/ Match Rollin but not Rolling
- /\bform/ Match form or formation but not Information
- /\bform\b/ Match form but neither information nor formation
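The word boundary examples can be verified directly. This sketch assumes a hypothetical helper, `matches`, that returns 1 or 0 so the results print cleanly.

```perl
use strict;
use warnings;

# matches() is a hypothetical helper returning 1 if $word matches $regex.
sub matches {
    my ($regex, $word) = @_;
    return $word =~ $regex ? 1 : 0;
}

foreach my $word (qw(form formation information)) {
    printf "%-12s /\\bform/: %d   /\\bform\\b/: %d\n",
        $word, matches(qr/\bform/, $word), matches(qr/\bform\b/, $word);
}
```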
25DOT Metacharacter
- The DOT Metacharacter, '.', symbolizes any character except a new line
- /b.bble/
- Would possibly return bobble, babble, bubble
- /.oat/
- Would possibly return boat, coat, goat
- Note: remember '.' matches any single character; this can be handy but can also have hidden ramifications.
26PIPE Metacharacter
- The PIPE Metacharacter is used for alternation
- /Bridget (Thomson|McInnes)/
- Match Bridget Thomson or Bridget McInnes but NOT Bridget Thomson McInnes
- /B|bridget/
- Match B or bridget
- /^(B|b)ridget/
- Match Bridget or bridget at the beginning of a line
27Our Simple Example
- Now with our example, suppose that we want to get not only all words that end in 'ing' but also those that end in 'ed'.
- How would we change our regular expression to accomplish this?
- Previous Regular Expression
- $word =~ m/ing$/
- New Regular Expression
- $word =~ m/(ing|ed)$/
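A quick check of the alternation, including why the parentheses matter. `ends_in_ing_or_ed` is a hypothetical helper name and the word list is made up for the demonstration.

```perl
use strict;
use warnings;

# ends_in_ing_or_ed() is a hypothetical helper. The parentheses group the
# alternatives so that the $ anchor applies to both of them.
sub ends_in_ing_or_ed {
    my ($word) = @_;
    return $word =~ /(ing|ed)$/ ? 1 : 0;
}

foreach my $word (qw(walking walked redo edit singer)) {
    print "$word: ", ends_in_ing_or_ed($word), "\n";
}
```

Without the grouping, `/ing|ed$/` would mean "contains 'ing' anywhere, or ends in 'ed'", so 'singer' would match.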
28The ? Metacharacter
- The metacharacter, ?, indicates that the character immediately preceding it occurs zero or one time
- Examples
- /worl?ds/
- Match either 'worlds' or 'words'
- /m?ethane/
- Match either 'methane' or 'ethane'
29The * Metacharacter
- The metacharacter, *, indicates that the character immediately preceding it occurs zero or more times
- Example
- /ab*c/ Match 'ac', 'abc', 'abbc', 'abbbc', etc.
- Matches any string that contains an a, possibly followed by a sequence of b's, followed by a c.
- Sometimes called Kleene's star
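The two quantifiers can be compared side by side. In this sketch the patterns are wrapped in `^` and `$` so that the whole word must match, which is an addition to the slide's patterns; `whole_matches` is a hypothetical helper.

```perl
use strict;
use warnings;

# whole_matches() is a hypothetical helper: it returns the words from the
# list that match the given regular expression.
sub whole_matches {
    my ($regex, @words) = @_;
    return grep { $_ =~ $regex } @words;
}

# * : the preceding character may occur zero or more times.
print join(' ', whole_matches(qr/^ab*c$/, qw(ac abc abbbc adc))), "\n";

# ? : the preceding character may occur zero or one time.
print join(' ', whole_matches(qr/^worl?ds$/, qw(words worlds worllds))), "\n";
```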
30Our Simple Example again
- Now suppose we want to create a list of all the words in our text that end in 'ing' or 'ings'
- How would we change our regular expression to accomplish this?
- Previous Regular Expression
- $word =~ m/ing$/
- New Regular Expression
- $word =~ m/ings?$/
31Modifying Text
- Match
- Up to this point, we have seen attempts to match a given regular expression
- Example: $variable =~ m/regex/
- Substitution
- Takes match one step further: if there is a match, then replace it with the given string
- Example: $variable =~ s/regex/replacement/
- $var =~ s/Thomson/McInnes/;
- $var =~ s/Bridgette/Bridget/;
32Substitution Example
- Suppose when we find all our words that end in 'ing' we want to replace the 'ing' with 'ed'.
- #!/usr/local/bin/perl -w
- while(<>) {
-   chomp $_;
-   @words = split /\s+/, $_;
-   foreach $word (@words) {
-     if ($word =~ s/ing$/ed/) { print "$word\n"; }
-   }
- }
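The substitution returns true only when it actually changed something, which is what lets it double as the test in the `if`. A minimal sketch, with `ing_to_ed` as a hypothetical helper that works on one line of text:

```perl
use strict;
use warnings;

# ing_to_ed() is a hypothetical helper. It returns only the words that
# ended in 'ing', with that ending replaced by 'ed'.
sub ing_to_ed {
    my ($line) = @_;
    my @changed;
    foreach my $word (split ' ', $line) {
        # s/// returns true only if a replacement was made.
        push @changed, $word if $word =~ s/ing$/ed/;
    }
    return @changed;
}

print join(' ', ing_to_ed("walking and talking in the kingdom")), "\n";
```

The rule is purely textual, so a word like 'running' would become 'runned'.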
33Special Variable Modified by a Match
- $&
- A copy of the text matched by the regex
- $`
- A copy of the target text in front of the match
- $'
- A copy of the target text after the match
- $1, $2, $3, etc.
- The text matched by the 1st, 2nd, etc., set of parentheses. Note $0 is not included here
- $+
- A copy of the highest numbered $1, $2, $3, etc.
34Our Simple Example once again
- Now let's revise our program to find all the words that end in 'ing' without splitting our line of text into an array of words
- #!/usr/local/bin/perl -w
- while(<>) {
-   chomp $_;
-   if ($_ =~ /([A-Za-z]*ing\b)/) { print "$&\n"; }
- }
35Example
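- #!/usr/local/bin/perl
- $exp = <STDIN>; chomp $exp;
- if ($exp =~ /([A-Za-z\s]*)\bcrave\b([\sA-Za-z]*)/) {
-   print "$1\n";
-   print "$2\n";
- }
- Run the program with the string: I crave to rule the world!
- Results
- I
- to rule the world (the '!' is not in the character class, so it is not captured)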
36Example
- #!/usr/local/bin/perl
- $exp = <STDIN>; chomp $exp;
- if ($exp =~ /\bcrave\b/) {
-   print "$`\n"; print "$&\n"; print "$'\n";
- }
- Run the program with the string: I crave to rule the world!
- Results
- I
- crave
- to rule the world!
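Both examples can be reproduced without typing at a prompt. In this sketch `split_around` and `capture_around` are hypothetical helpers, and the character classes in the second pattern are one plausible choice rather than a confirmed copy of the slide.

```perl
use strict;
use warnings;

# split_around() is a hypothetical helper: on a successful match it
# returns the text before the match, the match itself, and the text after.
sub split_around {
    my ($text, $regex) = @_;
    return unless $text =~ $regex;
    return ($`, $&, $');
}

# capture_around() is a hypothetical helper that uses $1 and $2 instead.
sub capture_around {
    my ($text) = @_;
    return unless $text =~ /([A-Za-z\s]*)\bcrave\b([\sA-Za-z]*)/;
    return ($1, $2);
}

my $sentence = "I crave to rule the world!";
print "[$_]\n" foreach split_around($sentence, qr/\bcrave\b/);
print "[$_]\n" foreach capture_around($sentence);
```

The brackets in the output make the surrounding spaces visible, and show that `$'` keeps the final '!' while `$2` stops before it.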
37Thank you