Title: A Stroll through Perl
- (R. L. Schwartz and T. Christiansen, O'Reilly)
- PERL: Practical Extraction and Report Language.
- A major strength of Perl is recognising and substituting text that matches patterns called regular expressions. This is useful for:
  - Web searching: are the query keywords in this web page?
  - Computing frequencies in a document collection, e.g. to produce a stoplist, or to find mid-frequency terms for automatic indexing.
  - Making finite state transducers, e.g. a pluraliser, stemmer, or Americanizer.
  - Dialogue systems, e.g. ELIZA.
Hello World Program
    #!/usr/bin/perl -w
    print "Hello, world!\n";
- The first line (the "#!" line) tells a Unix system that this is a Perl program, to be run by /usr/bin/perl. The -w switch tells Perl to generate warning messages.
- Apart from the first line, which is a special comment, Perl statements end with a semicolon.
- To run a Perl program from Unix:
    perl programname.pl
- Comments: anything from the hash sign (#) to the end of the line is a comment.
Scalar Variables
- Now get the Hello, world program to call you by your name. To do this, we need a place to hold the name, a way to ask for the name, and a way to get a response.
- One place to hold values (like a name) is a scalar variable. Here we will use the scalar variable $name to hold your name. A scalar variable starts with $ and can hold either a single number or a string (a sequence of characters).
print, <STDIN>, chomp
- The program needs to ask for the name (prompt), so we use the print function.
- The way to get a line from the terminal is with the <STDIN> construct, which grabs one line of input. We assign this input to the $name variable. This gives us the program:
    print "What is your name? ";
    $name = <STDIN>;
- The value of $name has a terminating newline (\n). To get rid of that, we use the chomp function:
    chomp($name);
- Now we can reply with:
    print "Hello, $name!\n";
- (What does this do?)
Putting it all together, we get
    #!/usr/bin/perl -w
    print "What is your name? ";
    $name = <STDIN>;
    chomp($name);
    print "Hello, $name!\n";
Adding Choices
- Let's say we have a special greeting for Randal, but we want an ordinary greeting for anyone else. To do this, we need to compare the name that was entered with the string "Randal", and if it's the same, do something special. Let's add a C-like if-then-else branch and a comparison to the program:
    #!/usr/bin/perl -w
    print "What is your name? ";
    $name = <STDIN>;
    chomp($name);
    if ($name eq "Randal") {
        print "Hello Sir Randal!\n";
    } else {
        print "Hello, $name!\n";
    }
Guessing the Secret Password
- What does this code do?
    #!/usr/bin/perl -w
    $secretword = "llama";    # the secret word
    print "What is the secret password? ";
    $guess = <STDIN>;
    chomp($guess);
    while ($guess ne $secretword) {
        print "Wrong, try again\n";
        $guess = <STDIN>;
        chomp($guess);
    }
- First, we define the secret word by putting it into another scalar variable, $secretword. The person is asked (using print) for a guess, which goes into $guess. The guess is compared with the secret word using the ne operator, which returns true if the strings are not equal (the logical opposite of the eq operator). The result of the comparison controls a while loop, which executes the block as long as the ne comparison remains true.
Arrays
- We can store several secret words in a sort of list, a data structure called an array. Each element of the array is a separate scalar variable that can be independently set or accessed. The entire array can also be given a value in one fell swoop. We can assign a value to the entire array named @words so that it contains three possible good passwords:
    @words = ("camel", "llama", "alpaca");
  or
    @words = qw(camel llama alpaca);
- Note: array names begin with @, while scalar variables begin with $.
- Once the array is assigned, we can access each element using a subscript reference. So $words[0] is "camel", $words[1] is "llama", and $words[2] is "alpaca". The subscript can be an expression as well, so if we set $i = 2, then $words[$i] is "alpaca".
- Note: array elements start with $ rather than @ because they refer to a single element of an array rather than the whole array.
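- A small self-contained sketch of the array operations above. It is written with use strict and my-declared variables, which the slides' programs do not use. The element count scalar(@words) and the last index $#words are standard Perl but do not appear on the slide, and "vicuna" is an arbitrary replacement value.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The three passwords from the slide, stored in one array.
my @words = qw(camel llama alpaca);

my $i = 2;
print "$words[0]\n";           # first element: camel
print "$words[$i]\n";          # subscript computed from an expression: alpaca

# Two built-ins not shown on the slide:
print scalar(@words), "\n";    # number of elements: 3
print "$#words\n";             # index of the last element: 2

# Assigning to one element changes only that element.
$words[1] = "vicuna";
print "@words\n";              # whole array interpolated, space-separated: camel vicuna alpaca
```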
More than one Secret Word
    #!/usr/bin/perl -w
    @words = qw(camel llama alpaca);
    print "What is the secret password? ";
    $guess = <STDIN>;
    chomp($guess);
    $i = 0;
    $correct = "maybe";
    while ($correct eq "maybe") {
        if ($words[$i] eq $guess) {        # is this word the guess?
            $correct = "yes";
        } elsif ($i < 2) {                 # more words left to check?
            $i = $i + 1;
        } else {                           # no more words: the guess is wrong
            print "Wrong, try again. ";
            $guess = <STDIN>;
            chomp($guess);
            $i = 0;                        # start again from the first word
        }
    }
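- The loop above hard-codes $i < 2, so it only works for exactly three words. Here is a sketch of the same check packaged as a subroutine (is_secret is a hypothetical name, not from the slides). It visits every element with foreach, so the array can have any length.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @words = qw(camel llama alpaca);

# Return 1 if $guess equals any element of @candidates, else 0.
sub is_secret {
    my ($guess, @candidates) = @_;
    foreach my $word (@candidates) {
        return 1 if $word eq $guess;     # stop as soon as a match is found
    }
    return 0;                            # checked every word, none matched
}

foreach my $guess (qw(llama horse)) {
    my $verdict = is_secret($guess, @words) ? "right" : "wrong";
    print "$guess: $verdict\n";
}
```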
Hashes
- Giving each person a different secret word
- The easiest way to store such a table in Perl is with a hash.
- Each element of the hash holds a separate scalar value (just like an array), but hash elements are referenced by a key, which can be any scalar value (string or number).
- To create a hash called %words (notice the % rather than @), we can write:
    %words = qw(
        fred   camel
        barney llama
        betty  alpaca
        wilma  alpaca
    );
- To find the secret word for Betty, we need to use betty as the key in a reference to the hash %words, via some expression such as:
    $words{"betty"}       # returns "alpaca"
  or
    $person = "betty";
    $words{$person}       # also returns "alpaca"
Trying to look up a word not in the hash
- When we look up someone's secret word, if their name is not one of the hash keys, the value of $secretword will be undef (undefined). In a string comparison undef behaves like the empty string, but under -w Perl warns about it, so it is cleaner to test with defined, e.g.
    # instantiate %words, get $name first, then:
    $secretword = $words{$name};
    if (!defined($secretword)) {
        print "secret word not found\n";
    } else {
        print "your secret word is $secretword\n";
    }
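- A sketch that separates two questions: "is this name a key?", answered by the exists function, and "did the lookup produce a value?", answered by defined. The helper name lookup and the extra name dino are illustrative assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %words = qw(
    fred   camel
    barney llama
    betty  alpaca
    wilma  alpaca
);

# Return the secret word for $name, or undef if $name is not a key.
sub lookup {
    my ($name) = @_;
    if (exists $words{$name}) {     # is $name one of the hash keys?
        return $words{$name};
    }
    return undef;                   # not found
}

for my $name (qw(betty dino)) {
    my $secretword = lookup($name);
    if (defined $secretword) {
        print "$name: your secret word is $secretword\n";
    } else {
        print "$name: secret word not found\n";
    }
}
```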
Handling Varying Input Formats
- How do we make our password checker accept "Randal", "randal", or "Randal L. Schwartz"?
    if ($name =~ /^Randal\b/i) {
        # yes, it matches
    } else {
        # no, it doesn't
    }
- Notes: eq is for exact equality; =~ is for pattern matching.
- The regular expression is delimited by forward slashes.
- /^Randal/ means any string starting with Randal (^ anchors the match to the start of the string).
- /^Randal\b/ means that Randal must not be followed by another word character (\b marks a word boundary), so Randall is excluded but "Randal L. Schwartz" still matches.
- /^Randal\b/i means that we ignore case, so randal is accepted.
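- A sketch that runs one anchored, case-insensitive pattern over a few sample names (the names are assumed examples). The ^ restricts matches to names that start with Randal, \b rejects Randall, and /i ignores case.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# 1 if $name starts with "randal" in any case and the word ends there.
sub is_randal {
    my ($name) = @_;
    return $name =~ /^randal\b/i ? 1 : 0;
}

for my $name ("Randal", "randal", "Randal L. Schwartz", "Randall", "Mr Randal") {
    printf "%-20s %s\n", $name, is_randal($name) ? "matches" : "no match";
}
```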
Two Text Converters
- We can write a case converter by using the translate operator, tr:
    $name =~ tr/A-Z/a-z/;
- The slashes delimit the searched-for and replacement character lists. The hyphen stands for all the characters between A and Z, so the two lists are the same length (26 characters).
- We can replace the word Eurasia with Eastasia using the substitution operator, s:
    $temp =~ s/Eastasia/XXXX/;
    $enemy =~ s/Eurasia/Eastasia/;
    $ally =~ s/XXXX/Eurasia/;
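- A sketch of both converters applied to concrete strings. It assumes the swap is meant to happen inside a single string, which is what the XXXX placeholder suggests. The sample sentence is invented, and XXXX is assumed not to occur in it already. The /g flag replaces every occurrence rather than only the first.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Case conversion with tr///: each character in A-Z is replaced by the
# character in the same position of a-z. The copy leaves $name unchanged.
my $name = "Randal L. Schwartz";
(my $lower = $name) =~ tr/A-Z/a-z/;
print "$lower\n";                    # randal l. schwartz

# Swapping two words in ONE string needs a placeholder: replacing Eurasia
# by Eastasia and then Eastasia by Eurasia would leave both reading Eurasia.
my $news = "Oceania is at war with Eurasia and allied with Eastasia.";
$news =~ s/Eastasia/XXXX/g;          # park the first word
$news =~ s/Eurasia/Eastasia/g;       # rename the second word
$news =~ s/XXXX/Eurasia/g;           # unpark the first word under its new name
print "$news\n";
```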
Making it Modular
- Perl provides subroutines that have parameters and return values. A subroutine is defined once in a program, and can be used repeatedly by being invoked from any expression.
- Let's create a subroutine called good_word that takes a name and a guessed word, and returns true if the word is correct and false if not:
    sub good_word {
        my($somename, $someguess) = @_;    # name the parameters
        if ($words{$somename} eq $someguess) {
            return 1;    # true
        } else {
            return 0;    # false
        }
    }
Subroutines
- First, the definition of a subroutine consists of the reserved word sub, followed by the subroutine name, followed by a block of code delimited by curly braces { }. The definition can go anywhere in the program file, though most people put it at the end.
- The first line within this particular definition is an assignment that copies the values of the two parameters of this subroutine into two private variables named $somename and $someguess.
- The my() defines the two variables as private to the enclosing block (in this case the whole subroutine), and the parameters are initially in a special local array called @_.
- A return statement can be used to make the subroutine immediately return to its caller with the supplied value.
- Note that the subroutine assumes that the value of the %words hash is set by the main program.
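- Because good_word reads the global %words, it cannot be reused with a different table. The sketch below is a variant that takes the table as a third parameter. That parameter is a hash reference (\%words), which the slides do not cover. The variant also uses exists, so an unknown name returns false without a warning.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Variant of good_word: the name/word table arrives as a hash reference.
sub good_word {
    my ($somename, $someguess, $table) = @_;     # name the parameters
    return 0 unless exists $table->{$somename};   # unknown name: false
    return $table->{$somename} eq $someguess ? 1 : 0;
}

my %words = qw(fred camel barney llama betty alpaca wilma alpaca);

foreach my $pair (["betty", "alpaca"], ["betty", "llama"], ["dino", "bone"]) {
    my ($name, $guess) = @$pair;
    my $ok = good_word($name, $guess, \%words);
    print "$name/$guess: $ok\n";
}
```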
Let's Integrate this with the Rest of the Program
    #!/usr/bin/perl
    %words = qw(
        fred   camel
        barney llama
        betty  alpaca
        wilma  alpaca
    );
    print "What is your name? ";
    $name = <STDIN>;
    chomp($name);
    print "What is the secret word? ";
    $guess = <STDIN>;
    chomp($guess);
    while (!good_word($name, $guess)) {
        print("Wrong, try again. ");
        $guess = <STDIN>;
        chomp($guess);
    }
    # insert definition of good_word here
While and !
- The condition of the while loop invokes the subroutine good_word, passing it two parameters, $name and $guess. Inside the subroutine, the value of $somename is set from the first parameter, $name, and the value of $someguess is set from the second parameter, $guess.
- The value returned by the subroutine (either 1 or 0) is logically inverted with the prefix ! (logical not) operator. This expression returns true if the expression following it is false, and returns false if the expression following it is true. The overall meaning is "while it's not a good word".
Moving the Secret Word List into a separate file
- Suppose we wanted to share the secret word list among three programs, e.g. for simultaneous updating. We can put the word list into a file and then read the file to get the word list into the program. To do this, we need to create an I/O channel called a filehandle. Your Perl program automatically gets three filehandles called STDIN, STDOUT and STDERR. Now we want another handle attached to a file of our own choice.
    sub init_words {
        open(WORDSLIST, "wordslist") || die "can't open wordslist: $!";
        while (defined($name = <WORDSLIST>)) {
            chomp($name);
            $word = <WORDSLIST>;
            chomp($word);
            $words{$name} = $word;
        }
        close(WORDSLIST) || die "couldn't close wordslist: $!";
    }
The (arbitrary) form of the word list
    fred
    camel
    barney
    llama
    betty
    alpaca
    wilma
    alpaca
- The open function initialises a filehandle named WORDSLIST by associating it with a file named wordslist in the current directory.
- while (defined($name = <WORDSLIST>)) means "while there are still lines in the data file to read": at end of file, <WORDSLIST> returns undef and the loop stops.
- The die function is frequently used to exit the program with an error message in case something goes wrong, e.g. the word list file is not found. $! contains the system error message explaining what went wrong.
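- A sketch of the same reading loop, rewritten to take a filehandle as a parameter and return the hash (read_words is a hypothetical name). To keep the example self-contained, it reads from a string opened as an in-memory file, which needs Perl 5.8 or later. With the real file, one would write open(my $fh, '<', 'wordslist') instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read alternating name/word lines from an open filehandle into a hash.
sub read_words {
    my ($fh) = @_;
    my %words;
    while (defined(my $name = <$fh>)) {
        chomp($name);
        my $word = <$fh>;
        last unless defined $word;   # odd number of lines: ignore the last name
        chomp($word);
        $words{$name} = $word;
    }
    return %words;
}

# An in-memory string stands in for the wordslist file.
my $data = "fred\ncamel\nbarney\nllama\nbetty\nalpaca\nwilma\nalpaca\n";
open(my $fh, '<', \$data) or die "can't open in-memory data: $!";
my %words = read_words($fh);
close($fh) or die "couldn't close in-memory data: $!";

foreach my $name (sort keys %words) {
    print "$name $words{$name}\n";
}
```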
Three More Loops
- 1. To print out scalar variables
- This example prints the numbers 1 to 10, each followed by a space:
    for ($i = 1; $i <= 10; $i++) {
        print "$i ";
    }
- The above code is very similar to C.
- 2. To print out the contents of an array
    foreach $item (@somelist) {
        print "$item\n";
    }
- The foreach statement takes a list of values and assigns them one at a time to a scalar variable, executing a block of code for each successive value.
- 3. To print out the contents of a hash
    foreach $key (keys(%freqhash)) {
        print "$key $freqhash{$key}\n";
    }
Regular Expressions
- See Chapter 7 of Learning Perl, by R. L. Schwartz and T. Christiansen, O'Reilly, 1993.
- A regular expression is a pattern to be matched against a string.
  - e.g. Is "put" found in "computer"? Succeeds.
  - Is "michael" found in "computer"? Fails.
- Sometimes match success or failure is all you are concerned about. Other times you want to match and replace.
  - e.g. Find "put" in "computer" and replace it with "pil". If the match is unsuccessful, nothing happens.
- $_ is Perl's default variable; we don't have to declare it.
Search, Substitution
- Print out every line in the file specified on the command line which contains abc:
    while (<>) {
        if (/abc/) {
            print $_;
        }
    }
- Substitution: if abc is found in $_, replace it with def (g means every time):
    s/abc/def/g;
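- A self-contained sketch of both operations. It assumes a short list of sample lines in place of the command-line file that while (<>) would read. It also shows that s///g, in scalar context, returns the number of replacements made.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample lines standing in for the file read by while (<>).
my @lines = ("abc at the start\n", "no match here\n", "xabcxabc twice\n");

my @kept;
foreach (@lines) {          # each line is placed in $_ in turn
    if (/abc/) {            # a bare pattern matches against $_
        push @kept, $_;
    }
}
print @kept;

my $text = "abc and abc and abc";
my $count = ($text =~ s/abc/def/g);    # number of substitutions made
print "$text ($count replacements)\n";
```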
Patterns
- A regular expression is a pattern. Some parts of the pattern match single characters, others match multiple characters.
- . stands for any single character except \n (newline).
- /a./ matches any two-character sequence that starts with a, except "a\n".
- /[abcde]/ matches a, b, c, d, or e (character class).
- /[a-zA-Z0-9_]/ matches a Perl word character.
- /[^0-9]/ matches any NON-digit (negated character class).
- Character class abbreviations:
  - \d digit
  - \D non-digit
  - \w Perl word character
  - \W not a Perl word character
  - \s whitespace character, such as space, \r, \t, \n or \f
- All of the above match one character. We now look at grouping patterns:
  - * zero or more of the immediately previous character or character class.
  - + one or more of the immediately previous character.
  - ? zero or one of the immediately previous character.
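Patterns are greedy by default
    $_ = "fred xxxxxx barney";
    s/x+/boom/;
- Now $_ is "fred boom barney": x+ matches the whole run of x's, not just the first one.
- /x{3}/ would match a run of exactly three x's (xxx), which may be part of a longer run.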
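- A sketch contrasting the greedy x+ with its non-greedy form x+?, a Perl 5 feature not shown on the slide. It also shows the exact count x{3}. All three are applied to the slide's string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $greedy = "fred xxxxxx barney";
$greedy =~ s/x+/boom/;      # x+ takes the whole run of six x's

my $lazy = "fred xxxxxx barney";
$lazy =~ s/x+?/boom/;       # x+? (non-greedy) stops after one x

my $three = "fred xxxxxx barney";
$three =~ s/x{3}/boom/;     # exactly three x's: the first three of the run

print "$greedy\n$lazy\n$three\n";
```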
Parentheses as memory, anchoring patterns, alternation
- Parentheses as memory (parentheses group part of a pattern, and the text they match is remembered in $1, $2, ...)
  - /abc*/ matches ab, abc, abcc, abccc, abcccc, etc.
  - /(abc)*/ matches "", abc, abcabc, abcabcabc, etc.
- Anchoring patterns
  - /fred\b/ matches fred and alfred but not frederick
  - /\bfred/ matches fred and frederick but not alfred
  - /\bfred\b/ matches fred but neither frederick nor alfred.
- Alternation
  - /(song|blue)bird/ matches songbird or bluebird
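- A sketch of the memory side of parentheses: after a successful match, $1 holds the text matched by the first parenthesised group. The sketch also checks the three anchored patterns against fred, alfred and frederick. Storing the results in %hits is just a convenience for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parentheses remember what they matched: $1 holds the first group.
my $bird = "the bluebird sang";
my $kind = "";
if ($bird =~ /(song|blue)bird/) {
    $kind = $1;
    print "kind: $kind\n";
}

# Which of the three anchored patterns match each name?
my %hits;
for my $name (qw(fred alfred frederick)) {
    my @h;
    push @h, 'fred\b'   if $name =~ /fred\b/;
    push @h, '\bfred'   if $name =~ /\bfred/;
    push @h, '\bfred\b' if $name =~ /\bfred\b/;
    $hits{$name} = "@h";
    print "$name: $hits{$name}\n";
}
```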
Selecting a different target (the =~ operator)
    $a = "hello world";
    if ($a =~ /he/) {
        # do something
    }
    $a =~ s/hello/goodbye/;
- Special read-only variables
    $_ = "this is a sample string";
    /sam.le/;    # matches "sample" within the string
  - $` is now "this is a "
  - $& is now "sample"
  - $' is now " string"
- More substitutions
    $_ = "this is a test";
    $new = "quiz";
    s/test/$new/;    # now $_ is "this is a quiz"
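- A sketch that copies the three special variables into ordinary variables right after the match, then interpolates a variable into a replacement. In Perls before 5.20, using $`, $& or $' anywhere in a program slowed down every regular-expression match. For that reason, capturing parentheses and $1 are often preferred.

```perl
#!/usr/bin/perl
use strict;
use warnings;

$_ = "this is a sample string";
my ($before, $match, $after) = ("", "", "");
if (/sam.le/) {                     # match against $_
    ($before, $match, $after) = ($`, $&, $');
}
print "[$before] [$match] [$after]\n";

# A variable interpolated into the replacement part of s///.
my $sentence = "this is a test";
my $new = "quiz";
$sentence =~ s/test/$new/;
print "$sentence\n";
```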
Basic Data Structures
- $scalar: a single value (number or string)
- @array: a list, e.g.
    @flintstones = qw(fred barney betty wilma);
    # $flintstones[2] is "betty"
    foreach $member (@flintstones) {
        print "flintstones $member\n";
    }
- %hash: e.g. a frequency list %freq built up by
    $freq{"the"} = 100;
    $freq{"chandelier"} = 1;
    $freq{"string"} = 5;
    foreach $key (keys(%freq)) {                       # once for each key of %freq
        print "$key was found $freq{$key} times\n";    # show key and value
    }
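- A sketch of building such a frequency list from text rather than by hand, which is the computation behind a stoplist. The sample sentence is invented. lc lowercases the text first, so that "The" and "the" would count as one word.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "the cat saw the dog and the dog saw the cat";
my %freq;
foreach my $word (split ' ', lc $text) {   # split on whitespace
    $freq{$word}++;                       # a new key starts as undef, counted as 0
}

foreach my $key (sort keys %freq) {       # sorted so the output order is predictable
    print "$key was found $freq{$key} times\n";
}
```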
Sorting arrays
    @x = qw(small medium large);
    @y = sort @x;
- Now @y is (large medium small).
    @x = (15, 27, 9, 49, 14);
    @y = sort @x;
- Now @y is (14, 15, 27, 49, 9), because sort compares elements as strings by default.
    @x = (15, 27, 9, 49, 14);
    @y = sort { $a <=> $b } @x;
- Now @y is (9, 14, 15, 27, 49): the <=> operator compares numerically.
Sorting hashes
- Sort by alphabetic order of keys, or numeric order of values:
    @sortedkeys = sort by_names keys(%freqhash);
    sub by_names {
        return $a cmp $b;
    }
    foreach (@sortedkeys) {
        print "$_ is found $freqhash{$_} times\n";
    }

    @sortedkeys = sort by_number keys(%freqhash);
    sub by_number {
        return $freqhash{$a} <=> $freqhash{$b};
    }
    foreach (@sortedkeys) {
        print "$_ is found $freqhash{$_} times\n";
    }
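- A sketch of a common refinement: the highest count comes first, and ties are broken alphabetically. It uses an inline sort block instead of a named subroutine such as by_number. The sample counts are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %freqhash = (the => 100, string => 5, chandelier => 1, cat => 5);

my @sortedkeys = sort {
    $freqhash{$b} <=> $freqhash{$a}    # $b before $a: larger counts first
        || $a cmp $b                   # equal counts: alphabetical by key
} keys %freqhash;

foreach (@sortedkeys) {
    print "$_ is found $freqhash{$_} times\n";
}
```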
Array of arrays (2D arrays)
    @AoA = (
        [ "fred", "barney" ],
        [ "george", "jane", "elroy" ],
        [ "homer", "marge", "bart" ],
    );
    print $AoA[2][1];    # prints marge
    for $x (0 .. 9) {
        for $y (0 .. 9) {
            $AoA[$x][$y] = $x * $y;
        }
    }
    while (<>) {                   # read in a line of text
        @tmp = split;              # split elements into a 1D array
        push @AoA, [ @tmp ];       # add (a reference to a copy of) the 1D array as the next row of the 2D array
    }
    for $i (0 .. $#AoA) {          # for each row in @AoA
        $row = $AoA[$i];           # the row, a reference to a 1D array (note: a $ scalar even so)
        for $j (0 .. $#{$row}) {   # for each element of that 1D array
            # ... process $row->[$j] here ...
        }
    }
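- A self-contained sketch of building a 2D array from lines of text, assuming a list of sample lines in place of the input read by while (<>). The square brackets in push @AoA, [ @tmp ] matter: without them, push would flatten the elements of @tmp into @AoA itself rather than adding one new row.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample lines standing in for input read with while (<>).
my @input = ("fred barney\n", "george jane elroy\n", "homer marge bart\n");

my @AoA;
foreach (@input) {
    my @tmp = split;           # split $_ on whitespace into a 1D array
    push @AoA, [ @tmp ];       # [ ... ] builds a new anonymous array: one row
}

my $who = $AoA[2][1];
print "$who\n";                # marge

for my $i (0 .. $#AoA) {                 # each row
    my @row = @{ $AoA[$i] };             # copy the row into a 1D array
    for my $j (0 .. $#row) {             # each element of that row
        print "element $i $j is $row[$j]\n";
    }
}
```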
Hashes of Hashes
    %HoH = (
        flintstones => {
            husband => "fred",
            pal     => "barney",
        },
        jetsons => {
            husband   => "george",
            wife      => "jane",
            "his boy" => "elroy",    # a key containing a space must be quoted
        },
        simpsons => {
            husband => "homer",
            wife    => "marge",
            kid     => "bart",
        },
    );
- To add another hash to the hash of hashes, you can simply say:
    $HoH{mash} = {
        # key => value pairs for the new family go here
    };
Populating a Hash of Hashes
- Here is one technique for populating a hash of hashes, reading from a file with the following format:
    flintstones: husband=fred pal=barney wife=wilma pet=dino
    while (<>) {
        next unless s/^(.*?):\s*//;    # look for characters from start of line to colon
        $who = $1;                     # $1 is the first parenthesised part of the reg exp
        for $field (split) {           # for each remaining whitespace-separated field
            ($key, $value) = split /=/, $field;    # cut each key=value pair at =
            $HoH{$who}{$key} = $value;
        }
    }
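- A sketch of the same parsing loop on two in-memory sample lines. The data is assumed and follows the "family: key=value ..." format. Each line is copied before s/// strips the family name, so the sample list itself is left unchanged.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @input = (
    "flintstones: husband=fred pal=barney wife=wilma pet=dino\n",
    "jetsons: husband=george wife=jane\n",
);

my %HoH;
foreach my $original (@input) {
    my $line = $original;                   # work on a copy
    next unless $line =~ s/^(.*?):\s*//;    # strip "family:"; $1 keeps the family name
    my $who = $1;
    for my $field (split ' ', $line) {      # e.g. "husband=fred"
        my ($key, $value) = split /=/, $field;
        $HoH{$who}{$key} = $value;
    }
}

print "$HoH{flintstones}{pet}\n";           # dino
```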
To set a key/value pair, and print out a hash of hashes
- You can set a key/value pair of a hash of hashes as follows:
    $HoH{flintstones}{wife} = "wilma";
- To print out the families, loop through all the keys of the outer hash and then through the keys of the inner hash:
    for $family (keys %HoH) {
        print "$family: ";
        for $role (keys %{ $HoH{$family} }) {
            print "$role=$HoH{$family}{$role} ";
        }
        print "\n";
    }
More advanced data structures
- Also possible: arrays of hashes, hashes of arrays, hashes of functions and more elaborate records. See Chapter 9 of Programming Perl by Larry Wall, Tom Christiansen and Jon Orwant, O'Reilly, 3rd edition.
ELIZA (1)
- Patterns and substitutions may use memory: \1 inside a pattern must match the same text that the first parenthesised group matched. For example,
    /the (.*)er they were, the \1er they will be/
  will match "the bigger they were, the bigger they will be" but not "the bigger they were, the faster they will be".
- Substitutions using memory are very useful in implementing a simple natural-language understanding program like ELIZA (Weizenbaum, 1966), which could carry on conversations like the following:
ELIZA (2)
- User: Men are all alike.
- ELIZA: IN WHAT WAY
- User: They're always bugging us about something or other.
- ELIZA: CAN YOU THINK OF A SPECIFIC EXAMPLE
- User: Well, my boyfriend made me come here.
- ELIZA: YOUR BOYFRIEND MADE YOU COME HERE
- User: He says I'm depressed much of the time.
- ELIZA: I AM SORRY TO HEAR THAT YOU ARE DEPRESSED.
- ELIZA works by searching the user's sentence for regular expressions and substituting them, e.g. s/my/YOUR/ and s/I'm/YOU ARE/, and then (in the replacement part, the remembered text is written $1):
    s/.* YOU ARE (depressed|sad) .*/I AM SORRY TO HEAR YOU ARE $1/
    s/.* always .*/CAN YOU THINK OF A SPECIFIC EXAMPLE/
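- A toy sketch in the spirit of ELIZA, not Weizenbaum's actual script. It rewrites pronouns, then tries a few response rules in order, falling back to IN WHAT WAY. The rule set, the echo rule and the fallback are assumptions. They were chosen so that the four user lines above get the replies shown, except that the "depressed" reply follows the slide's substitution and omits "THAT".

```perl
#!/usr/bin/perl
use strict;
use warnings;

sub eliza_reply {
    my ($s) = @_;
    chomp($s);

    # Step 1: first person becomes second person.
    $s =~ s/\bmy\b/YOUR/gi;
    $s =~ s/\bme\b/YOU/gi;
    $s =~ s/\bI'm\b/YOU ARE/gi;

    # Step 2: the first matching rule wins; $1 carries remembered text over.
    if ($s =~ /\bYOU ARE (depressed|sad)\b/i) {
        return "I AM SORRY TO HEAR YOU ARE " . uc($1);
    }
    if ($s =~ /\balways\b/i) {
        return "CAN YOU THINK OF A SPECIFIC EXAMPLE";
    }
    if ($s =~ /\b(YOUR\b.*?)[.!?]?$/) {   # from YOUR to the end, minus final punctuation
        return uc($1);                    # echo the rewritten sentence
    }
    return "IN WHAT WAY";                 # fallback when no rule matches
}

foreach my $line ("Men are all alike.",
                  "They're always bugging us about something or other.",
                  "Well, my boyfriend made me come here.",
                  "He says I'm depressed much of the time.") {
    print "User:  $line\n";
    print "ELIZA: ", eliza_reply($line), "\n";
}
```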