A Stroll through Perl - PowerPoint PPT Presentation

Provided by: osirisSun

1
A Stroll through Perl
  • (R. L. Schwartz & T. Christiansen, O'Reilly)
  • Perl: Practical Extraction and Report Language.
  • A major strength of Perl is recognising and
    substituting text sequences described by patterns
    called regular expressions.
  • This is useful for:
  • Web searching - are the query keywords in this
    web page?
  • Computation of frequencies in a document
    collection, e.g. to produce a stoplist, or
    mid-frequency terms for automatic indexing.
  • Making finite state transducers, e.g. pluraliser,
    stemmer, americanizer.
  • Dialogue systems, e.g. ELIZA.

2
Hello World Program
  • #!/usr/bin/perl -w
  • print "Hello, world!\n";
  • The first line means this is a Perl program. -w
    tells Perl to generate warning messages.
  • Apart from the first line, all Perl statements
    end with a semicolon.
  • To run a Perl program from UNIX:
  • perl programname.pl
  • # comments
  • Anything from the hash sign (#) to the end of the
    line is a comment.

3
Scalar Variables
  • Now get the "Hello, world" program to call you by
    your name. To do this, we need a place to hold
    the name, a way to ask for the name, and a way to
    get a response.
  • One place to hold values (like a name) is a
    scalar variable. Here we will use the scalar
    variable $name to hold your name. A scalar
    variable starts with $ and can hold either a
    single number or a string (sequence of
    characters).

4
print, <STDIN>, chomp
  • The program needs to ask for the name (prompt):
    use the print function.
  • The way to get a line from the terminal is with
    the <STDIN> construct, which grabs one line of
    input. We assign this input to the $name
    variable. This gives us the program:
  • print "What is your name? ";
  • $name = <STDIN>;
  • The value of $name has a terminating newline \n.
    To get rid of that, we use the chomp function:
  • chomp($name);
  • Now we can reply with
  • print "Hello, $name!\n";
  • (what does this do?)

5
Putting it all together we get
  • #!/usr/bin/perl -w
  • print "What is your name? ";
  • $name = <STDIN>;
  • chomp($name);
  • print "Hello, $name!\n";

6
Adding Choices
  • Let's say we have a special greeting for Randal,
    but we want an ordinary greeting for anyone else.
    To do this, we need to compare the name that was
    entered with the string Randal, and if it's the
    same, do something special. Let's add a C-like
    if-then-else branch and a comparison to the
    program:
  • #!/usr/bin/perl -w
  • print "What is your name? ";
  • $name = <STDIN>;
  • chomp($name);
  • if ($name eq "Randal") {
  •   print "Hello Sir Randal!\n";
  • } else {
  •   print "Hello, $name!\n";
  • }

7
Guessing the Secret Password
  • What does this code do?
  • #!/usr/bin/perl -w
  • $secretword = "llama"; # the secret word
  • print "What is the secret password? ";
  • $guess = <STDIN>;
  • chomp($guess);
  • while ($guess ne $secretword) {
  •   print "Wrong, try again\n";
  •   $guess = <STDIN>;
  •   chomp($guess);
  • }
  • First, we define the secret word by putting it
    into another scalar variable, $secretword. The
    person is asked (using print) for a guess, which
    goes into $guess. The guess is compared with the
    secret word using the ne operator, which returns
    true if the strings are not equal (this is the
    logical opposite of the eq operator). The result
    of the comparison controls a while loop, which
    executes the block as long as the ne comparison
    remains true.

8
Arrays
  • We can store several secret words in a sort of
    list, a data structure called an array. Each
    element of the array is a separate scalar
    variable that can be independently set or
    accessed. The entire array can also be given a
    value in one fell swoop. We can assign a value to
    the entire array named @words so that it contains
    three possible good passwords.
  • @words = ("camel", "llama", "alpaca");
  • or
  • @words = qw(camel llama alpaca);
  • Note: arrays begin with @, while scalar variables
    begin with $.
  • Once the array is assigned, we can access each
    element using a subscript reference. So $words[0]
    is camel, $words[1] is llama, and $words[2] is
    alpaca. The subscript can be an expression as
    well, so if we set $i = 2 then $words[$i] is
    alpaca.
  • Note: array elements start with $ rather than @
    because they refer to a single element of an
    array rather than the whole array.
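
The subscript rules above can be tried out in a few lines. This is a minimal sketch under `use strict`, so the variables are declared with `my`, which the slides do not do; the names `$first`, `$last` and `$count` are illustrative only.

```perl
use strict;
use warnings;

# Array assignment with qw, as on the slide.
my @words = qw(camel llama alpaca);

# A single element is a scalar, so it is written with $.
my $first = $words[0];

# The subscript can be any expression, here a variable.
my $i    = 2;
my $last = $words[$i];

# In scalar context an array gives its number of elements.
my $count = scalar(@words);

print "first=$first last=$last count=$count\n";
```

Because the last valid index is always one less than the count, a loop over the array can stop at `$#words`, which Perl provides for exactly that purpose.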

9
More than one Secret Word
  • #!/usr/bin/perl -w
  • @words = qw(camel llama alpaca);
  • print "What is the secret password? ";
  • $guess = <STDIN>;
  • chomp($guess);
  • $i = 0;
  • $correct = "maybe";
  • while ($correct eq "maybe") {
  •   if ($words[$i] eq $guess) {
  •     $correct = "yes";
  •   } elsif ($i < 2) {
  •     $i = $i + 1;
  •   } else {
  •     print "Wrong, try again: ";
  •     $guess = <STDIN>;
  •     chomp($guess);
  •     $i = 0;
  •   }
  • }

10
Hashes
  • Giving each person a different secret word
  • The easiest way to store such a table in Perl is
    with a hash.
  • Each element of the hash holds a separate scalar
    value (just like an array), but hash elements are
    referenced by a key, which can be any scalar
    value (string or number).
  • To create a hash called %words (notice the %
    rather than @) we can write
  • %words = qw(
  •   fred camel
  •   barney llama
  •   betty alpaca
  •   wilma alpaca
  • );
  • To find the secret word for Betty, we need to use
    betty as the key in a reference to the hash
    %words, via some expression such as
  • $words{"betty"} will return alpaca
  • or
  • $person = "betty";
  • $words{$person} will also return alpaca.

11
Trying to look up a word not in the hash
  • When we look up someone's secret word, if their
    name is not one of the hash keys, the value of
    $secretword will be undefined, which compares
    equal to an empty string, e.g.
  • # instantiate %words, get $name first, then
  • $secretword = $words{$name};
  • if ($secretword eq "") {
  •   print "secret word not found\n";
  • } else {
  •   print "your secret word is $secretword\n";
  • }
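
An alternative to comparing against the empty string is to ask whether the key is present at all, using Perl's built-in `exists`. The sketch below assumes the %words table from the Hashes slide; `lookup_word` is a hypothetical helper name, not something the slides define.

```perl
use strict;
use warnings;

my %words = qw(
    fred   camel
    barney llama
    betty  alpaca
    wilma  alpaca
);

# lookup_word is a hypothetical helper, not part of the slides.
sub lookup_word {
    my ($name) = @_;
    # exists tests for the key itself, so no undefined value is compared.
    return exists $words{$name}
        ? "your secret word is $words{$name}"
        : "secret word not found";
}

print lookup_word("betty"), "\n";
print lookup_word("dino"),  "\n";
```

Testing with `exists` also avoids the "uninitialized value" warning that -w would otherwise raise when the name is missing.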

12
Handling Varying Input Formats
  • How do we make our password checker accept
    Randal, randal, or
  • Randal L. Schwartz?
  • if ($name =~ /^Randal\b/i) {
  •   # yes, it matches
  • } else {
  •   # no, it doesn't
  • }
  • Notes: eq is for exact equality, =~ for pattern
    matching.
  • The regular expression is delimited by forward
    slashes.
  • /^Randal/ means any string starting with Randal.
  • /^Randal\b/ means there must be a word boundary
    after Randal, so Randall is excluded.
  • /^Randal\b/i means that we ignore case, so randal
    is accepted.
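
To see how the anchor, the word boundary and the /i flag interact, the pattern can be run against a handful of names. This is a small sketch; `is_randal` is a hypothetical wrapper added for illustration, and the test names other than those on the slide are made up.

```perl
use strict;
use warnings;

# is_randal is a hypothetical wrapper around the slide's pattern.
sub is_randal {
    my ($name) = @_;
    return $name =~ /^Randal\b/i ? 1 : 0;
}

for my $name ("Randal", "randal", "Randal L. Schwartz", "Randall", "Sir Randal") {
    printf "%-20s %s\n", $name, is_randal($name) ? "matches" : "no match";
}
```

"Randall" fails because there is no word boundary between the two l's, and "Sir Randal" fails because ^ requires the match to begin at the start of the string.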

13
Two Text Converters
  • We can write a case converter by using the
    translate operator.
  • $name =~ tr/A-Z/a-z/;
  • The slashes delimit the searched-for and
    replacement character lists. The hyphen stands
    for all the characters between A and Z, so the
    two lists are the same length (26 characters).
  • We can replace the word Eurasia with Eastasia
    using the substitution operator.
  • $temp =~ s/Eastasia/XXXX/;
  • $enemy =~ s/Eurasia/Eastasia/;
  • $ally =~ s/XXXX/Eurasia/;
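
The two converters can be exercised on sample strings. The sketch below is an assumption about how the three substitutions are meant to be used: it applies all of them to one string, with XXXX as the temporary placeholder, so that the two words end up swapped. The sample sentence is invented.

```perl
use strict;
use warnings;

# Case conversion with tr: each character in A-Z maps to its partner in a-z.
my $name = "Randal L. Schwartz";
(my $lower = $name) =~ tr/A-Z/a-z/;

# Swapping two words needs a placeholder (XXXX, as on the slide);
# without it the second substitution would undo the first.
my $text = "Eurasia is the enemy; Eastasia is the ally.";
$text =~ s/Eastasia/XXXX/g;
$text =~ s/Eurasia/Eastasia/g;
$text =~ s/XXXX/Eurasia/g;

print "$lower\n$text\n";
```

The /g flag is added here so that every occurrence is replaced; without it each substitution changes only the first match.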

14
Making it Modular
  • Perl provides subroutines that have parameters
    and return values. A subroutine is defined once
    in a program, and can be used repeatedly by being
    invoked from any expression.
  • Let's create a subroutine called good_word that
    takes a name and a guessed word, and returns true
    if the word is correct and false if not:
  • sub good_word {
  •   my($somename, $someguess) = @_; # name the parameters
  •   if ($words{$somename} eq $someguess) {
  •     return 1; # true
  •   } else {
  •     return 0; # false
  •   }
  • }

15
Subroutines
  • First, the definition of a subroutine consists of
    the reserved word sub followed by the subroutine
    name followed by a block of code delimited by
    curly braces {}. The definition can go anywhere in
    the program file, though most people put it at
    the end.
  • The first line within this particular definition
    is an assignment that copies the values of the
    two parameters of this subroutine into two local
    variables named $somename and $someguess.
  • my() defines the two variables as private to
    the enclosing block - in this case the whole
    subroutine - and the parameters are initially in
    a special local array called @_.
  • A return statement can be used to make the
    subroutine immediately return to its caller with
    the supplied value.
  • Note that the subroutine assumes that the value
    of the %words hash is set by the main program.
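
The points above can be checked with a self-contained version of good_word. This is a sketch rather than the slides' exact code: it runs under `use strict`, declares %words with `my`, and adds an `exists` test (an assumption on my part) so that an unknown name counts as a wrong guess without triggering a warning.

```perl
use strict;
use warnings;

# The hash is set by the "main program", as the slides assume.
my %words = qw(fred camel barney llama betty alpaca wilma alpaca);

sub good_word {
    my ($somename, $someguess) = @_;   # copy the parameters out of @_
    # An unknown name has no entry, so treat it as a wrong guess.
    return 0 unless exists $words{$somename};
    return $words{$somename} eq $someguess ? 1 : 0;
}

print good_word("betty", "alpaca") ? "ok\n" : "wrong\n";
print good_word("fred",  "llama")  ? "ok\n" : "wrong\n";
```

Because the parameters are copied into `my` variables, changing them inside the subroutine would not affect the caller's values.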

16
Let's Integrate this with the Rest of the Program
  • #!/usr/bin/perl
  • %words = qw(
  •   fred camel
  •   barney llama
  •   betty alpaca
  •   wilma alpaca
  • );
  • print "What is your name? ";
  • $name = <STDIN>;
  • chomp($name);
  • print "What is the secret word? ";
  • $guess = <STDIN>;
  • chomp($guess);
  • while (! good_word($name, $guess)) {
  •   print("Wrong, try again: ");
  •   $guess = <STDIN>;
  •   chomp($guess);
  • }
  • # insert definition of good_word here

17
While, !
  • The condition of the while loop calls the
    subroutine good_word. Here we see an invocation
    of the subroutine, passing it two parameters,
    $name and $guess. Inside the subroutine, the
    value of $somename is set from the first
    parameter, $name, and the value of $someguess is
    set from the second parameter, $guess.
  • The value returned by the subroutine (either 1 or
    0) is logically inverted with the prefix !
    (logical not) operator. This expression returns
    true if the expression following it is false, and
    returns false if the expression following it is
    true. The overall meaning is "while it's not a
    good word".

18
Moving the Secret Word List into a separate file
  • Suppose we wanted to share the secret word list
    among three programs, e.g. for simultaneous
    updating. We can put the word list into a file
    and then read the file to get the word list into
    the program. To do this, we need to create an I/O
    channel called a filehandle. Your Perl program
    automatically gets three filehandles called
    STDIN, STDOUT and STDERR. Now we want another
    handle attached to a file of our own choice.
  • sub init_words {
  •   open(WORDSLIST, "wordslist") || die "can't open wordlist: $!";
  •   while (defined($name = <WORDSLIST>)) {
  •     chomp($name);
  •     $word = <WORDSLIST>;
  •     chomp($word);
  •     $words{$name} = $word;
  •   }
  •   close(WORDSLIST) || die "couldn't close wordlist: $!";
  • }

19
The (arbitrary) form of the word list
  • fred
  • camel
  • barney
  • llama
  • betty
  • alpaca
  • wilma
  • alpaca
  • The open function initialises a filehandle named
    WORDSLIST by associating it with a file named
    wordslist in the current directory.
  • while (defined($name = <WORDSLIST>))
  • i.e. while there are still values in the data
    file to read
  • The die function is frequently used to exit the
    program with an error message in case something
    goes wrong, e.g. the word list file is not found.
    $! contains the system error message explaining
    what went wrong.
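
The reading loop can be tried without creating a file on disk. The sketch below is a variant of init_words, not the slides' version: it takes an already-open filehandle as a parameter (my assumption), and the data comes from an in-memory string opened with Perl's three-argument `open` on a scalar reference.

```perl
use strict;
use warnings;

my %words;

# Hypothetical variant of init_words that takes an open filehandle,
# so the same loop works for a real file or for the in-memory data below.
sub init_words {
    my ($fh) = @_;
    while (defined(my $name = <$fh>)) {
        chomp($name);
        my $word = <$fh>;
        last unless defined $word;     # odd number of lines: skip the last
        chomp($word);
        $words{$name} = $word;
    }
}

# Stand-in for the wordslist file: a name line, then a word line.
my $data = "fred\ncamel\nbarney\nllama\nbetty\nalpaca\nwilma\nalpaca\n";
open(my $fh, '<', \$data) or die "can't open in-memory wordlist: $!";
init_words($fh);
close($fh) or die "couldn't close wordlist: $!";

print "$_ => $words{$_}\n" for sort keys %words;
```

To read the real file instead, the `open` call would name it, for example `open(my $fh, '<', "wordslist")`, with the rest unchanged.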

20
Three More Loops
  • 1. To print out scalar variables
  • This example prints the numbers 1 to 10, each
    followed by a space:
  • for ($i = 1; $i <= 10; $i++) {
  •   print "$i ";
  • }
  • The above code is very similar to C.
  • 2. To print out the contents of an array
  • foreach $i (@somelist) {
  •   print "$i\n";
  • }
  • The foreach statement takes a list of values and
    assigns them one at a time to a scalar variable,
    executing a block of code with each successive
    assignment. Here $i holds the element itself, not
    its index.
  • 3. To print out the contents of a hash
  • foreach $key (keys(%freqhash)) {
  •   print "$key $freqhash{$key}\n";
  • }

21
Regular Expressions
  • See Chapter 7 of Learning Perl, by R. L. Schwartz
    & T. Christiansen, O'Reilly, 1993.
  • A regular expression is a pattern to be matched
    against a string.
  • e.g. Is "put" found in "computer"? Succeeds.
  • Is "michael" found in "computer"? Fails.
  • Sometimes match success or failure is all you are
    concerned about. Other times you want to match
    and replace.
  • e.g. Find "put" in "computer" and replace with
    "pil". If the match is unsuccessful, nothing
    happens.
  • $_ is Perl's default variable; we don't have to
    declare it.

22
Search, Substitution
  • Print out every line in the file specified on the
    command line which contains abc:
  • while (<>) {
  •   if (/abc/) {
  •     print $_;
  •   }
  • }
  • Substitution. If abc is found in $_, replace it
    with def (g means every time).
  • s/abc/def/g;

23
Patterns
  • A regular expression is a pattern. Some parts of
    the pattern match single characters, others match
    multiple characters.
  • . stands for any single character except \n
    (newline).
  • /a./ any two-character sequence that starts with
    a, other than a\n
  • /[abcde]/ matches a, b, c, d, or e (character
    class)
  • /[a-zA-Z0-9_]/ matches a Perl word character.
  • /[^0-9]/ any NON-digit (negated character
    class)
  • character class abbreviations
  • \d digit
  • \D non-digit
  • \w Perl word character
  • \W not a Perl word character
  • \s space character (\r, \t, \n, \f or a space)
  • All of the above match one character. We now look
    at grouping patterns:
  • * zero or more of the immediately previous
    character or character class.
  • + one or more of the immediately previous
    character.
  • ? zero or one of the immediately previous
    character.

24
Patterns are greedy by default
  • $_ = "fred xxxxxx barney";
  • s/x+/boom/;
  • now $_ is "fred boom barney"
  • /x{3}/ would mean match against exactly xxx.
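
Running the substitutions side by side shows what "greedy" means in practice. This is a small sketch; the third case uses the non-greedy form `+?`, which the slide does not mention and is included only for contrast.

```perl
use strict;
use warnings;

my $greedy = "fred xxxxxx barney";
$greedy =~ s/x+/boom/;        # x+ takes all six x's at once

my $three = "fred xxxxxx barney";
$three =~ s/x{3}/boom/;       # x{3} takes exactly three, leaving three behind

my $lazy = "fred xxxxxx barney";
$lazy =~ s/x+?/boom/;         # +? is the non-greedy form: a single x

print "$greedy\n$three\n$lazy\n";
```

Note that `s/x*/boom/` would behave differently again: `x*` can match zero characters, so it would succeed at the very start of the string, before "fred".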

25
Parentheses as memory, anchoring patterns,
alternation
  • Parentheses as memory
  • abc* matches ab, abc, abcc, abccc, abcccc etc.
  • (abc)* matches "", abc, abcabc, abcabcabc etc.
  • Anchoring patterns
  • /fred\b/ matches fred and alfred but not
    frederick
  • /\bfred/ matches fred and frederick but not
    alfred
  • /\bfred\b/ matches fred but not frederick or
    alfred.
  • Alternation
  • (song|blue)bird matches songbird or bluebird
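
The anchoring and alternation claims are easy to verify with `grep`, which keeps the list items for which a pattern matches. The sketch below uses the slide's words; "blackbird" is an extra test word added here, and the ^ and $ anchors on the bird pattern are my addition so that only whole words are accepted.

```perl
use strict;
use warnings;

my @names = qw(fred alfred frederick);

# Each grep keeps the names that the pattern matches.
my @ends_word   = grep { /fred\b/ }   @names;   # boundary after "fred"
my @starts_word = grep { /\bfred/ }   @names;   # boundary before "fred"
my @whole_word  = grep { /\bfred\b/ } @names;   # boundary on both sides

my @birds = grep { /^(song|blue)bird$/ } qw(songbird bluebird blackbird);

print "ends:   @ends_word\n";
print "starts: @starts_word\n";
print "whole:  @whole_word\n";
print "birds:  @birds\n";
```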

26
Selecting a different target (the =~ operator)
  • $a = "hello world";
  • if ($a =~ /he/) {
  •   # do something
  • }
  • $a =~ s/hello/goodbye/;
  • Special read-only variables
  • $_ = "this is a sample string";
  • /sam.le/; # matches sample within the string
  • $` is now "this is a "
  • $& is now "sample"
  • $' is now " string"
  • More substitutions
  • $_ = "this is a test";
  • $new = "quiz";
  • s/test/$new/; # now $_ is "this is a quiz"
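
The three read-only variables are set by the most recent successful match, so it is safest to copy them straight away. A minimal sketch, with `$before`, `$matched`, `$after` and `$text` as illustrative names:

```perl
use strict;
use warnings;

$_ = "this is a sample string";

my ($before, $matched, $after);
if (/sam.le/) {
    # Copy the match variables before another match overwrites them.
    ($before, $matched, $after) = ($`, $&, $');
}
print "[$before][$matched][$after]\n";

# A variable in the replacement part is interpolated like in a string.
my $new  = "quiz";
my $text = "this is a test";
$text =~ s/test/$new/;
print "$text\n";
```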

27
Basic Data Structures
  • $scalar - a single value (number or string)
  • @array - list, e.g.
  • @flintstones = qw(fred barney betty wilma);
  • $flintstones[2] is betty
  • foreach $member (@flintstones) {
  •   print "flintstones $member";
  • }
  • %hash, e.g. frequency list %freq built up by
  • $freq{"the"} = 100;
  • $freq{"chandelier"} = 1;
  • $freq{"string"} = 5;
  • foreach $key (keys(%freq)) { # once for each key of %freq
  •   print "$key was found $freq{$key} times\n"; # show key and value
  • }

28
Sorting arrays
  • @x = qw(small medium large);
  • @y = sort @x;
  • Now @y is (large, medium, small).
  • @x = (15, 27, 9, 49, 14);
  • @y = sort @x;
  • Now @y is (14, 15, 27, 49, 9).
  • @x = (15, 27, 9, 49, 14);
  • @y = sort { $a <=> $b } @x;
  • Now @y is (9, 14, 15, 27, 49).
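
The difference between the default string sort and a numeric sort can be seen by sorting the same list three ways. A short sketch; the descending sort is an addition not shown on the slide, obtained by swapping $a and $b.

```perl
use strict;
use warnings;

my @x = (15, 27, 9, 49, 14);

my @as_strings = sort @x;                 # default: compares as strings
my @ascending  = sort { $a <=> $b } @x;   # numeric, smallest first
my @descending = sort { $b <=> $a } @x;   # numeric, largest first

print "@as_strings\n@ascending\n@descending\n";
```

In the string sort 9 comes last because the character "9" sorts after "1", "2" and "4", regardless of the numbers' values.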

29
Sorting hashes
  • Sort by alphabetic order of keys, or numeric
    order of values
  • @sortedkeys = sort by_names keys(%freqhash);
  • sub by_names {
  •   return $a cmp $b;
  • }
  • foreach (@sortedkeys) {
  •   print "$_ is found $freqhash{$_} times\n";
  • }
  • @sortedkeys = sort by_number keys(%freqhash);
  • sub by_number {
  •   return $freqhash{$a} <=> $freqhash{$b};
  • }
  • foreach (@sortedkeys) {
  •   print "$_ is found $freqhash{$_} times\n";
  • }
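
For a frequency list it is common to want the most frequent words first. The sketch below reuses the three counts from the Basic Data Structures slide; `by_number_desc` is a hypothetical comparison routine, and its alphabetical tie-break is an added design choice rather than something from the slides.

```perl
use strict;
use warnings;

my %freqhash = (the => 100, chandelier => 1, string => 5);

sub by_names { return $a cmp $b; }

# Most frequent first; equal counts fall back to alphabetical order.
sub by_number_desc {
    return $freqhash{$b} <=> $freqhash{$a} || $a cmp $b;
}

my @by_name  = sort by_names keys %freqhash;
my @by_count = sort by_number_desc keys %freqhash;

print "$_ is found $freqhash{$_} times\n" for @by_count;
```

The comparison routine looks the counts up in the hash but sorts the keys, which is why the result is a list of words rather than a list of numbers.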

30
Array of arrays (2D arrays)
  • @AoA = (
  •   [ "fred", "barney" ],
  •   [ "george", "jane", "elroy" ],
  •   [ "homer", "marge", "bart" ],
  • );
  • print $AoA[2][1]; # prints marge
  • for $x (0 .. 9) {
  •   for $y (0 .. 9) {
  •     $AoA[$x][$y] = $x * $y;
  •   }
  • }
  • while (<>) { # read in a line of text
  •   @tmp = split; # split elements into a 1D array
  •   push @AoA, [ @tmp ]; # add 1D array as the next row of a 2D array
  • }
  • for $i (0 .. $#AoA) { # for each row in @AoA
  •   $row = $AoA[$i]; # put row of 2D array into $row - note the $ even so, because $row holds a reference to the row
  •   for $j (0 .. $#{$row}) { # for each element of that 1D array
  •     ...
  •   }
  • }
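
A complete pass over an array of arrays might look like the following sketch. The extra input line and the "elt" message format are assumptions made for illustration; the slide does not show what its inner loop does.

```perl
use strict;
use warnings;

my @AoA = (
    [ "fred",   "barney" ],
    [ "george", "jane",  "elroy" ],
    [ "homer",  "marge", "bart" ],
);

# Add a row: the square brackets build a new anonymous array (a reference).
my @tmp = split ' ', "alpha beta gamma";   # hypothetical input line
push @AoA, [ @tmp ];

my @lines;
for my $i (0 .. $#AoA) {
    my $row = $AoA[$i];                 # $row is a reference to row $i
    for my $j (0 .. $#{$row}) {         # last index of the referenced array
        push @lines, "elt $i $j is $row->[$j]";
    }
}
print "$_\n" for @lines;
```

Rows may have different lengths, which is why the inner loop asks each row for its own last index instead of using a fixed bound.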

31
Hashes of Hashes
  • %HoH = (
  •   flintstones => {
  •     husband => "fred",
  •     pal => "barney",
  •   },
  •   jetsons => {
  •     husband => "george",
  •     wife => "jane",
  •     "his boy" => "elroy",
  •   },
  •   simpsons => {
  •     husband => "homer",
  •     wife => "marge",
  •     kid => "bart",
  •   },
  • );
  • To add another hash to the hash of hashes, you
    can simply say
  • $HoH{mash} = { ... };

32
Populating a Hash of Hashes
  • Here is one technique for populating a hash of
    hashes. To read from a file with the following
    format:
  • flintstones: husband=fred pal=barney wife=wilma pet=dino
  • while ( <> ) {
  •   next unless s/^(.*?):\s*//;
  •   # look for characters from start of line to
    colon
  •   $who = $1;
  •   # $1 is first parenthesised part of reg exp
  •   for $field (split) {
  •     # for each remaining whitespace-separated field in the input line
  •     ($key, $value) = split /=/, $field;
  •     # cut each key=value pair at =
  •     $HoH{$who}{$key} = $value;
  •   }
  • }
33
To set a key/value pair, and print out a hash of
hashes
  • You can set a key/value pair of a hash of hashes
    as follows:
  • $HoH{flintstones}{wife} = "wilma";
  • To print out the families, loop through all the
    keys of the outer hash and then loop through
    the keys of the inner hash:
  • for $family ( keys %HoH ) {
  •   print "$family: ";
  •   for $role ( keys %{ $HoH{$family} } ) {
  •     print "$role=$HoH{$family}{$role} ";
  •   }
  •   print "\n";
  • }
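
Populating and printing can be combined into one runnable sketch. Assumptions: the input lines are held in an array instead of being read from a file, the second line is made up from the jetsons entries shown earlier, and the keys are sorted so that the output order is repeatable (plain `keys` returns them in no fixed order).

```perl
use strict;
use warnings;

my %HoH;
my @input = (
    "flintstones: husband=fred pal=barney wife=wilma pet=dino",
    "jetsons: husband=george wife=jane",
);

for my $line (@input) {
    next unless $line =~ s/^(.*?):\s*//;   # strip "family:" and remember it
    my $who = $1;
    for my $field (split ' ', $line) {
        my ($key, $value) = split /=/, $field;
        $HoH{$who}{$key} = $value;
    }
}

# Outer loop over families, inner loop over each family's roles.
my @report;
for my $family (sort keys %HoH) {
    my @pairs = map { "$_=$HoH{$family}{$_}" } sort keys %{ $HoH{$family} };
    push @report, "$family: @pairs";
}
print "$_\n" for @report;
```

Assigning to `$HoH{$who}{$key}` creates the inner hash automatically the first time a family is seen, so no separate initialisation step is needed.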

34
More advanced data structures
  • Also possible: arrays of hashes, hashes of
    arrays, hashes of functions and more elaborate
    records. See chapter 9 of Programming Perl by
    Larry Wall, Tom Christiansen & Jon Orwant,
    O'Reilly, 3rd edition.

35
ELIZA (1)
  • Substitutions may use memory
  • e.g. /the (.*)er they were, the \1er they will
    be/
  • will match "the bigger they were, the bigger they
    will be" but not "the bigger they were, the faster
    they will be".
  • Substitutions using memory are very useful in
    implementing a simple natural-language
    understanding program like ELIZA (Weizenbaum,
    1966), which could carry on conversations like
    the following:

36
ELIZA (2)
  • User: Men are all alike.
  • ELIZA: IN WHAT WAY
  • User: They're always bugging us about something
    or other.
  • ELIZA: CAN YOU THINK OF A SPECIFIC EXAMPLE
  • User: Well, my boyfriend made me come here.
  • ELIZA: YOUR BOYFRIEND MADE YOU COME HERE
  • User: He says I'm depressed much of the time.
  • ELIZA: I AM SORRY TO HEAR THAT YOU ARE DEPRESSED.
  • ELIZA works by searching the user's sentence for
    regular expressions and substituting them, e.g.
    s/my/YOUR/ and s/I'm/YOU ARE/, and then
  • s/.* YOU ARE (depressed|sad) .*/I AM SORRY TO
    HEAR YOU ARE \1/
  • s/.* always .*/CAN YOU THINK OF A SPECIFIC
    EXAMPLE/
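
The rules quoted above can be strung together into a toy responder. This is only a sketch of the idea, not Weizenbaum's program: `eliza_reply` is a hypothetical name, the rule order and the fallback reply "PLEASE GO ON" are assumptions, and the captured word is written as `$1` (Perl's usual form outside the pattern) and upper-cased with `\U`.

```perl
use strict;
use warnings;

# A toy responder built from the rules quoted on the slide.
sub eliza_reply {
    my ($s) = @_;

    # First pass: swap first-person words for second-person ones.
    $s =~ s/\bmy\b/YOUR/gi;
    $s =~ s/\bI'm\b/YOU ARE/gi;

    # Second pass: the first matching rule decides the reply.
    return "I AM SORRY TO HEAR YOU ARE \U$1"
        if $s =~ /.* YOU ARE (depressed|sad) .*/;
    return "CAN YOU THINK OF A SPECIFIC EXAMPLE"
        if $s =~ /.* always .*/;
    return "PLEASE GO ON";    # assumed fallback, not from the slides
}

print eliza_reply("He says I'm depressed much of the time."), "\n";
print eliza_reply("They're always bugging us about something or other."), "\n";
```

The parentheses around `depressed|sad` both group the alternatives and remember which one matched, which is what lets the reply echo the user's own word.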
