An Introduction to Perl - PowerPoint PPT Presentation

Slides: 59
Provided by: alanwi8

1
An Introduction to Perl
  • Sources and inspirations:
  • http://www.cs.utk.edu/~plank/plank/classes/cs494/494/notes/Perl/lecture.html
  • Randal L. Schwartz and Tom Christiansen, Learning
    Perl, 2nd ed., O'Reilly
  • Randal L. Schwartz and Tom Phoenix, Learning
    Perl, 3rd ed., O'Reilly
  • Dr. Nathalie Japkowicz, Dr. Alan Williams

Go O'Reilly!
CSI 3125, Perl, page 1
2
Perl overview (1)
  • Perl: Practical Extraction and Report Language
  • Perl: Pathologically Eclectic Rubbish Lister?
  • It is a powerful general-purpose language, which
    is particularly useful for writing quick and
    dirty programs.
  • Invented by Larry Wall, with no apologies for its
    lack of elegance (!).
  • If you know C and a fair bit of Unix (or Linux),
    you can learn Perl in days (well, some of it...).

3
Perl overview (2)
  • In the hierarchy of programming languages, Perl
    is located half-way between high-level languages
    such as Pascal, C and C++, and shell scripts
    (languages that add control structure to the Unix
    command line instructions) such as sh, sed and
    awk.
  • By the way:
  • awk: Aho, Weinberger, Kernighan
  • sed: Stream Editor.

4
Advantages of Perl (1)
  • Perl combines the best (according to its admirers)
    features of:
  • Unix/Linux shell programming,
  • The commands sed, grep, awk and tr,
  • C,
  • Cobol.
  • Shell scripts are usually written in many small
    files that refer to each other. Perl achieves the
    functionality of such scripts in a single program
    file.

5
Advantages of Perl (2)
  • Perl offers extremely strong regular expression
    capabilities, which allow fast, flexible and
    reliable string handling operations, especially
    pattern matching.
  • As a result, Perl works particularly well in
    text processing applications.
  • As a matter of fact, it is Perl that allowed a
    lot of text documents to be quickly moved to the
    HTML format in the early 1990s, allowing the Web
    to expand so rapidly.

6
Disadvantages of Perl
  • Perl is a jumble! It contains many, many features
    from many languages and tools.
  • It contains different constructs for the same
    functionality (for example, there are at least 5
    ways to perform a one-line if statement).
  • It is not a very readable language.
  • You cannot distribute a Perl program as an opaque
    binary. That is, you cannot really commercialize
    products you develop in Perl.

7
Perl resources and versions
  • http://www.perl.org tells you everything that you
    want to know about Perl.
  • What you will see here is Perl 5.
  • Perl 5.8.0 was released in July 2002.
  • Perl 6 (http://dev.perl.org/perl6/) is the next
    version, still under development, but moving
    along nicely. The first book on Perl 6 is in
    stores (http://www.oreilly.com/catalog/perl6es).

8
Scalar data: strings and numbers
  • Scalars need not be defined or have their types
    declared: Perl understands from context.
  • cat hellos.pl
  • #!/usr/bin/perl -w
  • print "Hello" . " " . "world\n";
  • print "hi there " . 2 . " worlds!" . "\n";
  • print (("5" + 6) . " eggs\n" . " in " . " 3 + 2 = "
    . ("3" + "2") . " baskets\n" );

The #! line invokes Perl.
hellos.pl
Hello world
hi there 2 worlds!
11 eggs
 in  3 + 2 = 5 baskets
9
Scalar variables
  • Scalar variable names start with a dollar sign.
    They do not have to be declared.

cat scalar.pl
#!/usr/bin/perl -w
$i = 1;
$j = "2";
print "$i and $j \n";
$k = $i + $j;
print "$k\n";
print $i . $j . "\n";
print '$k\n' . "\n";
scalar.pl
1 and 2
3
12
$k\n
10
Quotes and substitution
  • Suppose $x = 3
  • Single-quotes ' ' allow no substitution except
    for the escape sequences \\ and \'.
  • print('$x\n') gives $x\n and no new line.
  • Double-quotes " " allow substitution of variables
    like $x and control codes like \n (newline).
  • print("$x\n") gives 3 (and a new line).
  • Back-quotes ` ` also allow substitution, then try
    to execute the result as a system command,
    returning as the final value whatever the system
    command outputs.
  • $y = `date`; print($y) results in
  • Sun Aug 10 07:04:17 EDT 2003
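
The three kinds of quotes can be compared side by side. The following is only a sketch: the helper name quote_demo is made up for illustration, and the back-quote line assumes a Unix-like system where the echo command is available.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the slides): returns the result of
# each quoting style for the value passed in.
sub quote_demo {
    my ($x) = @_;
    my $single = '$x\n';      # no substitution: the four characters $ x \ n
    my $double = "$x\n";      # $x and \n are both substituted
    my $back   = `echo $x`;   # $x is substituted, then the shell runs the command
    return ($single, $double, $back);
}

my ($single, $double, $back) = quote_demo(3);
print "single: $single\n";
print "double: $double";
print "back:   $back";
```

Only the single-quoted string keeps the dollar sign and the backslash as ordinary characters.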

11
Control statements: if, else, elsif
cat names.pl
#!/usr/bin/perl -w
$name = <STDIN>;    # standard input
chomp($name);       # cut newline
if ($name gt 'fred') {
    print "'$name' follows 'fred'\n";
} elsif ($name eq 'fred') {
    print "both names are 'fred'\n";
} else {
    print "'$name' precedes 'fred'\n";
}
names.pl
Stan                        (my input)
'Stan' precedes 'fred'      (Perl's output)
  • names.pl
  • stan
  • 'stan' follows 'fred'

12
Control statements: loops (1)
cat oddsum_while.pl
#!/usr/bin/perl -w
# Add up some odd numbers
$max = <STDIN>;
$n = 1;
while ($n < $max) {
    $sum += $n;
    $n += 2;    # On to the next odd number
}
print "The total is $sum.\n";
  • oddsum_while.pl
  • 10                        (my input)
  • Use of uninitialized value at oddnums.pl line 6,
    <STDIN> chunk 1.          (a warning)
  • The total is 25.          (Perl's output)

13
Control statements: loops (2)
  • End-line comments begin with #.
  • It is okay, though not nice, to use a variable
    without initialization (like $sum). Such a
    variable is initialized to 0 if it is first used
    as a number or to the empty string "" if it is
    first used as a string. In fact, it is always
    undef, variously converted.
  • Perl can, if asked, issue a warning (use the -w
    flag).
  • Of course, while is only one of many looping
    constructs in Perl. Read on...
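
A small sketch of the "undef, variously converted" remark. The helper name undef_views is invented for this illustration, and the warning that -w would issue is switched off inside it so that only the conversions show.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the slides): reads one undefined
# variable as a number and as a string.
sub undef_views {
    my $v;                          # declared, never assigned: undef
    no warnings 'uninitialized';    # silence the warning that -w would print
    my $as_number = $v + 0;         # numeric use: undef acts as 0
    my $as_string = $v . "";        # string use: undef acts as ""
    return ($as_number, $as_string, defined($v) ? 1 : 0);
}

my ($n, $s, $def) = undef_views();
print "as a number: $n\n";
print "as a string: '$s'\n";
print "defined:     $def\n";
```

The variable itself stays undef throughout; only the value read from it depends on how it is used.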

14
Control statements: loops (3)
  • cat oddsum_until.pl
  • #!/usr/bin/perl -w
  • # Add up some odd numbers
  • $max = <STDIN>;
  • $n = 1;
  • $sum = 0;
  • until ($n > $max) {
  •     $sum += $n;
  •     $n += 2;    # On to the next odd number
  • }
  • print "The total is $sum.\n";
  • oddsum_until.pl
  • 10
  • The total is 25.

15
Control statements: loops (4)
  • cat oddsum_for.pl
  • #!/usr/bin/perl -w
  • # Add up some odd numbers
  • $max = <STDIN>;
  • $sum = 0;
  • for ($n = 1; $n < $max; $n += 2) {
  •     $sum += $n;
  • }
  • print "The total is $sum.\n";
  • oddsum_for.pl
  • 10
  • The total is 25.
  • We also have do-while and do-until, and we have
    foreach. Read on.

16
Control statements: loops (5)
  • cat oddsum_foreach.pl
  • #!/usr/bin/perl -w
  • # Add up some odd numbers
  • $max = <STDIN>;
  • $sum = 0;
  • foreach $n ( (1 .. $max) ) {
  •     if ( $n % 2 != 0 ) { $sum += $n }
  • }
  • print "The total is $sum.\n";
  • oddsum_foreach.pl
  • 10
  • The total is 25.

17
Control constructs compared
18
Lists and arrays
  • A list is an ordered collection of scalars. An
    array is a variable that contains a list.
  • Each element is an independent scalar value. A
    list can hold numbers, strings, undef values: any
    mixture of kinds of scalar values.
  • To use an array element, prefix the array name
    with a $ and place a subscript in square brackets.
  • To access the whole array, prefix its name with
    an @.
  • You can copy an array into another. You can use
    the operators sort, reverse, push, pop, split.
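
The operators just listed can be tried on a small array. This is a minimal sketch; the sample names are taken loosely from the examples in these slides, and the variable names are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @names = split(/,/, "Frank,Zebra,hello");   # split builds a list from a string
push(@names, "John");                          # add an element at the end
my $first  = $names[0];                        # one element: $ and a subscript
my @copy   = @names;                           # copy the whole array
my $last   = pop(@copy);                       # remove the last element of the copy
my @sorted = sort(@names);                     # default sort compares character codes
my @back   = reverse(@names);                  # same elements, opposite order

print "first:    $first\n";
print "popped:   $last\n";
print "sorted:   @sorted\n";
print "reversed: @back\n";
print "names still has ", scalar(@names), " elements\n";
```

Because the default sort compares character codes, capitalized names come before lower-case ones, as in the arraysort.pl run.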

19
Command-line arguments
  • Suppose that a Perl program stored in the file
    cleanUp is invoked in Unix/Linux with the
    command
  • cleanUp -o result.htm data.htm
  • The built-in list named @ARGV then contains three
    elements:
  • ('-o', 'result.htm', 'data.htm')
  • These three elements can be accessed
    as $ARGV[0], $ARGV[1], $ARGV[2]
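
To try this without typing a command line, the argument list can be simulated. In the sketch below the helper name show_args is hypothetical, and local is used only to give @ARGV a temporary value inside the helper.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the slides): pretends its arguments
# were typed on the command line and reports what @ARGV holds.
sub show_args {
    local @ARGV = @_;      # temporary value for the built-in @ARGV
    my $count = @ARGV;     # an array used as a scalar gives its length
    return "$count arguments: first=$ARGV[0] second=$ARGV[1] third=$ARGV[2]";
}

print show_args('-o', 'result.htm', 'data.htm'), "\n";
```

In a real script the assignment to @ARGV is not needed: Perl fills the array before the program starts.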

20
Array examples (1)
  • cat arraysort.pl
  • #!/usr/bin/perl -w
  • $i = 0;
  • while ($k = <STDIN>) {
  •     $a[$i++] = $k;
  • }
  • print " sorted \n";
  • print sort(@a);
  • arraysort.pl
  • Nathalie
  • Frank
  • hello
  • John
  • Zebra
  • notary
  • nil
  • (control-D here)
  • sorted
  • Frank
  • John
  • Nathalie
  • Zebra
  • hello
  • nil
  • notary

21
Array examples (2A)
Reversing a text file (whole lines).
cat whole_rev.pl
#!/usr/bin/perl -w
while ($k = <STDIN>) {
    push(@a, $k);
}
print " reversed \n";
while ($oldval = pop(@a)) {
    print $oldval;
}
  • whole_rev.pl
  • a b c d
  • e f
  • g h i
  • (control-D here)
  • reversed
  • g h i
  • e f
  • a b c d

22
Array examples (2B)
Reversing each line in a text file
cat each_rev.pl
#!/usr/bin/perl -w
while ($k = <STDIN>) {
    # split cuts the line on white space
    # (we will see regular expressions soon)
    @a = split(/\s+/, $k);
    $s = "";
    for ($i = @a; $i > 0; $i--) {
        $s = "$s$a[$i-1] ";
    }
    chop($s);
    print "$s\n";
}
  • each_rev.pl
  • a bc d efg
  • efg d bc a          (output)
  • hi j
  • j hi
  • klm nopq st
  • st nopq klm
  • (control-D)

23
Array examples (3)
  • Reversing a text file (whole lines):
  • print reverse(<STDIN>);
  • Reversing each line in a text file:
  • while ($k = <STDIN>) {
  •     $s = "";
  •     foreach $i (reverse(split(/\s+/, $k))) {
  •         $s = "$s$i ";
  •     }
  •     chop($s);
  •     print "$s\n";
  • }

24
A digression: Perl's favourite default variable
By default, Perl reads into $_
  • while (<STDIN>) {
  •     $s = "";
  •     foreach $i (reverse(split(/\s+/, $_))) {
  •         $s = "$s$i ";
  •     }
  •     chop($s); print "$s\n";
  • }

By default, Perl splits $_ too!
while (<STDIN>) {
    $s = "";
    foreach $i (reverse(split(/\s+/))) {
        $s = "$s$i ";
    }
    chop($s); print "$s\n";
}
25
Hashes
  • A hash is similar to an array, but instead of
    subscripts, we can have anything as a key, and we
    use curly brackets rather than square brackets.
  • The official name is associative array (known to
    be implemented by hashing).
  • Keys and values can be any scalars; keys are
    always converted to strings.
  • To refer to a hash as a whole, prefix its name
    with a %.
  • If you assign a hash to an array, it becomes a
    simple list.
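
A short sketch of these points. The hash name %age and the value for wilma are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %age = ("fred", 35, "wilma", 31);   # a hash built from a key, value list
$age{2.5} = "hello";                   # the numeric key is stored as the string "2.5"
my $fred = $age{"fred"};               # one value: $ and curly brackets
my @flat = %age;                       # hash assigned to an array: a plain list
my $size = keys %age;                  # number of keys

print "fred is $fred\n";
print "the hash has $size keys, the list has ", scalar(@flat), " elements\n";
foreach my $key (sort keys %age) {     # sorted, because hash order is not fixed
    print "$key => $age{$key}\n";
}
```

The flattened list holds keys and values alternately, in whatever order the hash happens to store them.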

26
Hash examples I (1)
  • cat hash_array.pl
  • #!/usr/bin/perl -w
  • %some_hash =
  •   ("foo", 35, "bar", 12.4, 2.5, "hello",
  •    "wilma", 1.72e30, "betty", "bye\n");
  • @an_array = %some_hash;
  • print "@an_array\n\n";
  • foreach $key (keys %some_hash) {
  •     print "$key ";
  •     print delete $some_hash{$key};
  •     print "\n";
  • }

27
Hash examples I (2)
  • hash_array.pl
  • betty bye
  • wilma 1.72e+30 foo 35 2.5 hello bar 12.4
  • betty bye
  • wilma 1.72e+30
  • foo 35
  • 2.5 hello
  • bar 12.4

28
Hash examples II
cat hash_arrows.pl
#!/usr/bin/perl -w
my %hash = ( "a" => 1, "b" => 2, "c" => 3 );
foreach $key (sort keys %hash) {
    $value = $hash{$key};
    print "$key => $value\n";
}
  • hash_arrows.pl
  • a => 1
  • b => 2
  • c => 3

29
A brief interlude: the diamond operator
cat concat
#!/usr/bin/perl -w
while ( <> ) {
    print $_;
}
  • cat a
  • one-a
  • two-a
  • cat b
  • three-b
  • four-b
  • five-b
  • concat a b
  • one-a
  • two-a
  • three-b
  • four-b
  • five-b

<> loops over the files listed as command-line
arguments; $_ is the current input line.
concat a b >c
cat c
one-a
two-a
three-b
four-b
five-b
30
Hash examples III: character frequency count
  • cat frequency.pl
  • #!/usr/bin/perl -w
  • while (<>) {
  •     # split $_ into single characters, loop
  •     foreach $c (split //) {
  •         # Increment count of $c
  •         $count{$c}++;
  •     }
  • }
  • # end of input, print count
  • for $c (sort keys %count) {
  •     print "$c\t$count{$c}\n";
  • }

31
Character frequency count (2)
  • frequency.pl
  • Nathalie
  • Fran
  • hello
  • John
  • rather
  • Notary
  • F 1
  • J 1
  • (control-D)

Output (character, then count):
(\n)     8
(space)  2
1        2
F        2
J        2
N        2
a        5
e        3
h        4
i        1
l        3
n        2
o        3
r        4
t        3
y        1
32
Subroutines
  • A subroutine is a user-defined function. The
    syntax is very simple; so is the semantics.

#!/usr/bin/perl
sub max {
    if ( $x > $y ) { $x }
    else { $y }
}
$x = 10; $y = 11;
print &max . "\n";
  • There are no arguments: the script accesses two
    global variables. The subroutine call is marked
    with &. The value returned is that of the last
    expression evaluated.

33
Subroutines (2)
  • A few housekeeping rules.
  • You can place your definitions anywhere in the
    file, though it is recommended to have them at
    the beginning.
  • Perl always uses the latest definition in the
    file: any preceding one is ignored.
  • Certain elements of the syntax are optional.
  • The & might sometimes be omitted (but it is not a
    good idea).
  • The return operator may precede a value to be
    returned (this can be useful):
  • if ( $x > $y ) { return $x }
  • else { return $y }

34
Subroutines (3)
  • Clearly, the use of global variables is much too
    limited. Subroutines take arguments, and work on
    them via a predefined list variable @_ or its
    elements $_[0], $_[1] and so on.

#!/usr/bin/perl
sub max {
    if ( $_[0] > $_[1] ) { $_[0] }
    else { $_[1] }
}
print &max ( 12, 13 ) . "\n";
35
Subroutines (4)
  • $_[0], $_[1] are not fun to work with. We can
    rename them locally, using the my operator: it
    creates a sub's private variables. Here, we
    declare two such variables and right away
    initialize them.

#!/usr/bin/perl
sub max {
    my ( $a, $b ) = @_;
    if ( $a > $b ) { $a } else { $b }
}
print &max ( 15, 14 ) . "\n";
36
Subroutines (5)
  • But this is not a safe max calculation.

#!/usr/bin/perl
sub max {
    my ( $a, $b ) = @_;
    if ( $a > $b ) { $a } else { $b }
}
print &max ( 16, 19, 23 ) . "\n";
print &max ( 26 ) . "\n";
  • This produces 19 (23 gets ignored) and 26 (the
    second value is undef, that is, 0).

37
Subroutines (6)
  • We could stop the subroutine if the number of
    arguments is wrong. The (generally very useful!)
    operator die does that for us.

#!/usr/bin/perl
sub max {
    if ( @_ != 2 ) {
        die "max needs two arguments @_\n";
    }
    my ( $a, $b ) = @_;
    if ( $a > $b ) { $a } else { $b }
}
print &max ( 16, 19, 23 ) . "\n";
The script is stopped after printing this:
max needs two arguments 16 19 23
38
Subroutines (7)
  • We can have just a warning, if we use the
    operator warn instead.

#!/usr/bin/perl
sub max {
    if ( @_ != 2 ) {
        warn "max needs two arguments @_\n";
    }
    my ( $a, $b ) = @_;
    if ( $a > $b ) { $a } else { $b }
}
print &max ( 16, 19, 23 ) . "\n";
The script prints this:
max needs two arguments 16 19 23
19
39
Subroutines (8)
  • It is, by the way, not a bad idea to generalize
    max by allowing it to take any number of
    arguments.

#!/usr/bin/perl
sub max {
    my ( $curr_max ) = shift @_;
    foreach ( @_ ) {
        if ( $_ > $curr_max ) { $curr_max = $_ }
    }
    $curr_max;
}
print &max ( 15, 14 ) . "\n";
print &max ( 16, 19, 23 ) . "\n";
print &max ( 26 ) . "\n";
40
Subroutines (9)
  • This even works for empty lists.

#!/usr/bin/perl
sub max {
    my ( $curr_max ) = shift @_;
    foreach ( @_ ) {
        if ( $_ > $curr_max ) { $curr_max = $_ }
    }
    $curr_max;
}
$z = &max ( );
if ( defined $z ) { print $z . "\n" }
else { print "undefined\n" }
41
Regular expressions (1)
  • A regular expression (also called a pattern) is a
    template that describes a class of strings. A
    string can either match or not match the pattern.
  • The simplest pattern is one character.
  • A character class (the pattern matches any of
    these characters) is written in square brackets:
  • [01234567] an octal digit
  • [0-7] an octal digit
  • [0-9A-F] a hex digit
  • [^A-Za-z] not a letter (^ "negates")
  • [0-9-] a decimal digit or a minus

42
Regular expressions (2)
  • Metacharacters
  • . (dot) any character except \n
  • Anchors
  • ^ the beginning of a string
  • $ the end of a string
  • Multipliers
  • * repeat the preceding item 0 or more times
  • + repeat the preceding item 1 or more times
  • ? make the preceding item optional
  • {n} repeat n times
  • {n,m} repeat n to m times (n < m)
  • {n,} repeat n or more times
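
The multipliers can be checked one at a time. In this sketch the helper name matches and the test strings are invented for illustration; every pattern is anchored at both ends so that the whole string has to fit.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the slides): "yes" if the string
# matches the pattern, "no" otherwise.
sub matches {
    my ($string, $pattern) = @_;
    return $string =~ /$pattern/ ? "yes" : "no";
}

print matches("abc",   '^a.c$'),      "\n";   # . stands for any one character
print matches("ac",    '^ab*c$'),     "\n";   # * accepts zero b's
print matches("ac",    '^ab+c$'),     "\n";   # + needs at least one b
print matches("abbbc", '^ab{2}c$'),   "\n";   # {2} wants exactly two b's
print matches("abbbc", '^ab{2,4}c$'), "\n";   # {2,4} accepts two to four b's
print matches("abc",   '^ab?c$'),     "\n";   # ? makes the b optional
```

Run as is, the third and fourth lines print no and the others print yes.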

43
Regular expressions (3)
  • The Boolean operator =~ tries to match a string
    with a regular expression written inside slashes.
  • $x = "01239876AGH";
  • if ( $x =~ /0[1-9]{4,}/ )
  •     { print "yes1\n" }
  • if ( $x =~ /[A-Z]{3}/ )
  •     { print "yes2\n" }
  • if ( $x =~ /.[A-Z]{4}/ )
  •     { print "yes3\n" }

44
Regular expressions (4)
  • Patterns can be grouped by parentheses (the whole
    pattern becomes one item). Alternative is
    denoted by the bar |.
  • x "01239876AGH"
  • if ( x /(0-94A-Z3)2,/ )
  • print "yes4\n"
  • if ( x /(0?4)(51abc1,)/ )
  • print "yes5\n"

45
Regular expressions (5)
  • The precedence of pattern elements (highest first):
  • parentheses ( )
  • multipliers * + ? {n} {n,m} {n,}
  • sequence, anchors
  • alternation |
  • Some character classes are predefined:
  •                             class   not class
  • digit                       \d      \D
  • word char [a-zA-Z0-9_]      \w      \W
  • whitespace                  \s      \S
  • Some additional anchors:
  • word boundary               \b      \B
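
Both the predefined classes and the low precedence of alternation can be seen in a few lines. This is a sketch only: the helper name first_match and the sample strings are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the slides): the first substring that
# the pattern matches, or "(none)" when there is no match.
sub first_match {
    my ($string, $pattern) = @_;
    return $string =~ /($pattern)/ ? $1 : "(none)";
}

my $text = "room 42b";
print "[", first_match($text, '\d+'),  "]\n";   # a run of digits
print "[", first_match($text, '\w+'),  "]\n";   # a run of word characters
print "[", first_match($text, '\D+'),  "]\n";   # a run of non-digits
print "[", first_match($text, '\b\d'), "]\n";   # a digit just after a word boundary

# Alternation binds last: ^a|b$ reads as (^a) or (b$), not ^(a|b)$.
print "[", first_match("xb", '^a|b$'),   "]\n";
print "[", first_match("xb", '^(a|b)$'), "]\n";
```

The last two lines differ only in the parentheses, which is what decides whether the anchors apply to both alternatives or to one each.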

46
Regular expression examples (1)
  • $i = "Jim";
  •                     match?
  • $i =~ /Jim/         yes
  • $i =~ /J/           yes
  • $i =~ /j/           no
  • $i =~ /j/i          yes
  • $i =~ /\w/          yes
  • $i =~ /\W/          no
  • Case is ignored in matching if the postfix i is
    used.

47
Regular expression examples (2)
  • $j = "JjJjJjJj";
  • $j =~ /j*/       yes (matches anything)
  • $j =~ /j+/       yes (matches the first j)
  • $j =~ /j?/       yes (matches the empty string at the start)
  • $j =~ /j{2}/     no
  • $j =~ /j{2}/i    yes (ignores case)
  • $j =~ /(Jj){3}/  yes

48
Regular expression examples (3)
  • $k = "Boom Boom, out go the lights!";
  • $k =~ /Jim|Boom/        yes (matches Boom)
  • $k =~ /(Boom){2}/       no (a space between Booms)
  • $k =~ /(Boom ){2}/      no (fails on the comma)
  • $k =~ /(Boom\W){2}/     yes (\W is space, comma)
  • $k =~ /\bBoom\b/        yes
  • $k =~ /\bBoom.*the\b/   yes
  • $k =~ /\Bgo\B/          no ("go" is a complete word)
  • $k =~ /\Bgh\B/          yes (the "gh" inside "lights")

49
Regular expression substitution (1)
  • We can modify a string variable by applying a
    substitution.
  • The operator is =~ and the substitution is
    written as:
  • s/pattern1/pattern2/
  • $v = "a string to play with";
  • $v =~ s/\w+/just a single/;
  • print "$v\n";
  • just a single string to play with

50
Regular expression substitution (2)
  • Matched patterns are remembered in built-in
    variables $1, $2, $3 etc. These variables keep
    their values till the next matching operation.
  • Each set of parentheses in a pattern corresponds
    to a "memory" variable.
  • $v = "just a single string to play with";
  • $v =~ s/(\b\w+\b)(.*)/'$1'$2/;
  • print "$v\n";
  • print "$2, $1 $1\n";
  • 'just' a single string to play with
  •  a single string to play with, just just

51
Regular expression substitution (3)
  • A substitution can be applied to all occurrences
    of the pattern, that is, globally:
  • s/pattern1/pattern2/g
  • $v = "'just' a single string to play with";
  • $v =~ s/\b\w+\b/word/g;
  • print "$v\n";
  • 'word' word word word word word word
  • $v =~ s/\b\w+\b$/last/;
  • print "$v\n";
  • 'word' word word word word word last

52
Regular expression substitution (4)
Parentheses as memory can help construct powerful
patterns with "instant repetition". We can use
\1, \2 etc. for matched substrings.
  • $v = "This is a double double word.";
  • $v =~ s/(\b\w+\b) \1/\1/;
  • print "$v\n";
  • This is a double word.
  • $v = "This is a triple triple triple word.";
  • $v =~ s/(\b\w+\b) \1 \1/\1/;
  • print "$v\n";
  • This is a triple word.

53
Regular expression substitution (5)
Here is a more realistic example (last year's
homework). It probably needs explanations in
class.
  • $Day = '0[1-9]|[12][0-9]|3[01]|[1-9]';
  • $Month = '0[1-9]|1[012]|[1-9]';
  • # Year number up to 31 must have a leading zero
    # or two.
  • $Year = '[0-9]{4}|[0-9]{3}|3[2-9]|[4-9][0-9]';
  • while (<>) {
  •     # Find all dates, selecting and reinserting the
        # context.
  •     # $1 and $6 match the context. Superfluous
        # digits,
  •     # as 43 and 55 in 432001-01-2255, belong in the
        # context.
  •     # "Dates" such as April 31 or February 30 are
        # allowed.
  •     # There are no provisions for leap years.
  •     s/(\D)(($Year)-($Month)-($Day))(\D.)/$1<date>$2<\/date>$6/g;
  •     s/(\D)(($Day)-($Month)-($Year))(\D.)/$1<date>$2<\/date>$6/g;
  •     print $_;
  • }

54
Regular expression substitution (6)
One example run, to show how it works.
  • DATA
  • Both 12-09-2000 and 25-8-324 are good dates,
  • but 30-14-1955 and 10-10-10 are not. OTOH,
    10-10-010 is.
  • RESULTS
  • Both <date>12-09-2000</date> and
    <date>25-8-324</date> are good dates,
  • but 30-14-1955 and 10-10-10 are not. OTOH,
    <date>10-10-010</date> is.

55
In another course ?
  • Predefined variables (lots!)
  • More on lists, arrays and hashes
  • More on regular expressions
  • File management
  • Directory management
  • Process management
  • Perl database facilities
  • CGI programming
  • ... and more, and much more

56
Mistakes that novices make (1)
Thanks to Alan Williams for this list
  • Adapted from Programming Perl, page 361.
  • Testing "all-at-once" instead of incrementally,
    either bottom-up or top-down.
  • Optimistically skipping print scaffolding to dump
    values and show progress.
  • Not running with the perl -w switch to catch
    obvious typographical errors.
  • Leaving off $ or @ or % from the front of a
    variable.
  • Forgetting the trailing semicolon.
  • Forgetting curly braces around a block.

57
Mistakes that novices make (2)
  • Unbalanced (), {}, [], "", '', ``, and sometimes
    <>.
  • Confusing '' and "", or / and \.
  • Using == instead of eq, != instead of ne, =
    instead of ==, and so on.
  • ('White' == 'Black') and ($x = 5) evaluate as (0
    == 0) and (5) and thus are true!
  • Using "else if" instead of "elsif".
  • Putting a comma after the file handle in a print
    statement.
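
The == versus eq mistake is easy to reproduce. In this sketch the helper name compare_both is invented, and the warning that -w would print for the numeric comparison of words is switched off so that only the results show.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the slides): compares two values
# numerically and as strings, and reports both verdicts.
sub compare_both {
    my ($left, $right) = @_;
    no warnings 'numeric';    # == on words would warn under -w
    my $numeric = ($left == $right) ? "equal" : "different";
    my $string  = ($left eq $right) ? "equal" : "different";
    return "== says $numeric, eq says $string";
}

print compare_both('White', 'Black'), "\n";   # both words count as 0 for ==
print compare_both('1.0', '1'), "\n";         # same number, different strings
```

In both calls == reports equality while eq does not, so the choice of operator has to follow the kind of data being compared.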

58
Mistakes that novices make (3)
  • Not chopping the output of backquotes `date` or
    not chopping input:
  • print "Enter y to proceed ";
  • $ans = <STDIN>;
  • chop $ans;
  • if ($ans eq 'y') { print "You said y\n"; }
  • else { print "You did not say 'y'\n"; }
  • Forgetting that Perl array subscripts and string
    indexes normally start at 0, not 1.
  • Using $_, $1, or other side-effect variables,
    then modifying the code in a way that unknowingly
    affects or is affected by these.
  • Forgetting that regular expressions are greedy,
    seeking the longest match not the shortest match.
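
A minimal sketch of greediness. The sample line is made up, and the non-greedy form .*? is not covered in these slides; it is a standard Perl multiplier that asks for the shortest match instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = 'say "one" and "two"';

# In list context a match returns what the parentheses remembered.
my ($greedy) = $line =~ /"(.*)"/;     # .* runs on to the last closing quote
my ($lazy)   = $line =~ /"(.*?)"/;    # .*? stops at the first closing quote

print "greedy: $greedy\n";
print "lazy:   $lazy\n";
```

The greedy pattern captures everything between the first and the last quote on the line, which is rarely what a novice expects.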