Title: Perl
1Perl
2Introduction
- Perl stands for "Practical Extraction and Report Language"
- Created by Larry Wall when awk ran out of steam
- Perl grew at almost the same rate as the Unix operating system
3Introduction (cont.)
- Perl fills the gaps between programming languages of different levels
- A great tool for leverage
- Highly portable and readily available
- Perl can be "write-only" (hard to read afterwards) without proper care during programming
4Availability
- It's free and runs rather nicely on nearly everything that calls itself UNIX or UNIX-like
- Perl has been ported to the Amiga, the Atari ST, the Macintosh family, VMS, OS/2, even MS-DOS and Windows
- The sources for Perl (and many precompiled binaries for non-UNIX architectures) are available from the Comprehensive Perl Archive Network (the CPAN): http://www.perl.com/CPAN
5Running Perl on Unix
- Set up the PATH variable to point to the directory where Perl is located
- Check /usr/local/bin or /usr/bin for perl
- Run a Perl script by typing perl <filename>
- Alternatively, make the file executable and include #!/usr/bin/perl as the first line of your Perl script
- The .pl extension is frequently associated with Perl scripts
6Running Perl on Win32
- ActivePerl allows Perl scripts to be executed in MS-DOS/Windows
- Perl was ported faithfully
- The #! directive is no longer used because it does not mean anything to MS-DOS/Windows
- Perl scripts are executed by typing perl <filename>
- Alternatively, double-click the file if the .pl extension is associated with the Perl interpreter
7An Example
- #!/usr/bin/perl
- print "Hello World!";
- The #! directive directs subsequent lines in the file to the perl executable
- All statements are terminated with ; as in C/C++/Java
- print by default outputs any strings to the terminal console (like printf in C or cout in C++)
- Perl completely parses and compiles the script before executing it
8Variables
- Three main types of variables: scalar, hash and array
- Examples: $scale, %hash, @array
- Perl is not a strongly typed language
- Retrieving values from the variables
- $scale, $hash{key}, $array[offset]
- Variables are all global in scope unless defined to be private or local
- Note: remember that hashes and arrays are used to hold scalar values
9Examples
- Assigning values to a scalar
- $i = "hello world!";
- $j = 1 + 1;
- ($i,$j) = (2, 3);
- Assigning values to an array
- $array[0] = 1;
- $array[1] = "hello world!";
- push(@array,1); # stores the value 1 at the end of @array
- $value = pop(@array); # retrieves and removes the last element from @array
- @array = (8,@array); # inserts 8 in front of @array
10Examples (cont.)
- Assigning values to a hash
- $hash{"greeting"} = "Hello world!";
- $hash{"available"} = 1;
- or using a hash slice
- @hash{"greeting","available"} = ("Hello world!", 1);
- Deleting a key-value pair from a hash
- delete $hash{"key"};
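The hash slice and delete operations above can be sketched as a small runnable script. This is only an illustration: the exists function is a standard Perl built-in that the slides do not mention, and the flag variables are made up for the example.

```perl
use strict;
use warnings;

my %hash;

# Hash slice: assign several keys in one statement (note the @ sigil).
@hash{"greeting", "available"} = ("Hello world!", 1);

# exists tests whether a key is present; delete removes the key-value pair.
my $had_key = exists $hash{available} ? 1 : 0;
delete $hash{available};
my $has_key = exists $hash{available} ? 1 : 0;

print "$hash{greeting} $had_key $has_key\n";   # prints "Hello world! 1 0"
```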
11Conditional Statements
- Variables alone will not support switches or conditions
- If-Then-Else like clauses are used to make decisions based on certain preconditions
- Keywords: if, else, elsif, unless
- Blocks are enclosed by { and }
12A Conditional Statement Example
- print "What is your name? ";
- $name = <STDIN>;
- chomp ($name);
- if ($name eq "Randal") {
- print "Hello, Randal! How good of you to be here!\n";
- } else {
- print "Hello, $name!\n"; # ordinary greeting
- }
- unless ($name eq "Randal")
- {
- print "You are not Randal!!\n"; # part of the ordinary greeting
- }
- $name = <STDIN> reads a line from standard input
- chomp is a built-in function that removes the trailing newline character
13Loops
- Conditional statements cannot handle repetitive tasks
- Keywords: while, foreach, for, until, do-while, do-until
- A foreach loop iterates over all of the elements in a list (an array, or the keys of a hash), executing the loop body on each element
- for is a shorthand for a while loop
- until is the reverse of while
14Loops (cont.)
- do-while and do-until loops execute the loop body once before checking for termination
- Statements in the loop body are enclosed by { and }
15While Loop
- Syntax
- while (some expression) {
- statements
- }
- Example
- # prints the numbers 1 to 10 in reverse order
- $a = 10;
- while ($a > 0) {
- print $a;
- $a = $a - 1;
- }
16Until Loop
- Syntax
- until (some expression) {
- statements
- }
- Example
- # prints the numbers 1 to 10 in reverse order
- $a = 10;
- until ($a <= 0) {
- print $a;
- $a = $a - 1;
- }
17Foreach Loop
- Syntax
- foreach <variable> (@some_list) {
- statements
- }
- Example
- # prints each element of @a
- @a = (1,2,3,4,5);
- foreach $b (@a) {
- print $b;
- }
18Foreach Loop (cont.)
- Accessing a hash with the keys function
- foreach $key (keys (%fred)) { # once for each key of %fred
- print "at $key we have $fred{$key}\n"; # show key and value
- }
- Alternatively
- while (($first,$last) = each(%lastname)) {
- print "The last name of $first is $last\n";
- }
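As a hedged sketch of the hash iteration above: keys returns the keys in no particular order, so this version sorts them to get a repeatable result. The contents of %lastname and the @lines array are invented for the example.

```perl
use strict;
use warnings;

# Illustrative data, not taken from the slides.
my %lastname = (fred => "flintstone", barney => "rubble");

# Sort the keys so the output order is the same on every run.
my @lines;
foreach my $first (sort keys %lastname) {
    push @lines, "The last name of $first is $lastname{$first}";
}

print "$_\n" foreach @lines;
```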
19For Loop
- Syntax
- for (initial_exp; test_exp; re-init_exp) {
- statements
- }
- Example
- # prints numbers 1-10
- for ($i = 1; $i <= 10; $i++) {
- print "$i ";
- }
20Do-While and Do-Until Loops
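- Syntax
- do { statements } while some_expression;
- do { statements } until some_expression;
- Example with while: prints the numbers 1 to 10 in reverse order
- $a = 10;
- do {
- print $a;
- $a = $a - 1;
- } while ($a > 0);
- The same loop written with until
- $a = 10;
- do {
- print $a;
- $a = $a - 1;
- } until ($a <= 0);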
21Built-in functions
- shift function
- Ex: $value = shift(@fred) is similar to ($x,@fred) = @fred
- unshift function
- Ex: unshift(@fred,$a) is like @fred = ($a,@fred)
- reverse function
- @a = (7,8,9);
- @b = reverse(@a); # gives @b the value of (9,8,7)
- sort function
- @y = (1,2,4,8,16,32,64);
- @y = sort(@y); # @y gets 1,16,2,32,4,64,8 (sorted as strings, not as numbers)
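The sort result above looks odd because sort compares its elements as strings by default. A minimal sketch of the difference, using a comparison block with the numeric <=> operator (a standard Perl feature not covered on the slide; the array names are illustrative):

```perl
use strict;
use warnings;

my @y = (1, 2, 4, 8, 16, 32, 64);

# Default sort compares elements as strings.
my @as_strings = sort @y;

# A comparison block using <=> sorts numerically instead.
my @as_numbers = sort { $a <=> $b } @as_strings;

print "@as_strings\n";   # prints "1 16 2 32 4 64 8"
print "@as_numbers\n";   # prints "1 2 4 8 16 32 64"
```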
22Built-In Functions (cont.)
- qw function
- Ex: @words = qw(camel llama alpaca) is equivalent to @words = ("camel","llama","alpaca")
- defined function
- Returns a Boolean value saying whether the scalar value resulting from an expression has a real value or not
- Ex: defined $a
- undef function
- Undoes a definition: after undef $a, defined $a returns false
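A short sketch of how defined and undef interact, under the assumption that the point of the slide is the difference between "false" and "undefined". The variable names are illustrative.

```perl
use strict;
use warnings;

my $value;                                    # declared but never assigned
my $before = defined($value) ? "yes" : "no";

$value = 0;                                   # 0 is false, but it is defined
my $after = defined($value) ? "yes" : "no";

undef $value;                                 # back to the undefined state
my $reset = defined($value) ? "yes" : "no";

print "$before $after $reset\n";   # prints "no yes no"
```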
23Built-In Functions (cont.)
- uc and ucfirst functions vs. lc and lcfirst functions
- <result> = uc(<string>)
- <result> = ucfirst(<string>)
- $string = "abcde";
- $string2 = uc($string); # ABCDE
- $string3 = ucfirst($string); # Abcde
- lc and lcfirst have the reverse effect of the uc and ucfirst functions
24Basic I/O
- STDIN and STDOUT
- STDIN Examples
- $a = <STDIN>;
- @a = <STDIN>;
- while (defined($line = <STDIN>)) {
- # process $line here
- }
- STDOUT Examples
- print(list of arguments);
- print "text";
- printf(HANDLE format, list of arguments); # no comma after the filehandle
25Regular Expressions
- Template to be matched against a string
- Patterns are enclosed in /s
- Matching against a variable is done by the =~ operator
- Syntax: /<pattern>/
- Examples
- $string =~ /abc/ matches abc anywhere in $string
- <STDIN> =~ /abc/ matches abc from standard input
26Creating Patterns
- Single character patterns
- . matches any single character except newline (\n), for example /a./
- ? matches zero or one of the preceding character
- A character class can be created by using [ and ]. A range of characters can be abbreviated by using -, and a character class can be negated by using the ^ symbol.
- For example
- [aeiouAEIOU] matches any one of the vowels
- [a-zA-Z] matches any single letter in the English alphabet
- [^0-9] matches any single non-digit
27Creating Patterns (cont.)
- Predefined character class abbreviations
- \d = [0-9]
- \D = [^0-9]
- \w = [a-zA-Z0-9_]
- \W = [^a-zA-Z0-9_]
- \s = [ \r\t\n\f]
- \S = [^ \r\t\n\f]
28Creating Patterns (cont.)
- Multipliers: *, + and {}
- * matches 0 or more of the preceding character
- ab*c matches a followed by zero or more bs and followed by a c
- + matches 1 or more of the preceding character
- ab+c matches a followed by one or more bs and followed by a c
- {} is a general multiplier
- a{3,5} matches three to five as in a string
- a{3,} matches three or more as
29Creating Patterns (cont.)
- a{3} matches exactly three as, so it is found in any string with three or more consecutive as in it
- Complex patterns can be constructed from these operators
- For example
- /a.*ce.*d/ matches strings such as asdffdscedfssadfz
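The multipliers and character classes above can be checked with a small table-driven script. This is a sketch only: the qr// operator and the array references used to hold the test cases are standard Perl but are not introduced in the slides, and the test strings (apart from the last one) are made up.

```perl
use strict;
use warnings;

# Each entry: [pattern, string, expected result of the match].
my @cases = (
    [ qr/ab*c/,     "ac",    1 ],   # * allows zero bs
    [ qr/ab+c/,     "ac",    0 ],   # + needs at least one b
    [ qr/a{3,5}/,   "baaab", 1 ],   # three as in a row
    [ qr/[^0-9]/,   "2003",  0 ],   # no non-digit anywhere
    [ qr/a.*ce.*d/, "asdffdscedfssadfz", 1 ],
);

my $failures = 0;
foreach my $case (@cases) {
    my ($pattern, $string, $expected) = @$case;
    my $matched = ($string =~ $pattern) ? 1 : 0;
    $failures++ if $matched != $expected;
    print "$string: ", ($matched ? "match" : "no match"), "\n";
}
```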
30Creating Patterns Exercises
- Construct patterns for the following strings
- 1. "a xxx c xxxxxxxx c xxx d"
- 2. a sequence of numbers
- 3. three or more digits followed by the string abc
- 4. Strings that have an a, one or more bs and at least five cs
- 5. Strings with three vowels next to each other. Hint: try a character class and the general multiplier
31Creating Patterns Exercises
- Answers
- 1. /a.*c.*d/
- 2. /\d+/ or /[0-9]+/
- 3. /\d\d\d+abc/ or /\d{3,}abc/
- 4. /ab+c{5,}/
- 5. /[aeiouAEIOU]{3}/
- Other possible answers?
32Anchoring Patterns
- No boundaries are defined by the previous patterns
- A word boundary is the position between a \w character and a \W character
- \b matches at a word boundary, and \B matches anywhere that is not a word boundary
- Examples
- /fred\b/ matches fred, but not frederick
- /\b\+\b/ matches x+y, but not x + y, ++ and +. Why?
- /\bfred\B/ matches frederick but not fred
33Anchoring Patterns (cont.)
- ^ and $
- ^ matches the beginning of a string
- $ matches the end of a string
- Examples
- /^Fred$/ matches only Fred
- /aaa^bbb/ matches nothing
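To make the anchors concrete, here is a minimal sketch that filters a word list with three of the patterns discussed. The word list is invented for the example, and grep is a standard Perl built-in that the slides do not cover.

```perl
use strict;
use warnings;

my @words = ("fred", "frederick", "alfred");

# ^ and $ together force the whole string to match.
my @exact = grep { /^fred$/ } @words;

# \b after fred requires the word to end there.
my @ending = grep { /fred\b/ } @words;

# \b before and \B after: fred must start a word and continue.
my @prefix = grep { /\bfred\B/ } @words;

print "@exact | @ending | @prefix\n";   # prints "fred | fred alfred | frederick"
```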
34More on matching operators
- Additional flags for the matching operator
- /<pattern>/i ignores case differences
- /fred/i matches FRED, fred, Fred, FreD, etc.
- /<pattern>/s treats the string as a single line
- /<pattern>/m treats the string as multiple lines
35More on Matching Operators (cont.)
- ( and ) can be used in patterns to remember matches
- Special variables $1, $2, $3 ... can be used to access these matches
- For example
- $string = "Hello World!";
- if ($string =~ /(\w+) (\w+)/)
- {
- # prints Hello World
- print "$1 $2\n";
- }
36More on Matching Operators (cont.)
- Alternatively
- $string = "Hello World!";
- ($first,$second) = ($string =~ /(\w+) (\w+)/);
- print "$first $second\n"; # prints Hello World
- Line 2: remember that =~ returns a value just like a function. Normally it returns a true or false value, but in this case (a list assignment) the presence of ( and ) makes it return the values matched by the parenthesized parts of the pattern
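A small sketch of the list-context behaviour just described, including what happens when the match fails. The input strings and variable names are assumptions made for illustration.

```perl
use strict;
use warnings;

my $line = "name=Randal";

# In list context a successful match returns the captured groups.
my ($key, $value) = ($line =~ /^(\w+)=(\w+)$/);

# A failed match returns an empty list, so the count is 0.
my @none = ("no separator here" =~ /^(\w+)=(\w+)$/);
my $none_count = scalar @none;

print "$key -> $value ($none_count)\n";   # prints "name -> Randal (0)"
```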
37Substitution
- Replacement of patterns in a string
- s/<pattern to search>/<replacement>/ig
- i indicates case-insensitive matching
- g enables the substitution to be performed more than once
- Examples
- $which = "this this this";
- $which =~ s/this/that/; # produces that this this
38Substitution (cont.)
- With $which reset to "this this this" before each statement:
- $which =~ s/this/that/g; # produces that that that
- $which =~ s/THIS/that/i; # produces that this this
- $which =~ s/THIS/that/ig; # produces that that that
- Multipliers, anchors and memory operators can be used as well
- $string = "This is a string";
- $string =~ s/^/So /; # So This is a string
- $string =~ s/(\w{1,})/I think $1/; # applied to the original string: I think This is a string
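Two further details of s/// may be worth a sketch: the operator returns the number of substitutions it made, and the memory variables $1 and $2 can be reordered in the replacement. The $name string is made up for the example.

```perl
use strict;
use warnings;

my $which = "this this this";

# With g, s/// returns how many replacements were made.
my $count = ($which =~ s/this/that/g);

# Memory variables can be used in any order in the replacement.
my $name = "Wall Larry";
$name =~ s/(\w+) (\w+)/$2 $1/;

print "$count $which / $name\n";   # prints "3 that that that / Larry Wall"
```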
39Split and Join Functions
- Syntax
- <return value(s)> = split(/<pattern>/,<variable>)
- <return value> = join(<separator>,<array>)
- Examples
- $string = "This is a string";
- @words = split(/ /,$string); # splits the string into separate words
- @words = split(/\s/,$string); # same as above
- $string = join(" ",@words); # This is a string
- Great functions for parsing formatted documents
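As a hedged illustration of split and join on formatted data: the colon-separated record and the doubly spaced string below are invented. The second half shows a case the slide does not cover, where splitting on a single space leaves empty fields and /\s+/ does not.

```perl
use strict;
use warnings;

# A colon-separated record, split into fields and rejoined.
my $record = "camel:llama:alpaca";
my @fields = split(/:/, $record);
my $joined = join(", ", @fields);

# With runs of spaces, / / produces empty fields; /\s+/ skips the runs.
my $spaced = "This  is a   string";
my @naive  = split(/ /, $spaced);
my @words  = split(/\s+/, $spaced);

print "$joined\n";                                 # prints "camel, llama, alpaca"
print scalar(@naive), " ", scalar(@words), "\n";   # prints "7 4"
```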
40Functions
- Automate certain tasks
- Syntax
- sub <name>
- {
- <statements>
- }
- Global to the current package. Since we are not doing OOP and packages, functions are global to the whole program
41Functions (cont.)
- Example
- sub say_hello
- {
- print "Hello world!\n";
- }
- Invoking a function
- say_hello(); # the parentheses can take parameters
- &say_hello; # no parameters
42Functions (cont.)
- Return values
- Two types of functions: void functions (also known as routines or procedures), and functions
- void functions have no return values
- Functions in Perl can return more than one variable
- sub threeVar
- {
- return ($a, $b, $c); # returns a list of 3 variables
- }
43Functions (cont.)
- ($one,$two,$three) = threeVar();
- Alternatively
- @list = threeVar(); # stores the three values into a list
- Note
- ($one, @two, $three) = threeVar(); # $three will not have any value, why?
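The question in the note can be explored with a short sketch. The threeVar body here is a hypothetical stand-in that simply returns (1, 2, 3).

```perl
use strict;
use warnings;

# Hypothetical stand-in for the threeVar function on the slide.
sub threeVar {
    return (1, 2, 3);
}

# An array in a list assignment takes every remaining value,
# so nothing is left over for $three.
my ($one, @two, $three) = threeVar();

print "one=$one two=@two three=", (defined $three ? $three : "undef"), "\n";
# prints "one=1 two=2 3 three=undef"
```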
44Functions (cont.)
- Functions can't do much without parameters
- Parameters to a function are stored as a list in the @_ variable
- Example
- sub say_hello_two
- {
- ($string) = @_; # gets the value of the parameter
- }
- Invocation
- say_hello_two("hello world!\n");
45Functions (cont.)
- For example
- sub add
- {
- ($left,$right) = @_;
- return $left + $right;
- }
- $three = add(1,2);
46Functions (cont.)
- Variables are all global even if they are defined within a function
- The my keyword defines a variable as being private to the scope in which it is defined
- For example
- sub add
- {
- my($left,$right) = @_;
- return $left + $right;
- }
47Functions (cont.)
- $three = add(1,2); # $three gets the value of 3
- print "$one\n"; # prints 0
- print "$two\n"; # prints 0
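The effect of my can be sketched by comparing two versions of add side by side. The function names add_global and add_private, and the "outer" starting values, are made up for this illustration. The globals are declared with our so the script runs under use strict.

```perl
use strict;
use warnings;

our ($left, $right) = ("outer", "outer");   # package (global) variables

# Without my, the assignment overwrites the globals.
sub add_global {
    ($left, $right) = @_;
    return $left + $right;
}

# With my, the function works on its own private copies.
sub add_private {
    my ($left, $right) = @_;
    return $left + $right;
}

my $sum_private   = add_private(1, 2);
my $after_private = $left;    # the global is untouched

my $sum_global   = add_global(1, 2);
my $after_global = $left;     # the global has been overwritten

print "$sum_private $after_private $sum_global $after_global\n";
# prints "3 outer 3 1"
```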
48Exercises
- A trim() function that removes leading and trailing spaces in a string
- Hint: use the s/// operator in conjunction with anchors
- A date() function that converts a date string in DDMMYY format to the form "13th of December, 2003"
- Hint: use a hash table to create a lookup table for the month strings.
49File I/O
- Filehandle
- Automatic filehandles: STDIN, STDOUT and STDERR
- Syntax
- open(<handle name>,"(<|>|>>)filename");
- close(<handle name>);
- Example
- open(INPUTFILE,"<inputs.txt"); # opens file handle
- ...
- close(INPUTFILE); # closes file handle
50File I/O (cont.)
- Opening a filehandle does not always succeed
- Check the return value of the open function
- Example
- if (open(INPUT,"<inputs.txt")) {
- # do something
- } else {
- print "File open failed\n";
- }
51File I/O (cont.)
- The previous method is the standard practice
- Unlike other languages, Perl is for lazy people
- Ifs can be simplified by the logical || operator
- For example
- open(INPUT,"<inputs.txt") || die "File open failed\n";
- Use the $! variable to display additional operating system errors
- die "cannot append: $!\n";
52File I/O (cont.)
- Filehandles are similar to standard I/O handles
- Use the <> operator to read lines
- For example
- open(INPUT,"<inputs.txt");
- while (<INPUT>) {
- chomp;
- print "$_\n";
- }
- Use print <handle_name> <strings> to output to a file
53File I/O (cont.)
- File copy example
- open(IN,$a) || die "cannot open $a for reading: $!";
- open(OUT,">$b") || die "cannot create $b: $!";
- while (<IN>) { # read a line from file $a into $_
- print OUT $_; # print that line to file $b
- }
- close(IN) || die "can't close $a: $!";
- close(OUT) || die "can't close $b: $!";
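The copy loop above can be tried without touching real files. The sketch below is a variation, not the slide's own code: it uses the three-argument form of open with lexical filehandles (a later Perl idiom) and the core File::Temp module to create a scratch directory. The file names and contents are invented.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory that is removed when the script exits.
my $dir = tempdir(CLEANUP => 1);
my ($src, $dst) = ("$dir/a.txt", "$dir/b.txt");

# Create a small source file to copy.
open(my $out, ">", $src) || die "cannot create $src: $!";
print $out "line one\nline two\n";
close($out) || die "can't close $src: $!";

# Copy it line by line, counting the lines as we go.
open(my $in, "<", $src)   || die "cannot open $src for reading: $!";
open(my $copy, ">", $dst) || die "cannot create $dst: $!";
my $lines = 0;
while (my $line = <$in>) {
    print $copy $line;
    $lines++;
}
close($in)   || die "can't close $src: $!";
close($copy) || die "can't close $dst: $!";

my $src_size = -s $src;   # -s gives the file size in bytes
my $dst_size = -s $dst;
print "copied $lines lines\n";   # prints "copied 2 lines"
```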
54File I/O (cont.)
- File tests provide convenience for programmers
- -e -r -w -x -d -f -l -T -B
- For example
- if (-f $name) {
- print "$name is a file\n";
- }
- elsif (-d $name) {
- print "$name is a directory\n";
- }
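A minimal sketch that wraps the file tests in a function, adding an -e check first so that missing paths are reported separately. The function name describe and the example paths are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: classify a path using the file test operators.
sub describe {
    my ($name) = @_;
    return "$name does not exist" unless -e $name;
    return "$name is a directory" if -d $name;
    return "$name is a file"      if -f $name;
    return "$name is something else";
}

print describe("."), "\n";                    # the current directory
print describe("no/such/path/here"), "\n";    # assumed not to exist
```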
55Special Variables
- $_, @_
- $1, $2 ... - backreferencing variables
- $_ = "this is a test";
- /(\w+)\W+(\w+)/; # $1 is "this" and $2 is "is"
- $&, $` and $' - match variables
- $string = "this is a sample string";
- $string =~ /sa.*le/; # $& is now "sample", $` is "this is a " and $' is " string"
- And many more; refer to ActivePerl's online documentation for more functions
56Packages and Modules
- Concentrate only on their usage in the Greenstone environment
- Package: a mechanism to protect code from tampering with other packages' variables
- Module: a reusable package that is stored in <Name of Module>.pm
- The ppm (Perl Package Manager) for the Linux and Win32 versions of Perl manages installation and uninstallation of Perl packages
57Packages and Modules (cont.)
- Install the module and put use ModuleName or require ModuleName near the top of the program
- The :: qualifying operator allows references to things in the package, such as $Module::Variable
- So use Math::Complex refers to the module Math/Complex.pm
- new creates an instance of the object, then use the handle and the -> operator to access its functions
58Packages and Modules (cont.)
- use accepts a list of strings as well, so that we can access the elements directly without the qualifying operator
- For example
- use Module qw(const1 const2 func1 func2 func3);
- const1, const2, func1, func2 and func3 can now be used directly in the program
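As a concrete sketch of an import list, the example below uses List::Util, a module that ships with Perl. It is chosen here only for illustration and is not mentioned in the slides. Names in the qw list can be called directly, while anything else in the module still needs the :: qualifier.

```perl
use strict;
use warnings;
use List::Util qw(sum max);   # import only sum and max

my @y = (1, 2, 4, 8);

# Imported names can be used without a qualifier.
my $total   = sum(@y);
my $largest = max(@y);

# min was not imported, so it needs the full Module::name form.
my $smallest = List::Util::min(@y);

print "$total $largest $smallest\n";   # prints "15 8 1"
```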
59Packages and Modules (cont.)
- Perl locates modules by searching the @INC array
- The first instance found will be used for the module referenced within a program
- Locating modules is an automatic process, as the Makefiles and PPM take care to place modules in the correct path
60Packages and Modules (cont.)
- An example that uses the package CGI.pm
- use CGI; # uses the CGI.pm module
- $query = CGI->new(); # creates an instance of CGI
- $bday = $query->param("birthday"); # gets a named parameter
- print $query->header(); # outputs html header
- print $query->p("Your birthday is $bday."); # outputs text to html
61Packages and Modules (cont.)
- Advantages: encourages code reuse and less work
- Disadvantages: 33% as fast as procedural Perl according to the book Object Oriented Perl; generation of Perl modules involves some ugly code
62Packages and Modules (cont.)