Title: Lecture 2. Perl
1Lecture 2. Perl
2Perl
- Practical Extraction and Report Language
3What is Perl ?
- Created by Larry Wall
- Was trying to create a Usenet-news-like hierarchy of files
- awk could not handle it
- Decided to create a general-purpose tool
- Released Dec 1987.
4Perl growth
- Grew rapidly in features
- Grew in portability
- More than 1,000 pages of documentation
- 600 page reference book
- Newsgroup with 200,000 subscribers
5What is perl?
What is the sound of perl? Is it not the sound
of a wall that people have stopped banging their
heads against? -- Larry Wall.
6What is Perl ? The language that saved the human
genome project!!
- The human genome needs 3 Gb of space just to be stored!
- Huge amount of sequences that need to be assembled!
- 1-10 terabytes of information altogether!!!
- Reading: http://www.stanford.edu/class/gene211/handouts/How_Perl_HGP.html
7Perl Pros and Cons
- Uses
- Shell scripts
- Good at
- Text processing
- Small/Medium sized projects
- Quick and dirty solutions
- Portability (to a certain degree)
- Very useful and powerful regular expressions and
string manipulations.
8Perl Pros and Cons
- Bad at
- Efficient large scale computation
- Neat code!
9Basic concepts
- Perl is an interpreted language
- A Perl program is a text file, e.g. script.pl
- Execute: perl script.pl
[Diagram: a text editor produces the Perl script; the Perl interpreter parses the file and executes the commands, reading inputs and producing outputs]
10perl program
- File must indicate that it is a perl program, not a shell program
- #!/usr/bin/perl as the first line of the file
11perl syntax
- Free format language
- Whitespace: space, tab, newlines, returns.
- # sign to end of line: comment
12Simple perl program
- #!/usr/bin/perl -w
- print "try out perl \n";
- Line 1 - Indicates a perl program
- Line 1 is special - looks for optional arguments like -w
- -w: warnings - always use it
13Running the sample program
- Create a text file with the two lines
- Save the file, e.g. test.pl
- Then
- perl test.pl
- The program will be executed
14Perl Data Types
- 3 types
- Scalar
- A single number, string or reference.
- $name = "jelly"; $number = 2;
- List (or array) of scalars
- A collection of scalar data types.
- @name = ("john", "mike", "tom");
- Hash (of scalars): associative arrays
- Pairs of scalars, accessed by keys.
- %hash = ( color => "red",
-           size => 5 );
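The three data types can be tried together in one short script. This is only a minimal sketch: the variable names and values are illustrative, and the array is called @names here to keep it apart from the scalar $name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Scalar: a single number or string
my $name   = "jelly";
my $number = 2;

# Array: an ordered list of scalars
my @names = ("john", "mike", "tom");

# Hash: key-value pairs of scalars
my %hash = (color => "red", size => 5);

print "$name $number\n";            # scalars interpolate in double quotes
print "second name: $names[1]\n";   # array elements are indexed from 0
print "color: $hash{color}\n";      # hash values are looked up by key
```

Note that a single element of an array or hash is itself a scalar, so it is written with $ rather than @ or %.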
15Operation of data types
- Scalar operation
- numbers: same as in other languages
- $a = 2;
- +, -, *, /
- $a ** $b: exponentiation (a^b)
- $a % $b: remainder of a / b
16Operation of data types
- Scalar operation
- strings
- concatenate: $a = $b . $c;
- substitution: $a =~ s/T/U/g;
- reverse: $string = reverse($string);
- translation: $str =~ tr/ACGT/UGCA/;
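The scalar operations above can be checked with a few lines. The sketch below uses made-up values (7, 2 and the sequence "ATGC") and copies the string before each in-place operation so the original stays unchanged.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Numeric operators
my $x = 7;
my $y = 2;
my $power     = $x ** $y;    # exponentiation
my $remainder = $x % $y;     # remainder of 7 / 2

# String operators
my $dna    = "ATGC";
my $joined = $dna . "TTT";   # concatenation

my $rna = $dna;
$rna =~ s/T/U/g;             # substitute every T with U

my $reversed = reverse($dna);  # scalar context: reverses the characters

my $translated = $dna;
$translated =~ tr/ACGT/UGCA/;  # map A->U, C->G, G->C, T->A in one pass

print "$power $remainder $joined $rna $reversed $translated\n";
```

s/// replaces a pattern, while tr/// maps single characters one-to-one, which is why tr is the natural choice for complementing a sequence.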
17Example program
- #!/usr/bin/perl -w
- print "Please enter your name: ";
- $name = <STDIN>;
- chomp ($name);
- print "you are $name! \n";
18Comparison operators
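Perl has separate comparison operators for numbers (==, !=, <, >, <=, >=, <=>) and for strings (eq, ne, lt, gt, le, ge, cmp). A minimal sketch with made-up values shows why the distinction matters:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Numeric comparison: 10 is greater than 9
my $numeric_greater = (10 > 9) ? "yes" : "no";

# String comparison: "10" sorts before "9" because "1" comes before "9"
my $string_less = ("10" lt "9") ? "yes" : "no";

# Three-way comparisons return -1, 0 or 1
my $cmp_num = 10 <=> 9;
my $cmp_str = "10" cmp "9";

print "10 > 9: $numeric_greater\n";
print "'10' lt '9': $string_less\n";
print "<=> gives $cmp_num, cmp gives $cmp_str\n";
```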
19List/Array and access elements
- Array variable names start with @
- @words = ("try", "to", "ha");
- @words = qw(try to ha);
- $words[0] is try
- $words[1] is to
- $i = 2;
- What is $words[$i]?
20Array element access several functions
- foreach, pop, push, shift, unshift.
- foreach: access each element in an array in order
- pop, push: operate on the end of an array
- shift, unshift: operate at the beginning of an array.
- @array = (1..10);
- foreach $element (@array)
- {
- print $element, "\n";
- }
- push (@array, 11);
- $num = pop (@array);
- $num = shift (@array);
- $num = unshift (@array, 0..5);
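To see what each array function returns and how it changes the array, the calls can be run in sequence. This sketch uses a shorter array (1..5) than the slide so the states are easy to follow; the variable names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @array = (1..5);

# foreach visits every element in order
foreach my $element (@array) {
    print $element, "\n";
}

push(@array, 6);                    # append at the end: 1 2 3 4 5 6
my $last  = pop(@array);            # remove from the end, returns 6
my $first = shift(@array);          # remove from the front, returns 1
my $count = unshift(@array, 0, 1);  # prepend; returns the new element count

print "last=$last first=$first count=$count array=@array\n";
```

pop and shift return the removed element, whereas unshift (like push) returns the new number of elements in the array.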
21Perl Hashes
22Hash data type
- Associative arrays.
- Each entry is a key-value pair.
- E.g. a hash that stores genes and their locations on the chromosome
- %geneid_location = (AF0001 => 1200, AF0129 => 12450);
- %id_name: student_id -> student_name
23Hash properties
- Elements are stored in Perl's internal order, not the order in which you assigned them to the hash.
- Keys must be unique!!
- Values can have duplicates.
24New elements
- $stuff{"aaa"} = "bbb";
- Creates key "aaa" with the value "bbb"
- $stuff{234.5} = 456.7;
- Creates key 234.5 with the value 456.7
- print $stuff{"aaa"};
- Prints bbb
- $stuff{234.5} += 3; makes it 459.7
25Assigning values
- @ray_list = %stuff;
- @ray_list gets ("aaa", "bbb", 234.5, 456.7) (the order of the pairs may vary)
- %copy = %original;
- Copies from %original to %copy
26Hash element access
- To access an element, use $hash{$key}
- e.g.
- %geneid_location = (AF0001 => 1200, AF0129 => 12450);
- $where = $geneid_location{AF0001};
- $geneid_location{AF3224} = 21000;
- Hash elements are stored in Perl's own order.
27Hash element access
- keys and values functions
- keys: results in a list of all current keys in the hash
- values: results in a list of all the values in the hash
- %geneid_location = (AF0001 => 1200, AF0129 => 12450, AF3224 => 21000);
- @geneid = keys %geneid_location;
- @location = values %geneid_location;
28Hash element access
- each function: iterates over the entire hash.
- while (($key, $value) = each %geneid_location)
- {
- print $key, " => ", $value, "\n";
- }
29Hash operation
- delete function: removes the key-value pair
- delete $geneid_location{AF0001};
- exists function: checks whether a certain key exists.
- if (exists $geneid_location{AF0001})
- {
- print "gene AF0001 exists\n";
- }
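The hash operations from the last few slides fit into one small script. This is a sketch using the slide's gene IDs; sorting the keys is an addition here, made only so the output order is reproducible, since a hash has no guaranteed order.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %geneid_location = (AF0001 => 1200, AF0129 => 12450);

$geneid_location{AF3224} = 21000;          # add a new key-value pair
my $where = $geneid_location{AF0001};      # look up a value by key

# keys returns the keys in Perl's internal order, so sort them for display
foreach my $id (sort keys %geneid_location) {
    print "$id => $geneid_location{$id}\n";
}

delete $geneid_location{AF0001};           # remove one pair

my $still_there = exists $geneid_location{AF0001} ? "yes" : "no";
my $remaining   = keys %geneid_location;   # scalar context: number of keys

print "AF0001 present: $still_there, pairs left: $remaining\n";
```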
30Control statements
- if else
- if ($name eq "Ray")
- {
- print "Hello Ray\n";
- }
- else
- {
- print "what is your name?\n";
- }
- if ... elsif ... elsif ... else
31Boolean values
- False: undef, 0, the empty string "", the string "0"
- True: all other numeric values, all other strings
32Control statements
- unless
- e.g.
- unless ($change >= 0)
- {
- print "change cannot be a negative value!\n";
- }
- while: a loop that runs zero or more times.
- for: e.g. for ($j = 0; $j < 5; $j++) { statements }
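The truth rules and the control statements can be exercised together. The following is a minimal sketch; the test values and variable names are chosen for illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Check which values Perl treats as true or false
my @results;
foreach my $value (undef, 0, "", "0", 1, "0.0", "abc") {
    my $label = defined $value ? "'$value'" : "undef";
    push @results, $label . ($value ? " is true" : " is false");
}
print "$_\n" for @results;

# unless runs its block only when the condition is false
my $change = -3;
unless ($change >= 0) {
    print "change cannot be a negative value!\n";
}

# for loop: sum the numbers 0..4
my $sum = 0;
for (my $j = 0; $j < 5; $j++) {
    $sum += $j;
}

# while loop: count down until the condition becomes false
my $k = 3;
while ($k > 0) {
    $k--;
}
print "sum=$sum k=$k\n";
```

The string "0.0" is true even though it is numerically zero, because only the empty string and the exact string "0" count as false strings.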
33subroutines
#!/usr/bin/perl -w
$dna1 = "ggg";
$long1 = addACGT($dna1);
print "$long1\n";
sub addACGT {
    my ($dna) = @_;
    $dna .= "ACGT";
    return $dna;
}
34subroutines
- my restricts the scope of the variable to the body of the subroutine.
- The values that are passed to the subroutine are stored in @_
- $dna1 -> subroutine -> @_
35subroutines
e.g. call: addon($dna1, $dna2, $dna3);
sub addon {
    my ($s1, $s2, $s3) = @_;
    ...
}
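A runnable sketch of both subroutines follows. The body of addon is not given on the slide, so joining the three strings is an assumption made purely for illustration; the input values are also made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Append ACGT to a copy of the argument and return the result
sub addACGT {
    my ($dna) = @_;      # copy the first argument into a private variable
    $dna .= "ACGT";      # changes only the private copy
    return $dna;
}

# Hypothetical body: join the three arguments into one string
sub addon {
    my ($s1, $s2, $s3) = @_;
    return $s1 . $s2 . $s3;
}

my $dna1  = "ggg";
my $long1 = addACGT($dna1);
my $all   = addon($dna1, "ccc", "ttt");

print "$dna1 $long1 $all\n";
```

Because addACGT works on a my copy, $dna1 still holds "ggg" after the call.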
36Subroutine and references
- Arguments are usually passed by value: the subroutine copies them out of @_ into my variables.
- Pass by reference
- Reference holds the address of the variable it
points to.
37references
- $a_ref = \$a;
- $array_ref = \@array;
- $hash_ref = \%hash;
- To use the variable referenced, we need to dereference the reference.
- Prepend $, @, or % to the reference.
- $$a_ref, @$array_ref, %$hash_ref
38example
- @i = (1..10);
- @j = ("a", "b", "z");
- reference_sub(\@i, \@j);
- sub reference_sub
- {
- my ($i, $j) = @_;
- print "first @$i\n";
- push (@$j, "s");
- }
- print "@j";
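The effect of passing references can be verified by checking the caller's array after the call. This sketch follows the slide's example with a shorter first array; the parameter names $i_ref and $j_ref and the hash %length_of are illustrative additions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @i = (1..3);
my @j = ("a", "b", "z");
my %length_of = (AF0001 => 1200);

sub reference_sub {
    my ($i_ref, $j_ref) = @_;    # each argument is a reference to an array
    print "first: @$i_ref\n";    # dereference with @ to read the whole array
    push(@$j_ref, "s");          # modifies the caller's @j through the reference
}

reference_sub(\@i, \@j);
print "@j\n";

# Hash references work the same way
my $hash_ref = \%length_of;
my $location = $$hash_ref{AF0001};   # equivalent to $hash_ref->{AF0001}
print "$location\n";
```

Without references, the two arrays would be flattened into one list in @_, and the subroutine could not tell where the first array ends.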
39PERL Continuation
40Regular Expressions
- A regular expression is a pattern that describes a string, or a whole family of strings via wildcards.
- Matching a regular expression against a string either succeeds or fails.
41Simple use of regular expressions
- grep command
- One of Unix's most useful tools.
- Used for searching with regular expressions.
42Using grep
- Find all lines in a file containing the string abc.
- grep abc somefile
- abc is the regular expression tested against each input line.
- Matching lines are sent to standard output.
43Perl regular expression
- Regular expressions in perl are enclosed in slashes, / /.
- Use the match operator =~
44Regular expressions
- if ($name =~ /Bell/)
- {
- print ("string matches");
- }
45Ignore case option
- Small i after the closing slash: ignore case
- / /i
- To reject Belle, add a word boundary \b
- \b ensures no other letter follows the pattern.
46Revised string
- /^Bell\b/i
- Bell at beginning of string
- No letter or digit following
- OK in either upper or lower case
- Matches bell, Bell, etc. but not belle
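The revised pattern can be tried against a few candidate strings. A minimal sketch, with the test names invented for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @names = ("Bell", "bell labs", "Belle", "Mr. Bell");
my @matched;

foreach my $name (@names) {
    # ^ anchors at the start, \b requires a word boundary, /i ignores case
    if ($name =~ /^Bell\b/i) {
        push @matched, $name;
        print "'$name' matches\n";
    }
    else {
        print "'$name' does not match\n";
    }
}
```

"Belle" fails because a letter follows "Bell", and "Mr. Bell" fails because the match must start at the beginning of the string.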
47Regular expression
- Regular expressions build patterns based on 3 fundamental ideas
- Repetition
- Alternation
- Concatenation
48Regular expression
- /fossil/ matches fossil and fossiliferous
- To increase the power of regex, use metacharacters
- Repetition: +, *, ?, {}
- + one or more times
- * zero or more times
- ? zero or one time
- {2,4} occurs between 2 and 4 times
49Regular expression
- metacharacters
- Alternation
- Alternatives are separated by | (or)
- $dna = "ATGCGA";
- $dna =~ s/A|G|C|T/2/g;
- print $dna, "\n";
50Regular expression
- metacharacters
- parentheses (): grouping
- Use to group characters.
- /a(bat)+/ matches abat, abatbat, abatc, etc.
- Character class
- [abc] matches any one of the chars in the square brackets.
51Regular expression
52Regular expression
- Pattern anchors
- ^ matches the beginning of the string
- $ matches the end
- /^where/ matches any string that starts with where
- /world$/ matches any string that ends with world
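Each metacharacter can be checked with a one-line match. The sketch below uses invented test strings and anchors most patterns with ^ and $ so that the whole string has to fit the pattern.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Repetition of a group: a followed by one or more "bat"
my $group  = ("abatbat" =~ /^a(bat)+$/) ? 1 : 0;

# Counted repetition: between 2 and 4 As
my $three  = ("AAA"   =~ /^A{2,4}$/) ? 1 : 0;
my $five   = ("AAAAA" =~ /^A{2,4}$/) ? 1 : 0;

# Alternation: either word
my $either = ("dog" =~ /^(cat|dog)$/) ? 1 : 0;

# Character class: any one of a, b, c
my $class  = ("b" =~ /^[abc]$/) ? 1 : 0;

# Anchors: start and end of the string
my $starts = ("where is it"  =~ /^where/) ? 1 : 0;
my $ends   = ("hello world"  =~ /world$/) ? 1 : 0;

print "group=$group three=$three five=$five either=$either ",
      "class=$class starts=$starts ends=$ends\n";
```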
53Substitute operator
- Find a regular expression and replace it with a string
- s/form1/form2/ substitutes form1 with form2.
- $dna = "ATCttG";
- $dna =~ s/T/U/ig; substitutes all Ts with Us
- i and g are pattern modifiers.
- i: ignore case
- g: globally
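The effect of the two modifiers is easiest to see by running the substitution with and without i. A short sketch using the slide's sequence; the variable names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $dna = "ATCttG";

# g only: replaces every uppercase T, leaves lowercase t alone
my $rna_g = $dna;
$rna_g =~ s/T/U/g;

# i and g: replaces every T regardless of case
my $rna_ig   = $dna;
my $replaced = ($rna_ig =~ s/T/U/ig);   # s/// returns the number of substitutions

print "$rna_g $rna_ig ($replaced substitutions)\n";
```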
54Using files for input/output
- Consider a file protein.seq which contains the following data:
- MNIDDKL
- SVLQ
- GLQVLL
- Problem: read in the file.
55Using files for input/output
- #!/usr/bin/perl -w
- $file = "protein.seq";
- # open the file and associate a filehandle.
- open (IN, $file);
- # read the first line from the file into a variable
- $protein = <IN>;
- print $protein, "\n";
- # read the second line
- $protein = <IN>;
- print $protein, "\n";
- close IN;
56Using files for input/output
- Several standard predefined file handles
- STDIN
- STDOUT
- STDERR
- e.g.
- $protein = <STDIN>;
- chomp $protein;
- print $protein, "\n";
- Perl will wait for the user to type input on the command line.
57Writing output to a file
- # file output
- open (OUT, ">$file");
- print OUT "$protein\n";
- # print to console
- print "$protein\n";
- # the same as above
- print STDOUT "$protein\n";
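Writing and reading can be combined into a round trip: write the three protein lines to a file, then read them back in a loop. This sketch departs from the slides in a few labeled ways: it uses lexical filehandles with three-argument open (a common modern style, not the slides' bareword handles), checks each open with "or die", and writes to a temporary file from the core File::Temp module instead of protein.seq.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a temporary file name (assumption: stands in for protein.seq)
my ($tmp_fh, $file) = tempfile(UNLINK => 1);
close $tmp_fh;

# Write: '>' opens the file for output
open(my $out, '>', $file) or die "Cannot write $file: $!";
foreach my $seq ("MNIDDKL", "SVLQ", "GLQVLL") {
    print $out "$seq\n";
}
close $out;

# Read: loop over the file one line at a time
open(my $in, '<', $file) or die "Cannot read $file: $!";
my @proteins;
while (my $line = <$in>) {
    chomp $line;              # remove the trailing newline
    push @proteins, $line;
}
close $in;

print "$_\n" for @proteins;
```

Reading in a while loop avoids repeating the read statement for every line, so the same code handles a file of any length.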
58Command line arguments
- @ARGV holds the command-line arguments.
- e.g. test.pl:
- #!/usr/bin/perl -w
- unless (@ARGV)
- {
- print "usage: test.pl argument\n";
- exit;
- }
- $input = $ARGV[0];
- print "$input\n";
- Run: perl test.pl 5
59miscellaneous about perl
- use strict;
- Requires all variables to be declared with my before using them.
- my $variable = 3;
- A good habit: always use strict to start your program.
- This will prevent mistakes such as typos
60Perl process management
- To launch a process from a perl script, use system
- e.g.
- system "date";
- system "ls";
- Commands will be executed by the unix shell.
- Perl will wait for the process to terminate before it continues.
- The new process is a child of the shell.
61Capture the system output
- Use `command` (backticks)
- @files = `ls /home/zhang`;
- @files holds all the output of the ls command.
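The difference between system and backticks is in what comes back to the script. The sketch below assumes a Unix-like shell where the echo command is available; the commands are chosen only because their output is predictable.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# system runs the command and returns its exit status (0 means success);
# the command's output goes straight to the terminal
my $status = system("echo hello from system");

# Backticks capture the output instead.
# In list context each line of output becomes one element.
my @lines = `echo one; echo two`;
chomp @lines;

# In scalar context the whole output is returned as a single string
my $text = `echo three`;

print "status=$status lines=@lines text=$text";
```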
62Context dependent scalar or list?
- Perl operations behave differently depending on the context.
- e.g.
- @array = (1..10);
- $b = @array; $b is now ?
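Running the assignment shows how the same array behaves in different contexts. In this sketch the scalar is named $count rather than $b, a choice made here because $b is also a special variable used by sort.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @array = (1..10);

my $count      = @array;     # scalar context: the number of elements
my ($first)    = @array;     # list context: the first element
my $last_index = $#array;    # index of the last element

# String concatenation also puts @array in scalar context
my $message = "There are " . @array . " elements";

print "$count $first $last_index\n$message\n";
```

The parentheses around $first are what switch the assignment from scalar to list context.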
63An example perl script
- #!/usr/bin/perl -w
- # count Gs in command-line DNA
- use strict;
- my ($usage) = "$0 DNA\n\n";
- unless (@ARGV)
- {
- print $usage;
- exit;
- }
- my ($dna) = $ARGV[0];
- my ($num_of_g) = countG($dna);
- print "$num_of_g\n";
- exit;
- sub countG
- {
- my ($dna) = @_;
- my ($count) = 0;
- $count++ while ($dna =~ /g/ig);
- return $count;
- }
65
# Here are the arrays of parts of sentences
my @nouns = (
    'Dad', 'TV', 'Mom', 'Groucho', 'Rebecca',
    'Harpo', 'Robin Hood', 'Joe and Moe',
);
66
my @verbs = (
    'ran to', 'giggled with',
    'put hot sauce into the orange juice of',
    'exploded', 'dissolved', 'sang stupid songs with',
    'jumped with',
);
67
my @prepositions = (
    'at the store', 'over the rainbow',
    'just for the fun of it', 'at the beach',
    'before dinner', 'in New York City',
    'in a dream', 'around the world',
);
68
# Seed the random number generator.
# time|$$ combines the current time with the current process id
# in a somewhat weak attempt to come up with a random seed.
srand(time|$$);
# This do-until loop composes six-sentence "stories"
# until the user types "quit".
do {
    $story = '';
69
    for ($count = 0; $count < 6; $count++) {
        $sentence = $nouns[int(rand(scalar @nouns))]
                  . " "
                  . $verbs[int(rand(scalar @verbs))]
                  . " "
                  . $nouns[int(rand(scalar @nouns))]
                  . " "
                  . $prepositions[int(rand(scalar @prepositions))]
                  . '. ';
        $story .= $sentence;
    }
70
    print "\n",$story,"\n";
    # Get user input.
    print "\nType \"quit\" to quit, or press Enter to continue: ";
    $input = <STDIN>;
# Exit loop at user's request
} until($input =~ /^\s*q/i);
exit;
71Perl resource
- Online perl series
- Learning Perl
- Perl Cookbook
- www.perl.com
- The Perl Bible
- Perl in a Nutshell (O'Reilly)