Lecture 2. Perl - PowerPoint PPT Presentation


Slides: 72
Provided by: liqing4
Learn more at: https://people.cs.vt.edu

Transcript and Presenter's Notes



1
Lecture 2. Perl
2
Perl
  • Practical Extraction and Report Language

3
What is Perl?
  • Created by Larry Wall
  • He needed to process a Usenet-news-like hierarchy of files
    - awk could not handle it
  • Decided to create a general-purpose tool
  • Released in December 1987

4
Perl growth
  • Grew rapidly in features
  • Grew in portability
  • More than 1,000 pages of documentation
  • 600 page reference book
  • Newsgroup with 200,000 subscribers

5
What is Perl?
"What is the sound of Perl? Is it not the sound of a wall that people have stopped banging their heads against?" -- Larry Wall
6
What is Perl? The language that saved the Human Genome Project!
  • The human genome needs 3 Gb of space just to store!
  • A huge number of sequences need to be assembled!
  • 1-10 terabytes of information altogether!
  • Reading: http://www.stanford.edu/class/gene211/handouts/How_Perl_HGP.html

7
Perl Pros and Cons
  • Uses
    - Shell scripts
  • Good at
    - Text processing
    - Small/medium-sized projects
    - Quick and dirty solutions
    - Portability (to a certain degree)
    - Very useful and powerful regular expressions and string manipulations

8
Perl Pros and Cons
  • Bad at
    - Efficient large-scale computation
    - Neat code!

9
Basic concepts
  • Perl is an interpreted language
  • A Perl program is a text file, e.g. script.pl
  • Execute it with: perl script.pl

[Figure: text editor -> Perl script -> Perl interpreter, which parses the file and executes the commands, turning inputs into outputs]
10
Perl program
  • The file must indicate that it is a Perl program, not a shell program
  • #!/usr/bin/perl as the first line of the file

11
Perl syntax
  • Free-format language
  • Whitespace: spaces, tabs, newlines, returns
  • The # sign starts a comment that runs to the end of the line

12
Simple Perl program
  • #!/usr/bin/perl -w
  • print "try out perl\n";
  • Line 1 indicates a Perl program
  • Line 1 is special: Perl also reads optional switches from it, like -w
  • -w turns on warnings; always use it

13
Running the sample program
  • Create a text file with the two lines
  • Save the file, e.g. as test.pl
  • Then run: perl test.pl
  • The program will be executed

14
Perl Data Types
  • 3 types
  • Scalar
    - A single number, string or reference
    - $name = "jelly"; $number = 2;
  • List (array) of scalars
    - A collection of scalar values
    - @name = ("john", "mike", "tom");
  • Hash (of scalars), also called an associative array
    - Pairs of scalars, accessed by keys
    - %hash = (color => "red", size => 5);
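A minimal runnable sketch of the three data types. It assumes `use strict` (introduced later in the lecture), so every variable is declared with `my`; the leading `$`, `@` or `%` tells Perl which type a name refers to. The array is named `@names` here only to keep it visually distinct from the scalar.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Scalar: a single number, string or reference
my $name   = "jelly";
my $number = 2;

# Array: an ordered list of scalars
my @names = ("john", "mike", "tom");

# Hash: key/value pairs, looked up by key
my %hash = (color => "red", size => 5);

print "$name $number\n";             # jelly 2
print "$names[1]\n";                 # mike (indexes start at 0)
print "$hash{color} $hash{size}\n";  # red 5
print scalar(@names), "\n";          # 3 elements
```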

15
Operation of data types
  • Scalar operations
  • Numbers: same as in other languages
    - $a = 2;
    - +, -, *, /
    - $a ** $b: exponentiation (a to the power b)
    - $a % $b: remainder of $a / $b
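The same operators in a short script. The operands are named `$x` and `$y` rather than `$a` and `$b`, because `$a` and `$b` are special variables used by `sort`; the values are illustrative.

```perl
use strict;
use warnings;

my $x = 7;
my $y = 2;

my $sum  = $x + $y;    # 9
my $diff = $x - $y;    # 5
my $prod = $x * $y;    # 14
my $quot = $x / $y;    # 3.5 (division is not truncated to an integer)
my $pow  = $x ** $y;   # 49  (exponentiation)
my $rem  = $x % $y;    # 1   (remainder)

print "$sum $diff $prod $quot $pow $rem\n";   # 9 5 14 3.5 49 1
```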

16
Operation of data types
  • Scalar operations
  • Strings
    - concatenate: $a = $b . $c;
    - substitution: $a =~ s/T/U/g;
    - reverse a string: $rev = reverse($string);
    - translation (transliteration): $str =~ tr/ACGT/UGCA/;
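A sketch applying the four string operations to a made-up DNA string, `GATTACA`. Each operation works on a copy, so the original stays unchanged; `tr///` maps characters one for one, which is why `tr/ACGT/UGCA/` yields the RNA complement.

```perl
use strict;
use warnings;

my $dna = "GATTACA";

# Concatenation with the . operator
my $longer = $dna . "GGG";            # GATTACAGGG

# Substitution on a copy: replace every T with U
(my $rna = $dna) =~ s/T/U/g;          # GAUUACA

# reverse in scalar context reverses the characters of the string
my $rev = reverse($dna);              # ACATTAG

# tr/// maps A->U, C->G, G->C, T->A character by character
(my $comp = $dna) =~ tr/ACGT/UGCA/;   # CUAAUGU

print "$longer\n$rna\n$rev\n$comp\n";
```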

17
Example program
  • #!/usr/bin/perl -w
  • print "Please enter your name: ";
  • $name = <STDIN>;
  • chomp($name);   # remove the trailing newline
  • print "You are $name!\n";

18
Comparison operators
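The operator table on this slide did not survive the transcript. As a minimal sketch (not a reconstruction of the slide), Perl has two comparison families: numeric operators for numbers and string operators for text, and mixing them up gives surprising results.

```perl
use strict;
use warnings;

# Numeric comparisons: ==  !=  <  >  <=  >=  <=>
# String comparisons:  eq  ne  lt  gt  le  ge  cmp
my @results = (
    10 == 10.0     ? 1 : 0,   # 1: numerically equal
    "10" eq "10.0" ? 1 : 0,   # 0: different strings
    2 < 10         ? 1 : 0,   # 1: numeric order
    "2" lt "10"    ? 1 : 0,   # 0: string order compares "2" with "1" first
    (3 <=> 5),                # -1: three-way numeric comparison
    ("b" cmp "a"),            # 1: three-way string comparison
);
print "@results\n";           # 1 0 1 0 -1 1
```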
19
List/Array and element access
  • Array variable names start with @
  • @words = ("try", "to", "ha");
  • @words = qw(try to ha);
  • $words[0] is "try"
  • $words[1] is "to"
  • $i = 2;
  • What is $words[$i]?

20
Array element access: several functions
  • foreach, pop, push, shift, unshift
  • foreach: access each element of an array in order
  • pop/push: operate on the end of an array
  • shift/unshift: operate at the beginning of an array
  • @array = (1..10);
  • foreach $element (@array) {
  • print $element, "\n";
  • }
  • push(@array, 11);
  • $num = pop(@array);
  • $num = shift(@array);
  • $num = unshift(@array, 0..5);   # returns the new number of elements
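Running the slide's sequence of calls and checking the results (the running sum and the `$words[$i]` lookup are added for illustration). Note that `unshift` returns the new length of the array, not the inserted values.

```perl
use strict;
use warnings;

my @array = (1..10);

# foreach visits each element in order
my $total = 0;
foreach my $element (@array) {
    $total += $element;
}

push(@array, 11);                   # add 11 at the end: 1..11
my $last  = pop(@array);            # remove from the end: 11
my $first = shift(@array);          # remove from the front: 1
my $count = unshift(@array, 0..5);  # add 0..5 at the front, returns new length

print "$total $last $first $count\n";  # 55 11 1 15
print "@array\n";                      # 0 1 2 3 4 5 2 3 4 5 6 7 8 9 10

# The question from the previous slide: $words[2] is "ha"
my @words = qw(try to ha);
my $i = 2;
print "$words[$i]\n";                  # ha
```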

21
Perl Hashes

22
Hash data type
  • Associative arrays
  • Each entry is a key-value pair
  • E.g. a hash storing genes and their locations on the chromosome:
  • %geneid_location = (AF0001 => 1200, AF0129 => 12450);
  • %id_name: student_id -> student_name

23
Hash properties
  • Elements are stored in Perl's internal order, not the order in which you assigned them
  • Keys must be unique!
  • Values can have duplicates

24
New elements
  • $stuff{"aaa"} = "bbb";
  • Creates key aaa with the value bbb
  • $stuff{234.5} = 456.7;
  • Creates key 234.5 with the value 456.7
  • print $stuff{"aaa"};
  • Prints bbb
  • $stuff{234.5} += 3; makes it 459.7

25
Assigning values
  • @ray_list = %stuff;
  • @ray_list gets ("aaa", "bbb", 234.5, 456.7), with the pairs in Perl's internal order
  • %copy = %original; copies from %original to %copy
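A sketch of creating, flattening and copying a hash, using the slide's `%stuff` entries. Flattening yields the pairs in Perl's internal order, so only the element count is predictable; the change to `%copy` is added to show that the copy is independent of the original.

```perl
use strict;
use warnings;

my %stuff;
$stuff{"aaa"} = "bbb";      # creates key "aaa" with value "bbb"
$stuff{234.5} = 456.7;      # creates key "234.5" (hash keys are strings)

# Flattening gives key/value pairs in Perl's internal order
my @ray_list = %stuff;      # 4 elements; pair order is not guaranteed

$stuff{234.5} += 3;         # now 459.7

# Copying makes an independent hash with the same top-level pairs
my %copy = %stuff;
$copy{"aaa"} = "changed";

print scalar(@ray_list), "\n";   # 4
print "$stuff{aaa}\n";           # bbb (the original is unaffected)
```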

26
Hash element access
  • To access an element, use $hash{key}
  • e.g.
  • %geneid_location = (AF0001 => 1200, AF0129 => 12450);
  • $where = $geneid_location{"AF0001"};
  • $geneid_location{"AF3224"} = 21000;
  • Hash elements are stored in Perl's own order.

27
Hash element access
  • keys and values functions
  • keys returns a list of all current keys in the hash
  • values returns a list of all the values in the hash
  • %geneid_location = (AF0001 => 1200, AF0129 => 12450, AF3224 => 21000);
  • @geneid = keys %geneid_location;
  • @location = values %geneid_location;

28
Hash element access
  • The each function iterates over the entire hash
  • while (($key, $value) = each %geneid_location) {
  • print $key, " => ", $value, "\n";
  • }
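A sketch combining `keys`, `values` and `each` on the gene-location hash. Because all three return entries in Perl's internal order, the example sorts whenever it needs a predictable order.

```perl
use strict;
use warnings;

my %geneid_location = (AF0001 => 1200, AF0129 => 12450, AF3224 => 21000);

# keys and values come back in internal order, so sort for stable output
my @geneid   = sort keys %geneid_location;                  # AF0001 AF0129 AF3224
my @location = sort { $a <=> $b } values %geneid_location;  # 1200 12450 21000

# each returns one (key, value) pair per call until the hash is exhausted
my $pairs = 0;
while (my ($key, $value) = each %geneid_location) {
    print $key, " => ", $value, "\n";
    $pairs++;
}
print "$pairs pairs\n";    # 3 pairs
```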

29
Hash operations
  • The delete function removes a key-value pair
  • delete $geneid_location{"AF0001"};
  • The exists function checks whether a certain key exists
  • if (exists $geneid_location{"AF0001"}) {
  • print "gene AF0001 exists\n";
  • }
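A short check of `exists` before and after `delete`, using the gene-location hash from the earlier slides.

```perl
use strict;
use warnings;

my %geneid_location = (AF0001 => 1200, AF0129 => 12450);

my $before = exists $geneid_location{"AF0001"} ? 1 : 0;   # 1
if ($before) {
    print "gene AF0001 exists\n";
}

delete $geneid_location{"AF0001"};    # removes both the key and its value

my $after = exists $geneid_location{"AF0001"} ? 1 : 0;    # 0
my $remaining = scalar(keys %geneid_location);            # 1 key left
print "$remaining gene left\n";
```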

30
Control statements
  • if / else
  • if ($name eq "Ray") {
  • print "Hello Ray\n";
  • } else {
  • print "What is your name?\n";
  • }
  • if / elsif / elsif / else
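A sketch of the `if / elsif / else` chain wrapped in a hypothetical `greet` subroutine so it can be tried with several names. Perl spells the middle keyword `elsif`, and the braces are required even for one-line bodies.

```perl
use strict;
use warnings;

# Hypothetical helper: chooses a greeting with if / elsif / else
sub greet {
    my ($name) = @_;
    if ($name eq "Ray") {
        return "Hello Ray";
    } elsif ($name eq "Mike") {
        return "Hi Mike";
    } else {
        return "What is your name?";
    }
}

print greet("Ray"), "\n";     # Hello Ray
print greet("Mike"), "\n";    # Hi Mike
print greet("Tom"), "\n";     # What is your name?
```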

31
Boolean values
  • False: undef, 0, "" (the empty string), "0"
  • True: all other numeric values and all other strings
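A quick way to check these rules: the loop below prints how Perl treats each value in a boolean test. The sample values are illustrative; the strings "00" and "0.0" are true even though they look like zero, because only the exact string "0" is false.

```perl
use strict;
use warnings;

# The four false values; everything else is true
my @false = (undef, 0, "", "0");
my @true  = (1, -1, 0.5, "00", "0.0", " ", "abc");

for my $v (@false, @true) {
    my $shown = defined $v ? "'$v'" : "undef";
    print $shown, " is ", ($v ? "true" : "false"), "\n";
}
```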

32
Control statements
  • unless
  • e.g.
  • unless ($change >= 0) {
  • print "change can not be a negative value!\n";
  • }
  • while: loops zero or more times
  • for: e.g. for ($j = 0; $j < 5; $j++) { statements }
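A sketch of `unless`, `while` and a C-style `for` loop. The value of `$change` and the loop bodies are illustrative.

```perl
use strict;
use warnings;

my $change = -3;
my $warned = 0;
unless ($change >= 0) {
    print "change can not be a negative value!\n";
    $warned = 1;
}

# while: the body runs zero or more times, as long as the condition holds
my $n = 1;
while ($n < 100) {
    $n *= 2;
}
# $n is now 128, the first power of two that is >= 100

# C-style for loop with $j = 0, 1, 2, 3, 4
my @squares;
for (my $j = 0; $j < 5; $j++) {
    push @squares, $j * $j;
}
print "$n @squares\n";    # 128 0 1 4 9 16
```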

33
subroutines
#!/usr/bin/perl -w
$dna1 = "ggg";
$long1 = addACGT($dna1);
print "$long1\n";

sub addACGT {
    my ($dna) = @_;
    $dna .= "ACGT";
    return $dna;
}
34
subroutines
  • my restricts the scope of the variable to the body of the subroutine
  • The values passed to a subroutine are stored in @_
  • $dna1 -> subroutine -> @_

35
subroutines
e.g. call addon($dna1, $dna2, $dna3);
sub addon {
    my ($s1, $s2, $s3) = @_;
    ...
}
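The slide shows only how `addon` unpacks its arguments, so the body below (joining the three strings) is a hypothetical stand-in. The last call shows that an array argument is flattened into `@_` together with any scalars, which is the problem references solve on the next slides.

```perl
use strict;
use warnings;

# Hypothetical body: the slide only shows the argument unpacking
sub addon {
    my ($s1, $s2, $s3) = @_;   # private copies of the first three arguments
    return $s1 . $s2 . $s3;
}

my ($dna1, $dna2, $dna3) = ("AAA", "CCC", "GGG");
my $joined = addon($dna1, $dna2, $dna3);
print "$joined\n";                  # AAACCCGGG

# An array argument is flattened: @_ becomes ("AA", "TT", "GG")
my @parts = ("TT", "GG");
print addon("AA", @parts), "\n";    # AATTGG
```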
36
Subroutine and references
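  • Arguments are usually passed by value: the subroutine copies them out of @_ into my variables
  • Arrays and hashes passed directly are flattened into one list in @_
  • Pass by reference
  • A reference holds the address of the variable it points to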

37
references
  • $a_ref = \$a;
  • $array_ref = \@array;
  • $hash_ref = \%hash;
  • To use the variable referenced, we need to dereference the reference
  • Put $, @ or % in front of the reference:
  • $$a_ref, @$array_ref, %$hash_ref

38
Example
  • @i = (1..10);
  • @j = ("a", "b", "z");
  • reference_sub(\@i, \@j);
  • sub reference_sub {
  • my ($i, $j) = @_;
  • print "first @$i\n";
  • push(@$j, "s");
  • print "@$j\n";
  • }
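The slide's `reference_sub` example as a runnable script, with `use strict` added and a few extra lines (not from the slide) dereferencing scalar and hash references. Because only references are passed, `push` inside the subroutine changes the caller's `@j`.

```perl
use strict;
use warnings;

my @i = (1..10);
my @j = ("a", "b", "z");

sub reference_sub {
    my ($i, $j) = @_;          # two array references, not flattened lists
    print "first @$i\n";       # first 1 2 3 4 5 6 7 8 9 10
    push(@$j, "s");            # modifies the caller's @j through the reference
    print "@$j\n";             # a b z s
    return scalar(@$i);        # 10
}

my $len = reference_sub(\@i, \@j);

# Scalar and hash references are dereferenced the same way
my $x = 5;
my $x_ref = \$x;
$$x_ref += 1;                  # $x is now 6

my %h = (color => "red");
my $h_ref = \%h;
my @hkeys = keys %$h_ref;      # ("color")
```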

39
Perl (continued)

40
Regular Expressions
  • A regular expression is a pattern that describes a string, or a whole family of strings (e.g. via wildcards)
  • Matching a regular expression against a string either succeeds or fails

41
Simple use of regular expressions
  • The grep command: one of Unix's most useful tools
  • Used for searching with regular expressions

42
Using grep
  • Find all lines in a file containing the string abc:
  • grep abc somefile
  • abc is the regular expression tested against each input line
  • Matching lines are sent to standard output

43
Perl regular expressions
  • Regular expressions in Perl are enclosed in slashes, / /
  • Use the match operator =~

44
Regular expressions
  • if ($name =~ /Bell/) {
  • print("string matches\n");
  • }

45
Ignore case option
  • A small i after the closing slash: ignore case
  • / /i
  • To reject Belle, add the word boundary \b
  • \b ensures that no other word character (letter, digit or underscore) follows the pattern

46
Revised string
  • /^Bell\b/i
  • Bell at the beginning of the string
  • No letter, digit or underscore following
  • OK in either upper or lower case
  • Matches bell, Bell, etc., but not belle
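A check of the revised pattern against a few made-up names, plus the same pattern without the `^` anchor for comparison: unanchored, it also finds "bell" at the end of "Campbell".

```perl
use strict;
use warnings;

my @names = ("Bell", "bell tower", "BELL", "Belle", "Bellamy", "Campbell");

# Anchored at the start, case-insensitive, followed by a word boundary
my @matched = grep { /^Bell\b/i } @names;
print "@matched\n";      # Bell bell tower BELL

# Without ^ the pattern may match anywhere in the string
my @unanchored = grep { /Bell\b/i } @names;
print "@unanchored\n";   # Bell bell tower BELL Campbell
```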

47
Regular expression
  • Regular expressions build patterns based on 3 fundamental ideas
  • Repetition
  • Alternation
  • Concatenation

48
Regular expression
  • /fossil/ matches both fossil and fossiliferous
  • To increase the power of regexes, use metacharacters
  • Repetition: *, +, ?, {n,m}
    - + one or more times
    - * zero or more times
    - ? zero or one time
    - {2,4} between 2 and 4 times
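A sketch comparing the repetition metacharacters on a small set of made-up words. Each pattern is anchored with `^...$` so it has to match the whole word; the last line shows that an unanchored `/fossil/` matches inside a longer word.

```perl
use strict;
use warnings;

my @words = qw(ac abc abbc abbbc abbbbbc);

my @plus  = grep { /^ab+c$/ }     @words;  # one or more b
my @star  = grep { /^ab*c$/ }     @words;  # zero or more b: all five
my @quest = grep { /^ab?c$/ }     @words;  # zero or one b
my @range = grep { /^ab{2,4}c$/ } @words;  # two to four b

print "+     @plus\n";    # abc abbc abbbc abbbbbc
print "*     @star\n";    # ac abc abbc abbbc abbbbbc
print "?     @quest\n";   # ac abc
print "{2,4} @range\n";   # abbc abbbc

my $hit = "fossiliferous" =~ /fossil/ ? 1 : 0;   # 1
```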

49
Regular expression
  • Metacharacters
  • Alternation
  • Alternatives are separated by | (or)
  • $dna = "ATGCGA";
  • $dna =~ s/A|G|C|T/2/g;
  • print $dna, "\n";   # prints 222222

50
Regular expression
  • Metacharacters
  • Parentheses () for grouping
  • Used to group characters
  • /a(bat)/ matches abat, abatbat, abatc, etc.
  • Character class
  • [abc] matches any one of the characters in the square brackets

51
Regular expression
52
Regular expression
  • Pattern anchors
  • ^ matches the beginning of the string
  • $ matches the end
  • /^where/ matches any string that starts with where
  • /world$/ matches any string that ends with world
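A sketch of both anchors on a few made-up strings; without `^`, the pattern `where` also matches inside "somewhere".

```perl
use strict;
use warnings;

my @lines = ("where is it", "somewhere", "hello world", "world peace");

my @starts   = grep { /^where/ } @lines;   # "where is it"
my @ends     = grep { /world$/ } @lines;   # "hello world"
my @anywhere = grep { /where/ }  @lines;   # "where is it", "somewhere"

print scalar(@starts), " ", scalar(@ends), " ", scalar(@anywhere), "\n";   # 1 1 2
```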

53
Substitute operator
  • Find a regular expression and replace it with a string
  • s/form1/form2/ substitutes form1 with the new form2
  • $dna = "ATCttG";
  • $dna =~ s/T/U/ig; substitutes all Ts (and ts) with Us
  • i and g are pattern modifiers
  • i: ignore case
  • g: globally
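The slide's substitution run on copies of `$dna`, showing what each modifier contributes. In scalar context, `s///` returns the number of substitutions it made.

```perl
use strict;
use warnings;

my $dna = "ATCttG";

(my $all        = $dna) =~ s/T/U/ig;   # AUCUUG: i also matches t, g replaces every match
(my $upper_only = $dna) =~ s/T/U/g;    # AUCttG: without i, lowercase t is left alone

my $copy = $dna;
my $n = ($copy =~ s/T/U/ig);           # 3 substitutions

print "$all $upper_only $n\n";         # AUCUUG AUCttG 3
```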

54
Using files for input/output
  • Consider a file protein.seq which contains the
    following data
  • MNIDDKL
  • SVLQ
  • GLQVLL
  • Problem: read in the file

55
Using files for input/output
  • #!/usr/bin/perl -w
  • $file = "protein.seq";
  • # open the file and associate a filehandle
  • open(IN, $file) or die "Cannot open $file: $!";
  • # read the first line from the file into a variable
  • $protein = <IN>;
  • print $protein, "\n";
  • # read the second line
  • $protein = <IN>;
  • print $protein, "\n";
  • close IN;
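A sketch that reads the whole file with a `while` loop instead of one `<IN>` per line. To stay self-contained it writes the slide's three lines to a temporary file (via the core `File::Temp` module) as a stand-in for an existing `protein.seq`, and uses the three-argument `open` with a lexical filehandle.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Stand-in for protein.seq: a temporary file with the slide's three lines
my ($tmp, $file) = tempfile(UNLINK => 1);
print $tmp "MNIDDKL\nSVLQ\nGLQVLL\n";
close $tmp;

# Three-argument open; stop with a message if the file cannot be opened
open(my $in, '<', $file) or die "Cannot open $file: $!";

my @lines;
while (my $protein = <$in>) {   # one line per iteration until end of file
    chomp $protein;             # drop the trailing newline
    push @lines, $protein;
}
close $in;

my $sequence = join("", @lines);
print "$sequence\n";            # MNIDDKLSVLQGLQVLL
```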

56
Using files for input/output
  • Several standard predefined filehandles
  • STDIN
  • STDOUT
  • STDERR
  • e.g.
  • $protein = <STDIN>;
  • chomp $protein;
  • print $protein, "\n";
  • Perl will wait for the user to type a line of input

57
Writing output to a file
  • # file output
  • open(OUT, ">$file");
  • print OUT "$protein\n";
  • # print to console
  • print "$protein\n";
  • # the same as above
  • print STDOUT "$protein\n";
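A sketch of writing a file with the three-argument `open`, using a temporary file as a hypothetical stand-in for `$file`, then reading it back to confirm what was written. The mode `>` creates or truncates the file; `>>` would append instead.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my $protein = "MNIDDKL";
my (undef, $file) = tempfile(UNLINK => 1);   # hypothetical output file

open(my $out, '>', $file) or die "Cannot write $file: $!";
print $out "$protein\n";       # no comma between the filehandle and the data
close $out;

print STDOUT "$protein\n";     # same as print "$protein\n"

# Read the file back to confirm its contents
open(my $check, '<', $file) or die "Cannot read $file: $!";
my $written = <$check>;
close $check;
```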

58
Command line arguments
  • @ARGV holds the command-line arguments
  • e.g. test.pl:
  • #!/usr/bin/perl -w
  • unless (@ARGV) {
  • print "usage: test.pl argument\n";
  • exit;
  • }
  • $input = $ARGV[0];
  • print "$input\n";
  • Run it as: perl test.pl 5

59
Miscellaneous about Perl
  • use strict
  • Requires all variables to be declared with my before they are used
  • my $variable = 3;
  • A good habit: always start your program with use strict
  • This will catch mistakes such as typos in variable names
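A sketch of what `use strict` catches. The typo `$cuont` is compiled inside a string `eval`, so the compile error is captured in `$@` instead of stopping the script; the variable names are made up.

```perl
use strict;
use warnings;

my $variable = 3;    # declared with my, so strict is satisfied

# A snippet with a typo ($cuont instead of $count), compiled at run time
my $ok = eval q{
    use strict;
    my $count = 3;
    $cuont = 4;
    1;
};

# eval returns undef on a compile error and leaves the message in $@
print defined $ok ? "compiled\n" : "strict caught it: $@";
```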

60
Perl process management
  • To launch a process from a Perl script, use system
  • e.g.
  • system("date");
  • system("ls");
  • The commands are executed by the Unix shell
  • Perl waits for the process to terminate before it continues
  • The new process is a child of the Perl process

61
Capture the system output
  • Use backticks: `command`
  • @files = `ls /home/zhang`;
  • @files holds all the output of the ls command, one line per element
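A sketch of both approaches. To avoid depending on `date` or `ls`, it runs the current Perl interpreter (`$^X`) as the child command; the backtick line assumes a Unix-like shell. The list form of `system` runs the program without a shell, and the child's exit status is available as `$? >> 8` once `system` returns.

```perl
use strict;
use warnings;

# system waits for the child to finish; the child exits with status 3
my $status    = system($^X, '-e', 'exit 3');
my $exit_code = $? >> 8;        # 3

# Backticks capture standard output; in list context, one element per line
my @lines = `"$^X" -le "print for 1..3"`;
chomp @lines;                   # ("1", "2", "3")

print "exit code $exit_code, captured: @lines\n";
```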

62
Context dependent scalar or list?
  • Perl operations behave differently depending on the context
  • e.g.
  • @array = (1..10);
  • $b = @array;   # what is $b now?
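The answer to the slide's question, plus a few related context effects: an array in scalar context gives its number of elements. The result is stored in `$count` here because `$b` is reserved for `sort`.

```perl
use strict;
use warnings;

my @array = (1..10);

my $count      = @array;    # scalar context: number of elements, 10
my ($first)    = @array;    # list context: the first element is copied, 1
my $last_index = $#array;   # index of the last element, 9
my $text       = "@array";  # interpolation joins elements with spaces

print "$count $first $last_index\n";   # 10 1 9
```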

63
An example Perl script
  • #!/usr/bin/perl -w
  • # count Gs in command-line DNA
  • use strict;
  • my ($usage) = "$0 DNA\n\n";
  • unless (@ARGV) {
  • print $usage;
  • exit;
  • }
  • my ($dna) = $ARGV[0];
  • my ($num_of_g) = countG($dna);
  • print "$num_of_g\n";
  • exit;
  • sub countG {
  • my ($dna) = @_;
  • my ($count) = 0;
  • $count++ while ($dna =~ /g/ig);
  • return $count;
  • }
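The slide's `countG` as a stand-alone sketch, run on a made-up sequence instead of a command-line argument, next to an alternative (not from the slide) that counts with `tr///`, which returns the number of characters it matched. In the slide's version, each successful `/g` match in scalar context moves on through the string, so the loop runs once per G.

```perl
use strict;
use warnings;

# Slide technique: one loop iteration per case-insensitive match of g
sub countG {
    my ($dna) = @_;
    my $count = 0;
    $count++ while ($dna =~ /g/ig);
    return $count;
}

# Alternative: tr/// with an empty replacement counts without changing $dna
sub countG_tr {
    my ($dna) = @_;
    return ($dna =~ tr/Gg//);
}

my $dna = "GGATCgga";   # hypothetical sample sequence
print countG($dna), " ", countG_tr($dna), "\n";   # 4 4
```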

64
(No Transcript)
65
# Here are the arrays of parts of sentences:
my @nouns = (
    'Dad', 'TV', 'Mom', 'Groucho', 'Rebecca',
    'Harpo', 'Robin Hood', 'Joe and Moe',
);
66
my @verbs = (
    'ran to', 'giggled with', 'put hot sauce into the orange juice of',
    'exploded', 'dissolved', 'sang stupid songs with', 'jumped with',
);
67
my @prepositions = (
    'at the store', 'over the rainbow', 'just for the fun of it',
    'at the beach', 'before dinner', 'in New York City',
    'in a dream', 'around the world',
);
68
# Seed the random number generator.
# time|$$ combines the current time with the current process id
# in a somewhat weak attempt to come up with a random seed.
srand(time|$$);

# This do-until loop composes six-sentence "stories"
# until the user types "quit".
do {
    $story = '';
69
    for ($count = 0; $count < 6; $count++) {
        $sentence = $nouns[int(rand(scalar @nouns))]
            . " " .
            $verbs[int(rand(scalar @verbs))]
            . " " .
            $nouns[int(rand(scalar @nouns))]
            . " " .
            $prepositions[int(rand(scalar @prepositions))]
            . '. ';
        $story .= $sentence;
    }
70
    print "\n", $story, "\n";

    # Get user input.
    print "\nType \"quit\" to quit, or press Enter to continue: ";
    $input = <STDIN>;

# Exit loop at user's request
} until ($input =~ /^\s*q/i);

exit;
71
Perl resources
  • Online Perl series
  • Learning Perl
  • Perl Cookbook
  • www.perl.com
  • The Perl Bible
  • Perl in a Nutshell (O'Reilly)