Title: Perl and BioPerl
- Craig A. Struble, Ph.D.
- Marquette University
Overview
- Perl
- Literals
- Variables
- Control Structures
- Miscellaneous
- BioPerl
- Sample Programs
- References
Perl
- Practical Extraction and Report Language
- Created by Larry Wall
- Runs on just about every platform
- Most popular on Unix/Linux systems
- Excellent language for file and data processing
Simple Program
    #!/usr/local/bin/perl
    # This is a comment line.
    # This program prints "Hello world." to the screen.
    print "Hello world.\n";
- On Unix, the first line (#!) gives the location of the Perl interpreter
- Comments start with # and end at the end of the line
- Program statements are terminated with semicolons
- \n is the newline character
Literal Data
- Strings
- "Hello world\n" (double quotes: \n is printed as a newline)
- 'Hello world\n' (single quotes: \n is printed as the two characters \n)
- Numbers
- 123 integer
- 456.789 real or floating point
- 23.45e8 scientific notation
- 0xABC12 hexadecimal
- 0377 octal (a leading 0 means octal)
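The difference between the two kinds of string literal, and the values of the number literals above, can be checked directly. A minimal sketch; the variable names are only illustrative:

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Double quotes interpolate escapes: "\n" becomes one newline character.
my $double = "Hello world\n";
# Single quotes do not: '\n' stays as a backslash followed by n.
my $single = 'Hello world\n';

print length($double), "\n";   # 12 (11 characters + newline)
print length($single), "\n";   # 13 (11 characters + backslash + n)

# Numeric literals from the slide, printed in decimal.
my $hex   = 0xABC12;
my $octal = 0377;
my $sci   = 23.45e8;
print "$hex $octal $sci\n";    # 703506 255 2345000000
```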
Variables
- Scalars
- Store a single value
- Scalar variable names start with $
- Declaring variables with my is optional (required under use strict)
- my ($a, $hello, $ou812, $hi_there);
- $hello = "Hello World\n";   # assign
- print $hello;               # print value
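A scalar is not declared as "number" or "string"; Perl converts the value according to the operator applied to it. A minimal sketch of that behaviour, with illustrative variable names:

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# A scalar holds one value: a number or a string.
my $count = 42;
my $label = "answer";

# + treats its operands as numbers, so the string "8" is used as 8.
my $text   = "8";
my $sum    = $count + $text;          # 50
# . treats its operands as strings and concatenates them.
my $joined = $label . ": " . $sum;    # "answer: 50"

# Double-quoted strings interpolate scalar variables.
print "$joined\n";                    # answer: 50
```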
Variables
- Arrays
- Store multiple values, indexed by integers starting at 0
- A whole array variable starts with @
- Single elements are referred to with $ and [] for the index
- my @anarray;
- my @x = (1, 2, 3);
- $x[0] contains 1
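A few more everyday array operations, sketched on the same @x: negative indexes, push, the array length, and the last index:

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

my @x = (1, 2, 3);

print "$x[0]\n";          # 1: a single element uses $ and [index]
print "$x[-1]\n";         # 3: negative indexes count from the end
push @x, 4;               # append an element: (1, 2, 3, 4)
my $size = scalar(@x);    # an array in scalar context gives its length: 4
my $last_index = $#x;     # index of the last element: 3
print "size=$size last=$last_index\n";
print "@x\n";             # 1 2 3 4 (elements joined with spaces)
```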
Variables
- Hashes
- Store multiple values, indexed by strings
- A whole hash variable starts with %
- Single elements are referred to with $ and {}
- my %date;
- %date = ( day => "Monday", mon => "September" );
- print $date{day} . "\n";   # print the day followed by a newline; . is concatenation
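Adding keys, testing for a key, and walking the whole hash can be sketched as follows. The hour and minute keys are hypothetical, added only for illustration:

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

my %date = ( day => "Monday", mon => "September" );

$date{hour} = 9;                  # add a new key/value pair (hypothetical key)
print "$date{day}\n";             # Monday

# exists checks for a key without creating it.
print "no minute key\n" unless exists $date{minute};

# keys returns the keys in no particular order, so sort them for stable output.
foreach my $key (sort keys %date) {
    print "$key => $date{$key}\n";
}
```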
Control Structures
- Perl supports the standard control structures
- Syntax is generally similar to C/C++/Java
- while, for, if, foreach

    #!/usr/local/bin/perl
    # Print out direction from Washington D.C.
    # Usage: checkcity city
    use strict;
    use warnings;

    my $city = $ARGV[0];   # @ARGV holds command line arguments
    die "Usage: checkcity city\n" unless defined $city;
    if ($city eq "New York") {
        print "New York is northeast of Washington D.C.\n";
    } elsif ($city eq "Chicago") {
        print "Chicago is northwest of Washington D.C.\n";
    } else {
        print "I'm not sure where $city is, sorry.\n";
    }
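The while loop from the list above has the same shape as in C: the condition is tested before each pass. A minimal countdown sketch, with illustrative names:

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Count down from 3; the loop stops as soon as $n > 0 is false.
my $n = 3;
my @seen;
while ($n > 0) {
    push @seen, $n;
    $n--;
}
print join(" ", @seen), "\n";   # 3 2 1
```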
Control Structures
    #!/usr/local/bin/perl
    # Print out 0,1,2,3,4,5,6,7,8,9
    use strict;
    use warnings;

    # In this case, $x is local only to the loop because my is used
    for (my $x = 0; $x < 10; $x++) {
        print $x;
        if ($x < 9) {
            print ",";
        }
    }
    print "\n";
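The same output can also be produced without an explicit counter. This sketch swaps in the range operator and join, which is a common Perl idiom rather than the loop the slide teaches:

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# 0..9 builds the list (0, 1, ..., 9); join puts a comma between
# neighbours, so no "is this the last one?" test is needed.
my $line = join(",", 0..9);
print "$line\n";   # 0,1,2,3,4,5,6,7,8,9
```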
Control Structures
    #!/usr/local/bin/perl
    # Demonstrate the foreach loop, which goes through elements in an array.
    use strict;
    use warnings;

    my @users = ("bonzo", "gorgon", "pluto", "sting");
    foreach my $user (@users) {
        print "$user is alright.\n";
    }
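One detail worth knowing about foreach: the loop variable is an alias for each array element, not a copy, so assigning to it changes the array. A short sketch on the same @users, which also shows the default $_ loop variable:

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

my @users = ("bonzo", "gorgon", "pluto", "sting");

# $user aliases each element, so this capitalizes the array in place.
foreach my $user (@users) {
    $user = ucfirst $user;
}
print "@users\n";   # Bonzo Gorgon Pluto Sting

# With no loop variable, foreach uses $_.
foreach (@users) {
    print "$_ is alright.\n";
}
```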
Functions
- Use sub to create a function.
- Arguments arrive in the array @_; there are no named formal parameters, so copy @_ into local (my) subroutine variables.

    #!/usr/local/bin/perl
    use strict;
    use warnings;

    # Subroutine for calculating the maximum
    sub max {
        my $max = shift(@_);   # shift removes the first value from @_
        foreach my $val (@_) {
            $max = $val if $max < $val;   # Notice Perl allows postfix ifs
        }
        return $max;
    }

    my $high = max(1,5,6,7,8,2,4,9,3,4);
    print "High value is $high\n";
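Two other common patterns: copying several arguments out of @_ in one list assignment, and returning more than one value as a list. A sketch; min_max and describe_range are hypothetical names used only here:

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Return two values at once as a list.
sub min_max {
    my @values = @_;
    my ($min, $max) = ($values[0], $values[0]);
    foreach my $val (@values) {
        $min = $val if $val < $min;
        $max = $val if $val > $max;
    }
    return ($min, $max);
}

# Give the first two arguments names with a single list assignment.
sub describe_range {
    my ($low, $high) = @_;
    return "from $low to $high";
}

my ($lo, $hi) = min_max(1, 5, 6, 7, 8, 2, 4, 9, 3, 4);
print describe_range($lo, $hi), "\n";   # from 1 to 9
```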
Files
- File handles are used to access files
- open and close functions

    #!/usr/local/bin/perl
    # Open a file and print its contents to copy.txt
    use strict;
    use warnings;

    my $filename = $ARGV[0];
    # < indicates read, > indicates write
    open(MYFILE, "<$filename") or die "Cannot open $filename: $!";
    open(OUTPUT, ">copy.txt") or die "Cannot open copy.txt: $!";
    while (my $line = <MYFILE>) {   # The <> operator reads a line
        print OUTPUT $line;         # no newline is needed, it was read from the file
    }
    close MYFILE;    # Parentheses are optional
    close(OUTPUT);
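Current Perl style usually prefers the three-argument form of open with lexical (my) filehandles, which keeps the mode separate from the file name. A sketch of the same copy in that style; copy_file is a hypothetical helper name:

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Copy one file to another using three-argument open and lexical filehandles.
sub copy_file {
    my ($from, $to) = @_;
    open(my $in,  '<', $from) or die "Cannot read $from: $!";
    open(my $out, '>', $to)   or die "Cannot write $to: $!";
    while (my $line = <$in>) {
        print {$out} $line;    # braces make the filehandle unambiguous
    }
    close($in);
    close($out) or die "Cannot close $to: $!";
    return 1;
}

# Only run when a file name is given on the command line.
copy_file($ARGV[0], 'copy.txt') if @ARGV;
```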
Regular Expressions
- One of Perl's strengths is pattern matching
- Perl's regular expression language is extremely powerful, but can be challenging to learn
- Some examples follow
Regular Expressions
    #!/usr/local/bin/perl
    use strict;
    use warnings;

    my $filename = $ARGV[0];
    open(INPUT, "<$filename") or die "Cannot open $filename: $!";
    while (<INPUT>) {   # Note that the line read is stored in $_
        print "Found Fred.\n" if /Fred/;
        print "Found a Flintstone.\n" if m/(Fred|Wilma|Pebbles) Flintstone/;
        if (/(..):(..):(..)/) {   # match a time; dots match anything except \n
            my $seconds = $3;     # parentheses store matches in $1, $2, $3, ...
            print "There are $seconds seconds.\n";
        }
    }
    close INPUT;
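The same patterns can be tried on strings in memory with the =~ binding operator. This sketch uses \d\d instead of .. (a stricter choice that only matches digits) and also shows that a match in list context returns its captured groups; the sample strings are illustrative:

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

my @lines = (
    "Fred Flintstone arrived at 10:42:17",
    "Barney Rubble left",
);

foreach my $line (@lines) {
    print "Found a Flintstone.\n" if $line =~ /(Fred|Wilma|Pebbles) Flintstone/;
    if ($line =~ /(\d\d):(\d\d):(\d\d)/) {
        print "There are $3 seconds.\n";   # $3 holds the third captured group
    }
}

# In list context a successful match returns the captured groups directly.
my ($hour, $min, $sec) = "10:42:17" =~ /(\d\d):(\d\d):(\d\d)/;
print "$hour h $min m $sec s\n";
```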
Comma Separated Value Files
    #!/usr/local/bin/perl
    # Some simple code demonstrating how to use split and regular
    # expressions. This code extracts the values in a CSV file.
    use strict;
    use warnings;

    my $filename = $ARGV[0];
    open(INPUT, "<$filename") or die "Cannot open $filename: $!";
    while (<INPUT>) {
        chomp;                    # Remove terminating newline
        my @values = split /,/;   # Split the string in $_ wherever , occurs
        print "The first value is " . $values[0] . "\n";
    }
    close INPUT;
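One pitfall when splitting CSV rows: by default split drops trailing empty fields, so rows with blank last columns come back shorter. A negative limit keeps them. The sample row is illustrative. Note that a plain split /,/ also cannot handle quoted fields that contain commas; the CPAN module Text::CSV is the usual choice for that.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

my $row = "alpha,beta,,";

# Default: trailing empty fields are removed.
my @default = split /,/, $row;       # ("alpha", "beta")
# Negative limit: every field is kept, including empty ones at the end.
my @all     = split /,/, $row, -1;   # ("alpha", "beta", "", "")

print scalar(@default), " ", scalar(@all), "\n";   # 2 4
```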
Objects
- Perl supports object-oriented programming
- By convention, the constructor is named new
- A class is really just a package whose subroutines act as methods
- Objects are created with bless, which associates a reference with a class
Example Class Definition
    package Critter;
    use strict;
    use warnings;

    # constructor
    sub new {
        my $objref = {};   # reference to an empty hash
        bless $objref;     # make it an object in the Critter class
        return $objref;    # return the reference
    }

    # Instance method; the first parameter is the object reference
    sub display {
        my $self = shift;  # just to demonstrate
        print "I'm a critter.\n";
    }

    1;   # a module file must end with a true value
Store this in Critter.pm
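Example Object Usage
    #!/usr/local/bin/perl
    use strict;
    use warnings;
    use lib '.';     # Perl 5.26 and later no longer search the current directory
    use Critter;

    my $critter = new Critter;   # create an object
    $critter->display;           # display the object
    display $critter;            # alternative notation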
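The Critter constructor uses one-argument bless, which always blesses into Critter. A common variant, sketched here in a single self-contained file, takes the class name from the first argument and uses two-argument bless so a subclass inherits new. The name attribute, the name accessor and the Pet subclass are hypothetical additions for illustration:

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

package Critter;

# Two-argument bless puts the object into the class new was called on.
sub new {
    my ($class, %args) = @_;
    my $self = { name => $args{name} || 'critter' };   # hypothetical attribute
    return bless $self, $class;
}

# Accessor for the hypothetical name attribute.
sub name {
    my $self = shift;
    return $self->{name};
}

sub display {
    my $self = shift;
    print "I'm a ", $self->name, ".\n";
}

# Pet is a hypothetical subclass that inherits everything from Critter.
package Pet;
our @ISA = ('Critter');

package main;

my $critter = Critter->new(name => 'bonzo');   # arrow notation for the constructor
$critter->display;                             # prints: I'm a bonzo.

my $pet = Pet->new(name => 'pluto');           # blessed into Pet, not Critter
$pet->display;                                 # prints: I'm a pluto.
```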
BioPerl
- BioPerl is a collection of Perl classes useful for developing bioinformatics tools
- http://www.bioperl.org
- Installed on the student platform
Example 1
    #!/usr/local/bin/perl
    # Collect documents from PubMed containing the term "breast cancer"
    # and print them.
    use strict;
    use warnings;
    use Bio::Biblio;

    my $biblio = Bio::Biblio->new;
    my $collection = $biblio->find("breast cancer");
    while ($collection->has_next) {   # note the underscores in has_next and get_next
        print $collection->get_next;
    }
Example 2
    #!/usr/local/bin/perl
    # Get a sequence from RefSeq by accession number
    use strict;
    use warnings;
    use Bio::DB::RefSeq;

    my $gb  = Bio::DB::RefSeq->new;
    my $seq = $gb->get_Seq_by_acc("NM_007304");
    print $seq->seq(), "\n";
Example 3
    #!/usr/local/bin/perl
    # Perform various calculations on a sequence
    use strict;
    use warnings;
    use Bio::Seq;

    my $seq = Bio::Seq->new( -seq              => 'ATGGGGGTGGTGGTACCCT',
                             -id               => 'human_id',
                             -accession_number => 'AL000012',
                           );
    print $seq->seq() . "\n";               # print the sequence
    print $seq->revcom->seq() . "\n";       # print the reverse complement
    print $seq->translate->seq() . "\n";    # print a translation
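To see what revcom computes, here is a plain-Perl sketch of a reverse complement (no BioPerl needed). It assumes an unambiguous DNA alphabet (A, C, G, T in either case) and is not how Bio::Seq implements the method; revcom_string is a hypothetical name:

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Complement each base with tr///, then reverse the string.
# Assumes only A, C, G, T (upper or lower case); other symbols pass through unchanged.
sub revcom_string {
    my ($dna) = @_;
    (my $comp = $dna) =~ tr/ACGTacgt/TGCAtgca/;
    return scalar reverse $comp;
}

my $rc = revcom_string('ATGGGGGTGGTGGTACCCT');
print "$rc\n";   # AGGGTACCACCACCCCCAT
```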
References
- Programming Perl by Wall, Christiansen, and Schwartz (O'Reilly)
- Learning Perl by Schwartz and Phoenix (O'Reilly)
- Beginning Perl for Bioinformatics by Tisdall (O'Reilly)
- http://www.perl.com
- http://www.bioperl.org