1
More: What Perl can do
With an introduction to BioPerl
Ian Donaldson, Biotechnology Centre of Oslo
IMBV 3070
2
Much of the material in this lecture is from the
Perl lecture and lab developed for the
Canadian Bioinformatics Workshops by Will
Hsiao, Sohrab Shah and Sanja Rogic, and released under
the Creative Commons license
3
http://creativecommons.org/licenses/by-sa/2.5/
4
More: What can Perl do?
  • So far, we've had a very brief introduction to
    Perl
  • Next, we want to go a little deeper into:
      • Use of strict
      • Perl regular expressions
      • Modules
      • An introduction to object-oriented Perl and
        BioPerl

5
strict
6
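Effects of use strict
  • Requires you to declare variables
  • Warns you about possible typos in variables
    (under strict, an undeclared name is a
    compile-time error)

Declaring variables
  Correct:     my $DNA; $DNA = "ATCG";   or   my $DNA = "ATCG";
  Incorrect:   $DNA = "ATCG";

Typos in variable names
  No warning:  my $DNA = "ATCG"; $DNA =~ tr/ATCG/TAGC/;
  Warning:     my $DNA = "ATCG"; $DAN =~ tr/ATCG/TAGC/;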
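
The effect of strict on a misspelled variable can be sketched as follows. This is a minimal illustration, not part of the original lecture: the helper name compiles_ok is hypothetical, and it uses a string eval only so that the compile-time failure can be caught and reported instead of stopping the script.

```perl
use strict;
use warnings;

# Hypothetical helper: compile a snippet with a string eval so that a
# strict violation is caught instead of stopping this script.
sub compiles_ok {
    my ($code) = @_;
    my $ok = eval "use strict; $code; 1";
    return $ok ? 1 : 0;
}

my $good = 'my $DNA = "ATCG"; $DNA =~ tr/ATCG/TAGC/';
my $typo = 'my $DNA = "ATCG"; $DAN =~ tr/ATCG/TAGC/';

print "declared variable:   ", (compiles_ok($good) ? "accepted" : "rejected"), "\n";
print "misspelled variable: ", (compiles_ok($typo) ? "accepted" : "rejected"), "\n";
```

In an ordinary script you would not need the eval: Perl refuses to run the program at all and names the undeclared variable.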
7
Why bother to use strict?
  • Enforces some good programming rules
  • Helps to prevent silly errors
  • Makes troubleshooting your program easier
  • Becomes essential as your code becomes longer
  • We will use strict in all the code you see today
    and in your assignment
  • Bottom line: ALWAYS use strict

8
Exercise 12
Write a program that has one function. Use a
variable named $some_variable in this function
and in the main body of the program. Prove that
you can alter the value of $some_variable in the
function without changing the value of
$some_variable in the main body of the
program. Try it yourself (15 minutes) then check
the answer at the end of this lecture.
9
regular expressions
10
What is a Regular Expression?
  • REGEX provides pattern matching ability
  • Tells you whether a string contains a pattern or
    not (Note: it's a yes or no question!)

Regular expression looking for "dog":

String                             Match?
Dog! Human's best friend           No, since REGEX is case sensitive
My dog ate my homework             Yes (true)
I have a golden retriever          No (false)
Yesterday I saw a big black dog    Yes (true)
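
The yes-or-no nature of a match can be checked with a short sketch. The function name contains_dog is an assumption made for this illustration; the =~ operator it uses is explained on the next slides.

```perl
use strict;
use warnings;

# Returns 1 if the string contains the pattern "dog", 0 otherwise.
sub contains_dog {
    my ($text) = @_;
    return $text =~ m/dog/ ? 1 : 0;
}

my @strings = (
    "Dog! Human's best friend",
    "My dog ate my homework",
    "I have a golden retriever",
    "Yesterday I saw a big black dog",
);

for my $string (@strings) {
    my $answer = contains_dog($string) ? "yes" : "no";
    print "$answer: $string\n";
}
```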
11
Regular expressions are regular
  • Look at these yeast open reading frame names:
      • YDR0001W
      • YDR4567C
      • YAL0045W
      • YBL0008C
  • While they are all different, they all follow a
    pattern (or regular expression):
      1. Y means yeast
      2. A letter between A and L represents a
         chromosome
      3. An R or L refers to an arm of the
         chromosome
      4. A four-digit number refers to an open reading
         frame
      5. A W or a C refers to either the Watson or
         Crick strand
  • You can write a regular expression to recognize
    ALL yeast open reading frame names.

12
Perl REGEX example
    my $text = "The dog ate my homework";
    if ($text =~ m/dog/) {
        print "The text contains a dog\n";
    }
  • =~ is the binding operator. It asks: does
    the string on the left contain the pattern on the
    right?
  • /dog/ is my pattern or regular expression
  • The matching operation results in a true or false
    answer

13
Regular Expressions in Perl
  • A pattern that matches only one string is not very
    useful!
  • We need symbols to represent classes of
    characters
  • For example, say you wanted to recognize Dog or
    dog as being instances of the same thing
  • REGEX is its own little language inside Perl
  • Has different syntax and symbols!
  • Symbols that you have used in Perl, such as .,
    have totally different meanings in REGEX

14
REGEX Metacharacters
  • Metacharacters allow a pattern to match different
    strings
  • Wildcards are examples of metacharacters
  • /.og/ will match dog, log, tog, etc.
  • So . means any character
  • Perl REGEX has much more powerful metacharacters
    used to represent classes of characters

15
Types of Metacharacters
  • . matches any one character or space except
    \n
  • [ ] denotes a selection of characters and
    matches ONE of the characters in the
    selection. What does [ATCG] match?
  • \t, \s, \n match a tab, a whitespace character
    and a newline respectively
  • \w matches any character in [a-zA-Z0-9_]
  • \d matches [0-9]
  • \D matches anything except [0-9]
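
As a rough sketch of how these classes behave, the following labels single characters with the first class that matches. The function name describe_char and the order of the checks are choices made for this example only.

```perl
use strict;
use warnings;

# Describe a single character using the character classes listed above.
sub describe_char {
    my ($char) = @_;
    return "digit"      if $char =~ m/\d/;
    return "DNA base"   if $char =~ m/[ATCG]/;
    return "word char"  if $char =~ m/\w/;
    return "whitespace" if $char =~ m/\s/;
    return "other";
}

for my $char ("7", "G", "x", "_", "\t", "!") {
    printf "%-4s => %s\n", ($char eq "\t" ? '\t' : $char), describe_char($char);
}
```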

16
Using metacharacters to build a regular
expression
  • YBL3456W
  • /Y[A-L][RL]\d\d\d\d[WC]/
  • Is this a good pattern for a yeast ORF name?
  • What else does it match?
  • What if the name only has 3 digits?
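
One way to explore the questions above is to try the pattern on a few strings. This is only a sketch: the function name looks_like_orf and the test names other than YBL3456W and YDR0001W are made up for the illustration.

```perl
use strict;
use warnings;

# Returns 1 if the string contains something shaped like the ORF names above.
sub looks_like_orf {
    my ($name) = @_;
    return $name =~ m/Y[A-L][RL]\d\d\d\d[WC]/ ? 1 : 0;
}

for my $name ("YBL3456W", "YDR0001W", "YAL045W", "xxYBL3456Wxx") {
    print "$name: ", (looks_like_orf($name) ? "match" : "no match"), "\n";
}
```

The three-digit name is rejected, while the name buried inside a longer string is accepted, because nothing in the pattern says where the name has to start or end.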

17
REGEX Quantifiers
  • What if you want to match a character more than
    once?
  • What if you want to match an mRNA with a polyA
    tail that is 5 to 12 As long?
  • ATGAAAAAAAAAAA

18
REGEX Quantifiers
ATGAAAAAAAAAAA
/ATG[ATCG]+A{5,12}/
  • + matches one or more copies of the previous
    character
  • * matches zero or more copies of the previous
    character
  • ? matches zero or one copy of the previous
    character
  • {min,max} matches a number of copies within the
    specified range
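
The quantifiers can be tried out with a small sketch like the one below. The function name has_polyA and the short test sequences are assumptions for the example.

```perl
use strict;
use warnings;

# Returns 1 if the sequence contains ATG, then one or more bases,
# then a run of 5 to 12 As.
sub has_polyA {
    my ($seq) = @_;
    return $seq =~ m/ATG[ATCG]+A{5,12}/ ? 1 : 0;
}

for my $seq ("ATGAAAAAAAAAAA", "ATGCAAAAA", "ATGCCAAA", "ATGAAAAA") {
    print "$seq: ", (has_polyA($seq) ? "match" : "no match"), "\n";
}

# The other quantifiers, on tiny examples:
print "AC?G matches AG\n"    if "AG"    =~ m/AC?G/;   # ? : zero or one C
print "AC*G matches ACCCG\n" if "ACCCG" =~ m/AC*G/;   # * : zero or more Cs
```

Note that ATGAAAAA does not match: [ATCG]+ needs at least one base for itself, which leaves only four As for A{5,12}.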

19
REGEX Anchors
  • The previous pattern is not strictly correct
    because
  • It'll match a string that doesn't start with ATG
  • It'll match a string that doesn't end with poly
    As
  • Anchors tell REGEX that a pattern must occur at
    the beginning or at the end of a string

20
REGEX Anchors
  • ^ anchors the pattern to the start of a string
  • $ anchors the pattern to the end of a string
  • /^ATG[ATCG]+A{5,12}$/
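
A short sketch of what the anchors change, using made-up test sequences and a hypothetical function name is_polyA_mrna:

```perl
use strict;
use warnings;

# Returns 1 only if the whole string starts with ATG and ends with As.
sub is_polyA_mrna {
    my ($seq) = @_;
    return $seq =~ m/^ATG[ATCG]+A{5,12}$/ ? 1 : 0;
}

my $tail = "A" x 6;    # six As
for my $seq ("ATGCCGT$tail", "CCATGCCGT$tail", "ATGCCGT${tail}GG") {
    print "$seq: ", (is_polyA_mrna($seq) ? "match" : "no match"), "\n";
}
```

Only the first sequence matches; the second does not start with ATG and the third does not end with As.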

21
REGEX is greedy!
  • The revised pattern is still incorrect because
      • It'll match a string that has more than 12 As at
        the end
  • Quantifiers will try to match as many copies of a
    sub-pattern as possible!
  • /^ATG[ATCG]+A{5,12}$/
  • ATGGCCCGGCCTTTCCCAAAAAAAAAAAA
  • Here [ATCG]+ takes as much as it can, including
    most of the As, and leaves only the last 5 As for
    A{5,12}

22
Curb that Greed!
  • ? after a quantifier prevents REGEX from being
    greedy
  • /^ATG[ATCG]+?A{5,12}$/
  • ATGGCCCGGCCTTTCCGAAAAAAAAAAAA
  • Here [ATCG]+? stops as early as it can, so
    A{5,12} gets the whole run of As
  • note this is the second use of the question mark
    - what is the other use of ? in REGEX?

23
REGEX Capture
  • What if you want to keep the part of a string
    that matches to your pattern?
  • Use ( ) memory parentheses

ATGGCCCGGCCTTTCCGAAAAAAAAAAAA
/^ATG([ATCG]+?)A{5,12}$/
24
REGEX Capture
/^ATG([ATCG]+?)(A{5,12})$/
          $1       $2
  • What's inside the first ( ) is assigned to $1
  • What's inside the second ( ) is $2 and so on
  • So $2 eq "AAAAAAAAAAAA"
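
The difference between the greedy and the non-greedy pattern shows up in what the two groups capture. The sketch below assumes the example sequence ends in exactly 12 As (it is built with the x operator to make that explicit); the helper name split_tail is hypothetical.

```perl
use strict;
use warnings;

my $seq = "ATGGCCCGGCCTTTCCG" . ("A" x 12);    # coding part plus 12 As

# Returns the two captured groups for a given compiled pattern,
# or an empty list if the pattern does not match.
sub split_tail {
    my ($sequence, $pattern) = @_;
    if ($sequence =~ $pattern) {
        return ($1, $2);
    }
    return;
}

my ($greedy_body, $greedy_tail) = split_tail($seq, qr/^ATG([ATCG]+)(A{5,12})$/);
my ($lazy_body,   $lazy_tail)   = split_tail($seq, qr/^ATG([ATCG]+?)(A{5,12})$/);

print "greedy: tail has ", length($greedy_tail), " As\n";
print "lazy:   tail has ", length($lazy_tail),   " As\n";
```

Both patterns match the sequence; what changes is how the As are divided between $1 and $2.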

25
REGEX Modifiers
  • Modifiers come after a pattern and affect the
    entire pattern
  • You have seen //g already, which does global
    matching (/T/g) and global replacement (s/T/U/g)
  • Other useful modifiers:

//i    make the pattern case insensitive
//s    let . match newline
//m    let ^ and $ (anchors) match next to an embedded newline
///e   allow the replacement string to be a Perl expression
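
Each modifier can be seen in action in a few lines. The strings below are invented for this sketch.

```perl
use strict;
use warnings;

# //i : case-insensitive match
print "i: matched Dog\n" if "Dog! Human's best friend" =~ m/dog/i;

# //g : global replacement (transcribe DNA to RNA)
my $rna = "ATTGCT";
$rna =~ s/T/U/g;
print "g: $rna\n";

# //s : let . match a newline
print "s: . crossed the newline\n" if "ATG\nCCC" =~ m/G.C/s;

# //m : let ^ and $ match next to embedded newlines
my $two_lines = "first line\nATGCCC\n";
print "m: second line starts with ATG\n" if $two_lines =~ m/^ATG/m;

# ///e : evaluate the replacement as Perl code
my $text = "length of ATGCCC";
$text =~ s/([ATCG]+)$/length($1)/e;
print "e: $text\n";
```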
26
REGEX Summary
  • REGEX is its own little language!!!
  • REGEX is one of the main strengths of Perl
  • To learn more:
      • Learning Perl (3rd ed.) Chapters 7, 8, 9
      • Programming Perl (3rd ed.) Chapter 5
      • Mastering Regular Expressions (2nd ed.)
      • http://www.perl.com/doc/manual/html/pod/perlre.html
  • A good cheat sheet is
    http://www.biotek.uio.no/EMBNET/guides/guideRegExp.pdf

27
Exercise 13
  • In a text file, write out three strings that
    match the following regular expression:
    /ATG?C[ATCG]+?A{3,10}/
  • Write a program that reads each string from the
    text file and checks your answers.
  • Try it yourself (30 min) then look at the answer
    at the end of this lecture.

28
modules
29
What are Modules?
  • A logical collection of functions
  • Using modules has the same advantage as using
    functions, i.e., it simplifies code (makes it
    modular) and facilitates code reuse
  • Each collection (or module) has its own
    namespace
  • Namespace: a table containing the names of
    variables and functions used in your code
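
What a separate namespace buys you can be sketched with a made-up module. SeqUtils is a hypothetical package name, and it is kept in the same file only for brevity; a real module would normally live in its own .pm file.

```perl
use strict;
use warnings;

# A hypothetical module kept in the same file for brevity; normally this
# package would live in its own file, SeqUtils.pm.
package SeqUtils;

sub describe {
    return "SeqUtils::describe was called";
}

sub revcomp {
    my ($dna) = @_;
    my $rc = reverse $dna;      # scalar context: reverses the string
    $rc =~ tr/ATCG/TAGC/;       # complement each base
    return $rc;
}

package main;

# Same function name, different namespace: no clash.
sub describe {
    return "main::describe was called";
}

print describe(), "\n";
print SeqUtils::describe(), "\n";
print SeqUtils::revcomp("ATGC"), "\n";
```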

30
Why Use Modules?
  • Modules allow you to use others' code to extend
    the functionality of your program.
  • There are a lot of Perl modules.

31
Finding out what modules you already have
  • In Perl, each module is a file stored in some
    directory in your system.
  • The system that this class is using stores Perl
    modules (like CGI.pm) in one of two directories:
      • C:\bin\Perl\lib
      • C:\bin\Perl\site\lib

32
Finding out what modules you already have
  • To find out where modules are installed, type
        perl -V
    at the command prompt
  • To find out what is installed, type
        perldoc perllocal
    at the command prompt.
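
The same information can be reached from inside a program. In this sketch the helper name module_available is hypothetical; @INC is Perl's built-in list of directories that are searched for modules.

```perl
use strict;
use warnings;

# Returns 1 if the named module can be loaded, 0 otherwise.
sub module_available {
    my ($module) = @_;
    my $loaded = eval "require $module; 1";
    return $loaded ? 1 : 0;
}

# @INC holds the directories Perl searches for modules.
print "Perl looks for modules in:\n";
print "  $_\n" for @INC;

for my $module ("strict", "Bio::SeqIO") {
    print "$module: ", (module_available($module) ? "installed" : "not found"), "\n";
}
```

Bio::SeqIO will be reported as not found until BioPerl has been installed.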

33
Using Modules
  • To use a module, you need to include the line
        use ModuleName;
    at the beginning of your program.
  • But you already knew that:
        use strict;
        use warnings;

34
Where to find modules
  • You can search for modules (and documentation)
    that may be useful to your particular problem
    using http://search.cpan.org/
  • CPAN: Comprehensive Perl Archive Network
  • Central repository for Perl modules and more
  • If it's written in Perl, and it's helpful and
    free, it's probably on CPAN
  • http://www.perl.com/CPAN/

35
Exercise 14
  • Open a web browser
  • Go to http://search.cpan.org/
  • Type in bioperl
  • Follow the link to Bio::Tools::Blast
  • Read the example code
  • Copy the example code to a file and try to run it.
36
Bioperl Overview
  • The Bioperl project: www.bioperl.org
  • Comprehensive, well documented set of Perl
    modules
  • A bioinformatics toolkit for
      • Format conversion
      • Report processing
      • Data manipulation
      • Sequence analyses
      • and more!
  • Written in object-oriented Perl

37
Bioperl Overview
  • The last exercise most likely did not work
    (unless you have BioPerl installed)
  • So let's install it

38
How to install modules
  • This class is using the ActiveState version of
    Perl that comes with a program called ppm (Perl
    Package Manager)
  • At the command prompt type
        >ppm
  • And follow the instructions in the exercise

39
How to install modules (without ppm)
  • If you are not using ActiveState Perl, you
    can also install modules from CPAN using
        >perl -MCPAN -e 'install SomeModule'
  • Module dependency is taken care of automatically
  • You'll (usually) need to be root to install a
    module successfully

40
Exercise 15
Install bioperl
  1. At the command line prompt type
         >ppm
  2. Then at the ppm prompt type
         ppm> search bioperl
  3. Then type
         ppm> install bioperl
Try running the example code from the last exercise.
Enter the code on the next slide and run it.
41
Exercise 16
    # bioperl example code
    use strict;
    use warnings;

    # make the bioperl module (class) accessible to your program
    use Bio::DB::RefSeq;

    # make a new instance (object) of the class and name it
    my $refseq = new Bio::DB::RefSeq;

    # call a method of the object to do something
    # in this case, another object is returned
    my $molecule = $refseq->get_Seq_by_acc('NM_006732');

    # call a method or retrieve an attribute of the object
    # in this case, the sequence is returned
    print "seq is ", $molecule->seq, "\n";

42
What are objects?
  • Examples of objects in real life
      • My car, my dog, my dishwasher
  • Objects have ATTRIBUTES and METHODS
  • Some attributes of my dog Fido
      • Color of fur: brown
      • Height: 20 cm
      • Owner's name: Ian
      • Weight: 2 kg
      • Tail position: up
  • Some methods of my dog Fido
      • Bark
      • Walk
      • Run
      • Eat
      • Wag tail

Fido
43
What is a class?
  • A class is a type of object in the real world
      • Cars, dogs, dishwashers
  • Classes have ATTRIBUTES and METHODS
  • Some attributes of a dog
      • Color of fur
      • Height
      • Owner's name
      • Weight
      • Tail position
  • Some methods of a dog
      • Bark
      • Walk
      • Run
      • Eat
      • Wag tail

The concept of a dog
44
So an object is an instance of a class
class:  the concept of dog
object: Fido
45
Objects have unique names called references, and
classes have names too.
class:  Dog (the class name)
object: Fido ($Fido is the reference)
46
By convention, classes have a method called new that
is used to create objects.
class:  Dog
object: Fido
    $Fido = new Dog();
$Fido is the reference
47
A reference to an object can be used to access
its properties or methods.
class:  Dog
object: Fido
    print $Fido->bark();
prints: woof
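
To make the Dog example concrete, here is a minimal sketch of what such a class could look like in Perl. Everything in it is invented for illustration (the attribute names, the default values, the accessor methods). It creates the object with Dog->new(...), which does the same job as the new Dog() form used on the slides.

```perl
use strict;
use warnings;

package Dog;

# Constructor: builds a hash of attributes and blesses it into the class.
sub new {
    my ($class, %attributes) = @_;
    my $self = {
        name          => $attributes{name}      || "unnamed",
        fur_color     => $attributes{fur_color} || "unknown",
        tail_position => "down",
    };
    return bless $self, $class;
}

# Methods
sub bark { return "woof"; }

sub wag_tail {
    my ($self) = @_;
    $self->{tail_position} = "up";
    return $self;
}

# Attribute accessors
sub name          { my ($self) = @_; return $self->{name}; }
sub tail_position { my ($self) = @_; return $self->{tail_position}; }

package main;

my $Fido = Dog->new(name => "Fido", fur_color => "brown");
print $Fido->name, " says ", $Fido->bark(), "\n";
$Fido->wag_tail;
print "tail is ", $Fido->tail_position, "\n";
```

$Fido is the reference; the arrow calls a method on the object it refers to.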
48
A reference to an object can be used to access
its properties or methods.
class:  Bio::DB::RefSeq
object: $refseq
    $refseq = new Bio::DB::RefSeq;
    $molecule = $refseq->get_Seq_by_acc('NP_01014');
$molecule is some sequence record
50
Putting it all together
So now that you understand (sort of)
  • Classes
  • Objects
  • Attributes and
  • Methods
what remains is learning what the different classes
are that are available in BioPerl and what you
can do with them. For the next exercise, use the
documentation at bioperl.org to figure out what
the following code does. See www.bioperl.org/wiki/HOWTOs
and doc.bioperl.org (then click on bioperl-live).
51
Exercise 17
    #!/usr/bin/perl -w
    use strict;
    use Bio::SeqIO;

    # Create a new Bio::SeqIO object and initialize some attributes
    my $seq_in = Bio::SeqIO->new(
        -file   => "myGBrecord",
        -format => "genbank");
    my $seq_out = Bio::SeqIO->new(
        -file   => ">myEMBLrec",
        -format => "EMBL");
    my $seq_record = $seq_in->next_seq();
    $seq_out->write_seq($seq_record);
52
More Bioperl modules
  • Bio::SeqIO: Sequence Input/Output
      • Retrieve sequence records and write to files
      • Convert sequence records from one format to
        another
  • Bio::Seq: Manipulating sequences
      • Get subsequences ($seq->subseq($start, $end))
      • Find the length of the object ($seq->length)
      • Reverse complement a DNA sequence
      • Translate a DNA sequence, etc.
  • Bio::Annotation: Annotate a sequence
      • Assign journal references to a sequence, etc.
      • Bio::Annotation is associated with an entire
        sequence record and not just part of a sequence
        (see also Bio::SeqFeature)

53
Some more Bioperl modules
  • Bio::SeqFeature: Associate feature annotation to
    a sequence
      • Features describe specific locations in the
        sequence
      • E.g. 5' UTR, 3' UTR, CDS, SNP, etc.
      • Using this object, you can add feature
        annotations to your sequences
      • When you parse a genbank file using Bioperl, the
        features of a record are stored as SeqFeature
        objects
  • Bio::DB::GenBank, GenPept, EMBL and Swissprot:
    Remote Database Access
      • You can retrieve a sequence from remote databases
        (through the Internet) using these objects

54
Even more Bioperl modules
  • Bio::SearchIO: Parse sequence database search
    reports
      • Parse BLAST reports (make custom report)
      • Parse HMMer, FASTA, SIM4, WABA, etc.
      • Custom reports can be output to various formats
        (HTML, Table, etc.)
  • Bio::Tools::Run::StandAloneBlast: Run Standalone
    BLAST through Perl
      • By combining this and SearchIO, you can automate
        and customize BLAST searches
  • Bio::Graphics: Draw biological entities (e.g. a
    gene, an exon, BLAST alignments, etc.)

55
Bioperl Summary
  • For online documentation
      • For this workshop:
        http://doc.bioperl.org/releases/bioperl-1.4/
      • Tutorial: http://www.bioperl.org/wiki/HOWTO:Beginners
      • HOWTOs: http://www.bioperl.org/wiki/HOWTOs
      • Modules: http://www.bioperl.org/wiki/Category:Core_Modules
  • Literature
      • Stajich et al., The Bioperl toolkit: Perl modules
        for the life sciences. Genome Res. 2002
        Oct;12(10):1611-8. PMID: 12368254
  • Bioperl mailing list: bioperl-l@bioperl.org
      • Best way to get help using Bioperl
      • Very active list (upwards of 10 messages a day)
  • Use with caution: things change fast and without
    warning (unless you are on the mailing list)

56
Perl Documents
  • In-line documentation
      • POD: Plain Old Documentation
      • Read POD by typing perldoc <module name>
      • E.g. perldoc perl, perldoc Bio::SeqIO
  • On-line documentation
      • http://www.cpan.org
      • http://www.perl.com
      • http://www.bioperl.org
  • Books
      • Learning Perl (the best way to learn Perl if you
        know a bit about programming already)
      • Beginning Perl for Bioinformatics (example-based
        way to learn Perl for Bioinformatics)
      • Programming Perl (THE Perl reference book; not
        for the faint of heart)

57
Additional Book References
  • Perl Cookbook 2nd edition (quick solutions to 80%
    of what you want to do)
  • Learning Perl Objects, References & Modules (for
    people who want to learn objects, references and
    modules in Perl)
  • Perl in a Nutshell (an okay quick reference)
  • Perl CD Bookshelf, Version 4.0 (electronic
    version of the above books; best value,
    searchable, and kills fewer trees)
  • Mastering Perl for Bioinformatics (more
    example-based learning)
  • CGI Programming with Perl (rather outdated
    treatment of the subject... Not really
    recommended)
  • Perl Graphics Programming (if you want to
    generate graphics using Perl; side note: Perl is
    probably not the best tool for generating
    graphics)

59
Answer 12
    #!/usr/bin/perl
    use strict;
    use warnings;

    # TASK: demonstrate the use of "my" in setting the scope
    # of a variable
    my $some_variable = 100;

    # body of the main program with the function call
    print "the value of \$some_variable is $some_variable\n";
    subroutine1();
    print "but here, \$some_variable is still $some_variable\n";

    # subroutine using $some_variable
    sub subroutine1 {
        my $some_variable = 0;
        print "in subroutine1, \$some_variable is $some_variable\n";
    }

    # what happens if you comment out "use strict"
    # and remove "my" from lines 7 and 16
    # (the two declarations of $some_variable)?
60
Answer 13
    #!/usr/bin/perl
    use strict;
    use warnings;

    # TASK: check your answers to the regex exercise

    # open input and output files
    open(IN, "myanswers.txt") or die "Cannot open myanswers.txt: $!";

    # read the input file line-by-line
    # for each line test if it matches a regular expression
    while (<IN>) {
        chomp;
        my $is_correct = does_it_match($_);
        if ($is_correct) {
            print "$_ is a match\n";
        }
        else {
            print "$_ is NOT a match\n";
        }
    }

    # close input file and exit
    close(IN);
    exit();

    # does it match
    sub does_it_match {
        my ($answer) = @_;
        my $is_correct = 0;
        if ($answer =~ m/ATG?C[ATCG]+?A{3,10}/) {
            $is_correct = 1;
        }
        return $is_correct;
    }