1
http://creativecommons.org/licenses/by-sa/2.0/
2
An Introduction to Perl for Bioinformatics, Part 1
  • Will Hsiao
  • Simon Fraser University
  • Department of Molecular Biology and Biochemistry
  • wwhsiao_at_sfu.ca
  • www.pathogenomics.sfu.ca/brinkman

3
Outline
  • Session 1
  • Review of the previous day
  • Perl historical perspective
  • Expand on Regular Expression
  • General Use of Perl
  • Expand on Perl Functions and introduce Modules
  • Interactive demo on Modules
  • Break
  • Session 2
  • Use of Perl in Bioinformatics
  • Object Oriented Perl
  • Bioperl Overview
  • Interactive demo on Bioperl
  • Introduction to the Perl assignment

4
Today's Goals
  • Will have become familiar with a few more
    advanced programming concepts
  • Regular Expression
  • Functions and Modules
  • Object Oriented Perl
  • Will have heard a few common uses of Perl
  • Will have learned how Perl can be used in
    bioinformatics
  • Will have discovered Bioperl

5
Recap from Yesterday
  • Which ones below are variables?
  • $sequence, @sequences, 74, "I knew this",
    $seq_id, "exciting stuff"
  • What are functions?
  • Which part of the statement below is a function?
  • @sequences = split (/\t/, $genome);
  • Other issues?

6
What does this program do?
  • #!/usr/bin/perl -w
  • # a mystery subroutine that does something
  • sub mystery_function {
  •     my ($seq1, $seq2) = @_;
  •     my $rDNA = reverse $seq1;
  •     $seq2 =~ tr/T/U/;
  •     my $hybrid = $rDNA.$seq2;
  •     return $hybrid;
  • }
  • # body of the main program
  • $DNA1 = "GATACAATAC";
  • $DNA2 = "ATCGTAATCC";
  • $answer = mystery_function($DNA1, $DNA2);
  • print "$answer\n";

7
use strict
  • #!/usr/bin/perl -w
  • use strict;
  • # a mystery subroutine that does something
  • sub mystery_function {
  •     my ($seq1, $seq2) = @_;
  •     my $rDNA = reverse $seq1;
  •     $seq2 =~ tr/T/U/;
  •     my $hybrid = $rDNA.$seq2;
  •     return $hybrid;
  • }
  • # body of the main program
  • my $DNA1 = "GATACAATAC";
  • my $DNA2 = "ATCGTAATCC";
  • my $answer = mystery_function($DNA1, $DNA2);
  • print "$answer\n";

8
Effects of use strict
  • Requires you to declare variables
  • Warns you about possible typos in variables

9
Why bother with use strict?
  • Enforces some good programming rules
  • Helps to prevent silly errors
  • Makes troubleshooting your program easier
  • Becomes essential as your code becomes longer
  • We will use strict in all the code you see today
    and in your assignment
  • Bottom line: ALWAYS use strict
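A minimal sketch of what use strict catches, assuming a deliberately misspelled variable ($sequnce is a made-up typo for illustration). The snippet is compiled through eval so that the script keeps running and can show the error message instead of stopping:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Code with a deliberate typo: $sequnce instead of $sequence.
# It is kept in a string so that the error can be caught and displayed.
my $snippet = <<'END_SNIPPET';
use strict;
my $sequence = "GATACAATAC";
my $length   = length($sequnce);   # typo: this variable was never declared
1;
END_SNIPPET

# eval compiles the snippet; under strict the undeclared variable is fatal.
my $compiled_ok = eval $snippet;
my $error       = $@;

if ($compiled_ok) {
    print "No problems found\n";
}
else {
    print "use strict caught a problem:\n$error";
}
```

Without use strict inside the snippet, Perl would quietly treat $sequnce as a new, empty variable and the length would be 0.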

10
Perl: a brief history
  • Purpose: "...for scanning arbitrary text files,
    extracting information from those text files, and
    printing reports based on that information"
  • (from the perl manpage)
  • 1987: Perl 1.0 released
  • 1993: CPAN conceived
  • 1994: Perl 5.000 released
  • Object oriented perl
  • Modules for creating interactive web pages (CGI)
  • Modules for connection to databases (DBI)
  • Current stable version of Perl is 5.8.5

11
How do we manipulate text?
Regular Expression
12
What is a Regular Expression?
  • REGEX provides pattern matching ability
  • Tells you whether a string contains a pattern or
    not (Note: it's a yes or no question!)

Regular Expression looking for "dog":
  • "I have a golden retriever": No or False
  • "Yesterday I saw a big black dog": Yes or True
  • "Dog! Human's best friend": No b/c REGEX is case sensitive
13
Why do we need Regular Expressions?
  • Humans do this quite well
  • But...
  • Imagine trying to find all ATGs in the human
    genome by hand
  • Furthermore, imagine trying to find all EcoRI
    digestion sites (GAATTC) in the human genome

14
Perl REGEX example
  • my $text = "Bioinformatics Kicks Ass";
  • if ($text =~ /Kicks/) {
  •     print "The text contains Kicks\n";
  • }
  • =~ is the binding operator
  • It says: does the string on the left contain the
    pattern on the right?
  • /Kicks/ is my pattern
  • The matching operation results in a true or false
    answer
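A short sketch applying the same yes/no test to the three "dog" sentences from the earlier slide. The variable names are chosen for illustration only:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @sentences = (
    "I have a golden retriever",
    "Yesterday I saw a big black dog",
    "Dog! Human's best friend",
);

my @answers;
foreach my $sentence (@sentences) {
    # =~ asks: does the string on the left contain the pattern on the right?
    my $answer = ($sentence =~ /dog/) ? "yes" : "no";
    push @answers, $answer;
    print "$answer: $sentence\n";
}
```

The third sentence gets "no" because the pattern /dog/ is case sensitive and the sentence only contains "Dog".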

15
More Regular Expression
  • A pattern that matches only one string is not very
    useful!
  • We need symbols to represent classes of strings
  • REGEX is its own little language inside Perl
  • Has different syntax and symbols!
  • Symbols which you have used in Perl, such as ".",
    have totally different meanings in REGEX

16
REGEX Metacharacters
  • Metacharacters allow a pattern to match different
    strings
  • Wildcards are examples of metacharacters
  • /.ick/ will match 'kick', 'sick', 'tick',
    'stick', 'kicks', etc.
  • Perl REGEX has much more powerful metacharacters
    used to represent classes of characters

17
Types of Metacharacters
  • . matches any one character (including a space) except \n
  • [ ] denotes a selection of characters and matches
    ONE of the characters in the selection.
  • What does [ATCG] match?
  • \t and \n match a tab and a newline respectively;
    \s matches any whitespace character (space, tab, newline)
  • \w matches any one character in [a-zA-Z0-9_]
  • \d matches [0-9]
  • \D matches anything except [0-9]

18
An Example of Metacharacters
  • V1S 5A6?
  • /\w\d\D\s\d.[0-9]/
  • Is it a good pattern for a postal code?
  • What else does it match?
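One way to explore the question is to try the pattern on a few strings. This is a sketch only: the test strings are made up, and $stricter_pattern is a hypothetical alternative (letter-digit-letter, whitespace, digit-letter-digit, anchored at both ends) that does not come from the slides:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The pattern from the slide.
my $slide_pattern = qr/\w\d\D\s\d.[0-9]/;

# Hypothetical stricter pattern, an assumption for comparison only.
my $stricter_pattern = qr/^[A-Z]\d[A-Z]\s\d[A-Z]\d$/;

my @candidates = ("V1S 5A6", "_1! 5?6", "call V1S 5A6 now");

my %matches;
foreach my $candidate (@candidates) {
    my $loose = ($candidate =~ $slide_pattern)    ? "yes" : "no";
    my $tight = ($candidate =~ $stricter_pattern) ? "yes" : "no";
    $matches{$candidate} = "$loose/$tight";
    print "'$candidate': slide pattern $loose, stricter pattern $tight\n";
}
```

The slide pattern accepts "_1! 5?6" because \w allows an underscore, \D allows any non-digit, and . allows almost anything.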

19
REGEX Quantifiers
  • What if you want to match a character more than
    once?
  • What if you want to match an mRNA with a polyA
    tail that is 5 to 12 As long?
  • ATGAAAAAAAAAAA

20
REGEX Quantifiers
ATGAAAAAAAAAAA
/ATG[ATCG]+A{5,12}/
  • + matches one or more copies of the previous
    character
  • * matches zero or more copies of the previous
    character
  • ? matches zero or one copy of the previous
    character
  • {min,max} matches a number of copies within the
    specified range

21
REGEX Anchors
  • The previous pattern is not strictly correct
    because
  • It'll match a string that doesn't start with ATG
  • It'll match a string that doesn't end with poly
    As
  • Anchors tell REGEX that a pattern must occur at
    the beginning or at the end of a string

22
REGEX Anchors
  • ^ anchors the pattern to the start of a string
  • $ anchors the pattern to the end of a string
  • /^ATG[ATCG]+A{5,12}$/
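A small sketch comparing the pattern with and without anchors. The three test sequences are invented for illustration, and the + after [ATCG] is an assumption about the original slide:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $unanchored = qr/ATG[ATCG]+A{5,12}/;
my $anchored   = qr/^ATG[ATCG]+A{5,12}$/;

my @sequences = (
    "ATGGCCAAAAAA",      # starts with ATG, ends with As
    "CCATGGCCAAAAAA",    # does not start with ATG
    "ATGGCCAAAAAATT",    # does not end with As
);

my %result;
foreach my $sequence (@sequences) {
    my $without = ($sequence =~ $unanchored) ? "yes" : "no";
    my $with    = ($sequence =~ $anchored)   ? "yes" : "no";
    $result{$sequence} = "$without/$with";
    print "$sequence: unanchored $without, anchored $with\n";
}
```

Only the first sequence passes both tests; the unanchored pattern accepts all three because it may match anywhere inside the string.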

23
REGEX is greedy!
  • The revised pattern is still incorrect because
  • It'll match a string that has more than 12 As at
    the end
  • Quantifiers will try to match as many copies of a
    sub-pattern as possible!
  • /^ATG[ATCG]+A{5,12}$/
  • ATGGCCCGGCCTTTCCCAAAAAAAAAAAA

24
Curb that Greed!
  • ? after a quantifier prevents REGEX from being
    greedy
  • Note: this is the second use of the question mark
  • What is the other use of ? in REGEX?
  • /^ATG[ATCG]+?A{5,12}$/
  • ATGGCCCGGCCTTTCCGAAAAAAAAAAAA

25
REGEX Capture
  • What if you want to keep the part of a string
    that matches to your pattern?
  • Use ( ) memory parentheses

ATGGCCCGGCCTTTCCGAAAAAAAAAAAA
/^ATG([ATCG]+?)A{5,12}$/
26
REGEX Capture
/^ATG([ATCG]+?)(A{5,12})$/
$1 = ([ATCG]+?), $2 = (A{5,12})
  • What's inside the first ( ) is assigned to $1
  • What's inside the second ( ) is assigned to $2, and so on
  • So $2 eq 'AAAAAAAAAAAA'
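A sketch of capture with a greedy and a non-greedy quantifier on the sequence from the slide. The variable names are chosen for illustration, and a match in list context is used to collect the captures instead of reading $1 and $2 directly:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $mrna = "ATGGCCCGGCCTTTCCGAAAAAAAAAAAA";

# Greedy: [ATCG]+ grabs as much as it can and leaves the minimum for the tail.
my ($greedy_body, $greedy_tail) = $mrna =~ /^ATG([ATCG]+)(A{5,12})$/;

# Non-greedy: [ATCG]+? grabs as little as it can, so the tail gets the rest.
my ($lazy_body, $lazy_tail) = $mrna =~ /^ATG([ATCG]+?)(A{5,12})$/;

print "greedy:     body=$greedy_body tail=$greedy_tail\n";
print "non-greedy: body=$lazy_body tail=$lazy_tail\n";
```

With the greedy version the captured tail is only five As long, because the first group has already swallowed the other seven.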

27
REGEX Modifiers
  • Modifiers come after a pattern and affect the
    entire pattern
  • You have seen //g already, which does global
    matching (/T/g) and global replacement (s/T/U/g)
  • Other useful modifiers

28
REGEX Demo
  • Demonstrate quantifiers
  • Demonstrate anchors
  • Demonstrate //i
  • Demonstrate capture
  • Demonstrate the effect of greedy vs. non-greedy
  • Demonstrate metacharacters

29
Other binding operators
  • =~ is called the binding operator; it binds
    a string on the left to a pattern on the
    right
  • E.g. $text =~ /PATTERN/
  • Two other operators work with =~: s/// and tr///
  • s/// (substitution) replaces a matched
    pattern with a string (kind of like the replace
    function in MS Word)
  • tr/// (translation) translates one character to
    another
30
Summary on REGEX
  • REGEX is its own little language!!!
  • REGEX is used in some functions (e.g. split)
  • Perl REGEX is extremely powerful and fast
  • REGEX is one of the main strengths of Perl
  • To learn more:
  • Learning Perl (3rd ed.) Chapters 7, 8, 9
  • Programming Perl (3rd ed.) Chapter 5
  • Mastering Regular Expressions (2nd ed.)

31
Common Uses of Perl
  • REGEX
  • Complete set of tools for pattern matching text
  • System administration
  • Perl scripts can be written to automate many
    system administration tasks
  • CGI.pm
  • Module for designing interactive web pages
  • DBI.pm
  • Database Interface: allows communication between
    Perl and all major RDBMSs (Oracle, MySQL, etc.)

32
Review on Functions
  • How do we call a function?
  • my $sum = add(2, 3);
  • Functions can take some input values (parameters)
    and can return some output values
  • You need to assign the return values to a
    variable in order to use them

33
More Review on Functions
  • Benefits of subroutines
  • Decompose a big problem into smaller, more
    manageable problems
  • Organize your code
  • Improve code reuse
  • Easier to test and debug your code

sub add {
    # some code that adds numbers here
    return $sum;
}
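One possible way to fill in the body of add, shown as a sketch. The slides do not give the real implementation, so the loop below is an assumption:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Adds up all the numbers passed in and returns the total.
sub add {
    my @numbers = @_;      # the input values (parameters)
    my $total   = 0;
    foreach my $number (@numbers) {
        $total += $number;
    }
    return $total;         # the output value
}

# Assign the return value to a variable in order to use it.
my $sum = add(2, 3);
print "2 + 3 = $sum\n";
```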
34
What are Modules
  • A module is a logical collection of functions
  • Each collection (or module) has its own name
    space
  • Name space: a table containing the names of
    variables and functions used in your code

35
Why is name space important?
Package SEQanalysis_DNA
  $DNA = "ATGAATACTACTAT";
  $polyAtail = "AAAAAAAAAA";
  sub Revcom {
      # reverse complement sequence
  }
  sub concat {
      # concatenate two DNA sequences
  }
Package SEQanalysis_Prot
  $exon1 = "MEDAVRSKNTMI";
  $exon2 = "RSVADEGFLSMIRQH";
  sub findmotif {
      # find a peptide motif
  }
  sub concat {
      # concatenate two exon sequences
  }
SEQanalysis_DNA::concat
SEQanalysis_Prot::concat
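A runnable sketch of two packages that each define their own concat, using the sequences from the slide. The function bodies, and the labels they add to the result, are assumptions made so that the two versions can be told apart:

```perl
#!/usr/bin/perl
use strict;
use warnings;

package SEQanalysis_DNA;

# Concatenate two DNA sequences (illustrative body).
sub concat {
    my ($first, $second) = @_;
    return "DNA:" . $first . $second;
}

package SEQanalysis_Prot;

# Concatenate two exon sequences (illustrative body).
sub concat {
    my ($first, $second) = @_;
    return "protein:" . $first . $second;
}

package main;

# The package name in front of :: selects which concat is called.
my $dna     = SEQanalysis_DNA::concat("ATGAATACTACTAT", "AAAAAAAAAA");
my $protein = SEQanalysis_Prot::concat("MEDAVRSKNTMI", "RSVADEGFLSMIRQH");

print "$dna\n$protein\n";
```

Because each package has its own name space, the two functions named concat do not overwrite each other.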
36
Why Use Modules
  • Modules allow you to use others' code to extend
    the functionality of your program
  • But using other people's modules is like going to
    other people's houses
  • Not everything will be the way you like it
  • Read the module documentation
  • Be nice
  • Use a module as it is intended
  • In Perl, each module is a file stored in some
    directory in your system
  • E.g. you can find CGI.pm in /usr/lib/ on your
    system (ask Graeme where it is)

37
Use Modules
  • To use a module
  • use <modulename>;
  • Examples
  • use strict;
  • use Env;
  • use CGI qw(:standard);
  • To find out where modules are installed: perl -V
  • To find out what standard modules are available:
    perldoc perlmodlib
38
Module Demo
  • Demonstrate perldoc as a method to read module
    documentation
  • Demonstrate the difference before and after using
    a module (use strict and use Env)
  • Demonstrate perl -V and an example of
    directory structure of modules

39
Where to find modules
  • CPAN: Comprehensive Perl Archive Network
  • Central repository for Perl modules and more
  • "If it's written in Perl, and it's helpful and
    free, it's probably on CPAN"
  • http://www.perl.com/CPAN/
  • To install modules from CPAN
  • perl -MCPAN -e 'install SomeModule'
  • Module dependency is taken care of automatically
  • You'll (usually) need to be root to install a
    module successfully
  • For details see your notes

40
CPAN Web Demo
  • Demonstrate how to search for a module and how to
    access the online documentation
  • We'll use Getopt::Long as an example

41
Interactive Demo on Getopt::Long
  • Open your laptop!
  • Open a terminal window
  • Type cd /perl_two
  • Type emacs ./getopt_demo.pl
  • Let's go over the example together
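The contents of getopt_demo.pl are not part of this transcript. The sketch below only illustrates the general idea of Getopt::Long; the option names, the file name genome.fasta, and the sample command line are all invented for the example:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Default values for the (hypothetical) options.
my $file       = "";
my $min_length = 0;
my $verbose    = 0;

# A made-up command line; a real script would normally parse @ARGV instead.
my @args = ("--file", "genome.fasta", "--min-length", "5", "--verbose");

GetOptionsFromArray(
    \@args,
    "file=s"       => \$file,          # =s : option takes a string
    "min-length=i" => \$min_length,    # =i : option takes an integer
    "verbose"      => \$verbose,       # no value: a simple on/off flag
) or die "Could not parse the options\n";

print "file=$file min_length=$min_length verbose=$verbose\n";
```

In a real script the same option specification can be passed to GetOptions, which reads the options from @ARGV.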

42
Summary for Session 1
  • Always use strict
  • Regular Expression is its own language inside
    Perl
  • I encourage you to read the chapters on REGEX in
    Learning Perl
  • A module is a logical collection of functions
  • You can find module documentation by using
    perldoc (command line) or by going online to CPAN

43
Break