1
http://creativecommons.org/licenses/by-sa/2.0/
2
An Introduction to Perl for Bioinformatics Part 2
  • Will Hsiao
  • Simon Fraser University
  • Department of Molecular Biology and Biochemistry
  • wwhsiao_at_sfu.ca
  • www.pathogenomics.sfu.ca/brinkman

4
Outline
  • Session 1
  • Review of the previous day
  • Perl historical perspective
  • Expand on Regular Expression
  • General Use of Perl
  • Expand on Perl Functions and introduce Modules
  • Interactive demo on Modules
  • Break
  • Session 2
  • Use of Perl in Bioinformatics
  • Object Oriented Perl
  • Bioperl Overview
  • Interactive demo on Bioperl
  • Introduction to the Perl assignment

5
Perl in Bioinformatics
  • Case in point 1: Human Genome data exchange
  • "How Perl saved the Human Genome Project"
  • Lincoln Stein (1996) www.perl.org
  • Different sequencing centres all used different
    data formats
  • Perl allowed various genome centres to exchange
    and communicate data with each other
  • Introduces a project to produce modules to
    process all known forms of biological data
    (Bioperl)

6
Perl in Bioinformatics
  • Case in point 2: Ensembl
  • Much of Ensembl is written in Perl
  • Ensembl has an extensive Perl API that allows you
    to access the Ensembl database directly from your
    Perl code
  • Case in point 3: GMOD (Generic Model Organism
    Database)
  • www.gmod.org
  • a joint effort by model organism system databases
    (worm, fly, corn, rat, yeast, E. coli,
    arabidopsis, rice) to develop reusable components
    suitable to be adapted for other biological
    databases
  • Written mostly in Java and Perl

7
Bioinformatics Spectrum
[Figure: spectrum diagram with the labels Biology, Math, Computer Science, Software/data analysis, Perl, Java, and C/C++]
8
Perl for bioinformatics in your lab
  • Scripting
  • automation of repetitive analyses
  • parse results obtained from other programs
  • Wrapping
  • accessing other programs (e.g. BLAST) through
    Perl
  • Web CGIing
  • Develop an interactive web page for your lab
  • Create web forms

9
Bioperl Overview
  • The Bioperl project www.bioperl.org
  • Comprehensive, well documented set of Perl
    modules
  • Last stable release 1.4.0 (developer 1.5.1)
  • A bioinformatics toolkit for
  • Format conversion
  • Report processing
  • Data manipulation
  • Sequence analyses
  • and more!
  • Written in object-oriented Perl

10
What are objects?
  • Examples of objects in real life
  • Cars, dogs, dishwashers
  • Objects have ATTRIBUTES and ACTIONS
  • Some attributes of a dog
  • Color of fur
  • Height
  • Owner's name
  • Weight
  • Tail position
  • Some actions of a dog
  • Bark
  • Walk
  • Run
  • Eat
  • Wag tail

11
What are programming objects?
  • Borrows from the concept of real life objects

A Program Dog Object
  • Attributes are stored as variables:
    $fur_color, $weight, $tail_position
  • Actions are implemented as functions:
    sub dye_fur, sub eat, sub wag_tail
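The dog object on this slide can be sketched in plain Perl. This is only an illustration: the class name Dog, the constructor arguments, and the example values are assumptions, and storing attributes in a blessed hash is just one common way to build a Perl object.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical Dog class; attribute and method names follow the slide.
package Dog;

# Constructor: store the attributes in a hash and bless it into the class.
sub new {
    my ($class, %args) = @_;
    my $self = {
        fur_color     => $args{fur_color},
        weight        => $args{weight},
        tail_position => 'down',
    };
    return bless $self, $class;
}

# Action: change the fur_color attribute.
sub dye_fur {
    my ($self, $new_color) = @_;
    $self->{fur_color} = $new_color;
    return $self->{fur_color};
}

# Action: eating adds to the weight attribute (amount in kg, an assumption).
sub eat {
    my ($self, $amount) = @_;
    $self->{weight} += $amount;
    return $self->{weight};
}

# Action: wagging updates the tail_position attribute.
sub wag_tail {
    my ($self) = @_;
    $self->{tail_position} = 'wagging';
    return $self->{tail_position};
}

package main;

# Create one dog object and call its actions through its name.
my $polo = Dog->new(fur_color => 'brown', weight => 20);
$polo->dye_fur('black');
$polo->eat(0.5);
$polo->wag_tail;
print "Polo: $polo->{fur_color}, $polo->{weight} kg, tail $polo->{tail_position}\n";
```

Each sub receives the object itself as its first argument, which is how a method reads and changes the attributes of the particular dog it was called on.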
12
Object Exercise
  • Pair up with your neighbour (2-3 people)
  • In the next 2-3 minutes, come up with as many
    attributes and actions (aka methods) of a DNA
    sequence object
  • E.g. attributes of a DNA sequence object
  • length = 300, percent_GC = 50
  • E.g. methods of a DNA sequence object
  • translate_to_protein, remove_polyA_tail
  • Share with the class

13
Objects belong to Classes
  • If we take all your suggestions and design a
    generic template, we can then use this template
    to create objects.
  • This template is called a Class
  • An instance of a class is called an object

[Figure: the DNA Sequence Class as the template for DNA sequence objects 1, 2, 3, and 4]
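As a rough sketch of one class producing several objects, the code below defines a small DNA sequence class and creates two independent instances from it. The class name DNASequence, the method seq_length, and the example sequences are assumptions made for illustration; percent_GC and remove_polyA_tail borrow their names from the exercise, and the poly-A trimming is deliberately naive (it strips every trailing A).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical class: the template from which sequence objects are made.
package DNASequence;

sub new {
    my ($class, $bases) = @_;
    return bless { bases => uc $bases }, $class;
}

# Attribute-like method: number of bases in the sequence.
sub seq_length {
    my ($self) = @_;
    return length $self->{bases};
}

# Percentage of G and C bases; returns 0 for an empty sequence.
sub percent_GC {
    my ($self) = @_;
    my $len = $self->seq_length;
    return 0 if $len == 0;
    my $gc = ($self->{bases} =~ tr/GC//);   # tr counts without modifying
    return 100 * $gc / $len;
}

# Naive simplification: remove all trailing A bases.
sub remove_polyA_tail {
    my ($self) = @_;
    $self->{bases} =~ s/A+$//;
    return $self->{bases};
}

package main;

# Two objects (instances) created from the same class.
my $seq1 = DNASequence->new('ATGC');
my $seq2 = DNASequence->new('GGCCAAAAAA');

$seq2->remove_polyA_tail;   # changes $seq2 only; $seq1 is untouched

printf "seq1: length %d, GC %.0f%%\n", $seq1->seq_length, $seq1->percent_GC;
printf "seq2: length %d, GC %.0f%%\n", $seq2->seq_length, $seq2->percent_GC;
```

Both objects share the same methods, but each keeps its own data, so trimming one sequence leaves the other unchanged.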
14
How do we interact with an object?
  • We have to refer to an object by its name

[Figure: a dog answering "WOOF" when called by its name, "POLO"]
Polo is the name of my dog
15
Interact with a program object
[Figure: the program dog object, called by its name "Polo", answers "WOOF"]
A Program Dog Object
  • Attributes: $fur_color, $weight, $tail_position
  • Methods: sub dye_fur, sub eat, sub wag_tail
Polo is the name of a program dog object
16
A name is a reference
  • Objects have unique names (labels)
  • You refer to an object by its unique name
  • This unique name that you give to an object is
    called a reference

17
Reference in Perl
  • A reference is a scalar (simple) variable that
    refers to a chunk of memory
  • Stored in that memory can be another variable or
    an object

[Figure: the scalar $array_ref in "My Program" refers to an array stored in "Memory"]
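A minimal sketch of an array reference like the one in the figure; the array contents are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @bases = ('A', 'C', 'G', 'T');

# The backslash creates a reference: a scalar that refers to @bases.
my $array_ref = \@bases;

# Use the arrow to reach an element through the reference.
print "First base: $array_ref->[0]\n";

# Dereference with @$ to treat it as the whole array.
print "Number of bases: ", scalar(@$array_ref), "\n";

# Changes made through the reference affect the original array.
push @$array_ref, 'N';
print "Original array now has ", scalar(@bases), " elements\n";
```

The reference holds no copy of the data, which is why the push through $array_ref is visible in @bases.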
18
Reference to an object
  • $my_protein is called a reference to an object
    (in this case a protein object)
  • To access the attributes and methods of the
    protein object, you have to go through its
    reference (i.e. $my_protein)
  • Objects have inherent functions that are useful
  • These inherent functions also have specific names

[Figure: $my_protein in "My Program" refers to a protein object stored in "Memory"]
A protein object
  • Variables: $SwissProt_ID, $name, $length, $source,
    @journal_articles, $domain_location
  • Functions: sub new, sub return_ID, sub get_domain
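The sketch below shows what "going through the reference" looks like in code. The Protein class, its constructor arguments, and all example values (including the placeholder ID) are hypothetical; only the method names return_ID and get_domain come from the slide.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical Protein class with a few of the attributes from the slide.
package Protein;

sub new {
    my ($class, %args) = @_;
    my $self = {
        SwissProt_ID    => $args{SwissProt_ID},
        name            => $args{name},
        domain_location => $args{domain_location},
    };
    return bless $self, $class;
}

sub return_ID  { my ($self) = @_; return $self->{SwissProt_ID}; }
sub get_domain { my ($self) = @_; return $self->{domain_location}; }

package main;

# $my_protein holds a reference to the new object, not the object itself.
my $my_protein = Protein->new(
    SwissProt_ID    => 'EXAMPLE_ID',        # placeholder, not a real accession
    name            => 'example protein',
    domain_location => '10-120',
);

# Methods are called through the reference with the arrow operator.
print "ID: ",     $my_protein->return_ID,  "\n";
print "Domain: ", $my_protein->get_domain, "\n";

# Copying the scalar copies the reference, so both names reach one object.
my $same_protein = $my_protein;
$same_protein->{name} = 'renamed protein';
print "Name seen via my_protein: $my_protein->{name}\n";
```

Because $same_protein and $my_protein refer to the same chunk of memory, the rename made through one is visible through the other.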
19
Object Oriented Programming
  • What is O-O Programming?
  • Simple answer: a way to organize code so it
    interacts in certain ways and follows certain
    rules
  • Long answer: to be found in books on O-O
  • Why O-O Programming?
  • Provides well defined framework
  • Promotes certain good practices such as code
    reuse, abstraction, cleaner design, etc.
  • Does have certain trade-offs (e.g. O-O Perl is
    usually slower than procedural Perl)
  • Designing good object classes requires
    forethought and skill

20
To use an object
  • Find out which class you need and learn about the
    class by reading its documentation
  • Make the class available to your program
  • Create a new object of the class
  • Start using the object by modifying its
    attributes and calling its methods
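The four steps above can be tried out with any object-oriented module. The sketch below uses Digest::MD5, a module that ships with Perl, purely because it runs without installing anything; it is not part of Bioperl, and the input strings are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Step 1: read the documentation first, e.g. with: perldoc Digest::MD5

# Step 2: make the class available to the program.
use Digest::MD5;

# Step 3: create a new object of the class.
my $digest = Digest::MD5->new;

# Step 4: use the object by calling its methods.
$digest->add('ATGC');       # feed data in pieces
$digest->add('GGCC');
my $checksum = $digest->hexdigest;

print "Checksum: $checksum\n";
```

The same pattern (use the module, call new, then call methods through the arrow operator) applies to the Bioperl classes introduced next.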

21
Example of using objects
  • Task
  • I have a sequence file in Genbank format that I
    want to convert to EMBL format
  • How many objects do you think we need to
    accomplish the task above?

22
1. Find the Objects you need
  • Objects that we need
  • an object that reads in sequences from a file
  • an object that represents a sequence record
  • an object that writes sequences to a file

[Figure: in memory, a Sequence File Input Object reads the Genbank file into a Sequence Object, and a Sequence File Output Object writes it to the EMBL file]
23
Example of using objects
  • Solution
  • I remember that Bioperl provides this
    functionality. So first I'll take a look at the
    Bioperl documentation
  • Website: http://www.bioperl.org

24
Bioperl Documentation demo
  • Go to the webpage and navigate to SeqIO doc
  • Pay attention to
  • 1) the name of the module
  • 2) Synopsis (code examples)
  • 3) Description
  • 4) list of methods

29
2. Make the object class available
  • In Perl, classes are implemented as
    object-oriented modules
  • To include a class, simply use the module
  • E.g. use Bio::SeqIO;
  • Note: the name of the module is case sensitive
  • By using Bio::SeqIO, my program automatically
    gains access to any modules included in Bio::SeqIO

30
3. Create an object
  • Make up a name for my object reference (e.g.
    $seq_in)
  • Create the object by calling the object class's
    new method
  • every class has a constructor method to create
    an object of that class
  • the constructor method is often called new
  • use the single arrow operator (->) to call methods
  • Assign the object to the object reference
  • You can give the object you are about to create
    some initial attributes (e.g. the file name of my
    sequence record, the format of the record)

    my $seq_in = Bio::SeqIO->new(
        -file   => "myGBrecord",
        -format => "genbank");
31
4. Call the object's methods
  • We've seen the -> (single arrow) operator for
    calling a class method (e.g. new)
  • The same operator is used for calling an object
    method
  • E.g. to ask the $seq_in object to get a sequence
    record from your Genbank sequence file
  • my $seq_record = $seq_in->next_seq();

32
Putting it all together
    #!/usr/bin/perl -w
    use strict;
    use Bio::SeqIO;

    # Create a new Bio::SeqIO object and initialize
    # some attributes (input file and its format)
    my $seq_in = Bio::SeqIO->new(
        -file   => "myGBrecord",
        -format => "genbank");

    # The leading > opens the output file for writing
    my $seq_out = Bio::SeqIO->new(
        -file   => ">myEMBLrec",
        -format => "EMBL");

    # Read one record and write it out in EMBL format
    my $seq_record = $seq_in->next_seq();
    $seq_out->write_seq($seq_record);
33
More Bioperl modules
  • Bio::SeqIO: Sequence Input/Output
  • Retrieve sequence records and write to files
  • Convert sequence records from one format to
    another
  • Bio::Seq: Manipulating sequences
  • Get subsequences ($seq->subseq($start, $end))
  • Find the length of the object ($seq->length)
  • Reverse complement a DNA sequence
  • Translate a DNA sequence, etc.
  • Bio::Annotation: Annotate a sequence
  • Assign journal references to a sequence, etc.
  • Bio::Annotation is associated with an entire
    sequence record and not just part of a sequence
    (see also Bio::SeqFeature)
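To make the sequence manipulations above concrete, here is a plain-Perl sketch of what extracting a subsequence and reverse complementing compute. It does not use Bioperl: my_subseq and my_revcom are hypothetical helper names, the example sequence is made up, and the 1-based inclusive coordinates are chosen to mirror how subseq is normally called in Bioperl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: subsequence from $start to $end,
# using 1-based inclusive coordinates.
sub my_subseq {
    my ($seq, $start, $end) = @_;
    return substr($seq, $start - 1, $end - $start + 1);
}

# Hypothetical helper: reverse the string, then swap each base
# for its complement.
sub my_revcom {
    my ($seq) = @_;
    my $rc = reverse $seq;          # scalar context reverses the string
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

my $dna = 'ATGGCCTAA';

print "Length: ", length($dna), "\n";
print "Bases 1-3: ", my_subseq($dna, 1, 3), "\n";
print "Reverse complement: ", my_revcom($dna), "\n";
```

With Bioperl installed, the equivalent operations are methods called on a Bio::Seq object, so none of these helpers would need to be written by hand.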

34
Some more Bioperl modules
  • Bio::SeqFeature: Associate feature annotation
    with a sequence
  • features describe specific locations in the
    sequence
  • E.g. 5' UTR, 3' UTR, CDS, SNP, etc.
  • Using this object, you can add feature
    annotations to your sequences
  • When you parse a Genbank file using Bioperl, the
    features of a record are stored as SeqFeature
    objects
  • Bio::DB::GenBank, GenPept, EMBL and SwissProt:
    Remote Database Access
  • You can retrieve a sequence from remote databases
    (through the Internet) using these objects

35
Even more Bioperl modules
  • Bio::SearchIO: Parse sequence database search
    reports
  • Parse BLAST reports (make custom reports)
  • Parse HMMer, FASTA, SIM4, WABA, etc.
  • Custom reports can be output to various formats
    (HTML, Table, etc.)
  • Bio::Tools::Run::StandAloneBlast: Run standalone
    BLAST through Perl
  • By combining this and SearchIO, you can automate
    and customize BLAST searches
  • Bio::Graphics: Draw biological entities (e.g. a
    gene, an exon, BLAST alignments, etc.)

36
Bioperl Summary
  • For Online documentation
  • For this workshop:
    http://doc.bioperl.org/releases/bioperl-1.4/
  • Tutorial:
    http://www.bioperl.org/wiki/HOWTO:Beginners
  • HOWTOs: http://www.bioperl.org/wiki/HOWTOs
  • Modules:
    http://www.bioperl.org/wiki/Category:Core_Modules
  • Literature
  • Stajich et al., The Bioperl toolkit: Perl modules
    for the life sciences. Genome Res. 2002
    Oct;12(10):1611-8. PMID: 12368254
  • Bioperl mailing list: bioperl-l_at_bioperl.org
  • Best way to get help using Bioperl
  • Very active list (upwards of 10 messages a day)
  • Use with caution: things change fast and without
    warning (unless you are on the mailing list)

37
Interactive demo on Bioperl
  • Open your laptop!
  • Open a terminal window
  • Type cd /perl_two
  • Type gedit ./bioperl_demo.pl
  • Let's go over the example together

38
Summary for Session 2
  • Perl is a popular language in bioinformatics
    because
  • it handles text well
  • it has a great user base and support (e.g. Bioperl)
  • Bioperl is a large collection of object oriented
    perl modules for many biological data analyses
  • an object is a collection of attributes and
    methods
  • You have to access an object through its
    reference
  • a reference is a name

39
Perl Documents
  • In-line documentation
  • POD: Plain Old Documentation
  • Read POD by typing perldoc <module name>
  • E.g. perldoc perl, perldoc Bio::SeqIO
  • On-line documentation
  • http://www.cpan.org
  • http://www.perl.com
  • http://www.bioperl.org
  • Books
  • Learning Perl (the best way to learn Perl if you
    know a bit about programming already)
  • Beginning Perl for Bioinformatics (example based
    way to learn Perl for Bioinformatics)
  • Programming Perl (THE Perl reference book; not
    for the faint of heart)

40
Additional Book References
  • Perl Cookbook, 2nd edition (quick solutions to 80%
    of what you want to do)
  • Learning Perl Objects, References & Modules (for
    people who want to learn objects, references and
    modules in Perl)
  • Perl in a Nutshell (an okay quick reference)
  • Perl CD Bookshelf, Version 4.0 (electronic
    version of the above books; best value,
    searchable, and kills fewer trees)
  • Mastering Perl for Bioinformatics (more example
    based learning)
  • CGI Programming with Perl (rather outdated
    treatment of the subject; not really
    recommended)
  • Perl Graphics Programming (if you want to
    generate graphics using Perl; side note: Perl is
    probably not the best tool for generating
    graphics)

41
Introduction to the Assignment Part A
  • Goals
  • To convert passive knowledge to active skills
  • To write some simple perl programs by yourself
  • Consists of 2 modules
  • Write a program to convert the temperature from F
    to C
  • Write a program to count the frequencies of bases
    in a sequence (sequence MAN1.fasta can be
    downloaded from Day6 wiki)

42
Introduction to the Assignment Part B
  • Goals
  • To see the power of Perl in bioinformatics
  • To see how some common bioinformatics tasks are
    done using Perl
  • Consists of 3 modules
  • Download E. coli O157:H7 proteins in FASTA format
  • Use a regular expression to find a protein motif
  • Run BLAST on all proteins in the proteome (>5000
    BLAST runs)

43
Introduction to the Assignment Part B
  • Most of the code is given to you; you just have
    to modify it (in total, no more than 15 lines
    of new code!!)
  • You are not expected to know everything in the
    scripts. It takes time to learn a new language
  • TAs and your CS teammates will help you; don't
    wait until the last minute to ask for help
  • Remember, you still have to hand in your own
    version of the assignment! No copying!

44
Acknowledgements
  • Thanks to Sohrab Shah and Sanja Rojic (CS, UBC)
    for a wonderful collaborative work on the
    lecture/lab material
  • Some ideas in this lecture are borrowed from
    Lincoln Stein's workshop
    (http://stein.cshl.org/genome_informatics/)