BioPerl

Slides: 117

Provided by: HAIR2

1
BioPerl

An Introduction to Perl by Seung-Yeop Lee
XS extension by Sen Zhang
BioPerl Introduction by Hairong Zhao
BioPerl Script Examples by Tiequan Zhang

2
Part I. An Introduction to Perl

by Seung-Yeop Lee

3
What is Perl?

Perl is an interpreted programming language that
resembles both a real programming language and a
shell.
A language for easily manipulating text, files,
and processes.
Provides a more concise and readable way to do jobs
formerly accomplished using C or shell scripts.
Perl stands for Practical Extraction and Report
Language.
Author: Larry Wall (first released in 1987)

4
Why use Perl?

Easy to use
Basic syntax is C-like
Type-friendly (no need for explicit casting)
Lazy memory management
A small amount of code goes a long way
Fast
Perl has numerous built-in optimization features
that help it run faster than many other scripting
languages.
Portability
One script version runs everywhere (unmodified).

5
Why use Perl?

Efficiency
For programs that perform the same task (C and
Perl), even a skilled C programmer would have to
work harder to write code that
Runs as fast as Perl code
Is represented by fewer lines of code
Correctness
Perl fully parses and pre-compiles a script
before execution.
This efficiently eliminates the potential for runtime
SYNTAX errors.
Free to use
Comes with source code

6
Hello, world!

#!/usr/local/bin/perl
print "Hello, world\n";

7
Basic Program Flow

No main function
Statements executed from start to end of file.
Execution continues until
End of file is reached.
exit(int) is called.
Fatal error occurs.

8
Variables

Data of any type may be stored within three basic
types of variables
Scalar
List
Associative array (hash table)
Variables are always preceded by a dereferencing
symbol (sigil).
$ - Scalar variables
@ - List variables
% - Associative array variables

9
Variables

Notice that we did NOT have to
Declare the variable before using it
Define the variable's data type
Allocate memory for new data values
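
The points above can be illustrated with a minimal sketch. The variable names are only illustrative, and `use strict` is left out on purpose so that assignment alone creates each variable, as the slide describes; modern Perl code would normally add `use strict` and declare variables with `my`.

```perl
use warnings;

# No declaration, no type, no allocation: assignment creates the variable.
$x = 1;                 # a scalar holding a number
$x = "Hello World!";    # the same scalar now holds a string

@count = (1, 2, 3);     # a list of three scalars
push @count, 4;         # the list grows on demand

%fred = (a => "aaa");   # an associative array with one key
$fred{b} = "bbb";       # adding a key needs no resizing step

print "x = $x\n";
print "count has ", scalar(@count), " elements\n";
print "fred has keys: ", join(", ", sort keys %fred), "\n";
```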

10
Scalar variables

References to variables always begin with $ in
both assignments and accesses
For scalars
$x = 1;
$x = "Hello World!";
$x = $y;
For scalar arrays
$a[1] = 0;
$a[1] = $b[1];

11
List variables

Lists are prefaced by an @ symbol
@count = (1, 2, 3, 4, 5);
@count = ("apple", "bat", "cat");
@count2 = @count;
A list is simply an array of scalar values.
Integer indexes can be used to reference elements
of a list.
To print an element of an array, do
print $count[2];

12
Associative Array variables

Associative array variables are denoted by the %
dereferencing symbol.
Associative array variables are simply hash
tables containing scalar values
Example
$fred{"a"} = "aaa";
$fred{"b"} = "bbb";
$fred{6} = "cc";
$fred{1} = 2;
To do this in one step
%fred = ("a", "aaa", "b", "bbb", 6, "cc", 1, 2);

13
Statements Input/Output

Statements
Contains all the usual if, for, while, and more
Input/Output
Any variable not starting with $, @ or % is
assumed to be a filehandle.
There are several predefined filehandles,
including STDIN, STDOUT and STDERR.
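
As a rough sketch of how filehandles are used, the example below opens a bareword filehandle (the name IN and the helper read_lines are hypothetical) and prints through the predefined STDOUT and STDERR handles. To stay self-contained it reads from an in-memory string rather than a file on disk.

```perl
use strict;
use warnings;

# Read every line from a string through a bareword filehandle named IN.
# An in-memory handle stands in for a real file so the sketch is self-contained.
sub read_lines {
    my ($text) = @_;
    open(IN, '<', \$text) or die "cannot open in-memory handle: $!";
    my @lines;
    while (my $line = <IN>) {   # <IN> reads one line per iteration
        chomp $line;            # strip the trailing newline
        push @lines, $line;
    }
    close(IN);
    return @lines;
}

my @lines = read_lines("first line\nsecond line\n");

# STDOUT and STDERR are predefined filehandles.
print STDOUT "read ", scalar(@lines), " lines\n";
print STDERR "last line was: $lines[-1]\n";
```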

14
Subroutines

We can reuse a segment of Perl code by placing it
within a subroutine.
The subroutine is defined using the sub keyword
and a name.
The subroutine body is defined by placing code
statements within the code block symbols { }.
sub MySubroutine {
    # Perl code goes here.
}

15
Subroutine call

To call a subroutine, prepend the name with the &
symbol
&MySubroutine;
Subroutines may be recursive (call themselves).
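
A small sketch of a recursive subroutine, using a factorial function chosen purely for illustration. It shows both the & call style from this slide and the plain call style.

```perl
use strict;
use warnings;

# factorial(n) = n * factorial(n - 1), with factorial(0) = factorial(1) = 1.
sub factorial {
    my ($n) = @_;            # arguments arrive in the special array @_
    return 1 if $n <= 1;     # base case stops the recursion
    return $n * factorial($n - 1);
}

# Both call styles work; the & form is the one shown above.
print "5! = ", &factorial(5), "\n";
print "6! = ", factorial(6), "\n";
```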

16
Pattern Matching

Perl lets you compare a regular expression
pattern against a target string to test for a
possible match.
The outcome of the test is a boolean result (TRUE
or FALSE).
The basic syntax of a pattern match is
$myScalar =~ /PATTERN/
Does $myScalar contain PATTERN?
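
A brief sketch of pattern matching in practice. The helper is_dna and the sample strings are assumptions made for the example; the identifier in the second match follows the ID/start-end style that appears in the alignment file later in this presentation.

```perl
use strict;
use warnings;

# Returns 1 if the string consists only of the letters A, C, G and T.
sub is_dna {
    my ($s) = @_;
    return $s =~ /^[ACGT]+$/i ? 1 : 0;
}

my $myScalar = "GATTACA";

# A match in a condition yields a true or false result.
if ($myScalar =~ /TTA/) {
    print "$myScalar contains TTA\n";
}

# Parentheses capture the matched text into $1, $2, ...
if ("1433_LYCES/9-246" =~ m{^(\w+)/(\d+)-(\d+)$}) {
    print "id=$1 start=$2 end=$3\n";
}

print "is_dna: ", is_dna($myScalar), "\n";
```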

17
Functions

Perl provides a rich set of built-in functions to
help you perform common tasks.
Several categories of useful built-in functions
include
Arithmetic functions (sqrt, sin, ...)
List functions (push, chop, ...)
String functions (length, substr, ...)
Existence functions (defined, undef)

18
Perl 5

Introduces new features
A new data type: the reference
A new localization keyword: my
Tools to allow object-oriented programming in
Perl
New shortcuts like qw and =>
An object-oriented library system focused
around modules

19
References

A reference is a scalar value which points to
any variable.

20
Creating References

References to variables are created by using the
backslash (\) operator.

$name = 'bio perl';
$reference = \$name;
$array_reference = \@array_name;
$hash_reference = \%hash_name;
$subroutine_ref = \&sub_name;

21
Dereferencing a Reference

Use an extra $ and @ for scalars and arrays, and
-> for hashes.

print "$$scalar_reference\n";
print "@$array_reference\n";
print "$hash_reference->{name}\n";
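
The following sketch puts creating and dereferencing references together in one runnable script. All variable names and the helpers hello and append_item are hypothetical, chosen only for the example.

```perl
use strict;
use warnings;

my $name  = "bio perl";
my @array = (1, 2, 3);
my %hash  = (name => "Seq1", length => 242);

sub hello { return "hello, $_[0]" }

# A reference points at the original data, so changes made through it
# are visible to the caller.
sub append_item {
    my ($list_ref, $item) = @_;
    push @$list_ref, $item;
    return scalar @$list_ref;
}

# Create references with the backslash operator.
my $scalar_ref = \$name;
my $array_ref  = \@array;
my $hash_ref   = \%hash;
my $code_ref   = \&hello;

# Dereference: extra sigil for whole values, -> for elements and calls.
print "$$scalar_ref\n";            # the scalar itself
print "@$array_ref\n";             # the whole array
print "$array_ref->[1]\n";         # one array element
print "$hash_ref->{name}\n";       # one hash value
print $code_ref->("world"), "\n";  # call through a code reference

append_item($array_ref, 4);
print scalar(@array), "\n";        # the original array has grown
```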

22
Variable Localization

The local keyword is used to limit the scope of a
variable to within its enclosing braces.
The variable is visible not only from within the enclosing
braces but also in all subroutines called within those
braces.

$a = 1;
sub mySub {
    local $a = 2;
    &mySub1($a);
}
sub mySub1 {
    print "a is $a\n";
}

23
Variable Localization contd

The my keyword hides the variable from the outside
world completely.
Totally hidden

$a = 1;
sub mySub {
    my $a = 2;
    &mySub1($a);
}
sub mySub1 {
    print "a is $a\n";
}

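
The difference between local and my can be seen side by side in a short sketch. The names $x, show, with_local and with_my are hypothetical; the called subroutine reads the package variable directly so that the effect of each keyword is visible.

```perl
use strict;
use warnings;

our $x = 1;                 # package variable, visible everywhere

sub show { return "x is $x" }

sub with_local {
    local $x = 2;           # temporarily replaces the package variable's value
    return show();          # show() sees the temporary value
}

sub with_my {
    my $x = 2;              # a new, private variable with the same name
    return show();          # show() still sees the package variable
}

print with_local(), "\n";
print with_my(), "\n";
print show(), "\n";         # the value set by local has been restored
```

Here with_local() reports 2 because called subroutines see the localized value, while with_my() reports 1 because the my variable is invisible outside its own block.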
24
Object Oriented Programming in Perl (1)

Defining a class
A class is simply a package with subroutines that
function as methods.

#!/usr/local/bin/perl
package Cat;
sub new { ... }
sub meow { ... }

25
Object Oriented Programming in Perl (2)

Perl Object
To instantiate an object from a class, call the
class's new method.

$new_object = new ClassName;

Using Methods
To use the methods of an object, use the ->
operator.

$cat->meow();

26
Object Oriented Programming in Perl (3)

Inheritance
Declare a class array called @ISA.
This array stores the name(s) of the parent class(es) of
the new class.

package NorthAmericanCat;
@NorthAmericanCat::ISA = ("Cat");
sub new { ... }

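
A complete, runnable version of the Cat example might look like the sketch below. The method bodies, the name attribute and the overridden meow are assumptions added for illustration; the slides only show the class and method names. The sketch uses the Cat->new call form, which is equivalent to the new Cat form shown earlier.

```perl
use strict;
use warnings;

package Cat;

# Constructor: bless a hash reference into the class.
sub new {
    my ($class, %args) = @_;
    my $self = { name => $args{name} || 'cat' };
    return bless $self, $class;
}

sub name { return $_[0]->{name} }
sub meow { return $_[0]->name . " says meow" }

package NorthAmericanCat;

# Inherit from Cat by listing it in @ISA.
our @ISA = ('Cat');

# Override one method; new() and name() are inherited from Cat.
sub meow {
    my ($self) = @_;
    return $self->SUPER::meow() . ", eh";
}

package main;

my $cat = Cat->new(name => 'Tom');
my $nac = NorthAmericanCat->new(name => 'Felix');

print $cat->meow(), "\n";
print $nac->meow(), "\n";
print "a NorthAmericanCat is a Cat\n" if $nac->isa('Cat');
```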
27
Miscellaneous Constructs

qw
The qw keyword is used to bypass the quote and
comma characters in list array definitions.

@name = ("Tom", "Mary", "Michael");
@name = qw(Tom Mary Michael);   # same list, written with qw

28
Miscellaneous Constructs

=>
The => operator is used to make hash definitions
more readable.

%client = ( "name"  => "Michael",
            "phone" => "123-3456",
            "email" => 'mich@nj.net' );

29
Perl Modules

A Perl module is a reusable package defined in a
library file whose name is the same as the name
of the package.
Similar to a C link library or a C++ class

package Foo;
sub bar  { print "Hello $_[0]\n" }
sub blat { print "World $_[0]\n" }
1;

30
Names

Each Perl module has a unique name.
To minimize name space collision, Perl provides a
hierarchical name space for modules.
Components of a module name are separated by
double colons (::).
For example,
Math::Complex
Math::Approx
String::BitCount
String::Approx

31
Module files

Each module is contained in a single file.
Module files are stored in a subdirectory
hierarchy that parallels the module name
hierarchy.
All module files have an extension of .pm.
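
The mapping from module name to file path can be sketched in a few lines. The helper module_to_path is hypothetical; the %INC hash used at the end is Perl's own record of which file each loaded module came from.

```perl
use strict;
use warnings;

# Convert a module name to the relative path Perl looks for under each
# library directory: "::" becomes "/" and ".pm" is appended.
sub module_to_path {
    my ($module) = @_;
    (my $path = $module) =~ s{::}{/}g;
    return "$path.pm";
}

print module_to_path('Math::Complex'), "\n";
print module_to_path('String::Approx'), "\n";

# %INC maps the relative path of each loaded module to the file it came from.
require File::Spec;
print "File::Spec was loaded from $INC{'File/Spec.pm'}\n";
```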

32
Module libraries

The Perl interpreter has a list of directories in
which it searches for modules.
Global array @INC

> perl -V
@INC:
/usr/local/lib/perl5/5.00503/sun4-solaris
/usr/local/lib/perl5/5.00503
/usr/local/lib/perl5/site-perl/5.005/sun4-solaris
/usr/local/lib/perl5/site-perl/5.005

33
Creating Modules

To create a new Perl module
../development> h2xs -X -n Foo::Bar
Writing Foo/Bar/Bar.pm
Writing Foo/Bar/Makefile.PL
Writing Foo/Bar/test.pl
Writing Foo/Bar/Changes
Writing Foo/Bar/MANIFEST
../development>

34
Building Modules

To build a Perl module
perl Makefile.PL
make
make test
make install

35
Using Modules

A module can be loaded by calling the use
function.
use Foo;
Foo::bar( "a" );
Foo::blat( "b" );
use calls the eval function to process the module's code.
The 1; at the end of the module causes eval to evaluate to TRUE.

36
End of Part I.

Thank You

37
Part II. XS (eXternal Subroutine) extension

Sen Zhang

38
XS

XS is an acronym for eXternal Subroutine.
With XS, we can call C subroutines directly from
Perl code, as if they were Perl subroutines.

39
Perl is not good at

very CPU-intensive things, like numerical
integration.
very memory-intensive things. Perl programs that
create more than 10,000 hashes run slowly.
system software, like device drivers.
things that have already been written in other
languages.

40
Usually

These things are usually done in highly efficient
system programming languages such as C/C++.

41
Can we call a C subroutine from Perl?

The solution is the Perl C API

42
When Perl talks with a C subroutine using the Perl C
API

Two things must happen
control flow - control must pass from Perl to C
(and back)
C program execution
Perl program execution
data flow - data must pass from Perl to C (and
back)
C data representation
Perl data representation

43
In order to use the Perl C API

You need to understand:
What Perl's internal data structures are.
How the Perl stack works, and how a C subroutine
gets access to it.
How C subroutines get linked into the Perl
executable.
The data paths through the DynaLoader
module that associate the name of a Perl
subroutine with the entry point of a C subroutine.

44
If you code directly to the Perl C API

You will find that you keep writing the same little
bits of code
to move parameters on and off the Perl stack
to convert data from Perl's internal
representation to C variables
to check for null pointers and other Bad Things.
When you make a mistake, you don't get bad
output; you crash the interpreter.
It is difficult, error-prone, tedious, and
repetitive.

45
Pain killer is

46
What is XS?

Narrowly, XS is the name of the glue language
More broadly, XS comprises a system of programs
and facilities that work together
MakeMaker,
Xsub glue routine,
XS language itself,
xsubpp,
h2xs,
DynaLoader.

47
MakeMaker -tool

Perl's MakeMaker facility can be used to provide
a Makefile to easily install your Perl modules
and scripts.

50
Xsub

Perl calls this kind of glue routine
an xsub.
Rather than drag the Perl C API into all our C
code, we usually write glue routines. (We'll
refer to an existing C subroutine as a target
routine.)

51
Xsub - control flow

The glue routine converts the Perl parameters to
C data values, and then calls the target routine,
passing it the C data values as parameters on the
processor stack.
When the target routine returns, the glue routine
creates a Perl data object to represent its
return value, and pushes a pointer to that object
onto the Perl stack. Finally, the glue routine
returns control to the Perl interpreter.

52
Xsub - data flow

Something has to convert between Perl and C data
representations.
The Perl interpreter doesn't, so the xsub has to.
Typically, the xsub uses facilities in the Perl C
API to get parameters from the Perl stack and
convert them to C data values.
To return a value, the xsub creates a Perl data
object and leaves a pointer to it on the Perl
stack.

55
XS - language

Glue routines provide some structure for the data
flow and control flow, but they are still hard to
write. So we don't.
Instead, we write XS code. XS is, more or less, a
macro language. It allows us to declare target
routines, and specify the correspondence between
Perl and C data types.
XS is a collection of macros. While the Perl docs
refer to XS as a language, it is really a macro
language.

58
xsubpp - tool

xsubpp is the XS language processor: the
program that translates XS code to C code.
xsubpp compiles XS code into C code by
embedding the constructs necessary to let C
functions manipulate Perl values, and creates the
glue necessary to let Perl access those
functions.
xsubpp expands XS macros into the bits of C
code (xsub glue routines) necessary to connect the
Perl interpreter to your C-language subroutines.
We write XS code so that xsubpp will do the right
thing.

61
h2xs - tool

h2xs was originally written to generate XS
interfaces for existing C libraries.
h2xs is a utility that reads a .h file and
generates an outline for an XS interface to the C
code.

64
DynaLoader - module

In order for a C subroutine to become an xsub,
three things must happen
Loading: the subroutine has to be loaded into
memory
Linking: the Perl interpreter has to find its
entry point
Installation: the interpreter has to set the xsub
pointer in a code reference to the entry point of
the subroutine

65
DynaLoader.

Fortunately, all this is done for us by a Perl
module called the DynaLoader.
When we write an XS module, our module inherits
from DynaLoader.
When our module loads, it makes a single call to
the DynaLoader::bootstrap method. bootstrap
locates our link libraries, loads them, finds our
entry points, and makes the appropriate calls.

66
[Diagram: XS workflow. Development time: pure Perl code; .c and .h files; h2xs; XS code (with some manual changes); xsubpp; xsub (glue subroutine); compiler and linker; library; Perl module. Run time: input; Perl interpreter; DynaLoader; Perl C API; output.]

67
An Example - Needleman-Wunsch (NW)

Sequence alignment is an important problem in the
bleeding-edge field of genomics.
Sequence alignment is a combinatorial problem,
and naive algorithms run in exponential time. The
Needleman-Wunsch algorithm runs in (more or less)
O(n^3).
It is a dynamic programming algorithm for globally optimal
sequence alignment.

68
Algorithm
69
Score matrix
70
Complexity analysis

The O(n^3) step in the NW algorithm is filling in
the score matrix; everything else runs in linear
time. We want to
use the C implementation to fill in the score
matrix,
use the Perl implementation for everything else,
and use XS to call from one to the other.
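
To make the score-matrix step concrete, here is a minimal pure-Perl sketch of the matrix fill. It is not the implementation benchmarked in this presentation: it assumes the simplified variant with a constant per-position gap penalty, which fills the matrix in time proportional to the product of the two sequence lengths rather than O(n^3), and the function name nw_score and the default scores (+1 match, -1 mismatch, -1 gap) are assumptions.

```perl
use strict;
use warnings;

# Fill the score matrix for sequences $s and $t and return the optimal
# global alignment score, which ends up in the bottom-right cell.
# The default scores (+1 match, -1 mismatch, -1 gap) are assumptions.
sub nw_score {
    my ($s, $t, %opt) = @_;
    my $match    = defined $opt{match}    ? $opt{match}    :  1;
    my $mismatch = defined $opt{mismatch} ? $opt{mismatch} : -1;
    my $gap      = defined $opt{gap}      ? $opt{gap}      : -1;

    my @a = split //, $s;
    my @b = split //, $t;
    my ($rows, $cols) = (scalar @a, scalar @b);
    my @m;   # $m[$i][$j]: best score for the first $i of @a vs the first $j of @b

    # Border cells: a prefix aligned against nothing consists only of gaps.
    $m[$_][0] = $_ * $gap for 0 .. $rows;
    $m[0][$_] = $_ * $gap for 0 .. $cols;

    for my $i (1 .. $rows) {
        for my $j (1 .. $cols) {
            my $diag = $m[$i-1][$j-1]
                     + ($a[$i-1] eq $b[$j-1] ? $match : $mismatch);
            my $up   = $m[$i-1][$j] + $gap;   # residue of $s against a gap
            my $left = $m[$i][$j-1] + $gap;   # residue of $t against a gap
            my $best = $diag;
            $best = $up   if $up   > $best;
            $best = $left if $left > $best;
            $m[$i][$j] = $best;
        }
    }
    return $m[$rows][$cols];
}

print nw_score("GATTACA", "GCATGCU"), "\n";
```

The nested loop is the part that dominates the running time, which is why it is the natural candidate for reimplementation in C behind an XS interface.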

71
Our approach

Implement the algorithm as a straight Perl module
Analyze (or benchmark) the code for performance
Reimplement performance-critical methods (score
matrix filling) in C
Write XS to connect the C routines to the Perl
module

72
Performance comparison

A straight Perl implementation of the NW
algorithm aligns two 200-character sequences in
300 seconds.
The XS version runs the same 200x200 alignment
benchmark in 3 seconds.
The XS version is about 100 times faster than the
Perl implementation.

73
Bio::Tools::pSW - pairwise Smith-Waterman object

The Bioperl project has a pSW implementation.
pSW is an alignment factory. It builds pairwise
alignments using the Smith-Waterman algorithm.
The alignment algorithm is implemented in C and
added in using an XS extension.
The Smith-Waterman algorithm needs O(n^2) time to
find the highest scoring cell in the matrix.

74
The end of Part II

Thanks

75
Bioperl Introduction

Hairong Zhao

76
What's Bioperl?

Bioperl is not a new language
It is a collection of Perl modules that
facilitate the development of Perl scripts for
bio-informatics applications.

78
Why Bioperl for Bio-informatics?

Perl is good at file manipulation and text
processing, which make up a large part of the
routine tasks in bio-informatics.
Perl language, documentation and many Perl
packages are freely available.
Perl is easy to get started in, to write small
and medium-sized programs.

79
Bioperl Project

It is an international association of developers
of open source Perl tools for bioinformatics,
genomics and life science research
Started in 1995 by a group of scientists tired of
rewriting BLAST and sequence parsers for various
formats
Now there are 45 registered developers, 10-15
main developers, 5 core coordinating developers
Project website: http://bioperl.org
Project FTP server: bioperl.org

80
How many people use Bioperl?

Bioperl has been used worldwide since 1998, in
settings ranging from small academic labs to
enterprise-level computing in large pharmaceutical
companies
Bioperl Usage Survey
http://www.bioperl.org/survey.html

81
The current status of Bioperl

The latest mature and stable version, 1.0, was
released in March 2002.
This new version contains 832 files. The test
suite contains 93 scripts which collectively
perform 3042 functionality tests.
This new version is "feature complete" for
sequence handling, the most common task in
bioinformatics. It adds some new features and
improves some existing features.

82
The future of Bioperl

It is far from mature
Except for sequence handling, the modules are
not complete.
Portability is not very good: not all modules
will work on all platforms.

83
Bioperl resources

www.bioperl.org
http://www.bioperl.org/Core/bptutorial.html
Example code, in the scripts/ and examples/
directories.
Online course written at the Pasteur Institute.
See http://www.pasteur.fr/recherche/unites/sis/formation/bioperl.

84
Biopython, biojava

Similar goals implemented in different languages
Most effort to date has been to port Bioperl
functionality to Biopython and Biojava, so the
differences are fairly peripheral
In the future, some bio-informatics tasks may
prove to be more effectively implemented in Java
or Python, so interoperability between them is
necessary
CORBA is one such framework for interlanguage
support, and the Biocorba project is currently
implementing a CORBA interface for Bioperl

85
Bioperl-Object Oriented

Bioperl takes advantage of OO design to
create a consistent, well-documented object
model for interacting with biological data in the
life sciences.
Bioperl name space
The Bioperl package installs everything in the
Bio:: namespace.

86
Bioperl Objects

Sequence handling objects
Sequence objects
Alignment objects
Location objects
Other Objects
3D structure objects, tree objects and
phylogenetic trees, map objects, bibliographic
objects and graphics objects

87
Sequence handling

Typical sequence handling tasks
Access the sequence
Format the sequence
Sequence alignment and comparison
Search for similar sequences
Pairwise comparisons
Multiple alignment

88
Sequence Objects

Sequence objects Seq, RichSeq, SeqWithQuality,
PrimarySeq, LocatableSeq, LiveSeq, LargeSeq, SeqI
Seq is the central sequence object in bioperl,
you can use it to describe a DNA, RNA or protein
sequence.
Most common sequence manipulations can be
performed with Seq.

89
Sequence Annotation

Bio::SeqFeature: A Sequence object can have
multiple sequence feature (SeqFeature) objects -
e.g. Gene, Exon, Promoter objects - associated with
it.
Bio::Annotation: A Seq object can also have an
Annotation object (used to store database links,
literature references and comments) associated
with it

90
Sequence Input/Output

The Bio::SeqIO system was designed to make
getting and storing sequences to and from the
myriad of formats as easy as possible.

91
Diagram of Objects and Interfaces for Sequence
Analysis
92
Accessing sequence data

Bioperl supports accessing remote databases as
well as local databases.
Bioperl currently supports sequence data
retrieval from the genbank, genpept, RefSeq,
swissprot, and EMBL databases

93
Format the sequences

A SeqIO object can read a stream of sequences in
one format (Fasta, EMBL, GenBank, Swissprot, PIR,
GCG, SCF, phd/phred, Ace, or raw (plain
sequence)), then write to another file in another
format
use Bio::SeqIO;
$in  = Bio::SeqIO->new('-file'   => "inputfilename",
                       '-format' => 'Fasta');
$out = Bio::SeqIO->new('-file'   => ">outputfilename",
                       '-format' => 'EMBL');
while ( my $seq = $in->next_seq() ) {
    $out->write_seq($seq);
}

94
Manipulating sequence data

$seqobj->display_id();   # the human-readable id of the sequence
$seqobj->subseq(5,10);   # part of the sequence as a string
$seqobj->desc();         # a description of the sequence
$seqobj->trunc(5,10);    # truncation from 5 to 10 as new object
$seqobj->revcom;         # reverse complements sequence
$seqobj->translate;      # translation of the sequence
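
To show what two of these operations compute, here is a plain-Perl sketch that works on ordinary strings and does not use Bioperl at all. The function names mirror the method names only for readability; the sketch assumes 1-based, inclusive coordinates for subseq and handles only the four unambiguous DNA bases.

```perl
use strict;
use warnings;

# subseq: positions are 1-based and inclusive, so (5, 10) returns
# the 5th through 10th characters.
sub subseq {
    my ($seq, $start, $end) = @_;
    return substr($seq, $start - 1, $end - $start + 1);
}

# revcom: reverse the string, then swap each base for its complement.
sub revcom {
    my ($seq) = @_;
    my $rc = scalar reverse $seq;
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

my $dna = "ACGTACGTAC";
print subseq($dna, 5, 10), "\n";
print revcom("AAGC"), "\n";
```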

95
Alignment

Searching for "similar" sequences: Bioperl can
run BLAST locally or remotely, and then parse the
result.
Aligning 2 sequences with Smith-Waterman (pSW) or
BLAST
The SW algorithm itself is implemented in C and
incorporated into Bioperl using an XS extension.
Aligning multiple sequences (Clustalw.pm,
TCoffee.pm)
Bioperl offers a Perl interface to the
bioinformatics-standard clustalw and tcoffee
programs.
Bioperl does not currently provide a Perl
interface for running HMMER. However, Bioperl
does provide a HMMER report parser.

96
Alignment Objects

Early versions used UnivAln and SimpleAlign
Ver. 1.0 only supports SimpleAlign. It allows the
user to
convert between alignment formats
extract specific regions of the alignment
generate consensus sequences.

98
Location Objects

Bio::Locations: a collection of rather
complicated objects
A Location object is designed to be associated
with a Sequence Feature object to indicate where
on a larger structure (eg a chromosome or contig)
the feature can be found.

99
Conclusion

Bioperl is
Powerful
Easy
Waiting for you (biologist) to use

100
Script Examples Using Bioperl

Tiequan Zhang

101
SimpleAlign module

Description
It handles multiple alignments of sequences
Lightweight display/formatting and minimal
manipulation

102
Method

new
Usage: my $aln = new Bio::SimpleAlign();
Function: Creates a new simple align object
Returns: Bio::SimpleAlign
Args: -source => string representing the
source program where this alignment came from
each_seq
Usage: foreach $seq ( $align->each_seq() )
Function: Gets an array of Seq objects from the
alignment
Returns: an array
length()
Usage: $len = $ali->length()
Function: Returns the maximum length of the
alignment. To be sure the alignment is a block,
use is_flush

103

consensus_string
Usage: $str = $ali->consensus_string($threshold_percent)
Function: Makes a strict consensus
Args: Optional threshold ranging from 0 to
100.
The consensus residue has to appear in at least
threshold % of the sequences at a given
location, otherwise a '?' character will be
placed at that location. (Default value = 0%)
is_flush
Usage: if( $ali->is_flush() )
Function: Tells you whether the alignment is
flush, i.e. all of the same length
Returns: 1 or 0
percentage_identity
Usage: $id = $align->percentage_identity
Function: The function calculates the average
percentage identity
Returns: The average percentage identity
no_sequences
Usage: $depth = $ali->no_sequences
Function: number of sequences in the sequence
alignment
Returns: integer
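
The threshold rule described for consensus_string can be sketched in plain Perl without Bioperl. This is an illustration of the idea, not the Bio::SimpleAlign implementation: the function names consensus and is_flush are reused only for readability, the input is a list of equal-length strings, and the alphabetical tie-breaking between equally frequent residues is an assumption.

```perl
use strict;
use warnings;

# Build a consensus for equal-length aligned sequences. For each column the
# most frequent character is kept if it appears in at least $threshold
# percent of the sequences; otherwise '?' is emitted.
sub consensus {
    my ($threshold, @seqs) = @_;
    my $len = length $seqs[0];
    my $out = '';
    for my $col (0 .. $len - 1) {
        my %count;
        $count{ substr($_, $col, 1) }++ for @seqs;
        # Sort by descending count, then alphabetically so ties are stable.
        my ($best) = sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;
        my $percent = 100 * $count{$best} / scalar(@seqs);
        $out .= $percent >= $threshold ? $best : '?';
    }
    return $out;
}

# is_flush: 1 when every sequence has the same length, 0 otherwise.
sub is_flush {
    my @seqs = @_;
    my %lengths;
    $lengths{ length $_ } = 1 for @seqs;
    return scalar(keys %lengths) == 1 ? 1 : 0;
}

my @aln = ("REENVY", "REENVY", "RDQYVY", "RENFVY");
print consensus(50, @aln), "\n";
print consensus(60, @aln), "\n";
print is_flush(@aln), "\n";
```

Raising the threshold from 50 to 60 turns the two columns where the top residue appears in only half of the sequences into '?' characters.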

104
testaln.pfam

1433_LYCES/9-246   REENVYMAKLADRAESDEEMVEFMEKVSNSLGS.EELTVEERNLLSVAYKNVIGARRAS
1434_LYCES/6-243   REENVYLAKLAEQAERYEEMIEFMEKVAKTADV.EELTVEERNLLSVAYKNVIGARRAS
143R_ARATH/7-245   RDQYVYMAKLAEQAERYEEMVQFMEQLVTGATPAEELTVEERNLLSVAYKNVIGSLRAA
143B_VICFA/7-242   RENFVYIAKLAEQAERYEEMVDSMKNVANLDV...ELTIEERNLLSVGYKNVIGARRAS
143E_HUMAN/4-239   REDLVYQAKLAEQAERYDEMVESMKKVAGMDV...ELTVEERNLLSVAYKNVIGARRAS
BMH1_YEAST/4-240   REDSVYLAKLAEQAERYEEMVENMKTVASSGQ...ELSVEERNLLSVAYKNVIGARRAS
RA24_SCHPO/6-241   REDAVYLAKLAEQAERYEGMVENMKSVASTDQ...ELTVEERNLLSVAYKNVIGARRAS
RA25_SCHPO/5-240   RENSVYLAKLAEQAERYEEMVENMKKVACSND...KLSVEERNLLSVAYKNIIGARRAS
1431_ENTHI/4-239   REDCVYTAKLAEQSERYDEMVQCMKQVAEMEA...ELSIEERNLLSVAYKNVIGAKRAS

105
Script

use Bio::AlignIO;
$str = Bio::AlignIO->new('-file' => 'testaln.pfam');
$aln = $str->next_aln();
print $aln->length, "\n";
print $aln->no_residues, "\n";
print $aln->is_flush, "\n";
print $aln->no_sequences, "\n";
print $aln->percentage_identity, "\n";
print $aln->consensus_string(50), "\n";
$pos = $aln->column_from_residue_number('1433_LYCES', 14);   # = 6
foreach $seq ($aln->each_seq) {
    $res = $seq->subseq($pos, $pos);
    $count{$res}++;
}
foreach $res (keys %count) {
    printf "Res: %s  Count: %2d\n", $res, $count{$res};
}

106
Result

argerich-54 bio> perl align.pl
242
103
1
16
66.9052451661147
RE??VY?AKLAEQAERYEEMV??MK?VAE?
?????ELSVEERNLLSVAYKNVIGARRASWRIISSIEQKEE??G?N????
?LIKEYR?KIE?EL??IC?DVL?LLD??LIP?A?????ESKVFYLKMKGD
YYRYLAEFA?G??RKE?AD?SL?AYK?A?DIA?AEL?PTHPIRLGLALNF
SVFYYEILNSPD?AC?LAKQAFDEAIAELDTL?EESYKDSTLIMQLLRDN
LTLWTSD?????
Res: Q  Count:  5
Res: Y  Count: 10
Res: .  Count:  1
argerich-55 bio>

107
SwissProt, Seq and SeqIO modules

Description
SwissProt is a curated database of proteins
managed by the Swiss Institute of Bioinformatics.
This is in contrast to EMBL/GenBank/DDBJ, which
are archives of protein information.
Bio::DB::SwissProt allows the dynamic retrieval of Sequence
objects (Bio::Seq)

108

SeqIO can be used to convert between different formats
Fasta     FASTA format
EMBL      EMBL format
GenBank   GenBank format
swiss     Swissprot format
SCF       SCF tracefile format
PIR       Protein Information Resource format
GCG       GCG format
raw       Raw format
ace       ACeDB sequence format

109
Objective

Loading a sequence from a remote server
Create a sequence object for the BACR_HALHA
SwissProt entry
Print its Accession number and description
Display the sequence in FASTA format

112

Script

#!/usr/bin/perl
use strict;
use Bio::DB::SwissProt;
use Bio::Seq;
use Bio::SeqIO;
my $database = new Bio::DB::SwissProt;
my $seq = $database->get_Seq_by_id('BACR_HALHA');
print "Seq: ", $seq->accession_number(), " -- ",
      $seq->desc(), "\n\n";
my $out = Bio::SeqIO->newFh( -fh => \*STDOUT,
                             -format => 'fasta');
print $out $seq;

113
Result

argerich-47 bio> perl protein.pl
Seq: P02945 -- BACTERIORHODOPSIN PRECURSOR (BR).

>BACR_HALHA BACTERIORHODOPSIN PRECURSOR (BR).
MLELLPTAVEGVSQAQITGRPEWIWLALGTALMGLGTLYFLVKGMGVSDPDAKKFYAITT
LVPAIAFTMYLSMLLGYGLTMVPFGGEQNPIYWARYADWLFTTPLLLLDLALLVDADQGT
ILALVGADGIMIGTGLVGALTKVYSYRFVWWAISTAAMLYILYVLFFGFTSKAESMRPEV
ASTFKVLRNVTVVLWSAYPVVWLIGSEGAGIVPLNIETLLFMVLDVSAKVGFGLILLRSR
AIFGEAEAPEPSAGDGAAATSD
argerich-48 bio>

114
Summary

Perl language and modules
Perl XS
Bioperl
Example scripts
