Programming and Perl for Bioinformatics Part I

Provided by: duan90

1
Programming and Perl for Bioinformatics Part I
2
Why Write Programs?
  • Automate computer work that you do by hand: save
    time and reduce errors
  • Run the same analysis on lots of similar data
    files (scale up)
  • Analyze data, make decisions
  • Sort BLAST results by e-value and/or species of
    best match
  • Build a pipeline
  • Create new analysis methods

3
Why Perl?
  • Fairly easy to learn the basics
  • Many powerful functions for working with text:
    search, extract, modify, combine
  • Can control other programs
  • Free and available for all operating systems
  • Most popular language in bioinformatics
  • Many pre-built modules are available that do
    useful things

4
Get Perl
  • You can install Perl on any type of computer
  • Just log in: you don't even need to type any
    command to make Perl active.
  • Download and install Perl on your own computer
  • www.perl.org

5
Programming Concepts
  • Program: a text file that contains instructions
    for the computer to follow
  • Programming Language: a set of commands that the
    computer understands (via a command
    interpreter)
  • Input: data that is given to the program
  • Output: something that is produced by the
    program

6
Programming
  • Write the program (with a text editor)
  • Run the program
  • Look at the output
  • Correct the errors (debugging)
  • Repeat
  • Computers are VERY dumb: they do exactly what you
    tell them to do, so be careful what you ask for

7
Strings
  • Text is handled in Perl as a string
  • This basically means that you have to put quotes
    around any piece of text that is not an actual
    Perl instruction.
  • Perl has two kinds of quotes:
  • single '...' and double "..."
  • (they are different; more about this later)

8
Print
  • Perl uses the term print to create output
  • Without a print statement, you won't know what
    your program has done
  • You need to tell Perl to put a newline (carriage
    return) at the end of a printed line
  • Use the \n (newline) escape sequence
  • Include the quotes: "\n"
  • The \ character is called an escape; Perl
    uses it a lot

9
A Taste of Perl: print a message
  • hello_world.pl: Greet the entire world.
  • #!/usr/bin/perl -w
  • # greet the entire world
  • $x = 6e9;
  • print "Hello world!\n";
  • print "All $x of you!\n";

- command interpretation header: #!/usr/bin/perl -w
- a comment: # greet the entire world
- variable assignment statement: $x = 6e9;
- function calls (output statements): the two print lines
10
Variables
  • Up till now, we've been telling the computer
    exactly what to print. But in order for the
    program to generate what is printed, we need to
    use variables.
  • A scalar variable name starts with $
  • It can store either a string or a number.

11
Basic Syntax and Data Types
  • Whitespace doesn't matter to Perl. One can write
    all statements on one line
  • All Perl statements end in a semicolon (;), just
    like C
  • Comments begin with # and Perl ignores
    everything after the # until end of line.
  • Example: # this is a comment
  • Perl has three basic data types:
  • scalar
  • array (list)
  • associative array (hash)

12
Variables
  • To be useful at all, a program needs to be able
    to store information from one line to the next
  • Perl stores information in variables
  • A scalar variable name starts with the $
    symbol, and it can store strings or numbers
  • Variables are case sensitive
  • Give them sensible names
  • Use the = sign to assign values to variables
  • $one_hundred = 100;
  • $my_sequence = "ttattagcc";

13
Scalars
  • Scalar variables begin with $ followed by an
    identifier
  • Example: $this_is_a_scalar
  • An identifier is composed of upper or lower case
    letters, numbers, and underscore '_'. Identifiers
    are case sensitive (like all of Perl)
  • $progname = "first_perl";
  • $numOfStudents = 4;
  • = (gets) sets the content of $progname to be
    the string "first_perl" and $numOfStudents to be
    the integer 4

14
Scalar Values
  • Numerical Values
  • integer: 5, 3, 0, -307
  • floating point: 6.2e9, -4022.33
  • NOTE: for practical purposes, treat all numerical
    values as double-precision floating-point numbers;
    Perl converts between integer and floating-point
    storage as needed

15
A program with variables
  • #!/usr/bin/perl -w
  • # this program uses variables containing numbers
  • my $two = 2;
  • my $three = $two + 1;
  • print "\$two * \$three = $two * $three = ",
  • ($two * $three);
  • print "\n";

16
Do the Math
  • Mathematical functions work pretty much as you
    would expect
  • 4+7
  • 6*4
  • 43-27
  • 256/12
  • 2/(3-5)
  • Example
  • #!/usr/bin/perl
  • print "4+5\n";
  • print 4+5 , "\n";
  • print "4+5=" , 4+5 , "\n";
  • $myNumber = 88;
  • Note: use commas to separate multiple items in a
    print statement

What will be the output?
4+5
9
4+5=9
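
The expressions listed on this slide can be checked with a short script. This is only a sketch: it assumes the first two expressions are an addition and a multiplication, and the variable names are made up for illustration.

```perl
use strict;
use warnings;

my $sum        = 4 + 7;        # 11
my $product    = 6 * 4;        # 24
my $difference = 43 - 27;      # 16
my $quotient   = 256 / 12;     # not a whole number: Perl does not truncate
my $grouped    = 2 / (3 - 5);  # -1: the parentheses are evaluated first
my $ungrouped  = 2 / 3 - 5;    # division happens before subtraction

print "sum: $sum\n";
print "product: $product\n";
print "difference: $difference\n";
print "quotient: $quotient\n";
print "grouped: $grouped\n";
print "ungrouped: $ungrouped\n";
```

Comparing the last two lines shows why the parentheses in 2/(3-5) matter.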
17
Scalar Values
  • String values
  • Example
  • $day = "Monday";
  • print "Happy Monday!\n";
  • print "Happy $day!\n";
  • print 'Happy Monday!\n';
  • print 'Happy $day!\n';
  • Double-quoted: interpolates (replaces variable
    name/control character with its value)
  • Single-quoted: NO interpolation done (as-is)

What will be the output?
Happy Monday!<newline>
Happy Monday!<newline>
Happy Monday!\n
Happy $day!\n
18
String Manipulation
  • Concatenation
  • $dna1 = "ACTGCGTAGC";
  • $dna2 = "CTTGCTAT";
  • juxtapose in a string assignment or print
    statement
  • $new_dna = "$dna1$dna2";
  • Use the concatenation operator .
  • $new_dna = $dna1 . $dna2;
  • Substring
  • $dna = "ACTGCGTAGC";
  • $exon1 = substr($dna,2,5); # TGCGT (offsets count
    from 0)
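
Sequence coordinates in biology are usually 1-based and inclusive, while substr counts from 0 and takes a length. The sketch below shows one way to convert between the two; the coordinates 3 and 7 and the variable names are hypothetical.

```perl
use strict;
use warnings;

my $dna1 = "ACTGCGTAGC";
my $dna2 = "CTTGCTAT";

# Two equivalent ways to join the fragments.
my $joined_by_interpolation = "$dna1$dna2";
my $joined_by_operator      = $dna1 . $dna2;

# Hypothetical 1-based, inclusive coordinates of a region in $dna1.
my ($start, $end) = (3, 7);

# substr(STRING, OFFSET, LENGTH): OFFSET counts from 0.
my $region = substr($dna1, $start - 1, $end - $start + 1);

print "$joined_by_operator\n";   # ACTGCGTAGCCTTGCTAT
print "$region\n";               # TGCGT
```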
19
Substitution
  • DNA transcription: T -> U
  • Substitution operator s///
  • $dna = "GATTACATACACTGTTCA";
  • $rna = $dna;
  • $rna =~ s/T/U/g; # GAUUACAUACACUGUUCA
  • =~ is a binding operator indicating to examine the
    contents of $rna for a match pattern; g means
    global
  • Ex: Start with $dna = "gaTtACataCACTgttca";
  • and do the same as above. What will be the output?

20
Example
  • transcribe.pl
  • $dna = "gaTtACataCACTgttca";
  • $rna = $dna;
  • $rna =~ s/T/U/g;
  • print "DNA: $dna\n";
  • print "RNA: $rna\n";
  • Does it do what you expect? If not, why not?
  • Patterns in substitution are case-sensitive! What
    can we do?
  • Convert all letters to upper/lower case
    (preferred when possible)
  • If we want to retain mixed case, use the
    transliteration/translation operator tr///
  • $rna =~ tr/tT/uU/; replaces all t by u, all T
    by U
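
The two remedies can be compared side by side. This is a minimal sketch using the mixed-case sequence from the slide; the variable names $rna_upper and $rna_mixed are introduced here for illustration.

```perl
use strict;
use warnings;

my $dna = "gaTtACataCACTgttca";

# Option 1: convert to upper case first, then substitute.
my $rna_upper = uc $dna;
$rna_upper =~ s/T/U/g;

# Option 2: keep the original case with tr///.
my $rna_mixed = $dna;
$rna_mixed =~ tr/tT/uU/;

print "DNA: $dna\n";
print "RNA (upper case): $rna_upper\n";   # GAUUACAUACACUGUUCA
print "RNA (mixed case): $rna_mixed\n";   # gaUuACauaCACUguuca
```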

21
Case conversion
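  • $string = "acCGtGcaTGc";
  • Upper case
  • $dna = uc($string); # ACCGTGCATGC
  • or $dna = uc $string;
  • or $dna = "\U$string"; # \U string directive
  • Lower case
  • $dna = lc($string); # accgtgcatgc
  • or $dna = "\L$string";
  • Sentence case
  • $dna = ucfirst(lc($string)); # Accgtgcatgc
    (ucfirst alone changes only the first letter and
    would give AcCGtGcaTGc)
  • or $dna = "\u\L$string";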

22
Reverse Complement
  • 5'- A C G T C T A G C . . . . G C A T -3'
  • 3'- T G C A G A T C G . . . . C G T A -5'
  • Reverse: reverses a string
  • $string = "ACGTCTAGC";
  • $string = reverse($string); # "CGATCTGCA"
  • Complementation: use the transliteration operator
  • $string =~ tr/ACGT/TGCA/;

23
What's Wrong?
  • $DNA = "ACGTCTAGC";
  • print "$DNA\n\n";
  • $revcom = reverse $DNA;
  • # Next substitute all bases by their
    complements,
  • # A->T, T->A, G->C, C->G
  • $revcom =~ s/A/T/g;
  • $revcom =~ s/T/A/g;
  • $revcom =~ s/G/C/g;
  • $revcom =~ s/C/G/g;
  • # Print the reverse complement DNA onto the
    screen
  • print "$revcom\n";
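
Running the two approaches side by side makes the problem visible. The sketch below assumes the same input sequence as the slide; $chained is a name introduced here to hold the result of the step-by-step substitutions.

```perl
use strict;
use warnings;

my $DNA = "ACGTCTAGC";

# Chained substitutions: each step also rewrites bases
# that an earlier step has just produced.
my $chained = reverse $DNA;
$chained =~ s/A/T/g;
$chained =~ s/T/A/g;
$chained =~ s/G/C/g;
$chained =~ s/C/G/g;

# tr/// maps every character exactly once.
my $revcom = reverse $DNA;
$revcom =~ tr/ACGT/TGCA/;

print "chained s///: $chained\n";   # GGAAGAGGA
print "tr///:        $revcom\n";    # GCTAGACGT
```

After the first two substitutions no T is left, and after the last two no C is left, so the chained result contains only A and G.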

24
More on String Manipulation
  • String length
  • length( $dna )
  • Index
  • index STR,SUBSTR,POSITION
  • index( $strand, $primer, 2 )
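
A short sketch of length and index in use. The sequence and primer below are hypothetical values chosen so that the primer occurs twice.

```perl
use strict;
use warnings;

my $strand = "ACGTCTAGCACGT";   # hypothetical sequence
my $primer = "ACG";             # hypothetical primer

my $len    = length($strand);              # 13
my $first  = index($strand, $primer);      # 0: positions count from 0
my $second = index($strand, $primer, 2);   # 9: search starts at position 2
my $absent = index($strand, "GGG");        # -1 means "not found"

print "length: $len\n";
print "first match at $first, next match at $second\n";
print "GGG found at $absent\n";
```

Because 0 is a valid position, test the result against -1 rather than treating it as true or false.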

25
Flow Control
  • Conditional Statements
  • parts of code executed depending on truth value
    of a logical statement
  • truth (logical) values in Perl:
  • false: 0, 0.0, 0e0, "", undef; default ""
  • true: anything else; default 1
  • ($a, $b) = (75, 83);
  • if ( $a < $b ) {
  • $a = $b;
  • print "Now a = b!\n";
  • }
  • if ( $a > $b ) { print "Yes, a > b!\n" }
    (compact form)
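
The truth rules can be tried out directly. This sketch uses made-up variable names and shows one detail that is easy to miss: the number 0.0 is false, but the string "0.0" is true, because only the empty string and "0" are false as strings.

```perl
use strict;
use warnings;

my $number = 0.0;      # the number zero: false
my $text   = "0.0";    # a non-empty string other than "0": true

my $number_is = "false";
if ($number) { $number_is = "true"; }

my $text_is = "false";
if ($text) { $text_is = "true"; }

# A false comparison yields the empty string, a true one yields 1.
my $false_result = (5 < 3);
my $true_result  = (3 < 5);

print "0.0 is $number_is, \"0.0\" is $text_is\n";
print "length of false result: ", length($false_result), "\n";
print "true result: $true_result\n";
```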

26
Comparison Operators
Comparison                  String   Number
Equality                    eq       ==
Inequality                  ne       !=
Greater than                gt       >
Greater than or equal to    ge       >=
Less than                   lt       <
Less than or equal to       le       <=
Comparison                  cmp      <=>
The first six return 1 (true) or null (false); cmp and <=> return -1, 0, or 1.
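
Choosing the string or the number column changes the answer. A minimal sketch, with values picked for illustration, of how the same data compares differently as text and as numbers:

```perl
use strict;
use warnings;

my $numeric_order = 9 <=> 10;        # -1: 9 is the smaller number
my $string_order  = "9" cmp "10";    #  1: as text, "9" sorts after "1"

my $same_number = "no";
if ("1.0" == 1) { $same_number = "yes"; }   # equal as numbers

my $same_text = "no";
if ("1.0" eq "1") { $same_text = "yes"; }   # different as strings

print "9 <=> 10 gives $numeric_order\n";
print "\"9\" cmp \"10\" gives $string_order\n";
print "1.0 == 1? $same_number; \"1.0\" eq \"1\"? $same_text\n";
```

This matters for tasks such as ordering BLAST hits by e-value, where a string comparison would give the wrong order.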
27
Logical Operators
Operation   Computerese   English version
AND         &&            and
OR          ||            or
NOT         !             not
28
if/else/elsif
  • allows for multiple branching/outcomes
  • $a = rand();
  • if ( $a < 0.25 ) {
  • print "A";
  • } elsif ( $a < 0.50 ) {
  • print "C";
  • } elsif ( $a < 0.75 ) {
  • print "G";
  • } else {
  • print "T";
  • }

29
What's a block?
  • In the case of an if statement:
  • If the test is true, execute all the command
    lines inside the { } brackets. If not, then go on
    past the closing } to the statements below.
  • You can also do stuff in a block over and over
    again using a loop.

30
Conditional Loops
  • while ( statement ) { commands }
  • repeats commands until statement is no longer
    true
  • do { commands } while ( statement );
  • same as while, except commands executed at least
    once
  • NOTE the ; after the while statement
  • Short-circuiting commands: next and last
  • next jumps to end, do next iteration
  • last jumps out of the loop completely
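
A sketch of next and last inside a while loop. The sequence is hypothetical: it is read three bases at a time, codons of unknown bases (NNN) are skipped with next, and the loop ends with last at the stop codon TAA.

```perl
use strict;
use warnings;

my $dna    = "ATGNNNGCTTAAGGG";   # hypothetical: ATG NNN GCT TAA GGG
my $pos    = 0;
my $codons = "";

while ($pos + 3 <= length($dna)) {
    my $codon = substr($dna, $pos, 3);
    $pos = $pos + 3;                   # advance before any next, or the loop never ends

    if ($codon eq "NNN") { next; }     # skip this codon, go to the next iteration
    if ($codon eq "TAA") { last; }     # stop codon: leave the loop completely

    $codons = $codons . $codon . " ";
}

print "$codons\n";   # ATG GCT
```

The final GGG is never examined because last has already ended the loop.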

31
While-Loop
  • Loops test a condition and repeat a block of code
    based on the result
  • while loops repeat while the condition is true
  • $count = 1;
  • while ($count < 10) {
  • print "$count bottles of pop\n";
  • $count = $count + 1;
  • }
  • print "POP!\n";
  • Try this program yourself

32
for and foreach loops
  • Execute a code loop a specified number of times,
    or for a specified list of values
  • for and foreach are identical; use whichever you
    want
  • Incremental loop (C style)
  • for ( $i=0; $i < 50; $i++ ) {
  • $x = $i*$i;
  • print "$i squared is $x.\n";
  • }
  • Loop over list (foreach loop)
  • foreach $name ( "Billy", "Bob", "Edwina" ) {
  • print "$name is my friend.\n";
  • }
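
As a bioinformatics-flavoured use of the C-style loop, the sketch below walks along a hypothetical sequence one base at a time and counts G and C. It combines the loop with substr, length, and the or operator from the earlier slides.

```perl
use strict;
use warnings;

my $dna = "ACGTGCGC";   # hypothetical sequence
my $gc  = 0;

for (my $i = 0; $i < length($dna); $i++) {
    my $base = substr($dna, $i, 1);        # the single base at position $i
    if ($base eq "G" or $base eq "C") {
        $gc = $gc + 1;
    }
}

my $percent = 100 * $gc / length($dna);
print "GC content: $percent%\n";           # GC content: 75%
```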

33
Standard Input
  • To make the program do something, we need to
    input data.
  • The angle bracket operator (< >) tells Perl to
    expect input, by default from the keyboard.
  • Usually this is assigned to a variable
  • print "Please type a number: ";
  • $num = <STDIN>;
  • print "Your number is $num\n";

34
chomp
  • When data is entered from the keyboard, the
    program waits for you to press the Return (Enter)
    key.
  • But the string which is captured includes a
    newline (carriage return) at its end
  • You can use the chomp function to remove the
    newline character
  • print "Enter your name: ";
  • $name = <STDIN>;
  • print "Hello $name, happy to meet you!\n";
  • chomp $name;
  • print "Hello $name, happy to meet you!\n";
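
The effect of chomp can be seen without typing anything. In this sketch a fixed string stands in for keyboard input (an assumption made so the example runs unattended), and length shows what was removed.

```perl
use strict;
use warnings;

# Stand-in for: my $name = <STDIN>;
# after the user types Ada and presses Enter.
my $name = "Ada\n";

my $length_before = length($name);   # 4: three letters plus the newline
my $removed       = chomp($name);    # chomp returns how many characters it removed
my $length_after  = length($name);   # 3

my $removed_again = chomp($name);    # nothing left to remove

print "Hello $name, happy to meet you!\n";
```

Calling chomp a second time is harmless, since it only removes a trailing newline and never any other character.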

35
Basic Data Types
  • Perl has three basic data types
  • scalar
  • array (list)
  • associative array (hash)