Roadmap - PowerPoint PPT Presentation

Provided by: Unive48

1
Roadmap
  • The topics
  • basic concepts of molecular biology
  • Gene, protein
  • Central dogma of molecular biology
  • PCR, DNA sequencing
  • Elements of Perl
  • overview of the field
  • biological databases and database searching
  • sequence alignments
  • phylogenetics
  • structure prediction
  • microarray data analysis

2
Programming and Perl for Bioinformatics, Part I
3
A Taste of Perl: print a message
  • perltaste.pl: Greet the entire world.
  • #!/usr/bin/perl
  • # greet the entire world
  • $x = 6e9;
  • print "Hello world!\n";
  • print "All $x of you!\n";

- #!/usr/bin/perl is the command interpretation header
- # greet the entire world is a comment
- $x = 6e9; is a variable assignment statement

- the print lines are function calls (output statements)
4
Basic Syntax and Data Types
  • Whitespace between tokens doesn't matter to Perl. One can write
    all statements on one line
  • All Perl statements end in a semicolon (;), just
    like C
  • Comments begin with # and Perl ignores
    everything after the # until end of line.
  • Example: # this is a comment
  • Perl has three basic data types
  • scalar
  • array (list)
  • associative array (hash)

5
Scalars
  • Scalar variables begin with $ followed by an
    identifier
  • Example: $this_is_a_scalar
  • An identifier is composed of upper or lower case
    letters, numbers, and underscore '_'. Identifiers
    are case sensitive (like all of Perl)
  • $progname = "first_perl";
  • $numOfStudents = 4;
  • sets the content of $progname to be the string
    first_perl and $numOfStudents to be the integer 4

6
Scalar Values
  • Numerical Values
  • integer 5, 3, 0, -307
  • floating point 6.2e9, -4022.33
  • hexadecimal/octal 0xd4f, 0477
  • binary 0b011011
  • NOTE: numerical values are stored internally as native
    integers or double-precision floating-point numbers, and
    Perl converts between them automatically

7
Do the Math
  • Mathematical operators work pretty much as you
    would expect
  • 4+7
  • 6*4
  • 43-27
  • 256/12
  • 2/(3-5)
  • Example
  • #!/usr/bin/perl
  • print "4+5\n";
  • print 4+5 , "\n";
  • print "4+5=" , 4+5 , "\n";
  • $myNumber = 88;
  • Note: use commas to separate multiple items in a
    print statement

What will be the output?
4+5
9
4+5=9
8
Scalar Values
  • String values
  • Example
  • $day = "Monday";
  • print "Happy Monday!\n";
  • print "Happy $day!\n";
  • print 'Happy Monday!\n';
  • print 'Happy $day!\n';
  • Double-quoted: interpolates (replaces variable
    name/control character with its value)
  • Single-quoted: no interpolation done (as-is)

What will be the output?
Happy Monday!<newline>
Happy Monday!<newline>
Happy Monday!\n
Happy $day!\n
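
The difference between the two quoting styles can be checked with a minimal sketch like the one below. It assumes $day holds "Monday" as on the slide; the names $double and $single are only illustrative.

```perl
use strict;
use warnings;

my $day = "Monday";

# Double quotes: $day is replaced by its value and \n becomes a newline.
my $double = "Happy $day!\n";

# Single quotes: the text is kept as-is, including the characters $day and \n.
my $single = 'Happy $day!\n';

print $double;
print $single, "\n";   # add a real newline so the terminal output stays tidy
```

The single-quoted string is one character shorter because \n stays as two literal characters while $day is not expanded to the six letters of Monday.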
9
String Manipulation
  • Concatenation
  • $dna1 = "ACTGCGTAGC";
  • $dna2 = "CTTGCTAT";
  • juxtapose in a string assignment or print
    statement
  • $new_dna = "$dna1$dna2";
  • Use the concatenation operator .
  • $new_dna = $dna1 . $dna2;
  • Substring (the offset counts from 0)
  • $dna = "ACTGCGTAGC";
  • $exon1 = substr($dna,2,5);

$exon1 is TGCGT
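
As a small runnable sketch of the two concatenation styles and of substr, using the sequences from the slide (the names $joined_a and $joined_b are illustrative only):

```perl
use strict;
use warnings;

my $dna1 = "ACTGCGTAGC";
my $dna2 = "CTTGCTAT";

# Two equivalent ways to concatenate.
my $joined_a = "$dna1$dna2";      # juxtaposition inside double quotes
my $joined_b = $dna1 . $dna2;     # the . operator

# substr(STRING, OFFSET, LENGTH): offset 2 is the third character.
my $exon1 = substr($dna1, 2, 5);

print "$joined_a\n";
print "$joined_b\n";
print "$exon1\n";
```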
10
Substitution
  • DNA transcription: T → U
  • Substitution operator s///
  • $dna = "GATTACATACACTGTTCA";
  • $rna = $dna;
  • $rna =~ s/T/U/g;  # GAUUACAUACACUGUUCA
  • =~ is a binding operator indicating to examine the
    contents of $rna for a match pattern
  • Ex: Start with $dna = "gaTtACataCACTgttca";
  • and do the same as above. What will be the output?

11
Example
  • transcribe.pl
  • $dna = "gaTtACataCACTgttca";
  • $rna = $dna;
  • $rna =~ s/T/U/g;
  • print "DNA: $dna\n";
  • print "RNA: $rna\n";
  • Does it do what you expect? If not, why not?
  • Patterns in substitution are case-sensitive! What
    can we do?
  • Convert all letters to upper/lower case
    (preferred when possible)
  • If we want to retain mixed case, use
    transliteration/translation operator tr///
  • $rna =~ tr/tT/uU/;  # replace all t by u, all T
    by U
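
Both remedies can be compared side by side. The following is a sketch under the slide's assumptions (the mixed-case input string); the variable names $rna_upper and $rna_mixed are made up for the illustration.

```perl
use strict;
use warnings;

my $dna = "gaTtACataCACTgttca";

# Option 1: convert to upper case first, then substitute T with U.
my $rna_upper = uc $dna;
$rna_upper =~ s/T/U/g;

# Option 2: keep the mixed case and transliterate both t and T.
my $rna_mixed = $dna;
$rna_mixed =~ tr/tT/uU/;

print "DNA:              $dna\n";
print "RNA (upper case): $rna_upper\n";
print "RNA (mixed case): $rna_mixed\n";
```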

12
Case conversion
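  • $string = "acCGtGcaTGc";
  • Upper case
  • $dna = uc($string);  # ACCGTGCATGC
  • or $dna = uc $string;
  • or $dna = "\U$string";
  • Lower case
  • $dna = lc($string);  # accgtgcatgc
  • or $dna = "\L$string";
  • Sentence case (ucfirst alone only changes the first letter, so lower-case first)
  • $dna = ucfirst(lc($string));  # Accgtgcatgc
  • or $dna = "\u\L$string";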

13
Reverse Complement
  • 5'- A C G T C T A G C . . . . G C A T -3'
  • 3'- T G C A G A T C G . . . . C G T A -5'
  • 5'- A T G C . . . . G C T A G A C G T -3'
  • Reverse: reverses a string
  • $string = "ACGTCTAGC";
  • $string = reverse($string);  # "CGATCTGCA"
  • Complementation: use transliteration operator
  • $string =~ tr/ACGT/TGCA/;
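
The two steps can be wrapped in one subroutine. This is only a sketch: the name reverse_complement is hypothetical, and handling lower-case bases is an added assumption that the slide does not cover.

```perl
use strict;
use warnings;

# Return the reverse complement of a DNA string.
sub reverse_complement {
    my ($seq) = @_;
    my $rc = reverse $seq;            # scalar assignment, so the string is reversed
    $rc =~ tr/ACGTacgt/TGCAtgca/;     # complement each base, keeping its case
    return $rc;
}

my $strand = "ACGTCTAGC";
my $rc     = reverse_complement($strand);

print "5'-$strand-3'\n";
print "5'-$rc-3'\n";
```

Note that reverse only reverses a string in scalar context; in list context it reverses the order of the list elements instead.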

14
More on String Manipulation
  • String length
  • length($dna)
  • Index
  • index STR,SUBSTR,POSITION
  • index($strand, $primer, 2)
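
A short sketch of length and index in use. The strand and primer values below are invented for the illustration and are not from the slides.

```perl
use strict;
use warnings;

my $strand = "ACTGCGTAGCTGC";   # hypothetical sequence
my $primer = "TGC";             # hypothetical primer

my $len    = length($strand);

# index returns the 0-based position of the first match, or -1 if none.
my $first  = index($strand, $primer);

# The optional third argument is the position at which the search starts.
my $second = index($strand, $primer, $first + 1);

my $none   = index($strand, "AAA");

print "length: $len\n";
print "first match at $first, next match at $second\n";
print "no match gives $none\n";
```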

15
Flow Control
  • Conditional Statements
  • parts of code executed depending on truth value
    of a logical statement
  • truth (logical) values in Perl
  • false: 0, 0.0, 0e0, "", undef; default ""
  • true: anything else; default 1
  • ($a, $b) = (75, 83);
  • if ( $a < $b ) {
  • $a = $b;
  • print "Now a = b!\n";
  • }
  • if ( $a > $b ) { print "Yes, a > b!\n"; }
    # compact form
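
The truth rules are easy to test directly. The sketch below uses a hypothetical helper, is_true, and a hand-picked list of values; it also includes a few string cases that the slide does not list.

```perl
use strict;
use warnings;

# Hypothetical helper: report how Perl evaluates a value in boolean context.
sub is_true {
    my ($value) = @_;
    return $value ? 1 : 0;
}

my @cases = (
    [ 'number 0',     0     ],
    [ 'number 0.0',   0.0   ],
    [ 'number 0e0',   0e0   ],
    [ 'empty string', ""    ],
    [ 'string "0"',   "0"   ],
    [ 'undef',        undef ],
    [ 'string "0.0"', "0.0" ],
    [ 'string "00"',  "00"  ],
    [ 'number -1',    -1    ],
);

foreach my $case (@cases) {
    my ($label, $value) = @$case;
    printf "%-14s is %s\n", $label, is_true($value) ? "true" : "false";
}
```

The strings "0.0" and "00" count as true even though the numbers 0.0 and 0 are false: only the empty string and the exact string "0" are false strings.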

16
Comparison Operators
17
Logical Operators
18
if/else/elsif
  • allows for multiple branching/outcomes
  • $randDNA = "";
  • while ( length($randDNA) < 200 ) {
  • $a = rand();
  • if ( $a < 0.25 ) {
  • $randDNA .= "A";
  • } elsif ( $a < 0.50 ) {
  • $randDNA .= "C";
  • } elsif ( $a < 0.75 ) {
  • $randDNA .= "G";
  • } else {
  • $randDNA .= "T";
  • } }
  • print $randDNA;

19
Conditional Loops
  • while ( statement ) { commands }
  • repeats commands until statement is no longer
    true
  • do { commands } while ( statement );
  • same as while, except commands are executed at least
    once
  • NOTE the ; after the while statement!!
  • Short-circuiting commands: next and last
  • next jumps to end, do next iteration
  • last jumps out of the loop completely
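
A minimal sketch of these loop forms. The input string and its conventions (N for an unknown base, X as a stop marker) are assumptions made up for the example.

```perl
use strict;
use warnings;

my $seq   = "ACGTNNACXGT";   # hypothetical read: N = unknown base, X = stop marker
my $clean = "";
my $i     = 0;

while ( $i < length($seq) ) {
    my $base = substr($seq, $i, 1);
    $i++;                      # advance before next, or the loop would never end
    next if $base eq 'N';      # skip this base, go on to the next iteration
    last if $base eq 'X';      # leave the loop completely
    $clean .= $base;
}

# do ... while runs its block once even when the condition is false at the start.
my $passes = 0;
do {
    $passes++;
} while ( $passes < 0 );

print "clean sequence: $clean\n";
print "do-while passes: $passes\n";
```

One caveat: next and last apply to while, for and foreach loops; a do { } while block is not a true loop in Perl, so they cannot be used to control it directly.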

20
while
  • Example
  • while ($alive) {
  • if ($needs_nutrients) {
  • print "Cell needs nutrients\n";
  • } }
  • Any problem?

21
for and foreach loops
  • Execute a code loop a specified number of times,
    or for a specified list of values
  • for and foreach are identical; use whichever you
    want
  • Incremental loop (C style)
  • for ( $i=0; $i < 50; $i++ ) {
  • $x = $i*$i;
  • print "$i squared is $x.\n";
  • }
  • Loop over list (foreach loop)
  • foreach $name ( "Billy", "Bob", "Edwina" ) {
  • print "$name is my friend.\n";
  • }

22
Basic Data Types
  • Perl has three basic data types
  • scalar
  • array (list)
  • associative array (hash)

23
Arrays
  • An array (list) is an ordered group of scalar
    values.
  • @ is used to refer to the entire array
  • Example
  • (1,2,3)  # Array of three values 1, 2, and 3
  • ("one","two","three")  # Array of 3 values
    "one", "two", "three"
  • @names = ("mary", "tom", "mark", "john", "jane");
  • $names[1] ?
  • @names[1..4]

$names[1] is tom
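
Element access and slices can be tried out with the names array from the slide. This is a sketch; the extra variables ($second, @slice, $last, $count) are illustrative additions.

```perl
use strict;
use warnings;

my @names = ("mary", "tom", "mark", "john", "jane");

# A single element is a scalar, so it is written with $ (indices start at 0).
my $second = $names[1];

# A slice is a list of elements, so it keeps the @ sigil.
my @slice = @names[1..4];

# Negative indices count from the end of the array.
my $last = $names[-1];

# In scalar context an array gives its number of elements.
my $count = scalar @names;

print "second: $second\n";
print "slice:  @slice\n";
print "last:   $last\n";
print "count:  $count\n";
```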
24
Basic Data Types
  • Perl has three basic data types
  • scalar
  • array (list)
  • associative array (hash)

25
More on Arrays
  • @a = ();  # empty list
  • @b = (1,2,3);  # three numbers
  • @c = ("Jan","Joe","Marie");  # three strings
  • @d = ("Dirk",1.92,46,"20-03-1977");  # a mixed
    list
  • Variables and sublists are interpolated in a list
  • @b = ($a,$a+1,$a+2);  # variable interpolation
  • @c = ("Jan",("Joe","Marie"));  # list
    interpolation
  • @d = ("Dirk",1.92,46,(),"20-03-1977");  # empty
    list interpolation
  • @e = ( @b, @c );  # same as (1,2,3,"Jan","Joe","Marie")
  • Practical construction operators ($x..$y)
  • @x = (1..6);  # same as (1, 2, 3, 4, 5, 6)
  • @y = (1.2..4.2);  # same as (1, 2, 3, 4): the range
    operator truncates its endpoints to integers
  • @z = (2..5,8,11..13);  # same as
    (2,3,4,5,8,11,12,13)
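
List flattening and the range operator can be verified with a short sketch like this one. It reuses the slide's values; the behaviour shown for a fractional range is what Perl's range operator does with non-integer endpoints.

```perl
use strict;
use warnings;

my @b = (1, 2, 3);
my @c = ("Jan", ("Joe", "Marie"));                # nested list is flattened
my @d = ("Dirk", 1.92, 46, (), "20-03-1977");     # empty list adds nothing
my @e = (@b, @c);                                 # arrays are flattened too

my @z = (2..5, 8, 11..13);
my @y = (1.2..4.2);                               # endpoints are truncated to 1 and 4

print "c has ", scalar(@c), " elements\n";
print "d has ", scalar(@d), " elements\n";
print "e: @e\n";
print "z: @z\n";
print "y: @y\n";
```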

26
Array Manipulations
  • reverse: Reverses the order of array elements
  • @a = (1, 2, 3);
  • @b = reverse @a;  # @b = (3, 2, 1)
  • split: Splits a string into a list/array
  • $line = "John Smith 28";
  • ($first, $last, $age) = split /\s/, $line;
  • $DNA = "ACGTTTGA";
  • @DNA = split ('', $DNA);
  • join: Joins a list/array into a string
  • $gene = join "", ($exon1, $exon3);
  • $name = join "-", ("Zhong", "Hui");
  • scalar: Returns the number of elements in
    @array
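
The four functions can be exercised together in a sketch such as the following. The exon strings are invented placeholders, and /\s+/ is used instead of /\s/ so that runs of several spaces would also be treated as one separator.

```perl
use strict;
use warnings;

# reverse on a list
my @a = (1, 2, 3);
my @b = reverse @a;

# split on whitespace into named fields
my $line = "John Smith 28";
my ($first, $last, $age) = split /\s+/, $line;

# split with an empty pattern gives one element per character
my @bases = split //, "ACGTTTGA";

# join with and without a separator
my ($exon1, $exon3) = ("ATGGCC", "TTTTAA");   # hypothetical exon sequences
my $gene = join "", ($exon1, $exon3);
my $name = join "-", ("Zhong", "Hui");

# scalar gives the number of elements
my $n_bases = scalar @bases;

print "@b\n";
print "$last, $first ($age)\n";
print "$gene\n$name\n$n_bases bases\n";
```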

27
Exercise
  • Determine freq of nucleotides
  • $dna = "gaTtACataCACTgttca";
  • ?

28
Ex Determine freq of nucleotides
  • $dna = "gaTtACataCACTgttca";
  • $dna = uc($dna);  # GATTACATACACTGTTCA
  • $count_A = 0;
  • $count_C = 0;
  • $count_G = 0;
  • $count_T = 0;
  • @dna = split '', $dna;
  • foreach $base (@dna) {
  • if ($base eq 'A') { $count_A++; }
  • elsif ($base eq 'C') { $count_C++; }
  • elsif ($base eq 'G') { $count_G++; }
  • elsif ($base eq 'T') { $count_T++; }
  • else { print "error!\n"; }
  • }
  • print "count of A = $count_A \n";
  • print "count of C = $count_C \n";
  • print "count of G = $count_G \n";
  • print "count of T = $count_T \n";
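
The same count can also be kept in an associative array (hash), the third basic data type named earlier. This is an alternative sketch, not the approach taken on the slide; the hash name %count is illustrative.

```perl
use strict;
use warnings;

my $dna = uc "gaTtACataCACTgttca";

# One hash entry per nucleotide replaces the four separate counters.
my %count = ( A => 0, C => 0, G => 0, T => 0 );

foreach my $base ( split //, $dna ) {
    if ( exists $count{$base} ) {
        $count{$base}++;
    }
    else {
        print "error: unexpected character $base\n";
    }
}

foreach my $base (qw(A C G T)) {
    print "count of $base = $count{$base}\n";
}
```

With a hash, supporting another symbol (for example N) only needs one more key rather than a new variable and a new elsif branch.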

29
Filehandles
  • File I/O (input/output) reading from/writing to
    files
  • Files represented in Perl by a filehandle
    variable
  • (for clarity, usually written as a bare word in
    UPPERCASE)
  • Open a file on a filehandle using the open
    function
  • for reading (input)
  • open INFILE, "< datafile.txt";
  • or open (INFILE, "< datafile.txt");
  • for writing (output), overwriting the file
  • open OUTFILE, "> output";
  • for appending to the end of the file
  • open OUTFILE, ">> output";
  • Close a file on a filehandle
  • close (OUTFILE);
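
The three open modes can be tried safely on a temporary file. The sketch below departs from the slide in two labeled ways: it uses the three-argument form of open with lexical filehandles (a common alternative to bareword handles), and it checks every open with "or die". The temporary file comes from the core File::Temp module.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a scratch file that is removed when the program exits.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close($tmp_fh);

# '>' opens for writing and overwrites any existing content.
open(my $out, '>', $path) or die "Cannot write $path: $!";
print $out "first line\n";
close($out);

# '>>' opens for appending to the end of the file.
open($out, '>>', $path) or die "Cannot append to $path: $!";
print $out "second line\n";
close($out);

# '<' opens for reading.
open(my $in, '<', $path) or die "Cannot read $path: $!";
my @lines = <$in>;      # list context reads all remaining lines
close($in);

print "read ", scalar(@lines), " lines\n";
print @lines;
```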

30
Special Filehandles
  • Special files that are always open
  • STDIN (standard input)
  • input from command window read only
  • STDOUT (standard output)
  • output to command window write only
  • print STDOUT "Have fun with Perl!\n";
  • or just
  • print "Have fun with Perl!\n";

31
Input from Filehandles
  • Angle bracket input operator <>
  • reads one line of input (up to newline/carriage
    return)
  • from STDIN
  • print "Enter name of protein: ";
  • $line = <STDIN>;
  • chomp $line;  # removes \n from end of line
  • print "\nYou entered $line.\n";
  • from a file
  • open (INPUT, "aminos.txt");
  • $amino1 = <INPUT>;
  • $amino2 = <INPUT>;
  • chomp ($amino1, $amino2);
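
Reading one variable per line does not scale to longer files, so a loop is the usual pattern. In this sketch the content of aminos.txt is simulated with an in-memory filehandle (open on a reference to a string) so that the example runs without any file on disk; the three amino acid names are invented sample data.

```perl
use strict;
use warnings;

# Stand-in for the contents of aminos.txt (hypothetical data).
my $text = "alanine\nglycine\nserine\n";

# Opening a reference to a string gives a filehandle that reads from memory.
open(my $in, '<', \$text) or die "Cannot open in-memory file: $!";

my @aminos;
while ( my $line = <$in> ) {   # <$in> returns undef at end of input, ending the loop
    chomp $line;               # remove the trailing newline
    push @aminos, $line;
}
close($in);

print "read ", scalar(@aminos), " names: @aminos\n";
```

To read a real file instead, the same loop works unchanged once the open call names the file, for example open(my $in, '<', "aminos.txt").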

32
sequences.fasta
  • >gi|145536|gb|L04574.1| Escherichia coli DNA
    polymerase III chi subunit (holC) gene, complete
    cds
  • TAACGGCGAAGAGTAATTGCGTCAGGCAAGGCTGTTATTGCCGGATGCGG
    CGTGAACGCCTTATCCGACC
  • TACACAGCACTGAACTCGTAGGCCTGATAAGACACAACAGCGTCGCATCA
    GGCGCTGCGGTGTATACCTG
  • ATGCGTATTTAAATCCACCACAAGAAGCCCCATTTATGAAAAACGCGACG
    TTCTACCTTCTGGACAATGA
  • CACCACCGTCGATGGCTTAAGCGCCGTTGAGCAACTGGTGTGTGAAATTG
    CCGCAGAACGTTGGCGCAGC
  • GGTAAGCGCGTGCTCATCGCCTGTGAAGATGAAAAGCAGGCTTACCGGCT
    GGATGAAGCCCTGTGGGCGC
  • GTCCGGCAGAAAGCTTTGTTCCGCATAATTTAGCGGGAGAAGGACCGCGC
    GGCGGTGCACCGGTGGAGAT
  • CGCCTGGCCGCAAAAGCGTAGCAGCAGCCGGCGCGATATATTGATTAGTC
    TGCGAACAAGCTTTGCAGAT
  • TTTGCCACCGCTTTCACAGAAGTGGTAGACTTCGTTCCTTATGAAGATTC
    TCTGAAACAACTGGCGCGCG
  • AACGCTATAAAGCCTACCGCGTGGCTGGTTTCAACCTGAATACGGCAACC
    TGGAAATAATGGAAAAGACA
  • TATAACCCACAAGATATCGAACAGCCGCTTTACGAGCACTGGGAAAAGCA
    GGGCTACTTTAAGCCTAATG
  • GCGATGAAAGCCAGGAAAGTTTCTGCATCATGATCCCGCCGCCGAA

Determine freq of nucleotides
33
Determine frequency of nucleotides
  • Input file: <sequences.fasta>
  • open (INPUT, "sequences.fasta");  # open file for
    sequence
  • $line1 = <INPUT>;
  • $line2 = <INPUT>;
  • $line3 = <INPUT>;
  • chomp ($line2, $line3);
  • $dna = $line2.$line3;
  • $count_A = 0;
  • $count_C = 0;
  • $count_G = 0;
  • $count_T = 0;
  • @dna = split '', $dna;
  • foreach $base (@dna) {
  • if ($base eq 'A') { $count_A++; }
  • elsif ($base eq 'C') { $count_C++; }
  • elsif ($base eq 'G') { $count_G++; }
  • elsif ($base eq 'T') { $count_T++; }
  • }
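
The version above reads only the first two sequence lines. A sketch that handles every line of a FASTA record could look like the following. It makes several assumptions: the record is a tiny invented example held in memory rather than the real sequences.fasta, header lines are recognised by a leading >, and the counts are kept in a hash.

```perl
use strict;
use warnings;

# Hypothetical two-line FASTA record standing in for sequences.fasta.
my $fasta = ">demo_record hypothetical example\nACGTAC\nGGTTAA\n";
open(my $in, '<', \$fasta) or die "Cannot open in-memory file: $!";

my $dna = "";
while ( my $line = <$in> ) {
    chomp $line;
    next if $line =~ /^>/;    # skip the header line
    $dna .= uc $line;         # append each sequence line
}
close($in);

my %count = ( A => 0, C => 0, G => 0, T => 0 );
foreach my $base ( split //, $dna ) {
    $count{$base}++ if exists $count{$base};
}

foreach my $base (qw(A C G T)) {
    print "count of $base = $count{$base}\n";
}
```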