Perl - PowerPoint PPT Presentation

Provided by: tri5274


1
Perl 3: Regular Expressions
  • Pattern Analysis in Biology

2
Patterns in Biology
  • Biology is all about patterns
  • (I'm working on a book)
  • DNA patterns
  • restriction sites
  • promoters/transcription factor binding sites
  • intron splice sites
  • Protein patterns
  • conserved domains (motifs)
  • active sites
  • structural motifs (membrane spanning, signal
    peptide, etc.)

3
Computers are good at finding Patterns
  • The 'Find' command in your word processor, 'Find
    File' in your computer's operating system
  • Based on an underlying concept called a Regular
    Expression (regexp)
  • a regexp is a text string, such as aatcg
  • can also have variable characters aa[ta]cg
  • or a wildcard aaxcg
  • or a variable spacer aax(1-20)cg

4
grep
  • grep is a handy Unix tool for searching text
    with regular expressions
  • it is a powerful and moderately complex tool (it
    has one of the longest man pages in the online
    Unix help system)
  • it does not require its own O'Reilly book, but
    gets a solid chapter in intro and intermediate
    Unix/Linux books

5
Perl Regular Expressions
  • Perl Regular Expressions are more complex and
    more powerful than grep
  • Can find and substitute bits of text in a single
    command
  • Various options for fuzzy matches
  • Perl regular expressions can get extremely
    complex, well beyond the scope of this course
  • > man perlrequick

6
The Match Operator =~ / /
  • Perl uses a special type of operator to do text
    matching with regular expressions
  • =~ / /
  • The =~ symbol is a pattern match comparison
    operator
  • it can be read as 'contains'
  • The forward slashes contain the pattern to be
    matched, like this:
  • print "EcoRI site found!" if $dna =~ /GAATTC/;
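Here is a minimal, runnable version of the EcoRI test above. The sequence in $dna is made up for illustration, and $has_ecori is a hypothetical name used only in this sketch. It also shows !~, the negated form of =~.

```perl
use strict;
use warnings;

# Hypothetical sequence, made up for this example.
my $dna = "ATCGGAATTCTTAG";

# =~ binds $dna to the match /GAATTC/; the condition is true
# if $dna contains GAATTC anywhere in the string.
print "EcoRI site found!\n" if $dna =~ /GAATTC/;

# !~ is the negated form: true when the pattern does NOT occur.
print "No HindIII site.\n" if $dna !~ /AAGCTT/;

# Keep the result as 1/0 so it can be reused later.
my $has_ecori = ($dna =~ /GAATTC/) ? 1 : 0;
```

Written without =~, a bare /GAATTC/ is tested against Perl's default variable $_ instead of $dna.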

7
Alternative Characters
  • Square brackets within the match expression allow
    for alternative characters
  • if ($dna =~ /GGG[GATC]CCC/)
  • This will match any DNA string that contains
    GGG, then G, A, T, or C in the 4th position,
    followed by CCC
  • A vertical line (|) within the /expression/
    allows you to look for either of two completely
    different patterns
  • if ($dna =~ /GAATTC|AAGCTT/)
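A small sketch that runs both kinds of pattern from this slide over a few sequences. The sequences and the array names are hypothetical, chosen so that each pattern matches exactly one of them.

```perl
use strict;
use warnings;

# Hypothetical sequences, made up for this example.
my @seqs = ("TTGGGACCCTT", "GGGNCCC", "CCAAGCTTGG", "ACGTACGT");

my (@class_hits, @alt_hits);
for my $dna (@seqs) {
    # [GATC] matches exactly ONE character, which may be G, A, T, or C.
    push @class_hits, $dna if $dna =~ /GGG[GATC]CCC/;
    # | separates alternatives: an EcoRI site (GAATTC) OR a HindIII site (AAGCTT).
    push @alt_hits, $dna if $dna =~ /GAATTC|AAGCTT/;
}

print "GGG[GATC]CCC matched: @class_hits\n";
print "GAATTC|AAGCTT matched: @alt_hits\n";
```

"GGGNCCC" fails the first pattern because N is not one of the four letters listed inside the square brackets.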

8
Wildcards
  • Perl has a set of wildcard characters for
    regular expressions that are completely different
    from the ones used by the Unix shell
  • the dot (.) matches any single character (except
    a newline)
  • \d matches any digit (a number from 0-9)
  • \w matches any word character (a letter, digit,
    or underscore; not punctuation or space)
  • \s matches one white space character (use \s+
    for any amount)
  • ^ matches the beginning of a line
  • $ matches the end of a line
  • (Yes, this is very confusing!)
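To make the wildcards less confusing, this sketch tests each one against a single line of text. The header line and the hash keys are hypothetical, invented just for the demonstration.

```perl
use strict;
use warnings;

# A hypothetical FASTA-style header line, used only for illustration.
my $line = ">seq42 putative kinase";

# Each entry is 1 if the pattern matches $line, 0 if it does not.
my %result = (
    dot        => ($line =~ /k.n/)    ? 1 : 0,  # .  any single character ("kin")
    digit      => ($line =~ /\d/)     ? 1 : 0,  # \d a digit 0-9 ("4")
    word       => ($line =~ /\w\w\w/) ? 1 : 0,  # \w a letter, digit, or underscore
    space      => ($line =~ /\s/)     ? 1 : 0,  # \s ONE whitespace character
    starts_gt  => ($line =~ /^>/)     ? 1 : 0,  # ^  beginning of the line
    ends_e     => ($line =~ /e$/)     ? 1 : 0,  # $  end of the line
    starts_seq => ($line =~ /^seq/)   ? 1 : 0,  # no match: the line starts with ">"
);

for my $name (sort keys %result) {
    printf "%-10s %s\n", $name, $result{$name} ? "match" : "no match";
}
```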

9
Repeat for a count
  • Use curly brackets to show that a character
    repeats a specific number (or range) of times
  • find an EcoRI fragment of 100-500 bp length (two
    EcoRI sites with any other sequence between)
  • if ($ecofrag =~ /GAATTC[GATC]{100,500}GAATTC/)
  • The + sign is used to indicate an unlimited
    number of repeats (occurs 1 or more times)
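A sketch of the repeat counts in action. The fragments are made up (two EcoRI sites with a run of T's between them), and the variable names other than $ecofrag are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical test strings: two EcoRI sites separated by a run of T's.
my $ecofrag   = "GAATTC" . ("T" x 150) . "GAATTC";   # 150 bp between the sites
my $too_short = "GAATTC" . ("T" x 50)  . "GAATTC";   # only 50 bp between them

# {100,500}: the item before it ([GATC], one base) must repeat 100 to 500 times.
my $frag_ok  = $ecofrag   =~ /GAATTC[GATC]{100,500}GAATTC/ ? 1 : 0;
my $short_ok = $too_short =~ /GAATTC[GATC]{100,500}GAATTC/ ? 1 : 0;

# +: one or more repeats of the item before it, with no upper limit.
my $plus_hit  = "CCAAAAAAGG" =~ /CA+G/ ? 1 : 0;   # six A's between C and G
my $plus_miss = "CCGG"       =~ /CA+G/ ? 1 : 0;   # zero A's, so no match

print "150 bp spacer: ", ($frag_ok  ? "match" : "no match"), "\n";
print "50 bp spacer: ",  ($short_ok ? "match" : "no match"), "\n";
print "CA+G on CCAAAAAAGG: ", ($plus_hit  ? "match" : "no match"), "\n";
print "CA+G on CCGG: ",       ($plus_miss ? "match" : "no match"), "\n";
```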

10
It gets worse
  • What if you need to match text that contains a
    special character?
  • (the dot shows up all the time in GenBank IDs,
    filenames, etc.)
  • Now you have to use a backslash (\) to escape
    the wildcard meaning of that character
  • if ($seqname =~ /\w+\.\d/)
  • This would match any sequence ID that has some
    text characters, a dot, and then a digit, such as
    M65783.2
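A sketch comparing the escaped and unescaped dot. The sequence names are hypothetical, picked to show how much an unescaped dot lets through.

```perl
use strict;
use warnings;

# Hypothetical sequence names, made up for this example.
my @names = ("M65783.2", "M65783x2", "contig_7");

my (@escaped, @unescaped);
for my $seqname (@names) {
    # \. matches only a literal dot.
    push @escaped, $seqname if $seqname =~ /\w+\.\d/;
    # An unescaped . is a wildcard, so it accepts "x" and "_" as well.
    push @unescaped, $seqname if $seqname =~ /\w+.\d/;
}

print "with \\. : @escaped\n";
print "with .  : @unescaped\n";
```

Only M65783.2 passes the escaped version, while the unescaped version accepts all three names.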

11
Grabbing parts of a string
  • Regular expressions can do more than just answer
    'if' questions
  • They can be used to extract parts of a line of
    text into variables. Check this out:
  • /^>(\w+)\s(.+)$/
  • Complete gibberish, right?
  • It means:
  • look for the > sign at the beginning of the line
    (the header line of a FASTA formatted sequence
    file)
  • dump the first word (\w+) into variable $1 (the
    sequence ID)
  • after a space (\s), dump the rest of the line
    (.+), until you reach the end of the line ($),
    into variable $2 (the description)
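A runnable sketch of the FASTA header capture, assuming the header line has already been read from a file and chomped. The header text and the names $id and $desc are hypothetical.

```perl
use strict;
use warnings;

# A hypothetical FASTA header line (as it would look after chomp).
my $header = ">seq001 hypothetical protein, partial";

my ($id, $desc);
if ($header =~ /^>(\w+)\s(.+)$/) {
    # Each pair of parentheses captures text into a numbered variable.
    $id   = $1;   # (\w+): the first word after ">", the sequence ID
    $desc = $2;   # (.+):  everything after the space, the description
}

print "ID: $id\n";
print "Description: $desc\n";

# The same captures can also be assigned directly in list context.
my ($id2, $desc2) = $header =~ /^>(\w+)\s(.+)$/;
```

One catch: \w+ stops at the first character that is not a letter, digit, or underscore, so for a versioned ID like M65783.2 it would capture only M65783. Using \S+ (a run of non-space characters) instead of \w+ is one way around that.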

12
You can also do Substitution
  • To replace one string with another, use the
    tricky s/// operator
  • It works like this: s/expression/replacement/
  • $text =~ s/C-/A/;
  • (If only life were as easy as Perl)
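A sketch of s/// on a made-up string, showing that a plain s/// replaces only the first match and the /g modifier replaces all of them. The strings and the copy-then-substitute variable names are hypothetical.

```perl
use strict;
use warnings;

my $text = "AC-GTC-A";   # hypothetical string containing "C-" twice

# Copy first, then substitute, so $text itself is left unchanged.
(my $once = $text) =~ s/C-/A/;    # replaces only the FIRST "C-"
(my $all  = $text) =~ s/C-/A/g;   # the /g modifier replaces EVERY "C-"

# A biology-flavored use: "transcribe" DNA to RNA by turning every T into U.
my $dna = "GAATTC";
(my $rna = $dna) =~ s/T/U/g;

print "original: $text\n";
print "once:     $once\n";
print "global:   $all\n";
print "RNA:      $rna\n";
```

For simple one-letter swaps like T to U, Perl's tr/// operator is the usual shortcut.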

13
That's all the Perl I'm going to try to teach
y'all.
  • You know enough now to learn more from perl
    documentation, a book, or a website
  • > man perlintro
  • > man perlrequick
  • Lincoln Stein's Genome Informatics class at CSHL
  • http://stein.cshl.org/genome_informatics/index.html
  • Other cool websites:
  • http://www.troubleshooters.com/codecorn/littperl/perlreg.htm