Title: Python and Perl
1. Python and Perl
- Lecture 2
- Regular Expressions
2. Why use Regular Expressions?
- They are very powerful for extracting information from flat files.
- Easy to identify text rows with interesting data.
- Easy to retrieve subgroups/substrings from text rows.
- Easy to identify numbers, words, white spaces and separators.
3. Real-world example
ID TRBG361 standard mRNA PLN 1859 BP.
XX
AC X56734 S46826
Regular expression /ID/ matches the ID line.
Regular expression /AC\s\w/ matches the AC line ("AC", one white space, then one word character).
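A minimal Python sketch of the two patterns above, applied to the three example lines. Perl's m/.../ looks for the pattern anywhere in the string, so `re.search` is used here as the closest Python equivalent. The helper name `matching_lines` is hypothetical.

```python
import re

# The three example lines from the flat file above.
lines = [
    "ID TRBG361 standard mRNA PLN 1859 BP.",
    "XX",
    "AC X56734 S46826",
]

id_pattern = re.compile(r"ID")      # same pattern as /ID/
ac_pattern = re.compile(r"AC\s\w")  # same pattern as /AC\s\w/

def matching_lines(pattern, rows):
    """Return the rows in which the pattern occurs anywhere (hypothetical helper)."""
    return [row for row in rows if pattern.search(row)]

id_rows = matching_lines(id_pattern, lines)
ac_rows = matching_lines(ac_pattern, lines)
print(id_rows)  # ['ID TRBG361 standard mRNA PLN 1859 BP.']
print(ac_rows)  # ['AC X56734 S46826']
```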
4. Meta characters - 1
- Ordinary characters match themselves (a matches a, H matches H, Arsenal matches Arsenal).
- Meta characters are special characters that control how other characters are interpreted.
- . Matches any character (except newline \n).
- ^ The characters that follow ^ must match the first characters on the line/string.
- $ The characters before $ must match the last characters on the line/string.
- * The character before * matches zero or more occurrences.
- + The character before + matches one or more occurrences.
- ? The character before ? is optional (zero or one occurrence).
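A small Python sketch that checks each meta character from this slide with `re.search`. The test strings are made up for illustration; each entry records whether a match is expected.

```python
import re

# (pattern, text, expected result of re.search)
examples = [
    (r"Arsenal", "Arsenal won", True),  # ordinary characters match themselves
    (r"A.C", "ABC", True),              # . matches any character ...
    (r"A.C", "A\nC", False),            # ... except a newline
    (r"^AC", "AC X56734", True),        # ^: must match at the start
    (r"^AC", "XX AC", False),
    (r"BP\.$", "1859 BP.", True),       # $: must match at the end
    (r"ab*c", "ac", True),              # *: zero or more b
    (r"ab+c", "ac", False),             # +: one or more b
    (r"ab+c", "abbbc", True),
    (r"colou?r", "color", True),        # ?: the u is optional
    (r"colou?r", "colour", True),
]

found = [re.search(p, t) is not None for p, t, _ in examples]
expected = [e for _, _, e in examples]
print(found == expected)  # True
```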
5. Meta characters - 2
- {n} The character preceding {n} must match n repeats (an interval like a{2,4} matches aa, aaa, aaaa).
- [xyz] Characters enclosed in brackets, like [xyz], match any one of x, y, z.
- | Characters on either side of |, like x|y, match x or y.
- () Characters enclosed by (), like ab(xyz)cd, define a subgroup xyz in the match.
- \ A meta character preceded by \ matches itself; the backslash removes its special meaning. For example, \+ matches a plus sign.
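A Python sketch of the meta characters on this slide. `re.fullmatch` (Python 3.4+) requires the whole string to match, which makes the a{2,4} interval easy to see. The sample strings are illustrative only.

```python
import re

# {m,n}: between m and n repeats of the preceding character
interval = [s for s in ["a", "aa", "aaa", "aaaa", "aaaaa"] if re.fullmatch(r"a{2,4}", s)]
print(interval)  # ['aa', 'aaa', 'aaaa']

# [xyz]: any one of the listed characters
one_of = re.findall(r"[xyz]", "a x b y c z")
print(one_of)  # ['x', 'y', 'z']

# x|y: either alternative
alternatives = re.findall(r"mRNA|PLN", "standard mRNA PLN")
print(alternatives)  # ['mRNA', 'PLN']

# (): a subgroup that can be extracted from the match
group = re.search(r"ab(xyz)cd", "--abxyzcd--").group(1)
print(group)  # xyz

# \: escape a meta character so it matches itself
plus = re.search(r"1\+1", "1+1=2")
print(plus.group())  # 1+1

# Without the backslash, + means "one or more 1s", so the pattern needs "11"
unescaped = re.search(r"1+1", "1+1=2")
print(unescaped)  # None
```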
6. Special Sequences
- \d Matches any digit, same as [0-9] in ASCII text.
- \D Inverse of \d, matches any non-digit.
- \s Matches any white space character, same as [ \t\n\r\f\v].
- \S Inverse of \s, matches any non-white-space character.
- \w Matches any alphanumeric character or underscore, same as [a-zA-Z0-9_] in ASCII text.
- \W Inverse of \w.
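A Python sketch of the six special sequences on a row shaped like the example data. In Python 3, \d, \s and \w also match non-ASCII Unicode characters by default; the `re.ASCII` flag restricts them to the ASCII classes listed above. The row itself is made up by joining the example AC line and sequence length with a tab.

```python
import re

row = "AC X56734 S46826\t1859 BP."

# re.ASCII makes \d, \s and \w behave exactly like the ASCII classes above
digits = re.findall(r"\d+", row, re.ASCII)       # runs of digits
non_digits = re.findall(r"\D+", row, re.ASCII)   # runs of non-digits
whitespace = re.findall(r"\s", row, re.ASCII)    # single white space characters
fields = re.findall(r"\S+", row, re.ASCII)       # runs of non-white-space
words = re.findall(r"\w+", row, re.ASCII)        # runs of [a-zA-Z0-9_]
separators = re.findall(r"\W+", row, re.ASCII)   # runs of everything else

print(digits)      # ['56734', '46826', '1859']
print(non_digits)  # ['AC X', ' S', '\t', ' BP.']
print(whitespace)  # [' ', ' ', '\t', ' ']
print(fields)      # ['AC', 'X56734', 'S46826', '1859', 'BP.']
print(words)       # ['AC', 'X56734', 'S46826', '1859', 'BP']
print(separators)  # [' ', ' ', '\t', ' ', '.']
```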
7. How to apply RE? - overview
- Python
  - Compile a pattern; re.compile() returns a re object (a compiled pattern):
    reobj = re.compile(pattern)
  - Match the re object against a string variable; match() returns a match object, or None if there is no match:
    mobj = reobj.match(stringvar)
  - Test the match object:
    if mobj:
        pass  # code block for match
    else:
        pass  # code block for non-match
- Perl
  - Use the binding operator =~ with the match operator m//; the expression returns true if the match was successful:
    $string =~ m/pattern/
  - Example:
    if ($string =~ m/pattern/) {
        # code block for match
    }
See Lecture 1, example 7 for a complete program using regular expressions.
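The three Python steps above (compile, match, test) can be sketched as one function. `report` is a hypothetical name. Note that `match()` only tries the start of the string, while Perl's m/pattern/ looks anywhere; `search()` is the closer Python equivalent of the Perl form.

```python
import re

def report(pattern, text):
    """Compile a pattern, match it against text and test the match object."""
    reobj = re.compile(pattern)  # 1. compile -> re object
    mobj = reobj.match(text)     # 2. match -> match object, or None
    if mobj:                     # 3. test the match object
        return "match: " + mobj.group()
    else:
        return "no match"

print(report(r"AC\s\w", "AC X56734"))     # match: AC X
print(report(r"AC\s\w", "XX AC X56734"))  # no match (match() only tries position 0)

# search() scans the whole string, like Perl's m/AC\s\w/
print(re.compile(r"AC\s\w").search("XX AC X56734").group())  # AC X
```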
8. How to apply RE with subgroups? - overview
- Python
  - Compile a pattern with subgroups like patt(sub1)er(sub2)n; compile() returns a re object:
    reobj = re.compile(r"p(sub1)a(sub2)t")
  - Match the re object against a string variable; match()/search() returns a match object:
    mobj = reobj.match(stringvar)
  - Test the match object and extract the subgroups:
    if mobj:
        subgr1 = mobj.group(1)
        subgr2 = mobj.group(2)
- Perl
  - Use the binding operator =~:
    if ($string =~ m/p(sub1)a(sub2)t/) {
        $subgr1 = $1;
        $subgr2 = $2;
    }
  - $1, $2, etc. are built-in variables in Perl. They are undefined until a match succeeds, and a failed match does not reset them.
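Python example:
    import re

    line = "KW first second"  # hypothetical example row
    racc = re.compile(r"KW\s(.*)\s(.*)")
    m = racc.match(line)
    if m:
        subgr1 = m.group(1)  # "first"
        subgr2 = m.group(2)  # "second"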
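A sketch of subgroups applied to the AC line from the real-world example: each (\w+) group captures one accession number. The exact pattern is an assumption chosen for this one line, not a general parser for the format. `groups()` returns all subgroups as a tuple.

```python
import re

# Assumed pattern: "AC", white space, a word, white space, a word
ac_line = re.compile(r"AC\s+(\w+)\s+(\w+)")

mobj = ac_line.match("AC X56734 S46826")
if mobj:
    subgr1 = mobj.group(1)  # first accession number
    subgr2 = mobj.group(2)  # second accession number
    print(subgr1, subgr2)   # X56734 S46826
    print(mobj.groups())    # ('X56734', 'S46826')
```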
9. What does the syntax look like?
My RE Perl program (filename: RE.pl):
    use strict;
    use warnings;

    my $line = "AC M5032";
    if ($line =~ m/AC\s\w/) {
        print "Found a match\n";
    } else {
        print "No match found\n";
    }
My RE Python program (filename: RE.py):
    import re

    line = "AC M50362"
    racc = re.compile(r"AC\s\w")
    if racc.match(line):
        print("Found a match")
    else:
        print("No match found")
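The same test usually runs once per line of a flat file. A sketch, assuming `io.StringIO` as a stand-in for an open file (in a real program this would be something like `open("file.txt")`, where "file.txt" is a hypothetical name). Iterating over a file yields each line with its trailing newline, so `rstrip` removes it before the row is stored.

```python
import io
import re

# Stand-in for an open flat file holding the example lines
flat_file = io.StringIO(
    "ID TRBG361 standard mRNA PLN 1859 BP.\n"
    "XX\n"
    "AC X56734 S46826\n"
)

racc = re.compile(r"AC\s\w")
found = []
for line in flat_file:
    if racc.match(line):
        found.append(line.rstrip("\n"))
        print("Found a match")
    else:
        print("No match found")

print(found)  # ['AC X56734 S46826']
```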