Title: Perl Training
1. Perl Training
- Pattern Matching and Regular Expressions
http://perlwizard.org/perl/week8
2. Agenda
- Homework Review
- Regular Expressions
- Simple True/False Searches
- Getting Shorter
- Those Pesky Backslashes
- Inexact matches
- Characters with Class
- Range Shortcuts
- Special Locations
- Multiplicity
- Match Return values
- Greedy Matching
- Scalar and List Context of m//
- Modifiers
- Global searches
- Grep
- Simple Substitutions
- Using Match Results/Expressions in Second Argument
- tr///
3. Homework Problem 1
- Create a program that will open the /www/ directory on your VServer. Walk through the directory structure, making sure that all the directories are 755 and files are 644.
View Deven's solution at http://www.netexplorer.org/dirwalker.txt
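4. Homework Problem 2
- Create a program that will run a Unix command and do something with the output (use pipes)
    #!/usr/local/bin/perl
    # Read the output of the who command through a pipe
    open(WHO, "who |") or die "Cannot run who: $!";
    while (<WHO>) {
        print;
    }
    close(WHO);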
5. Homework Problem 3
- Read a list of files from a text file, and print the permissions, file size, whether they are readable/writable, and other useful file information for each file
    #!/usr/local/bin/perl

    # Turn an epoch time into a month/day/year hh:mm:ss string
    sub parse_date {
        my @time_array = ();
        my ($time) = @_;
        @time_array = localtime($time);
        # Increment month, so 1=Jan, 2=Feb
        $time_array[4]++;
        # Increment year, so it's 2000 instead of 100
        $time_array[5] += 1900;
        my $date_string = "$time_array[4]/";
        $date_string .= "$time_array[3]/";
        $date_string .= "$time_array[5] ";
        $date_string .= sprintf("%02d:", $time_array[2]);
        $date_string .= sprintf("%02d:", $time_array[1]);
        $date_string .= sprintf("%02d", $time_array[0]);
        return $date_string;
    }

    open(FILE, "/tmp/files.txt") or die "Cannot open /tmp/files.txt: $!";
    print "Filename\t\tPermissions\tSize\tLast_Mod\t\tLast_Acc\n";
    while (<FILE>) {
        my $filename = $_;
        chomp($filename);
        # Mask off the file type bits and show the mode in octal
        my $perms = sprintf("%04o", (stat($filename))[2] & 07777);
        my $size = (stat($filename))[7];
        my $last_mod = (stat($filename))[9];
        $last_mod = parse_date($last_mod);
        my $last_acc = (stat($filename))[8];
        $last_acc = parse_date($last_acc);
        print "$filename\t$perms\t\t$size\t$last_mod\t$last_acc\n";
    }
    close(FILE);
6. Regular Expressions
- @fields = split(/\t/, $TheRecord);
- This is the simplest form of regular expression: just the exact characters you want to specify, such as a single tab in the above example.
- @fields = split(/#/, $TheRecord);     # Split each field by #    abc#cde#fff
- @fields = split(/999/, $TheRecord);   # Split each field by 999  abc999cde999fff
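A minimal sketch of splitting on exact characters. The sample records and variable names are made up for illustration and are not from the class files.

```perl
use strict;
use warnings;

# Hypothetical sample records; the field values are illustrative only.
my $tab_record  = "abc\tcde\tfff";
my $nine_record = "abc999cde999fff";

# Split on a single literal tab.
my @tab_fields = split(/\t/, $tab_record);

# Split on the literal three-character sequence 999.
my @nine_fields = split(/999/, $nine_record);

print join(",", @tab_fields), "\n";    # prints abc,cde,fff
print join(",", @nine_fields), "\n";   # prints abc,cde,fff
```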
7. Simple True-False Searches
- Format
- m// returns true or false, depending on whether it finds text matching its RegEx
- if ($STRING =~ m/REGEX/)
- Example
- if ("abc#def" =~ m/#/) { print "True"; } else { print "False"; }
- $Test = "abc#def";
- if ($Test =~ m/#/) { print "True"; } else { print "False"; }
- if ($Test !~ m/#/)   # If $Test does not contain a #
- $find = "#";
- if ($Test =~ m/$find/)   # It's OK to use variables
- Make SURE that $find is not the null string
- If it is, Perl reuses the last pattern that matched successfully, which is BAAAAD.
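A small sketch of the true/false forms, with one possible way to guard against an empty pattern variable. The sample string, the variable names, and the use of length() and \Q...\E as a guard are assumptions added for illustration.

```perl
use strict;
use warnings;

my $Test = "abc#def";   # sample string, chosen for illustration
my $find = "#";         # pattern text held in a variable

# =~ is true when the pattern is found, !~ is true when it is not.
my $has_hash  = ($Test =~ m/#/)   ? "True" : "False";
my $lacks_xyz = ($Test !~ m/xyz/) ? "True" : "False";

# Check that the variable is not empty before using it as a pattern;
# \Q...\E also treats its contents as literal text.
my $found_var = (length($find) && $Test =~ m/\Q$find\E/) ? "True" : "False";

print "$has_hash $lacks_xyz $found_var\n";   # prints True True True
```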
8. Getting Shorter
- You can use other delimiters
- if ($Test =~ m|#|)
- if ($Test =~ m!#!)
- I'll use m// because most people do.
- If you use the m// form, you don't have to include the m.
- if ($Test =~ /#/)
- This is considered the short form and many programmers use it.
- To get shorter still, if you are searching in Perl's special variable $_, you don't need the =~ at all
- while (<INFILE>) {
-     if (/From/) { ... }   # The line has From in it
- }
9. Those Pesky Backslashes
- How do you search for a / in a string?
- if ($line =~ /\//)
- Search for a vertical bar?
- if ($line =~ /\|/)
- Use the quotemeta function (requires Perl 5)
- print quotemeta("This string? has | ( ) meta chars in it");
- Prints: This\ string\?\ has\ \|\ \(\ \)\ meta\ chars\ in\ it
- Must be escaped: + ? . * ^ $ ( ) [ ] { } | \   (/ is also there if you use it as your delimiter)
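A short sketch of why escaping matters when the text you are searching for comes from a variable. The sample strings and variable names are hypothetical.

```perl
use strict;
use warnings;

my $needle = "a|b (c)";               # hypothetical text containing metacharacters
my $line   = "found a|b (c) here";

# quotemeta backslashes every non-word character, so the text matches literally.
my $safe = quotemeta($needle);
print "$safe\n";                      # prints a\|b\ \(c\)

my $literal_hit = ($line =~ /$safe/) ? 1 : 0;

# Unescaped, | means "or" and ( ) group, so the pattern means "a" or "b c".
my $unescaped_hit = ("only b c here" =~ /$needle/) ? 1 : 0;
my $escaped_hit   = ("only b c here" =~ /$safe/)   ? 1 : 0;

print "$literal_hit $unescaped_hit $escaped_hit\n";   # prints 1 1 0
```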
10. Inexact Matches in RegEx
- Use a | for "or"
- if ($Test =~ /(x|X)/)   # If $Test has an x or an X in it
- if ($Test =~ /can(dle|dy|cer)/)   # Matches on candle, candy or cancer
- Matching any character
- if ($Test =~ /N.T/)   # Matches the letter N, <any char>, T
- Matches NET, NeT, N#T; not NesT (2 chars)
- Does not work across \n, so it would fail if $Test was "N\nT"
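The alternation and any-character rules can be checked against a few sample words. This is only a sketch: the helper name matching and the word lists are made up for illustration.

```perl
use strict;
use warnings;

# Return the words from a list that match a pattern (helper name is illustrative).
sub matching {
    my ($re, @words) = @_;
    return grep { $_ =~ $re } @words;
}

# Alternation: can followed by dle, dy or cer.
my @can = matching(qr/can(dle|dy|cer)/, qw(candle candy cancer canoe));

# Dot: exactly one character (other than newline) between N and T.
my @dot = matching(qr/N.T/, "NET", "NeT", "NesT", "N\nT");

print "@can\n";   # prints candle candy cancer
print "@dot\n";   # prints NET NeT
```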
11. Characters with Class
- if ($Test =~ /(x|X)/) is the same as if ($Test =~ /[xX]/)
- Using [] will match any one of the characters within the brackets
- if ($Test =~ /[^ABCDE]/)
- Using ^ as the first character inside the brackets negates the class, so this would match any character not in that range
- Which is better?
- if ($Test =~ /(0|1|2|3|4|5|6|7|8|9)/)
- if ($Test =~ /[0123456789]/)
- if ($Test =~ /[0-9]/)
- Test for any upper or lowercase letter
- if ($Test =~ /[A-Za-z]/)
12. Range Shortcuts
- Code  Replaces         Description
- \d    [0-9]            Any digit
- \w    [a-zA-Z_0-9]     Any alphanumeric (or underscore)
- \s    [ \t\n\r\f]      Any whitespace character
- \D    [^0-9]           Any non-digit
- \W    [^a-zA-Z_0-9]    Any non-alphanumeric
- \S    [^ \t\n\r\f]     Any non-whitespace
- if ($Test =~ /\d/)   # Search for a digit
- if ($Test =~ /[A-Z]\d\d/)   # Search for a capital letter followed by two digits
- if ($Test =~ /\s\w\w\w\s/)   # Searches for whitespace, followed by 3 word characters, followed by whitespace (a 3-letter word)
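A quick sketch that puts character classes and the range shortcuts side by side. The sample text and variable names are assumptions for illustration, and the parentheses are used here only to pull out the matched text.

```perl
use strict;
use warnings;

my $Test = "Room B42 is free";   # sample text for illustration

# A bracketed class and its shortcut describe the same set of characters.
my $class_digit    = ($Test =~ /[0-9]/) ? 1 : 0;
my $shortcut_digit = ($Test =~ /\d/)    ? 1 : 0;

# A capital letter followed by two digits.
my ($code) = $Test =~ /([A-Z]\d\d)/;

# A negated class: the first character that is not a letter.
my ($non_letter) = $Test =~ /([^A-Za-z])/;

print "$class_digit $shortcut_digit $code [$non_letter]\n";   # prints 1 1 B42 [ ]
```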
13. Special Locations
- ^ indicates the beginning of the string, $ indicates the end of the string
- if ($Test =~ /^Beg/)   # If the string starts with Beg
- $Test = "Begin here";          # True
- $Test = "I Beg your Pardon";   # False
- if ($Test =~ /don$/)   # If the string ends in don
- if ($Test =~ /\bY/)   # If a word starts with a Y
- \b is for word boundary
- if ($Test =~ /J\b/)   # If a word ends with J
- \B is the opposite, i.e. anywhere that is not a word boundary, such as the middle of a word
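A small sketch of the location markers in action. The "Hello Yellow" string and the variable names are made up for illustration.

```perl
use strict;
use warnings;

# ^ anchors to the start of the string, $ to the end.
my $starts     = ("Begin here"        =~ /^Beg/) ? 1 : 0;
my $not_starts = ("I Beg your Pardon" =~ /^Beg/) ? 1 : 0;
my $ends       = ("I Beg your Pardon" =~ /don$/) ? 1 : 0;

# \b matches at a word boundary, \B everywhere else.
my $word_starts_y = ("Hello Yellow" =~ /\bY/) ? 1 : 0;
my $y_inside_word = ("Hello Yellow" =~ /\BY/) ? 1 : 0;

print "$starts $not_starts $ends $word_starts_y $y_inside_word\n";   # prints 1 0 1 1 0
```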
14. Multiplicity
- How do you find one or more hash characters?
- if ($Temp =~ /#|##|###/)   # Not general enough
- Symbol  Meaning
- +       Match 1 or more times
- *       Match 0 or more times
- ?       Match 0 or 1 time
- {n}     Match exactly n times
- {n,}    Match at least n times
- {n,m}   Match at least n but not more than m times
- (n and m values must be less than 65,536!!)
15. Multiplicity Examples
- if ($Test =~ /#+/)   # Matches one or more #
- if ($Test =~ /\d{3}/)   # Match exactly three digits
- if ($Test =~ /\w+\d/)   # Search for one or more alphanumeric characters followed by one digit
- if ($Test =~ /N.?T/)   # NeT, NT, N#T
- if ($Test =~ /N.+T/)   # NeT, NesT, NeesT
- if ($Test =~ /N.*T/)   # NT, NeT, NesT, NeesT
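A sketch that runs each quantifier over the same word list. The patterns are anchored with ^ and $ here (an addition for this example) so that each whole word must match.

```perl
use strict;
use warnings;

my @words = qw(NT NeT NesT NeesT);

my @zero_or_one  = grep { /^N.?T$/ }     @words;
my @one_or_more  = grep { /^N.+T$/ }     @words;
my @zero_or_more = grep { /^N.*T$/ }     @words;
my @exactly_two  = grep { /^N.{2}T$/ }   @words;
my @two_to_three = grep { /^N.{2,3}T$/ } @words;

print "?     : @zero_or_one\n";    # prints ?     : NT NeT
print "+     : @one_or_more\n";    # prints +     : NeT NesT NeesT
print "*     : @zero_or_more\n";   # prints *     : NT NeT NesT NeesT
print "{2}   : @exactly_two\n";    # prints {2}   : NesT
print "{2,3} : @two_to_three\n";   # prints {2,3} : NesT NeesT
```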
16. What Perl Returns from a Match
- $Test = "NESTING";
- if ($Test =~ /(N.*T)/) { print "True"; }
- print "$1\n";   # Prints NEST
- $1 holds the text matched by the first set of parentheses
- $2 holds the text matched by the second set of parentheses
- if ($Test =~ /(N.)S(.I)/)   # $1 is NE, $2 is TI
17. Greedy Matching
- $Test = "NITWITS";
- if ($Test =~ /(N.*T)/)   # $1 is NITWIT
- Perl uses greedy matching by default, i.e. it matches as much as possible on the first try. You can suppress it with a ? after the quantifier
- if ($Test =~ /(N.*?T)/)   # $1 is NIT
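A short sketch comparing greedy and non-greedy quantifiers on the same string, reading the result back from $1. The variable names $greedy and $lazy are illustrative.

```perl
use strict;
use warnings;

my $Test = "NITWITS";
my ($greedy, $lazy) = ("", "");

# Greedy: .* grabs as much as it can while still leaving a T to match.
if ($Test =~ /(N.*T)/) {
    $greedy = $1;
}

# Non-greedy: .*? stops at the first T that lets the match succeed.
if ($Test =~ /(N.*?T)/) {
    $lazy = $1;
}

print "$greedy $lazy\n";   # prints NITWIT NIT
```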
18. Scalar and List Context of m//
- In scalar context, the m// operator returns true or false. In list context, it returns the list of captured items, i.e. ($1, $2, $3...)
- $Test = "NESTING";
- @Birds = ($Test =~ /(N.)S(.I)/);   # $Birds[0] is NE, $Birds[1] is TI
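A sketch of the same match evaluated in scalar and in list context. The variable names $matched and @none, and the failing pattern, are added for illustration.

```perl
use strict;
use warnings;

my $Test = "NESTING";

# Scalar context: just true or false.
my $matched = ($Test =~ /(N.)S(.I)/);

# List context: the captured pieces, in order.
my @Birds = ($Test =~ /(N.)S(.I)/);

# A failed match in list context gives an empty list.
my @none = ($Test =~ /(X.)S(.I)/);

print "matched\n" if $matched;
print "$Birds[0] $Birds[1]\n";     # prints NE TI
print scalar(@none), "\n";         # prints 0
```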
19. Modifiers
- Modifier  Description
- g   Returns each occurrence (global search)
- i   Ignores case
- m   Treats the string as multiple lines (^ and $ also match at embedded newlines)
- o   Compiles pattern once (rarely used)
- s   Treats the string as a single line (. also matches \n)
- x   Allows whitespace and comments in the pattern
20. Modifier Examples
- if ($Test =~ /(N.*T)/i)   # Matches NesT, Nest, nest, net, neT
- if ($Test =~ /
-     (N.)   (?# Start with N and any letter)
-     S      (?# ... followed by the letter S)
-     (.I)   (?# ... followed by another letter, then I)
-     /x)    # Same as if ($Test =~ /(N.)S(.I)/)
- It's usually more readable to use a regular Perl comment on the line before your if statement.
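A sketch exercising the i, x, s and m modifiers. The word list and the newline test strings are made up for illustration, and the /x pattern uses ordinary # comments, which that modifier also allows.

```perl
use strict;
use warnings;

# /i ignores case.
my @hits = grep { /^n.*t$/i } qw(NesT Nest nest net neT nod);

# /x lets the pattern be spread out and commented.
my $Test = "NESTING";
my ($first, $second) = ("", "");
if ($Test =~ /
        (N.)    # N and any character
        S       # followed by the letter S
        (.I)    # followed by any character, then I
    /x) {
    ($first, $second) = ($1, $2);
}

# /s lets . match a newline; /m lets ^ match after an embedded newline.
my $dot_default   = ("N\nT"     =~ /N.T/)   ? 1 : 0;
my $dot_s         = ("N\nT"     =~ /N.T/s)  ? 1 : 0;
my $caret_default = ("one\ntwo" =~ /^two/)  ? 1 : 0;
my $caret_m       = ("one\ntwo" =~ /^two/m) ? 1 : 0;

print "@hits\n";                                          # prints NesT Nest nest net neT
print "$first $second\n";                                 # prints NE TI
print "$dot_default $dot_s $caret_default $caret_m\n";    # prints 0 1 0 1
```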
21. Global Searches
- $Test = "NESTING";
- @Birds = ($Test =~ /N./g);
- $Birds[0] is NE
- $Birds[1] is NG
- Handy way to find all the matching strings, if you don't know how many there will be.
- In scalar context, Perl remembers where in the string it found the last match and starts from there for the next search
- $i = 0;
- while ($Test =~ /N./g) { $i++; }
- print "There were $i matches\n";
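A sketch of the g modifier in list context and in a while loop. Recording pos() for each match is an addition for illustration, to show where each search resumes.

```perl
use strict;
use warnings;

my $Test = "NESTING";

# List context: every match at once.
my @Birds = ($Test =~ /N./g);

# Scalar context: one match per iteration, resuming after the previous one.
my $i = 0;
my @positions;
while ($Test =~ /N./g) {
    $i++;
    push @positions, pos($Test);   # offset just past the current match
}

print "@Birds\n";                  # prints NE NG
print "There were $i matches\n";   # prints There were 2 matches
print "@positions\n";              # prints 2 7
```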
22. Grep (A Unix Geek Favorite)
- grep(RegEx, list)
- In list context, returns the items for which the RegEx is true
- In scalar context, returns the number of elements for which it is true
- @JustSNames = grep(/^S/, @LastNames);   # Get all names that start with S
- $TotSNames = grep(/^S/, @LastNames);    # Get the number of S names
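A sketch of grep in both contexts. The contents of @LastNames are hypothetical sample data.

```perl
use strict;
use warnings;

my @LastNames = qw(Smith Jones Stevens Brown Sanders);   # hypothetical names

# List context: the names that start with S.
my @JustSNames = grep(/^S/, @LastNames);

# Scalar context: how many names start with S.
my $TotSNames = grep(/^S/, @LastNames);

print "@JustSNames\n";   # prints Smith Stevens Sanders
print "$TotSNames\n";    # prints 3
```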
23. Simple Substitutions
- s/// function format
- s/String to search for/String to replace with/
- Takes the same modifiers as m//
- $Count = ($Test =~ s/e/E/g);   # All e's in $Test are now E; $Count is the number of e's changed
- $count = ($Test =~ s/[aeiouAEIOU]/V/g);   # Change all vowels to V
- If you don't use the =~, it assumes you want $_
- while (<INFILE>) {
-     s/\t/|/g;   # Change the tabs to vertical bars
- }
- Safer to assign $_ to a variable and work with the variable
- You can use variables in either part of the s/// operator
- $Test =~ s/$tmp/$tmp2/g;
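A sketch of the substitution forms, working on named variables rather than $_. All sample strings and the names $vowels, @out and $pets are made up for illustration.

```perl
use strict;
use warnings;

# Count and replace every e.
my $Test  = "eleven geese";
my $Count = ($Test =~ s/e/E/g);

# Replace every vowel with V.
my $vowels = "Perl Is Fun";
my $count  = ($vowels =~ s/[aeiouAEIOU]/V/g);

# Tabs to vertical bars, working on a copy of each line.
my @out;
for my $line ("a\tb\tc", "d\te") {
    my $copy = $line;
    $copy =~ s/\t/|/g;
    push @out, $copy;
}

# Variables in both halves of s///.
my ($tmp, $tmp2) = ("cat", "dog");
my $pets = "cat and cat";
$pets =~ s/$tmp/$tmp2/g;

print "$Test ($Count)\n";     # prints ElEvEn gEEsE (6)
print "$vowels ($count)\n";   # prints PVrl Vs FVn (3)
print "@out\n";               # prints a|b|c d|e
print "$pets\n";              # prints dog and dog
```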
24. Using Match Results/Expressions in Second Argument
- $Test =~ s/^(\d+)/Line $1/;   # For a line that starts with a number, put "Line" in front of the number
- $Test =~ s/([^,]*), (.*)/$2 $1/;   # Change Noble, Jason to Jason Noble
- s/// also has the e modifier, which means evaluate the second part as an expression rather than a string
- $Test =~ s/(\d+)/$1 + 1/ge;   # Turns 23 into 24, for all numbers
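A sketch of using captured text in the replacement, and of the e modifier. The sample strings and variable names are hypothetical.

```perl
use strict;
use warnings;

# Reuse the captured number in the replacement.
my $line = "42 widgets";
$line =~ s/^(\d+)/Line $1/;

# Swap "Last, First" into "First Last".
my $name = "Noble, Jason";
$name =~ s/([^,]*), (.*)/$2 $1/;

# With e, the replacement is Perl code; with g, it runs for every number.
my $nums = "23 apples and 7 pears";
$nums =~ s/(\d+)/$1 + 1/ge;

print "$line\n";   # prints Line 42 widgets
print "$name\n";   # prints Jason Noble
print "$nums\n";   # prints 24 apples and 8 pears
```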
25. Another Look at Splitting
- The first argument is a regular expression, not just a string.
- Example (from DNS files)
- @fields = split(/\t\t\t/, $TheRecord);    # What if it has 2 or 4 tabs?
- @fields = split(/\t{2,4}/, $TheRecord);   # Split on 2-4 tabs
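A sketch contrasting a fixed three-tab split with a 2-4 tab split. The record below is a made-up stand-in for a DNS file line, not taken from a real zone file.

```perl
use strict;
use warnings;

# Hypothetical record with 2, 3 and 4 tabs between fields.
my $TheRecord = "www\t\tIN\t\t\tA\t\t\t\t10.0.0.1";

# Exactly three tabs: only some of the gaps are split cleanly.
my @fixed = split(/\t\t\t/, $TheRecord);

# Two to four tabs: every gap is treated as one separator.
my @flexible = split(/\t{2,4}/, $TheRecord);

print scalar(@fixed), "\n";          # prints 3
print join(",", @flexible), "\n";    # prints www,IN,A,10.0.0.1
```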
26. Letter for Letter Translations
- tr/// function translates lots of letters at once
- $Test =~ tr/A-Z/a-z/;   # Translate all uppercase to lowercase
- tr/// returns the number of characters it replaced or deleted.
- Modifiers
- c   Complement the first argument (same as ^ inside a character class in m// and s///)
- d   Deletes matching characters that are not replaced
- s   Squeezes runs of the same replacement character into one
- $Test =~ tr/0-9/\n/c;   # Change all non-digits into line feeds
- $Test =~ tr/0-9//cd;    # Change non-digits into nothing
- $Test =~ tr/,/,/s;      # Change two or more commas into one comma
- y/// is equivalent to tr///
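A sketch of tr/// and its c, d and s modifiers. The sample strings and variable names are made up for illustration.

```perl
use strict;
use warnings;

# Uppercase to lowercase; the return value counts the characters translated.
my $lower   = "Hello World";
my $changed = ($lower =~ tr/A-Z/a-z/);

# c: every non-digit becomes a line feed.
my $lines = "12ab34";
$lines =~ tr/0-9/\n/c;

# cd: delete every non-digit.
my $digits = "555-1234 x9";
$digits =~ tr/0-9//cd;

# s: squeeze runs of commas into a single comma.
my $commas = "a,,b,,,c";
$commas =~ tr/,/,/s;

print "$lower ($changed)\n";   # prints hello world (2)
print "$digits\n";             # prints 55512349
print "$commas\n";             # prints a,b,c
```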
27. Homework
- I'm going to take next week off (from teaching this class). Instead, each of you needs to research a topic in Perl that we either have not covered in class, or that you feel needs more coverage.
- Perl University class next Monday/Tuesday
- www.cpan.org
- www.perl.com
- www.tpj.com
- Please send me a PowerPoint slide presentation so that I can put it up on the Perl website
28. Questions?