High Level Languages - PowerPoint PPT Presentation


1
High Level Languages
  • Lecture 4
  • Regex

Chris Coleman cqc@cs.nott.ac.uk
2
Last lecture
  • Last time we covered
  • print
  • Files and input
  • Sub-routines
  • Scoping

3
Today
  • REGULAR EXPRESSIONS

4
Regular Expressions
  • Also known as regex, these provide pattern
    matching capabilities
  • They allow you to search through a string, or change
    a string in a number of different ways
  • A little scary at first, but they are very
    useful tools.

5
Regex Bold claim
  • By the end of this lecture you'll understand
    this
  • ($word) = /([a-z]+-[a-z]+)/;
  • %somehash = ($line =~ /(\w+)=(\w+)/g);
  • s/(\w+)(ing|ed)/$1/gi;
  • Well, at least with the aid of a reference book.

6
Regex - Matching
  • The match operator is m/pattern/
  • The m at the front is optional
  • The =~ operator says which string to perform the
    match on, e.g.
  • $somestring =~ /pattern/
  • If no variable is specified with =~, the default
    variable $_ is used instead, e.g.
  • /pattern/
  • True is returned if the pattern is found in the string

7
Regex Patterns
  • A pattern may simply be a sequence of letters
  • if ($input =~ /perl/) {
        print "line contains 'perl'\n";
    }
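A minimal runnable sketch of the two forms of the match operator, assuming a pair of made-up input lines. The first loop binds the match to a named variable with =~; the second leaves the target out, so the match runs against the default variable $_.

```perl
use strict;
use warnings;

# Hypothetical sample input, one string per "line"
my @lines = ("I like perl", "I like python");

# Explicit target: =~ binds the match to $input
my @found;
foreach my $input (@lines) {
    if ($input =~ /perl/) {
        push @found, $input;
    }
}

# Implicit target: with no =~ the match runs against $_
my $count = 0;
foreach (@lines) {
    $count++ if /perl/;    # same as $_ =~ /perl/
}

print "matched: @found\n";   # matched: I like perl
print "count: $count\n";     # count: 1
```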

8
Regex Wild cards
  • You can use a full-stop as a wildcard meaning
    any single character (except, by default, a newline)
  • Pattern   Example matches
  • pe.l      perl, peal, pe7l, pezl
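A small sketch that runs pe.l over a hypothetical word list. The last two words are there to show that the full-stop stands for exactly one character: "pel" has none between e and l, and "pearl" has two.

```perl
use strict;
use warnings;

# The first four words come from the slide; the last two are extra
my @words = ("perl", "peal", "pe7l", "pezl", "pel", "pearl");

# Keep only the words containing p, e, any ONE character, l
my @matches = grep { /pe.l/ } @words;

print "@matches\n";   # perl peal pe7l pezl
```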

9
Regex Quantifiers
  • A * means the previous character repeated zero or
    more times
  • per*l     pel, perl, perrl, perrrl
  • pe.*l     pel, peal, pedfl, perlperl
  • A + means the previous character repeated one or
    more times
  • per+l     perl, perrl, perrrl
  • pe.+l     peal, pedfl, perlperl
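A quick sketch contrasting the two quantifiers on the slide's own words; the only difference between * and + shows up on "pel", which has zero r's.

```perl
use strict;
use warnings;

my @words = ("pel", "perl", "perrl", "perrrl");

# r* allows zero r's, so "pel" matches
my @star = grep { /per*l/ } @words;

# r+ needs at least one r, so "pel" is rejected
my @plus = grep { /per+l/ } @words;

print "star: @star\n";   # star: pel perl perrl perrrl
print "plus: @plus\n";   # plus: perl perrl perrrl
```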

10
Regex Quantifiers
  • You can specify an exact number (or a range) of
    repetitions with { }
  • bar{3}     barrr
  • bar{2,4}   barr, barrr, barrrr
  • WARNING! Both of those would also match
    barrrrrrrrrr, since the pattern barrr (for
    example) is present in barrrrrrrrrr.
  • Info on how to avoid this in a bit
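A sketch of the warning above. It borrows the anchors covered later in the lecture (^ for start of line, $ for end of line) to show one way of stopping bar{2,4} from matching inside a longer run of r's.

```perl
use strict;
use warnings;

my $long = "barrrrrrrrrr";

# Unanchored: "barr" occurs inside $long, so this still matches
my $loose = ($long =~ /bar{2,4}/) ? 1 : 0;

# Anchored at both ends: the whole string must be "ba" plus 2 to 4 r's
my $strict = ($long =~ /^bar{2,4}$/) ? 1 : 0;

print "loose=$loose strict=$strict\n";   # loose=1 strict=0
```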

11
Regex Quantifiers
  • * and + are greedy, i.e. they'll match as much
    as they can
  • If you add a ? afterwards they become minimal
    and match as little as they can
  • $_ = "djm:1000:20:password:Duncan Martin";
  • /.*:/     djm:1000:20:password:
  • /.*?:/    djm:
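A sketch of greedy versus minimal matching, assuming a colon-separated record like the one on the slide. Capturing the match in brackets makes the difference visible: the greedy version runs to the last colon, the minimal one stops at the first.

```perl
use strict;
use warnings;

# Hypothetical colon-separated record
$_ = "djm:1000:20:password:Duncan Martin";

# Greedy: .* runs to the end, then backs off to the LAST colon
my ($greedy) = /(.*:)/;

# Minimal: .*? stops at the FIRST colon it can
my ($minimal) = /(.*?:)/;

print "$greedy\n";    # djm:1000:20:password:
print "$minimal\n";   # djm:
```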

12
Regex Grouping
  • You can use brackets to group parts of a
    pattern together
  • pe(rl)*         pe, perl, perlrl, perlrlrl
  • p(e.l)+         pexl, pexle4l, pe9leDleXl
  • | means or
  • pe(r|R)l        perl, peRl
  • (perl|python)   perl, python
  • (he|she) said   he said, she said
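A short sketch of alternation, run over a made-up list of names. Note that matching is case-sensitive, so "peRl" only gets through the pattern that explicitly offers r or R.

```perl
use strict;
use warnings;

my @names = ("perl", "python", "ruby", "peRl", "pexl");

# | inside brackets means "or": either alternative may match
my @langs = grep { /(perl|python)/ } @names;

# Alternation on a single letter inside a group
my @either_case = grep { /pe(r|R)l/ } @names;

print "@langs\n";         # perl python
print "@either_case\n";   # perl peRl
```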

13
Regex Remembering values
  • Using brackets has a side effect: Perl will
    remember the part in brackets.
  • These are stored in the variables $1, $2, $3 etc.
  • $1 is the first bracket set from the left
  • $line = "he ran";
  • if ($line =~ /(he|she) (ran|swam)/) {
        print "person $1\n";   # person he
        print "action $2\n";   # action ran
    }
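A sketch extending the slide's example to several hypothetical lines. $1 and $2 are only read inside the if, i.e. after a match that is known to have succeeded; "it flew" does not match, so nothing is recorded for it.

```perl
use strict;
use warnings;

my @lines = ("he ran", "she swam", "it flew");
my @events;

foreach my $line (@lines) {
    # $1 and $2 are set by the successful match on this line
    if ($line =~ /(he|she) (ran|swam)/) {
        push @events, "$1:$2";
    }
}

print "@events\n";   # he:ran she:swam
```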

14
Regex Remembering values
  • In list context a match returns all the found
    values
  • $_ = "hello duncan";
  • @res = /(hello|goodbye) (.*)/;
  • print $res[0];   # hello
  • print $res[1];   # duncan

15
Regex Remembering values
  • Doing that when you're operating on a specific
    variable (not the default one) is a bit messier
  • $line = "hello duncan";
  • @res = ($line =~ /(hello|goodbye) (.*)/);
  • print $res[0];   # hello
  • print $res[1];   # duncan
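A sketch of a common variation on list-context matching: assigning the captured values straight into named scalars instead of an array. When the match fails the list is empty, so the scalars are left undefined, which gives a simple way to tell success from failure.

```perl
use strict;
use warnings;

my $line = "hello duncan";

# The match returns ($1, $2), unpacked into two named scalars
my ($greeting, $name) = ($line =~ /(hello|goodbye) (.*)/);

# A failed match returns an empty list, so both stay undef
my ($g2, $n2) = ("good night" =~ /(hello|goodbye) (.*)/);

print "$greeting / $name\n";                      # hello / duncan
print defined $g2 ? "matched\n" : "no match\n";   # no match
```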

16
Regex Anchors
  • ^ is a special character meaning beginning of
    the line
  • $ means end of the line
  • /^hello/      line starts with hello
  • /QED$/        line ends with QED
  • /^piffle$/    line consists entirely of piffle
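A sketch applying the three anchored patterns to a hypothetical set of lines; the extra lines ("say hello", "QED?", "piffle paffle") contain the words but in the wrong place, so the anchors reject them.

```perl
use strict;
use warnings;

my @lines = ("hello world", "say hello", "proof ends QED",
             "QED?", "piffle", "piffle paffle");

my @starts = grep { /^hello/ } @lines;     # line starts with hello
my @ends   = grep { /QED$/ } @lines;       # line ends with QED
my @whole  = grep { /^piffle$/ } @lines;   # line is exactly piffle

print join("|", @starts), "\n";   # hello world
print join("|", @ends), "\n";     # proof ends QED
print join("|", @whole), "\n";    # piffle
```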

17
Regex Finding borders
  • Consider the text
  • "The cat ate the mat"
  • "The owner was irate and kicked it"
  • We want to find the word 'ate', so we use
  • /ate/
  • Both lines match, since 'irate' contains 'ate'

18
Regex Finding borders
  • One way of solving this is to look for spaces
    either side of the word
  • / ate /
  • But this wouldn't match
  • "ate a lot yesterday"
  • "must be something I ate"
  • Since there is no space before or after
    respectively

19
Regex Finding borders
  • We use anchors to solve this problem
  • /(^| )ate( |$)/
  • Reads as
  • Start of the line or a space
  • Followed by the word 'ate'
  • Followed by the end of line or a space
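A sketch that counts how many of the four example sentences from these slides each pattern accepts. grep in scalar context returns the number of matching elements.

```perl
use strict;
use warnings;

my @lines = (
    "The cat ate the mat",
    "The owner was irate and kicked it",
    "ate a lot yesterday",
    "must be something I ate",
);

# Plain /ate/ also matches inside "irate"
my $plain = grep { /ate/ } @lines;

# Start of line or a space, then "ate", then a space or end of line
my $bordered = grep { /(^| )ate( |$)/ } @lines;

print "plain=$plain bordered=$bordered\n";   # plain=4 bordered=3
```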

20
Regex Character classes
  • A character class is like a selective wildcard
  • Will match a single character from the choice
    given
  • Defined with [ ]
  • /[bc]abbage/    babbage, cabbage
  • /[a-z]ie/       aie, bie, cie, die
  • /[a-z0-9]f/     af, bf, cf, 0f, 1f, 2f, 3f

21
Regex Character classes
  • To match a number at the start of a line
  • ($number) = /^([0-9]+)/;
  • To find a hyphenated word
  • ($word) = /([a-z]+-[a-z]+)/;
  • A ^ at the start of a character class means not
  • /[^bc]ow/    matches aow, dow, eow, fow
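A sketch exercising the character classes from the last two slides on made-up input. Each class matches exactly one character, which is why "cow" fails [^bc]ow: the only character before "ow" is a c.

```perl
use strict;
use warnings;

my @words = ("babbage", "cabbage", "dabbage", "cow", "bow", "dow");

# [bc] matches one character: b or c
my @cabbage = grep { /[bc]abbage/ } @words;

# [^bc] matches one character that is NOT b or c
my @not_bc = grep { /[^bc]ow/ } @words;

# A number at the start of a line, and a hyphenated word
my ($number) = ("42 is the answer" =~ /^([0-9]+)/);
my ($word)   = ("a well-known fact" =~ /([a-z]+-[a-z]+)/);

print "@cabbage | @not_bc | $number | $word\n";
# babbage cabbage | dow | 42 | well-known
```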

22
Regex Special characters
  • Here are a few of the special regex characters
  • \s   Whitespace, e.g. space, tab or newline
  • \S   Any non-whitespace character
  • \w   Word character: letter, digit or underscore
  • \W   Any non-word character
  • \d   Digit 0 to 9
  • \D   Any non-digit
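A sketch combining the shorthand classes to pull apart a hypothetical log-style line. \w+ includes the underscore and digits, so the whole user name is captured; \s+ absorbs the run of spaces however long it is.

```perl
use strict;
use warnings;

my $record = "user_42   logged in";

# \w+ word characters, \s+ the gap, \S+ the next non-whitespace run
my ($user, $action) = ($record =~ /^(\w+)\s+(\S+)/);

# \d+ picks out the digits, \D+ the leading non-digits
my ($digits)     = ($user =~ /(\d+)/);
my ($non_digits) = ($user =~ /(\D+)/);

print "$user, $action, $digits, $non_digits\n";
# user_42, logged, 42, user_
```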

23
Regex Modifiers
  • A modifier is a character that comes after the
    pattern to affect the behaviour of the match
  • The modifier i means ignore case
  • $_ = "Duncan";
  • /duncan/     no match
  • /duncan/i    match
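The slide's example as a runnable sketch, with each match turned into a string so the two outcomes can be printed side by side.

```perl
use strict;
use warnings;

$_ = "Duncan";

# Without i, the lower-case pattern fails on the capital D
my $plain = /duncan/ ? "match" : "no match";

# With i, upper and lower case letters are treated as equal
my $nocase = /duncan/i ? "match" : "no match";

print "$plain, $nocase\n";   # no match, match
```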

24
Regex Modifiers
  • g returns all the matches
  • $line = "The cat ate the mat";
  • @words = ($line =~ /\w+/g);
  • foreach (@words) {
        print "word $_\n";
    }
  • We didn't use any ( ) in the regex, so it returns
    the whole match each time
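A sketch contrasting /g with and without brackets on the same line. Without brackets each whole match is returned; with one bracket pair, only the captured part of each match (here, each word's first letter) comes back.

```perl
use strict;
use warnings;

my $line = "The cat ate the mat";

# No brackets: /g returns every whole match
my @words = ($line =~ /\w+/g);

# One bracket pair: /g returns the captured group from each match
my @initials = ($line =~ /(\w)\w*/g);

print scalar(@words), " words: @words\n";   # 5 words: The cat ate the mat
print "initials: @initials\n";              # initials: T c a t m
```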

25
Regex Modifiers
  • If we had used brackets it would return all of
    the groups in sequence. Consider
  • $line = "animal=camel likes=sand humps=2";
  • We want to turn this into a hash, as if we'd
    said
  • $somehash{'animal'} = "camel";
  • $somehash{'likes'} = "sand";
  • $somehash{'humps'} = 2;

26
Regex Modifiers
  • We can do this with the line
  • %somehash = ($line =~ /(\w+)=(\w+)/g);
  • Why? The match will return a list like
  • ('animal','camel','likes','sand','humps',2)
  • And you can initialise a hash from a list

27
Regex Modifiers
  • x ignores white space in your pattern and lets
    you use comments
  • %somehash = ($line =~ /
        (\w+)   # first word in pair
        =       # the dividing equals
        (\w+)   # second word in pair
    /gx);
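A complete, runnable version of the hash-building example, using the commented /x form. Keys are printed in sorted order because Perl hashes have no fixed ordering.

```perl
use strict;
use warnings;

my $line = "animal=camel likes=sand humps=2";

# /g with two groups returns (key1, value1, key2, value2, ...),
# and a hash can be initialised straight from that list
my %somehash = ($line =~ /
    (\w+)   # first word in pair
    =       # the dividing equals
    (\w+)   # second word in pair
/gx);

foreach my $key (sort keys %somehash) {
    print "$key => $somehash{$key}\n";
}
# animal => camel
# humps => 2
# likes => sand
```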

28
Regex Substitution
  • Matching is only half of regex's power...
  • We can also get Perl to change what it matches to
    something else
  • The format is
  • s/pattern/replace with/
  • Returns the number of changes made

29
Regex Substitution
  • Easy example
  • $x = "The cat sat on the mat. Cats are boring.";
  • $changes = ($x =~ s/cat/dog/i);
  • print "$changes\n";
  • print $x;
  • Output:
  • 1
  • The dog sat on the mat. Cats are boring.
  • Only the first instance has been changed

30
Regex Substitution
  • To change all occurrences, use the g modifier
  • $x = "The cat sat on the mat. Cats are boring.";
  • $x =~ s/cat/dog/gi;   # i ignores case
  • print $x;
  • The dog sat on the mat. dogs are boring.
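A sketch putting the two substitution slides side by side. Each version works on its own copy of the sentence, so the returned change counts can be compared directly: one without g, two with it.

```perl
use strict;
use warnings;

my $x = "The cat sat on the mat. Cats are boring.";

# Copy first so each substitution starts from the same text
my $once = $x;
my $n1 = ($once =~ s/cat/dog/i);    # first match only

my $all = $x;
my $n2 = ($all =~ s/cat/dog/gi);    # every match, ignoring case

print "$n1: $once\n";   # 1: The dog sat on the mat. Cats are boring.
print "$n2: $all\n";    # 2: The dog sat on the mat. dogs are boring.
```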

31
Regex Substitution
  • You can use values remembered on the left-hand
    side in the replacement
  • Example, to remove the -ing and -ed endings from
    words
  • s/(\w+)(ing|ed)/$1/gi
  • started becomes start
  • swimming becomes swimm
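A sketch running the ending-stripping substitution over a made-up sentence. The (my $copy = $text) =~ s/.../.../ idiom edits a copy and leaves the original alone. The second line shows how crude the rule is: any word ending in "ed" loses it, so "bed" becomes "b".

```perl
use strict;
use warnings;

my $text = "He started swimming and jumped";

# Copy into $stems, then strip -ing/-ed endings from the copy
(my $stems = $text) =~ s/(\w+)(ing|ed)/$1/gi;

# The rule knows nothing about real words
(my $oops = "bed") =~ s/(\w+)(ing|ed)/$1/gi;

print "$stems\n";   # He start swimm and jump
print "$oops\n";    # b
```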

32
Regex Substitution
  • Or do something more complicated
  • %id = ("djm" => "Duncan", "jog" => "Jim");
  • $text = "djm sends mail to jog";
  • $text =~ s/(djm|jog)/$id{$1}/g;
  • print $text;   # Duncan sends mail to Jim

33
Regex Substitution
  • The e modifier turns the right-hand side into an
    expression rather than a plain string
  • $x = "The cat sat on the mat";
  • $x =~ s/(\w+)/reverse($1)/ge;
  • print $x;   # ehT tac tas no eht tam
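A sketch of the e modifier, assuming Perl 5. It writes scalar reverse explicitly so there is no doubt that the characters of the word are reversed. The second substitution combines e with the %id lookup from the previous slide: any word is matched, and a conditional expression replaces known ids while leaving every other word unchanged.

```perl
use strict;
use warnings;

my $x = "The cat sat on the mat";

# With e the replacement is run as Perl code for each match
(my $rev = $x) =~ s/(\w+)/scalar reverse($1)/ge;

# The expression can also make a decision: swap known ids,
# keep any other word as it was
my %id = ("djm" => "Duncan", "jog" => "Jim");
(my $named = "djm sends mail to jog") =~ s/(\w+)/exists $id{$1} ? $id{$1} : $1/ge;

print "$rev\n";     # ehT tac tas no eht tam
print "$named\n";   # Duncan sends mail to Jim
```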

34
  • A few useful substitutions to finish with
  • Remove white-space from start of line
  • s/^\s+//
  • Remove white-space from end of line
  • s/\s+$//
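A sketch applying both trimming substitutions to a few hypothetical strings. Each string is copied first so the original list is left untouched.

```perl
use strict;
use warnings;

my @raw = ("   leading", "trailing\t  ", "  both  ", "none");
my @clean;

foreach my $s (@raw) {
    my $t = $s;         # work on a copy
    $t =~ s/^\s+//;     # remove white-space from start of line
    $t =~ s/\s+$//;     # remove white-space from end of line
    push @clean, $t;
}

print join("|", @clean), "\n";   # leading|trailing|both|none
```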

35
Summary
  • Today we covered
  • Simple matching
  • Wildcards
  • Quantifiers
  • Grouping
  • Remembering values
  • Anchors
  • Modifiers
  • Substitution

36
Next Lecture...
  • Things get simpler again...
  • We look at
  • basic HTML
  • How data is passed around the web
