CSC1018F: Regular Expressions - PowerPoint PPT Presentation

About This Presentation
Title:

CSC1018F: Regular Expressions

Slides: 12
Provided by: daven2

Transcript and Presenter's Notes

1
CSC1018F: Regular Expressions
  • Dive Into Python, Ch. 7
  • Number Systems

2
Lecture Outline
  • Recap of OO Python (week 3)
  • Regular Expressions
    • Standard
    • Verbose
  • Number Systems
    • Binary, decimal, hexadecimal

3
Recap of OO Python
  • Object Orientation
  • Module importing
  • Defining, initializing and instantiating Classes
  • Class attributes
  • Class methods
  • Exceptions
  • File Handling
    • Opening, reading, writing and closing

4
Intro to Regular Expressions
  • Regular expressions are a powerful means for
    parsing text to identify complex patterns of
    characters
  • Standard string methods (find, replace, split)
    can be insufficient in complex cases
  • But regular expressions can be complicated and
    difficult to read, so avoid them if string methods
    will do the job
  • Read regular expressions from left to right
  • Usage
    • import re - regular expression functionality is
      in the re module
    • re.sub(regexpr, repstr, inputstr) - typical
      search and replace
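
A minimal sketch of the usage described above, contrasting a plain string method with re.sub. The address string and the variable names are illustrative assumptions, and the `$` anchor used here is the end-of-string marker covered under the syntax that follows.

```python
import re

addr = "100 NORTH BROAD ROAD"  # illustrative input, not from the slides

# str.replace substitutes every occurrence, including the one inside "BROAD".
naive = addr.replace("ROAD", "RD.")

# re.sub(regexpr, repstr, inputstr): "$" restricts the match to the end
# of the string, so only the final "ROAD" is rewritten.
careful = re.sub(r"ROAD$", "RD.", addr)

print(naive)    # 100 NORTH BRD. RD.
print(careful)  # 100 NORTH BROAD RD.
```

When a string method already gives the right answer it remains the more readable choice; the regular expression earns its place only where the position of the match matters.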

5
Format of Regular Expressions
  • Syntax
    • $ - end of string marker
    • ^ - start of string marker
    • \b - word boundary marker (to avoid backslash
      escapes use a raw string - r"stringcontents")
    • ? - makes the preceding single character optional
    • (A|B|C) - indicates mutually exclusive options A,
      B and C
  • Examples
    • re.sub(r"\bROAD", "RD.", addr)
    • addr = "60 BROAD ROAD" → "60 BROAD RD."
    • re.search(r"(a|b|c) -", question)
    • question = "a - how are you?" → <SRE_Match object>
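
The markers listed on this slide can be tried one at a time. The following is a sketch only: the test strings ("color and colour", "d - fine") and variable names are assumptions added for illustration, and the `^` in the last pattern is an added anchor rather than something the slide shows.

```python
import re

# \b requires a word boundary, so the "ROAD" inside "BROAD" is left alone.
road_fixed = re.sub(r"\bROAD\b", "RD.", "60 BROAD ROAD")

# ? makes the preceding character optional: the "u" may or may not appear.
spellings = re.findall(r"colou?r", "color and colour")

# (a|b|c) matches exactly one of the alternatives; ^ pins it to the start.
choice = re.search(r"^(a|b|c) -", "a - how are you?").group(1)
no_choice = re.search(r"^(a|b|c) -", "d - fine")

# ^ and $ together force the pattern to cover the whole string.
whole = re.search(r"^ROAD$", "BROAD ROAD")

print(road_fixed)  # 60 BROAD RD.
print(spellings)   # ['color', 'colour']
print(choice)      # a
print(no_choice)   # None
print(whole)       # None
```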

6
Further Syntax
  • P{n,m} syntax
    • Deals with repeating patterns
    • Read as: pattern P appears at least n times but no
      more than m times
  • More syntax
    • \d - any numeric digit
    • \D - any character except a numeric digit
    • + - 1 or more
    • * - 0 or more
    • ( ) - to indicate groups
  • Examples
    • >>> phPat = re.compile(r"(\d{3})\D(\d{7})")
    • >>> phPat.search("021 6504058").groups()
    • ('021', '6504058')
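
A short sketch of the repetition and grouping syntax in use. It assumes a slightly looser separator than the slide's pattern (`\D*`, zero or more non-digits) so that the separator may be missing or longer than one character; the sample strings and names are illustrative.

```python
import re

# Three digits, any run of non-digits (possibly empty), then seven digits.
# The two parenthesised groups are what .groups() returns.
ph_pat = re.compile(r"(\d{3})\D*(\d{7})")

samples = ["021 6504058", "(021) 6504058", "0216504058"]
parsed = [ph_pat.search(s).groups() for s in samples]

# P{n,m}: here "a digit, at least two and at most four times".
two_to_four = re.compile(r"^\d{2,4}$")
lengths_ok = [bool(two_to_four.search(s)) for s in ["1", "12", "1234", "12345"]]

print(parsed[0])   # ('021', '6504058')
print(lengths_ok)  # [False, True, True, False]
```

All three sample strings yield the same pair of groups, which is the point of matching on digits and treating everything else as an ignorable separator.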

7
Verbose Regular Expressions
  • So far only compact regular expressions
  • To aid readability we would like to include
    comments and spaces
  • Use re.VERBOSE as the last argument to re
    functions
    • Whitespace is ignored
    • Comments (# commentstr) are ignored
  • Example

pattern = """
    ^    # beginning of string
    $    # end of string
"""
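
As a fuller illustration, a phone-number pattern can be written in verbose form. This is a sketch under assumptions: the pattern body and the names `phone_pattern` and `compact` are not from the slides, only the re.VERBOSE mechanism is.

```python
import re

# With re.VERBOSE, whitespace and "# ..." comments inside the pattern
# are ignored, so each piece can be documented on its own line.
phone_pattern = r"""
    ^           # beginning of string
    (\d{3})     # area code: exactly three digits
    \D*         # optional separator: any non-digits
    (\d{7})     # subscriber number: exactly seven digits
    $           # end of string
"""

verbose_groups = re.search(phone_pattern, "021 6504058", re.VERBOSE).groups()

# The equivalent compact pattern gives the same result.
compact = r"^(\d{3})\D*(\d{7})$"
compact_groups = re.search(compact, "021 6504058").groups()

# Without the flag, the whitespace and comments are treated as literal
# text to match, so the search fails.
without_flag = re.search(phone_pattern, "021 6504058")

print(verbose_groups)  # ('021', '6504058')
print(without_flag)    # None
```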
8
Case Study
  • Counting 1-10 in Roman numerals
  • Additive and subtractive combination of I (1),
    V(5), X (10)
  • Can have at most 3 of a particular numeral in a
    row

>>> roman = r"^(I?X|IV|V?I{0,3})$"
>>> re.search(roman, "X")
<_sre.SRE_Match object at 0x1e55be0>
>>> re.search(roman, "VIII")
<_sre.SRE_Match object at 0x1e55ba0>
>>> re.search(roman, "")
<_sre.SRE_Match object at 0x1e55ce0>
>>> re.search(roman, "IIII") == None
True
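
One way to check the case-study pattern is to run it over all ten numerals and a few malformed strings. This sketch assumes the pattern reads `^(I?X|IV|V?I{0,3})$`; the list of invalid strings is an illustrative choice.

```python
import re

# I?X covers IX and X, IV covers 4, V?I{0,3} covers 1-3 and 5-8.
roman = r"^(I?X|IV|V?I{0,3})$"

numerals = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX", "X"]
matched = [n for n in numerals if re.search(roman, n)]

invalid = ["IIII", "VV", "IIX", "XI"]
rejected = [s for s in invalid if re.search(roman, s) is None]

# Caveat: every part of V?I{0,3} is optional, so the empty string matches.
matches_empty = re.search(roman, "") is not None

print(matched == numerals)  # True
print(rejected == invalid)  # True
print(matches_empty)        # True
```

The anchors matter here: without `^` and `$`, re.search would find "III" inside "IIII" and report a match.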
9
Number Systems
  • Decimal (base 10)
    • Digits (0-9)
    • Each place represents a power of ten
    • 172 = 2×10^0 + 7×10^1 + 1×10^2 = 172
  • Binary (base 2)
    • Digits (0,1)
    • Each place represents a power of two
    • 10011 = 1×2^0 + 1×2^1 + 0×2^2 + 0×2^3 + 1×2^4 = 19
  • Hexadecimal (base 16)
    • Digits (0-9, A-F)
    • A-F represent 10-15
    • Each place represents a power of sixteen
    • E.g., F7A = 10×16^0 + 7×16^1 + 15×16^2 = 3962
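
The place-value sums above can be written as a small function. This is an illustrative sketch; `to_decimal` is a hypothetical helper name, and Python's built-in int(text, base) does the same job.

```python
def to_decimal(digits: str, base: int) -> int:
    """Sum digit * base**place, with place 0 at the rightmost digit.

    Hypothetical helper for illustration; it does not check that every
    digit is valid for the given base.
    """
    symbols = "0123456789ABCDEF"
    total = 0
    for place, ch in enumerate(reversed(digits.upper())):
        total += symbols.index(ch) * base ** place
    return total


print(to_decimal("172", 10))    # 172
print(to_decimal("10011", 2))   # 19
print(to_decimal("F7A", 16))    # 3962
print(int("F7A", 16))           # 3962 (built-in equivalent)
```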

10
Conversion
  • Decimal to others
    • Repeatedly divide number by base and populate
      places from right to left with the remainder
    • E.g. Dec2Bin: 50 / 2 = 25 r 0, 25 / 2 = 12 r 1,
      12 / 2 = 6 r 0, 6 / 2 = 3 r 0, 3 / 2 = 1 r 1,
      1 / 2 = 0 r 1, giving 110010
  • Bin2Hex
    • Collect binary digits into groups of four
      (starting from the right) and convert
    • E.g., 111000011111 → 1110 0001 1111 → E1F
  • Hex2Bin
    • Hexadecimal digits convert into groups of four
      binary digits
    • E.g., A7C → 1010 0111 1100 → 101001111100
  • Hex is used because
    • It is easy to convert to and from binary
    • Offers a more compact representation
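
The three conversions can be sketched as follows. The function names are hypothetical, chosen to mirror the slide's Dec2Bin, Bin2Hex and Hex2Bin labels, and input validation is left out for brevity.

```python
SYMBOLS = "0123456789ABCDEF"


def dec_to_base(number: int, base: int) -> str:
    """Repeated division: each remainder fills the next place from the right."""
    if number == 0:
        return "0"
    digits = []
    while number > 0:
        number, remainder = divmod(number, base)
        digits.append(SYMBOLS[remainder])
    return "".join(reversed(digits))


def bin_to_hex(bits: str) -> str:
    """Group bits in fours from the right and convert each group."""
    # Pad on the left so the length is a multiple of four.
    padded = bits.zfill((len(bits) + 3) // 4 * 4)
    groups = [padded[i:i + 4] for i in range(0, len(padded), 4)]
    return "".join(SYMBOLS[int(group, 2)] for group in groups)


def hex_to_bin(hex_digits: str) -> str:
    """Expand every hexadecimal digit into exactly four binary digits."""
    return "".join(format(int(ch, 16), "04b") for ch in hex_digits)


print(dec_to_base(50, 2))          # 110010
print(bin_to_hex("111000011111"))  # E1F
print(hex_to_bin("A7C"))           # 101001111100
```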

11
Revision Exercise
  • Create a function which will take a date string
    in any one of the following formats
  • dd/mm/yyyy or dd/mm/yy
  • Other separators (e.g., \, , -) are also
    allowed
  • Single figure entries may have the form x or 0x,
    e.g. 3/4/5 or 03/04/05
  • dd month yy or yyyy where month may be written in
    full (December) or abbreviated (Dec. or Dec)
  • And return it in the format
  • dd month(in full) yyyy, e.g. 13 March 2006
  • Implement this using regular expressions and also
    implement range checking on dates