Title: CSC1018F: Regular Expressions
1CSC1018FRegular Expressions
- Dive Into Python, Ch. 7
- Number Systems
2Lecture Outline
- Recap of OO Python week 3
- Regular Expressions
- Standard
- Verbose
- Number Systems
- Binary, decimal, hexadecimal
3Recap of OO Python
- Object Orientation
- Module importing
- Defining, initializing and instantiating Classes
- Class attributes
- Class methods
- Exceptions
- File Handling
- Opening, reading, writing and closing
4Intro to Regular Expressions
- Regular expressions are a powerful means of parsing text to identify complex patterns of characters
- Standard string methods (find, replace, split) can be insufficient in complex cases
- But regular expressions can be complicated and difficult to read, so avoid them if string methods will do the job
- Read regular expressions from left to right
- Usage
- import re - regular expression functionality is in the re module
- re.sub(regexpr, repstr, inputstr) - typical search and replace
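To illustrate where string methods fall short, here is a minimal sketch comparing str.replace with re.sub. The address string is made up for illustration, and the pattern uses the word-boundary and end-of-string markers.

```python
import re

addr = "100 NORTH BROAD ROAD"  # illustrative input, not from the slides

# str.replace substitutes every occurrence, including the one inside "BROAD".
naive = addr.replace("ROAD", "RD.")

# re.sub(regexpr, repstr, inputstr): \b (word boundary) and $ (end of string)
# restrict the replacement to the standalone word at the end of the address.
fixed = re.sub(r"\bROAD$", "RD.", addr)

print(naive)  # 100 NORTH BRD. RD.
print(fixed)  # 100 NORTH BROAD RD.
```

The raw string (r"...") keeps \b from being read as a backspace escape.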
5Format of Regular Expressions
- Syntax
- $ - end of string marker
- ^ - start of string marker
- \b - word boundary marker (to avoid backslash escapes use a raw string - r"stringcontents")
- ? - optional match to a single character
- (A|B|C) - indicates mutually exclusive options A, B and C
- Examples
- re.sub(r"\bROAD$", "RD.", addr)
- addr = "60 BROAD ROAD" → "60 BROAD RD."
- re.search(r"(a|b|c) -", question)
- question = "a - how are you?" → <SRE_Match object>
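Each marker above can be tried in isolation. The following is a small sketch; the "colour" spellings are invented for illustration, and the other strings come from the slide's examples.

```python
import re

addr = "60 BROAD ROAD"

# ^ and $ anchor a pattern to the start and end of the string.
starts_with_60 = re.search(r"^60", addr) is not None      # True
ends_with_broad = re.search(r"BROAD$", addr) is not None   # False

# \b demands a word boundary, so the ROAD inside BROAD is skipped.
whole_words = re.findall(r"\bROAD", addr)                  # ['ROAD']

# ? makes the single preceding character optional.
spellings = [w for w in ("color", "colour", "colouur")
             if re.search(r"^colou?r$", w)]                # ['color', 'colour']

# (a|b|c) matches exactly one of the listed alternatives.
question = "a - how are you?"
choice = re.search(r"(a|b|c) -", question).group(1)        # 'a'
```

re.search returns None when nothing matches, which is why the anchor checks compare against None.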
6Further Syntax
- P{n,m} syntax
- Deals with repeating patterns
- Read as pattern P appears at least n times but no more than m times
- More syntax
- \d - any numeric digit
- \D - any character except a numeric digit
- + - 1 or more
- * - 0 or more
- ( ) - to indicate groups
- Examples
- >>> phPat = re.compile(r"(\d{3})\D(\d{7})")
- >>> phPat.search("021 6504058").groups()
- ("021", "6504058")
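A short sketch of the repetition and grouping syntax in use. The ph_pat pattern below is a hypothetical variant of the slide's phone pattern: it uses \D* (zero or more non-digits) as an assumption, so that several separator styles are accepted.

```python
import re

# P{n,m}: at least n and at most m repetitions of P (here two or three "a"s).
repeat_hits = [s for s in ("a", "aa", "aaa", "aaaa")
               if re.search(r"^a{2,3}$", s)]            # ['aa', 'aaa']

# Assumed variant: \D* tolerates a space, a dash, or no separator at all.
ph_pat = re.compile(r"(\d{3})\D*(\d{7})")

# groups() returns the text captured by each parenthesised group.
parsed = [ph_pat.search(s).groups()
          for s in ("021 6504058", "021-6504058", "0216504058")]

for area_code, number in parsed:
    print(area_code, number)  # 021 6504058 each time
```

Compiling the pattern once with re.compile is convenient when the same expression is applied to many strings.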
7Verbose Regular Expressions
- So far only compact regular expressions
- To aid readability we would like to include comments and spaces
- Pass re.VERBOSE as the last argument to re functions
- Whitespace is ignored
- Comments (# commentstr) are ignored
- Example
pattern = """ ^   # beginning of string
              $   # end of string """
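As a fuller illustration, here is a sketch of a phone-number pattern written verbosely. The pattern and the name phone_pattern are assumptions chosen for this example, not taken from the slide.

```python
import re

# Hypothetical verbose pattern: whitespace and # comments are ignored
# only when re.VERBOSE is passed.
phone_pattern = r"""
    ^           # beginning of string
    (\d{3})     # area code: exactly three digits
    \D*         # optional separator: any non-digits
    (\d{7})     # number: exactly seven digits
    $           # end of string
"""

with_flag = re.search(phone_pattern, "021 6504058", re.VERBOSE)
without_flag = re.search(phone_pattern, "021 6504058")

print(with_flag.groups())  # ('021', '6504058')
print(without_flag)        # None: spaces and comments are taken literally
```

Forgetting the flag does not raise an error; the pattern silently stops matching, which is worth remembering when debugging.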
8Case Study
- Counting 1-10 in roman numerals
- Additive and subtractive combination of I (1), V (5), X (10)
- Can have at most 3 of a particular numeral in a row
>>> roman = r"^(I?X|IV|V?I{0,3})$"
>>> re.search(roman, "X")
<_sre.SRE_Match object at 0x1e55be0>
>>> re.search(roman, "VIII")
<_sre.SRE_Match object at 0x1e55ba0>
>>> re.search(roman, "")
<_sre.SRE_Match object at 0x1e55ce0>
>>> re.search(roman, "IIII") == None
True
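The case study can be checked exhaustively for 1 to 10. This sketch assumes the pattern reads r"^(I?X|IV|V?I{0,3})$", that is, with alternation bars, braces and both anchors; the list of invalid strings is made up for illustration.

```python
import re

# Assumed reading of the case-study pattern.
roman = r"^(I?X|IV|V?I{0,3})$"

numerals = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX", "X"]
accepted = [n for n in numerals if re.search(roman, n)]

# Malformed numerals that the pattern should refuse.
invalid = ["IIII", "VV", "IIX", "XI", "VX"]
rejected = [n for n in invalid if re.search(roman, n) is None]

print(len(accepted))  # 10
print(rejected)       # ['IIII', 'VV', 'IIX', 'XI', 'VX']
```

Note that the empty string also matches, because V?I{0,3} allows zero characters; a real validator would need an extra check for that case.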
9Number Systems
- Decimal (base 10)
- Digits (0-9)
- Each place represents a power of ten
- 172 = 2×10^0 + 7×10^1 + 1×10^2 = 172
- Binary (base 2)
- Digits (0,1)
- Each place represents a power of two
- 10011 = 1×2^0 + 1×2^1 + 0×2^2 + 0×2^3 + 1×2^4 = 19
- Hexadecimal (base 16)
- Digits (0-9, A-F)
- A-F represent 10-15
- Each place represents a power of sixteen
- E.g., F7A = 10×16^0 + 7×16^1 + 15×16^2 = 3962
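The place-value rule is the same for every base, so one small function covers all three systems. This is a sketch; to_decimal and DIGITS are hypothetical names, and Python's built-in int(text, base) does the same job.

```python
DIGITS = "0123456789ABCDEF"  # A-F stand for 10-15

def to_decimal(digits, base):
    """Evaluate a digit string in the given base (2 to 16)."""
    value = 0
    for ch in digits.upper():
        d = DIGITS.index(ch)          # raises ValueError for unknown characters
        if d >= base:
            raise ValueError(f"digit {ch!r} is not valid in base {base}")
        value = value * base + d      # shift earlier digits up one place
    return value

print(to_decimal("172", 10))    # 172
print(to_decimal("10011", 2))   # 19
print(to_decimal("F7A", 16))    # 3962
```

Multiplying the running total by the base at each step is equivalent to summing digit × base^place, without computing powers explicitly.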
10Conversion
- Decimal to others
- Repeatedly divide number by base and populate places from right to left with the remainder
- E.g. Dec2Bin 50: 50/2 r 0, 25/2 r 1, 12/2 r 0, 6/2 r 0, 3/2 r 1, 1/2 r 1, quotient 0 so stop; result 110010
- Bin2Hex
- Collect binary digits into groups of four and convert
- E.g., 111000011111 = 1110 0001 1111 = E1F
- Hex2Bin
- Hexadecimal digits convert into groups of four binary digits
- E.g., A7C = 1010 0111 1100 = 101001111100
- Hex is used because
- It is easy to convert to and from binary
- Offers a more compact representation
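The three conversions can be sketched as follows. The function names (from_decimal, bin_to_hex, hex_to_bin) are hypothetical, and the code assumes non-negative integers and well-formed digit strings.

```python
DIGITS = "0123456789ABCDEF"

def from_decimal(number, base):
    """Repeated division: each remainder fills the next place from the right."""
    if number == 0:
        return "0"
    places = []
    while number > 0:
        number, remainder = divmod(number, base)
        places.append(DIGITS[remainder])
    return "".join(reversed(places))  # remainders were collected right to left

def bin_to_hex(bits):
    """Group binary digits in fours (padding on the left) and convert each group."""
    bits = bits.zfill((len(bits) + 3) // 4 * 4)
    return "".join(DIGITS[int(bits[i:i + 4], 2)] for i in range(0, len(bits), 4))

def hex_to_bin(hex_digits):
    """Expand each hexadecimal digit into exactly four binary digits."""
    return "".join(format(DIGITS.index(ch), "04b") for ch in hex_digits.upper())

print(from_decimal(50, 2))         # 110010
print(bin_to_hex("111000011111"))  # E1F
print(hex_to_bin("A7C"))           # 101001111100
```

Because 16 = 2^4, each hexadecimal digit corresponds to exactly four bits, which is why no arithmetic beyond a per-group lookup is needed.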
11Revision Exercise
- Create a function which will take a date string in any one of the following formats
- dd/mm/yyyy or dd/mm/yy
- Other separators (e.g., \, , -) are also allowed
- Single figure entries may have the form x or 0x, e.g. 3/4/5 or 03/04/05
- dd month yy or yyyy, where month may be written in full (December) or abbreviated (Dec. or Dec)
- And return it in the format
- dd month(in full) yyyy, e.g. 13 March 2006
- Implement this using regular expressions and also implement range checking on dates