Title: Characters, Strings, and Regular Expressions
Characters
- The char data type is used to represent a single character.
- Characters are stored in computer memory using some form of encoding.
- Java uses Unicode, which includes ASCII, for representing char constants.
ASCII Encoding
Unicode Encoding
- The Unicode Worldwide Character Standard (Unicode) supports the interchange, processing, and display of the written texts of diverse languages.
- Java uses the Unicode standard for representing char constants.
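Here is a minimal sketch of how char values relate to Unicode. Each char holds a numeric Unicode value (strictly, a 16-bit UTF-16 code unit). ASCII characters keep their ASCII codes, and any character can also be written as a \uXXXX escape. The class and method names are illustrative only.

```java
// Illustrative class; the names UnicodeDemo and codeOf are hypothetical.
class UnicodeDemo {
    // Returns the numeric Unicode value stored in a char.
    static int codeOf(char c) {
        return c; // a char widens to int automatically
    }

    public static void main(String[] args) {
        char letter = 'A';        // ASCII character, code 65
        char escaped = '\u0041';  // the same character written as a Unicode escape
        char eAcute = '\u00E9';   // small e with acute accent, outside ASCII
        char pi = '\u03C0';       // Greek small letter pi

        System.out.println(codeOf(letter));      // 65
        System.out.println(letter == escaped);   // true
        System.out.println(codeOf(eAcute));      // 233
        System.out.println(codeOf(pi));          // 960
        System.out.println((char) (letter + 1)); // B: arithmetic works on the codes
    }
}
```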
Character Processing
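As an illustrative sketch of character processing, the standard java.lang.Character class provides static tests and conversions for single characters. The helper below is hypothetical. It visits each character of a string and classifies it.

```java
// Illustrative class; classify is a hypothetical helper.
class CharacterDemo {
    // Counts letters, digits and whitespace characters; returns {letters, digits, spaces}.
    static int[] classify(String text) {
        int letters = 0, digits = 0, spaces = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                letters++;
            } else if (Character.isDigit(c)) {
                digits++;
            } else if (Character.isWhitespace(c)) {
                spaces++;
            }
        }
        return new int[] {letters, digits, spaces};
    }

    public static void main(String[] args) {
        int[] counts = classify("CS 220 rocks");
        System.out.println("letters=" + counts[0] + " digits=" + counts[1] + " spaces=" + counts[2]);
        System.out.println(Character.toUpperCase('j'));     // J
        System.out.println(Character.getNumericValue('7')); // 7
    }
}
```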
Strings
- A string is a sequence of characters that is treated as a single value.
- Instances of the String class are used to represent strings in Java.
String declaration
- Create a String object
  - String <variable name>;
  - <variable name> = new String(<value of a string>);
- Create a String literal
  - String <variable name>;
  - <variable name> = <value of a string>;
Example
- String word1;
- word1 = new String("Java");
- OR
- String word1;
- word1 = "Java";
Examples
We can do this because String objects are immutable.
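Immutability lets Java safely share one String object among several variables. Identical string literals refer to the same instance, while new String(...) always creates a separate object. Here is a minimal sketch with an illustrative class name.

```java
// Illustrative class; the name LiteralSharingDemo is hypothetical.
class LiteralSharingDemo {
    public static void main(String[] args) {
        String a = "Java";              // string literal
        String b = "Java";              // identical literal: the same shared object as a
        String c = new String("Java");  // new always creates a separate object

        System.out.println(a == b);       // true: a and b refer to one object
        System.out.println(a == c);       // false: c is a different object
        System.out.println(a.equals(c));  // true: the contents are the same
    }
}
```

Sharing is safe only because no variable can change the shared object's characters.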
String constructors
- No-argument constructor
- One-argument constructor
  - A String object
- One-argument constructor
  - A char array
- Three-argument constructor
  - A char array
  - An integer that specifies the starting position
  - An integer that specifies the number of characters to access
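The four constructors listed above can be sketched as follows. The sample character array is an illustrative choice, not from the original slides.

```java
// Illustrative class showing the String constructors listed above.
class StringConstructorDemo {
    public static void main(String[] args) {
        char[] chars = {'b', 'i', 'r', 't', 'h', ' ', 'd', 'a', 'y'};

        String s1 = new String();             // no-argument: the empty string
        String s2 = new String("hello");      // one argument: a copy of a String
        String s3 = new String(chars);        // one argument: all characters of the array
        String s4 = new String(chars, 6, 3);  // three arguments: 3 characters starting at index 6

        System.out.println("s1 = \"" + s1 + "\"");  // s1 = ""
        System.out.println("s2 = " + s2);           // s2 = hello
        System.out.println("s3 = " + s3);           // s3 = birth day
        System.out.println("s4 = " + s4);           // s4 = day
    }
}
```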
Review
- To compute how many characters the string myString contains, we use
  - myString.size
  - myString.size()
  - myString.length
  - myString.length()
- Answer: myString.length()
Review
- Java uses this to represent characters of diverse languages:
  - ASCII
  - UNICODE
  - EBCDIC
  - BINARY
- Answer: UNICODE
Accessing Individual Elements
- Individual characters in a String are accessed with the charAt method.
- In name.charAt(3), the variable name refers to the whole string, and the method returns the character at position 3.
- Positions start at 0, so position 3 holds the fourth character.
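A common use of charAt is to visit every position of a string in a loop, from 0 to length() - 1. Here is a minimal sketch; countOf is a hypothetical helper. An index outside that range makes charAt throw StringIndexOutOfBoundsException.

```java
// Illustrative class; countOf is a hypothetical helper.
class CharAtDemo {
    // Counts how often target occurs in text by visiting each position with charAt.
    static int countOf(String text, char target) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) { // valid positions: 0 to length() - 1
            if (text.charAt(i) == target) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String name = "Sumatra";
        System.out.println(name.charAt(0));      // S (first character)
        System.out.println(name.charAt(3));      // a (fourth character)
        System.out.println(countOf(name, 'a'));  // 2
    }
}
```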
Other Useful String Operations
Compute the Length of a String
- Method length()
  - Returns the number of characters in a string
- Example
  - String strVar;
  - strVar = new String("Java");
  - int len = strVar.length();
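The length example as a complete program (class name illustrative). It also shows that the last valid index is length() - 1.

```java
// Illustrative class; the name LengthDemo is hypothetical.
class LengthDemo {
    public static void main(String[] args) {
        String strVar = new String("Java");
        int len = strVar.length();                   // 4

        System.out.println(len);                     // 4
        System.out.println("".length());             // 0 for the empty string
        System.out.println(strVar.charAt(len - 1));  // a: the last valid index is length() - 1
    }
}
```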
Substring
- Method substring
  - Extracts a substring from a given string by specifying the beginning position (inclusive) and the ending position (exclusive)
- Example
  - String strVar, strSubStr;
  - strVar = new String("Exam after Easter");
  - strSubStr = strVar.substring(0, 4);
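Because the ending position is exclusive, substring(b, e) returns e - b characters, and the one-argument form runs to the end of the string. Here is a minimal sketch using the example string (class name illustrative).

```java
// Illustrative class; the name SubstringDemo is hypothetical.
class SubstringDemo {
    public static void main(String[] args) {
        String strVar = new String("Exam after Easter");

        String first = strVar.substring(0, 4);    // indices 0..3: "Exam"
        String middle = strVar.substring(5, 10);  // indices 5..9: "after"
        String rest = strVar.substring(11);       // index 11 to the end: "Easter"

        System.out.println(first + "|" + middle + "|" + rest); // Exam|after|Easter
        System.out.println(strVar);               // unchanged: Exam after Easter
    }
}
```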
Index position of a substring within another string
- Method indexOf
  - Finds the index position of a substring within another string
- Example
  - String strVar1 = "Google it";
  - String strVar2 = "Google";
  - int index;
  - index = strVar1.indexOf(strVar2);
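indexOf returns the position where the first match starts, or -1 when there is no match. It also accepts a single char. Here is a minimal sketch (class name illustrative).

```java
// Illustrative class; the name IndexOfDemo is hypothetical.
class IndexOfDemo {
    public static void main(String[] args) {
        String strVar1 = "Google it";
        String strVar2 = "Google";

        System.out.println(strVar1.indexOf(strVar2)); // 0: found at the very start
        System.out.println(strVar1.indexOf("it"));    // 7
        System.out.println(strVar1.indexOf('o'));     // 1: first occurrence of a char
        System.out.println(strVar1.indexOf("Bing"));  // -1: not found
    }
}
```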
String concatenation
- Method concat
  - Creates a new string from two strings by concatenating them
- Example
  - String strVar1 = "Google";
  - String strVar2 = "Search Engine";
  - String sumStr;
  - sumStr = strVar1.concat(strVar2);
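concat joins the characters exactly as given and inserts no separator. The + operator is the more common way to write the same thing. Here is a minimal sketch (class name illustrative).

```java
// Illustrative class; the name ConcatDemo is hypothetical.
class ConcatDemo {
    public static void main(String[] args) {
        String strVar1 = "Google";
        String strVar2 = "Search Engine";

        String sumStr = strVar1.concat(strVar2);             // "GoogleSearch Engine": no space added
        String spaced = strVar1.concat(" ").concat(strVar2); // "Google Search Engine"
        String withPlus = strVar1 + " " + strVar2;           // + builds the same contents

        System.out.println(sumStr);
        System.out.println(spaced);
        System.out.println(spaced.equals(withPlus));         // true
        System.out.println(strVar1);                         // Google: unchanged
    }
}
```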
Review
- What method is used to refer to an individual character in a String?
  - getBytes
  - indexOf
  - getChars
  - charAt
- Answer: charAt
Review
- To compare two strings in Java, we use
  - ==
  - equals method
  - !=
  - <>
- Answer: equals method
String comparison
- Methods
  - equals
  - equalsIgnoreCase
  - compareTo
  - compareToIgnoreCase
String comparison
- equals and equalsIgnoreCase
- Example
  - String string1 = "COMPSCI";
  - String string2 = "compsci";
  - boolean isEqual;
  - isEqual = string1.equals(string2);
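With the example strings, equals is case-sensitive and returns false. equalsIgnoreCase treats the two as equal. Here is a minimal sketch (class name illustrative).

```java
// Illustrative class; the name EqualsDemo is hypothetical.
class EqualsDemo {
    public static void main(String[] args) {
        String string1 = "COMPSCI";
        String string2 = "compsci";

        boolean isEqual = string1.equals(string2);                      // false: case differs
        boolean isEqualIgnoringCase = string1.equalsIgnoreCase(string2); // true

        System.out.println(isEqual);
        System.out.println(isEqualIgnoringCase);
    }
}
```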
Common error
- Comparing references with == can lead to logic errors, because == compares the references to determine whether they refer to the same object, not whether two objects have the same contents. When two identical (but separate) objects are compared with ==, the result will be false. When comparing objects to determine whether they have the same contents, use method equals.
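This error typically appears when a string is built at run time and then compared against a literal. Here is a minimal sketch; the helper names isQuitWrong and isQuit are hypothetical. Because part + "it" is not a compile-time constant, Java creates a new String object for it.

```java
// Illustrative class; isQuitWrong and isQuit are hypothetical helpers.
class ReferenceComparisonDemo {
    // Buggy check: compares references, not contents.
    static boolean isQuitWrong(String command) {
        return command == "quit";
    }

    // Correct check: compares contents.
    static boolean isQuit(String command) {
        return command.equals("quit");
    }

    public static void main(String[] args) {
        String part = "qu";
        String command = part + "it"; // built at run time, so it is a new String object

        System.out.println(isQuitWrong(command)); // false: different objects
        System.out.println(isQuit(command));      // true: same contents
    }
}
```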
String comparison
- compareTo
- Example
  - String string1 = "Adam";
  - String string2 = "AdamA";
  - int compareResult;
  - compareResult = string1.compareTo(string2);
String comparison
- string1.compareTo(string2)
  - Compares two strings lexicographically
  - Returns 0 if the two strings are equal
  - Returns a negative value if string1 is less than string2
  - Returns a positive value if string1 is greater than string2
- The comparison is based on the Unicode value of each character in the strings
String comparison
- The comparison is based on the Unicode value of each character in the strings
  - Let k be the smallest index at which the two strings differ
  - compareTo returns the difference of the two character values at position k, that is, (character at position k of string1) - (character at position k of string2)
  - If there is no such index because one string is a prefix of the other, compareTo returns the difference of the lengths, string1.length() - string2.length()
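Both rules can be checked directly. The "Adam"/"AdamA" example falls under the prefix rule. The other pairs below are illustrative choices that show the character-difference rule, including the fact that uppercase letters have smaller Unicode values than lowercase ones.

```java
// Illustrative class; the name CompareToDemo is hypothetical.
class CompareToDemo {
    public static void main(String[] args) {
        // Prefix case: the result is the length difference, 4 - 5.
        System.out.println("Adam".compareTo("AdamA"));   // -1
        // First difference at index 2: 'v' (118) - 'x' (120).
        System.out.println("Java".compareTo("Jaxa"));    // -2
        System.out.println("cat".compareTo("cat"));      // 0
        // 'Z' (90) - 'a' (97): every uppercase letter sorts before every lowercase one.
        System.out.println("Zebra".compareTo("apple"));  // -7
        // Ignoring case, z comes after a.
        System.out.println("Zebra".compareToIgnoreCase("apple") > 0); // true
    }
}
```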
regionMatches
- regionMatches(boolean ignoreCase, int toffset, String other, int ooffset, int len)
  - A substring of this String object, starting at toffset, is compared to a substring of the argument other, starting at ooffset; both substrings are len characters long.
  - The result is true if these substrings represent character sequences that are the same, ignoring case if and only if ignoreCase is true.
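Here is a minimal sketch of both settings of ignoreCase (class name and sample text illustrative). String also has a four-argument overload, regionMatches(toffset, other, ooffset, len), which is always case-sensitive.

```java
// Illustrative class; the name RegionMatchesDemo is hypothetical.
class RegionMatchesDemo {
    public static void main(String[] args) {
        String text = "Hello World";

        // Compare 5 characters of text starting at index 6 ("World")
        // with 5 characters of "world" starting at index 0.
        boolean exact = text.regionMatches(false, 6, "world", 0, 5);       // false: W vs w
        boolean ignoringCase = text.regionMatches(true, 6, "world", 0, 5); // true

        System.out.println(exact);
        System.out.println(ignoringCase);
    }
}
```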
The String Class is Immutable
- In Java, a String object is immutable.
  - This means that once a String object is created, it cannot be changed, for example by replacing a character with another character or removing a character.
- The String methods we have used so far do not change the original string. They create a new string from the original. For example, substring creates a new string from a given string.
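To see immutability in action, notice that every "modifying" String method returns a new object, and the original stays the same. A call whose result is ignored therefore has no visible effect. Here is a minimal sketch (class name illustrative).

```java
// Illustrative class; the name ImmutabilityDemo is hypothetical.
class ImmutabilityDemo {
    public static void main(String[] args) {
        String original = "java";

        String upper = original.toUpperCase();        // a new String "JAVA"
        String replaced = original.replace('j', 'l'); // a new String "lava"
        original.concat(" rocks");                    // result discarded: original is not modified

        System.out.println(original);  // java
        System.out.println(upper);     // JAVA
        System.out.println(replaced);  // lava
    }
}
```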
Review
- If x.equals(y) is true, then x == y is always true
  - True
  - False
- Answer: False. Two separate String objects can have the same contents, so equals returns true while == returns false.
The StringBuffer Class
- In many string processing applications, we would like to change the contents of a string. In other words, we want it to be mutable.
- Manipulating the content of a string, such as replacing a character, appending another string, deleting a portion of a string, and so on, may be accomplished by using the StringBuffer class.
StringBuffer Example
Changing the string "Java" to "Diva"
Delete a substring from a StringBuffer object
- StringBuffer word = new StringBuffer("CCourse");
- word.delete(0, 1);
Append a string
- StringBuffer word = new StringBuffer("CS ");
- word.append("Course");
Insert a string
- StringBuffer word = new StringBuffer("MCS Course");
- word.insert(4, "220");
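The delete, append, and insert examples as one runnable sketch (class name illustrative). Unlike String methods, these calls modify the StringBuffer object itself. delete(start, end) excludes end. insert(offset, s) places s just before the character that was at offset, so with these arguments the inserted text sits directly against "Course".

```java
// Illustrative class; the name StringBufferEditDemo is hypothetical.
class StringBufferEditDemo {
    public static void main(String[] args) {
        StringBuffer word1 = new StringBuffer("CCourse");
        word1.delete(0, 1);          // removes index 0 only (end index 1 is exclusive)
        System.out.println(word1);   // Course

        StringBuffer word2 = new StringBuffer("CS ");
        word2.append("Course");      // adds to the end
        System.out.println(word2);   // CS Course

        StringBuffer word3 = new StringBuffer("MCS Course");
        word3.insert(4, "220");      // inserts before the character at index 4
        System.out.println(word3);   // MCS 220Course
    }
}
```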
Convert from StringBuffer to String
- StringBuffer word = new StringBuffer("Java");
- word.setCharAt(0, 'D');
- word.setCharAt(1, 'i');
- System.out.println(word.toString());
Review
- Both the String and StringBuffer classes include the charAt and setCharAt methods
  - True
  - False
- Answer: False. String has charAt but not setCharAt, because a String object cannot be changed; setCharAt belongs to StringBuffer.
Review
- What will be the value of str after the following statements are executed?
  - String str;
  - StringBuffer strBuf;
  - str = "Decaffeinated";
  - strBuf = new StringBuffer(str.substring(2, 7));
  - strBuf.setCharAt(1, 'o');
  - strBuf.append('e');
  - str = strBuf.toString();
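The review statements can be traced by running them. The comments show the buffer contents after each step. The wrapper method trace is a hypothetical name.

```java
// Illustrative class; trace is a hypothetical wrapper around the review statements.
class DecaffeinatedTrace {
    static String trace() {
        String str;
        StringBuffer strBuf;

        str = "Decaffeinated";
        strBuf = new StringBuffer(str.substring(2, 7)); // indices 2..6: "caffe"
        strBuf.setCharAt(1, 'o');                       // "coffe"
        strBuf.append('e');                             // "coffee"
        str = strBuf.toString();
        return str;
    }

    public static void main(String[] args) {
        System.out.println(trace()); // coffee
    }
}
```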
StringBuffer methods
- Method length
  - Returns the StringBuffer length (the number of characters it currently holds)
- Method capacity
  - Returns the StringBuffer capacity (the number of characters it can hold without allocating more memory)
- Method setLength
  - Increases or decreases the StringBuffer length
- Method ensureCapacity
  - Guarantees that the StringBuffer has at least a specified minimum capacity
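Here is a minimal sketch of length versus capacity (class name illustrative). A StringBuffer built from a string starts with a capacity of 16 plus that string's length. ensureCapacity only guarantees a lower bound, so the exact capacity after it is not asserted here. Growing with setLength fills new positions with the null character (value 0).

```java
// Illustrative class; the name StringBufferCapacityDemo is hypothetical.
class StringBufferCapacityDemo {
    public static void main(String[] args) {
        StringBuffer buf = new StringBuffer("Java");

        System.out.println(buf.length());    // 4 characters stored
        System.out.println(buf.capacity());  // 20: 16 plus the initial string length

        buf.ensureCapacity(50);              // guarantees room for at least 50 characters
        System.out.println(buf.capacity() >= 50); // true

        buf.setLength(2);                    // truncates the contents
        System.out.println(buf);             // Ja

        buf.setLength(4);                    // grows again; new positions hold character value 0
        System.out.println(buf.length());        // 4
        System.out.println((int) buf.charAt(3)); // 0
    }
}
```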
Class StringTokenizer
- Tokenizer
  - Partitions a String into individual substrings (tokens)
  - Uses delimiters, typically whitespace characters (space, tab, newline, etc.)
- Java offers java.util.StringTokenizer
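Here is a minimal sketch of java.util.StringTokenizer; countWords is a hypothetical helper. By default the delimiters are space, tab, newline, carriage return, and form feed, and consecutive delimiters produce no empty tokens. The Java documentation describes StringTokenizer as a legacy class and recommends String.split or java.util.regex for new code.

```java
import java.util.StringTokenizer;

// Illustrative class; countWords is a hypothetical helper.
class TokenizerDemo {
    // Counts tokens separated by the default whitespace delimiters.
    static int countWords(String text) {
        return new StringTokenizer(text).countTokens();
    }

    public static void main(String[] args) {
        StringTokenizer tokens = new StringTokenizer("This is  a\ttest");
        while (tokens.hasMoreTokens()) {
            System.out.println(tokens.nextToken()); // This, is, a, test (one per line)
        }

        // A second constructor argument supplies custom delimiter characters.
        StringTokenizer parts = new StringTokenizer("red,green;blue", ",;");
        System.out.println(parts.countTokens()); // 3
    }
}
```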
Outline
Pattern Example
- Suppose students are assigned a three-digit code.
  - The first digit represents the major (5 indicates computer science).
  - The second digit represents either in-state (1), out-of-state (2), or international (3).
  - The third digit indicates campus housing:
    - On-campus dorms are numbered 1-7.
    - Students living off-campus are represented by the digit 8.
- The 3-digit pattern to represent computer science majors living on-campus is 5[1-3][1-7].
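The student-code pattern can be tested with String.matches as the regular expression 5[1-3][1-7]. Here is a minimal sketch; the helper isOnCampusCsMajor is hypothetical. matches succeeds only if the whole string fits the pattern.

```java
// Illustrative class; isOnCampusCsMajor is a hypothetical helper.
class StudentCodeDemo {
    // True if code is a computer science major living on campus.
    static boolean isOnCampusCsMajor(String code) {
        return code.matches("5[1-3][1-7]");
    }

    public static void main(String[] args) {
        System.out.println(isOnCampusCsMajor("527"));  // true: CS, out-of-state, dorm 7
        System.out.println(isOnCampusCsMajor("518"));  // false: 8 means off-campus
        System.out.println(isOnCampusCsMajor("427"));  // false: not a CS major
        System.out.println(isOnCampusCsMajor("5270")); // false: the whole string must match
    }
}
```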
Regular Expressions, Class Pattern and Class Matcher
- Regular expression
  - A sequence of characters and symbols
  - Useful for validating input and ensuring data format
  - Facilitates the construction of a compiler
- Regular-expression operations in String
  - Method matches
    - Matches the contents of a String to a regular expression
    - Returns a boolean indicating whether the entire String matched
Regular Expressions, Class Pattern and Class Matcher
- Predefined character classes
  - An escape sequence that represents a group of characters
  - Digit (\d)
    - Numeric character
  - Word character (\w)
    - Any letter, digit, or underscore
  - Whitespace character (\s)
    - Space, tab, carriage return, newline, form feed
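Here is a minimal sketch of the predefined classes used with String.matches (class name illustrative). Inside a Java string literal the backslash must itself be escaped, so \d is written "\\d". The uppercase forms \D, \W, and \S match any character not in the corresponding class.

```java
// Illustrative class; the name PredefinedClassDemo is hypothetical.
class PredefinedClassDemo {
    public static void main(String[] args) {
        System.out.println("2024".matches("\\d+"));    // true: digits only
        System.out.println("20x4".matches("\\d+"));    // false: x is not a digit
        System.out.println("user_42".matches("\\w+")); // true: letters, digits, underscore
        System.out.println("user 42".matches("\\w+")); // false: a space is not a word character
        System.out.println(" \t\n".matches("\\s+"));   // true: whitespace only
        System.out.println("abc".matches("\\D+"));     // true: no digits at all
    }
}
```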
Predefined character classes.
Regular Expressions
- Other patterns
  - Square brackets ([])
    - Match characters that do not have a predefined character class
    - E.g., [aeiou] matches a single character that is a vowel
  - Dash (-)
    - Ranges of characters
    - E.g., [A-Z] matches a single uppercase letter
  - Caret (^)
    - Does not include the indicated characters
    - E.g., [^Z] matches any character other than Z
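Here is a minimal sketch that checks the bracket, range, and negation examples (class name illustrative). A bracketed class matches exactly one character.

```java
// Illustrative class; the name CharacterClassDemo is hypothetical.
class CharacterClassDemo {
    public static void main(String[] args) {
        System.out.println("e".matches("[aeiou]"));  // true: a vowel
        System.out.println("x".matches("[aeiou]"));  // false
        System.out.println("Q".matches("[A-Z]"));    // true: an uppercase letter
        System.out.println("q".matches("[A-Z]"));    // false: lowercase
        System.out.println("Y".matches("[^Z]"));     // true: anything except Z
        System.out.println("Z".matches("[^Z]"));     // false
        System.out.println("ab".matches("[aeiou]")); // false: one class matches one character
    }
}
```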
Regular expression
- Quantifiers
  - Plus (+)
    - Matches one or more occurrences
    - E.g., A+ matches AAA but not the empty string
  - Asterisk (*)
    - Matches zero or more occurrences
    - E.g., A* matches both AAA and the empty string
  - Others in Fig. 29.22
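Here is a minimal sketch of the + and * quantifiers, including one applied to a character class (class name illustrative).

```java
// Illustrative class; the name QuantifierDemo is hypothetical.
class QuantifierDemo {
    public static void main(String[] args) {
        System.out.println("AAA".matches("A+"));  // true
        System.out.println("".matches("A+"));     // false: + needs at least one A
        System.out.println("AAA".matches("A*"));  // true
        System.out.println("".matches("A*"));     // true: * allows zero occurrences

        // One uppercase letter followed by any number of lowercase letters.
        System.out.println("Java".matches("[A-Z][a-z]*")); // true
        System.out.println("java".matches("[A-Z][a-z]*")); // false
    }
}
```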
Quantifiers used in regular expressions.
Regular Expression Examples
Regular expression
- Replacing substrings and splitting strings
  - String method replaceAll
    - Replaces every match of a pattern with new text
  - String method replaceFirst
    - Replaces the first occurrence of a pattern match
  - String method split
    - Divides a string into several substrings, using a pattern as the delimiter
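Here is a minimal sketch of the three methods (class name and sample strings illustrative). Each method takes a regular expression, not plain text, and returns a new result, leaving the original String unchanged.

```java
import java.util.Arrays;

// Illustrative class; the name ReplaceSplitDemo is hypothetical.
class ReplaceSplitDemo {
    public static void main(String[] args) {
        String text = "a1b22c333";
        System.out.println(text.replaceAll("\\d+", "#"));   // a#b#c#: every run of digits
        System.out.println(text.replaceFirst("\\d+", "#")); // a#b22c333: only the first run

        String list = "one, two,three";
        String[] parts = list.split(",\\s*");               // comma plus optional whitespace
        System.out.println(Arrays.toString(parts));         // [one, two, three]
    }
}
```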
Regular expression
- Class Pattern
  - Represents a compiled regular expression
- Class Matcher
  - Contains a regular-expression pattern and a CharSequence to match against
- Interface CharSequence
  - Allows read access to a sequence of characters
  - String and StringBuffer implement CharSequence
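Here is a minimal sketch of the Pattern/Matcher workflow from java.util.regex; findAll is a hypothetical helper. Pattern.compile builds the Pattern once. pattern.matcher(input) pairs it with any CharSequence. find() searches for successive matches, while matches() tests the whole input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative class; findAll is a hypothetical helper.
class PatternMatcherDemo {
    // Compiled once and reused for every input.
    private static final Pattern THREE_DIGITS = Pattern.compile("\\d{3}");

    // Returns every run of three digits found anywhere in the input.
    static List<String> findAll(CharSequence input) {
        Matcher matcher = THREE_DIGITS.matcher(input);
        List<String> found = new ArrayList<>();
        while (matcher.find()) {        // search for the next match
            found.add(matcher.group()); // text of the current match
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findAll("Rooms 101, 220 and 7")); // [101, 220]

        // StringBuffer is a CharSequence, so it can be matched directly.
        StringBuffer buf = new StringBuffer("555");
        System.out.println(THREE_DIGITS.matcher(buf).matches()); // true: the whole input matches
    }
}
```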
Regular Expression Examples
Matching
- Matches a 2-character string whose first character may be any uppercase letter between A and G, and whose second character may be any digit except 4
- Matches a single character that may be any letter except p, q, r, s, or t
- A. str.matches("[a-zA-Z&&[^pqrst]]")
- B. str.matches("[A-G][0-9&&[^4]]")
- Answer: the first description corresponds to B; the second corresponds to A.
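The two descriptions can be written in Java using character-class intersection (&&), which keeps only characters in both classes. The extracted slide text lost its symbols, so these exact patterns are an assumption consistent with the descriptions. Only lowercase p to t are excluded, so uppercase P to T still match the first pattern.

```java
// Illustrative class; the pattern strings are reconstructions, not verbatim from the slides.
class MatchingAnswerDemo {
    public static void main(String[] args) {
        String letterNotPQRST = "[a-zA-Z&&[^pqrst]]";   // a letter AND not one of p, q, r, s, t
        String upperThenDigitNot4 = "[A-G][0-9&&[^4]]"; // A-G, then a digit other than 4

        System.out.println("a".matches(letterNotPQRST));      // true
        System.out.println("q".matches(letterNotPQRST));      // false: excluded
        System.out.println("P".matches(letterNotPQRST));      // true: only lowercase is excluded
        System.out.println("C7".matches(upperThenDigitNot4)); // true
        System.out.println("C4".matches(upperThenDigitNot4)); // false: digit 4 excluded
        System.out.println("H7".matches(upperThenDigitNot4)); // false: H is after G
    }
}
```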
Review
- Which character sequence would be used to designate a character pattern of a fixed length of three digits?
  - {3}[0-9]
  - [0-9][0-9][0-9]
  - [0-9]{3}
- Answer: [0-9]{3}; the longer form [0-9][0-9][0-9] describes the same pattern.
Review
- Choose the correct argument for the following code, which checks whether a given string is a number between 100 and 999.
  - str.matches( )
  - [0-9][0-9][1-9]
  - [0-9][0-9][0-9]
  - [1-9][0-9][0-9]
- Answer: [1-9][0-9][0-9], because the first digit of a number from 100 to 999 cannot be 0.
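Here is a minimal sketch that checks the digit patterns from the two review questions (class name illustrative). Because matches tests the entire string, a four-digit number is rejected.

```java
// Illustrative class; the name DigitPatternDemo is hypothetical.
class DigitPatternDemo {
    public static void main(String[] args) {
        // Exactly three digits: the quantifier {3} follows the class it repeats.
        System.out.println("042".matches("[0-9]{3}"));        // true
        System.out.println("042".matches("[0-9][0-9][0-9]")); // true: the same pattern written out
        System.out.println("42".matches("[0-9]{3}"));         // false: only two digits

        // A number from 100 to 999: the first digit cannot be 0.
        String hundreds = "[1-9][0-9][0-9]";
        System.out.println("100".matches(hundreds));  // true
        System.out.println("999".matches(hundreds));  // true
        System.out.println("099".matches(hundreds));  // false
        System.out.println("1000".matches(hundreds)); // false: four digits
    }
}
```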