Title: Languages, grammars, and regular expressions
1 Languages, grammars, and regular expressions
- LING 570
- Fei Xia
- Week 2 10/03/07
2 Today
- Hw1 is due: 1 penalty per hour after 3pm.
- Probability theory (from last time)
- Formal languages, formal grammars, and regular expressions (JM Chapter 2)
- Hw2
3 Coming up
- Next Monday
- Finite-state automaton (JM Chapter 2)
- Next Wed
- Finite-state transducer (JM Section 3.4)
- Hw2 is due
4 Probability theory (from last time)
5 Common trick 1: Maximum likelihood estimation
- An example: toss a coin 3 times and get two heads. What is the probability of getting a head with one toss?
- Maximum likelihood (ML):
- $\hat{\theta} = \arg\max_{\theta} P(\text{data} \mid \theta)$
- In the example,
- $P(X = 2) = 3 \cdot p \cdot p \cdot (1 - p) = 3p^2(1 - p)$
- $\Rightarrow$ when $p = 2/3$, $P(X = 2)$ reaches its maximum.
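As a quick numerical check of the coin example, the sketch below evaluates the likelihood $3p^2(1-p)$ on a grid of candidate values of $p$ and keeps the best one. The function name `likelihood` and the 301-point grid are illustrative choices; in practice the maximizer is found in closed form (for this binomial setting it is $k/n = 2/3$).

```python
from fractions import Fraction
from math import comb

def likelihood(p, n=3, k=2):
    """P(X = k) when X counts heads in n tosses with P(head) = p: C(n, k) p^k (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Brute-force search over p = 0, 1/300, 2/300, ..., 1 using exact rational arithmetic.
grid = [Fraction(i, 300) for i in range(301)]
p_hat = max(grid, key=likelihood)
print(p_hat)  # 2/3
```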
6 Common trick 2: Chain rule
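For reference, the chain rule in its standard general form, written for hypothetical random variables $X_1, \ldots, X_n$ (slide 13 applies it to a word sequence):

```latex
P(X_1, X_2, \ldots, X_n) = P(X_1)\, P(X_2 \mid X_1)\, P(X_3 \mid X_1, X_2) \cdots P(X_n \mid X_1, \ldots, X_{n-1})
```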
7 Common trick 3: Joint probability $\rightarrow$ marginal probability
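The standard way to obtain a marginal probability from a joint probability is to sum out the other variable:

```latex
P(X = x) = \sum_{y} P(X = x, Y = y)
```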
8 Common trick 4: Bayes rule
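Bayes rule in its standard form; the denominator can itself be computed with the marginalization trick from slide 7:

```latex
P(Y \mid X) = \frac{P(X \mid Y)\, P(Y)}{P(X)} = \frac{P(X \mid Y)\, P(Y)}{\sum_{y'} P(X \mid Y = y')\, P(Y = y')}
```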
9 Independent random variables
- Two random variables X and Y are independent iff the value of X has no influence on the value of Y and vice versa.
- $P(X, Y) = P(X)\,P(Y)$
- $P(Y \mid X) = P(Y)$
- $P(X \mid Y) = P(X)$
- In our previous coin examples, $P(X, C) \neq P(X)\,P(C)$
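A small sketch that checks the definition $P(X, Y) = P(X)\,P(Y)$ directly on a finite joint distribution stored as a dictionary. The helper `is_independent` and both toy distributions are hypothetical illustrations, not the coin examples from the previous lecture.

```python
from itertools import product

def is_independent(joint, tol=1e-12):
    """joint maps (x, y) -> P(X=x, Y=y); True iff P(x, y) = P(x) P(y) for every pair."""
    xs = {x for x, _ in joint}
    ys = {y for _, y in joint}
    # Marginals obtained by summing out the other variable.
    px = {x: sum(joint.get((x, y), 0.0) for y in ys) for x in xs}
    py = {y: sum(joint.get((x, y), 0.0) for x in xs) for y in ys}
    return all(abs(joint.get((x, y), 0.0) - px[x] * py[y]) <= tol
               for x, y in product(xs, ys))

# Two fair coins tossed separately.
two_coins = {(a, b): 0.25 for a in "HT" for b in "HT"}
# Hypothetical: Y is always a copy of X.
copied = {("H", "H"): 0.5, ("T", "T"): 0.5}
print(is_independent(two_coins), is_independent(copied))  # True False
```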
10 Conditional independence
- Once we know the value of Z, knowing the value of Y does not help us predict the value of X, and vice versa.
- $P(X, Y \mid Z) = P(X \mid Z)\,P(Y \mid Z)$
- $P(X \mid Y, Z) = P(X \mid Z)$
- $P(Y \mid X, Z) = P(Y \mid Z)$
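The sketch below builds a hypothetical joint distribution in which Z picks one of two coins and X and Y are two tosses of that coin. Because the joint is constructed as $P(z)\,P(x \mid z)\,P(y \mid z)$, the identity $P(X \mid Y, Z) = P(X \mid Z)$ should hold; the numbers (a fair coin and a coin with $P(H) = 0.9$) are invented for illustration.

```python
from itertools import product

# Hypothetical model: Z chooses a coin; X and Y are two tosses of that coin.
P_Z = {"fair": 0.5, "biased": 0.5}
P_HEAD = {"fair": 0.5, "biased": 0.9}

def p_toss(outcome, z):
    """P(outcome | Z = z) for a single toss."""
    return P_HEAD[z] if outcome == "H" else 1 - P_HEAD[z]

# Joint built as P(z) P(x|z) P(y|z), so X and Y are conditionally independent given Z.
joint = {(x, y, z): P_Z[z] * p_toss(x, z) * p_toss(y, z)
         for x, y, z in product("HT", "HT", P_Z)}

def p_x_given_yz(x, y, z):
    return joint[(x, y, z)] / sum(joint[(x2, y, z)] for x2 in "HT")

def p_x_given_z(x, z):
    num = sum(joint[(x, y2, z)] for y2 in "HT")
    den = sum(joint[(x2, y2, z)] for x2 in "HT" for y2 in "HT")
    return num / den

ok = all(abs(p_x_given_yz(x, y, z) - p_x_given_z(x, z)) < 1e-12
         for x, y, z in product("HT", "HT", P_Z))
print(ok)  # True

# Marginally (without knowing Z), the two tosses are dependent.
p_x_h = sum(v for (x, _, _), v in joint.items() if x == "H")
p_xy_hh = sum(v for (x, y, _), v in joint.items() if x == "H" and y == "H")
print(p_xy_hh, p_x_h ** 2)
```

With these numbers $P(X{=}H, Y{=}H) = 0.53$ while $P(X{=}H)^2 = 0.49$: seeing one head makes the biased coin more likely, which raises the chance of a second head.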
11 Independence and conditional independence
- If A and B are independent, are they conditionally independent?
- Example:
- Burglar, Earthquake
- Alarm
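One way to explore the burglar/earthquake/alarm question numerically, assuming a model in which Burglary (B) and Earthquake (E) are independent causes of Alarm (A). All probabilities below are made up for illustration; `prob` is a hypothetical helper that sums the joint over the outcomes satisfying a condition.

```python
from itertools import product

# Hypothetical probabilities, chosen only for illustration.
P_B = {1: 0.01, 0: 0.99}                                            # burglary
P_E = {1: 0.02, 0: 0.98}                                            # earthquake
P_A1 = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}   # P(A=1 | B, E)

# Joint P(b, e, a) = P(b) P(e) P(a | b, e): B and E are independent by construction.
joint = {}
for b, e, a in product((0, 1), repeat=3):
    pa = P_A1[(b, e)] if a == 1 else 1 - P_A1[(b, e)]
    joint[(b, e, a)] = P_B[b] * P_E[e] * pa

def prob(pred):
    """Sum of the joint probability over all (b, e, a) satisfying pred."""
    return sum(p for k, p in joint.items() if pred(*k))

p_b_given_a = prob(lambda b, e, a: b == 1 and a == 1) / prob(lambda b, e, a: a == 1)
p_b_given_a_e = (prob(lambda b, e, a: b == 1 and e == 1 and a == 1)
                 / prob(lambda b, e, a: e == 1 and a == 1))
print(round(p_b_given_a, 3), round(p_b_given_a_e, 3))
```

With these numbers B and E are independent, but once the alarm is observed they are not: learning that an earthquake occurred makes a burglary much less likely (the "explaining away" effect).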
12 Common trick 5: Independence assumption
13 An example
- $P(w_1 w_2 \ldots w_n)$
- $= P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_1, w_2) \cdots P(w_n \mid w_1, \ldots, w_{n-1})$
- $\approx P(w_1)\,P(w_2 \mid w_1) \cdots P(w_n \mid w_{n-1})$
- Why do we make independence assumptions that we know are not true?
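A minimal sketch of the approximation on the last line: a first-order (bigram) model multiplies $P(w_1)$ by $P(w_i \mid w_{i-1})$ for each later word. The probability tables and the function name `bigram_prob` are hypothetical, not estimated from any corpus.

```python
# Hypothetical probabilities for a toy example; not estimated from data.
p_first = {"the": 0.2}
p_bigram = {("the", "cat"): 0.1, ("cat", "sat"): 0.3}

def bigram_prob(words):
    """P(w1 ... wn) approximated as P(w1) * prod_i P(w_i | w_{i-1})."""
    p = p_first.get(words[0], 0.0)
    for prev, cur in zip(words, words[1:]):
        p *= p_bigram.get((prev, cur), 0.0)   # unseen bigram -> probability 0
    return p

print(bigram_prob(["the", "cat", "sat"]))
```

Any bigram missing from the table gets probability 0, which is one motivation for the smoothing methods of Unit 2.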
14 Summary of elementary probability theory
- Basic concepts: sample space, event space, random variable, random vector
- Joint / conditional / marginal probability
- Independence and conditional independence
- Five common tricks:
- Maximum likelihood estimation
- Chain rule
- Calculating marginal probability from joint probability
- Bayes rule
- Independence assumption
15 Formal languages, formal grammars, and regular languages
16 Unit 1
- Formal grammar, language, and regular expression
- Finite-state automaton (FSA)
- Finite-state transducer (FST)
- Morphological analysis using FST
17 Other units in LING 570
- Unit 2: LM and smoothing
- Unit 3: HMM and POS tagging
- Unit 4: Classification and sequence labeling
- Unit 5: Introduction to other common tasks (e.g., IR/IE/WSD)
18 Regular expression
- Two concepts:
- Regular expression in formal language theory
- Regular expression (or pattern) in pattern matching: a way of expressing a pattern for the purpose of matching a string
- Both concepts describe a set of strings.
- The two concepts are closely related, but the latter is often more expressive than the former.
19 Outline
- Formal language
- Regular language
- Regular expression in formal language theory
- Formal grammars
- Regular grammars
- Regular expression in pattern matching
20 Formal languages
21 Definition of formal language
- An alphabet is a finite set of symbols.
- Ex: $\{a, b, c\}$
- A string is a finite sequence of symbols from a particular alphabet, juxtaposed.
- Ex: the string baccab
- Ex: the empty string $\epsilon$
- A formal language is a set of strings defined over some alphabet.
- Ex1: $\{aa, bb, cc, aaaa, abba, acca, baab, bbbb, \ldots\}$
- Ex2: $\{a^n b^n \mid n > 0\}$
- Ex3: the empty set $\emptyset$
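To make the definitions concrete, the sketch below represents the alphabet $\{a, b, c\}$ as a Python set and strings as Python `str` values (with `""` standing for $\epsilon$), and gives a membership test for Ex2. The names `is_string_over` and `in_anbn` are illustrative.

```python
ALPHABET = {"a", "b", "c"}

def is_string_over(s, alphabet=ALPHABET):
    """True iff s is a finite sequence of symbols from the alphabet ("" is ε)."""
    return all(ch in alphabet for ch in s)

def in_anbn(s):
    """Membership test for Ex2: {a^n b^n | n > 0}."""
    n = len(s) // 2
    return n > 0 and len(s) % 2 == 0 and s == "a" * n + "b" * n

print(is_string_over("baccab"), is_string_over(""), is_string_over("abd"))  # True True False
print(in_anbn("aabb"), in_anbn("ab"), in_anbn(""), in_anbn("aab"))          # True True False False
```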
22 Definition of regular languages
- The class of regular languages over an alphabet $\Sigma$ is formally defined as follows:
- The empty set, $\emptyset$, is a regular language.
- $\forall a \in \Sigma \cup \{\epsilon\}$, $\{a\}$ is a regular language.
- If $L_1$ and $L_2$ are regular languages, then so are:
- (a) $L_1 \cdot L_2 = \{xy \mid x \in L_1, y \in L_2\}$ (concatenation)
- (b) $L_1 \cup L_2$ (union or disjunction)
- (c) $L_1^* = \{x_1 x_2 \ldots x_n \mid x_i \in L_1, n \in \mathbb{N}\}$ (Kleene closure)
- There are no other regular languages.
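For finite languages represented as Python sets of strings, operations (a) and (b) are one-liners; the sketch below is illustrative only. Kleene closure (c) is left out because $L_1^*$ is infinite whenever $L_1$ contains a nonempty string.

```python
def concat(l1, l2):
    """L1 · L2 = {xy | x in L1, y in L2}."""
    return {x + y for x in l1 for y in l2}

def union(l1, l2):
    """L1 ∪ L2."""
    return l1 | l2

L1 = {"a", "b"}
L2 = {"c", ""}   # "" plays the role of the empty string ε
print(sorted(concat(L1, L2)))  # ['a', 'ac', 'b', 'bc']
print(sorted(union(L1, L2)))   # ['', 'a', 'b', 'c']
```

Note that concatenating with the empty set gives the empty set, since there is no $y$ to pair with.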
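23 Kleene star
- Another way to define $L^*$:
- $L^2 = L \cdot L$
- $L^n = L^{n-1} \cdot L$
- $L^* = \{\epsilon\} \cup L^1 \cup L^2 \cup \ldots$
- Examples:
- $L = \{a, bc\}$
- $L^2 = \{aa, abc, bca, bcbc\}$
- $L^* = \{\epsilon, a, bc, aa, abc, bca, bcbc, \ldots, abcbca, \ldots\}$, described by the regular expression $(a \mid bc)^*$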
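Since $L^*$ is usually infinite, the sketch below computes $L^n$ exactly but collects only the members of $L^*$ up to a length bound. The helpers `power` and `kleene_star_upto` are hypothetical names, and $L^0 = \{\epsilon\}$ is taken as the starting point.

```python
def power(lang, n):
    """L^0 = {ε}; L^n = L^(n-1) · L."""
    result = {""}
    for _ in range(n):
        result = {x + y for x in result for y in lang}
    return result

def kleene_star_upto(lang, max_len):
    """All members of L* whose length is at most max_len."""
    result, frontier = {""}, {""}
    while frontier:
        # Extend the newest strings by one more element of L; keep only unseen, short ones.
        frontier = {x + y for x in frontier for y in lang if len(x + y) <= max_len} - result
        result |= frontier
    return result

print(sorted(power({"a", "bc"}, 2)))  # ['aa', 'abc', 'bca', 'bcbc']
print("abcbca" in kleene_star_upto({"a", "bc"}, 6))  # True
```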
24 Properties
- Regular languages are closed under:
- Concatenation
- Union
- Kleene closure
- Regular languages are also closed under:
- Intersection: $L_1 \cap L_2$
- Difference: $L_1 - L_2$
- Complementation: $\Sigma^* - L_1$
- Reversal
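25 Are the following languages regular?
- $\{a, aa, aaa, \ldots\}$
- Any finite set of strings
- $\{xy \mid x \in \Sigma^*, \text{ and } y \text{ is the reverse of } x\}$
- $\{xx \mid x \in \Sigma^*\}$
- $\{a^n b^n \mid n \in \mathbb{N}\}$
- $\{a^n b^n c^n \mid n \in \mathbb{N}\}$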
26 Regular expression
27 Definition of regular expression (as in formal language theory)
- The set of regular expressions is defined as follows:
- (1) Every symbol of $\Sigma$ is a regular expression.
- (2) $\epsilon$ is a regular expression.
- (3) If $r_1$ and $r_2$ are regular expressions, so are $(r_1)$, $r_1 r_2$, $r_1 \mid r_2$, and $r_1^*$.
- (4) Nothing else is a regular expression.
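A sketch of the formation rules as Python classes, with a membership test based on Brzozowski derivatives (a standard technique that is not part of these slides). The class names are hypothetical; an extra `Empty` class for $\emptyset$ is needed internally by the derivative, and the grouping of rule (3)'s parentheses is implicit in the tree structure.

```python
from dataclasses import dataclass

class Regex:
    """Base class for formal regular expressions."""

@dataclass(frozen=True)
class Empty(Regex):   # ∅: matches nothing (internal helper for derivatives)
    pass

@dataclass(frozen=True)
class Eps(Regex):     # rule (2): ε
    pass

@dataclass(frozen=True)
class Sym(Regex):     # rule (1): a symbol of Σ
    c: str

@dataclass(frozen=True)
class Cat(Regex):     # rule (3): r1 r2
    r1: Regex
    r2: Regex

@dataclass(frozen=True)
class Alt(Regex):     # rule (3): r1 | r2
    r1: Regex
    r2: Regex

@dataclass(frozen=True)
class Star(Regex):    # rule (3): r1*
    r: Regex

def nullable(r):
    """True iff the empty string is in L(r)."""
    if isinstance(r, (Eps, Star)):
        return True
    if isinstance(r, Cat):
        return nullable(r.r1) and nullable(r.r2)
    if isinstance(r, Alt):
        return nullable(r.r1) or nullable(r.r2)
    return False  # Empty or Sym

def deriv(r, c):
    """Derivative of r by symbol c: a regex for {w | c + w in L(r)}."""
    if isinstance(r, Sym):
        return Eps() if r.c == c else Empty()
    if isinstance(r, Alt):
        return Alt(deriv(r.r1, c), deriv(r.r2, c))
    if isinstance(r, Star):
        return Cat(deriv(r.r, c), r)
    if isinstance(r, Cat):
        left = Cat(deriv(r.r1, c), r.r2)
        return Alt(left, deriv(r.r2, c)) if nullable(r.r1) else left
    return Empty()  # Empty or Eps

def matches(r, s):
    """s is in L(r) iff the derivative of r by every symbol of s is nullable."""
    for c in s:
        r = deriv(r, c)
    return nullable(r)

# (ab)*
ab_star = Star(Cat(Sym("a"), Sym("b")))
print([w for w in ["", "ab", "abab", "aba", "ba"] if matches(ab_star, w)])  # ['', 'ab', 'abab']
```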
28 Examples
- $ab^*c$
- $a\,(0 \mid 1 \mid 2 \mid \ldots \mid 9)\,b$
- $(CV \mid CCV)\,C?C?$, where C is a consonant and V is a vowel
- Other operations that we can use:
- $a^+ = a\,a^*$
- $a? = (a \mid \epsilon)$
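The shorthand operators can be sanity-checked with Python's `re` module by comparing two patterns on every string up to a small length. The helper `agree` is hypothetical, and a bounded check like this is evidence, not a proof, that two patterns describe the same language.

```python
import re
from itertools import product

def agree(p1, p2, alphabet="ab", max_len=4):
    """True iff p1 and p2 fully match exactly the same strings of length <= max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if bool(re.fullmatch(p1, s)) != bool(re.fullmatch(p2, s)):
                return False
    return True

print(agree(r"a+", r"aa*"), agree(r"a?", r"(a|)"))  # True True
print(agree(r"a+", r"a*"))  # False: a* also matches the empty string
```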
29 Relation between regular languages and regular expressions
- With every regular expression we can associate a regular language.
- Conversely, every regular language can be obtained from a regular expression.
- Examples:
- Regex: $ab^*c$
- Regular language: $\{ac, abc, abbc, \ldots\}$
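The association from a regular expression to its language can be made visible by listing the language's members up to a length bound. The sketch assumes Python's `re.fullmatch` as the matcher and the alphabet $\{a, b, c\}$; `language_sample` is a hypothetical helper.

```python
import re
from itertools import product

def language_sample(pattern, alphabet, max_len):
    """All strings over `alphabet` of length <= max_len that fully match `pattern`."""
    return [s for n in range(max_len + 1)
            for s in ("".join(t) for t in product(alphabet, repeat=n))
            if re.fullmatch(pattern, s)]

print(language_sample(r"ab*c", "abc", 4))  # ['ac', 'abc', 'abbc']
```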
30 Formal grammar
31 Definition of formal grammar
- A formal grammar is a concise description of a formal language. It is a tuple $(N, \Sigma, P, S)$ consisting of:
- A finite set $N$ of nonterminal symbols
- A finite set $\Sigma$ of terminal symbols that is disjoint from $N$
- A finite set $P$ of production rules, each of the form
- $(\Sigma \cup N)^*\, N\, (\Sigma \cup N)^* \rightarrow (\Sigma \cup N)^*$
- A distinguished symbol $S \in N$ that is the start symbol
32 Chomsky hierarchy
- The left-hand side of a rule must contain at least one nonterminal.
- $\alpha, \beta, \gamma \in (N \cup \Sigma)^*$, $A, B \in N$, $a \in \Sigma$
- Type 0: unrestricted grammar; no other constraints.
- Type 1: context-sensitive grammar
- The rules must be of the form $\alpha A \beta \rightarrow \alpha \gamma \beta$ (with $\gamma$ nonempty)
- Type 2: context-free grammar (CFG)
- The rules must be of the form $A \rightarrow \gamma$
- Type 3: regular grammar. The rules are of the forms:
- Right regular grammar: $A \rightarrow a$, $A \rightarrow aB$, or $A \rightarrow \epsilon$
- Left regular grammar: $A \rightarrow a$, $A \rightarrow Ba$, or $A \rightarrow \epsilon$
- Are there other kinds of grammars?
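A sketch that checks which rule forms of the hierarchy a grammar satisfies and returns the most restrictive type. Assumptions: `classify` is a hypothetical helper; rules are `(lhs, rhs)` pairs of symbol tuples; the input is assumed to already satisfy the "at least one nonterminal on the left" condition; type 1 uses the convention that $\gamma$ is nonempty (the usual $S \rightarrow \epsilon$ exception is ignored); and a type 3 grammar must be all right-regular or all left-regular.

```python
def classify(rules, nonterminals):
    """Most restrictive Chomsky type (3, 2, 1, or 0) whose rule form every rule fits."""
    N = set(nonterminals)

    def regular(lhs, rhs, right):
        if len(lhs) != 1 or lhs[0] not in N:
            return False
        if rhs == () or (len(rhs) == 1 and rhs[0] not in N):
            return True                      # A -> ε or A -> a
        if len(rhs) == 2:
            t, nt = (rhs[0], rhs[1]) if right else (rhs[1], rhs[0])
            return t not in N and nt in N    # A -> aB (right) or A -> Ba (left)
        return False

    def context_free(lhs, rhs):
        return len(lhs) == 1 and lhs[0] in N  # A -> γ

    def context_sensitive(lhs, rhs):
        # Look for a split lhs = α A β such that rhs = α γ β with γ nonempty.
        for i, sym in enumerate(lhs):
            if sym in N:
                alpha, beta = lhs[:i], lhs[i + 1:]
                if (len(rhs) > len(alpha) + len(beta)
                        and rhs[:len(alpha)] == alpha
                        and rhs[len(rhs) - len(beta):] == beta):
                    return True
        return False

    if (all(regular(lhs, rhs, right=True) for lhs, rhs in rules)
            or all(regular(lhs, rhs, right=False) for lhs, rhs in rules)):
        return 3
    if all(context_free(lhs, rhs) for lhs, rhs in rules):
        return 2
    if all(context_sensitive(lhs, rhs) for lhs, rhs in rules):
        return 1
    return 0

# S -> a S b, S -> c
grammar = [(("S",), ("a", "S", "b")), (("S",), ("c",))]
print(classify(grammar, {"S"}))  # 2
```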
33 Strings generated from a grammar
- The rules are:
- $S \rightarrow x \mid y \mid z \mid S + S \mid S - S \mid S * S \mid S / S \mid (S)$
- What strings can be generated?
- A grammar is ambiguous if there exists at least one string that has multiple parse trees.
- Is this grammar ambiguous?
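One way to probe the ambiguity question is to count parse trees with a memoized recursion over substrings. `count_parses` is a hypothetical helper; it assumes input without spaces and treats each character as one token.

```python
from functools import lru_cache

ATOMS = set("xyz")
BINARY_OPS = set("+-*/")

def count_parses(s):
    """Number of parse trees for s under S -> x | y | z | S+S | S-S | S*S | S/S | (S)."""
    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways the substring s[i:j] can be derived from S.
        if j - i == 1:
            return 1 if s[i] in ATOMS else 0
        total = 0
        if j - i >= 3 and s[i] == "(" and s[j - 1] == ")":
            total += count(i + 1, j - 1)             # S -> (S)
        for k in range(i + 1, j - 1):
            if s[k] in BINARY_OPS:                    # S -> S op S, split at s[k]
                total += count(i, k) * count(k + 1, j)
        return total
    return count(0, len(s)) if s else 0

print(count_parses("x+y*z"), count_parses("(x+y)*z"))  # 2 1
```

The string x+y*z has two parse trees, one grouping it as (x+y)*z and one as x+(y*z), so the grammar is ambiguous.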
34 Languages generated by grammars
- Given a grammar $G$, $L(G)$ is the set of strings that can be generated from $G$.
- Ex: $G = (N, \Sigma, P, S)$
- $N = \{S\}$, $\Sigma = \{a, b, c\}$
- $P$: $S \rightarrow a S b$, $S \rightarrow c$
- What is $L(G)$?
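A sketch that enumerates members of $L(G)$ by breadth-first leftmost derivation, stopping once a sentential form has too many terminals. Pruning on the terminal count is safe for context-free rules (terminals are never rewritten), and it terminates for this grammar because every step adds at least one terminal. `RULES` and `generate` are hypothetical names.

```python
from collections import deque

RULES = {"S": [("a", "S", "b"), ("c",)]}   # S -> a S b | c
NONTERMINALS = set(RULES)

def generate(start="S", max_len=7):
    """Terminal strings of length <= max_len derivable from `start`, shortest first."""
    results, seen = set(), {(start,)}
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        idx = next((i for i, sym in enumerate(form) if sym in NONTERMINALS), None)
        if idx is None:                      # no nonterminal left: a member of L(G)
            results.add("".join(form))
            continue
        for rhs in RULES[form[idx]]:         # rewrite the leftmost nonterminal
            new = form[:idx] + rhs + form[idx + 1:]
            if sum(sym not in NONTERMINALS for sym in new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return sorted(results, key=len)

print(generate(max_len=7))  # ['c', 'acb', 'aacbb', 'aaacbbb']
```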
35 The relation between regular grammars and regular languages
- Regular grammars describe exactly the regular languages.
- All of the following are equivalent:
- Regular languages
- Regular grammars
- Regular expressions
- Finite-state automata (FSA)
36 Relation between grammars and languages (from the Wikipedia page)
37 How about human languages?
- Are they formal languages?
- What is the alphabet?
- What is a string?
- What type of formal languages are they?
38 Outline
- Formal language
- Regular language
- Regular expression in formal language theory
- Formal grammar
- Regular grammar
- Patterns in pattern matching (JM 2.1)
39 Patterns in Perl
- ab ab
- . match any character (except a newline, by default)
- ^ the starting position in a string
- $ the ending position in a string
- (...) defines a marked subexpression
- a* match a zero or more times
- a+ match a one or more times
- a? match a zero or one time
- a{n,m} a appears n to m times
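Python's `re` module uses the same notation as Perl for these constructs, so each one can be checked with `re.search`. The test strings are arbitrary examples.

```python
import re

# (pattern, text, whether re.search finds a match)
CASES = [
    (r"a.c", "xabcx", True),      # . matches any character ...
    (r"a.c", "a\nc", False),      # ... except a newline (by default)
    (r"^ab", "abc", True),        # ^ anchors at the start of the string
    (r"^ab", "cab", False),
    (r"bc$", "abc", True),        # $ anchors at the end of the string
    (r"^ab*c$", "ac", True),      # b* : zero or more b
    (r"^ab+c$", "ac", False),     # b+ : one or more b
    (r"^ab?c$", "abbc", False),   # b? : zero or one b
    (r"^a{2,3}$", "aaa", True),   # a{2,3} : two to three a
    (r"^a{2,3}$", "aaaa", False),
]
for pattern, text, expected in CASES:
    assert (re.search(pattern, text) is not None) == expected, (pattern, text)

# (...) marks a subexpression whose matched text can be retrieved afterwards.
m = re.search(r"(\d+)-(\d+)", "pages 10-25")
print(m.group(1), m.group(2))  # 10 25
```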
40 Special symbols in the patterns
- \s match any whitespace char
- \d match any digit
- \w match any letter, digit, or underscore
- \S match any non-whitespace char
- Escaped special characters match themselves literally, e.g., \-, \., \?
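A short check of these character classes with Python's `re.findall`, which uses the same escapes as Perl here; the sample text is made up.

```python
import re

text = "LING 570, week 2: hw_1 due"
print(re.findall(r"\d", text))    # ['5', '7', '0', '2', '1']
print(re.findall(r"\w+", text))   # ['LING', '570', 'week', '2', 'hw_1', 'due']
print(re.findall(r"\S+", text))   # ['LING', '570,', 'week', '2:', 'hw_1', 'due']
print(len(re.findall(r"\s", text)))         # 5 (the spaces)
print(re.findall(r"\.", "3.14 and 2.71"))   # ['.', '.'] (escaped dot is a literal dot)
```

Note that \w includes the underscore, so hw_1 comes back as a single token.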
41 Examples
- Integer: (\+|\-)?\d+
- Real: (\+|\-)?\d+\.\d+
- Scientific notation: (\+|\-)?\d+(\.\d+)?e(\+|\-)?\d+
- Any of the three: (\+|\-)?\d+(\.\d+)?(e(\+|\-)?\d+)?
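The four number patterns can be tried in Python's `re` syntax. Assumptions: the sign is written `(\+|\-)?` and digit runs are `\d+` (one or more digits); `re.fullmatch` is used to match the whole string, as `^...$` would in Perl. The sample strings are arbitrary.

```python
import re

INTEGER = r"(\+|\-)?\d+"
REAL = r"(\+|\-)?\d+\.\d+"
SCIENTIFIC = r"(\+|\-)?\d+(\.\d+)?e(\+|\-)?\d+"
ANY_NUMBER = r"(\+|\-)?\d+(\.\d+)?(e(\+|\-)?\d+)?"

SAMPLES = ["42", "-7", "3.14", "+6.02e23", "1e-9", "3.", ".5", "e5"]
for s in SAMPLES:
    kinds = [name for name, pat in [("integer", INTEGER), ("real", REAL),
                                    ("scientific", SCIENTIFIC)]
             if re.fullmatch(pat, s)]
    print(s, kinds, bool(re.fullmatch(ANY_NUMBER, s)))
```

The combined pattern accepts exactly the strings accepted by at least one of the other three; none of them accept "3." or ".5", which need a digit on both sides of the dot.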
42 Patterns in Perl and regular expressions
- /(.)\1/ describes $\{xx \mid x \in \Sigma\}$
- /(.)a(.)\1\2/ describes $\{xayxy \mid x, y \in \Sigma\}$
- The extra power comes from the ability to refer back to marked subexpressions.
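Python's `re` supports the same backreference syntax, so the two patterns can be tried directly; `re.fullmatch` anchors the whole string, like wrapping the Perl pattern in `^...$`. The last line swaps in `(.+)` (an assumption, not on the slide) to show a backreference describing the copy language $\{ww \mid w \in \Sigma^+\}$, which no formal regular expression can describe.

```python
import re

# (.)\1 : one character followed by the same character again
print(bool(re.fullmatch(r"(.)\1", "aa")), bool(re.fullmatch(r"(.)\1", "ab")))  # True False
# (.)a(.)\1\2 : strings of the form x a y x y
print(bool(re.fullmatch(r"(.)a(.)\1\2", "bacbc")))  # True
print(bool(re.fullmatch(r"(.)a(.)\1\2", "bacca")))  # False
# (.+)\1 : any nonempty string followed by an exact copy of itself
print(bool(re.fullmatch(r"(.+)\1", "abcabc")), bool(re.fullmatch(r"(.+)\1", "abcab")))  # True False
```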
43 Outline
- Formal language
- Regular language
- Regular expression in formal language theory
- Formal grammars
- Regular grammars
- Regex patterns in pattern matching
44 Homework 2
45
- Part I: probability
- Part II: formal grammar
- Part III: regular expression