Languages, grammars, and regular expressions - PowerPoint PPT Presentation

1
Languages, grammars, and regular expressions
  • LING 570
  • Fei Xia
  • Week 2 10/03/07

2
Today
  • Hw1 is due: 1% penalty per hour after 3pm.
  • Probability Theory (from last time)
  • Formal languages, formal grammars, and regular
    expressions (JM Chapter 2)
  • Hw2

3
Coming up
  • Next Monday:
  • Finite-state automaton: JM Chapter 2
  • Next Wed:
  • Finite-state transducer: JM Section 3.4
  • Hw2 is due

4
Probability theory (from last time)
5
Common trick #1: Maximum likelihood estimation
  • An example: toss a coin 3 times, and get two
    heads. What is the probability of getting a head
    with one toss?
  • Maximum likelihood (ML):
  • $\theta_{ML} = \arg\max_{\theta} P(\text{data} \mid \theta)$
  • In the example,
  • $P(X = 2) = 3 \cdot p \cdot p \cdot (1 - p) = 3p^2(1 - p)$
  • ⇒ when $p = 2/3$, $P(X = 2)$ reaches the maximum.
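
As a quick numerical check of this example (a sketch, not part of the original slides; the function name `likelihood` and the grid resolution are illustrative choices), the likelihood of two heads in three tosses can be maximized over a grid of candidate values of p:

```python
from math import comb

def likelihood(p, heads=2, tosses=3):
    """Binomial likelihood P(X = heads) for a coin with head probability p."""
    return comb(tosses, heads) * p**heads * (1 - p)**(tosses - heads)

# Search a fine grid of candidate values for p in [0, 1].
grid = [i / 3000 for i in range(3001)]
p_ml = max(grid, key=likelihood)
print(round(p_ml, 4))  # 0.6667
```

The grid maximum agrees with the closed-form answer p = 2/3, where the likelihood equals 4/9.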

6
Common trick #2: Chain rule
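
The standard form of the chain rule, stated here for reference (the slide's own formula is not in the transcript, so the event names $A_1, \ldots, A_n$ are an assumption), is:

```latex
P(A_1, A_2, \ldots, A_n) = P(A_1)\, P(A_2 \mid A_1)\, P(A_3 \mid A_1, A_2) \cdots P(A_n \mid A_1, \ldots, A_{n-1})
```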
7
Common trick #3: Joint prob → Marginal prob
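
For reference, the usual way to obtain a marginal probability from a joint probability is to sum over the other variable (the variable names $X$ and $Y$ are an assumption; the slide's own formula is not in the transcript):

```latex
P(X = x) = \sum_{y} P(X = x, Y = y)
```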
8
Common trick #4: Bayes rule
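
For reference, Bayes' rule in its standard form (the event names $A$ and $B$ are an assumption; the slide's own formula is not in the transcript):

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
```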
9
Independent random variables
  • Two random variables X and Y are independent iff
    the value of X has no influence on the value of Y
    and vice versa.
  • $P(X, Y) = P(X)\, P(Y)$
  • $P(Y \mid X) = P(Y)$
  • $P(X \mid Y) = P(X)$
  • Our previous coin examples: $P(X, C) \neq P(X)\, P(C)$

10
Conditional independence
  • Once we know the value of Z, knowing the value of
    Y does not help us predict the value of X, and
    vice versa.
  • $P(X, Y \mid Z) = P(X \mid Z)\, P(Y \mid Z)$
  • $P(X \mid Y, Z) = P(X \mid Z)$
  • $P(Y \mid X, Z) = P(Y \mid Z)$

11
Independence and conditional independence
  • If A and B are independent, are they conditionally
    independent?
  • Example
  • Burglar, Earthquake
  • Alarm
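
A small worked version of this example suggests the answer is no. The sketch below uses hypothetical numbers (burglary probability 1/10, earthquake probability 1/5) and assumes, for simplicity, that the alarm rings exactly when at least one of the two happens; none of these values come from the slides.

```python
from fractions import Fraction
from itertools import product

# Hypothetical numbers: burglary (B) and earthquake (E) are independent causes,
# and the alarm (A) rings exactly when at least one of them happens.
P_B = Fraction(1, 10)
P_E = Fraction(1, 5)

# Joint distribution over (b, e, a), built from the independence of B and E.
joint = {}
for b, e in product([0, 1], repeat=2):
    p = (P_B if b else 1 - P_B) * (P_E if e else 1 - P_E)
    joint[(b, e, int(b or e))] = p

def prob(pred):
    """Total probability of the outcomes (b, e, a) that satisfy pred."""
    return sum(p for outcome, p in joint.items() if pred(*outcome))

# Marginally, B and E are independent: P(B, E) = P(B) P(E).
p_be = prob(lambda b, e, a: b and e)
independent = p_be == prob(lambda b, e, a: b) * prob(lambda b, e, a: e)

# Conditioned on the alarm ringing, they are not: P(B, E | A) != P(B | A) P(E | A).
p_a = prob(lambda b, e, a: a)
p_b_given_a = prob(lambda b, e, a: b and a) / p_a
p_e_given_a = prob(lambda b, e, a: e and a) / p_a
p_be_given_a = prob(lambda b, e, a: b and e and a) / p_a
cond_independent = p_be_given_a == p_b_given_a * p_e_given_a

print(independent, cond_independent)  # True False
```

Once the alarm is known to have rung, learning that there was an earthquake makes a burglary less likely, so the two causes are no longer independent.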

12
Common trick #5: Independence assumption
13
An example
  • $P(w_1 w_2 \ldots w_n)$
  • $= P(w_1)\, P(w_2 \mid w_1)\, P(w_3 \mid w_1 w_2) \cdots$
  • $P(w_n \mid w_1, \ldots, w_{n-1})$
  • $\approx P(w_1)\, P(w_2 \mid w_1) \cdots P(w_n \mid w_{n-1})$
  • Why do we make independence assumptions which we
    know are not true?
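
The approximation above can be sketched in a few lines. This is an illustration only: the three-sentence corpus is invented, the `<s>` start marker is an assumption used so that P(w1) becomes P(w1 | <s>), and the estimates are plain maximum likelihood with no smoothing.

```python
from collections import Counter

# Hypothetical toy corpus; <s> marks the start of each sentence.
corpus = [
    "<s> the dog barks",
    "<s> the cat sleeps",
    "<s> the dog sleeps",
]

unigrams = Counter()
bigrams = Counter()
for sentence in corpus:
    words = sentence.split()
    unigrams.update(words)
    bigrams.update(zip(words, words[1:]))

def bigram_prob(sentence):
    """Approximate P(w1 ... wn) as the product of ML estimates of P(wi | wi-1)."""
    words = sentence.split()
    p = 1.0
    for prev, cur in zip(words, words[1:]):
        p *= bigrams[(prev, cur)] / unigrams[prev]
    return p

print(bigram_prob("<s> the dog sleeps"))
```

Each factor needs only pair counts, which is one practical reason for making the assumption: the full conditional P(wn | w1, ..., wn-1) would require counts of entire sentence prefixes, most of which never occur in any corpus.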

14
Summary of elementary probability theory
  • Basic concepts: sample space, event space, random
    variable, random vector
  • Joint / conditional /marginal probability
  • Independence and conditional independence
  • Five common tricks
  • Max likelihood estimation
  • Chain rule
  • Calculating marginal probability from joint
    probability
  • Bayes rule
  • Independence assumption

15
Formal languages, formal grammars and regular
languages
16
Unit 1
  • Formal grammar, language and regular expression
  • Finite-state automaton (FSA)
  • Finite-state transducer (FST)
  • Morphological analysis using FST

17
Other units in ling570
  • Unit 2: LM and smoothing
  • Unit 3: HMM and POS tagging
  • Unit 4: Classification and sequence labeling
  • Unit 5: Introduction to other common tasks
    (e.g., IR/IE/WSD)

18
Regular expression
  • Two concepts:
  • Regular expression in formal language theory
  • Regular expression (or pattern) in pattern
    matching: it is a way of expressing a pattern for
    the purpose of matching a string
  • Both concepts describe a set of strings.
  • The two concepts are closely related, but the
    latter is often more expressive than the former.

19
Outline
  • Formal language
  • Regular language
  • Regular expression in formal language theory
  • Formal grammars
  • Regular grammars
  • Regular expression in pattern matching

20
Formal languages
21
Definition of formal language
  • An alphabet is a finite set of symbols
  • Ex: $\{a, b, c\}$
  • A string is a finite sequence of symbols from a
    particular alphabet juxtaposed
  • Ex: the string baccab
  • Ex: the empty string $\epsilon$
  • A formal language is a set of strings defined
    over some alphabet.
  • Ex1: $\{aa, bb, cc, aaaa, abba, acca, baab, bbbb, \ldots\}$
  • Ex2: $\{a^n b^n \mid n > 0\}$
  • Ex3: the empty set $\emptyset$

22
Definition of regular languages
  • The class of regular languages over an alphabet
    $\Sigma$ is formally defined as:
  • The empty set, $\emptyset$, is a regular language.
  • $\forall a \in \Sigma \cup \{\epsilon\}$, $\{a\}$ is a regular language.
  • If $L_1$ and $L_2$ are regular languages, then so are:
  • (a) $L_1 \cdot L_2 = \{xy \mid x \in L_1, y \in L_2\}$
    (concatenation)
  • (b) $L_1 \cup L_2$ (union or disjunction)
  • (c) $L_1^* = \{x_1 x_2 \ldots x_n \mid x_i \in L_1, n \in \mathbb{N}\}$
    (Kleene closure)
  • There are no other regular languages.

23
Kleene star
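  • Another way to define $L^*$:
  • $L^2 = L \cdot L$
  • $L^n = L^{n-1} \cdot L$
  • $L^* = \{\epsilon\} \cup L^1 \cup L^2 \cup \ldots$
  • Examples:
  • $L = \{a, bc\}$
  • $L^2 = \{aa, abc, bca, bcbc\}$
  • $L^* = \{\epsilon, a, bc, aa, abc, bca, \ldots\} = (a \mid bc)^*$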
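
The definitions above can be tried out directly on finite sets of strings. The following is a minimal sketch; the helper names `concat`, `power`, and `kleene_star` are illustrative, and because the Kleene closure is infinite, only its strings up to a chosen length are enumerated.

```python
def concat(l1, l2):
    """Concatenation of two languages: every x from l1 followed by every y from l2."""
    return {x + y for x in l1 for y in l2}

def power(lang, n):
    """L^n, taking L^0 to be the language containing only the empty string."""
    result = {""}
    for _ in range(n):
        result = concat(result, lang)
    return result

def kleene_star(lang, max_len):
    """The strings of L* whose length is at most max_len (L* itself is infinite)."""
    result = {""}
    frontier = {""}
    while frontier:
        # Extend the newest strings by one more element of L, keeping only new, short ones.
        frontier = {s for s in concat(frontier, lang) if len(s) <= max_len} - result
        result |= frontier
    return result

L = {"a", "bc"}
print(sorted(power(L, 2)))  # ['aa', 'abc', 'bca', 'bcbc']
print(sorted(kleene_star(L, 3), key=lambda s: (len(s), s)))
```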

24
Properties
  • Regular languages are closed under
  • Concatenation
  • Union
  • Kleene closure
  • Regular languages are also closed under
  • Intersection: $L_1 \cap L_2$
  • Difference: $L_1 - L_2$
  • Complementation: $\Sigma^* - L_1$
  • Reversal

25
Are the following languages regular?
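  • $\{a, aa, aaa, \ldots\}$
  • Any finite set of strings
  • $\{xy \mid x \in \Sigma^*$, and $y$ is the reverse of $x\}$
  • $\{xx \mid x \in \Sigma^*\}$
  • $\{a^n b^n \mid n \in \mathbb{N}\}$
  • $\{a^n b^n c^n \mid n \in \mathbb{N}\}$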

26
Regular expression
27
Definition of regular expression (as in formal
language theory)
  • The set of regular expressions is defined as
    follows:
  • (1) Every symbol of $\Sigma$ is a regular expression.
  • (2) $\epsilon$ is a regular expression.
  • (3) If r1 and r2 are regular expressions, so are
    (r1), r1 r2, r1 | r2, r1*
  • (4) Nothing else is a regular expression.

28
Examples
  • ab*c
  • a (0|1|2|..|9)* b
  • (CV | CCV)+ C?C?, where C is a consonant and V is a vowel
  • Other operations that we can use:
  • a+ = a a*
  • a? = (a | ε)

29
Relation between regular language and Regex
  • With every regular expression we can associate a
    regular language.
  • Conversely, every regular language can be
    obtained from a regular expression.
  • Examples
  • Regex: ab*c
  • Regular language: $\{ac, abc, abbc, \ldots\}$
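
One way to see the language behind a regular expression is to enumerate every short string over the alphabet and keep those the expression matches in full. The sketch below does this with Python's `re` module for the regex ab*c; the helper name `language_up_to` and the length bound are illustrative.

```python
import re
from itertools import product

def language_up_to(pattern, alphabet, max_len):
    """All strings over alphabet, of length <= max_len, that the pattern matches in full."""
    regex = re.compile(pattern)
    matches = []
    for n in range(max_len + 1):
        for symbols in product(alphabet, repeat=n):
            s = "".join(symbols)
            if regex.fullmatch(s):
                matches.append(s)
    return matches

print(language_up_to("ab*c", "abc", 4))  # ['ac', 'abc', 'abbc']
```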

30
Formal grammar
31
Definition of formal grammar
  • A formal grammar is a concise description of a
    formal language. It is a $(N, \Sigma, P, S)$ tuple:
  • A finite set $N$ of nonterminal symbols
  • A finite set $\Sigma$ of terminal symbols that is
    disjoint from $N$
  • A finite set $P$ of production rules, each of the
    form
  • $(\Sigma \cup N)^* \, N \, (\Sigma \cup N)^* \to (\Sigma \cup N)^*$
  • A distinguished symbol $S \in N$ that is the start
    symbol

32
Chomsky hierarchy
  • The left-hand side of a rule must contain at
    least one non-terminal.
  • $\alpha, \beta, \gamma \in (N \cup \Sigma)^*$, $A, B \in N$, $a \in \Sigma$
  • Type 0: unrestricted grammar: no other
    constraints.
  • Type 1: context-sensitive grammar:
  • The rules must be of the form
    $\alpha A \beta \to \alpha \gamma \beta$
  • Type 2: context-free grammar (CFG):
  • The rules must be of the form $A \to \gamma$
  • Type 3: regular grammar. The rules are of the
    forms:
  • right regular grammar: $A \to a$, $A \to aB$, or
    $A \to \epsilon$
  • left regular grammar: $A \to a$, $A \to Ba$, or
    $A \to \epsilon$
  • Are there other kinds of grammars?
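
The rule shapes for Types 2 and 3 can be checked mechanically. The sketch below is one possible encoding, not something taken from the slides: each rule is a pair of symbol tuples (left-hand side, right-hand side), and the function names are illustrative.

```python
def is_context_free(rules, nonterminals):
    """Type 2: every left-hand side is a single nonterminal."""
    return all(len(lhs) == 1 and lhs[0] in nonterminals for lhs, _ in rules)

def is_right_regular(rules, nonterminals):
    """Type 3 (right regular): every rule is A -> a, A -> aB, or A -> epsilon."""
    def ok(rhs):
        if len(rhs) == 0:                       # A -> epsilon
            return True
        if len(rhs) == 1:                       # A -> a
            return rhs[0] not in nonterminals
        if len(rhs) == 2:                       # A -> aB
            return rhs[0] not in nonterminals and rhs[1] in nonterminals
        return False
    return is_context_free(rules, nonterminals) and all(ok(rhs) for _, rhs in rules)

N = {"S"}
# Rules are (left-hand side, right-hand side) pairs of symbol tuples.
cfg_rules = [(("S",), ("a", "S", "b")), (("S",), ("c",))]   # S -> a S b | c
reg_rules = [(("S",), ("a", "S")), (("S",), ("b",))]        # S -> a S | b

print(is_context_free(cfg_rules, N), is_right_regular(cfg_rules, N))  # True False
print(is_context_free(reg_rules, N), is_right_regular(reg_rules, N))  # True True
```

Every right regular grammar passes the context-free check as well, which reflects the nesting of the hierarchy: each type is a restriction of the one before it.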

33
Strings generated from a grammar
  • The rules are
  • S → x | y | z | S + S | S - S | S * S | S / S | (S)
  • What strings can be generated?
  • A grammar is ambiguous if there exists at least
    one string which has multiple parse trees.
  • Is this grammar ambiguous?
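
One way to explore the ambiguity question is to count parse trees for a given string. The sketch below assumes the rules are S → x | y | z | S + S | S - S | S * S | S / S | (S) and counts derivations by dynamic programming over substrings; the function name `count_parses` is illustrative.

```python
from functools import lru_cache

OPERATORS = {"+", "-", "*", "/"}
VARIABLES = {"x", "y", "z"}

def count_parses(tokens):
    """Number of distinct parse trees for tokens under
    S -> x | y | z | S+S | S-S | S*S | S/S | (S)."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways to derive tokens[i:j] from S.
        if j - i < 1:
            return 0
        if j - i == 1:
            return 1 if tokens[i] in VARIABLES else 0
        total = 0
        if tokens[i] == "(" and tokens[j - 1] == ")":
            total += count(i + 1, j - 1)          # S -> (S)
        for k in range(i + 1, j - 1):
            if tokens[k] in OPERATORS:            # S -> S op S, split at tokens[k]
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))

print(count_parses("x+y*z"))    # 2
print(count_parses("(x+y)*z"))  # 1
```

The string x+y*z has two parse trees, one grouping it as (x+y)*z and one as x+(y*z), so a single such string is enough to show that the grammar is ambiguous.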

34
Languages generated by grammars
  • Given a grammar G, L(G) is the set of strings
    that can be generated from G.
  • Ex: $G = (N, \Sigma, P, S)$
  • $N = \{S\}$, $\Sigma = \{a, b, c\}$
  • $P = \{S \to a S b,\ S \to c\}$
  • What is L(G)?
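
To get a feel for L(G), the derivations can be enumerated. This sketch assumes the two rules are S → a S b and S → c, and the function name `generate` is illustrative; it applies the first rule n times and then finishes with the second.

```python
def generate(max_steps):
    """Strings derivable from S with rules S -> a S b and S -> c,
    applying S -> a S b at most max_steps times."""
    strings = []
    for n in range(max_steps + 1):
        # Apply S -> a S b n times, then finish with S -> c.
        form = "S"
        for _ in range(n):
            form = form.replace("S", "aSb")
        strings.append(form.replace("S", "c"))
    return strings

print(generate(3))  # ['c', 'acb', 'aacbb', 'aaacbbb']
```

The pattern suggests that L(G) is the set of strings a^n c b^n with n ≥ 0.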

35
The relation between regular grammars and regular
languages
  • The regular grammars describe exactly all regular
    languages.
  • All the following are equivalent
  • Regular languages
  • Regular grammars
  • Regular expressions
  • Finite-state automata (FSA)

36
Relation between grammars and languages (from the
Wikipedia page)
37
How about human languages?
  • Are they formal languages?
  • What is the alphabet?
  • What is a string?
  • What type of formal languages are they?

38
Outline
  • Formal language
  • Regular language
  • Regular expression in formal language theory
  • Formal grammar
  • Regular grammar
  • Patterns in pattern matching (JM 2.1)

39
Patterns in Perl
  • ab ab
  • . match any character
  • ^ the starting position in a string
  • $ the ending position in a string
  • (..) defines a marked subexpression
  • a* match a zero or more times
  • a+ match a one or more times
  • a? match a zero or one time
  • a{n,m} a appears n to m times
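
These operators can be tried out quickly. The sketch below uses Python's `re` module rather than Perl, on the assumption that the constructs shown (`.`, `^`, `$`, parentheses, `*`, `+`, `?`, `{n,m}`) behave the same way in both; the test strings are invented for illustration.

```python
import re

# Each entry: (pattern, string, whether the pattern should match the whole string).
examples = [
    (r"ab*", "abbb", True),      # b zero or more times
    (r"ab+", "a", False),        # b at least once
    (r"ab?", "abb", False),      # b at most once
    (r"a{2,3}", "aaa", True),    # a two to three times
    (r"a.c", "abc", True),       # . matches any character
]
for pattern, text, expected in examples:
    assert bool(re.fullmatch(pattern, text)) == expected

# ^ and $ anchor a match to the start and end of the string.
print(bool(re.search(r"^ab", "cab")))   # False
print(bool(re.search(r"ab$", "cab")))   # True

# Parentheses mark a subexpression that can be retrieved after the match.
m = re.search(r"(\d+)-(\d+)", "pages 12-15")
print(m.group(1), m.group(2))           # 12 15
```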

40
Special symbols in the patterns
  • \s match any whitespace char
  • \d match any digit
  • \w match any letter, digit, or underscore
  • \S match any non-whitespace char
  • \+, \-, \., \?, \* match the literal character

41
Examples
  • Integer: (\+|\-)?\d+
  • Real: (\+|\-)?\d+\.\d+
  • Scientific notation: (\+|\-)?\d+(\.\d+)?e(\+|\-)?\d+
  • Any of the three:
  • (\+|\-)?\d+(\.\d+)?(e(\+|\-)?\d+)?
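
The number patterns can be checked against a few sample strings. The sketch below uses Python's `re` module and assumes the patterns read as written in the constants, with an optional sign `(\+|\-)?` and one or more digits `\d+`; the constant names and the helper `matches` are illustrative.

```python
import re

# Assumed readings of the slide's patterns: optional sign, then digits.
INTEGER = r"(\+|\-)?\d+"
REAL = r"(\+|\-)?\d+\.\d+"
SCIENTIFIC = r"(\+|\-)?\d+(\.\d+)?e(\+|\-)?\d+"
ANY_NUMBER = r"(\+|\-)?\d+(\.\d+)?(e(\+|\-)?\d+)?"

def matches(pattern, text):
    """True if the pattern matches the whole string."""
    return re.fullmatch(pattern, text) is not None

for text in ["-42", "+3.14", "6.02e23", "1e-9"]:
    print(text, matches(INTEGER, text), matches(REAL, text),
          matches(SCIENTIFIC, text), matches(ANY_NUMBER, text))
```

The combined pattern accepts all four samples because the fractional part and the exponent are each wrapped in an optional group.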

42
Patterns in Perl and Regex
  • /(.*)\1/ → $\{xx \mid x \in \Sigma^*\}$
  • /(.*)a(.*)\1\2/ → $\{xayxy \mid x, y \in \Sigma^*\}$
  • ⇒ The extra power comes from the ability to refer
    to marked subexpressions.
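
Backreferences can be tried out with Python's `re` module, which supports `\1` and `\2` in the same way for these patterns. This is a sketch under the assumption that the marked groups are `(.*)`; the function names are illustrative, and `fullmatch` is used so that the whole string, not just a substring, must have the required shape.

```python
import re

def is_xx(s):
    """True if s has the form xx for some string x (possibly empty)."""
    return re.fullmatch(r"(.*)\1", s) is not None

def is_xayxy(s):
    """True if s has the form x a y x y for some strings x and y."""
    return re.fullmatch(r"(.*)a(.*)\1\2", s) is not None

print(is_xx("abab"), is_xx("aba"))          # True False
print(is_xayxy("bacbc"), is_xayxy("bacb"))  # True False
```

The language of strings xx is not regular, so no regular expression in the formal-language sense can describe it; the pattern succeeds only because `\1` must repeat whatever the first group matched.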

43
Outline
  • Formal language
  • Regular language
  • Regular expression in formal language theory
  • Formal grammars
  • Regular grammars
  • Regex patterns in pattern matching

44
Homework 2
45
  • Part I: probability
  • Part II: formal grammar
  • Part III: regular expression