Computational Linguistics Introduction - PowerPoint PPT Presentation

Slides: 43
Provided by: AnneSchi3

1
Computational Linguistics Introduction
  • Finite State Machinery and Language Description

2
Acknowledgement
  • The material for this lecture is derived from a
    series of talks given by
  • Dr. Ken Beesley (Xerox European Research Centre,
    Grenoble)
  • in Malta, 2001.

3
Today's Topics
  • Finite State Technology
  • Regular Languages and Relations
  • Review of Set Theory
  • Understand the mathematical operations that can
    be performed on such Languages.
  • Understand how Languages, Relations, Regular
    Expressions, and Networks are interrelated.

4
What is Finite State Technology?
  • Finite State Technology refers to a collection of
    techniques for application of Finite State
    Automata (FSA) to a range of linguistically
    motivated problems.
  • Such techniques include:
    • Design of user languages for specifying FSA
    • Compilation of such languages into efficient
      transition networks
    • Development environments and runtime systems

5
What is Finite-State Technology Good For?
  • Finite-state techniques cannot handle centre
    embedding, e.g.
    the man the dog the cat bit followed ate.
  • They are well suited to lower-level natural
    language processing such as:
    • Tokenization: what is the next word?
    • Spelling error detection: does the next word
      belong to a list?
    • Morphological/phonological analysis/generation
    • Shallow syntactic parsing and chunking

6
Tokenisation Problems
  • VfB Stuttgart scored twice in quick success-ion
    early in the second half on their way to a
    deserved 2-1 victory over Manchester United in
    the Champions League on Wednesday. (Example from
    Mary Dalrymple, University of London)
  • VfB Stuttgart, Manchester United
  • succession
  • 2-1
  • Wednesday
  • Finite state techniques provide a means to
    specify the language of words, thus defining what
    it means to be the next token.
  • There are three ways to specify such languages

7
Languages, Notations and Machines
[Diagram labels: LANGUAGE (set of strings), NOTATION]

8
Languages, Notations and Machines
[Diagram labels: FINITE STATE LANGUAGE, FINITE STATE NOTATION]

9
FINITE STATE AUTOMATA: preliminary definition
  • A finite state automaton includes:
    • A finite set of states
    • A finite set of labelled transitions between
      states

10
Physical Machines with Finite States
  • The Lightswitch Machine

[Diagram: two states, OFF and ON, connected by transitions labelled UP and DOWN]

11
Physical Machines with Finite States
  • The Lightswitch Toggle Machine

[Diagram: two states, OFF and ON, connected by a PUSH transition in each direction]

12
The Five Cent Machine
  • Problem
  • Assume you have one, two, and five cent pieces
  • Design a finite state automaton which accepts
    exactly 5 cents.

13
The Cola Machine
  • Need to enter 25 cents (USA) to get a drink
  • Accepts the following coins
  • Nickel 5 cents
  • Dime 10 cents
  • Quarter 25 cents
  • For simplicity, our machine needs exact change
  • We will model only the coin-accepting mechanism

14
Physical Machines with Finite States
  • The Cola Machine

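[Diagram: states 0 (Start State), 5, 10, 15, 20 and 25 (Final State). Each N arc adds 5 cents, each D arc adds 10 cents, and a single Q arc leads from 0 to 25.]

15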
The Cola Machine Language
  • List of all the sequences of coins accepted:
    • Q, DDN, DND, NDD, DNNN, NDNN,
    • NNDN, NNND, NNNNN
  • Think of the coins as SYMBOLS or CHARACTERS
  • The set of symbols accepted is the ALPHABET of
    the machine
  • Think of sequences of coins as WORDS or strings
  • The set of words accepted by the machine is its
    LANGUAGE
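
The coin-accepting mechanism can be sketched in a few lines of Python. This is only an illustration, not code from the lecture: the names COIN_VALUES, step, accepts and language are hypothetical, and the states are assumed to be the running totals 0 to 25 cents, with 0 as the start state and 25 as the only final state.

```python
# Hypothetical sketch of the cola machine's coin-accepting automaton.
# States are running totals in cents: 0 is the start state, 25 the final state.
COIN_VALUES = {"N": 5, "D": 10, "Q": 25}
START, FINAL = 0, 25


def step(state, coin):
    """Return the next state, or None if the machine has no such transition."""
    if coin not in COIN_VALUES:
        return None  # symbol is not in the alphabet
    total = state + COIN_VALUES[coin]
    return total if total <= FINAL else None  # exact change only


def accepts(word):
    """Recognise: True if the coin sequence ends exactly in the final state."""
    state = START
    for coin in word:
        state = step(state, coin)
        if state is None:
            return False
    return state == FINAL


def language(state=START, prefix=""):
    """Generate: yield every coin sequence leading from `state` to the final state."""
    if state == FINAL:
        yield prefix
        return
    for coin in COIN_VALUES:
        next_state = step(state, coin)
        if next_state is not None:
            yield from language(next_state, prefix + coin)
```

Calling sorted(language()) lists the whole language of the machine, while accepts("DND") checks a single word against it. The same transition function serves both directions.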

16
FINITE STATE AUTOMATA: better definition
  • A finite state automaton includes:
    • A finite set of states
    • An initial state
    • One or more final states
    • A finite set of labelled transitions between
      states
  • Labels are symbols from an alphabet
  • Recognises a language
  • Generates a language as well!

17
A Network that Accepts a One Word Language
[Diagram: a linear network whose arcs spell the word canto]

18
A Network that Accepts a Three Word Language
[Diagram: a network accepting the three words canto, tigre and mesa]

19
Scaling Up the Network
  • Imagine the same network expanded to handle three
    million words, all of them corresponding to valid
    words of a given language.
  • We supply a word and apply it to the network.
    If it is accepted by the network, then it is a
    valid word. Otherwise it does not belong to the
    language.
  • This is the basis for a Spanish spelling error
    detector.

20
Looking Up a Word
[Diagram: the three-word network (canto, tigre, mesa) with the input m e s a applied to it]

21
Lookup Failure
  • Lookup succeeds when all input is consumed and a
    final state is reached. Lookup can fail because:
    • Not all input is consumed ("libro", "tigra")
    • Input is fully consumed but the state is not
      final ("cant")
    • A final state is reached but there is still
      unconsumed input ("mesas")
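
The three failure cases can be reproduced with a small sketch. It is an assumption-laden illustration rather than the lecture's own code: build_network and lookup are hypothetical names, and the network is built as a simple trie over the three words canto, tigre and mesa, so it shares prefixes only and is not a minimised network.

```python
# Hypothetical sketch: build a small network (a trie) and look words up in it.
WORDS = ["canto", "tigre", "mesa"]


def build_network(words):
    """Return (transitions, finals); states are integers and 0 is the start state."""
    transitions, finals = {0: {}}, set()
    for word in words:
        state = 0
        for symbol in word:
            if symbol not in transitions[state]:
                new_state = len(transitions)
                transitions[new_state] = {}
                transitions[state][symbol] = new_state
            state = transitions[state][symbol]
        finals.add(state)
    return transitions, finals


def lookup(word, network):
    """Apply a word to the network and report success or the kind of failure."""
    transitions, finals = network
    state = 0
    for symbol in word:
        if symbol not in transitions[state]:
            if state in finals:
                return "fail: final state reached but input remains"
            return "fail: not all input is consumed"
        state = transitions[state][symbol]
    if state not in finals:
        return "fail: input consumed but state is not final"
    return "accept"
```

With network = build_network(WORDS), the calls lookup("mesa", network), lookup("tigra", network), lookup("cant", network) and lookup("mesas", network) exercise the success case and each failure case in turn.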

22
Shared Structure
[Diagram: a network with arcs labelled c, l, e, a, r, v, e, e, in which several words share states and arcs]

23
Transducers
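[Diagram: a transducer path whose upper-side symbols are m e s a Noun Fem Pl and whose lower-side symbols are m e s a 0 0 s. Lookdown maps mesa Noun Fem Pl to mesas; lookup maps mesas to mesa Noun Fem Pl.]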
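
The lookup and lookdown directions of this path can be sketched as follows. The code is a hypothetical illustration, not the Xerox runtime: ARCS, FINALS, apply_network, lookup and lookdown are invented names, the network contains only the single path for mesas, and "0" is assumed to stand for the empty string.

```python
# Hypothetical sketch of a one-path transducer for "mesas".
EPSILON = "0"  # assumed to denote the empty string

# state -> list of (upper symbol, lower symbol, next state)
ARCS = {
    0: [("m", "m", 1)],
    1: [("e", "e", 2)],
    2: [("s", "s", 3)],
    3: [("a", "a", 4)],
    4: [("Noun", EPSILON, 5)],
    5: [("Fem", EPSILON, 6)],
    6: [("Pl", "s", 7)],
}
FINALS = {7}


def apply_network(symbols, match_side):
    """Return every output sequence that the network pairs with `symbols`.

    match_side=0 matches the input against upper symbols (lookdown);
    match_side=1 matches it against lower symbols (lookup).
    """
    results = []

    def walk(state, position, output):
        if position == len(symbols) and state in FINALS:
            results.append(output)
        for arc in ARCS.get(state, []):
            matched, emitted, next_state = arc[match_side], arc[1 - match_side], arc[2]
            new_output = output if emitted == EPSILON else output + [emitted]
            if matched == EPSILON:
                walk(next_state, position, new_output)  # consumes no input
            elif position < len(symbols) and symbols[position] == matched:
                walk(next_state, position + 1, new_output)

    walk(0, 0, [])
    return results


def lookup(surface):
    """Analysis: surface string -> list of lexical symbol sequences."""
    return apply_network(list(surface), match_side=1)


def lookdown(lexical_symbols):
    """Generation: lexical symbol sequence -> list of surface strings."""
    return ["".join(out) for out in apply_network(list(lexical_symbols), match_side=0)]
```

The same arcs are used in both directions; only the side that is matched against the input changes.
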
24
A Morphological Analyzer
[Diagram: a Transducer relating the string dog n pl to the string dogs]

25
A Morphological Analyzer
[Diagram: a Transducer relating a Lexical Language to a Surface Language]

26
A Quick Review of Set Theory
  • A set is a collection of objects.

[Diagram: a set containing the elements A, B, D and E]
We can enumerate the members, or elements, of
finite sets: {A, D, B, E}. There is no
significant order in a set, so {A, D, B, E} is
the same set as {E, A, D, B}, etc.

27
Uniqueness of Elements
  • You cannot have two or more 'A' elements in the
    same set.

[Diagram: a set containing the elements A, B, D and E]
{A, A, D, B, E} is just a redundant
specification of the set {A, D, B, E}.

28
Cardinality of Sets
  • The Empty Set
  • A Finite Set, e.g. {Norway, Denmark, Sweden}
  • An Infinite Set, e.g. the set of all positive
    integers

29
Simple Operations on Sets: Union
[Diagram: Set 1 and Set 2 together contain the elements A, B, C, D and E]
Union of Set 1 and Set 2: {A, B, C, D, E}

30
Simple Operations on Sets (2): Union
Set 1 = {A, B, C}
Set 2 = {C, D}
Union of Set 1 and Set 2: {A, B, C, D}

31
Simple Operations on Sets (3): Intersection
Set 1 = {A, B, C}
Set 2 = {C, D}
Intersection of Set 1 and Set 2: {C}

32
Simple Operations on Sets (4): Subtraction
Set 1 = {A, B, C}
Set 2 = {C, D}
Set 1 minus Set 2: {A, B}

33
Formal Languages
Very Important Concept in Formal Language Theory
A Language is just a Set of Words.
  • We use the terms word and string
    interchangeably.
  • A Language can be empty, have finite
    cardinality, or be infinite in size.
  • You can union, intersect and subtract languages,
    just like any other sets.

34
Union of Languages (Sets)
Language 1 = {dog, cat, rat}
Language 2 = {elephant, mouse}
Union of Language 1 and Language 2: {dog, cat, rat, elephant, mouse}

35
Intersection of Languages (Sets)
Language 1 = {dog, cat, rat}
Language 2 = {elephant, mouse}
Intersection of Language 1 and Language 2: {} (the empty language)

36
Intersection of Languages (Sets)
Language 1 = {dog, cat, rat}
Language 2 = {rat, mouse}
Intersection of Language 1 and Language 2: {rat}

37
Subtraction of Languages (Sets)
Language 1 = {dog, cat, rat}
Language 2 = {rat, mouse}
Language 1 minus Language 2: {dog, cat}
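
Because a finite language is just a set of strings, the union, intersection and subtraction examples can be tried directly with Python's built-in set type. The variable names below are illustrative only.

```python
# Finite languages modelled as Python sets of strings (illustrative names).
animals = {"dog", "cat", "rat"}
large_and_small = {"elephant", "mouse"}
rodents = {"rat", "mouse"}

union = animals | large_and_small            # every word in either language
empty_intersection = animals & large_and_small  # no word in common
intersection = animals & rodents             # words in both languages
difference = animals - rodents               # words in the first language only
```

Infinite languages cannot be enumerated this way, which is one reason they are represented by networks rather than by word lists.
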
38
Languages
  • A language is a set of words (strings).
  • Words (strings) are composed of symbols (letters)
    that are concatenated together.
  • At another level, words are composed of
    morphemes.
  • In most natural languages, we concatenate
    morphemes together to form whole words.

For sets consisting of words (i.e. for
Languages), the operation of concatenation is
very important.
39
Concatenation of Languages
Root Language = {work, talk, walk}
Suffix Language = {0, ing, ed, s} (0 denotes the empty string)
The concatenation of the Suffix Language after
the Root Language:
{work, working, worked, works, talk, talking, talked,
talks, walk, walking, walked, walks}
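
Concatenating two finite languages pairs every string of the first with every string of the second. A minimal sketch follows, in which concatenate is a hypothetical helper and the 0 suffix is assumed to be the empty string.

```python
def concatenate(first, second):
    """Return every string formed by a word of `first` followed by a word of `second`."""
    return {x + y for x in first for y in second}


roots = {"work", "talk", "walk"}
suffixes = {"", "ing", "ed", "s"}  # "" plays the role of 0, the empty string

words = concatenate(roots, suffixes)
```

Here words contains 3 × 4 = 12 strings, from "work" to "walks". The order of the arguments matters: concatenate(suffixes, roots) would produce strings such as "ingwork" instead.
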
40
Languages and Networks
[Diagram: Network/Language 1 and Network/Language 2, with arcs spelling the roots (work, talk, walk) and the suffixes (0, ing, ed, s), followed by a third network labelled "The concatenation of Network 1 and Network 2"]

41
Why is Finite State Computing so Interesting?
  • Finite-state systems are mathematically elegant
    and easy to manipulate and modify.
  • They are computationally efficient and usually
    very compact.
  • The programming we linguists do is declarative:
    we describe the facts of our natural language,
    i.e. we write grammars. We do not hack ad hoc
    code.
  • The runtime code, which applies our systems to
    linguistic input, is already written, and it is
    completely language-independent.
  • Finite-state systems are inherently
    bidirectional: we can use the same system to
    analyze and to generate.

42
Languages, Notations and Machines
[Diagram labels: FINITE STATE LANGUAGE, FINITE STATE NOTATION]