Title: CSA305: Natural Language Algorithms
1. CSA305 Natural Language Algorithms
- Deterministic and Non-Deterministic Recognition
2. Acknowledgement
- Material presented here is adapted from Jurafsky and Martin, Chapter 2.
3. Representation of Automata Using Transition Tables
4. Transition Table Representation in Prolog
    % s(State, NextOn_b, NextOn_a, NextOn_!); none marks an empty cell
    s(0,1,none,none).
    s(1,none,2,none).
    s(2,none,3,none).
    s(3,none,3,4).
    s(4,none,none,none).

    next(OldState,b,NewState) :-
        s(OldState,NewState,_,_), NewState \== none.
    next(OldState,a,NewState) :-
        s(OldState,_,NewState,_), NewState \== none.
    next(OldState,!,NewState) :-
        s(OldState,_,_,NewState), NewState \== none.
5. A Better Representation
    s(0,b,1).
    s(1,a,2).
    s(2,a,3).
    s(3,a,3).
    s(3,!,4).

    next(OldState,Sym,NewState) :-
        s(OldState,Sym,NewState).
6. The Process of Recognition 1
- Start in the initial state and at the first symbol of the word.
- If there is an arc labelled with that symbol, the machine transitions to the next state, and the symbol is consumed.
- The process continues with successive symbols until...
7. The Process of Recognition 2
- ...one of these conditions holds:
- A. All symbols in the input have been consumed.
  - If the current state is final, succeed; otherwise fail.
- B. There is no transition out of the current state for the current symbol.
  - Fail.
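The two slides above can be sketched in Prolog as a loop that also reports which stopping condition was met. This is a minimal sketch, assuming the arc representation of slide 5 with start state 0 and final state 4; the predicate names `arc/3`, `final/1`, `recognise/2` and `step/3`, and the result terms `accept`, `reject(not_final)` and `reject(no_arc(State, Symbol))`, are illustrative and not taken from the slides.

```prolog
% Sheep-language automaton (baa+!) as arcs: arc(From, Symbol, To).
arc(0, b, 1).
arc(1, a, 2).
arc(2, a, 3).
arc(3, a, 3).
arc(3, !, 4).
final(4).

% recognise(+Input, -Result): start in state 0 at the first symbol.
recognise(Input, Result) :-
    step(Input, 0, Result).

% Condition A: all symbols consumed; succeed only in a final state.
step([], State, Result) :-
    (   final(State)
    ->  Result = accept
    ;   Result = reject(not_final)
    ).
% Follow the arc labelled with the current symbol and consume it;
% if there is none, condition B holds and recognition fails.
step([Sym|Rest], State, Result) :-
    (   arc(State, Sym, Next)
    ->  step(Rest, Next, Result)
    ;   Result = reject(no_arc(State, Sym))
    ).
```

For example, `recognise([b,a,!], R)` gives `R = reject(no_arc(2,!))`: condition B holds because state 2 has no arc labelled `!`.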
8. Deterministic Recognition
- A deterministic algorithm is one that has no choice points.
- The following algorithm takes as input a tape and an automaton.
- It returns accept if the automaton accepts the input on the tape, and reject otherwise.
9. Deterministic FSA Recognition
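The deterministic algorithm can be sketched in Prolog by keeping an explicit tape index, in the style of the D-RECOGNIZE pseudocode in Jurafsky and Martin, and looking each step up in the transition table of slide 4. This sketch assumes that the atom `none` marks an empty cell, so that an empty cell cannot be confused with state 0; `cell/3`, `d_recognize/2` and `d_loop/5` are hypothetical names.

```prolog
:- use_module(library(lists)).   % nth0/3

% Transition table of slide 4: s(State, NextOn_b, NextOn_a, NextOn_!).
% Assumption: the atom none marks an empty cell.
s(0, 1,    none, none).
s(1, none, 2,    none).
s(2, none, 3,    none).
s(3, none, 3,    4).
s(4, none, none, none).
final(4).

% cell(+State, +Symbol, -Cell): transition-table[State, Symbol].
cell(State, b, Cell) :- s(State, Cell, _, _).
cell(State, a, Cell) :- s(State, _, Cell, _).
cell(State, !, Cell) :- s(State, _, _, Cell).

% d_recognize(+Tape, -Answer): Answer is accept or reject.
d_recognize(Tape, Answer) :-
    length(Tape, End),
    d_loop(Tape, End, 0, 0, Answer).     % index 0, initial state 0

% End of input reached: accept only in a final state.
d_loop(_, End, End, State, Answer) :- !,
    (   final(State) -> Answer = accept ; Answer = reject ).
% Otherwise look up the cell for the current state and symbol.
d_loop(Tape, End, Index, State, Answer) :-
    nth0(Index, Tape, Sym),
    (   cell(State, Sym, Next), Next \== none
    ->  Index1 is Index + 1,
        d_loop(Tape, End, Index1, Next, Answer)
    ;   Answer = reject                  % empty cell or unknown symbol
    ).
```

Because `nth0/3` walks the tape from the front at every step, this version is quadratic in the tape length; the list-recursive `drec/4` skeleton on slide 10 avoids that.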
10. Skeleton of a Prolog Implementation
    % drec(+Tape, +Machine, +State, -Result)
    % tran/4 and final/2 give the transitions and final states of Machine.
    drec([], M, S, yes) :-
        final(M, S), !.
    drec([H|T], M, S, Result) :-
        tran(M, S, H, N), !,
        drec(T, M, N, Result).
    drec(_, _, _, no).
11. Failure States
- We can regard failure as a special state.
- That state is reached by adding supplementary arcs that represent invalid input: one for each state and symbol that has no transition in the original automaton.
12. Adding a Failure State
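A minimal Prolog sketch of the idea, assuming the sheep automaton of slide 5: instead of listing every supplementary arc, `total_arc/3` falls back to an explicit failure state whenever the original automaton has no arc, and the failure state loops to itself on every symbol. The state name `sink` and all predicate names here are assumptions for illustration (`sink` avoids a clash with Prolog's built-in `fail`).

```prolog
:- use_module(library(lists)).   % member/2

alphabet([b, a, !]).
states([0, 1, 2, 3, 4, sink]).

% Original sheep automaton (slide 5 arc representation).
arc(0, b, 1).
arc(1, a, 2).
arc(2, a, 3).
arc(3, a, 3).
arc(3, !, 4).
final(4).

% total_arc(+State, +Sym, -Next): the original arc if there is one,
% otherwise a supplementary arc into the failure state sink.
% sink has no original arcs, so it loops to itself.
total_arc(State, Sym, Next) :-
    (   arc(State, Sym, N) -> Next = N ; Next = sink ).

% run(+Input, +State, -Final): the function is total, so this never
% gets stuck part-way through the input.
run([], State, State).
run([Sym|Rest], State, Final) :-
    total_arc(State, Sym, Next),
    run(Rest, Next, Final).

accepts(Input) :-
    run(Input, 0, State),
    final(State).

% failure_arcs(-Pairs): the State-Sym pairs that needed a supplementary arc.
failure_arcs(Pairs) :-
    states(States),
    alphabet(Syms),
    findall(S-Sym,
            ( member(S, States), member(Sym, Syms), \+ arc(S, Sym, _) ),
            Pairs).
```

With the transition function made total, the loop can no longer get stuck (condition B of slide 7 disappears); rejection simply means ending in a non-final state such as `sink`. Here `failure_arcs/1` finds 13 supplementary arcs: the 18 state/symbol pairs of the six states minus the 5 original arcs.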
13. Deterministic versus Non-Deterministic Recognition
- The behaviour of the automata we have considered so far is fully determined by the current state and the input symbol.
- The recognition process is therefore said to be deterministic.
- This is not necessarily the case. An automaton may have:
  - several arcs with the same label leaving one state;
  - ε-transitions, i.e. arcs with no label.
- Automata like this are called non-deterministic.
14. Non-Deterministic FAs
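Both sources of non-determinism listed on slide 13 can be written down directly as Prolog facts. The two automata below are assumed examples for the sheep language baa+! (not necessarily the ones in the figure): `nd1_arc/3` has two arcs labelled `a` leaving state 2, and `nd2_arc/3` has an unlabelled arc, written with the assumed label `eps`. `choices/4` and `deterministic/1` are hypothetical helpers, and `d_arc/3` is the deterministic automaton of slide 5 for comparison.

```prolog
% (1) Several arcs with the same label: state 2 has two a-arcs.
nd1_arc(0, b, 1).
nd1_arc(1, a, 2).
nd1_arc(2, a, 2).
nd1_arc(2, a, 3).
nd1_arc(3, !, 4).

% (2) An arc with no label, written eps, from state 3 back to state 2.
nd2_arc(0, b, 1).
nd2_arc(1, a, 2).
nd2_arc(2, a, 3).
nd2_arc(3, eps, 2).
nd2_arc(3, !, 4).

% The deterministic automaton of slide 5, for comparison.
d_arc(0, b, 1).
d_arc(1, a, 2).
d_arc(2, a, 3).
d_arc(3, a, 3).
d_arc(3, !, 4).

% choices(+Arcs, +State, +Sym, -Nexts): every state reachable in one
% step on Sym; more than one element means the move is not determined.
choices(Arcs, State, Sym, Nexts) :-
    findall(N, call(Arcs, State, Sym, N), Nexts).

% deterministic(+Arcs): no two arcs from one state share a label,
% and there are no unlabelled (eps) arcs.
deterministic(Arcs) :-
    \+ ( call(Arcs, S, Sym, N1), call(Arcs, S, Sym, N2), N1 \== N2 ),
    \+ call(Arcs, _, eps, _).
```

For instance, `choices(nd1_arc, 2, a, Ns)` gives `Ns = [2,3]`: on reading `a` in state 2 the machine must choose, and that choice point is what the following slides deal with.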
15. Non-Deterministic Recognition
- There are three ways of dealing with non-deterministic recognition:
  - Backtracking: at every choice point, record the state and the as yet unexplored choices.
  - Lookahead: peek ahead n symbols in the input in order to decide which path to take.
  - Parallel search: look at every path in parallel.
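Two of the three strategies can be sketched in Prolog for an assumed non-deterministic sheep automaton with an ε-arc from state 3 back to state 2 (written with the assumed label `eps`). Backtracking comes for free: Prolog records each choice point and retries the unexplored alternatives when a path fails. Parallel search instead carries the set of all states the machine could currently be in. Lookahead is not shown. All predicate names are hypothetical, and `bt/2` assumes the automaton has no cycle made only of ε-arcs; otherwise it may not terminate.

```prolog
:- use_module(library(lists)).     % member/2
:- use_module(library(ordsets)).   % ord_union/3

% Assumed NFA for baa+!: an eps arc leads from state 3 back to 2.
arc(0, b, 1).
arc(1, a, 2).
arc(2, a, 3).
arc(3, eps, 2).
arc(3, !, 4).
final(4).

% (1) Backtracking: Prolog keeps the choice points for us.
bt_accept(Input) :- bt(Input, 0).

bt([], State) :- final(State).
bt([Sym|Rest], State) :- arc(State, Sym, Next), bt(Rest, Next).
bt(Input, State) :- arc(State, eps, Next), bt(Input, Next).

% (2) Parallel search: advance the whole set of possible states.
par_accept(Input) :-
    eps_closure([0], Start),
    par(Input, Start, States),
    member(S, States), final(S), !.

par([], States, States).
par([Sym|Rest], States, Final) :-
    findall(N, ( member(S, States), arc(S, Sym, N) ), Ns0),
    eps_closure(Ns0, Ns),
    Ns \== [],                          % stop early once every path has failed
    par(Rest, Ns, Final).

% eps_closure(+States, -Closure): States plus all states reachable
% through eps arcs, as a sorted list without duplicates.
eps_closure(States, Closure) :-
    sort(States, S0),
    findall(N, ( member(S, S0), arc(S, eps, N) ), Ns),
    sort(Ns, Ns1),
    ord_union(S0, Ns1, S1),
    (   S1 == S0 -> Closure = S0 ; eps_closure(S1, Closure) ).
```

On `[b,a,a,!]` the parallel version moves through the state sets `[0]`, `[1]`, `[2]`, `[2,3]` and `[4]`; the set `[2,3]` is where the ε-arc makes both paths live at once.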
16. ND-RECOGNISE
    function ND-RECOGNISE(tape, machine) returns accept or reject
        agenda ← {(q0(machine), 0)}
        search_state ← NEXT(agenda)
        loop
            if ACCEPT-STATE?(search_state) returns true then
                return accept
            else
                agenda ← agenda ∪ GENERATE-NEW-STATES(search_state)
            if agenda is empty then
                return reject
            else
                search_state ← NEXT(agenda)
        end
17. ACCEPT-STATE?
    function ACCEPT-STATE?(search_state) returns true or false
        mstate ← first(search_state)
        tape_pos ← second(search_state)
        if tape[tape_pos] = end_input and IS-FINAL?(mstate) then
            return true
        else
            return false
18. GENERATE-NEW-STATES
    function GENERATE-NEW-STATES(search_state)
        mstate ← first(search_state)
        tape_pos ← second(search_state)
        return {(x, tape_pos) | x ∈ trantable[mstate, ε]}
             ∪ {(x, tape_pos + 1) | x ∈ trantable[mstate, tape[tape_pos]]}
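The three functions above can be sketched in Prolog with an explicit agenda. Assumptions (not from the slides): a search state is a pair `State-Pos`, NEXT(agenda) takes the head of a list, new search states are pushed onto the front so the agenda behaves as a stack (depth-first), the end of the input is detected by comparing `Pos` with the tape length, and `eps` labels an ε-arc. The automaton is an assumed NFA for baa+!, and `nd_recognise/2`, `accept_state/2` and `generate_new_states/3` are hypothetical counterparts of the pseudocode functions.

```prolog
:- use_module(library(lists)).   % append/3, nth0/3

% Assumed NFA for baa+!: an eps arc leads from state 3 back to 2.
arc(0, b, 1).
arc(1, a, 2).
arc(2, a, 3).
arc(3, eps, 2).
arc(3, !, 4).
final(4).

% nd_recognise(+Tape, -Answer): Answer is accept or reject.
nd_recognise(Tape, Answer) :-
    nd_loop(Tape, [0-0], Answer).            % agenda = {(q0, 0)}

nd_loop(_, [], reject).                      % agenda empty: reject
nd_loop(Tape, [Search|Agenda], Answer) :-    % NEXT(agenda): take the head
    (   accept_state(Tape, Search)
    ->  Answer = accept
    ;   generate_new_states(Tape, Search, New),
        append(New, Agenda, Agenda1),        % push: depth-first
        nd_loop(Tape, Agenda1, Answer)
    ).

% ACCEPT-STATE?: at the end of the tape and in a final state.
accept_state(Tape, State-Pos) :-
    length(Tape, Pos),
    final(State).

% GENERATE-NEW-STATES: eps arcs keep the tape position; an arc
% labelled with the current symbol advances it by one.
generate_new_states(Tape, State-Pos, New) :-
    findall(E-Pos, arc(State, eps, E), EpsMoves),
    (   nth0(Pos, Tape, Sym)
    ->  Pos1 is Pos + 1,
        findall(N-Pos1, arc(State, Sym, N), SymMoves)
    ;   SymMoves = []                        % tape exhausted
    ),
    append(EpsMoves, SymMoves, New).
```

For the tape `[b,a,a,!]` the agenda passes through `[0-0]`, `[1-1]`, `[2-2]`, `[3-3]` and then `[2-3, 4-4]`; the search state `2-3` has no successors, and `4-4` satisfies ACCEPT-STATE?.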
19. Recognition as Search
- Recognition can be regarded as a search problem, with:
  - an initial state and a goal state;
  - rules (the transitions of the automaton);
  - a strategy (the order in which the agenda is explored).
- Different search behaviours (depth-first, breadth-first) can be obtained by managing the agenda in different ways.
- See Jurafsky and Martin, Section 2.2.
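To make the point concrete, here is a sketch in which the only difference between depth-first and breadth-first search is where new search states join the agenda: at the front (a stack) or at the back (a queue). Assumptions: the automaton is an assumed NFA for baa+! with two `a`-arcs out of state 2, search states are `State-Pos` pairs, and `agenda_search/4` also returns the search states in the order they were taken off the agenda. All names are hypothetical.

```prolog
:- use_module(library(lists)).   % append/3, nth0/3

% Assumed NFA for baa+!: two a-arcs leave state 2.
arc(0, b, 1).
arc(1, a, 2).
arc(2, a, 2).
arc(2, a, 3).
arc(3, !, 4).
final(4).

% add_new(+Strategy, +New, +Agenda, -Agenda1)
add_new(dfs, New, Agenda, Agenda1) :- append(New, Agenda, Agenda1). % stack
add_new(bfs, New, Agenda, Agenda1) :- append(Agenda, New, Agenda1). % queue

% agenda_search(+Strategy, +Tape, -Answer, -Visited)
agenda_search(Strategy, Tape, Answer, Visited) :-
    agenda_loop(Strategy, Tape, [0-0], Answer, Visited).

agenda_loop(_, _, [], reject, []).
agenda_loop(Strategy, Tape, [S-P|Agenda], Answer, [S-P|Visited]) :-
    (   length(Tape, P), final(S)          % accept state reached
    ->  Answer = accept,
        Visited = []
    ;   (   nth0(P, Tape, Sym)             % successors on the next symbol
        ->  P1 is P + 1,
            findall(N-P1, arc(S, Sym, N), New)
        ;   New = []
        ),
        add_new(Strategy, New, Agenda, Agenda1),
        agenda_loop(Strategy, Tape, Agenda1, Answer, Visited)
    ).
```

On the tape `[b,a,a,a,!]` both strategies accept, but depth-first visits `0-0, 1-1, 2-2, 2-3, 2-4, 3-4, 4-5`, while breadth-first also explores the dead end `3-3` and takes eight search states off the agenda before accepting.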
20. Deterministic and Non-Deterministic FSAs
- The class of languages recognisable by NDFSAs is identical to the class recognisable by DFSAs.
- For every NDFSA ND there is an equivalent DFSA D.
  - The states of D correspond to sets of states of ND.
  - If N is the number of states in ND, D has at most 2^N states.
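The correspondence between states of D and sets of states of ND is the subset construction. A minimal sketch, assuming an NFA without ε-arcs (ε-arcs would additionally need an ε-closure step) and an assumed NFA for baa+! with two `a`-arcs out of state 2. Each DFA state is a sorted list of NFA states, only subsets reachable from `[0]` are built, and the empty set (an implicit failure state) is left out; the predicate names are hypothetical.

```prolog
:- use_module(library(lists)).   % member/2, append/3

% Assumed NFA for baa+! (no eps arcs): two a-arcs leave state 2.
nfa_arc(0, b, 1).
nfa_arc(1, a, 2).
nfa_arc(2, a, 2).
nfa_arc(2, a, 3).
nfa_arc(3, !, 4).
nfa_final(4).

alphabet([b, a, !]).

% dfa(-States, -Arcs): DFA states reachable from [0], and DFA arcs
% written From-Sym-To, where From and To are sorted sets of NFA states.
dfa(States, Arcs) :-
    explore([[0]], [], States, [], Arcs).

% explore(+Todo, +Seen, -States, +Arcs0, -Arcs): breadth-first over sets.
explore([], Seen, Seen, Arcs, Arcs).
explore([Set|Todo], Seen, States, Arcs0, Arcs) :-
    (   memberchk(Set, Seen)
    ->  explore(Todo, Seen, States, Arcs0, Arcs)
    ;   alphabet(Syms),
        findall(Set-Sym-To,
                ( member(Sym, Syms), move(Set, Sym, To), To \== [] ),
                New),
        findall(To, member(_-_-To, New), Targets),
        append(Todo, Targets, Todo1),
        append(Arcs0, New, Arcs1),
        explore(Todo1, [Set|Seen], States, Arcs1, Arcs)
    ).

% move(+Set, +Sym, -To): all NFA states reachable from Set on Sym.
move(Set, Sym, To) :-
    findall(N, ( member(S, Set), nfa_arc(S, Sym, N) ), Ns),
    sort(Ns, To).

% A DFA state is final if it contains a final NFA state.
dfa_final(Set) :- member(S, Set), nfa_final(S), !.

% Run the constructed DFA: at most one arc per state and symbol,
% and a missing arc means reject.
dfa_accepts(Input) :-
    dfa(_, Arcs),
    run_dfa(Input, [0], Arcs, Final),
    dfa_final(Final).

run_dfa([], Set, _, Set).
run_dfa([Sym|Rest], Set, Arcs, Final) :-
    memberchk(Set-Sym-Next, Arcs),
    run_dfa(Rest, Next, Arcs, Final).
```

Here only 5 of the 2^5 = 32 possible subsets are reachable: `[0]`, `[1]`, `[2]`, `[2,3]` and `[4]`. The 2^N figure is an upper bound, not the typical size of D.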