Finite state transducer (FST)

1
Finite state transducer (FST)
  • LING 570
  • Fei Xia
  • Week 3 10/10/2007

2
Applications of FSTs
  • ASR
  • Tokenization
  • Stemmer
  • Text normalization
  • Parsing

3
Outline
  • Regular relation
  • Finite-state transducer (FST)
  • Hw3
  • Carmel: an FST package

4
Regular relation
5
Definition of regular relation
  • The set of regular relations is defined as
    follows:
  • For all (x, y) ∈ Σ1 × Σ2, {(x, y)} is
    a regular relation.
  • The empty set is a regular relation.
  • If R1, R2 are regular relations, so are
  • R1 · R2 = {(x1 x2, y1 y2) | (x1, y1) ∈ R1,
    (x2, y2) ∈ R2}, R1 ∪ R2, and R1*.
  • Nothing else is a regular relation.
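
The inductive definition can be tried out on small finite relations. The sketch below is illustrative only and not part of the lecture: a relation is a Python set of (x, y) string pairs, and the helper names concat, union, and star_up_to are hypothetical. Since R* is infinite in general, star_up_to only enumerates it up to a fixed number of repetitions.

```python
from itertools import product

def concat(r1, r2):
    """R1 . R2: concatenate inputs and outputs pairwise."""
    return {(x1 + x2, y1 + y2) for (x1, y1), (x2, y2) in product(r1, r2)}

def union(r1, r2):
    """R1 U R2: plain set union of the pairs."""
    return r1 | r2

def star_up_to(r, k):
    """Finite approximation of R*: zero to k concatenations of r."""
    result = {("", "")}   # zero repetitions: the empty pair
    power = {("", "")}
    for _ in range(k):
        power = concat(power, r)
        result |= power
    return result

r1 = {("a", "x")}
r2 = {("b", "y")}
print(sorted(concat(r1, r2)))     # [('ab', 'xy')]
print(sorted(star_up_to(r2, 2)))  # [('', ''), ('b', 'y'), ('bb', 'yy')]
```

Note that concat(r1, star_up_to(r2, 2)) yields (a, x), (ab, xy), (abb, xyy), the first pairs of a relation that maps "a" followed by b's to "x" followed by y's.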

6
Closure properties
  • Like regular languages, regular relations are
    closed under:
  • union
  • concatenation
  • Kleene closure
  • Unlike regular languages, regular relations are
    not closed under:
  • intersection
  • difference
  • complementation

7
Closure properties (cont)
  • New operations for regular relations:
  • Composition
  • Projection: extract x or y from the (x, y) pairs
  • Inversion: switch the x and y in the (x, y) pairs
  • Identity: take a regular language and create the
    identity regular relation
  • Cross product: take two regular languages and
    create the cross product relation
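
For finite relations, each of these operations is a one-line set comprehension. The following is a minimal sketch under the assumption that a relation is a set of (x, y) string pairs and a language is a set of strings; the function names and the toy data are made up for illustration.

```python
def invert(r):
    """Inversion: swap x and y in every pair."""
    return {(y, x) for x, y in r}

def project(r, side):
    """Projection: side=0 keeps the inputs, side=1 keeps the outputs."""
    return {pair[side] for pair in r}

def compose(r1, r2):
    """Composition: (x, z) whenever (x, y) is in r1 and (y, z) is in r2."""
    return {(x, z) for x, y in r1 for y2, z in r2 if y == y2}

def identity(language):
    """Identity relation of a language: every string maps to itself."""
    return {(w, w) for w in language}

def cross_product(l1, l2):
    """Cross product: every string of l1 paired with every string of l2."""
    return {(x, y) for x in l1 for y in l2}

# Toy data (hypothetical): morpheme segmentation followed by a tag lookup.
segment = {("cats", "cat s"), ("dogs", "dog s")}
label = {("cat s", "N PL")}
print(compose(segment, label))      # {('cats', 'N PL')}
print(sorted(project(segment, 0)))  # ['cats', 'dogs']
```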

8
Finite state transducer
9
Finite-state transducers
  • x:y is a notation for a mapping between the
    symbols of two alphabets.
  • An FST processes an input string and produces
    another string as the output.
  • Finite-state automata correspond to regular
    languages, and FSTs correspond to regular relations.
  • Ex: R = {(a^n, b^n) | n > 0} is a regular
    relation.
  • It maps a string of a's into an equal-length
    string of b's.

10
An FST
R(T) = {(a, x), (ab, xy), (abb, xyy), ...}
11
Definition of FST
  • An FST is a tuple (Q, Σ, Γ, I, F, δ):
  • Q: a finite set of states
  • Σ: a finite set of input symbols
  • Γ: a finite set of output symbols
  • I ⊆ Q: the set of initial states
  • F ⊆ Q: the set of final states
  • δ ⊆ Q × (Σ ∪ {ε}) × (Γ ∪ {ε}) × Q:
    the transition relation between states.
  • ⇒ An FSA can be seen as a special case of an FST.

12
Definition of transduction
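  • The extended transition relation δ* of the
    transition relation δ is the smallest set such that
  • δ ⊆ δ*
  • (q, ε, ε, q) ∈ δ* for all states q
  • if (q, x, y, r) ∈ δ* and (r, a, b, s) ∈ δ,
    then (q, xa, yb, s) ∈ δ*
  • T transduces a string x into a string y if there
    exists a path from an initial state to a final
    state whose input is x and whose output is y.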
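
The path-based definition of transduction can be sketched as a breadth-first search over (state, input position, output so far) configurations. This is an illustrative sketch, not the lecture's code: the FST class, its field names, and the use of the empty string for ε are assumptions, and the search only terminates if the machine has no ε-input cycle that keeps producing output.

```python
from collections import deque

EPS = ""  # assumption: the empty string stands for epsilon

class FST:
    def __init__(self, initial, final, arcs):
        # arcs: iterable of (source, input_symbol, output_symbol, target)
        self.initial = set(initial)
        self.final = set(final)
        self.arcs = {}
        for src, inp, out, dst in arcs:
            self.arcs.setdefault(src, []).append((inp, out, dst))

    def transduce(self, x):
        """Return every y such that some accepting path reads x and writes y."""
        results = set()
        start = [(q, 0, "") for q in self.initial]
        queue = deque(start)
        seen = set(start)
        while queue:
            state, pos, out = queue.popleft()
            if pos == len(x) and state in self.final:
                results.add(out)
            for inp, o, dst in self.arcs.get(state, []):
                if inp == EPS:                 # epsilon arc: consume nothing
                    nxt = (dst, pos, out + o)
                elif x.startswith(inp, pos):   # arc label matches next input
                    nxt = (dst, pos + len(inp), out + o)
                else:
                    continue
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return results

# A machine for pairs such as (a, x), (ab, xy), (abb, xyy):
# state 0 --a:x--> state 1, and state 1 loops on b:y.
t = FST(initial={0}, final={1}, arcs=[(0, "a", "x", 1), (1, "b", "y", 1)])
print(t.transduce("abb"))  # {'xyy'}
print(t.transduce("b"))    # set()
```

Returning a set rather than a single string reflects that an FST defines a relation, so one input may have zero, one, or several outputs.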

13
More FST examples
  • Lowercase a string of any length
  • Tokenize a string
  • Ex: he said:"Go away." → he said : " Go away . "
  • Convert a word to its morpheme sequence
  • Ex: cats → cat s
  • POS tagging
  • Ex: He called Mary → PN V N
  • Map Arabic numbers to words
  • Ex: 123 → one hundred and twenty three

14
Operations on FSTs
  • Union
  • Concatenation
  • Composition

15
An example of composition operation
16
FST Algorithms
  • Recognition: is a given pair of strings accepted
    by an FST?
  • (x, y) → yes/no
  • Composition: given two FSTs T1 and T2 defining
    regular relations R1 and R2, create the FST that
    computes the composition of R1 and R2.
  • R1(x, y), R2(y, z) → {(x, z) | (x, y) ∈ R1,
    (y, z) ∈ R2}
  • Transduction: given an input string and an FST,
    provide the output as defined by the regular
    relation.
  • x → y
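
Composition and recognition can be sketched for the simplest case, ε-free transducers whose arcs carry exactly one input and one output symbol. The dictionary layout and the function names compose_fst and accepts below are assumptions made for this example; handling ε arcs needs extra care that is left out here.

```python
def compose_fst(t1, t2):
    """Product construction for epsilon-free transducers.

    Each transducer is a dict with 'initial' and 'final' (sets of states)
    and 'arcs' (a list of (source, input, output, target) tuples).
    An arc a:b of t1 pairs with an arc b:c of t2 to give a:c.
    """
    arcs = [((p1, p2), a, c, (q1, q2))
            for (p1, a, b, q1) in t1["arcs"]
            for (p2, b2, c, q2) in t2["arcs"]
            if b == b2]
    return {
        "initial": {(i1, i2) for i1 in t1["initial"] for i2 in t2["initial"]},
        "final": {(f1, f2) for f1 in t1["final"] for f2 in t2["final"]},
        "arcs": arcs,
    }

def accepts(t, x, y):
    """Recognition: is the pair (x, y) in the relation of t?"""
    if len(x) != len(y):      # epsilon-free arcs keep the lengths equal
        return False
    states = set(t["initial"])
    for a, b in zip(x, y):
        states = {dst for (src, inp, out, dst) in t["arcs"]
                  if src in states and inp == a and out == b}
    return bool(states & t["final"])

# T1 rewrites a as b, T2 rewrites b as c, so the composition rewrites a as c.
t1 = {"initial": {0}, "final": {0}, "arcs": [(0, "a", "b", 0)]}
t2 = {"initial": {0}, "final": {0}, "arcs": [(0, "b", "c", 0)]}
t12 = compose_fst(t1, t2)
print(accepts(t12, "aa", "cc"))  # True
print(accepts(t12, "aa", "bb"))  # False
```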

17
Weighted FSTs
  • A weighted FST is a tuple (Q, Σ, Γ, I, F, δ, P):
  • Q: a finite set of states
  • Σ: a finite set of input symbols
  • Γ: a finite set of output symbols
  • I: Q → ℝ (initial-state probabilities)
  • F: Q → ℝ (final-state probabilities)
  • δ ⊆ Q × (Σ ∪ {ε}) × (Γ ∪ {ε}) × Q:
    the transition relation between states.
  • P: δ → ℝ (transition probabilities)

18
An example: build a unigram tagger
  • P(t1 ... tn | w1 ... wn)
  • ≈ P(t1 | w1) × ... × P(tn | wn)
  • Training time: collect (word, tag) counts, and
    store P(t | w) in an FST.
  • Test time: in order to choose the best tag
    sequence,
  • create an FSA for the input sentence
  • compose it with the FST
  • choose the best path in the new FST
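
Because the model factorizes into one P(t | w) term per word, the best path through the composed machine picks the most probable tag for each word independently. The sketch below computes that result directly in Python instead of building an FST; the function names, the toy training data, and the fallback tag for unknown words are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

def train(tagged_sentences):
    """Estimate P(tag | word) from (word, tag) counts."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: {tag: c / sum(tag_counts.values())
                   for tag, c in tag_counts.items()}
            for word, tag_counts in counts.items()}

def tag(words, probs, unknown_tag="N"):
    """Return the best tag sequence and its probability under the model."""
    tags, score = [], 1.0
    for word in words:
        if word in probs:
            best, p = max(probs[word].items(), key=lambda item: item[1])
        else:
            best, p = unknown_tag, 1.0   # assumption: unseen words get a default tag
        tags.append(best)
        score *= p
    return tags, score

# Toy training data (made up for this example).
training = [
    [("they", "PRO"), ("can", "AUX"), ("fish", "VERB")],
    [("they", "PRO"), ("fish", "N")],
    [("fish", "N"), ("can", "AUX")],
]
probs = train(training)
tags, score = tag(["they", "can", "fish"], probs)
print(tags)  # ['PRO', 'AUX', 'N']
```

The FST pipeline becomes worthwhile once the model is extended, for example by composing with a tag-sequence model, where a per-word argmax is no longer enough.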

19
Summary
  • Finite state transducers specify regular
    relations
  • FST closure properties: union, concatenation,
    composition
  • FST special operations:
  • creating regular relations from regular languages
    (identity, cross product)
  • creating regular languages from regular relations
    (projection)
  • FST algorithms:
  • Recognition
  • Transduction
  • Composition
  • Not all FSTs can be determinized.
  • Weighted FSTs are used often in NLP.

20
Hw3
21
Part III: Creating a unigram POS tagger using FSTs
  • Input: w1 w2 ... wn
  • Output: w1/t1 w2/t2 ... wn/tn
  • Training data: w1/t1 w2/t2 ... wn/tn

22
Major steps
  • Training time: create an FST from the training
    data
  • calc_unigram_prob.sh: create word tag prob cnt
  • create_fst.sh: create an FST from the unigram_voc
  • Test time:
  • Preprocessing: preproc.sh
  • Decoding (finding the best path): run carmel with
    some options
  • Postprocessing: postproc.sh
  • Calculate tagging accuracy
  • Write a wrapper
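
The accuracy step can be sketched as follows, assuming that the gold file and the system output both use the word/tag format described for this assignment, with one sentence per line and the same tokenization. The function name tagging_accuracy is hypothetical and not part of the homework scripts.

```python
def tagging_accuracy(gold_lines, system_lines):
    """Fraction of tokens whose system tag equals the gold tag."""
    correct = total = 0
    for gold, system in zip(gold_lines, system_lines):
        gold_tokens, system_tokens = gold.split(), system.split()
        if len(gold_tokens) != len(system_tokens):
            raise ValueError("sentences differ in length")
        for g, s in zip(gold_tokens, system_tokens):
            # Split on the last '/', so words that contain '/' still work.
            gold_tag = g.rsplit("/", 1)[1]
            system_tag = s.rsplit("/", 1)[1]
            total += 1
            correct += gold_tag == system_tag
    return correct / total if total else 0.0

gold = ["they/PRO can/AUX fish/VERB"]
system = ["they/PRO can/AUX fish/N"]
accuracy = tagging_accuracy(gold, system)
print(round(accuracy, 3))  # 0.667
```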

23
Carmel
24
The format of FSA / FST
  • final_state
  • (from_state1 (to_state1 input_symbol
    output_symbol? weight?))
  • (from_state2 (to_state2 input_symbol
    output_symbol? weight?))
  • A state can be a number or a string.
  • The from_state in the first edge line is the
    start state.
  • ε is represented as *e*
  • output_symbol and weight are optional.
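
A script such as create_fst.sh has to print arcs in this layout. The sketch below follows only the format as described on this slide; the function name to_carmel and the tuple layout are assumptions, and details such as whether symbols need quoting should be checked against the Carmel documentation.

```python
def to_carmel(start, final, arcs):
    """Render arcs in the edge-line layout described above.

    arcs: list of (from_state, to_state, input_symbol, output_symbol, weight);
    output_symbol and weight may be None, since both are optional.
    """
    # The from_state of the first edge line is the start state, so arcs
    # that leave the start state go first (sorted is stable).
    ordered = sorted(arcs, key=lambda arc: arc[0] != start)
    lines = [str(final)]  # the first line names the final state
    for src, dst, inp, out, weight in ordered:
        fields = [str(dst), inp]
        if out is not None:
            fields.append(out)
        if weight is not None:
            fields.append(str(weight))
        lines.append(f"({src} ({' '.join(fields)}))")
    return "\n".join(lines) + "\n"

text = to_carmel(start=0, final=1,
                 arcs=[(1, 1, "b", "y", None), (0, 1, "a", "x", 0.5)])
print(text, end="")
# 1
# (0 (1 a x 0.5))
# (1 (1 b y))
```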

25
An FSA example: fsa1
[Figure: state diagram of fsa1, with states labeled 0 through 5]
26
A WFSA example: wfsa1
27
A WFST example: wfst1
28
To use Carmel
  • carmel fst1 fst2
  • => returns a new FST, the composition of fst1 and
    fst2
  • carmel -k N wfst1
  • => returns the N most probable paths
  • carmel -Ok N wfst1
  • => returns the N most probable output strings

29
To use Carmel (cont)
  • cat input_file | carmel -sli fst1
  • creates a foo_fst that corresponds to the first
    line in input_file, then runs
  • carmel foo_fst fst1
  • Ex: input_file is
  • they can fish
  • cat input_file | carmel -sri fst1
  • creates a foo_fst that corresponds to the first
    line in input_file, then runs
  • carmel fst1 foo_fst
  • Ex: input_file is
  • PRO AUX VERB
  • cat input_file | carmel -b -sli fst1