Learning Subclasses of Formal Languages

Slides: 13

Provided by: jal73

Transcript and Presenter's Notes

1
Learning Subclasses of Formal Languages

2
Plan of Presentation

3
Motivation

4
Introduction

5
Grammatical Inference

6
Formal Languages

A formal grammar G has four components:
A set of symbols Σ, called terminals
A set of symbols N, called non-terminals, with the
restriction that Σ and N are disjoint
A special non-terminal symbol S, called the start
symbol
A set of production rules P, where each
production is of the form α → β
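
The four components above can be written down as a small data structure. The following is a minimal sketch, not part of the original slides: the class name `Grammar`, its field names, and the example grammar are hypothetical, and the check that a left-hand side is non-empty is an added assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    """Hypothetical container for the four components of a formal grammar."""
    terminals: frozenset      # the terminal symbols
    nonterminals: frozenset   # the non-terminal symbols N
    start: str                # the start symbol S
    productions: tuple        # P: pairs (alpha, beta), each a tuple of symbols

    def __post_init__(self):
        # Terminals and non-terminals must be disjoint.
        if self.terminals & self.nonterminals:
            raise ValueError("terminals and non-terminals must be disjoint")
        # The start symbol is a non-terminal.
        if self.start not in self.nonterminals:
            raise ValueError("start symbol must be a non-terminal")
        symbols = self.terminals | self.nonterminals
        for alpha, beta in self.productions:
            # Assumption (not stated on the slide): alpha is non-empty.
            if not alpha:
                raise ValueError("left-hand side must be non-empty")
            if not set(alpha) <= symbols or not set(beta) <= symbols:
                raise ValueError("production uses an unknown symbol")


# Illustrative grammar: S -> a S b | a b
anbn = Grammar(
    terminals=frozenset({"a", "b"}),
    nonterminals=frozenset({"S"}),
    start="S",
    productions=((("S",), ("a", "S", "b")), (("S",), ("a", "b"))),
)
```

Representing each production as a pair of symbol tuples keeps the general α → β form, so the same structure covers every level of the Chomsky hierarchy.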

7
Chomsky Hierarchy

8
Inductive Inference

9
Gold's Results

The class of phrase structure languages is
learnable from positive and negative samples.
Not even the class of regular languages is
learnable from positive samples alone.
Any language class that contains all finite
languages and at least one infinite language
(a superfinite language class) is NOT identifiable
in the limit from positive samples.
The class of finite-cardinality languages is
identifiable in the limit from positive samples.
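
The last point can be illustrated with the simplest possible learner: guess that the language is exactly the set of strings seen so far. This is a minimal sketch and not taken from the slides; the function name `finite_language_learner` and the sample presentation are hypothetical. Because the target is finite, every string eventually appears in the presentation, after which the guess is correct and never changes.

```python
def finite_language_learner(positive_samples):
    """Yield one hypothesis per sample: the set of strings seen so far."""
    seen = set()
    for word in positive_samples:
        seen.add(word)
        yield frozenset(seen)


# Illustrative finite target language and a presentation of positive samples.
target = {"a", "ab", "abb"}
presentation = ["a", "ab", "a", "abb", "ab", "a"]

hypotheses = list(finite_language_learner(presentation))
# From the fourth sample onward, every hypothesis equals the target.
```

The same strategy fails for a superfinite class: on an infinite target, the learner keeps changing its guess and never converges.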

10
Angluin's Results

Angluin (1980) showed that a language class
containing some finite languages and some
infinite languages can be identifiable from positive
samples alone.
Angluin gave a characterization under which many
interesting classes of languages can be learned
efficiently. Examples are parenthesis languages,
pattern languages, k-reversible languages, and
TDR languages.
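
As an illustration of one of these classes, the sketch below tests membership in a pattern language. It is not from the slides and rests on stated assumptions: a pattern is a string in which lowercase letters are constants and uppercase letters are variables, and each variable must be replaced by the same non-empty string wherever it occurs. The function name `matches` is hypothetical, and the backtracking search is chosen for brevity, not efficiency.

```python
def matches(pattern, word, binding=None):
    """Return True if `word` results from `pattern` by substituting a
    non-empty string for each variable, consistently per variable."""
    binding = binding or {}
    if not pattern:
        return word == ""
    head, rest = pattern[0], pattern[1:]
    if head.islower():
        # Assumption: lowercase letters are constants and must match literally.
        return word.startswith(head) and matches(rest, word[1:], binding)
    if head in binding:
        # A variable seen before must be replaced by the same string again.
        value = binding[head]
        return word.startswith(value) and matches(rest, word[len(value):], binding)
    # New variable: try every non-empty prefix of the remaining word.
    for end in range(1, len(word) + 1):
        if matches(rest, word[end:], {**binding, head: word[:end]}):
            return True
    return False
```

For example, `matches("XX", "abab")` holds with X bound to `ab`, while `matches("aXb", "ab")` does not, because X may not be replaced by the empty string.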

11
Terminal Distinguishable Languages

Based on structural information (skeleton).
Good algebraic and grammatical characteristics.
Good incremental behaviour.
Based on three properties: backward determinism,
terminal completeness, and terminal dissimilarity.

12
Conclusion

Inference algorithms based on the tabular approach
and the union-find approach have been successfully
implemented and shown to have advantages such as
simplicity and reduced computational
overhead.
A novel way of identifying a subclass of CFG in
GNF from positive samples is presented and shown
to have good convergence properties.
An inference model is proposed to identify a
TSDLG using an error-correcting approach.
The suitability of the grammatical inference
techniques is demonstrated with the help of
applications such as XML document analysis and
pseudoknot identification.