Title: ENGLISH TO SQL USING FSTS AND SMT
1ENGLISH TO SQL USING FSTS AND SMT
- SHYAM GALA
- EECS 595
- NATURAL LANGUAGE PROCESSING
- FINAL PROJECT
2TRADITIONAL APPROACH
- All the NLDBi systems presented so far use a process that involves:
  - Conversion of an NL statement into its logical representation by applying syntactic and semantic analysis.
  - Conversion of the internal representation into a database query using
3TRADITIONAL APPROACH
TOKEN BASED AND TEMPLATE BASED APPROACHES 1
The number of steps involved in these traditional methods makes them computationally very expensive.
4MY APPROACH
- I analyze two approaches that achieve the conversion in a lightweight manner
- The first approach uses SMT to do the conversion.
- The second approach uses FSTs to achieve the conversion.
5STATISTICAL MACHINE TRANSLATION
- SMT involves the following 3 steps
- Generation of a Language Model (Training)
- Generation of a Translation Model (Training)
- Doing the actual translations using the 2 generated models, which is done by a decoder (Translation)
- Once training is done, translation is very inexpensive
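The three steps can be illustrated with a toy sketch. It is not the system used in this project: the corpus, the add-one smoothing, and the function names `train_trigram_lm`, `lm_logprob`, and `decode` are assumptions made for illustration, and the translation-model scores are simply passed in as numbers rather than learned from a parallel corpus.

```python
import math
from collections import Counter

def train_trigram_lm(sentences):
    """Count trigrams and their two-word contexts over tokenized target sentences."""
    tri, ctx, vocab = Counter(), Counter(), set()
    for tokens in sentences:
        padded = ["<s>", "<s>"] + tokens + ["</s>"]
        vocab.update(padded)
        for i in range(2, len(padded)):
            tri[tuple(padded[i - 2:i + 1])] += 1
            ctx[tuple(padded[i - 2:i])] += 1
    return tri, ctx, len(vocab)

def lm_logprob(tokens, model):
    """Log-probability of a token sequence under the add-one smoothed trigram model."""
    tri, ctx, vocab_size = model
    padded = ["<s>", "<s>"] + tokens + ["</s>"]
    total = 0.0
    for i in range(2, len(padded)):
        numerator = tri[tuple(padded[i - 2:i + 1])] + 1
        denominator = ctx[tuple(padded[i - 2:i])] + vocab_size
        total += math.log(numerator / denominator)
    return total

def decode(candidates, model):
    """Pick the candidate maximizing translation score + language-model score.

    candidates: list of (sql_tokens, translation_logprob) pairs; the
    translation scores are assumed to come from a separately trained model.
    """
    return max(candidates, key=lambda c: c[1] + lm_logprob(c[0], model))[0]

# Hypothetical target-side (SQL) training corpus.
corpus = [
    "select * from empinfo where first = 'Mary'".split(),
    "select id from film where title = 'Alien'".split(),
]
model = train_trigram_lm(corpus)

well_formed = "select * from empinfo where first = 'Mary'".split()
scrambled = "from select * where empinfo 'Mary' = first".split()
best = decode([(scrambled, 0.0), (well_formed, 0.0)], model)
print(" ".join(best))
```

With equal translation scores, the language model alone decides between the two candidates, which is the role it plays inside the decoder.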
6STATISTICAL MACHINE TRANSLATION
- Had to create the parallel corpus manually
- Experiments and Evaluations
- Some sample results using a trigram model
Sentence: Show all employees whose first name is Mary
Output: from empinfo first where 'Mary'
Sentence: What is the id of the film 'Casablanca'.
Output: title'Alien' SELECT id salary) id Casablanca.
7STATISTICAL MACHINE TRANSLATION
- Possible reasons for bad results
- Small size of data
- Applying an unstructured language model to a highly structured language
- Future potential in using the structured language model introduced by Knight and Yamada
8FINITE STATE TRANSDUCERS
- This is the 2nd approach that I tried
- It involves 2 steps
- Extracting keywords, indicators, and their relative ordering from a given string using FSTs
- Conversion of the output of the FST to a SQL query using a postprocessor written in Java.
- This is also a very inexpensive method
9KEYWORD EXTRACTION
- The first step of the process is to feed the English statement to the FST
- The FST looks for certain keywords and indicators based on the database schema
Sentence: Show the names of managers who have employees in a department located in Bombay
Output: selectnamemanagerwhoseemployeedepartmentlocation'Bombay'
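A minimal sketch of this kind of keyword extraction, written as a two-state word-level transducer. The lexicon, the indicator and filler word lists, and the function name `extract_keywords` are all hypothetical; the actual FST built for the project is not shown in these slides.

```python
# Hypothetical lexicon: input word -> output keyword (derived from the schema).
LEXICON = {
    "show": "select",
    "names": "name",
    "managers": "manager",
    "who": "whose",
    "employees": "employee",
    "department": "department",
}
# Hypothetical indicators: words announcing that a literal value follows.
VALUE_INDICATORS = {"located": "location"}
# Words skipped between an indicator and its value (assumed list).
FILLERS = {"in", "at", "is"}

def extract_keywords(sentence):
    """Run a two-state transducer over the words of the sentence.

    SCAN:  known words emit their keyword, everything else emits nothing.
    VALUE: entered after an indicator; the next non-filler word is emitted
           as a quoted literal, then the machine returns to SCAN.
    """
    state = "SCAN"
    output = []
    for word in sentence.split():
        raw = word.strip(".,?")
        token = raw.lower()
        if state == "SCAN":
            if token in VALUE_INDICATORS:
                output.append(VALUE_INDICATORS[token])
                state = "VALUE"
            elif token in LEXICON:
                output.append(LEXICON[token])
            # Any other word maps to the empty output.
        else:  # state == "VALUE"
            if token in FILLERS:
                continue
            output.append("'" + raw + "'")
            state = "SCAN"
    return output

sentence = ("Show the names of managers who have employees "
            "in a department located in Bombay")
keywords = extract_keywords(sentence)
print("".join(keywords))
```

Keeping the output as a list of keywords, rather than one concatenated string, makes the relative ordering directly available to the next stage.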
10POSTPROCESSOR
- The output of the FST is fed to the postprocessor
- The postprocessor tries to break the input into the Objects component and the optional Conditions component
- Needs knowledge about the database
11Objects Component
- Looks at all the fields that have been mentioned and tries to associate them with their respective tables
- If more than 1 table exists
- Generates subquery
12Conditions Component
- Iterates over the fields
- Once it finds a field, it finds the relatively closest operator and value
- Goes through the same process as above to find the associated table
- Repeats this process for all the fields
- Checks all the tables in the query so far and links them up.
13NL Sentence
Show the names of managers who have employees in a department located in Bombay
FST
selectnamemanagerwhoseemployeedepartmentlocation'Bombay'
Postprocessor
Objects Components
Select manager.name from manager, employee, department where manager.managerid = employee.managerid and employee.departmentid = department.departmentid and department.location = 'Bombay'
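The postprocessing step for this example can be sketched as follows. This is an illustration in Python, not the Java postprocessor from the project: the metadata tables `TABLES`, `FIELDS`, and `LINKS`, the use of `whose` as the split point, and the default `=` operator are assumptions. The sketch emits a flat join, as in the example above, and does not generate subqueries.

```python
# Hypothetical database knowledge the postprocessor needs.
TABLES = {"manager", "employee", "department"}
FIELDS = {"name": ["manager", "employee"], "location": ["department"]}
# (table_a, table_b, shared key column) pairs used to link tables.
LINKS = [("manager", "employee", "managerid"),
         ("employee", "department", "departmentid")]

def nearest_table(part, pos, field):
    """Pick the candidate table mentioned closest to the field at index pos."""
    hits = [(abs(i - pos), tok) for i, tok in enumerate(part)
            if tok in FIELDS[field]]
    return min(hits)[1] if hits else FIELDS[field][0]

def build_query(keywords):
    """Turn the keyword sequence into a SQL string with joined tables."""
    split = keywords.index("whose") if "whose" in keywords else len(keywords)
    objects, conditions = keywords[1:split], keywords[split + 1:]
    tables, columns, filters = [], [], []

    def note(table):
        if table not in tables:
            tables.append(table)

    # Objects component: associate mentioned fields with their tables.
    for i, tok in enumerate(objects):
        if tok in FIELDS:
            table = nearest_table(objects, i, tok)
            note(table)
            columns.append(f"{table}.{tok}")
        elif tok in TABLES:
            note(tok)

    # Conditions component: pair each field with the next quoted value.
    for i, tok in enumerate(conditions):
        if tok in TABLES:
            note(tok)
        elif tok in FIELDS:
            table = nearest_table(conditions, i, tok)
            note(table)
            value = next((v for v in conditions[i + 1:] if v.startswith("'")), None)
            if value is not None:
                filters.append(f"{table}.{tok} = {value}")  # "=" is assumed

    # Link up all tables collected so far.
    joins = [f"{a}.{k} = {b}.{k}" for a, b, k in LINKS
             if a in tables and b in tables]
    sql = "select " + ", ".join(columns) + " from " + ", ".join(tables)
    if joins or filters:
        sql += " where " + " and ".join(joins + filters)
    return sql

keywords = ["select", "name", "manager", "whose",
            "employee", "department", "location", "'Bombay'"]
query = build_query(keywords)
print(query)
```

Resolving an ambiguous field such as `name` by the closest mentioned table is one simple heuristic; other disambiguation rules are possible.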
14LIMITATIONS AND SOLUTIONS
- Building a domain-specific FST can be cumbersome
- The FST can be replaced by a Perl script, which can make this task a lot easier
- There are commands that it will not be able to process
- Can be addressed by iterating back and forth with the user to see if there is an equivalent sentence in the preferred structure, using pattern matching
- Needs knowledge about the database
- This can be automated by a Java program that connects to a DB and retrieves all the useful metadata information
- And a one-time effort from the user to associate keywords with tables, fields, and operators.
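The metadata retrieval can be sketched as below. The slides propose a Java program; this sketch uses Python's standard `sqlite3` module and an in-memory database as a stand-in, and the table definitions (including the `employeeid` column) and the helper names `read_schema` and `field_index` are assumptions for illustration.

```python
import sqlite3

def read_schema(conn):
    """Return {table: [column, ...]} for every table in a SQLite database."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    schema = {}
    for table in tables:
        # PRAGMA table_info yields one row per column; index 1 is the column name.
        schema[table] = [row[1] for row in
                         conn.execute(f"PRAGMA table_info({table})")]
    return schema

def field_index(schema):
    """Invert the schema into {column: [tables containing it]}."""
    index = {}
    for table, columns in schema.items():
        for column in columns:
            index.setdefault(column, []).append(table)
    return index

# Hypothetical schema matching the manager/employee/department example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE manager (managerid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employee (employeeid INTEGER PRIMARY KEY, name TEXT,
                       managerid INTEGER, departmentid INTEGER);
CREATE TABLE department (departmentid INTEGER PRIMARY KEY, location TEXT);
""")
schema = read_schema(conn)
fields = field_index(schema)
print(fields)
```

Columns that appear in more than one table, such as `managerid`, are natural candidates for linking tables; the mapping from English keywords to these names still needs the one-time user effort.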
15CONCLUSION
- We have seen 2 lightweight methods that try to address the problem of converting an English statement into a SQL statement
- SMT was not very successful, but does warrant some further effort
- The keywords approach seems really promising and definitely deserves some attention