Apple Pie Parser Satoshi Sekine, July 1996

Slides: 27

Provided by: tcnj

1
Apple Pie Parser
Satoshi Sekine, July 1996

Discussion and Demonstration Session
Kenneth Wilson
The College of New Jersey
Spring 2003

2
Apple Pie Parser

Developed by Satoshi Sekine at NYU Spring 1995
Bottom-up probabilistic chart parser
Uses a best-first search algorithm
Grammar
Semi-context sensitive grammar
Two nonterminals
Extracted from Penn Tree Bank

3
Apple Pie Parser

Fully automatic acquisition of grammar from a
tagged corpus
Parser generates a syntactic tree as its output
Goal is to make a parse tree as accurate as
possible for reasonable sentences (newspapers,
well written documents, etc.)
Excludes conversation and ill-formed sentences

4
Obtaining the source
Via FTP at cs.nyu.edu
Via WWW:
http://cs.nyu.edu/cs/projects/proteus/sekine
http://cs.nyu.edu/cs/projects/proteus/app

5
Directories and files

After unzipping and untarring

Commands for unzipping and untarring:
gzip -d APP5.9.tar.gz
tar xvf APP5.9.tar

6
How to make

Changes to be made to src/config.h file
Change relative pathname to absolute pathname
#define DEFAULT_PARAMFILE "/local2/users/wilson15/APP5.9/bin/app.prm"
Change memory allocation
#define ALLOCATE_ANODE 600000 /* number of Anode */
Changes to be made to bin/app.prm
Change relative pathnames to absolute pathnames
DICTIONARY_FILE /local2/users/wilson15/APP5.9/data/WS-A-001.dic
GRAMMAR_FILE /local2/users/wilson15/APP5.9/data/WS-A-001.grm
NICKNAME_FILE /local2/users/wilson15/APP5.9/data/WS-A-001.nic

7
How to make
Type make in the ./src directory (rm *.o first if
necessary)

8
How to run

Type app in the ./bin directory, or from any directory
after adding ./bin to the shell's path

9
Example of a session

Example of an APP session

10
Parsing algorithm

Techniques which enable the parser to handle
large grammars
Grammar rules are factored with common prefixes
Best-first search
Because the grammar is probabilistic, it is enough
to find only the single parse tree with the
highest probability.
Parser uses integer values for indicating POS or
a grammatical node. Relationships between the
integer values and corresponding POS are stored
in the nickname file.
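
The best-first idea above can be illustrated with a small sketch. This is not APP's code: it assumes a toy grammar restricted to binary rules, a hypothetical lexicon, and costs defined as negative log probabilities. Edges are taken from a priority queue in order of cost, so the first edge popped that spans the whole sentence with the start symbol is the most probable parse and the search can stop there.

```python
import heapq
import itertools
import math

# Hypothetical toy grammar (binary rules only): (left, right) -> [(parent, probability)]
BINARY_RULES = {
    ("NP", "VP"): [("S", 1.0)],
    ("DT", "NN"): [("NP", 0.6)],
    ("VB", "NP"): [("VP", 1.0)],
}
# Hypothetical lexicon: word -> [(tag, probability)]
LEXICON = {
    "the": [("DT", 1.0)],
    "dog": [("NN", 0.5)],
    "cat": [("NN", 0.5)],
    "saw": [("VB", 0.9), ("NN", 0.1)],
}


def best_first_parse(words, start="S"):
    """Return (probability, tree) for the most probable parse, or None."""
    n = len(words)
    tie = itertools.count()  # tie-breaker so the heap never compares trees
    agenda = []              # min-heap ordered by cost = -log(probability)
    for i, word in enumerate(words):
        for tag, prob in LEXICON.get(word, []):
            heapq.heappush(agenda, (-math.log(prob), next(tie), tag, i, i + 1, (tag, word)))

    chart = {}  # (label, start, end) -> (cost, tree); finalized edges only
    while agenda:
        cost, _, label, i, j, tree = heapq.heappop(agenda)
        if (label, i, j) in chart:
            continue  # a cheaper edge for this label and span is already final
        chart[(label, i, j)] = (cost, tree)
        if (label, i, j) == (start, 0, n):
            return math.exp(-cost), tree  # first complete parse popped is the best
        # Combine the new edge with adjacent finalized edges.
        for (other, a, b), (other_cost, other_tree) in list(chart.items()):
            if a == j:  # new edge is the left child
                for parent, prob in BINARY_RULES.get((label, other), []):
                    new_cost = cost + other_cost - math.log(prob)
                    heapq.heappush(agenda, (new_cost, next(tie), parent, i, b,
                                            (parent, tree, other_tree)))
            if b == i:  # new edge is the right child
                for parent, prob in BINARY_RULES.get((other, label), []):
                    new_cost = cost + other_cost - math.log(prob)
                    heapq.heappush(agenda, (new_cost, next(tie), parent, a, j,
                                            (parent, other_tree, tree)))
    return None


result = best_first_parse("the dog saw the cat".split())
```

Because every rule probability is at most 1, costs never decrease as edges are combined, which is what makes stopping at the first complete parse safe in this sketch.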

11
Out of vocabulary

Treatment of words not found in the vocabulary
Uses a list of part-of-speech d

12
Fitted parsing

Treatment of very long sentences
Parser cannot create a complete tree for some long
sentences.
Parser provides a post-process that builds a complete
tree from several partial trees in the chart.
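
The slide does not say how the partial trees are selected, so the following is only one plausible sketch of such a post-process. It assumes each partial tree in the chart has a span and a cost, charges a fixed hypothetical penalty per piece, and uses dynamic programming to pick the cheapest sequence of pieces that covers the sentence. The function name, the `join_cost` value, and the `FITTED` root label are illustrative, not APP's.

```python
def fit_partial_trees(n, partials, join_cost=1.0):
    """Cover words 0..n with non-overlapping partial trees at minimum total cost.

    partials: list of (start, end, cost, tree) with start < end.
    Returns a tuple ("FITTED", piece1, piece2, ...) or None if no cover exists.
    """
    INF = float("inf")
    best = [INF] * (n + 1)   # best[k] = cheapest cost of covering words 0..k
    back = [None] * (n + 1)  # back[k] = last partial tree used to reach k
    best[0] = 0.0
    for end in range(1, n + 1):
        for piece in partials:
            start, piece_end, cost, _tree = piece
            if piece_end != end or best[start] == INF:
                continue
            total = best[start] + cost + join_cost
            if total < best[end]:
                best[end] = total
                back[end] = piece
    if best[n] == INF:
        return None
    # Walk the back-pointers from the end of the sentence to the beginning.
    pieces = []
    position = n
    while position > 0:
        start, _end, _cost, tree = back[position]
        pieces.append(tree)
        position = start
    pieces.reverse()
    return ("FITTED", *pieces)  # hypothetical root label


# Hypothetical partial trees for a five-word sentence.
chart_pieces = [
    (0, 2, 1.0, "NP1"),
    (2, 5, 2.0, "VP"),
    (0, 3, 5.0, "S1"),
    (3, 5, 1.0, "NP2"),
    (2, 3, 0.5, "VB"),
]
fitted = fit_partial_trees(5, chart_pieces)
```

The per-piece penalty makes covers with fewer, larger trees preferable to many small fragments of similar total cost.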

13
List of parameters

Default parameter file ./bin/app.prm

System           Cut off        START_SYMBOL     Out of Vocabulary  OOV_NUM
DICTIONARY_FILE  TOO_LONG       NP_LABEL         FLAG_OOV           OOV_NNP
DICT_SCORETYPE   CUT_GSCORE     Tokenization     OOV                Others
GRAMMAR_FILE     CUT_DPROB      TN_SYMBOL        OOV_NUMHYPN        TRANS_POS
NICKNAME_FILE    PRUNE_RATE     TN_TWO_SYMBOL    OOV_CAPITAL        INV_TRANS_POS
DEBUG            Adjustment     TN_TWO1_SYMBOL   OOV_LY             SUP_WORD
PRINT_STYLE      WEIGHT_GRAM    SPECIAL_HEAD     OOV_Y              PRINT_TOKEN
NO_WARNING       WEIGHT_DICT    DEL_TAIL         OOV_ED             PRINT_NT
Fitted parsing   CAP_MINFREQ    DEL_TAIL_SYMBOL  OOV_D
FITTED_PARSE     ATTACH         DEL_TAIL_STRING  OOV_S
FITTED_ROOT      NO_LETTERCASE  TAIL_NODE_CAT    OOV_ION
FITTED_LABEL     DOUBLE_QUOTE                    OOV_ING
FITTED_CCOST

14
Parameter setting by file

Can set parameters by file (parameter file)
ASCII file
Each line contains a parameter and its value
separated by at least one space.
Default parameter file is app.prm (used when no
other parameter file is specified).
Specify parameter files using the -p option.
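
As a rough illustration of the file layout described above (one parameter and its value per line, separated by at least one space), a reader for such text might look like the sketch below. It is not APP's own reader; the function name is hypothetical and the TOO_LONG value in the sample is made up for the example.

```python
def parse_param_text(text):
    """Parse 'NAME value' lines into a dict; the value keeps any inner spaces."""
    params = {}
    for line in text.splitlines():
        fields = line.split(None, 1)  # split on the first run of whitespace
        if len(fields) != 2:
            continue  # skip blank lines and lines without a value
        name, value = fields
        params[name] = value.strip()
    return params


sample = """\
DICTIONARY_FILE /local2/users/wilson15/APP5.9/data/WS-A-001.dic
GRAMMAR_FILE    /local2/users/wilson15/APP5.9/data/WS-A-001.grm
TOO_LONG 40
"""
params = parse_param_text(sample)
```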

15
Change parameter value

A few parameters can also be changed during a
session
Type param and specify parameter name and value
on the next line
PRINT_STYLE
TOO_LONG
ATTACH
NO_LETTERCASE
DOUBLE_QUOTE
OOV

16
Grammar file

<GRAMMAR FILE> ::= <GRAMMAR RULE> (one or more)
<GRAMMAR RULE> ::= <LHS NODE> <RHS NODES>
                   <SCORE INFORMATION>
                   <STRUCTURE INFORMATION>
<RHS NODES> ::= <RHS NODE> (one or more)
<LHS NODE> ::= integer
<RHS NODE> ::= integer
<SCORE INFORMATION> ::= score <SCORE>
<SCORE> ::= integer
<STRUCTURE INFORMATION> ::= string to be printed out for the node
    <num> corresponds to the string of the num-th node in the RHS
---------- Example ----------
1 53 54 2 103 205 1 57 55
score 113
struct (S <1><2><3> (VP <4> (SBAR <5> <6>)) <7> <8>)
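
To make the rule format concrete, here is a small sketch that reads the example rule and fills in its structure string. It assumes a rule occupies three lines (node integers, a score line, a struct line) and that placeholders are written as <1>, <2>, and so on; the function names are hypothetical and this is not APP's loader.

```python
import re


def parse_rule(lines):
    """Parse one grammar rule given as three lines of text."""
    node_line, score_line, struct_line = lines
    nodes = [int(token) for token in node_line.split()]
    score_key, score_value = score_line.split()
    struct_key, template = struct_line.split(None, 1)
    if score_key != "score" or struct_key != "struct":
        raise ValueError("unexpected rule layout")
    return {"lhs": nodes[0], "rhs": nodes[1:], "score": int(score_value), "struct": template}


def render(template, child_strings):
    """Replace each <num> with the string of the num-th RHS node (1-based)."""
    return re.sub(r"<(\d+)>", lambda m: child_strings[int(m.group(1)) - 1], template)


rule = parse_rule([
    "1 53 54 2 103 205 1 57 55",
    "score 113",
    "struct (S <1><2><3> (VP <4> (SBAR <5> <6>)) <7> <8>)",
])
# Placeholder strings standing in for the output of the eight RHS nodes.
text = render(rule["struct"], ["a", "b", "c", "d", "e", "f", "g", "h"])
```

In the example rule the left-hand side is node 1 and the right-hand side has eight nodes, which matches the eight placeholders in the structure string.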

17
Dictionary file

<DICTIONARY FILE> ::= <WORD INFORMATION> (one or more)
<WORD INFORMATION> ::= <STRING> <POS INFORMATION> (one or more, comma-separated)
<STRING> ::= string
<POS INFORMATION> ::= <POS>:<SCORE>
<POS> ::= integer
<SCORE> ::= integer
---------- Example ----------
base 65:15, 70:119, 85:2, 89:2
Base-price 70:1
Base-rate 65:1
Baseball 70:54
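
A dictionary entry could be read along the following lines. This sketch rests on an assumption about the file layout: that each part-of-speech integer is paired with its score as POS:SCORE and that pairs are separated by commas. The function name is hypothetical.

```python
def parse_dictionary_line(line):
    """Split 'word POS:SCORE, POS:SCORE, ...' into (word, [(pos, score), ...])."""
    word, _, rest = line.strip().partition(" ")
    entries = []
    for pair in rest.split(","):
        pos, score = pair.strip().split(":")  # assumed POS:SCORE separator
        entries.append((int(pos), int(score)))
    return word, entries


word, entries = parse_dictionary_line("base 65:15, 70:119, 85:2, 89:2")
```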

18
Nickname file

<NICKNAME FILE> ::= <NICKNAME INFORMATION> (one or more)
<NICKNAME INFORMATION> ::= <POS> <NICKNAME>
<POS> ::= integer
<NICKNAME> ::= string
---------- Example ----------
1 S
2 NP
5 S1
6 NP0
51
52
53
54 -LRB-
55 -RRB-
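
The nickname file is what turns the parser's integers back into readable labels. A minimal sketch of that lookup, assuming one integer and an optional nickname per line as in the example (function names are hypothetical):

```python
def parse_nickname_text(text):
    """Map each integer code to its nickname ('' when the line has no nickname)."""
    names = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        names[int(fields[0])] = fields[1] if len(fields) > 1 else ""
    return names


def label_for(code, names):
    """Return the nickname for a code, falling back to the number itself."""
    return names.get(code) or str(code)


names = parse_nickname_text("1 S\n2 NP\n5 S1\n6 NP0\n51\n54 -LRB-\n55 -RRB-\n")
labels = [label_for(code, names) for code in (1, 2, 54, 51, 99)]
```

The fallback to the raw number keeps output usable for codes that have no nickname in the file.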

19
List of APP functions

APP_init_param() Initialize parameter variables
APP_read_param_file() Read parameter
APP_init_global() Initialize global variables
APP_parse() Parsing
APP_set_param() Set parameter
APP_current_param() Show current parameter values
APP_debug_routine() Get into debug routine

20
Interface

Initialization
Parameter setting
Must be done first
APP_init_param() function sets default parameters
Set your own parameter values using
APP_set_param()
Initialize internal data
Load dictionary, grammar, and nickname
information and store them
APP_init_global()

21
Interface

Parsing
Performed by calling APP_parse()
Two arguments
Sentence
Address of return value structure
Returns 1 if parsing is completed, 0 otherwise

22
Interface

Parameter setting and look-up
Can set limited kinds of parameters during the
session using APP_set_param()
Lookup parameters using APP_current_param()

23
Interface

Debug routine
Used to view internal structures:
Information about previous input sentences.
Dictionary entries
Part of grammar rules
Sentence and parse tree data on the heap
Type resume to quit the debug routine and
resume parsing session

24
Grammar

WS-A-001.grm is a grammar extracted from the Penn
Tree Bank.
Consists of all grammar rules that occur in the
corpus, using two nonterminals (S and NP)
WS-A-002.grm, also extracted from the Penn Tree Bank,
contains only the rules that occur more than
once
Smaller grammar than WS-A-001.grm
Tradeoffs
WS-A-001.grm gives greater accuracy with slower
parse time
WS-A-002.grm gives lower accuracy with faster
parse time
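
The difference between the two grammars comes down to a frequency cutoff. A sketch of that kind of pruning, assuming rules are represented as (LHS, RHS) tuples collected from a treebank; the sample rules and the function name are hypothetical.

```python
from collections import Counter


def prune_rare_rules(rule_occurrences, min_count=2):
    """Keep only rules seen at least min_count times; returns {rule: count}."""
    counts = Counter(rule_occurrences)
    return {rule: count for rule, count in counts.items() if count >= min_count}


# Hypothetical rule occurrences: (LHS, RHS) with RHS as a tuple of symbols.
occurrences = [
    ("NP", ("DT", "NN")),
    ("NP", ("DT", "NN")),
    ("NP", ("DT", "JJ", "NN")),
    ("S", ("NP", "VBD", "NP")),
    ("S", ("NP", "VBD", "NP")),
    ("S", ("NP", "VBD", "NP")),
]
full_grammar = prune_rare_rules(occurrences, min_count=1)   # every rule
small_grammar = prune_rare_rules(occurrences, min_count=2)  # frequency more than one
```

Dropping singleton rules shrinks the grammar the parser must search, at the cost of losing constructions that were seen only once.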

25
Dictionary

WS-A-001.dic
Extracted from Penn Tree Bank
Supplemented by part-of-speech information from
COMLEX syntax dictionary

26
Accuracy, time, size

Statistics based on test runs using WS-A-001.grm
and WS-A-001.dic
Tests performed on a SPARCstation 5 with 160 MB
of memory, with ANODE_TOP set to 5,000,000
Number of sentences: 1989
Average length of sentences: 23.28
Number of parsed sentences: 1788
Number of fitted parse: 203
Average parsing time for all: 18
Average parsing time excluding fitted parse: 9.76
Precision: 71.04
Recall: 70.33
Average Crossing: 3.03
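
The slide does not define the metrics. The sketch below assumes the usual bracket-based definitions: precision and recall over labeled constituents given as (label, start, end), and a crossing count of proposed constituents whose span overlaps a reference span without either containing the other. The sample brackets are hypothetical.

```python
def spans_cross(a, b):
    """True if spans a and b overlap but neither contains the other."""
    (s1, e1), (s2, e2) = a, b
    return s1 < s2 < e1 < e2 or s2 < s1 < e2 < e1


def evaluate_brackets(predicted, gold):
    """Return (precision, recall, crossing) for two bracket collections."""
    predicted, gold = set(predicted), set(gold)
    matched = len(predicted & gold)
    precision = matched / len(predicted)
    recall = matched / len(gold)
    crossing = sum(
        1
        for (_, s, e) in predicted
        if any(spans_cross((s, e), (gs, ge)) for (_, gs, ge) in gold)
    )
    return precision, recall, crossing


gold = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)]
predicted = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 1, 3)]
precision, recall, crossing = evaluate_brackets(predicted, gold)
```

Averaging the per-sentence crossing counts over a test set gives a figure comparable in kind to the one reported above.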