Apple Pie Parser, Satoshi Sekine, July 1996


1
Apple Pie Parser, Satoshi Sekine, July 1996
  • Discussion and Demonstration Session
  • Kenneth Wilson
  • The College of New Jersey
  • Spring 2003

2
Apple Pie Parser
  • Developed by Satoshi Sekine at NYU Spring 1995
  • Bottom-up probabilistic chart parser
  • Uses a best-first search algorithm
  • Grammar
  • Semi-context sensitive grammar
  • Two nonterminals
  • Extracted from Penn Tree Bank
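
As a rough illustration of bottom-up, best-first chart parsing, here is a minimal Python sketch. It is not APP's implementation: it assumes a toy probabilistic grammar with binary rules only, and the names best_first_parse, lexicon, and rules are hypothetical. Edges leave the agenda in order of cost (negative log probability), so the first finished edge that spans the whole sentence with the goal symbol is the most probable parse.

```python
import heapq
import math


def best_first_parse(words, lexicon, rules, goal="S"):
    """Return (probability, tree) of the best parse, or None.

    lexicon: word -> list of (tag, probability)
    rules:   list of (lhs, (left, right), probability), binary rules only
    """
    n = len(words)
    agenda = []   # heap of (cost, tie_breaker, edge, tree)
    serial = 0    # tie breaker so the heap never has to compare trees

    def push(cost, edge, tree):
        nonlocal serial
        heapq.heappush(agenda, (cost, serial, edge, tree))
        serial += 1

    for i, word in enumerate(words):
        for tag, prob in lexicon.get(word, []):
            push(-math.log(prob), (tag, i, i + 1), (tag, word))

    chart = {}               # finished edge -> (cost, tree)
    starts, ends = {}, {}    # position -> finished edges starting/ending there
    while agenda:
        cost, _, edge, tree = heapq.heappop(agenda)
        if edge in chart:
            continue         # a cheaper copy of this edge was already finished
        chart[edge] = (cost, tree)
        label, i, j = edge
        if edge == (goal, 0, n):
            return math.exp(-cost), tree
        starts.setdefault(i, []).append(edge)
        ends.setdefault(j, []).append(edge)
        for lhs, (left, right), prob in rules:
            rule_cost = -math.log(prob)
            if left == label:    # the new edge is the left child
                for other in starts.get(j, []):
                    if other[0] == right:
                        o_cost, o_tree = chart[other]
                        push(cost + o_cost + rule_cost,
                             (lhs, i, other[2]), (lhs, tree, o_tree))
            if right == label:   # the new edge is the right child
                for other in ends.get(i, []):
                    if other[0] == left:
                        o_cost, o_tree = chart[other]
                        push(cost + o_cost + rule_cost,
                             (lhs, other[1], j), (lhs, o_tree, tree))
    return None
```

Because every rule probability is at most 1, costs never decrease as edges are combined, which is what lets the search stop at the first complete goal edge.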

3
Apple Pie Parser
  • Fully automatic acquisition of grammar from a
    tagged corpus
  • Parser generates a syntactic tree as its output
  • Goal is to make a parse tree as accurate as
    possible for reasonable sentences (newspapers,
    well written documents, etc.)
  • Excludes conversation and ill-formed sentences

4
Obtaining the source
Via FTP at cs.nyu.edu
Via WWW: http://cs.nyu.edu/cs/projects/proteus/sekine
         http://cs.nyu.edu/cs/projects/proteus/app
5
Directories and files
  • After unzipping and untarring

Commands for unzipping and untarring:
  gzip -d APP5.9.tar.gz
  tar xvf APP5.9.tar
6
How to make
  • Changes to be made to the src/config.h file
  • Change relative pathname to absolute pathname
  • #define DEFAULT_PARAMFILE "/local2/users/wilson15/APP5.9/bin/app.prm"
  • Change memory allocation
  • #define ALLOCATE_ANODE 600000 /* number of Anode */
  • Changes to be made to bin/app.prm
  • Change relative pathnames to absolute pathnames
  • DICTIONARY_FILE /local2/users/wilson15/APP5.9/data/WS-A-001.dic
  • GRAMMAR_FILE /local2/users/wilson15/APP5.9/data/WS-A-001.grm
  • NICKNAME_FILE /local2/users/wilson15/APP5.9/data/WS-A-001.nic

7
How to make
Type make in the ./src directory (run rm *.o first if
necessary)
8
How to run
  • Type app in the ./bin directory, or from any directory
    (after modifying the shell path)

9
Example of a session
  • Example of an APP session

10
Parsing algorithm
  • Techniques which enable the parser to handle
    large grammars
  • Grammar rules are factored with common prefixes
  • Best-first search
  • Because the grammar is probabilistic, it is enough
    to find only the one parse tree with the highest
    probability.
  • The parser uses integer values to represent a part
    of speech (POS) or a grammatical node. The mapping
    between the integer values and the corresponding
    POS is stored in the nickname file.
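
Factoring grammar rules by common prefix can be pictured as a trie keyed on right-hand-side symbols. The Python sketch below is only an illustration of that idea, not APP's data structure; the class name RuleTrie and the integer codes and scores in the example are made up.

```python
class RuleTrie:
    """Trie over rule right-hand sides; rules sharing a prefix share nodes."""

    def __init__(self):
        self.children = {}   # integer symbol code -> RuleTrie
        self.rules = []      # (lhs, score) for rules whose RHS ends here

    def add(self, lhs, rhs, score):
        node = self
        for symbol in rhs:
            node = node.children.setdefault(symbol, RuleTrie())
        node.rules.append((lhs, score))

    def node_count(self):
        return 1 + sum(c.node_count() for c in self.children.values())

    def match(self, symbols):
        """Yield (lhs, score, length) for each rule whose RHS is a prefix of symbols."""
        node = self
        for length, symbol in enumerate(symbols, start=1):
            node = node.children.get(symbol)
            if node is None:
                return
            for lhs, score in node.rules:
                yield lhs, score, length


trie = RuleTrie()
trie.add(1, [2, 103], 40)        # made-up integer codes and scores
trie.add(1, [2, 103, 57], 75)
trie.add(2, [2, 103, 2], 90)
```

The three rules above have eight right-hand-side symbols in total but need only four trie nodes below the root, and a single left-to-right walk over the input finds every rule that matches.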

11
Out of vocabulary
  • Treatment of words not found in the vocabulary
  • Uses a list of part-of-speech d

12
Fitted parsing
  • Treatment of very long sentences.
  • The parser cannot create a complete tree for some
    long sentences.
  • The parser provides a post-process that builds a
    complete tree from several partial trees in the chart.
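
One way to picture the post-process is as a search for the cheapest sequence of partial trees that covers the sentence. The Python sketch below makes that concrete under stated assumptions: the function name fit_partial_trees, the per-piece penalty, and the edge costs are all hypothetical, and the slides do not say how APP actually scores or joins the pieces.

```python
def fit_partial_trees(n, edges, piece_penalty=1.0):
    """Pick adjacent chart edges that cover words 0..n at the lowest cost.

    edges: list of (start, end, cost, tree) with start < end.
    Returns (total_cost, [tree, ...]) or None if no cover exists.
    """
    best = {0: (0.0, [])}    # position -> (cost, trees chosen so far)
    for end in range(1, n + 1):
        for start, edge_end, cost, tree in edges:
            if edge_end != end or start not in best:
                continue
            total = best[start][0] + cost + piece_penalty
            if end not in best or total < best[end][0]:
                best[end] = (total, best[start][1] + [tree])
    return best.get(n)


edges = [
    (0, 1, 0.5, "DT"), (1, 2, 0.5, "NN"),   # small pieces
    (0, 2, 1.0, "NP"), (2, 5, 2.0, "VP"),   # larger partial trees
]
fitted = fit_partial_trees(5, edges)         # chooses NP followed by VP
```

The chosen pieces could then be attached under a single root node to give a complete tree; the per-piece penalty makes the search prefer fewer, larger partial trees.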

13
List of parameters
  • Default parameter file: ./bin/app.prm

System:           Cut off:        START_SYMBOL      Out of Vocabulary:   OOV_NUM
DICTIONARY_FILE   TOO_LONG        NP_LABEL          FLAG_OOV             OOV_NNP
DICT_SCORETYPE    CUT_GSCORE      Tokenization:     OOV                  Others:
GRAMMAR_FILE      CUT_DPROB       TN_SYMBOL         OOV_NUMHYPN          TRANS_POS
NICKNAME_FILE     PRUNE_RATE      TN_TWO_SYMBOL     OOV_CAPITAL          INV_TRANS_POS
DEBUG             Adjustment:     TN_TWO1_SYMBOL    OOV_LY               SUP_WORD
PRINT_STYLE       WEIGHT_GRAM     SPECIAL_HEAD      OOV_Y                PRINT_TOKEN
NO_WARNING        WEIGHT_DICT     DEL_TAIL          OOV_ED               PRINT_NT
Fitted parsing:   CAP_MINFREQ     DEL_TAIL_SYMBOL   OOV_D
FITTED_PARSE      ATTACH          DEL_TAIL_STRING   OOV_S
FITTED_ROOT       NO_LETTERCASE   TAIL_NODE_CAT     OOV_ION
FITTED_LABEL      DOUBLE_QUOTE                      OOV_ING
FITTED_CCOST
14
Parameter setting by file
  • Can set parameters by file (parameter file)
  • ASCII file
  • Each line contains a parameter and its value
    separated by at least one space.
  • The default parameter file is app.prm (used when no
    other parameter file is specified).
  • Specify parameter files using the -p option.
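
The line format described above (a parameter name, at least one space, then its value) is simple enough to read with a few lines of Python. This is a minimal sketch, not APP's own reader; the function name read_param_file is hypothetical, and lines that carry no value are simply skipped as an assumption.

```python
def read_param_file(text):
    """Parse 'NAME value' lines into a dict, skipping lines without a value."""
    params = {}
    for line in text.splitlines():
        parts = line.split(None, 1)   # split at the first run of whitespace
        if len(parts) == 2:
            params[parts[0]] = parts[1].strip()
    return params


example = """\
DICTIONARY_FILE /local2/users/wilson15/APP5.9/data/WS-A-001.dic
GRAMMAR_FILE    /local2/users/wilson15/APP5.9/data/WS-A-001.grm
NICKNAME_FILE   /local2/users/wilson15/APP5.9/data/WS-A-001.nic
"""
params = read_param_file(example)
```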

15
Change parameter value
  • A few parameters can also be changed during a
    session
  • Type param, then specify the parameter name and
    value on the next line
  • PRINT_STYLE
  • TOO_LONG
  • ATTACH
  • NO_LETTERCASE
  • DOUBLE_QUOTE
  • OOV

16
Grammar file
  • <GRAMMAR FILE> ::= <GRAMMAR RULE> ...
  • <GRAMMAR RULE> ::= <LHS NODE> <RHS NODES>
    <SCORE INFORMATION>
    <STRUCTURE INFORMATION>
  • <RHS NODES> ::= <RHS NODE> ...
  • <LHS NODE> ::= integer
  • <RHS NODE> ::= integer
  • <SCORE INFORMATION> ::= score <SCORE>
  • <SCORE> ::= integer
  • <STRUCTURE INFORMATION> ::= string to be printed
    for the node
  • <num> corresponds to the string of the
    num-th node in the RHS
  • ---------- Example ----------
  • 1 53 54 2 103 205 1 57 55
  • score 113
  • struct (S <1><2><3> (VP <4> (SBAR <5>
    <6>)) <7> <8>)
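
To make the structure string concrete, the Python sketch below fills in the example's template, writing each placeholder as a number in angle brackets and replacing it with the string of the corresponding right-hand-side node. The function name expand_struct and the child strings c1 to c8 are hypothetical stand-ins; this is an illustration of the substitution, not APP's printing code.

```python
import re


def expand_struct(template, children):
    """Replace each <num> with the string of the num-th RHS node (1-based)."""
    return re.sub(r"<(\d+)>", lambda m: children[int(m.group(1)) - 1], template)


template = "(S <1><2><3> (VP <4> (SBAR <5> <6>)) <7> <8>)"
children = ["c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8"]
expanded = expand_struct(template, children)
# expanded == "(S c1c2c3 (VP c4 (SBAR c5 c6)) c7 c8)"
```

The template lets one flat rule with eight right-hand-side nodes print as a nested S, VP, and SBAR structure.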

17
Dictionary file
  • <DICTIONARY FILE> ::= <WORD INFORMATION> ...
  • <WORD INFORMATION> ::= <STRING> <POS INFORMATION>
  • <STRING> ::= string
  • <POS INFORMATION> ::= <POS><SCORE>
  • <POS> ::= integer
  • <SCORE> ::= integer
  • --------Example-----------------------------------
    ------------------------
  • base 6515, 70119, 852, 892
  • Base-price 701
  • Base-rate 651
  • Baseball 7054

18
Nickname file
  • <NICKNAME FILE> ::= <NICKNAME INFORMATION> ...
  • <NICKNAME INFORMATION> ::= <POS> <NICKNAME>
  • <POS> ::= integer
  • <NICKNAME> ::= string
  • ------------Example-------------------------------
    ------------------
  • 1 S
  • 2 NP
  • 5 S1
  • 6 NP0
  • 51
  • 52
  • 53
  • 54 -LRB-
  • 55 -RRB-
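
A nickname file of this shape maps integer codes back to readable labels. The Python sketch below reads such lines into a dictionary; the function name read_nicknames is hypothetical, and storing an empty string for a code with no visible nickname is an assumption rather than documented APP behavior.

```python
def read_nicknames(text):
    """Map integer node codes to nickname strings."""
    names = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue                       # skip blank lines
        names[int(parts[0])] = parts[1] if len(parts) > 1 else ""
    return names


names = read_nicknames("1 S\n2 NP\n5 S1\n6 NP0\n51\n54 -LRB-\n55 -RRB-\n")
labels = [names[code] for code in (1, 54, 2, 55)]   # decode a list of codes
```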

19
List of APP functions
  • APP_init_param(): Initialize parameter variables
  • APP_read_param_file(): Read parameter file
  • APP_init_global(): Initialize global variables
  • APP_parse(): Parsing
  • APP_set_param(): Set parameter
  • APP_current_param(): Show current parameter values
  • APP_debug_routine(): Get into debug routine

20
Interface
  • Initialization
  • Parameter setting
  • Must be done first
  • APP_init_param() function sets default parameters
  • Set your own parameter values using
    APP_set_param()
  • Initialize internal data
  • Load dictionary, grammar, and nickname
    information and store them
  • APP_init_global()

21
Interface
  • Parsing
  • Started by calling APP_parse()
  • Two arguments
  • Sentence
  • Address of return value structure
  • Returns 1 if parsing is completed, 0 otherwise

22
Interface
  • Parameter setting and look-up
  • Can set limited kinds of parameters during the
    session using APP_set_param()
  • Lookup parameters using APP_current_param()

23
Interface
  • Debug routine
  • Use it to view
  • Internal structure
  • Information about previous input sentences
  • Dictionary entries
  • Part of grammar rules
  • Sentence and parse tree data on the heap
  • Type resume to quit the debug routine and
    resume the parsing session

24
Grammar
  • WS-A-001.grm is a grammar extracted from the Penn
    Tree Bank.
  • Consists of all occurrences of grammar rules for
    the two nonterminals (S and NP)
  • WS-A-002.grm is also extracted from the Penn Tree Bank
  • Contains only the rules whose frequency is more
    than one
  • Smaller grammar than WS-A-001.grm
  • Tradeoffs
  • WS-A-001.grm gives greater accuracy with slower
    parse time
  • WS-A-002.grm gives lower accuracy with faster
    parse time
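
The difference between the two grammars comes down to a frequency cutoff on extracted rules. The Python sketch below shows that kind of filter on made-up rule occurrences; the function name and the sample rules are hypothetical, and it does not reproduce how the WS-A files were actually built.

```python
from collections import Counter


def rules_seen_more_than_once(rule_occurrences):
    """Keep only the rules that occur more than once, with their counts."""
    counts = Counter(rule_occurrences)
    return {rule: n for rule, n in counts.items() if n > 1}


occurrences = [
    ("NP", ("DT", "NN")),
    ("NP", ("DT", "NN")),
    ("NP", ("DT", "JJ", "NN")),
    ("S", ("NP", "VP")),
    ("S", ("NP", "VP")),
    ("S", ("NP", "VP")),
]
frequent = rules_seen_more_than_once(occurrences)
```

Dropping rules seen only once shrinks the grammar, which is consistent with the tradeoff listed above: faster parsing at some cost in accuracy.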

25
Dictionary
  • WS-A-001.dic
  • Extracted from Penn Tree Bank
  • Supplemented by part-of-speech information from
    COMLEX syntax dictionary

26
Accuracy, time, size
  • Statistics based on test runs using WS-A-001.grm
    and WS-A-001.dic
  • Tests performed on a SPARCstation 5 with 160 MB
    of memory, with ANODE_TOP set to 5,000,000
  • Number of sentences: 1989
  • Average length of sentences: 23.28
  • Number of parsed sentences: 1788
  • Number of fitted parses: 203
  • Average parsing time for all: 18
  • Average parsing time excluding fitted parses: 9.76
  • Precision: 71.04
  • Recall: 70.33
  • Average crossing: 3.03
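
The slides do not say how these measures were computed. Assuming the usual labeled-bracket definitions, a per-sentence calculation can be sketched in Python as follows; the function name bracket_scores and the example brackets are hypothetical.

```python
def bracket_scores(gold, test):
    """Return (precision, recall, crossing) for one sentence.

    gold, test: sets of (label, start, end) brackets.
    """
    matched = len(gold & test)
    precision = matched / len(test)
    recall = matched / len(gold)
    # A test bracket crosses when it overlaps a gold bracket without nesting.
    crossing = sum(
        1 for (_, s, e) in test
        if any(gs < s < ge < e or s < gs < e < ge for (_, gs, ge) in gold)
    )
    return precision, recall, crossing


gold = {("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5)}
test = {("S", 0, 5), ("NP", 0, 3), ("VP", 3, 5)}
precision, recall, crossing = bracket_scores(gold, test)
```

In this example only the S bracket matches, so precision and recall are both one third, and the test NP over words 0 to 3 crosses the gold VP over words 2 to 5.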