Title: Analyzing SQL embedded in COBOL and Java
1Analyzing SQL embedded in COBOL and Java
- Martin van der Vlist
- Supervisor Mark van den Brand
2Contents
- Introduction
- Background used techniques
- Desired results
- COBOL analysis results
- Java analysis results
- Work in progress
3Introduction
- Goal asses quality of embedded languages
- Mean
- Analyze SQL embedded in COBOL/Java
- Output data in SQuAVisiT format
- Reuse existing parsers for COBOL/Java?
- Surprisingly not much research found
- Software Improvement Group
Old programs never die, they even don't fade...
4Contents
- Introduction
- Background used techniques
- Desired results
- COBOL analysis results
- Java analysis results
- Work in progress
5Background Parsing
Architecture of traditional parser like ANTLR
and JavaCC
String x something String x something()
6Background Parser comparison
- Pico language used for comparison
- Easy to parse language
- Extended for embedded SQL
- begin
- declare
- input natural,
- output natural,
- repnr natural,
- rep natural
- input 10
- output 1
- while input - 1 do
- rep output
- repnr input
- while repnr - 1 do
- output output rep
- repnr repnr - 1
- od
- input input - 1
- od
- end
7Background JavaCC
- Generates traditional top-down parsers in Java
- Includes tree builder and visitor generation
- 'Special' features
- Lexical states
- Java actions
- Mixed lexer/parser grammars allowed
- Lookahead specification per production
- Used in other LaQuSo tools
8Background JavaCC embedded
- No direct support for multiple languages
- With a trick, multiple lexer/parser combinations
can share input - Doesn't work with lookahead
//Added productions in the grammar void
SqlStatement() //the sql production is
opened ltSQL_BEGINgt sql() ltSQL_ENDgt void
sql() //this calls the java function
defined elsewhere delegate()
9Background ANTLR
- Also top-down parser
- Multiple output languages (Java, C, C, )
- Special features
- No real separation between lexer and parser
- Automatic lookahead per production
- For non-recursive structures
- Java actions
- Modularity, provided that language is still in
LL()
10Background ANTLR embedded
- Similar construct
- Because java actions in Lexer are supported, no
issues with lookahead
SQL EXEC SQL // create a new SQL
lexer/parser duo // that feeds off the
current input stream SqlLexer j new
SqlLexer(input) CommonTokenStream tokens new
CommonTokenStream(j) tokens.discardTokenType(Sq
lLexer.WS) SqlParser p new
SqlParser(tokens) // parse the embedded
language p.sqlStatement() // do something
with the generated output tablenames.addAll(p.ta
blenames) SQL_END END-EXEC
emit(Token.EOF_TOKEN)
11Background SDF
- Developed at CWI
- Any context-free grammar supported
- Scannerless Generalized LR parser
- Modular approach
- Strings etc can be reused in all languages
- More than just a parser-generator
12Background SDF embedded
- Because of modular structure, just include the
embedded grammar
module languages/PicoSql/PicoSql imports
languages/pico/syntax/Pico imports
languages/pico/syntax/comments imports
languages/tinySql/TinySql exports context-free
syntax "EXEC SQL" SelectStm "END-EXEC" -gt
STATEMENT "EXEC SQL" Stm "END-EXEC" -gt
STATEMENT context-free start-symbols PROGRAM
13Background JavaCC for COBOL
- SDF and ANTLR have better support for embedded
languages - JavaCC still preferred
- Several COBOL dialects
- Working grammar for JavaCC available
- Embedded languages not directly supported
- Workaround use lexical states
14Background JavaCC for COBOL
- Using lexical states, analysis can be completely
separated - Often used for comments / .... /
- Maintenance issue copy/paste not desired
- Solution recognize entire query as token
- TOKEN
- "EXEC SQL" IN_SQL_CODE
-
- ltIN_SQL_CODEgt MORE
- ltgt
-
- ltIN_SQL_CODEgt TOKEN
- "END-EXEC" DEFAULT
-
15Background JavaCC for COBOL
- Benefits of approach
- Recovery after parse error in query
- SQL parser is reusable
16Background JavaCC Visitor
- One of the recommended methods
- Visitor.visit(AST t, Argu a) calls
t.accept(this, a) - Node.accept(Visitor v, Argu a) calls
v.visit(this, a)
17Contents
- Introduction
- Background used techniques
- Desired results
- COBOL analysis results
- Java analysis results
- Work in progress
18Desired results
- Show relation between host- and embedded language
- Dependency graph for tables
- Complexity metrics for queries
- Output compatible with SQuAVisiT
The evil is in the details
19Desired results Metrics
- First idea reuse complexity metrics used for
COBOL - Not applicable Cyclomatic, if-count etc
- Questionable Halstead, no proof
- Other complexity metrics
- Example number of tables, number of parameters
- All are questionable no solid proof
20Desired results interface SQuAViSiT
- SQuAVisiT calls batchfile/executable per file
with arguments - batchfile ltin_pathgt ltout_pathgt ltlog_pathgt
ltmetrics_pathgt ltcalls_pathgt - Path is the other option
- batchfile ltin_dirgt ltout_dirgt ltlog_pathgt
ltmetrics_pathgt ltcalls_pathgt - All subdirectories should be analyzed by
batchfile - Metrics can be read in CSV format
- Call files have special format
21Contents
- Introduction
- Background used techniques
- Desired results
- COBOL analysis results
- Java analysis results
- Work in progress
22COBOL analysis
- Parsing issues
- Dynamic SQL
- Case study results
Cobol isnt dead yet, it just smells funny
23COBOL parsing issues
- Issues with COBOL-parser architecture
- SQL dialect
- Performance
- 25 minutes for case study initial
- 2 minutes after change to file-list based
- 158 minutes using multithreaded on P4 HT
24COBOL Dynamic SQL
- Even in COBOL, Strings can be executed as SQL
- Partial analysis created, supporting
- Some string functions
- COBOL data structures
- No flow graph analysis
- Not used often in COBOL, test cases irrelevant
25COBOL Dynamic SQL
- Record example
- Referenced as Q_TABLENAME OF TEST_QUERY in COBOL
- Referenced as TEST_QUERY.Q_TABLENAME in SQL
01 TEST_QUERY. 03 FILLER PIC X(15) VALUE
"SELECT id FROM ". 03 Q_TABLENAME PIC X(15).
03 FILLER PIC X(6) VALUE "WHERE ". 03
Q_FIELDNAME PIC X(15). 03 FILLER PIC X(3)
VALUE " ". 03 Q_VALUE PIC X(15).
26COBOL case study
- Software of Insurance company
- COBOL with PL/SQL embedded
- 2816 files, 1603 files with queries
- 1.62 Million lines of code
27Case study dependencies
IVA100 table calls
All table calls
28Case study dependencies Regular structures of
IVA100
29Case study dependencies 3 Irregular structure
of FP101A
30Case study dependencies 4 Matrix view
31Case study metrics
32Contents
- Introduction
- Background used techniques
- Desired results
- COBOL analysis results
- Java analysis results
- Work in progress
33Java analysis
- All queries are dynamic String argument to
method - Statement.execute(String query)
- Connection.prepare(String query)
- Any Java operation possible to assemble query
- Static Analysis for values is NP-hard
- Solution pattern matching
- Interpret basic String and String Builder
operations - Interpret assignments
- Regard integers as random values
34Java analysis overview
- Three analyzers using Visitor-pattern
- Process overview
- Store list of classes
- Analyze dependencies
- Analyze String values for relevant methods
- If needed, do dependency analysis for referenced
types - Output results to text files
35Java storage of intermediate results
- JavaCC generated parser creates AST
- ClassNames visitor creates 2 HashMaps
- Dependency Analyzer generates 'smart tree'Class
definitions contain - search functions up and down
- add/remove node functions
- decision to analyze another file
- Value Analyzer enriches 'smart tree'
36Java storage of intermediate results
- JavaCC generated parser creates AST
- ClassNames visitor creates 2 HashMaps
- Dependency Analyzer generates 'smart tree'Class
definitions contain - Value Analyzer enriches 'smart tree'
- Add ValueNode where appropriate
- Add DecisionScopeNode at decisions
- Delete DecisionScopeNode and merge results
37Java results
- Case study Compiere
- GPL project
- 0.7 million lines of code
- Method calls found found 1972
- With only one option 1555
- Value only partially detected 395
- With only one option 293
- Discard reasons
- Too many alternatives 3
- No value known (members?) 164
- Method return value used 260
- Variable couldnt be located 32
- Total alternative count 11.055
- Parsed queries 5.427
- 682 have unknown values as tablename
- At least 86 unparsable queries are invalid
- Parse errors mostly due to other SQL dialect
38Contents
- Introduction
- Background used techniques
- Desired results
- COBOL analysis results
- Java analysis results
- Work in progress
39Work in progress
- Case study at Logica
- A Java program with embedded SQL
- Appointment at Monday
- Improve Java analysis
- Write thesis
40W.I.P. improvements Java analysis
- Make output readable for SQuAVisiT
- Analyze class members (initialization)
- Cleanup
41Questions