Title: Semi-Automated Software Restructuring
- By Santosh K Singh Kesar
- Advisor: Dr. James Fawcett
Master's Thesis, Dept. of Electrical Engineering and Computer Science,
Syracuse University, October 8, 2008
2 Long-Term Research Goals
- Attempt to answer the questions:
  - Is it possible to reliably improve the structure of large, complex software?
  - If so, can that be automated?
- If so, find appropriate means to implement a process for such improvement.
3 Specific Goals of this Research
- Find ways to reduce the size of large functions and methods by turning them into a composition of smaller functions and methods with the same behavior.
- Automate that process.
- Evaluate the results.
4 Software Restructuring
- Extracts new functions and methods from existing source code functions and methods.
- Semi-automated source code restructuring.
- Preserves the external behavior of the restructured source code.
- Writes the restructured source code to new files, in a different location from the original source code.
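A minimal hand-written sketch of what "same external behavior" means for an extraction (this is not output from the restructuring tool): the hypothetical function `summarize` has its loop pulled out into a new function `summarize_1`. Because the loop's result `total` is still needed afterward, this sketch returns it from the new function; all names here are invented for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Original (hypothetical) function: one body that both sums and formats.
std::string summarize(const std::vector<int>& values) {
  int total = 0;
  for (int v : values) {
    if (v > 0) total += v;  // only positive values count
  }
  return "positive total: " + std::to_string(total);
}

// Extracted function: the loop region becomes its own function.
// Its only input is `values`, so it needs a single parameter.
int summarize_1(const std::vector<int>& values) {
  int total = 0;
  for (int v : values) {
    if (v > 0) total += v;
  }
  return total;
}

// Restructured caller: shorter body, same observable result.
std::string summarize_restructured(const std::vector<int>& values) {
  int total = summarize_1(values);
  return "positive total: " + std::to_string(total);
}
```

For any input, `summarize(v)` and `summarize_restructured(v)` return the same string; that equality is the property a restructuring tool must preserve.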
5 Restructuring vs. Refactoring
- Both code restructuring and refactoring are concerned with improving logical structure.
- Refactoring is a largely manual process with broader scope.
- Restructuring is automatic, but user-guided.
- Refactoring has traditionally been applied to managed source code:
  - Java in Eclipse
  - C# in Visual Studio
- Our restructuring works with the native languages C and C++.
6 Is Badly Structured Code Likely?
- Is there a need for the results of this research?
- Do experienced researchers and professional
developers often create badly structured code?
7 Imaging Research Code
File                     Function Name             Number of Lines
Weights_calculation.cpp  WeightTwoQuadrantsFactor  280
Weights_calculation.cpp  FactorsTwoRays            206
Weights_calculation.cpp  AreaWeightFactor          223
Weights_calculation.cpp  W_Calculate               377
Weights_calculation.cpp  Main                      191
Weights_calculation.cpp  WeightBottomFactor        164
Mlr800fs.c               Emsid2_new                724
Mlr800fs.c               Main                      851
Mlr800fs.c               Ect                       3608
Mlr800fs.c               emsid3_new                749
Mlr800fs.c               emsid4                    516
8 GKGFX Library, Mozilla 1.4.1
Figure: file dependency graph of the GKGFX library. The smallest disk is a single file, lines are dependencies, and a number gives the size of a strong component, in this case 60 mutually dependent files.
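A strong component is a set of files that all depend, directly or indirectly, on one another. The slides do not say how the component above was computed; one standard choice is Tarjan's algorithm, sketched here over a hypothetical adjacency list in which `deps[i]` lists the files that file `i` depends on.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Tarjan's algorithm: returns the strongly connected components of a directed
// graph given as an adjacency list (deps[i] = files that file i depends on).
std::vector<std::vector<int>> strongComponents(const std::vector<std::vector<int>>& deps) {
  const int n = static_cast<int>(deps.size());
  std::vector<int> index(n, -1), low(n, 0);
  std::vector<bool> onStack(n, false);
  std::vector<int> stack;
  std::vector<std::vector<int>> components;
  int counter = 0;

  std::function<void(int)> visit = [&](int v) {
    index[v] = low[v] = counter++;
    stack.push_back(v);
    onStack[v] = true;
    for (int w : deps[v]) {
      if (index[w] == -1) {          // unvisited: recurse
        visit(w);
        low[v] = std::min(low[v], low[w]);
      } else if (onStack[w]) {       // back edge into the current component
        low[v] = std::min(low[v], index[w]);
      }
    }
    if (low[v] == index[v]) {        // v is the root of a strong component
      std::vector<int> component;
      int w;
      do {
        w = stack.back();
        stack.pop_back();
        onStack[w] = false;
        component.push_back(w);
      } while (w != v);
      components.push_back(component);
    }
  };

  for (int v = 0; v < n; ++v)
    if (index[v] == -1) visit(v);
  return components;
}
```

For example, with files 0 → 1 → 2 → 0 and 2 → 3, the result contains the component {0, 1, 2} of size 3 and the singleton {3}. This recursive form is fine for small graphs; for graphs the size of a large code base, an iterative variant avoids deep call stacks.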
9 Restructuring Process
- Analysis
  - Find feasible regions for function extraction
- Selection
  - Select, from the feasible regions, code segments that require few parameters to be passed as function arguments
- Code generation
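The selection step can be pictured as filtering and ranking candidate regions. In this sketch the `FeasibleRegion` struct, the `maxParams` limit, and the tie-breaking rule (prefer more lines, then fewer parameters) are assumptions for illustration, not the exact criteria used in the thesis. It needs C++17 for `std::optional`.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical description of a feasible region produced by analysis.
struct FeasibleRegion {
  int startLine;                    // first line of the candidate segment
  int endLine;                      // last line (inclusive)
  std::vector<std::string> params;  // variables it would need as arguments
};

// Pick a region with at most maxParams parameters; among those, prefer the one
// covering the most lines, then the one with fewer parameters.
std::optional<std::size_t> selectRegion(const std::vector<FeasibleRegion>& regions,
                                        std::size_t maxParams) {
  std::optional<std::size_t> best;
  for (std::size_t i = 0; i < regions.size(); ++i) {
    const FeasibleRegion& r = regions[i];
    if (r.params.size() > maxParams) continue;  // too many arguments
    if (!best) { best = i; continue; }
    const FeasibleRegion& b = regions[*best];
    int len = r.endLine - r.startLine + 1;
    int bestLen = b.endLine - b.startLine + 1;
    if (len > bestLen || (len == bestLen && r.params.size() < b.params.size()))
      best = i;
  }
  return best;  // empty if no region qualifies
}
```

With a limit of two parameters, a 7-line region needing one argument beats an 11-line region needing four (rejected) and a 6-line region needing none (shorter).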
10 Analysis
- Lexical analysis
  - Tokenize input stream
  - Group into analysis sequences
- Parsing
  - Recognize key grammatical elements
  - Store for later use
- Deeper analysis of functions
11 Lexical Analysis
- Tokenize
  - Remove comments
  - Eliminate whitespace
  - Recognize key punctuators
- Form semi-expressions
  - Sequences of tokens appropriate for parsing
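To make the tokenizing steps concrete, here is a minimal sketch; it is not the thesis's Tokenizer module. It drops `//` and `/* */` comments and whitespace, and emits identifiers, numbers, string and character literals, and punctuators, recognizing a small assumed set of two-character punctuators such as `::` and `->`.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Minimal C/C++ tokenizer sketch: drops comments and whitespace, keeps
// identifiers, numbers, string/char literals, and punctuators as tokens.
std::vector<std::string> tokenize(const std::string& src) {
  static const std::vector<std::string> twoChar = {
      "::", "->", "<<", ">>", "==", "!=", "<=", ">=", "&&", "||", "++", "--"};
  std::vector<std::string> tokens;
  std::size_t i = 0;
  while (i < src.size()) {
    char c = src[i];
    if (std::isspace(static_cast<unsigned char>(c))) { ++i; continue; }
    if (src.compare(i, 2, "//") == 0) {              // line comment
      while (i < src.size() && src[i] != '\n') ++i;
      continue;
    }
    if (src.compare(i, 2, "/*") == 0) {              // block comment
      std::size_t end = src.find("*/", i + 2);
      i = (end == std::string::npos) ? src.size() : end + 2;
      continue;
    }
    std::size_t start = i;
    if (std::isalnum(static_cast<unsigned char>(c)) || c == '_') {  // identifier or number
      while (i < src.size() &&
             (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_'))
        ++i;
    } else if (c == '"' || c == '\'') {              // string or char literal
      ++i;
      while (i < src.size() && src[i] != c) i += (src[i] == '\\') ? 2 : 1;
      i = std::min(i + 1, src.size());               // include closing quote
    } else {                                         // punctuator
      bool matched = false;
      for (const auto& p : twoChar)
        if (src.compare(i, 2, p) == 0) { i += 2; matched = true; break; }
      if (!matched) ++i;
    }
    tokens.push_back(src.substr(start, i - start));
  }
  return tokens;
}
```

For instance, `int x = a->b; // note` yields `int`, `x`, `=`, `a`, `->`, `b`, `;`. The sketch ignores preprocessor lines and splits a literal such as `3.14` into three tokens; a production tokenizer would handle both.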
12 Our Lexical Analysis Tools
Sample output from Tokenizer module
13 Our Lexical Analysis Tools
Sample output from Semi-Expressions module
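Semi-expressions group tokens into units that the parser can inspect one at a time. This sketch assumes a semi-expression ends at `;`, `{`, or `}`, and that a `;` inside parentheses (as in a `for` header) does not end one; the actual Semi-Expressions module may use different or additional rules.

```cpp
#include <cassert>
#include <string>
#include <vector>

using SemiExp = std::vector<std::string>;

// Group a token stream into semi-expressions: each ends with ";", "{" or "}".
// A ";" inside parentheses (as in a for-loop header) does not end a group.
std::vector<SemiExp> formSemiExpressions(const std::vector<std::string>& tokens) {
  std::vector<SemiExp> result;
  SemiExp current;
  int parenDepth = 0;
  for (const std::string& tok : tokens) {
    current.push_back(tok);
    if (tok == "(") ++parenDepth;
    else if (tok == ")" && parenDepth > 0) --parenDepth;
    bool terminates = tok == "{" || tok == "}" || (tok == ";" && parenDepth == 0);
    if (terminates) {
      result.push_back(current);
      current.clear();
    }
  }
  if (!current.empty()) result.push_back(current);  // trailing tokens, if any
  return result;
}
```

Given the tokens of `void f() { for(i=0;i<n;++i) g(); }`, this produces three groups: `void f ( ) {`, the whole `for ... g ( ) ;` statement, and `}`.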
14 Parsing
- Recognize key grammatical elements
  - A very small subset of the language grammar
  - Function definitions
  - Method definitions
  - Data declarations
  - Data manipulations
- Build parse tree
  - Use tree elements to support code generation
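Because only a small subset of the grammar is needed, recognizing a function or method definition can be a simple test on each semi-expression. The rule below is a heuristic assumed for illustration: the semi-expression ends in `{`, contains `(`, and the identifier before the first `(` is not a control keyword; a preceding `Class ::` qualifier marks a method.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Returns the (possibly class-qualified) name if the semi-expression looks like
// a function or method definition, otherwise an empty string.
std::string functionDefinitionName(const std::vector<std::string>& semi) {
  static const std::set<std::string> controlKeywords = {
      "if", "for", "while", "switch", "catch"};
  if (semi.empty() || semi.back() != "{") return "";   // definitions open a scope
  std::size_t paren = 0;
  while (paren < semi.size() && semi[paren] != "(") ++paren;
  if (paren == 0 || paren == semi.size()) return "";   // no parameter list
  const std::string& before = semi[paren - 1];
  bool isIdentifier = !before.empty() &&
      (std::isalpha(static_cast<unsigned char>(before[0])) || before[0] == '_');
  if (!isIdentifier || controlKeywords.count(before)) return "";
  // Walk back over "Class ::" qualifiers, e.g. "RootObj :: displayRootStats".
  std::string name = before;
  std::size_t k = paren - 1;
  while (k >= 2 && semi[k - 1] == "::") {
    name = semi[k - 2] + "::" + name;
    k -= 2;
  }
  return name;
}
```

So `void RootObj :: displayRootStats ( ) {` is reported as the method `RootObj::displayRootStats`, while `if ( x ) {` and `int x = f ( ) ;` are rejected.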
15 Parse Tree
Top-level structure of the parse tree
16 Types of Nodes
Different types of nodes in the parse tree
17 Building the Parse Tree
Building of the first three levels
18 Containment Diagram of Parse Tree
Top-level containment diagram of the parse tree
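One way to picture the parse tree and its containment relation: each node records the source lines its scope spans, and its children are the scopes nested inside it. The `ParseNode` layout and the `innermostScope` query are hypothetical, not the thesis's class design. Storing `std::vector<ParseNode>` inside `ParseNode` relies on C++17's support for vectors of incomplete types.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical parse-tree node: a named scope (root, function, block, ...)
// together with the source lines it spans.
struct ParseNode {
  std::string kind;                 // e.g. "root", "function", "block"
  std::string name;
  int startLine = 0;
  int endLine = 0;
  std::vector<ParseNode> children;  // scopes nested inside this one

  bool contains(int line) const { return startLine <= line && line <= endLine; }
};

// Containment query: the innermost node whose line range holds `line`,
// or nullptr if the line lies outside `node`.
const ParseNode* innermostScope(const ParseNode& node, int line) {
  if (!node.contains(line)) return nullptr;
  for (const ParseNode& child : node.children)
    if (const ParseNode* hit = innermostScope(child, line)) return hit;
  return &node;
}
```

A query like this tells the restructurer which scope a candidate line belongs to, which is the information a containment diagram conveys.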
19 Criterion 1: Line Numbers
Line-number criterion for feasible regions
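The exact line-number criterion is shown in the figure and is not reproduced here. A plausible version, assumed for this sketch, is that a candidate region must stay inside the function body and must not cut a nested scope in half: every nested scope lies entirely inside the region, entirely outside it, or strictly encloses it. `LineRange` and `isFeasibleRegion` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Hypothetical line range of a function body or nested scope.
struct LineRange {
  int first;
  int last;  // inclusive
};

// A candidate region is treated as feasible only if it lies within the body
// and no nested scope partially overlaps it.
bool isFeasibleRegion(const LineRange& body, const std::vector<LineRange>& scopes,
                      const LineRange& region) {
  if (region.first > region.last) return false;
  if (region.first < body.first || region.last > body.last) return false;
  for (const LineRange& s : scopes) {
    bool inside = region.first <= s.first && s.last <= region.last;
    bool outside = s.last < region.first || region.last < s.first;
    bool enclosing = s.first < region.first && region.last < s.last;  // region within s's body
    if (!inside && !outside && !enclosing) return false;  // region would split s
  }
  return true;
}
```

With nested scopes on lines 5-10 and 12-20, the region 3-11 is feasible, 13-19 is feasible (it lies inside the second scope), and 8-14 is not, because it would split the first scope.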
20 Top-Down Approach
Top-down approach for determining parameters
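A sketch of one reading of the top-down approach, assumed for illustration: scan the function from the top, collect the local declarations that appear above the region, and pass as parameters only those the region actually uses. `Occurrence`, `regionParameters`, and the line numbers in the example are hypothetical; this version ignores shadowing and nested scopes.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical record of one identifier occurrence on a source line.
struct Occurrence {
  std::string name;
  int line;
};

// Parameters a region [first, last] needs: locals (including the function's
// own parameters) declared above the region and used inside it. Names that are
// only declared inside the region, or never declared locally (globals,
// functions), are not passed.
std::set<std::string> regionParameters(const std::vector<Occurrence>& declarations,
                                       const std::vector<Occurrence>& uses,
                                       int first, int last) {
  std::set<std::string> declaredAbove;
  for (const Occurrence& d : declarations)
    if (d.line < first) declaredAbove.insert(d.name);  // top-down: everything above

  std::set<std::string> params;
  for (const Occurrence& u : uses)
    if (u.line >= first && u.line <= last && declaredAbove.count(u.name))
      params.insert(u.name);
  return params;
}
```

With declarations loosely modeled on `setRootValues` (`inFile`, `dir`, `scanr` near the top), a region that starts at the `inFile` declaration needs no parameters, while a region starting just after those declarations would need all three.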
21 Class Diagram of Parsers
Class diagram of parsers using the Utility class
22 Class Diagram of ICRNode
Class diagram of the ICRNode interface
23 Class Diagram of RootObj
Class diagram of RootObj
24 Association of DataObjects
Class relationship diagram of parse tree objects
25 Representing Node Types
Class diagram of different node types
26 Class Diagram of DataObject
Class diagram of DataObject
27 Hierarchy Stack
Hypothetical view of the hierarchy stack, showing the stack top pointer
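The hierarchy stack can be sketched as follows, assuming the parser reports each opening brace (with a scope name) and each closing brace as an event. The stack top always points at the innermost open scope: a new scope becomes its child and is pushed, and a closing brace records the end line and pops. `ScopeNode`, `ScopeEvent`, and `buildTree` are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tree node for a scope.
struct ScopeNode {
  std::string name;
  int startLine = 0;
  int endLine = 0;
  std::vector<std::unique_ptr<ScopeNode>> children;
};

// One scope event from the parser: an opening brace (with the scope's name)
// or a closing brace, at a given line.
struct ScopeEvent {
  bool opens;
  std::string name;
  int line;
};

// Builds the scope tree with a hierarchy stack whose top is the innermost
// open scope.
std::unique_ptr<ScopeNode> buildTree(const std::vector<ScopeEvent>& events) {
  auto root = std::make_unique<ScopeNode>();
  root->name = "root";
  std::vector<ScopeNode*> stack{root.get()};
  for (const ScopeEvent& e : events) {
    if (e.opens) {
      auto child = std::make_unique<ScopeNode>();
      child->name = e.name;
      child->startLine = e.line;
      ScopeNode* raw = child.get();
      stack.back()->children.push_back(std::move(child));  // attach to stack top
      stack.push_back(raw);                                 // new stack top
    } else if (stack.size() > 1) {                          // never pop the root
      stack.back()->endLine = e.line;
      stack.pop_back();
    }
  }
  return root;
}
```

Children are held through `std::unique_ptr` so that the raw pointers kept on the stack stay valid while sibling vectors grow.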
28 Class Diagram of TempContainer
Figure 3.16: Class diagram of TempContainer
29 Class Diagram of feasibleRegions and newFunctions
Class relationship diagram of feasibleRegions and newFunctions
30 Class Diagram of FunctionParser and fileManager
Class relationship diagram of FunctionParser and fileManager
31 Restructuring in Multiple Passes
Original length of testFun function: 120 lines
32 Restructuring Functions
void setRootValues()
{
  try
  {
    std::string inFile = getInputFile();
    Directory dir;
    Scanner scanr;
    scanr.doRecursiveScan(inFile);
    dir.RestoreFirstDirectory();
    if(dir.dirContainIncludes())
      scanr.setFileIncludes();
    std::vector<std::string> _files = getCompleteFiles();
    if(_files.size() > 0)
    {
      RootObj* root = new RootObj();
      std::string _type = root->_typename();
      if(_type == "")
        _type = "pRoot";
      root->displayRootStats();
    }
  }
  catch(std::exception ex)
  {
    std::cout << ex.what() << std::endl;
  }
}
Original source code
33 Extracted Function
Restructured code
void setRootValues_1()
{
  std::string inFile = getInputFile();
  Directory dir;
  Scanner scanr;
  scanr.doRecursiveScan(inFile);
  dir.RestoreFirstDirectory();
  if(dir.dirContainIncludes())
    scanr.setFileIncludes();
}

void setRootValues()
{
  try
  {
    setRootValues_1();
    std::vector<std::string> _files = getCompleteFiles();
    if(_files.size() > 0)
    {
      RootObj* root = new RootObj();
      std::string _type = root->_typename();
      if(_type == "")
        _type = "pRoot";
      root->displayRootStats();
    }
  }
  catch(std::exception ex)
  {
    std::cout << ex.what() << std::endl;
  }
}
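`setRootValues_1` takes no parameters and returns nothing because `inFile`, `dir`, and `scanr` are declared inside the extracted region and never used after it. A hypothetical check for that condition is sketched below; `NameOnLine`, `escapingLocals`, and the line numbers in the example are illustrative only, and the check ignores shadowing.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record of a name appearing on a source line.
struct NameOnLine {
  std::string name;
  int line;
};

// Names declared inside [first, last] that are still used after `last`.
// If the result is empty, the region's locals can move into the extracted
// function without being returned to the caller.
std::vector<std::string> escapingLocals(const std::vector<NameOnLine>& declarations,
                                        const std::vector<NameOnLine>& uses,
                                        int first, int last) {
  std::vector<std::string> escaping;
  for (const NameOnLine& d : declarations) {
    if (d.line < first || d.line > last) continue;  // not declared in the region
    for (const NameOnLine& u : uses) {
      if (u.name == d.name && u.line > last) {      // used after the region ends
        escaping.push_back(d.name);
        break;
      }
    }
  }
  return escaping;
}
```

Stopping the region just before `_files` is declared leaves nothing escaping, matching the zero-argument `void` extraction; extending it one line further would make `_files` escape.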
34 Contributions
- Semi-automated software restructuring
- Type analysis parser for the host language
- Representation of source code structure as a parse tree
- Identification of feasible regions
- Demonstration with working code
- Future Work
  - Further optimization is possible.
  - Semantic cues may help produce more sensible functions.
  - Other directions to explore, such as extracting objects.
35 Changes to Thesis Document
- Removed references to SMIRG
- Reformatted to match university regulations
36 Demonstration
- Simple code that shows:
  - Parsing of functions and methods
  - Correct management of header and implementation files
37 End of Presentation