Title: Laboratory of Software Analysis Lesson 1
1 Laboratory of Software AnalysisLesson 1
- Filippo Ricca
- Unità CINI at DISI
- (Laboratorio Iniziativa Software
FINMECCANICA/ELSAG spa - CINI) - Genova, Italy
- filippo.ricca_at_disi.unige.it
Mariano Ceccato, Alesssandro Marchetto ITC-Irst T
rento, Italy ceccato_at_itc.it, marchetto_at_itc.it
2Overview
- Objectives
- Course dependences
- Content / Course material / tools used
- Exam (discussion)
- Legacy systems
- Reverse engineering, re-structuring,
re-engineering - Program transformations (TXL)
- Past projects
- This year three small projects
3Objectives, dependences, material, exam.
4Objectives
- This course has two objectives
- providing the practical skills involved in
software analysis and testing. Some
techniques/approaches described during the
theoretical lessons of the basic course
(Software Analysis and Testing) will be applied
to real cases of software systems to be
re-engineered and tested. - introducing Empirical studies in Software
engineering
5Dependences
Not mandatory but
---gt Programming I and II, Software Engineering,
Software Analysis and Testing. ---gt It is
important to kwon (a little) - OO programming,
in particular Java (base level). - UML (class
diagram, ). - WEB technologies HTML, JSP,
(Just a little) - Theoretical aspects of
testing. -
6Content
- Code analysis and transformations
- Theoretical aspects (already seen in Software
analysis and testing). - The TXL programming language.
- Practice application of some techniques to
software systems.
- Software testing
- Theoretical aspects (already seen in Software
analysis and testing). - Acceptance testing, GUI testing, Test-first,
Design for testing, - Tools FIT, FITNESSE, JUnit, ABBOT, Robot
- Empirical studies in Software engineering
- Theoretical aspects (what is an ES?, How to
design/conduct an ES?) - Analysis and interpretation (how to draw
conclusions) - Execution of two empirical studies.
7Material / Tools
- Slides
- Papers
- Manuals of tools
http//sra.itc.it/people/ceccato/courses/lsa/
Languages and Tools
- TXL code analysis and transformations
- Graphviz Graph Visualization Software
- VisualUML UML modeling tool (diagrams recovery)
- JUnit, Fit, Fitnesse, Abbot, Robot Testing
tools -
8Examination
- During the course we will work at a lot of small
projects. - The examination will consist of a discussion.
- Admission to the examination requires (at least)
the production - of some documents that we will see during the
year. - Examples of small projects
- Recovering the Architecture (class diagram) of a
system. - Maintenance intervention / re-implementation of
a system - Porting a C program in Java
- Testing
- Empirical study C vs. Java
-
9A little of Terminology
10Legacy systems
Negative aspects
Positive aspects
Characteristics
- They were implemented years ago (? 1970)
- Their technology became obsolete (obsolete
languages, language - styles, hardware, )
- They have been maintained for a long time (? 30
years) - Their structure is deteriorated and does not
facilitate understanding - Their documentation (if it exists) became
obsolete - Original authors are not available
- They contain business rules not recorded
elsewhere - They can not be easily replaced (importart!)
- They represent a large investment
-
Each maintenance intervention is Extremely
difficult!
11Legacy dilemma
- What should we do with legacy code?
- to build the new system from scratch.
- trying to understand the legacy code and to
reconstitute it in a new form.
First step reverse engineering
12Reverse Engineering
- Reverse engineering is the process of taking
something (a device, an electrical component, a
car, a software, ) apart and analyzing its
working in details, usually with the intention to
construct a new device or program that does the
same thing. - Reverse engineering is used often by military, in
order to copy other nations technology.
13Military Reverse Engineered projects
- Examples of military reverse-engineered projects
include - Soviet Union reverse-engineered Tu-4 Bull bomber
from United States Boing B-29. - Soviet Union personal computer AGATHA was
reverse-engineered from the Apple II. - North Korea reverse-engineered the Russian
missile Scud Bs to make their own Scud Mod A.
Boing B-29
14(Software) Reverse engineering
- Reverse engineering is a process that helps
understanding a software system. It is a process
of examination, of extracting information, not a
process of change or replication.
Software ----------gt Abstract representation
Software ----------gt
15Forward and Reverse Engineering
- Forward engineering is the traditional process of
moving from high-level abstractions to the
physical implementation of a system.
Requirements
Design
Implementation
- Reverse engineering is the inverse of Forward
engineering
Design
Implementation
Requirements
Abstract Code Representation
Code
16Reverse Engineering Tools
- Pretty printers and code viewers
- Diagram generators (software views flowcharts,
data flow diagrams, call graph diagrams, ) - Embedded comments extractors (ex. Javadoc)
- Software metrics tools (Locs, methods/functions,
cohesion, coupling) - Design recovery tools (ex. Rational Rose, Omondo,
VisualUML UML diagram extractor) - Others
17Restructuring
- Restructuring is the transformation from one
representation to - another at the same relative abstraction level -
while preserving - the system external behavior (functionality and
semantics). - Examples
- Code level - from an unstructured
(spaghetti) form to a - structured form
(goto-less) - - conversion of set of
if-statements into a - case structure.
- Design level to improve or change data
structures (arrays to Lists, - files system to DBMS )
or to improve algorithms - (for example time
complexity).
18Re-engineering
- Re-engineering is the examination (reverse
engineering) of a system to reconstitute it
(forward engineering) in a new form. - This process may include modifications with
respect to new requirements not met by the
original system (Semantics cannot be preserved).
- The re-engineering process takes many forms,
depending on its - objectives. Sample objectives are
- code migration/porting (ex. C to C)
- reengineering code for reuse
- reengineering code for security
-
19Relationships
20Program analysis
- Program analysis is the (automated) inspection
of a program - to infer some properties. Usually, properties are
inferred without - running the program (static analysis).
- Examples are
- Type analysis (type inference)
- Dead code analysis
- Clone analysis
- Pointer Analysis
-
21Program Transformations
- Program transformation is the act of changing one
program into another.
Two cases L is different from L L is equal to
L
transformation
P
P
source language L
target language L
- Examples
- Pascal to C porting
- Goto elimination (Pascal language)
22 TXL
- TXL is a programming language specifically
designed to support software analysis and program
transformation.
x4 b c y2 x4 2 Loop x a b
y x a y 3 z x4 a y2 End
loop
Loop x a b y x a y 3
x b c y x 2 z x a y End
loop
Code motion optimization
Example moves all loop-independent
assignment statements outside of loops.
23Past Projects and Project of this year
24Project Year 2004 Porting C to Java
- Porting of the Chull program (C code) in Java.
- Chull determines the convex hull of a set of
points in 3D. - Chull is not a trivial C code (4161 LOCs, 31
functions, 3 struct, pointers, ).
Convex hull in 2D
25Project Steps 2004
Semi-automated procedure
- Instrumentation of Chull using TXL. Writing
Testcases such that branch covered is reached. - Reverse engineering of Chull using TXL Call
graph, dependences between functions and data
structure. - Object identification (clustering and concept
analysis). - OO design in UML (only class diagram).
- Java code generation. Chull (partially
TXL) - Testing of Chull with testcases generated at
point (1) to show that Chulli Chulli
26Code instrumentation
- To determine whether or not each branch is
traversed, we can place a counter
(instrumentation) on each branch. Then we have to
run the program with inputs. - To have branch coverage we have to check if
count is equal to (1, 1, , 1).
read x, y
start
count(1) 1
Program instrumented
z 1
count (0, 0, 0)
true
If (x gty)
count(3) 1
exit
false
N.B count is an array where each element
is assigned to 0.
count(2) 1
27Project Year 2005Maintenance intervention
is implemented by code fragments spread across
several classes
- Adding a new crosscutting functionality
(persistence history) to the Jconsole java
program . - Jconsole 27 java files, 1385 LOCs.
- Two ways for adding a crosscutting functionality
to a system - 1) Changing (almost) all the java classes.
- 2) Adding an aspect (AOP) in the language
AspectJ.
28AspectJ example
Suppose to have to add logging for all methods
of a Java program. (Logger.entry(string) and
Logger.exit(string))
/ Java / Public class Main public
void foo() Logger.entry(foo())
. Something
Logger.exit(foo()) public void
foo(int i) Logger.entry(foo(int))
. Something
Logger.exit(foo(int)) public
static void main(String args)
Logger.entry(main()) .
Something Logger.exit(main())
/ AspectJ / Public class Main public
void foo() . Something
public void foo(int i)
. Something public static void
main(String args) . Something
Public aspect autolog pointcut
publicMethods() . Before() publicMethods()
Logger.entry After() publicMethods()
Logger.exit
29Project Year 2006a real SE experiment
- We have conducted a real software engineering
experiment - stereotyped UML class diagrams
- (Conallen proposal)
- vs.
- Pure UML class diagrams
- What are stereotypes?
- What is a software engineering experiment (or
software engineering empirical study)?
Web Applications context
30Stereotypes
- The designers of UML recognized that the
language is not always perfect for every
situation/domain. - UML has defined a mechanism to allow certain
domains to extend the semantics of specific model
elements. The extension mechanism allows the
inclusion of new attributes, different semantics
and additional constraints. - Stereotypes form an extension to UML.
- Stereotypes are adornments or icons having a
well-defined semantics.
Used instead of classes in the class diagram
31Empirical studies in SE
- Software engineering is the result of opinions
and anecdotal evidences and not the result of
empirical evidence... - For example no one has demonstrated that OO
techniques are better that structured techniques,
but everyone uses OO ... - Empirical studies (experiments) are useful to try
to answer some research questions. - technique A is better than B?
32How to conduct an empirical study?
- Suppose that we have to demonstrate this
hypothesis - technique A is better than B
- Procedure
- Participants (students, professionals, etc) are
divided into two groups (Group 1 and Group 2). - Group 1 will execute the task with technique A
while Group 2 with technique B. - Data of the experiment are collected and metrics
are measured. - The hypothesis of the experiment is evaluated
statistically using data collected and metrics.
33Empirical study 1 Conallen vs. Pure UML
Conallen notation
Pure UML
Which is more useful during understanding and
maintenance?
34This year
Three projects
- Porting Borland Delphi Object Pascal program to
Java using TXL - Empirical study 1 Testcases (Fit tables) can
be used to clarify requirements? - Empirical study 2 Conallen vs. WebML. When doing
a comprehension task is more useful Conallen or
WebML?
35Borland Delphi Object Pascal to Java
- Type
- Person object
- surname string30
- name string20
- age Integer
- Procedure init
- End
- Student Object(Person)
- grade Integer
- teacher String30
- End
- Procedure Person.Init
- Begin
- surname
- name
- age0
- End
class Person String surname String
name int age Person()
surname name age
0 class Student extends Person
int grade String teacher
TXL program
36Fit tables
- A Fit table is a way of expressing the business
logic using a simple (input-output) HTML table. - Fit tables are added to the requirements and
are used as acceptance test cases. - Customers and Analysts create Fit tables using a
tool like Word, Excel, or even a text editor.
input
output
37Sports Magazine Website
We did an example to understand a little bit
better Fit tables
- A sports magazine decides to add a new feature to
its Website that will allow users to view top
football teams based on their ratings. - Rating ((10000(won3drawn))
- (3played))/100)
- The analyst can express the change requirement in
the traditional way - - natural language, use cases, .
- or
- - using natural language Fit tables
-
new feature added
38Natural languagevs. Fit tables natural
language
Fit tables can be used to clarify requirements?
Only natural language
Fit table natural language
- A user can search for top N football teams based
on rating. - The rating is defined
- A user can search for top N football teams based
on rating.
39Empirical study 2
Questionnaire
Group 1
Conallen
Questionnaire
Web appls
Group 2
WEBML
- When doing a comprehension task which is the
notation more useful?
40The end
Next lessons
41Obfuscated C contest Winner
return
IOCCC is a competition to see who can write the
most unreadable, but legal C program.
return
42Code viewers
return
1) Textual representation (colors)
2) Graphical representation (colors)
43A picture is worth a thousand words
Imagix tool
Main Window (? call graph)
C code
Calls
Functions
Variables
return
44return
CVF 3.0
CVF 3.0 is a automated program Flow chart
generator. It can perform automated reverse
engineering of program code into programming
flowcharts. It works with C, C, VC, VB, VBA,
VBScript, ASP, Visual C, Visual Basic .NET,
Visual J .NET, VC.NET, ASP.NET, Java, JSP,
JavaScript, Delphi, PowerBuilder and Perl.
45return
UML Class Diagram Recovery
46Type inference
Inferred Types
Program P
a4 cab Push(x, T) Push(y, T) dPop(T)
a integer creal, breal T queue dreal
Language without declarations
return
47Dead code
20 FOR I1 TO 10 30 VI VI 1 40 PRINT
VI 50 ENDFOR 60 PRINT X 70 GOTO 100 80
CALL F1 90 CALL F2 100 END
Suppose No jumps to the lines 80 and 90!
Never executed
return
48Clones
return
20 FOR I1 TO 10 30 VI VI 1 40 PRINT
VI 50 ENDFOR 60 PRINT X 70 CALL F 100
FOR J1 TO 10 110 WJ WJ 1 120 PRINT
WJ 130 ENDFOR
Clones
Lines 20-50 and 100-130