Improving Dependency Structure of Large Software Projects

1
Improving Dependency Structure of Large Software
Projects
  • Brown Bag Seminar
  • Murat Gungor
  • Friday, October 15, 2004

2
Goals
  • Monitor progress in large software projects.
  • Provide tools for continuous extraction of
    structural quality from source code.
  • Provide means to improve the dependency
    structure of software systems.

3
Introduction
  • Software is an expensive product - it involves
    intensive labor.
  • Software projects typically consist of many
    parts.
  • Interdependency between parts of a project is
    desirable, since one component needs it to use
    another. However, excessive dependency reduces:
    • Testability
    • Maintainability
    • Reusability
    • Understandability
  • Observing the current state of a project is
    critically important, since early detection of
    quality defects avoids the delays, difficulties
    and costs associated with development and
    evolution later in the project lifecycle.

4
Problem Definition
  • Dependencies between software files are essential
    so that one component may provide services to
    another.
  • However, dependencies complicate the process of
    making changes, perhaps to fix latent errors or
    performance problems, because of the effects a
    change may have on other files.
  • When files each bind to many other files and
    mutual dependencies exist between them,
    maintenance and testing may become quite
    difficult to carry out effectively.
  • It is not uncommon for a change in one file to
    precipitate a cascade of changes in other files,
    especially in the presence of mutual
    dependencies.

5
Motivation
  • Provide managers of large software projects
    with immediate views of the current state of
    their projects' products.
  • We study existing projects to try to understand
    ways to do that.
  • Our current work has shown that static dependency
    structure is an important element of that
    analysis.

6
Problem: Large Fan-out
[Figure: structure chart with large fan-out, after
topological sort. Labels: Top. Sorted Files, Level,
Dependency]
  • Depending on scores of other files (large
    fan-out) may indicate a lack of cohesion: the
    file is taking responsibility for too many,
    perhaps only loosely related, tasks and needs the
    services of many other files to manage that.

7
Problem: Large Strong Components
A strong component is a set of files with mutual
dependencies.
[Figure: dependency chart after topological sorting,
with strong components expanded. Labels: Top. Sorted
Files, Level, Dependency]
Files 2, 3, 4, and 5 cannot be ordered. The order
given is the best we can achieve.
  • Ideal testing process: test those files with no
    dependencies, then test all files depending only
    on files already tested.
  • For testing, a strong component must be treated
    as a unit. The larger a strong component becomes,
    the more difficult it is to adequately test.
  • Change management becomes tougher, due to
    consequential changes made to fix latent errors
    or performance problems.
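
The ordering described on this slide can be sketched in a few lines of Python. This is an illustrative sketch only, not DepAnal's implementation: it uses Tarjan's algorithm (our choice, the slides do not name one), and the file names f1 to f6 and their edges are hypothetical. The graph maps each file to the files it depends on, so components come out with dependencies first, which matches the ideal testing order.

```python
def strong_components(graph):
    """Tarjan's algorithm on a dict: file -> files it depends on.

    Returns the strong components as sorted lists, ordered so that
    every component appears after the components it depends on.
    Recursive, so very deep graphs would need an iterative variant.
    """
    index, low = {}, {}
    stack, on_stack = [], set()
    components = []
    counter = [0]

    def visit(node):
        index[node] = low[node] = counter[0]
        counter[0] += 1
        stack.append(node)
        on_stack.add(node)
        for dep in graph.get(node, ()):
            if dep not in index:
                visit(dep)
                low[node] = min(low[node], low[dep])
            elif dep in on_stack:
                low[node] = min(low[node], index[dep])
        if low[node] == index[node]:
            # node is the root of a strong component: pop its members
            comp = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                comp.append(member)
                if member == node:
                    break
            components.append(sorted(comp))

    for node in sorted(graph):
        if node not in index:
            visit(node)
    return components


# Hypothetical example: f2..f5 form a cycle, so they cannot be ordered.
graph = {
    "f1": ["f2"],
    "f2": ["f3"],
    "f3": ["f4"],
    "f4": ["f5"],
    "f5": ["f2", "f6"],
    "f6": [],
}
components = strong_components(graph)
# Expanding the components gives one linear test order.
test_order = [f for comp in components for f in comp]
print(components)  # [['f6'], ['f2', 'f3', 'f4', 'f5'], ['f1']]
print(test_order)  # ['f6', 'f2', 'f3', 'f4', 'f5', 'f1']
```

In this sketch f6 can be tested alone, f1 only after everything else, and f2 to f5 must be treated as one unit, which is why large strong components are costly to test.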
8
Problem: Large Fan-in
[Figure: structure chart with large fan-in, after
topological sort. Labels: Top. Sorted Files, Level,
Dependency]
  • High fan-in coupled with low quality creates a
    high probability of consequential change. By
    consequential change we mean a change induced in
    a depending file by a change in the file it
    depends upon.

9
Good Dependency Structure
[Figure: dependency chart after topological sorting,
with strong components expanded. Labels: Top. Sorted
Files, Level, Dependency]
  • Each component (file) depends only on its close
    neighbors. All files have low fan-in and
    fan-out. There are no calls back to upper-level
    components and no deep calls forward.

10
This is Mozilla, Version 1.4.1, Windows Build
Plot shows some very large mutual dependencies
  • This view is generated by our tools:
    • DepAnal
    • DAView
  • It shows all files that depend on one specific
    file in the largest strong component (fan-in).

Green lines show the fan-out of one file in a large
strong component. Note the dependencies both inside
and outside the component.
The size of each bubble is proportional to the
number of files in the strong component.
11
Is Complex Dependency Really a Problem?
  • Mozilla was targeted for Apple's Mac OS X 10.3
    (Panther), but Apple switched to KHTML.
  • Apple snub stings Mozilla
  • Bourdon said Safari engineers looked at size,
    speed and compatibility in choosing KHTML. In
    addition to Mozilla, Apple also considered
    building its own browser from scratch.
  • "Translated through a de-weaselizer, (Melton's
    e-mail) says 'Even though some of us used to
    work on Mozilla, we have to admit that the
    Mozilla code is a gigantic, bloated mess, not to
    mention slow, and with an internal API so
    flamboyantly baroque that frankly we can't even
    comprehend where to begin,'" Zawinski wrote.

12
Visibility
  • The dependencies shown on the previous slide are,
    without our tools, invisible.
  • Developers know only a small part of the
    dependency structure based on their own reading
    of the code. The rest they find by observing
    breakage when they change something.
  • Note that Mozilla 1.4.1 is composed of 6701
    files! It is impossible to understand that
    dependency structure without effective tools.

13
Project Monitoring
  • Monitoring software quality in a development
    project is an important task required of project
    management, especially for large-scale projects.
  • Constant feedback is an essential part of project
    management.
  • Watching progress manually is not effective in
    terms of time or correctness of results.
  • Up-to-date project documentation is not always
    available, but source code is.
  • Obtaining information from source code provides
    instant feedback. How do you do that for 6701
    files?

14
Static Source Code Analysis
  • Provides an instant snapshot of the project's
    state.
  • Helps to diagnose the state of health of a
    software project effectively.
  • Provides (almost) accurate results.
  • Provides constant progress monitoring.
  • Helps to determine the effect of potential
    decisions.
  • Helps to improve control, because we change
    based on measurements, not guesses.

15
Focus is dependencies among files
  • Many engineering organizations use source code
    files as the unit for analysis, management,
    testing.
  • Because we seek to provide support, we:
    • Investigate dependency structure between files.
    • Identify causes of dependency.
    • Research possible ways to improve the
      dependency structure of existing software.
    • Automate static source code analysis to extract
      dependency structure and other software
      metrics.
  • This isn't as easy as it sounds for large file
    sets.

16
Importance of this Study
  • Software's quality depends on the quality of its
    parts.
  • Future enhancements depend on the existing
    system.
  • Maintainability depends on the quality of the
    current foundation.
  • Reuse is directly affected by dependencies.
  • To reuse in a different context implies that we
    can extract the reused part from its context.
  • That can't be done when dependencies are out of
    control.

17
Scope of the Study
  • We are not analyzing syntactic correctness of
    code.
  • We are not analyzing logical correctness of code.
  • Its applicability includes C-based procedural and
    object-oriented languages: C, C++, C#, Java.
  • Our tools only support C and C++.
  • Much of remaining work deals with repackaging
    content of existing code files to enhance
    dependency quality.
  • Research on repackaging techniques with
    heuristics, optimization.
  • Intent is to modify structure, not introduce new
    code.
  • Creating applications to automate obtaining
    information from source files.

18
Progress till now
  • Developed DepAnal, a C/C++ static source code
    dependency analyzer.
  • Developed DAView, which visualizes dependencies
    among files and components in graphical
    representations.
  • Preparing a paper for submission to:
    • ISCA 20th International Conference on Computers
      and Their Applications (CATA-2005), March
      16-18, 2005, New Orleans, Louisiana, USA,
      http://isca-hq.org/confr.htm
    • Full paper submission deadline: November 5,
      2004

19
Dependency Model
  • Focus is dependencies between files.
  • Files are the unit of testing and configuration
    management.
  • Based on types, global functions and variables.
  • Dependency model: file A depends on file B if
    • A creates and/or uses an instance of a type
      declared or defined in B
    • A is derived from a type declared or defined
      in B
    • A is using the value of a global variable
      declared and/or defined in B
    • A defines a non-constant global variable
      modified by B
    • A uses a global function declared or defined
      in B
    • A declares a type or global function defined
      in B
    • A defines a type or global function declared
      in B
    • A uses a template parameter declared in B
  • Outputs are presented as direct dependencies.
    Transitive closures are not shown, for ease of
    interpretation: they are too dense.
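
As a rough illustration of how rules like these turn into file-to-file edges, the Python sketch below maps symbol uses to the file that declares or defines each symbol. Everything in it is hypothetical: the file names, symbols, and the DEFINED_IN and USES tables are invented for the example and are not DepAnal's data model. It covers only the "A uses something from B" rules, not the reversed rule for globals modified by another file.

```python
# Hypothetical facts a static analyzer might extract from source code.
DEFINED_IN = {              # symbol -> file that declares or defines it
    "Shape": "shape.h",
    "Circle": "circle.h",
    "draw_all": "render.cpp",
    "g_frame_count": "render.cpp",
}

USES = [                    # (file, kind of use, symbol)
    ("circle.h", "derives_from", "Shape"),
    ("render.cpp", "creates_instance", "Circle"),
    ("main.cpp", "calls_function", "draw_all"),
    ("main.cpp", "reads_global", "g_frame_count"),
    ("shape.h", "creates_instance", "Shape"),  # same file: no edge
]


def direct_dependencies(uses, defined_in):
    """Return file -> set of files it depends on directly."""
    deps = {}
    for file, _kind, symbol in uses:
        target = defined_in.get(symbol)
        # Unknown symbols and uses within one file create no edge.
        if target is not None and target != file:
            deps.setdefault(file, set()).add(target)
    return deps


deps = direct_dependencies(USES, DEFINED_IN)
for file in sorted(deps):
    print(file, "->", sorted(deps[file]))
```

Note that main.cpp ends up depending only on render.cpp, not on circle.h or shape.h: the output holds direct dependencies, with no transitive closure.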

20
Architectural view of DepAnal
  • The goal is to build a tool that can be used to
    constantly monitor evolution of the state of
    large software systems
  • Makes two passes over each file in the project.

Finds dependencies based on static type analysis
  • DepAnal collects data from source code with the
    help of a C/C++ tokenizer and semi-expression
    composer.

21
Mozilla Project Version 1.4.1
  • The Mozilla project is a very large project
    developing browser tools for many different
    platforms.
  • Win32 configuration:
    • Number of executables: 94
    • Number of dynamic link libraries: 111
    • Number of static libraries: 303
    • Number of source files for Win32, v 1.4.1:
      6701
  • Analysis took approximately 24 hours on a Dell
    Dimension 8300 with 1 GB of memory.

Wow!
22
Dependency Analysis Results
  • Show different views of dependency data for a
    project and draw conclusions about what such data
    can disclose concerning a project's
    implementation.
  • The analysis results are presented for several
    data sets, in six views:
    • Fan-in: the number of files that depend on a
      file, for each file in the analysis set, and
      the related fan-in density histogram.
    • Fan-out: the number of files that a file
      depends on, for each file in the analysis set,
      and the related fan-out density histogram.
    • Strong components: groups of files that are all
      mutually dependent, and the related strong
      component density histogram.
    • Topological sort of the strong components.
    • Expansion of all strong components within the
      sorted data.
    • Cyclomatic complexity versus file size.
  • We examine each of these views and interpret
    their data with respect to measures of project
    implementation strengths and weaknesses they
    reveal.
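
The fan-in, fan-out and density views can be computed directly from a table of direct dependencies. The Python sketch below is a minimal illustration under assumed inputs: the file names and the deps table are hypothetical, and "density" is taken here to mean the number of files having each fan value.

```python
from collections import Counter


def all_files(deps):
    """Every file that appears as a source or target of a dependency."""
    files = set(deps)
    for targets in deps.values():
        files.update(targets)
    return files


def fan_out(deps):
    """Number of files each file depends on."""
    return {f: len(deps.get(f, ())) for f in all_files(deps)}


def fan_in(deps):
    """Number of files that depend on each file."""
    counts = {f: 0 for f in all_files(deps)}
    for targets in deps.values():
        for target in targets:
            counts[target] += 1
    return counts


def density(fan):
    """Histogram: fan value -> number of files with that value."""
    return dict(sorted(Counter(fan.values()).items()))


# Hypothetical direct dependencies: file -> set of files it depends on.
deps = {
    "a.cpp": {"util.h", "b.h"},
    "b.cpp": {"util.h", "b.h"},
    "b.h": {"util.h"},
    "util.h": set(),
}
print(fan_in(deps)["util.h"])   # 3
print(density(fan_in(deps)))    # {0: 2, 2: 1, 3: 1}
print(density(fan_out(deps)))   # {0: 1, 1: 1, 2: 2}
```

A widely used header such as util.h shows up at the high end of the fan-in histogram, which is the profile the later slides flag for priority testing.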

23
Fan-in Data: Mozilla GKGFX library
  • Number of source files: 655.
  • Dependencies from within the library.
  • When we analyze the entire build many of these
    fan-in numbers will increase.

High Fan-in coupled with low quality creates a
high probability for consequential change.
24
Fan-in Density: Mozilla GKGFX library
  • Plot shows that a significant number of library
    source code files have high fan-in,
    characteristic of a widely used library.

A library with this profile should be given high
priority for analysis by the test team and
quality analysts.
25
Fan-out Data: Mozilla GKGFX library
  • A file with large fan-out may be symptomatic of a
    weak abstraction.

Fan-Out of 60!
We expect that a well-designed source file should
carry out its assigned tasks with the aid of a
few trusted delegates and perhaps a few
references to commonly used utilities.
26
Fan-out Density: Mozilla GKGFX library
  • Large fan-out may be symptomatic of weak
    abstraction. We've shown elsewhere that high
    fan-out is correlated with a large number of
    changes.

There are a significant number of files with
large fan-out.
27
Expanded Topological Sort: GKGFX Library
  • If the file belongs to a strong component and any
    other file in that component is changed, rigorous
    testing dictates that it be retested. This makes
    a compelling argument in favor of continuous
    regression testing using test harnesses.

Approximately half the files in this library
cannot be put into a classic testing sequence.
This indicates a high probability of repeatedly
testing a given file.
Components below the diagonal are due to cycles
in dependency graph, e.g. mutual dependencies.
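
One conservative reading of the retesting rule above is: when a file changes, retest every file that depends on it directly or indirectly. The Python sketch below follows that reading with a breadth-first walk over reversed edges. The file names and the deps table are hypothetical, and this is not taken from the authors' tools.

```python
from collections import deque


def files_to_retest(deps, changed):
    """Return the changed file plus every file that reaches it.

    deps maps each file to the set of files it depends on.
    """
    # Invert the edges: file -> files that depend on it.
    dependents = {}
    for file, targets in deps.items():
        for target in targets:
            dependents.setdefault(target, set()).add(file)

    seen = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


# Hypothetical example: b and c are mutually dependent.
deps = {
    "a": {"b"},
    "b": {"c"},
    "c": {"b"},
    "d": {"a"},
    "e": set(),
}
print(sorted(files_to_retest(deps, "c")))  # ['a', 'b', 'c', 'd']
```

Because b and c form a strong component, changing either one pulls the other into the retest set, along with everything above them. Only e, which depends on nothing in the component, is left out.
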
28
Dependency Data for the Entire Windows-Based
Mozilla Build
  • The plot below shows the dependency graph of the
    entire Mozilla build for Windows after
    topological sorting and expansion of strong
    components.

This plot is so dense that it is becoming
difficult to draw conclusions, but the plot is
consistent with the previous figure for the GKGFX
library.
29
This is Mozilla, Version 1.4.1, Windows Build
Plot for GKGFX Library shows some very large
mutual dependencies
  • DAView shows that the GKGFX Library does indeed
    have significant structural problems, as
    predicted by the preceding views.
  • Note that these problems, made visible by our
    tools, are normally invisible!

30
Towards Improvement
  • If we can identify (low-level) causes of
    dependencies, we can reorganize file contents to
    improve inter-file dependency structure.
  • Intent is not to introduce new code, but to
    redistribute existing code between files to get
    better structure.

31
Future Research
  • The primary goal of future research is to
    provide means to improve the dependency
    structure of software systems.

End of presentation. Thanks for listening.
32
Backup Slides
  • The following slides provide a little more detail
    in a few areas.

33
Files - Unit for analysis
  • In most development organizations, files are
    the unit of testing and configuration management.
  • Dependencies between software files are essential
    so that one component may provide services to
    another.
  • If a file is using services of other files, it
    cannot be tested alone.
  • The larger the number of dependencies between
    files, the harder it is to test, manage,
    understand, and reuse them. The situation gets
    worse if there are mutual dependencies.
  • Therefore, it is better to reduce dependencies
    between files, especially mutual dependencies.

34
File Dependency Structural Problems
  • Large Fan-out
  • Large Fan-in
  • Large Strong Components
  • Many and inconsistent levels in structure chart

35
Fine-grain level dependency
  • One file depends on another file if it uses the
    other file's services:
    • Types
    • Global functions
    • Global variables
  • To solve the file dependency problems we need to
    find more than file-to-file dependencies. We
    check type-to-type, type-to-global function or
    variable, global function-to-type, and global
    function-to-global function or variable.
  • If we obtain this information, we have fine-grain
    level dependencies. Now we can relocate some
    existing code to reduce dependency density among
    files.
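
To make the relocation idea concrete, the Python sketch below derives file-level edges from symbol-level dependencies plus a map of where each symbol lives, then shows how moving one function removes a mutual dependency. All symbol and file names are hypothetical, and the choice of which symbol to move is made by hand here; the slides leave the actual repackaging heuristics to future work.

```python
def file_edges(symbol_deps, home):
    """Project symbol-to-symbol dependencies onto files.

    symbol_deps: list of (using symbol, used symbol)
    home: symbol -> file that currently contains it
    """
    edges = set()
    for user, used in symbol_deps:
        src, dst = home[user], home[used]
        if src != dst:
            edges.add((src, dst))
    return edges


def mutual_pairs(edges):
    """Pairs of files that depend on each other directly."""
    return {tuple(sorted((a, b))) for a, b in edges if (b, a) in edges}


# Hypothetical fine-grain dependencies.
symbol_deps = [
    ("Parser", "Token"),
    ("Token", "format_error"),
    ("format_error", "Logger"),
]
home = {
    "Parser": "parser.cpp",
    "Token": "token.cpp",
    "format_error": "parser.cpp",
    "Logger": "log.cpp",
}

before = file_edges(symbol_deps, home)
# Relocate one existing function; no new code is introduced.
moved = dict(home, format_error="log.cpp")
after = file_edges(symbol_deps, moved)

print(mutual_pairs(before))     # {('parser.cpp', 'token.cpp')}
print(mutual_pairs(after))      # set()
print(len(before), len(after))  # 3 2
```

The same comparison of edge counts and mutual pairs before and after a move is a simple way to check whether a proposed relocation actually improves the structure.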

36
Method Used for Improving Dependency
  • One simple solution is to put the content of all
    files into one file, but this is not what we
    would like to do: ugh, spaghetti code!
  • Our target is to simplify dependency structure by
    moving types, global functions and variables
    among existing files, and/or introducing new
    files, keeping file complexity essentially
    constant.

37
Comparing with previous dependency structure
  • In order to see the improvement in dependency
    structure, we will compare the original
    dependency structure with the enhanced one by
    comparing:
    • Dependency structure
    • Fan-in, fan-out, strong component size
    • Level analysis
    • Other graphics presented in previous slides