1
Revealing Class Structure With Zoomable Concept
Lattices
  • Uri Dekel
  • Department of Computer Science
  • Technion, Haifa, Israel

M.Sc. research supervised by Dr. Yossi Gil
2
Outline
  • Introduction
  • Formal Concept Analysis
  • Stage I: Interface Analysis
  • Stage II: Implementation Analysis
  • Stage III: Code Inspection
  • Version Comparison
  • Conclusions, Related and Future Research

3
Domain
  • Understanding and analyzing individual Java
    classes
  • Interface (black-box) analysis
  • Reducing the learning curve
  • Discovering interface problems
  • Implementation (white-box) analysis
  • Understanding class structure and role of fields
  • Discovering implementation problems
  • Code review and inspection
  • Understanding the purpose of each method from its
    code.
  • Ensuring style, quality, and correctness
  • Discovering code reuse opportunities
  • Version Comparison

4
Problems
  • Classes can be very large and complex
  • OOP practices promote use of many methods
  • Meyer's "shopping list" approach advocates
    completing the interface with syntactic-sugar
    methods
  • Rules of software evolution: The entropy of
    software artifacts increases with time
  • Delocalisation
  • Definition order not meaningful

Fact: A quarter of all public methods are found
in classes with more than 100 methods!
5
Research Question
  • Can Formal Concept Analysis (FCA) help alleviate
    some of these problems?
  • FCA is a mathematical classification technique
  • Helps discover meaningful data in binary
    relations
  • Can be visualized with Concept Lattices
  • FCA has been applied to many CS and SW problems
  • Automatic modularization
  • Automatic construction and refinement of class
    hierarchies
  • Reverse engineering complex systems
  • Smart component repositories

6
Formal Concept Analysis
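  • Input: A context <O, A, R>
  • O is a set of objects
  • A is a set of attributes
  • R is a binary relation between O and A
  • Mapping: Galois connection
  • Common attributes of a set of objects
  • Common objects of a set of attributes
  • Output: Concepts, i.e. pairs (X, Y) of an object
    set X and an attribute set Y s.t. Y is exactly
    the common attributes of X and X is exactly the
    common objects of Y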
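
The two mappings and the resulting concepts can be sketched in a few lines of Python. This is a minimal brute-force illustration, not the tool used in this research: the function names and the toy context (three fields, four methods) are assumptions made up for the example, with objects as fields and attributes as methods, as in the field-accesses context.

```python
from itertools import combinations

def common_attributes(objects, relation, all_attributes):
    """Attributes related to every object in `objects`."""
    return frozenset(a for a in all_attributes
                     if all((o, a) in relation for o in objects))

def common_objects(attributes, relation, all_objects):
    """Objects related to every attribute in `attributes`."""
    return frozenset(o for o in all_objects
                     if all((o, a) in relation for a in attributes))

def find_concepts(all_objects, all_attributes, relation):
    """Brute force: close every subset of objects (exponential, toy sizes only)."""
    found = set()
    ordered = sorted(all_objects)
    for size in range(len(ordered) + 1):
        for subset in combinations(ordered, size):
            intent = common_attributes(subset, relation, all_attributes)
            extent = common_objects(intent, relation, all_objects)
            found.add((extent, intent))
    return found

# Hypothetical context: objects are fields, attributes are methods,
# and a pair (field, method) means "method accesses field".
FIELDS = {"x", "y", "name"}
METHODS = {"getX", "getY", "norm", "getName"}
ACCESSES = {("x", "getX"), ("x", "norm"), ("y", "getY"), ("y", "norm"),
            ("name", "getName")}

CONCEPTS = find_concepts(FIELDS, METHODS, ACCESSES)
for extent, intent in sorted(CONCEPTS, key=lambda c: (len(c[0]), sorted(c[0]))):
    print(sorted(extent), sorted(intent))
```

For this toy context the sketch yields six concepts, including the pair ({x, y}, {norm}): norm is the only method that accesses both x and y.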

7
FCA Example
  • Field-accesses context of a class
  • Objects are fields, attributes are methods,
    relation specifies which methods access each
    field

(Figure: the context table and the concepts derived from it)
8
Concept Lattices
  • Partial order
  • Defines domination between concepts
  • Visualized as a concept lattice
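
As a rough sketch of how the partial order turns into a drawable lattice, the snippet below orders concepts by inclusion of their object sets and keeps only the direct "covering" edges. The extents are a hypothetical toy example (fields x, y, name), and `hasse_edges` is an illustrative name, not part of any tool mentioned here.

```python
def hasse_edges(extents):
    """Edges (lower, upper) where lower is a proper subset of upper
    and no other extent lies strictly between them."""
    edges = set()
    for lower in extents:
        for upper in extents:
            if lower < upper and not any(lower < mid < upper for mid in extents):
                edges.add((lower, upper))
    return edges

# Hypothetical extents (object sets) of six concepts.
EXTENTS = [frozenset(s) for s in
           ([], ["x"], ["y"], ["name"], ["x", "y"], ["x", "y", "name"])]

EDGES = hasse_edges(EXTENTS)
for lower, upper in sorted(EDGES, key=lambda e: (len(e[1]), sorted(e[0]))):
    print(sorted(lower), "->", sorted(upper))
```

A concept dominates another exactly when there is an upward path between them, so only these direct edges need to be drawn.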

9
Interpreting Class Lattices
  • We use only sparse lattices
  • Economical but equivalent representation
  • Each object introduced in lowest concept
  • Each attribute introduced in highest concept
  • Interpretation
  • Each method uses all fields introduced in the
    same concept or below
  • Reveals
  • Possible restructuring
  • Asymmetry between coordinates
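
The sparse labeling rule can be illustrated with a small sketch. It assumes concepts are given as (fields, methods) pairs of a hypothetical toy class; `sparse_labels` and `fields_used` are illustrative names. Each field is attached to the smallest concept containing it, each method to the concept with the smallest method set containing it, and the fields a method uses are read back by looking at its concept and everything below.

```python
def fs(*items):
    return frozenset(items)

# Hypothetical concepts of a toy class: (fields, methods) pairs.
CONCEPTS = [
    (fs(), fs("getX", "getY", "norm", "getName")),
    (fs("x"), fs("getX", "norm")),
    (fs("y"), fs("getY", "norm")),
    (fs("name"), fs("getName")),
    (fs("x", "y"), fs("norm")),
    (fs("x", "y", "name"), fs()),
]

def sparse_labels(concepts):
    """Map each concept to (introduced fields, introduced methods)."""
    labels = {c: (set(), set()) for c in concepts}
    fields = set().union(*(extent for extent, _ in concepts))
    methods = set().union(*(intent for _, intent in concepts))
    for f in fields:
        # Lowest concept: the smallest field set that contains f.
        lowest = min((c for c in concepts if f in c[0]), key=lambda c: len(c[0]))
        labels[lowest][0].add(f)
    for m in methods:
        # Highest concept: the smallest method set that contains m.
        highest = min((c for c in concepts if m in c[1]), key=lambda c: len(c[1]))
        labels[highest][1].add(m)
    return labels

def fields_used(method, labels):
    """Fields introduced in the method's concept or in any concept below it."""
    home = next(c for c, (_, ms) in labels.items() if method in ms)
    return {f for c, (fields, _) in labels.items() if c[0] <= home[0] for f in fields}

LABELS = sparse_labels(CONCEPTS)
print(fields_used("norm", LABELS))
```

In this toy lattice norm is introduced in a concept that introduces no field itself; its two fields come from the concepts below it.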

10
Field-Accesses Context
  • Field usage is critical for understanding a class
  • All implementations of an operation use the same
    fields
  • Representation changes are rare
  • Methods that use the same combination are related
  • Can be calculated directly from the .class file
  • Allows some reverse engineering without source
    code
  • Calculated using standard static analysis
  • Currently restricted to accesses inside the class

11
Zoom-in Zoom-out approach
  • Problems
  • Concept lattices can be very large
  • Number of concepts is bounded exponentially in
    the size of the context
  • Polynomial for most real-life contexts
  • Linear for 99.5% of classes!
  • Elaborate member details are cumbersome
  • Solution
  • Provide (semi-) automatic zoom in/out tools

12
Running Example
  • The Molecule class from CDK
  • CDK: Chemistry Development Kit
  • Open source library of chemistry-related classes
  • Developed at the Max Planck Institute in Germany
  • Used in chemistry visualization applications
  • Why the Molecule class?
  • Has a large interface (nearly 75 public members)
  • The represented entity is familiar to most people
  • Methodology was successfully applied to other
    classes as well

Our methodology revealed several new bugs and
issues!
13
Stage I: Interface Analysis
  • "Programming today is a race between software
    engineers striving to build bigger and better
    idiot-proof programs, and the universe trying to
    produce bigger and better idiots. So far, the
    universe is winning."
  • -- Rich Cook
  • "There are only two industries that refer to
    their customers as 'users'."
  • -- Edward Tufte

14
Interface Analysis
  • Purpose
  • Understand the functionality provided by the
    class
  • Map expectations into interface members
  • The concept assignment or feature mapping
    problems
  • Discover problems
  • e.g. missing or superfluous functionality,
    exposed implementation details, inconsistent
    naming
  • Methodology
  • Methods are partitioned into concepts
  • Heuristic for automatic feature categorization
  • Zoom-out and reason about overall structure
  • Zoom-in and examine specific functionalities

15
Preliminaries
  • Mapping features to interface members requires
    knowing what the features are
  • Tasks
  • Surmising abstraction, purpose and role
  • Determining vocabulary
  • Predicting mandatory- and non-mandatory
    functionality
  • Information sources
  • Domain-specific knowledge
  • Class environment
  • E.g. hierarchy, dependencies, etc.
  • This step is not unique to concept analysis

16
Context Selection
  • Only client-visible methods should be used
  • Public methods by default, protected if client is
    subclass, default if client is in the same
    package
  • All fields are kept to ensure a correct
    partitioning
  • Will be removed after the lattice is constructed
  • Context parameters (boldface indicates selection)

(bold indicates our selection, F represents
"don't care")
17
Constructing the Lattice
  • The lattice is too cluttered to grasp immediately
  • We start zooming-out
  • Layers correspond to levels of abstraction

18
Simplifying concepts
  • We summarize the responsibilities of each concept
    in a quick skim over method signatures
  • This process cannot be fully-automated at present
  • Still too cluttered!

19
Naming Concepts
  • Name concepts based on summary
  • Use symbolic representations for common
    responsibilities

20
Horizontal Decomposition
  • Remove top and bottom concepts
  • Connected components are orthogonal
  • Problem with title (on the right) becomes obvious
  • Abundance of trivial components implies
    record-like behavior
  • Cohesive component requires further analysis
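
A minimal sketch of the decomposition step, assuming concepts are represented only by their field sets and that two concepts are linked when one dominates the other. The toy extents and the function name are hypothetical.

```python
def horizontal_decomposition(extents):
    """Drop the top and bottom concepts, then group the remaining
    concepts into connected components of the domination relation."""
    top = max(extents, key=len)
    bottom = min(extents, key=len)
    unvisited = {e for e in extents if e not in (top, bottom)}
    components = []
    while unvisited:
        start = unvisited.pop()
        component, stack = {start}, [start]
        while stack:
            current = stack.pop()
            for other in list(unvisited):
                if current < other or other < current:
                    unvisited.remove(other)
                    component.add(other)
                    stack.append(other)
        components.append(component)
    return components

# Hypothetical extents (field sets) of a toy class lattice.
EXTENTS = [frozenset(s) for s in
           ([], ["x"], ["y"], ["name"], ["x", "y"], ["x", "y", "name"])]

COMPONENTS = horizontal_decomposition(EXTENTS)
for component in COMPONENTS:
    print(sorted(sorted(e) for e in component))
```

Here the single-concept component around name is the kind of trivial, record-like component the slide mentions, while the x/y component is the cohesive one that would need a closer look.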

21
Abstraction Lattice
  • Heuristic for clustering concepts
  • Concepts dominated by the same top-layer concepts
    belong in the same cluster
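
One possible reading of this heuristic, as a sketch: take the "top layer" to be the concepts directly below the top concept (an assumption, the slide does not define it), and group every remaining concept by the set of top-layer concepts that dominate it. The extents and names below are hypothetical.

```python
from collections import defaultdict

def cluster_by_top_layer(extents):
    """Group concepts by the set of top-layer concepts that dominate them.
    Assumption: the top layer is whatever sits directly below the top concept."""
    top = max(extents, key=len)
    bottom = min(extents, key=len)
    inner = [e for e in extents if e != top and e != bottom]
    top_layer = [e for e in inner if not any(e < other for other in inner)]
    groups = defaultdict(set)
    for e in inner:
        key = frozenset(t for t in top_layer if e <= t)
        groups[key].add(e)
    return list(groups.values())

# Hypothetical extents (field sets) over fields a, b, c.
EXTENTS = [frozenset(s) for s in ("", "a", "b", "c", "ab", "bc", "abc")]

CLUSTERS = cluster_by_top_layer(EXTENTS)
for cluster in CLUSTERS:
    print(sorted("".join(sorted(e)) for e in cluster))
```

In this toy lattice the concept for b is dominated by both top-layer concepts and therefore forms a cluster of its own.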

22
Match services against expectations
  • Functionality search order
  • Expected mandatory features
  • Expected non-mandatory features
  • Unexpected features
  • For each functionality
  • Mark relevant clusters
  • Mark relevant concepts
  • Examine each concept
  • Example
  • Bond management

23
Stage II: Implementation Analysis
  • "There are two ways of constructing a software
    design: One way is to make it so simple that
    there are obviously no deficiencies, and the
    other way is to make it so complicated that there
    are no obvious deficiencies." -- C. A. R. Hoare

24
Implementation Analysis
  • Purpose
  • Understand implementation and structure.
  • Discover problems
  • e.g. redundant fields, bad naming conventions,
    wrongly-implemented operations
  • Methodology
  • Code is not inspected at this stage!
  • All information derived from lattice
  • Zoom-in
  • Including private fields and methods
  • Listing full signatures and introducing classes
  • Embedded call-graph

25
Embedded Call Graph
  • Superposition of call-graph on concept lattice
  • A semantics-based CG layout heuristic
  • Keeps related methods together while reducing
    crossings
  • Helps investigate relations between methods
  • e.g. surmise level of abstraction or discover
    wrappers
  • Used later for selecting an order for code
    inspection
  • Example: ECG of Pnt3D

26
Investigate Fields
  • Examine unused fields
  • Might indicate unimplemented stubs or dead
    structure
  • Discover the roles of fields
  • Easy for trivial components
  • Harder for the cohesive one
  • Investigate interdependency
  • Naming quality

27
Investigate Special Methods
  • Methods that (should) use the entire state should
    be in the top concept
  • Exceptions can indicate problems
  • Zoom-in by adding declaring class details
  • Examine methods that do not use fields
  • e.g. discover undeclared statics
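
These checks can be approximated directly on the field-accesses relation. The sketch below is illustrative only: the class, its methods and the list of methods expected to use the entire state (equals, hashCode, toString) are assumptions, not data from the Molecule analysis.

```python
# Assumption: these methods are expected to touch every field.
EXPECTED_FULL_STATE = {"equals", "hashCode", "toString"}

def special_methods(all_fields, accesses):
    """accesses maps a method name to the set of fields it reads or writes."""
    all_fields = set(all_fields)
    full_state = {m for m, used in accesses.items() if set(used) == all_fields}
    stateless = {m for m, used in accesses.items() if not used}
    # Expected in the top concept but found elsewhere: worth a closer look.
    suspicious = (EXPECTED_FULL_STATE & set(accesses)) - full_state
    return {"full_state": full_state, "stateless": stateless,
            "suspicious": suspicious}

# Hypothetical point-like class with three fields.
FIELDS = {"x", "y", "z"}
ACCESSES = {
    "getX": {"x"},
    "toString": {"x", "y", "z"},
    "equals": {"x", "y"},   # ignores z: possibly a defect
    "origin": set(),        # uses no fields: candidate for a static method
}

REPORT = special_methods(FIELDS, ACCESSES)
print(REPORT)
```

In lattice terms, full_state corresponds to the methods in the top concept, and stateless to the methods that access no field at all.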

28
Investigate Other Methods
  • Ensure symmetry where expected
  • e.g. C11 and C13, C10 and C14, C16 and C17
  • Ensure methods use expected access patterns
  • Add non-public methods to lattice

29
Stage III: Code Inspection
  • "Real programmers don't document. If it was hard
    to write, it should be hard to understand."
    -- Anonymous
  • "Real programmers can write assembly code in any
    language." -- Larry Wall

30
Code Inspection
  • Purpose
  • Understand functionality which is unclear after
    the previous stages.
  • Ensure quality of code and style
  • Methodology
  • Select an order for effective reading
  • Maximizing reading throughput
  • Maximizing discovered defects
  • Minimizing repetitions

31
Code Inspection Problem
  • Original source code order not effective
  • Co-definitions.
  • No incremental order
  • All class members are defined simultaneously
  • Perturbations to intended order
  • Evolution and maintenance
  • Language issues (e.g. inheritance)
  • Style issues (e.g. public before private)

32
Reading Strategy
  • Organize methods into groups of related
    functionality and order these groups (global
    order)
  • Order the methods inside each group (local order)
  • Each concept is a group
  • Same-concept methods are similar in purpose,
    semantics and implementation
  • Increased prospects of understanding differences
    between methods and discovering redundancies and
    replications
  • Less infrastructure (e.g. external libraries) to
    memorize

33
Reading Strategy
  • Global order (by importance)
  • Read each HD component separately
  • Each represents an independent functionality
  • Read concepts in ascending order of layers
  • Exploit similar level of abstraction
  • Read concepts of the same cluster together
  • Local order (by importance)
  • Read methods in topological order
  • Use restricted ECG
  • Read methods in same ECG component together
  • Resolve equivalencies with simplest-first rule
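
The local order could be computed roughly as follows. This sketch assumes that "topological order" means reading callees before their callers, and that "simplest first" can be approximated by a numeric size per method; both are interpretations, and the method names and sizes are hypothetical.

```python
import heapq

def reading_order(calls, size):
    """calls maps each method of a group to the methods it calls.
    Callees are read before callers; ties go to the smaller method."""
    # Keep only calls inside the group and ignore direct self-recursion.
    pending = {m: (set(callees) & set(calls)) - {m}
               for m, callees in calls.items()}
    ready = [(size[m], m) for m, deps in pending.items() if not deps]
    heapq.heapify(ready)
    order = []
    while ready:
        _, method = heapq.heappop(ready)
        order.append(method)
        for caller, deps in pending.items():
            if method in deps:
                deps.remove(method)
                if not deps:
                    heapq.heappush(ready, (size[caller], caller))
    if len(order) != len(pending):
        raise ValueError("call graph contains a cycle")
    return order

# Hypothetical group of methods from one concept.
CALLS = {"size": set(), "isEmpty": {"size"}, "get": set(),
         "first": {"get", "isEmpty"}}
SIZE = {"size": 1, "isEmpty": 1, "get": 2, "first": 3}

ORDER = reading_order(CALLS, SIZE)
print(ORDER)
```

Mutually recursive methods would need to be grouped first; the sketch simply reports a cycle in that case.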

34
Inspection Tasks
  • Inspection tasks customized for our reading order
  • Finding duplicate services inside a concept
  • e.g. getDegree and getBondCount
  • Identifying code-sharing opportunities
  • e.g. overloads of addBond
  • Verify that low-level methods are not bypassed
  • e.g. getBondCount, getBondAt
  • An addition to standard inspection tasks

35
Version Comparison
  • "Zero defects: The result of shutting down a
    production line." -- Kelvin Throop III, "The
    Management Dictionary"

36
Version Comparison
  • Examine an outline of the differences before the
    actual details
  • Example

Differences between the original version of the
Graph class of VGJ (Visualizing Graphs with
Java) and the Technion adaptation of that
class. Originals appear in bold font,
modifications appear in plain font.
37
Related and Future Research
38
Related Research
  • Formal Concept Analysis
  • Many applications for
  • Automatic class hierarchy construction
  • Automatic Modularization
  • Reverse engineering and program understanding
  • Management of component repositories
  • Understanding individual classes
  • Class blueprints (M. Lanza and S. Ducasse)
  • Not much else at the class level

39
Research Directions
  • Extensions to current methodology
  • Conducting user studies
  • Validating the methodology
  • Discovering new tools
  • Integration with development or browsing tools
  • e.g. Eclipse or IBM's documentation enhancer
  • We currently have a non-interactive prototype
  • New zoom-in and zoom-out tools
  • Using other classification criteria
  • e.g. use of types, name-based classification

40
Research Directions (cont.)
  • Common Programming Practices
  • Defining a lattice-based suite of class metrics
  • Lattice Patterns
  • Other directions of research
  • Using nano-patterns to annotate methods
  • Marking functionality directly on lattice.
  • Applicability to class design in CASE tools
  • Interactive class diagram editor based on concept
    lattice
  • Methods are connected to fields and hence
    assigned some semantics.
  • Automatic assignment of Nano-patterns
  • Dealing with multiple classes

41
The End
  • "Theory is when you know something, but it
    doesn't work. Practice is when something works,
    but you don't know why. Programming combines
    theory and practice: Nothing works and you don't
    know why."
  • -- Anonymous