Title: The Small World of Software Reverse Engineering
1The Small World of Software Reverse Engineering
- Ahmed E. Hassan and Richard C. Holt
- SoftWare Architecture Group (SWAG)
- University Of Waterloo
2Publications
- We study the evolution of a field through its publications
- Publications give a picture of
- Collaboration
- High degree in academia in contrast to industry
- Emergence of topics
- Hot topics and their effects on collaborations
3DBLP
- DBLP
- DataBase systems and Logic Programming
- Digital Bibliography and Library Project
- Tracks publications in several conferences and communities, such as
- WCRE
- Reengineering and maintenance
- Software engineering
- Records for each publication
- Title
- Authors
- Conference name and year
- Abstract
- Data available online as an XML file
4Studying Collaboration
- Develop a social collaboration network using co-authorship data from DBLP
- A node exists for each author
- An edge exists between two nodes (authors) if they co-authored a paper
- Size of a node is proportional to the number of publications
- Edges have a weight proportional to the number of co-authored papers
- Use a force-based algorithm to lay out the network
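The node and edge rules above can be sketched in a few lines. This is a minimal sketch, assuming the DBLP XML has already been parsed into one author list per publication; the function name `build_coauthor_network` and the sample records are hypothetical, and the force-based layout step is left out.

```python
from collections import Counter
from itertools import combinations

def build_coauthor_network(papers):
    """papers: iterable of author-name lists, one list per publication."""
    node_size = Counter()    # author -> number of publications
    edge_weight = Counter()  # (author_a, author_b) -> number of co-authored papers
    for authors in papers:
        unique = sorted(set(authors))  # ignore repeated names on one paper
        node_size.update(unique)
        # Every unordered pair of co-authors gains one unit of weight per paper.
        edge_weight.update(combinations(unique, 2))
    return node_size, edge_weight

# Hypothetical records, not real DBLP data.
papers = [["A", "B"], ["A", "B", "C"], ["D"]]
nodes, edges = build_coauthor_network(papers)
```

Author pairs are stored in sorted order, so `("A", "B")` and `("B", "A")` map to the same undirected edge.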
5Co-Authorship Graph
6Co-Authorship Graph
7The Largest Component over Time
8Small World Graphs
- Large graphs with short paths connecting their nodes
- Stanley Milgram studied them in the 60s
- Letters were given to people in Nebraska
- Each person hands the letter to someone they know and who they believe can eventually deliver the letter to a stockbroker in Boston
- Average chain of people between the two cities is 6: "six degrees of separation"
- Collaboration networks which are small world graphs
- Good indicator of ease of communication of knowledge between members of a community
9Small World Graph
- Characteristic Path Length (L) measures, on average, how many individuals an author has to go through to reach other authors
- The average shortest path from any node to any other node in the largest component of the graph
- Clustering Coefficient (C) measures how collaborative, on average, the co-authors of an author are
- For a node, C is the ratio of the number of edges among the neighbors of that node to the maximum possible number of edges between these neighboring nodes
- Watts and Strogatz give a formal definition of small world graphs using C, L, and random graphs
- $L > L_{random}$ and $C \gg C_{random}$
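The two measures can be computed directly from an adjacency map. The following is a minimal sketch, assuming an undirected, unweighted, connected graph (for example the largest component) stored as a dict from node to a set of neighbors; the function names and the toy graph are hypothetical, and nodes with fewer than two neighbors are assumed to contribute 0 to C.

```python
from collections import deque

def clustering_coefficient(adj):
    """Average over nodes of (edges among neighbors) / (max possible such edges)."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue  # assumed convention: contributes 0 to the average
        links = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
        total += links / (k * (k - 1) / 2)
    return total / len(adj)

def bfs_distances(adj, source):
    """Shortest-path length (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def characteristic_path_length(adj):
    """Mean shortest-path length over all ordered pairs of distinct nodes."""
    total = pairs = 0
    for s in adj:
        d = bfs_distances(adj, s)
        total += sum(d.values())
        pairs += len(d) - 1
    return total / pairs

# Toy graph: a triangle A-B-C with D attached to C.
adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
C = clustering_coefficient(adj)
L = characteristic_path_length(adj)
```

For the toy graph, C is 7/12 and L is 4/3. Comparing these values against a random graph with the same number of nodes and edges gives the small world test stated above.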
10WCRE is a small world graph!
- Clustering coefficient is 0.76
- Characteristic path length is 4.3
Author      Centrality
Canfora     2.76
Koschke     2.88
Merlo       2.94
De Lucia    3.1
Holt        3.2
Towards a Standard Schema for C/C++ by Ferenc, Sim, Holt, Koschke and Gyimothy 3.94 ? 4.32
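The slides do not define the centrality score. One plausible reading, stated here purely as an assumption, is the mean shortest-path distance from an author to every other author in the same component, so that lower values mean a more central author. A minimal sketch under that assumption, with hypothetical function names and a hypothetical four-author chain:

```python
from collections import deque

def mean_distance(adj, author):
    """Assumed centrality: mean hop distance from author to all reachable authors."""
    dist = {author: 0}
    queue = deque([author])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    others = len(dist) - 1
    return sum(dist.values()) / others if others else 0.0

def rank_by_centrality(adj):
    """Authors sorted from most central (smallest mean distance) to least."""
    return sorted((mean_distance(adj, a), a) for a in adj)

# Hypothetical chain of co-authors: P - Q - R - S.
chain = {"P": {"Q"}, "Q": {"P", "R"}, "R": {"Q", "S"}, "S": {"R"}}
ranking = rank_by_centrality(chain)
```

In the chain, the two middle authors come first with a mean distance of 4/3, and the two end authors follow with 2.0.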
11Paper Titles Analysis for Emerging Terms
12Bigger Small Worlds in SE
- We compare results against two other research communities
- Maintenance and Reengineering (MR): WCRE, IWPC, CSMR, ICSM
- Software Engineering: MR + 17 conferences
- DBLP data is not as complete for these conferences
13The Largest Component over Time
- Slow constant growth, then rapid growth once researchers know each other
- Soft Eng has slower growth
- Fewer conferences in early days
- Incomplete DBLP data
- Wider scope
- MR and Soft Eng rapid growth since 1996
- Internet and email?
- MR and WCRE growing since late 90s
- Y2K?
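Growth of the largest component can be tracked by adding papers in year order and merging the authors of each paper with a union-find structure. This is a minimal sketch, assuming the plotted quantity is the share of all authors seen so far that sit in the largest component (the slides do not state the exact measure); the function name and sample records are hypothetical.

```python
def largest_component_by_year(papers):
    """papers: iterable of (year, [authors]).
    Returns {year: largest component size / authors seen so far}."""
    parent, size = {}, {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            if size[ra] < size[rb]:
                ra, rb = rb, ra
            parent[rb] = ra
            size[ra] += size[rb]

    share, largest = {}, 0
    for year, authors in sorted(papers, key=lambda p: p[0]):
        if not authors:
            continue
        for a in authors:
            if a not in parent:
                parent[a], size[a] = a, 1
        for a in authors[1:]:
            union(authors[0], a)  # co-authors end up in one component
        largest = max(largest, size[find(authors[0])])
        share[year] = largest / len(parent)  # value after the year's last paper
    return share

# Hypothetical records, not real DBLP data.
papers = [(1993, ["A", "B"]), (1993, ["C"]), (1995, ["D", "E"]), (1996, ["B", "D"])]
share = largest_component_by_year(papers)
```

In the sample data the share dips while isolated groups appear, then jumps in 1996 when one paper links two groups, which is the kind of transition the slide describes.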
14Most Central Authors over Time
15WCRE, MR, SE vs. other fields
16Title Spectrograph
Joint work with J. Wu
[Figure: title spectrograph. Rows are stemmed title terms (Java, Reverse, object, compon, orient, software, program, design, system, experi, engin, data, abstract); columns are years 1976 to 2003.]
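The data behind such a term-by-year view can be sketched as a per-year count of title words. This is a minimal sketch, not the authors' tool: the function name, the stopword list, and the sample titles are hypothetical, and the stemming visible in the figure labels (for example "experi", "engin") is not reproduced here.

```python
import re
from collections import Counter, defaultdict

# Small illustrative stopword list (an assumption, not from the slides).
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}

def term_counts_by_year(titles):
    """titles: iterable of (year, title). Returns {year: Counter of terms}."""
    counts = defaultdict(Counter)
    for year, title in titles:
        words = re.findall(r"[a-z]+", title.lower())
        # Each term is counted at most once per title.
        counts[year].update({w for w in words if w not in STOPWORDS})
    return counts

# Hypothetical titles, not real DBLP records.
titles = [
    (1999, "Reverse Engineering of Java Programs"),
    (1999, "A Java Design Study"),
    (2000, "Reverse Engineering the Design"),
]
counts = term_counts_by_year(titles)
```

Dividing each count by the number of papers in that year would give a per-year frequency that is comparable across years with different publication volumes.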
17Conclusion
- A meta paper on publications and collaboration networks in WCRE, MR and SE communities
- Small World collaboration networks facilitate the exchange of ideas and results in a community
- Many of the techniques presented could be used to study the evolution of software systems (files or developers as nodes)
18Generating Small World Graphs Using Random Re-wiring
[Figure: graphs produced by random re-wiring, labeled "Small L, Small C" and "Small L, Large C"]
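Random re-wiring in the style of Watts and Strogatz starts from a ring in which every node is linked to its nearest neighbors, then moves one end of each edge to a random node with some probability. The following is a minimal sketch under those assumptions; the function name and the parameters `n`, `k`, `p` and `seed` are hypothetical names chosen for illustration.

```python
import random

def rewired_ring(n, k, p, seed=0):
    """Ring of n nodes, each linked to its k nearest neighbors (k even, k < n),
    with each edge re-wired with probability p. Returns an adjacency dict."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    for step in range(1, k // 2 + 1):
        for i in range(n):
            j = (i + step) % n
            if j in adj[i] and rng.random() < p:
                # New endpoint: any node that is not i and not already linked to i.
                candidates = [t for t in range(n) if t != i and t not in adj[i]]
                if candidates:
                    new = rng.choice(candidates)
                    adj[i].discard(j)
                    adj[j].discard(i)
                    adj[i].add(new)
                    adj[new].add(i)
    return adj

lattice = rewired_ring(20, 4, 0.0)     # no re-wiring: regular ring
small_world = rewired_ring(20, 4, 0.3)
```

With `p = 0` the ring stays regular (large L, large C). A small `p` adds a few shortcuts that shrink L while C stays large, and `p = 1` approaches a random graph (small L, small C).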
20Percentage of Papers in a year
21Number of new co-authors in a year