Title: Documenting and Automating Collateral Evolutions in Linux Device Drivers
1. Documenting and Automating Collateral Evolutions in Linux Device Drivers
- Yoann Padioleau
- Ecole des Mines de Nantes (now at UIUC)
- with
- Julia Lawall and René Rydhof Hansen (DIKU)
- Gilles Muller (Ecole des Mines de Nantes)
the Coccinelle project
2. The problem: Collateral Evolutions
- Evolution in lib.c: int foo(int x) becomes int bar(int x)
- Can entail lots of Collateral Evolutions (CE) in clients (client1.c, client2.c, ..., clientn.c)
  - foo(foo(2)) becomes bar(bar(2))
  - if(foo(3)) becomes if(bar(3))
3. The problem: Collateral Evolutions
- Evolution in lib.c: int foo(int x) becomes int bar(int x, int y)
- Can entail lots of Collateral Evolutions (CE) in clients (client1.c, client2.c, ..., clientn.c)
  - foo(foo(2)) becomes bar(bar(2,?),?)
  - if(foo(3)) becomes if(bar(3,?))
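The "?" placeholders above are where a client must supply a value the old call never needed. A minimal sketch in C, assuming a hypothetical body for bar and a hypothetical context value ctx that each client happens to have at hand (neither appears in the slides, which only show the signatures and call sites):

```c
/* lib.c after the evolution: foo(int x) has become bar(int x, int y).
 * The body is hypothetical; the slides only show the signature. */
int bar(int x, int y)
{
    return x + y;
}

/* One client after its collateral evolution:
 * foo(foo(2)) becomes bar(bar(2, ?), ?).
 * Each "?" is filled from the client's own context; here we assume the
 * client has a value `ctx` (hypothetical) to pass along. */
int client1(int ctx)
{
    return bar(bar(2, ctx), ctx);
}

/* Another client: if (foo(3)) becomes if (bar(3, ?)). */
int client2(int ctx)
{
    if (bar(3, ctx))
        return 1;
    return 0;
}
```

Calling client1(1) evaluates bar(bar(2, 1), 1). The point is that a plain rename cannot produce these call sites, because the second argument has to come from the surrounding client code.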
4. Our target: Linux device drivers
- Many libraries and many clients
  - Lots of driver support libraries: one per device type, one per bus (pci library, sound library, ...)
  - Lots of device-specific code: drivers make up more than 50% of Linux
- Many evolutions and collateral evolutions [EuroSys06]
  - 1200 evolutions in Linux 2.6
  - For each evolution, lots of collateral evolutions
  - Some collateral evolutions affect over 400 files at over 1000 code sites
5. Our goal
- Currently, Collateral Evolutions in Linux are done nearly manually
  - Difficult
  - Time consuming
  - Error prone
- The highly concurrent and distributed nature of the Linux development process makes it even worse
  - Patches that miss code sites (because of newly introduced sites and newly introduced drivers)
  - Out-of-date patches, conflicting patches
  - Drivers outside the Linux source tree are not updated
  - Misunderstandings
Need a tool to document and automate Collateral Evolutions
6. Taxonomy of transformations
- Taxonomy of evolutions (library code)
  - add parameter, split data structure, change protocol sequencing, change return type, add error code, etc.
- Taxonomy of collateral evolutions (client code)?
  - Very wide variety of program transformations, affecting a wide variety of C and CPP constructs
  - Often depends on context, e.g. for add argument, the new value must be constructed from enclosing code
  - Note that they are not necessarily semantics-preserving
Cannot be done by current refactoring tools (more than just renaming entities). Need a flexible tool.
7. Complex Collateral Evolutions (2.5.71)
- Evolution: scsi_get()/scsi_put() dropped from the SCSI library
- Collateral evolutions: SCSI resource now passed directly to proc_info callback functions via a new parameter
  - From local var to parameter
  - Delete calls to library
  - Delete error checking code
Legend: lines marked - are the code before, lines marked + are the code after

  int a_proc_info(int x
+                 ,scsi *y
                  ) {
-   scsi *y;
    ...
-   y = scsi_get();
-   if(!y) { ... return -1; }
    ...
-   scsi_put(y);
    ...
  }
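To make the before and after states concrete, here is a minimal compilable sketch in C. The scsi struct, its base field, the stub bodies of scsi_get and scsi_put, and the refcount counter are all hypothetical stand-ins; only the function names and the shape of the change come from the slide.

```c
#include <stddef.h>

typedef struct scsi { int base; } scsi;    /* hypothetical stand-in type */

scsi the_host = { 42 };                     /* hypothetical resource */
int refcount = 0;                           /* tracks get/put balance */

/* Hypothetical stubs for the library calls dropped in 2.5.71. */
scsi *scsi_get(void) { refcount++; return &the_host; }
void scsi_put(scsi *y) { (void)y; refcount--; }

/* Before: the driver callback acquires and releases the resource itself
 * and must check for failure. */
int proc_info_before(int x)
{
    scsi *y;
    y = scsi_get();
    if (!y)
        return -1;
    x += y->base;
    scsi_put(y);
    return x;
}

/* After: the resource arrives as a new parameter, so the local variable,
 * the get/put calls, and the error check all disappear from the driver. */
int proc_info_after(int x, scsi *y)
{
    return x + y->base;
}
```

Both versions compute the same result when given the same resource, but the second one leaves acquisition and release to whoever calls the callback.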
8. Our idea
- How to specify the required program transformation?
- In what programming language?
The example:

  int a_proc_info(int x
+                 ,scsi *y
                  ) {
-   scsi *y;
    ...
-   y = scsi_get();
-   if(!y) { ... return -1; }
    ...
-   scsi_put(y);
    ...
  }
9. Our idea: Semantic Patches

@@
function a_proc_info;
identifier x,y;
@@
  int a_proc_info(int x
+                 ,scsi *y
                  ) {
-   scsi *y;
    ...
-   y = scsi_get();
-   if(!y) { ... return -1; }
    ...
-   scsi_put(y);
    ...
  }

- Metavariable declarations (between the @@ lines)
- Metavariable references (a_proc_info, x, y in the body)
- The ... operator
- Modifiers (- and +)
- Patch-like syntax
- Declarative language
- Transform if everything matches
10. Affected Linux driver code
drivers/scsi/53c700.c

int s53c700_info(int limit)
{
  char *buf;
  scsi *sc;
  sc = scsi_get();
  if(!sc) {
    printk("error");
    return -1;
  }
  wd7000_setup(sc);
  PRINTP("val=%d", sc->field+limit);
  scsi_put(sc);
  return 0;
}

drivers/scsi/pcmcia/nsp_cs.c

int nsp_proc_info(int lim)
{
  scsi *host;
  host = scsi_get();
  if(!host) {
    printk("nsp_error");
    return -1;
  }
  SPRINTF("NINJASCSI=%d", host->base);
  scsi_put(host);
  return 0;
}

Similar, but not identical
11. Applying the semantic patch
C files before:

int s53c700_info(int limit)
{
  char *buf;
  scsi *sc;
  sc = scsi_get();
  if(!sc) {
    printk("error");
    return -1;
  }
  wd7000_setup(sc);
  PRINTP("val=%d", sc->field+limit);
  scsi_put(sc);
  return 0;
}

int nsp_proc_info(int lim)
{
  scsi *host;
  host = scsi_get();
  if(!host) {
    printk("nsp_error");
    return -1;
  }
  SPRINTF("NINJASCSI=%d", host->base);
  scsi_put(host);
  return 0;
}

proc_info.sp:

@@
function a_proc_info;
identifier x,y;
@@
  int a_proc_info(int x
+                 ,scsi *y
                  ) {
-   scsi *y;
    ...
-   y = scsi_get();
-   if(!y) { ... return -1; }
    ...
-   scsi_put(y);
    ...
  }

spatch *.c < proc_info.sp
12. Applying the semantic patch
C files after:

int s53c700_info(int limit, scsi *sc)
{
  char *buf;
  wd7000_setup(sc);
  PRINTP("val=%d", sc->field+limit);
  return 0;
}

int nsp_proc_info(int lim, scsi *host)
{
  SPRINTF("NINJASCSI=%d", host->base);
  return 0;
}
13. SmPL: Semantic Patch Language
- A single small semantic patch can modify hundreds of files, at thousands of code sites
- The features of SmPL make a semantic patch generic. Abstract away irrelevant details:
  - Differences in spacing, indentation, and comments
  - Choice of the names given to variables (metavariables)
  - Irrelevant code (..., control-flow oriented)
  - Other variations in coding style (isomorphisms), e.g. if(!y) <=> if(y==NULL) <=> if(NULL==y)
14. Sequences and the ... operator
C file:

1 y = scsi_get();
2 if(exp) {
3   scsi_put(y);
4   return -1;
5 }
6 printf("%d", y->f);
7 scsi_put(y);
8 return 0;

Semantic patch:

- y = scsi_get();
  ...
- scsi_put(y);

Control-flow graph (CFG) of the C file, with nodes numbered by line. Two paths (path 1 and path 2) lead from node 1 to exit:
- 1 -> 2 -> 3 -> 4 -> exit
- 1 -> 2 -> 6 -> 7 -> 8 -> exit
"..." means for all subsequent paths
One "-" line can erase multiple lines
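The "for all subsequent paths" reading of ... can be sketched as a small check over the CFG above. This is only an illustration in C, not the Coccinelle engine: the successor table encodes the eight-line example, the helper names are hypothetical, and the recursion assumes an acyclic graph.

```c
#include <stdbool.h>

#define MAX_NODES 10   /* nodes 1..8 plus an exit node; index 0 unused */
#define EXIT_NODE 9

/* Successors of each node in the CFG of the example C file.
 * 0 means "no successor". Line 5 (closing brace) holds no statement. */
static const int succ[MAX_NODES][2] = {
    [1] = {2, 0},
    [2] = {3, 6},          /* if (exp): then-branch 3, otherwise 6 */
    [3] = {4, 0},
    [4] = {EXIT_NODE, 0},  /* return -1 */
    [6] = {7, 0},
    [7] = {8, 0},
    [8] = {EXIT_NODE, 0},  /* return 0 */
};

/* Nodes that hold the statement "scsi_put(y);". */
static bool is_put(int node) { return node == 3 || node == 7; }

/* True if every path starting at `node` reaches a scsi_put(y) node.
 * Each first-reached match is recorded in matched[]. Assumes an acyclic
 * CFG; loops would need a visited set. */
static bool all_paths_reach_put(int node, bool matched[MAX_NODES])
{
    if (is_put(node)) {
        matched[node] = true;
        return true;
    }
    bool ok = true;
    int nsucc = 0;
    for (int i = 0; i < 2; i++) {
        int next = succ[node][i];
        if (next == 0)
            continue;
        nsucc++;
        if (!all_paths_reach_put(next, matched))
            ok = false;
    }
    return nsucc > 0 && ok;   /* exit node or dead end: no match */
}

/* Number of scsi_put(y) sites matched after "y = scsi_get();" at node 1,
 * or -1 if some path never reaches scsi_put(y). */
int count_put_sites_after_get(void)
{
    bool matched[MAX_NODES] = { false };
    if (!all_paths_reach_put(succ[1][0], matched))
        return -1;
    int n = 0;
    for (int i = 0; i < MAX_NODES; i++)
        if (matched[i])
            n++;
    return n;
}
```

On this graph the single pattern line for scsi_put(y) matches on both paths, at nodes 3 and 7, which is why one "-" line can erase more than one line of C.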
15. Isomorphisms, C equivalences
- Examples
  - Boolean: X == NULL <=> !X <=> NULL == X
  - Control: if(E) S1 else S2 <=> if(!E) S2 else S1
  - Pointer: E->field <=> (*E).field
  - etc.
- How to specify isomorphisms?

@@ expression X; @@
X == NULL <=> !X <=> NULL == X

Reuses SmPL syntax
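The three example equivalences can be checked directly in C. A minimal sketch, assuming a hypothetical struct dev with a single field; the function names are illustrative only:

```c
#include <stddef.h>

struct dev { int field; };   /* hypothetical type, for illustration only */

/* Boolean isomorphism: three spellings of the same NULL test. */
int is_null_a(const struct dev *x) { return x == NULL; }
int is_null_b(const struct dev *x) { return !x; }
int is_null_c(const struct dev *x) { return NULL == x; }

/* Control isomorphism: if(E) S1 else S2 <=> if(!E) S2 else S1. */
int pick_a(int e) { if (e) return 1; else return 2; }
int pick_b(int e) { if (!e) return 2; else return 1; }

/* Pointer isomorphism: E->field <=> (*E).field. */
int read_a(const struct dev *e) { return e->field; }
int read_b(const struct dev *e) { return (*e).field; }

/* Returns 1 if each pair of variants agrees on a few sample inputs. */
int isomorphisms_agree(void)
{
    struct dev d = { 7 };
    const struct dev *ptrs[2] = { NULL, &d };
    for (int i = 0; i < 2; i++) {
        const struct dev *p = ptrs[i];
        if (is_null_a(p) != is_null_b(p) || is_null_b(p) != is_null_c(p))
            return 0;
    }
    for (int e = 0; e <= 1; e++)
        if (pick_a(e) != pick_b(e))
            return 0;
    return read_a(&d) == read_b(&d);
}
```

Because the variants behave identically, a semantic patch written with one spelling can safely match code written with another.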
16. How does it work?
17. The transformation engine architecture
- Semantic patch side: Parse Semantic Patch, Expand isomorphisms, Translate to extended CTL
- C side: Parse C file, Translate to CFG
- Match CTL against CFG using a model checking algorithm
- Modify matched code
- Unparse
Extended CTL: Computation Tree Logic [Clarke86] with extra features
18. CTL and Model checking
- Model checking a CTL formula against a model answers just yes/no (with a counterexample).
- We do program transformations, not just pattern checking. We need to:
  - Bind metavariables and remember their values
  - Remember where we have matched sub-formulas
- We have extended CTL with existential variables and program transformation annotations

@@ exp X,Y; @@
  f(X);
  ...
- g(Y);
+ g(X,Y);

$\exists X.\ f(X) \wedge \mathrm{AX}\ \mathrm{A}[\mathit{true}\ \mathrm{U}\ \exists v.\ \exists Y.\ g(Y)^{-g(Y),\ +g(X,Y)}_{v}]$
19. Other issues
- Need to produce readable code
  - Keep space, indentation, comments
  - Keep CPP instructions as-is. Also, the programmer may want to transform some #define and iterator macros (e.g. list_for_each)
- Interactive engine, partial match
- Isomorphisms
  - Rewriting the semantic patch (not the C code)
  - Generate disjunctions
Very different from most other C tools
60 000 lines of OCaml code
20. Evaluation
21. Experiments
- Methodology
  - Detect past collateral evolutions in Linux 2.5 and 2.6 using the patchparse tool [EuroSys06]
  - Select representative ones
    - Test suite of over 60 CEs
  - Study them and write corresponding semantic patches
    - Note: we are not kernel developers
- Going "back to the future". Compare
  - what Linux programmers did manually
  - what spatch, given our SPs, does automatically
22. Test suite
- 20 Complex CEs: mistakes done by the programmers
  - In each case, 1-16 errors or misses
- 23 Mega CEs: affect over 100 sites
  - Up to 40 people for up to two years
- 26 typical CEs
  - The whole set of CEs affecting a typical (median) directory from 2.6.12 to 2.6.20
More than 5800 driver files
23. Results
- Our SPs are on average 106 lines long
- SPs are often 100 times smaller than human-made patches. A measure of time saved:
  - Not doing manually the CE on all the drivers
  - Not reading and reviewing big patches, for people with drivers outside the source tree
- Overall: correct and complete automated transformation for 93% of files
  - Problems on the remaining 7%: we miss code sites
    - CPP issues, lack of isomorphisms (data-flow and inter-procedural)
    - We are not kernel developers: don't know how to specify
  - No false positives, just false negatives
- Average processing time of 0.7s per relevant file
Sometimes the tool was right and the human wrong
24. Impact on the Linux kernel
- We also wrote some SPs for current collateral evolutions (looking at Linux kernel mailing lists)
  - use DIV_ROUND_UP, BUG_ON, FIELD_SIZE
  - convert kmalloc-memset to kzalloc
  - Total diffstat: 154 files changed, 203 insertions(+), 375 deletions(-)
- We wrote other SPs, for bug-fixing (good side effects of our tool)
  - Add missing put functions (reference counting)
  - Drop unnecessary put functions (reference counting)
  - Remove unused variables
  - Total diffstat: 111 files changed, 340 insertions(+), 355 deletions(-)
Accepted in latest Linux kernel
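Two of the cleanups named above can be illustrated outside the kernel. The sketch below is a user-space analogue in C: malloc plus memset stands in for kmalloc plus memset, calloc stands in for kzalloc, and the function names are hypothetical. The DIV_ROUND_UP definition mirrors the kernel macro of that name.

```c
#include <stdlib.h>
#include <string.h>

/* Mirrors the kernel macro of the same name. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Before: open-coded rounding-up division. */
unsigned pages_before(unsigned bytes, unsigned page)
{
    return (bytes + page - 1) / page;
}

/* After: the same computation through the macro. */
unsigned pages_after(unsigned bytes, unsigned page)
{
    return DIV_ROUND_UP(bytes, page);
}

/* Before: allocate, then zero (analogue of kmalloc followed by memset). */
void *alloc_before(size_t size)
{
    void *p = malloc(size);
    if (p)
        memset(p, 0, size);
    return p;
}

/* After: a single zeroing allocation (analogue of kzalloc). */
void *alloc_after(size_t size)
{
    return calloc(1, size);
}

/* Returns 1 if both allocation styles yield fully zeroed buffers. */
int both_zeroed(size_t size)
{
    unsigned char *a = alloc_before(size);
    unsigned char *b = alloc_after(size);
    int ok = (a != NULL && b != NULL);
    for (size_t i = 0; ok && i < size; i++)
        if (a[i] != 0 || b[i] != 0)
            ok = 0;
    free(a);
    free(b);
    return ok;
}
```

Each "after" form is shorter and leaves less room for mistakes such as passing a different size to the allocation and to memset.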
25. Future work
- Are semantic patches and spatch useful
  - Only for Linux device drivers?
  - Only for Linux programmers?
  - Only for collateral evolutions program transformations?
  - Only for program transformations?
- Our thesis: we don't think so.
  - But first, device driver CEs are an important problem!
  - All software evolves. Software libraries are more and more important, and have more and more clients
We may also help that software, those libraries
26. Related work
- Refactoring
  - CatchUp [ICSE05], tool-based, replay refactorings from Eclipse
  - JunGL [ICSE06], language-based, but based on ML, less Linux-programmer friendly
- Program transformation engines
  - Stratego [04]
- C front-ends
  - CIL [CC02]
27. Conclusion
- Collateral Evolution is an important problem, especially in Linux device drivers
- SmPL: a declarative language to specify collateral evolutions
  - Looks like a patch: fits with Linux programmers' habits
  - But takes into account the semantics of C, hence the name Semantic Patches
- A transformation engine to automate collateral evolutions, based on model checking technology.
28. Thank you
- You can download our tool, spatch, at http://www.emn.fr/x-info/coccinelle
- Questions?