Title: Verifying Compilers for Financial Applications
1 Verifying Compilers for Financial Applications
- David Crocker, Escher Technologies Ltd.
2 Why financial applications?
- Lots of money at stake if the software delivers incorrect results
- FSA sets tough audit regulations backed up by very large fines
- Business-critical software development is mostly not outsourced to low-cost countries
3 Characteristics of investment banking software
- Often very complex
- Many hundreds of classes
- Agile techniques required
- Need to value and handle new exotics quickly
- Lots of floating-point maths
- Used to represent money and often time
- Multi-threading is increasingly important
- To make the best use of processor resources
4 Techniques and languages typically used
- Component-based object-oriented development
- Helps to manage the complexity
- Provides agility
- Object-oriented programming languages
- Mainly C++ and C#
- Some Java, little or no UML
- Often integrated with Excel spreadsheets
5 What sorts of verifying compiler might an investment bank use?
- Verifying compilers for C++ and C#
- Much too difficult in practice
- Verifying compilers for annotated C++ and C#
- e.g. MS Research Spec# project
- Verifying compilers for specifications
- e.g. Perfect Developer
6 Verification with annotated programming languages
Examples: ESC/Java, SPARK, Spec#
- Advantages
- Based on a standard programming language
- Easier to introduce into a development process
- Can use on existing code by adding the annotations
- Disadvantages
- Java, C etc. were not designed for verification
- Correctness and completeness must be sacrificed, or the language must be severely subsetted
- Large volume of annotation needed
- Loops and aliasing are big problems
- Hard to express data refinement
7 The problem with loops
- Consider a method that contains a loop
- To verify code that follows the loop, we need to know the state of the system just after the loop
- This means we need to know the loop invariant
- Loop invariants are often very difficult to determine, even for experts!
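To make the term concrete, here is a minimal Python sketch of a loop with its invariant written out. The function name `sum_to` is invented for illustration, and the `assert` statements only check the invariant at run time; a verifying compiler would have to prove it for all inputs.

```python
def sum_to(n: int) -> int:
    """Return 0 + 1 + ... + n for n >= 0."""
    assert n >= 0                       # precondition
    total = 0
    i = 0
    while i < n:
        # Loop invariant: total == 0 + 1 + ... + i, and 0 <= i <= n
        assert total == i * (i + 1) // 2 and 0 <= i <= n
        i += 1
        total += i
    # After the loop, the invariant plus the failed guard (i >= n) give i == n,
    # which is exactly what is needed to establish the postcondition.
    assert total == n * (n + 1) // 2    # postcondition
    return total
```

Even in this tiny case the invariant has to mention both the running total and the bounds on `i`; for realistic loops, finding a formula that is strong enough but still provable is the hard part.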
8 The problem with aliasing
- Changing the value of one variable also changes the values of other variables it is aliased to
- Most O-O languages use reference semantics by default
- Inheritance, polymorphism and dynamic binding greatly increase the possibility of aliasing
- The number of anti-aliasing annotations potentially needed becomes HUGE!
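The effect can be seen in a few lines of Python, which also uses reference semantics for objects. This is only an illustrative sketch: the `Account` class and `add_bonus` function are made up, and balances are held as integer pence to keep the arithmetic exact.

```python
import copy

class Account:
    def __init__(self, balance: int) -> None:
        self.balance = balance          # integer pence

def add_bonus(target: Account, reference: Account) -> None:
    """Intended effect: add 10% of reference's balance to target, twice."""
    target.balance += reference.balance // 10
    target.balance += reference.balance // 10   # reads reference again

# Distinct objects: the function does what its description says.
a, b = Account(1000), Account(1000)
add_bonus(a, b)
distinct_result = a.balance             # 1000 + 100 + 100

# Aliased arguments: the first update changes what the second one reads.
c = Account(1000)
add_bonus(c, c)
aliased_result = c.balance              # 1000 + 100 + 110

# Value semantics, emulated with an explicit copy, removes the problem.
d = Account(1000)
add_bonus(d, copy.copy(d))
value_result = d.balance                # 1000 + 100 + 100

print(distinct_result, aliased_result, value_result)   # 1200 1210 1200
```

A verifier for a reference-semantics language has to either prove that `target` and `reference` are never the same object or be told so by an annotation, for every such pair of parameters.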
9 Why are we still using people to write programs?
- Our view: traditional programming is obsolete!
- C++, Ada, C# etc. should play the same role today that assembly languages did 20 years ago
- Most code should be generated from specifications
- Manual refinement is sometimes needed
- Where a more efficient data representation is needed
- Where code generated directly from specifications is currently not efficient enough
10 Specify-Refine-Generate approach
Examples: B-method, Perfect Developer
- Specify the system at multiple levels
- Behavioural and state-based specifications
- Verify the specifications for consistency and completeness
- Refine the specifications
- Manually (within the same notation) and automatically
- Verify that the refinements are correct
- Generate code in a standard programming language
- C++, Java, C#, Ada
11 Advantages
- The notation is designed for verification
- e.g. value semantics by default, polymorphism only on demand
- Automated verification is much more tractable
- The user is largely spared from writing loop invariants
- 92% of all loops are generated from specifications
- The user is encouraged to write the specification first
- The most expensive errors are detected earlier
12 Adapting Perfect Developer for investment banking
- Floating point model had to be relaxed
- e.g. we now assume (a + b) + c = a + (b + c) for type real
- Multiple inheritance of interfaces was added
- Component-based systems are typically built around interfaces
- C# code generation will be needed
- Currently we generate C++ (subset), Java, partial Ada
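Why associativity counts as a relaxation: it holds for real numbers but not for machine floating point. A minimal Python sketch, assuming IEEE 754 double precision (which Python's `float` uses on common platforms); the sample values are chosen only to make the effect obvious.

```python
# Associativity of addition fails for IEEE 754 doubles.
a, b, c = 1e20, -1e20, 1.0

left = (a + b) + c     # (1e20 - 1e20) + 1.0
right = a + (b + c)    # the 1.0 is lost when added to -1e20, leaving 1e20 - 1e20

print(left, right)     # 1.0 0.0
```

Proofs carried out under the relaxed model therefore describe the idealised arithmetic, not the exact bit patterns the generated code will produce.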
13 Example: valuing European options
- Vanilla European call and put options can be valued using the Black-Scholes formula
- Based on stochastic calculus
- There is an expected relation between the values of a call option and the corresponding put option
- A no-arbitrage argument predicts call-put parity
- We wish to verify
- For any input that obeys the declared preconditions, the valuator shall not crash
- The valuator obeys call-put parity
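For reference, the parity property can be written out in the standard textbook notation. The symbols are assumptions, not taken from the slides: S is the asset price, K the strike, r the interest rate, σ the volatility, τ the time to maturity, N the cumulative standard normal distribution, and C and P the call and put values. The formulas below are the usual Black-Scholes ones for an asset paying no dividends.

```latex
\begin{align*}
d_1 &= \frac{\ln(S/K) + (r + \sigma^2/2)\,\tau}{\sigma\sqrt{\tau}},
  \qquad d_2 = d_1 - \sigma\sqrt{\tau} \\
C &= S\,N(d_1) - K e^{-r\tau}\,N(d_2) \\
P &= K e^{-r\tau}\,N(-d_2) - S\,N(-d_1) \\
C - P &= S\,[N(d_1) + N(-d_1)] - K e^{-r\tau}\,[N(d_2) + N(-d_2)]
       = S - K e^{-r\tau}
\end{align*}
```

The last step uses N(x) + N(-x) = 1, so parity is a purely algebraic consequence of the two formulas, which is what makes it a suitable property to hand to an automated prover.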
15 First attempt at verification
16 Unproven output file
17 Correcting the Specification
  function putValue(today: Date, assetPrice: Money, rate: real, vol: Volatility): Money
    pre today < maturity, assetPrice > 0.0
    ^= ( let ttm ^= maturity - today;
         let vSqrtT ^= vol * sqrt(ttm);
         let d1 ^= (log(assetPrice/strike) + (rate + vol*vol/2.0) * ttm)/vSqrtT;
         let d2 ^= d1 - vSqrtT;
         strike * exp(-rate * ttm) * cunorm(-d2) + assetPrice * cunorm(-d1)
       );
Callouts on the slide:
- Insufficient precondition (on the pre clause)
- Incorrect sign (on the final expression: the Black-Scholes put value subtracts the assetPrice * cunorm(-d1) term)
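The effect of a sign error on call-put parity can be checked numerically. The following Python sketch is not the Perfect specification or its generated code: it re-implements the standard Black-Scholes formulas with names borrowed from the slide (`cunorm`, `d1`, `d2`, `ttm`), the `sign` parameter and the sample inputs are invented for illustration, and the `assert` on the inputs is an assumption about which preconditions the formula needs, not a record of what the slide found missing.

```python
from math import erf, exp, log, sqrt

def cunorm(x: float) -> float:
    """Cumulative standard normal distribution, via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def d1_d2(asset_price, strike, rate, vol, ttm):
    # Assumed preconditions: log() and the division need all of these positive.
    assert asset_price > 0.0 and strike > 0.0 and vol > 0.0 and ttm > 0.0
    v_sqrt_t = vol * sqrt(ttm)
    d1 = (log(asset_price / strike) + (rate + vol * vol / 2.0) * ttm) / v_sqrt_t
    return d1, d1 - v_sqrt_t

def call_value(asset_price, strike, rate, vol, ttm):
    d1, d2 = d1_d2(asset_price, strike, rate, vol, ttm)
    return asset_price * cunorm(d1) - strike * exp(-rate * ttm) * cunorm(d2)

def put_value(asset_price, strike, rate, vol, ttm, sign=-1.0):
    # sign=-1.0 gives the standard formula; sign=+1.0 injects a sign error.
    d1, d2 = d1_d2(asset_price, strike, rate, vol, ttm)
    return (strike * exp(-rate * ttm) * cunorm(-d2)
            + sign * asset_price * cunorm(-d1))

def parity_gap(asset_price, strike, rate, vol, ttm, sign=-1.0):
    """(call - put) - (asset price - discounted strike); zero if parity holds."""
    call = call_value(asset_price, strike, rate, vol, ttm)
    put = put_value(asset_price, strike, rate, vol, ttm, sign)
    return (call - put) - (asset_price - strike * exp(-rate * ttm))

args = (100.0, 95.0, 0.05, 0.2, 0.5)     # made-up sample inputs
gap_ok = parity_gap(*args)               # rounding error only
gap_bad = parity_gap(*args, sign=+1.0)   # far from zero

print(abs(gap_ok) < 1e-9, abs(gap_bad) > 1.0)   # True True
```

A test like this only samples one input; the point of the verification described in the slides is to establish parity for every input that satisfies the preconditions.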
18 Verifying the corrected specification
19 Extract from the generated proofs
20 Does Perfect Developer meet the challenges of financial software?
- Complexity
- The complexity is less than that of PD itself, so it is not a problem
- We now have a 64-bit version of PD to handle larger proofs
- Floating point maths
- Using the relaxed FP model we can prove some useful properties
- We cannot guarantee absence of overflow/underflow or the accuracy of the result
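The three things that are not guaranteed are easy to trigger in practice. A short Python sketch, assuming IEEE 754 double precision; the particular constants are chosen only because their behaviour is well known.

```python
import math

overflow = 1e308 * 10.0         # exceeds the largest double: result is inf
underflow = 5e-324 / 2.0        # below the smallest subnormal: result is 0.0
exact = (0.1 + 0.2 == 0.3)      # False: the operands and the sum are rounded

print(math.isinf(overflow), underflow == 0.0, exact)   # True True False
```

None of these raises an error in Python; the computation simply continues with a value that no longer matches the real-number model used in the proofs.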
21 What about multi-threading?
- Processor clock speeds have reached a plateau
- But transistor counts are still increasing
- Processor development is now centred on multiple cores
- Future applications must be multithreaded to take advantage of increasing processor power
- We need to handle two sorts of concurrency
- Distributed systems (use CSP model checking)
- Thread-level concurrency with shared variables
22 How should we handle thread-level concurrency?
- The traditional approach is to use locks
- Implemented in some languages as synchronised methods/objects
- But programmers frequently use them incorrectly
- The compiler should manage access to shared variables
- But automating the creation and use of locks is not very satisfactory because
- Locks do not compose
- If we compose components that use locks, we can get deadlock, priority inversion and other problems
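A minimal Python sketch of why locks do not compose: two components that are each correct on their own take the same two locks in opposite orders. All names are invented for illustration. Barriers force the unlucky interleaving, and a timeout is used so that the demonstration reports the deadlock instead of hanging.

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
sync = threading.Barrier(2)     # reusable rendezvous point for the two threads
outcomes = {}

def component(name, first, second):
    """Take `first`, then try to take `second`, as a lock-based component might."""
    with first:
        sync.wait()                               # both threads now hold one lock
        got_second = second.acquire(timeout=0.2)  # without a timeout: blocks forever
        outcomes[name] = got_second
        sync.wait()                               # keep holding `first` until both tried
        if got_second:
            second.release()

t1 = threading.Thread(target=component, args=("t1", lock_a, lock_b))
t2 = threading.Thread(target=component, args=("t2", lock_b, lock_a))
t1.start()
t2.start()
t1.join()
t2.join()

print(outcomes["t1"], outcomes["t2"])   # False False
```

Neither thread gets its second lock. Fixing this requires a global lock-ordering convention, which is exactly the kind of knowledge about other components that composition is supposed to make unnecessary.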
23 Transactional Memory to the rescue!
- Relieves the programmer from having to worry so much about access to shared variables
- Avoids deadlock and priority inversion
- Can be implemented in software
- Software transactional memory (STM) has been implemented for a Java compiler
- Even better, implement it in the Java or .NET runtime
- See (e.g.) the paper by Simon Peyton Jones et al. for more details
- http://research.microsoft.com/Users/simonpj/papers/stm/stm.pdf
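To give a feel for the idea, here is a toy optimistic STM in Python. It is a sketch under simplifying assumptions and is not the design from the paper cited above or any production implementation: every name (`TVar`, `Transaction`, `atomically`, `transfer`) is invented here, reads and commits share one short-lived internal lock, and a transaction that loses a race is simply re-run.

```python
import threading

class TVar:
    """A shared variable that is only accessed inside transactions."""
    def __init__(self, value):
        self.value = value
        self.version = 0

_commit_lock = threading.Lock()   # internal to the STM; user code never sees it

class Transaction:
    def __init__(self):
        self.reads = {}    # TVar -> version first seen
        self.writes = {}   # TVar -> buffered new value

    def read(self, tvar):
        if tvar in self.writes:
            return self.writes[tvar]
        with _commit_lock:
            value, version = tvar.value, tvar.version
        self.reads.setdefault(tvar, version)
        return value

    def write(self, tvar, value):
        self.writes[tvar] = value      # not visible to others until commit

def atomically(action):
    """Run action(tx) repeatedly until it commits without a conflict."""
    while True:
        tx = Transaction()
        result = action(tx)
        with _commit_lock:
            if all(tvar.version == seen for tvar, seen in tx.reads.items()):
                for tvar, value in tx.writes.items():
                    tvar.value = value
                    tvar.version += 1
                return result
        # A variable we read was changed by another commit: discard and retry.

def transfer(src, dst, amount):
    def action(tx):
        tx.write(src, tx.read(src) - amount)
        tx.write(dst, tx.read(dst) + amount)
    atomically(action)

a, b = TVar(1000), TVar(1000)

def worker(src, dst):
    for _ in range(200):
        transfer(src, dst, 1)

threads = [threading.Thread(target=worker, args=(a, b)),
           threading.Thread(target=worker, args=(b, a))]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(a.value, b.value)   # 1000 1000
```

The two workers touch the accounts in opposite orders, the pattern that deadlocks with naive locking, yet `transfer` contains no locks and can be combined with other transactional code inside a larger `action`.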
24 Further work needed
- Formalising practical floating point maths
- In conjunction with experts in numerical algorithms
- Concurrency
- Do CSP and TM between them cover all our needs?
- Reaching 100% automated verification most of the time
- Inductive proofs are occasionally needed
- Handling proof failures
- Provide even better suggestions to the user
25 Conclusion
- Verifying compilers for specifications of complex single-threaded applications already exist
- 95% to 100% automated proof is currently achieved
- Extending this to cover shared-memory multi-threaded applications should be possible within 5 years
- Provided that transactional memory lives up to expectations
- More research is needed before we can fully verify systems using floating point arithmetic
26 "Software inevitably contains bugs"
- let's dispel that myth!
27 Appendix: What others think of Perfect Developer
"PD is the only tool of the four that comes close to the ideal of automatic and easy program verification."
Ingo Feinerer, MSc thesis, Technischen Universität Wien
"In comparison with other tools, PD offers a software oriented approach to refinement rather than a brutally mathematical one. PD supports specification and implementation in a relatively simple language, so its learning curve is quite gentle for practicing software engineers."
Gareth Carter, Software Engineering and Formal Methods 2005
28 Appendix: Some application statistics
1 Including comments
2 No comments; runtime checks not generated