Title: Developing Reliable Complex Software Systems in a Research Environment
1 Developing Reliable Complex Software Systems in a Research Environment
- Christopher Mueller and Andrew Lumsdaine
- Open Systems Laboratory
- Indiana University
2 Research Software Projects
- Software is an essential research tool
- Many projects use custom software
  - Data gathering and processing
  - Simulation
  - Analysis and visualization
  - Algorithm/protocol development
  - Glue for 3rd party applications
- Many research areas are developing common application frameworks
- Software is often developed by a combination of grad students, undergrads, PIs, collaborators, and consultants using little or no process.
3 Evolution of an Application
- First application written in Fortran (F77)
  - A Model for Baconian Dynamics
- Tom ports to C
  - Adds command line parameters, makefile
  - An Application Framework for Baconian Dynamics
- Jenny ports to F90
  - Extends model
  - An Extended Model of Baconian Dynamics
- Brad ports to C++
  - Models system using objects
  - An Object Oriented System for Dynamical Baconian Systems
- Jeremy (consultant) rewrites existing versions as a C++ version
  - Advanced template and object patterns lead to fast and extensible code that is indecipherable by scientists
- Maria implements model in Java for the Grid
  - Implements original model
  - A Scalable, Grid Enabled Toolkit for Baconian Systems
- Baconian Dynamics predicts summer blockbusters
- Everyone wants a copy of the software
Baconian Dynamics predicts the success of a movie using models based on the cast's Bacon numbers, i.e., how many degrees of separation lie between the actors and a movie starring Kevin Bacon.
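The Bacon number mentioned above is a shortest-path distance in a graph of co-stars, so it can be computed with a breadth-first search. The following is a minimal sketch under stated assumptions: the function name `bacon_numbers`, the `casts` dictionary layout, and the example data are all hypothetical, and it uses the conventional definition (Kevin Bacon is 0, his co-stars are 1, and so on) rather than anything specified on the slide.

```python
from collections import deque


def bacon_numbers(casts, source="Kevin Bacon"):
    """Return {actor: degrees of separation from `source`} using breadth-first search.

    `casts` maps a movie title to the list of actors in it (hypothetical layout).
    Actors with no chain of co-stars leading to `source` are left out of the result.
    """
    # Build an actor -> set of co-stars map from the cast lists.
    costars = {}
    for actors in casts.values():
        for actor in actors:
            costars.setdefault(actor, set()).update(a for a in actors if a != actor)

    distances = {source: 0}
    queue = deque([source])
    while queue:
        actor = queue.popleft()
        for other in costars.get(actor, ()):
            if other not in distances:
                distances[other] = distances[actor] + 1
                queue.append(other)
    return distances


# Invented example data, not real filmographies.
EXAMPLE_CASTS = {
    "Movie A": ["Kevin Bacon", "Actor 1"],
    "Movie B": ["Actor 1", "Actor 2"],
    "Movie C": ["Actor 3"],
}
```

With the invented data, "Actor 1" is one step from Kevin Bacon, "Actor 2" is two steps, and "Actor 3" has no Bacon number because no chain of shared movies connects them.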
4 A Closer Look
- There were 6 major versions
- 13 actual implementations
- 5 languages
- 2 major versions advanced the science
- 4 major versions were simply software projects
- All versions re-implemented basic features
- The implementations used for the papers were not always used for the next major version
[Figure: version tree of the 13 implementations, marking the version used for each paper and the versions that advanced the science]
5 Research Software Crisis!
Problem: Research software applications are difficult to develop and are costing researchers time and money.
Solution: Separate research and development, and use a development model derived from industrial software development.
6 Software Development
Software development is an iterative process that consists of three main phases: Business Modeling, Application Development, and Maintenance.
[Figure: development cycle: Business Modeling, Requirements, Design, Implement, Test, Deploy, Maintain]
7 Business Modeling
Goal: Understand the main roles and procedures used in a research program
- Identify Roles
  - Researcher
  - Support staff
  - Developer
  - PI
- Identify projects
- Identify common workflows
  - Data processing pipelines
  - Experimental protocols
- Identify commonly used data
  - Instrument data
  - Reference data collections
  - Parameter files
- Identify physical resources
  - Instruments
  - Reagents
- Identify computational resources
  - Commercial software tools (e.g., Excel, Spotfire, ChemDraw)
  - In-house software
8 Requirements and Design
Goal: Understand and agree upon the main features for the application and each iteration
- Requirements will change as the project evolves based on user feedback
- Initial requirements should include only features that are needed by users, not features that might be needed in the future
- The design should be fairly coarse grained, but identify all the major components
- Components that use unfamiliar technology should be prototyped
9 Implementation and Testing
Goal: Implement the current iteration's features
- This is where code is written
- Unit tests are fine-grained tests that cover one or two low-level features
- As the code is written, it is versioned. This makes it possible to revert to older versions.
- For in-house software, testing is generally performed by the user and developer
- Short iterations and direct contact between developers and users facilitate bug fixes
- For scientific software, testing must include validation, that is, confirming that the code generates correct results
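The difference between unit tests and validation can be illustrated with a small sketch. The `trapezoid` integration routine below is a hypothetical stand-in for a piece of research code, not something from the talk. The unit tests each check one low-level feature, while the validation test compares the output against a result known from theory.

```python
import math
import unittest


def trapezoid(f, a, b, n):
    """Approximate the integral of f over [a, b] using n trapezoids."""
    if n < 1:
        raise ValueError("n must be at least 1")
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


class TrapezoidUnitTests(unittest.TestCase):
    """Fine-grained tests: each covers one low-level feature."""

    def test_rejects_zero_intervals(self):
        with self.assertRaises(ValueError):
            trapezoid(math.sin, 0.0, 1.0, 0)

    def test_exact_for_linear_function(self):
        # The trapezoid rule is exact for straight lines.
        self.assertAlmostEqual(trapezoid(lambda x: 2 * x, 0.0, 3.0, 4), 9.0)


class TrapezoidValidation(unittest.TestCase):
    """Validation: compare against a result known from theory."""

    def test_matches_analytic_integral_of_sine(self):
        # The integral of sin(x) over [0, pi] is exactly 2.
        result = trapezoid(math.sin, 0.0, math.pi, 1000)
        self.assertAlmostEqual(result, 2.0, places=5)
```

Saved to a file, these tests can be run with `python -m unittest`. Keeping them under version control alongside the code means every revision can be re-checked the same way.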
10 Deployment and Maintenance
Goal: Deliver the application to the users and continue to support it
- Deployment consists of two steps
  - Staging
    - Application is installed in a production sandbox
    - Users test application
  - Deployment
    - Application is installed on the users' machines
- After deployment, the development process is repeated until the application is complete and enters maintenance mode
  - "Complete" is agreed upon by the developers and users
  - No application is ever really complete, which leads us to...
- Maintenance accounts for roughly 60% of software costs (time and money)
  - This is good! It means the application is being used and improved
11 Software Tools
- Diagram software (Visio, etc.), spreadsheets, word processors
- Rapid prototyping tools (VB, Python)
- Interpreted languages (Java, VB, Python)
- Libraries/components (numerical, plotting, instrument communication)
- Compiled languages (C/C++/Fortran)
- Integrated Development Environments (IDEs)
- Debuggers
- Bug/feature tracking system
- Packaging systems
- Automated build system (nightly)
[Figure: the tools above shown alongside the development cycle, from Business Modeling through Maintain]
12 Roles
(bold roles are essential)
- End User
  - Anyone who uses the software
- Project Manager
  - Coordinates development efforts, resolves conflicts, ensures project is moving along
  - Note: This is the hardest job to fill
- Lead Developer (Architect, Sr. Software Engineer)
  - Experienced member of the team, understands technologies and is able to advise other developers
  - Same responsibilities as developer
- Developer
  - Responsible for all aspects of a portion of the application (requirements, design, implementation, testing)
- Web Developer
  - Similar to a developer, but with a skill set targeted at designing and implementing Web sites and applications
- Database Administrator (DBA)
  - Maintains and optimizes the database and helps developers design database applications
- Technical Writer
  - Develops tutorials and user manuals
- Quality Assurance
  - On projects that are released to a wide audience, a separate QA team is responsible for testing
- System Administrator
13 Keys to Success
- Process is necessary but not sufficient
- Developer/User interaction
  - The more levels of communication required, the higher the chance that requirements will be miscommunicated
- Neutral management
  - The project manager's role is to keep things moving smoothly without getting in the way
- Small, incremental deliverables
  - This ensures the application evolves based on users' needs and that requirements have a chance to be adjusted
- Implement what's needed, not what might be needed
  - This keeps developers and users focused on the current problems
- Put experienced developers in lead roles
  - You would never make an undergraduate a lead scientist
- Mutual respect
  - The hierarchy and reward systems for software and science are different.
  - Scientists should treat developers as colleagues, not as servants
  - Developers should respect the ideals and institutions of science
  - Developers should be willing to understand the scientific field they are supporting
14 Benefits
- Software Quality is improved
  - Applications are not single-user prototypes
  - Features are available to all researchers
  - Developers are not distracted by classes, papers, etc.
- Research Process is improved
  - Researchers can focus on research
  - Development is not a bottleneck
- Reproducibility and Traceability
  - Reproduce old experiments, trace the data/process that led to a result
- Easier to integrate new/visiting researchers
- Tools can be shared with a larger community
- High-end software becomes possible
  - Parallel and high-performance implementations
  - Well-designed user interfaces
  - Visualization
  - Databases
  - Data mining
  - Web applications/services
15 Implementing a Software Process
- Step 1
  - Train research staff about basic software processes
  - Incorporate basic tools into the research environment
    - Version control
    - Unit tests/validation
    - Bug/feature tracking
    - Standard locations for deployed applications and data
  - Assign development roles to research staff
  - Make sure to separate research work and development work
- Step 2
  - Build a full-time development staff as the projects grow
  - Initial staff should include a lead developer and a project manager
  - Use project manager to coordinate research projects, too
  - A full-time developer also helps track institutional knowledge as students come and go
  - Additional staff can be added on a consulting, part-time, or full-time basis as needed
- Step 3
  - Get back to doing what you love: science!
16 Costs and Funding
- Good software is not cheap
- Personnel costs
  - Lead developer: $70-100k
    - Expect $80k to keep a good developer around
  - Developer: $40-100k, same as above (contract: $30-200/hr)
  - Project manager: $70-96k
  - System administrator: $50-70k
  - Database administrator: $70-110k
  - Note that TCO is 1.5-2.5x base salary
- Funding
  - Share resources with collaborators, department
  - Take advantage of university support services
    - Systems, HPC, visualization, consulting
    - Classes! (e.g., Software Carpentry)
  - Write development costs and infrastructure directly into grants
  - Look for software infrastructure grants
  - Lobby!
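The TCO note on this slide turns into simple arithmetic when budgeting a grant. The sketch below is illustrative only: the function names are hypothetical, the salary ranges are copied from the slide and read as thousands per year, and the 1.5x and 2.5x multipliers are applied to the low and high ends of each range respectively.

```python
def tco_range(base_low, base_high, low_factor=1.5, high_factor=2.5):
    """Return (min, max) total cost for a base salary range, in the same units."""
    return base_low * low_factor, base_high * high_factor


# Base salary ranges in thousands per year, copied from the slide.
SALARIES_K = {
    "Lead developer": (70, 100),
    "Developer": (40, 100),
    "Project manager": (70, 96),
    "System administrator": (50, 70),
    "Database administrator": (70, 110),
}


def staffing_cost(roles):
    """Sum the TCO ranges over a list of role names from SALARIES_K."""
    low = sum(tco_range(*SALARIES_K[role])[0] for role in roles)
    high = sum(tco_range(*SALARIES_K[role])[1] for role in roles)
    return low, high
```

For example, `staffing_cost(["Lead developer", "Project manager"])` returns `(210.0, 490.0)`, so a lead developer plus a project manager would come to roughly 210k to 490k per year under these assumptions.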
17 Conclusions
- Developing software is a complex process
  - Training can help understand and manage complexity
- Separating research and development can help improve the quality of research software
  - Existing staff can do this to some extent, but outside help is needed as projects expand
- The funding climate needs to change to fully support this
  - Software should be considered essential research equipment, on par with microscopes, mass spectrometers, and supercomputers
18 References
- The Mythical Man-Month, Frederick P. Brooks, Jr.
- Peopleware: Productive Projects and Teams, Tom DeMarco and Timothy Lister
- Software Project Survival Guide, Steve McConnell
- Facts and Fallacies of Software Engineering, Robert L. Glass
19 Questions?