Title: Lessons Learned from Managing and Deploying ESMF
Cecelia DeLuca / NCAR Portfolio / Institute
Create Meeting April 23-25, 2008
Outline
- About ESMF
- Best Practices, On and Off List
- Governance
- Personal History
- Conclusion
About ESMF
- ESMF provides component wrappers with standard interfaces, a set of data structures for data exchanges, and customizable drivers.
- ESMF provides common utilities, such as data communications, regridding, time management, configuration, and message logging.
- Goals are component interoperability and software reuse.
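A "customizable driver" combined with the time-management utility can be pictured as a loop. The loop advances a clock and calls each component's run method once per step. The sketch below is a hypothetical Python illustration of that idea and is not ESMF's API. `Clock`, `make_component`, and `run_driver` are names invented here.

```python
from datetime import datetime, timedelta


class Clock:
    """Hypothetical model clock: current time, stop time, and a fixed time step."""

    def __init__(self, start, stop, step):
        self.current = start
        self.stop = stop
        self.step = step

    def is_done(self):
        return self.current >= self.stop

    def advance(self):
        self.current += self.step


def make_component(name, log):
    """Return a toy run method that records which component ran at which hour."""
    def run(model_time):
        log.append((name, model_time.hour))
    return run


def run_driver(components, clock):
    """Hypothetical driver: call each component's run once per step, then advance."""
    steps = 0
    while not clock.is_done():
        for run in components:
            run(clock.current)
        clock.advance()
        steps += 1
    return steps


log = []
clock = Clock(datetime(2008, 4, 23, 0), datetime(2008, 4, 23, 6), timedelta(hours=3))
steps = run_driver([make_component("atm", log), make_component("ocn", log)], clock)
# steps == 2; log == [("atm", 0), ("ocn", 0), ("atm", 3), ("ocn", 3)]
```

Because the driver only ever sees a clock and a list of run methods, it can be customized without changes to the components themselves, for example by reordering calls or changing the time step.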
Project History
- First phase started in 2002 with NASA funding support for framework development and framework integration into applications
- Second phase began in 2005 with a transition to multi-agency support and management: NASA, NOAA, NSF, and DoD sponsors
- ESMF component interfaces are the basis of programs at multiple agencies:
  - DoD Battlespace Environments Institute (BEI)
  - NOAA National Environmental Modeling System (NEMS)
  - NASA Modeling Analysis and Prediction Program for Climate Variability and Change (MAP)
  - Community Sediment Transport Model (ONR)
- In operational use since August 2006, initially at the National Weather Service
ESMF Earth Science Component Models
ESMF Application Example: GEOS-5 Atmospheric General Circulation Model
- Each box is an ESMF component
- Every component has a standard interface to facilitate exchanges:
    call ESMF_CompRun (myComp, importState, exportState, clock, phase, blockingFlag, rc)
- Hierarchical architecture enables the systematic assembly of many different systems
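The hierarchy works because every node in the tree shares a single call signature. As a result, a parent can be treated exactly like a leaf. The Python sketch below illustrates that composition idea under stated assumptions. The `Component` class, `tracer` helper, and component names are hypothetical and are not GEOS-5's actual component list.

```python
class Component:
    """Hypothetical component node: every node has the same run(import, export) call."""

    def __init__(self, name, children=(), work=None):
        self.name = name
        self.children = list(children)
        self.work = work  # leaf computation; None for a pure container

    def run(self, import_state, export_state):
        # Do this node's own work (if any), then pass the same call to each child.
        if self.work is not None:
            self.work(import_state, export_state)
        for child in self.children:
            child.run(import_state, export_state)


def tracer(name):
    """Toy leaf work that records the order in which leaves ran."""
    def work(import_state, export_state):
        export_state.setdefault("trace", []).append(name)
    return work


# Illustrative tree with invented names.
physics = Component("physics", [
    Component("radiation", work=tracer("radiation")),
    Component("moisture", work=tracer("moisture")),
])
atmosphere = Component("atmosphere", [Component("dynamics", work=tracer("dynamics")), physics])

export_state = {}
atmosphere.run({}, export_state)
# export_state["trace"] == ["dynamics", "radiation", "moisture"]
```

A new subsystem can be added by attaching it as a child. The parent code does not change, which is what makes systematic assembly possible.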
ESMF Package Description, the Basics
- Component-based architecture, data and task parallel
- Components can run concurrently, sequentially, or in mixed mode
- Serial or parallel
- Single or multiple executables, or combinations
- Shared or distributed memory, or hybrid
- Support for model ensembles, including execution of multiple instances in the same address space
- 400,000 SLOC, mostly written in C++, wrapped in Fortran 90
- 2,000 unit tests, system tests, and examples, regression tested nightly on 26 platform/compiler combinations
- Reference Manual, User's Guide, and the examples therein updated automatically with code changes
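Running several ensemble members in one address space is only possible when a component keeps its state in the instance rather than in global variables (for example, Fortran module variables). Otherwise, the members would overwrite each other's state. The toy Python model below illustrates this point. It is hypothetical, and for simplicity it runs its members sequentially rather than concurrently.

```python
class DecayModel:
    """Hypothetical toy model. All state lives on the instance; nothing is global."""

    def __init__(self, initial_value, rate):
        self.value = initial_value
        self.rate = rate

    def run(self, n_steps):
        for _ in range(n_steps):
            self.value *= 1.0 - self.rate
        return self.value


# A three-member ensemble with perturbed rates, all in one process (one address space).
members = [DecayModel(100.0, rate) for rate in (0.1, 0.2, 0.3)]
results = [member.run(2) for member in members]
# results are approximately [81.0, 64.0, 49.0]
```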
ESMF Package Description, Coupling
- ESMF data structures are used to wrap (copy or reference) user data and transfer/transform it between components
- Data transformations can be executed within a coupler component, or arranged in a coupler component and executed within model components
- Coupling can be done in index space
  - General multi-dimensional distributed arrays
  - Sparse matrix multiply for regridding, user-defined weights
- Coupling can be done in physical space
  - Fields combine grids, arrays, and metadata
  - Field regrid operation for regridding, ESMF-generated weights
- Grids supported are logically rectangular; 3D finite element mesh coming
- Performance requirements: < 5% overhead in time to solution vs. customized native approaches; highly scalable in performance and memory
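Index-space regridding by sparse matrix multiply amounts to computing dst[i] = Σ_j W[i,j]·src[j] with a weight matrix that is mostly zeros. The serial Python sketch below shows that core operation and the "arrange once, execute many times" split described above. `sparse_matmul` and the weights are illustrative assumptions. The sketch does not reproduce the distributed `ESMF_ArraySparseMatMul()` named in the release path or its signature.

```python
def sparse_matmul(weights, src, n_dst):
    """Apply a sparse weight matrix stored as (dst_index, src_index, weight) triples:
    dst[i] = sum of w * src[j] over all triples (i, j, w)."""
    dst = [0.0] * n_dst
    for i, j, w in weights:
        dst[i] += w * src[j]
    return dst


# "Arrange" once: user-supplied weights that average pairs of fine cells onto a coarse grid.
weights = [(0, 0, 0.5), (0, 1, 0.5), (1, 2, 0.5), (1, 3, 0.5)]

# "Execute" repeatedly: the same weights are reused at every coupling step.
step1 = sparse_matmul(weights, [1.0, 3.0, 5.0, 7.0], 2)  # [2.0, 6.0]
step2 = sparse_matmul(weights, [2.0, 2.0, 4.0, 0.0], 2)  # [2.0, 2.0]
```

Storing only the nonzero entries keeps memory proportional to the number of source-destination overlaps rather than to the product of the two grid sizes.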
Standardization of Component APIs
- Only three ESMF component methods: Initialize, Run, and Finalize (I/R/F)
- Users create a component by assigning their user code I/R/F methods to an ESMF component type
- The ESMF component calls down into the specific user-assigned methods
- I/R/F methods cascade down the tree
- Small set of standard arguments:
    call ESMF_CompRun (myComp, importState, exportState, clock, phase, blockingFlag, rc)
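In the "assign user methods, framework calls down" pattern, a generic component type holds three slots, and user routines are plugged into those slots. The Python sketch below illustrates this pattern. `GenericComponent`, `set_method`, and the ocean routines are hypothetical names. The sketch is not ESMF's registration API and does not model phases, blocking flags, or return codes.

```python
class GenericComponent:
    """Hypothetical framework component type with exactly three entry points."""

    PHASES = ("initialize", "run", "finalize")

    def __init__(self, name):
        self.name = name
        self._methods = {}

    def set_method(self, phase, user_fn):
        """Assign a user routine to one of the three standard phases."""
        if phase not in self.PHASES:
            raise ValueError(f"unknown phase: {phase!r}")
        self._methods[phase] = user_fn

    def _call_down(self, phase, import_state, export_state, clock):
        # The framework does not know what the user code does; it only calls it
        # with the same small set of standard arguments.
        if phase not in self._methods:
            raise RuntimeError(f"{self.name}: no {phase} method registered")
        self._methods[phase](import_state, export_state, clock)

    def initialize(self, import_state, export_state, clock):
        self._call_down("initialize", import_state, export_state, clock)

    def run(self, import_state, export_state, clock):
        self._call_down("run", import_state, export_state, clock)

    def finalize(self, import_state, export_state, clock):
        self._call_down("finalize", import_state, export_state, clock)


# Hypothetical user code: three plain functions with the standard argument list.
def ocean_init(imp, exp, clock):
    exp["sst"] = 290.0


def ocean_run(imp, exp, clock):
    exp["sst"] += imp["heat_flux"] * clock["dt"]


def ocean_final(imp, exp, clock):
    exp["finalized"] = True


ocean = GenericComponent("ocean")
for phase, fn in zip(GenericComponent.PHASES, (ocean_init, ocean_run, ocean_final)):
    ocean.set_method(phase, fn)

imp, exp, clock = {"heat_flux": 0.5}, {}, {"dt": 2.0}
ocean.initialize(imp, exp, clock)
ocean.run(imp, exp, clock)
ocean.finalize(imp, exp, clock)
# exp == {"sst": 291.0, "finalized": True}
```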
Standardization of Data Structures
- ESMF State data structures contain all data exchanged between components
- Flexibility in data representation: States can contain lists of varied data structures, including Arrays, ArrayBundles, Fields, FieldBundles, and other States
- The ESMF philosophy is to constrain the number of component methods and their arguments, but retain flexibility in the data structures used in inter-component exchanges
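The pairing of a fixed interface with flexible contents can be sketched as a container that accepts any item, including another container. The Python below is a minimal illustration of this idea under that assumption. `State`, its `add`/`get`/`names` methods, the '/'-path lookup, and the toy values are all invented for this example and are not ESMF's State interface.

```python
class State:
    """Hypothetical State: a named container whose items may be any data structure,
    including other States. Components exchange only States."""

    def __init__(self, name):
        self.name = name
        self._items = {}

    def add(self, item_name, item):
        self._items[item_name] = item

    def get(self, path):
        """Look up an item by a '/'-separated path, descending into nested States."""
        head, _, rest = path.partition("/")
        item = self._items[head]
        if not rest:
            return item
        if not isinstance(item, State):
            raise KeyError(f"{head!r} is not a nested State")
        return item.get(rest)

    def names(self):
        return sorted(self._items)


# Heterogeneous contents (toy values): a plain array, a "field" with metadata, a nested State.
chemistry = State("chemistry")
chemistry.add("ozone", {"data": [0.3, 0.4], "units": "ppmv"})

export_state = State("atm_export")
export_state.add("surface_pressure", [1013.2, 1009.8])
export_state.add("sst", {"data": [290.1, 291.4], "units": "K"})
export_state.add("chemistry", chemistry)
```

Adding a new kind of exchanged data changes only the State's contents. Every component's argument list stays the same.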
ESMF Design Philosophy
- Design and build comprehensive software engines at lower levels
- Build simple user interfaces on top for specific problems
- Expose a general interface for users who need more flexibility
- Tricky issue: it can be difficult to involve customers in early design discussions, because developers are most concerned with implementing generality, while users first want to see simple problems solved
(Figure: "What the developer sees" vs. "What the user sees")
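The layering described above can be sketched with a general creation routine that accepts arbitrary options, plus a thin convenience wrapper that fixes those options for one common case. Grid creation is the example here. `Grid`, `grid_create`, and `uniform_global_grid` are hypothetical Python names. The sketch does not reproduce `ESMF_GridCreate()` from the release path or its arguments.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Grid:
    x: List[float]
    y: List[float]
    periodic_x: bool
    decomposition: Tuple[int, int]


def grid_create(x, y, periodic_x=False, decomposition=(1, 1)):
    """General engine (hypothetical): arbitrary increasing coordinates, periodicity,
    and processor decomposition are all caller-controlled."""
    if any(b <= a for a, b in zip(x, x[1:])) or any(b <= a for a, b in zip(y, y[1:])):
        raise ValueError("coordinates must be strictly increasing")
    return Grid(list(x), list(y), periodic_x, decomposition)


def uniform_global_grid(nx, ny):
    """Simple interface (hypothetical): a regular global lon-lat grid of cell centers,
    with periodicity and decomposition chosen on the user's behalf."""
    dx, dy = 360.0 / nx, 180.0 / ny
    lons = [dx * (i + 0.5) for i in range(nx)]
    lats = [-90.0 + dy * (j + 0.5) for j in range(ny)]
    return grid_create(lons, lats, periodic_x=True)


g = uniform_global_grid(4, 2)
# g.x == [45.0, 135.0, 225.0, 315.0]; g.y == [-45.0, 45.0]
```

Users with simple needs call one short function. Users who need more control drop down to the general routine, and both share the same validated engine.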
ESMF Release Path
(Timeline, 2002 to 2010, building bottom-up)
- ESMF v1 Prototype: full system prototype
- ESMF v2 Components, VM and Utils: ESMF_GridCompRun()
- ESMF v3 Index Space Operations: ESMF_ArraySparseMatMul()
- ESMF v4 Grid Operations: ESMF_GridCreate(), ESMF_FieldRegrid()
- ESMF v5 Standardization: build, init, data types, error handling
- Last public release: ESMF v2.2.2r
- Last internal release: ESMF v3.1.0p1
Team Composition
- Team composition (blue is off-site)
- Manager (agency coordination and technical coordination)
- Operations Manager (website, metrics, space issues, local administration)
- Integrator/Test lead (regression testing, release management)
- Tester for numerical methods
- 7 developers
  - Systems level/porting/build
  - Low level data structures and architecture
  - Structured grids
  - Meshes
  - High level numerical data structures
  - Language interfaces
  - Utilities including calendaring
  - Metadata and attributes
- External and related
  - External: ½ FTE performance testing
  - Related: FTE for component distribution portal
Outline
- About ESMF
- Best Practices, On and Off List
- Governance
- Personal History
- Conclusion
Best Practices: Listening to the Old-Timers
- Rational Unified Process
  - Develop software iteratively
  - Manage requirements
  - Use component-based architectures
  - Visually model software
  - Control changes to software
- Agile Development Checklist
  - Aggressive refactoring
  - Automated, frequent testing
  - Automated, frequent build and deployment
  - Continuous integration
  - Source control
  - Communication plan
  - Task tracking
  - Self-documenting code
  - Peer review
  - Customer view of work in progress
  - Feedback mechanism
- Airlie Council (original set)
  - Formal risk management
  - Agreement on interfaces
  - Formal inspections
  - Metrics-based scheduling and management
  - Binary quality gates at the inch-pebble level: status should be tracked through binary completion of relatively small tasks
  - Program-wide visibility of progress vs. plan
  - Defect tracking against quality targets
  - Configuration management
  - People-aware management accountability
Blue means a practice followed by ESMF.
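With binary quality gates, percent complete becomes a simple count: a small task is either done or not, and partially finished work earns no credit. The Python sketch below computes progress and lag against plan on that basis. The function names, the task dictionary layout, and the task list are hypothetical.

```python
def percent_complete(tasks):
    """Binary gates: a task counts only when it is fully done; no partial credit."""
    if not tasks:
        return 0.0
    return 100.0 * sum(1 for t in tasks if t["done"]) / len(tasks)


def behind_plan(tasks, planned_done_by_now):
    """Number of inch-pebble tasks that should be done by now but are not."""
    return max(0, planned_done_by_now - sum(1 for t in tasks if t["done"]))


# Hypothetical inch-pebbles for one feature; each is small enough to be simply done or not.
tasks = [
    {"name": "write design note", "done": True},
    {"name": "hold design review", "done": True},
    {"name": "implement class", "done": True},
    {"name": "add unit tests", "done": False},
]
progress = percent_complete(tasks)               # 75.0
lag = behind_plan(tasks, planned_done_by_now=4)  # 1
```

Because no one can report a task as "90% done," the numbers rolled up for program-wide visibility are harder to inflate.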
... and Not Listening
- We should have:
  - Visually modeled the software
    - Why not? Lack of tools for automatically generating diagrams from Fortran and language mixtures; maintenance issue when done by hand
    - Would still like to improve this area
- We backed off:
  - Aggressive refactoring
    - Not consistent across code: Fortran developers don't, C developers do
  - Formal risk assessment
    - Top risks to project survival involve adoption and funding
    - In a highly politicized multi-agency environment, better to discuss than to formally document
  - Agreement on interfaces
    - Didn't agree to all interfaces before implementing, but hold design reviews before and after implementation of new classes
Disagreeing With the Old-Timers: Distributed Development Can Work
- "Even with all the cool communication technology in the world, we really can't pull off that kind of feedback in a highly distributed environment. Heck, even the cool technology in StarTrek couldn't pull this off. The technology needed to make us truly agile is face-to-face people." (Steve McConnell)
- Making face-to-face optional:
  - Saves time and money on travel
  - Enables people to accommodate their home lives
  - Creates collaboration infrastructure that supports team work off hours
  - Creates collaboration infrastructure that supports interactions with remote customers
  - Enables hiring from a national pool
How ESMF Does Distributed Development
- GOAL: Everybody on the team has access to all information, current and past
  - Archived email list where all development correspondence gets cc'd
  - Frequent telecons with minutes
  - Web-browsable repository, mail summary on check-ins
  - Daily archived test results
  - Monthly archived metrics
  - Public archived trackers (bugs, feature requests, support requests, etc.)
  - Discouraged IMing, one-to-one correspondence, or calls: the medium matters
- If it's not in the project archive, it doesn't exist.
How ESMF Does Distributed Development, Continued
- Strict Battle Rhythm
  - Regular meeting times, reporting periods, annual project cycle, etc. build confidence
  - Remote members can tell what's going on without constant updates
- Articulated, non-negotiable project values
  - Gives the team an identity
  - Helps filter people quickly who do not fit the team profile
- "Distributed development is not for everyone; good fits are secure, work-focused, smart, communicative, positive." (ESMF team member)
Values
- Community driven development and community ownership
- Openness of project processes, management, code, and information
- Correctness
- Commitment to a globally distributed and diverse development and customer base
- Simplicity
- Efficiency
- Public storage of project records and other information
- Engagement
- Web link for detail: http://www.esmf.ucar.edu/about_us/values.shtml
Staff
- Best advice received: "Spend most management time with best people"
- Attention can fix broken software, but doesn't often fix people
- Tension demoralizes the team and is distracting
- Use term positions, contractors, and redirection to address difficulties where possible
Outline
- About ESMF
- Best Practices, On and Off List
- Governance
- Personal History
- Conclusion
Beyond Best Practices: Governance
- Management of ESMF required governance that recognized social and cultural factors as well as technical factors
- Main objectives of governance:
  - Enabling people to fight and criticize in a civilized, contained, constructive way
  - Enabling people to make decisions based on resource realities
- Observations:
  - Sometimes just getting everyone equally dissatisfied and ready to move on is a victory
  - Thorough, informed criticism is about the most useful input a project can get
Governance Interactions
- Multiple timescales, all staff levels
- Places for structured argument
(Diagram: governance bodies and how they interact)
Executive Management
- Executive Board (annually): strategic direction, organizational changes, board appointments
- Interagency Working Group: stakeholder liaison, programmatic assessment, feedback
- Advisory Board: external projects coordination, general guidance, evaluation
Working Project
- Joint Specification Team (weekly): requirements definition, design and code reviews, external code contributions
- Change Review Board (quarterly): development priorities, release review, approval
- Core Development Team (daily): project management, software development, testing, maintenance, distribution, user support
Flows between bodies: reporting; functionality change requests; resource constraints; implementation schedule; collaborative design; beta testing
Outline
- About ESMF
- Best Practices, On and Off List
- Governance
- Personal History
- Conclusion
Early Challenges
- New manager, new team, and newly at institution introduced inefficiencies in administrative and technical process ("How do you get it done here?", "Who can do what?"). RESOLVED BY: time
- Key low-level data structures (data blocks, grids) were not well designed/implemented. RESOLVED BY: determined unsalvageable and began redesign
- Releases without sufficient testing or documentation. RESOLVED BY: put more resources into testing and documentation, and implemented best practices
Later Challenges
- Some staff lacked necessary expertise to implement new requirements. RESOLVED BY: keeping the best, but turning over the other staff; hard, but brought in a better team
- Historical conflicts, competition for resources, and conflicts of interest. RESOLVED BY: strategic planning and partnerships help, but many issues are out of the project's control
Success Factors
- Though new, I had:
  - domain and technical expertise
  - many contacts in Earth and space modeling
  - experience working with a team at Lincoln Laboratory that was well managed (by Bob Bond) and developed high performance framework software; this was critical as both a technical and process model
- Strong mentorship and support from a number of senior scientists and developers across the community
- Enough resources to plan, design, implement, document, and test
Diligence of Management: The Daily Checklist
- Funds need to come or go?
- Staff tasked and working? Hires?
- Customer issues?
- Does the product need attention? Website, legal, reviews, design decisions?
- Routine administration or meeting planning?
- PR, papers, presentations due?
- Strategic actions or issues?
- In the end, you have to make the project work every day, with every conversation and decision
Outline
- About ESMF
- Best Practices, On and Off List
- Governance
- Personal History
- Conclusion
Conclusion: Recurring Themes
- Information Management
- Discipline and Diligence
- Planning
- Conflict Management
- Honest Evaluation
Extras
What Governance Needs to Achieve
- Prioritize development tasks in a manner acceptable to major stakeholders and the broader community, and define development schedules based on realistic assessments of resource constraints (CRB)
- Deliver a product that meets the needs of critical applications, including adequate and correct functionality, satisfactory performance and memory use, ... (Core)
- Support users via prompt responses to questions, training classes, minimal code changes for adoption, thorough documentation, ... (Core)
- Encourage community participation in design and implementation decisions frequently throughout the development cycle (JST)
- Leverage contributions of software from the community when possible (JST)
- Create frank and constructive mechanisms for feedback (Adv. Board)
- Enable stakeholders to modify the organizational structure as required (Exec. Board)
- Coordinate and communicate at many levels in order to create a knowledgeable and supportive network that includes developers, technical management, institutional management, and program management (IAWG and other bodies)
Facilitating Science: Coupled Climate-Chemistry with ESMF
The image shows results from a version of the GEOS-5 atmospheric general circulation model coupled to a stratospheric chemistry package (STRAT-CHEM), also developed at NASA but independently of GEOS-5, which has now been made ESMF compliant.