Title: GETTING unstuck: working with legacy code and data
1GETTING unstuck: working with legacy code and data
- Cory Foy, http://www.cornetdesign.com
2Goals
- What is Legacy Code?
- How do we change Legacy Code?
- Common patterns for code bases
- Does Legacy Code have to be code, or can it be
something else like a really long bullet on a
PowerPoint slide, or perhaps a database?
- Next Steps
3Legacy Code
- How do you define Legacy Code?
- Several definitions possible
- Code we've gotten from somewhere else
- Code you have to change, but don't understand
- Demoralizing code (Big ball of mud)
- Code without unit tests
4Legacy Code
5Legacy Code
- Code that needs to have behavior preserved
- What is behavior?
- The way in which someone behaves
- The way in which a person, organism, or group
responds to a specific set of conditions
- The way that a machine operates or a substance
reacts under a specific set of conditions
6Legacy Code
- What's the behavior of the following code?
7Legacy Code
- Does the following code add behavior?
8Legacy Code
- Now have we changed the behavior?
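The code shown on slides 6 through 8 is not preserved in this transcript. As a stand-in, here is a hypothetical Python sketch (the `Invoice` classes are invented for illustration and are not from the talk). It assumes the point of the three questions is the distinction between adding behavior and changing it: a new method that nobody calls yet adds behavior while preserving what exists, whereas editing an existing method changes what current callers see.

```python
class Invoice:
    """Hypothetical legacy class, invented for this illustration."""

    def __init__(self, amounts):
        self.amounts = list(amounts)

    def total(self):
        # Existing behavior: callers rely on a plain sum.
        return sum(self.amounts)


class InvoiceWithCount(Invoice):
    """Stands in for 'the same class after adding a method'."""

    def line_count(self):
        # New method: adds behavior, but total() answers exactly as before.
        return len(self.amounts)


class InvoiceRounded(Invoice):
    """Stands in for 'the same class after editing total()'."""

    def total(self):
        # Edited method: existing callers now get a different answer.
        return round(sum(self.amounts))


amounts = [10.25, 5.5]
print(Invoice(amounts).total())           # 15.75
print(InvoiceWithCount(amounts).total())  # 15.75 (behavior preserved)
print(InvoiceRounded(amounts).total())    # 16    (behavior changed)
```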
9How do we change Legacy Code?
- Why would we want to change the code?
- Four reasons to change software
- Adding a feature
- Fixing a bug
- Improving the design
- Optimizing resource usage
- Each has unique attributes
10Adding a feature / Fixing a bug
- Causes the following changes
- Structure
- Functionality (adding or replacing)
- Need to be able to know the new functionality
works
- Need to be able to know that the system as a
whole is still functioning appropriately
11Improving the Design
- Causes the following changes
- Structure
- Note that functionality is not listed above
- Important to be able to know that all
functionality works before and after the change
12Optimizing Resource Usage
- Changes
- Resource usage
- May cause structure change
- Again, note that functionality is ideally not in
the above list
- Need to have a way to make sure functionality was
not changed
- Need to have a way to verify the optimization
goals have been met (and stay met)
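One way to check both needs is to pair an ordinary regression test with a timing test. The sketch below is a minimal illustration, not something from the talk: `find_duplicates`, `time_call`, and the one-second budget are all assumptions chosen for the example.

```python
import time


def time_call(func, *args, repeats=5):
    """Return the best wall-clock time in seconds over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best


def find_duplicates(items):
    """Hypothetical optimized routine: one pass using sets."""
    seen, dupes = set(), set()
    for item in items:
        if item in seen:
            dupes.add(item)
        seen.add(item)
    return sorted(dupes)


def test_functionality_unchanged():
    # Same answer as before the optimization.
    assert find_duplicates([3, 1, 3, 2, 1]) == [1, 3]


def test_meets_time_budget():
    # The budget is an assumed goal; pick one that matches your own target.
    data = list(range(20000)) * 2
    assert time_call(find_duplicates, data) < 1.0


test_functionality_unchanged()
test_meets_time_budget()
```

Keeping the timing test in the regular suite is what makes the goal "stay met": a later change that slows the routine down fails a test instead of surfacing in production.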
13Edit and Pray
- Carefully plan the changes you are going to make
- Make sure you understand the code to be modified
- Make the changes
- Run the system to make sure the change was made
- Do some additional testing to smoke test that
everything seems to be functioning
- Pray you don't get a call at 2am that the system
doesn't work anymore
14Cover and Modify
- Verify that the system is working by running the
tests
- Write tests to expose the behavior you want to
add or change
- Write code to make the test pass
- Refactor duplication
- Wash, rinse, repeat
- Verify the system is still working by running the
tests
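A minimal sketch of one pass through this loop, assuming a hypothetical `shipping_cost` function and an invented "orders of 100 or more ship free" feature (neither is from the talk):

```python
def shipping_cost(weight_kg, order_total=0.0):
    """Hypothetical function under change."""
    # Code added to make the new test pass.
    if order_total >= 100:
        return 0.0
    # Pre-existing behavior, covered by the first test below.
    return 5.0 + 2.0 * weight_kg


def test_existing_behavior_still_works():
    # The cover: must pass both before and after the change.
    assert shipping_cost(3) == 11.0


def test_orders_of_100_or_more_ship_free():
    # Written first; it fails until the new branch above exists.
    assert shipping_cost(3, order_total=120.0) == 0.0


for test in (test_existing_behavior_still_works,
             test_orders_of_100_or_more_ship_free):
    test()
print("all tests pass")
```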
15Feathers' Legacy Change Algorithm
- Michael Feathers discusses a Legacy Code Change
Algorithm in Working Effectively with Legacy Code
- Five steps
- Identify change points
- Find test points
- Break dependencies
- Write tests
- Make changes and refactor
- Each of these steps has common patterns and scenarios
16Patterns for the Change Algorithm
- Identify Change Points
- One of the key areas where architects and
architecture come into play
- If you aren't sure where a change belongs, put it
in; you can refactor later (with unit test support)
17Patterns for the Change Algorithm
- Identify Change Points
- Scenarios
- I don't understand the code well enough to change
it
- Notes / Sketching
- Listing Markup
- Separate Responsibilities
- Understand method structure
- Extract Methods
- Effect Sketch
- Scratch Refactoring
- Delete Unused Code
18Patterns for the Change Algorithm
- Identify Change Points
- Scenarios
- My application has no structure
- Tell the story of the system
- Naked CRC (Class, Responsibility, and
Collaborations)
- Conversation Scrutiny
19Patterns for the Change Algorithm
- Find Test Points
- Where can you write tests to exercise the
behavior you want to add/change?
- Important to have team standards for where unit
tests should go
20Patterns for the Change Algorithm
- Find Test Points
- Scenarios
- I need to make a change. What methods should I
test?
- Reason about effects (Effect Sketch)
- Reasoning Forward (TDD)
- Effect propagation
- Effect reasoning
- Effect analysis
21Patterns for the Change Algorithm
- Find Test Points
- Scenarios
- I need to make many changes in one area. Do I
have to break all dependencies?
- Interception Points
- Higher-Level interception points
- Pinch Points (encapsulation boundary)
- Pinch Point Traps
22Patterns for the Change Algorithm
- Break Dependencies
- Generally the most difficult part of the process
- Usually don't have tests to tell if breaking
dependencies will cause problems
23Patterns for the Change Algorithm
- Break Dependencies
- Scenarios
- How do I know I'm not breaking anything?
- Hyperaware editing
- Single-goal editing
- Preserve Signatures
- Lean on the compiler
- Pair Programming (aka Real-Time Code Reviews)
24Patterns for the Change Algorithm
- Break Dependencies
- Scenarios
- I can't get this class into a test harness
- Irritating Parameters
- Hidden Dependencies
- Construction Blob
- Irritating Global Dependency
- Horrible Include Dependencies
- Onion Parameter
- Aliased Parameter
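As an illustration of the Hidden Dependencies case, one technique Feathers describes is Parameterize Constructor: let the caller pass in the object the constructor used to create for itself. The sketch below is hypothetical (`SmtpMailer`, `OverdueNotifier`, and `FakeMailer` are invented names, and the 30-day rule is an assumption made for the example).

```python
class SmtpMailer:
    """Stand-in for a dependency that is awkward in tests (hypothetical)."""

    def send(self, to, body):
        raise RuntimeError("would contact a real mail server")


class OverdueNotifier:
    def __init__(self, mailer=None):
        # Before: the constructor always did `self.mailer = SmtpMailer()`,
        # hiding the dependency. The default keeps existing callers working.
        self.mailer = mailer if mailer is not None else SmtpMailer()

    def notify(self, customers):
        for name, days_late in customers:
            if days_late > 30:
                self.mailer.send(name, f"{days_late} days overdue")


class FakeMailer:
    """Test double that records calls instead of sending mail."""

    def __init__(self):
        self.sent = []

    def send(self, to, body):
        self.sent.append((to, body))


fake = FakeMailer()
OverdueNotifier(fake).notify([("ann", 45), ("bob", 10)])
print(fake.sent)  # [('ann', '45 days overdue')]
```

Because the new parameter is optional, production code that calls `OverdueNotifier()` with no arguments is untouched, which keeps the change small enough to make without tests already in place.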
25Patterns for the Change Algorithm
- Break Dependencies
- Scenarios
- I can't run this method in a test harness
- Hidden Methods
- Helpful language features
- Undetectable Side Effect
- Sensing variables
- Command/Query Separation
26Patterns for the Change Algorithm
- Break Dependencies
- Scenarios
- I need to change a monster method and can't write
tests
- Introduce sensing variables
- Extract what you know
- Break out a method object
- Skeletonize Methods
- Find Sequences
- Extract to the current class first
- Extract small pieces
- Be prepared to redo extractions
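A minimal sketch of a sensing variable, assuming a hypothetical `PriceImporter` whose long method has a side effect callers cannot observe. The field `cache_was_reset` exists only so a test can see which path ran; it can be removed once the method has been broken into testable pieces.

```python
class PriceImporter:
    """Hypothetical class with a long method that is hard to test."""

    def __init__(self):
        self._cache = {}
        # Sensing variable: added only so a test can observe the reset path.
        self.cache_was_reset = False

    def import_prices(self, rows):
        # ... imagine many more lines of tangled logic around this loop ...
        for sku, price in rows:
            if price < 0:
                # Side effect that is otherwise invisible to callers.
                self._cache.clear()
                self.cache_was_reset = True
                continue
            self._cache[sku] = price


importer = PriceImporter()
importer.import_prices([("a", 1.0), ("b", -1.0), ("c", 2.0)])
print(importer.cache_was_reset)  # True
```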
27Patterns for the Change Algorithm
- Break Dependencies
- Scenarios
- It takes forever to make a change
- Understanding
- Lag Time
- Breaking Dependencies
- Build Dependencies
28Patterns for the Change Algorithm
- Write Tests
- Tests may be more difficult to write than normal
unit tests
- May have less-than-ideal scenarios
29Patterns for the Change Algorithm
- Write Tests
- Scenarios
- I need to make a change, but don't know what
tests to write
- Characterization Tests
- Characterizing Classes
- Targeted Testing
- Writing Characterization Tests
- Write tests for the area you'll be making the
change. Write as many as you need to understand
the code.
- Then write tests for the things you need to
change
- If converting or moving functionality, write
tests to verify the behavior on a case-by-case
basis
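A characterization test records what the code actually does today, quirks included, rather than what it ought to do. A minimal sketch, assuming a hypothetical legacy function `format_name` (invented for this example):

```python
def format_name(first, last):
    """Hypothetical legacy function whose exact behavior is undocumented."""
    return (last.upper() + ", " + first).strip()


def test_characterize_normal_name():
    assert format_name("Ada", "Lovelace") == "LOVELACE, Ada"


def test_characterize_missing_first_name():
    # Pins the current behavior, including the dangling comma. Whether
    # that is a bug is a separate decision; the test only documents it.
    assert format_name("", "Lovelace") == "LOVELACE,"


test_characterize_normal_name()
test_characterize_missing_first_name()
```

A common way to write these is to assert a value you expect to be wrong, run the test, and copy the actual value from the failure message into the assertion.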
30DEMO: Change Algorithm at Work
- Step through a common scenario, implementing the
tests as we go
31Legacy Code isn't just Code
- Most applications aren't just simple console apps
- They deal with many dependencies
- File Systems
- Registries
- Databases
- Hardware
32Legacy Code isn't just Code
- These dependencies can cause legacy problems of
their own
- Database schemas
- Existing data in the tables
- Business logic in the database
- No access to development data that mirrors
production
- In other words, Legacy Data
33Legacy Data
- So where does this Legacy Data come from?
- Flat Files
- XML Documents
- RDBs
- Object DBs
- Other DBs
- Application Wrappers
- Your DB
- Many, many sources
34Legacy Data
- Legacy data produces its own unique set of
challenges
- Data quality
- Data architecture problems
- Database design problems
- Process-related challenges
35Data Quality
- Common Data Quality problems
- A single column is used for several purposes
- Determining the purpose of a column by the value of one or more other columns
- Inconsistent data values / formatting
- Missing data / columns
- Additional columns
- Important attributes and relationships are hidden in text fields
- Data values that stray from their field descriptions and business rules
- Various key strategies for the same type of entity
- Unrealized relationships between data records
- One attribute is stored in several fields
- Inconsistent use of special characters
- Different data types for similar columns
- Different levels of detail
- Different modes of operation
- Varying timeliness of data
- Varying default values
- Various representations
- http://www.agiledata.org/essays/legacyDatabases.html#DataProblems
36Data Architecture Problems
- Common Architectural Problems may include
- Applications responsible for data cleansing
(instead of DB)
- Different database paradigms
- Different hardware platforms / storage
- Fragmented / Redundant / Inaccessible data
sources
- Inconsistent semantics
- Inflexible architecture
- Lack of event notification
- No or inefficient security
- Varying timeliness of data sources
37Design Problems
- There may be key design issues with the database
- Database encapsulation scheme exists, but it's
difficult to use
- Ineffective (or no) naming conventions
- Inadequate documentation
- Original design goals at odds with current
project needs
- Inconsistent key strategy
- Design goals at odds with data storage (treating
relational DBs as object DBs, etc.)
38Design Problems
- Example
- Application which presented custom forms to users
- Implementers could create custom forms with
custom questions and validations
- Beautiful OO architecture: Forms had Groups,
which had Items
- Everything was rendered dynamically and could be
updated on the fly
39Design Problems
- Example
- The Form, Group, Item and other objects were
all stored as individual records in one database
table
- A user in the system had on average 74 forms with
an average of 30 questions. With a target of
20,000 users in the database, this would lead to
over 50 million rows in the one table.
- We identified one stored proc as one of the main
culprits. It had something like the following:
40Design Problems
- Example
- INSERT INTO @tmpTable SELECT ot.myCol FROM
OtherTable ot WHERE ot.bitMask & (144567 |
99435) > 0
- This led to a full table scan for one of their
most heavily used procs, degrading performance
significantly (average page load time of over 7
seconds)
41Working with Legacy Data
- So how do you deal with legacy data?
- Strategies
- Avoid it
- Develop Error Handling Strategy
- Work Iteratively and Incrementally
- Prefer Read-Only Legacy Access
- Encapsulate Legacy Data Access
- Introduce Data Adapters for Simple Data Access
- Introduce a staging database for complex access
- Adopt Existing Tools
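Several of these strategies can be combined in one small class. The sketch below is a hypothetical read-only adapter: the column names `CUST_NM` and `ACTV_FLG`, the set of "true" flag values, and the choice to skip nameless rows are all assumptions made for illustration. It encapsulates legacy access, exposes no write path, and makes the error-handling decision explicit in one place.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    name: str
    active: bool


class LegacyCustomerAdapter:
    """Read-only wrapper around legacy rows (column names are hypothetical)."""

    _TRUE_VALUES = {"Y", "YES", "1", "T"}

    def __init__(self, fetch_rows):
        # fetch_rows is any callable returning an iterable of dicts, so the
        # adapter can be tested without a real legacy database.
        self._fetch_rows = fetch_rows

    def customers(self):
        result = []
        for row in self._fetch_rows():
            name = (row.get("CUST_NM") or "").strip()
            if not name:
                continue  # error-handling decision: skip rows with no name
            # Normalize the inconsistent flag representations to a bool.
            flag = str(row.get("ACTV_FLG") or "").strip().upper()
            result.append(Customer(name.title(), flag in self._TRUE_VALUES))
        return result


rows = [
    {"CUST_NM": " ada lovelace ", "ACTV_FLG": "y"},
    {"CUST_NM": None, "ACTV_FLG": "1"},
    {"CUST_NM": "GRACE HOPPER", "ACTV_FLG": 0},
]
print(LegacyCustomerAdapter(lambda: rows).customers())
```

The rest of the application sees only `Customer` objects, so later cleanup of the legacy source changes the adapter and nothing else.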
42Working with Legacy Data
- We couldn't avoid the data; the proc had to be
changed
- So we developed an incremental 5-step plan
- Add an IsValidRecord column to the table
- Update the column based on the bitmask for each
row
- Change the proc to use the column instead of the
bitmask
- Make sure all tests are still passing
- Introduce Update and Insert triggers to
automatically populate the column
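The five steps can be sketched end to end with Python's built-in `sqlite3` module. This is an illustration under stated assumptions, not the original system: the real database was evidently not SQLite, the sample rows are invented, and the predicate `bitMask & (144567 | 99435) > 0` is an assumed reading of the stored proc, whose operators are not fully preserved in this transcript.

```python
import sqlite3

# Assumption: the proc's filter was `bitMask & (144567 | 99435) > 0`.
MASK = 144567 | 99435

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE OtherTable (id INTEGER PRIMARY KEY, myCol TEXT, bitMask INTEGER)"
)
conn.executemany(
    "INSERT INTO OtherTable (myCol, bitMask) VALUES (?, ?)",
    [("a", 1), ("b", 256), ("c", 144567)],  # invented sample data
)


def old_query():
    # Original shape: filter computed from the bitmask on every row.
    sql = "SELECT myCol FROM OtherTable WHERE (bitMask & ?) > 0 ORDER BY id"
    return [row[0] for row in conn.execute(sql, (MASK,))]


def new_query():
    # Step 3: filter on the precomputed column instead.
    sql = "SELECT myCol FROM OtherTable WHERE IsValidRecord = 1 ORDER BY id"
    return [row[0] for row in conn.execute(sql)]


# Step 1: add the column.
conn.execute(
    "ALTER TABLE OtherTable ADD COLUMN IsValidRecord INTEGER NOT NULL DEFAULT 0"
)
# Step 2: backfill it from the bitmask for each row.
conn.execute("UPDATE OtherTable SET IsValidRecord = ((bitMask & ?) > 0)", (MASK,))
# Step 4: regression check that old and new queries agree.
assert old_query() == new_query() == ["a", "c"]

# Step 5: triggers keep the column populated for new and changed rows.
conn.executescript(f"""
CREATE TRIGGER OtherTable_ai AFTER INSERT ON OtherTable
BEGIN
    UPDATE OtherTable
    SET IsValidRecord = ((NEW.bitMask & {MASK}) > 0)
    WHERE id = NEW.id;
END;
CREATE TRIGGER OtherTable_au AFTER UPDATE OF bitMask ON OtherTable
BEGIN
    UPDATE OtherTable
    SET IsValidRecord = ((NEW.bitMask & {MASK}) > 0)
    WHERE id = NEW.id;
END;
""")
conn.execute("INSERT INTO OtherTable (myCol, bitMask) VALUES ('d', 8)")
conn.execute("UPDATE OtherTable SET bitMask = 2 WHERE myCol = 'b'")
assert old_query() == new_query() == ["a", "b", "c", "d"]
```

Keeping `old_query` around during the migration gives a ready-made regression oracle: at every step the new query must return exactly what the old one does.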
43Working with Legacy Data
- Advantages
- Required no change to application code
- We could rapidly test the application
- We could make incremental changes to see
improvements
- What made it work
- Testing/QA Database with production-like data
- Regression tests to ensure functionality
- Timing tests to show performance improvement
44Process Problems
- Not all of the issues are technical
- Working with legacy data when you don't have to
- Data design drives your object model
- Legacy data issues overshadow everything else
- App developers ignore legacy issues
- You choose not to refactor the legacy data
sources
- Politics
- You are too focused on the data to see the
software
45Refactoring Databases
- Databases should not be left out of the
refactoring process
- An interesting observation is that when you take
a big design up front (BDUF) approach to
development, where your database schema is created
early in the life of your project, you are
effectively inflicting a legacy schema on
yourself. Don't do this.
- Scott Ambler maintains a catalog of DB
refactorings
- How do you refactor a database?
46Refactoring Databases
47Refactoring Databases
- Implementing Database Refactoring in your
organization
- Start simple
- Accept that iterative and incremental development
is the norm
- Accept that there is no magic solution to get you
out of your existing mess
- Adopt a 100% regression testing policy
- Try it
48Next Steps
- Dealing with legacy code is hard
- Integration issues
- Code Issues
- Political Issues
- There are ways out
- Important to address pain points first
49Next Steps
- So where can you go from here?
- Working Effectively With Legacy Code by Michael
Feathers
- Agile Database Techniques by Scott Ambler
- Refactoring Databases by Scott Ambler
- http://www.agiledata.org
- NUnit, JUnit, CppUnit, CppUnitLite, dbFit,
Fitnesse
- http://www.cornetdesign.com