Working With Legacy Code in Agile Environments

How successful teams deal with legacy code to stabilize quality and improve overall product delivery

At the beginning of most agile coaching projects, we find that our clients' applications are not properly covered by any kind of automated tests. In most cases, the organization has no plans to throw away those codebases and start from scratch, which means we need to design a plan to address the evergreen issue of quality in the legacy code. Yes, we call any code that is not covered by automated tests legacy code. As hard as it sounds, you might realize that the code you are writing today will be legacy code a few months from now.

Even if it's you who writes it, next month, no one will remember exactly what this piece of code was created for, unless you have great unit, integration, and acceptance tests.

In an agile environment, tests are the most important documentation. I have been in organizations where five bugs were reported per developer per week, which, as you can imagine, destroys any possibility of following a plan. Not every reported bug was critical, of course, and therefore not all of them needed to be fixed right away. But a significant number of them were very painful for the business, and those usually disrupted any planning.

Fixing bugs is boring, exhausting, and depressing. Describing scenarios and writing automated tests before actually developing a user story ensures that everyone has the same vision of what needs to be developed. These activities reduce the number of new bugs introduced through rework or unconsidered scenarios. They are seen as a necessary part of developing robust applications.

In this context, we try to maximize the return on investment (ROI) on the time we invest in writing test automation. When we work with legacy code, the most valuable kinds of tests are acceptance tests, also called black-box tests. We don't need a well-written codebase, and there are no prerequisites, to start writing automated tests against our application's interface. Unit and integration tests are a different matter: the codebase will probably be tightly coupled and not SOLID-compliant, so writing your first unit test might be a pain in the neck.


We not only write acceptance tests; we also take advantage of the safety net they create to start refactoring our codebase.

This refactoring should focus on enabling unit and integration tests or on rewriting highly buggy components. The goal is not only to beautify the codebase or make it easier to read, but also to get a return on this investment! We could have dedicated this time to developing more features for our application, but we decided to write automated tests because we believe it's going to bring a benefit: fewer bugs reported, leading to less money lost for our organization because of those bugs.

You decide on the strategy for choosing what type of tests to write for each part of your application. However, here is a little guide to help you decide:

Test type: Unit test
  Description: Covers a method in a class, using mocks to fake the behavior of other classes/methods used by the tested method.
  Development time: (not given)
  Execution time: (not given)
  Bugs covered: LOW. Only effective at detecting bugs when someone changes the method code without considering all the possible cases.

Test type: Integration test
  Description: Covers a method in a class, without mocking. It actually calls the other objects and, if required, loads fixtures (fake data) into the data storage.
  Development time: MEDIUM. Longer than a unit test, perhaps over an hour.
  Execution time: (not given)
  Bugs covered: MEDIUM. Effective at detecting bugs across a broader range of code than a unit test.

Test type: Automated acceptance test
  Description: The same test a user would do against our application, but performed by a machine.
  Development time: Depending on the technology, maybe over an hour per scenario.
  Execution time: (not given)
  Bugs covered: HIGH. Will detect any bug or problem.
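To make the difference between the first two rows concrete, here is a minimal sketch in Python. `BookingService`, `SqliteRates`, and the `hotels` table are hypothetical names invented for this illustration; they do not come from any real application. The unit test replaces the repository with a mock, while the integration test loads a fixture into a real (in-memory) SQLite database and calls the real objects.

```python
import sqlite3
import unittest
from unittest.mock import Mock


class BookingService:
    """Hypothetical service: prices a stay using a rate repository."""

    def __init__(self, rates):
        self.rates = rates

    def total_price(self, hotel_id, nights):
        if nights <= 0:
            raise ValueError("nights must be positive")
        return self.rates.nightly_rate(hotel_id) * nights


class SqliteRates:
    """Hypothetical repository backed by a real database connection."""

    def __init__(self, connection):
        self.connection = connection

    def nightly_rate(self, hotel_id):
        row = self.connection.execute(
            "SELECT rate FROM hotels WHERE id = ?", (hotel_id,)
        ).fetchone()
        if row is None:
            raise KeyError(hotel_id)
        return row[0]


class PricingUnitTest(unittest.TestCase):
    def test_total_price_with_mocked_rates(self):
        # The collaborator is faked, so only total_price itself is exercised.
        rates = Mock()
        rates.nightly_rate.return_value = 100
        self.assertEqual(BookingService(rates).total_price(7, 3), 300)
        rates.nightly_rate.assert_called_once_with(7)

    def test_rejects_non_positive_nights(self):
        with self.assertRaises(ValueError):
            BookingService(Mock()).total_price(7, 0)


class PricingIntegrationTest(unittest.TestCase):
    def setUp(self):
        # Fixture: fake data loaded into a real data store.
        self.connection = sqlite3.connect(":memory:")
        self.connection.execute(
            "CREATE TABLE hotels (id INTEGER PRIMARY KEY, rate INTEGER)"
        )
        self.connection.execute("INSERT INTO hotels VALUES (7, 100)")

    def tearDown(self):
        self.connection.close()

    def test_total_price_reads_rate_from_database(self):
        service = BookingService(SqliteRates(self.connection))
        self.assertEqual(service.total_price(7, 3), 300)


def run_suite():
    """Run both test classes and return the unittest result object."""
    loader = unittest.defaultTestLoader
    suite = unittest.TestSuite()
    suite.addTests(loader.loadTestsFromTestCase(PricingUnitTest))
    suite.addTests(loader.loadTestsFromTestCase(PricingIntegrationTest))
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

The unit test would keep passing even if the SQL query were wrong, which is why the guide rates its bug coverage lower than that of the integration test.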


At this point, you are probably ready to work on the actual plan. Here are some steps I suggest you don't skip. Keep in mind that steps 1 and 2 (choosing the stack and preparing the CI suite) will take a while and require different skills than steps 3 and 4 (defining priority levels and listing product features). You might decide to have someone work on the first two steps while other people work in parallel on steps 3 and 4.

Choose your technological stack

What tools, programming languages, or frameworks are you going to use to write your tests? What Gherkin implementation fits best for your application and technical skills? Take your time and try several tools. This is the basic architecture of your tests and, as such, it will be really hard to change if you make the wrong choice. For this decision and for the CI setup described next, I suggest that you involve your Chief Technology Officer (CTO) or technical leaders. It's important to have the right people on board.

Prepare a continuous integration suite

Your tests should be executed by your continuous integration (CI) tool after every commit. The execution of all your tests should not take more than 20 to 30 minutes, which is roughly the minimum time your colleagues need to review the code of a commit related to a small user story. Keep in mind that these tests should be written as independently as possible. They should not be fragile; the setup, teardown, and the test logic itself should be properly designed to ensure that different tests running simultaneously won't give false alerts. Be ready to parallelize the execution. A good CI suite should be able to split the execution across different build agents. One of my favorites is
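Independently of any particular CI product, the idea of splitting a test run across several machines can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the test identifiers are hypothetical, and real CI tools usually offer their own splitting mechanisms. Each worker receives its index and the total number of workers, and picks its share of the tests with a stable hash so that every worker computes the same partition.

```python
import hashlib


def shard_for(test_id: str, shard_count: int) -> int:
    """Map a test id to a shard in a way that is stable across machines.

    Python's built-in hash() is randomized per process for strings,
    so a cryptographic digest is used to get a repeatable value.
    """
    digest = hashlib.sha256(test_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


def select_tests(test_ids, shard_index, shard_count):
    """Return the tests that the worker with shard_index should run."""
    if not 0 <= shard_index < shard_count:
        raise ValueError("shard_index must be in [0, shard_count)")
    return [t for t in test_ids if shard_for(t, shard_count) == shard_index]
```

Splitting only works if the tests are independent, as described above: a test that relies on data left behind by another test may end up on a different worker and fail for no real reason.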

Define priority levels for your applications

The priority must be agreed upon across the whole organization. Try to keep it as simple as possible. Ideally, you will have only three levels of priorities.

Below is an example of a simple approach related to a hotel reservation website to help you ask the right questions:

DEFCON scales

DEFCON1
  Incident priority: High-priority incidents must match at least one of these conditions:
    1) Hotels don't appear on the search result.
    2) The website is down.
    3) The payment service is broken; we cannot process bookings.
    4) Registration/Subscription is not working.
    5) In general, there is a problem that completely stops our sales.
  Response: 24/7 response. We will put all our energy into this incident at any time and will not stop until we have solved the incident and eventually fixed the code bug that is causing this problem.

DEFCON2
  Incident priority: High impact. Most of the functionality of an application cannot be used, or there is a massive number of affected users. State this impact level when you report the incident.
  Response: We will start working on this incident immediately, with maximum priority, but only during working hours.

DEFCON3
  Incident priority: Medium impact. The application can work with limited functionality. The problem might mean extra work for some affected users or employees, who have to apply a workaround.
  Response: We will plan this bug to be fixed during the next sprint. DEFCON3 and DEFCON4 will be prioritized with the rest of the user stories according to the ROI.

DEFCON4
  Incident priority: Low impact. The application works properly; however, there is an aesthetic or noncritical defect.
  Response: Will be fixed eventually, at some point according to our bugs policy.


Every party involved must agree on this list. You should also include real examples for the different levels of priority.

What will you be automating? DEFCON1 incidents or DEFCON2 incidents in a frequently changing component of your application are clear candidates for automation.
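The example scale and the automation rule can be written down as a small decision function. The sketch below is in Python and rests on assumptions: the field names of `Incident` are invented for illustration, and the four levels are assumed to map to the four rows of the example above in order. Your own organization's conditions will differ.

```python
from dataclasses import dataclass
from enum import IntEnum


class Defcon(IntEnum):
    DEFCON1 = 1  # sales are completely stopped
    DEFCON2 = 2  # high impact
    DEFCON3 = 3  # medium impact
    DEFCON4 = 4  # low impact


@dataclass(frozen=True)
class Incident:
    """Hypothetical incident report; every flag defaults to False."""
    stops_sales: bool = False  # e.g. website down, payments broken
    most_functionality_unusable: bool = False
    many_users_affected: bool = False
    limited_functionality: bool = False  # usable, but needs a workaround


def classify(incident: Incident) -> Defcon:
    """Return the most severe level whose condition the incident matches."""
    if incident.stops_sales:
        return Defcon.DEFCON1
    if incident.most_functionality_unusable or incident.many_users_affected:
        return Defcon.DEFCON2
    if incident.limited_functionality:
        return Defcon.DEFCON3
    return Defcon.DEFCON4


def is_automation_candidate(level: Defcon, frequently_changed: bool) -> bool:
    """DEFCON1 always qualifies; DEFCON2 only in frequently changing code."""
    return level == Defcon.DEFCON1 or (
        level == Defcon.DEFCON2 and frequently_changed
    )
```

Writing the rules this explicitly is mostly useful as a conversation tool: if two people disagree about the output for a real incident, the list of conditions is not yet agreed upon.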

List your product features

Define high-level features and assign priorities to them. A feature takes the priority of the highest-priority functionality inside it. If a feature is assigned DEFCON1, go deeper into it. Start splitting up functionalities or creating new functionalities inside the DEFCON1 feature as user stories. Assign a priority to each of those user stories.

If a feature is DEFCON2, but it is a frequently changed part of your application, name the feature DEFCON2UP. For those DEFCON2UPs, write the user stories contained in that feature. Again, assign priorities to each user story.

Features with a DEFCON1 or DEFCON2UP priority are the ones for which you will write automated tests. At this point, if you plan to run regression tests (manually) before release, you should describe all the scenarios. If you don't plan to do it, you can describe the scenarios as part of the automation of every user story.
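The rules in this step (a feature inherits the highest priority found inside it, and a frequently changed DEFCON2 feature becomes DEFCON2UP) can be sketched as follows. This is an illustrative Python sketch; `UserStory`, `Feature`, and `automation_backlog` are hypothetical names, and priorities are stored as integers where 1 is the most severe.

```python
from dataclasses import dataclass, field


@dataclass
class UserStory:
    name: str
    defcon: int  # 1 is the most severe level, 4 the least


@dataclass
class Feature:
    name: str
    frequently_changed: bool = False
    stories: list = field(default_factory=list)

    def defcon(self) -> int:
        # A lower number means a higher priority, so the feature takes
        # the minimum over its stories. Raises ValueError if it has none.
        return min(story.defcon for story in self.stories)

    def label(self) -> str:
        level = self.defcon()
        if level == 2 and self.frequently_changed:
            return "DEFCON2UP"
        return f"DEFCON{level}"


def automation_backlog(features):
    """Return the features to automate: DEFCON1 first, then DEFCON2UP."""
    selected = [f for f in features if f.label() in ("DEFCON1", "DEFCON2UP")]
    # "DEFCON1" sorts before "DEFCON2UP" as a plain string comparison.
    return sorted(selected, key=lambda f: f.label())
```

A plain DEFCON2 feature that rarely changes stays out of the backlog, which matches the idea of spending automation time where the return is highest.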

Develop acceptance tests

Initially, we are not going to target a specific percentage of code coverage in terms of unit or integration testing. Our initial focus is to automate 100 percent of the DEFCON1 scenarios and 100 percent of the DEFCON2UP scenarios. The best way to write your automated acceptance tests is by using any implementation of Gherkin. Cucumber, in Ruby, was the first one to be launched, although there are implementations in many programming languages.
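To show how a Gherkin scenario connects to executable code, here is a deliberately tiny runner written in plain Python. It is a toy for illustration, not the API of Cucumber or of any real Gherkin implementation: the `step` decorator, `run_scenario`, and the `HOTELS` stand-in for the application are all hypothetical. A real acceptance test would drive the application through its actual interface (a browser or an HTTP API) instead of an in-process dictionary.

```python
import re

STEPS = []  # (compiled pattern, function) pairs

KEYWORDS = {"Given", "When", "Then", "And", "But"}


def step(pattern):
    """Register a function as the implementation of a step sentence."""
    def register(func):
        STEPS.append((re.compile(pattern), func))
        return func
    return register


def run_scenario(text, context):
    """Run each step line of a scenario against the registered steps."""
    for raw in text.strip().splitlines():
        line = raw.strip()
        if not line or line.startswith("Scenario:"):
            continue
        # Split the leading keyword from the sentence that follows it.
        keyword, _, sentence = line.partition(" ")
        if keyword not in KEYWORDS:
            raise ValueError(f"not a step line: {line!r}")
        for pattern, func in STEPS:
            match = pattern.fullmatch(sentence)
            if match:
                func(context, *match.groups())
                break
        else:
            raise LookupError(f"no step matches: {sentence!r}")
    return context


# Hypothetical stand-in for the application under test.
HOTELS = {"Barcelona": ["Hotel Mar", "Hotel Sol"], "Oslo": []}


@step(r"the visitor is on the search page")
def open_search_page(context):
    context["page"] = "search"


@step(r'the visitor searches for hotels in "(.+)"')
def search_hotels(context, city):
    context["results"] = HOTELS.get(city, [])


@step(r"(\d+) hotels are listed")
def check_result_count(context, expected):
    assert len(context["results"]) == int(expected), context["results"]


SCENARIO = """
Scenario: Hotels appear on the search result
  Given the visitor is on the search page
  When the visitor searches for hotels in "Barcelona"
  Then 2 hotels are listed
"""
```

Calling `run_scenario(SCENARIO, {})` executes the three steps in order and raises an `AssertionError` if the final check fails. The scenario text stays readable for non-developers, which is the main reason to prefer Gherkin for DEFCON1 scenarios that the whole organization has agreed on.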




At this stage, you have the test automation backlog prioritized. Remember that it's a live backlog, because every new feature that you write must be assigned a priority. If a new user story is DEFCON1 or DEFCON2UP, it should also be automated. If time permits, automate the rest of the test scenarios. But we both suspect you will never get to automate them, and that's usually fine.

Ideally, all your team members dedicate time every sprint to automating some tests. If only one person works on automated testing, however, that person ends up burdened with the whole responsibility of ensuring product quality. This usually leads to worse results.

Start measuring!

As soon as you decide to write automated tests, start tracking the impact you make with your work. Below are some correlations that are especially interesting:

  • Between tests executed before every release and new DEFCON2 bugs reported

  • Between automated acceptance tests and end-of-sprint heart attacks

  • Between tests executed before every release and percentage of time dedicated every sprint to developing new features

  • Between unit/integration tests and bugs detected in your CI suite
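As a sketch of how one of these correlations could be tracked, the Python snippet below computes a Pearson correlation coefficient between two series recorded once per release. The numbers are made up purely for illustration and do not come from any real project; the variable names are hypothetical as well.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / sqrt(spread_x * spread_y)


# Made-up numbers for illustration only, one entry per release.
tests_executed = [10, 25, 40, 60, 80]
defcon2_bugs = [9, 8, 6, 3, 2]

coefficient = pearson(tests_executed, defcon2_bugs)
```

A coefficient close to -1 would suggest that releases with more executed tests tend to have fewer DEFCON2 bugs. With only a handful of releases the number is a conversation starter rather than proof, and a correlation on its own does not establish the cause.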

Write unit or integration tests

Once you have created a safety net with your automated acceptance tests, you can refactor the buggiest parts of your application. At this point you can start with the Boy Scout Rule [1] and grow your automation codebase.
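A typical small refactoring of this kind is to turn a hard-wired dependency into a parameter so that the code becomes unit-testable. The Python sketch below assumes a hypothetical business rule (free cancellation up to 48 hours before check-in) invented only for this example; the function names are not from any real codebase.

```python
from datetime import datetime, timedelta


# Before: the current time is hard-wired, so a test cannot control it
# and the result depends on when the test happens to run.
def is_free_cancellation_legacy(check_in: datetime) -> bool:
    return check_in - datetime.now() >= timedelta(hours=48)


# After: the clock is a parameter with a production default. A unit test
# can pass a fixed time, while existing callers keep working unchanged.
def is_free_cancellation(check_in: datetime, now=datetime.now) -> bool:
    return check_in - now() >= timedelta(hours=48)
```

A unit test can then call `is_free_cancellation(check_in, now=lambda: datetime(2024, 1, 1, 12, 0))` and check both sides of the 48-hour boundary. The change is small enough to make while you are touching the code for another reason, which is the spirit of the rule.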

Achieving 100 percent code coverage with integration/unit tests is highly complex in legacy applications and not always worth the effort. Focus on covering the functionalities with automated acceptance tests, and write only unit/integration tests for those parts of the application that you change frequently.

In general, developers aim to achieve 100 percent coverage with unit/integration tests; it makes our lives so much easier. However, depending on the size of your application and the present circumstances, it's not worth dedicating time to writing those tests for noncritical parts or parts that are never changed. It's smarter to write unit/integration tests for those parts of the application that are really critical and are continuously changing, which takes us back to the user stories defined as DEFCON1 or DEFCON2UP.

I hope you get good results by following this plan, as I have done across many different organizations as both a CTO and as an agile coach.



1. “Leave things better than you found them.” ~ Robert Baden-Powell


