
How to Refactor Legacy Code

Mar 18, 2026
Molisha Shah

The systematic approach to legacy code refactoring is incremental change guarded by characterization tests. Legacy systems often encode years of production bug fixes, and those fixes can silently disappear when code is restructured without a behavioral safety net.

TL;DR

Legacy refactors fail when teams rewrite from scratch or refactor without tests. Lock current behavior with characterization tests, create seams to isolate dependencies, refactor in small reversible steps, and use Strangler Fig for system-level replacement. CI gates and staged rollout catch mismatches that unit tests miss.

Why Legacy Codebases Resist Rewrites

Legacy codebases resist conventional rewrite attempts for reasons that compound over time:

  • Tight coupling: modules are connected in ways that make isolated changes hard.
  • Missing documentation: docs are absent, outdated, or do not reflect production reality.
  • High production risk: engineers avoid changes because breakages hit revenue or on-call load.

Those pressures push teams toward two common mistakes:

  • Big-bang rewrites that discard embedded fixes and stall delivery.
  • Untested refactors that break edge cases only exercised in production.

Joel Spolsky's rewrite warning remains a common reference point for why big-bang rewrites lose embedded knowledge and stall delivery.

This guide follows a repeatable sequence:

  • Map change points and hidden coupling so later changes do not cascade
  • Write characterization tests so behavior is locked before cleanup
  • Create seams so tests can isolate dependencies
  • Apply small structural refactors so each step is reversible
  • Use Strangler Fig for subsystem replacement so cutovers stay safe
  • Add CI gates and staged rollout so production-only failures are caught

System mapping is often the first scaling bottleneck on large codebases. Augment Code's Context Engine navigates cross-repository dependencies and cuts the time teams spend manually tracing legacy coupling.

Start mapping dependencies across your codebase.


Prerequisites and Setup

With the overall refactoring sequence in mind, the next step is to get the toolchain and rollback mechanics ready. The following tooling and environment configuration should be in place before writing a single line of refactored code.

| Category | Java | Python | Ruby |
|---|---|---|---|
| Unit test framework | JUnit | pytest | RSpec |
| Test doubles | Mockito | unittest.mock | rspec-mocks |
| Approval testing | ApprovalTests | approvaltests | N/A |
| IDE refactoring | IntelliJ IDEA | VS Code | VS Code |

Upstream docs and repos provide the most reliable tool references: JUnit docs, pytest docs, RSpec docs, Mockito docs, ApprovalTests Java, and approvaltests Python.

Familiarity with Git, continuous integration/continuous delivery (CI/CD) pipelines, and the target codebase's primary language is assumed. Version control discipline keeps refactoring reversible: refactoring steps should be kept small enough that reverting to the last known good build remains straightforward using git revert.

For large codebases, even a lightweight dependency index can reduce the manual work of tracing coupling across files and services, and the same reference can be reused later when estimating rollout blast radius. For code review and change-safety conventions, teams commonly align on established practices such as Google's review guide.
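For a Python codebase, a lightweight dependency index can be built from the standard library alone. This is a minimal sketch, not a complete tool: the helper names are illustrative, and it only tracks top-level import statements, ignoring dynamic imports.

```python
import ast
from pathlib import Path

def build_import_index(root: str) -> dict[str, set[str]]:
    """Map each Python file under root to the top-level modules it imports."""
    index: dict[str, set[str]] = {}
    for path in Path(root).rglob("*.py"):
        tree = ast.parse(path.read_text(encoding="utf-8"))
        imports: set[str] = set()
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                imports.update(alias.name.split(".")[0] for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module:
                imports.add(node.module.split(".")[0])
        index[str(path.relative_to(root))] = imports
    return index

def dependents_of(index: dict[str, set[str]], module: str) -> list[str]:
    """Reverse lookup: which files import the given module?"""
    return sorted(f for f, deps in index.items() if module in deps)
```

The reverse lookup is the part that pays off later: before touching a module, list who imports it, and reuse the same index when estimating rollout blast radius in Step 6.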

Step 1: Identify Change Points and Map the System

Once the environment is ready to run tests and roll back safely, start by locating the parts of the system that are most likely to change and understanding what they touch. In Working Effectively with Legacy Code, Michael Feathers describes a workflow that starts with identifying change points, finding test points, breaking dependencies to enable tests, writing characterization tests, and only then making changes and refactoring.

Before touching production code, map hidden coupling to determine which seams exist and which must be created. This step reduces cascading failures from changes that appear safe in isolation. Document all entry points, external dependencies (databases, file systems, APIs, system clock calls), and module boundaries before proceeding. At larger scale, the same dependency map can reduce time spent chasing cross-repository and cross-service links during change-point discovery. Augment Code's Context Engine processes entire codebases across 400,000+ files through semantic dependency analysis and surfaces these cross-service relationships faster than manual tracing.

Step 2: Write Characterization Tests to Lock Current Behavior

With change points identified, the next safety step is to lock down what the system currently does so later refactors can be validated quickly. Feathers' characterization test technique documents actual system behavior before any refactoring begins.

Characterization tests document current behavior, including any existing bugs, rather than intended behavior. They catch regressions during cleanup, although they may also surface surprising behavior that points to an underlying bug.

The fail-first technique reduces guesswork about what the code actually does. Start by writing a test with a guessed assertion, then update the assertion to match the observed output.

Version context: Ruby 3.2, RSpec 3.12.

ruby
# Step 1: Write a test with a guessed assertion
describe 'mask' do
  it "masks regular text" do
    expect(mask('simple text')).to eq('text')
  end
end
# Step 2: Run the test; it fails with the ACTUAL output
# Step 3: Update the assertion to match actual output

A fail-first characterization testing approach documents existing behavior and detects regressions when making changes. For complex output, approval testing complements this approach: serialize output to a human-readable form, store it in version control as an approved baseline, and diff against it on every run. Teams choosing between unit and integration tests at this stage should prioritize whichever level captures the most observable behavior for the module under change.
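The serialize-store-diff loop of approval testing can be sketched in a few lines of Python. This is a hand-rolled illustration, not the API of an approval-testing library; the file naming convention (`.received` / `.approved`) follows common practice but is an assumption here.

```python
import json
from pathlib import Path

def verify(name: str, output: object, approved_dir: str = "approved") -> None:
    """Compare serialized output against an approved baseline on disk.

    On the first run there is no baseline, so a .received file is written
    for human review; once a reviewer renames it to .approved, every later
    run diffs against that locked behavior.
    """
    Path(approved_dir).mkdir(exist_ok=True)
    received = json.dumps(output, indent=2, sort_keys=True)
    approved_file = Path(approved_dir) / f"{name}.approved.json"
    if not approved_file.exists():
        (Path(approved_dir) / f"{name}.received.json").write_text(received)
        raise AssertionError(f"No approved baseline for {name}; review the .received file")
    assert received == approved_file.read_text(), (
        f"Output diverged from approved baseline for {name}"
    )
```

The approved files live in version control, so any behavior change shows up as a reviewable diff rather than a silent regression.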

Step 3: Create Seams to Enable Test Isolation

After behavior is captured by characterization tests, create seams so tests can isolate units of behavior without dragging the entire system into every test run. This keeps feedback cycles faster while still protecting behavior.

A seam, as Feathers defines it, is a place where behavior can be changed without editing the code in that place. Creating seams unlocks the ability to insert test doubles and isolate behavior for testing.

The Extract and Override pattern is a classic seam creation technique for legacy Java code.

Version context: Java 17, JUnit 5.

java
// Production code: seam methods for external dependencies
class LegacyPricingEngine {
    protected long getCurrentTime() {
        return System.currentTimeMillis();
    }

    protected ExchangeRate getExchangeRate(String currency) {
        return ExternalAPI.fetchRate(currency);
    }

    public Price calculatePrice(Order order) {
        long time = getCurrentTime();
        ExchangeRate rate = getExchangeRate(order.getCurrency());
        return new Price(order.getAmount() * rate.getRate());
    }
}

// Test code: deterministic overrides
class TestablePricingEngine extends LegacyPricingEngine {
    @Override
    protected long getCurrentTime() {
        return 1609459200000L; // Fixed: 2021-01-01 00:00:00 UTC
    }

    @Override
    protected ExchangeRate getExchangeRate(String currency) {
        return new ExchangeRate(currency, 1.2);
    }
}

Feathers summarizes the legacy-code dilemma as a circular dependency between tests and change: to change code safely, tests are needed; to add tests, small changes are often needed first. In that situation, the practical guidance is to make the smallest, simplest change to get the interaction behind a seam, add tests, and only then proceed to broader refactoring.

Before introducing a new seam, identify likely downstream consumers to reduce the risk of abstraction boundaries that miss critical callers. The Context Engine surfaces these consumer relationships across repositories, so teams know who depends on the code being isolated before they commit to a boundary.

Install Augment Code to surface dependency analysis during legacy refactoring.


Step 4: Apply Structural Refactoring Patterns Incrementally

With seams in place and tests running deterministically, structural refactoring becomes safer when applied in small, reversible steps. The discipline here is cadence: make one change, run tests, and only then move on.

Begin structural refactoring using patterns from Martin Fowler's Refactoring catalog. Extract Method is a common starting pattern: identify code fragments that can be grouped and named, separate them into explicit methods, and update all call sites. For teams managing multi-file refactors across interconnected modules, IDE-assisted symbol rename can update many references automatically, but it does not guarantee atomic updates across all files. The Context Engine tracks cross-file relationships during rename operations and flags missed references in large codebases.

Version context: Python 3.11.

python
# BEFORE: anonymous inline block with no explicit intent
def printOwing(self):
    self.printBanner()
    print("name:", self.name)
    print("amount:", self.getOutstanding())

# AFTER: extracted method with explicit intent
def printOwing(self):
    self.printBanner()
    self.printDetails(self.getOutstanding())

def printDetails(self, outstanding):
    print("name:", self.name)
    print("amount:", outstanding)

When several parameters frequently appear together across multiple method signatures, the Introduce Parameter Object refactoring improves code readability and encapsulation. Java records, introduced in JEP 395, are a modern approach for modeling simple data carriers.

Version context: Java 17.

java
public record OrderInfo(
    String customer, String product, int quantity,
    String address, String paymentMethod
) {}

public Order createOrder(OrderInfo orderInfo) {
    // Access components as orderInfo.customer(), orderInfo.product(), ...
}

Type-based switch statements can indicate missing polymorphism. Apply Replace Conditional with Polymorphism: move each type-checking branch into a method on its own subclass so the conditional disappears from the call site. Run the characterization test suite after each individual pattern application, not after a batch of changes.
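A minimal before/after sketch of Replace Conditional with Polymorphism in Python (the shipping classes and rates are illustrative, not from any real system):

```python
# BEFORE: type-based branching at the call site
def shipping_cost_before(order_type: str, weight: float) -> float:
    if order_type == "standard":
        return 5.0 + 0.5 * weight
    elif order_type == "express":
        return 15.0 + 1.0 * weight
    raise ValueError(order_type)

# AFTER: each branch becomes a subclass; the conditional disappears
class Shipping:
    def cost(self, weight: float) -> float:
        raise NotImplementedError

class StandardShipping(Shipping):
    def cost(self, weight: float) -> float:
        return 5.0 + 0.5 * weight

class ExpressShipping(Shipping):
    def cost(self, weight: float) -> float:
        return 15.0 + 1.0 * weight
```

Characterization tests written against the BEFORE function double as the safety net for the AFTER classes: both paths must produce identical outputs before the conditional is deleted.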

Continuous test execution tools can provide fast feedback by running affected tests on each save: make a change, save it, and if tests go red, revert immediately. When renaming symbols codebase-wide, aim for logically grouped commits that keep related changes, including tests, documentation, and configuration files, together and digestible for reviewers.

Step 5: Apply Strangler Fig for System-Level Displacement

After incremental refactors make local modules safer to change, system-level modernization often needs a migration pattern that avoids a risky cutover. The goal is to keep production traffic flowing while replacement happens gradually.

For retiring entire subsystems, Martin Fowler's Strangler Fig pattern gradually replaces the legacy system by running the new implementation alongside the old one, with a proxy routing traffic between them during the transition. Teams migrating from monoliths to microservices often apply this pattern as the primary displacement strategy.

A Strangler Fig migration follows a sequence of small routing changes:

  • Build new functionality alongside the legacy implementation.
  • Deploy an indirection layer (proxy, gateway, facade, or router) that can direct requests to either path.
  • Gradually shift traffic and responsibility from old to new.
  • Reach a point where the legacy component receives no traffic.
  • Remove the legacy component after a validation window.
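The routing steps above can be sketched as a minimal indirection layer. This Python sketch stands in for whatever proxy, gateway, or facade a real migration would deploy; the percentage knob and handler signatures are assumptions for illustration.

```python
import random

class StranglerRouter:
    """Route a configurable share of traffic to the new implementation."""

    def __init__(self, legacy_handler, new_handler, new_traffic_pct: int = 0):
        self.legacy_handler = legacy_handler
        self.new_handler = new_handler
        self.new_traffic_pct = new_traffic_pct  # raise gradually: 0 -> 5 -> 50 -> 100

    def handle(self, request):
        if random.randrange(100) < self.new_traffic_pct:
            return self.new_handler(request)
        return self.legacy_handler(request)  # rollback = set pct back to 0
```

Because rollback is a single configuration change on the router, a mismatch discovered in production never requires redeploying the legacy system.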

This migration structure makes progress visible and keeps rollback practical because the indirection layer can restore routing quickly if mismatches appear. Planning the routing design requires a clear map of upstream entry points and downstream integrations. Augment Code's Context Engine builds this map across repositories through semantic analysis, so teams can spot integration surfaces that would otherwise require manual tracing.

When there is no external entry point to intercept for the Strangler Fig pattern, consider using Branch by Abstraction instead for internal migrations.
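Branch by Abstraction works in the same spirit but inside the codebase: introduce an abstraction over the legacy implementation, route callers through it, then swap the implementation behind a flag. A minimal Python illustration (the gateway classes and flag are hypothetical):

```python
class PaymentGateway:
    """Abstraction introduced over the legacy implementation."""
    def charge(self, amount: float) -> str:
        raise NotImplementedError

class LegacyGateway(PaymentGateway):
    def charge(self, amount: float) -> str:
        return f"legacy-charged:{amount}"

class NewGateway(PaymentGateway):
    def charge(self, amount: float) -> str:
        return f"new-charged:{amount}"

def make_gateway(use_new: bool) -> PaymentGateway:
    # Flip the flag once the new implementation is verified;
    # delete LegacyGateway afterwards.
    return NewGateway() if use_new else LegacyGateway()
```

Callers depend only on `PaymentGateway`, so the old and new implementations can coexist on one branch of one deployable system throughout the migration.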

| Situation | Recommended strategy |
|---|---|
| Retiring a large, mission-critical legacy system | Strangler Fig (minimizes disruption, maintains availability) |
| Evolving an interface with multiple consumers | Parallel Change |
| Adding tests to untested code | Seam model and characterization tests (complementary techniques) |
| Large-scale changes with internal dependencies | Branch by Abstraction |

These categories guide teams toward the smallest migration strategy that creates a controllable integration point. From there, the next step is to enforce that boundary with CI gates and progressive rollout.

Step 6: Validate with Continuous Integration (CI) Pipeline Gates and Staged Rollout

With old and new paths potentially running side by side, CI gates and staged rollout become the backstop that prevents regressions from escaping into production. This is where teams catch mismatches between "tests pass" and "production is safe."


A common CI pipeline structure uses a fast stage that runs unit tests with test doubles, followed by a slower stage that runs integration tests against real databases and services. Exact runtimes vary widely by repository size, test strategy, and CI runner capacity.

Establish CI pipeline gates before beginning incremental displacement, and keep the required checks stable so engineers can predict what "safe" means for each commit. Staged deployment strategies can limit blast radius and provide verification windows during incremental rollouts, especially when combined with monitoring and rollback triggers.
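A rollback trigger can be as simple as a thresholded comparison between baseline and canary error rates. This Python sketch assumes both rates are already collected from monitoring; the ratio and floor values are illustrative, not recommendations.

```python
def canary_gate(baseline_error_rate: float, canary_error_rate: float,
                max_ratio: float = 1.5, floor: float = 0.001) -> bool:
    """Allow rollout to proceed only if the canary's error rate is not
    meaningfully worse than the baseline's.

    The floor prevents a near-zero baseline from making any canary
    error at all look like a regression.
    """
    threshold = max(baseline_error_rate * max_ratio, floor)
    return canary_error_rate <= threshold
```

Wiring this check into the staged rollout (halt and restore routing when it returns False) turns "monitoring" into an automatic rollback trigger rather than a dashboard someone has to watch.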

For changes that cross service boundaries, reuse the Step 1 dependency map to assess likely blast radius earlier, before a rollout exposes unexpected downstream consumers. The Context Engine supports this assessment through cross-repository dependency graphs, producing a more complete picture of affected services than text-based search alone.

Common Mistakes and Pitfalls

After the core workflow is in place, the most common failures come from avoidable execution mistakes that bypass the safety net. These errors are rarely about a single bad refactor; they come from skipping the sequencing that keeps behavior locked.

A few guardrails prevent most legacy-refactor failures:

  • Preserve behavior first: characterization tests before structure.
  • Isolate before optimizing: create seams before deep cleanup.
  • Keep changes reversible: small commits, frequent test runs, fast reverts.
  • Assume production gaps: CI gates plus staged rollout and monitoring.

Legacy code refactoring often fails when engineers skip foundational steps. Each mistake below compounds the others, turning manageable tech debt into compounding refactoring debt.

Attempting a big-bang rewrite instead of incremental displacement

Historical rewrite attempts have repeatedly shown how a parallel rewrite can stall shipping while the old system still needs maintenance and fixes. The two-codebase problem doubles maintenance costs and pushes break-even further than originally estimated. Use Strangler Fig to maintain a single deployable system throughout the transition.

Refactoring without characterization test coverage

Restructuring code that looks equivalent can break subtle edge cases embedded in untested paths. The missing behavior may only surface in production. As described in Step 2, characterization tests give refactors a behavioral baseline, and Feathers' sequencing starts by getting those tests running before broader cleanup.

Mixing refactoring commits with feature commits

Many teams discourage mixing intents in the same review cycle: refactor-only changes and feature changes together become too complex to review safely and harder to roll back.

Skipping seam identification before making structural changes

Without mapped seams, changes ripple through multiple system components, breaking seemingly unrelated functionality. As described in Step 3, seam creation turns invasive dependency chains into testable, isolatable interactions.

These pitfalls show up most often when teams are under delivery pressure, so guardrails like small commits, CI gates, and explicit system mapping tend to pay off quickly.

Apply Measurement-First Refactoring to Your Next Legacy Sprint

The safest legacy refactors treat behavior preservation as the first deliverable. In the next sprint, pick one high-churn change point, write one characterization test that captures current outputs, and introduce one seam around the most volatile external dependency so that test can run deterministically.

Augment Code's Context Engine processes 400,000+ files through semantic dependency analysis, surfacing the cross-repository relationships that make legacy refactoring safer.

See how Augment Code tackles your next legacy refactor.




Written by

Molisha Shah, GTM and Customer Champion

