How AI Code Assistants Are Revolutionizing Test-Driven Development

With the advent of large language models, the field of AI coding assistants and AI-driven development has expanded rapidly. But when the AI lacks context about a project, or when a robust development process is not defined, integrating these tools can prove challenging. In this article, we’ll explore how to leverage the power of generative AI within the context of Test-Driven Development (TDD), a proven methodology that emphasizes writing tests before code.

What is Test-Driven Development?

Test-driven development is a well-documented methodology, first developed by Kent Beck in the late 1990s as part of Extreme Programming. We’ll review it briefly here.

TDD is built around a disciplined, iterative process (a minimal code example follows the steps). Starting with an empty test suite:

  1. Write a test that defines a desired behavior and ensure it fails.
  2. Write the minimal code necessary to make the test pass.
  3. Run all tests.
  4. Refactor the code and the test while the existing tests pass.
  5. Repeat.
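
To make the loop concrete, here is a single red/green iteration in Python with pytest. This is a minimal sketch; the slugify function is invented for illustration:

```python
# Step 1 (red): this test is written first and fails, because slugify
# does not exist yet.
def test_slugify_replaces_spaces_with_hyphens():
    assert slugify("hello world") == "hello-world"

# Step 2 (green): the minimal code necessary to make the test pass.
def slugify(text: str) -> str:
    return text.replace(" ", "-")

# Steps 3-5: run the whole suite, refactor while it stays green, then
# repeat with the next behavior (lowercasing, punctuation, and so on).
```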

For product owners familiar with Jira or other project portfolio management (PPM) systems, TDD offers an appealing advantage, especially when expressed as testing desired behaviors (a practice called behavior-driven development, or BDD): it translates business requirements into actionable test cases. This alignment makes it easier to track progress, ensure deliverables match expectations, and reduce miscommunication between technical and non-technical teams. For developers, TDD provides a clear focus, guiding them to write only the code needed to satisfy the defined functionality, reducing bloat and keeping the codebase clean.

However, TDD isn’t without controversy. Critics argue it can feel counterintuitive, especially for creative problem solvers who prefer experimenting before setting constraints. Some developers find it stifling, claiming it limits innovation or slows initial development.

But the essence of TDD lies in the first D, for driven. Your development is led by predefined tests, not spontaneous coding. Under TDD principles, jumping straight into code without first defining the desired outcome is considered a misstep. While it may take practice to adopt, TDD ensures precision and clarity, making it a valuable method for sustainable software development.

TDD and BDD are practices that can be applied in any programming language, and a variety of testing frameworks facilitate this way of working: behave for Python, Ginkgo and Gomega for Go, Cucumber for Java, and Mocha and Chai for JavaScript/TypeScript.
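
For instance, with Python’s behave, Given/When/Then phrasing maps directly onto step definitions. Here is a minimal sketch; the shopping-cart scenario and step names are invented for illustration:

```python
# features/steps/cart_steps.py -- step definitions for a feature file with:
#   Scenario: Add items to the cart
#     Given an empty shopping cart
#     When the user adds 2 units of "apple"
#     Then the cart contains 2 items
from behave import given, when, then

@given("an empty shopping cart")
def step_empty_cart(context):
    context.cart = {}

@when('the user adds {qty:d} units of "{item}"')
def step_add_items(context, qty, item):
    context.cart[item] = context.cart.get(item, 0) + qty

@then("the cart contains {total:d} items")
def step_cart_total(context, total):
    assert sum(context.cart.values()) == total
```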

The coding pitfalls of Generative AI

If you’ve ever asked ChatGPT to whip up some code, you’ve likely stumbled upon imaginary packages, baffling method signatures, or code that refuses to play nice with your project, wasting more time on retrofitting than it would take to write the code from scratch. Similarly, asking an LLM to write unit tests for a function can produce deceptive results, with the generated tests asserting the current behavior, bugs included.

Test-driven development provides a framework for code generation that acts as user-defined, context-specific “guard rails” for your model or assistant. Let’s see how to execute this in practice.

How to do AI Test-Driven Development

1. Use your code assistant’s IDE plugin to craft acceptance criteria into unit-test stubs

In the development lifecycle, after product owners define user stories, it’s a common practice to define acceptance criteria or scenarios in a structured way, like this:

  1. Given some initial context…
  2. When the user performs an action A…
  3. Then the system responds with B…

This natural, clear, context-aware language happens to be optimal input for modern LLMs and can easily be transformed into code stubs in almost any programming language. The criteria can simply be copy-pasted into the coding assistant conversation, or pulled from your PPM software (Jira, Asana, etc.) by API calls or AI agents. Then, prompt your coding assistant with something like “Create a test suite with test stubs that map to these requirements.” This can accelerate the rather daunting and time-consuming step of defining the tests, and there’s a variety of coding assistant tools to choose from. If you are looking for a quality-first coding assistant, check out Qodo; we think you will love it!
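
The result of such a prompt might look like the following pytest stubs. This is an illustrative sketch: the weather-endpoint scenario is invented, and the stub bodies are deliberately left empty for step 2:

```python
# test_weather_api.py -- one stub per acceptance criterion (hypothetical scenario).
import pytest

def test_returns_current_weather_as_json_for_known_city():
    # Given a known city / When the client requests its weather
    # / Then the response is a JSON document with temperature data
    pytest.skip("TODO: implement in step 2")

def test_serves_cached_reading_without_calling_upstream():
    # Given a cached reading / When the client repeats the request
    # / Then the cached value is returned without an upstream call
    pytest.skip("TODO: implement in step 2")

def test_returns_404_for_unknown_city():
    pytest.skip("TODO: implement in step 2")
```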

2. Implement the tests

When writing unit tests, we tend to assume the structure and shape of input and output data, as well as implementation details. For example, if you write a handler for an API endpoint that speaks JSON, your test will include a concrete request with a JSON body and an expected JSON response. If your application caches responses in Redis, your test will configure a mock of a Redis client. By implementing the tests yourself, you maintain control over these implementation details and implicitly “guide” the generative AI in the step that comes next.
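
Continuing the hypothetical weather example, filling in one of the stubs by hand might look like this. The JSON shape, the cache-key format, and the get_weather signature are all assumptions that the test deliberately pins down before any code is generated:

```python
# test_weather_api.py -- the cache-hit stub, now implemented by a human.
import json
from unittest.mock import MagicMock

from app.handlers import get_weather  # hypothetical module under test

def test_serves_cached_reading_without_calling_upstream():
    # The mocked Redis client fixes the caching contract the code must honor.
    redis_client = MagicMock()
    redis_client.get.return_value = json.dumps({"city": "Oslo", "temp_c": 4})

    response = get_weather({"city": "Oslo"}, cache=redis_client)

    # The concrete JSON request/response shape is decided here, by a human.
    assert response["status"] == 200
    assert json.loads(response["body"]) == {"city": "Oslo", "temp_c": 4}
    redis_client.get.assert_called_once_with("weather:Oslo")
```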

A word of caution: whether the tests are implemented by you or by your coding assistant is a decision for you (or your business) to make. Some argue that it should be a human who tests the code generated by a coding assistant, and not the other way around, since AI tools can sometimes create underhanded, deceptive tests that validate the software’s buggy behavior.

Now that you have written your tests, the next step is straightforward:

3. Have your code assistant write the code

The tests that you have implemented are valuable context for the LLM that will generate your code, so include them in the conversation or have your IDE-integrated assistant examine the test files. You can start small, with a few tests, and ask the AI to refactor as you add more.
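
For the hypothetical weather test above, the assistant’s first pass might look something like this (illustrative only, not actual assistant output):

```python
# app/handlers.py -- a minimal implementation satisfying the cache-hit test.
import json

def get_weather(params, cache):
    key = f"weather:{params['city']}"
    cached = cache.get(key)
    if cached is not None:
        return {"status": 200, "body": cached}
    # The tests so far only pin down the cache-hit path; a later iteration
    # would add tests (and code) for fetching from the upstream weather API.
    return {"status": 404, "body": json.dumps({"error": "unknown city"})}
```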

After you have the first version of the generated code, run your test suite and feed the results back to your assistant. It might take multiple iterations, but you should be able to reach a good result. A great aspect of writing the tests first, especially if you include edge cases and boundary conditions, is that the generated code will try to fulfill those requirements from the beginning, which can help you avoid nasty surprises down the road. When iterating with your code assistant, context from code, chat history, and user control are important for avoiding time-wasting scenarios, such as a fix that makes one failing test pass while causing a previously passing test to fail. How to manage context depends on your specific tool, so check your coding assistant’s documentation to take advantage of its full capabilities. Here’s how to manage context continuity with Qodo.
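
If your assistant cannot read test output directly, a small helper can capture it for pasting into the conversation (a sketch; many IDE-integrated assistants make this step unnecessary):

```python
# run_tests.py -- run the suite and save a compact report for the assistant.
import subprocess

result = subprocess.run(
    ["pytest", "-q", "--tb=short"],  # short tracebacks keep the context small
    capture_output=True,
    text=True,
)

with open("test_report.txt", "w", encoding="utf-8") as f:
    f.write(result.stdout)
    f.write(result.stderr)

print(f"pytest exited with code {result.returncode}; report in test_report.txt")
```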

4. (Bonus) Use AI agents in your DevOps pipelines

As your codebase grows, you’ll find yourself in need of rigorous CI/CD pipelines with well-structured, timely code reviews. Generative AI tools can help here too, examining your PR’s code or its test outputs in the CI pipeline and providing valuable feedback to you and your team. Code assistants that work in the DevOps domain can deliver fast feedback on your code and even provide valuable context to newly onboarded developers or managers.

Advantages and Caveats

Coding with generative AI tools certainly has merit, but they have to be used carefully to avoid nullifying their benefits. Product owners can save a lot of time by using AI to transform business requirements into acceptance criteria or stubs of fully fledged test suites, and developers can save a lot of time by generating application code or daunting test definitions.

However, to achieve this, a rigorous methodology (such as the proposed test-driven development) should be applied; otherwise, it’s easy to lose the time gained to debugging or, even worse, to let nasty bugs slip into production. Beyond that, a number of iterations may be needed to reach the final result, with the cost of each compounding due to the growing context. Finally, the code might need manual parameterization or style changes to align with your business’s guidelines.

Conclusion

  1. Generative AI in testing automation has various facets, from generating test suites to filling in tests that, in TDD, guide the development process.
  2. Combining GenAI and TDD has strong potential for time and cost savings. It provides a framework fueled by natural language that can be leveraged by both product owners and developers.
  3. Context is king – TDD has much to offer in terms of giving crucial context to code assistants, resulting in better-quality code and shorter iteration times.
  4. Although a lot of code can be generated, manual intervention will probably be needed at various steps of the process until the tooling for generative coding improves and we reach autonomous AI agents in software. It’s essential to manage expectations across the team, from developers to product owners.
