Reinforcement Learning from Automatic Feedback for High-Quality Unit Test Generation

Published in DeepTest (ICSE Workshop), 2025

Recommended citation: Benjamin Steenhoek, Michele Tufano, Neel Sundaresan, and Alexey Svyatkovskiy. 2025. Reinforcement Learning from Automatic Feedback for High-Quality Unit Test Generation. In 2025 Sixth International Workshop on Deep Learning for Testing and Testing for Deep Learning (DeepTest ’25), April 27–May 3, 2025, Ottawa, Canada. https://arxiv.org/abs/2412.14308

We propose a novel technique called Reinforcement Learning from Static Quality Metrics (RLSQM).

We show that LLMs can generate tests that contain undesirable test smells. We therefore use Proximal Policy Optimization (PPO) to train models that each optimize a single quality metric, and then combine these rewards into a unified reward model that captures multiple best practices and quality aspects of tests.
We provide insights into how to reliably use RL to improve test generation quality and into the effects of various training strategies.
The RL-optimized model consistently generated higher-quality test cases than the base LLM, improving on it by up to 21%, and generated nearly 100% syntactically correct code.
RLSQM also outperformed GPT-4 on four out of seven metrics.
Our data are available at this https URL.
