Reinforcement Learning from Automatic Feedback for High-Quality Unit Test Generation

Published in ArXiv, 2023

Recommended citation: Benjamin Steenhoek, Michele Tufano, Neel Sundaresan, and Alexey Svyatkovskiy. (2023). Reinforcement Learning from Automatic Feedback for High-Quality Unit Test Generation. ArXiv. https://arxiv.org/abs/2310.02368

We propose a novel technique called Reinforcement Learning from Static Quality Metrics (RLSQM).

  • We show that LLM-generated tests often exhibit undesirable test smells. We therefore use Proximal Policy Optimization (PPO) to train models that optimize a single quality metric at a time, and amalgamate these rewards into a unified reward model capturing multiple best practices and quality aspects of tests (see the sketch after this list).
  • We provide insights into how to reliably utilize RL to improve test-generation quality and into the effects of various training strategies.
  • The RL-optimized models consistently generated higher-quality test cases than the base LLM, improving quality metrics by up to 21%, and generated nearly 100% syntactically correct code.
  • RLSQM also outperformed GPT-4 on four out of seven metrics.
  • Our data are available at this https URL.
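
To make the reward design concrete, here is a minimal, hedged sketch of how individual static quality signals could be amalgamated into the single scalar reward that PPO optimizes. The two checks (syntactic validity and presence of an assertion), their implementations, and the weights are illustrative placeholders, not the paper's actual metric set or analyzers.

```python
import ast

# Placeholder quality signals standing in for the paper's static quality
# metrics; RLSQM's actual metrics, checkers, and weights differ.

def is_syntactically_valid(test_code: str) -> bool:
    """Signal 1: the generated test must parse as valid code."""
    try:
        ast.parse(test_code)
        return True
    except SyntaxError:
        return False

def has_assertion(test_code: str) -> bool:
    """Signal 2: a test with no assertion is a classic test smell."""
    return "assert" in test_code

def combined_reward(test_code: str, weights=(0.5, 0.5)) -> float:
    """Amalgamate per-metric signals into one scalar PPO reward.

    Illustrative only: the weights here are arbitrary, whereas the paper
    trains on one metric at a time before unifying the rewards.
    """
    signals = [is_syntactically_valid(test_code), has_assertion(test_code)]
    return sum(w * float(s) for w, s in zip(weights, signals))

# Usage: a well-formed test with an assertion earns the full reward;
# a syntactically broken, assertion-free one earns none.
print(combined_reward("def test_add():\n    assert add(1, 2) == 3\n"))  # 1.0
print(combined_reward("def test_add(:\n    pass"))                      # 0.0
```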