We propose a novel technique called Reinforcement Learning from Static Quality Metrics (RLSQM).
- We show that LLMs can generate tests that exhibit undesirable test smells. To address this, we use Proximal Policy Optimization (PPO) to train models that each optimize a single quality metric, and then combine these rewards into a unified reward model that captures different best practices and quality aspects of tests.
- We provide insights into how to reliably use RL to improve test generation quality and into the effects of various training strategies.
- The RL-optimized model consistently generated higher-quality test cases than the base LLM, improving the model by up to 21%, and generated nearly 100% syntactically correct code.
- RLSQM also outperformed GPT-4 on four out of seven metrics.
- Our data are available at this https URL.