Dataflow Analysis-Inspired Deep Learning for Efficient Vulnerability Detection

Published in ICSE, 2024

Recommended citation: Benjamin Steenhoek, Hongyang Gao, and Wei Le. 2024. Dataflow Analysis-Inspired Deep Learning for Efficient Vulnerability Detection. In 2024 IEEE/ACM 46th International Conference on Software Engineering (ICSE ’24), April 14–20, 2024, Lisbon, Portugal. ACM, New York, NY, USA, 13 pages.

We present DeepDFA, a dataflow analysis-inspired graph learning framework and an embedding technique that enables graph learning to simulate dataflow computation. We show that DeepDFA is both performant and efficient.

  • DeepDFA outperformed all non-transformer baselines.
  • It was trained in 9 minutes, 75x faster than the highest-performing baseline model.
  • When using only 50+ vulnerable and several hundreds of total examples as training data, the model retained the same performance as 100% of the dataset.
  • DeepDFA also generalized to real-world vulnerabilities in DbgBench; it detected 8.7 out of 17 vulnerabilities on average across folds and was able to distinguish between patched and buggy versions, while the highest-performing baseline models did not detect any vulnerabilities.
  • By combining DeepDFA with a large language model, we surpassed the state-of-the-art vulnerability detection performance on the Big-Vul dataset with 96.46 F1 score, 97.82 precision, and 95.14 recall.