We present DeepDFA, a dataflow analysis-inspired graph learning framework and an embedding technique that enables graph learning to simulate dataflow computation. We show that DeepDFA is both performant and efficient.
- DeepDFA outperformed all non-transformer baselines.
- It was trained in 9 minutes, 75x faster than the highest-performing baseline model.
- When using only 50+ vulnerable and several hundreds of total examples as training data, the model retained the same performance as 100% of the dataset.
- DeepDFA also generalized to real-world vulnerabilities in DbgBench; it detected 8.7 out of 17 vulnerabilities on average across folds and was able to distinguish between patched and buggy versions, while the highest-performing baseline models did not detect any vulnerabilities.
- By combining DeepDFA with a large language model, we surpassed the state-of-the-art vulnerability detection performance on the Big-Vul dataset with 96.46 F1 score, 97.82 precision, and 95.14 recall.