Benjamin Steenhoek, Hongyang Gao, and Wei Le. 2024. Dataflow Analysis-Inspired Deep Learning for Efficient Vulnerability Detection. In 2024 IEEE/ACM 46th International Conference on Software Engineering (ICSE ’24), April 14–20, 2024, Lisbon, Portugal. ACM, New York, NY, USA, 13 pages. https://doi.org/10.1145/3597503.3623345
PhD student seeking a machine learning research internship.
Research interests lie at the intersection of machine learning and software engineering.
Experience in deep learning, machine learning, program analysis, and software development.
PhD Computer Science
- Research interests: deep learning-based vulnerability detection, graph neural networks, ML for SE.
MS Computer Science
- Thesis: Refactoring programs to improve the performance of deep learning for vulnerability detection.
- Released code as open-source library cfactor.
- GPA 3.91/4.00.
BS Computer Science
- Magna Cum Laude honors (GPA 3.84/4.00).
- First author, "An Empirical Study of Deep Learning Models for Vulnerability Detection" (accepted ICSE 2023; 26% acceptance rate).
- First author, "Dataflow Analysis-Inspired Deep Learning for Efficient Vulnerability Detection" (accepted ICSE 2024).
- Contributing author, "TRACED: Execution-aware Pre-training for Source Code" (accepted ICSE 2024).
- Contributing author, "Validating static warnings via testing code fragments" (accepted ISSTA 2021; 27% acceptance rate).
- Conducted novel research resulting in 2 first-author conference papers submitted to ICSE 2024, one accepted and one presently under review.
- Collaborated with ARiSE lab at Columbia University and CERT lab at Carnegie Mellon University.
- Improved experiment iteration time by building tools for static analysis (tree-climber) and dynamic analysis and code generation (pal-tools).
- Enabled collaboration on experiments by acquiring and maintaining bug benchmarks for use in experiments.
- Conducted research on improving large language models such as Codex using reinforcement learning.
Software Developer Intern
- Fall 2021: Democratized public datasets by adding GIS capability for geolocation and remote sensing.
- Summer 2020: Widened customer reach by integrating AgFiniti with John Deere data platform.
- Summer 2018: Enabled agronomic analysis by maintaining a domain-specific language using Antlr.
- Instructed 30 students in weekly labs and office hours.
- Volunteered to create a GUI visualization for Conway's Game of Life (gol-gui) to increase student engagement.
Freelance Software Developer
- Collaborated with 2 other developers to create an Amazon product listing web app using C#, ASP.NET Core, SQL Server, and Azure cloud services.
Yangruibo Ding, Benjamin Steenhoek, Kexin Pei, Gail Kaiser, Wei Le, and Baishakhi Ray. 2024. TRACED: Execution-aware Pre-training for Source Code. In 2024 IEEE/ACM 46th International Conference on Software Engineering (ICSE ’24), April 14–20, 2024, Lisbon, Portugal. ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3597503.3608140
Benjamin Steenhoek, Md Mahbubur Rahman, Richard Jiles, and Wei Le. 2023. An Empirical Study of Deep Learning Models for Vulnerability Detection. In Proceedings of the 45th International Conference on Software Engineering (ICSE 2023). Association for Computing Machinery, New York, NY, USA.
Guo, X., Joshy, A. K., Steenhoek, B., Le, W., & Flynn, L. (2023). A Study of Static Warning Cascading Tools (Experience Paper) (arXiv:2305.02515). arXiv.
Steenhoek, Benjamin. (2022). Refactoring programs to improve the performance of deep learning for vulnerability detection (Poster). Presented at: Iowa State University 6th Annual Research Day.
Steenhoek, Benjamin. (2021). Refactoring programs to improve the performance of deep learning for vulnerability detection (Order No. 28648161). Available from Dissertations & Theses @ Iowa State University; ProQuest Dissertations & Theses Global. (2625295478).
Ashwin Kallingal Joshy, Xueyuan Chen, Benjamin Steenhoek, and Wei Le. 2021. Validating static warnings via testing code fragments. In Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2021). Association for Computing Machinery, New York, NY, USA, 540–552.
- cfactor: Scalable, policy-driven refactoring for C programs (Python/srcML).
- tree-climber: Scalable program analysis tools for C built on tree-sitter (Python).
- pal-tools: Dynamic analysis and code generation, using Intel Pin (C++) and LLVM (Python).
- rarl: Reproduction of Robust Adversarial Reinforcement Learning (Pinto et al. 2017) (PyTorch/stable-baselines).
- animal-cognitive: Deep reinforcement learning models with embodied animal cognition (PyTorch/rllib).
- precise-interrupts: Reproducing a historical interrupt handling paper in ARM architecture (C++/gem5).
- Machine learning and data scraping: PyTorch, rllib, pandas, numpy, Selenium, beautifulsoup.
- Web development: Vue, ASP.NET Core, .NET Framework, SQL Server, Azure Functions, ACI, VMs, ML Studio.
- Computer architecture and program analysis: Antlr, LLVM, Intel Pin, gem5, abstract interpretation, fuzzing.
- DevOps: Git, Azure DevOps, CI/CD, Slurm batch processing, Linux server administration.
- Science education outreach at Greenville County Juvenile Detention, Fall 2018/Spring 2019.
- Vice president of Phi Beta Chi society, Spring 2018.