Hey there, Richmond! It's time to kick off the tech track at RVAsec. Today, we're going to delve into how we at Bishop Fox have been leveraging large language models (LLMs) to identify security vulnerabilities. My name is Caleb Gross, and I work alongside Josh Shomo. Our team has been focusing on how LLMs can be harnessed to enhance vulnerability research, particularly in the context of n-day vulnerability analysis.
To start, let's define what n-day vulnerability analysis involves. A zero-day vulnerability is undisclosed and unpatched, a one-day is disclosed but unpatched, and an n-day is both disclosed and patched. Our primary goal is to diff these patches against the unpatched binaries and use the differences to rediscover the underlying vulnerability.
For our research, we used GPT-4 and explored its capabilities for static analysis of pseudocode obtained from decompiled patch diffs. Early findings showed that GPT-4 significantly outperformed other models at this task.
Several factors influence how well LLMs perform here, including the consistency of the supplied context, the non-deterministic nature of LLM responses, and the limits imposed by the context window size. GPT-4, however, has shown promise in vulnerability identification when the task is scoped precisely.
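One practical consequence of the context window limit is that decompiled pseudocode for all changed functions rarely fits in a single request. Below is a minimal sketch of how changed functions might be grouped into batches that respect a per-request token budget. The `count_tokens` heuristic and the `batch_functions` helper are illustrative assumptions, not the talk's actual tooling (a real implementation would use the model's tokenizer rather than a character-count approximation).

```python
# Hypothetical sketch: batching decompiled functions so each LLM request
# stays under a token budget. count_tokens is a crude stand-in for a real
# tokenizer (e.g., roughly 4 characters per token for English-like text).

def count_tokens(text: str) -> int:
    # Rough approximation; real usage would call the model's tokenizer.
    return len(text) // 4

def batch_functions(functions: dict[str, str], budget: int = 6000) -> list[list[str]]:
    """Group function names into batches whose combined pseudocode
    fits within the per-request token budget."""
    batches, current, used = [], [], 0
    for name, pseudocode in functions.items():
        cost = count_tokens(pseudocode)
        if current and used + cost > budget:
            batches.append(current)
            current, used = [], 0
        current.append(name)
        used += cost
    if current:
        batches.append(current)
    return batches
```

Each batch can then be analyzed in its own request, with the results merged afterward.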
We started our research with an advisory detailing a Citrix Gateway vulnerability: an unauthenticated remote code execution bug. We downloaded, extracted, and diffed the relevant binaries, fed the details into GPT-4, and split the analysis into sub-tasks. Despite some initially overstated assessments from GPT-4, refining our approach allowed us to prioritize the most relevant functions effectively.
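The triage step above can be sketched as asking the model to score each changed function for how likely it is to be the security fix, then sorting by score. This is an illustrative reconstruction, not Bishop Fox's actual code: the prompt wording is invented, and `score_function` is stubbed with a trivial heuristic where a real implementation would send the prompt to the model and parse its numeric reply.

```python
# Hypothetical sketch of per-function triage: score each changed function
# for likelihood of being the patched vulnerability, then rank.

PROMPT = (
    "The following pseudocode changed between the patched and unpatched "
    "builds. On a scale of 1-10, how likely is this change to be the "
    "security fix described in the advisory? Reply with a number only.\n\n{diff}"
)

def score_function(diff_text: str) -> int:
    # Placeholder: a real implementation would send PROMPT.format(diff=diff_text)
    # to the model and parse the numeric reply. This stub just flags memcpy use.
    return min(10, diff_text.count("memcpy") * 3 + 1)

def rank_changed_functions(diffs: dict[str, str]) -> list[tuple[str, int]]:
    """Return changed functions ordered from most to least suspicious."""
    scored = [(name, score_function(text)) for name, text in diffs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

The analyst then starts reverse engineering from the top of the ranked list instead of examining every change.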
In another example, we analyzed an out-of-bounds write vulnerability in FortiGate's SSL VPN. This binary lacked symbols, which posed a different challenge. By adjusting our prompts, we got the model to rank the vulnerable function as the second most likely candidate, showcasing the method's effectiveness even on stripped binaries.
Recently, we applied the same method to a new patch for Checkpoint Quantum Gateway. Given the larger number of modifications, the limits of the context window became apparent. However, using a multi-sample sort, which averages scores across multiple runs, we could still rank the actual fix near the top of the list.
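The multi-sample sort can be sketched as follows: because a single LLM scoring pass is non-deterministic, score each function several times and rank by the mean. This is a minimal illustration under stated assumptions; `sample_score` stands in for one non-deterministic LLM scoring call, and the run count of 5 is an arbitrary example, not the talk's actual parameter.

```python
import statistics

# Hypothetical sketch of a multi-sample sort: repeat the non-deterministic
# scoring step several times per function and rank by the mean score,
# smoothing out run-to-run variance in the model's answers.

def multi_sample_rank(functions: list[str], sample_score, runs: int = 5) -> list[str]:
    """Rank function names by their mean score across repeated samples."""
    averages = {
        name: statistics.mean(sample_score(name) for _ in range(runs))
        for name in functions
    }
    return sorted(averages, key=averages.get, reverse=True)
```

Averaging trades extra API calls for a more stable ranking, which matters when the candidate list is too long for an analyst to review exhaustively.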
While LLMs might not replace human analysts in the near future, they can significantly enhance vulnerability assessment by providing a better starting point. This hybrid approach offers a practical way forward in the evolving landscape of cybersecurity research.
An n-day vulnerability is a security vulnerability that has been both disclosed and patched.
LLMs, particularly GPT-4, have shown significant promise in identifying and prioritizing security vulnerabilities but work best when combined with human oversight.
The main challenges included the non-deterministic responses from LLMs and the limitations of context window size. Multi-sample sorting and refining prompts helped mitigate these issues.
Incorporating more extensive call trees, dynamic testing mechanisms, and domain-specific models could further improve the effectiveness of LLM-guided vulnerability research.