
    RVAsec 2024: Caleb Gross / Josh Shomo - Patch Perfect: Harmonizing with LLMs to Find Security Vulns

    Introduction

    Hey there, Richmond! It's time to kick off the tech track at RVAsec. Today, we're going to delve into how we at Bishop Fox have been leveraging large language models (LLMs) to identify security vulnerabilities. My name is Caleb Gross, and I work alongside Josh Shomo. Our team has been focusing on how LLMs can be harmonized to enhance vulnerability research, particularly in the context of n-day vulnerability analysis.

    Basic Concepts

    N-day Vulnerability Analysis

    To start, let's understand what n-day vulnerability analysis involves. A zero-day vulnerability is undisclosed and unpatched, a one-day is disclosed but unpatched, and an n-day is both disclosed and patched. Our primary goal is to diff these patches against the pre-patch binaries and use the changes to rediscover the underlying vulnerability.
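    To make the patch-diffing idea concrete, here is a toy sketch in Python. It text-diffs two exported pseudocode dumps using the standard library's difflib; the file names are hypothetical, and a real workflow would use a dedicated binary-diffing tool (e.g., BinDiff or Diaphora) rather than a text diff.

    ```python
    # Toy illustration of patch diffing: compare decompiled pseudocode
    # exported from the pre-patch and post-patch binaries.
    # File names are hypothetical.
    import difflib
    from pathlib import Path

    old = Path("gateway_v1_decompiled.c").read_text().splitlines()
    new = Path("gateway_v2_decompiled.c").read_text().splitlines()

    # The changed hunks point at the code the vendor touched -- the
    # starting point for locating the patched vulnerability.
    for line in difflib.unified_diff(old, new, "v1", "v2", lineterm=""):
        print(line)
    ```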

    Using LLMs for Vulnerability Research

    Initial Steps

    For our research, we used GPT-4 and explored its capabilities for static analysis of pseudocode obtained from decompiled patch diffs. Early testing showed that GPT-4 significantly outperformed other models.
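    As a rough illustration of this kind of query (not the speakers' actual prompts), a minimal sketch using the OpenAI Python SDK might look like the following; the prompt wording and the assess_function_diff helper are our own assumptions.

    ```python
    # Minimal sketch: ask GPT-4 whether a patched function looks
    # security-relevant. Requires the `openai` package and an
    # OPENAI_API_KEY in the environment; the prompt is illustrative.
    from openai import OpenAI

    client = OpenAI()

    def assess_function_diff(pseudocode_diff: str) -> str:
        """Describe a decompiled diff hunk and rate its security relevance."""
        response = client.chat.completions.create(
            model="gpt-4",
            messages=[
                {"role": "system",
                 "content": "You are a vulnerability researcher reviewing "
                            "decompiled pseudocode diffs from a security patch."},
                {"role": "user",
                 "content": "Describe what this change fixes and rate from 0-10 "
                            "how likely it is to address a security bug:\n\n"
                            + pseudocode_diff},
            ],
        )
        return response.choices[0].message.content
    ```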

    Challenges and Considerations

    Several factors influence how well this works: keeping the context supplied to the model consistent, the non-deterministic nature of LLM responses, and the limits imposed by the context window size. When the analysis is scoped precisely, however, GPT-4 has shown real promise in vulnerability identification.

    Practical Application - Case Studies

    Case Study 1: Citrix Gateway

    We started with an advisory describing a Citrix Gateway vulnerability: an unauthenticated remote code execution bug. We downloaded, extracted, and diffed the relevant binaries, fed the changes into GPT-4, and split the analysis into sub-tasks. Despite some overstated initial assessments from GPT-4, refining our approach allowed us to prioritize the most relevant functions effectively.
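    One plausible way to structure that sub-task split (a hypothetical sketch, not the speakers' tooling) is to score each changed function independently and sort by score. This reuses the assess_function_diff helper sketched above and naively parses a numeric score out of the model's free-text reply.

    ```python
    # Hypothetical sub-task split: score each changed function on its own,
    # then sort to get a review-priority list. Score parsing is simplified.
    import re

    def score_functions(diffs: dict[str, str]) -> list[tuple[str, int]]:
        scores = []
        for name, diff in diffs.items():
            answer = assess_function_diff(diff)
            match = re.search(r"\b(10|\d)\b", answer)  # first 0-10 number
            scores.append((name, int(match.group(1)) if match else 0))
        # Highest-scoring functions get manual review first.
        return sorted(scores, key=lambda pair: pair[1], reverse=True)
    ```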

    Case Study 2: FortiGate SSL VPN

    In another example, we targeted an out-of-bounds write vulnerability in FortiGate SSL VPN. This binary lacked symbols, which made it a different kind of challenge. By adjusting our prompts for this advisory, we managed to rank the vulnerable function as the second most likely candidate, showcasing the method's effectiveness even with unsymbolized binaries.
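    A stripped binary forces the prompt to compensate for missing names. The wording below is purely illustrative of that kind of adjustment (our assumption, not the prompt used in the talk):

    ```python
    # Illustrative prompt tweak for stripped binaries: ask the model to
    # infer each function's purpose before rating the patch.
    STRIPPED_BINARY_PROMPT = (
        "The following pseudocode comes from a stripped binary, so function "
        "and variable names are auto-generated (e.g., sub_4010A0, v12). "
        "Infer the function's likely purpose from its control flow, "
        "constants, and string references, then describe what the patch "
        "changes and rate from 0-10 how likely it fixes the reported "
        "out-of-bounds write."
    )
    ```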

    Case Study 3: Checkpoint Quantum Gateway

    Recently, we applied the same method to a new patch for Checkpoint Quantum Gateway. With a larger number of modified functions, the context window limits became apparent. However, using a multi-sample sort algorithm, which averages scores across multiple runs, we were still able to rank the actual fix near the top of the list.
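    A minimal sketch of that multi-sample idea, building on the hypothetical score_functions helper above: query each function several times and rank by the mean score, which damps the run-to-run variance of any single response.

    ```python
    # Sketch of multi-sample sorting: average scores over several runs to
    # compensate for non-deterministic LLM responses.
    from statistics import mean

    def multi_sample_rank(diffs: dict[str, str], samples: int = 5):
        totals: dict[str, list[int]] = {name: [] for name in diffs}
        for _ in range(samples):
            for name, score in score_functions(diffs):
                totals[name].append(score)
        averaged = [(name, mean(scores)) for name, scores in totals.items()]
        return sorted(averaged, key=lambda pair: pair[1], reverse=True)
    ```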

    Lessons Learned and Future Directions

    Key Findings

    1. Description and Ranking: We now have tools for describing and ranking functions related to patched vulnerabilities.
    2. Comparison and Mitigation: Employing a multi-sample sorting algorithm compensates for non-determinism in LLM responses.
    3. Broader Implications: Our findings support the use of LLMs as effective tools for vulnerability research augmentation.

    Suggested Further Research

    1. Expanded Contexts: Including full incoming and outgoing call trees for more insightful analysis.
    2. Dynamic Testing: Integrating fuzzing or crash attempts to complement static analysis.
    3. Specific Domain Models: Exploring other models designed specifically for code analysis.

    Conclusion

    While LLMs might not replace human analysts in the near future, they can significantly enhance vulnerability assessment by providing a better starting point. This hybrid approach offers a practical way forward in the evolving landscape of cybersecurity research.

    Keywords

    • Large Language Models (LLMs)
    • Vulnerability Analysis
    • N-day Vulnerability
    • Static Code Analysis
    • GPT-4
    • Patch Diff
    • Cybersecurity Research

    FAQ

    What is an n-day vulnerability?

    An n-day vulnerability is a security vulnerability that has been both disclosed and patched.

    How effective are LLMs like GPT-4 for vulnerability research?

    LLMs, particularly GPT-4, have shown significant promise in identifying and prioritizing security vulnerabilities but work best when combined with human oversight.

    What challenges did you encounter using LLMs for vulnerability analysis?

    The main challenges included the non-deterministic responses from LLMs and the limitations of context window size. Multi-sample sorting and refining prompts helped mitigate these issues.

    What future improvements could enhance LLM-guided vulnerability research?

    Incorporating more extensive call trees, dynamic testing mechanisms, and exploring specific domain models could further improve the effectiveness of LLM-guided vulnerability research.
