Security researcher Sean Heelan has uncovered a new zero-day vulnerability in the Linux kernel using OpenAI’s advanced o3 reasoning model. This marks the first time an AI model has identified a security flaw in a complex system like the Linux kernel, which powers millions of servers and devices. The vulnerability has been officially cataloged as CVE-2025-37899.
In a blog post, Heelan explains that he was auditing the ksmbd module for vulnerabilities using OpenAI’s o3 AI model directly through the API, without relying on any external tools. For context, “ksmbd is a Linux kernel server that implements the SMB3 protocol within kernel space to enable file sharing over a network.”
In this instance, the o3 model grasped how concurrent connections to the server worked and pinpointed “a spot where an object without reference counting gets freed but remains accessible by another thread.” Essentially, o3 discovered a serious “use-after-free” vulnerability in the handler for the SMB ‘logoff’ command.
The o3 model analyzed all of the SMB command handlers, around 12,000 lines of code (roughly 100,000 tokens). A patch fixing the issue has already been submitted and merged into the official Linux kernel repository on GitHub. According to Heelan, this is the first time the full cycle has played out: an AI discovered a bug, a human confirmed it, an official patch was released, and the vulnerability was resolved.
Interestingly, the researcher discovered the new security bug while benchmarking AI models like Claude 3.7 Sonnet, Claude 3.5 Sonnet, and OpenAI o3 on a separate vulnerability, the Kerberos authentication flaw (CVE-2025-37778). According to Heelan, o3 identified the Kerberos issue in 8 out of 100 runs, Claude 3.7 Sonnet found it in 3 runs, while Claude 3.5 Sonnet failed to detect it in any of its 100 attempts.
Lastly, the researcher emphasizes that “o3 is not infallible,” but acknowledges that recent reasoning models have made major strides in comprehending large codebases. If your project is under 10,000 lines of code, models like o3 can assist in problem-solving. When it comes to vulnerability research, these advanced models can make your work “significantly more efficient and effective.”