OpenAI and Paradigm introduced EVMbench on February 19, 2026, a benchmark designed to assess how well AI agents can identify, exploit, and patch high-severity vulnerabilities in Ethereum Virtual Machine smart contracts. The launch comes amid rising demand for automated security tools: high-profile exploits continue to drain billions from DeFi protocols, and AI agents increasingly show promise in code auditing tasks.
Introducing EVMbench—a new benchmark that measures how well AI agents can detect, exploit, and patch high-severity smart contract vulnerabilities. https://t.co/op5zufgAGH
— OpenAI (@OpenAI) February 18, 2026
The benchmark draws on 120 curated vulnerabilities from 40 audits, most of them sourced from open code audit competitions, along with scenarios inspired by the security processes of the Paradigm-backed Tempo blockchain.
These high-severity bugs span common real-world categories such as reentrancy, access control failures, arithmetic overflows and underflows, oracle manipulation, improper authorization, flash loan exploits, and logic errors that lead to fund drainage. These categories have historically accounted for the majority of major DeFi losses.
The benchmark assigns each agent a percentage-based score that summarizes how effectively it audits contracts, patches issues without breaking functionality, and exploits vulnerabilities. It works by presenting agents with vulnerable code samples and evaluating their outputs against predefined criteria for detection accuracy, patch viability, and exploit success.
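To make the idea of a percentage-based score concrete, here is a minimal sketch of how per-mode pass rates could be computed. It is an illustration only: the `TaskResult` type, the `score_by_mode` function, and the simple pass/fail aggregation are assumptions for this example, not EVMbench's published scoring code.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    # Hypothetical record for one graded task; not an EVMbench data type.
    mode: str      # "detect", "patch", or "exploit"
    passed: bool   # whether the agent's output met the grading criteria


def score_by_mode(results):
    """Return {mode: percentage of tasks passed} for each mode present."""
    totals = {}
    passes = {}
    for r in results:
        totals[r.mode] = totals.get(r.mode, 0) + 1
        passes[r.mode] = passes.get(r.mode, 0) + (1 if r.passed else 0)
    return {mode: 100.0 * passes[mode] / totals[mode] for mode in totals}


# Made-up results, purely for illustration.
results = [
    TaskResult("detect", True),
    TaskResult("detect", False),
    TaskResult("patch", True),
    TaskResult("exploit", False),
]
scores = score_by_mode(results)
print(scores)
```

With these made-up results, the agent would score 50% on detection, 100% on patching, and 0% on exploitation.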
We recently saw a $1.78M exploit caused by a vulnerability written by Claude Opus 4.6.
cbETH was priced at $1 instead of $2,000.
Not long after @OpenAI launched EVMbench. To put it simply, it’s a benchmark that evaluates AI agents' ability to interact with smart contracts.… pic.twitter.com/Tz9XjveKFd
— The Smart Ape 🔥 (@the_smart_ape) February 19, 2026
EVMbench operates in three modes: detect, patch and exploit.
EVMbench runs AI agents inside isolated Ubuntu 24.04 Docker containers pre-loaded with Foundry. Agents receive the original audit scope, automated findings, and sponsor hints, which closely replicates real-world auditor workflows. Web access is intentionally disabled to prevent external lookups or cheating.
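For readers curious what an offline sandbox of this kind might look like in practice, the sketch below builds a `docker run` command with networking turned off. The image name, mount path, and helper function are hypothetical and chosen for illustration; only the Docker flags themselves (`--rm`, `--network none`, `--volume`, `--workdir`) and Foundry's `forge test` command are real. It does not describe EVMbench's actual harness.

```python
def build_sandbox_command(image, workdir, agent_cmd):
    """Build (but do not run) a `docker run` argv for an offline sandbox."""
    return [
        "docker", "run",
        "--rm",                            # remove the container when it exits
        "--network", "none",               # no network access from inside the container
        "--volume", f"{workdir}:/audit",   # mount the audit scope into the container
        "--workdir", "/audit",             # start in the mounted directory
        image,
        *agent_cmd,
    ]


cmd = build_sandbox_command(
    image="evmbench-sandbox:example",      # hypothetical image (Ubuntu 24.04 + Foundry assumed)
    workdir="/tmp/audit-scope",            # hypothetical host path
    agent_cmd=["forge", "test"],           # Foundry's test runner
)
print(" ".join(cmd))
```

Passing the resulting list to `subprocess.run` would launch the container; the sketch stops short of that so it can be read without Docker installed.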
so let me get this straight, majority of crypto is in existential crisis while
– blackrock partners with uniswap
– apollo partners with morpho
– openai rolls out evmbench
– hyperliquid launches us policy center
genuinely some of the best news this space has seen in a very long…
— niko (@saintniko) February 18, 2026
This setup makes programmatic grading of detection, patching, and exploitation tasks possible by executing the agent’s actions in a local Ethereum environment. It also verifies outcomes through on-chain state changes, balance deltas, and events rather than subjective human review.
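Grading by balance deltas can be pictured with a small sketch. The version below compares two balance snapshots and passes an exploit attempt only if the victim contract lost, and the attacker gained, at least a threshold amount. The function names, the dictionary-based snapshots, and the threshold rule are assumptions for illustration, not EVMbench's actual grader.

```python
def balance_delta(before, after, account):
    """Change in an account's balance (in wei) between two snapshots."""
    return after.get(account, 0) - before.get(account, 0)


def grade_exploit(before, after, attacker, victim, min_drain_wei):
    """Pass only if the victim lost, and the attacker gained, at least min_drain_wei."""
    attacker_gain = balance_delta(before, after, attacker)
    victim_loss = -balance_delta(before, after, victim)
    return attacker_gain >= min_drain_wei and victim_loss >= min_drain_wei


ETH = 10**18  # wei per ether

# Hypothetical snapshots taken before and after the agent's transactions.
before = {"attacker": 1 * ETH, "vault": 100 * ETH}
after = {"attacker": 91 * ETH, "vault": 10 * ETH}

passed = grade_exploit(before, after, "attacker", "vault", min_drain_wei=50 * ETH)
unchanged = grade_exploit(before, before, "attacker", "vault", min_drain_wei=50 * ETH)
print(passed, unchanged)
```

Because the check depends only on recorded state, the same inputs always produce the same verdict, which is the property that lets grading run without human review.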
The developers note that while EVMbench’s vulnerabilities are realistic and high-severity, the benchmark does not capture the full complexity of real-world smart contract security. It focuses on isolated scenarios rather than comprehensive system audits, potentially limiting its representation of production environments.
EVMbench establishes a standardized, quantifiable benchmark for AI agents in smart contract security, similar to Hugging Face’s Open LLM Leaderboard but specialized for EVM risks. It offers clear comparisons across agents while pointing out gaps in current capabilities. Unlike general-purpose LLM leaderboards or narrower cyber evaluations such as CyBench, EVMbench focuses on end-to-end real-world tasks: detection, patching with functional preservation, and exploit execution.
The 120-vulnerability dataset offers more nuance than simpler tests, but its curated nature sacrifices breadth for depth, potentially underrepresenting edge cases or chain-specific issues. In the long term, it could shift security workflows toward hybrid human-AI models. Success in 2026, however, would include developing the benchmark further so that it captures the full realities of real-world smart contract security.
