A New Era for Blockchain Security: OpenAI and Paradigm Unveil EVMbench

In a decisive move to fortify the intersection of artificial intelligence and decentralized finance, OpenAI has announced a strategic partnership with crypto investment firm Paradigm. The collaboration introduces EVMbench, a comprehensive benchmark designed to evaluate the capabilities of AI agents in detecting, patching, and exploiting smart contract vulnerabilities.

As of February 2026, smart contracts across the crypto ecosystem secure over $100 billion in assets through open-source code, making them a lucrative target for malicious actors. The release of EVMbench represents a critical shift from theoretical AI application to practical, rigorous testing in economically meaningful environments. By providing a standardized framework, OpenAI and Paradigm aim to accelerate the development of defensive AI systems capable of auditing and strengthening code before it reaches mainnet.

This initiative underscores a growing recognition that as AI agents become proficient at reading and writing code, they must be rigorously tested against the specific, high-stakes constraints of the Ethereum Virtual Machine (EVM).

Deconstructing EVMbench: The Trinity of Security Tasks

EVMbench is not merely a dataset but a dynamic evaluation environment. It moves beyond static code analysis by immersing AI agents in a sandboxed blockchain environment where they must interact with live bytecode. The benchmark evaluates agents across three distinct but interconnected capability modes, each mimicking a critical phase in the lifecycle of smart contract security.

1. Detect: The Digital Auditor

In the detection mode, agents are tasked with auditing a smart contract repository. The objective is to identify ground-truth vulnerabilities—those that have been confirmed by human auditors—and flag them accurately. Agents are scored based on their "recall," or the percentage of known vulnerabilities they successfully identify. This mode challenges the AI's ability to understand complex logic flows and recognize patterns indicative of security flaws, such as reentrancy attacks or integer overflows.
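To make the scoring concrete, here is a minimal Python sketch of a recall calculation. The function name, the vulnerability labels, and the set-based matching are illustrative assumptions, not EVMbench's actual grading code, which may match findings to ground truth in a different way.

```python
def recall(flagged: set[str], ground_truth: set[str]) -> float:
    """Fraction of known (ground-truth) vulnerabilities that the agent flagged."""
    if not ground_truth:
        raise ValueError("ground_truth must not be empty")
    return len(flagged & ground_truth) / len(ground_truth)


# Hypothetical audit: three confirmed issues; the agent reports two of them
# plus one finding that is not in the ground truth.
ground_truth = {"reentrancy-withdraw", "overflow-mint", "access-control-pause"}
flagged = {"reentrancy-withdraw", "overflow-mint", "unused-variable"}

score = recall(flagged, ground_truth)  # 2 of 3 known issues found
```

Note that recall alone does not penalize the extra, unconfirmed finding; it only measures how many of the known issues were caught.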

2. Patch: The Surgical Fix

Perhaps the most complex of the three, the patch mode requires agents to not only find a vulnerability but to fix it. The constraints here are significant: the agent must modify the vulnerable contract to eliminate the exploit while preserving the original intended functionality. This is verified through a suite of automated tests. If an agent "fixes" a bug but inadvertently breaks the contract's core logic or introduces compilation errors, the attempt is marked as a failure. This mimics the real-world pressure on developers to apply hotfixes without disrupting protocol operations.
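The pass/fail rule described above can be sketched in a few lines of Python. The PatchResult fields and the grade_patch function are hypothetical names chosen for illustration; the sketch assumes a patch passes only when it compiles, keeps the functional tests green, and no longer allows the exploit, which is one reading of the description rather than the benchmark's real implementation.

```python
from dataclasses import dataclass


@dataclass
class PatchResult:
    compiles: bool                 # patched contract builds without errors
    functional_tests_passed: bool  # original intended behavior still works
    exploit_still_works: bool      # the known attack still succeeds


def grade_patch(result: PatchResult) -> bool:
    """Assumed rule: every condition must hold for the patch to count."""
    return (
        result.compiles
        and result.functional_tests_passed
        and not result.exploit_still_works
    )


# A fix that removes the bug and keeps behavior intact passes.
clean_fix = grade_patch(PatchResult(True, True, False))
# A fix that removes the bug but breaks core logic fails.
breaking_fix = grade_patch(PatchResult(True, False, False))
```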

3. Exploit: The Red Teamer

In this mode, agents act as attackers. They are given a deployed contract in a sandboxed environment and must execute an end-to-end attack to drain funds. Grading is performed programmatically via transaction replay and on-chain verification. This mode is critical for "Red Teaming"—using AI to simulate attacks so that defenses can be battle-tested against the most creative adversarial strategies.
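As a rough illustration of replay-then-verify grading, the toy Python sketch below replays a list of transfers against an in-memory ledger and then checks the final balances. Everything here is an assumption made for illustration: real EVM transactions are far richer than simple transfers, and the account names, the replay function, and the success criterion (victim drained, attacker richer) are hypothetical rather than taken from the benchmark.

```python
def replay(balances: dict[str, int], transfers: list[tuple[str, str, int]]) -> dict[str, int]:
    """Apply transfers in order to a copy of the starting balances."""
    state = dict(balances)
    for sender, recipient, amount in transfers:
        if amount < 0 or state.get(sender, 0) < amount:
            raise ValueError(f"invalid transfer from {sender}")
        state[sender] = state.get(sender, 0) - amount
        state[recipient] = state.get(recipient, 0) + amount
    return state


def exploit_succeeded(before: dict[str, int], after: dict[str, int],
                      victim: str, attacker: str) -> bool:
    """Assumed criterion: the victim is drained and the attacker gained funds."""
    return after.get(victim, 0) == 0 and after.get(attacker, 0) > before.get(attacker, 0)


# Hypothetical starting state and agent-submitted transfers.
before = {"vault": 100, "attacker": 1}
transfers = [("vault", "attacker", 60), ("vault", "attacker", 40)]

after = replay(before, transfers)
drained = exploit_succeeded(before, after, "vault", "attacker")
```

Because the replay works on a copy and applies the transfers in a fixed order, running it twice on the same inputs yields the same final state, which is the property that makes programmatic grading reproducible.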

The Dataset: Rooted in Reality

To ensure the benchmark reflects real-world risks, OpenAI and Paradigm curated 120 high-severity vulnerabilities from 40 different audits. The majority of these were sourced from open code audit competitions, such as Code4rena, which are known for surfacing subtle and high-impact bugs.

A notable addition to the dataset includes vulnerability scenarios drawn from the security auditing process for the Tempo blockchain. Tempo is a purpose-built Layer 1 blockchain designed for high-throughput, low-cost stablecoin payments. By including scenarios from Tempo, EVMbench extends its reach into payment-oriented smart contract code, a domain expected to see massive growth as agentic stablecoin payments become commonplace.

The technical infrastructure powering EVMbench is equally robust. It utilizes a Rust-based harness that deploys contracts and replays agent transactions deterministically. To prevent accidental harm, exploit tasks run in an isolated local Anvil environment rather than on live networks, ensuring that the testing ground is safe, reproducible, and contained.

Benchmarking the Frontier: GPT-5.3 Takes the Lead

The launch of EVMbench has provided the first public insights into how the latest generation of AI models performs in the crypto-security domain. OpenAI utilized the benchmark to test its frontier agents, revealing a significant leap in capabilities over the last six months.

The performance metrics highlight a dramatic improvement in "offensive" capabilities, specifically in the exploit mode. The data shows that the latest iteration of OpenAI's coding model, GPT-5.3-Codex, vastly outperforms its predecessor.

Table 1: Comparative Performance in Exploit Mode

Model Version         Execution Environment    Exploit Success Rate
GPT-5.3-Codex         Codex CLI                72.2%
GPT-5                 Standard                 31.9%
GPT-4o (Reference)    Standard                 < 15.0%

The jump from a 31.9% success rate with GPT-5 to 72.2% with GPT-5.3-Codex indicates that AI agents are becoming exceptionally proficient at identifying and executing exploit paths when given a clear, explicit objective (e.g., "drain funds").
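The size of that jump can be worked out directly from the two reported figures. The short Python calculation below uses only the success rates quoted in the table; the variable names are illustrative.

```python
gpt5_rate = 31.9    # GPT-5 exploit success rate, in percent
codex_rate = 72.2   # GPT-5.3-Codex exploit success rate, in percent

# Absolute gain in percentage points.
point_gain = round(codex_rate - gpt5_rate, 1)   # 40.3

# Relative improvement as a multiple of the older model's rate.
ratio = round(codex_rate / gpt5_rate, 2)        # 2.26
```

In other words, the newer model gains about 40 percentage points and succeeds a little more than twice as often as its predecessor on the exploit tasks.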

The Offensive-Defensive Gap

However, the benchmark also revealed a persistent gap between offensive and defensive capabilities. While agents excelled at the Exploit task, their performance on Detect and Patch tasks remained lower.

  • Detection Challenges: Agents often stopped auditing after finding a single issue, failing to perform the exhaustive review required to certify a codebase as safe.
  • Patching Complexities: The requirement to maintain full functionality while removing subtle bugs proved difficult. Agents frequently generated patches that fixed the security flaw but broke the contract's intended utility—a "cure is worse than the disease" scenario that is unacceptable in production environments.

Strategic Implications for the Crypto Industry

The collaboration between OpenAI and Paradigm signals a maturing of the "AI x Crypto" narrative. Paradigm, known for its deep technical expertise and research-first approach to crypto investing, provided the domain knowledge necessary to ensure the benchmark's tasks were not just syntactically correct, but semantically meaningful to blockchain developers.

By releasing EVMbench's tasks, tooling, and evaluation framework as open source, the partners are effectively issuing a "call to arms" for the developer community. The goal is to democratize access to high-level security tools, allowing individual developers and small teams to audit their smart contracts with the same rigor as top-tier security firms.

Expanding the Defensive Toolkit: Project Aardvark

In conjunction with the benchmark release, OpenAI announced the expansion of the private beta for Aardvark, its dedicated security research agent. Aardvark represents the practical application of the insights gained from EVMbench: an AI agent specifically fine-tuned for defensive security tasks.

Furthermore, OpenAI is committing $10 million in API credits to accelerate cyber defense research. This grant program focuses on applying the company's most capable models to protect open-source software and critical infrastructure systems, ensuring that the benefits of AI security are distributed widely across the ecosystem.

The Road Ahead

The introduction of EVMbench serves as both a measurement tool and a warning. The rapid improvement in AI's ability to exploit contracts (evidenced by the 72.2% success rate of GPT-5.3-Codex) suggests that the window for "security by obscurity" is closing fast. As AI agents become more capable attackers, defensive tools must evolve at an equal or greater velocity.

For the blockchain industry, this means that AI-assisted auditing will soon graduate from a luxury to a necessity. Future iterations of EVMbench may expand to include multi-chain environments, cross-bridge vulnerabilities, and more complex social engineering attacks, mirroring the evolving threat landscape of Web3.

As we move deeper into 2026, the synergy between OpenAI's reasoning engines and Paradigm's crypto-native insights sets a new standard for how we approach digital trust. The question is no longer if AI will be used to secure smart contracts, but how quickly the industry can adopt these benchmarks to stay ahead of the next generation of automated threats.

