Cursor Composer 2.5 Matches Frontier Benchmarks at One-Tenth the Cost
Cursor shipped Composer 2.5 on May 18, 2026, positioning it as a drop-in replacement for frontier-class coding agents at significantly lower cost. The model achieves 79.8% on SWE-Bench Multilingual and 63.2% on CursorBench v3.1 — scores matching Claude Opus 4.7 and GPT-5.5 on these benchmarks — while priced at $0.50 per million input tokens and $2.50 per million output tokens in the standard variant (a faster tier runs at $3.00/$15.00). Composer 2.5 was trained on 25x more synthetic tasks than its predecessor, using targeted reinforcement learning with textual feedback that allowed nuanced behavior improvements beyond what traditional benchmark-oriented training captures.
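To make the pricing concrete, here is a minimal sketch that estimates spend from the per-million-token rates quoted above. The function name, the tier labels, and the example workload are illustrative assumptions and are not part of Cursor's SDK or billing API.

```python
from decimal import Decimal

# Rates in USD per million tokens, as quoted in the article.
# The keys "standard" and "fast" are labels chosen for this sketch.
RATES = {
    "standard": {"input": Decimal("0.50"), "output": Decimal("2.50")},
    "fast": {"input": Decimal("3.00"), "output": Decimal("15.00")},
}

MILLION = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int, tier: str = "standard") -> Decimal:
    """Return the estimated USD cost for a token volume, rounded to cents."""
    rate = RATES[tier]
    cost = (
        Decimal(input_tokens) * rate["input"]
        + Decimal(output_tokens) * rate["output"]
    ) / MILLION
    return cost.quantize(Decimal("0.01"))


if __name__ == "__main__":
    # Hypothetical monthly workload: 40M input tokens, 8M output tokens.
    for tier in RATES:
        print(tier, estimate_cost(40_000_000, 8_000_000, tier))
```

For that assumed workload the standard tier comes to $40.00 and the fast tier to $240.00, so the choice of tier matters as much as the choice of model for high-volume jobs.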
The most notable infrastructure addition is multi-repo cloud agent support. Background agents and automations in Cursor now support multi-root workspaces, allowing developers to configure a single agent environment spanning multiple repositories — enabling cross-repo refactors, shared library updates, and monorepo-style coordination across separately versioned projects. Cloud agents now persist these multi-repo configurations across sessions, eliminating per-session setup overhead. Cursor also expanded JetBrains IDE support (added in March 2026) with improved Composer 2.5 integration, removing the VS Code dependency for teams standardized on IntelliJ-based editors.
The Cursor SDK has been updated to expose the Composer 2.5 agent as a callable interface, enabling platform teams to embed Cursor's long-horizon reasoning capabilities inside CI pipelines and custom tooling. For organizations evaluating agentic coding tools, Composer 2.5 fundamentally changes the cost calculus: teams previously priced out of frontier model usage for high-volume tasks can now access comparable quality at commodity inference rates.
Read more — Cursor
GitHub Copilot Overhauls Plans with AI Credits and a New Max Tier
GitHub announced a restructured billing model for Copilot individual plans on May 12, 2026, effective June 1. The existing credit-based approach is being replaced by GitHub AI Credits, where 1 AI credit equals $0.01 USD, and token usage is priced per model according to published API rates. The four tiers (Free, Pro at $10/month, Pro+ at $39/month, and the new Max at $100/month) each include base credits that map directly to the subscription price, plus variable flex allotments that GitHub can adjust as model economics evolve. Code completions and Next Edit suggestions remain unmetered on all paid plans.
The new Max plan is aimed at heavy individual users who previously hit plan limits frequently: it provides $100 in base credits plus $100 in flex credits per month, the highest included usage of any individual Copilot tier. Existing Pro and Pro+ subscribers will be automatically migrated to the new structure on June 1 with expanded included usage — no action is required. The flex allotment concept is designed to let GitHub pass efficiency gains from cheaper models forward to users while protecting base credit guarantees from pricing volatility.
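The credit arithmetic can be sketched in a few lines. This is a rough illustration that assumes base credits equal the subscription price converted at 1 credit = $0.01. Only the Max flex amount is stated above, so flex values for other tiers are left out rather than guessed. All function names are hypothetical.

```python
from typing import Optional

CREDITS_PER_USD = 100  # 1 AI credit = $0.01

# Monthly prices in USD for the paid tiers, from the article.
PLAN_PRICE_USD = {"Pro": 10, "Pro+": 39, "Max": 100}

# Flex allotments in USD. Only Max is given in the article.
FLEX_USD = {"Max": 100}


def usd_to_credits(usd: int) -> int:
    """Convert whole US dollars to AI credits."""
    return usd * CREDITS_PER_USD


def base_credits(plan: str) -> int:
    """Base credits, assuming they equal the plan price expressed in credits."""
    return usd_to_credits(PLAN_PRICE_USD[plan])


def included_credits(plan: str) -> Optional[int]:
    """Base plus flex credits, or None when the flex amount is not known here."""
    if plan not in FLEX_USD:
        return None
    return base_credits(plan) + usd_to_credits(FLEX_USD[plan])


if __name__ == "__main__":
    for plan in PLAN_PRICE_USD:
        print(plan, base_credits(plan), included_credits(plan))
```

Under these assumptions Max works out to 10,000 base credits plus 10,000 flex credits per month.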
One structural change with broad impact: the credit system now applies uniformly across the IDE, GitHub.com, and the Copilot CLI, replacing the previous per-product usage bucketing. Teams tracking developer costs will need to update their expense models to use the AI Credits framework, since usage across surfaces now aggregates into a single pool rather than being tracked per integration point. The pricing documentation includes a per-model credit consumption table to help teams model costs before migration.
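For teams updating expense models, the shift from per-product buckets to a single pool might look like the sketch below. The usage records, surface labels, and credit amounts are invented for illustration, and this does not reflect any GitHub API or export format.

```python
from collections import Counter

# Hypothetical usage records as (surface, credits consumed). Values are made up.
usage_events = [
    ("ide", 420),
    ("github.com", 150),
    ("cli", 275),
    ("ide", 310),
]


def pooled_usage(events):
    """Sum credits across all surfaces into one pool, keeping a per-surface view."""
    per_surface = Counter()
    for surface, credits in events:
        per_surface[surface] += credits
    return sum(per_surface.values()), dict(per_surface)


def remaining(allowance: int, events) -> int:
    """Credits left in the shared pool after all surfaces are counted."""
    total, _ = pooled_usage(events)
    return allowance - total


if __name__ == "__main__":
    total, breakdown = pooled_usage(usage_events)
    print(total, breakdown)
    # 3900 is the assumed Pro+ base allowance ($39 at 1 credit = $0.01).
    print(remaining(3900, usage_events))
```

The per-surface breakdown is still useful for attribution, but only the pooled total is compared against the allowance.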
Read more — GitHub Blog
Claude Code 2.1.147–2.1.149 Brings /code-review, Session Pinning, and Usage Breakdown
Claude Code shipped three meaningful releases between May 21 and May 22, 2026. Version 2.1.147 renamed /simplify to /code-review and added effort levels and a --comment flag for posting inline comments directly on GitHub pull requests — enabling Claude Code to participate in the review workflow without leaving the terminal. Pinned background sessions (Ctrl+T) now stay alive when idle and auto-restart if they crash, making long-running parallel workloads more reliable for agentic CI use cases.
Version 2.1.149 (May 22) added a per-category breakdown to /usage, showing token consumption split by skills, subagents, plugins, and individual MCP servers — giving developers the first clear visibility into what's driving their usage limits. The /diff detail view became keyboard-scrollable (arrows, j/k, PgUp/PgDn, Space, Home/End), and markdown output now renders GFM task list checkboxes, making checklist-driven workflows more usable in the terminal. A new enterprise managed setting allowAllClaudeAiMcps was added for organizations that want to pre-approve all first-party Anthropic MCP servers without individual permission prompts.
Several security fixes landed across these releases: a PowerShell permission bypass via built-in cd functions, a sandbox write allowlist gap in git worktrees, and a permission-analysis gap involving PWD/OLDPWD/DIRSTACK variable tracking were all patched. The releases also resolved a find command regression that exhausted macOS file-descriptor tables on large directories and had been causing failures in projects with dense filesystem structures.
Read more — Claude Code Docs
OpenAI Codex Gets Appshots and Goal Mode Graduates to Standard Feature
OpenAI's Codex v26.519 (May 21, 2026) promoted two previously experimental features to standard, alongside a new macOS-specific input method. Goal Mode is now a first-class feature across the Codex app, IDE extension, and CLI — allowing developers to specify an objective that Codex will pursue over hours or days, spanning multiple sessions without requiring manual re-prompting. The codex remote-control command is also stabilized as a foreground command, enabling CI-driven agent runs that a human can observe and interrupt via the Codex UI.
Appshots is a new macOS capability: pressing both Command keys captures the frontmost application window and sends its screenshot plus available text to Codex in context. This means Codex can work from what's visible in a design tool, documentation viewer, or error dashboard without the developer needing to describe or copy content. The feature is particularly useful for visual debugging workflows: pointing Codex at a rendered UI with a visible error state provides richer context than pasting error text alone.
Remote computer use is now enabled, allowing Codex to operate desktop applications after the Mac display locks — useful for long-running automated workflows that span hours. The feature includes safeguards: short-lived authorization, display coverage while locked, re-locking on local input, and a manual-unlock fallback. CLI v0.133.0 (released simultaneously) adds goals with dedicated storage for progress tracking, and codex doctor for connection and environment diagnostics.
Read more — OpenAI Codex Changelog
Docker Gordon AI Agent Reaches General Availability
Docker shipped Gordon — its AI agent for container workflows — as generally available in Docker Desktop 4.74.0, released May 19, 2026. Gordon is now included free with every Docker account and is the first AI agent built natively into Docker Desktop with direct runtime access to running containers, images, Compose files, and working-directory context. Unlike generic coding assistants, Gordon does not require the developer to describe the container environment — it reads logs, inspects running state, and queries configuration autonomously before responding.
The GA release requires explicit approval before Gordon executes any shell command, file modification, or Docker operation, with session-scoped permissions that reset when the session closes. In practice this means developers can delegate container debugging (diagnosing exit loops, missing environment variables, port conflicts) and Dockerfile authoring (multi-stage builds, layer cache optimization, slim image selection) with confidence that Gordon cannot take unsanctioned action. Gordon is also available via the docker ai terminal command for CLI-native workflows.
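The approval model described above can be illustrated with a small sketch. This is not Docker's implementation: the class, the action kinds, and the choice to remember an approval per kind for the rest of the session are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ApprovalSession:
    """Hypothetical session-scoped gate: nothing runs without approval."""

    ask: Callable[[str], bool]  # prompts the user; returns True if approved
    approved: set = field(default_factory=set)
    closed: bool = False

    def run(self, kind: str, action: Callable[[], object]):
        """Run an action only if its kind is approved in this session."""
        if self.closed:
            raise RuntimeError("session closed")
        if kind not in self.approved:
            if not self.ask(kind):
                raise PermissionError(f"{kind} not approved")
            self.approved.add(kind)
        return action()

    def close(self) -> None:
        """Forget all approvals so the next session starts from zero."""
        self.approved.clear()
        self.closed = True


if __name__ == "__main__":
    # Stand-in for an interactive prompt: approve everything except file writes.
    session = ApprovalSession(ask=lambda kind: kind != "file_write")
    print(session.run("shell", lambda: "ran shell command"))
    session.close()
```

A fresh `ApprovalSession` starts with an empty approval set, which mirrors the reset-on-close behavior the release describes.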
A paid Gordon Plus tier ($20/month) provides 2x–20x additional usage capacity for teams doing heavy containerization work. The broader significance of this release is architectural: Gordon represents the first production-grade AI agent embedded in a developer tool that has direct, authenticated access to a running infrastructure runtime. Rather than reasoning about infrastructure from file contents, Gordon operates against live state — a pattern other tool vendors are now racing to replicate.
Read more — Docker Blog
NVIDIA Nemotron Diffusion LLMs Achieve 6.4x Speedup Over Autoregressive Models
NVIDIA's Nemotron-Labs published Diffusion Language Models in 3B, 8B, and 14B parameter sizes on Hugging Face on May 23, 2026, claiming a fundamentally different token generation architecture that unlocks substantial throughput gains over autoregressive (AR) models. Unlike AR models that generate one token per forward pass with each token depending on all previous tokens, diffusion language models generate multiple tokens in parallel and iteratively refine them — trading sequential correctness guarantees for massive parallelism on modern GPU hardware.
The 8B Nemotron Diffusion model achieves 1.2% higher average accuracy than Qwen3 8B on standard benchmarks, and the throughput gains are striking: diffusion mode delivers 2.6x more tokens per forward pass than an equivalent AR model, and self-speculation mode (which uses the model to draft and verify simultaneously) achieves a 6x–6.4x speedup. At approximately 865 tokens/second on B200 hardware, the 8B model runs at roughly 4x the autoregressive baseline rate.
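A quick back-of-the-envelope calculation shows what these figures imply. The sketch assumes an AR model emits exactly one token per forward pass, so the 2.6x figure means about 2.6 tokens per pass. It also treats the "roughly 4x" figure as exact to derive an implied AR baseline. The helper names and the 1,000-token example are illustrative only.

```python
import math

AR_TOKENS_PER_PASS = 1.0         # assumption: one token per AR forward pass
DIFFUSION_TOKENS_PER_PASS = 2.6  # article: 2.6x the AR figure
REPORTED_TOKENS_PER_SEC = 865.0  # article: approximate 8B rate on B200
REPORTED_SPEEDUP_VS_AR = 4.0     # article: "roughly 4x", treated as exact here


def forward_passes(num_tokens: int, tokens_per_pass: float) -> int:
    """Forward passes needed to emit num_tokens at a given tokens-per-pass rate."""
    return math.ceil(num_tokens / tokens_per_pass)


def implied_ar_tokens_per_sec() -> float:
    """AR baseline rate implied by the reported rate and speedup."""
    return REPORTED_TOKENS_PER_SEC / REPORTED_SPEEDUP_VS_AR


def generation_seconds(num_tokens: int, tokens_per_sec: float) -> float:
    """Wall-clock seconds to generate num_tokens at a steady rate."""
    return num_tokens / tokens_per_sec


if __name__ == "__main__":
    n = 1000  # hypothetical response length in tokens
    print(forward_passes(n, AR_TOKENS_PER_PASS))
    print(forward_passes(n, DIFFUSION_TOKENS_PER_PASS))
    print(implied_ar_tokens_per_sec())
    print(generation_seconds(n, REPORTED_TOKENS_PER_SEC))
```

Under these assumptions a 1,000-token response needs 385 forward passes in diffusion mode instead of 1,000, and the implied AR baseline is about 216 tokens/second.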
Each checkpoint supports three generation modes via a single configuration switch — plain autoregressive (for compatibility with existing pipelines), diffusion (maximum throughput), and self-speculation (speed with quality verification). This means teams can adopt the model without modifying their application code and progressively optimize latency by selecting the appropriate inference mode. The models are available under a permissive license from the NVIDIA Nemotron-Labs Diffusion collection on Hugging Face, with training infrastructure via NVIDIA's Megatron Bridge and deployment support through SGLang.
Read more — Hugging Face Blog