OpenAI GPT-5.5-Cyber and the Patch the Planet Initiative
OpenAI released GPT-5.5-Cyber on June 22, 2026 as a cybersecurity-specialized variant of GPT-5.5, developed in partnership with Trail of Bits. On the CyberGym benchmark it scores 85.6%, versus 81.8% for standard GPT-5.5. The model can analyse large codebases for security-relevant patterns, trace code reachability to validate whether a vulnerability is exploitable, develop patches, and produce evidence packages for human review, all in workflows designed for authorized defensive security work.
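Reachability tracing can be pictured as a graph search: a flagged function matters most if some externally reachable entry point can call into it. The sketch below is a minimal illustration of that idea using breadth-first search, not a description of how GPT-5.5-Cyber works internally. The call graph and every function name in it are hypothetical.

```python
from collections import deque


def find_call_path(call_graph, entry_points, target):
    """Return one shortest call path from any entry point to target, or None."""
    queue = deque((entry, [entry]) for entry in entry_points)
    seen = set(entry_points)
    while queue:
        func, path = queue.popleft()
        if func == target:
            return path
        for callee in call_graph.get(func, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append((callee, path + [callee]))
    return None


# Hypothetical call graph (caller -> callees), invented for illustration.
CALL_GRAPH = {
    "http_handler": ["parse_request", "log_access"],
    "parse_request": ["decode_header"],
    "decode_header": ["copy_buffer"],
    "admin_cli": ["rotate_logs"],
}

# Is the flagged function reachable from the network-facing handler?
path = find_call_path(CALL_GRAPH, ["http_handler"], "copy_buffer")
print(path)
```

A returned path is the kind of artefact that could go into an evidence package for human review; a result of `None` suggests the finding is not reachable from the chosen entry points, though a real analysis would also need to account for indirect calls and dynamic dispatch.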
The launch is paired with Patch the Planet, a Daybreak initiative built around an open-source security sprint with Trail of Bits and HackerOne. Over 30 projects participate, including cURL, Go, Python, Sigstore, and pyca/cryptography. The initial sprint surfaced hundreds of issues, merged dozens of patches, and produced reusable testing workflows covering fuzzing, variant analysis, and differential testing. Trail of Bits engineers used GPT-5.5-Cyber to build an entire fuzzing lab covering dozens of entry points, variant builds, and novel test seeds in under a day — work that would ordinarily take several weeks of manual effort.
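Of the workflows named above, differential testing is the simplest to illustrate: feed the same inputs to two implementations that should agree and flag every disagreement. The sketch below is a toy version under stated assumptions. Both parsers are invented for illustration (the candidate carries a deliberate bug), and nothing here reflects the actual Patch the Planet tooling.

```python
import random


def reference_parse(text):
    """Reference implementation: Python's built-in integer parser."""
    return int(text)


def candidate_parse(text):
    """Hand-written parser with a deliberate bug: it drops a leading '-'."""
    digits = text.lstrip("+-")
    value = 0
    for ch in digits:
        value = value * 10 + (ord(ch) - ord("0"))
    return value


def differential_test(ref, cand, inputs):
    """Return the inputs on which the two implementations disagree."""
    return [item for item in inputs if ref(item) != cand(item)]


rng = random.Random(0)  # fixed seed so the run is reproducible
inputs = [str(rng.randint(-1000, 1000)) for _ in range(200)]
mismatches = differential_test(reference_parse, candidate_parse, inputs)
print(f"{len(mismatches)} of {len(inputs)} inputs disagree")
```

Every mismatch here is a negative number, which points straight at the sign-handling bug. In practice the interesting inputs usually come from a fuzzer rather than a uniform random generator.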
On the Linux kernel, the model identified security-relevant components across more than 30 million lines of code and generated 8 kernel information-leak proofs-of-concept and 24 local privilege escalation exploits, all used for defensive hardening. For developer teams, the Codex Security plugin integrates vulnerability scanning directly into the IDE, so security checks run at the point where code is written rather than as a separate pipeline stage.
Read more — AI Tools Recap
NVIDIA Cosmos 3: An Open Omni-Model for Physical AI
NVIDIA released Cosmos 3 in June 2026 as the first openly available "omni-model" for physical AI: a single unified architecture that combines video generation, physical reasoning, and action prediction without requiring separate specialised models for each task. The model integrates these capabilities through a shared representation space built on a Mixture-of-Transformers (MoT) architecture. Text, images, video, audio, and actions pass through dedicated encoders and are then handled by either autoregressive or diffusion subsequences, depending on whether the task requires reasoning or generation.
Cosmos 3 ships in two sizes designed for different hardware targets. Cosmos 3 Nano (16B parameters) is optimised for workstation GPUs such as the RTX PRO 6000, making physical-AI prototyping accessible without large-scale infrastructure. Cosmos 3 Super (64B parameters) targets data-centre-scale synthetic data generation on Hopper and Blackwell GPUs, where it can produce training sets for robotics and autonomous driving scenarios at speed. Both models are integrated with the Hugging Face Diffusers library through a Cosmos3OmniPipeline class, so developers can start running inference with minimal boilerplate.
The model demonstrates practical proficiency across three physical-AI scenarios: pick-and-place robotics tasks, autonomous driving (including edge cases involving road debris), and warehouse safety simulations. For developers building robot training pipelines or simulation systems, its value is that a single model checkpoint can generate physically plausible synthetic data across all of these scenarios, rather than requiring a video generator, a physics model, and a policy network to be stitched together. Post-training tools for domain-specific adaptation are also included, which matters for teams working in regulated or safety-critical environments where general-purpose outputs are insufficient.
Read more — Hugging Face
Cohere North Mini Code: A 30B MoE Specialist for Agentic Engineering
Cohere released North Mini Code on June 9, 2026. It is a 30-billion-parameter Mixture-of-Experts model with only 3 billion active parameters per inference step, designed specifically for agentic software engineering. Despite its small active-parameter count, the model outperforms several much larger dense models on standard coding benchmarks, achieving 80.2% pass@10 on SWE-Bench Verified and 55.1% pass@10 on Terminal-Bench v2. The Terminal-Bench figure improved by 7.9 percentage points after reinforcement learning with verifiable rewards (RLVR). On the Artificial Analysis Coding Index it scores 33.4, surpassing Nemotron 3 Super (120B) and Mistral Small 4 (119B).
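For readers unfamiliar with the metric, pass@k is the probability that at least one of k sampled attempts solves a task. A widely used way to estimate it from n samples with c successes is the unbiased estimator 1 - C(n-c, k) / C(n, k). The sketch below implements that estimator; the source does not say how Cohere computed its pass@10 figures, and the per-task counts are hypothetical.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k estimate for one task: n samples drawn, c of them correct."""
    if k > n:
        raise ValueError("k cannot exceed the number of samples n")
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical per-task results: (samples drawn, samples that passed the tests).
results = [(20, 0), (20, 1), (20, 5), (20, 20)]
scores = [pass_at_k(n, c, 10) for n, c in results]

# The benchmark-level score is the mean over tasks.
benchmark_score = sum(scores) / len(scores)
print(f"pass@10 = {benchmark_score:.3f}")
```

Because pass@10 credits a task if any of ten attempts succeeds, it is always at least as high as pass@1 on the same samples, which is worth keeping in mind when comparing against single-attempt scores reported elsewhere.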
The architecture uses 128 experts with 8 activated per token, combined with interleaved attention mechanisms. The training data draws on 5,000 repositories and covers over 70,000 verifiable tasks. Training followed a two-stage pipeline: supervised fine-tuning first, then reinforcement learning using verifiable rewards drawn from test suites and terminal execution, which proved especially effective at improving Terminal-Bench scores. The model is available under Apache 2.0 in both BF16 and FP8 quantized formats on Hugging Face and via Cohere's API, and it is natively integrated into the OpenCode agentic engineering environment.
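The "128 experts, 8 activated per token" design can be illustrated with a generic top-k router: score every expert for a token, keep the eight highest-scoring, and normalise their weights. This is a minimal sketch of the general technique. Only the two constants come from the article; the softmax-over-selected-experts weighting and the random logits are assumptions, and North Mini Code's actual router may differ.

```python
import math
import random

NUM_EXPERTS = 128  # from the article
TOP_K = 8          # from the article


def route_token(router_logits, top_k=TOP_K):
    """Pick the top_k experts for one token and softmax-normalise their weights."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:top_k]
    peak = max(router_logits[i] for i in chosen)  # subtract for numerical stability
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


# Stand-in router scores; a real model computes these from the token's hidden state.
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

assignment = route_token(logits)
for expert_id, weight in assignment:
    print(f"expert {expert_id:3d}  weight {weight:.3f}")
```

With this scheme each token runs through 8 of 128 experts, or 6.25% of the expert pool, which is what keeps per-token compute far below what the total parameter count would suggest.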
For developers evaluating smaller models for local or API-efficient agentic workflows, North Mini Code offers a clear tradeoff: in FP8 quantized form it fits on a single high-end workstation GPU while delivering performance competitive with models that require multi-GPU serving. The Apache 2.0 license permits commercial use, making it a practical choice for teams building internal developer agents or code review pipelines where deploying large closed-source models is cost-prohibitive.
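The single-GPU claim follows from simple arithmetic on weight storage: BF16 uses 2 bytes per parameter and FP8 uses 1. The sketch below is a back-of-the-envelope estimate only. It counts weights and ignores the KV cache, activations, and runtime overhead, so real memory use will be higher.

```python
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}


def weight_memory_gb(num_params, fmt):
    """Approximate weight storage in GB (10**9 bytes); excludes KV cache and activations."""
    return num_params * BYTES_PER_PARAM[fmt] / 1e9


total_params = 30e9  # total parameter count from the article

for fmt in ("bf16", "fp8"):
    print(f"{fmt}: about {weight_memory_gb(total_params, fmt):.0f} GB of weights")
```

Note that the estimate uses the full 30 billion parameters rather than the 3 billion active ones: in a Mixture-of-Experts model every expert must be resident in memory, because any of them may be selected for the next token.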
Read more — Hugging Face
Google Deep Research Max: Autonomous Research with MCP and Native Visualizations
Google released Deep Research Max as an extension of its autonomous research agent family. Built on Gemini 3.1 Pro, it is positioned for long-horizon background workflows rather than interactive sessions. Where the base Deep Research agent optimises for speed and interactivity, Deep Research Max applies extended test-time compute to iteratively reason, search, and refine its output. That makes it well suited to asynchronous tasks such as nightly due diligence reports or comprehensive technical surveys that run unattended and are ready by morning.
The key new capability for developer and enterprise use is Model Context Protocol (MCP) support, which lets the agent connect to private data sources including financial data providers, proprietary databases, and internal knowledge bases. This shifts Deep Research Max from a general web-research tool into a specialist that can work against an organisation's own data with the same depth it would apply to public sources. Combined with the Gemini 3.1 Pro backbone's 1-million-token context window, the agent can reason across entire code repositories, long document collections, or multi-year financial histories in a single run.
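At the wire level, MCP is built on JSON-RPC 2.0: a client discovers a server's tools with a `tools/list` request and invokes one with `tools/call`. The sketch below builds both messages to show their shape. The tool name `query_filings` and its arguments are hypothetical, and the article does not describe how Deep Research Max is configured to reach an MCP server, so this shows only the protocol messages, not Google's integration.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests carry an id so replies can be matched


def mcp_request(method, params=None):
    """Build a JSON-RPC 2.0 request of the kind an MCP client sends to a server."""
    message = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


# Ask the server which tools it exposes.
list_tools = mcp_request("tools/list")

# Invoke one tool. The name and arguments are hypothetical placeholders.
call_tool = mcp_request(
    "tools/call",
    {"name": "query_filings", "arguments": {"ticker": "EXAMPLE", "years": 3}},
)

print(json.dumps(call_tool, indent=2))
```

An organisation exposing a private database would implement the server side of these messages, which lets the data stay behind its own access controls while the agent sees only the tool results.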
Deep Research Max also introduces native visualisation output: it generates charts and infographics directly within its reports rather than leaving a human to build them from text descriptions. Research plans can be reviewed and refined before execution, and the agent can combine web search, code execution, and file search within a single analysis. Both Deep Research and Deep Research Max are available in public preview on paid tiers of the Gemini API, accessed through the Interactions API, with a rollout to Google Cloud enterprise customers planned.
Read more — Google