Microsoft 365 Copilot April 2026: Multi-Model Flexibility and Microsoft Agent 365 GA
The April 2026 update to Microsoft 365 Copilot is the most significant release in months, driven by two strategic moves: the addition of Anthropic's Claude as an alternative to OpenAI's models in Word, and the general availability of Microsoft Agent 365 for enterprise agent building.
The multi-model integration in Word is noteworthy because it marks the first time a major productivity suite has exposed model choice as a first-class feature at the document level. Users can select Claude for drafting long-form content—capitalising on its extended context window and nuanced writing style—while retaining OpenAI models for tasks where they have established preferences. For developers building on the Microsoft 365 platform, this signals that Microsoft's Copilot infrastructure is increasingly model-agnostic, with the underlying agent layer abstracting provider specifics.
Copilot Notebooks received significant capability upgrades: public web grounding via user-supplied URLs allows notebooks to pull in live reference material, and direct PowerPoint generation from stored references reduces a multi-step workflow to a single Copilot instruction. Excel users gain a long-requested feature—local file analysis on both Windows and Mac—removing the previous requirement to upload files to SharePoint or OneDrive for AI-assisted data work.
The GA of Microsoft Agent 365 is the developer-facing highlight. Combined with the improved Agent Builder, this gives teams a supported path to create, deploy, and govern custom organisational agents that integrate with the M365 data graph. Enterprise developers who have been piloting agents in preview now have stable APIs and support contracts to build against.
Read more — Microsoft Tech Community
OpenAI Codex 0.130.0: Plugin Sharing, Remote Control, and Chrome Extension
OpenAI has released Codex 0.130.0, which focuses on extensibility and multi-environment workflows rather than core model changes. The three headline additions—plugin sharing, a remote-control server command, and a Chrome Extension—each address a distinct friction point for teams running Codex at scale.
Plugin sharing arrives with granular access controls and a discovery mechanism, allowing teams to publish internal Codex plugins to colleagues without exposing them broadly. This is a meaningful step for organisations building proprietary tool integrations on top of Codex: a team's custom database query plugin or internal API wrapper can now be shared with specific users or groups with audit-trail visibility. The enhanced discoverability layer surfaces relevant plugins contextually during agent sessions.
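As a rough illustration of the access model described above, the sketch below shows group-scoped plugin visibility with an audit trail. All names in it (Plugin, SharePolicy, can_access) are hypothetical and do not reflect Codex's actual plugin-sharing API.

```python
# Conceptual sketch of group-scoped plugin visibility with an audit trail.
# All names here (Plugin, SharePolicy, can_access) are illustrative; they
# do not correspond to Codex's actual plugin-sharing API.
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class SharePolicy:
    users: set[str] = field(default_factory=set)    # explicit user grants
    groups: set[str] = field(default_factory=set)   # group-level grants


@dataclass
class Plugin:
    name: str
    owner: str
    policy: SharePolicy
    audit_log: list[str] = field(default_factory=list)

    def can_access(self, user: str, user_groups: set[str]) -> bool:
        allowed = (
            user == self.owner
            or user in self.policy.users
            or bool(user_groups & self.policy.groups)
        )
        # Record every access decision for audit-trail visibility.
        self.audit_log.append(
            f"{datetime.now(timezone.utc).isoformat()} {user} "
            f"{'granted' if allowed else 'denied'} access to {self.name}"
        )
        return allowed


plugin = Plugin("db-query", owner="alice", policy=SharePolicy(groups={"data-eng"}))
print(plugin.can_access("bob", {"data-eng"}))   # True: group grant
print(plugin.can_access("eve", {"marketing"}))  # False: no grant
```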
The codex remote-control command enables headless, remotely commandable app-server instances—a capability aimed squarely at CI/CD integration and automated agent pipelines. Rather than requiring a local interactive session, Codex can be launched as a persistent server and driven programmatically, enabling it to participate in multi-step build and review workflows without human interaction. The Chrome Extension opens a parallel path: Codex actions can now be triggered directly from the browser, enabling side-by-side agent workflows across multiple tabs.
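A minimal sketch of what a CI integration might look like follows. The codex remote-control command name comes from the release notes, but the port, endpoint path, and payload below are assumptions made for illustration; the app-server's actual control protocol isn't described here.

```python
# Minimal sketch of driving a headless Codex app-server from a CI job. The
# "codex remote-control" command name comes from the release notes; the
# local control endpoint, port, and JSON payload are hypothetical stand-ins
# for whatever protocol the app-server actually speaks.
import json
import subprocess
import time
import urllib.request

# Launch the app-server as a background process instead of an interactive session.
server = subprocess.Popen(["codex", "remote-control"])
time.sleep(3)  # crude readiness wait; a real pipeline would poll a health check

try:
    # Drive it programmatically (hypothetical endpoint and payload).
    req = urllib.request.Request(
        "http://127.0.0.1:7800/tasks",
        data=json.dumps({"task": "review", "ref": "HEAD"}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp))
finally:
    server.terminate()
```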
Earlier updates in the 0.1xx series added persistent goal workflows with terminal UI controls and a permission profile system for sandboxed execution—foundational features that the 0.130.0 sharing and remote-control capabilities build on.
Read more — OpenAI
NousResearch Hermes Agent v0.13.0: Multi-Agent Kanban and 8 Security Closures
NousResearch has released Hermes Agent v0.13.0, dubbed "The Tenacity Release," with two themes: dramatically expanded agent coordination capabilities and a significant security hardening pass.
On the coordination side, the headline feature is a multi-agent Kanban system that allows users to delegate tasks to AI teams with durable boards and retry budgets. Where previous versions of Hermes treated agent interactions as largely single-session affairs, the Kanban system introduces persistence across sessions: boards survive restarts, retry budgets prevent runaway task loops, and the system tracks task state across multiple collaborating agent instances. This pairs with the new /goal command, which enables agents to lock onto objectives across conversation turns using the Ralph loop pattern—a form of goal persistence that prevents an agent from abandoning a complex multi-step objective mid-execution when the conversation context shifts.
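The sketch below reconstructs the two mechanics this describes, durable board state and per-task retry budgets, in generic Python. It models the concept only; it is not Hermes's implementation or API.

```python
# Illustrative sketch of the two ideas the Kanban system combines: durable
# task state and per-task retry budgets. A generic reconstruction of the
# concept, not Hermes's actual implementation.
import json
from pathlib import Path

BOARD = Path("board.json")  # board survives process restarts


def load_board() -> dict:
    return json.loads(BOARD.read_text()) if BOARD.exists() else {"tasks": {}}


def save_board(board: dict) -> None:
    BOARD.write_text(json.dumps(board, indent=2))


def run_task(board: dict, task_id: str, agent) -> None:
    task = board["tasks"][task_id]
    if task["retries_left"] <= 0:
        task["state"] = "failed"  # budget exhausted: no runaway task loop
        return
    try:
        task["result"] = agent(task["goal"])
        task["state"] = "done"
    except Exception:
        task["retries_left"] -= 1  # spend one unit of the retry budget
        task["state"] = "queued"   # leave it on the board for another attempt
    finally:
        save_board(board)          # persist state after every transition


board = load_board()
board["tasks"].setdefault(
    "t1", {"goal": "summarise repo", "state": "queued", "retries_left": 3}
)
run_task(board, "t1", agent=lambda goal: f"summary of: {goal}")
print(board["tasks"]["t1"]["state"])
```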
Multimodal capabilities were expanded with native video analysis support for compatible models (including Gemini), and text-to-speech options now include xAI custom voice cloning. Connectivity broadens significantly: Google Chat joins the supported channel list, and MCP gains SSE transport and OAuth forwarding, enabling Hermes to operate as an MCP client against secured enterprise tooling.
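For a sense of what an SSE-plus-OAuth MCP connection involves, here is a sketch using the official MCP Python SDK (pip install mcp); the server URL and token are placeholders, and Hermes's own configuration syntax will differ.

```python
# Sketch of an MCP client connecting over SSE with a forwarded OAuth bearer
# token, using the official MCP Python SDK. The server URL and token are
# placeholders for a secured enterprise endpoint.
import asyncio

from mcp import ClientSession
from mcp.client.sse import sse_client


async def main() -> None:
    headers = {"Authorization": "Bearer <oauth-access-token>"}  # forwarded token
    async with sse_client("https://tools.example.com/sse", headers=headers) as (
        read,
        write,
    ):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([t.name for t in tools.tools])


asyncio.run(main())
```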
The security story is substantial: eight vulnerabilities were closed in this release, with default-enabled redaction of sensitive data being the most impactful for production deployments. Checkpoints v2 rewrites state persistence with a more reliable backend, reducing the risk of state corruption on unexpected termination. For teams deploying Hermes in enterprise environments or exposing it to external inputs, the security closures make 0.13.0 a mandatory upgrade.
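As a conceptual illustration of output redaction, the sketch below masks a few secret-shaped patterns before text leaves an agent. Hermes's default-enabled redaction is presumably broader; these patterns are examples only.

```python
# Minimal illustration of output redaction as a concept: mask common secret
# patterns before text leaves the agent. The patterns here are examples,
# not Hermes's actual rule set.
import re

PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{20,}"),            # API-key-shaped strings
    re.compile(r"\b\d{4}-\d{4}-\d{4}-\d{4}\b"),    # card-number-shaped digits
    re.compile(r"(?i)bearer\s+[A-Za-z0-9._\-]+"),  # bearer tokens
]


def redact(text: str) -> str:
    for pattern in PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text


print(redact("token: sk-abcdefghijklmnopqrstuv, auth: Bearer eyJhbGciOi"))
```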
Read more — NousResearch
Ollama Previews MLX-Powered Inference on Apple Silicon
Ollama has released a preview build integrating Apple's MLX machine learning framework, bringing hardware-optimised inference to Apple Silicon Macs and delivering measurable speed improvements for developers running large language models locally.
The integration targets the M5 chip's unified memory architecture and GPU accelerators, showing gains in both prefill speed (how fast the model processes the input prompt) and decode speed (how fast tokens are generated). For developers using Ollama as their local inference layer—whether for AI coding assistants, test harnesses, or offline production environments—the MLX backend makes running 7B to 30B parameter models on MacBooks meaningfully faster without any configuration changes to existing workflows.
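Those prefill and decode figures can be read directly from Ollama's own response metrics. The sketch below uses the real /api/generate endpoint and its timing fields; the model name is a placeholder for whatever is pulled locally.

```python
# Measure prefill and decode throughput from Ollama's response metrics.
# Uses the real /api/generate endpoint and its timing fields; the model
# name is a placeholder for any locally pulled model.
import json
import urllib.request

payload = {
    "model": "llama3.1:8b",  # placeholder model
    "prompt": "Explain KV caching in two sentences.",
    "stream": False,
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    stats = json.load(resp)

# Durations are reported in nanoseconds.
prefill_tps = stats["prompt_eval_count"] / (stats["prompt_eval_duration"] / 1e9)
decode_tps = stats["eval_count"] / (stats["eval_duration"] / 1e9)
print(f"prefill: {prefill_tps:.1f} tok/s, decode: {decode_tps:.1f} tok/s")
```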
The release also adds support for NVIDIA's NVFP4 quantisation format, which improves output quality at a given memory footprint compared to earlier quantisation schemes, benefiting users on both Apple Silicon and NVIDIA GPU systems. The caching system received a significant overhaul: shared prefix caching now reuses stored key-value pairs across requests that share common prompt prefixes (a common pattern in system-prompt-heavy workflows), and the eviction strategy was updated to be smarter about retaining high-reuse cache entries.
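The sketch below models the shared-prefix idea, including reuse-aware eviction, as a plain data structure; it illustrates the concept rather than Ollama's internal KV cache.

```python
# Conceptual sketch of shared prefix caching with reuse-aware eviction:
# requests that share a prompt prefix (e.g. the same system prompt) reuse
# cached KV state instead of recomputing it. Models the idea only, not
# Ollama's internals.
from collections import OrderedDict


class PrefixCache:
    def __init__(self, capacity: int = 4):
        self.capacity = capacity
        self.entries: OrderedDict[str, dict] = OrderedDict()

    def lookup(self, prompt: str):
        # Find the longest cached prefix of this prompt.
        best = None
        for prefix in self.entries:
            if prompt.startswith(prefix) and (best is None or len(prefix) > len(best)):
                best = prefix
        if best is not None:
            self.entries[best]["hits"] += 1  # reuse count informs eviction
            return self.entries[best]["kv"]
        return None

    def insert(self, prefix: str, kv) -> None:
        if len(self.entries) >= self.capacity:
            # Evict the entry with the fewest reuses, not simply the oldest.
            coldest = min(self.entries, key=lambda p: self.entries[p]["hits"])
            del self.entries[coldest]
        self.entries[prefix] = {"kv": kv, "hits": 0}


cache = PrefixCache()
system = "You are a meticulous code reviewer.\n"
cache.insert(system, kv="<kv state for system prompt>")
print(cache.lookup(system + "Review this diff: ..."))  # hit: prefix reused
print(cache.lookup("Unrelated prompt"))                # miss: full prefill
```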
The MLX integration is currently a preview and ships as an opt-in mode rather than the default backend on macOS. The Ollama team is collecting performance feedback and bug reports ahead of a stable release.
Read more — Ollama