AI Dev Patterns: Harness Productivity Report, Hermes on NVIDIA, Ollama CVEs, MDASH, 2026-05-18



Harness Report: AI Has Outpaced How Engineering Organisations Measure Productivity

A Harness study of 700 engineering professionals surfaces a structural problem in how AI-assisted development is being managed: while 89% of engineering leaders report productivity gains from AI tools, the measurement frameworks those leaders are using were designed for a pre-AI world and are now systematically misleading them.

The most concrete finding is the invisible work gap. Approximately 31% of developer time is now spent on activities that traditional productivity metrics (lines of code, story points, commit frequency) do not track. The dominant category of invisible work is AI output validation: the report finds that 81% of developers are spending significantly more time on code review than before AI tools were introduced, because AI-generated code requires the same depth of scrutiny as any other code and arrives in higher volume. Leaders who see velocity metrics going up are not seeing the corresponding increase in review burden. The aggregate picture looks positive, but individual developers are experiencing higher cognitive load.

The trust gap between leaders and developers is striking. Managers are nearly four times more likely than developers to say they have no concerns about how AI productivity data is used. Developers, closer to the actual work, are more aware that the reported metrics do not capture what is actually happening, including the technical debt that accumulates when AI-generated code passes review with structural issues uncaught. The report recommends that engineering organisations add code quality indicators, cognitive load measures, and explicit validation time tracking to their productivity frameworks before using AI-era velocity numbers to make headcount or investment decisions.
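To make the idea of explicit validation time tracking concrete, here is a minimal sketch of how a team might record it next to the work that velocity metrics already see. The `WeeklyLog` schema, its field names, and the sample hours are assumptions made for illustration; none of them come from the Harness report.

```python
from dataclasses import dataclass


@dataclass
class WeeklyLog:
    """One developer's hours for a week (hypothetical schema)."""
    tracked_hours: float          # work reflected in commits or story points
    validation_hours: float       # reviewing and testing AI-generated output
    other_untracked_hours: float  # any other work the metrics do not see


def invisible_share(log: WeeklyLog) -> float:
    """Fraction of total time that velocity metrics do not capture."""
    invisible = log.validation_hours + log.other_untracked_hours
    total = log.tracked_hours + invisible
    if total == 0:
        return 0.0
    return invisible / total


# Illustrative numbers only: 12 of 40 hours are invisible to velocity metrics.
example = WeeklyLog(tracked_hours=28, validation_hours=8, other_untracked_hours=4)
share = invisible_share(example)
```

Reporting this share alongside velocity gives leaders a way to see when rising output is being paid for with rising review time.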

Read more — PR Newswire / Harness


Hermes Self-Improving Agents on NVIDIA RTX and DGX Spark: Local AI Gets Serious

NVIDIA and Nous Research have jointly detailed how Hermes Agent runs on NVIDIA RTX consumer PCs and the DGX Spark workstation, a deployment mode that brings self-improving agentic capabilities to local hardware rather than cloud APIs. The combination addresses a key friction point for developers who want agentic workflows without the latency, cost, or data-privacy implications of cloud inference.

The self-improvement mechanism is the architectural differentiator: Hermes agents write and refine their own skills based on task execution feedback. When an agent attempts a task and encounters failure modes, it can update the skill definition responsible and retry — a feedback loop that does not require human intervention to improve the agent's capability over time. This is distinct from RAG-based memory systems that retrieve existing knowledge; the agent is literally modifying its own instruction set. The practical consequence is that a Hermes deployment improves at repetitive tasks over time, making it particularly valuable for development workflows that repeat — running test suites, generating boilerplate, maintaining documentation — where each iteration teaches the agent about the codebase's conventions.
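The attempt, refine, retry loop described above can be sketched in a few lines. This is not the Hermes implementation: the `Skill` structure, the `execute` and `refine` callables, and the toy stand-ins below are all hypothetical and only illustrate the shape of the feedback loop.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Skill:
    """Instructions the agent follows for one kind of task (hypothetical shape)."""
    name: str
    instructions: str
    history: List[str] = field(default_factory=list)  # earlier versions


# Hypothetical interfaces: execute returns (succeeded, feedback);
# refine returns rewritten instructions given the old ones and the feedback.
Execute = Callable[[Skill], Tuple[bool, str]]
Refine = Callable[[str, str], str]


def run_with_refinement(skill: Skill, execute: Execute, refine: Refine,
                        max_attempts: int = 3) -> bool:
    """Try the task; on failure, rewrite the skill from feedback and retry."""
    for _ in range(max_attempts):
        succeeded, feedback = execute(skill)
        if succeeded:
            return True
        skill.history.append(skill.instructions)  # keep the old version
        skill.instructions = refine(skill.instructions, feedback)
    return False


# Toy stand-ins for a real task runner and a real model call.
def toy_execute(skill: Skill) -> Tuple[bool, str]:
    if "pytest -q" in skill.instructions:
        return True, ""
    return False, "tests must be run with: pytest -q"


def toy_refine(instructions: str, feedback: str) -> str:
    return f"{instructions}\nNote: {feedback}"


skill = Skill(name="run-tests", instructions="Run the test suite.")
improved = run_with_refinement(skill, toy_execute, toy_refine)
```

The point of the sketch is that the refined instructions persist on the `Skill` object, so the next run of the same task starts from the improved version rather than from scratch.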

The DGX Spark (128GB unified memory, 1 petaflop AI compute) paired with Alibaba's Qwen 3.6 35B model demonstrates that models matching 120B-parameter performance can run locally at roughly one-sixth the memory footprint. For developers evaluating local inference, this narrows the gap between local hardware and cloud inference for most agentic coding tasks. The framework integrates with llama.cpp, LM Studio, and Ollama, so developers already using those inference backends can add Hermes's agentic layer without changing their model serving setup. Hermes has reached 140,000 GitHub stars in under three months, a growth rate that points to strong developer interest.

Read more — NVIDIA Blog


Safe & Secure AI Agent Practices

Microsoft MDASH: Multi-Model Agentic Security System Tops Industry Benchmark

Microsoft has introduced MDASH (Multi-Model Agentic Scanning Harness), an internal security system that orchestrates over 100 specialised AI agents to autonomously discover and formally prove software vulnerabilities. The system represents a shift from using AI as a single-model security assistant to deploying it as an agentic pipeline with structured stages — and it is producing results that exceed what either human analysts or single-model AI approaches have achieved.

MDASH operates across five pipeline stages: Prepare (defining the attack surface and generating hypotheses), Scan (running specialised agents against specific code regions), Validate (filtering out false positives through cross-model verification), Dedup (eliminating overlapping findings), and Prove (generating formal proof-of-concept demonstrations for confirmed vulnerabilities). The use of an ensemble of frontier and distilled models across these stages is deliberate: different models have different strengths at different stages, and the pipeline architecture allows each stage to use the model best suited to its task rather than forcing a single model to handle everything.
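The five stages can be pictured as a simple function pipeline. The sketch below is a toy illustration of the stage order only, not Microsoft's system: the `Finding` type, the scanner, verifier, and prover callables, and the vote threshold are all assumptions, and plain functions stand in for the model ensemble.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Finding:
    region: str      # code region the finding refers to
    kind: str        # vulnerability class, e.g. "double-free"
    votes: int = 0   # number of verifiers that agreed
    proof: str = ""  # filled in by the Prove stage


Scanner = Callable[[str, str], List[Finding]]  # (region, code) -> findings
Verifier = Callable[[Finding, str], bool]      # (finding, code) -> agrees?
Prover = Callable[[Finding], str]              # finding -> proof of concept


def run_pipeline(source: Dict[str, str], scanners: List[Scanner],
                 verifiers: List[Verifier], prover: Prover,
                 min_votes: int = 2) -> List[Finding]:
    # Prepare: choose the regions to examine (here, every region).
    regions = sorted(source)
    # Scan: run every scanner against every region.
    raw = [f for r in regions for s in scanners for f in s(r, source[r])]
    # Validate: keep findings that enough independent verifiers confirm.
    validated = []
    for f in raw:
        votes = sum(1 for v in verifiers if v(f, source[f.region]))
        if votes >= min_votes:
            validated.append(replace(f, votes=votes))
    # Dedup: collapse findings with the same region and kind.
    unique: Dict[tuple, Finding] = {}
    for f in validated:
        unique.setdefault((f.region, f.kind), f)
    # Prove: attach a proof of concept to each surviving finding.
    return [replace(f, proof=prover(f)) for f in unique.values()]


# Toy inputs: two scanners that overlap and one that is noisy.
toy_source = {"a.c": "free(p); free(p);", "b.c": "return 0;"}
toy_scanners = [
    lambda r, c: [Finding(r, "double-free")] if c.count("free(") > 1 else [],
    lambda r, c: [Finding(r, "double-free")] if "free(p); free(p)" in c else [],
    lambda r, c: [Finding(r, "overflow")],  # always fires: a false-positive source
]
toy_verifiers = [
    lambda f, c: f.kind == "double-free",
    lambda f, c: c.count("free(") > 1,
]
results = run_pipeline(toy_source, toy_scanners, toy_verifiers,
                       lambda f: f"poc for {f.kind} in {f.region}")
```

In the toy run, the noisy scanner's findings fail validation and the two overlapping double-free reports collapse into one, which is the job the Validate and Dedup stages do in the description above.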

The initial production deployment found 16 new vulnerabilities in core Windows components, including four critical remote code execution flaws. These include a TCP/IP race condition (CVE-2026-33827) and an IKEv2 double-free (CVE-2026-33824) — classes of vulnerability that are notoriously difficult to find through manual code review and that conventional static analysis tools miss. MDASH currently holds the top score on the CyberGym security benchmark at 88.45%, a significant margin over the previous best. For security teams, the implication is that agentic pipelines purpose-built for security are now demonstrably more effective than single-model approaches — and that defenders can use the same architecture that makes agents powerful offensively to find vulnerabilities before attackers do.

Read more — Microsoft Security Blog


Critical Ollama Vulnerabilities: "Bleeding Llama" Memory Leak and Windows Updater Flaws

Three security vulnerabilities in Ollama were disclosed in a single week in early May, making it a significant security moment for the local LLM ecosystem. The most severe is CVE-2026-7482, dubbed "Bleeding Llama" — a memory disclosure vulnerability in the GGUF model loader that allows unauthenticated attackers to leak process memory, including API keys, environment variables, and other sensitive data present in the Ollama process at runtime.

The Bleeding Llama vulnerability is present in all Ollama versions before 0.17.1. Exploitation requires no authentication: any client that can reach the Ollama HTTP API can trigger the memory leak. Ollama binds to the loopback address by default, but it is often exposed on all network interfaces, for example by setting OLLAMA_HOST to 0.0.0.0 or by running it in a container. For developers who expose Ollama to a local network or run it in a shared environment, the risk is concrete: any sensitive credentials loaded into the environment alongside Ollama are potentially accessible to other network participants. The fix is in Ollama 0.17.1, which should be treated as an immediate mandatory upgrade. Separately, developers should audit their Ollama network binding configuration and restrict the service to the loopback address unless there is a specific reason to expose it more broadly.
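Both checks, the version and the bind address, are easy to script. The sketch below takes the two values as plain strings so it needs no network access; in practice the version could come from `ollama --version` or the server's `/api/version` endpoint, and the bind address from the `OLLAMA_HOST` environment variable. The helper names and the parsing rules are assumptions made for illustration, and the 0.17.1 threshold is taken from the text above.

```python
import ipaddress
from typing import Tuple

PATCHED = (0, 17, 1)  # first fixed version, per the advisory summarised above


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse '0.17.0' or 'v0.17.0-rc1' into a tuple of ints, dropping suffixes."""
    core = text.strip().lstrip("v").split("-")[0].split("+")[0]
    return tuple(int(part) for part in core.split("."))


def is_vulnerable(version: str) -> bool:
    """True if the version is older than the patched release."""
    return parse_version(version) < PATCHED


def bind_host(value: str) -> str:
    """Extract the host part from an OLLAMA_HOST-style value."""
    value = value.strip()
    if "://" in value:                 # drop a scheme such as http://
        value = value.split("://", 1)[1]
    value = value.split("/", 1)[0]     # drop any path
    if value.startswith("["):          # bracketed IPv6, e.g. [::1]:11434
        return value[1:].split("]", 1)[0]
    if value.count(":") == 1:          # host:port
        return value.split(":", 1)[0]
    return value                       # bare host or bare IPv6 address


def is_loopback_bind(value: str) -> bool:
    """True only if the configured address is restricted to loopback."""
    host = bind_host(value)
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # unrecognised hostname: treat as exposed


needs_upgrade = is_vulnerable("0.17.0")
exposed = not is_loopback_bind("0.0.0.0:11434")
```

Treating anything unrecognised as exposed is a deliberate choice here: for an audit script, a false alarm is cheaper than a missed exposure.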

The two Windows-specific flaws (CVE-2026-42248 and CVE-2026-42249) affect the Ollama updater rather than the inference engine itself. One bypasses updater signature verification; the other exploits path traversal to write files into the Windows Startup folder, enabling persistent code execution across reboots. Both were fixed in the tagged release that merged on May 11, but Windows users should verify they are running a version that includes these specific patches rather than relying solely on the version number. The combination of a network-accessible memory leak and Windows persistence flaws in a single week is a reminder that local inference tools — which developers often treat as low-risk because they run locally — have the same attack surface as any networked service.
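Path traversal of the kind described for the updater is usually prevented by resolving the final path and confirming that it still sits inside the intended directory. The following is a generic sketch of that check, not the fix Ollama shipped; `safe_join` and its arguments are hypothetical names.

```python
import os


def safe_join(base_dir: str, untrusted_name: str) -> str:
    """Join an untrusted relative path onto base_dir, refusing any escape."""
    base = os.path.realpath(base_dir)
    # realpath collapses ".." segments and resolves symlinks before the check.
    target = os.path.realpath(os.path.join(base, untrusted_name))
    # commonpath equals base only when target is base itself or lies beneath it.
    if os.path.commonpath([base, target]) != base:
        raise ValueError(f"path escapes {base!r}: {untrusted_name!r}")
    return target
```

A name such as `bin/tool` resolves inside the base directory and is accepted, while a name containing `..` segments or an absolute path resolves outside it and raises `ValueError` before anything is written.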

Read more — Mondoo


Written by Stanislav Lentsov, Software Architect
