Claude Code 2.1.156–159: Plugin Autoloading, Opus 4.8 Fix, Bedrock/Vertex Auto Mode
Three Claude Code releases shipped between May 29 and May 31 that collectively improve the plugin authoring experience and expand auto mode availability across managed cloud platforms.
Version 2.1.156 (May 29) addressed a specific regression: when using Opus 4.8, thinking blocks included in requests were being inadvertently modified before submission, causing API errors. The fix restores correct handling of pre-populated thinking blocks, which matters for workflows that pass structured reasoning context between agent steps.
Version 2.1.157 (May 31) delivered the most developer-facing change. Plugins stored in .claude/skills directories are now loaded automatically without requiring marketplace registration or explicit settings.json entries. This removes the friction of having to publish or register a skill before it becomes available in a session — local skill development now works the same way as local file editing. Alongside autoloading, a new claude plugin init command scaffolds the required directory structure and frontmatter for a new plugin, and the /plugin slash-command gains autocomplete for argument values. Teams that maintain internal skill libraries benefit immediately: dropping a SKILL.md file into .claude/skills/ is sufficient to activate it.
Version 2.1.158 (May 31) made auto mode available on Amazon Bedrock, Google Vertex AI, and Azure Foundry for Opus 4.7 and Opus 4.8. Auto mode dynamically selects the appropriate Claude version for each request based on task complexity. Previously limited to direct Anthropic API connections, its extension to all three major cloud platforms means enterprise teams already using Bedrock or Vertex can opt in without changing providers. The opt-in is via the CLAUDE_CODE_ENABLE_AUTO_MODE=1 environment variable.
Read more — Releasebot / Anthropic
Gemini CLI Retires June 18 — Migrate to Antigravity CLI Now
Google announced that Gemini CLI and the Gemini Code Assist IDE extensions will stop serving requests for Google AI Pro and Ultra subscribers on June 18, 2026. The cutoff is part of a planned consolidation: all developer tools are being unified under the Antigravity platform, which launched at Google I/O 2026 and is already available to all users.
Antigravity CLI is not a rename — it is a new implementation built in Go, with a redesigned server-side harness and a multi-agent architecture that Gemini CLI's single-agent model could not support. The core capabilities that developers relied on in Gemini CLI are preserved: Agent Skills, Hooks, Subagents, and Extensions all carry over. The new additions include asynchronous background task processing and a shared backend architecture with the Antigravity 2.0 desktop application, meaning skills written for the CLI compose with the desktop app without modification.
For teams running Gemini CLI in CI pipelines or automated workflows, the migration path is straightforward: install Antigravity CLI, update any hardcoded gemini binary references to antigravity, and verify that custom Skills and Hooks work as expected. Google is publishing video walkthroughs and migration documentation in the weeks before the cutoff. Enterprise license holders on paid Gemini Code Assist plans are not affected by the June 18 date and will receive a separate migration timeline.
Read more — Google Developers Blog
Ettin Reranker Family: Hugging Face Releases Six Open Rerankers for RAG
Hugging Face released the Ettin Reranker Family on May 19, 2026 — six open-weight CrossEncoder models ranging from 17M to 1B parameters designed for the retrieve-then-rerank pattern used in production RAG and search systems. The family is built on ModernBERT-style Ettin encoders with unpadded attention, Rotary Position Embedding, and support for sequences up to 8,192 tokens.
The benchmark results are striking at the smaller end. The 32M model outperforms BAAI/bge-reranker-v2-m3 (568M parameters) by +0.025 NDCG@10 on the MTEB Retrieval benchmark — 17× fewer parameters for better quality. The 150M model is the strongest reranker under 600M parameters overall, edging out Qwen3-Reranker-0.6B by +0.005 NDCG@10 while running at 3,237 pairs per second on an H100 with Flash Attention 2. That throughput — 2.3× faster than architectural peers at the same parameter count — comes from unpadded inputs flowing through all transformer layers, not just the attention kernel.
The full family uses knowledge distillation from mxbai-rerank-large-v2 (1.54B parameters) as a teacher, trained on approximately 143M query-document-score triples. The training data is released as cross-encoder/ettin-reranker-v1-data with full split provenance across 39 named splits. Integration is three lines via sentence-transformers: load a CrossEncoder, call .predict() with query-document pairs, and rerank your top-K candidates retrieved by an embedding model in the first stage. All models carry Apache 2.0 licenses.
For teams building production search or agentic document retrieval, the 32M or 68M models offer a practical sweet spot: better-than-existing-large-model quality at latencies that fit synchronous request paths on commodity GPU hardware.
Read more — Hugging Face Blog