AWS and Cerebras: Disaggregated AI Inference via Amazon Bedrock
AWS and Cerebras Systems announced a partnership on March 13, 2026, to deliver ultra-fast AI inference through Amazon Bedrock using Cerebras CS-3 wafer-scale processors deployed inside AWS data centres. The technical approach, called inference disaggregation, splits the inference workload into two specialised stages rather than running the entire forward pass on a homogeneous GPU cluster.
In the disaggregated architecture, AWS Trainium processors handle the prefill stage — the computationally intensive, massively parallel step of processing the input prompt — while Cerebras CS-3 systems handle the decode stage, where tokens are generated sequentially and the bottleneck shifts to memory bandwidth rather than compute throughput. The CS-3's wafer-scale design delivers thousands of times greater memory bandwidth than standard GPU architectures, directly addressing the memory wall that limits token generation speed on conventional hardware. AWS Elastic Fabric Adapter networking connects the two stages with low latency, and the entire stack runs on AWS Nitro System infrastructure.
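The split described above can be sketched in a few lines of Python. This is a toy model of the two stages, not Cerebras's or AWS's implementation; all names (KVCache, prefill, decode, the placeholder tokens) are illustrative. The point it shows is structural: prefill touches every prompt token in one parallel pass and emits a cache, while decode is a serial loop that must re-read that ever-growing cache on every step, which is why its bottleneck is memory bandwidth.

```python
from dataclasses import dataclass

@dataclass
class KVCache:
    """Key/value cache produced by prefill and handed off to decode."""
    entries: list

def prefill(prompt_tokens):
    # Prefill: the whole prompt is processed in one parallel, compute-bound
    # pass (the stage assigned to Trainium in the AWS/Cerebras design).
    return KVCache(entries=[("kv", t) for t in prompt_tokens])

def decode(cache, max_new_tokens):
    # Decode: tokens are generated one at a time; each step rereads the
    # full cache plus everything generated so far, so the cost per token is
    # dominated by memory traffic (the stage assigned to the CS-3).
    generated = []
    for step in range(max_new_tokens):
        bytes_read = len(cache.entries) + len(generated)  # grows every step
        next_token = f"tok{step}"  # placeholder for the sampled token
        generated.append(next_token)
    return generated

cache = prefill(["The", "memory", "wall"])
out = decode(cache, max_new_tokens=4)
print(len(cache.entries), out)
```

In the disaggregated setup, the cache handoff between the two functions is what travels over the Elastic Fabric Adapter link between the Trainium and CS-3 tiers.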
The claimed performance benefit is 5x more token capacity within the same hardware footprint compared to current solutions, described as "an order of magnitude faster" for demanding workloads. Cerebras names agentic coding assistance as the primary use case — scenarios where developer productivity is gated by inference speed, such as real-time code review, multi-step agent loops, and interactive debugging. The solution will be accessible through the existing Amazon Bedrock API, meaning applications already integrated with Bedrock will not require code changes to use Cerebras-backed endpoints. Open-source LLMs and Amazon Nova models are planned to run on Cerebras hardware later in 2026. The rollout was expected within months of the March announcement.
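The "no code changes" claim rests on Bedrock's model-agnostic request shape: applications select hardware-backed model variants by model ID alone. The sketch below builds a Bedrock-style request to make that concrete; the model ID is a hypothetical placeholder, since no Cerebras-backed identifier has been published, and the boto3 call is shown only in a comment.

```python
import json

def build_request(model_id, prompt, max_tokens=512):
    # A Bedrock InvokeModel-style request: the call shape stays the same
    # regardless of what hardware serves the model. Switching to a
    # Cerebras-backed endpoint would mean changing only model_id.
    return {
        "modelId": model_id,
        "body": json.dumps({"prompt": prompt, "max_tokens": max_tokens}),
    }

# "example.cerebras-backed-model-v1" is a made-up placeholder ID.
req = build_request("example.cerebras-backed-model-v1", "Review this diff:")

# With boto3, the actual invocation would look like:
#   bedrock = boto3.client("bedrock-runtime")
#   response = bedrock.invoke_model(modelId=req["modelId"], body=req["body"])
print(req["modelId"])
```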
Read more — Amazon / Cerebras
Google Cloud: Veo 3.1 Lite Public Preview and April 2 Release Notes
Google Cloud's April 2, 2026 release notes contain several developer-relevant promotions to general availability and new preview capabilities. The most attention-grabbing entry is Veo 3.1 Lite, which entered public preview as the most cost-efficient model in the Veo video generation family. For developers building content generation pipelines or multimodal applications, Veo 3.1 Lite provides access to video generation at a lower price point than the full Veo 3.1 model.
Cloud SQL for SQL Server read pools reached general availability, supporting 1-to-7 node configurations with automatic load balancing across read replicas. Read pools provide horizontal scaling for read-heavy workloads without requiring application-level routing logic — queries directed at the read pool endpoint are distributed across available nodes automatically. Both horizontal scaling (adding nodes) and vertical scaling (changing machine type) are supported for read pool instances, and they integrate with Cloud SQL's existing monitoring and maintenance window configuration.
BigQuery's Snowflake-to-GoogleSQL translation reached general availability, enabling automated migration of SQL queries, stored procedures, and scripts from Snowflake to BigQuery's GoogleSQL dialect. The GA release maps Snowflake's INTEGER and zero-scale NUMERIC types to INT64 in GoogleSQL, resolving a common compatibility gap that caused manual intervention during migrations. A separate BigQuery update brings DDL statements — CREATE CONNECTION, ALTER CONNECTION SET OPTIONS, and DROP CONNECTION — for managing Cloud resource connections in GoogleSQL, enabling infrastructure-as-code patterns for BigQuery external data source management.
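The type-mapping rule from the GA release can be expressed as a small function. This is a simplified illustration of the rule as stated — Snowflake INTEGER and zero-scale NUMBER/NUMERIC/DECIMAL become INT64 — not the translator's actual code; the handling of nonzero-scale and uncovered types here is an assumption.

```python
import re

def map_snowflake_type(sf_type: str) -> str:
    """Map a Snowflake type name to its GoogleSQL equivalent (sketch)."""
    t = sf_type.strip().upper()
    if t == "INTEGER":
        return "INT64"
    m = re.fullmatch(r"(NUMBER|NUMERIC|DECIMAL)\((\d+)\s*,\s*(\d+)\)", t)
    if m and int(m.group(3)) == 0:
        return "INT64"   # zero scale: no fractional digits, fits INT64
    if m:
        return "NUMERIC"  # nonzero scale keeps a decimal type (assumption)
    return t              # types not covered by this sketch pass through

print(map_snowflake_type("NUMERIC(18,0)"))  # INT64
print(map_snowflake_type("NUMBER(10,2)"))   # NUMERIC
```

Before this mapping reached GA, zero-scale numerics surfaced as decimal types on the BigQuery side, which is the manual-intervention gap the release note describes.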
Read more — Google Cloud