Hermes Agent v0.16.0: Native Desktop App, 187k Stars
Nous Research shipped "The Surface Release": a full native desktop app (macOS/Linux/Windows) for Hermes Agent, a web dashboard admin panel, a fuzzy model picker, an /undo command, an NVIDIA/skills trusted tap, and Quick Setup via Nous Portal. The release spans 874 commits and 542 merged PRs from 170 contributors. 187k GitHub stars.
NVIDIA Nemotron 3 Ultra: 550B MoE for Agent Orchestration
NVIDIA released Nemotron 3 Ultra: 550B parameters with 55B active, built for long-running agent orchestration. Hybrid Mamba-Transformer, NVFP4 quantization (5x throughput), LatentMoE, multi-token prediction. 30% cost savings on agentic tasks. Fully open weights, data, and recipes.
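As a quick sanity check on the Nemotron 3 Ultra figures, the share of parameters active per token follows directly from the two numbers quoted above. The sketch below is only illustrative arithmetic; the function name `active_fraction` is hypothetical and not part of any NVIDIA tooling.

```python
def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Return the share of a MoE model's parameters that are active per token.

    Both arguments are parameter counts in billions.
    """
    if total_params_b <= 0:
        raise ValueError("total parameter count must be positive")
    return active_params_b / total_params_b


# Figures quoted in the item above: 550B total, 55B active.
nemotron_share = active_fraction(550, 55)
print(f"Nemotron 3 Ultra active share: {nemotron_share:.0%}")
```

On the quoted figures, about one tenth of the model's parameters are used for any given token.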
Holo3.1: Computer-Use Agents Go Local with NVFP4
H Company released Holo3.1 with mobile automation (AndroidWorld 79.3%), cross-harness support, and quantized FP8/NVFP4/GGUF checkpoints. On DGX Spark, NVFP4 delivers 2x speedup, cutting step time from 6.8s to 3.3s. Sizes from 0.8B to 35B-A3B.
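The "2x speedup" claim for Holo3.1 can be checked against the two step times quoted above. This is a minimal sketch of that arithmetic only; the helper name `speedup` is hypothetical, and the inputs are the 6.8s and 3.3s figures from the item.

```python
def speedup(baseline_s: float, optimized_s: float) -> float:
    """Return how many times faster the optimized step is than the baseline."""
    if baseline_s <= 0 or optimized_s <= 0:
        raise ValueError("step times must be positive")
    return baseline_s / optimized_s


# Step times quoted in the item above, in seconds.
holo_speedup = speedup(6.8, 3.3)
print(f"NVFP4 speedup on DGX Spark: {holo_speedup:.2f}x")
```

The ratio comes out slightly above 2, consistent with the rounded "2x" in the summary.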
Mellum2: JetBrains 12B MoE for Agent Sub-Tasks
JetBrains released Mellum2: a 12B-parameter MoE (2.5B active) for routing, RAG, and sub-agents. Apache 2.0. 2x faster inference than similar-sized models. Designed for high-frequency tasks inside larger AI systems. It's the "focal" model in your agent stack.
Pi Agent v0.77.0: Claude Opus 4.8, Exclude Tools, 60.9k Stars
Pi Agent added Claude Opus 4.8 support, an --exclude-tools flag for selectively disabling tools, headless Codex subscription login, and streaming-aware extension input. 60.9k GitHub stars. Pi is not another agent SDK. That's the whole point.
FuriosaAI + Broadcom: New Inference Chiplet Platform
FuriosaAI partnered with Broadcom to develop a 3rd-gen AI accelerator using a multi-die chiplet design. The RNGD chip is in mass production at TSMC. The TCP architecture targets agentic workloads with HBM4/4E and a 2nm process. The CUDA-alternative SDK ships with a PyTorch compiler.
Netrasemi A2000: India's First AI Chip Begins Customer Trials
Zoho-backed Netrasemi successfully tested its A2000 AI SoC, which is ready for edge devices. It is built on a TSMC 12nm process and targets smart cameras and automotive. Early trials are under way with 3 customers. Part of India's DLI scheme. The A4000 server chip is expected in Q2 2027.
NVIDIA Vera Rubin in Full Production for Agentic AI Factories
NVIDIA announced that Vera Rubin NVL72 is now in full production. The platform powers "agentic AI factories" worldwide with 72 Rubin GPUs per rack, NVLink-C2C fabric, and NVIDIA's own Vera ARM CPU. The multi-anchor system design includes Intel Xeon 6 and Groq LP30.
Cerebras Hits 2,522 TPS: Wafer-Scale Inference Crown
Cerebras CS-3 delivers 2,522 tokens/s on Llama 4 Maverick (400B), more than 2x the throughput of NVIDIA Blackwell. The wafer-scale engine holds entire models in SRAM, eliminating memory bottlenecks. Groq LPU follows at 549 TPS. For batch processing and large-scale inference, custom silicon is winning.
Mercury 2: Diffusion LLM Hits 629 TPS with Reasoning
Inception Labs launched Mercury 2, the first diffusion-based reasoning LLM. It generates tokens in parallel instead of sequentially. 629 TPS (up to 1,100), 5x faster than leading fast LLMs. Matches Claude 4.5 Haiku on AIME 2025 (91.1%) and GPQA (73.6%).