🔍 Inside this Issue
Chat is turning into the runtime, agents are muscling past RAG, and production bugs are quietly rewriting the rules on model quality. I pulled the sharp threads—UI pilots, open safety audits, modular agent stacks—so you can dig into the details below and ship with fewer surprises.
🧪 Anthropic Launches Petri: Open-Source Tool for AI Safety Audits
🐛 Anthropic unveils three infrastructure bugs behind Claude's performance issues
🧩 ChatGPT Launches Interactive Apps with New Apps SDK Preview
🤖 Google DeepMind Launches Gemini 2.5 Model for Enhanced API Performance
🛠️ OpenAI Launches AgentKit: Streamline Agent Development for Enterprises
🏗️ Technical Tuesday: 10 best practices for building reliable AI agents in 2025
🪦 The RAG Obituary: Killed by Agents, Buried by Context Windows
⚡ Write Deep Learning Code Locally and Run on GPUs Instantly
You’ve got the patterns - turn them into leverage.
Have a great week!
FAUN.dev Team
ℹ️ News, Updates & Announcements

faun.dev
Google DeepMind just dropped Gemini 2.5 Computer Use, now in public preview via API. It's built to control web and mobile UIs with scary precision.
Feed it a request, a screenshot, and some action history. It churns out the right function calls - clicks, typing, navigation - fast and tight.
System shift: LLMs aren’t just chatting anymore. This moves them into full-on UI pilots. Iterative, autonomous steps. Agents are coming.

faun.dev
Anthropic dug into three gnarly production bugs that were quietly messing with Claude’s outputs. Culprits: broken context routing, bad TPU API configs, and a flaky TPU compiler. Fixes? Rewired the routing logic, rolled back some changes, and teamed up with the XLA:TPU crew. They’ve now beefed up on-prod evals and debugging.
System shift: AI infra teams can’t just ship and hope. Model fidelity now hinges on stress-testing across messy hardware stacks - TPUs, GPUs, Trainium, whatever’s in the mix.

faun.dev
Anthropic dropped Petri, an open-source tool that stress-tests LLMs for bad behavior. It uses autonomous agents and judge models to sniff out risky outputs - no human babysitting required.
Turns out, even models from OpenAI, Google, xAI, and Anthropic itself slip up more than you'd expect.

faun.dev
OpenAI just dropped a preview of the Apps SDK for ChatGPT. It lets devs embed interactive apps straight into ChatGPT convos. Think custom logic, custom UI, backend hooks - fully in-chat.
It runs on the open Model Context Protocol (MCP) and supports activation by name or the model’s own vibe check: context-based suggestions.
System shift: Chat’s no longer just the UI. It’s the app. The SDK makes chat-native workflows feel more like building with Lego and less like shoehorning UX into prompts.

faun.dev
OpenAI dropped AgentKit, a full-stack toolkit for building, shipping, and fine-tuning AI agents. It expands the earlier Agents SDK and Responses API - now with much sharper edges.
New toys: build apps inside ChatGPT itself, test things out with the Agent Builder (still in beta), and tap into the Connector Registry or ChatKit for smoother integration and evaluation.
What changed: OpenAI’s platform just went modular. Agent workflows aren’t one long pipeline anymore - they’re a grid of snap-together pieces.
🔗 Stories, Tutorials & Articles

nicolasbustamante.com
Agent-based setups are starting to edge out old-school RAG. As LLMs snag multi-million-token context windows and better task chops, the need for chunking, embeddings, and reranking starts to fade. Claude Code, for example, skips all that - with direct file access and smart navigation instead. Retrieval isn't dead, but it's morphing into something far more agentic.
Bigger picture: Bigger windows and sharper attention mean LLMs can now process whole documents and run tasks directly - no more stitching together fragments just to get work done.

openpipe.ai
New product, Serverless RL, available through collaboration between CoreWeave, Weights & Biases, and OpenPipe. Offers fast training, lower costs, and simple model deployment. Saves time with no infra setup, faster feedback loops, and easier entry into RL training.

aiengineering.academy
Modal cuts the drama out of deep learning ops. Devs write Python like usual, then fire off training, eval, and serving scripts to serverless GPUs - zero cluster wrangling. It handles data blobs, image builds, and orchestration. You focus on tuning with libraries like Unsloth, or serving via vLLM.

uipath.com
UiPath just dropped Agent Builder in Studio - a legit development environment for AI agents that can actually handle enterprise chaos. Think production-grade: modular builds, traceable steps, and failure handling that doesn’t flake under pressure.
It’s wired for schema-driven prompts, tool versioning, and DeepRAG to lock in relevant context. Model-agnostic deployment? Yep. Plus sharp evals and trace logs to keep things safe, sane, and audit-friendly.