Models are getting their own runtime, IDE agents are getting popped by zero‑click tricks, and Google finally put a number on the cost of a prompt. From budget GPUs to buyer profiling, data monetization, and MCP hardening, the details matter—dive deeper below.
🖥️ AI Models Need a Virtual Machine
🧰 Building an AI Server on a Budget ($1.3K)
🛍️ Building Etsy Buyer Profiles with LLMs
💸 Cursor looks into selling your data for AI training
⚡ In a first, Google has released data on how much energy an AI prompt uses
🐛 MCP vulnerability case study: SQL injection in the Postgres MCP server
💼 OpenAI eats jobs, then offers to help you find a new one
🧪 OpenAI reorganizes research team behind ChatGPT's personality
🛠️ Writing effective tools for AI agents—using AI agents
🚨 Zero-Click Remote Code Execution: Exploiting MCP & Agentic IDEs
You’ve got the signal and the scars—now go build with both.
Code. Game. Flow. This 9"×8" (22.86 x 20.32 cm) Binary Matrix mouse pad gives you smooth precision, durable build, and a design every developer will vibe with. Perfect for work or play.
Anysphere—the team behind Cursor, the AI coding sidekick—is looking to license user behavior data to the big model labs: OpenAI, Anthropic, and the usual suspects. Why? Training costs are brutal, and this could ease the burn.
Strategic Implication: Selling real developer telemetry to model competitors? Signals two things: 1) Cursor’s data is juicy, and 2) the race to monetize usage signals is very much on.
OpenAI introduced a new program called "OpenAI Grove" for early tech entrepreneurs to build with AI. The program is aimed at individuals in the pre-idea to pre-seed stage and offers mentoring, access to tools and models, and in-person workshops. Grove's first cohort will run from Oct. 20 to Nov. 21, 2025, with applications closing on Sept. 24.
Google dropped detailed stats on energy, water, and carbon use per query for its Gemini models. Median energy: 0.24 Wh, with TPUs eating 58% of that. They’re claiming a 33× efficiency boost in the last year—credit goes to model and software tuning.
System shift: A public hyperscaler posting this means the industry's inching toward serious, standardized AI climate metrics. About time.
OpenAI just fired a shot across LinkedIn’s bow. Its new jobs platform—part of OpenAI Academy—aims to certify AI skills, then plug users directly into hiring pipelines. Walmart's already on board.
Market signal: OpenAI’s not just training people anymore. It's moving in on talent placement, absorbing the AI jobs funnel into its own ecosystem. Vertical integration, meet the workforce.
OpenAI just folded its Model Behavior team—the crew behind AI personality design and anti-sycophant training—into the Post Training group. Behavior tuning now lives inside the same house as model refinement.
Joanne Jang, who led Model Behavior, now runs OAI Labs, a fresh research unit digging into post-chat interfaces built for real human-AI teamwork.
A nasty SQL injection bug in Anthropic’s now-retired Postgres MCP server let attackers blow past read-only mode and run whatever SQL they wanted. The repo was archived back in May 2025—but it’s far from dead. The unpatched package still racks up 21,000 NPM installs and 1,000 Docker pulls every week.
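The failure mode is worth seeing in miniature. Below is a hypothetical sketch—not the server’s actual code—of how a read-only guard built by string-wrapping raw SQL falls to stacked statements; the `run_readonly` helper and the payload are invented for illustration.

```python
# Hypothetical illustration of the vulnerable pattern: wrapping untrusted SQL
# in a read-only transaction at the string level. Not the MCP server's code.

def run_readonly(user_sql: str) -> str:
    # Naive guard: prepend a read-only transaction and roll it back afterwards.
    # Because the input may contain multiple statements, the guard is advisory.
    return f"BEGIN TRANSACTION READ ONLY; {user_sql}; ROLLBACK;"

# Attacker-supplied "query": the first COMMIT closes the read-only transaction,
# after which every following statement runs with the connection's full rights.
payload = (
    "SELECT 1; COMMIT; "
    "DROP TABLE orders; "
    "BEGIN TRANSACTION READ ONLY"
)

print(run_readonly(payload))
```

The durable fixes are to enforce read-only at the session or role level (a read-only database user, or Postgres settings like `default_transaction_read_only`) and to reject multi-statement input outright, rather than trusting a wrapper string.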
A zero-click exploit is making the rounds—nasty stuff targeting agentic IDEs like Cursor. The trick? Slip a malicious Google Doc into the system. If MCP integration and allow-listed Python execution are on, the document gets auto-pulled, parsed, and runs code. No clicks. No prompts. Just remote code execution, data exfiltration, and vibes ruined.
This isn’t a bug. It’s standard behavior. IDE agents are doing exactly what they’re told—grabbing what looks like a legit asset and running it. That’s the problem.
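One generic mitigation is to break the zero-click chain with an explicit approval step before anything fetched from an external source can reach an execution tool. The sketch below is illustrative only; `guarded_tool_call`, `dispatch`, and the source labels are hypothetical stand-ins, not Cursor or MCP APIs.

```python
# Hypothetical human-in-the-loop gate: content pulled from external sources is
# treated as untrusted and never flows into an execution tool automatically.

UNTRUSTED_SOURCES = {"google_docs", "web_fetch", "email"}
EXECUTION_TOOLS = {"run_python", "run_shell"}

def dispatch(tool: str, args: dict) -> str:
    # Stand-in for the real tool runner.
    return f"(pretend we ran {tool})"

def guarded_tool_call(tool: str, args: dict, provenance: str, ask_user) -> str:
    """Run a tool call, but require approval when untrusted content may execute."""
    if tool in EXECUTION_TOOLS and provenance in UNTRUSTED_SOURCES:
        preview = str(args)[:200]
        if not ask_user(f"Agent wants to execute code derived from {provenance}:\n"
                        f"{preview}\nAllow? [y/N] "):
            return "blocked: user declined execution of untrusted content"
    return dispatch(tool, args)

if __name__ == "__main__":
    approve = lambda prompt: input(prompt).strip().lower() == "y"
    print(guarded_tool_call("run_python", {"code": "print('hi')"},
                            provenance="google_docs", ask_user=approve))
```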
A developer rolled their own AI server for $1.3K—Ubuntu 24.04.2 LTS, an Nvidia RTX GPU, and a sharp eye on Tensor cores, VRAM, and resale value. The rig handles small models locally and punts big jobs to the cloud when needed. Local-first, cloud-when-it-counts.
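The routing logic behind “local-first, cloud-when-it-counts” can stay small. Here is a rough sketch assuming an OpenAI-compatible local endpoint (the kind llama.cpp or vLLM expose) with a hosted API as fallback; the URLs, model names, and the context budget are placeholders, not the developer’s setup.

```python
# Rough sketch: send small prompts to a local OpenAI-compatible endpoint,
# punt long-context or failed requests to a hosted API. Names are placeholders.
import requests

LOCAL_URL = "http://localhost:8000/v1/chat/completions"   # local server
CLOUD_URL = "https://api.example.com/v1/chat/completions"  # hosted fallback
LOCAL_CTX_BUDGET = 4_000  # rough character budget the local model handles well

def complete(prompt: str, api_key: str) -> str:
    body = {"model": "local-model",
            "messages": [{"role": "user", "content": prompt}]}
    if len(prompt) <= LOCAL_CTX_BUDGET:
        try:
            r = requests.post(LOCAL_URL, json=body, timeout=30)
            r.raise_for_status()
            return r.json()["choices"][0]["message"]["content"]
        except requests.RequestException:
            pass  # local box busy or down: fall through to the cloud
    body["model"] = "hosted-large-model"
    r = requests.post(CLOUD_URL, json=body,
                      headers={"Authorization": f"Bearer {api_key}"}, timeout=60)
    r.raise_for_status()
    return r.json()["choices"][0]["message"]["content"]
```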
Anthropic’s sharpening the blueprint for building tools that play nice with LLM agents. Their Model Context Protocol (MCP) leans hard into three pillars: test in loops, design for humans, format like context matters—because it does.
They co-develop tools with agents like Claude Code. That means prototyping side-by-side, pressure-testing with structured evals, and prompt-wrangling tool specs until Claude stops hallucinating and starts calling the right APIs.
Big shift: You're no longer building for checkbox-clicking APIs. You're building for opinionated, non-deterministic models with vibes. Forget rigid abstractions. Focus on flexible scaffolding, tight eval cycles, and giving the model what it needs, when it needs it.
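In practice that means tool specs read like documentation written for a slightly literal colleague: crisp names, typed parameters, and output trimmed for the context window. A minimal sketch using the MCP Python SDK’s FastMCP helper; the tool, its docstring, and the fake data are made up for illustration.

```python
# Minimal MCP tool sketch (FastMCP from the MCP Python SDK). The tool and its
# data are illustrative only, not Anthropic's examples.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("order-lookup")

@mcp.tool()
def get_order_status(order_id: str) -> str:
    """Return the status of a single order.

    Use this when the user asks about one specific order. The order_id is the
    short code shown on the receipt (e.g. 'A-1042'), not the customer's email.
    """
    fake_db = {"A-1042": "shipped 2 days ago, arriving Thursday"}
    status = fake_db.get(order_id)
    # Return a compact, unambiguous string: whatever comes back gets pasted
    # straight into the model's context.
    return f"order {order_id}: {status}" if status else f"order {order_id}: not found"

if __name__ == "__main__":
    mcp.run()  # stdio transport by default
```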
Microsoft and academic researchers want to give AI models a new kind of home: the AI Model Virtual Machine (MVM). Think of it like the JVM, but for LLMs—an interface layer that standardizes how models plug into host software.
The MVM enforces security, isolation, and tool-calling rules, while also unlocking interoperability through protocols like MCP and access frameworks like AC4A and FIDES.
Big picture: This is a shift toward treating LLMs less like magic black boxes and more like proper runtime components—bounded, controllable, swappable. Like OS processes, but weirder.
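To make the “runtime component” framing concrete, here is a toy sketch of what an MVM-style boundary might look like from the host’s side; the interface and policy names are invented for illustration and are not from the paper.

```python
# Toy sketch of an MVM-style boundary: the host talks to any model through one
# interface, and every tool call passes a policy check. All names are invented.
from dataclasses import dataclass, field
from typing import Callable, Protocol

class Model(Protocol):
    def step(self, prompt: str) -> str: ...

@dataclass
class ModelVM:
    model: Model
    tools: dict[str, Callable[[str], str]] = field(default_factory=dict)
    allowed_tools: set[str] = field(default_factory=set)

    def call_tool(self, name: str, arg: str) -> str:
        # Policy enforcement lives in the VM, not in the model or the host app.
        if name not in self.allowed_tools:
            return f"denied: tool '{name}' is not permitted for this workload"
        return self.tools[name](arg)

    def run(self, prompt: str) -> str:
        return self.model.step(prompt)

class EchoModel:
    def step(self, prompt: str) -> str:
        return f"echo: {prompt}"

vm = ModelVM(model=EchoModel(),
             tools={"read_file": lambda p: f"contents of {p}"},
             allowed_tools=set())                 # nothing allowed by default
print(vm.call_tool("read_file", "/etc/passwd"))   # denied by policy
```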
Every day, nearly 90M buyers search across more than 100 million listings on Etsy. The platform uses large language models to build detailed, anonymized buyer profiles that capture shoppers’ interests. Adjustments to data retrieval and processing have significantly cut the time and cost of generating these profiles.
A powerful coding agent toolkit providing semantic retrieval and editing capabilities (MCP server & other integrations)
🤔 Did you know?
Did you know vLLM’s PagedAttention treats the KV cache like a virtual memory page cache, splitting keys and values into fixed-size blocks and mapping logical → physical blocks via a block table? This lets vLLM allocate KV cache on demand, reuse freed blocks immediately, eliminate external fragmentation, and keep internal fragmentation down to the last partially filled block of each sequence. Under mixed sequence lengths, the overhead of block table lookups is often more than paid back by better GPU occupancy and less wasted memory compared with contiguous-KV schemes. You get much higher serving throughput without needing any model changes or complicated CUDA-graph hacks.
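A toy version of the bookkeeping, to show the idea rather than vLLM’s implementation: a block table maps a sequence’s logical block index to whatever physical block happens to be free, so memory is claimed one block at a time and returned to a shared pool the moment a sequence finishes.

```python
# Toy block-table sketch of the PagedAttention idea: logical KV blocks map to
# arbitrary physical blocks drawn from a shared free pool. Not vLLM's code.
BLOCK_SIZE = 16  # tokens per KV block

class KVBlockManager:
    def __init__(self, num_physical_blocks: int):
        self.free = list(range(num_physical_blocks))   # shared pool
        self.block_tables: dict[int, list[int]] = {}   # seq_id -> physical ids

    def append_token(self, seq_id: int, seq_len: int) -> int:
        """Ensure the sequence has a block for its next token; return its id."""
        table = self.block_tables.setdefault(seq_id, [])
        logical_idx = seq_len // BLOCK_SIZE
        if logical_idx == len(table):           # need a fresh block, on demand
            table.append(self.free.pop())       # any free block will do
        return table[logical_idx]

    def free_sequence(self, seq_id: int) -> None:
        # Finished sequences return every block to the pool for immediate reuse.
        self.free.extend(self.block_tables.pop(seq_id, []))

mgr = KVBlockManager(num_physical_blocks=8)
for t in range(20):                  # 20 tokens -> 2 blocks for sequence 0
    mgr.append_token(seq_id=0, seq_len=t)
mgr.free_sequence(0)
print(len(mgr.free))                 # 8: both blocks are back in the pool
```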
🤖 Once, SenseiOne Said
"A model can be right offline and wrong online; MLOps is where that gap becomes an incident. If you can’t diff the data, features, and configs, you’re not debugging—you’re guessing." — SenseiOne
👤 This Week's Human
This Week’s Human is Gursimar Singh, a Google Developers Educator, Author @ freeCodeCamp, and DevOps & Cloud consultant who makes complex systems teachable. They’ve spoken at HAProxyConf 2022, multiple KCDs, and DevOpsDays Warsaw; reviewed programs for OpenTofu Day and PyCon India; and mentored at IIT Madras while volunteering with EuroPython. Offstage, they’ve published 70+ articles reaching 100k+ readers and contributed 5 project write-ups to the Google Dev Library, covering tools from Kubernetes to Terraform.