🔗 Stories, Tutorials & Articles

Monitoring LLM behavior: Drift, retries, and refusal patterns

Traditional software is predictable because it is deterministic; generative AI is not. To ship enterprise-ready AI products, engineers need a new infrastructure layer: the AI Evaluation Stack, which pairs deterministic assertions for structural integrity with model-based assertions for semantic quality.
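The two assertion types can be sketched in a few lines of Python. This is a minimal illustration, not the article's implementation: the field names, the judge prompt, and the `judge` callable are all hypothetical.

```python
import json

def deterministic_assertions(raw_output: str) -> list[str]:
    """Structural checks: cheap, exact, no model required.

    Field names ("summary", "confidence") are illustrative.
    """
    failures = []
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return ["output is not valid JSON"]
    for field in ("summary", "confidence"):
        if field not in data:
            failures.append(f"missing required field: {field}")
    conf = data.get("confidence")
    if isinstance(conf, (int, float)) and not 0.0 <= conf <= 1.0:
        failures.append("confidence out of [0, 1] range")
    return failures

def model_based_assertion(output_text: str, judge) -> bool:
    """Semantic check: ask a judge model a yes/no quality question.

    `judge` is any callable taking a prompt and returning text.
    """
    verdict = judge("Does this summary avoid unsupported claims? "
                    "Answer YES or NO.\n\n" + output_text)
    return verdict.strip().upper().startswith("YES")
```

Running the deterministic layer first is the natural ordering: it is cheap and exact, so the (slower, costlier) model-based judge only sees structurally valid output.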

The AI engineering stack we built internally - on the platform we ship

Cloudflare wired AI into its own engineering stack: LLM traffic funnels through a proxy Worker and AI Gateway, running on the same Workers AI platform and Agents SDK it ships to customers. Daily users hit 3,683 (93% of R&D), MR throughput climbed to ~10,952/week, Workers AI handled 51B input tokens, and one security agent's inference spend fell by 77%.

Multi-Agent System Reliability

LLMs are unreliable out of the box, but multi-agent systems can improve reliability by dividing work among specialized agents. Robust systems borrow human organizational patterns (hierarchy, consensus, adversarial debate, knock-out) in a multi-agent architecture to ensure correctness. To counter LLMs' stochastic nature, run several models in parallel so their independent noise cancels out and accuracy improves. Above all, treat LLMs as unreliable components in a distributed system: constrain, verify, prune, and challenge them rather than anthropomorphizing them.
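The parallel-voting idea can be sketched as a small consensus function. This is an illustrative sketch, assuming each model is a plain callable returning a string; the quorum threshold and escalation behavior are assumptions, not the article's design.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def consensus_answer(prompt: str, models: list, quorum: float = 0.5):
    """Query several models in parallel and keep the majority answer.

    `models` holds hypothetical model clients: callables that take a
    prompt and return an answer string. Because independent models tend
    to make independent errors, agreement is evidence of correctness.
    Returns None when no strict majority emerges, so the caller can
    escalate (e.g. to adversarial debate).
    """
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        answers = list(pool.map(lambda m: m(prompt).strip().lower(), models))
    answer, votes = Counter(answers).most_common(1)[0]
    return answer if votes / len(answers) > quorum else None
```

Returning None instead of the plurality answer is the "treat LLMs as unreliable components" stance in miniature: an unresolved vote is a signal to add verification, not a result to pass downstream.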

Introducing the Agent Readiness score. Check to see if your site is agent-ready

Cloudflare launched IsItAgentReady, which scans 200k domains, scores each for agent readiness, publishes weekly adoption charts, and exposes results via an API. The checks cover robots.txt, llms.txt, content negotiation via Accept: text/markdown, an API Catalog, .well-known/mcp.json, OAuth discovery, and x402 payments. Cloudflare also overhauled its own docs to serve Markdown endpoints, publishes an Agent Skills index, and runs a stateless MCP server for programmatic agent access.
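A few of the listed checks can be probed from a short script. The sketch below is illustrative, not Cloudflare's actual rubric: the signal names and pass criteria are assumptions, and the injectable `fetch` parameter exists only to keep the logic testable offline.

```python
from urllib.request import Request, urlopen

# Hypothetical subset of the readiness signals named in the article:
# each maps to a path to request and extra headers to send.
SIGNALS = {
    "llms_txt": ("/llms.txt", {}),
    "markdown_negotiation": ("/", {"Accept": "text/markdown"}),
    "mcp_manifest": ("/.well-known/mcp.json", {}),
}

def probe(base_url: str, fetch=None) -> dict:
    """Report which agent-readiness signals a site exposes.

    `fetch(url, headers)` returns (status, content_type). The default
    uses urllib; tests can inject a fake to avoid network access.
    """
    def default_fetch(url, headers):
        with urlopen(Request(url, headers=headers), timeout=10) as resp:
            return resp.status, resp.headers.get("Content-Type", "")

    fetch = fetch or default_fetch
    results = {}
    for name, (path, headers) in SIGNALS.items():
        try:
            status, ctype = fetch(base_url.rstrip("/") + path, headers)
            ok = status == 200
            # Content negotiation only counts if Markdown actually came back.
            if name == "markdown_negotiation":
                ok = ok and "text/markdown" in ctype
            results[name] = ok
        except OSError:
            results[name] = False
    return results
```

A real scorer would weight these signals and add the OAuth, API Catalog, and x402 checks; this only shows the shape of an endpoint probe.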

An open-weights Chinese model just beat Claude, GPT-5.5, and Gemini in a programming challenge

The AI Coding Contest Day 12 pitted ten models against a sliding-letter puzzle. The open-weights Kimi K2.6 took first with 22 match points (7-1-0); MiMo V2-Pro came second by aggressively claiming intact seeds of seven or more letters (43 points), while GPT-5.5 and Claude Opus 4.7 finished third and fifth. Grids ranged from 10×10 to 30×30, and heavy scrambling made active sliding the deciding skill: static scanners and brittle claimers like Muse crashed or tanked. The systemic shift: an open-weights model besting frontier models changes who can run near-frontier inference locally and forces teams to rethink deployment.
👉 Got something to share? Create your FAUN Page and start publishing your blog posts, tools, and updates. Grow your audience, and get discovered by the developer community. |