Allow loading remote contents and showing images to get the best out of this email.
FAUN.dev's AI/ML Weekly Newsletter
 
 
AILinks
 
This week in Generative AI/ML, with Kala the Koala
 
 
📝 A Few Words
 
 
A Cursor agent deleted a production database in 9 seconds.

The system prompt told it not to. The vendor's safety features did not stop it.

A team at PocketOS gave a Cursor agent access to a Railway CLI token. The token was scoped to "manage custom domains". Railway tokens have no per-operation scoping, so that token also held volumeDelete. The agent found it, called it, and the database was gone.

The agent did its job, but the permission model failed.

If you are building agents, check three things this week:
- What credentials does your agent hold, and what is the actual scope?
- Where does authorization happen: in code, or in the prompt?
- Which destructive operations run without human confirmation?

A wrong answer to any of those puts you one reasoning step away from a postmortem.
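
The second and third checks can be enforced in code with a deny-by-default gate in front of every tool call. The sketch below is only an illustration, not how PocketOS, Cursor, or Railway work: apart from volumeDelete, the operation names are hypothetical, and so are the `authorize` function and the `confirm` callback.

```python
from typing import Callable

# Hypothetical policy: operations the agent may run on its own,
# and destructive ones that always need a human decision.
ALLOWED = {"domainCreate", "domainList"}
NEEDS_HUMAN = {"volumeDelete"}


def authorize(operation: str, confirm: Callable[[str], bool]) -> bool:
    """Return True only if the operation may run. Anything unlisted is denied."""
    if operation in ALLOWED:
        return True
    if operation in NEEDS_HUMAN:
        # A person decides here, not the model's reasoning.
        return confirm(operation)
    return False
```

The point of the design is that the prompt never takes part in the decision: an operation missing from both sets is refused, whatever the token itself would allow.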

Have a great week,
Aymen
 
 
🔍 Inside this Issue
 
 
Big orgs are quietly turning AI from a chat toy into real infrastructure: graph-shaped ML metadata at Netflix, agent knowledge systems at Meta, and plugin-driven code review at Cloudflare. At the other end of the spectrum, you have one person and a laptop trying to run a local model without it wrecking their workflow, plus AWS productizing MCP into something you can actually depend on.

🎬 Democratizing Machine Learning at Netflix: Building the Model Lifecycle Graph
🧠 How We Built an AI Second Brain for 60K Knowledge Workers
🛡️ Orchestrating AI Code Review at scale
💻 Running local models on an M4 with 24GB memory
☁️ The AWS MCP Server is now generally available

Steal the patterns, skip the hype, and ship something sturdier this week.
Until next time!

FAUN.dev() Team
 
 
⭐ Patrons
 
iacconf.com
 
IaCConf 2026 starts this Thursday!
 
 
Your team can prompt its way to “working” Terraform. The problem is everything the model cannot see. In the IaCConf Keynote, Corey Quinn from Duckbill will break down failure modes, hidden dependencies, and why “it compiles” is a dangerous bar. You’ll walk away with a framework to review and constrain AI-generated IaC safely.

May 14. Free to attend. Register now.
 
 
eventbrite.co.uk
 
🚀 Join the AI-Powered Platform Engineering – Cohort 2 by Packt!
 
 
Modern platform teams are under pressure to scale cloud-native systems faster while improving reliability, security, developer experience, and operational efficiency. AI is changing how platforms are designed and operated — from intelligent automation and observability to AI-native developer platforms and autonomous operations.

Join leading experts from WSO2, CNCF, cloud-native, and DevSecOps communities for a practical workshop focused on building scalable, secure, and intelligent AI-native platforms.

Register here: Building AI-Native Platform Engineering Systems, Saturday, May 30, 7 PM to 11:59 PM GMT+5, on Eventbrite.
 
 
👉 Spread the word and help developers find you by promoting your projects on FAUN. Get in touch for more information.
 
⭐ Sponsors
 
eventbrite.co.uk
 
Are Your APIs Ready for AI Agents? A Hands-on Workshop on May 23rd
 
 
AI agents are beginning to autonomously call APIs, chain services, and create integrations that most platforms were never designed to handle. This hands-on masterclass on Designing AI-ready APIs helps architects and developers build governed, predictable API ecosystems using OpenAPI, Overlay, and Arazzo.

Learn how to add guardrails, improve discoverability, and safely evolve existing APIs for automated consumption.

FAUN.dev readers get an exclusive 40% discount using code FAUN40.
 
 
bytevibe.co
 
The most honest line you have ever shipped
 
 
You wrote it in 2019.
It is still in production.
ByteVibe sells the t-shirt.
Code FAUNDEV10 for 10% off.
 
 
🔗 Stories, Tutorials & Articles
 
netflixtechblog.com
 
Democratizing Machine Learning at Netflix: Building the Model Lifecycle Graph
 
 
Netflix's Saish Sali, Nipun Kumar, and Sura Elamurugu describe the Metadata Service (MDS), a graph layer built to connect siloed ML tooling (model registry, pipeline orchestrator, experimentation platform, feature store, dataset platform, identity) across personalization, studio, payments, and ads.

The system assigns every ML asset a global AIP URI and ingests thin change events from each source over Kafka and SNS/SQS. It then hydrates the full state from the source of truth, so out-of-order or dropped events self-correct. Datomic holds entities and reified edges, and Elasticsearch powers search.
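
To make the thin-event idea concrete, here is a rough sketch of a handler that treats each event as a "something changed" signal and re-reads the current state. It is not Netflix's code: the `aip://` URI shape, the `ThinEvent` class, and the two in-memory dictionaries are assumptions standing in for the real source systems and graph store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThinEvent:
    uri: str  # which asset changed; the event carries no payload


# Hypothetical stand-ins for a source system and the graph store.
source_of_truth = {"aip://model/ranker": {"version": 3}}
graph_store: dict[str, dict] = {}


def handle(event: ThinEvent) -> None:
    """Use only the URI from the event, then copy the current source state."""
    current = source_of_truth.get(event.uri)
    if current is None:
        graph_store.pop(event.uri, None)  # asset no longer exists upstream
    else:
        graph_store[event.uri] = dict(current)
```

Because the handler always copies the latest state, replaying, reordering, or duplicating events for the same URI ends in the same stored result.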

Background enrichment jobs walk multi-hop chains (model to pipeline run to A/B test cell to experiment) to materialize cross-system relationships, turning queries like "which experiments are running this model" or impact analysis on a feature change into a single graph traversal.
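
A question like "which experiments are running this model" can be pictured as a breadth-first walk over stored edges. The sketch below uses a plain dictionary as the graph, and every node ID in it is made up for illustration. The actual system keeps its edges in Datomic, which this does not attempt to model.

```python
from collections import deque

# Hypothetical edges: model -> pipeline run -> A/B test cell -> experiment.
EDGES = {
    "model:ranker": ["run:42"],
    "run:42": ["cell:B"],
    "cell:B": ["experiment:homepage-test"],
}


def reachable(start: str, prefix: str) -> list[str]:
    """Breadth-first walk; return reachable node IDs that start with prefix."""
    seen, queue, found = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        for nxt in EDGES.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
                if nxt.startswith(prefix):
                    found.append(nxt)
    return found
```

Calling `reachable("model:ranker", "experiment:")` follows the three hops and returns the one experiment node.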
 
 
medium.com
 
How We Built an AI Second Brain for 60K Knowledge Workers
 
 
Meta built an AI agent system internally called the AI Second Brain. It now has over 63,000 installs and ~10,000 daily active users across engineering, PM, design, legal, finance, comms, and sales, having grown from zero in roughly three months after a non-technical PM's adoption post.

The architecture pairs Tiago Forte's PARA folder framework (Projects, Areas, Resources, Archives) with a root CLAUDE.md plus per-project CLAUDE.md files for progressive disclosure. An infrastructure layer of internal MCP servers and CLIs gives the agent scoped, authenticated access to docs, meeting transcripts, task trackers, and code review, and a library of community-written skills in plain Markdown covers workflows.

The post credits four lessons: invest in the tool-access infrastructure layer before applications; prefer progressive disclosure over context dumping; keep the bootstrap low-friction, because that drives viral adoption; and keep skills as composable Markdown, which turned the plugin into a platform that users extended themselves.
 
 
aws.amazon.com
 
The AWS MCP Server is now generally available
 
 
AWS now offers AWS MCP Server as a managed remote MCP server in US East (N. Virginia) and Europe (Frankfurt). MCP-compatible clients can use existing IAM credentials to access more than 15,000 AWS API operations.

For GA, AWS added IAM context keys, documentation retrieval without authentication, lower token use, server-side Python execution in a sandbox with no network access, and separate CloudWatch and CloudTrail visibility for MCP calls. AWS service teams also maintain Skills for the server.
 
 
blog.cloudflare.com
 
Orchestrating AI Code Review at scale
 
 
Cloudflare engineers built an AI code review platform on OpenCode.

They split GitLab integration, model providers, prompts, and policy into separate plugins. A coordinator assigns up to seven domain reviewers across security, performance, code quality, documentation, release checks, and AGENTS.md compliance.

They stream review events as JSONL, route work by risk tier, protect each model with circuit breakers and failback chains, and let Workers/KV override model choices. They also track incremental re-review state and deduplicate prompt context to control cost and latency.
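
For readers who have not paired a circuit breaker with an ordered chain of backup models, a simplified sketch follows. It is not Cloudflare's implementation: the class, the function, and the consecutive-failure threshold are assumptions, and a production breaker would also close again after a cool-down period.

```python
class CircuitBreaker:
    """Opens after `threshold` consecutive failures (no timed reset here)."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.failures = 0

    @property
    def open(self) -> bool:
        return self.failures >= self.threshold

    def record(self, ok: bool) -> None:
        # A success resets the count; a failure adds to it.
        self.failures = 0 if ok else self.failures + 1


def review_with_fallback(chain, breakers, prompt):
    """Try each (name, call) pair in order, skipping models whose breaker is open."""
    for name, call in chain:
        breaker = breakers[name]
        if breaker.open:
            continue
        try:
            result = call(prompt)
        except Exception:
            breaker.record(False)
            continue
        breaker.record(True)
        return name, result
    raise RuntimeError("all models unavailable")
```

Once a model's breaker opens, later requests skip it without paying for another failed call, which is where the latency and cost benefit comes from.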

In the first 30 days, Cloudflare ran 131,246 reviews across 48,095 merge requests in 5,169 repos. The team reported a 3m39s median runtime and a $0.98 median cost per run.
 
 
jola.dev
 
Running local models on an M4 with 24GB memory
 
 
Local LLMs work best as supervised coding assistants. The author ran Qwen 3.5 9B (Q4) in LM Studio on a 24GB MacBook Pro and got about 40 tokens per second, with thinking mode, tool use, and a 128K context window. Results were mixed: Qwen helped with simple Elixir linter edits, then failed a basic git conflict by leaving conflict markers in place and trying to continue the rebase.
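
One cheap form of supervision for that failure is to scan a file for leftover conflict markers before letting an agent continue a rebase. The check below is a sketch of that idea and is not taken from the article. The helper name is hypothetical, and the pattern covers only the three standard marker lines, so a line of exactly seven `=` characters in ordinary text would be a false positive.

```python
import re

# Lines git writes into a file that still has an unresolved conflict.
MARKER = re.compile(r"^(<{7}|={7}|>{7})( |$)", re.MULTILINE)


def has_conflict_markers(text: str) -> bool:
    """Return True if any line starts with a 7-character conflict marker."""
    return MARKER.search(text) is not None
```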
 
 

👉 Got something to share? Create your FAUN Page and start publishing your blog posts, tools, and updates. Grow your audience, and get discovered by the developer community.

 
⚙️ Tools, Apps & Software
 
github.com
 
ChatbotXIO/ChatbotX
 
 
Open-source omnichannel chatbot for agentic workflows via APIs, CLI, and MCP. An alternative to Wati, ManyChat, and Respond.io
 
 
github.com
 
systalyze/utilyze
 
 
Utilyze measures how efficiently your GPU is doing useful work, not just whether it's busy. It runs live against your workload with negligible overhead.
 
 
github.com
 
antirez/ds4
 
 
DeepSeek 4 Flash local inference engine for Metal
 
 
github.com
 
njbrake/agent-of-empires
 
 
Manage multiple Claude Code and OpenCode agents from a TUI or a web UI, with easy access on mobile. Also supports Mistral Vibe, Codex CLI, Gemini CLI, Pi.dev, Copilot CLI, and Factory Droid Coding. Uses tmux and git worktrees.
 
 
github.com
 
virgiliojr94/book-to-skill
 
 
Turn any technical book PDF into a Claude Code skill — ready to study, reference, and use while you work.
 
 

👉 Spread the word and help developers find and follow your Open Source project by promoting it on FAUN. Get in touch for more information.

 
🤔 Did you know?
 
 
Did you know that vLLM, one of the most widely used engines for serving large language models, speeds up inference by treating GPU memory like an operating system treats RAM? Its core trick, PagedAttention, borrows the idea of paging from OS design: instead of reserving one big contiguous block of memory for each request's KV cache (the stored attention state that grows as the model generates tokens), it splits the cache into small fixed-size blocks that can sit anywhere in GPU memory and are looked up through a block table. That single change lets the same model on the same GPU serve far more concurrent requests, and the original paper measured 2 to 4 times higher throughput than prior state-of-the-art serving systems.
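
The block-table idea fits in a few lines. This toy allocator is not vLLM's code: the class, its method names, and the block size of 4 are illustrative assumptions, and it tracks only where each token's cache entry would live, not the tensors themselves.

```python
BLOCK_SIZE = 4  # tokens per block; an illustrative value


class BlockAllocator:
    def __init__(self, num_blocks: int) -> None:
        self.free = list(range(num_blocks))      # free physical block IDs
        self.tables: dict[str, list[int]] = {}   # request -> block table
        self.lengths: dict[str, int] = {}        # request -> tokens stored

    def append_token(self, request: str) -> tuple[int, int]:
        """Reserve a slot for one more token; return (physical block, offset)."""
        n = self.lengths.get(request, 0)
        table = self.tables.setdefault(request, [])
        if n % BLOCK_SIZE == 0:  # no block yet, or the last block is full
            if not self.free:
                raise MemoryError("no free KV-cache blocks")
            table.append(self.free.pop())
        self.lengths[request] = n + 1
        return table[n // BLOCK_SIZE], n % BLOCK_SIZE

    def release(self, request: str) -> None:
        """Return all of a finished request's blocks to the free pool."""
        self.free.extend(self.tables.pop(request, []))
        self.lengths.pop(request, None)
```

A request takes a new block only when its last one fills up, and the blocks it holds need not be adjacent, so memory freed by one request is immediately usable by any other.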
 
 
🤖 Once, SenseiOne Said
 
 
"Your model is allowed to be wrong; your pipeline is not. MLOps is the price of admitting that accuracy is a local variable while reliability is a system property."

— SenseiOne
 

(*) SenseiOne is FAUN.dev’s work-in-progress AI agent

 
⚡ Growth Notes
 
 
You shipped a RAG app that wowed the room six months ago, and now most of your week goes to tweaking the retrieval and swapping models while no one has looked at what users are actually asking lately. It still feels like progress because numbers move and PRs ship, but you're getting better at answering last quarter's questions, and the people using it have quietly moved on to new ones.
 
Each week, we share a practical move to grow faster and work smarter.
 
❤️ Thanks for reading
 
 
👋 Keep in touch and follow us on social media:
- 💼 LinkedIn
- 📝 Medium
- 🐦 Twitter
- 👥 Facebook
- 📰 Reddit
- 📸 Instagram

👌 Was this newsletter helpful?
We'd really appreciate it if you could forward it to your friends!

🙏 Never miss an issue!
To receive our future emails in your inbox, don't forget to add community@faun.dev to your contacts.

🤩 Want to sponsor our newsletter?
Reach out to us at sponsors@faun.dev and we'll get back to you as soon as possible.
 

AILinks #528: How We Built an AI Second Brain for 60K Knowledge Workers
Legend: ✅ = Editor's Choice / ♻️ = Old but Gold / ⭐ = Promoted / 🔰 = Beginner Friendly

You received this email because you are subscribed to FAUN.dev.
We (🐾) help developers (👣) learn and grow by keeping them up with what matters.
