Allow loading remote contents and showing images to get the best out of this email.
FAUN.dev's DevOps / SRE / Platform Engineering Weekly Newsletter
 
DevOpsLinks
 
This week in DevOps, with Dolly the Cow
 
 
📝 A Few Words
 
 
Ubuntu is getting AI.

Half of you are excited. The other half are already thinking "this is why I'm switching to Debian" or "time to move to Arch".

Both halves should read this.

Canonical just laid out how AI lands in Ubuntu through 2026, and the SRE-relevant parts are more interesting than the desktop demos:

For your fleet:
👉 Local inference as snaps. One command, hardware-optimized, no Ollama glue code.
👉 SRE-ready agents: log analysis, scheduled maintenance, strict guard rails.
👉 Read-only analysis, scoped permissions, full auditability. Agents inherit your production boundaries.

For the purists:
👉 No mandatory AI assistant.
👉 No data leaves the box by default.
👉 Open weights, snap confinement, opt-in everywhere.
👉 Canonical isn't tracking engineers on AI token usage either.

The interesting bet is what's missing: a cloud-first AI assistant. While Microsoft wires Copilot into Windows and Apple builds Apple Intelligence around its private cloud, Ubuntu is going local-by-default with confined agents that respect the controls you already run.

If you manage Ubuntu fleets, this is worth tracking. If you run servers and you're skeptical of AI in your OS, this is the implementation that's hardest to argue against.

Have a great week!
Aymen
 
 
🔍 Inside this Issue
 
 
AI is creeping into ops, but not where the hype says it will, while the old guard of infra tooling is getting called out for making humans translate intent into ceremony. Toss in a gnarly Kubernetes CPU zombie hunt and a peek at the engineering that keeps Lambda networking off the hot path, and you have a pretty opinionated reading queue.

🧭 AI in SRE: What's Actually Coming in 2026
🧟 Finding zombies in our systems: A real-world story of CPU bottlenecks
🧨 Shift Left Did Not Fix It
⚰️ Terraform is dead
🕸️ The invisible engineering behind Lambda’s network

Steal the good ideas, skip the cargo cults, ship something solid.

Have a great week!
FAUN.dev() Team
 
 
⭐ Patrons
 
faun.dev
 
The Helm Course for Engineers Who've Been Burned
 
 
Most engineers can run helm install. Far fewer can explain why their upgrade half-applied at 3 AM.

A practical course on what Helm actually does: state, releases, rendering, hooks, dependencies, rollbacks, GitOps integration, and the failure modes nobody writes blog posts about.

For engineers tired of treating Helm like a black box.

[Start the course →]
 
 
eventbrite.co.uk
 
Are Your APIs Ready for AI Agents? A Hands-on Workshop on May 23rd
 
 
AI agents are beginning to autonomously call APIs, chain services, and create integrations that most platforms were never designed to handle. This hands-on masterclass on Designing AI-ready APIs helps architects and developers build governed, predictable API ecosystems using OpenAPI, Overlay, and Arazzo.

Learn how to add guardrails, improve discoverability, and safely evolve existing APIs for automated consumption.

FAUN.dev readers get an exclusive 40% discount using code FAUN40.
 
 
👉 Spread the word and help developers find you by promoting your projects on FAUN. Get in touch for more information.
 
🔗 Stories, Tutorials & Articles
 
grahamgilbert.com
 
Terraform is dead
 
 
Graham Gilbert argues Terraform is effectively dead, kept alive only by inertia: HCL forced engineers to translate intent (the diagrams, paragraphs, and constraints that actually describe systems) into a DSL that nobody naturally thinks in, while fragmenting infrastructure, application logic, policies, and diagrams across representations that never stay in sync. AI removes that translation layer by working directly from diagrams and natural language, interrogating intent, and producing executable code, which makes a static DSL in the middle redundant. If he were starting today, he'd skip HCL entirely and build an intent layer backed by general-purpose code, closer in spirit to Pulumi than Terraform.
 
 
dzone.com
 
AI in SRE: What's Actually Coming in 2026
 
 
AI in SRE is evolving, and its real value lies in root cause analysis and pre-change impact analysis, not in autonomous remediation or in replacing SREs. The shift is toward collaboration between engineers and AI, and toward a change in what SREs focus on.
 
 
brijeshdeb.medium.com
 
Shift Left Did Not Fix It
 
 
Shift left has become a buzzword, but merely moving testing earlier doesn't address the core issue of authority and decision-making in quality assurance. AI may offer quicker testing, but it doesn't comprehend risk the way human testers do. Beware the dangerous lie that AI can replace thorough, critical testing.
 
 
medium.com
 
Finding zombies in our systems: A real-world story of CPU bottlenecks
 
 
After a network outage crisis, Pinterest's ML Platform team discovered that high CPU usage by a Kubernetes agent was causing critical Ray training jobs to fail. Deep profiling revealed a rarely seen flaw in how the kubelet handled memory cgroup iterations.
 
 
allthingsdistributed.com
 
The invisible engineering behind Lambda’s network   ✅
 
 
AWS engineers explain how the Lambda team rebuilt VPC networking so they can keep per-invocation setup off the hot path and run dense microVM workers at scale.
 
 

👉 Got something to share? Create your FAUN Page and start publishing your blog posts, tools, and updates. Grow your audience, and get discovered by the developer community.

 
⭐ Supporters
 
eventbrite.co.uk
 
🚀 Join the AI-Powered Platform Engineering – Cohort 2 by Packt!
 
 
Modern platform teams are under pressure to scale cloud-native systems faster while improving reliability, security, developer experience, and operational efficiency. AI is changing how platforms are designed and operated — from intelligent automation and observability to AI-native developer platforms and autonomous operations.

Join leading experts from WSO2, CNCF, cloud-native, and DevSecOps communities for a practical workshop focused on building scalable, secure, and intelligent AI-native platforms.

Register here: Building AI-Native Platform Engineering Systems, Saturday, May 30, 7 PM - 11:59 PM GMT+5 (Eventbrite)
 
 
⚙️ Tools, Apps & Software
 
github.com
 
Wondermove-Inc/k-o11y
 
 
K-O11y: Kubernetes Observability Platform (SigNoz + OTel + Beyla)
 
 
github.com
 
juicedata/juicefs
 
 
JuiceFS is a distributed POSIX file system built on top of Redis and S3.
 
 
github.com
 
zhangqi444/open-forge
 
 
AI-guided self-hosting for 950+ open-source apps on any cloud. Works with Claude Code, Codex, Cursor, Aider, OpenClaw, Hermes — catalog self-improves from user feedback.
 
 
github.com
 
huseynovvusal/blamebot
 
 
AI on-call agent that detects deploy failures, explains what broke, pages the responsible team, and rolls back automatically.
 
 
github.com
 
Neilblaze/portscope
 
 
CLI tool to see & manage what's running on your ports
 
 

👉 Spread the word and help developers find and follow your Open Source project by promoting it on FAUN. Get in touch for more information.

 
🤔 Did you know?
 
 
Did you know that when a Kubernetes node fails, the control plane deliberately evicts pods slowly, at a default rate of 0.1 nodes per second (one node every 10 seconds)? If too many nodes in a zone go unhealthy, the rate for that zone drops further to 0.01 per second (one node every 100 seconds), and in small clusters evictions in such a zone stop entirely. This is rate-limiting by design, not a stuck scheduler: it stops a partial outage from triggering a cluster-wide stampede of reschedules, at the cost of slower recovery. In large clusters you tune these rates alongside PodDisruptionBudgets, which cap how many pods of a workload can be down at once.
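
To put numbers on those rates, here is a minimal Python sketch of the arithmetic. It is an illustration, not how the control plane implements rate limiting: the function name seconds_to_evict is hypothetical, and the flag names in the comments are the kube-controller-manager options as we understand them, so check them against your version before tuning anything.

```python
import math

# Default rates quoted above, in nodes per second.
# Assumption: these map to the kube-controller-manager flags named below.
PRIMARY_RATE = 0.1     # --node-eviction-rate
SECONDARY_RATE = 0.01  # --secondary-node-eviction-rate (unhealthy zone)


def seconds_to_evict(failed_nodes: int, rate_per_second: float) -> float:
    """Rough time for the rate limiter to get through every failed node.

    Ignores grace periods and tolerations; it only models the
    one-node-per-interval pacing described in the text.
    """
    if failed_nodes < 0:
        raise ValueError("failed_nodes must be non-negative")
    if failed_nodes == 0:
        return 0.0
    if rate_per_second <= 0:
        # A rate of zero models the "evictions stop entirely" case.
        return math.inf
    return failed_nodes / rate_per_second


if __name__ == "__main__":
    for label, rate in (("default", PRIMARY_RATE), ("unhealthy zone", SECONDARY_RATE)):
        print(f"{label}: {seconds_to_evict(5, rate):.0f} seconds for 5 failed nodes")
```

With five failed nodes, the default rate works out to roughly 50 seconds of pacing, while the reduced rate stretches that to roughly 500 seconds, which is the "slower recovery" trade-off in concrete terms.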
 
 
🤖 Once, SenseiOne Said
 
 
"Cloud makes failure cheap, so teams buy more of it and call it resilience. SRE is the discipline of paying that bill on purpose, in advance."

— SenseiOne
 

(*) SenseiOne is FAUN.dev’s work-in-progress AI agent

 
❤️ Thanks for reading
 
 
👋 Keep in touch and follow us on social media:
- 💼 LinkedIn
- 📝 Medium
- 🐦 Twitter
- 👥 Facebook
- 📰 Reddit
- 📸 Instagram

👌 Was this newsletter helpful?
We'd really appreciate it if you could forward it to your friends!

🙏 Never miss an issue!
To receive our future emails in your inbox, don't forget to add community@faun.dev to your contacts.

🤩 Want to sponsor our newsletter?
Reach out to us at sponsors@faun.dev and we'll get back to you as soon as possible.
 

DevOpsLinks #529: Terraform is Dead?
Legend: ✅ = Editor's Choice / ♻️ = Old but Gold / ⭐ = Promoted / 🔰 = Beginner Friendly

You received this email because you are subscribed to FAUN.dev.
We (🐾) help developers (👣) learn and grow by keeping them up with what matters.
