FAUN.dev's Kubernetes Weekly Newsletter
 
 
Kaptain
 
#Kubernetes #Docker #DistributedSystems
 
 
📝 A Few Words
 
 
A few years ago, when I started self-publishing, I had neither a brand nor a platform. I just had notes.

My first book was "SaltStack for DevOps". It was technical, practical, and written from the trenches. The second one was "Painless Docker".

Docker was everywhere: not on servers, but in conversations. It was the new hotness. Every week, I kept meeting engineers who were confused by the same things: "What is an image, really?", "Why does my container disappear?", "Why is this working on my machine but not in CI?", "What does this cgroup thing actually do?"

It wasn’t a lack of intelligence. It was a lack of online resources and documentation.

So I started helping people one by one. In person. On Slack. In workshops. Over coffee. On whiteboards.

And every single time someone had that moment - when the fog lifted - it was a pleasure.

That’s how Painless Docker was born. The goal was simple: remove the mystique around a technology that was totally new at the time.

Last week, I published the second edition.

Revised. Updated. Cleaner explanations. Better examples. Modernized sections covering the new features (BuildKit, AI model runner, security, OCI registries, etc.). Years of additional field experience distilled into something tighter.

If you’re learning Docker today or if you want to get back to basics, I hope you’ll check it out (and yes, there's a 20% discount code: SENSEIFEBRUARY).

Have a fantastic week!
Aymen
 
 
🔍 Inside this Issue
 
 
Kubernetes is quietly becoming the default runtime for everything from LLM gateways to Slurm-era HPC workloads, and the hidden gotcha is not YAML, it's the threat model and the defaults you forgot to change. If you are shipping AI or Java on k8s this year, these links will either save you pain or validate the pain you already have.

🎥 Building Prod ML API w FastAPI + Kubernetes + Grafana
🛡️ LLMs on Kubernetes: Same Cluster, Different Threat Model
🧬 Migrating from Slurm to Kubernetes
🏛️ Spotlight on SIG Architecture: API Governance
☕ The State of Java on Kubernetes 2026: Why Defaults are Killing Your Performance
🚦 Zero-Downtime Ingress Controller Migration in Kubernetes

Go fix one default you have been ignoring and ship faster tomorrow.

Take care!
FAUN.dev() Team
 
 
⭐ Patrons
 
docs.google.com
 
Call for Presenters: IaCConf 2026 | Real-World Infrastructure as Code & Platform Engineering Talks
 
 
If you’ve managed Infrastructure as Code in production, scaled platforms under pressure, or built guardrails that held up at speed, we want to hear from you. IaCConf 2026 is seeking practitioners to present 40-min sessions on May 14 (virtual). Submit your proposal by April 7.
 
 
👉 Spread the word and help developers find you by promoting your projects on FAUN. Get in touch for more information.
 
🔗 Stories, Tutorials & Articles
 
kubernetes.io
 
Spotlight on SIG Architecture: API Governance
 
 
Kubernetes SIG Architecture’s API Governance crew is tightening the screws on stability, consistency, and cross-cutting sanity across the whole API surface. Not just REST. They’re eyeing the overlooked stuff too - CLI flags, config formats, anything that shapes how users and tools touch the system.

The big win: GA-level schema validation for Custom Resource Definitions (CRDs). That’s a major step toward bringing discipline to parts of the API ecosystem that used to be “just don’t break it, probably.”
 
 
georg-schwarz.com
 
Zero-Downtime Ingress Controller Migration in Kubernetes
 
 
Ingress-nginx is heading for the exits - end-of-life drops March 2026. That puts Kubernetes operators on the hook to swap in a new ingress controller.

The migration path? Run both old and new in parallel. Use DNS cutover. Point explicitly with Ingress classes. Done right, the switchover hits zero downtime.
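A minimal sketch of the parallel-run idea, assuming two controllers are installed side by side and each watches its own Ingress class. The class names (`nginx`, `new-controller`), host, and Service name are hypothetical placeholders, not taken from the article. The only Kubernetes-specific detail relied on here is the `spec.ingressClassName` field of `networking.k8s.io/v1` Ingress objects.

```python
import json


def ingress_for_class(name, host, service, port, ingress_class):
    """Build an Ingress manifest pinned to one controller via ingressClassName."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        # Suffix the name so the old and new objects can coexist in one namespace.
        "metadata": {"name": f"{name}-{ingress_class}"},
        "spec": {
            "ingressClassName": ingress_class,
            "rules": [
                {
                    "host": host,
                    "http": {
                        "paths": [
                            {
                                "path": "/",
                                "pathType": "Prefix",
                                "backend": {
                                    "service": {
                                        "name": service,
                                        "port": {"number": port},
                                    }
                                },
                            }
                        ]
                    },
                }
            ],
        },
    }


# Hypothetical values: one manifest per controller, same host and backend.
old = ingress_for_class("shop", "shop.example.com", "shop-svc", 80, "nginx")
new = ingress_for_class("shop", "shop.example.com", "shop-svc", 80, "new-controller")

if __name__ == "__main__":
    # JSON manifests can be applied with kubectl just like YAML ones.
    print(json.dumps([old, new], indent=2))
```

Both objects route the same host to the same Service, so DNS can be moved from the old controller's address to the new one (and back, if needed) without touching the application.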
 
 
metalbear.com
 
LLMs on Kubernetes: Same Cluster, Different Threat Model
 
 
Running LLMs on Kubernetes opens up a new can of worms - stuff infra hardening won’t catch. You need a policy-smart gateway to vet inputs, lock down tool use, and whitelist models. No shortcuts.

This post drops a reference gateway build using mirrord (for fast, in-cluster tinkering) and Cloudsmith (to track and secure every last artifact).
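To make the "policy-smart gateway" idea concrete, here is a rough sketch of the three checks mentioned above: vetting inputs, restricting tool use, and allowing only approved models. It is not the reference build from the post; every name in it (`GatewayPolicy`, `check_request`, the model and tool names, the size limit) is a hypothetical illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GatewayPolicy:
    """Hypothetical policy: which models and tools a caller may use."""
    allowed_models: frozenset
    allowed_tools: frozenset
    max_prompt_chars: int = 8000  # assumed limit, tune per workload


def check_request(policy, model, prompt, tools):
    """Return a list of policy violations; an empty list means the request may pass."""
    violations = []
    if model not in policy.allowed_models:
        violations.append(f"model not allowed: {model}")
    if len(prompt) > policy.max_prompt_chars:
        violations.append("prompt exceeds size limit")
    for tool in tools:
        if tool not in policy.allowed_tools:
            violations.append(f"tool not allowed: {tool}")
    return violations


policy = GatewayPolicy(
    allowed_models=frozenset({"model-a"}),
    allowed_tools=frozenset({"search"}),
)

if __name__ == "__main__":
    print(check_request(policy, "model-a", "hello", ["search"]))
    print(check_request(policy, "model-b", "hello", ["search", "shell"]))
```

A real gateway would sit in front of the model endpoint and reject any request with a non-empty violation list. Returning all violations rather than the first one makes audit logs more useful.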
 
 
akamas.io
 
The State of Java on Kubernetes 2026: Why Defaults are Killing Your Performance
 
 
Akamas just dropped fresh numbers: over 60% of Java apps running on Kubernetes stick with default JVM settings. That means sluggish memory use, GC thrash, and CPUs getting choked out.

Even with "container-friendly" Java builds out there, most teams still skip setting GC types or heap sizes. Kubernetes doesn’t play nice with those gaps - performance tanks fast.
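A back-of-the-envelope sketch of why defaults hurt, under stated assumptions rather than figures from the Akamas report. It assumes recent HotSpot behavior: the default maximum heap is 25% of the container memory limit (`MaxRAMPercentage=25`), and the JVM falls back to the Serial collector when it sees fewer than 2 CPUs or less than roughly 1792 MiB of memory. The function names are hypothetical.

```python
MIB = 1024 * 1024


def default_max_heap_bytes(container_limit_bytes, max_ram_percentage=25.0):
    """Heap ceiling the JVM derives from the container memory limit."""
    return int(container_limit_bytes * max_ram_percentage / 100)


def likely_default_gc(cpus, container_limit_bytes):
    """Approximate HotSpot ergonomics: small containers get the Serial collector."""
    server_class = cpus >= 2 and container_limit_bytes >= 1792 * MIB
    return "G1" if server_class else "Serial"


if __name__ == "__main__":
    limit = 2048 * MIB  # hypothetical pod memory limit of 2 GiB
    print("default heap (MiB):", default_max_heap_bytes(limit) // MIB)
    print("heap at 75% (MiB):", default_max_heap_bytes(limit, 75.0) // MIB)
    print("GC with a 1-CPU limit:", likely_default_gc(1, limit))
    print("GC with a 2-CPU limit:", likely_default_gc(2, limit))
```

Under these assumptions, a pod with a 2 GiB limit and untouched settings uses only a quarter of its memory for heap, and a 1-CPU limit silently selects a single-threaded collector. Setting `-XX:MaxRAMPercentage` and an explicit GC flag removes both surprises.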
 
 
blog.skypilot.co
 
Migrating from Slurm to Kubernetes
 
 
SkyPilot drops a clean interface that blends Slurm with Kubernetes. AI/ML teams get to keep their Slurm-style comforts - job scripts, gang scheduling, GPU guarantees, interactive workflows - but pick up Kubernetes perks like container isolation and rich ecosystem hooks.

It handles the messy bits: pods, containers, networking. Distributed training? Covered. Supports volumes and multi-node storage layouts.
 
 

👉 Got something to share? Create your FAUN Page and start publishing your blog posts, tools, and updates. Grow your audience, and get discovered by the developer community.

 
🎦 Videos, Talks & Presentations
 
youtube.com
 
Building Prod ML API w FastAPI + Kubernetes + Grafana
 
 
Ever built an awesome ML model locally, only to hit a wall when deploying it to handle millions of users?

If you're just starting out, inference PaaS options like Bedrock, SageMaker, Azure AI, Vertex, or elastic solutions (ECS/Fargate) can totally get the job done.

But as these options get operationally more expensive, start creating deployment bottlenecks, and miss the features and configurations that are really important for your use case, Kubernetes becomes a great solution to consider.

In this video, you'll learn how to deploy models with FastAPI on Kubernetes (complete with Model Registry, monitoring, and live-tested with 100,000 simultaneous requests).
 
 
 
⚙️ Tools, Apps & Software
 
github.com
 
serhanekicii/openclaw-helm
 
 
Helm chart for OpenClaw - personal AI assistant
 
 
github.com
 
agentkube/agentkube
 
 
Agentkube - Run Kubernetes Like Never Before
 
 
github.com
 
kubekattle/verifier
 
 
A comprehensive Kubernetes configuration verification tool
 
 
github.com
 
sandys/kappal
 
 
Docker Compose CLI for Kubernetes - Run your docker-compose.yaml on Kubernetes without learning Kubernetes.
 
 
github.com
 
skyhook-io/radar
 
 
Modern Kubernetes visibility. Topology, event timeline, and service traffic — plus resource browsing and Helm management.
 
 

👉 Spread the word and help developers find and follow your Open Source project by promoting it on FAUN. Get in touch for more information.

 
🤔 Did you know?
 
 
Did you know that large Kubernetes clusters often hit scaling limits because of etcd watch traffic, not just API writes? Every new Pod or Node increases the number of active watches, which can amplify read load on the control plane. That is why the kube-apiserver uses a watch cache to reduce direct pressure on etcd, as described in the Kubernetes watch cache design and the etcd performance tuning guide. High-churn resources combined with broad list-and-watch patterns can create hot spots and trigger frequent compaction and defragmentation, leading to tail latency spikes well before CPU looks saturated. In practice, reducing object churn and watch cardinality is often more effective than simply scaling up control plane nodes.
 
 
🤖 Once, SenseiOne Said
 
 
"Kubernetes promises you won't manage machines; it just moves the work into managing failure modes you used to ignore. Containers make deployments reproducible by making the system state easier to forget. Distributed systems punish that forgetfulness on a schedule you don't control."
— SenseiOne
 

(*) SenseiOne is FAUN.dev’s work-in-progress AI agent

 
⚡ Growth Notes
 
 
Write down and commit the invariants for each workload as SLOs (latency, error rate, and budgeted cost per request), then wire HPA/VPA, cluster autoscaler limits, and alert routing to those numbers rather than to CPU percent or generic thresholds. Six months later, when node pressure, quotas, and retries collide, you will still have a single source of truth for which pods are allowed to win and what gets throttled first.
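One way to picture "commit the invariants as SLOs" is a small, version-controlled record per workload plus the arithmetic that alerting and autoscaling would key off. This is a sketch under assumptions: the field names, the `priority` convention, and the example numbers are all hypothetical, and the burn-rate formula is the usual observed error rate divided by the error budget.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkloadSLO:
    """Hypothetical per-workload invariants, meant to live in version control."""
    name: str
    availability_target: float       # e.g. 0.999 means 99.9% of requests succeed
    p99_latency_ms: float
    max_cost_per_request_usd: float
    priority: int                    # higher number wins under contention


def error_budget(slo):
    """Fraction of requests that may fail without breaching the SLO."""
    return 1.0 - slo.availability_target


def burn_rate(slo, errors, requests):
    """Observed error rate relative to the budget; above 1.0 means burning too fast."""
    if requests == 0:
        return 0.0
    return (errors / requests) / error_budget(slo)


def throttle_order(slos):
    """Names of workloads in the order they should be throttled (lowest priority first)."""
    return [s.name for s in sorted(slos, key=lambda s: s.priority)]


checkout = WorkloadSLO("checkout", 0.999, 300.0, 0.002, priority=10)
reports = WorkloadSLO("reports", 0.99, 2000.0, 0.010, priority=1)

if __name__ == "__main__":
    print("checkout burn rate:", burn_rate(checkout, errors=20, requests=10_000))
    print("throttle first:", throttle_order([checkout, reports]))
```

Alert routing and autoscaler limits can then reference these records instead of CPU percentages, so the answer to "who gets throttled first" is written down before the incident.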
 
Each week, we share a practical move to grow faster and work smarter.
 
👤 This Week's Human
 
 
This Week’s Human is Shashank B R, a senior full stack engineer in Cambridge building tools that help bioinformaticians and researchers handle complex biological data. Across Java Spring Boot, Python FastAPI, React, and data platforms like Kafka, Cassandra, and Netflix Hollow, he ships fast, observable systems that make messy datasets queryable.
 
💡 Engage with FAUN.dev on LinkedIn — like, comment on, or share any of our posts on LinkedIn — you might be our next “This Week’s Human”!
 
 
❤️ Thanks for reading
 
 
👋 Keep in touch and follow us on social media:
- 💼LinkedIn
- 📝Medium
- 🐦Twitter
- 👥Facebook
- 📰Reddit
- 📸Instagram

👌 Was this newsletter helpful?
We'd really appreciate it if you could forward it to your friends!

🙏 Never miss an issue!
To receive our future emails in your inbox, don't forget to add community@faun.dev to your contacts.

🤩 Want to sponsor our newsletter?
Reach out to us at sponsors@faun.dev and we'll get back to you as soon as possible.
 

Kaptain #516: Zero-Downtime Ingress Controller Migration in Kubernetes
Legend: ✅ = Editor's Choice / ♻️ = Old but Gold / ⭐ = Promoted / 🔰 = Beginner Friendly

You received this email because you are subscribed to FAUN.dev.
We (🐾) help developers (👣) learn and grow by keeping them up with what matters.
