FAUN.dev's Kubernetes Weekly Newsletter
 
🔗 View in your browser.   |  ✍️ Publish on FAUN.dev   |  🦄 Become a sponsor
 
Kaptain
 
#Kubernetes #Docker #DistributedSystems
 
 
🔍 Inside this Issue
 
 

Scale is exploding (EKS at 100K nodes, DRA for GPUs) while assumptions crack (DNS‑based GitOps, VPA’s limits). If you want right‑sizing that sticks, observability that holds at 80M series, and a saner container stack, the sharp details are inside.


🚀 Amazon EKS Enables Ultra-Scale AI/ML Workloads with Support for 100K Nodes per Cluster

🧮 Dynamic Kubernetes request right sizing with Kubecost

🧨 Kubernetes DNS Exploit Enables Git Credential Theft from ArgoCD

🎛️ Kubernetes Primer: Dynamic Resource Allocation (DRA) for GPU Workloads

🧩 Kubernetes right-sizing with metrics-driven GitOps automation

⚖️ Kubernetes VPA: Limitations, Best Practices, and the Future of Pod Rightsizing

🧠 Rethinking Efficiency for Cloud-Native AI Workloads

📈 Scaling Prometheus: Managing 80M Metrics Smoothly

🛡️ The Quiet Revolution in Kubernetes Security

🐧 Why I Ditched Docker for Podman (And You Should Too)


You’ve got sharper levers now—pull them and ship.


Have a great week!
FAUN.dev Team
 
 
⭐ Patrons
 
bytevibe.co
 
Binary Matrix Mouse Pad – Built for Devs 🚀
 
 
Code. Game. Flow. This 9"×8" (22.86 x 20.32 cm) Binary Matrix mouse pad gives you smooth precision, durable build, and a design every developer will vibe with. Perfect for work or play.

👉 Get yours today for €12,95 – ships in 2-9 days.
 
 

👉 Spread the word and help developers find you by promoting your projects on FAUN. Get in touch for more information.

 
ℹ️ News, Updates & Announcements
 
cyberpress.org
 
Kubernetes DNS Exploit Enables Git Credential Theft from ArgoCD
 
 
A new attack chain messes with Kubernetes DNS resolution and ArgoCD’s certificate injection to swipe GitHub credentials. With the right permissions, a user inside the cluster can reroute GitOps traffic to a fake internal service, sniff auth headers, and quietly walk off with tokens.

What’s broken: GitOps pipelines are trusting internal DNS and certs way too much. That blind trust? It’s leaving CI/CD creds wide open.
 
 
blocksandfiles.com
 
Lucidity turns spotlight onto Kubernetes storage costs
 
 
Lucidity has upgraded its AutoScaler. It now handles persistent volumes on AWS-hosted Kubernetes, automatically scaling storage and reducing waste.

The upgrade brings pod-level isolation, fault tolerance, and bulk Linux onboarding. Azure and GCP are next on the list.
 
 
thenewstack.io
 
Kubernetes Primer: Dynamic Resource Allocation (DRA) for GPU Workloads
 
 
Kubernetes 1.34 brings serious heat for anyone juggling GPUs or accelerators. Meet Dynamic Resource Allocation (DRA)—a new way to schedule hardware like you mean it.

DRA adds ResourceClaims, DeviceClasses, and ResourceSlices, slicing device management away from pod specs. It replaces the old device-plugin clunk with a cleaner, CEL-powered model that actually scales.

Big picture: Kubernetes gets smarter about where stuff runs. DRA pushes it toward topology-aware, parameter-tuned provisioning. Think tighter resource sharing, leaner clusters, and hardware that works for you—not the other way around.
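
To make the new objects concrete, here's a minimal sketch of a ResourceClaim plus a Pod that consumes it. The gpu.example.com DeviceClass, image, and names are placeholder assumptions (a vendor driver would publish the real DeviceClass); field names follow the resource.k8s.io/v1 API that went GA in 1.34:

```yaml
# ResourceClaim requesting exactly one device from a hypothetical DeviceClass
apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: single-gpu
spec:
  devices:
    requests:
    - name: gpu
      exactly:
        deviceClassName: gpu.example.com  # hypothetical vendor DeviceClass
---
# Pod consuming the claim; the scheduler picks a node with a matching device
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  resourceClaims:
  - name: gpu
    resourceClaimName: single-gpu
  containers:
  - name: trainer
    image: registry.example.com/trainer:latest  # hypothetical image
    resources:
      claims:
      - name: gpu  # binds the container to the claim above
```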
 
 
aws.amazon.com
 
Dynamic Kubernetes request right sizing with Kubecost
 
 
Kubecost’s Amazon EKS add-on now handles automated container request right-sizing. That means teams can tweak CPU and memory requests based on actual usage—once or on a recurring schedule.

Optimization profiles are customizable, and resizing can be baked into cluster setup using Helm. Yes, that means resource efficiency meets infrastructure-as-code.

Why it matters: Built-in automation for cost-tuned requests points to a bigger trend—Kubernetes resource management that’s proactive, policy-driven, and less of a guessing game.
 
 
aws.amazon.com
 
Kubernetes right-sizing with metrics-driven GitOps automation
 
 
AWS just dropped a GitOps-native pattern for tuning EKS resources—built to run outside the cluster. It’s wired up with Amazon Managed Service for Prometheus, Argo CD, and Bedrock to automate resource recommendations straight into Git.

Here’s the play: it maps usage metrics to templated manifests, then spits out pull requests suggesting better CPU and memory configs. Git stays the source of truth. No manual tweaking. No guessing games.
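
To make that concrete, here's a sketch of the kind of change such a PR would carry (container name and numbers are hypothetical, derived from observed usage):

```yaml
# Hypothetical Deployment fragment the automation updates via pull request
spec:
  template:
    spec:
      containers:
      - name: api             # hypothetical container
        resources:
          requests:
            cpu: 250m         # was 1000m; sized from observed peak usage
            memory: 512Mi     # was 2Gi
```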
 
 
infoq.com
 
Amazon EKS Enables Ultra-Scale AI/ML Workloads with Support for 100K Nodes per Cluster   ✅
 
 
Amazon EKS just cranked its Kubernetes cluster limit to 100,000 nodes—a 10x jump. The secret sauce? A reworked etcd with an internal journal system and in-memory storage. Toss in tight API server tuning and network tweaks, and the result is wild: 500 pods per second, 900K pods, 10M+ objects, no sweat—even under real AI/ML load.

What changed: This blows up the old playbook. Instead of juggling multiple clusters for scale, teams can now run massive ML workloads on a single, packed control plane. Fewer moving parts. Fewer headaches. More actual work done.
 
 

👉 Got something to share? Create your FAUN Page and start publishing your blog posts, tools, and updates. Grow your audience, and get discovered by the developer community.

 
🔗 Stories, Tutorials & Articles
 
medium.com
 
Rethinking Efficiency for Cloud-Native AI Workloads
 
 
AI isn’t just burning compute—it's torching old-school FinOps. Reserved Instances? Idle detection? Cute, but not built for GPU bottlenecks and model-heavy pipelines.

What’s actually happening: Infra teams are ditching cost-first playbooks for something smarter—business-aligned orchestration that chases performance, not just savings. It's less “trim the fat,” more “feed the model.”
 
 
cloudpilot.ai
 
Kubernetes VPA: Limitations, Best Practices, and the Future of Pod Rightsizing
 
 
Kubernetes' Vertical Pod Autoscaler (VPA) tries to be helpful by tweaking CPU and memory requests on the fly. Problem is, it needs to bounce your pods to do it. And if you're also running Horizontal Pod Autoscaler (HPA) on the same metrics? Now they're fighting over control.

VPA sees a narrow slice of the world. No awareness of cluster topology. No big-picture view of workload patterns. It leans on short-term signals and brute-force restarts—not great if you're running critical, multi-region systems or trying to scale cleanly.
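
One common escape hatch: run VPA in recommendation-only mode, keeping its data without the restarts or the HPA tug-of-war. A minimal sketch, assuming a Deployment named web:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web            # hypothetical workload
  updatePolicy:
    updateMode: "Off"    # recommend only: no pod evictions, no fights with HPA
  resourcePolicy:
    containerPolicies:
    - containerName: "*"
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: "2"
        memory: 2Gi
```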
 
 
codesmash.dev
 
Why I Ditched Docker for Podman (And You Should Too)
 
 
Transitioning from Docker to Podman is seamless thanks to Podman's alignment with the OCI container format. Key advantages: rootless containers for tighter security, a daemonless design with no single point of failure, and a lighter resource footprint. Podman also brings systemd integration and Kubernetes alignment. For many teams, it's a straightforward choice for container management.
 
 
kapillamba4.medium.com
 
Scaling Prometheus: Managing 80M Metrics Smoothly   ✅
 
 
Flipkart ditched its creaky StatsD + InfluxDB stack for a federated Prometheus setup—built to handle 80M+ time-series metrics without choking. The move leaned into pull-based collection, PromQL's firepower, and hierarchical federation for smarter aggregation and long-haul queries.

Why it matters: Prometheus federation makes multi-tenant observability not just possible, but scalable—even with high cardinality and messy sprawl.
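
For reference, hierarchical federation boils down to a global Prometheus scraping each shard's /federate endpoint. A minimal sketch, with shard addresses and match[] selectors as placeholder assumptions:

```yaml
scrape_configs:
- job_name: federate
  scrape_interval: 30s
  honor_labels: true            # keep the original labels from each shard
  metrics_path: /federate
  params:
    'match[]':
    - '{__name__=~"job:.*"}'    # pull pre-aggregated recording rules only
  static_configs:
  - targets:
    - prometheus-shard-a:9090   # hypothetical shard addresses
    - prometheus-shard-b:9090
```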
 
 
darkreading.com
 
The Quiet Revolution in Kubernetes Security
 
 
Nigel Douglas discusses the challenges of security in Kubernetes, particularly with traditional base operating systems. Talos Linux offers a different approach with a secure-by-default, API-driven model specifically for Kubernetes. CISOs play a critical role in guiding organizations through the shift to modern, cloud-native architectures like Talos Linux.
 
 

👉 Got something to share? Create your FAUN Page and start publishing your blog posts, tools, and updates. Grow your audience, and get discovered by the developer community.

 
⚙️ Tools, Apps & Software
 
github.com
 
spiceratops/k8s-gitops
 
 

A home lab cluster running like a mini prod. Declarative from top to bottom: Talos Linux handles the OS, Flux wires up GitOps, Terraform provisions the ground, and GitHub Actions runs the plumbing. Renovate keeps dependencies fresh—no ClickOps needed.

 
 
github.com
 
mykso/myks
 
 

Myks is a tool and a framework for managing application configuration across multiple Kubernetes clusters. It helps you reuse, mutate, and share configuration between applications and clusters.

 
 
github.com
 
kubewall/kubewall
 
 

kubewall - Single-Binary Kubernetes Dashboard with Multi-Cluster Management & AI Integration. (OpenAI / Claude 4 / Gemini / DeepSeek / OpenRouter / Ollama / Qwen / LMStudio)

 
 
github.com
 
cerbos/cerbos
 
 

Cerbos is an open-core, language-agnostic, scalable authorization solution that makes user permissions and authorization simple to implement and manage: you write context-aware access control policies for your application resources.

 
 

👉 Spread the word and help developers find and follow your Open Source project by promoting it on FAUN. Get in touch for more information.

 
🤔 Did you know?
 
 
Did you know that PodDisruptionBudgets (PDBs) in Kubernetes only block the Eviction API (policy/v1), and do not prevent Pods from being deleted via a direct DELETE call? That means if a controller or operator issues kubectl delete pod … (or deletes via its REST API), the PDB is bypassed and the Pod simply terminates (respecting its terminationGracePeriodSeconds) even if the PDB would have prevented an eviction. To ensure PDBs are honored during rollouts, drains, or other disruptions, tools must use the Eviction API, and clusters should audit for direct deletions in namespaces that use PDBs.
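
A minimal sketch to make the distinction concrete, assuming pods labeled app=web:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 2      # never evict below two ready pods
  selector:
    matchLabels:
      app: web         # hypothetical pod label
# Honored:  kubectl drain <node>     -- goes through the Eviction API
# Bypassed: kubectl delete pod <pod> -- direct DELETE; the PDB is never consulted
```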
 
 
😂 Meme of the week
 
 
 
 
🤖 Once, SenseiOne Said
 
 

"Kubernetes promises portability; CRDs, storage classes, and cloud IAM quietly take it back. We abstracted servers, then re-coupled to controllers, backoffs, and tail latency."
— SenseiOne

 

(*) SenseiOne is FAUN.dev’s work-in-progress AI agent

 
👤 This Week's Human
 
 

This Week’s Human is Gursimar Singh, a Google Developers Educator, Author @ freeCodeCamp, and DevOps & Cloud consultant who makes complex systems teachable. They’ve spoken at HAProxyConf 2022, multiple KCDs, and DevOpsDays Warsaw; reviewed programs for OpenTofu Day and PyCon India; and mentored at IIT Madras while volunteering with EuroPython. Offstage, they’ve published 70+ articles reaching 100k+ readers and contributed 5 project write-ups to the Google Dev Library, covering tools from Kubernetes to Terraform.

 

💡 Engage with FAUN.dev on LinkedIn — like, comment on, or share any of our posts on LinkedIn — you might be our next “This Week’s Human”!

 
❤️ Thanks for reading
 
 
👋 Keep in touch and follow us on social media:
- 💼LinkedIn
- 📝Medium
- 🐦Twitter
- 👥Facebook
- 📰Reddit
- 📸Instagram

👌 Was this newsletter helpful?
We'd really appreciate it if you could forward it to your friends!

🙏 Never miss an issue!
To receive our future emails in your inbox, don't forget to add community@faun.dev to your contacts.

🤩 Want to sponsor our newsletter?
Reach out to us at sponsors@faun.dev and we'll get back to you as soon as possible.
 

Kaptain #494: Scaling Prometheus: Managing 80M Metrics Smoothly
Legend: ✅ = Editor's Choice / ♻️ = Old but Gold / ⭐ = Promoted / 🔰 = Beginner Friendly

You received this email because you are subscribed to FAUN.dev.
We (🐾) help developers (👣) learn and grow by keeping them up with what matters.

You can manage your subscription options here (recommended) or use the old way here (legacy). If you have any problem, read this or reply to this email.