🔍 Inside this Issue
Speed is up, patience is down: ARM keeps winning, Graviton5 flexes, and one team drops serverless to cut latency by 6x—while Slack wraps its AI in a paranoid perimeter and Datadog turns failure into design. From declarative tests to credential heists and the awkward role of attrition in RCAs, follow the links for the how, the why, and the trade-offs.
🛡️ Advancing Our Chef Infrastructure: Safety Without Disruption
🚀 AWS Unveils Graviton5: A 192-Core Leap in Cloud Performance and Efficiency
🆚 Comparing AWS Lambda Arm64 vs x86_64 Performance Across Multiple Runtimes in Late 2025
🧩 Declarative Action Architecture
🐳 Docker Desktop 4.50 Supercharges Daily Development With AI, Security, and Faster Workflows
🛟 Failure is inevitable: Learning from a large outage, and building for reliability in depth at Datadog
🔑 NordPass: Worst Passwords of 2025 and How Each Generation Compares
🎣 The story of how we almost got hacked
🚪 Why we're leaving serverless
🧠 You’ll never see attrition referenced in an RCA
You’ve got fresh leverage. Go apply it.
Until next time!
FAUN.dev() Team
ℹ️ News, Updates & Announcements

faun.dev
Docker Desktop 4.50 just dropped. It's packing a free Docker Debug, deeper VS Code hooks, and new Model Context Protocol (MCP) support to help AI tools find their context without losing their minds.
There’s beefed-up WSL2 support, stricter local port rules, and Compose-to-Kubernetes conversion to ease that leap from dev to prod.

faun.dev
NordPass’s 2025 report confirms what we all feared: people still rely on the digital equivalent of leaving the front door wide open. Think "123456", "password", even "admin", all still topping the charts across every age group.
The data comes straight from breach dumps and dark web leaks. So yeah, it’s real. And it’s clear: stronger password habits just aren’t sticking, no matter how many alarms go off.
The big picture: This kind of password reuse isn’t just lazy. It’s a ceiling. And it’s cracking. The future’s screaming for passwordless auth, because users aren’t built to remember 30 unique 16-character secrets.

faun.dev
AWS just dropped the Graviton5-powered EC2 M9g instances, and they’re stacked. Expect 192 cores, 25% higher performance, and a cache that’s 5x the size of last gen. Built on a sleek 3nm process, they shrink inter-core latency by 33% and boost bandwidth across the board.
Under the hood, the Nitro System handles virtualization with tight security and low overhead.
🔗 Stories, Tutorials & Articles

datadoghq.com
Datadog ditched its “never fail” mindset after a March 2023 meltdown knocked out half its Kubernetes nodes and took major user features down with them. The fix? A full-stack rethink built around graceful degradation.
The team added disk-based persistence at intake, live-data prioritization, QoS-aware retry logic, and localized failover for control plane calls. In other words: no more all-or-nothing. If it breaks, it bends instead.
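The post itself is prose, not code, but here’s a rough Go sketch of the "bend, don’t break" idea, with hypothetical names: an intake handler that gives live data one quick retry and spills everything else to local disk for later replay instead of dropping it.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// Priority models the "live data first" QoS split described in the post.
type Priority int

const (
	Live     Priority = iota // fresh telemetry, retried aggressively
	Backfill                 // older data, allowed to wait
)

type Event struct {
	Priority Priority
	Payload  []byte
}

// store is a stand-in for the downstream datastore call.
var store = func(e Event) error { return errors.New("store unavailable") }

// ingest tries the store, and on failure degrades gracefully:
// live events get one quick retry, everything else spills to disk
// so intake keeps accepting data instead of failing outright.
func ingest(e Event, spoolDir string) error {
	if err := store(e); err == nil {
		return nil
	}
	if e.Priority == Live {
		time.Sleep(100 * time.Millisecond) // hypothetical short retry budget
		if err := store(e); err == nil {
			return nil
		}
	}
	// Disk-based persistence at intake: park the payload for later replay.
	name := filepath.Join(spoolDir, fmt.Sprintf("evt-%d.spool", time.Now().UnixNano()))
	return os.WriteFile(name, e.Payload, 0o600)
}

func main() {
	dir, _ := os.MkdirTemp("", "intake-spool")
	err := ingest(Event{Priority: Live, Payload: []byte(`{"metric":"cpu","v":0.42}`)}, dir)
	fmt.Println("ingest result:", err, "spool dir:", dir)
}
```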

invictus-ir.com
Team Invictus caught a BEC attempt using WeTransfer to slip in a fake Microsoft 365 login page powered by EvilProxy. Classic Adversary-in-the-Middle move, but dressed up with a slick delivery package.
Digging deeper, the team mapped the attacker’s setup and found something bigger: a credential grab campaign they’re calling VendorVandals. Think phishing lures disguised as procurement emails, blasted out from hijacked inboxes. Fully scripted and built to scale.

surfingcomplexity.blog
Lorin Hochstein argues that while high-profile engineer attrition is often speculated to contribute to major outages, it is universally absent from public Root Cause Analyses (RCAs). This exclusion occurs because public RCAs aim to reassure customers by focusing on technical fixes, whereas attrition is a complex, business-related organizational issue.
Internally, attrition may be discussed as a risk factor, but it is rarely documented as a direct cause, as traditional RCA methods fail to account for systemic, risk-increasing contributors. Ultimately, organizational factors like attrition play a role in every major incident, but remain unstated due to the narrow focus of formal incident reviews.

chrisebert.net
A new open-source benchmark looked at 183,000 AWS Lambda invocations and found arm64 beating x86_64 across the board on both cost and speed.
Rust on arm64 with SHA-256 tuned in assembly? It clocks in 4–5× faster than x86 in CPU-heavy tasks. Cold starts are snappy too—5–8× quicker than Node.js and Python.
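The benchmark’s real harness lives in the linked write-up; for flavor, here’s a minimal, hypothetical Go handler for the same class of CPU-bound SHA-256 workload, using the aws-lambda-go runtime. The arm64-vs-x86_64 choice is a deploy-time setting on the function, not a code change.

```go
// Hypothetical CPU-bound Lambda handler: hashes a payload repeatedly with
// SHA-256. The same binary runs on arm64 or x86_64; architecture is chosen
// when the function is created or updated, not here.
package main

import (
	"context"
	"crypto/sha256"
	"encoding/hex"

	"github.com/aws/aws-lambda-go/lambda"
)

type request struct {
	Data   string `json:"data"`
	Rounds int    `json:"rounds"`
}

type response struct {
	Digest string `json:"digest"`
}

func handler(ctx context.Context, req request) (response, error) {
	sum := []byte(req.Data)
	for i := 0; i < req.Rounds; i++ { // repeated hashing keeps the CPU busy
		h := sha256.Sum256(sum)
		sum = h[:]
	}
	return response{Digest: hex.EncodeToString(sum)}, nil
}

func main() {
	lambda.Start(handler)
}
```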

medium.com
The Declarative Action Architecture (DAA) is a scalable E2E testing pattern that separates concerns across three distinct layers.
The Test Layer is 100% declarative: it states what is being tested without any procedural logic, so tests read like documentation. The core Action Layer implements the execution logic by translating the declarative steps, with a mandatory rule of self-verification (an assertion is built into every action) and by composing smaller, reusable actions.
Finally, the Physical Layer acts as a "dumb" driver, handling pure execution and system interaction (like API calls or WebDriver commands) without any business logic or assertions.
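A minimal Go sketch of the three layers, with invented names (the article’s own examples may differ):

```go
package daa

import "testing"

// --- Physical layer: a "dumb" driver, pure execution, no business logic or assertions. ---
type driver struct{ cart []string }

func (d *driver) post(item string) error { d.cart = append(d.cart, item); return nil }
func (d *driver) items() []string        { return d.cart }

// --- Action layer: turns declarative steps into execution, and self-verifies. ---
type actions struct {
	t *testing.T
	d *driver
}

// AddToCart performs the step AND asserts it worked: the mandatory self-verification rule.
func (a *actions) AddToCart(item string) {
	a.t.Helper()
	if err := a.d.post(item); err != nil {
		a.t.Fatalf("add %q failed: %v", item, err)
	}
	for _, got := range a.d.items() {
		if got == item {
			return
		}
	}
	a.t.Fatalf("%q missing from cart after add", item)
}

// --- Test layer: fully declarative, reads like documentation. ---
func TestItemCanBeAddedToCart(t *testing.T) {
	a := &actions{t: t, d: &driver{}}
	a.AddToCart("coffee-beans")
}
```

The point of the split: swap the driver (HTTP, WebDriver, gRPC) without touching a single test, and never write a test that has to know how verification happens.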

slack.engineering
Slack pulled back the curtain on Slack AI, its LLM-powered assistant built with a fortress mindset. Every customer gets their own isolated environment. Any data passed to vendor LLMs? It's ephemeral. Gone before it can stick.
No fine-tuning. No exporting data outside Slack. And there’s a whole middle-layer filter/audit setup watching every prompt like a hawk.
Why it matters: It’s a blueprint for threading LLMs into enterprise SaaS without handing the keys to your data.
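Slack hasn’t published this code, so the following is purely a sketch of what that middle layer could look like in Go: audit every prompt, redact the obvious secrets before anything leaves the boundary, keep nothing afterward. Names and the redaction rule are invented.

```go
package main

import (
	"fmt"
	"log"
	"regexp"
	"strings"
)

// invoke stands in for a call to an isolated, per-customer model endpoint.
type invoke func(prompt string) (string, error)

// guarded wraps a model call with a filter/audit layer: log metadata about
// every prompt, scrub obvious credentials, and hold nothing after the call.
func guarded(model invoke) invoke {
	secret := regexp.MustCompile(`(?i)(api[_-]?key|password)\s*[:=]\s*\S+`)
	return func(prompt string) (string, error) {
		redacted := secret.ReplaceAllString(prompt, "$1=[REDACTED]")
		log.Printf("audit: prompt_len=%d redacted=%v", len(redacted), redacted != prompt)
		return model(redacted) // used for this call only, never stored
	}
}

func main() {
	echo := func(p string) (string, error) { return "summary of: " + strings.ToUpper(p), nil }
	ask := guarded(echo)
	resp, _ := ask("summarize this channel, api_key=abc123")
	fmt.Println(resp)
}
```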

unkey.com
Unkey slashed its latency by 6x by moving from Cloudflare Workers to stateful Go servers. The switch simplified the architecture and opened the door to self-hosting and platform independence. Serverless limitations had forced elaborate caching workarounds and data-pipeline nightmares; owning long-lived servers made a simpler, faster design possible.
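Unkey’s post is about architecture, not snippets, but a quick hypothetical Go sketch shows what "stateful" buys: verification results cached in process memory on a long-lived server, no external cache hop, no serverless workaround.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"sync"
	"time"
)

// entry is a cached verification result; on a long-lived server it survives
// between requests, which is exactly what short-lived serverless isolates lose.
type entry struct {
	valid   bool
	expires time.Time
}

type keyCache struct {
	mu sync.RWMutex
	m  map[string]entry
}

func (c *keyCache) get(k string) (entry, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	e, ok := c.m[k]
	return e, ok && time.Now().Before(e.expires)
}

func (c *keyCache) set(k string, e entry) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.m[k] = e
}

func main() {
	cache := &keyCache{m: map[string]entry{}}
	lookup := func(k string) bool { return k == "key_demo" } // stand-in for the real datastore

	http.HandleFunc("/v1/keys/verify", func(w http.ResponseWriter, r *http.Request) {
		k := r.URL.Query().Get("key")
		if e, ok := cache.get(k); ok { // in-memory hit: no network round trip
			fmt.Fprintf(w, `{"valid":%t,"cache":"hit"}`, e.valid)
			return
		}
		valid := lookup(k)
		cache.set(k, entry{valid: valid, expires: time.Now().Add(time.Minute)})
		fmt.Fprintf(w, `{"valid":%t,"cache":"miss"}`, valid)
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```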
🤔 Did you know?
Did you know that Kubernetes admission webhooks run directly on the API server’s write path? If a mutating or validating webhook is slow or unreachable and uses failurePolicy=Fail, every create or update request can block until timeoutSeconds is hit (up to a hard cap of 30 seconds), stalling deploys, node joins, and CRD changes. That’s why operators harden webhooks by keeping timeouts low, declaring sideEffects=None, running backends as system-critical pods with PDBs, and using failurePolicy=Ignore for non-critical resources.
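For the hardening side, here’s a hedged Go sketch using the k8s.io/api admissionregistration/v1 types (ClientConfig and Rules omitted for brevity, names invented): short timeout, sideEffects declared as None, and failurePolicy Ignore for a non-critical resource.

```go
package main

import (
	"fmt"

	admissionregistrationv1 "k8s.io/api/admissionregistration/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	ignore := admissionregistrationv1.Ignore // don't block writes if the webhook is down
	sideEffects := admissionregistrationv1.SideEffectClassNone
	timeout := int32(2) // keep well under the 30s hard cap

	cfg := admissionregistrationv1.ValidatingWebhookConfiguration{
		ObjectMeta: metav1.ObjectMeta{Name: "example-policy"}, // hypothetical name
		Webhooks: []admissionregistrationv1.ValidatingWebhook{{
			Name:                    "pods.example.dev",
			FailurePolicy:           &ignore,
			SideEffects:             &sideEffects,
			TimeoutSeconds:          &timeout,
			AdmissionReviewVersions: []string{"v1"},
		}},
	}
	fmt.Printf("%s: failurePolicy=%s timeout=%ds\n",
		cfg.Name, *cfg.Webhooks[0].FailurePolicy, *cfg.Webhooks[0].TimeoutSeconds)
}
```

For anything that truly must gate writes, keep failurePolicy=Fail but make the backend boringly reliable: replicated, system-critical, and protected by a PodDisruptionBudget.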
⚡Growth Notes
Build a personal observability stack for your work just like you would for a critical service: for every incident, design decision, or deployment you touch, log a 3-line postmortem in a persistent doc with context, action taken, and lesson learned.
Over weeks this becomes a high-signal timeline of your impact, bottlenecks, and repeated failure modes, which you can then refactor just like noisy alerts or flaky runbooks. Once a month, do a production review of yourself: spot patterns (e.g. always blocked on one domain, always firefighting the same class of issues) and define a single experiment to reduce that toil. Pair this with one concrete habit: for every painful incident you work, contribute one small but real reliability improvement (better alert, dashboard, playbook, or config guardrail) and log it back into your doc.
Over time you get a compounding loop: incidents fuel learning, learning drives tiny reliability upgrades, and the history proves your trajectory far better than any resume bullet.