FAUN.dev's DevOps Weekly Newsletter
 
🔗 View in your browser.   |  ✍️ Publish on FAUN.dev   |  🦄 Become a sponsor
 
DevOpsLinks
 
#DevOps #SRE #PlatformEngineering
 
 
🔍 Inside this Issue
 
 
Speed is up, patience is down: ARM keeps winning, Graviton5 flexes, and one team drops serverless to cut latency by 6x—while Slack wraps its AI in a paranoid perimeter and Datadog turns failure into design. From declarative tests to credential heists and the awkward role of attrition in RCAs, follow the links for the how, the why, and the trade-offs.

🛡️ Advancing Our Chef Infrastructure: Safety Without Disruption
🚀 AWS Unveils Graviton5: A 192-Core Leap in Cloud Performance and Efficiency
🆚 Comparing AWS Lambda Arm64 vs x86_64 Performance Across Multiple Runtimes in Late 2025
🧩 Declarative Action Architecture
🐳 Docker Desktop 4.50 Supercharges Daily Development With AI, Security, and Faster Workflows
🛟 Failure is inevitable: Learning from a large outage, and building for reliability in depth at
🔑 NordPass: Worst Passwords of 2025 and How Each Generation Compares
🎣 The story of how we almost got hacked
🚪 Why we're leaving serverless
🧠 You’ll never see attrition referenced in an RCA

You’ve got fresh leverage, go apply it.

Until next time!
FAUN.dev() Team
 
 
⭐ Patrons
 
faun.dev
 
End-to-End Kubernetes with Rancher, RKE2, K3s, Fleet, Longhorn, and NeuVector
 
 
Hey there,

We’re extending our 25% discount for End-to-End Kubernetes with Rancher, RKE2, K3s, Fleet, Longhorn, and NeuVector and all our courses on FAUN.Sensei(). The previous activation window was short, so the offer has been reactivated to give everyone more time.

End-to-End Kubernetes with Rancher, RKE2, K3s, Fleet, Longhorn, and NeuVector is a practical guide to building and operating real Kubernetes environments with the Rancher and SUSE ecosystem. It covers deployments, scaling, GitOps, storage, security, and disaster recovery using production-ready patterns.

Learn how to deploy, manage, secure, and scale real-world clusters:
  • RKE2 and K3s from edge to enterprise
  • Rancher architectures that actually scale
  • GitOps with Fleet
  • Storage and disaster recovery with Longhorn
  • Runtime security and compliance with NeuVector
  • And more!
Built by FAUN.dev() for engineers who operate Kubernetes in the real world.

⏳ Use the coupon SENSEI2525 before it expires on December 31. After that, it's full price!
 
 
👉 Spread the word and help developers find you by promoting your projects on FAUN. Get in touch for more information.
 
ℹ️ News, Updates & Announcements
 
faun.dev
 
Docker Desktop 4.50 Supercharges Daily Development With AI, Security, and Faster Workflows
 
 
Docker Desktop 4.50 just dropped. It's packing a free Docker Debug, deeper VS Code hooks, and Model Context Protocol (MCP) support to help AI tools find their context without losing their minds.

There’s beefed-up WSL2 support, stricter local port rules, and Compose to Kubernetes to ease that leap from dev to prod.
 
 
faun.dev
 
NordPass: Worst Passwords of 2025 and How Each Generation Compares
 
 
NordPass’s 2025 report confirms what we all feared: people still rely on the digital equivalent of leaving the front door wide open. Think "123456", "password", even "admin", still topping the charts across every age group.

The data comes straight from breach dumps and dark web leaks. So yeah, it’s real. And it’s clear: stronger password habits just aren’t sticking, no matter how many alarms go off.

The big picture: This kind of password reuse isn’t just lazy. It’s a ceiling. And it’s cracking. The future’s screaming for passwordless auth, because users aren’t built to remember 30 unique 16-character secrets.
 
 
faun.dev
 
AWS Unveils Graviton5: A 192-Core Leap in Cloud Performance and Efficiency
 
 
AWS just dropped the Graviton5-powered EC2 M9g instances, and they’re stacked. Expect 192 cores, 25% higher performance, and a cache that’s 5x the size of last gen. Built on a 3nm process, they shrink inter-core latency by 33% and boost bandwidth across the board.

Under the hood, the Nitro System handles virtualization with tight security and low overhead.
 
 
👉 Enjoyed this? Read more news on FAUN.dev/news
 
⭐ Sponsors
 
bytevibe.co
 
The Kubectl Heavy Blend Hoodie
 
 
The Kubectl Heavy Blend Hoodie is back with a year-end discount. Soft, warm, and built for everyday wear, it features a classic fit, a plush cotton-poly blend, and a clean kubectl design that hits the right note for developers.

🎁 Use SUBSCR1B3R for a limited 25% discount
ℹ️ The coupon applies to all other products as well.
⏳ Offer ends December 31
 
 
 
🔗 Stories, Tutorials & Articles
 
datadoghq.com
 
Failure is inevitable: Learning from a large outage, and building for reliability in depth at
 
 
Datadog ditched its “never fail” mindset after a March 2023 meltdown knocked out half its Kubernetes nodes and took major user features down with them. The fix? A full-stack rethink built around graceful degradation.

The team added disk-based persistence at intake, live-data prioritization, QoS-aware retry logic, and localized failover for control plane calls. In other words: no more all-or-nothing. If it breaks, it bends instead.
 
 
invictus-ir.com
 
The story of how we almost got hacked
 
 
Team Invictus caught a BEC attempt using WeTransfer to slip in a fake Microsoft 365 login page powered by EvilProxy. Classic Adversary-in-the-Middle move, but dressed up with a slick delivery package.

Digging deeper, the team mapped the attacker’s setup and found something bigger: a credential grab campaign they’re calling VendorVandals. Think phishing lures disguised as procurement emails, blasted out from hijacked inboxes. Fully scripted and built to scale.
 
 
surfingcomplexity.blog
 
You’ll never see attrition referenced in an RCA   ✅
 
 
Lorin Hochstein argues that while high-profile engineer attrition is often speculated to contribute to major outages, it is universally absent from public Root Cause Analyses (RCAs). This exclusion occurs because public RCAs aim to reassure customers by focusing on technical fixes, whereas attrition is a complex, business-related organizational issue.

Internally, attrition may be discussed as a risk factor, but it is rarely documented as a direct cause, as traditional RCA methods fail to account for systemic, risk-increasing contributors. Ultimately, organizational factors like attrition play a role in every major incident, but remain unstated due to the narrow focus of formal incident reviews.
 
 
chrisebert.net
 
Comparing AWS Lambda Arm64 vs x86_64 Performance Across Multiple Runtimes in Late 2025
 
 
A new open-source benchmark looked at 183,000 AWS Lambda invocations, and arm64 beats x86_64 across the board in both cost and speed.

Rust on arm64 with SHA-256 tuned in assembly? It clocks in 4–5× faster than x86 in CPU-heavy tasks. Rust cold starts are snappy too: 5–8× quicker than Node.js and Python.
 
 
medium.com
 
Declarative Action Architecture
 
 
The Declarative Action Architecture (DAA) is a scalable E2E testing pattern that separates concerns across three distinct layers. The Test Layer is 100% declarative, stating what is being tested without any procedural logic, making tests read like documentation. The core Action Layer implements the execution logic by translating the declarative steps, with a mandatory rule of self-verification (an assertion is built into every action) and composing smaller, reusable actions. Finally, the Physical Layer acts as a "dumb" driver, handling pure execution and system interaction (like API calls or WebDriver commands) without any business logic or assertions.
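
To make the three layers concrete, here is a minimal Python sketch of the pattern as summarized above. It is not taken from the linked article: `FakeDriver`, `Actions`, `ActionFailed`, `run`, and the onboarding scenario are hypothetical names chosen for illustration, and the in-memory driver stands in for real API or WebDriver calls.

```python
class ActionFailed(AssertionError):
    """Raised when an action's built-in verification does not hold."""


class FakeDriver:
    """Physical layer: pure execution against the system, no business logic, no assertions."""

    def __init__(self):
        self.users = {}      # username -> password (stands in for a real backend)
        self.session = None  # username of the logged-in user, if any

    def post_user(self, username, password):
        self.users[username] = password

    def get_user_exists(self, username):
        return username in self.users

    def post_login(self, username, password):
        self.session = username if self.users.get(username) == password else None

    def get_session(self):
        return self.session


class Actions:
    """Action layer: execution logic, with a verification built into every action."""

    def __init__(self, driver):
        self.driver = driver

    def create_user(self, username, password):
        self.driver.post_user(username, password)
        self._verify(self.driver.get_user_exists(username), f"user {username!r} not created")

    def login(self, username, password):
        self.driver.post_login(username, password)
        self._verify(self.driver.get_session() == username, f"login failed for {username!r}")

    def onboard(self, username, password):
        # Composed from smaller actions; each one verifies itself.
        self.create_user(username, password)
        self.login(username, password)

    @staticmethod
    def _verify(condition, message):
        if not condition:
            raise ActionFailed(message)


def run(steps, actions):
    """Execute declarative steps by dispatching each one to the action layer."""
    for action_name, params in steps:
        getattr(actions, action_name)(**params)


# Test layer: declarative data only, stating what is tested and nothing about how.
ONBOARDING_TEST = [
    ("create_user", {"username": "ada", "password": "s3cret"}),
    ("login", {"username": "ada", "password": "s3cret"}),
]

if __name__ == "__main__":
    run(ONBOARDING_TEST, Actions(FakeDriver()))
    print("onboarding test passed")
```

Because the test is plain data, swapping `FakeDriver` for a real HTTP or browser driver would leave `ONBOARDING_TEST` untouched, which is the separation the pattern is after.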
 
 
slack.engineering
 
Advancing Our Chef Infrastructure: Safety Without Disruption
 
 
Slack pulled back the curtain on Slack AI, its LLM-powered assistant built with a fortress mindset. Every customer gets their own isolated environment. Any data passed to vendor LLMs? It's ephemeral. Gone before it can stick.

No fine-tuning. No exporting data outside Slack. And there’s a whole middle-layer filter/audit setup watching every prompt like a hawk.

Why it matters: It’s a blueprint for threading LLMs into enterprise SaaS without handing the keys to your data.
 
 
unkey.com
 
Why we're leaving serverless
 
 
Unkey slashed its latency by 6x by moving from Cloudflare Workers to stateful Go servers. The move also simplified the architecture and enabled self-hosting and platform independence. Serverless limitations had forced elaborate caching workarounds and data pipeline nightmares, which pushed the team toward a new, high-speed solution.
 
 

👉 Got something to share? Create your FAUN Page and start publishing your blog posts, tools, and updates. Grow your audience, and get discovered by the developer community.

 
💬 Discussions, Q&A & Forums
 
news.ycombinator.com
 
A logging loop in GKE cost me $1,300 in 3 days – 9.2x my actual infrastructure
 
 
GKE’s default log sink grabs every chunk of stdout and stderr from containers, no throttle, no cap. Those logs fly straight into Cloud Logging.

One user fired off logs at ~2,000/sec. That triggered a 10x billing spike. Google said nope to the refund. Turns out, new policy means accidental log floods are now your problem.
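
A back-of-the-envelope estimate shows how quickly an uncapped log stream adds up. In this sketch only the ~2,000 entries/sec rate and the 3-day window come from the post; the average entry size and the price per GiB are placeholder assumptions, so check your provider's current pricing before relying on the numbers. It does not attempt to reconstruct the poster's actual bill.

```python
SECONDS_PER_DAY = 86_400
BYTES_PER_GIB = 2**30


def estimate_ingestion_cost(entries_per_sec, days, avg_entry_bytes, price_per_gib, free_gib=0.0):
    """Return (total entries, total GiB ingested, estimated cost) for a steady log stream."""
    total_entries = entries_per_sec * SECONDS_PER_DAY * days
    total_gib = total_entries * avg_entry_bytes / BYTES_PER_GIB
    billable_gib = max(0.0, total_gib - free_gib)
    return total_entries, total_gib, billable_gib * price_per_gib


if __name__ == "__main__":
    entries, gib, cost = estimate_ingestion_cost(
        entries_per_sec=2_000,   # rate reported in the post
        days=3,                  # window reported in the post
        avg_entry_bytes=1_024,   # ASSUMPTION: placeholder average entry size
        price_per_gib=0.50,      # ASSUMPTION: placeholder price, verify against current pricing
    )
    print(f"{entries:,} entries, {gib:,.1f} GiB, about ${cost:,.2f}")
```

Larger entries (stack traces, JSON payloads) scale the result linearly, which is why a loop that logs verbose errors can outrun the rest of the bill.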
 
 
reddit.com
 
Our observability costs are now higher than our AWS bill   ✅
 
 
This discussion highlights a critical, often-unspoken challenge for scaling engineering teams: Observability costs can eclipse core infrastructure expenses.

A company with a significant and highly distributed architecture discovered their monthly observability spend ($97k) far outweighed their total AWS infrastructure bill ($52k).
  • AWS Infrastructure: ~$52k
  • Datadog (Metrics & APM): ~$47k
  • Splunk (Logs): ~$38k
  • Sentry (Error Tracking): ~$12k
Total Observability Spend: ~$97k

The cost-to-infrastructure ratio led to understandable concern from leadership: they are spending nearly twice as much to monitor systems as to run them.
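
The figures above can be checked with a few lines of Python. The amounts are the approximate monthly numbers quoted in the thread; the variable names are only illustrative.

```python
# Approximate monthly figures quoted in the discussion (USD).
aws_infrastructure = 52_000
observability = {
    "Datadog (Metrics & APM)": 47_000,
    "Splunk (Logs)": 38_000,
    "Sentry (Error Tracking)": 12_000,
}

total_observability = sum(observability.values())   # 97000
ratio = total_observability / aws_infrastructure    # about 1.87

if __name__ == "__main__":
    print(f"Observability: ${total_observability:,}")
    print(f"Observability / infrastructure: {ratio:.2f}x")
    for vendor, amount in observability.items():
        print(f"  {vendor}: {amount / total_observability:.0%} of observability spend")
```

The ratio works out to roughly 1.87, which is where "nearly twice as much" comes from.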
 
 
 
⚙️ Tools, Apps & Software
 
github.com
 
suzuki-shunsuke/pinact
 
 
pinact is a CLI to edit Workflow and Composite action files and pin versions of Actions and Reusable Workflows. pinact can also update their versions and verify version annotations.
 
 
github.com
 
corazawaf/coraza
 
 
OWASP Coraza WAF is a ModSecurity-compatible web application firewall library written in Go.
 
 
github.com
 
winapps-org/winapps
 
 
Run Windows apps such as Microsoft Office/Adobe in Linux (Ubuntu/Fedora) and GNOME/KDE as if they were a part of the native OS, including Nautilus integration.
 
 
github.com
 
snyk-labs/log-sniffer
 
 
Snyk Audit Log Dashboard
 
 

👉 Spread the word and help developers find and follow your Open Source project by promoting it on FAUN. Get in touch for more information.

 
🤔 Did you know?
 
 
Did you know that Kubernetes admission webhooks run directly on the API server’s write path? If a mutating or validating webhook is slow or unreachable and uses failurePolicy=Fail, every create or update request can block until timeoutSeconds is hit (up to a hard cap of 30 seconds), stalling deploys, node joins, and CRD changes. That’s why operators harden webhooks by keeping timeouts low, declaring sideEffects=None, running backends as system-critical pods with PDBs, and using failurePolicy=Ignore for non-critical resources.
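
As a rough illustration of those hardening rules, the sketch below lints a single webhook entry (one item of the `webhooks` list in a `ValidatingWebhookConfiguration` or `MutatingWebhookConfiguration`) represented as a plain dict. The field names `timeoutSeconds`, `sideEffects`, and `failurePolicy` are the real API fields; `lint_webhook`, the `critical` flag, and the 5-second threshold are assumptions made for this example, not Kubernetes defaults or an official tool.

```python
def lint_webhook(webhook, critical, max_timeout=5):
    """Return a list of hardening findings for one admission webhook entry.

    `critical` says whether the guarded resources justify blocking writes when
    the webhook is down; it is a judgment call supplied by the caller.
    """
    findings = []

    # In admissionregistration.k8s.io/v1 the timeout defaults to 10s and must be 1..30.
    timeout = webhook.get("timeoutSeconds", 10)
    if not 1 <= timeout <= 30:
        findings.append(f"timeoutSeconds={timeout} is outside the allowed 1-30 range")
    elif timeout > max_timeout:
        findings.append(f"timeoutSeconds={timeout} exceeds the chosen budget of {max_timeout}s")

    if webhook.get("sideEffects") != "None":
        findings.append("sideEffects should be declared as None")

    # failurePolicy defaults to Fail in v1, which blocks requests when the webhook is unreachable.
    if webhook.get("failurePolicy", "Fail") == "Fail" and not critical:
        findings.append("failurePolicy=Fail on a non-critical webhook; consider Ignore")

    return findings


if __name__ == "__main__":
    risky = {
        "name": "policy.example.com",  # hypothetical webhook name
        "failurePolicy": "Fail",
        "timeoutSeconds": 30,
        "sideEffects": "Unknown",
    }
    for finding in lint_webhook(risky, critical=False):
        print("-", finding)
```

A check like this fits in CI next to the manifests, so a slow or overly strict webhook is caught before it sits on the API server's write path.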
 
 
🤖 Once, SenseiOne Said
 
 
"Autoscaling converts capacity problems into cost problems; SLOs convert outages into policy; DevOps converts handoffs into ownership. If you don't realign incentives, you've just automated the blame."
— SenseiOne
 

(*) SenseiOne is FAUN.dev’s work-in-progress AI agent

 
⚡ Growth Notes
 
 
Build a personal observability stack for your work just like you would for a critical service: for every incident, design decision, or deployment you touch, log a 3-line postmortem in a persistent doc with context, action taken, and lesson learned.

Over weeks this becomes a high-signal timeline of your impact, bottlenecks, and repeated failure modes, which you can then refactor just like noisy alerts or flaky runbooks. Once a month, do a production review of yourself: spot patterns (e.g. always blocked on one domain, always firefighting the same class of issues) and define a single experiment to reduce that toil. Pair this with one concrete habit: for every painful incident you work, contribute one small but real reliability improvement (better alert, dashboard, playbook, or config guardrail) and log it back into your doc.

Over time you get a compounding loop: incidents fuel learning, learning drives tiny reliability upgrades, and the history proves your trajectory far better than any resume bullet.
 
Each week, we share a practical move to grow faster and work smarter
 
👤 This Week's Human
 
 
This Week’s Human is Shannon Atkinson, a DevOps & Automation specialist with 15+ years building Kubernetes and CI/CD systems across AWS, Azure, and GCP, and a Certified Jenkins Engineer and patent holder. At Realtor.com, Shannon migrated mobile CI/CD from Bitrise to CircleCI, boosting delivery by 20%; at Salesforce, built a B2B2C platform serving 100M+ users; at Zapproved, developed automation that scaled systems 40% and cut manual work from hours to minutes.
 
💡 Engage with FAUN.dev on LinkedIn — like, comment on, or share any of our posts on LinkedIn — you might be our next “This Week’s Human”!
 
 
❤️ Thanks for reading
 
 
👋 Keep in touch and follow us on social media:
- 💼 LinkedIn
- 📝 Medium
- 🐦 Twitter
- 👥 Facebook
- 📰 Reddit
- 📸 Instagram

👌 Was this newsletter helpful?
We'd really appreciate it if you could forward it to your friends!

🙏 Never miss an issue!
To receive our future emails in your inbox, don't forget to add community@faun.dev to your contacts.

🤩 Want to sponsor our newsletter?
Reach out to us at sponsors@faun.dev and we'll get back to you as soon as possible.
 

DevOpsLinks #507: "A Logging Loop in GKE Cost Me $1,300 in 3 Days"
Legend: ✅ = Editor's Choice / ♻️ = Old but Gold / ⭐ = Promoted / 🔰 = Beginner Friendly

You received this email because you are subscribed to FAUN.dev.
We (🐾) help developers (👣) learn and grow by keeping them up with what matters.
