🔍 Inside this Issue
Speed is up, patience is down: ARM keeps winning, Graviton5 flexes, and one team drops serverless to cut latency by 6x—while Slack wraps its AI in a paranoid perimeter and Datadog turns failure into design. From declarative tests to credential heists and the awkward role of attrition in RCAs, follow the links for the how, the why, and the trade-offs.
🛡️ Advancing Our Chef Infrastructure: Safety Without Disruption
🚀 AWS Unveils Graviton5: A 192-Core Leap in Cloud Performance and Efficiency
🆚 Comparing AWS Lambda Arm64 vs x86_64 Performance Across Multiple Runtimes in Late 2025
🧩 Declarative Action Architecture
🐳 Docker Desktop 4.50 Supercharges Daily Development With AI, Security, and Faster Workflows
🛟 Failure is inevitable: Learning from a large outage, and building for reliability in depth at Datadog
🔑 NordPass: Worst Passwords of 2025 and How Each Generation Compares
🎣 The story of how we almost got hacked
🚪 Why we're leaving serverless
🧠 You’ll never see attrition referenced in an RCA
You’ve got fresh leverage. Go apply it.
Until next time!
FAUN.dev() Team
ℹ️ News, Updates & Announcements

faun.dev
Docker Desktop 4.50 just dropped. It's packing a free Docker Debug, deeper VS Code hooks, and new Model Context Protocol (MCP) support to help AI tools find their context without losing their minds.
There’s beefed-up WSL2 support, stricter local port rules, and Compose-to-Kubernetes conversion to ease that leap from dev to prod.

faun.dev
NordPass’s 2025 report confirms what we all feared: people still rely on the digital equivalent of leaving the front door wide open. Think "123456", "password", even "admin", all still topping the charts across every age group.
The data comes straight from breach dumps and dark web leaks. So yeah, it’s real. And it’s clear: stronger password habits just aren’t sticking, no matter how many alarms go off.
The big picture: This kind of password reuse isn’t just lazy. It’s a ceiling. And it’s cracking. The future’s screaming for passwordless auth, because users aren’t built to remember 30 unique 16-character secrets.

faun.dev
AWS just dropped the Graviton5-powered EC2 M9g instances, and they’re stacked. Expect 192 cores, 25% higher performance, and a cache that’s 5x the size of last gen. Built on a sleek 3nm process, they shrink inter-core latency by 33% and boost bandwidth across the board.
Under the hood, the Nitro System handles virtualization with tight security and low overhead.
🔗 Stories, Tutorials & Articles

datadoghq.com
Datadog ditched its “never fail” mindset after a March 2023 meltdown knocked out half its Kubernetes nodes and took major user features down with them. The fix? A full-stack rethink built around graceful degradation.
The team added disk-based persistence at intake, live-data prioritization, QoS-aware retry logic, and localized failover for control plane calls. In other words: no more all-or-nothing. If it breaks, it bends instead.
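The post itself is prose, not code, but here’s a rough Go sketch of the "bend, don’t break" idea, with hypothetical names: an intake handler that gives live data one quick retry and spills everything else to local disk for later replay instead of dropping it.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// Priority models the "live data first" QoS split described in the post.
type Priority int

const (
	Live     Priority = iota // fresh telemetry, retried aggressively
	Backfill                 // older data, allowed to wait
)

type Event struct {
	Priority Priority
	Payload  []byte
}

// store is a stand-in for the downstream datastore call.
var store = func(e Event) error { return errors.New("store unavailable") }

// ingest tries the store, and on failure degrades gracefully:
// live events get one quick retry, everything else spills to disk
// so intake keeps accepting data instead of failing outright.
func ingest(e Event, spoolDir string) error {
	if err := store(e); err == nil {
		return nil
	}
	if e.Priority == Live {
		time.Sleep(100 * time.Millisecond) // hypothetical short retry budget
		if err := store(e); err == nil {
			return nil
		}
	}
	// Disk-based persistence at intake: park the payload for later replay.
	name := filepath.Join(spoolDir, fmt.Sprintf("evt-%d.spool", time.Now().UnixNano()))
	return os.WriteFile(name, e.Payload, 0o600)
}

func main() {
	dir, _ := os.MkdirTemp("", "intake-spool")
	err := ingest(Event{Priority: Live, Payload: []byte(`{"metric":"cpu","v":0.42}`)}, dir)
	fmt.Println("ingest result:", err, "spool dir:", dir)
}
```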

invictus-ir.com
Team Invictus caught a BEC attempt using WeTransfer to slip in a fake Microsoft 365 login page powered by EvilProxy. Classic Adversary-in-the-Middle move, but dressed up with a slick delivery package.
Digging deeper, the team mapped the attacker’s setup and found something bigger: a credential grab campaign they’re calling VendorVandals. Think phishing lures disguised as procurement emails, blasted out from hijacked inboxes. Fully scripted and built to scale.

surfingcomplexity.blog
Lorin Hochstein argues that while high-profile engineer attrition is often speculated to contribute to major outages, it is universally absent from public Root Cause Analyses (RCAs). This exclusion occurs because public RCAs aim to reassure customers by focusing on technical fixes, whereas attrition is a complex, business-related organizational issue.
Internally, attrition may be discussed as a risk factor, but it is rarely documented as a direct cause, as traditional RCA methods fail to account for systemic, risk-increasing contributors. Ultimately, organizational factors like attrition play a role in every major incident, but remain unstated due to the narrow focus of formal incident reviews.

chrisebert.net
A new open-source benchmark looked at 183,000 AWS Lambda invocations and found arm64 beating x86_64 across the board on both cost and speed.
Rust on arm64 with SHA-256 tuned in assembly? It clocks in 4–5× faster than x86 in CPU-heavy tasks. Cold starts are snappy too—5–8× quicker than Node.js and Python.
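The benchmark’s real harness lives in the linked write-up; for flavor, here’s a minimal, hypothetical Go handler for the same class of CPU-bound SHA-256 workload, using the aws-lambda-go runtime. The arm64-vs-x86_64 choice is a deploy-time setting on the function, not a code change.

```go
// Hypothetical CPU-bound Lambda handler: hashes a payload repeatedly with
// SHA-256. The same binary runs on arm64 or x86_64; architecture is chosen
// when the function is created or updated, not here.
package main

import (
	"context"
	"crypto/sha256"
	"encoding/hex"

	"github.com/aws/aws-lambda-go/lambda"
)

type request struct {
	Data   string `json:"data"`
	Rounds int    `json:"rounds"`
}

type response struct {
	Digest string `json:"digest"`
}

func handler(ctx context.Context, req request) (response, error) {
	sum := []byte(req.Data)
	for i := 0; i < req.Rounds; i++ { // repeated hashing keeps the CPU busy
		h := sha256.Sum256(sum)
		sum = h[:]
	}
	return response{Digest: hex.EncodeToString(sum)}, nil
}

func main() {
	lambda.Start(handler)
}
```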

medium.com
The Declarative Action Architecture (DAA) is a scalable E2E testing pattern that separates concerns across three distinct layers.
The Test Layer is 100% declarative: it states what is being tested without any procedural logic, so tests read like documentation. The core Action Layer implements the execution logic by translating the declarative steps, with a mandatory rule of self-verification (an assertion is built into every action) and by composing smaller, reusable actions.
Finally, the Physical Layer acts as a "dumb" driver, handling pure execution and system interaction (like API calls or WebDriver commands) without any business logic or assertions.
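A minimal Go sketch of the three layers, with invented names (the article’s own examples may differ):

```go
package daa

import "testing"

// --- Physical layer: a "dumb" driver, pure execution, no business logic or assertions. ---
type driver struct{ cart []string }

func (d *driver) post(item string) error { d.cart = append(d.cart, item); return nil }
func (d *driver) items() []string        { return d.cart }

// --- Action layer: turns declarative steps into execution, and self-verifies. ---
type actions struct {
	t *testing.T
	d *driver
}

// AddToCart performs the step AND asserts it worked: the mandatory self-verification rule.
func (a *actions) AddToCart(item string) {
	a.t.Helper()
	if err := a.d.post(item); err != nil {
		a.t.Fatalf("add %q failed: %v", item, err)
	}
	for _, got := range a.d.items() {
		if got == item {
			return
		}
	}
	a.t.Fatalf("%q missing from cart after add", item)
}

// --- Test layer: fully declarative, reads like documentation. ---
func TestItemCanBeAddedToCart(t *testing.T) {
	a := &actions{t: t, d: &driver{}}
	a.AddToCart("coffee-beans")
}
```

The point of the split: swap the driver (HTTP, WebDriver, gRPC) without touching a single test, and never write a test that has to know how verification happens.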

slack.engineering
Slack pulled back the curtain on Slack AI, its LLM-powered assistant built with a fortress mindset. Every customer gets their own isolated environment. Any data passed to vendor LLMs? It's ephemeral. Gone before it can stick.
No fine-tuning. No exporting data outside Slack. And there’s a whole middle-layer filter/audit setup watching every prompt like a hawk.
Why it matters: It’s a blueprint for threading LLMs into enterprise SaaS without handing the keys to your data.
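Slack hasn’t published this code, so the following is purely a sketch of what that middle layer could look like in Go: audit every prompt, redact the obvious secrets before anything leaves the boundary, keep nothing afterward. Names and the redaction rule are invented.

```go
package main

import (
	"fmt"
	"log"
	"regexp"
	"strings"
)

// invoke stands in for a call to an isolated, per-customer model endpoint.
type invoke func(prompt string) (string, error)

// guarded wraps a model call with a filter/audit layer: log metadata about
// every prompt, scrub obvious credentials, and hold nothing after the call.
func guarded(model invoke) invoke {
	secret := regexp.MustCompile(`(?i)(api[_-]?key|password)\s*[:=]\s*\S+`)
	return func(prompt string) (string, error) {
		redacted := secret.ReplaceAllString(prompt, "$1=[REDACTED]")
		log.Printf("audit: prompt_len=%d redacted=%v", len(redacted), redacted != prompt)
		return model(redacted) // used for this call only, never stored
	}
}

func main() {
	echo := func(p string) (string, error) { return "summary of: " + strings.ToUpper(p), nil }
	ask := guarded(echo)
	resp, _ := ask("summarize this channel, api_key=abc123")
	fmt.Println(resp)
}
```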

unkey.com
Unkey slashed its latency by 6x by moving from Cloudflare Workers to stateful Go servers. The switch simplified the architecture and opened the door to self-hosting and platform independence. Serverless limitations had forced elaborate caching workarounds and data-pipeline nightmares; owning long-lived servers made a simpler, faster design possible.
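Unkey’s post is about architecture, not snippets, but a quick hypothetical Go sketch shows what "stateful" buys: verification results cached in process memory on a long-lived server, no external cache hop, no serverless workaround.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"sync"
	"time"
)

// entry is a cached verification result; on a long-lived server it survives
// between requests, which is exactly what short-lived serverless isolates lose.
type entry struct {
	valid   bool
	expires time.Time
}

type keyCache struct {
	mu sync.RWMutex
	m  map[string]entry
}

func (c *keyCache) get(k string) (entry, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	e, ok := c.m[k]
	return e, ok && time.Now().Before(e.expires)
}

func (c *keyCache) set(k string, e entry) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.m[k] = e
}

func main() {
	cache := &keyCache{m: map[string]entry{}}
	lookup := func(k string) bool { return k == "key_demo" } // stand-in for the real datastore

	http.HandleFunc("/v1/keys/verify", func(w http.ResponseWriter, r *http.Request) {
		k := r.URL.Query().Get("key")
		if e, ok := cache.get(k); ok { // in-memory hit: no network round trip
			fmt.Fprintf(w, `{"valid":%t,"cache":"hit"}`, e.valid)
			return
		}
		valid := lookup(k)
		cache.set(k, entry{valid: valid, expires: time.Now().Add(time.Minute)})
		fmt.Fprintf(w, `{"valid":%t,"cache":"miss"}`, valid)
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```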
🤔 Did you know?
Did you know that Kubernetes admission webhooks run directly on the API server’s write path? If a mutating or validating webhook is slow or unreachable and uses failurePolicy=Fail, every create or update request can block until timeoutSeconds is hit (up to a hard cap of 30 seconds), stalling deploys, node joins, and CRD changes. That’s why operators harden webhooks by keeping timeouts low, declaring sideEffects=None, running backends as system-critical pods with PDBs, and using failurePolicy=Ignore for non-critical resources.
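For the hardening side, here’s a hedged Go sketch using the k8s.io/api admissionregistration/v1 types (ClientConfig and Rules omitted for brevity, names invented): short timeout, sideEffects declared as None, and failurePolicy Ignore for a non-critical resource.

```go
package main

import (
	"fmt"

	admissionregistrationv1 "k8s.io/api/admissionregistration/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	ignore := admissionregistrationv1.Ignore // don't block writes if the webhook is down
	sideEffects := admissionregistrationv1.SideEffectClassNone
	timeout := int32(2) // keep well under the 30s hard cap

	cfg := admissionregistrationv1.ValidatingWebhookConfiguration{
		ObjectMeta: metav1.ObjectMeta{Name: "example-policy"}, // hypothetical name
		Webhooks: []admissionregistrationv1.ValidatingWebhook{{
			Name:                    "pods.example.dev",
			FailurePolicy:           &ignore,
			SideEffects:             &sideEffects,
			TimeoutSeconds:          &timeout,
			AdmissionReviewVersions: []string{"v1"},
		}},
	}
	fmt.Printf("%s: failurePolicy=%s timeout=%ds\n",
		cfg.Name, *cfg.Webhooks[0].FailurePolicy, *cfg.Webhooks[0].TimeoutSeconds)
}
```

For anything that truly must gate writes, keep failurePolicy=Fail but make the backend boringly reliable: replicated, system-critical, and protected by a PodDisruptionBudget.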
⚡Growth Notes
Build a personal observability stack for your work just like you would for a critical service: for every incident, design decision, or deployment you touch, log a 3-line postmortem in a persistent doc with context, action taken, and lesson learned.
Over weeks this becomes a high-signal timeline of your impact, bottlenecks, and repeated failure modes, which you can then refactor just like noisy alerts or flaky runbooks. Once a month, do a production review of yourself: spot patterns (e.g. always blocked on one domain, always firefighting the same class of issues) and define a single experiment to reduce that toil. Pair this with one concrete habit: for every painful incident you work, contribute one small but real reliability improvement (better alert, dashboard, playbook, or config guardrail) and log it back into your doc.
Over time you get a compounding loop: incidents fuel learning, learning drives tiny reliability upgrades, and the history proves your trajectory far better than any resume bullet.