|
🔗 Stories, Tutorials & Articles |
|
|
|
Amazon Prime Video’s Microservices Move Doesn’t Lead to a Monolith after All |
|
|
The streaming service provider made waves when its engineers reported they had refactored their QoS monitor into a monolithic architecture. Microservices experts who examined the details found they had actually done just the opposite. |
|
|
|
|
|
|
How to add, use, and update `.terraform.lock.hcl` without pain |
|
|
Starting with Terraform 1.4.0, the lockfile is always checked before the plugin cache directory is used, so Terraform performs a full initialization even if the cache is present. There are three ways to deal with this: stay on Terraform 1.3.x and treat it as the new 0.11, set `TF_PLUGIN_CACHE_MAY_BREAK_DEPENDENCY_LOCK_FILE=true`, or start using the lockfile and move forward. Lockfiles are recommended for repeatability and security: they ensure consistent provider versions and protect against supply chain attacks. Adding them involves validating the Terraform configuration, generating lockfiles with pre-commit hooks, and automating their updates in the CI workflow. Renovate and Dependabot are alternatives, but the approach discussed does not use them. |
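The lockfile step can be sketched as a small helper that assembles a `terraform providers lock` invocation for several platforms, so that `.terraform.lock.hcl` carries provider hashes for every OS and architecture in use. This is a minimal sketch, not the article's pre-commit setup: the function names and the platform list are assumptions chosen for illustration.

```python
import os

def providers_lock_command(platforms):
    """Build the argv for `terraform providers lock` covering the given platforms."""
    cmd = ["terraform", "providers", "lock"]
    # One -platform flag per target, e.g. -platform=linux_amd64
    cmd += [f"-platform={p}" for p in platforms]
    return cmd

def cache_escape_hatch_env(base_env=None):
    """Return a copy of the environment with the 1.4+ cache opt-out set.

    This is the alternative to adopting lockfiles: it lets the plugin
    cache be used even when no lockfile entry exists.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["TF_PLUGIN_CACHE_MAY_BREAK_DEPENDENCY_LOCK_FILE"] = "true"
    return env

# Hypothetical platform list; adjust to what your team and CI runners use.
PLATFORMS = ["linux_amd64", "darwin_amd64", "darwin_arm64"]

# Print the command instead of running it, so the sketch has no side effects.
print(" ".join(providers_lock_command(PLATFORMS)))
```

In a CI job, the returned list could be passed to `subprocess.run(..., check=True)` and the resulting lockfile committed if it changed.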
|
|
|
|
|
|
How we reduced our Prometheus infrastructure footprint by a third |
|
|
Prometheus collects metrics by scraping services, and to handle large data loads, the load can be distributed across multiple instances through sharding. However, evaluating recording rules can be challenging when metrics are distributed across multiple instances, leading to partial rules. Criteo addressed this by filtering metrics during the scrape process, resulting in improved efficiency, reduced resource usage, and significant savings in memory, CPU, and network traffic. |
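To illustrate why sharding leads to partial recording rules, here is a minimal sketch in which scrape targets are assigned to shards by hashing their address, in the spirit of Prometheus's `hashmod` relabel action. The target names, sample values, and shard count are hypothetical, and the hash shown is not guaranteed to match Prometheus's own shard assignment.

```python
import hashlib

def shard_for(target: str, num_shards: int) -> int:
    """Map a target address to a shard index via a stable hash."""
    digest = hashlib.md5(target.encode()).digest()
    return int.from_bytes(digest[8:], "big") % num_shards

# Hypothetical targets with one sample value each (e.g. a request counter).
samples = {"app-1:9100": 10, "app-2:9100": 20, "app-3:9100": 30, "app-4:9100": 40}
NUM_SHARDS = 2

# Each shard only scrapes, and therefore only stores, its own targets.
per_shard = {s: {} for s in range(NUM_SHARDS)}
for target, value in samples.items():
    per_shard[shard_for(target, NUM_SHARDS)][target] = value

# A rule like sum(requests) evaluated on one shard sees only part of the data.
partial_sums = {s: sum(values.values()) for s, values in per_shard.items()}
global_sum = sum(samples.values())

print("per-shard sums:", partial_sums)
print("true global sum:", global_sum)
```

Each shard's result is only a partial aggregate; the complete value exists only once the per-shard results are combined somewhere that can see all of them.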
|
|
|
|
|
|
Impact of Observability Practices on Engineering Productivity |
|
|
Observability practices and tools enhance engineering productivity by streamlining troubleshooting, facilitating proactive error detection, enhancing system comprehension, boosting collaboration, promoting continuous improvement, and lowering stress levels. These tools enable engineers to quickly identify and resolve issues, make informed decisions, and improve system performance, ultimately saving time and effort. By providing a shared understanding and promoting a growth mindset, observability practices foster a more synchronized team effort and contribute to a healthier work environment. |
|
|
|
|
|
|
How I went from Operations Manager to Site Reliability Engineer In 6 Months! |
|
|
Moving from operations manager to site reliability engineer in six months is possible with the right guidance and experience. By applying to management positions at other companies and continuously learning, one can become a manager and lead a team to solve customer problems. |
|
|
|
|
|
|
|
Leveraging AWS SSO (aka Identity Center) with Google Workspaces |
|
|
Companies using Google Workspaces can leverage their Google accounts to access AWS via AWS SSO, an excellent service for enforcing identity best practices. The process of setting up Google as the authentication mechanism for AWS SSO is not clearly documented, but this post provides guidance on the configuration steps. |
|
|
|
|
|
|
How Platform Engineering Works |
|
|
Platform Engineering at Sotheby's focuses on velocity and stability, applying a product mindset to support software delivery and system consistency. The team sets goals based on outcomes, not just outputs, using processes like OKRs, quarterly planning, and story kickoffs. They also prioritize understanding the needs of engineers through direct support, consulting engagements, and surveys to bridge the gap and improve collaboration. |
|
|
|
|
|
|
Don’t Let Observability Inflate Your Cloud Costs |
|
|
A shift occurred in the technology sector this year as companies focused on the cost side of sustainability in infrastructure and tooling. Observability tooling, perceived as a large cost center, came under scrutiny, leading to the recommendation of allocating 20-30% of total infrastructure spend to observability. Tweaks to trace locality, such as keeping requests within a single availability zone, and sampling techniques such as head and tail sampling can help reduce costs in the observability stack. Considering peering and transit gateways, and using vendor support for services like AWS PrivateLink or Azure private endpoints, can reduce cloud costs further. |
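The difference between head and tail sampling can be sketched as follows. Head sampling decides from the trace ID alone, before any spans are collected; tail sampling decides after the trace has finished, when its outcome is known. This is a simplified illustration under assumptions: the `Trace` type, the 500 ms threshold, and the function names are hypothetical and do not reflect any particular vendor's API.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class Trace:
    trace_id: str
    duration_ms: float
    has_error: bool

def head_sample(trace_id: str, rate: float) -> bool:
    """Keep a fixed fraction of traces, decided up front from the ID alone."""
    bucket = int.from_bytes(hashlib.sha256(trace_id.encode()).digest()[:8], "big")
    # Integer comparison avoids float rounding at the edges (rate 0.0 or 1.0).
    return bucket < int(rate * 2**64)

def tail_sample(trace: Trace, slow_ms: float = 500.0) -> bool:
    """Keep a trace only if, once complete, it turned out to be interesting."""
    return trace.has_error or trace.duration_ms >= slow_ms

# Hypothetical completed traces.
traces = [
    Trace("a1", 42.0, False),
    Trace("b2", 910.0, False),
    Trace("c3", 15.0, True),
]

kept_by_tail = [t.trace_id for t in traces if tail_sample(t)]
print("tail sampling keeps:", kept_by_tail)
print("head sampling at 10% keeps:", [t.trace_id for t in traces if head_sample(t.trace_id, 0.1)])
```

Head sampling is cheap because dropped traces are never collected, but it can discard the rare failing request; tail sampling keeps the interesting traces at the cost of buffering every trace until it completes.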
|
|
|
|