🔗 Stories, Tutorials & Articles

**How we improved on-call life by reducing pager noise**

The article describes a problem faced by an on-call SRE team: during a service-wide degradation they received so many pages that it was hard for the on-call engineer to focus on solving the underlying problem. The team decided to group alerts by service and to introduce service dependencies for alerting and paging. They used Prometheus and Alertmanager to group alerts by labels such as "type" and "env", and updated their PagerDuty and Slack templates to show the right information. This reduced the number of pages the on-call receives and lets them focus on solving the problem. The article also covers how they addressed cascading effects when a downstream service starts burning through its error budget.
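The grouping approach described above maps onto Alertmanager's route configuration. A minimal sketch, assuming the "type" and "env" labels mentioned in the article; the "service" label, the timing values, and the PagerDuty receiver name are illustrative placeholders, not the team's actual settings:

```yaml
route:
  # Collapse related alerts into one notification instead of one page per alert.
  group_by: ['service', 'type', 'env']
  group_wait: 30s      # wait for related alerts before the first page
  group_interval: 5m   # batch further alerts for an already-open group
  receiver: pagerduty-oncall

receivers:
  - name: pagerduty-oncall
    pagerduty_configs:
      - service_key: '<your-pagerduty-integration-key>'
```

With grouping in place, a service-wide degradation produces one page listing all affected alerts rather than a page per firing rule.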
**Use One Big Server ✅**

The real issue behind the monoliths-vs-microservices debate is whether distributed systems are worth the developer-time and cost overheads. Virtualization has made servers bigger and cheaper than we think: one server today can serve video files at 800 Gbps, sustain 1 million IOPS on a NoSQL database, 70k IOPS in PostgreSQL, 500k requests per second to nginx, and more. Such a server costs between $1,318/month and $6,055/month. The cloud premium is real, and if you're going to spend at the top of that range, you should probably consider building your own data center. Cloud computing provides ease of use and high availability, but cloud-native architecture such as microservices and serverless computing may be unnecessary for most workloads and can be more expensive. Understand your own workload and usage patterns to decide whether that premium is worth paying: the burstier your workload, the more cloud-native architecture helps. In general, though, a few large servers can be more cost-effective and simpler to manage.
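The trade-off in the last sentence can be made concrete with a back-of-the-envelope calculation. The $1,318/month server price is from the article; the per-request serverless rate below is a hypothetical blended figure chosen only to illustrate the break-even mechanics:

```python
SERVER_MONTHLY_USD = 1318.0          # low end of the big-server price cited in the article
SERVERLESS_USD_PER_MILLION = 5.0     # hypothetical blended serverless price per 1M requests
SECONDS_PER_MONTH = 30 * 24 * 3600   # 2,592,000 seconds in a 30-day month

def monthly_serverless_cost(rps: float) -> float:
    """Cost of serving a steady request rate for one 30-day month."""
    requests = rps * SECONDS_PER_MONTH
    return requests / 1_000_000 * SERVERLESS_USD_PER_MILLION

def breakeven_rps() -> float:
    """Steady request rate at which serverless matches the flat server price."""
    requests = SERVER_MONTHLY_USD / SERVERLESS_USD_PER_MILLION * 1_000_000
    return requests / SECONDS_PER_MONTH

print(f"break-even: {breakeven_rps():.0f} req/s sustained")
```

Below the break-even rate, pay-per-request wins; above it, the flat-priced server wins. A bursty workload lowers the effective sustained rate, which is exactly why burstiness favors the cloud-native model.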
**[INFOGRAPHIC] The True Cost of Downtime: 21 Stats You Need to Know**

Downtime, the time during which a system or website is unavailable, can be costly for companies. Research shows that each minute of downtime costs an average of $9,000, and the figure continues to rise; some industries, such as banking and finance, healthcare, and manufacturing, face price tags of up to $5 million per hour. The cost varies with the size of the organization, the length of the outage, and other factors. To estimate the cost of downtime for your organization, multiply minutes of downtime by your cost per minute.
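The formula in the last sentence is trivial to turn into a calculator. A minimal sketch; the default rate is the industry-average figure cited above, and a real estimate should substitute an organization-specific number:

```python
def downtime_cost(minutes: float, cost_per_minute: float = 9_000.0) -> float:
    """Estimated outage cost: minutes of downtime multiplied by cost per minute.

    The default is the ~$9,000/minute industry average cited in the
    infographic; replace it with your organization's own figure.
    """
    return minutes * cost_per_minute

print(downtime_cost(30))                  # a 30-minute outage at the average rate: 270000.0
print(downtime_cost(60, 5_000_000 / 60))  # one hour at the $5M/hour high end: 5000000.0
```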
**How I turned a cheap weather station into a personal DevOps dashboard ✅**

In this post, the author explains how they built a personal weather dashboard from a cheap weather station and hardware and software they already owned. A WS2032 weather station broadcasts its readings on a 433 MHz radio frequency, which a USB radio receiver picks up. The tool rtl_433 parses the data and publishes messages to an MQTT broker; Home Assistant picks them up and forwards them to InfluxDB, and Grafana, running on a Synology NAS, displays the data from InfluxDB. The post includes diagrams and pictures of the setup, the cost of the hardware, and setup instructions for the different components.
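The rtl_433-to-MQTT step can be sketched as a small decoder. rtl_433 emits one JSON object per decoded radio packet; the field names below (`model`, `id`, `temperature_C`, `humidity`) follow its usual JSON output, but the exact fields for a WS2032 and the topic layout are assumptions, not the author's setup:

```python
import json

def event_to_mqtt(line: str, prefix: str = "rtl_433") -> tuple[str, str]:
    """Map one rtl_433 JSON event line to an (MQTT topic, payload) pair."""
    event = json.loads(line)
    # Route each sensor to its own topic so Home Assistant can subscribe per device.
    topic = f"{prefix}/{event['model']}/{event['id']}"
    return topic, json.dumps(event)

# A hypothetical WS2032 reading as rtl_433 might report it:
sample = '{"model": "WS2032", "id": 42, "temperature_C": 18.3, "humidity": 61}'
topic, payload = event_to_mqtt(sample)
print(topic)  # rtl_433/WS2032/42
```

In practice rtl_433 can publish to a broker directly via its MQTT output option (`-F mqtt://host:1883`), so this decoding step is normally handled for you.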
**How Cloudflare Broke My Build and How I Fixed It**

An open-source project's continuous-integration build failed because source-code coverage data could not be uploaded to coveralls.io. The cause turned out to be Cloudflare, a service provider used by coveralls.io: after reaching out to coveralls.io support, the author found that the issue was the double quotes around the boundary parameter value in the Content-Type header. The author does not know why Cloudflare treated this as an error.
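The difference at issue is a single pair of quotes in the header. Per RFC 2045, a multipart boundary parameter may be written either bare or as a quoted-string, so both forms below are valid; the boundary value itself is a made-up example, and it is an assumption that the fix was simply to drop the quotes:

```python
boundary = "a1b2c3d4"

# The form that Cloudflare rejected: boundary as a quoted-string.
quoted = f'multipart/form-data; boundary="{boundary}"'

# The same header with the boundary as a bare token.
bare = f"multipart/form-data; boundary={boundary}"

print(quoted)  # multipart/form-data; boundary="a1b2c3d4"
print(bare)    # multipart/form-data; boundary=a1b2c3d4
```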