From 892bea1749a7d8c6f07dece1cc4b25818cf305a2 Mon Sep 17 00:00:00 2001
From: LogicalGagan
Date: Tue, 14 Oct 2025 19:14:06 +0530
Subject: [PATCH 1/3] Added additional SRE resources from major tech companies

---
 README.md | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index 19c5abd..93c0691 100644
--- a/README.md
+++ b/README.md
@@ -1657,14 +1657,27 @@ Numerous organizations frequently share their insights and expertise, encompassi
 
 #### SRE Resources from various organizations
 
-* [Google SRE Page](https://sre.google/)
-* [Google SRE Classroom](https://sre.google/classroom/)
+* [Airbnb Engineering - Lessons Learned in Incident Management](https://dropbox.tech/infrastructure/lessons-learned-in-incident-management)
+* [Atlassian - Blameless Postmortems](https://www.atlassian.com/incident-management/postmortem/blameless)
+* [Atlassian - Creating Postmortem Reports](https://www.atlassian.com/incident-management/postmortem/reports)
+* [AWS Observability Recipes](https://aws-observability.github.io/aws-o11y-recipes/)
+* [Awesome Sysadmin](https://github.com/awesome-foss/awesome-sysadmin)
+* [Cloudflare - Incident Analysis and Response](https://blog.cloudflare.com/cloudflare-incident-on-august-21-2025/)
+* [Dropbox - Monitoring Server Applications with Vortex](https://dropbox.tech/infrastructure/monitoring-server-applications-with-vortex)
 * [Google Cloud SRE Page](https://cloud.google.com/sre)
+* [Google SRE Classroom](https://sre.google/classroom/)
+* [Google SRE Page](https://sre.google/)
+* [Google SRE - Blameless Postmortem Culture](https://sre.google/sre-book/postmortem-culture/)
+* [Google SRE - Incident Response and Analysis](https://sre.google/workbook/incident-response/)
 * [Microsoft SRE Page](https://docs.microsoft.com/en-us/azure/site-reliability-engineering/)
+* [Netflix - Centralized Site Reliability Practice](https://netflixtechblog.com/keeping-customers-streaming-the-centralized-site-reliability-practice-at-netflix-205cc37aa9fb)
+* [PagerDuty - Incident Response Automation](https://www.pagerduty.com/blog/automation/from-alert-to-resolution-how-incident-response-automation-cuts-mttr-and-closes-gaps/)
 * [School of SRE from LinkedIn](https://linkedin.github.io/school-of-sre/)
+* [Spotify - Incident Management Practices](https://engineering.atspotify.com/2013/06/04/incident-management-at-spotify)
 * [Stripe Increment Magazine Issue 16 on Reliability](https://increment.com/reliability/)
-* [AWS Observability Recipes](https://aws-observability.github.io/aws-o11y-recipes/)
-* [Awesome Sysadmin](https://github.com/awesome-foss/awesome-sysadmin)
+* [Uber - Observability at Scale](https://www.uber.com/en-IN/blog/observability-at-scale/)
+
+
 
 #### Incidents & postmortems
 

From 4b6f922e5fcda1db9684baaec68d3b7b68eb985c Mon Sep 17 00:00:00 2001
From: LogicalGagan
Date: Tue, 14 Oct 2025 19:21:44 +0530
Subject: [PATCH 2/3] Add additional SRE & DevOps resources from major tech companies

---
 README.md | 2 --
 1 file changed, 2 deletions(-)

diff --git a/README.md b/README.md
index 93c0691..24973e9 100644
--- a/README.md
+++ b/README.md
@@ -1677,8 +1677,6 @@ Numerous organizations frequently share their insights and expertise, encompassi
 * [Stripe Increment Magazine Issue 16 on Reliability](https://increment.com/reliability/)
 * [Uber - Observability at Scale](https://www.uber.com/en-IN/blog/observability-at-scale/)
 
-
-
 #### Incidents & postmortems
 
 * [The Verica Open Incident Database](https://www.thevoid.community/)

From 7d4a4c06752b64da2afaead84b9c7c61e7427b00 Mon Sep 17 00:00:00 2001
From: LogicalGagan
Date: Tue, 14 Oct 2025 19:31:22 +0530
Subject: [PATCH 3/3] Cleared some of the bugs caused by duplicate urls and blank spaces

---
 README.md | 2 --
 1 file changed, 2 deletions(-)

diff --git a/README.md b/README.md
index 24973e9..60c792a 100644
--- a/README.md
+++ b/README.md
@@ -1663,14 +1663,12 @@ Numerous organizations frequently share their insights and expertise, encompassi
 * [AWS Observability Recipes](https://aws-observability.github.io/aws-o11y-recipes/)
 * [Awesome Sysadmin](https://github.com/awesome-foss/awesome-sysadmin)
 * [Cloudflare - Incident Analysis and Response](https://blog.cloudflare.com/cloudflare-incident-on-august-21-2025/)
-* [Dropbox - Monitoring Server Applications with Vortex](https://dropbox.tech/infrastructure/monitoring-server-applications-with-vortex)
 * [Google Cloud SRE Page](https://cloud.google.com/sre)
 * [Google SRE Classroom](https://sre.google/classroom/)
 * [Google SRE Page](https://sre.google/)
 * [Google SRE - Blameless Postmortem Culture](https://sre.google/sre-book/postmortem-culture/)
 * [Google SRE - Incident Response and Analysis](https://sre.google/workbook/incident-response/)
 * [Microsoft SRE Page](https://docs.microsoft.com/en-us/azure/site-reliability-engineering/)
-* [Netflix - Centralized Site Reliability Practice](https://netflixtechblog.com/keeping-customers-streaming-the-centralized-site-reliability-practice-at-netflix-205cc37aa9fb)
 * [PagerDuty - Incident Response Automation](https://www.pagerduty.com/blog/automation/from-alert-to-resolution-how-incident-response-automation-cuts-mttr-and-closes-gaps/)
 * [School of SRE from LinkedIn](https://linkedin.github.io/school-of-sre/)
 * [Spotify - Incident Management Practices](https://engineering.atspotify.com/2013/06/04/incident-management-at-spotify)