SLI's, SLA's and SL'D'OHS! Learning about Service Uptime from Homer Simpson

Measuring service uptime can be a daunting task, but not for the Simpsons! Viewers will leave with a clearer understanding of how to account for services, how SLxs can define and guide what to monitor, and how to implement reliability targets and error budgets, all told through fabled Springfield wisdom.

Introduction

Building services is important, but what happens after they are built and running in production? How do we establish trust with our customers that our service will actually be available? Who creates these definitions and how do we measure them?

Service Level Indicators (SLIs), Agreements (SLAs), and Objectives (SLOs) are central to an operations mindset and are foundational tools for effective Site Reliability Engineering. This talk takes you on a journey through Springfield as we discuss exactly what SLIs, SLAs, and SLOs are, how to measure them, which targets should be measured, how to define uptime, availability, and acceptable error rates, and what happens when an SLO or SLA is breached.
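To make these terms a little more concrete, here is a minimal Python sketch of how an availability SLI, an SLO target, and an error budget relate to one another. It is not taken from the talk: the function names, the request counts, and the 99.9% target over a 30-day window are all assumptions chosen for illustration.

```python
def availability_sli(good_requests: int, total_requests: int) -> float:
    """SLI: fraction of requests that were served successfully."""
    if total_requests == 0:
        return 1.0  # assumption: no traffic counts as fully available
    return good_requests / total_requests


def error_budget_remaining(sli: float, slo_target: float) -> float:
    """Share of the error budget still unspent.

    1.0 means the budget is untouched, 0.0 means it is exhausted,
    and a negative value means the SLO has been breached.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    allowed_failure = 1.0 - slo_target  # the error budget itself
    actual_failure = 1.0 - sli          # how much of it was spent
    return 1.0 - actual_failure / allowed_failure


def allowed_downtime_minutes(slo_target: float, window_days: int = 30) -> float:
    """Total downtime the SLO tolerates over the window, in minutes."""
    return (1.0 - slo_target) * window_days * 24 * 60


# Hypothetical numbers: 1,000,000 requests, 500 of which failed.
sli = availability_sli(good_requests=999_500, total_requests=1_000_000)
remaining = error_budget_remaining(sli, slo_target=0.999)
downtime = allowed_downtime_minutes(slo_target=0.999)
```

With these assumed numbers, the service has spent about half of its error budget, and a 99.9% target over 30 days tolerates roughly 43.2 minutes of downtime. An SLA would typically be a looser, contractual promise layered on top of an internal SLO like this one.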

This talk was given at:

* [Open Source Summit 2020](https://www.youtube.com/watch?v=pkiEs3ax5uQ)
* [Chicago Python User Group Meetup](https://www.youtube.com/watch?v=mXKOv0wlrg8)

Presentation Slides