What's difficult about problem detection?

Featuring Joanna Mazgaj, Laura Nolan, Kurt Andersen, and Matt Davis

Series Summary

Buckle your seatbelts! We’re venturing into the SRE community to chat about reliability engineering. Join us as we dig into day-to-day SRE challenges and how they intersect with other parts of your business. We'll cover topics like “What’s difficult about on-call?” and “How do I make sense of reliability data?” This is an honest conversation with engineers about how we apply SRE principles. Each episode features guests from the community. Follow us on Twitter or LinkedIn to know when a new episode drops!

Episode 4: What's difficult about problem detection?

In this episode, Joanna Mazgaj, Director, Production Support, and Laura Nolan, SRE at Flatiron, join Matt Davis and Kurt Andersen from the Blameless team to detect the problems of problem detection! Knowing what's going wrong isn't always easy. Learn how to get ahead by building collective intelligence, stopping things from slipping, and more!



Matt Davis
Staff Infrastructure Engineer, Blameless

Matt’s background covers distributed databases, IT security, site reliability, observability, and techops leadership. He also has a passion for exploring the relationships between the artistic mind and operating distributed software architectures.


Kurt Andersen
SRE Architect, Blameless

Kurt is a practitioner and an active thought leader in the SRE community. Kurt was a Sr. Staff SRE at LinkedIn. He’s a member of the USENIX Board of Directors and on the steering committee for SREcon.

Joanna Mazgaj
Director, Production Support, Tala

I manage the production engineering organization at Tala, which is part of our CloudOps/DevOps group. My teams build internal tools and platforms for customer and product management, and we own the production incident escalation process, all the way from simple change request escalations to P0/P1 incidents.

In my spare time I'm probably on a hike, playing a computer game, or cooking. I collect cookbooks; I have about 85 right now.

Laura Nolan
Principal Software Engineer, Stanza Systems

Laura Nolan is a software engineer and SRE. She has contributed to several books on SRE, such as the Site Reliability Engineering book, Seeking SRE, and 97 Things Every SRE Should Know. Laura is a Principal (and principled) Engineer at Stanza Systems, where she is building software to help humans understand and control their production systems. Laura is a member of the USENIX board of directors and a long-time SREcon volunteer. She lives in rural Ireland in a small village full of medieval ruins.

Register to Watch Episode 4

fttp e4 thumbnail