Culture Around Incident Management


Over the years, incident handling has shifted from reacting after an incident has shut systems down to taking a proactive approach and cushioning the hit before it happens. This cultural shift has changed how we handle incidents for the better, bringing stronger communication and transparency between you and your stakeholders. Automated monitoring and Atlassian tools such as Statuspage and Jira Service Management are just a few of the ways teams can improve communication and incident response for themselves and their customers.

Click the video above to hear Edmond and Michael give further insight on the cultural shift around Incident Management, or skim the transcription below.

Video Transcription

Edmond: 00:00

Hey. Good morning, Michael.

Michael: 00:01

Hey, Edmond. How are you?

Edmond: 00:03

Hey, man. I'm doing pretty good. I was just reminiscing about our trip last year. It's been a year since we were in Atlanta.

Michael: 00:10

Oh, that was such a great show. That was the last concert I went to before COVID.

Edmond: 00:15

I know. Going to see Tool in Atlanta was just, oh, it was phenomenal. Picture right here, actually, is from that show.

Edmond: 00:25

I just can't believe it's been a year. But anyway, like I said, we're going to sit here and riff about some metal tracks or we can get into some topics around incident management, which is what I think we're going to talk about today. Is that right?

Michael: 00:39

You got it.

Edmond: 00:40

Oh, very cool. So give me a quick 30-second, one-minute primer on the history of incident management. Give me some background, how things have progressed in that world.

Michael: 00:51

Yeah, I think 10 or 15 years ago, the whole idea was catch it when it's down. You'd get all these systems in place to monitor services and items, and you would respond when something went down. And then fast-forward a couple of years, and you would see much more of a preventative mindset: application performance monitoring or real-time log aggregators would try to catch warnings well before a service would ever actually go down. And then you fast-forward to now, and there's kind of a different cultural shift, which is that things do go down and things do happen. What do we do when that occurs? That's the focus, I think, of the cultural shift around incident management today.

Edmond: 01:41

So historically it was something went down and there was probably a manual inspection. And then they got to automated monitoring, things like that and telling you this, but that had to evolve even further. So to bring you sort of into today, what happens today with incident management when an incident occurs and what's the culture behind that?

Michael: 02:02

The whole idea is a different level of transparency than we used to see, giving your stakeholders, whether that's your customers, or your team, or whoever, giving them a way to say, "If there's a problem and I know about it, this is where you would go and see it." So I like to use the... My internet provider as an example. If I had a problem with my internet, I used to pick up the phone and I would call them. Well, if that's your IT team or any other team managing an incident, that could be distracting, just to ask if it's a problem.

So now the whole idea of proactively publishing if there's a problem and you know about it, we do that today with Atlassian's Statuspage. That's a great way to proactively indicate and inform your customers all the way down to the workflows behind it, so really making sure that we realize this does happen. Incidents do occur. How do we learn from it? How do we make sure that we manage them consistently every time they happen so that we can learn from them, make sure they don't happen again, inform the right people at the right time and automate as much as possible?

But they do happen, so it's all about the culture of progressing that idea from fix it when it's down to, "Okay. Now we realize things do happen. Let's learn from them. Let's make sure we learn from them and let's show how we learn from them."

Edmond: 03:31

Yeah, that makes a lot of sense. So, I mean, I can think back to many years ago, not just the monitoring stuff, but it wasn't always best practice to share when your systems are down or there's a bug or a defect and be that transparent. So that's a big culture shift, and it sounds like it's going in the right direction because being completely transparent and open with customers, whether it's outside customers or internal stakeholders, them knowing this is going to inform them of what's happening.

Michael: 04:06

Mm-hmm. And trust me, people realize it. Just like the internet example of your local internet provider, the customers realize when there's a problem. And for me as a consumer, it puts my mind at ease if I go on and I see that they're aware of a problem, and that they're actively working to fix it. That's much more modern than picking up the phone and calling them and saying, "Is there a problem?" "Yes, I restarted my computer three times," type of thing.

Michael: 04:38

It's an entire shift in how you communicate, what you communicate, and to your point, a sense of transparency.

Edmond: 04:47

Well, that's great. So if I'm out in the world right now and I want to start this process, or maybe I have some of this in place, what can I do? How do I get involved to begin to make this shift?

Michael: 04:59

Yeah, I mentioned Statuspage is one of the communication channels within the Atlassian ecosystem. We have JSM for the whole service management tool, so we can correspond incidents and have workflows. We can correspond it to assets within your environment. And then the whole Opsgenie piece is a full-fledged incident management tool, so workflows around those incidents, making sure we're capturing and following kind of a playbook, and that's what we do. So at E7 we coach people how to implement these tools, how to learn more about this culture and how to create best practices and more of a modern culture around incident management.

Edmond: 05:42

Well, that makes a ton of sense. So it sounds like there's a ton here we could talk about. Maybe we need to bus down and have a full-fledged webinar around this topic, but in the meantime, if folks have questions or they're ready to get started, I'm sure they could reach out to us and connect.

Michael: 06:00

You got it.

Edmond: 06:03

Awesome. Well, thanks, Michael. I appreciate your time.

Michael: 06:05

No problem. Next time, you bring the hot sauce. I'll bring the music.

Edmond: 06:09

All right. Sounds good.

Michael: 06:10

All right.

Edmond: 06:10

We'll see ya.

On-Demand Webinar: An ITSM Discussion on Incident Management