Atlassian describes an incident as “an event that causes disruption to or a reduction in the quality of a service which requires an emergency response.” An incident can range from minor intermittent errors to a major or global web crash. Having a clear incident management strategy is a high priority, because incidents can be costly or damaging to the business. In fact, citing industry surveys by Gartner, Atlassian notes that network downtime can cost more than $300,000 per hour. Beyond the cost of network outages, other incidents also cost teams time while they wait for a resolution.
An incident counts as resolved when the affected tasks and services return to normal operation. Once the incident is resolved, the incident postmortem takes place to identify the root cause and to plan actions that prevent it from recurring. Organizations can adopt any of several incident management processes, but each one has the same focus and significance: supporting companies when incidents occur.
At E7 Solutions, we take Incident Management seriously, making sure to be transparent and communicative not only within the company but also with our customers. Before we dive too deep into Incident Management, let’s start with a basic process for managing incidents.
- Guide and build: Foster autonomous decision-making and a consistent culture among teams in identifying, managing and learning from incidents. There will not always be a clear answer, but guiding and building together can move the process along more effectively.
- Align teams: Develop an understanding of which attitude is appropriate for each aspect of incident identification, resolution and reflection.
- Detect: Continuously monitor for and attend to issues before customers discover them, as many issues can be resolved before they become incidents.
- Respond: “Don’t hesitate to escalate.” It is better to raise awareness of a potential incident, even one that does not affect everyone, than to stay silent.
- Recover: Services will go down from time to time for reasons outside anyone’s control; this is understood as long as the incident is resolved as quickly and efficiently as possible.
- Learn: Following from the value above, mistakes and accidents will occur, but proper accountability and the knowledge gained from those situations can improve service delivery.
- Improve: Break down the incident, from the exact root cause to the strategic actions needed to prevent or reduce the chance of it occurring again. Set dates for those actions.
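For teams that track incidents in code, the detect-to-improve steps above can be modeled as an ordered set of stages. The following is only a minimal sketch in Python, not a description of any Atlassian tool or of our own tooling; the names `Stage`, `ActionItem`, `Incident`, `advance` and `is_closed` are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import List, Optional


class Stage(Enum):
    """Hypothetical stages mirroring the Detect-to-Improve steps in the list."""
    DETECT = 1
    RESPOND = 2
    RECOVER = 3
    LEARN = 4
    IMPROVE = 5


@dataclass
class ActionItem:
    """A follow-up action from the postmortem, with a due date."""
    description: str
    due: date


@dataclass
class Incident:
    summary: str
    stage: Stage = Stage.DETECT
    root_cause: Optional[str] = None
    actions: List[ActionItem] = field(default_factory=list)

    def advance(self) -> Stage:
        """Move to the next stage; stay at IMPROVE once it is reached."""
        if self.stage is not Stage.IMPROVE:
            self.stage = Stage(self.stage.value + 1)
        return self.stage

    def is_closed(self) -> bool:
        """Assumption: an incident is closed only once it has reached IMPROVE,
        has a recorded root cause, and has at least one dated action."""
        return (
            self.stage is Stage.IMPROVE
            and self.root_cause is not None
            and len(self.actions) > 0
        )
```

In this sketch, an incident that has recovered but has no recorded root cause or dated follow-up actions is deliberately not treated as closed, which reflects the idea that the postmortem is part of the process rather than an optional extra.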
In a perfect world, everything would run smoothly, and there would be no incidents. However, the real world has shown time and again that problems will happen now and then. With that said, it is important to identify the cause of incidents, develop a consistent resolution procedure among dedicated teams, plan actions that can reduce or prevent recurrence, and learn from each incident to handle future ones better.
We’re just getting started on the various topics associated with Incident Management, such as Atlassian tools dedicated to incident management processes, incident response, best practices for incident communication, and more. Stay tuned for future posts!