There seems to be some confusion about these three words, both as ITIL concepts and, in some cases, in everyday life. This post is about how they apply when things go south in Technology. Event, Incident and Problem are often used in place of one another when talking about something that breaks. It can get confusing, so I thought I would take a moment to put my spin on how these three words break down.
Event - The Big Bang
An event is just that, an event. O.k., that didn't work very well, let me try that again. Think of an event as a point in time when something occurs. Power fails, a car crashes, a tornado touches down or an earthquake strikes. Each one of these is an example of an Event, that moment when something occurs. It can be confusing because in Technology something can crash or fail and be classified as an event when, often, it is really a symptom of the event that occurred. Take a Blue Screen for example. You might think that the Blue Screen of Death is the machine just giving up and quitting. The reality is that in most cases the Blue Screen is a symptom of the underlying event. That could be a rogue application, bad memory or something else that made the Operating System crash. So determining the Event in Technology can at times be somewhat subjective.
Event Management is the practice of putting a process in place that can detect an event as soon as it occurs or, better yet, spot a potential event before it happens. In most cases that is done through monitoring tools or automation. The goal is to get on top of these things well before they cause impact or loss of service to the customer. Take the Blue Screen example above. If the cause was an application taking up too much memory due to bad code, monitoring could detect that ahead of time and trigger an alarm so that someone responds to the memory issue before it results in the operating system Blue Screen. If that fails, you could still have monitoring on the machine itself that detects when the box becomes unresponsive and triggers an alert. So an Event, and Event Management, is all about living life for the moment something could or does go wrong.
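To make the Blue Screen example a little more concrete, here is a minimal sketch of the two checks described above: one that warns about memory before the crash, and one that notices the box has gone quiet after it. The thresholds, the `Alert` type and the function names are all made up for illustration; a real monitoring tool would have its own way of doing this.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical thresholds, picked only for illustration.
WARN_PERCENT = 80.0
CRITICAL_PERCENT = 95.0


@dataclass
class Alert:
    host: str
    severity: str
    message: str


def check_memory(host: str, used_percent: float) -> Optional[Alert]:
    """Return an Alert when memory use crosses a threshold, else None."""
    if used_percent >= CRITICAL_PERCENT:
        return Alert(host, "critical", f"memory at {used_percent:.0f}%")
    if used_percent >= WARN_PERCENT:
        return Alert(host, "warning", f"memory at {used_percent:.0f}%")
    return None


def check_heartbeat(host: str, seconds_since_last: float,
                    timeout: float = 60.0) -> Optional[Alert]:
    """Fallback check: flag a box that has stopped responding."""
    if seconds_since_last > timeout:
        return Alert(host, "critical", "host unresponsive")
    return None
```

The first check is the one you hope fires; the second is the safety net for when the event has already happened.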
Incident - Whatever it takes
So, an Event occurs and things break. This is where you have to start thinking about life as the user. Someone somewhere was doing something that is now impacted and not working anymore. Everything that person was depending on is referred to as the Service being provided. It has been my experience that the user or consumer of that service does not give two shakes about what makes things go, only that they cannot go any more.
For the most part, this applies to almost everything in our lives. If my TV channels go out, do I really care what it took Verizon to get them to me? Do I care about their cables, switches, fiber, blah blah...? NO, I just want my damn TV back on! So what do you do? You call their support desk just to ask one simple question: "When will it be back on?" That is all you care about as the customer. Not what broke, who is fixing it or how it broke, but "How long will it take to fix it?" That is the core point of an Incident: everything that happens between the moment what you as the customer want is not available and the moment it is actually back up and running.
That whole process falls under the world of Incident Management: the ability to manage all of the moving parts needed to get service restored to the customer. It is the very human element of life in Technology, the one responsible for command and control, providing updates to management and customers while coordinating all of the groups working on the outage. So an Incident could be summed up as the actual lifecycle of a failure. Whatever it takes, just get the service back up and we will figure out the rest later.
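If it helps to picture that lifecycle, here is a rough sketch of an incident as a record that opens when service is lost, collects updates along the way, and closes when service is restored. The `Incident` class and its field names are my own invention for illustration, not something taken from ITIL or any ticketing tool.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Incident:
    service: str
    opened_at: datetime
    restored_at: Optional[datetime] = None
    updates: List[str] = field(default_factory=list)

    def add_update(self, note: str) -> None:
        """Record a status update for management and customers."""
        self.updates.append(note)

    def restore(self, when: datetime) -> None:
        """Mark the moment service came back."""
        self.restored_at = when

    def time_to_restore(self) -> Optional[timedelta]:
        """The customer's one question: how long was it down?"""
        if self.restored_at is None:
            return None  # still down
        return self.restored_at - self.opened_at
```

Notice there is nothing in there about root cause. That is on purpose: the incident only cares about getting from "down" to "up".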
Problem - Workarounds are not forever
Where it could be said that an Incident is the reactive response to an event, Problems are generally proactive: identifying issues that are known to exist and have the potential to go wrong, but are managed through known workarounds.
An easy analogy for a Problem would be following a recipe that calls for sugar, knowing you don't have any, and using honey instead. The dish may come out o.k., but that is not the way it is supposed to be done. That is your workaround. The dish still comes out, and the customers may be satisfied, but it is not as good as it would be with sugar. Problem Management is about realizing that you have made several dishes using honey instead of sugar and making plans to go to the store to buy some sugar. When you buy it, the problem goes away.
For the most part, Incidents feed Problems. If the goal of Incident Management is to restore service as fast as possible and by any means necessary, including workarounds, then Problem Management is responsible for reviewing those workarounds and replacing them with permanent solutions so that the incident never occurs again.
The proactive nature of Problem Management does not exist solely for Incidents; it can also apply to Event Management. Think of all of the alarms or alerts that may be occurring in a single day. Problem Management could also collect and analyze that data looking for trends where there is a risk of failure. A single alarm may seem innocent enough; ten or twenty in a week may point to a larger 'problem'. HA! That is why it is Problem Management.
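That kind of trend spotting can be sketched in a few lines. The example below counts alarms per source over the last week and flags anything that crosses a threshold. The function name, the shape of the alarm data (timestamp and source pairs) and the default of ten alarms are assumptions made for illustration only.

```python
from collections import Counter
from datetime import datetime, timedelta


def recurring_alarms(alarms, now, window=timedelta(days=7), threshold=10):
    """Return {source: count} for sources with at least `threshold`
    alarms inside the window ending at `now`.

    `alarms` is an iterable of (timestamp, source) pairs.
    """
    cutoff = now - window
    counts = Counter(source for ts, source in alarms if cutoff <= ts <= now)
    return {source: n for source, n in counts.items() if n >= threshold}


now = datetime(2024, 1, 31, 12, 0)
alarms = (
    [(now - timedelta(hours=6 * i), "db01") for i in range(12)]  # noisy this week
    + [(now - timedelta(days=1), "web01")]                       # a one-off
    + [(now - timedelta(days=30), "old01")] * 15                 # outside the window
)
problem_candidates = recurring_alarms(alarms, now)
```

Here only `db01` ends up in `problem_candidates`: the one-off is ignored, and so is the old noise that falls outside the week.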
So there you have it, my interpretation of Event, Incident and Problem and why each needs to be understood and applied as its own individual process. The beauty is that, while this is how they are applied in IT, the same ideas can be applied in many different areas of life.