"Intuitive Interactions"

John McCulloch - Alarm Management

The Alarm System

[Figure: the role of alarms in a disturbance]
The function of an alarm is to warn the operator of a problem that requires his attention. Disturbances happen to a plant all the time. They range from the trivial, such as a slight change in feed quality or cooling water temperature that the control system can easily cope with, to the extreme, such as a major equipment failure that the control system is not designed to cope with. Ideally, the alarms are set so that they distinguish the events that the control system can deal with from those that require operator intervention.

Poor Alarm Management has been a key factor in a number of major incidents. The best known of these are:
Texaco, Milford Haven, Wales
A large oil refinery explosion and fire, 24 July 1994. The UK Health and Safety Executive report into the incident criticised the excessive number of alarms triggered by an emergency as contributing to the incident. A single alarm indicating a developing problem was missed.
Three Mile Island, USA
Nuclear power plant partial meltdown, 28 March 1979. During the first minute of the upset there were over 400 new alarms triggered in the control room. The operators failed to identify the underlying and most significant problem: loss of reactor coolant.
Channel Tunnel fire, UK/France
On 18 November 1996, a train carrying a burning truck entered the tunnel and the fire later developed into a major blaze. A history of train delays caused by nuisance false alarms had resulted in the introduction of a slow and bureaucratic response procedure, which delayed the response to a genuine alarm.
These examples are only the tip of the iceberg; every major operating site has examples of incidents that have escalated through poor Alarm Management.

So, what is the problem? How does this lead to the escalation of an incident? What makes an alarm system good or bad? And if your alarm system is not as good as it should be, what can be done about it?

What is the problem?

When an alarm occurs, the operator has to perform the following tasks:
  • Identify the alarm: find out which of sometimes several thousand sensors is registering a problem.
  • Identify the cause of an alarm: an alarm can always have several possible causes - an instrument fault or a genuine process problem, perhaps several different process problems.
  • Check the state of the rest of the plant: what is happening elsewhere can affect his decision on what action to take.
  • Decide the action to take: he may have to look up books of operating procedures if it is an unfamiliar event.
  • Perform the corrective actions required: this may involve a complex sequence of steps.
It should be clear that these steps could take several minutes to perform for just one alarm. In a real upset on a process, many alarms can arrive at once. Operators become surprisingly good at sorting out the important alarms from the huge quantity that can be presented to them, but even the best operator can become overloaded if the quantity becomes too great.
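To make the arithmetic of overload concrete, here is a minimal sketch in Python. The handling time of three minutes per alarm and the arrival rates are illustrative assumptions, not measured figures, and `unhandled_backlog` is a hypothetical helper name.

```python
def unhandled_backlog(duration_min, alarms_per_min, handling_min_per_alarm):
    """Estimate alarms still waiting after `duration_min` minutes.

    Assumes one operator who works through alarms one at a time,
    each taking `handling_min_per_alarm` minutes (an assumed figure).
    """
    arrived = alarms_per_min * duration_min
    handled = duration_min / handling_min_per_alarm
    # The operator cannot handle more alarms than have arrived.
    return max(0.0, arrived - handled)


# Normal operation: one alarm every 10 minutes, 3 minutes to handle each.
print(unhandled_backlog(60, 0.1, 3.0))   # no backlog builds up

# Upset: 5 alarms per minute for 10 minutes, same handling time.
print(unhandled_backlog(10, 5.0, 3.0))   # dozens of alarms left waiting
```

Under these assumed figures the operator keeps up comfortably in normal running, but ten minutes of upset leaves more than forty alarms that have not even been looked at.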

Psychology tells us what happens when an operator becomes overloaded in this situation:
  • Over-simplification: The operator does not have time to appreciate the full complexity of the situation and so makes an over-simplistic assessment of what is happening.
  • Problem Lock-in: The operator becomes fixed on the problems that he knows about and fails to take in evidence of a new problem. Secondary problems are thus overlooked.
  • Problem Escalation: the overlooked problems escalate because nothing is done to correct them.
  • Physical Event: this continues until some physical event takes place, perhaps operation of a safety system, occasionally equipment damage or more rarely a fire or explosion.
  • Panic: when the operator realises that the situation is out of his control, he panics because he has to completely reassess the situation and does not have enough time to do it. In this state he will take wildly inappropriate actions.
All too often, the operator is blamed for his late and inappropriate actions. In reality the fault lies in the poor design of the system that presents information to him. The problem is called cognitive overload: he has too much information to deal with in too little time. Even when no incident is in progress, a poor alarm system is a constant source of stress, as the operator knows that at any time he may be presented with a situation that is impossible to handle correctly and then be criticised for handling it wrongly.

Successful Alarm Management

I have developed a set of 7 key points that, if appropriately employed, will provide an alarm system in which cognitive overload cannot occur:
  1. Define Standards. Appropriate standards should be in place for Alarm Management. These should be properly understood by site management and so it is important that the right culture is in place for acceptance of the standards.
  2. Measure Performance. There are a number of both static and dynamic measurements that can be made to indicate the performance of the alarm system. For a new plant, comparable measurements on existing similar plants can be used.
  3. Define Improvements Required. By comparing the performance being achieved with the requirements in the standards, the degree of improvement required can be assessed. This step defines which of the remaining steps to pursue.
  4. Alarm Rationalisation. This is the process of identifying the alarms that are actually needed on the plant, their proper settings and the corrective action to be taken when each operates. Results are recorded in an "Alarm Response Manual", which should explain why each alarm is there as well as giving the configuration detail. Alarm rationalisation can considerably reduce the number of nuisance alarms in normal operation but, on its own, has little impact on alarm floods and cognitive overload. It is, however, an essential first step for the remainder.
  5. Alarm Logical Grouping. Where a number of alarms have the same operator response, they can be replaced by a single group alarm. Individual detail is provided for diagnostic or maintenance purposes in detailed schematic displays.
  6. Dynamic Alarm Modification. In different operating modes of a plant or unit, different alarms may be required. For example, when a compressor has tripped, the pre-alarms warning of states that might cause a trip are no longer required. Dynamic Alarm Modification adjusts the settings of alarms according to the mode of operation of the plant or unit.
  7. Alarm Diagnosis System. Systems are being developed that perform process problem diagnosis by comparing the information from the plant sensors with a model of the plant using some sort of expert system. Whilst the technology is not at the point where such systems can be routinely used, there is much valuable development work going on. It is useful to consider the requirements of such a system. It must be based on a dynamic model, as the process is an inherently dynamic environment and, especially in an upset, is rarely at a steady state. Self-tuning dynamic models are available that will adapt to changing plant conditions. Where a discrepancy between model and real plant behaviour is observed, the system should be able to short-list possible causes and test these singly and in combination against the model to establish the most probable cause or causes of the observations. Such systems have the potential to give very early warning of process problems.
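As an illustration of points 2 and 3, the following Python sketch shows one possible dynamic measurement: the peak number of alarms in any 10-minute window of the alarm journal, compared with a limit. The function names, the 10-minute window, the limit of 5 and the journal data are all assumptions made for the example; the actual limit should come from the site standard.

```python
from collections import deque


def peak_alarm_rate(timestamps_s, window_s=600):
    """Return the largest number of alarms seen in any sliding window.

    `timestamps_s` is a sorted list of alarm times in seconds.
    `window_s` defaults to 10 minutes, an assumed reporting interval.
    """
    window = deque()
    peak = 0
    for t in timestamps_s:
        window.append(t)
        # Drop alarms that fell out of the window ending at time t.
        while window[0] <= t - window_s:
            window.popleft()
        peak = max(peak, len(window))
    return peak


def meets_target(timestamps_s, max_per_window, window_s=600):
    """Compare the measured peak rate with a limit from the site standard."""
    return peak_alarm_rate(timestamps_s, window_s) <= max_per_window


# Hypothetical journal: quiet running, then a burst during an upset.
journal = [0, 900, 1800, 3600, 3605, 3610, 3620, 3640, 3700, 3900]
print(peak_alarm_rate(journal))                  # 7
print(meets_target(journal, max_per_window=5))   # False
```

The gap between the measured peak and the limit gives a rough indication of how much improvement is required.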
It should be emphasised that the Operator Interface Design is also of high importance in achieving effective alarm response. For this reason, I also provide consultancy on Operator Interface Design and on the Ergonomics of the Man-Machine Interface (see separate page on this site).
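Points 5 and 6 of the list above can be sketched in a few lines of Python. This is a simplified illustration, not a description of any particular control system: the tag names, the mode name and the function `alarms_to_annunciate` are hypothetical, and dynamic modification is reduced here to suppressing alarms rather than adjusting their settings.

```python
# Hypothetical tags and modes, for illustration only.
SUPPRESS_IN_MODE = {
    # Pre-alarms that warn of an impending trip are not needed after the trip.
    "compressor_tripped": {"LUBE_OIL_PRESSURE_LOW_PRE", "VIBRATION_HIGH_PRE"},
}

GROUPS = {
    # Alarms that share one operator response are shown as one group alarm.
    "SEAL_SYSTEM_FAULT": {"SEAL_GAS_DP_LOW", "SEAL_GAS_FILTER_DP_HIGH"},
}


def alarms_to_annunciate(active_alarms, mode):
    """Return the set of alarms to show the operator in the given mode."""
    # Dynamic modification: drop alarms that are not required in this mode.
    shown = set(active_alarms) - SUPPRESS_IN_MODE.get(mode, set())
    # Logical grouping: replace active group members by the group alarm.
    for group, members in GROUPS.items():
        if shown & members:
            shown = (shown - members) | {group}
    return shown


active = {"VIBRATION_HIGH_PRE", "SEAL_GAS_DP_LOW", "SEAL_GAS_FILTER_DP_HIGH"}
print(sorted(alarms_to_annunciate(active, "running")))
print(sorted(alarms_to_annunciate(active, "compressor_tripped")))
```

With these assumed tables, three active alarms become two in normal running and one after the compressor has tripped. The individual members remain available for diagnosis on the detailed displays.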

New Plant Projects

If you have a new plant project and you want it to have an effective alarm system, it is always better to start as you mean to proceed rather than trying to correct a bad situation later. I have experience of setting up project procedures and of training project staff to ensure that the plant, when delivered, has a satisfactory and effective alarm system.

This page last updated on 2013-08-06 JGM