Abstract
There have been several well-publicized incidents in refinery and petrochemical facilities over the past few years. Incident reports from, amongst others, the Chemical Safety Board, together with papers from ARC and Marsh McLennan Insurance Company, show that many of these incidents can be attributed to operational errors. A closer look at what happens during a crisis, such as mistakes made through confusion and the time taken to understand the value of the information being presented, raises the question: “Are Machines Better than Humans in a Crisis?”
In fact, could technology provide better safety and business benefits? On November 4th, 2010, Qantas flight 32, an Airbus A380 flying from London Heathrow to Sydney, took off from Singapore. At the time, the A380 was the world’s largest and most technically advanced airliner. Not long after take-off, over Indonesia, one of its engines exploded. The incident report said that had there not been five experienced pilots on board that day, the plane could have crashed. Thankfully, they saved the aircraft, which landed back in Singapore. The conclusion was that the balance of advanced technology and human thinking saved the day. In essence, in a stressful situation both humans and machines have a role to play. This paper discusses several plant incidents and examines how technology could aid the operator in a crisis and help him or her focus on the problem.
Challenges Facing Humans in Process Operations
According to a 2012 report by the Energy Practice of Marsh Ltd, a division of Marsh McLennan, the 5-year loss rate (adjusted for inflation) in the refining industry over the period 1972-2011 continued to rise, and incidents during start-ups and shutdowns remained a significant factor.
The losses indicated are occurring at a time when control systems and instrumentation on process plants have improved substantially. So why are they happening?
Modern control systems have improved the way processes are controlled and made them much safer. Modern control rooms provide safer havens for operators to perform their work. One operator now controls substantially more equipment than was previously possible and has access to as much or as little information as desired. But this is where the configurability of modern control systems can cause problems if it is not applied within guidelines. It is now possible to use as many colors as you want and to alarm in multiple ways. As a result, systems are often configured with displays or alarms that are inactive under normal conditions but can become active unnecessarily during abnormal operation, confusing the operator and causing errors.
As the number of operators in a typical process has been reduced over the years, it has become important for them to get a clear picture of what is happening in a crisis so that it does not escalate into an incident. What often happens, though, is that they get too much information and become confused. Humans are not designed to cope with masses of information, especially under stress. For example, the report into the 1994 explosion at the Texaco Milford Haven refinery said: “In the last 11 minutes before the explosion the two operators had to recognize, acknowledge and act on 275 alarms”.
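To put the Milford Haven figure in perspective, the short calculation below turns 275 alarms in 11 minutes into an average rate and compares it with a flood benchmark. The benchmark of more than 10 alarms in 10 minutes for one operator is an assumption here, taken from the figure commonly cited from ANSI/ISA-18.2 and EEMUA 191; the even split between the two operators and the function names are also assumptions made for illustration.

```python
def alarms_per_minute(alarm_count: float, minutes: float) -> float:
    """Average alarm rate over an interval."""
    return alarm_count / minutes


def seconds_per_alarm(alarm_count: float, minutes: float) -> float:
    """Average time available to deal with each alarm."""
    return minutes * 60 / alarm_count


# Assumed flood benchmark: more than 10 alarms in 10 minutes for one operator.
FLOOD_ALARMS_PER_MINUTE = 10 / 10


def is_flood(alarm_count: float, minutes: float) -> bool:
    """True if the average rate exceeds the assumed flood benchmark."""
    return alarms_per_minute(alarm_count, minutes) > FLOOD_ALARMS_PER_MINUTE


# Milford Haven, 1994: 275 alarms in the last 11 minutes, two operators.
overall = alarms_per_minute(275, 11)
per_operator = alarms_per_minute(275 / 2, 11)  # assumes an even split
gap = seconds_per_alarm(275, 11)
print(f"{overall:.1f} alarms/min overall, {per_operator:.1f} per operator")
print(f"one alarm every {gap:.1f} s on average; flood: {is_flood(275 / 2, 11)}")
```

This reports 25.0 alarms per minute overall, or one new alarm every 2.4 seconds. Even shared between two operators, 12.5 alarms per minute each is more than twelve times the assumed benchmark, before any time is spent actually acting on them.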
Start-ups and shutdowns of process units are ‘normal’ operations, but, along with grade changes and other transitions, they put operators under increased stress. As the Marsh report above indicates, they are also times when things can, and do, go wrong. This is compounded when abnormal conditions exist, which often lead to inappropriate decisions at the wrong time and potentially to, at best, a production incident and, at worst, injury or fatality. With thought and planning, the same systems that can be configured in ways that confuse the operator can instead be used to help the operator. I will now explore how operators and machines have different strengths in crisis situations.
BP Texas City – A Case Study
On March 23rd, 2005, there was an explosion in the isomerization unit of the BP Texas City Refinery in Texas City, TX, which at the time was BP’s largest refinery. The explosion killed 15 people and injured 170.
The incident report identified several issues that operators failed to act on: a level alarm acknowledged in error, a heat-up ramp rate that was too fast, and the attempt to start up with level control in manual when the procedure indicated that, during early start-up, splitter level was to be controlled only in automatic. Another issue was that the burners were lit before rundown had been established. Any of these could have been caused by stress or simply a lack of vigilance.
Amongst other things, the report cited problems with procedures as one of the causes of the incident:
- “…failure to follow many established policies and procedures. Supervisors assigned to the unit were not present to ensure conformance with established procedures, which had become custom and practice on what was viewed as a routine operation”
- “The team found many areas where procedures, policies, and expected behaviors were not met”
The report recommended changes to start-up and shutdown procedures but did not recommend additional training or procedure support from the control system. The incident could possibly have been avoided by the correct use of instrumentation, control, and procedures, as shown later in this paper.
Humans Do Count!
Having several very skilled “operators” probably saved Qantas flight 32 on November 4th, 2010. The flight, operated by an Airbus A380, then the largest and most technically advanced passenger aircraft in the world, had left Singapore for Sydney on its way from London when, over Indonesia, one of its engines blew apart.
The pilots were inundated with messages: 54 arrived to alert them to system failures or impending failures, but only 10 could fit on the screen, so they watched screen after screen of messages come in. Luckily, there were five experienced pilots on that flight, including three captains who were on “check” flights. Even with that much experience on board, it took 50 minutes for the pilots to work through the messages and prioritize them to establish the status of the plane. The incident report said that without those pilots the flight might not have made it. In fact, it was the ‘airmanship’ of the pilots that saved the plane: had they followed ALL the advice of the flight systems, the plane would have crashed. The most senior pilot told the others to read the messages but ‘feel’ the plane. They managed to land with only one of its four engines in full working order.
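The screen-capacity problem can be illustrated with a generic priority sort. This is not how the A380’s warning system works; the `Message` type, the three-level priority scale and the `display_page` function are hypothetical, chosen only to show how ranking decides what an operator sees first when messages outnumber screen slots.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    text: str
    priority: int      # hypothetical scale: 1 is most urgent
    timestamp: float   # seconds since the first message


def display_page(messages: list[Message], slots: int = 10) -> tuple[list[Message], int]:
    """Return the messages that fit on one screen, most urgent first (oldest first
    within a priority), plus the count that had to be left off."""
    ranked = sorted(messages, key=lambda m: (m.priority, m.timestamp))
    return ranked[:slots], max(0, len(ranked) - slots)


# 54 messages with priorities 1, 2, 3 repeating, standing in for a burst of failures.
burst = [Message(f"message {i}", priority=1 + i % 3, timestamp=float(i)) for i in range(54)]
shown, hidden = display_page(burst)
print(len(shown), hidden)  # 10 44
```

Sorting only decides which messages are seen first; in this example 44 still wait off-screen, which is why the number of messages matters as much as their order.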
Finding the Balance
From the Qantas incident, it’s clear that humans and machines both have a role to play in crisis situations. We may ask whether it would be possible to take the human out of the equation altogether. In theory, the answer is yes: it is possible to run a process plant without operators and to fly a plane without pilots, but would anyone fly on such an aircraft? The better question is whether we can use the processing power of machines to guide humans, and the deductive power of humans (given a manageable number of options) to make the correct decision.
Mary L. Cummings, Director of the Humans and Automation Laboratory (HAL) at MIT and a former Navy F-18 pilot who researches human-automated path planning optimization and decision support, has said: “Humans are doing a pretty good job, but they do it even better with the assistance of algorithms.” and “This research is really showing the power of how, when algorithms work with humans, the whole system performs better.” So, maybe there IS a balance.
Humans have emotions and get stressed, and there is no better example of this than a crisis. Some people are able to handle crises very calmly, as shown by historical heroic efforts in war and peace, but most tend to try to do everything at once, panic, or simply switch off. So when even the best operator is faced with many simultaneous alarms and other events happening around him, he will likely try to look at as many as he can and work out a scenario and a possible solution, but by then it may be too late. It would be much better if the system provided him with options and guidance: in other words, decision support.
Automated systems are able to do repetitive things over and over in the same way. They can handle and correlate large amounts of data in a short time. They don’t fall asleep or ignore procedures, they don’t panic under pressure, and they can respond quickly to changes in conditions. They can analyze many hundreds of inputs, potentially finding patterns and offering more refined suggestions to the operator. BUT they can fail, and they need “training”. Humans, on the other hand, are perceptive: they have senses, they can weigh pros and cons, and they can evaluate and act on advice when they are not under stress.
There is a balance.
Standards Helping Decision Support
Decisions are made by assessing the problem, collecting and verifying information, identifying alternatives, anticipating the consequences of possible decisions and then making a choice using sound and logical judgment based on the available information.
Few humans in a crisis are able to do this without help. Either they find it difficult to manage the situation well enough to gain time to gather the information needed for a sound decision, or they simply run out of time trying to make the decision. With decision support and guidance, this task becomes more manageable.
A decision support system should be able to:
- Use historical data for ‘memory’ of what has happened in the past
- Incorporate both data and models to analyze and present the most likely options
- Assist operators in semi-structured or unstructured decision-making processes
- Support, rather than replace, operator judgment
- Aim at improving the effectiveness rather than the efficiency of decisions
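As an illustration of the first four capabilities, the sketch below keeps a small ‘memory’ of past abnormal situations and ranks the actions taken in the most similar ones. It uses a plain nearest-neighbour comparison on scaled measurements, chosen only for simplicity, not as the method any particular product uses. The episodes, measurement spans, tag names and function names are all hypothetical, and the function only suggests options, leaving the choice to the operator.

```python
import math
from collections import Counter

# Hypothetical historical episodes: process snapshot -> action that resolved it.
HISTORY = [
    ({"level_pct": 95, "temp_c": 150, "press_bar": 2.5}, "Open bottoms rundown valve"),
    ({"level_pct": 92, "temp_c": 145, "press_bar": 2.4}, "Open bottoms rundown valve"),
    ({"level_pct": 60, "temp_c": 175, "press_bar": 3.1}, "Reduce furnace firing"),
    ({"level_pct": 20, "temp_c": 120, "press_bar": 1.8}, "Increase feed rate"),
]

# Assumed span of each measurement, used to put them on a comparable scale.
SPANS = {"level_pct": 100.0, "temp_c": 200.0, "press_bar": 5.0}


def distance(a: dict, b: dict) -> float:
    """Euclidean distance between two snapshots after scaling by SPANS."""
    return math.sqrt(sum(((a[k] - b[k]) / SPANS[k]) ** 2 for k in SPANS))


def suggest_actions(current: dict, k: int = 3) -> list[tuple[str, int]]:
    """Rank the actions taken in the k most similar past episodes.

    Returns (action, votes) pairs, most common first; it does not act.
    """
    nearest = sorted(HISTORY, key=lambda ep: distance(current, ep[0]))[:k]
    return Counter(action for _, action in nearest).most_common()


print(suggest_actions({"level_pct": 97, "temp_c": 155, "press_bar": 2.6}))
```

For a high-level snapshot, the two closest past episodes both point to opening the rundown valve, so that option is ranked first, with the less similar episode’s action offered as an alternative. The vote counts give the operator a sense of how strongly history supports each option.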
In the process industries, decision support of this nature is not yet widely available, but research is under way, and basic decision support can already be provided in key areas such as human-machine interface design, alarm management and procedure management. Industry standards supporting this are either available or in development.
The International Society for Automation (ISA), a globally recognized standards development organization, has published or is developing three relevant standards:
- ANSI/ISA-18.2-2009 – Management of Alarm Systems for the Process Industries
- ISA101 – Human Machine Interfaces
- ISA106 – Procedure Automation for Continuous Process Operations
ANSI/ISA-18.2, which has been a standard since 2009, provides requirements and recommendations for the activities of the alarm management lifecycle. The lifecycle stages include philosophy, identification, rationalization, detail design, implementation, operation, maintenance, monitoring & assessment, management of change, and audit.
ISA101 is still in development at the committee stage. It is aimed at those responsible for designing, implementing, using and/or managing human-machine interfaces in manufacturing applications.
The ISA106 committee has decided to produce a series of technical reports initially to address procedural automation for continuous process operations. The aim will be to provide good practices to address many of the human performance limitations that can occur during procedural operations.
The Role of Procedures
As can be seen from the incidents above, the effective use of procedures is one of the key factors in maintaining safe and reliable operations under ALL conditions. In fact, well-planned alarms, if configured correctly, could trigger procedures in many abnormal situations, and a well-designed human-machine interface could bring a developing incident to the operator’s attention in a timely manner.
The airline industry is amongst the safest and most automated in the world; most modern aircraft could not fly without computer guidance. Yet procedures play a big part in the way aircraft are operated, and pilots must work through many procedures before, during and after a flight.
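The idea of an alarm triggering a procedure can be sketched as a simple lookup from alarm tag to a linked response. The tags, procedure names and steps below are hypothetical, and a real system would draw them from the alarm rationalization records rather than a hard-coded table; the point is only that the operator receives a next step rather than a bare alarm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlarmResponse:
    """Hypothetical link between an alarm and the procedure that answers it."""
    procedure: str
    first_steps: tuple[str, ...]


# Hypothetical alarm tags, procedures and steps, for illustration only.
RESPONSES = {
    "LAH-101": AlarmResponse(
        "High column level",
        ("Confirm level against a second measurement",
         "Open bottoms rundown",
         "Reduce reboiler heat"),
    ),
    "TAH-205": AlarmResponse("High column temperature", ("Reduce furnace firing",)),
}


def on_alarm(tag: str) -> str:
    """Build the operator prompt for a newly active alarm."""
    response = RESPONSES.get(tag)
    if response is None:
        return f"{tag}: no linked procedure; use the alarm response manual"
    steps = "; ".join(response.first_steps)
    return f"{tag}: start procedure '{response.procedure}'. First steps: {steps}"


print(on_alarm("LAH-101"))
```

Alarms without a linked procedure fall back to a plain prompt, so gaps in the table show up explicitly instead of silently.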
The first recorded procedures (checklists) were introduced by test pilots after a 1935 crash of the prototype B-17 Flying Fortress, caused by a gust lock still being engaged at take-off, almost led to the program being abandoned. It was said that the plane was too complicated to fly. The test pilots developed procedures for takeoff, flight, before landing and after landing. Boeing then delivered 12 of the aircraft to the Air Corps, and they flew 1.8 million miles without a serious mishap.
Every type of plane, from small private aircraft to the largest jumbo jets, now uses procedures for all aspects of the journey, and not following them could lead to a pilot losing his license to fly (or worse).
In the same way, the start-up and shutdown of a process require standard operating procedures (SOPs), which are designed to ensure the process is started up or shut down the same way each time. However, these are sometimes ‘modified’ by experienced operators who see a better way of doing things. For both the pilot and the process operator, there should be a formal way to evaluate such improvements and turn them into current practice. In the case of an aircraft, the consequences of not doing this are obvious, but in a process plant, a tweak here and a tweak there may go unnoticed until things go wrong. As with the operation and maintenance of aircraft, the goal of operations and decision support is to capture the knowledge of the best, and hopefully calmest, operator on his or her best day under all conditions.
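One way to make an SOP run the same way each time is to give every step a permissive: a condition that must hold before the step may start. The sketch below is a hypothetical start-up sequence whose checks echo the issues reported at Texas City (level control in manual, burners lit before rundown); the step names, state keys and the 50% start-up level are assumptions, not values from any actual procedure.

```python
from dataclasses import dataclass
from typing import Any, Callable

State = dict[str, Any]
START_LEVEL_PCT = 50.0  # hypothetical start-up level target


@dataclass(frozen=True)
class Step:
    name: str
    permissive: Callable[[State], bool]  # must be true before the step may start
    reason: str                          # shown to the operator when it is not


# Hypothetical start-up sequence; the checks echo issues reported at Texas City.
STARTUP = [
    Step("Fill column to start-up level",
         lambda s: s["level_ctrl_mode"] == "AUTO",
         "level controller must be in automatic"),
    Step("Establish rundown",
         lambda s: s["level_pct"] >= START_LEVEL_PCT,
         "wait for start-up level"),
    Step("Light furnace burners",
         lambda s: s["rundown_flow"] > 0,
         "rundown must be established before firing"),
]


def next_action(steps: list[Step], completed: int, state: State) -> str:
    """Prompt for the next step, or explain why the sequence is on hold."""
    if completed >= len(steps):
        return "Procedure complete"
    step = steps[completed]
    if step.permissive(state):
        return f"Proceed: {step.name}"
    return f"Hold before '{step.name}': {step.reason}"


print(next_action(STARTUP, 0, {"level_ctrl_mode": "MANUAL", "level_pct": 0.0, "rundown_flow": 0.0}))
```

Here the sequence holds at the first step and tells the operator why. This sketch only checks permissives at the start of each step; a fuller design would also keep monitoring them while the step runs.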
BP Texas City Revisited
Now let’s revisit the Texas City Refinery incident. As a set of circumstances in which the system could have provided the operator with the correct information at the right time, and possibly even overridden incorrect actions, this was the ‘perfect storm’. A start-up procedure in the control system, with inputs from several key measurements, might have avoided this incident or at least made things more manageable for the operator. A procedure assistant could have helped unsure or overworked operators take corrective action, or could have taken that action for them.
A procedure assistant could have provided clear communication across all three units regarding:
- What had transpired during previous shifts
- Next steps according to approved safety procedures
- Safety hazards associated with missteps
Clearly, some mistakes were made, but as things started to go wrong, several warning signs were missed because of the inexperience of the board operator and the lack of a system to interpret the clear measurement information available at the time. Although the high-level alarm at the bottom of the splitter had been acknowledged and essentially ignored, temperatures and pressures in the column gave enough indication that liquid was high in the column, even above the feed tray. There was temperature information from the column profile and the feed tray, plus indications of overheating in the stripper bottom, showing that things were not right in the column. Pressure readings were also available, and the heat-up ramp rate was too high. But the operators probably could not have digested all this information. All of these signals could have fed a warning system that alerted the operator and even took action to shut the unit down.
A procedure assistant could have triggered actions or prompts in response to the excessive liquid level and abnormal temperature and pressure readings. It could also have prevented the ramp rate from becoming too high and, ultimately, could have initiated a shutdown.
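The checks described above can be sketched as a small monitor that compares each new reading with the previous one. The limits, the `Reading` fields and the escalation rule (one abnormal sign prompts the operator, two or more together initiate a shutdown) are all hypothetical assumptions for illustration; real limits would come from the unit’s design and SOP. A setpoint clamp shows how the assistant could stop the ramp rate from becoming too high in the first place.

```python
from dataclasses import dataclass

# Hypothetical limits for illustration; real values come from the unit's SOP and design.
MAX_RAMP_C_PER_HR = 50.0
HIGH_LEVEL_PCT = 90.0
HIGH_PRESSURE_BAR = 3.0


@dataclass(frozen=True)
class Reading:
    time_hr: float
    bottom_temp_c: float
    level_pct: float
    pressure_bar: float


def ramp_rate(prev: Reading, curr: Reading) -> float:
    """Heat-up rate in degrees C per hour between two readings."""
    return (curr.bottom_temp_c - prev.bottom_temp_c) / (curr.time_hr - prev.time_hr)


def assess(prev: Reading, curr: Reading) -> tuple[str, list[str]]:
    """Classify the latest reading as OK, PROMPT or SHUTDOWN, with reasons.

    Assumed rule: one abnormal sign prompts the operator; two or more
    together are treated as grounds to initiate a shutdown.
    """
    reasons = []
    rate = ramp_rate(prev, curr)
    if rate > MAX_RAMP_C_PER_HR:
        reasons.append(f"heat-up ramp {rate:.0f} C/h exceeds {MAX_RAMP_C_PER_HR:.0f} C/h")
    if curr.level_pct > HIGH_LEVEL_PCT:
        reasons.append(f"level {curr.level_pct:.0f}% exceeds {HIGH_LEVEL_PCT:.0f}%")
    if curr.pressure_bar > HIGH_PRESSURE_BAR:
        reasons.append(f"pressure {curr.pressure_bar:.1f} bar exceeds {HIGH_PRESSURE_BAR:.1f} bar")
    if len(reasons) >= 2:
        return "SHUTDOWN", reasons
    return ("PROMPT" if reasons else "OK"), reasons


def limit_setpoint(current_c: float, requested_c: float, dt_hr: float) -> float:
    """Clamp a requested temperature setpoint so heating stays within the ramp limit."""
    return min(requested_c, current_c + MAX_RAMP_C_PER_HR * dt_hr)


status, why = assess(Reading(0.0, 100.0, 70.0, 2.0), Reading(0.5, 140.0, 95.0, 2.2))
print(status, why)
print(limit_setpoint(140.0, 200.0, 0.5))  # 165.0
```

In the example, an 80 C/h ramp combined with a 95% level gives two independent signs, so the sketch escalates to a shutdown rather than leaving the operator to piece the readings together. Requiring more than one sign before acting on its own is a design choice meant to reduce spurious trips from a single faulty instrument.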
Conclusion – ARE Machines Better Than Humans in a Crisis?
This paper has shown that problems often arise with humans in the workplace during times of crisis and stress. Having the right human (or humans) in the right place can make the difference, and often this is the case. But we need to be prepared for situations in which the operator becomes overloaded or takes things for granted, or in which an inexperienced operator is on shift when things start to become unstable.
During abnormal operation, systems are configured to produce lots of data, but humans are not configured to handle or interpret it. However, when presented with the right information, in the right context, during an abnormal condition, humans are able to do things machines cannot. They can evaluate the situation and supply the “thought process” behind the action to take, with the guidance and support of automated systems.