Course Handout - Transportation Disasters - Aerospace

Copyright Notice: This material was written and published in Wales by Derek J. Smith (Chartered Engineer). It forms part of a multifile e-learning resource, and subject only to acknowledging Derek J. Smith's rights under international copyright law to be identified as author may be freely downloaded and printed off in single complete copies solely for the purposes of private study and/or review. Commercial exploitation rights are reserved. The remote hyperlinks have been selected for the academic appropriacy of their contents; they were free of offensive and litigious content when selected, and will be periodically checked to have remained so. Copyright © 2001-2018, Derek J. Smith.

 

First published online 08:33 BST 2nd May 2001, Copyright Derek J. Smith (Chartered Engineer). This version [2.0 - copyright] 09:00 BST 4th July 2018.

 

Transportation Disasters - Aerospace

Key to Abbreviations:

AAIB = Air Accidents Investigation Branch (UK)

CRM = Cockpit Resource Management (alternatively Crew Resource Management)

CVR = cockpit voice recorder

ENFL = English not first language

FAA = Federal Aviation Administration (US)

FDR = flight data recorder (or "black box")

GPWS = ground proximity warning system, a loud cockpit audio warning that an aircraft is flying dangerously low

HSC = Health and Safety Commission

knot = speed in nautical miles per hour, the standard measure of speed for aviation and marine purposes. The difference between knots and miles per hour (mph) arises because a nautical mile is 6076 feet, whereas a land mile is only 5280 feet. To get knots from mph, multiply the former by 5280/6076 (ie. roughly 0.87), and to get mph from knots, multiply the former by 6076/5280 (ie. roughly 1.15).
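
Since these two ratios recur throughout the accident accounts below, here is a minimal conversion sketch in Python (our illustration; the only inputs are the foot equivalences given above):

    # Knots <-> mph, using 6076 ft per nautical mile and 5280 ft per land mile.
    NM_FT, SM_FT = 6076, 5280

    def mph_to_knots(mph):
        return mph * SM_FT / NM_FT    # ie. roughly mph x 0.87

    def knots_to_mph(knots):
        return knots * NM_FT / SM_FT  # ie. roughly knots x 1.15

    print(round(knots_to_mph(145)))   # the Trident's lift-off speed below: about 167 mph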

NTSB = National Transportation Safety Board

Staines Air Disaster, 1972: In this incident on 18th June 1972, a BEA Trident I stalled shortly after take-off from Heathrow en route for Brussels, and crashed into a field, killing all 118 people on board. The subsequent investigation found from the FDR that the droops (leading edge wing flaps) had been raised too soon, and that the built-in "stick-push" safety mechanism had been manually overridden.

The aircraft left the ground at T+44 (seconds after brake release), making 145 knots. The seating on the flight deck was Captain Key at P1 (captain, front left), Second Officer Keighley, a trainee, at P2 (co-pilot, front right), Second Officer Ticehurst at P3 (behind), and Captain Collins at P4 (a non-participating transit passenger in the jump-seat behind P1). P1 and P2 shared the tasks associated with actively flying the aircraft, while P3 monitored systems and completed any necessary paperwork. In normal circumstances, either P1 or P2 can be the "handling pilot", with transfers of control made at the discretion or for the convenience of the captain; when trainees are involved, transfers are also made selectively to widen the trainee's experience. There is autopsy evidence that P1 may have been having a heart attack, may just have had one, or may have been about to have one, and it is perhaps relevant that he was a senior pilot with a reputation as a stickler for discipline. He was accompanied on the fatal occasion by two younger and much less experienced crew members, and by a fourth pilot due to fly out of Belfast later that day. He had also had a shouting match with a junior colleague one and a half hours before flying, which P2 had witnessed. This is what happened next according to the control tower recordings, the aircraft's FDR, and official expert interpolation; our own comments and interpolations are either indented or set in square brackets:

T+63 (FDR): The autopilot was engaged in the pitch and roll channels.

T+67 (FDR) - A SERIOUS ERROR: The autopilot was engaged in the airspeed channel, holding as far as turbulence allowed at the then current speed of 170 knots (instead of the recommended 177 knots). There is some suggestion that P1 habitually engaged the autopilot slow and early as a matter of personal professional preference.

T+72 (FDR): The aircraft started a 20° banked turn to port [and stayed banked until T+115, when as part of the developing emergency the port wing dropped momentarily and then levelled out].

T+83 (TOWER): P1 radioed "climbing as cleared".

T+90 (Presumptive): P2 announced "ninety seconds", indicating that noise abatement procedures needed to be invoked because the aircraft was about to overfly the built-up areas of the London commuter belt. These procedures required it to raise its trailing edge flaps and throttle back to a prespecified level. The leading edge droops, however, should remain in position (ie. lowered, or "out"). The location of the flap and droop control levers can clearly be seen at points 2 and 1 (respectively) on the AAIB photograph of the Trident control panel.

[T+92 (Presumptive): P1 verbally authorised flaps up.]

T+93 (FDR): Flying at 168 knots, the flap control lever was moved from the 20 DEGREES to the ZERO DEGREES position.

This operation would normally be carried out by P2 with his left hand. The mechanical process then normally takes around 10 seconds to complete, and reduces both lift (so that the margin of safety between the actual airspeed and the safe flying speed decreases) and drag (so that the engines can hold a given speed with lower throttle settings).

T+95 (FDR): The throttle settings were eased to the prespecified level [the throttling back was not intended to slow the aircraft, merely to reduce the noise it was making]. The location of the throttle levers can clearly be seen at point 3 on the AAIB photograph of the Trident control panel.

This operation, too, would normally be carried out by P2 with his left hand, grasping and moving all three throttle levers simultaneously, and would have taken several seconds of close attention to get the settings just right. As these changes to lift, drag, and power took effect, the autopilot would simply reduce the rate of climb to compensate, which may explain why the tailplane angle channel on the FDR shows a reasonably linear progressive reduction in climb rate between T+90 and T+116, at which point the autopilot was disconnected in the developing emergency.

T+98 (FDR): With the flaps by now about 50% retracted, speed had dropped to 163 knots [perhaps because P2 had over-reduced the throttle settings].

T+100 (TOWER/FDR): P1 radioed "passing fifteen hundred". Speed at this point had increased to 166 knots.

T+103 (TOWER/FDR): With the flaps by now fully retracted, speed had dropped off again, this time to 157 knots. The tower responded with clearance to continue climbing to 6000 feet.

T+108: P1 acknowledged this clearance verbally, and P2 and P3 separately and successfully recorded it in their flight logs (Stewart, 1986). [P1's acknowledgement is on the control tower tapes. P2's and P3's logs were found in the wreckage, and were presumably updated (but not necessarily simultaneously) at some point between T+108 and T+115, when the warning lights started to flash.] P1's message was the last transmission from the aircraft. The official report uses the adjective "terse" to describe it, and, in some experts' opinions, it came after a suspiciously long delay, possibly indicating that his medical condition was actively deteriorating. [For our own part, it seems just as compellingly to indicate P1 in mid-explanation to P2, or else assisting him to set the throttles - see our closing comments re over-attentiveness.]

T+108 - T+110: Speed blipped up from 157 knots to 163 knots and then down again to 157 knots [perhaps due to turbulence]. Several reports suspect a stick shake operation (see below) may have taken place at this juncture.

T+114 - THE FIRST CRITICAL ERROR: With an airspeed of 162 knots and at altitude 1772 feet, the droop select lever was moved to "up", beginning the process of winding up the droops themselves, a process which normally takes around 8 seconds and raises the safe flying speed from 177 knots to 225 knots. This immediately started to reduce the available lift, and ought to have been counteracted by lowering the aircraft's nose, increasing power, and reselecting droops down. Movements of undercarriage, flaps, and/or droops are called "changes of configuration", and if not properly allowed for and managed may induce a "configuration stall".

One Popular Scenario: Stewart (1986) suggests that P1 responded to a stick shake at around T+110 by saying "up" (meaning the throttles), but that P2 misinterpreted this instruction and raised the droops instead. However, it usually takes considerable detective work to determine why an emergency occurred and why it was not recovered from, and the fundamental choice here, as in all transportation disasters, is between (a) human error or inadequacy, and (b) mechanical failure. We shall now consider two issues simultaneously, namely who moved the droop lever, and whether P1 was medically incapacitated or not when it was moved. For the first of these issues there are only three possible solutions, namely (1) that P1 moved the lever, (2) that P2 moved the lever, or (3) that the lever moved itself (which is not as silly as it sounds - see under Schofield Theory below). For the second issue there are only two possibilities, namely (1) that P1 was incapacitated, and (2) that P1 was not incapacitated. This gives us a total of six permutations, as follows (enumerated in the short sketch after the list below), four of which we are going to discard for the reason stated, and two we are going to retain for detailed analysis:

    • 1st Permutation - P1 moved the lever while incapacitated. DISCARDED - P1 could have become suicidal or confused by chest pain, but of all the controls he might conceivably have lashed out at, the droop lever is probably the hardest to get at.
    • 2nd Permutation - P1 moved the lever and P1 was not incapacitated. DISCARDED - It was not P1's job to move the droop lever and if fully compos mentis he would not have done so.
    • 3rd Permutation - P2 moved the lever and P1 was incapacitated. DISCARDED - This is already the explanation of choice. P2 either made the error because P1 was becoming incapacitated and somehow confused him, or, by making the error, actually induced the incapacity.
    • 4th Permutation - P2 moved the lever by mistake and P1 was not incapacitated. ANALYSED BELOW AS SCENARIO "A".
    • 5th Permutation - The lever moved itself and P1 was incapacitated. DISCARDED - Not impossible, but requires a major double coincidence, namely a Schofield movement (see below) and a P1 incapacitation.
    • 6th Permutation - The lever moved itself and P1 was not incapacitated. ANALYSED BELOW AS SCENARIO "B".
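
[For completeness, the six permutations are simply the cross-product of the two questions; a trivial Python enumeration (our illustration) generates them in the order listed above:]

    # Three candidate lever-movers crossed with two medical states = six permutations.
    from itertools import product

    movers = ("P1 moved the lever", "P2 moved the lever", "the lever moved itself")
    states = ("P1 incapacitated", "P1 not incapacitated")

    for n, (who, state) in enumerate(product(movers, states), start=1):
        print(f"Permutation {n}: {who} / {state}")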

T+115: With the droops by now about 12% retracted (safe speed about 183 knots; actual speed about 162 knots; altitude about 1778 feet), the console DROOP WARNING (flashing amber) light would have come on. This warns about droops down at too high a speed or up at too low a speed, although with one's mind firmly in <ascent mode> there would have been no doubt which message was intended.

The location of the DROOP WARNING light can clearly be seen at point 6 on the AAIB photograph of the Trident control panel. The warning systems had been designed in consultation with the Applied Psychology Unit at Cambridge University, and were based on a Central Warning System (CWS) with 22 separate displays. Each pilot had general purpose amber and red warning lights to draw their attention to the CWS. In practice, therefore, pilots would get their initial warning from the console in front of them, but would then have to look to the centre line to see what the warning was about. On this occasion, the CWS would have displayed the message DROOP OUT OF POSITION.

T+115 1/2: With the droops by now about 17% retracted (safe speed about 186 knots; actual speed about 162 knots; altitude about 1791 feet), the stick shaker stall warning system would have engaged.

The textbook action at this juncture is known as "flying a recovery". This consists of three sequential, but fundamentally separate, cognitive operations, as follows, each with its own particular problems should they go wrong:

1 - Diagnosis Phase: With his/her mind now in <ascent mode with possible stall emergency>, the handling pilot must make an emergency diagnosis of the reason for the warning. If a stall warning is genuine it will be because the speed is too low given the attitude and configuration of the aircraft, which means checking airspeed, pitch, roll, and configuration, and deciding what - if anything - has gone wrong. If the diagnostic information confirms the warning, then the cognitive mode should immediately be changed to <fly stall recovery>, and the whole process will have taken not much more than a second. However, if the diagnostics are in any way unclear or inconsistent, then the emergency instantly becomes a <stall crisis> in which the reliability of every rule, control, subsystem, and display is in question, and in which survival becomes the only consideration.

2 - Planning Phase: Having entered <fly stall recovery> mode, the handling pilot now has to plan what s/he is going to do about it, and in what order. With most flight emergencies, of course, such plans have been well rehearsed in advance and will require little or no conscious thought (like braking if a child runs into the road in front of you). We have dealt elsewhere (Smith, 1997) with the process of micro-preparation for the execution of skilled motor movements, so suffice it to note that this takes of the order of 170-210 milliseconds, depending on how many specific behaviours are being planned.

3 - Recovery Phase: With an appropriate plan of action in mind, the handling pilot now starts to execute the sequence of motor behaviours to bring it to fruition. With the case in question, this would be to lower the nose of the aircraft, increase the power, neutralise any roll, and change the configuration - but not necessarily in that precise order.

Here is how these three decision making stages might have worked themselves through in the two scenarios previously mentioned:

SCENARIO A - P2 MOVED THE DROOP LEVER BY MISTAKE; P1 HANDLING

In this scenario, P1 remains the handling pilot, but will not have seen P2 move the droop lever (or else he would have responded, and probably quite forcefully, at T+114):

1 - Diagnosis Phase: As an experienced pilot, P1's gaze will move in quick and accurate succession from the console warning light to the CWS diagnostic message (ie. DROOP OUT OF POSITION), and then to the droop control lever itself. If said lever shows UP, then he need look no further: he may accept the stall warning as genuine and enter the standard recovery mode.

2 - Planning Phase: The planning phase on this occasion would probably be as follows:

DROOP LEVER DOWN WITH RIGHT HAND

NOSE DOWN SIMULTANEOUSLY WITH LEFT HAND

THROTTLES UP WITH RIGHT HAND

DISCONTINUE TURN

3 - Recovery Phase: P1 now has to execute a thoroughly rehearsed and automatic series of recovery movements. However, the execution would have been overtaken by events at T+116, namely the pre-empting stick push, so the NOSE DOWN element would actually be done for him.

SCENARIO B - THE DROOP LEVER MOVED ITSELF; P1 HANDLING

In this scenario, too, P1 remains the handling pilot, but the major cause - the retracting droop - is not known to any of the flight crew because the lever moved itself:

1 - Diagnosis Phase: Same as Scenario A, but with a possible additional glance at the droop angle indicator after finding the lever UP.

2 - Planning Phase: Same as Scenario A.

3 - Recovery Phase: Same as Scenario A.

T+116: With the droops by now about 25% retracted (safe speed about 189 knots; actual speed about 162 knots, altitude about 1794 feet), the stick pusher stall recovery system operated, pushing the control column forward for about a second and then releasing it. This would have been accompanied by the console STALL RECOVERY (amber) light and a CWS STALL RECOVERY message. This action forced the nose of the aircraft downwards, but it would also have automatically disengaged the autopilot, giving an AUTOPILOT DISCONNECT (red flashing, with clang! clang! headset audio) warning to both P1 and P2.

There is then a three-second period in which the FDR shows the nose of the aircraft pitched - quite correctly - downwards. Coming off the top of its climb, it reached its maximum height of about 1800 feet at about T+117.

T+117 - THE SECOND CRITICAL ERROR: The control column was pulled back again, even though this meant fighting a built-in stall protection mechanism. As a result, by T+120 the aircraft had levelled out again, which provoked a second stick push at T+124 and a third at T+126; on both of these occasions the control column was pulled back afterwards. Finally, at T+128, having inexplicably fought the stick push system three times in succession, the crew manually inhibited the stall recovery system.

T+130: The speed at this juncture was 193 knots, the highest value achieved on the entire flight, but well below safe flying speed. With the control column back, speed now fell away precipitously and the aircraft pitched nose-up, irrecoverably out of control.

T+150: The aircraft hit the ground, having been in the air less than two minutes.

If we examine this timeline, we see that the critical decisions were all concentrated in the 13 seconds between T+114 and T+127, and that the chances of successful recovery began to erode very rapidly after about T+120. This means that the crew had a mere 6 seconds in which to respond correctly or die, and had to spend the first two or three of those priceless seconds frantically checking instruments and controls. The incident is therefore an example of just how confusing it must be to have a sudden flurry of warning messages from both colleagues and systems, and to have only a second or two to diagnose the problem and plan a suitable remedy.
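
[Incidentally, the "safe speed" figures quoted at T+115, T+115 1/2, and T+116 are consistent with a simple linear interpolation between the droops-out safe speed of 177 knots and the droops-up safe speed of 225 knots. A quick reconstruction follows - ours, and the linearity is our assumption.]

    # Safe flying speed as the droops wind up, assuming a linear rise from
    # 177 knots (droops fully out) to 225 knots (droops fully retracted).
    DROOPS_OUT, DROOPS_UP = 177, 225

    def safe_speed(fraction_retracted):
        return DROOPS_OUT + fraction_retracted * (DROOPS_UP - DROOPS_OUT)

    for pct in (0.12, 0.17, 0.25):
        print(f"{pct:.0%} retracted: safe speed ~{safe_speed(pct):.0f} knots")
    # prints ~183, ~185, and ~189 knots - in line with the figures quoted in
    # the timeline, against an actual airspeed stuck at about 162 knots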

[In printing off a draft of this material on 24th April 2001 my printer flashed its OUT OF PAPER light. I turned and checked the INK level! Why? Because I always keep the paper well topped up, but use a lot of ink. It took me about 20 seconds to locate my error even though I had only two lights to choose from, and am a skilled printer operator! Had I been on Papa India that would have been enough time to kill me more than three times over!]

Here is the official list of five immediate causes of the crash:

(1) Captain Key failed to achieve and maintain adequate speed [ie. there was little or no margin for error - defence in depth]; (2) the droops were retracted some 60 knots too soon; (3) the crew failed to monitor the speed errors and to observe the movement of the droop lever; (4) the crew failed to diagnose the reason for the stick-push warning; (5) the crew cancelled ("dumped") the stall recovery system.

And here is the official list of seven underlying causes of the crash:

(1) Captain Key's underlying heart condition led to lack of concentration and impaired judgement; (2) some distraction of P3 by the presence of the transit captain on the flight deck; (3) lack of training directed at "subtle" pilot incapacitation, that is to say, subacute and not generally apparent conditions such as that which might have been troubling Captain Key; (4) lack of experience in P2; (5) lack of knowledge in the crew relating to a "change of configuration stall"; (6) lack of knowledge that a stick-push would follow such an imminent stall; (7) lack of any mechanism to prevent retraction of the droops at too low a speed.

And here are some additional lines of enquiry pursued during the investigation and material to the recommendations made as a result of it:

  • Rostering Procedures: The seat allocation policy was to seat the training pilot at P1 and the trainee - the least experienced pilot - in P2. Such trainees were known as "P2-only" pilots because they lacked the experience necessary to do the P3 monitoring duty.
  • Medical Screening Procedures: The problem with the medical screening regime was that it was not rigorous enough to detect the particular heart condition Captain Key was suffering from (Stewart, 1986). This regime hitherto required regular ECG checks only at rest, but was subsequently tightened up by including ECG testing under stress as well.
  • Procedural Black Hole (1): There was therefore a serious procedural black hole between the seat allocation policy when trainee pilots were on a flight and the medical screening of pilots. What was P3 to do if P1 was suddenly incapacitated and P2 was faced with a technical problem outside his experience? P2 and P3 would then have to change places, which, with removal of belts and headsets, would take at least 6 seconds [desk simulation by author and colleagues, 25th April 2001]. Since the trainee on this occasion had already been assessed during simulator sessions as "slow to react in an emergency" (Stewart, 1986), a P1 heart attack would have been the last thing the aircraft needed (and may have been the last thing it got).
  • The "Schofield Theory": This came about as the result of a technical report two years prior to the Staines accident in which a Trident pilot noted how a particularly worrying combination of mechanical movements could effectively disable the stick pusher system. Here is an extract from the relevant section of the AAIB report:

"On experiment I find that if the droop lever is moved to the down position and the flap lever is moved before the droop lever is fully down, it is possible for the droop lever to become locked on the baulk only and not on the normal lock down. All appears normal [] and it is not until the flap lever is selected up that anything untoward occurs. At this time the baulk is removed and the droop lever will return to the up position with the flap lever, the airspeed being some 50 knots below the correct speed for droop retraction." (AAIB, 1974, para. B.(vi).b2.)

In the event, however, the AAIB was satisfied that this could not have happened, and in fairness it seems to make little difference to the success or otherwise of the emergency decision making as analysed in Scenarios A and B above.

  • Design error: The purpose of having droops in the first place was to produce additional lift at low speeds, given aircraft weight, wing size, and engine power, and it was literally a matter of life and death that they should remain down until the aircraft reached 225 knots. Retraction below that speed rendered the aircraft more or less immediately unflyable. Yet nothing physically prevented the droop lever from being raised after the flaps had been retracted (see the sketch after this list).
  • Training Error: The AAIB report noted that prior to the accident no British airline trained for the eventuality of a change of configuration stall.
  • Procedural Black Hole (2): There is also a procedural hole between the cockpit procedures as designed for and the cockpit procedures as actually practised. One of the reasons BEA had not insisted on a physical droop-airspeed interlock at the development stage was that there were only ever a few seconds of continuous acceleration between raising the flaps and raising the droops, BUT WHEN NOISE ABATEMENT PROCEDURES WERE INVOKED THIS CEASED TO BE TRUE, AND THE CONTINUITY OF THE COGNITIVE MODE WAS BROKEN.
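
[As an illustration of the missing interlock referred to under Design Error above - our sketch, not a real avionics design - the mechanism whose absence is listed as underlying cause (7) amounts to a single comparison:]

    # Refuse droop retraction below the clean-wing safe speed of 225 knots.
    DROOP_RETRACT_MIN_KNOTS = 225

    def droop_up_permitted(airspeed_knots):
        """Baulk the droop lever unless the aircraft is fast enough to fly clean."""
        return airspeed_knots >= DROOP_RETRACT_MIN_KNOTS

    print(droop_up_permitted(162))   # False - the Papa India case at T+114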

RESEARCH ISSUE: There is little in the psychological literature on the normal recovery curve for extreme emotional states, nor on how this might vary with personality. Captain Key, for example, had apologised to the colleague he had shouted at, but we do not know what his residual thoughts and feelings were. Nor were they necessarily negative, because it is recognised that some anger states are followed by a great deal of post-outburst guilt (eg. Warneka, 1998), so it could be that P1 was in fact being over-attentive towards P2 and deluging him with overly helpful explanations and demonstrations, rather than harrumphing him - which would explain the radio response delay at T+108.

The accident is significant because it forced the introduction of CVRs amongst UK airline operators, and is one of a series of accidents in which poor communication between captain and crew - specifically between an experienced P1 and a nervous P2 - was possibly a major factor. For further general details click here, and for the official AAIB report click here. 

References

Air Accidents Investigation Branch (1973). Aircraft Accident Report No. 4/73. London: HMSO. [Available online.]

Smith, D.J. (1997). Human Information Processing. Cardiff: UWIC. [ISBN: 1900666081]

Stewart, S. (1986). Air Disasters. London: Arrow.

Warneka, T.H. (1998). The experience of anger: A phenomenological study. Dissertation Abstracts International: Section B: The Sciences and Engineering, 58(12-B June 1998):6863.

Paris DC10 Air Disaster, 1974: In this incident on 3rd March 1974, a Turkish Airlines DC10 crashed shortly after take-off, killing 346 people. The immediate cause of the accident was the failure of a baggage handler to seal the cargo door properly. However, the fact that this sort of error should have been allowed to go undetected in the first place, and the relative severity of the resulting damage, were due to a string of earlier bad design decisions (Dixon, 1994).

Tenerife Air Disaster, 1977: In this incident on 27th March 1977, a KLM Royal Dutch Airlines 747 was taking off in fog from Los Rodeos Airport, Tenerife, unaware that a Pan-Am 747 was still taxiing towards it on the same runway. The Pan-Am started to pull off the runway at the last moment, saving 77 of its passengers and crew, but the resulting near head-on collision nevertheless has the dubious honour of being the world's worst ever air disaster, with 583 killed. The subsequent NTSB, Spanish, and Dutch investigations failed to agree on whether the accident was down to pilot error on the part of the KLM crew, or mis-instruction on the part of the control tower. There was certainly some suspicion that the KLM captain was eager to get away, and a deal of confusion in the interaction between both planes and the tower [for a transcript of the CVR, click here]. There was also some hint that the junior members of the flight crew were less than convinced that it was clear to take off, but only felt confident enough to hint at their reservations rather than stating them clearly. [For further general details, click here.]

Flight Palm 90, 1982: In this incident on 13th January 1982, an Air Florida 737 was unable to develop enough lift during a snow-bound take-off from Washington National Airport, and crashed into the ice-covered River Potomac. Five survivors were rescued from the river, but the remaining 74 passengers and crew were killed. The subsequent NTSB investigation noted that the engines had not been set to full power because of false readings from iced-up sensors. They also noted a deal of nervous banter on the CVR prior to take-off, and in their final judgement noted that the engine anti-icing equipment had not been switched on. This incident is significant because it is one of a series including Staines (1972) and Tenerife (1977) where communication on the flight deck, especially from co-pilot to captain, was less than totally effective. There was a distinct hint that the second in command on this occasion was less than happy with the decision to go ahead. [For further details and an audio extract from the CVR click here, and for a partial CVR transcript click here.]

Osaka Flight 123 Air Disaster, 1985: In this incident on 12th August 1985, a Japan Airlines 747 en route from Tokyo to Osaka lost its tail fin and hydraulic systems after its rear pressure bulkhead blew out. Out of control, it crashed into Mount Osutaka, north west of Tokyo, killing all but 4 of the 524 people on board (making it the world's worst single-aircraft air disaster). The cause of the accident was traced to a faulty repair to said bulkhead in 1978 after an earlier heavy landing. The repair design was sound, but its execution was defective (a single row of rivets was used instead of the three rows called for in the repair specification). However, the incident is not just a good example of the need for thorough quality control procedures during the engineering process, but also of how not to manage a disaster once it has happened. Here are three of the critical issues:

  • Misplaced National Pride (1): By chance, a US military helicopter was close by the accident site, and was on the scene within minutes. However, it was immediately asked to clear the site to allow Japanese rescue units access (but see next).
  • Disorganisation: Nevertheless, the Japanese rescue units did not arrive for a full 14 hours, leaving the four survivors (and perhaps others) untreated and exposed!
  • Misplaced National Pride (2): The confusion was then compounded by the fact that the Japanese accident investigation authorities would not allow NTSB representative George Seidlein to visit the wreck for a further 48 hours, which - had it been a fleetwide problem (which fortunately it was not) - would have needlessly prejudiced the safety of 747 passengers worldwide.

[For further details, click here.]

Challenger Space Shuttle Disaster, 1986: In this disaster on 28th January 1986, NASA Flight 51-L - the Challenger space shuttle - was destroyed in a total loss explosion 73 seconds into its mission. The immediate cause of the explosion was a failure of the O-ring seals between the lower two segments of its starboard solid rocket booster (SRB). All seven members of the crew were killed. A by-the-millisecond account of the flight is available on the NASA website.

It emerged early in the accident investigations that there had been a major problem with the NASA decision making process prior to the launch. The SRBs were manufactured by Morton Thiokol Inc., Utah, (henceforth referred to as Thiokol), and it was their long-standing and clearly expressed engineering opinion that the specification for the SRB rendered them unsafe for use at temperatures below 53° Fahrenheit. The O-ring segment-to-segment seals simply lacked the necessary resiliency to seal safely at lower temperatures than this. Yet the predicted temperature for the morning of the launch was 25-29°. [In the event, the ambient air temperature was 36°, and the temperature on the shady side of the SRB (the point where the fatal rupture occurred) was estimated at 28° plus or minus 5°.] But the launch had nevertheless been authorised.

The enquiry report concluded:

"1. The Commission concluded that there was a serious flaw in the decision making process leading up to the launch of flight 51-L. A well structured and managed system emphasising safety would have flagged the rising doubts about the Solid Rocket Booster joint seal. Had these matters been clearly stated and emphasised in the flight readiness process in terms reflecting the views of most of the Thiokol engineers and at least some of the Marshall engineers, it seems likely that the launch of 51-L might not have occurred when it did.

2. [] There was no system which made it imperative that launch constraints and waivers of launch constraints be considered by all levels of management.

3. The Commission is troubled by what appears to be a propensity of management at Marshall to contain potentially serious problems and to attempt to resolve them internally rather than communicate them forward. This tendency is altogether at odds with the need for Marshall to function as part of a system working toward successful flight missions, interfacing and communicating with the other parts of the system that work to the same end.

4. The Commission concluded that the Thiokol management reversed its position [] at the urging of Marshall and contrary to the views of its engineers in order to accommodate a major customer."

[For the fuller text, click here, and follow the menu.]

And here are the main lessons of this accident:

  • Overambitious Targets: The Commission remarked that the long term goal of the space shuttle programme to achieve two launches per month was unsupportable. NASA had had difficulties delivering nine launches in 1985, and had planned 14 for 1986. Spare parts were kept in short supply as a matter of budgetary policy.
  • Desire to Please: Numerous late payload changes were allowed, making for inefficient use of available resources.
  • Testing Error: There had been no requirement to test the SRBs in the configuration they would be in in flight. They were tested horizontally, not vertically.
  • Flawed Defect Prioritisation: The O-ring erosion risk had been fully documented in August 1985, but no remedial action was taken.
  • "Go Fever": This is especially dangerous in potentially high cost situations, where it is the low probability events which require the greatest attention.
  • Ineffective Fault Report Procedures: The Commission remarked on a lack of problem reporting requirements, inadequate trend analysis, misrepresentation of criticality, and lack of involvement in discussions, perhaps due to reductions in the workforce.

 For further general commentary, click here.

Habsheim A320 Air Disaster, 1988: In this accident on 26th June 1988, an A320 Airbus attempted a low and slow display flight across the Habsheim airshow, but was unable at the end of the circuit to gain sufficient speed to clear a line of trees across the end of the runway. This was because the pilot - knowing that the A320 had a highly computerised flight deck - had purposefully disabled the GPWS. "He was so used to the system keeping the aircraft safe," says Race (1990), "that he felt it could wave a magic wand and get him out of any problem." It could not. The pilot's call for full power came too late and the engines were still accelerating when the aircraft hit the trees. Race's conclusion is ominous - complex systems routinely encourage overdependence on the part of their users. "The better our systems protect our clients," he writes, "the more likely it is that when a situation occurs outside the system's competence [] the client will make a hash of things" (p15). This incident provides a good example of both a mode error, and of the problems encountered in producing safety critical software. [For an introduction to the cognitive science of mode error, click here.]

Race, J. (1990). Computer-encouraged pilot error. Computer Bulletin, August 1990, 13-15.

USS Vincennes Air Disaster, 1988: Because this is essentially a military command and control disaster, it is dealt with in the section on military systems failures. [To be transferred, click here.]

Kegworth Air Disaster, 1989: In this incident on 8th January 1989, a British Midland 737 en route from Heathrow to Belfast suffered a turbine failure in its port engine and was diverted for an emergency landing at East Midlands Airport, near Nottingham. What happened next is an excellent example of how cognitive failures can crash an aircraft.

The aircraft took off at 1952hr, and climbed normally for 13 minutes before the engine failed. This event was accompanied by smoke and vibration, and was wrongly judged to have taken place in the starboard engine. The AAIB report blamed this misjudgement on a combination of vibration, noise, and smoke outside the flight crew's training and experience. The critical factor here was that on older 737s the air conditioning intakes had been situated on the starboard side. However, the air conditioning system had been redesigned on newer aircraft. It now included a port side air intake, but that fact had gone largely unreported. When the crew smelled smoke, therefore, their immediate - but false - presumption was that it must be the starboard engine which had the problem.

Both engines were then throttled back and the worst of the smoke and vibration died away. In fact, this was coincidental - it was just that the damaged port engine chose that time to stabilise itself temporarily. Yet it helped to convince the crew that their initial diagnosis had been correct. The only inconsistency was that the port engine was showing vibration on the cockpit vibration gauges, whereas the starboard was not. However, that data was disregarded as unreliable, because the vibration gauges on older 737s had acquired a "common knowledge" reputation for being unreliable. Flight crew had grown accustomed to not relying on them. And again, when the gauges had been upgraded on newer aircraft, and now told the truth, that fact had gone largely unreported. The good starboard engine was then shut down altogether, and the port engine continued to turn, but under a slightly reduced load since the aircraft was now descending.

The flight crew's attention was now entirely taken up with weather reports, approach instructions, and carrying out the checklist for emergency landings with one engine. They also took time to make an announcement over the cabin address system to explain what had happened - that the starboard engine had been shut down, and that all was in hand to make an emergency landing. This announcement greatly puzzled and concerned the many passengers who had seen smoke and debris coming from the port engine, but they said nothing.

And then, when making the final approach, the stricken port engine failed completely. At this point, the aircraft was still two and a half miles from touchdown, and had already descended to 900 feet. There was accordingly insufficient time to restart the starboard engine, and 59 seconds later the unpowered aircraft hit the ground just short of the M1 motorway, bounced across it, and came to a halt on the embankment on the far side. Out of 126 passengers and crew on board, 47, all passengers, lost their lives.

The AAIB report also listed:

  • the flight crew reacted prematurely to the initial engine problem, and in a way contrary to training;
  • they did not assimilate the indicators on the engine instrument display before throttling back the starboard engine;
  • the fact that the port engine ceased to vibrate coincidentally with throttling back the starboard engine confirmed an erroneous belief;
  • none of the main cabin witnesses to the port engine problems were consulted from, or reported spontaneously to, the flight deck.

As a result of this incident, pilot training has been extended to cover major specification enhancements and defect corrections, so that "known defects" are explicitly trained out, and cabin crew are now required to be consulted for their perspective on events. [For further details, click here.] [For an introduction to the cognitive science of situational awareness, click here.]

Sioux City Flight 232 Air Disaster, 1989: In this incident on 19th July 1989, a United Airlines DC10 en route from Denver to Chicago suffered a turbine failure in its tail engine, as a result of which all three sets of hydraulic control lines to the rudder and elevators were severed. The plane was then essentially unsteerable: all it could do was go faster and slower (and thus up and down). However, the residual throttle control did allow power to be reapportioned between the two wing engines, thus giving a very rudimentary ability to power steer, and this allowed the flight to divert for an emergency landing at Sioux City, Iowa. The crew were also lucky to have an experienced pilot amongst the passengers, who volunteered to take over the throttle steering while the regular crew went about the more urgent business of arranging for the crash landing. Nevertheless they were faced with solving a problem for which there existed no laid down procedures, because neither Boeing nor the airline had predicted loss of all three hydraulic systems.

The combined crew of four had no less than 103 years' flying experience between them, and used it to the very best effect! They also benefited from a United Airlines training programme called Cockpit Resource Management (CRM) introduced in 1980 to improve effective emergency problem solving. By pooling their experience the crew managed to stabilise the aircraft, descend, and line it up with one of the runways at Sioux City. Even so, the final touchdown - without the benefit of flaps - was at 215 knots instead of the normal 140 knots, and in the resulting crash landing 112 of the 296 people on board were killed. The flight crew were among the 184 who survived. [For further details, click here and here. Cockpit Resource Management may also be referred to as Aircrew Coordination Training (ACT) or Crew Resource Management.]

Long Island Flight 052 Air Disaster, 1990: In this incident on 25th January 1990, a Colombian Avianca Airlines 707 ran out of fuel on its final approach to Kennedy International Airport, New York, and crashed, killing 73 of the 158 people on board. Investigations revealed that the aircraft had informed ATC that it was low on fuel, but had failed to declare a formal emergency when its landing was further delayed by bad weather. Analysis of the CVR indicates a lot of discussion amongst the flight crew on whether or not ATC fully appreciated the seriousness of their plight, but nevertheless communication with ATC was only occasional and disturbingly unassertive. The FAA subsequently suggested greater standardisation of vocabulary and phraseology between flight crew and controllers to minimise the likelihood of further misunderstandings of this sort, especially where the flight crew were ENFL. [For general details click here, and for the CVR transcript click here.]

Bangalore Air Disaster, 1990: In this incident on 14th February 1990, an Indian Airlines A320 crashed on final approach to Bangalore airport, having undershot the runway by half a mile, despite perfect visibility. 92 of the 146 people on board were killed. The probable cause of the disaster was the fact that the automated control systems had been accidentally set to <OPEN DESCENT> mode, which cuts the engines to idle and then automatically loses height to maintain speed. The crew were unable to diagnose their error quickly enough to take corrective action. This incident is now regarded as one of the classic examples of "mode error", one of the main problems encountered when automating systems control. [For an introduction to the cognitive science of mode error, click here.]

Los Angeles Air Disaster, 1991: In this incident during the night of 1st February 1991, an air traffic controller at Los Angeles International Airport (LAX) cleared a US Air 737 to land, having forgotten that a few minutes previously she had cleared a Skywest Metroliner to take off from the same runway. Due to contributory lack of radio monitoring on the part of the two flight crews involved, this error was not detected, and moments after the 737 touched down it collided with the stationary Metroliner, killing all 12 people on board. Severely damaged, the 737 then careered off the runway and hit a building, killing 22 of its own passengers [picture]. The NTSB report blamed "the failure of the Los Angeles Air Traffic Facility Management to implement procedures that provided redundancy [] and the failure of the FAA Air Traffic Service to provide adequate policy direction and oversight to its air traffic control facility managers. These failures created an environment in the Los Angeles Air Traffic Control tower that ultimately led to the failure of the local controller 2 (LC2) to maintain an awareness of the traffic situation, culminating in the inappropriate clearances []. Contributing to the cause of the accident was the failure of the FAA to provide effective quality assurance of the ATC system." (NTSB/AAR-91/08.) [For an introduction to the cognitive science of situational awareness, click here.]

Dhahran Scud Attack, 1991: In this incident on 25th February 1991, a Raytheon MIM-104 surface-to-air missile (better known as a "Patriot") [pictures] was fired from Dhahran, Saudi Arabia, to defend against an incoming Scud missile. The interception failed, and, by a fluke of targeting, the Scud hit a US barracks, killing 28 servicemen [details]. Failures such as this constitute operationally highly sensitive information, and so several deliberately conflicting cover stories were immediately released [example], but by 1992 investigations had revealed that the real cause of the incident was a "rounding error" in the missile's floating point arithmetic software (Skeel, 1992/2003 online). The US government's General Accounting Office reported that "the Patriot's weapons control computer used in Operation Desert Storm is based on a 1970s design with relatively limited capability to perform high precision calculations [.....] the conversion of time from an integer to a real number cannot be any more precise than 24 bits" (GAO Report B-247094, 1992/2003 online). [For more on the technicalities of rounding error in floating point arithmetic, click here and go to Section 1.3.]
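
[To see how so small a chopping error could matter, here is a minimal Python sketch - ours, not Patriot code - based on the figures in Skeel (1992) and the GAO report: uptime was counted in tenths of a second and converted to seconds by multiplying by a fixed-point approximation of 1/10 held in a 24-bit register. We assume 23 fractional bits, which reproduces the documented error of about 0.000000095 seconds per tick.]

    # 1/10 chopped (not rounded) to 23 fractional bits of a 24-bit register.
    FRAC_BITS = 23
    ONE_TENTH_FIXED = int(0.1 * 2**FRAC_BITS) / 2**FRAC_BITS

    chop_error = 0.1 - ONE_TENTH_FIXED         # ~9.5e-8 seconds lost per tick
    ticks = 100 * 60 * 60 * 10                 # tenths of a second in 100 hours up

    drift = ticks * chop_error
    print(f"clock drift after 100 hours: {drift:.4f} s")         # ~0.34 s

    scud_speed = 1676                          # approximate closing speed, m/s
    print(f"tracking displacement: {drift * scud_speed:.0f} m")  # ~575 m

A third of a second sounds trivial, but at Scud closing speeds it displaced the tracking range gate by over half a kilometre - more than enough for the system to lose the target.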

Strasbourg A320 Air Disaster, 1992: In this incident on 20th January 1992, an Air Inter A320 crashed on a night-time approach to Strasbourg airport, after the crew had failed to detect they were descending too rapidly. As with the Bangalore disaster two years previously (above), it is possible that the inherent risks of the A320's <DESCENT> mode had not been fully appreciated.  [For an introduction to the cognitive science of mode error, click here.]

Nagoya A300 Air Disaster, 1994: In this incident on 26th April 1994, a China Airlines A300 crashed on approach to Nagoya airport, following a clash of wills and understanding between the human crew and the autopilot. The autopilot won. All but 7 of the 271 people on board died. [Report] [For an introduction to the cognitive science of mode error, click here.]

Flight ZD576 Helicopter Air Disaster, 1994: This was a total loss of a military Chinook helicopter on 2nd June 1994. The aircraft crashed into a hillside on the Mull of Kintyre, Scotland, en route from Northern Ireland. The 4 crew and 25 passengers (all police and intelligence service personnel) were killed. There has been considerable debate over the cause of this incident, with concern centring on the safety of the new full-authority digital engine control (FADEC) software. This system controls the flow of fuel to the aircraft's two engines, but is so complicated that it is virtually impossible to simulate all possible operational conditions. Along with the Habsheim (1988), Strasbourg (1992), Nagoya (1994), Fox River Grove (1995), and Ariane (1996) disasters, this incident is another good example of the problems of engineering safety-critical software.

Ariane 5 Flight 501, 1996: This was a total loss rocket launch on 4th June 1996. The vehicle veered out of control 36.7 seconds after lift-off, due to a failure of the back-up and main guidance systems. This put the steering hard over in an attempt to cure a defect which did not in fact exist. This, in turn, overstressed the airframe and caused the booster rockets to start to break away, and the resulting self-destruct explosion destroyed the vehicle. The guidance failure was caused by a software incompatibility: programs written for the Ariane 4 rocket series contained code not needed by, and catastrophically not compatible with, the Ariane 5 series. Specifically, an alignment routine retained from Ariane 4 converted a 64-bit floating point horizontal velocity figure to a 16-bit signed integer; Ariane 5's faster trajectory produced a value too large to fit, and the resulting unhandled exception shut down both inertial reference units.
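
[A minimal sketch of that failure mode follows - in Python rather than the Ada of the flight software, and with illustrative names and values - based on the published inquiry findings:]

    # An unprotected narrowing conversion: a 64-bit float squeezed into a
    # 16-bit signed integer with no range check, as in the alignment routine
    # reused from Ariane 4, where the value could never grow large enough to matter.
    INT16_MIN, INT16_MAX = -2**15, 2**15 - 1

    def horizontal_bias_to_int16(bh):
        n = int(bh)
        if not INT16_MIN <= n <= INT16_MAX:
            # the Ada equivalent raised an unhandled Operand Error, shutting
            # down the inertial reference unit - and its identical back-up
            raise OverflowError(f"horizontal bias {bh} does not fit in 16 bits")
        return n

    print(horizontal_bias_to_int16(2.0e4))   # fine on an Ariane 4 trajectory

    try:
        horizontal_bias_to_int16(4.0e4)      # Ariane 5's faster velocity build-up
    except OverflowError as error:
        print("guidance lost:", error)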

Paris Concorde Air Disaster, 2000: In this incident on 25th July 2000, an Air France Concorde ruptured a fuel tank during take-off and crashed in flames a few seconds later. See the referenced website for the principal facts and figures.

THIS SPACE RESERVED. BUT NOT FOR YOU, WE HOPE .....

 
