What should WE learn from Boeing’s two-crash tragedy?

The two crashes of Boeing’s 737 MAX aircraft, less than six months apart in 2018 and 2019, involve three major management failures that are worth studying, so that effective lessons can be internalized by every management team.  The story, including some truly important details, is told in “Downfall: The Case Against Boeing”, a documentary recently released by Netflix.

We all demand 100% safety from airlines, and in practice from every organization: never let your focus on making money cause a fatal flaw!

However, any promise of 100% safety is utopian.  We can come very close to 100%, but there is no way to ensure that fatal accidents will never happen.  In practice, true responsibility consists of two different actions:

  1. Invest time and effort in protection mechanisms, which we in the Theory of Constraints (TOC) call ‘buffers’, so that even when something goes wrong, no disaster follows.  All aircraft manufacturers and all airlines are fully aware of this need.  They build protection mechanisms and very detailed safety procedures into the everyday life of their organizations.  Because of these many safety procedures, an aircraft crash is the result of several things going wrong together, and is therefore very rare.  Yet crashes still happen.
  2. If there is a signal that something that shouldn’t have happened has happened, then a full learning process has to be in place to identify the operational cause, and from it the flawed paradigm that allowed the operational cause to happen.  This is just the first part of the learning.  The next part is deducing how to fix the flawed paradigm without causing serious negative consequences.  Airlines have internalized a culture of investigating every signal that something went wrong.  Still, such a process could and should be improved.

I have developed a structured process for learning from a single event, now titled “Systematic Learning from Significant Surprising Events”.  TOCICO members can download it from the New BOK Papers section of the TOCICO site; the direct link is https://www.tocico.org/page/TOCBodyofKnowledgeSystematicLearningfromSignificantSurprisingEvents

Others can contact me, and I’ll gladly send them the paper.

Back to Boeing.  I, at least, don’t think it is right to blame Boeing for what led to the crash of the Indonesian aircraft on October 29, 2018.  In hindsight, every flawed paradigm looks as if everybody should have recognized the flaw, but expecting that is inhuman.  There is no way for human beings to eliminate all their flawed assumptions.  But it is our duty to reveal a flawed paradigm once we see a signal that points to it.  Then we need to fix the flawed assumption, so the same mistake won’t be repeated in the future.

The general objective of the movie, like that of most public and media inquiries, is to find the ‘guilty party responsible for so-and-so many deaths and other damage.’  Boeing’s top management at the time was an easy target, given the number of victims.  However, blaming top management for being ‘greedy’ will not prevent any safety issue in the future.  I do expect management to strive to make more money now, as well as in the future.  However, the Goal should include several necessary conditions, and refusing to take the risk of a major disaster is one of them.  Pressing for a very ambitious, short development time, and launching a new aircraft that requires no additional training for pilots already trained on the current models, are legitimate managerial objectives.  The flaw is not being greedy, but failing to see that the pressure might lead to cutting corners and might discourage employees from raising a flag that there is a problem.  Resolving the conflict between ambitious business targets and dealing with all the safety issues is a worthy challenge that needs to be addressed.

Blame naturally follows anything that goes wrong.  It is the result of a widespread flawed paradigm, which pushes good people to conceal facts that might reveal their involvement in highly undesired events.  The fear is that they will be blamed and their careers will end.  So they do their best to avoid revealing their flawed paradigms.  The problem is: other people still use the same flawed paradigm!

Let’s look at the critical flawed paradigm(s) that caused the Indonesian crash.  Two different flaws combined to bring down the Indonesian plane.  A damaged sensor sent wrong data to a critical new automatic software module, called MCAS, which was designed to correct an excessively high nose-up angle.  The major technical flaw was failing to consider that if the sensor is damaged, MCAS could cause a crash.  The sensors stick out of the airplane body, so hitting a balloon or a bird can destroy a sensor, and this makes the MCAS system deadly.

The second flaw, this time managerial, was the decision not to tell the pilots about the new automatic software.  As a result, the Indonesian pilots couldn’t understand why the airplane was going down.  Because the sensor was out of order, many alarms sounded, carrying wrong information, and the stick shaker on the captain’s side vibrated loudly.  To recover, the pilots had to shut off the new system, but they knew nothing about MCAS or what it was supposed to do.

The reason for not telling the pilots about the MCAS module was the concern that it would trigger mandatory pilot training, which would limit sales of the new aircraft.  It seems reasonable to me that management tried their best to come up with a new aircraft with improved performance and no need for special pilot training.  The underlying managerial flaw was failing to realize that pilots who were unaware of the MCAS module could face exactly such a disaster.

Once the first crash had happened and its technical operational cause was revealed, the second managerial flaw took place.  After such a disaster it is almost natural to seize on the first possible cause that is least damaging to management.  This time it was easy to claim that the Indonesian pilot wasn’t competent.  This is an ugly, yet widespread, paradigm of putting the blame on someone else.  However, the facts from the black box eventually told the true story.  The role of MCAS in causing the crash was clearly established, as was the role of the pilots’ lack of any prior information about it.

The safe response to the crash would have been to ground all 737 MAX aircraft until a fix for MCAS was ready and proven safe.  My hypothesis is that the key management flaw, once the cause of the crash had been established, was driven largely by the fear of being blamed for the huge cost of grounding all the 737 MAX airplanes.  The public claim from Boeing’s top management was: “everything is under control”, a software fix would be implemented within six weeks, so there was no need to ground the 737 MAX airplanes.  The possibility that the same MCAS flaw would lead to another crash was ignored in a way that can only be explained by top management’s intense fear for their careers.  It doesn’t make sense that the risk was ignored merely to reduce the cost of compensating the victims by continuing to put the responsibility on the pilots.  My assumption is that Boeing’s top executives at the time were not idiots.  So something else pushed them to take the gamble of another crash.

Recognizing the technical flaw forced Boeing to reveal the functionality of MCAS to all airlines and pilot unions.  This included the instruction to shut off the system when MCAS goes wrong.  At the same time, Boeing announced that a software fix would be ready in six weeks, an announcement that was received with a lot of skepticism.  Citing these two developments, Boeing formally refused to ground the 737 MAX aircraft.  When a member of the Allied Pilots Association asked directly, during a visit of a group of Boeing managers (and lobbyists) to the union, the unbelievable answer was: No one has concluded that this was the sole cause of the crash!  In other words: until we have full formal proof, we prefer to continue business as usual.

In fact, the FAA (Federal Aviation Administration) issued a report estimating that, without a fix, there would be a similar crash every two years!  This means there was a 2-4% chance that a second crash would happen within any single month!  How could the FAA allow Boeing to keep all the aircraft flying?  After the second crash occurred, five months later and with MCAS still unfixed, did the FAA ever analyze its own behavior?
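The arithmetic behind that estimate can be made explicit with a small sketch.  It assumes, as a simplification that is mine and not taken from the FAA report, that such crashes occur independently at a constant average rate (a Poisson process), with the rate read off “one similar crash every two years”, i.e. 1/24 per month.  The function name prob_at_least_one is purely illustrative.

```python
import math


def prob_at_least_one(rate_per_month: float, months: float) -> float:
    """Probability of at least one crash in the window, assuming a Poisson
    process (an illustrative assumption): P(N >= 1) = 1 - exp(-rate * months)."""
    return 1.0 - math.exp(-rate_per_month * months)


# Assumption: "a similar crash every two years" -> one crash per 24 months.
RATE = 1 / 24

if __name__ == "__main__":
    for months in (1, 5):
        print(f"{months} month(s): {prob_at_least_one(RATE, months):.1%}")
```

Under this simple model, the one-month figure comes out at about 4%, the upper end of the range above.  The five months that actually passed before the second crash carry a risk of roughly 19%, nearly one chance in five.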

Another fact mentioned in the movie is that once the sensors are out of order and MCAS points the airplane down, the pilots have to shut off the system within 10 seconds; otherwise, the airplane is doomed by the speed of its descent!  I wonder whether this was discussed during the inquiry into the first crash.

When the second crash happened, Boeing’s top management went into panic mode, failing to grasp that the trust of the airlines, and of the public, in Boeing had been lost.  In short: the key lessons from the crash and the post-crash pressure were not learned!  They still didn’t want to ground the airplanes, but now the airlines took the initiative and, one by one, decided to ground them.  A public investigation was launched, and from the perspective of Boeing’s management team, all hell broke loose.

The key point for all management teams: 

Mistakes are unavoidable, even though a lot of effort should go into minimizing them.  But it is UNFORGIVABLE not to update the flawed paradigms that cause them.

If that conclusion is adopted, then a systematic method for learning from unexpected events should be in place, with the objective of “never repeating the same mistake”.  Well, I cannot guarantee that a mistake will never be repeated, but most repeats can be avoided.  In fact, much more can be avoided: once a flawed paradigm is recognized and updated, the ramifications can be very wide.  If the flawed paradigm is discovered from a signal that, by luck, is not catastrophic but is surprising enough to trigger the learning, then huge disastrous consequences are prevented and the organization becomes much more secure.

It is important for everyone to identify flawed paradigms, based on certain surprising signals, and to update them.  It is also possible to learn from the mistakes of other people and organizations.  I hope the flawed paradigm of refusing to see an already visible problem, and trying to hide it, is now well understood, not just within Boeing but by the top management of every organization.  I hope my article can help organizations develop the proper procedures for learning the right lessons from such events.