What should WE learn from Boeing’s two-crash tragedy?

The case of the two crashes of Boeing’s 737 MAX aircraft, less than six months apart, in 2018 and 2019, involves three big management failures that deserve to be studied, so that effective lessons can be internalized by all management.  The story, including some of the truly important detailed facts, is told in the recently released “Downfall: The Case Against Boeing”, a documentary by Netflix.

We all demand 100% safety from airlines, and practically also from every organization: never let your focus on making money cause a fatal flaw!

However, any promise of 100% safety is utopian.  We can come very close to 100%, but there is no way to ensure that fatal accidents will never happen.  The true practical responsibility consists of two different actions:

  1. Invest time and effort to put protection mechanisms in place.  We in the Theory of Constraints (TOC) call them ‘buffers’: they ensure that even when something goes wrong, no disaster happens.  All aircraft manufacturers, and all airlines, are fully aware of the need.  They build protection mechanisms, and very detailed safety procedures, into the everyday life of their organizations.  Due to the many safety procedures, any crash of an aircraft is the result of a combination of several things going wrong together and thus is very rare. Yet, crashes sometimes happen.
  2. If there is a signal that something that shouldn’t have happened has happened, then a full learning process has to be in place to identify the operational cause, and from that identify the flawed paradigm that let the operational cause happen.  This is just the first part of the learning. Next is deducing how to fix the flawed paradigm without causing serious negative consequences.  Airlines have internalized the culture of inquiring into every signal that something went wrong.  Still, such a process could and should be improved.

I have developed a structured process of learning from one event, now titled “Systematic Learning from Significant Surprising Events”.  TOCICO members can download it from the New BOK Papers page of the TOCICO site.  The direct link is https://www.tocico.org/page/TOCBodyofKnowledgeSystematicLearningfromSignificantSurprisingEvents

Others could approach me and I’ll gladly send the paper to them.

Back to Boeing.  I, at least, don’t think it is right to blame Boeing for what led to the crash of the Indonesian aircraft on October 29th, 2018.  All flawed paradigms look, in hindsight, as if everybody should have recognized the flaw, but that expectation is inhuman.  There is no way for human beings to eliminate all their flawed assumptions.  But it is our duty to reveal the flawed paradigm once we see a signal that points to it.  Then we need to fix the flawed assumption, so the same mistake won’t be repeated in the future.

The general objective of the movie, like the habit of most public and media inquiries, is to find the ‘guilty party that is responsible for so-and-so many deaths and other damage.’  Boeing top management at the time was an easy target given the number of victims.  However, blaming top management because they were ‘greedy’ will not prevent any safety issue in the future.  I do expect management to strive to make more money now, as well as in the future.  However, the Goal should include several necessary conditions, and refusing to take the risk of a major disaster is one of them.  Pressing for a very ambitious, short development time, and launching a new aircraft without the need to retrain pilots who are already trained on the current models, are legitimate managerial objectives.  The flaw is not being greedy, but failing to see that the pressure might lead to cutting corners and might prevent employees from raising a flag that there is a problem.  Resolving the conflict between ambitious business targets and dealing with all the safety issues is a worthy challenge that needs to be addressed.

Blaming is a natural consequence of anything that went wrong.  It is the result of a widely spread flawed paradigm, which pushes good people to conceal the facts that might lead to their involvement with highly undesired events.  The fear is that they will be blamed and their career will end.  So, they do their best to prevent revealing their flawed paradigms.  The problem is: other people still use the flawed paradigm!

Let’s see what the critical flawed paradigm(s) were that caused the Indonesian crash.  As is typical, two different flaws combined to cause the crash of the Indonesian plane.  A damaged sensor sent wrong data to a critical new automatic software module, called MCAS, which was designed to fix a problem of too high an angle of attack.  This was a major technical flaw: failing to consider that if the sensor is damaged then MCAS could cause a crash.  The sensors stick out of the airplane body, so hitting a balloon or a bird can destroy the sensor, and this makes the MCAS system deadly.

The second flaw, this time managerial, was deciding not to let the pilots know about the new automatic software. The result was that the Indonesian pilots couldn’t understand why the airplane was going down.  As the sensor was out of order, many alarms sounded, carrying wrong information, and the stick shaker on the captain’s side was vibrating loudly.  To fix that state the pilots had to shut off the new system, but they didn’t know anything about MCAS and what it was supposed to do.

The reason for refraining from telling the pilots about the MCAS module was the concern that it would trigger mandatory pilot training, which would limit the sales of the new aircraft.  The underlying managerial flaw was failing to realize how that lack of knowledge could lead to a disaster.  It seems reasonable to me that the management tried their best to come up with a new aircraft, with improved performance, and no need for special pilot training.  The flaw was failing to see that pilots being unaware of the MCAS module could lead to such a disaster.

Once the first crash had happened, and the technical operational cause was revealed, the second managerial flaw took place.  It is almost natural after such a disaster to come up with the first possible cause that is the least damaging to the Management.  This time it was easy to claim that the Indonesian pilot wasn’t competent.  This is an ugly, yet widely spread, paradigm of putting the blame on someone else.  However, facts coming from the black box eventually told the true story.  Both the role of MCAS in causing the crash, and the role of the pilots not having any prior information about it, were clearly established.

The safe response to the crash should have been grounding all the 737 MAX aircraft until a fix for MCAS was ready and proven safe.  It is my hypothesis that the key management paradigm flaw, after establishing the cause of the crash, was heavily shaped by the fear of being blamed for the huge cost of grounding all the 737 MAX airplanes.  The public claim from Boeing top management was that “everything is under control”: a software fix would be implemented in six weeks, so there was no need to ground the 737 MAX airplanes.  The possibility that the same flaw of MCAS would lead to another crash was ignored in a way that could be explained only by top management being in great fear for their careers. It doesn’t make sense that the reason for ignoring the risk was just to reduce the costs of compensating the victims, by still putting the responsibility on the pilots.  My assumption is that the top executives of Boeing at the time were not idiots. So, something else pushed them to take the gamble of another crash.

Realizing the technical flaw forced Boeing to reveal the functionality of MCAS to all airlines and pilot unions.  It included the instruction to shut off the system when MCAS goes wrong.  At the same time, they announced that a software fix to the problem would be ready in six weeks, an announcement that was received with a lot of skepticism.  Relying on these two developments, Boeing formally refused to ground the 737 MAX aircraft.  When directly asked by a member of the Allied Pilots Association, during a visit of a group of Boeing managers (and lobbyists) to the union, the unbelievable answer was: No one has concluded that this was the sole cause of the crash!  In other words, until we have full formal proof, we prefer to continue business as usual.

Actually, the FAA, the Federal Aviation Administration, issued a report assessing that without a fix there would be a similar crash every two years!  This means there was a 2-4% chance that a second crash could happen within any one month!  How come the FAA allowed Boeing to let all the aircraft fly?  Did they carry out an analysis of their own behavior after the second crash occurred, five months later, with the MCAS system still not fixed?
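
The arithmetic behind that monthly chance can be checked with a minimal sketch.  It assumes, purely for illustration, that crashes arrive as a Poisson process at the rate quoted above (one crash every two years).  That model, and the names CRASHES_PER_MONTH and prob_at_least_one_crash, are my assumptions and are not taken from the FAA report.

```python
import math

# Assumption (not from the FAA report itself): crashes arrive as a Poisson
# process at the article's stated rate of one crash every two years.
CRASHES_PER_MONTH = 1 / 24


def prob_at_least_one_crash(months: float,
                            rate_per_month: float = CRASHES_PER_MONTH) -> float:
    """Probability of at least one crash within `months` under a Poisson model."""
    return 1 - math.exp(-rate_per_month * months)


# Chance of a second crash within one month, and within the five months
# that actually passed without a fix.
p_one_month = prob_at_least_one_crash(1)
p_five_months = prob_at_least_one_crash(5)

print(f"Within 1 month:  {p_one_month:.1%}")
print(f"Within 5 months: {p_five_months:.1%}")
```

Under these assumptions the one-month figure comes out at about 4%, the upper end of the range quoted above, and the chance of another crash within five months is close to one in five.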

Another fact mentioned in the movie is that once the sensors are out of order and the MCAS points the airplane down, the pilots have to shut off the system within 10 seconds, otherwise the airplane is doomed due to the speed of the descent!  I wonder whether this was discussed during the inquiry into the first crash.

When the second crash happened Boeing top management went into fright mode, failing to understand that the trust of the airlines, and of the public, in Boeing had been lost. In short: the key lessons from the crash and the after-crash pressure were not learned!  They still didn’t want to ground the airplanes, but now the airlines took the initiative and one by one decided to ground them.  A public investigation was initiated and, from the perspective of Boeing’s management team, all hell broke loose.

The key point for all management teams: 

It is unavoidable to make mistakes, even though a lot of effort should be put into trying to minimize them.  But it is UNFORGIVABLE not to update the flawed paradigms that cause the mistakes.

If that conclusion is adopted, then a systematic method for learning from unexpected cases should be in place, with the objective of “never to repeat the same mistake”.  Well, I cannot guarantee it’ll never happen, but most of the repeats can be avoided.  Actually, much more can be avoided, as once a current flawed paradigm is recognized and the paradigm updated, the derived ramifications can be very wide.  If the flawed paradigm is discovered from a signal that, by luck, is not catastrophic, but surprising enough to initiate the learning, then huge disastrous consequences are prevented and the organization is much more secure.

It is important for everyone to identify, based on certain surprising signals, flawed paradigms and update them.  It is also possible to learn from the mistakes of other people or organizations. I hope the key paradigm of refusing to see a problem already visible, and trying to hide it, is now well understood not just within Boeing, but within the top management of every organization.  I hope my article can help to come up with the proper procedures for learning the right lessons from such events.


Published by

Eli Schragenheim

My love for challenges makes my life interesting. I'm concerned when I see organizations ignore uncertainty and I cannot understand people blindly following their leader.

7 thoughts on “What should WE learn from Boeing’s two-crash tragedy?”

  1. Will have to disagree with your take on the responsibility of Boeing management for the Indonesian crash. It’s been a while since I saw that documentary but I remember clearly that the documentary doesn’t blame Boeing management for failing to make a crash-proof plane but specifically for creating a culture where reporting problems is discouraged while ignoring issues and cutting corners is encouraged. Greed is not the alleged culprit for the downfall; incompetence and recklessness are shown as the culprits, as evidenced by Boeing’s declining market share against Airbus (which shows the opposite of greed in my view.) The documentary, through interviews of long-time employees/whistleblowers, notes how Boeing started originally as an engineering-led organization, during which period it was most successful, and then morphed into a stock-price & finance-led organization. The documentary also notes the specific time when this change began, i.e. that unnecessary merger. In my view these decisions made by management were specific choices, not merely a reaction to external pressure and not a result of a flawed paradigm. Of course Boeing kept losing to Airbus but management still kept it on the same path. The same goes for all decisions subsequent to the crashes, where Boeing management managed to make the worst of all possible decisions at each turn, from blaming pilots to attempts at hiding information from the US congressional committee. Boeing understood clearly that pilot training would be required due to MCAS yet it lied anyway to airlines. This in my view is plain old fraud, not a flawed paradigm or inadequate understanding of risks. All these seem to be decisions not of someone under pressure of circumstances or someone operating under a flawed assumption, but of arrogance and negligence. Management clearly understood the risks and went along this path.
    Further, the external environment is not oppressive for management; on the contrary, it is too forgiving. Even after this fiasco Boeing received a 17 billion $ bailout from the US taxpayer while total penalties and damages for this fiasco do not exceed 5 billion USD. There is practically no penalty, considering the size of the golden parachute, for the CEO for running this company into the ground (in spite of being part of a duopoly).
    In short this fiasco is not result of an intolerant environment demanding 100% safe planes but rather very lax environment which breeds and rewards wilful negligence and incompetence with no downside for actual decision makers.

    Anyway, looking forward to reading your research paper.


    1. Dear Vedant, I’ve noticed the interviews. As I didn’t carry out any inquiry I don’t know how to relate to the evidence of several past employees. I just think that airlines are very sensitive to safety problems with their planes and they can afford to demand that Boeing fix them. Was there a surge of complaints from airlines? Not every safety issue causes a crash, but airlines, and all pilots, raise safety issues every time they see a signal that something is wrong. Were there such signals on a variety of issues? The documentary does not tell us. So, it seems the first crash was the first signal that there was something truly wrong. But, if the whole managerial culture was based on rejecting any probable safety issue, then I’d expect there would have been a surge of complaints.
      So, for me, I’m not sure the development of the 737 MAX was done without full managerial support for dealing with every potential safety issue.
      What happened after the first crash was, for me, the true starting point for learning from the huge failure of dealing with a disaster. Here the management definitely made several major mistakes that every manager should learn the right lesson from.


      1. Though my memory is hazy I do remember the documentary devoting a fair bit of time to exposing an internal culture where safety is ignored to satisfy performance metrics. There was one particular scene where an undercover reporter tells an employee that one of his coworkers didn’t perform some necessary procedure in order to satisfy performance numbers, and the employee’s face becomes completely aghast with shock upon hearing this. Apart from the documentary, this issue of internal culture has been widely reported in the media. See this one of many examples:-

        https://iamamalaysian.com/2020/01/16/this-plane-was-designed-by-clowns-who-are-supervised-by-monkeys/

        From the above link:-
        “According to more than 100 pages of internal company communications (which were apparently withheld from the FAA during the certification process for the jet) Boeing employees could be heard mocking federal rules, openly discussing their deception of regulators, and joking about the MAX’s potential flaws.”

        I will acknowledge my bias here. I have spent considerable time trying to find the answer to “Why has TOC not seen far more widespread acceptance than it currently has?”. My current answer for this is simply that we have an external environment (mostly the economic and political environment) which rewards incompetence, negligence and reckless behavior. I talked about Boeing’s bailout but that’s only one example. You are probably aware that Uber’s and WeWork’s founders made billions in spite of building very inefficient and dysfunctional organizations. The same goes for the banks of the US and Europe. The same goes for a number of other industries such as healthcare, insurance, etc. (regarding healthcare, I recommend two documentaries called “Bleeding edge” and “crime of the century” in case you are interested).
        Thus, it is possible that the Boeing documentary fed into my preconceived conclusion and I gave more weight to cultural issues than they deserve.


        1. Vedant, it is so EASY to blame executives for being idiots, or clowns supervised by monkeys. In all my years as a consultant and educator I’ve met only one, or two, executives that I thought were idiots.
          I’m also aware that past employees, being bitter about not reaching the status they thought they deserved, and maybe being laid off, can easily find evidence of great stupidity of top management. My advice to you is to be careful about accepting such allegations.
          This doesn’t mean great mistakes are not made. The current culture of judging everything based on actual results, ignoring the role of uncertainty, forces managers to stick to the performance measurements and just hope for the best. This vicious cycle is what we should find a way to break.
          Again, if Boeing is such an unsafe company, how come all the airlines, including American, United, Lufthansa and Turkish, are still using Boeing planes?


          1. The comment about monkeys was made by an employee who was working at Boeing at that time, not some ex-employee. Further, this was an email by one employee to another employee and was subsequently uncovered in the FAA investigation, so I don’t think that it is a case of bitterness on behalf of employees. Further, this exchange between the 737 MAX test pilots is from the same link as I provided above:-

            “Would you put your family on a Max simulator trained aircraft? I wouldn’t,” one employee said to a colleague in another exchange from 2018, before the first crash. “No,” the colleague responded.”

            The article contains many more of such details.

            Regarding airlines, I would like to point out that the airlines were the ones who grounded the planes, well before the regulatory authorities. Airlines were misled by Boeing on safety and Boeing was charged with fraud by the authorities. However, Boeing is part of a duopoly and airlines don’t really have many options except to continue doing business with Boeing in the short term. Boeing is losing market share to Airbus very fast, which means that airlines are indeed looking for a Boeing replacement.

            I am well aware of my bias but be assured that calling executives idiots, incompetent or reckless is as unpalatable for me as for you. Let me clarify my view once again. My view is that the broader economic and political environment is the underlying cause which breeds such behavior, and that it is not just a matter of one or two executives or companies. Please read the linked article in whole if you can spare the time; it will certainly give you the necessary information to understand this fiasco.

            P.S.: The linked article is itself based on a New York Times article. I didn’t link the original New York Times article because it is behind a paywall but here is the link if you can access it:


            1. Vedant, let me assume now that the management of Boeing had neglected the warnings leading to the first crash, because of being obsessed with having to produce superior results. There is a lesson from that, but it cannot be simply to consider every small problem somebody is warning about and delay the project because of it. So, more comprehensive thinking has to be carried out to create new safety mechanisms that ensure such a breach of safety would not happen again.
              I’m still troubled by what happened AFTER the crash. I can only explain it by the top management’s feeling of being under threat. This is another cause of managerial dysfunction that needs clear learning.
              I think my generic article on Systematic Learning from Surprising Events should be used to learn from two major mistakes:
              1. How to keep fast flow and reliable launch of new aircraft, without neglecting the safety issues.
              2. How should management behave when a disaster happens, without having to feel threatened, as this could lead to more disastrous results.


  2. I also watched that documentary and another one that discussed the history of Boeing before and after the merger with McDonnell Douglas. It said that after the merger, the Boeing execs were replaced with the McDonnell Douglas execs. And immediately after that, engineers started complaining that the new execs were pressuring them to sacrifice standard engineering best practices/policies in order to meet production quotas. And these engineers explained that Boeing never did that. One of the standard policies that was sacrificed is that all primary systems must have at least one secondary/backup system. When Boeing made the new plane (by redesigning an old one), the MCAS system was created (a primary system), but with no secondary/backup system. In my view, a core aspect of this problem was the bad culture/ideas of the execs where they are ok with sacrificing engineering standards in order to meet production quotas (while they are not engineers and they don’t know what it would mean to not have a backup system to MCAS, and I bet they did not ask the engineers).

