What makes manufacturers unhappy about their ERP systems


After you’ve invested hundreds of thousands in your own ERP system, you certainly don’t want to discover that a much better alternative exists on the market. Fortunately for the vendors, replacing an ERP is hard for an SME, and large clients dislike it even more. Still, an ERP company’s reputation matters for winning new implementations and for adding more value to existing clients, through applications that either add a newly required capability or solve a current problem.

The simple truth is that most ERP systems are necessary for running businesses, yet they fail to deliver the full expected value to all clients: being in good control of the flow that generates the revenues.

I’m looking forward to your comments here or on my Linkedin post.

Fundamental Default Gap 

When I encounter an ERP system for the first time, I first check whether it helps its users overcome Two Critical Challenges. With rare exceptions, a manufacturer that cannot handle these challenges is inherently fragile; external factors merely determine when the break occurs.

All organizations have to make commitments to their clients. Part of such a commitment is the amount of the product or service to be supplied and its overall quality.  Another important part is the time frame for the delivery. There are two broad possibilities for the time frame: either immediately whenever the client asks for it, or a promise for a specific date in the future.

This very generic description defines Two Critical Challenges:

  1. WHAT TO PROMISE: what can we promise our clients, while ensuring a good chance of meeting all commitments?
  2. HOW TO DELIVER WHAT WAS PROMISED: once we have committed to a client, how can we ensure fulfilling the delivery on time and in full? 

It may seem that if you have a good answer to the first challenge, then delivery is pretty much ensured. Well, consider Murphy’s law: “Anything that can go wrong will go wrong.” We face a lot of uncertainty, so following the plan is never a straight walk: many things go wrong along the way. That doesn’t mean you cannot deliver what you committed to; it means you need to identify problems quickly and have the means to deal with them while keeping the original commitment intact.

This article focuses on manufacturing companies, even though distribution companies face similar needs. Actually, most service companies also have similar issues with making commitments and then meeting all of them.

The potential value of any ERP package for manufacturing is to provide the necessary information, based on the actual data, to lead Operations to do what is required to deliver all sales-orders on time and in full, without incurring excessive cost. In other words: ERP should support the smooth and fast flow of goods, preferably throughout the whole supply chain, and at the very least manage the flow from the immediate vendors to the immediate clients.

When properly modelled and used, current ERP systems give production planners easy access to all the data about every open manufacturing order and the stock level of every SKU. The data also covers the processing and setup times for every work center, again provided the user has entered them properly. So, capacity calculations can be performed with current technology.

Question for ERP brands and developers: 

How does that huge amount of data, currently collected by your ERP, help to overcome the Two Critical Challenges for your customers?

If this question sounds too rhetorical, I encourage manufacturers to share their experience of how they adapted to work around the gap they inherited, by default, with their chosen ERP package.

Manufacturing organizations seem very complex. While outlining the whole process from confirming a sales-order until delivery isn’t trivial, the ERP tools are good enough to handle that level of complexity. But synchronizing all the released manufacturing orders, which compete for the capacity of resources, is a major problem. Consequently, even though timely delivery of one particular high-priority order is not a problem, achieving an excellent score on OTIF (on-time, in-full) seems very complicated.

So, in order to answer the two challenges, there is a need for good control of the capacity of resources.

Promise, then deliver

Measuring capacity is not trivial. Just having to deal with setups adds considerable complexity. During the 90s, powerful computers able to crunch millions of data items gave life to the idea of creating an optimized schedule: one that takes into account all the open sales-orders, goes through each order’s routing, and considers the available capacity at the right time. This became a new wave of software called APS (advanced planning and scheduling) systems. These systems were supposed to yield the perfect plan, meaning it could be executed in a straightforward way, accomplishing all of its objectives. If this had really worked, the second challenge would have been solved as well.

However, the APS systems eventually failed. Some claim they can still be used for inquiring about what-if scenarios. The problem is: an APS can tell you what definitely won’t work, for instance because of a lack of capacity on just one resource, but it failed to predict the safe achievement of all commitments, so its main value was, at most, quite limited.

The reason for the failure of all the APS systems is that, on top of the complexity of managing the capacity of many resources, there is considerable uncertainty, and any occurrence of a problem would mess up the “optimal” plan.

Unlike the APS programs, the development of ERP aimed at integrating many applications using the same database, without considering capacity limitations and without striving for the ultimate optimal solution. Some ERP programs have widened their ability to model more and more complexity; others still go with the basic structure.

Even when we have excellent data and effective ERP tools, managing the uncertainty is quite a tough task. It is always tough – no matter the plant type or industry – because of the complexity of monitoring the progress of so many manufacturing orders. Every production manager struggles with the ongoing need to decide which work order has to be processed right now. This also means that processing other work orders is delayed. When market demand fluctuates, when problems with the supply of materials occur, or when machine operators are absent, the production manager must have a very clear set of priorities, and certain flexibility with time, stock, and capacity, to be able to respond immediately to any new problem. The objective is still valid: delivering everything according to the commitments to the clients.

Note that if we come up with a solution to the second challenge (how to deliver what was promised), then we might also understand better what truly limits our offering (and commitment) to the market. Once we know that, we can come up with an effective planning scheme in which every single promise made by our salespeople is pretty safe.

This is where the insights of the Theory of Constraints (TOC) come to our rescue.

Key insight #1:
Only very few resources, usually just one, truly limit the output of the system.  

The recognition of the above statement effectively simplifies monitoring the flow. In TOC we call that resource the ‘constraint.’ Certainly, the capacity of the constraint should be closely monitored. Several other resources should also be monitored, just to be sure that a sudden change in the product mix won’t move the ‘weakest link’ to another resource. The vast majority of the other resources have much less impact on the flow, as they have some excess capacity, which can be effectively used to fix situations when Murphy causes a local disruption. 

One additional understanding from that key insight: the limited capacity of the constraint is what can effectively be used to predict a safe date by which the company can commit to deliver. More on this is explained later.
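
Below is a minimal sketch of how one might spot the constraint and the next most loaded resources from load data; the resource names, load figures, availability, and the 85% “watch” cutoff are illustrative assumptions, not a prescribed rule.

# Total load (hours of work in the current backlog) per resource - illustrative numbers
load_hours = {"CNC-5": 172, "Paint": 138, "Assembly": 95, "Packing": 60}
available = 160.0  # hours each resource can offer over the same horizon (assumed equal here)

utilization = {r: h / available for r, h in load_hours.items()}
constraint = max(utilization, key=utilization.get)
watch_list = [r for r, u in utilization.items() if u >= 0.85 and r != constraint]

print(constraint)  # 'CNC-5' - the likely constraint, loaded above its availability
print(watch_list)  # ['Paint'] - loaded enough to monitor in case the product mix shifts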

Key insight #2:
An effective plan has to include buffers to protect the most important objectives of the plan.

Buffers could be time, stock, excess capacity, excess capabilities, or money.

In manufacturing we can distinguish between make-to-order (MTO) and make-to-stock (MTS). The vast majority of manufacturing organizations produce both for order and for stock; sometimes the same work-order contains quantities promised for delivery on certain dates (MTO) while the production batch also includes items to cover future demand (MTS). This creates quite a lot of confusion and makes the life of the production manager, who is required to locate all the parts that belong to a specific sales order, a never-ending nightmare.

You may be surprised, but even the most popular ERP systems do not differentiate between MTO and MTS. If you wonder why a company that is obviously MTS manages its production the MTO way, check their ERP system’s default (and watch the follow-up video below for more explanation).

My team discovered one ERP system, Odoo, that distinctly differentiates between MTO and MTS. This distinction makes Odoo a strong contender as a platform on which to build the features that address the key insights. I’m excited to see how effectively these features already respond to the Two Critical Challenges, and I look forward to discovering what more we can achieve in the future. Generally speaking, this capability is possible with other ERP systems as well.

Every MTO order has a date that is a commitment. Considering the uncertainty, Production needs enough time to overcome the various uncertain incidents along the way, including temporary peaks of load on non-constraints, quality problems, delays in supply, and many more. This means starting production early enough before the due date to be confident that the order will be completed on time. The time given to Production with good confidence is called the Time-Buffer, and in manufacturing it includes the net processing time, because in the vast majority of manufacturing environments the ratio of net processing time to the actual production lead time is less than 10%. Thus, the order release date should be: due date minus the time-buffer days. We highly recommend not releasing MTO orders before that time; otherwise significant temporary peaks on non-constraints will be created.
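
A minimal sketch of the release-date rule described above, in Python; the function name and the example numbers are illustrative, not taken from any specific ERP.

from datetime import date, timedelta

def mto_release_date(due_date: date, time_buffer_days: int) -> date:
    # Release an MTO order exactly one time buffer before its committed due date
    return due_date - timedelta(days=time_buffer_days)

# Example: an order due on 2024-06-28 with a 15-day time buffer
print(mto_release_date(date(2024, 6, 28), time_buffer_days=15))  # 2024-06-13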

Make-to-stock requires maintaining a stock buffer. The stock buffer is defined as the on-hand stock plus the open manufacturing orders for that item. Thus, when a sale automatically triggers the creation of a manufacturing order for the same SKU, the stock buffer is kept intact.

Key insight #3:
The status of the buffers provides ONE clear priority scheme!

In TOC we call it Buffer Management. The idea is to define the status of a buffer as the percentage of it that is left. As already mentioned, in manufacturing the net processing time (touch time) of an order is a very small fraction of the actual production lead time. Most of the production time is spent waiting for the work-centers to finish previous orders. Thus, if a specific order becomes top priority, the wait time for that order is cut significantly, and so is its lead time.

MTO orders use time buffers, while, as explained earlier, MTS orders use stock buffers. Ideally, we should monitor both MTO and MTS orders in the same queue, as you can see in Picture 1 below. When only one-third or less of the time buffer is left until the delivery date, or the on-hand stock is one-third or less of the stock buffer, the buffer status of that order is considered RED, meaning it gets top priority. Once a red order gets top priority, its wait time is dramatically cut. A production manager facing a list of several red orders can decide to take expediting actions to ensure a fast flow of the red orders, so all of them are completed by their due dates.

For MTS orders the buffer status is the percentage of the on-hand stock relative to the stock buffer, which also includes the WIP (work-in-process). Following this one priority scheme, using both stock and time buffers, significantly improves the probability of excellent delivery performance. This works mainly when some level of excess capacity exists, even on the constraint, and more so on the few other relatively highly loaded resources. However, when demand goes up, at some point the number of RED orders increases sharply. When this happens, it sends a clear warning: there is no way to meet the commitments without a significant increase in capacity. We can call this kind of warning “too much Red.”
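
A minimal sketch of the single priority scheme described above (field names, the example orders, and the YELLOW threshold are illustrative assumptions; only the one-third RED line comes from the text):

from dataclasses import dataclass

@dataclass
class Order:
    name: str
    kind: str                  # "MTO" or "MTS"
    days_left: float = 0.0     # MTO: days remaining until the committed delivery date
    time_buffer: float = 1.0   # MTO: full time buffer in days
    on_hand: float = 0.0       # MTS: current on-hand stock
    stock_buffer: float = 1.0  # MTS: stock buffer (on-hand plus open manufacturing orders)

def buffer_remaining(o: Order) -> float:
    # Fraction of the buffer still left: 1.0 = untouched, 0.0 = fully consumed
    return o.days_left / o.time_buffer if o.kind == "MTO" else o.on_hand / o.stock_buffer

def status(o: Order) -> str:
    r = buffer_remaining(o)
    if r <= 1 / 3:
        return "RED"     # top priority - expedite
    return "YELLOW" if r <= 2 / 3 else "GREEN"

orders = [
    Order("SO-101", "MTO", days_left=3, time_buffer=12),
    Order("SKU-7", "MTS", on_hand=40, stock_buffer=90),
    Order("SO-102", "MTO", days_left=9, time_buffer=12),
]

# One queue for both MTO and MTS: the most consumed buffers come first
for o in sorted(orders, key=buffer_remaining):
    print(o.name, status(o), round(buffer_remaining(o), 2))

# A sharply rising share of RED orders is the "too much Red" warning
red_share = sum(status(o) == "RED" for o in orders) / len(orders)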

List of manufacturing orders prioritized by Buffer consumption (screenshot from TOC app in Odoo)
In Buffer Management, penetration of the red-line initiates expediting of a particular manufacturing order (and related work orders). When the number of RED orders – those that have crossed the red-line – goes up sharply, a true bottleneck is emerging. (Picture 1)

Key Insight #4:
Monitor the size and the trend of the Planned-Load of the constraint and of a few other highly loaded resources.

The Planned-Load of a specific critical resource is the total number of hours that resource requires to process all the confirmed sales-orders. It is calculated simply by going through the whole backlog of orders and adding up the hours of work that resource needs to process them. The most critical resource, the constraint, is expected to show the highest number of hours required to process all the existing orders. The Planned-Load can also be expressed as the date when we expect the critical resource to finish processing all the confirmed demand.
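
A minimal sketch of the Planned-Load calculation described above (the backlog data and the daily availability figure are illustrative assumptions):

from datetime import date, timedelta

# (order id, hours required on this critical resource), one entry per confirmed order
backlog = [("SO-101", 6.5), ("SO-102", 4.0), ("SO-103", 9.0), ("SO-104", 3.5)]
HOURS_PER_DAY = 8.0  # assumed daily availability of the critical resource

planned_load_hours = sum(hours for _, hours in backlog)  # 23.0 hours
planned_load_date = date.today() + timedelta(days=planned_load_hours / HOURS_PER_DAY)
print(planned_load_hours, planned_load_date)  # the date the confirmed demand is expected to be finished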

Note two critical advantages from the Planned-Load that benefit and align Production and Sales:

  1. We get an accurate prediction of the lead-time for a new order!  

When a new order appears in the backlog, in most cases it will be processed by the critical resource only after all the existing demand (confirmed orders) has been processed. When we add to the Planned-Load some extra time (usually half of the time buffer for that order), covering the processing on the constraint and the rest of the routing, we get a safe date we can commit to (see the sketch after this list).

  2. Watching the trend of the Planned-Load provides us with signals on the overall trend of the market.  

The Planned-Load should be recalculated every day. The difference between today’s Planned-Load and tomorrow’s is that the orders processed by the resource today disappear from the calculation, while new orders arriving today are added to it. When the Planned-Load of the constraint increases (see the right screen in Picture 2), more orders are coming in than the constraint has been able to process. If such a trend is consistent for some time, it might mean a bottleneck is emerging: either you won’t be able to deliver on time (and suffer customer dissatisfaction), or you will be forced to increase the lead time (and consequently lose some business when your competition remains faster). When the trend goes down (as shown in the left screen in Picture 2), fewer orders are being received from the market, and then it is possible to deliver faster.
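
A minimal sketch of the two uses above, quoting a safe date from the Planned-Load and watching its day-to-day trend; the dates, buffer, and load history are illustrative assumptions.

from datetime import date, timedelta

def safe_promise_date(planned_load_finish: date, time_buffer_days: float) -> date:
    # Safe commitment: Planned-Load finish date plus roughly half the order's time buffer
    return planned_load_finish + timedelta(days=time_buffer_days / 2)

print(safe_promise_date(date(2024, 7, 10), time_buffer_days=12))  # quote 2024-07-16 or later

# Trend: a Planned-Load (in hours) that keeps rising warns that a bottleneck is emerging
daily_planned_load = [152, 155, 161, 168, 175]  # recalculated every day
if all(later > earlier for earlier, later in zip(daily_planned_load, daily_planned_load[1:])):
    print("Warning: the load on the constraint keeps growing - consider adding capacity")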

Planned Load graph of the Capacity Constrained Resource generated by TOC app in Odoo.
The trend of the Planned Load of the Capacity Constrained Resource (CCR) is an immediate indication whether the CCR will become a bottleneck or whether sufficient protective capacity still remains (Picture 2).

Depending on the trend, the company should trigger a proper managerial initiative:

  • either to increase the capacity of the constraint, and possibly also the capacity of one or more of the other critical resources, as we don’t want them to become bottlenecks,
  • or to find ways to attract more orders and more new customers. Note that a less loaded constraint means a shorter lead time. In markets where supplier response time and reliability matter a lot, a shorter lead time attracts new orders. This means that if Sales and Operations activities are synchronized, any drop in demand could be just temporary. 

Balancing carefully between demand and capacity

The combination of monitoring both the number of red orders and the Planned-Load yields a powerful piece of information on the stability of the organization, regarding the sensitive balance between demand and capacity.  The advantage of Buffer Management is that it doesn’t depend on the quality of the vast majority of the ERP data, only on the consumption of time or stock. The advantage of the Planned-Load is its ability to issue a warning earlier than Buffer Management, giving the managers more time to react, including the option to add temporary capacity.

The actual value of combining Buffer Management and the Planned-Load has been thoroughly checked using the MICSS simulator, which I developed during the 90s for testing various production policies and their impact on the business. When market demand starts to grow, after some time the number of red orders suddenly increases. At that point the current delivery performance is still adequate. But after running the simulation for one or two more weeks, the disaster in delivery performance is clearly seen. I hope and wish your factory’s reality is much better than that!

The four key insights can be used to vastly improve the value, and skyrocket the ROI, of any modern ERP system, while still using most of the capabilities and algorithms of the original ERP.

My team and I at Enterprise Space, Inc. are determined to continue adding new algorithms and data visualisations to an ERP system that focus managers on the truly critical issues.

The next phase of value to customers will be detailed in a subsequent article: how to support decisions when additional new sales initiatives are evaluated, predicting the net impact of those decisions on the bottom line, taking revenues, cost, and capacity into account, and considering also the level of uncertainty.


Fighting Uncertainty as a Critical Part in Managing Organizations

By Eli Schragenheim and Albert Ponsteen

The article is based on the webinar “Fighting Uncertainty in Organizations, Including Matrix Ones, to Achieve Excellent Reliability.”

This article explores managing uncertainty in organizations. Although the key ideas are generic, special emphasis is on multi-project environments. Drawing from Dr. Goldratt’s philosophy, it emphasizes knowing something but never everything. The authors spotlight overlooked “common and expected” uncertainties like equipment malfunctions and seasonal variations. These seemingly minor uncertainties can cumulatively impact performance and disrupt detailed planning. The “Domino Effect” describes how these uncertainties can combine and amplify challenges. The article advocates for adaptive strategies and the use of buffers as protective measures against unpredictable organizational challenges.

1. The Philosophy of Knowing and Not Knowing

Never Say ‘I Know’ and Never Say ‘I Don’t Know’: You always know something but never everything.

The principle of never saying “I know” or “I don’t know” comes from a nuanced understanding of the constant uncertainty we live in. The founder of TOC, Dr. Goldratt, emphasized the point of “never say I know” as a reminder to stay humble about our knowledge. However, he was equally insistent that we usually know something about the subject matter. This balance between acknowledging our limitations and recognizing our abilities is essential in navigating uncertainty. It helps us begin somewhere rather than getting paralyzed by what we don’t know.

“When Dr. Goldratt came to me and asked, ‘Eli, how long will it take you to develop a new feature’, I hesitated. I was well aware that part of my code was complex and any addition would be risky. My first inclination was not to give him a number. But Goldratt insisted, ‘At least you can tell me whether it’s closer to two hours or two years.’ That’s when it clicked for me. He didn’t need a precise number; he needed a sense of the time frame. So, I told him, ‘It will not be more than two weeks.’ And he replied, ‘OK, this is what I wanted to know’.” Eli Schragenheim

This paradigm shifts our perspective on how to approach uncertainty. It equips us with the mindset to acknowledge that while complete knowledge is unattainable, we must not let our gaps in knowledge deter us from making decisions or taking actions. It is this very mindset that shapes our approach to tackling uncertainty in organizations.

Read more at:

https://elischragenheim.com/2016/01/09/never-say-i-know-and-the-limitations-of-our-reasonable-knowledge/

2. Identifying the Huge Impact of Common and Expected Uncertainty

In the preceding section, we laid the groundwork for understanding the nature of common and expected uncertainty. Because these uncertainties are considered “part of the job,” there is a tendency to underestimate their impact. The mistake here is viewing them as isolated incidents rather than cumulative forces that can weigh heavily on the organization’s performance and planning. Here, we will dive deeper into understanding its mammoth impact on organizational planning and performance. Although the emphasis is on multi-project environments, the key ideas are generic.

The Incidents That Don’t Surprise Us

Often, these are the uncertainties that are most overlooked precisely because they are so familiar. They range from employee turnover and equipment malfunctions to seasonal variations in sales.

“I was approached by a company to investigate why a very important project that should have taken one year actually took five years. Management clearly did not think this was ‘common and expected’ uncertainty; they would have tolerated it for maybe two years. But the professionals who executed the project considered it a huge success. They said, ‘In the U.S., they have been working on it already for more than 10 years, and they are not close to what we have achieved.’ This was enough for me to deduce that the team had set a one-year plan, knowing it was excessively optimistic, because they were concerned that if they had said it could take five, or even ten, years, the project would never have started.” Eli Schragenheim

The lesson is: when you ask for a clear one-number estimate, without defining the level of common and expected uncertainty, you either might face big surprises or face frequent occurrences where the project or mission finishes exactly on time. The latter case actually means that the mission could easily have finished earlier than planned. Both cases cause severe damage to the performance of the organization.

The Impracticability of Detailed Planning

Common and expected uncertainty often messes up the execution of a detailed plan, making the original plan almost useless after a relatively short time. The more complex the plan, the more vulnerable it is to disruption from these routine uncertainties. Adaptive planning becomes crucial here. While having a plan is essential, the ability to adapt and modify it in real time is invaluable. The problem is that the intended outcomes and commitments to the market might be negatively affected, harming the reputation of the organization. This section discusses the challenges of planning in an environment filled with common uncertainties and suggests more dynamic, adaptable approaches, which most of the time contribute to achieving the desired objectives.

For instance, suppose the output of the project requires five different inputs from five different teams. What is the probability that the project will finish on time? Any delay in any input would delay the whole project, no matter how early the other teams submitted their inputs. So, what date should you offer reliably? 
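
A tiny illustration of why such integration erodes reliability; the 90% per-team on-time probability is an assumed figure for the example.

# If each of the five teams independently delivers on time with probability 0.9,
# the project is on time only if all five are on time.
p_team = 0.9
p_project = p_team ** 5
print(round(p_project, 2))  # ~0.59 - barely better than a coin flip, despite reliable teams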

The Domino Effect

One of the most underestimated aspects of common and expected uncertainty is how it can accumulate. An employee’s unexpected sick day may not seem that impactful alone. But combine that with a minor delay in raw material delivery and throw in a sudden but not surprising market fluctuation, and you have a perfect storm. This is the Domino Effect in action: individual uncertainties, seemingly manageable on their own, can combine to produce an outcome far worse than any of them could individually. The cascading impact can be particularly detrimental to delivery performance, disrupting schedules and damaging reputation.

The above example of the five teams (see the end of the previous section) demonstrates a typical situation where the lateness of one task is not compensated by other tasks that finish early. This is the “integration effect.” Another situation that can trigger a domino effect is when the output of a task, while on time, fails to deliver the planned quantity; the subsequent operation then has only limited inputs to work on, creating a negative effect throughout the chain.

Parkinson’s law states that work expands to fill the time allotted for its completion. Practically, it means that when a planned task is delayed due to uncertainty, the delay is not compensated by other tasks finishing early. This behavior causes much of the Domino Effect in projects, on top of the two situations above that generate an accumulation of delays.

3. Including Protection Mechanisms, Like Buffers, Against Uncertainty

In the previous section, we highlighted the underappreciated role of common and expected uncertainties in project management. This chapter digs deeper into a powerful tool for mitigating their impact: buffers. Beyond just time, buffers can cover finances, manpower, and more.

Everyday Strategies to Counter Uncertainty

Just as project managers use buffers to navigate uncertainties, we employ similar tactics in our daily lives. These aren’t complex plans but simple, intuitive steps that serve as safety nets for the unexpected.

  • Catching a Flight (Time Buffer). When you need to catch a flight, you usually aim to reach the airport well before the departure time. This accounts for potential traffic delays, long security lines, or other unforeseen events.
  • Preparing Dinner (Resource & Quantity Buffer). When hosting a dinner, you might buy extra ingredients. This accounts for the possibility of some ingredients getting spoiled, the recipe requiring more than expected, or perhaps an unexpected guest arriving.
  • Saving Money (Financial Buffer). Financial advisors often recommend keeping an emergency fund to cover sudden, unforeseen expenses like a broken appliance or medical emergency. This is different from saving money for investment purposes, which is about growing your financial resources over time. The emergency fund acts as an immediate financial buffer, providing peace of mind and stability in case of unexpected setbacks.
  • Carrying an Umbrella (Risk Buffer). Even if the forecast suggests only a 10% chance of rain, you might carry an umbrella when heading out for the day, just in case.
  • Keeping a Spare Tire (Risk & Resource Buffer). Most vehicles come with a spare tire. Even if you never expect a flat, it’s there as a buffer against that potential problem.
  • Having Insurance (Risk Buffer). Be it health, car, or home insurance; the idea is to have a buffer against unexpected damage or health issues.
  • Dressing in Layers (Flexibility Buffer). If you’re uncertain about the weather, you might dress in layers. This way, you can adapt to a warmer or colder environment by adding or removing clothing.
  • Backup Power/Charger (Resource Buffer). Carrying a power bank when out for long hours ensures that even if your phone’s battery depletes faster than expected or if you use it more than usual, you won’t be left without a functioning device.
  • Learning Additional Skills (Skill Buffer). People often upskill or learn things outside of their primary profession. This not only helps in personal growth but also acts as a buffer in changing job markets or if one decides on a career shift.

In short, these everyday buffers help us manage uncertainties and offer peace of mind, reinforcing that while we can’t foresee all events, we can prepare for many incidents.

Buffers Mean Planning Spare of What We Might Need

Buffers are vital tools in project management, used to mitigate various uncertainties. While time buffers are most common, there are several types to know, each targeting specific risks.

Buffer Types at a Glance

Buffers are surplus resources allocated to navigate inevitable uncertainties. For example, a project estimated at 100 work hours might include a 20-hour time buffer for contingencies like absences or technical issues.

“I remember being invited by the Olympic Committee of a country to consult on their planning for the Games. The date was fixed, the world would be watching, but what buffers did they have? You’ll be surprised: Their main buffer was a huge amount of money. Most of which could have been saved with better planning. Sometimes you need to ‘waste’ to create a buffer, but the trick is to waste wisely so as not to incur unnecessary costs.” Eli Schragenheim

By strategically applying buffers—be it a financial buffer for a grand event like the Olympics or a time buffer for a smaller-scale project—managers can significantly enhance the project’s success rates and readiness for unforeseen issues. This real-world example underscores the importance of not only having buffers but also optimizing them to save resources.

The Stigma Against Buffers

Buffers are essential for managing uncertainties, yet they often meet resistance, especially in management circles. This skepticism arises from the belief that buffers are wasteful or overly cautious. When buffers are made visible, trust is crucial; without it, their presence can reinforce negative stereotypes, painting them as budget inflators rather than strategic tools. The key challenge is to reframe buffers as vital elements of proactive planning, dissolving resistance, and promoting a resilient approach to managing inevitable project uncertainties.

The Key New Insight: Include Visible Buffers in the Planning!

Making buffers visible in project plans fosters transparency and effectiveness. Unlike traditional hidden buffers, visible ones enhance project tracking and offer a measurable cushion for unexpected challenges. They also set realistic expectations by highlighting flexibility in planning. This visibility counters the stigma against buffers, framing them as strategic, rather than wasteful. In short, visible buffers improve accountability and project success.

Buffers Are Sometimes Partially Consumed

An important subset of these visible buffers is the partially consumed buffer. These buffers are dynamic and adaptable, offering the flexibility to make real-time adjustments. For example, if a project task initially includes a 20-hour time buffer and only 10 hours are needed, the budget for the 10 hours can be reallocated or saved for future needs. Such flexibility not only allows for real-time optimization but also leads to more efficient resource management. The effective planning and use of these dynamic buffers are key for project success.

Planning visible buffers has to include the question of where the buffers should be inserted. Protecting every single task within a project is useless. What we really need is to protect the completion of the project. Thinking about what could easily disrupt its safe completion, and what ensures the quality of the outcome, leads us to where buffers are truly needed. TOC has fully developed methodologies for determining the right location of buffers in projects as well as in manufacturing.

Conclusion

Buffers serve as a crucial mechanism in the armor of project management against the unpredictability and uncertainties inherent in any project. While they are often misunderstood or misapplied, a nuanced and strategic approach to buffering can save both time and resources in the long run. Making buffers visible and understanding that they can be partially consumed allows for a more flexible, robust, and resilient project management approach.

4. Setting Priorities in the Execution Phase, Based on the Actual Consumption of the Buffers

In this chapter, we’ll delve into the critical role of prioritization and buffer management during a project’s execution phase. We’ll emphasize the limitations of relying solely on time buffers and advocate for a real-time dual-buffer approach that monitors both time and capacity.

The Essential Role of a Unified Priority System

Navigating the complexities of multiple projects demands a unified priority system. This system serves as a single source of truth for decision-making and is informed by real-time buffer consumption, guiding efficient resource allocation.

In a multi-project landscape, a unified priority system is indispensable. It provides a stabilizing framework and helps avoid the pitfalls of striving for an elusive “optimal” solution, focusing instead on what’s practically achievable.

Read more:

https://elischragenheim.com/2016/12/26/the-toc-contribution-to-healthcare/

Read more: 

https://www.researchgate.net/publication/228472949_Utilising_buffer_management_to_manage_patient_flow

Monitoring the State of Many Buffers Leads to Identifying Situations Where the Whole Protection Scheme Might Crash.

The effectiveness of buffers relies not just on their existence, but on ongoing monitoring and adjustment. As projects evolve, so should the buffers that safeguard them, meaning part of them is consumed by the incidental delays that have occurred so far. However, when implementing a buffer management system, a single focus on individual projects or processes can be short-sighted. A broader, more comprehensive outlook is necessary for effective risk mitigation and preventing potential cascading failures across the entire protective scheme. We need an early warning that the system might crash.

Monitoring the Buffers

In the context of managing uncertainties in multi-project environments, buffer monitoring is intricately linked with the “fever chart” of Critical Chain Project Management (CCPM). The fever chart graphically represents the consumption of project buffers against the completion of project tasks. By regularly monitoring the consumption of these buffers – whether they pertain to time, resources, or finances – organizations can derive real-time insights into the health and progress of their projects. This visual tool, when intersected with the buffer consumption rate, offers a clear picture of project performance. If the chart indicates a buffer being consumed too rapidly relative to task completion, it serves as a warning signal, indicating potential issues and enabling managers to proactively adjust strategies or allocate resources, thereby ensuring optimal project outcomes.
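
A minimal sketch of the fever-chart logic described above; the zone boundaries are illustrative assumptions, not the exact CCPM or Epicflow thresholds.

def fever_zone(buffer_consumed: float, chain_completed: float) -> str:
    # Compare buffer consumption with project progress; both are fractions between 0 and 1
    if chain_completed == 0:
        return "RED" if buffer_consumed > 0.3 else "GREEN"
    ratio = buffer_consumed / chain_completed
    if ratio > 1.5:
        return "RED"     # buffer burning much faster than work is completed - intervene
    if ratio > 1.0:
        return "YELLOW"  # watch closely, prepare corrective action
    return "GREEN"

print(fever_zone(buffer_consumed=0.6, chain_completed=0.3))  # RED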

A modern version of a Fever Chart using infographics by Epicflow

The Hidden Risk: Relying Solely on Time Buffers

Time buffers, often represented through color-coded statuses, may be misleading indicators in a multi-project environment. While they effectively signal potential issues in individual projects, they fall short of capturing systemic risks. A few projects shifting from “green” to “red” may seem like isolated issues. However, when multiple projects turn “red,” it can expose the fragility of the entire system and trigger a cascade of failures. By this point, corrective action is usually too late to avoid widespread disruption.

The Missing Link: Capacity Buffers

Capacity issues, when overlooked, lead to significant obstacles in project management. If these issues are not addressed promptly, they result in what is called a “bottleneck” – a point of congestion where tasks accumulate, leading to delays. For instance, if a particular engineering team is stretched thin, all projects dependent on that team will experience delays. Since capacity isn’t project-specific but shared, a deficit impacts multiple projects. 

Monitoring the level of excess capacity provides the missing layer of protection. Capacity buffers alert management to shared critical resources that can derail multiple projects simultaneously. Neglecting to monitor capacity can leave the system vulnerable to unanticipated crashes, making the dual-buffer approach imperative for holistic project management.

Read more: https://www.epicflow.com/blog/once-in-red-always-in-red/

Read more: https://elischragenheim.com/2016/09/03/the-critical-information-behind-the-planned-load/

The Dual Buffer Strategy: A Comprehensive Approach

The most effective buffer management strategy incorporates both time and capacity buffers. This dual approach provides a nuanced, multi-dimensional view of project health, enabling proactive measures to avoid both individual project delays and systemic failures.

Dual buffer representation in Epicflow

Understanding History to Plan for the Future

Understanding the dynamics of capacity buffers to plan the future is incomplete without considering the historical performance of your teams. This data is not just a reflection but a predictive tool, embedding lessons from the past to enrich future planning strategies.

Consider an engineer scheduled for four 8-hour tasks in a 40-hour week. If only three tasks are consistently completed, a gap between planned and actual output becomes apparent. This recurring pattern isn’t a one-time anomaly but indicates a need for reassessment and adaptation in task estimations or effective capacity planning.
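
A small illustration of turning that history into a planning figure; the numbers follow the example above, while the adjustment rule itself is an assumption rather than a standard formula.

planned_task_hours = 4 * 8     # four 8-hour tasks scheduled in the week
completed_task_hours = 3 * 8   # only three tasks consistently finished

effective_ratio = completed_task_hours / planned_task_hours  # 0.75
realistic_weekly_capacity = 40 * effective_ratio             # plan around ~30 productive hours, not 40
print(effective_ratio, realistic_weekly_capacity)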

The incorporation of historical data ensures that future strategies are well-rounded, combining past performance trends and adaptive responses to uncertainties. This data serves not just as a record but a predictive tool, embedding past lessons to enhance future adaptability and resilience.

In cases where only three out of four planned tasks are consistently completed, it signifies an essential gap between planned capacity and actual output. Such insights lead to a reassessment of the benchmarks set for capacity. Informed by historical performance, adjustments can be made to align with actual output trends. This data-driven approach builds robustness and adaptability, ensuring that project plans are efficient and effective. It prepares teams for unforeseen challenges, enhancing the organization’s agility and responsiveness to change.

Historical performance measured in Epicflow

Conclusion

While time buffers are valuable tools, their utility is severely compromised if capacity buffers are ignored. A dual-buffer system that includes real-time monitoring of both time and capacity is essential for navigating the complexities of multi-project environments. This approach offers a comprehensive safety net, enhancing both individual project success and overall organizational resilience.

5. A More Generic Insight: We Should Estimate the Size of the Common and Expected Uncertainty as a Reasonable Range

The concept of “reasonableness” is subjective and varies from one situation to another. However, when it comes to forecasting, being “reasonable” requires us to rely on judgment informed by both data and experience. Too often, we see forecasts presented as a single number, creating an illusion of certainty that’s misleading. In truth, the term “reasonable” isn’t about aiming for pinpoint accuracy, but about grounding our expectations in reality.

Do Not Overprotect from Very Rare Incidents

Planning against highly improbable events, like your supplier being hit by a tsunami, may not be reasonable in most scenarios. Overpreparing for such outliers can tie up resources and lead to inefficiencies. The key is to prepare for what is most likely to happen. For example, if a supplier promises delivery in two months, a reasonable buffer might be planning for a three-month wait instead. This allows you to prepare for the “expected uncertainty” without spreading your resources too thin.

Use Risk Management Tools to Evaluate High Risks with Very Low Probability

For those black swan events that are highly improbable but catastrophic, other mechanisms like insurance or government protection schemes are more appropriate. These are separate from the day-to-day operational buffers and help shield against large-scale disruptions.

A major realization is: Don’t forecast ONE number -> always use a range!

Forecasts should always be expressed as a range rather than a single point. A range not only provides a more realistic picture but also allows for better planning and resource allocation.
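
A tiny illustration of carrying a forecast as a range instead of a single number; the figures and field names are invented for the example.

from typing import NamedTuple

class RangeForecast(NamedTuple):
    low: float     # reasonably pessimistic
    likely: float  # most probable outcome
    high: float    # reasonably optimistic

next_month_sales = RangeForecast(low=800, likely=1000, high=1300)

# Different ends of the range serve different decisions:
capacity_to_protect = next_month_sales.high  # size buffers and capacity against the high end
external_commitment = next_month_sales.low   # commit to the market only what the low end supports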

Putting it All Together: The Need for a “Reasonable Range”

When it comes to decision-making, whether it’s forecasting sales for the next month or planning a project, we need to adopt the practice of using “reasonable ranges.” A range provides us with a cushion, offering protection without leading to resource wastage. This is critical, not just in supply chain decisions but in marketing, operations, and human resource management.

Budgets Should Reflect Realistic Buffers

In many organizations, budgets often have a narrow buffer, usually labeled “reserve,” sometimes as little as 5%. This is usually insufficient for dealing with unexpected changes or opportunities. Budgets need to be more dynamic, with larger reserves that can be deployed when genuinely needed.

Forecasts as Targets: A Double-Edged Sword

Many organizations use forecasts as targets, which can lead to unintended consequences. For example, if you set a two-week target for a project, you can be almost certain it won’t be completed in less time, which is the essence of Parkinson’s law. People will often “use up” the allotted time, sometimes focusing on unnecessary details. Targets should be flexible, allowing for both over-performance and the unexpected.

Measurement Systems Need to Reflect Realistic Expectations

Just like university grading systems, where a score of 84% doesn’t really mean that the student did better than another student who got only 80%, and worse than one who scored 88%, we need measurement systems in the workplace that reflect broader categories. Instead of obsessing over small numerical differences, focus on what is “good enough”, what is “excellent”, and what is “unacceptable”.

Read more: https://elischragenheim.com/2016/04/10/between-reasonable-doubt-and-reasonable-range/

Read more: https://elischragenheim.com/2016/03/09/why-should-the-red-zone-be-13-of-the-buffer/

In Summary

When it comes to forecasting and planning, it’s essential to always think in terms of ranges rather than specific points. This provides a more flexible and realistic framework for decision-making and resource allocation. “Reasonable” doesn’t mean overly cautious; it means well-considered and flexible. As we become more experienced, our sense of what constitutes a “reasonable range” will become more accurate, allowing for more efficient and effective operations across all aspects of business.

References:

[1] “Never Say I Know” and the Limitations of our (Reasonable) Knowledge, Eli Schragenheim (2016).

[2] The TOC contribution to Healthcare, Eli Schragenheim (2016).

What are the fundamental concepts of the Theory of Constraints?

By Eli Schragenheim and Dave Updegrove

Eli S. note: this article is the result of a collaboration between Dave Updegrove and me on defining the essence of the Theory of Constraints (TOC).

We claim that every beneficial insight removes a current limitation that prevents us from achieving value.  The limitations TOC deals with are usually caused by flawed paradigms, or assumptions. 

With this basic insight, Dave and I looked into the TOC body of knowledge and tried to understand, for every insight, concept, or tool, what limitation it had removed: what value could not have been generated before and now, using the new insight, can be.

The next step is to understand the flawed paradigm behind the limitation, so we can clearly understand the scope of new potential value that can be reached.

We have chosen what we believe to be the most generic concepts of TOC, the limitations they remove, the identified flawed assumptions behind them, and also the somewhat lower-level insights/concepts/tools they impact.

Three key concepts that address the three fears of every manager: complexity, uncertainty, and conflicts.

Concept 1: Inherent Simplicity.

All systems (for instance, organizations) are inherently simple, despite their apparent complexity.

In systems, a few or even one point (Constraint[s]) controls the performance of the whole system, and a few or even one root cause (Core Conflict[s]) generates the vast majority of problems.

Limitation addressed:

Being unable to predict, in a good enough way, the consequences of an action or imposed change. This failure to predict consequences vastly reduces the quality of decisions.

Flawed assumption:

Treating our current reality as complex, thus failing to make the effort to identify the few variables that significantly impact the consequences of any action or change.

We can find and manage the few points controlling the system

Example of following the flawed assumption:

Dividing a complex system into subsystems, assuming 1) that they are much less complex, and 2) that this can help to predict the local impact of any change; further hoping that optimizing local subsystems will result in a good enough prediction of the impact on the whole.

This expectation is most damaging.

Resulting/affected applications:

The Five Focusing Steps, the Four Concepts of Flow, the Three Questions

Concept 2: Inherent Consistency (Harmony):  

There are no conflicts (or inconsistencies) in reality.

All conflicts (or inconsistencies) exist only in our minds. One or more invalid assumptions are behind every perceived conflict (or inconsistency).

Limitation addressed:

Having to compromise between two conflicting actions, where each action is necessary to satisfy a necessary condition for achieving a desired common objective.

By compromising we get significantly less value for the desired objective.

Flawed assumption:

We know and accept our perception of “reality.”  Actually, every perception of reality is based on many (hidden) assumptions.  It is possible that challenging just one assumption, meaning creating a situation where that assumption is not valid, opens the way to get much more of the common objective.

The perception of a “conflict” should trigger us to reveal our assumptions and then look actively for a valid (realistic) way to challenge them.

Example of following the flawed assumption:

The seesaw conflict of holding less inventory, to lower investment and carrying costs, versus holding more inventory, to ensure availability to the system and allow it to generate more value.

Resulting/affected applications:

The Evaporating Cloud (Conflict resolution diagram), The Change Matrix / Procon cloud

Two Key Tools

Concept 4: Inherent Causality:

Systems are subject to cause-and-effect dynamics.

To understand and manage a system, apply rigorous cause-and-effect logic, governed by the Categories of Legitimate Reservation.

Limitation addressed:

Being unable, even when accepting the inherent simplicity, to answer the three questions:

  • What to Change?
  • What to change to?
  • How to cause the change?

Flawed assumption:

Using logic is too cumbersome, subjective, and difficult to quantify to make it effective in finding answers.

A few simple, learnable logical tools can greatly enhance analysis and provide answers to important questions.

Example of following the flawed assumption:

Attempting to independently solve any undesirable effects (symptoms) in the organization without considering root cause(s).

Resulting/affected applications:

The TOC Thinking Processes, The Three Questions

Concept 5:  Inherent Valuation

By dividing expenses into truly-variable costs and the cost of capacity, an entire system and each of its parts may be properly valued. The focus is on Throughput: the pace of generating goal units. For commercial organizations, Throughput is the period’s revenues minus the truly variable expenses.

Limitation addressed: 

It is very complicated to predict the financial outcome of a suggested action by trying to evaluate directly its impact on revenues and expenses. Without a well-accepted procedure for making such decisions, managers are afraid to use such a complicated analysis.

Flawed assumption:

Not distinguishing between linear and non-linear behavior of expenses.

Subtracting truly variable expenses from revenues (resulting in Throughput), and considering non-truly-variable expenses to be part of Operating Expense, allows us to reliably assess the system’s performance and the contribution of each of its parts.

Example of following the flawed assumptions:

Believing in and utilizing cost-per-unit, which is based on assuming that expenses behave linearly: if the cost per unit is $1, then the cost of 25 units is $25. This dramatically distorts the real performance of the whole system.
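
A small illustration of the distortion, contrasting the cost-per-unit view with the Throughput view for an extra order that uses only spare capacity; all figures are invented for the example.

revenue_per_unit = 10.0
truly_variable_per_unit = 4.0   # materials and other truly variable costs
operating_expense = 50_000.0    # cost of capacity; unchanged by this decision
units_sold = 10_000

allocated_cost_per_unit = truly_variable_per_unit + operating_expense / units_sold  # 9.0

# An extra order: 1,000 units at a price of 8.0, fitting into spare capacity
price, extra_units = 8.0, 1_000

# Cost-per-unit view: 8.0 < 9.0 per unit, so the order looks like a loss and is rejected
apparent_result = (price - allocated_cost_per_unit) * extra_units   # -1,000

# Throughput view: Operating Expense does not change, so the real impact is the added Throughput
delta_throughput = (price - truly_variable_per_unit) * extra_units  # +4,000
print(apparent_result, delta_throughput)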

Resulting/affected applications:

Throughput Accounting, Throughput Economics, Operations, Project, and Replenishment Planning, The Six Questions.

Two Beneficial Beliefs

Concept 6:  Inherent Goodness – People are Good.

The reasons for negative outcomes or events in our systems do not come from people’s nature (good or bad), but from their assumptions and circumstances.

Limitation addressed:

Failing to achieve a desired objective due to contradictory behavior of other people, which wasn’t anticipated or understood.

Flawed assumption:

It is impossible to understand the behavior of other people. Thus, we cannot find the right way to convince them to behave in a way that would contribute to what we want to achieve.

The previous pillar of resolving conflicts also highlights the case where other people act to achieve something that clashes with what we are trying to achieve.

This pillar is wider than direct conflict with other people; it highlights our inability (limitation) to understand the motivation, or resistance, of other people to our initiatives. From a business perspective, it is especially important to understand our clients, the clients of our clients, our suppliers, and our employees.

It is difficult to use cause-and-effect logic alone to describe the motivation of another person. It should be possible, however, based on some known effects and generic assumptions about human behavior, to reduce the overall impression of complexity. In other words, there are a few critical variables that should be considered, including practical gain, ego, and fear.

Example of following the flawed assumption:

Blame and finger pointing – “I did my part. ‘So-and-so’ is the problem…”

Resulting/affected applications:

The Engines of Harmony.

I suggest you read Eli Schragenheim’s article that deals with the extreme, yet realistic, cases of EVIL, and how this insight should be treated.

https://elischragenheim.com/2023/10/18/goldratt-claimed-that-people-are-good-how-can-we-understand-that/

Concept 7: Inherent Potential:  Never say “I Know.”

The more solid the base, the higher the jump. Any situation can be substantially improved by identifying new opportunities with significant added-value.  Thus, added-value potential is unlimited.

Limitation(s) addressed:

Being successful makes it difficult, and seemingly very risky, to identify new, big opportunities.

Success can lead to recency bias and inertia that limit searching for more opportunities.

Flawed assumptions:

I’ve made great improvements and am successful enough – no need for more, and there is no secure way to achieve more.

Actually, the opportunity is in fact huge and can be achieved safely.

If all you have is a hammer, everything looks like a nail.

New opportunities should always be looked at from a fresh perspective, not assuming the solution a priori.

Examples of following the flawed assumptions:

Thinking that since you are better than you used to be, there is no need to continue improving.

Thinking that new, big ways to generate more value do not exist.

Assuming, without careful examination, that a new opportunity will respond to the solution applied to the last opportunity.

Resulting/affected applications:

The Evaporating Cloud, The Three Questions, The S&T Trees, Decisive Competitive Edge (DCE), The Six Questions of Technology.

Three resulting breakthrough insights

One result from Inherent Simplicity is:

Inherent Focus:  All systems have one or very few constraints that determine their overall performance. We can maximize the performance of any system by identifying its constraint(s), deciding how we can best exploit them, subordinating everything else to these decisions, and getting more constraint capacity when necessary.

Limitation addressed:

Being unable to focus on what is truly constraining performance prevents very significant leaps of improvement.

Flawed assumptions:

We have many constraints that shift all the time.

Improvements in most areas have very minor impact on overall performance.

If we can make each part of the system more efficient, the entire system will be more efficient.

Improvement at the true constraint greatly improves the performance of the entire system.

Example of following the flawed assumption:

Policies driving local improvements; “peanut butter” budget cuts spread evenly across the entire system.

Resulting/affected applications:

The Five Focusing Steps.

One result from Inherent Tolerance is:

Inherent Control / Buffer Management:  Effective priorities for meeting all our planning objectives can be generated by monitoring the state of the buffers.

Limitations addressed:

Buffers give us only limited protection; we are still exposed to some accumulated fluctuations that disrupt performance.

Our initial buffers are based on guesses.  Continuing to guess doesn’t improve the fitness of the buffers to protect performance from the actual level of uncertainty.

Measuring buffer penetration often provides “early warning” of the potential impact of disruptions.

Flawed assumptions:

When things go wrong it is already too late to react. Too frequent reactions, like expediting, might worsen the overall reliability.

Example of following the flawed assumptions:

Adding more and more status reporting and data analysis to our daily work, thinking that more data yield better information.

Resulting/affected applications:

Planned Load, Capacity buffers, Simplified Drum-Buffer-Rope, Critical Chain Project Management, TOC Distribution/Replenishment.

One result from Inherent Potential is:

Inherent Value from Innovation:  Use Goldratt’s Six Questions for assessing the value of a new technology, but expand them to evaluate projects, new strategic moves, and new products

Limitation addressed:

  • Developing anything new is very risky

Flawed assumptions:

Asking potential customers to evaluate future value, which doesn’t exist today, and being disappointed by their confused answers. People need to see the product in order to evaluate its value.

Risk funding: Investing in many innovations, expecting that 1 in 10, or even 1 in 20 will yield very high value – enough to cover all the rest and still leave good profit.

A breakthrough is achieved by analyzing future value, without asking potential users, by using the Six Questions of Technology for many different types of seemingly innovative proposals.

Each of the six questions is required to gain the most value from any innovation!

Goldratt claimed that “People are Good” – how can we understand that?

Writing in pain is problematic.  Pain causes negative emotions, which distort the ability to understand the underlying cause-and-effect. 

I’m in pain, so you have to read me carefully, and raise your doubts.

Comment: The article was written before the explosion at the hospital in Gaza. To my mind it doesn’t change the analysis of EVIL.

My great mentor, Dr. Eli Goldratt, defined the pillars behind the Theory of Constraints, and “People Are Good” is one of them.  Collaborating with Dave Updegrove on defining the insights of TOC, we explained this insight as follows:

Inherent Goodness: People are good.

The reasons for negative outcomes or events in our systems do not come from people’s nature (good or bad), but from their assumptions and circumstances.

  • Limitation addressed: Failing to achieve a desired objective due to contradictory behavior of other people, which wasn’t anticipated or understood.
  • Flawed assumption: It is impossible to understand the behavior of other people.

 Can we understand the wicked behavior of Hamas? 

Can we treat them as good people?

Some comments to Goldratt’s pillar:

  1. The key message is that it is important to do our best to uncover the assumptions and circumstances that the other party faces, so we’ll be able to understand the behavior and know better what to expect.  This is instead of immediate blaming, which doesn’t help to achieve any value; actually, it only causes anger and, in extreme cases, even a desire for revenge.
  2. There are cases where assuming that “People are Good” is definitely invalid.  This is the case when all the other side wants to achieve is to make us suffer, and achieving that makes them happy.  This is EVIL, and yet we had better understand the causes of EVIL, as that would give us clues on how to protect ourselves. 
  3. When one side enjoys the suffering of the other side: there is definitely no win-win.  This is the only case where we should actively look for win-lose! 

Let me clarify some facts regarding the Israeli-Hamas catastrophe:

  • Hamas does not look for freedom from occupation!!! 
  • They don’t fight for a Palestinian state alongside Israel. Their formal vision is to allow some Jews to live in the one Palestinian state, but only as second-class citizens.
    • Their vague big dream for the future: to be part of a large Arab Islamic state, not just a Palestinian state.
  • Unlike the West Bank Territories, Israel doesn’t occupy Gaza.  Israel intervenes in Gaza only to maintain security, not always successfully.

Wars are the ultimate case of lose-lose.  How come we had so many wars?

Most wars start because of one of two core causes:

  1. A clash between different religions.
    • For the purpose of this discussion, a deeply rooted belief that is so strong that it is accepted as “absolute truth” is treated here the same as a “religion.”  So, extreme racism, including antisemitism, is treated here as a religion.
  2. Dispute over land.

Both are hard to settle.  But the first is the one with the potential of becoming truly evil.  The cause behind religious people doing terrible things to other human beings is that religion gives the impression of perfect knowledge of what is right.

Let me clarify an issue:  when you read the holy scripts of the well-known religions (not racism), the underlying intentions are:  DO GOOD! 

However, they can also be interpreted as allowing believers to punish non-believers, who ‘sin’ just because they believe in a somewhat different ‘truth.’  Of course, the distorted interpretation is made by people who see something to gain from that particular interpretation, usually power over other people.

I think that when Goldratt verbalized “Never Say I Know” (another TOC pillar), he meant this: we human beings cannot know the full and absolute truth.  So, no matter what we observe and deduce, we should never assume we know, and should always give room for doubt.  When we see in reality a signal that is not in line with our current knowledge, we should be able to consider the possibility that our knowledge has a flaw that we should fix.  Note: having a flaw doesn’t mean that what we had believed and thought is absolutely wrong (!); it only points to the need to update the knowledge, such as correcting a key interpretation.

Some more relevant facts.  While Hamas doesn’t look for any peace settlement, the Palestinian Authority announced that it is ready for a certain two-state settlement.  Solving the conflict is HARD, and it is not clear whether the Palestinian Authority has truly accepted the condition of living in peace alongside an Israeli state.  Land issues have a huge impact, and on top of that there are security issues; after all, who gives us assurance that such a settlement would hold?  To my own horror, some extreme Jewish Orthodox leaders claim that GOD gave us the land, so we are forbidden to give part of it away to another nation.

A key assumption for me is that it is possible to analyze emotions in a way that would let us predict certain behaviors, and hopefully also lead us to good enough prediction of the consequences.  It seems to me that when our logic leads us to realize the possible consequences of our actions, it might give us the strength to control and limit negative emotions.

The destructive emotion that is natural, but should be strongly restrained is: looking for REVENGE!

Revenge leads lethal disputes to continue on and on, spreading EVIL all around.  While Israel has to make sure it never finds itself in such a catastrophic event again, it should take measures to keep the revenge emotion out of any act!

I wrote in the past about having to learn from surprises: https://elischragenheim.com/2016/11/10/learning-from-surprises-the-need-the-several-obstacles/

Israel was vastly surprised twice:

  1. A major belief was that Hamas was intimidated by the military power of Israel.  Was that the core flaw behind the inability to predict such an attack?
  2. The failure of the Israeli Army to react quickly to such a surprise is another surprise, caused by another flaw in viewing what is required for a fast response.

There is a lot of talk in Israel about the need for an in-depth inquiry after neutralizing the immediate threat.  The biggest obstacle to any beneficial learning is blame; we must be very careful not to blame those who made the mistake, since others would probably have made exactly the same mistake.  The valuable benefit would be to learn the core flaw(s) in our current thinking, thus improving our capabilities to ensure a better, and much more secure, future.

The Full Meaning of Flow in Operations

Flow in Operations refers to the movement of products and services to the client.  But do we fully understand the meaning of ‘improving the flow?’ 

Do we fully understand what value is generated when we improve the flow?

Improving the Flow, from the perspective of Operations, could easily be directed at two very different measurements:

  1. The time it takes for one particle of the flow to pass through the whole route.
  2. The quantity of units arriving at the end per period of time.

What happens when improving one measurement is at the expense of the other?

Actually, this conflict is the core of the dispute between the efficiency paradigm, which calls for big batches and high WIP in an effort to increase the total output, and the Lean/Kanban principles, which are focused on the speed of the materials, ensuring fast delivery to actual demand.

The Theory of Constraints (TOC) has resolved the conflict by achieving high scores on both measurements.  It started by improving and controlling the overall potential flow quantity, meaning what is actually delivered to clients.  On top of that, TOC succeeds in making the commitments to the clients highly reliable.  The key is understanding that the potential Flow, interpreted as the overall output generated by the organization, is limited by one capacity constraint, and drawing critical insights from this observation.

The TOC methodology also improves the other prime measurement, the time one order spends in the system, by preventing the release of orders that can safely be released later. In Goldratt’s verbalization the idea is to “choke” the release of new orders to the floor.  This is done by estimating the reliable time within which the order can be completed, considering the common and expected uncertainty, and refusing to release it earlier. Doing that ensures the WIP includes only the orders that have to be delivered within that reliable time.
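
To make the choking mechanism concrete, here is a minimal Python sketch; the order records, dates, and the 21-day buffer are invented illustrations, not figures from any specific implementation:

```python
from datetime import date, timedelta

# Hypothetical open manufacturing orders: an identifier and a committed due date.
open_orders = [
    {"id": "MO-101", "due": date(2024, 7, 15)},
    {"id": "MO-102", "due": date(2024, 8, 2)},
]

PRODUCTION_BUFFER_DAYS = 21  # the reliable production lead time assumed for this environment

def should_release(order, today):
    """Release an order only once its due date minus the time buffer has arrived."""
    planned_release = order["due"] - timedelta(days=PRODUCTION_BUFFER_DAYS)
    return today >= planned_release

today = date(2024, 7, 1)
released = [o["id"] for o in open_orders if should_release(o, today)]
print(released)  # ['MO-101'] - releasing MO-102 now would only add WIP, so it is 'choked'
```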

Opening the way for the orders to flow without long wait times, while keeping a clear priority system to identify the few orders that might need an extra push to be completed on time, is how the original conflict between the two prime measurements is settled.

This attitude clashes with the flawed managerial policy of trying to achieve high utilization of every resource, which is practically impossible, and TOC has realized that only the utilization of the constraint, or the weakest link, truly matters.

TOC also interprets the second performance measurement of Flow as the total value delivered in a period of time, rather than counting the physical output.  This is achieved by using the term ‘Throughput’, the marginal total contribution, to represent the value delivered in a period of time.

On one hand the use of Throughput bypasses the difficulty of defining the units or ‘particles’ of the Flow.  By that it gives an estimation of the total value generated in a period of time.

On the other hand, such an interpretation raises an issue that is beyond the scope of Operations, as it looks at the net value the Flow generates.  As long as we are still focusing on Operations and on maximizing the total throughput (T), a relevant, yet disturbing, question is raised:

Can faster flow generate more T per unit sold?

The question exposes the key assumption behind the conflict: fewer units sold means less total T.  However, if it is possible that faster flow would make the customers ready to pay more, generating higher T per unit, then it could well be that the organization can generate more T overall by accelerating the deliveries, even when it is at the expense of the total quantity of products delivered.

A third performance measurement of Flow is emerging: The net value of the output per period, or the total Throughput generated!

Once the TOC solution for Flow is fully implemented, there is still a certain trade-off that lies within the exploitation scheme of the constraint.  TOC recognizes the need for maintaining protective capacity, even on the capacity constraint resource (CCR) itself, to ensure reliable delivery, in spite of the inherent uncertainty.  The size of the time-buffer, an integral part of the TOC planning methodology, depends on the available protective capacity, which depends on how much the planner is ready to load the most constraining resource. When the constraint is planned for more than 90% of its available capacity, then the reliable lead-times to the customers have to be fairly long, partially because it is difficult for the constraint to cover for fluctuations that impact its own utilization.  At this level of exploitation, it is practically impossible to accept new urgent orders, and the reliable response time cannot be truly short, due to the queue for the CCR time.

When Sales are ready to restrict the planned load on the CCR to 80-85%, then while the total flow is reduced, the handling of one order is significantly faster, and highly reliable.
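
A minimal sketch of such a planned-load check, in Python; the order hours are invented, and the 85% and 90% thresholds simply echo the figures mentioned above:

```python
# Hypothetical data: CCR hours needed by the orders already committed for the
# planning horizon, and the CCR hours available in that horizon.
committed_ccr_hours = [12.0, 8.5, 20.0, 15.5, 18.0]
available_ccr_hours = 80.0

planned_load = sum(committed_ccr_hours) / available_ccr_hours  # 0.925 here

if planned_load > 0.90:
    policy = "quote long lead times and avoid accepting urgent orders"
elif planned_load > 0.85:
    policy = "be careful with new commitments; protective capacity is shrinking"
else:
    policy = "enough protective capacity for fast, reliable response"

print(f"Planned load on the CCR: {planned_load:.0%} -> {policy}")
```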

Each of the three performance measurements of Flow can be improved, but special care should be given to whether the improvement of one doesn’t reduce the level of the others.

One of the ideas on turning superior operations into a decisive-competitive-edge (DCE) is offering ‘fast-response’ to clients that truly need it, for a markup.  This is a worthy strategy when the fast response time is perceived by the customer to generate added value. The idea is similar to the offerings of the international delivery companies (FedEx, DHL, UPS).  The advantage of the idea is that while it requires a certain amount of protective capacity, offering fast deliveries whenever required by the customer adds considerable Throughput.

A hidden assumption behind the previous analysis is that the response time in production is the same as the response time to the client.  This is valid only for strictly make-to-order environments.

The vast majority of manufacturing organizations involve make-to-stock products and parts.  This means that from the perspective of the client, when perfect availability is maintained, the response time is immediate.

So, what is the advantage of fast flow for finished-goods items that are held for stock?

There are two ways in which fast flow, the first measurement of Flow, could increase sales.

  1. Significantly reducing the on-hand stock without causing more shortages.  With the right method of control, it is possible to vastly reduce the number of shortages.  This basically means much less money stuck in inventory, and fewer items that are sold at significantly reduced price in order to get rid of excess inventory. Even more important is being able to sell more items due to the improved availability, possibly also due to a better reputation.
  2. Being able to quickly identify changes in demand.  When the inventory levels are relatively small it is easier to identify that in too many cases emergency replenishments are required to prevent shortages.  This is a signal for a real increase in demand.  On the other hand, when the small quantity of stock stays for too long, it signals the demand is lower than before.  The TOC tool called Dynamic Buffer Management (DBM) is used to identify those changes.

So, fast flow for make-to-stock is able to increase the profit, but it has to be cleverly used, as just fast flow to stock doesn’t add value to the customers.

Should make-to-stock, more specifically make-to-availability, be applied to slow-movers?

Slow movers suffer, on average, from slower flow due to two different causes.  One is that when the priorities on the floor are properly followed, then when sales are weak, the production order is given lower priority, so it could be stuck until the higher-priority orders are processed.  The second cause is that the finished-goods inventory of slow-movers is held for a relatively long time.  From the return-on-investment perspective, slow-movers yield a low return, even when the T per item is higher than for fast-movers.  It is more effective to manage slow-movers as make-to-order items, unless the clients insist on immediate delivery.

The key conclusion

Flow has to be measured by three different measurements, with certain dependencies between them.  First, the speed of one particle of the flow going through the whole route.  Second, the total quantity that passes through the flow in a period of time, which highly depends on the available capacity of the constraint/weakest-link. Third, the total value that can be generated in a period of time, which depends on the two other measurements, but also on additional factors.
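
A minimal sketch of how the three measurements could be computed from completed-order records; the field names and numbers are invented for illustration:

```python
from datetime import date

# Hypothetical orders completed in one month: release and completion dates,
# units delivered, and Throughput (selling price minus truly-variable-costs).
completed_orders = [
    {"released": date(2024, 6, 3),  "done": date(2024, 6, 17), "units": 40, "throughput": 4000.0},
    {"released": date(2024, 6, 10), "done": date(2024, 6, 21), "units": 25, "throughput": 3100.0},
    {"released": date(2024, 6, 12), "done": date(2024, 6, 28), "units": 60, "throughput": 5200.0},
]

# 1. Speed of one 'particle' of the flow: average days from release to completion.
avg_lead_time_days = sum((o["done"] - o["released"]).days for o in completed_orders) / len(completed_orders)

# 2. Quantity delivered in the period.
units_delivered = sum(o["units"] for o in completed_orders)

# 3. Value generated in the period: the total Throughput.
total_throughput = sum(o["throughput"] for o in completed_orders)

print(avg_lead_time_days, units_delivered, total_throughput)
```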

Recognizing the cause-and-effects that enable fast response, understanding the dependencies between fast flow, the total quantity delivered, and the analysis of the generated value, is important for every organization.  Just accelerating the speed of orders is not sufficient.

The Business Potential from a Unique Capability

A unique capability of a business is not widespread, as most businesses compete without being perceived by their customers as “special.”  It is quite different in Art, Sport, and Science, where being unique or special is truly desired.  The unique capability of artists, sportsmen, and scientists could lead, in various ways, to commercial success.  Both Art and Sport have their influences on Fashion, where the unique capability, when available, has a strong influence.  Some key high-tech organizations succeed in developing their own special capability, usually centered on one person, who sometimes, though not too often, is also ready to teach and inspire others, so the unique capability grows stronger and spreads within the organization.

The majority of commercial organizations don’t have a unique capability and, due to that, face fierce competition and, as a result, a chronic difficulty to prosper.

The Theory of Constraints (TOC) is naturally focused on the impact of capacity constraints on the performance of the organization, and comes up with breakthrough ideas on how to better exploit the constraint so more of the goal is achieved.  Two key concepts that arise from the recognition of having to deal with a capacity constraint resource are:

  1. Exploitation of the constraint, making sure its capacity is utilized for what generates the best profit.  Practically, exploitation means a plan for how to use the limited capacity.
  2. Subordination to the exploitation scheme.  Setting the right policies so the exploitation plan can work to the fullest extent.

‘Capacity’ is a term related to ‘capability.’  It refers to the maximum output units the resource can produce in a period, like a day, a week, or a year.   The capacity limitation impacts the potential quantity of products/services that can be sold, but it doesn’t control the value to the customer, relative to the value the customer gets from the competition. 

So, when the operations of a relatively routine production system are significantly improved by identifying the capacity constraint and instituting the most effective exploitation and subordination processes, there is a great opportunity to sell much more, which could make the profit leap. 

Identifying a unique capability, which delivers very high value to the customers, could yield even more, but just gaining a unique capability is insufficient.  At least two additional conditions must be in place.  One is that the unique capability can generate additional value to many customers, and the other is having a holistic program to generate as much value as possible.

Within the TOC methodology for Strategy the concept of gaining a ‘decisive-competitive-edge’ (DCE) is especially important. 

Having a decisive competitive edge (DCE) means the company answers a critical need of its clients in a way that its competitors do not, and that it is difficult for the competition to quickly replicate a solution to that need.  Another requirement of a DCE is that on all the other critical parameters the company performs well enough, at about the same level as its competitors.  A final requirement for validating a DCE is that it doesn’t create new problems of significance.

The concepts of ‘DCE’ and ‘unique capability’ are connected.  A DCE that doesn’t rely on a unique capability can be easily imitated, except when the DCE is based on overcoming a very widely held but flawed assumption.  When an organization buys a unique capability, like acquiring a small company with such capability, then the challenge is developing and implementing a holistic scheme to draw the full value.  So, while imitating a competitor that came first with the unique capability is possible, it takes considerable time and effort.

How should a holistic scheme be made?  Here is an insight to digest: 

It is not enough to gain a unique capability, which can add considerable value to the customers, in order to be successful.  There is a need to develop effective exploitation and subordination procedures.  This means that on top of the capacity constraint, the unique capability, when available, requires such procedures to ensure it is fully directed to maximize the value, and that nothing else is missing from what the customer requires to draw the value.  The exploitation plan is to design the products/services in a way that emphasizes the added value of the unique capability, as well as pricing them accordingly.  The subordination processes should come up with the appropriate intermediate objectives, and performance measurements, that are targeted at fulfilling all the necessary requirements for the exploitation scheme.

Exploitation is a plan, a set of absolutely necessary decisions, targeted at achieving the best overall achievement of the goal. The subordination processes and policies are targeted to allow the exploitation plan to be performed as smoothly as possible.

It makes sense that the leading exploitation/subordination logic should be applied first to the unique capability, and only then derive how the capacity constraint should be exploited.  The unique capability is used to enhance the value, and its broad perception, in the market.  Once that is planned, the capacity constraint requires its own exploitation and subordination to control the volume of the sales and the timely delivery.

Watching the final of the 2022 Mondial (World Cup soccer) highlighted for me some relevant observations.  A very highly desired unique capability for a soccer/football star player is being able to spot and take advantage of a very short-time opportunity to score a goal.  All the team should strive to create as many situations as possible for such an opportunity.  This is the essence of the exploitation, and the subordination means that all the team members stick to that objective.

The goalkeeper should be viewed as the natural constraint, because of the lack of capacity of any human being to equally protect the entire goalpost area.  The overall exploitation of both the constraint (our goalkeeper) and the unique capability means keeping the ball as far away from our goalpost as possible, which also supports the exploitation scheme of preparing opportunities at the other end.  The overall subordination means focusing on the two targets: keeping the ball away from our goalpost, and creating more opportunities for the main striker(s).

In businesses, the opportunity to develop a unique capability has a major impact on gaining a competitive edge, possibly even a decisive competitive edge.  Certainly, Steve Jobs had this kind of unique capability, which Apple still succeeds in maintaining.  But, instead of looking for a one-time genius, it is possible to create a team with combined skills and a methodology that are focused on achieving the required unique capability.  When the specific capability is the outcome of a plan, then the key ingredients of the exploitation scheme should already have been thought through, and the challenge is to come up with the effective subordination of the whole organization, making it a true decisive-competitive-edge (DCE).

Dr. Goldratt, the developer of TOC, came up with three major steps for a strategy that is based on a new DCE: Build, Capitalize and Sustain

Build is the step of developing all the required skills for the unique capability.  Capitalize is the exploitation scheme for Marketing and Sales.  Sustain is a critical element for Operations to be ready for significantly increased demand, a necessary condition for success.

Thinking about the first two steps raises the issue that exploiting the key unique capability should involve not just the Marketing part, but also the Operations and Finance. 

Many restaurants, all over the world, struggle to achieve a competitive edge through the unique food made by their chef.  Even when they succeed in achieving a competitive edge, it is seldom a “decisive” one.  Gaining Michelin star(s) does that by providing “proof” that the food is exceptional, worthy of the high price.  But, even when the edge is truly decisive, it is hard to scale up the volume of business, because the stars apply only to the specific restaurant, not to a whole chain, and customers are aware that when the same chef expands his/her reach to more restaurants, it is unclear whether the local chefs truly cook in the spirit of the star chef.  Another business problem of such a chain is that by spreading the unique knowledge of preparing special dishes, other chains might find ways to learn the secret and imitate the famous chef’s specialties.

An interesting case is the success and eventual failure of the Concorde.  The unique technology made the plane significantly faster than all other commercial aircraft.  The value to customers came from the much shorter flying time across long distances.  The extra value for a traveler with time to spare was limited, and the price of a Concorde flight was too high for them.  So, the target market segment had to be top business and political people, who assume that their time is worth a great deal of money.

Along came the problem of failing to find an effective means for exploitation of the unique capability:  a busy businessperson, say in New York, has a specific window of time to go to Paris to meet associates and quickly return to New York.  Well, the few Concordes couldn’t offer terribly flexible departure and arrival schedules.  Private planes, even though they are much slower, provide that overall flexibility.

On top of that, the Concorde created a new problem, what TOC calls “a negative branch,” where a valuable new idea also causes a new problem.  The Concorde, on top of its high cost, was way too noisy (a sonic boom with every penetration of the sound barrier), and big cities don’t like noisy airplanes taking off and landing nearby.  Not finding a solution to the negative branch, coupled with the difficulty of finding enough demand, brought the unique capability of the Concorde to an end, and that was before the current priority on reducing carbon emissions.

Is it possible to gain a unique capability in Operations? 

Imitating a new operational procedure is a piece of cake, right?  Sometimes it is, but look at the Toyota Production System.  How many other manufacturing organizations have succeeded in being as effective?  Maybe we still don’t fully understand all the key insights that Toyota has adopted?

Dr. Goldratt strived to achieve a unique capability from significantly improved operations.  One provocative idea is to be able to deliver some orders much faster than normal, for a substantial markup.  The emphasis is not on always delivering faster than others but on being able to reliably give that premium service when it is truly beneficial to the customer, which makes it possible to ask for a markup.  This is a key idea regarding exploitation: letting the customer decide whether there is a real need to get the product sooner. 

The idea follows FedEx, UPS, and similar international delivery companies, which developed their own unique capabilities to do that, but Goldratt adapted the idea to the more complex environment of manufacturing. 

Some generic conclusions

All commercial organizations, also some not-for-profits, should recognize the potential of gaining a DCE that is based on a carefully developed unique capability, which brings huge value to well-defined market segment(s), making it difficult for competitors to quickly imitate. Basing the strategy around that unique capability means using the core insights for effective exploitation and developing the rules for subordination.  Thus, the exploitation and subordination of a unique capability should be the core of the whole strategy.

There should be two major inputs for new ideas about developing a strategy:

  1. What are we good at?  What particular skills do we have?  What new skills can we acquire?
  2. What seems to be currently painful to quite a lot of potential clients? 

The major challenge is recognizing the current pain of potential customers, which our special capabilities could remove.

The challenge in recognizing what we are good at is being able to judge our skills objectively.  Note: even when we recognize that our skills are not extraordinary, we may still find a way to utilize them to develop a unique capability that is truly needed; this is what many other people and organizations with similar raw skills fail to see.

The skill of developing worthy new skills is a huge advantage.  It is a pity that so few organizations look for people who can quickly learn new skills.

Is it Right, Wrong, or Unclear?

I find respectable discussions on the content of key issues of our life especially rewarding.  Here is an issue my friend Alejandro Fernandez had during his presentation at the 2022 TOCICO Conference.

The topic was The Measurement Nightmare Solved with Throughput Economics Approach. The idea is to judge the added value of a new move or idea, opening the door to evaluate the contribution of the new move to the Goal of the organization.

One of the financial measurements that can be used is the return-on-investment of the new move.  Here is the formula stated by Alejandro:
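
(Reconstructed from the numerical example below, the formula reads: Delta-ROI = (Delta-T - Delta-OE) / Delta-I, that is, the additional Throughput minus the additional Operating Expenses, divided by the additional Investment required by the move.)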

Sanjeev Gupta and Filippo Pescara, two well-known TOC experts, claimed that the above formula is incorrect.  The situation of presenting live could be too pressing to fully understand the criticism and its validity.  Moreover, one of the most common, but also trickiest problems is when a specific expression can be interpreted in two very different ways.  I believe this is the situation here.

Let us use an example:

Imagine a restaurant chain with four restaurants spread over the city.  The owner is contemplating adding a fifth branch.  He believes that such a restaurant, at a location far away from the others, would add mainly new customers, who are aware of the reputation of the chain, but highly prefer the new location.

  • The new restaurant requires a net investment of $500K.
  • The additional operating expenses of the chain would go up by $1.2M a year.
  • The evaluation of the additional overall Throughput (revenues minus the truly-variable-costs, like the purchased food) comes to $1.5M a year.
  • This means the chain of restaurants will gain, due to the additional branch, a net profit, before tax, of ($1.5M – $1.2M) = $300K a year.
  • The ROI of the investment in the new restaurant is $300K / $500K = 60%.

But here is the clarity issue: The ROI of 60% is only for the new restaurant – it is NOT the ROI of the chain, and it is obvious that the total ROI is NOT going up by 60%!  To calculate the new ROI for the whole chain we need to take the new total throughput of all five restaurants minus the operating expenses of all the restaurants, then divide it by the total of the current investment plus the new one.
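
A short Python sketch of both calculations; the figures for the new branch come from the example above, while the chain’s existing Throughput, Operating Expenses, and investment are invented purely to show that the two interpretations give different answers:

```python
# Figures for the new branch, taken from the example above
delta_investment = 500_000
delta_oe = 1_200_000   # additional Operating Expenses per year
delta_t = 1_500_000    # additional Throughput per year

delta_roi = (delta_t - delta_oe) / delta_investment
print(f"ROI of the move itself: {delta_roi:.0%}")   # 60%

# Whole chain before the move (invented figures, for illustration only)
chain_t, chain_oe, chain_investment = 6_000_000, 4_800_000, 2_500_000
roi_before = (chain_t - chain_oe) / chain_investment

# Whole chain after adding the fifth restaurant
roi_after = (chain_t + delta_t - chain_oe - delta_oe) / (chain_investment + delta_investment)
print(f"Chain ROI before: {roi_before:.0%}, after: {roi_after:.0%}")   # 48% -> 50%
```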

The point here is: what do you understand from the expression: Delta-ROI? 

Is it the change in ROI for the whole organization?  Or is it the ROI of just the new move? 

The above formula refers to the latter interpretation!

Comment: The full Throughput Economics method involves TWO series of calculations, one is based on conservative assessments of the additional Throughput and additional Operating Expenses, and one is based on optimistic assessments.  To understand the reason for going through the calculations twice, see Alejandro’s whole presentation, or read the book: Throughput Economics, by Henry Camp, Rocco Surace, and me.

Please, come up with your reservations to continue the open discussion.

TOC and AI: Using the TOC Wisdom to Draw the Full Value from AI for Managing Organizations

By Eli and Amir Schragenheim

A powerful new technology has the potential of achieving huge benefits, but it is also able to cause huge damage.  So, it is mandatory to carefully analyze that power.  We think it is the duty of the TOC experts to look hard at AI and see how to exploit the benefits, while eliminating, or vastly reducing, the possible negative consequences.

Modern AI systems are able to make predictions based on large volumes of data and simulated results, and either take actions, as robots do, or support human decisions.  An important example is the ability to understand language, get the real meaning behind it, and generate a variety of services. The “experience” is created by the provided dataset, which has to be very large.  This kind of learning tries to imitate human beings learning from their experience, with the advantage of being able to learn from a HUGE amount of past experience, hopefully with fewer biases. 

AI currently generates value mainly by replacing human beings in relatively simple jobs, making it faster, more accurate, and with less ‘noise’.

AI has some critical flaws; one is being unable to explain how a specific decision has been reached.  Its dependency on both the large datasets and the training makes the inability to explain a decision a potential source of mistakes that most human beings wouldn’t make. Even huge datasets are biased by the time, location, and circumstances in which the data were collected, so the model might misinterpret a specific situation. 

This document deals with the potential value for managing organizations that can be achieved by combining the Theory of Constraints (TOC) with AI.  It doesn’t deal with other valuable uses of AI.

The focus of TOC is on the goal and how to achieve more of it; so, in terms of management, it looks at what prevents the management team from achieving more of the goal.

TOC focuses on options for finding breakthroughs, trying to explore where there’s a current limitation to achieve more goal units, so we’d like to explore whether the power of AI can be used to overcome such limitations.

Without a deep understanding of the management needs, the potential value of AI, or any other new technology, is limited to needs that are obvious to all, and that AI is able to answer via automation, without having additional elements for the solution to work. In the more complicated case of using robots to move merchandise in a huge warehouse, we have a fairly obvious combination of two technologies, AI and robotics, for answering the need to replace lower-level human workers, probably also improving the speed with less mistakes (higher quality).

When it comes to supporting the decisions of higher-level managers the added value of AI is much less obvious.  One aspect that is basically different from the regular current uses of AI is: the human decision maker has to be fully responsible for the decision.  This means the AI could recommend, or just supply information and trade-offs, but it should not be the decision maker.  This raises several tough demands from AI technology, but when these demands are answered, new opportunities to gain value are raised.

Providing absolutely necessary information, which is either missing today, or given by the biased and inaccurate intuition of the human manager, is such an opportunity. 

By covering for not-good-enough human intuition, considering a very large volume of data, performing a huge number of calculations, looking for correlations and patterns that imitate the human mind, and using reinforcement rewards to identify the best path to the supporting information, AI gives the human decision maker a generic opportunity to improve the quality of decisions.  Eventually, the decision maker might need to include facts that aren’t part of the datasets, and use human intuition and intelligence to complement the information upon which an important decision has to be made.

Measuring the uncertainty and its impact

The trickiest part in predictions is getting a good idea not just of the exact value we like to know but also the reasonable range of deviations from it.  Any prediction of the future isn’t certain, so the key question should be ‘what should we reasonably expect?’

TOC developed the necessary tools for keeping a stable flow of products and materials in spite of all the noise (common and expected uncertainty), using visible buffers as an integral part of the planning, and buffer management for determining the priorities during the execution.  This line of thinking should be at the core of developing AI tools to support the management of the organization.

The most immediate need in managing a supply chain (and other critical and important decisions in business) is to get a good idea of the demand tomorrow, next week, next month and also in the long term.  Assessing the potential demand for next year(s) is critical for investing in capacity or in R&D. There is NO WAY to come up today with a reliable exact number of the demand tomorrow, and it gets worse the longer we go into the future (this is just the way uncertainty works). 

Example: Suppose the very best forecast algorithm tells you that next week’s demand for SKU13 is 1,237 units, but the actual demand turns out to be 1,411. 

Was the original forecast wrong? 

Suppose another forecast predicted the sales to be 1,358, is the algorithm behind the second forecast necessarily better?  After all, both were wrong.

Suppose now that the first algorithm included an estimation of the average absolute deviation, called the ‘forecasting error’.  The estimation was plus-minus 254.  This puts the first forecast in a better light because the prediction included the possibility of getting 1,411 as the actual result.  If the second algorithm doesn’t include any ‘forecasting error’, then how could you rely on it?
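
A tiny sketch of the check implied above, using the numbers from the example:

```python
actual_demand = 1411

# Forecast A states a point estimate plus an average absolute deviation.
forecast_a = {"point": 1237, "error": 254}
# Forecast B offers only a point estimate.
forecast_b = {"point": 1358, "error": None}

def within_expected_range(forecast, actual):
    """Check whether the actual demand falls inside the stated +/- error band."""
    if forecast["error"] is None:
        return None  # no stated range, so there is nothing to check against
    low = forecast["point"] - forecast["error"]
    high = forecast["point"] + forecast["error"]
    return low <= actual <= high

print(within_expected_range(forecast_a, actual_demand))  # True: 1411 lies within 983..1491
print(within_expected_range(forecast_b, actual_demand))  # None: no range was provided
```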

Effective managers have to be aware of what the demand might be.  When they face one-number forecasts, no matter how good the forecasting algorithm is, they frequently fail to make the best decision, given the available information.

Thus, a critical requirement from any type of forecasting is to reveal the size of the uncertainty around the critical variables, and its related impact on the decision.  Having to live with the uncertainty means recognizing the damage when the actual demand turns out to be different from the one-number forecast.  The relative sizes of the damage when the demand is lower than the forecast, and when it is higher, should lead the manager to a choice that significantly impacts the decision.

There are meta-parameters of the AI algorithm that dictate the decision it makes. Adjusting these meta-parameters can easily generate a result that is more conservative or more optimistic (for example, instead of using 0.76 as the threshold we can use 0.7 in one instance and 0.82 in the other). This way, being exposed to both predictions gives the decision-maker better information for choosing the most appropriate action, without having to get used to standard deviations or the like.
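
The mechanics can be sketched roughly as follows; here the ‘meta-parameter’ is represented by a quantile chosen over simulated demand outcomes, which is an illustrative stand-in rather than a description of any specific AI algorithm:

```python
# Hypothetical simulated (or resampled) demand outcomes produced by some model.
simulated_demand = [980, 1050, 1120, 1190, 1237, 1260, 1310, 1380, 1420, 1500]

def point_estimate(samples, quantile):
    """Collapse a distribution of outcomes into one number at a chosen 'threshold'."""
    ordered = sorted(samples)
    idx = min(int(quantile * len(ordered)), len(ordered) - 1)
    return ordered[idx]

conservative = point_estimate(simulated_demand, 0.30)  # lower threshold, cautious plan
optimistic = point_estimate(simulated_demand, 0.80)    # higher threshold, ambitious plan
print(conservative, optimistic)  # the decision maker sees both ends of the reasonable range
```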

Reaching for more valuable information on sensing the market

A critical need of every management is to predict the market reaction to actions aiming at attracting more demand, or being able to charge more.  Most forecasting methods, with a few exceptions, assume no new change in the market.  Thus, on top of dealing with the quality of forecasting the demand, considering just the behavior in the past, there is a need to evaluate the impact of proposed changes, also expected changes imposed by external events, on the market demand.

Analyzing the potential changes in the market using the logical tools provided by the Thinking Processes can usually predict, with reasonable confidence, the overall trends that the changes would generate.  But the Thinking Processes cannot provide a good sense of the size of the change in the market.  When proposed changes cause different reactions, like when the esthetics of the products go through a major design change, human predictions are especially risky. 

Significant changes are a problem for the current practices of AI. However, AI algorithms that detect a deviation from a certain reality already exist, and are used extensively in predictive maintenance of manufacturing facilities. Such a signal from the AI can direct the decision-makers that the reality has changed, giving them the signal that manual intervention is needed. 

Predicting the impact of big changes that are made internally, like changes in item pricing, launching a big promotion, etc., is a real need for management.  While changing the pricing of an item seems like an easy task, it is tricky to assess all the implications for the demand for other items and the response of the competitors. Plus, those changes don’t occur very frequently, and the internal data gathered for such changes in the past might not be enough to generate an effective AI model that predicts the implications accurately enough. This presents an opportunity for a third-party organization that deals with Big Data. Such an organization can gather data from many interested organizations, and use the aggregated data to build a much more capable AI model, which the organizations sharing their data can then use to better predict the effects of those actions. This would create a win-win for all parties involved, and can easily cover the operating cost.  Such an organization should guarantee not to disclose the data of any specific organization, sharing only the overall insights.

Warnings about changes in the supply

The natural focus of management is first on the market, then on Operations, which represents the capabilities of the organization to satisfy the demand (or not), and possibly to achieve more of it.

The supply is, of course, an absolutely necessary element for maintaining the business. The problem is that when a supplier is going through a change that negatively impacts the supply, it might take a considerable amount of time for some clients to realize the change and the resulting damage.  The focus of management should not be on routine relationships.  However, when a change in the behavior is identified early enough, possibly by using software, it answers a basic need.  It is especially valuable when the cause of the change is not known. For instance, when a supplier faces financial problems or a change of management.

Achieving effective collaboration between AI, analytics, and human intuition

The three key limitations of AI are:

  1. Being a ‘black box’ where its recommendations are not explained.
  2. The current practices don’t use cause-and-effect logic.  There are moves within AI to include cause-and-effect sometime in the future.
  3. AI is fully dependent on the database and the training.

One way to partially overcome the limitations is to use software modules, based both on cause-and-effect logic and on ‘old-fashioned’ statistical analysis, that evaluate the AI’s recommendations and check how reasonable they are, possibly also re-activating the AI module in order to check a somewhat different request. 

Example.

Suppose the AI prediction for product P1 deviates significantly from the regular expectation (either the regular forecast or simply the current demand).  Then the AI module could be asked to predict the demand for a group of similar products, say P2 up to P5, assuming that if there is a real increase in demand for P1, the other similar products should show a similar trend.  Predicting the demand for a group of products should not be based on predicting the demand for each and combining them, but on repeating the sequence of operations on the combined demand in the past. Thus, logical feedback is obtained, checking whether the AI’s unexplained prediction or recommendation makes sense.
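
A minimal sketch of such a sanity check; the product names, numbers, and the 15% tolerance are all invented for illustration:

```python
# Hypothetical predictions vs. recent average demand (units per week)
p1_prediction, p1_recent_avg = 520, 400            # about +30% predicted for P1
group_prediction, group_recent_avg = 1650, 1600    # about +3% predicted for the group P1..P5

p1_change = p1_prediction / p1_recent_avg - 1
group_change = group_prediction / group_recent_avg - 1

# If P1 is predicted to jump while the group of similar products barely moves,
# flag the prediction for human review instead of acting on it automatically.
TOLERANCE = 0.15  # a judgment call, not a standard value
if abs(p1_change - group_change) > TOLERANCE:
    print("Inconsistent signals - ask the decision maker to review the P1 prediction")
else:
    print("The group trend supports the P1 prediction")
```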

The other way is to let the human user accept or reject the AI output.  It is desired that the rejection is expressed in a cause-and-effect way, which could be used by the AI in the future as new input.

Additional inputs from the human user

AI cannot have all the relevant data required for making a critical decision.  If the human manager is able to input the additional relevant data to the AI module, and a certain level of training is done to ensure that the additional data participate in the learning and the output of the AI module, this could improve the usability of the AI as high-level decision support.

Conclusions and the vision for the future

AI is a powerful technology that can bring a lot of value, but may also cause a lot of damage.  In order to bring value, AI has to eliminate or reduce a current limitation.  Implementing AI also has to consider the current practices, and to outline how the decision-makers should adjust to the new practice and how to evaluate the AI recommendations before taking action. 

Supporting management decisions is a worthy next direction for AI.  But it definitely needs a focus to ensure that truly high value is generated, and possible damage is prevented.

TOC can definitely contribute a focused view into the real needs of top management.  It also enables an analysis of all the necessary conditions for supporting the need.  This means that while AI can be a necessary element in making superior decisions, in most cases the AI application would be insufficient. For drawing the full value other parts, like responsible human inputs, other software modules, and proper training of the users, have to be in place.

TOC is about gaining the right focus for management on what is required in order to get more of the organizational goal.  Assisting managers to define what needs immediate focus, as well as assisting in understanding the inherent ‘noise’ and allowing quick identification of signals, is a critical direction for AI and TOC combined to improve the way organizations are managed.  Even human intuition could be significantly improved, while being focused on the areas where AI is unable to assist.

Improvements that AI can give to TOC

The proposed collaboration between the TOC philosophy and AI should not be just one way. The TOC applications can get substantial support from AI, especially for buffer sizing and buffer management.

Buffer sizing is a sensitive area.  The introduction of buffers for protecting, actually stabilizing, the delivery performance is problematic at the initial stage. At that point AI cannot help, because analyzing the history from before the TOC insights were actively used is not helpful.  But, after one or two years under the TOC guidelines, AI should be able to point to too-large buffers, as well as to the few that are too small.  The Dynamic-Buffer-Management (DBM) procedure for stock buffers, based on analyzing how deep and for how long the on-hand stock penetrates into the Red Zone, could be significantly improved by AI. Another potential improvement is letting AI recommend by how much to increase the buffer.  Similar improvements would be achieved by analyzing when staying too long in the Green Zone signals a safe decrease of the stock buffer.
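
For reference, here is a minimal Python sketch of the baseline DBM heuristic that such an AI would refine; the zone boundaries, the 30-day window, and the one-third adjustments are common conventions used here purely for illustration, not a prescription:

```python
def dbm_recommendation(daily_on_hand, buffer_size, window=30):
    """Baseline Dynamic Buffer Management heuristic on recent on-hand stock history."""
    red_line = buffer_size / 3        # bottom third of the buffer: the Red Zone
    green_line = 2 * buffer_size / 3  # top third of the buffer: the Green Zone
    recent = daily_on_hand[-window:]

    days_in_red = sum(1 for level in recent if level < red_line)
    days_in_green = sum(1 for level in recent if level > green_line)

    if days_in_red > window * 0.2:        # too many days inside the Red Zone
        return int(buffer_size * 4 / 3)   # increase the buffer by about a third
    if days_in_green > window * 0.8:      # stuck in the Green Zone for too long
        return int(buffer_size * 2 / 3)   # decrease the buffer by about a third
    return buffer_size                    # leave the buffer as it is

history = [90, 85, 70, 40, 25, 20, 35, 28, 60, 75] * 3  # 30 days of on-hand stock
print(dbm_recommendation(history, buffer_size=120))     # 160: too many days in the Red Zone
```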

The most important use of Buffer Management is setting one priority system for Operations, guiding what the most urgent next job is for delivering all the orders on time.  A part that needs improvement is deciding when expediting actions are truly needed, including the use of capacity buffers to restore the stability of the delivery performance.  Here again is a critical mission for AI: to come up with an improved prediction of the current state of the orders against the commitments to the market.

The TOC procedures were shaped by recognizing the capacity limitations of management attention. By relieving management of some of the ongoing, relatively routine cases, where AI is fast and reliable enough, TOC can focus management attention on the most critical strategic steps for the next era.

What should WE learn from Boeing’s two-crash tragedy?

The case of the two crashes of Boeing’s 737 MAX aircraft, less than six months apart, in 2018 and 2019, involves three big management failures that deserve to be studied, so that effective lessons can be internalized by every management team.  The story, including some of the truly important detailed facts, is told in “Downfall: The Case Against Boeing,” a recent Netflix documentary.

We all demand 100% safety from airlines, and practically also from every organization: never let your focus on making money cause a fatal flaw!

However, any promise of 100% safety is utopian.  We can come very close to 100%, but there is no way to ensure that fatal accidents will never happen.  The true practical responsibility consists of two different actions:

  1. Invest time and effort to put protection mechanisms in place.  We in the Theory of Constraints (TOC) call them ‘buffers’, so even when something goes wrong, no disaster would happen.  All aircraft manufacturers, and all airlines, are fully aware of the need.  They include protection mechanisms, and very detailed safety procedures, into the everyday life of their organizations.  Due to the many safety procedures, any crash of an aircraft is the result of a combination of several things going wrong together and thus is very rare. Yet, crashes sometimes happen.
  2. If there is a signal that something that shouldn’t have happened has happened, then a full learning process has to be in place to identify the operational cause, and from that identify the flawed paradigm that let the operational cause happen.  This is just the first part of the learning. Next is deducing how to fix the flawed paradigm without causing serious negative consequences.  Airlines have internalized the culture of inquiring every signal that something went wrong.  Still, such a process could and should be improved.

I have developed a structured process of learning from a single event, now entitled “Systematic Learning from Significant Surprising Events”.  TOCICO members can download it from the TOCICO site of New BOK Papers; the direct link is https://www.tocico.org/page/TOCBodyofKnowledgeSystematicLearningfromSignificantSurprisingEvents

Others could approach me and I’ll gladly send the paper to them.

Back to Boeing.  I, at least, don’t think it is right to blame Boeing for what led to the crash of the Indonesian aircraft on October 29th, 2018.  In hindsight, every flawed paradigm looks as if everybody should have recognized the flaw, but that is an inhuman standard.  There is no way for human beings to eliminate all their flawed assumptions.  But it is our duty to reveal the flawed paradigm once we see a signal that points to it.  Then we need to fix the flawed assumption, so the same mistake won’t be repeated in the future.

The general objective of the movie, like that of most public and media inquiries, is to find the ‘guilty party that is responsible for so-and-so many deaths and other damage.’  Boeing’s top management at the time was an easy target, given the number of victims.  However, blaming top management for being ‘greedy’ will not prevent any safety issue in the future.  I do expect management to strive to make more money now, as well as in the future.  However, the Goal should include several necessary conditions, and refusing to take the risk of a major disaster is one of them.  Pressing for a very ambitious, short development time, and launching a new aircraft without the need to train the pilots, who are trained on the current models, are legitimate managerial objectives.  The flaw is not being greedy, but failing to see that the pressure might lead to cutting corners and to preventing employees from raising a flag that there is a problem.  Resolving the conflict between ambitious business targets and dealing with all the safety issues is a worthy challenge that needs to be addressed.

Blaming is a natural consequence of anything that goes wrong.  It is the result of a widespread flawed paradigm, which pushes good people to conceal the facts that might reveal their involvement in highly undesired events.  The fear is that they will be blamed and their careers will end.  So, they do their best to avoid revealing their flawed paradigms.  The problem is: other people still use the flawed paradigm!

Let’s see what the critical flawed paradigms were that caused the Indonesian crash.  As is typical, two different flaws combined to cause the crash of the Indonesian plane.  A damaged sensor sent wrong data to a critical new automatic software module, called MCAS, which was designed to correct too high an angle of climb.  The major technical flaw was failing to consider that if the sensor is damaged, then MCAS could cause a crash.  The sensors stick out of the airplane body, so hitting a balloon or a bird can destroy a sensor, and this makes the MCAS system deadly.

The second flaw, this time managerial, was deciding not to let the pilots know about the new automatic software. The result was that the Indonesian pilots couldn’t understand why the airplane was going down.  As the sensor was out of order, many alarms carrying wrong information were sounding, and the stick shaker on the captain’s side was loudly vibrating.  To fix that state the pilots had to shut off the new system, but they didn’t know anything about MCAS and what it was supposed to do.

The reason for refraining from telling the pilots about the MCAS module was the concern that it would trigger mandatory pilot training, which would limit the sales of the new aircraft.  The underlying managerial flaw was failing to realize how that lack of knowledge could lead to a disaster.  It seems reasonable to me that management tried their best to come up with a new aircraft, with improved performance and no need for special pilot training.  The flaw was not seeing that pilots being unaware of the MCAS module could lead to such a disaster.

Once the first crash happened, and the technical operational cause was revealed, the second managerial flaw took place.  It is almost natural after such a disaster to come up with the first possible cause that is the least damaging to management.  This time it was easy to claim that the Indonesian pilot wasn’t competent.  This is an ugly, yet widespread, paradigm of putting the blame on someone else.  However, facts coming from the black box eventually told the true story.  The role of MCAS in causing the crash was clearly established, as was the role of the pilots’ lack of any prior information about it.

The safe response to the crash should have been grounding all the 737 MAX aircraft until a fix for MCAS was ready and proven safe.  It is my hypothesis that the key management paradigm flaw, after establishing the cause of the crash, was highly impacted by the fear of being blamed for the huge cost of grounding all the 737 MAX airplanes.  The public claim from Boeing top management was that “everything is under control”; a software fix would be implemented in six weeks, so there was no need to ground the 737 MAX airplanes.  The possibility that the same flaw of MCAS would lead to another crash was ignored in a way that can be explained only by top management being under huge fear for their careers. It doesn’t make sense that the reason for ignoring the risk was just to reduce the cost of compensating the victims by still putting the responsibility on the pilots.  My assumption is that the top executives of Boeing at the time were not idiots. So, something else pushed them to take the gamble of another crash.

Realizing the technical flaw forced Boeing to reveal the functionality of MCAS to all airlines and pilot unions.  It included the instruction to shut off the system when MCAS goes wrong.  At the same time, they announced that a software fix to the problem would be ready in six weeks, an announcement that was received with a lot of skepticism.  Due to these two developments Boeing formally refused to ground the 737 MAX aircraft.  When directly asked by a member of the Allied Pilot Association, during a visit of a group of Boeing managers (and lobbyists) to the union, the unbelievable answer was: no one has concluded that this was the sole cause of the crash!  In other words, until we have full formal proof, we prefer to continue business as usual. 

Actually, the FAA, the Federal Aviation Administration, issued a report assessing that without a fix there would be a similar crash every two years!  This means there was a 2-4% chance that a second crash could happen within one month!  How come the FAA allowed Boeing to keep all the aircraft flying?  Did they carry out an analysis of their own behavior when the second crash occurred, after five months without a fix of the MCAS system?

Another fact mentioned in the movie is that once the sensors are out of order and MCAS points the airplane down, the pilots have to shut off the system within 10 seconds, otherwise the airplane is doomed due to the speed of its descent!  I wonder whether this fact was discussed during the inquiry into the first crash.

When the second crash happened, Boeing top management went into fright mode, misreading the reality that the trust of the airlines, and of the public, in Boeing had been lost. In short: the key lessons from the crash and the after-crash pressure were not learned!  They still didn’t want to ground the airplanes, but now the airlines took the initiative and one by one decided to ground them.  A public investigation was initiated and, from the Boeing management team’s perspective, hell broke loose.

The key point for all management teams: 

It is unavoidable to make mistakes, even though a lot of effort should be put into minimizing them.  But it is UNFORGIVABLE not to update the flawed paradigms that caused the mistakes.

If that conclusion is adopted, then a systematic method for learning from unexpected cases should be in place, with the objective of never repeating the same mistake.  Well, I cannot guarantee it will never happen, but most of the repeats can be avoided.  Actually, much more than that can be avoided: once a flawed paradigm is recognized and updated, the ramifications can be very wide.  If the flawed paradigm is discovered from a signal that, by luck, is not catastrophic, yet is surprising enough to trigger the learning, then huge disastrous consequences are prevented and the organization is much more secure.

It is important for everyone to identify flawed paradigms, based on surprising signals, and to update them.  It is also possible to learn from other people’s, or other organizations’, mistakes.  I hope the flawed paradigm of refusing to see a problem that is already visible, and trying to hide it, is now well understood not just within Boeing, but within the top management of every organization.  I hope this article can help in establishing proper procedures for learning the right lessons from such events.

The other side of the coin: Amazon’s future threats

Part 2

By Henry F. Camp and Eli Schragenheim

What future threats does Amazon face?  Don’t believe they are bulletproof because of their current dominant position or an internal culture that embraces Jeff Bezos’ “Day One” philosophy, which demands that companies stay as sharp as they were when they were first founded – vulnerable, before amassing financial or political strength.  While Amazon serves its customers well, it behaves differently toward its lower-level employees, and even many of its business collaborators are less than satisfied.

As with any company that operates on a massive scale, there is significant pressure on Amazon to control wages.  This is obvious, right?  Their size, in terms of the sheer number of employees, means that paying well would come at a tremendous cost.  After all, the purpose of getting big was to gain operational efficiencies that allow Amazon to both earn high profits and offer low prices.  Given their customer orientation, this combination means Amazon feels it must look out for its customers at the expense of its employees and suppliers.

Amazon’s decisive competitive edge relies on efficiencies.  The company’s approach to achieving them is multidimensional.  They work to automate wherever possible, so they require fewer employees while gaining speed and delivery accuracy.  They also push back on their suppliers, making them share the cost of providing logistics.  More on this later.

When you employ 1.5 million people, assuming 2,000 hours for each per year, adding one dollar to hourly compensation increases costs by $3 billion per year, a non-trivial consideration.  That may be exactly what they had to do to maintain operations during fiscal 2021, the year of the casual COVID-worker. 
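For completeness, the arithmetic is simply the product of the three figures quoted above; a minimal sketch, using the article’s own assumptions of 1.5 million employees and 2,000 hours per employee per year:

```python
# Back-of-the-envelope check of the payroll figure above.
# The headcount and hours-per-year numbers are the article's assumptions, not reported data.
employees = 1_500_000
hours_per_year = 2_000
raise_per_hour = 1.00  # adding one dollar to hourly compensation

extra_annual_cost = employees * hours_per_year * raise_per_hour
print(f"Extra annual cost: ${extra_annual_cost / 1e9:.0f} billion")  # -> $3 billion
```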

Nevertheless, Amazon’s cash flow increased by $17 billion, to $60 billion, in fiscal 2021.  To put that number into perspective, it is closing in on Microsoft’s $87 billion and Apple’s $120 billion, the two most profitable companies in the world outside of government-owned entities.

Now, having high profits is not intrinsically bad.  Customers, employees, suppliers and governments alike benefit enormously from Amazon’s success.  Nor do high profits oblige the companies that earn the most to do more for their employees than any other company does.  The question of whether compensation is fair or unfair is answered by comparison to what other companies pay for equivalent work in an equivalent context.

By context, we mean culture, as well as the physical environment and relative safety.  People who work in coal mines may demand higher compensation than those who sit all day in comfortable office chairs.  A wonderful corporate culture or purpose, such as missionary work, may attract some employees even if the pay is not up to par.  Is the workplace tough or even cruel?  Bad cultures typically result in high turnover and quitting-in-place, where workers accept paychecks for doing as little as possible.  Lastly, the degree to which a person’s work relies on their ability to plan out into the future determines what Elliott Jaques called felt-fair pay.  A PhD does not expect to be paid more than their coworkers for flipping burgers at McDonald’s.

Back to Amazon in particular, dozens of articles going back for years have decried their treatment of front-line employees, claiming heavy workloads and loss of autonomy, down to timed restroom breaks.  The question is, are these factors a potential risk to Amazon’s future?

Amazon is a system, and a very efficient one: a system that largely took in stride a massive increase in volume as a result of the totally unforeseen COVID pandemic.  They provided the world with goods when we were unable or unwilling to go out shopping in our neighborhood brick-and-mortar stores.

They have dialed in exactly what they expect of employees to gain this efficiency.  The hope is that their customers are the beneficiaries.  Meanwhile, Amazon’s pay rates are not the lowest.  So, where is the risk?

From a TOC point of view, it is a local/global conflict.  The efficiencies Amazon measures their operators against are local, not global.  The reason they want higher efficiencies is to become more effective globally, across their enormous company.  It all seems to be working far better than we might have reasonably predicted.  So, again, what risk?

Systems Theory points us in the direction of an answer.  It informs us that the sum of local optima does not lead to a global optimum.  This conclusion is in sync with TOC, which implores us to focus on the system’s constraint.  The implication is that non-constraint resources must have some slack in their scheduled workloads.  The many complaints about Amazon as an employer are that it seems to focus on “sweating the assets,” a phrase borrowed from John Seddon, a British consultant to service industries.  By assets he means the employees themselves.

Let’s start from Amazon’s point of view.  They have scale.  This means they can apply division of labor and workloads to an extent seldom seen.  The scale of their operations requires many distribution and sorting centers, generally larger than one million square feet each.  They have subordinated physical centralization in favor of speed of delivery to their customers.  A high-level manager runs each of these facilities, and they are scored on their efficiencies.  Getting more work out of the staff at each facility drives great customer service and validates what a person can produce.  The latter allows Amazon to know when to hire and how many people, which leads to low excess capacity.  All of this results in the incredibly high profits mentioned earlier.

One of us recently spoke to the manager of one of these distribution centers.  He exposed us to a problem with these measurements.  They are not applied consistently.  Safety and external events blow the circuit breakers on the metrics machine that is Amazon.

For example, during thunderstorms outside a DC (and if the buildings get any bigger, there may soon be weather inside), drivers are prohibited, for their own safety, from leaving their trucks to dash into the building.  This prevents their trucks from being unloaded.  So, if a storm persists, as storms often do in the Southern United States, the DC can become both starved of inputs and constipated for lack of outputs.

The newest and most efficient DCs employ a direct-flow model, where receipts that are immediately required flow directly to shipping without having to first be put away (buffered) and then picked to be shipped.  This is efficient: skipping steps results in faster shipments out to customers, particularly of items that have been on backorder.  However, during a long-lasting thunderstorm, if drivers can’t deliver their loads, the DC grinds to a stop after just a few hours.  (Interestingly, the older, less efficient put-away-first-then-pick designs are resilient for a much longer time when such external events occur.)  When a stoppage occurs, the efficiency metrics are ignored, as it is not the fault of the staff of the distribution center; the stoppage is caused by an external event and a built-in concern for people’s safety.

What is really interesting is that behaviors change as soon as an external event is declared.  The sub-system resets.  Training that was needed but not prescribed takes place.  Equipment that needed fixing is repaired.  Preventative maintenance is accomplished.  It seems that efficiency measurements prohibit management from doing what they instinctively understand must be done to be effective.  Without an external event to turn off the efficiency spotlight, a facility manager’s efforts to improve future productivity (training, coaching, maintenance, repairs, etc.) make that manager look bad in the short term.  We imagine this creates a pressure-cooker workplace – damned if I do, damned if I don’t.

Furthermore, there is no metric that scores how quickly the facility recovers from the externally triggered reset.  As such, what often happens is that the managers of the sub-systems prolong an interrupted condition specifically so they can afford to work on what they perceive as necessary, over and above what is prescribed by upper management.  In other words, managers must cheat to do the right things.  Eli Goldratt described this as his fifth Engine of Disharmony: gaps between responsibility and authority.  You are responsible for accomplishing something both in the short term and over the long term, but you are not given the authority to undertake the actions necessary to meet your responsibilities.

Let’s investigate what this revelation means for employees.  Many psychological studies, popularized by Dan Pink, suggest that there are three main drivers of motivation: autonomy, mastery and purpose.  The last two are certainly achievable at Amazon as it is today.  Our question is about the opportunity for autonomy, other than when there is an external interruption of Amazon’s prescribed processes.

The conflict is between allowing people to realize their very human need for autonomy and expecting them to be part of a well-oiled machine.  At this point, allow us to broaden our discussion beyond employees to include suppliers, service providers and other stakeholders.

Earlier, we promised to return to the sellers of goods through Amazon.  The desire to provide top value to customers does not extend to suppliers.  Amazon provides the owners of the products sold on its site with a world-class operating system, but it also charges them high fees.  Why?  These fees offset Amazon’s expenses.  Many Amazon partners claim that the giant dictates terms to its ‘partners’ and forces them to largely finance its efficiency machine.

Let us briefly explain the diagram.  For Amazon to succeed, it must be both effective at doing what the market expects and efficient enough to earn a vast profit.  To gain that efficiency, they must prescribe precisely what they understand needs to be done and hold others, employees and providers, accountable for delivering it.  On the other hand, in order to keep their enormous system functioning, now as well as in the future, they rely on the continuing support of their stakeholders.  To maintain that support, they must honor the needs of those stakeholders to express themselves, especially when they identify and wish to implement good ideas that improve Amazon’s effectiveness.  But how can they give stakeholders autonomy while ensuring the flow of work for as little money as possible?
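In words, the cloud reads roughly as follows (the letter labels are the standard evaporating-cloud notation referred to in the next paragraph):

A (objective): Amazon succeeds, now and in the future.
B (need): Be locally efficient enough to earn a vast profit.
D (action serving B): Prescribe precisely what must be done and hold employees and providers accountable for delivering it, for as little money as possible.
C (need): Keep the continuing support of all stakeholders, i.e., remain globally effective.
D’ (action serving C): Give stakeholders the autonomy to express themselves and act on their good ideas.
The conflict is between D and D’: Amazon cannot comfortably do both at once.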

We know from Eli Goldratt’s pillars of TOC that every conflict can be eliminated.  Our intuition suggests that the D action creates a legitimate risk to the C need.  However, is there an inherent jeopardy to efficiency if they seek collaboration and allow autonomy?  With an injection of seeking advice from staff, suppliers and providers, and acting on it in meaningful ways, the cloud falls apart.  The D’ action is sufficient to be both globally effective (C) and locally efficient (B).

Current relationships with its workforce and suppliers cannot be fully described as win-lose (there are, after all, other places to sell your wares), but it would be correct to claim that they are BIG-WIN/small-win, with Amazon always getting the bigger win.  The more objections their internal and external constituents accumulate, and the longer those objections remain unresolved, the greater the resultant animosity.  The threat is that complaints of unethical behavior are spreading through the media and might erode their customers’ trust, which seems to us to be Amazon’s biggest strategic asset.  Should public opinion swing against them, the government may be forced to intervene.  After all, politicians love to pander to their constituents.  Many other giants have fallen, and we little folk love to tell the stories.  Antitrust is a real risk to a company of Amazon’s scale.  That should be warning enough.