How to implement drum-buffer-rope for telecoms networks?

Drum-buffer-rope is a standard quality management technique for manufacturing plants. If you work in telecoms and haven’t heard of it, that tells you a lot about the maturity of quality management in our industry.

In a previous essay I wrote about the need for the telecoms industry to upgrade its quality management systems. This involves getting better visibility and control over the experience it delivers, without simply throwing (idle) capacity at every network service quality problem.

The concepts needed to deliver better experiences at lower cost are well known in other industries. This kind of ‘lean’ operation requires managing the flow of value through the system. That includes deciding what kinds of demand are admitted, and how the supply is scheduled to meet different kinds of demand. The aim is to limit work in progress, reduce cycle time from order to delivery, eliminate waste, and increase the system’s ability to respond to variation in demand.

A notably successful approach to delivering ‘lean quality’ has been Goldratt’s Theory of Constraints (TOC). This is a systems method for optimising the whole enterprise (as opposed to locally optimising its component parts, which is how most management methods work). Goldratt set out the method in The Goal, a famous business book that is also a novella and a love story.

A central concept of TOC is ‘Drum, Buffer, Rope’ (DBR). The ‘drum’ is the most constrained resource, and sets the ‘beat’ at which the whole system must operate. The ‘buffer’ protects that constrained resource by storing up demand to keep it constantly busy, even if upstream processes have breakdowns or other ‘normal’ variation. The ‘rope’ is what connects value leaving the production system (say, for onward shipping) to the acceptance of fresh demand (like customer orders) into the system.
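The interplay of the three parts can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a description of any real plant or network: time advances in ticks, everything upstream of the drum is collapsed into a single release step, and the names simulate_dbr, drum_rate, upstream_rate, buffer_target and outages are all hypothetical, chosen for illustration only.

```python
def simulate_dbr(ticks, drum_rate=1, upstream_rate=2, buffer_target=3,
                 outages=frozenset(), use_rope=True):
    """Toy drum-buffer-rope line. Returns (units shipped, peak buffer size).

    Assumptions (illustrative only): discrete ticks, one constrained
    resource (the drum), and upstream stages collapsed into one release step.
    """
    buffer = 0        # work waiting in front of the drum
    shipped = 0       # work that has left the system
    peak_buffer = 0   # largest queue seen in front of the drum
    for tick in range(ticks):
        if tick in outages:
            # Upstream breakdown: nothing reaches the buffer this tick.
            released = 0
        elif use_rope:
            # Rope: admit only enough new work to refill the buffer.
            released = min(upstream_rate, max(0, buffer_target - buffer))
        else:
            # No rope: upstream pushes work in as fast as it can.
            released = upstream_rate
        buffer += released
        peak_buffer = max(peak_buffer, buffer)
        # Drum: the constraint sets the beat for the whole system.
        done = min(drum_rate, buffer)
        buffer -= done
        shipped += done
    return shipped, peak_buffer


if __name__ == "__main__":
    print(simulate_dbr(100, outages={10, 11}))                   # adequate buffer
    print(simulate_dbr(100, buffer_target=1, outages={10, 11}))  # buffer too small
    print(simulate_dbr(100, use_rope=False))                     # no rope
```

In this sketch a buffer of three units rides out a two-tick upstream outage without the drum ever going idle, whereas a buffer of one unit loses two ticks of output. Removing the rope ships no more than the drum allows, but the queue in front of the drum grows without limit.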

You can see these ideas in operation in many industries. For instance, I am presently sat in the airline lounge at London Gatwick airport. Its single runway is one of the most intensively used in the world, if not the most. As the system constraint it is ‘buffered’ by a queue of aircraft waiting to take off. A limit on aircraft movements prevents the overall system being overloaded with demand. There is currently a battle going on to get a second runway built to elevate this system constraint.

The same thinking can be seen at the security checks. The constraint is the X-ray machine and its operator. The flow has been redesigned to protect this constraint. There are now half a dozen places to stand along the conveyor to offload your items for scanning before the machine. After the X-ray and pat-down area there are unloading bays so that ‘jams’ of people don’t prevent the conveyor system from advancing.

The broadband industry is presently caught in an insane ‘fat pipes’ model that fails to deliver predictable and consistent experiences, whilst also wasting huge amounts of capital and thus inflating costs. In this model, the central belief is that the job of a network is to create as much data throughput as possible, which is (wrongly) conflated with enabling good user experiences.

Packets are always accepted at network ingress, regardless of load. They are always passed on as fast as possible, no matter what the downstream constraint. What we have is the antithesis of TOC and DBR: the system constraint is ignored, there are mismanaged buffers everywhere, and there is typically no means of protecting the system as a whole from (transient or persistent) overload.

The outcome for the broadband business is like a 1950s American gas-guzzler car. The chrome and fins cannot compensate for the unreliability, oil leaks, poor safety, environmental pollution and high running costs. We are now at the cusp of the kind of quality revolution that shook up the automotive industry, when the Japanese came along with cars built by the Toyota quality method.

Over the next decade, I anticipate that we will see a new class of network and operator emerge. These new entrants will adapt ideas like TOC to telecoms, bringing a fresh quality management approach that incumbents will be unable to replicate easily or quickly. The systems of reward in existing broadband operators are all tied to delivering more throughput, not better experiences.

There are significant differences between physical and virtual goods, so you cannot naively apply these ideas of quality management by treating packets as if they were packages ordered by customers. These are distributed computing systems, and the flows of packets have to be seen in the context of those computations and their competing (and often greedy) use of shared resources.

For instance, in a factory the ‘rope’ from the ‘shipping bay’ exit back to the ‘sales order entry system’ entry can communicate control signals orders of magnitude faster than the time it takes for a physical item to transit the factory. In a telecoms network, the reverse is true: the packets are already going close to the speed of light, and you can’t create control loops that go faster than the data being sent. This is the difference between ‘elastic’ and ‘ballistic’ timescales of control.
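A rough piece of arithmetic shows why this matters. The sketch below estimates how many packets a sender has already committed to the network before any feedback signal can reach it. The function name packets_in_flight and the example figures (a 1 Gbit/s link, a 20 ms feedback delay, 1500-byte packets) are assumptions picked for illustration, not measurements.

```python
def packets_in_flight(link_rate_bps, feedback_delay_ms, packet_bytes=1500):
    """Whole packets sent during one feedback delay at full line rate.

    Illustrative assumption: the sender transmits continuously at
    link_rate_bps while the control signal is still in transit.
    """
    bits_sent = link_rate_bps * feedback_delay_ms // 1000
    return bits_sent // (packet_bytes * 8)


if __name__ == "__main__":
    # Hypothetical figures: 1 Gbit/s link, 20 ms for feedback to arrive.
    print(packets_in_flight(1_000_000_000, 20))
```

With those assumed figures, well over a thousand packets are already on their way by the time a ‘rope’ signal arrives, so the signal can only influence what is sent next, never what has already been launched.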

Implementing the ‘drum’ and ‘rope’ requires new predictive models and mechanisms to schedule packets according to the internal constraints of the network. Managing buffers imposes requirements to control trade-offs between packet loss and delay, which is not an issue with physical goods (where we don’t erase half-finished goods in production and throw away all the raw material!). Value in networks comes from delivering populations of packets, not individual datagrams (which have no intrinsic material value).
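The trade-off between loss and delay can be seen in a minimal sketch of a single finite queue. It assumes a first-in-first-out buffer that serves one packet per tick and drops arrivals when full; run_queue, its parameters and the bursty arrival pattern are hypothetical, and real network schedulers are considerably richer than this.

```python
from collections import deque


def run_queue(arrivals, capacity):
    """Finite FIFO buffer serving one packet per tick (tail drop when full).

    arrivals[i] is the number of packets arriving at tick i.
    Returns (packets dropped, worst queueing delay in ticks).
    Illustrative model only.
    """
    queue = deque()   # holds the arrival tick of each waiting packet
    dropped = 0
    worst_delay = 0
    for now, count in enumerate(arrivals):
        for _ in range(count):
            if len(queue) < capacity:
                queue.append(now)
            else:
                dropped += 1  # buffer full: the packet is lost
        if queue:
            # Serve the oldest packet and record how long it waited.
            worst_delay = max(worst_delay, now - queue.popleft())
    return dropped, worst_delay


if __name__ == "__main__":
    bursts = [5, 0, 0, 0, 0, 5, 0, 0, 0, 0]  # two bursts of five packets
    print(run_queue(bursts, capacity=2))  # small buffer
    print(run_queue(bursts, capacity=5))  # large buffer
```

For the same two bursts, the small buffer drops six packets but never holds one for more than a tick, while the large buffer loses nothing and makes the last packet of each burst wait four ticks. Neither outcome is free, which is why the choice has to be made deliberately rather than left to whatever buffer size the equipment shipped with.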

The result for customers will be a great improvement in the fitness-for-purpose of broadband and a significant lowering of cost. There is only upside for the public. For many inside the industry, however, there will be upheaval. The shift from a supply-centric model (focused on more bandwidth) to a demand-centric one (based on experiences) is likely to disrupt the established pattern of winners and losers.

Today’s network operators could find themselves challenged in the same way that incumbent carriers were when low-cost airlines took over much of the airline industry. Their vendors will see a collapse in equipment prices and sales as the rewards get attached to delivering better outcomes, not more raw inputs. A ‘devops’ model of network management will require the supply chain to take far more responsibility for the experience it delivers.

The issue of redesigning the management system is one that sits with the most senior management. It is a CEO issue, not a CTO problem, nor something to be delegated to your vendors. By its nature it cuts across all phases of the service lifecycle, and how people, processes and policies are connected.

The first step is to become aware of the nature of the existing quality management system and its inherent limitations. Only then can the journey towards implementing systems like DBR be safely and successfully undertaken.
