3 missing capabilities for a “lean quality” network revolution

The broadband industry is an immature one, and its basic science and engineering are still being established. Just as physical manufacturing went through a quality revolution in the second half of the 20th century, we are on the cusp of the same thing happening to “digital manufacturing”. This requires new core skills, since online experiences are continuously created, rather than being discrete one-off events.

The telecoms industry has a shameful secret. It cannot create engineered experiences for broadband applications with predictable performance and low cost. For sure, it can create amazing yet unpredictable experiences at low cost, or predictably terrible experiences at low cost, or predictably good experiences at high cost. But if you want affordable predictable quality, you are out of luck.

This is the kind of situation that car manufacturing faced up until the 1960s and 1970s. Then the Japanese came along with the Toyota Production System and kaizen continuous improvement. They were willing to make quality their core focus, and in doing so could also deliver low cost by limiting variability and rework. This “low waste, low defect” management methodology extended up and down their supply chain.

For broadband to have its “lean quality” revolution, it needs to adapt ideas from the physical world to the virtual one. You can’t hit a red button to stop the production line for online video streaming when you see a buffering “circle of death”. It just doesn’t work that way. Information moves at close to the speed of light, and is processed in machines that can make billions of choices per second.

To reach our nirvana of mature digital supply chain management, with visibility of quality and managed variability, we need three new basic quality management capabilities: calibration, coordination, and control.

Calibration

Whilst we have many network metrics and measurements, they typically fail to give us the first requirement of “lean”: visualise your flow of value. Packet networks change state at very high rates, have huge numbers of flows, and each packet gets a unique treatment (no two have identical delay). We are dealing with very “high frequency” quality phenomena with very high quantities of “units of processing”.

Rather like how a bee sees into the ultraviolet range, we need better “glasses” that let us view at “network X-ray” frequencies. Our new “prescription lenses” have to give us perception of both space and time, letting us “see” how the quality of populations of packets is impacted by each element of the supply chain.

The essential (and hitherto missing) aspect of calibration is to be able to locate our metrics in the end user experience, and to quantify the “aberration” of our magic mathematical “lenses”. The lack of testing, inspection, and certification players in this space tells you all you need to know. This is still numerate craft building medieval cloud cathedrals, not modern process engineering.
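To make that concrete, here is a minimal sketch (in Python, with made-up figures) of the kind of “lens” calibration implies: summarising a population of per-packet delays for each element of the supply chain as percentiles, rather than a single average that hides the variability. The element names and sample values are purely illustrative assumptions, not real measurements.

```python
from statistics import quantiles
from typing import Dict, List

# Hypothetical per-packet one-way delay samples (milliseconds) observed at
# each element of the delivery chain. In practice these would come from
# high-rate passive measurement, not a hand-written dictionary.
delay_samples_ms: Dict[str, List[float]] = {
    "access": [4.1, 4.3, 4.0, 9.8, 4.2, 4.4, 12.5, 4.1],
    "metro":  [1.0, 1.1, 1.0, 1.2, 6.3, 1.1, 1.0, 1.1],
    "core":   [8.0, 8.1, 8.0, 8.2, 8.1, 8.0, 8.3, 8.1],
}

def delay_profile(samples: List[float]) -> Dict[str, float]:
    """Summarise a population of per-packet delays as P50/P90/P99,
    rather than a single average that hides the variability."""
    q = quantiles(samples, n=100)          # 99 cut points -> percentiles
    return {"p50": q[49], "p90": q[89], "p99": q[98]}

for element, samples in delay_samples_ms.items():
    profile = delay_profile(samples)
    summary = ", ".join(f"{k}={v:.1f}ms" for k, v in profile.items())
    print(f"{element:>7}: {summary}")
```

Even this crude view shows how a single “hot” element can dominate the tail of the end-to-end delay while leaving the averages untouched.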

Coordination

In a factory, you take it for granted that you can call the worker at the other end of the production line and have a conversation. In a packet network, you can’t send control messages faster than the “product” is moving. This means that your order “upstream” for fewer “raw materials”, or “downstream” for faster customer shipping, has to be done on a different timescale and using a different control method.

The upshot is that we need to think about constraints and flow management in a somewhat different way. Our “production lines” need to have very stable and predictable properties over short timescales (seconds and less). Then we can build orchestration systems that manage flow to the (potentially shifting) system constraint at longer timescales.
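As a rough illustration of that split of timescales, the sketch below imagines a slow orchestration loop that samples each element’s utilisation once per “tick”, finds the current constraint, and nudges the admitted load to keep that constraint inside a safety margin. The element capacities, the 80% margin, and the proportional nudge are assumptions invented for the example, not a description of any real system.

```python
import random

# Illustrative only: a slow (per-"tick") orchestration loop that steers the
# admitted load towards whichever element is currently the constraint,
# rather than trying to react packet-by-packet.
capacity_mbps = {"access": 100.0, "metro": 400.0, "core": 1000.0}
SAFETY_MARGIN = 0.8            # keep the constraint below 80% utilisation
admission_rate_mbps = 50.0     # rate the orchestrator allows into the system

def measured_load(rate_mbps: float) -> dict:
    """Pretend measurement: each element carries the admitted rate plus some
    shifting background traffic, so the constraint can move around."""
    return {name: rate_mbps + random.uniform(0.0, 0.3) * cap
            for name, cap in capacity_mbps.items()}

for tick in range(5):                       # one iteration per long timescale
    load = measured_load(admission_rate_mbps)
    utilisation = {n: load[n] / capacity_mbps[n] for n in capacity_mbps}
    constraint = max(utilisation, key=utilisation.get)

    # Proportional nudge: back off if the constraint is past the margin,
    # otherwise admit a little more next time around.
    error = SAFETY_MARGIN - utilisation[constraint]
    admission_rate_mbps = max(1.0, admission_rate_mbps * (1.0 + 0.5 * error))

    print(f"t={tick}: constraint={constraint} "
          f"utilisation={utilisation[constraint]:.2f} "
          f"admitted={admission_rate_mbps:.1f} Mb/s")
```

The point is not the particular control law, but the division of labour: the elements themselves stay stable and predictable at short timescales, while the orchestration layer manages flow to the shifting constraint at longer ones.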

This is the exact opposite of how today’s networks work. Pseudoscientific nonsense like the “end-to-end principle” leads people to break basic control theory, with bad results. We are totally unable to coordinate the elements of the supply chain to deliver predictable performance, either locally or globally. The idea of managing to a system constraint or a safety margin is shockingly absent from network architecture.

Control

The final aspect of any “lean” system of quality is to limit work in progress (WIP). It is no secret that this is the key to managing quality and variability whilst also delivering low cost. Sadly, the networks we have built are the exact opposite: the equivalent of 1930s car plants pumping out batches of “best effort” vehicles that instantly rust and often leak oil.

Being able to calibrate and coordinate quality is not enough. You also need to be able to control it if you are going to manage the end-to-end variability. This means creating new classes of scheduling mechanism that both limit work in progress and manage resources at those “X-ray” high frequencies. The good news is that this is a solved problem.
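A toy example of the idea, under obviously simplified assumptions: a scheduler that caps work in progress at a fixed limit and sheds excess arrivals at the edge rather than buffering them, so the queueing delay any admitted packet can experience stays bounded. The limit of eight packets, the bursty arrival pattern, and the fixed service rate are all invented for illustration.

```python
from collections import deque

# Toy WIP-limited scheduler: the queue (work in progress) is capped, and
# arrivals beyond the cap are shed at the edge instead of being buffered.
WIP_LIMIT = 8
queue: deque = deque()
shed = served = 0

arrivals_per_tick = [3, 5, 2, 9, 1, 6, 4, 0, 7, 2]   # bursty offered load
SERVICE_PER_TICK = 4                                  # link drains 4 packets/tick

for tick, arrivals in enumerate(arrivals_per_tick):
    for seq in range(arrivals):
        if len(queue) < WIP_LIMIT:
            queue.append((tick, seq))     # admit: WIP stays bounded
        else:
            shed += 1                     # refuse work rather than grow delay
    for _ in range(min(SERVICE_PER_TICK, len(queue))):
        queue.popleft()
        served += 1

print(f"served={served}, shed={shed}, WIP never exceeded {WIP_LIMIT}")
```

Contrast this with the default of ever-deeper buffers, where the “work in progress” grows without bound and every flow pays for it in delay.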

The bad news is that this goes against the deepest and most entrenched beliefs of the broadband industry. The core dogma is that success means forwarding as many packets as fast as possible, i.e. WIP maximisation. The reality is that the defining characteristic of packet data is statistical multiplexing, i.e. probabilistic flow control. This fact has yet to seep into the consciousness of mainstream vendor R&D labs and telco buyers.

For the latest fresh thinking on telecommunications, please sign up for the free Geddes newsletter.