The 5 steps to scientific network management

Why is broadband so unreliable compared to every other utility service we consume? Because it is not scientifically managed! Here is the way forward…

At last Friday’s sell-out and successful Scientific Network Management for Cloud Computing workshop, I asked the audience four simple questions:

  • In the past three months, were you unable to run a home appliance because the power supply to your property wasn’t working well enough? No hands.
  • Were you unable to heat the house, cook food, or enjoy hot water due to the gas supply not being good enough? No hands.
  • How about not having water of sufficient quantity or quality to drink or shower? No hands.
  • What about being unable to successfully run a broadband application that you had wanted to use and had used before? Every hand goes up!

It is no secret that broadband lags other utility services in terms of reliability and dependability. Indeed, it is arguable that it is not a utility at all (yet), since it lacks the essential fitness-for-purpose and clearly defined properties that a true utility requires.

The ability to deliver precision-engineered performance on packet networks is a solved problem from a technical perspective. It just happens to be such a minority sport that you can’t even muster two opposing teams.

That will soon change. The issue now is going from industry ignorance to widespread awareness of the path to being a predictable utility. It is a question of “when”, not “if”.

The journey involves five steps.

Step 1: Adopt a scientific mindset

The first step is to actually aspire to reach the goal of scientific management! If you’ve no intention to reach the destination, it doesn’t really matter which way you initially head off, or how fast you start running.

The scientific mindset is one that seeks to find reliable causal relationships, based on robust models, built using testable hypotheses. It’s more than just doing machine learning voodoo on existing datasets in a vague hope of finding correlations that persist.

This scientific mindset has already been applied in general management theory. Whether it is “just in time”, “six sigma”, “lean kanban”, or “theory of constraints”, there is a wealth of knowledge about managing the flow of value through complex industrial systems. Maybe we should consider applying some of this understanding to telco and cloud!

Step 2: Adopt scientific metrics for experience

Management is a process of focusing on helpful change. To do this you have to understand the system you are managing, and have metrics that appropriately reflect both the present state and intended future state. Those metrics have to tell you if you are getting closer to your goal, or not.

Managers today in telcos are overwhelmed with network and experience KPIs. The trouble is that they now have a “metamanagement” problem: which of these myriad KPIs actually matter, and do they even have the right metrics at all to manage the business, taking the customer’s point of view as figural?

The job of scientific management is to simplify the system to as few metrics as possible, subordinating all decision-making to what actually matters to the ultimate goal of the system. Most KPIs can safely be ignored most of the time! They either represent normal variation, or fail to have a sufficiently important impact.
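
To make that concrete, here is a toy sketch in Python (hypothetical session data and field names of my own invention) of what subordinating to the goal can look like: many per-session KPIs collapse into one customer-centric number, the fraction of sessions that met the user’s experience requirement.

```python
# A toy sketch (hypothetical session data and field names) of collapsing many
# per-session KPIs into a single goal-level metric: the fraction of sessions
# that met the customer's experience requirement.

sessions = [
    {"app": "voice", "experience_ok": True},
    {"app": "video", "experience_ok": False},
    {"app": "web",   "experience_ok": True},
]

def fraction_meeting_requirement(sessions):
    """One customer-centric number to which other KPIs are subordinated."""
    return sum(s["experience_ok"] for s in sessions) / len(sessions)

print(f"{fraction_meeting_requirement(sessions):.0%} of sessions met the requirement")
```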

Step 3: Adopt scientific measurement of experience

Producing cakes or teddy bears involves manufacturing discrete objects that can be stored. Telecoms and cloud both involve the continuous manufacture of ephemeral experiences. That means our quality of experience measurements need to reflect the nature of the service we offer.

Regrettably, the great majority of the measurements we use today — typically interval-based throughput, jitter and loss — fail to reflect the user reality. The gap is substantial, and huge sums are spent on probing and capturing data that confuses, rather than clarifies, the picture of what is actually being delivered.

What we need are high-fidelity measurements that allow a causal breakdown of the user experience, in both space and time. This demands that we apply proven ideas from physical supply chains to digital ones. The key change is to measure the instantaneous properties of the service, not interval averages.
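
As a simple illustration in Python (invented per-packet timings, with crude percentiles standing in for a full ∆Q-style distribution), compare what an interval average reports with what the per-packet delay distribution reveals:

```python
# A minimal sketch (invented per-packet timings; crude percentiles standing in
# for a full distribution) of why instantaneous properties matter more than
# interval summaries.

import statistics

# One-way delays of individual packets over a short window, in milliseconds.
packet_delays_ms = [12, 11, 13, 12, 95, 11, 12, 88, 13, 12]

# The interval average hides the structure that users actually feel...
print("mean delay:", statistics.mean(packet_delays_ms), "ms")

# ...whereas the distribution exposes the tail that breaks real-time apps.
ranked = sorted(packet_delays_ms)
for q in (0.50, 0.90, 0.99):
    idx = min(int(q * len(ranked)), len(ranked) - 1)
    print(f"P{int(q * 100)} delay: {ranked[idx]} ms")
```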

Step 4: Adopt scientific models of success

At one client, I ran a training workshop on network performance, and at the end they said the big lesson was “thinking in terms of supply and demand”. I quietly thought to myself: “OMG! Our industry is really, really primitive!”

At another telco client, one allegedly dedicated to quality, I suggested that it might be useful to have units of supply and demand that “add up”, so you can compare them. I got accused of “confusing them with algebra”.

At yet another client, one of the world’s biggest ISPs, we asked them what the intended user experience was. They said they didn’t know, but marketing would be delighted if the speed tests ran faster than last year.

If you want to adopt scientific management, then you need to be able to quantify the kind of user experience demand you have, as well as the broadband supply you offer, and tell whether you have delivered on your promise or not. Is this basic modelling task really so much to ask?
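
Here is a minimal sketch in Python (numbers entirely made up) of what such a model can look like: demand is stated as delay bounds at chosen percentiles, supply is measured in the same units, and “did we deliver on our promise?” becomes a direct comparison rather than a debate.

```python
# An illustrative sketch (numbers made up) of supply and demand in units that
# "add up": both sides are delay bounds at chosen percentiles, so whether the
# promise was kept becomes a direct comparison.

# Demand: the application needs delay no worse than these bounds (ms).
demand = {0.50: 30, 0.90: 60, 0.99: 100}

# Supply: what the network actually delivered at the same percentiles (ms),
# e.g. derived from per-packet measurements like those in the previous sketch.
supply = {0.50: 18, 0.90: 55, 0.99: 140}

def promise_kept(demand, supply):
    """True only if every demanded percentile bound is met by the supply."""
    return all(supply[q] <= bound for q, bound in demand.items())

if promise_kept(demand, supply):
    print("Promise kept.")
else:
    broken = [q for q, bound in demand.items() if supply[q] > bound]
    print("Promise broken at percentiles:", broken)
```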

Step 5: Adopt scientific mechanisms for control

According to the Standard Handbook of Machine Design (3rd Edition): “Finding a permissible stress level which will provide satisfactory service is not difficult. Competition forces a search for the highest stress level which still permits satisfactory service. This is more difficult.” (My emphasis — HT @cladingbowl.)

Any fool can make something work with unlimited resources. Success in telecoms and cloud comes from (statistically) sharing resources, which introduces variability to the user experience. This means having network resource scheduling mechanisms that give us sufficient flow and sharing control, as we must deliver many concurrent applications to many diverse users.

If there is not enough sharing, then competitors will underprice you by running their systems “hotter”. Too much sharing, and customers will flee you for other offers, as the experience falls below what they deem acceptable. The good news is that we can have both high sharing and high flow control if we reconsider some false assumptions.
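
As a toy illustration only (plain Python, and emphatically not the ∆Q mechanisms referred to later in this post), here is the shape of that trade-off: a scheduler that gives a latency-sensitive class first call on a shared link, but caps its burst so the bulk class still makes progress.

```python
# A toy illustration only (not the actual mechanisms referred to below): a
# scheduler that gives a latency-sensitive class first call on a shared link,
# but caps its burst so the bulk class still makes progress.

from collections import deque

urgent = deque(["voip-1", "voip-2", "voip-3"])        # latency-sensitive flow
bulk   = deque(["backup-1", "backup-2", "backup-3"])  # throughput-hungry flow

URGENT_BURST = 2  # serve at most this many urgent packets before a bulk packet

def schedule(urgent, bulk):
    """Yield packets in the order they would be transmitted on the shared link."""
    served_urgent = 0
    while urgent or bulk:
        if urgent and (served_urgent < URGENT_BURST or not bulk):
            served_urgent += 1
            yield urgent.popleft()
        else:
            served_urgent = 0
            yield bulk.popleft()

print(list(schedule(urgent, bulk)))
# e.g. ['voip-1', 'voip-2', 'backup-1', 'voip-3', 'backup-2', 'backup-3']
```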

The answer is known, you just have to apply it

Every one of these steps has an existing answer that is ready and waiting to be applied:

  • The quality attenuation framework combined with the theory of constraints gives us a system of management focus.
  • ∆Q metrics have the properties we desire (and nothing else yet does or likely ever will). Don’t like it? Find another universe!
  • ∆Q measurements have been widely used at multiple telcos. (Some examples from Vodafone are here.)
  • ∆Q models have been used to accurately predict the feasibility of military systems, isolate faults in complex supply chains, and optimise the performance of distributed applications like blockchain.
  • ∆Q mechanisms can take control over the user experience, offering precision-engineered QoE control whilst simultaneously allowing for maximum resource usage.

2018 looks like being a breakout year for these breakthrough techniques and technologies. If you wish to join the growing scientific network management movement, hit ‘reply’ to get in touch. We can set up a time to discuss how to collaborate to mutual benefit.

 

For the latest fresh thinking on telecommunications, please sign up for the free Geddes newsletter.