How to tame network complexity

In a previous article, I discussed how telecoms is facing a growing complexity crisis. To resolve this crisis, a new approach is required. Here I explore how that complexity can be tamed.

The crucial role of ‘invariants’ in service delivery

‘Invariants’ are things that are meant to be ‘true’ and stay true over time. Some invariants are imposed upon us by the universe. “E=mc²” is a ‘truth’ about the universe, barring unexpected new scientific discoveries. Others are imposed by people. “No postpaid customer should see VoIP fail until all prepaid customers have experienced failure” would be an example (albeit a possibly foolish one).

As engineers, we aim to establish these abstract ‘truths’ about the system. Typical ‘truths’ relate to the users’ quality of experience (QoE), the network’s performance and the system’s cost. Delivering customer service is then the act of maintaining the right invariants to support the ‘roles’ that different users wish to perform (gaming or telework, for example).

So what does it mean to ‘maintain an invariant’? This requires us to understand what is normal variation, quantify it, and eliminate unnecessary remedial action for such expected behaviours. In the case of broadband, we need to understand what’s ‘normal’ for a large-scale stochastic system.
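
As a rough illustration of quantifying ‘normal’ variation, the sketch below derives an upper bound from baseline delay samples and raises a flag only when a new window exceeds that bound more often than normal variation would explain. The function names, the choice of the 99th percentile and the 1% tolerance are all assumptions made for the example, not values taken from any real network.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile of a non-empty list (p between 0 and 100)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

def normal_bound(baseline_ms, p=99.0):
    """Hypothetical definition of 'normal': the p-th percentile of baseline delay."""
    return percentile(baseline_ms, p)

def needs_action(window_ms, bound_ms, tolerated_fraction=0.01):
    """True only if more samples exceed the bound than the tolerated fraction.

    Occasional excursions are expected behaviour and trigger no remedial action.
    """
    exceeding = sum(1 for x in window_ms if x > bound_ms)
    return exceeding / len(window_ms) > tolerated_fraction

# Illustrative data only: a synthetic baseline and two observation windows.
baseline = list(range(1, 101))          # delays of 1..100 ms
bound = normal_bound(baseline)          # upper edge of 'normal'
quiet_window = [50] * 99 + [150]        # one excursion in a hundred
busy_window = [50] * 95 + [150] * 5     # five excursions in a hundred
```

With this synthetic data, the quiet window stays within expected behaviour while the busy window does not, so only the latter would justify intervention.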

Invariants enable us to collapse complexity

Creating invariants collapses the complexity of managing any system. When the invariants hold, all other details become irrelevant. Establishing them is both a necessary and a sufficient condition for sub-linear cost scaling.

Engineers thrive on well-founded metrics and invariants. As complexity collapses in lower-order and fast-changing systems, network management becomes a higher-order problem closer to the human and business timescales. Executive attention can be refocused from short-term fire-fighting towards creating long-term strategic differentiation.

Any management system for networks should support the creation and maintenance of appropriate invariants. Any monitoring system should report where they hold and where they are broken. It should also tell you the impact of any failure to maintain an invariant, and how much slack you have (if any).
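
A minimal sketch of such a report might look like the following. It assumes, purely for illustration, that an invariant is an upper limit on a measured delay figure and that impact can be approximated by the number of users who depend on it; the class names, fields and numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Invariant:
    name: str
    limit_ms: float       # assumed form: an upper bound on a measured delay
    users_in_scope: int   # assumed impact measure: users relying on the invariant

@dataclass(frozen=True)
class Report:
    name: str
    holds: bool
    slack_ms: float       # positive = headroom left, negative = size of the breach
    users_impacted: int   # zero while the invariant holds

def evaluate(invariant, measured_ms):
    """Report whether the invariant holds, how much slack remains, and the impact."""
    slack = invariant.limit_ms - measured_ms
    holds = slack >= 0
    impacted = 0 if holds else invariant.users_in_scope
    return Report(invariant.name, holds, slack, impacted)

# Illustrative values only.
voip = Invariant(name="voip-delay", limit_ms=150.0, users_in_scope=1200)
healthy = evaluate(voip, measured_ms=120.0)
broken = evaluate(voip, measured_ms=180.0)
```

Reporting slack alongside the pass or fail result is a design choice: it shows how close a healthy invariant is to breaking, not just whether it already has.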

When invariants don’t hold, you need to know where and why, with a quantified benefit of fixing the issue. Today this process all too often involves twiddling network settings and spending money on capacity in the hope of stumbling on the right answer. What we want instead is scientific and rapid root cause analysis that allows you to identify and treat the problem, ideally with as much automation as possible.

Choosing appropriate invariants

When we have the right invariants, we can align business goals with network reality. This ensures the business is promised only feasible outcomes in terms of both QoE and cost.

We then use those invariants to match supply and demand. At shorter timescales, this means we relate QoE aspirations to service performance, which is then related to controlled delivery via a diversity of network mechanisms.

This assurance process reduces or eliminates performance failures, with their direct and indirect costs to users. That generates application dependability, service trustworthiness and user confidence. Users are then willing to pay a premium, and the value of the operator brand rises.

∆Q is the right tool for performance invariants

To reach the ideal state of operational simplicity, we need to measure and manage the right invariant set of properties. This is where ∆Q comes in. It is the ‘ideal metric’, in that it alone relates the user experience to the network performance in the manner we desire. Any other metric introduces new complexity, because it carries irrelevant detail (junk) and misrepresents what users experience (infidelity).

A ∆Q performance refinement calculus allows us to turn the customer experience we intend to deliver into the right operational network invariants. This calculus links all the layers, both horizontally and vertically. As a result, we can do complete ex ante performance engineering for any network service.
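
To give a flavour of what checking an end-to-end target against per-segment behaviour could look like before anything is built, here is a deliberately simplified sketch. It rests on assumptions the article does not spell out: each segment is modelled as a discrete delay distribution whose probabilities may sum to less than one (the shortfall standing for loss), and independent segments are combined by convolution. The function names and all figures are hypothetical, and this is not the calculus itself.

```python
from collections import defaultdict

def compose(a, b):
    """Combine two independent segments by convolving their delay distributions.

    Each distribution maps delay in ms to probability; missing mass models loss.
    """
    out = defaultdict(float)
    for delay_a, prob_a in a.items():
        for delay_b, prob_b in b.items():
            out[delay_a + delay_b] += prob_a * prob_b
    return dict(out)

def delivered_within(dist, budget_ms):
    """Probability that a packet arrives, and does so within the delay budget."""
    return sum(p for delay, p in dist.items() if delay <= budget_ms)

def meets_target(segments, budget_ms, target):
    """Check an end-to-end invariant against the composed segments."""
    total = {0: 1.0}  # identity element: no delay, no loss
    for segment in segments:
        total = compose(total, segment)
    return delivered_within(total, budget_ms) >= target

# Illustrative segments only.
access = {10: 0.90, 30: 0.09}   # 1% of packets lost
core = {5: 0.99, 20: 0.01}      # no loss
end_to_end = compose(access, core)
```

Working in the other direction, an end-to-end budget could be split into per-segment limits that then serve as the operational invariants for each part of the network.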

This is a major advance, as it allows us to understand the performance and cost of a system, with high certainty, before we build it. Once it is up and running, we can automate most measurement and management of invariants. This results in much lower operational costs than for today’s networks.
