Rethinking the quality management system for telecoms

A significant barrier to progress in telecoms is not network technology, IT systems, or the products on offer. It is pervasive and invisible: the management system in use, and its implied paradigm for quality control.

Based on my discussions with Pete Cladingbowl, here is a somewhat wordy exposition of why the telecoms industry needs to upgrade its management system to incorporate the quality control ideas long used in other industries. Specifically, gaining maximum visibility and control over the user experience requires upgrading to a scientific management methodology.

Quality needs better management

Network operators are selling weakly differentiated data services to fixed and mobile users using a number of technologies (packet data, Ethernet, MPLS, 3GPP, etc). This places downward pressure on prices, and requires growing marketing spend to attract customers from rivals and discourage churn.

As a result of this dynamic, operators are looking for increased visibility and control over the experience being offered, so as to establish some form of differentiation. To achieve this goal, operators are seeking ways of improving their measurement, analytics and network resource management systems. This is evidenced in a wide array of technologies that include DPI, traffic shaping, SD-WAN, 5G and SDN.

These technologies allow operators to create and market a quality of experience benefit over rivals; reduce dissatisfaction and churn due to poor experiences; segment performance according to willingness to pay; and avoid over-delivery that creates cost without corresponding revenue.

In the longer term there is an opportunity to create new services that meet the needs of a growing range of commercial applications, from assured media delivery to connected autonomous devices.

The challenge of the management system

In addressing this goal of user experience control, operators (and their equipment vendors) face a challenging situation. They have built up a legacy of systems, tools, techniques, processes, beliefs, skills and relationships over many years. Managers face complexity and constraints that make it hard to know where to focus their improvement effort, and what is possible (or impossible) to achieve.

Every organisation has an explicit or implicit system to decide what is important (and what is not), and where to focus any improvement initiative. This is the management system in use, and it can be thought of as the "operating system for the user experience". Often the assumptions and nature of the management system are unconscious and unexamined.

This operating system can be thought of as running on a vast hardware platform comprising the network and its supply chain, with the overall user experience as the “display”. A programmatic set of management inputs to the operating system configures the “quantities of quality” being delivered to each user. This is a bit like “quality” being the “colour” and “quantity” being the “brightness” of that giant matrix display.
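
To make the metaphor a little more concrete, here is a minimal sketch (in Python, with invented names and values rather than any real operator's data model) of what such a set of per-user management inputs might look like: each "pixel" pairs a quality class (the "colour") with a quantity (the "brightness").

```python
from dataclasses import dataclass
from enum import Enum

class QualityClass(Enum):
    """Hypothetical quality classes: the 'colour' of a pixel."""
    BACKGROUND = "background"    # loss- and delay-tolerant, e.g. bulk backup
    INTERACTIVE = "interactive"  # moderate delay bounds, e.g. web browsing
    REAL_TIME = "real_time"      # tight delay and loss bounds, e.g. voice

@dataclass
class UserTarget:
    """One 'pixel' of the experience display: a quantity of a quality."""
    user_id: str
    quality: QualityClass   # the 'colour'
    quantity_mbps: float    # the 'brightness'

# A toy set of management inputs for three users (values invented).
targets = [
    UserTarget("alice", QualityClass.REAL_TIME, 2.0),
    UserTarget("bob", QualityClass.INTERACTIVE, 20.0),
    UserTarget("carol", QualityClass.BACKGROUND, 100.0),
]

for t in targets:
    print(f"{t.user_id}: {t.quantity_mbps} Mbps of {t.quality.value} quality")
```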

An operating system for delivering quality

The “operating system” describes the relationship between how the “network controls” are set, and how these result in an experience “picture” for the users. It offers some indication of the direction and magnitude of how the experience will change as a result of moving a “control” knob or lever.

Over time the understanding and quantification of this relationship goes through a set of evolutionary steps. This is a common path for the development of all scientific knowledge, and has the following three phases:

  • Categorisation. The phenomena are identified, given names and defined. For example, we may create a concept of “net promoter score” to define the subjective customer experience resulting from prolonged service usage.
  • Correlation. The couplings between the phenomena are identified and characterised. For example, we may notice that networks with persistent high packet loss result in lower net promoter scores over time.
  • Cause/effect. This final phase is to understand the true causality of the relationships (using a model) and to quantify them. We may understand how packet loss affects service quality and hence application performance, and at what level it drives actual dissatisfaction and churn for users of different applications.

At present the telecoms industry is stuck at the second phase of development. Large quantities of data are being acquired from the network and ingested by analytics systems. These then look for correlations between network data and user behaviours. The assumed causal relationships are then used to optimise the trade-offs between cost and the user experience, and between quantity and quality.

The task facing senior executives is to understand the inherent limits of their present management paradigm, based on correlation, and the opportunities of upgrading to a more sophisticated scientific method of management. This is an ongoing process in many other areas of human endeavour. For instance, the economics profession is stuck somewhere between the correlation and cause/effect stages of understanding.
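
As a minimal sketch of what the correlation stage looks like in practice (the figures are invented and the calculation uses only the Python standard library), an analytics system can report that two measures move together without offering any model of why:

```python
from statistics import correlation  # Python 3.10+

# Invented quarterly figures for five access networks: average packet
# loss (%) and net promoter score measured over the same period.
packet_loss_pct = [0.1, 0.4, 0.9, 1.6, 2.5]
net_promoter    = [42, 35, 28, 15, 4]

# The correlation stage stops here: a strong negative association,
# but no account of the mechanism, of the threshold at which experience
# collapses, or of which applications are affected.
r = correlation(packet_loss_pct, net_promoter)
print(f"Pearson correlation between loss and NPS: {r:.2f}")
```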

Our evolving understanding of quality

This journey to the highest level of management capability involves a transformation of our understanding of the nature of network service quality and its relationship to the experience that results. This too goes through a series of evolutionary steps:

  • Quantity is quality. Initially more bandwidth is seen as being synonymous with a better experience. It’s like a requirement to simply make the user experience “display” as bright as possible regardless of need, which consumes a lot of resources.
  • Quantity with a quality. The differences between bearers and access lines with the same nominal bandwidth become apparent, and we start to characterise different qualities. We begin to understand that there are different quantity and quality needs, and delivering lots of "intense blue" high quality to someone who values "gentle red" low quality is unhelpful.
  • Quantity of quality. At this stage we can accurately quantify the specific level of quality being delivered, and compare bearers and the experience they offer. This is where we can define the service offering to the level that is the norm in other utility industries.

The first phase can be seen as the bandwidth allocation approach typical in most networks. The second level is how traffic shaping works to raise or lower the quality of different flows. In the final phase we are able to relate the quantity of quality delivered to the application performance outcomes that users seek.
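
A toy sketch of the "quantity of quality" stage (all figures are invented): rather than ranking bearers by bandwidth or average delay, we quantify how much of each bearer's delivery meets a stated quality bound for a hypothetical interactive application.

```python
# Per-packet one-way delays (ms) sampled on two bearers with the same
# nominal bandwidth; the values are invented for illustration.
bearer_a = [21, 22, 20, 23, 21, 22, 20, 23, 21, 22]
bearer_b = [5, 6, 5, 90, 6, 5, 80, 6, 5, 6]

def fraction_within(delays_ms, bound_ms):
    """Share of samples meeting a delay bound: a crude 'quantity of quality'."""
    return sum(d <= bound_ms for d in delays_ms) / len(delays_ms)

# Assume the application needs delay of 30 ms or less to perform well.
for name, delays in [("A", bearer_a), ("B", bearer_b)]:
    print(f"Bearer {name}: mean {sum(delays) / len(delays):.1f} ms, "
          f"{fraction_within(delays, 30):.0%} of packets within 30 ms")
```

The two bearers have near-identical average delay, yet only one of them reliably delivers the quality the application needs.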

We are then able to sell the assured outcome benefit, not merely the network service quality input that enables it. Our ideal is to be able to target every "pixel" of the user experience "display matrix" of our management system with the "just right" quantity ("brightness") and quality ("colour"), according to our best understanding of need and willingness to pay.

A systems approach to repeatable processes

These are two parallel journeys: one of how we define and quantify quality, and the other of how we model the relationship between the "inputs" of the management system and its experience "outputs". Taken together they define a systematic approach to a scientific management methodology for service quality and user experience.

As we increase our level of capability maturity, we are able to cope with increasing levels of variation in service quality supply and user experience demand. That is because the system allows us to reliably distinguish between what is normal variation in operation, and what is an exception that requires management attention. As a result, operational experts are not overwhelmed by fault-fixing; they can focus on working on the management system to improve it, rather than in the system to keep it working day to day.
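
One conventional way to make that distinction, sketched here with invented numbers, is a simple control-chart rule borrowed from statistical process control: observations within a band around the historical mean are treated as normal variation, and only excursions beyond it are escalated for management attention.

```python
from statistics import mean, stdev

# Invented daily readings of a quality metric (say, 95th-percentile delay in ms).
history = [31, 29, 33, 30, 32, 28, 31, 30, 29, 32, 30, 31]
recent = [30, 33, 29, 47, 31]  # one day is clearly out of the ordinary

centre = mean(history)
band = 3 * stdev(history)  # classic three-sigma control limits

for day, value in enumerate(recent, start=1):
    if abs(value - centre) > band:
        print(f"day {day}: {value} ms -> exception, investigate")
    else:
        print(f"day {day}: {value} ms -> normal variation, leave the system alone")
```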

Where attention from management is required, the issue is then how to understand the root cause of any problem in a business process, and what to do to resolve it. This activity is dependent on:

  • Having a system to tell us where to focus in the first place, and hence what hypotheses to form, experiments to design, and data to gather.
  • The fidelity of the acquired network data to end-user reality, and the robustness of our models of cause and effect.

When we get these right, we generate actionable insights that result in the right interventions to continually upgrade the management "operating system".

Common difficulties of industrial scaling

We have outlined how operators might redesign the management system to reach high levels of visibility and control over the delivered experience. This process faces two key issues.

The first issue concerns measurement. Quantity is something that by its nature is delivered over a period of time, and we can use peak or average throughput to describe it. In contrast, the user experience is an instantaneous phenomenon, and hence network quality cannot be adequately described using averages gathered over a period. That means we are limited by the resolution (in space and time) of the measurements being taken from the network. At present it is the norm to take single-point averages.
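
A small sketch of the problem (the samples are invented): a link can report a perfectly respectable average over a five-minute window while spending a slice of that window unusable for any interactive application.

```python
# Invented one-second delay samples (ms) over a five-minute window on one link:
# mostly quiet, with a short burst of severe congestion.
samples = [12] * 280 + [250] * 20  # 20 seconds of severe degradation

window_average = sum(samples) / len(samples)
bad_seconds = sum(s > 100 for s in samples)

# The single-point average looks unremarkable...
print(f"5-minute average delay: {window_average:.0f} ms")
# ...while the instantaneous experience during the burst was unusable.
print(f"worst second: {max(samples)} ms, seconds above 100 ms: {bad_seconds}")
```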

The second issue is inherent to engineering at scale, which requires a process of abstraction and containerisation. The required and resulting functional specialisation creates the inherent hazard of all industrialisation and automation: local optimisation inside the "container" sub-system, rather than consideration and improvement of the system as a whole, e.g. an end-to-end flow of value. This exhibits itself, for example, in how packet networks are typically optimised for maximum local throughput rather than system-wide value delivery.

Reaching for the ideal

Bringing a system to improve itself as a whole, rather than to optimise locally, is the holistic management method we expose and facilitate. These issues of defining and measuring quality, and knowing where to focus and what to change, are not unique to the telecoms industry. There is an established body of knowledge from primary, manufacturing and service industries on value flow management.

In particular, there is a mature literature on "lean" thinking and the "theory of constraints", which offers a way to cut through the complexity and focus on what really matters to the goal of delivering more customer value with fewer resources. As the sketch after this list illustrates, this requires us to:

  • Understand the demand for value flow, the “containers” and control systems that it has to flow through;
  • Identify bottlenecks to that flow of value, and characterise them (e.g. policy, process, task); and
  • Manage those bottlenecks according to a defined methodology.

In the past few years network scientists have also made a complementary set of advances in the measurement and control of networks. The first advance is the ability to measure the instantaneous properties of the network, and the second optimises scheduling at a global (as opposed to local) level.
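
Returning to the constraint-management steps listed above, here is a toy illustration of the first two (the stages and figures are invented, not drawn from any real operator): the flow of work through a chain of "containers" can move no faster than its most constrained stage, and that is where attention belongs.

```python
# A hypothetical order-to-activation value flow: daily capacity of each stage.
stages = {
    "order capture": 120,
    "credit check": 90,
    "network provisioning": 45,  # e.g. a policy that limits batch size
    "service activation": 150,
}
daily_demand = 80

# Step 1: identify the bottleneck that limits the end-to-end flow.
bottleneck, capacity = min(stages.items(), key=lambda kv: kv[1])
print(f"flow limited to {min(capacity, daily_demand)} orders/day by '{bottleneck}'")

# Step 2: characterise it before acting; improving any other stage adds no flow.
if capacity < daily_demand:
    print(f"'{bottleneck}' is a genuine constraint: {capacity} < demand of {daily_demand}")
else:
    print("no stage constrains current demand")
```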

When we combine the management method with these technologies, we have an "ideal" system for the scientific management of networks. It allows us to safely extract the maximum theoretical value from the underlying resources. You can think of it as being the OS and graphics driver that finally allows each user and application to receive a highly controlled and predictable level of quality. This aligns network supply to experience demand at all locations and timescales.

For the latest fresh thinking on telecommunications, please sign up for the free Geddes newsletter.