The six challenges of selling QoS

Network operators have long wanted to be able to sell quality-of-service (QoS) network capabilities to either end users or third-party application providers. However, each attempt to do so on Internet Protocol networks has failed to gain market adoption. There are good reasons for this, but they are subtle, and don’t match the common prejudices against creating multiple classes of service.

This article explores the six core problems with the standard approach to QoS:

  1. Too weak a proxy for user outcomes.
  2. Obsessed with real-time flows.
  3. Misallocates resources.
  4. Opens a new denial of service attack route.
  5. “Quality inversion” removes the economic incentive to pay.
  6. Too limited in the scope of the trades available.

There is an alternative way of managing network resources that cures all these issues. A short summary of how to achieve this is at the end.

Too weak a proxy for user outcomes

The most basic problem is that QoS is too weak a proxy for the value the network user seeks.

End users don’t want to buy “priority” for packets; they want applications that work. The demand, therefore, is for enablement of application outcomes, not a technical mechanism in the network. This is a serious problem, since what is currently being supplied doesn’t match what a customer values.

On time-division multiplexed (TDM) circuits there was a direct correlation between what you bought and what you could do with it, since there was a dedicated circuit just for you. In an IP network with QoS, the “priority” part of the network is still a contended resource. The desired application outcomes may not be delivered, even if the user is paying for priority, because of excessive contention from other (priority) users.

Obsessed with real-time flows

QoS suffers from an additional, related issue: delivering outcomes is about both the user experience and cost. Standard approaches to “delivering quality” focus on giving priority to real-time flows, and neglect other flows where cost is the primary constraint. The market is therefore unnaturally small. Any and every application wants the right user experience outcome and cost structure; quality and fitness-for-purpose are for everyone, not just a select few.

Misallocates resources

The nature of QoS is that it robs Peter to pay Paul. When we prioritise one flow over another, we are re-allocating loss and delay between flows of data. However, with priority-based QoS we are shifting loss and delay together, not independently.

You can think of it as being like a crowd of people walking home from the supermarket, some of whom are hungry, while others are parched with thirst. To address this, we take all the food and drink away from some of the other shoppers, and give it to those who are suffering. In doing so, we end up making the hungry receive beverages they don’t value, and forcing food on those who only wanted a drink.

In order for the application to deliver a satisfactory quality of experience we want to bound both loss and delay (“hunger and thirst”) for each flow. But because prioritisation shifts the two together, meeting the tighter of the two bounds means we shovel too much loss and/or delay away from “prioritised” flows onto other flows. Standard QoS approaches can’t say “this flow is fine for delay, but can withstand a bit more loss”.

Thus the prioritised flows end up with either too little loss, or too little delay; and by the law of conservation, other non-prioritised flows therefore have too much. (This tends to be skewed in one direction, giving low delay to flows that don’t need it in order to bound their loss rate.) The end result is a misallocation of resources, and a decrease in the overall value-carrying capacity of the network.
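
To make the coupling concrete, here is a toy, slotted-time simulation of a strict-priority queue. It is purely illustrative (invented loads and buffer sizes, not any operator’s actual scheduler), but it shows the point: giving one class priority improves its loss and its delay together, and both the displaced loss and the displaced delay land on the other class.

```python
# Toy sketch (illustrative assumptions, not a production scheduler): a slotted
# strict-priority server with one FIFO buffer per class. Prioritisation gives
# the "hi" class low loss AND low delay together; the "lo" class absorbs both.
import random
from collections import deque

def simulate(hi_load, lo_load, slots=100_000, buf=20, seed=1):
    random.seed(seed)
    queues = {"hi": deque(), "lo": deque()}
    stats = {k: {"sent": 0, "lost": 0, "delay": 0} for k in queues}
    for t in range(slots):
        # Bernoulli arrivals for each class in this slot; tail drop when full.
        for klass, load in (("hi", hi_load), ("lo", lo_load)):
            if random.random() < load:
                if len(queues[klass]) < buf:
                    queues[klass].append(t)          # remember arrival time
                else:
                    stats[klass]["lost"] += 1
        # Serve one packet per slot; the high-priority queue always goes first.
        for klass in ("hi", "lo"):
            if queues[klass]:
                arrived = queues[klass].popleft()
                stats[klass]["sent"] += 1
                stats[klass]["delay"] += t - arrived
                break
    for klass, s in stats.items():
        offered = s["sent"] + s["lost"]
        print(klass,
              f"loss={s['lost'] / max(offered, 1):.3f}",
              f"mean_delay={s['delay'] / max(s['sent'], 1):.2f} slots")

# ~95% total load: 'hi' gets both low loss and low delay; 'lo' gets the rest.
simulate(hi_load=0.45, lo_load=0.50)
```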

Opens a new denial of service attack route

We can already see that prioritising some flows can have unwanted and unnecessary ill-effects on other, unprioritised flows. If you allow high-priority traffic to grow without bound, it can displace all other applications. As a result, priority acts as an amplifier for denial-of-service (DoS) attacks: a flood of priority traffic has a devastating effect on the non-priority traffic.
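
Re-running the toy model from the previous section with the high-priority class offered at close to line rate shows why this amplification is so attractive to an attacker: nearly every non-priority packet is either dropped or left stranded in the queue.

```python
# A priority-class flood in the toy model above (illustrative numbers only):
# the best-effort class is almost completely starved of both capacity and
# timeliness.
simulate(hi_load=0.98, lo_load=0.50)
```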

Quality inversion removes the economic incentive to pay

As a result of the DoS attack issue, the high priority flows have to be offered a bounded portion of the overall capacity. That means the network will suffer “quality inversion”, where priority QoS delivers worse outcomes, not better ones. How so?

Imagine you are a senior official in a communist state, and you are standing on the street wanting to go across town. There are two types of transport: flashy limousine taxis reserved only for the powerful, and battered old minibuses for the ragged masses. The limousines cruise around town, mostly empty, looking for rich customers needing a ride. The minibuses are completely full about half of the time, however there are ten times as many minibuses as limousines.

The official will usually find that a minibus will get her to her destination faster than a limousine. The chances of an empty minibus turning up are higher than that of an empty limousine passing.

In telecoms networks, packets have more goes at the “game of chance” on the lower priority level, even if the odds of winning each go aren’t as good. Thus unprioritised flows typically experience better application outcomes than prioritised ones. It is only when the network begins to saturate that the network owner makes any money from priority, despite having carried the cost of taking that capacity out of service.
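
A back-of-the-envelope version of the taxi analogy makes the arithmetic plain. The numbers below are invented for illustration, but they follow the story: minibuses pass ten times as often, each one is full about half the time, and limousines cruise around mostly empty.

```python
# Toy arithmetic for the "quality inversion" analogy (illustrative numbers).
minibus_rate   = 10.0   # minibuses passing per hour
limo_rate      = 1.0    # limousines passing per hour
p_minibus_free = 0.5    # a passing minibus has a seat about half the time
p_limo_free    = 0.9    # limousines cruise around mostly empty

# With roughly random (Poisson-like) arrivals, the mean wait for a usable
# vehicle is 1 / (passing rate x probability it has room).
wait_minibus = 1 / (minibus_rate * p_minibus_free)   # 0.2 hours
wait_limo    = 1 / (limo_rate * p_limo_free)         # ~1.1 hours

print(f"mean wait for a minibus:   {wait_minibus:.2f} hours")
print(f"mean wait for a limousine: {wait_limo:.2f} hours")
```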

Because of this quality inversion effect, the pricing power for priority-based QoS is very low. Why pay more, when you get less most of the time? VoLTE experiences this effect: over-the-top voice on LTE networks will usually have better quality than VoLTE itself, until networks become significantly more contended than they are at present.

Too limited in the scope of the trades available

There’s another deep and subtle flaw in the idea of QoS. Because people (mistakenly) think networking is about the packets, and not the flows, they think of QoS as being about managing single queues, not overall systems. That means the maximum timescale over which QoS can re-allocate resources is in the microsecond to millisecond range – the time it takes to flush one queue.

However, we are interested in deferring demand over all timescales. The overnight backup can maybe be delayed by hours in order to let other tasks complete successfully. Hence we should be able to trade demand over periods much longer than the time of a single packet queue, by working at the flow level. You can give the backup just enough capacity so that it does not terminate with an error, but no more. In this way, you can effect demand-shifting over arbitrarily long periods.
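
As a rough illustration (with made-up numbers), a deferrable bulk transfer only needs enough sustained capacity to meet its completion deadline; everything above that is better spent on flows whose value lies in immediacy.

```python
# Illustrative sketch: give a deferrable overnight backup just enough of the
# link to finish before its deadline, and no more (invented numbers).
link_capacity_mbps = 100.0
backup_size_gb     = 90.0    # data still to transfer
deadline_hours     = 8.0     # must complete before the business day starts

backup_bits      = backup_size_gb * 8e9
deadline_seconds = deadline_hours * 3600
just_enough_mbps = backup_bits / deadline_seconds / 1e6   # 25 Mbit/s

print(f"the backup needs only {just_enough_mbps:.1f} Mbit/s to finish on time,")
print(f"leaving {link_capacity_mbps - just_enough_mbps:.1f} Mbit/s for flows "
      "that value immediacy")
```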

A telecoms network is like a stock exchange, seeking users who want to “buy” more immediate delivery, and swapping resources away from others who want to “sell” their option to communicate. It makes money by finding such “buyers” and “sellers”, matching them by timescale and willingness to pay, and marking up the trades.

When the trades are artificially limited in timescale by ten orders of magnitude, it’s no wonder the economic viability of the system is put under duress!

Doing it right

It is possible to create mechanisms that remove these obstacles, and indeed my colleagues at Predictable Network Solutions have come up with not merely a new kind of network scheduling algorithm, but a new kind of network that solves all these issues.

It starts from a fundamentally different set of assumptions:

  • You begin with the outcomes the user seeks, and work backwards to the network capabilities required to deliver them.
  • You build mechanisms that seek to support end outcomes, not intermediate effects like stuffing as many packets into the network as possible for no gain to the user.
  • It’s all about flows, and their distribution of loss and delay; not about packets, bandwidth, or priority.
  • You must control the degrees of freedom directly; every flow gets (up to) as much loss and delay as it can withstand, and no more (a toy sketch of this per-flow viewpoint follows the list).
  • You make trades over all timescales, not just at the scope of a single queue.
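
To make the change of viewpoint concrete, here is a toy sketch, with invented flows and numbers, of what stating per-flow requirements as explicit loss and delay budgets might look like. It is purely illustrative, not the actual mechanism referred to above.

```python
# Toy illustration (invented flows and numbers): each flow states the loss and
# delay it can withstand, rather than being mapped to an opaque priority level.
flows = {
    # name:             (max mean delay in ms, max loss rate, demanded Mbit/s)
    "voice call":       (150.0,  0.010, 0.1),
    "video stream":     (2000.0, 0.001, 5.0),
    "overnight backup": (None,   0.000, 25.0),  # delay-tolerant, must complete
}

link_capacity_mbps = 100.0
total_demand = sum(rate for (_, _, rate) in flows.values())

# A real scheduler would allocate loss and delay against each budget over all
# timescales; here we only restate the budgets and check the demands fit.
print(f"total demand {total_demand:.1f} Mbit/s of {link_capacity_mbps:.0f} Mbit/s")
for name, (delay_ms, loss, rate) in flows.items():
    delay_text = "unbounded" if delay_ms is None else f"{delay_ms:.0f} ms"
    print(f"{name}: delay budget {delay_text}, "
          f"loss budget {loss:.1%}, rate {rate} Mbit/s")
```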

For guidance on how to create a network to overcome these challenges, please get in touch.

To keep up to date with the latest fresh thinking on telecommunications, please sign up for the Geddes newsletter.