IPX: Telecoms salvation or suffering?

Who would have thought that the wholesale telecoms interconnect market could be so exciting? I spent a day last year at the excellent IPX Summit in London, together with a host of experts in the field. What I learnt about was the continuing struggle the telecoms industry faces to move from a circuit to a packet-based world. In this article I will focus on IPX as the most “real” example of how (not?) to create new and valuable quality-managed information delivery services.

The wholesale business and IPX

It is worth starting with a few words of context on the wholesale telecoms business, and also to explain what IPX is.

When voice networks interconnect, the process follows an “80:20” type of rule. Most of the traffic goes to a few destination networks, so these are typically negotiated on a bilateral basis between the two access network operators. The rest of the traffic goes via hubs, which may pass the calls on to (multiple) other hubs. The calls then eventually reach their terminating network, which receives traffic on good faith that it will later be paid for the cost involved in completing the call.

The i3forum is an industry association with 49 members that is taking this circuit model into the packet world. They are selling “guaranteed quality” for a much wider range of types of traffic than just voice. This is being enabled by a technology shift from TDM circuits to private packet networks using Internet Protocol (which is being used differently to how it is employed on the Internet itself). The i3forum lists 75 countries with IPX services available, with voice over IPX unsurprisingly being the key service in commercial deployment.

This suite of services is enabled by Internet Protocol eXchange (IPX) standards. IPX-based services are designed to provide service level agreements (SLAs) for multiple classes of service, even as they cross several hubs. It should be noted that IPX does not address the issue of assuring service quality in the access networks themselves, only in their interconnection. The i3forum has created a short introductory presentation on IPX which explains its key features, and is relatively free from marketing hype.

All IPX services can be offered in three basic flavours: direct routes, indirect routes, and partners whose routes are being resold. There is no “blending” of direct and indirect routes to accommodate variable cost and quality demands as with TDM (except during outages).

This need to know exactly what is being delivered where is symptomatic of a long-term struggle in this space with call quality and trust. Routes can be substituted in ways that break call quality. Services are easily mischaracterized when being sold. Transcoding to lower quality can be used to squeeze out a few extra cents per call. Arbitrage is everywhere.

Keeping this in mind, the existence of so many bilateral arrangements is of interest. It suggests an absence of economies of scale for hubs that offset the costs of such arrangements. These bilateral agreements are driven not only by the desire to avoid the costs of hub intermediaries, but also by the desire to avoid reputational risks of call quality problems.

Nirvana in this space is a set of wholesale products that:

  • are perfectly characterised,
  • deliver exactly what they say they will, and
  • when you compose them end-to-end (even when bought from multiple suppliers) deliver a predictable outcome and customer experience.

Value proposition of IPX

There appear to be five basic value propositions to IPX services:

  • Extend old revenue streams. An example of this is High Definition voice. The inability to deliver this over TDM/SDH is seen as a key driver for IPX. HD voice naturally has a higher audio quality, which drives higher user satisfaction, longer calls and lower churn.
  • Create new revenue streams. These are services that existing TDM or best-effort IP services cannot replicate, such as VoLTE roaming, managed WebRTC delivery, or high-definition video conferencing (HDVC). LTE data roaming is another example use case, but will take years to take off due to small market size and frequency band issues.
  • Lower costs. A change in technology for international interconnect is part of a wider change to all-IP networks. IPX services are part of a bigger effort to retire technologies like SDH in domestic networks. IPX can also be used for cheap domestic IP interconnect, as well as being a replacement for legacy GRX roaming services.
  • Enable direct connections. IPX makes it easier to say explicitly that you will deliver something directly to the right party, not just to some random intermediary vaguely nearby the final destination. This direct routing improves the latency and experience of the user, and potentially can draw a higher charge. A possible new market is offering routes to the biggest OTT content “sinks” via a “one-stop shop” with one ingress and multiple egress routes.
  • Enable true end-to-end assured service delivery. IPX can be part of a longer chain that includes quality assurance in the access networks. Cecila Wong of Hutchinson described their ambition as being to “provide premium QoS with a strict end-to-end SLA to improve experience at low cost”. Easily said, not so easily done.

How IPX works

IP inherits the timing characteristics of whatever underlying transport it happens to reside upon. It itself offers no native phase or flow isolation capabilities. That means there’s a basic challenge with delivery of quality-managed IP transport services, which is to constrain the distribution of packet loss and delay, and the rate at which these vary. This is a very non-trivial problem compared to best-effort delivery.

Furthermore, although IPX is “IP”, it is not really anything like the Internet; this is not an “Internetwork protocol” as such. IP is just being used as a multiplexing header in a system that is constructed in a completely different manner to enable quality management along the path. In practice, you have session border controllers (SBCs) absolutely everywhere, shoehorning packet data back into a more circuit-like model. The value-add of these SBCs is somewhat opaque. (Readers from Genband et al are welcome to provide me with suitable enlightenment or self-justification!)

There are three basic issues that have to be solved to create this new IPX-enabled value chain and make it work:

  1. How to make the SLAs compose? Imagine we want to get from domestic network A to network F in another country. This goes via wholesaler B to wholesaler C to wholesaler D to wholesaler E. We want SLA(B-E) = SLA(B-C) + SLA(C-D) + SLA(D-E). Will this equation hold and be meaningful?
  2. How to measure the service quality across the above, individually and collectively? You want to know whether you did what you said you would do.
  3. How to detect and punish cheats who don’t keep their SLA promises and break the chain? The wholesale operators have a long experience at managing fraud at all levels of the network business: leaves, branches and trunk.

These are examined in the next few sections: we’re not just interested in how IPX works, but also in how it breaks.

Composing SLAs

An issue that afflicts the whole telecoms industry is that the metrics and thinking used to deliver TDM circuits are no longer helpful in a packet-based statistically multiplexed world.

TDM circuits have fixed delay, and effectively zero loss. You can characterise them with two parameters: their bandwidth capacity, and where they go to. It was easy to add up the delays end-to-end, and your capacity was that of the narrowest part of the path. Composing this system and its SLAs was easy. Voice capacity planning was also easy: Masters courses in telecoms engineering have been teaching Erlangs as their bread-and-butter for decades.
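To make the contrast concrete, here is a minimal sketch of how easily the TDM world composes. The segment figures and function names are my own illustrative assumptions; the blocking calculation uses the standard Erlang B recurrence.

```python
def compose_tdm(segments):
    """Compose TDM segments given as (delay_ms, capacity_kbps) pairs.

    End-to-end delay is the sum of the fixed segment delays, and
    end-to-end capacity is that of the narrowest segment.
    """
    total_delay_ms = sum(delay for delay, _ in segments)
    capacity_kbps = min(capacity for _, capacity in segments)
    return total_delay_ms, capacity_kbps


def erlang_b(offered_erlangs, circuits):
    """Blocking probability for `offered_erlangs` of traffic on `circuits` trunks.

    Uses the recurrence B(0) = 1, B(n) = A*B(n-1) / (n + A*B(n-1)).
    """
    blocking = 1.0
    for n in range(1, circuits + 1):
        blocking = offered_erlangs * blocking / (n + offered_erlangs * blocking)
    return blocking


# Hypothetical three-segment path: (fixed delay in ms, capacity in kbit/s).
path = [(5, 2048), (12, 1024), (3, 2048)]
print(compose_tdm(path))
print(erlang_b(2.0, 2))
```

Two parameters per segment, one sum and one minimum: that is the whole composition rule, which is why the SLAs were easy.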

Contrast this with IP networks, which aren’t like TDM circuits in the slightest. There is variable loss and delay. The “bandwidths” don’t compose. Familiar approaches to SLAs fail in use: even if each party in the chain meets its SLA, the service can still fail, because when you compose the elements there is sporadically too much loss and delay, or too high a rate of variability.
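A toy example shows how this failure happens. Assume (purely for illustration) three hops with independent delays, each delivering 95% of packets in 10 ms and the remaining 5% in 100 ms. Each hop meets a per-hop SLA of “95% within 10 ms”, yet the naive end-to-end SLA of “95% within 30 ms” is missed.

```python
from itertools import product


def convolve(dist_a, dist_b):
    """Distribution of the sum of two independent discrete delays.

    Each distribution maps delay_ms -> probability.
    """
    out = {}
    for (delay_a, p_a), (delay_b, p_b) in product(dist_a.items(), dist_b.items()):
        out[delay_a + delay_b] = out.get(delay_a + delay_b, 0.0) + p_a * p_b
    return out


def fraction_within(dist, bound_ms):
    """Fraction of packets whose delay is at most bound_ms."""
    return sum(p for delay, p in dist.items() if delay <= bound_ms)


# Hypothetical per-hop delay distribution (an assumption, not measured data).
hop = {10: 0.95, 100: 0.05}

# Compose three identical, independent hops.
path = hop
for _ in range(2):
    path = convolve(path, hop)

print(fraction_within(hop, 10))   # per-hop: the SLA is met
print(fraction_within(path, 30))  # end-to-end: roughly 0.857, so the SLA is missed
```

The percentiles do not add up: only the full distributions compose, and real hops are rarely as independent as this sketch assumes.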

This means there is a basic challenge in coming up with a language in which to express SLAs in this new world. We don’t have a generic “application erlang” framework to use yet.

Service measurement

Knowing what we are trying to achieve is one thing. Knowing whether we – and our partners in the chain – have achieved it is quite another.

Given the complexity of these interconnection systems, which of the myriad metrics on offer should we be measuring? How can we create the technical structures to do so? What are the measurements I need to demand of other parties to whom I am handing traffic and paying them money?

None of these issues have complete answers yet in the IPX world.

Detecting cheats

This is a big one. There is a huge incentive to cheat in this international interconnection game. A lot of money is at stake, and the parties don’t share the incentives of domestic carriers who have to live with each other (and their regulator) for a very long time.

I overheard the following statistics, but cannot vouch for their accuracy: The overall fraud loss in telecoms is around $40bn/year, of which $6bn is attributable to the international network interconnect market. Around 6% of market activity has some connection to fraud. Given that there are 9500 telcos that switch minutes, locating the fraud is hard, although there are only around 60 carriers that effectively control the whole market and its structure. Assigning responsibility often falls to the large wholesaler. The wholesaler may be “clean”, but still gets handed the problem as the one presenting the bill to the unhappy call originator.

This fraud takes on many forms: origination fraud (e.g. someone hacking into a PBX and selling onward minutes); termination fraud (e.g. charging for calls that aren’t actually completed); missing trader and carousel VAT fraud; money laundering with sudden discount schemes; quality fraud, where you deliver below the quality level promised; and over-sharp business practices like exploiting accidental pricing errors and reselling those routes like crazy.

So there is a lot of quick and dirty money to be made, and it’s easy for criminals and terrorists to hide in this system. That means “fraud management as a service” is quite a large opportunity, with a complete fraud protection service being the end game. New standards have to be created to share information about fraud between the players, and this potentially calls for new trusted third parties.

Incentives also matter, and not everyone has good ones. There is no IPX market in Africa to speak of, and won’t be one for 10 years or more. The outsourcing of international switches there has complicated responsibility in a way that inhibits market development. Targeting salespeople on revenue, rather than margin, also offers scope for lots of bad behaviour.

On top of this, the skill set looks ever more like that from email and web hosting, and the fraud management techniques they use. Is this a game best suited to traditional telcos, or does it open up a market entry opportunity for new players? After all, IPX creates whole new classes of potential fraud and arbitrage, and it is far from clear there is a good telco map of the criminal terrain.

Issues and concerns with IPX

IPX has been slow in coming to market, and is clearly facing some stiff technical and commercial challenges. I summarise some of the issues that I believe they are grappling with below.

  • Is IPX backwards-looking? It would seem that IPX is tied to many “legacy” telecoms issues such as number portability, caller ID, transcoding and general “smart things in the network that get in the way of progress”. It also faces some difficult backwards-compatibility issues. For instance, when a voice call goes from 4G to 2G, will you keep paying a premium for quality delivery? As such, IPX feels somewhat trapped between a telephony-centric past and a cloud-centric future.
  • Is the business model right? As one speaker said: “IPX is a technology change, not a business model change; the business model will be changed by the edge, e.g. termination fee changes”. But if the business model isn’t right for the future, IPX won’t succeed. It is far from clear that trying to charge at the granularity of individual calls is the right way forward.
  • Is it fit-for-purpose? Offering “guarantees of service assurance” is still a supply-driven view of telecoms, not a demand-driven one. Fitness-for-purpose is missing from the IPX model. This is troubling as the current TDM voice still struggles with this issue, so trying clever new things may not work out as planned. The problem is this: how can I express the technical requirements of my application, and be sure that a particular IPX service will deliver a good quality of experience (QoE)? The industry is again caught between a TDM world (stable timing, fixed delay, no loss) and a true assured cloud delivery model driven by a robust and reliable matching of supply and demand.
  • Does it do what it says on the tin? It’s all very well offering a variety of routing models, but will there be improved transparency (compared to TDM) of what is being provided, and discoverability of what is on offer? How can I know what direct routes you support vs “I have a mate who knows someone…”?
  • What is the impact of the lack of end-to-end control? IPX is spliced into chains that include Internet and non-assured/unmanaged delivery. Issues like WiFi offload are a domestic mobile operator choice, and the wholesaler doesn’t care how the user got to access the domestic network. That limits the benefits of IPX to the weakest link in the chain, over which the wholesaler has no control.
  • Is there an execution credibility gap? Philippe Bellordre of the GSMA launched a bit of a rocket into the room when he said the GSMA was proposing “service-unaware end-to-end quality differentiation with service assurance”. That’s a huge ambition, but given the struggles the GSMA has had with RCS and getting its own members to implement what they themselves asked for, can we take this seriously?
  • Is the scope too narrow? The “end-to-end QoS” model on offer takes a very narrow view of “quality”. In practice the IPX service providers assume that the market is just for the lowest latency and non-time-shiftable traffic, which excludes a wide range of possible uses and users.
  • Is it worth the price asked? For the pleasure of going over IP, there was even a suggestion from suppliers of pricing higher than TDM! Yet there is a demand-side expectation to pay less because something is on IP. Services like HD voice appear to gather the same charge as standard voice, so why bother? How many intermediaries are sharing a voice pie that isn’t growing?
  • Will it cost too much? These IPX networks are full of SBCs doing transcoding, which costs a lot (and needs de-jittering, and thus degrades QoE rather a lot). At the end of the day, services like HD voice are competing with OTT alternatives like FaceTime and Skype, which are all free. IPX service implementation is typically bolted to the cost anchor of IMS, which readily sinks the business case.

IPX: the bottom line

IPX is not hype – but it’s not hyper either. It is one part of an essential step the industry must take, which is to create a variety of cost, quality and quantity options for delivering data around the world.

The only value on offer in telecoms is quality – i.e. the absence of loss and delay in delivery of information goods. Telcos have tried to capture the value of the content itself, and have been told firmly to, ahem, get lost. They have to make IPX – or something very like it – work.

However, IPX exists in a strange limbo land. The margin-squeezed voice people believe they can live off the fat of the growing data business, and the data people believe the exact opposite: that they can grow off the cash flows from voice. IPX, as described to us at this Summit, has neither the capability nor cost structure necessary to be a “native” end-to-end assured service delivery system. It broadly carries over the operational cost structure of TDM, to which are added costs of packetisation, but without exploiting the savings of statistical multiplexing. It simultaneously loses quality by carrying services over the capabilities of IP, and by doing transcoding. What we want is the reverse for these key profitable personal communications services: the costs of IP, with the quality of TDM!

An industry colleague recently quipped: “The answer is ‘money’. What’s the question?”. For IPX, only retailer telcos can capture the quality margin, and they mostly work bilaterally when there’s lots of money at stake. IPX can only address a sliver of the market, whilst carrying all the costs of creating a new interconnect ecosystem and fixing decades of failure in service quality management. That’s a tall order. The question is: can it deliver given that basis from which to start?

Yet these difficulties are not necessarily a sign of doom for IPX. For example, BT is self-peering, and can get you everywhere on its own and partner networks with its own global IPX. All the major carriers are thinking hard about how to solve all of the above problems, because the size of the ultimate prize of being a global cloud communications service provider is so big.

Getting to the root of IPX’s problems

Up until now quality problems in packet networks have been considered either to be someone else’s problem, or deferrable to the future. The bill for telcos failing to do basic research into quality management on packet networks is now falling due.

The IETF’s “hack it and see” culture is as culpable as (if not worse than) that of the “managed quality” telecoms community. The IETF is tinkering with new “active” scheduling algorithms, but the only way to make them work in deployment is to have an army of people doing regression testing. Every time there is an unexpected behaviour, they have to dream up a new “fix”. They have no underlying theory to tell them what the consequences of their actions are, in terms of application or systemic risks.

So managing quality is an endemic problem to networking, and not unique to IPX. Thus there remains a huge technical and business opportunity to complete the transformation of the core telecoms market from a circuit to a packet approach. Achieving this requires a major shift in thinking that addresses the root causes of why IPX is late to market and slow to be adopted, and solves these basic research issues.

We need a fundamental “network science”. This links together cost, user QoE and network performance (i.e. quality) for packet networks, just as Erlangs did (and continue to do) for TDM ones. We need to be able to model the complete performance hazard space for any application, make appropriate trades of resources between users and uses, and price the resources accordingly.

Specifically, we need to understand how IP networks involve performance and cost risks that didn’t exist in the past. These networks don’t have the properties telcos (and their customers) think they have. That means they need new quality assurance mechanisms that are native to packet-based statistical multiplexing.

The IPX community implicitly knows all of this. However, what they appear to have done is to mistakenly equate “control” with “quality”. By putting in lots of SBCs into the path, there is an appearance of having control, for which you can charge. But these extra network elements kill quality and increase cost: the very thing wholesale operators think they can charge for, they are degrading. Thus there is a disconnect: operators are charging for the mechanisms, not for the quality outcomes at a level that is a strong proxy for delivered QoE to the user.

This is all a solved problem

There is some good news I have to share with you.

What the IPX community has yet to solve in 7 years was done by my business partners at Predictable Network Solutions Ltd in 7 months. They built an end-to-end assured video service for the deaf community. Sign language requires very stable frame rates, but the resulting service works over commodity broadband. It demonstrates solutions to all the problems of end-to-end composable SLAs and assured service delivery. We also know how to price it rationally.

So network science provably exists! The issue is getting network science to spread in an industry wedded to network alchemy.

What next for IPX?

What creates value for users are fit-for-purpose application outcomes. For instance, in that video sign language example, if I am engaged in a 2-way video I need cognitive absorption. This is the experience you get when, after a 4-hour teleconference working session with your colleagues, you turn around at the end and are surprised that they are not actually sitting over your shoulder. The same is true for streaming video, gaming, and a host of other inter-personal and interactive applications.

The tiniest glitches make me suddenly realise I’m really talking to a lump of plastic and glass instead of directly to my interlocutor, and cause a large loss of perceived value. Avoiding these glitches requires a particular set of bounds on packet loss and delay.

The job of the telecoms value chain is to take risk away from the end user in adopting such applications by delivering on those bounded requirements. Only by demonstrably transferring QoE risk away from the end user can telcos differentiate themselves on quality, and hence establish a price floor for their services.

To deliver such quality-managed services, telcos will have to both simplify their offer, and enhance the technology used to deliver it. For IPX that means three things have to change:

  • SLA composability: These services have to use Quality Transport Agreements (QTAs), based on the quality attenuation (ΔQ) performance refinement calculus we have developed.
  • Service measurement: This has to be done using a multi-point approach, since that is the only way of directly measuring ΔQ – i.e. the cumulative loss and delay along any path and its statistical distribution.
  • Detecting cheats: A new scheme needs to be established for the sharing of key performance data from probes along the paths, in order for cheats to be visible. Everyone will have to say what they do (QTAs) and then do what they say (i.e. assurance). The good news is that the underlying maths makes this detection both feasible and predictable.
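As a rough illustration of the multi-point idea, the sketch below compares observations of the same packets at two probes and derives the delay samples and loss fraction for the segment between them. The function name, the packet-ID keyed data format and the sample timestamps are all assumptions of mine for illustration; this is not the ΔQ calculus itself, only the kind of raw data it would work from.

```python
def segment_delay_and_loss(upstream, downstream):
    """Compare observations of the same packets at two probes.

    upstream / downstream: dict of packet_id -> timestamp_ms at each probe.
    Returns (sorted delays of delivered packets, fraction of packets lost).
    """
    if not upstream:
        raise ValueError("no packets observed at the upstream probe")
    delays = sorted(
        downstream[pid] - sent_at
        for pid, sent_at in upstream.items()
        if pid in downstream
    )
    lost = sum(1 for pid in upstream if pid not in downstream)
    return delays, lost / len(upstream)


# Hypothetical observations: packet 3 never reaches probe B.
probe_a = {1: 0.0, 2: 20.0, 3: 40.0, 4: 60.0}
probe_b = {1: 12.0, 2: 35.0, 4: 71.0}

delays, loss = segment_delay_and_loss(probe_a, probe_b)
print(delays, loss)
```

With a probe at each handover point, every party’s segment can be measured in the same way, which is what makes a cheat visible: the loss and delay show up between two specific probes rather than “somewhere along the path”.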

Let’s say that again: the only generator of value here is “moving information with bounded quality attenuation”. The difference between success and failure is grasping the true nature of quality, its invariable impairment along a data path (i.e. ΔQ), and hence how to measure ΔQ and manipulate it.

How to master network quality

The first step with immediate value is service assurance: demonstrate whether you either did, or did not, meet the QTA. You then have something you can expect to sell, because the user can get a refund if you fail to deliver. This requires measurement of ΔQ.
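The assurance check itself can be sketched in a few lines. Here I assume, hypothetically, that a QTA is written as a list of (delay bound, minimum fraction of sent packets delivered within that bound) pairs; a real QTA may well be expressed differently, and lost packets simply never count as delivered.

```python
def meets_qta(delays_ms, packets_sent, qta):
    """Return True if the measured delays satisfy every QTA requirement.

    delays_ms: delays of the packets that were actually delivered.
    packets_sent: total packets sent, including any that were lost.
    qta: list of (bound_ms, min_fraction) pairs (a hypothetical format).
    """
    if packets_sent <= 0:
        raise ValueError("packets_sent must be positive")
    for bound_ms, min_fraction in qta:
        delivered_in_time = sum(1 for d in delays_ms if d <= bound_ms)
        if delivered_in_time / packets_sent < min_fraction:
            return False
    return True


# Hypothetical QTA: 90% within 50 ms, and 98% within 100 ms.
qta = [(50, 0.90), (100, 0.98)]

# 100 packets sent, one lost in each case.
good_period = [20.0] * 92 + [80.0] * 7
bad_period = [20.0] * 85 + [80.0] * 14

print(meets_qta(good_period, 100, qta))  # met: nothing to refund
print(meets_qta(bad_period, 100, qta))   # missed: the customer is owed a refund
```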

Once you have that, your “ensurance” that the service works has value, as you do not default on these contracts, and can run networks hotter (and thus more profitably). This requires manipulation of ΔQ.

Finally, there is a need to radically shift the cost:capability equation. That means ripping out most of the SBCs and all the IMS kit; managing aggregated flows en masse; having systems to allocate (aggregated) flows to the right service class; and doing call admission control only for the premium class. This requires mastery of ΔQ.

Engaging with these issues is not optional. There is also a growing risk of a serious structural disruption to this space. Imagine if a Google, Amazon or smart new entrant opened up a global “Data Translocation as a Service” fibre network, using the principles and techniques we propose, and a radical new business model. In this plausible scenario, a lot of people in the current wholesale telecoms business would go broke, real fast, possibly taking their equipment vendors down with them.
