Interoperable SLAs: Where’s the glue?

For the telecoms industry to succeed at cloud application access it needs to address its core problem: the lack of interoperable SLAs for performance.

A new “coordination age” has begun

The good folk at STL Partners (known for the Telco 2.0 initiative) have just published a new report that caught my eye: The Coordination Age: A third age of telecoms. I couldn’t agree more! The report echoes a presentation to a regulator that I helped to put together in 2012. That presentation also said there would be a “third age”: we labelled it “capability”, but it is essentially the same message.

The “coordination age” is one where you have to align supply to demand, which is now shifting to cloud applications and unified communications. The service has to be fit-for-purpose, but you cannot endlessly over-deliver capacity, as that bloats costs and prices, and makes you uncompetitive. Lots of other industries have had this “lean quality” revolution, but digital logistics is still in its infancy. We don’t even have standard “units” for supply and demand!

The telecoms industry does have all kinds of initiatives on the go to better coordinate supply to demand. Perhaps the most advanced (or notorious) is 5G slicing. Personally, I’m pretty skeptical that this will go anywhere. How can you create multiple precision “slices”… when you can’t even accurately “dice” the present single service’s customers and applications? We can’t even reliably coordinate a simple fixed-line voice service yet!

Coordination implies availability SLAs

All of these coordination initiatives ultimately hit the same problem: appeals to magic via “best effort” performance will only get you so far. There comes a time when you have to do proper science and engineering. Any utility-like industry needs to find a way to define the service availability in the customer’s terms, and then deliver it to an assured level.

In other words, to coordinate for cloud access we need to redefine “availability” in application-centric terms, not merely raw connectivity. The network service is “up” if the application is working well enough, not if the odd packet dribbles through with a late note and feeble excuse that it was routed the wrong way round the world. This is what it means to engineer reliability: you take responsibility for failure.

The “bread and butter” of reliability engineering is to manage variability via service level agreements (SLAs). What’s missing in telecoms is a generalised and standardised way to:

  • take a (variable) quality of experience requirement from users for their cloud application(s);
  • turn it into a (variable) quality of service specification for the end-to-end network path; then
  • deliver that by decomposing it into a set of SLAs for the sub-path element variability; and finally
  • assure the result by measuring whether the SLA was met, so you can collect any reward, and avoid any penalty.
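
To make the last three steps concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not the method the CARE Initiative uses: the SLA form ("at least this fraction of packets within this delay bound"), the segment names, the weights and every number are hypothetical. The decomposition is deliberately conservative. It shares out both the delay bound and the permitted failure fraction, so by the union bound the end-to-end target holds whenever every segment meets its own SLA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DelaySla:
    """Hypothetical SLA form: at least `fraction` of packets arrive within `bound_ms`."""
    bound_ms: float
    fraction: float


def decompose(end_to_end, weights):
    """Split an end-to-end SLA into per-segment SLAs, in proportion to `weights`.

    Both the delay bound and the permitted failure fraction are shared out.
    A packet can only miss the end-to-end bound if it misses at least one
    segment bound, so meeting every segment SLA implies the end-to-end one.
    """
    total = sum(weights.values())
    slack = 1.0 - end_to_end.fraction
    return {
        name: DelaySla(
            bound_ms=end_to_end.bound_ms * w / total,
            fraction=1.0 - slack * w / total,
        )
        for name, w in weights.items()
    }


def is_met(sla, delays_ms):
    """Assurance step: check measured per-packet delays against one SLA."""
    within = sum(1 for d in delays_ms if d <= sla.bound_ms)
    return within / len(delays_ms) >= sla.fraction


# Assumed QoS specification derived from some application's QoE need.
target = DelaySla(bound_ms=100.0, fraction=0.95)

# Assumed path segments and their relative share of the budget.
segments = decompose(target, {"access": 2, "transit": 1, "datacentre": 1})

for name, sla in segments.items():
    print(f"{name}: {sla.fraction:.4f} of packets within {sla.bound_ms:.0f} ms")
```

The price of this simple scheme is that it over-provisions: each segment is held to a tighter standard than the end-to-end target strictly needs, which is exactly the kind of cost a better "glue" would aim to remove.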

Where’s the SLA ‘glue’?

When you “stick together” all the bits, do you get a working service in the eyes of the customer using their cloud application? This coordination goal is presently blocked by a fundamental omission: there is no “glue” to join up the services! Either we fail to capture the true variability requirement, or we use metrics and measures that don’t “add up” the variability in the way we require for end-to-end predictability.
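
One way to see what “not adding up” can mean: percentile figures, a common way to quote latency, do not compose along a path. The sketch below is an illustration only, with two made-up path segments whose delays are assumed to be independent. It compares the sum of the per-segment 95th percentiles with the 95th percentile of the end-to-end delay, obtained by convolving the two distributions.

```python
from collections import defaultdict
from itertools import product


def convolve(a, b):
    """Distribution of the sum of two independent delays (ms -> probability)."""
    out = defaultdict(float)
    for (delay_a, p_a), (delay_b, p_b) in product(a.items(), b.items()):
        out[delay_a + delay_b] += p_a * p_b
    return dict(out)


def percentile(dist, q):
    """Smallest delay d such that P(delay <= d) >= q."""
    cumulative = 0.0
    for delay in sorted(dist):
        cumulative += dist[delay]
        if cumulative >= q - 1e-12:  # small tolerance for float rounding
            return delay
    return max(dist)


# Hypothetical segments: usually fast, occasionally slow.
access = {10: 0.9, 50: 0.1}    # delay in ms -> probability
transit = {20: 0.9, 100: 0.1}

end_to_end = convolve(access, transit)

naive = percentile(access, 0.95) + percentile(transit, 0.95)
actual = percentile(end_to_end, 0.95)

print("sum of per-segment 95th percentiles:", naive, "ms")
print("95th percentile of end-to-end delay:", actual, "ms")
```

With these invented numbers the two answers differ, so an SLA built by summing per-segment percentiles would not describe what the path delivers. Composing the full distributions does add up here, but only because independence was assumed.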

You might have thought that resolving this SLA interoperability problem would be a major focus of industry activity, but it is not. The integration risk is owned by the network operators, who dump it onto their customers. Telecoms equipment suppliers have perverse incentives to sell telcos more capacity and complexity. Yes, we’re looking at you, 5G and SDN, which just increase variability without managing its integration risk.

Someone has to step up and grasp hold of this issue, and it might as well be me! (Admittedly, with the help of a cast of many supporters and allies.) I am cooking up a Cloud Access Reliability Engineering (CARE) Initiative that is pragmatic and commercially minded. I have been working on it behind the scenes with various relevant stakeholders. It’s too early to make the initiative’s offer public, but I can already share the technical backdrop with you.


A downloadable PDF version is here.

The real cloud opportunity for telcos

Internet Protocol gives us a universal means of interoperating networks at the connectivity level. Interoperable SLAs for cloud application access solve the “next level Internet” interoperability problem. They enable a fundamental upgrade of traffic peering and transit, financially rewarding those who deliver lower variability and latency.

Such interoperable SLAs have an important collateral benefit. They manage coordination not only between networks, but also inside them. Interoperable SLAs (or “performance invariants” for computer scientists) are a prerequisite to “zero touch automation”. If saving money is your thing, then you need to solve this “digital glue” problem first! No amount of machine learning will help if you have the wrong underlying metrics and models.

By now you should have gotten the message: if you want to coordinate a digital supply chain, then you need to take care of SLA interoperability. If you would like to be notified when the CARE Initiative commercial prospectus is available, sign up for the Just Right Networks Ltd mailing list. Spaces are limited and by invitation only, and precedence is given to existing clients.

It has taken over a decade of work, but the original Telco 2.0 vision from 2006 can now be technically delivered. All that was missing was the glue to assemble the parts.

