WebRTC: The web learns to hear and see

Voice on the web is not new. What is changing is that web programmers can now reach interactive voice and video services with just a few lines of code, using native browser functions rather than plug-ins or Java. The APIs that make this possible are documented in the W3C’s WebRTC standard, a component of HTML5.

A baseline for browser communications

The APIs on offer add the ability to access the device’s microphone(s) and camera(s). These devices behave differently from a keyboard, mouse or touchscreen; their input is not directed at a specific tab or window. Rather, they are ambient sensors whose input needs to be programmatically targeted.
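
To make ‘programmatically targeted’ concrete, here is a minimal sketch of how a page might ask for those sensors and later let go of them. It assumes the promise-based getUserMedia call that current browsers expose on navigator.mediaDevices; the helper names (constraintsFor, openCapture, releaseCapture) and the small ‘Like’ interfaces are hypothetical, introduced only so the sketch stands alone.

```typescript
// Minimal structural types so the sketch type-checks without DOM typings.
// Assumption: in a browser, navigator.mediaDevices should satisfy MediaDevicesLike.
interface TrackLike { kind: string; stop(): void; }
interface StreamLike { getTracks(): TrackLike[]; }
interface CaptureConstraints { audio: boolean; video: boolean; }
interface MediaDevicesLike {
  getUserMedia(constraints: CaptureConstraints): Promise<StreamLike>;
}

type CallMode = "voice" | "video";

// Hypothetical helper: decide which sensors this page asks for.
function constraintsFor(mode: CallMode): CaptureConstraints {
  return { audio: true, video: mode === "video" };
}

// Hypothetical helper: request the devices. The browser prompts the user,
// and a refusal (or a missing device) rejects the promise, so we map that
// case to null instead of letting it escape.
function openCapture(devices: MediaDevicesLike, mode: CallMode): Promise<StreamLike | null> {
  return devices.getUserMedia(constraintsFor(mode)).catch(() => null);
}

// Hypothetical helper: stop every track so the microphone and camera are
// released when the call ends or goes on hold.
function releaseCapture(stream: StreamLike): void {
  for (const track of stream.getTracks()) {
    track.stop();
  }
}
```

In a browser one would call openCapture(navigator.mediaDevices, "video"). Releasing the tracks explicitly is the page’s job: nothing ties the sensors to the tab that happens to be in front.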

This adds a complexity browsers have not had to deal with before. It changes the interaction paradigm, in ways that can cause subtle problems between web applications. Eric Cheung of AT&T gave one example of video privacy. Imagine you are talking to your doctor in one tab, who puts you on hold. Whilst waiting, you have a video chat with friends. Your doctor then returns and resumes the call, or instant messages you. “Now, take down your trousers and show me again where it hurts…”. If your friends are recording their copy of the video stream, you’ve just had an unintended viral YouTube video moment.

This is not the only kind of complexity that WebRTC needs to deal with. The other parties to sessions can come and go, and permission to access a device can be revoked mid-session. Enumerating all the possibilities and interactions is reminiscent of the process by which the signalling systems for landline telephone networks were designed.
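
One way to get a grip on that enumeration is to write the session down as explicit state plus a list of events. The sketch below is purely illustrative: the event names and the SessionState shape are hypothetical and do not come from the WebRTC standard. It simply shows peers coming and going, and device permission being revoked mid-session, as things the application has to handle on purpose.

```typescript
// All names here are hypothetical; none are defined by WebRTC itself.
type Device = "microphone" | "camera";

type SessionEvent =
  | { type: "peerJoined"; peer: string }
  | { type: "peerLeft"; peer: string }
  | { type: "deviceGranted"; device: Device }
  | { type: "deviceRevoked"; device: Device };

interface SessionState {
  peers: string[];
  microphone: boolean;
  camera: boolean;
}

const initialState: SessionState = { peers: [], microphone: false, camera: false };

// Return a copy of the state with one device switched on or off.
function withDevice(state: SessionState, device: Device, granted: boolean): SessionState {
  return device === "microphone"
    ? { ...state, microphone: granted }
    : { ...state, camera: granted };
}

// Apply one event and return the next state; the input is never mutated.
function reduce(state: SessionState, event: SessionEvent): SessionState {
  switch (event.type) {
    case "peerJoined":
      // Ignore a duplicate join rather than listing the peer twice.
      return state.peers.indexOf(event.peer) >= 0
        ? state
        : { ...state, peers: state.peers.concat(event.peer) };
    case "peerLeft":
      return { ...state, peers: state.peers.filter((p) => p !== event.peer) };
    case "deviceGranted":
      return withDevice(state, event.device, true);
    case "deviceRevoked":
      return withDevice(state, event.device, false);
  }
}

// Media should only flow when someone is listening and a sensor is allowed.
function canSendMedia(state: SessionState): boolean {
  return state.peers.length > 0 && (state.microphone || state.camera);
}
```

Even this toy version has four event types acting on three pieces of state, which hints at how quickly the real interaction matrix grows.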

Context, not calls

What WebRTC enables is what the Web and Internet do best: unbounded experimentation. No telco is about to launch a motion-controlled space invaders application, but the web is ready and willing to try. Indeed, the web is the backdrop for much of our digital lives, and holds the context in which our communications are embedded. The innovation opportunity is less around making pictures and audio appear per se; instead, WebRTC enables new patterns of behaviour that were not possible on a traditional phone call.

So what WebRTC gets right is that it leaves the details of signalling and application semantics to the programmer. There is no ‘WebNet’ for voice or video. WebRTC is the minimal set of functions necessary to enable adoption of voice and video, and learning of its new affordances. It is quite possible in a few years we’ll see the need for something better, but that will be informed by the tinkering of the 20m web developers in the world.
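
To illustrate what ‘left to the programmer’ can mean in practice, here is a rough sketch of about the simplest signalling an application could invent for itself: an in-memory room that relays opaque messages between named participants. Everything in it (SignalMessage, SignallingRoom, the three message kinds) is a hypothetical design for illustration, not something WebRTC prescribes, and a real service would carry these messages over whatever transport the application prefers.

```typescript
// Hypothetical message envelope; WebRTC does not define one.
interface SignalMessage {
  from: string;
  to: string;
  kind: "offer" | "answer" | "candidate";
  payload: string; // opaque to the relay, e.g. a session description
}

type Deliver = (message: SignalMessage) => void;

// Hypothetical relay: it knows who is present and forwards messages,
// but never inspects or interprets the payload.
class SignallingRoom {
  private participants: { id: string; deliver: Deliver }[] = [];

  join(id: string, deliver: Deliver): void {
    this.leave(id); // a re-join replaces the earlier registration
    this.participants.push({ id, deliver });
  }

  leave(id: string): void {
    this.participants = this.participants.filter((p) => p.id !== id);
  }

  // Returns false when the recipient is not (or no longer) in the room,
  // so the caller decides what a missed message means for the application.
  send(message: SignalMessage): boolean {
    const target = this.participants.filter((p) => p.id === message.to)[0];
    if (!target) {
      return false;
    }
    target.deliver(message);
    return true;
  }
}
```

What happens when send returns false (retry, voicemail, a ‘missed call’ badge) is exactly the kind of application semantics the standard stays out of.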

The initial applications are likely to follow a simple model: one or more users talking, via their desktop browser, to a single online service. The need for interconnection and federation of services is a long way off. The kinds of interoperability concerns that have blighted telco voice and messaging initiatives like RCS simply don’t (yet) apply.

Over time we are likely to see WebRTC being applied to more ‘grown-up’ problems and mobile use. Indeed, we can look forward to a day when you call your bank from inside your smartphone online banking application, and don’t need to re-authenticate yourself; the context is carried through end to end.

Baby steps into a big world

Yet for all its potential, WebRTC remains an immature and incomplete technology. Some of the underpinnings of the session description protocol come from the 1990s, with baggage we’d rather not carry. It is unclear whether the SDP paradigm will meet the needs of complex applications on heterogeneous networks going forward. The nuance of the standards is still being worked out. Browsers are likely to make subtly different choices in implementation. It is unclear how WebRTC should fully interact with the browser container, or the container be extended to accommodate real-time applications. For example, we have call logs on our phone; should our browser history show who we talked to?

The list of possible issues and incidents around inbound call notifications, feature interaction and privacy problems is long enough to ensure this is a technology for experimental use, not core production business services.

There is also an issue around browser support; an unkind observer might call it GoogleRTC. Microsoft is pushing a rival standard called CU-RTC-Web. Apple has neither committed to nor rejected the standard, and holds its cards close to its chest. Indeed, Apple could be seen as betting the future will be less browser-centric; packaged applications are becoming the dominant mobile paradigm, even if the content is HTML.

Naïve about networks

In a way, WebRTC is the flip side of telco-driven standards like IMS and RCS. Telcos have spent years creating technologies and standards to contain the application ‘failure modes’ that users don’t want. WebRTC is enabling new ‘success modes’ of useful applications, but is often impotent at containing failure when the ‘network weather’ is inclement to voice and video. Considerable strides have been made in codecs to conceal packet loss and delay, and the IETF’s Opus codec holds much promise. That said, optimism about the Internet’s potential as a real-time communications platform is at odds with technical challenges like bufferbloat.

Voice is intolerant of packet delay, whilst video is allergic to packet loss. As a single class of service environment, the Internet does not distinguish between flows or application types. So adding traffic that is sensitive to loss and delay means all traffic has to be carried with those same quality parameters. In practice that requires adding capacity, decreasing the demand of other applications, or accepting higher rates of failure. The real-time lunch may be tasty, but it is not free.

The Internet is not ready or able to carry society’s real-time communications needs, and may never be. However, that does not diminish the potential value of WebRTC. Just as you always need a plan B for an important Skype call, you have to accept that this technology comes with inherent limitations and is a trade-off between cost, complexity and convenience. Maybe – a little like how Skype works today – we need the equivalent of the ‘signal strength’ indicator on your mobile to indicate whether a web call is likely to be successful?

Opportunity for telcos and suppliers

Despite it being 2012, it still feels a little as if telcos are in denial that the Internet and web ever happened. Enormous sums have been spent on recreating telephony on IP, rather than engaging with the actual web. It is an indictment of the leaders of the global telecoms industry that there isn’t a simple ubiquitous API that allows web developers to consume the industry’s top-selling product: minutes of voice.

However, all is not lost. Real-time communications remains a demanding network application, both technically and in terms of user tolerance of failure. WebRTC can be seen as an opportunity for telcos both to support a better real-time web and to adopt and learn from the new patterns of communication it enables. For every telco going ‘over the top’ and off-network to retain voice customers, there is an equal and opposite opportunity to make the ‘over the top’ web voice applications that your customers demand work better.

If Skype and Viber are telephony’s real-time communications competition, WebRTC is your enemy’s enemy – since it lessens the need for these rival voice services. It should thus be considered a friend. This is similar to how hypertext stimulates demand for telco broadband services, even if fewer faxes are sent. Telcos are in the business of prompt and efficient delivery of digital goods and services, and the Internet is by its nature reliably unreliable for real-time use. WebRTC creates a new distribution opportunity for telco real-time delivery capabilities at the quality end of the market.

Indeed, WebRTC could be the gift that keeps on giving to the session border controller market, as ever more real-time traffic gets siphoned off the public Internet onto dedicated voice and video networks. The dusk of telephony need not result in the demise of the phone company; it just requires lighting up a different kind of network to move forward.