P4, PISA and Tofino: Important, powerful… and dangerous

Making networks programmable in a protocol-independent way is a good idea, but it’s not all upside: there are new risks and hidden complexities.

Last autumn I presented at the SDN NFV World Congress. It’s the telecoms and networking event where the computing engine hits the transmission road, so it naturally holds considerable attraction for people like me.

The keynote speaker on stage before me was Nick McKeown. He is a “big name” in the software-defined networking (SDN) space, having practically invented it. The essence of his presentation was a neat way to make the network “programmable”. In his own words, it is all about putting the “control plane” (where the system’s intentionality sits) in charge of the “forwarding plane” (which does — in his paradigm — the figural operational activity of sending packets).

The P4 programming language is a kind of “Visual Basic” for the “network silicon device driver”. (Your programming analogy may vary; I am being polite by avoiding comparisons to JavaScript.) When coupled with PISA (the Protocol Independent Switch Architecture), P4 democratises access to the underlying hardware capabilities. You can think of it as doing for packets what “If This Then That” does for web services. This is implemented in the Tofino silicon product from his equipment start-up, Barefoot Networks.
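
To make that concrete, here is a minimal sketch of what a P4_16 program looks like when written against the reference “v1model” pipeline: parse an Ethernet header, then apply one match-action table (“if this destination address, then that output port”). The table entries themselves are installed later by the control plane. Everything here, names included, is my own illustrative example, not code from Nick’s talk or the Tofino product.

    #include <core.p4>
    #include <v1model.p4>

    header ethernet_t {
        bit<48> dstAddr;
        bit<48> srcAddr;
        bit<16> etherType;
    }

    struct headers_t  { ethernet_t ethernet; }
    struct metadata_t { }

    parser MyParser(packet_in pkt, out headers_t hdr,
                    inout metadata_t meta, inout standard_metadata_t std_meta) {
        state start {
            pkt.extract(hdr.ethernet);   // pull the Ethernet header off the wire
            transition accept;
        }
    }

    control MyVerifyChecksum(inout headers_t hdr, inout metadata_t meta) { apply { } }

    control MyIngress(inout headers_t hdr, inout metadata_t meta,
                      inout standard_metadata_t std_meta) {
        action forward(bit<9> port) { std_meta.egress_spec = port; }  // "then that": pick a port
        action drop() { mark_to_drop(std_meta); }

        table l2_forward {
            key = { hdr.ethernet.dstAddr : exact; }   // "if this": the destination MAC
            actions = { forward; drop; }
            default_action = drop();
        }
        apply { l2_forward.apply(); }
    }

    control MyEgress(inout headers_t hdr, inout metadata_t meta,
                     inout standard_metadata_t std_meta) { apply { } }

    control MyComputeChecksum(inout headers_t hdr, inout metadata_t meta) { apply { } }

    control MyDeparser(packet_out pkt, in headers_t hdr) {
        apply { pkt.emit(hdr.ethernet); }   // write the (possibly rewritten) header back out
    }

    V1Switch(MyParser(), MyVerifyChecksum(), MyIngress(),
             MyEgress(), MyComputeChecksum(), MyDeparser()) main;

Note how much of this is describing packet formats and pipeline stages yourself: that is the protocol independence, and it is also exactly where the curly-brace complexity creeps in.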

Heuristic: languages with curly braces rather than parentheses mean complexity!

P4 means we are all now network equipment hackers. You can go from ignoramus to innovator in a day, having learnt the language and committed new code to the software repository.

I see problems ahead with this approach. So, what could possibly go wrong?

This methodology allows coders to add arbitrary levels of complexity to networks, resulting in performance and security that cannot be reasoned about in advance. As Nick noted, the complexity of the silicon has historically resulted in many bugs. Now we have a software system to generate new and more sophisticated bugs, as well as to automate writing workarounds for our inability to engineer service quality in the first place.

For example, it gives you the tools to append a whopping audit trail to each packet, recording where it went and when, making its headers much bigger. If you thought IPv6 was a bit of a misbegotten mission, then this exercise in datagram autobiography takes it to the next level.
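
To see how easily the bytes pile up, here is a hedged fragment (not a complete program) of what such an audit trail might look like in P4_16, in the spirit of in-band telemetry. The record layout and names are invented for illustration, and the parser and deparser changes needed to carry the stack are omitted. Each hop pushes a 10-byte record, so a five-hop path adds 50 bytes to every packet.

    // Hypothetical per-hop trace record (names and layout invented for this sketch).
    header hop_record_t {
        bit<32> switch_id;   // which device handled the packet
        bit<48> timestamp;   // when it passed through (v1model ingress timestamp)
    }

    // The headers struct from the earlier sketch grows a bounded stack of records.
    struct headers_t {
        ethernet_t      ethernet;
        hop_record_t[8] trace;       // at most 8 hops recorded per packet
    }

    // Inside the ingress control: every switch appends its own record,
    // making the packet 10 bytes longer at each hop.
    action record_hop(bit<32> switch_id) {
        hdr.trace.push_front(1);     // make room at the front of the stack
        hdr.trace[0].setValid();
        hdr.trace[0].switch_id = switch_id;
        hdr.trace[0].timestamp = std_meta.ingress_global_timestamp;
    }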

If you have a “by design” paradigm with “true” engineering, you generally don’t even need to ask these questions.

We’re finding new ways to make our payloads heavier and heavier, and hence slower and slower to arrive. Making packet size distributions vary unexpectedly in this way creates wonderful new failure modes, and dastardly new performance hazards, all to be uncovered only in operation, when the network is stressed.

This paradigm is one where “we fixed it last time, so let’s fix it again!”, rather than “right first time”. It’s the antithesis of engineering, as there are no enforceable performance or security invariants in the system. What is supposed to be true, and stay true under all circumstances? That’s the basis of engineering, ensuring that a specification is met.

Doubling down on the present “network tinkerer” model, by enlarging and automating it, is like taking the toy light sabre away from your five-year-old kid, and giving her a real one. It’s not a very responsible thing to do if you care about family safety. Don’t blame the tool if you only have half a cat left in the bloodied litter tray by bedtime, with suspicious scorch marks up the nursery wallpaper.

I used to program in 6502 and ARM assembler in the 80s and 90s. We’re only just reaching that point in network evolution in the 2010s and 2020s.

The evidence for this paradigm problem is in Nick’s presentation. His hypothesis, and the one the whole industry is unconsciously immersed in, is that network value comes from “quantity with a quality”. Value is generated by forwarding packets first and foremost.

Control over the forwarding plane means more value, to the extent it keeps up the simple scalar bandwidth marketing claims that sell more equipment. We locally optimise the purchasing choices, ignoring the systemic result of integrating it all, which is what the customer actually experiences.

His headline was a throughput figure (“6Tb per second!” — the “fastest” switch in history) because value is “obviously” in speed and bandwidth. The strapline said there was no compromise in cost or energy use to generate more complexity. And the footnote? A rather telling small box at the bottom of one page on “QoS”.

Taming the legacy for sure… but is it really the game changer we need for SDN?

That’s the essence of it: a hierarchy of priority that I argue is backwards, since it makes end-user value subservient to an arbitrary internal network metric for traffic aggregation. The “control” that this model delivers is a mirage. By its very conception, P4 cannot deliver engineered isolation of security and performance between applications with varying needs, since those needs are not first-class design objects.

To understand how things have gone awry, you need to zoom right out, as far as you can go. All scientific endeavours go through three stages: classification (e.g. astrology), correlation (e.g. astronomy) and causation (e.g. cosmology). Packet data went off the rails at the first stage, so it is stuck at the second, and thus can never reach the third stage of being a true science with robust engineering. We must refactor and go back to the beginning.

What we do in telecoms is software-defined distributed computing (value lies in timeliness), not software-defined networking (value lies in bandwidth). They are not the same thing!

Distributed computing is defined by the application outcome from the end users’ perspective, whereas networking is about the communications mechanisms in the middle. The sole value is in the end-to-end latency being bounded in some suitable way, a “quantity of quality”. In this alternative perspective, it is resource aggregation and allocation that are figural, not the forwarding.

The good news is that not all is lost. Just as you can write a C compiler in assembler, or a nice functional language in C, these new tools are powerful and flexible enough to bootstrap a better model in a different paradigm. For example, since the whole point of P4 is protocol independence, it allows RINA (the Recursive InterNetwork Architecture) to be implemented both inside, and on top of, existing TCP/IP, MPLS and Ethernet networks.
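
As a hint of how that bootstrapping might start, here is another hedged fragment: because P4 does not privilege IP, a parser can be taught to recognise an entirely custom encapsulation carried directly over Ethernet. The EtherType (one of the IEEE local experimental codes) and the field layout are invented for this sketch; a real RINA-over-Ethernet design would look different.

    // Hypothetical custom encapsulation carried over Ethernet; P4 does not care
    // that this is not IP. Field names and layout are invented for illustration.
    // (Assumes headers_t also carries a custom_t member named "custom".)
    const bit<16> ETHERTYPE_CUSTOM = 0x88B5;   // IEEE local experimental EtherType

    header custom_t {
        bit<8>  scope_id;    // which recursive scope this packet belongs to
        bit<16> flow_id;
        bit<8>  qos_class;   // the kind of per-application need the header could carry
    }

    parser MyParser(packet_in pkt, out headers_t hdr,
                    inout metadata_t meta, inout standard_metadata_t std_meta) {
        state start {
            pkt.extract(hdr.ethernet);
            transition select(hdr.ethernet.etherType) {
                ETHERTYPE_CUSTOM : parse_custom;   // our own protocol, not IP
                default          : accept;
            }
        }
        state parse_custom {
            pkt.extract(hdr.custom);
            transition accept;
        }
    }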

RINA offers a universal container or API, a bit like a hypervisor does for cloud computing. It is recursively composable, so it is like running a VM inside a VM inside a VM. With RINA, we are in the distributed computing paradigm, not the packet forwarding one. It enables us to define what is happening “inside” each scope, in a way that is decoupled from its design and implementation. It ends the artificial split between forwarding, transport protocols and orchestration.

We can therefore work at a higher level of abstraction, one that encompasses PISA, TCP/IP and SDN. We can truly collapse complexity through automation and the elimination of duplicate functions and protocols. To do this, our higher-order design and operations tools can emit P4 code where necessary to configure equipment, but no human should ever go near it.

So whilst P4 is definitely progress, it is not the end game by any means, but really only the beginning. In the meantime, proceed with caution: a sharp knife in the wrong hands can be dangerous.
