Lean Networking

I have been fortunate to have many great colleagues and mentors throughout my career. One of them is David Anderson, with whom I have worked at Oracle, at Sprint, and now while running my own company. David is a leading light in the agile software development movement, and has written a number of books on the topic. His latest one (which is being revised for a new edition) is Kanban, subtitled Successful evolutionary change for your technology business.

The essence of the Kanban methodology is to surface how value flows through a system (in this case software development), and to do it in a way that makes it manageable. Many of the ideas are taken from lean manufacturing, agile management techniques, plus the theory of constraints and throughput accounting, as espoused by Eli Goldratt in his famous book The Goal.

I was fortunate enough to go on David’s Kanban course in Barcelona back in 2012, and commend attendance at a future event as essential basic literacy in managing software development. What I noticed were great similarities between the issues of value flow in manufacturing and software development and those in packet networking and telecoms. I would like to share with you one critical idea that I feel will become a central theme in telecoms over the next decade and more. I call this idea ‘lean networking’, based on the insight that there are two kinds of efficiency in any system of value flow, including networks.

Understanding lean

The idea of ‘lean’ has gotten a bad name in recent times due to misunderstanding and misapplication. It became synonymous with outsourcing, cost reduction and management short-sightedness. This is diametrically opposed to its origins and true meaning, which were a ruthless focus on value to the customer, agility and low defect rates. The means of achieving these ends are restricted work-in-progress, low cycle times, flexible resources, and highly standardised and optimised work.

What the Toyota Production System demonstrated was that ‘lean’ is not just a system of practices, but rather a set of attitudes and beliefs. Central is the idea of respect for people, and that everyone is part of the system and responsible for the outcomes of the system. This rejects the idea of workers as machine-parts that occasionally malfunction and produce poor quality that then has to be ‘managed’ by professional managers using systems of carrots and sticks. Instead, it sees that quality is not opposed to efficiency, but rather that it is central to it, and that it is the duty of everyone to ensure it.

Famously, in Toyota plants you are a hero if you spot a quality issue and stop the production line. Defects cause re-work, and that is waste. Defects that reach the customer cause a loss of satisfaction, and that is unacceptable.

Flow vs resource efficiency

Contrast this with the traditional view of manufacturing, where what matters is keeping the expensive plant busy. Stopping the production line reduces resource usage, and therefore loses some perceived efficiency. Whilst quality is not discounted, it is secondary to meeting targets of efficient production and low unit cost. That traditional worldview suffers from several critical flaws:

      • It cannot cope well with variation in demand and supply capacity, since variation requires slack, which is at odds with high utilisation.
      • It results in long lead times and high inventory costs, as work is stockpiled to ensure no resource is starved of work.
      • Furthermore, the focus on quantity over quality allows re-work into the system to deal with those quality issues, which further increases lead time.
      • Because of the long lead times, some work needs to be expedited, which incurs set-up and task switching costs, which further increases lead times and decreases ability to respond to variable demand.

These two approaches can be thought of as two different beliefs of what matters: the ‘traditional’ view that resource efficiency is primary, and the ‘lean’ view that flow efficiency (low lead time, high throughput) works best.

Ideally we want to have both: no wastage of production capacity and short lead times, which together make for high throughput. The same ideas are pursued in greater depth in the excellent blog posts by Håkan Forss (part 1 and part 2).

        The right kind of efficiency

        These ideas can be summarised in the chart below. This is a critical chart, and one worth remembering.

        [Figure: Resource vs flow efficiency]

        The crux is that there is no path from high resource efficiency and low flow efficiency to the nirvana of having both. It is mathematically impossible, because the variation in presented load means you have no slack with which to achieve flow efficiency. This is the stuck position that the telecoms industry finds itself in, since the core belief of telcos is that they are in the business of delivering pipes and filling those pipes with bits.
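One way to make the slack argument concrete is a textbook single-queue model. The sketch below assumes an M/M/1 queue (random arrivals, random service times, one link), which is an illustrative assumption rather than anything measured on a real network; the function name `mm1_mean_delay` is made up for this example. It shows how mean time in the system grows as utilisation approaches 100%.

```python
def mm1_mean_delay(utilisation, service_time=1.0):
    """Mean time in system for an M/M/1 queue: W = S / (1 - rho).

    utilisation  -- rho, the fraction of time the link is busy (0 <= rho < 1)
    service_time -- S, the mean time to transmit one packet (assumed unit)
    """
    if not 0.0 <= utilisation < 1.0:
        raise ValueError("utilisation must be in [0, 1) for a stable queue")
    return service_time / (1.0 - utilisation)


if __name__ == "__main__":
    # Delay is expressed in multiples of the mean service time.
    for rho in (0.5, 0.8, 0.9, 0.95, 0.99):
        delay = mm1_mean_delay(rho)
        print(f"utilisation {rho:.0%}: mean delay = {delay:.1f} service times")
```

Under these assumptions, moving from 50% to 99% utilisation multiplies the mean delay fifty-fold (from 2 to 100 service times). Real traffic is burstier than this model assumes, so it should be read as an illustration of the shape of the trade-off, not a prediction.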

        Telecoms networks: variation, quality, defects and re-work

        The traditional bandwidth-based view of networks is one that focuses on resource utilisation:

                  • When we think of bandwidth as the resource, we imagine ‘bandwidth caps’ as being a way of preventing over-exploitation of that resource.
                  • We put queues in front of network links to ensure they remain fully loaded, and have lots of ‘work in progress’ as packets sit in queues.
                  • QoS offers priority, but can’t cope with the differing loss and delay needs of different flows, or assure any particular outcome. In other words, QoS doesn’t actually deliver quality, which is an absence of excessive loss and delay; every failed Skype call or delayed Web page is a defect, and having to hit ‘redial’ or ‘refresh’ is re-work.
                  • QoS is also a form of expediting that comes at a high cost to other flows; we not only rob Peter to pay Paul, but do so at the cost of leaving Peter’s bodily organs so impoverished as to be unable to function, and Paul suffering from obesity from consuming too much cream and lard.
                  • As we load up networks to their point of saturation, we see an increasing failure load on the networks of lost packets (which generate no value, despite consuming resources and delaying other packets), and re-sent packets.
                  • As an added bonus, as we approach full resource utilisation, phasing effects in the network cause correlated flow patterns that tend to push us into chaotic behaviour and network collapse.

        What we ignore at our peril is that load is highly variable, and flows have different loss, jitter and throughput needs. Value to the customer comes from meeting both their quantity and quality needs. Those quality needs may be weak or highly stringent, depending on the application’s performance aspirations.

        The PSTN is like a cottage craft business: one product, made to perfection, at high expense. The Internet is like a traditional manufacturing plant: fill her up, damn the quality. We can do better than both of these.
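The point about lost packets near saturation can be illustrated with another textbook model. This sketch assumes an M/M/1/K queue (a single link with room for at most K packets, random arrivals and service times); the model choice, the function name `mm1k_loss_probability` and the buffer size are all assumptions for illustration, not a description of any particular network.

```python
def mm1k_loss_probability(offered_load, capacity):
    """Fraction of arriving packets dropped by an M/M/1/K queue.

    offered_load -- rho, arrival rate divided by service rate (may exceed 1)
    capacity     -- K, the most packets the system can hold, including
                    the one currently being transmitted
    """
    if offered_load < 0 or capacity < 1:
        raise ValueError("offered_load must be >= 0 and capacity >= 1")
    if offered_load == 1.0:
        # The general formula is 0/0 at rho == 1; the limit is 1 / (K + 1).
        return 1.0 / (capacity + 1)
    rho = offered_load
    return (1.0 - rho) * rho**capacity / (1.0 - rho ** (capacity + 1))


if __name__ == "__main__":
    K = 20  # hypothetical buffer size, in packets
    for rho in (0.7, 0.9, 1.0, 1.1):
        loss = mm1k_loss_probability(rho, K)
        carried = rho * (1.0 - loss)  # load that actually gets through
        print(f"offered load {rho:.2f}: loss = {loss:.4%}, carried = {carried:.3f}")
```

In this model loss is negligible at moderate load and climbs steeply as the offered load approaches and passes the link's capacity, while the carried load can never exceed 1. Every dropped packet has still consumed resources upstream, which is the 'failure load' described above.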

                    Limit work-in-progress

                    The basic problem is that we ignore the central idea of the Theory of Constraints: we have to limit work-in-progress, and match the work admitted at ingress to the capacity of the bottleneck in the system. In telecoms terms, that means accepting some basic realities:

                                      • To get to both high flow and resource efficiency, you need to work with flow efficiency first.
                                      • The resource you need to manage for flow efficiency is not bandwidth, but contention (i.e. loss and delay) along the path.
                                      • The place to manage it is at the point of ingress to the network.

                    This is counter-intuitive, as it requires having networks that run idle more of the time, and drop packets more often. However, the result is a network which can deliver both flow and resource efficiency – the quality outcomes of the PSTN, and the flexibility and cost of the Internet.
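The effect of limiting work-in-progress at ingress can be sketched with a toy simulation. Everything here is an assumption made for illustration: a single link that sends one packet per time slot, bursty arrivals of two packets at a time, and a hypothetical `wip_limit` parameter that drops packets at the point of entry once that many are already waiting.

```python
import random
from collections import deque


def simulate(wip_limit, load=0.95, slots=100_000, seed=1):
    """Slotted single-link model with an optional work-in-progress cap.

    In each slot a burst of 2 packets arrives with probability load / 2,
    then the link transmits one queued packet (FIFO). A packet that
    arrives when wip_limit packets are already queued is dropped at
    ingress. wip_limit=None means an unbounded queue.
    """
    rng = random.Random(seed)
    queue = deque()  # holds the arrival slot of each waiting packet
    offered = dropped = served = 0
    total_delay = max_delay = 0
    for t in range(slots):
        if rng.random() < load / 2:
            for _ in range(2):
                offered += 1
                if wip_limit is not None and len(queue) >= wip_limit:
                    dropped += 1
                else:
                    queue.append(t)
        if queue:
            delay = t - queue.popleft() + 1  # slots from arrival to departure
            served += 1
            total_delay += delay
            max_delay = max(max_delay, delay)
    return {
        "utilisation": served / slots,
        "mean_delay": total_delay / served,
        "max_delay": max_delay,
        "drop_fraction": dropped / offered,
    }


if __name__ == "__main__":
    for limit in (None, 5):
        print(f"wip_limit={limit}: {simulate(limit)}")
```

In this toy model the capped link runs idle more often and drops some packets, exactly the counter-intuitive cost described above, but no admitted packet ever waits longer than `wip_limit` slots. The uncapped link keeps utilisation higher at the price of a much longer and unbounded delay.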

                                        The journey to lean networking

                                        The telecoms industry is caught up in a trap of its own making, driven by a failed mental model of networks as pipes, bandwidth as the resource that is to be packaged and sold, and resource efficiency as the driver of profit. The inevitable result is that the typical telco resembles a Model T car plant (complete with QoS mine, network policy plantation and IMS steelworks), rather than a modern lean flexible manufacturing system matching resources to needs quickly and efficiently.

                                        I have been working with colleagues on creating a Toyota Production System for packet data, and applying these ideas to corporate, wholesale and retail telecoms networks. Simply having a different ‘lean’ view of the networking world can create rapid savings of tens of millions of dollars in a typical operator or large ISP – and we have the case studies to prove it.

                                        So there is a successful path for evolutionary change to lean operations for networking businesses. If you would like to learn more, please get in touch.