Why have modules?

This article is heavily inspired by Yourdon & Constantine’s Structured Design.

Why write software?

Before we attempt to answer the main title question, let's first answer the question: why do we write software at all?

We create software to solve problems for our customers, something economists would refer to as creating “utility” for our users.

However, creating software that costs more in resources than it produces in utility for its users is unsustainable, and so we have two complementary goals when creating software:

  1. Maximising the utility created for users (“maximum value”)

  2. Minimising the cost of the system (“minimum cost”)

This article will focus primarily on the second goal - how do we produce minimum cost systems?

Minimum Cost

Let us first define “minimum cost”: we are interested in creating systems that are cheap to develop, cheap to operate, cheap to maintain and cheap to modify. The relative priorities placed on each of these costs vary by organisation.

The dominant cost in developing and operating software systems is human time. The exponential decay in hardware costs has ensured that hardware itself normally accounts for only a single-digit percentage of the cost of most software systems. As the dominant cost is human time, it follows that cheaper software generally involves fewer people and has a faster time-to-market.

In order to minimise the cost of human time in software development, it makes sense to focus on the portion of the software lifecycle that has the most significant cost. Estimates regarding cost for each stage of the lifecycle vary by organisation and by project, but it is generally accepted that software “maintenance” accounts for >50% of the cost of software.

“Maintenance” is generally defined as ongoing “debugging” and minor modification; however, the cost of “debugging” also shows up earlier in the software lifecycle. The true cost of debugging is the cost of everything the programmer does in the development of a system beyond what would be necessary if they made no mistakes; that is, everything they do over and above the initial writing of the code, the initial execution of the tests to validate the behaviour, and the opening of a pull request on GitHub.

It is evident even in 60-minute programmer job interviews that “debugging” accounts for a substantial proportion of the time taken to go from blank slate to functioning implementation.

That most of the cost of systems development today is due to errors is not something to be denied, but rather an insight to be traded upon. No theory of programming or programs, no technique or practice for programming or systems design, which does not give central recognition to the role of bugs and debugging, can be of much value to anyone in the field.

How do we make fewer mistakes?

This is going to get abstract, but the implications are broadly relevant - stay with me.

Let’s start with a controversial statement: it’s harder to solve a harder problem.

Expressing this statement mathematically: if we assume we have some measure of the size of a problem P, say M(P), then the cost of programming P, C(P), obeys the rule:

\text{If } M(P) > M(Q) \text{ then } C(P) > C(Q)

We could take two separate problems and, instead of writing two programs, write one combined program. Putting the two problems together makes them bigger than the two problems taken separately. The primary reason for not combining problems is that we, as humans, don’t deal well with great complexity. As the complexity of a problem increases, we make disproportionately more mistakes. When problems are combined, we must solve not only each individual problem, but also the interactions between the two (which may involve preventing or avoiding interactions). Thus:

M(P + Q) > M(P) + M(Q)

And similarly:

C(P + Q) > C(P) + C(Q)

It is always easier, quicker and cheaper to create two small pieces than one big piece, if the two small pieces do the same job as the single piece.
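
To make this concrete, here is a small illustrative sketch in Python. The specific cost model (cost growing as size to the power 1.5) and the problem sizes are assumptions chosen purely to make the shape of the argument visible; the super-linear exponent simply stands in for the non-linear rise in error rate described above, and the real combined problem would be even bigger than the sum of its parts.

    # Illustrative sketch only: the cost model below is an assumption, not a
    # measurement. The super-linear exponent stands in for the non-linear rise
    # in errors as problem size grows.

    def cost(size: float) -> float:
        """Assumed cost of solving a problem of the given size."""
        return size ** 1.5

    size_p, size_q = 10, 10

    # Solving the two problems separately versus as one combined program.
    # Treating the combined size as size_p + size_q is conservative: the text
    # argues M(P + Q) > M(P) + M(Q), which would only widen the gap.
    separate = cost(size_p) + cost(size_q)
    combined = cost(size_p + size_q)

    print(f"separate: {separate:.1f}")  # ~63.2
    print(f"combined: {combined:.1f}")  # ~89.4, i.e. C(P+Q) > C(P) + C(Q)

Read the other way round, the same numbers illustrate the “Fundamental Theorem” below: splitting a size-20 problem into two genuinely independent size-10 halves is cheaper than solving it whole.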

This phenomenon is not unique to the software field; it is true of any field of problem-solving: mathematics, civil engineering or naval warfare. In all of these fields, it is possible to go from very trivial, to trivial, to not-so-trivial without a substantial increase in errors; however, sooner or later the rate of errors begins to increase dramatically (and non-linearly) as the size of the problem increases.

“The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information"[1], written by the psychologist George Miller, is one of the most highly cited papers in psychology. It describes some of the limitations of humans' ability to process information, and in particular makes the claim that the number of objects a human can hold in working memory is 7 ± 2. It doesn’t matter what these objects are - whether they’re functions, database tables, SQL queries or items from a grocery shopping list - the average limit is still 7 ± 2. Some humans fall on the far-right side of this distribution, and can potentially hold as many as 15 concepts in their head at once - but the point is that everyone’s ability to process information is fundamentally finite, and fundamentally quite small.

As the number of concepts a programmer has to hold in working memory grows past 7, 8, 9 or 10, the rate of errors the programmer makes increases sharply and non-linearly, as does the time taken to write a correct implementation.

This fundamental and well-established property of human information processing underlies all strategies for segmenting, factoring or decomposing problems into sub-problems. It is this relationship between problem elements and error generation that ensures that:

C(P + Q) > C(P) + C(Q)

Once a problem becomes non-trivial, there is therefore a significant incentive to break the problem into smaller pieces. We can state this using Yourdon and Constantine’s “Fundamental Theorem of Software Engineering”[2]:

C(P) > C(\dfrac{1}{2}P) + C(\dfrac{1}{2}P)

This expression says that we can win by dividing any problem into independent sub-problems. However, it is not sufficient to stop here and simply make the (obviously fallacious) claim that infinitely decomposing any problem will eventually reduce its cost to zero. If the sub-problems are not truly independent, then we are not just solving the two sub-problems; we are also dealing with the interactions between them.

If we break a problem P into two non-independent parts of equal complexity, P' = 1/2 P and P'' = 1/2 P, the cost of solving the entire problem is:

C(P' + I_1 \times P'') + C(P'' + I_2 \times P')

Where I1 is a fraction representing how much of P'' must be dealt with when solving P', and I2 how much of P' must be dealt with when solving P''. Whenever I1 and I2 are non-zero, it is obvious that:

C(P' + I_1 \times P'') + C(P'' + I_2 \times P') > C(\dfrac{1}{2}P) + C(\dfrac{1}{2}P)

However, if I1 and I2 are both small - which we should expect if we do a good job of modularising the system - we should still find that:

C(P) > C(P' + I_1 \times P'') + C(P'' + I_2 \times P')
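
Continuing the illustrative sketch from earlier (again, the cost model and the specific interaction fractions are assumptions chosen only to make the shape of the argument visible), we can see how the size of I1 and I2 determines whether decomposition pays off:

    # Illustrative sketch only: same assumed super-linear cost model as before.

    def cost(size: float) -> float:
        return size ** 1.5

    whole = 20        # size of the original problem P
    half = whole / 2  # P' and P'' are of equal size, 1/2 P each

    def decomposed_cost(i1: float, i2: float) -> float:
        """Cost of solving both halves, each inflated by its interaction fraction."""
        return cost(half + i1 * half) + cost(half + i2 * half)

    print(f"monolith C(P):                {cost(whole):.1f}")                # ~89.4
    print(f"ideal split (I1 = I2 = 0):    {decomposed_cost(0.0, 0.0):.1f}")  # ~63.2
    print(f"small interactions (I = 0.1): {decomposed_cost(0.1, 0.1):.1f}")  # ~73.0, still cheaper than the monolith
    print(f"large interactions (I = 0.8): {decomposed_cost(0.8, 0.8):.1f}")  # ~152.7, decomposition no longer pays

The crossover point depends entirely on how independent the two halves really are, which is why the quality of the decomposition matters at least as much as the act of decomposing.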

Now of course, decomposing a system into parts introduces its own sources of errors, such as attempting to make an API call with a field missing. However, these errors tend to be less insidious and easier to resolve than the kind where a statement on line 2341 conditionally overwrites a variable used on line 4123 (I’m looking at you, process_salary).

The key take-away here is that, if we decompose a problem into pieces that are relatively independent, we can avoid the non-linear rise in the rate of errors, and thus avoid the non-linear rise in the cost of solving the problem.

Clearly, the process of decomposition itself carries a cost; however, the key thesis of this article is that the cost incurred by decomposition is substantially less than the cost incurred by failing to break down the problem.

Other Benefits of Decomposition

This article has mostly focused on the benefits decomposition brings to error rates; however, decomposition also brings a number of other benefits:

  • Reduced onboarding time: If decomposing a problem into two parts allows one developer to worry about one half and another developer to worry about the other half, then neither needs to spend onboarding time understanding the other half of the problem in detail before they can be productive - they can immediately focus on their half, and their half alone.

  • Parallelised development: Two engineers can work on two sub-problems simultaneously without needing to co-ordinate

    • The key insight of Fred Brooks’s quote that “what one programmer can do in one month, two programmers can do in two months”[3] is that communication/coordination is incredibly expensive, and there are significant benefits to avoiding it where possible

  • Independent evolution: A frequent concrete example of problem decomposition is separate services. If each service independently creates utility for customers, then those separate services can also evolve independently of each other (see AWS for an obvious example of great commercial success with this philosophy).

  • Fault tolerance: If one part of a modularised system fails, it’s less likely to take the entire system offline, and more likely to allow the system to survive in a degraded mode

  • Security: Problems can be decomposed by security domain, which can reduce the surface area of the public system that can be attacked

  • Resource scalability: The resources behind each service can be independently scaled according to demand on that service (both hardware, and people)

Summary

  • We should focus on creating maximum value, minimum cost software systems.

  • The cost of developing and operating software systems is largely the cost of debugging them.

  • The cost of debugging is essentially equivalent to the cost of errors made by the programmer.

  • The number of errors made during the design, coding and debugging of a system rises non-linearly as the complexity of the system increases.

  • Complexity can be decreased by breaking a problem into smaller and smaller parts, so long as these parts are relatively independent of each other. This in turn disproportionately reduces the number of errors and cost of developing the system.

  • Eventually the process of breaking pieces of a system into smaller pieces will create more complexity than it eliminates, but this does not occur as quickly as one might think.

  • Breaking problems into smaller parts has a large number of ancillary benefits.

References

  1. Miller, G. A. (1956). "The magical number seven, plus or minus two: Some limits on our capacity for processing information". Psychological Review, 63(2), 81–97. doi:10.1037/h0043158.

  2. Yourdon, E. and Constantine, L. L. (1979). Structured Design: Fundamentals of a Discipline of Computer Program and Systems Design. Prentice-Hall (facsimile edition, 1986).

  3. Brooks, F. P. (1975). The Mythical Man-Month. Addison-Wesley. ISBN 0-201-00650-2.