May 2010

Modularity Parable and Software

In his seminal book, The Sciences of the Artificial, Herb Simon describes the parable of watchmakers named Hora and Tempus. They built watches out of 1000 parts. The watches were of the highest quality – as a result, they were often interrupted by customers calling up to place orders. However, they built watches using different techniques. Tempus created watches by putting all 1000 parts together in a monolithic fashion while Hora created it out of components which were assembled from the parts. Each of Hora’s watches was assembled with 10 components, each created out of 10 subcomponents, which, in turn, were assembled from 10 parts each. Whenever Tempus was interrupted by a call, he was forced to put down his work and had to start all over again. On the other hand, when Hora was interrupted he was forced to put down a subcomponent and had to re-assemble only that subcomponent. With a probability of interruption of 1%, Simon calculated that Hora would produce 4000 times more watches than Tempus. Hora’s modular assembly gave him an overwhelming advantage over Tempus.

Read Neeraj Sangal's latest blog on software architecture design

But how does this parable apply to programmers? While phone calls, instant messages, and even hallway conversation might be disruptive, they do not force rework. This parable is about interruptions that force rework. Programs change or evolve, mostly because there are requirements for new capabilities. In fact, most programs are written against a backdrop of a long list of features that is itself changing. The interruptions are new requirements that will require rework of parts that have been already been implemented.

For Hora the watchmaker, an interruption required the re-assembly of the component that he was working on. Other components and assemblies weren’t really affected. For Hora the programmer, things aren’t all that simple. If supporting one requirement affects the entire program then like Tempus, his team will spend all its time reworking what was already complete. On the other hand, to the extent that things could be arranged so that requirements affect a small part of the program, it is conceivable that the team could even service multiple requirements simultaneously.

Perhaps, Hora could split his software by dividing it into different logical grouping arranged by packages, by name spaces, by file and directories, by schema etc. But does this really guarantee that the impact of a new requirement will be limited? Alas, requirements are rooted in the real world and there is nothing that can ever give that ironclad guarantee. If this problem cannot be overcome in the absolute sense, is there something we can do to ameliorate it? What we do to ameliorate this problem is what modularity is all about. The modularity of a program is the ability to limit the scope of change. To understand modularity, it is worth looking into what Parnas called information hiding.

Information Hiding

Contrary to what some might think, “information hiding” has nothing to do with corporate management. Nor does it have anything to do with “open source” or “closed source” software. However, it does have a profound bearing on abstractions, that helps realize information hiding. One benefit of an abstraction is that the consumers of the abstraction don’t care about the implementation details of that abstraction – those details are “hidden” within the abstraction. To the extent that changes for a new requirement affect only the implementation details, the rest of the system isn’t affected.

To illustrate his reasoning, Parnas used a program which reads a collection of input lines into a storage system, generates multiple new lines for each input line using word shifts, and then outputs a sorted list of those new lines. By abstracting the details of the storage system into a module, Parnas showed how its implementation details could be hidden thereby making it easier to change and to maintain. As developers, we naturally group related programming elements. Object oriented programmers define classes and name spaces. Data architects use schemas to group related tables, stored procedures and other elements. Methods, files, directories, classes, schemas are all examples of abstractions.

If methods and classes represent abstraction wouldn’t every change affect an abstraction? Isn’t that what we are trying to avoid? What matters is the scope of the affected abstraction. If you change the body of a method while keeping its signature the same, you haven’t really affected any of the callers of that method. If you change a private method in a class, you don’t affect the external usage of the class. Software is a hierarchy of abstractions: just as a class contains methods, there are packages and name spaces that contain classes; and, packages and names spaces are themselves hierarchical. Generally, the smaller the scope of the abstractions affected by a requirement, the easier it is to support that requirement; and, if different requirements affect disjoint scopes, then team efficiency improves significantly.

Conclusion

I have learned that good modularity almost always maps to abstractions that are readily understandable. Just because you split up your program into modules, doesn’t mean that the benefit of modularity will accrue immediately. Indeed, the value of modularity should be judged primarily in the context of fulfilling new requirements.

Making systems modular requires experience and hard work. When systems are truly modular, the results are magical – it’s a pleasure to work on the system and productivity improves by leaps and bounds.