
Reasons NOT to Refactor Your Code

Last week I wrote about the reasons to refactor code. Let us now look at some reasons why you shouldn't refactor code. When dealing with legacy code, there is always a temptation to refactor it to improve its understandability or performance. However, here are some reasons why it might be better to hold off:

1. You do not have the proper tests in place

Do not waste time refactoring your code when you do not have the proper tests in place to make sure the code you are refactoring still works correctly. A refactoring exercise presupposes a good engineering environment, and testing is one of the key components of that environment. If you don't have a good way to test what you changed, it is better to hold off making that change until you can fully test it. Our developers tell us it is impossible to write good code without thorough testing. I believe them.
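
If the legacy code has no tests at all, a characterization test is often the cheapest place to start: it simply records what the code does today so that a refactoring can be checked against that behavior. Here is a minimal sketch in that spirit; the function parse_invoice_line is a hypothetical stand-in for the legacy code under test.

```python
# A minimal characterization test (pytest style) that pins down current behavior
# before refactoring. parse_invoice_line stands in for a hypothetical legacy
# function; the expected values are whatever the existing code returns today.

def parse_invoice_line(line):          # stand-in for the legacy code under test
    if not line.strip():
        return None
    sku, qty, price = line.split(";")
    return {"sku": sku, "quantity": int(qty), "unit_price": float(price)}


def test_keeps_current_behavior():
    # Captured from the current implementation, not from a spec.
    assert parse_invoice_line("WIDGET;3;9.99") == {
        "sku": "WIDGET", "quantity": 3, "unit_price": 9.99}


def test_current_handling_of_blank_input():
    # Today the code returns None for blank lines; recording that fact means an
    # accidental behavior change during refactoring is caught immediately.
    assert parse_invoice_line("") is None
```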

2. Allure of technology

Don't make a refactoring change just because a new, exciting technology has been released. Given the fast pace of change, there will always be something new and exciting, and today's new and exciting technology will be legacy tomorrow. Instead, seek to understand the value of the new technology. If a Java backend is working fine, don't jump to Node.js unless you know that event handling is necessary for your application. Too many legacy applications are hard to maintain because they have become a mishmash of languages, frameworks, and technologies.

To learn more, watch our webinar on reengineering legacy code.

3. The application doesn’t need to change

The primary purpose for changing an application is to satisfy new user requirements or usage conditions. So, as long as the users of the application are content with how it operates, there is less need to refactor the code. If there is no reason to change the application, there is no reason to refactor it. Even if your company is swimming in money and you have nothing else to do, don't do it.

Vetting Software for Acquisition or Investment

When vetting software for acquisition or venture investment, certain questions often go unanswered. Buyers and investors typically focus on market share, gross revenues, projected earnings, and other financial data. While focusing on capital debt, many ignore technical debt, leaving key questions unasked: What is the state of the code? What is the quality of the implementation? How easy will it be to fix defects and add new capabilities?

Robert L. Glass talks about the "60/60" rule, stating that maintenance will, on average, consume 60% of your software costs [1]. These costs can escalate with software that is complex, riddled with dead code, inadequately structured, buggy, and full of multiple implementations of the same functionality. Add a lack of documentation and developer turnover, and the system becomes very costly and time-consuming to maintain [2]. Even when documentation is available, it is typically no longer accurate because both the design and the code have changed. Floris and Harald, in a recent study, concluded that incomplete documentation is an important factor in increasing the cost of maintaining code [3].

A firm grasp of the current state of the code will help you understand the long-term costs of maintenance and enhancement. Understanding the architecture of the code prior to an acquisition or investment helps you make a more accurate valuation.

One reason due diligence tends to overlook code is that the process of analyzing it is manual, costly, and time-consuming. For a large, complex system it can take months to truly understand where things stand with respect to modularity, security, performance, and other key attributes.

With Lattix Architect you can understand and document the architecture in days instead of months. A DSM (Dependency Structure Matrix) not only gives you a graphical view of the source code but also shows unwanted dependencies, cycles, and other architectural defects. Lattix Architect also provides metrics such as stability, complexity, and coupling to benchmark and track the code. Armed with this information, you can make an informed investment decision. Knowledge of long-term maintenance and development costs can also be a negotiating factor at the time of acquisition, saving upfront capital and diminishing long-term costs.
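
Lattix Architect computes such metrics directly from the DSM. As a rough, tool-agnostic illustration of what a coupling metric captures (not Lattix's implementation), the sketch below derives afferent coupling (Ca), efferent coupling (Ce), and Robert Martin's instability measure I = Ce / (Ca + Ce) from a made-up list of module dependencies.

```python
# Sketch: afferent/efferent coupling and instability from a dependency edge list.
# The module names and edges are made up for illustration.
from collections import defaultdict

# (from_module, to_module): "ui depends on billing", etc.
dependencies = [
    ("ui", "billing"),
    ("ui", "reports"),
    ("billing", "db"),
    ("reports", "db"),
    ("reports", "billing"),
]

efferent = defaultdict(set)  # Ce: what each module depends on
afferent = defaultdict(set)  # Ca: what depends on each module

for src, dst in dependencies:
    efferent[src].add(dst)
    afferent[dst].add(src)

modules = {m for edge in dependencies for m in edge}
for m in sorted(modules):
    ce, ca = len(efferent[m]), len(afferent[m])
    instability = ce / (ca + ce) if (ca + ce) else 0.0  # I = Ce / (Ca + Ce)
    print(f"{m:10s}  Ca={ca}  Ce={ce}  I={instability:.2f}")
```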

Lattix can help you analyze a code base in a company that you are preparing to invest in or acquire, or simply help you get a better understanding of your current software portfolio. We have helped companies all over the world improve the quality of their software, and we can help you achieve the same results. Contact Lattix at sales@lattix.com or call 978-664-5050.

1. "Frequently Forgotten Fundamental Facts about Software Engineering" by Robert L. Glass, (an article in IEEE Software May/June 2001)
2. “On the Relationship between Software Complexity and Maintenance Costs” Edward E. Ogheneovo - Department of Computer Science, University of Port Harcourt, September 2014
3. “How to save on software maintenance costs” Floris P, Vogt Harald H., Omnext white paper, SOURCE 2 VALUE, 2010

Android Kernel: Lacking in Modularity

[Figure: Dependency Structure Matrix (DSM) of the Android kernel, panda configuration]

We decided to take a look at the architecture of the Android kernel. We selected the panda configuration for no particular reason; any other configuration would have worked just as well. The kernel code is written in C and is derived from the Linux kernel, so our approach will work on any configuration of the generic Linux kernel as well.

Now, we all know that C and C++ are complex languages, so we expected the analysis to be hard. But that difficulty really concerns the parser, and armed with the Clang parser we felt confident; we were pleased that we didn't run into any issues. Our goal was to examine all the C/C++ source files that go into the panda configuration and to understand their interrelationships. To do this, it was necessary to figure out which files are included in, or excluded from, the panda build. Then there were the issues of how all the files were compiled, included, and linked. That all took effort. The resulting picture showed just how coupled the Linux kernel is.
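
For readers who want to try something similar on their own code base: one way to enumerate the translation units that actually go into a particular build (an assumed setup, not the procedure we used for the panda study) is to capture a compilation database, for example with a wrapper tool such as Bear, and then list the files it records.

```python
# Sketch: list the source files that a build actually compiled, assuming a
# compile_commands.json compilation database has been captured (e.g., with Bear).
# This is an illustration of the idea, not the procedure used for the panda study.
import json
from pathlib import Path

db_path = Path("compile_commands.json")  # assumed to exist in the build directory
entries = json.loads(db_path.read_text())

compiled_files = sorted({str(Path(e["directory"], e["file"])) for e in entries})
print(f"{len(compiled_files)} translation units in this configuration")
for f in compiled_files[:20]:  # show a sample
    print(" ", f)
```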

First, let's acknowledge that the Linux kernel is well written. What goes into it is tightly controlled. Given its importance in the IT infrastructure of the world, that is just what one would hope. Let us also remember that many of the modularity mechanisms in use today were invented in Unix. The notion of device drivers that plug into an operating system was popularized by Unix and is commonplace today. Application pipes were pioneered by Unix. And yet the Linux kernel itself has poor modularity.

Part of the problem is that when the Unix/Linux kernels were developed, programming language support for modularity was poor. For instance, C has no notion of an interface, so dependency inversion is not naturally supported (it can be emulated, for example with structs of function pointers, as the kernel's own driver operation tables do). And Linux has no real mechanism to verify or enforce modularity.

A few things become apparent after a partitioning algorithm is applied. The partitioning algorithm reorders the subsystems based on their dependencies, revealing what is "lower" and what is "higher." In an ideal implementation, the developers of a higher layer need only understand the API of the lower layers, while the developers of the lower layers need to worry about the higher layers only when an interface is affected. In a coupled system, developers need to understand both layers, which makes assessing the impact of a change considerably harder. Indeed, in the Android kernel, where nearly all the layers are coupled, developers may sometimes have to understand thousands of files to feel confident about their changes.
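
To make the idea concrete, here is a small sketch, with made-up subsystem names, of the kind of dependency-based layering a partitioning algorithm performs: a subsystem that depends on nothing sinks to the bottom, each remaining subsystem sits one level above the layers it uses, and subsystems caught in a dependency cycle, the hallmark of coupling, cannot be layered at all.

```python
# Sketch: assign dependency-based layers to subsystems (made-up names and edges).
# A subsystem's layer is one more than the highest layer it depends on;
# subsystems in a cycle, or resting on one, never get a layer.
deps = {
    "drivers": {"kernel", "lib"},
    "kernel": {"lib", "arch.arm"},
    "arch.arm": {"kernel", "lib"},   # mutual dependency with "kernel" -> a cycle
    "lib": set(),
}

layers = {}
changed = True
while changed:
    changed = False
    for subsystem, uses in deps.items():
        if subsystem in layers:
            continue
        if all(u in layers for u in uses):  # all dependencies already layered
            layers[subsystem] = 1 + max((layers[u] for u in uses), default=-1)
            changed = True

unlayered = set(deps) - set(layers)  # cyclic subsystems (and their dependents)
for subsystem in sorted(layers, key=layers.get):
    print(f"layer {layers[subsystem]}: {subsystem}")
if unlayered:
    print("coupled (cannot be layered):", ", ".join(sorted(unlayered)))
```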

This also means that the intent behind the decomposition has been lost. For instance, 'arch.arm' is so strongly coupled with 'kernel' that it is hard for developers to understand one without understanding the other. Notice how even the 'drivers' are coupled to the rest of the system. I experimented by creating a separate layer for the base layer of the drivers, and I even moved some of the basic drivers such as 'char' and 'tty', and yet the coupling remained. Sadly, even some of the newer drivers are coupled to the kernel.

All this goes to show that unless there is a focus on architecture definition and validation, even the best-managed software systems will experience architectural erosion over time.

If you would like to discuss the methodology of this study or if you would like to replicate the results on your own, please contact me (neeraj dot sangal at lattix dot com). You can peruse a Lattix white paper on the Android kernel for some more details.

Modularity as a Portfolio of Options

I have been exploring the use of financial analogies with regard to programming and design. Ward Cunningham's Technical Debt metaphor has become well known. Prior to writing this blog entry, I looked a little deeper into Ward's metaphor and discovered that it has been interpreted and extended in multiple ways. Since this is my view of the different interpretations, I recommend that you go to the actual source to arrive at your own conclusion.

First, let's examine how Ward originally used the metaphor. He used it to explain the need to go back and improve the code to incorporate what you learn after an actual implementation. I believe there are a few good reasons for this suggestion:

  • We often don't know the best way of implementing something until we actually implement it.
  • The customers learn what they really want only after they have seen an implementation.

So for Ward, technical debt is a good thing, perhaps even a necessary thing. An implementation, even a suboptimal one, allows you to learn what to implement and how to implement it. This is pretty much the original raison d'être for Agile.

Uncle Bob applied the metaphor to an implementation that is known to be suboptimal in the long term but allows you to get things done in the short term. While Ward considers the debt necessary for understanding the selection and design of the right system, Bob used the metaphor for a worthwhile engineering trade-off. Steve McConnell and Martin Fowler would also count poor coding or "quick and dirty" implementations as technical debt.

If the value of the metaphor lies in explaining to others why improperly conceived code will extract a price during future development, just as a debt extracts interest in future years, then I think the metaphor works for all these different situations.

But now on to what this article is really about: another metaphor, and one that also goes quite far, I might add. This metaphor comes from Carliss Baldwin and Kim Clark, from their book Design Rules: The Power of Modularity. It, too, deals with how the current design of a system affects the cost and the value that can be realized from it in the future.

According to them, a design yields a portfolio of options. For a modular system, the options are the opportunities to improve the design in a modular, piecemeal fashion. A monolithic design, by contrast, gives you a single option: you have to redesign the entire system to improve it.

Baldwin and Clark point to a well-known theorem in finance: it is more valuable to hold a basket of options on many different securities than to hold a single option on the entire portfolio. So it is with system design. A monolithic design gives you a single design "option" to improve the system, while a modular system gives you multiple "options" to improve its individual modules.

Consider the example of a PC. It is highly modular. When I bought my laptop, it had an 80 GB disk. Later, I bought a 300 GB disk, and all I had to do was swap the old disk for the new one. In the period since I bought my laptop, disk manufacturers had improved their designs so that the same form factor gave me a bigger and faster disk. I was similarly able to upgrade my laptop with additional memory. Each of these modularity "options" represents an axis for further improvement that does not force a redesign of the entire system. The ease of improving the design of a modular system confers dramatic benefits on the customers of that system. This is why modular designs are so valuable.

Of course, it is important to keep in mind that the value of modularity exists only in the context of requirements. Because we need larger disks to store ever-increasing amounts of data, it becomes valuable to improve the density of disks. In other words, the "option" is valuable only because the value of the underlying "security" could increase. Just because you create a module doesn't mean you have suddenly added value; the module must help meet requirements that somebody actually needs or wants.

Modularity Parable and Software

In his seminal book The Sciences of the Artificial, Herb Simon describes the parable of two watchmakers named Hora and Tempus. Both built watches out of 1,000 parts, and the watches were of the highest quality; as a result, the watchmakers were often interrupted by customers calling to place orders. However, they built their watches using different techniques. Tempus assembled each watch by putting all 1,000 parts together in a monolithic fashion, while Hora built his watches out of components that were themselves assembled from parts: each of Hora's watches was assembled from 10 components, each created from 10 subcomponents, which, in turn, were assembled from 10 parts each. Whenever Tempus was interrupted by a call, he was forced to put down his work and start all over again. When Hora was interrupted, he had to put down only the subassembly he was working on and re-assemble just that piece. With a 1% probability of interruption as each part is added, Simon calculated that Hora would produce roughly 4,000 times more watches than Tempus. Hora's modular assembly gave him an overwhelming advantage.
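
The arithmetic behind the parable is easy to sketch. Assuming each part placement carries a 1% chance of interruption and an interruption loses only the assembly in hand (not completed subassemblies), the expected number of placements needed to finish an n-part assembly is (q^-n - 1)/p with q = 1 - p, the standard expected waiting time for a run of n successes. The simplified model below lands in the same ballpark as Simon's figure; his own accounting, with slightly different assumptions, put the advantage near 4,000.

```python
# Back-of-the-envelope version of Simon's watchmaker arithmetic.
# Assumption: each part placement is interrupted with probability p, and an
# interruption loses the assembly in hand (but not completed subassemblies).
p = 0.01
q = 1 - p

def expected_placements(n, p=p, q=q):
    """Expected placements to finish an n-part assembly without interruption
    (expected waiting time for a run of n successes)."""
    return (q ** -n - 1) / p

tempus = expected_placements(1000)    # one monolithic 1000-part assembly
hora = 111 * expected_placements(10)  # 100 + 10 + 1 assemblies of 10 units each
print(f"Tempus: ~{tempus:,.0f} placements per watch")
print(f"Hora:   ~{hora:,.0f} placements per watch")
print(f"Hora's advantage: ~{tempus / hora:,.0f}x")
```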


But how does this parable apply to programmers? While phone calls, instant messages, and even hallway conversations might be disruptive, they do not force rework. This parable is about interruptions that force rework. Programs change and evolve, mostly because there are requirements for new capabilities. In fact, most programs are written against the backdrop of a long list of features that is itself changing. The interruptions are new requirements that force rework of parts that have already been implemented.

For Hora the watchmaker, an interruption required re-assembling only the component he was working on; other components and assemblies weren't really affected. For Hora the programmer, things aren't so simple. If supporting one requirement affects the entire program, then, like Tempus, his team will spend all its time reworking what was already complete. On the other hand, to the extent that things can be arranged so that each requirement affects only a small part of the program, it is conceivable that the team could even service multiple requirements simultaneously.

Perhaps Hora could split his software into different logical groupings: by packages, by namespaces, by files and directories, by schemas, and so on. But does this really guarantee that the impact of a new requirement will be limited? Alas, requirements are rooted in the real world, and nothing can ever give that ironclad guarantee. If the problem cannot be overcome in an absolute sense, is there something we can do to ameliorate it? What we do to ameliorate this problem is what modularity is all about: the modularity of a program is its ability to limit the scope of change. To understand modularity, it is worth looking into what Parnas called information hiding.

Information Hiding

Contrary to what some might think, "information hiding" has nothing to do with corporate management. Nor does it have anything to do with "open source" or "closed source" software. It does, however, have a profound bearing on abstraction, which is the mechanism that realizes information hiding. One benefit of an abstraction is that its consumers don't care about its implementation details; those details are "hidden" within the abstraction. To the extent that the changes for a new requirement affect only those implementation details, the rest of the system isn't affected.

To illustrate his reasoning, Parnas used a program that reads a collection of input lines into a storage system, generates multiple new lines for each input line using word shifts, and then outputs a sorted list of those new lines. By abstracting the details of the storage system into a module, Parnas showed how its implementation details could be hidden, making the program easier to change and maintain. As developers, we naturally group related programming elements: object-oriented programmers define classes and namespaces; data architects use schemas to group related tables, stored procedures, and other elements. Methods, files, directories, classes, and schemas are all examples of abstractions.
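
Here is a minimal sketch of the idea in Python rather than Parnas's original setting: callers of the line-storage module use only add_line and word, while the way the characters are actually stored is hidden inside the module and can change without touching any caller.

```python
# Sketch of Parnas-style information hiding: a line-storage module whose
# internal representation is hidden behind a small interface.
class LineStorage:
    def __init__(self):
        self._lines = []  # hidden detail: a list of word lists

    def add_line(self, text: str) -> None:
        self._lines.append(text.split())

    def word(self, line_index: int, word_index: int) -> str:
        return self._lines[line_index][word_index]

    def line_count(self) -> int:
        return len(self._lines)


# Callers depend only on the interface above. If the representation changed to,
# say, a single character buffer plus offsets, this code would be unaffected.
storage = LineStorage()
storage.add_line("the quick brown fox")
print(storage.word(0, 2))  # -> "brown"
```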

If methods and classes are abstractions, wouldn't every change affect an abstraction? Isn't that what we are trying to avoid? What matters is the scope of the affected abstraction. If you change the body of a method while keeping its signature the same, you haven't really affected any of the method's callers. If you change a private method of a class, you don't affect the external users of the class. Software is a hierarchy of abstractions: just as a class contains methods, packages and namespaces contain classes, and packages and namespaces are themselves hierarchical. Generally, the smaller the scope of the abstractions affected by a requirement, the easier it is to support that requirement; and if different requirements affect disjoint scopes, team efficiency improves significantly.
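
A small, hypothetical illustration: the lookup below can be reimplemented from a linear scan to a dictionary without changing the method's signature, so no caller of find_price is affected.

```python
# Hypothetical example: the public signature stays fixed while the private
# implementation detail changes, so callers of find_price are untouched.
class PriceList:
    def __init__(self, entries):
        # Before: kept as a plain list and scanned linearly.
        # self._entries = list(entries)
        # After: a private dict for O(1) lookup; callers never see the change.
        self._by_sku = {sku: price for sku, price in entries}

    def find_price(self, sku: str) -> float:
        # Same signature as before the internal change.
        return self._by_sku[sku]


prices = PriceList([("WIDGET", 9.99), ("GADGET", 24.50)])
print(prices.find_price("GADGET"))  # -> 24.5
```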

Conclusion

I have learned that good modularity almost always maps to abstractions that are readily understandable. Just because you split your program into modules doesn't mean the benefits of modularity will accrue immediately. Indeed, the value of modularity should be judged primarily in the context of fulfilling new requirements.

Making systems modular requires experience and hard work. When systems are truly modular, the results are magical – it’s a pleasure to work on the system and productivity improves by leaps and bounds.