
Refactoring Legacy Software

The Problem with Legacy Software and Refactoring

For an architect who is new to a legacy software project, it is often very hard to understand the existing architecture, determine the extent of architectural decay, and identify architectural smells and metric violations. It’s almost impossible to perform refactoring without breaking the working code. Legacy applications are often critical to the business and have been in use for many years, sometimes decades. Since the business is always changing, there is constant pressure to support additional requirements and fix existing bugs. However, changing these applications is difficult and you end up spending an increasing amount of resources maintaining the software.

There are many reasons why maintaining legacy software is such a difficult problem. Often most, if not all, of the original developers are gone, and no one understands how the application is implemented. The technologies used in the application are no longer current, having been replaced by newer and more exciting technologies. Also, software complexity increases over time as new requirements are added.

The key to managing the lifecycle of the software is to understand how the application is implemented and how it is used, whether you are looking to replace a legacy application or gradually refactor it to support new requirements. Renovating a legacy system involves considerable re-architecting.

Understand the Legacy Software

To understand an application, you need to understand how it is viewed and used by its various stakeholders. These different perspectives are ultimately reflected in the design of the software. This is the same process we use to understand the complexity of any system.

Understand how it is used
Understanding how an application is used is critical to understanding its design. As an application evolves, special use cases are often added that are unique to the business, and they are reflected in the way the application was designed and implemented. How the application is used also influences its performance requirements: a word processor, for example, has very different performance requirements from a high-frequency trading platform.

Understand how it is deployed
This is often one of the most neglected aspects of architectural analysis. One reason is that, historically, many applications were monoliths and there was not much to understand about how they were deployed. With the rise of microservices, an application can be distributed across multiple containers or services, which makes understanding how it is deployed more important than ever.

Understand how it is built
It is necessary to understand how each component is built. This is especially true for languages like C/C++, where there are a variety of compile-time options when generating object files. These options are used to generate different variants (typically for different hardware platforms) from the same source code. Without understanding these options, it wouldn’t be possible to fully analyze the code.
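
For example (a simplified sketch; the macro names, flags, and register address are invented for illustration), the same source file can produce different variants depending on the compile-time options passed to the compiler:

// board_io.cpp -- hypothetical example of compile-time variants.
// Building with "g++ -c -DTARGET_ARM board_io.cpp" produces a different
// object file than building with "g++ -c -DTARGET_X86 board_io.cpp".
#include <cstdint>

uint32_t read_status_register()
{
#if defined(TARGET_ARM)
	// Memory-mapped status register on the ARM variant (address is made up).
	return *reinterpret_cast<volatile uint32_t*>(0x40021000);
#elif defined(TARGET_X86)
	// The x86 simulator variant returns a stubbed value instead.
	return 0;
#else
#error "No target platform defined; pass -DTARGET_ARM or -DTARGET_X86"
#endif
}

Unless an analysis tool knows which of these flags were used in the actual build, it cannot tell which code path, and which dependencies, are really part of the delivered binary.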

Understand how it is structured
This is an area that developers typically care about a lot and where a large part of the complexity resides. The code could be organized into thousands of interdependent files. A key goal of architectural analysis is to organize these files and elements into modular groups, which is why architecture discovery is necessary.

Using a dependency structure matrix (DSM) representation and the analysis techniques that go with it is a great methodology for understanding and analyzing a legacy system.
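
To make the DSM idea concrete, here is a minimal sketch (the module names and dependencies are invented for illustration, and no particular tool is assumed) that prints a small dependency structure matrix, where an X in a row means that the row's module uses the column's module:

#include <iomanip>
#include <iostream>
#include <set>
#include <string>
#include <utility>
#include <vector>

int main()
{
	// Hypothetical modules and their dependencies ("first uses second").
	std::vector<std::string> modules = {"ui", "logic", "db", "util"};
	std::set<std::pair<std::string, std::string>> uses = {
		{"ui", "logic"}, {"logic", "db"}, {"logic", "util"},
		{"db", "util"}, {"util", "logic"}   // logic <-> util form a cycle
	};

	// Print the matrix: an 'X' at row R, column C means "R uses C".
	std::cout << std::setw(8) << " ";
	for (const auto& c : modules) std::cout << std::setw(8) << c;
	std::cout << '\n';
	for (const auto& r : modules) {
		std::cout << std::setw(8) << r;
		for (const auto& c : modules)
			std::cout << std::setw(8) << (uses.count({r, c}) ? "X" : ".");
		std::cout << '\n';
	}
}

Partitioning and clustering the rows and columns of such a matrix, which is what DSM tools automate at much larger scale, exposes the layering and highlights cycles such as the one between logic and util.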

Refactoring Legacy Software

This methodology can be applied to reduce complexity and make the software transparent. There are a number of techniques you can use to analyze legacy systems. Boeing uses a DSM approach for their knowledge-based engineering systems and Erik Philippis, founder and CEO of ImprovemenT BV, uses the horseshoe model.

Here is another technique you can use:

  1. Examining the existing artifacts is a great starting point. For example, the file/directory structure or the package/namespace structure is already a guide to how the developers organized these code elements.
  2. Apply partitioning and clustering algorithms to discover the layers and independent components. Even if the architecture has significantly eroded, identifying the minimal set of dependencies that cause cycles will often lead to the discovery of the intended layers. Lattix’s DSM approach is very helpful with this.
  3. Experiment with what-if architectures. Create different logical modules and examine the dependencies associated with those modules. If you are looking to componentize or create microservices, create logical components or services from the current set of files/classes/elements. If there are no dependencies between these components and they are supposed to be independent of each other, then you know they can become independent services. On the other hand, if there are dependencies between them, you know exactly which dependencies to eliminate (see the sketch after this list).
  4. Ask the developers and architects who have been supporting the application. They will already have some understanding of the architecture. They will also have the knowledge to assist in experimenting with what-if architectures. Experimenting with what-if architecture is a good exercise to sharpen your understanding of the system.
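
The following sketch illustrates the what-if idea from step 3 in miniature. The file names, dependencies, and proposed services are all hypothetical; any dependency reported as crossing the proposed boundary is one you would have to eliminate before the two groups could become independent services:

#include <iostream>
#include <map>
#include <string>
#include <utility>
#include <vector>

int main()
{
	// Hypothetical file-level dependencies ("first file uses second file").
	std::vector<std::pair<std::string, std::string>> deps = {
		{"orders.cpp", "billing.cpp"},
		{"orders.cpp", "catalog.cpp"},
		{"billing.cpp", "tax.cpp"}
	};

	// A what-if grouping of files into two candidate services.
	std::map<std::string, std::string> component = {
		{"orders.cpp",  "OrderService"},
		{"catalog.cpp", "OrderService"},
		{"billing.cpp", "BillingService"},
		{"tax.cpp",     "BillingService"}
	};

	// Report every dependency that crosses the proposed service boundary.
	for (const auto& d : deps) {
		if (component[d.first] != component[d.second])
			std::cout << d.first << " -> " << d.second
				<< " crosses " << component[d.first]
				<< " / " << component[d.second] << '\n';
	}
}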

The goal of architectural discovery is to understand the organization of the application. It is one of the most effective ways to start a refactoring process that makes the code more understandable and maintainable. A clear understanding of the architecture will also help prevent further architectural erosion.

Summary

Architecture is very important when dealing with legacy applications. It contains the knowledge of how to handle the software. Even if you decide to end-of-life a legacy application, the architectural knowledge left over from the project will be vital for the application that will replace it. If you are interested in seeing how Lattix can help with your legacy applications, see our whitepaper “Managing the Evolution of Legacy Applications” or sign up for a free trial.

C++ Refactoring Tip: Primitive Obsession

C++ code refactoring is difficult because the C++ language is large and complex with a hard-to-process syntax (tools like Lattix Architect can simplify the process). Given the importance of refactoring, here is a C++ refactoring tip for solving Primitive Obsession.

What is Primitive Obsession?

Primitive Obsession is using primitive data types (like integers, strings, doubles, etc.) to represent a more complicated entity such as a share price or a temperature. Primitive types are general-purpose, but a class provides a simpler and more natural way to model domain concepts. When the entity becomes sufficiently complex (e.g., a share price can never be negative), it might be time to replace the primitive data type with an object.

Why Refactor (C++ example)?

Often when you start writing code you use a primitive data type to represent a “simple” entity. Here is a C++ example:

#include <string>

class Stock
{
private:
	std::string company;
	long shares;
	double share_val;
public:
	Stock();         // default constructor
	Stock(const std::string & co, long n = 0, double pr = 0.0);
	void buy(long num, double price);
	void sell(long num, double price);
	void update(double price);
};

In this case, you are representing share_val as a double even though a share price should never be negative. To enforce this behavior, you need to add validation checks in the code. You will see four instances of this validation in the C++ code below:

#include <iostream>

// constructors
Stock::Stock()          // default constructor
{
	company = "no name";
	shares = 0;
	share_val = 0.0;
}

Stock::Stock(const std::string & co, long n, double pr)
{
	company = co;
	shares = n;

	if (pr < 0)
	{
		std::cout << "Share price can't be negative; "
			<< company << " share price set to 0.\n";
		share_val = 0;

	}
	else
	    share_val = pr;
}

// other methods
void Stock::buy(long num, double price)
{
	shares += num;
	
	if (price < 0)
	{
		std::cout << "The share price can't less than zero. "
			<< "Transaction is aborted.\n";
	}
	else
		share_val = price;
}

void Stock::sell(long num, double price)
{
	if (num > shares)
	{
		std::cout << "You can't sell more than you have! "
			<< "Transaction is aborted.\n";
	}
	else
	{
		shares -= num;
		if (price < 0)
		{
			std::cout << "Share price can't be less than 0"
				<< "Transaction is aborted.\n";
		}
		else
			share_val = price;
	}
}

void Stock::update(double price)
{
	if (price < 0)
	{
		std::cout << "Share price can't be less than 0"
			<< "Transaction is aborted.\n";
	}
	else
	    share_val = price;
}

In this case, we had to put four instances of essentially the same validation logic in place to handle share_val (code smell: duplication). This breaks the "Don't repeat yourself" principle, which states: “Every piece of knowledge must have a single, unambiguous, authoritative representation in the system.” The benefit of moving all this into a class is that all of the relevant behavior will be in one place in the code.

How to Refactor (C++)

1. Create a new class for this field that contains all of the validation logic that is currently spread across the application:

#include <iostream>

class SharePrice
{
private:
	double shareVal;
public:
	SharePrice();
	SharePrice(double price);
	double getPrice() const { return shareVal; }
	void setPrice(double price, bool initial=false);
};
SharePrice::SharePrice()
{
	setPrice(0, true);
}
SharePrice::SharePrice(double price)
{
	setPrice(price, true);
}

void SharePrice::setPrice(double price, bool initial)
{
	if (price < 0)
	{
		if (initial)
		{
			std::cout << "Share price can't be negative "
				<< " share price set to 0.\n";
			shareVal = 0;
		}
		else
		{
			std::cout << "Share price can't be negative"
				<< "Transaction aborted.\n";
		}
	}
	else
		shareVal = price;
}

2. In the original class, change the field type (double share_val in this case) to the new class (SharePrice). Also, change the getters/setters in the original class to call the getters/setters in the new class (you may also have to change the constructors if they previously set initial field values).

#include <iostream>
#include <string>
#include "shareprice.h"

class Stock
{
private:
	std::string company;
	long shares;
	SharePrice share_val;
public:
	Stock();         // default constructor
	Stock(const std::string & co, long n = 0, double pr = 0.0);
	void buy(long num, double price);
	void sell(long num, double price);
	void update(double price);
};
// constructors
Stock::Stock()          // default constructor
{
	company = "no name";
	shares = 0;
	share_val.setPrice(0.0, true);
}

Stock::Stock(const std::string & co, long n, double pr)
{
	company = co;
	shares = n;

	share_val.setPrice(pr, true);
}

// other methods
void Stock::buy(long num, double price)
{
	shares += num;

	share_val.setPrice(price, false);
}

void Stock::sell(long num, double price)
{
	if (num > shares)
	{
		std::cout << "You can't sell more than you have! "
			<< "Transaction is aborted.\n";
	}
	else
	{
		shares -= num;
		share_val.setPrice(price);
	}
}

void Stock::update(double price)
{
	share_val.setPrice(price);
}

3. Compile and test.
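
As an informal check for step 3, a small test driver along the following lines can confirm that the refactored classes still reject bad input the way the original code did. This is only a sketch; it assumes the Stock and SharePrice definitions above have been placed in headers ("stock.h" is an assumed name, "shareprice.h" is the one used earlier):

#include "stock.h"        // assumed header containing the refactored Stock class

int main()
{
	Stock s("Acme", 10, 25.0);

	s.update(30.0);        // valid price: accepted silently
	s.update(-5.0);        // negative price: rejected with a message

	s.buy(5, 32.0);        // valid transaction
	s.sell(100, 32.0);     // selling more than is held: rejected with a message

	return 0;              // add assertions once getters are exposed on Stock
}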

Summary

Refactoring in C++ is harder than in C# or Java, but the reasons for refactoring (improved quality, better maintainability, etc.) make it worthwhile. As you can see in this example, the code in the C++ source file went from 72 lines to 46 lines. In most cases, duplicated code represents a failure to fully factor the design. Duplicated code also makes modifying the code more difficult: whenever you make a change in one place, you need to remember all the other places where that code must be changed. As Martin Fowler said, refactoring is “a change made to the internal structure of the software to make it easier to understand and cheaper to modify without changing its observable behavior.” Check out our C++ Architectural Refactoring page for more information on how Lattix Architect can help with C++ refactoring.

Steps to Follow when Reengineering Code

Developers know that a software system will become more complex and more highly coupled over time as additional changes are made. Often this calls for refactoring or reengineering the code to make it more maintainable. Reengineering will allow you to incorporate what has been learned about how the code should have been designed. This is the kind of learning that was the original basis for the term “technical debt.”

So how should we go about reengineering code that remains vital and useful? In real life we keep applying metaphorical Band-Aids as we make changes and incorporate new technologies. This leads to design erosion. Many progressive managers now understand the debilitating nature of this erosion and how it affects quality and productivity.


Even if we agree that reengineering is called for, how can we plan for it? Here are four key steps to take if you have decided to reengineer your software.

1. Understand the current structure of the code. Always resist the temptation to reengineer without a clear understanding of what you have. Understand and identify the critical components and what their dependencies are. For example, if you are a Java or .NET programmer, understand the various JAR files or assemblies and how they are related to each other. For C/C++, understand the executables and libraries, as well as the code structure used to generate them. Now ask yourself: Are these the right components in my desired architecture? Sometimes you have only one component. If that is the case, ask yourself if you need to split it up into smaller components.

Read our other blog on Reasons NOT to Refactor

2. Examine the internals of the components, particularly the larger ones and the more important ones. Look at the dependencies of the classes or files that constitute the component. Is there excessive coupling? Does this coupling make the code harder to maintain? As a team, decide what your desired architecture is. Consult senior developers. Ask the team members with different areas of expertise to validate your ideas. The testing team can be particularly helpful. A good architecture will make a huge difference in how easy and effective it is to test. You should be able to take the existing classes or files and build new components. Try various what-if architectures to arrive at the desired architecture for your existing code.

3. With the desired architecture in hand, you should now know what changes are needed and what the unwanted dependencies are. Prioritize the dependencies to fix based on your requirements. If you have areas of code that change frequently, you should think about componentizing them. Always take into account your current requirements. While reengineering has its own benefits, it is unlikely that you will stop making other improvements during this time. Any reengineering effort is likely to be in conjunction with other improvements. A good reengineering tool will allow you to perform reengineering work in conjunction with making continued enhancements to the product. Another benefit of this approach is that it will build management support for the reengineering effort.

To learn more watch our Webinar on Reengineering Legacy Code.

4. The last step is to make sure you communicate the reengineering plan to the entire team. With a prioritized scheme, make reengineering a part of continuous integration. You can create rules that prevent things from getting worse by continuously examining the changes against the desired architecture. Reengineering stories should be part of agile planning just like any other stories. Not only can you do reengineering, you can make it part of your normal development. The best reengineering exercises often minimize disruption and still allow you to migrate to a new architecture.
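
As a toy illustration of the rule checking described in step 4 (a sketch only; the layer names and rules are invented, and a real deployment would use a dedicated tool to extract the observed dependencies), a check that fails the build when a forbidden dependency appears can run on every continuous integration job:

#include <iostream>
#include <set>
#include <string>
#include <utility>
#include <vector>

int main()
{
	// Desired architecture: dependencies that are explicitly forbidden.
	std::set<std::pair<std::string, std::string>> forbidden = {
		{"database", "ui"},      // lower layers must not call upward
		{"business", "ui"}
	};

	// Dependencies observed in the current build (extracted by a tool in practice).
	std::vector<std::pair<std::string, std::string>> observed = {
		{"ui", "business"},
		{"business", "database"},
		{"database", "ui"}       // an architectural violation
	};

	int violations = 0;
	for (const auto& d : observed) {
		if (forbidden.count(d)) {
			std::cout << "Violation: " << d.first << " -> " << d.second << '\n';
			++violations;
		}
	}
	// A non-zero exit code makes the CI job fail, so erosion is caught immediately.
	return violations == 0 ? 0 : 1;
}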

Reasons NOT to Refactor your code

 

Last week I wrote about the reasons to refactor code. Let us now look at some reasons why you shouldn’t refactor code. When dealing with legacy code there will always be a temptation to refactor the code to improve its understandability or performance. However, here are some reasons why it might be better to hold off:

1. You do not have the proper tests in place

Do not waste time refactoring your code when you do not have the proper tests in place to make sure the code you are refactoring still works correctly. A refactoring exercise presupposes a good engineering environment, and testing is one of the key components of that environment. If you don’t have a good way to test what you changed, it is better to hold off making that change until you can fully test it. Our developers tell us it is impossible to write good code without thorough testing. I believe them.
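
For example, even a handful of characterization tests that pin down the current behavior give you something to rerun after every refactoring step. The sketch below uses plain assertions against the SharePrice class from the earlier C++ example; the expected values simply record what that code does today:

#include <cassert>
#include "shareprice.h"   // the SharePrice class from the earlier refactoring example

int main()
{
	SharePrice p(25.0);
	assert(p.getPrice() == 25.0);

	p.setPrice(-1.0);                 // negative prices are rejected...
	assert(p.getPrice() == 25.0);     // ...and the previous value is kept

	SharePrice q(-3.0);               // a negative initial price falls back to 0
	assert(q.getPrice() == 0.0);

	return 0;
}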

2. Allure of technology

Don’t make a refactoring change because a new exciting technology gets released. Given the fast pace of change there will always be something new and exciting. Today’s new and exciting technology will be legacy tomorrow. Instead, seek to understand the value of the new technology. If a Java backend is working fine, don’t jump to node.js unless you know that event handling is necessary for your application. Too many legacy applications are hard to maintain because they have a mish-mash of languages, frameworks, and technologies.

To learn more watch our webinar on reengineering legacy code.

3. The application doesn’t need to change

The primary purpose for changing an application is to satisfy new user requirements or usage conditions. As long as the users of the application are content with how it operates, there is less need to refactor the code. If there is no reason to change the application, there is no reason to refactor it. Even if your company is swimming in money and you don’t have anything else to do, don’t do it.

Four Reasons to Refactor your Code

1. Maintenance is easier


Legacy code architecture erodes over time and becomes difficult to maintain. Legacy code bugs are harder to find and fix. Testing any changes in legacy code takes longer. Even small changes can inadvertently break the application because over time the design has been extended to accommodate new features and the code has become increasingly coupled. Refactoring code allows you to improve the architecture, reduce the coupling, and help the development team understand the intended design of the system. A clean architecture makes the design understandable and easier to manage and change.

Read our other blog on Reasons NOT to Refactor.

2. Make the Design Modular

Split up large applications into components. For instance, monolithic applications can be split up into microservices. In embedded systems, interfaces are created so that drivers can be written to support a variety of hardware devices; these drivers encapsulate the logic for interacting with each device. Most large applications can also be separated into layers, such as the business logic and the user interface, and the user interface itself can be split up into various pages, forms, dialogs, and other components. Modularity simplifies the design and is probably the most effective way to increase team productivity.
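
As a simple illustration of the driver encapsulation mentioned above (a sketch with an invented device name, not production code), an abstract interface lets the rest of the application stay independent of any particular piece of hardware:

// Abstract driver interface: application code depends only on this class.
class TemperatureSensor
{
public:
	virtual ~TemperatureSensor() = default;
	virtual bool init() = 0;
	virtual double readCelsius() = 0;
};

// One concrete driver per hardware device (the device name is made up).
class Xyz100Sensor : public TemperatureSensor
{
public:
	bool init() override
	{
		// Device-specific setup (bus configuration, calibration, ...) would go here.
		return true;
	}
	double readCelsius() override
	{
		// Device-specific register reads would go here; stubbed for this sketch.
		return 21.5;
	}
};

Swapping hardware then means writing another subclass rather than touching the business logic, which is exactly the kind of modularity described above.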

Check out our blog on a New Way to Think About Software Design.

3. Refactoring is often the cheaper option

When faced with new requirements that appear not to fit into the current design, it is often tempting to propose a complete rewrite. However, a rewrite can be expensive and highly risky. When a rewrite fails, it leaves in its wake a dispirited organization with no product to take to market. Before starting a rewrite, do a what-if exercise on the current application to see what would need to change to support the new requirements. Often large parts of an application can be salvaged while other parts are refactored, thereby reducing risk and saving considerable time and effort.

4. Your development team is happier

A design that is easy to understand reduces stress on the team. A modular design allows different team members to improve different components of the project at the same time without breaking each other’s code.