Reduce C++ Build Times (Part 2) with the Pimpl Idiom

What is Pimpl Idiom?

Pimpl stands for Pointer to IMPLementation. The idiom is also known as an opaque pointer, handle class, compiler firewall, d-pointer, or Cheshire cat. It is useful because it minimizes coupling and separates the interface from the implementation: the implementation details of a class are hidden from its clients. It also helps preserve binary compatibility across different versions of a shared library. Finally, the Pimpl idiom simplifies the interface, since the details are hidden in another file.

What are the benefits of Pimpl?

Generally, whenever a header file changes, any file that includes that header needs to be recompiled. This is true even if the changes only apply to private members of a class that, by design, the users of the class cannot access. This is because of the C++ build model and because C++ assumes that callers know two main things about a class, including its private members.

  1. Size and Layout: The code that is calling the class must be told the size and layout of the class (including private data members). This constraint of seeing the implementation means the callers and callees are more tightly coupled, but it is very important to the C++ object model, because having direct access to objects by default helps C++ achieve heavily optimized efficiency.
  2. Functions: The code that is calling the class must be able to resolve calls to member functions of the class. This includes private functions, which callers cannot access but which still take part in overload resolution alongside the non-private functions. If a private function turns out to be the better match, the call will fail to compile.

With the Pimpl idiom, you remove the compilation dependencies on internal (private) class implementations. The big advantage is that it breaks compile-time dependencies. This means the system builds faster because Pimpl can eliminate extra includes. Also, it localizes the build impact of code changes because the implementation (parts in the Pimpl) can be changed without recompiling the client code. 1
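
To make the size-and-layout point concrete, here is a minimal sketch. The class names PlainCow and PimplCow, the nested Impl struct, and the getWeight() accessor are made up for illustration. It only shows that the layout callers see for the Pimpl version is a single pointer, whatever the implementation contains; it assumes a mainstream standard library, where a std::unique_ptr with the default deleter is the size of a raw pointer.

```cpp
#include <memory>
#include <string>

// Without Pimpl: private members are part of the class layout, so
// sizeof(PlainCow) changes whenever a private member is added or removed,
// and every caller has to be recompiled.
class PlainCow
{
public:
	double getWeight() const { return weight; }
private:
	std::string name;
	std::string color;
	double weight = 0.0;
};

// With Pimpl: the only data member is a pointer, so the layout seen by
// callers stays the same no matter what Impl contains.
class PimplCow
{
public:
	PimplCow();
	~PimplCow();
private:
	struct Impl;                  // declared only; callers never see the definition
	std::unique_ptr<Impl> pimpl;
};

// In a real project everything below would live in the .cpp file.
struct PimplCow::Impl
{
	std::string name;
	std::string color;
	double weight = 0.0;
};

PimplCow::PimplCow() : pimpl{ std::make_unique<Impl>() } {}
PimplCow::~PimplCow() = default;
```

Adding a member to PimplCow::Impl changes only the source file, so sizeof(PimplCow) and the code compiled against the header are unaffected.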

Example: How to implement Pimpl Idiom

In this section, I am going to use a simple example of a Cow class to show how you can update your code to include the Pimpl idiom. Here is a simple Cow class that has private data members:

#include <string>

class Cow
{
public:
	Cow();
	~Cow();

	Cow(Cow&&);
	Cow& operator=(Cow&&);
private:
	std::string name;
	std::string color;
	double weight;
};

To implement the Pimpl idiom, I will:

  1. Move all private members into a separate implementation class (or struct) that is only declared, not defined, in the header file
  2. In the class definition, declare a (smart) pointer to that implementation class as the only private member variable
#include <memory>

class Cow
{
public:
	Cow();
	~Cow();

	Cow(Cow&&);
	Cow& operator=(Cow&&);
private:
	class cowIMPL;                                     
	std::unique_ptr<cowIMPL> pimpl; 
};

In the source (.cpp) file:

  1. Put the definition of the implementation class in the .cpp file
  2. The constructors of the outer class need to create the implementation object
  3. The destructor of the outer class is defaulted in the .cpp file so that it can see the complete definition of cowIMPL
  4. The move constructor and move assignment operator need to transfer the implementation object appropriately or else be defaulted (in this case defaulted); copy operations, if you need them, have to copy the struct explicitly
#include "cow2.h"
#include <string>

class Cow::cowIMPL                
{
public:
	void do_setup()
	{
		name = "Betsy";
		color = "White";
		weight = 275;
	}
private:
	std::string name;
	std::string color;
	double weight;
};

Cow::Cow() : pimpl{ std::make_unique<cowIMPL>() } 
{
	pimpl->do_setup();
}

Cow::~Cow() = default;

Cow::Cow(Cow&&) = default;
Cow& Cow::operator=(Cow&&) = default;
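
The Cow class above has no public member functions besides its special members, so here is a hedged sketch of how a public function would typically forward to the implementation. The accessors name() and weight() on Cow, and get_name() and get_weight() on cowIMPL, are hypothetical additions for illustration; the header and source parts are shown in one block only so that it compiles on its own.

```cpp
#include <memory>
#include <string>

// --- what would go in the header ---
class Cow
{
public:
	Cow();
	~Cow();

	Cow(Cow&&);
	Cow& operator=(Cow&&);

	const std::string& name() const;   // hypothetical accessor
	double weight() const;             // hypothetical accessor
private:
	class cowIMPL;
	std::unique_ptr<cowIMPL> pimpl;
};

// --- what would go in the source (.cpp) file ---
class Cow::cowIMPL
{
public:
	void do_setup()
	{
		name = "Betsy";
		color = "White";
		weight = 275;
	}
	const std::string& get_name() const { return name; }
	double get_weight() const { return weight; }
private:
	std::string name;
	std::string color;
	double weight = 0.0;
};

Cow::Cow() : pimpl{ std::make_unique<cowIMPL>() }
{
	pimpl->do_setup();
}

Cow::~Cow() = default;

Cow::Cow(Cow&&) = default;
Cow& Cow::operator=(Cow&&) = default;

// Public functions simply forward to the hidden implementation.
const std::string& Cow::name() const { return pimpl->get_name(); }
double Cow::weight() const { return pimpl->get_weight(); }
```

One design consequence worth keeping in mind: after a Cow has been moved from, its pimpl is null, so calling name() or weight() on the moved-from object is not safe in this sketch.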

Summary

The Pimpl idiom is a great way to minimize coupling and break compile-time dependencies, which leads to faster build times. If you are looking for other ways to reduce compile times, read our blog post on header dependencies. If you are looking to reduce dependencies in general or would like to visualize your source code architecture, check out Lattix Architect.

1. Herb Sutter GotW#100

Reduce C++ Build Times by Reducing Header Dependencies

Slow build times are a common problem in C++. The build speed is based on language complexity and code organization. While you may not be able to change C++ language complexity, you can improve code organization. As Herb Sutter said, “Managing dependencies well is an essential part of writing solid code.”

The more modular and less interdependent (complex) your code is in general, the less often you will have to recompile everything. It also reduces the amount of work the compiler has to do on any individual block, because the compiler has less to keep track of in memory.

Today we will talk about header dependencies and their effect on build times.

Direct includes not needed

One of the main problems affecting the speed of C++ build times is the unnecessary inclusion of header files. Header inclusion should be done only when needed. For example, if you are using only classes X and Y, then you only need to include x.h and y.h. Unfortunately, many programmers habitually include many more headers than necessary, like <iostream> or windows.h. This can seriously degrade build time.

The Chromium Projects C++ Dos and Don’ts recommends not including unneeded headers. They mention that, after refactoring a file, there may often be symbols that are no longer used in that header, meaning you can remove that header. With that in mind, when you are refactoring it is a good idea to track redundant includes either manually or using an external tool like Lattix Architect.

Indirectly included files

Another way to reduce header dependencies, and therefore build times, in C++ is to avoid including headers inside other header files. In C++, you get the declaration of a function by including its header file, which can be put in either a .cpp file or a header file. When you include a header in another header file, you may be slowing down compilation time because you may be including other files unnecessarily.

The solution is a forward declaration. A forward declaration of a function or class simply introduces a name. According to Wikipedia, “a forward declaration is a declaration of an identifier for which the programmer has not yet given a complete definition.” It can be used when you need to know that a name is a type but do not need to know its structure. In C++, a class can be forward-declared if you only use pointers or references to it, since the compiler does not need the size or layout of the class to handle a pointer or reference.

This is useful inside a class definition if the class contains a member that is a pointer (or reference) to another class. If, on the other hand, you need to create an object of that class in the header file, you can’t use a forward declaration, because it says nothing about the object's size, member functions, or constructors/destructors. Forward declarations can significantly reduce build time by avoiding unnecessary coupling. They do so in two ways:

  1. By reducing the number of files that the compiler has to open and process
  2. By saving on unnecessary recompilation. If you include the header, you will be forced to recompile the code even if the change is unrelated.
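
Here is a small sketch of what a forward declaration does and does not allow. The Engine and Car classes are hypothetical and exist only for this illustration; in a real project the complete definition of Engine would come from an #include in the .cpp file rather than from the same file.

```cpp
class Engine;                        // forward declaration: Engine is an incomplete type here

// Allowed with an incomplete type: pointers, references, and function declarations.
class Car
{
public:
    explicit Car(Engine& e) : engine(&e) {}
    int horsepower() const;          // declared only; the body needs the full Engine
private:
    Engine* engine;                  // a pointer member does not need sizeof(Engine)
    // Engine spare;                 // would not compile: Engine is incomplete here
};

// --- from here on the complete definition is visible ---
class Engine
{
public:
    explicit Engine(int power) : hp(power) {}
    int horsepower() const { return hp; }
private:
    int hp;
};

// Calling a member function requires the complete type, so this body
// has to come after the definition of Engine.
int Car::horsepower() const
{
    return engine->horsepower();
}
```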

It turns out that in my last refactoring blog tip, I had an indirectly included file (a header file included in another header file):

[Image: Lattix Architect find issues]

As you can see I included shareprice.h in stock30.h:

[Image: Lattix Architect dependency usage]

The next section will show you how I fixed this issue.

How to fix indirectly included files

Here is my original stock30.h file:

#include <string>
#include "shareprice.h"

class Stock
{
private:
    std::string company;
    int shares;
    SharePrice share_val;
public:
    Stock();                  // default constructor
    Stock(const std::string & co, long n = 0, double pr = 0.0);
    void buy(long num, double price);
    void sell(long num, double price);
    void update(double price);
};

In stock30.h, I included “shareprice.h”. One way to fix this issue is by making the SharePrice class a forward declaration. I do this by:

  1. Removing #include “shareprice.h”
  2. Adding the forward declaration class SharePrice;
  3. Changing SharePrice share_val to a pointer to SharePrice

You can see the updated code below:

#include <string>

class SharePrice;          // Forward declaration

class Stock
{
private:
    std::string company;
    int shares;
    SharePrice* share_val;       // changed to a pointer to SharePrice
public:
    Stock();                   // default constructor
    Stock(const std::string & co, long n = 0, double pr = 0.0);
    void buy(long num, double price);
    void sell(long num, double price);
    void update(double price);
};

You will also need to update the stock30.cpp file to reflect the change in the variable share_val, but I will leave that as an exercise for the reader.
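
For readers who want a starting point, here is one possible sketch of that change, not the original stock30.cpp. It makes several assumptions that are not in the header above: Stock gains a destructor and deleted copy operations (a raw owning pointer needs them), price() is a hypothetical accessor added for demonstration, sell() is omitted, and SharePrice is a simplified stand-in for the real shareprice.h that validates silently. Everything is in one block so that it compiles on its own.

```cpp
#include <string>

// --- stock30.h with the forward declaration, plus the assumed additions ---
class SharePrice;          // Forward declaration

class Stock
{
private:
    std::string company;
    int shares;
    SharePrice* share_val;
public:
    Stock();
    Stock(const std::string & co, long n = 0, double pr = 0.0);
    ~Stock();                                   // assumed addition: releases share_val
    Stock(const Stock&) = delete;               // assumed addition: no accidental double delete
    Stock& operator=(const Stock&) = delete;    // assumed addition
    void buy(long num, double price);
    void update(double price);
    double price() const;                       // hypothetical accessor
};

// --- simplified stand-in for shareprice.h, included only by stock30.cpp ---
class SharePrice
{
private:
    double shareVal = 0.0;
public:
    SharePrice() = default;
    SharePrice(double price) { setPrice(price, true); }
    double getPrice() const { return shareVal; }
    void setPrice(double price, bool initial = false)
    {
        if (price >= 0)
            shareVal = price;
        else if (initial)
            shareVal = 0;   // negative initial price falls back to 0
        // a negative update is ignored and the old price is kept
    }
};

// --- stock30.cpp: share_val is now created, used, and destroyed through a pointer ---
Stock::Stock() : company("no name"), shares(0), share_val(new SharePrice()) {}

Stock::Stock(const std::string & co, long n, double pr)
    : company(co), shares(static_cast<int>(n)), share_val(new SharePrice(pr)) {}

Stock::~Stock() { delete share_val; }

void Stock::buy(long num, double price)
{
    shares += static_cast<int>(num);
    share_val->setPrice(price);     // '.' became '->'
}

void Stock::update(double price) { share_val->setPrice(price); }

double Stock::price() const { return share_val->getPrice(); }
```

The trade-off is visible here: the header no longer depends on shareprice.h, but Stock now has to manage the lifetime of the SharePrice object itself. A std::unique_ptr member, as in the Pimpl post, is a common way to reduce that burden.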

Summary

Build times are a constant problem for larger C++ programs. But by thinking carefully about a C++ project’s design (especially for large projects, consisting of multiple modules), you can modify it so the compiler can produce output efficiently. This can be done manually or with an architectural analysis tool like Lattix Architect.

C++ Refactoring Tip: Primitive Obsession

C++ code refactoring is difficult because the C++ language is large and complex with a hard-to-process syntax (tools like Lattix Architect can simplify the process). Given the importance of refactoring, here is a C++ refactoring tip for solving Primitive Obsession.

What is Primitive Obsession?

Primitive Obsession is using primitive data types (like integers, strings, doubles, etc.) to represent a more complicated entity such as a share price or a temperature. Primitive types are generic and familiar, but a class often provides a simpler and more natural way to model things. When the data type acquires rules of its own (e.g. share prices can’t be negative), it might be time to replace the primitive data type with an object.

Why Refactor (C++ example)?

Often when you start writing code you use a primitive data type to represent a “simple” entity. Here is a C++ example:

class Stock
{
private:
	std::string company;
	int shares;
	double share_val;
public:
	Stock();         // default constructor
	Stock(const std::string & co, long n = 0, double pr = 0.0);
	void buy(long num, double price);
	void sell(long num, double price);
	void update(double price);
};

In this case, you are representing share_val as a double even though share_val should never be negative. To enforce this, you need to add validation to the code. You will see four instances of this validation in the C++ code below:

// constructors
Stock::Stock()          // default constructor
{
	company = "no name";
	shares = 0;
	share_val = 0.0;
}

Stock::Stock(const std::string & co, long n, double pr)
{
	company = co;
	shares = n;

	if (pr < 0)
	{
		std::cout << "Share price can't be negative; "
			<< company << " share price set to 0.\n";
		share_val = 0;

	}
	else
	    share_val = pr;
}

// other methods
void Stock::buy(long num, double price)
{
	shares += num;
	
	if (price < 0)
	{
		std::cout << "The share price can't be less than zero. "
			<< "Transaction is aborted.\n";
	}
	else
		share_val = price;
}

void Stock::sell(long num, double price)
{
	if (num > shares)
	{
		std::cout << "You can't sell more than you have! "
			<< "Transaction is aborted.\n";
	}
	else
	{
		shares -= num;
		if (price < 0)
		{
			std::cout << "Share price can't be less than 0. "
				<< "Transaction is aborted.\n";
		}
		else
			share_val = price;
	}
}

void Stock::update(double price)
{
	if (price < 0)
	{
		std::cout << "Share price can't be less than 0. "
			<< "Transaction is aborted.\n";
	}
	else
	    share_val = price;
}

In this case, we had to write four nearly identical pieces of validation logic to handle share_val (code smell: duplication). This breaks the “Don't Repeat Yourself” principle, which states, “Every piece of knowledge must have a single, unambiguous, authoritative representation in the system.” The benefit of moving all this into a class is that all the relevant behavior will be in one place in the code.

How to Refactor (C++)

1. Create a new class for this field that contains all the validation logic that is currently spread across the application
class SharePrice
{
private:
	double shareVal;
public:
	SharePrice();
	SharePrice(double price);
	double getPrice() const { return shareVal; }
	void setPrice(double price, bool initial=false);
};
SharePrice::SharePrice()
{
	setPrice(0, true);
}
SharePrice::SharePrice(double price)
{
	setPrice(price, true);
}

void SharePrice::setPrice(double price, bool initial)
{
	if (price < 0)
	{
		if (initial)
		{
			std::cout << "Share price can't be negative; "
				<< "share price set to 0.\n";
			shareVal = 0;
		}
		else
		{
			std::cout << "Share price can't be negative. "
				<< "Transaction aborted.\n";
		}
	}
	else
		shareVal = price;
}
2. In the original class, change the field type (double share_val in this case) to the new class (SharePrice). Also, change the getters/setters in the original class to call the getters/setters in the new class (you may also have to change the constructors if they set initial values on the field).
#include <string>
#include "shareprice.h"

class Stock
{
private:
	std::string company;
	int shares;
	SharePrice share_val;
public:
	Stock();         // default constructor
	Stock(const std::string & co, long n = 0, double pr = 0.0);
	void buy(long num, double price);
	void sell(long num, double price);
	void update(double price);
};
// constructors
Stock::Stock()          // default constructor
{
	company = "no name";
	shares = 0;
	share_val.setPrice(0.0, true);
}

Stock::Stock(const std::string & co, long n, double pr)
{
	company = co;
	shares = n;

	share_val.setPrice(pr, true);
}

// other methods
void Stock::buy(long num, double price)
{
	shares += num;

	share_val.setPrice(price, false);
}

void Stock::sell(long num, double price)
{
	if (num > shares)
	{
		std::cout << "You can't sell more than you have! "
			<< "Transaction is aborted.\n";
	}
	else
	{
		shares -= num;
		share_val.setPrice(price);
	}
}

void Stock::update(double price)
{
	share_val.setPrice(price);
}
3. Compile and test.
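
For the testing step, a minimal sketch of the kind of checks you might run against the new class is shown here. The function runSharePriceChecks() is hypothetical and not part of the original code; SharePrice is repeated in condensed form, with the same behavior as in step 1, only so that the block compiles on its own.

```cpp
#include <iostream>

// SharePrice as defined in step 1, condensed into a single class body.
class SharePrice
{
private:
	double shareVal;
public:
	SharePrice() { setPrice(0, true); }
	SharePrice(double price) { setPrice(price, true); }
	double getPrice() const { return shareVal; }
	void setPrice(double price, bool initial = false)
	{
		if (price >= 0)
			shareVal = price;
		else if (initial)
		{
			std::cout << "Share price can't be negative; share price set to 0.\n";
			shareVal = 0;
		}
		else
			std::cout << "Share price can't be negative. Transaction aborted.\n";
	}
};

// Hypothetical helper: returns the number of failed checks (0 means all passed).
int runSharePriceChecks()
{
	int failures = 0;

	SharePrice defaulted;
	if (defaulted.getPrice() != 0.0) ++failures;        // default price is 0

	SharePrice negativeInitial(-5.0);
	if (negativeInitial.getPrice() != 0.0) ++failures;  // negative initial price becomes 0

	SharePrice price(10.0);
	price.setPrice(-1.0);
	if (price.getPrice() != 10.0) ++failures;           // rejected update keeps the old price

	price.setPrice(12.5);
	if (price.getPrice() != 12.5) ++failures;           // valid update is stored

	return failures;
}
```

Because the validation now lives in one class, these few checks cover the rule that was previously duplicated across four Stock functions.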

Summary

Refactoring in C++ is harder than in C# or Java, but the reasons for refactoring (improved quality, better maintainability, etc.) make it worthwhile. As you can see in this example, the code in the C++ source file went from 72 lines to 46 lines. In most cases, duplicated code represents a failure to fully factor the design. Duplicate code makes modifying the code more difficult: whenever you make a change in one place, you need to remember all the other places where that code must be changed. As Martin Fowler said, the definition of refactoring is “a change made to the internal structure of the software to make it easier to understand and cheaper to modify without changing its observable behavior.” Check out our C++ Architectural Refactoring page for more information on how Lattix Architect can help with C++ refactoring.

Steps to Follow when Reengineering Code

Developers know that a software system will become more complex and more highly coupled over time as additional changes are made. Often this calls for refactoring or reengineering the code to make it more maintainable. Reengineering will allow you to incorporate what has been learned about how the code should have been designed. This is the kind of learning that was the original basis for the term “technical debt.”

So how should we go about reengineering code that remains vital and useful? In real life we keep applying metaphorical Band-Aids as we make changes and incorporate new technologies. This leads to design erosion. Many progressive managers now understand the debilitating nature of this erosion and how it affects quality and productivity.

Even if we agree that reengineering is called for, how can we plan for it? Here are four key steps to take if you have decided to reengineer your software.

1. Understand the current structure of the code. Always resist the temptation to reengineer without a clear understanding of what you have. Understand and identify the critical components and what their dependencies are. For example, if you are a Java or .NET programmer, understand the various JAR files or assemblies and how they are related to each other. For C/C++, understand the executables and libraries, as well as the code structure used to generate them. Now ask yourself: Are these the right components in my desired architecture? Sometimes you have only one component. If that is the case, ask yourself if you need to split up the component into smaller components.

Read our other blog on Reasons NOT to Refactor

2. Examine the internals of the components, particularly the larger ones and the more important ones. Look at the dependencies of the classes or files that constitute the component. Is there excessive coupling? Does this coupling make the code harder to maintain? As a team, decide what your desired architecture is. Consult senior developers. Ask the team members with different areas of expertise to validate your ideas. The testing team can be particularly helpful. A good architecture will make a huge difference in how easy and effective it is to test. You should be able to take the existing classes or files and build new components. Try various what-if architectures to arrive at the desired architecture for your existing code.

3. With the desired architecture in hand, you should now know what changes are needed and what the unwanted dependencies are. Prioritize the dependencies to fix based on your requirements. If you have areas of code that change frequently, you should think about componentizing them. Always take into account your current requirements. While reengineering has its own benefits, it is unlikely that you will stop making other improvements during this time. Any reengineering effort is likely to be in conjunction with other improvements. A good reengineering tool will allow you to perform reengineering work in conjunction with making continued enhancements to the product. Another benefit of this approach is that it will build management support for the reengineering effort.

To learn more watch our Webinar on Reengineering Legacy Code.

4. The last step is to make sure you communicate the reengineering plan to the entire team. With a prioritized scheme, make reengineering a part of continuous integration. You can create rules that prevent things from getting worse by continuously examining the changes against the desired architecture. Reengineering stories should be part of agile planning just like any other stories. Not only can you do reengineering, you can make it part of your normal development. The best reengineering exercises often minimize disruption and still allow you to migrate to a new architecture.