4. How to take advantage of this tool to improve your code

Graph-includes is only a supporting tool for a refactoring effort. It can help developers see where to focus their efforts in order to get cleaner and saner dependencies in a project.

In this respect, it is quite similar to a microscope: if you don't look at the right place, you won't see anything interesting. But if you start with a low magnification, you can locate regions of interest, and then zoom in on those to get to the interesting stuff.

4.1. On the spirit of dependency cleanup

4.1.1. First look at a dependency graph

When developing a project of medium size (we'll talk mostly about C/C++ here, but this applies to most languages), especially with many people writing code, it is quite easy to get to a point where each file (out of several tens or hundreds of files) depends on too many other files.

The most obvious relation is the #include one. The more #includes a file has, the more time it takes to build, especially when those included files themselves #include a bunch of other files. For a project of about 100 files, just producing a graph of all those files, with arrows representing the #include dependencies, will usually give an unreadable graph that shows very little about possible improvements. This is why this tool has been written: to make it possible to get to the useful information hidden in this unusable dependency graph.

4.1.2. Looking further

A less obvious relation appears more clearly when you consider not files by themselves, but the set of files made of an interface and the matching implementation. Let's consider two such sets, made of the files a.h, a.c, b.h, b.c. a.c includes b.h, and b.c includes a.h, and each implementation, following good practice, includes its own interface. A simple dependency graph as described above would look like this:

a.c -> b.h
   \  /|
   /  \|
b.c -> a.h

If, on the other hand, we represent those sets of files instead of the files themselves, we now have something like:

a <--> b

This shows much more clearly that those two modules are intrinsically related. In many cases, this means that whenever you use the a.o file resulting from the build of a.c, you'll need to link b.o as well, and vice versa. This will be the case when each file uses the headers to get function prototypes. Hunting for abusive dependencies will then let you, for example, select at a finer grain which of those modules need to go into which executable, thus producing lighter executables.

Note that such a reciprocal dependency may not be pathological. Many projects tend to split a large module into several files for clarity, even when those files are interdependent. It is much more often in cycles of unidirectional dependencies that we find dependencies that should not be there.

In other cases, headers would just have been used to access a type definition from b.h, and the associated b.o would not be needed. In such cases, you may want to consider splitting such "low-level" declarations into their own headers. Not only would this simplify the graph, allowing you to get a better grasp on your source code, but it can also lead to faster compilations, since each file will be able to include fewer unrelated definitions.
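The move from a file-level graph to a module-level graph described in this section can be illustrated with a minimal Python sketch. It is not how graph-includes itself is implemented; the grouping rule used here (a module is the file name without its extension) and the names module_of and collapse are assumptions made for the example only.

```python
import os

# File-level #include edges from the a/b example: (includer, included).
file_edges = [
    ("a.c", "a.h"),
    ("a.c", "b.h"),
    ("b.c", "b.h"),
    ("b.c", "a.h"),
]

def module_of(path):
    """Hypothetical grouping rule: file name without directory or extension."""
    return os.path.splitext(os.path.basename(path))[0]

def collapse(edges):
    """Map file-level edges to module-level edges, dropping self-dependencies."""
    module_edges = set()
    for src, dst in edges:
        m_src, m_dst = module_of(src), module_of(dst)
        if m_src != m_dst:  # a.c -> a.h stays inside module "a"
            module_edges.add((m_src, m_dst))
    return module_edges

for src, dst in sorted(collapse(file_edges)):
    print(f"{src} -> {dst}")
```

The four file-level edges collapse into the two edges a -> b and b -> a, which is the reciprocal dependency drawn above as a <--> b.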

4.2. Tuning the "files" and "includes" parameters

Your first run will surely look somewhat like:

graph-includes -o project.ps src/ lib/

Take care to specify all the directories or individual source files that make up your project.

In addition to an initial graph in the project.ps file, which is quite likely to be far from complete, you will find a file named project.ps.graph-includes.report. It is a text file, which will help us fine-tune our command line. Its first section will look something like:

General statistics:

412 files, 353 nodes (14% dropped)
245 dependencies, 137 edges (44% dropped)
225 leaf node(s)

280 dependencies not found
0 dependencies identified as system headers

As you can see, many dependencies are declared as "not found". What happens is quite similar to running a C compiler without any -I flags: most header files are not located.

Graph-includes has two different flags to specify the paths in which to look for dependencies. -I (aka -Include) specifies directories that are part of the project, and allows the tool to find all of our include-style dependencies. On the other hand, -sysI (aka -sysInclude) specifies system directories; included files found in such a directory will of course not result in an intra-project dependency, and will add no edge to our graph, but will no longer be counted among the "dependencies not found". Thus, they will help us see how far we are from specifying all the -I flags.

Now you will most likely need several iterations of adding -I/-sysI flags and checking the results. But that alone may not be sufficient to reach the ultimate "0 dependencies not found": some of the missing headers may, for instance, be system headers for other platforms, which are not available on your machine.

When you are confident that the remaining missing dependencies are of that kind, you can go on and look at the graph.

4.3. Possible strategies to help locate abusive dependencies

Keeping in mind that we are essentially looking for dependency loops, we expect to obtain in the end a graph without cycles, that is, with all (or at least most) of the arrows pointing from left to right.

We then look for the arrows pointing backwards, each a sure sign of a cycle. Remember that if the cycle is not a long one, it may be legitimate; only if you judge that some of the modules in the cycle are really unrelated should you consider it pathological. A backward arrow does not necessarily point directly at the abusive dependency, but it can surely be used to locate the culprit: by finding the various cycles our backward arrows are part of, and checking one by one all the dependencies in those cycles, you can expect to find at least one that, with some work, could be removed.
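The hunt for cycles can also be done mechanically once the dependencies are available as data. The following Python sketch uses a plain depth-first search and reports one cycle for each backward edge it meets. It is an illustration only, not a feature of graph-includes; the function name find_cycles and the module names in the sample graph are hypothetical.

```python
def find_cycles(graph):
    """Return one cycle per back edge found by a depth-first search.

    graph maps each node to the list of nodes it depends on.
    """
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current path, done
    color = {node: WHITE for node in graph}
    stack = []                      # current path from the search root
    cycles = []

    def visit(node):
        color[node] = GREY
        stack.append(node)
        for dep in graph.get(node, []):
            if color.get(dep, WHITE) == WHITE:
                visit(dep)
            elif color[dep] == GREY:
                # dep is already on the current path: this edge closes a cycle
                cycles.append(stack[stack.index(dep):] + [dep])
        stack.pop()
        color[node] = BLACK

    for node in list(graph):
        if color[node] == WHITE:
            visit(node)
    return cycles

# Hypothetical module graph: "log" depending on "ui" is the suspicious edge.
deps = {"main": ["ui"], "ui": ["core"], "core": ["log"], "log": ["ui"]}
for cycle in find_cycles(deps):
    print(" -> ".join(cycle))
```

For this sample graph the sketch prints "ui -> core -> log -> ui". Each dependency in a reported cycle is a candidate to examine; which one is actually abusive remains a judgment call. The recursion is adequate for graphs of a few hundred nodes; a much larger graph would need an iterative search.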

The modifications to make then really depend on your code. Some possibilities include: