Improving partial recompilation in my custom build system

As I’ve been using the basic build system I created for otter, several annoyances have kept cropping up. The one I want to talk about today is determining when a C file has changed and, therefore, needs to be recompiled. For C (and C++), you can basically think of a single .c file as a compilation unit (this isn’t strictly true, but I’m glossing over the finer points). That compilation unit comprises the source code in the .c file as well as all the code included from other files (notably headers, but you aren’t limited to including headers 😉). These compilation units are independent and can later be linked together to create executables, shared objects, etc.

So, say that you’ve already compiled the whole project and now a single file gets modified. You don’t want to rebuild your whole project again. You only made a small change! So which compilation units need to be reprocessed? Well, if it’s a .c file, that’s easy. You just recompile that .c file into the corresponding .o file. But what if it’s a header file that’s been modified? That header is probably included in multiple .c files, so a single change could affect multiple compilation units.

All of this complexity was originally managed by hand in otter’s build system. Each target had to specify the .c file along with all of the headers that it depended on. Here’s an example:

/* example.c */
#include "example.h"
#include "file1.h"
#include "file2.h"

int foo(int val) {
  return val + 1;
}

/* make.c */
otter_target *example_obj = otter_target_create("example.o",
  allocator,
  filesystem,
  logger,
  "example.c",
  "example.h",
  "file1.h",
  "file2.h",
  NULL);
otter_target_add_command(example_obj, "cc -c example.c -o example.o");

Manually copying every header that a .c file depends on into the target creation call gets unwieldy to maintain, especially once you consider that included files can also #include their own stuff. So, the most likely outcome of this system is inaccurate or incomplete descriptions of targets and compilation units. That means targets that should have been recompiled after a change are not touched, or targets are recompiled when they didn’t need to be, causing unnecessary work. To solve this, the targets API can be specialized for C and C++ compilation.

So, what can we do if we know for sure that these are only going to be .c and .h files? Well, then we can use that knowledge to traverse all of the #includes and derive the full contents of the compilation unit as, for example, one big long string. GCC, Clang, and MSVC all have command line parameters that control which steps of the compilation occur. We can use these to execute just the preprocessing step and get the full view of what a .c file is going to look like to the compiler. Here’s an example:

/* test.h */
#ifndef TEST_H_
#define TEST_H_

extern int example;
int foo(int arg);

#endif /* TEST_H_ */

/* test.c */
#include "test.h"

int example = 2;
int foo(int arg) {
  return arg + 1;
}

nathan@nathans-laptop:~/Dev$ gcc -E test.c
# 0 "test.c"
# 0 "<built-in>"
# 0 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 0 "<command-line>" 2
# 1 "test.c"
# 1 "test.h" 1



extern int example;
int foo(int arg);
# 2 "test.c" 2

int example = 2;
int foo(int arg) {
  return arg + 1;
}
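
To make the preprocess-only step a little more concrete, here is a minimal sketch of building that command for different compilers. The cc_family enum and preprocess_command function are hypothetical names for illustration and are not part of otter. It assumes the GCC/Clang driver is reachable as cc and MSVC as cl; -E (GCC/Clang) and /E (MSVC) both write the preprocessed source to standard output.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical compiler families; not part of otter's API. */
typedef enum { CC_GNU_LIKE, CC_MSVC } cc_family;

/* Writes a preprocess-only command line into buf.
 * Returns 0 on success, -1 if buf is too small. */
static int preprocess_command(cc_family family, const char *flags,
                              const char *source, char *buf, size_t size) {
  int n;
  assert(flags != NULL && source != NULL && buf != NULL);
  if (family == CC_MSVC) {
    /* /E sends preprocessed output to stdout; /nologo hides the banner. */
    n = snprintf(buf, size, "cl /nologo /E %s %s", flags, source);
  } else {
    /* -E stops after preprocessing for both gcc and clang. */
    n = snprintf(buf, size, "cc -E %s %s", flags, source);
  }
  return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

The resulting string could be handed to something like popen so the output can be read back. The flags matter here: include paths and -D defines change what the preprocessor emits, so they should be the same flags used for the real compile.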

Instead of requiring every file that makes up a compilation unit to be specified when a target is created, we now only need the primary .c file to derive the exact same information as before. We can then execute the compiler’s preprocessor, hash the output, and accurately decide whether a target needs to be executed based on whether the hash matches the previously cached one. The new API looks something like this (please check out otter for the full files if you are interested):

/* otter_target.h */
typedef enum otter_target_type {
  OTTER_TARGET_OBJECT,        /* .o files */
  OTTER_TARGET_SHARED_OBJECT, /* .so files */
  OTTER_TARGET_EXECUTABLE,
} otter_target_type;

typedef struct otter_target {
  otter_allocator *allocator;
  otter_filesystem *filesystem;
  otter_logger *logger;
  char *name;
  otter_target_type type;
  OTTER_ARRAY_DECLARE(char *, files);
  char *command;
  OTTER_ARRAY_DECLARE(char *, argv);
  OTTER_ARRAY_DECLARE(otter_target *, dependencies);
  unsigned char *hash;
  unsigned int hash_size;
  bool executed;
} otter_target;

otter_target *otter_target_create_c_object(const char *name, const char *flags,
                                           otter_allocator *allocator,
                                           otter_filesystem *filesystem,
                                           otter_logger *logger, ...);
otter_target *otter_target_create_c_executable(
    const char *name, const char *flags, otter_allocator *allocator,
    otter_filesystem *filesystem, otter_logger *logger, const char **files,
    otter_target **dependencies);
otter_target *otter_target_create_c_shared_object(
    const char *name, const char *flags, otter_allocator *allocator,
    otter_filesystem *filesystem, otter_logger *logger, const char **files,
    otter_target **dependencies);
void otter_target_add_dependency(otter_target *target, otter_target *dep);
    
/* otter_make.c */
OTTER_CLEANUP(otter_target_free_p)
otter_target *otter_allocator_obj = otter_target_create_c_object(
    "otter_allocator.o", CC_FLAGS, allocator, filesystem, logger,
    "otter_allocator.c", NULL);

OTTER_CLEANUP(otter_target_free_p)
otter_target *otter_array_obj =
    otter_target_create_c_object("otter_array.o", CC_FLAGS, allocator,
                                 filesystem, logger, "otter_array.c", NULL);
otter_target_add_dependency(otter_array_obj, otter_allocator_obj);

OTTER_CLEANUP(otter_target_free_p)
otter_target *otter_array_tests = otter_target_create_c_shared_object(
    "otter_array_tests.so", CC_FLAGS LL_FLAGS, allocator, filesystem, logger,
    (const char *[]){"otter_array_tests.c", NULL},
    (otter_target *[]){otter_test_obj, otter_array_obj, NULL});
otter_target_execute(otter_array_tests);
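
The hash and hash_size fields in the struct are where the cached state comes in. As a rough sketch of the "hash the preprocessor output and compare" decision, the code below uses 64-bit FNV-1a purely as a stand-in; it is not the hash otter uses, and fnv1a_update, hash_stream, and needs_rebuild are hypothetical names. It assumes the preprocessor output is available as a FILE stream, for example a pipe opened with popen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define FNV_OFFSET_BASIS 0xcbf29ce484222325ULL
#define FNV_PRIME 0x100000001b3ULL

/* Folds len bytes into a running 64-bit FNV-1a hash. */
static uint64_t fnv1a_update(uint64_t hash, const unsigned char *data,
                             size_t len) {
  for (size_t i = 0; i < len; i++) {
    hash ^= data[i];
    hash *= FNV_PRIME;
  }
  return hash;
}

/* Hashes everything readable from stream (e.g. preprocessor output).
 * Returns false if a read error occurred. */
static bool hash_stream(FILE *stream, uint64_t *out) {
  unsigned char buf[4096];
  uint64_t hash = FNV_OFFSET_BASIS;
  size_t n;
  assert(stream != NULL && out != NULL);
  while ((n = fread(buf, 1, sizeof buf, stream)) > 0) {
    hash = fnv1a_update(hash, buf, n);
  }
  if (ferror(stream)) {
    return false;
  }
  *out = hash;
  return true;
}

/* A target needs rebuilding when no hash was cached or the hash changed. */
static bool needs_rebuild(bool have_cached, uint64_t cached, uint64_t current) {
  return !have_cached || cached != current;
}
```

One thing to keep in mind with this approach: the preprocessor strips comments by default, so a comment-only edit would not change the hash and would not trigger a rebuild, which is usually what you want.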

The targets API has gone from a general system where any kind of file, command, etc. could be specified to something that is only concerned with C and C++. Sacrificing generality and constraining the code to the actual problem being solved simplifies things, which is kind of cool. Notice as well that commands for targets no longer need to be specified in the make source code. All of the actual compiler commands can be derived from the information provided when creating the target. So, there are further savings in typing and repetition in the make source code, which is an additional maintenance improvement. It also leaves the door open for extending the implementation to handle other compilers in addition to GCC/Clang.
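
To illustrate how a compiler command might be derived from a target’s type, name, flags, and input files, here is a small sketch. It is not otter’s implementation: target_kind and derive_command are hypothetical names, the compiler is assumed to be cc, and the inputs are passed as a single space-separated string to keep the example short.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Mirrors the three target types from the post; renamed to stay standalone. */
typedef enum { KIND_OBJECT, KIND_SHARED_OBJECT, KIND_EXECUTABLE } target_kind;

/* Builds "cc <flags> <mode><inputs> -o <name>" into buf.
 * inputs is a space-separated list of source and object files.
 * Returns 0 on success, -1 if buf is too small. */
static int derive_command(target_kind kind, const char *name,
                          const char *flags, const char *inputs, char *buf,
                          size_t size) {
  const char *mode = "";
  int n;
  assert(name != NULL && flags != NULL && inputs != NULL && buf != NULL);
  switch (kind) {
  case KIND_OBJECT:
    mode = "-c "; /* compile only, no linking */
    break;
  case KIND_SHARED_OBJECT:
    mode = "-shared -fPIC "; /* position-independent shared library */
    break;
  case KIND_EXECUTABLE:
    mode = ""; /* default driver behavior: compile and link */
    break;
  }
  n = snprintf(buf, size, "cc %s %s%s -o %s", flags, mode, inputs, name);
  return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Supporting another compiler would then mostly mean swapping the driver name and the mode strings in one place rather than editing every target in the make source.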

Nathaniel Wright
