An interesting bug

I was working on a string library in otter this weekend and ran into an interesting bug and I probably should have known better. The string structure uses a cool feature in C (as of C99… so recent!) called flexible array members. They allow an “empty” trailing array field in the struct that can be dynamically sized. You’re basically given a well-defined offset to the end of the struct’s memory that you can then use to access array items directly after the struct in memory. It may make more sense with an example. If I wanted to have a dynamically sized array of items stored in a struct, I may naively write it like this:

typedef struct {
  const char *name;
  size_t num_integers;
  int *integers;
} foo;

foo *create_foo(const char *name, const size_t num_integers, ...) {
  foo *f = malloc(sizeof(*f));
  if (f == NULL) {
    return NULL;
  }

  f->num_integers = num_integers;
  f->integers = malloc(sizeof(int) * num_integers);
  if (f->integers == NULL) {
    free(f);
    return NULL;
  }
  
  va_list integers;
  va_start(integers, num_integers);
  for (size_t i = 0; i < num_integers; i++) {
    int integer = va_arg(integers, int);
    f->integers[i] = integer;
  }

  va_end(integers);
  return f;
}

typedef struct {
  const char *name;
  size_t num_integers;
  int *integers;
} foo;

foo *create_foo(const char *name, const size_t num_integers, ...) {
  foo *f = malloc(sizeof(*f));
  if (f == NULL) {
    return NULL;
  }

  f->num_integers = num_integers;
  f->integers = malloc(sizeof(int) * num_integers);
  if (f->integers == NULL) {
    free(f);
    return NULL;
  }
  
  va_list integers;
  va_start(integers, num_integers);
  for (size_t i = 0; i < num_integers; i++) {
    int integer = va_arg(integers, int);
    f->integers[i] = integer;
  }

  va_end(integers);
  return f;
}

Notice that there is a need (need is maybe a strong word, but there are lots of cases for wanting to allocate the struct itself instead of having it on the stack) for two calls to malloc in create_foo. One to allocate the foo struct and another to allocate the array of integers that is copied into. We don’t really need two calls to malloc now though do we? We know all the information about the total allocation size we’ll need right at the very beginning of the function. And calling malloc as few times as possible is generally a good strategy. This is where flexible array members become useful. Here’s the example rewritten to use flexible array members:

typedef struct {
  const char *name;
  size_t num_integers;
  int integers[];
} bar;

bar *create_bar(const char *name, const size_t num_integers, ...) {
  bar *b = malloc(sizeof(*b) + (sizeof(int) * num_integers));
  if (b == NULL) {
    return NULL;
  }

  b->num_integers = num_integers;
  va_list integers;
  va_start(integers, num_integers);
  for (size_t i = 0; i < num_integers; i++) {
    int integer = va_arg(integers, int);
    b->integers[i] = integer;
  }

  va_end(integers);
  return b;
}

typedef struct {
  const char *name;
  size_t num_integers;
  int integers[];
} bar;

bar *create_bar(const char *name, const size_t num_integers, ...) {
  bar *b = malloc(sizeof(*b) + (sizeof(int) * num_integers));
  if (b == NULL) {
    return NULL;
  }

  b->num_integers = num_integers;
  va_list integers;
  va_start(integers, num_integers);
  for (size_t i = 0; i < num_integers; i++) {
    int integer = va_arg(integers, int);
    b->integers[i] = integer;
  }

  va_end(integers);
  return b;
}

Now with all the pre-work out of the way, let’s talk about my string library implementation and the bug I was experiencing. The string struct looks like this and should feel familiar now that we’ve discussed flexible array members. It has a size field and an array of characters. To allow the string to grow (e.g., appending strings together, etc.), a capacity of the data array is tracked as well and can be much larger than the used size.

typedef struct otter_string {
  otter_allocator *allocator;
  size_t size;
  size_t capacity;
  char data[] OTTER_COUNTED_BY(size); /* expands to __attribute__((counted_by(size))) */
} otter_string;

typedef struct otter_string {
  otter_allocator *allocator;
  size_t size;
  size_t capacity;
  char data[] OTTER_COUNTED_BY(size); /* expands to __attribute__((counted_by(size))) */
} otter_string;

The only additional concept added is the use of the counted_by attribute. It informs the compiler that the array will be holding a particular number of items at any given time. For the string struct here, it’s a safe assumption that the data array holding the actual string contents should never exceed the counted size. Even though the data could be associated with the capacity field (it’s what we actually allocated, after all), it would be incorrect to access anything outside of the range [0, size).

With this information, the compiler can “improve detection of object size information for such structures and provide better results in compile-time diagnostics and runtime features like the array bound sanitizer and the __builtin_dynamic_object_size.” There are a couple caveats though. For the checks to be correct and for there to be no undefined behavior, one must follow these rules:

The counted by field must be initialized before the first reference to the flexible array
The flexible array field has at least the counted by number of elements available all the time. This relationship must hold even after any of these related objects are updated during the program. (subtle foreshadowing)

So I’m going along implementing this string library. I’m writing tests and compiling and running them in debug mode. Everything is fine. Eventually, I got comfortable enough to use otter_strings throughout the codebase, replacing usages of raw C strings with my new and fancy dynamic string implementation. It was only once I’d integrated it into more places and ran code in release mode that I encountered a fun problem.

nathan@nathans-laptop:~/otter$ ./otter_make 
*** buffer overflow detected ***: terminated
Aborted                    ./otter_make

nathan@nathans-laptop:~/otter$ ./otter_make 
*** buffer overflow detected ***: terminated
Aborted                    ./otter_make

By running valgrind, it turns out the problem was coming from otter_string_append. Interestingly enough, this only happened when compiling with -D_FORTIFY_SOURCE=3. The problem seemed to disappear when using lower fortify source levels.

nathan@nathans-laptop:~/otter$ valgrind ./otter_make 
==201878== Memcheck, a memory error detector
==201878== Copyright (C) 2002-2024, and GNU GPL'd, by Julian Seward et al.
==201878== Using Valgrind-3.26.0 and LibVEX; rerun with -h for copyright info
==201878== Command: ./otter_make
==201878== 
**201878** *** memcpy_chk: buffer overflow detected ***: program terminated
==201878==    at 0x484A0DC: VALGRIND_PRINTF_BACKTRACE (valgrind.h:7352)
==201878==    by 0x48506F5: __memcpy_chk (vg_replace_strmem.c:1766)
==201878==    by 0x406AFB: otter_string_append (in /home/nathan/otter/otter_make)
==201878==    by 0x405C54: otter_target_create_c_object (in /home/nathan/otter/otter_make)
==201878==    by 0x400A8D: main (in /home/nathan/otter/otter_make)
==201878== 
==201878== HEAP SUMMARY:
==201878==     in use at exit: 116,643 bytes in 800 blocks
==201878==   total heap usage: 1,557 allocs, 757 frees, 141,436 bytes allocated
==201878== 
==201878== LEAK SUMMARY:
==201878==    definitely lost: 0 bytes in 0 blocks
==201878==    indirectly lost: 0 bytes in 0 blocks
==201878==      possibly lost: 0 bytes in 0 blocks
==201878==    still reachable: 116,643 bytes in 800 blocks
==201878==         suppressed: 0 bytes in 0 blocks
==201878== Rerun with --leak-check=full to see details of leaked memory
==201878== 
==201878== For lists of detected and suppressed errors, rerun with: -s
==201878== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

nathan@nathans-laptop:~/otter$ valgrind ./otter_make 
==201878== Memcheck, a memory error detector
==201878== Copyright (C) 2002-2024, and GNU GPL'd, by Julian Seward et al.
==201878== Using Valgrind-3.26.0 and LibVEX; rerun with -h for copyright info
==201878== Command: ./otter_make
==201878== 
**201878** *** memcpy_chk: buffer overflow detected ***: program terminated
==201878==    at 0x484A0DC: VALGRIND_PRINTF_BACKTRACE (valgrind.h:7352)
==201878==    by 0x48506F5: __memcpy_chk (vg_replace_strmem.c:1766)
==201878==    by 0x406AFB: otter_string_append (in /home/nathan/otter/otter_make)
==201878==    by 0x405C54: otter_target_create_c_object (in /home/nathan/otter/otter_make)
==201878==    by 0x400A8D: main (in /home/nathan/otter/otter_make)
==201878== 
==201878== HEAP SUMMARY:
==201878==     in use at exit: 116,643 bytes in 800 blocks
==201878==   total heap usage: 1,557 allocs, 757 frees, 141,436 bytes allocated
==201878== 
==201878== LEAK SUMMARY:
==201878==    definitely lost: 0 bytes in 0 blocks
==201878==    indirectly lost: 0 bytes in 0 blocks
==201878==      possibly lost: 0 bytes in 0 blocks
==201878==    still reachable: 116,643 bytes in 800 blocks
==201878==         suppressed: 0 bytes in 0 blocks
==201878== Rerun with --leak-check=full to see details of leaked memory
==201878== 
==201878== For lists of detected and suppressed errors, rerun with: -s
==201878== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Turns out, otter_string_append wasn’t abiding by some of the rules it told gcc it would follow.

void otter_string_append(otter_string **str, const char *append,
                         size_t length) {
  if (str == NULL || *str == NULL || append == NULL) {
    return;
  }

  if ((*str)->size + length > (*str)->capacity) {
    otter_string *expanded = otter_string_expand(*str, (*str)->size + length);
    if (expanded == NULL) {
      return;
    }

    *str = expanded;
  }

  memcpy((*str)->data + (*str)->size - 1, append, length);
  (*str)->size += length;
  (*str)->data[(*str)->size - 1] = '\0';
}

void otter_string_append(otter_string **str, const char *append,
                         size_t length) {
  if (str == NULL || *str == NULL || append == NULL) {
    return;
  }

  if ((*str)->size + length > (*str)->capacity) {
    otter_string *expanded = otter_string_expand(*str, (*str)->size + length);
    if (expanded == NULL) {
      return;
    }

    *str = expanded;
  }

  memcpy((*str)->data + (*str)->size - 1, append, length);
  (*str)->size += length;
  (*str)->data[(*str)->size - 1] = '\0';
}

On line 16 the data field is being offset to the end of size (minus the extra byte for the NULL terminator) and memcpy writes well past the old end of the string to append data. This goes against the rules for marking a flexible array member with the counted_by attribute! We’re explicitly writing past the end of the array (or at least, what we said was the end of the array) and only updating the size afterward. So, at high enough levels of fortify source and optimization, the contract of counted_by is called in when maybe it wasn’t being done before and the code magically breaks. The fix is simple though. The size field must be updated before the memcpy occurs. That way, any generated checks the compiler has added are happy. We’ve expanded the size to what we expect it to be after the append and only then copy to that chunk of memory.

void otter_string_append(otter_string **str_, const char *append,
                         const size_t length) {
  if (str_ == NULL || *str_ == NULL || append == NULL) {
    return;
  }

  otter_string *str = *str_;
  if (str->size + length >= str->capacity) {
    otter_string *expanded = otter_string_expand(str, str->size + length);
    if (expanded == NULL) {
      return;
    }

    str = expanded;
  }

  assert(str->size > 0);
  const size_t offset = str->size - 1;
  str->size += length;
  memcpy(&str->data[offset], append, length);
  str->data[str->size - 1] = '\0';
  *str_ = str;
}

void otter_string_append(otter_string **str_, const char *append,
                         const size_t length) {
  if (str_ == NULL || *str_ == NULL || append == NULL) {
    return;
  }

  otter_string *str = *str_;
  if (str->size + length >= str->capacity) {
    otter_string *expanded = otter_string_expand(str, str->size + length);
    if (expanded == NULL) {
      return;
    }

    str = expanded;
  }

  assert(str->size > 0);
  const size_t offset = str->size - 1;
  str->size += length;
  memcpy(&str->data[offset], append, length);
  str->data[str->size - 1] = '\0';
  *str_ = str;
}

Nathaniel Wright

Leave a ReplyCancel reply

An interesting bug

Share this:

Like this:

Leave a ReplyCancel reply

Discover more from Nathaniel Wright