Using C++11 Capturing Lambdas w/ Vanilla C API Functions

Here’s a C++11 specific idea I’ve been applying a lot recently.

When designing libraries, I don’t want to expose implementation details as templates. I want hidden types and vanilla C ABI functions on the link level. The reasons are for compile/link speed and to be able to keep a clean binary interface between modules. (I have a separate post about this subject you might like or hate.)

For example, at Insomniac Games, our general sorting routine is defined like this:

typedef int64_t (*QuickCompareFunc)
(const void*, const void*);
typedef int64_t (*QuickCompareFuncReentrant)
(const void*, const void*, void* arg);

void QuickSort (void* __restrict elems, int64_t elem_count,
size_t elem_size, QuickCompareFunc cf);
void QuickSort (void* __restrict elems, int64_t elem_count,
size_t elem_size, QuickCompareFuncReentrant cf,
void* arg);

Note the second overload; it allows you to pass through some opaque state to the compare function. This is essential if you’re trying to sort an array of indices or something like that, where the elements in the array themselves aren’t enough to compare them.

While these functions are good for compile & link speeds, it’s easy to screw up calls while making changes under maintenance because of the typeless interface. So I went looking for a way to combine the standard calling style of these functions with more type safe lambda objects that allow closures.

It turns out it’s pretty easy to do, and it generates good code. You add a template function that gets inlined into the call site. That template function accepts any type of callback object, and funnels it via the user_data pointer. Here’s the overload for type safe + lambda goodness QuickSort():

template <typename T, typename Func>
void QuickSort(T* __restrict elems, int64_t elem_count, Func f)
{
  auto compare_closure = [](const void* ll, const void* rr,
void* context) -> int64_t
  {
    const T* l = (const T*) ll;
    const T* r = (const T*) rr;
    return (*(Func*)context)(l, r);
  };

QuickSort(elems, elem_count, sizeof elems[0], compare_closure, &f);
}

What’s going on here? Well, the function object we’re getting in (the lambda expression) is not going to be a match for the expected C function interface. So we create a new anonymous function (compare_closure) which is a match. We then pass a pointer to our function object through the user_data machinery, cast it to the right type, and call through the pointer.

What does the codegen look like? It looks good. The reason is that the compiler is smart enough to inline your entire lambda in the compare_closure, and there’s still only a single function call. One drawback I’ve found is that in unoptimized builds there’s an additional function call to jump through. Another potential drawback is if you have tons of these functions and you’re calling them a lot. Some compilers generate redundant copies of these template function bodies that the linker will have to clean up.

Armed with this template we can easily sort an array of Frobs using an index table:

Frob* frobs = ...;
size_t frob_count = ...;
int16_t* indices = ...;
QuickSort(indices, frob_count, [&](const int16_t* l,
const int16_t* r)
{
return frobs[*l].m_SortKey - frobs[*r].m_SortKey;
});

What makes this better then std::sort or similar sorting templates? There’s still only one copy of the entire sorting algorithm, so it has huge implications for code bloat if you’re calling QuickSort() a lot.

Since discovering this technique we’ve been applying it to various things (it’s jokingly called “Fredriksson’s Reverse”.) It’s pretty easy to paper over a separately-compiled routine with a typeless interface and turn it into a convenient API at low cost. All it requires is that you have some opaque pointer you can use to recover the function object.

To wrap up, here’s another example where I’m using a separately-compiled zlib state machine pump that wants to pull in I/O data. The I/O data comes from some HTTP endpoint. Normally that would be a pain in the neck to write and adapt to a C-style callback interface, but with our trusty reverse-lambda hack, it’s a breeze:

bool ok = ZlibHelpers::DecompressFile(buffer, u_sz, d_sz, scratch,
[&](void* buf, size_t size) -> bool
{
  return DevHttpClient::ReadChunkedBlock(&m_HttpState, buf, size);
});

Note that you can also access member variables (m_HttpState) and everything works as expected. My function signatures look like this (formatted for display in web browser..):

typedef bool (ZReadFn)(
void* target_buffer,
size_t read_size,
void* user_data);

bool DecompressFile(
void* buffer,
size_t compressed_size, size_t decompressed_size,
MemAllocStack* scratch,
void *user_data,
ZReadFn* read_fn);

template <typename Fn>
bool DecompressFile(
void* buffer,
size_t cs,
size_t ds,
MemAllocStack* scratch,
Fn fn)
{
auto closure = [](void* target_buffer,
size_t read_size,
void* user_data) -> bool
{
return (*(Fn*)user_data)(target_buffer, read_size);
};

return DecompressFile(buffer, cs, ds, scratch, &fn, closure);
}
Advertisements

5 thoughts on “Using C++11 Capturing Lambdas w/ Vanilla C API Functions

  1. Very interesting technique. One question, why don’t you capture the lambda by reference so it doesn’t need to use the user_param?
    auto compare_closure = [&](const void* ll, const void* rr) -> int64_t {
    const T* l = (const T*) ll;
    const T* r = (const T*) rr;
    return (*f)(l, r);
    };

  2. Have you maybe tested this approach when library and client code were actually compiled with very different compilers (in other words: in the presence of different C++ ABIs)? Is it 100% portable accross different ABIs?

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s