Tool/library memory management – You’re doing it wrong

Warning: this is something of a rant. Don’t say I didn’t warn you.

Something that continues to frustrate me is the proliferation of poor memory management decisions in tools and libraries for tools. The default is to allocate every single tiny piece of memory with malloc and then free it later.

Case in point: JSON parser libraries. Most of them allocate tiny JSON values individually from the C library heap. It makes absolutely no sense. The way most command-line tools work with JSON data is to slurp in a data file, parse it to a JSON structure, process that data and then exit. They typically don’t sit around holding a pointer to a small fragment of a JSON document, it’s all managed together. The choice to allocate JSON values on the CRT heap individually is terrible in a scenario like that. What we should do instead is to exploit the higher-level knowledge about the memory usage patterns.

A lot of people are now preparing their rebuttal detailing how fast a general heap manager can be and that we can optimize the allocator — I call bullshit on all that, and here’s why.

To prove how much of a difference this makes in practice, I modified the popular json-parser library so I can select at runtime whether to use the CRT heap or a custom page-based linear allocator. The program simply reads a block of JSON data into memory and parses it 50 times, immediately throwing away the parsed data.

You can check out the code here on github.

Here are the results:

$ time mmtest game.json heap

real 0m1.642s
user 0m1.589s
sys  0m0.051s
$ time mmtest game.json linear

real 0m0.755s
user 0m0.697s
sys  0m0.057s

The stock version using the CRT heap is 2.2x slower!!

Running with valgrind to trace the heap activity reveals the (only) difference between these runs:

heap: 7,024,565 allocs, 7,024,205 frees, 337,671,820 bytes allocated
linear: 615 allocs, 255 frees, 420,426,770 bytes allocated

No wonder it’s faster! That’s a LOT of cache misses and pointless branches we’re sidestepping entirely in the malloc/free functions; more than 7 million calls. Sure, we’re wasting 90 MB of RAM, but do you really care in a command-line program that does its thing and then quits? If you’re worried about it, it can be trimmed down easily. I used a block size of 1 MB in my linear allocator for this test. Shrinking it down to 128 kb will still give you almost the same benfit. (The mismatched allocs/frees are from Apple code outside my control, go figure.)

The lesson to take away from this is simple: Never make memory management decisions at the lowest level of a system. Always allow the user to control the memory management policy on the highest possible level, where its possible to do high-level optimizations safely.

As a bonus C++ bashing, this is why anything based on the STL sucks, as it by definition takes microscopic decicions about where memory comes from and how long it will live. The standard response to MM performance concerns is to “replace the allocator with a more efficient one”. This is the wrong answer, as I hope you can clearly see above. The fastest code is the one you never have to run.

Rant over. 🙂


10 thoughts on “Tool/library memory management – You’re doing it wrong

  1. Huh? STL allows you to override the default allocator on every container. Yeah, the allocator design sucks, but it’s better than having no control at all.

    I think I agree with the general point of allowing the user to control memory management on everything, but you also have to be careful not to overload the user with too many parameters. Customisation and parameterisation can be taken too far. That being said, for games at least, control over MM is a must.

    • You can replace the allocator in theory, but no average library writer does so. And it is an incredibly painful way to go. Consider if json-parser stored std::vector objects in its values and you wanted this same speed up I showed. It would take you days of work mucking around with templates and crap which adds considerable complexity.

      There are not a whole lot of success stories regarding STL and efficient memory handling, that’s all I’m saying.

      • I used STL containers with custom allocators in a pet toy project and it worked out fine. What issues do you typically see when using, say, a std::vector with a custom allocator in a real world project?

      • First of all, if that std::vector is inside some middleware library, how to you replace the allocator? Hopefully they’ve expose some way of setting it, so assuming that’s out of the way — Second of all, a std::vector always manages memory the same way – it grows exponentially (or when you resize/reserve it) — and it believes it owns the memory it sits on top of. There’s really no way to tell it that “look, we’re all just using scratch memory in this 16 MB range, so don’t bother freeing anything” – when you have lots of data, these things really start to take their toll.

      • Ugh, can’t nest comments further.

        std::vector takes the Allocator type as a template parameter. This additionally allows the compiler to inline the allocation / deallocation code; which may be a significant speed boost for a linear allocator since it has a tiny fast path (that falls back to a larger, out-of-line stub when you need to call out to malloc).

        You can’t modify the allocation policy of growing the buffer exponentially but you can have the std::vector not free any memory by simply making your allocator’s “free” operation a null-op.

        This is the toy project I was talking about:

      • I know how std::vector and std::allocator work. What I’m saying is you don’t have enough information at that low level to make global optimizations like this. The abstraction doesn’t fit the problem.

        If you feel different about it, try writing a JSON parser library using std::vector and allowing the user to customize the memory management in a manner similar to what I was doing.

  2. I don’t think profiliation is a word; maybe proliferation is what you’re looking for? Otherwise I’m on board with your argument.

  3. Dude. Could not agree more. This sort of thing is symptomatic of inexperienced people writing libraries. When I was at uni (pre stl) the sample solution code for a string class they got us to write called new in practically every function.
    We used a facial animation middleware on a ps2 game that newed every single value. When we asked WTF? they replied “overlaod global new”…

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s