Looking towards Tundra 2.0

If you don’t know already, Tundra is an open-source build system I’ve written and have been maintaining for a couple of years now. The source is available on GitHub (https://github.com/deplinenoise/tundra).

In maintaining the tool, I’ve learned some lessons and rethought some of my initial design decisions. This post is about how I want to evolve the tool with those lessons in mind.

What has worked well

Tundra is up to version 1.2 now and is relatively stable. It’s definitely not all bad. The current tool is fast and portable. It’s used in a lot of projects by friends and other random people on Twitter (you know who you are!). Specifically, I think the following design decisions have worked out well:

Multi-threaded build engine in C

The build engine itself has proven to be robust and relatively bug-free since early on. It does parallel dependency and header scanning, so incremental builds run really quickly. It uses simple memory management techniques (mostly just flat arrays), which have turned out to work really well.

Using Lua for configuration

Lua is remarkably fast and flexible. I didn’t expect the front-end to run as quickly as it does, given that it does a ton of string processing and data collection before the actual build engine runs. It will commonly dish out a large DAG for the backend in under 100 ms, which is a great achievement for an interpreted scripting language that has to hit disk to find file names and other things.

Hard separation between configuration and building

The current design emphasizes the two distinct phases in the tool: configuration and building. Once the Lua front-end has produced a DAG, it is no longer part of the runtime. This makes the build faster, as the C side can run unimpeded by script language performance and multi-threading issues.

Header scanning over implicit tracking

Some new-wave build tools (tup, ninja) depend on tool-specific options or OS/file system-level hooking to automatically track implicit dependencies. This can be convenient when it is supported, but significantly limits the build tool in what platforms it can run on and what tools you can use with it. Tundra has successfully been able to accommodate quirky Amiga toolsets, cross-compilation scenarios (Tundra itself is cross-compiled!) and other non-mainstream scenarios such as custom assemblers by scanning the file system for #include information and keeping a cache of such discovered dependencies.
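To make the idea concrete, here is a minimal sketch of include scanning with a dependency cache. It is written in Python for brevity and is not Tundra's actual scanner (which lives in the C build engine). The names `scan_includes` and `implicit_deps` are made up for illustration, and the sketch assumes quoted includes resolve relative to the including file, with no include search paths.

```python
import os
import re

# Matches quoted includes only, e.g. #include "foo.h"; angle-bracket
# (system) includes are ignored in this sketch.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*"([^"]+)"', re.MULTILINE)


def scan_includes(path, cache):
    """Return the files directly included by `path`, rescanning only
    when the file's modification time differs from the cached one."""
    mtime = os.stat(path).st_mtime_ns
    entry = cache.get(path)
    if entry is not None and entry[0] == mtime:
        return entry[1]
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        names = INCLUDE_RE.findall(f.read())
    base = os.path.dirname(path)
    # Resolve relative to the including file; keep only files that exist.
    found = [os.path.join(base, n) for n in names
             if os.path.isfile(os.path.join(base, n))]
    cache[path] = (mtime, found)
    return found


def implicit_deps(path, cache):
    """Transitive closure of the includes reachable from `path`."""
    seen, stack = set(), [path]
    while stack:
        for dep in scan_includes(stack.pop(), cache):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return sorted(seen)
```

Because the scanner only reads text, it needs nothing from the compiler or the operating system, which is what makes the approach portable to unusual toolsets. The cost is that it cannot see includes hidden behind macros or conditional compilation.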

What could be improved

There are some things I’d really like to fix with version 2.0:

Cleaning up stale build products in the file system

Because the DAG is generated fresh every time, the front-end doesn’t generate a complete DAG including all possible configurations and variants. This means a debug build will know nothing of the release build targets. If the build engine knew about all the possible outputs, it could safely deduce that certain output files are now stale and can be deleted. That would keep the file system nice and tidy.

Targeting the build engine with other types of configuration input

The build engine doesn’t really care about the Lua configuration, but because the tool is a monolithic executable with a single front-end, it’s awkward to try to use it to build something completely unrelated to code. For example, if you’d like to use the build engine to build game assets, it requires a significant amount of Lua scripting within the existing framework (which is code-oriented) to try to express those build rules. If the build engine could be configured in more ways, it would be more useful.

IDE integration

There is some integration with Visual Studio and Xcode in the current tool, but it is relatively limited. It would be cool if the current front-end could be made to generate project files that integrate with these tools more easily.

The Plan

Here’s how I’ve been thinking about fixing some of this, while keeping the best features around.

Completely divorce configuration and build engine

It would be cool to split the tool into three pieces, analogous to how the gcc compiler driver is really multiple executables:

  1. One driver executable (the one you invoke – tundra.exe). This is a light-weight front-end binary that first launches a configuration executable (if needed, see below) and then kicks off the build engine.
  2. One configuration executable (tundra-luafrontend.exe). This is a program that encapsulates the task of reading configuration data and producing a DAG as output.
  3. One build engine executable (tundra-buildengine.exe). This program just reads DAG data from a file and executes the build as fast as possible.

This design enables some nice benefits:

  • The front-end binary only needs to run when needed (the build files have changed, glob queries would change results, that sort of thing.) Most builds use an identical input DAG. Therefore the top-level build can just skip running the front-end program entirely and use the cached DAG from the previous run. This will shave off precious milliseconds from every incremental build.
  • The build engine can be targeted directly using a custom front-end tool. This means you can plug in some other DAG generator to build your game assets or run other build rules that are not easily expressible in the Lua front-end. Maybe you already have existing build system data (Visual Studio projects?) that you want to run on the Tundra back-end.
  • Maintenance becomes simpler, as changes to the front-end can be tested in isolation without involving the back-end, and vice versa.
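The "only run the front-end when needed" decision in the driver could look roughly like the following Python sketch. Everything here is an assumption for illustration: the JSON cache format, the function names, and the `run_frontend` callable that stands in for launching the configuration executable. A real driver would more likely compare timestamps than hash file contents, to keep the check cheap.

```python
import glob
import hashlib
import json
import os


def input_signature(build_files, glob_patterns):
    """Hash everything the front-end's output depends on: the contents
    of the build files and the current results of each glob query."""
    h = hashlib.sha1()
    for path in sorted(build_files):
        h.update(path.encode() + b"\0")
        with open(path, "rb") as f:
            h.update(f.read() + b"\0")
    for pattern in sorted(glob_patterns):
        h.update(pattern.encode() + b"\0")
        for match in sorted(glob.glob(pattern)):
            h.update(match.encode() + b"\0")
    return h.hexdigest()


def load_or_generate_dag(cache_path, build_files, glob_patterns, run_frontend):
    """Return (dag, frontend_ran). The cached DAG is reused when the
    input signature matches the one stored next to it."""
    sig = input_signature(build_files, glob_patterns)
    try:
        with open(cache_path) as f:
            cached = json.load(f)
        if cached["signature"] == sig:
            return cached["dag"], False
    except (OSError, ValueError, KeyError, TypeError):
        pass  # missing or unreadable cache: fall through and regenerate
    dag = run_frontend()
    with open(cache_path, "w") as f:
        json.dump({"signature": sig, "dag": dag}, f)
    return dag, True
```

Including the glob results in the signature is what catches the "a new source file appeared" case, where no build file changed but the DAG would still come out different.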

Produce complete DAG data – Clean up file system

If the build engine sees complete DAG data (all configurations, variants, platforms and so on) it can clean up the file system state before it starts building. The reason this isn’t done today is performance; we don’t want the Lua front-end to generate 8x as many DAG nodes if we’re only going to build the debug config anyway. With the mandatory caching outlined above, this problem goes away: the data will only be regenerated when you change the build files anyway.
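As a rough illustration of that cleanup step, the Python sketch below compares what is on disk with the set of outputs a complete DAG declares. The function names are hypothetical, and the sketch assumes the output directory is owned exclusively by the build tool, so that any file no node claims really is stale.

```python
import os


def find_stale_outputs(output_root, dag_outputs):
    """Return files under `output_root` that no DAG node, in any
    configuration, claims as an output."""
    known = {os.path.normpath(p) for p in dag_outputs}
    stale = []
    for dirpath, _dirnames, filenames in os.walk(output_root):
        for name in filenames:
            path = os.path.normpath(os.path.join(dirpath, name))
            if path not in known:
                stale.append(path)
    return sorted(stale)


def remove_stale_outputs(output_root, dag_outputs):
    """Delete the stale files and return the list of removed paths."""
    stale = find_stale_outputs(output_root, dag_outputs)
    for path in stale:
        os.remove(path)
    return stale
```

This is only safe with complete DAG data. If `dag_outputs` covered just the debug configuration, every release artifact would look stale and be deleted.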

Feedback welcome

What would you like to see in Tundra 2.0? Drop me a line and let me know!

9 thoughts on “Looking towards Tundra 2.0”

  1. I’ve been very happy with our switch to tundra – building a little short of a million lines of C/C++ on Linux and Windows.

    Some random thoughts:

    I think shaving that 100ms off the front-end is worth it. I’m not really sure what the ‘Runtime-Compiled C++’ guys are doing with their incremental compilation tech, but being able to hook that sort of thing up with tundra would be cool.
    Incidentally “tundra --debug-stats” gives me
    “time spent in Lua doing setup: 1.857s” on our code base.
    From MSVC, I typically hit “Run without Debugging”, so I would definitely appreciate those two seconds.

    It is nice having tundra.exe as a single executable. My preference would be for it to stay that way, but that’s obviously not a big deal.

    Adding new DAG node types has been quite difficult, due mostly to the fact that Lua is dynamically typed, and having to piece together the internals without any documentation. I don’t see any easy way around this, but more documentation of the internals wouldn’t hurt.

    Setting TUNDRA_THREADS to twice the number of hardware threads speeds builds up on all the machines I’ve tested on.

    We do miss the ability to use MSVC’s “Edit And Continue”, so it would be great if the IDE generation had a different mode, where it would spit out native .vcxproj files.

    Thanks for an awesome build tool!

    • Thanks for the feedback Ben!

      1.9 seconds of Lua setup seems like a lot. Have you tried playing with the existing DAG caching at all? Would be very interested in hearing what that would do to your setup time. It only helps when the options are the same on the next run, e.g. when building the Debug configuration several times, which is something the forced split would fix.

      I understand the difficulty of extending the Lua front-end, I find it confusing myself sometimes after returning to it after a couple of months 🙂 It would be cool if the separation of concerns works out well so people could experiment with alternate configuration front-ends.

      When I started prototyping Tundra it was actually a C# program, believe it or not. However, the 500 ms startup overhead of loading a few required assemblies such as System.Text.Regex was just a deal breaker.

      Cheers,
      Andreas

      • What do you mean by “existing DAG caching”?

        I’m still using a modified 1.1 fork.. I should get up to date some time.

        Regarding the splitting of concerns – I’m sure that would help comprehension.

        Some more timings on a build that ends up “0 jobs run”:
        lua setup: 1.897
        build loop: 1.931
        implicit dependency scanning: 28.7
        stat() time: 0.55
        file signing time: 0.95
        up2date checks: 0.204
        total: 3.9

        My units.lua file is 4980 lines.

      • There is a way to have the Lua frontend write out a cache file of the DAG for the next run. Try this in your tundra.lua:

        Build {
          EngineOptions = {
            FileHashSize = 14791,
            RelationHashSize = 29789,
            UseDagCaching = 1,
          },
          -- ...
        }

        And see if that makes a difference. You can play with those hash table sizes too to see if it helps you.

  2. You know what, it’s not so much what tundra does as how well it does it. The ability to just code your own environments or tools as you need them is really neat (scripting FTW!). However, this, I must admit, has been somewhat daunting. I do remember having to spend a lot of time reading all the source before I actually managed to wrap my head around what was going on. Anything that makes this a simpler or more straightforward process is a very welcome addition.

    I think the biggest mistake I made approaching tundra was not realizing how powerful the DAG is, as a tool. The way tundra lets me pipeline different parts of the build process is magical and, as much as this is a user error on my part, I can’t help but think about everyone who is missing out.

    I believe I’m reiterating much of what @Ben has said in his comment.

  3. Turning on that DAG caching option brings my Lua setup time down to 0.35 seconds.

    To echo John’s comments – one particularly nasty thing that I did when writing my own DAG node was to make the mistake of allowing different nodes with identical signatures. In hindsight this was obviously going to break things, but it would be good if this caused an error, instead of silent strange behaviour.

  4. ninja does not depend on tool-specific options. It just happens that everyone who uses ninja uses tool-specific options, because it’s easy to integrate, is simple, and gives the best results.
    As for the way tup does it, if it works on a given OS, it works with any tool, including weird cross-compilation toolchains that might not have any sort of tool to extract dependencies and for which you might not have a scanner. Unfortunately, both making it work and making it work fast have significant technical difficulties for each operating system.

  5. Tundra is a great system for providing good turnaround times.
    In our current use case we have a build machine that keeps everything hunky-dory. Some of our dependent development machines are much slower, and the people using them are not interested in building from source: it takes too long, they need to know they are checking out a good build, they work in a different language, and so on. We provide a daily snapshot of good DLLs, PDBs and so on for them to work with. At times there are framework problems where we need to be able to build locally on such a machine to resolve them in a timely manner, but this requires building the dependent project from source, which takes a lot of time.
    In the past we have tried to circumvent this by copying all the tundra output, to avoid building again with version 1. But because the environment variables and other things were slightly different, this attempt failed.

    Would it be possible for a future tundra release to support copying all source, tundra cache and output from one machine to another, into a different directory with the same relative directory structure, and then still building incrementally, so that only source files that actually changed are rebuilt?

    tundra currently will still build incrementally, but a change of system causes what is in effect a full build from scratch.

    • Hi, thanks for your comment!

      If you copy the files with exactly the same timestamps and to exactly the same location (drive/etc) – it should work today.

      In practice that’s not the way things are done 🙂

      The bigger problem you’ll have if you want to change paths is debug information (either embedded or in PDB files.)

      It’d be nice in some utopian world to have target file caching using input hashes at a higher level, so you don’t even have to get the object files to get the final executable targets. But that has the same problem with debug information.
