SSE expression generation

I was reading . A very interesting post! I don’t like relying on tools for autovectorization: I find them too unpredictable, and you get different behavior depending on compiler flags. So I experimented with hand-vectorizing the expression from the article to see how bad it would be:

static float fun(float x) {
    return 2+min(x+x+x,1.0f/5.0f)*(2.0f+x*3.0f);
}

Unless my SSE-fu is extremely weak, this turns into the arguably harder-to-read explicit version:

static __m128 fun128(__m128 x) {
    /* Each constant is broadcast to all four lanes. */
    static const __m128 v2 = {2.0f, 2.0f, 2.0f, 2.0f};
    static const __m128 v3 = {3.0f, 3.0f, 3.0f, 3.0f};
    static const __m128 div5 = { 1.0f/5.0f, 1.0f/5.0f, 1.0f/5.0f, 1.0f/5.0f };
    /* 2 + min(3*x, 1/5) * (2 + x*3) */
    return _mm_add_ps(v2, _mm_mul_ps(_mm_min_ps(_mm_mul_ps(v3, x), div5), _mm_add_ps(v2, _mm_mul_ps(x, v3))));
}
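
One way to catch transcription slips in a hand-written version like this is to compare it lane by lane against the scalar function. The following is a minimal sketch, not code from the original post: fun_scalar, fun_sse, min_f, max_abs_diff and check_lanes are hypothetical names, min is spelled out as a small helper, and the constants are broadcast with _mm_set1_ps.

```c
#include <assert.h>
#include <xmmintrin.h>  /* SSE intrinsics: __m128 and the _mm_*_ps family */

/* Scalar min with the same operand order as _mm_min_ps. */
static float min_f(float a, float b) { return a < b ? a : b; }

/* Scalar reference for the expression. */
static float fun_scalar(float x) {
    return 2.0f + min_f(x + x + x, 1.0f / 5.0f) * (2.0f + x * 3.0f);
}

/* Hand-vectorized version; _mm_set1_ps copies one value into all four lanes. */
static __m128 fun_sse(__m128 x) {
    const __m128 v2 = _mm_set1_ps(2.0f);
    const __m128 v3 = _mm_set1_ps(3.0f);
    const __m128 div5 = _mm_set1_ps(1.0f / 5.0f);
    return _mm_add_ps(v2, _mm_mul_ps(_mm_min_ps(_mm_mul_ps(v3, x), div5),
                                     _mm_add_ps(v2, _mm_mul_ps(x, v3))));
}

/* Largest absolute difference between the two versions over four inputs. */
static float max_abs_diff(const float in[4]) {
    float out[4];
    float worst = 0.0f;
    _mm_storeu_ps(out, fun_sse(_mm_loadu_ps(in)));
    for (int i = 0; i < 4; ++i) {
        float d = out[i] - fun_scalar(in[i]);
        if (d < 0.0f) d = -d;
        if (d > worst) worst = d;
    }
    return worst;
}

/* Abort via assert if any lane is off by more than tol. */
static void check_lanes(const float in[4], float tol) {
    assert(max_abs_diff(in) <= tol);
}
```

Calling check_lanes with a handful of inputs on both sides of the min threshold (x around 1/15) exercises both outcomes of the min. A small tolerance is used rather than exact equality, since the compiler is free to evaluate the scalar version slightly differently.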

This is of course much harder to maintain and change. But what if there was a way to go from an expression form to SSE form automatically? This sounds like a job for a Common Lisp program; just walk the expression tree and emit the equivalent SSE code. I basically hacked one up in 30 minutes before lunch today, and it sort of works. Let’s try it on some simple cases from the REPL:

CL-USER> (sse-expr '(+ 1 x))
const __m128 vconst48 = { 1.0, 1.0, 1.0, 1.0 };
return _mm_add_ps(vconst48, X);
; No value
CL-USER> (sse-expr '(+ 1 x (/ 2 y)))
const __m128 vconst50 = { 2.0, 2.0, 2.0, 2.0 };
const __m128 vconst49 = { 1.0, 1.0, 1.0, 1.0 };
return _mm_add_ps(vconst49, _mm_add_ps(X, _mm_div_ps(vconst50, Y)));
; No value
CL-USER> (sse-expr '(* 2 (min x (/ 2 y)) z))
const __m128 vconst53 = { 2.0, 2.0, 2.0, 2.0 };
return _mm_mul_ps(vconst53, _mm_mul_ps(_mm_min_ps(X, _mm_div_ps(vconst53, Y)), Z));
; No value
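
For readers who don't speak Lisp, the tree walk itself can be sketched in C. This is only an illustration under stated assumptions and not the program from the post: the Expr type, intrinsic_for, append_text and emit_sse are hypothetical names, operators are assumed to be binary already (the generator above folds n-ary + and * into nested calls), and constants are emitted inline as _mm_set1_ps(...) instead of being hoisted into named vconst variables.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical expression node: a constant, a variable, or a binary operation. */
typedef enum { EX_CONST, EX_VAR, EX_ADD, EX_MUL, EX_DIV, EX_MIN } ExKind;

typedef struct Expr {
    ExKind kind;
    double value;                  /* used by EX_CONST only */
    const char *name;              /* used by EX_VAR only */
    const struct Expr *lhs, *rhs;  /* used by binary operations only */
} Expr;

/* Map a binary node kind to the SSE intrinsic that implements it. */
static const char *intrinsic_for(ExKind kind) {
    switch (kind) {
    case EX_ADD: return "_mm_add_ps";
    case EX_MUL: return "_mm_mul_ps";
    case EX_DIV: return "_mm_div_ps";
    case EX_MIN: return "_mm_min_ps";
    default:     return "";
    }
}

/* Append text to the NUL-terminated string in out, truncating at cap bytes. */
static void append_text(char *out, size_t cap, const char *text) {
    size_t used = strlen(out);
    snprintf(out + used, cap - used, "%s", text);
}

/* Walk the tree depth-first and append the equivalent SSE expression to out. */
static void emit_sse(const Expr *e, char *out, size_t cap) {
    char num[64];
    assert(e != NULL);
    switch (e->kind) {
    case EX_CONST:
        /* Broadcast the constant to all four lanes. */
        snprintf(num, sizeof num, "_mm_set1_ps(%ff)", e->value);
        append_text(out, cap, num);
        break;
    case EX_VAR:
        append_text(out, cap, e->name);
        break;
    default:
        /* Binary operation: intrinsic(lhs, rhs). */
        append_text(out, cap, intrinsic_for(e->kind));
        append_text(out, cap, "(");
        emit_sse(e->lhs, out, cap);
        append_text(out, cap, ", ");
        emit_sse(e->rhs, out, cap);
        append_text(out, cap, ")");
        break;
    }
}
```

Building the tree for (+ 1 x) and calling emit_sse on an empty buffer produces _mm_add_ps(_mm_set1_ps(1.000000f), X). Hoisting and deduplicating constants, as the Lisp version does, would need an extra pass that collects EX_CONST nodes before emitting the return expression.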

Now let’s try it on the original expression:

CL-USER> (sse-expr '(+ 2 (* (min (* 3 x) 1/5) (+ (* 3 x) 2))))
const __m128 vconst61 = { 0.2, 0.2, 0.2, 0.2 };
const __m128 vconst60 = { 3.0, 3.0, 3.0, 3.0 };
const __m128 vconst59 = { 2.0, 2.0, 2.0, 2.0 };
return _mm_add_ps(vconst59, _mm_mul_ps(_mm_min_ps(_mm_mul_ps(vconst60, X), vconst61), _mm_add_ps(_mm_mul_ps(vconst60, X), vconst59)));

Not too shabby! With something like this it’s much easier to go from expression to SSE form and avoid relying on compiler magic.
I’ve posted the sources (all 58 lines of Common Lisp!) over at GitHub for your browsing pleasure. There are lots of improvements you could make to this code, such as handling if expressions by converting them to masked selects, but I’ll leave that hack for a rainy day.
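
For reference, here is roughly what a masked select looks like when written by hand in plain SSE. This is a minimal sketch of the target form only, not output from the generator: select_ps, branch_example and check_branch_example are hypothetical names, and the scalar expression being vectorized (x < limit ? x * 2 : x + 1) is made up for illustration.

```c
#include <assert.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Per-lane select: take a where the mask bits are set, b elsewhere. */
static __m128 select_ps(__m128 mask, __m128 a, __m128 b) {
    /* _mm_andnot_ps(mask, b) computes (~mask) & b. */
    return _mm_or_ps(_mm_and_ps(mask, a), _mm_andnot_ps(mask, b));
}

/* Vector form of the scalar expression: x < limit ? x * 2.0f : x + 1.0f */
static __m128 branch_example(__m128 x, __m128 limit) {
    /* The comparison yields all-ones bits in lanes where x < limit. */
    const __m128 mask = _mm_cmplt_ps(x, limit);
    const __m128 then_v = _mm_mul_ps(x, _mm_set1_ps(2.0f));
    const __m128 else_v = _mm_add_ps(x, _mm_set1_ps(1.0f));
    return select_ps(mask, then_v, else_v);
}

/* Usage: lanes 0 and 1 are below the limit of 1.5, lanes 2 and 3 are not. */
static void check_branch_example(void) {
    float out[4];
    const __m128 x = _mm_setr_ps(0.0f, 1.0f, 2.0f, 3.0f);
    _mm_storeu_ps(out, branch_example(x, _mm_set1_ps(1.5f)));
    assert(out[0] == 0.0f && out[1] == 2.0f);  /* x * 2 */
    assert(out[2] == 3.0f && out[3] == 4.0f);  /* x + 1 */
}
```

Note that both branches are computed for every lane and the mask only picks which result survives, so this form suits cheap, side-effect-free branches.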


2 thoughts on “SSE expression generation”

    • Hehe yep, we’re still relying on the compiler, but it’s much more explicit than auto-vectorization. It would be easy to build some form of HLSL-like -> vector intrinsic preprocessor on an approach like this if you wanted to. (Basically save the Lisp core out to an executable file and plug it into your build system.)
