Tail Calls and C

Some C compilers, such as gcc and clang, can perform tail call optimization (TCO). But not all calls that are in tail position (using an intuitive notion of what tail position means in C) will be subject to TCO. The documentation for these compilers is obscure about which calls are eligible for TCO. That's disappointing if you wish to write C code which exploits this optimization.

One reason for this obscurity might be a feature of the C language that can prevent TCO even when a call is syntactically in a tail position. Consider a called function that accesses local variables of the calling function via a pointer, e.g.:

void f(void)
{
    int x = 42;
    g(&x);
}

void g(int *p)
{
    printf("%d\n", *p);
}

In this example, TCO cannot be applied to the call to g, because that would have the result that f's local variables are no longer available (having been cleaned off the stack). But the behaviour of this C code is well defined: g should be able to dereference the pointer to access the value stored in x.

That is a trivial example. But the issue doesn't only arise when pointers directly passed to a call in tail position. A pointer to a local variable of a calling function might be exposed through a less obvious route, such as a global variable or the heap. So if a pointer is taken to a local variable anywhere in the calling function, and that local variable remains in-scope at the site of a potential tail call, it might prevent TCO:

void f(void)
{
    int x = 42;

    global_var = &x;
    ...

    /* The compiler cannot perform TCO here,
     * unless it can establish that g does not
     * dereference the pointer in global_var. */
    g();
}

As the comment suggests, it's possible that the compiler can perform some analysis to establish that the called function does not in fact dereference the pointer to the local variable. But given the compilation model typically used by C compilers, it is optimistic to expect them to perform such analysis.

But perhaps there is a way to avoid this issue: If the programmer really wants the call to g to be eligible for TCO, they can make it explicit that the lifetime of x does not overlap the call by introducing a nested scope:

void f(void)
{
    {
        int x = 42;

        global_var = &x;
        ...
    }

    g();
}

Unfortunately, this does not have the desired effect for gcc (4.8.2) and clang (3.3). I have written a simple test suite to explore the TCO capabilities of gcc and clang, and it demonstrates that even with the nested scope, taking the pointer to x defeats TCO for f.

(In fact, even if the contents of the nested scope are hoisted into an inline function called from f, that is still sufficient to contaminate f and prevent TCO, in both gcc and clang.)

I'm not aware of other unrelated features of the C language that can pose an obstacle to TCO. But there are implementation issues in gcc and clang that can prevent TCO. That will be the subject of a future post.