Question

I wanted to check whether g++ supports tail calls, so I wrote this simple program to test it: http://ideone.com/hnXHv

#include <cstddef>
#include <iostream>
#include <string>

using namespace std;

size_t st;

void PrintStackTop(const std::string &type)
{
    int stack_top;
    if(st == 0) st = (size_t) &stack_top;
    cout << "In " << type << " call version, the stack top is: " << (st - (size_t) &stack_top) << endl;
}

int TailCallFactorial(int n, int a = 1)
{
    PrintStackTop("tail");
    if(n < 2)
        return a;
    return TailCallFactorial(n - 1, n * a);
}

int NormalCallFactorial(int n)
{
    PrintStackTop("normal");
    if(n < 2)
        return 1;
    return NormalCallFactorial(n - 1) * n;
}


int main(int argc, char *argv[])
{
    st = 0;
    cout << TailCallFactorial(5) << endl;
    st = 0;
    cout << NormalCallFactorial(5) << endl;
    return 0;
}

When I compile it normally, g++ doesn't seem to notice any difference between the two versions:

> g++ main.cpp -o TailCall
> ./TailCall
In tail call version, the stack top is: 0
In tail call version, the stack top is: 48
In tail call version, the stack top is: 96
In tail call version, the stack top is: 144
In tail call version, the stack top is: 192
120
In normal call version, the stack top is: 0
In normal call version, the stack top is: 48
In normal call version, the stack top is: 96
In normal call version, the stack top is: 144
In normal call version, the stack top is: 192
120

The stack difference is 48 in both, even though the tail call version needs one more int. (Why?)
So I thought optimization might help:

> g++ -O2 main.cpp -o TailCall
> ./TailCall
In tail call version, the stack top is: 0
In tail call version, the stack top is: 80
In tail call version, the stack top is: 160
In tail call version, the stack top is: 240
In tail call version, the stack top is: 320
120
In normal call version, the stack top is: 0
In normal call version, the stack top is: 64
In normal call version, the stack top is: 128
In normal call version, the stack top is: 192
In normal call version, the stack top is: 256
120

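The stack usage grew in both cases. The compiler might think my CPU is slower than my memory (which it isn't anyway), but I don't see why a simple function needs 80 bytes. (Why?) The tail call version also takes more space than the normal one, which would make perfect sense if an int were 16 bytes. (No, I don't own a 128-bit CPU.)

Thinking about what reason the compiler could have for not doing the tail call, I suspected exceptions, since they depend tightly on the stack. So I tried without exceptions: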

> g++ -O2 -fno-exceptions main.cpp -o TailCall
> ./TailCall
In tail call version, the stack top is: 0
In tail call version, the stack top is: 64
In tail call version, the stack top is: 128
In tail call version, the stack top is: 192
In tail call version, the stack top is: 256
120
In normal call version, the stack top is: 0
In normal call version, the stack top is: 48
In normal call version, the stack top is: 96
In normal call version, the stack top is: 144
In normal call version, the stack top is: 192
120

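This brought the normal version back to the unoptimized stack size, while the optimized tail call version uses 8 bytes more than that. An int still isn't 8 bytes. I thought I was missing something in C++ that needs the stack, so I tried C: http://ideone.com/tJPpc
Still no tail call, but the stack is much smaller (32 bytes per frame in both versions). Then I tried optimization: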

> gcc -O2 main.c -o TailCall
> ./TailCall
In tail call version, the stack top is: 0
In tail call version, the stack top is: 0
In tail call version, the stack top is: 0
In tail call version, the stack top is: 0
In tail call version, the stack top is: 0
120
In normal call version, the stack top is: 0
In normal call version, the stack top is: 0
In normal call version, the stack top is: 0
In normal call version, the stack top is: 0
In normal call version, the stack top is: 0
120

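Not only did it tail-call-optimize the first version, it tail-call-optimized the second one as well!
Why doesn't g++ do tail call optimization when it is clearly available on the platform? Is there any way to force it?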

Answer 1

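Because you are passing a temporary std::string object to the PrintStackTop(const std::string &) function. This object is allocated on the stack, which prevents tail call optimization.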

I modified your code:

void PrintStackTopStr(char const*const type)
{
    int stack_top;
    if(st == 0) st = (size_t) &stack_top;
    cout << "In " << type << " call version, the stack top is: " << (st - (size_t) &stack_top) << endl;
}

int RealTailCallFactorial(int n, int a = 1)
{
    PrintStackTopStr("tail");
    if(n < 2)
        return a;
    return RealTailCallFactorial(n - 1, n * a);
}

Compiled with: g++ -O2 -fno-exceptions -o tailcall tailcall.cpp

Now it uses tail call optimization. You can see it in action if you generate the assembly with the -S flag:

L39:
        imull   %ebx, %esi
        subl    $1, %ebx
L38:
        movl    $LC2, (%esp)
        call    __Z16PrintStackTopStrPKc
        cmpl    $1, %ebx
        jg      L39

You can see that the recursive call has been turned into a loop (jg L39).
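For readers who don't read x86 assembly, here is a hypothetical hand-written C++ equivalent of what that loop appears to do. It assumes, from the listing, that %ebx holds n and %esi holds the accumulator a; the name LoopFactorial is made up, and the per-call PrintStackTopStr("tail") is left out so the sketch is self-contained:

```cpp
#include <cassert>

// Hypothetical equivalent of RealTailCallFactorial after tail call
// elimination: the self-call `return RealTailCallFactorial(n - 1, n * a);`
// becomes "update the parameters, then jump back to the top".
int LoopFactorial(int n, int a = 1)
{
    assert(n >= 0);           // factorial is only meaningful for n >= 0
    for (;;) {
        // PrintStackTopStr("tail") would be called here, once per iteration
        if (n < 2)            // cmpl $1, %ebx / jg L39: loop while n > 1
            return a;
        a = n * a;            // imull %ebx, %esi  (a *= n, using the old n)
        n = n - 1;            // subl  $1, %ebx    (n -= 1)
    }
}
```

Note that a must be updated with the old n before n is decremented, exactly as the recursive call evaluates n * a with the caller's n.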

Answer 2

I didn't find the other answer satisfying, because a local object has no effect on the stack once it's gone.

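Here is a good article which mentions that the lifetime of local objects extends into the tail-called function. Tail call optimization requires destroying the locals before relinquishing control, and GCC will not apply it unless it is sure that nothing accessed by the tail call refers to a local object.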
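To make that rule concrete, here is a hypothetical sketch (the function name is illustrative, not from the article). Each call passes the address of its own local to the next call, which reads it. If the compiler reused the frame with a jump, the new `current` would overwrite the very slot that `previous` points to, so as written this syntactically-tail call generally has to stay a real call:

```cpp
#include <cassert>

// Hypothetical illustration: the callee reads a local of its caller,
// so the caller's frame must stay alive across the "tail" call.
int ReadPreviousLocal(int n, const int* previous)
{
    assert(previous != nullptr);
    int current = n;                 // lives in this call's stack frame
    if (n == 0)
        return *previous;            // reads a local of the *calling* frame
    return ReadPreviousLocal(n - 1, &current);  // tail position, but &current escapes
}
```

With n >= 1 the result is 1 (the `current` of the last non-zero call); with n == 0 it simply returns *previous.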

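Lifetime analysis is hard, though, and GCC's looks overly conservative. Even if a local's lifetime (scope) ends before the tail call, setting a global pointer to refer to that local disables TCO: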

{
    int x;
    static int * p;
    p = & x;
} // x is dead here, but the enclosing function still has TCO disabled.

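This still doesn't model what is happening here, so I looked further and found another bug: passing a local as an argument to a parameter whose type has a user-defined or non-trivial destructor also disables TCO. (Declaring the destructor = delete allows TCO.)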

std::string has a non-trivial destructor, so that is what causes the problem here.

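The workaround is to do these things inside a nested function call, because then lifetime analysis can tell that the object is dead by the time of the tail call. There is no need to give up on C++ objects altogether.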
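A minimal sketch of both the problem pattern and the nested-function workaround, assuming a hypothetical Tracer type as a stand-in for std::string (it has a user-defined destructor and counts destructions so the behaviour is observable). Whether GCC actually emits a loop for either function depends on the compiler version and flags, so check with -O2 -S rather than trusting the comments. `[[gnu::noinline]]` is a GCC/Clang attribute used here only to keep the helper calls visible in the assembly:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for std::string: a type with a user-defined
// (non-trivial) destructor that counts how many objects were destroyed.
struct Tracer {
    static int destroyed;
    std::string label;
    explicit Tracer(const char* l) : label(l) {}
    ~Tracer() { ++destroyed; }
};
int Tracer::destroyed = 0;

// Total characters seen by Use(); gives the call an observable side effect.
std::size_t g_chars_seen = 0;

// Takes its argument by const reference, like PrintStackTop(const std::string &).
[[gnu::noinline]] void Use(const Tracer& t) { g_chars_seen += t.label.size(); }

// Problem pattern: the temporary Tracer is materialized in this function's
// own frame and its address is passed to Use(); per the answers above, this
// is the shape that keeps GCC from eliminating the recursive tail call.
int FactorialDirect(int n, int a = 1)
{
    assert(n >= 0);
    Use(Tracer("tail"));    // temporary destroyed at the end of this statement
    if (n < 2)
        return a;
    return FactorialDirect(n - 1, n * a);
}

// Workaround pattern: the temporary is created and destroyed inside a
// separate helper, so FactorialWrapped's frame holds no object with a
// non-trivial destructor and no escaped address.
[[gnu::noinline]] void UseTail() { Use(Tracer("tail")); }

int FactorialWrapped(int n, int a = 1)
{
    assert(n >= 0);
    UseTail();
    if (n < 2)
        return a;
    return FactorialWrapped(n - 1, n * a);
}
```

Both functions compute the same result and destroy one Tracer per call; only where the temporary lives differs.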

Answer 3

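The original code with the temporary std::string object is still tail recursive, since the destructor of that object runs immediately after PrintStackTop("tail"); returns, so nothing remains to be executed after the recursive return statement.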

However, there are two issues that confuse tail call optimization (TCO):

- the argument is passed by reference to the PrintStackTop function
- the non-trivial destructor of std::string

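It can be verified with a custom class that each of these two issues is enough on its own to break TCO. As @Potatoswatter noted in the previous answer, there are workarounds for both. Wrapping the call to PrintStackTop in another function is enough to help the compiler perform TCO, even with the temporary std::string: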

void PrintStackTopTail()
{
    PrintStackTop("tail");
}
int TailCallFactorial(int n, int a = 1)
{
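    PrintStackTopTail();  // the std::string temporary is created and destroyed inside the helper
    if(n < 2)
        return a;
    return TailCallFactorial(n - 1, n * a);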
}

Note that limiting the scope by enclosing the call in braces, { PrintStackTop("tail"); }, is not enough. It must be placed in a separate function.

With g++ 4.7.2 (compile option -O2) you can now verify that the tail recursion is replaced by a loop.

A similar issue is discussed in the question "Pass-by-reference hinders gcc from tail call elimination".

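Note that printing (st - (size_t) &stack_top) is not enough to be sure that TCO was performed. For example, with the optimization option -O3 the function TailCallFactorial is inlined into itself five times, so TailCallFactorial(5) executes as a single function call; the problem only shows up for larger argument values (for example TailCallFactorial(15);). The reliable way to verify TCO is to inspect the assembly output generated with the -S flag.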

Why doesn't g++ do tail call optimization while gcc does?
