Question

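Sometimes a compiler can optimize a piece of code better when an invariant is passed to a templated internal implementation. For example, if an image has a known number of channels, then instead of doing the following: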

void Image::doOperation() {
    for (unsigned int i = 0; i < numPixels; i++) {
        for (unsigned int j = 0; j < mChannels; j++) {
            // ...
        }
    }
}

you can do this:

template<unsigned int c> void Image::doOperationInternal() {
    for (unsigned int i = 0; i < numPixels; i++) {
        for (unsigned int j = 0; j < c; j++) {
            // ...
        }
    }
}

void Image::doOperation() {
    switch (mChannels) {
        case 1: doOperationInternal<1>(); break;
        case 2: doOperationInternal<2>(); break;
        case 3: doOperationInternal<3>(); break;
        case 4: doOperationInternal<4>(); break;
    }
}

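This lets the compiler generate a differently unrolled loop for each channel count (which can greatly improve run-time efficiency and can also open up further optimizations such as SIMD instructions).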
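For illustration, here is a minimal self-contained sketch of the pattern above. The Image class, its interleaved float layout, and the sum operation are hypothetical stand-ins, not the real class from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical image with interleaved channels; every sample starts at 1.0f.
class Image {
public:
    Image(unsigned int channels, std::size_t numPixels)
        : mChannels(channels), mData(numPixels * channels, 1.0f) {}

    // Public entry point: one runtime switch picks the instantiation.
    float sum() const {
        switch (mChannels) {
            case 1: return sumInternal<1>();
            case 2: return sumInternal<2>();
            case 3: return sumInternal<3>();
            case 4: return sumInternal<4>();
            default: throw std::range_error("Unhandled channel count");
        }
    }

private:
    // C is a compile-time constant, so the inner loop has a fixed trip
    // count that the compiler is free to unroll.
    template <unsigned int C>
    float sumInternal() const {
        assert(mData.size() % C == 0);
        const std::size_t numPixels = mData.size() / C;
        float total = 0.0f;
        for (std::size_t i = 0; i < numPixels; ++i) {
            for (unsigned int j = 0; j < C; ++j) {
                total += mData[i * C + j];
            }
        }
        return total;
    }

    unsigned int mChannels;
    std::vector<float> mData;
};
```

Calling Image(3, 4).sum() goes through the switch once and then runs the loop specialized for three channels.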

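However, this often expands into some rather large case statements, and every method optimized this way must carry its own unrolled case statement. So, let's say we instead have an enum Format of known image formats (where the enum's values happen to map to the channel count). Since the enum only has a certain range of known values, there is a temptation to try this: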

template<Image::Format f> void Image::doOperationInternal() {
    for (unsigned int i = 0; i < numPixels; i++) {
        for (unsigned int j = 0; j < static_cast<unsigned int>(f); j++) {
            // ...
        }
    }
}

void Image::doOperation() {
    const Format f = mFormat;
    doOperationInternal<f>(); // does not compile: f is not a constant expression
}

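In this case, however, the compiler (rightly) complains that f is not a constant expression, even though it has only a finite range and, in theory, the compiler could generate the switch logic to cover all of the enumerated values.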
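The distinction the compiler is drawing can be seen in a small sketch. The Format enum and the function names below are hypothetical; the point is only that const is not the same as "known at compile time".

```cpp
#include <cassert>

// Hypothetical format enum whose values equal the channel count.
enum Format { Gray = 1, GrayAlpha = 2, RGB = 3, RGBA = 4 };

template <Format F>
unsigned int channelCount() { return static_cast<unsigned int>(F); }

unsigned int fromConstant() {
    const Format f = RGB;        // initialized from a constant expression
    return channelCount<f>();    // OK: f is usable as a template argument
}

unsigned int fromRuntime(Format runtimeFormat) {
    const Format f = runtimeFormat;  // const, but the value is only known at run time
    // return channelCount<f>();     // error: f is not a constant expression
    assert(f >= Gray && f <= RGBA);
    switch (f) {                     // so the mapping has to be spelled out somewhere
        case Gray:      return channelCount<Gray>();
        case GrayAlpha: return channelCount<GrayAlpha>();
        case RGB:       return channelCount<RGB>();
        case RGBA:      return channelCount<RGBA>();
    }
    return 0; // not reached for valid formats
}
```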

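So, my question is: is there an alternative approach that lets the compiler generate invariant-value-optimized code without a switch-case explosion for every function call?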

Answer 1

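Make a jump table array, then invoke it. The goal is to create an array of the various functions, then do an array lookup and call the one you want.

First, I'll use C++11. C++1y has its own integer sequence types and makes auto return types easy to write; in C++11 the functions here will just return void.

Our functor class looks like this: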

#include <iostream>

struct example_functor {
  template<unsigned N>
  static void action(double d) { // static member functions cannot be const-qualified
    std::cout << N << ":" << d << "\n"; // or whatever, N is a compile time constant
  }
};

In C++11 we need some boilerplate:

template<unsigned...> struct indexes {};
template<unsigned Max, unsigned... Is> struct make_indexes:make_indexes< Max-1, Max-1, Is... > {};
template<unsigned... Is> struct make_indexes<0, Is...>:indexes<Is...> {};

This creates index packs that we can pattern-match against.

The interface looks like:

template<typename Functor, unsigned Max, typename... Ts>
void invoke_jump( unsigned index, Ts&&... ts );

and is called like:

invoke_jump<example_functor, 10>( 7, 3.14 );

We first create a helper:

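// Thin wrapper so that every table entry has the same type, void(*)(Ts&&...).
template<typename Functor, unsigned I, typename... Ts>
void invoke_one( Ts&&... ts ) {
  Functor::template action<I>( std::forward<Ts>(ts)... );
}
template<typename Functor, unsigned... Is, typename... Ts>
void do_invoke_jump( unsigned index, indexes<Is...>, Ts&&... ts ) {
  typedef void(*fn_t)(Ts&&...); // "static auto table[]" is not valid C++
  static const fn_t table[]={ &invoke_one<Functor, Is, Ts...>... };
  table[index]( std::forward<Ts>(ts)... );
}
template<typename Functor, unsigned Max, typename... Ts>
void invoke_jump( unsigned index, Ts&&... ts ) {
  // Functor cannot be deduced, so it is passed explicitly.
  do_invoke_jump<Functor>( index, make_indexes<Max>(), std::forward<Ts>(ts)... );
}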

This creates a static table of Functor::action instantiations, then looks up the requested entry and calls it.
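As a rough sketch of how the same idea might look with the standard integer sequence type mentioned earlier, the version below assumes a C++14 compiler and uses std::index_sequence in place of the hand-written index pack. The names call_one, dispatch_impl, dispatch, scale_by_index, and scaled are hypothetical; the bounds check is an addition that the answer's code does not have.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Wrapper giving every table entry the same signature, void(Ts&&...).
template <typename Functor, std::size_t I, typename... Ts>
void call_one(Ts&&... ts) {
    Functor::template action<I>(std::forward<Ts>(ts)...);
}

template <typename Functor, std::size_t... Is, typename... Ts>
void dispatch_impl(std::size_t index, std::index_sequence<Is...>, Ts&&... ts) {
    using fn_t = void (*)(Ts&&...);
    // One entry per index; the table is built once per instantiation.
    static const fn_t table[] = { &call_one<Functor, Is, Ts...>... };
    assert(index < sizeof...(Is));
    table[index](std::forward<Ts>(ts)...);
}

template <typename Functor, std::size_t Max, typename... Ts>
void dispatch(std::size_t index, Ts&&... ts) {
    dispatch_impl<Functor>(index, std::make_index_sequence<Max>(),
                           std::forward<Ts>(ts)...);
}

// Hypothetical functor: N is a compile-time constant inside action.
struct scale_by_index {
    template <std::size_t N>
    static void action(int& out, int x) { out = static_cast<int>(N) * x; }
};

// Runtime index in, result of the matching instantiation out.
int scaled(std::size_t index, int x) {
    int out = 0;
    dispatch<scale_by_index, 8>(index, out, x);
    return out;
}
```

Here scaled(3, 5) looks up entry 3 in the table and calls scale_by_index::action<3>, which multiplies by the constant 3.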

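In C++03 we don't have the ... syntax, so we have to do more by hand, and there is no perfect forwarding. What I'll do is build a std::vector table instead.

First, a cute little program that runs Functor.action<I>() for me, in order, for each I in [Begin, End):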

template<unsigned Begin, unsigned End, typename Functor>
struct ForEach:ForEach<Begin, End-1, Functor> {
  ForEach(Functor& functor):
    ForEach<Begin, End-1, Functor>(functor)
  {
    functor.template action<End-1>(); // functor is a reference, so use '.', not '->'
  }
};
template<unsigned Begin, typename Functor>
struct ForEach<Begin,Begin,Functor> {
  ForEach(Functor&) {} // base case: the derived constructors chain to this one
};

I admit it is overly cute (the chain of calls is created implicitly by the constructor dependencies).

Then we use it to build the vector.

template<typename Signature, typename Functor>
struct PopulateVector {
  std::vector< Signature* >* target; // change the signature here to whatever you want
  PopulateVector(std::vector< Signature* >* t):target(t) {}
  template<unsigned I>
  void action() {
    target->push_back( &(Functor::template action<I>) );
  }
};

Then we can tie the two together:

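template<typename Signature, typename Functor, unsigned Max>
std::vector< Signature* > make_table() {
  std::vector< Signature* > retval;
  retval.reserve(Max);
  PopulateVector<Signature, Functor> worker(&retval);
  // ForEach needs all three template arguments, and the object must be named:
  // "ForEach<...>( worker );" would parse as a declaration of worker.
  ForEach<0, Max, PopulateVector<Signature, Functor> > run( worker ); // runtime work basically done on this line
  (void)run;
  return retval;
}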

This builds our jump table as a std::vector.

Then we can easily call the I-th element of the jump table.

struct example_functor {
  template<unsigned I>
  static void action() {
    std::cout << I << "\n";
  }
};
void test( unsigned i ) {
  static std::vector< void(*)() > table = make_table< void(), example_functor, 100 >();
  if (i < 100)
    table[i]();
}

When passed an integer i, this prints it followed by a newline.

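The signature of the functions in the table can be whatever you want, so you could pass in a pointer to a type and call a method on it, with I as a compile-time constant. The action method must be static, but it can call non-static methods on its arguments.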
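A minimal sketch of that last point, assuming a hypothetical Buffer type: the static action receives a pointer and forwards to a non-static member template. To keep the example short, the four-entry table is written out by hand rather than generated by the machinery above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical pixel buffer; the member names are illustrative only.
struct Buffer {
    std::vector<int> samples;  // interleaved channel samples

    // Non-static member that wants the channel count as a compile-time constant.
    template <unsigned C>
    int sumFirstPixel() const {
        int total = 0;
        for (unsigned j = 0; j < C; ++j) total += samples[j];
        return total;
    }
};

struct first_pixel_sum {
    // Static, as the table requires, but it calls a non-static member
    // on the object it is handed.
    template <unsigned I>
    static void action(const Buffer* buffer, int* out) {
        *out = buffer->sumFirstPixel<I>();
    }
};

int sumFirstPixel(const Buffer& buffer, unsigned channels) {
    typedef void (*fn_t)(const Buffer*, int*);
    // Entry k handles k + 1 channels.
    static const fn_t table[] = {
        &first_pixel_sum::action<1>, &first_pixel_sum::action<2>,
        &first_pixel_sum::action<3>, &first_pixel_sum::action<4>
    };
    assert(channels >= 1 && channels <= 4);
    assert(buffer.samples.size() >= channels);
    int out = 0;
    table[channels - 1](&buffer, &out);
    return out;
}
```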

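The big differences in C++03 are that you need different code for different jump-table signatures, and a lot of machinery (plus a std::vector instead of a static array) to build the jump table.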

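When doing serious image processing, you will want your scanline functions generated this way, with the per-pixel operation possibly embedded somewhere inside the generated scanline function. Doing the jump dispatch once per scanline is usually fast enough, unless your image is 1 pixel wide and 1 billion pixels tall.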
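One way this could look, as a sketch only: the scanline function is the template, the per-pixel work lives inside it, and the caller makes one indirect call per row. The function names and the 8-bit invert operation are hypothetical, and the function pointer is looked up once per image here rather than once per row.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Processes one scanline. C is fixed at compile time, so the per-pixel
// loop has a constant trip count.
template <unsigned C>
void invertScanline(unsigned char* row, std::size_t width) {
    for (std::size_t x = 0; x < width; ++x) {
        for (unsigned j = 0; j < C; ++j) {
            row[x * C + j] = static_cast<unsigned char>(255 - row[x * C + j]);
        }
    }
}

typedef void (*ScanlineFn)(unsigned char*, std::size_t);

// The only place that maps a runtime channel count to an instantiation.
ScanlineFn pickInvertScanline(unsigned channels) {
    switch (channels) {
        case 1: return &invertScanline<1>;
        case 2: return &invertScanline<2>;
        case 3: return &invertScanline<3>;
        case 4: return &invertScanline<4>;
        default: throw std::range_error("Unhandled channel count");
    }
}

void invertImage(std::vector<unsigned char>& pixels, std::size_t width,
                 std::size_t height, unsigned channels) {
    assert(pixels.size() == width * height * channels);
    const ScanlineFn fn = pickInvertScanline(channels);
    for (std::size_t y = 0; y < height; ++y) {
        fn(pixels.data() + y * width * channels, width);  // one indirect call per scanline
    }
}
```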

The code above still needs to be audited for correctness: it was written without being compiled.

Answer 2

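Yakk's C++11/1y technique is great, but if the C++03 version is a bit too much template trickery for you, here is a simpler, less elegant version that at least avoids copying and pasting switch statements and leaves you with only one switch statement to maintain: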

#include<iostream>

using namespace std;

struct Foo {
    template<unsigned int c>
    static void Action() {
        std::cout << "c: " << c << endl;
    }
};

template<typename F>
void Dispatch(unsigned int c) {
    switch (c) {
    case 1: F::template Action<1>(); break; // 'template' is required because F is a dependent type
    case 2: F::template Action<2>(); break;
    case 3: F::template Action<3>(); break;
    }
}

int main() {
    for (int i = 0; i < 4; ++i)
        Dispatch<Foo>(i);
}

Answer 3

Just for completeness, here is the (interim) solution I have been using in the meantime:

#define DISPATCH_TEMPLATE_CALL(func, args) do { \
    switch (mChannels) { \
    case 1: func<1> args; break; \
    case 2: func<2> args; break; \
    case 3: func<3> args; break; \
    case 4: func<4> args; break; \
    default: throw std::range_error("Unhandled format"); \
    } \
} while (0)

template<unsigned int n> void Image::doSomethingInternal(a, b, c) {
    // ...
}

void Image::doSomething(a, b, c) {
    DISPATCH_TEMPLATE_CALL(doSomethingInternal, (a, b, c));
}

This is obviously not a great way to do it. But it works.
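If C++14 is available, one possible macro-free variant is to keep the single switch in a function template and pass the channel count to a generic lambda as a std::integral_constant. This is only a sketch: dispatchChannels, bytesPerPixel, and bytesForChannels are hypothetical names, and bytesPerPixel stands in for a real templated internal function.

```cpp
#include <cassert>
#include <stdexcept>
#include <type_traits>
#include <utility>

// Calls f with std::integral_constant<unsigned, N>, where N == channels.
template <typename F>
void dispatchChannels(unsigned channels, F&& f) {
    switch (channels) {
        case 1: std::forward<F>(f)(std::integral_constant<unsigned, 1>()); break;
        case 2: std::forward<F>(f)(std::integral_constant<unsigned, 2>()); break;
        case 3: std::forward<F>(f)(std::integral_constant<unsigned, 3>()); break;
        case 4: std::forward<F>(f)(std::integral_constant<unsigned, 4>()); break;
        default: throw std::range_error("Unhandled format");
    }
}

// Stand-in for a templated internal function.
template <unsigned N>
unsigned bytesPerPixel() { return N; }

unsigned bytesForChannels(unsigned channels) {
    unsigned result = 0;
    dispatchChannels(channels, [&](auto n) {
        // decltype(n)::value is a compile-time constant inside the lambda.
        result = bytesPerPixel<decltype(n)::value>();
    });
    return result;
}
```

The switch is still there, but it is written once, and each call site supplies only the body that needs the constant.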

Making templated optimizations more maintainable
