Question

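I am working on abstractions for a family of optimization algorithms. These algorithms can run either serially or multi-threaded, using locking mechanisms or atomic operations.

I have a question about perfect forwarding in the multi-threaded versions of the algorithms. Say I have a functor that I do not want to copy because it is expensive. I can make sure that the functor is static, in the sense that a call to its operator()(...) does not change the state of the object. Here is one such dummy functor: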

#include <algorithm>
#include <functional> // std::ref
#include <iostream>
#include <iterator>
#include <thread>
#include <utility> // std::move, std::forward
#include <vector>

template <class value_t> struct WeightedNorm {
  WeightedNorm() = default;
  WeightedNorm(std::vector<value_t> w) : w{std::move(w)} {}

  template <class Container> value_t operator()(Container &&c) const & {
    std::cout << "lvalue version with w: " << w[0] << ',' << w[1] << '\n';
    value_t result{0};
    std::size_t idx{0};
    auto begin = std::begin(c);
    auto end = std::end(c);
    while (begin != end) {
      result += w[idx++] * *begin * *begin;
      *begin++ /* += 1 */; // <-- we can also modify
    }
    return result; /* well, return std::sqrt(result), to be precise */
  }

  template <class Container> value_t operator()(Container &&c) const && {
    std::cout << "rvalue version with w: " << w[0] << ',' << w[1] << '\n';
    value_t result{0};
    std::size_t idx{0};
    auto begin = std::begin(c);
    auto end = std::end(c);
    while (begin != end) {
      result += w[idx++] * *begin * *begin;
      *begin++ /* += 1 */; // <-- we can also modify
    }
    return result; /* well, return std::sqrt(result), to be precise */
  }

private:
  std::vector<value_t> w;
};

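This functor might also have ref-qualifiers on its member functions, as shown above (although here the two overloads do not differ from each other). In addition, the function object is allowed to modify its input c. To perfect-forward this functor properly to the worker threads in the algorithm, I came up with the following: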

template <class value_t> struct algorithm {
  algorithm() = default;
  algorithm(const unsigned int nthreads) : nthreads{nthreads} {}

  template <class InputIt> void initialize(InputIt begin, InputIt end) {
    x = std::vector<value_t>(begin, end);
  }

  template <class Func> void solve_ref_1(Func &&f) {
    std::vector<std::thread> workers(nthreads);
    for (auto &worker : workers)
      worker = std::thread(&algorithm::kernel<decltype((f)), decltype(x)>, this,
                           std::ref(f), x);
    for (auto &worker : workers)
      worker.join();
  }

  template <class Func> void solve_ref_2(Func &&f) {
    auto &xlocal = x;
    std::vector<std::thread> workers(nthreads);
    for (auto &worker : workers)
      worker = std::thread([&, xlocal]() mutable { kernel(f, xlocal); });
    for (auto &worker : workers)
      worker.join();
  }

  template <class Func> void solve_forward_1(Func &&f) {
    std::vector<std::thread> workers(nthreads);
    for (auto &worker : workers)
      worker = std::thread(
          &algorithm::kernel<decltype(std::forward<Func>(f)), decltype(x)>,
          this, std::ref(f), x); /* this is compilation error */
    for (auto &worker : workers)
      worker.join();
  }

  template <class Func> void solve_forward_2(Func &&f) {
    auto &xlocal = x;
    std::vector<std::thread> workers(nthreads);
    for (auto &worker : workers)
      worker = std::thread(
          [&, xlocal]() mutable { kernel(std::forward<Func>(f), xlocal); });
    for (auto &worker : workers)
      worker.join();
  }

private:
  template <class Func, class Container> void kernel(Func &&f, Container &&c) {
    std::forward<Func>(f)(std::forward<Container>(c));
  }

  std::vector<value_t> x;
  unsigned int nthreads{std::thread::hardware_concurrency()};
};

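Basically, what I had in mind when writing the above is that algorithm::solve_ref_1 and algorithm::solve_ref_2 differ from each other only in the use of a lambda function. In the end, both of them call kernel with an lvalue reference to f and an lvalue reference to x, where x is copied into each thread, either because of how std::thread works or because xlocal is captured by copy in the lambda. Is this correct? Should I be careful when choosing one over the other?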
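To make the "no copies of f" claim checkable, here is a minimal sketch. CountingFunctor, g_copies and the three helper functions are hypothetical names introduced only for this illustration; they are not part of the code above. The sketch assumes that counting copy-constructor calls is a fair proxy for "the functor was copied".

```cpp
#include <atomic>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical counter: incremented every time the functor is copied.
std::atomic<int> g_copies{0};

// Hypothetical stand-in for an expensive functor.
struct CountingFunctor {
  CountingFunctor() = default;
  CountingFunctor(const CountingFunctor &) { ++g_copies; }
  void operator()(const std::vector<double> &) const {}
};

// Variant 1: std::ref wraps f, so each thread only stores a reference.
int copies_with_std_ref(unsigned nthreads) {
  CountingFunctor f;
  std::vector<double> x{1.0, 2.0};
  g_copies = 0;
  std::vector<std::thread> workers;
  for (unsigned i = 0; i < nthreads; ++i)
    workers.emplace_back(std::ref(f), x); // x is copied, f is not
  for (auto &worker : workers)
    worker.join();
  return g_copies;
}

// Variant 2: the lambda captures f by reference and x by copy.
int copies_with_lambda(unsigned nthreads) {
  CountingFunctor f;
  std::vector<double> x{1.0, 2.0};
  g_copies = 0;
  std::vector<std::thread> workers;
  for (unsigned i = 0; i < nthreads; ++i)
    workers.emplace_back([&f, x] { f(x); });
  for (auto &worker : workers)
    worker.join();
  return g_copies;
}

// Contrast: passing f directly makes std::thread store its own copy.
int copies_by_value(unsigned nthreads) {
  CountingFunctor f;
  std::vector<double> x{1.0, 2.0};
  g_copies = 0;
  std::vector<std::thread> workers;
  for (unsigned i = 0; i < nthreads; ++i)
    workers.emplace_back(f, x); // f is copied into each thread
  for (auto &worker : workers)
    worker.join();
  return g_copies;
}
```

Under these assumptions, both reference-based variants should report zero copies, while the by-value variant reports at least one copy per thread. In both reference-based variants f must outlive the threads, which holds here because they are joined before f goes out of scope.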

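So far, I have not been able to achieve what I wanted. I made no unnecessary copies of f, but I did not respect its ref-qualifiers either. Then I thought of forwarding f to kernel. I could not find a way to make algorithm::solve_forward_1 compile, because the std::ref overload for rvalue references is deleted. However, algorithm::solve_forward_2, which uses the lambda approach, seems to work. By "seems to work", I mean that the following main program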

int main(int argc, char *argv[]) {
  std::vector<double> x{1, 2};
  algorithm<double> alg(2);
  alg.initialize(std::begin(x), std::end(x));

  alg.solve_ref_1(WeightedNorm<double>{{1, 2}});
  alg.solve_ref_2(WeightedNorm<double>{{1, 2}});
  // alg.solve_forward_1(WeightedNorm<double>{{1, 2}});
  alg.solve_forward_2(WeightedNorm<double>{{1, 2}});

  return 0;
}

compiles and prints the following:

./main.out
lvalue version with w: 1,2
lvalue version with w: 1,2
lvalue version with w: 1,2
lvalue version with w: 1,2
rvalue version with w: 1,2
rvalue version with w: 1,2
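As a side note on the deleted std::ref overload for rvalues mentioned earlier, the following sketch shows one way to observe it at compile time. The trait can_std_ref and the type Functor are hypothetical helpers written for this illustration, and the code assumes C++17 for std::void_t.

```cpp
#include <functional>
#include <type_traits>
#include <utility>

// Hypothetical minimal functor used only for the check below.
struct Functor {
  void operator()() const {}
};

// Primary template: std::ref(std::declval<T>()) is not a valid expression.
template <class T, class = void>
struct can_std_ref : std::false_type {};

// Specialization: chosen only when the std::ref call is well-formed.
template <class T>
struct can_std_ref<T, std::void_t<decltype(std::ref(std::declval<T>()))>>
    : std::true_type {};

// An lvalue can be wrapped in a std::reference_wrapper ...
static_assert(can_std_ref<Functor &>::value, "lvalues can be wrapped");
// ... but an rvalue selects the deleted overload, so the call is rejected.
static_assert(!can_std_ref<Functor &&>::value, "rvalues are rejected");
```

The deletion is deliberate: a std::reference_wrapper bound to a temporary would dangle as soon as the temporary is destroyed.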

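In short, I have two main questions:

1. Is there any reason why I should prefer the lambda function version (or vice versa)?
2. Is perfect-forwarding the functor f more than once allowed/OK in my situation?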

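I ask the above because, in the answer to another question, the author says:

However, you cannot forward something more than once, because that makes no sense. Forwarding means that you are potentially moving the argument all the way through to the final caller, and once it is moved it is gone, so you cannot use it again.

I think that, in my case, I am not moving anything; I am only trying to respect the ref-qualifiers. In the output of my main program I can see that w has the correct values in the rvalue version, i.e., 1,2, but that does not mean I am not invoking undefined behavior, for example by trying to access the values of a vector that has already been moved from.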
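The distinction drawn above, between selecting an overload and actually moving, can be sketched as follows. Probe, call_forwarded and forward_twice are hypothetical names for this illustration. The sketch assumes, as in WeightedNorm, that both overloads are const and never move from the member w.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for WeightedNorm: reports which overload ran.
struct Probe {
  std::vector<double> w{1.0, 2.0};
  std::string operator()() const & { return "lvalue"; }
  std::string operator()() const && { return "rvalue"; }
};

// std::forward is only a cast: it restores the value category of f,
// which picks the ref-qualified overload, but it moves nothing by itself.
template <class Func> std::string call_forwarded(Func &&f) {
  return std::forward<Func>(f)();
}

// Forwards the same functor twice, then reports how many weights remain.
template <class Func> std::size_t forward_twice(Func &&f) {
  std::forward<Func>(f)();
  std::forward<Func>(f)();
  return f.w.size();
}
```

With these definitions, call_forwarded on a named Probe selects the lvalue overload and call_forwarded(Probe{}) selects the rvalue overload, while forward_twice(Probe{}) still sees both weights. This only holds because the && overload does not consume anything; an overload that moved w out would leave the second call with a moved-from member.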

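I would appreciate it if you could help me understand this better. I am also open to any other feedback on the way I am trying to solve the problem.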

Answer 1

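1. There is no reason to prefer one over the other.
2. Forwarding inside a for loop is not OK. You cannot forward the same variable twice: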

template <typename T> void func(T &&param) {
  func1(std::forward<T>(param));
  func2(std::forward<T>(param)); // UB
}

Chained forwarding (std::forward(std::forward(…))), on the other hand, is fine.
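The difference between forwarding twice and chained forwarding can be made visible with a small sketch. Tracker, consume and the helper templates are hypothetical names for this illustration; the moved_from flag is an assumption used to make the effect observable. Strictly speaking, whether a second use is undefined behavior depends on what is done with the moved-from object (standard library types are left in a valid but unspecified state), but the hazard is the same.

```cpp
#include <utility>

// Hypothetical type that records when it has been moved from.
struct Tracker {
  bool moved_from = false;
  Tracker() = default;
  Tracker(const Tracker &) = default;
  Tracker(Tracker &&other) noexcept : moved_from(other.moved_from) {
    other.moved_from = true; // mark the source as consumed
  }
};

// Sink that takes ownership; reports whether it received a consumed object.
bool consume(Tracker t) { return t.moved_from; }

// Problematic: for an rvalue argument, the second call sees a moved-from param.
template <class T> std::pair<bool, bool> forward_twice(T &&param) {
  bool first = consume(std::forward<T>(param));
  bool second = consume(std::forward<T>(param));
  return {first, second};
}

// Safer: pass a copy first and forward only on the last use.
template <class T> std::pair<bool, bool> forward_on_last_use(T &&param) {
  bool first = consume(param); // param is an lvalue here, so this copies
  bool second = consume(std::forward<T>(param));
  return {first, second};
}

// Chained forwarding: each level forwards exactly once, so nothing is reused.
template <class T> bool inner(T &&param) {
  return consume(std::forward<T>(param));
}
template <class T> bool outer(T &&param) {
  return inner(std::forward<T>(param));
}
```

With a temporary argument, forward_twice hands a consumed object to the second sink, whereas forward_on_last_use and outer do not. With an lvalue argument, std::forward yields an lvalue, so every call copies and nothing is consumed.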
