Question

How can I parallelize the for-loop in the attached function with a parfor-loop to get additional speedup in MATLAB?

Thanks in advance for any hints!

function p = permryser_fast( A )
% Computes the permanent of square matrix, A
%   The matrix permanent is defined like the matrix determinant, but
% without the sign terms.

mlock % locks the current M-file in memory 
% suggest using "munlock('permryser_fast')" at start of calling script
persistent pn pz ps % used to build tables for specific size

[m n]=size(A);

if (m ~= n),
   error('Must be a square matrix. perm_ryser_fast()  %d %d\n',m,n)
end

if isempty(ps)
   pn=-1;
end

if (n~=pn)&&(n>1),
   fprintf(1,'Creating table for fast Ryser n=%d %d\n',n,pn)
   pn=n;
   x=1:(2^n -1);                % count (assumes n<=52)
   y=bitxor(x,bitshift(x,-1));  % gray-coded count
   pz=log2(double(bitxor([0 y(1:2^n-2)],y(1:2^n-1))))+1;
   % computes which position comes in/out in gray-count
   ps=(-1).^(mod(y./ 2.^(pz-1),2) < 1);
   % computes whether its in (+1) or out (-1)
end

if (n == 1),
  p = A;
else
  p = 0;  % running permanent accumulator
  rs = zeros(1,n);  % running row sums vector
  % ==============================================
  % Loop over all 2^n - 1 non-empty subsets of the rows, in Gray-code order
  for i=1:(2^n -1) % Just skipping the null subset
    rs = rs + ps(i) * A(pz(i),:);
    p = p + (-1)^i * prod(rs);
  end
  % ==============================================

  p = p * (-1)^m;
end

return
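The function evaluates Ryser's formula, perm(A) = (-1)^n * sum over non-empty row subsets S of (-1)^|S| * prod_j sum_{i in S} A(i,j). The loop visits the subsets in Gray-code order, so each step adds or removes exactly one row: pz(i) is that row and ps(i) is +1 when it enters and -1 when it leaves. Because the Gray code of i has the same bit-count parity as i, the factor (-1)^i in the loop equals (-1)^|S|. To make the two tables concrete, this snippet rebuilds them for n = 3 with the same expressions as the function:

```matlab
% Rebuild the question's lookup tables for n = 3 (same expressions as permryser_fast)
n  = 3;
x  = 1:(2^n - 1);                                   % step indices 1..7
y  = bitxor(x, bitshift(x, -1));                    % Gray codes:       1 3  2 6 7  5  4
pz = log2(double(bitxor([0 y(1:end-1)], y))) + 1;   % row that flips:   1 2  1 3 1  2  1
ps = (-1).^(mod(y ./ 2.^(pz - 1), 2) < 1);          % +1 in / -1 out:   1 1 -1 1 1 -1 -1
disp([y; pz; ps])
```

Reading down a column: at step 3 the subset goes from {1,2} (code 3) to {2} (code 2), so row 1 leaves and ps(3) = -1.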

Example:

n = 20;
A = ones(n);
permanent = permryser_fast(A)

In this case the permanent should equal factorial(n).
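To check results independently of Ryser's formula, a hypothetical helper perm_bruteforce (not part of the question's code) can evaluate the definition from the function header directly, summing over all n! permutations. It is only practical for small n, since perms(1:n) has n! rows, but that is enough to compare permryser_fast, or any parallel variant of it, against a reference on small random matrices.

```matlab
function p = perm_bruteforce(A)
% Hypothetical reference implementation: permanent straight from the definition,
% sum over all permutations sigma of prod_i A(i, sigma(i)). Only for small n.
n = size(A, 1);
P = perms(1:n);                                % n!-by-n, one permutation per row
rows = repmat(1:n, size(P, 1), 1);             % row index i for every entry of P
p = sum(prod(A(sub2ind([n n], rows, P)), 2));  % product along each permutation, then sum
end
```

For example, with B = rand(6), permryser_fast(B) and perm_bruteforce(B) should agree up to rounding error.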

Notes:

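Vectorization can speed up the computation dramatically, but its memory requirements are severe, so it only handles matrices up to about 25 x 25.
Probably the only way forward is to parallelize with a parfor-loop after suitably restructuring the recursive for-loop (each iteration depends on the running row sums left by the previous one).
My final goal is to compute permanents of non-negative matrices from 25x25 up to 35x35 in reasonable time and with reasonable precision.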
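One way to make the loop parfor-friendly, sketched here assuming the Parallel Computing Toolbox is available (without it, parfor simply runs serially), is to cut the 2^n - 1 Gray-code steps into independent chunks. The only state carried between iterations is the running row-sum vector rs, and at the start of any chunk it can be rebuilt directly as the sum of the rows whose bits are set in the Gray code of the previous step index. The row that flips at step i is the lowest set bit of i, so no lookup tables are needed and each worker holds only one 1-by-n vector. The name permryser_parfor and the nchunks parameter are hypothetical, and the inner loop is written for clarity rather than speed.

```matlab
function p = permryser_parfor(A, nchunks)
% Sketch (hypothetical helper): Ryser permanent with the Gray-code loop split
% into nchunks independent pieces, so each parfor iteration can run on a worker.
[m, n] = size(A);
if m ~= n
    error('Must be a square matrix: got %d x %d', m, n);
end
if n == 1
    p = A;
    return
end
if nargin < 2
    nchunks = 64;                               % assumed default; tune to the pool size
end
N = 2^n - 1;                                    % non-empty row subsets (assumes n <= 52)
edges = round(linspace(0, N, nchunks + 1));
lo = edges(1:end-1) + 1;                        % first step of each chunk
hi = edges(2:end);                              % last step of each chunk
partial = zeros(1, nchunks);
parfor c = 1:nchunks
    % Rebuild rs for the subset reached after step lo(c)-1: the rows whose
    % bits are set in the Gray code of lo(c)-1.
    g = bitxor(lo(c) - 1, bitshift(lo(c) - 1, -1));
    rs = sum(A(logical(bitget(g, 1:n)), :), 1);
    acc = 0;
    for i = lo(c):hi(c)
        r = find(bitget(i, 1:n), 1);            % row that flips = lowest set bit of i
        if bitget(bitxor(i, bitshift(i, -1)), r)
            rs = rs + A(r, :);                  % row r enters the subset
        else
            rs = rs - A(r, :);                  % row r leaves the subset
        end
        acc = acc + (1 - 2 * mod(i, 2)) * prod(rs);   % (-1)^i * prod(rs)
    end
    partial(c) = acc;                           % sliced output, one value per chunk
end
p = (-1)^n * sum(partial);
end
```

Because lo and hi are indexed only by the loop variable, MATLAB can slice them, and partial is a sliced output, so the only broadcast data is A itself.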

Answer 1

Before trying parfor, you should try to vectorize your code to get rid of the loop:

function p = permryser_fast_vectorized( A )
% Computes the permanent of square matrix, A
%   The matrix permanent is defined like the matrix determinant, but
% without the sign terms.

% (this version does not call mlock, so no munlock is needed)
persistent pn pz ps % used to build tables for specific size

[m n]=size(A);

if (m ~= n),
   error('Must be a square matrix. perm_ryser_fast()  %d %d\n',m,n)
end

if isempty(ps)
   pn=-1;
end

if (n~=pn)&&(n>1),
   fprintf(1,'Creating table for fast Ryser n=%d %d\n',n,pn)
   pn=n;
   x=1:(2^n -1);                % count (assumes n<=52)
   y=bitxor(x,bitshift(x,-1));  % gray-coded count
   pz=log2(double(bitxor([0 y(1:2^n-2)],y(1:2^n-1))))+1;
   % computes which position comes in/out in gray-count
   ps=(-1).^(mod(y./ 2.^(pz-1),2) < 1);
   % computes whether its in (+1) or out (-1)
end

if (n == 1),
  p = A;
else

  % === vectorized version starts here
  k = 1:(2^n -1); 
  ps_times_A = bsxfun(@times,A(pz(k),:),ps(k)');
  rs = cumsum(ps_times_A);
  p = (-1).^k.*prod(rs,2)';
  p = sum(p) * (-1)^m;
  % === vectorized version ends here
end

return
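The memory limit mentioned in the question follows from the intermediate arrays: A(pz(k),:), the bsxfun product and the cumsum result are each (2^n - 1)-by-n doubles at 8 bytes per element. A quick estimate for a single such array:

```matlab
% Size of ONE (2^n - 1)-by-n double array, as created by A(pz(k),:), bsxfun(...) or cumsum(...)
n = [20 25 30 35];
bytes = (2.^n - 1) .* n * 8;     % 8 bytes per double element
GiB = bytes / 2^30               % about 0.16, 6.25, 240 and 8960 GiB
```

Several of these arrays are alive at the same time, so the peak is a multiple of these figures; at n = 35 a single array is already about 9.6e12 bytes, which rules out the fully vectorized form for the question's target sizes.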

I benchmarked my vectorized version against your code with this script:

n=20;
A=ones(n);
runs = 10;
% run original loop-based code
tic;
for k=1:runs
    permanent_loop = permryser_fast(A);
end
t_loop = toc/runs;

% run vectorized code
tic;
for k=1:runs
    permanent_vectorized = permryser_fast_vectorized(A);
end
t_vectorized = toc/runs;

fprintf('loop: %f s\n',t_loop);
fprintf('vectorized: %f s\n',t_vectorized);

Output

loop: 1.446856 s
vectorized: 0.163842 s

The vectorized version is more than 8 times faster.
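A possible middle ground between the loop and the fully vectorized version, sketched here as a hypothetical permryser_blocked (not from the answer above), is to vectorize within fixed-size blocks of Gray-code steps. Memory then grows with the block size blk instead of 2^n, and the blocks are independent, so they can also be distributed with parfor. Each block rebuilds its starting row sums from the Gray code of the step before it, and computes the flipping row and its sign the same way the pz and ps tables do. The default blk = 2^16 is an arbitrary assumption.

```matlab
function p = permryser_blocked(A, blk)
% Sketch (hypothetical helper): vectorized Ryser sum evaluated in blocks of blk
% Gray-code steps, so each block needs only about blk-by-n doubles.
[m, n] = size(A);
if m ~= n
    error('Must be a square matrix: got %d x %d', m, n);
end
if n == 1
    p = A;
    return
end
if nargin < 2
    blk = 2^16;                                  % assumed block size
end
N = 2^n - 1;                                     % assumes n <= 52
nb = ceil(N / blk);
partial = zeros(1, nb);
parfor b = 1:nb
    k = ((b - 1) * blk + 1 : min(b * blk, N))';  % step indices of this block (column)
    g0 = bitxor(k(1) - 1, bitshift(k(1) - 1, -1));
    rs0 = sum(A(logical(bitget(g0, 1:n)), :), 1);   % row sums before the block
    gk = bitxor(k, bitshift(k, -1));             % Gray codes at each step
    gp = bitxor(k - 1, bitshift(k - 1, -1));     % Gray codes at the previous step
    r = round(log2(bitxor(gk, gp))) + 1;         % row that flips (same as pz)
    s = 2 * bitget(gk, r) - 1;                   % +1 enters, -1 leaves (same as ps)
    rs = bsxfun(@plus, rs0, cumsum(bsxfun(@times, A(r, :), s), 1));
    partial(b) = sum((1 - 2 * mod(k, 2)) .* prod(rs, 2));   % (-1)^k * prod(rs)
end
p = (-1)^n * sum(partial);
end
```

With blk = 2^16 and n = 35, each block's arrays stay at a few tens of megabytes, at the cost of about 2^19 blocks.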
