How can I parallelize the for-loop in the attached function with a parfor-loop to get an additional speedup in MATLAB?
Thanks in advance for any hints.
```matlab
function p = permryser_fast( A )
% Computes the permanent of the square matrix A.
% The matrix permanent is defined like the matrix determinant, but
% without the sign terms.
mlock % locks the current M-file in memory
% suggest using "munlock('permryser_fast')" at start of calling script
persistent pn pz ps % used to build tables for specific size
[m, n] = size(A);
if m ~= n
    error('Must be a square matrix. permryser_fast() %d %d', m, n)
end
if isempty(ps)
    pn = -1;
end
if (n ~= pn) && (n > 1)
    fprintf(1, 'Creating table for fast Ryser n=%d %d\n', n, pn)
    pn = n;
    x = 1:(2^n - 1); % count (assumes n<=52)
    y = bitxor(x, bitshift(x, -1)); % Gray-coded count
    pz = log2(double(bitxor([0 y(1:2^n-2)], y(1:2^n-1)))) + 1;
    % computes which position comes in/out in the Gray count
    ps = (-1).^(mod(y ./ 2.^(pz-1), 2) < 1);
    % computes whether it is in (+1) or out (-1)
end
if n == 1
    p = A;
else
    p = 0; % running permanent accumulator
    rs = zeros(1, n); % running row sums vector
    % ==============================================
    % Loop over all 2^n subsets of {1,...,n}
    for i = 1:(2^n - 1) % Just skipping the null subset
        rs = rs + ps(i) * A(pz(i), :);
        p = p + (-1)^i * prod(rs);
    end
    % ==============================================
    p = p * (-1)^m;
end
end
```
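To see what the two lookup tables contain, the standalone snippet below (not part of the original function) repeats the table-building lines for n = 3. `pz(i)` is the row that enters or leaves the subset at step i, and `ps(i)` is +1 when it enters and -1 when it leaves. Consecutive Gray codes differ in exactly one bit, which is why each loop iteration touches only one row.

```matlab
% Reproduce the Gray-code tables of permryser_fast for a small n.
n = 3;
x = 1:(2^n - 1);
y = bitxor(x, bitshift(x, -1));                               % Gray code of 1..2^n-1
pz = log2(double(bitxor([0 y(1:2^n-2)], y(1:2^n-1)))) + 1;    % row that changes
ps = (-1).^(mod(y ./ 2.^(pz-1), 2) < 1);                      % +1 = in, -1 = out
disp([x; y; pz; ps])
```

For n = 3 this gives y = 1 3 2 6 7 5 4, pz = 1 2 1 3 1 2 1 and ps = 1 1 -1 1 1 -1 -1.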
Example:
```matlab
n = 20;
A = ones(n);
permanent = permryser_fast(A)
```
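In this case the permanent should equal factorial(n). In double precision, expect agreement only up to rounding, because the intermediate Ryser terms (up to 20^20 for n = 20) are far larger than 2^53.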
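The factorial(n) value follows directly from the definition: for ones(n), each of the n! permutations contributes a product of 1. The helper below, `perm_bruteforce`, is a hypothetical cross-check and not part of the question. It evaluates the definition directly by enumerating every permutation with `perms`, so it is only practical for small n (n <= 9 or so). It can be used to validate `permryser_fast` on small random matrices.

```matlab
function p = perm_bruteforce(A)
% Permanent straight from the definition:
%   perm(A) = sum over permutations s of prod_i A(i, s(i)).
% Cost and memory grow like n!, so use it only as a reference for small n.
n = size(A, 1);
P = perms(1:n);                                  % one permutation per row
rows = repmat(1:n, size(P, 1), 1);               % row index i for each factor
idx = sub2ind(size(A), rows, P);                 % linear index of A(i, s(i))
p = sum(prod(A(idx), 2));                        % sum of per-permutation products
end
```

For example, with `A = rand(6)`, the difference `abs(permryser_fast(A) - perm_bruteforce(A))` should be on the order of floating-point round-off.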
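Notes:
Vectorization can speed up the computation dramatically, but its memory requirements are very high, which limits the usable matrix size to at most 25 x 25.
Probably the only remaining option is to parallelize with a parfor-loop, after suitably restructuring the recursive for-loop (each iteration depends on the running row sums from the previous one).
My final goal is to compute permanents of nonnegative matrices from 25 x 25 up to 35 x 35 in reasonable time and with reasonable precision.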
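One way to make the loop parfor-friendly is to break the loop-carried dependency on `rs`. The approach below is a minimal sketch built on the following assumptions:

- Split the 2^n Gray-code steps into blocks of length 2^k that start at multiples of 2^k.
- At the start of block c, the subset is simply the Gray code of c*2^k, so `rs` can be recomputed from scratch there.
- Inside a block, the row that changes at step c*2^k + j is given by the lowest set bit of j, so one small table of length 2^k - 1 serves every block.

The blocks are then independent, and their partial sums are combined with a parfor reduction. The name `permryser_parfor` and the block exponent `k` (default 16) are my own, untuned choices. The sketch also avoids the 2^n-entry persistent tables, which would not fit in memory for n near 35. It only spreads the work over workers: the total work still grows like n*2^n. Without Parallel Computing Toolbox, parfor simply runs as an ordinary for-loop.

```matlab
function p = permryser_parfor(A, k)
% Ryser permanent with the Gray-code loop split into independent blocks
% of 2^k steps, so the outer loop can be a parfor (sketch, k is untuned).
[m, n] = size(A);
if m ~= n
    error('Must be a square matrix, got %d x %d.', m, n);
end
if nargin < 2
    k = 16;
end
k = max(1, min(k, n));                 % block cannot exceed the whole sequence

% Flip positions for steps j = 1..2^k-1. Every block starts at a multiple
% of 2^k, so the bit flipped at step i0+j is the lowest set bit of j and
% this one small table is shared by all blocks.
j = 1:(2^k - 1);
g = bitxor(j, bitshift(j, -1));
pzInner = log2(double(bitxor([0 g(1:end-1)], g))) + 1;

nBlocks = 2^(n - k);
p = 0;
parfor c = 0:(nBlocks - 1)
    i0 = c * 2^k;                       % first step of this block (even)
    g0 = bitxor(i0, bitshift(i0, -1));  % subset at step i0 (Gray code)
    inS = logical(bitget(g0, 1:n));     % rows currently in the subset
    rs = sum(A(inS, :), 1);             % row sums rebuilt from scratch
    acc = prod(rs);                     % sign (-1)^i0 = +1
    sgn = 1;
    for jj = 1:(2^k - 1)
        b = pzInner(jj);                % row entering or leaving
        if inS(b)
            rs = rs - A(b, :);
        else
            rs = rs + A(b, :);
        end
        inS(b) = ~inS(b);
        sgn = -sgn;                     % (-1)^(i0+jj) = (-1)^jj
        acc = acc + sgn * prod(rs);
    end
    p = p + acc;                        % parfor reduction variable
end
p = p * (-1)^n;
end
```

For a quick check, `permryser_parfor(ones(6), 2)` splits the work into 16 blocks and should still return 720.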
Answer 0 (score: 1)
Before trying parfor, you should try to vectorize your code to get rid of the loop:
```matlab
function p = permryser_fast_vectorized( A )
% Computes the permanent of the square matrix A.
% The matrix permanent is defined like the matrix determinant, but
% without the sign terms.
persistent pn pz ps % used to build tables for specific size
[m, n] = size(A);
if m ~= n
    error('Must be a square matrix. permryser_fast_vectorized() %d %d', m, n)
end
if isempty(ps)
    pn = -1;
end
if (n ~= pn) && (n > 1)
    fprintf(1, 'Creating table for fast Ryser n=%d %d\n', n, pn)
    pn = n;
    x = 1:(2^n - 1); % count (assumes n<=52)
    y = bitxor(x, bitshift(x, -1)); % Gray-coded count
    pz = log2(double(bitxor([0 y(1:2^n-2)], y(1:2^n-1)))) + 1;
    % computes which position comes in/out in the Gray count
    ps = (-1).^(mod(y ./ 2.^(pz-1), 2) < 1);
    % computes whether it is in (+1) or out (-1)
end
if n == 1
    p = A;
else
    % === vectorized version starts here
    k = 1:(2^n - 1);
    ps_times_A = bsxfun(@times, A(pz(k), :), ps(k)'); % signed row for every step
    rs = cumsum(ps_times_A, 1);                       % all running row sums at once
    p = (-1).^k .* prod(rs, 2)';
    p = sum(p) * (-1)^m;
    % === vectorized version ends here
end
end
```
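The key step in the vectorized version is that the running row sums produced by the loop update `rs = rs + ps(i)*A(pz(i),:)` are just prefix sums of the signed rows. This means `cumsum` can produce all of them at once. The small self-contained check below compares the two on `magic(4)`. It uses the first four Gray-code steps (`pz = [1 2 1 3]`, `ps = [1 1 -1 1]`, the same for any n >= 3). This equivalence is also the source of the memory cost: `cumsum` keeps every intermediate `rs`, one row per step.

```matlab
% Loop version vs. cumsum version of the running row sums.
A  = magic(4);
pz = [1 2 1 3];          % rows entering/leaving in the first four steps
ps = [1 1 -1 1];         % +1 = row enters, -1 = row leaves

rs = zeros(1, 4);
rsLoop = zeros(numel(pz), 4);
for i = 1:numel(pz)
    rs = rs + ps(i) * A(pz(i), :);   % incremental update, as in the loop
    rsLoop(i, :) = rs;               % keep every intermediate value
end

rsVec = cumsum(bsxfun(@times, A(pz, :), ps'));   % all prefixes at once
disp(isequal(rsLoop, rsVec))                     % prints 1 (true)
```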
I benchmarked my vectorized version against your code with this script:
```matlab
n=20;
A=ones(n);
runs = 10;
% run original loop-based code
tic;
for k=1:runs
    permanent_loop = permryser_fast(A);
end
t_loop = toc/runs;
% run vectorized code
tic;
for k=1:runs
    permanent_vectorized = permryser_fast_vectorized(A);
end
t_vectorized = toc/runs;
fprintf('loop: %f s\n',t_loop);
fprintf('vectorized: %f s\n',t_vectorized);
```
Output:
```
loop: 1.446856 s
vectorized: 0.163842 s
```
The vectorized version is more than 8 times faster (about 8.8x in this run).
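The memory problem mentioned in the question comes from `permryser_fast_vectorized` materializing several (2^n - 1)-by-n double arrays (`A(pz(k),:)`, `ps_times_A`, `rs`). Each one takes (2^n - 1)*n*8 bytes: about 0.17 GB for n = 20 and 6.7 GB for n = 25. The persistent `pz`/`ps` tables add 2^n entries each on top of that.

A possible middle ground is to vectorize over blocks of steps and carry `rs` from one block to the next, which bounds memory by roughly blockSize*n. The sketch below, `permryser_blocked`, uses a hypothetical name, and its default block size of 2^16 is an arbitrary value that would need tuning. It also computes the Gray-code positions per block instead of storing the 2^n-entry tables.

```matlab
function p = permryser_blocked(A, blockSize)
% Ryser permanent, vectorized over blocks of Gray-code steps so memory
% grows like blockSize*n instead of 2^n*n (sketch, blockSize untuned).
[m, n] = size(A);
if m ~= n
    error('Must be a square matrix, got %d x %d.', m, n);
end
if nargin < 2
    blockSize = 2^16;
end
N = 2^n - 1;                    % number of non-empty subsets (steps)
p = 0;
rs = zeros(1, n);               % row sums carried over from the previous block
for lo = 1:blockSize:N
    k = (lo:min(lo + blockSize - 1, N))';        % steps in this block
    y = bitxor(k, bitshift(k, -1));              % Gray code at step k
    yPrev = bitxor(k - 1, bitshift(k - 1, -1));  % Gray code at step k-1
    pz = log2(bitxor(yPrev, y)) + 1;             % row that changes at step k
    ps = 2 * bitget(y, pz) - 1;                  % +1 if row enters, -1 if it leaves
    rsBlk = bsxfun(@plus, cumsum(bsxfun(@times, A(pz, :), ps), 1), rs);
    sgn = 1 - 2 * mod(k, 2);                     % (-1)^k without a large power
    p = p + sum(sgn .* prod(rsBlk, 2));
    rs = rsBlk(end, :);                          % carry into the next block
end
p = p * (-1)^n;
end
```

Because `rs` is carried from one block to the next, this outer loop is still sequential, and the total work remains about n*2^n operations.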