Speeding up code - vectorization

Asked: 2013-02-12 22:13:11

Tags: matlab vectorization nested-loops

I am not very familiar with vectorization, but I understand that, among MATLAB's strengths, vectorizing code is probably the most rewarding.

I have this code:

ikx= (-Nx/2:Nx/2-1)*dk1;
iky= (-Ny/2:Ny/2-1)*dk2;
ikz= (-Nz/2:Nz/2-1)*dk3;

[k1,k2,k3] = ndgrid(ikx,iky,ikz);
k = sqrt(k1.^2 + k2.^2 + k3.^2);
Cij = zeros(3,3,Nx,Ny,Nz);
count = 0;
for ii = 1:Nx
    for jj = 1:Ny
        for kk = 1:Nz
            if ~isequal(k1(ii,jj,kk),0)
                count = count +1;
                fprintf('iteration step %i\r\n',count)
                E_int = interp1(k_vec,E_vec,k(ii,jj,kk),'spline','extrap');
                beta = c*gamma./(k(ii,jj,kk).*sqrt(E_int));
                k30 = k3(ii,jj,kk) + beta*k1(ii,jj,kk);
                k0 = sqrt(k1(ii,jj,kk)^2 + k2(ii,jj,kk)^2 + k30^2);
                Ek0 = 1.453*(k0^4/((1 + k0^2)^(17/6)));
                B = sigmaiso*sqrt((Ek0./(k0.^2))*((dk1*dk2*dk3)/(4*pi)));
                C1 = ((beta.*k1(ii,jj,kk).^2).*(k0.^2 - 2*k30.^2 + k30.*beta.*k1(ii,jj,kk)))./(k(ii,jj,kk).^2.*(k1(ii,jj,kk).^2 + k2(ii,jj,kk).^2));
                C2 = ((k2(ii,jj,kk).*(k0.^2))./((k1(ii,jj,kk).^2 + k2(ii,jj,kk).^2).^(3/2))).*atan2((beta.*k1(ii,jj,kk).*sqrt(k1(ii,jj,kk).^2 + k2(ii,jj,kk).^2)),(k0.^2 - k30.*beta.*k1(ii,jj,kk)));
                xhsi1 = C1 - C2.*(k2(ii,jj,kk)./k1(ii,jj,kk));
                xhsi2 = C1.*(k2(ii,jj,kk)./k1(ii,jj,kk)) + C2;
                Cij(1,1,ii,jj,kk) = B.*((k2(ii,jj,kk).*xhsi1)./(k0));
                Cij(1,2,ii,jj,kk) = B.*((k3(ii,jj,kk)-k1(ii,jj,kk).*xhsi1+beta.*k1(ii,jj,kk))./(k0));
                Cij(1,3,ii,jj,kk) = B.*(-k2(ii,jj,kk)./(k0));
                Cij(2,1,ii,jj,kk) = B.*((k2(ii,jj,kk).*xhsi2-k3(ii,jj,kk)-beta.*k1(ii,jj,kk))./(k0));
                Cij(2,2,ii,jj,kk) = B.*((-k1(ii,jj,kk).*xhsi2)./(k0));
                Cij(2,3,ii,jj,kk) = B.*(k1(ii,jj,kk)./(k0));
                Cij(3,1,ii,jj,kk) = B.*(k2(ii,jj,kk).*k0./(k(ii,jj,kk).^2));
                Cij(3,2,ii,jj,kk) = B.*(-k1(ii,jj,kk).*k0./(k(ii,jj,kk).^2));               
            end
        end
    end
end

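Normally I would avoid nested for loops; however, the if test on the value of k1 is currently pushing me toward this classic, old-fashioned code structure.

I would very much like to get rid of the for loops in favor of a vectorized, more elegant solution.

Any help is most welcome.

EDIT

To make it clearer what the code is supposed to do, here is some background: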

[Four images with the background equations; their content is not recoverable here]

EDIT2

Following @Floris's suggestion, I came up with this alternative solution:

ikx= (-Nx/2:Nx/2-1)*dk1;
iky= (-Ny/2:Ny/2-1)*dk2;
ikz= (-Nz/2:Nz/2-1)*dk3;

[k1,k2,k3] = ndgrid(ikx,iky,ikz);
k = sqrt(k1.^2 + k2.^2 + k3.^2);

ii = (ikx ~= 0);
k1w = k1(ii,:,:);
k2w = k2(ii,:,:);
k3w = k3(ii,:,:);
kw = k(ii,:,:);

E_int = interp1(k_vec,E_vec,kw,'spline','extrap');
beta = c*gamma./(kw.*sqrt(E_int));

k30 = k3w + beta.*k1w;
k0 = sqrt(k1w.^2 + k2w.^2 + k30.^2);
Ek0 = (1.453*k0.^4)./((1 + k0.^2).^(17/6));
B = sqrt((2*(pi^2)*(l^3))*(Ek0./(V*k0.^4)));

k1w_2 = k1w.^2;
k2w_2 = k2w.^2;
k30_2 = k30.^2;
k0_2 = k0.^2;
kw_2 = kw.^2;

C1 = ((beta.*k1w_2).*(k0_2 - 2.*k30_2 + beta.*k1w.*k30))./(kw_2.*(k1w_2 + k2w_2));
C2 = ((k2w.*k0_2)./((k1w_2 + k2w_2).^(3/2))).*atan2((beta.*k1w).*sqrt(k1w_2 + k2w_2),(k0_2 - k30.*k1w.*beta));

xhsi1 = C1 - (k2w./k1w).*C2;
xhsi2 = (k2w./k1w).*C1 + C2;

Cij = zeros(3,3,Nx,Ny,Nz);

Cij(1,1,ii,:,:) = B.*(k2w.*xhsi1);
Cij(1,2,ii,:,:) = B.*(k3w - k1w.*xhsi1 + beta.*k1w);
Cij(1,3,ii,:,:) = B.*(-k2w);
Cij(2,1,ii,:,:) = B.*(k2w.*xhsi2 - k3w - beta.*k1w);
Cij(2,2,ii,:,:) = B.*(-k1w.*xhsi2);
Cij(2,3,ii,:,:) = B.*(k1w);
Cij(3,1,ii,:,:) = B.*((k0_2./kw_2).*k2w);
Cij(3,2,ii,:,:) = B.*(-(k0_2./kw_2).*k1w);
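
As a small illustration of the masking step used above (a sketch with made-up sizes and spacing, not the actual grid), a logical vector built from ikx selects whole slabs along the first dimension of an ndgrid array, and the same mask can be used to write the results back. The array A and the toy values are assumptions introduced only for this example:

```matlab
% Toy sizes and spacing (assumed values, for illustration only)
Nx = 4; Ny = 3; Nz = 2; dk = 0.5;
ikx = (-Nx/2:Nx/2-1)*dk;          % [-1 -0.5 0 0.5]
iky = (-Ny/2:Ny/2-1)*dk;
ikz = (-Nz/2:Nz/2-1)*dk;
[k1,k2,k3] = ndgrid(ikx,iky,ikz);

ii  = (ikx ~= 0);                 % logical mask along dimension 1
k1w = k1(ii,:,:);                 % keeps only the slabs where k1 is nonzero

A = zeros(Nx,Ny,Nz);              % hypothetical result array
A(ii,:,:) = 1./k1w;               % write back; the k1 == 0 slab stays zero
```

This works because, with ndgrid, k1 varies only along the first dimension, so the test k1 ~= 0 reduces to a test on ikx.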

1 answer:

Answer 0 (score: 1)

You can do the test just once and then build arrays that contain only the elements you need. For example:

% create an index of all the elements that are worth computing:
worthComputing = find(k1(:)~=0);
% now create sub-arrays of all the other arrays... a little bit expensive on memory,
% but much faster for computation:
kw =  k(worthComputing);
k1w = k1(worthComputing);
k2w = k2(worthComputing);
k3w = k3(worthComputing);

% now we'll compute all the results of the innermost for loop in single statements:
E_int = interp1(k_vec,E_vec,kw,'spline','extrap');
beta = c*gamma./(kw.*sqrt(E_int));
k30 = k3w + beta.*k1w;
k0 = sqrt(k1w.^2 + k2w.^2 + k30.^2);
Ek0 = 1.453*(k0.^4./((1 + k0.^2).^(17/6)));

% The next line uses dk1, dk2, dk3 ... not sure what they are. I am assuming
% they are scalars that do not need to be indexed.

B = sigmaiso*sqrt((Ek0./(k0.^2))*((dk1*dk2*dk3)/(4*pi)));
C1 = ((beta.*k1w.^2).*(k0.^2 - 2*k30.^2 + k30.*beta.*k1w))./(kw.^2.*(k1w.^2 + k2w.^2));
C2 = ((k2w.*(k0.^2))./((k1w.^2 + k2w.^2).^(3/2))).*atan2((beta.*k1w.*sqrt(k1w.^2 + ...
    k2w.^2)),(k0.^2 - k30.*beta.*k1w));
xhsi1 = C1 - C2.*(k2w./k1w);
xhsi2 = C1.*(k2w./k1w) + C2;

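% In the next few lines I use the trick of "collapsing" the remaining indices:
% the third subscript acts as a linear index over the (Nx,Ny,Nz) grid, so MATLAB
% accesses the elements of Cij that correspond to the ii, jj, kk selected earlier.
% This assumes Cij was preallocated as in the question: Cij = zeros(3,3,Nx,Ny,Nz).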

Cij(1,1,worthComputing) = B.*((k2w.*xhsi1)./(k0));
Cij(1,2,worthComputing) = B.*((k3w-k1w.*xhsi1+beta.*k1w)./(k0));
Cij(1,3,worthComputing) = B.*(-k2w./(k0));
Cij(2,1,worthComputing) = B.*((k2w.*xhsi2-k3w-beta.*k1w)./(k0));
Cij(2,2,worthComputing) = B.*((-k1w.*xhsi2)./(k0));
Cij(2,3,worthComputing) = B.*(k1w./(k0));
Cij(3,1,worthComputing) = B.*(k2w.*k0./(kw.^2));
Cij(3,2,worthComputing) = B.*(-k1w.*k0./(kw.^2));
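
The collapsed-index trick used in the assignments above can be checked on a toy array. This is only a sketch: the sizes, the mask, and the values are made up for illustration. When fewer subscripts are given than the array has dimensions, the last subscript runs as a linear index over all remaining dimensions:

```matlab
% Toy array: 2-by-2 leading block, 3-by-4 trailing grid (assumed sizes)
C = zeros(2,2,3,4);
mask = false(3,4);
mask([2 5 12]) = true;            % pretend these grid points are worth computing
idx = find(mask);                 % linear indices into the 3-by-4 grid

vals = [10; 20; 30];
C(1,2,idx) = vals;                % third subscript spans the collapsed 3-by-4 grid

% The same grid points expressed as full subscripts, for comparison
[r,c] = ind2sub([3 4], idx);      % r = [2;2;3], c = [1;2;4]
```

Because every index stays within the preallocated size, C keeps its 2-by-2-by-3-by-4 shape after the assignment.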

It is entirely possible that there is a typo or two in the above, but this is the basic approach to vectorization.
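
Since typos are easy to make in a rewrite like this, one way to gain confidence is to compare the loop and the vectorized version on a small input. The following is a minimal sketch of that check; the toy data and the stand-in function f are assumptions, not the formulas from the question:

```matlab
% Toy data (assumed values); f stands in for the real per-element formula
k1 = [-1 0 2; 0 3 -4];
k2 = [ 5 6 7; 8 9 10];
f  = @(a,b) b./a;                 % only meaningful where a ~= 0

% Reference: loop with the if-test
R_loop = zeros(size(k1));
for n = 1:numel(k1)
    if k1(n) ~= 0
        R_loop(n) = f(k1(n), k2(n));
    end
end

% Vectorized: test once, compute on the selected elements only
idx = find(k1 ~= 0);
R_vec = zeros(size(k1));
R_vec(idx) = f(k1(idx), k2(idx));

% Largest absolute difference between the two results
maxDiff = max(abs(R_loop(:) - R_vec(:)));
```

For the real code, the same comparison could be run on a small grid (for example a few points per dimension) before switching to the full problem size.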