Question

I'm trying to learn a bit of Julia, and after reading the manual for a few hours I wrote the following code:

ie = 200;
ez = zeros(ie + 1);
hy = zeros(ie);

fdtd1d(steps) =
    for n in 1:steps
        for i in 2:ie
            ez[i]+= (hy[i] - hy[i-1])
        end
        ez[1]= sin(n/10)
        for i in 1:ie
            hy[i]+= (ez[i+1]- ez[i])
        end
    end

@time fdtd1d(10000);

 elapsed time: 2.283153795 seconds (239659044 bytes allocated)

I believe it is under-optimized, because it is much slower than the corresponding Mathematica version:

ie = 200;
ez = ConstantArray[0., {ie + 1}];
hy = ConstantArray[0., {ie}];

fdtd1d = Compile[{{steps}}, 
   Module[{ie = ie, ez = ez, hy = hy}, 
    Do[ez[[2 ;; ie]] += (hy[[2 ;; ie]] - hy[[1 ;; ie - 1]]);
     ez[[1]] = Sin[n/10];
     hy[[1 ;; ie]] += (ez[[2 ;; ie + 1]] - ez[[1 ;; ie]]), {n, 
      steps}]; Sow@ez; Sow@hy]];

result = fdtd1d[10000]; // AbsoluteTiming

{0.1280000, Null}

So, how can I make the Julia version of fdtd1d faster?

Answer 1

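Two things:

The first time you run the function, the measured time includes the time needed to compile the code. If you want a fair comparison with the compiled function in Mathematica, you should run the function twice and time the second run. With the code from the question I get: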

elapsed time: 1.156531976 seconds (447764964 bytes allocated)

for the first run, which includes compilation time, and

elapsed time: 1.135681299 seconds (447520048 bytes allocated)

for the second run, when no compilation is needed.
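A minimal sketch of that timing pattern, in case it helps. The function `work` is a hypothetical stand-in for whatever is being benchmarked; it is not part of the code in the question.

```julia
# Hypothetical stand-in for the function being benchmarked.
function work(n)
    s = 0.0
    for i in 1:n
        s += sin(i / 10)
    end
    return s
end

# The first call triggers compilation of work for an Int argument,
# so this measurement includes compile time.
t_first = @elapsed work(10_000)

# The second call reuses the compiled code, so it measures run time only.
t_second = @elapsed work(10_000)

println("first call:  ", t_first, " s")
println("second call: ", t_second, " s")
```

The second number is the one to compare against a precompiled function in another system.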

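The second thing, and arguably the more important one, is that you should avoid global variables in performance-critical code. This is the first tip in the performance tips section of the manual.

Here is the same code using local variables: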

function fdtd1d_local(steps, ie = 200)
    ez = zeros(ie + 1);
    hy = zeros(ie);
    for n in 1:steps
        for i in 2:ie
            ez[i]+= (hy[i] - hy[i-1])
        end
        ez[1]= sin(n/10)
        for i in 1:ie
            hy[i]+= (ez[i+1]- ez[i])
        end
    end
    return (ez, hy)
end

fdtd1d_local(10000)
@time fdtd1d_local(10000);

For comparison, the Mathematica code on my machine gives

{0.094005, Null}

while @time fdtd1d_local gives:

elapsed time: 0.015188926 seconds (4176 bytes allocated)

or roughly 6 times faster. Global variables make a big difference.
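If the arrays have to live outside the function (for example, to inspect them afterwards), one option is to pass them in as arguments instead of reading them as globals. The sketch below assumes that approach; the name `fdtd1d!` is hypothetical and simply follows the Julia convention of a trailing `!` for functions that mutate their arguments.

```julia
# Hypothetical variant: the caller creates the arrays and passes them in,
# so inside the function they are typed local arguments, not globals.
function fdtd1d!(ez, hy, steps)
    ie = length(hy)            # ez is expected to have ie + 1 elements
    for n in 1:steps
        for i in 2:ie
            ez[i] += hy[i] - hy[i-1]
        end
        ez[1] = sin(n / 10)
        for i in 1:ie
            hy[i] += ez[i+1] - ez[i]
        end
    end
    return ez, hy
end

ie = 200
ez = zeros(ie + 1)
hy = zeros(ie)
fdtd1d!(ez, hy, 10000)         # updates ez and hy in place
```

As with the local-variable version, time a second call rather than the first if you want to exclude compilation.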

Answer 2

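I believe in using a limited number of loops, and only where they are needed. Array expressions can be used in place of loops. It is not possible to avoid every loop, but removing some of them makes the code faster. In the program above I did some optimization by using expressions, and the run time dropped to almost half.

Original code: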

ie = 200;
ez = zeros(ie + 1);
hy = zeros(ie);

fdtd1d(steps) =
    for n in 1:steps
        for i in 2:ie
            ez[i]+= (hy[i] - hy[i-1])
        end
        ez[1]= sin(n/10)
        for i in 1:ie
            hy[i]+= (ez[i+1]- ez[i])
        end
    end

@time fdtd1d(10000);

Output

julia> 
elapsed time: 1.845615295 seconds (239687888 bytes allocated)

Optimized code:

ie = 200;
ez = zeros(ie + 1);
hy = zeros(ie);

fdtd1d(steps) =
    for n in 1:steps
        ez[2:ie] = ez[2:ie] + hy[2:ie] - hy[1:ie-1];
        ez[1] = sin(n/10);
        hy[1:ie] = hy[1:ie] + ez[2:end] - ez[1:end-1]
    end

@time fdtd1d(10000);

Output

julia>
elapsed time: 0.93926323 seconds (206977748 bytes allocated)
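Note that the output above still reports about 200 MB allocated: every slice such as `ez[2:ie]` on the right-hand side makes a copy, and the arrays are still globals. A possible refinement, sketched here under the assumption that the arrays can be made local, keeps the slice-based style but uses `@views` and dotted (broadcast) operators so the updates happen in place. The name `fdtd1d_views` is hypothetical.

```julia
# Hypothetical variant: slice expressions kept, but evaluated as views
# and fused in-place broadcasts instead of temporary array copies.
function fdtd1d_views(steps, ie = 200)
    ez = zeros(ie + 1)
    hy = zeros(ie)
    @views for n in 1:steps
        # Same update as the inner loop over i in 2:ie.
        ez[2:ie] .+= hy[2:ie] .- hy[1:ie-1]
        ez[1] = sin(n / 10)
        # Same update as the inner loop over i in 1:ie.
        hy .+= ez[2:end] .- ez[1:end-1]
    end
    return ez, hy
end

fdtd1d_views(10000)            # first call also compiles
@time fdtd1d_views(10000);
```

I have not included timings, since they depend on the machine and Julia version; explicit loops inside a function are usually about as fast in Julia, so this is mainly a matter of style.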
