Question

After reading a few tutorials on Julia parallelism online, I decided to write a small parallel snippet that computes the harmonic series.

The serial code is:

function harmonic(n::Int64)
    x = 0.0  # start from a Float64 so the accumulator's type never changes
    for i in n:-1:1 # summing backwards to reduce rounding error
        x += 1/i
    end
    x
end
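To check what the backwards order buys you, you can compare both summation orders against an extended-precision reference. This is an illustrative sketch only: `harmonic_forward`, `harmonic_backward` and `harmonic_ref` are hypothetical names, and the size of each error depends on n.

```julia
# Hypothetical helpers comparing summation orders for H(n) = sum of 1/i.
function harmonic_forward(n::Int)
    x = 0.0
    for i in 1:n           # largest terms first
        x += 1 / i
    end
    return x
end

function harmonic_backward(n::Int)
    x = 0.0
    for i in n:-1:1        # smallest terms first
        x += 1 / i
    end
    return x
end

# Reference value in extended precision (BigFloat, 256 bits by default).
harmonic_ref(n::Int) = sum(one(BigFloat) / i for i in 1:n)

n = 10^6
ref = harmonic_ref(n)
println("forward  error: ", Float64(abs(harmonic_forward(n) - ref)))
println("backward error: ", Float64(abs(harmonic_backward(n) - ref)))
```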

I wrote two parallel versions, one using the @distributed macro and the other using the @everywhere macro (started with julia -p 2, btw):

@everywhere function harmonic_ever(n::Int64)
    x = 0.0
    for i in n:-1:1
        x += 1/i
    end
    x
end

harmonic_distr = function (n::Int64)
    x = @distributed (+) for i in n:-1:1
        x = 1/i
    end
    x
end

However, when I timed the code above with @time, I got no speedup at all; in fact, the @distributed version was much slower!

@time harmonic(10^10)
>>> 53.960678 seconds (29.10 k allocations: 1.553 MiB)
>>> 23.60306659488827
job = @spawn harmonic_ever(10^10)
@time fetch(job)
>>> 46.729251 seconds (309.01 k allocations: 15.737 MiB)
>>> 23.60306659488827
@time harmonic_distr(10^10)
>>> 143.105701 seconds (1.25 M allocations: 63.564 MiB, 0.04% gc time)
>>> 23.603066594889185

What completely confuses me is the "0.04% gc time". I'm clearly missing something, and the examples I found were not written for version 1.0.1 (for example, one used @parallel).

Answer 1

Your distributed version should be:

function harmonic_distr2(n::Int64)
    x = @distributed (+) for i in n:-1:1
        1/i # no x assignment here
    end
    x
end

The @distributed loop sums the values of 1/i on each worker, then adds the per-worker partial sums together on the master process.
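To make that reduction concrete, here is a rough hand-written equivalent: split the range into one chunk per worker, sum each chunk remotely, then add the partial sums on the master. This is a simplified sketch, not how the `@distributed` macro is actually implemented; `partial_harmonic` and `harmonic_chunked` are hypothetical names, and the helper must be defined after `addprocs` so every worker has it.

```julia
using Distributed

# Hypothetical helper: sums 1/i over one chunk, in the order the range gives.
@everywhere function partial_harmonic(r::AbstractRange{Int})
    x = 0.0
    for i in r
        x += 1 / i
    end
    return x
end

# Hypothetical hand-rolled version of `@distributed (+)`:
# one contiguous chunk per worker, partial sums combined on the master.
function harmonic_chunked(n::Int)
    ws = workers()              # returns [1] when no workers have been added
    k = length(ws)
    futures = Future[]
    for (j, w) in enumerate(ws)
        lo = div((j - 1) * n, k) + 1
        hi = div(j * n, k)
        # sum chunk lo..hi on worker w, smallest terms first
        push!(futures, remotecall(partial_harmonic, w, hi:-1:lo))
    end
    return sum(fetch.(futures))  # the final (+) runs on the master
end
```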

Note that for benchmarking it is generally better to use the @btime macro from BenchmarkTools rather than @time.

julia> using Distributed; addprocs(4);

julia> @btime harmonic(1_000_000_000); # serial
  1.601 s (1 allocation: 16 bytes)

julia> @btime harmonic_distr2(1_000_000_000); # parallel
  754.058 ms (399 allocations: 36.63 KiB)

julia> @btime harmonic_distr(1_000_000_000); # your old parallel version
  4.289 s (411 allocations: 37.13 KiB)
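BenchmarkTools is a registered package, not part of the standard library. As a crude stdlib-only approximation of the minimum time that `@btime` reports, you can call the function once so JIT compilation is excluded, then keep the fastest of several `@elapsed` runs. `min_time` is a hypothetical helper and far simpler than BenchmarkTools. With the real package, interpolating arguments as in `@btime harmonic($n)` also avoids timing global-variable access.

```julia
# Hypothetical helper: crude stand-in for BenchmarkTools' @btime.
# Calls f(args...) once to trigger compilation, then returns the
# minimum wall-clock time in seconds over `samples` further calls.
function min_time(f, args...; samples::Int = 5)
    f(args...)                    # warm-up call: excludes JIT compilation
    best = Inf
    for _ in 1:samples
        t = @elapsed f(args...)
        best = min(best, t)
    end
    return best
end

# Serial version from the question, repeated so this block runs on its own.
function harmonic(n::Int)
    x = 0.0
    for i in n:-1:1
        x += 1 / i
    end
    return x
end

t = min_time(harmonic, 10^7)
println("harmonic(10^7), best of 5: ", round(t; digits = 4), " s")
```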

Naturally, the parallel version is slower if it runs on only one process:

julia> rmprocs(workers())
Task (done) @0x0000000006fb73d0

julia> nprocs()
1

julia> @btime harmonic_distr2(1_000_000_000); # (not really) parallel
  1.879 s (34 allocations: 2.00 KiB)
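Given that overhead, one option is to fall back to the serial loop when no workers have been added. A minimal sketch: it repeats the serial `harmonic` and `harmonic_distr2` definitions so it runs on its own; `harmonic_auto` is a hypothetical name, and "at least one extra process" is an assumed threshold you would tune in practice.

```julia
using Distributed

function harmonic(n::Int)
    x = 0.0
    for i in n:-1:1
        x += 1 / i
    end
    return x
end

function harmonic_distr2(n::Int)
    x = @distributed (+) for i in n:-1:1
        1 / i                      # loop body value is what gets reduced
    end
    return x
end

# Hypothetical dispatcher: only pay the @distributed overhead when
# there is at least one process besides the master.
harmonic_auto(n::Int) = nprocs() > 1 ? harmonic_distr2(n) : harmonic(n)
```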

Julia parallelism: @distributed (+) slower than serial?
