Question

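I've been learning Julia for about a month now and I'm very impressed. In particular, I'm analysing large amounts of climate-model output: I load it all into SharedArrays, manipulate it, and plot it all in parallel. So far this has been very fast and efficient, and I have a fairly complete codebase. My current problem is writing a function that performs basic operations on two shared arrays. I have managed to write a function that takes two arrays and a function describing how to combine them. The code is based on the example in the parallel-computing section of the Julia docs and uses the myrange function shown there: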

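function myrange(q::SharedArray)
    idx = indexpids(q)   # position of this process in procs(q); 0 if it does not participate
    if idx == 0
        # This worker is not assigned a piece: return one empty range
        # (the 2-D doc example returned two ranges, but here we index linearly)
        return 1:0
    end
    nchunks = length(procs(q))
    # Split 1:length(q) into nchunks contiguous pieces and return piece number idx
    splits = [round(Int, s) for s in linspace(0,length(q),nchunks+1)]
    splits[idx]+1:splits[idx+1]
end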
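The index-splitting arithmetic in myrange can be checked on its own, without starting any workers. Below is a minimal sketch for Julia 1.x, where `linspace` has been replaced by `range`; `chunkrange` is a hypothetical helper name, not part of any library.

```julia
# Hypothetical helper: the splitting logic of `myrange` as a pure function.
# n = total number of elements, nchunks = number of processes,
# idx = this process's position (0 means "not participating").
function chunkrange(n::Integer, nchunks::Integer, idx::Integer)
    idx == 0 && return 1:0   # non-participating process: empty range
    splits = [round(Int, s) for s in range(0, stop = n, length = nchunks + 1)]
    return splits[idx]+1:splits[idx+1]
end

# Split 10 elements across 3 processes
for idx in 1:3
    println(idx, " => ", chunkrange(10, 3, idx))
end
```

For 10 elements on 3 processes the pieces come out as 1:3, 4:7 and 8:10, so every index is covered exactly once.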

function combine_arrays_chunk!(array_1, array_2, output_array, func, length_range)
    # Apply func element-wise over this process's slice of the arrays
    for i in length_range
        output_array[i] = func(array_1[i], array_2[i])
        # hardwired example for func = +:
        # output_array[i] = +(array_1[i], array_2[i])
    end
    output_array
end

combine_arrays_shared_chunk!(array_1,array_2,output_array,func) = combine_arrays_chunk!(array_1,array_2,output_array,func, myrange(array_1));

function combine_arrays_shared(array_1::SharedArray, array_2::SharedArray, func)
    if size(array_1) != size(array_2)
        throw(DimensionMismatch("inputs not of the same size"))
    end
    output_array = SharedArray(Float64, size(array_1))
    # Each participating process fills its own chunk of output_array
    @sync begin
        for p in procs(array_1)
            @async remotecall_wait(p, combine_arrays_shared_chunk!, array_1, array_2, output_array, func)
        end
    end
    output_array
end
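For readers on Julia 1.x, several things used above have changed: SharedArrays now lives in the SharedArrays standard library, `linspace` and `SharedArray(Float64, dims)` are gone, and `remotecall_wait` takes the function first. The sketch below is one way the same pattern might look today. It assumes the library's `localindices` in place of the hand-written myrange, and the names `combine_chunk!` and `combine_shared` are hypothetical. Since Julia 0.5 every function has its own type, so the kernel should be compiled separately for `+`, `hypot` and so on, which is what ought to remove most of the cost of passing `func` as an argument.

```julia
using Distributed, SharedArrays
# If you want worker processes, call addprocs(...) BEFORE the @everywhere lines.
@everywhere using SharedArrays

# Kernel: each participating process fills only its own slice of `out`.
# localindices(out) is the slice assigned to the calling process.
@everywhere function combine_chunk!(out::SharedArray, a::SharedArray, b::SharedArray, func::F) where {F}
    for i in localindices(out)
        @inbounds out[i] = func(a[i], b[i])
    end
    return nothing
end

# The `where {F}` parameter makes Julia specialize on the concrete function type.
function combine_shared(a::SharedArray, b::SharedArray, func::F) where {F}
    size(a) == size(b) || throw(DimensionMismatch("inputs not of the same size"))
    # Output is Float64, as in the original; it uses the same processes as `a`
    out = SharedArray{Float64}(size(a); pids = procs(a))
    @sync for p in procs(out)
        @async remotecall_wait(combine_chunk!, p, out, a, b, func)
    end
    return out
end

# Usage, mirroring the question
eps_1 = SharedArray(rand(1000))
eps_2 = SharedArray(rand(1000))
strain_div = combine_shared(eps_1, eps_2, +)
strain_tot = combine_shared(eps_1, eps_2, hypot)
```

On a single process this simply runs the whole range locally, which makes it easy to check before adding workers.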

This does the job:

strain_div  = combine_arrays_shared(eps_1,eps_2,+);
strain_tot  = combine_arrays_shared(eps_1,eps_2,hypot);

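Both calls return the correct results as SharedArrays, as required. But... it's slow. It is actually faster to convert the SharedArrays to ordinary arrays on a single processor, do the computation, and convert the result back to a SharedArray (at least for my test case, with roughly 200 MB per array; I suspect that won't hold once I move up to GB-sized arrays). If I hardwire combine_arrays_shared to do only addition (or some other specific function), I do get a speed-up, but when the function is passed in as an argument to combine_arrays_shared the whole thing is slow (about 10× slower than the hardwired addition).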

I looked at the FastAnonymous.jl package, but I can't see how it would work in this case. I tried, and failed. Any ideas?

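I could just write a separate combine_arrays_... function for every basic operation I use, or turn the func argument into an option and call different functions inside combine_arrays_shared, but I'd like something more elegant! It's also a good way to learn about Julia.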

Harry

Answer 1

This question actually has nothing to do with SharedArrays; it is really "how do I pass a function as an argument and still get good performance?"

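The way FastAnonymous works (similar to how closures will soon work in Julia) is to create a type with a call method. If for some reason you have trouble with FastAnonymous, you can always do it by hand: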

julia> immutable Foo end

julia> Base.call(f::Foo, x, y) = x*y
call (generic function with 1036 methods)

julia> function applyf(f, X)
           s = zero(eltype(X))
           for x in X
               s += f(x, x)
           end
           s
       end
applyf (generic function with 1 method)

julia> X = rand(10^6);

julia> f = Foo()
Foo()

# Run the function once with each type of argument to JIT-compile
julia> applyf(f, X)
333375.63216645207

julia> applyf(*, X)
333375.63216645207

# Compile anything used by @time
julia> @time 1
  0.000004 seconds (148 allocations: 10.151 KB)
1

# Now let's benchmark
julia> @time applyf(f, X)
  0.002860 seconds (5 allocations: 176 bytes)
333433.439233112

julia> @time applyf(*, X)
  0.142411 seconds (4.00 M allocations: 61.035 MB, 19.24% gc time)
333433.439233112

Note the large speed-up and the much lower memory consumption.
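On Julia 0.5 and later, every function (including anonymous functions and closures) has its own type, so `applyf(*, X)` gets the same specialized code as `applyf(f, X)` and FastAnonymous is no longer needed. The callable-type trick itself still works, with updated syntax: `struct` replaces `immutable` (since 0.6) and the call overload is written directly instead of via `Base.call`. A sketch for current Julia; the timings above are not reproduced here.

```julia
# Callable type: calling Foo()(x, y) computes x * y
struct Foo end
(::Foo)(x, y) = x * y

# Same accumulation loop as above. Because `f` is called inside the loop,
# Julia compiles a specialized version for each concrete type of `f`:
# Foo(), `*`, or an anonymous function.
function applyf(f, X)
    s = zero(eltype(X))
    for x in X
        s += f(x, x)
    end
    return s
end

X = rand(10^6)
println(applyf(Foo(), X) == applyf(*, X))   # prints true: identical arithmetic in the same order
```

Applied to the question, this suggests that on a modern Julia passing `+` or `hypot` straight into combine_arrays_shared should no longer carry the 10× penalty, while on the 0.4-era code wrapping the operation in a callable type such as Foo is the manual workaround.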
