Question

I have a matrix that is sorted as shown below:

1 1 2 2 3

1 2 3 4 1

2 1 2 1 1

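I find the ordering hard to describe, but hopefully it is clear from the example. Roughly, the columns are sorted first by the first row, then by the second row, and so on (that is, the columns are in lexicographic order).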
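To make the ordering concrete, here is a minimal sketch, assuming Julia 1.1 or later for `eachcol`. The unsorted matrix `A` is made up purely for illustration; `M` is the example matrix above.

```julia
# The example matrix from above; its columns are already in order.
M = [1 1 2 2 3;
     1 2 3 4 1;
     2 1 2 1 1]

# Vectors compare lexicographically in Julia, so the columns of M
# form a sorted sequence.
@assert issorted(collect(eachcol(M)))

# A is a hypothetical unsorted matrix containing the same columns.
A = [3 1 2 1 2;
     1 2 4 1 3;
     1 1 1 2 2]

# sortslices with dims=2 rearranges the columns into this order.
@assert sortslices(A, dims=2) == M
```

This is the same `sortslices(matrix, dims=2)` call that the benchmark code further down uses to prepare its input.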

I want to find a particular column in the matrix, and that column may or may not be present.

I tried the following code:

index = searchsortedfirst(1:total_cols, col, lt=(index,x) -> (matrix[:, index] < x))

The code above works, but it is slow. I profiled it and found that a lot of time is spent in "_get_index". I then tried the following:

  @views index = searchsortedfirst(1:total_cols, col, lt=(index,x) -> (matrix[:, index] < x))

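As expected, this helped, presumably because I am taking slices. But is there a better way to approach this problem? There still seems to be a lot of overhead, and I feel there may be a cleaner way to write this that is also easier to optimize.

That said, I definitely value speed over clarity.

Here is some code I wrote to compare binary search with linear search.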

using Profile

function test_search()
    max_val = 20
    rows = 4
    matrix = rand(1:max_val, rows, 10^5)
    matrix = Array{Int64,2}(sortslices(matrix, dims=2))

    indices = @time @profile lin_search(matrix, rows, max_val, 10^3)
    indices = @time @profile bin_search(matrix, rows, max_val, 10^3)
end
function bin_search(matrix, rows, max_val, repeats)
    indices = zeros(repeats)
    x = zeros(Int64, rows)
    cols = size(matrix)[2]
    for i = 1:repeats
        x = rand(1:max_val, rows)
        @inbounds @views index = searchsortedfirst(1:cols, x, lt=(index,x)->(matrix[:,index] < x))
        indices[i] = index
    end
    return indices
end

function array_eq(matrix, index, y, rows)
    for i=1:rows
        @inbounds if view(matrix, i, index) != y[i]
            return false
        end
    end
    return true
end

function lin_search(matrix, rows, max_val, repeats)
    indices = zeros(repeats)
    x = zeros(Int64, rows)
    cols = size(matrix)[2]

    for i = 1:repeats
        index = cols + 1
        x = rand(1:max_val, rows)
        for j=1:cols
            if array_eq(matrix, j, x, rows)
                index = j;
                break
            end
        end
        indices[i] = index
    end
    return indices
end

Profile.clear()
test_search()

Here is some sample output:

0.041356 seconds (68.90 k allocations: 3.431 MiB)
0.070224 seconds (110.45 k allocations: 5.418 MiB)

After adding more @inbounds, linear search appears to be faster than binary search. That seems strange with 10^5 columns.

Answer 1

If speed matters most, why not simply take advantage of the fact that Julia lets you write fast loops?

julia> function findcol(M, col)                
           @inbounds @views for c in axes(M, 2)
               M[:,c] == col && return c       
           end                                 
           return nothing                      
       end                                     
findcol (generic function with 1 method)       

julia> col = [2,3,2];                          

julia> M = [1 1 2 2 3;                         
           1 2 3 4 1;                          
           2 1 2 1 1];                         

julia> @btime findcol($M, $col)                
  32.854 ns (3 allocations: 144 bytes)         
3

This should be fast enough, even without taking the sorting into account.
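As a variation on the same idea, here is a rough sketch of the scan written as two explicit loops, so that no column slice or view is created at all. The name `findcol_loop` is made up for this illustration, and the code assumes ordinary 1-based arrays. Whether it beats the `@views` version depends on the Julia version and the data, so benchmark before relying on it.

```julia
# Hypothetical variant of findcol: compare element by element instead of
# comparing a column view against `col`.
function findcol_loop(M::AbstractMatrix, col::AbstractVector)
    size(M, 1) == length(col) ||
        throw(DimensionMismatch("col must have one entry per row of M"))
    for c in axes(M, 2)
        found = true
        for r in axes(M, 1)
            if M[r, c] != col[r]
                found = false   # mismatch: stop checking this column
                break
            end
        end
        found && return c       # every row matched
    end
    return nothing              # column is not present
end

M = [1 1 2 2 3;
     1 2 3 4 1;
     2 1 2 1 1]

@assert findcol_loop(M, [2, 3, 2]) == 3
@assert findcol_loop(M, [9, 9, 9]) === nothing
```

Returning `nothing` covers the case from the question where the column may not exist.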

Answer 2

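I found two problems. With both fixed, the linear search and the binary search are each much faster, and binary search is now faster than linear search.

First, there was some type instability. I changed one of the lines to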

matrix::Array{Int64,2} = Array{Int64,2}(sortslices(matrix, dims=2))

This gave an order-of-magnitude speedup. Second, it turns out that @views does nothing in the following code:

@inbounds @views index = searchsortedfirst(1:cols, x, lt=(index,x)->(matrix[:,index] < x))

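I am new to Julia, but my hunch is that matrix[:, index] gets copied regardless whenever it is used inside the anonymous function. That would make sense, since it is what allows closures to work.

If I write a separate, non-anonymous function, the copy goes away. The linear search was not copying the slice, so this change is what really sped up the binary search as well.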
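For illustration, here is roughly what that can look like. This is only a sketch under some assumptions: it swaps the `searchsortedfirst` call for a hand-written binary search, the names `col_isless` and `findcol_sorted` are made up, and it expects a 1-based matrix whose columns are sorted lexicographically (for example by `sortslices(matrix, dims=2)`).

```julia
# Hypothetical helper: is column c of M lexicographically less than x?
# Comparing element by element means no slice is ever copied.
function col_isless(M, c, x)
    for r in axes(M, 1)
        a, b = M[r, c], x[r]
        a < b && return true
        a > b && return false
    end
    return false    # the column equals x
end

# Hand-written binary search over the column indices of M.
# Returns the index of the first column equal to x, or nothing.
function findcol_sorted(M, x)
    lo, hi = 1, size(M, 2) + 1          # search the half-open range [lo, hi)
    while lo < hi
        mid = (lo + hi) >>> 1
        if col_isless(M, mid, x)
            lo = mid + 1                # column mid is too small
        else
            hi = mid                    # column mid is >= x
        end
    end
    # lo is now the first column that is not less than x (if any).
    if lo <= size(M, 2) && all(M[r, lo] == x[r] for r in axes(M, 1))
        return lo
    end
    return nothing
end

M = [1 1 2 2 3;
     1 2 3 4 1;
     2 1 2 1 1]

@assert findcol_sorted(M, [2, 3, 2]) == 3
@assert findcol_sorted(M, [2, 3, 3]) === nothing
```

Because both functions take the matrix as an argument rather than capturing it from an enclosing scope, the compiler sees concrete types throughout, which also sidesteps the type instability mentioned above.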

Julia: searching for a column in a sorted matrix
