Julia equivalent of the data.table rleid() function

Time: 2018-04-13 16:48:58

Tags: julia

Does anyone know of a function for Julia that is equivalent to data.table::rleid()?

https://www.rdocumentation.org/packages/data.table/versions/1.10.4-2/topics/rleid

1 answer:

Answer 0 (score: 2)

I don't know of any library function for this, but here are two options.

Treating missing as a valid entry:

function rleid(x::AbstractVector)
    isempty(x) && return Int[]
    rle = similar(x, Int)
    idx = 1
    rle[1] = idx
    prev = x[1]
    for i in 2:length(x)
        this = x[i]
        if ismissing(this)
            # a missing starts a new run only if the previous entry was not missing
            if !ismissing(prev)
                prev = this
                idx += 1
            end
        else
            # test ismissing(prev) first: `this != prev` would yield missing, not a Bool
            if ismissing(prev) || this != prev
                prev = this
                idx += 1
            end
        end
        rle[i] = idx
    end
    rle
end
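
As a side note, the nested `ismissing` checks above can be folded into a single `isequal` comparison, because `isequal(missing, missing)` is `true` and `isequal` always returns a `Bool`. The following is only a sketch under some assumptions: the name `rleid_isequal` is made up for illustration, and the `Vector{Int}(undef, n)` constructor assumes Julia 0.7 or later. It also differs from the version above in edge cases: `isequal` puts consecutive `NaN` values in the same run and treats `0.0` and `-0.0` as different, whereas `!=` does the opposite.

```julia
# Hypothetical compact variant; missing is treated as an ordinary value.
function rleid_isequal(x::AbstractVector)
    rle = Vector{Int}(undef, length(x))
    idx = 0
    prev = nothing              # placeholder, ignored for the first element
    for (i, this) in enumerate(x)
        # start a new run at the first element or whenever the value changes
        if i == 1 || !isequal(this, prev)
            idx += 1
        end
        rle[i] = idx
        prev = this
    end
    return rle
end
```

For an empty input the loop never runs and an empty `Int` vector is returned.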

Skipping missing values and putting missing in the output vector:

function rleid_missing(x::AbstractVector)
    isempty(x) && return Union{Int,Missing}[]
    rle = similar(x, Union{Int, Missing})
    # leading missing entries stay missing in the output
    start_i = 1
    while start_i <= length(x) && ismissing(x[start_i])
        rle[start_i] = missing
        start_i += 1
    end
    if start_i <= length(x)
        # the first non-missing entry opens run 1
        idx = 1
        rle[start_i] = idx
        prev = x[start_i]
        start_i += 1
        for i in start_i:length(x)
            this = x[i]
            if ismissing(this)
                # prev is left unchanged, so a missing does not break a run
                rle[i] = missing
            else
                if this != prev
                    prev = this
                    idx += 1
                end
                rle[i] = idx
            end
        end
    end
    rle
end

Here is a test:

Main> rleid([missing,3,4,4,missing,1,1,missing,missing,6])
10-element Array{Int64,1}:
 1
 2
 3
 3
 4
 5
 5
 6
 6
 7

Main> rleid_missing([missing,3,4,4,missing,1,1,missing,missing,6])
10-element Array{Union{Int64, Missings.Missing},1}:
  missing
 1
 2
 2
  missing
 3
 3
  missing
  missing
 4

Main> rleid_missing([missing,3,4,4,missing,1,1,missing,missing,1,6])
11-element Array{Union{Int64, Missings.Missing},1}:
  missing
 1
 2
 2
  missing
 3
 3
  missing
  missing
 3
 4

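(In the last case, observe that missing is treated as if it were not there: the 1 after the two missing entries continues run 3. If you want something different, it is easy to adjust the behavior.)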
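
One possible adjustment, sketched here only as an illustration, is to let a missing entry end the current run, so that equal values on either side of a gap get different ids. The name `rleid_missing_breaks` and the `new_run` flag are hypothetical and not part of the answer above; the `undef` constructor assumes Julia 0.7 or later.

```julia
# Hypothetical variant: missing stays missing in the output AND ends the current run.
function rleid_missing_breaks(x::AbstractVector)
    rle = Vector{Union{Int,Missing}}(undef, length(x))
    idx = 0
    new_run = true      # true at the start and right after any missing entry
    prev = missing      # only compared when new_run is false, i.e. when non-missing
    for (i, this) in enumerate(x)
        if ismissing(this)
            rle[i] = missing
            new_run = true
        else
            # a value after a gap always opens a new run, even if it repeats
            if new_run || this != prev
                idx += 1
            end
            rle[i] = idx
            prev = this
            new_run = false
        end
    end
    return rle
end
```

With the 11-element input from the last test, the 1 that follows the two missing entries would then get id 4 instead of 3, and the final 6 would get id 5.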

The beauty of Julia is that these functions will be fast; there is no need to implement them in an external library written in C++.