Question

I have the following data frame, yearly:

ID   Jan Feb March April May Jun Jul Aug Sept Oct Nov Dec
ABC   0  0    0     1    0   0    0   0  1     0   0  0
DEF   0  0    0     1    1   0    0   0  1     0   0  0
GHI   0  0    0     1    0   1    0   0  0     1   0  0
MNO   0  0    0     1    0   1    0   0  1     0   0  0
QAL   0  1    1     1    0   0    1   0  0    1   0  0

I want to iterate through each row and find the column after which the next three columns are 0. I want to get something like this, which lists for each row the months followed by at least three consecutive months of 0 (i.e., the month immediately before such a run):

ID    col1    col2 
ABC   April   Sept  
DEF   May     Sept 
GHI   Jun     N/A
MNO   Sept    N/A
QAL   N/A     N/A

I have figured out how to iterate over a vector and get the indices, but I am finding it difficult to link them back to the original data frame and get the column names. Is there a function or resource that could point me in the right direction?

Answer 1

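Since the number of results per row varies, I chose to return a list. This approach uses rle to find runs of zeros and then checks whether a run is longer than 2. It then returns the names of the months immediately preceding those runs.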

# Data
df <- read.table(text = "ID   Jan Feb March April May Jun Jul Aug Sept Oct Nov Dec
ABC   0  0    0     1    0   0    0   0  1     0   0  0
           DEF   0  0    0     1    1   0    0   0  1     0   0  0
           GHI   0  0    0     1    0   1    0   0  0     1   0  0
           MNO   0  0    0     1    0   1    0   0  1     0   0  0
           QAL   0  1    1     1    0   0    1   0  0    1   0  0",
           header = TRUE)

# Repackage as list (each row becomes a list element, named by ID)
df_list <- setNames(split(df[, -1], seq(nrow(df))), df$ID)

# Count function
morpheus_count <- function(x){
  # Run-length encoding of the row
  tmp <- rle(x)

  # Return the months preceding a run of three (or more) zeroes
  names(tmp$values)[which(tmp$values==0 & tmp$lengths>2)-1]
}

# Run on list
lapply(df_list, morpheus_count)

Result:

# $ABC
# [1] "April" "Sept" 
# 
# $DEF
# [1] "May"  "Sept"
# 
# $GHI
# [1] "Jun"
# 
# $MNO
# [1] "Sept"
# 
# $QAL
# character(0)
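For comparison, the rle idea above can also be applied to a plain named vector instead of a one-row data frame. The sketch below is a variant, not the answer's own code: the helper months_before_zero_runs and its min_zeros argument are hypothetical names, and it computes each run's start position explicitly rather than relying on rle accepting a data frame.

```r
# Question's data, read with base R
yearly <- read.table(header = TRUE, text = "
ID  Jan Feb March April May Jun Jul Aug Sept Oct Nov Dec
ABC   0   0     0     1   0   0   0   0    1   0   0   0
DEF   0   0     0     1   1   0   0   0    1   0   0   0
GHI   0   0     0     1   0   1   0   0    0   1   0   0
MNO   0   0     0     1   0   1   0   0    1   0   0   0
QAL   0   1     1     1   0   0   1   0    0   1   0   0")

# Hypothetical helper: names of the months directly before a run of at
# least `min_zeros` zeros in a named 0/1 vector `x`
months_before_zero_runs <- function(x, min_zeros = 3) {
  runs   <- rle(unname(x))
  ends   <- cumsum(runs$lengths)       # last position of each run
  starts <- ends - runs$lengths + 1    # first position of each run
  # a qualifying run is all zeros, long enough, and has a month before it
  hit <- runs$values == 0 & runs$lengths >= min_zeros & starts > 1
  names(x)[starts[hit] - 1]
}

# One named vector per row (unlist keeps the month names), named by ID
rows <- lapply(split(yearly[, -1], yearly$ID), unlist)
res  <- lapply(rows, months_before_zero_runs)
```

The starts > 1 condition skips a zero run at the very beginning of a row (Jan to March for ABC), because no month precedes it.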

Answer 2

There are several ways to solve this:

String matching

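This approach uses string matching and therefore relies on every value having a character length of 1 (here, the digits 0 and 1). It assumes yearly is a data.table:

library(data.table)
library(magrittr)

yearly[, 
       {
         # paste the row's 0/1 values into one string, e.g. "000100001000" for ABC,
         # and locate every "1000" (a 1 followed by three 0s)
         Reduce(paste0, .SD) %>% 
           stringr::str_locate_all("1000") %>% 
           as.data.table()
       }, 
       .SDcols = -"ID", by = "ID"][
         # start counts from the first month; + 1L skips the ID column in names(yearly)
         , .(ID, month = names(yearly)[start + 1L])]

    ID month
1: ABC April
2: ABC  Sept
3: DEF   May
4: DEF  Sept
5: GHI   Jun
6: MNO  Sept

This can be reshaped to wide format, as requested by the OP:

yearly[, 
       {
         Reduce(paste0, .SD) %>% 
           stringr::str_locate_all("1000") %>% 
           as.data.table()
       }, 
       .SDcols = -"ID", by = "ID"][
         , .(ID, month = names(yearly)[start + 1L])][
           # reshape to wide: one column per match (col1, col2, ...)
           , dcast(.SD, ID ~ rowid(ID, prefix = "col"))][
             # right join on all IDs so that rows without a match (QAL) are kept
             yearly[, ID], on = "ID"]

    ID  col1 col2
1: ABC April Sept
2: DEF   May Sept
3: GHI   Jun <NA>
4: MNO  Sept <NA>
5: QAL  <NA> <NA>

Joining columns in a rolling window in wide format

This approach is somewhat similar to the string matching approach. It looks for matches through inner joins on four consecutive columns, which move across the columns of yearly in a rolling window, i.e., it tries to find matches in columns Jan, Feb, March, April, then in columns Feb, March, April, May, and so on, and finally in columns Sept, Oct, Nov, Dec.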
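The rolling-window join described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the answer's own code: it assumes yearly is a data.table rebuilt from the question's table, and the names months, pattern, target, hits, and wide are hypothetical. Each window of four month columns is inner-joined against a one-row table holding the pattern 1, 0, 0, 0.

```r
library(data.table)

# Assumption: yearly rebuilt from the question's table as a data.table
yearly <- as.data.table(read.table(header = TRUE, text = "
ID  Jan Feb March April May Jun Jul Aug Sept Oct Nov Dec
ABC   0   0     0     1   0   0   0   0    1   0   0   0
DEF   0   0     0     1   1   0   0   0    1   0   0   0
GHI   0   0     0     1   0   1   0   0    0   1   0   0
MNO   0   0     0     1   0   1   0   0    1   0   0   0
QAL   0   1     1     1   0   0   1   0    0   1   0   0"))

months  <- setdiff(names(yearly), "ID")
pattern <- c(1L, 0L, 0L, 0L)  # hypothetical: a 1 followed by three 0s

# Slide a window of four consecutive month columns across the table.
# For each window, inner-join yearly against a one-row table that holds
# `pattern` in those columns; matching IDs get the window's first month.
hits <- rbindlist(lapply(seq_len(length(months) - 3L), function(i) {
  win    <- months[i:(i + 3L)]
  target <- as.data.table(setNames(as.list(pattern), win))
  ids    <- yearly[target, on = win, nomatch = NULL]$ID
  data.table(ID = ids, month = rep(win[1L], length(ids)))
}))

# Reshape to the wide layout from the question (one column per match),
# then join back on all IDs so rows without a match (QAL) are kept
wide <- dcast(hits, ID ~ rowid(ID, prefix = "col"), value.var = "month")
wide <- wide[yearly[, .(ID)], on = "ID"]
```

Because windows are processed from left to right, col1 always holds the earlier month for each ID.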

Answer 3

Data:

library(magrittr)  # provides %>%, used here and in the code below
df<-data.table::fread("
ID   Jan Feb March April May Jun Jul Aug Sept Oct Nov Dec
ABC   0  0    0     1    0   0    0   0  1     0   0  0
DEF   0  0    0     1    1   0    0   0  1     0   0  0
GHI   0  0    0     1    0   1    0   0  0     1   0  0
MNO   0  0    0     1    0   1    0   0  1     0   0  0
QAL   0  1    1     1    0   0    1   0  0     1   0  0") %>% data.table::setDF()

Code:

library(magrittr)
rowNames <- df[,1,drop=T]
months   <- names(df[,-1])
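fun1<-function(x) {
    n      <- 3 # at least 3 zeros after the 1 (change if needed)
    # cumsum(x) increases at every 1, so each 1 starts a new group that also
    # holds the 0s after it; pos is the start position of every group
    pos    <- c(-1,cumsum(x)) %>% diff %>% as.logical %>% which
    # keep groups with more than n elements (the 1 plus at least n zeros) that start with a 1
    counts <- table(cumsum(x)) %>% as.numeric %>% {. > n & as.logical(x[pos])}
    return(months[pos[counts]])
}

# apply() returns a list here because rows yield different numbers of months;
# if every row returned the same number, it would simplify to a vector or matrix
res <- apply(df[,-1],1,fun1)
names(res) <- rowNames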

Result:

$ABC
[1] "April" "Sept" 

$DEF
[1] "May"  "Sept"

$GHI
[1] "Jun"

$MNO
[1] "Sept"

$QAL
character(0)

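Please note:

Make sure the data is a data.frame; the indexing used here (for example df[,1,drop=T]) assumes data.frame semantics.
Make sure fun1 is only applied to the 0/1 data. That is why df[,-1] is passed to apply().
You can change n inside fun1 to use a different threshold.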
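The grouping step in fun1 can be traced by hand on the ABC row. The sketch below restates it in base R without pipes; the variable names grp, sizes, first, keep, and res are hypothetical, and match(unique(grp), grp) stands in for the answer's diff()/which() step for finding where each group starts.

```r
# One row of the question's data (ABC) as a named 0/1 vector
x <- c(Jan = 0, Feb = 0, March = 0, April = 1, May = 0, Jun = 0,
       Jul = 0, Aug = 0, Sept = 1, Oct = 0, Nov = 0, Dec = 0)
n <- 3  # at least 3 zeros after the 1, as in fun1

grp   <- cumsum(x)                  # 0 0 0 1 1 1 1 1 2 2 2 2: every 1 opens a new group
sizes <- as.vector(table(grp))      # 3 5 4: group sizes, in increasing group order
first <- match(unique(grp), grp)    # 1 4 9: position where each group starts
keep  <- sizes > n & x[first] == 1  # group starts with a 1 and has at least n zeros after it
res   <- names(x)[first[keep]]      # "April" "Sept"
```

The leading group (Jan to March) is dropped by the x[first] == 1 condition, which plays the same role as as.logical(x[pos]) in fun1.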

Iterate over each row and compare the values of multiple columns in the row being iterated
