Question

I'm trying to understand how to conditionally replace values in a data frame without using a loop. My data frame is structured as follows:

> df
          a b est
1  11.77000 2   0
2  10.90000 3   0
3  10.32000 2   0
4  10.96000 0   0
5   9.90600 0   0
6  10.70000 0   0
7  11.43000 1   0
8  11.41000 2   0
9  10.48512 4   0
10 11.19000 0   0

The dput output is:

structure(list(a = c(11.77, 10.9, 10.32, 10.96, 9.906, 10.7, 
11.43, 11.41, 10.48512, 11.19), b = c(2, 3, 2, 0, 0, 0, 1, 2, 
4, 0), est = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), .Names = c("a", 
"b", "est"), row.names = c(NA, -10L), class = "data.frame")

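What I want to do is check the value of b. If b is 0, I want to set est to a value derived from a. I understand that df$est[df$b == 0] <- 23 sets est to 23 in every row where b == 0. What I can't figure out is how to set est to a value based on a when that condition is true. For example:

df$est[df$b == 0] <- (df$a - 5)/2.533

gives the following warning:

Warning message:
In df$est[df$b == 0] <- (df$a - 5)/2.533 :
  number of items to replace is not a multiple of replacement length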

Is there a way to pass only the relevant cells, rather than the whole vector?

Answer 1

Because you are conditionally indexing df$est, you also need to conditionally index the replacement vector df$a:

index <- df$b == 0
df$est[index] <- (df$a[index] - 5)/2.533

Of course, the variable index is only temporary; I use it to make the code more readable. You can also write it in one step:

df$est[df$b == 0] <- (df$a[df$b == 0] - 5)/2.533

For even better readability, you can use within:

df <- within(df, est[b==0] <- (a[b==0]-5)/2.533)

The result, whichever method you choose:

df
          a b      est
1  11.77000 2 0.000000
2  10.90000 3 0.000000
3  10.32000 2 0.000000
4  10.96000 0 2.352941
5   9.90600 0 1.936834
6  10.70000 0 2.250296
7  11.43000 1 0.000000
8  11.41000 2 0.000000
9  10.48512 4 0.000000
10 11.19000 0 2.443743

As others have pointed out, an alternative solution for your example is to use ifelse.
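To see why both sides need the same index, it can help to compare lengths. The following is a minimal sketch that rebuilds the question's data frame from its dput output; the helper names n_slots, n_values and n_subset are made up for illustration.

```r
df <- data.frame(
  a = c(11.77, 10.9, 10.32, 10.96, 9.906, 10.7, 11.43, 11.41, 10.48512, 11.19),
  b = c(2, 3, 2, 0, 0, 0, 1, 2, 4, 0),
  est = 0
)

index <- df$b == 0

n_slots  <- sum(index)                          # 4 positions selected on the left-hand side
n_values <- length((df$a - 5) / 2.533)          # 10 values offered by the unindexed right-hand side
n_subset <- length((df$a[index] - 5) / 2.533)   # 4 values once df$a is indexed as well

# With matching lengths the assignment is element-for-element and raises no warning.
df$est[index] <- (df$a[index] - 5) / 2.533
```

The warning in the question comes from trying to fit 10 values into 4 positions; indexing df$a with the same logical vector brings the two lengths into agreement.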

Answer 2

Try data.table's := operator:

library(data.table)
DT = as.data.table(df)
DT[b==0, est := (a-5)/2.533]

It's fast and short. For more information on :=, see these related questions:

Why has data.table defined :=

When should I use the := operator in data.table

How do you remove columns from a data.frame

R self reference
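As a small illustration of how := behaves, the sketch below assumes the data.table package is installed and rebuilds the question's data frame. It shows that the update happens inside DT, while the original df is left untouched because as.data.table() works on a copy. The name df_updated is made up for this example.

```r
library(data.table)

df <- data.frame(
  a = c(11.77, 10.9, 10.32, 10.96, 9.906, 10.7, 11.43, 11.41, 10.48512, 11.19),
  b = c(2, 3, 2, 0, 0, 0, 1, 2, 4, 0),
  est = 0
)

DT <- as.data.table(df)               # as.data.table() returns a copy of df
DT[b == 0, est := (a - 5) / 2.533]    # updates est in place, only in rows where b == 0

df_updated <- as.data.frame(DT)       # convert back if a plain data.frame is needed
```

If the rest of the code expects a data.frame, the conversion on the last line is one way to get back to it.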

Answer 3

Here is one approach. ifelse is vectorized: it checks every row for a zero value of b and, where that is the case, replaces est with (a - 5)/2.533.

df <- transform(df, est = ifelse(b == 0, (a - 5)/2.533, est))

Answer 4

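The R Inferno or the basic R documentation will explain why using df$* is not the best approach. From the help page for "[":

"Indexing by [ is similar to atomic vectors and selects a list of the specified element(s). Both [[ and $ select a single element of the list. The main difference is that $ does not allow computed indices, whereas [[ does. x$name is equivalent to x[["name", exact = FALSE]]. Also, the partial matching behavior of [[ can be controlled using the exact argument."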

I recommend using the [row,col] notation. Example:

Rgames: foo   
         x    y z  
   [1,] 1e+00 1 0  
   [2,] 2e+00 2 0  
   [3,] 3e+00 1 0  
   [4,] 4e+00 2 0  
   [5,] 5e+00 1 0  
   [6,] 6e+00 2 0  
   [7,] 7e+00 1 0  
   [8,] 8e+00 2 0  
   [9,] 9e+00 1 0  
   [10,] 1e+01 2 0  
Rgames: foo<-as.data.frame(foo)

Rgames: foo[foo$y==2,3]<-foo[foo$y==2,1]
Rgames: foo
       x y     z
1  1e+00 1 0e+00
2  2e+00 2 2e+00
3  3e+00 1 0e+00
4  4e+00 2 4e+00
5  5e+00 1 0e+00
6  6e+00 2 6e+00
7  7e+00 1 0e+00
8  8e+00 2 8e+00
9  9e+00 1 0e+00
10 1e+01 2 1e+01
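Applied to the data frame from the question, the same [row,col] notation might look like the sketch below. The variable name rows is made up for this example, and the column names are used in place of the numeric positions from the foo example.

```r
df <- data.frame(
  a = c(11.77, 10.9, 10.32, 10.96, 9.906, 10.7, 11.43, 11.41, 10.48512, 11.19),
  b = c(2, 3, 2, 0, 0, 0, 1, 2, 4, 0),
  est = 0
)

rows <- df$b == 0                                # logical row selector
df[rows, "est"] <- (df[rows, "a"] - 5) / 2.533   # same rows on both sides of the assignment
```

Selecting the same rows on both sides keeps the replacement the same length as the target, so no recycling warning is raised.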

Answer 5

Another option is to use case_when:

require(dplyr)

transform(df, est = case_when(
    b == 0 ~ (a - 5)/2.533, 
    TRUE   ~ est 
))

This solution becomes more convenient when more than two cases need to be distinguished, because it avoids nested if_else constructs.
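As a rough sketch of the multi-case situation, the example below adds a second rule for b == 1. That rule is purely hypothetical and is not part of the question; it is only there to show how a further branch slots in. It assumes dplyr is installed and uses mutate in place of transform.

```r
library(dplyr)

df <- data.frame(
  a = c(11.77, 10.9, 10.32, 10.96, 9.906, 10.7, 11.43, 11.41, 10.48512, 11.19),
  b = c(2, 3, 2, 0, 0, 0, 1, 2, 4, 0),
  est = 0
)

df <- mutate(df, est = case_when(
  b == 0 ~ (a - 5) / 2.533,  # rule from the question
  b == 1 ~ a,                # hypothetical second rule, for illustration only
  TRUE   ~ est               # otherwise keep the existing value
))
```

Conditions are checked in order and the first match wins, so each extra case is one more line rather than one more level of nesting.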

Answer 6

Here is another version of a solution from me, which solves my problem in a row-wise manner.

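# Return col1 where col2 is 0, otherwise 0 (col3 is accepted but not used).
my.assign <- function(col1, col2, col3) {
  if (col2 == 0) col1 else 0
}

# For col1 >= 10 return the larger of col2 and col3, otherwise col2.
my.max <- function(col1, col2, col3) {
  if (col1 >= 10) max(col2, col3, na.rm = TRUE) else col2
}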


df$est <- with(df,mapply(my.assign,col1=a, col2=b, col3=est))
df$max_row <- with(df,mapply(my.max,col1=a, col2=b, col3=est))

> df
          a b    est max_row
1  11.77000 2  0.000    2.00
2  10.90000 3  0.000    3.00
3  10.32000 2  0.000    2.00
4  10.96000 0 10.960   10.96
5   9.90600 0  9.906    0.00
6  10.70000 0 10.700   10.70
7  11.43000 1  0.000    1.00
8  11.41000 2  0.000    2.00
9  10.48512 4  0.000    4.00
10 11.19000 0 11.190   11.19
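For comparison, the same two columns can probably be computed without mapply, using ifelse and pmax. This is a sketch of a vectorized equivalent, not the answer's own method; the names est_rowwise, max_rowwise, est_vec and max_vec are made up for this example, and the row-wise functions are restated compactly so the block runs on its own.

```r
df <- data.frame(
  a = c(11.77, 10.9, 10.32, 10.96, 9.906, 10.7, 11.43, 11.41, 10.48512, 11.19),
  b = c(2, 3, 2, 0, 0, 0, 1, 2, 4, 0),
  est = 0
)

# Row-wise functions from the answer, restated compactly.
my.assign <- function(col1, col2, col3) if (col2 == 0) col1 else 0
my.max <- function(col1, col2, col3) if (col1 >= 10) max(col2, col3, na.rm = TRUE) else col2

est_rowwise <- mapply(my.assign, col1 = df$a, col2 = df$b, col3 = df$est)
max_rowwise <- mapply(my.max, col1 = df$a, col2 = df$b, col3 = est_rowwise)

# Vectorized equivalents: ifelse picks per element, pmax takes the element-wise maximum.
est_vec <- ifelse(df$b == 0, df$a, 0)
max_vec <- ifelse(df$a >= 10, pmax(df$b, est_vec, na.rm = TRUE), df$b)
```

Both routes give the same est and max_row values for this data; the vectorized form avoids calling an R function once per row.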

Conditionally replacing values in a data.frame
