Why don't these operations produce the same result? Piping into . (dot)

Time: 2020-03-16 07:07:38

Tags: r magrittr

Today I ran into something I don't quite understand while using the dot (.) with %>%. Now I'm not sure I understand either of them.

Data

library(data.table)
library(magrittr)

set.seed(1)
df <- setDT(data.frame(id = sample(1:5, 10, replace = TRUE), value = runif(10)))

Why are these three equivalent:

df[, .(Mean = mean(value)), by = .(id)] %>% .$Mean %>% sum()
[1] 3.529399
df[, .(Mean = mean(value)), by = .(id)] %>% {sum(.$Mean)}
[1] 3.529399
sum(df[, .(Mean = mean(value)), by = .(id)]$Mean)
[1] 3.529399

But why does this one give such a different answer?

df[, .(Mean = mean(value)), by = .(id)] %>% sum(.$Mean)
[1] 22.0588

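Can someone explain to me how the pipe operator actually uses the dot (.)? I used to think of it as simply taking whatever is on the left-hand side of %>%.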

Investigating only confused me more

I tried replacing sum with print to see what was actually happening

# As Expected
df[, .(Mean = mean(value)), by = .(id)] %>% .$Mean %>% print()
[1] 0.5111589 0.7698414 0.7475319 0.9919061 0.5089610
df[, .(Mean = mean(value)), by = .(id)] %>% print(.$Mean) %>% sum()
[1] 3.529399

# Surprised
df[, .(Mean = mean(value)), by = .(id)] %>% print(.$Mean)
    id      Mean
 1:  1 0.5111589
---             
 5:  3 0.5089610

# Same
df[, .(Mean = mean(value)), by = .(id)] %>% sum(print(.$Mean))
[1] 22.0588

# Utterly Confused
df[, .(Mean = mean(value)), by = .(id)] %>% print(.$Mean) %>% sum()
[1] 18.5294 #Not even the same as above??

Edit: this seems to have nothing to do with data.table or the way it groups; the same thing happens with a plain data.frame:

x <- data.frame(x1 = 1:3, x2 = 4:6)

sum(x$x1)
# [1] 6
sum(x$x2)
# [1] 15

x %>% .$x1 %>% sum
# [1] 6
x %>% .$x2 %>% sum
# [1] 15

# Why?
x %>% sum(.$x1)
# [1] 27
x %>% sum(.$x2)
# [1] 36

1 answer:

Answer 0 (score: 1)

The updated, shorter example helps.

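We know that with the pipe the first argument comes from the LHS (unless we "stop" it with {}). A dot that appears only inside a nested call, such as .$x1, does not count as the placeholder, so the LHS is still inserted as the first argument. So what happens is: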

x %>% sum(.$x1)
#[1] 27

is equivalent to

sum(x, x$x1)
#[1] 27

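That is, the sum of the whole data frame plus the sum of column x1: (1 + 2 + 3 + 4 + 5 + 6) + (1 + 2 + 3) = 21 + 6 = 27.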
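You can check the expansion in base R alone, with no pipe involved. sum() on an all-numeric data frame adds up every cell, and any extra arguments are added on top. The second half is a minimal sketch that assumes magrittr is installed and is skipped otherwise. It shows that a dot used only inside .$x1 still lets the LHS become the first argument, while braces or a separate .$x1 step do not.

```r
x <- data.frame(x1 = 1:3, x2 = 4:6)

# sum() on an all-numeric data frame adds every cell: 1 + 2 + 3 + 4 + 5 + 6 = 21
stopifnot(sum(x) == 21)

# What x %>% sum(.$x1) expands to: the whole frame (21) plus column x1 (6)
stopifnot(sum(x, x$x1) == 21 + 6)   # 27
# Likewise for x2: the whole frame (21) plus column x2 (15)
stopifnot(sum(x, x$x2) == 21 + 15)  # 36

# Assumption: magrittr is available; this part is skipped if it is not
if (requireNamespace("magrittr", quietly = TRUE)) {
  `%>%` <- magrittr::`%>%`

  # The dot only appears nested inside `$`, so x is still inserted as the first argument
  stopifnot((x %>% sum(.$x1)) == 27)

  # Braces stop first-argument insertion: only sum(x$x1) is evaluated
  stopifnot((x %>% {sum(.$x1)}) == 6)

  # Extracting the column as its own pipeline step also avoids the problem
  stopifnot((x %>% .$x1 %>% sum()) == 6)
}
```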


For the original example, we can verify the same behaviour:

library(data.table)

temp <- df[, .(Mean = mean(value)), by = .(id)]
sum(temp, temp$Mean)
#[1] 22.0588
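The same rule appears to explain the print() experiments in the question. print(.$Mean) becomes print(temp, temp$Mean), so the Mean vector lands in print's second positional argument. For a data.table that argument is topn, which seems to be why only the first and last rows were shown. print() then returns the table invisibly, so the sum() that follows adds up the whole table, id column included. That is why 18.5294 is exactly one sum(Mean) smaller than 22.0588: 22.0588 - 18.5294 = 3.5294. The base R sketch below uses a small hypothetical data frame in place of temp. It checks that print() passes its argument on unchanged, and that the two sums differ by one sum(Mean).

```r
# Hypothetical small frame standing in for temp
temp <- data.frame(id = 1:3, Mean = c(0.5, 0.25, 0.25))

# print() displays temp and returns it invisibly, unchanged
res <- withVisible(print(temp))
stopifnot(identical(res$value, temp), !res$visible)

# So "... %>% print() %>% sum()" ends up summing every cell of the table
total_table <- sum(res$value)             # (1 + 2 + 3) + (0.5 + 0.25 + 0.25) = 7
# What "... %>% sum(.$Mean)" expands to: the whole table plus Mean again
total_plus_mean <- sum(temp, temp$Mean)   # 7 + 1 = 8
stopifnot(total_table == 7, total_plus_mean == 8)

# The gap between the two results is exactly one sum(Mean)
stopifnot(total_plus_mean - total_table == sum(temp$Mean))
```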