dplyr self-join with filter

Time: 2016-03-02 10:13:08

Tags: r dplyr

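In a long-format data frame, I want to subtract the value in the row labelled "baseline" from the values of all other labelled rows. Using a left_join against the "baseline" subset, this is easy to do in two steps. However, I cannot figure out how to combine vas_1 and vas_diff into a single chain.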

library(dplyr)
# Create test data
n_users = 5
vas = tibble( # data_frame() is deprecated in favour of tibble()
  user = rep(letters[1:n_users], each = 3),
  group = rep(c("baseline", "early", "late" ),n_users),
  vas = round(rgamma(n_users*3, 10,1.4 ))
)
# The above data are given


# Assume some other operations are required
vas_1 = vas %>%
  mutate(
    vas = vas * 2
  )
# I want to put the following into one
# chain with the above
# Use self-join to subtract baseline
vas_diff = vas_1 %>%
  filter(group != "baseline") %>%
  # Problem is vas_1 here. Using . gives error here
  # Adding copy = TRUE does not help
#  left_join(. %>% filter(group == "baseline") , by = c("user")) %>%
  left_join(vas_1 %>% filter(group == "baseline") , by = c("user")) %>%
  mutate(vas = vas.x - vas.y) %>% # compute offset
  select(user, group.x, vas) # remove temporary variables

vas_diff

2 Answers:

Answer 0 (score: 2)

I use an anonymous function when I need to refer to `.` more than once:

... %>% (function(df) { ... }) %>% ...

(If the baseline rows are filtered out before the anonymous function, this does not give the desired result: df then contains no baseline rows to join against, so vas.y is all NA. It still shows how to use the anonymous function.)

So, in your case, you probably want this:

vas_diff = vas_1 %>%
  # df is the full vas_1, so both subsets can be taken from it
  (function(df) left_join(
    df %>% filter(group != "baseline"),
    df %>% filter(group == "baseline"),
    by = c("user"))) %>%
  mutate(vas = vas.x - vas.y) %>% # compute offset
  select(user, group.x, vas)
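
For comparison, the same idea can be sketched with magrittr's brace form instead of an explicit anonymous function. When the right-hand side is wrapped in `{ }`, the pipe does not insert the left-hand side as the first argument, and `.` can be used as many times as needed. The fixed values in `vas_1` below are made up so that the result is reproducible; they are not the random data from the question.

```r
library(dplyr)

# Hypothetical fixed data (assumption, replaces the random test data)
vas_1 <- tibble(
  user  = rep(c("a", "b"), each = 3),
  group = rep(c("baseline", "early", "late"), 2),
  vas   = c(10, 12, 15, 20, 18, 25)
)

vas_diff <- vas_1 %>%
  {
    # Inside the braces, `.` is the full vas_1 and is used twice
    left_join(
      filter(., group != "baseline"),
      filter(., group == "baseline"),
      by = "user"
    )
  } %>%
  mutate(vas = vas.x - vas.y) %>%   # compute offset from baseline
  select(user, group = group.x, vas) # keep the non-baseline label

vas_diff
```

Whether braces or `(function(df) ...)` reads better is largely a matter of taste; the named argument `df` can be easier to follow when the body gets longer.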

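Answer 1 (score: 0)

Here is a similar option, which also shows that you can pass an entire pipe chain as an argument to the join. Instead of passing `.` directly, you wrap it in `eval(.)` before piping it into filter, and then drop the unneeded column on the right-hand side. I am mostly recording this approach for my own reference.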

vas_diff = vas_1 %>%
  left_join(x = eval(.) %>% 
                  filter(group != "baseline"),
            y = eval(.) %>% 
                  filter(group == "baseline") %>%
                  select(-group),
            by = c("user")) %>%
  mutate(vas = vas.x - vas.y) %>% # compute offset
  select(user, group, vas)

Does anyone know why you can't simply pass `.` directly, as in x = . %>% filter ...? Why do we need eval(.)?
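
A likely explanation, as far as I understand magrittr: a pipeline whose left-hand side is a bare `.` is not evaluated at all. It defines a reusable function (a "functional sequence"). So `x = . %>% filter(...)` hands left_join a function rather than a data frame. Wrapping the placeholder, as in `eval(.)`, means the left-hand side is no longer a bare `.`, so it is evaluated as ordinary data. The small data frame `df` below is made up for illustration.

```r
library(dplyr)

# Hypothetical toy data (assumption, for illustration only)
df <- tibble(
  user  = c("a", "a", "b"),
  group = c("baseline", "early", "baseline")
)

# A pipeline starting with a bare `.` builds a function, not a result
f <- . %>% filter(group == "baseline")
is.function(f)

# The function can be applied to data later
baseline_rows <- f(df)

# With eval(.) the left-hand side is a call, so the pipe runs immediately
res <- df %>% { eval(.) %>% filter(group == "baseline") }
```

Here `f` is a function while `res` is a data frame, which is the difference between the failing and the working variant of the join argument.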