dplyr: manipulating rows with a grouped mutate

Time: 2016-12-30 03:26:49

Tags: r dplyr

I have a dataset:

x <- data.frame(Postcode = c(1, 2, 3, 4, 5, 6), 
                Latitude = c(3.1, 3.2, 3.3, 3.3, 3.4, 3.4),
                Longitude = c(100, 101, 102, 102, 103, 104),
                Exposure = c(1, 2, 3, 4, 5, 6))

I am trying to manipulate the data in x so that it becomes:

x <- data.frame(Postcode = c(1, 2, 3, 4, 5, 6), 
                Latitude = c(3.1, 3.2, 3.3, 3.3, 3.4, 3.4),
                Longitude = c(100, 101, 102, 102, 103, 104),
                Exposure = c(1, 2, 3, 4, 5, 6),
                coords = c("3.1, 100", "3.2, 101", "3.3, 102", "3.3, 102",
                           "3.4, 103", "3.4, 104"),
                postcode = c("1", "2", "3,4", "3,4", "5", "6"),
                exposure = c(1, 2, 7, 7, 5, 6))

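The new column postcode pastes together the Postcode values that share the same Latitude and Longitude. coords pastes Latitude and Longitude together, and exposure sums the Exposure of the rows that have the same coords, i.e. the same Latitude and Longitude.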

I can do this using the dplyr package and a for loop:

x <- mutate(x, coords = paste(Latitude, Longitude, sep = ", "))
x <- cbind(x, postcode = rep(0, nrow(x)), exposure = rep(0, nrow(x)))
for (i in unique(x$coords)) {
  x$postcode[x$coords == i] <- paste(x$Postcode[x$coords == i], collapse = ", ")
  x$exposure[x$coords == i] <- sum(x$Exposure[x$coords == i])
}

How can I achieve this using only the dplyr package, without the for loop? Or is there some other approach that is more efficient than a for loop? My actual dataset is very large.

3 answers:

Answer 0: (score: 2)

library(dplyr)
library(tidyr)  # unite() was used to join Lat, Lon

x %>%
  unite(coords, Latitude, Longitude, sep = ", ", remove = FALSE) %>%
  group_by(coords) %>%
  # sum Exposure (not Postcode) within each coordinate group
  mutate(exposure = sum(Exposure), postcode = toString(Postcode))
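
For what it's worth, unite() is only used here to build the grouping key. A minimal sketch, assuming the x from the question and the ", " separator shown in the question's expected output, that compares the key from tidyr::unite() with one built by paste():

```r
library(dplyr)
library(tidyr)

# Data from the question
x <- data.frame(Postcode = c(1, 2, 3, 4, 5, 6),
                Latitude = c(3.1, 3.2, 3.3, 3.3, 3.4, 3.4),
                Longitude = c(100, 101, 102, 102, 103, 104),
                Exposure = c(1, 2, 3, 4, 5, 6))

# unite() joins the two columns as text;
# remove = FALSE keeps Latitude and Longitude in the result
united <- x %>%
  unite(coords, Latitude, Longitude, sep = ", ", remove = FALSE)

# The same key built directly with paste()
pasted <- paste(x$Latitude, x$Longitude, sep = ", ")

# Compare the two keys and confirm the source columns survived
identical(united$coords, pasted)
all(c("Latitude", "Longitude") %in% names(united))
```

If the two keys match, either form can feed group_by(); paste() just saves loading tidyr.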

Answer 1: (score: 1)

Here is a way to do this using dplyr:

library(dplyr)
x %>% 
     group_by(coords = paste(Latitude, Longitude, sep = ", ")) %>% 
     mutate(postcode = toString(Postcode), exposure = sum(Exposure))

# Source: local data frame [6 x 7]
# Groups: coords [5]
# 
#   Postcode Latitude Longitude Exposure   coords postcode exposure
#      <dbl>    <dbl>     <dbl>    <dbl>    <chr>    <chr>    <dbl>
# 1        1      3.1       100        1 3.1, 100        1        1
# 2        2      3.2       101        2 3.2, 101        2        2
# 3        3      3.3       102        3 3.3, 102     3, 4        7
# 4        4      3.3       102        4 3.3, 102     3, 4        7
# 5        5      3.4       103        5 3.4, 103        5        5
# 6        6      3.4       104        6 3.4, 104        6        6
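
As a sanity check, the grouped mutate can be compared against the loop from the question. This is only a sketch on the six-row example: the names loop_res and grouped_res are made up for illustration, and the loop's placeholder columns are initialised with NA rather than 0. Note that the printed result above is still grouped by coords, so the sketch calls ungroup() before comparing.

```r
library(dplyr)

# Data from the question
x <- data.frame(Postcode = c(1, 2, 3, 4, 5, 6),
                Latitude = c(3.1, 3.2, 3.3, 3.3, 3.4, 3.4),
                Longitude = c(100, 101, 102, 102, 103, 104),
                Exposure = c(1, 2, 3, 4, 5, 6))

# Loop version, following the question (placeholders are NA here, not 0)
loop_res <- mutate(x, coords = paste(Latitude, Longitude, sep = ", "))
loop_res$postcode <- NA_character_
loop_res$exposure <- NA_real_
for (i in unique(loop_res$coords)) {
  same <- loop_res$coords == i
  loop_res$postcode[same] <- paste(loop_res$Postcode[same], collapse = ", ")
  loop_res$exposure[same] <- sum(loop_res$Exposure[same])
}

# Grouped version from this answer, with the grouping dropped afterwards
grouped_res <- x %>%
  group_by(coords = paste(Latitude, Longitude, sep = ", ")) %>%
  mutate(postcode = toString(Postcode), exposure = sum(Exposure)) %>%
  ungroup() %>%
  as.data.frame()

# Both comparisons should agree if the two approaches are equivalent
identical(loop_res$postcode, grouped_res$postcode)
identical(loop_res$exposure, grouped_res$exposure)
```

Dropping the grouping with ungroup() is worth doing in real code too, since later verbs would otherwise keep operating per coords group.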

Answer 2: (score: 1)

We can do this with data.table:

library(data.table)
# build the key by reference, then sum Exposure (not Postcode) per key
setDT(x)[, coords := paste(Latitude, Longitude, sep = ",")
  ][, c("exposure", "postcode") := .(sum(Exposure), toString(Postcode)), by = coords]
x
#   Postcode Latitude Longitude Exposure  coords exposure postcode
#1:        1      3.1       100        1 3.1,100        1        1
#2:        2      3.2       101        2 3.2,101        2        2
#3:        3      3.3       102        3 3.3,102        7     3, 4
#4:        4      3.3       102        4 3.3,102        7     3, 4
#5:        5      3.4       103        5 3.4,103        5        5
#6:        6      3.4       104        6 3.4,104        6        6