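Convert a data frame from one row per time period to one row per day

Time: 2017-11-09 17:38:00

Tags: r dataframe

I am working with a data frame that holds the locations of patients during their hospital stay. It is formatted so that each row represents a new location (= department, room, bed) of a person (= ID) for a specific time period (from BeginTime to EndTime).

Here is a sample of the initial data frame: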

 print(data_perlocation[1:10,])
    ID      department      BeginTime                     EndTime             room   Bed
   <dbl>      <chr>            <chr>                       <chr>               <dbl> <dbl>
1 2156864       L14B 2016-03-02 09:40:00.0000000 2016-03-02 15:20:00.0000000   102     3
2 2161034       B51A 2016-06-07 00:00:00.0000000 2016-06-07 11:02:00.0000000   109     2
3 2161034       B51A 2016-06-06 09:00:00.0000000 2016-06-06 10:27:00.0000000   109     2
4 2161034       B51A 2016-06-06 12:47:00.0000000 2016-06-07 00:00:00.0000000   109     2
5 2161034       B51A 2016-06-06 10:27:00.0000000 2016-06-06 12:47:00.0000000   103     3
6 2176442       L14B 2016-02-04 07:15:00.0000000 2016-02-04 13:47:00.0000000   101     4
7 2176754       B61A 2016-03-15 07:16:00.0000000 2016-03-15 14:56:00.0000000   109     3
8 2176754       B61A 2016-03-16 08:10:00.0000000 2016-03-17 00:00:00.0000000   109     3
9 2176754       B61A 2016-03-15 14:56:00.0000000 2016-03-16 08:10:00.0000000   109     2
10 2176754      B61A 2016-03-17 00:00:00.0000000 2016-03-17 11:18:00.0000000   109     3
11 2184060      B61A 2016-03-10 20:25:00.0000000 2016-03-11 00:00:00.0000000   105     2
12 2184060      B61A 2016-03-10 20:01:00.0000000 2016-03-10 20:25:00.0000000   105     1
13 2184060      B61A 2016-03-11 00:00:00.0000000 2016-03-12 00:00:00.0000000   105     2
14 2184060      B61A 2016-03-12 00:00:00.0000000 2016-03-12 14:00:00.0000000   105     2
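
BeginTime and EndTime are stored as character strings (<chr>) with seven fractional-second digits. A minimal sketch of how one such pair could be parsed so that durations and calendar days can be computed, assuming the timestamps can be treated as UTC (the question does not state a time zone):

```r
# Assumption: timestamps are interpreted as UTC; the question gives no time zone.
begin <- as.POSIXct("2016-03-02 09:40:00.0000000",
                    format = "%Y-%m-%d %H:%M:%OS", tz = "UTC")
end   <- as.POSIXct("2016-03-02 15:20:00.0000000",
                    format = "%Y-%m-%d %H:%M:%OS", tz = "UTC")

# Length of the stay in minutes and the calendar day on which it started.
stay_minutes <- as.numeric(difftime(end, begin, units = "mins"))
begin_day    <- as.Date(begin, tz = "UTC")
```

The same conversion can be applied to whole columns, since as.POSIXct() is vectorised.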

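I want to convert this data frame so that there is one row per day. To do so, I created a new data frame that contains the ID and the days of admission, like this: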

    ID       Date
1  2156864 2016-03-02
2  2161034 2016-06-06
3  2161034 2016-06-07
4  2176442 2016-02-04
5  2176754 2016-03-15
6  2176754 2016-03-16
7  2176754 2016-03-17
8  2184060 2016-03-10
9  2184060 2016-03-11
10 2184060 2016-03-12
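
The question does not show how this per-day table was built. One possible way, sketched here on a small subset of the sample data, is to take the date part of BeginTime and keep the distinct ID/date combinations. The choice of BeginTime as the source of the date is an assumption.

```r
# Small illustrative subset of the question's data (same column names).
data_bylocation <- data.frame(
  ID = c(2161034, 2161034, 2161034, 2161034),
  BeginTime = c("2016-06-07 00:00:00.0000000", "2016-06-06 09:00:00.0000000",
                "2016-06-06 12:47:00.0000000", "2016-06-06 10:27:00.0000000"),
  stringsAsFactors = FALSE
)

# Keep only the date part of BeginTime (first 10 characters, "YYYY-MM-DD").
day <- as.Date(substr(data_bylocation$BeginTime, 1, 10))

# One row per distinct ID/day combination, sorted by ID and day.
data_byday <- unique(data.frame(ID = data_bylocation$ID, Date = day))
data_byday <- data_byday[order(data_byday$ID, data_byday$Date), ]
rownames(data_byday) <- NULL
```

A table built this way only contains days on which a row begins, so a stay that spans several days within a single row would not produce the intermediate days.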

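Now I want to add the (multiple) locations that are present for each day in the data_bylocation data frame to the rows of data_byday with a matching ID and where beginDate matches the date.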

I ended up combining a for loop with two if statements. My attempts so far have not produced anything close to the desired result, and I think there must be an easier way to do this. My last attempt ended up looking something like this:

data_perday[,3] <- NA
for (index in 1:nrow(data_perlocation)){
  if (data_perlocation$ID[index]==data_perday$ID & as.Date(as.character(data_perlocation$BeginTime[index]), format="%Y-%m-%d")==as.Date(data_perday$Date, format="%Y-%m-%d")) {
    if (is.na(data_perday[index,3])){
      ##code to assign location and time for that day
    } else {
      ##code to assign second location and time for that day and place
    }
  }
}

The desired result looks like this:

    ID       Date        BeginTime1  EndTime1  department1  room1  bed1       BeginTime2  EndTime2  department2  room2  bed2   [3rd location, etc]
1  2156864 2016-03-02              [first location of this day]                      [second location of this day] 
2  2161034 2016-06-06
3  2161034 2016-06-07
4  2176442 2016-02-04
5  2176754 2016-03-15
6  2176754 2016-03-16
7  2176754 2016-03-17
8  2184060 2016-03-10
9  2184060 2016-03-11
10 2184060 2016-03-12

I am quite new to R and still learning. I have been stuck on this problem for a while, so any hint in the right direction is much appreciated!

Edit:

Reproducible example:

data_byday <- structure(list(ID = c(2156864, 2161034, 2161034, 2176442, 2176754, 2176754, 2176754, 2184060, 2184060, 2184060), Date = c("2016-03-02", "2016-06-06", "2016-06-07", "2016-02-04", "2016-03-15", "2016-03-16", "2016-03-17", "2016-03-10", "2016-03-11", "2016-03-12")), .Names = c("ID", "Date"), row.names = c(NA, 10L), class = "data.frame")

data_bylocation <- structure(list(ID = c(2156864, 2161034, 2161034, 2161034, 2161034, 2176442, 2176754, 2176754, 2176754, 2176754, 2184060, 2184060, 2184060, 2184060), department = c("L14B", "B51A", "B51A", "B51A", "B51A", "L14B", "B61A", "B61A", "B61A", "B61A", "B61A", "B61A", "B61A", "B61A"), BeginTime = c("2016-03-02 09:40:00.0000000", "2016-06-07 00:00:00.0000000", "2016-06-06 09:00:00.0000000", "2016-06-06 12:47:00.0000000", "2016-06-06 10:27:00.0000000", "2016-02-04 07:15:00.0000000", "2016-03-15 07:16:00.0000000", "2016-03-16 08:10:00.0000000", "2016-03-15 14:56:00.0000000", "2016-03-17 00:00:00.0000000", "2016-03-10 20:25:00.0000000", "2016-03-10 20:01:00.0000000", "2016-03-11 00:00:00.0000000", "2016-03-12 00:00:00.0000000"), EndTime = c("2016-03-02 15:20:00.0000000", "2016-06-07 11:02:00.0000000", "2016-06-06 10:27:00.0000000", "2016-06-07 00:00:00.0000000", "2016-06-06 12:47:00.0000000", "2016-02-04 13:47:00.0000000", "2016-03-15 14:56:00.0000000", "2016-03-17 00:00:00.0000000", "2016-03-16 08:10:00.0000000", "2016-03-17 11:18:00.0000000", "2016-03-11 00:00:00.0000000", "2016-03-10 20:25:00.0000000", "2016-03-12 00:00:00.0000000", "2016-03-12 14:00:00.0000000"), room = c(102, 109, 109, 109, 103, 101, 109, 109, 109, 109, 105, 105, 105, 105), Bed = c(3, 2, 2, 2, 3, 4, 3, 3, 2, 3, 2, 1, 2, 2)), .Names = c("ID", "department", "BeginTime", "EndTime", "room", "Bed"), row.names = c(NA, -14L), class = c("tbl_df", "tbl", "data.frame"))

Second example:

2 answers:

Answer 0 (score: 1)

Here is a solution with tidyverse and lubridate:

library(dplyr)
library(tidyr)
library(lubridate)

data_bylocation %>%
  # day of each stay, taken from BeginTime
  mutate(Date = date(BeginTime)) %>%
  # long format: one row per ID, Date and variable
  gather(variable, value, -ID, -Date) %>%
  # number the locations within each ID/Date/variable group
  group_by(ID, Date, temp = variable) %>%
  mutate(index = row_number(),
         variable = paste0(variable, index)) %>%
  ungroup() %>%
  select(-(temp:index)) %>%
  # back to wide format: one column per variable and location index
  spread(variable, value) %>%
  # reorder: BeginTime, EndTime, department, room, Bed for locations 1, 2 and 3
  select(ID, Date, c(6, 12, 9, 15, 3), c(6, 12, 9, 15, 3)+1, c(6, 12, 9, 15, 3)+2)

Using data.table:

library(data.table)
data = setDT(data_bylocation)

newDT = dcast(melt(data[, date := lubridate::date(BeginTime)],
                   id.vars = c("ID", "date"))[
                     ,.(variable2 = paste0(variable, 1:.N), value),
                     by = .(ID, date, variable)],
              ID + date ~ variable2, value.var = "value")

setcolorder(newDT, c(1:2, c(6, 12, 9, 15, 3), c(6, 12, 9, 15, 3)+1, c(6, 12, 9, 15, 3)+2))

Result:

Note:

I am still trying to think of a better way to reorder the columns. This is the best I can do for now.
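
One possible alternative to the hard-coded positions c(6, 12, 9, 15, 3) is to build the desired column names and select by name. The sketch below is an illustration, not part of the original answer. It assumes at most three locations per day (n_max) and uses a dummy one-row data frame (wide) in place of the reshaped result.

```r
# Variables in the order they should appear for each location.
vars  <- c("BeginTime", "EndTime", "department", "room", "Bed")
n_max <- 3  # assumption: at most three locations per ID and day

# BeginTime1, EndTime1, ..., Bed1, BeginTime2, ..., Bed3
ordered_cols <- c("ID", "Date",
                  paste0(rep(vars, times = n_max),
                         rep(seq_len(n_max), each = length(vars))))

# Dummy wide table whose columns start out in alphabetical order.
start_cols <- c("ID", "Date", sort(ordered_cols[-(1:2)]))
wide <- as.data.frame(setNames(as.list(seq_along(start_cols)), start_cols))

# Reorder by name instead of by position.
wide <- wide[, ordered_cols]
```

Selecting by name keeps working if the alphabetical position of a column changes, for example when another variable is added.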

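Answer 1 (score: 0)

The OP has requested to tabulate the patients' locations by day, even if they stay at a particular location for more than one day.

This requires splitting the longer stays into single days, which can be done with the foverlaps() function. To reshape from long to wide format, data.table::dcast() can be used, which is able to reshape multiple columns simultaneously.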
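
To make the splitting step concrete, here is a minimal base R sketch of the idea for a single stay. It is not the foverlaps() approach used in the code that follows. The helper split_by_day() is hypothetical, the timestamps are illustrative, and UTC is assumed as the time zone.

```r
# Hypothetical helper (not part of any package): split one stay into
# pieces that each lie within a single calendar day (UTC assumed).
split_by_day <- function(begin, end) {
  # First midnight after `begin`.
  first_midnight <- as.POSIXct(format(begin, "%Y-%m-%d", tz = "UTC"), tz = "UTC") + 86400
  # Midnights strictly inside the stay are the cut points.
  cuts <- if (first_midnight < end) seq(first_midnight, end, by = "day") else begin[0]
  cuts <- cuts[cuts < end]
  starts <- c(begin, cuts)
  ends   <- c(cuts, end)
  data.frame(Date = as.Date(starts, tz = "UTC"), BeginTime = starts, EndTime = ends)
}

# A stay that touches four calendar days yields four pieces.
pieces <- split_by_day(as.POSIXct("2016-02-12 13:40:00", tz = "UTC"),
                       as.POSIXct("2016-02-15 16:17:00", tz = "UTC"))
```

Each piece starts either at the original begin time or at midnight, and ends either at the next midnight or at the original end time.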

library(data.table)
# coerce time columns to POSIXct
cols <- c("BeginTime", "EndTime")
setDT(data_bylocation2)[, (cols) := lapply(.SD, as.POSIXct), .SDcols = cols]

# create sequence of days which cover the whole period
time_seq <- data_bylocation2[, seq(lubridate::floor_date(min(BeginTime), "day"),
                                   lubridate::ceiling_date(max(EndTime), "day"),
                                   by = "days")]
# chop longer stays into one day chunks
mDT <- foverlaps(data.table(Date = head(time_seq, -1L), end = tail(time_seq, -1L)),
                 setkey(data_bylocation2, BeginTime, EndTime), by.x = c("Date", "end"), nomatch = 0L)[
                   , (cols) := .(pmax(BeginTime, Date), pmin(EndTime, end))][
                     EndTime > BeginTime][
                     order(ID, Date, BeginTime)]

# reshape from long to wide form
val_cols <- c("BeginTime", "EndTime", "department", "room", "bed")
dcast(mDT, ID + Date ~ rowid(ID, Date), value.var = val_cols)[
  # reorder columns
  , setcolorder(.SD, c("ID", "Date", 
                       # create column names in expected order using a cross join
                       mDT[, CJ(seq_len(max(rowid(ID, Date))), val_cols)[
                         , paste(V2, V1, sep = "_")]]))]
         ID       Date         BeginTime_1           EndTime_1 bed_1 department_1 room_1         BeginTime_2  EndTime_2 bed_2 department_2 room_2
 1: 2224003 2016-02-12 2016-02-12 08:00:00 2016-02-12 13:40:00     3         B12A    205 2016-02-12 13:40:00 2016-02-13     1         B12A    209
 2: 2224003 2016-02-13 2016-02-13 00:00:00 2016-02-14 00:00:00     1         B12A    209                <NA>       <NA>    NA           NA     NA
 3: 2224003 2016-02-14 2016-02-14 00:00:00 2016-02-15 00:00:00     1         B12A    209                <NA>       <NA>    NA           NA     NA
 4: 2224003 2016-02-15 2016-02-15 00:00:00 2016-02-15 16:17:00     1         B12A    209                <NA>       <NA>    NA           NA     NA
 5: 2248787 2016-04-20 2016-04-20 10:00:00 2016-04-21 00:00:00     2         B53A    306                <NA>       <NA>    NA           NA     NA
 6: 2248787 2016-04-21 2016-04-21 00:00:00 2016-04-22 00:00:00     2         B53A    306                <NA>       <NA>    NA           NA     NA
 7: 2248787 2016-04-22 2016-04-22 00:00:00 2016-04-23 00:00:00     2         B53A    306                <NA>       <NA>    NA           NA     NA
 8: 2248787 2016-04-23 2016-04-23 00:00:00 2016-04-24 00:00:00     2         B53A    306                <NA>       <NA>    NA           NA     NA
 9: 2248787 2016-04-24 2016-04-24 00:00:00 2016-04-24 11:47:00     2         B53A    306 2016-04-24 11:47:00 2016-04-25     4         B53A    311
10: 2248787 2016-04-25 2016-04-25 00:00:00 2016-04-26 00:00:00     4         B53A    311                <NA>       <NA>    NA           NA     NA
11: 2248787 2016-04-26 2016-04-26 00:00:00 2016-04-26 16:00:00     4         B53A    311                <NA>       <NA>    NA           NA     NA
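
Both answers rely on the same idea: number the locations within each ID/day and then spread them into columns. As a rough sketch of that idea without extra packages, base R's ave() and reshape() can be used. The small stays data frame below is illustrative and carries only one value column (room).

```r
# Illustrative long table: three locations on one day, one on the next.
stays <- data.frame(
  ID   = c(2161034, 2161034, 2161034, 2161034),
  Date = c("2016-06-06", "2016-06-06", "2016-06-06", "2016-06-07"),
  room = c(109, 103, 109, 109),
  stringsAsFactors = FALSE
)

# Running number of the location within each ID/Date group.
stays$index <- ave(seq_len(nrow(stays)), stays$ID, stays$Date, FUN = seq_along)

# Long to wide: one row per ID/Date, one room_<index> column per location.
wide <- reshape(stays, idvar = c("ID", "Date"), timevar = "index",
                direction = "wide", sep = "_")
```

Days with fewer locations than the maximum get NA in the unused columns, as in the output above.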