Sorry, I'm new to R and would really appreciate any help with this. I am trying to merge the following two data frames (labourproductivity and Depressiondframe) by time:
Time LabourProductivity
1 2004 Q1 96.6
2 Q2 96.9
3 Q3 96.9
4 Q4 97.1
5 2005 Q1 97.6
6 Q2 99.0
and
Time DepressionCount
1 2004 875
2 2004.25 820
3 2004.5 785
4 2004.75 857
5 2005 844
6 2005.25 841
Because their Time values are in different formats, I don't know how to merge them. Ideally the result would look like:
Time DepressionCount LabourProductivity
1 2004 875 96.6
2 2004 820 96.9
3 2004 785 96.9
4 2004 857 97.1
5 2005 844 97.6
6 2005 841 99.0
Answer 0 (score: 1)
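If 'df1' and 'df2' are the first and second datasets, create a grouping index ('indx') from the 'Time' column of 'df1'. Then use ave and as.yearqtr (from the zoo package) to prepend the missing year to each quarter and convert 'Time' to numeric: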
library(zoo)
indx <- cumsum(grepl('^\\d+', df1$Time))
df1$Time <- with(df1, as.numeric(ave(Time, indx, FUN= function(x) {
x[-1] <- paste (sub(' .*', '', x[1]), x[-1])
as.yearqtr(x) })))
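For readers who want to see what the grouping index and the conversion are doing, here is a minimal base-R sketch of the same idea without zoo. It assumes every group starts with a row of the form "YYYY Qn" and that later rows look like "Qn"; the names time_raw, grp, year, quarter and time_num are illustrative only and not part of the answer above.

```r
# Sample 'Time' values copied from the question; only the first quarter
# of each year carries the year itself
time_raw <- c("2004 Q1", "Q2", "Q3", "Q4", "2005 Q1", "Q2")

# Rows starting with digits open a new year; cumsum() turns them into group ids
grp <- cumsum(grepl("^\\d+", time_raw))

# Take the year from the first row of each group and repeat it within the group
year <- as.numeric(sub(" .*", "", time_raw[match(grp, grp)]))

# The quarter number is whatever follows the "Q"
quarter <- as.numeric(sub(".*Q", "", time_raw))

# Numeric encoding year + (quarter - 1) / 4, which lines up with df2$Time
time_num <- year + (quarter - 1) / 4
print(time_num)
```

The resulting values use the same year-plus-quarter-fraction encoding as the 'Time' column of 'df2', which is what makes the merge possible.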
Then merge the datasets and, if needed, transform the 'Time' column to keep only the year:
transform(merge(df1, df2), Time=trunc(Time))
# Time LabourProductivity DepressionCount
#1 2004 96.6 875
#2 2004 96.9 820
#3 2004 96.9 785
#4 2004 97.1 857
#5 2005 97.6 844
#6 2005 99.0 841
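Since merge() joins on the shared column name by default, it can help to spell out the key and check that no rows were left unmatched before truncating. The following is only a sketch under the assumption that 'Time' in 'df1' has already been converted to the numeric form; the variables merged and unmatched and the extra Year column are illustrative additions, not part of the answer above.

```r
# df1 after its 'Time' column has been converted to numeric (assumed here)
df1 <- data.frame(Time = c(2004, 2004.25, 2004.5, 2004.75, 2005, 2005.25),
                  LabourProductivity = c(96.6, 96.9, 96.9, 97.1, 97.6, 99.0))
df2 <- data.frame(Time = c(2004, 2004.25, 2004.5, 2004.75, 2005, 2005.25),
                  DepressionCount = c(875L, 820L, 785L, 857L, 844L, 841L))

# Name the key explicitly; all = TRUE keeps rows that have no partner
merged <- merge(df1, df2, by = "Time", all = TRUE)

# Any row with an NA failed to match on 'Time'
unmatched <- merged[!complete.cases(merged), ]

# Keep the quarterly 'Time' and store the year in a separate column
merged$Year <- trunc(merged$Time)
print(merged)
```

Keeping the year in its own column preserves the quarter information, which is lost when 'Time' itself is overwritten with trunc(Time).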
Or using data.table:
library(data.table)
setDT(df1)[, TimeN:= as.numeric(as.yearqtr(c(Time[1L],
paste(sub(' .*', '', Time[1L]), Time[-1L])))),
list(Grp=cumsum(grepl('^\\d+', Time)))][,
Time:= TimeN][, TimeN:=NULL][]
setkey(df1, Time)[df2][, Time:=trunc(Time)][]
# Time LabourProductivity DepressionCount
#1: 2004 96.6 875
#2: 2004 96.9 820
#3: 2004 96.9 785
#4: 2004 97.1 857
#5: 2005 97.6 844
#6: 2005 99.0 841
Data:
df1 <- structure(list(Time = c("2004 Q1", "Q2", "Q3", "Q4", "2005 Q1",
"Q2"), LabourProductivity = c(96.6, 96.9, 96.9, 97.1, 97.6, 99
)), .Names = c("Time", "LabourProductivity"), class = "data.frame",
row.names = c("1", "2", "3", "4", "5", "6"))
df2 <- structure(list(Time = c(2004, 2004.25, 2004.5, 2004.75, 2005,
2005.25), DepressionCount = c(875L, 820L, 785L, 857L, 844L, 841L
)), .Names = c("Time", "DepressionCount"), class = "data.frame",
row.names = c("1", "2", "3", "4", "5", "6"))