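I have a large data.frame (TOTAL) containing some values (cols11-16) from which I need to subtract a base multiplied by a value that depends on two conditions in TOTAL.
The data.frame (TOTAL) looks a bit like this: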
Channel Hour Category cols11 cols12 cols13 cols14 cols15 base
TV1 01:00:00 New 2 5 4 5 6 2.4
TV5 23:00:00 Old 1 5 3 9 7 1.8
TV1 02:00:00 New 8 7 9 2 4 5.4
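There are 4 different channels and 24 different hours (00:00:00-23:00:00).
I have four other vectors of conditional variables, one per channel, by which the base must be multiplied depending on the hour and the channel. For each channel the vector looks like this: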
TV1Slope:
TV1Slope00 TV1Slope01 TV1Slope02.. TV1Slope23
0.0012 0.0015 0.013 0.0009
TV5Slope:
TV5Slope00 TV5Slope01 TV5Slope02.. TV5Slope23
0.0032 0.0023 0.016 0.002
TOTAL$Uplift0 <- (TOTAL$cols11 - TOTAL$base * conditionedvariable)
TOTAL$Uplift1 <- (TOTAL$cols12 - TOTAL$base * conditionedvariable)
TOTAL$Uplift2 <- (TOTAL$cols13 - TOTAL$base * conditionedvariable)
TOTAL$Uplift3 <- (TOTAL$cols14 - TOTAL$base * conditionedvariable)
TOTAL$Uplift4 <- (TOTAL$cols15 - TOTAL$base * conditionedvariable)
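How can I make R pick the conditional variable according to these conditions?
For example, for TOTAL$Uplift0 I would get:
cols11 - base * conditionedvariable
For the first row, where Channel is TV1 and Hour is 01:00:00: 2 - 2.4 * 0.0015
For the second row, where Channel is TV5 and Hour is 23:00:00: 1 - 1.8 * 0.002
For the third row, where Channel is TV1 and Hour is 02:00:00: 8 - 5.4 * 0.013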
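As a quick check of the arithmetic above, the three expected Uplift0 values can be computed directly. This is only a sketch using the numbers quoted in the question; the variable names are made up for illustration. The results come out as 1.9964, 0.9964 and 7.9298.

```r
# Numbers copied from the three sample rows in the question
cols11 <- c(2, 1, 8)
base   <- c(2.4, 1.8, 5.4)
# Slopes picked by hand: TV1Slope01, TV5Slope23, TV1Slope02
slope  <- c(0.0015, 0.002, 0.013)

# Uplift0 = cols11 - base * slope, computed element by element
expected <- cols11 - base * slope
print(expected)
```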
Answer 0 (score: 1)
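We paste the 'Channel' and 'Hour' columns together ('nm1'), keeping only the two-digit hour taken with substr, and concatenate the 'TV1Slope' and 'TV5Slope' vectors ('TV15'). We then match the 'nm1' vector against the names of 'TV15', with the 'Slope' substring removed using sub, and get the corresponding 'TV15' values ('val'). Finally, we subset the columns whose names start with 'cols' using grep, do the calculation, and assign the result to new columns ('nm2').
Note: 'TV1Slope' and 'TV5Slope' were created as reproducible examples, so the Uplift values printed below come from those example vectors and not from the slopes listed in the question.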
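The answer's example inputs are not shown, so here is one possible way to set them up before running the code that follows. It is a sketch, not the answer's original data: the rows and slope values are copied from the question, and only the four slopes per channel that the question actually lists are included (the hours it elides are left out). With these inputs the Uplift columns will differ from the printed output further down.

```r
# Sample rows copied from the question (strings kept as character, not factor)
TOTAL <- data.frame(
  Channel  = c("TV1", "TV5", "TV1"),
  Hour     = c("01:00:00", "23:00:00", "02:00:00"),
  Category = c("New", "Old", "New"),
  cols11 = c(2, 1, 8), cols12 = c(5, 5, 7), cols13 = c(4, 3, 9),
  cols14 = c(5, 9, 2), cols15 = c(6, 7, 4),
  base   = c(2.4, 1.8, 5.4),
  stringsAsFactors = FALSE
)

# Named slope vectors; only the entries shown in the question are filled in
TV1Slope <- c(TV1Slope00 = 0.0012, TV1Slope01 = 0.0015,
              TV1Slope02 = 0.013,  TV1Slope23 = 0.0009)
TV5Slope <- c(TV5Slope00 = 0.0032, TV5Slope01 = 0.0023,
              TV5Slope02 = 0.016,  TV5Slope23 = 0.002)
```

The names matter here: the lookup relies on each element being named as channel, then 'Slope', then the two-digit hour.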
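# Build lookup keys such as "TV101" from the channel and the two-digit hour
nm1 <- with(TOTAL, paste0(Channel, substr(Hour, 1, 2)))
# Combine the per-channel slope vectors into one named vector
TV15 <- c(TV1Slope, TV5Slope)
# Drop 'Slope' from the names ("TV1Slope01" -> "TV101") and look up each row's slope
val <- TV15[match(nm1, sub('Slope', '', names(TV15)))]
# Positions of the value columns (names starting with 'cols')
indx <- grep('^cols', names(TOTAL))
nm2 <- paste0('Uplift', seq_along(indx) - 1)
# Subtract base * slope from every value column; store as Uplift0, Uplift1, ...
TOTAL[nm2] <- TOTAL[indx] - (TOTAL$base * val)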
TOTAL
# Channel Hour Category cols11 cols12 cols13 cols14 cols15 base Uplift0
#1 TV1 01:00:00 New 2 5 4 5 6 2.4 1.9946026
#2 TV5 23:00:00 Old 1 5 3 9 7 1.8 0.9823184
#3 TV1 02:00:00 New 8 7 9 2 4 5.4 7.9619720
# Uplift1 Uplift2 Uplift3 Uplift4
#1 4.994603 3.994603 4.994603 5.994603
#2 4.982318 2.982318 8.982318 6.982318
#3 6.961972 8.961972 1.961972 3.961972
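A different way to do the same lookup, offered only as a sketch and not as part of the answer above, is to hold the slopes in a channel-by-hour matrix and index it with a two-column character matrix of (channel, hour) pairs. The `slopes` matrix and `key` names are hypothetical, and only the slope values shown in the question are filled in.

```r
# Hypothetical inputs: the three sample rows, reduced to the columns needed here
TOTAL <- data.frame(
  Channel = c("TV1", "TV5", "TV1"),
  Hour    = c("01:00:00", "23:00:00", "02:00:00"),
  cols11  = c(2, 1, 8),
  base    = c(2.4, 1.8, 5.4),
  stringsAsFactors = FALSE
)

# Slopes as a matrix: one row per channel, one column per (listed) hour
slopes <- rbind(TV1 = c(0.0012, 0.0015, 0.013, 0.0009),
                TV5 = c(0.0032, 0.0023, 0.016, 0.002))
colnames(slopes) <- c("00", "01", "02", "23")

# A two-column character matrix selects one cell per row by dimnames
key <- cbind(TOTAL$Channel, substr(TOTAL$Hour, 1, 2))
val <- slopes[key]

TOTAL$Uplift0 <- TOTAL$cols11 - TOTAL$base * val
```

This avoids building and parsing names like 'TV1Slope01', at the cost of reshaping the four slope vectors into a matrix first.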