Arrange a data frame by a sequence in one column, dropping the unneeded rows

Time: 2018-07-19 11:27:44

Tags: r dataframe dplyr rstudio sequence

I have this data:

       date                                           signal 
1   2009-01-13 09:55:00  4645.00  4838.931  5358.883  Buy2
2   2009-01-14 09:55:00  4767.50  4718.254  5336.703  Buy1
3   2009-01-15 09:55:00  4485.00  4653.316  5274.384  Buy2
4   2009-01-16 09:55:00  4580.00  4537.693  5141.435  Buy1
5   2009-01-19 09:55:00  4532.00  4548.088  4891.041  Buy2
6   2009-01-27 09:55:00  4190.00  4183.503  4548.497  Buy1
7   2009-01-30 09:55:00  4436.00  4155.236  4377.907 Sell1
8   2009-02-02 09:55:00  4217.00  4152.626  4390.802 Sell2
9   2009-02-09 09:55:00  4469.00  4203.437  4376.277 Sell1
10  2009-02-12 09:55:00  4469.90  4220.845  4503.798 Sell2
11  2009-02-13 09:55:00  4553.00  4261.980  4529.777 Sell1
12  2009-02-16 09:55:00  4347.20  4319.656  4564.387 Sell2
13  2009-02-17 09:55:00  4161.05  4371.474  4548.912  Buy2
14  2009-02-27 09:55:00  3875.55  3862.085  4101.929  Buy1
15  2009-03-02 09:55:00  3636.00  3846.423  4036.020  Buy2
16  2009-03-12 09:55:00  3420.00  3372.665  3734.949  Buy1
17  2009-03-13 09:55:00  3656.00  3372.100  3605.357 Sell1
18  2009-03-17 09:55:00  3650.00  3360.421  3663.322 Sell2
19  2009-03-18 09:55:00  3721.00  3363.735  3682.293 Sell1
20  2009-03-20 09:55:00  3687.00  3440.651  3784.778 Sell2

and it has to be arranged in this form:

2   2009-01-14 09:55:00  4767.50  4718.254  5336.703  Buy1
7   2009-01-30 09:55:00  4436.00  4155.236  4377.907 Sell1
8   2009-02-02 09:55:00  4217.00  4152.626  4390.802 Sell2
13  2009-02-17 09:55:00  4161.05  4371.474  4548.912  Buy2
14  2009-02-27 09:55:00  3875.55  3862.085  4101.929  Buy1
17  2009-03-13 09:55:00  3656.00  3372.100  3605.357 Sell1
18  2009-03-17 09:55:00  3650.00  3360.421  3663.322 Sell2

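That is, the rows should follow the repeating order Buy1, Sell1, Sell2, Buy2, and the observations in between should be dropped. I have tried several dplyr::filter commands, but none of them gives the desired output.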

3 Answers:

Answer 0: (score: 0)

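If I understand your question correctly, the following code should solve it. It is adapted from this discussion.

The idea is to define the sequence as a pattern: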

pattern <- c("Buy1", "Sell1", "Sell2", "Buy2")

Then find the positions in your column where that pattern starts:

library(zoo)
pos <- which(rollapply(data$signal, 4, identical, pattern, fill = FALSE, align = "left"))

And extract the rows covered by the pattern, starting at each position:

rows <- unlist(lapply(pos, function(x, n) seq(x, x+n-1), 4))
data_filtered <- data[rows,]

Voilà。
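To see what a windowed match like this does on the data from the question, here is a minimal base-R sketch. It is a stand-in for the rollapply call (the vapply scan is my own substitute and needs no zoo), and the signal vector is copied from the question. It only reports places where the four values sit on four consecutive rows.

```r
# Signal column copied from the question (rows 1 to 20).
signal <- c("Buy2", "Buy1", "Buy2", "Buy1", "Buy2", "Buy1", "Sell1", "Sell2",
            "Sell1", "Sell2", "Sell1", "Sell2", "Buy2", "Buy1", "Buy2", "Buy1",
            "Sell1", "Sell2", "Sell1", "Sell2")
pattern <- c("Buy1", "Sell1", "Sell2", "Buy2")
n <- length(pattern)

# Candidate start rows: every row that still has n - 1 rows after it.
starts <- seq_len(length(signal) - n + 1)

# TRUE where the n consecutive values starting at row i equal the pattern.
is_match <- vapply(starts,
                   function(i) identical(signal[i:(i + n - 1)], pattern),
                   logical(1))
pos <- starts[is_match]
length(pos)
```

On this particular data pos is empty, because the four values never appear on consecutive rows, so a contiguous match alone cannot produce the requested output.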

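Edit

Since I misunderstood your question, here is a new solution. You want to retrieve the sequence "Buy1", "Sell1", "Sell2", "Buy2" from the column and drop the observations that do not fit it. I don't see a simple vectorized solution, so here is a loop that solves the problem. Depending on the size of your data, you may want to implement a similar algorithm in Rcpp or vectorize it in some way.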

sequence <- c("Buy1", "Sell1", "Sell2", "Buy2")
keep <- logical(length(data$signal))

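# s is the 0-based index of the sequence element we are waiting for
s <- 0
for (i in seq_along(data$signal)){
    if (sequence[s + 1] == data$signal[i]){
        # the row matches the expected element: keep it and advance the cycle
        keep[i] <- TRUE
        s <- (s + 1) %% 4
    } else {
        keep[i] <- FALSE
    }
}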

data_filtered <- data[keep,]

Let me know if this works better. If anyone has a vectorized solution, I would be curious to see it.
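As a quick check of the loop, here is a self-contained sketch that wraps it in a function and runs it on the signal column copied from the question. The name keep_in_cycle is hypothetical, chosen only for this illustration.

```r
# keep_in_cycle is a hypothetical helper name, not part of any package.
# It marks the rows that match the next expected element of the cycle.
keep_in_cycle <- function(signal, cycle) {
  keep <- logical(length(signal))
  s <- 0  # 0-based index of the cycle element we are waiting for
  for (i in seq_along(signal)) {
    if (signal[i] == cycle[s + 1]) {
      keep[i] <- TRUE
      s <- (s + 1) %% length(cycle)
    }
  }
  keep
}

# Signal column copied from the question (rows 1 to 20).
signal <- c("Buy2", "Buy1", "Buy2", "Buy1", "Buy2", "Buy1", "Sell1", "Sell2",
            "Sell1", "Sell2", "Sell1", "Sell2", "Buy2", "Buy1", "Buy2", "Buy1",
            "Sell1", "Sell2", "Sell1", "Sell2")

keep <- keep_in_cycle(signal, c("Buy1", "Sell1", "Sell2", "Buy2"))
which(keep)
# 2 7 8 13 14 17 18
```

Those are the row numbers shown in the desired output of the question, so data[keep, ] should give the requested rows.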

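Answer 1: (score: 0)

You can convert the column data$signal to a factor and define its levels.

data$signal <- factor(data$signal, levels = c("Buy1","Sell1","Buy2","Sell2"))

Then you can sort by it:

sorted.data <- data[order(data$signal),]

Here is a good answer that explains what you want to do: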

Sort data frame column by factor
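For reference, here is a small sketch of what the factor-based sort does on the signal column from the question. The id column is hypothetical and only there to make a two-column data frame. As far as I can tell, sorting groups the rows by level but keeps all of them, so it reorders rather than filters.

```r
# Signal column copied from the question (rows 1 to 20).
signal <- c("Buy2", "Buy1", "Buy2", "Buy1", "Buy2", "Buy1", "Sell1", "Sell2",
            "Sell1", "Sell2", "Sell1", "Sell2", "Buy2", "Buy1", "Buy2", "Buy1",
            "Sell1", "Sell2", "Sell1", "Sell2")

# id is a hypothetical stand-in for the original row number.
data <- data.frame(id = seq_along(signal), signal = signal)

# Fix the level order, then sort the rows by that order.
data$signal <- factor(data$signal, levels = c("Buy1", "Sell1", "Buy2", "Sell2"))
sorted.data <- data[order(data$signal), ]

nrow(sorted.data)                         # still 20 rows, nothing is dropped
as.character(unique(sorted.data$signal))  # levels appear in the chosen order
```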

Answer 2: (score: 0)

Here is an Rcpp solution:

library(Rcpp)
cppFunction('LogicalVector FindHit(const CharacterVector x, const CharacterVector y) {
  LogicalVector res(x.size());
  int k = 0;
  for(int i = 0; i < x.size(); i++){
    if(x[i] == y[k]){
      res[i] = true;
      k = (k + 1) % y.size();
    }
  }
  return res;
}')
dtt[FindHit(dtt$V6, c('Buy1', 'Sell1', 'Sell2', 'Buy2')),]
#            V1       V2      V3       V4       V5    V6
# 2  2009-01-14 09:55:00 4767.50 4718.254 5336.703  Buy1
# 7  2009-01-30 09:55:00 4436.00 4155.236 4377.907 Sell1
# 8  2009-02-02 09:55:00 4217.00 4152.626 4390.802 Sell2
# 13 2009-02-17 09:55:00 4161.05 4371.474 4548.912  Buy2
# 14 2009-02-27 09:55:00 3875.55 3862.085 4101.929  Buy1
# 17 2009-03-13 09:55:00 3656.00 3372.100 3605.357 Sell1
# 18 2009-03-17 09:55:00 3650.00 3360.421 3663.322 Sell2

Here dtt is the data frame holding the data from the question, with the signal in column V6.