R:将每一行与前一行进行比较并保持得分

时间:2016-03-03 12:27:41

标签: r machine-learning quantitative-finance

我的data.frame看起来像这样:

> head(df,10)
              DateTime   BP1 BQ1   BP2 BQ2   BP3 BQ3   BP4 BQ4   BP5 BQ5
1  2015-09-16 09:15:01 70730   1     0   0     0   0     0   0     0   0
2  2015-09-16 09:15:01 70735   1 70730   1 70285   1 70185   1     0   0
3  2015-09-16 09:15:01 70905   1 70735   3 70730   1     0   0     0   0
4  2015-09-16 09:15:01 70910   1 70905   1 70735   2 70730   1     0   0
5  2015-09-16 09:15:03 70905   1 70900   1 70730   1 70230   1 70220   1
6  2015-09-16 09:15:06 70910   1 70905   2 70900   1 70795   1 70730   1
7  2015-09-16 09:15:06 70905   2 70900   1 70795   1 70730   1 70220   1
8  2015-09-16 09:15:06 70910   1 70905   2 70900   1 70795   1 70730   1
9  2015-09-16 09:15:07 70915   1 70910   1 70905   1 70900   1 70795   1
10 2015-09-16 09:15:07 71000   1 70915   1 70905   1 70785   1 70730   1

BP = BidPrice和BQ = BidQty

我的目标是将每一行与前一行进行比较,看看是否有任何修改/添加/取消,并根据它给它一个分数并添加它们。

就像你可以看到df [1,2] = df [2,4]这样,因为价格下降了1级,所以会得分为-1。

然后,df [4,4] = df [5,2]因此,由于价格上涨一个等级,因此会得到+1的分数。

依旧...... 对每一行执行此操作,然后对该行的分数求和。因此输出应该是一个矢量或具有正整数和负整数的data.frame。

到目前为止我所拥有的东西很少,而且我已经绞尽脑汁待了好几个小时,而且还没有到任何地方。我想我最终得到了一个永无止境的循环:我的代码如下:

mod<- function(file, level = 5){
  whole_data<- read.csv(file = file,header = FALSE,sep = "", col.names = c("DateTime","Seq","BP1","BQ1","BO1","AP1","AQ1","AO1","BP2","BQ2","BO2","AP2","AQ2","AO2","BP3","BQ3","BO3","AP3","AQ3","AO3","BP4","BQ4","BO4","AP4","AQ4","AO4","BP5","BQ5","BO5","AP5","AQ5","AO5","BP6","BQ6","BO6","AP6","AQ6","AO6","BP7","BQ7","BO7","AP7","AQ7","AO7","BP8","BQ8","BO8","AP8","AQ8","AO8","BP9","BQ9","BO9","AP9","AQ9","AO9","BP10","BQ10","BO10","AP10","AQ10","AO10"), colClasses = c(NA, rep("integer",31), rep("NULL", 30)))
  whole_data<- whole_data[which(whole_data$DateTime != 0),]
  whole_data$DateTime= as.POSIXct(whole_data$DateTime/(10^9), origin="1970-01-01")    #timestamp conversion 
  df<- data.frame(DateTime= whole_data$DateTime, BP1 = whole_data$BP1, BQ1=whole_data$BQ1, BP2 = whole_data$BP2, BQ2=whole_data$BQ2, BP3 = whole_data$BP3, BQ3=whole_data$BQ3, BP4 = whole_data$BP4, BQ4=whole_data$BQ4, BP5 = whole_data$BP5, BQ5=whole_data$BQ5)
  v<- NULL
for(i in 1:nrow(df)){
  for(j in seq_along(c(2,4,6,8,10)))
  if(level == 5){
    x <- df[i+1,j]==df[i,c(2,4,6,8,10)]
    y<- which(x==TRUE)
    v[i]<- c(v,(y-(j-(j-1))*-1))
    i=i+1
  }
  v
}
}

对于具有定量财务背景的任何人,我试图通过捕获订单修改,取消和新订单来分析电子订单。这只是我尝试做的许多事情的第一步,但任何指针/评论/帮助我肯定会走很长的路,我将不胜感激。

最快的方法是什么?对于循环?我真的迷失了我应该如何去做!任何能指向正确方向的东西都将不胜感激。

如果需要任何其他信息,请告诉我。

0 个答案:

没有答案