Analyzing multiple .csv files imported into R

Time: 2015-12-05 20:01:15

Tags: r csv

I imported 3 sets of data into R, each from a .csv file, using:
MyData <- read.csv(file="C:/120315.csv", header=TRUE, sep=",")
MyData2 <- read.csv(file="C:/120415.csv", header=TRUE, sep=",")
MyData3 <- read.csv(file="C:/120515.csv", header=TRUE, sep=",")

The raw data in the .csv files is formatted as follows. "Last" is the closing market price of each stock.

"Stock","Open","High","Low","Last","Vol"
"ABCD",".490","8.550","8.350","8.350","101,500"
"ASDFG","11.800","11.800","11.570","11.700","110,900"
"XCVXCV","22.430","22.600","22.340","22.600","9,314,100"
"BCVBCVB","4.380","4.390","4.380","4.390","122,000"
"FSDFSDF","8.850","8.850","8.850","8.850","200"
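The `Vol` values contain thousands separators (`"101,500"`), so `read.csv` reads that column as text rather than as numbers, while the price columns parse as numeric. A minimal sketch of converting `Vol`, assuming the layout shown above (the inline `text=` data is just the first two sample rows):

```r
# Two sample rows in the same layout as the files above
df <- read.csv(text = '"Stock","Open","High","Low","Last","Vol"
"ABCD",".490","8.550","8.350","8.350","101,500"
"ASDFG","11.800","11.800","11.570","11.700","110,900"', stringsAsFactors = FALSE)

# "Last" parses as numeric; "Vol" stays character because of the commas
is.numeric(df$Last)
is.character(df$Vol)

# Strip the thousands separators, then convert to numbers
df$Vol <- as.numeric(gsub(",", "", df$Vol))
df$Vol
```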

How can I use R to analyze these 3 .csv files and list the stocks whose price increased on 2 consecutive days?

That is, "Last" increases on 2 consecutive days (e.g. day 1: 5.5, day 2: 5.8, day 3: 5.9).
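This condition can be expressed with `diff()`: a stock qualifies when two successive day-over-day changes in `Last` are both positive. A sketch using the example prices from the question:

```r
# Closing prices ("Last") on day 1, 2 and 3 from the example above
last <- c(5.5, 5.8, 5.9)

# Day-over-day changes: roughly 0.3 and 0.1 (subject to floating-point error)
changes <- diff(last)

# Did the price rise on both consecutive days?
all(changes > 0)
```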

2 answers:

Answer 0 (score: 0)

You could do it like this:

mydata1 <- read.csv(header=T, text='"Stock","Open","High","Low","Last","Vol"
"ABCD",".490","8.550","8.350","8.350","101,500"')
mydata2 <- read.csv(header=T, text='"Stock","Open","High","Low","Last","Vol"
"ABCD",".490","8.550","8.350","9.350","101,500"')
mydata3 <- read.csv(header=T, text='"Stock","Open","High","Low","Last","Vol"
"ABCD",".490","8.550","8.350","10.350","101,500"')
mydata4 <- read.csv(header=T, text='"Stock","Open","High","Low","Last","Vol"
"ABCD",".490","8.550","8.350","1.350","101,500"')

(mydata <- do.call(rbind, mget(grep("^mydata\\d+", ls(), value = TRUE))))
#         Stock Open High  Low  Last     Vol
# mydata1  ABCD 0.49 8.55 8.35  8.35 101,500
# mydata2  ABCD 0.49 8.55 8.35  9.35 101,500
# mydata3  ABCD 0.49 8.55 8.35 10.35 101,500
# mydata4  ABCD 0.49 8.55 8.35  1.35 101,500
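One caveat: `ls()` sorts names as text, so a tenth data frame named `mydata10` would land between `mydata1` and `mydata2`, and `diff()` would then compare prices out of order. A sketch that records the day explicitly and sorts on it instead; the `day` column and the sample values are illustrative assumptions, not part of the answer above:

```r
# Single-row sample data frames, one per day, as in the answer above
mydata1 <- data.frame(Stock = "ABCD", Last = 8.35)
mydata2 <- data.frame(Stock = "ABCD", Last = 9.35)
mydata3 <- data.frame(Stock = "ABCD", Last = 10.35)

# Put the data frames in an explicit, ordered list and tag each with its day index
days <- list(mydata1, mydata2, mydata3)
tagged <- Map(function(df, i) { df$day <- i; df }, days, seq_along(days))

# Combine, then order by stock and day so diff() sees prices in time order
mydata <- do.call(rbind, tagged)
mydata <- mydata[order(mydata$Stock, mydata$day), ]
mydata
```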

lapply(split(mydata, mydata$Stock), function(df) {
  with(rle(diff(df$Last) > 0), any(lengths[values==TRUE] >= 2)) # increased 2 consecutive days?
})
# $ABCD
# [1] TRUE
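The `rle()` call does the real work here: it collapses `diff(df$Last) > 0` into runs, and a run of `TRUE` of length 2 or more means the price rose on at least two consecutive days. A sketch wrapping that logic in a small function so it can be checked on plain vectors (the name `rose_consecutively` and its `n` parameter are hypothetical; the input is assumed to be in date order with no missing prices):

```r
# TRUE if x increases on at least n consecutive steps
# (assumes x is sorted by date and contains no NA)
rose_consecutively <- function(x, n = 2) {
  runs <- rle(diff(x) > 0)
  any(runs$lengths[runs$values] >= n)
}

rose_consecutively(c(8.35, 9.35, 10.35, 1.35))  # up, up, down -> TRUE
rose_consecutively(c(5, 4, 8))                  # down, up     -> FALSE
rose_consecutively(c(5.5, 5.8, 5.9))            # up, up       -> TRUE
```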

Answer 1 (score: 0)

Using dplyr, here is a tidy approach:
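library(dplyr)

# All .csv files in the working directory, e.g. "120315.csv", "120415.csv", "120515.csv"
file_names <- list.files(pattern = "\\.csv$")

# Read one file and tag every row with the trading day parsed from the file name (mmddyy)
read_file <- function(file) {
  df <- read.csv(file, stringsAsFactors = FALSE)
  df$day <- as.Date(basename(file), format = "%m%d%y")  # trailing ".csv" is ignored
  df
}

file_names %>% 
  lapply(read_file) %>%
  bind_rows() %>%        # rbind_all() is deprecated in current dplyr
  group_by(Stock) %>% 
  mutate(
    # Last rose from day t-2 to t-1 AND from day t-1 to t
    two_in_a_row = Last > lag(Last, 1, order_by = day) &
      lag(Last, 1, order_by = day) > lag(Last, 2, order_by = day)
  ) %>%
  filter(two_in_a_row)   # keep only the stocks/days that qualify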
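Checking only that today's `Last` beats both of the previous two days is not the same as two consecutive rises: a drop followed by a large jump also passes. A base-R sketch comparing the two conditions on such a counterexample (the helper `lag_n` is hypothetical, standing in for `dplyr::lag` on a vector already sorted by day):

```r
# Shift a vector down by n positions, padding with NA (like dplyr::lag)
lag_n <- function(x, n) c(rep(NA, n), head(x, -n))

last <- c(5, 4, 8)  # down on day 2, up on day 3

# Condition 1: today's price beats both of the previous two days
beats_both <- last > lag_n(last, 1) & last > lag_n(last, 2)

# Condition 2: the price rose on each of the last two days
two_rises <- last > lag_n(last, 1) & lag_n(last, 1) > lag_n(last, 2)

beats_both[3]  # TRUE, even though day 2 was a drop
two_rises[3]   # FALSE
```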