Question

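I have daily data in a CSV file starting in 1980, but I only want to read the data from 1985 onward, because the other dataset, in a different file, starts in 1985. How can I skip the pre-1985 data in R?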

Answer 1

You probably want to look at ?read.csv to see all the options.

Without seeing a sample of your data, it's hard to give an exact answer.

If your data has no header and you know on which row the 1985 data starts, you can use something like...

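# header = FALSE: read.csv assumes a header by default, which would turn the first 1985 row into column names
impordata <- read.csv(file, header = FALSE, skip = 1825)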

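...to skip the first 1825 lines. (Check that count: if the file has one row per day starting on 1 January 1980, the years 1980 to 1984 contain 1827 days, because 1980 and 1984 are leap years.)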
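Rather than counting rows by hand, you can let R work out the offset from the dates. This is a minimal sketch that assumes the file has exactly one row per day starting on 1980-01-01, no header and no missing days; the temporary test file is made up for illustration.

```r
# Assumption: one row per day, first row is 1980-01-01, no header, no gaps
first_day <- as.Date("1980-01-01")
start_day <- as.Date("1985-01-01")

# Number of rows before 1985; the leap days 1980-02-29 and 1984-02-29 are included
n_skip <- as.integer(start_day - first_day)
print(n_skip)
# → [1] 1827

# Made-up headerless test file covering 1980-1989, one row per day
dd <- seq(first_day, as.Date("1989-12-31"), by = "day")
tmp <- tempfile(fileext = ".csv")
write.table(data.frame(dd, seq_along(dd)), tmp, sep = ",",
            row.names = FALSE, col.names = FALSE)

# header = FALSE so the first 1985 row is kept as data
impordata <- read.csv(tmp, header = FALSE, skip = n_skip)
head(impordata, 2)
```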

Otherwise, if your data includes a year variable, just subset the data after importing it:

impordata <- read.csv("skiplines.csv")
impordata <- subset(impordata,year>=1985)
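If the file has a date column instead of a year column, you can derive the year first and then subset the same way. A sketch, assuming a character column called date in yyyy-mm-dd format (the sample rows are made up); for other formats pass format= to as.Date.

```r
# Made-up sample with a yyyy-mm-dd "date" column and no year column
impordata <- data.frame(
  date  = c("1984-12-30", "1984-12-31", "1985-01-01", "1985-01-02"),
  value = 1:4,
  stringsAsFactors = FALSE
)

# Derive a numeric year from the date, then subset as above
impordata$year <- as.integer(format(as.Date(impordata$date), "%Y"))
impordata <- subset(impordata, year >= 1985)
impordata
```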

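If you don't know where the 1985 data starts, you can use grep to find the first instance of 1985 in the file's date variable and keep only the rows from there onward: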

impordata <- read.csv("skiplines.csv")
impordata <- impordata[min(grep(1985,impordata$date)):nrow(impordata),]
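A variant of the grep idea that compares real dates instead of searching for the text 1985. It assumes the rows are sorted by date and the date column is in yyyy-mm-dd form. Unlike the grep version, it still works when there happen to be no 1985 rows at all (the made-up data below has such a gap), and it stops with a clear message if nothing is on or after the cutoff.

```r
# Made-up sample: sorted by date, with no rows at all in 1985
impordata <- data.frame(
  date  = c("1984-12-31", "1986-01-02", "1986-01-03"),
  value = c(10, 20, 30),
  stringsAsFactors = FALSE
)

# Position of the first row dated on or after the cutoff (NA if there is none)
start <- match(TRUE, as.Date(impordata$date) >= as.Date("1985-01-01"))
if (is.na(start)) stop("no rows on or after 1985-01-01")

# Keep that row and everything after it
impordata <- impordata[start:nrow(impordata), ]
impordata
```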

Answer 2

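Here are some alternatives. (Afterwards you may want to convert the first column to "Date" class, and possibly convert the whole thing to a zoo object or another time-series class.)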

# create test data
fn <- tempfile()
dd <- seq(as.Date("1980-01-01"), as.Date("1989-12-31"), by = "day")
DF <- data.frame(Date = dd, Value = seq_along(dd))
write.table(DF, file = fn, row.names = FALSE)

read.table + subset

# if file is small enough to fit in memory try this:

DF2 <- read.table(fn, header = TRUE, as.is = TRUE)
DF2 <- subset(DF2, Date >= "1985-01-01")
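The string comparison above works because yyyy-mm-dd strings sort in date order. If you follow the earlier suggestion and convert the column to Date class first, the comparison no longer depends on the text format. A small sketch with made-up rows:

```r
# Made-up rows in the same layout as the test data (character Date column)
DF2 <- data.frame(Date = c("1984-12-31", "1985-01-01", "1985-06-30"),
                  Value = 1:3, stringsAsFactors = FALSE)

# Convert to Date class, then compare Date with Date rather than string with string
DF2$Date <- as.Date(DF2$Date)
DF2 <- subset(DF2, Date >= as.Date("1985-01-01"))
DF2
```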

read.zoo

# or this which produces a zoo object and also automatically converts the 
# Date column to Date class.  Note that all columns other than the Date column
# should be numeric for it to be representable as a zoo object.
library(zoo)
z <- read.zoo(fn, header = TRUE)
zw <- window(z, start = "1985-01-01")

If your data is formatted differently from the example, you will need to pass additional arguments to read.zoo.
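For example, for a comma-separated file whose dates are written as day/month/year, something like the following should work. The file contents and its name fn2 are made up for illustration; sep is passed through to read.table, and format is read.zoo's own argument for parsing the index column.

```r
library(zoo)

# Made-up file: comma-separated, with a header, dates written as dd/mm/yyyy
fn2 <- tempfile(fileext = ".csv")
writeLines(c("Date,Value",
             "31/12/1984,1",
             "01/01/1985,2",
             "02/01/1985,3"), fn2)

# header and sep go to read.table; format tells read.zoo how to parse
# the first (index) column into Date class
z2 <- read.zoo(fn2, header = TRUE, sep = ",", format = "%d/%m/%Y")
zw2 <- window(z2, start = as.Date("1985-01-01"))
zw2
```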

Multiple read.table calls

# if the data is very large read 1st row (DF.row1) and 1st column (DF.Date)
# and use those to set col.names= and skip=

DF.row1 <- read.table(fn, header = TRUE, nrows = 1)
nc <- ncol(DF.row1)
DF.Date <- read.table(fn, header = TRUE, as.is = TRUE, 
   colClasses = c(NA, rep("NULL", nc - 1)))
n1985 <- which.max(DF.Date$Date >= "1985-01-01")

DF3 <- read.table(fn, col.names = names(DF.row1), skip = n1985, as.is = TRUE)
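These steps can be wrapped in a small reusable function. read_from_date is a hypothetical name, and the sketch assumes a header row, yyyy-mm-dd dates in the first column, and rows sorted by date. It uses match(TRUE, ...) rather than which.max() so it can detect the case where no row qualifies: which.max() on an all-FALSE vector returns 1, which would silently read the whole file.

```r
# Hypothetical helper wrapping the "multiple read.table" steps.
# Assumes: header row, yyyy-mm-dd dates in column 1, rows sorted by date.
read_from_date <- function(fn, start) {
  # one data row is enough to get the column names and column count
  row1 <- read.table(fn, header = TRUE, nrows = 1)
  nc <- ncol(row1)
  # read only the first column; "NULL" in colClasses drops the others
  dates <- read.table(fn, header = TRUE, as.is = TRUE,
                      colClasses = c(NA, rep("NULL", nc - 1)))[[1]]
  # first data row on or after 'start' (NA if none); string comparison
  # is safe because yyyy-mm-dd strings sort in date order
  first <- match(TRUE, dates >= start)
  if (is.na(first)) stop("no rows on or after ", start)
  # file line 1 is the header, so data row i is file line i + 1;
  # skipping 'first' lines therefore starts reading at data row 'first'
  read.table(fn, col.names = names(row1), skip = first, as.is = TRUE)
}

# Test data in the same layout as the example above
fn <- tempfile()
dd <- seq(as.Date("1980-01-01"), as.Date("1989-12-31"), by = "day")
write.table(data.frame(Date = dd, Value = seq_along(dd)), file = fn,
            row.names = FALSE)

DF3 <- read_from_date(fn, "1985-01-01")
head(DF3, 2)
```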

sqldf

# this is probably the easiest if data set is large.

library(sqldf)
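# sep = " " because the test file was written by write.table (space-separated),
# while read.csv.sql defaults to sep = ","; single quotes make '1985-01-01' an SQL string literal
DF4 <- read.csv.sql(fn, sql = "select * from file where Date >= '1985-01-01'", sep = " ")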

Answer 3

A data.table approach, which offers good speed and memory efficiency:

library(data.table)
fread(file, skip = 1825)
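fread can also locate the starting row for you: when skip is a character string, it starts reading at the first line containing that string. A sketch, assuming the file has a row for 1985-01-01 and that this string does not occur on any earlier line; because the line found is a data row, header = FALSE is set and the column names (made up here) are supplied with col.names.

```r
library(data.table)

# Made-up test file: header plus one row per day, 1980-1989
fn <- tempfile(fileext = ".csv")
dd <- seq(as.Date("1980-01-01"), as.Date("1989-12-31"), by = "day")
fwrite(data.table(Date = dd, Value = seq_along(dd)), fn)

# Start at the first line containing "1985-01-01"; that line is data, not a header
DT <- fread(fn, skip = "1985-01-01", header = FALSE,
            col.names = c("Date", "Value"))
head(DT, 2)
```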
