How to format a CSV file so that R can read it correctly

Asked: 2015-07-03 13:30:28

Tags: r csv

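I have a CSV file that is created on the fly by some Java-based code (Processing). The problem is that when I try to load it in R, it adds a column at the beginning for no apparent reason and then leaves a column filled with NA in the middle.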

Here is what the CSV file looks like.

 x1,x2,y1,y2,angle,size1,size2,distance1,distance2
 400.0,1100.0,500.0,500.0,0.0,,0.0,0.0,-100.0,600.0

The thing is, I tried opening it in OpenOffice just for giggles, and it opened fine (screenshot not preserved).

In R, however, read.csv() shifts the columns as described above (screenshot not preserved).

So I figure the best place to start investigating is where the file is created.

Here is the Processing code:

out.println("x1,"+ "x2," + "y1," + "y2," + "angle," + "size1," + "size2," + "distance1," + "distance2");
for (int i = 0; i < directions; i++)
{
  //extraneous code skipped
  String output =  pointX + "," +  point2X + "," +  pointY + "," + point2Y + "," + (double)angle + "," +  "," + size1 + "," + size2 + "," +  distance + "," + distance2;
  out.println(output);
}

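Anyway, I could use some hints for troubleshooting or suggestions for a fix.

2 Answers:

Answer 0 (score: 2)

If we count the fields, we see that there are 9 header columns but 10 data columns. When the header has one field fewer than the data rows, R assumes that the first data column holds the row names.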

count.fields(textConnection(Lines), sep = ",")
[1]  9 10
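
The same count can also be checked on the Java side before the file ever reaches R. The following is only a sketch: `FieldCount` and `countFields` are hypothetical names that are not part of the original code, and it assumes no field contains a quoted comma.

```java
class FieldCount {
    // Count the comma-separated fields in one line. The -1 limit keeps
    // trailing empty fields, which a plain split(",") would drop.
    static int countFields(String line) {
        return line.split(",", -1).length;
    }

    public static void main(String[] args) {
        String header = "x1,x2,y1,y2,angle,size1,size2,distance1,distance2";
        String row = " 400.0,1100.0,500.0,500.0,0.0,,0.0,0.0,-100.0,600.0";
        System.out.println(countFields(header)); // 9
        System.out.println(countFields(row));    // 10
    }
}
```

A mismatch between the two counts points at the line that builds the data row rather than at R.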

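To fix this, skip the header and read the data while dropping the extra column 6. Then read the header line on its own and apply it as the column names of the data frame.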

# test data
Lines <- "x1,x2,y1,y2,angle,size1,size2,distance1,distance2
 400.0,1100.0,500.0,500.0,0.0,,0.0,0.0,-100.0,600.0"


DF <- read.table(text = Lines, skip = 1, sep = ",")[-6]
names(DF) <- unlist(read.table(text = Lines, nrows = 1, sep = ","))

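We used text = Lines to keep this self-contained, but of course you would replace it with something like file = "myfile.csv".

Answer 1 (score: 1)

As already explained in the comments, your input contains a double comma (,,):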

cat 'wrong.csv'
x1,x2,y1,y2,angle,size1,size2,distance1,distance2
 400.0,1100.0,500.0,500.0,0.0,,0.0,0.0,-100.0,600.0

Removing it solves the problem:

cat 'right.csv'
x1,x2,y1,y2,angle,size1,size2,distance1,distance2
 400.0,1100.0,500.0,500.0,0.0,0.0,0.0,-100.0,600.0

Here you can see the difference:

Rscript -e 'read.csv("wrong.csv");read.csv("right.csv")'
        x1  x2  y1 y2 angle size1 size2 distance1 distance2
400.0 1100 500 500  0    NA     0     0      -100       600
   x1   x2  y1  y2 angle size1 size2 distance1 distance2
1 400 1100 500 500     0     0     0      -100       600

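The reason is that R treats ,, as a column with no value. Since nothing indicates that the column is character, the empty field is not interpreted as an empty string ("") but as a missing value (NA). Because your input then has one more data column than the header, read.csv interprets the first column as the row names of the resulting data.frame. That is why you get no error but unexpected output. Once the number of columns is corrected, R understands that column 1 really is x1, and so on.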
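
On the generating side, the empty field comes from the stray `"," +` between `angle` and `size1` in the Processing code. One way to make that kind of slip harder is to build each row from a list of values, join them, and check the count against the header. The following is a sketch in plain Java 8+, not the author's code: `CsvRow` and `formatRow` are hypothetical names, and it assumes every value is numeric, as in the question.

```java
import java.util.StringJoiner;

class CsvRow {
    // Column names taken from the header line in the question.
    static final String[] HEADER = {
        "x1", "x2", "y1", "y2", "angle", "size1", "size2", "distance1", "distance2"
    };

    // Join the values with commas and refuse rows whose field count
    // differs from the header, so a mismatch fails here instead of in R.
    static String formatRow(double... values) {
        if (values.length != HEADER.length) {
            throw new IllegalArgumentException(
                "expected " + HEADER.length + " fields, got " + values.length);
        }
        StringJoiner joiner = new StringJoiner(",");
        for (double v : values) {
            joiner.add(String.valueOf(v));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(String.join(",", HEADER));
        System.out.println(formatRow(400.0, 1100.0, 500.0, 500.0, 0.0, 0.0, 0.0, -100.0, 600.0));
    }
}
```

In the Processing sketch, the equivalent call would be out.println(formatRow(pointX, point2X, pointY, point2Y, angle, size1, size2, distance, distance2)), assuming those variables are numeric.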
