Question

Suppose I have a CSV file that looks like this:

Type,ID,NAME,CONTENT,RESPONSE,GRADE,SOURCE
A,3,"","I have comma, ha!",I have open double quotes",A,""

The expected output should be:

df <- data.frame(Type='A', ID=3, NAME=NA, CONTENT='I have comma, ha!', RESPONSE='I have open double quotes\"', GRADE='A', SOURCE=NA)
df
  Type ID NAME           CONTENT                   RESPONSE GRADE SOURCE
1    A  3   NA I have comma, ha! I have open double quotes"     A     NA

I tried read.csv. The data provider uses quotes to escape commas inside strings, but forgot to escape double quotes in strings that contain no comma, so whether or not I disable quoting in read.csv, I do not get the desired output.

How can I do this in R? Solutions using other packages are also welcome.

Answer 1

fread from data.table handles this just fine:

library(data.table)

fread('Type,ID,NAME,CONTENT,RESPONSE,GRADE,SOURCE
A,3,"","I have comma, ha!",I have open double quotes",A,""')
#   Type ID NAME           CONTENT                   RESPONSE GRADE SOURCE
#1:    A  3      I have comma, ha! I have open double quotes"     A

Answer 2

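This is not valid CSV, so you will have to do the parsing yourself. However, assuming the convention below, you can switch scan between two modes and still use most of its functionality:

If a field starts with a quote, it is quoted.
If a field does not start with a quote, it is raw.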

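next_field <- function(stream) {
  # Peek at the first character of the next field, then rewind
  p <- seek(stream)
  d <- readChar(stream, 1)
  seek(stream, p)
  # A leading quote means the field is quoted; otherwise read it raw
  if (d == "\"")
    field <- scan(stream, "", 1, sep = ",", quote = "\"", blank = FALSE)
  else
    field <- scan(stream, "", 1, sep = ",", quote = "", blank = FALSE)
  return(field)
}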

Given the convention above, this is enough to parse the file:

s<-file("example.csv",open="rt")
header<-readLines(s,1)
header<-scan(what="",text=header,sep=",")
line<-replicate(length(header),next_field(s))

setNames(as.data.frame(lapply(line,type.convert)),header)

  Type ID NAME           CONTENT                   RESPONSE GRADE SOURCE
1    A  3   NA I have comma, ha! I have open double quotes"     A     NA

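In practice, however, you will probably want to write the fields back out to another file first, quoting every field, so that you can simply run read.csv on the corrected format.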
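The rewrite step described above could be sketched roughly as follows in base R. It assumes the same convention (a field that opens with a double quote ends at the next double quote, any other field is raw up to the next comma) and that no field spans several lines. The helper names split_fields, quote_field and requote_csv are made up for this illustration, and the sample is written to a temporary file.

```r
# Split one line under the assumed convention (hypothetical helper).
split_fields <- function(line) {
  fields <- character(0)
  rest <- line
  repeat {
    # Quoted only if the field opens with a quote and closes before a comma or end of line
    quoted <- grepl("^\"[^\"]*\"(,|$)", rest)
    pattern <- if (quoted) "^\"[^\"]*\"" else "^[^,]*"
    len <- attr(regexpr(pattern, rest), "match.length")
    field <- substr(rest, 1, len)
    if (quoted) field <- substr(field, 2, len - 1)  # strip the enclosing quotes
    fields <- c(fields, field)
    if (len >= nchar(rest)) break                   # nothing left after this field
    rest <- substr(rest, len + 2, nchar(rest))      # skip the separating comma
  }
  fields
}

# Quote a field the standard CSV way: double any embedded quote, then wrap.
quote_field <- function(x) paste0("\"", gsub("\"", "\"\"", x, fixed = TRUE), "\"")

# Rewrite infile to outfile with every field quoted (hypothetical helper).
requote_csv <- function(infile, outfile) {
  lines <- readLines(infile)
  fixed <- vapply(
    lines,
    function(l) paste(quote_field(split_fields(l)), collapse = ","),
    character(1),
    USE.NAMES = FALSE
  )
  writeLines(fixed, outfile)
}

# Usage on the sample from the question
raw <- tempfile(fileext = ".csv")
writeLines(c('Type,ID,NAME,CONTENT,RESPONSE,GRADE,SOURCE',
             'A,3,"","I have comma, ha!",I have open double quotes",A,""'), raw)
clean <- tempfile(fileext = ".csv")
requote_csv(raw, clean)
df <- read.csv(clean, stringsAsFactors = FALSE)
```

Because every field in the rewritten file is quoted and embedded quotes are doubled, read.csv can parse it with its default quote setting.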

Answer 3

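I am not too sure about the structure of CSV files, but you said the author had escaped the commas in the text under CONTENT.

This works for reading the text that has a " at the end.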

read.csv2("Test.csv", header = T,sep = ",", quote="")
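Whether a given quote setting splits every line into the expected number of columns can be checked with count.fields from utils. A minimal sketch on the sample from the question, written to a temporary file (the variable names are illustrative):

```r
# Write the sample from the question to a temporary file
path <- tempfile(fileext = ".csv")
writeLines(c('Type,ID,NAME,CONTENT,RESPONSE,GRADE,SOURCE',
             'A,3,"","I have comma, ha!",I have open double quotes",A,""'), path)

# Count fields per line with quoting disabled, as in quote = ""
n_fields <- count.fields(path, sep = ",", quote = "")
print(n_fields)
```

With quoting disabled, the comma inside CONTENT counts as a separator, so the data row has one more field than the header.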

Reading a CSV file with double quotes in R

3 answers: