Reading a csv file with double quotes in R

Time: 2015-08-19 19:04:50

Tags: r csv

Suppose I have a csv file that looks like this:

Type,ID,NAME,CONTENT,RESPONSE,GRADE,SOURCE
A,3,"","I have comma, ha!",I have open double quotes",A,""

The desired output should be:

df <- data.frame(Type='A', ID=3, NAME=NA, CONTENT='I have comma, ha!', RESPONSE='I have open double quotes\"', GRADE='A', SOURCE=NA)
df
  Type ID NAME           CONTENT                   RESPONSE GRADE SOURCE
1    A  3   NA I have comma, ha! I have open double quotes"     A     NA

I tried read.csv. The data provider uses quotes to protect commas inside strings, but forgot to escape double quotes in strings that contain no comma, so whether or not I disable quoting in read.csv, I do not get the desired output.

How can I do this in R? Solutions using other packages are welcome too.

3 Answers:

Answer 0: (score: 8)

fread from data.table handles this just fine:

library(data.table)

fread('Type,ID,NAME,CONTENT,RESPONSE,GRADE,SOURCE
A,3,"","I have comma, ha!",I have open double quotes",A,""')
#   Type ID NAME           CONTENT                   RESPONSE GRADE SOURCE
#1:    A  3      I have comma, ha! I have open double quotes"     A       

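Answer 1: (score: 2)

This is not valid CSV, so you will have to do the parsing yourself. However, assuming the conventions below, you can switch scan's quote handling per field and still use most of its functionality: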

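  1. If a field starts with a quote, it is quoted.
  2. If a field does not start with a quote, it is raw.

    # Read one field, choosing scan's quote handling from the field's first character.
    next_field <- function(stream) {
      p <- seek(stream)           # remember the current position
      d <- readChar(stream, 1)    # peek at the first character of the field
      seek(stream, p)             # rewind so scan sees the whole field
      if (d == "\"")
        field <- scan(stream, "", 1, sep = ",", quote = "\"", blank.lines.skip = FALSE)
      else
        field <- scan(stream, "", 1, sep = ",", quote = "", blank.lines.skip = FALSE)
      return(field)
    }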
    

Assuming the above conventions, this is enough to parse the file:

    s<-file("example.csv",open="rt")
    header<-readLines(s,1)
    header<-scan(what="",text=header,sep=",")
    line<-replicate(length(header),next_field(s))
    
    setNames(as.data.frame(lapply(line,type.convert)),header)
    
      Type ID NAME           CONTENT                   RESPONSE GRADE SOURCE
    1    A  3   NA I have comma, ha! I have open double quotes"     A     NA
    

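In practice, however, you will probably want to write the fields back out to another file first, quoting every field, so that read.csv can read the corrected format.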
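A minimal sketch of that re-quoting step, assuming the two conventions listed above hold for every field and that no field spans several lines. The helpers split_line and requote_line are hypothetical names introduced here for illustration; they are not part of base R or of the answer above. This version splits each line with regular expressions instead of calling scan on a connection.

```r
# Hypothetical helper: split one line into fields under the two conventions.
split_line <- function(line) {
  fields <- character(0)
  rest <- line
  repeat {
    if (startsWith(rest, "\"")) {
      # Quoted field: runs to the next double quote; commas inside are data.
      m <- regmatches(rest, regexpr("^\"[^\"]*\"", rest))
      if (length(m) == 0) stop("unterminated quoted field")
      field <- substr(m, 2, nchar(m) - 1)
    } else {
      # Raw field: runs to the next comma; any double quote inside is data.
      m <- regmatches(rest, regexpr("^[^,]*", rest))
      field <- m
    }
    fields <- c(fields, field)
    rest <- substr(rest, nchar(m) + 1, nchar(rest))
    if (!startsWith(rest, ",")) break
    rest <- substr(rest, 2, nchar(rest))  # drop the separator
  }
  fields
}

# Hypothetical helper: rewrite a line with every field quoted and
# embedded double quotes doubled, which read.csv understands.
requote_line <- function(line) {
  fields <- split_line(line)
  paste0("\"", gsub("\"", "\"\"", fields, fixed = TRUE), "\"", collapse = ",")
}

raw <- c('Type,ID,NAME,CONTENT,RESPONSE,GRADE,SOURCE',
         'A,3,"","I have comma, ha!",I have open double quotes",A,""')
fixed <- vapply(raw, requote_line, character(1), USE.NAMES = FALSE)
df <- read.csv(text = fixed, stringsAsFactors = FALSE)
```

For a real file you would presumably replace raw with readLines("example.csv") and either pass the result to read.csv(text = ...) as above or save it with writeLines to a second file (any name you like) and read that file with read.csv.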

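Answer 2: (score: 1)

I am not too sure about the structure of the CSV file, but you said the author escaped the commas in the text under CONTENT.

This works for reading the text that ends with a ":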

read.csv2("Test.csv", header = TRUE, sep = ",", quote = "")