Suppose I have a CSV file that looks like this:
Type,ID,NAME,CONTENT,RESPONSE,GRADE,SOURCE
A,3,"","I have comma, ha!",I have open double quotes",A,""
The desired output should be:
df <- data.frame(Type='A',ID=3, NAME=NA, CONTENT='I have comma, ha!',
                 RESPONSE='I have open double quotes\"', GRADE='A', SOURCE=NA)
df
  Type ID NAME           CONTENT                   RESPONSE GRADE SOURCE
1    A  3   NA I have comma, ha! I have open double quotes"     A     NA
I tried reading the file directly. The data provider uses quotes to escape commas inside strings, but forgot to escape double quotes in strings that contain no comma, so I don't get the desired output whether or not I disable quoting in the reader.
How can I do this in R? Solutions using other packages are also welcome.
Answer 0 (score: 8)
fread from data.table handles this just fine:
library(data.table)
fread('Type,ID,NAME,CONTENT,RESPONSE,GRADE,SOURCE
A,3,"","I have comma, ha!",I have open double quotes",A,""')
# Type ID NAME CONTENT RESPONSE GRADE SOURCE
#1: A 3 I have comma, ha! I have open double quotes" A
Answer 1 (score: 2)
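This is not valid CSV, so you will have to do your own parsing. However, assuming the convention is that a field starting with a double quote is quoted and any other field is taken literally, you can toggle scan's quote argument per field and still take advantage of most of its abilities: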
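next_field<-function(stream) {
    # Peek at the next character without consuming it
    p<-seek(stream)
    d<-readChar(stream,1)
    seek(stream,p)
    if(d=="\"")
        # Field starts with a double quote: let scan honor quoting
        field<-scan(stream,"",1,sep=",",quote="\"",blank=FALSE)
    else
        # Otherwise treat any quote characters as literal text
        field<-scan(stream,"",1,sep=",",quote="",blank=FALSE)
    return(field)
}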
Assuming the above convention, this is sufficient to parse the file as follows:
s<-file("example.csv",open="rt")
header<-readLines(s,1)
header<-scan(what="",text=header,sep=",")
line<-replicate(length(header),next_field(s))
setNames(as.data.frame(lapply(line,type.convert)),header)
  Type ID NAME           CONTENT                   RESPONSE GRADE SOURCE
1    A  3   NA I have comma, ha! I have open double quotes"     A     NA
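In practice, however, you will probably want to write the fields back out first, quoting each field, to another file, so that you can read.csv the corrected format.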
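The re-quoting step could be sketched roughly as follows. This is only an illustration under the same assumed convention (a field that starts with a double quote ends at the next double quote, and any other field runs to the next comma). The helper names split_fields and requote are hypothetical, and the lines are held in memory here rather than written to a second file.

```r
# Hypothetical helper: split one line under the assumed convention.
split_fields <- function(line) {
  fields <- character(0)
  rest <- line
  repeat {
    if (startsWith(rest, "\"")) {
      # Quoted field: content runs up to the next double quote.
      close <- as.integer(regexpr("\"", substring(rest, 2), fixed = TRUE))
      if (close == -1L) stop("unterminated quoted field")
      fields <- c(fields, substr(rest, 2, close))
      rest <- substring(rest, close + 2)
    } else {
      # Unquoted field: runs up to the next comma; quotes are literal.
      comma <- as.integer(regexpr(",", rest, fixed = TRUE))
      if (comma == -1L) {
        fields <- c(fields, rest)
        rest <- ""
      } else {
        fields <- c(fields, substr(rest, 1, comma - 1))
        rest <- substring(rest, comma)
      }
    }
    if (!nzchar(rest)) break
    if (!startsWith(rest, ",")) stop("expected a comma after a quoted field")
    rest <- substring(rest, 2)  # drop the separating comma
  }
  fields
}

# Hypothetical helper: quote every field and double any embedded quotes.
requote <- function(fields) {
  paste0("\"", gsub("\"", "\"\"", fields, fixed = TRUE), "\"", collapse = ",")
}

raw <- c(
  'Type,ID,NAME,CONTENT,RESPONSE,GRADE,SOURCE',
  'A,3,"","I have comma, ha!",I have open double quotes",A,""'
)
fixed <- vapply(raw, function(l) requote(split_fields(l)),
                character(1), USE.NAMES = FALSE)

# With a real file, writeLines(fixed, <some path>) followed by read.csv on
# that path would play the same role as reading from text here.
df <- read.csv(text = fixed, stringsAsFactors = FALSE)
```

Once every field is quoted and embedded quotes are doubled, the data is ordinary CSV, so read.csv needs no special arguments beyond the defaults.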
Answer 2 (score: 1)
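I am not too sure about the structure of the CSV file, but you said the author had escaped the commas in the text under CONTENT.
This works for reading the text that ends with a ".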
read.csv2("Test.csv", header = T,sep = ",", quote="")
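Whether disabling quotes is enough for a particular file can be checked by counting the fields on each line with quoting turned off. The sketch below does this for the sample from the question using base R's count.fields; the variable names are illustrative. If a data row has more fields than the header, a comma inside a quoted string has been treated as a separator and the columns may not line up.

```r
csv_text <- c(
  'Type,ID,NAME,CONTENT,RESPONSE,GRADE,SOURCE',
  'A,3,"","I have comma, ha!",I have open double quotes",A,""'
)

# Count comma-separated fields per line with quoting disabled.
con <- textConnection(csv_text)
n_fields <- count.fields(con, sep = ",", quote = "")
close(con)

# TRUE for every line whose field count differs from the header's.
mismatched <- n_fields != n_fields[1]
```

For this sample the data row splits at the comma inside "I have comma, ha!", so it has one more field than the header.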