Question

I'm getting an error from read.table():

data <- read.table(file, header=T, stringsAsFactors=F, sep="@")
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  : 
  line 160 did not have 28 elements

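I checked line 160, and it does have 28 elements (it contains 27 @ characters).

I checked all 30242 lines: there are 816534 @ characters, 27 per line, so I'm fairly sure every line has 28 elements. I also checked the file to confirm that @ does not appear anywhere except as a separator.

Does anyone know what is going on here?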

Edit: line 160 of the file

158@Mental state: 1. Overall clinical symptoms@MD@S@2002@CMP-005@02@20.67@23.58@Clozapine versus typical neuroleptic medication for schizophrenia@IV@4.47@02@SENSITIVITY ANALYSIS - CHINESE TRIALS@CD000059@6.94@Fixed@16@5@2@45@Chinese trials@YES@Xia 2002 (CPZ)@STD-Xia-2002-_x0028_CPZ_x0029_@579@566@40

Edit 2: line 161 of the file

159@Length of surgery (minutes)@MD@Y@1995@CMP-001@01@59.0@47.0@Gamma and other cephalocondylic intramedullary nails versus extramedullary implants for extracapsular hip fractures in adults@IV@23.9@01@Summary: Femoral nail (all types) versus sliding hip screw (SHS)@CD000093@13.3@Random@12@1@1@53@Gamma nail@YES@O'Brien 1995@STD-O_x0027_Brien-1995@958@941@49

Answer 1

I think the problem is that there is a newline character that needs to be recognized by the quote argument. Let's take a look.

txt <- c(
    "158@Mental state: 1. Overall clinical symptoms@MD@S@2002@CMP-005@02@20.67@23.58@Clozapine versus typical neuroleptic medication for schizophrenia@IV@4.47@02@SENSITIVITY ANALYSIS - CHINESE TRIALS@CD000059@6.94@Fixed@16@5@2@45@Chinese trials@YES@Xia 2002 (CPZ)@STD-Xia-2002-_x0028_CPZ_x0029_@579@566@40", 
    "159@Length of surgery (minutes)@MD@Y@1995@CMP-001@01@59.0@47.0@Gamma and other cephalocondylic intramedullary nails versus extramedullary implants for extracapsular hip fractures in adults@IV@23.9@01@Summary: Femoral nail (all types) versus sliding hip screw (SHS)@CD000093@13.3@Random@12@1@1@53@Gamma nail@YES@O'Brien 1995@STD-O_x0027_Brien-1995@958@941@49"
)

We can use count.fields() to preview the number of fields on each line of the file. With sep = "@" and nothing else, we get an NA between the lines and an incorrect count:

count.fields(textConnection(txt), sep = "@")
# [1] 28 NA 24

But when we identify the newline character in quote, it returns the correct counts:

count.fields(textConnection(txt), sep = "@", quote = "\n")
# [1] 28 28
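On a file with tens of thousands of lines, the same count.fields() check can be turned into a quick search for the lines that disagree with the expected field count. The following is only a sketch: find_bad_lines() is a hypothetical helper name, and the four-line temporary file is made-up data standing in for the real file.

```r
# Hypothetical helper (not part of base R): return the indices of lines whose
# field count is NA or differs from the expected number of fields.
find_bad_lines <- function(path, sep, expected, quote = "\"'") {
  # comment.char = "" so that a "#" inside a field does not truncate the line
  n <- count.fields(path, sep = sep, quote = quote, comment.char = "")
  which(is.na(n) | n != expected)
}

# Made-up sample file: the third line is one field short.
tmp <- tempfile(fileext = ".txt")
writeLines(c("a@b@c", "1@2@3", "4@5", "6@7@8"), tmp)

bad <- find_bad_lines(tmp, sep = "@", expected = 3)
bad
```

Note that count.fields() skips blank lines by default, so the returned indices match file line numbers only when the file has no blank lines.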

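Since quote = "\n" gives the correct field counts, I'd suggest adding it to your read.table() call and seeing whether that solves the problem. It did for me: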

read.table(text = txt, sep = "@")
# [1] V1  V2  V3  V4  V5  V6  V7  V8  V9  V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22 V23 V24 V25 V26 V27 V28
# <0 rows> (or 0-length row.names)

df <- read.table(text = txt, sep = "@", quote = "\n")
dim(df)
# [1]  2 28
anyNA(df)
# [1] FALSE
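A likely reading of why this works: the default is quote = "\"'", so an apostrophe such as the one in O'Brien is treated as the start of a quoted string, and replacing the quote set with "\n" removes ' and " from it. The documented way to switch quoting off altogether is quote = "". The sketch below uses made-up three-field records (the names are invented) to compare the two settings.

```r
# Made-up records: three "@"-separated fields each, both containing an apostrophe.
toy <- c("1@O'Brien 1995@49", "2@D'Arcy 2001@40")

# Default quote set ("\"'"): the apostrophes are read as quote characters,
# so the counts no longer describe one three-field record per line.
count.fields(textConnection(toy), sep = "@")

# Quoting disabled: every line is counted on its own.
n_fields <- count.fields(textConnection(toy), sep = "@", quote = "")

df <- read.table(text = toy, sep = "@", quote = "", stringsAsFactors = FALSE)
dim(df)
df$V2
```

This only helps when no field relies on quoting to protect an embedded separator; if some fields do, the quote set has to keep the character they use.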

Answer 2

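I had the same problem. The answer above helped, but quote = "\n" only goes so far. One element in my file contained " as a character, so I had to keep the default value of quote. I also had a # in one of the elements, so I had to use comment.char = "". The help for read.table() refers to scan() in several places, so I checked it and found the allowEscapes argument, which defaults to FALSE. I added it to the read.table() call and set it to TRUE. This is the full command that worked for me:

read.table(file = "filename.csv", header = T, sep = ",", comment.char = "", allowEscapes = T)

I hope this helps someone.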
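The effect of comment.char = "" can be seen on a small comma-separated sample. This is a sketch with made-up data, not the file from this answer: one field contains a #, and another relies on the default double-quote handling to protect an embedded comma.

```r
# Made-up CSV lines: a "#" inside a field, and a quoted field with a comma.
csv <- c(
  "id,label,n",
  "1,trial #2,10",
  "2,\"control, arm B\",12"
)

# comment.char = "" keeps everything after "#"; the default quote set is kept
# so that the quoted comma is not treated as a separator.
df <- read.table(text = csv, header = TRUE, sep = ",",
                 comment.char = "", stringsAsFactors = FALSE)
df$label
```

With the default comment.char = "#", the first data line would be cut off at the #, leaving it short of fields.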

read.table() error even though all elements are present
