I am trying to read this data into R: http://www.biostat.umn.edu/~brad/data/smoking.dat
Based on http://stackoverflow.com/questions/11664075/import-dat-file-into-r, I tried:
x <- read.table("http://www.biostat.umn.edu/~brad/data/smoking.dat",
                header=TRUE, sep="\n", skip=2)
It runs, but it gives the wrong data.
head(x)
list.regions.81..num...c.8..5..3..8..5..1..6.
1 7, 3, 5, 7, 7, 2, 2, 5, 6, 6, 7, 4, 8, 7, 6,
2 6, 2, 8, 4, 4, 10, 4, 3, 7, 6, 5, 7, 7, 7, 5, 6, 4, 9, 4, 7,
3 4, 5, 9, 3, 7, 5, 5, 4, 5, 6, 6, 5, 2, 6, 2, 8, 7, 6, 5, 6,
4 3, 6, 6, 6, 6, 4, 10, 8, 3, 4, 2, 6, 5, 7, 7, 4, 7, 6,
5 2),sumnum=441,
6 adj=c(2, 5, 6, 8, 11, 45, 75, 80,
In fact, the data contains several lists.
Answer 0 (score: 0)
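You can't read this file with read.table() because it is not a table. Instead, it is a text representation of R objects (in this case, two lists), such as dput() produces. As David Arenburg said above, you should use dget(). I'm a big fan of the httr package, which the code below uses to download the page.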
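To make the dput()/dget() relationship concrete, here is a minimal round trip on a temporary file. The list values are made up for illustration and are not taken from smoking.dat; only the element names mirror the first list in that file.

```r
# A small list standing in for one object in the file (values are made up)
original <- list(regions = 3, num = c(2, 1, 2), sumnum = 5,
                 adj = c(2, 1, 3, 1, 2))

# dput() writes the R expression that would recreate the object
tmp <- tempfile(fileext = ".dat")
dput(original, file = tmp)
print(readLines(tmp))   # the file holds plain R source text, not a table

# dget() parses and evaluates that expression and returns the object
restored <- dget(tmp)
print(identical(original, restored))

unlink(tmp)             # remove the temporary file
```

Note that dget() evaluates whatever code the file contains, so it is best reserved for sources you trust.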
Edit: for any number of list objects on a single page:
put_multiple_objs_from_url <- function(url){
  # download the page and split its text into lines
  request <- httr::GET(url)
  httr::stop_for_status(request)
  con <- textConnection(httr::content(request, as = 'text', encoding = 'UTF-8'))
  text_lines <- readLines(con)
  close(con)
  # look for lines that start with "list(" to determine file parts
  start_lines <- grep('^list\\(', text_lines)
  if (length(start_lines) == 0) stop('no line starting with "list(" found')
  # each part ends just before the next one starts; the last ends at the final line
  end_lines <- c(start_lines[-1] - 1, length(text_lines))
  # dget each of these file parts as an element of obj_list
  obj_list <- vector("list", length(start_lines))
  for (i in seq_along(start_lines)) {
    obj_txt <- paste0(text_lines[start_lines[i]:end_lines[i]],
                      collapse=" ")
    obj_con <- textConnection(obj_txt)
    obj_list[[i]] <- dget(obj_con)
    close(obj_con)  # avoid leaking one connection per object
  }
  obj_list
}
x <- put_multiple_objs_from_url("http://www.biostat.umn.edu/~brad/data/smoking.dat")
str(x)
# List of 2
# $ :List of 4
# ..$ regions: num 81
# ..$ num : num [1:81] 8 5 3 8 5 1 6 7 3 5 ...
# ..$ sumnum : num 441
# ..$ adj : num [1:441] 2 5 6 8 11 45 75 80 1 8 ...
# $ :List of 9
# ..$ N : num 223
# ..$ Age : num [1:223] 49 47 50 55 59 41 55 42 51 49 ...
# ..$ SexF : num [1:223] 0 0 1 0 0 0 1 1 1 0 ...
# ..$ AgeStart : num [1:223] 18 14 19 15 18 16 15 18 18 18 ...
# ..$ SIUC : num [1:223] 1 0 1 1 1 1 1 1 1 1 ...
# ..$ F10Cigs : num [1:223] 30 20 12 40 20 40 18 40 20 18 ...
# ..$ censored.time1: num [1:223] 1.01 5 4.99 5.04 5 ...
# ..$ censored.time2: num [1:223] 1.97 100 100 100 100 ...
# ..$ County : num [1:223] 17 21 77 30 25 58 13 16 13 77 ...
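If a table is what you were after, the second list is close to one already: apart from N, every element holds one value per subject. The sketch below shows one possible way to turn such a list into a data frame. It uses a made-up stand-in (the name subjects and its values are hypothetical) and assumes that every element other than N has length N.

```r
# Made-up stand-in with the same shape as the second list (values are invented)
subjects <- list(N = 3,
                 Age = c(49, 47, 50),
                 SexF = c(0, 0, 1),
                 County = c(17, 21, 77))

# keep only the elements that have one value per subject (this drops N itself)
per_subject <- subjects[lengths(subjects) == subjects$N]

# equal-length vectors become the columns of a data frame
df <- as.data.frame(per_subject)
print(df)
```

With the real data, the same idea would apply to x[[2]] in place of subjects, provided the length assumption holds.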