Reading a ".dat" file into R (importing a dat file into R)

Time: 2015-09-02 09:10:39

Tags: r

I am trying to read this data into R: http://www.biostat.umn.edu/~brad/data/smoking.dat

I used the answer from http://stackoverflow.com/questions/11664075/import-dat-file-into-r:
x <- read.table("http://www.biostat.umn.edu/~brad/data/smoking.dat",
                header=TRUE, sep="\n", skip=2)

It runs, but the data it returns is wrong.

head(x)
                  list.regions.81..num...c.8..5..3..8..5..1..6.
1                 7, 3, 5, 7, 7, 2, 2, 5, 6, 6, 7, 4, 8, 7, 6, 
2 6, 2, 8, 4, 4, 10, 4, 3, 7, 6, 5, 7, 7, 7, 5, 6, 4, 9, 4, 7, 
3  4, 5, 9, 3, 7, 5, 5, 4, 5, 6, 6, 5, 2, 6, 2, 8, 7, 6, 5, 6, 
4       3, 6, 6, 6, 6, 4, 10, 8, 3, 4, 2, 6, 5, 7, 7, 4, 7, 6, 
5                                                2),sumnum=441,
6                            adj=c(2, 5, 6, 8, 11, 45, 75, 80, 

The data actually contains several lists.

1 answer:

Answer 0: (score: 0)

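You cannot read this file with read.table() because it is not a table. Instead, it is a text representation of R objects (in this case, two lists), such as the output of dput(). As David Arenburg said above, you should use dget(). I'm a big fan of the httr package for the download step.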
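To make the dput()/dget() relationship concrete, here is a minimal sketch that runs on local temporary files rather than the real URL. The object `obj` and the file contents are made-up stand-ins, not the actual smoking data. It also shows why a file holding more than one object needs extra handling: dget() evaluates everything in the file but returns only the value of the last expression.

```r
# A small list standing in for one of the objects in the .dat file
# (hypothetical values, not the real data)
obj <- list(regions = 3, num = c(8, 5, 3))

# dput() writes the R code that recreates the object
one_obj_file <- tempfile(fileext = ".dat")
dput(obj, file = one_obj_file)
print(readLines(one_obj_file))

# dget() parses and evaluates that text, giving the object back
restored <- dget(one_obj_file)
stopifnot(identical(obj, restored))

# A file with two objects, like the one in the question
two_obj_file <- tempfile(fileext = ".dat")
writeLines(c("list(a = 1)", "list(b = 2)"), two_obj_file)

# dget() returns only the value of the last expression
last_only <- dget(two_obj_file)

# Evaluating each parsed expression separately keeps all of them
all_objs <- lapply(parse(file = two_obj_file), eval)
str(all_objs)
```

The lapply(parse(...), eval) line is one possible alternative for multi-object files; it assumes the whole file parses as valid R code.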

Edit: to handle any number of list objects on one page:

put_multiple_objs_from_url <- function(url){
  # httr does the download; fail early if it is not installed
  stopifnot(requireNamespace("httr", quietly = TRUE))
  request <- httr::GET(url)
  httr::stop_for_status(request)
  # split the response body into individual lines
  body_text <- httr::content(request, as = "text", encoding = "UTF-8")
  text_lines <- strsplit(body_text, "\r?\n")[[1]]

  # look for lines that start with "list(" to determine file parts
  start_lines <- grep('^list\\(', text_lines)
  if (length(start_lines) == 0) stop("no line starting with 'list(' found")
  # each part ends just before the next one starts; the last ends at EOF
  end_lines <- c(start_lines[-1] - 1L, length(text_lines))

  # dget each of these file parts as an element of obj_list
  obj_list <- vector("list", length(start_lines))
  for (i in seq_along(start_lines)) {
    obj_txt <- paste0(text_lines[start_lines[i]:end_lines[i]],
                      collapse = " ")
    con <- textConnection(obj_txt)
    obj_list[[i]] <- dget(con)
    close(con)  # avoid leaving the text connection open
  }
  obj_list
}

x <- put_multiple_objs_from_url("http://www.biostat.umn.edu/~brad/data/smoking.dat")

str(x)
# List of 2
# $ :List of 4
# ..$ regions: num 81
# ..$ num    : num [1:81] 8 5 3 8 5 1 6 7 3 5 ...
# ..$ sumnum : num 441
# ..$ adj    : num [1:441] 2 5 6 8 11 45 75 80 1 8 ...
# $ :List of 9
# ..$ N             : num 223
# ..$ Age           : num [1:223] 49 47 50 55 59 41 55 42 51 49 ...
# ..$ SexF          : num [1:223] 0 0 1 0 0 0 1 1 1 0 ...
# ..$ AgeStart      : num [1:223] 18 14 19 15 18 16 15 18 18 18 ...
# ..$ SIUC          : num [1:223] 1 0 1 1 1 1 1 1 1 1 ...
# ..$ F10Cigs       : num [1:223] 30 20 12 40 20 40 18 40 20 18 ...
# ..$ censored.time1: num [1:223] 1.01 5 4.99 5.04 5 ...
# ..$ censored.time2: num [1:223] 1.97 100 100 100 100 ...
# ..$ County        : num [1:223] 17 21 77 30 25 58 13 16 13 77 ...
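
Judging from the str() output, the second list is one scalar (N) plus vectors that all have length N, so it could be turned into a data frame if that is easier to work with. A rough sketch on a mock object with the same layout; `second` and `smoking_df` are names I made up here, and the values are shortened stand-ins for x[[2]]:

```r
# Mock object with the same layout as x[[2]] above: a scalar N plus
# equal-length vectors. Only three rows, for illustration.
second <- list(N = 3,
               Age = c(49, 47, 50),
               SexF = c(0, 0, 1),
               County = c(17, 21, 77))

# Drop the scalar N, then bind the remaining vectors as columns
smoking_df <- as.data.frame(second[names(second) != "N"])

# Sanity check: one row per subject
stopifnot(nrow(smoking_df) == second$N)
str(smoking_df)
```

With the real data, replacing `second` by `x[[2]]` should work the same way, provided every element other than N really has length N.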