Error: cannot allocate vector of size 34.8 Gb

Asked: 2016-04-23 10:52:03

Tags: r memory memory-management bigdata

I get the following error when I try to use the lda function. My training dataset has only 54683 rows and 12 variables.

Error: cannot allocate vector of size 34.8 Gb
In addition: Warning messages:
1: In rep.int(c(1, numeric(n)), n - 1L) :
  Reached total allocation of 3066Mb: see help(memory.size)
2: In rep.int(c(1, numeric(n)), n - 1L) :
  Reached total allocation of 3066Mb: see help(memory.size)
3: In rep.int(c(1, numeric(n)), n - 1L) :
  Reached total allocation of 3066Mb: see help(memory.size)
4: In rep.int(c(1, numeric(n)), n - 1L) :
  Reached total allocation of 3066Mb: see help(memory.size)
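
The size in the error message can be checked with a little arithmetic. The sketch below assumes that the failing call, rep.int(c(1, numeric(n)), n - 1L), is filling a dense n-by-n matrix of doubles for a factor with n levels, and that n is 68353, the row count used in the sample() call in the code further down. Both are assumptions drawn from the warning text and the code, not something the output states directly.

```r
# Assumption: n is the number of levels of one factor column, taken here
# to be 68353, the number of rows the question's sample() call draws from.
n_levels <- 68353

# c(1, numeric(n)) has length n + 1 and is repeated n - 1 times.
n_doubles <- (n_levels + 1) * (n_levels - 1)

# Each double takes 8 bytes; R reports sizes in units of 1024^3 bytes.
size_gb <- n_doubles * 8 / 1024^3
print(round(size_gb, 1))  # 34.8
```

If that reading is right, the request has little to do with the 12 variables as such: a single column with one distinct level per row would be enough to produce it.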

Here is my sessionInfo():

R version 3.2.3 (2015-12-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] MASS_7.3-45

loaded via a namespace (and not attached):
[1] tools_3.2.3

RAM: 3 GB. Processor: Intel(R) Core(TM)2 Duo T6600 @ 2.20 GHz.

I searched online and looked at the bigmemory and ff packages, but I found them hard to understand.

I am fairly new to R. Any help would be appreciated.

My code:

d1=data.frame(read.csv("C:\\Users\\pankaj\\Downloads\\Mc Kinsey Hiring\\train.csv",na.strings = c(""," ")))
View(d1)
#d1$Email_ID = row.names(d1$Email_ID)
View(d1)
d1$Email_Status = as.factor(d1$Email_Status)
d1$Email_Type = as.factor(d1$Email_Type)
d1$Email_Source_Type  = as.factor(d1$Email_Source_Type)
d1$Customer_Location = as.factor(d1$Customer_Location)
d1$Email_Campaign_Type = as.factor(d1$Email_Campaign_Type)
d1$Time_Email_sent_Category = as.factor(d1$Time_Email_sent_Category)
summary(d1)


set.seed(1)
sp = sample(x = 68353,size = 54683)
train = d1[sp,]
test =  d1[-sp,]
View(train)
View(test)


# removed all unnecessary data from environment. Leaving only the training data
rm(sp)
rm(d1)
rm(test)

# running lda
library(MASS)
lda.fit = lda(train$Email_Status ~. - train$Email_ID,data = train)
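
One likely cause is the formula in the lda call above. When `.` is expanded against `data = train` it produces bare column names, so subtracting `train$Email_ID` does not match the `Email_ID` term and the ID column stays in the model. If Email_ID was read in as a factor (the default for strings in read.csv under R 3.2.3), it would then be expanded into one dummy column per level. The sketch below illustrates this on a small made-up data frame; `toy`, `x1`, and `x2` are hypothetical stand-ins, not the real data.

```r
library(MASS)  # provides lda()

set.seed(1)
n <- 200

# Hypothetical stand-in for the training data: an ID factor with one level
# per row, a class label, and two made-up numeric predictors.
toy <- data.frame(
  Email_ID     = factor(sprintf("EMA%05d", seq_len(n))),
  Email_Status = factor(sample(c("0", "1", "2"), n, replace = TRUE)),
  x1           = rnorm(n),
  x2           = rnorm(n)
)

# With `toy$` inside the formula, `- toy$Email_ID` does not remove the
# Email_ID term that `.` expands to.
bad_terms <- attr(terms(toy$Email_Status ~ . - toy$Email_ID, data = toy),
                  "term.labels")

# With bare column names, the ID column is actually dropped.
good_terms <- attr(terms(Email_Status ~ . - Email_ID, data = toy),
                   "term.labels")

print(bad_terms)
print(good_terms)

# Fitting with the bare-name formula uses only the remaining predictors.
fit <- lda(Email_Status ~ . - Email_ID, data = toy)
print(fit$counts)
```

Applied to the code above, the equivalent call would be lda(Email_Status ~ . - Email_ID, data = train). Dropping the column before fitting, for example with train$Email_ID = NULL, should have the same effect.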

0 answers:

No answers yet.