Question

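What is the quickest/best way to change a large number of columns to numeric?

I used the following code, but it appears to have reordered my data.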

> head(stats[,1:2])
  rk                 team
1  1 Washington Capitals*
2  2     San Jose Sharks*
3  3  Chicago Blackhawks*
4  4     Phoenix Coyotes*
5  5   New Jersey Devils*
6  6   Vancouver Canucks*

for(i in c(1,3:ncol(stats))) {
    stats[,i] <- as.numeric(stats[,i])
}

> head(stats[,1:2])
  rk                 team
1  2 Washington Capitals*
2 13     San Jose Sharks*
3 24  Chicago Blackhawks*
4 26     Phoenix Coyotes*
5 27   New Jersey Devils*
6 28   Vancouver Canucks*

What is the best way, short of naming every column as in:

df$colname <- as.numeric(df$colname)

Answer 1

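You have to be careful when changing factors to numeric. Here is a line of code that changes a set of columns from factor to numeric. I am assuming here that the columns to be changed to numeric are 1, 3, 4 and 5. You can change it accordingly.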

cols = c(1, 3, 4, 5);    
df[,cols] = apply(df[,cols], 2, function(x) as.numeric(as.character(x)));

Answer 2

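Further to Ramnath's answer, the behaviour you are experiencing is due to as.numeric(x) returning the internal, numeric representation of the factor x at the R level. If you want to preserve the numbers that are the levels of the factor (rather than their internal representation), you need to convert to character via as.character() first, as per Ramnath's example.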
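To make the difference concrete, here is a small sketch on a made-up factor (the values are invented for illustration and are not the asker's data). The level codes depend on how the levels happen to be sorted, which is why the rk column in the question looked reordered:

```r
# A factor whose labels look like numbers (hypothetical values)
rk <- factor(c("1", "2", "10", "26"))

# Levels are stored as strings, so they sort as text, not as numbers
levels(rk)

# as.numeric() on the factor returns the internal level codes
codes <- as.numeric(rk)

# Going through as.character() first recovers the printed values
values <- as.numeric(as.character(rk))

print(codes)
print(values)
```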

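Your for loop is just as reasonable as an apply call and might be slightly more readable as to what the intention of the code is. Just change this line: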

stats[,i] <- as.numeric(stats[,i])

to read

stats[,i] <- as.numeric(as.character(stats[,i]))

This is FAQ 7.10 in the R FAQ.
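For reference, a minimal sketch of the corrected loop on a made-up three-column data frame (the contents of stats below are invented stand-ins, not the asker's data):

```r
# Hypothetical stand-in for the asker's data: columns 1 and 3 are factors
stats <- data.frame(
  rk   = factor(c("1", "2", "10")),
  team = c("A", "B", "C"),
  gp   = factor(c("82", "82", "81")),
  stringsAsFactors = FALSE
)

# Convert every column except the second (team) via character
for (i in c(1, 3:ncol(stats))) {
  stats[, i] <- as.numeric(as.character(stats[, i]))
}

str(stats)
```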

HTH

Answer 3

This can be done in one line, with no need for a loop, be it a for loop or an apply. Use unlist() instead:

# testdata
Df <- data.frame(
  x = as.factor(sample(1:5,30,r=TRUE)),
  y = as.factor(sample(1:5,30,r=TRUE)),
  z = as.factor(sample(1:5,30,r=TRUE)),
  w = as.factor(sample(1:5,30,r=TRUE))
)
##

Df[,c("y","w")] <- as.numeric(as.character(unlist(Df[,c("y","w")])))

str(Df)

Edit: for your code, this becomes:

id <- c(1, 3:ncol(stats))
stats[,id] <- as.numeric(as.character(unlist(stats[,id])))

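Obviously, if you have a one-column data frame and you don't want R's automatic dimension reduction to convert it to a vector, you'll have to add the drop=FALSE argument.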
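A small sketch of the single-column case, using an invented two-column data frame. With only one column selected, Df[, id] would drop to a plain factor, so drop = FALSE keeps it a data frame for unlist():

```r
# Hypothetical test data: two factor columns holding number-like labels
Df <- data.frame(
  x = factor(c("3", "1", "2")),
  y = factor(c("30", "10", "20"))
)

id <- "y"  # a single column

# drop = FALSE keeps the selection as a one-column data frame
Df[, id] <- as.numeric(as.character(unlist(Df[, id, drop = FALSE])))

str(Df)
```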

Answer 4

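I know this question was resolved long ago, but I recently had a similar issue and think I have found a slightly more elegant and functional solution, although it requires the magrittr package.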

library(magrittr)
cols = c(1, 3, 4, 5)
df[,cols] %<>% lapply(function(x) as.numeric(as.character(x)))

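The %<>% operator pipes and reassigns, which is very useful for keeping data cleaning and transformation simple. The list apply function is now much easier to read, since you only specify the function you wish to apply.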
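If you would rather not add a dependency, the same step can be sketched in base R with an explicit assignment. The data frame below is invented for illustration:

```r
# Hypothetical data: columns 1 and 3 are factors with number-like labels
df <- data.frame(
  a = factor(c("5", "7")),
  b = c("u", "v"),
  c = factor(c("0.5", "1.5")),
  stringsAsFactors = FALSE
)

cols <- c(1, 3)

# Same effect as df[, cols] %<>% lapply(...), written without magrittr
df[cols] <- lapply(df[cols], function(x) as.numeric(as.character(x)))

str(df)
```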

Answer 5

I think that ucfagls found why your loop is not working.

If you still don't want to use a loop, here is a solution with lapply:

factorToNumeric <- function(f) as.numeric(levels(f))[as.integer(f)] 
cols <- c(1, 3:ncol(stats))
stats[cols] <- lapply(stats[cols], factorToNumeric)

Edit: I found a simpler solution. It seems that as.matrix converts to character. So

stats[cols] <- as.numeric(as.matrix(stats[cols]))

should do what you want.

Answer 6

lapply is pretty much designed for this

unfactorize<-c("colA","colB")
df[,unfactorize]<-lapply(unfactorize, function(x) as.numeric(as.character(df[,x])))

Answer 7

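I found this function on a couple of other duplicate threads and have found it an elegant and general way to solve this problem. This thread shows up first on most searches on this topic, so I am sharing it here to save folks some time. I take no credit for this, so see the original posts here and here for details.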

df <- data.frame(x = 1:10,
                 y = rep(1:2, 5),
                 k = rnorm(10, 5,2),
                 z = rep(c(2010, 2012, 2011, 2010, 1999), 2),
                 j = c(rep(c("a", "b", "c"), 3), "d"))

convert.magic <- function(obj, type){
  FUN1 <- switch(type,
                 character = as.character,
                 numeric = as.numeric,
                 factor = as.factor)
  out <- lapply(obj, FUN1)
  as.data.frame(out)
}

str(df)
str(convert.magic(df, "character"))
str(convert.magic(df, "factor"))
df[, c("x", "y")] <- convert.magic(df[, c("x", "y")], "factor")

Answer 8

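I would like to point out that if you have NAs in any column, simply using subscripts will not work. If there are NAs in the factor, you must use the apply script provided by Ramnath.

E.g.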

Df <- data.frame(
  x = c(NA,as.factor(sample(1:5,30,r=T))),
  y = c(NA,as.factor(sample(1:5,30,r=T))),
  z = c(NA,as.factor(sample(1:5,30,r=T))),
  w = c(NA,as.factor(sample(1:5,30,r=T)))
)

Df[,c(1:4)] <- as.numeric(as.character(Df[,c(1:4)]))

Returns the following:

Warning message:
NAs introduced by coercion 

    > head(Df)
       x  y  z  w
    1 NA NA NA NA
    2 NA NA NA NA
    3 NA NA NA NA
    4 NA NA NA NA
    5 NA NA NA NA
    6 NA NA NA NA

But:

Df[,c(1:4)]= apply(Df[,c(1:4)], 2, function(x) as.numeric(as.character(x)))

Returns:

> head(Df)
   x  y  z  w
1 NA NA NA NA
2  2  3  4  1
3  1  5  3  4
4  2  3  4  1
5  5  3  5  5
6  4  2  4  4
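As a complementary sketch, the per-column conversion can also be written with lapply, which keeps the result a data frame. The example assumes genuine factor columns that contain NA (the values are made up); the NA entries carry through unchanged and no coercion warning is raised:

```r
# Hypothetical data: factor columns with a missing value in the first row
Df <- data.frame(
  x = factor(c(NA, "2", "1", "5")),
  y = factor(c(NA, "3", "5", "3"))
)

# Convert each column separately; Df[] keeps the data frame structure
Df[] <- lapply(Df, function(col) as.numeric(as.character(col)))

str(Df)
```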

Answer 9

You can use the unfactor() function from the "varhandle" package on CRAN:

library("varhandle")

my_iris <- data.frame(Sepal.Length = factor(iris$Sepal.Length),
                      sample_id = factor(1:nrow(iris)))

my_iris <- unfactor(my_iris)

Answer 10

I like this code because it's really handy:

  data[] <- lapply(data, function(x) type.convert(as.character(x), as.is = TRUE)) #change all vars to their best fitting data type

It is not exactly what was asked for (converting to numeric), but in many cases it is even more appropriate.
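A quick sketch of what that line does to a small invented data frame. Each column ends up with the most specific type that fits its values, so not everything becomes numeric:

```r
# Hypothetical data: three factor columns with different kinds of labels
data <- data.frame(
  a = factor(c("1", "2", "3")),      # whole numbers
  b = factor(c("1.5", "2.5", "3")),  # decimals
  c = factor(c("x", "y", "z"))       # text
)

data[] <- lapply(data, function(x) type.convert(as.character(x), as.is = TRUE))

# Inspect the resulting column classes
print(sapply(data, class))
```

With as.is = TRUE the text column stays character instead of being turned back into a factor.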

Answer 11

Here are some dplyr options:

# by column type:
df %>% 
  mutate_if(is.factor, ~as.numeric(as.character(.)))

# by specific columns:
df %>% 
  mutate_at(vars(x, y, z), ~as.numeric(as.character(.))) 

# all columns:
df %>% 
  mutate_all(~as.numeric(as.character(.)))
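In dplyr 1.0.0 and later the scoped verbs (mutate_if, mutate_at, mutate_all) are superseded by across(). A sketch of the by-type variant in that style, assuming dplyr >= 1.0.0 is installed and using an invented data frame:

```r
library(dplyr)

# Hypothetical data: two factor columns and one character column
df <- data.frame(
  x = factor(c("10", "20", "30")),
  y = factor(c("1.5", "2.5", "3.5")),
  label = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# Convert only the factor columns, leaving everything else untouched
out <- df %>%
  mutate(across(where(is.factor), ~ as.numeric(as.character(.x))))

str(out)
```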

Answer 12

I was having problems converting all columns to numeric with an apply() call:

apply(data, 2, as.numeric)

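The problem turned out to be that some of the strings had a comma in them (e.g. "1,024.63" instead of "1024.63"), and R does not like this way of formatting numbers. So I removed them and then ran as.numeric():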

data = as.data.frame(apply(data, 2, function(x) {
  y = str_replace_all(x, ",", "") #remove commas
  return(as.numeric(y)) #then convert
}))

Note that this requires the stringr package to be loaded.
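If loading stringr is not an option, the same cleanup can be sketched with base R's gsub(). The input strings here are made up:

```r
# Hypothetical input: numbers formatted with thousands separators
x <- c("1,024.63", "12", "3,000,000")

# fixed = TRUE treats "," as a literal character rather than a regex
cleaned <- gsub(",", "", x, fixed = TRUE)
nums <- as.numeric(cleaned)

print(nums)
```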

Answer 13

This worked for me. The apply() function tries to coerce the df to a matrix, and it returns NAs.

numeric.df <- as.data.frame(sapply(df, as.numeric))

Answer 14

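Building on @SDahm's answer, this was the "best" solution for my tibble:

data %<>% lapply(type.convert) %>% as.data.table()

This requires dplyr and magrittr; as.data.table() comes from the data.table package.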

Answer 15

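I tried a bunch of these on a similar problem and kept getting NAs. Base R has some really irritating coercion behaviors, which are generally fixed in Tidyverse packages. I used to avoid them because I didn't want to create dependencies, but they make life so much easier that now I don't even bother trying to figure out the base R solution most of the time.

Here is the Tidyverse solution, which is very simple and elegant: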

library(purrr)

mydf <- data.frame(
  x1 = factor(c(3, 5, 4, 2, 1)),
  x2 = factor(c("A", "C", "B", "D", "E")),
  x3 = c(10, 8, 6, 4, 2))

map_df(mydf, as.numeric)
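One caveat worth checking on your own data: as.numeric applied directly to a factor returns its level codes, so a mapping like the one above only gives the printed values when codes and labels happen to coincide. A hedged base R sketch with a hypothetical helper, to_numeric (not part of purrr or any package), that goes through as.character() for factor columns:

```r
# Hypothetical data: x1 is a factor whose labels differ from its level codes
mydf <- data.frame(
  x1 = factor(c(30, 50, 40)),
  x3 = c(10, 8, 6)
)

# Hypothetical helper: convert factors via their labels, other columns directly
to_numeric <- function(col) {
  if (is.factor(col)) as.numeric(as.character(col)) else as.numeric(col)
}

codes <- as.numeric(mydf$x1)          # level codes, not the labels
mydf[] <- lapply(mydf, to_numeric)    # labels preserved

print(codes)
str(mydf)
```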

Answer 16

df$colname <- as.numeric(df$colname)

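I tried this way for changing one column type, and I think it is better than many other versions if you are not going to change all column types.

df$colname <- as.character(df$colname)

And vice versa.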

Change the class from factor to numeric of many columns in a data frame
