Using a function to convert character values to numeric in R

Time: 2015-11-12 13:27:52

Tags: r

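I want to load and process a CSV file with seven variables: one is a grouping variable/factor (data$hashtag), and the other six are categories (data$support and others) coded with an "X" or "x" (or left blank).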

data <- read.csv("maet_coded_tweets.csv", stringsAsFactors = F)

names(data) <- c("hashtag", "support", "contributeConversation", "otherCommunities", "buildCommunity", "engageConversation", "unclear")

str(data)

'data.frame':   854 obs. of  7 variables:
 $ hashtag               : chr  "#capstoneisfun" "#capstoneisfun" "#capstoneisfun" "#capstoneisfun" ...
 $ support               : chr  "x" "x" "x" "x" ...
 $ contributeConversation: chr  "" "" "" "" ...
 $ otherCommunities      : chr  "" "" "" "" ...
 $ buildCommunity        : chr  "" "" "" "" ...
 $ engageConversation    : chr  "" "" "" "" ...
 $ unclear               : chr  "" "" "" "" ...

When I use a function to recode "X" or "x" to 1 and "" (blank) to 0, the columns come out as character, not numeric as I expected.

recode <- function(x) {

  x[x=="x"] <- 1
  x[x=="X"] <- 1
  x[x==""] <- 0
  x
}

data[] <- lapply(data, recode)

str(data)

'data.frame':   854 obs. of  7 variables:
 $ hashtag               : chr  "#capstoneisfun" "#capstoneisfun" "#capstoneisfun" "#capstoneisfun" ...
 $ support               : chr  "1" "1" "1" "1" ...
 $ contributeConversation: chr  "0" "0" "0" "0" ...
 $ otherCommunities      : chr  "0" "0" "0" "0" ...
 $ buildCommunity        : chr  "0" "0" "0" "0" ...
 $ engageConversation    : chr  "0" "0" "0" "0" ...
 $ unclear               : chr  "0" "0" "0" "0" ...

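When I try to coerce the characters with as.numeric() inside the function, it still does not work. What gives: why are the variables treated as character, and how can I make them numeric?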

3 Answers:

Answer 0: (score: 2)

How about:

recode <- function(x) {
  ifelse(x %in% c('X','x'), 1,0)
}

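Explanation: the steps in your function are evaluated sequentially, not simultaneously. So when you assign 1's to a character vector, they are converted to "1"s, because a vector can hold only one type.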
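A minimal sketch of both points, assuming a small made-up data frame (`toy` and its values are hypothetical stand-ins for the real CSV). The first part shows the coercion described above; the second applies the `ifelse` version of `recode` to every column except the first, since running it over the whole data frame would also turn `hashtag` into 0s.

```r
# Assigning a number into a character vector coerces the number, not the vector
x <- c("x", "X", "")
x[x == "x"] <- 1
class(x)  # still "character"

# Hypothetical stand-in for the coded tweets file
toy <- data.frame(
  hashtag = c("#a", "#a", "#b"),
  support = c("x", "", "X"),
  unclear = c("", "x", ""),
  stringsAsFactors = FALSE
)

recode <- function(x) {
  ifelse(x %in% c("X", "x"), 1, 0)
}

# Drop the first column from the assignment so hashtag stays character
toy[-1] <- lapply(toy[-1], recode)
str(toy)
```

On the real data the same pattern would presumably be `data[-1] <- lapply(data[-1], recode)`, given that `hashtag` is the first column.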

Answer 1: (score: 1)

What about this?

# sample data with support being a character vector
myDat <- data.frame(support = c("X", "X", "0", "x", "0"), a = 1:5, stringsAsFactors = FALSE)
# convert to a factor and check the order of the levels
myDat$support <- as.factor(myDat$support)
levels(myDat$support)
#"0" "x" "X"
# just to see that it worked make an additional variable
myDat$supportrecoded <- myDat$support
# change levels and convert
levels(myDat$supportrecoded) <- c("0","1","1")
myDat$supportrecoded <- as.integer(as.character(myDat$supportrecoded ))
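The sample above uses "0" for the unmarked rows, while the data in the question has empty strings, and the sorted order of "x" and "X" among the levels can differ between locales. A hedged variant of the same factor idea that does not rely on level positions might look like this (the `support` vector here is made up for illustration):

```r
# Hypothetical column coded the way the question describes: "x", "X" or blank
support <- c("x", "X", "", "x", "")

f <- factor(support)
lv <- levels(f)

# Relabel by value rather than by position; duplicate labels merge the levels
levels(f) <- ifelse(lv %in% c("x", "X"), "1", "0")

# Go through character first, otherwise as.integer() returns the level codes
recoded <- as.integer(as.character(f))
recoded
```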

Answer 2: (score: 1)

Using mapvalues from plyr

library(plyr)  # mapvalues() comes from the plyr package
data$support <- as.numeric(mapvalues(data$support, c("X", "x", ""), c(1, 1, 0)))

Using replace

data$support <- replace(data$support, data$support == "X", 1)
data$support <- replace(data$support, data$support == "x", 1)
data$support <- replace(data$support, data$support == "", 0)
# replace() leaves the vector as character, so convert it at the end
data$support <- as.numeric(data$support)