Replace numeric values with string values

Time: 2019-02-06 17:23:13

Tags: r dataframe dataset

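In my data table every cell is numeric, and I want to replace each number with a string as follows:

Numbers in [0,2]: replace with the string "Bad"

Numbers in [3,4]: replace with the string "Good"

Numbers > 4: replace with the string "Excellent"

Here is a sample of my original table, named "data.active" (image not available).

My attempt at this was: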

x <- c("churches","resorts","beaches","parks","Theatres",.....)
for(i in x){
  data.active$i <- as.character(data.active$i)
  data.active$i[data.active$i <= 2] <- "Bad"
  data.active$i[data.active$i >2 && data.active$i <=4] <- "Good"
  data.active$i[data.active$i >4] <- "Excellent"
}

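But it doesn't work. Is there another way to do this?

Edit

Here is a link to my dataset, GoogleReviews_Dataset, and this is how I obtained the table shown in the picture above: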

library(FactoMineR)
library(factoextra)
data<-read.csv2(file.choose())
data.active <- data[1:10, 4:8]

2 answers:

Answer 0: (score: 1)

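x <- c('x', 'y', 'z')  # columns to recode
# cut() bins each column into (-Inf,2], (2,4], (4,Inf] and returns a factor
df[, x] <- lapply(df[, x], function(col)
                  cut(col, breaks = c(-Inf, 2, 4, Inf), labels = c('Bad', 'Good', 'Excellent')))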
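As a minimal sketch of what cut() does with these breaks (the vector `ratings` is made up for illustration): with the default `right = TRUE`, each interval is closed on the right, so the breaks `c(-Inf, 2, 4, Inf)` give (-Inf, 2], (2, 4] and (4, Inf]. The result is a factor, so wrap it in `as.character()` if plain strings are needed.

```r
# Hypothetical ratings, chosen to hit the interval boundaries
ratings <- c(0, 2, 2.5, 3, 4, 4.1, 5)

# Default right = TRUE: intervals are (-Inf, 2], (2, 4], (4, Inf]
labels <- cut(ratings,
              breaks = c(-Inf, 2, 4, Inf),
              labels = c("Bad", "Good", "Excellent"))

# cut() returns a factor; convert if character strings are required
labels_chr <- as.character(labels)
print(labels_chr)
```

Note that a value of exactly 2 lands in "Bad" and exactly 4 in "Good", which matches the ranges in the question.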

Data

df<-structure(list(x = 1:5, y = c(1L, 2L, 2L, 2L, 3L), z = c(1L,3L, 3L, 3L, 2L), 
a = c(1L, 5L, 6L, 4L, 8L),b = c(1L, 3L, 4L, 7L, 1L)), 
class = "data.frame", row.names = c(NA, -5L))
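For comparison, here is a sketch of how the loop from the question could be repaired in base R. It uses the same sample data as above rather than the real data.active, and the vector `cols` is chosen only for illustration. Three things go wrong in the original loop: `data.active$i` looks up a column literally named `i` (use `[[i]]` instead), `&&` is not vectorized (use `&`), and converting to character first means the later comparisons operate on strings. Computing the labels from the numeric values before assigning avoids all three.

```r
# Same sample data as in this answer
df <- data.frame(x = 1:5,
                 y = c(1L, 2L, 2L, 2L, 3L),
                 z = c(1L, 3L, 3L, 3L, 2L),
                 a = c(1L, 5L, 6L, 4L, 8L),
                 b = c(1L, 3L, 4L, 7L, 1L))

cols <- c("x", "y", "z")  # illustrative choice of columns

for (i in cols) {
  v <- df[[i]]                          # numeric values, read once
  out <- rep(NA_character_, length(v))  # result vector of strings
  out[v <= 2] <- "Bad"
  out[v > 2 & v <= 4] <- "Good"         # vectorized &, not &&
  out[v > 4] <- "Excellent"
  df[[i]] <- out                        # [[i]] uses the value of i
}

print(df)
```

Columns a and b are left untouched because they are not listed in `cols`.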

Answer 1: (score: 1)

You can use mutate_all from the tidyverse to recode the ranges:

library(tidyverse)

df<-structure(
  list(
    x = 1:5, 
    y = c(1L, 2L, 2L, 2L, 3L), 
    z = c(1L,3L, 3L, 3L, 2L),
    a = c(1L, 5L, 6L, 4L, 8L),
    b = c(1L, 3L, 4L, 7L, 1L)
  ), 
  class = "data.frame", 
  row.names = c(NA, -5L)
)

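df %>% mutate_all(
  # Lambda form; funs() is deprecated since dplyr 0.8.0
  ~ case_when(
    . <= 2             ~ 'Bad',
    (. > 2) & (. <= 4) ~ 'Good',       # lower bound is 2 so that 3 maps to 'Good'
    (. > 4)            ~ 'Excellent',
    TRUE               ~ as.character(.)
  )
)

The . above stands for the column being evaluated. The result is

          x    y    z         a         b
1       Bad  Bad  Bad       Bad       Bad
2       Bad  Bad Good Excellent      Good
3      Good  Bad Good Excellent      Good
4      Good  Bad Good      Good Excellent
5 Excellent Good  Bad Excellent       Bad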

To change only selected columns, use mutate_at:

df %>% mutate_at(
  vars(
    c('a', 'x', 'b')
  ),
  # Lambda form; funs() is deprecated since dplyr 0.8.0
  ~ case_when(
    . <= 2             ~ 'Bad',
    (. > 2) & (. <= 4) ~ 'Good',       # lower bound is 2 so that 3 maps to 'Good'
    (. > 4)            ~ 'Excellent',
    TRUE               ~ as.character(.)
  )
)

This yields

          x y z         a         b
1       Bad 1 1       Bad       Bad
2       Bad 2 3 Excellent      Good
3      Good 2 3 Excellent      Good
4      Good 2 3      Good Excellent
5 Excellent 3 2 Excellent       Bad
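In dplyr 1.0.0 and later, the scoped verbs mutate_all() and mutate_at() are superseded by across(). A minimal sketch of the same selected-column recoding in that style, assuming dplyr >= 1.0.0 is installed. The name `recoded` is introduced here only for illustration, and the lower bound for 'Good' is 2 so that a value of 3 is labelled as the question asks.

```r
library(dplyr)

# Same sample data as in this answer
df <- data.frame(x = 1:5,
                 y = c(1L, 2L, 2L, 2L, 3L),
                 z = c(1L, 3L, 3L, 3L, 2L),
                 a = c(1L, 5L, 6L, 4L, 8L),
                 b = c(1L, 3L, 4L, 7L, 1L))

recoded <- df %>%
  mutate(across(
    c(a, x, b),                 # only these columns are recoded
    ~ case_when(
      .x <= 2 ~ "Bad",
      .x <= 4 ~ "Good",         # reached only when .x > 2
      .x > 4  ~ "Excellent"
    )
  ))

print(recoded)
```

Because case_when() stops at the first matching condition, the second branch does not need an explicit `.x > 2` test.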

Here is how you can download and classify the source data directly with the tidyverse:

df <- read_csv(
  'https://archive.ics.uci.edu/ml/machine-learning-databases/00485/google_review_ratings.csv'
) %>% select_at(
  vars(
    # Pick only Category 1-Category 10; this also drops the blank
    # last column (faulty CSV), whatever name readr assigns to it
    paste('Category', 1:10)
  )
) %>% mutate_all(
  # Lambda form; funs() is deprecated since dplyr 0.8.0
  ~ case_when(
    . <= 2             ~ 'Bad',
    (. > 2) & (. <= 4) ~ 'Good',       # lower bound is 2 so ratings in (2, 3] are not skipped
    (. > 4)            ~ 'Excellent',
    TRUE               ~ as.character(.)
  )
)