Cutting data into bins by partition in R

Date: 2018-09-10 11:25:31

Tags: r

I'm using cut2 from the Hmisc library in R to cut my dataset into a fixed number of bins, e.g.

library(Hmisc)
as.numeric(cut2(Catchment_Population_Log, g=4))

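However, is there a simple way to add a partition level, so that each category gets its own n cuts? That is, I basically want to apply cut2 (or something similar) to each category separately (when I do something like this in SQL, I use PARTITION BY).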

So in my head, it would look something like this:

as.numeric(cut2(Catchment_Population_Log, g=4, partition_by=CategoryID))

But I can't see anything in the cut2 documentation that allows this. I've tried using split(), but haven't got anything working yet.
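For reference, the split() route can be made to work in base R: split the column by category, bin each piece on its own, then unsplit() the results back into the original row order. This is a minimal sketch; `bin_by_quantile()` is a hypothetical helper that uses base `quantile()` + `cut()` to approximate `cut2(x, g = g)`, and cut2's own cut-point rules may give different bins when there are many ties.

```r
# Sample data from the question
category <- rep(c("Category_1", "Category_2", "Category_3"), each = 4)
catchment_population_log <- c(0.3, 0.2, 0.1, 0.4, 0.4, 0.2, 0.6, 0.9, 0.2, 0.6, 0.2, 0.4)
data <- data.frame(category, catchment_population_log)

# Hypothetical helper: equal-count (quantile) bins for ONE group, using base cut().
# Approximates cut2(x, g = g); results may differ from cut2 on heavily tied data.
bin_by_quantile <- function(x, g) {
  breaks <- unique(quantile(x, probs = seq(0, 1, length.out = g + 1), na.rm = TRUE))
  # If every value in the group is identical there is only one break, so use one bin
  if (length(breaks) < 2) return(rep(1, length(x)))
  as.numeric(cut(x, breaks = breaks, include.lowest = TRUE))
}

# split() the column by category, bin each piece independently,
# then unsplit() puts the per-category results back in the original row order
pieces <- split(data$catchment_population_log, data$category)
binned <- lapply(pieces, bin_by_quantile, g = 2)
data$population_bin <- unsplit(binned, data$category)

print(data)
```

On the sample data this reproduces `exp_result`; for example, 0.4 lands in bin 2 for Category_1 and Category_3 but in bin 1 for Category_2.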

Sample data, including the output I'm hoping to achieve:

library(Hmisc)
library(dplyr)
category <- c('Category_1','Category_1','Category_1','Category_1','Category_2','Category_2','Category_2','Category_2','Category_3','Category_3','Category_3','Category_3')
catchment_population_log <- c(0.3,0.2,0.1,0.4,0.4,0.2,0.6,0.9,0.2,0.6,0.2,0.4)
exp_result <- c(2,1,1,2,1,1,2,2,1,2,1,2)
data <- data.frame(category, catchment_population_log)

# Result just using cut2 - data is cut into 2 bins
# based on their catchment_population_log value
data %>%
  mutate(just_using_cut2 = as.numeric(cut2(catchment_population_log,g=2)))

# This time, I'll manually attach the expected result. Each category
# should be split into 2 bins based on its own catchment_population_log
# values, independently of the other categories.
# As a result, a 0.4 value might fall in bin 1 for one category
# but in bin 2 for another.

data %>%
  mutate(just_using_cut2 = as.numeric(cut2(catchment_population_log,g=2))) %>%
  cbind(exp_result)

1 Answer

Answer (score: 1)

Thanks to Moody_Mudskipper, I was able to get this working exactly as I needed.

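# The same group_by() pattern works with base cut() as well as cut2, but I'm using cut2.
# Note: cut(x, 2) makes two equal-width bins, whereas cut2(x, g = 2) makes
# two equal-count (quantile) groups.
library(Hmisc)
library(dplyr)
data %>%
  group_by(category) %>%
  mutate(population_bin = as.numeric(cut2(catchment_population_log, g = 2)))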
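If the SQL analogy is meant literally, dplyr's `ntile()` inside the same `group_by()` is roughly the counterpart of `NTILE(n) OVER (PARTITION BY ... ORDER BY ...)`: it ranks values within each group and splits the ranks into n buckets of (nearly) equal size. This is a sketch of that alternative, not the accepted answer's method. Unlike cut2, `ntile()` breaks ties by position, so equal values can end up in different buckets.

```r
library(dplyr)

# Sample data from the question
category <- rep(c("Category_1", "Category_2", "Category_3"), each = 4)
catchment_population_log <- c(0.3, 0.2, 0.1, 0.4, 0.4, 0.2, 0.6, 0.9, 0.2, 0.6, 0.2, 0.4)
data <- data.frame(category, catchment_population_log)

# Rank-based bucketing per category, similar in spirit to SQL's
# NTILE(2) OVER (PARTITION BY category ORDER BY catchment_population_log)
result <- data %>%
  group_by(category) %>%
  mutate(population_bin = ntile(catchment_population_log, 2)) %>%
  ungroup()

print(result)
```

On this sample the buckets match `exp_result`; with heavily tied data the two approaches can disagree, so cut2 (or a quantile-based cut) is the closer fit when equal values must share a bin.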