Function to remove outliers by group from a data frame

Time: 2018-12-11 11:00:06

Tags: r dplyr outliers

I am trying to remove outliers from a data frame containing x and y variables, grouped by the variable cond.

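I wrote a function that removes outliers based on boxplot statistics and returns the df without the outliers. The function works well when applied to the raw data. However, when I apply it to grouped data, it does not work and I get an error: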

Error in mutate_impl(.data, dots) : 
  Evaluation error: argument "df" is missing, with no default.

How can I correct my function so that it takes the vectors df$x and df$y as arguments and correctly removes outliers by group?



My dummy data:

set.seed(955)
# Make some noisily increasing data
dat <- data.frame(cond = rep(c("A", "B"), each = 22),
                  xvar = c(1:10+rnorm(20,sd=3), 40, 10, 11:20+rnorm(20,sd=3), 85, 115),
                  yvar = c(1:10+rnorm(20,sd=3), 200, 60, 11:20+rnorm(20,sd=3), 35, 200))


removeOutliers<-function(df, ...) {

  # first, identify the outliers and store them in a vector
  outliers.x<-boxplot.stats(df$x)$out
  outliers.y<-boxplot.stats(df$y)$out

  # remove the outliers from the original data
  df<-df[-which(df$x %in% outliers.x),]
  df[-which(df$y %in% outliers.y),]
}

# Remove outliers (check that the function works on the ungrouped data)
removeOutliers(dat)

# Apply the function to group
# Not working!!!

dat_noOutliers<- dat %>%
  group_by(cond) %>%
  mutate(removeOutliers)

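I found this function, which removes outliers from a vector. However, I would like to remove outliers from the df$x and df$y vectors in a data frame.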

remove_outliers <- function(x, na.rm = TRUE, ...) {
  qnt <- quantile(x, probs=c(.25, .75), na.rm = na.rm, ...)
  H <- 1.5 * IQR(x, na.rm = na.rm)
  y <- x
  y[x < (qnt[1] - H)] <- NA
  y[x > (qnt[2] + H)] <- NA
  y
}

remove outliers by group in R

2 Answers:

Answer 0 (score: 5)

Since you are applying this function to the entire df, you should use mutate_all instead. Do:

dat_noOutliers<- dat %>%
  group_by(cond) %>%
  mutate_all(remove_outliers)
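
Note that remove_outliers replaces outlying values with NA rather than dropping rows, so a further step is needed if the rows themselves should go. Below is a minimal sketch of that idea, assuming dplyr 1.0.0 or later, where across() is the current replacement for mutate_all. The toy data frame and the name toy_noOutliers are made up for illustration; remove_outliers is the vector function quoted in the question.

```r
library(dplyr)

# Vector helper from the question: values beyond the 1.5 * IQR fences become NA
remove_outliers <- function(x, na.rm = TRUE, ...) {
  qnt <- quantile(x, probs = c(.25, .75), na.rm = na.rm, ...)
  H <- 1.5 * IQR(x, na.rm = na.rm)
  y <- x
  y[x < (qnt[1] - H)] <- NA
  y[x > (qnt[2] + H)] <- NA
  y
}

# Hypothetical deterministic data: one obvious outlier per group
toy <- data.frame(
  cond = rep(c("A", "B"), each = 6),
  xvar = c(1, 2, 3, 4, 5, 100, 10, 11, 12, 13, 14, 15),
  yvar = c(1, 2, 3, 4, 5, 6, 20, 21, 22, 23, 24, -50)
)

toy_noOutliers <- toy %>%
  group_by(cond) %>%
  # apply the helper to each column separately within each group
  mutate(across(c(xvar, yvar), remove_outliers)) %>%
  ungroup() %>%
  # drop the rows in which either variable was flagged
  filter(!is.na(xvar), !is.na(yvar))

print(toy_noOutliers)
```

The grouping column cond is left untouched because the function is only applied to the columns named inside across().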

Answer 1 (score: 2)

You can just filter the data:

# Try to use a bit more memory (works only in 64-bit Java)
#options(java.parameters = "-Xmx8000m")

library(tidyverse)

set.seed(955)
dat <- data.frame(cond = rep(c("A", "B"), each = 22),
                  xvar = c(1:10+rnorm(20,sd=3), 40, 10, 11:20+rnorm(20,sd=3), 85, 115),
                  yvar = c(1:10+rnorm(20,sd=3), 200, 60, 11:20+rnorm(20,sd=3), 35, 200))

dat %>%
  ggplot(aes(x = xvar, y = yvar)) + 
  geom_point() + 
  geom_smooth(method = lm) +
  ggthemes::theme_hc()

Created on 2018-12-11 by the reprex package (v0.2.1)
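
The code above only plots the data, so here is one way the filtering step itself might look. It is a sketch, not necessarily what this answer intended: the helper is_not_outlier and the toy data are made up for illustration, and it reuses the boxplot.stats() rule from the question. A logical mask also avoids the -which() idiom, which returns zero rows when no outliers are found.

```r
library(dplyr)

# Hypothetical helper: TRUE for values that boxplot.stats() does not flag
is_not_outlier <- function(x) {
  !(x %in% boxplot.stats(x)$out)
}

# Hypothetical deterministic data: one obvious outlier per group
toy <- data.frame(
  cond = rep(c("A", "B"), each = 6),
  xvar = c(1, 2, 3, 4, 5, 100, 10, 11, 12, 13, 14, 15),
  yvar = c(1, 2, 3, 4, 5, 6, 20, 21, 22, 23, 24, -50)
)

toy_filtered <- toy %>%
  group_by(cond) %>%
  # the outlier statistics are computed per group, on the vectors xvar and yvar
  filter(is_not_outlier(xvar), is_not_outlier(yvar)) %>%
  ungroup()

print(toy_filtered)
```

Inside a grouped filter() the helper receives each group's column as a plain vector, which is the form the question asks for, and both conditions are evaluated on the same original rows.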