Imputing outliers once they have been identified

Time: 2018-11-21 22:38:18

Tags: r dataframe outliers tbl

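I found this post online (http://www.questionflow.org/2017/12/26/combined-outlier-detection-with-dplyr-and-ruler/) and used it to write a function that flags any outliers in a tbl, based on whether a value falls outside a z-score threshold or more than 1.5 * IQR beyond the quartiles. I would like to enhance the function by adding an "impute" argument that can be set to mean, median, or mode (or none, in which case the output is just a logical tbl), so that each flagged value is replaced by the chosen statistic of its column. How can I do this? My code so far: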

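# Detect outliers with one of two metrics: more than 3 standard deviations
# from the mean (z-score), or more than 1.5 * IQR beyond the quartiles
library(dplyr)  # provides %>% and transmute_if()

# Define the functions that detect outliers (TRUE means "not an outlier")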
isnt_out_z <- function(x, thres = 3, na.rm = TRUE) {
  abs(x - mean(x, na.rm = na.rm)) <= thres * sd(x, na.rm = na.rm)
}

isnt_out_IQR <- function(x, k = 1.5, na.rm = TRUE) {
  quar <- quantile(x, probs = c(0.25, 0.75), na.rm = na.rm)
  iqr <- diff(quar)
  (quar[1] - k * iqr <= x) & (x <= quar[2] + k * iqr)
}

# Column-based non-outlier rows: a row is not an outlier with respect to a column
# if its value in that column is not an outlier (computed from that column alone)

## Usable functions
# impute is not used yet; the default is quoted so it is a string, not an undefined symbol
find_Outliers <- function(data, method = 'z', impute = 'none') {
  if (method == 'z') {
    data %>%
      transmute_if(is.numeric, isnt_out_z)
  } else {
    data %>%
      transmute_if(is.numeric, isnt_out_IQR)
  }
}
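
One possible direction for the `impute` argument, as a rough sketch rather than a definitive answer. It uses base R only so that it runs without extra packages; the per-column helper could presumably also be passed to `mutate_if(is.numeric, ...)`. The names `stat_mode`, `impute_column`, and `find_outliers_imputed` are hypothetical. The sketch also assumes that the replacement statistic should be computed from the non-flagged values only, which the question does not specify.

```r
# Hypothetical helper: most frequent value of x (ties go to the first one seen).
stat_mode <- function(x, na.rm = TRUE) {
  if (na.rm) x <- x[!is.na(x)]
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Detection rules copied from the question so the sketch runs on its own.
isnt_out_z <- function(x, thres = 3, na.rm = TRUE) {
  abs(x - mean(x, na.rm = na.rm)) <= thres * sd(x, na.rm = na.rm)
}

isnt_out_IQR <- function(x, k = 1.5, na.rm = TRUE) {
  quar <- quantile(x, probs = c(0.25, 0.75), na.rm = na.rm)
  iqr <- diff(quar)
  (quar[1] - k * iqr <= x) & (x <= quar[2] + k * iqr)
}

# Flag (impute = "none") or impute the outliers of one numeric vector.
impute_column <- function(x, detect, impute) {
  keep <- detect(x)
  if (impute == "none") return(keep)      # logical flags only
  is_out <- !is.na(keep) & !keep          # missing values are never treated as outliers
  # Assumption: the statistic is computed from the non-outlying values only.
  center <- switch(impute,
    mean   = mean(x[!is_out], na.rm = TRUE),
    median = median(x[!is_out], na.rm = TRUE),
    mode   = stat_mode(x[!is_out])
  )
  x[is_out] <- center
  x
}

find_outliers_imputed <- function(data, method = c("z", "IQR"),
                                  impute = c("none", "mean", "median", "mode")) {
  method <- match.arg(method)
  impute <- match.arg(impute)
  detect <- if (method == "z") isnt_out_z else isnt_out_IQR
  num_cols <- vapply(data, is.numeric, logical(1))
  if (impute == "none") {
    # Mirror transmute_if(): keep only the numeric columns, as logical flags.
    data <- data[num_cols]
    num_cols <- rep(TRUE, ncol(data))
  }
  data[num_cols] <- lapply(data[num_cols], impute_column,
                           detect = detect, impute = impute)
  data
}

# Usage on made-up data: 100 lies outside the 1.5 * IQR fences of column v.
df <- data.frame(id = letters[1:11], v = c(1:10, 100), stringsAsFactors = FALSE)
find_outliers_imputed(df, method = "IQR")                     # logical flags for v
find_outliers_imputed(df, method = "IQR", impute = "median")  # v[11] becomes 5.5
```

With `impute = "none"` the result contains only the numeric columns as logical flags, as in the original function; with any other setting the non-numeric columns are kept and only the flagged values change. Note that with the default z-score threshold of 3, a single extreme value in a short column may never be flagged, because the extreme value inflates the standard deviation itself.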

0 answers:

No answers yet.