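Data validation doesn't seem to work

Time: 2015-06-08 11:25:09

Tags: r

I have a dataset in which column 7 consists of state abbreviations.

When I extract the relevant column of the dataset like this: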

outcome <- read.csv("outcome-of-care-measures.csv", colClasses = "character")
x <- outcome[, 7]
y <- unique(x)

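I get the unique values of column 7.

I now want to do validation inside a function that checks whether the value I pass in is in that list.

So I created this function: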

name_in_list <- function(state) {
  data <- read.csv("outcome-of-care-measures.csv", colClasses = "character")

  if ((state %in% data$state) == FALSE) {
    stop("invalid outcome")
  }
  print("succes!")
}

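which I thought should check whether the value of state is in there. But when I enter:

name_in_list("AL")

I get:

Error in name_in_list("AL") : invalid outcome

This is strange, because I think the check should evaluate to TRUE (and print "succes!"), since AL is in the dataset. Printing the unique values gives me: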

"AL" "AK" "AZ" "AR" "CA" "CO" "CT" "DE" "DC" "FL" "GA" "HI" "ID" "IL" "IN" "IA" "KS" "KY" "LA" "ME" "MD" "MA" "MI" "MN" "MS" "MO" "MT" "NE" "NV" "NH" "NJ" "NM" "NY" "NC" "ND"
[36] "OH" "OK" "OR" "PA" "PR" "RI" "SC" "SD" "TN" "TX" "UT" "VT" "VI" "VA" "WA" "WV" "WI" "WY" "GU"

Any clue as to what is going wrong?

1 Answer:

Answer 0: (score: 0)

# approved: vector of allowed values; dfc: the data frame column to check
test_names <- function(approved, dfc) all(dfc %in% approved)
test_names(state, data$state)
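One possible cause of the original error, assuming the column in the CSV is actually named `State` rather than `state`: `$` is case-sensitive and silently returns NULL for a name that does not exist, so `state %in% data$state` is always FALSE. The sketch below uses a small made-up data frame in place of the CSV. The column name `State`, the `column` argument and the extra `data` parameter are assumptions for illustration, not taken from the question.

```r
# Toy stand-in for the CSV; the column name `State` is an assumption.
data <- data.frame(State = c("AL", "AK", "AZ"), stringsAsFactors = FALSE)

# `$` is case-sensitive: a name that does not exist yields NULL, not an error.
is.null(data$state)     # TRUE
"AL" %in% data$state    # FALSE, because nothing is in NULL
"AL" %in% data$State    # TRUE

# Validate one value against a column, failing loudly if the column is missing.
name_in_list <- function(state, data, column = "State") {
  if (!column %in% names(data)) {
    stop("column not found: ", column)
  }
  if (!state %in% data[[column]]) {
    stop("invalid state")
  }
  invisible(TRUE)
}

name_in_list("AL", data)
```

Running `names(data)` on the real dataset shows the exact spelling of the column, and indexing by position, as in `outcome[, 7]`, avoids depending on the name at all.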

But if you want a more up-to-date solution, this one uses pipes and a package designed for this purpose (assertr):

library(assertr)
library(dplyr)
# install with install.packages(c('assertr', 'dplyr')) 

# List your approved states in a vector
# or just read the unique values into one
approved_species <- c("setosa", "virginica", "versicolor")
iris %>% 
  assert(in_set(approved_species), Species) %>% 
  # now perform some operation
  count(Species)

But if we shorten the approved list:

approved_species <- c("setosa", "virginica")
iris %>% 
  assert(in_set(approved_species), Species) %>% 
  count(Species)

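the operation fails because the data validation step fails. Does this help? Obviously, you would swap Species for the state column of your dataset.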
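A rough sketch of that swap, using a made-up `hospitals` data frame rather than the real CSV. The column name `State` and the object names are assumptions. The approved list is built from R's built-in `state.abb` constant plus the four extra codes that appear in the question's output.

```r
library(assertr)
library(dplyr)

# Made-up stand-in for the real dataset; `State` is an assumed column name.
hospitals <- data.frame(State = c("AL", "AK", "AL", "DC"),
                        stringsAsFactors = FALSE)

# The 50 state abbreviations shipped with R, plus DC and three territories.
approved_states <- c(state.abb, "DC", "PR", "VI", "GU")

# The pipeline stops with an error if any State value is not approved.
state_counts <- hospitals %>%
  assert(in_set(approved_states), State) %>%
  count(State)

state_counts
```

Deriving the approved list from `unique()` of the same column would make the check pass trivially, so a list that comes from outside the data is the more useful reference.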