Replace NAs in one factor with values from another factor

Time: 2016-04-08 20:36:19

Tags: r replace na factors

I am missing something very basic here.

I have this code, but it does not work:

d <- data.frame(
  g0 = c("A", "B", NA, NA, "C", "C"),
  g1 = LETTERS[1:6])
d
    g0 g1
1    A  A
2    B  B
3 <NA>  C
4 <NA>  D
5    C  E
6    C  F

Desired result: the NAs in g0 replaced by the corresponding values of g1. This is what I tried:

d$g0[is.na(d$g0)] <- d$g1[is.na(d$g0)]

1 answer:

Answer 0 (score: 4)

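It always helps to remember the original design idea behind factors. They are meant for categorical variables that take a fixed set of values. So imagine I change your example slightly: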

d <- data.frame(color  = c("red", "blue", NA, NA, "green", "green"),
                amount  = c("high","low","low","mid","mid","high"))

> d
  color amount
1   red   high
2  blue    low
3  <NA>    low
4  <NA>    mid
5 green    mid
6 green   high
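
To make "a fixed set of values" concrete, here is a minimal sketch that inspects what each factor is allowed to hold. It assumes the columns really are factors, so `stringsAsFactors = TRUE` is spelled out (an addition here: since R 4.0.0, `data.frame()` no longer converts strings to factors by default). The helper variable names are only for illustration.

```r
# Rebuild the example; stringsAsFactors = TRUE is added so the columns are
# factors on R >= 4.0.0 as well (older versions did this by default).
d <- data.frame(color  = c("red", "blue", NA, NA, "green", "green"),
                amount = c("high", "low", "low", "mid", "mid", "high"),
                stringsAsFactors = TRUE)

# The only values each factor may hold are its levels.
color_levels  <- levels(d$color)
amount_levels <- levels(d$amount)

# Internally, each element is an integer index into those levels.
color_codes <- as.integer(d$color)

# Values that an assignment from amount into the NA slots of color would
# need, but that are not among color's levels.
missing_levels <- setdiff(as.character(d$amount[is.na(d$color)]), color_levels)
print(missing_levels)
```

Anything listed in `missing_levels` is a value that `color` cannot represent until its levels are extended.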

Now it makes perfect sense that R complains when we run the following:

> d$color[is.na(d$color)] <- d$amount[is.na(d$color)]
Warning message:
In `[<-.factor`(`*tmp*`, is.na(d$color), value = c(3L, 1L, NA, NA,  :
  invalid factor level, NA generated

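Because why would we want a color of "high" or "mid"? That makes no sense. The mental model here is that the two factors really have nothing to do with each other, or, if they do represent the same thing, their levels should be the same. So,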

levels(d$color) <- c(levels(d$color),"low","mid")
d$color[is.na(d$color)] <- d$amount[is.na(d$color)]

This runs without any problems:

> d
  color amount
1   red   high
2  blue    low
3   low    low
4   mid    mid
5 green    mid
6 green   high

even though the result is semantically nonsensical.
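
Applied to the data from the question, the same idea can be sketched without typing the extra levels by hand. This is one possible variant, not the only way: it assumes g0 and g1 are factors (hence the explicit `stringsAsFactors = TRUE`, needed on R 4.0.0 and later) and uses `union()` to add whatever levels g0 lacks.

```r
# The question's data; stringsAsFactors = TRUE reproduces the factor columns
# on R >= 4.0.0, where strings are no longer converted by default.
d <- data.frame(g0 = c("A", "B", NA, NA, "C", "C"),
                g1 = LETTERS[1:6],
                stringsAsFactors = TRUE)

# Extend g0's levels with any levels of g1 it does not yet have.
# union() keeps the existing levels first, so existing codes stay valid.
levels(d$g0) <- union(levels(d$g0), levels(d$g1))

# Now every incoming value is found among g0's levels.
na_rows <- is.na(d$g0)
d$g0[na_rows] <- d$g1[na_rows]

print(d)
```

Computing `na_rows` once before assigning also keeps the left and right sides indexed by the same rows.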

Of course, many people find all this juggling of factor levels tedious and would simply do:

d <- data.frame(color  = c("red", "blue", NA, NA, "green", "green"),
                amount  = c("high","low","low","mid","mid","high"), 
                stringsAsFactors = FALSE)

Then R doesn't care at all what you fill the NA values with, since they are no longer factors.
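
If the data frame already exists with factor columns, a similar effect can be had by converting them to character first. The following is a rough sketch under that assumption, using the question's g0 and g1; the final conversion back to a factor is optional and only shown for completeness.

```r
# Start from factor columns, as in the question (explicit for R >= 4.0.0,
# where data.frame() keeps strings as character by default).
d <- data.frame(g0 = c("A", "B", NA, NA, "C", "C"),
                g1 = LETTERS[1:6],
                stringsAsFactors = TRUE)

# Convert every factor column to character in place.
is_fac <- vapply(d, is.factor, logical(1))
d[is_fac] <- lapply(d[is_fac], as.character)

# With plain character vectors the replacement needs no level bookkeeping.
na_rows <- is.na(d$g0)
d$g0[na_rows] <- d$g1[na_rows]

# Optionally turn the filled column back into a factor afterwards.
g0_factor <- factor(d$g0)
print(d)
```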
