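R - Replace empty row values with conditional values from another column

Time: 2015-09-12 23:45:53

Tags: r

I tried searching and found answers about replacing empty row values with the values of another column, but not conditionally. Let me explain.

I have a data frame that looks like this: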

Name    Grade    Test1    Test2    Test3
John    A        none     none
Jane             B ok     none
David            none     C barely
Sam     B        none
Thomas                             D fail

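I want to replace the missing grades in the Grade column with the letter grade from the other columns, dropping the comment that follows the letter. There will never be more than one letter grade across the Test1 / Test2 / Test3 columns of a row. So my desired result is: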

Name   Grade    Test1    Test2    Test3
John   A        none     none
Jane   B        B ok     none
David  C        none     C barely
Sam    B        none
Thomas D                          D fail

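Any help would be greatly appreciated!

3 Answers:

Answer 0: (score: 1)

I've shamelessly borrowed @akrun's data to show an alternative approach that fits the split-apply-combine paradigm: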

# define data
df1 <-  structure(list(Name = c("John", "Jane", "David", "Sam", "Thomas"
), Grade = c("A", "", "", "B", ""), Test1 = c("none", "B ok", 
"none", "none", ""), Test2 = c("none", "none", "C barely", "", 
""), Test3 = c("", "", "", "", "D fail")), .Names = c("Name", 
"Grade", "Test1", "Test2", "Test3"), class = "data.frame",
row.names = c(NA, -5L))

# load up libraries
library(dplyr)
library(tidyr)

# add a primary key
df1 <- df1 %>%
   mutate(PK = 1:nrow(df1))

# turn the test results into tidy format, first by making long and skinny
# and then by bringing it back to one entry per person who has a test result    
test_result <- df1 %>%
   select(PK, Test1:Test3) %>%
   gather(Variable, Value, -PK) %>%
   mutate(Value = ifelse(Value == "none", "", substring(Value, 1, 1))) %>%
   # drop all the unnecessary rows:
   filter(Value != "")

# join back to the main data, fill in the test score when needed
df1 %>%
   select(PK, Name, Grade) %>%
   left_join(test_result, by = "PK") %>%
   mutate(
      Source = ifelse(Grade %in% LETTERS, "Grade", as.character(Variable)),
      Grade = ifelse(Grade %in% LETTERS, Grade, Value)) %>%
   select(-Value, - PK, -Variable)

This gives you a tidy data set that should be better suited to future analysis and reuse:

    Name Grade Source
1   John     A  Grade
2   Jane     B  Test1
3  David     C  Test2
4    Sam     B  Grade
5 Thomas     D  Test3
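
As a side note, `gather()` is superseded in tidyr 1.0.0 and later by `pivot_longer()`. Below is a minimal sketch of the same reshape-then-join idea with the newer function. It joins on `Name` instead of adding a `PK` column, which assumes names are unique; the `test_long` and `TestGrade` names are made up for this illustration.

```r
library(dplyr)
library(tidyr)

df1 <- data.frame(
  Name  = c("John", "Jane", "David", "Sam", "Thomas"),
  Grade = c("A", "", "", "B", ""),
  Test1 = c("none", "B ok", "none", "none", ""),
  Test2 = c("none", "none", "C barely", "", ""),
  Test3 = c("", "", "", "", "D fail"),
  stringsAsFactors = FALSE
)

# long format: one row per test entry that actually carries a grade
# (assumption: Name uniquely identifies a row)
test_long <- df1 %>%
  pivot_longer(Test1:Test3, names_to = "Source", values_to = "Value") %>%
  filter(!Value %in% c("", "none")) %>%
  mutate(TestGrade = substr(Value, 1, 1)) %>%
  select(Name, Source, TestGrade)

# join back and fill in Grade only where it is blank
result <- df1 %>%
  select(Name, Grade) %>%
  left_join(test_long, by = "Name") %>%
  mutate(
    Source = ifelse(Grade == "", Source, "Grade"),
    Grade  = ifelse(Grade == "", TestGrade, Grade)
  ) %>%
  select(Name, Grade, Source)

result
```

`Source` is computed before `Grade` is overwritten, so both `ifelse()` calls test the original blank values.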

Answer 1: (score: 0)

Assuming the columns are of class character, we get a logical index ('i1') of the elements where 'Grade' is blank.

i1 <- df1$Grade==''

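We loop over the 'Test' columns, i.e. columns 3 to 5, with vapply. In each column, grep subsets the elements that start with a non-space character (\\S) followed by a space (\\s), sub removes the space and everything after it, and the output is assigned to the blank elements of 'Grade'.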

df1$Grade[i1] <- vapply(df1[i1,3:5], function(x)
    sub('\\s+.*$', '', grep('^\\S\\s', x, value=TRUE)), character(1))
df1
#    Name Grade Test1    Test2  Test3
#1   John     A  none     none       
#2   Jane     B  B ok     none       
#3  David     C  none C barely       
#4    Sam     B  none                
#5 Thomas     D                D fail
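
One caveat, as far as I can tell: the `vapply` call returns one value per test column, not one per row, so it lines up with the blank rows here only because each test column holds exactly one graded entry among those rows and the columns happen to be in the same order as the rows. A minimal row-wise sketch that does not depend on that layout follows; `first_grade` is a hypothetical helper introduced for this example.

```r
df1 <- data.frame(
  Name  = c("John", "Jane", "David", "Sam", "Thomas"),
  Grade = c("A", "", "", "B", ""),
  Test1 = c("none", "B ok", "none", "none", ""),
  Test2 = c("none", "none", "C barely", "", ""),
  Test3 = c("", "", "", "", "D fail"),
  stringsAsFactors = FALSE
)

i1 <- df1$Grade == ""

# hypothetical helper: given the test entries of ONE row, return the letter
# of the single graded entry, or NA if there is not exactly one
first_grade <- function(tests) {
  hit <- grep("^\\S\\s", tests, value = TRUE)
  if (length(hit) == 1) sub("\\s+.*$", "", hit) else NA_character_
}

# apply over rows (MARGIN = 1) of the test columns for the blank-grade rows
df1$Grade[i1] <- unname(apply(df1[i1, 3:5], 1, first_grade))
df1
```

Rows with no graded test entry, or with more than one, end up as `NA` instead of raising an error.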

data

df1 <-  structure(list(Name = c("John", "Jane", "David", "Sam", "Thomas"
), Grade = c("A", "", "", "B", ""), Test1 = c("none", "B ok", 
"none", "none", ""), Test2 = c("none", "none", "C barely", "", 
""), Test3 = c("", "", "", "", "D fail")), .Names = c("Name", 
"Grade", "Test1", "Test2", "Test3"), class = "data.frame",
row.names = c(NA, -5L))

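Answer 2: (score: 0)

When I tried this on your data, I first blanked out the "none" entries in the data frame, then took the grade part of each string with substring, and finally pasted all the columns into one to produce the final table: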

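# blank out the "none" entries
data[data=="none"]=""
# keep only the first character (the letter grade) of each string
a=function(x) substring(x,1,1)
data1=data.frame(data[1],apply(data[2:5],2,a))
# each row has at most one non-empty letter, so pasting the columns yields the grade
all.grades=paste(data1$grade,data1$test1,data1$test2,data1$test3,sep="")
data1$grade=all.grades
# combine name and filled-in grade with the test columns ("none" already blanked)
final.data=data.frame(data1[1:2],data[3:5])
final.data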

name   grade   test1    test2    test3
john       A                      
jane       B    B ok                
david      C          C barely       
sam        B                      
thomas     D                    D fail
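
The snippet above relies on a data frame called `data` with lowercase column names, which the answer does not define. Here is a minimal self-contained sketch of the same strip-and-paste idea on the `df1` used in the other answers. This variant fills in only the rows whose `Grade` is blank, which is an assumption about the desired behavior; `tests` and `test_grade` are names made up for the example.

```r
df1 <- data.frame(
  Name  = c("John", "Jane", "David", "Sam", "Thomas"),
  Grade = c("A", "", "", "B", ""),
  Test1 = c("none", "B ok", "none", "none", ""),
  Test2 = c("none", "none", "C barely", "", ""),
  Test3 = c("", "", "", "", "D fail"),
  stringsAsFactors = FALSE
)

# work on a copy of the test columns so df1 keeps its original entries
tests <- df1[3:5]
tests[tests == "none"] <- ""

# keep only the first character of every test entry
letters_only <- lapply(tests, substring, 1, 1)

# at most one column is non-empty per row, so pasting yields that letter
test_grade <- do.call(paste0, unname(letters_only))

# fill in Grade only where it is blank
df1$Grade <- ifelse(df1$Grade == "", test_grade, df1$Grade)
df1
```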