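Distinguish IDs based on another variable

Date: 2016-10-07 15:04:25

Tags: r string

I need to distinguish the values in the Id column (by appending letters) when rows with the same Id differ in another variable ("Art" in this case). Like this: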

Id<-c("RoLu1976", "RoLu1976", "AlBlKyFy1989", "ThSa1996", "AlBlKyFy1989","ThSa1996")
Art<-c("Econometric Policy Evaluation", "Policy Right", "Rules", "Expectations", "Nonneutrality of money","Expectations")
Yr<-c(1976, 1976, 1989, 1996, 1989, 1996)
df<-data.frame(Id,Art,Yr) 

For the data above, the Ids should be:

Id             Art                                Yr
RoLu1976a      Econometric Policy Evaluation     1976
RoLu1976b      Policy Right                      1976
AlBlKyFy1989a  Rules                             1989
ThSa1996       Expectations                      1996
AlBlKyFy1989b  Nonneutrality of money            1989
ThSa1996       Expectations                      1996

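That is, some rows share the same Id (e.g. RoLu1976) but differ in the "Art" column; those rows get the suffixes a, b, ..., while rows that share both Id and Art (e.g. ThSa1996) keep their Id unchanged.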
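A minimal base-R sketch of this rule (no packages needed), assuming the question's data: ave() computes, per row, how many distinct Art values its Id has and where its Art falls among them, in order of first appearance. The names art_idx and n_art are illustrative.

```r
# Rebuild the question's data; stringsAsFactors = FALSE keeps Id and Art as
# character vectors on R versions before 4.0.0 as well
df <- data.frame(
  Id  = c("RoLu1976", "RoLu1976", "AlBlKyFy1989", "ThSa1996", "AlBlKyFy1989", "ThSa1996"),
  Art = c("Econometric Policy Evaluation", "Policy Right", "Rules", "Expectations",
          "Nonneutrality of money", "Expectations"),
  Yr  = c(1976, 1976, 1989, 1996, 1989, 1996),
  stringsAsFactors = FALSE
)

# Per row: position of its Art among the distinct Art values of its Id,
# numbered in order of first appearance. ave() stores the results back into
# a character vector, hence as.integer().
art_idx <- as.integer(ave(df$Art, df$Id, FUN = function(a) match(a, unique(a))))

# Per row: number of distinct Art values for its Id
n_art <- as.integer(ave(df$Art, df$Id, FUN = function(a) length(unique(a))))

# Append a letter only where an Id has more than one distinct Art
df$Id <- ifelse(n_art > 1, paste0(df$Id, letters[art_idx]), df$Id)
df
```

In the resulting df the Id column reads RoLu1976a, RoLu1976b, AlBlKyFy1989a, ThSa1996, AlBlKyFy1989b, ThSa1996. Because it indexes into letters, this sketch assumes at most 26 distinct Art values per Id.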

4 answers:

Answer 0 (score: 4)

Using the dplyr package:

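library(dplyr)

df %>%
  arrange(Id, Art) %>%   # sort rows by Id, then Art
  group_by(Id) %>%
  # suffix "_a", "_b", ... only when an Id has more than one distinct Art;
  # factor() sorts its levels, so the letters follow the sorted Art values
  mutate(Id2 = if (length(unique(Art)) > 1) paste0(Id, "_", letters[as.numeric(factor(Art))]) else as.character(Id)) %>%
  ungroup() %>%
  select(Id = Id2, everything(), -Id)   # keep Id2 under the name Id, drop the old Id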
              Id                           Art    Yr
1 AlBlKyFy1989_a        Nonneutrality of money  1989
2 AlBlKyFy1989_b                         Rules  1989
3     RoLu1976_a Econometric Policy Evaluation  1976
4     RoLu1976_b                  Policy Right  1976
5       ThSa1996                  Expectations  1996
6       ThSa1996                  Expectations  1996
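Note that this answer labels AlBlKyFy1989_a as "Nonneutrality of money", while the question expects "Rules" to get a: factor() sorts its levels, so the letters follow the sorted order of Art rather than the original row order. If the original order matters, numbering with match(Art, unique(Art)) (and dropping the arrange() step) is one alternative. A small comparison on the two Art values of that Id:

```r
# The two Art values of the AlBlKyFy1989 group, in the order they appear in df
arts <- c("Rules", "Nonneutrality of money")

# factor() sorts its levels, so the integer codes follow the sorted order
by_sorted_levels <- letters[as.integer(factor(arts))]

# match() against unique() numbers the values in order of first appearance
by_appearance <- letters[match(arts, unique(arts))]

by_sorted_levels  # "b" "a"
by_appearance     # "a" "b"
```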

Answer 1 (score: 1)

Using dplyr:

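library(dplyr)

df %>%
  group_by(Id) %>%
  mutate(nb_art = length(unique(Art))) %>%                # number of distinct Art values per Id
  mutate(lettre = letters[match(Art, unique(Art))]) %>%   # one letter per distinct Art, not per row
  mutate(Id_letters = paste0(Id, ifelse(nb_art > 1, lettre, ""))) %>%
  ungroup() %>%
  mutate(Id = Id_letters) %>%
  select(Id, Art, Yr)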

This could be written more compactly, but I hope this step-by-step version is easy to read.

# A tibble: 6 x 3
             Id                           Art    Yr
          <chr>                        <fctr> <dbl>
1     RoLu1976a Econometric Policy Evaluation  1976
2     RoLu1976b                  Policy Right  1976
3 AlBlKyFy1989a                         Rules  1989
4      ThSa1996                  Expectations  1996
5 AlBlKyFy1989b        Nonneutrality of money  1989
6      ThSa1996                  Expectations  1996
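In the example data every Id has either two distinct Art values or two identical ones, so numbering rows and numbering distinct Art values happen to give the same letters. They diverge when an Id has more rows than distinct Art values, which is why the letter should come from the Art value rather than the row position. An illustration with a hypothetical group:

```r
# Hypothetical Id group: three rows but only two distinct Art values
art <- c("Rules", "Expectations", "Rules")

# Numbering rows: the repeated Art gets a fresh letter
by_row <- letters[seq_along(art)]

# Numbering distinct values: rows with the same Art share a letter
by_value <- letters[match(art, unique(art))]

by_row    # "a" "b" "c"
by_value  # "a" "b" "a"
```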

Answer 2 (score: 1)

A data.table solution:

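library(data.table)

setDT(df)
# tmp: position of each row's Art among the distinct Art values of its Id
df[, tmp := match(Art, unique(Art)), by = Id]
# addition: a letter only when the Id has more than one distinct Art
df[, addition := if (uniqueN(Art) > 1) letters[tmp] else "", by = Id]
df[, Id := paste0(Id, addition)]
df[, c("tmp", "addition") := NULL]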

Answer 3 (score: 1)

Using a for loop:

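A minimal base-R sketch of a for-loop approach (an illustration written for this page, not the answerer's original code), assuming the question's data; suffix_ids() is a hypothetical helper name. It loops over the distinct Ids and, for each Id with more than one distinct Art, appends a letter numbered by order of first appearance:

```r
# Hypothetical helper: suffix each Id with a, b, ... when that Id appears with
# more than one distinct Art value; Ids with a single Art are left unchanged.
suffix_ids <- function(id, art) {
  id  <- as.character(id)    # works whether Id is a factor or character
  art <- as.character(art)
  out <- id
  for (g in unique(id)) {
    rows <- which(id == g)
    arts <- unique(art[rows])  # distinct Art values, in order of appearance
    if (length(arts) > 1) {
      out[rows] <- paste0(g, letters[match(art[rows], arts)])
    }
  }
  out
}

df <- data.frame(
  Id  = c("RoLu1976", "RoLu1976", "AlBlKyFy1989", "ThSa1996", "AlBlKyFy1989", "ThSa1996"),
  Art = c("Econometric Policy Evaluation", "Policy Right", "Rules", "Expectations",
          "Nonneutrality of money", "Expectations"),
  Yr  = c(1976, 1976, 1989, 1996, 1989, 1996)
)
df$Id <- suffix_ids(df$Id, df$Art)
df
```

Looping over unique(id) gives one iteration per Id, but each which(id == g) scans the whole vector, so the cost grows roughly with (number of Ids) x (number of rows); that is fine for small data but can be slower than the grouped dplyr or data.table versions on large tables.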