Select the maximum value per group in a data frame

Time: 2018-08-17 07:19:03

Tags: r dataframe

I have the following df:

dat <- data.frame(Cases = c("Student3","Student3","Student3","Student1","Student1",
"Student2","Student2","Student2","Student4"), Class = rep("Math", 9),
Scores = c(9,5,2,7,3,8,5,1,7), stringsAsFactors = F)


> dat
   Cases    Class   Scores
1 Student3  Math      9
2 Student3  Math      5
3 Student3  Math      2
4 Student1  Math      7
5 Student1  Math      3
6 Student2  Math      8
7 Student2  Math      5
8 Student2  Math      1
9 Student4  Math      7

I also have another df that contains the following information:

d <- data.frame(Cases = c("Student3", "Student1",
"Student2", "Student4"), Class = rep("Math", 4), stringsAsFactors = F)

    Cases  Class
1 Student3  Math
2 Student1  Math
3 Student2  Math
4 Student4  Math

From these two, I want to extract the highest Scores value for each student. So my output would look like this:

> dat_output
    Cases  Class   Scores
1 Student3  Math      9
2 Student1  Math      7
3 Student2  Math      8
4 Student4  Math      7

I tried using merge, but it does not extract only the highest Scores.

7 answers:

Answer 0 (score: 6)

We can use sapply over each Cases value in d, subset dat to that Cases, and take the max of its Scores:

sapply(d$Cases, function(x) max(dat$Scores[dat$Cases %in% x]))

#Student3 Student1 Student2 Student4 
#       9        7        8        7 

To get the result as a data.frame:

transform(d, Scores = sapply(d$Cases, function(x) max(dat$Scores[dat$Cases %in% x])))

#    Cases Class Scores
# Student3  Math      9
# Student1  Math      7
# Student2  Math      8
# Student4  Math      7

Note: I am assuming your d is the second data frame shown in the question.
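A slightly stricter variant of the same idea, as a sketch only: vapply() states up front that each student must yield exactly one numeric value, and the names are dropped before the column is attached so they cannot end up as row names. The object names `best` and `result` are mine, not from the answer.

```r
# Data from the question
dat <- data.frame(
  Cases = c("Student3", "Student3", "Student3", "Student1", "Student1",
            "Student2", "Student2", "Student2", "Student4"),
  Class = rep("Math", 9),
  Scores = c(9, 5, 2, 7, 3, 8, 5, 1, 7),
  stringsAsFactors = FALSE
)
d <- data.frame(
  Cases = c("Student3", "Student1", "Student2", "Student4"),
  Class = rep("Math", 4),
  stringsAsFactors = FALSE
)

# For each student listed in d, take the maximum of that student's Scores;
# numeric(1) makes vapply() fail loudly if a group returns anything else
best <- vapply(d$Cases,
               function(x) max(dat$Scores[dat$Cases == x]),
               numeric(1))

# Attach the maxima to a copy of d, keeping d's row order
result <- d
result$Scores <- unname(best)
print(result)
```

Because the loop runs over d$Cases, the rows come back in the order of d, which matches the output requested in the question.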

Answer 1 (score: 3)

If I am right, you don't need d, because d contains no information that is not already in dat.

You can do it like this:

[the code block for this answer is missing from this copy]
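Working from dat alone, one base R possibility is aggregate(). This is a sketch of the idea, not necessarily the code this answer originally used; the name `best` is only illustrative.

```r
# Data from the question
dat <- data.frame(
  Cases = c("Student3", "Student3", "Student3", "Student1", "Student1",
            "Student2", "Student2", "Student2", "Student4"),
  Class = rep("Math", 9),
  Scores = c(9, 5, 2, 7, 3, 8, 5, 1, 7),
  stringsAsFactors = FALSE
)

# One row per (Cases, Class) pair, keeping the maximum of Scores
best <- aggregate(Scores ~ Cases + Class, data = dat, FUN = max)
print(best)
```

Note that aggregate() returns the groups sorted by the grouping columns, so the row order will differ from the order of d in the question.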

Answer 2 (score: 3)

Using dplyr, and allowing for the case where d contains only a subset of the students in dat:

library(dplyr)
inner_join(d, dat %>% group_by(Cases, Class) %>% summarize(Scores=max(Scores)))

# Cases Class Scores
#1 Student3  Math      9
#2 Student1  Math      7
#3 Student2  Math      8
#4 Student4  Math      7

If the row order doesn't matter, the following is more efficient:

inner_join(dat, d) %>% group_by(Cases, Class) %>% summarize(Scores=max(Scores))
# A tibble: 4 x 3
# Groups:   Cases [?]
#  Cases    Class Scores
#  <chr>    <chr>  <dbl>
#1 Student1 Math       7
#2 Student2 Math       8
#3 Student3 Math       9
#4 Student4 Math       7

Answer 3 (score: 3)

You can also use the sqldf package as follows:

library(sqldf)
sqldf("select max(Scores), Cases from dat JOIN d USING(Cases) group by Cases")

This applies a JOIN operation, groups by Cases, and selects max(Scores), Cases to get the desired output.
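For readers without sqldf installed, the same three steps (join, group, take the maximum) can be sketched in base R. This is my own rough equivalent of the query, not part of the answer; `joined` and `best` are illustrative names.

```r
# Data from the question
dat <- data.frame(
  Cases = c("Student3", "Student3", "Student3", "Student1", "Student1",
            "Student2", "Student2", "Student2", "Student4"),
  Class = rep("Math", 9),
  Scores = c(9, 5, 2, 7, 3, 8, 5, 1, 7),
  stringsAsFactors = FALSE
)
d <- data.frame(
  Cases = c("Student3", "Student1", "Student2", "Student4"),
  Class = rep("Math", 4),
  stringsAsFactors = FALSE
)

# JOIN d USING(Cases): keep only the rows of dat whose Cases appears in d
joined <- merge(dat, d["Cases"], by = "Cases")

# select max(Scores), Cases ... group by Cases
best <- aggregate(Scores ~ Cases, data = joined, FUN = max)
print(best)
```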

Answer 4 (score: 3)

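You can use order to sort the data frame by Scores in descending order, then remove the duplicated Cases. Because the rows are sorted first, the row kept for each student is the one with the highest score. This is a base R solution.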

dat <- dat[order(-dat$Scores),]
dat[!duplicated(dat$Cases), ]

     Cases Class Scores
1 Student3  Math      9
6 Student2  Math      8
4 Student1  Math      7
9 Student4  Math      7

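If you first want to make sure that all samples in dat are also in d, you can do that as a first step. %in% performs value matching. With the example above, however, it makes no difference.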

dat <- dat[dat$Cases %in% d$Cases & dat$Class %in% d$Class,]
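One caveat worth sketching: the two %in% tests check each column separately, so a row passes as long as its Cases and its Class each occur somewhere in d, even if that exact pair does not. The example below adds two hypothetical rows (a "Physics" class that is not in the question) purely to make the difference visible; merge() on both columns is shown as one possible pair-wise alternative.

```r
# Data from the question
dat <- data.frame(
  Cases = c("Student3", "Student3", "Student3", "Student1", "Student1",
            "Student2", "Student2", "Student2", "Student4"),
  Class = rep("Math", 9),
  Scores = c(9, 5, 2, 7, 3, 8, 5, 1, 7),
  stringsAsFactors = FALSE
)
d <- data.frame(
  Cases = c("Student3", "Student1", "Student2", "Student4"),
  Class = rep("Math", 4),
  stringsAsFactors = FALSE
)

# Hypothetical extra rows, NOT part of the question, for illustration only
dat2 <- rbind(dat, data.frame(Cases = "Student1", Class = "Physics",
                              Scores = 10, stringsAsFactors = FALSE))
d2 <- rbind(d, data.frame(Cases = "Student4", Class = "Physics",
                          stringsAsFactors = FALSE))

# Column-wise test: (Student1, Physics) is kept, because "Student1" and
# "Physics" each occur somewhere in d2
by_column <- dat2[dat2$Cases %in% d2$Cases & dat2$Class %in% d2$Class, ]

# Pair-wise test: merge() keeps only rows whose (Cases, Class) pair is in d2
by_pair <- merge(dat2, d2, by = c("Cases", "Class"))

print(nrow(by_column))
print(nrow(by_pair))
```

With a single class, as in the question, both filters keep the same rows.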

Answer 5 (score: 1)

Using dplyr:

df %>% #df is dat in your example 
  group_by(Cases, Class) %>% 
  summarise(Scores = max(Scores))

# A tibble: 4 x 3
# Groups:   Cases [?]
  Cases    Class Scores
  <chr>    <chr>  <dbl>
1 Student1 Math      7.
2 Student2 Math      8.
3 Student3 Math      9.
4 Student4 Math      7.

If you also want to match the two data frames:

df %>% #df is dat in your example 
  right_join(df2, by = c("Cases", "Class")) %>% #df2 is d on your example
  group_by(Cases, Class) %>% 
  summarise(Scores = max(Scores))

# A tibble: 4 x 3
# Groups:   Cases [?]
  Cases    Class Scores
  <chr>    <chr>  <dbl>
1 Student1 Math      7.
2 Student2 Math      8.
3 Student3 Math      9.
4 Student4 Math      7.

Answer 6 (score: 1)

Using dplyr, group by student and take the top value by Scores:

library(dplyr)

dat %>% 
  filter(Cases %in% d$Cases) %>% 
  group_by(Cases) %>% 
  top_n(1, Scores) %>%
  ungroup()

# # A tibble: 4 x 3
#   Cases    Class Scores
#   <chr>    <chr>  <dbl>
# 1 Student1 Math       7
# 2 Student2 Math       8
# 3 Student3 Math       9
# 4 Student4 Math       7
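One thing to keep in mind: top_n() keeps every tied row, so a student with two equal top scores would appear twice. In dplyr 1.0.0 and later, top_n() is superseded by slice_max(), which makes the tie handling explicit. A minimal sketch, assuming a dplyr version that provides slice_max() (the name `best` is mine):

```r
library(dplyr)

# Data from the question
dat <- data.frame(
  Cases = c("Student3", "Student3", "Student3", "Student1", "Student1",
            "Student2", "Student2", "Student2", "Student4"),
  Class = rep("Math", 9),
  Scores = c(9, 5, 2, 7, 3, 8, 5, 1, 7),
  stringsAsFactors = FALSE
)
d <- data.frame(
  Cases = c("Student3", "Student1", "Student2", "Student4"),
  Class = rep("Math", 4),
  stringsAsFactors = FALSE
)

best <- dat %>%
  filter(Cases %in% d$Cases) %>%                    # keep only students listed in d
  group_by(Cases) %>%
  slice_max(Scores, n = 1, with_ties = FALSE) %>%   # exactly one row per student
  ungroup()

print(best)
```

Setting with_ties = FALSE guarantees one row per group; leave it at the default TRUE if tied rows should all be returned.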