Get a DataFrame based on the top two values of a group in a grouped DataFrame

Time: 2021-06-11 10:42:32

Tags: python pandas dataframe data-science

My DataFrame df is:

import pandas as pd

data = {'Election Year':['2000', '2000','2000','2000','2000','2000','2000','2000','2000','2005','2005','2005','2005','2005','2005','2005','2005','2005', '2010', '2010','2010','2010','2010','2010','2010','2010', '2010'],
    'Votes':[30, 50, 20, 26, 30, 45, 20, 46, 80, 60, 46, 95, 60, 10, 95, 16, 65, 35, 50, 100, 70, 26, 180, 100, 120, 46, 80], 
    'Party': ['A', 'B', 'C', 'A', 'B', 'C','A', 'B', 'C','A', 'B', 'C','A', 'B', 'C','A', 'B', 'C', 'A', 'B', 'C','A', 'B', 'C','A', 'B', 'C'],
    'Region': ['a', 'a', 'a', 'b', 'b', 'b','c', 'c', 'c','a', 'a', 'a', 'b', 'b', 'b','c', 'c', 'c','a', 'a', 'a', 'b', 'b', 'b','c', 'c', 'c']}
df = pd.DataFrame(data)
df

    
    Election Year   Votes   Party   Region
  0   2000           30      A       a
  1   2000           50      B       a
  2   2000           20      C       a
  3   2000           26      A       b
  4   2000           30      B       b
  5   2000           45      C       b 
  6   2000           20      A       c
  7   2000           46      B       c
  8   2000           80      C       c
  9   2005           60      A       a
  10  2005           46      B       a
  11  2005           95      C       a
  12  2005           60      A       b
  13  2005           10      B       b
  14  2005           95      C       b
  15  2005           16      A       c
  16  2005           65      B       c
  17  2005           35      C       c
  18  2010           50      A       a
  19  2010           100     B       a
  20  2010           70      C       a
  21  2010           26      A       b
  22  2010           180     B       b
  23  2010           100     C       b 
  24  2010           120     A       c
  25  2010           46      B       c
  26  2010           80      C       c

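I want a sub-DataFrame that shows, for each of the top 2 parties in the 2010 election, the lowest number of votes that party received in any region across all past elections. So the desired output is: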

 Election Year   Party   Votes   Region
     2005         B       10        b
     2000         C       20        a

First, I tried to get the top two parties by total votes in 2010, but my code returns the top parties for every year instead.

df1 = df.groupby(['Election Year','Party'])['Votes'].sum().reset_index()
df1 = df1.sort_values(['Election Year','Votes'], ascending=False)
top_2 = df1.groupby('Election Year').head(8).reset_index()
top_2 = top_2[['Election Year', 'Party']].to_string(index=False)
top_2

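How can I fix this so that I get the top 2 parties of 2010 and then find their minimum votes across all years?

2 Answers:

Answer 0 (score: 1)

Get the top 2 parties in the 2010 election: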

# Boolean mask selecting the 2010 rows ('Election Year' holds strings)
m = df['Election Year'].eq('2010')
# Sum the 2010 votes per party, sort in descending order, and keep the names of the top 2 parties
party = df[m].groupby(['Election Year','Party'], as_index=False)['Votes'].sum().sort_values('Votes', ascending=False).head(2)['Party'].values

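Finally, get the lowest vote count for each of these two parties:

# Keep only those parties, sort by votes ascending, and keep the first (lowest) row per party
out = df[df['Party'].isin(party)].sort_values('Votes').drop_duplicates(subset=['Party'])

Now, if you print out, you get the expected output.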
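The two steps above can be combined into one reusable function. The following is a minimal sketch, not part of the original answer: the helper name `lowest_votes_for_top_parties` and its `year` and `n` parameters are hypothetical, and the question's data is rebuilt in compact form so the block runs on its own. It uses `Series.nlargest` in place of sorting followed by `head(2)`.

```python
import pandas as pd

# Same data as in the question, written compactly.
df = pd.DataFrame({
    'Election Year': ['2000'] * 9 + ['2005'] * 9 + ['2010'] * 9,
    'Votes': [30, 50, 20, 26, 30, 45, 20, 46, 80,
              60, 46, 95, 60, 10, 95, 16, 65, 35,
              50, 100, 70, 26, 180, 100, 120, 46, 80],
    'Party': ['A', 'B', 'C'] * 9,
    'Region': list('aaabbbccc') * 3,
})


def lowest_votes_for_top_parties(frame, year, n=2):
    """Hypothetical helper: lowest-vote row of each top-n party of `year`."""
    # Total votes per party in the chosen year, then the n largest totals.
    in_year = frame[frame['Election Year'] == year]
    top = in_year.groupby('Party')['Votes'].sum().nlargest(n).index
    # Among all rows of those parties, keep the lowest-vote row per party.
    candidates = frame[frame['Party'].isin(top)]
    return (candidates.sort_values('Votes')
                      .drop_duplicates(subset='Party')
                      .reset_index(drop=True))


out = lowest_votes_for_top_parties(df, '2010')
print(out[['Election Year', 'Party', 'Votes', 'Region']])
```

Selecting the columns in the final `print` only reorders them to match the layout of the desired output in the question.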

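Answer 1 (score: 0)

First, we extract the two best-performing parties in 2010.

# 'Election Year' holds strings in the question's data, so compare against '2010' rather than the integer 2010
top_2_in_2010 = df[df['Election Year'] == '2010'].groupby(['Election Year', 'Party'],
       as_index = False)['Votes'].sum().sort_values('Votes', ascending = False)['Party'][:2].values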

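Create a DataFrame of the earlier years:

# Combine both conditions in one mask; four-digit year strings compare correctly as text
df_2 = df[(df['Election Year'] < '2010') & df['Party'].isin(top_2_in_2010)]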

Finally,

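# Keep the lowest row of each party; a plain head(2) would return the two lowest rows overall
result = df_2.sort_values('Votes', ascending = True).drop_duplicates(subset = 'Party')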

Printing result gives the desired output.
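Another way to pick exactly one lowest row per party is `groupby(...).idxmin()`, which states the per-party intent directly instead of relying on sort order. This is a minimal sketch under two assumptions that are not in the original answer: the year column is converted to integers first, and only elections before 2010 are considered, as in this answer.

```python
import pandas as pd

# Same data as in the question, written compactly.
df = pd.DataFrame({
    'Election Year': ['2000'] * 9 + ['2005'] * 9 + ['2010'] * 9,
    'Votes': [30, 50, 20, 26, 30, 45, 20, 46, 80,
              60, 46, 95, 60, 10, 95, 16, 65, 35,
              50, 100, 70, 26, 180, 100, 120, 46, 80],
    'Party': ['A', 'B', 'C'] * 9,
    'Region': list('aaabbbccc') * 3,
})

# Assumption: work with integer years so numeric comparisons are valid.
df['Election Year'] = df['Election Year'].astype(int)

# Top 2 parties by total votes in 2010.
totals_2010 = df[df['Election Year'] == 2010].groupby('Party')['Votes'].sum()
top_2 = totals_2010.nlargest(2).index

# Rows of those parties from elections before 2010.
earlier = df[(df['Election Year'] < 2010) & df['Party'].isin(top_2)]

# idxmin gives, per party, the row label of its smallest vote count.
result = earlier.loc[earlier.groupby('Party')['Votes'].idxmin()]
print(result)
```

Because `idxmin` returns one row label per group, the result always has one row per party, even when a single party holds the two lowest vote counts overall.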