My DataFrame df is:
import pandas as pd
data = {'Election Year':['2000', '2000','2000','2000','2000','2000','2000','2000','2000','2005','2005','2005','2005','2005','2005','2005','2005','2005', '2010', '2010','2010','2010','2010','2010','2010','2010', '2010'],
'Votes':[30, 50, 20, 26, 30, 45, 20, 46, 80, 60, 46, 95, 60, 10, 95, 16, 65, 35, 50, 100, 70, 26, 180, 100, 120, 46, 80],
'Party': ['A', 'B', 'C', 'A', 'B', 'C','A', 'B', 'C','A', 'B', 'C','A', 'B', 'C','A', 'B', 'C', 'A', 'B', 'C','A', 'B', 'C','A', 'B', 'C'],
'Region': ['a', 'a', 'a', 'b', 'b', 'b','c', 'c', 'c','a', 'a', 'a', 'b', 'b', 'b','c', 'c', 'c','a', 'a', 'a', 'b', 'b', 'b','c', 'c', 'c']}
df = pd.DataFrame(data)
df
    Election Year  Votes  Party  Region
 0           2000     30      A       a
 1           2000     50      B       a
 2           2000     20      C       a
 3           2000     26      A       b
 4           2000     30      B       b
 5           2000     45      C       b
 6           2000     20      A       c
 7           2000     46      B       c
 8           2000     80      C       c
 9           2005     60      A       a
10           2005     46      B       a
11           2005     95      C       a
12           2005     60      A       b
13           2005     10      B       b
14           2005     95      C       b
15           2005     16      A       c
16           2005     65      B       c
17           2005     35      C       c
18           2010     50      A       a
19           2010    100      B       a
20           2010     70      C       a
21           2010     26      A       b
22           2010    180      B       b
23           2010    100      C       b
24           2010    120      A       c
25           2010     46      B       c
26           2010     80      C       c
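I want a sub-DataFrame that shows, for each of the top 2 parties in the 2010 election, the lowest number of votes that party received in any region across all past elections. So the expected output is: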
Election Year  Party  Votes  Region
         2005      B     10       b
         2000      C     20       a
First, I tried to get the top two parties by total votes in 2010, but this returns the top parties for every year instead.
df1 = df.groupby(['Election Year','Party'])['Votes'].sum().reset_index()
df1 = df1.sort_values(['Election Year','Votes'], ascending=False)
top_2 = df1.groupby('Election Year').head(8).reset_index()
top_2 = top_2[['Election Year', 'Party']].to_string(index=False)
top_2
How can I fix this so that I get the top 2 parties for 2010 and then find their minimum votes across all years?
Answer 0 (score: 1)
Get the top 2 parties in the 2010 election:
# Boolean mask selecting the 2010 rows (the years are stored as strings)
m = df['Election Year'].eq('2010')
# Sum the 2010 votes per party, sort in descending order, and keep the names of the top 2 parties
party = df[m].groupby(['Election Year','Party'], as_index=False)['Votes'].sum().sort_values('Votes', ascending=False).head(2)['Party'].values
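As a cross-check on the step above, the same top-2 selection can be sketched with Series.nlargest, which skips the explicit sort. The compact construction of df below is only a shorthand that reproduces the 27 rows from the question, and the names totals_2010 and top2 are illustrative, not part of the original answer.

```python
import pandas as pd

# Compact rebuild of the question's data: 3 years x 3 regions x 3 parties.
votes = [30, 50, 20, 26, 30, 45, 20, 46, 80,
         60, 46, 95, 60, 10, 95, 16, 65, 35,
         50, 100, 70, 26, 180, 100, 120, 46, 80]
df = pd.DataFrame({
    'Election Year': [y for y in ('2000', '2005', '2010') for _ in range(9)],
    'Votes': votes,
    'Party': ['A', 'B', 'C'] * 9,
    'Region': [r for r in 'abc' for _ in range(3)] * 3,
})

# Total votes per party in 2010 only (the years are strings in this data).
totals_2010 = df.loc[df['Election Year'] == '2010'].groupby('Party')['Votes'].sum()

# nlargest(2) keeps the two highest totals; the index holds the party names.
top2 = totals_2010.nlargest(2).index.tolist()
print(top2)  # ['B', 'C']
```

For this data the 2010 totals are A = 196, B = 326 and C = 250, so B and C are selected.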
Finally, get the minimum votes for those two parties:
# Keep only those parties, sort by votes ascending, and keep the first (lowest-vote) row per party
out = df[df['Party'].isin(party)].sort_values('Votes').drop_duplicates(subset=['Party'])
Now, if you print out, you will get the expected output.
Answer 1 (score: 0)
First, we extract the two best-performing parties in 2010.
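An alternative way to pick the lowest-vote row per party, sketched here only as a variant of the sort_values / drop_duplicates step, is groupby(...).idxmin(), which returns the index label of the first minimum in each group. The party list is hard-coded to B and C (the 2010 top 2 for this data) to keep the sketch short, and df is rebuilt compactly from the question's values.

```python
import pandas as pd

# Compact rebuild of the question's data: 3 years x 3 regions x 3 parties.
votes = [30, 50, 20, 26, 30, 45, 20, 46, 80,
         60, 46, 95, 60, 10, 95, 16, 65, 35,
         50, 100, 70, 26, 180, 100, 120, 46, 80]
df = pd.DataFrame({
    'Election Year': [y for y in ('2000', '2005', '2010') for _ in range(9)],
    'Votes': votes,
    'Party': ['A', 'B', 'C'] * 9,
    'Region': [r for r in 'abc' for _ in range(3)] * 3,
})

# Assumption for brevity: the 2010 top-2 parties, as computed in the previous step.
party = ['B', 'C']

subset = df[df['Party'].isin(party)]
# For each party, idxmin gives the index label of its lowest-vote row.
lowest_idx = subset.groupby('Party')['Votes'].idxmin()
out = df.loc[lowest_idx, ['Election Year', 'Party', 'Votes', 'Region']]
print(out.to_string(index=False))
```

This selects the rows with index 13 (2005, B, 10, b) and 2 (2000, C, 20, a), matching the expected output in the question.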
# 'Election Year' holds strings, so compare against '2010' rather than the integer 2010
top_2_in_2010 = df[df['Election Year'] == '2010'].groupby(['Election Year', 'Party'], \
as_index = False)['Votes'].sum().sort_values('Votes', ascending = False)['Party'][:2].values
Create a DataFrame of the earlier years:
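# The years are four-digit strings, so this string comparison orders them correctly; both conditions go into one mask
df_2 = df[(df['Election Year'] < '2010') & df['Party'].isin(top_2_in_2010)]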
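Finally, keep the lowest-vote row for each party:
# head(2) would return the two lowest rows overall, which are not guaranteed to be one per party
result = df_2.sort_values('Votes', ascending = True).drop_duplicates(subset = 'Party')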
Printing result will give you the desired output.
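One detail worth checking in this approach is the dtype of 'Election Year'. In the question the years are strings, so an equality test against the integer 2010 matches no rows. The sketch below shows that, and one possible workaround: converting the years to integers first. The names years and past are illustrative, the party list is hard-coded to the 2010 top 2 for this data, and df is rebuilt compactly from the question's values.

```python
import pandas as pd

# Compact rebuild of the question's data: 3 years x 3 regions x 3 parties.
votes = [30, 50, 20, 26, 30, 45, 20, 46, 80,
         60, 46, 95, 60, 10, 95, 16, 65, 35,
         50, 100, 70, 26, 180, 100, 120, 46, 80]
df = pd.DataFrame({
    'Election Year': [y for y in ('2000', '2005', '2010') for _ in range(9)],
    'Votes': votes,
    'Party': ['A', 'B', 'C'] * 9,
    'Region': [r for r in 'abc' for _ in range(3)] * 3,
})

# String years never equal the integer 2010, so this mask is all False.
print((df['Election Year'] == 2010).sum())  # 0

# Converting to integers makes numeric comparisons behave as intended.
years = df['Election Year'].astype(int)
print((years == 2010).sum())  # 9

# Rows from before 2010 for the two parties of interest (B and C for this data).
past = df[(years < 2010) & df['Party'].isin(['B', 'C'])]
print(len(past))  # 12
```

Whether to convert the column or to compare against the string '2010' is a matter of taste; converting is the safer choice if the years are later used in arithmetic or range comparisons.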