Set an index and sort by a specific column in pandas

Time: 2016-12-08 18:26:37

Tags: python pandas

I am trying to prepare this data in a specific format:

import pandas as pd

voting = pd.read_json("GE2000.json")
voting.set_index(['county_fips','candidate_name','pty','vote_pct'],inplace=True)

print(voting)

This returns:

                                            vote
county_fips candidate_name  pty vote_pct
2000        Howard Phillips CS  0            596
            John Hagelin    NL  0            919
            Harry Browne    LB  1           2636
            George W. Bush  R   59        167398
            Al Gore         D   28         79004
1001        Howard Phillips I   0              9
            John Hagelin    I   0              5
            Harry Browne    LB  0             51
            George W. Bush  R   70         11993
            Al Gore         D   29          4942

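After this, I want to sort by vote_pct and keep only the largest row for each county_fips, like this (I have already tried sort_values, sort_index, etc. and could not get them to produce the desired output):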

                                            vote
county_fips candidate_name  pty vote_pct
2000        George W. Bush  R   59        167398
1001        George W. Bush  R   70         11993

Here is the sample data:

[

  {
    "office" : "PRESIDENT",
    "county_name" : "Alaska",
    "vote_pct" : "0",
    "county_fips" : "2000",
    "pty" : "CS",
    "candidate_name" : "Howard Phillips"
  },
  {
    "office" : "PRESIDENT",
    "county_name" : "Alaska",
    "vote_pct" : "0",
    "county_fips" : "2000",
    "pty" : "NL",
    "candidate_name" : "John Hagelin"
  }
]

The data continues in the same way.

2 answers:

Answer 0: (score: 2)

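You can use groupby and apply to get the rows with the largest vote_pct before calling set_index, and only then set the index. This lets you group by a column rather than by an index level (which is awkward):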

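import pandas as pd

voting = pd.read_json("GE2000.json")

# Each call receives one county's rows; keep those at the county's maximum vote_pct
get_largest_vote_pct = lambda group: group[group.vote_pct == group.vote_pct.max()]

largest = voting.groupby('county_fips').apply(get_largest_vote_pct)

# Build the MultiIndex only after the per-county filtering
largest.set_index(['county_fips','candidate_name','pty','vote_pct'],inplace=True)

print(largest)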

                                           vote
county_fips candidate_name pty vote_pct        
1001        George W. Bush R   70         11993
2000        George W. Bush R   59        167398
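For anyone without GE2000.json, the same idea can be tried on the ten rows printed in the question. The following is only a minimal sketch under that assumption: the inline `rows` list and the `county_max` name are illustrative, and it swaps `apply` for `transform('max')`, which builds a boolean mask on the original frame and so leaves `county_fips` as an ordinary column for `set_index`.

```python
import pandas as pd

# Rows copied from the table printed in the question (not the full GE2000.json file)
rows = [
    (2000, "Howard Phillips", "CS", 0, 596),
    (2000, "John Hagelin", "NL", 0, 919),
    (2000, "Harry Browne", "LB", 1, 2636),
    (2000, "George W. Bush", "R", 59, 167398),
    (2000, "Al Gore", "D", 28, 79004),
    (1001, "Howard Phillips", "I", 0, 9),
    (1001, "John Hagelin", "I", 0, 5),
    (1001, "Harry Browne", "LB", 0, 51),
    (1001, "George W. Bush", "R", 70, 11993),
    (1001, "Al Gore", "D", 29, 4942),
]
columns = ["county_fips", "candidate_name", "pty", "vote_pct", "vote"]
voting = pd.DataFrame(rows, columns=columns)

# For every row, the maximum vote_pct found in that row's county
county_max = voting.groupby("county_fips")["vote_pct"].transform("max")

# Keep rows that reach their county's maximum (ties are all kept), then set the index
largest = voting[voting["vote_pct"] == county_max].set_index(
    ["county_fips", "candidate_name", "pty", "vote_pct"]
)
print(largest)
```

Because the mask is applied to the original frame, the surviving rows keep their original order (county 2000 before 1001 here) rather than being sorted by county_fips.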

Answer 1: (score: 0)

You can use groupby, for example voting.groupby('county_fips')['candidate_name'].max()

There is also a more detailed answer here: Python : Getting the Row which has the max value in groups using groupby
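Note that taking `.max()` of candidate_name returns the alphabetically last name in each county, not the candidate with the highest vote_pct. A minimal sketch of one way to pick the whole row instead, assuming a six-row subset of the table printed in the question; the names `winner_labels`, `winners` and `by_sort` are illustrative only:

```python
import pandas as pd

# Subset of the rows printed in the question (three candidates per county)
voting = pd.DataFrame({
    "county_fips": [2000, 2000, 2000, 1001, 1001, 1001],
    "candidate_name": ["Harry Browne", "George W. Bush", "Al Gore"] * 2,
    "pty": ["LB", "R", "D", "LB", "R", "D"],
    "vote_pct": [1, 59, 28, 0, 70, 29],
    "vote": [2636, 167398, 79004, 51, 11993, 4942],
})

# idxmax gives, for each county, the row label of its highest vote_pct
# (on ties it returns the first such label)
winner_labels = voting.groupby("county_fips", sort=False)["vote_pct"].idxmax()

# Select those rows, then build the MultiIndex the question asks for
winners = voting.loc[winner_labels].set_index(
    ["county_fips", "candidate_name", "pty", "vote_pct"]
)
print(winners)

# Alternative: sort by vote_pct descending and keep the first row per county
by_sort = voting.sort_values("vote_pct", ascending=False).drop_duplicates("county_fips")
```

Both variants keep exactly one row per county, so unlike a boolean mask against the group maximum they drop tied candidates.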