Sorting a DataFrame index that contains both strings and numbers

Time: 2014-05-06 11:24:48

Tags: python pandas

My DataFrame df has an index that looks like this:

Com_Lag_01
Com_Lag_02
Com_Lag_03
Com_Lag_04
Com_Lag_05
Com_Lag_06
Com_Lag_07
Com_Lag_08
Com_Lag_09
Com_Lag_10
Com_Lag_101
Com_Lag_102
Com_Lag_103
...
Com_Lag_11
Com_Lag_111
Com_Lag_112
Com_Lag_113
Com_Lag_114
...
Com_Lag_12
Com_Lag_120
...
Com_Lag_13
Com_Lag_14
Com_Lag_15

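I want to sort this index so that the numbers run from Com_Lag_1 to Com_Lag_120. If I use df.sort_index(), I get the same order as above, because the labels are compared as strings. Any suggestions on how to sort this index correctly?

3 answers:

Answer 0: (score: 6)

You can try something like this, which sorts on a numeric version of the index: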

import pandas as pd
# Create a DataFrame example
df = pd.DataFrame(\
    {'Year': [1991 ,2004 ,2001 ,2009 ,1997],\
    'Age': [27 ,25 ,22 ,34 ,31],\
    },\
    index = ['Com_Lag_1' ,'Com_Lag_12' ,'Com_Lag_3' ,'Com_Lag_24' ,'Com_Lag_5'])

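# Add a column containing a numeric version of the index
df['indexNumber'] = [int(i.split('_')[-1]) for i in df.index]
# Sort the rows by that column (DataFrame.sort no longer exists in current pandas; sort_values replaces it)
df.sort_values(['indexNumber'], ascending=[True], inplace=True)
# Drop the helper column (passing the axis positionally is no longer accepted in pandas 2.x)
df.drop(columns='indexNumber', inplace=True)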


Edit 2017 - V1

To avoid SettingWithCopyWarning:

df = df.assign(indexNumber=[int(i.split('_')[-1]) for i in df.index])

Edit 2017 - V2 for pandas version 0.21.0

import pandas as pd
print(pd.__version__)
# Create a DataFrame example
df = pd.DataFrame(\
    {'Year': [1991 ,2004 ,2001 ,2009 ,1997],\
    'Age': [27 ,25 ,22 ,34 ,31],\
    },\
    index = ['Com_Lag_1' ,'Com_Lag_12' ,'Com_Lag_3' ,'Com_Lag_24' ,'Com_Lag_5'])

# reindex returns a new DataFrame, so assign the result back
df = df.reindex(index=df.index.to_series().str.rsplit('_').str[-1].astype(int).sort_values().index)

Answer 1: (score: 3)

A solution without a new column is to use DataFrame.reindex with the sorted index of a Series.

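If the index contains duplicate values, however, add a new column instead, because reindex raises an error on duplicate labels.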
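A minimal sketch of the helper-column approach for an index with repeated labels follows. It is not the answer's original code; the sample data, the column name indexNumber, and the variable df_sorted are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical example data: the label 'Com_Lag_3' appears twice.
df = pd.DataFrame(
    {'Year': [1991, 2004, 2001, 2009, 1997]},
    index=['Com_Lag_1', 'Com_Lag_12', 'Com_Lag_3', 'Com_Lag_24', 'Com_Lag_3'],
)

# Integer suffix of each label, taken from the text after the last underscore.
suffix = [int(label.rsplit('_', 1)[-1]) for label in df.index]

df_sorted = (
    df.assign(indexNumber=suffix)                   # temporary helper column
      .sort_values('indexNumber', kind='mergesort')  # stable sort keeps duplicates in their original order
      .drop(columns='indexNumber')                   # remove the helper column again
)
print(df_sorted)
```

Because the sort runs on a column rather than on the labels themselves, duplicate labels are not a problem here.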

Answer 2: (score: 1)

Another solution is

    df.sort_index(key=lambda x: (x.to_series().str[8:].astype(int)), inplace=True)

The 8 is the position at which the numeric part of each label starts.
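The fixed offset only works while every label shares the 8-character prefix Com_Lag_. As a hedged variant, assuming pandas 1.1 or later (where sort_index accepts a key argument) and labels that always end in digits, the trailing number can be extracted with a regular expression instead. The helper name numeric_suffix and the sample data are illustrative, not part of the answer.

```python
import pandas as pd

def numeric_suffix(index):
    # Hypothetical helper: take the trailing digits of each label as integers.
    return index.str.extract(r'(\d+)$', expand=False).astype(int)

# Example data in the same shape as the earlier answers.
df = pd.DataFrame(
    {'Year': [1991, 2004, 2001, 2009, 1997]},
    index=['Com_Lag_1', 'Com_Lag_12', 'Com_Lag_3', 'Com_Lag_24', 'Com_Lag_5'],
)

# sort_index passes the Index to the key function and sorts by the values it returns.
df_sorted = df.sort_index(key=numeric_suffix)
print(df_sorted)
```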