Python - top ten function

Date: 2017-03-12 01:30:26

Tags: python pandas

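I am trying to write a function that takes a year from the user and outputs the top ten countries by expenditure for that year, using this Lynda class as a model.

Here is the dataframe: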

df.dtypes
Country Name     object
Country Code     object
Year              int32
CountryYear      object
Population        int32
GDP             float64
MilExpend       float64
Percent         float64
dtype: object



   Country Name Country Code    Year    CountryYear Pop         GDP   Expend    Percent
0   Aruba       ABW             1960    ABW-1960    54208       0.0   0.0       0.0

I tried this code and got an error:

Code:

def topten(Year):
    simple = df_details_merged.loc[Year].sort('MilExpend',ascending=False).reset_index()
    simple = simple.drop(['Country Code', 'CountryYear'],axis=1).head(10)
    simple.index = simple.index + 1

    return simple

topten(1990)

This is the rather large error I get. Could I get some help? I can't even work out what the error is. :-(

C:\Users\mycomputer\Anaconda3\lib\site-packages\ipykernel\__main__.py:2: FutureWarning: sort is deprecated, use sort_values(inplace=True) for INPLACE sorting
  from ipykernel import kernelapp as app
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
C:\Users\mycomputer\Anaconda3\lib\site-packages\pandas\core\series.py in _try_kind_sort(arr)
   1738                 # if kind==mergesort, it can fail for object dtype
-> 1739                 return arr.argsort(kind=kind)
   1740             except TypeError:

TypeError: '<' not supported between instances of 'numpy.ndarray' and 'str'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-105-0c974c6a1b44> in <module>()
----> 1 topten(1990)

<ipython-input-104-b8c336014d5b> in topten(Year)
      1 def topten(Year):
----> 2     simple = df_details_merged.loc[Year].sort('MilExpend',ascending=False).reset_index()
      3     simple = simple.drop(['Country Code', 'CountryYear'],axis=1).head(10)
      4     simple.index = simple.index + 1
      5 

C:\Users\mycomputer\Anaconda3\lib\site-packages\pandas\core\series.py in sort(self, axis, ascending, kind, na_position, inplace)
   1831 
   1832         return self.sort_values(ascending=ascending, kind=kind,
-> 1833                                 na_position=na_position, inplace=inplace)
   1834 
   1835     def order(self, na_last=None, ascending=True, kind='quicksort',

C:\Users\mycomputer\Anaconda3\lib\site-packages\pandas\core\series.py in sort_values(self, axis, ascending, inplace, kind, na_position)
   1751         idx = _default_index(len(self))
   1752 
-> 1753         argsorted = _try_kind_sort(arr[good])
   1754 
   1755         if not ascending:

C:\Users\mycomputer\Anaconda3\lib\site-packages\pandas\core\series.py in _try_kind_sort(arr)
   1741                 # stable sort not available for object dtype
   1742                 # uses the argsort default quicksort
-> 1743                 return arr.argsort(kind='quicksort')
   1744 
   1745         arr = self._values

TypeError: '<' not supported between instances of 'numpy.ndarray' and 'str'

1 answer:

Answer 0 (score: 1)

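The first argument to .loc is a label.

When you call df_details_merged.loc[1960], pandas finds the row whose index label is 1960 and returns that row as a Series. So you get a Series whose index is Country Name, Country Code, ... and whose values are the values of that row. Your code then tries to sort that Series by MilExpend, and that is why it fails: the row mixes strings and numbers, so its values cannot be compared with each other.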
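To see this behaviour in isolation, here is a minimal sketch. The sample rows are made up for illustration (only the column names come from the question), and it uses label 1 rather than 1960 because the toy frame has just three rows.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the question.
df = pd.DataFrame({
    "Country Name": ["Aruba", "Andorra", "Aruba"],
    "Country Code": ["ABW", "AND", "ABW"],
    "Year": [1960, 1960, 1961],
    "MilExpend": [0.0, 1.5, 2.0],
})

# .loc[1] selects the row whose index label is 1,
# not the rows where some column equals 1.
row = df.loc[1]

print(type(row).__name__)  # Series
print(list(row.index))     # the column names of df
print(row.dtype)           # object, because the row mixes strings and numbers
```

Sorting an object-dtype Series like this one forces comparisons between strings and numbers, which is what the `'<' not supported` message in the traceback is complaining about.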

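What you need is not loc but a simple condition: df[df.Year == Year]. This means "give me the whole dataframe, but only the rows where the 'Year' column contains what I specified in the Year variable" (1960 in your example).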
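A small sketch of what that condition does, again on made-up sample rows (the variable names `mask` and `rows_for_year` are only for illustration):

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the question.
df = pd.DataFrame({
    "Country Name": ["Aruba", "Andorra", "Aruba"],
    "Year": [1960, 1960, 1961],
    "MilExpend": [0.0, 1.5, 2.0],
})

year = 1960

# Comparing a column with a scalar gives a boolean Series, one entry per row.
mask = df["Year"] == year

# Indexing the frame with that mask keeps only the rows marked True.
# The result is still a DataFrame, so it can be sorted by a column name.
rows_for_year = df[mask]

print(rows_for_year)
```

Because the result is a DataFrame rather than a single-row Series, the MilExpend column is still available to sort on.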

sort still works for now, but it is deprecated, so use sort_values instead. Putting it together:

simple = df_details_merged[df_details_merged.Year == Year].sort_values(by='MilExpend', ascending=False).reset_index()

Then you can go on to drop the columns and take the first 10 rows, just as you do now.
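For reference, the whole function could look roughly like the sketch below. It is one possible arrangement, not the only one. It assumes the column names shown in the question; the `df` and `n` parameters and the sample data are additions for illustration. It also passes `drop=True` to reset_index, a small departure from the original code, so that the old row labels do not come back as an extra column.

```python
import pandas as pd


def topten(df, year, n=10):
    """Return the n rows with the highest MilExpend for the given year."""
    # Filter by the value of the Year column instead of by index label.
    selected = df[df["Year"] == year]

    # sort_values replaces the deprecated sort.
    ranked = selected.sort_values(by="MilExpend", ascending=False)

    # Drop the helper columns and keep only the top n rows.
    ranked = ranked.drop(["Country Code", "CountryYear"], axis=1).head(n)

    # Renumber the rows so the ranking starts at 1.
    ranked = ranked.reset_index(drop=True)
    ranked.index = ranked.index + 1
    return ranked


# Hypothetical sample data, only to show the call.
sample = pd.DataFrame({
    "Country Name": ["A", "B", "C", "A"],
    "Country Code": ["AAA", "BBB", "CCC", "AAA"],
    "Year": [1990, 1990, 1990, 1991],
    "CountryYear": ["AAA-1990", "BBB-1990", "CCC-1990", "AAA-1991"],
    "MilExpend": [5.0, 9.0, 7.0, 1.0],
})

result = topten(sample, 1990)
print(result)
```

Taking the dataframe as a parameter instead of reading the global df_details_merged makes the function easier to try out on a small sample like this one.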