Many global variables for a class vs. many equivalent class attributes?

Asked: 2013-11-19 12:18:14

Tags: python pandas

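First, I realize there are a lot of questions about efficiency, so I apologize if this is a duplicate, but I'm here because I couldn't find what I was looking for. I'll ask the question with an example: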

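I have some time series data that I import from Excel into a pandas DataFrame. It has the columns id | name | date | total_monthly_rtn, and I perform a few simple operations on it for reporting purposes (month, 3M, YTD, ITD, etc.). I then split this DataFrame by id / name and store each piece in an Account object: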

import pandas as ps
from pandas.tseries.offsets import YearEnd, DateOffset

class Account(object):
    def __init__(self, df):
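        # positional access: a filtered slice of 'data' need not start at label 0
        self.name = df['name'].iloc[0]
        self.id = df['id'].iloc[0]
        # monthly returns indexed by month-end date (TimeSeries was an old alias of Series)
        self.rtn = ps.Series(df['total_monthly_rtn'].values, index=ps.DatetimeIndex(df['date'], freq='M'))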
        self.m = DateOffset(months=1)


# dataframe mentioned above with 4 columns (id;name;date;total_monthly_rtn)
data = parsexl(src_path) # parsexl not needed for this question

valdate = max(data['date'])
y = YearEnd()

# an example of one slice of 'data'
foo = Account(data[data['id'] == '123456'])

# where 'data[data['id'] == '123456']' looks like:
        id         name                date total_monthly_rtn
0   123456  Bank of Foo 2011-07-31 00:00:00             -2.75
1   123456  Bank of Foo 2011-08-31 00:00:00             -7.63
2   123456  Bank of Foo 2011-09-30 00:00:00             -4.03
3   123456  Bank of Foo 2011-10-31 00:00:00              5.68
4   123456  Bank of Foo 2011-11-30 00:00:00             -1.79
5   123456  Bank of Foo 2011-12-31 00:00:00              0.93
6   123456  Bank of Foo 2012-01-31 00:00:00            3.0773
7   123456  Bank of Foo 2012-02-29 00:00:00            5.4896
8   123456  Bank of Foo 2012-03-31 00:00:00            0.5089
9   123456  Bank of Foo 2012-04-30 00:00:00           -2.0739
10  123456  Bank of Foo 2012-05-31 00:00:00           -6.0472
11  123456  Bank of Foo 2012-06-30 00:00:00            4.7578
12  123456  Bank of Foo 2012-07-31 00:00:00            2.1529
13  123456  Bank of Foo 2012-08-31 00:00:00            1.0867
14  123456  Bank of Foo 2012-09-30 00:00:00            0.3791
15  123456  Bank of Foo 2012-10-31 00:00:00             1.143
16  123456  Bank of Foo 2012-11-30 00:00:00            3.3823
17  123456  Bank of Foo 2012-12-31 00:00:00            0.6535
18  123456  Bank of Foo 2013-01-31 00:00:00            7.3905
19  123456  Bank of Foo 2013-02-28 00:00:00            3.5779
20  123456  Bank of Foo 2013-03-31 00:00:00            2.3466
21  123456  Bank of Foo 2013-04-30 00:00:00            1.6874
22  123456  Bank of Foo 2013-05-31 00:00:00            0.6536
23  123456  Bank of Foo 2013-06-30 00:00:00           -2.7618
24  123456  Bank of Foo 2013-07-31 00:00:00             3.854
25  123456  Bank of Foo 2013-08-31 00:00:00           -3.6812
26  123456  Bank of Foo 2013-09-30 00:00:00            1.9478
27  123456  Bank of Foo 2013-10-31 00:00:00            3.9654

I originally wrote these two methods for Account:

    def ytd(self, ye, vd):
        return self.rtn.truncate(before=ye.rollback(vd), after=vd)[1:].sum()

    def year(self, vd):
        return self.rtn.truncate(before=vd-(11*self.m), after=vd).sum()

# called like:
foo.ytd(y, valdate) # returns 18.9802
foo.year(valdate) # returns 23.016

But then I started wondering: would it be better to store valdate and YearEnd as class attributes? That would change the two methods to:

    def ytd(self):
        return self.rtn.truncate(before=self.ye.rollback(self.vd), after=self.vd)[1:].sum()

    def year(self):
        return self.rtn.truncate(before=self.vd-(11*self.m), after=self.vd).sum()

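In my application, data holds about 8,000 rows representing 100 accounts, so it probably won't make a huge difference either way, but what about in general? My gut tells me the first way is better, but I'd appreciate it if someone who knows their stuff could put my mind at ease. Thanks.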

== Edit ==

I only included two methods here, but in case it makes a difference, there are actually 10 methods that take valdate and YearEnd as arguments.

== Edit 2 ==

Sorry if my example confused anyone. In case you don't know: rtn = return; ytd = year to date

1 answer:

Answer 0: (score: 0)

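If the question is only about performance (or, more precisely, speed): function-local lookups are faster than attribute lookups, so your current solution is fine.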
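
If you want to check this on your own machine, here is a rough sketch using timeit. The Account class below is a hypothetical stand-in that uses plain integers instead of dates and offsets, and the method names span_from_args and span_from_attrs are made up for the illustration; it only compares reading values from arguments against reading them from self.

```python
import timeit


class Account:
    """Hypothetical stand-in for the question's Account, using plain numbers."""

    def __init__(self, vd, ye):
        self.vd = vd  # valuation date stored on the instance
        self.ye = ye  # year-end marker stored on the instance

    def span_from_args(self, ye, vd):
        # Values arrive as arguments, so they are read as function locals.
        return vd - ye

    def span_from_attrs(self):
        # Each value requires an attribute lookup on self.
        return self.vd - self.ye


acct = Account(vd=10, ye=3)

# Time the same number of calls for both styles.
t_args = timeit.timeit(lambda: acct.span_from_args(3, 10), number=200_000)
t_attrs = timeit.timeit(lambda: acct.span_from_attrs(), number=200_000)

print(f"arguments:  {t_args:.4f} s for 200,000 calls")
print(f"attributes: {t_attrs:.4f} s for 200,000 calls")
```

The numbers vary by machine and interpreter version, and this measurement also includes the cost of passing the arguments. Either way, the per-call difference is likely to be tiny next to the truncate and sum work on 8,000 rows, so readability is probably the better basis for choosing between the two designs.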