Question

I have a for loop for a new project, but written this way the code is too slow. I'm trying to find the fastest way to do it. Maybe vectorization?

I tried writing it as a function (def), but it didn't execute correctly.

%%time
for x in df2.index:   
    if x > 0: 
        if (
            (df2.loc[x,'DEF RANK'] == df2.loc[x,'OFF RANK']) 
            & (df2.loc[x,'W']=='nan')
            & (pd.isnull(df2.loc[(x-1),'Event2']) == False)
            & ((df2.loc[(x-1),'Event2'] == 'nan') == False)
        ):
            df2.loc[x,'W'] = df2.loc[(x-1),'W']
        else: # if the above isn't true - pass
            pass
    else: 
        pass

Wall time: 6.76 ms

Answer 1

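In Python, the bitwise & operator does not short-circuit. This means every one of your comparisons is evaluated every time, regardless of what the earlier ones evaluated to. Try this demonstration: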

bool(print('a')) & bool(print('b')) & bool(print('c'))

Output:

a
b
c

Compare that with the logical and operator, which short-circuits the chain of comparisons:

bool(print('a')) and bool(print('b')) and bool(print('c'))

Output:

a

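Try replacing & with and to limit the number of comparisons that are made.

Once you've done that, you can experiment with which comparison should come first. Order them by one or both of these: which is most likely to evaluate to False, and which is cheapest to compute.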
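Here is a minimal sketch of this advice. It assumes a small hypothetical DataFrame that uses the question's column names and the default 0..n-1 index. The checks are joined with and, so evaluation stops at the first False. Putting the 'W' check first is only an assumption about which test fails most often. The sketch also swaps .loc for .at, pandas' accessor for single values. Note that and only works on single values. Applied to a whole Series it raises ValueError, which is why vectorized filters have to use &.

```python
import pandas as pd

# Hypothetical sample data; the column names follow the question.
df2 = pd.DataFrame({
    "DEF RANK": [1, 2, 3, 3],
    "OFF RANK": [1, 2, 3, 4],
    "W": ["x", "nan", "nan", "nan"],
    "Event2": ["e", "f", None, "g"],
})

def fill_w_loop(df):
    """Row-by-row version using short-circuiting `and` (assumed check order)."""
    df = df.copy()
    for x in df.index:
        if x == 0:
            continue  # row 0 has no previous row to copy from
        prev = df.at[x - 1, "Event2"]
        # Each `and` stops evaluating at the first False.
        if (df.at[x, "W"] == "nan"
                and df.at[x, "DEF RANK"] == df.at[x, "OFF RANK"]
                and pd.notna(prev)
                and prev != "nan"):
            df.at[x, "W"] = df.at[x - 1, "W"]
    return df

result = fill_w_loop(df2)
print(result["W"].tolist())  # ['x', 'x', 'x', 'nan']

# `and` needs a single True/False, so it fails on a whole Series.
s = pd.Series([True, False])
and_failed = False
try:
    s and s  # calls bool(s), which is ambiguous for a Series
except ValueError:
    and_failed = True
print(and_failed)            # True
print((s & s).tolist())      # element-wise: [True, False]
```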

Answer 2

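When working with pandas DataFrames, the first thing to learn is to look at your data as a whole and try to process it as a whole. So let's dig into your code and see how we can improve it.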

for x in df2.index:   
    # the next if means you essentially want to look at 
    # df2.index > 0 only
    if x > 0: 
        # this if clause chains several 'and' conditions:
        if (
            # this is df2['DEF RANK'].eq(df2['OFF RANK'])
            (df2.loc[x,'DEF RANK'] == df2.loc[x,'OFF RANK']) 

            # this is df2['W'].eq('nan')
            & (df2.loc[x,'W']=='nan')

            # this is the previous row's Event2: df2['Event2'].shift().notnull()
            & (pd.isnull(df2.loc[(x-1),'Event2']) == False)

            # this is df2['Event2'].shift().ne('nan')
            & ((df2.loc[(x-1),'Event2'] == 'nan') == False)
        ):
            # here you copy some position to other position
            df2.loc[x,'W'] = df2.loc[(x-1),'W']

        # if you don't do anything after else, why not delete it?
        else: # if the above isn't true - pass
            pass
    else: 
        pass

So, using all these comments, how can we write code that runs faster? Translating your code directly:

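idx_gt0 = (df2.index > 0)
rank_filters = df2['DEF RANK'].eq(df2['OFF RANK'])
w_isnan = df2['W'].eq('nan')

# the next two conditions are more challenging:
# they need the previous row's 'Event2', i.e. what df2.loc[x-1, 'Event2'] gives.
# shift(1) moves every value down one row (the first row becomes NaN);
# like the loop, this assumes df2 has the default 0..n-1 RangeIndex.
event2_shifted = df2['Event2'].shift(1)

event2_notnull = event2_shifted.notnull()
event2_notnan = event2_shifted.ne('nan')

# now we can merge all filters:
filters = (rank_filters & w_isnan
           & event2_notnull & event2_notnan
           & idx_gt0)

# last assign: copy 'W' from the previous row wherever all filters hold.
# Unlike the loop, this copies the previous row's ORIGINAL 'W', so values
# do not chain across consecutive matching rows.
df2.loc[filters, 'W'] = df2['W'].shift(1)[filters]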

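Of course, this is a literal translation of your code. But as you said, your code doesn't work correctly. So it would help if you provided sample input data and the expected output.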
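Here is a small example of why sample data and expected output matter. This sketch applies the vectorized filters to a hypothetical DataFrame whose values are invented for illustration. Rows 1 and 2 both satisfy the conditions. The vectorized version gives row 2 the original 'W' of row 1, which is 'nan'. The question's row-by-row loop would instead give row 2 the value 'x', because it reads row 1 after already updating it. Which of these results is intended can only be settled by the expected output.

```python
import pandas as pd

# Hypothetical sample input: invented values, the question's column names,
# default 0..n-1 RangeIndex.
df2 = pd.DataFrame({
    "DEF RANK": [1, 2, 3, 3],
    "OFF RANK": [1, 2, 3, 4],
    "W": ["x", "nan", "nan", "nan"],
    "Event2": ["e", "f", None, "g"],
})

prev_event2 = df2["Event2"].shift(1)  # previous row's Event2 (NaN in row 0)

# Every condition is evaluated on whole columns at once, combined with &.
filters = (
    df2["DEF RANK"].eq(df2["OFF RANK"])
    & df2["W"].eq("nan")
    & prev_event2.notna()
    & prev_event2.ne("nan")
    & (df2.index > 0)
)

out = df2.copy()
# Take 'W' from the previous row (original values) where filters hold.
out.loc[filters, "W"] = df2["W"].shift(1)[filters]

print(filters.tolist())     # [False, True, True, False]
print(out["W"].tolist())    # ['x', 'x', 'nan', 'nan']
```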

Answer 3

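Unfortunately, there's no better, faster way.

Here's one suggestion: change every windowWillClose in your code to NSWindowDelegate.

That probably won't help much, though.

You would need to use another programming language, such as C, C++ or Java, because compiled code can run faster than interpreted code. If you're really desperate, you could try assembly language, but I'm not sure it's worth it just for a for loop.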

Iterating faster than a for loop
