Question

I am currently working on a pricing tool and improving the code quality so that it runs faster.

I wrote a function that replaces every word starting with '@' in a string with the matching variable from a given dictionary:

import re
import pandas as pd

def ComputeString(self, String, DictName):
    # Replace each @name token in String with the matching value from DictName,
    # then evaluate the resulting expression.

    # If the variable doesn't exist, we raise an error.
    # If no criteria is given, we treat it as True.
    if pd.isnull(String) or String == True:
        return True
    else:
        try:
            # Raw string so that \w is not treated as an invalid escape sequence.
            # String values are wrapped in quotes; other values are inserted via str().
            return eval(re.sub(r"@(\w+)",
                          (lambda m : str(DictName[m.group(1)]) if not(isinstance(DictName[m.group(1)], str))
                                                                else "'" + DictName[m.group(1)] + "'"),
                          String))
        except (KeyError, NameError) as e:
            print(f"Erreur ComputeString: The key '{e.args[0]}' doesn't exist for {self.DictSiteComponent}.")
            raise

This function works well.
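To make the substitution step easier to inspect on its own, here is a minimal sketch that separates it from the evaluation. The helper name `substitute_variables`, the sample dictionary, and the sample expression are all hypothetical; `repr` is used for string values instead of manual quoting, which is an assumption about acceptable behavior rather than what the function above does.

```python
import re

def substitute_variables(expression, values):
    """Replace each @name token with the matching value from `values`."""
    def replace(match):
        value = values[match.group(1)]  # raises KeyError for unknown names
        # Strings are quoted via repr(); other values are inserted via str().
        return repr(value) if isinstance(value, str) else str(value)
    return re.sub(r"@(\w+)", replace, expression)

# Hypothetical sample data, not taken from the real pricing tool.
values = {"Tariff": "T2", "Power": 36}
expression = substitute_variables("@Tariff == 'T2' and @Power > 30", values)
print(expression)  # 'T2' == 'T2' and 36 > 30

# eval() executes arbitrary code, so this only makes sense for trusted criteria.
result = eval(expression)
print(result)  # True
```

Printing the intermediate expression before calling `eval` can help when a criterion does not evaluate as expected.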

I have a DataFrame like this:

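What I want to do is apply ComputeString to the entire Criteria column of df, then use boolean indexing to keep only the rows where Criteria == True. For example, I have a variable @Tariff, which can be T1, T2, T3, T4 or TP. I want to evaluate it each time to determine which case I am in.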

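I used to do this with a for loop, sending the criteria for each row, but I don't think that is efficient: I am dealing with tens of thousands of rows and calculations, and my df is sometimes larger than the one shown here, with most columns not being used.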
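One possible way to cut down the work, sketched here under the assumption that the dictionary is the same for every row and that many rows share the same criteria string, is to evaluate each distinct criterion once and map the results back onto the column. The function `compute_string`, the column `Component`, and the sample data are hypothetical stand-ins.

```python
import re
import pandas as pd

def compute_string(expression, values):
    """Hypothetical standalone version of ComputeString (no class needed)."""
    if pd.isnull(expression) or expression is True:
        return True
    def replace(match):
        value = values[match.group(1)]
        return repr(value) if isinstance(value, str) else str(value)
    # eval() is only acceptable here because the criteria are assumed trusted.
    return bool(eval(re.sub(r"@(\w+)", replace, expression)))

# Hypothetical sample data.
df = pd.DataFrame({
    "Component": ["A", "B", "C", "D"],
    "Criteria": ["@Tariff == 'T1'", "@Tariff == 'T2'", None, "@Tariff == 'T1'"],
})
my_dict = {"Tariff": "T1"}

# Evaluate each distinct, non-null criterion exactly once.
unique_criteria = df["Criteria"].dropna().unique()
results = {c: compute_string(c, my_dict) for c in unique_criteria}

# Map the cached results back; missing criteria count as True.
mask = df["Criteria"].map(lambda c: True if pd.isnull(c) else results[c]).astype(bool)
filtered = df[mask]
print(filtered)
```

Whether this helps depends on how many distinct criteria there are; if almost every row has a different string, the saving is small.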

I tried df['Criteria'].apply(self.ComputeString(df['Criteria'], MyDict), axis = 1), but it does not work and I get this error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). It fails at the first if condition.
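A likely cause of that error: `self.ComputeString(df['Criteria'], MyDict)` is called immediately with the whole Series, before `apply` ever runs, so `String == True` produces a Series and the `if` cannot reduce it to a single truth value. `Series.apply` expects the function itself and calls it once per cell; it also works element by element and does not need `axis`. The sketch below shows one way to write the call. The class `Pricer`, the `Price` column, and the sample data are hypothetical.

```python
import re
import pandas as pd

class Pricer:
    """Hypothetical stand-in for the class that owns ComputeString."""

    def ComputeString(self, String, DictName):
        if pd.isnull(String) or String is True:
            return True
        def replace(match):
            value = DictName[match.group(1)]
            return repr(value) if isinstance(value, str) else str(value)
        # eval() is only acceptable because the criteria are assumed trusted.
        return eval(re.sub(r"@(\w+)", replace, String))

# Hypothetical sample data.
df = pd.DataFrame({
    "Criteria": ["@Tariff == 'T1'", "@Tariff in ('T2', 'T3')", None],
    "Price": [10.0, 20.0, 30.0],
})
MyDict = {"Tariff": "T1"}
pricer = Pricer()

# Pass a callable; apply invokes it once per cell with a single value.
mask = df["Criteria"].apply(lambda c: pricer.ComputeString(c, MyDict)).astype(bool)

# Boolean indexing keeps only the rows whose criterion evaluated to True.
kept = df[mask]
print(kept)
```

Note that `apply` here is still a Python-level loop over the cells, so it is mainly tidier than an explicit for loop rather than fundamentally faster.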

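Could you point me in the right direction? Also, if you know more about this: is this a case where I should not use .apply, as discussed in "Pandas: How can I use the apply() function for a single column?"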

Thanks

Python Pandas DataFrame - Boolean indexing with a function applied to a whole column

0 answers: