Question

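Consider

df['something'].unique()

This yields the unique items in a variable. Let's treat it as an array of unique items:

array(['aabb','aacc','aadd','bbcc'])

Now I want to check how many items in the array start with 'aa'.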

Answer 1

A pandas solution using Series.str.startswith, with sum to count the True values:

print(pd.Series(df['something'].unique()).str.startswith('aa').sum())
3

An alternative is Series.drop_duplicates, in which case the Series constructor is not needed:

print(df['something'].drop_duplicates().str.startswith('aa').sum())
3

Or, as a pure Python solution, pass a generator of startswith results to sum.
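The pure Python variant can be sketched roughly as follows. This is a minimal sketch, not the answerer's exact code: unique_items is a hypothetical stand-in for df['something'].unique(), so the snippet runs without pandas.

```python
# Hypothetical stand-in for df['something'].unique(); any iterable of strings works.
unique_items = ['aabb', 'aacc', 'aadd', 'bbcc']

# startswith returns True or False, and sum treats True as 1 and False as 0,
# so summing the generator counts the items with the prefix.
count_aa = sum(item.startswith('aa') for item in unique_items)

print(count_aa)  # 3
```

With a real DataFrame, the same generator would presumably iterate over df['something'].unique() instead of the stand-in list.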

Answer 2

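You can use the re module to look for any pattern (not just 'aa'). re.match only matches at the beginning of the string, so it works as a "starts with" test.

For example, if you have the array arr = ['aabb','aacc','aadd','bbcc'], this line counts the elements that start with 'aa':

import re
len([word for word in arr if re.match(r'aa', word)])

This gives an output of 3, while

len([word for word in arr if re.match(r'bb', word)])

gives 1.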
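To reuse the same idea for several patterns, the list comprehension could be wrapped in a small helper. This is only a sketch: count_matching is a hypothetical name introduced here, not something provided by the re module.

```python
import re

def count_matching(pattern, words):
    """Count the words that match `pattern` at their beginning."""
    compiled = re.compile(pattern)  # compile once, reuse for every word
    return sum(1 for word in words if compiled.match(word))

arr = ['aabb', 'aacc', 'aadd', 'bbcc']

print(count_matching(r'aa', arr))        # 3
print(count_matching(r'bb', arr))        # 1
print(count_matching(r'[ab]{2}c', arr))  # 2 ('aacc' and 'bbcc')
```

For a fixed literal prefix, str.startswith is simpler; the regular expression pays off when the prefix is a pattern, as in the last call.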

Answer 3

You can use the startswith() method. The code is:

number_of_aa = len([x for x in df['something'].unique() if x.startswith('aa')])

This builds a filtered list of the values that start with aa and then uses len to get the count. If you don't need to keep those values, you can produce True / False for each item and sum them instead:

number_of_aa = sum(x.startswith('aa') for x in df['something'].unique())
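One caveat: unique() keeps missing values such as NaN, and calling startswith on a non-string raises AttributeError. A minimal sketch of a guard, assuming a hypothetical stand-in list in place of df['something'].unique():

```python
# Hypothetical stand-in for df['something'].unique() when the column has missing values.
unique_items = ['aabb', 'aacc', None, float('nan'), 'bbcc']

# Check the type first so startswith is only called on real strings.
number_of_aa = sum(isinstance(x, str) and x.startswith('aa') for x in unique_items)

print(number_of_aa)  # 2
```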