Question

I have a numpy array named arr that contains 1154 elements.

array([502, 502, 503, ..., 853, 853, 853], dtype=int64)

I have a dataframe named df:

    team    Count
0   512     11
1   513     21
2   515     18
3   516     8
4   517     4

How do I get the subset of the dataframe df that contains only the values found in the array arr?

For example:

team         count
arr1_value1    45
arr1_value2    67

To make this question clearer: I have a numpy array ['45', '55', '65']

My dataframe is as follows:

team  count
34      156
45      189
53       90
65       99
23       77
55       91

I need a new dataframe as follows:

team    count
 45      189
 55       91
 65       99

Answer 1

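I don't know whether it is a typo that your array values look like strings. Assuming they are not strings and are in fact ints, you can filter your df by calling isin: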

In [6]:

a = np.array([45, 55, 65])
df[df.team.isin(a)]
Out[6]:
   team  count
1    45    189
3    65     99
5    55     91
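If the array really does hold strings, as written in the question, one option is to cast it to int before calling isin. The following is a minimal sketch built on the question's sample data; the names `arr_str` and `result` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question: team is an int column
df = pd.DataFrame({'team': [34, 45, 53, 65, 23, 55],
                   'count': [156, 189, 90, 99, 77, 91]})

# The array as posted in the question: strings, not ints (assumed not a typo)
arr_str = np.array(['45', '55', '65'])

# Cast the strings to int so they are comparable with the int column,
# then keep only the rows whose team appears in the array
result = df[df['team'].isin(arr_str.astype(int))]
print(result)
```

The rows come back in the dataframe's original order, not in the order of the array.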

Answer 2

You can use the DataFrame.loc method.

Using your example (note that team is the index):

arr = np.array(['45', '55', '65'])
frame = pd.DataFrame([156, 189, 90, 99, 77, 91], index=['34', '45', '53', '65', '23', '55'])
ans = frame.loc[arr]

This kind of indexing is type sensitive, so if frame.index holds ints, make sure your indexing array is also of type int and not str as in this example.
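For the case where team is stored as an int column, the same idea can be sketched as below. This assumes the question's sample data and assumes that every value in the array exists in the index; `.loc` with a list of labels raises a KeyError if any label is missing.

```python
import numpy as np
import pandas as pd

arr = np.array(['45', '55', '65'])   # strings, as in the question
df = pd.DataFrame({'team': [34, 45, 53, 65, 23, 55],
                   'count': [156, 189, 90, 99, 77, 91]})

# Move team into the index, then look up labels of the matching type (int)
frame = df.set_index('team')
ans = frame.loc[arr.astype(int)]
print(ans)
```

Unlike boolean filtering, label lookup returns the rows in the order of the array (45, 55, 65), which matches the layout requested in the question.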

Answer 3

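I am answering the question asked after "To make this question clearer". As a side note: the first 4 lines could have been provided by you, so that I would not have to type them myself, which can also introduce errors or misunderstandings.

The idea is to create a Series to use as an index and then build a new dataframe from that index. I have only just started using pandas, so perhaps this can be done more efficiently.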

import numpy as np
import pandas as pd

# starting with the df and teams as string
df = pd.DataFrame(data={'team': [34, 45, 53, 65, 23, 55], 'count': [156, 189, 90, 99, 77, 91]})
teams = np.array(['45', '55', '65'])

# we want the team number as int
teams_int = [int(t) for t in teams]

# mini function to check whether the team is to be kept
def filter_teams(x):
    return x in teams_int

# create the series as index and only keep those values from our original df
index = df['team'].apply(filter_teams)
df_filtered = df[index]

It returns this dataframe:

count  team
1    189    45
3     99    65
5     91    55

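Note that in this case df_filtered uses 1, 3, 5 as its index (these are the index entries of the original dataframe). Your question is unclear on this point, because the index is not shown to us.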
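If a fresh 0, 1, 2 index is wanted instead of the original row labels, the filtered frame can be renumbered. The sketch below assumes the same sample data and swaps the per-element apply for a vectorized isin mask; the name `mask` is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'team': [34, 45, 53, 65, 23, 55],
                   'count': [156, 189, 90, 99, 77, 91]})
teams = np.array(['45', '55', '65'])

# Boolean Series aligned with df: True where the team is one we keep
mask = df['team'].isin(teams.astype(int))

# Filter, then discard the old row labels and renumber from 0
df_filtered = df[mask].reset_index(drop=True)
print(df_filtered)
```

Passing drop=True keeps the old labels from being added back as an extra column.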

Getting a subset of a pandas dataframe from the values in an array
