Question

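I'm trying to implement the Gale-Shapley algorithm in Python to produce a stable matching between doctors and hospitals. To do this, I gave every doctor and every hospital a randomly chosen preference, represented by a number.

Dataframe consisting of preferences

Next, I wrote code that rates every hospital for one particular doctor (identified by ID) and then ranks those ratings, which creates two new columns. To rate a match, I take the absolute value of the difference between the two preferences, where a smaller absolute value is better. This is the formula for the first doctor: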

doctors_sorted_by_preference['Rating of Hospital by Doctor 1'] = abs(doctors_sorted_by_preference['Preference Doctor'].iloc[0] - doctors_sorted_by_preference['Preference Hospital'])
doctors_sorted_by_preference['Rank of Hospital by Doctor 1'] = doctors_sorted_by_preference['Rating of Hospital by Doctor 1'].rank()

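This results in the following table: Dataframe consisting of preferences plus rating and rank by doctor

So, as the ranks show, Doctor 1 prefers the first hospital over all the other hospitals.

Now I'd like to repeat this for every doctor with a loop that creates the two new columns per doctor and adds them to my dataframe, but I don't know how to do that. I could type out the same code for all 10 doctors, but that becomes impractical if I grow the dataset to 1000 doctors and hospitals. There must be a better way... This is the same code for Doctor 2: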

doctors_sorted_by_preference['Rating of Hospital by Doctor 2'] = abs(doctors_sorted_by_preference['Preference Doctor'].iloc[1] - doctors_sorted_by_preference['Preference Hospital'])
doctors_sorted_by_preference['Rank of Hospital by Doctor 2'] = doctors_sorted_by_preference['Rating of Hospital by Doctor 2'].rank()

Thanks in advance!

Answer 1

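You can also append the values to a list and then write that list to the dataframe. If your dataset is large, appending to a list will be faster.

For readability, I refer to the dataframe as df: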

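import pandas as pd  # df is assumed to already hold the two preference columns

for i in range(len(df['Preference Doctor'])):
    # Rate every hospital for doctor i+1: a smaller absolute difference is better
    list1 = []
    for j in df['Preference Hospital']:
        list1.append(abs(df['Preference Doctor'].iloc[i] - j))
    # Assign the list directly so values are placed by position, not aligned on index labels
    df['Rating of Hospital by Doctor_' + str(i+1)] = list1
    df['Rank of Hospital by Doctor_' + str(i+1)] = df['Rating of Hospital by Doctor_' + str(i+1)].rank()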
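
As a possible alternative to the nested loop, the same ratings can be computed for all doctors at once with NumPy broadcasting. This is only a minimal sketch: the sample values in df are made up for illustration, and it assumes the column names from the question. Note that it groups all rating columns before all rank columns, rather than interleaving them per doctor.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "Preference Doctor": [3, 7, 1],
    "Preference Hospital": [2, 9, 4],
})

doctor_pref = df["Preference Doctor"].to_numpy()
hospital_pref = df["Preference Hospital"].to_numpy()

# ratings[h, d] = |preference of hospital h - preference of doctor d|
ratings = np.abs(hospital_pref[:, None] - doctor_pref[None, :])

n = len(df)
rating_cols = [f"Rating of Hospital by Doctor_{d + 1}" for d in range(n)]
rank_cols = [f"Rank of Hospital by Doctor_{d + 1}" for d in range(n)]

rating_df = pd.DataFrame(ratings, index=df.index, columns=rating_cols)

# rank() works column by column, so each doctor ranks the hospitals independently.
rank_df = rating_df.rank()
rank_df.columns = rank_cols

# Attach all new columns in one step instead of one assignment per doctor.
result = pd.concat([df, rating_df, rank_df], axis=1)
print(result)
```

Building the new columns in separate frames and concatenating once avoids inserting columns into df one at a time, which may matter when there are many doctors.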
