Create lists from a pandas DataFrame to get the distinct values in columns

Time: 2018-06-20 13:59:11

Tags: python pandas dataframe

From the following pandas DataFrame:

    df = pd.DataFrame({'Id': [102,102,102,303,303,944,944,944,944],
                       'A': [1.2,1.2,1.2,0.8,0.8,2.0,2.0,2.0,2.0],
                       'B': [1.8,1.8,1.8,1.0,1.0,2.2,2.2,2.2,2.2],
                       'A_scored_time': [10,25,0,33,0,40,0,90,0],
                       'B_scored_time': [0,0,30,0,41,0,75,0,95]})

I am trying to create lists from the ['A_scored_time','B_scored_time'] columns for each unique Id (ignoring the 0 entries), to get the following lists:

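    Id(102): A_Time = [10, 25],  B_Time = [30]
    Id(303): A_Time = [33],      B_Time = [41]
    Id(944): A_Time = [40, 90],  B_Time = [75, 95]

These lists will then be used in the function below. The function is applied for each i in range(number of distinct Ids); this df has 3 distinct Ids. For each i, y is the probability array built from that Id's A and B values: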

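    import numpy as np

    x1 = np.array([1, 0, 0])  # outcome vector: nobody scored in this timeslot
    x2 = np.array([0, 1, 0])  # outcome vector: A scored in this timeslot
    x3 = np.array([0, 0, 1])  # outcome vector: B scored in this timeslot

    k = 100               # constant
    total_timeslot = 100  # same as k
    A_Time = []           # filled per Id, e.g. [10, 25] for Id 102
    B_Time = []           # filled per Id, e.g. [30] for Id 102

    # A and B are the current Id's values from columns 'A' and 'B'
    y = np.array([1 - (A + B) / k, A / k, B / k])

    def sum_squared_diff(x1, x2, x3, y):
        ssd = []
        for t in range(total_timeslot):  # t, so the constant k is not shadowed
            if t in A_Time:
                ssd.append(sum((x2 - y) ** 2))
            elif t in B_Time:
                ssd.append(sum((x3 - y) ** 2))
            else:
                ssd.append(sum((x1 - y) ** 2))
        return ssd

The output is an array of length k. Once I have it, I will sum over all n arrays (n = number of distinct Ids). What I am after is the following result for df:

    Id(102) = sum(sum_squared_diff(x1, x2, x3, y)) = 5.872800000000018
    Id(303) = sum(sum_squared_diff(x1, x2, x3, y)) = 3.9407999999999896
    Id(944) = sum(sum_squared_diff(x1, x2, x3, y)) = 7.760800000000006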
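A minimal sketch of the whole pipeline, assuming that `A` and `B` are constant within each Id (as they are in this df) and that a 0 in the `*_scored_time` columns means "no score" rather than a score at timeslot 0. The helper `ssd_for_id` and the `totals` dict are hypothetical names; the loop body follows the question's `sum_squared_diff`, but `A_Time` and `B_Time` are passed in as arguments instead of being read from globals.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Id': [102,102,102,303,303,944,944,944,944],
                   'A': [1.2,1.2,1.2,0.8,0.8,2.0,2.0,2.0,2.0],
                   'B': [1.8,1.8,1.8,1.0,1.0,2.2,2.2,2.2,2.2],
                   'A_scored_time': [10,25,0,33,0,40,0,90,0],
                   'B_scored_time': [0,0,30,0,41,0,75,0,95]})

k = 100                   # constant, also used as the number of timeslots
x1 = np.array([1, 0, 0])  # outcome vector: nobody scored
x2 = np.array([0, 1, 0])  # outcome vector: A scored
x3 = np.array([0, 0, 1])  # outcome vector: B scored


def sum_squared_diff(x1, x2, x3, y, A_Time, B_Time, total_timeslot=k):
    """Squared distance between y and the observed outcome, one value per timeslot."""
    ssd = []
    for t in range(total_timeslot):
        if t in A_Time:
            ssd.append(sum((x2 - y) ** 2))
        elif t in B_Time:
            ssd.append(sum((x3 - y) ** 2))
        else:
            ssd.append(sum((x1 - y) ** 2))
    return ssd


def ssd_for_id(group):
    # Assumption: A and B are identical on every row of an Id, so take the first row.
    A, B = group['A'].iloc[0], group['B'].iloc[0]
    y = np.array([1 - (A + B) / k, A / k, B / k])
    # Drop the 0 placeholders, like the groupby answer does.
    A_Time = group.loc[group['A_scored_time'] != 0, 'A_scored_time'].tolist()
    B_Time = group.loc[group['B_scored_time'] != 0, 'B_scored_time'].tolist()
    return sum(sum_squared_diff(x1, x2, x3, y, A_Time, B_Time))


# One total per distinct Id, then the sum over all n Ids.
totals = {int(i): float(ssd_for_id(g)) for i, g in df.groupby('Id')}
print(totals)
print(sum(totals.values()))
```

Running it should reproduce the three totals from the question (5.8728, 3.9408 and 7.7608, up to floating-point rounding), and 17.5744 for the sum over all Ids.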

1 Answer:

Answer 0 (score: 3)

To answer the question in the title, use:

    df.groupby('Id')[['A_scored_time','B_scored_time']]\
      .agg(lambda x: x[x != 0].tolist())\
      .reset_index()

Output:

    Id A_scored_time B_scored_time
0  102      [10, 25]          [30]
1  303          [33]          [41]
2  944      [40, 90]      [75, 95]
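If only the per-Id totals are needed, the lists themselves are not strictly required: every timeslot contributes one of three fixed terms, so the loop can be replaced by counting. With n_A = len(A_Time) and n_B = len(B_Time), total = n_A * ||x2 - y||^2 + n_B * ||x3 - y||^2 + (k - n_A - n_B) * ||x1 - y||^2. This is an alternative to the question's loop, not part of the answer above, and it assumes every non-zero scoring time is distinct, lies in range(k), and does not appear in both columns. The names `n_a`, `n_b`, `d1`, `d2` and `d3` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Id': [102,102,102,303,303,944,944,944,944],
                   'A': [1.2,1.2,1.2,0.8,0.8,2.0,2.0,2.0,2.0],
                   'B': [1.8,1.8,1.8,1.0,1.0,2.2,2.2,2.2,2.2],
                   'A_scored_time': [10,25,0,33,0,40,0,90,0],
                   'B_scored_time': [0,0,30,0,41,0,75,0,95]})
k = 100  # constant, also used as the number of timeslots

# Number of non-zero scoring times per Id, i.e. len(A_Time) and len(B_Time).
n_a = (df['A_scored_time'] != 0).groupby(df['Id']).sum()
n_b = (df['B_scored_time'] != 0).groupby(df['Id']).sum()

# Assumption: A and B are constant within each Id.
A = df.groupby('Id')['A'].first()
B = df.groupby('Id')['B'].first()

# Components of y for every Id at once (Series aligned on the Id index).
y0, y1, y2 = 1 - (A + B) / k, A / k, B / k

# Squared distance from y to each outcome vector x1, x2, x3.
d1 = (1 - y0) ** 2 + y1 ** 2 + y2 ** 2
d2 = y0 ** 2 + (1 - y1) ** 2 + y2 ** 2
d3 = y0 ** 2 + y1 ** 2 + (1 - y2) ** 2

totals = n_a * d2 + n_b * d3 + (k - n_a - n_b) * d1
print(totals)        # one total per Id
print(totals.sum())  # sum over all n Ids
```

Because all Series share the `Id` index, pandas aligns them automatically, so every Id is handled in one vectorized expression instead of a 100-step Python loop per Id.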