Getting combinations of elements from different pandas rows

Time: 2017-04-18 10:03:10

Tags: python pandas dataframe

Suppose I have a DataFrame like this:

Date Artist           percent_gray percent_blue percent_black percent_red 
33   Leonardo             22           33            36          46
45   Leonardo             23           47            23          14
46   Leonardo             13           34            33          12
23   Michelangelo         28           19            38          25
25   Michelangelo         24           56            55          13
26   Michelangelo         21           22            45          13
13   Titian               24           17            23          22
16   Titian               45           43            44          13 
19   Titian               17           45            56          13
24   Raphael              34           34            34          45
27   Raphael              31           22            25          67
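
To make the example reproducible, the table above can be built as follows. This is a minimal sketch: the values are copied from the table, and `df` is simply the variable name the answers below assume.

```python
import pandas as pd

# Sample data copied row by row from the table above.
df = pd.DataFrame({
    "Date": [33, 45, 46, 23, 25, 26, 13, 16, 19, 24, 27],
    "Artist": ["Leonardo"] * 3 + ["Michelangelo"] * 3
              + ["Titian"] * 3 + ["Raphael"] * 2,
    "percent_gray": [22, 23, 13, 28, 24, 21, 24, 45, 17, 34, 31],
    "percent_blue": [33, 47, 34, 19, 56, 22, 17, 43, 45, 34, 22],
    "percent_black": [36, 23, 33, 38, 55, 45, 23, 44, 56, 34, 25],
    "percent_red": [46, 14, 12, 25, 13, 13, 22, 13, 13, 45, 67],
})
print(df)
```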

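I want to get the maximum color difference between different pictures by the same artist. Different colors may be compared with each other, e.g. percent_gray with percent_blue. For Leonardo, for example, the largest difference is percent_red (date:46) - percent_blue (date:45) = 12 - 47 = -35. I want to see how this develops over time, so I only want to compare an artist's newer pictures with the older ones (in this case, the third picture can be compared with the first and second, and the second picture only with the first) and get the maximum difference. So the DataFrame should look like this: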

Date Artist          max_d 
33   Leonardo         NaN   
45   Leonardo         -32   
46   Leonardo         -35    
23   Michelangelo     NaN   
25   Michelangelo      37 
26   Michelangelo     -43   
13   Titian           NaN 
16   Titian            28   
19   Titian            43
24   Raphael          NaN   
27   Raphael           33

I think I have to use groupby, but I can't get the output I want.

2 Answers:

Answer 0: (score: 2)

You can use:

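import numpy as np

# With real data, sort first so each artist's rows are in date order;
# the sample already is, and the output below keeps its original row order
df = df.sort_values(['Artist', 'Date'])
# Row-wise min and max over the color columns (everything after Date and Artist)
mi = df.iloc[:,2:].min(axis=1)
ma = df.iloc[:,2:].max(axis=1)
# Max and min of the previous picture by the same artist
ma1 = ma.groupby(df['Artist']).shift()
mi1 = mi.groupby(df['Artist']).shift()
# Most negative and most positive differences against the previous picture
mad1 = mi - ma1
mad2 = ma - mi1
# Keep whichever difference has the larger absolute value
df['max_d'] = np.where(mad1.abs() > mad2.abs(), mad1, mad2)
print (df)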
    Date        Artist  percent_gray  percent_blue  percent_black  \
0     33      Leonardo            22            33             36   
1     45      Leonardo            23            47             23   
2     46      Leonardo            13            34             33   
3     23  Michelangelo            28            19             38   
4     25  Michelangelo            24            56             55   
5     26  Michelangelo            21            22             45   
6     13        Titian            24            17             23   
7     16        Titian            45            43             44   
8     19        Titian            17            45             56   
9     24       Raphael            34            34             34   
10    27       Raphael            31            22             25   

    percent_red  max_d  
0            46    NaN  
1            14  -32.0  
2            12  -35.0  
3            25    NaN  
4            13   37.0  
5            13  -43.0  
6            22    NaN  
7            13   28.0  
8            13   43.0  
9            45    NaN  
10           67   33.0  

Explanation (using new columns):

# get the min and max of each row
df['min'] = df.iloc[:,2:].min(axis=1)
df['max'] = df.iloc[:,2:].max(axis=1)
# get the previous row's min and max within each Artist
df['max1'] = df.groupby('Artist')['max'].shift()
df['min1'] = df.groupby('Artist')['min'].shift()
# get the differences
df['max_d1'] = df['min'] - df['max1']
df['max_d2'] = df['max'] - df['min1']
# pick the difference with the larger absolute value
df['max_d'] = np.where(df['max_d1'].abs() > df['max_d2'].abs(), df['max_d1'], df['max_d2'])
print (df)
    percent_red  min  max  max1  min1  max_d1  max_d2  max_d  
0            46   22   46   NaN   NaN     NaN     NaN    NaN  
1            14   14   47  46.0  22.0   -32.0    25.0  -32.0  
2            12   12   34  47.0  14.0   -35.0    20.0  -35.0  
3            25   19   38   NaN   NaN     NaN     NaN    NaN  
4            13   13   56  38.0  19.0   -25.0    37.0   37.0  
5            13   13   45  56.0  13.0   -43.0    32.0  -43.0  
6            22   17   24   NaN   NaN     NaN     NaN    NaN  
7            13   13   45  24.0  17.0   -11.0    28.0   28.0  
8            13   13   56  45.0  13.0   -32.0    43.0   43.0  
9            45   34   45   NaN   NaN     NaN     NaN    NaN  
10           67   22   67  45.0  34.0   -23.0    33.0   33.0  

If you use the second solution (the one from the explanation), drop the helper columns:

df = df.drop(['min','max','max1','min1','max_d1', 'max_d2'], axis=1)
print (df)
    Date        Artist  percent_gray  percent_blue  percent_black  \
0     33      Leonardo            22            33             36   
1     45      Leonardo            23            47             23   
2     46      Leonardo            13            34             33   
3     23  Michelangelo            28            19             38   
4     25  Michelangelo            24            56             55   
5     26  Michelangelo            21            22             45   
6     13        Titian            24            17             23   
7     16        Titian            45            43             44   
8     19        Titian            17            45             56   
9     24       Raphael            34            34             34   
10    27       Raphael            31            22             25   

    percent_red  max_d  
0            46    NaN  
1            14  -32.0  
2            12  -35.0  
3            25    NaN  
4            13   37.0  
5            13  -43.0  
6            22    NaN  
7            13   28.0  
8            13   43.0  
9            45    NaN  
10           67   33.0  
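
Note that shift() only looks at the immediately preceding picture. If each picture should be compared with all earlier pictures by the same artist, as the question describes, one possible variation is to take a running max/min per artist before shifting. The following is a sketch and not part of the original answer; the names `color_cols`, `prev_max`, `prev_min`, `low` and `high` are made up for the illustration, and the sample data is rebuilt so the block runs on its own.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Date": [33, 45, 46, 23, 25, 26, 13, 16, 19, 24, 27],
    "Artist": ["Leonardo"] * 3 + ["Michelangelo"] * 3
              + ["Titian"] * 3 + ["Raphael"] * 2,
    "percent_gray": [22, 23, 13, 28, 24, 21, 24, 45, 17, 34, 31],
    "percent_blue": [33, 47, 34, 19, 56, 22, 17, 43, 45, 34, 22],
    "percent_black": [36, 23, 33, 38, 55, 45, 23, 44, 56, 34, 25],
    "percent_red": [46, 14, 12, 25, 13, 13, 22, 13, 13, 45, 67],
})

# Assumption: every color column starts with "percent_".
color_cols = [c for c in df.columns if c.startswith("percent_")]

# Rows must be in date order within each artist for the running extremes.
df = df.sort_values(["Artist", "Date"])

row_min = df[color_cols].min(axis=1)
row_max = df[color_cols].max(axis=1)

# Running max/min over an artist's pictures so far, then shifted by one row
# so that a picture is only compared with strictly earlier pictures.
prev_max = row_max.groupby(df["Artist"]).cummax().groupby(df["Artist"]).shift()
prev_min = row_min.groupby(df["Artist"]).cummin().groupby(df["Artist"]).shift()

low = row_min - prev_max    # most negative possible difference
high = row_max - prev_min   # most positive possible difference

# Keep whichever has the larger absolute value (NaN for an artist's first picture).
df["max_d"] = np.where(low.abs() > high.abs(), low, high)
print(df.sort_index()[["Date", "Artist", "max_d"]])
```

On the sample data this produces the same max_d values as the shift-only version, because the extreme values happen to come from adjacent pictures; the two can differ on other data.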

Answer 1: (score: 1)

How about a custom apply function? Does this help?
