Assigning values to a pandas MultiIndex DataFrame

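Time: 2018-05-14 19:16:45

Tags: python pandas dataframe assign

I want to reassign the values in specific rows and MultiIndex columns of a pandas DataFrame df to the non-NaN values that have already been computed and stored in a slightly smaller, masked subset DataFrame df_sub.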

df =
    A                                                           B        
      0     1     2     3     4     5     6     7     8     9      0     1     2     3     4     5     6     7     8     9        
0   1.0   2.0   3.0   4.0   5.0   6.0   7.0   8.0   9.0  10.0  -51.0 -50.0 -49.0 -48.0 -47.0 -46.0 -45.0 -44.0 -43.0 -42.0   
1  11.0  12.0  13.0  14.0  15.0  16.0  17.0  18.0  19.0  20.0  -41.0 -40.0 -39.0 -38.0 -37.0 -36.0 -35.0 -34.0 -33.0 -32.0   
2  21.0  22.0  23.0  24.0  25.0  26.0  27.0  28.0  29.0  30.0  -31.0 -30.0 -29.0 -28.0 -27.0 -26.0 -25.0 -24.0 -23.0 -22.0   
3  31.0  32.0  33.0  34.0  35.0  36.0  37.0  38.0  39.0  40.0  -21.0 -20.0 -19.0 -18.0 -17.0 -16.0 -15.0 -14.0 -13.0 -12.0   
4  41.0  42.0  43.0  44.0  45.0  46.0  47.0  48.0  49.0  50.0  -11.0 -10.0  -9.0  -8.0  -7.0  -6.0  -5.0  -4.0  -3.0  -2.0  

df_sub =
      0     1     2     3     4     5     6     7     8     9 
1    NaN   NaN   NaN   NaN   NaN   0.3   0.2   0.1   NaN   NaN
3    NaN   NaN   NaN   0.6   0.9   0.7   NaN   NaN   NaN   NaN

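My goal is to obtain the result below for df.loc[:, 'B'], where the non-NaN values in df_sub replace the values in the corresponding rows and columns of df (i.e., df.loc[1, pd.IndexSlice['B', 5:7]] = df_sub.loc[1, 5:7] and df.loc[3, pd.IndexSlice['B', 3:5]] = df_sub.loc[3, 3:5]):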

df.loc[:,'B'] =
      0     1     2     3     4     5     6     7     8     9
0 -51.0 -50.0 -49.0 -48.0 -47.0 -46.0 -45.0 -44.0 -43.0 -42.0
1 -41.0 -40.0 -39.0 -38.0 -37.0   0.3   0.2   0.1 -33.0 -32.0
2 -31.0 -30.0 -29.0 -28.0 -27.0 -26.0 -25.0 -24.0 -23.0 -22.0
3 -21.0 -20.0 -19.0   0.6   0.9   0.7 -15.0 -14.0 -13.0 -12.0
4 -11.0 -10.0  -9.0  -8.0  -7.0  -6.0  -5.0  -4.0  -3.0  -2.0

However, instead of the desired values, I get NaNs:

df.loc[:,'B'] =
      0     1     2     3     4     5     6     7     8     9
0 -51.0 -50.0 -49.0 -48.0 -47.0 -46.0 -45.0 -44.0 -43.0 -42.0
1 -41.0 -40.0 -39.0 -38.0 -37.0   NaN   NaN   NaN -33.0 -32.0
2 -31.0 -30.0 -29.0 -28.0 -27.0 -26.0 -25.0 -24.0 -23.0 -22.0
3 -21.0 -20.0 -19.0   NaN   NaN   NaN -15.0 -14.0 -13.0 -12.0
4 -11.0 -10.0  -9.0  -8.0  -7.0  -6.0  -5.0  -4.0  -3.0  -2.0

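My simple example code is included below. From the diagnostics, everything appears to run as expected: 1) the non-NaN values and their indices are identified for each row of df_sub, 2) the slice of the original df looks correct, and 3) the assignment goes through without complaint or a "setting on a copy" warning.

  1. What is the appropriate way to achieve the goal?
  2. Why does this fail?
  3. Is there a more compact, more efficient way to perform the assignment?

Simplified example: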

    import numpy as np
    import pandas as pd

    # Create data for example case
    idf = pd.MultiIndex.from_product([['A', 'B'], np.arange(0, 10)])
    df = pd.DataFrame(np.concatenate((np.arange(1., 51.).reshape(5, 10),
                      np.arange(-51., -1.).reshape(5, 10)), axis=1),
                      index=np.arange(0, 5), columns=idf)
    df_sub = pd.DataFrame([[np.nan, np.nan, np.nan, np.nan, np.nan, 0.5, 0.6, 0.7, np.nan, np.nan],
                          [np.nan, np.nan, np.nan, 0.3, 0.4, 0.5, np.nan, np.nan, np.nan, np.nan]],
                          index=[1, 3], columns=np.arange(0, 10))
    dfsub_idx = df_sub.index

    # Perform assignments
    for (idx, row) in df_sub.iterrows():
        arr = row.index[~row.isnull()]  # column labels of the non-NaN entries
        print('row {}: \n{}'.format(idx, row))
        print('non-nan indices: {}\n'.format(arr))
        print('df before mod: \n{}'.format(df.loc[idx, pd.IndexSlice['B', arr.tolist()]]))
        # This assignment is where the NaNs reported above appear
        df.loc[idx, pd.IndexSlice['B', arr.tolist()]] = row[arr]
        print('df after mod: \n{}'.format(df.loc[idx, pd.IndexSlice['B', arr.tolist()]]))
    

2 answers:

Answer 0 (score: 2)

You should add .values at the end, after the .iloc selection on df_sub.
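As a minimal sketch of this suggestion applied to the question's loop (an adaptation, not the answerer's own code): assigning a Series makes pandas align it by label, and the integer labels of a df_sub row do not match the ('B', n) tuples of the target columns, which is a plausible source of the NaNs. Passing the underlying array through .values assigns by position instead. The names vals and target are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# Rebuild the example data from the question
cols = pd.MultiIndex.from_product([['A', 'B'], np.arange(0, 10)])
df = pd.DataFrame(
    np.concatenate((np.arange(1., 51.).reshape(5, 10),
                    np.arange(-51., -1.).reshape(5, 10)), axis=1),
    index=np.arange(0, 5), columns=cols)
df_sub = pd.DataFrame(
    [[np.nan] * 5 + [0.5, 0.6, 0.7] + [np.nan] * 2,
     [np.nan] * 3 + [0.3, 0.4, 0.5] + [np.nan] * 4],
    index=[1, 3], columns=np.arange(0, 10))

for idx, row in df_sub.iterrows():
    vals = row.dropna()  # non-NaN entries of this row (hypothetical helper name)
    target = pd.IndexSlice['B', vals.index.tolist()]
    # .values strips the labels, so the numbers are written by position
    # instead of being aligned against the ('B', n) column tuples
    df.loc[idx, target] = vals.values

print(df.loc[[1, 3], 'B'])
```

Only the selected cells under B change; columns under A are left as they were.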

答案 1 :(得分:2)

pandas.DataFrame.align and pandas.DataFrame.fillna

Inline

Using the level parameter

pd.DataFrame.fillna(*df_sub.align(df, level=1))

      A                                                           B                                                      
      0     1     2     3     4     5     6     7     8     9     0     1     2     3     4     5     6     7     8     9
0   1.0   2.0   3.0   4.0   5.0   6.0   7.0   8.0   9.0  10.0 -51.0 -50.0 -49.0 -48.0 -47.0 -46.0 -45.0 -44.0 -43.0 -42.0
1  11.0  12.0  13.0  14.0  15.0   0.5   0.6   0.7  19.0  20.0 -41.0 -40.0 -39.0 -38.0 -37.0   0.5   0.6   0.7 -33.0 -32.0
2  21.0  22.0  23.0  24.0  25.0  26.0  27.0  28.0  29.0  30.0 -31.0 -30.0 -29.0 -28.0 -27.0 -26.0 -25.0 -24.0 -23.0 -22.0
3  31.0  32.0  33.0   0.3   0.4   0.5  37.0  38.0  39.0  40.0 -21.0 -20.0 -19.0   0.3   0.4   0.5 -15.0 -14.0 -13.0 -12.0
4  41.0  42.0  43.0  44.0  45.0  46.0  47.0  48.0  49.0  50.0 -11.0 -10.0  -9.0  -8.0  -7.0  -6.0  -5.0  -4.0  -3.0  -2.0

update

df.update(df_sub.align(df, level=1)[0])
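Note that in the output shown earlier in this answer, the aligned values land under both A and B, whereas the question asks for B only. One possible variation, sketched here under the assumption that only the B block should change, is to give df_sub an outer 'B' column level with pd.concat and then call update. The name sub_b is hypothetical and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Rebuild the example data from the question
cols = pd.MultiIndex.from_product([['A', 'B'], np.arange(0, 10)])
df = pd.DataFrame(
    np.concatenate((np.arange(1., 51.).reshape(5, 10),
                    np.arange(-51., -1.).reshape(5, 10)), axis=1),
    index=np.arange(0, 5), columns=cols)
df_sub = pd.DataFrame(
    [[np.nan] * 5 + [0.5, 0.6, 0.7] + [np.nan] * 2,
     [np.nan] * 3 + [0.3, 0.4, 0.5] + [np.nan] * 4],
    index=[1, 3], columns=np.arange(0, 10))

# Wrap df_sub under an outer 'B' level so its columns become ('B', 0) ... ('B', 9)
sub_b = pd.concat({'B': df_sub}, axis=1)

# update works in place and only overwrites where sub_b is non-NaN
df.update(sub_b)

print(df.loc[[1, 3], 'B'])
```

Because sub_b has no columns under A, update has nothing to write there and the A block stays untouched.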

Clarification

This:

pd.DataFrame.fillna(*df_sub.align(df, level=1))

is equivalent to

a, b = df_sub.align(df, level=1)
a.fillna(b)
# Or pd.DataFrame.fillna(a, b)