Question

I'm trying to create a new variable (column) in an existing data frame.

Participant   Session   Trial_number    Accuracy    Block
 G01S01          1             3             1          1
 G01S02          1             4             1          2
 G02S01          1             5             1          5
 G01S01          1             6             1          8
 G01S01          1             7             1          10

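Basically, I want to create a new variable, "Epoch", based on the "Block" column. Block values 1-4 belong to epoch 1, the next four (5-8) belong to epoch 2, and so on. It should look like this: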

Participant   Session   Trial_number    Accuracy    Block    Epoch
 G01S01          1             3             1          1          1
 G01S02          1             4             1          2          1
 G02S01          1             5             1          5          2
 G01S01          1             6             1          8          2
 G01S01          1             7             1          10         3

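In addition, I want to create another variable based on the participant ID: if the ID ends in 1, the participant belongs to group 1; if it ends in 2, the participant belongs to group 2.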

I tried to do the first part, but it basically didn't work:

import pandas as pd

df = pd.read_csv('merge.csv')

Epoch = []

x = 0

while x < 179424:
    if df['Block'][x] < 5:
        Epoch == 1
    elif 4 < df['Block'][x] < 9:
        Epoch == 2
    elif 8 < df['Block'][x] < 13:
        Epoch == 3
    elif 12 < df['Block'][x] < 17:
        Epoch == 4
    else:
        Epoch == 5
    x += 1

(179424 is the number of rows in the spreadsheet.)
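The loop above never stores anything: Epoch == 1 is a comparison, not an assignment, and its result is thrown away, so Epoch stays an empty list and no column is ever added to df. A minimal corrected sketch, assuming a small in-memory frame copied from the example in place of merge.csv, appends one value per row and assigns the list as a column at the end:

```python
import pandas as pd

# Stand-in for merge.csv (assumption: the real file has the same columns)
df = pd.DataFrame({
    "Participant": ["G01S01", "G01S02", "G02S01", "G01S01", "G01S01"],
    "Session": [1, 1, 1, 1, 1],
    "Trial_number": [3, 4, 5, 6, 7],
    "Accuracy": [1, 1, 1, 1, 1],
    "Block": [1, 2, 5, 8, 10],
})

Epoch = []
# Iterate over the actual rows instead of a hard-coded row count
for block in df["Block"]:
    if block < 5:
        Epoch.append(1)  # append() stores the value; == only compares
    elif block < 9:      # earlier branches already handle block < 5
        Epoch.append(2)
    elif block < 13:
        Epoch.append(3)
    elif block < 17:
        Epoch.append(4)
    else:
        Epoch.append(5)

# One value per row, so the list can be assigned as a new column
df["Epoch"] = Epoch
print(df)
```

The vectorized answers below avoid the Python-level loop, which matters with roughly 180,000 rows.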

Answer 1

You can use pandas.cut to create bins and assign a label to each bin:

df['Epoch'] = pd.cut(df['Block'], 
                     [1,4,8,12], 
                     labels=[1,2,3],
                     include_lowest=True)

print(df)
  Participant  Session  Trial_number  Accuracy  Block Epoch
0      G01S01        1             3         1      1     1
1      G01S02        1             4         1      2     1
2      G02S01        1             5         1      5     2
3      G01S01        1             6         1      8     2
4      G01S01        1             7         1     10     3
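The bin edges above stop at 12, so any block above 12 would get NaN. A sketch that follows all five epochs of the question's loop (every block from 17 upward falls into epoch 5) can use an open-ended last bin; the sample blocks here are illustrative values, not data from the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Block": [1, 2, 4, 5, 8, 10, 13, 16, 17, 25]})

# Right-closed bins: (0, 4], (4, 8], (8, 12], (12, 16], (16, inf)
bins = [0, 4, 8, 12, 16, np.inf]
df["Epoch"] = pd.cut(df["Block"], bins=bins, labels=[1, 2, 3, 4, 5])

# pd.cut returns a categorical column; convert it if plain integers are needed
df["Epoch"] = df["Epoch"].astype(int)
print(df)
```

Starting the edges at 0 makes the first bin (0, 4], so include_lowest is not needed for block 1.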

Answer 2

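I think you want the DataFrame's apply method. It takes a function as an argument and applies it to each row (or each column, depending on the value of axis) of the data frame. Based on your code sample, I suspect this function would do what you want: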

def derive_epoch(row):
    if row['Block'] < 5:
        return 1
    elif row['Block'] < 9:
        return 2
    elif row['Block'] < 13:
        return 3
    elif row['Block'] < 17:
        return 4
    else:
        return 5

Then apply it like this:

df['Epoch'] = df.apply(derive_epoch, axis=1)
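To see the row-wise version run end to end, here is a self-contained sketch on a frame copied from the question's example. axis=1 matters: with the default axis=0 the function would receive whole columns instead of rows.

```python
import pandas as pd

def derive_epoch(row):
    # Thresholds mirror the question's loop: 1-4 -> 1, 5-8 -> 2, ..., 17+ -> 5
    if row['Block'] < 5:
        return 1
    elif row['Block'] < 9:
        return 2
    elif row['Block'] < 13:
        return 3
    elif row['Block'] < 17:
        return 4
    else:
        return 5

df = pd.DataFrame({
    "Participant": ["G01S01", "G01S02", "G02S01", "G01S01", "G01S01"],
    "Block": [1, 2, 5, 8, 10],
})

# axis=1 passes each row as a Series, so row['Block'] is that row's value
df["Epoch"] = df.apply(derive_epoch, axis=1)
print(df)
```

Because the function only reads Block, this is easy to follow but slower than a vectorized expression on large frames.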

Hope this helps!

Answer 3

You can use // (integer division) to compute the epoch number, applied to the 'Block' column:

df['Epoch'] = df['Block'].apply(lambda b: (b - 1) // 4 + 1)
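Integer division by 4 alone is off by one at the epoch boundaries: 4 // 4 + 1 is 2, yet block 4 belongs to epoch 1. Subtracting 1 first makes the blocks zero-based, so each group of four shares one quotient. A quick check over blocks 1 to 24 (illustrative values), with clip as an optional way to reproduce the question's rule that every block from 17 upward is epoch 5:

```python
import pandas as pd

blocks = pd.Series(range(1, 25), name="Block")

# Plain integer division pushes blocks 4, 8, 12, ... into the next epoch
naive = blocks // 4 + 1
# Shifting to zero-based first keeps 1-4 together, 5-8 together, and so on
shifted = (blocks - 1) // 4 + 1
# Optional: cap at 5 like the question's else branch
capped = shifted.clip(upper=5)

print(pd.DataFrame({"Block": blocks, "naive": naive,
                    "shifted": shifted, "capped": capped}))
```

The arithmetic is vectorized, so it also works without apply: (df['Block'] - 1) // 4 + 1.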

Answer 4

Another very simple solution:

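# Import pandas
import pandas as pd

# Read the CSV file (add sep=';' if the file is semicolon-delimited)
df = pd.read_csv('merge.csv')

# Add the Epoch column: blocks 1-4 -> 1, 5-8 -> 2, 9-12 -> 3, ...
df['Epoch'] = (df['Block'] - 1) // 4 + 1
# Add the Group column from the last character of the participant ID
df['Group'] = df['Participant'].str[-1]

print(df)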
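str[-1] returns the last character as a string ('1' or '2'), so a Group column built this way holds text unless it is converted. A sketch on the question's sample participants, assuming every ID ends in a single digit, that stores Group as an integer and uses the zero-based epoch formula:

```python
import pandas as pd

df = pd.DataFrame({
    "Participant": ["G01S01", "G01S02", "G02S01", "G01S01", "G01S01"],
    "Block": [1, 2, 5, 8, 10],
})

# Epoch: zero-based shift so blocks 4, 8, 12, ... stay in the lower epoch
df["Epoch"] = (df["Block"] - 1) // 4 + 1

# Group: last character of the ID, converted from str to int
df["Group"] = df["Participant"].str[-1].astype(int)

print(df)
```

If some IDs might end in something other than a digit, astype(int) raises a ValueError, which at least makes bad IDs visible.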

Add a new column to a data frame using existing variables
