Question

I have a file like this:

 1445546552657 GET_QUEUE 18
 1445546552658 GET_QUEUE 1
 1445546552658 GET_QUEUE 18
 1445546552659 GET_QUEUE 1
 1445546552659 GET_QUEUE 19
 1445546552660 GET_QUEUE 19
 1445546552660 GET_QUEUE 19
 1445546552661 GET_QUEUE 19
 1445546552662 GET_QUEUE 19

I need to be able to:

Load it into Python
Add the column names timestamp, type, response_time, so that df['timestamp'] returns all the data in the timestamp column.

So far I have done this:

header_row = ['timestamp','type','response_time']
df = read_csv(output_path,names=header_row)

But it does not work when I do this:

print df['timestamp']

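It gives me all of the data, not just the column!

Also, how do I get a specific cell, such as the first row of the first column?

Here is my code:

Main function: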

def main():
    xlabel = "Time in minutes"
    ylabel = "Response time in ms"
    header_row = ['timestamp','type','response_time']
    df = read_csv(output_path,names=header_row, sep=' ')
    '''df = refine(df)
    min_timestamp = np.min(df[df.columns[0]])
    max_resp = np.max(df[df.columns[2]])
    df[df.columns[0]] = df[df.columns[0]] - min_timestamp
    # convert time to minutes
    df[df.columns[0]] = np.round(df[df.columns[0]] / 60000)
    plt.plot(df[df.columns[0]], df[df.columns[0]], 'x-', color='g', label='ALL', lw=0.5)
    plt.xlim(xmin=0.0,xmax=5.0)
    plt.ylim(ymin=0.0,ymax=max_resp)
    plt.xlabel(xlabel)
    plt.ylabel(ylabel)
    plt.grid()
    plt.show()
    '''
    print df.iloc['timestamp']






warm_up = 100
cool_down = 100

The refine function:

def refine(df):
    start_time = np.min(df[df.columns[0]])
    print start_time.columns[0]
    end_time = np.max(df[df.columns[0]])
    print end_time.columns[0]
    new_start_time = start_time + (100 * 1000)
    new_end_time = end_time - (100 * 1000)
    df = df[df[df.columns[0]] > new_start_time]
    df = df[df[df.columns[0]] < new_end_time]
    return df

if __name__ == "__main__":
    main()

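I have already defined output_path, so that is not the problem, and the file shown above is exactly my file. I have tried:

Writing the header into the file itself, which didn't work
Assigning the header names, but when I do df['timestamp'] I get the entire data!

I don't know what to do. Note: my files are space-delimited log files! They have names like meow.log, but the format is exactly the same!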

Answer 1

You need to pass in sep, so that the file is read as space-separated (rather than comma-separated):

df = read_csv(output_path, names=header_row, sep=" ")
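A minimal sketch of why the separator matters, using a few rows from the question. The `io.StringIO` buffer is only a stand-in for `output_path` (an assumption for illustration), and the variable names `wrong` and `df` are hypothetical.

```python
import io
import pandas as pd

# A few rows from the question; StringIO stands in for the real log file.
sample = io.StringIO(
    "1445546552658 GET_QUEUE 1\n"
    "1445546552658 GET_QUEUE 18\n"
    "1445546552659 GET_QUEUE 19\n"
)
header_row = ["timestamp", "type", "response_time"]

# Default separator is a comma, so each whole line is parsed as one field
# and ends up in the first column; the other two columns are all NaN.
wrong = pd.read_csv(sample, names=header_row)

# Rewind the buffer and read again with a space separator:
# now the three fields go into three separate columns.
sample.seek(0)
df = pd.read_csv(sample, names=header_row, sep=" ")

print(wrong["timestamp"])  # full lines, which is the symptom in the question
print(df["timestamp"])     # only the timestamps
```

If the fields might be separated by runs of spaces or tabs, `sep=r"\s+"` is a common alternative to a single literal space.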

To get a specific cell, use iloc (or loc):

df.loc[0, 'timestamp']
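As a rough illustration of the difference, `loc` selects by label while `iloc` selects by integer position, so `df.iloc['timestamp']` from the question fails because `iloc` does not accept a column name. The sketch below assumes the default integer row index and uses an in-memory buffer in place of the real file; the variable names are hypothetical.

```python
import io
import pandas as pd

# Two rows from the question; StringIO stands in for the real log file.
sample = io.StringIO(
    "1445546552658 GET_QUEUE 1\n"
    "1445546552658 GET_QUEUE 18\n"
)
df = pd.read_csv(sample, names=["timestamp", "type", "response_time"], sep=" ")

# Label-based: row label 0, column named 'timestamp'.
cell_by_label = df.loc[0, "timestamp"]

# Position-based: first row, first column.
cell_by_position = df.iloc[0, 0]

# Scalar accessor for a single cell, also label-based.
cell_scalar = df.at[0, "timestamp"]

# Second row, third column (response_time) by position.
second_response = df.iloc[1, 2]
```

With the default index the row label and the row position coincide, so `loc[0, ...]` and `iloc[0, ...]` point at the same row; after filtering or sorting they can differ.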
