Question

Below is a CSV snippet with some dummy header lines; the actual frame is anchored by beerId:

This work is an unpublished, copyrighted work and contains confidential information.  
beer consumption    
consumptiondate 7/24/2018
consumptionlab  H1
numbeerssuccessful  40
numbeersfailed  0
totalnumbeers   40
consumptioncomplete TRUE

beerId  Book
341027  Northern Light

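The code df = pd.read_csv(path_csv, header=8) works, but the catch is that the header is not always on line 8; it depends on the day. I cannot figure out from the help how to use a lambda, as in

skiprows : list-like or integer or callable, default None

Line numbers to skip (0-indexed) or number of lines to skip (int) at the start of the file.

If callable, the callable function will be evaluated against the row indices, returning True if the row should be skipped and False otherwise. An example of a valid callable argument would be lambda x: x in [0, 2].

to find the index of the beerId row.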

Answer 1

I think you need to preprocess the file first:

import pandas as pd

path_csv = 'file.csv'
with open(path_csv) as f:
    lines = f.readlines()

# get the indices of all lines starting with beerId
num = [i for i, l in enumerate(lines) if l.startswith("beerId")]
# if nothing matches use 0, else take the first match minus 1
# (header= does not count the blank line above the header row)
num = 0 if len(num) == 0 else num[0] - 1
print(num)
# 8

df = pd.read_csv(path_csv, header=num)
print(df)
#    beerId            Book
# 0  341027  Northern Light
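
On the lambda part of the question: a skiprows callable only receives the row index, never the line contents, so the file still has to be scanned once to locate beerId. Also note that the `num[0] - 1` above depends on there being exactly one blank line before the header, because header= ignores empty lines while skiprows counts raw file lines. A minimal sketch of a variant that passes the raw index through skiprows instead (find_header_row and read_from_marker are hypothetical helper names, not part of pandas, and the fallback to 0 simply mirrors the code above):

```python
def find_header_row(path, marker="beerId"):
    """Return the 0-based raw line index of the first line that starts
    with `marker`, or 0 if no line matches."""
    with open(path) as f:
        for i, line in enumerate(f):
            if line.startswith(marker):
                return i  # stop at the first match; the rest is not read
    return 0


def read_from_marker(path, marker="beerId"):
    """Read the CSV using the marker line as the header row."""
    # Imported here so find_header_row above stays standard-library only.
    import pandas as pd

    start = find_header_row(path, marker)
    # skiprows counts raw file lines (blank ones included), so no "- 1"
    # adjustment is needed; the first line kept becomes the header.
    return pd.read_csv(path, skiprows=start)
    # Equivalent callable form: skiprows=lambda i: i < start
```

Usage would be df = read_from_marker('file.csv'). The scan stops at the first matching line, so the whole file is not loaded into memory just to find the header.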
