Pandas read_csv skiprows - determine rows to skip

Time: 2018-07-26 03:58:56

Tags: python pandas

Below is a CSV snippet with some dummy header lines; the actual frame is anchored by beerId:

This work is an unpublished, copyrighted work and contains confidential information.  
beer consumption    
consumptiondate 7/24/2018
consumptionlab  H1
numbeerssuccessful  40
numbeersfailed  0
totalnumbeers   40
consumptioncomplete TRUE

beerId  Book
341027  Northern Light

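The code df = pd.read_csv(path_csv, header=8) works, but the problem is that the header is not always at row 8; it varies with the day. I cannot work out from the help text below how to use a lambda to find the index of the beerId row:

skiprows : list-like or integer or callable, default None

Line numbers to skip (0-indexed) or number of lines to skip (int) at the start of the file.

If callable, the callable function will be evaluated against the row indices, returning True if the row should be skipped and False otherwise. An example of a valid callable argument would be lambda x: x in [0, 2].

How can I find the index of the beerId row?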
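As the quoted help says, the callable is evaluated against row indices only; it never sees the text of a line, so a lambda by itself cannot search for beerId. A minimal sketch of what the callable form does, using a made-up in-memory sample and assuming tab-separated data (the names raw and header_line are hypothetical):

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file; tab separation is an assumption.
raw = (
    "confidential notice\n"
    "consumptiondate\t7/24/2018\n"
    "\n"
    "beerId\tBook\n"
    "341027\tNorthern Light\n"
)

# The callable receives each 0-based line index, not the line contents,
# so the position of the header line (3 here) must already be known.
header_line = 3
df = pd.read_csv(io.StringIO(raw), sep="\t", skiprows=lambda i: i < header_line)
print(df.columns.tolist())
```

The header position therefore has to come from somewhere else, for example from a separate pass over the file.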

1 Answer:

Answer 0: (score: 2)

I think you need to preprocess the file first:

import pandas as pd

path_csv = 'file.csv'
with open(path_csv) as f:
    lines = f.readlines()

# indices of all lines that start with "beerId"
num = [i for i, l in enumerate(lines) if l.startswith("beerId")]
# if no line matches use 0, otherwise take the first index minus 1;
# header= ignores blank lines by default (skip_blank_lines=True), so the
# single blank line before beerId in the sample has to be subtracted
num = 0 if len(num) == 0 else num[0] - 1
print(num)
8


df = pd.read_csv(path_csv, header=num)
print (df)
             beerId  Book
0  341027  Northern Light
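
The same preprocessing idea can be sketched without depending on how many blank lines precede the header: skiprows counts physical lines, blank ones included, so the index of the beerId line can be passed to it directly. The function names find_header_line and read_anchored_csv below are hypothetical helpers, not part of pandas, and the fallback to 0 when no line matches simply mirrors the code above.

```python
import pandas as pd


def find_header_line(path, marker="beerId"):
    """Return the 0-based index of the first line starting with marker, or 0."""
    with open(path) as f:
        for i, line in enumerate(f):
            if line.startswith(marker):
                return i
    return 0  # no match: read from the top of the file


def read_anchored_csv(path, marker="beerId", **kwargs):
    """Skip every line before the marker line, then let pandas parse the rest."""
    # skiprows as an int counts physical lines (blank lines included),
    # so the marker line becomes the first line pandas sees, i.e. the header.
    return pd.read_csv(path, skiprows=find_header_line(path, marker), **kwargs)
```

A call such as read_anchored_csv('file.csv', sep='\t') would then pass the separator through to read_csv; whether the real file is tab-separated is an assumption.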