Why does csvkit give me a "list index out of range" error?

Asked: 2014-01-27 00:38:55

Tags: python csv

I'm working with the zipcode dataset and csvkit, but I'm getting nowhere. If I run csvcut -n zipcode.csv, I see a clean list of columns:

  1: zip
  2: city
  3: state
  4: latitude
  5: longitude
  6: timezone
  7: dst

But every search I run with csvgrep just gives me an error. Here is a chunk of the data:

"99919","Thorne Bay","AK","55.677232","-132.55624","-9","1"
"99921","Craig","AK","55.456449","-133.02648","-9","1"
"99922","Hydaburg","AK","55.209339","-132.82545","-9","1"
"99923","Hyder","AK","55.941442","-130.0545","-9","1"
"99925","Klawock","AK","55.555164","-133.07316","-9","1"
"99926","Metlakatla","AK","55.123897","-131.56883","-9","1"
"99927","Point Baker","AK","56.337957","-133.60689","-9","1"
"99928","Ward Cove","AK","55.395359","-131.67537","-9","1"
"99929","Wrangell","AK","56.409507","-132.33822","-9","1"
"99950","Ketchikan","AK","55.875767","-131.46633","-9","1"

According to the docs, I would expect csvgrep -c 2 -m "Hyder" zipcode.csv to produce a match, but instead I get:

zip,city,state,latitude,longitude,timezone,dst
list index out of range

I can use csvgrep on other CSV files just fine. Why does it choke on this one?

2 answers:

Answer 0: (score: 1)

Your problem is that "zipcode.csv" is malformed: it contains empty lines. For example, line 17 is blank:

"00607","Aguas Buenas","PR","18.256995","-66.104657","-4","0"

"00609","Aibonito","PR","18.142002","-66.273278","-4","0"

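The author of the file may have done this to indicate that ZIP code 00608 does not exist, which could be helpful in some situations, but it prevents you from using the csvkit utilities.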
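If you want to see where the blank lines are before changing anything, here is a minimal sketch using Python's standard csv module. Python's csv.reader yields an empty list for a blank line, so asking for column 2 of such a row raises IndexError with the message "list index out of range". That csvkit hits the same condition internally is my assumption, not something I have checked in its source. The helper name find_blank_rows is made up for this example.

```python
import csv
import io


def find_blank_rows(lines):
    """Return the 1-based physical line numbers where csv.reader yields no fields."""
    reader = csv.reader(lines)
    blank = []
    for row in reader:
        if not row:  # a blank line is parsed as an empty list
            blank.append(reader.line_num)
    return blank


# Sample taken from the rows shown above, including the blank line between them.
sample = io.StringIO(
    '"00607","Aguas Buenas","PR","18.256995","-66.104657","-4","0"\n'
    '\n'
    '"00609","Aibonito","PR","18.142002","-66.273278","-4","0"\n'
)
print(find_blank_rows(sample))  # prints [2]
```

For the real file you would pass an open file object instead of the StringIO sample, for example open("zipcode.csv", newline="").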

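If you are on a *nix-based operating system, you can use sed, which you will already have installed, to remove the empty lines automatically, like so: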

$ sed '/^$/d' zipcode.csv > zipcode2.csv

This stores the result as "zipcode2.csv". We can now use our new, "fixed" zipcode file:

$ csvgrep -c 2 -m "Hyder" zipcode2.csv 
zip,city,state,latitude,longitude,timezone,dst
99923,Hyder,AK,55.941442,-130.0545,-9,1
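If sed is not available (on Windows, say), roughly the same cleanup can be sketched in Python. This is an illustration under a few assumptions, not an exact port: the function name drop_blank_lines is hypothetical, and unlike the sed expression above it also drops lines that contain only whitespace or a lone carriage return. Like sed, it would also remove blank lines inside a quoted multi-line field, which this dataset does not appear to have.

```python
def drop_blank_lines(src_path, dst_path):
    """Copy src_path to dst_path, skipping whitespace-only lines.

    Returns the number of lines that were dropped.
    """
    removed = 0
    # newline="" keeps the original line endings untouched on read and write.
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        for line in src:
            if line.strip():
                dst.write(line)
            else:
                removed += 1
    return removed


# Example call (assumes zipcode.csv is in the current directory):
# drop_blank_lines("zipcode.csv", "zipcode2.csv")
```

After running it, the csvgrep command shown above should work against zipcode2.csv in the same way.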

Answer 1: (score: 1)

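To prevent most errors like the one above, I use csvclean (also part of csvkit) to find and correct corrupt data in the source CSV. Also take a look at this blog post for a complete how-to.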