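I have a list of keywords, healthy_list, that I want to look for in one column of a csv file. If at least one keyword from the list appears in that column, I want to write the whole row to a new csv file.
I use re.search to check for the keywords, record the row numbers, and then use csv.writer to write the new csv. But many of the rows that contain a keyword do not show up in my new csv file. Any suggestions?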
healthy_new=[]
with open("Data 2017.csv","rb") as f:
    csvreader=csv.reader(f,delimiter=",")
    next(csvreader)
    for line, row in enumerate(csvreader):
        for word in healthy_list:
            try:
                if (re.search(word,row[4].lower()) ):
                    healthy_new.append(line)
            except ValueError:
                continue
healthy_new=list(set(healthy_new))
....
f = open("Data 2017.csv", "r")
reader = csv.reader(f)
data = open("healthy_new_output.csv", "w")
w = csv.writer(data, delimiter=',')
for idx, row in enumerate(reader):
    idx+=-1
    if idx in healthy_new:
        my_row = row
        w.writerow(my_row)
Edit: some of the data from Data 2017.csv
healthy_list:
[...'diet', 'low-fat', 'light', 'diet', 'salad', 'salads', 'baked', 'grilled', 'whole grain']
Answer 0 (score: 0)
You can use pandas to filter the rows, then write the result to csv with the to_csv method as needed.
Here is a basic illustration of how this works:
Data 2017.csv
name,age,description
Andy,15,Having a bad stomach
Bobby,21,Having a good stomach and a little flu
Connie,22,Not having anything particularly bad
Derry,12,Bad stomach & lightheaded
In []: import re
In []: import pandas as pd
In []: df = pd.read_csv('Data 2017.csv')
In []: word_flags = ['bad', 'flu', 'lightheaded']
In []: df_filtered = df[df.description.str.contains("|".join(word_flags), flags=re.IGNORECASE)]
In []: df_filtered
Out[]:
     name  age                             description
0    Andy   15                    Having a bad stomach
1   Bobby   21  Having a good stomach and a little flu
2  Connie   22    Not having anything particularly bad
3   Derry   12               Bad stomach & lightheaded
In []: word_flags = ['flu', 'foo', 'bar']
In []: df_filtered = df[df.description.str.contains("|".join(word_flags), flags=re.IGNORECASE)]
In []: df_filtered
Out[]:
    name  age                             description
1  Bobby   21  Having a good stomach and a little flu
In []: df_filtered.to_csv("Filtered Data 2017.csv", index=False)
Now you have this in Filtered Data 2017.csv:
name,age,description
Bobby,21,Having a good stomach and a little flu
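For a version that runs without the file on disk, here is a minimal sketch of the same filter. The inline sample data, the filter_rows helper name, and the use of re.escape are assumptions added for illustration, not part of the original answer; escaping only matters if a keyword contains regex metacharacters.

```python
import io
import re

import pandas as pd

# Sample data copied from the answer; read from a string instead of a file.
SAMPLE_CSV = """name,age,description
Andy,15,Having a bad stomach
Bobby,21,Having a good stomach and a little flu
Connie,22,Not having anything particularly bad
Derry,12,Bad stomach & lightheaded
"""


def filter_rows(df, column, keywords):
    """Return rows whose `column` contains at least one keyword (case-insensitive).

    Hypothetical helper, named here for illustration only.
    """
    # Escape each keyword so it is matched literally, then OR them together.
    pattern = "|".join(re.escape(word) for word in keywords)
    # na=False treats missing cells as "no match" instead of propagating NaN.
    mask = df[column].str.contains(pattern, flags=re.IGNORECASE, na=False)
    return df[mask]


df = pd.read_csv(io.StringIO(SAMPLE_CSV))
df_filtered = filter_rows(df, "description", ["flu", "foo", "bar"])

# Write to an in-memory buffer here; pass a file path to write to disk instead.
buffer = io.StringIO()
df_filtered.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
```

Passing the question's healthy_list and the name of its fifth column to the same helper should give the rows the asker is after, assuming that column holds text.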
To address your problem specifically, see the snippet below, which searches every text column:
In []: word_flags = ['bad', 'flu', 'lightheaded']
In []: df2 = pd.DataFrame()
In []: for col in df.select_dtypes(object):
  ...:     df2 = pd.concat([df2, df[df[col].str.contains("|".join(word_flags), flags=re.IGNORECASE)]])
  ...:
In []: df2
Out[]:
     name  age                             description
0    Andy   15                    Having a bad stomach
1   Bobby   21  Having a good stomach and a little flu
2  Connie   22    Not having anything particularly bad
3   Derry   12               Bad stomach & lightheaded
In []: word_flags = ['flu', 'foo', 'bar']
In []: df2 = pd.DataFrame()
In []: for col in df.select_dtypes(object):
  ...:     df2 = pd.concat([df2, df[df[col].str.contains("|".join(word_flags), flags=re.IGNORECASE)]])
  ...:
In []: df2
Out[]:
    name  age                             description
1  Bobby   21  Having a good stomach and a little flu
However, this only works cleanly if the keywords match in just one column. Suppose you define word_flags this way:
In []: word_flags = ['flu', 'foo', 'bar', 'bobby']
In []: df2 = pd.DataFrame()
In []: for col in df.select_dtypes(object):
  ...:     df2 = pd.concat([df2, df[df[col].str.contains("|".join(word_flags), flags=re.IGNORECASE)]])
  ...:
In []: df2
Out[]:
    name  age                             description
1  Bobby   21  Having a good stomach and a little flu
1  Bobby   21  Having a good stomach and a little flu
This produces a duplicate record, which needs further cleanup.
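One way to avoid the duplicates is to combine the per-column matches into a single boolean mask, so each row is selected at most once. This is only a sketch: the rows_matching_any_column name, the explicit list of columns to search, and the inline sample data are assumptions for illustration, not part of the original answer.

```python
import io
import re

import pandas as pd

# Sample data copied from the answer; read from a string instead of a file.
SAMPLE_CSV = """name,age,description
Andy,15,Having a bad stomach
Bobby,21,Having a good stomach and a little flu
Connie,22,Not having anything particularly bad
Derry,12,Bad stomach & lightheaded
"""


def rows_matching_any_column(df, columns, keywords):
    """Return each row once if any of the given text columns contains a keyword.

    Hypothetical helper; `columns` must name columns that hold text.
    """
    pattern = "|".join(re.escape(word) for word in keywords)
    # Start with "no row matches" and OR in the matches of each column.
    mask = pd.Series(False, index=df.index)
    for col in columns:
        mask |= df[col].str.contains(pattern, flags=re.IGNORECASE, na=False)
    return df[mask]


df = pd.read_csv(io.StringIO(SAMPLE_CSV))
result = rows_matching_any_column(
    df, ["name", "description"], ["flu", "foo", "bar", "bobby"]
)
```

Here Bobby's row matches in both the name and description columns but appears only once in result. Calling drop_duplicates() on the concatenated df2 is another option, though it would also collapse rows that were already identical in the source file.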