Extracting keywords from a CSV in Python

Time: 2018-02-13 12:16:50

Tags: python pandas csv extract keyword

I have a comma-separated CSV file with three columns:

"Date", "URL", "Views" 

I am trying to extract the rows whose URL column contains a specific keyword, for example the word charger:

import pandas as pd

keywords = {"charger"}

df = pd.read_csv("original_file.csv", sep=",")

listMatchURL = []

for i in range(len(df.index)):
    if any(x in df['URL'][i] for x in keywords):
        listMatchURL.append(df['URL'][i])

output = pd.DataFrame({'URL': listMatchURL})
output.to_csv("new_file.csv", index=False)

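This writes the entire URL of each matching row to a new CSV file. But how can I extract and write only the keyword instead of the whole URL? I don't want the entire http://www.example.com/search/iphone+charger.html, only charger.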

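Also, how can I keep the other two corresponding columns, Date and Views, in the new CSV file I am writing? At the moment it only extracts the URL column.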

I would like to end up with a new CSV file containing the following columns:

"Date", "Keyword", "Views"

1 answer:

Answer 0 (score: 1)

As an alternative, this can be done without Pandas, as follows:

import csv

keywords = {"charger"}

with open('original_file.csv', newline='') as f_input, open('new_file.csv', 'w', newline='') as f_output:
    csv_input = csv.reader(f_input)
    csv_output = csv.writer(f_output)

    header = next(csv_input)    # Skip the original header row; a new one is written below
    csv_output.writerow(['Date', 'Keyword', 'Views'])

    for date, url, views in csv_input:
        for keyword in keywords:
            if keyword in url:
                csv_output.writerow([date, keyword, views])
                break       # Remove if multiple keywords per url are allowed
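
For readers who want to stay with Pandas, as in the question, the same result can probably be reached with `Series.str.extract`. The following is only a minimal sketch, not the answerer's method: the function name `extract_keyword_rows` and the in-memory sample rows are hypothetical and exist only for illustration, and it assumes the file's columns are named exactly Date, URL and Views.

```python
import io
import re

import pandas as pd

keywords = {"charger"}


def extract_keyword_rows(df, keywords):
    """Return Date, Keyword, Views for rows whose URL contains a keyword.

    Hypothetical helper for illustration; assumes columns Date, URL, Views.
    """
    # One capture group with all keywords as alternatives;
    # re.escape protects keywords that contain regex metacharacters.
    pattern = "(" + "|".join(re.escape(k) for k in sorted(keywords)) + ")"

    result = df.copy()
    # str.extract gives the first match in each URL, or NaN if none matches
    result["Keyword"] = result["URL"].str.extract(pattern, expand=False)

    # Keep only matching rows and the three requested columns
    return result.dropna(subset=["Keyword"])[["Date", "Keyword", "Views"]]


# Made-up sample rows standing in for original_file.csv
sample = io.StringIO(
    "Date,URL,Views\n"
    "2018-02-01,http://www.example.com/search/iphone+charger.html,10\n"
    "2018-02-02,http://www.example.com/search/iphone+case.html,5\n"
)

output = extract_keyword_rows(pd.read_csv(sample), keywords)
print(output.to_csv(index=False))
```

With the real file, `pd.read_csv("original_file.csv")` would replace the sample, and `output.to_csv("new_file.csv", index=False)` would write the result. Like the `break` in the loop above, this keeps only one keyword per URL (the first one found).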