Remove URLs containing specific substrings from a list in Python

Time: 2018-06-03 12:18:59

Tags: python string python-3.x substring list-comprehension

I want to remove the URLs that match any keyword in a given list. In my case, that means removing every URL that contains 'sale' or 'new'.

Test data:

url_list = ['https://www.test.com/men-fashion/', 'https://www.test.com/men-shirts', 'https://www.test.com/sale-fashion/', 'https://www.test.com/new-fashion/']

My substrings are as follows:

to_remove = ['sale','new']

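I tried to do this with a list comprehension combined with any(), but that keeps only the URLs that match my to_remove list and drops all the others. I expected the opposite result.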

url_list[:] = [url for url in url_list if any(substring in url for substring in to_remove)]
print(url_list)

1 Answer:

Answer 0: (score: 0)

One approach, using a regular expression:

import re
url_list = ['https://www.test.com/men-fashion/', 'https://www.test.com/men-shirts', 'https://www.test.com/sale-fashion/', 'https://www.test.com/new-fashion/']
to_remove = ['sale','new']

result = [i for i in url_list if not re.search("|".join(to_remove), i)]
print(result)

Output:

['https://www.test.com/men-fashion/', 'https://www.test.com/men-shirts']
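
For comparison, here is a minimal sketch of two related variants, not part of the original answer. The first negates the any() test from the question, so a URL is kept only when none of the substrings occur in it. The second is the regex approach with re.escape applied to each keyword, which should matter only if a keyword contains regex metacharacters such as '.' or '+'. The names kept_plain, kept_regex and pattern are made up for illustration.

```python
import re

url_list = [
    'https://www.test.com/men-fashion/',
    'https://www.test.com/men-shirts',
    'https://www.test.com/sale-fashion/',
    'https://www.test.com/new-fashion/',
]
to_remove = ['sale', 'new']

# Variant 1: keep a URL only if none of the substrings occur in it.
# This is the comprehension from the question with "not" added.
kept_plain = [url for url in url_list
              if not any(substring in url for substring in to_remove)]

# Variant 2: compile the alternation once and escape each keyword,
# so that keywords are matched literally rather than as regex syntax.
pattern = re.compile("|".join(re.escape(s) for s in to_remove))
kept_regex = [url for url in url_list if not pattern.search(url)]

print(kept_plain)
print(kept_regex)
```

Both variants do plain substring matching, so a keyword like 'new' would also remove a URL containing 'news'. If that is not wanted, matching against individual path segments may be a better fit.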