Question

I have data stored in a list, in roughly the following format:

['http://www.website.com/category/apples',
'http://www.website.com/category/oranges',
'http://www.website.com/category/bananas',
'http://www.website.com/category/pears']

The list contains about 900 unique links. I want to return the word that comes after "category" (e.g. apples, oranges, etc.).

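This can probably be done with a for loop like the one below, but I'm having trouble finding the right function to use. This is basically what I have so far. The list is stored in links.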

for l in links:
    new_list = l.search('category')
    return l

How can I best "trim" each element in my list?

Answer 1

l = ['http://www.website.com/category/apples',
'http://www.website.com/category/oranges',
'http://www.website.com/category/bananas',
'http://www.website.com/category/pears']

# Keep only the text after the last "/" in each URL
li = [x[x.rindex('/')+1:] for x in l]

print(li)

Output

['apples', 'oranges', 'bananas', 'pears']
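The rindex approach above returns an empty string for a URL that ends in "/" and keeps any query string (e.g. "pears?page=2"). The sample data shows neither case, but if such URLs might occur, here is a sketch of a slightly more defensive variant. last_segment is a hypothetical helper name; urlsplit comes from the standard library's urllib.parse.

```python
from urllib.parse import urlsplit


def last_segment(url):
    # Hypothetical helper: look only at the path component,
    # so "?query" and "#fragment" parts are ignored
    path = urlsplit(url).path
    # Drop a trailing "/" so ".../apples/" still yields "apples"
    return path.rstrip('/').rsplit('/', 1)[-1]


links = ['http://www.website.com/category/apples',
         'http://www.website.com/category/oranges/',
         'http://www.website.com/category/pears?page=2']
print([last_segment(u) for u in links])  # ['apples', 'oranges', 'pears']
```

Like Answer 1, this takes the last path segment rather than specifically the segment after "category".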

Answer 2

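This is where you would use a regular expression. You match each string against a regex that looks for "/category/", and use a capturing group (the parentheses) to return the characters that follow it.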

import re
new_list = []
for l in links:
    # Capture everything after "/category/"
    m = re.match(r'.+/category/(.+)', l)
    new_list.append(m.group(1))
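The greedy (.+) captures everything after "/category/", so a URL such as http://www.website.com/category/apples/ would yield "apples/". Also, re.match returns None for a link without "/category/", and calling .group(1) on None raises AttributeError. The following is a minimal sketch that captures a single path segment and assumes you want None for non-matching links. SEGMENT_RE and category_of are hypothetical names.

```python
import re

# Hypothetical narrower pattern: capture one path segment only,
# stopping at "/", "?" or "#"
SEGMENT_RE = r'/category/([^/?#]+)'


def category_of(url):
    # re.search scans the whole string; it returns None if "/category/" is absent
    m = re.search(SEGMENT_RE, url)
    return m.group(1) if m else None
```

For example, category_of('http://www.website.com/category/apples/') returns 'apples', and category_of('http://www.website.com/about') returns None.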

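To optimize, you can precompile the expression, which may be worthwhile for 900+ strings (the re module also caches recently used patterns, so the gain is usually modest):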

import re
# Compile the pattern once and reuse it for every link
cat = re.compile(r'.+/category/(.+)')
new_list = []
for l in links:
    new_list.append(cat.match(l).group(1))

This can also be done with a list comprehension instead of a for loop:

import re
cat = re.compile(r'.+/category/(.+)')
new_list = [cat.match(l).group(1) for l in links]
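If some links might not contain "/category/", the comprehension above fails on them because cat.match returns None. Here is a sketch that skips non-matching links instead. It assumes Python 3.8+ for the := assignment expression, and the extra /about URL is made up for illustration.

```python
import re

cat = re.compile(r'.+/category/(.+)')

links = ['http://www.website.com/category/apples',
         'http://www.website.com/about',  # hypothetical non-matching link
         'http://www.website.com/category/pears']

# Bind each match to m and keep only the links where it is not None
new_list = [m.group(1) for url in links if (m := cat.match(url))]
print(new_list)  # ['apples', 'pears']
```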
