Question

Regex to retrieve the last part of a string:

https://play.google.com/store/apps/details?id=com.lima.doodlejump

I am looking for the string that follows id=

The following regex does not seem to work in Python:

sampleURL = "https://play.google.com/store/apps/details?id=com.lima.doodlejump"

re.search("id=(.*?)", sampleURL).group(1)

The above should give me the output:

com.lima.doodlejump

Is my search group correct?

Answer 1

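Your regex

(.*?)

doesn't work because it matches between zero and unlimited times, as few as possible (because of the ?). With nothing after it in the pattern, the fewest possible is zero characters, so the group captures an empty string. You therefore have the following regex options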

(.*)      # Matches the rest of the string
(.*?)$    # Matches till the end of the string
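To make the difference concrete, here is a minimal sketch comparing the three patterns on the URL from the question. The variable names are only illustrative.

```python
import re

sample_url = "https://play.google.com/store/apps/details?id=com.lima.doodlejump"

# Lazy quantifier with nothing after it: matches as little as possible, which is nothing.
lazy = re.search(r"id=(.*?)", sample_url).group(1)

# Greedy quantifier: consumes the rest of the string.
greedy = re.search(r"id=(.*)", sample_url).group(1)

# Lazy quantifier followed by $: forced to expand until the end of the string.
lazy_anchored = re.search(r"id=(.*?)$", sample_url).group(1)

print(repr(lazy))           # ''
print(repr(greedy))         # 'com.lima.doodlejump'
print(repr(lazy_anchored))  # 'com.lima.doodlejump'
```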

However, you don't need regex at all; simply split the string like this

data = "https://play.google.com/store/apps/details?id=com.lima.doodlejump"
print(data.split("id=", 1)[-1])

Output

com.lima.doodlejump

If you really must use regex, you can do it like this

import re
data = "https://play.google.com/store/apps/details?id=com.lima.doodlejump"
print(re.search("id=(.*)", data).group(1))

Output

com.lima.doodlejump

Answer 2

I'm surprised nobody has mentioned urlparse ...

>>> import urlparse  # Python 2 module; in Python 3 the same functions live in urllib.parse
>>> s = "https://play.google.com/store/apps/details?id=com.lima.doodlejump"
>>> urlparse.urlparse(s)
ParseResult(scheme='https', netloc='play.google.com', path='/store/apps/details', params='', query='id=com.lima.doodlejump', fragment='')
>>> urlparse.parse_qs(urlparse.urlparse(s).query)
{'id': ['com.lima.doodlejump']}
>>> urlparse.parse_qs(urlparse.urlparse(s).query)['id']
['com.lima.doodlejump']
>>> urlparse.parse_qs(urlparse.urlparse(s).query)['id'][0]
'com.lima.doodlejump'

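The big advantage here is that if the URL's query string gains more components, that can easily break the other solutions that rely on a simple str.split. It won't confuse urlparse, though :).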
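As a rough illustration of that point, the sketch below uses the Python 3 location of these functions (urllib.parse). The extra hl=en parameter and the get_query_param helper are assumptions made up for this example; they are not part of the original question.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical URL: the link from the question plus an extra "hl" parameter.
url = "https://play.google.com/store/apps/details?id=com.lima.doodlejump&hl=en"


def get_query_param(url, name):
    """Return the first value of query parameter `name`, or None if it is absent."""
    values = parse_qs(urlparse(url).query).get(name)
    return values[0] if values else None


app_id = get_query_param(url, "id")

# The naive split keeps everything after "id=", including the extra parameter.
naive = url.split("id=", 1)[-1]

print(app_id)  # com.lima.doodlejump
print(naive)   # com.lima.doodlejump&hl=en
```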

Answer 3

Split it at the point you want:

id = url.split('id=')[1]

If you print id, you will get:

com.lima.doodlejump

No regex needed here :)

However, if your string contains more than one id= and you only want the last one:

id = url.split('id=')[-1]
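A small sketch of how the two indexes differ when id= appears twice. The URL here is hypothetical and chosen only to show the behaviour; note that the piece at index 1 still carries whatever follows the first value.

```python
# Hypothetical URL that contains "id=" twice.
url = "https://example.com/page?id=first&ref=x&id=second"

parts = url.split("id=")
first_piece = parts[1]   # text between the first and second "id="
last_piece = parts[-1]   # text after the last "id="

# rsplit with maxsplit=1 reaches the last piece without splitting the whole string.
last_via_rsplit = url.rsplit("id=", 1)[-1]

print(first_piece)      # first&ref=x&
print(last_piece)       # second
print(last_via_rsplit)  # second
```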

Hope this helps!

Answer 4

This works:

>>> import re
>>> sampleURL = "https://play.google.com/store/apps/details?id=com.lima.doodlejump"
>>> re.search("id=(.+)", sampleURL).group(1)
'com.lima.doodlejump'
>>>

This code greedily captures one or more characters, rather than non-greedily capturing zero or more characters.
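One practical consequence of "one or more" versus "zero or more" shows up when nothing follows id=. The sketch below assumes a hypothetical URL with an empty id value.

```python
import re

# Hypothetical URL where nothing follows "id=".
empty_id_url = "https://play.google.com/store/apps/details?id="

# .* accepts zero characters, so the search succeeds with an empty group.
star = re.search(r"id=(.*)", empty_id_url)

# .+ needs at least one character, so the search fails and returns None.
plus = re.search(r"id=(.+)", empty_id_url)

print(repr(star.group(1)))  # ''
print(plus)                 # None
```

With (.+), calling .group(1) directly on such a URL would raise AttributeError, so checking the result for None first is the safer habit.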
