Matching a dash in a URL regular expression

Time: 2008-12-02 20:12:00

Tags: php regex

I use the following regular expression to extract URLs from text (for example, "this is text http://url.com/blabla possibly some more text").

'@(https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?)@'

This has worked for ordinary URLs, but I found that it fails on shortened URLs: for "blabla bla http://ff.im/-bEnA blabla" the match is only http://ff.im/

I suspect it has something to do with the dash - that follows the slash /.

1 answer:

Answer 0: (score: 5)

Short answer: [\w/_\.] does not match -, so use [-\w/_\.] instead.
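A minimal sketch of that fix, using the sample text from the question. The variable names ($text, $original, $fixed, $before, $after) are only illustrative; the single change is the hyphen added at the start of the path character class, where it is treated as a literal.

```php
<?php
// Sample text from the question.
$text = 'blabla bla http://ff.im/-bEnA blabla';

// Original pattern: the path class [\w/_\.] has no hyphen.
$original = '@(https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?)@';

// Adjusted pattern: a hyphen placed first in the path class is a literal hyphen.
$fixed = '@(https?://([-\w\.]+)+(:\d+)?(/([-\w/_\.]*(\?\S+)?)?)?)@';

preg_match($original, $text, $before);
preg_match($fixed, $text, $after);

echo $before[0], "\n"; // http://ff.im/
echo $after[0], "\n";  // http://ff.im/-bEnA
```

Putting the hyphen first (or last) in the class avoids having it read as a range between two characters.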

Long answer:

@              - delimiter
(              - start of group
    https?://  - http:// or https://
    ([-\w\.]+)+ - capture 1 or more hyphens, word characters or dots, 1 or more times; the outer + looks redundant, since the inner + already repeats the class
    (:\d+)?    - optionally capture a : and some numbers (the port)
    (          - start of group
        /            - leading slash
        (            - start of group
            [\w/_\.]* - zero or more word characters, slashes, underscores or dots (the path + filename); there is no hyphen in this class, so add one, or replace the class with [^?\s]* - any char except ? or whitespace
            (\?\S+)? - optionally capture a ? followed by anything except whitespace (the querystring)
        )?     - close group, make it optional
    )?         - close group, make it optional
)              - close group
@              - delimiter
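The alternative mentioned in the breakdown, a negated class that excludes ? and whitespace (written here as [^?\s]*), can be sketched as follows. This is an illustration under stated assumptions rather than the answerer's exact pattern: it uses the x modifier so the pattern can carry comments, it drops the outer + on the host group and the redundant inner optional group, and the example.com URL is a made-up test value.

```php
<?php
// Verbose pattern: with the x modifier, whitespace and # comments inside are ignored.
$pattern = '@
    (
        https?://        # http:// or https://
        [-\w.]+          # host: hyphens, word characters, dots
        (:\d+)?          # optional port
        (
            /            # leading slash
            [^?\s]*      # path: anything except ? or whitespace
            (\?\S+)?     # optional query string
        )?
    )
@x';

// Hypothetical input containing a shortened URL and a longer one.
$text = 'see http://ff.im/-bEnA and https://example.com:8080/a-b/c.php?x=1&y=2 now';

preg_match_all($pattern, $text, $m);

// $m[0] holds the full matches:
//   http://ff.im/-bEnA
//   https://example.com:8080/a-b/c.php?x=1&y=2
print_r($m[0]);
```

The trade-off is that a negated class is permissive: it also swallows trailing punctuation such as a closing parenthesis or a full stop that directly follows the URL, so the explicit [-\w/_\.] class may be the safer choice for prose text.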