PHP regex to get various URL parts

Time: 2016-12-28 11:41:27

Tags: php regex regex-lookarounds

I am writing a regex in PHP to extract the various parts of a URL, like so:

$rule = "/^(?P<scheme>(http[s]?|ftp|mailto)):\/\/(?P<auth>([a-zA-Z]+:[a-zA-Z0-9-_]+))?@?(?P<domain>([a-zA-Z0-9-_]+).([a-z.]+)):?(?P<port>([0-9]{2,4}))?\/?(?P<path>[a-z0-9-\/]+(?=\/?))?\?(?P<query>[a-z0-9-_=\\?]+)?\/?(?P<hash>[#a-z0-9-_]+)$/";
$url = "https://user:password@store.example.co.uk:80/search?q=term?lang=en#anchor";

$params = [];
if (preg_match($rule, $url, $matches)) {
    // preg_match returns both numeric and named keys; keep only the named groups.
    foreach ($matches as $key => $match) {
        if (is_string($key)) {
            $params[$key] = $match;
        }
    }
    print_r($params);
}

The above code gives me:

Array
(
    [scheme] => https
    [auth] => user:password
    [domain] => store.example.co.uk
    [port] => 80
    [path] => search
    [query] => q=term?lang=en
    [hash] => #anchor
)

But I want to get something like this:

Array
(
    [scheme] => https
    [auth] => user:password
    [domain] => Array
        (
            [sub-domain] => store
            [domain-name] => example
            [top-level-domain] => co.uk
        )
    [port] => 80
    [path] => search
    [query] => Array
        (
            [q] => term
            [lang] => en
        )
    [hash] => #anchor
)

Is there a way to achieve this using only regex, or should I combine regex with some other PHP function to split the captured parts further?
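To illustrate the second option for the query part, here is a minimal sketch that post-processes the captured `query` string instead of extending the regex. `splitQuery` is a hypothetical helper name, not something from the code above. It assumes that both `&` and `?` may separate parameters, because the sample URL uses `?` between `q=term` and `lang=en`; for conventional `&`-separated queries PHP's built-in `parse_str()` would probably be enough.

```php
<?php
// Hypothetical helper: turn "q=term?lang=en" into ['q' => 'term', 'lang' => 'en'].
function splitQuery(string $query): array
{
    $pairs = [];

    // Split on "&" or "?" and drop empty pieces (e.g. from a trailing separator).
    foreach (preg_split('/[&?]/', $query, -1, PREG_SPLIT_NO_EMPTY) as $part) {
        // A limit of 2 keeps any "=" inside the value intact;
        // array_pad supplies an empty value for a bare key such as "debug".
        list($key, $value) = array_pad(explode('=', $part, 2), 2, '');
        $pairs[urldecode($key)] = urldecode($value);
    }

    return $pairs;
}

print_r(splitQuery('q=term?lang=en'));
```

With this approach the regex only has to capture the query as one string, and the result can be assigned back, for example `$params['query'] = splitQuery($params['query']);`.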

NB: the top-level domain could be .co.uk, .com, or anything in that format, and the domain could be www.example.com, example.com, or store.example.com. Some parts of the URL are optional, and I still want to capture each remaining part when the optional ones are not specified.

For example, if I omit the sub-domain, "example" is captured as the sub-domain and ".com" as the domain. I still want "example" to be captured as the domain when no sub-domain is specified.
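The behaviour described above could be sketched by splitting the captured host from the right rather than from the left, so that a missing sub-domain never shifts the other parts. `splitDomain` and its `$multiPartTlds` default are hypothetical and only illustrative: a pattern alone cannot tell that `co.uk` is a suffix while `example.com` is not, so the sketch assumes a short hand-written list, where real code would need a fuller source such as the Public Suffix List.

```php
<?php
// Hypothetical helper: split a host into sub-domain, domain name and TLD.
// $multiPartTlds is an incomplete, illustrative list of two-label suffixes.
function splitDomain(string $host, array $multiPartTlds = ['co.uk', 'org.uk', 'com.au']): array
{
    $labels = explode('.', strtolower($host));

    // Use two labels for the TLD only when they form a known multi-part
    // suffix and at least one label remains for the domain name.
    $lastTwo = implode('.', array_slice($labels, -2));
    $tldSize = (count($labels) > 2 && in_array($lastTwo, $multiPartTlds, true)) ? 2 : 1;

    // array_splice removes the TLD labels from $labels and returns them.
    $tld  = implode('.', array_splice($labels, -$tldSize));
    $name = (string) array_pop($labels); // label directly before the TLD
    $sub  = implode('.', $labels);       // whatever is left; empty if none

    return [
        'sub-domain'       => $sub,
        'domain-name'      => $name,
        'top-level-domain' => $tld,
    ];
}

print_r(splitDomain('store.example.co.uk'));
print_r(splitDomain('example.com'));
```

Because the name is always taken as the label immediately before the TLD, `example.com` yields an empty `sub-domain` and `example` as the `domain-name`.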

Thank you.

0 answers:

No answers