Regular expression to extract a specific href in HTML

Time: 2015-03-23 16:13:14

Tags: ios objective-c regex json nsregularexpression

I want to pull specific data out of my JSON data: every link in an href inside this tag: <div id='gallery-1'

For example, with this JSON data:

<p><strong style=\"font-size: 13px;\">22nd March</strong></p>\n
<p>Swell is 3 foot and clean but wind swing south west later. Get on the early</p>\n
<p><span id=\"more-113\"></span></p>\n
<p>High tide: 1922 2.6m    <span style=\"color: #ff0000;\"> <a href=\"http://www.bundoransurfco.com/webcam/\">
<strong>CLICK HERE FOR LIVE PEAK WEBCAM</strong></a></span></p>\n
<p>Low Tide: 1249 -0.1m</p>\n<p><b>3 day forecast to March 23rd</b></p>\n
<p>Looks like a fun few days with light winds and a long period swell.</p>\n\n\t\t
<style type='text/css'>\n\t\t\t#gallery-1 {\n\t\t\t\tmargin: auto;\n\t\t\t}\n\t\t\t
#gallery-1 .gallery-item {\n\t\t\t\tfloat: left;\n\t\t\t\tmargin-top: 10px;\n\t\t\t\t
text-align: center;\n\t\t\t\twidth: 50%;\n\t\t\t}\n\t\t\t#gallery-1 img {\n\t\t\t\t
border: 2px solid #cfcfcf;\n\t\t\t}\n\t\t\t
#gallery-1 .gallery-caption {\n\t\t\t\t
margin-left: 0;\n\t\t\t}\n\t\t\t
/* see gallery_shortcode() in wp-includes/media.php */\n\t\t</style>\n\t\t
<div id='gallery-1' class='gallery galleryid-113 gallery-columns-2 gallery-size-medium'>
<dl class='gallery-item'>\n\t\t\t<dt class='gallery-icon portrait'>\n\t\t\t\t
<a rel=\"prettyPhoto[gallery-113]\" href='http://www.bundoransurfco.com/wp-content/uploads/2014/11/10411096_10152611456607000_886839954460588268_n.jpg'>
<img width=\"225\" height=\"300\" src=\"http://www.bundoransurfco.com/wp-content/uploads/2014/11/10411096_10152611456607000_886839954460588268_n-225x300.jpg\" 
class=\"attachment-medium colorbox-113 \" alt=\"10411096_10152611456607000_886839954460588268_n\" /></a>\n\t\t\t
</dt></dl>\n\t\t\t
<br style='clear: both' />\n\t\t</div>\n\n
<p><a href=\"http://www.bundoransurfco.com/webcam/\"> </a></p>\n
<h1> Wind Charts</h1>\n<p><a href=\"http://www.windguru.cz/int/index.php?sc=103244\">
<img class=\"size-thumbnail wp-image-747 alignleft\" title=\"wind guru\" src=\"http://www.bundoransurfco.com/wp-content/uploads/2010/12/wind-guru-67x68.jpg\" alt=\"\" width=\"67\" height=\"68\" /></a> <a href=\"http://www.xcweather.co.uk/\"><img class=\"alignnone size-thumbnail wp-image-749\" title=\"xcweathersmall\" src=\"http://www.bundoransurfco.com/wp-content/uploads/2010/12/xcweathersmall2-67x68.jpg\" alt=\"\" width=\"67\" height=\"68\" /></a>       <a href=\"http://www.buoyweather.com/wxnav6.jsp?region=UK&program=nww3BW1&grb=nww3&latitude=55.0&longitude=-8.75&zone=0&units=e\"><img class=\"alignnone size-thumbnail wp-image-750\" title=\"buoy weather\" src=\"http://www.bundoransurfco.com/wp-content/uploads/2010/12/buoy-weather-67x68.jpg\" alt=\"\" width=\"67\" height=\"68\" /></a> <a href=\"http://www.windguru.cz/int/index.php?sc=103244\">Wind Guru</a>       <a href=\"http://www.xcweather.co.uk/\">XC Weather</a>       <a href=\"http://www.buoyweather.com/wxnav6.jsp?region=UK&program=nww3BW1&grb=nww3&latitude=55.0&longitude=-8.75&zone=0&units=e\">Buoy Weather</a></p>\n<h1>Swell Charts</h1>\n<p><a href=\"http://magicseaweed.com/Bundoran-Surf-Report/50/\"><img class=\"alignnone size-thumbnail wp-image-753\" title=\"msw logo\" src=\"http://www.bundoransurfco.com/wp-content/uploads/2010/12/msw-logo-67x43.jpg\" alt=\"\" width=\"75\" height=\"43\" /></a>             <a href=\"http://magicseaweed.com/UK-Ireland-MSW-Surf-Charts/1/\"><img class=\"alignnone size-thumbnail wp-image-754\" title=\"magicseaweedwamchart\" src=\"http://www.bundoransurfco.com/wp-content/uploads/2010/12/magicseaweedwamchart1-67x68.png\" alt=\"\" width=\"67\" height=\"68\" /></a>       <a href=\"http://www.marine.ie/Home/site-area/data-services/marine-forecasts/wave-forecasts\"><img class=\"alignnone wp-image-755 size-thumbnail\" title=\"marine institute irish bouy data\" src=\"http://www.bundoransurfco.com/wp-content/uploads/2010/12/marine-institute-irish-bouy-data-67x42.jpg\" alt=\"\" 
width=\"67\" height=\"42\" /></a>                 <a href=\"http://magicseaweed.com/Bundoran-Surf-Report/50/\">Magic Seaweed</a>      <a href=\"http://magicseaweed.com/UK-Ireland-MSW-Surf-Charts/1/\">MSM WAM</a>          <a href=\"http://www.marine.ie/Home/site-area/data-services/marine-forecasts/wave-forecasts\">Marine Institute</a></p>\n<h1>Pressure, Weather, Tides</h1>\n<p><a href=\"http://news.bbc.co.uk/weather/forecast/13000\"><img class=\"alignnone size-thumbnail wp-image-756\" title=\"bbc pressure\" src=\"http://www.bundoransurfco.com/wp-content/uploads/2010/12/bbc-pressure-67x68.jpg\" alt=\"\" width=\"67\" height=\"68\" /></a>          <a href=\"http://www.met.ie/\"><img class=\"alignnone size-thumbnail wp-image-759\" title=\"met eireann\" src=\"http://www.bundoransurfco.com/wp-content/uploads/2010/12/met-eireann-67x68.jpg\" alt=\"\" width=\"67\" height=\"68\" /></a>            <a href=\"http://news.bbc.co.uk/weather/forecast/13000\">BBC Pressure</a>      <a href=\"http://www.met.ie/\">Met Eireann</a>      <a href=\"http://www.irishtimes.com/weather/tides.html\">Irish Tide Tables</a></p>\n

I want to extract only: http://www.bundoransurfco.com/wp-content/uploads/2014/11/10411096_10152611456607000_886839954460588268_n.jpg

Previously I used <a.+?href=\"([^\"]+) to get the href of every <a> tag, but that is not what I want...

2 answers:

Answer 0 (score: 1)

Assuming your div is a string and contains only one href, you can use this code instead of a regular expression to find where the href starts and stops.

    NSRange range = [divString rangeOfString:@"href"]; // start (assumes an href is present)
    // end: search from the href to the end of the string, so a long href cannot overrun a fixed-size window
    NSRange end = [divString rangeOfString:@">" options:0 range:NSMakeRange(range.location, divString.length - range.location)];

Then use [divString substringWithRange:] to get the part you are interested in.
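The two steps above can be combined into one small function. This is only a sketch: `HrefValueInDiv` is a hypothetical name, and it assumes the href is the last attribute in its tag (as in the gallery markup in the question), so that everything between `href=` and the next `>` is the quoted value.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper, not part of the answer above.
// Returns the value of the first href in divString, or nil if there is none.
// Assumption: href is the last attribute before the closing '>' of its tag.
static NSString *HrefValueInDiv(NSString *divString) {
    NSRange start = [divString rangeOfString:@"href="];
    if (start.location == NSNotFound) return nil;

    // Look for the end of the tag, from the href to the end of the string.
    NSRange tail = NSMakeRange(start.location, divString.length - start.location);
    NSRange end = [divString rangeOfString:@">" options:0 range:tail];
    if (end.location == NSNotFound) return nil;

    // Everything between "href=" and '>' is the quoted value.
    NSUInteger valueStart = NSMaxRange(start);
    NSString *raw = [divString substringWithRange:
                        NSMakeRange(valueStart, end.location - valueStart)];

    // Strip the surrounding quotes (single or double) and stray whitespace.
    NSCharacterSet *trim =
        [NSCharacterSet characterSetWithCharactersInString:@"'\" \t\r\n"];
    return [raw stringByTrimmingCharactersInSet:trim];
}
```

If the tag carries more attributes after the href, the result will include them, so a quote-aware search is the safer choice for arbitrary markup.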

Answer 1 (score: 1)

Here is a solution based on Alex's answer. It works for multiple hrefs in a single string:

NSString *target = @"<div id='gallery-1' class='gallery galleryid-113 gallery-columns-2 gallery-size-medium'><dl class='gallery-item'>\n\t\t\t<dt class='gallery-icon portrait'>\n\t\t\t\t<a rel=\"prettyPhoto[gallery-113]\" href='http://www.bundoransurfco.com/wp-content/uploads/2014/11/10411096_10152611456607000_886839954460588268_n.jpg'><img width=\"225\" height=\"300\" href='http://www.bundoransurfco.com/wp-content/uploads/2014/11/10411096_10152611456607000_886839954460588268_n-225x300.jpg' class=\"attachment-medium colorbox-113 \" alt=\"10411096_10152611456607000_886839954460588268_n\" /></a>\n\t\t\t</dt></dl>\n\t\t\t<br style='clear: both' />\n\t\t</div>";
NSMutableArray *hrefs = [NSMutableArray array];
NSUInteger searchStart = 0; // index from which the next search begins
while (searchStart < target.length) {
    // Find the next single-quoted href attribute in the rest of the string.
    NSRange hrefRange = [target rangeOfString:@"href='"
                                      options:0
                                        range:NSMakeRange(searchStart, target.length - searchStart)];
    if (hrefRange.location == NSNotFound) {
        NSLog(@"That's all");
        break;
    }
    // The value runs from just after href=' up to the next single quote.
    NSUInteger valueStart = NSMaxRange(hrefRange);
    NSRange endRange = [target rangeOfString:@"'"
                                     options:0
                                       range:NSMakeRange(valueStart, target.length - valueStart)];
    if (endRange.location == NSNotFound) {
        break; // unterminated attribute value
    }
    NSString *href = [target substringWithRange:NSMakeRange(valueStart, endRange.location - valueStart)];
    [hrefs addObject:href];
    searchStart = NSMaxRange(endRange); // continue after the closing quote
}

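As you can see, this implementation is sensitive to the quote style (single- or double-quoted href values). P.S. It may look a bit messy; it was written and tested quickly.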
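One way to remove the quote sensitivity is to read the character that follows `href=` and use it as the closing delimiter. The function below is a minimal sketch of that idea; `AllHrefValues` is a hypothetical name, and unquoted attribute values are simply skipped.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: collects every quoted href value in html,
// accepting either ' or " as the delimiter.
static NSArray *AllHrefValues(NSString *html) {
    NSMutableArray *hrefs = [NSMutableArray array];
    NSUInteger cursor = 0;
    while (cursor < html.length) {
        NSRange rest = NSMakeRange(cursor, html.length - cursor);
        NSRange attr = [html rangeOfString:@"href=" options:0 range:rest];
        if (attr.location == NSNotFound) break;

        // The character right after "href=" should be the opening quote.
        NSUInteger quoteIndex = NSMaxRange(attr);
        if (quoteIndex >= html.length) break;
        unichar q = [html characterAtIndex:quoteIndex];
        if (q != '\'' && q != '"') {
            cursor = quoteIndex;   // unquoted value: skip it and keep searching
            continue;
        }

        // The value ends at the next occurrence of the same quote character.
        NSString *quote = [NSString stringWithCharacters:&q length:1];
        NSUInteger valueStart = quoteIndex + 1;
        NSRange tail = NSMakeRange(valueStart, html.length - valueStart);
        NSRange close = [html rangeOfString:quote options:0 range:tail];
        if (close.location == NSNotFound) break;

        [hrefs addObject:[html substringWithRange:
                             NSMakeRange(valueStart, close.location - valueStart)]];
        cursor = NSMaxRange(close);
    }
    return hrefs;
}
```

Like the loop above, this collects every href regardless of which tag it belongs to, so it still needs to be given only the gallery div as input.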

Edit: here is also a variant that uses a regular expression, but it only works for a tags, and again you need to be careful with the quotes:

NSError *error;
NSString *target = @"<div id='gallery-1' class='gallery galleryid-113 gallery-columns-2 gallery-size-medium'><dl class='gallery-item'>\n\t\t\t<dt class='gallery-icon portrait'>\n\t\t\t\t<a rel=\"prettyPhoto[gallery-113]\" href=\"http://www.bundoransurfco.com/wp-content/uploads/2014/11/10411096_10152611456607000_886839954460588268_n.jpg\"><img width=\"225\" height=\"300\" href=\"http://www.bundoransurfco.com/wp-content/uploads/2014/11/10411096_10152611456607000_886839954460588268_n-225x300.jpg\" class=\"attachment-medium colorbox-113 \" alt=\"10411096_10152611456607000_886839954460588268_n\" /></a>\n\t\t\t</dt></dl>\n\t\t\t<br style='clear: both' />\n\t\t</div>";
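// Matches <a ...> tags only; to accept <img> as well, use the group (?:a|img) rather than a character class.
NSRegularExpression *regEx = [NSRegularExpression regularExpressionWithPattern:@"<a\\s+(?:[^>]*?\\s+)?href=\"([^\"]*)\""
                                                                       options:0
                                                                         error:&error];
if (regEx == nil) {
    NSLog(@"Invalid pattern: %@", error);
}
NSArray *array = [regEx matchesInString:target
                                options:0
                                  range:NSMakeRange(0, target.length)];
for (NSTextCheckingResult *match in array){
    // Capture group 1 holds the href value without its quotes.
    NSRange range = [match rangeAtIndex:1];
    NSString *result = [target substringWithRange:range];
    NSLog(@"HREF = %@", result);
}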

I also edited the first variant to save all the hrefs into an array.
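To return only the links inside `<div id='gallery-1'>`, as the question asks, the regular expression can be applied in two passes: first locate the div, then search for `<a>` tags within its range. The sketch below is one possible way to do that, not a general HTML parser. `GalleryAnchorHrefs` is a hypothetical name, and the code assumes the HTML has already been decoded from the JSON string and that the gallery div contains no nested `<div>`.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: href values of <a> tags inside the first
// <div id="galleryID"> ... </div>. Assumes no nested <div> inside it.
static NSArray *GalleryAnchorHrefs(NSString *html, NSString *galleryID) {
    // Pass 1: find the div. (['"]) ... \1 accepts either quote style.
    NSString *divPattern = [NSString stringWithFormat:
        @"<div\\b[^>]*\\bid\\s*=\\s*(['\"])%@\\1[^>]*>(.*?)</div>",
        [NSRegularExpression escapedPatternForString:galleryID]];
    NSRegularExpression *divRegex = [NSRegularExpression
        regularExpressionWithPattern:divPattern
                             options:NSRegularExpressionCaseInsensitive |
                                     NSRegularExpressionDotMatchesLineSeparators
                               error:NULL];
    NSTextCheckingResult *div =
        [divRegex firstMatchInString:html
                             options:0
                               range:NSMakeRange(0, html.length)];
    if (div == nil) return [NSArray array];

    // Pass 2: collect the href of each <a> tag inside the div's contents.
    NSRegularExpression *hrefRegex = [NSRegularExpression
        regularExpressionWithPattern:@"<a\\b[^>]*?\\bhref\\s*=\\s*(['\"])(.*?)\\1"
                             options:NSRegularExpressionCaseInsensitive
                               error:NULL];
    NSMutableArray *hrefs = [NSMutableArray array];
    NSArray *matches = [hrefRegex matchesInString:html
                                          options:0
                                            range:[div rangeAtIndex:2]];
    for (NSTextCheckingResult *match in matches) {
        // Group 2 is the value between the matching quotes.
        [hrefs addObject:[html substringWithRange:[match rangeAtIndex:2]]];
    }
    return hrefs;
}
```

Calling `GalleryAnchorHrefs(html, @"gallery-1")` skips links before and after the div, such as the webcam and wind chart links, and ignores the `src` of the `<img>` inside it. For markup that is less regular than this WordPress gallery output, a real HTML parser would be more robust than regular expressions.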
