捕获组中的正则表达式捕获组

时间:2019-05-31 20:48:12

标签: c# .net regex

我正在搜索数据库以查找包含视频信息的跨度标签,以进行迁移。

我的正则表达式运行良好,并且我可以提取大部分所需的所有信息。我遇到的麻烦是样式标签的位置与预期不同。这使该表达式不成立,并导致了我期望的捕获次数的约2/3。

如果我尝试将样式捕获组嵌套在主捕获组中,它将无法捕获任何内容。我也尝试过使用负/正先行,但只有在将其设为可选捕获组时,它才有效。我认为问题是我无法正确嵌套。大多数相关问题的答案都是负面的,但我的理解是更多的是断言/量化。

那么,无论样式标签在span标签中的位置如何,我如何始终捕获样式标签?

正则表达式是.NET(服务器端)

我有一个Regexr设置

/(?<tag><span class='vidly-vid' data-thumb='(?<thumb>http.+\.jpg)'.+aspect-ratio='(?<aspect>\d{1,3}:\d{1,3})'.+sources='\[{"file":.+"(?<src>(?<uri>https:\/\/cf1234.cloudfront\.net\/Vids\/)(?<key>(?<ident>[0-9a-fA-F]{8}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{12}|[a-z0-9]{6})\/(?<mp4>mp4_1080.mp4|mp4_720.mp4|mp4_480.mp4|mp4_360.mp4|mp4.mp4))).+style='(?<style>.+width: (?<width>.+)px.+height: (?<height>.+)px.+)'.+<\/span>)/gmi

样本数据

所有这些都应该匹配。第一个不,其他三个不。

<span class='vidly-vid' data-thumb='https://cf1234.cloudfront.net/Vids/Thumbnails/691DBB43-5EC8-4D57-AF7B-99896D9BD5D1_19127.jpg' data-aspect-ratio='4:3' style='border-width: 0px; width: 352px; height: 240px;' data-sources='[{"file":"https://cf1234.cloudfront.net/Vids/6v1j0a/hls.m3u8","label":"HD"},{"file":"https://cf1234.cloudfront.net/Vids/6v1j0a/mp4_360.mp4","label":"360p SD"}]'>&#160;</span>

<span class='vidly-vid' data-thumb='https://cf1234.cloudfront.net/Vids/Thumbnails/b181cfa5-565d-470a-b93a-2610987bb4da_28142.jpg' data-aspect-ratio='160:117' data-sources='[{"file":"https://cf1234.cloudfront.net/Vids/b181cfa5-565d-470a-b93a-2610987bb4da/hls.m3u8","label":"HD"},{"file":"https://cf1234.cloudfront.net/Vids/b181cfa5-565d-470a-b93a-2610987bb4da/mp4_480.mp4","label":"480p SD"},{"file":"https://cf1234.cloudfront.net/Vids/b181cfa5-565d-470a-b93a-2610987bb4da/mp4_360.mp4","label":"360p SD"},{"file":"https://cf1234.cloudfront.net/Vids/b181cfa5-565d-470a-b93a-2610987bb4da/mp4_720.mp4","label":"720p HD"},{"file":"https://cf1234.cloudfront.net/Vids/b181cfa5-565d-470a-b93a-2610987bb4da/mp4_1080.mp4","label":"1080p HD"}]' style='border-width: 0px; width: 600px; height: 480px;'>&#160;</span>

<table align="left" border="0" cellpadding="5" cellspacing="5" style="width:600px">   <tbody>    <tr>     <td><img alt="" src="/content/generator/Course_90016206/Case-10-LMLO_MG_FLAVOR1label.jpg" style="height:497px; width:324px" /></td>     <td><span class='vidly-vid' data-thumb='https://cf1234.cloudfront.net/Vids/Thumbnails/b2a7cbd3-5d31-49a5-bf89-aef0cf9f7414_28142.jpg' data-aspect-ratio='146:225' data-sources='[{"file":"https://cf1234.cloudfront.net/Vids/b2a7cbd3-5d31-49a5-bf89-aef0cf9f7414/hls.m3u8","label":"HD"},{"file":"https://cf1234.cloudfront.net/Vids/b2a7cbd3-5d31-49a5-bf89-aef0cf9f7414/mp4_480.mp4","label":"480p SD"},{"file":"https://cf1234.cloudfront.net/Vids/b2a7cbd3-5d31-49a5-bf89-aef0cf9f7414/mp4_360.mp4","label":"360p SD"},{"file":"https://cf1234.cloudfront.net/Vids/b2a7cbd3-5d31-49a5-bf89-aef0cf9f7414/mp4_720.mp4","label":"720p HD"},{"file":"https://cf1234.cloudfront.net/Vids/b2a7cbd3-5d31-49a5-bf89-aef0cf9f7414/mp4_1080.mp4","label":"1080p HD"}]' style='border-width: 0px; width: 324px; height: 500px;'>&#160;</span></td>    </tr>   </tbody>  </table>

<span class='vidly-vid' data-thumb='https://cf1234.cloudfront.net/Vids/Thumbnails/231913a7-b608-4d8b-9332-64b6840c22f0_28142.jpg' data-aspect-ratio='16:9' data-sources='[{"file":"https://cf1234.cloudfront.net/Vids/231913a7-b608-4d8b-9332-64b6840c22f0/hls.m3u8","label":"HD"},{"file":"https://cf1234.cloudfront.net/Vids/231913a7-b608-4d8b-9332-64b6840c22f0/mp4_480.mp4","label":"480p SD"},{"file":"https://cf1234.cloudfront.net/Vids/231913a7-b608-4d8b-9332-64b6840c22f0/mp4_360.mp4","label":"360p SD"},{"file":"https://cf1234.cloudfront.net/Vids/231913a7-b608-4d8b-9332-64b6840c22f0/mp4_720.mp4","label":"720p HD"},{"file":"https://cf1234.cloudfront.net/Vids/231913a7-b608-4d8b-9332-64b6840c22f0/mp4_1080.mp4","label":"1080p HD"}]' style='border-width: 0px; width: 920px; height: 520px;'>&#160;</span>

1 个答案:

答案 0 :(得分:0)

我个人只是将正则表达式分成更易于管理的部分,例如:

var spanRegex = new Regex(@"<span class='vidly-vid'.+<\/span>");
var attrRegexes = new[]{
    @"data-thumb='(?<thumb>http.+\.jpg)'",
    @"aspect-ratio='(?<aspect>\d{1,3}:\d{1,3})'",
    @"sources='\[{""file"":.+""(?<src>(?<uri>https:\/\/cf1234.cloudfront\.net\/Vids\/)(?<key>(?<ident>[0-9a-fA-F]{8}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{12}|[a-z0-9]{6})\/(?<mp4>mp4_1080.mp4|mp4_720.mp4|mp4_480.mp4|mp4_360.mp4|mp4.mp4)))",
    @"style='(?<style>.+width: (?<width>.+)px.+height: (?<height>.+)px.+)'",
}
.Select(r => new Regex(r))
.ToList();
var results = inputs.Select(i => spanRegex.Match(i).Value)
    .Select(i => new
    {
        i,
        attributes = 
            from r in attrRegexes
            let match = r.Match(i)
            from g in match.Groups.Cast<Group>().Skip(1)
            select new {g.Name, capture = g.Value}
    });

Linqpad example LINQPad results

相关问题