Finding certain/specific line breaks while ignoring others

Time: 2012-10-22 20:45:38

Tags: javascript regex

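I have some SRT data that comes back with \r and \n characters acting as line breaks in the middle of each sentence. How can I find only the \r and \n characters that sit in the middle of the text/sentence, and not the ones that represent the other line breaks?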

Example source:

18
00:00:50,040 --> 00:00:51,890
All the women gather
at the hair salon,

19
00:00:52,080 --> 00:00:56,210
all the mothers and daughters
and they dye their hair orange.

Desired output:

18
00:00:50,040 --> 00:00:51,890
All the women gather at the hair salon,

19
00:00:52,080 --> 00:00:56,210
all the mothers and daughters and they dye their hair orange.

I'm absolutely hopeless at regex, but my best guess (to no avail) was something like

  

var reg = /[\d\r][a-zA-z0-9\s+]+[\r]/

and then calling split() on that to remove any \r\n in the middle of one of the values. I'm sure that's not even the right approach, so... Stack Overflow!! :)

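2 Answers:

Answer 0: (score: 1)

This will match the line breaks you want to get rid of, capture the characters before and after each one, and put them back with a space in between: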

var regex = /([a-z,.;:'"])(?:\r\n?|\n)([a-z])/gi;
str = str.replace(regex, '$1 $2');

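A few notes on the regex. I used the modifiers i and g to make it case-insensitive and to find every line break in the string, rather than stopping after the first one. It also assumes that a removable line break is preceded by a letter, comma, period, semicolon, colon, or single or double quote, and followed by another letter. As @nnnnnn mentioned in a comment above, this will not cover every possible sentence, but it should at least not choke on most punctuation. The line break has to be a single line break, but it is platform independent (it can be \r, \n, or \r\n). I captured the character before the line break and the letter after it (with parentheses), so that I can access them in the replacement string as $1 and $2. That is basically all there is to it.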
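As a quick check, the snippet below runs that replacement over the sample from the question. The wrapper name `joinWrappedLines` and the \r\n line endings in the sample string are assumptions made for illustration; the pattern itself is the one from this answer.

```javascript
// Hypothetical wrapper around the regex from the answer above.
function joinWrappedLines(str) {
  // $1 = character before the line break, $2 = letter after it.
  var regex = /([a-z,.;:'"])(?:\r\n?|\n)([a-z])/gi;
  return str.replace(regex, '$1 $2');
}

// Sample from the question, assuming Windows-style \r\n line endings.
var sample = [
  '18',
  '00:00:50,040 --> 00:00:51,890',
  'All the women gather',
  'at the hair salon,',
  '',
  '19',
  '00:00:52,080 --> 00:00:56,210',
  'all the mothers and daughters',
  'and they dye their hair orange.'
].join('\r\n');

console.log(joinWrappedLines(sample));
```

The breaks after the cue number and the timing line are left alone because they are preceded by a digit, and the blank line survives because the break before it is not followed by a letter. By the same logic, a wrapped line that starts with a digit or a quote would not be joined.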

Answer 1: (score: 1)

This regex should do the trick:

/(\d+\r\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}\r)([^\r]+)\r([^\r]+)(\r|$)/g

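To make this work with more lines (it has to be a fixed number), just add more ([^\r]+)\r groups. Remember to add the matching $n references to the replacement as well; with 3 lines it becomes '$1$2 $3 $4\r'.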
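Spelled out, the three-line variant might look like the sketch below. The variable names and the sample string are assumptions for illustration, and like the original pattern it only matches when lines are separated by a bare \r.

```javascript
// Three-line variant of the pattern above: one extra ([^\r]+)\r group.
// Assumes bare \r line endings, as the original pattern does.
var threeLineRegex = /(\d+\r\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}\r)([^\r]+)\r([^\r]+)\r([^\r]+)(\r|$)/g;

// A single cue with exactly three lines of text.
var threeLineSample = [
  '18',
  '00:00:50,040 --> 00:00:51,890',
  'All the women gather',
  'at the hair salon,',
  'and they just talk'
].join('\r');

// $1 = number and timing lines, $2..$4 = the three text lines.
var threeLineResult = threeLineSample.replace(threeLineRegex, '$1$2 $3 $4\r');

// JSON.stringify makes the \r characters visible in the console.
console.log(JSON.stringify(threeLineResult));
```

Note that the replacement always appends \r, so a cue at the very end of the string gains a trailing \r. A cue with only two lines of text is no longer matched by this variant.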

Usage

mystring = mystring.replace(/(\d+\r\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}\r)([^\r]+)\r([^\r]+)(\r|$)/g, '$1$2 $3\r');

Limitations

  • Does not work if there are more than 2 lines of text.

Example 1

Works fine!

Input:

18
00:00:50,040 --> 00:00:51,890
All the women gather
at the hair salon,

19
00:00:52,080 --> 00:00:56,210
all the mothers and daughters
and they dye their hair orange.

Output:

18
00:00:50,040 --> 00:00:51,890
All the women gather at the hair salon,

19
00:00:52,080 --> 00:00:56,210
all the mothers and daughters and they dye their hair orange.

Example 2

Does not work; more than 2 lines

Input:

18
00:00:50,040 --> 00:00:51,890
All the women gather
at the hair salon,
and they just talk

19
00:00:52,080 --> 00:00:56,210
all the mothers and daughters
and they dye their hair orange.
Except for Maria who dyes it pink.

Output:

18
00:00:50,040 --> 00:00:51,890
All the women gather at the hair salon,
and they just talk

19
00:00:52,080 --> 00:00:56,210
all the mothers and daughters and they dye their hair orange.
Except for Maria who dyes it pink.
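
One way around the two-line limit, offered as an alternative sketch rather than as either answer's method, is to drop the fixed-count regex and work cue by cue: split on blank lines, keep the first two lines of each cue, and join whatever text lines remain. The helper name `joinCueText` is hypothetical. The sketch assumes that cues are separated by blank lines, that the first two lines of every cue are the number and the timing, and that \n line endings are acceptable in the output.

```javascript
// Hypothetical helper, not taken from either answer.
function joinCueText(srt) {
  // Normalise \r\n and bare \r to \n so only one line ending has to be handled.
  const normalised = srt.replace(/\r\n?/g, '\n').trim();
  return normalised
    .split(/\n{2,}/) // cues are assumed to be separated by blank lines
    .filter((cue) => cue !== '')
    .map((cue) => {
      // First line: cue number; second line: timing; the rest: text.
      const [index, timing, ...text] = cue.split('\n');
      return [index, timing, text.join(' ')].join('\n');
    })
    .join('\n\n');
}

// Input from Example 2, assuming \r\n line endings.
const cueSample = [
  '18',
  '00:00:50,040 --> 00:00:51,890',
  'All the women gather',
  'at the hair salon,',
  'and they just talk',
  '',
  '19',
  '00:00:52,080 --> 00:00:56,210',
  'all the mothers and daughters',
  'and they dye their hair orange.',
  'Except for Maria who dyes it pink.'
].join('\r\n');

console.log(joinCueText(cueSample));
```

Because the text lines are collected with a rest pattern, the number of lines per cue does not need to be known in advance.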