Question

I have some text with hard line breaks in it, like this:

This should all be on one line 
since it's one sentence.

This is a new paragraph that
should be separate.

I want to remove the single line breaks but keep the double line breaks, like this:

This should all be on one line since it's one sentence.

This is a new paragraph that should be separate.

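Is there a single regular expression that does this? (Or some other simple way?)

So far this is the only solution I have that works, but it feels hackish.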

txt = txt.gsub(/(\r\n|\n|\r)/,'[[[NEWLINE]]]')
txt = txt.gsub('[[[NEWLINE]]][[[NEWLINE]]]', "\n\n")
txt = txt.gsub('[[[NEWLINE]]]', " ")

Answer 1

Replace every newline that is neither preceded nor followed by another newline:

text = <<END
This should all be on one line
since it's one sentence.

This is a new paragraph that
should be separate.
END

p text.gsub /(?<!\n)\n(?!\n)/, ' '
#=> "This should all be on one line since it's one sentence.\n\nThis is a new paragraph that should be separate. "

Or, for Ruby 1.8, without lookarounds:

txt.gsub! /([^\n])\n([^\n])/, '\1 \2'
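The two patterns above are not fully interchangeable. The following is a small sketch comparing them, assuming an input (made up for illustration) that contains one-character lines and a trailing newline. The capture-based pattern consumes the character on each side of the newline, so it can skip a newline that directly follows a previous match, and it never matches a newline at the very end of the string.

```ruby
# Hypothetical sample: single-character lines, a blank line, a trailing newline.
text = "a\nb\nc\n\nnext paragraph\n"

# Lookaround version: the assertions are zero-width, so every lone newline
# is found, including the one at the end of the string.
with_lookarounds = text.gsub(/(?<!\n)\n(?!\n)/, ' ')
p with_lookarounds
#=> "a b c\n\nnext paragraph "

# Capture version: "a\nb" is consumed as one match, so the newline after "b"
# no longer has a free character in front of it and is left alone.
with_captures = text.gsub(/([^\n])\n([^\n])/, '\1 \2')
p with_captures
#=> "a b\nc\n\nnext paragraph\n"
```

For ordinary prose, where every line is longer than one character, both versions join the lines the same way; they differ only in how they treat the final newline.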

Answer 2

text.gsub!(/(\S)[^\S\n]*\n[^\S\n]*(\S)/, '\1 \2')

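The two (\S) groups serve the same purposes as the lookarounds in @sln's regexes ((?<!\s)(?<!^) and (?!\s)(?!$)):

they confirm that the line break really is in the middle of a sentence, and
they ensure that the [^\S\n]*\n[^\S\n]* part consumes any other whitespace surrounding the line break, so that it can be normalized to a single space.

They also make the regex easier to read, and (perhaps most importantly) they work in versions of Ruby before 1.9, which don't support lookbehinds.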
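As a quick illustration of the whitespace normalization described above, here is a minimal sketch. The sample string is an assumption made up for this example: it has extra spaces on both sides of the hard line break and a blank line between paragraphs.

```ruby
# Hypothetical input with stray spaces around the hard line break.
text = "This should all be on one line   \n   since it's one sentence.\n\nNew paragraph.\n"

# The (\S) groups anchor the match to real text on both sides, and the
# [^\S\n]* parts swallow the horizontal whitespace around the newline.
joined = text.gsub(/(\S)[^\S\n]*\n[^\S\n]*(\S)/, '\1 \2')
p joined
#=> "This should all be on one line since it's one sentence.\n\nNew paragraph.\n"
```

The sketch uses gsub rather than gsub! because gsub! returns nil when no substitution is made, which is easy to trip over when the result is assigned or chained.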

Answer 3

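There is more to unformatting (turning off word wrap) than you might think. If the text is the result of a formatting operation, you should go by the rules of that operation to reverse-engineer the original.

For example, the test you have there is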

This should all be on one line
since it's one sentence.

This is a new paragraph that
should be separate.

If only the single newlines were removed, it would look like this:

This should all be on one line since it's one sentence.
This is a new paragraph thatshould be separate.

Also, other formatting (such as intentional line breaks) would be lost, for example:

This is Chapter 1
   Section a 
   Section b

becomes

This is Chapter 1   Section a   Section b

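Finding the offending newlines is easy: /(?<!\n)\n(?!\n)/
But what do you replace them with?

Edit: Actually, even finding the standalone newlines is not that easy, because they can sit between (horizontal) whitespace that you don't see.

There are 4 ways to go (shown here as Perl substitutions).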

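Remove the newline, keep the surrounding formatting:
$text =~ s/(?<!\s)([^\S\n]*)\n([^\S\n]*)(?!\s)/$1$2/g;

Remove the newline and the formatting, substitute a space:
$text =~ s/(?<!\s)[^\S\n]*\n[^\S\n]*(?!\s)/ /g;

Same as the two above, but ignoring newlines at the beginning or end of the string:

$text =~ s/(?<!\s)(?<!^)[^\S\n]*\n[^\S\n]*(?!$|\s)/ /g;
$text =~ s/(?<!\s)(?<!^)([^\S\n]*)\n([^\S\n]*)(?!$|\s)/$1$2/g;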
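Since the question is about Ruby, here is a rough Ruby adaptation of these options. It is a sketch, not a literal port: in Ruby, ^ and $ match at line boundaries rather than string boundaries, so the third variant uses (?<=\S) and (?=\S) as an assumed equivalent of "not whitespace and not the start or end of the string". The sample string and variable names are made up for illustration.

```ruby
# Hypothetical sample: a trailing space before one break, an indented
# continuation line, a blank line, and a trailing newline.
sample = "one line \nsince\n\nnew para\n   indented\n"

# 1. Remove the newline, keep the horizontal whitespace around it.
keep_whitespace = sample.gsub(/(?<!\s)([^\S\n]*)\n([^\S\n]*)(?!\s)/, '\1\2')
p keep_whitespace
#=> "one line since\n\nnew para   indented"

# 2. Remove the newline and the whitespace around it, substitute one space.
to_space = sample.gsub(/(?<!\s)[^\S\n]*\n[^\S\n]*(?!\s)/, ' ')
p to_space
#=> "one line since\n\nnew para indented "

# 3. Like 2, but require real text on both sides, so a newline at the
#    start or end of the string is left alone.
inner_only = sample.gsub(/(?<=\S)[^\S\n]*\n[^\S\n]*(?=\S)/, ' ')
p inner_only
#=> "one line since\n\nnew para indented\n"
```

Applying the same lookaround change to variant 1 gives the fourth option: it keeps the surrounding whitespace and leaves leading or trailing newlines untouched.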

Example breakdown of the regex (this is the minimum needed to isolate a single newline):

(?<!\s)      # Not a whitespace behind us (text,number,punct, etc..)
[^\S\n]*     # 0 or more whitespaces, but no newlines
\n           # a newline we want to remove
[^\S\n]*     # 0 or more whitespaces, but no newlines
(?!\s)       # Not a whitespace in front of us (text,number,punct, etc..)
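The same breakdown can be kept in real code by using Ruby's extended (/x) regex mode, which ignores whitespace in the pattern and allows trailing comments. This is a sketch; the constant name SINGLE_NEWLINE and the test string are made up for illustration.

```ruby
# Extended-mode regex: layout and comments are ignored by the regex engine.
SINGLE_NEWLINE = /
  (?<!\s)     # no whitespace immediately before the match
  [^\S\n]*    # optional horizontal whitespace, never a newline
  \n          # the single newline to replace
  [^\S\n]*    # optional horizontal whitespace, never a newline
  (?!\s)      # no whitespace immediately after the match
/x

p "a \n b\n\nc".gsub(SINGLE_NEWLINE, ' ')
#=> "a b\n\nc"
```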

Answer 4

Well, there's this:

s.gsub /([^\n])\n([^\n])/, '\1 \2'

It won't do anything to leading or trailing newlines. If you don't want any leading or trailing whitespace at all, this variation will do:

s.gsub(/([^\n])\n([^\n])/, '\1 \2').strip

Answer 5

$ ruby -00 -pne 'BEGIN{$\="\n\n"};$_.gsub!(/\n+/,"\0")' file
This should all be on one line since it's one sentence.

This is a new paragraph thatshould be separate.
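The paragraph-at-a-time idea behind the one-liner above can also be written as a plain Ruby method without command-line switches. This is a sketch under stated assumptions: line endings are "\n", paragraphs are separated by one or more blank lines, and the method name unwrap_paragraphs is made up for illustration. Unlike the output shown above, it joins lines with a single space, so "that" and "should" stay separate words.

```ruby
# Hypothetical helper: split into paragraphs, then rejoin each paragraph's
# lines with single spaces.
def unwrap_paragraphs(text)
  text
    .split(/\n{2,}/)                       # paragraphs are separated by blank lines
    .map do |para|
      para.split("\n")                     # the hard-wrapped lines of one paragraph
          .map(&:strip)                    # drop stray spaces at the line edges
          .reject(&:empty?)
          .join(' ')
    end
    .join("\n\n")                          # restore exactly one blank line between paragraphs
end

text = "This should all be on one line \nsince it's one sentence.\n\n" \
       "This is a new paragraph that\nshould be separate.\n"
puts unwrap_paragraphs(text)
```

This trades the single regex for a few more lines, but it avoids lookbehinds entirely and makes the treatment of whitespace at line edges explicit.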

Removing hard line breaks from text using Ruby
