Question

I have some data that I want to convert into a table format.

Here is the input data:

1- This is the 1st line with a 
newline character
2- This is the 2nd line

Each line may contain multiple newline characters.

Output:

<td>1- This the 1st line with 
a new line character</td>
<td>2- This is the 2nd line</td>

I have tried the following:

^(\d{1,3}-)[^\d]*

But it seems to match only up to the 1 in "1st".

I want to be able to stop matching once I find another \d{1,3}\- in my string. Any suggestions?

EDIT: I am using EditPad Lite.
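
For anyone reproducing the problem outside EditPad, here is a minimal sketch of where the attempted pattern stops. It assumes Python's `re` module (not the tool used in the question) and the sample input from above; the variable names are only illustrative.

```python
import re

# Sample input from the question (the first entry spans two lines).
text = "1- This is the 1st line with a \nnewline character\n2- This is the 2nd line"

# The attempted pattern: [^\d]* cannot cross any digit,
# so it halts in front of the "1" in "1st".
attempt = re.compile(r"^(\d{1,3}-)[^\d]*")

matched = attempt.match(text).group(0)
print(repr(matched))
```

The negated class excludes every digit, not just the digits that start a new entry, which is why the match ends inside the first line.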

Answer 1

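You didn't specify a language (there are many regex implementations), but in general, what you are looking for is called "positive lookahead". It lets you add a pattern that affects the match without becoming part of it.

Search for lookahead in the documentation of whatever language you are using.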
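
As a rough illustration of the lookahead idea, here is a sketch in Python's `re` module (an assumption; the question does not use Python). The body is matched lazily, and the lookahead stops it in front of a newline that begins the next "digits-" entry, or at the end of the string. The names `entry` and `entries` are hypothetical.

```python
import re

text = "1- This is the 1st line with a \nnewline character\n2- This is the 2nd line"

# Lazy body (.*?) plus a lookahead: stop before a newline that starts the
# next "digits-" entry, or at the very end of the string (\Z).
entry = re.compile(r"^\d{1,3}-.*?(?=\r?\n\d{1,3}-|\Z)", re.MULTILINE | re.DOTALL)

entries = entry.findall(text)

# Wrap each entry in a TD tag; the separating newline is not part of any match.
html = "\n".join("<td>" + e + "</td>" for e in entries)
print(html)
```

Because the lookahead consumes nothing, the newline and the next entry's number stay available for the following match.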

Edit: The following example seems to work in vim.

:%s#\v(^\d+-\_.{-})\ze(\n\d+-|%$)#<td>\1</td>

Annotated below:

%      - for all lines
s#     - substitute the following (you can use any delimiter, and slash is most
         common, but as that will require that we escape slashes in the command
         I chose to use the number sign)
\v     - very magic mode, lets us use fewer backslashes
(      - start group for back referencing
^      - start of line
\d+    - one or more digits (as many as possible)
-      - a literal dash!
\_.    - any character, including a newline
{-}    - zero or more of these (as few as possible)
)      - end group
\ze    - end match (anything beyond this point will not be included in the match)
(      - start a new group
[\n\r] - newline (in any format - thanks Alan)
\d+    - one or more digits
-      - a dash
|      - or
%$     - end of file
)      - end group
#      - start substitute string
<td>\1</td> - a TD tag around the first matched group

Answer 2

This works in vim and uses a zero-width positive lookahead:

/^\d\{1,3\}-\_.*[\r\n]\(\d\{1,3\}-\)\@=

Steps:

/^\d\{1,3\}-              1 to 3 digits followed by -
\_.*                      any number of characters including newlines/linefeeds
[\r\n]\(\d\{1,3\}-\)\@=   followed by a newline/linefeed ONLY if it is followed 
                          by 1 to 3 digits followed by - (the first condition)

Edit: This is how it would look in pcre / ruby:

/(\d{1,3}-.*?[\r\n])(?=(?:\d{1,3}-)|\Z)/m

Note that the string needs to end with a newline for the last entry to match.
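
The trailing-newline caveat can be checked with a small sketch. This is a port to Python's `re` module, which is an assumption: `re.DOTALL` stands in for the `/m` flag as Ruby uses it, and Python's `\Z` matches only at the very end of the string. The names are illustrative.

```python
import re

# Python port of the pcre / ruby pattern above; DOTALL lets "." cross newlines.
pattern = re.compile(r"(\d{1,3}-.*?[\r\n])(?=(?:\d{1,3}-)|\Z)", re.DOTALL)

text = "1- This is the 1st line with a \nnewline character\n2- This is the 2nd line"

# Without a trailing newline the last entry has no [\r\n] to end on.
without_newline = pattern.findall(text)

# With a trailing newline both entries are found.
with_newline = pattern.findall(text + "\n")

print(len(without_newline), len(with_newline))
```

Each match includes its closing newline, so it may need stripping before the text is wrapped in a tag.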

Answer 3

SEARCH:   ^\d+-.*(?:[\r\n]++(?!\d+-).*)*

REPLACE:  <td>$0</td>

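[\r\n]++ matches one or more carriage returns or linefeeds, so you don't have to worry about whether the file uses Unix (\n), DOS (\r\n), or older Mac (\r) line separators.

(?!\d+-) asserts that the first thing after the line separator is not another line number.

I used the possessive + in [\r\n]++ to make sure it matches the whole separator. Otherwise, if the separator is \r\n, [\r\n]+ could match only the \r, and (?!\d+-) would then be tested against the \n and succeed.

Tested in EditPad Pro, but it should work in Lite, too.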
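
The effect of the possessive quantifier can be sketched in Python, with two loudly stated assumptions. First, Python's `re` only gained `++` in version 3.11, so this sketch emulates it with a lookahead plus a backreference instead. Second, Python's dot matches \r, so `[^\r\n]` stands in for the `.` of the original. The sample uses \r\n line endings, and all names are hypothetical.

```python
import re

# CRLF version of the question's sample (line endings are an assumption here).
text = "1- This is the 1st line with a \r\nnewline character\r\n2- This is the 2nd line"

# Plain greedy [\r\n]+ may backtrack and give the \n back, so (?!\d+-) ends
# up being tested against "\n2-" and passes when it should not.
greedy = re.compile(r"^\d+-[^\r\n]*(?:[\r\n]+(?!\d+-)[^\r\n]*)*", re.MULTILINE)

# Emulated possessive match: the lookahead captures the whole separator once,
# and the backreference then consumes exactly that text with no backtracking.
atomic = re.compile(
    r"^\d+-[^\r\n]*(?:(?=(?P<sep>[\r\n]+))(?P=sep)(?!\d+-)[^\r\n]*)*",
    re.MULTILINE,
)

# \g<0> is Python's spelling of the $0 used in the REPLACE string above.
greedy_html = greedy.sub(r"<td>\g<0></td>", text)
atomic_html = atomic.sub(r"<td>\g<0></td>", text)

print(repr(greedy_html))
print(repr(atomic_html))
```

With the greedy version a stray \r ends up inside the first TD tag; the emulated possessive version leaves the whole \r\n separator between the two tags.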

Answer 4

(\d+-.+(\r|$)((?!^\d-).+(\r|$))?)

Answer 5

You could match only the separators and split on them. For example, in C#, it could be done like this:

string s = "1- This is the 1st line with a \r\nnewline character\r\n2- This is the 2nd line";
string ss = "<td>" + string.Join("</td>\r\n<td>", Regex.Split(s.Substring(3), "\r\n\\d{1,3}- ")) + "</td>";
MessageBox.Show(ss);
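
The same split-on-separators idea can be sketched in Python (an assumption; the answer itself is C#). Like the snippet above, it skips the leading "1- " and discards the "N- " prefixes, so the numbers do not appear in the output.

```python
import re

s = "1- This is the 1st line with a \r\nnewline character\r\n2- This is the 2nd line"

# Drop the leading "1- " (3 characters), then split wherever a line break
# is followed by 1 to 3 digits, a dash, and a space.
parts = re.split(r"\r\n\d{1,3}- ", s[3:])

# Join the pieces back together with closing and opening TD tags in between.
html = "<td>" + "</td>\r\n<td>".join(parts) + "</td>"
print(html)
```

The line break inside the first entry survives because it is not followed by a number and a dash.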

Answer 6

Is it OK to do it in three steps?

(These are Perl regexes):

Replace the first one:

$input =~ s/^(\d{1,3})/<td>$1/;

Replace the rest:

$input =~ s/\n(\d{1,3})/<\/td>\n<td>$1/gm;

Add the last one:

$input .= '</td>';
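
For comparison, the three steps can be sketched in Python (an assumption; the answer uses Perl). The variable names are illustrative only.

```python
import re

text = "1- This is the 1st line with a \nnewline character\n2- This is the 2nd line"

# Step 1: open a TD tag in front of the first number only.
step1 = re.sub(r"^(\d{1,3})", r"<td>\1", text, count=1)

# Step 2: at every newline followed by a number, close the previous tag
# and open a new one.
step2 = re.sub(r"\n(\d{1,3})", r"</td>\n<td>\1", step1)

# Step 3: close the last tag.
html = step2 + "</td>"
print(html)
```

One caveat with this approach: the patterns do not require the dash, so a continuation line that happens to start with a digit would also be treated as a new entry.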

Regex - greedy but stop before a string match
