The file format is as follows:

some lines
<name>text1</name>
some lines
some lines 
<name>text2</name>
some lines
<name>text3</name>
some more lines

I need to extract all the text that appears between each pair of name tags:
```
<name> extract this text here </name>
```

Expected output for the file above:

text1
text2
text3

Thanks.

Answer 1

This works for the sample data provided:

for /f "tokens=2 delims=<>" %A in ('type test.txt ^| findstr "<name>"') do @echo %A

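If you use this in a batch script, be sure to change %A to %%A. Basically, this goes through the lines that contain <name> and splits each line on the < and > characters (delims=<>), giving you name, text in between, and /name. tokens=2 then sets %A to only the second string.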

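Note that this will not work if there is anything before <name> on the line, because the leading text becomes the first token and name becomes the second. Handling that makes things more complicated in batch, at which point I would suggest using a parsing library in another language.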

Also, this will not work if the text you want to extract contains < or >.
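
To make the token numbering concrete, here is a rough POSIX shell sketch (not batch) that imitates the splitting described above. The helper name second_token is made up for illustration, and the awk loop only approximates for /f by skipping empty fields, on the assumption that for /f ignores leading and repeated delimiters.

```shell
#!/bin/sh
# Hypothetical helper: imitate `for /f "tokens=2 delims=<>"` on a single line.
# Assumption: for /f ignores empty tokens, so empty awk fields are skipped
# when counting.
second_token() {
    printf '%s\n' "$1" | awk -F'[<>]' '{
        n = 0
        for (i = 1; i <= NF; i++) {
            if ($i != "") {
                n++
                if (n == 2) { print $i; exit }
            }
        }
    }'
}

second_token '<name>text1</name>'         # prints: text1
second_token 'prefix <name>text1</name>'  # prints: name
```

The second call shows the limitation: once something precedes <name>, the tag name itself lands in the second token.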

Answer 2

The following script extracts the text between the desired tags from the files given as command line arguments:

@echo off
setlocal EnableExtensions DisableDelayedExpansion

rem // Resolve command line arguments:
for %%F in (%*) do (
    rem // Read a single line of text following certain criteria:
    for /F "delims=" %%L in ('
        findstr /R "^[^<>]*<name>[^<>][^<>]*</name>[^<>]*$" "%%~F"
    ') do (
        set "LINE=%%L"
        rem /* Extract the desired string portion;
        rem    the preceding `_` is inserted for the first token
        rem    never to appear empty to the `for /F` loop: */
        setlocal EnableDelayedExpansion
        for /F "tokens=3 delims=<>" %%K in ("_!LINE!") do (
            endlocal
            rem // Return found string portion:
            echo(%%K
        )
    )
)

endlocal
exit /B

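This only works when a line contains exactly one <name> tag, followed by some text that contains neither < nor >, followed by one </name> tag. The whole string must be on a single line, and it may be preceded or followed by text that contains neither < nor >.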
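
For comparison, the same line constraints can be sketched in POSIX shell with a single sed expression. This is an illustration rather than a replacement for the batch script; the helper name extract_strict and the sample lines are made up.

```shell
#!/bin/sh
# Hypothetical helper: print the <name> text only for lines that meet the
# same constraints as the findstr pattern used in the batch script:
# one tag pair, non-empty content, no other < or > on the line.
extract_strict() {
    sed -n 's/^[^<>]*<name>\([^<>][^<>]*\)<\/name>[^<>]*$/\1/p'
}

printf '%s\n' \
    'before <name>text1</name> after' \
    '<name>a</name><name>b</name>' \
    '<name></name>' \
    '<name>text2</name>' | extract_strict
# prints:
# text1
# text2
```

The second and third sample lines are rejected: one holds two tag pairs, the other has empty content.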

Answer 3

Assume the input file is input.txt.

This should work:

grep '<name>.*</name>' input.txt | sed -r 's/<name>(.*)<\/name>/\1/'

grep finds the matching lines; sed removes the name tags.
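
One caveat with the pipeline above: sed replaces only the matched part, so any text outside the tags on the same line is kept, and the greedy .* spans from the first <name> to the last </name> when a line holds several pairs. A possible variant is sketched below. It assumes a grep that supports the -o option (GNU and BSD grep do), and the function name extract_names is made up.

```shell
#!/bin/sh
# Hypothetical helper: print the text of every <name>...</name> pair read
# from stdin, one per output line, even when a line holds several pairs
# or other surrounding text.
# Assumption: grep supports -o (print only the matching part).
extract_names() {
    grep -o '<name>[^<]*</name>' | sed -e 's/^<name>//' -e 's/<\/name>$//'
}

printf '%s\n' \
    'some lines' \
    'x <name>text1</name> y <name>text2</name>' \
    '<name>text3</name>' | extract_names
# prints:
# text1
# text2
# text3
```

For a file, it could be called as extract_names < input.txt.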
