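Find and extract text from an existing text file

Time: 2012-01-17 19:19:09

Tags: batch-file extract text-files

I need to be able to extract data from an existing text file. The structure of the text file looks something like this...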

this line contains a type of header and always starts at column 1
     this line contains other data and is always tabbed in
     this line contains other data and is always tabbed in
     this line contains other data and is always tabbed in
     this line contains other data and is always tabbed in
     this line contains other data and is always tabbed in
     this line contains other data and is always tabbed in

this line contains a type of header and always starts at column 1
     this line contains other data and is always tabbed in
     this line contains other data and is always tabbed in
     this line contains other data and is always tabbed in

this line contains a type of header and always starts at column 1
     this line contains other data and is always tabbed in
     this line contains other data and is always tabbed in
     this line contains other data and is always tabbed in
     this line contains other data and is always tabbed in

this line contains a type of header and always starts at column 1
     this line contains other data and is always tabbed in
     this line contains other data and is always tabbed in

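As you can see, the text file is arranged in sections. There is always a single header line, followed by a random number of other data lines, and there is always a blank line between sections. Unfortunately, there is no rhyme or reason to the naming scheme of the header lines or to the data contained in the other data lines; only the structure described above is somewhat consistent. The data I need to search for is located in one of the data lines, in only one section, which could be anywhere in the text file. I can use the FIND command to locate the text I need, but once I do that, I need to be able to extract the entire section to a new text file. I can't figure out how to go up however many lines it takes to reach the first preceding blank line, then go down to the next following blank line, and extract everything in between. Does that make sense? Unfortunately, VBScript is simply not an option for this application, or it would have been done long ago. Any ideas? Thanks.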

2 Answers:

Answer 0: (score: 1)

@echo off
setlocal enableDelayedExpansion
set input="test.txt"
set output="extract.txt"
set search="MY TEXT"

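::find the line with the text
set "lineNum="
for /f "delims=:" %%N in ('findstr /n /c:!search! %input%') do set lineNum=%%N
::stop early if the text was not found, otherwise the comparisons below fail
if not defined lineNum (echo Search text not found & exit /b 1)
set "begin=0"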

::find blank lines and set begin to the last blank before text and end to the first blank after text
for /f "delims=:" %%N in ('findstr /n "^$" %input%') do (
  if %%N lss !lineNum! (set "begin=%%N") else set "end=%%N" & goto :break
)
::end of section not found so we must count the number of lines in the file
for /f %%N in ('find /c /v "" ^<%input%') do set /a end=%%N+1
:break

::extract the section bracketed by begin and end
set /a count=end-begin-1
<%input% (
  rem ::throw away the beginning lines until we reach the desired section
  for /l %%N in (1 1 %begin%) do set /p "ln="
  rem ::read and write the section
  for /l %%N in (1 1 %count%) do (
    set "ln="
    set /p "ln="
    echo(!ln!
  )
)>%output%

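Limitations of this solution:

  • Lines must be terminated by <CR><LF> (Windows style)
  • Lines must be <= 1021 bytes long (not including the <CR><LF>)
  • Trailing control characters will be stripped from each line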

If the limitations are a problem, then a less efficient variant can be written that reads the section using FOR /F instead of SET /P.
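One way such a FOR /F variant might look is sketched below. It is not part of the original answer. It assumes begin and end have already been computed as in the script above (the values 8 and 12 are placeholders for illustration only), and it relies on the common idiom of numbering every line with findstr /n "^" so that FOR /F does not skip blank lines. Delayed expansion is toggled inside the loop so that any ! characters in the data survive.

```batch
@echo off
setlocal disableDelayedExpansion
set "input=test.txt"
set "output=extract.txt"
rem Placeholder values: in practice use the begin and end line numbers
rem of the bracketing blank lines, computed as in the script above
set "begin=8"
set "end=12"

>"%output%" (
  rem findstr /n "^" prefixes every line, including blank ones, with "N:"
  for /f "delims=" %%A in ('findstr /n "^" "%input%"') do (
    set "ln=%%A"
    rem pull the line number out of the "N:" prefix
    for /f "delims=:" %%N in ("%%A") do set "num=%%N"
    setlocal enableDelayedExpansion
    rem write only the lines strictly between begin and end, minus the prefix
    if !num! gtr %begin% if !num! lss %end% echo(!ln:*:=!
    endlocal
  )
)
```

This version scans the whole file even after the section has been written, which is why it is slower than the SET /P approach. FOR /F has its own, larger line-length limit, so very long lines can still be a problem.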

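Answer 1: (score: 1)

The program below reads the lines of the file and stores the lines of one section in an array, while checking whether the search text is inside the current section. When the section ends, if the search text was found, the current section is output as the result; otherwise, the process moves on to the next section.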

@echo off
setlocal EnableDelayedExpansion
set infile=input.txt
set outfile=output.txt
set "search=Any text"
set textFound=
call :SearchSection < %infile% > %outfile%
goto :EOF

:SearchSection
   set i=0
   :readNextLine
      set line=
      set /P line=
      if not defined line goto endSection
      set /A i+=1
      set "ln%i%=!line!"
      if not "!ln%i%!" == "!line:%search%=!" set textFound=True
   goto readNextLine
   :endSection
   if %i% == 0 echo Error: Search text not found & exit /B
   if not defined textFound goto SearchSection
   rem echo( avoids "ECHO is off." if a stored line contains only spaces
   for /L %%i in (1,1,%i%) do echo(!ln%%i!
   exit /B

The limitations of this program are the same as those dbenham described for his program.