Find and move pseudo-duplicate files based on part of the file name

Time: 2012-11-06 22:21:36

Tags: windows file batch-file

I have files that are exported directly from a database. Files are created daily, and a version is appended to the end of each file name.

Filename syntax: DocumentNumber_DocumentName_Version.pdf

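Goal: Use a Windows batch file to move the older versions of new files to an /old folder. Existing files that have had no new version in the past 24 hours are ignored.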

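DocumentNumber: can be variable length, can contain dashes, and is terminated by an underscore.
DocumentName: can be variable length, can include dashes, spaces and/or underscores, and can change over time.
Version: is always at the end and is preceded by an underscore. It is alphanumeric, and the alpha part always increments.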
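The naming rule above can be made concrete with plain string operations. The following is only a minimal sketch of how one name might be split into its three parts; the sample name and the variable names (`name`, `docNumber`, `docName`, `version`) are made up for the illustration. It assumes the version text does not also occur, preceded by an underscore, inside the DocumentName, because the last substitution is case-insensitive and replaces every occurrence.

```batch
@echo off
setlocal EnableDelayedExpansion
rem Sample name without extension (illustrative only)
set "name=DPP-456_NEWER WIDGET_Blue_W2"

rem DocumentNumber: everything before the first underscore
for /F "delims=_" %%n in ("%name%") do set "docNumber=%%n"

rem Version: keep stripping "text_" from the front until no underscore is left
set "version=%name%"
:strip
set "rest=%version:*_=%"
if not "%rest%"=="%version%" (
   set "version=%rest%"
   goto strip
)

rem DocumentName: drop the "DocumentNumber_" prefix and the "_Version" suffix
set "docName=%name:*_=%"
set "docName=!docName:_%version%=!"

echo Number=%docNumber%  Name=%docName%  Version=%version%
```

For the sample name this should report DPP-456 as the number, NEWER WIDGET_Blue as the name and W2 as the version.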

Ex1: Different versions of a file: directory has...

D00003456_BTW-FW001_OPTB_A12.pdf
D00003456_BTW-FW001_OPTB_B9.pdf
D00003456_BTW-FW001_OPTB_C6.pdf
D00003456_BTW-FW001_OPTB_D2.pdf (new)

DocumentNumber is D00003456
DocumentName is BTW-FW001_OPTB
Version is either A12, B9, C6 or D2

** Would like to move all to old folder except D2

Ex2: DocumentName can change from version to version: directory has...

DPP-456_BTW-FW001_OPTB_A1.pdf
DPP-456_BTW-FW001_OPTB_C45.pdf
DPP-456_NEW WIDGET_F6.pdf
DPP-456_NEWER WIDGET_Blue_W2.pdf (new)

DocumentNumber is DPP-456
DocumentName is BTW-FW001_OPTB, "NEW WIDGET" or "NEWER WIDGET_Blue"
Version is either A1, C45, F6 or W2

** Would like to move all to old folder except W2

Ex3: Move only old versions, keeping the latest version of every file: directory has...

SD0001_I001_A1.pdf
SD0001_ClassyWidget_C45.pdf (new)
SD0034_WIDGET_F6.pdf
00000056_NEWER WIDGET_Gray_W2.pdf

DocumentNumber is SD0001, SD0034 and 00000056
DocumentName is I001, ClassyWidget, "WIDGET" or "NEWER WIDGET_Gray"
Version is either A1, C45, F6 or W2

** Would like to move all SD0001 files to old folder except C45; others (SD0034 and 00000056) would be ignored

1 Answer:

Answer 0: (score: 0)

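After reading this topic several times, I concluded that all versions of the same document are identified by having the same DocumentNumber (you should have stated this, instead of showing us several examples and leaving us to guess it).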

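The batch file below solves your problem. It only displays the move commands; remove the ECHO in front of move to actually move the files.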

@echo off
setlocal EnableDelayedExpansion
set prevName=
for %%a in (*.pdf) do (
   if not defined prevName (
      rem Initialize process with first name and DocumentNumber
      set prevName=%%a
      for /F "delims=_" %%b in ("%%a") do set prevNumber=%%b
   ) else (
      rem Check if the next name has the same DocumentNumber
      set nextName=%%a
      for /F "delims=_" %%b in ("%%a") do set nextNumber=%%b
      if "!prevNumber!" equ "!nextNumber!" (
         rem Yes: Two versions of same file, move the older one
         ECHO move "!prevName!" "C:\dest\dir"
      ) else (
         rem No: Different files, update DocumentNumber
         set prevNumber=!nextNumber!
      )
      rem Anyway, previous name was processed
      set prevname=!nextName!
   )
)

Files on disk:

00000056_NEWER WIDGET_Gray_W2.pdf
D00003456_BTW-FW001_OPTB_A12.pdf
D00003456_BTW-FW001_OPTB_B9.pdf
D00003456_BTW-FW001_OPTB_C6.pdf
D00003456_BTW-FW001_OPTB_D2.pdf
DPP-456_BTW-FW001_OPTB_A1.pdf
DPP-456_BTW-FW001_OPTB_C45.pdf
DPP-456_NEW WIDGET_F6.pdf
DPP-456_NEWER WIDGET_Blue_W2.pdf
SD0001_ClassyWidget_C45.pdf
SD0001_I001_A1.pdf
SD0034_WIDGET_F6.pdf

Program result:

move "D00003456_BTW-FW001_OPTB_A12.pdf" "C:\dest\dir"
move "D00003456_BTW-FW001_OPTB_B9.pdf" "C:\dest\dir"
move "D00003456_BTW-FW001_OPTB_C6.pdf" "C:\dest\dir"
move "DPP-456_BTW-FW001_OPTB_A1.pdf" "C:\dest\dir"
move "DPP-456_BTW-FW001_OPTB_C45.pdf" "C:\dest\dir"
move "DPP-456_NEW WIDGET_F6.pdf" "C:\dest\dir"
move "SD0001_ClassyWidget_C45.pdf" "C:\dest\dir"

EDIT: Reply to new comment

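As I said before, your explanation was not clear, so I based my solution on your examples. My previous code works correctly with your original examples except in one case: DocumentNumber SD0001 in Ex3. However, in that case you listed the files I001 and ClassyWidget in reverse order. Both the DIR and FOR commands list ClassyWidget first, then I001. Why did you reverse them? When I read a problem description I focus on how to solve it, not on checking whether the data given by the OP is right or wrong! Because of this, my solution took the last listed file as the latest version (as in all of your examples).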
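If a script depends on the order in which the names are processed, that order can be requested explicitly instead of being left to the file system. A minimal sketch, assuming a plain sort by name is the order wanted (the `/O:N` switch of `DIR` sorts by name, `/B` prints bare names and `/A:-D` skips directories):

```batch
@echo off
rem Process the PDF names in name order, exactly as DIR /O:N reports them
for /F "delims=" %%a in ('dir /B /A:-D /O:N *.pdf') do echo %%a
```

Running this in the folder shows the sequence a name-ordered loop would see, which makes it easy to check whether the last name listed for a DocumentNumber really is its latest version.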

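This new "detail" leads to an entirely different solution, because it now requires two passes over all the files in order to first identify the last version of each DocNumber. Although in the problem description you said "DocumentName can include underscores", all the examples shown include only one underscore. The new solution below correctly gets the latest version when DocumentName has one underscore or none; if it may have several underscores, a "small" modification is needed.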

@echo off
setlocal EnableDelayedExpansion
rem First pass: get the latest version of each DocNumber
for %%a in (*.pdf) do (
   rem docNumber=first token (%%b), version=fourth (%%e) or third (%%d) token
   for /F "tokens=1-4 delims=_" %%b in ("%%~Na") do (
      set version=%%e
      if not defined version set version=%%d
      rem Quoted values compare as strings, so the higher letter wins
      if "!version!" gtr "!latestVersion[%%b]!" set latestVersion[%%b]=!version!
   )
)
rem Second pass: move all versions of each DocNumber, except the latest one
for %%a in (*.pdf) do (
   for /F "tokens=1-4 delims=_" %%b in ("%%~Na") do (
      set version=%%e
      if not defined version set version=%%d
      if "!version!" neq "!latestVersion[%%b]!" ECHO move "%%a" "C:\dest\dir"
   )
)
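
The first pass above relies on `if ... gtr ...` comparing the quoted versions as text, so the leading letter decides which version is later. A small check of that assumption, using two versions from the examples; note that it only holds because the letter changes with every version, since two versions with the same letter (say A9 and A12) would also be compared as text rather than numerically:

```batch
@echo off
rem Quoted operands are compared as strings: "B9" is greater than "A12"
if "B9" gtr "A12" (echo B9 is the later version) else (echo A12 is the later version)
```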

Files on disk:

00000056_NEWER WIDGET_Gray_W2.pdf
BI0000018307_MW531-ABZ2-Bond-Resc_xls_D2.pdf
BI0000018307_MW531-Triton-Bond-Res_xls_B9.pdf
BI0000018307_MW531-Triton-Bond-Res_xls_C5.pdf
D00003456_BTW-FW001_OPTB_A12.pdf
D00003456_BTW-FW001_OPTB_B9.pdf
D00003456_BTW-FW001_OPTB_C6.pdf
D00003456_BTW-FW001_OPTB_D2.pdf
DPP-456_BTW-FW001_OPTB_A1.pdf
DPP-456_BTW-FW001_OPTB_C45.pdf
DPP-456_NEW WIDGET_F6.pdf
DPP-456_NEWER WIDGET_Blue_W2.pdf
SD0001_ClassyWidget_C45.pdf
SD0001_I001_A1.pdf
SD0034_WIDGET_F6.pdf

Program result:

move "BI0000018307_MW531-Triton-Bond-Res_xls_B9.pdf" "C:\dest\dir"
move "BI0000018307_MW531-Triton-Bond-Res_xls_C5.pdf" "C:\dest\dir"
move "D00003456_BTW-FW001_OPTB_A12.pdf" "C:\dest\dir"
move "D00003456_BTW-FW001_OPTB_B9.pdf" "C:\dest\dir"
move "D00003456_BTW-FW001_OPTB_C6.pdf" "C:\dest\dir"
move "DPP-456_BTW-FW001_OPTB_A1.pdf" "C:\dest\dir"
move "DPP-456_BTW-FW001_OPTB_C45.pdf" "C:\dest\dir"
move "DPP-456_NEW WIDGET_F6.pdf" "C:\dest\dir"
move "SD0001_I001_A1.pdf" "C:\dest\dir"

EDIT: A third solution to the same problem...

@echo off
setlocal EnableDelayedExpansion

rem Filename syntax: DocumentNumber_DocumentName_Version.pdf
rem There may be spaces and underscores in DocumentName
rem Goal: Find the last Version of each DocumentNumber and keep it (DocumentName does not matter)
rem       Move the rest to another folder

rem First pass: Create *ordered* FILE array with this format: FILE[docNumber,Version.pdf]=docName
for %%a in (*.pdf) do (
   set "fileName=%%a"
   rem Replace spaces with slashes in DocumentName
   set fileName=!fileName: =/!
   rem DocumentNumber=first token, Version=last token; separated by underscores
   set docNumber=
   for %%b in (!fileName:_^= !) do (
      if not defined docNumber (
         set docNumber=%%b
      ) else (
         set Version=%%b
      )
   )   
   rem Eliminate DocumentNumber and Version from filename (get DocumentName)
   set fileName=!fileName:*_=!
   for %%v in (!Version!) do set docName=!filename:_%%v=!
   rem Create the array element, restoring spaces in DocumentName
   set FILE[!docNumber!,!Version!]=!docName:/= !
)

rem Remove the REM from the next two commands to review the created FILE array
REM SET FILE[
REM ECHO/

rem Second pass: Move all array elements of same DocumentNumber, except the last one
set docNumber=/
for /F "tokens=2-4 delims=[,]=" %%a in ('set FILE[') do (
   rem %%a=docNumber, %%b=Version.pdf, %%c=docName
   if "!docNumber!" equ "%%a" (
      ECHO move "!docNumber!_!docName!_!Version!" "C:\dest\dir"
   )
   set docNumber=%%a
   set docName=%%c
   set Version=%%b
)
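
The second pass depends on `set FILE[` listing the array elements in sorted order, so that the last element seen for a DocumentNumber is its highest version. A minimal sketch of that behavior, assuming (as is my understanding of `SET`) that it displays variables sorted by name; the element names and values here are made up for the illustration:

```batch
@echo off
setlocal
rem Define two elements in "wrong" order on purpose
set "FILE[D1,B9.pdf]=second"
set "FILE[D1,A12.pdf]=first"
rem List every variable whose name starts with FILE[
set FILE[
```

If the listing shows the A12 element before the B9 element, the ordering the third solution relies on holds on that system.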