Python regex: ignore characters until certain characters are matched multiple times

Time: 2018-11-18 14:22:48

Tags: python regex regex-group

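I'm renaming a batch of files that I downloaded from a torrent, and I want to get the episode names, so I figured a regular expression could do the job. I'm new to regex, so I'd appreciate any help. This is what I've come up with so far:

I have a class for other renaming tasks, so the function defined here lives inside that class, which is initialized with the path to the file directory, the expression to rename, and the file extension.

I'm using glob to access all files with the ".mkv" extension.

For debugging, I printed all the file names: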

Mr.Robot.S02E01.eps2.0_unm4sk-pt1.tc.720p.WEB-DL.x264-[MULVAcoded].mkv
Mr.Robot.S02E02.eps2.0_unm4sk-pt2.tc.720p.WEB-DL.x264-[MULVAcoded].mkv
Mr.Robot.S02E03.eps2.1_k3rnel-pan1c.ksd.720p.WEB-DL.x264-[MULVAcoded].mkv
Mr.Robot.S02E04.eps2.2_init_1.asec.720p.WEB-DL.x264-[MULVAcoded].mkv
Mr.Robot.S02E05.eps2.3.logic-b0mb.hc.720p.WEB-DL.x264-[MULVAcoded].mkv
Mr.Robot.S02E06.eps2.4.m4ster-s1ave.aes.720p.WEB-DL.x264-[MULVAcoded].mkv
Mr.Robot.S02E07.eps2.5_h4ndshake.sme.720p.WEB-DL.x264-[MULVAcoded].mkv
Mr.Robot.S02E08.eps2.6.succ3ss0r.p12.720p.WEB-DL.x264-[MULVAcoded].mkv
Mr.Robot.S02E09.eps2.7_init_5.fve.720p.WEB-DL.x264-[MULVAcoded].mkv
Mr.Robot.S02E10.eps2.8_h1dden-pr0cess.axx.720p.WEB-DL.x264-[MULVAcoded].mkv
Mr.Robot.S02E11.eps2.9_pyth0n-pt1.p7z.720p.WEB-DL.x264-[MULVAcoded].mkv
Mr.Robot.S02E12.eps2.9_pyth0n-pt2.p7z.720p.WEB-DL.x264-[MULVAcoded].mkv




def strip_ep_name(self):
    for i, f in enumerate(self.files):
        f_list = f.split("\\")
        name, ext = os.path.splitext(f_list[-1])
        ep_name = name.strip(r'(.*?)".720p.WEB-DL.x264-[MULVAcoded]"')
        print(ep_name)

My goal is to get the episode name, with or without the episode number, since I can give the episodes new names later.

The output is:

r.Robot.S02E01.eps2.0_unm4sk-pt1.t
r.Robot.S02E02.eps2.0_unm4sk-pt2.t
r.Robot.S02E03.eps2.1_k3rnel-pan1c.ks
r.Robot.S02E04.eps2.2_init_1.as
r.Robot.S02E05.eps2.3.logic-b0mb.h
r.Robot.S02E06.eps2.4.m4ster-s1ave.aes
r.Robot.S02E07.eps2.5_h4ndshake.sm
r.Robot.S02E08.eps2.6.succ3ss0r.p1
r.Robot.S02E09.eps2.7_init_5.fv
r.Robot.S02E10.eps2.8_h1dden-pr0cess.a
r.Robot.S02E11.eps2.9_pyth0n-pt1.p7z
r.Robot.S02E12.eps2.9_pyth0n-pt2.p7z

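I want to remove the ".eps2.2"-style part that comes before each episode name, but these parts don't follow a consistent pattern.

I don't know how to continue from here. Can anyone help?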

3 Answers:

Answer 0: (score: 1)

First import Python's regex module:

import re

Then use it to do the replacement on a name such as "r.Robot.S02E01.eps2.0_unm4sk-pt1.t":

ep_name = re.sub(r"eps2\.\d{1,2}(\.|\_)","",episode_name)

Loop over the episode names, passing each one in as episode_name, and print ep_name.

The output looks like this:

r.Robot.S02E01.unm4sk-pt1.t
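
The loop described above might look roughly like the sketch below. The list `episode_names` and its two sample entries are placeholders built from the file names in the question, not part of the asker's class. The sketch starts from the unmodified names because str.strip() treats its argument as a set of characters to remove from both ends, not as a pattern, which is why the names printed in the question lost their leading "M" and some trailing letters.

```python
import re

# Placeholder input: two names from the question, extension and release tags removed.
episode_names = [
    "Mr.Robot.S02E01.eps2.0_unm4sk-pt1.tc",
    "Mr.Robot.S02E05.eps2.3.logic-b0mb.hc",
]

# Same pattern as above: "eps2.", one or two digits, then "." or "_".
pattern = re.compile(r"eps2\.\d{1,2}(\.|\_)")

# Remove the "eps2.N" marker from every name.
cleaned = [pattern.sub("", episode_name) for episode_name in episode_names]

for ep_name in cleaned:
    print(ep_name)
```

Compiling the pattern once is optional here; calling re.sub directly inside the loop should behave the same way.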

Answer 1: (score: 1)

It can be done in one go:

\.eps\d+\.\d+[-_.](.+?)(?:\.720p.+)\.(\w+)$

Broken down, this reads:

\.eps\d+\.\d+ # ".eps", followed by digits, a dot and other digits
[-_.]         # one of -, _ or .
(.+?)         # anything else lazily afterwards
(?:\.720p.+)  # until .720p is found (might need some tweaking)
\.            # a dot
(\w+)$        # some word characters (aka the file extension) at the end

The match needs to be replaced with .\1.\2 to end up with the desired format.
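
As an illustration, the commented breakdown can be fed to Python almost verbatim by compiling it with re.VERBOSE, which ignores whitespace in the pattern and allows # comments. This is only a sketch on a single name; the variable names `name` and `new_name` are made up for the example.

```python
import re

# The same expression, written out with re.VERBOSE so the comments stay in the pattern.
rx = re.compile(r"""
    \.eps\d+\.\d+   # ".eps", digits, a dot, more digits
    [-_.]           # one of -, _ or .
    (.+?)           # group 1: the episode name, matched lazily
    (?:\.720p.+)    # everything from ".720p" onwards
    \.              # a dot
    (\w+)$          # group 2: the file extension at the end
""", re.VERBOSE)

name = "Mr.Robot.S02E04.eps2.2_init_1.asec.720p.WEB-DL.x264-[MULVAcoded].mkv"

# Keep only the two groups, each preceded by a dot.
new_name = rx.sub(r".\1.\2", name)
print(new_name)
```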


Everything together in Python:

import re

filenames = """
Mr.Robot.S02E01.eps2.0_unm4sk-pt1.tc.720p.WEB-DL.x264-[MULVAcoded].mkv
Mr.Robot.S02E02.eps2.0_unm4sk-pt2.tc.720p.WEB-DL.x264-[MULVAcoded].mkv
Mr.Robot.S02E03.eps2.1_k3rnel-pan1c.ksd.720p.WEB-DL.x264-[MULVAcoded].mkv
Mr.Robot.S02E04.eps2.2_init_1.asec.720p.WEB-DL.x264-[MULVAcoded].mkv
Mr.Robot.S02E05.eps2.3.logic-b0mb.hc.720p.WEB-DL.x264-[MULVAcoded].mkv
Mr.Robot.S02E06.eps2.4.m4ster-s1ave.aes.720p.WEB-DL.x264-[MULVAcoded].mkv
Mr.Robot.S02E07.eps2.5_h4ndshake.sme.720p.WEB-DL.x264-[MULVAcoded].mkv
Mr.Robot.S02E08.eps2.6.succ3ss0r.p12.720p.WEB-DL.x264-[MULVAcoded].mkv
Mr.Robot.S02E09.eps2.7_init_5.fve.720p.WEB-DL.x264-[MULVAcoded].mkv
Mr.Robot.S02E10.eps2.8_h1dden-pr0cess.axx.720p.WEB-DL.x264-[MULVAcoded].mkv
Mr.Robot.S02E11.eps2.9_pyth0n-pt1.p7z.720p.WEB-DL.x264-[MULVAcoded].mkv
Mr.Robot.S02E12.eps2.9_pyth0n-pt2.p7z.720p.WEB-DL.x264-[MULVAcoded].mkv
"""

rx = re.compile(r'\.eps\d+\.\d+[-_.](.+?)(?:\.720p.+)\.(\w+)$', re.M)

filenames = rx.sub(r".\1.\2", filenames)
print(filenames)

Which yields

Mr.Robot.S02E01.unm4sk-pt1.tc.mkv
Mr.Robot.S02E02.unm4sk-pt2.tc.mkv
Mr.Robot.S02E03.k3rnel-pan1c.ksd.mkv
Mr.Robot.S02E04.init_1.asec.mkv
Mr.Robot.S02E05.logic-b0mb.hc.mkv
Mr.Robot.S02E06.m4ster-s1ave.aes.mkv
Mr.Robot.S02E07.h4ndshake.sme.mkv
Mr.Robot.S02E08.succ3ss0r.p12.mkv
Mr.Robot.S02E09.init_5.fve.mkv
Mr.Robot.S02E10.h1dden-pr0cess.axx.mkv
Mr.Robot.S02E11.pyth0n-pt1.p7z.mkv
Mr.Robot.S02E12.pyth0n-pt2.p7z.mkv

See a demo on regex101.com.
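
If only the episode name is wanted, not a renamed file, the same expression could presumably be used with search() and the groups read out directly. This is a sketch on one sample name, and `ep_name` and `ext` are illustrative variable names.

```python
import re

rx = re.compile(r'\.eps\d+\.\d+[-_.](.+?)(?:\.720p.+)\.(\w+)$')

name = "Mr.Robot.S02E03.eps2.1_k3rnel-pan1c.ksd.720p.WEB-DL.x264-[MULVAcoded].mkv"

match = rx.search(name)
if match:
    ep_name = match.group(1)  # text between the "epsN.N" marker and ".720p"
    ext = match.group(2)      # file extension without the dot
    print(ep_name, ext)
else:
    # Names that do not follow the expected layout end up here.
    print("no match:", name)
```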

Answer 2: (score: 0)

I'm not sure I understood correctly, since I know neither the series nor the titles. But do you really need re?

for f in files:
    print(f[23:-35].split('.')[0])

which produces

unm4sk-pt1
unm4sk-pt2
k3rnel-pan1c
init_1
logic-b0mb
m4ster-s1ave
h4ndshake
succ3ss0r
init_5
h1dden-pr0cess
pyth0n-pt1
pyth0n-pt2
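
The slice offsets only work because every name in this batch has a prefix and a suffix of the same length. A sketch that spells those lengths out might look like this. `files`, `PREFIX_LEN`, `SUFFIX` and `names` are names introduced for the example, and the suffix is cut at 34 characters instead of 35, which gives the same result once the string is split on the dot.

```python
# Two sample names from the question; the real list would come from glob.
files = [
    "Mr.Robot.S02E01.eps2.0_unm4sk-pt1.tc.720p.WEB-DL.x264-[MULVAcoded].mkv",
    "Mr.Robot.S02E08.eps2.6.succ3ss0r.p12.720p.WEB-DL.x264-[MULVAcoded].mkv",
]

# Assumption: every name starts with a 23-character prefix shaped like this one ...
PREFIX_LEN = len("Mr.Robot.S02E01.eps2.0_")
# ... and ends with exactly this 34-character suffix.
SUFFIX = ".720p.WEB-DL.x264-[MULVAcoded].mkv"

names = []
for f in files:
    middle = f[PREFIX_LEN:-len(SUFFIX)]  # e.g. "unm4sk-pt1.tc"
    names.append(middle.split(".")[0])   # keep the part before the first dot

print(names)
```

Any name with a two-digit "eps" number or a different release tag would break these offsets, which is where the regex approaches are more forgiving.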

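Edit:

I still don't see an actual definition of the target format in your post, but in case @Jan is right, here is a solution for that which also spares you re: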

for f in files:
    print(f[:16] + '.'.join(f[23:].split('.')[:2]) + '.mkv')

Mr.Robot.S02E01.unm4sk-pt1.tc.mkv
Mr.Robot.S02E02.unm4sk-pt2.tc.mkv
Mr.Robot.S02E03.k3rnel-pan1c.ksd.mkv
Mr.Robot.S02E04.init_1.asec.mkv
Mr.Robot.S02E05.logic-b0mb.hc.mkv
Mr.Robot.S02E06.m4ster-s1ave.aes.mkv
Mr.Robot.S02E07.h4ndshake.sme.mkv
Mr.Robot.S02E08.succ3ss0r.p12.mkv
Mr.Robot.S02E09.init_5.fve.mkv
Mr.Robot.S02E10.h1dden-pr0cess.axx.mkv
Mr.Robot.S02E11.pyth0n-pt1.p7z.mkv
Mr.Robot.S02E12.pyth0n-pt2.p7z.mkv