Regex: match all occurrences between specified strings

Asked: 2016-07-08 02:01:18

Tags: regex

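I'm working with a bunch of text files that reference image file names. The file names themselves have been sanitized (lowercased, spaces replaced with hyphens), but the text that references them has not.

I need to convert strings like these: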

(image: uploaded IMAGE.jpg caption: this is my caption)
(image: uploaded IMAGE copy.jpeg caption: this is my caption)
(image: IMG_6087.png caption: this is my caption)
(image: IMG_6087 copy.gif)
(image: IMG_9999_copy.jpg)
(image: somehow, a comma.jpg)
(image: other ridic'ulous characters!.jpg)

into:

(image: uploaded-image.jpg caption: this is my caption)
(image: uploaded-image-copy.jpeg caption: this is my caption)
(image: img_6087.png caption: this is my caption)
(image: img_6087-copy.gif)
(image: img_9999_copy.jpg)
(image: somehow-a-comma.jpg)
(image: other-ridiculous-characters.jpg)

These strings are part of larger blocks of text, but each one sits on its own line, like this:

This is not a short guide to write about art. Go in, out of the window, inside New York’s stars qualities, dreams and schemes. People are gathered together, brewing coffee — you have seen their faces? The artists in Manhattan.

(image: manhattan photo.jpg)

Drive till sunset and say goodbye to your body, because this is not a photograph. I saw sixteen americans, raised by wolves, probably lost in paradise city. I found your head — Do you still want it?

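I'm using Sublime Text and plan to run several replace-all passes:

  1. Strip whitespace
  2. Strip characters that are not alphanumeric, _ or -
  3. Make everything lowercase

But I can't work out how to capture every instance between the two delimiters.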

    (?<=^\(image: )[what do I do here??](?=\.jpe?g|png|gif)

4 answers:

Answer 0 (score: 0)

You can use a non-greedy match-all: .*?

So ^\(image: (.*?\.(?:jpe?g|png|gif)) captures the file name, including the extension.
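
To see why the non-greedy form matters, here is a minimal Python sketch (Python is an assumption; the answer names no language). The sample line is made up for illustration: its caption mentions a second file name, which is the case where a greedy .* would capture too much.

```python
import re

# Hypothetical sample line: the caption deliberately mentions a second file name.
line = "(image: uploaded IMAGE.jpg caption: compare with old.jpg)"

# Lazy: .*? stops at the first extension it can reach.
lazy = re.compile(r"^\(image: (.*?\.(?:jpe?g|png|gif))", re.MULTILINE)
# Greedy: .* runs on to the last extension on the line.
greedy = re.compile(r"^\(image: (.*\.(?:jpe?g|png|gif))", re.MULTILINE)

lazy_name = lazy.search(line).group(1)
greedy_name = greedy.search(line).group(1)

print(lazy_name)    # uploaded IMAGE.jpg
print(greedy_name)  # uploaded IMAGE.jpg caption: compare with old.jpg
```

re.MULTILINE lets ^ match at the start of every line, which fits the question's layout where each reference sits on its own line.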

Answer 1 (score: 0)

You can get the file name with:

(?<=image:\s)([^.]++)(?=\.jpe?g|\.png|\.gif)

After that, the conversion depends on the language you are using. Add file extensions as needed; right now this supports jpg, jpeg, png and gif.
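
As one possible way to do that conversion, here is a rough Python sketch (Python is an assumption, not something this answer specifies). It reuses the lookarounds above but swaps the possessive [^.]++ for a plain [^.]+, since Python's re module only accepts possessive quantifiers from version 3.11 on. The helper names sanitize and sanitize_references are made up for illustration, and the three steps follow the list in the question.

```python
import re

# The lookarounds from the pattern above, with [^.]+ instead of the possessive [^.]++.
NAME = re.compile(r"(?<=image:\s)([^.]+)(?=\.jpe?g|\.png|\.gif)")


def sanitize(name):
    """Apply the question's three steps to a file name without its extension."""
    name = re.sub(r"\s+", "-", name.strip())    # each run of whitespace -> one hyphen
    name = re.sub(r"[^A-Za-z0-9_-]", "", name)  # drop anything that is not alphanumeric, _ or -
    return name.lower()


def sanitize_references(text):
    """Rewrite only the file-name part of every image reference in text."""
    return NAME.sub(lambda m: sanitize(m.group(1)), text)


sample = """(image: uploaded IMAGE copy.jpeg caption: this is my caption)
(image: somehow, a comma.jpg)
(image: other ridic'ulous characters!.jpg)"""

print(sanitize_references(sample))
# (image: uploaded-image-copy.jpeg caption: this is my caption)
# (image: somehow-a-comma.jpg)
# (image: other-ridiculous-characters.jpg)
```

Because [^.]+ cannot cross a dot, this sketch would miss a file name that contains an extra dot before the extension; none of the question's samples do.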

Answer 2 (score: 0)

Here is a working way to do it in PHP:

<?php
$string =
"This is not a short guide to write about art. Go in, out of the window, inside New York’s stars qualities, dreams and schemes. People are gathered together, brewing coffee — you have seen their faces? The artists in Manhattan.

(image: uploaded IMAGE.jpg caption: this is my caption)
This is not a short guide to write about art. Go in, out of the window, inside New York’s stars qualities, dreams and schemes. People are gathered together, brewing coffee — you have seen their faces? The artists in Manhattan.

(image: uploaded IMAGE copy.jpeg caption: this is my caption)
(image: IMG_6087.png caption: this is my caption)
(image: IMG_6087 copy.gif) blah blah
(image: IMG_9999_copy.jpg)
(image: somehow, a comma.jpg)
(image: other ridic'ulous characters!.jpg)";

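// Find every "(image: <name>.<ext>" reference and rewrite only <name>.
echo preg_replace_callback('~(?<=\(image: )(.*?)\.(jpg|jpeg|png|gif)~', function($matches)
{
    // Lowercase the name, turn each non-word character into "-", then re-attach the extension.
    return preg_replace('~\W~', '-', stripslashes(strtolower($matches[1]))) . ".$matches[2]";
}, $string);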

?>

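[Edit] Added an explanation of the regex:

  • (?<=\(image: ): a positive lookbehind, so it checks that '(image: ' is present without capturing it.
  • (.*?): captures everything before the image extension in a non-greedy (lazy) way, so it matches as little text as possible.
  • \.(jpg|jpeg|png|gif): matches a literal . followed by one of the listed extensions, and captures the extension for reuse.
  • ~: the delimiter, chosen only because it rarely appears in strings, so there is no need to escape / with \.
  • \W: the opposite of \w; it matches any character that is not a letter, a digit or an underscore.

This will output (in view source):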

This is not a short guide to write about art. Go in, out of the window, inside New York’s stars qualities, dreams and schemes. People are gathered together, brewing coffee — you have seen their faces? The artists in Manhattan.

(image: uploaded-image.jpg caption: this is my caption)
This is not a short guide to write about art. Go in, out of the window, inside New York’s stars qualities, dreams and schemes. People are gathered together, brewing coffee — you have seen their faces? The artists in Manhattan.

(image: uploaded-image-copy.jpeg caption: this is my caption)
(image: img_6087.png caption: this is my caption)
(image: img_6087-copy.gif) blah blah
(image: img_9999_copy.jpg)
(image: somehow--a-comma.jpg)
(image: other-ridic-ulous-characters-.jpg)

You can then use str_replace() inside the callback to fine-tune which characters get converted to what.

Hope it helps! ;)
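
A rough Python sketch of that kind of fine-tuning, for comparison (Python and both function names are assumptions made for illustration, not part of the PHP answer). The first function mirrors the callback's \W replacement; the second drops apostrophes first, collapses runs of non-word characters into a single hyphen and trims hyphens at the ends, which gets closer to the output the question asks for.

```python
import re


def slug_like_callback(name):
    # Same idea as the PHP callback: lowercase, then every non-word character becomes "-".
    return re.sub(r"\W", "-", name.lower())


def slug_tuned(name):
    name = name.lower().replace("'", "")  # remove apostrophes outright, as str_replace() could
    name = re.sub(r"\W+", "-", name)      # a whole run of non-word characters -> one hyphen
    return name.strip("-")                # no leading or trailing hyphen


print(slug_like_callback("somehow, a comma"))          # somehow--a-comma
print(slug_tuned("somehow, a comma"))                  # somehow-a-comma
print(slug_tuned("other ridic'ulous characters!"))     # other-ridiculous-characters
```

Note that Python's \W is Unicode-aware for str input, so accented letters would be kept here; the samples in this thread are plain ASCII, so that difference does not show up.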

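Answer 3 (score: -1)

Could you try the JetBrains WebStorm front-end IDE? It offers many features that let you carry out regex operations in a readable way. Select the text you want to split and check for delimiters or any whitespace.

You get a 30-day trial. I will also share a regex for your query soon.

Also check out http://myregexp.com/ or a plugin to validate your regex.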
