Convert RTF to a plain-text array

Asked: 2015-03-09 17:11:22

Tags: php regex

$string = "
{\rtf1\ansi\ansicpg1252\uc1\htmautsp\deff2{\fonttbl{\f0\fcharset0 Times New Roman;}{\f2\fcharset0 Tahoma;}}{\colortbl\red0\green0\blue0;\red255\green255\blue255;\red0\green0\blue255;}\loch\hich\dbch\pard\plain\ltrpar\itap0{\lang1033\fs18\f2\cf0 \cf0\ql
{\f2\cf2 {\ltrch <- MBisono--2/13/2015 12:01:25 PM ->}\li0\ri0\sa0\sb0\fi0\ql\par}
{\f2\cf2 {\ltrch How are you? Hope all is well.  Just wanted to drop you a note that our benefits seem to be getting screwed up every time we have a new employee or if someone changes something. We have certain rules set up for Class 1 and Class 2 and it does not seem like the benefits dept is following them. }\li0\ri0\sa0\sb0\fi0\ql\par}
{\f2\cf2 {\ltrch  }\li0\ri0\sa0\sb0\fi0\ql\par}
{\f2\cf2 {\ltrch Payroll is great we love Christine. It just seems like there is always something wrong with our benefits.}\li0\ri0\sa0\sb0\fi0\ql\par}
{\f2\cf2 {\ltrch  }\li0\ri0\sa0\sb0\fi0\ql\par}
{\f2\cf2 {\ltrch Alexis}\li0\ri0\sa0\sb0\fi0\ql\par}
{\f2\cf2 {\ltrch  }\li0\ri0\sa0\sb0\fi0\ql\par}
{\f2\cf2 {\ltrch Met with Admin and Benefits to discuss MAcGuffin's benefits.  Admin has had no issues, Benefits advised that recently an employee was set up with contributions, when it should have been 100% employer paid. }\li0\ri0\sa0\sb0\fi0\ql\par}
}
}";

I have an RTF string like the one above. How can I convert it to plain strings? I would like the result to be an array like this.

array(
    '<- MBisono--2/13/2015 12:01:25 PM ->',
    'How are you? Hope all is well.  Just wanted to drop you a note that our benefits seem to be getting screwed up every time we have a new employee or if someone changes something. We have certain rules set up for Class 1 and Class 2 and it does not seem like the benefits dept is following them.',
    'Payroll is great we love Christine. It just seems like there is always something wrong with our benefits.',
    'Alexis',
    'Met with Admin and Benefits to discuss MAcGuffin\'s benefits.  Admin has had no issues, Benefits advised that recently an employee was set up with contributions, when it should have been 100% employer paid.'
)

The strings always start with "\ltrch" and end with "}\li0". Hope that helps. Thanks, regex pros!

1 Answer:

Answer 0: (score: 0)

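I'm not familiar with RTF, but I have designed a snippet based on your input.

Store the substrings that come immediately before and after each target substring, then call preg_quote() so that their special characters (here, the backslashes and braces) are treated literally by the regex engine.

\S requires the match to begin with a non-whitespace character, which skips the blank lines.

The \s* after the capture group acts like rtrim(): it consumes any unwanted trailing whitespace so that it stays out of the captured text.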
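To see those two tokens in isolation, here is a minimal sketch on a hypothetical two-paragraph sample (not the asker's data). The `$sample` variable and its shortened `}\li0` terminator are assumptions made purely for illustration.

```php
<?php
// Hypothetical sample: one paragraph with trailing spaces, one blank paragraph.
// The nowdoc keeps every backslash literal.
$sample = <<<'TEXT'
{\ltrch Alexis  }\li0\par}
{\ltrch  }\li0\par}
TEXT;

// preg_quote() escapes the braces and backslashes so they match literally.
$start = preg_quote('{\ltrch ', '/');
$end   = preg_quote('}\li0', '/');

// \S rejects the blank paragraph; \s* swallows the spaces after "Alexis".
preg_match_all("/$start(\S.*?)\s*$end/", $sample, $matches);

var_export($matches[1]);
```

Only the first paragraph should be captured, and without its trailing spaces.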

Code:

$string = <<<'TEXT'
{\rtf1\ansi\ansicpg1252\uc1\htmautsp\deff2{\fonttbl{\f0\fcharset0 Times New Roman;}{\f2\fcharset0 Tahoma;}}{\colortbl\red0\green0\blue0;\red255\green255\blue255;\red0\green0\blue255;}\loch\hich\dbch\pard\plain\ltrpar\itap0{\lang1033\fs18\f2\cf0 \cf0\ql
{\f2\cf2 {\ltrch <- MBisono--2/13/2015 12:01:25 PM ->}\li0\ri0\sa0\sb0\fi0\ql\par}
{\f2\cf2 {\ltrch How are you? Hope all is well.  Just wanted to drop you a note that our benefits seem to be getting screwed up every time we have a new employee or if someone changes something. We have certain rules set up for Class 1 and Class 2 and it does not seem like the benefits dept is following them. }\li0\ri0\sa0\sb0\fi0\ql\par}
{\f2\cf2 {\ltrch  }\li0\ri0\sa0\sb0\fi0\ql\par}
{\f2\cf2 {\ltrch Payroll is great we love Christine. It just seems like there is always something wrong with our benefits.}\li0\ri0\sa0\sb0\fi0\ql\par}
{\f2\cf2 {\ltrch  }\li0\ri0\sa0\sb0\fi0\ql\par}
{\f2\cf2 {\ltrch Alexis}\li0\ri0\sa0\sb0\fi0\ql\par}
{\f2\cf2 {\ltrch  }\li0\ri0\sa0\sb0\fi0\ql\par}
{\f2\cf2 {\ltrch Met with Admin and Benefits to discuss MAcGuffin's benefits.  Admin has had no issues, Benefits advised that recently an employee was set up with contributions, when it should have been 100% employer paid. }\li0\ri0\sa0\sb0\fi0\ql\par}
}
}
TEXT;

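// Literal text that precedes and follows each target substring.
// preg_quote() escapes the braces and backslashes for use inside /.../.
$start = preg_quote('{\ltrch ', '/');
$end = preg_quote('}\li0\ri0\sa0\sb0\fi0\ql\par}', '/');
var_export(
    preg_match_all(
        "/$start(\S.*?)\s*$end/", // capture group 1 holds the trimmed text
        $string,
        $matches
    )
    ? $matches[1]
    : 'no matches'
);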

Output:

array (
  0 => '<- MBisono--2/13/2015 12:01:25 PM ->',
  1 => 'How are you? Hope all is well.  Just wanted to drop you a note that our benefits seem to be getting screwed up every time we have a new employee or if someone changes something. We have certain rules set up for Class 1 and Class 2 and it does not seem like the benefits dept is following them.',
  2 => 'Payroll is great we love Christine. It just seems like there is always something wrong with our benefits.',
  3 => 'Alexis',
  4 => 'Met with Admin and Benefits to discuss MAcGuffin\'s benefits.  Admin has had no issues, Benefits advised that recently an employee was set up with contributions, when it should have been 100% employer paid.',
)

If \ri0\sa0\sb0\fi0\ql\par is variable text, you can remove that portion from the $end declaration.
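A minimal sketch of that shortened variant, assuming only `}\li0` stays fixed. The `$sample` text and its differing control words are hypothetical and do not come from the question.

```php
<?php
// Hypothetical sample: the control words after "}\li0" differ per paragraph.
$sample = <<<'TEXT'
{\f2\cf2 {\ltrch First note }\li0\ri0\sa0\sb0\fi0\ql\par}
{\f2\cf2 {\ltrch Second note}\li0\ri720\sa120\qj\par}
TEXT;

$start = preg_quote('{\ltrch ', '/');
$end   = preg_quote('}\li0', '/'); // only the fixed part of the terminator

// Fall back to an empty array when nothing matches.
$notes = preg_match_all("/$start(\S.*?)\s*$end/", $sample, $matches)
    ? $matches[1]
    : [];

var_export($notes);
```

The trade-off is that a paragraph whose terminator does not begin with `}\li0` (for example, one with a different left indent) would no longer match.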