Lua pattern matching

Asked: 2013-02-04 22:16:18

Tags: lua lua-patterns

I have a file of values extracted from Microsoft Lync conversations, and the values still carry RTF formatting tags. A sample from the file:

  

{\rtf1\fbidis\ansi\ansicpg1252\deff0\nouicompat\deflang1033{\fonttbl{\f0\fnil\fcharset0 >Segoe UI;}{\f1\fnil Segoe UI;}} {\colortbl ;\red0\green0\blue0;} {*\generator Riched20 15.0.4420}{*\mmathPr\mwrapIndent1440 }\viewkind4\uc1 \pard\cf1\embo\f0\fs20 Craig...\embo0 \embo please\embo0 \embo close\embo0 \embo >out\embo0 \embo of\embo0 \embo your\embo0 \embo old\embo0 \embo client\embo0 \embo >and\embo0 \embo re-open\embo0\f1\par {*\lyncflags rtf=1}}

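Using a Lua script, I am trying to strip the RTF tags and pull out only the text of the conversation. So the result of my function should be: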

  

Craig... please close out of your old client and re-open

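I tried using string.gsub with a regular expression to match the tags and replace them with spaces, leaving only the text, but it does not work. Here is the string.gsub code I have so far:
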
result = string.gsub(s, "\{\*?\\[^{}]+}|[{}]|\\\n?[A-Za-z]+\n?(?:-?\d+)?[ ]?", " ")

Any suggestions would be greatly appreciated!

Additional samples:

  
    

user1@capital.com @ 2013-01-18 17:48:03Z(TO:user2@capital.com)

  

{\rtf1\fbidis\ansi\ansicpg1252\deff0\nouicompat\deflang1033{\fonttbl{\f0\fnil\fcharset0 Segoe UI;}{\f1\fnil Segoe UI;}} {\colortbl ;\red0\green0\blue0;} {*\generator Riched20 15.0.4420}{*\mmathPr\mwrapIndent1440 }\viewkind4\uc1 \pard\cf1\embo\f0\fs20 works\embo0 \embo for\embo0 \embo me..\embo0 \embo how\embo0 \embo about\embo0 \embo embedding\embo0 \embo pictures?\embo0\f1\par {*\lyncflags rtf=1}}

  
    

user1@capital.com @ 2013-01-18 17:48:57Z(TO:user2@capital.com)

  

{\rtf1\fbidis\ansi\ansicpg1252\deff0\nouicompat\deflang1033{\fonttbl{\f0\fnil\fcharset0 Segoe UI;}{\f1\fnil Segoe UI;}} {\colortbl ;\red0\green0\blue0;} {*\generator Riched20 15.0.4420}{*\mmathPr\mwrapIndent1440 }\viewkind4\uc1 \pard\cf1\embo\f0\fs20 I\embo0 \embo see\embo0 \embo it\embo0\f1\par {*\lyncflags rtf=1}}

  
    

user1@capital.com @ 2013-01-18 17:49:27Z(TO:user2@capital.com)

  

{\rtf1\fbidis\ansi\ansicpg1252\deff0\nouicompat\deflang1033{\fonttbl{\f0\fnil\fcharset0 Segoe UI;}{\f1\fnil Segoe UI;}} {\colortbl ;\red0\green0\blue0;} {*\generator Riched20 15.0.4420}{*\mmathPr\mwrapIndent1440 }\viewkind4\uc1 \pard\cf1\embo\f0\fs20 let's\embo0 \embo try\embo0 \embo a\embo0 \embo meeting.\embo0\f1\par {*\lyncflags rtf=1}}

2 answers:

Answer 0 (score: 2)

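Lua patterns have no alternation operator (|) and no optional groups ((?:...)?), so the regular expression above cannot work as written. Something like this might work: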

s:match("{(.+)}"):gsub("%b{}", ""):gsub("\\%w+", "")

which returns:

"    Craig...  please  close  >out  of  your  old  client  >and  re-open "

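The first gsub removes every {} pair together with its contents; the second gsub removes all remaining RTF tags (although some tags seem to allow whitespace, so you may need to adjust the pattern).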
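Since Lua patterns lack alternation, one way to get the effect of the original regular expression is to run several gsub passes, each handling one kind of markup. The following is a minimal sketch along those lines, not a full RTF parser. The helper name strip_rtf, the pass that drops the stray ">" characters, and the whitespace cleanup are assumptions added for illustration; the sample string is the first message from the question, written as a long-bracket literal so the backslashes stay untouched.

```lua
-- Hypothetical helper (not part of the original answer): strips RTF markup
-- with several gsub passes instead of one regex using alternation.
local function strip_rtf(rtf)
  -- Take everything between the outermost braces (fall back to the input).
  local body = rtf:match("{(.+)}") or rtf
  -- Pass 1: drop nested groups such as {\fonttbl ...} and {\colortbl ...}.
  body = body:gsub("%b{}", "")
  -- Pass 2: drop control words with an optional numeric parameter.
  body = body:gsub("\\%a+%-?%d*", "")
  -- Pass 3: drop the stray ">" characters present in the sample data.
  body = body:gsub(">", "")
  -- Collapse runs of whitespace, then trim both ends.
  body = body:gsub("%s+", " "):gsub("^%s+", ""):gsub("%s+$", "")
  return body
end

local sample = [[{\rtf1\fbidis\ansi\ansicpg1252\deff0\nouicompat\deflang1033{\fonttbl{\f0\fnil\fcharset0 >Segoe UI;}{\f1\fnil Segoe UI;}} {\colortbl ;\red0\green0\blue0;} {*\generator Riched20 15.0.4420}{*\mmathPr\mwrapIndent1440 }\viewkind4\uc1 \pard\cf1\embo\f0\fs20 Craig...\embo0 \embo please\embo0 \embo close\embo0 \embo >out\embo0 \embo of\embo0 \embo your\embo0 \embo old\embo0 \embo client\embo0 \embo >and\embo0 \embo re-open\embo0\f1\par {*\lyncflags rtf=1}}]]

print(strip_rtf(sample))
--> Craig... please close out of your old client and re-open
```

The tags are replaced with empty strings rather than spaces because the sample already has a space between most tags; collapsing whitespace at the end gives single spaces between words.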

Answer 1 (score: 0)

Try this:

local s = '{\rtf1\fbidis\ansi\ansicpg1252\deff0\nouicompat\deflang1033{\fonttbl{\f0\fnil\fcharset0 >Segoe UI;}{\f1\fnil Segoe UI;}} {\colortbl ;\red0\green0\blue0;} {*\generator Riched20 15.0.4420}{*\mmathPr\mwrapIndent1440 }\viewkind4\uc1 \pard\cf1\embo\f0\fs20 Craig...\embo0 \embo please\embo0 \embo close\embo0 \embo >out\embo0 \embo of\embo0 \embo your\embo0 \embo old\embo0 \embo client\embo0 \embo >and\embo0 \embo re-open\embo0\f1\par {*\lyncflags rtf=1}}\n'
    ..'{\rtf1\fbidis\ansi\ansicpg1252\deff0\nouicompat\deflang1033{\fonttbl{\f0\fnil\fcharset0 Segoe UI;}{\f1\fnil Segoe UI;}} {\colortbl ;\red0\green0\blue0;} {*\generator Riched20 15.0.4420}{*\mmathPr\mwrapIndent1440 }\viewkind4\uc1 \pard\cf1\embo\f0\fs20 works\embo0 \embo for\embo0 \embo me..\embo0 \embo how\embo0 \embo about\embo0 \embo embedding\embo0 \embo pictures?\embo0\f1\par {*\lyncflags rtf=1}}\n'
    ..'{\rtf1\fbidis\ansi\ansicpg1252\deff0\nouicompat\deflang1033{\fonttbl{\f0\fnil\fcharset0 Segoe UI;}{\f1\fnil Segoe UI;}} {\colortbl ;\red0\green0\blue0;} {*\generator Riched20 15.0.4420}{*\mmathPr\mwrapIndent1440 }\viewkind4\uc1 \pard\cf1\embo\f0\fs20 I\embo0 \embo see\embo0 \embo it\embo0\f1\par {*\lyncflags rtf=1}}\n'
    ..'{\rtf1\fbidis\ansi\ansicpg1252\deff0\nouicompat\deflang1033{\fonttbl{\f0\fnil\fcharset0 Segoe UI;}{\f1\fnil Segoe UI;}} {\colortbl ;\red0\green0\blue0;} {*\generator Riched20 15.0.4420}{*\mmathPr\mwrapIndent1440 }\viewkind4\uc1 \pard\cf1\embo\f0\fs20 let\'s\embo0 \embo try\embo0 \embo a\embo0 \embo meeting.\embo0\f1\par {*\lyncflags rtf=1}}\n'
local text = string.gsub(s, '{(.-)}[}]?', ''):gsub('embo',''):gsub('0',''):gsub('iewkind4uc1 pardcf1',''):gsub('1par',''):gsub('s2',''):gsub('>','')
print(text)
  

Output

     

Craig... please close out of your old client and re-open
works for me.. how about embedding pictures?
I see it
let's try a meeting.
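One caveat about the snippet in this answer: its quoted string literals pass the RTF through Lua's escape handling. In Lua 5.1, sequences such as \r, \f, \a, \n and \v become control characters and unknown escapes such as \d simply lose the backslash, which is why the gsub chain searches for fragments like 'iewkind4uc1 pardcf1'. Lua 5.2 and later reject unknown escapes when the chunk is compiled. The sketch below is one possible alternative, assuming the data is stored one message per line: it keeps the backslashes intact with a long-bracket literal and skips lines that are not RTF, such as the sender headers. The names log and strip_rtf_line are hypothetical.

```lua
-- Long-bracket literals keep every backslash as written, so the RTF needs
-- no escaping. The lines below are copied from the question's samples.
local log = [[
user1@capital.com @ 2013-01-18 17:48:57Z(TO:user2@capital.com)
{\rtf1\fbidis\ansi\ansicpg1252\deff0\nouicompat\deflang1033{\fonttbl{\f0\fnil\fcharset0 Segoe UI;}{\f1\fnil Segoe UI;}} {\colortbl ;\red0\green0\blue0;} {*\generator Riched20 15.0.4420}{*\mmathPr\mwrapIndent1440 }\viewkind4\uc1 \pard\cf1\embo\f0\fs20 I\embo0 \embo see\embo0 \embo it\embo0\f1\par {*\lyncflags rtf=1}}
{\rtf1\fbidis\ansi\ansicpg1252\deff0\nouicompat\deflang1033{\fonttbl{\f0\fnil\fcharset0 Segoe UI;}{\f1\fnil Segoe UI;}} {\colortbl ;\red0\green0\blue0;} {*\generator Riched20 15.0.4420}{*\mmathPr\mwrapIndent1440 }\viewkind4\uc1 \pard\cf1\embo\f0\fs20 let's\embo0 \embo try\embo0 \embo a\embo0 \embo meeting.\embo0\f1\par {*\lyncflags rtf=1}}
]]

-- Hypothetical helper: returns the plain text of one RTF line, or nil when
-- the line is not wrapped in braces (for example a sender header).
local function strip_rtf_line(line)
  local body = line:match("^{(.*)}%s*$")
  if not body then return nil end
  -- Remove nested groups, then control words, then stray ">" characters.
  body = body:gsub("%b{}", ""):gsub("\\%a+%-?%d*", ""):gsub(">", "")
  -- Collapse whitespace and trim the single leading/trailing space left over.
  body = body:gsub("%s+", " "):gsub("^ ", ""):gsub(" $", "")
  return body
end

local messages = {}
for line in log:gmatch("[^\n]+") do
  local text = strip_rtf_line(line)
  if text then messages[#messages + 1] = text end
end

print(table.concat(messages, "\n"))
```

Reading the text from a file gives the same benefit as the long-bracket literal, since file contents never go through string-literal escape processing.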