Why doesn't this regular expression match?

Asked: 2013-06-17 16:44:06

Tags: python regex

I have a file in the following format:

/* No comment provided by engineer. */
"Logout Successful!" = "Logout Successful!";

/* No comment provided by engineer. */
"London" = "London";

/* No comment provided by engineer. */
"Low Balance" = "Low Balance";

/* No comment provided by engineer. */
"Low-Cost Call" = "Low-Cost Call";

/* No comment provided by engineer. */
"Making A Low Cost Call" = "Making A Low Cost Call";

/* No comment provided by engineer. */
"Making FREE Calls" = "Making FREE Calls";

/* No comment provided by engineer. */
"MNO" = "MNO";

/* No comment provided by engineer. */
"more free credit" = "more free credit";

/* No comment provided by engineer. */
"My Phone Number" = "My Phone Number";

/* No comment provided by engineer. */
"My Purchase is Missing" = "My Purchase is Missing";

/* No comment provided by engineer. */
"Next" = "Next";

/* No comment provided by engineer. */
"NO" = "NO";

/* No comment provided by engineer. */
"No" = "No";

/* No comment provided by engineer. */
"No Balance" = "No Balance";

/* No comment provided by engineer. */
"Post Successful" = "Post Successful";

/* No comment provided by engineer. */
"Post to %d %@ Facebook Wall" = "Post to %1$d %2$@ Facebook Wall";

/* No comment provided by engineer. */
"Post to Facebook Wall" = "Post to Facebook Wall";

/* No comment provided by engineer. */
"Post To My Facebook Wall" = "Post To My Facebook Wall";

/* No comment provided by engineer. */
"Post to My Wall" = "Post to My Wall";

/* No comment provided by engineer. */
"Posted" = "Posted";

/* No comment provided by engineer. */
"Posting" = "Posting";

/* No comment provided by engineer. */
"Posting to Your Facebook Wall..." = "Posting to Your Facebook Wall...";

/* No comment provided by engineer. */
"PQRS" = "PQRS";

/* No comment provided by engineer. */
"Proceed" = "Proceed";

/* No comment provided by engineer. */
"Proceed, Don't Show Again" = "Proceed, Don't Show Again";

/* No comment provided by engineer. */
"Processing..." = "Processing...";

/* No comment provided by engineer. */
"Purchase History" = "Purchase History";

/* No comment provided by engineer. */
"Rates" = "Rates";

/* No comment provided by engineer. */
"Remind me later" = "Remind me later";

/* No comment provided by engineer. */
"Restart" = "Restart";

/* No comment provided by engineer. */
"Retry Failed" = "Retry Failed";

/* No comment provided by engineer. */
"Return to %@ after each call ends" = "Return to %@ after each call ends";

/* No comment provided by engineer. */
"Return To App After Call" = "Return To App After Call";

/* No comment provided by engineer. */
"Roaming Support" = "Roaming Support";

/* No comment provided by engineer. */
"Roaming Warning!" = "Roaming Warning!";

/* No comment provided by engineer. */
"Searching..." = "Searching...";

/* No comment provided by engineer. */
"See The Time In Any Country" = "See The Time In Any Country";

/* No comment provided by engineer. */
"Select All" = "Select All";

/* No comment provided by engineer. */
"Select the number for an iPhone with %@" = "Select the number for an iPhone with %@";

/* No comment provided by engineer. */
"Send" = "Send";

/* No comment provided by engineer. */
"Send a Text Message" = "Send a Text Message";

/* No comment provided by engineer. */
"Sending..." = "Sending...";

/* No comment provided by engineer. */
"Settings" = "Settings";

/* No comment provided by engineer. */
"Show All" = "Show All";

/* No comment provided by engineer. */
"Show Me How" = "Show Me How";

/* No comment provided by engineer. */
"Show Selected" = "Show Selected";

/* No comment provided by engineer. */
"Sign In" = "Sign In";

/* No comment provided by engineer. */
"Signing in..." = "Signing in...";

/* No comment provided by engineer. */
"Skip" = "Skip";

/* No comment provided by engineer. */
"SMS" = "SMS";

/* No comment provided by engineer. */
"Speed Dial & Favorites" = "Speed Dial & Favorites";

/* No comment provided by engineer. */
"Store" = "Store";

/* No comment provided by engineer. */
"Success" = "Success";

/* No comment provided by engineer. */
"Success!" = "Success!";

/* No comment provided by engineer. */
"Support" = "Support";

/* No comment provided by engineer. */
"System Status" = "System Status";

/* No comment provided by engineer. */
"Tapjoy Offers" = "Tapjoy Offers";

/* No comment provided by engineer. */
"Tell %d Friend%@" = "Tell %1$d Friend%2$@";

/* No comment provided by engineer. */
"Tell Facebook Friends" = "Tell Facebook Friends";

/* No comment provided by engineer. */
"Tell Friends" = "Tell Friends";

/* No comment provided by engineer. */
"Tell Friends About %@" = "Tell Friends About %@";

/* No comment provided by engineer. */
"Tell via E-Mail" = "Tell via E-Mail";

/* No comment provided by engineer. */
"Tell via SMS" = "Tell via SMS";

/* No comment provided by engineer. */
"Test Call" = "Test Call";

/* No comment provided by engineer. */
"Text Message" = "Text Message";

/* No comment provided by engineer. */
"Try Again" = "Try Again";

/* No comment provided by engineer. */
"Turning Caller ID ON/OFF" = "Turning Caller ID ON/OFF";

/* No comment provided by engineer. */
"TUV" = "TUV";

/* No comment provided by engineer. */
"Tweet to Friends" = "Tweet to Friends";

/* No comment provided by engineer. */
"Unable to Call" = "Unable to Call";

/* No comment provided by engineer. */
"Unable to Check Talk Time" = "Unable to Check Talk Time";

/* No comment provided by engineer. */
"Unable to connect." = "Unable to connect.";

/* No comment provided by engineer. */
"Unable to Create Account" = "Unable to Create Account";

/* No comment provided by engineer. */
"Unable to Purchase" = "Unable to Purchase";

/* No comment provided by engineer. */
"Unable to Sign In" = "Unable to Sign In";

/* No comment provided by engineer. */
"Unknown" = "Unknown";

/* No comment provided by engineer. */
"unknown caller" = "unknown caller";

/* No comment provided by engineer. */
"Unselect All" = "Unselect All";

/* No comment provided by engineer. */
"Updating Your Phone Number" = "Updating Your Phone Number";

/* No comment provided by engineer. */
"VoIP %@" = "VoIP %@";

/* No comment provided by engineer. */
"WARNING!" = "WARNING!";

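I want to parse it with a regular expression and collect only the keys and values, without the surrounding quotes, into a dictionary: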

import re

def load_replacement_dict(file_name):
    with open(file_name, 'r') as f:
        content = f.read()
        resultDict = {}

        dictionary_regex = re.compile('"([^"]*)" = "([^"]*)"')

        for result in dictionary_regex.finditer(content):
            resultDict[result.group(1)] = result.group(2)

        for key, value in resultDict.items():
            print (key+" = "+value).decode('utf-8')

        return resultDict

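The first subgroup matches, but as soon as I add anything after it, the pattern stops matching. I have tried a literal space and \s, and nothing seems to match the whitespace around the equals sign. What am I missing here?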

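Edit: I found that the regex works fine if I remove the Unicode byte order mark from the beginning of the file. That is obviously not a solution, but might it be a clue as to how the regex should be modified?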

5 Answers:

Answer 0 (score: 5)

In my opinion, what you are trying to achieve is easier to do with string methods than with a regular expression:

>>> s = '"A Key With \"quotes\" in it" = " Another Value "'
>>> l,r = [v.strip().strip('"').strip() for v in s.split('=')]
>>> l,r
 ('A Key With "quotes" in it', 'Another Value')

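The escapes would normally be preserved; they are lost here only because of the way I created the string (in a non-raw literal, \" is just "). If I read the text from a file instead, this is what happens: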

In [1]: lines = open('x.txt').read().splitlines()

In [2]: for s in lines: print [v.strip().strip('"').strip() for v in s.split('=')]
   ...: 
['Some Key', 'Some Value']
['Another Key', 'Another Value']
['A Key With \\"quotes\\" in it', 'Another Value']
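As a rough sketch of how the string-method approach might be applied to the file format from the question, the snippet below skips the comment and blank lines and strips the trailing semicolon. It is written for Python 3, the helper name `parse_pairs` is hypothetical, and it assumes one pair per line and that no key contains the exact sequence `" = "`.

```python
def parse_pairs(text):
    """Collect "key" = "value"; lines into a dict using only string methods."""
    pairs = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and /* ... */ comment lines.
        if not line.startswith('"'):
            continue
        # Split on the first '" = "' separator (assumed not to occur inside a key).
        key, sep, value = line.partition('" = "')
        if not sep:
            continue
        # Drop the opening quote of the key.
        key = key[1:]
        # Drop the trailing ';' and then the closing quote of the value.
        value = value.rstrip().rstrip(';').rstrip()
        if value.endswith('"'):
            value = value[:-1]
        pairs[key] = value
    return pairs


sample = '''/* No comment provided by engineer. */
"London" = "London";

/* No comment provided by engineer. */
"Post to %d %@ Facebook Wall" = "Post to %1$d %2$@ Facebook Wall";
'''
result = parse_pairs(sample)
```

Using `partition` rather than `split('=')` means a value that happens to contain an equals sign does not break the unpacking.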

Answer 1 (score: 3)

To avoid problems with escaped quotes, you can use this:

"((?:[^"]+|(?<=\\)")*)" = "((?:[^"]+|(?<=\\)")*)"

Answer 2 (score: 1)

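You are not checking for the quotes around the value in your regex, which is why it fails to match. Also, to handle escaped quotes inside a key or value, I believe this should cover it: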

dictionary_regex = re.compile(r'"((?:(?:\\")|[^"])*)" = "((?:(?:\\")|[^"])*)"')
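A small check of what the escaped-quote handling buys, as a sketch in Python 3. The sample line is made up for illustration and does not come from the file in the question; the simpler pattern is the one from the question.

```python
import re

# Pattern from the answer above: \" is treated as content, a bare " ends the string.
pair_re = re.compile(r'"((?:(?:\\")|[^"])*)" = "((?:(?:\\")|[^"])*)"')

# Pattern from the question, which stops at the first quote it sees.
simple_re = re.compile(r'"([^"]*)" = "([^"]*)"')

# Hypothetical line containing escaped quotes in both key and value.
line = r'"Say \"Hi\"" = "Dites \"Salut\"";'

key, value = pair_re.search(line).groups()

# The simpler pattern still finds *a* match, but it is anchored on the wrong quotes.
simple_groups = simple_re.search(line).groups()
```

Here `key` and `value` keep their backslashes (`Say \"Hi\"` and `Dites \"Salut\"`), while the simpler pattern returns an empty key and a truncated value.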

Answer 3 (score: 1)

It turned out to be an encoding problem. The file is UTF-16. Once I changed the open call to:

with codecs.open(file_name, 'r', 'utf-16') as f:

the regex worked fine.
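The effect can be reproduced without a file. The sketch below (Python 3) encodes one sample line as UTF-16 and then decodes the bytes in two ways; decoding as `latin-1` is only a stand-in for reading the file as raw single-byte text, which is roughly what a plain `open(file_name, 'r')` did in Python 2.

```python
import re

pair_re = re.compile('"([^"]*)" = "([^"]*)"')
first_group_only = re.compile('"([^"]*)"')
text = '"London" = "London";\n'

# Simulate a UTF-16 file on disk: a byte order mark, then two bytes per character.
raw = text.encode('utf-16')

# Treating those bytes as single-byte text leaves a NUL next to every
# visible character, so the sequence '" = "' never appears verbatim.
misread = raw.decode('latin-1')
wrong = pair_re.search(misread)

# The first subgroup alone still matches, because [^"]* happily consumes NULs.
partial = first_group_only.search(misread)

# Decoding with the right codec consumes the BOM and restores the text.
decoded = raw.decode('utf-16')
right = pair_re.search(decoded)
```

This lines up with the symptom in the question: the quoted key matches on its own, but nothing after the closing quote does, because the next character is a NUL rather than a space. In Python 3 the equivalent fix would be `open(file_name, encoding='utf-16')`.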

Answer 4 (score: 0)

With the example key-value pairs as posted, the following regex seems to work:

re.compile('"(.*)" = "(.*)"')
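One caveat with the greedy `.*`, shown as a sketch in Python 3: it behaves well when each line holds a single pair (and `.` does not cross newlines by default), but it over-matches if two pairs share a line. The two-pair line below is a made-up case; the file in the question has one pair per line.

```python
import re

greedy = re.compile('"(.*)" = "(.*)"')
lazy = re.compile('"(.*?)" = "(.*?)"')

one_pair = '"London" = "London";'
# Hypothetical line with two pairs on it.
two_pairs = '"NO" = "NO"; "No" = "No";'

# One pair per line: the greedy pattern gives the expected groups.
single = greedy.search(one_pair).groups()

# Two pairs on a line: the greedy key swallows everything up to the last '" = "'.
swallowed = greedy.search(two_pairs).groups()

# The non-greedy variant stops at the first closing quote and finds both pairs.
both = lazy.findall(two_pairs)
```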

Am I missing something?