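I am writing a program that determines whether given strings are in the language of a regular expression specified in a text file. The regular expression uses substitutes for the special symbols that are not on a conventional keyboard: e for epsilon, N for the empty set, U for union, o for concatenation, and * for the star operator. I can't figure out whether I need to write code that interprets these substitutions for the program to work correctly, for example on strings longer than one character. For instance, should I check whether the regular expression contains a U, then define what it means, and so on?
A sample text file looks like this: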
12
(((1U(2o1))U(2o2))o((1U2)*))
1
21
22
1212121
21121212
22121212
e
2
My program so far:
'''
Searching text for string that are in the language
of a regular expression
'''
import sys
import re
# Part 1
# Read from file (alphabet, regular expression, sequence of strings)
fileName = sys.argv[1] # open file
# outPut = sys.argv[2]
alphabet = []
inputs = []
strings = {}
# Read lines from text file and store alphabet and regX
f = open(fileName, 'r')
lines = f.readlines()
alphabet = lines[0].strip()
del lines[0]
regx = lines[0]
del lines[0]
print(alphabet, regx, sep='\n') # debug print statement
# Remaining lines are the test regular expressions
# print(lines) # debug print statement
for line in lines:
    splitLine = line.split()
    strings[splitLine[0]] = ",".join(splitLine[0:])
print('Printing out the strings:', strings) # debug print statement
# Substitute for epsilon, concat, empty set, union and star
# Testing if the strings are apart of the regular expressions
for string in strings:
    if string not in regx:
        print('False', string)
    else:
        print('True', string)
Sample output for this particular text file:
12
(((1U(2o1))U(2o2))o((1U2)*))
Printing out the strings: {'1212121': '1212121', 'e': 'e', '22': '22'...}
False 1212121
False e
False 22
False 22121212
True 2
False 21
True 1
False 21121212
The correct output should be:
True 1
True 21
True 22
True 1212121
True 21121212
True 22121212
False e
False 2
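Answer 0 (score: 0)
Expanding on tekim's suggestion:
The "o"s in your regular expression can simply be removed: as in standard regular expressions, placing two things next to each other is enough to express concatenation. "U" becomes "|".
Drop the strings dictionary; it serves no purpose and only scrambles the order of the strings. Convert your regular expression to standard syntax: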
regx = regx.strip().replace("U", "|").replace("o", "")+"$"
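To make the effect of that line concrete, here is what it produces for the second line of the sample file. The variable name `raw` is only for illustration; it stands in for the line as read from the file, trailing newline included.

```python
# Second line of the sample file, as readlines() would return it.
raw = "(((1U(2o1))U(2o2))o((1U2)*))\n"

# Same conversion as above: U -> |, drop o, anchor the end with $.
regx = raw.strip().replace("U", "|").replace("o", "") + "$"
print(regx)  # (((1|(21))|(22))((1|2)*))$
```

The trailing "$" matters because re.match only anchors at the start of the string; without it, any string with a matching prefix would be accepted.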
Check the sample strings and produce the correct output:
for line in lines:
    string = line.strip()
    if re.match(regx, string):
        print('True', string)
    else:
        print('False', string)
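One thing the simple replace does not cover is an e or N appearing inside the expression itself. Below is a rough sketch of how all five substitutes could be handled, not a definitive implementation. The function names `to_python_regex` and `in_language` are made up for this example. It assumes that the alphabet never contains the letters e, N, U or o, and that a test line consisting of just "e" stands for the empty string. It also uses re.fullmatch in place of re.match plus "$".

```python
import re

# Hypothetical helper: translate the file's notation into Python `re` syntax.
# Assumes alphabet symbols never collide with the letters e, N, U, o.
def to_python_regex(expr: str) -> str:
    table = {
        "U": "|",      # union -> alternation
        "o": "",       # concatenation is implicit in Python regexes
        "e": "(?:)",   # epsilon -> empty group, matches only the empty string
        "N": "(?!)",   # empty set -> lookahead that can never succeed
    }
    parts = []
    for ch in expr.strip():
        if ch in table:
            parts.append(table[ch])
        elif ch in "()*":
            parts.append(ch)             # grouping and star keep their meaning
        else:
            parts.append(re.escape(ch))  # alphabet symbol, taken literally
    return "".join(parts)

# Hypothetical helper: is `candidate` in the language of `expr`?
def in_language(expr: str, candidate: str) -> bool:
    # Assumption: a test line of "e" denotes the empty string.
    text = "" if candidate == "e" else candidate
    return re.fullmatch(to_python_regex(expr), text) is not None

sample = "(((1U(2o1))U(2o2))o((1U2)*))"
for s in ["1", "21", "22", "1212121", "21121212", "22121212", "e", "2"]:
    print(in_language(sample, s), s)
```

For the sample file this prints the same True/False lines as the expected output in the question. If the alphabet could contain one of the substitute letters, a real parser would be needed instead of a character-by-character translation.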