Question

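I want to read a text file (.bcp) 100k lines at a time and write it out to another file. However, a few lines contain the 'Substitute' character (hex value: 001A), and because of it the following code seems to read only up to that line and sees nothing after it. I tried removing the character from those lines before writing to the second file, but later (using print(lines,"\n")) I realized that nothing is read at all after the first such line is encountered. I used this article to understand the 'Substitute' character and get the hex value, but basically I opened the file in Notepad++ and it was shown as 'SUB' on a black background. Can someone please help me remove that character? Thanks a lot!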

with open('input' + '\\' + file) as fIn:
    while 1:
        lines = fIn.readlines(100000)
        print(lines,"\n")
        if not lines:
            break
        for line in lines :
            #line = re.sub(r'\x001A', '', line)
            line = line.replace(r'\x001A', '')
            line = line.replace(r'\x009D', '')
            fp.write(line)
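
As a side note on the replace calls above: independent of the reading problem, the pattern r'\x001A' probably does not match what was intended. The short sketch below (the variable names are illustrative only) shows how Python parses each spelling of the escape; the SUB character itself is the single character '\x1a'.

```python
# The SUB control character is code point U+001A, i.e. one single character.
SUB = "\x1a"

raw_pattern = r"\x001A"  # raw string: six literal characters, starting with a backslash
escaped = "\x001A"       # parsed as NUL ("\x00") followed by the characters "1" and "A"

sample = "abc" + SUB + "def"

# Neither spelling above matches the SUB character, so nothing is removed...
unchanged_raw = sample.replace(raw_pattern, "")
unchanged_escaped = sample.replace(escaped, "")

# ...while the two-digit hex escape does match it.
cleaned = sample.replace(SUB, "")
```

So even once the whole file is being read, the replacement would need '\x1a' (or '\u001a') rather than r'\x001A'.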

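Edit: I have provided some sample lines below. The first line does not contain the SUB character; the last two do. So when reading with lines = fIn.readlines(100000), the code reads the first line, then reads the second line up to the character and exits. You will notice that the SUB character is removed automatically when I paste the lines here.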

112411115ffg254b|302344|5.1234     |11111111|0|||1000|0|1015|          |0|5.1234     |11111111|1|0|1|1011|0|                                                                                                    |0
112400004eyg9gb5|302345|6.216     |22222222|0|||1001|0|1|          |0|6.216     |22222222|1|0|1|1|0|ù0                                                                                                |0
112200009ex12341|42581|3.119     |33333333|0|||1002|0|1|          |0|3.119     |33333333|1|0|1|1|1|Ù¸                                                                                                |0
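
The symptom described (reading stops exactly at the first 0x1A) matches how text-mode file reads behave under Python 2 on Windows, where the C runtime treats Ctrl-Z (0x1A) as end-of-file. Assuming that is the environment here, which the question does not state, one possible workaround is to open both files in binary mode and strip the byte before writing. The function name copy_without_sub and its parameters are hypothetical, chosen only for this sketch.

```python
SUB = b"\x1a"        # the Substitute control character (Ctrl-Z) as a byte
SIZE_HINT = 100000   # same value the question passes to readlines()


def copy_without_sub(src_path, dst_path, unwanted=(SUB,)):
    """Copy src_path to dst_path in binary mode, dropping every byte sequence in `unwanted`."""
    # Binary mode ("rb"/"wb") bypasses text-mode handling, so 0x1A is read
    # like any other byte instead of being taken as end-of-file.
    with open(src_path, "rb") as f_in, open(dst_path, "wb") as f_out:
        while True:
            # The argument is a size hint (roughly bytes), not a line count:
            # readlines() stops after the line that pushes the total past it.
            lines = f_in.readlines(SIZE_HINT)
            if not lines:
                break
            for line in lines:
                for seq in unwanted:
                    line = line.replace(seq, b"")
                f_out.write(line)
```

It could be called as copy_without_sub('input\\' + file, out_path), with out_path standing in for wherever fp points in the original code. Other bytes can be passed through unwanted, though removing a byte such as b"\x9d" is only safe if the file uses a single-byte encoding, since in a multi-byte encoding that byte may be part of a valid character.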

Python drops control characters when reading a text file

0 answers: