Question

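I'm new to Python and am trying to learn this part. I have a text file with about 25 columns and more than 50,000 rows. Column #11 (ZIP) contains all the customers' ZIP codes in the format "07598-XXXX". I only want the first 5 characters, i.e. "07598", and I need to do this for the entire column, but I'm not sure how to write it given my current logic. So far my code removes rows that contain certain strings, and I use '|' as the delimiter when reading the file so it can be written out as CSV.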

State | ZIP (#11) | Column 12 | ....

NY | 60169-8547 | 98

NY | 60169-8973 | 58

NY | 11219-4598 | 25

NY | 11219-8475 | 12

NY | 20036-4879 | 56

How can I iterate over the ZIP column and keep only the first 5 characters? Thanks for your help!

import csv

my_file_name = "NVG.txt"
cleaned_file = "cleanNVG.csv"
remove_words = ['INAC-EIM','-INAC','TO-INAC','TO_INAC','SHIP_TO-inac','SHIP_TOINAC']


with open(my_file_name, 'r', newline='') as infile, open(cleaned_file, 'w',newline='') as outfile:
    writer = csv.writer(outfile)
    for line in csv.reader(infile, delimiter='|'):
        if not any(remove_word in element for element in line for remove_word in remove_words):
            writer.writerow(line)

Answer 1

'{:.5}'.format(zip_)

Here zip_ is the string containing the ZIP code. For more on format, see: https://docs.python.org/2/library/string.html#format-string-syntax
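
As a quick check of the format-spec approach, here is a minimal sketch. The sample value is hypothetical (it only follows the "07598-XXXX" pattern from the question), and zip_ is the variable name used in this answer. For strings, the precision in '{:.5}' acts as a maximum length, so shorter inputs pass through unchanged.

```python
# For strings, the precision in a format spec keeps at most that many characters.
zip_ = "07598-1234"  # hypothetical sample value in the question's ZIP+4 format

short = '{:.5}'.format(zip_)
print(short)

# Plain slicing produces the same result for this input.
same_as_slice = (short == zip_[:5])
print(same_as_slice)

# A string shorter than 5 characters is returned as-is, not padded.
shorter = '{:.5}'.format("123")
print(shorter)
```

Slicing with zip_[:5] is arguably the more common way to write this; the format spec is mainly handy when the value is already being formatted into a larger string.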

Answer 2

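Handle the header row separately, then read the rest line by line as usual, modifying only the second column of each line by truncating it to 5 characters.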

import csv

my_file_name = "NVG.txt"
cleaned_file = "cleanNVG.csv"
remove_words = ['INAC-EIM','-INAC','TO-INAC','TO_INAC','SHIP_TO-inac','SHIP_TOINAC']


with open(my_file_name, 'r', newline='') as infile, open(cleaned_file, 'w',newline='') as outfile:
    writer = csv.writer(outfile)
    cr = csv.reader(infile, delimiter='|')
    # iterate over title line and write it as-is
    writer.writerow(next(cr))
    for line in cr:
        if not any(remove_word in element for element in line for remove_word in remove_words):
            line[1] = line[1][:5]   # truncate
            writer.writerow(line)

Alternatively, you can use line[1] = line[1].split("-")[0] to keep everything to the left of the dash.
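
The two variants only differ when the part before the dash is not exactly 5 characters long. The sketch below illustrates this. The helper names by_slice and by_split are hypothetical, and the last two sample values are made up to show the edge cases; they are not taken from the question's data.

```python
def by_slice(zip_code):
    """Keep at most the first 5 characters, whatever they are."""
    return zip_code[:5]


def by_split(zip_code):
    """Keep everything to the left of the first dash."""
    return zip_code.split("-")[0]


# First value is from the question; the other two are made-up edge cases.
samples = ["60169-8547", "11219", "1234-5678"]

results = [(s, by_slice(s), by_split(s)) for s in samples]
for original, sliced, split_part in results:
    print(original, sliced, split_part)
```

For well-formed ZIP+4 values both give the same answer. If some rows might hold malformed codes, split("-") avoids keeping a stray dash, while slicing guarantees a maximum length of 5.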

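Note the special handling of the header row: cr is an iterator, so I simply consume its first item manually with next() before the for loop, which passes the header through unchanged.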
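
The iterator behaviour can be seen in isolation with a small in-memory file. This is only a sketch: the sample text stands in for NVG.txt, and looking the column up with header.index("ZIP") assumes the header cell is exactly "ZIP", which may not hold for the real file. With 25 columns, finding the index this way (or hard-coding the right one) avoids relying on the ZIP being at position 1.

```python
import csv
import io

# Hypothetical in-memory sample standing in for NVG.txt (pipe-delimited).
sample = io.StringIO(
    "State|ZIP|Col12\n"
    "NY|60169-8547|98\n"
    "NY|11219-4598|25\n"
)

reader = csv.reader(sample, delimiter='|')
header = next(reader)            # consumes the first row only
zip_index = header.index("ZIP")  # assumption: the header cell is exactly "ZIP"

rows = []
for row in reader:               # the loop resumes at the second row
    row[zip_index] = row[zip_index][:5]
    rows.append(row)

print(header)
print(rows)
```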

Answer 3

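Use str[:5]

to get the first 5 characters of a string. (The end index of a slice is exclusive, so [:6] would return 6 characters, including the dash.)

In your case:

with open(my_file_name, 'r', newline='') as infile, open(cleaned_file, 'w',newline='') as outfile:
    writer = csv.writer(outfile)
    for line in csv.reader(infile, delimiter='|'):
        if not any(remove_word in element for element in line for remove_word in remove_words):
            line[1] = line[1][:5]  # keep only the first 5 characters
            writer.writerow(line)

line[1] = line[1][:5] sets the second column of each row to its first 5 characters.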
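
Because slice bounds in Python are end-exclusive, the number after the colon equals the number of characters kept when slicing from the start. A minimal sketch using one of the values from the question:

```python
value = "60169-8547"  # sample ZIP+4 value from the question

first_five = value[:5]  # indices 0 through 4
first_six = value[:6]   # indices 0 through 5, which includes the dash

print(first_five)
print(first_six)
```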

Reformat a column to its first 5 characters
