Question

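I'm not sure how to get started with this. I have a number of tab-delimited files that I'd like to put into a database. The difficulty is that the table isn't laid out in the best way. For example, a parent row is marked with the letter D, and the rows beneath it belong to that parent until the next D row is listed.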

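Ideally, I'd like all the child rows on the same row as their parent, so that I can put the data into a database and query the results (unless there is another way).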

Here is a link to the data: http://www.gasnom.com/ip/vector/archive.cfm?type=4

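Before anyone mentions that there is a better visual representation of the data: I can't scrape the HTML, because this is the only data file that has a corresponding website.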

http://www.vector-pipeline.com/Informational-Postings/Index-of-Customers.aspx

Answer 1

I think this works. It simply appends a list of "child" rows to the end of each "parent" row in a list of "parent" rows.

database = []                                                # all data ends up here
with open('index_of_customers.txt', 'r') as customer_file:   # closes the file for you; add try/except handling as needed
    for each_line in customer_file:                          # reads one line at a time
        each_line = each_line.rstrip('\n')                   # removes the trailing newline
        each_line = each_line.split('\t')                    # splits the line into a list of columns, keeping empty columns
        if each_line[0] == 'D':                              # the line starts with a single D, so it is a parent row
            each_line.append([])                             # add a list at the end of the D line to hold its child rows
            database.append(each_line)                       # add the D line to the "database" as a list
        elif database:                                       # lines before the first D line have no parent, so they are skipped
            database[-1][-1].append(each_line)               # add the line to the last D line's list of child rows
for each_D_line in database:                                 # prints out the database in an ugly way
    print(str(each_D_line[:-1]))                             # first the D line itself
    for each_other_line in each_D_line[-1]:
        print('\t' + str(each_other_line))                   # then each of its child rows
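
Since the goal is to query the data from a database, here is a rough sketch of how the same grouping could be loaded into SQLite with Python's built-in sqlite3 module. The table names (parent, child), the single fields column, and the sample rows are all made up for illustration, because I don't know what the real columns mean. Each row is stored as one tab-joined string; with the real file you would probably want one column per field instead.

```python
import sqlite3


def group_rows(lines):
    """Group tab-separated lines into (parent_fields, [child_fields, ...]) pairs."""
    groups = []
    for line in lines:
        fields = line.rstrip('\n').split('\t')
        if fields[0] == 'D':               # a D line starts a new parent
            groups.append((fields, []))
        elif groups:                       # lines before the first D line are skipped
            groups[-1][1].append(fields)
    return groups


def load_into_sqlite(groups, connection):
    """Store parents and children in two hypothetical tables linked by parent_id."""
    connection.execute(
        'CREATE TABLE parent (id INTEGER PRIMARY KEY, fields TEXT)')
    connection.execute(
        'CREATE TABLE child (parent_id INTEGER REFERENCES parent(id), fields TEXT)')
    for parent_fields, children in groups:
        cursor = connection.execute(
            'INSERT INTO parent (fields) VALUES (?)',
            ('\t'.join(parent_fields),))
        parent_id = cursor.lastrowid       # id SQLite assigned to this parent row
        connection.executemany(
            'INSERT INTO child (parent_id, fields) VALUES (?, ?)',
            [(parent_id, '\t'.join(child)) for child in children])
    connection.commit()


# Made-up sample data standing in for the real file contents.
sample_lines = [
    'H\theader row\n',
    'D\tShipper A\t100\n',
    'P\tpoint 1\t40\n',
    'P\tpoint 2\t60\n',
    'D\tShipper B\t50\n',
    'P\tpoint 3\t50\n',
]

connection = sqlite3.connect(':memory:')   # in-memory database, nothing written to disk
load_into_sqlite(group_rows(sample_lines), connection)

# Count the child rows that belong to each parent.
rows = connection.execute(
    'SELECT parent.fields, COUNT(child.parent_id) '
    'FROM parent LEFT JOIN child ON child.parent_id = parent.id '
    'GROUP BY parent.id ORDER BY parent.id').fetchall()
for row in rows:
    print(row)
```

Keeping parents and children in separate tables joined on parent_id avoids having to squeeze a variable number of child rows into a single wide row, and it still lets you pull a parent together with all of its children in one query.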

Iterating through a nested table / spreadsheet
