Question

with open("state_towns.txt") as data:
    for line in data:
        print(line)

returns the following list:

Colorado[edit]

Alamosa (Adams State College)[2]

Boulder (University of Colorado at Boulder)[12]

Durango (Fort Lewis College)[2]

Connecticut[edit]

Fairfield (Fairfield University, Sacred Heart University)

Middletown (Wesleyan University)

New Britain (Central Connecticut State University)

I want to return a DataFrame with two columns, State and Town, like this:

    State        Town
0   Colorado     Alamosa
1   Colorado     Boulder
2   Colorado     Durango 
3   Connecticut  Fairfield
4   Connecticut  Middletown
5   Connecticut  New Britain

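How can I split the list so that any line containing "[edit]" goes into the State column?

How do I remove all the text in parentheses from the town entries?

Thanks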

Answer 1

import pandas as pd

d = {"state": [], "town": []}  # dictionary to hold the data
state = ""  # placeholder state var
town = ""  # placeholder town var

with open("state_towns.txt") as data:
    for line in data:
        line = line.strip()  # drop the trailing newline
        if "[edit]" in line:
            state = line.replace("[edit]", "").strip()  # set the state var if it has edit
            town = ""  # a new state starts, so forget the previous town
        else:
            town = line.split(" (")[0].strip()  # keep only the text before " ("
        if state != "" and town != "":  # if both vars are filled add to dictionary
            d["state"].append(state)
            d["town"].append(town)

df = pd.DataFrame(d)
print(df)

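It's a bit clunky, but it gets the job done.

A placeholder state and a placeholder town are set inside the loop. Whenever both are set, they are appended to the dictionary, and once the loop finishes the dictionary is converted to a DataFrame.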
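The same idea can also be sketched as a small function that works on any iterable of lines, which makes it easy to try without the file. The helper name `parse_state_towns` and the sample lines are made up for illustration. It assumes every state line ends with "[edit]" and that the town name is everything before the first "(" or "[".

```python
import re


def parse_state_towns(lines):
    """Return a list of (state, town) pairs from an iterable of text lines.

    Hypothetical helper: assumes state headings end with "[edit]" and that
    the town name is the text before the first "(" or "[".
    """
    rows = []
    state = None  # most recent state heading seen so far
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.endswith("[edit]"):
            state = line[: -len("[edit]")].strip()
        elif state is not None:
            # Cut the line at the first opening bracket, keep the left part
            town = re.split(r"\s*[(\[]", line, maxsplit=1)[0].strip()
            rows.append((state, town))
    return rows


# Sample lines in the same shape as the file shown in the question
sample = [
    "Colorado[edit]\n",
    "Alamosa (Adams State College)[2]\n",
    "\n",
    "Connecticut[edit]\n",
    "New Britain (Central Connecticut State University)\n",
]
rows = parse_state_towns(sample)
print(rows)
```

To get the DataFrame from the question, the list of pairs can be passed to pd.DataFrame(rows, columns=["State", "Town"]). For the real file, pass the open file object instead of `sample`.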
