Parsing a file into a DataFrame

Asked: 2019-01-28 17:31:39

Tags: python pandas

with open("state_towns.txt") as data:
    for line in data:
        print(line)

This prints the following lines:

Colorado[edit]

Alamosa (Adams State College)[2]

Boulder (University of Colorado at Boulder)[12]

Durango (Fort Lewis College)[2]

Connecticut[edit]

Fairfield (Fairfield University, Sacred Heart University)

Middletown (Wesleyan University)

New Britain (Central Connecticut State University)

I want to return a DataFrame with two columns, State and Town, like this:

    State        Town
0   Colorado     Alamosa
1   Colorado     Boulder
2   Colorado     Durango 
3   Connecticut  Fairfield
4   Connecticut  Middletown
5   Connecticut  New Britain

How can I split the list so that any line containing "[edit]" goes into the State column?

How do I remove all the text in parentheses from the town entries?

Thanks

1 Answer:

Answer 0: (score: 0)

d = {"state": [], "town": []}  # dictionary to hold the data
state = ""  # placeholder for the most recent state header
town = ""  # placeholder for the current town

with open("state_towns.txt") as data:
    for line in data:
        line = line.strip()  # drop the trailing newline and stray spaces
        if not line:
            continue  # skip blank lines
        if "[edit]" in line:
            state = line.replace("[edit]", "")  # a new state header starts here
            town = ""  # forget the previous state's last town
        else:
            town = line.split(" (")[0]  # keep only the text before the parenthesis
        if state != "" and town != "":  # if both vars are filled, add a row
            d["state"].append(state)
            d["town"].append(town)


import pandas as pd
df = pd.DataFrame(d)
print(df)

It's a bit odd, but it gets the job done.

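The state is held in a placeholder variable, and the town placeholder is set inside the loop. When both are defined, they are appended to the dictionary, and once the loop finishes the dictionary is converted into a DataFrame.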
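
As a rough sketch of the same idea, the parsing step can be pulled into a small function that is easy to check by itself. The function name `parse_state_towns` and the regular expression are illustrative choices, not part of the original answer. The sketch assumes every state header ends in "[edit]" and that town names never contain "(" or "[".

```python
import re


def parse_state_towns(lines):
    """Return a list of (state, town) tuples from an iterable of text lines.

    Hypothetical helper: assumes state headers end in "[edit]" and that
    everything from the first "(" or "[" on a town line can be discarded.
    """
    rows = []
    state = None  # most recent state header seen so far
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.endswith("[edit]"):
            state = line[: -len("[edit]")].strip()
        elif state is not None:
            # Keep only the text before the first "(" or "[".
            town = re.split(r"\s*[(\[]", line, maxsplit=1)[0]
            rows.append((state, town))
    return rows


# Sample lines copied from the question; a real run would pass an open file.
sample = [
    "Colorado[edit]\n",
    "Alamosa (Adams State College)[2]\n",
    "Boulder (University of Colorado at Boulder)[12]\n",
    "Connecticut[edit]\n",
    "New Britain (Central Connecticut State University)\n",
]
rows = parse_state_towns(sample)
for state, town in rows:
    print(state, town, sep="\t")

# With pandas installed, the rows convert directly into the desired frame:
#   import pandas as pd
#   df = pd.DataFrame(rows, columns=["State", "Town"])
```

Returning plain tuples keeps the parsing independent of pandas, so the DataFrame is built in a single call at the end instead of by growing two parallel lists.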