Reading a text file into a matrix - python

Time: 2018-02-11 01:25:43

Tags: python matrix

I have a text file that contains m lines, like this:

  

0.4698537878,0.1361006627,0.2400000000,0.7209302326,0.0054816275,0.0116666667,1
0.5146649986,0.0449680289,0.4696969697,0.5596330275,0.0017155500,0.0033333333,0
0.4830107706,0.0684999306,0.3437500000,0.5600000000,0.0056351257,0.0116666667,0
0.4458490073,0.1175445834,0.2307692308,0.6212121212,0.0089169801,0.0200000000,0

I tried to read the file and copy it into a matrix with the following code:

import string

file = open("datasets/train.txt",encoding='utf8')

for line in file.readlines():
    tmp = line.strip()
    tmp = tmp.split(",")
    idx = np.vstack(tmp)
    idy = np.hstack(tmp[12])

matrix = idx

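I want to read the file into a matrix. For my sample data, the matrix should have shape (4,6) and idy should have shape (4,1) (the last column, which holds the labels).

Instead, it stacks only the last line of the file, vertically, like this: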

  

0.4458490073,
0.1175445834,
0.2307692308,
0.6212121212,
0.0089169801,
0.0200000000,
0

Any help?

2 answers:

Answer 0: (score: 3)

Since you are already using numpy, this functionality is built in:

import numpy as np
arr = np.genfromtxt('file.csv', delimiter=',')

You can then split off the last column (named header here; it holds the labels) as follows:

data = arr[:, :-1]
header = arr[:, -1:]
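
As a quick check of the shapes, here is a minimal sketch that runs the same two steps on the four sample rows from the question. The names sample and labels are illustrative, and io.StringIO only stands in for the file so the snippet runs without anything on disk; passing a path should behave the same way.

```python
import io
import numpy as np

# The four sample rows from the question, held in memory instead of a file.
sample = io.StringIO(
    "0.4698537878,0.1361006627,0.2400000000,0.7209302326,0.0054816275,0.0116666667,1\n"
    "0.5146649986,0.0449680289,0.4696969697,0.5596330275,0.0017155500,0.0033333333,0\n"
    "0.4830107706,0.0684999306,0.3437500000,0.5600000000,0.0056351257,0.0116666667,0\n"
    "0.4458490073,0.1175445834,0.2307692308,0.6212121212,0.0089169801,0.0200000000,0\n"
)

arr = np.genfromtxt(sample, delimiter=",")  # float array with one row per line

data = arr[:, :-1]    # every column except the last: the features
labels = arr[:, -1:]  # slicing with -1: keeps the result two-dimensional

print(data.shape, labels.shape)  # (4, 6) (4, 1)
```

Using the slice -1: rather than the index -1 is what keeps the labels as a (4,1) column instead of a flat array of length 4.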

Answer 1: (score: 1)

This gives the idx variable the correct shape (4,6) and the labels the shape (4,1):

import numpy as np

with open('train.txt', 'r') as f:
    alllines = [line.strip().split(',') for line in f if line.strip()]
# shape (4,6); dtype=float converts the string fields to numbers
idx = np.matrix([fields[0:6] for fields in alllines], dtype=float)
# reshape to (4,1) for labels
idy = np.matrix([fields[6] for fields in alllines], dtype=float).reshape(-1, 1)
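
For comparison, the loop from the question can be kept almost as it is. In that loop idx is reassigned on every iteration, so only the last line survives, and np.vstack on a flat list of strings turns each field into its own row, which is the column seen in the output. A minimal sketch of one possible fix, assuming every non-empty line has the same number of comma-separated numeric fields (parse_rows is a hypothetical helper name, not part of numpy):

```python
import numpy as np

def parse_rows(lines):
    """Hypothetical helper: turn comma-separated lines into (features, labels)."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        # Collect one list of floats per line instead of overwriting a variable.
        rows.append([float(value) for value in line.split(",")])
    arr = np.array(rows)             # convert once, after the loop
    return arr[:, :-1], arr[:, -1:]  # all but the last column, then the last column

# The sample rows from the question; with a real file, pass the open file object.
lines = [
    "0.4698537878,0.1361006627,0.2400000000,0.7209302326,0.0054816275,0.0116666667,1\n",
    "0.5146649986,0.0449680289,0.4696969697,0.5596330275,0.0017155500,0.0033333333,0\n",
    "0.4830107706,0.0684999306,0.3437500000,0.5600000000,0.0056351257,0.0116666667,0\n",
    "0.4458490073,0.1175445834,0.2307692308,0.6212121212,0.0089169801,0.0200000000,0\n",
]

idx, idy = parse_rows(lines)
print(idx.shape, idy.shape)  # (4, 6) (4, 1)
```

Building a plain Python list and calling np.array once is usually simpler than stacking arrays inside the loop.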