Question

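I have a CSV file with 150500 rows, and I want to split it into multiple files of 500 rows (entries) each.

I am using Jupyter, and I know how to open and read files. However, I don't know how to specify the output_path where the new files produced by splitting the large file should be written.

I found this code online, but since I don't know what my output_path should be, I don't know how to use it. I also don't understand how the input file is specified in this code.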


My file is named DataSet2.csv, and it is in the same Jupyter directory as the ipynb notebook I am running.

Answer 1

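from itertools import islice

lines_per_small_file = 500

# DataSet2.csv is the input file named in the question; a relative path is
# resolved against the notebook's working directory.
with open('DataSet2.csv', 'r', newline='') as large_file:
    header = large_file.readline()  # assumes the first line is a header row

    file_index = 0
    while True:
        # Take the next 500 lines; the list is empty once the input is used up.
        chunk = list(islice(large_file, lines_per_small_file))
        if not chunk:
            break
        # Output files land next to the notebook: 0_small.csv, 1_small.csv, ...
        with open(str(file_index) + '_small.csv', 'w', newline='') as small_file:
            small_file.write(header)  # copy the header into every small file
            small_file.writelines(chunk)
        file_index += 1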

This creates many small files in the same directory, about 301 of them for the file in the question. They are named 0_small.csv through 300_small.csv.
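
On the output_path part of the question, here is a minimal sketch, assuming Python 3 and a CSV whose first line is a header. The function name split_csv, its parameters, and the chunks folder in the usage note are hypothetical names introduced for illustration; they do not come from the answer above. The input file is whatever path is passed as input_path, and the small files go into output_dir, which is created if it does not exist.

```python
from itertools import islice
from pathlib import Path


def split_csv(input_path, output_dir, rows_per_file=500):
    """Split input_path into files of rows_per_file data rows each.

    Hypothetical helper: assumes the first line is a header and that no
    field contains an embedded newline (the split is purely line-based).
    Returns the list of paths that were written.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)  # create the folder if needed

    written = []
    with open(input_path, "r", newline="") as source:
        header = source.readline()
        index = 0
        while True:
            # Next block of data lines; empty once the input is exhausted.
            chunk = list(islice(source, rows_per_file))
            if not chunk:
                break
            target = output_dir / f"{index}_small.csv"
            with open(target, "w", newline="") as small_file:
                small_file.write(header)  # repeat the header in every file
                small_file.writelines(chunk)
            written.append(target)
            index += 1
    return written
```

In a notebook cell, split_csv("DataSet2.csv", "chunks") would read DataSet2.csv from the notebook's working directory and write 0_small.csv, 1_small.csv, and so on into a chunks subfolder. An absolute path can be passed for either argument if the files live elsewhere.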

Answer 2

Use the standard Unix utilities:

cat DataSet2.csv | tail -n +2 | split -l 500 --additional-suffix=.csv - output_

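This pipeline takes the original file, strips the first line with 'tail -n +2', and then splits the rest into 500-line chunks, which are written to files whose names start with 'output_' and end with '.csv'. Because the first line is removed, the resulting files have no header row.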
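
Where the Unix tools are not available (for example, Jupyter on Windows), the same steps can be sketched in Python. This assumes Python 3; split_like_unix and its parameters are hypothetical names, and the two-letter suffixes (aa, ab, ...) imitate the default naming used by GNU split, so this sketch stops at 676 files. Like the pipeline, it drops the first line, so the output files carry no header.

```python
from itertools import islice, product
from string import ascii_lowercase


def split_like_unix(input_path, prefix="output_", lines_per_file=500):
    """Drop the first line, then write the rest in lines_per_file chunks.

    Hypothetical helper mirroring: tail -n +2 | split -l 500.
    Files are named prefix + 'aa'..'zz' + '.csv'; prefix may include a
    directory. Returns the list of file names that were written.
    """
    suffixes = ("".join(pair) for pair in product(ascii_lowercase, repeat=2))
    names = []
    with open(input_path, "r", newline="") as source:
        source.readline()  # skip the first line, like tail -n +2
        while True:
            chunk = list(islice(source, lines_per_file))
            if not chunk:
                break
            suffix = next(suffixes, None)
            if suffix is None:
                raise RuntimeError("more than 676 output files needed")
            name = f"{prefix}{suffix}.csv"
            with open(name, "w", newline="") as out:
                out.writelines(chunk)
            names.append(name)
    return names
```

Calling split_like_unix("DataSet2.csv") from the notebook would produce output_aa.csv, output_ab.csv, and so on in the working directory.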

Splitting a CSV file into multiple files
