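Read the first N lines of a file in Python

Time: 2009-11-20 00:09:33

Tags: python file head

We have a large raw data file that we would like to trim to a specified size. I am experienced in .NET C#, but I would like to do this in Python, both to simplify things and out of interest.

How do I get the first N lines of a text file in Python? Will the operating system being used have any effect on the implementation?

17 answers:

Answer 0: (score: 203)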

Python 2

with open("datafile") as myfile:
    head = [next(myfile) for x in xrange(N)]
print head

Python 3

with open("datafile") as myfile:
    head = [next(myfile) for x in range(N)]
print(head)

Here is another way (Python 2 and 3):

from itertools import islice
with open("datafile") as myfile:
    head = list(islice(myfile, N))
print(head)
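One practical difference between the snippets above: if the file has fewer than N lines, the next()-based list comprehension stops with a StopIteration exception, while islice simply returns the lines that exist. A minimal sketch of that behaviour, assuming a hypothetical helper named head and a throwaway sample file:

```python
from itertools import islice
import os
import tempfile

def head(path, n):
    """Return up to the first n lines of the file at `path` (hypothetical helper)."""
    with open(path) as f:
        # islice stops after n items or at end of file, whichever comes first
        return list(islice(f, n))

# Demonstration on a throwaway 3-line file (the contents are made up)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("a\nb\nc\n")
    demo_path = tmp.name

short = head(demo_path, 2)    # two lines requested, two returned
longer = head(demo_path, 10)  # ten requested, only three exist: no exception
os.remove(demo_path)
```

Because the file is read lazily, only the requested lines are ever loaded, which matters for a large raw data file.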

Answer 1: (score: 16)

N = 10
file = open("file.txt", "r")  # "r" opens the file for reading; a file opened in append mode ("a") cannot be read
for i in range(N):
    line = file.next().strip()
    print line
file.close()

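Answer 2: (score: 13)

If you want to read the first lines quickly and don't care about performance, you can use .readlines(), which returns a list object, and then slice the list.

E.g. for the first 5 lines:

with open("pathofmyfileandfileandname") as myfile:
    firstNlines = myfile.readlines()[0:5]  # put here the interval you want
print firstNlines

Note: this reads the whole file, so it is not the best from a performance point of view, but it is easy to use, quick to write and easy to remember, which makes it very convenient for one-off calculations.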

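Answer 3: (score: 8)

What I do is read the first N lines with pandas. I don't think the performance is the best, but for example with N=1000:

import pandas as pd
yourfile = pd.read_csv('path/to/your/file.csv', nrows=1000)  # nrows limits how many rows are read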

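Answer 4: (score: 5)

The file object does not expose a specific method for reading a given number of lines.

I guess the easiest way would be the following:

lines = []
with open(file_name) as f:
    lines.extend(f.readline() for i in xrange(N))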

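Answer 5: (score: 4)

The two most intuitive ways of doing this are:

  1. Iterate over the file line by line, and break after N lines.

  2. Iterate over the file line by line using the next() method N times. (This is essentially just a different syntax for what the top answer does.)

Here is the code:

# Method 1:
with open("fileName", "r") as f:
    counter = 0
    for line in f:
        print line
        counter += 1
        if counter == N: break

# Method 2:
with open("fileName", "r") as f:
    for i in xrange(N):
        line = f.next()
        print line

The bottom line is that, as long as you don't use readlines() or enumerate the whole file into memory, you have plenty of options.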
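Since the question is about trimming a large file rather than just printing it, the iterate-and-break idea can be pointed at an output file. This is only a sketch in Python 3; trim_file, the paths and the sample data are all made-up names for illustration:

```python
import os
import shutil
import tempfile

def trim_file(src_path, dst_path, n):
    """Copy at most the first n lines of src_path to dst_path; return the number copied."""
    written = 0
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:          # the file is read one line at a time
            if written == n:
                break             # stop as soon as n lines have been copied
            dst.write(line)
            written += 1
    return written

# Demonstration with throwaway files (names and contents are made up)
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "raw.txt")
dst = os.path.join(workdir, "trimmed.txt")
with open(src, "w") as f:
    f.write("1\n2\n3\n4\n")

copied = trim_file(src, dst, 2)
with open(dst) as f:
    trimmed = f.read()
copied_all = trim_file(src, dst, 10)  # asking for more lines than exist is fine
shutil.rmtree(workdir)
```

Memory use stays flat regardless of the size of the source file, because no more than one line is held at a time.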

Answer 6: (score: 4)

Based on gnibbler's top-voted answer (Nov 20 '09 at 0:27): this class adds head() and tail() methods to the file object.

class File(file):
    def head(self, lines_2find=1):
        self.seek(0)                            #Rewind file
        return [self.next() for x in xrange(lines_2find)]

    def tail(self, lines_2find=1):  
        self.seek(0, 2)                         #go to end of file
        bytes_in_file = self.tell()             
        lines_found, total_bytes_scanned = 0, 0
        while (lines_2find+1 > lines_found and
               bytes_in_file > total_bytes_scanned): 
            byte_block = min(1024, bytes_in_file-total_bytes_scanned)
            self.seek(-(byte_block+total_bytes_scanned), 2)
            total_bytes_scanned += byte_block
            lines_found += self.read(1024).count('\n')
        self.seek(-total_bytes_scanned, 2)
        line_list = list(self.readlines())
        return line_list[-lines_2find:]

Usage:

f = File('path/to/file', 'r')
f.head(3)
f.tail(3)
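The class above subclasses the built-in file type, which only exists in Python 2. Under Python 3, a rough counterpart of tail() can be sketched with collections.deque. Note that this is a different technique from the seek-based scan above: it reads through the file once from the start instead of seeking backwards from the end, trading speed on huge files for simplicity. The function name and sample data are assumptions for illustration; head() can be covered by islice as in the top answer.

```python
from collections import deque
import os
import tempfile

def tail(path, n=1):
    """Return up to the last n lines of the file at `path` (hypothetical helper)."""
    with open(path) as f:
        # A deque with maxlen=n drops the oldest line whenever a new one arrives,
        # so only n lines are held in memory while the file is scanned front to back.
        return list(deque(f, maxlen=n))

# Demonstration on a throwaway 5-line file (the contents are made up)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("1\n2\n3\n4\n5\n")
    demo_path = tmp.name

last_three = tail(demo_path, 3)
os.remove(demo_path)
```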

Answer 7: (score: 3)

The most convenient way, in my own experience:

LINE_COUNT = 3
print [s for (i, s) in enumerate(open('test.txt')) if i < LINE_COUNT]

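This is a solution based on a list comprehension. The file object returned by open() supports the iteration interface. enumerate() wraps it and returns (index, item) tuples; we then check that the index is inside the accepted range (if i < LINE_COUNT) and simply print the result.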
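One thing worth knowing about the comprehension: the condition is tested for every line, so the whole file is still iterated even though only the first few lines are kept. A small Python 3 sketch of the difference, using io.StringIO as a stand-in for a real file (the sample data is made up):

```python
from itertools import islice
import io

LINE_COUNT = 3
data = "".join("line %d\n" % i for i in range(1000))  # made-up sample data

# The comprehension checks the condition for every line, so it walks the whole input
f1 = io.StringIO(data)
first = [s for (i, s) in enumerate(f1) if i < LINE_COUNT]
leftover_after_comprehension = f1.read()   # nothing left: everything was consumed

# islice stops pulling lines once LINE_COUNT of them have been produced
f2 = io.StringIO(data)
first_lazy = list(islice(f2, LINE_COUNT))
leftover_after_islice = f2.read()          # the remaining lines are still unread
```

Both variants return the same three lines; they differ only in how much of the input they touch, which becomes noticeable on very large files.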

Enjoy Python. ;)

Answer 8: (score: 2)

If you have a really big file, and assuming you want the output to be a numpy array, using np.genfromtxt will freeze your computer. In my experience, this works much better:

import numpy as np

def load_big_file(fname, maxrows):
    '''only works for well-formed text file of space-separated doubles'''

    rows = []  # unknown number of lines, so use list

    with open(fname) as f:
        j = 0
        for line in f:
            if j == maxrows:
                break
            else:
                line = [float(s) for s in line.split()]
                rows.append(np.array(line, dtype=np.double))
                j += 1
    return np.vstack(rows)  # convert list of vectors to array

Answer 9: (score: 2)

For the first 5 lines, simply do:

N = 5
with open("data_file", "r") as file:  # "data_file" is a placeholder file name
    for i in range(N):
        print(next(file))

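Answer 10: (score: 2)

Starting from Python 2.6, you can take advantage of more sophisticated functions in the IO base class. So the top-rated answer above could be rewritten as:

with open("datafile") as myfile:
    head = myfile.readlines(N)
print head

(You don't have to worry about your file having fewer than N lines, since no StopIteration exception is thrown. Be aware, though, that the argument to readlines() is an approximate size hint rather than a line count, so this does not return exactly N lines.)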
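To check how readlines() treats its argument, here is a small experiment for Python 3. The value acts as a hint for the total size read so far (characters or bytes), after which no further lines are read; it is not a number of lines. The sample file and the name demo_path are made up for the demonstration:

```python
import os
import tempfile

N = 10
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("".join("line%d\n" % i for i in range(100)))  # 100 short lines
    demo_path = tmp.name

with open(demo_path) as myfile:
    hinted = myfile.readlines(N)      # N is used as a size hint, not a line count
with open(demo_path) as myfile:
    sliced = myfile.readlines()[:N]   # reads everything, then keeps exactly N lines
os.remove(demo_path)

print(len(hinted), len(sliced))
```

With lines this short, the hinted call stops well before 10 lines, so slicing or islice is the safer choice when an exact line count is needed.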

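Answer 11: (score: 2)

If you want something that obviously works (without looking up esoteric stuff in manuals), needs no imports or try/except, and runs on a fair range of Python 2.x versions (2.2 to 2.6):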

def headn(file_name, n):
    """Like *x head -N command"""
    result = []
    nlines = 0
    assert n >= 1
    for line in open(file_name):
        result.append(line)
        nlines += 1
        if nlines >= n:
            break
    return result

if __name__ == "__main__":
    import sys
    rval = headn(sys.argv[1], int(sys.argv[2]))
    print rval
    print len(rval)

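Answer 12: (score: 1)

#!/usr/bin/python

import subprocess

# "head" prints the first lines of a file ("tail" would print the last ones)
p = subprocess.Popen(["head", "-n", "3", "passlist"], stdout=subprocess.PIPE)
output, err = p.communicate()
print output

This method worked for me.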

Answer 13: (score: 1)

I wanted to handle files with fewer than n lines by reading the whole file:

def head(filename: str, n: int):
    try:
        with open(filename) as f:
            head_lines = [next(f).rstrip() for x in range(n)]
    except StopIteration:
        with open(filename) as f:
            head_lines = f.read().splitlines()
    return head_lines

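Thanks to John La Rooy and Ilian Iliev. Use the function with the exception handler for best performance.

Revise 1: Thanks to FrankM for the feedback. To handle file existence and read permission, we can further add: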

import errno
import os

def head(filename: str, n: int):
    if not os.path.isfile(filename):
        raise FileNotFoundError(errno.ENOENT, os.strerror(errno.ENOENT), filename)  
    if not os.access(filename, os.R_OK):
        raise PermissionError(errno.EACCES, os.strerror(errno.EACCES), filename)     
   
    try:
        with open(filename) as f:
            head_lines = [next(f).rstrip() for x in range(n)]
    except StopIteration:
        with open(filename) as f:
            head_lines = f.read().splitlines()
    return head_lines

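You can either use the second version, or go with the first one and handle the file exceptions later. The checks are quick and essentially free from a performance standpoint.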
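For comparison, a single-pass variant is possible that never reopens the file and leaves the existence and permission errors to open() itself, which already raises FileNotFoundError and PermissionError. This is only a sketch of an alternative, not the approach above; it strips just the trailing newline, and the sample file is made up:

```python
import os
import tempfile

def head(filename, n):
    """Return up to n lines without their trailing newline (hypothetical single-pass variant)."""
    with open(filename) as f:   # open() raises FileNotFoundError / PermissionError itself
        # zip stops as soon as range(n) is exhausted or the file runs out of lines
        return [line.rstrip("\n") for _, line in zip(range(n), f)]

# Demonstration on a throwaway 2-line file (the contents are made up)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("x\ny\n")
    demo_path = tmp.name

short_file = head(demo_path, 5)   # only two lines exist: no exception, no second read
os.remove(demo_path)

try:
    head(demo_path, 5)            # the file has just been deleted
    missing_reported = False
except FileNotFoundError:
    missing_reported = True
```

Letting open() report the error also avoids the small window in which a file could disappear between the check and the read.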

Answer 14: (score: 0)

This worked for me:

f = open("history_export.csv", "r")
line= 5
for x in range(line):
    a = f.readline()
    print(a)

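Answer 15: (score: 0)

This works for Python 2 and 3 (it skips the first N lines and prints the next M; use islice(inf, N) to get only the first N lines):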

from itertools import islice

with open('/tmp/filename.txt') as inf:
    for line in islice(inf, N, N+M):
        print(line)
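To make the start/stop arguments of islice concrete, here is a tiny example with made-up values for N and M and an io.StringIO object standing in for the file:

```python
from itertools import islice
import io

N, M = 2, 3  # example values: skip 2 lines, then take 3
text = "".join("row %d\n" % i for i in range(10))  # made-up sample data

window = list(islice(io.StringIO(text), N, N + M))  # lines with 0-based index N .. N+M-1
first_n = list(islice(io.StringIO(text), N))        # just the first N lines
```

Here window holds rows 2 to 4, while first_n holds rows 0 and 1.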

Answer 16: (score: 0)

fname = input("Enter file name: ")
num_lines = 0

with open(fname, 'r') as f:  # count the lines in the file
    for line in f:
        num_lines += 1

num_lines_input = int(input("Enter line numbers: "))

if num_lines_input > num_lines:
    # report the shortfall once instead of on every iteration
    print("Don't have", num_lines_input, "lines, printing as many as possible")

with open(fname, "r") as f:
    # never read past the end of the file
    for x in range(min(num_lines_input, num_lines)):
        a = f.readline()
        print(a)

print("Total lines in the text", num_lines)