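I am trying to write a script that walks through my directory and its subdirectories and lists the number of files in particular size ranges, for example 0 KB-1 KB: 3, 1 KB-4 KB: 4, 4 KB-16 KB: 4, 16 KB-64 KB: 11, continuing in multiples of 4. I am able to get the list of file counts, show sizes in human-readable format, and find the number of files in each size group. But I feel my code is very messy and nowhere near standard. I need help refurbishing the code.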
import os
suffixes = ['B', 'KB', 'MB', 'GB', 'TB', 'PB']
route = raw_input('Enter a location')
def human_Readable(nbytes):
    if nbytes == 0: return '0 B'
    i = 0
    while nbytes >= 1024 and i < len(suffixes)-1:
        nbytes /= 1024.
        i += 1
    f = ('%.2f' % nbytes).rstrip('0').rstrip('.')
    return '%s %s' % (f, suffixes[i])
def file_Dist(path, start,end):
    counter = 0
    counter2 = 0
    for path, subdir, files in os.walk(path):
        for r in files:
            if os.path.getsize(os.path.join(path,r)) > start and os.path.getsize(os.path.join(path,r)) < end:
                counter += 1
    #print "Number of files less than %s:" %(human_Readable(end)), counter
    print "Number of files greater than %s less than %s:" %(human_Readable(start), human_Readable(end)), counter
file_Dist(route, 0, 1024)
file_Dist(route,1024,4095)
file_Dist(route, 4096, 16383)
file_Dist(route, 16384, 65535)
file_Dist(route, 65536, 262143)
file_Dist(route, 262144, 1048576)
file_Dist(route, 1048577, 4194304)
file_Dist(route, 4194305, 16777216)
Answer 0 (score: 0)
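Here are some suggestions for improvement.
os.path.getsize() fails on broken symbolic links; I use os.lstat().st_size instead, which yields the correct in-tree size of a link file. Here is a version of the program that implements the above suggestions. Note that it still ignores files of 16 MiB or larger; this could be improved as well.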
#!/usr/bin/env python
import math
import os
import sys
route = sys.argv[1]
suffixes = ['B', 'KB', 'MB', 'GB', 'TB', 'PB']
def human_Readable(nbytes):
    if nbytes == 0: return '0 B'
    i = 0
    while nbytes >= 1024 and i < len(suffixes)-1:
        nbytes /= 1024.
        i += 1
    f = ('%.2f' % nbytes).rstrip('0').rstrip('.')
    return '%s %s' % (f, suffixes[i])
counter = [0]*8 # count files with size up to 4**(8-1) KB
for path, subdir, files in os.walk(route):
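    for r in files:
        # lstat does not follow symbolic links, so broken links do not raise
        size = os.lstat(os.path.join(path, r)).st_size
        # relies on Python 2 integer division: size/1024 truncates to whole KB
        group = (math.frexp(size/1024)[1]+1)/2
        if group < len(counter):
            counter[group] += 1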
start = 0
for g in range(len(counter)):
    end = 1024*4**g
    print "Number of files at least %s less than %s:" \
          %(human_Readable(start), human_Readable(end)), counter[g]
    start = end
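The program above skips files of 16 MiB or larger. One possible way to keep them is to clamp the group index so that everything past the last range lands in an overflow bucket. The following is only a sketch, written for Python 3 (hence `//` instead of `/`); the names `size_group`, `count_by_group` and `ngroups` are made up for this illustration and are not part of the program above.

```python
import math
import os
from collections import Counter


def size_group(size):
    """Group index: 0 for < 1 KB, 1 for < 4 KB, 2 for < 16 KB, and so on."""
    # Floor division reproduces what Python 2 does for size/1024 on ints.
    return (math.frexp(size // 1024)[1] + 1) // 2


def count_by_group(root, ngroups=8):
    """Count files below root per size group.

    Keys 0..ngroups-1 are the regular groups; key ngroups collects
    every file that is too large for them (the overflow bucket).
    """
    counts = Counter()
    for path, _subdirs, files in os.walk(root):
        for name in files:
            # lstat does not follow symbolic links, so broken links are fine.
            size = os.lstat(os.path.join(path, name)).st_size
            # Clamp instead of skipping, so large files are still counted.
            counts[min(size_group(size), ngroups)] += 1
    return counts
```

With the default `ngroups=8`, `count_by_group(route)[8]` would then be the number of files of 16 MiB or larger.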
I think the line group = (math.frexp(size/1024)[1]+1)/2, which yields the index of the counter list element corresponding to size, needs some explanation. Consider
>>> sizes = [0]+[1024*4**i for i in range(8)]
>>> sizes
[0, 1024, 4096, 16384, 65536, 262144, 1048576, 4194304, 16777216]
>>> [math.frexp(s/1024) for s in sizes]
[(0.0, 0), (0.5, 1), (0.5, 3), (0.5, 5), (0.5, 7), (0.5, 9), (0.5, 11), (0.5, 13), (0.5, 15)]
>>> [math.frexp(2*s/1024) for s in sizes]
[(0.0, 0), (0.5, 2), (0.5, 4), (0.5, 6), (0.5, 8), (0.5, 10), (0.5, 12), (0.5, 14), (0.5, 16)]
>>> [math.frexp(3*s/1024) for s in sizes]
[(0.0, 0), (0.75, 2), (0.75, 4), (0.75, 6), (0.75, 8), (0.75, 10), (0.75, 12), (0.75, 14), (0.75, 16)]
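We get the picture that we can compute the correct counter list index by taking the base-2 exponent of the floating-point representation of the size in KB and adjusting it a little: +1 because the mantissa lies in [0.5, 1[ rather than [1, 2[, and /2 to convert from base 2 to base 4.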
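To convince yourself that the formula picks the right group, it can be compared against a plain loop at the range boundaries. This is just a sketch for Python 3 (so `//` takes the place of Python 2's integer `/`); `group_frexp` and `group_loop` are names invented for this check.

```python
import math


def group_frexp(size):
    # // reproduces the integer division Python 2 performs for size/1024.
    return (math.frexp(size // 1024)[1] + 1) // 2


def group_loop(size):
    """Reference: the smallest g such that size < 1024 * 4**g."""
    g = 0
    while size >= 1024 * 4 ** g:
        g += 1
    return g


# Probe each boundary 1 KB, 4 KB, ..., 16 MB and its direct neighbours.
boundaries = [1024 * 4 ** i for i in range(8)]
samples = [0, 1] + [b + d for b in boundaries for d in (-1, 0, 1)]
assert all(group_frexp(s) == group_loop(s) for s in samples)
```

For non-negative integers the exponent returned by `math.frexp` equals `int.bit_length()`, so `((size // 1024).bit_length() + 1) // 2` should give the same index without going through a float.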