Question

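I have a bunch of data in a .txt file, and I need it in a format I can use in Fusion Tables or a spreadsheet. I assume that format would be a CSV, which I can write to another file and then import into a spreadsheet.

The data is in this format, with multiple entries separated by blank lines.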

Start Time
8/18/14, 11:59 AM
Duration
15 min
Start Side
Left
Fed on Both Sides
No

Start Time
8/18/14, 8:59 AM
Duration
13 min
Start Side
Right
Fed on Both Sides
No

(etc.)

But I ultimately need it in this format (or anything else I can use to get it into a spreadsheet)

StartDate, StartTime, Duration, StartSide, FedOnBothSides
8/18/14, 11:59 AM, 15, Left, No
- ,      -,        -,  -,    -

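The issues I am having are:
- I don't need all of the information or every line, but I'm not sure how to separate them automatically. I don't even know whether the way I'm sorting through each line is a smart one.
- The error I get is "argument 1 must be string or read-only character buffer, not list" when I use .read() or .readlines() sometimes (although it did work at one point). Both of my arguments are .txt files.
- The date and time aren't in a set format with regular lengths (it has 8/4/14, 5:14 AM instead of 08/04/14, 05:14 AM), and I'm not sure how to deal with that.

This is what I have tried so far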


from sys import argv
from os.path import exists

def filework():
    script, from_file, to_file = argv

    print "copying from %s to %s" % (from_file, to_file)

    in_file = open(from_file)
    indata = in_file.readlines() #.read() .readline .readlines .read().splitline .xreadlines

    print "the input file is %d bytes long" % len(indata)

    print "does the output file exist? %r" % exists(to_file)
    print "ready, hit RETURN to continue, CTRL-C to abort."
    raw_input()

    #do stuff section----------------BEGIN
    for i in indata:
        if i == "Start Time":
            pass #do something
        elif i== '{date format}':
            pass #do something
        else:
            pass #do something
        #do stuff section----------------END

    out_file = open(to_file, 'w')
    out_file.write(indata)

    print "alright, all done."

    out_file.close()
    in_file.close()



filework()

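So I'm relatively new to scripts like this that have several complicated parts. Any help and suggestions would be greatly appreciated. Sorry if this is a jumble.
Thanks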

Answer 1

This code should work. It isn't optimal, but I'm sure you'll figure out how to make it better! What the code basically does:

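- Get all the lines from the input data
- Loop through all the lines and try to recognize the different keys (Start Time, etc.)
- If a key is recognized, take the line beneath it and apply the appropriate function to it
- If a blank line is found, add the current entry to a list so that the next entry can be read
- Write the data to a file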
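
The steps above can also be sketched compactly for Python 3. This is only an illustration under a few assumptions, not the code from this answer: the names KEYS, COLUMNS, parse_entries and to_csv are made up for the example, datetime.strptime is swapped in for manual zero-padding, and the csv module is used for writing. It assumes every key line is followed by a value line in exactly the format shown in the question.

```python
import csv
import io
from datetime import datetime

# Keys as they appear in the sample data, mapped to the wanted column names.
KEYS = {
    "start time": "StartTime",
    "duration": "Duration",
    "start side": "StartSide",
    "fed on both sides": "FedOnBothSides",
}
COLUMNS = ["StartDate", "StartTime", "Duration", "StartSide", "FedOnBothSides"]


def parse_entries(lines):
    """Turn key/value line pairs into one dict per blank-line-separated entry."""
    entries, current, pending_key = [], {}, None
    for raw in lines:
        line = raw.strip()
        if pending_key is not None:
            # The previous line was a known key, so this line is its value.
            if pending_key == "StartTime":
                stamp = datetime.strptime(line, "%m/%d/%y, %I:%M %p")
                current["StartDate"] = stamp.strftime("%m/%d/%y")
                current["StartTime"] = stamp.strftime("%I:%M %p")
            elif pending_key == "Duration":
                current["Duration"] = line.split()[0]  # keep only the minutes
            else:
                current[pending_key] = line
            pending_key = None
        elif line.lower() in KEYS:
            pending_key = KEYS[line.lower()]
        elif line == "" and current:
            # A blank line closes the current entry.
            entries.append(current)
            current = {}
    if current:
        # The last entry has no blank line after it.
        entries.append(current)
    return entries


def to_csv(entries):
    """Render the entries as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()
```

To write a real file instead of a string, the same DictWriter can be given a file opened with open(to_file, "w", newline="").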

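If you haven't seen string formatting before: in "{0:} {1:}".format(arg0, arg1), {0:} is just a way of defining a placeholder for a variable (here: arg0), and the 0 simply selects which argument to use.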
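
A small illustration of those placeholders, using made-up values in the style of the question's data (the variable names are only for the example):

```python
date = "08/18/14"
time = "11:59"
time_indic = "AM"

# {0} and {1} are replaced by the first and second argument of format().
stamp = "{0} {1}".format(time, time_indic)

# Indices can be reordered; "{0:}" with an empty format spec after
# the colon behaves the same as "{0}".
line = "{1}, {0:}".format(stamp, date)
print(line)
```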

Read more about it here:

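If you are using a Python version < 2.7, you may have to install the OrderedDict backport with pip install ordereddict. If that doesn't work, just change data = OrderedDict() to data = {}. The column order of the output may then differ from run to run, but the data will still be correct.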
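
To show why the ordering matters here, a minimal sketch with made-up values: an OrderedDict returns keys and values in insertion order, which is what keeps the header row and the data rows aligned. (From Python 3.7 on, a plain dict preserves insertion order as well; in Python 2 it gives no such guarantee.)

```python
from collections import OrderedDict

data = OrderedDict()
data["StartDate"] = "08/18/14"
data["StartTime"] = "11:59 AM"
data["Duration"] = "15"

# Keys and values come back in insertion order, so the header row
# and every data row line up column by column.
header = ", ".join(data.keys())
row = ", ".join(data.values())
```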

from sys import argv
from os.path import exists
# since we want to have a somewhat standardized format
# and dicts are unordered by default
try:
    from collections import OrderedDict
except ImportError:
    # python 2.6 or earlier, use backport
    from ordereddict import OrderedDict

def get_time_and_date(time):
    date, time = time.split(",")
    time, time_indic = time.split()

    date = pad_time(date)
    time = "{0:} {1:}".format(pad_time(time), time_indic)

    return time, date

def pad_time(time):
    """
    Make all the time values look the same, e.g. turn 5:30 AM into 05:30 AM
    """
    # if it's a time
    if ":" in time:
        separator = ":"
    # if it's a date
    else:
        separator = "/"

    time = time.split(separator)
    for index, num in enumerate(time):
        if len(num) < 2:
            time[index] = "0" + time[index]

    return separator.join(time)

def filework():
    from_file, to_file = argv[1:]
    data = OrderedDict() 

    print "copying from %s to %s" % (from_file, to_file)
    # by using "with open(...)" the file is closed automatically
    with open(from_file, "r") as inputfile:
        indata = inputfile.readlines()
        entries = []

        print "the input file is %d lines long" % len(indata)
        print "does the output file exist? %r" % exists(to_file)
        print "ready, hit RETURN to continue, CTRL-C to abort."
        raw_input()

        for line_num in xrange(len(indata)):
            # make the entire string lowercase to be more flexible,
            # and then remove whitespace
            line_lowered = indata[line_num].lower().strip()

            if "start time" == line_lowered:
                time, date = get_time_and_date(indata[line_num+1].strip())
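                # insert the date first so the columns come out in the
                # requested order: StartDate, StartTime, ...
                data["StartDate"] = date
                data["StartTime"] = time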
            elif "duration" == line_lowered:
                duration = indata[line_num+1].strip().split()
                # only keep the amount of minutes
                data["Duration"] = duration[0]
            elif "start side" == line_lowered:
                data["StartSide"] = indata[line_num+1].strip()
            elif "fed on both sides" == line_lowered:
                data["FedOnBothSides"] = indata[line_num+1].strip()
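            elif line_lowered == "":
                # if a blank line is found, store the finished entry and
                # prepare for reading a new one (repeated blank lines
                # must not produce empty entries)
                if data:
                    entries.append(data)
                    data = OrderedDict()

        # store the last entry, unless the file ended with a blank line
        if data:
            entries.append(data)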

    # create the outfile if it does not exist
    with open(to_file, "w+") as outfile:
        headers = entries[0].keys()
        outfile.write(", ".join(headers) + "\n")
        for entry in entries:
            outfile.write(", ".join(entry.values()) + "\n")

filework()
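
As for the error quoted in the question: it most likely comes from passing the list returned by .readlines() to out_file.write(), which only accepts a single string. The sketch below reproduces that in Python 3 (where the wording of the message differs) and shows two ways around it. io.StringIO stands in for the output file here, purely so the example runs without touching the disk.

```python
import io

lines = ["Start Time\n", "8/18/14, 11:59 AM\n"]  # what readlines() returns

out_file = io.StringIO()  # stands in for open(to_file, "w")

message = None
try:
    out_file.write(lines)  # a list, not a string
except TypeError as error:
    message = str(error)

# Either join the list into one string first, or use writelines().
out_file.write("".join(lines))
out_file.writelines(lines)
```

With .read() the data is already one string, which would explain why the script worked some of the time.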
