Question

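So I've been playing around with raw WSGI, cgi.FieldStorage and file uploads, and I just can't understand how it deals with file uploads.

At first it seemed that it simply stores the whole file in memory. I thought that should be easy to test: a big file should clog up the memory! But it didn't. Still, when I request the file, I get a string, not an iterator, a file object or anything like that.

I've tried reading the cgi module's source and found some things about temporary files, but it returns a plain string, not a file(-like) object! So... how does it work?!

Here's the code I've used: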

import cgi
from wsgiref.simple_server import make_server

def app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/html')])
    output = """
    <form action="" method="post" enctype="multipart/form-data">
    <input type="file" name="failas" />
    <input type="submit" value="Varom" />
    </form>
    """
    fs = cgi.FieldStorage(fp=environ['wsgi.input'], environ=environ)
    f = fs.getfirst('failas')
    print type(f)
    # WSGI expects an iterable of strings; a bare string is sent one character at a time.
    return [output]


if __name__ == '__main__' :
    httpd = make_server('',8000,app)
    print 'Serving'
    httpd.serve_forever()

Thanks in advance! :)

Answer 1

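Looking at the cgi module description, there is a paragraph discussing how to handle file uploads.

If a field represents an uploaded file, accessing the value via the value attribute or the getvalue() method reads the entire file in memory as a string. This may not be what you want. You can test for an uploaded file by testing either the filename attribute or the file attribute. You can then read the data at leisure from the file attribute: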

fileitem = form["userfile"]
if fileitem.file:
    # It's an uploaded file; count lines
    linecount = 0
    while 1:
        line = fileitem.file.readline()
        if not line: break
        linecount = linecount + 1

Regarding your example, getfirst() is just a version of getvalue(). Try replacing

f = fs.getfirst('failas')

with

f = fs['failas'].file

This will return a file-like object that can be read "at leisure".
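Once you have the file-like object, one way to consume it without holding the whole upload in memory is to copy it in fixed-size chunks. This is only a sketch: save_upload, the chunk size and the destination path are made-up names for illustration, and an io.BytesIO stands in for fs['failas'].file so the snippet runs without a server.

```python
import io
import os
import shutil
import tempfile


def save_upload(src, dest_path, chunk_size=64 * 1024):
    """Copy a file-like object to dest_path in chunks; return bytes written."""
    with open(dest_path, 'wb') as dest:
        # copyfileobj reads at most chunk_size bytes at a time.
        shutil.copyfileobj(src, dest, chunk_size)
        return dest.tell()


# Stand-in for fs['failas'].file (hypothetical upload content).
fake_upload = io.BytesIO(b'x' * 200000)
dest_path = os.path.join(tempfile.mkdtemp(), 'upload.bin')
written = save_upload(fake_upload, dest_path)
```

In the real handler you would pass fs['failas'].file as src instead of the BytesIO stand-in.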

Answer 2

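The best way is NOT to read the file all at once (or even one line at a time, as gimel suggested).

You can use some inheritance: derive a class from FieldStorage and override its make_file method. make_file is called when the FieldStorage is of type file.

For your reference, the default make_file looks like this: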

def make_file(self, binary=None):
    """Overridable: return a readable & writable file.

    The file will be used as follows:
    - data is written to it
    - seek(0)
    - data is read from it

    The 'binary' argument is unused -- the file is always opened
    in binary mode.

    This version opens a temporary file for reading and writing,
    and immediately deletes (unlinks) it.  The trick (on Unix!) is
    that the file can still be used, but it can't be opened by
    another process, and it will automatically be deleted when it
    is closed or when the current process terminates.

    If you want a more permanent file, you derive a class which
    overrides this method.  If you want a visible temporary file
    that is nevertheless automatically deleted when the script
    terminates, try defining a __del__ method in a derived class
    which unlinks the temporary files you have created.

    """
    import tempfile
    return tempfile.TemporaryFile("w+b")

Rather than creating a temporary file, create a permanent file wherever you want.
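As a rough sketch of what such an override could hand back, the helper below creates a file that survives close() and then exercises it the way the docstring describes (write, seek(0), read). make_permanent_file and the directory are hypothetical names; a real make_file override would return only the file object and keep the path somewhere else, for example on self.

```python
import os
import tempfile


def make_permanent_file(directory):
    """Return (file, path): a readable and writable binary file kept after close."""
    fd, path = tempfile.mkstemp(dir=directory, suffix='.upload')
    return os.fdopen(fd, 'w+b'), path


upload_dir = tempfile.mkdtemp()  # hypothetical upload directory
f, path = make_permanent_file(upload_dir)

# The same sequence the make_file docstring lists:
f.write(b'uploaded bytes')   # data is written to it
f.seek(0)                    # seek(0)
data = f.read()              # data is read from it
f.close()

still_there = os.path.exists(path)  # unlike TemporaryFile, nothing was unlinked
```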

Answer 3

Using the answer by @hasanatkazmi (in a Twisted app) I ended up with something like this:

#!/usr/bin/env python2
# -*- coding: utf-8 -*-
# -*- indent: 4 spc -*-
import sys
import cgi
import tempfile


class PredictableStorage(cgi.FieldStorage):
    def __init__(self, *args, **kwargs):
        self.path = kwargs.pop('path', None)
        cgi.FieldStorage.__init__(self, *args, **kwargs)

    def make_file(self, binary=None):
        if not self.path:
            file = tempfile.NamedTemporaryFile("w+b", delete=False)
            self.path = file.name
            return file
        return open(self.path, 'w+b')

Be warned that the cgi module does not always create the file. According to these cgi.py lines, it is only created if the content exceeds 1000 bytes:

if self.__file.tell() + len(line) > 1000:
    self.file = self.make_file('')
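To make that threshold concrete, here is a small stand-alone imitation of the buffering pattern: data stays in an in-memory buffer until a write would push it past 1000 bytes, and only then is a real file created. SpillingBuffer and on_disk are invented names for illustration; this is not the cgi module's code, just the same idea under that assumption.

```python
import io
import tempfile


class SpillingBuffer(object):
    """Illustrative only: keep small payloads in memory, spill large ones to disk."""

    THRESHOLD = 1000  # bytes, the figure from the cgi.py line quoted above

    def __init__(self):
        self._mem = io.BytesIO()  # in-memory buffer used at first
        self.file = self._mem     # what callers read from

    def write(self, line):
        # Same test as the quoted line: switch once the total would exceed 1000.
        if self._mem is not None and self._mem.tell() + len(line) > self.THRESHOLD:
            real_file = tempfile.TemporaryFile("w+b")
            real_file.write(self._mem.getvalue())
            self.file = real_file
            self._mem = None
        self.file.write(line)

    @property
    def on_disk(self):
        return self._mem is None


small = SpillingBuffer()
small.write(b'a' * 500)   # stays in memory

large = SpillingBuffer()
large.write(b'a' * 500)
large.write(b'b' * 600)   # 500 + 600 > 1000, so a real file is created
large.file.seek(0)
```

A 500-byte field never triggers the file creation, which is why an overridden make_file may simply not be called for small uploads.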

So you have to check whether the file was actually created by querying the custom class's path field, like this:

if file_field.path:
    # Using an already created file...
    pass
else:
    # Creating a temporary named file to store the content.
    import tempfile
    with tempfile.NamedTemporaryFile("w+b", delete=False) as f:
        f.write(file_field.value)
        # You can save the 'f.name' field for later usage.

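If Content-Length is also set for the field, which seems to be rare, the file should also be created by cgi.

That's it. This way you can store the file predictably, which reduces the memory footprint of your app.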

How does cgi.FieldStorage store files?
