Question

This is my first question here, as I'm fairly new to this world! I've spent a few days trying to solve this myself, but so far I haven't found anything useful.

I'm trying to retrieve a byte range from a file stored in S3 using the following:

S3Key.get_contents_to_file(tempfile, headers={'Range': 'bytes=0-100000'})
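One side note on the header itself: HTTP byte ranges include the end offset (RFC 7233), so `bytes=0-100000` asks for 100,001 bytes. The test script further down already accounts for this with `bytesToRead - 1`. A minimal sketch of a helper that builds the header from an offset and a length; the name `range_header` is hypothetical and not part of boto:

```python
def range_header(start, length):
    """Return a Range header covering `length` bytes starting at `start`.

    The end offset of an HTTP byte range is inclusive, hence the -1.
    """
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length must be > 0")
    return {"Range": "bytes=%d-%d" % (start, start + length - 1)}


print(range_header(0, 100000))  # {'Range': 'bytes=0-99999'}
```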

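The file I'm trying to retrieve is a video file, specifically MXF. When I request a byte range, tempfile ends up with more data than I asked for. For example, with one file I request 100,000 bytes and get back 100,451.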

One thing to note about MXF files is that they legitimately contain 0x0A (ASCII line feed) and 0x0D (ASCII carriage return) bytes.

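I had a dig, and it appears that whenever a 0A byte is present in the file, the retrieved data contains 0D 0A instead of just 0A, which is why it seems to retrieve more data than requested.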

For example, the original file contains the hex string:

02 03 00 00 00 00 3B 0A 06 0E 2B 34 01 01 01 05

But the file downloaded from S3 has:

02 03 00 00 00 00 3B 0D 0A 06 0E 2B 34 01 01 01 05
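This pattern matches what Windows text-mode newline translation does to binary data. A small Python 3 (3.8+) sketch reproduces it on any OS by forcing `newline="\r\n"` on an in-memory text stream. It simulates the Windows text-mode behaviour and does not involve boto at all:

```python
import io

# Bytes from the original file in the question.
original = bytes.fromhex("02 03 00 00 00 00 3B 0A 06 0E 2B 34 01 01 01 05")

# newline="\r\n" makes TextIOWrapper rewrite every "\n" (0x0A) as "\r\n"
# (0x0D 0x0A) on write, as a Windows text-mode file does. latin-1 maps each
# byte 0x00-0xFF to exactly one character, so no other byte is altered.
raw = io.BytesIO()
text_file = io.TextIOWrapper(raw, encoding="latin-1", newline="\r\n")
text_file.write(original.decode("latin-1"))
text_file.flush()
written = raw.getvalue()

print(written.hex(" ").upper())
# 02 03 00 00 00 00 3B 0D 0A 06 0E 2B 34 01 01 01 05
```

Each 0x0A in the input gains one extra byte, so the size difference equals the number of 0x0A bytes written.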

I've tried debugging the code and following the Boto logic, but I'm fairly new to this, so it's easy to get lost.

I created this for testing, and it shows the problem:

from boto.s3.connection import S3Connection
from boto.s3.connection import Location
from boto.s3.key import Key
import boto
import os


## AWS credentials
AWS_ACCESS_KEY_ID = 'access key id'
AWS_SECRET_ACCESS_KEY = 'secret access key'

## Bucket name and path to file
bucketName = 'bucket name'
filePath = 'path/to/file.mxf'

#Local temp file to download to
tempFilePath = 'c:/tmp/tempfile'


## Setup the S3 connection and create a Key to access the file specified
## in filePath
conn = S3Connection(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
bucket = conn.get_bucket(bucketName)
S3Key = Key(bucket)
S3Key.key = filePath

def testRangeGet(bytesToRead=100000): # default read of 100K
    tempfile = open(tempFilePath, 'w')
    rangeString = 'bytes=0-' + str(bytesToRead - 1)  # byte ranges are inclusive, so end at bytesToRead - 1
    rangeDict = {'Range': rangeString} # add this to the headers dictionary
    S3Key.get_contents_to_file(tempfile, headers=rangeDict) # using Boto
    tempfile.close()
    bytesRead = os.path.getsize(tempFilePath)
    print('Bytes requested = ' + str(bytesToRead))
    print('Bytes received = ' + str(bytesRead))
    print('Additional bytes = ' + str(bytesRead - bytesToRead))

testRangeGet()
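If a local copy of the same MXF file is available, the CR/LF hypothesis can be checked without stepping through boto. If text-mode translation is the cause, the "Additional bytes" figure should equal the number of 0x0A bytes in the requested range. A sketch, where `count_lf` and the local path passed to it are hypothetical:

```python
def count_lf(path, num_bytes, start=0):
    """Count 0x0A bytes in bytes [start, start + num_bytes) of a local file."""
    # Read in binary mode so the count reflects the bytes actually on disk.
    with open(path, "rb") as fp:
        fp.seek(start)
        chunk = fp.read(num_bytes)
    return chunk.count(b"\n")
```

For the 100,000-byte example in the question, a result of 451 would account for every extra byte.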

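My guess is that something in the Boto code looks for certain ASCII control characters and modifies them, and I can't find any way to tell it to treat the file as binary.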

Has anyone run into a similar problem and can share what they found?

Thanks

添

Answer 1

Open the output file in binary mode. Otherwise, writing to it automatically converts LF to CR/LF.

tempfile = open(tempFilePath, 'wb')

Of course, this only matters on Windows. Unix doesn't translate anything, whether the file is opened in text or binary mode.
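A sketch of the question's download written as a reusable function with the binary-mode fix applied. It assumes a boto 2 `Key` such as `S3Key` in the question (or any object with the same `get_contents_to_file(fp, headers=...)` method); the function name `range_get_to_file` is hypothetical:

```python
import os


def range_get_to_file(key, path, num_bytes, start=0):
    """Download bytes [start, start + num_bytes) of an S3 key into `path`.

    Returns the size of the local file after the download.
    """
    # HTTP byte ranges include the end offset, hence the -1.
    headers = {"Range": "bytes=%d-%d" % (start, start + num_bytes - 1)}
    # 'wb' writes the received bytes unchanged; plain 'w' on Windows would
    # expand every 0x0A into 0x0D 0x0A.
    with open(path, "wb") as fp:
        key.get_contents_to_file(fp, headers=headers)
    return os.path.getsize(path)
```

The `with` block also closes the file if the download raises, which the original `open`/`close` pair does not.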

Also take care when uploading, so that similarly corrupted data doesn't end up in S3 in the first place.
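The same rule applies on the way up: open the source file in binary mode before handing it to boto. A minimal sketch, assuming a boto 2 `Key` with its `set_contents_from_file(fp)` method; the function name `upload_binary` is hypothetical:

```python
def upload_binary(key, path):
    """Upload a local file to S3 byte-for-byte through a boto Key."""
    # 'rb' passes the raw bytes to boto; text mode ('r') on Windows would
    # translate line endings before the data ever reaches S3.
    with open(path, "rb") as fp:
        key.set_contents_from_file(fp)
```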

Boto "get byte range" returns more than expected
