Question

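I'm trying to extract specific files from a tar.gz archive stored in Amazon S3 without reading all of its bytes, because the archive may be huge and I only need 2 or 3 files.

I'm using the AWS Java SDK. Here is the code (exception handling omitted):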

AWSCredentials credentials = new BasicAWSCredentials("accessKey", "secretKey");
AWSCredentialsProvider credentialsProvider = new AWSStaticCredentialsProvider(credentials);
AmazonS3 s3Client = AmazonS3ClientBuilder.standard().withRegion(Regions.US_EAST_1).withCredentials(credentialsProvider).build();
S3Object object = s3Client.getObject("bucketname", "file.tar.gz");
S3ObjectInputStream objectContent = object.getObjectContent();

TarArchiveInputStream tarInputStream = new TarArchiveInputStream(new GZIPInputStream(objectContent));
TarArchiveEntry currentEntry;
while((currentEntry = tarInputStream.getNextTarEntry()) != null) {
    if(currentEntry.getName().equals("1/foo.bar") && currentEntry.isFile()) {
        FileOutputStream entryOs = new FileOutputStream("foo.bar");
        IOUtils.copy(tarInputStream, entryOs);
        entryOs.close();
        break;
    }
}
objectContent.abort();  // Warning at this line
tarInputStream.close(); // warning at this line

When I use this approach, it emits a warning that not all bytes from the stream were read, which is something I do intentionally.

WARNING: Not all bytes were read from the S3ObjectInputStream, aborting HTTP connection. This is likely an error and may result in sub-optimal behavior. Request only the bytes you need via a ranged GET or drain the input stream after use.

Is it necessary to drain the stream, and what are the downsides of not doing so? Can I just ignore the warning?

Answer 1

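You don't have to worry about the warning: it only tells you that the HTTP connection will be closed and that there may be data you will miss. Since close() delegates to abort(), you get the warning from either call.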
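For comparison, the alternative the warning mentions (draining) amounts to reading and discarding whatever is left in the stream. Here is a minimal sketch of that idea using only the standard library; `StreamDrain` and `drain` are hypothetical names, and a `ByteArrayInputStream` stands in for the `S3ObjectInputStream`, which is also a plain `InputStream`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class StreamDrain {
    /** Reads and discards everything left in the stream; returns the number of bytes discarded. */
    static long drain(InputStream in) throws IOException {
        byte[] buffer = new byte[8192];
        long discarded = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            discarded += read;
        }
        return discarded;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the S3 object body (assumption: 100,000 bytes of payload).
        InputStream in = new ByteArrayInputStream(new byte[100_000]);
        in.skip(1_000); // pretend the wanted entry occupied the first 1,000 bytes
        System.out.println("discarded " + drain(in) + " bytes");
    }
}
```

With a real S3 object, every discarded byte still has to be downloaded, so for a large archive draining costs exactly the transfer the question is trying to avoid. Aborting skips that transfer, at the price of the HTTP connection being closed rather than reused.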

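Note that there is no guarantee you will avoid reading the whole archive: if the files you are interested in sit toward the end of the archive, you will still have to read through almost all of it.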

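S3's HTTP server supports range requests, so if you can influence the format of the archive, or generate some metadata while the archive is being created, you can actually skip ahead or perhaps request only the file you are interested in.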
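One way this could look, as a rough sketch rather than a recipe: instead of gzipping one tar stream, compress each file as its own gzip member, concatenate the members, and record each member's byte range in a small index while writing. Any single range can then be fetched and decompressed on its own. Note that this swaps the tar.gz layout from the question for a different format; `GzipMemberIndex`, `Range`, `build`, and `extract` are hypothetical names, and the "ranged GET" is simulated by slicing a local byte array.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipMemberIndex {
    /** Inclusive byte range of one gzip member inside the archive. */
    static final class Range {
        final long start;
        final long end;

        Range(long start, long end) {
            this.start = start;
            this.end = end;
        }

        /** Value for an HTTP Range header, in the form "bytes=start-end". */
        String toHeader() {
            return "bytes=" + start + "-" + end;
        }
    }

    /** Compresses each file as its own gzip member and records where it lands. */
    static byte[] build(Map<String, byte[]> files, Map<String, Range> index) throws IOException {
        ByteArrayOutputStream archive = new ByteArrayOutputStream();
        for (Map.Entry<String, byte[]> file : files.entrySet()) {
            long start = archive.size();
            GZIPOutputStream gzip = new GZIPOutputStream(archive);
            gzip.write(file.getValue());
            // finish() writes the gzip trailer without closing the shared output stream.
            gzip.finish();
            index.put(file.getKey(), new Range(start, archive.size() - 1L));
        }
        return archive.toByteArray();
    }

    /** Decompresses one member given only the bytes of its range. */
    static byte[] extract(byte[] rangeBytes) throws IOException {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(rangeBytes))) {
            return gzip.readAllBytes();
        }
    }

    public static void main(String[] args) throws IOException {
        Map<String, byte[]> files = new LinkedHashMap<>();
        files.put("1/foo.bar", "first file".getBytes(StandardCharsets.UTF_8));
        files.put("2/baz.qux", "second file".getBytes(StandardCharsets.UTF_8));

        Map<String, Range> index = new LinkedHashMap<>();
        byte[] archive = build(files, index);

        Range wanted = index.get("2/baz.qux");
        // Stand-in for the ranged GET: slice the same bytes out of the local array.
        byte[] slice = Arrays.copyOfRange(archive, (int) wanted.start, (int) wanted.end + 1);
        System.out.println(wanted.toHeader() + " -> "
                + new String(extract(slice), StandardCharsets.UTF_8));
    }
}
```

With the SDK from the question, the recorded range could be passed via `new GetObjectRequest("bucketname", key).withRange(start, end)` instead of slicing a local array, so only those bytes are transferred and the stream is read to its end without triggering the warning. The index itself has to be stored somewhere the reader can find it, for example as a separate small object next to the archive; that part is left open here.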

Partially reading a tar.gz file from Amazon S3
