Most efficient way to transfer nearly 400k images to S3

Date: 2011-03-13 12:28:03

Tags: file amazon-web-services amazon-s3 transfer

I'm currently in charge of moving a site from its current server to EC2. One part of the project is done and working well; the part I'm struggling with is this: the site currently has nearly 400K images, all sorted into different folders inside a main userimg folder, and the client wants all of them stored on S3. My main problem is how to transfer nearly 400,000 images from the server to S3. I've been using http://s3tools.org/s3cmd, which is brilliant, but transferring the userimg folder with s3cmd would take almost 3 days, and if the connection drops or a similar problem occurs I'd end up with some of the images on S3 and some not, with no way to resume the process...

Can anyone suggest a solution? Has anyone run into a problem like this before?

5 Answers:

Answer 0 (score: 3)

I suggest you write a simple Java utility (or have somebody write one for you) that:

  1. Reads the structure of the client's directories (if needed).
  2. For each image, creates the corresponding key on S3 (matching the file structure read in step 1) and starts a multipart upload, running the uploads in parallel, using the AWS SDK or the jets3t API.

I did this for our client. It is less than 200 lines of Java code and it is very reliable. Below is the part that performs the multipart upload; the part that reads the file structure is trivial (a rough sketch of it follows after the code).

    /**
     * Uploads file to Amazon S3. Creates the specified bucket if it does not exist.
     * The upload is done in chunks of CHUNK_SIZE size (multi-part upload).
     * Attempts to handle upload exceptions gracefully up to MAX_RETRY times per single chunk.
     * 
     * @param accessKey     - Amazon account access key
     * @param secretKey     - Amazon account secret key
     * @param directoryName - directory path where the file resides
     * @param keyName       - the name of the file to upload
     * @param bucketName    - the name of the bucket to upload to
     * @throws Exception    - in case that something goes wrong
     */
    public void uploadFileToS3(String accessKey
            ,String secretKey
            ,String directoryName
            ,String keyName // that is the file name that will be created after upload completed
            ,String bucketName ) throws Exception {
    
        // Create a credentials object and service to access S3 account
        AWSCredentials myCredentials =
            new BasicAWSCredentials(accessKey, secretKey);
    
        String filePath = directoryName
                + System.getProperty("file.separator")
                + keyName;
    
        log.info("uploadFileToS3 is about to upload file [" + filePath + "]");
    
        AmazonS3 s3Client = new AmazonS3Client(myCredentials);        
        // Create a list of UploadPartResponse objects. You get one of these
        // for each part upload.
        List<PartETag> partETags = new ArrayList<PartETag>();
    
        // make sure that the bucket exists
        createBucketIfNotExists(bucketName, accessKey, secretKey);
    
        // delete the file from bucket if it already exists there
        s3Client.deleteObject(bucketName, keyName);
    
        // Initialize.
        InitiateMultipartUploadRequest initRequest = new InitiateMultipartUploadRequest(bucketName, keyName);
        InitiateMultipartUploadResult initResponse = s3Client.initiateMultipartUpload(initRequest);
    
        File file = new File(filePath);
    
        long contentLength = file.length();
        long partSize = CHUNK_SIZE; // part size (CHUNK_SIZE, 5 MB here)

        // number of parts, rounded up
        int numOfParts = (contentLength <= CHUNK_SIZE)
                ? 1
                : (int) ((contentLength + CHUNK_SIZE - 1) / CHUNK_SIZE);
    
        try {
            // Step 2: Upload parts.
            long filePosition = 0;
            for (int i = 1; filePosition < contentLength; i++) {
                // Last part can be less than 5 MB. Adjust part size.
                partSize = Math.min(partSize, (contentLength - filePosition));
    
                log.info("Start uploading part[" + i + "] of [" + numOfParts + "]");
    
                // Create request to upload a part.
                UploadPartRequest uploadRequest = new UploadPartRequest()
                .withBucketName(bucketName).withKey(keyName)
                .withUploadId(initResponse.getUploadId()).withPartNumber(i)
                .withFileOffset(filePosition)
                .withFile(file)
                .withPartSize(partSize);
    
                // repeat the upload until it succeeds or the retry limit is reached
                boolean anotherPass;
                int retryCount = 0;
                do {
                    anotherPass = false;  // assume everything is ok
                    try {
                        log.info("Uploading part[" + i + "]");
                        // Upload part and add response to our list.
                        partETags.add(s3Client.uploadPart(uploadRequest).getPartETag());
                        log.info("Finished uploading part[" + i + "] of [" + numOfParts + "]");
                    } catch (Exception e) {
                        log.error("Failed uploading part[" + i + "] due to exception. Will retry... Exception: ", e);
                        anotherPass = true; // repeat
                        retryCount++;
                    }
                } while (anotherPass && retryCount < CloudUtilsService.MAX_RETRY);

                // do not continue silently if the part never made it:
                // abort the whole upload instead (handled by the catch below)
                if (anotherPass) {
                    throw new Exception("Failed to upload part[" + i + "] after "
                            + retryCount + " attempts");
                }
    
                filePosition += partSize;
                log.info("filePosition=[" + filePosition + "]");
    
            }
            log.info("Finished uploading file");
    
            // Complete the multipart upload.
            CompleteMultipartUploadRequest compRequest = new CompleteMultipartUploadRequest(
                    bucketName,
                    keyName,
                    initResponse.getUploadId(),
                    partETags);

            s3Client.completeMultipartUpload(compRequest);

            log.info("multipart upload completed. upload id=[" + initResponse.getUploadId() + "]");
        } catch (Exception e) {
            s3Client.abortMultipartUpload(new AbortMultipartUploadRequest(
                    bucketName, keyName, initResponse.getUploadId()));
    
            log.error("Failed to upload due to Exception:", e);
    
            throw e;
        }
    }
    
    
    /**
     * Creates new bucket with the names specified if it does not exist.
     * 
     * @param bucketName    - the name of the bucket to retrieve or create
     * @param accessKey     - Amazon account access key
     * @param secretKey     - Amazon account secret key
     * @throws S3ServiceException - if something goes wrong
     */
    public void createBucketIfNotExists(String bucketName, String accessKey, String secretKey) throws S3ServiceException {
        try {
            // Create a credentials object and service to access S3 account
            org.jets3t.service.security.AWSCredentials myCredentials =
                new org.jets3t.service.security.AWSCredentials(accessKey, secretKey);
            S3Service service = new RestS3Service(myCredentials);
    
            // Get the bucket with the given name, creating it if it does not exist yet
            S3Bucket zeBucket = service.getOrCreateBucket(bucketName);
            log.info("the bucket [" + zeBucket.getName() + "] was created (if it did not already exist)");
        } catch (S3ServiceException e) {
            log.error("Failed to get or create bucket[" + bucketName + "] due to exception:", e);
            throw e;
        }
    }
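
The part that reads the file structure could look something like the sketch below: it walks the local userimg root recursively, derives each object's key from the file's path relative to that root, and hands each file to the uploadFileToS3 method above through a fixed thread pool, which provides the parallelism from step 2. This is a minimal sketch, not part of the original answer: the pool size, the wait bound, and the method names (uploadDirectoryToS3, submitUploads) are illustrative assumptions.

    import java.io.File;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    /**
     * Recursively walks rootDir and schedules one upload task per file on a
     * fixed-size thread pool. Each object's S3 key mirrors the file's path
     * relative to rootDir, so the folder layout under userimg is preserved.
     */
    public void uploadDirectoryToS3(final String accessKey,
            final String secretKey,
            final File rootDir,
            final String bucketName) throws InterruptedException {

        ExecutorService pool = Executors.newFixedThreadPool(10); // pool size is a guess; tune it
        submitUploads(pool, rootDir, rootDir, accessKey, secretKey, bucketName);
        pool.shutdown();
        pool.awaitTermination(3, TimeUnit.DAYS); // generous upper bound for ~400k files
    }

    private void submitUploads(final ExecutorService pool, final File rootDir,
            File current, final String accessKey, final String secretKey,
            final String bucketName) {
        File[] children = current.listFiles();
        if (children == null) {
            return; // not a directory, or unreadable
        }
        for (final File f : children) {
            if (f.isDirectory()) {
                submitUploads(pool, rootDir, f, accessKey, secretKey, bucketName);
            } else {
                pool.submit(new Runnable() {
                    public void run() {
                        try {
                            // key = path relative to the root, e.g. "2011/03/pic.jpg"
                            // (assumes a Unix file separator, which matches S3's "/")
                            String key = f.getAbsolutePath()
                                    .substring(rootDir.getAbsolutePath().length() + 1);
                            uploadFileToS3(accessKey, secretKey,
                                    rootDir.getAbsolutePath(), key, bucketName);
                        } catch (Exception e) {
                            log.error("Upload failed for [" + f + "]", e);
                        }
                    }
                });
            }
        }
    }

One thing worth hoisting out of the per-file path in a real run: uploadFileToS3 above creates a new AmazonS3Client and re-checks the bucket on every call, which is wasteful across 400k files.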
    

Answer 1 (score: 1)

Sounds like a job for rsync. I've never used it in combination with S3, but S3Sync seems to be what you need.

Answer 2 (score: 1)

If you don't want to actually upload all the files (or manage the upload yourself), you could use AWS Import/Export, which basically amounts to shipping Amazon a hard drive.

Answer 3 (score: 1)

You could use superflexiblefilesychronizer. It is a commercial product, but the Linux version is free.

It can compare and sync folders, and it can transfer multiple files in parallel. It's fast. The interface is perhaps not the simplest, but that's mainly because it has a million configuration options.

Note: I'm not affiliated with this product in any way, but I have used it.

Answer 4 (score: 1)

Consider Amazon S3 Bucket Explorer:

  1. It lets you upload files in parallel, which should speed up the process.
  2. The program has a job queue, so if one of the uploads fails it will automatically retry it.