AWS S3 Java SDK: RequestClientOptions.setReadLimit

If we consider this S3 upload code:

import com.amazonaws.services.s3.model.PutObjectRequest
import com.amazonaws.services.s3.transfer.TransferManager

val tm: TransferManager = ???
val putRequest = new PutObjectRequest(bucketName, keyName, inputStream, metaData)
putRequest.setStorageClass(storageClass)
putRequest.getRequestClientOptions.setReadLimit(100000)
tm.upload(putRequest)

What is the use of the setReadLimit method? The AWS SDK Javadoc contains the following description:

Sets the optional mark-and-reset read limit used for signing and retry purposes. See Also: InputStream.mark(int)

Is my assumption correct that it provides some kind of "checkpointing", so that if the network fails in the middle of an upload, the API will (internally) retry from the last "marked" position instead of from the beginning of the file?

1 Solution

#1

The TransferManager does have support for "checkpointing" as you describe, although it's not directly related to the readLimit parameter. S3 allows you to upload large objects in multiple parts, and the TransferManager automatically takes care of doing this for you for uploads over a certain size. If the upload of a single part fails, the underlying AmazonS3Client only needs to retry the upload of that individual part. If you pass the TransferManager a File instead of an InputStream, it can even upload multiple parts of the file in parallel to speed up the transfer.
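
As a minimal sketch of that File-based path (assuming the 1.x aws-java-sdk-s3 dependency; the bucket name, key, and file path below are placeholders):

import java.io.File
import com.amazonaws.services.s3.transfer.TransferManagerBuilder

val tm = TransferManagerBuilder.standard().build()
// With a File source, TransferManager can retry or parallelize individual
// parts because it can seek back to any offset in the file.
val upload = tm.upload("my-bucket", "my-key", new File("/path/to/large-object.bin"))
upload.waitForCompletion() // blocks until every part has finished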

The readLimit parameter is used when you pass the TransferManager (or the underlying AmazonS3Client) an InputStream instead of a File. Unlike a File, which you can easily seek around in if you need to retry part of an upload, the InputStream interface is much more restrictive. In order to support retries on InputStream uploads, the AmazonS3Client uses the mark and reset methods of the InputStream interface, marking the stream at the beginning of each upload and resetting to the mark if it needs to retry.
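
That mark/reset dance can be sketched with a plain BufferedInputStream (an illustration of the InputStream contract the client relies on, not the SDK's actual internals; the file path is a placeholder):

import java.io.{BufferedInputStream, FileInputStream}

val in = new BufferedInputStream(new FileInputStream("/path/to/payload.bin"))
in.mark(128 * 1024)      // remember up to 128 KB from this point
val buf = new Array[Byte](8192)
in.read(buf)             // ... bytes are sent to S3, then the request fails ...
in.reset()               // rewind to the mark and replay from the start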

Notice that the mark method takes a readlimit parameter, and is only obligated to "remember" as many bytes from the InputStream as you ask it to in advance. Some InputStreams implement mark by allocating a new byte[readlimit] to buffer the underlying data in memory so it can be replayed if reset is called, which makes it dangerous to blindly mark using the length of the object to be uploaded (which might be several gigabytes). Instead, the AmazonS3Client defaults to calling mark with a value of 128 KB; if your InputStream cares about the readlimit, this means the AmazonS3Client won't be able to retry requests that fail after it has sent more than the first 128 KB.
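
You can observe this limit with a plain BufferedInputStream, which drops its mark once you read past the readlimit you promised (a self-contained illustration, not SDK code):

import java.io.{BufferedInputStream, ByteArrayInputStream, IOException}

val data = new Array[Byte](1024)
val in = new BufferedInputStream(new ByteArrayInputStream(data), 16)

in.mark(16)                   // promise to remember at most 16 bytes
in.read(new Array[Byte](64))  // read well past that limit
try in.reset()                // the mark has been invalidated
catch { case e: IOException => println(s"reset failed: ${e.getMessage}") }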

If you're using such an InputStream and would like to dedicate more memory to buffering the uploaded data so the AmazonS3Client can retry on failures further along in the upload (or conversely if you'd like to use a smaller buffer and potentially see more failures), you can tune the value that gets used via setReadLimit.
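
For example, if you know the stream carries at most a few megabytes and can afford to buffer all of it, you could size the limit to the expected stream length plus one byte, so that even a failure on the very last byte is still retryable (the 5 MB bound here is an assumption for illustration):

val expectedStreamSize = 5 * 1024 * 1024 // assumed upper bound on the stream
// +1 so the buffered range strictly exceeds the data actually sent
putRequest.getRequestClientOptions.setReadLimit(expectedStreamSize + 1)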

