Presigning URLs in batches for AWS S3 Multipart Upload

I’ve been testing out Uppy lately to see if it will fit our use case at Discourse. So far things are looking great, however there is a couple of things that concern me, and I wanted to post here to ask if these are the kind of things you would be open to contributions for. The main problem we have is with the S3 multipart uploader. The way it is now, each part needs to take a trip to our server to generate a presigned URL for that part:

This could potentially cause a huge number of requests to the server, depending on the size of the file and the chunk size. I see in the docs you have addressed this partially with the suggestion to increase chunk size for larger files:

The S3 Multipart plugin uploads files in chunks. Each chunk requires a signing request prepareUploadPart(). To reduce the amount of requests for large files, you can choose a larger chunk size, at the cost of having to re-upload more data if one chunk fails to upload.

My proposal is to instead generate the presigned URLs in batches, rather than one each time a chunk needs it. In the code above there is already the candidates which is controlled partly by the limit option. I think it would be better to generate the presigned URLs for a batch of candidates at once, something like this (pseudocode-ish):

this.options.prepareUploadParts.then((partsWithUrl) => {
  candidates.forEach((index) => {
    this._uploadPartRetryable(index, partsWithUrl[index]).then(() => {
      // Continue uploading parts
      this._uploadParts()
    }, (err) => {
      this._onError(err)
    })
  })
});

////

prepareUploadParts (parts, { key, uploadId }) {
  this.assertHost('prepareUploadParts')

  const filename = encodeURIComponent(key)
  return this.client.post(`s3/multipart/${uploadId}/prepare-parts`, { parts, key })
    .then(assertServerError)
}

////

_uploadPart (index, presignedUrl = null) {
  const body = this.chunks[index]
  this.chunkState[index].busy = true

  return Promise.resolve().then(() =>
    if (presignedUrl) {
      return Promise.resolve({ url: presignedUrl });
    } else {
      this.options.prepareUploadPart({
        key: this.key,
        uploadId: this.uploadId,
        body,
        number: index + 1,
      })
    }).then((result) => {
    const valid = typeof result === 'object' && result
      && typeof result.url === 'string'
    if (!valid) {
      throw new TypeError('AwsS3/Multipart: Got incorrect result from `prepareUploadPart()`, expected an object `{ url }`.')
    }

    return result
  }).then(({ url, headers }) => {
    if (this._aborted()) {
      this.chunkState[index].busy = false
      throw createAbortError()
    }

    return this._uploadPartBytes(index, url, headers)
  })
}

That way the number of round trips to Companion or the custom server that someone has set up is drastically reduced. For example, a 1GB file uploaded in 5mb chunks will normally require 200 trips to the server. With this change it would require only 50 (if the limit was 4).

I think this will necessitate longer expiry times for presigned URLs as well, which should be made known in the documentation. Would you be open to us opening a pull request to add this functionality? If so, we would add the corresponding functionality to Companion if needed.

If this is not the right channel for such a proposal, please let me know and I’ll propose it somewhere else. Thanks!

1 Like

Hi Martin, thanks for reaching out. We’re Discourse users ourselves (surprise :grinning_face_with_smiling_eyes:) and I’m excited about the prospect of possibly working together. I think your proposal makes sense and have asked the Uppy team to take a closer look and get back to you here.

1 Like

Thanks a lot! I another proposal to do with resuming multipart S3 uploads that I will flesh out at some point, but for now it’s great to hear that you are open to the contribution :slight_smile:

Hi @martinbrennan, I think this makes a lot of sense and we’d be happy to accept a PR for it if you are willing to work on it. Let me know if there are any questions.

1 Like

Hey @Merlijn I’ve opened up a PR for this functionality now. It was a fun learning experience, I haven’t used much pure JS in a while, and Jest is a great test runner: