I have an Uppy/tus Node.js application with S3 integration.
Every time a user starts a new upload, a request is sent to the server, which responds with a unique file ID in the form of an uploadURL. These upload URLs are then stored by the FE and sent to a metadata DB.
We have the concept of datasets in this metadata DB, so we’d like users to be able to upload the same file, should they wish, to different datasets.
The issue is that while I can create random unique IDs for every uploaded file (using a namingFunction), there is some low-level communication going on between Uppy and tus which means my endpoint isn’t being hit. I haven’t figured out exactly how it knows; I’m assuming it builds a SHA based on some metadata.
I presume this ties in with the resumable uploads feature as well, in that forcing a new upload every time will render the resumable functionality pointless.
Is there any way around this?
Thanks
This is an example of my working server:
const crypto = require('node:crypto')
const { Server } = require('@tus/server')
const { S3Store } = require('@tus/s3-store')

const datafileS3Store = new S3Store({
  s3ClientConfig: {
    bucket: "test",
    region: 'eu-west-2',
    endpoint: process.env.S3_ENDPOINT,
    credentials: {
      accessKeyId: "test",
      secretAccessKey: "test"
    }
  }
})

/**
 * init the server and set the callbacks; these are blocking calls, EVENTS are not
 */
const tusDatafileServer = new Server({
  respectForwardedHeaders: true,
  path: '/upload',
  datastore: datafileS3Store,
  namingFunction: () => {
    // Generate a random 8-character hexadecimal ID (4 random bytes)
    return crypto.randomBytes(4).toString('hex')
  },
  // callback provided by tus, gets called for each new upload
  async onUploadCreate (request, reply, upload) {
    logger.info(process.env.S3_ENDPOINT)
    try {
      // do some logic
      logger.info(` uploading : id:${upload.id}, size:${upload.size}`)
    } catch (err) {
      logger.error(` Error in onUploadCreate: ${err}`)
      throw (err)
    }
    logger.info(` reply code: ${reply.statusCode}`)
    return reply
  }
})
I’m not sure I understand what you want. There is a PR open to support duplicate tus uploads with multiple Uppy instances on the client. However, it’s probably preferable to handle duplicate uploads on the server.
What I’m assuming you want is one of the following:

1. A user starts a new upload and wants it duplicated.
   Send along extra tus metadata to indicate this and handle it in onUploadFinish with custom logic that duplicates the upload elsewhere in S3 using the S3 SDK (see the sketch below).
2. A user already uploaded a file, comes back later, and wants it duplicated.
   Probably best to call your own backend to handle this.

Or, when you think about it: since a user wants to upload to different datasets, you could just add the dataset to the metadata and use the request to determine where to put it. Then you can reuse the exact same flow on the FE, assuming you’re okay with letting them do two uploads.
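For the first case, here is a rough sketch of what that onUploadFinish handler could look like. It assumes a dataset field in the tus metadata (made up for this example) and the AWS SDK v3; the bucket name and key layout are just placeholders:

const { S3Client, CopyObjectCommand } = require('@aws-sdk/client-s3')

// plain S3 client for the post-upload copy, mirroring the store's config
const s3 = new S3Client({
  region: 'eu-west-2',
  endpoint: process.env.S3_ENDPOINT,
  credentials: { accessKeyId: 'test', secretAccessKey: 'test' }
})

const tusDatafileServer = new Server({
  path: '/upload',
  datastore: datafileS3Store,
  // runs once the whole file has been received
  async onUploadFinish (request, reply, upload) {
    const dataset = upload.metadata?.dataset // hypothetical metadata field sent by the client
    if (dataset) {
      // duplicate the finished object under a per-dataset key using the S3 SDK
      await s3.send(new CopyObjectCommand({
        Bucket: 'test',
        CopySource: `test/${upload.id}`, // bucket/key the tus S3 store wrote to
        Key: `${dataset}/${upload.id}`   // placeholder per-dataset destination
      }))
    }
    return reply
  }
})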
This is still an issue and not very well handled, I believe. I could not figure out a way to handle re-uploads gracefully in the client. The server will simply ignore a duplicate and return 200 OK (without actually hitting any of the server-side events), as if the file had just been uploaded.
I’m not familiar at all with the protocol itself, but I’m pretty sure there should be a way to distinguish a resuming request from a completely new one, without having to do some juggling in our server and/or client.
Any updates on this?
This is currently blocking my implementation because if a user re-uploads a file, my onUploadFinish handler won’t execute and I can’t return the right URL to the client. Is there some logic one could implement to handle this? Maybe in the onIncomingRequest event, which seems to be the only one firing for a duplicate request.
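Something along these lines, perhaps? This is only a rough sketch: markDuplicateUpload is a made-up helper that would look the ID up in my own metadata DB, and I’m assuming onIncomingRequest receives the request, the reply and the upload ID like the other hooks do.

const tusDatafileServer = new Server({
  path: '/upload',
  datastore: datafileS3Store,
  // fires for every tus request, including the HEAD check a "resume" starts with,
  // which never reaches onUploadCreate/onUploadFinish when the upload is already complete
  async onIncomingRequest (request, reply, uploadId) {
    if (request.method === 'HEAD' && uploadId) {
      // made-up helper: if the metadata DB already marks this id as complete,
      // record whatever the FE needs so it can fetch the right URL afterwards
      await markDuplicateUpload(uploadId)
    }
  }
})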
You can configure the tus client to not resume completed uploads. Then your user can resume interrupted uploads, but won’t be prevented from uploading duplicate files. The configuration depends on the client, but in tus-js-client you can use removeFingerprintOnSuccess.
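For example, with tus-js-client directly (a minimal sketch, assuming a browser File object named file and the /upload endpoint from above):

import * as tus from 'tus-js-client'

const upload = new tus.Upload(file, {
  endpoint: '/upload',
  metadata: { filename: file.name, filetype: file.type },
  // forget the fingerprint once an upload finishes, so re-adding the same file
  // starts a brand-new upload instead of "resuming" the completed one
  removeFingerprintOnSuccess: true,
  onSuccess: () => console.log('Uploaded to', upload.url)
})

// interrupted (incomplete) uploads can still be resumed as usual
upload.findPreviousUploads().then((previousUploads) => {
  if (previousUploads.length > 0) {
    upload.resumeFromPreviousUpload(previousUploads[0])
  }
  upload.start()
})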
We call it resuming when the client checks on the upload’s current progress and then transfers any remaining data if necessary. If the upload is already complete, no data is transferred, but we still call the overall process resumption. If you want to give it another name, feel free to do so.
The upload-success event is always triggered when the client notices that the upload is done, either after finishing a new upload or after resuming an existing one.
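So on the client you can treat both cases the same way, for example with Uppy (a minimal sketch; the /upload endpoint is just a placeholder):

import Uppy from '@uppy/core'
import Tus from '@uppy/tus'

const uppy = new Uppy()
uppy.use(Tus, { endpoint: '/upload' })

// fires for brand-new uploads and for "resumed" uploads that were already complete
uppy.on('upload-success', (file, response) => {
  // response.uploadURL is the tus upload URL you can send to your metadata DB
  console.log(`${file.name} is available at ${response.uploadURL}`)
})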
I understand; it just makes it hard to have consistent usage from the client’s perspective. There should be a single standard way to deal with new uploads and duplicated uploads. All I need on the client is some metadata for the file (i.e. a URL); whether it has been previously uploaded or it’s a new upload shouldn’t make a difference.
At the moment, I’m simply allowing duplicates because I could not find a way to deal with this properly. But I’d much rather not allow people to upload the same file over and over again. It’s not cost-effective.