I have an Uppy/tus Node.js application with S3 integration.
Every time a user starts a new upload, a request is sent to the server, which responds with a unique file ID in the form of an uploadURL. These upload URLs are then stored by the FE and sent to a metadata DB.
We have the concept of datasets in this metadata DB, so we’d like users to be able to upload the same file, should they wish, to different datasets.
The issue is that while I can create random unique IDs for every uploaded file (using a namingFunction), there is some low-level communication going on between Uppy and tus which means my endpoint isn’t being hit. I haven’t figured out exactly how it knows; I’m assuming it builds a SHA based on some metadata.
I presume this ties in with the resumable uploads feature as well, in that forcing a new upload every time will render the resumable functionality pointless.
Is there any way around this?
Thanks
This is an example of my working server:
const crypto = require('node:crypto')
const { Server } = require('@tus/server')
const { S3Store } = require('@tus/s3-store')

const datafileS3Store = new S3Store({
  s3ClientConfig: {
    bucket: "test",
    region: 'eu-west-2',
    endpoint: process.env.S3_ENDPOINT,
    credentials: {
      accessKeyId: "test",
      secretAccessKey: "test"
    }
  }
})

/**
 * init the server and set the callbacks; these are blocking calls, EVENTS are not
 */
const tusDatafileServer = new Server({
  respectForwardedHeaders: true,
  path: '/upload',
  datastore: datafileS3Store,
  namingFunction: () => {
    // Generate a random 8-character hexadecimal ID (4 random bytes)
    return crypto.randomBytes(4).toString('hex')
  },
  // callback provided by tus, gets called for each new upload
  async onUploadCreate (request, reply, upload) {
    logger.info(process.env.S3_ENDPOINT)
    try {
      // do some logic
      logger.info(` uploading : id:${upload.id}, size:${upload.size}`)
    } catch (err) {
      logger.error(` Error in onUploadCreate: ${err}`)
      throw (err)
    }
    logger.info(` reply code: ${reply.statusCode}`)
    return reply
  }
})
I’m not sure I understand what you want. There is a PR open to support duplicate tus uploads with multiple Uppy instances on the client. However, it’s probably preferable to handle duplicate uploads on the server.
What I’m assuming you want is one of the following:

1. A user starts a new upload and wants it duplicated.
   Send along extra tus metadata to indicate this and handle it in onUploadFinish with custom logic that duplicates the upload elsewhere in S3 using the S3 SDK (see the sketch below).
2. A user already uploaded a file, comes back later, and wants it duplicated.
   Probably best to call your own backend to handle this.

Or, when you think about it: since a user wants to upload to different datasets, you could just add the dataset to the metadata and use the request to determine where to put it. Then you can reuse the exact same flow on the FE, assuming you’re okay with letting them do two uploads.
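For the first case, here is a rough sketch of what that onUploadFinish handler could look like. It assumes a dataset field in the tus metadata (made up for this example) and the AWS SDK v3; the bucket name and key layout are just placeholders:

const { S3Client, CopyObjectCommand } = require('@aws-sdk/client-s3')

// plain S3 client for the post-upload copy, mirroring the store's config
const s3 = new S3Client({
  region: 'eu-west-2',
  endpoint: process.env.S3_ENDPOINT,
  credentials: { accessKeyId: 'test', secretAccessKey: 'test' }
})

const tusDatafileServer = new Server({
  path: '/upload',
  datastore: datafileS3Store,
  // runs once the whole file has been received
  async onUploadFinish (request, reply, upload) {
    const dataset = upload.metadata?.dataset // hypothetical metadata field sent by the client
    if (dataset) {
      // duplicate the finished object under a per-dataset key using the S3 SDK
      await s3.send(new CopyObjectCommand({
        Bucket: 'test',
        CopySource: `test/${upload.id}`, // bucket/key the tus S3 store wrote to
        Key: `${dataset}/${upload.id}`   // placeholder per-dataset destination
      }))
    }
    return reply
  }
})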
This is still an issue and not very well handled, I believe. I could not figure out a way to handle re-uploads gracefully in the client. The server will simply ignore a duplicate and return 200 OK (without actually hitting any of the server-side events), as if the file had just been uploaded.
I’m not familiar at all with the protocol itself, but I’m pretty sure there should be a way to distinguish a resuming request from a completely new one, without having to do some juggling in our server and/or client.
Any updates on this?
This is currently blocking my implementation because if a user re-uploads a file, my onUploadFinish handler won’t execute and I can’t return the right URL to the client. Is there some logic one could implement to handle this? Maybe in the onIncomingRequest event, which seems to be the only one firing for a duplicate request.
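Something along these lines, perhaps? This is only a rough sketch: markDuplicateUpload is a made-up helper that would look the ID up in my own metadata DB, and I’m assuming onIncomingRequest receives the request, the reply and the upload ID like the other hooks do.

const tusDatafileServer = new Server({
  path: '/upload',
  datastore: datafileS3Store,
  // fires for every tus request, including the HEAD check a "resume" starts with,
  // which never reaches onUploadCreate/onUploadFinish when the upload is already complete
  async onIncomingRequest (request, reply, uploadId) {
    if (request.method === 'HEAD' && uploadId) {
      // made-up helper: if the metadata DB already marks this id as complete,
      // record whatever the FE needs so it can fetch the right URL afterwards
      await markDuplicateUpload(uploadId)
    }
  }
})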
You can configure the tus client to not resume completed uploads. Then your user can resume interrupted uploads, but won’t be prevented from uploading duplicate files. The configuration depends on the client, but in tus-js-client you can use removeFingerprintOnSuccess.
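For example, with tus-js-client directly (a minimal sketch, assuming a browser File object named file and the /upload endpoint from above):

import * as tus from 'tus-js-client'

const upload = new tus.Upload(file, {
  endpoint: '/upload',
  metadata: { filename: file.name, filetype: file.type },
  // forget the fingerprint once an upload finishes, so re-adding the same file
  // starts a brand-new upload instead of "resuming" the completed one
  removeFingerprintOnSuccess: true,
  onSuccess: () => console.log('Uploaded to', upload.url)
})

// interrupted (incomplete) uploads can still be resumed as usual
upload.findPreviousUploads().then((previousUploads) => {
  if (previousUploads.length > 0) {
    upload.resumeFromPreviousUpload(previousUploads[0])
  }
  upload.start()
})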
We call it resuming when the client checks on the upload’s current progress and then transfers any remaining data if necessary. If the upload is already complete, no data is transferred, but we still call the overall process resumption. If you want to give it another name, feel free to do so.
The upload-success event is always triggered when the client notices that the upload is done, either after finishing a new upload or after resuming an existing one.
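So on the client you can treat both cases the same way, for example with Uppy (a minimal sketch; the /upload endpoint is just a placeholder):

import Uppy from '@uppy/core'
import Tus from '@uppy/tus'

const uppy = new Uppy()
uppy.use(Tus, { endpoint: '/upload' })

// fires for brand-new uploads and for "resumed" uploads that were already complete
uppy.on('upload-success', (file, response) => {
  // response.uploadURL is the tus upload URL you can send to your metadata DB
  console.log(`${file.name} is available at ${response.uploadURL}`)
})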
I understand; it just makes it hard to have consistent usage from the client’s perspective. There should be a single standard way to deal with new uploads and duplicated uploads. All I need on the client is some metadata for the file (i.e. a URL); whether it has been previously uploaded or it’s a new upload shouldn’t make a difference.
At the moment, I’m simply allowing duplicates because I could not find a way to deal with this properly. But I’d much rather not allow people to upload the same file over and over again. It’s not cost-effective.