Hi –
Quick question: is a tus server implementation allowed to impose any restrictions on the Upload-Offset values at which PATCH requests can start, or not?
Reason for the question: I’m thinking about implementing the tus protocol for a secure file storage server. However, in such a server, one might want to store things in blocks/chunks with fixed boundaries, particularly if one is storing an encrypted format where a given byte of the encrypted chunk depends on plaintext bytes arbitrarily far back in the same chunk. In such cases, you can’t easily begin writing to the middle of a chunk. (You can, however, easily finish writing in the middle of a chunk, given a streaming cipher or some way of padding out the chunk.)
This part of the spec makes me think that perhaps tus isn’t designed for such storage scenarios:
The Client SHOULD send all the remaining bytes of an upload in a single PATCH request, but MAY also use multiple small requests successively for scenarios where this is desirable. One example for these situations is when the Checksum extension is used.
The Server MUST acknowledge successful PATCH requests with the 204 No Content status. It MUST include the Upload-Offset header containing the new offset. The new offset MUST be the sum of the offset before the PATCH request and the number of bytes received and processed or stored during the current PATCH request.
For concreteness, let’s suppose the server has storage broken into 64MB chunks. Suppose the client starts a PATCH with an Upload-Offset of 0, and successfully sends 67MB of data to the server in that PATCH. If I’m reading the spec correctly, the server is required to respond with an Upload-Offset of 67MB, even if the closest offset at which it could accept a subsequent PATCH is actually 64MB. Question: is the client then allowed to assume the next PATCH can start at the Upload-Offset returned by the previous PATCH (which I’m guessing is the intent), which in this case would be 67MB? If so, this would seem to rule out server implementations where all writes must start at specific boundaries.
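To spell out the arithmetic in that example, here is a minimal Python sketch. The helper names, and reading "MB" as MiB, are my own assumptions for illustration; nothing here comes from the tus spec beyond the "offset before plus bytes received" rule quoted above.

```python
MIB = 1024 * 1024          # assumption: "MB" in the example means MiB
CHUNK_SIZE = 64 * MIB      # hypothetical fixed storage chunk size


def new_upload_offset(offset_before: int, bytes_received: int) -> int:
    """Offset the server must report: previous offset plus bytes received."""
    return offset_before + bytes_received


def last_chunk_boundary(offset: int, chunk_size: int = CHUNK_SIZE) -> int:
    """Largest multiple of chunk_size that is <= offset."""
    return (offset // chunk_size) * chunk_size


reported = new_upload_offset(0, 67 * MIB)   # what the 204 response must carry
resumable = last_chunk_boundary(reported)   # where a boundary-only store could resume

# Values in MiB: reported offset, nearest usable boundary, bytes stranded mid-chunk
print(reported // MIB, resumable // MIB, (reported - resumable) // MIB)  # 67 64 3
```

So the server has to report 67MB, while a boundary-only store could only cleanly resume at 64MB, leaving 3MB sitting in a partially written chunk.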
I might have thought “Well, if that’s not allowed, then I guess I could just store things temporarily in arbitrary-sized chunks corresponding to the arbitrarily-sized PATCH requests sent by the client, and then when we’re ‘done’ with the upload, we could decrypt and re-encrypt everything into some other, less arbitrary format.” But that requires a lot of unnecessary overhead, and more importantly I don’t see any way for the server to ever know for sure that the client is actually ‘done’ and won’t send any more PATCH requests.
I might also have thought “OK, fine! Every time a PATCH comes in, I’ll decrypt the last existing partial chunk and re-encrypt its contents, but instead of closing the encryption stream I’ll append the start of the new PATCH’s contents to it.” However, I haven’t yet noticed in the spec any minimum size for the PATCH requests a client is allowed to break an upload into, and if the storage chunks are significantly larger than the incoming PATCH requests, then you wind up with truly massive overhead: the total work per chunk grows as O((chunk size / patch size)^2) PATCH-sized writes, which increases runtime by a factor of O(chunk size / patch size).
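Here is a rough sketch of that cost, under a simplified model of my own: each PATCH forces the server to re-encrypt everything already stored in the current chunk plus the new bytes, and only encrypted bytes are counted (the matching decryption pass would add a similar amount again). The function names are hypothetical. Counted this way, the total work per chunk grows with the square of the number of PATCHes per chunk, which relative to writing the chunk once is a multiplier of roughly half that number.

```python
def reencrypt_cost(chunk_size: int, patch_size: int) -> int:
    """Total bytes encrypted to fill one chunk when every PATCH forces
    the partial chunk to be re-encrypted from its start."""
    if chunk_size <= 0 or patch_size <= 0:
        raise ValueError("sizes must be positive")
    total = 0
    filled = 0
    while filled < chunk_size:
        filled = min(filled + patch_size, chunk_size)
        total += filled  # old contents of the chunk plus the newly arrived bytes
    return total


def overhead_factor(chunk_size: int, patch_size: int) -> float:
    """Bytes encrypted relative to encrypting the chunk exactly once."""
    return reencrypt_cost(chunk_size, patch_size) / chunk_size


KIB = 1024
MIB = 1024 * KIB
chunk = 64 * MIB  # same hypothetical chunk size as in the example above

for patch in (64 * MIB, 1 * MIB, 64 * KIB):
    print(f"patch = {patch // KIB} KiB, overhead = {overhead_factor(chunk, patch)}x")
# patch = 65536 KiB, overhead = 1.0x
# patch = 1024 KiB, overhead = 32.5x
# patch = 64 KiB, overhead = 512.5x
```

With 64 KiB PATCHes against a 64MB chunk, that is over 500 times the encryption work of a single pass, before counting the decryption side.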
Am I missing something and/or confused, or is it perhaps a bad idea to try using tus in situations where appends starting at arbitrary offsets are nontrivial?
Thanks,
– Scott