I’m using tusd behind a Cloudflare proxy. Uploads seem to complete — the file size exactly matches the original file on the client. However, after upload:
- The resulting file has a different SHA-256 hash
- The uploaded video is corrupted
Here is an excerpt from the tusd logs during the problematic upload:
The TCP reset is not necessarily a problem. It just indicates that the upload got interrupted at some point but resumed properly afterwards (that's what resumable uploads are for). The presence of such errors in tusd's log usually doesn't indicate a problem, as the upload procedure will recover from them.
Regarding the mismatching checksums, I haven't experienced that myself. Is this a frequent problem for you? Is it somewhat reproducible?
You mentioned that you use shared storage. Is it possible that the data got corrupted there? If it's a shared network disk and sticky sessions don't work properly, storage accesses from the different instances could collide with each other.
The tus protocol has methods for exchanging checksums, so the client and/or server can verify the integrity of the uploaded data, but tusd does not currently implement them. We hope to improve support for this in the future.
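For reference, the protocol's checksum extension has the client send an Upload-Checksum header with each PATCH, containing an algorithm name and a base64-encoded digest of that request's body. Below is a minimal sketch of building such a header in Go, plus a small helper for comparing whole-file SHA-256 hashes out of band (the file paths and the choice of sha256 are illustrative, and tusd will not verify the header today):

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"fmt"
	"io"
	"os"
)

// uploadChecksumHeader builds the value of a tus "Upload-Checksum" header for
// one chunk: "<algorithm> <base64-encoded digest>". This only illustrates the
// wire format defined by the checksum extension.
func uploadChecksumHeader(chunk []byte) string {
	sum := sha256.Sum256(chunk)
	return "sha256 " + base64.StdEncoding.EncodeToString(sum[:])
}

// fileSHA256 computes the hex SHA-256 of a whole file, handy for comparing the
// client's original against the assembled file on the server out of band.
func fileSHA256(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()

	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	return fmt.Sprintf("%x", h.Sum(nil)), nil
}

func main() {
	// Illustrative chunk; in a real client this would be the body of one PATCH.
	fmt.Println("Upload-Checksum:", uploadChecksumHeader([]byte("example chunk")))

	// Compare this value against the hash of the assembled file on the server.
	if len(os.Args) > 1 {
		if hash, err := fileSHA256(os.Args[1]); err == nil {
			fmt.Println(hash, os.Args[1])
		}
	}
}
```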
I’m not entirely sure whether this is due to a sticky session issue or a shared backend conflict — we don’t currently log which tusd instance is handling each request, so I can’t confirm if requests are routed inconsistently.
However, I did grep through our logs and found repeated mismatched offset errors for the same upload ID:
The first "read tcp: connection reset by peer" error seems explainable (possibly a client disconnect), but the subsequent 409 "mismatched offset" errors look suspicious: they occurred several minutes apart during the same upload session.
Could this suggest a race condition or desync between multiple tusd instances accessing the same upload resource (file) — possibly caused by inconsistent routing or access to a shared disk?
Do you have any recommendation for logging or guarding against this kind of situation?
Yes, if requests are routed to different instances and the tusd instances are not synchronized via a distributed locking mechanism, such issues can appear. I recommend reading the Upload locks page of the tusd documentation, which explains this in detail.
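If you embed tusd as a Go library rather than running the binary, enabling locks comes down to registering a locker with the store composer. Here is a minimal sketch, assuming the v1 package layout (pkg/filestore, pkg/filelocker, pkg/handler); treat the exact import paths and constructors as assumptions and check them against the version you run:

```go
package main

import (
	"log"
	"net/http"

	"github.com/tus/tusd/pkg/filelocker"
	"github.com/tus/tusd/pkg/filestore"
	tusd "github.com/tus/tusd/pkg/handler"
)

func main() {
	// All instances must point at the same directory for file-based locks to
	// coordinate them. On some network filesystems file locks are unreliable;
	// in that case a locker backed by shared infrastructure is the safer choice.
	store := filestore.New("./uploads")
	locker := filelocker.New("./uploads")

	composer := tusd.NewStoreComposer()
	store.UseIn(composer)
	locker.UseIn(composer) // without a locker, concurrent PATCHes can interleave

	handler, err := tusd.NewHandler(tusd.Config{
		BasePath:      "/files/",
		StoreComposer: composer,
	})
	if err != nil {
		log.Fatalf("unable to create handler: %v", err)
	}

	http.Handle("/files/", http.StripPrefix("/files/", handler))
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```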
Of course, a 409 can also be triggered if the client misbehaves and does not resume correctly after an interruption (for example, by not fetching the new offset with a HEAD request first).
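For reference, a correct resume looks like this on the wire: first a HEAD request to learn the offset the server actually has, then a single PATCH that continues exactly from that offset, with only one PATCH in flight at a time. A minimal client-side sketch using plain net/http and the core tus 1.0 headers (the upload URL and file name are placeholders):

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"os"
	"strconv"
)

// resume continues an interrupted upload: it asks the server for its current
// offset (HEAD) and only then sends the remaining bytes (PATCH) starting at
// exactly that offset. Reusing a stale offset instead is what produces
// 409 "mismatched offset" responses.
func resume(uploadURL string, data []byte) error {
	head, err := http.NewRequest(http.MethodHead, uploadURL, nil)
	if err != nil {
		return err
	}
	head.Header.Set("Tus-Resumable", "1.0.0")

	resp, err := http.DefaultClient.Do(head)
	if err != nil {
		return err
	}
	resp.Body.Close()

	offset, err := strconv.ParseInt(resp.Header.Get("Upload-Offset"), 10, 64)
	if err != nil {
		return fmt.Errorf("invalid Upload-Offset header: %w", err)
	}
	if offset < 0 || offset > int64(len(data)) {
		return fmt.Errorf("server offset %d outside local data size %d", offset, len(data))
	}

	patch, err := http.NewRequest(http.MethodPatch, uploadURL, bytes.NewReader(data[offset:]))
	if err != nil {
		return err
	}
	patch.Header.Set("Tus-Resumable", "1.0.0")
	patch.Header.Set("Upload-Offset", strconv.FormatInt(offset, 10))
	patch.Header.Set("Content-Type", "application/offset+octet-stream")

	resp, err = http.DefaultClient.Do(patch)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusNoContent {
		return fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	return nil
}

func main() {
	data, err := os.ReadFile("video.mp4") // placeholder file
	if err != nil {
		os.Exit(1)
	}
	if err := resume("https://example.com/files/abc123", data); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```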
Yes, that’s what I’m trying to confirm before moving forward. I want to make sure sticky session misrouting isn’t the root cause.
I’m planning to remove the shared backend altogether — so that each tusd instance has its own storage directory. That should technically eliminate the need for distributed locking, correct?
If the sticky sessions work properly and reliably, then yes, there is no need for a distributed lock. However, when cookie-based stickiness is used, it might not work with clients that ignore cookies (especially non-browser clients). If requests don't get routed properly, you will see 404 errors when requests reach a tusd instance that doesn't have the corresponding upload on its local disk.
I found something interesting: two ChunkWriteStart events in a row without a ChunkWriteComplete in between. The client did a HEAD, saw the new offset, and started a new PATCH before the last chunk finished writing: a classic race condition.
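A rough way to scan a merged, time-ordered log for this pattern; it only assumes that each relevant line contains the upload ID and the literal event names ChunkWriteStart / ChunkWriteComplete, so adjust it to whatever your log format actually looks like:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// Reads a tusd log from stdin and reports overlapping chunk writes for one
// upload: a ChunkWriteStart that appears before the previous chunk's
// ChunkWriteComplete. Usage: go run scanlog.go <upload-id> < tusd.log
func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: scanlog <upload-id> < tusd.log")
		os.Exit(1)
	}
	uploadID := os.Args[1]

	scanner := bufio.NewScanner(os.Stdin)
	scanner.Buffer(make([]byte, 0, 1024*1024), 1024*1024) // allow long lines

	inFlight := false // a chunk write has started but not completed yet
	lineNo := 0
	for scanner.Scan() {
		lineNo++
		line := scanner.Text()
		if !strings.Contains(line, uploadID) {
			continue
		}
		switch {
		case strings.Contains(line, "ChunkWriteStart"):
			if inFlight {
				fmt.Printf("line %d: ChunkWriteStart while a previous chunk is still being written\n", lineNo)
			}
			inFlight = true
		case strings.Contains(line, "ChunkWriteComplete"):
			inFlight = false
		}
	}
	if err := scanner.Err(); err != nil {
		fmt.Fprintln(os.Stderr, "read error:", err)
		os.Exit(1)
	}
}
```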
Yes, I think in that case upload locks would definitely be advisable. Please let us know if this improves or solves your problem, so we can adjust the advice we give to people in similar situations.