Some general questions about AWS-S3-Multipart and React

#1

Hi all, I’ve read through most of the documentation and have had trouble getting the React components to render at all. In fact, after a couple of full days, I’m still unable to get even the simplest example code working in my app. I’m seeing the very same errors described here: https://github.com/transloadit/uppy/issues/1506

I’m now worried about using Uppy, because my use case is not a simple one and I may not be able to get things to work. I also have general questions about what exactly I need to implement.

So if I can describe what I’m trying to do and then ask some questions, I’m hoping I can get some clear guidance on how to go about doing it (what Uppy plugins I need to use, etc). Thanks in advance…

My situation is that I have a React app using Redux, and the server API layer and back end are .NET Core with a Postgres DB. We already have an upload form for CSV files, and it works fine. I post the form to the REST API, and the .NET Core server processes the POST and writes the contents of the CSV upload to Postgres. The entire upload is held in memory and discarded when processing is complete; it is never actually written to disk. We can process 50MB CSV files this way, no problem.

However, we now want to scale up the app to process 500MB CSV files, so I need to build a new ingestion pipeline. I would like to use an open-source, S3-compatible storage container and S3 multipart uploads to store the 500MB files. I also plan to build a new back-end .NET IngestionService.exe that runs as a separate process from the web server. The new .exe would watch the S3 upload bucket and process new files once they have finished uploading. It would be responsible for CSV validation and writing the data to Postgres.

So I’ve added a MinIO open source, S3-compliant server through Docker, and it’s running fine in my dev environment. My thought was that I could use Uppy’s AwsS3Multipart plugin to handle the chunked uploading directly to MinIO. The expectation is that the upload could be cancellable and resumable, so I figure I’d want to use the ProgressBar plugin as well.

What confuses me is what I need to do to get S3 multipart working. The couple of examples I could find used Companion, and I’m unsure what Companion’s role is exactly. Do I absolutely need it? If it makes things easier, then of course I’ll use it, but I don’t understand why it’s necessary or how to set it up. (For example, what are the env variables CompanionAccessKey, etc? Is that a separate S3 instance? Or the same S3 credentials as the existing MinIO instance I already created?)

Companion seems to allow transfer of third-party sources (Dropbox files, etc). So the S3 option of Companion seems to allow transfer of files from a third-party S3 bucket – which is NOT what I’m trying to do. Is this a correct assumption? I’m not trying to hook up an S3 bucket to another S3 bucket.

I’ve read in a few places that GoldenRetriever is necessary for management of file-chunking. Is this true if I’m using the AwsS3Multipart plugin? (Also, I saw that GoldenRetriever currently isn’t able to allow for resumable uploads for S3 files of a certain size. Is this true? Makes it seem like I can’t do resumable S3 multipart of a 500MB file, period.)

Also, it appears that I need a C# script to sign the file before writing to the bucket? This is where my understanding becomes really hazy. Does the signing HAVE to come from the server? Is there any sample code in .NET of S3-signing? Can I just grab S3-signing code from elsewhere?

Signing seems necessary eventually, but the app doesn’t have to implement HTTPS at first. That’s something I can add at a later time.

I’m not even worrying about React/Redux at this moment. I’d be happy to get a proof of concept running using just plain HTML and JavaScript, and worry about integrating the POC into the Redux app later. Again, many thanks.

#2

Companion is kind of a grab-bag of different functionality: basically whatever Uppy may need in terms of server components. The three main functions at the moment are:

  • Importing files from third parties to upload them to a Tus or HTTP endpoint
  • Acting as a signing server for simple uploads to S3
  • Acting as a “proxy” server for API calls, for multipart uploads to S3

The reason we have a signing/API server is to avoid users having to put their S3 credentials into their client-side JS, which can be very unsafe, especially if the credentials also have access to other buckets or worse. Basically, every S3 API call goes through Companion, so secrets stay on the server where people can’t reach them.

You can use Companion and only provide S3 credentials; then all of the importing features will not work, but that’s fine if you don’t need them. You could also opt to implement the multipart API calls yourself, which should not be difficult using an AWS SDK for C# (I’m guessing they have one!). Then you can override the relevant AwsS3Multipart options to use your server’s endpoints instead.
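As a rough illustration of overriding those options, here is a minimal sketch that builds the handler functions and points them at your own server. The endpoint paths (`/s3/multipart/...`) and the helper name `createS3MultipartHandlers` are hypothetical, made up for this example, and the sketch assumes the server answers every call with JSON. The option names follow the plugin's documented options as I understand them; they have changed between Uppy versions (recent releases use `signPart`, older ones `prepareUploadPart`), so check the docs for the version you have.

```javascript
// Sketch only: every URL below is a hypothetical endpoint on YOUR server
// (e.g. a .NET Core controller that calls S3/MinIO with the real credentials).
function createS3MultipartHandlers(apiBase, fetchImpl = fetch) {
  const enc = encodeURIComponent;

  // Small JSON helper; rejects on any non-2xx response.
  async function call(method, path, body) {
    const res = await fetchImpl(`${apiBase}${path}`, {
      method,
      headers: { 'Content-Type': 'application/json' },
      body: body === undefined ? undefined : JSON.stringify(body),
    });
    if (!res.ok) {
      throw new Error(`${method} ${path} failed with status ${res.status}`);
    }
    return res.status === 204 ? {} : res.json();
  }

  return {
    // Server starts the upload; expected to respond with { uploadId, key }.
    createMultipartUpload: (file) =>
      call('POST', '/s3/multipart', { filename: file.name, type: file.type }),

    // Server lists parts S3 already has; expected to respond with an array.
    listParts: (file, { uploadId, key }) =>
      call('GET', `/s3/multipart/${enc(uploadId)}?key=${enc(key)}`),

    // Server signs one part; expected to respond with { url }.
    // Older plugin versions pass `number` instead of `partNumber`.
    signPart: (file, { uploadId, key, partNumber, number }) =>
      call('GET', `/s3/multipart/${enc(uploadId)}/${partNumber ?? number}?key=${enc(key)}`),

    // Server finalizes the upload from the list of uploaded parts.
    completeMultipartUpload: (file, { uploadId, key, parts }) =>
      call('POST', `/s3/multipart/${enc(uploadId)}/complete?key=${enc(key)}`, { parts }),

    // Server aborts the upload so S3 can discard the stored parts.
    abortMultipartUpload: (file, { uploadId, key }) =>
      call('DELETE', `/s3/multipart/${enc(uploadId)}?key=${enc(key)}`),
  };
}
```

The returned object would then be passed as the plugin options, along the lines of `uppy.use(AwsS3Multipart, createS3MultipartHandlers('/api'))`. The browser never sees the S3 credentials this way; it only receives per-part signed URLs from your server.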

The credentials should be ones that allow Companion to use the S3 API. I’m unsure which those are for MinIO, and couldn’t find out with a cursory search.

As for Golden Retriever, that is the plugin that handles saving state across page refreshes, so you need it for resumability if the user may, e.g., close the page and come back later. For temporary connection problems and the like, so long as the user doesn’t refresh the page, the AwsS3Multipart plugin works on its own. Golden Retriever indeed doesn’t support storing state for large files yet.
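To illustrate why an interrupted multipart upload can continue instead of starting over: S3 keeps the parts it has already received, so a client only needs to send the ones that are missing. The following is a minimal sketch of that bookkeeping, not Uppy's actual implementation. The function name `planRemainingParts` is hypothetical, and `uploadedParts` is assumed to have the shape of an S3 ListParts result (objects with a `PartNumber` field).

```javascript
const MIN_PART_SIZE = 5 * 1024 * 1024; // S3 minimum for every part except the last
const MAX_PARTS = 10000;               // S3 limit on parts per multipart upload

// Returns the byte ranges that still need uploading, skipping parts S3 already has.
function planRemainingParts(fileSize, partSize, uploadedParts = []) {
  if (partSize < MIN_PART_SIZE) {
    throw new RangeError('partSize is below the 5 MiB S3 minimum');
  }
  const totalParts = Math.max(1, Math.ceil(fileSize / partSize));
  if (totalParts > MAX_PARTS) {
    throw new RangeError('too many parts; increase partSize');
  }

  const done = new Set(uploadedParts.map((p) => p.PartNumber));
  const remaining = [];
  for (let partNumber = 1; partNumber <= totalParts; partNumber++) {
    if (done.has(partNumber)) continue;
    const start = (partNumber - 1) * partSize;
    const end = Math.min(start + partSize, fileSize); // exclusive, as in Blob.slice
    remaining.push({ partNumber, start, end });
  }
  return remaining;
}
```

For a 500 MiB file split into 5 MiB parts this yields 100 parts in total; if the connection dropped after the first 40 were stored, only parts 41 through 100 would be planned again.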

I hope that clears up at least some of the vagueness. I didn’t get too deep into the weeds because I don’t know C# or MinIO, but if you need more specifics, let me know!

#3

Hi, thanks for the helpful info. I was reading through the AWS Multipart documentation last night, and it’s a little overwhelming. What I’m trying to do for now is just get something running. I think using Tusd with an open source S3-compatible container would be fine with me. That way, I can still have multipart and resumable and cancellable features.

So I do have a local tus daemon running, and just a plain HTML page with a script reference to uppy.min.js. It seems to be MOSTLY working, but there is an issue I’m running into, which I’ll post here today.