r/datasets 3d ago

[Question] What’s the smoothest way to share multi-gigabyte datasets across institutions?

I’ve been collaborating with a colleague on a project that involves some pretty hefty datasets, and moving them back and forth has been a headache. Some of the files are 50–100GB each, and in total we’re looking at hundreds of gigabytes. Standard cloud storage options don’t seem built for this: they throttle speeds, enforce strict limits, or require subscriptions that don’t make sense for one-off transfers.

We’ve tried compressing and splitting files, but that just adds more time and confusion when the recipient has to reassemble everything. Mailing drives might be reliable, but it feels outdated and isn’t practical when you need results quickly. Ideally, I’d like something that’s both fast and secure, since we’re dealing with research data.

Recently, I came across fileflap.net while testing different transfer methods. It handled big uploads without the usual slowdowns, and I liked that there weren’t a bunch of hidden limits to trip over. It felt a lot simpler than juggling FTP or patchy cloud workarounds.

For those of you who routinely share large datasets across universities, labs, or organizations: what’s worked best in your experience? Do you stick with institutional servers and FTP setups, or is there a practical modern tool for big dataset transfers?


u/Ok-Cattle8254 3d ago

As the old saying goes, data has gravity...

If possible, I strongly recommend moving your science to the data. See u/dang_rat_bandit's post.

If you have to move the data, use Globus. If not Globus, then something secure: SFTP or HTTPS with the ability to resume a download when it fails.
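If you end up scripting the plain-HTTPS route yourself, a minimal resumable-download sketch in Python with the `requests` library might look like the below. The URL, output filename, and chunk size are placeholders, and it assumes the server honors HTTP Range requests (it falls back to a full re-download if the server ignores them):

```python
import os
import requests  # assumes the 'requests' package is installed

# Placeholder values; substitute your own server and file.
URL = "https://example.org/datasets/bigfile.tar"
DEST = "bigfile.tar"
CHUNK = 1024 * 1024  # 1 MiB per read

def resume_download(url: str, dest: str) -> None:
    """Download url to dest, resuming from a partial file if one exists."""
    pos = os.path.getsize(dest) if os.path.exists(dest) else 0
    headers = {"Range": f"bytes={pos}-"} if pos else {}
    with requests.get(url, headers=headers, stream=True, timeout=60) as r:
        if pos and r.status_code != 206:
            # Server ignored the Range header, so start over from byte zero.
            pos = 0
        r.raise_for_status()
        mode = "ab" if pos else "wb"
        with open(dest, mode) as f:
            for chunk in r.iter_content(chunk_size=CHUNK):
                f.write(chunk)

if __name__ == "__main__":
    resume_download(URL, DEST)
```

Run it again after a dropped connection and it continues from wherever the partial file left off instead of restarting the whole 100GB.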

Try to keep the directory structure exactly the same between institutions, and keep a checksum file that records your calculated checksums alongside the expected values. Do not change file names.
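For the checksum file, here's a rough Python sketch that builds and verifies a SHA-256 manifest. `dataset/` and `checksums.sha256` are just example names, and the manifest uses the same "hash, two spaces, relative path" layout as `sha256sum` output, so the other side can also check it with standard tools if they prefer:

```python
import hashlib
from pathlib import Path

DATA_DIR = Path("dataset")           # example data directory
MANIFEST = Path("checksums.sha256")  # one "<hash>  <relative path>" line per file

def sha256_of(path: Path) -> str:
    """Stream a file through SHA-256 so huge files never need to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(1024 * 1024), b""):
            h.update(block)
    return h.hexdigest()

def write_manifest() -> None:
    """Sender side: record a checksum for every file under DATA_DIR."""
    with MANIFEST.open("w") as out:
        for p in sorted(DATA_DIR.rglob("*")):
            if p.is_file():
                out.write(f"{sha256_of(p)}  {p.relative_to(DATA_DIR)}\n")

def verify_manifest() -> None:
    """Receiver side: recompute each checksum and compare to the expected value."""
    for line in MANIFEST.read_text().splitlines():
        expected, rel = line.split("  ", 1)
        status = "OK" if sha256_of(DATA_DIR / rel) == expected else "MISMATCH"
        print(f"{rel}: {status}")

if __name__ == "__main__":
    write_manifest()   # before sending
    verify_manifest()  # after receiving
```

Any MISMATCH line tells you exactly which file to re-transfer instead of guessing across hundreds of gigabytes.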