Building Faster Rsync From Scratch In Go

For a quick file transfer between two computers, SCP is a fine program to use. For more complex, large, or regular backups, however, the go-to tool is rsync. It’s faster, more efficient, and usable in a wider range of circumstances. For all its perks, [Michael Stapelberg] felt that it had one major weakness: it is a tool written in C. [Michael] is philosophically opposed to programs written in C, so he set out to implement rsync from scratch in Go instead.

[Michael]’s path to deciding to tackle this project is a complicated one. His ISP upgraded his internet connection to 25 Gbit/s recently, which means that his custom router was the bottleneck in his network. To solve that problem he migrated his router to a PC with several 25 Gbit/s network cards. To take full advantage of the speed now theoretically available, he began using a tool called gokrazy, which turns applications written in Go into their own appliance. That means that instead of installing a full Linux distribution to handle specific tasks (like a router, for example), the only thing loaded on the computer is essentially the Linux kernel, the Go compiler and libraries, and then the Go application itself.

With a new router with hardware capable of supporting these fast speeds and only running software written in Go, the last step was finally to build rsync to support his tasks on his network. This meant that rsync itself needed to be built from scratch in Go. Once [Michael] completed this final task, he found that his implementation of rsync is actually much faster than the version built in C, thanks to the modernization found in the Go language and the fact that his router isn’t running all of the cruft associated with a standard Linux distribution.

For a software project of this scope, we find [Michael]’s step-by-step process worth taking note of for any problem any of us attempt to tackle. Not only that, refactoring a foundational tool like rsync is an involved task on its own, let alone its creation simply to increase network speeds beyond what most of us would already consider blazingly fast. We’re leaving out a ton of details on this build so we definitely recommend checking out his talk in the video below.

Thanks to [sarinkhan] for the tip!

Continue reading “Building Faster Rsync From Scratch In Go”

Optimized Super Mario 64 Offers Exciting Possibilities

When working on any software project, the developers have to balance releasing on time with optimizations. As long as you are hitting your desired time constraints, why not just ship it earlier? It’s no secret that Super Mario 64, a hotly anticipated launch title for the Nintendo 64 console in 1996, had a lot of optimizations left on the table in order to get it out the door on time. In that spirit, [Kaze Emanuar] has been plumbing the depths of the code, refactoring and tweaking until he had a version with┬áserious performance gains.

Why would anyone spend time improving the code for an old game that only runs on hardware released over two decades ago? There exists a healthy modding community for the game, and many of the newer levels that people are creating are more ambitious than what the original game could handle. But with the performance improvements that [Kaze] has been working on, your budget for larger and more complex levels suddenly becomes much more significant. In addition, it’s rumored that a multi-player mode was originally planned for the game, but Nintendo had to scrap the feature when it was found that the frame rate while rendering two cameras wasn’t up to snuff. With these optimizations, the game can now handle two players easily.

Luigi has been waiting 26 years for his chance to shine.

[Kaze] has a multi-step plan for improving the performance involving RAM alignment, compiler optimizations, rendering improvements, physics optimizations, and generally reducing “jankiness.” To be fair to the developers at Nintendo, back then they were working with brand new hardware and pushing the boundaries of what home consoles were capable of. Modeling software, toolchains, compilers, and other supporting infrastructure have vastly improved over the last 20+ years. Along the way, we’ve picked up many tricks around rendering that just weren’t as common back then.

The central theme of [Kaze]’s work is optimizing Rambus usage. As the RCP and the CPU have to share it, the goal is to have as little contention as possible. This means laying out items to improve cachability and asking the compiler to generate smaller code rather than faster code (no loop unrolling here). In addition, certain data structures can be put into particular regions of memory that are write-only or read-only to improve resource contention. Logic bugs are fixed and rendering techniques were improved. The initial results are quite impressive, and while he isn’t done, we’re very much looking forward to playing with the final product.

With the Nintendo 64 on its way to becoming a mainline-supported Linux platform, the old console is certainly seeing a lot of love these days.

Continue reading “Optimized Super Mario 64 Offers Exciting Possibilities”