Building A Better BitTorrent Client In Go

When it comes to peer-to-peer file sharing protocols, BitTorrent is probably one of the best known. It requires a client that implements the protocol and a tracker that lists the files available for transfer and helps find peers to exchange those files with. Developed in 2001, BitTorrent has since acquired more than a quarter billion users according to some estimates.

While most users choose to use existing clients, [Jesse Li] wanted to build one from scratch in Go, a programming language commonly used for its built-in concurrency features and simplicity compared to C.

Client-server versus peer-to-peer architecture

The first step for a client is finding peers to download files from. Trackers, web servers running over HTTP, serve as centralized locations for introducing peers to one another. Due to the centralization, the servers are at risk of being discovered and shut down if they facilitate illegal content exchange. Thus, making peer discovery a distributed process is a necessity for preventing trackers from following in the footsteps of the now-defunct TorrentSpy, Popcorn Time, and KickassTorrents.

The client starts off by reading a .torrent file, which describes the contents of the desired file and how to connect to a tracker. The information in the file includes the URL of the tracker, the creation time, and the SHA-1 hash of each piece, a chunk of the file. One file can be made up of thousands of pieces: the client needs to download the pieces from peers, check their hashes against the torrent file, and finally assemble them to retrieve the file. For the implementation, [Jesse] chose to keep the structures in the Go program reasonably flat, separating application structs from serialization structs. The piece hashes are also split into a slice of hashes so that individual hashes are easier to access.

The bitfield explained as a coffee shop loyalty card.

Next, a GET request to the `announce` URL in the torrent file announces the presence of the client to peers and retrieves a response from the tracker with the list of peers. To start downloading pieces, the client opens a TCP connection with a peer, completes a two-way BitTorrent handshake, and exchanges messages to download pieces.

One interesting data structure exchanged in the messages is a bitfield, a byte array in which each bit records whether a peer has the corresponding piece. A bit is flipped when its piece’s status changes, acting somewhat like a loyalty card with stamps.
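A bitfield like this takes only a few lines of Go. The sketch below assumes the bit ordering from the BitTorrent specification, where piece 0 is the most significant bit of the first byte; the type and method names are illustrative.

```go
package main

import "fmt"

// Bitfield records which pieces a peer has, one bit per piece.
type Bitfield []byte

// HasPiece reports whether the bit for the given piece index is set.
// Indexes outside the bitfield count as "not present".
func (bf Bitfield) HasPiece(index int) bool {
	byteIndex, offset := index/8, index%8
	if index < 0 || byteIndex >= len(bf) {
		return false
	}
	return bf[byteIndex]>>uint(7-offset)&1 != 0
}

// SetPiece sets the bit for the given piece index.
// Indexes outside the bitfield are silently ignored.
func (bf Bitfield) SetPiece(index int) {
	byteIndex, offset := index/8, index%8
	if index < 0 || byteIndex >= len(bf) {
		return
	}
	bf[byteIndex] |= 1 << uint(7-offset)
}

func main() {
	bf := Bitfield{0x54, 0x00} // 0x54 is 01010100: pieces 1, 3, and 5
	fmt.Println(bf.HasPiece(1), bf.HasPiece(2))
	bf.SetPiece(2)
	fmt.Println(bf.HasPiece(2))
}
```

Sixteen pieces fit in two bytes here, which is why a bitfield is a compact way for a peer to advertise everything it holds in one message.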

While talking to one peer may be straightforward, managing the concurrency of talking to multiple peers at once requires solving a classically Hard problem. [Jesse] implements this in Go by using channels as thread-safe queues, setting up two channels to assign work and collect downloaded pieces. Requests are later pipelined to increase throughput, since network round-trips are expensive and requesting blocks one at a time is inefficient.
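The two-channel pattern can be sketched without any networking. In the simplified example below, each worker goroutine stands in for a peer connection and "downloads" by copying from an in-memory source; the types and the `download` function are assumptions made for illustration, and a real client would also verify hashes and put failed pieces back on the work queue.

```go
package main

import (
	"bytes"
	"fmt"
)

// pieceWork describes a piece that still needs downloading.
type pieceWork struct {
	index int
}

// pieceResult carries a downloaded piece back to the collector.
type pieceResult struct {
	index int
	data  []byte
}

// worker stands in for one peer connection: it takes work off the
// queue, fetches the piece (here from memory), and reports the result.
func worker(source [][]byte, work <-chan *pieceWork, results chan<- *pieceResult) {
	for pw := range work {
		results <- &pieceResult{index: pw.index, data: source[pw.index]}
	}
}

// download fans the pieces out to numWorkers goroutines and
// reassembles the results in index order.
func download(source [][]byte, numWorkers int) []byte {
	work := make(chan *pieceWork, len(source)) // buffered work queue
	results := make(chan *pieceResult)

	for i := range source {
		work <- &pieceWork{index: i}
	}
	for w := 0; w < numWorkers; w++ {
		go worker(source, work, results)
	}

	// Results arrive in any order; the index says where each belongs.
	out := make([][]byte, len(source))
	for done := 0; done < len(source); done++ {
		res := <-results
		out[res.index] = res.data
	}
	close(work) // lets the idle workers exit
	return bytes.Join(out, nil)
}

func main() {
	pieces := [][]byte{[]byte("Hello, "), []byte("peer-to-peer "), []byte("world")}
	fmt.Printf("%s\n", download(pieces, 2))
}
```

Because channels are safe to share between goroutines, no explicit locking is needed around the work queue or the results.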

The full implementation is available on GitHub, and is easy enough to use as an alternative client or as a walkthrough if you’d prefer to build your own.

25 thoughts on “Building A Better BitTorrent Client In Go”

    1. You’re right… Randall Munroe invented the stick-figure drawing style didn’t he?

      Somewhere I’ve got infringing artworks that predate xkcd.com by about 14 years.

      1. There is wide speculation that Randall is in fact in possession of an operational time machine.
        Statistically there is a 50/50 chance the inventor of the drawn stick figure is either Randall, or ancient alien astronauts.

      1. You must be the kind of person that only reads headlines. The author of the Go software did the artwork. What Sharon Lin has done here is essentially hotlinked it without attribution.

    1. That’s because you don’t understand Go idiosyncrasy. All Go developers I know (obviously I don’t know them all) like to rewrite things in Go, because what is not in Go is impure.
      That may sound crazy, but it is not specific to Go, I find that it is a common trend on many domains nowadays.

      Note that I’m not talking about people (re)writing a piece of software to learn, like many people writing countless emulators, compilers, frameworks and other tools that contribute to software diversity.
      The Go culture, at least the one I’ve been exposed to, is that of rewriting because it will be better if it is in Go, regardless of the merits of the original implementation.

      I believe this has its roots in Go’s strict rules that, while well-intentioned, have the side effect of closing down people’s minds into a monoculture (gofmt for instance…)

      I have seen people disregard the official implementation of a DB driver written in C, for instance, and rewrite it from scratch in Go, as opposed to just using Go C-bindings.

      Another thing that comes to mind is the so-called “social network effect”: many developers want to have a “badge” to show off, and therefore prefer having a github project of pure Go to show they know the tricks (and often use obscure features, just to show off), than a github page of a wrapper.

      1. I mostly use C but I also use Go a lot, and you’re half-way right.

        When I’m using Go, I want an idiomatic implementation in Go. I don’t really want to link in the C library. For reasons having to do with scaling, mostly. If you just use the C bindings, you might as well use a scripting language for your front end, why use Go? If you do want to use the C binding, you have to wrap it carefully, you probably need a new front end. Tacking on a new front end will usually be more work than a rewrite.

        Go is similar enough to C that it can do well at most of these things.

        That said, in my case I’m generally just porting things, not trying to write some pile from scratch.

        It is intended as an infrastructure language, and it actually tends to limit code sharing in favor of isolation, so there are no surprises in any of this.

  1. Please do not turn this site into another site about software. It is for hacks for hardware.

    From the About page: “Hacking is an art form that uses something in a way in which it was not originally intended”

  2. The biggest problem with Go: the googlers behind it. Witness:

    https://github.com/golang/go/issues/29349

    Tldr version: “TLS 1.3 is perfect, you do not need to be able to disable particular ciphers, and IF there turns out to be a vulnerability/weakness, THEN we’ll work on giving you the ability to disable ciphers.” That’s basically what Filo, a Googler and one of the major authors of the TLS 1.3 standard, said.

    So if your distro uses Go and a TLS 1.3 cipher is found to be weak or vulnerable, you will have to wait for them to implement “be able to turn off TLS 1.3 ciphers”, then wait for patches to get backported by their distro, or do a rush upgrade to the version that is released. Both of which are hugely problematic for a large-scale production environment.

    You have to either be in the NSA’s pocket or incredibly oblivious to not see how poorly thought out his statement is.

  3. I have to take issue with your coffee shop loyalty card explanation of bitfields. I don’t see how it helps explain anything. Saying in the text that each bit represents a piece of the whole, indicating that the client has that piece, was perfectly adequate. All a loyalty card counts is how many, so saying it’s like a loyalty card is completely pointless.

    Einstein said that one should explain things as simply as possible, but no simpler. I’m afraid you missed the last part of that.

    1. A loyalty card represents a scalar quantity, whereas a bitfield is a vector.

      Maybe a stack of n loyalty cards whose values are mapped to 0 and 1 by an indicator function and whose ordering in the stack does not change would be the simplest way to use a loyalty card analogy for n slices.

      1. The only way I see it working is if you have a very strange coffee shop that requires you to buy one of each of eight types of cups of coffee, and they mark off each one as you purchase it.

  4. While none of you do this, am I correct that the vast majority of file sharing is “illegal”? As in, “I ripped a movie or CD and am sharing it.” I’m guessing that those here share files routinely, but you represent an extremely small fraction of file sharing. I understand the purpose, I’m questioning what the volume consists of.

    1. Most people download Linux distro images with torrents these days, because it is so much faster.

      Also container images.

      The whole idea of pulling large files down from one location becomes unfortunate when the file is large.

      In the olden days we had FTP, and if you knew how you could restart a download, but using HTTP you often end up with retransmission of multiple gigabytes for no good reason.

    2. Well, Linux distros and associated software are big chunks of legal BitTorrent traffic, but I have also downloaded large packages of corporate software over BitTorrent as well. Pretty much any file over 500 MB can take advantage of BitTorrent, as can locations with spotty internet connections. Essentially anywhere the download might be interrupted or take a significantly long time due to low bandwidth at the source would, and probably does, benefit from using the BitTorrent protocol.

  5. Who cares. Forget about “Go” google nonsense. The ORIGINAL BitTorrent was made by ONE person in ONE month (if I remember correctly) in PYTHON.

    I utterly HATE garbage like this. “Oh how wonderful”. “No, no it is not. It is a circle jerk.” Before it was .Net vs Mono, then Ruby vs Python, then SystemD vs OpenRC (still better).

    Now it’s “Go” vs Rust. There WERE (God (or TFSM for the diversity points) knows where) 2 sites that had beautiful code samples of common things like a curl with a FFT in several languages.

    Hell, PERL, Python, C++, Ruby, Java were all in one column. To date I don’t know if the Elucidian Geometry RUST puzzle challenge site has been completed or is still up.

    Better? How about decentralized.
    i2p, and Frostwire already exist.

    This article REEKS of paid Alphabet SHILLERY.
