Implementation of the BitTorrent protocol in Golang
{
"announce": "https://torrent.ubuntu.com/announce",
"comment": "Ubuntu-MATE CD cdimage.ubuntu.com",
"creation date": 1581513754,
"info": {
"length": 1964244992,
"name": "ubuntu-mate-18.04.4-desktop-amd64.iso",
"piece length": 524288,
"pieces": "..."
}
}
{
"created by": "kimbatt.github.io/torrent-creator",
"creation date": 1585615864,
"info": {
"files": [
{
"length": 324746,
"path": [
"6VEmY3n.jpg"
]
},
{
"length": 80702,
"path": [
"203wnl7t91321.jpg"
]
},
{
"length": 331746,
"path": [
"5xzu25hn1lm21.png"
]
}
],
"length": 0,
"name": "test",
"piece length": 16384,
"pieces": "..."
}
}
For .torrent
files downloaded from the Internet Archive, the client requires web seeding support. This is clearly indicated
in the torrent files downloaded. This has yet to be added to torrent.go
so there might be a limitation to the features of
the system.
The peer protocol is performed over a TCP connection. As there may be multiple peers to be connected to at once, each peer connection will maintain its own flow.
The key actors of the process is the client and peer.
- Client attempts to establish a TCP connection to the peer
- If the peer is available, the peer will allow the TCP connection
- Client sends the a handshake request to the peer with the following format:
<pstrlen><pstr><reserved><info_hash><peer_id>
- If the peer wants to communicate with the client, it will respond with a response using the same format; this will mean that the TCP connection has been established and the client can begin requesting the peer for pieces of the file
- Client and peer will communciate in an alternating fashion
- If the client already has some pieces of the file, it can send a
bitfield
request to the peer; this request informs the peer of the bits that it already has
Ideally, the moment the client establishes the connection with the peer, it will inform the peer that it is interested and
is not choked. Then, begin to request for the pieces of data. Once a piece is received, the client will use the pieces bytes
from the torrent file to verify if the received piece has a SHA1 hash that is matches the .torrent
file. Once verified, the
client can inform the peer that it has the piece via a have
request. Repeat the request process while the client remains
interested and the peer remains unchoked.
After some investigation and grokking, I had come to the realisation that the original idea of how peers communicated was naive. I will be focusing on the P2P aspect of the communication, assuming the handshake and connection has already been established.
To split the pieces, we need to think of it as such. A file is split into pieces. We know the order of these pieces because we
have the metainfo of the torrent file. This gives us two pieces of information - piece position and SHA1 hash. We can also deduce
how long the piece will be as the metainfo announces to us the piece length
. We know that we have to query each piece individually,
so we want to split this pieces array out and create a piece of work to perform. This piece of work will contain the index of the
piece in the file, the hash of the piece, and the length of it. The reason why we need to add the length is because the last piece in
the file might be slightly less than the others. So it's best to include it.
Then we know that each piece is split into blocks. According to the documentation, current implementations of block sizes usually
have the size to be 16kiB. This also means that we have to keep track of which blocks we receive. This is easier because we can do
this sequentially. To track the piece progress, we need several things, we need the piece we are tracking - or the index of the
piece, the current client connection, how much is downloaded, the data that is downloaded, and finally what we are requesting. To
keep track of what we have downloaded, we continually request for blocks with the fixed block size. As we are downloading these
bytes, we will add to the downloaded section. Ideally, by the end of the download, the downloaded should be the same length as the
piece length. The requsted size is the same as the downloaded size and we must increment both. The only difference is that the
requested size increases after each REQUEST
while the downloaded size increases after each confirmed PIECE
.
Once we download the entire piece, we then check the piece against the SHA1 hash. Once validated, we can send a HAVE
to the peer
to confirm the download.
package pieces
type pieceWork struct {
index int
hash [20]byte
length int
}
To implement the logic in goroutines, we fill up a channel with the pieces we want to retrieve. Then, we have another channel to receive the downloaded pieces. For the work queue, we can actually just iterate over the channel until it is empty. For the downloaded pieces, we need the index of the piece and the bytes of the piece:
package pieces
type piece struct {
index int
buf []byte
}
Once the handshake is completed, the peer we are connecting to must send us a bitfield. The definition of a bitfield was
originally quite confusing. While it was clear that it represents the pieces present on a peer, it doesn't specify who should be
the one emitting the initial bitfield or how it interacts with the algorithm. Turns out, the bitfield is used to represent what
pieces a peer has. This means that in order to understand what pieces I can request from the peer, the client must keep track
of what pieces the peer has offered. This will represent which pieces we can request from the peer. BUT it is also important to
know that the peer might not advertise all pieces at once. As such, it can also use HAVE
messages to inform the client that new
pieces are available.
- How does BitTorrent work - Overview
- Understanding the BitTorrent specification
- Implementing BitTorrent using AsyncIO in Python
- Torrent files
- Specification
- How to write a BitTorrent client
-
Set the valid parsing of the bencoding to use defers to catch the appropriate panics - Add data validation for bencoding decoding
- Add encoding to bencoding format
- Update test suite and any other location where "parse" is used instead of decode
- Retry downloading if the original torrent failed
- Support web seeding
- Set up event bus system for when adding CLI
- Add proper logging and error handling to allow for dynamic system
- Clean up bencoding code (e.g. place the counter variable in the for loop)
- Flatten out the folder structure to allow the torrent related files to be split out respectively
- Use
select{}
withtime.After
to toggle a retry mechanism for server retries