Mirroring Wikimedia project XML dumps/Torrents
rTorrent is quite efficient at handling thousands of torrents and can be useful for spreading toollabs:dump-torrents on trackers and the DHT network, so that they're indexed by various BitTorrent search engines.
Interface
It's easy to download all torrents at once:
wget -r -np -nH -nd -A torrent https://tools.wmflabs.org/dump-torrents/
This can take about an hour from Labs, or several hours on another server:
Total wall clock time: 4h 26m 15s
Downloaded: 128558 files, 325M in 41s (7.86 MB/s)
Then, in the rtorrent interface, just press enter, type the pattern of the files to load and press enter again, e.g. *pages-meta*torrent or *7z*torrent. See https://github.com/rakshasa/rtorrent/wiki/User-Guide for more.
You can then start all torrents at once.
You can also use a watch directory and copy or move the torrents you want to add into it.
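For example, a minimal sketch using the watch directory configured in the .rtorrent.rc below (the *7z* pattern is just an illustration):

# rtorrent scans this directory on the schedule set in .rtorrent.rc and starts whatever appears
mkdir -p ~/rtorrent/autodownload
cp ./*7z*torrent ~/rtorrent/autodownload/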
Performance
Adding several thousand torrents at once is likely to "freeze" the rtorrent interface for a while and peg it at 100 % CPU for several minutes, but it usually recovers eventually. At startup, "loading" all the previously added torrents may take a minute for every couple of thousand torrents.
As of rTorrent 0.9.2/0.13.2, webseeds are not supported, so rtorrent will just leech any added torrent and keep a 0-byte file open for each.
[Throttle off/ 20 KB] [Rate 5.3/ 5.4 KB] [Port: 6922] [U 0/0] [D 0/14] [H 0/5000] [S 1615/1617/15000] [F 9435/50000]
With 3 trackers per torrent (including a broken one) and 10k torrents, quite a few connections are generated: about 30k at startup and around 10k at most other times.
$ lsof -i -c rtorrent | wc -l
12864
Most connections (most trackers and DHT) are UDP:
$ lsof -i -c rtorrent | grep -c UDP
15849
Changing the IP of broken trackers in /etc/hosts, or reducing the curl timeout, might help: by default rtorrent doesn't time out connections much (perhaps because many private trackers are quite slow).
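For instance, a hypothetical /etc/hosts entry (tracker.example.org stands in for the broken tracker's actual hostname):

# Make announces to the dead tracker fail immediately instead of hanging
0.0.0.0 tracker.example.org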
When "idling" as above with around 10k torrents, rtorrent uses about 1 GB RAM (700 MB RES) and about 15 CPU-minutes per hour on a Intel(R) Xeon(R) CPU E5-2650L v3 @ 1.80GHz.
When adding all the *xml*torrent files (about 34k), with 3 trackers each, rtorrent consumes about 1.8 GB RAM (1.5 GB resident) and seems to spend 100 % CPU sending announcements to trackers, without actually succeeding.
[Throttle off/ 20 KB] [Rate 0.0/ 0.0 KB] [Port: 6900] [U 0/0] [D 0/14] [H 0/5000] [S 1305/1307/15000] [F 33681/50000]
DHT
DHT, unlike trackers, requires rtorrent to be connectable (a public IP or port mapping, with the port open in the firewall).
To ensure that DHT is working, check the tracker.log file. If it isn't, DHT may need to be bootstrapped: press ctrl-x, type dht.add_node=dht.transmissionbt.com and press enter.
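A quick way to watch for DHT activity, assuming the log locations from the Configuration section below (with the commented-out, rather spammy dht_debug output enabled):

tail -F /var/log/rtorrent/tracker.log | grep -i dht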
With the tested version of rtorrent, however, having thousands of torrents in DHT is likely to result in segmentation faults: https://github.com/rakshasa/rtorrent/wiki/Using-DHT#segmentation-faults
To verify that DHT is working and rtorrent can be reached to fetch metadata, add the info_hash on your torrent client at home and see if you get the torrent name etc. (On Transmission: ctrl-U, paste the hash, enter.)
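From the command line, a hypothetical equivalent with transmission-remote (INFO_HASH_HERE is a placeholder for the 40-character info_hash):

# Add the bare info_hash as a magnet link; if the torrent name appears, metadata fetching over DHT works
transmission-remote -a "magnet:?xt=urn:btih:INFO_HASH_HERE"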
Configuration
In /etc/security/limits.conf (Debian), have something like:
torrent-user-name soft nofile 50000
torrent-user-name hard nofile 100000
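After logging in again as that user, a quick sanity check that the new soft limit took effect:

ulimit -n    # expect 50000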
The ~/.rtorrent.rc can be something like:
directory = ~/rtorrent
session = ~/.rtorrent.session/
dht = auto
# We don't actually want to fill our disk
throttle.global_down.max_rate.set_kb = 20
schedule = watch_dir, 20, 10, "load.start=~/rtorrent/autodownload/*.torrent"
network.max_open_files.set = 50000
network.max_open_sockets.set = 15000
network.http.max_open.set = 5000
# No point waiting multiple seconds for DNS
network.http.dns_cache_timeout.set = 2
log.open_file = "rtorrent.log", "/var/log/rtorrent/rtorrent.log"
log.open_file = "tracker.log", "/var/log/rtorrent/tracker.log"
log.add_output = "info", "rtorrent.log"
# These are very spammy, useful to see every single connection to trackers with tail -F /var/log/rtorrent/tracker.log
#log.add_output = "dht_debug", "tracker.log"
#log.add_output = "tracker_info", "tracker.log"
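Note that the download, session and log directories referenced above have to exist before startup; a sketch (running rtorrent detached in screen is just one option):

mkdir -p ~/rtorrent ~/.rtorrent.session
sudo mkdir -p /var/log/rtorrent && sudo chown torrent-user-name: /var/log/rtorrent
# run rtorrent detached, e.g. in a screen session
screen -d -m rtorrent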
Deluge alternative
Deluge is quite easy to use from the command line (see some advice) and probably harder to crash: it should be OK at least to seed the 7z torrents, of which there are about three thousand, though it struggles a bit. It keeps fewer connections open and manages to publish torrents via DHT without a public IP or port forwarding.
sudo apt install deluge-console deluged
# Make sure the torrent/download dirs and limits in ~/.config/deluge/core.conf make sense, e.g. don't use NFS
deluged
screen -d -m deluge-console
wget -r -np -nH -A torrent https://tools.wmflabs.org/dump-torrents/
cd dump-torrents/ ; for torrent in `find * -name "*7z.torrent"` ; do DIR=$(dirname $torrent); deluge-console "add -p /public/dumps/public/$DIR $torrent" ; sleep 10s ; done
Note that even just adding torrents consumes quite a bit of I/O (probably from the ~/.config/deluge/state files, which are accessed frequently): make sure your config directory is on a fast mount.
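To watch this yourself, something like the following should do (iotop typically needs root; the sample below is in iotop's output format):

# show only threads actually doing I/O; look for the deluged rows
sudo iotop -o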
 TID   PRIO  USER     DISK READ   DISK WRITE   SWAPIN    IO>    COMMAND
 9451  be/4  nemobis  5.96 M/s    0.00 B/s     0.00 %   9.50 %  python /usr/bin/deluged
 9450  be/4  nemobis  0.00 B/s    890.21 K/s   0.00 %   0.23 %  python /usr/bin/deluged
The daemon also tends to become unresponsive after a few hundred commands: just kill it nicely and restart it, then resume your deluge-console commands. When restarting the daemon with thousands of torrents, over an hour may be needed to fully resume.
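A sketch of such a gentle restart (SIGTERM lets deluged save its state before exiting):

pkill -TERM deluged
# wait until it has flushed its state and exited
while pgrep deluged >/dev/null ; do sleep 5 ; done
deluged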
The daemon sometimes opens far fewer connections than expected and may need to retry the announcements to trackers a few times. With about 8000 torrents, deluged may consume 150 % CPU just for "idling"; in that case, reducing max_active_seeding and rotating torrents more quickly through the queue ([1]) may counterintuitively increase the number of announcements sent.
To see whether there is any problem you don't even need to fire up an interface; you can just query Deluge with commands like:
deluge-console "info -s Error"
Some torrents may be stuck in an error state merely because of a failure to check the local data, so you can recheck them all with a command like:
for torrent in `deluge-console "info -s Error" | grep -B 5 "Progress: 0.00" | grep ID | sed "s/ID: //g"` ; do deluge-console recheck $torrent ; sleep 1s ; done
Fancier stuff is possible with Deluge RPC etc.