Grants:IdeaLab/Commons tarballs seedbox
Commons tarballs seedbox
An extremely cheap mirror of the Wikimedia Commons tarballs collection at Archive Team.
created on: 19:55, 25 January 2014
- Assumption: mirroring our data is imperative.
- Fact: Our media mirrors are poor, especially for Wikimedia Commons; but the Internet Archive lends us a big hand.
- Problem: as of 2013 IA bandwidth is congested, and wiki researchers have reported difficulties in downloading such datasets from it.
- The network was improved in July 2014. From the 2013 ISC Annual Report: «In 2013 we expanded our relationship with our largest Hosted@ partner, Internet Archive to include the provision of 3rd party Internet Transit Services.»
- WMF cannot offer mirrors because it would be a waste of their expensive space (source: Ariel Glenn);
- almost all the mirror services of the world were contacted and none could help.
- Idea: the Internet Archive is fueled by cheap 3 TB disks, often bought as discounted external HDDs. Computational power and home bandwidth are basically free nowadays. Make a seedbox.
- Alternatives: w:en:User:Emijrp/Wikipedia Archive#Help seed the garden of knowledge.
Assemble a 30 TB seedbox for less than 1500 € and maintain it for free.
Setup and example cost:
- Raspberry Pi with FreeNAS, 30 €
- 10 × 3 TB external HDDs, connected via USB or SATA in a RAID, 1100 €
- an 80 Plus high-quality PSU and a case, donated by me
- some cables for power and connection, a few tens of €
- power and a 10/10 Mb/s fiber connection donated by me (home powered by the Alps' dams btw), or 1000/1000 by a friendly university lab's ethernet port.
- stuff can be bought via WMIT to save VAT and not become a personal asset,
- if the experiment fails the hard disks can be donated to the Internet Archive,
- we don't need redundancy and stuff, it's just a copy of something we maintain elsewhere,
- torrent will take care of keeping the copies in good shape (unless there are hardware failures),
- if computational power doesn't suffice I can donate an old computer or two to the cause,
- poor disk reading speed is ok, our bandwidth won't be higher than that anyway,
- despite all my efforts I never manage to use up my bandwidth, but Milan is the most fiber-rich city of Europe, so if there's a need I may get a 100/100 connection for a little more out of my pocket,
- a researcher can't take more than a few weeks to download all the data, but if torrents are better seeded we can find more home seeders for small chunks of the data.
- virtual servers, don't even talk about it,
- AWS S3, about 2300 $/month.
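The figures above can be checked with a bit of arithmetic. The following is only a rough sketch: the prices and disk count come from the list, while `CABLES_EUR = 50` is an assumed value standing in for "a few tens of €", and the transfer estimate assumes a fully saturated link, decimal units and no protocol overhead. All function and constant names are made up for this illustration.

```python
# Back-of-the-envelope check of the seedbox budget and transfer times.
# Prices and disk counts are taken from the list above.
BOARD_EUR = 30      # Raspberry Pi
DISKS_EUR = 1100    # ten 3 TB external HDDs
CABLES_EUR = 50     # ASSUMPTION: stands in for "a few tens of euros"
DISK_COUNT = 10
DISK_TB = 3


def total_cost_eur():
    """Sum of the purchased items (PSU, case, power and connection are donated)."""
    return BOARD_EUR + DISKS_EUR + CABLES_EUR


def capacity_tb():
    """Raw capacity with no redundancy, in decimal terabytes."""
    return DISK_COUNT * DISK_TB


def days_to_transfer(terabytes, megabits_per_second):
    """Days needed to move the given volume over a fully saturated link."""
    bits = terabytes * 1e12 * 8
    seconds = bits / (megabits_per_second * 1e6)
    return seconds / 86400


if __name__ == "__main__":
    print(f"Hardware cost: {total_cost_eur()} EUR for {capacity_tb()} TB")
    for speed in (10, 100, 1000):
        days = days_to_transfer(capacity_tb(), speed)
        print(f"{speed:>4} Mb/s: about {days:.1f} days for a full copy")
```

Under these assumptions the hardware stays well below the 1500 € ceiling, and a single uploader needs roughly 278 days to serve a full 30 TB copy at 10 Mb/s, about 28 days at 100 Mb/s and under 3 days at 1000 Mb/s.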