Wikimedia hardware status

This page is no longer updated; please see https://wikitech.wikimedia.org/wiki/Main_Page

The Wikipedia server setup is described at m:Wikimedia servers.

This page is intended to contain a log of significant changes to the hardware and software setup, to produce a working history of the project. When it gets full, please move the page to one with a suitable archive name, including the year, and start a new one.

June 2005

A large number of changes have been made since the last entry: the Florida site has been relocated and the Amsterdam site is online. There are now 83 servers spread around the world: 67 in Florida, 5 in Paris (2 down) and 11 in Amsterdam.

There have also been many other substantial software and configuration changes; see the server admin log for these.

March 2005

The image server is currently overloaded; system administrators are working to reduce its load before installing more efficient software and more powerful hardware to handle this task. See Image server overload 2005-03 for details.

February 2005

February 12

A new Cisco gigabit network core switch has been installed, which should increase system performance and manageability.

January 2005

January 9

Three Squid cache servers located near Paris, France (chloe, ennael and bleuenn) are put into general use. Users are routed via GeoDNS to either this cluster or the Florida cluster, depending on their country.

(Temporarily: ennael has memory problems and is switched off, and reduced bandwidth to some other European countries means that, for now, the Paris cluster serves only French users.)

December 2004

22 December

Five new Apache web servers (3GHz P4, 3GB RAM) are now in service: Benet, Biruni, Rose, Smellie and Anthony.

After an initial period of service to prove their reliability, some of them will be used for Memcached in addition to Apache, or for Squid.

10 December

Two new database slaves (Holbach, Webster) are now in service. These roughly double our slave database capacity.

November 2004

3 November

Five new web servers installed (kluge, khaldun, hypatia, humboldt, averroes).

October 2004

27 October

Wikipedia's servers are currently undergoing extensive maintenance to try to resolve earlier problems. Until these problems are fixed, the site will be very slow.

7 October

2GB of RAM was moved from bart to browne, leaving browne with 4GB. Bart now has 4GB of new Kingston (?) RAM. Bart and bayle were switched from Apache to Squid service.

September 2004

10 September

New NFS storage server (albert) and database search server (bacon) installed.

August 2004

20 August

Eight new Apache servers installed.

4 August

New hardware has been ordered; see http://mail.wikimedia.org/pipermail/wikitech-l/2004-August/011886.html

July 2004

15 July 2004

Memcached now has more RAM, to increase the potential hit rate of the parser cache. Fifteen instances of 512MB each are in use (2 on bart, 2 on bayle, 4 on yongle, 4 on rabanus and 3 on will), for a total of 7.5GB.

11 July 2004

Today the site was switched from using Suda as the main database server to using Ariel. It should be faster now. Suda used six 10,000 RPM SCSI disks and 2GB of RAM for the database. Ariel is using six 15,000 RPM disks and 7GB of RAM.

June 2004

25 June 2004

Full-text search caused performance problems today: database queries backed up to the point that normal page reads were threatened (the database returns connection errors when it runs out of connections). Building the first index for the Japanese (ja) wiki made editing there impossible for 90 minutes and greatly slowed all the other wikis. The update code, which scans the whole table to update the index, has been improved to reduce the chance of this happening again.

23 June 2004

Zwinger could not keep up with the database load without degrading the rest of the site's performance. Will is now being used as the backup database slave instead.

June 22 2004

In addition to its other duties, Zwinger is now operating as a database slave, intended for backup use only. It is set to read-only to prevent accidental writes.

June 21 2004

Ariel is now operating as a read-only database slave, and read operations are gradually being switched to it, starting with watchlist queries. Full-text search is being turned on as the required indexes are rebuilt.

June 12 2004

The addition of a third Squid cache, maurus, seems to have improved system performance. [1]

June 11 2004

The server was down for maintenance from 18:00 to 18:30 UTC. Reason, according to [2]: "To reboot Zwinger with an updated kernel which will fix the disk driver. This was planned to improve performance; as the main file (not database) server the sluggish disk is a bottleneck." Zwinger had not been using DMA for its ATA disk; it now is.

June 10 2004

One of the new machines, Moreri, has been changed from Apache web server to Squid cache server.

May 2004

May 27 2004

The 2U server has been delivered and three of the new servers will go online today. See this post on Wikitech-l for full details.

May 26 2004

Four new servers have been installed: 1U P4 machines, each with 4GB of RAM, and 80GB drives. One of these will be converted to a mirrored RAID with 2x200GB drives. Memtest86 will be run overnight on May 26. The 2U server is due on Friday.

May 12 2004

Server will be down for maintenance on 2004-05-12 from about 02:00 to 03:00 UTC.

The replicated database on curly fell out of sync last week. Replacing the index file and restoring the replication link will require briefly taking down MySQL on suda. During this downtime, some of suda's data files will be moved onto its new hard disk. Further details

May 4 2004

There is a short period of downtime on May 4 while hardware upgrades take place:
A RAID card and a second 250GB drive are added to zwinger.
The database is migrated to a new RAID 5 array.
Browne has passed its memory tests and is back online with 2GB of RAM.

May 3 2004

Memtest86 is being run on browne overnight on May 3.
Jimbo will insert the RAID card into zwinger and set up mirroring.
The RAID 5 array on suda is rebuilt.
Suda will be taken down "for a moment" to set up the new RAID 5 array on the 3x146GB drives.

Older news

Upgrades will take place on 30 April from 16:30 UTC onwards. They include installing a single 250GB drive in zwinger, in preparation for migrating to that larger disk, and installing three 146GB SCSI drives in suda as a RAID 5 array. Suda may be down momentarily while the RAID 5 array is set up in the BIOS. During this time, curly will be used as the database server, which will cause some slowness.
Browne is (mostly) down. Coronelli and curly are acting as Squid servers.
The front-end Squid server coronelli has recently had problems, leading to severe slowdowns, but it is now back in service.
The database server geoffrin has had a number of problems, and a temporary replacement, suda, is being used instead.
Why Wikipedia ran slow in late 2003 and January 2004.

Real-time monitoring

Live front end server graphs (down)
Ganglia cluster toolkit monitoring pages (down)