Wikimedia servers network proposal

This article is from summer 2004 and is kept for historical purposes.

This article describes the network diagram proposal made by Hashar for the Wikimedia servers, and is a place to comment on and improve it.

The work is based on the June 21st 2004 architecture, adding a bastion host, another NFS server and a management network.

Purpose

Please say here why these splits are desirable. Reasons such as:

allowing more IPs via the private network addresses. We're using many now and will use many more as we grow, and it's good to be able to do that without worrying about asking for more globally accessible IPs.
dividing traffic between networks. This is potentially faster if the networks become highly loaded.
making it impossible for the outside world to directly reach the database, Apache and NFS machines, so they are safer from attacks which might hurt the whole system.

Architecture

The three squids keep their six public IP addresses and are directly connected to the internet (public production network). The new bastion host is assigned a public IP as well.

The squids' second NICs are connected to an internal switch together with the apaches, DBs and NFS servers (internal production network).

Another network is added for management purposes (green lines on the picture). There are two possibilities for implementing this network:

hardware: add a NIC on each server and connect them to a new switch.

software: use virtual LANs to separate production and management traffic.

I (Hashar) think the software option is the cheaper one. However, both networks then share one switch: if it dies, the whole cluster is blocked and we will not be able to see what is happening.

IP assignment

In order to save some public IPs, we can switch the management and internal production networks to RFC 1918 addresses. Hashar's proposal is to use 10.0/16 (that is, 10.0.0.0/16) for internal production and 172.16/16 (172.16.0.0/16) for management. Each network is then subnetted according to server function: 1 for squids, 2 for apaches, 3 for databases and 4 for NFS servers.
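As a quick sanity check of this plan, the following sketch uses Python's standard ipaddress module to confirm that both ranges are private and to list one subnet per server function. The /24 subnet size is an assumption: the proposal only says that the third octet identifies the function. The names NETWORKS, FUNCTIONS and function_subnet are made up for this illustration.

```python
import ipaddress

# Ranges from the proposal, written out in full CIDR notation.
NETWORKS = {
    "production": ipaddress.ip_network("10.0.0.0/16"),
    "management": ipaddress.ip_network("172.16.0.0/16"),
}

# Third octet per server function, as listed in the proposal.
FUNCTIONS = {"squids": 1, "apaches": 2, "database": 3, "nfs": 4}


def function_subnet(network_name, function):
    """Return the subnet for one server function (ASSUMED to be a /24)."""
    net = NETWORKS[network_name]
    # subnets(new_prefix=24) yields x.y.0.0/24, x.y.1.0/24, ... in order,
    # so the list index equals the third octet.
    subnets = list(net.subnets(new_prefix=24))
    return subnets[FUNCTIONS[function]]


if __name__ == "__main__":
    for name, net in NETWORKS.items():
        print(name, net, "private:", net.is_private)
        for function in FUNCTIONS:
            print("  ", function, function_subnet(name, function))
```

With /24 subnets each function gets room for up to 254 hosts, which is far more than the cluster described here needs.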

To assign an IP to a server interface you just have to ask yourself:

What kind of interface is it?

production: the IP starts with 10.0
management: the IP starts with 172.16

What kind of server is it?

squids: the IP continues with .1
apaches: the IP continues with .2
database: the IP continues with .3
nfs: the IP continues with .4

How many servers of that kind do we already have (n)?

The end of the IP is n + 1.

Examples:

The IP for the management interface (172.16.x.x) of the second (x.x.x.2) squid (x.x.1.x) is 172.16.1.2.

The IP for the internal production interface (10.0.x.x) of the fourth (x.x.x.4) apache (x.x.2.x) is 10.0.2.4.
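The three questions above can be written down as a small function. This is only a sketch of the scheme, not an existing tool: the function name assign_ip, the dictionary names and the limit of 254 hosts per kind are assumptions made for the illustration.

```python
# First two octets per interface kind, from the proposal.
PREFIXES = {"production": "10.0", "management": "172.16"}

# Third octet per server kind, from the proposal.
KINDS = {"squid": 1, "apache": 2, "database": 3, "nfs": 4}


def assign_ip(interface, kind, existing):
    """Return the IP for a new server interface.

    interface -- "production" or "management"
    kind      -- "squid", "apache", "database" or "nfs"
    existing  -- how many servers of that kind we already have (n)
    """
    if interface not in PREFIXES:
        raise ValueError(f"unknown interface kind: {interface!r}")
    if kind not in KINDS:
        raise ValueError(f"unknown server kind: {kind!r}")
    host = existing + 1  # end of the IP is n + 1
    # ASSUMPTION: one octet per kind, so at most 254 usable host numbers.
    if not 1 <= host <= 254:
        raise ValueError(f"host number out of range: {host}")
    return f"{PREFIXES[interface]}.{KINDS[kind]}.{host}"


if __name__ == "__main__":
    # Second squid, management interface (one squid already exists).
    print(assign_ip("management", "squid", 1))
    # Fourth apache, internal production interface (three already exist).
    print(assign_ip("production", "apache", 3))
```

The two calls at the bottom reproduce the two examples given above.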

A local domain could be used as well to make things a bit easier. Let's say wikiloc. Then add manag and prod subdomains and put the IPs under them.

Examples:

We want the apaches to connect to a machine called suda: use suda.prod.wikiloc.

We want to poll SNMP for a machine called zwinger: use zwinger.manag.wikiloc.
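The naming convention can be sketched in the same way. The helper names local_name and hosts_line are hypothetical, and so is the machine "db1" with its address 10.0.3.1 in the demo: the proposal does not say which addresses suda or zwinger would get.

```python
LOCAL_DOMAIN = "wikiloc"

# Subdomain per interface kind, from the proposal.
SUBDOMAINS = {"production": "prod", "management": "manag"}


def local_name(host, interface):
    """Build the local name of one interface of a machine."""
    return f"{host}.{SUBDOMAINS[interface]}.{LOCAL_DOMAIN}"


def hosts_line(ip, host, interface):
    """Format one /etc/hosts style line: address, then name."""
    return f"{ip}\t{local_name(host, interface)}"


if __name__ == "__main__":
    print(local_name("suda", "production"))
    print(local_name("zwinger", "management"))
    # HYPOTHETICAL machine and address, for illustration only.
    print(hosts_line("10.0.3.1", "db1", "production"))
```

Whether the names end up in a hosts file or in a local DNS zone is left open here; the sketch only shows how the names are composed.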

Bastion

With the given setup, it will only be possible to ssh to the squids. If the squids crash or are under high load and you need to make changes to a server behind them (such as a database), it will be either impossible or very slow, because the ssh session has to pass through an overloaded squid. Passing through another dedicated server makes it possible to fix the network even when the squids have crashed.

If the bastion crashes, there will be no way to access the cluster. So one of the squids could be made a secondary bastion.
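A minimal sketch of this fallback, assuming a modern OpenSSH client (the -J jump-host option exists in OpenSSH 7.3 and later, so it postdates this proposal). The host names bastion.example and squid1.example are placeholders, and the functions tcp_reachable, pick_bastion and ssh_argv are made up for the illustration.

```python
import socket


def tcp_reachable(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_bastion(candidates, is_reachable=tcp_reachable):
    """Return the first reachable jump host, or None if all are down."""
    for host in candidates:
        if is_reachable(host):
            return host
    return None


def ssh_argv(target, bastion):
    """Build the ssh command line that reaches target through bastion."""
    return ["ssh", "-J", bastion, target]


if __name__ == "__main__":
    # PLACEHOLDER names: primary bastion first, then a squid as secondary.
    bastions = ["bastion.example", "squid1.example"]
    chosen = pick_bastion(bastions)
    if chosen is None:
        print("no bastion reachable")
    else:
        print(" ".join(ssh_argv("suda.prod.wikiloc", chosen)))
```

The order of the candidate list encodes the preference: the squid is only used when the dedicated bastion does not answer.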

Another task of the bastion could be polling / monitoring the other servers. The mrtg / ganglia setups already in use could be migrated there. All architecture docs, processes and maintenance planning could be kept there.

Advantage: when a sysadmin wants to check the cluster, he or she just has to log on to the bastion and can check everything from there without even accessing the other servers.

A future console server can be connected to the bastion host.

Splitting Squids from private network

The Squids are the machines most exposed to potential attacks. If the Apaches and databases used a different network for their traffic from the one used between Squids and Apaches, it would be impossible for a compromised Squid to see confidential traffic exchanged between Apache and database. This is not part of this plan (?). Should it be? Presumably it would use the same physical network as management? The use of the second network for Squid management effectively makes this split impossible, because it requires the Squids to be on this internal network. Perhaps it would be better to have this traffic split and to access the Squids directly rather than via the management network? Squid logs and such would then have to be transferred via the bastion machine, since this assumes that the Squids are compromised? Or to some other machine with a connection to the Squid-Apache network. With this split, an attacker would need to compromise the Squids first, then the Apaches from the Squids, which is less likely than finding just one vulnerability.

A previous draft split the internal production network into two parts ([1], or draft #2 at the bottom of this article).