User:Brooke Vibber/dbzip2

bzip2 is hideously slow; while looking around for ways to possibly speed it up I stumbled on pbzip2, which exploits the block-based nature of bzip2 compression to parallelize it for potentially linear speedups with an increased number of CPUs.

There are two downsides: first, there are some compatibility issues with third-party decompressor software (probably resolvable), and second it only scales until you run out of local CPUs.

For kicks I'm prototyping a distributed version, dbzip2. In theory, a lot of fast machines on a LAN should be able to speed things up even further.

I've been testing with a Wikipedia dump file that's about a gigabyte, compressing down to about a quarter that.

On my dual-core G5 Power Mac, regular single-threaded bzip2 compresses the file in about 356 seconds (3.122 MB/s input processing).

pbzip2 set for two threads runs about 50% faster; I clocked it at 234 seconds (4.750 MB/s).

My dbzip2 prototype does similar with two local threads, though a touch less efficiently:

  • Wrote 1295 blocks in 250.9 seconds (4.430 MB/s in, 0.988 MB/s out)

And I can squeeze a few more percentage points out by using remote compressors on the other machines I had lying around:

  • Wrote 1295 blocks in 188.8 seconds (5.887 MB/s in, 1.313 MB/s out)

The breakdown edit

 

Most of the data went to the two local threads on my Power Mac:

  • local thread: processed 447 blocks in 188.7 seconds (2.033 MB/s in, 0.458 MB/s out)
  • local thread: processed 444 blocks in 188.7 seconds (2.019 MB/s in, 0.451 MB/s out)

My old Linux box, an Athlon XP 2400+ on 100 Mbit ethernet, took a respectable share:

  • rdaneel:12345: processed 237 blocks in 188.7 seconds (1.078 MB/s in, 0.238 MB/s out)

CPU usage was pretty high though not maxed out (80-90%), likely due to the tenth of a second spent transferring each 900 KB input block and its compressed output over a 100 Mbit network.

Running the whole file locally it can process 1.344 MB/s.

My old Powerbook (1 GHz G4) and newer Mini (1.5 GHz Core Solo) sit on wireless, which is a lot slower:

  • verda-majo.local:12345: processed 61 blocks in 188.7 seconds (0.277 MB/s in, 0.060 MB/s out)
  • philo.local:12345: processed 106 blocks in 188.7 seconds (0.482 MB/s in, 0.106 MB/s out)

These two were clearly limited by the slower network, with CPU usage around just 20-30%. Still, every bit helps!

Conclusions edit

This shows promise for quickly compressing large files when fast machines are available on a fast network. Slow networks strongly reduce the benefits, but on switched gigabit ethernet things should be nicely CPU-limited even with several fast machines.

The main remaining issues:

  • Whether the bzip2 streams can be stitched together for improved compatibility with decompressor apps
    • They can! See current info at mw:dbzip2. --brion 00:21, 31 May 2006 (UTC)
  • Whether similar distribution can be applied to 7zip