pbzip2: parallel bzipping

This software has probably been around for quite a long time, but I didn’t know it existed until now: pbzip2.
It’s basically an implementation of the bzip2 algorithm with pthreads support. In an increasingly SMP world, this means you can greatly improve your bzipping performance: ideally, you divide the compression time by the number of cores you have, et voilà!
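The core idea can be illustrated with a minimal Python sketch. It assumes fixed 900 kB chunks, each compressed into an independent bzip2 stream, with the streams then concatenated in order. The `parallel_bzip2` name and the chunk size are illustrative choices, not pbzip2’s actual internals. The sketch uses threads because CPython’s `bz2` module releases the GIL while libbzip2 is compressing, so the chunks can run on several cores.

```python
import bz2
from concurrent.futures import ThreadPoolExecutor
from typing import Optional

# Illustrative chunk size: bzip2 at level 9 already works on 900 kB blocks.
CHUNK_SIZE = 900_000


def _compress_chunk(chunk: bytes) -> bytes:
    # Each chunk becomes a complete, self-contained bzip2 stream.
    return bz2.compress(chunk, compresslevel=9)


def parallel_bzip2(data: bytes, workers: Optional[int] = None) -> bytes:
    """Compress data chunk by chunk on a thread pool and concatenate the streams."""
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so the streams stay in sequence.
        return b"".join(pool.map(_compress_chunk, chunks))


if __name__ == "__main__":
    payload = b"some fairly repetitive log line\n" * 200_000
    packed = parallel_bzip2(payload, workers=4)
    # bz2.decompress() reads concatenated (multi-stream) input.
    assert bz2.decompress(packed) == payload
    print(f"{len(payload)} -> {len(packed)} bytes")
```

Because the output is just a sequence of complete bzip2 streams, any decompressor that handles concatenated streams, such as Python’s `bz2.decompress`, can read it back. Each extra stream carries its own small header and trailer, so the result should be only marginally larger than a single-stream file.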

The command-line syntax is fully compatible with bzip2. To compress:

$ pbzip2 big.file

while to decompress you run:

$ pbzip2 -d big.file.bz2

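Use it with caution (or with the -p# switch, which sets the number of processors to use, and -l, which limits that number based on the load average), because you can easily saturate even your 4 × six-core monster.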
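To make the idea behind these limits concrete, here is a hypothetical Python policy for choosing a worker count: an explicit request plays the role of -p, and an optional check against the 1-minute load average plays the role of -l. The `pick_workers` name and the “cores minus current load” formula are assumptions for illustration, not pbzip2’s actual logic.

```python
import os
from typing import Optional


def pick_workers(requested: Optional[int] = None, respect_load: bool = False) -> int:
    """Hypothetical worker-count policy, loosely modeled on pbzip2's -p / -l idea."""
    cores = os.cpu_count() or 1
    # An explicit request wins (like -p#); otherwise use every core.
    workers = cores if requested is None else requested
    if respect_load and hasattr(os, "getloadavg"):  # getloadavg() is Unix-only
        # Assumed formula: leave room for work that is already running by
        # subtracting the rounded 1-minute load average from the core count.
        load_1min = os.getloadavg()[0]
        workers = min(workers, cores - round(load_1min))
    # Always keep at least one worker.
    return max(1, workers)


if __name__ == "__main__":
    print("all cores:", pick_workers())
    print("load-aware:", pick_workers(respect_load=True))
```

The returned value can be passed as `max_workers` to `concurrent.futures.ThreadPoolExecutor` or a similar pool.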


4 thoughts on “pbzip2: parallel bzipping”

  1. I love pbzip2. I’ve been using it for a couple of years now, after stumbling across it randomly. Even on a dual-core machine it can nicely shift the bottleneck away from the CPU and plonk it firmly in the direction of disk IO. It’s in the Ubuntu repositories; I haven’t checked whether it’s in other distros’ repos.

    Using /dev/shm to remove the disk bottleneck, with a 559 MB mysqldump output file, on a Core 2 Duo box:

                          real        user        sys
    pbzip2 compress       1m42.135s   3m12.756s   0m4.273s
    pbzip2 uncompress     0m15.968s   0m26.244s   0m2.834s
    bzip2 compress        2m56.450s   2m55.226s   0m0.435s
    bzip2 uncompress      0m19.909s   0m19.219s   0m0.651s
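    To read these timings as speedups, a small Python sketch can convert the real (wall-clock) times to seconds and divide them. The `to_seconds` helper is a hypothetical convenience that only handles the XmY.YYYs format shown above.

```python
# Wall-clock ("real") times from the benchmark above: (bzip2, pbzip2).
TIMINGS = {
    "compress": ("2m56.450s", "1m42.135s"),
    "uncompress": ("0m19.909s", "0m15.968s"),
}


def to_seconds(t: str) -> float:
    """Convert a duration such as '1m42.135s' into seconds."""
    minutes, seconds = t.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


def speedup(baseline: str, parallel: str) -> float:
    """How many times faster the parallel run was than the baseline."""
    return to_seconds(baseline) / to_seconds(parallel)


if __name__ == "__main__":
    for op, (single, multi) in TIMINGS.items():
        print(f"{op}: {speedup(single, multi):.2f}x faster with pbzip2")
```

    On this data, that works out to about 1.73x for compression and 1.25x for decompression on two cores, below the ideal 2x. The pbzip2 compression run’s user time divided by its real time (192.756 s / 102.135 s ≈ 1.89) shows that both cores were busy for most of the run.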

    • It’s in the Debian repos as well, and the author’s page has precompiled packages for almost every distro out there. I also did some benchmarking on my dual-core desktop machine, and the results are similar to yours.
