We have a little internal Hadoop cluster for development and testing, two very powerful Dell PowerEdge R815 with Debian and a bunch of Xen VMs to reproduce a production environment. Problem is that the cluster, even with a relatively small amount of data, was sloooow. And when I say slow I mean almost unusable for Hadoop development (a mapreduce on a small dataset took 5x more than on the big one in production). Even an insignificant
$ hadoop fs -ls
took more than 4s to list the content of HDFS. strace was showing tons of wait() syscalls for no apparent reason, while in the production system the same operation takes 1s and no wait() at all.
After trying almost everything (even without Xen and running Hadoop on the bare metal), I changed by chance a Power Management option in the R815 BIOS. By default it was set to Active Power Controller. Changing it to Maximum Performance did the trick! The ls now takes about a second, just like the production environment. My guess is that probably the default value (which is some kind of automagical load detection) wasn’t able to see that the machine really needed power when running Hadoop, leaving the CPU underclocked to save energy. Maximum power probably is not so green but it solved the problem