Very slow Hadoop on PowerEdge R815

We have a little internal Hadoop cluster for development and testing, two very powerful Dell PowerEdge R815 with Debian and a bunch of Xen VMs to reproduce a production environment. Problem is that the cluster, even with a relatively small amount of data, was sloooow. And when I say slow I mean almost unusable for Hadoop development (a mapreduce on a small dataset took 5x more than on the big one in production). Even an insignificant

$ hadoop fs -ls

took more than 4s to list the content of HDFS. strace was showing tons of wait() syscalls for no apparent reason, while in the production system the same operation takes 1s and no wait() at all.
After trying almost everything (even without Xen and running Hadoop on the bare metal), I changed by chance a Power Management option in the R815 BIOS. By default it was set to Active Power Controller. Changing it to Maximum Performance did the trick! The ls now takes about a second, just like the production environment. My guess is that probably the default value (which is some kind of automagical load detection) wasn’t able to see that the machine really needed power when running Hadoop, leaving the CPU underclocked to save energy. Maximum power probably is not so green but it solved the problem

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s