Very slow Hadoop on PowerEdge R815

We have a little internal Hadoop cluster for development and testing: two very powerful Dell PowerEdge R815 servers running Debian, with a bunch of Xen VMs to reproduce a production environment. The problem was that the cluster, even with a relatively small amount of data, was sloooow. And when I say slow I mean almost unusable for Hadoop development (a MapReduce job on a small dataset took 5x longer than the same job on the big dataset in production). Even an insignificant

$ hadoop fs -ls

took more than 4s to list the content of HDFS. strace showed tons of wait() syscalls for no apparent reason, while on the production system the same operation takes about 1s with no wait() at all.
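To compare the two clusters with numbers rather than impressions, a tiny timing harness helps. This is a minimal Python sketch (not part of the original investigation) that runs a command several times and records the wall-clock duration of each run; the names `time_command` and `median_seconds` are made up for illustration.

```python
import statistics
import subprocess
import time


def time_command(cmd, runs=5):
    """Run `cmd` (a list of argv strings) `runs` times and return the
    wall-clock duration of each run, in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # Output is discarded: only the elapsed time matters here.
        subprocess.run(cmd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        durations.append(time.perf_counter() - start)
    return durations


def median_seconds(cmd, runs=5):
    """Median wall-clock time of `cmd` over `runs` executions."""
    return statistics.median(time_command(cmd, runs))
```

Assuming the `hadoop` client is on your PATH, `median_seconds(["hadoop", "fs", "-ls"])` on each cluster gives a figure that is less sensitive to a single unlucky run than timing the command once.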
After trying almost everything (even dropping Xen and running Hadoop on bare metal), I happened to change a Power Management option in the R815 BIOS. By default it was set to Active Power Controller. Changing it to Maximum Performance did the trick! The ls now takes about a second, just like in the production environment. My guess is that the default setting (some kind of automagical load detection) wasn’t able to tell that the machine really needed power when running Hadoop, so it left the CPUs underclocked to save energy. Maximum Performance is probably not so green, but it solved the problem.
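If you suspect the same problem on your own hardware, one rough check (an assumption on my part, not what was done here) is to compare the per-core clock reported in /proc/cpuinfo with the CPU's rated frequency. Below is a Python sketch; `cpu_mhz`, `underclocked_cores` and the 0.9 threshold are illustrative choices, and inside a Xen guest the reported values may not reflect what the physical CPUs are doing.

```python
def cpu_mhz(cpuinfo_text):
    """Return the 'cpu MHz' value of every core listed in /proc/cpuinfo text."""
    values = []
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "cpu MHz":
            values.append(float(value))  # float() tolerates surrounding spaces
    return values


def underclocked_cores(mhz_values, nominal_mhz, ratio=0.9):
    """Return the indexes of cores running below ratio * nominal_mhz.

    nominal_mhz is the CPU's rated clock; you have to supply it yourself,
    it is not derived from /proc/cpuinfo here."""
    limit = nominal_mhz * ratio
    return [i for i, mhz in enumerate(mhz_values) if mhz < limit]
```

Read /proc/cpuinfo while a job is running and pass the text to `cpu_mhz`; on an idle machine low clocks are expected whenever power saving is enabled, so only a box that stays slow under load is suspicious.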

PHP 5.3 max_input_vars and big forms

Starting from PHP 5.3.9 there is a brand new php.ini option: max_input_vars. You can read about it in the PHP documentation. But what you probably don’t know is that if you are using the Suhosin patch (for example, if you’re using dotdeb packages), you also need to tweak 2 other variables to increase the maximum number of POST variables accepted by PHP.
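To see the limit in action you need a request with more fields than the limit allows. This Python sketch builds such a form body and counts its variables; the field names (field0, field1 and so on) are hypothetical, and nested names like `a[b][c]` may be counted differently by PHP than by this flat count.

```python
from urllib.parse import parse_qsl, urlencode


def build_form_body(n_fields):
    """Build an application/x-www-form-urlencoded body with n_fields
    distinct fields: field0=x, field1=x, ..."""
    return urlencode({f"field{i}": "x" for i in range(n_fields)})


def count_vars(body):
    """Count the key=value pairs in a urlencoded body (blank values included)."""
    return len(parse_qsl(body, keep_blank_values=True))
```

You could save `build_form_body(1500)` to a file and POST it with `curl --data @body.txt` to a throwaway script that echoes `count($_POST)`; with the default limits you should expect at most 1000 variables to come through.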

So, if you want to raise this limit from the default of 1000 to, say, 3000, put these lines in your php.ini:

max_input_vars = 3000
suhosin.post.max_vars = 3000
suhosin.request.max_vars = 3000

The other suitable option is to fix your form and make it saner :)
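Since a limit that is raised in one place but not in the others still truncates the form, a quick consistency check can help. This is a minimal Python sketch that reads php.ini text and lists which of `max_input_vars`, `suhosin.post.max_vars` and `suhosin.request.max_vars` are not set or are lower than the value you want. It assumes plain `key = integer` lines and skips sections and `;` comments, a simplification of PHP's real ini parser; the function names are hypothetical.

```python
LIMIT_KEYS = ("max_input_vars", "suhosin.post.max_vars", "suhosin.request.max_vars")


def ini_values(ini_text):
    """Parse simple 'key = value' lines, skipping ';' comments and
    [section] headers. Later lines override earlier ones."""
    values = {}
    for raw in ini_text.splitlines():
        line = raw.strip()
        if not line or line.startswith((";", "[")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            values[key.strip()] = value.strip()
    return values


def too_low(ini_text, wanted):
    """Return the limit settings that are unset, not a plain integer,
    or below `wanted`."""
    values = ini_values(ini_text)
    problems = []
    for key in LIMIT_KEYS:
        value = values.get(key)
        if value is None or not value.isdigit() or int(value) < wanted:
            problems.append(key)
    return problems
```

On Debian the Suhosin settings may live in a separate file under /etc/php5/conf.d/, so you may need to concatenate several files before calling `too_low`.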

Apache2: seg fault or similar nasty error detected in the parent process

If you happen to see a message like this

seg fault or similar nasty error detected in the parent process

when reloading Apache2, and you’re using PHP5 through mod_php5, it may be caused by an extension that is loaded via php.ini but not actually present on the system. That was my case with a redis extension, and I banged my head against it for a day before finding it.
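One way to hunt down such a stale entry is to check every `extension=` line against PHP's extension directory (which `php -i | grep extension_dir` prints). The Python sketch below does that for one ini file's text; the helper names are hypothetical, and appending `.so` to bare names is an assumption matching typical Linux builds.

```python
import os


def declared_extensions(ini_text):
    """Return the values of every active 'extension=' line in ini text."""
    names = []
    for raw in ini_text.splitlines():
        line = raw.strip()
        if line.startswith(";"):
            continue  # commented out, so PHP ignores it
        key, sep, value = line.partition("=")
        if sep and key.strip() == "extension":
            names.append(value.strip().strip('"\''))
    return names


def missing_extensions(ini_text, extension_dir):
    """Return the declared extensions whose shared object cannot be found.

    Relative names are looked up in extension_dir; '.so' is appended when
    the name does not already end with it (an assumption, see above)."""
    missing = []
    for name in declared_extensions(ini_text):
        filename = name if name.endswith(".so") else name + ".so"
        if os.path.isabs(filename):
            path = filename
        else:
            path = os.path.join(extension_dir, filename)
        if not os.path.isfile(path):
            missing.append(name)
    return missing
```

Run it over each file PHP loads (on Debian, the mod_php php.ini plus /etc/php5/conf.d/*.ini); any name it returns is a candidate culprit for the segfault.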

HOWTO: install puppet-dashboard on Debian Squeeze

This should apply to Ubuntu Server as well (10.10, 11.04), but I’ve only tested it thoroughly on Debian Squeeze 6.0.
Puppet Dashboard is a neat piece of software, really useful if you are managing a good number of hosts with Puppet.

First of all, install the required deps:

# aptitude install ruby rake dbconfig-common libdbd-mysql-ruby mysql-client rubygems libhttpclient-ruby1.8

You’ll probably have many of them already installed if you are running the Puppet master on the same host (which, by the way, is not mandatory).
Then, download and install the deb package:

# wget
# dpkg -i puppet-dashboard_1.2.0-1_all.deb

Enable the daemon by editing /etc/default/puppet-dashboard, then customize your database definition in /etc/puppet-dashboard/database.yml, which should look something like this:

production:
  database: puppet_dashboard
  username: puppet_dashboard
  password: secret_password
  encoding: utf8
  adapter: mysql

if you plan to use MySQL as a backend. Remember to create the database and grant the appropriate privileges to the user:

CREATE DATABASE puppet_dashboard CHARACTER SET utf8;
GRANT ALL PRIVILEGES ON puppet_dashboard.* TO 'puppet_dashboard'@'%' IDENTIFIED BY 'secret_password';
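If you provision the database from a script instead of typing the statements, quoting is the part that is easy to get wrong. This Python sketch (the helper names are made up) generates the same kind of CREATE DATABASE and GRANT statements with basic MySQL quoting. Note that the `GRANT ... IDENTIFIED BY` form was removed in MySQL 8.0, where the user has to be created with `CREATE USER` first.

```python
def mysql_literal(value):
    """Quote a string as a MySQL string literal, escaping backslashes
    and doubling single quotes."""
    return "'" + value.replace("\\", "\\\\").replace("'", "''") + "'"


def mysql_ident(name):
    """Quote an identifier with backticks, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def dashboard_setup_sql(database, user, password, host="%"):
    """Return the statements that create the database and grant the user
    full privileges on it (pre-8.0 GRANT ... IDENTIFIED BY syntax)."""
    return [
        f"CREATE DATABASE {mysql_ident(database)} CHARACTER SET utf8;",
        f"GRANT ALL PRIVILEGES ON {mysql_ident(database)}.* TO "
        f"{mysql_literal(user)}@{mysql_literal(host)} "
        f"IDENTIFIED BY {mysql_literal(password)};",
    ]
```

Printing the statements one per line and piping them into `mysql -u root -p` is one way to apply them.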

Now we have to populate the database, the Rails way:

# cd /usr/share/puppet-dashboard/
# rake RAILS_ENV=production db:migrate

Now you can start /etc/init.d/puppet-dashboard and /etc/init.d/puppet-dashboard-workers, and you should already be able to access http://your-host.yourdomain.tld:3000 and see the Puppet Dashboard.
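Before going further, it may be worth checking from another host that the Dashboard really answers on port 3000. A minimal Python sketch using only the standard library; `dashboard_is_up` is a hypothetical helper, and treating any 2xx or 3xx status as "up" is an assumption.

```python
import http.client
from urllib.request import urlopen


def dashboard_is_up(url, timeout=5.0):
    """Return True if `url` answers an HTTP GET with a 2xx or 3xx status."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, http.client.HTTPException):
        # Connection refused, DNS failure, timeout, HTTP error status
        # (HTTPError is an OSError subclass) or a non-HTTP reply.
        return False
```

For example, `dashboard_is_up("http://your-host.yourdomain.tld:3000")` should return True once both init scripts are running.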
You just have to do two more things before you can see any actual data in it: enable report sending on the Puppet clients, and tell the Puppet master to forward those reports to the Dashboard via HTTP.

So, edit /etc/puppet/puppet.conf on the clients (I suggest doing it via Puppet itself if the setting is not already there) and add

[agent]
# ... whatever you already have
report = true

and on the Master side

[master]
# ... whatever you already have
reports = store, http
reporturl = http://your-host.yourdomain.tld:3000/reports/upload

That’s it!

esx-halt: shutdown VMWare ESXi from ssh

Long time no post (again). I hope this is the last time and that I can be a little more prolific :) Anyway, today I want to share a little script I hacked together to shut down an ESXi host (with the free license) remotely, safely shutting down all the VMs inside it first. This can be quite useful (in fact, it’s why I wrote it) if you want to shut down ESXi from a UPS daemon when the lights go out and you cannot afford a full ESXi license, so you’re running the free edition.
The script is available on GitHub. It’s written in bash (I use bash 4, but it should run on older versions too). On the server side, it works with VMware ESXi 4.x.
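The bash script itself lives on GitHub; what follows is only an illustrative Python sketch of the same idea, not the script’s actual code. It parses the output of `vim-cmd vmsvc/getallvms` and builds the commands to run over ssh: a `vim-cmd vmsvc/power.shutdown` for every VM, then a final host command. Using `halt` as that final command is an assumption, so check what your ESXi build provides. Also note that `power.shutdown` relies on VMware Tools in the guests, and a real script should wait (for example by polling `vim-cmd vmsvc/power.getstate`) until the VMs are off before stopping the host.

```python
def vm_ids(getallvms_output):
    """Extract VM IDs from `vim-cmd vmsvc/getallvms` output: every row
    whose first column is an integer (the header row is skipped)."""
    ids = []
    for line in getallvms_output.splitlines():
        fields = line.split()
        if fields and fields[0].isdigit():
            ids.append(int(fields[0]))
    return ids


def shutdown_plan(getallvms_output, host_command="halt"):
    """Return the shell commands to run, in order: a guest shutdown request
    for every VM, then host_command (assumed to be 'halt' by default)."""
    commands = [f"vim-cmd vmsvc/power.shutdown {vmid}"
                for vmid in vm_ids(getallvms_output)]
    commands.append(host_command)
    return commands
```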

Questions, patches and bug reports are warmly welcome :)

Customize the console prompt in VMWare ESXi 4.0

The default console prompt of VMware ESXi 4.0 really sucks: it’s black and white, it gives no information about the host you are connected to, and if you have more than one host this quickly becomes a headache.
So, how do you change it? Pretty easy:

echo 'export PS1="\[\033[01;32m\]\u@\h\[\033[00m\]:\[\033[01;34m\]\w\[\033[00m\]\$ "' > $HOME/.profile

then exit the shell (ssh or local) and log in again, and you will have a pretty nice colored console prompt :)

EDIT: WordPress used to strip the “backslash zero” from this string. Make sure every “33” above is preceded by a backslash and a zero (\033), as shown. Thanks to Daniel for pointing this out.
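Because the escape sequences are exactly what got mangled, it can be easier to generate the prompt string than to retype it. This Python sketch builds the same PS1 value; `colored_ps1` is a hypothetical helper, and the color numbers are the standard ANSI codes (32 green, 34 blue).

```python
ESC = "\\033"  # backslash, zero, three, three: the characters the shell expects


def colored_ps1(user_color=32, dir_color=34):
    """Build a bash-style PS1: bold user@host in user_color and a bold
    working directory in dir_color (ANSI color numbers)."""

    def color(code):
        # \[ and \] mark a non-printing sequence so line editing stays aligned.
        return f"\\[{ESC}[01;{code}m\\]"

    reset = f"\\[{ESC}[00m\\]"
    return f"{color(user_color)}\\u@\\h{reset}:{color(dir_color)}\\w{reset}\\$ "
```

Printing `colored_ps1()` gives the exact string to paste between the double quotes of the echo command.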