nginx and the too many open files limit

So, nginx is fast, nginx is light, nginx is great but… nginx can be nasty too, with undocumented, unexpected behaviors. What happened today? We were putting into production a new reverse proxy based on nginx 1.4.1, and one of the obvious things you do when putting it into production is to raise the nofile limit from the standard 1024 (at least on Debian). So, you would expect nginx to inherit those little pam_limits numbers but… no!! If you check /proc/$nginx_worker_PID/limits you will see that nofile is still 1024. So, obviously someone is cheating. Looking at the nginx documentation about nofile, you will see that the interesting option worker_rlimit_nofile has no default value, so one would think the value is inherited from the system configuration but, as you have already figured out, that's not the case. You have to set it explicitly, for example:

worker_rlimit_nofile 100000;

in the main context of nginx.conf to get the limit you want. BTW this overrides limits.conf even if limits.conf has a lower value, so if you're using nginx >= 1.4 just tune this configuration option to solve the “too many open files” problem.
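
To double-check that the directive is actually picked up, you can reload nginx and look at the worker limits again (a quick sketch, assuming pgrep is available and the workers show up as "nginx: worker process" in the process list):

$ nginx -s reload
$ grep 'Max open files' /proc/$(pgrep -f 'nginx: worker' | head -n1)/limits

The limit reported there should now match worker_rlimit_nofile instead of 1024.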

Xen, XFS and the barrier error

Probably nobody is going to hit this, but since I've lost half a morning puzzling over this error, I think it would be useful to blog about it.
We have some old Xen machines running Debian Lenny with Xen 3.2, and while upgrading a VM with an XFS root partition to Debian Wheezy (it was running Squeeze), I got this nice error:

[ 5.024330] blkfront: xvda1: barrier or flush: disabled
[ 5.024338] end_request: I/O error, dev xvda1, sector 4196916
[ 5.024343] end_request: I/O error, dev xvda1, sector 4196916
[ 5.024360] XFS (xvda1): metadata I/O error: block 0x400a34 ("xlog_iodone") error 5 buf count 3072
[ 5.024369] XFS (xvda1): xfs_do_force_shutdown(0x2) called from line 1007 of file /build/linux-s5x2oE/linux-3.2.46/fs/xfs/xfs_log.c. Return address = 0xffffffffa009fed5
[ 5.024394] XFS (xvda1): Log I/O Error Detected. Shutting down filesystem
[ 5.024401] XFS (xvda1): Please umount the filesystem and rectify the problem(s)
[ 5.024411] XFS (xvda1): xfs_log_force: error 5 returned.
[ 5.024419] XFS (xvda1): xfs_do_force_shutdown(0x1) called from line 1033 of file /build/linux-s5x2oE/linux-3.2.46/fs/xfs/xfs_buf.c. Return address = 0xffffffffa005f8a7

This strange error simply means that the underlying device doesn't provide barrier support! The simple, quick fix (once you know it) is to add the nobarrier option in fstab. I think I hit this problem because in Linux 3.2 the barrier option is enabled by default, while it was off in previous kernel versions.
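
For reference, the relevant fstab line would look something like this (device and mount point here are just the ones from my setup, adjust to yours):

/dev/xvda1  /  xfs  defaults,nobarrier  0  1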

Very slow Hadoop on PowerEdge R815

We have a little internal Hadoop cluster for development and testing: two very powerful Dell PowerEdge R815 servers with Debian and a bunch of Xen VMs to reproduce a production environment. The problem is that the cluster, even with a relatively small amount of data, was sloooow. And when I say slow I mean almost unusable for Hadoop development (a MapReduce job on a small dataset took 5x longer than on the much bigger one in production). Even an insignificant

$ hadoop fs -ls

took more than 4s to list the contents of HDFS. strace was showing tons of wait() syscalls for no apparent reason, while on the production system the same operation takes about 1s with no wait() at all.
After trying almost everything (even without Xen, running Hadoop on bare metal), by chance I changed a Power Management option in the R815 BIOS. By default it was set to Active Power Controller; changing it to Maximum Performance did the trick! The ls now takes about a second, just like in the production environment. My guess is that the default value (which is some kind of automagical load detection) wasn't able to see that the machine really needed power when running Hadoop, leaving the CPUs underclocked to save energy. Maximum Performance probably is not so green, but it solved the problem.
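
For what it's worth, a quick way to see whether the cores are being clocked down at a given moment (assuming the kernel exposes cpufreq, which it may not when the BIOS keeps power management to itself) is:

$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
$ grep 'cpu MHz' /proc/cpuinfo | sort | uniq -c

If the reported frequencies sit well below the nominal CPU speed while a job is running, power management is a good suspect.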

PHP 5.3 max_input_vars and big forms

Starting from PHP 5.3.9 there is a brand new php.ini option: max_input_vars. You may read about it in the PHP documentation. But what you probably don't know is that if you are using the Suhosin patch (for example if you're using dotdeb packages), then you need to tweak two other variables to increase the maximum number of POST variables accepted by your PHP.

So, if you want to increase this number to, say, 3000 from the default of 1000, you have to put these lines in your php.ini:

max_input_vars = 3000
suhosin.post.max_vars = 3000
suhosin.request.max_vars = 3000
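
A quick way to check which values are actually in effect (keeping in mind that the CLI may read a different php.ini than your web SAPI) is:

$ php -i | grep -E 'max_input_vars|suhosin.(post|request).max_vars'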

The other sensible option is to fix your form and make it saner :)

*fdisk and the 1.5TB partition size limit

If you have a large volume (like a disk array or a next-generation SATA disk) and you're trying to create a single, giant partition for whatever reason, you should know that fdisk (for DOS compatibility reasons, I suppose) cannot create partitions bigger than ~1.5TB, although it won't throw any error or complain. So if you want to create a bigger partition, use parted (or one of its frontends). The limitation applies to fdisk, cfdisk and the whole *fdisk family.

EDIT: in parted you have to change the disk label (the partition table) to something like gpt, or you still won't be able to create a partition bigger than 1.5TB.
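
For reference, the whole dance in parted looks roughly like this (a destructive sketch: mklabel wipes the existing partition table, and the device name is simply the one from the session below):

server:~# parted /dev/mapper/mpath1
(parted) mklabel gpt
(parted) mkpart primary ext2 0 6000GB
(parted) print
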
Nonetheless, once the very large partition is created, I still haven’t found a way to format it, mount it and get all my terabytes. I’m still stuck with 1.5TB. Look at this:

server:~# parted /dev/mapper/mpath1
GNU Parted 1.7.1
Using /dev/mapper/mpath1
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) print
Disk /dev/mapper/mpath1: 6000GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Number  Start   End     Size    File system  Name  Flags
 1      17.4kB  6000GB  6000GB  ext2
server:~# df -h |grep mpath1p1
/dev/mapper/mpath1p1 1.5T 5.1M 1.5T 1% /mnt/logs

I’m stuck with this. Any idea, dear lazyweb?

FreeBSD6 and nfsd gotchas

If you're using FreeBSD 6 as an NFS server, you may find these quick tips about /etc/exports syntax useful, because otherwise you will be stuck with a generic

mountd[321]: bad exports list

in your logs.
So, what went wrong here?
One possible cause is that in your exports file you're trying to export a symlink instead of a real directory as an NFS share. NFS doesn't like that at all and will simply not work.
Another curious glitch I found is that if you have two resources on two separate lines with the same options, the latter will fail.
Example of /etc/exports:

/path/share1
/path/share2 -network 192.168.1.0
/path/share3 -network 192.168.1.0

In this case share1 and share2 will work, while share3 won't, and you'll get a

mountd[321]: can't change attributes for /path/share3
mountd[321]: bad exports list line /path/share3

but if you change the network value for share3 (and only that), it will work!
Maybe there's an explanation for this (I didn't read the whole exports(5) manpage), but anyway it's a little bit strange.
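
For what it's worth, if share2 and share3 happen to live on the same filesystem (an assumption on my part), the usual workaround is to list both directories on a single export line, since exports(5) seems to be picky about exporting the same filesystem more than once to the same host set:

/path/share1
/path/share2 /path/share3 -network 192.168.1.0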