Wednesday, November 11, 2009

About Linux swappiness

For the longest time, operating systems have been able to handle swap. In short, swap extends physical memory with slow disk space so that applications can use more memory than is physically available.
On most Unix systems the swap lives in a dedicated partition because that has the lowest overhead. Plus you don't risk running out of disk space when you want to swap, so things are quite predictable and nice. Linux has a very nice knob you can turn to affect the swap policy. It will not prevent swapping (in some situations you will have to), but it will affect how and when swap is used. That knob is /proc/sys/vm/swappiness.
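To see what the knob is currently set to, you can simply read that file. Here is a minimal Python sketch; the helper names parse_swappiness and read_swappiness are my own for illustration, and it assumes a Linux machine with /proc mounted (elsewhere it just returns None).

```python
SWAPPINESS_PATH = "/proc/sys/vm/swappiness"


def parse_swappiness(text):
    """Parse the contents of the swappiness file into an integer."""
    value = int(text.strip())
    if value < 0:
        raise ValueError(f"swappiness cannot be negative: {value}")
    return value


def read_swappiness(path=SWAPPINESS_PATH):
    """Return the current swappiness, or None if the file cannot be read."""
    try:
        with open(path) as f:
            return parse_swappiness(f.read())
    except OSError:
        # Not a Linux system, or /proc is not mounted.
        return None


if __name__ == "__main__":
    print("vm.swappiness =", read_swappiness())
```

Reading the value needs no special privileges; only changing it does.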
The kernel default is a value of 60. The value can be between 0 and 100 and is effectively a percentage. It is used roughly in the following way:
If all available memory is in use (application memory, buffers and filesystem cache) and a new memory allocation is requested, the kernel needs to free a few pages of memory. It can either swap out application memory or drop some filesystem cache. The "swappiness" knob affects the probability of which one is chosen.
This means that at a swappiness of 0 the kernel will try to never swap out a process, and at 100 it will try to always swap out processes and keep the filesystem cache intact. So with the default, if you use more than about 40% of your memory for applications and the rest is used as filesystem cache, it will already start swapping a bit. The hilarious result is that you may end up swapping a lot with lots of memory left - think of a machine with 64GB RAM! If you try to use 32GB of memory you'll be in swap hell.
That default might have been good for machines with less than 256MB RAM, but on current desktops and servers it is usually not optimal.
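The arithmetic behind the 64GB example can be written down in a few lines. Keep in mind this only encodes the rough rule of thumb used in this post (swapping starts once applications use more than (100 - swappiness)% of RAM); it is not the kernel's actual reclaim algorithm, and the function name swap_threshold_gb is made up for this sketch.

```python
def swap_threshold_gb(total_ram_gb, swappiness):
    """Estimate how much application memory (in GB) can be used before
    swapping starts, using this post's rule of thumb.

    This is a simplification for illustration, not what the kernel computes.
    """
    if not 0 <= swappiness <= 100:
        raise ValueError("this rule of thumb assumes swappiness in 0..100")
    return total_ram_gb * (100 - swappiness) / 100


if __name__ == "__main__":
    for value in (60, 20, 10):
        threshold = swap_threshold_gb(64, value)
        print(f"swappiness {value}: swapping starts around {threshold:.1f} GB")
```

For the 64GB machine this gives about 25.6GB at the default of 60, which is why trying to use 32GB already hurts.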
Now you might be tempted to tune it down to 0. Avoid swap. Swap is slow. All is good?
Not quite. At 0 your machine will try to avoid swapping until the last moment. By then it will have killed all filesystem cache (so every file operation will hit the disks), and on top of that you start swapping like a madman. The result is usually a "swap storm" that hits very suddenly. At the point where you might need some performance your machine doesn't provide it and might just be unresponsive to your input for a few minutes.
The other end (a value near 100) might make sense for a file server, but then it might be cheaper to just not run extra services on a machine that is very loaded already. I don't really see a use case for a swappiness of 100 except maybe on machines that are very memory-limited.
On my desktop I've found a swappiness of 10-20 to be the sweet spot. This means that when 80%+ of memory is used by applications the machine will start swapping, but it's a more gradual hit and not an instant kill. And because there's still some filesystem cache the responsiveness for starting new processes (like a login shell ;) ) is still high enough to allow recovery from this pessimal system state.
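If you want to try such a value yourself, the knob can be changed at runtime by writing to the same file (as root), and made permanent with a vm.swappiness line in /etc/sysctl.conf. A small sketch, with hypothetical helper names and the 0..100 range described above assumed:

```python
SWAPPINESS_PATH = "/proc/sys/vm/swappiness"


def check_swappiness(value):
    """Validate a swappiness value against the 0..100 range used in this post."""
    if not isinstance(value, int) or not 0 <= value <= 100:
        raise ValueError(f"swappiness must be an integer in 0..100, got {value!r}")
    return value


def sysctl_conf_line(value):
    """Return the line to put in /etc/sysctl.conf to persist the setting."""
    return f"vm.swappiness = {check_swappiness(value)}"


def set_swappiness(value, path=SWAPPINESS_PATH):
    """Change swappiness at runtime. Needs root; the change is lost on reboot."""
    check_swappiness(value)
    with open(path, "w") as f:
        f.write(f"{value}\n")


if __name__ == "__main__":
    # Only print the persistent config line; call set_swappiness(10)
    # yourself (as root) if you want to change the running system.
    print(sysctl_conf_line(10))
```

The runtime change takes effect immediately, so it is easy to experiment with a few values before settling on one.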
Still, your goal for optimal performance should be to avoid swapping. Disk access is slower than RAM by a factor of 1000 or more!