This occurred to me while looking at our Hadoop servers today: lots of our devs use IOWait as an indicator of IO performance, but there are better measures. IOWait is a CPU metric: it measures the percentage of time the CPU is idle while waiting for an I/O to complete. Strangely, it is possible to have a healthy system with nearly 100% iowait, or a disk bottleneck with 0% iowait. A much better approach is to look at disk IO directly, and the number you want is IOPS (IO operations per second).
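To make the "IOWait is a CPU metric" point concrete, here is a minimal sketch of how the percentage can be derived from two samples of the aggregate cpu line in /proc/stat. The function name and the sample numbers are mine, purely for illustration. Note that the disk never appears in the calculation: it is just a share of CPU time.

```python
def iowait_percent(before: str, after: str) -> float:
    """Percent of CPU time spent in iowait between two 'cpu' lines from /proc/stat."""
    # Fields after the "cpu" label: user nice system idle iowait irq softirq steal ...
    # Only the first eight are summed, because guest time is already counted in user/nice.
    old = [int(x) for x in before.split()[1:9]]
    new = [int(x) for x in after.split()[1:9]]
    deltas = [n - o for o, n in zip(old, new)]
    total = sum(deltas)
    # iowait is the fifth field (index 4)
    return 100.0 * deltas[4] / total if total else 0.0


# Made-up samples taken some time apart
sample_1 = "cpu  100 0 50 800 50 0 0 0"
sample_2 = "cpu  200 0 100 1500 200 0 0 0"
print(iowait_percent(sample_1, sample_2))  # 15.0
```

Because the denominator is total CPU time, a busier CPU shrinks the iowait share even if the disks are doing exactly the same work.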

Measuring IOPS

On Linux I like the iostat command, though there are a few ways to get at the info. On Debian/Ubuntu it is in the sysstat package (i.e. sudo apt-get install sysstat).

root@MACHINENAME:/home/deploy# iostat 1
Linux 2.6.24-28-server (MACHINENAME.forward.co.uk)  18/02/11
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          45.51    0.00    1.85    0.62    0.00   52.03

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
cciss/c0d0        4.00         0.00        40.00          0         40
cciss/c0d1        4.00         0.00        64.00          0         64
cciss/c0d2       12.00         0.00       248.00          0        248
cciss/c0d3        0.00         0.00         0.00          0          0
cciss/c0d4       25.00         0.00       320.00          0        320
cciss/c0d5        0.00         0.00         0.00          0          0
cciss/c0d6       30.00         0.00       344.00          0        344
cciss/c0d7       42.00      3144.00         0.00       3144          0


iostat 1 refreshes every second; with a longer interval it averages the results over that interval. tps is the column you are interested in: transfers per second (i.e. IOPS). Adding -x gives more detailed output that separates reads from writes and shows how much data is going in and out per second.
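If you would rather get at the numbers yourself, the kernel exposes the underlying counters in /proc/diskstats. The sketch below is my own illustration, not how iostat is implemented line for line: it diffs the "reads completed" and "writes completed" counters between two samples and divides by the interval, which should land close to the tps figure. The function names are hypothetical.

```python
import time


def parse_diskstats(text: str) -> dict:
    """Map device name -> (reads completed, writes completed)."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        # Layout: major minor name reads_completed reads_merged sectors_read
        #         ms_reading writes_completed ...
        if len(fields) < 8:
            continue  # skip blank lines and short partition lines on old kernels
        counters[fields[2]] = (int(fields[3]), int(fields[7]))
    return counters


def iops(before: str, after: str, interval_s: float) -> dict:
    """Per-device (read IOPS, write IOPS) between two /proc/diskstats samples."""
    old, new = parse_diskstats(before), parse_diskstats(after)
    result = {}
    for dev, (reads_new, writes_new) in new.items():
        if dev in old:
            reads_old, writes_old = old[dev]
            result[dev] = ((reads_new - reads_old) / interval_s,
                           (writes_new - writes_old) / interval_s)
    return result


def live_iops(interval_s: float = 1.0) -> dict:
    """Sample the running system twice (Linux only)."""
    with open("/proc/diskstats") as f:
        before = f.read()
    time.sleep(interval_s)
    with open("/proc/diskstats") as f:
        after = f.read()
    return iops(before, after, interval_s)
```

Calling live_iops(1.0) on a Linux box returns something like a dict of device name to (reads/s, writes/s); adding the two together gives the total per device.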

What is a good or bad number though? As with most metrics, if the first time you look at it is when you are already in trouble, it is less helpful. You should have an idea of how much IO you typically do; then, if you hit problems and are doing 10x that, or only getting 1/10 of it from the disks, you have a good candidate explanation for them.
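As a rough sketch of that baseline check: the function name, the labels, and the default factor of 10 are just my reading of the 10x / 1/10 rule above, not a standard threshold.

```python
def compare_to_baseline(current_iops: float, baseline_iops: float,
                        factor: float = 10.0) -> str:
    """Classify current IOPS against a known-typical baseline."""
    if baseline_iops <= 0:
        raise ValueError("baseline_iops must be positive")
    ratio = current_iops / baseline_iops
    if ratio >= factor:
        return "far above baseline"   # doing much more IO than usual
    if ratio <= 1.0 / factor:
        return "far below baseline"   # disks delivering much less than usual
    return "within normal range"


print(compare_to_baseline(800, 70))  # far above baseline
print(compare_to_baseline(5, 70))    # far below baseline
print(compare_to_baseline(80, 70))   # within normal range
```

The only hard part is having the baseline number in the first place, which is why it pays to record it while things are healthy.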

How much can I expect from my storage? It depends on how fast the disks are spinning and how many there are. As a rule of thumb, I assume for a single disk:

7.2k RPM -> ~100 IOPS
10k RPM -> ~150 IOPS
15k RPM -> ~200 IOPS

Our Hadoop servers were pushing about 70 IOPS to each disk at peak, and they are 7.2k RPM disks, so that is in line with this estimate.
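Here is the rule of thumb as a small sketch. The per-disk figures are the ones above; scaling linearly by the number of disks is my assumption and only holds when the disks are used independently (no RAID write penalty, no shared controller limit). The last function is a simple textbook-style model, also an assumption on my part: each random IO costs one average seek plus half a rotation, with the seek time supplied by you.

```python
# Rule-of-thumb random IOPS per disk, taken from the figures above.
RULE_OF_THUMB_IOPS = {7200: 100, 10000: 150, 15000: 200}


def expected_iops(rpm: int, disks: int = 1) -> int:
    """Rough aggregate random IOPS, assuming independent disks."""
    return RULE_OF_THUMB_IOPS[rpm] * disks


def rotational_latency_ms(rpm: int) -> float:
    """Average wait for the platter: half a revolution, in milliseconds."""
    return 30000.0 / rpm  # 0.5 revolutions * 60,000 ms per minute / rpm


def modelled_iops(rpm: int, avg_seek_ms: float) -> float:
    """Simple model: one average seek plus half a rotation per random IO."""
    return 1000.0 / (avg_seek_ms + rotational_latency_ms(rpm))


print(expected_iops(7200, disks=8))  # 800 for eight 7.2k disks
print(70 / expected_iops(7200))      # 0.7, i.e. 70 IOPS is 70% of the estimate
print(rotational_latency_ms(15000))  # 2.0
```

Plugging a drive's quoted average seek time into modelled_iops gives a quick sanity check on the rule-of-thumb figure for that spindle speed.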

See here for a breakdown of why these are good estimates for random IOs from a single disk. Interestingly, a large part of it comes from the latency of the platter spinning, which is why SSDs do so well for random IO (compared to a 15k disk, ~50x for writes and ~200x for reads).

See also: a concrete example of a faster CPU causing higher %iowait while actually doing more transactions here.

Extreme Linux Performance Monitoring and Tuning: Part 1 (pdf) and Part 2 (pdf) from ufsdump.org/