Wednesday, 27 February 2013

Troubleshoot performance part 2 - using vmstat to see if it is the CPU or IO

Is it CPU or IO with vmstat?

vmstat provides a coarse overview of the health of the system. On some systems you need to be root or have system admin rights to use it, although on most Linux distributions an ordinary user can run it too. Usually you can tell from the vmstat report whether the problem is on the CPU side or the IO side.

vmstat takes two arguments: interval and count. The interval is the number of seconds covered by each report, and vmstat shows the averages for that period. The count is the number of reports to print. The columns in the report are:
  • Procs – r: Number of runnable processes (running or waiting for CPU time)
  • Procs – b: Number of processes blocked in uninterruptible sleep (typically waiting for IO)
  • Memory – swpd: Virtual memory (swap) in use
  • Memory – free: Idle (unused) memory
  • Memory – buff: Memory used as buffers
  • Memory – cache: Memory used as cache
  • Swap – si: Memory swapped in from disk (per second)
  • Swap – so: Memory swapped out to disk (per second)
  • IO – bi: Blocks in, i.e. blocks received from a block device (per second)
  • IO – bo: Blocks out, i.e. blocks sent to a block device (per second)
  • System – in: Interrupts per second
  • System – cs: Context switches per second
  • CPU – us, sy, id, wa, st: Percentage of CPU time spent in user code, in system (kernel) code, idle, waiting for IO, and stolen by a hypervisor
[root@soaserver /]# vmstat 2 3
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd     free   buff   cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 12682768 147204 1459768    0    0   113     7  172   27  1  0 99  1  0
 0  0      0 12682768 147204 1459768    0    0     0    16 1005  308  0  0 100  0  0
 0  0      0 12682768 147212 1459764    0    0     0    22 1005  301  0  0 100  0  0
[root@soaserver /]#

The first line is a summary since the last boot. For our purposes it is best to ignore it, as it says little about the situation right now. Some people use the first line as a baseline to judge whether the system is getting worse or better.
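If you want to process the report in a script, the since-boot line can be dropped while parsing. The following is a minimal sketch, assuming the Linux column layout shown in the example run. The function name `parse_vmstat` and its `skip_since_boot` parameter are made up for this illustration and are not part of vmstat.

```python
# Sample text copied from the example run; any captured vmstat output would do.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd     free   buff   cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 12682768 147204 1459768    0    0   113     7  172   27  1  0 99  1  0
 0  0      0 12682768 147204 1459768    0    0     0    16 1005  308  0  0 100  0  0
 0  0      0 12682768 147212 1459764    0    0     0    22 1005  301  0  0 100  0  0
"""


def parse_vmstat(text, skip_since_boot=True):
    """Turn vmstat output into a list of {column: int} dicts, one per report line."""
    columns = None
    samples = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "procs":
            continue  # blank line or the "procs ---memory---" banner
        if fields[0] == "r":
            columns = fields  # the column-name header (may repeat in long runs)
            continue
        # Keep only rows that look like data: same width as the header, all digits.
        if columns and len(fields) == len(columns) and all(f.isdigit() for f in fields):
            samples.append(dict(zip(columns, map(int, fields))))
    # The first report line is the average since boot, so drop it by default.
    return samples[1:] if skip_since_boot else samples


if __name__ == "__main__":
    for sample in parse_vmstat(SAMPLE):
        print(sample["r"], sample["us"], sample["sy"], sample["id"], sample["wa"])
```

Keeping the since-boot line optional lets you still use it as a baseline when you want one.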

Pay special attention to the bi and bo fields. They show how much the system is reading from disk (bi) and writing to disk (bo). A persistently high value in either or both suggests that the system is IO bound, and you need to figure out which process or subsystem is behind it.
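What counts as "high" depends on your disks, so any threshold is a judgment call. As a rough sketch, the check could look like this in Python. The helper `io_heavy_samples` and the default of 1000 blocks per second are hypothetical values chosen for illustration, not recommendations.

```python
def io_heavy_samples(samples, blocks_per_sec=1000):
    """Return the reports whose read (bi) or write (bo) rate reaches the threshold.

    blocks_per_sec is a hypothetical cut-off; choose one that fits your hardware.
    """
    return [
        s for s in samples
        if s["bi"] >= blocks_per_sec or s["bo"] >= blocks_per_sec
    ]


samples = [
    {"bi": 0, "bo": 16, "wa": 0},      # from the example run above
    {"bi": 0, "bo": 22, "wa": 0},      # from the example run above
    {"bi": 5200, "bo": 40, "wa": 35},  # made-up busy report, for illustration only
]

if __name__ == "__main__":
    for s in io_heavy_samples(samples):
        print("IO heavy:", s)
```

A single spike is rarely interesting; it is the reports that stay above the threshold for several intervals that deserve a closer look.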

You can use the user (us), system (sy) and idle (id) columns to see whether there is a lot of user space processing, a lot of kernel level activity, or a lot of idle time, which tells you whether the CPU is loaded or not. However, these are averages across all CPUs. On a multi-CPU node one CPU can be hot while another is free, so bear in mind that this is a coarse average.

The first field (r, the number of processes ready to run) tells how many processes have been waiting for an available CPU, which also says something about CPU load or CPU saturation. A number of processes may wake up at the same time and all need CPU, so they have to queue for a free one, while for the remainder of the measuring interval (2 seconds in our example) the CPU is free. Because vmstat reports averages over the measuring interval, the CPU can be saturated for a moment and the report can still show a high amount of idle time.
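The two signals can be read together, as in the sketch below. Comparing r with the number of CPUs is a common rule of thumb rather than something vmstat itself reports, and `cpu_summary` with its result keys is a hypothetical helper made up for this example.

```python
def cpu_summary(sample, ncpus):
    """Summarise one vmstat report from the CPU point of view.

    ncpus is passed in explicitly; os.cpu_count() could supply it on a live box.
    """
    busy = sample["us"] + sample["sy"]
    return {
        "busy_pct": busy,                               # user + system time
        "mostly_user": sample["us"] >= sample["sy"],    # user space vs kernel work
        "run_queue_exceeds_cpus": sample["r"] > ncpus,  # rule-of-thumb saturation hint
    }


if __name__ == "__main__":
    # Made-up report: six runnable processes on four CPUs, yet 55% idle on average.
    bursty = {"r": 6, "us": 40, "sy": 5, "id": 55}
    print(cpu_summary(bursty, ncpus=4))
```

The made-up report shows the case described above: the run queue is longer than the CPU count even though the averaged idle time is still high.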

If there is a lot of si (swap in) and so (swap out) activity, the system is swapping: the processes are using more memory than there is physical memory, so the system constantly has to write unused memory blocks to disk and read memory from disk back into main memory. This will kill the performance of the system. The remedy is to add memory or to reduce the memory consumption of the processes.
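A simple way to tell sustained swapping from an occasional blip is to count how many reports show any si or so activity. This is only a sketch: `looks_like_swapping` and its `min_active_fraction` default are hypothetical choices for illustration.

```python
def looks_like_swapping(samples, min_active_fraction=0.5):
    """Return True if at least min_active_fraction of the reports show swap traffic.

    min_active_fraction is a hypothetical cut-off, not a vmstat setting.
    """
    if not samples:
        return False
    active = sum(1 for s in samples if s["si"] > 0 or s["so"] > 0)
    return active / len(samples) >= min_active_fraction


if __name__ == "__main__":
    quiet = [{"si": 0, "so": 0}, {"si": 0, "so": 0}]  # as in the example run
    thrashing = [                                     # made-up values
        {"si": 120, "so": 340},
        {"si": 0, "so": 0},
        {"si": 80, "so": 410},
    ]
    print(looks_like_swapping(quiet), looks_like_swapping(thrashing))
```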

Links: Lots of good examples of vmstat and other related tools:

For a good video on vmstat that is Solaris based, see this set of videos:
