ESXTOP in quick – Troubleshoot common storage issues

Always remember by these mix of letters : VDP (With) CUMIN , I am sorry but I can’t remember all letters but this way 🙂

v = disk VM

d = disk adapter

p = power states

c = cpu

u = disk device (includes NFS as of 4.0 Update 2)

m = memory
i = interrupts
n = network

Pressing e and write the GID of the Vm to display only related process:
VM to expand/rollup to vscsi device (gid):

V = only show virtual machine worlds
e = Expand/Rollup CPU statistics, show details of all worlds associated with group (GID)
k = kill world, for tech support purposes only!
l = limit display to a single group (GID), enables you to focus on one VM
# = limiting the number of entitites, for instance the top 5

From <http://www.yellow-bricks.com/esxtop/>

I noticed Only M & C you can press Shift + v to display only VMs.

M

C

Pressing F to select what columns to display:

This is for Network display , with * means to display this property only

So what is elected above reflects on displayed columns below.

And so one for others., Also you can batch the output to external file:

esxtop -b -a –d 60 >> output.csv

60=inseconds

Press h to get help about this nice command:

m Memory

C CPU

n Network

d Disk Adapter

u Disk Drive

v Virtual Disk

Metrics and Thresholds

From <http://www.yellow-bricks.com/esxtop/>

Display Metric Threshold Explanation
CPU %RDY 10 Overprovisioning of vCPUs, excessive usage of vSMP or a limit(check %MLMTD) has been set. See Jason’s explanationfor vSMP VMs
CPU %CSTP 3 Excessive usage of vSMP. Decrease amount of vCPUs for this particular VM. This should lead to increased scheduling opportunities.
CPU %SYS 20 The percentage of time spent by system services on behalf of the world. Most likely caused by high IO VM. Check other metrics and VM for possible root cause
CPU %MLMTD 0 The percentage of time the vCPU was ready to run but deliberately wasn’t scheduled because that would violate the “CPU limit” settings. If larger than 0 the world is being throttled due to the limit on CPU.
CPU %SWPWT 5 VM waiting on swapped pages to be read from disk. Possible cause: Memory over commitment.

From <http://www.yellow-bricks.com/esxtop/>

Display Metric Threshold Explanation
MEM MCTLSZ 1 If larger than 0 host is forcing VMs to inflate balloon driver to reclaim memory as host is overcommited.
MEM SWCUR 1 If larger than 0 host has swapped memory pages in the past. Possible cause: Overcommitment.
MEM SWR/s 1 If larger than 0 host is actively reading from swap(vswp). Possible cause: Excessive memory overcommitment.
MEM SWW/s 1 If larger than 0 host is actively writing to swap(vswp). Possible cause: Excessive memory overcommitment.
MEM CACHEUSD 0 If larger than 0 host has compressed memory. Possible cause: Memory overcommitment.
MEM ZIP/s 0 If larger than 0 host is actively compressing memory. Possible cause: Memory overcommitment.
MEM UNZIP/s 0 If larger than 0 host has accessing compressed memory. Possible cause: Previously host was overcommited on memory.
MEM N%L 80 If less than 80 VM experiences poor NUMA locality. If a VM has a memory size greater than the amount of memory local to each processor, the ESX scheduler does not attempt to use NUMA optimizations for that VM and “remotely” uses memory via “interconnect”. Check “GST_ND(X)” to find out which NUMA nodes are used.

From <http://www.yellow-bricks.com/esxtop/>

Display Metric Threshold Explanation
NETWORK %DRPTX 1 Dropped packets transmitted, hardware overworked. Possible cause: very high network utilization
NETWORK %DRPRX 1 Dropped packets received, hardware overworked. Possible cause: very high network utilization

From <http://www.yellow-bricks.com/esxtop/>

Display Metric Threshold Explanation
DISK GAVG 25 Look at “DAVG” and “KAVG” as the sum of both is GAVG.
DISK DAVG 25 Disk latency most likely to be caused by array.
DISK KAVG 2 Disk latency caused by the VMkernel, high KAVG usually means queuing. Check “QUED”.
DISK QUED 1 Queue maxed out. Possibly queue depth set to low. Check with array vendor for optimal queue depth value.
DISK ABRTS/s 1 Aborts issued by guest(VM) because storage is not responding. For Windows VMs this happens after 60 seconds by default. Can be caused for instance when paths failed or array is not accepting any IO for whatever reason.
DISK RESETS/s 1 The number of commands reset per second.
DISK CONS/s 20 SCSI Reservation Conflicts per second. If many SCSI Reservation Conflicts occur performance could be degraded due to the lock on the VMFS

Advertisements

About Ahmad Sabry ElGendi

https://www.linkedin.com/pub/ahmad-elgendi/94/223/559
This entry was posted in Uncategorized, VCAP5-DCA, Vmware. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s