25th September 2017

Objective 7.4 – Troubleshoot Virtual Machines

Onwards through section 7, moving to Objective 7.4 – Troubleshoot Virtual Machines.

As always, this article is linked to from the main VCP6.5-DCV Blueprint.

Happy Revision

Simon

Monitor CPU and memory usage

Hosts

The hosts charts contain information about CPU, disk, memory, network, and storage usage for hosts. The help topic for each chart contains information about the data counters displayed in that chart. The counters available are determined by the collection level set for vCenter Server.

CPU (%)

The CPU (%) chart displays CPU usage for the host as a percentage.

CPU (MHz)

The CPU (MHz) chart displays CPU usage for the host in megahertz.

CPU Usage

The CPU Usage chart displays CPU usage of the 10 virtual machines on the host with the most CPU usage.

Memory (%)

The Memory (%) chart displays host memory usage.

Memory (Balloon)

The Memory (Balloon) chart displays balloon memory on a host.

Memory (MBps)

The Memory (MBps) chart displays the swap in and swap out rates for a host.

Memory (MB)

The Memory (MB) chart displays memory data counters for hosts.

Memory Usage

The Memory Usage chart displays memory usage for the 10 virtual machines on the host with the most memory usage.

Virtual Machines

The virtual machine charts contain information about CPU, disk, memory, network, storage, and fault tolerance for virtual machines. The help topic for each chart contains information about the data counters displayed in that chart. The counters available are determined by the collection level set for vCenter Server.

CPU (%)

The CPU (%) chart displays virtual machine CPU usage and ready values.

CPU Usage (MHz)

The CPU Usage (MHz) chart displays virtual machine CPU usage.

Memory (%)

The Memory (%) chart monitors virtual machine memory usage.

Memory (MB)

The Memory (MB) chart displays virtual machine balloon memory.

Memory (MBps)

The Memory (MBps) chart displays virtual machine memory swap rates.

Memory (MB)

The Memory (MB) chart displays memory data counters for virtual machines.

Identify and isolate CPU and memory contention issues

When troubleshooting CPU and memory contention issues, you are looking for the following:

For CPU contention, the metric you should be primarily interested in as an administrator is ready time, shown in esxtop as %RDY. This measurement indicates the time that the VM had work ready to run but could not be scheduled on a physical CPU by the hypervisor. In an ideal world ready time would be zero, although in practice it seldom is. I would suggest that consistently elevated ready time, or ready time in the region of 5 to 10%, could well be indicative of an oversubscribed host or a misconfigured VM.
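The vCenter performance charts report CPU ready as a summation in milliseconds rather than as the percentage that esxtop shows, so a conversion helps when comparing the two. The sketch below assumes the real-time chart's 20-second sampling interval; the function name is my own, and the value is for a single vCPU.

```python
def ready_percent(ready_ms: float, interval_s: float = 20.0) -> float:
    """Convert a CPU ready summation (milliseconds) to a percentage.

    interval_s is the chart sampling interval; 20 seconds is assumed here
    because that is the real-time chart interval.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    # Ready time as a share of the whole sampling interval.
    return ready_ms * 100.0 / (interval_s * 1000.0)


# 1000 ms of ready time within one 20-second sample.
print(ready_percent(1000))
```

Historical charts use longer intervals, so pass the matching `interval_s` when reading past-day or past-week data.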

Moving on to CPU load, esxtop gives us a CPU load average to help identify the host's CPU load. With this metric, 1.00 equates to 100% utilised, 2.00 equates to 200% utilised and 0.50 equates to 50% utilised. Three values are displayed, each reporting the same metric over a different period (the last 1, 5 and 15 minutes), so the leftmost is in effect the real-time value.
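As a rough illustration of reading those load values, the sketch below turns each one into a percentage and flags anything above 1.00. It assumes the three values are the 1, 5 and 15 minute averages; the function name and sample numbers are invented for the example.

```python
def load_report(averages):
    """Return (window, percent utilised, over 100%?) for each load average."""
    labels = ("1 min", "5 min", "15 min")
    return [
        (label, value * 100.0, value > 1.0)
        for label, value in zip(labels, averages)
    ]


# Illustrative values only, not taken from a real host.
for row in load_report((0.5, 1.0, 2.0)):
    print(row)
```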

To diagnose memory contention at the VM you can examine the memory swapinrate, swapoutrate and ballooning. If a virtual machine has high ballooning or swapping, check the amount of free physical memory on the host.
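A minimal sketch of that check follows. It treats any non-zero swap rate or balloon size as a sign worth investigating, which is my own assumption rather than an official threshold; the class and field names are hypothetical, and the values would have to come from whatever tool you use to read the counters.

```python
from dataclasses import dataclass


@dataclass
class VmMemorySample:
    name: str
    swap_in_rate_kbps: float   # swap-in rate for the VM
    swap_out_rate_kbps: float  # swap-out rate for the VM
    balloon_kb: float          # memory currently reclaimed by the balloon driver


def memory_contention_signs(sample: VmMemorySample) -> list:
    """List the contention indicators present in one sample."""
    signs = []
    if sample.swap_in_rate_kbps > 0:
        signs.append("swapping in")
    if sample.swap_out_rate_kbps > 0:
        signs.append("swapping out")
    if sample.balloon_kb > 0:
        signs.append("ballooning")
    return signs


sample = VmMemorySample("vm-a", 0.0, 128.0, 2048.0)
print(sample.name, memory_contention_signs(sample))
```

If the list is not empty, the next step described above still applies: check how much free physical memory the host has.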

Recognize impact of using CPU/memory limits, reservations and shares

When available resource capacity does not meet the demands of the resource consumers (and virtualization overhead), administrators might need to customize the amount of resources that are allocated to virtual machines or to the resource pools in which they reside.

Use the resource allocation settings (shares, reservation, and limit) to determine the amount of CPU, memory, and storage resources provided for a virtual machine. In particular, administrators have several options for allocating resources.

  • Reserve the physical resources of the host or cluster.
  • Set an upper bound on the resources that can be allocated to a virtual machine.
  • Guarantee that a particular virtual machine is always allocated a higher percentage of the physical resources than other virtual machines.

Resource Allocation Shares

Shares specify the relative importance of a virtual machine (or resource pool). If a virtual machine has twice as many shares of a resource as another virtual machine, it is entitled to consume twice as much of that resource when these two virtual machines are competing for resources.
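To make the "twice as many shares" statement concrete, here is a small worked example. It is a simplified model that only splits a contended resource in proportion to shares and ignores reservations and limits; the VM names and numbers are invented.

```python
def split_by_shares(capacity: float, shares: dict) -> dict:
    """Divide a contended resource between VMs in proportion to their shares."""
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("total shares must be positive")
    return {name: capacity * value / total for name, value in shares.items()}


# 3000 MHz of contended CPU; vm-a has twice the shares of vm-b.
print(split_by_shares(3000, {"vm-a": 2000, "vm-b": 1000}))
```

Shares only matter while the two VMs are actually competing; with spare capacity, either VM can consume more than this split suggests.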

Resource Allocation Reservation

A reservation specifies the guaranteed minimum allocation for a virtual machine.

Resource Allocation Limit

Limit specifies an upper bound for CPU, memory, or storage I/O resources that can be allocated to a virtual machine.

Resource Allocation Settings Suggestions

Select resource allocation settings (reservation, limit and shares) that are appropriate for your ESXi environment.

Admission Control

When you power on a virtual machine, the system checks the amount of CPU and memory resources that have not yet been reserved. Based on the available unreserved resources, the system determines whether it can guarantee the reservation for which the virtual machine is configured (if any). This process is called admission control.
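The check can be pictured with the simplified sketch below. It considers a single resource on a single host and deliberately ignores virtualization overhead and resource pool settings, so treat it as an illustration of the idea rather than how vSphere implements it; all names and figures are hypothetical.

```python
def can_power_on(capacity: float, existing_reservations: list, new_reservation: float) -> bool:
    """Return True if the unreserved capacity can cover the new VM's reservation."""
    unreserved = capacity - sum(existing_reservations)
    return new_reservation <= unreserved


# A host with 16384 MB of memory and two VMs already holding reservations.
print(can_power_on(16384, [4096, 8192], 2048))
print(can_power_on(16384, [4096, 8192], 8192))
```

A VM with no reservation configured passes this check trivially, which matches the "(if any)" in the description above.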

Describe and differentiate critical performance metrics

Describe and differentiate common metrics, including:

Memory, CPU, Network, Storage

You could argue that performance metrics are only critical if they can help you diagnose the particular issue you are encountering, otherwise the information collected is non-critical.

https://docs.vmware.com/en/VMware-vSphere/6.5/com.vmware.vsphere.monitoring.doc/GUID-FF7F87C7-91E7-4A2D-88B5-E3E04A76F51B.html

The above link will take you to an in-depth analysis of each of the available metrics.

Monitor performance through esxtop

You can run the esxtop utility using the ESXi Shell to communicate with the management interface of the ESXi host. You must have root user privileges.

Type the command, using the options you want:

esxtop [-h] [-v] [-b] [-s] [-a] [-c config file] [-R vm-support_dir_path]  [-d delay] [-n iterations]

The esxtop utility reads its default configuration from .esxtop50rc on the ESXi system. This configuration file consists of nine lines.

The first eight lines contain lowercase and uppercase letters to specify which fields appear in which order on the CPU, memory, storage adapter, storage device, virtual machine storage, network, interrupt, and CPU power panels. The letters correspond to the letters in the Fields or Order panels for the respective esxtop panel.

The ninth line contains information on the other options. Most importantly, if you saved a configuration in secure mode, you do not get an insecure esxtop without removing the s from the ninth line of your .esxtop50rc file. A number specifies the delay time between updates. As in interactive mode, typing c, m, d, u, v, n, i, or p determines the panel with which esxtop starts.
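The -b, -d and -n options in the syntax above can be combined to capture statistics in batch mode as CSV, for example `esxtop -b -d 5 -n 12 > capture.csv`. The sketch below shows one way to pull a single counter out of such a capture. The column naming used in the sample (a host, group and counter path ending in `% Ready`) is an assumption about the output layout, and the sample rows are invented.

```python
import csv
import io


def columns_matching(csv_text: str, suffix: str) -> dict:
    """Collect the numeric series for every column whose header ends with suffix."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    wanted = [i for i, name in enumerate(header) if name.endswith(suffix)]
    series = {header[i]: [] for i in wanted}
    for row in reader:
        for i in wanted:
            series[header[i]].append(float(row[i]))
    return series


# Invented sample data in the assumed layout; a real capture has many more columns.
SAMPLE = "\n".join([
    r'"(PDH-CSV 4.0) (UTC)(0)","\\esx01\Group Cpu(1001:vm-a)\% Ready","\\esx01\Group Cpu(1001:vm-a)\% Used"',
    r'"09/25/2017 10:00:00","4.20","35.10"',
    r'"09/25/2017 10:00:05","6.80","40.00"',
])

for name, values in columns_matching(SAMPLE, "% Ready").items():
    print(name, "peak:", max(values))
```

For a real capture, read the file contents and pass them in place of `SAMPLE`.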

Interactive Mode Command-Line Options

I would like to say that you won’t need to know these commands for any exam… But I can’t.

h or ?

Displays a help menu for the current panel, giving a brief summary of commands, and the status of secure mode.

space

Immediately updates the current panel.

^L

Erases and redraws the current panel.

f or F

Displays a panel for adding or removing statistics columns (fields) to or from the current panel.

o or O

Displays a panel for changing the order of statistics columns on the current panel.

#

Prompts you for the number of statistics rows to display. Any value greater than 0 overrides automatic determination of the number of rows to show, which is based on window size measurement. If you change this number in one resxtop (or esxtop) panel, the change affects all four panels.

s

Prompts you for the delay between updates, in seconds. Fractional values are recognized down to microseconds. The default value is five seconds. The minimum value is two seconds. This command is not available in secure mode.

W

Write the current setup to an esxtop (or resxtop) configuration file. This is the recommended way to write a configuration file. The default filename is the one specified by -c option, or ~/.esxtop50rc if the -c option is not used. You can also specify a different filename on the prompt generated by this W command.

q

Quit the interactive mode.

c

Switch to the CPU resource utilization panel.

p

Switch to the CPU Power utilization panel.

m

Switch to the memory resource utilization panel.

d

Switch to the storage (disk) adapter resource utilization panel.

u

Switch to the storage (disk) device resource utilization panel.

v

Switch to the storage (disk) virtual machine resource utilization panel.

n

Switch to the network resource utilization panel.

i

Switch to the interrupt panel.

Troubleshoot Enhanced vMotion Compatibility (EVC) issues

vCenter Server performs compatibility checks before it allows migration of running or suspended virtual machines to ensure that the virtual machine is compatible with the target host.

vMotion transfers the running state of a virtual machine between underlying ESXi systems. Live migration requires that the processors of the target host provide the same instructions to the virtual machine after migration that the processors of the source host provided before migration. Clock speed, cache size, and number of cores can differ between source and target processors. However, the processors must come from the same vendor class (AMD or Intel) to be vMotion compatible.

Migrations of suspended virtual machines also require that the virtual machine be able to resume execution on the target host using equivalent instructions.

When you initiate a migration with vMotion or a migration of a suspended virtual machine, the Migrate Virtual Machine wizard checks the destination host for compatibility and produces an error message if compatibility problems will prevent migration.

The CPU instruction set available to the operating system and to applications running in a virtual machine is determined at the time that a virtual machine is powered on. This CPU feature set is based on the following items:

  • Host CPU family and model
  • Settings in the BIOS that might disable CPU features
  • ESX/ESXi version running on the host
  • The virtual machine’s compatibility setting
  • The virtual machine’s guest operating system

To improve CPU compatibility between hosts of varying CPU feature sets, some host CPU features can be hidden from the virtual machine by placing the host in an Enhanced vMotion Compatibility (EVC) cluster.
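One way to picture what hiding features achieves is with sets of CPU features. The sketch below is a deliberately simplified model: real EVC uses predefined baseline modes per CPU generation rather than an arbitrary intersection, and the host names and feature lists here are hypothetical.

```python
def common_baseline(host_features: dict) -> set:
    """Features that every host in the cluster can provide."""
    if not host_features:
        raise ValueError("at least one host is required")
    return set.intersection(*(set(f) for f in host_features.values()))


def can_migrate(vm_features: set, target_features: set) -> bool:
    """A running VM can move only if the target offers every feature it was given."""
    return set(vm_features) <= set(target_features)


hosts = {
    "esx-old": {"sse4.2", "aes"},
    "esx-new": {"sse4.2", "aes", "avx", "avx2"},
}
baseline = common_baseline(hosts)
print(sorted(baseline))

# Without feature hiding, a VM powered on esx-new sees all of its features.
print(can_migrate(hosts["esx-new"], hosts["esx-old"]))
# With features hidden down to the baseline, the same move is possible.
print(can_migrate(baseline, hosts["esx-old"]))
```

Because the feature set is fixed at power-on, a VM that is already running with the wider feature set has to be powered off and on before it picks up the baseline.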

Compare and contrast Overview and Advanced Charts

The overview performance charts display the most common metrics for an object in the inventory. Use these charts to monitor and troubleshoot performance problems.

The metrics provided in Overview performance charts are a subset of those collected for hosts and the vCenter Server. For a complete list of all metrics collected by hosts and the vCenter Server, see the vSphere API Reference.

Clusters

The cluster charts contain information about CPU, disk, memory, and network usage for clusters. The help topic for each chart contains information about the data counters displayed in that chart. The collection level set for vCenter Server determines the available counters.

Data centers

The data center charts contain information about CPU, disk, memory, and storage usage for data centers. The help topic for each chart contains information about the data counters displayed in that chart. The counters available are determined by the collection level set for vCenter Server.

Datastores and Datastore Clusters

The datastore charts contain information about disk usage for datastores or the datastores that are part of a cluster. The help topic for each chart contains information about the data counters displayed in that chart. The counters available are determined by the collection level set for vCenter Server.

Hosts

The hosts charts contain information about CPU, disk, memory, network, and storage usage for hosts. The help topic for each chart contains information about the data counters displayed in that chart. The counters available are determined by the collection level set for vCenter Server.

Resource Pools

The resource pool charts contain information about CPU and memory usage for resource pools. The help topic for each chart contains information about the data counters displayed in that chart. The counters available are determined by the collection level set for vCenter Server.

vApps

The vApp charts contain information about CPU and memory usage for vApps. The help topic for each chart contains information about the data counters displayed in that chart. The counters available are determined by the collection level set for vCenter Server.

Virtual Machines

The virtual machine charts contain information about CPU, disk, memory, network, storage, and fault tolerance for virtual machines. The help topic for each chart contains information about the data counters displayed in that chart. The counters available are determined by the collection level set for vCenter Server.

Use advanced charts, or create your own custom charts, to see more performance data. Advanced charts can be useful when you are aware of a problem but need more statistical data to pinpoint the source of the trouble.

Advanced charts include the following features:

  • More information. Hover over a data point in a chart and details about that specific data point are displayed.
  • Customizable charts. Change chart settings. To create your own charts, save custom settings.
  • Export to spreadsheet.
  • Save to image file or spreadsheet.