Objective 7.1 – Troubleshoot vCenter Server and ESXi Hosts
So these posts are now drawing into the home straight, starting with Section 7 – Troubleshoot a vSphere Deployment. As is traditional, we begin with Objective 7.1 – Troubleshoot vCenter Server and ESXi Hosts.
As always, this article is linked from the main VCP6.5-DCV Blueprint.
Happy Revision
Simon
Understand VCSA monitoring tool
Appliance Management
There is an Appliance Management portal at https://<VCSA FQDN>:5480. This portal is independent of the vSphere Web Client, so if you ever have issues with the web client, they will not affect your ability to reach the Appliance Management page.
Upon login you will see the navigation pane down the left side, with the Summary page loading first. Several important options and items appear on this page:
- Buttons to reboot and shutdown the VCSA
- Button to quickly and easily create a vCenter Support Bundle
- Button to create your vCenter Backups
- Health Status widget
From here we can quickly and easily see the status and health of the appliance.
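The same health information can also be pulled programmatically. The Python sketch below builds the Appliance Management URL and reads the overall appliance health level. It assumes the vSphere 6.5 Automation REST API endpoints `/rest/com/vmware/cis/session` (login) and `/rest/appliance/health/system` (health), both returning JSON shaped like `{"value": ...}`; treat these as assumptions to check against the API reference for your build. The FQDN and credentials are placeholders.

```python
import base64
import json
import ssl
import urllib.request
from typing import Optional


def vami_url(fqdn: str) -> str:
    """Appliance Management portal URL (served on port 5480)."""
    return f"https://{fqdn}:5480"


def parse_value(body: str) -> str:
    """Return the 'value' field from a REST response such as {"value": "green"}."""
    return json.loads(body)["value"]


def system_health(fqdn: str, user: str, password: str,
                  context: Optional[ssl.SSLContext] = None) -> str:
    """Log in and read the overall appliance health level.

    Assumption: the vSphere 6.5 REST endpoints used below; verify them first.
    """
    base = f"https://{fqdn}/rest"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    # Create an API session using HTTP basic authentication.
    login = urllib.request.Request(
        f"{base}/com/vmware/cis/session",
        method="POST",
        headers={"Authorization": f"Basic {token}"},
    )
    with urllib.request.urlopen(login, context=context) as resp:
        session_id = parse_value(resp.read().decode())
    # Query the appliance health using the session ID header.
    health = urllib.request.Request(
        f"{base}/appliance/health/system",
        headers={"vmware-api-session-id": session_id},
    )
    with urllib.request.urlopen(health, context=context) as resp:
        return parse_value(resp.read().decode())
```

For example, `system_health("vcsa.example.local", "administrator@vsphere.local", password)` would be expected to return a level such as `green`. A lab appliance using its default certificate will fail TLS verification unless you pass an `ssl.SSLContext` that trusts the VMCA root certificate.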
Monitor status of the vCenter Server services
In the vSphere Web Client, you can view the health status of vCenter Server services and nodes.
Before you begin, verify that the user you log in with is a member of the SystemConfiguration.Administrators group in the vCenter Single Sign-On domain.
vCenter Server instances and machines that run vCenter Server services are considered nodes. Graphical badges represent the health status of services and nodes.
- Log in as administrator@your_domain_name to the vCenter Server instance by using the vSphere Web Client.
- On the vSphere Web Client Home page, click System Configuration.
You can view the health status badges for the services and nodes.
- In the Services Health and Nodes Health panes, click the hyperlink next to the health badge to view all services and nodes in this health state.
For example, in the Services Health pane, click the hyperlink of the Warning health status. In the dialog box that pops up, select a service to view more information about the service and attempt to resolve the health issues of the service.
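Clicking a badge hyperlink effectively filters the service list by health state. As a minimal sketch of that grouping, the snippet below uses hypothetical service names, and the state labels (Good, Warning, Critical, Unknown) are assumptions rather than values read from a live system.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_health(services: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (service name, health state) pairs by state, as the badge hyperlinks do."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, state in services:
        groups[state].append(name)
    return dict(groups)


# Hypothetical sample data; assumed state labels: Good, Warning, Critical, Unknown.
sample = [
    ("service-a", "Good"),
    ("service-b", "Warning"),
    ("service-c", "Good"),
]
warning_services = group_by_health(sample).get("Warning", [])
```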
Perform basic maintenance of a vCenter Server database
After your vCenter Server database instance and vCenter Server are installed and operational, perform standard database maintenance processes.
The standard database maintenance processes include the following:
- Monitoring the growth of the log file and compacting the database log file, as needed.
- Scheduling regular backups of the database.
- Backing up the database before any vCenter Server upgrade.
See your database vendor’s documentation for specific maintenance procedures and support.
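Monitoring log growth is mostly arithmetic: sample the log file size regularly, work out the growth rate, and compact before it nears its limit. The sketch below assumes you already collect `(timestamp, size in MB)` samples from your database tooling; the 80% compaction threshold and the one-day backup freshness window are hypothetical values, not vendor guidance.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def log_growth_per_day(samples: List[Tuple[datetime, float]]) -> float:
    """Average log growth in MB/day between the first and last (timestamp, size_mb) samples.

    Needs at least two samples taken at different times.
    """
    (t0, s0), (t1, s1) = samples[0], samples[-1]
    days = (t1 - t0).total_seconds() / 86400
    return (s1 - s0) / days


def needs_compaction(size_mb: float, limit_mb: float, threshold: float = 0.8) -> bool:
    """Flag the log once it passes a chosen fraction of its limit (0.8 is a hypothetical default)."""
    return size_mb >= threshold * limit_mb


def backup_ok_for_upgrade(last_backup: datetime, now: datetime,
                          max_age: timedelta = timedelta(days=1)) -> bool:
    """Require a recent database backup before starting a vCenter Server upgrade."""
    return now - last_backup <= max_age
```

Dividing the remaining headroom by `log_growth_per_day` gives a rough number of days before the log reaches its limit, which is a useful input when scheduling compaction.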
Monitor status of ESXi management agents
The vCenter Solutions Manager displays the vSphere ESX Agent Manager agents that you use to deploy and manage related agents on ESX/ESXi hosts.
You can use the Solutions Manager to keep track of whether the agents of a solution are working as expected. Outstanding issues are reflected by the solution’s ESX Agent Manager status and a list of issues.
When the status of a solution changes, the Solutions Manager updates the ESX Agent Manager summary status and state. Administrators use this status to track whether the goal state is reached.
The agent health status is indicated by a specific colour.
Red
The solution must intervene for the ESX Agent Manager to proceed. For example, if a virtual machine agent is powered off manually on a compute resource, the ESX Agent Manager does not attempt to power on the agent. The ESX Agent Manager reports this action to the solution, and the solution alerts the administrator to power on the agent.
Yellow
The ESX Agent Manager is actively working to reach a goal state. The goal state can be enabled, disabled, or uninstalled. For example, when a solution is registered, its status is yellow until the ESX Agent Manager deploys the solution's agents to all the specified compute resources. A solution does not need to intervene when the ESX Agent Manager reports its ESX Agent Manager health status as yellow.
Green
A solution and all its agents have reached the goal state.
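If you script around Solutions Manager status, the colour semantics above reduce to a small lookup. This is a hypothetical helper: the `EamStatus` enum and the messages are illustrative and not part of any VMware SDK. It simply encodes that only red requires intervention.

```python
from enum import Enum


class EamStatus(Enum):
    """Hypothetical model of the ESX Agent Manager status colours."""
    RED = "red"
    YELLOW = "yellow"
    GREEN = "green"


MEANING = {
    EamStatus.RED: "The solution must intervene before ESX Agent Manager can proceed.",
    EamStatus.YELLOW: "ESX Agent Manager is working towards the goal state; no intervention needed.",
    EamStatus.GREEN: "The solution and all its agents have reached the goal state.",
}


def needs_intervention(colour: str) -> bool:
    """Only red requires the solution (and usually the administrator) to act."""
    return EamStatus(colour.strip().lower()) is EamStatus.RED


def describe(colour: str) -> str:
    """Map a colour name (case-insensitive) to its meaning; unknown colours raise ValueError."""
    return MEANING[EamStatus(colour.strip().lower())]
```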
Determine ESXi host stability issues and gather diagnostics information
Troubleshooting vSphere HA Host States
vCenter Server reports vSphere HA host states that indicate an error condition on the host. Such errors can prevent vSphere HA from fully protecting the virtual machines on the host and can impede vSphere HA’s ability to restart virtual machines after a failure. Errors can occur when vSphere HA is being configured or unconfigured on a host or, more rarely, during normal operation. When this happens, you should determine how to resolve the error, so that vSphere HA is fully operational.
Troubleshooting vSphere Auto Deploy
The vSphere Auto Deploy troubleshooting topics offer solutions for situations when provisioning hosts with vSphere Auto Deploy does not work as expected.
Authentication Token Manipulation Error
Creating a password that does not meet the authentication requirements of the host causes an error.
Active Directory Rule Set Error Causes Host Profile Compliance Failure
Applying a host profile that specifies an Active Directory domain to join causes a compliance failure.
Unable to Download VIBs When Using vCenter Server Reverse Proxy
You are unable to download VIBs if vCenter Server is using a custom port for the reverse proxy.
VMware Technical Support routinely requests diagnostic information from you when a support request is handled. This diagnostic information contains product-specific logs, configuration files, and data appropriate to the situation. The information is gathered using a specific script or tool for each product and can include a host support bundle from the ESXi host and a vCenter Server support bundle. Data collected in a host support bundle may be considered sensitive. Additionally, as of vSphere 6.5, support bundles can include encrypted information from an ESXi host. For more information on support bundles
To collect ESX/ESXi and vCenter Server diagnostic data:
- Start the vSphere Web Client and log in to the vCenter Server system.
- Under Inventory Lists, select vCenter Servers.
- Click the vCenter Server that contains the ESX/ESXi hosts from which you want to export logs.
- Click the Monitor tab and click System Logs.
- Click Export System Logs.
- Select the ESX/ESXi hosts from which you want to export logs.
- Select the Include vCenter Server and vSphere Web Client logs option. This step is optional.
- Click Next.
- Select the system logs that are to be exported.
- Select Gather performance data to include performance data information in the log files. You can update the duration and interval time between which you want to collect the data.
- Click Next.
- Click Generate Log Bundle. The Download Log Bundles dialog appears when the Generating Diagnostic Bundle task completes.
- Click Download Log Bundle to save it to your local computer. The host or vCenter Server generates .zip bundles containing the log files. The Recent Tasks panel shows the Generate diagnostic bundles task in progress.
- After the download completes, click Finish or generate another log bundle.
To export the events log:
- Select an inventory object.
- Click the Monitor tab, and click Events.
- Click the Export icon.
- In the Export Events window, specify what types of event information you want to export.
- Click Generate CSV Report, and click Save.
- Specify a file name and location and save the file.
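Once the events CSV is saved, a few lines of Python can summarise it. This sketch assumes the export has a header row and a column named `Type`. Check the header of your own export and pass the real column name if it differs.

```python
import csv
import io
from collections import Counter


def count_events(csv_text: str, column: str = "Type") -> Counter:
    """Count exported events by the values of one column.

    Assumption: the CSV has a header row and contains `column`; adjust the name
    to match the header of your actual export.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column] for row in reader)


# Usage with a file saved from the Export Events window (path is a placeholder):
# with open("events.csv", newline="", encoding="utf-8") as f:
#     print(count_events(f.read()).most_common(5))
```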
Monitor ESXi system health
You can use the vSphere Web Client to monitor the state of host hardware components, such as CPU processors, memory, fans, and other components.
The host health monitoring tool allows you to monitor the health of a variety of host hardware components including:
- CPU processors
- Memory
- Fans
- Temperature
- Voltage
- Power
- Network
- Battery
- Storage
- Cable/Interconnect
- Software components
- Watchdog
- PCI devices
- Other
The host health monitoring tool presents data gathered using Systems Management Architecture for Server Hardware (SMASH) profiles. The information displayed depends on the sensors available on your server hardware. SMASH is an industry standard specification providing protocols for managing a variety of systems in the data center.
You can monitor host health status by connecting the vSphere Web Client to a vCenter Server system. You can also set alarms to trigger when the host health status changes.
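Reading the sensor list comes down to two questions: what is the worst state, and which sensors are not healthy? The `Sensor` record, the state names and the severity ordering below are assumptions for illustration. The real sensors and states depend on what your server hardware exposes.

```python
from typing import Dict, List, NamedTuple

# Assumed severity ordering for illustration; not an official VMware scale.
SEVERITY = {"green": 0, "unknown": 1, "yellow": 2, "red": 3}


class Sensor(NamedTuple):
    category: str  # e.g. "Fans", "Temperature", "Power"
    name: str
    state: str     # assumed to be one of the SEVERITY keys


def worst_state(sensors: List[Sensor]) -> str:
    """Highest-severity state across all sensors ("unknown" if there are none)."""
    return max((s.state for s in sensors), key=SEVERITY.__getitem__, default="unknown")


def problems_by_category(sensors: List[Sensor]) -> Dict[str, List[str]]:
    """Names of sensors that are not green, grouped by hardware category."""
    problems: Dict[str, List[str]] = {}
    for s in sensors:
        if s.state != "green":
            problems.setdefault(s.category, []).append(s.name)
    return problems
```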
Locate and analyze vCenter Server and ESXi logs
View System Logs on an ESXi Host
You can use the direct console interface to view the system logs on an ESXi host. These logs provide information about system operational events.
- From the direct console, select View System Logs.
- Press a corresponding number key to view a log.
vCenter Server agent (vpxa) logs appear if the host is managed by vCenter Server.
- Press Enter or the spacebar to scroll through the messages.
- (Optional) Perform a regular expression search:
Press the slash key (/).
Type the text to find.
Press Enter.
The found text is highlighted on the screen.
- Press q to return to the direct console.
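The direct console `/` search has a straightforward scripted equivalent if you copy a log off the host. A minimal sketch, assuming you have the log file locally (for example, a copy of `/var/log/vpxa.log`):

```python
import re
from typing import Iterable, List, Tuple


def search_log(lines: Iterable[str], pattern: str) -> List[Tuple[int, str]]:
    """Return (line number, text) for each line that matches the regular expression."""
    regex = re.compile(pattern)
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if regex.search(line)
    ]


# Usage against a local copy of a host log (file name is a placeholder):
# with open("vpxa.log", encoding="utf-8", errors="replace") as log:
#     for number, text in search_log(log, r"(?i)error|warning"):
#         print(number, text)
```

Returning line numbers alongside the text makes it easy to jump back to the surrounding context, which is usually where the cause of an error is logged.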
View vCenter System Log Entries
- In the vSphere Web Client, navigate to a vCenter Server.
- From the Monitor tab, click System Logs.
- From the drop-down menu, select the log and entry you want to view.
Common Logs
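Below is a quick reference of commonly consulted log locations, kept as a small lookup table with a keyword search. The ESXi paths are the standard locations under `/var/log`, and `/var/log/vmware/vpxd/vpxd.log` is the main vCenter Server log on the appliance. Other logs exist, and the one-line descriptions are summaries rather than official definitions.

```python
from typing import List

ESXI_LOGS = {
    "/var/log/hostd.log": "Host management service (hostd)",
    "/var/log/vpxa.log": "vCenter Server agent (vpxa), present when the host is managed by vCenter Server",
    "/var/log/vmkernel.log": "VMkernel: devices, storage, networking and virtual machine start-up",
    "/var/log/vmkwarning.log": "Warnings and alerts extracted from the VMkernel log",
    "/var/log/fdm.log": "vSphere HA agent (Fault Domain Manager)",
    "/var/log/auth.log": "Authentication events",
    "/var/log/shell.log": "Commands run in the ESXi Shell",
    "/var/log/syslog.log": "Management service initialization, watchdogs and scheduled tasks",
}

VCSA_LOGS = {
    "/var/log/vmware/vpxd/vpxd.log": "Main vCenter Server service (vpxd) log",
}


def find_logs(keyword: str) -> List[str]:
    """Log paths whose path or description contains the keyword (case-insensitive)."""
    key = keyword.lower()
    return sorted(
        path
        for table in (ESXI_LOGS, VCSA_LOGS)
        for path, description in table.items()
        if key in path.lower() or key in description.lower()
    )
```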
Determine appropriate commands for troubleshooting
I’m not sure there is anything I can put here. If you are planning to sit the exam, you either know the commands or can work your way through them.
Troubleshoot common issues, including:
I’m not going to type up every available scenario that you might have to troubleshoot here; there is no way of knowing what might crop up on the exam. I think the best we can do when revising this topic is defer to the vSphere troubleshooting scenarios in the documentation centre.
- vCenter Server services
- Identity Sources
- vCenter Server connectivity
- Virtual machine resource contention, configuration and operation
- Platform Services Controller (PSC)
- Problems with installation
- VMware Tools installation
- Fault Tolerant network latency
- KMS connectivity
- vCenter Certification Authority
Identifying Symptoms
Before you attempt to resolve a problem in your implementation, you must identify precisely how it is failing.
The first step in the troubleshooting process is to gather information that defines the specific symptoms of what is happening. You might ask these questions when gathering this information:
- What is the task or expected behavior that is not occurring?
- Can the affected task be divided into subtasks that you can evaluate separately?
- Is the task ending in an error? Is an error message associated with it?
- Is the task completing but in an unacceptably long time?
- Is the failure consistent or sporadic?
- What has changed recently in the software or hardware that might be related to the failure?
Defining the Problem Space
After you identify the symptoms of the problem, determine which components in your setup are affected, which components might be causing the problem, and which components are not involved.
To define the problem space in an implementation of vSphere, be aware of the components present. In addition to VMware software, consider third-party software in use and which hardware is being used with the VMware virtual hardware.
By recognizing the characteristics of the software and hardware elements and how they can impact the problem, you can explore general problems that might be causing the symptoms, such as:
- Misconfiguration of software settings
- Failure of physical hardware
- Incompatibility of components
Break down the process and consider each piece and the likelihood of its involvement separately. For example, a case that is related to a virtual disk on local storage is probably unrelated to third-party router configuration. However, a local disk controller setting might be contributing to the problem. If a component is unrelated to the specific symptoms, you can probably eliminate it as a candidate for solution testing.
Think about what changed in the configuration recently before the problems started. Look for what is common in the problem. If several problems started at the same time, you can probably trace all the problems to the same cause.
Testing Possible Solutions
After you know the problem’s symptoms and which software or hardware components are most likely involved, you can systematically test solutions until you resolve the problem.
With the information that you have gained about the symptoms and affected components, you can design tests for pinpointing and resolving the problem. These tips might make this process more effective.
- Generate ideas for as many potential solutions as you can.
- Verify that each solution determines unequivocally whether the problem is fixed. Test each potential solution but move on promptly if the fix does not resolve the problem.
- Develop and pursue a hierarchy of potential solutions based on likelihood. Systematically eliminate each potential problem from the most likely to the least likely until the symptoms disappear.
- When testing potential solutions, change only one thing at a time. If your setup works after many things are changed at once, you might not be able to discern which of those things made a difference.
- If the changes that you made for a solution do not help resolve the problem, return the implementation to its previous status. If you do not return the implementation to its previous status, new errors might be introduced.
- Find a similar implementation that is working and test it in parallel with the implementation that is not working properly. Make changes on both systems at the same time until few differences or only one difference remains between them.
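The testing loop described above (most likely fix first, one change at a time, revert on failure) and the working-versus-broken comparison can be sketched as follows. The `Candidate` structure and the MTU scenario are hypothetical. In practice `apply`, `revert` and the fix check would be manual steps or your own scripts.

```python
from typing import Callable, Dict, List, NamedTuple, Optional, Tuple


class Candidate(NamedTuple):
    name: str
    likelihood: float           # your own estimate, 0..1
    apply: Callable[[], None]   # make exactly one change
    revert: Callable[[], None]  # undo that change


def find_fix(candidates: List[Candidate], problem_fixed: Callable[[], bool]) -> Optional[str]:
    """Try candidates from most to least likely, one change at a time, reverting failures."""
    for c in sorted(candidates, key=lambda c: c.likelihood, reverse=True):
        c.apply()
        if problem_fixed():
            return c.name
        c.revert()  # return to the previous state before the next test
    return None


def config_diff(working: Dict[str, str], broken: Dict[str, str]) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Settings that differ between a working and a non-working system."""
    keys = working.keys() | broken.keys()
    return {k: (working.get(k), broken.get(k)) for k in sorted(keys) if working.get(k) != broken.get(k)}


# Hypothetical simulation: the fault is an MTU mismatch.
state = {"mtu": "1500"}
demo = [
    Candidate("restart management agents", 0.7, lambda: None, lambda: None),
    Candidate("set vmk MTU to 9000", 0.4,
              lambda: state.update(mtu="9000"), lambda: state.update(mtu="1500")),
]
result = find_fix(demo, lambda: state["mtu"] == "9000")
```

Here the more likely candidate is tried first, fails and is reverted, and the loop stops at the first change that clears the symptom, so only one difference separates the fixed state from the broken one.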