In the last post on this subject I discussed an issue where backing up a virtual VEAgent/Backup/Proxy host itself leads to orphaned disks on the host (when using the HotAdd backup transport method). This inevitably leads to the host becoming unstable and crashing, which in turn causes any subsequent VM backups relying on the backup host to fail!

The workaround was simply to not run backups against your virtual VEAgent host, or to make your VEAgent host physical.  HP has since come back to me with an actual fix.

As of version 8.00, Data Protector uses VDDK 5.0U1, and our issue is related to the following Known VMware Issue:

http://www.vmware.com/support/developer/vddk/VDDK-510-ReleaseNotes.html
 
Hang in connect or cleanup due to intermittent race condition. 
Windows processes can sometimes hang when calling VixDiskLib_ConnectEx() or VixDiskLib_Cleanup() due to a race condition while loading or unloading libraries. There is no known workaround. A fix has been identified, and will be available in the VDDK 5.1 first update release
 
If this is the case, it will be fixed in a newer version of the VDDK. The reason I suspect it is related is that if VMware's VixDiskLib_Cleanup() hangs, the disks will not be removed from the backup host.  During a HotAdd backup a redo log is created for the HotAdded disks, so in this case HotAdd failed to clean up the redo logs properly and left the VM disks attached to your backup host.

The Fix:

  1. From the Backup host: remove all the folders inside of: C:\Windows\TEMP\vmware-SYSTEM
  2. Then, using vCenter (not the host's Disk Management), remove from the backup host the orphaned disks that belong to other virtual machines.  After this, reboot the host.
  3. If you have a related case open with HP, you can optionally add the variable vixDiskLib.transport.LogLevel="6" to "C:\ProgramData\OmniBack\Config\client\vepa_vddk.config".  This variable logs more information about the VDDK transport method used during the backup job, and it is the only way to determine why HotAdd doesn’t remove the disks from the backup host.  Once the issue has been reproduced with vixDiskLib.transport.LogLevel="6", collect the session report from the session in which the disk was not removed, so that HP support can get the information needed to determine the cause.
  4. On the backup host, open a command prompt and type the following:
  • diskpart
  • san policy=onlineAll 
  • Note: As per VMware documentation, setting the SAN policy this way is a best practice when using the HotAdd transport method.
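
For anyone who has to repeat this on several backup hosts, steps 1 and 4 above could be scripted roughly as follows. This is only a sketch, not something HP supplied: the function names clear_subfolders and set_san_policy are my own, it assumes Python is installed on the backup host, and the diskpart part needs an elevated (administrator) prompt. Steps 2 and 3 still have to be done by hand.

```python
from __future__ import annotations

import shutil
import subprocess
import tempfile
from pathlib import Path

# Folder from step 1 of the fix.
VMWARE_TEMP = Path(r"C:\Windows\TEMP\vmware-SYSTEM")

# Command from step 4, written as a diskpart script.
DISKPART_SCRIPT = "san policy=OnlineAll\n"


def clear_subfolders(root: Path) -> list[str]:
    """Delete every folder directly inside root and return the names removed.

    Loose files in root are left alone, since step 1 only mentions folders.
    """
    removed: list[str] = []
    if not root.is_dir():
        return removed
    for entry in sorted(root.iterdir()):
        if entry.is_dir():
            shutil.rmtree(entry)
            removed.append(entry.name)
    return removed


def set_san_policy() -> None:
    """Run diskpart with a one-line script file (Windows only, needs admin)."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        f.write(DISKPART_SCRIPT)
        script_path = f.name
    try:
        # "diskpart /s <file>" executes the commands contained in the file.
        subprocess.run(["diskpart", "/s", script_path], check=True)
    finally:
        Path(script_path).unlink()
```

On the backup host you would call clear_subfolders(VMWARE_TEMP), remove the orphaned disks in vCenter and reboot as per step 2, and then call set_san_policy(). Make sure no backup sessions are running while the temp folders are being removed.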