Posts Tagged ‘Enterprise Backups’

Virtual Machine Consolidation Needed – Data Protector Backups

In the last post on this subject, I discussed an issue where backing up a virtual VEAgent (backup/proxy) host itself leaves orphaned disks on that host when the HotAdd backup transport method is used. This inevitably makes the host unstable until it crashes, which in turn causes every subsequent VM backup that relies on the backup host to fail!

The workaround was simply to not run backups against your virtual VEAgent host, or to make your VEAgent host physical.  HP has since come back to me with an actual fix.

As of version 8.00, Data Protector uses VDDK 5.0U1, and our issue appears to be related to the following known VMware issue:

http://www.vmware.com/support/developer/vddk/VDDK-510-ReleaseNotes.html
 
Hang in connect or cleanup due to intermittent race condition. 
Windows processes can sometimes hang when calling VixDiskLib_ConnectEx() or VixDiskLib_Cleanup() due to a race condition while loading or unloading libraries. There is no known workaround. A fix has been identified, and will be available in the VDDK 5.1 first update release
 
If this is the cause, it will be fixed in a newer version of the VDDK. I suspect the two are related because if VMware's VixDiskLib_Cleanup() hangs, the disks are not removed from the backup host. During a HotAdd backup, a redo log is created for the HotAdded disks; in this case, HotAdd failed to clean up the redo logs properly and left the VM disks attached to your backup host.

The Fix:

  1. On the backup host, remove all folders inside C:\Windows\TEMP\vmware-SYSTEM.
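  2. Then, using vCenter (not Disk Management on the backup host), remove the orphaned disks attached to the backup host that belong to other virtual machines. Remove them from the virtual machine only; do not delete the files from the datastore, since they belong to the other VMs. After this, reboot the backup host.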
  3. If you have a related case open with HP, you can optionally add the variable vixDiskLib.transport.LogLevel="6" to "C:\ProgramData\OmniBack\Config\client\vepa_vddk.config". This variable logs more information about the VDDK transport method used during the backup job, and it is the only way to determine why HotAdd doesn't remove the disks from the backup host. Once the issue has been reproduced with vixDiskLib.transport.LogLevel="6", collect the session report from the session in which the disk was not removed, so that HP has the information needed to determine the cause.
  4. On the backup host, open a command prompt and type the following (the second command is entered at the DISKPART> prompt):
  • diskpart
  • san policy=onlineAll
  Note: According to VMware documentation, setting the SAN policy this way is a best practice when using the HotAdd transport method.
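If you prefer to script step 3 instead of editing the file by hand, the Python sketch below sets or adds the LogLevel line. It is not an HP tool. It assumes vepa_vddk.config contains simple key="value" lines like the one in step 3. It saves the original as vepa_vddk.config.bak, which is a hypothetical naming choice, and it only runs main() on Windows.

```python
"""Sketch (not an HP tool): set vixDiskLib.transport.LogLevel in vepa_vddk.config.

Assumes the file holds simple key="value" lines, as shown in step 3.
"""
import re
import shutil
import sys
from pathlib import Path

# Path from step 3 (Windows 2008 layout).
CONFIG = Path(r"C:\ProgramData\OmniBack\Config\client\vepa_vddk.config")
KEY = "vixDiskLib.transport.LogLevel"


def set_config_value(text: str, key: str, value: str) -> str:
    """Replace the first key=... line with key="value", or append it if missing."""
    new_line = f'{key}="{value}"'
    pattern = re.compile(rf"^[ \t]*{re.escape(key)}[ \t]*=.*$", re.MULTILINE)
    if pattern.search(text):
        # A function replacement stops backslashes in new_line being treated as escapes.
        return pattern.sub(lambda _m: new_line, text, count=1)
    if text and not text.endswith("\n"):
        text += "\n"
    return text + new_line + "\n"


def main(level: str) -> None:
    text = CONFIG.read_text() if CONFIG.exists() else ""
    if CONFIG.exists():
        # Keep the original next to it so the change is easy to undo.
        shutil.copy2(CONFIG, CONFIG.with_name(CONFIG.name + ".bak"))
    CONFIG.write_text(set_config_value(text, KEY, level))


if __name__ == "__main__" and sys.platform == "win32":
    # "6" is the verbose transport logging level HP asked for; pass another value to change it.
    main(sys.argv[1] if len(sys.argv) > 1 else "6")
```

Run it on the backup host, for example python set_vddk_loglevel.py 6. The script name is hypothetical, and writing under ProgramData will likely need an elevated prompt.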

 

Virtual Machine Consolidation Needed – Data Protector workaround

I've come across an issue several times when trying to back up a VEAgent backup host (aka proxy host) itself. After some back-and-forth troubleshooting with HP Software Support, we finally determined that this is a known VMware limitation. If the proxy host is itself a virtual machine, and the transport method used to back it up is HotAdd (the only option in Data Protector), the backup job leaves an extra virtual disk attached to the proxy host. Eventually, if not immediately, this leaves the host corrupted. Once the proxy host is corrupted, any future VM backups that rely on it will not run. Not good!

The only workaround is to NOT back up the proxy host itself with Data Protector, or to make the proxy host a physical machine. More details on this issue can be found on the VMware support site.

If this scenario has already played out in your environment, you now have a VEAgent client in Data Protector that is inaccessible and unusable, and none of your other VE backup jobs run as a result. To fix this, do the following:

  1. Create a new VM to serve as the new VEAgent backup host, and add it as a VEAgent client in the DP GUI.
  2. Normally, you can right-click a client in DP to delete it, but this will not work if the old VEAgent host is unreachable. To remove the old client, delete its line from the cell_info file on the DP Cell Manager, found in "C:\ProgramData\OmniBack\Config\Server\cell" (Server 2008). This removes the old host from the DP GUI going forward.
  3. Lastly, your existing VE backup jobs will still reference the old proxy host, because the old proxy hostname is hardcoded in the VE barlist file for each of your VE backup jobs. Open each of these files and replace the old hostname with the new one. The VE barlist files are on the Cell Manager in "C:\ProgramData\OmniBack\Config\Server\BarLists\VEAgent" (Server 2008).
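Steps 2 and 3 come down to two text edits on the Cell Manager, sketched below in Python. This is a sketch under stated assumptions, not an HP procedure. It assumes:

- each client in cell_info is on its own line beginning with -host "name";
- the barlists refer to the old proxy by exactly the name you pass in (matched case-insensitively and only as a whole name);
- the files are either UTF-16 with a byte-order mark or plain UTF-8/ANSI text.

Originals are copied to a separate folder, so no extra files land in the BarLists folder. C:\dp_config_backup is a hypothetical location.

```python
"""Sketch: retire an unreachable VEAgent proxy on a Windows 2008 Cell Manager.

Assumptions (not from HP): cell_info has one client per line starting with
-host "name"; barlists name the proxy exactly as passed in.
"""
import re
import shutil
import sys
from pathlib import Path

SERVER = Path(r"C:\ProgramData\OmniBack\Config\Server")
BACKUP_DIR = Path(r"C:\dp_config_backup")  # hypothetical location for the originals


def read_text(path: Path) -> tuple[str, str]:
    """Return (text, encoding), using UTF-16 when the file starts with a BOM."""
    raw = path.read_bytes()
    if raw[:2] in (b"\xff\xfe", b"\xfe\xff"):
        return raw.decode("utf-16"), "utf-16"
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return raw.decode("latin-1"), "latin-1"


def remove_client(cell_info: str, hostname: str) -> str:
    """Drop every line whose -host "..." value equals hostname (case-insensitive)."""
    kept = []
    for line in cell_info.splitlines(keepends=True):
        m = re.match(r'\s*-host\s+"([^"]+)"', line)
        if m and m.group(1).lower() == hostname.lower():
            continue
        kept.append(line)
    return "".join(kept)


def swap_hostname(text: str, old: str, new: str) -> str:
    """Replace old with new only where it appears as a whole host name."""
    pattern = re.compile(rf"(?<![\w.-]){re.escape(old)}(?![\w.-])", re.IGNORECASE)
    return pattern.sub(lambda _m: new, text)


def rewrite(path: Path, transform) -> None:
    """Copy path to BACKUP_DIR, then write transform(text) back in the same encoding."""
    text, encoding = read_text(path)
    BACKUP_DIR.mkdir(parents=True, exist_ok=True)
    shutil.copy2(path, BACKUP_DIR / path.name)
    path.write_bytes(transform(text).encode(encoding))


def main(old: str, new: str) -> None:
    rewrite(SERVER / "cell" / "cell_info", lambda t: remove_client(t, old))
    for barlist in (SERVER / "BarLists" / "VEAgent").iterdir():
        if barlist.is_file():
            rewrite(barlist, lambda t: swap_hostname(t, old, new))


if __name__ == "__main__" and sys.platform == "win32" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])  # old proxy name, new proxy name
```

Matching only whole names means a proxy called oldproxy won't accidentally rewrite oldproxy10. Pass the name exactly as it appears in the files, e.g. the FQDN if that is what DP stored.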

You're now set up to move forward with the new VEAgent backup host.

UPDATE:
There is now a better fix to this issue.  Please see this post:

 

Gathering Data Protector Logs for HP

What follows is the recommended information and logs to gather when opening a Data Protector case with HP. Note: the following applies only to a Data Protector 7.01 Windows environment.

  1. Server Specs 
  2. Data Protector Patch Level
  3. Full Session Report
  4. Extended logs

1) Server Specs

Whatever the issue, you will invariably need to provide the server specs for the Cell Manager. This may also apply to your Installation Servers, backup hosts, or any other hosts affected by the issue. Generally the OS and build are enough, for example: Windows 2008 R2 Enterprise x64.

2) Data Protector Patch Level

As with the server specs, the patch level of all related systems will usually be requested as well. Run this command on every host that has the DP Inet agent installed:

omnicheck -patches

Copy/Paste the output into the support case for each system related to the issue.
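A short Python sketch can gather steps 1 and 2 in one pass on each host. It assumes Python is available on your DP hosts and that omnicheck is on the PATH. The OS line comes from Python's platform module, so compare it with what Windows reports before pasting it into the case.

```python
"""Sketch: print OS details plus 'omnicheck -patches' output for this host."""
import platform
import socket
import subprocess


def server_specs() -> str:
    """Return a short OS/build summary for the support case."""
    # win32_edition() exists on Python 3.8+ and returns None off Windows.
    edition = platform.win32_edition() if hasattr(platform, "win32_edition") else None
    lines = [
        f"Host:  {socket.gethostname()}",
        f"OS:    {platform.system()} {platform.release()} {edition or ''}".rstrip(),
        f"Build: {platform.version()}",
        f"Arch:  {platform.machine()}",
    ]
    return "\n".join(lines)


def patch_level() -> str:
    """Return the output of 'omnicheck -patches', or a note if it cannot run."""
    try:
        result = subprocess.run(
            ["omnicheck", "-patches"], capture_output=True, text=True, timeout=300
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"omnicheck -patches could not be run: {exc}"
    return result.stdout + result.stderr


if __name__ == "__main__":
    print(server_specs() + "\n\n" + patch_level())
```

Redirect the output to one file per host. For example, run python dp_host_info.py > %COMPUTERNAME%_dp_info.txt, where the script name is hypothetical.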

3) Full Session Report

More than likely you will want to include the full session report from the backup job that had issues. The session report contains detailed output from the backup job, including all of the related messages you would see in the DP GUI. You can find the Session ID by browsing the Reporting context in the DP GUI and selecting your job. The right pane will show something like this:

[138:742] Backup session "2013/07/22-3" of the backup specification "VEAgent VM-Desktops",backup group "VMWARE" has errors: 19.

In this example, 2013/07/22-3 is the Session ID. Copy/Paste that value into the following cmd, which you will execute on the Cell Manager:

omnidb -session <sessionid> -report > C:\session.txt

Upload the session.txt file to the HP Support case.
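If you pull session reports often, a small Python sketch can extract the Session ID from a message like the one above and run the same omnidb command. The regular expression assumes the 'Backup session "<id>"' wording shown in that example. The script name in the usage note is hypothetical.

```python
"""Sketch: extract a DP Session ID and save its full report (mirrors the omnidb command above)."""
import re
import subprocess
import sys

SESSION_RE = re.compile(r'Backup session "([^"]+)"')


def extract_session_id(message: str) -> str:
    """Return the Session ID from a DP session message, e.g. '2013/07/22-3'."""
    match = SESSION_RE.search(message)
    if match is None:
        raise ValueError('no \'Backup session "..."\' text found')
    return match.group(1)


def save_session_report(session_id: str, out_path: str = r"C:\session.txt") -> None:
    """Equivalent of: omnidb -session <sessionid> -report > C:\\session.txt"""
    # omnidb writes straight to the file handle, so open it in binary mode.
    with open(out_path, "wb") as out:
        subprocess.run(["omnidb", "-session", session_id, "-report"], stdout=out, check=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    save_session_report(sys.argv[1])  # e.g. 2013/07/22-3
```

On the Cell Manager, run python save_session.py 2013/07/22-3. Use extract_session_id() when all you have is the message text.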

4) Extended logs

In many cases, the default session report will not provide enough debug information to pinpoint the actual problem, and extended logging will be needed. To gather this additional debug information, you first enable debugging, reproduce the issue, and then disable debugging. Depending on the issue, the debug process typically creates dozens of log files, which you can then zip up and upload to the HP Support case.

There are two primary debug use cases:

  1. Troubleshooting Data Protector GUI
  2. Troubleshooting failing DP backup jobs

Troubleshooting DP GUI

Exit the GUI, and restart it in debug mode from a command prompt:

cd \Program Files\Omniback\bin
manager.exe -debug 1-500 yourname.txt

Reproduce the error in the DP GUI, and then exit the GUI to stop debugging. Depending on the nature of the issue, this creates debug files on every host involved. On each host, look in the following location for long file names that start with OB2DBG and end with yourname.txt:

(Windows 2003) Program Files\Omniback\tmp
(Windows 2008) ProgramData\Omniback\tmp

Gather all these debug files from each host, zip them up (one archive per host), and upload them to the HP Support case.
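The collection step can be sketched in Python as follows. It assumes the tmp folders listed above are on drive C:. It also assumes you used yourname.txt, or whatever postfix you chose, with -debug. The zip file name is a hypothetical choice.

```python
"""Sketch: bundle one host's OB2DBG debug files into a zip for the HP case."""
import socket
import sys
import zipfile
from pathlib import Path

TMP_DIRS = [
    Path(r"C:\ProgramData\OmniBack\tmp"),    # Windows 2008
    Path(r"C:\Program Files\OmniBack\tmp"),  # Windows 2003
]


def find_debug_files(tmp_dir: Path, postfix: str) -> list[Path]:
    """Return files named OB2DBG*<postfix> directly inside tmp_dir, sorted by path."""
    return sorted(
        p for p in tmp_dir.iterdir()
        if p.is_file() and p.name.startswith("OB2DBG") and p.name.endswith(postfix)
    )


def zip_debug_files(files: list[Path], zip_path: Path) -> int:
    """Write files into zip_path (flat, by file name) and return how many were added."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for f in files:
            zf.write(f, arcname=f.name)
    return len(files)


if __name__ == "__main__" and len(sys.argv) > 1:
    postfix = sys.argv[1]  # e.g. yourname.txt
    tmp = next((d for d in TMP_DIRS if d.is_dir()), None)
    if tmp is None:
        sys.exit("No Data Protector tmp folder found")
    zip_path = Path(f"{socket.gethostname()}_ob2dbg.zip")
    count = zip_debug_files(find_debug_files(tmp, postfix), zip_path)
    print(f"Zipped {count} debug file(s) into {zip_path}")
```

Run it on each host, for example python zip_ob2dbg.py yourname.txt. Then attach the resulting <hostname>_ob2dbg.zip to the case.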

Troubleshooting Failing Backup Jobs

Depending on the nature of your failing backup job, you might deviate from these steps, but they will generally apply to all failing backup jobs:

On the Cell Manager, open a command prompt and run the following:

omnisv stop
omnisv start -debug 1-500 yourname.txt

On every other related system with the DP Inet agent installed, open Windows Services and stop the Data Protector Inet service. In the service's start parameters, add the following, and then click Start:

-debug 1-500 yourname.txt

Reproduce the backup job failure, and then exit the DP GUI. This creates dozens of logs on the Cell Manager and on every related system where debugging was enabled. Before you collect the logs, however, be sure to disable debugging: remove the startup parameters you added to the service on each system, and restart the services normally. Also, don't forget to restart the DP Cell Manager without debugging:

omnisv stop
omnisv start

On each host, in the following location, look for long file names starting with OB2DBG and ending with yourname.txt:

(Windows 2003) Program Files\Omniback\tmp
(Windows 2008) ProgramData\Omniback\tmp

Gather all these debug files from each host, zip them up (one archive per host), and upload them to the HP Support case.

Finally, you will also want to grab a copy of the cell_info file from the Cell Manager and upload it as well:

(Windows 2003) Program Files\Omniback\Config\Server\cell    
(Windows 2008) ProgramData\Omniback\Config\Server\cell 
