After upgrading from vSphere ESX 4.1 U2 to ESXi 5.0 we are seeing performance problems and snapshot problems all over the place.
A lot of VMs now need more time to create a snapshot, but vRanger Pro only waits up to 5 minutes and then marks the backup task as failed. If the snapshot creation finishes a short time later, we end up with an unwanted snapshot that isn't deleted automatically. We have a similar problem with a few custom quiescing scripts that need more than 5 minutes to do their job.
We are also affected by a problem where, on a few VMs, two snapshots are always created at the same time, with the result that vRanger can't use CBT when performing a LAN-free backup. This problem also occurs when taking a snapshot with quiescing in the vSphere Client. It looks like a VMware problem and a case has been opened. But I had already heard about this problem from a Quest SE earlier this year.
So... how can I increase the timeout period for the vRanger service?
For everyone upgrading from vSphere 4 to 5: it would be wise to upgrade just your smallest cluster first, or just a few hosts, and see what your vRanger 5.3 says. This environment backed up 120 VMs every day with a success rate of >99%. After upgrading 2/3 of the 13 hosts, the rate dropped to 70% on the first try and the backup window isn't long enough anymore. The largest job of 65 VMs finished in 499 min when using the old COS; with LAN-free we are at 926 min with 14 failures (after tweaking).
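To put the figures for the largest job in perspective, here is a quick back-of-the-envelope calculation in Python. It uses only the numbers quoted above; the variable names are made up for illustration.

```python
# Figures quoted in the post (largest job, 65 VMs)
cos_minutes = 499        # runtime with the old COS
lan_free_minutes = 926   # runtime with LAN-free after the upgrade
vms_in_job = 65
failures = 14            # failed VMs in the LAN-free run (after tweaking)

# How much longer the job runs now
slowdown = lan_free_minutes / cos_minutes
extra_minutes = lan_free_minutes - cos_minutes

# Share of VMs in this job that were backed up successfully
job_success_rate = (vms_in_job - failures) / vms_in_job

print(f"Runtime factor: {slowdown:.2f}x (+{extra_minutes} min)")
print(f"Success rate in this job: {job_success_rate:.1%}")
```

So the job takes roughly 1.9 times as long as before, and only about 78% of its VMs complete.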
Problems we have seen:
- The VA appliance can't be shut down from the vSphere Client (VMware Tools are installed)
- Double snapshots
- Snapshots left behind
- Timeout problems, "Failed to create vRanger snapshot", "API Call Failure", and the following error messages (translated from German where needed):
  - Error Message: The underlying connection was closed: A connection that was expected to be kept alive was closed by the server.
  - Error Message: Object reference not set to an instance of an object.
  - Error Message: Unable to continue Differential backup; the vRanger snapshot was not found.
  - Error Message: RETRY timedout operation timed out [at xtimedwait:416]
  - Error Message: Index was out of range. Must be non-negative and less than the size of the collection. Parameter name: index
- The displayed duration of running jobs is reset when doing other work in vRanger (I deleted a batch of non-working replication jobs). The task now shows a higher runtime than the job.
- Transport mode is not editable (Modify is greyed out), so old replication jobs have to be recreated
- LAN mode is "slow" compared to COS, and our server, which uses an older quad-core Xeon, is now CPU-bound at ~70 MB/s incoming traffic when running up to 10 tasks in parallel.
- The name "VA-HotAdd" is not the right term, because the work is not performed by the VAs and you have to install vRanger in a single VM. So scaling looks limited to a single installation and, depending on the environment, the backup ends up on the wrong site when you have two production sites. Otherwise you have to present all LUNs to the host which holds the vRanger VM.
Regards
Joerg