What is the correct way to remediate an HA Cluster?

Solution 1

If you're not using DRS, you'll have to manually evacuate your powered-on VMs to another host in the cluster before VUM (vSphere Update Manager) will remediate the host. If you're using HA Admission Control, Distributed Power Management, or Fault Tolerance, it's also recommended that you disable those features before you remediate the host.

In short: migrate (vMotion) your powered-on VMs to another host in the cluster, remediate the host, then migrate the VMs back.
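To make the order of operations concrete, here is a minimal sketch that only prints a plan; it does not talk to vCenter. The `Host` class, the function name, and the host and VM names are all hypothetical, and it assumes every VM returns to its original host after that host is patched.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical stand-in for an ESXi host and the VMs running on it."""
    name: str
    vms: list[str] = field(default_factory=list)


def rolling_remediation_plan(hosts: list[Host]) -> list[str]:
    """Return the ordered steps for patching each host in turn with no VM downtime."""
    if len(hosts) < 2:
        raise ValueError("need at least two hosts to evacuate VMs without downtime")
    steps: list[str] = []
    for i, host in enumerate(hosts):
        # Evacuate to the next host in the list (wraps around to the first).
        target = hosts[(i + 1) % len(hosts)]
        for vm in host.vms:
            steps.append(f"vMotion {vm}: {host.name} -> {target.name}")
        steps.append(f"remediate {host.name}")
        # Move the VMs back so placement is unchanged before the next host.
        for vm in host.vms:
            steps.append(f"vMotion {vm}: {target.name} -> {host.name}")
    return steps


if __name__ == "__main__":
    cluster = [Host("esx1", ["web01"]), Host("esx2", ["db01"])]
    for step in rolling_remediation_plan(cluster):
        print(step)
```

For a two-host cluster this yields evacuate, remediate, and move back for the first host, then the same three stages for the second.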

Solution 2

Disable the relevant cluster features on the host/cluster remediation options screens.

I typically disable admission control, Fault Tolerance, and DPM (but who uses that?).

I may manually vMotion a few VMs if the process doesn't seem to kick off.

Be patient. Remediation takes 10-15 minutes per host, depending on your connectivity.
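With admission control disabled, nothing stops you from overloading the remaining host, so it is worth a quick sanity check before evacuating. The sketch below is an assumption-laden illustration, not a VMware API: it treats configured VM memory as the limiting resource, and the function name, parameters, and figures are hypothetical. ESXi can overcommit memory, so this check is deliberately conservative.

```python
from __future__ import annotations


def can_evacuate(vm_memory_gb: dict[str, float],
                 target_free_gb: float,
                 reserve_gb: float = 0.0) -> bool:
    """Return True if the target host has room for every VM being moved.

    vm_memory_gb   -- configured memory of each VM on the host to be patched
    target_free_gb -- free memory on the host that will receive the VMs
    reserve_gb     -- optional headroom to keep free on the target
    """
    needed = sum(vm_memory_gb.values())
    return needed + reserve_gb <= target_free_gb


if __name__ == "__main__":
    vms_on_esx1 = {"web01": 8, "db01": 16}  # hypothetical VM sizes in GB
    print(can_evacuate(vms_on_esx1, target_free_gb=32))
```

If the check fails, power off or suspend less critical VMs before migrating the rest.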

Ashok Kumawat
Author by

Ashok Kumawat

Updated on September 18, 2022

Comments

  • Ashok Kumawat
    Ashok Kumawat over 1 year

    Background / Goal

    • I have a two-host VMware HA cluster running production machines.
    • It is currently set up so that it can account for the failure of up to one host. It does not use DRS.
    • I need to remediate both of these servers to apply patches. I would like to do this with zero downtime.

    Questions

    • Can I vMotion the VMs in the cluster specifically to another host in the cluster and then take down a server?
    • What is the best / recommended way to remediate servers in a HA configuration to avoid downtime?
    • Ashok Kumawat
      Ashok Kumawat over 11 years
      @ewwhite, we're on ESXi 5.1, with two standalone hosts and two in the HA cluster. We have an older academic license on the standalone hosts and an Enterprise license on the HA cluster hosts (can do Storage vMotion, etc.).
  • joeqwerty
    joeqwerty over 11 years
    The key in the question is that this is not a DRS cluster. Without DRS enabled, the running VMs will not be automatically migrated to the remaining host, and the remediation will hang waiting for the powered-on VMs to be powered off. During remediation there are three options for the VMs: Power Off, Suspend, or Do Not Change Power State. To remediate a host in a non-DRS cluster, the VMs need to be powered off or manually migrated to another host. To achieve the goal of no downtime, the OP would need to manually migrate the powered-on VMs prior to remediation.
  • ewwhite
    ewwhite over 11 years
    Right, which is when I'd manually move the VMs if the process doesn't start to move. But that makes sense that it wouldn't work in a non-DRS setup.
  • Oli
    Oli over 11 years
    OK, fair enough. So it's a simple case of vMotioning the machines off first, as long as vMotion is set up and running.
  • Ashok Kumawat
    Ashok Kumawat over 11 years
    Thanks, both of you. If I have two hosts in an HA configuration of say, server 1 and 2, can I manually force the VMs to migrate to server 2 and then take down server 1 for maintenance, or will this still cause the VMs to enter a failover scenario? Wondering if I can migrate to a specific host within a cluster or whether I'd have to migrate them onto a host outside of the cluster, essentially.
  • Ashok Kumawat
    Ashok Kumawat over 11 years
    Right... in my case, I'm wondering if I can vmotion them to a specific host within the cluster so that it wouldn't be affected by a failover scenario, or if I need to vmotion them out of the cluster entirely
  • ewwhite
    ewwhite over 11 years
    You can force them as long as you don't have anything requiring spare host capacity.
  • joeqwerty
    joeqwerty over 11 years
    I'm not following your line of thinking, and I think you're making this more complicated than it actually is. You only have two hosts. You want to maintain uptime on all of your VMs while you remediate each host. Here are the steps: migrate your running VMs from one host to the other host, remediate the evacuated host, migrate your running VMs back to the original host, then repeat the process for the second host. What failover scenario are you referring to? How is a failover related to host remediation? I have two vSphere 5.1 hosts in an HA cluster and I do this all of the time.
  • Ashok Kumawat
    Ashok Kumawat over 11 years
    Thanks, this is exactly the process I was looking for. Much appreciated!
  • joeqwerty
    joeqwerty over 11 years
    Glad to help...
  • ewwhite
    ewwhite over 11 years
    It's a "forced migration" because neither DRS nor VMware Update Manager is handling the move. You (the operator) are forcing your VMs to move to another host. Maybe I should have said "manual"?