Hyper-V Live Migration Only Goes One Way (Error 21502)

7,543

Well, finally found the issue:

I noticed that some of the created cluster networks were not legitimate (i.e., they contained only one NIC, or were teamed with a NIC on a different subnet), so I disabled them. My colleagues told me that the network bindings on the physical servers could make a difference, so I changed those too. I validated the cluster and made sure every resource had both servers listed as possible owners. To top it off, I found the "Network for Live Migration" tab under the properties of the Virtual Machine resource.
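For anyone wanting to do the same sanity check without eyeballing every adapter, here is a rough sketch in Python. It is not anything the cluster itself runs; it only groups NICs by subnet and flags subnets that are missing on one of the nodes. The inventory below (addresses, prefixes, and which NIC sits where) is entirely made up for illustration, and `group_by_subnet` and `suspect_networks` are hypothetical helper names.

```python
import ipaddress
from collections import defaultdict

# Hypothetical inventory: node name -> list of (NIC name, "address/prefix").
# None of these addresses come from the original cluster.
NICS = {
    "VIR001": [
        ("LM", "10.0.1.11/24"),
        ("Host", "192.168.10.11/24"),
        ("Local Area Connection 2", "172.16.5.11/24"),
    ],
    "VIR002": [
        ("LM", "10.0.1.12/24"),
        ("Host", "192.168.10.12/24"),
    ],
}


def group_by_subnet(nics):
    """Map each subnet to the (node, NIC name) pairs that sit on it."""
    networks = defaultdict(list)
    for node, adapters in nics.items():
        for name, cidr in adapters:
            subnet = ipaddress.ip_interface(cidr).network
            networks[subnet].append((node, name))
    return dict(networks)


def suspect_networks(nics):
    """Return subnets that do not have a NIC on every node,
    mapped to the sorted list of nodes that are missing."""
    all_nodes = set(nics)
    suspects = {}
    for subnet, members in group_by_subnet(nics).items():
        present = {node for node, _ in members}
        if present != all_nodes:
            suspects[subnet] = sorted(all_nodes - present)
    return suspects


if __name__ == "__main__":
    for subnet, missing in suspect_networks(NICS).items():
        print(f"{subnet}: no NIC on {', '.join(missing)}")
```

With the made-up inventory above, only the 172.16.5.0/24 subnet is flagged, because VIR002 has no adapter on it. A subnet flagged this way is a candidate for the kind of one-sided cluster network described above.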

I had ordered the cluster networks in "Network for Live Migration" so that the Live Migration cluster network was first, followed by all active networks, with the disabled networks at the bottom. No love. Today, after changing the bindings and seeing no change, I decided to disable all cluster networks in the Live Migration tab except three internal networks (LM, host, Cluster Domain). Now it's working.

Not sure what caused this to begin with. We haven't made any physical changes to the hardware in the last year. This was working at least 4 months ago. Looks like the Cluster manager doesn't always listen to its own settings.

Thanks for the replies on this question.

Insomnia
Author by

Insomnia

Updated on September 18, 2022

Comments

  • Insomnia
    Insomnia almost 2 years

    We've been running into an issue recently with one of our server stacks. Our two 2008 R2 servers are running in a cluster set up to live migrate VMs between each other in case a fault is ever detected.

    The servers are the exact same hardware-wise; they were ordered specifically for this purpose. Live migration had been working fine up until a couple months ago when we noticed that VIR001 could not migrate to VIR002. I've looked into this issue and I know that generally it is caused by improperly-named resources, but that doesn't seem to be the case here.

    VIR002 will live migrate any of its hosted VMs over to VIR001. VIR001 will not live migrate any VMs over to VIR002. I'm not sure where to start with this. I've noticed a couple of Time-Server errors on VIR001, but if the issue were due to a sync problem, wouldn't both servers experience the same issue?

    Right now, looking for ideas on what to check. Thanks,

    (Update: I've run the Failover Cluster Validation tool and it found no issues. I could not run the disk validation because our cluster disk is still online with the cluster. Both servers in question are also set as possible owners for cluster resources.)

    • joeqwerty
      joeqwerty over 11 years
      Have you checked the basics for the VM, like the Possible Owners setting?
    • Insomnia
      Insomnia over 11 years
      Do you mean the Preferred Owners setting? Preferred owners doesn't seem to make a difference. I have some VMs set up with no preferred owners, some with both servers in the list, and some with a single owner.
    • joeqwerty
      joeqwerty over 11 years
      Also, have you run the cluster validation wizard since this problem started occurring?
    • joeqwerty
      joeqwerty over 11 years
      No, not the Preferred Owners. The Possible Owners under the Advanced Policies tab of the resource properties (under the Services and applications node). If VIR002 isn't selected as a Possible Owner, then those resources (the virtual machines) will never fail over to VIR002.
    • Insomnia
      Insomnia over 11 years
      Also, going through the Cluster Manager, I noticed the errors I've been getting again: EventID 1127 - Microsoft-Windows-Failover-Clustering. Cluster interface Local Area Connection 2 failed, etc. These failing NICs were grouped together in weird ways; the manager seemed to randomly decide which NICs should talk to each other. I had disabled the Cluster Networks that weren't required or correct. The networks that should be used for migration are still enabled and listed without error. Could this be part of the issue?
    • joeqwerty
      joeqwerty over 11 years
      The Preferred Owners setting designates which hosts are preferred to own the clustered resource, but a clustered resource may still fail over to a non-Preferred Owner if no Preferred Owner is available. A Possible Owner is a host that is allowed to host the clustered resource. If a host is not listed as a Possible Owner for the resource, then that resource will never be allowed to fail over to that host.
    • Insomnia
      Insomnia over 11 years
      Gotcha. Just to confirm, my cluster HVCluster1 has both servers selected as possible owners. Same with the Cluster Disk 1.
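The Preferred Owners versus Possible Owners distinction discussed in the comments above can be sketched as a toy model in Python. This is only an illustration of the rule as joeqwerty describes it, not the failover clustering service's actual placement logic; the `ClusteredResource` class, the `failover_candidates` function, the VM name, and the third node `VIR003` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ClusteredResource:
    name: str
    current_owner: str
    possible_owners: set                     # hosts allowed to own the resource
    preferred_owners: list = field(default_factory=list)  # ordered preference


def failover_candidates(resource, nodes_up):
    """Order the nodes this resource may move to, preferred owners first.

    A node that is not a Possible Owner is never returned, even if it is
    the only node still running.
    """
    allowed = [
        node for node in sorted(nodes_up)
        if node in resource.possible_owners and node != resource.current_owner
    ]
    preferred = [n for n in resource.preferred_owners if n in allowed]
    others = [n for n in allowed if n not in preferred]
    return preferred + others


if __name__ == "__main__":
    up = {"VIR001", "VIR002"}

    # Both hosts are Possible Owners: the VM can move to VIR002.
    vm = ClusteredResource("ExampleVM", "VIR001", {"VIR001", "VIR002"})
    print(failover_candidates(vm, up))      # ['VIR002']

    # VIR002 is not a Possible Owner: there is nowhere to go.
    pinned = ClusteredResource("ExampleVM", "VIR001", {"VIR001"})
    print(failover_candidates(pinned, up))  # []
```

In this model, an empty candidate list for a VM on VIR001 would reproduce the one-way symptom, which is why the Possible Owners list was the first thing worth ruling out in this thread.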