How to recover Galera cluster node

Solved.

The crashed node was configured to use a different SST method from the donor node. It previously used Percona XtraBackup, and through accident or stupidity the configs were out of date.
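For anyone hitting the same thing, here is a rough sketch of how the mismatch could be spotted by comparing the wsrep_sst_method setting across nodes. The helper names and the sample config text are hypothetical, and the donor's value (rsync) is only a placeholder since the post does not say which method the donor was actually using.

```python
from typing import Dict, Optional


def sst_method(cnf_text: str) -> Optional[str]:
    """Return the wsrep_sst_method value found in my.cnf-style text, or None."""
    method = None
    for raw in cnf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        # MySQL/MariaDB accept both dashes and underscores in option names
        if key.replace("-", "_") == "wsrep_sst_method":
            method = value.strip("'\"")  # a later setting overrides an earlier one
    return method


def methods_differ(configs: Dict[str, str]) -> bool:
    """True if the nodes in `configs` (node name -> config text) disagree."""
    return len({sst_method(text) for text in configs.values()}) > 1


# Hypothetical config snippets; the donor's method is a placeholder value.
configs = {
    "node1": "[galera]\nwsrep_sst_method = rsync\n",
    "node2": "[galera]\nwsrep_sst_method = xtrabackup-v2\n",
}
print(methods_differ(configs))
```

In practice the text would come from each node's actual config files, which may be split across several include directories.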

Author: Tim

Web developer since 1997. PHP, JavaScript, HTML/CSS. Maker of Loco. Interested in localization localisation.

Updated on September 18, 2022

Comments

  • Tim, over 1 year ago

    I'm running a three node, multi-master Galera cluster under MariaDB.

    One of the nodes has crashed due to a hardware fault (node3) and for whatever reason this crashed one of the healthy nodes too (node2). So I'm left with one running node (node1) which at this point is the most advanced of the cluster.

    I am waiting for my hosting company to fix the third node, but in the meantime I am unable to restart the second node. When attempting to restart, I get the following errors from the xtrabackup-v2 program as it attempts a state transfer:

    [Warning] WSREP: 1.0 (node1): State transfer to 0.0 (node2) failed: -2 (No such file or directory)
    [ERROR] WSREP: gcs/src/gcs_group.cpp:gcs_group_handle_join_msg():737: Will never receive state. Need to abort.
    

    This continues endlessly until I stop the mysqld service. I have no idea what "No such file or directory" refers to.

    The crashed node's grastate.dat looks like this:

    version: 2.1
    uuid:    00000000-0000-0000-0000-000000000000
    seqno:   -1
    safe_to_bootstrap: 0
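
A note on reading this file: an all-zero uuid means the node has no recorded cluster state, so it can only rejoin through a full state transfer (SST) from a donor. The sketch below parses the file and checks for that case. The function names are made up for illustration, and the interpretation is a general reading of the file format rather than anything specific to this cluster.

```python
from typing import Dict

ZERO_UUID = "00000000-0000-0000-0000-000000000000"


def parse_grastate(text: str) -> Dict[str, str]:
    """Parse the 'key: value' lines of a grastate.dat file into a dict."""
    state = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        state[key.strip()] = value.strip()
    return state


def has_known_state(state: Dict[str, str]) -> bool:
    """False when the uuid is all zeros, i.e. no cluster state is recorded."""
    return state.get("uuid", ZERO_UUID) != ZERO_UUID


# The file contents quoted in the question
sample = """version: 2.1
uuid:    00000000-0000-0000-0000-000000000000
seqno:   -1
safe_to_bootstrap: 0
"""
state = parse_grastate(sample)
print(state["seqno"], has_known_state(state))
```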
    

    What can I do to restart the failed node?


    Update 1:

    I have cleared the datadir, and the node will still not join the cluster. The file error seems pertinent, but it doesn't help me find what's wrong.

    State transfer to 0.0 (node2) failed: -2 (No such file or directory)


    Update 2:

    I found additional error information on the donor node:

    WSREP_SST: [ERROR] innobackupex not in path: /usr/sbin:/sbin:/usr/ etc..

    Running which innobackupex shows that this program is missing.
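
The same check that which performs can be scripted for each SST method, which would have flagged the problem before the joiner ever requested a transfer. This is only a sketch: the helper name is hypothetical, and apart from innobackupex (taken from the error above) the method-to-program mapping is an assumption that may differ between MariaDB and Percona versions.

```python
import shutil
from typing import Optional

# Program each SST method is assumed to need on the node.
# Only the xtrabackup-v2 entry comes from the error message above;
# the other entries are assumptions for illustration.
SST_TOOLS = {
    "xtrabackup-v2": "innobackupex",
    "mariabackup": "mariabackup",
    "rsync": "rsync",
}


def missing_sst_tool(method: str, path: Optional[str] = None) -> Optional[str]:
    """Return the name of the required program if it is not on the search
    path, or None if it was found or the method is not in SST_TOOLS."""
    tool = SST_TOOLS.get(method)
    if tool is None:
        return None
    # shutil.which searches `path`, or the PATH environment variable if None
    return None if shutil.which(tool, path=path) else tool


missing = missing_sst_tool("xtrabackup-v2")
if missing:
    print(missing + " not found on PATH")
```

It needs to run on the donor as well as the joiner, since the donor is where the backup program was missing here.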