MySQL Galera node not starting (aborting with Error 'WSREP: [...]: 60: failed to reach primary view: 60 (Operation timed out)')


Make sure you start the first node by running the following command:

service mysql start --wsrep-new-cluster

Start the next nodes by running the command:

service mysql start

I get exactly the same errors as yours when I forget to add the --wsrep-new-cluster parameter when starting the first node.
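If you want a quick way to spot this symptom, here is a minimal sketch that searches an error log for the two messages from the log you posted. The function name looks_unbootstrapped is my own and not part of Galera or MySQL, and the log path is whatever your setup uses. Treat a match as a hint only: the same timeout can also show up when the other nodes are simply unreachable.

```shell
#!/bin/sh
# Hypothetical helper (not a Galera or MySQL tool): succeeds when the
# given error log contains both messages seen when the first node was
# started without --wsrep-new-cluster.
#   $1 = path to the MySQL error log (assumption: adjust to your setup)
looks_unbootstrapped() {
    log_file="$1"
    # Both strings are copied from the log quoted in the question.
    grep -q 'no nodes coming from prim view' "$log_file" &&
        grep -q 'failed to reach primary view' "$log_file"
}

# Example call (commented out so the file only defines the function):
# looks_unbootstrapped /path/to/error.log && echo "first node was probably not bootstrapped"
```

The function only reads the log, so it is safe to run on a live node.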

Check this page for details: Starting the cluster

Just a quick edit: I personally use Galera with MariaDB, and the commands above work properly there. Since you use MySQL, you might need to replace mysql with mysqld in the commands above. Try both.
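To keep the two cases apart, here is a small sketch that prints the right start command for a node. The function name galera_start_cmd and its arguments are my own invention for illustration; it only builds the command strings shown above and does not start anything itself.

```shell
#!/bin/sh
# Hypothetical helper: prints the start command for a Galera node.
#   $1 = "first" for the node that bootstraps the cluster,
#        anything else for a node that joins an existing cluster
#   $2 = service name, "mysql" (default) or "mysqld"
galera_start_cmd() {
    role="$1"
    service_name="${2:-mysql}"
    if [ "$role" = "first" ]; then
        # Only the first node gets the bootstrap flag.
        echo "service $service_name start --wsrep-new-cluster"
    else
        echo "service $service_name start"
    fi
}
```

For example, galera_start_cmd first mysqld prints service mysqld start --wsrep-new-cluster, which you can then run as root on the first node.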

Author by

user2642601

Updated on June 08, 2022

Comments

  • user2642601
    user2642601 almost 2 years

    I am trying to set up three Galera nodes on FreeBSD 10 with MySQL 5.6.26 and VirtualBox. When I set everything up and start MySQL, it exits after some time and cannot start properly.

    Here is my log:

    2015-10-22 15:23:24 9402 [Note] WSREP: Read nil XID from storage engines, skipping position init
    2015-10-22 15:23:24 9402 [Note] WSREP: wsrep_load(): loading provider library '/usr/local/lib/libgalera_smm.so'
    2015-10-22 15:23:24 9402 [Note] WSREP: wsrep_load(): Galera 3.5(rXXXX) by Codership Oy <[email protected]> loaded successfully.
    2015-10-22 15:23:24 9402 [Note] WSREP: CRC-32C: using "slicing-by-8" algorithm.
    2015-10-22 15:23:24 9402 [Note] WSREP: Found saved state: 9bfd9448-780a-11e5-a465-e268e80baf6e:-1
    2015-10-22 15:23:24 9402 [Note] WSREP: Passing config to GCS: base_host = 192.168.1.10; base_port = 4567; cert.log_conflicts = no; debug = no; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 1; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /home/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /home/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.listen_addr = 192.168.1.10; gmcast.segment = 0; gmcast.version = 0; ist.recv_addr = 192.168.1.10; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.npvo = false; pc.version = 0; pc.wait_prim 
    2015-10-22 15:23:24 9402 [Note] WSREP: Service thread queue flushed.
    2015-10-22 15:23:24 9402 [Note] WSREP: Assign initial position for certification: 4, protocol version: -1
    2015-10-22 15:23:24 9402 [Note] WSREP: wsrep_sst_grab()
    2015-10-22 15:23:24 9402 [Note] WSREP: Start replication
    2015-10-22 15:23:24 9402 [Note] WSREP: Setting initial position to 9bfd9448-780a-11e5-a465-e268e80baf6e:4
    2015-10-22 15:23:24 9402 [Note] WSREP: protonet asio version 0
    2015-10-22 15:23:24 9402 [Note] WSREP: Using CRC-32C (optimized) for message checksums.
    2015-10-22 15:23:24 9402 [Note] WSREP: backend: asio
    2015-10-22 15:23:24 9402 [Note] WSREP: GMCast version 0
    2015-10-22 15:23:24 9402 [Note] WSREP: (b08a4d6e-78b7-11e5-80bf-12866e73025e, 'tcp://192.168.1.10:4567') listening at tcp://192.168.1.10:4567
    2015-10-22 15:23:24 9402 [Note] WSREP: (b08a4d6e-78b7-11e5-80bf-12866e73025e, 'tcp://192.168.1.10:4567') multicast: , ttl: 1
    2015-10-22 15:23:24 9402 [Note] WSREP: EVS version 0
    2015-10-22 15:23:24 9402 [Note] WSREP: PC version 0
    2015-10-22 15:23:24 9402 [Note] WSREP: gcomm: connecting to group 'test', peer '192.168.1.10:,192.168.1.20:,192.168.1.30:'
    2015-10-22 15:23:27 9402 [Warning] WSREP: no nodes coming from prim view, prim not possible
    2015-10-22 15:23:27 9402 [Note] WSREP: view(view_id(NON_PRIM,b08a4d6e-78b7-11e5-80bf-12866e73025e,1) memb {
        b08a4d6e-78b7-11e5-80bf-12866e73025e,0
    } joined {
    } left {
    } partitioned {
    })
    2015-10-22 15:23:27 9402 [Warning] WSREP: last inactive check more than PT1.5S ago (PT3.6479S), skipping check
    2015-10-22 15:23:57 9402 [Note] WSREP: view((empty))
    2015-10-22 15:23:57 9402 [ERROR] WSREP: failed to open gcomm backend connection: 60: failed to reach primary view: 60 (Operation timed out)
         at gcomm/src/pc.cpp:connect():141
    2015-10-22 15:23:57 9402 [ERROR] WSREP: gcs/src/gcs_core.c:gcs_core_open():202: Failed to open backend connection: -60 (Operation timed out)
    2015-10-22 15:23:57 9402 [ERROR] WSREP: gcs/src/gcs.c:gcs_open():1291: Failed to open channel 'test' at 'gcomm://192.168.1.10,192.168.1.20,192.168.1.30': -60 (Operation timed out)
    2015-10-22 15:23:57 9402 [ERROR] WSREP: gcs connect failed: Operation timed out
    2015-10-22 15:23:57 9402 [ERROR] WSREP: wsrep::connect(gcomm://192.168.1.10,192.168.1.20,192.168.1.30) failed: 7
    2015-10-22 15:23:57 9402 [ERROR] Aborting
    
    2015-10-22 15:23:57 9402 [Note] WSREP: Service disconnected.
    2015-10-22 15:23:58 9402 [Note] WSREP: Some threads may fail to exit.
    2015-10-22 15:23:58 9402 [Note] Binlog end
    2015-10-22 15:23:58 9402 [Note] /usr/local/libexec/mysqld: Shutdown complete
    
    151022 15:23:58 mysqld_safe mysqld from pid file /home/mysql/galera1.pid ended
    

    Part of my.cnf regarding wsrep config:

    wsrep_provider=/usr/local/lib/libgalera_smm.so
    wsrep_cluster_name="test"
    wsrep_cluster_address="gcomm://192.168.1.10,192.168.1.20,192.168.1.30"
    wsrep_slave_threads=8
    wsrep_node_address = "192.168.1.10"
    wsrep_sst_receive_address = "192.168.1.10"
    wsrep_node_incoming_address = "192.168.1.10"
    wsrep_provider_options = "gmcast.listen_addr=192.168.1.10;gcache.size=128M;ist.recv_addr=192.168.1.10"
    wsrep_auto_increment_control=1
    wsrep_retry_autocommit=0
    wsrep_max_ws_size=3741824
    wsrep_max_ws_rows=56000
    wsrep_certify_nonPK=1
    wsrep_convert_LOCK_to_trx=0
    wsrep_sst_donor=galera1
    wsrep_sst_donor_rejects_queries=1
    
    • Node 1 - 192.168.1.10
    • Node 2 - 192.168.1.20
    • Node 3 - 192.168.1.30

    The above output is from node 1.

    The networking between the nodes is working properly, so I can't seem to find a reason for this not to work.