Load balancing MySQL using HAProxy: Got an error reading communication packets?

Solution 1

These are the reasons given in MySQL docs:

The max_allowed_packet variable value is too small or queries require more memory than you have allocated for mysqld. See Section C.5.2.10, “Packet too large”.

Use of Ethernet protocol with Linux, both half and full duplex. Many Linux Ethernet drivers have this bug. You should test for this bug by transferring a huge file using FTP between the client and server machines. If a transfer goes in burst-pause-burst-pause mode, you are experiencing a Linux duplex syndrome. Switch the duplex mode for both your network card and hub/switch to either full duplex or to half duplex and test the results to determine the best setting.

A problem with the thread library that causes interrupts on reads.

Badly configured TCP/IP.

Faulty Ethernets, hubs, switches, cables, and so forth. This can be diagnosed properly only by replacing hardware.

This explains it better:

Although they could be a symptom of a larger problem, they can be caused by normal (i.e. unpreventable) network issues.

Even if they're on the same LAN, for a variety of reasons, communication errors may occur between your application server and the database. In the case of corrupt communications or time-outs, the application and/or MySQL most likely retries successfully, so the problem never surfaces or makes itself apparent.

In my experience, the most common sources of these types of messages are from the application (server) flaking out, the application not terminating connections properly, or from latencies in off-site replication.

Quite likely they were happening before you enabled error logging on the MySQL server.

Solution 2

I found that increasing the timeout settings in the haproxy.cfg file solved this error for me. I spent a lot of time checking wait_timeout and related settings in my.cnf before realising the bottleneck was actually HAProxy.
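
As a rough illustration of the kind of change this refers to (the values below are assumptions, not the ones used in this answer): the configuration in the question sets clitimeout and srvtimeout to 50000 ms, so HAProxy closes a connection after 50 seconds of inactivity, long before MySQL's wait_timeout of 600 seconds. MySQL then most likely sees the connection disappear without a clean disconnect and counts it as aborted. A minimal sketch using the current "timeout" keyword names, with the idle timeouts set slightly above wait_timeout:

```
defaults
    log global
    mode tcp
    # "timeout connect/client/server" are the current names for the
    # deprecated contimeout/clitimeout/srvtimeout keywords.
    timeout connect 5s
    # Assumed values: slightly longer than MySQL's wait_timeout (600 s),
    # so that HAProxy is no longer the first to drop an idle connection.
    timeout client  610s
    timeout server  610s
```

Whether values this long are appropriate depends on how long your clients legitimately sit idle; an application that keeps pooled connections open for hours may need both this setting and wait_timeout raised together.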

Author: Greg Petersen

Updated on September 18, 2022

Comments

  • Greg Petersen over 1 year

    I've set up load balancing of MySQL slaves using HAProxy via xinetd. Two load balancers share a virtual IP that is managed by Pacemaker:

    crm configure show:

    node SVR120-27148.localdomain
    node SVR255-53192.localdomain
    primitive failover-ip ocf:heartbeat:IPaddr2 \
        params ip="192.168.5.9" cidr_netmask="32" \
        op monitor interval="5s" \
        meta is-managed="true"
    primitive haproxy ocf:heartbeat:haproxy \
        params conffile="/etc/haproxy/haproxy.cfg" \
        op monitor interval="30s" \
        meta is-managed="true"
    colocation haproxy-with-failover-ip inf: haproxy failover-ip
    order haproxy-after-failover-ip inf: failover-ip haproxy
    property $id="cib-bootstrap-options" \
        dc-version="1.0.12-unknown" \
        cluster-infrastructure="openais" \
        no-quorum-policy="ignore" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        last-lrm-refresh="1342783084"
    

    /etc/haproxy/haproxy.cfg:

    global
        log 127.0.0.1 local1 debug
        maxconn 4096
        pidfile /var/run/haproxy.pid
        daemon
    
    defaults
        log global
        mode tcp
        option dontlognull 
        retries 3 
        option redispatch
        maxconn 2000
        contimeout 5000
        clitimeout 50000
        srvtimeout 50000
    
    frontend FE_mysql
        bind 192.168.5.9:3307
        default_backend BE_mysql
    
    backend BE_mysql
        mode tcp
        balance roundrobin
        option tcpka
        option httpchk
        #server mysql1 192.168.6.47:3306 weight 1 check port 9199 inter 12000 rise 3 fall 3
        server mysql2 192.168.6.248:3306 weight 1 check port 9199 inter 12000 rise 3 fall 3
        server mysql3 192.168.6.129:3306 weight 1 check port 9199 inter 12000 rise 3 fall 3
    

    My problem is that most of the time, when connecting via the virtual IP, /var/log/mysqld.log is flooded with:

    120719 12:59:46 [Warning] Aborted connection 17237 to db: 'db' user: 'user' host: '192.168.5.192' (Got an error reading communication packets)
    120719 12:59:49 [Warning] Aborted connection 17242 to db: 'db' user: 'user' host: '192.168.5.192' (Got an error reading communication packets)
    120719 12:59:52 [Warning] Aborted connection 17248 to db: 'db' user: 'user' host: '192.168.5.192' (Got an error reading communication packets)
    

    (connection still established)

    192.168.5.192 is HAProxy's IP address.

    mysql> show global status like 'Aborted%';
    +------------------+-------+
    | Variable_name    | Value |
    +------------------+-------+
    | Aborted_clients  | 53626 |
    | Aborted_connects | 400   |
    +------------------+-------+
    

    I don't think 128M is too small for max_allowed_packet:

    max_connections = 300
    max_allowed_packet = 128M
    

    Timeout variables:

    mysql> show global variables like '%timeout';
    +----------------------------+----------+
    | Variable_name              | Value    |
    +----------------------------+----------+
    | connect_timeout            | 10       |
    | delayed_insert_timeout     | 300      |
    | innodb_lock_wait_timeout   | 60       |
    | innodb_rollback_on_timeout | OFF      |
    | interactive_timeout        | 3600     |
    | lock_wait_timeout          | 31536000 |
    | net_read_timeout           | 30       |
    | net_write_timeout          | 60       |
    | slave_net_timeout          | 3600     |
    | wait_timeout               | 600      |
    +----------------------------+----------+
    

    Is there anything that can cause this? Does it relate to HAProxy?

    Any thoughts?

    • Greg Petersen almost 12 years
      I'm not getting a "Got packet bigger than..." error. Moreover, max_allowed_packet is set to 128M.
    • longneck almost 12 years
      What is 192.168.3.87? Is that the IP of one of the clients?
    • Greg Petersen almost 12 years
      Yes. Looking deep into the logs, I see that it also happens when connecting via 'real' IP.
  • neobie over 4 years
    So what is the solution?
  • neobie over 4 years
    What are your timeout values?
  • Wilson Hauck over 3 years
    Mr. Opata, The timeout connect at 6s is the max time allowed for 'connect' to be successful. If your instance is busy with activity, 6s is not enough time. Default is 10 seconds. 20s would tolerate packet retries much better for your clients and still let them in on first attempt usually. View profile, Network profile for contact info and free downloadable Utility Scripts to assist with performance tuning. Have a GREAT 2021! Stay Safe.