InfiniBand: Verifying that RDMA is working


It turns out this has been seen before. I don't like the answer because it seems to sweep the problem under the rug, but it is an answer nonetheless:

http://linuxtoolkit.blogspot.com/2013/01/errors-when-running-doing-ib-testing.html


Author: Ivan

Updated on September 18, 2022

Comments

  • Ivan
    Ivan almost 2 years

    I have two identical computers with Mellanox cards connected to each other through a cable. No switch. Using opensm.

    I have run several tests, including ping_pong tests, ibping, etc. They all seem to work. However, when I run ib_write_bw, it comes back with what appears to be an error that I don't understand.

    I did tell the firewall to accept TCP and UDP traffic from the 192.168.0.0/24 subnet:

    sudo iptables -I INPUT -p tcp -s 192.168.0.0/24  -j ACCEPT -m comment --comment "Allow Infiniband"
    
    sudo iptables -I INPUT -p udp -s 192.168.0.0/24  -j ACCEPT -m comment --comment "Allow Infiniband"
    

    Any help deciphering the error, and a possible solution, would be great.

    [idf@node2 Downloads]$ sudo ib_write_bw
    
    ************************************
    * Waiting for client to connect... *
    ************************************
    ---------------------------------------------------------------------------------------
                        RDMA_Write BW Test
     Dual-port       : OFF      Device         : mlx4_0
     Number of qps   : 1        Transport type : IB
     Connection type : RC       Using SRQ      : OFF
     CQ Moderation   : 100
     Mtu             : 4096[B]
     Link type       : IB
     Max inline data : 0[B]
     rdma_cm QPs     : OFF
     Data ex. method : Ethernet
    ---------------------------------------------------------------------------------------
     local address: LID 0x01 QPN 0x004a PSN 0xa79f2e RKey 0x50042a04 VAddr 0x007f1682804000
     remote address: LID 0x02 QPN 0x004a PSN 0x5ef914 RKey 0x40042502 VAddr 0x007f94f9ce9000
    ---------------------------------------------------------------------------------------
     #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]   MsgRate[Mpps]
    ethernet_read_keys: Couldn't read remote address
     Unable to read to socket/rdam_cm
     Failed to exchange data between server and clients
    [idf@node2 Downloads]$
    
    
    [idf@node1 python]$ sudo ib_write_bw 192.168.0.1
    ---------------------------------------------------------------------------------------
                        RDMA_Write BW Test
     Dual-port       : OFF      Device         : mlx4_0
     Number of qps   : 1        Transport type : IB
     Connection type : RC       Using SRQ      : OFF
     TX depth        : 128
     CQ Moderation   : 100
     Mtu             : 4096[B]
     Link type       : IB
     Max inline data : 0[B]
     rdma_cm QPs     : OFF
     Data ex. method : Ethernet
    ---------------------------------------------------------------------------------------
     local address: LID 0x02 QPN 0x004a PSN 0x5ef914 RKey 0x40042502 VAddr 0x007f94f9ce9000
     remote address: LID 0x01 QPN 0x004a PSN 0xa79f2e RKey 0x50042a04 VAddr 0x007f1682804000
    ---------------------------------------------------------------------------------------
     #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]   MsgRate[Mpps]
    Conflicting CPU frequency values detected: 1600.000000 != 1733.000000
    Can't produce a report
    [idf@node1 python]$ 
    
    • haggai_e
      haggai_e about 9 years
      Did you try one of the ibv_* tests? (like ibv_rc_pingpong?)
    • Ivan
      Ivan about 9 years
      yes all those tests work.
  • hookenz
    hookenz about 9 years
    Yes, but if ib_uverbs and rdma_ucm aren't loaded by the kernel, some tools still work (i.e. the ones that use send/recv), while the ones that use rdma_send/recv don't.
  • hookenz
    hookenz about 9 years
    That's because CPU frequency scaling is enabled. Set the CPU to performance mode in the BIOS and that error will go away. Does lsmod show rdma_ucm and the other modules I mentioned in my answer? If it doesn't, then this is your issue: modprobe them on both machines and try again. And make sure all the required packages are installed.
  • haggai_e
    haggai_e about 9 years
    Without ib_uverbs you wouldn't see mlx4_0 in user space tools like ib_write_bw
  • hookenz
    hookenz about 9 years
    @haggai all I can say is that I've had this issue before, although under Ubuntu. I'm saying to Ivan: ensure all the required packages and kernel modules are installed, then it should just work.
  • Ivan
    Ivan about 9 years
    Gotcha. Let me see if I can change that...
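
The two checks suggested in the comments (whether ib_uverbs and rdma_ucm are loaded, and whether the cores report different clock speeds) can be sketched as a pair of small shell helpers. This is only a rough sketch: the function names `missing_modules` and `distinct_cpu_mhz` are made up for illustration, only the two modules actually named in the comments are checked, and it assumes the "Conflicting CPU frequency values" message corresponds to differing `cpu MHz` lines in /proc/cpuinfo.

```shell
#!/bin/sh
# Hypothetical helpers; both read text on stdin so they can be fed
# from lsmod and /proc/cpuinfo (or from saved copies of either).

# Print every module given as an argument that does not appear in the
# first column of lsmod-style input.
missing_modules() {
    loaded=$(awk '{ print $1 }')
    for mod in "$@"; do
        printf '%s\n' "$loaded" | grep -qxF -- "$mod" || printf '%s\n' "$mod"
    done
}

# Print how many distinct "cpu MHz" values appear in cpuinfo-style input.
# A result greater than 1 means the cores report different clock speeds.
distinct_cpu_mhz() {
    awk -F: '/^cpu MHz/ { gsub(/[ \t]/, "", $2); print $2 }' | sort -u | wc -l | tr -d ' '
}

# Example use on each node:
#   lsmod | missing_modules ib_uverbs rdma_ucm
#   distinct_cpu_mhz < /proc/cpuinfo
```

If the first command prints nothing, both modules are loaded. If the second prints a number greater than 1, the cores are reporting different frequencies, which would be consistent with frequency scaling being enabled as described above.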