Starting opensm

Time: 2015-07-09 22:13:44

Tags: rdma

I am working with Accelio over softRoCE.

IB devices configured:
# ibv_devices 
    device                 node GUID
    ------              ----------------
    rxe1                821f02fffef91598
    rxe0                d6bed9fffebe94af
Error while running the Accelio client:
# xio_ow_client 
 =============================================
 Server Address     : 127.0.0.1
 Server Port        : 2061
 Transport      : rdma
 Header Length      : 32
 Data Length        : 32
 Connection Index   : 0
 CPU Affinity       : 0
 Finite run     : 0
 =============================================
**** starting ...
session event: connection error. reason: No such device

# rping -c
rdma_resolve_route: No such device

So I checked the opensm status:
# /etc/init.d/opensmd status
opensm is stopped
# /etc/init.d/opensmd start
opensm start [FAILED]

# tail -f /var/log/opensm.log 
Jul 09 15:04:45 655213 [AA4F3700] 0x03 -> OpenSM 3.3.7
Jul 09 15:04:45 692960 [AA4F3700] 0x80 -> OpenSM 3.3.7
Jul 09 15:04:45 693149 [AA4F3700] 0x02 -> osm_vendor_init: 1000 pending umads specified
Jul 09 15:04:45 797977 [AA4F3700] 0x80 -> Entering DISCOVERING state
Jul 09 15:04:45 799152 [AA4F3700] 0x02 -> osm_vendor_bind: Binding to port 0xd6bed9fffebe94af
Jul 09 15:04:45 800414 [AA4F3700] 0x01 -> osm_vendor_bind: ERR 5426: Unable to register class 129 version 1
Jul 09 15:04:45 800422 [AA4F3700] 0x01 -> osm_sm_mad_ctrl_bind: ERR 3118: Vendor specific bind failed
Jul 09 15:04:45 800425 [AA4F3700] 0x01 -> osm_sm_bind: ERR 2E10: SM MAD Controller bind failed (IB_ERROR)
Jul 09 15:04:45 800430 [AA4F3700] 0x01 -> osm_sa_mad_ctrl_unbind: ERR 1A11: No previous bind
Jul 09 15:04:45 829702 [AA4F3700] 0x80 -> Exiting SM

I would appreciate some pointers so that I can understand my mistake.

1 answer:

Answer 0 (score: 0)

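RoCE devices do not need OpenSM. OpenSM is the InfiniBand subnet manager, and RoCE runs over Ethernet, where there is no subnet to manage. That is why OpenSM fails to start when you only have RoCE devices.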
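If you want to confirm which kind of ports you have, here is a minimal sketch that reads each port's link layer from sysfs. It assumes the usual /sys/class/infiniband layout; the helper name list_link_layers is mine, purely for illustration, and is not part of any package.

```shell
#!/bin/sh
# Print the link layer of every RDMA port found under a sysfs root.
# list_link_layers is a hypothetical helper name, not part of any package.
list_link_layers() {
    root="${1:-/sys/class/infiniband}"   # assumed standard sysfs location
    for f in "$root"/*/ports/*/link_layer; do
        [ -r "$f" ] || continue          # glob did not match: no RDMA devices
        port_dir=$(dirname "$f")                      # .../<dev>/ports/<port>
        dev_dir=$(dirname "$(dirname "$port_dir")")   # .../<dev>
        printf '%s port %s: %s\n' \
            "$(basename "$dev_dir")" "$(basename "$port_dir")" "$(cat "$f")"
    done
}

# OpenSM is only relevant for ports that report "InfiniBand".
list_link_layers
```

On a softRoCE-only machine I would expect every rxe port to report Ethernet, in which case there is nothing for OpenSM to bind to.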

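rping fails because you did not specify the address of the server to connect to. Assuming the IP addresses of your machines' RoCE-capable interfaces are 192.168.1.2 (server) and 192.168.1.3 (client), run the commands as follows: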

server$ rping -s -a 192.168.1.2
client$ rping -c -a 192.168.1.2

Thanks,

- Shachar