Ignite TcpDiscoveryMulticastIpFinder not working: Node FAILED, Apache Ignite servers cannot form a cluster

Date: 2018-05-01 14:42:49

Tags: apache ignite

The example configuration at https://github.com/apache/ignite/blob/master/examples/config/example-default.xml uses TcpDiscoveryMulticastIpFinder, but it does not configure a multicast group:

                <!--<bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">-->
                <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.multicast.TcpDiscoveryMulticastIpFinder">
                    <property name="addresses">
                        <list>
                            <!-- In distributed environment, replace with actual host IP address. -->
                            <value>127.0.0.1:47500..47509</value>
                        </list>
                    </property>
                </bean>

But in the official documentation, https://apacheignite.readme.io/docs/cluster-config#section-multicast-based-discovery, I found that a multicast group is configured:

<bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
  <property name="ipFinder">
    <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.multicast.TcpDiscoveryMulticastIpFinder">
      <property name="multicastGroup" value="228.10.10.157"/>
    </bean>
  </property>
</bean>

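So my question is: since the example does not specify the multicastGroup property, does it fall back to a default value? Or should I configure multicastGroup explicitly? I checked my lab environment; should I use 228.1.2.4 as the multicastGroup address?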

# ip link show em1 | grep MULTICAST
2: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT qlen 1000

# ip maddress show
1:  lo
    inet  224.0.0.1
    inet6 ff02::1
    inet6 ff01::1
2:  em1
    link  01:00:5e:00:00:01
    link  33:33:00:00:00:01
    link  33:33:ff:e6:07:a8
    link  01:00:5e:01:02:04
    inet  228.1.2.4
    inet  224.0.0.1
    inet6 ff02::1:ffe6:7a8
    inet6 ff02::1
    inet6 ff01::1

My environment has 3 server nodes, but the servers cannot form a cluster. The topology always shows nodes failing:

[10:59:34,424][WARNING][tcp-disco-msg-worker-#3][TcpDiscoverySpi] Timed out waiting for message delivery receipt (most probably, the reason is in long GC pauses on remote node; consider tuning GC and increasing 'ackTimeout' configuration property). Will retry to send message with increased timeout [currentTimeout=10000, rmtAddr=/192.168.28.162:47500, rmtPort=47500]
[11:00:02,334][WARNING][disco-event-worker-#101][GridDiscoveryManager] Node FAILED: TcpDiscoveryNode [id=ca28bc89-8455-49dd-9e3a-bc4e22581125, addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 192.168.28.163], sockAddrs=[/0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500, /192.168.28.163:47500], discPort=47500, order=20, intOrder=13, lastExchangeTime=1525186722970, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false]
[11:00:41,674][WARNING][disco-event-worker-#101][GridDiscoveryManager] Node FAILED: TcpDiscoveryNode [id=42a3f2ef-4aa7-49d1-9987-05807efb4d46, addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 192.168.28.184], sockAddrs=[/192.168.28.184:0, /0:0:0:0:0:0:0:1%lo:0, /127.0.0.1:0], discPort=0, order=25, intOrder=15, lastExchangeTime=1525186727940, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=true]

There is no traffic, and CPU and memory usage are very low. The cluster worked for a while when it first started, then failed later.

====================

I stopped all nodes and tried again; it still fails.

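I start one server node and it works, then the second, then the third. In the logs I can see the topology grow to 3 nodes, but it fails soon afterwards, and each of the 3 nodes ends up in a topology with only 1 server: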

[11:57:32,585][INFO][main][GridDiscoveryManager] Topology snapshot [ver=1, servers=1, clients=0, CPUs=32, offheap=25.0GB, heap=1.0GB]
[11:57:32,585][INFO][main][GridDiscoveryManager] Data Regions Configured:
[11:57:32,585][INFO][main][GridDiscoveryManager]   ^-- default [initSize=256.0 MiB, maxSize=25.1 GiB, persistenceEnabled=true]
[11:57:59,523][INFO][ignite-update-notifier-timer][GridUpdateNotifier] Your version is up to date.
[11:58:32,586][INFO][grid-timeout-worker-#71][IgniteKernal] 
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=4769f8fa, uptime=00:01:00.008]
    ^-- H/N/C [hosts=1, nodes=1, CPUs=32]
    ^-- CPU [cur=0.03%, avg=0.15%, GC=0%]
    ^-- PageMemory [pages=0]
    ^-- Heap [used=99MB, free=89.83%, comm=981MB]
    ^-- Non heap [used=50MB, free=96.7%, comm=50MB]
    ^-- Outbound messages queue [size=0]
    ^-- Public thread pool [active=0, idle=0, qSize=0]
    ^-- System thread pool [active=0, idle=6, qSize=0]
[11:59:03,122][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery accepted incoming connection [rmtAddr=/192.168.28.162, rmtPort=51705]
[11:59:03,135][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery spawning a new thread for connection [rmtAddr=/192.168.28.162, rmtPort=51705]
[11:59:03,136][INFO][tcp-disco-sock-reader-#6][TcpDiscoverySpi] Started serving remote node connection [rmtAddr=/192.168.28.162:51705, rmtPort=51705]
[11:59:08,174][INFO][tcp-disco-sock-reader-#6][TcpDiscoverySpi] Finished serving remote node connection [rmtAddr=/192.168.28.162:51705, rmtPort=51705
[11:59:14,391][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery accepted incoming connection [rmtAddr=/192.168.28.162, rmtPort=60747]
[11:59:14,391][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery spawning a new thread for connection [rmtAddr=/192.168.28.162, rmtPort=60747]
[11:59:14,392][INFO][tcp-disco-sock-reader-#7][TcpDiscoverySpi] Started serving remote node connection [rmtAddr=/192.168.28.162:60747, rmtPort=60747]
[11:59:14,399][INFO][tcp-disco-sock-reader-#7][TcpDiscoverySpi] Finished serving remote node connection [rmtAddr=/192.168.28.162:60747, rmtPort=60747
[11:59:18,428][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery accepted incoming connection [rmtAddr=/192.168.28.162, rmtPort=48386]
[11:59:18,428][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery spawning a new thread for connection [rmtAddr=/192.168.28.162, rmtPort=48386]
[11:59:18,428][INFO][tcp-disco-sock-reader-#8][TcpDiscoverySpi] Started serving remote node connection [rmtAddr=/192.168.28.162:48386, rmtPort=48386]
[11:59:18,452][INFO][disco-event-worker-#101][GridDiscoveryManager] Added new node to topology: TcpDiscoveryNode [id=8c87d53c-ba5e-4bdc-800c-0a51f391fc38, addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 192.168.28.162], sockAddrs=[/0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500, /192.168.28.162:47500], discPort=47500, order=2, intOrder=2, lastExchangeTime=1525190343144, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false]
[11:59:18,453][INFO][disco-event-worker-#101][GridDiscoveryManager] Topology snapshot [ver=2, servers=2, clients=0, CPUs=64, offheap=50.0GB, heap=2.0GB]
[11:59:18,453][INFO][disco-event-worker-#101][GridDiscoveryManager] Data Regions Configured:
[11:59:18,454][INFO][disco-event-worker-#101][GridDiscoveryManager]   ^-- default [initSize=256.0 MiB, maxSize=25.1 GiB, persistenceEnabled=true]
[11:59:32,589][INFO][grid-timeout-worker-#71][IgniteKernal] 
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=4769f8fa, uptime=00:02:00.014]
    ^-- H/N/C [hosts=2, nodes=2, CPUs=64]
    ^-- CPU [cur=0.2%, avg=0.12%, GC=0%]
    ^-- PageMemory [pages=0]
    ^-- Heap [used=112MB, free=88.57%, comm=981MB]
    ^-- Non heap [used=50MB, free=96.67%, comm=51MB]
    ^-- Outbound messages queue [size=0]
    ^-- Public thread pool [active=0, idle=0, qSize=0]
    ^-- System thread pool [active=0, idle=5, qSize=0]
[12:00:13,117][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery accepted incoming connection [rmtAddr=/192.168.28.163, rmtPort=41574]
[12:00:13,117][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery spawning a new thread for connection [rmtAddr=/192.168.28.163, rmtPort=41574]
[12:00:13,117][INFO][tcp-disco-sock-reader-#9][TcpDiscoverySpi] Started serving remote node connection [rmtAddr=/192.168.28.163:41574, rmtPort=41574]
[12:00:13,122][INFO][tcp-disco-sock-reader-#9][TcpDiscoverySpi] Finished serving remote node connection [rmtAddr=/192.168.28.163:41574, rmtPort=41574
[12:00:19,339][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery accepted incoming connection [rmtAddr=/192.168.28.163, rmtPort=60878]
[12:00:19,340][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery spawning a new thread for connection [rmtAddr=/192.168.28.163, rmtPort=60878]
[12:00:19,340][INFO][tcp-disco-sock-reader-#10][TcpDiscoverySpi] Started serving remote node connection [rmtAddr=/192.168.28.163:60878, rmtPort=60878]
[12:00:32,596][INFO][grid-timeout-worker-#71][IgniteKernal] 
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=4769f8fa, uptime=00:03:00.020]
    ^-- H/N/C [hosts=2, nodes=2, CPUs=64]
    ^-- CPU [cur=0.03%, avg=0.1%, GC=0%]
    ^-- PageMemory [pages=0]
    ^-- Heap [used=119MB, free=87.82%, comm=981MB]
    ^-- Non heap [used=50MB, free=96.65%, comm=52MB]
    ^-- Outbound messages queue [size=0]
    ^-- Public thread pool [active=0, idle=0, qSize=0]
    ^-- System thread pool [active=0, idle=6, qSize=0]
[12:00:34,361][INFO][tcp-disco-sock-reader-#10][TcpDiscoverySpi] Finished serving remote node connection [rmtAddr=/192.168.28.163:60878, rmtPort=60878
[12:00:34,434][INFO][tcp-disco-sock-reader-#8][TcpDiscoverySpi] Finished serving remote node connection [rmtAddr=/192.168.28.162:48386, rmtPort=48386
[12:00:39,572][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery accepted incoming connection [rmtAddr=/192.168.28.163, rmtPort=50348]
[12:00:39,573][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery spawning a new thread for connection [rmtAddr=/192.168.28.163, rmtPort=50348]
[12:00:39,573][INFO][tcp-disco-sock-reader-#11][TcpDiscoverySpi] Started serving remote node connection [rmtAddr=/192.168.28.163:50348, rmtPort=50348]
[12:00:41,880][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery accepted incoming connection [rmtAddr=/192.168.28.163, rmtPort=44933]
[12:00:41,880][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery spawning a new thread for connection [rmtAddr=/192.168.28.163, rmtPort=44933]
[12:00:41,881][INFO][tcp-disco-sock-reader-#12][TcpDiscoverySpi] Started serving remote node connection [rmtAddr=/192.168.28.163:44933, rmtPort=44933]
[12:00:41,885][INFO][tcp-disco-sock-reader-#12][TcpDiscoverySpi] Finished serving remote node connection [rmtAddr=/192.168.28.163:44933, rmtPort=44933
[12:00:44,448][WARNING][tcp-disco-msg-worker-#3][TcpDiscoverySpi] Timed out waiting for message delivery receipt (most probably, the reason is in long GC pauses on remote node; consider tuning GC and increasing 'ackTimeout' configuration property). Will retry to send message with increased timeout [currentTimeout=10000, rmtAddr=/192.168.28.162:47500, rmtPort=47500]
[12:00:44,451][WARNING][tcp-disco-msg-worker-#3][TcpDiscoverySpi] Failed to send message to next node [msg=TcpDiscoveryStatusCheckMessage [creatorNode=TcpDiscoveryNode [id=8c87d53c-ba5e-4bdc-800c-0a51f391fc38, addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 192.168.28.162], sockAddrs=[/0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500, /192.168.28.162:47500], discPort=47500, order=2, intOrder=2, lastExchangeTime=1525190412503, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], failedNodeId=null, status=1, super=TcpDiscoveryAbstractMessage [sndNodeId=8c87d53c-ba5e-4bdc-800c-0a51f391fc38, id=a9d4d6c1361-8c87d53c-ba5e-4bdc-800c-0a51f391fc38, verifierNodeId=null, topVer=0, pendingIdx=0, failedNodes=null, isClient=false]], next=TcpDiscoveryNode [id=8c87d53c-ba5e-4bdc-800c-0a51f391fc38, addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 192.168.28.162], sockAddrs=[/0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500, /192.168.28.162:47500], discPort=47500, order=2, intOrder=2, lastExchangeTime=1525190343144, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], errMsg=Failed to send message to next node [msg=TcpDiscoveryStatusCheckMessage [creatorNode=TcpDiscoveryNode [id=8c87d53c-ba5e-4bdc-800c-0a51f391fc38, addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 192.168.28.162], sockAddrs=[/0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500, /192.168.28.162:47500], discPort=47500, order=2, intOrder=2, lastExchangeTime=1525190412503, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], failedNodeId=null, status=1, super=TcpDiscoveryAbstractMessage [sndNodeId=8c87d53c-ba5e-4bdc-800c-0a51f391fc38, id=a9d4d6c1361-8c87d53c-ba5e-4bdc-800c-0a51f391fc38, verifierNodeId=null, topVer=0, pendingIdx=0, failedNodes=null, isClient=false]], next=ClusterNode [id=8c87d53c-ba5e-4bdc-800c-0a51f391fc38, order=2, addr=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 192.168.28.162], daemon=false]]]
[12:00:44,464][WARNING][tcp-disco-msg-worker-#3][TcpDiscoverySpi] Local node has detected failed nodes and started cluster-wide procedure. To speed up failure detection please see 'Failure Detection' section under javadoc for 'TcpDiscoverySpi'
[12:00:44,468][INFO][disco-event-worker-#101][GridDiscoveryManager] Added new node to topology: TcpDiscoveryNode [id=c096c28e-c1da-4f39-8c5d-db30e01826a7, addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 192.168.28.163], sockAddrs=[/0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500, /192.168.28.163:47500], discPort=47500, order=3, intOrder=3, lastExchangeTime=1525190406877, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false]
[12:00:44,469][INFO][disco-event-worker-#101][GridDiscoveryManager] Topology snapshot [ver=3, servers=3, clients=0, CPUs=96, offheap=75.0GB, heap=3.0GB]
[12:00:44,469][INFO][disco-event-worker-#101][GridDiscoveryManager] Data Regions Configured:
[12:00:44,469][INFO][disco-event-worker-#101][GridDiscoveryManager]   ^-- default [initSize=256.0 MiB, maxSize=25.1 GiB, persistenceEnabled=true]
[12:00:44,474][WARNING][disco-event-worker-#101][GridDiscoveryManager] Node FAILED: TcpDiscoveryNode [id=8c87d53c-ba5e-4bdc-800c-0a51f391fc38, addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 192.168.28.162], sockAddrs=[/0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500, /192.168.28.162:47500], discPort=47500, order=2, intOrder=2, lastExchangeTime=1525190343144, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false]
[12:00:44,475][INFO][disco-event-worker-#101][GridDiscoveryManager] Topology snapshot [ver=4, servers=2, clients=0, CPUs=64, offheap=50.0GB, heap=2.0GB]
[12:00:44,475][INFO][disco-event-worker-#101][GridDiscoveryManager] Data Regions Configured:
[12:00:44,475][INFO][disco-event-worker-#101][GridDiscoveryManager]   ^-- default [initSize=256.0 MiB, maxSize=25.1 GiB, persistenceEnabled=true]
[12:00:48,104][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery accepted incoming connection [rmtAddr=/192.168.28.163, rmtPort=42252]
[12:00:48,105][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery spawning a new thread for connection [rmtAddr=/192.168.28.163, rmtPort=42252]
[12:00:48,105][INFO][tcp-disco-sock-reader-#13][TcpDiscoverySpi] Started serving remote node connection [rmtAddr=/192.168.28.163:42252, rmtPort=42252]
[12:00:48,124][INFO][tcp-disco-sock-reader-#13][TcpDiscoverySpi] Finished serving remote node connection [rmtAddr=/192.168.28.163:42252, rmtPort=42252
[12:00:54,338][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery accepted incoming connection [rmtAddr=/192.168.28.163, rmtPort=51196]
[12:00:54,339][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery spawning a new thread for connection [rmtAddr=/192.168.28.163, rmtPort=51196]
[12:00:54,339][INFO][tcp-disco-sock-reader-#14][TcpDiscoverySpi] Started serving remote node connection [rmtAddr=/192.168.28.163:51196, rmtPort=51196]
[12:00:54,342][INFO][tcp-disco-sock-reader-#14][TcpDiscoverySpi] Finished serving remote node connection [rmtAddr=/192.168.28.163:51196, rmtPort=51196
[12:00:59,482][INFO][tcp-disco-sock-reader-#11][TcpDiscoverySpi] Finished serving remote node connection [rmtAddr=/192.168.28.163:50348, rmtPort=50348
[12:01:00,568][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery accepted incoming connection [rmtAddr=/192.168.28.163, rmtPort=41629]
[12:01:00,568][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery spawning a new thread for connection [rmtAddr=/192.168.28.163, rmtPort=41629]
[12:01:00,569][INFO][tcp-disco-sock-reader-#15][TcpDiscoverySpi] Started serving remote node connection [rmtAddr=/192.168.28.163:41629, rmtPort=41629]
[12:01:00,571][INFO][tcp-disco-sock-reader-#15][TcpDiscoverySpi] Finished serving remote node connection [rmtAddr=/192.168.28.163:41629, rmtPort=41629
[12:01:00,610][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery accepted incoming connection [rmtAddr=/192.168.28.163, rmtPort=49138]
[12:01:00,611][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery spawning a new thread for connection [rmtAddr=/192.168.28.163, rmtPort=49138]
[12:01:00,611][INFO][tcp-disco-sock-reader-#16][TcpDiscoverySpi] Started serving remote node connection [rmtAddr=/192.168.28.163:49138, rmtPort=49138]
[12:01:00,637][WARNING][tcp-disco-msg-worker-#3][TcpDiscoverySpi] Node is out of topology (probably, due to short-time network problems).
[12:01:00,637][INFO][tcp-disco-sock-reader-#16][TcpDiscoverySpi] Finished serving remote node connection [rmtAddr=/192.168.28.163:49138, rmtPort=49138
[12:01:00,638][WARNING][disco-event-worker-#101][GridDiscoveryManager] Local node SEGMENTED: TcpDiscoveryNode [id=4769f8fa-e388-4208-a61c-6a7a44a70d74, addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 192.168.28.161], sockAddrs=[Redis1/192.168.28.161:47500, /0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500], discPort=47500, order=1, intOrder=1, lastExchangeTime=1525190460629, loc=true, ver=2.4.0#20180305-sha1:aa342270, isClient=false]
[12:01:00,640][WARNING][disco-event-worker-#101][GridDiscoveryManager] Stopping local node according to configured segmentation policy.
[12:01:00,641][WARNING][disco-event-worker-#101][GridDiscoveryManager] Node FAILED: TcpDiscoveryNode [id=c096c28e-c1da-4f39-8c5d-db30e01826a7, addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 192.168.28.163], sockAddrs=[/0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500, /192.168.28.163:47500], discPort=47500, order=3, intOrder=3, lastExchangeTime=1525190406877, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false]
[12:01:00,642][INFO][disco-event-worker-#101][GridDiscoveryManager] Topology snapshot [ver=5, servers=1, clients=0, CPUs=32, offheap=25.0GB, heap=1.0GB]

1 Answer:

Answer 0 (score: 2)

The default multicast group is 228.1.2.4.
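
To make that default visible in the configuration, the group can also be set explicitly. The fragment below is only a sketch that reuses the bean layout from the documentation snippet in the question and swaps in the default address; it should behave the same as leaving the property out.

```xml
<bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
  <property name="ipFinder">
    <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.multicast.TcpDiscoveryMulticastIpFinder">
      <!-- 228.1.2.4 is the group used when multicastGroup is not set. -->
      <property name="multicastGroup" value="228.1.2.4"/>
    </bean>
  </property>
</bean>
```

Every node that should join the same cluster needs the same group, so a fragment like this only helps if it is identical on all three servers.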

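Have you tried using org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder instead of multicast? If multicast does not work in your environment for some reason, discovery with static IP addresses should work regardless. Here is an example of a static IP finder: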

<bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
    <property name="addresses">
        <list>
            <!-- In distributed environment, replace with actual host IP address. -->
            <value>127.0.0.1:47500..47509</value>
        </list>
    </property>
</bean>
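
For the setup in the question, the 127.0.0.1 placeholder would have to be replaced with the real host addresses. The following is a sketch, assuming the three server nodes are the hosts that appear in the logs (192.168.28.161, 192.168.28.162 and 192.168.28.163) and that they use the discovery port 47500 shown there. The surrounding IgniteConfiguration and TcpDiscoverySpi beans are included only to show where the IP finder sits.

```xml
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
  <property name="discoverySpi">
    <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
      <property name="ipFinder">
        <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
          <property name="addresses">
            <list>
              <!-- Assumed server hosts, taken from the log output in the question. -->
              <value>192.168.28.161:47500..47509</value>
              <value>192.168.28.162:47500..47509</value>
              <value>192.168.28.163:47500..47509</value>
            </list>
          </property>
        </bean>
      </property>
    </bean>
  </property>
</bean>
```

The same address list would go into the configuration of every node, so that each one can reach at least one already running server when it starts.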