Hadoop datanode bound to the wrong IP address

Date: 2015-09-25 15:08:09

Tags: hadoop multiserver

I have a three-node Hadoop cluster up and running. For some reason, when the datanode slaves start, they identify themselves with an IP address that doesn't even exist on my network. Here is my hostname-to-IP mapping:

nodes:
  - hostname: hadoop-master
    ip: 192.168.51.4
  - hostname: hadoop-data1
    ip: 192.168.52.4
  - hostname: hadoop-data2
    ip: 192.168.52.6

As you can see below, the hadoop-master node comes up fine, but of the other two nodes only one ever shows up as a live datanode, and whichever one it is, it always reports the IP 192.168.51.1, which, as you can see above, doesn't even exist on my network.

hadoop@hadoop-master:~$ hdfs dfsadmin -report
Safe mode is ON
Configured Capacity: 84482326528 (78.68 GB)
Present Capacity: 75735965696 (70.53 GB)
DFS Remaining: 75735281664 (70.53 GB)
DFS Used: 684032 (668 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0

-------------------------------------------------
Live datanodes (2):

Name: 192.168.51.1:50010 (192.168.51.1)
Hostname: hadoop-data2
Decommission Status : Normal
Configured Capacity: 42241163264 (39.34 GB)
DFS Used: 303104 (296 KB)
Non DFS Used: 4305530880 (4.01 GB)
DFS Remaining: 37935329280 (35.33 GB)
DFS Used%: 0.00%
DFS Remaining%: 89.81%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Fri Sep 25 13:54:23 UTC 2015


Name: 192.168.51.4:50010 (hadoop-master)
Hostname: hadoop-master
Decommission Status : Normal
Configured Capacity: 42241163264 (39.34 GB)
DFS Used: 380928 (372 KB)
Non DFS Used: 4440829952 (4.14 GB)
DFS Remaining: 37799952384 (35.20 GB)
DFS Used%: 0.00%
DFS Remaining%: 89.49%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Fri Sep 25 13:54:21 UTC 2015

I did try explicitly adding dfs.datanode.address for each host, but in that case the node doesn't even show up as live. This is what my hdfs-site.xml looks like (note that I have tried it both with and without the dfs.datanode.address setting):

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
    <description>Default block replication.
    The actual number of replications can be specified when the file is created.
    The default is used if replication is not specified in create time.
    </description>
  </property>
  <property>
    <name>dfs.namenode.rpc-bind-host</name>
    <value>0.0.0.0</value>
  </property>
  <property>
    <name>dfs.datanode.address</name>
    <value>192.168.51.4:50010</value>
  </property>
  <property>
    <name>dfs.namenode.datanode.registration.ip-hostname-check</name>
    <value>false</value>
  </property>
  <property>
   <name>dfs.namenode.name.dir</name>
   <value>/home/hadoop/hadoop-data/hdfs/namenode</value>
   <description>Determines where on the local filesystem the DFS name node should store the name table(fsimage). If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy.</description>
  </property>
  <property>
   <name>dfs.datanode.data.dir</name>
   <value>/home/hadoop/hadoop-data/hdfs/datanode</value>
   <description>Determines where on the local filesystem an DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored.</description>
  </property>
</configuration>

Why does Hadoop associate each datanode with an IP that doesn't even exist? Or, more importantly, how do I get the nodes working properly?

Update: The /etc/hosts file is identical on all nodes:

192.168.51.4 hadoop-master
192.168.52.4 hadoop-data1
192.168.52.6 hadoop-data2

Here are the contents of my slaves file:

hadoop@hadoop-master:~$ cat /usr/local/hadoop/etc/hadoop/slaves
hadoop-master
hadoop-data1
hadoop-data2

Datanode logs:
https://gist.github.com/dwatrous/7241bb804a9be8f9303f
https://gist.github.com/dwatrous/bcd85cda23d6eca3a68b
https://gist.github.com/dwatrous/922c4f773aded0137fa3

Namenode log:
https://gist.github.com/dwatrous/dafaa7695698f36a5d93

2 Answers:

Answer 0 (score: 2)

After reviewing everything that could have been wrong, this turned out to be related to some combination of Vagrant and VirtualBox. I was trying to run the master node on one subnet and the datanodes on another. It turns out that, the way the networking was configured, I could communicate between those subnets, but there was some kind of hidden gateway that caused the wrong IP address to be used.

The solution was to change my Vagrantfile to put all three hosts on the same subnet. After that, everything worked as expected.
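For reference, a minimal sketch of the idea (not the exact Vagrantfile from this cluster; the box name and the specific 192.168.51.x addresses are only illustrative): define all three machines on one private subnet so no gateway sits between the master and the datanodes.

# Sketch only: every host on the same 192.168.51.0/24 private network.
Vagrant.configure("2") do |config|
  config.vm.box = "ubuntu/trusty64"   # illustrative box name

  {
    "hadoop-master" => "192.168.51.4",
    "hadoop-data1"  => "192.168.51.5",
    "hadoop-data2"  => "192.168.51.6",
  }.each do |name, ip|
    config.vm.define name do |node|
      node.vm.hostname = name
      # host-only/private interface on the shared subnet
      node.vm.network "private_network", ip: ip
    end
  end
end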

Answer 1 (score: 0)

Can you post your entire datanode logs? Try setting the following value to the name of the interface whose IP you want to bind to:

dfs.client.local.interfaces = eth0
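As a sketch, this is how the suggested setting might look in hdfs-site.xml, assuming eth0 is the interface that carries the 192.168.5x.x addresses on your nodes (check with ip addr and substitute the real interface name):

<property>
  <name>dfs.client.local.interfaces</name>
  <!-- "eth0" is an assumption; use whichever NIC holds the cluster address -->
  <value>eth0</value>
</property>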