Question

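I'm curious! As far as I know, HDFS needs DataNode processes to run, which is why it only works on servers. Spark can run locally, but it needs winutils.exe, which is a Hadoop component. But what exactly does it do? How is it that I can't run Hadoop on Windows, yet I can run Spark, which is built on Hadoop?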

Answer 1

I know of at least one use: it is used to run shell commands on Windows. You can find it in org.apache.hadoop.util.Shell. Other modules depend on this class and use its methods, for example getGetPermissionCommand():

static final String WINUTILS_EXE = "winutils.exe";
...
static {
  IOException ioe = null;
  String path = null;
  File file = null;
  // invariant: either there's a valid file and path,
  // or there is a cached IO exception.
  if (WINDOWS) {
    try {
      file = getQualifiedBin(WINUTILS_EXE);
      path = file.getCanonicalPath();
      ioe = null;
    } catch (IOException e) {
      LOG.warn("Did not find {}: {}", WINUTILS_EXE, e);
      // stack trace comes at debug level
      LOG.debug("Failed to find " + WINUTILS_EXE, e);
      file = null;
      path = null;
      ioe = e;
    }
  } else {
    // on a non-windows system, the invariant is kept
    // by adding an explicit exception.
    ioe = new FileNotFoundException(E_NOT_A_WINDOWS_SYSTEM);
  }
  WINUTILS_PATH = path;
  WINUTILS_FILE = file;

  WINUTILS = path;
  WINUTILS_FAILURE = ioe;
}
...
public static String getWinUtilsPath() {
  if (WINUTILS_FAILURE == null) {
    return WINUTILS_PATH;
  } else {
    throw new RuntimeException(WINUTILS_FAILURE.toString(),
        WINUTILS_FAILURE);
  }
}
...
public static String[] getGetPermissionCommand() {
  return (WINDOWS) ? new String[] { getWinUtilsPath(), "ls", "-F" }
                   : new String[] { "/bin/ls", "-ld" };
}
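
To make the dispatch in getGetPermissionCommand() easier to try out, here is a minimal standalone sketch of the same pattern. It is not Hadoop code: the class name PermissionCommandSketch, the helper withTarget, and the example winutils path are all hypothetical, and the OS check via the os.name system property is an assumption made for illustration.

```java
import java.util.Arrays;

public class PermissionCommandSketch {

    // Hypothetical helper mirroring the shape of Shell.getGetPermissionCommand().
    // The OS flag and the winutils path are parameters so the logic can be
    // exercised on any platform.
    static String[] permissionCommand(boolean windows, String winutilsPath) {
        return windows
                ? new String[] { winutilsPath, "ls", "-F" }
                : new String[] { "/bin/ls", "-ld" };
    }

    // Hypothetical helper: appends the file to inspect to the base command.
    static String[] withTarget(String[] base, String target) {
        String[] cmd = Arrays.copyOf(base, base.length + 1);
        cmd[base.length] = target;
        return cmd;
    }

    public static void main(String[] args) {
        // Assumption: detect Windows from the os.name system property.
        boolean windows = System.getProperty("os.name").startsWith("Windows");
        // Assumption: an example install location, not a real default.
        String winutils = "C:\\hadoop\\bin\\winutils.exe";
        String[] cmd = withTarget(permissionCommand(windows, winutils), ".");
        // Only prints the command that would be run; nothing is executed.
        System.out.println(String.join(" ", cmd));
    }
}
```

The point of the sketch is that on Windows the POSIX-style "ls -ld" has no native equivalent, so the equivalent request is routed through winutils.exe instead.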

Answer 2

Max's answer covers the actual place where it is referenced, so let me give a brief background on why it is needed on Windows.

From Hadoop's own Confluence page:

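Hadoop requires native libraries on Windows to work properly. That includes access to the file:// filesystem, where Hadoop uses some Windows APIs to implement POSIX-like file access permissions.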

This is implemented in HADOOP.DLL and WINUTILS.EXE.

In particular, %HADOOP_HOME%\BIN\WINUTILS.EXE must be locatable.
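
As a rough illustration of what "locatable" means here, the following sketch checks whether bin\winutils.exe exists under a given Hadoop home directory. It is not Hadoop's implementation: the class WinutilsLocatorSketch and the method locate are hypothetical names, and reading the hadoop.home.dir system property before falling back to the HADOOP_HOME environment variable is an assumption about a sensible lookup order.

```java
import java.io.File;
import java.io.FileNotFoundException;

public class WinutilsLocatorSketch {

    // Hypothetical helper: returns <hadoopHome>/bin/winutils.exe if it exists,
    // otherwise throws with a message saying what was missing.
    static File locate(String hadoopHome) throws FileNotFoundException {
        if (hadoopHome == null || hadoopHome.isEmpty()) {
            throw new FileNotFoundException("Hadoop home directory is not set");
        }
        File exe = new File(new File(hadoopHome, "bin"), "winutils.exe");
        if (!exe.isFile()) {
            throw new FileNotFoundException("Not found: " + exe);
        }
        return exe;
    }

    public static void main(String[] args) {
        // Assumption: prefer the JVM property, fall back to the environment.
        String home = System.getProperty("hadoop.home.dir",
                System.getenv("HADOOP_HOME"));
        try {
            System.out.println("winutils found at " + locate(home));
        } catch (FileNotFoundException e) {
            System.out.println("winutils not locatable: " + e.getMessage());
        }
    }
}
```

Running a check like this before starting a local Spark job on Windows can show whether the directory layout is the problem, rather than Spark itself.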

Also, I think you should be able to run both Spark and Hadoop on Windows.

Spark on Windows - what exactly is winutils, and why do we need it?
