Question

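Whenever I try to run my Python reducer program on a Hadoop system, I get the error below. The mapper program runs perfectly, and I have given the reducer the same permissions as the mapper. Is there a syntax error?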

            Traceback (most recent call last):
              File "reducer.py", line 13, in <module>
                word, count = line.split('\t', 1)
            ValueError: need more than 1 value to unpack

            #!/usr/bin/env python
            import sys

            # maps words to their counts
            word2count = {}

            # input comes from STDIN
            for line in sys.stdin:
                # remove leading and trailing whitespace
                line = line.strip()

                # parse the input we got from mapper.py
                word, count = line.split('\t', 1)
                # convert count (currently a string) to int
                try:
                    count = int(count)
                except ValueError:
                    continue

                try:
                    word2count[word] = word2count[word]+count
                except:
                    word2count[word] = count

            # write the tuples to stdout
            # Note: they are unsorted
            for word in word2count.keys():
                print '%s\t%s'% ( word, word2count[word] )

Answer 1

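The error ValueError: need more than 1 value to unpack is raised when you do a multiple assignment and the right-hand side has too few values. So it looks like line contains no \t, which means line.split('\t', 1) returns a single value, and you end up with something like word, count = ("foo",).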
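To make that failure mode concrete, here is a minimal sketch of a guard that checks how many pieces split() returned before unpacking. The helper name parse_line and its sep parameter are made up for illustration and are not part of the asker's reducer. On Python 3 the same mistake is reported as "not enough values to unpack (expected 2, got 1)".

```python
def parse_line(line, sep="\t"):
    """Return (word, count) for a 'word<sep>count' line, or None if malformed."""
    parts = line.strip().split(sep, 1)
    if len(parts) != 2:
        # No separator found: split() returned a single element, which is
        # exactly the case that breaks `word, count = line.split(...)`.
        return None
    word, count = parts
    try:
        return word, int(count)
    except ValueError:
        # The count field was present but not an integer.
        return None


print(parse_line("foo\t3"))  # well-formed line
print(parse_line("foo 3"))   # space instead of tab
print(parse_line(""))        # blank line
```

In the reducer loop, a None result could simply be skipped with continue, the same way the original code already skips counts that fail int().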

Answer 2

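I can't give a detailed answer.

However, I ran into the same problem, and fixed it, after I had added some extra print statements to my mapper. It is probably related to how the mapper's print output ends up on the reducer's sys.stdin.

I realize you have probably solved this already.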
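One way to picture this: with Hadoop Streaming, whatever the mapper writes to stdout becomes reducer input, so a stray debug print shows up as a line with no tab in it. The sketch below keeps diagnostics on a separate stream. The function emit_pairs and the "debug:" message are hypothetical, not taken from the asker's mapper, which was not posted.

```python
import io
import sys  # a real mapper would pass sys.stdin, sys.stdout, sys.stderr


def emit_pairs(lines, out, log):
    """Write one 'word<TAB>1' record per word to `out`; diagnostics go to `log`."""
    for line in lines:
        words = line.split()
        # Diagnostics are written to `log` (stderr in a real job), so they
        # never get mixed into the records the reducer reads.
        log.write("debug: %d words\n" % len(words))
        for word in words:
            out.write("%s\t1\n" % word)


# Stand-ins for stdout/stderr so the sketch can run on its own.
out, log = io.StringIO(), io.StringIO()
emit_pairs(["hello world hello\n"], out, log)
print(out.getvalue(), end="")
```

Every line in out contains a tab, so the reducer's split('\t', 1) always yields two values; the debug text lands only in log.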

Answer 3

I changed line.split('\t', 1) to line.split(' ', 1) and it worked.
The space may not show up clearly above, so to be explicit: the separator is a single space, i.e. line.split('(one space here)', 1).
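That this change worked suggests the mapper was emitting a space rather than a tab between word and count. If the separator is uncertain, one alternative (not what this answer did) is to split on the last run of any whitespace. The helper name split_word_count below is made up for illustration.

```python
def split_word_count(line):
    """Split a line on its last run of whitespace (tab or space).

    Returns a (word, count_text) tuple, or None when there is nothing to split.
    """
    # sep=None makes rsplit treat any whitespace run as the separator.
    parts = line.strip().rsplit(None, 1)
    return tuple(parts) if len(parts) == 2 else None


print(split_word_count("foo\t3"))
print(split_word_count("foo 3"))
print(split_word_count("foo"))
```

The trade-off is that this is more permissive: a line holding two words and no count would still split into two parts, so the int() check on the count remains necessary.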

wordcount: Python reducer program throws ValueError
