I am running the Mahout k-means algorithm from the command line to cluster data, using the following command:
mahout kmeans -i /vect_out/tfidf-vectors/ -c /out_canopy -o /out_kmeans -dm org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure -cd 1.0 -x 20 -cl
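Here /out_canopy is the directory holding the clusters created with Mahout canopy clustering. It contains a clusters-0 directory, which in turn contains a directory named _logs and a file named part-r-00000.
However, the command keeps failing with the following error: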
java.lang.IllegalStateException: No clusters found. Check your -c path.
at org.apache.mahout.clustering.kmeans.KMeansMapper.setup
Answer 0 (score: 0)
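Are you sure /out_canopy is a directory? Have you tried the following?
file /out_canopy
There may be a typo here: perhaps you meant to write just out_canopy, or something similar...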
Answer 1 (score: 0)
This is a particularly tricky problem.
1. Swallow IllegalStateExceptions thrown by removeShutdownHook in FileSystem. The javadoc states:
public boolean removeShutdownHook(Thread hook)
Throws:
IllegalStateException - If the virtual machine is already in the process of shutting down
So if we are getting this exception, it means we are already in the process of shutting down, and we cannot, try as we may, remove the shutdown hook. If Runtime had a method Runtime.isShutdownInProgress(), we could have checked it before the removeShutdownHook call. As it stands, there is no such method. In my opinion, this would be a good patch regardless of the needs of this JIRA.
2. Do not send SIGTERMs from the NM to the MR-AM in the first place. Rather, we should expose a mechanism for the NM to politely tell the AM that it is no longer needed and should shut down as soon as possible. Even after this, if an admin were to kill the MRAppMaster with a SIGTERM, the JobHistory would be lost, defeating the purpose of 3614
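The first option above (swallowing the IllegalStateException) can be sketched roughly as follows. This is only a minimal illustration against the standard java.lang.Runtime API, not the actual Hadoop FileSystem code; the class name ShutdownHookUtil and the method name removeQuietly are hypothetical and chosen just for this example.

```java
public class ShutdownHookUtil {

    /**
     * Tries to de-register a shutdown hook. Returns true if the hook was
     * registered and has been removed, false otherwise. An
     * IllegalStateException is treated as "the JVM is already shutting
     * down", in which case removal is impossible and there is nothing
     * useful left to do.
     * (Hypothetical helper, not part of Hadoop.)
     */
    public static boolean removeQuietly(Thread hook) {
        try {
            return Runtime.getRuntime().removeShutdownHook(hook);
        } catch (IllegalStateException e) {
            // Shutdown is already in progress; the hook can no longer be removed.
            return false;
        }
    }

    public static void main(String[] args) {
        Thread hook = new Thread(() -> System.out.println("cleanup"));
        Runtime.getRuntime().addShutdownHook(hook);

        // First call removes the registered hook, second call finds nothing to remove.
        System.out.println(removeQuietly(hook));
        System.out.println(removeQuietly(hook));
    }
}
```

Because Runtime offers no way to ask whether shutdown has started, catching the exception is the only available signal; the sketch simply folds that case into the same false result as "hook was not registered".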