RWeka rule-learning algorithms: trouble finding rules based on dates

Time: 2015-02-18 10:56:35

Tags: algorithm date weka rules rweka

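I have a problem with the RWeka package for R, more precisely with its rule-learning algorithms. I created an .arff file myself, which you can see below. I ran the JRip and J48 algorithms from RWeka on the data in that .arff file and got the following rules: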

> JRip(Failure ~., data=date)
JRIP rules:
===========

 => Failure=no (35.0/11.0)

Number of Rules : 1

> J48(Failure ~., data=date)
J48 pruned tree
------------------
: no (35.0/11.0)

Number of Leaves  :     1

Size of the tree :      1

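So my question is: why do the algorithms not find a rule based on the production date? It is obvious that every product produced on 2013-04-01 has a failure.

What am I doing wrong?

Thanks in advance! titus24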

@RELATION dataset

@ATTRIBUTE Date-of-Production			DATE "yyyy-MM-dd HH:mm:ss"
@ATTRIBUTE Location				{Frankfurt, Cologne, Hamburg, Munich, Berlin}
@ATTRIBUTE Failure				{yes, no}

@DATA
"2013-04-01 00:00:00",Frankfurt,yes
"2013-04-01 00:00:00",Cologne,yes
"2013-04-01 00:00:00",Munich,yes
"2013-04-01 00:00:00",Hamburg,yes
"2013-04-01 00:00:00",Berlin,yes
"2013-04-01 00:00:00",Frankfurt,yes
"2013-04-01 00:00:00",Cologne,yes
"2013-04-01 00:00:00",Munich,yes
"2013-04-01 00:00:00",Hamburg,yes
"2013-04-01 00:00:00",Berlin,yes
"2013-04-01 00:00:00",Frankfurt,yes
"2012-05-01 00:00:00",Cologne,no
"2012-05-02 00:00:00",Munich,no
"2012-05-03 00:00:00",Hamburg,no
"2012-05-04 00:00:00",Berlin,no
"2012-05-05 00:00:00",Frankfurt,no
"2012-05-06 00:00:00",Cologne,no
"2012-05-07 00:00:00",Munich,no
"2012-05-08 00:00:00",Hamburg,no
"2012-05-09 00:00:00",Berlin,no
"2012-05-10 00:00:00",Frankfurt,no
"2012-05-11 00:00:00",Cologne,no
"2012-05-12 00:00:00",Munich,no
"2012-05-13 00:00:00",Hamburg,no
"2012-05-14 00:00:00",Berlin,no
"2012-05-15 00:00:00",Frankfurt,no
"2012-05-16 00:00:00",Cologne,no
"2012-05-17 00:00:00",Munich,no
"2012-05-18 00:00:00",Hamburg,no
"2012-05-19 00:00:00",Berlin,no
"2012-05-20 00:00:00",Frankfurt,no
"2012-05-21 00:00:00",Cologne,no
"2012-05-22 00:00:00",Munich,no
"2012-05-23 00:00:00",Hamburg,no
"2012-05-24 00:00:00",Berlin,no

1 answer:

Answer 0 (score: 0)

Explanation

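WEKA internally represents the value of a date attribute as a floating-point number holding the milliseconds since 1970-01-01 00:00:00 GMT, as described in the weka.core.Attribute documentation. Something appears to go wrong in RWeka when POSIXct / POSIXt values are converted to that floating-point number.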
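To make the two representations concrete, here is a minimal base-R sketch (no RWeka needed). It only illustrates how R stores a POSIXct value and how that relates to the millisecond scale WEKA documents; it does not show what RWeka does internally. The variable names and the choice of UTC as the time zone are assumptions made for the example.

```r
# A POSIXct value is stored as seconds since 1970-01-01 00:00:00 UTC.
# The time zone "UTC" is an assumption chosen to keep the numbers reproducible.
d <- as.POSIXct("2013-04-01 00:00:00", tz = "UTC")

secs <- as.numeric(d)    # R's internal value, in seconds
millis <- secs * 1000    # the same instant on WEKA's documented millisecond scale

print(secs)    # 1364774400
print(millis)  # 1.3647744e+12
```

Note that R counts seconds while WEKA documents milliseconds, so the two numbers differ by a factor of 1000 for the same instant.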

Solution

Convert the dates manually and then run the classifier:

library(RWeka)                          # provides read.arff() and J48()
dataset <- read.arff("date.arff")
dataset[, 1] <- unclass(dataset[, 1])   # replace the POSIXct dates with their internal numeric representation
J48(Failure ~ ., data = dataset)

The output is the same as the one from WEKA Explorer 3.7.12:

Date-of-Production <= 1337810400: no (24.0)
Date-of-Production > 1337810400: yes (11.0)

Number of Leaves  :     2

Size of the tree :  3
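
To read the split point as a date again, the threshold can be converted back. A small sketch, assuming the threshold is in seconds since the epoch (which is what unclass() on a POSIXct value yields); the variable names are illustrative only:

```r
# Split point reported by J48, interpreted as seconds since 1970-01-01 UTC
threshold <- 1337810400

as_utc <- as.POSIXct(threshold, origin = "1970-01-01", tz = "UTC")
readable <- format(as_utc, "%Y-%m-%d %H:%M:%S")
print(readable)  # "2012-05-23 22:00:00"
```

In a UTC+2 time zone that instant is 2012-05-24 00:00:00, the date of the last "no" row, so the tree separates the 24 rows from May 2012 from the 11 rows dated 2013-04-01. That the data was read on a machine set to UTC+2 is an assumption, not something stated in the thread.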