Is it possible to skip processing for one column?

Date: 2019-02-07 12:40:09

Tags: featuretools

I want to keep one column of my dataframe in its original state, without applying any primitives to it. Is this possible?

1 answer:

Answer 0 (score: 0)

是的,您可以使用ignore_variables的{​​{1}}参数来完成此操作。这是一个演示实体集的示例。

ft.dfs

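```python
import featuretools as ft

es = ft.demo.load_mock_customer(return_entityset=True)
es.plot()  # draws the entity set diagram (needs the graphviz package)
```

[Image: example entity set]

If we want to build features for the `sessions` entity while ignoring the `device` variable, we can run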

```python
feature_defs = ft.dfs(target_entity="sessions",
                      entityset=es,
                      agg_primitives=["count", "mode"],
                      trans_primitives=[],
                      ignore_variables={"sessions": ["device"]},
                      features_only=True)
```

which returns the following feature definitions:

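```python
feature_defs
```

```
[<Feature: customer_id>,
 <Feature: COUNT(transactions)>,
 <Feature: MODE(transactions.product_id)>,
 <Feature: customers.zip_code>,
 <Feature: MODE(transactions.products.brand)>,
 <Feature: customers.COUNT(sessions)>,
 <Feature: customers.COUNT(transactions)>,
 <Feature: customers.MODE(transactions.product_id)>]
```

This creates features using the `count` and `mode` primitives, but ignores the `device` variable of the `sessions` entity: none of the features above is built from it. If we want to include `device` in its original state, we can add it back like this: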

```python
feature_defs += [ft.Feature(es["sessions"]["device"])]
```

Now we can calculate the feature matrix, whose last column is now `device`:

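```python
fm = ft.calculate_feature_matrix(features=feature_defs, entityset=es)
fm
```

| session_id | customer_id | COUNT(transactions) | MODE(transactions.product_id) | customers.zip_code | ... | customers.COUNT(sessions) | customers.COUNT(transactions) | customers.MODE(transactions.product_id) | device |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 16 | 3 | 13244 | ... | 7 | 93 | 4 | desktop |
| 2 | 5 | 10 | 5 | 60091 | ... | 6 | 79 | 5 | mobile |
| 3 | 4 | 15 | 1 | 60091 | ... | 8 | 109 | 2 | mobile |
| 4 | 1 | 25 | 5 | 60091 | ... | 8 | 126 | 4 | mobile |
| 5 | 4 | 11 | 5 | 60091 | ... | 8 | 109 | 2 | mobile |
| 6 | 1 | 15 | 4 | 60091 | ... | 8 | 126 | 4 | tablet |
| 7 | 3 | 15 | 1 | 13244 | ... | 6 | 93 | 1 | tablet |
| 8 | 4 | 18 | 1 | 60091 | ... | 8 | 109 | 2 | tablet |
| 9 | 1 | 15 | 1 | 60091 | ... | 8 | 126 | 4 | desktop |
| 10 | 2 | 15 | 2 | 13244 | ... | 7 | 93 | 4 | tablet |
| 11 | 4 | 15 | 3 | 60091 | ... | 8 | 109 | 2 | mobile |
| 12 | 4 | 10 | 4 | 60091 | ... | 8 | 109 | 2 | desktop |
| 13 | 4 | 12 | 2 | 60091 | ... | 8 | 109 | 2 | mobile |
| 14 | 1 | 12 | 4 | 60091 | ... | 8 | 126 | 4 | tablet |
| 15 | 2 | 8 | 2 | 13244 | ... | 7 | 93 | 4 | desktop |
| 16 | 2 | 10 | 4 | 13244 | ... | 7 | 93 | 4 | desktop |
| 17 | 2 | 13 | 1 | 13244 | ... | 7 | 93 | 4 | tablet |
| 18 | 1 | 12 | 2 | 60091 | ... | 8 | 126 | 4 | desktop |
| 19 | 3 | 17 | 1 | 13244 | ... | 6 | 93 | 1 | desktop |
| 20 | 5 | 15 | 1 | 60091 | ... | 6 | 79 | 5 | desktop |
| 21 | 4 | 18 | 5 | 60091 | ... | 8 | 109 | 2 | desktop |
| 22 | 4 | 10 | 2 | 60091 | ... | 8 | 109 | 2 | desktop |
| 23 | 3 | 11 | 3 | 13244 | ... | 6 | 93 | 1 | desktop |
| 24 | 5 | 14 | 4 | 60091 | ... | 6 | 79 | 5 | tablet |
| 25 | 3 | 16 | 1 | 13244 | ... | 6 | 93 | 1 | desktop |
| 26 | 1 | 16 | 1 | 60091 | ... | 8 | 126 | 4 | tablet |
| 27 | 1 | 15 | 5 | 60091 | ... | 8 | 126 | 4 | mobile |
| 28 | 5 | 18 | 2 | 60091 | ... | 6 | 79 | 5 | mobile |
| 29 | 1 | 16 | 4 | 60091 | ... | 8 | 126 | 4 | mobile |
| 30 | 5 | 14 | 3 | 60091 | ... | 6 | 79 | 5 | desktop |
| 31 | 2 | 18 | 3 | 13244 | ... | 7 | 93 | 4 | mobile |
| 32 | 5 | 8 | 3 | 60091 | ... | 6 | 79 | 5 | mobile |
| 33 | 2 | 13 | 3 | 13244 | ... | 7 | 93 | 4 | mobile |
| 34 | 3 | 18 | 4 | 13244 | ... | 6 | 93 | 1 | desktop |
| 35 | 3 | 16 | 5 | 13244 | ... | 6 | 93 | 1 | mobile |

As a sanity check, here is what happens if we don't use `ignore_variables`: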

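```python
feature_defs = ft.dfs(target_entity="sessions",
                      entityset=es,
                      agg_primitives=["count", "mode"],
                      trans_primitives=[],
                      features_only=True)
```

You can see that the `<Feature: customers.MODE(sessions.device)>` feature is created, so without `ignore_variables` DFS does use `device` as an input to a primitive.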
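To make the difference concrete without depending on featuretools, here is a small pure-Python sketch of what a feature like `customers.MODE(sessions.device)` computes, compared with the raw `device` column that `ft.Feature(es["sessions"]["device"])` keeps. It groups sessions by customer, takes each customer's most common device, then copies that value back onto every session of that customer. The rows are made-up toy data (not the mock customer data set), and the helper names `mode_of`, `customer_device_mode` and `raw_device` are hypothetical; this illustrates the feature's meaning, not how featuretools implements it.

```python
from collections import Counter, defaultdict

# Hypothetical toy rows, not the featuretools mock data: (session_id, customer_id, device)
sessions = [
    (1, "a", "desktop"),
    (2, "a", "mobile"),
    (3, "a", "desktop"),
    (4, "b", "tablet"),
    (5, "b", "tablet"),
]


def mode_of(values):
    """Most common value; ties go to the value seen first (featuretools may differ)."""
    return Counter(values).most_common(1)[0][0]


def customer_device_mode(rows):
    """Sketch of customers.MODE(sessions.device), copied back onto each session."""
    devices_by_customer = defaultdict(list)
    for _sid, customer_id, device in rows:
        devices_by_customer[customer_id].append(device)
    # Aggregation step: one MODE value per customer (the parent entity).
    per_customer = {cid: mode_of(devs) for cid, devs in devices_by_customer.items()}
    # Direct-feature step: each session takes its customer's aggregated value.
    return {sid: per_customer[cid] for sid, cid, _device in rows}


def raw_device(rows):
    """Sketch of the identity feature: the device column exactly as stored."""
    return {sid: device for sid, _cid, device in rows}


print(customer_device_mode(sessions))
print(raw_device(sessions))
```

Session 2 shows the difference: the aggregated value is `desktop` (its customer's most common device), while the raw column keeps `mobile`. Ignoring `device` stops DFS from generating the first kind of feature, and adding the identity feature back keeps only the second.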