dplyr group_by and filter

Time: 2016-11-29 15:25:23

Tags: r dplyr

Consider the following dplyr query:

> mpg %>% group_by(class) %>% summarise(n())

Output:

# A tibble: 7 x 2
       class   n()
       <chr> <int>
1    2seater     5
2    compact    47
3    midsize    41
4    minivan    11
5     pickup    33
6 subcompact    35
7        suv    62

Now I want to filter the result as follows:

> mpg %>% group_by(class) %>% filter(hwy==21) %>% summarise(n())

That is, I want to show the classes of cars whose highway mileage is 21. The result is:

# A tibble: 2 x 2
       class   n()
       <chr> <int>
1    minivan     1
2 subcompact     1

This is the expected result, but what I want to see is every class: if a class has no cars with a highway mileage of 21, n() should report 0. Can I do this?

In other words, I want a dplyr query that produces the following output:

# A tibble: 7 x 2
       class   n()
       <chr> <int>
1    2seater     0
2    compact     0
3    midsize     0
4    minivan     1
5     pickup     0
6 subcompact     1
7        suv     0

where n() is the number of cars in each class with a highway mileage of 21.

Is this possible?

1 answer:

Answer 0 (score: 0)

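Try this:

# Flag the rows where hwy is 21 instead of filtering them out,
# then sum the flags per class so classes with no match report 0.
mpg %>% mutate(k=(hwy==21)) %>% group_by(class) %>%
   summarise(n=sum(k))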
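
The same idea can be sketched in two other ways; neither is part of the original answer. The first variant sums the logical condition directly inside summarise(), which avoids the helper column k. The second keeps the filter() from the question but converts class to a factor and passes .drop = FALSE to count(), which assumes dplyr 0.8.0 or later. The names by_sum and by_count are made up for illustration, and mpg is taken from the ggplot2 package.

```r
library(dplyr)

# mpg ships with ggplot2; referenced by namespace so ggplot2 need not be attached.
mpg <- ggplot2::mpg

# Variant 1: sum the logical condition directly inside summarise().
# TRUE counts as 1 and FALSE as 0, so classes with no match get 0.
by_sum <- mpg %>%
  group_by(class) %>%
  summarise(n = sum(hwy == 21))

# Variant 2 (assumes dplyr >= 0.8.0): keep the filter, but make class a
# factor and ask count() to keep factor levels that have no rows left.
by_count <- mpg %>%
  mutate(class = factor(class)) %>%
  filter(hwy == 21) %>%
  count(class, .drop = FALSE)

print(by_sum)
print(by_count)
```

Both variants should list all seven classes. Note that the second one returns class as a factor rather than a character column.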