Improving MongoDB aggregation performance

Time: 2017-10-24 17:10:30

Tags: mongodb

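I have billions of documents every day recording entries into department store departments (men's, women's, etc.).
id_department: the location of the department; area_type: the name of a branch within the department (e.g. shoes, fashion, etc.)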

(_id:59e86325dc03580bdbf2347f    
date:20170906
id_department:2640
goinside_type:2
area_type:1)
(_id:59e86325dc03580bdbf2347f    
date:20170906
id_department:2642
goinside_type:3
area_type:2)

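I want to write a query that returns the number of people who visited each area_type within a time range. The problem is that there can be more than 1000 area_type values, and the condition for each area_type can be different (so I cannot simply group by area_type in this case). My pipeline is very long, which hurts performance.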

$pipeline = Array
(
    [0] => Array
        (
            [$match] => Array
                (
                    [id_station] => Array
                        (
                            [$in] => Array
                                (
                                    [0] => 2640
                                    [1] => 2642
                                    [2] => 2644
                                )

                        )
                    [date] => Array
                        (
                            [$gte] => 20170802
                            [$lte] => 20170930
                        )

                )

        )

    [1] => Array
        (
            [$group] => Array
                (
                    [_id] => Array
                        (
                            [id_station] => $id_station                            
                        )

                    [total_entries - area1] => Array
                        (
                            [$sum] => Array
                                (
                                    [$cond] => Array
                                        (
                                            [0] => Array
                                                (
                                                    [$and] => Array
                                                        (    
                                                            [0] => Array
                                                                (
                                                                    [$eq] => Array
                                                                        (
                                                                            [0] => $area_type
                                                                            [1] => 1
                                                                        )

                                                                )

                                                            [2] => Array
                                                                (
                                                                    [$gte] => Array
                                                                        (
                                                                            [0] => $date
                                                                            [1] => 20170901
                                                                        )

                                                                )

                                                            [3] => Array
                                                                (
                                                                    [$lte] => Array
                                                                        (
                                                                            [0] => $date
                                                                            [1] => 20170930
                                                                        )

                                                                )

                                                        )

                                                )

                                            [1] => 1
                                            [2] => 0
                                        )

                                )

                        )

                    [total_entries - area1previous] => Array
                        (
                            [$sum] => Array
                                (
                                    [$cond] => Array
                                        (
                                            [0] => Array
                                                (
                                                    [$and] => Array
                                                        (
                                                            [0] => Array
                                                                (
                                                                    [$eq] => Array
                                                                        (
                                                                            [0] => $area_type
                                                                            [1] => 1
                                                                        )

                                                                )

                                                            [2] => Array
                                                                (
                                                                    [$gte] => Array
                                                                        (
                                                                            [0] => $date
                                                                            [1] => 20170802
                                                                        )

                                                                )

                                                            [3] => Array
                                                                (
                                                                    [$lte] => Array
                                                                        (
                                                                            [0] => $date
                                                                            [1] => 20170831
                                                                        )

                                                                )

                                                        )

                                                )
                                            [1] => 1
                                            [2] => 0
                                        )

                                )

                        )
                    [total_entries - area2] => Array
                        (
                            [$sum] => Array
                                (
                                    [$cond] => Array
                                        (
                                            [0] => Array
                                                (
                                                    [$and] => Array
                                                        (    
                                                            [0] => Array
                                                                (
                                                                    [$eq] => Array
                                                                        (
                                                                            [0] => $area_type
                                                                            [1] => 2
                                                                        )

                                                                )                                                       

                                                            [2] => Array
                                                                (
                                                                    [$gte] => Array
                                                                        (
                                                                            [0] => $date
                                                                            [1] => 20170901
                                                                        )

                                                                )

                                                            [3] => Array
                                                                (
                                                                    [$lte] => Array
                                                                        (
                                                                            [0] => $date
                                                                            [1] => 20170930
                                                                        )

                                                                )

                                                        )

                                                )

                                            [1] => 1
                                            [2] => 0
                                        )

                                )

                        )

                    [total_entries - area2previous] => Array
                        (
                            [$sum] => Array
                                (
                                    [$cond] => Array
                                        (
                                            [0] => Array
                                                (
                                                    [$and] => Array
                                                        (
                                                            [0] => Array
                                                                (
                                                                    [$eq] => Array
                                                                        (
                                                                            [0] => $area_type
                                                                            [1] => 2
                                                                        )

                                                                )                                                           
                                                            [2] => Array
                                                                (
                                                                    [$gte] => Array
                                                                        (
                                                                            [0] => $date
                                                                            [1] => 20170802
                                                                        )

                                                                )

                                                            [3] => Array
                                                                (
                                                                    [$lte] => Array
                                                                        (
                                                                            [0] => $date
                                                                            [1] => 20170831
                                                                        )

                                                                )

                                                        )

                                                )
                                            [1] => 1
                                            [2] => 0
                                        )

                                )

                        )

                )

        )
)
$cursor = $collection->aggregate($pipeline, ['allowDiskUse' => true]);

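Any ideas on how to solve this?

1 answer:

Answer 0: (score: 0)

The most important thing here is that you create an index on the date and id_department / id_station (which I suspect are the same) fields.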

collection.createIndex({
    "id_department" : 1,
    "date" : 1
})

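This will speed up your $match stage, after which only a small number of documents should remain for the subsequent pipeline stages.

Once you have measured the resulting performance and found it insufficient, you can try to optimize the query itself (for example, by extracting the repeated date filters into a real intermediate $project stage, or into the $group stage you already have).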
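
As a rough illustration of extracting the repeated date filters, the sketch below builds the pipeline as plain JavaScript objects, with no driver calls. It assumes that all area types share the same two date windows, as in the posted pipeline; if the conditions really differ per area_type, the flags would have to be computed per condition instead. The helper names (between, buildPipeline) and the flag fields (inCurrent, inPrevious) are hypothetical and do not appear in the question.

```javascript
// Sketch only: constructs the pipeline as plain objects; nothing is sent to MongoDB.
// inCurrent / inPrevious are hypothetical field names introduced for illustration.

// Boolean expression: from <= field <= to
function between(field, from, to) {
  return { $and: [{ $gte: [field, from] }, { $lte: [field, to] }] };
}

function buildPipeline(stationIds, areaTypes, current, previous) {
  const group = { _id: { id_station: "$id_station" } };
  for (const area of areaTypes) {
    for (const [suffix, flag] of [["", "$inCurrent"], ["previous", "$inPrevious"]]) {
      // One conditional counter per (area_type, window), reusing the precomputed flag
      group[`total_entries - area${area}${suffix}`] = {
        $sum: { $cond: [{ $and: [{ $eq: ["$area_type", area] }, flag] }, 1, 0] },
      };
    }
  }
  return [
    // Same filter as the original $match: stations plus the overall date span
    {
      $match: {
        id_station: { $in: stationIds },
        date: {
          $gte: Math.min(current.from, previous.from),
          $lte: Math.max(current.to, previous.to),
        },
      },
    },
    // Evaluate each date window once per document instead of once per accumulator
    {
      $project: {
        id_station: 1,
        area_type: 1,
        inCurrent: between("$date", current.from, current.to),
        inPrevious: between("$date", previous.from, previous.to),
      },
    },
    { $group: group },
  ];
}

const pipeline = buildPipeline(
  [2640, 2642, 2644],
  [1, 2],
  { from: 20170901, to: 20170930 },
  { from: 20170802, to: 20170831 }
);
console.log(JSON.stringify(pipeline, null, 2));
```

The resulting array could then be passed to aggregate() in place of the hand-written pipeline. Whether this is actually faster should be measured; the compound index on the $match fields is likely to matter more.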