Count items per document in a Mongo query

Date: 2016-06-17 13:55:23

Tags: python mongodb pymongo aggregation-framework

My Mongo cursor returns documents like these:

{ 
  "_id":ObjectId("57558ee01807ce2f774569cc"),
  "description": "Lorem Ipsum ....",
  "results":[
      {
         "name":"Alica James",
         "gender":"male"
      },
      {
         "name":"Alica James",
         "gender":"female"
      },
      {
         "name":"Alica James",
         "gender":"female"
      }
   ]
},
{ 
  "_id":ObjectId("57558ee01807ce2f774569c6"),
  "description": "Lorem Ipsum ....",
  "results":[
      {
         "name":"Van Ban",
         "gender":"unclear"
      }
   ]
},
{ 
  "_id":ObjectId("57558ee01807ce2f774569c7"),
  "description": "Lorem Ipsum ....",
  "results":[]
}

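As you can see, the results key can be empty or can contain values. Each entry in it has a name field and a gender field, and the gender can be male, female, or unclear. I want to find all documents in my collection, then go through each document and check the gender distribution for each name.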

So for the name "Alica James", I want my query to return:

female_numbers_for_document = 2
male_numbers_for_document = 1
unclear_numbers_for_document = 0

For "Van Ban":

female_numbers_for_document = 0
male_numbers_for_document = 0
unclear_numbers_for_document = 1

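In Python, I started by finding all documents in the collection, then iterating over each document in the cursor and declaring some variables for the gender counts. But this doesn't work: it only takes the first value and never goes through results. The code looks like this: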

def find_gender_distribution(self):
    cursor = self.mongo.db[self.collection_name].find()
    for document in cursor:
        female_numbers_for_document = document.find({"results.gender": "female"}).count()
        male_numbers_for_document = document.find({"results.gender": "male"}).count()
        unclear_numbers_for_document = document.find({"results.gender": "unclear"}).count()

I don't know how to count how many entries in results have the same gender. Please help.
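In the loop above, each document that PyMongo yields is a plain Python dict, so it has no .find() method; the counts have to come from walking document["results"] directly. A minimal client-side sketch, assuming every entry has name and gender keys (gender_distribution is a hypothetical helper name, not part of PyMongo):

```python
from collections import Counter, defaultdict


def gender_distribution(documents):
    """Count genders per name across an iterable of documents.

    `documents` may be a PyMongo cursor or any iterable of dicts shaped
    like the sample documents (hypothetical helper for illustration).
    """
    counts = defaultdict(Counter)
    for document in documents:
        # An empty or missing "results" array contributes nothing.
        for entry in document.get("results", []):
            counts[entry["name"]][entry["gender"]] += 1
    # Counter returns 0 for genders that never occurred for a name.
    return {
        name: {
            "female": c["female"],
            "male": c["male"],
            "unclear": c["unclear"],
        }
        for name, c in counts.items()
    }


sample = [
    {"results": [
        {"name": "Alica James", "gender": "male"},
        {"name": "Alica James", "gender": "female"},
        {"name": "Alica James", "gender": "female"},
    ]},
    {"results": [{"name": "Van Ban", "gender": "unclear"}]},
    {"results": []},
]
print(gender_distribution(sample))
```

With PyMongo the cursor can be passed in directly, e.g. gender_distribution(self.mongo.db[self.collection_name].find({}, {"results": 1})), but that pulls every document to the client, whereas an aggregation pipeline does the counting on the server.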

1 answer:

Answer 0 (score: 0)

You're using the wrong method for this. You need the .aggregate() method, which gives you access to the aggregation pipeline.
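The first stage of such a pipeline, $unwind, turns one document with an N-element array into N documents, each holding a single array element in place of the array. By default, documents whose array is empty, null, or missing produce no output, which is why the third sample document contributes nothing to the counts. Note that the field path has to match the actual array name, results. The plain-Python function below only mimics that behaviour for illustration (unwind is a hypothetical helper, not a MongoDB or PyMongo API):

```python
def unwind(documents, field):
    """Mimic MongoDB's default {"$unwind": "$<field>"} on a list of dicts.

    Illustration only; assumes the field, when present, holds a list.
    Documents whose array is empty, null or missing are dropped.
    """
    out = []
    for document in documents:
        for element in document.get(field) or []:
            copy = dict(document)
            copy[field] = element  # the array is replaced by one element
            out.append(copy)
    return out


docs = [
    {"_id": 1, "results": [{"name": "Alica James", "gender": "male"},
                           {"name": "Alica James", "gender": "female"}]},
    {"_id": 2, "results": []},
]
for d in unwind(docs, "results"):
    print(d)
```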

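# Stage 1: one document per element of the "results" array (note the plural)
unwind1 = {"$unwind": "$results"}
# Stage 2: count how often each (name, gender) pair occurs
group1 = {
    "$group": {
        "_id": {"name": "$results.name", "gender": "$results.gender"},
        "count": {"$sum": 1}
    }
}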
group2 = {
    "$group": {
        "_id": "$_id.name", 
        "nmale": {
            "$sum": {"$cond": [
                        {"$eq": ["$_id.gender", "male"]}, 
                        "$count", 
                        0
                    ]
            }
        }, 
        "nfemale": {
            "$sum": {"$cond": [
                        {"$eq": ["$_id.gender", "female"]}, 
                        "$count", 
                        0
                    ]
            }
        }, 
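        "nunclear": {
            "$sum": {"$cond": [
                        # neither "male" nor "female"; an $or here would always be true
                        {"$and": [
                            {"$ne": ["$_id.gender", "male"]},
                            {"$ne": ["$_id.gender", "female"]}
                        ]},
                        "$count",
                        0
                    ]
            }
        }
    }
}

pipeline = [unwind1, group1, group2]

def find_gender_distribution(self):
    collection = self.mongo.db[self.collection_name]
    # aggregate() returns a CommandCursor over the result documents
    cursor = collection.aggregate(pipeline)
    for document in cursor:
        print(document)  # or do something else with each result

Iterating over the cursor prints one document per name, aggregated across the whole collection (the order of $group results is not guaranteed):

{ "_id" : "Alica James", "nmale" : 1, "nfemale" : 2, "nunclear" : 0 }
{ "_id" : "Van Ban", "nmale" : 0, "nfemale" : 0, "nunclear" : 1 }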
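The nunclear condition needs $and, not $or, around the two $ne tests: every gender differs from at least one of "male" and "female", so an $or of them is always true and nunclear would simply equal the total count for the name. The plain-Python sketch below mirrors the two $group stages on already-unwound results entries to check the expected numbers (group_by_name_and_gender is a hypothetical name; this is not how MongoDB executes the pipeline):

```python
from collections import Counter


def group_by_name_and_gender(entries):
    """Mirror the two $group stages on unwound "results" entries.

    Stage 1 counts (name, gender) pairs; stage 2 pivots them into
    per-name nmale / nfemale / nunclear totals.
    """
    # Stage 1: {"_id": {"name": ..., "gender": ...}, "count": {"$sum": 1}}
    pair_counts = Counter((e["name"], e["gender"]) for e in entries)

    # Stage 2: {"_id": "$_id.name"} with one $cond-guarded $sum per field
    totals = {}
    for (name, gender), count in pair_counts.items():
        row = totals.setdefault(
            name, {"_id": name, "nmale": 0, "nfemale": 0, "nunclear": 0}
        )
        if gender == "male":
            row["nmale"] += count
        elif gender == "female":
            row["nfemale"] += count
        else:  # the $and of both $ne tests: neither male nor female
            row["nunclear"] += count
    return list(totals.values())


entries = [
    {"name": "Alica James", "gender": "male"},
    {"name": "Alica James", "gender": "female"},
    {"name": "Alica James", "gender": "female"},
    {"name": "Van Ban", "gender": "unclear"},
]
for row in group_by_name_and_gender(entries):
    print(row)
# {'_id': 'Alica James', 'nmale': 1, 'nfemale': 2, 'nunclear': 0}
# {'_id': 'Van Ban', 'nmale': 0, 'nfemale': 0, 'nunclear': 1}
```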