My Django query is slow in the terminal

Time: 2017-05-19 10:43:24

Tags: python django

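I have a users table with three types of users (students, faculty, and clubs), and I have a universities table. What I want is the number of users in each university. I get the output I want, but it is very slow: I have 90k users, and the query takes several minutes to produce a result.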

My user model:

from __future__ import unicode_literals
from django.db import models
from django.contrib.auth.models import User
from cms.models.masterUserTypes import MasterUserTypes
from cms.models.universities import Universities
from cms.models.departments import MasterDepartments



# WE ARE AT MODELS/APPUSERS

requestChoice = (
    ('male', 'male'),
    ('female', 'female'),
    )


class Users(models.Model):
    id = models.IntegerField(db_column="id", max_length=11, help_text="")
    userTypeId = models.ForeignKey(MasterUserTypes, db_column="userTypeId")
    universityId = models.ForeignKey(Universities, db_column="universityId")  
    departmentId = models.ForeignKey(MasterDepartments , db_column="departmentId",help_text="")  
    name = models.CharField(db_column="name",max_length=255,help_text="")
    username = models.CharField(db_column="username",unique=True, max_length=255,help_text="")
    email = models.CharField(db_column="email",unique=True, max_length=255,help_text="")
    password = models.CharField(db_column="password",max_length=255,help_text="")
    bio = models.TextField(db_column="bio",max_length=500,help_text="")
    gender = models.CharField(db_column="gender",max_length=6, choices=requestChoice,help_text="")
    mobileNo = models.CharField(db_column='mobileNo', max_length=16,help_text="")  
    dob = models.DateField(db_column="dob",help_text="")
    major = models.CharField(db_column="major",max_length=255,help_text="")
    graduationYear = models.IntegerField(db_column='graduationYear',max_length=11,help_text="")  
    canAddNews = models.BooleanField(db_column='canAddNews',default=False,help_text="")  
    receivePrivateMsgNotification = models.BooleanField(db_column='receivePrivateMsgNotification',default=True ,help_text="")  
    receivePrivateMsg = models.BooleanField(db_column='receivePrivateMsg',default=True ,help_text="")
    receiveCommentNotification = models.BooleanField(db_column='receiveCommentNotification',default=True ,help_text="")  
    receiveLikeNotification = models.BooleanField(db_column='receiveLikeNotification',default=True ,help_text="")  
    receiveFavoriteFollowNotification = models.BooleanField(db_column='receiveFavoriteFollowNotification',default=True ,help_text="")  
    receiveNewPostNotification = models.BooleanField(db_column='receiveNewPostNotification',default=True ,help_text="")  
    allowInPopularList = models.BooleanField(db_column='allowInPopularList',default=True ,help_text="")  
    xmppResponse = models.TextField(db_column='xmppResponse',help_text="")  
    xmppDatetime = models.DateTimeField(db_column='xmppDatetime', help_text="")  
    status = models.BooleanField(db_column="status", default=False, help_text="")
    deactivatedByAdmin = models.BooleanField(db_column="deactivatedByAdmin", default=False, help_text="")
    createdAt = models.DateTimeField(db_column='createdAt', auto_now=True, help_text="")  
    modifiedAt = models.DateTimeField(db_column='modifiedAt', auto_now=True, help_text="")  
    updatedBy = models.ForeignKey(User,db_column="updatedBy",help_text="Logged in user updated by ......")
    lastPasswordReset = models.DateTimeField(db_column='lastPasswordReset',help_text="")
    authorities = models.CharField(db_column="departmentId",max_length=255,help_text="")

    class Meta:
        managed = False
        db_table = 'users'

The query I am using produces the desired output but is slow:

universities = Universities.objects.using('cms').all()
for item in universities:
    studentcount = Users.objects.using('cms').filter(universityId=item.id, userTypeId=2).count()
    facultyCount = Users.objects.using('cms').filter(universityId=item.id, userTypeId=1).count()
    clubCount = Users.objects.using('cms').filter(universityId=item.id, userTypeId=3).count()
    totalcount = Users.objects.using('cms').filter(universityId=item.id).count()
    print studentcount, facultyCount, clubCount, totalcount
    print item.name

2 answers:

Answer 0 (score: 2)

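You should use annotate() to get the counts for each university, and conditional expressions to get counts that depend on a condition (docs).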

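from django.db.models import Case, Count, IntegerField, Sum, When

# Assuming no related_name is set on Users.universityId, reverse lookups in
# queries use the lowercased model name ("users"); "users_set" is only the
# name of the related manager attribute.
Universities.objects.using('cms').annotate(
    studentcount=Sum(Case(When(users__userTypeId=2, then=1), output_field=IntegerField())),
    facultyCount=Sum(Case(When(users__userTypeId=1, then=1), output_field=IntegerField())),
    clubCount=Sum(Case(When(users__userTypeId=3, then=1), output_field=IntegerField())),
    totalcount=Count('users'),
)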
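To make the shape of that single query concrete, here is a minimal sketch of conditional aggregation in plain SQL, run against an in-memory SQLite database. The table layout and sample rows are hypothetical stand-ins for the real `universities` and `users` tables, and the SQL is only roughly what the ORM would emit, not its exact output.

```python
import sqlite3


def counts_per_university(conn):
    """Return one row per university: (name, students, faculty, clubs, total)."""
    sql = """
        SELECT uni.name,
               SUM(CASE WHEN usr.userTypeId = 2 THEN 1 ELSE 0 END) AS students,
               SUM(CASE WHEN usr.userTypeId = 1 THEN 1 ELSE 0 END) AS faculty,
               SUM(CASE WHEN usr.userTypeId = 3 THEN 1 ELSE 0 END) AS clubs,
               COUNT(usr.id) AS total
        FROM universities AS uni
        LEFT JOIN users AS usr ON usr.universityId = uni.id
        GROUP BY uni.id, uni.name
        ORDER BY uni.name
    """
    return conn.execute(sql).fetchall()


# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE universities (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        universityId INTEGER,
        userTypeId INTEGER
    );
    INSERT INTO universities VALUES (1, 'Alpha'), (2, 'Beta'), (3, 'Gamma');
    INSERT INTO users (universityId, userTypeId) VALUES
        (1, 1), (1, 2), (1, 2), (1, 3), (2, 2);
""")

rows = counts_per_university(conn)
for row in rows:
    print(row)
```

The sketch uses `ELSE 0`, so a university with no users gets zeros. The Django version above has no `default` on `Case`, so in that situation the three sums come back as `None` rather than 0.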

Answer 1 (score: 1)

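First, an obvious optimization. Inside the loop you are essentially running the same query four times: three times filtered on a different userTypeId, and once with no filter. You can do this in a single COUNT(*) ... GROUP BY userTypeId query.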

...
# Here, we're building a dict {userTypeId: count}
# by counting PKs over each userTypeId
qs = Users.objects.using('cms').filter(universityId=item.id)
counts = {
    x["userTypeId"]: x["cnt"]
    for x in qs.values('userTypeId').annotate(cnt=Count('pk'))
}

student_count = counts.get(2, 0)
faculty_count = counts.get(1, 0)
club_count = counts.get(3, 0)
total_count = sum(counts.values())  # Assuming there may be other userTypeIds
...

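However, you are still running 1 + n queries, where n is the number of universities in your database. That is fine if the number is small, but if it is large you need to aggregate further by joining Universities and Users. The first draft I came up with looks like this: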

import itertools
from django.db.models import Count

# Assuming Universities.name is unique, otherwise you'll need to use IDs
# to distinguish between different universities, instead of names.
# The lookup goes through the ForeignKey field, which is named universityId.
qs = Users.objects.using('cms').values('userTypeId', 'universityId__name')\
    .annotate(cnt=Count('pk')).order_by('universityId__name')
for name, group in itertools.groupby(qs, lambda x: x["universityId__name"]):
    print("University: %s" % name)
    cnts = {g["userTypeId"]: g["cnt"] for g in group}
    faculty, student, club = cnts.get(1, 0), cnts.get(2, 0), cnts.get(3, 0)
    # NOTE: I'm assuming there are only few (if any) userTypeId values
    #       other than {1,2,3}.
    total = sum(cnts.values())
    print("  Student: %d, faculty: %d, club: %d, total: %d" % (
          student, faculty, club, total))
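The grouping step can be tried without a database. The sketch below assumes rows shaped like the dictionaries a `.values(...).annotate(cnt=...)` queryset yields; the key name `university`, the function `summarize`, and the sample data are made up for illustration. Note that `itertools.groupby` only merges adjacent items, which is why the ORM version needs `order_by` and this sketch sorts first.

```python
from itertools import groupby
from operator import itemgetter


def summarize(rows):
    """Collapse per-(university, userTypeId) count rows into per-university totals."""
    by_university = itemgetter("university")
    summary = {}
    # groupby() only groups consecutive items, so sort on the grouping key first.
    for name, group in groupby(sorted(rows, key=by_university), key=by_university):
        cnts = {g["userTypeId"]: g["cnt"] for g in group}
        summary[name] = {
            "faculty": cnts.get(1, 0),
            "student": cnts.get(2, 0),
            "club": cnts.get(3, 0),
            # The total also includes any userTypeId outside {1, 2, 3}.
            "total": sum(cnts.values()),
        }
    return summary


# Hypothetical, deliberately unsorted sample rows.
sample_rows = [
    {"university": "Beta", "userTypeId": 2, "cnt": 5},
    {"university": "Alpha", "userTypeId": 1, "cnt": 3},
    {"university": "Alpha", "userTypeId": 2, "cnt": 40},
    {"university": "Beta", "userTypeId": 4, "cnt": 1},
    {"university": "Alpha", "userTypeId": 3, "cnt": 2},
]

summary = summarize(sample_rows)
for name, counts in summary.items():
    print(name, counts)
```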

I may have made a typo in there, but hopefully it is correct. In terms of SQL, it should issue a query along these lines:

SELECT uni.name, usr.userTypeId, COUNT(usr.id)
FROM some_app_universities AS uni
LEFT JOIN some_app_users AS usr ON usr.universityId = uni.id
GROUP BY uni.name, usr.userTypeId
ORDER BY uni.name

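Consider reading the documentation on aggregations and annotations. And be sure to look at the raw SQL that the Django ORM emits (for example, using Django Debug Toolbar) and analyze how it performs on your database. For example, if you are using PostgreSQL, use EXPLAIN SELECT. Depending on your dataset, you may benefit from some indexes there (for example, on the userTypeId column).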
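As a rough illustration of checking a plan before and after adding an index, here is a sketch that uses SQLite's `EXPLAIN QUERY PLAN` as a stand-in. PostgreSQL's `EXPLAIN` output looks different, and the table layout, the helper `query_plan`, and the index name `idx_users_uni_type` are all assumptions made for the example.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The detail text is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


# Hypothetical table mirroring the two columns the counting queries filter on.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    "id INTEGER PRIMARY KEY, universityId INTEGER, userTypeId INTEGER)"
)

sql = "SELECT COUNT(*) FROM users WHERE universityId = ? AND userTypeId = ?"

plan_before = query_plan(conn, sql, (1, 2))

# A composite index covering both filter columns.
conn.execute("CREATE INDEX idx_users_uni_type ON users (universityId, userTypeId)")

plan_after = query_plan(conn, sql, (1, 2))

print("before:", plan_before)
print("after: ", plan_after)
```

If the index name shows up in the second plan, the database is using it for that query. Whether an index helps on the real data still has to be measured there.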

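Oh, and on a side note... this is off-topic, but in Python it is customary to use lowercase_with_underscores for variables and attributes. In Django, model class names are usually singular, e.g. User and University.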