Using Django bulk_create with M2M relationships

Date: 2017-06-18 21:04:17

Tags: python django postgresql m2m

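I have a bunch of objects in a response that I save to the database. Saving them one object at a time is very slow, because with 30k objects it effectively means 30k commits to the database.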

Example 1:

for obj in response['RESULTS']:

    _city = City.objects.create(
        id=obj['id'],
        name=obj['name'],
        shortname=obj['shortname'],
        location=obj['location'],
        region=region_fk
    )

    _events = Event.objects.get(pk=obj['Event'])
    _events.city_set.add(_city)

My new approach, using bulk_create(), looks like this:

Example 2:

bulk_list = []

for obj in response['RESULTS']:

    # get the foreignkey instead of duplicating data;
    # reset it first so an object without a Region doesn't reuse the previous one
    region_fk = None
    if obj.get('Region'):
        region_fk = Region.objects.get(pk=obj['Region'])

    bulk_list.append(
        City(
            id=obj['id'],
            name=obj['name'],
            shortname=obj['shortname'],
            location=obj['location'],
            region=region_fk
        )
    )

bulk_save = City.objects.bulk_create(bulk_list)
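Example 2 still runs one Region.objects.get() per object, so 30k objects still mean up to 30k SELECTs before the single INSERT. Django lets you assign a foreign key by its raw value through the `<field>_id` attribute (`region_id` here), so the lookup can be skipped. Below is a minimal sketch of the row-building step, kept free of Django so it runs on its own; `city_kwargs` is a hypothetical helper, and the `'Region'` key is assumed to hold a Region primary key, as in Example 2:

```python
def city_kwargs(obj):
    """Build keyword arguments for City(...) from one response item.

    Hypothetical helper: it sets the FK by raw id (region_id) instead of
    fetching a Region instance, so no query is issued per object.
    """
    return {
        "id": obj["id"],
        "name": obj["name"],
        "shortname": obj["shortname"],
        "location": obj["location"],
        # None when the item has no 'Region' key (needs null=True on the FK)
        "region_id": obj.get("Region"),
    }


# Illustrative input only: the first item mirrors the question's sample.
results = [
    {"id": "349bc6ab-1c82-46b9-889e-2cc534d5717e", "name": "Stockholm",
     "shortname": "Sthlm", "location": "Sweden", "Region": 2},
    {"id": "city-2", "name": "City B", "shortname": "B", "location": "X"},
]
rows = [city_kwargs(obj) for obj in results]
```

In a project, the list would then feed one query, `City.objects.bulk_create([City(**kw) for kw in rows])`. The trade-off is that an unknown region id is no longer caught by a failing `get()`; it only surfaces as an IntegrityError from the database.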

While this is much faster than my previous attempt, it has a problem: now I don't know how to add my M2M relationships.

models.py

from django.db import models


class City(models.Model):

    id = models.CharField(primary_key=True, max_length=64)
    name = models.CharField(max_length=32)
    shortname = models.CharField(max_length=32)
    location = models.CharField(max_length=32)
    # String references, because Region and Event are defined further down.
    region = models.ForeignKey('Region', on_delete=models.CASCADE)
    events = models.ManyToManyField('Event')


class Event(models.Model):

    id = models.CharField(primary_key=True, max_length=64)
    description = models.TextField()
    date = models.DateTimeField()


class Region(models.Model):

    id = models.IntegerField(primary_key=True)

Question

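I've looked around Stack Overflow and found some examples, but I don't fully understand them. Most answers seem to discuss bulk_create for M2M relationships together with a through model, and I'm not sure that's what I'm looking for.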

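  1. How do I add the M2M relationships?
  2. Please break it down so I can understand it; I want to learn :-)
  3. Any help or pointers are greatly appreciated. Thanks.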

Additional information

I'm running:

  • PostgreSQL
  • Django 1.11

Related posts

Django's documentation on this topic

Response example:

    "RESULT": [
      {
        "City": [
          {
            "id": "349bc6ab-1c82-46b9-889e-2cc534d5717e",
            "name": "Stockholm",
            "shortname": "Sthlm",
            "location": "Sweden",
            "region": [
              2
            ],
            "events": [
              {
                "id": "989b6563-97d2-4b7d-83a2-03c9cc774c21",
                "description": "some text",
                "date": "2017-06-19T00:00:00"
              },
              {
                "id": "70613514-e569-4af4-b770-a7bc9037ddc2",
                "description": "some text",
                "date": "2017-06-20T00:00:00"
              },
              {
                "id": "7533c16b-3b3a-4b81-9d1b-af528ec6e52b",
                "description": "some text",
                "date": "2017-06-22T00:00:00"
              }
            ]
          }
        ]
      }
    ]
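With this nested shape, every city carries its full list of events, so the data for all three tables (cities, events, and the M2M join table) is already in the response. Below is a minimal sketch of splitting it into flat rows, one collection per bulk insert. It assumes exactly the structure of the sample above (a "City" list per item, an "events" list per city); `flatten_response` is a hypothetical helper:

```python
def flatten_response(results):
    """Split the nested response into rows for three bulk inserts.

    Returns (cities, events, pairs):
      cities -- {city_id: city fields without "events"}
      events -- {event_id: event fields}, deduplicated across cities
      pairs  -- [(city_id, event_id), ...] for the M2M join table
    """
    cities, events, pairs = {}, {}, []
    for item in results:
        for city in item.get("City", []):
            cities[city["id"]] = {k: v for k, v in city.items() if k != "events"}
            for event in city.get("events", []):
                events[event["id"]] = event
                pairs.append((city["id"], event["id"]))
    # Drop repeated pairs while keeping their first-seen order.
    return cities, events, list(dict.fromkeys(pairs))


sample = [
    {
        "City": [
            {
                "id": "349bc6ab-1c82-46b9-889e-2cc534d5717e",
                "name": "Stockholm",
                "shortname": "Sthlm",
                "location": "Sweden",
                "region": [2],
                "events": [
                    {"id": "989b6563-97d2-4b7d-83a2-03c9cc774c21",
                     "description": "some text", "date": "2017-06-19T00:00:00"},
                    {"id": "70613514-e569-4af4-b770-a7bc9037ddc2",
                     "description": "some text", "date": "2017-06-20T00:00:00"},
                ],
            }
        ]
    }
]
cities, events, pairs = flatten_response(sample)
```

Each piece then maps to one bulk_create: Event objects from `events.values()`, City objects from `cities.values()` (after mapping `region` onto the foreign key), and join-table rows from `pairs`.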
    

1 answer

Answer (score: 0)

It depends.

If your M2M relationship has no explicit through model, one possible solution using the Django ORM is:

from itertools import groupby
from operator import itemgetter

# Create all ``City`` objects in one query (like you did in your second example):
cities = City.objects.bulk_create(
    [
        City(
            id=obj['id'],
            name=obj['name'],
            shortname=obj['shortname'],
            location=obj['location'],
            # assign the FK by id to avoid one Region query per city
            region_id=obj.get('Region'),
        ) for obj in response['RESULTS']
    ]
)
# Index the saved cities by primary key for fast lookups.
cities_by_pk = {city.pk: city for city in cities}

# Select all related ``Event`` objects in one query, as a {pk: Event} dict.
events = Event.objects.in_bulk([obj['Event'] for obj in response['RESULTS']])

# groupby() only merges consecutive items, so sort by event id first.
results = sorted(response['RESULTS'], key=itemgetter('Event'))

# Add all related cities to the corresponding events:
for event_id, event_cities_raw in groupby(results, key=itemgetter('Event')):
    event = events[event_id]
    # To avoid DB queries, take the city ids from the response
    # and look up the saved objects returned by bulk_create.
    event_cities = [cities_by_pk[city['id']] for city in event_cities_raw]
    event.city_set.add(*event_cities)

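That is 1 bulk_create query, 1 in_bulk query, plus a small fixed number of queries for each unique event in the response: each event.city_set.add() call typically runs one SELECT to find cities that are already linked and one bulk INSERT into the join table.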
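`itertools.groupby` only merges *consecutive* items with the same key, so on unsorted input one event can come out as several groups, and therefore several `add()` calls. A one-pass alternative, swapped in here rather than taken from the answer, is to collect the city ids per event in a `defaultdict`. The sketch assumes each result item carries an `'Event'` id, as in the code above; `group_city_ids_by_event` is a hypothetical helper:

```python
from collections import defaultdict
from itertools import groupby


def group_city_ids_by_event(results):
    """Map each event id to the list of city ids that reference it."""
    grouped = defaultdict(list)
    for obj in results:
        grouped[obj["Event"]].append(obj["id"])
    return dict(grouped)


results = [
    {"id": "c1", "Event": "e1"},
    {"id": "c2", "Event": "e2"},
    {"id": "c3", "Event": "e1"},
]

# groupby on unsorted input yields "e1" twice...
naive = [key for key, _ in groupby(results, key=lambda x: x["Event"])]
# ...while the dict gathers every occurrence of an event together.
grouped = group_city_ids_by_event(results)
```

For many-to-many managers, `add()` accepts primary keys as well as model instances, so each `event_id, city_ids` pair from `grouped.items()` could be passed straight to `events[event_id].city_set.add(*city_ids)` without looking up the City objects.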

With an explicit through model, you can use another bulk_create for that model; in other words, replace all the event.city_set.add() calls with a single ExplicitThrough.objects.bulk_create.
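Django also creates a hidden join model for a plain `ManyToManyField`, reachable as `City.events.through`, whose foreign keys are named after the two models (`city` and `event`, so `city_id` and `event_id` as raw values). That means the same single-bulk_create idea should also work without declaring an explicit through model. Below is a minimal sketch, assuming you already have a list of `(city_id, event_id)` tuples; `build_through_rows` is hypothetical, and a stand-in class replaces the Django model so the snippet runs without a project:

```python
def build_through_rows(through_model, pairs):
    """Instantiate one join-table row per (city_id, event_id) pair.

    Repeated pairs are dropped, because the auto-created join table has a
    unique constraint on (city, event). Nothing is saved here.
    """
    unique_pairs = dict.fromkeys(pairs)  # keeps first-seen order
    return [through_model(city_id=c, event_id=e) for c, e in unique_pairs]


class FakeThrough:
    """Stand-in for City.events.through, only for running this sketch."""

    def __init__(self, city_id, event_id):
        self.city_id = city_id
        self.event_id = event_id


pairs = [("c1", "e1"), ("c1", "e2"), ("c1", "e1")]
rows = build_through_rows(FakeThrough, pairs)
```

In a project this would become `Through = City.events.through` followed by `Through.objects.bulk_create(build_through_rows(Through, pairs))`. Unlike `add()`, this neither skips pairs that already exist in the database nor sends `m2m_changed` signals; `bulk_create(ignore_conflicts=True)` would handle existing pairs, but that option only exists from Django 2.2, not in 1.11.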

response['RESULTS']的事件不存在时,您可能需要处理情况,然后您必须使用另一个bulk_create创建这些对象。


If some events in response['RESULTS'] don't exist in the database yet, you can run another bulk_create right after the Event.objects.in_bulk query:

# Assumes obj['Event'] is a dict whose keys match Event's fields (id, description, date).
new_events = Event.objects.bulk_create([Event(**obj['Event']) for obj in response['RESULTS'] if obj['Event']['id'] not in events])

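How exactly depends on the structure of the objects in response['RESULTS'], but in general you need to create the missing events at this point. It should still be faster than calling Event.objects.get_or_create for each one.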
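Because in_bulk() returns a dict keyed by primary key, finding the missing events is a plain membership test. One detail a single list comprehension does not handle: if the same new event appears in several results, it is passed to bulk_create twice and hits the primary-key constraint. Below is a sketch that deduplicates first, assuming `obj['Event']` is a dict with `id`, `description` and `date`, like the events in the question's sample; `collect_missing_events` is a hypothetical helper:

```python
def collect_missing_events(results, existing_ids):
    """Return event dicts whose id is not yet in the database, once each.

    existing_ids: any container of known ids, e.g. the dict returned by
    Event.objects.in_bulk(...), whose keys are primary keys.
    """
    missing = {}
    for obj in results:
        event = obj["Event"]
        if event["id"] not in existing_ids and event["id"] not in missing:
            missing[event["id"]] = event
    return list(missing.values())


results = [
    {"id": "c1", "Event": {"id": "e1", "description": "some text",
                           "date": "2017-06-19T00:00:00"}},
    {"id": "c2", "Event": {"id": "e2", "description": "some text",
                           "date": "2017-06-20T00:00:00"}},
    {"id": "c3", "Event": {"id": "e2", "description": "some text",
                           "date": "2017-06-20T00:00:00"}},
]
existing = {"e1": object()}  # stands in for Event.objects.in_bulk(...)
missing = collect_missing_events(results, existing)
```

The result could then go through one `Event.objects.bulk_create([Event(**e) for e in missing])`, with the returned instances merged into the `events` dict before the M2M step.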