GeoDjango and PostGIS

Date: 2017-10-23 08:09:50

Tags: django postgresql postgis geodjango

I am using GeoDjango with Postgres 10 and PostGIS. I have two models, as follows:

from django.contrib.gis.db import models


class Postcode(models.Model):
    name = models.CharField(max_length=8, unique=True)
    location = models.PointField(geography=True)


class Transaction(models.Model):
    transaction_id = models.CharField(max_length=60)
    price = models.IntegerField()
    date_of_transfer = models.DateField()
    postcode = models.ForeignKey(Postcode, on_delete=models.CASCADE)
    property_type = models.CharField(max_length=1, blank=True)
    street = models.CharField(blank=True, max_length=200)

    class Meta:
        indexes = [
            models.Index(fields=['-date_of_transfer']),
            models.Index(fields=['price']),
        ]

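Given a particular postcode, I would like to find the nearest transactions within a specified distance. To do this, I use the following code: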

from django.contrib.gis.db.models.functions import Distance
from django.contrib.gis.measure import D

transactions = Transaction.objects.filter(price__gte=min_price) \
    .filter(postcode__location__distance_lte=(pc.location, D(mi=distance))) \
    .annotate(distance=Distance('postcode__location', pc.location)).order_by('distance')[:25]

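On a Windows PC with an i5 2500k and 16GB of RAM, the query runs slowly, taking roughly 20-60 seconds depending on the filter criteria. If I order by date_of_transfer instead, it runs in under 1 second for larger distances (over 1 mile), but it is still slow for small distances (e.g. 45 seconds for a distance of 0.1m).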

So far I have tried:

* changing the location field from Geometry to Geography
* using dwithin instead of distance_lte

Neither of these made much difference to the speed of the query.

The SQL that GeoDjango generates for the current version is:

SELECT "postcodes_transaction"."id",
 "postcodes_transaction"."transaction_id", 
"postcodes_transaction"."price", 
"postcodes_transaction"."date_of_transfer",
"postcodes_transaction"."postcode_id", 
"postcodes_transaction"."street",  
ST_Distance("postcodes_postcode"."location", 
ST_GeogFromWKB('\x0101000020e6100000005471e316f3bfbf4ad05fe811c14940'::bytea)) AS "distance" 
FROM "postcodes_transaction" INNER JOIN "postcodes_postcode" 
ON ("postcodes_transaction"."postcode_id" = "postcodes_postcode"."id") 
WHERE ("postcodes_transaction"."price" >= 50000 
AND ST_Distance("postcodes_postcode"."location", ST_GeomFromEWKB('\x0101000020e6100000005471e316f3bfbf4ad05fe811c14940'::bytea)) <= 1609.344 
AND "postcodes_transaction"."date_of_transfer" >= '2000-01-01'::date 
AND "postcodes_transaction"."date_of_transfer" <= '2017-10-01'::date) 
ORDER BY "distance" ASC LIMIT 25
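
For readers who want to check what the hex literal in this SQL encodes, here is a minimal sketch that decodes it with the Python standard library only. It assumes the literal is a little-endian 2D EWKB point with an SRID, which is what the type word suggests; the helper name decode_ewkb_point and the constant names are made up for illustration and are not part of GeoDjango or PostGIS.

```python
import struct

# Hex literal copied from the generated SQL (without the leading \x).
EWKB_HEX = "0101000020e6100000005471e316f3bfbf4ad05fe811c14940"

SRID_FLAG = 0x20000000      # EWKB bit signalling that an SRID follows the type word
WKB_POINT = 1               # WKB geometry type code for a point
METERS_PER_MILE = 1609.344  # international mile, matches the WHERE threshold


def decode_ewkb_point(hex_string):
    """Decode a 2D EWKB point that carries an SRID into (srid, x, y)."""
    raw = bytes.fromhex(hex_string)
    # First byte is the byte-order marker: 1 = little-endian, 0 = big-endian.
    endian = "<" if raw[0] == 1 else ">"
    (geom_type,) = struct.unpack_from(endian + "I", raw, 1)
    if geom_type != (SRID_FLAG | WKB_POINT):
        raise ValueError("expected a 2D point carrying an SRID")
    # SRID (uint32) followed by x and y (two float64 values).
    srid, x, y = struct.unpack_from(endian + "Idd", raw, 5)
    return srid, x, y


if __name__ == "__main__":
    srid, lon, lat = decode_ewkb_point(EWKB_HEX)
    print("SRID:", srid)
    print("lon/lat:", round(lon, 4), round(lat, 4))
    print("1 mile in metres:", 1 * METERS_PER_MILE)
```

The decoded SRID is 4326, so x and y are longitude and latitude, and the 1609.344 in the WHERE clause is simply D(mi=1) expressed in metres.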

On the postcode table, there is an index on the location field, as follows:

CREATE INDEX postcodes_postcode_location_id
  ON public.postcodes_postcode
  USING gist
  (location);

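The transaction table has 22 million rows and the postcode table has 2.5 million rows. Any suggestions on what approaches I can take to improve the performance of this query?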

Here is the query plan for reference:

"Limit  (cost=2394838.01..2394840.93 rows=25 width=76) (actual time=19028.400..19028.409 rows=25 loops=1)"
"  Output: postcodes_transaction.id, postcodes_transaction.transaction_id, postcodes_transaction.price, postcodes_transaction.date_of_transfer, postcodes_transaction.postcode_id, postcodes_transaction.street, (_st_distance(postcodes_postcode.location, '0101 (...)"
"  ->  Gather Merge  (cost=2394838.01..2893397.65 rows=4273070 width=76) (actual time=19028.399..19028.407 rows=25 loops=1)"
"        Output: postcodes_transaction.id, postcodes_transaction.transaction_id, postcodes_transaction.price, postcodes_transaction.date_of_transfer, postcodes_transaction.postcode_id, postcodes_transaction.street, (_st_distance(postcodes_postcode.location, (...)"
"        Workers Planned: 2"
"        Workers Launched: 2"
"        ->  Sort  (cost=2393837.99..2399179.33 rows=2136535 width=76) (actual time=18849.396..18849.449 rows=387 loops=3)"
"              Output: postcodes_transaction.id, postcodes_transaction.transaction_id, postcodes_transaction.price, postcodes_transaction.date_of_transfer, postcodes_transaction.postcode_id, postcodes_transaction.street, (_st_distance(postcodes_postcode.loc (...)"
"              Sort Key: (_st_distance(postcodes_postcode.location, '0101000020e6100000005471e316f3bfbf4ad05fe811c14940'::geography, '0'::double precision, true))"
"              Sort Method: quicksort  Memory: 1013kB"
"              Worker 0: actual time=18615.809..18615.948 rows=577 loops=1"
"              Worker 1: actual time=18904.700..18904.721 rows=576 loops=1"
"              ->  Hash Join  (cost=699247.34..2074281.07 rows=2136535 width=76) (actual time=10705.617..18841.448 rows=5573 loops=3)"
"                    Output: postcodes_transaction.id, postcodes_transaction.transaction_id, postcodes_transaction.price, postcodes_transaction.date_of_transfer, postcodes_transaction.postcode_id, postcodes_transaction.street, _st_distance(postcodes_postcod (...)"
"                    Inner Unique: true"
"                    Hash Cond: (postcodes_transaction.postcode_id = postcodes_postcode.id)"
"                    Worker 0: actual time=10742.668..18608.763 rows=5365 loops=1"
"                    Worker 1: actual time=10749.748..18897.838 rows=5522 loops=1"
"                    ->  Parallel Seq Scan on public.postcodes_transaction  (cost=0.00..603215.80 rows=6409601 width=68) (actual time=0.052..4214.812 rows=5491618 loops=3)"
"                          Output: postcodes_transaction.id, postcodes_transaction.transaction_id, postcodes_transaction.price, postcodes_transaction.date_of_transfer, postcodes_transaction.postcode_id, postcodes_transaction.street"
"                          Filter: ((postcodes_transaction.price >= 50000) AND (postcodes_transaction.date_of_transfer >= '2000-01-01'::date) AND (postcodes_transaction.date_of_transfer <= '2017-10-01'::date))"
"                          Rows Removed by Filter: 2025049"
"                          Worker 0: actual time=0.016..4226.643 rows=5375779 loops=1"
"                          Worker 1: actual time=0.016..4188.138 rows=5439515 loops=1"
"                    ->  Hash  (cost=682252.00..682252.00 rows=836667 width=36) (actual time=10654.921..10654.921 rows=1856 loops=3)"
"                          Output: postcodes_postcode.location, postcodes_postcode.id"
"                          Buckets: 131072  Batches: 16  Memory Usage: 1032kB"
"                          Worker 0: actual time=10692.068..10692.068 rows=1856 loops=1"
"                          Worker 1: actual time=10674.101..10674.101 rows=1856 loops=1"
"                          ->  Seq Scan on public.postcodes_postcode  (cost=0.00..682252.00 rows=836667 width=36) (actual time=5058.685..10651.176 rows=1856 loops=3)"
"                                Output: postcodes_postcode.location, postcodes_postcode.id"
"                                Filter: (_st_distance(postcodes_postcode.location, '0101000020e6100000005471e316f3bfbf4ad05fe811c14940'::geography, '0'::double precision, true) <= '1609.344'::double precision)"
"                                Rows Removed by Filter: 2508144"
"                                Worker 0: actual time=5041.442..10688.265 rows=1856 loops=1"
"                                Worker 1: actual time=5072.242..10670.215 rows=1856 loops=1"
"Planning time: 0.538 ms"
"Execution time: 19065.962 ms"

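A little arithmetic on the figures in the plan makes it easier to see where the work goes. The sketch below only restates numbers copied from the EXPLAIN ANALYZE output; it assumes that the row counts on the parallel scan are per-process averages over loops=3, and the variable names are made up for illustration.

```python
# Figures copied from the EXPLAIN ANALYZE output above.
postcode_rows_kept = 1856          # Seq Scan on postcodes_postcode, actual rows
postcode_rows_removed = 2508144    # Rows Removed by Filter on that scan
postcode_rows_estimated = 836667   # planner estimate (rows=836667)

transaction_rows_kept = 5491618    # Parallel Seq Scan, actual rows per process
transaction_rows_removed = 2025049 # Rows Removed by Filter per process
processes = 3                      # leader plus 2 workers (loops=3)

# Every postcode row is read and passed through _st_distance.
postcode_total = postcode_rows_kept + postcode_rows_removed

# Fraction of postcodes that actually lie within the 1609.344 m radius.
postcode_selectivity = postcode_rows_kept / postcode_total

# How far the planner's row estimate is from what the scan returned.
overestimate_factor = postcode_rows_estimated / postcode_rows_kept

# Transaction rows read across all processes (assumes per-process averages).
transaction_total = (transaction_rows_kept + transaction_rows_removed) * processes

if __name__ == "__main__":
    print("postcode rows scanned:", postcode_total)
    print("postcode selectivity: {:.4%}".format(postcode_selectivity))
    print("estimate / actual: {:.0f}x".format(overestimate_factor))
    print("transaction rows scanned:", transaction_total)
```

Under those assumptions the totals line up with the table sizes given in the question. The plan reads both tables with sequential scans, and the distance filter on postcodes_postcode is evaluated row by row rather than through the GiST index on location. The estimate of 836667 rows is one third of the postcode table, which would be consistent with the planner falling back to a default selectivity for the distance comparison, although the plan alone does not confirm that.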
0 answers:

No answers