OFFSET vs. ROW_NUMBER()

Posted: 2010-06-26 21:34:53

Tags: postgresql

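As we all know, PostgreSQL's OFFSET has to scan every row up to the position you request. That makes it of little use for paging through huge result sets: it gets slower and slower as the OFFSET grows.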
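To make that cost concrete, here is a toy Python model. It is an assumption for illustration, not PostgreSQL's executor: rows arrive one at a time from an ordered scan, and OFFSET can only discard them one by one, so fetching a page reads offset + limit rows. The class and function names are hypothetical.

```python
from itertools import islice


class CountingScan:
    """Hypothetical stand-in for an ordered scan; counts the rows it produces."""

    def __init__(self, n_rows):
        self.n_rows = n_rows
        self.rows_read = 0

    def __iter__(self):
        for row_id in range(1, self.n_rows + 1):
            self.rows_read += 1
            yield row_id


def limit_offset(scan, limit, offset):
    # Toy LIMIT/OFFSET: skip `offset` rows one by one, then keep `limit` rows.
    return list(islice(scan, offset, offset + limit))


for offset in (100, 480000, 999900):
    scan = CountingScan(1000000)
    page = limit_offset(scan, 100, offset)
    print(offset, page[0], scan.rows_read)
# prints:
# 100 101 200
# 480000 480001 480100
# 999900 999901 1000000
```

The number of rows read grows linearly with the offset, which is the slowdown described above.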

PG 8.4 now supports window functions. Instead of:

SELECT * FROM table ORDER BY somecol LIMIT 10 OFFSET 500

you can write:

SELECT * FROM (SELECT *, ROW_NUMBER() OVER (ORDER BY somecol ASC) AS rownum FROM table) AS foo
WHERE rownum > 500 AND rownum <= 510
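The two queries return the same page only if the bounds line up: row_number() starts at 1, so LIMIT 10 OFFSET 500 corresponds to rownum 501 through 510. The sketch below checks that with Python's built-in sqlite3 module, used here as a stand-in for PostgreSQL (an assumption; SQLite needs version 3.25 or newer for window functions). The table name t and its contents are hypothetical.

```python
import sqlite3

# SQLite in memory stands in for PostgreSQL here; t is a hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, somecol INTEGER)")
# 1,000 rows with distinct somecol values (1000 down to 1), so the order is total.
conn.executemany("INSERT INTO t (somecol) VALUES (?)", [(1000 - i,) for i in range(1000)])

offset_rows = conn.execute(
    "SELECT id, somecol FROM t ORDER BY somecol LIMIT 10 OFFSET 500"
).fetchall()

rownum_rows = conn.execute(
    "SELECT id, somecol FROM ("
    "  SELECT id, somecol, ROW_NUMBER() OVER (ORDER BY somecol ASC) AS rownum"
    "  FROM t"
    ") AS foo "
    "WHERE rownum > 500 AND rownum <= 510 "
    "ORDER BY rownum"  # without an outer ORDER BY, row order is not guaranteed
).fetchall()

print(offset_rows == rownum_rows)  # True
```

Note the outer ORDER BY rownum: SQL does not promise that the rows of the outer query come back in the subquery's order unless you ask for it.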

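Does the latter approach help us at all? Or do we have to keep using identity columns and temporary tables to page through large result sets?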
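For comparison, the identity-column-plus-temporary-table workaround mentioned in the question can be sketched as follows, again with sqlite3 standing in for PostgreSQL (an assumption). The sort is done once, each row's position is stored as an integer key, and every page afterwards is a key-range lookup rather than a skip. The names page_index, pos and get_page are hypothetical.

```python
import sqlite3

# SQLite stands in for PostgreSQL; t, page_index and get_page are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, somecol INTEGER)")
conn.executemany("INSERT INTO t (somecol) VALUES (?)", [(1000 - i,) for i in range(1000)])

# Materialize the sort once; pos plays the role of the identity column.
conn.execute("CREATE TEMP TABLE page_index (pos INTEGER PRIMARY KEY, id INTEGER)")
ordered_ids = conn.execute("SELECT id FROM t ORDER BY somecol").fetchall()
conn.executemany(
    "INSERT INTO page_index (pos, id) VALUES (?, ?)",
    ((pos, row_id) for pos, (row_id,) in enumerate(ordered_ids, start=1)),
)


def get_page(limit, offset):
    # A range lookup on the primary key instead of skipping `offset` rows.
    return conn.execute(
        "SELECT t.id, t.somecol FROM page_index "
        "JOIN t ON t.id = page_index.id "
        "WHERE page_index.pos > ? AND page_index.pos <= ? "
        "ORDER BY page_index.pos",
        (offset, offset + limit),
    ).fetchall()


print(get_page(10, 500)[0])  # (500, 501)
```

The trade-off is that the temporary table is a snapshot: it has to be rebuilt when the underlying data changes.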

2 answers:

Answer 0 (score: 24)

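I built a test that compares OFFSET, cursors, and ROW_NUMBER(). My impression of ROW_NUMBER(), that its speed would be consistent no matter where you are in the result set, is correct. However, that speed is dramatically slower than either OFFSET or a cursor. Those two, as I also expected, are nearly identical in speed, and both get slower the further toward the end of the result set you go.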

Results (arguments are (limit, offset); each time is the total in seconds for 10 runs of the query):

offset(100,100): 0.016359
scroll(100,100): 0.018393
rownum(100,100): 15.535614

offset(100,480000): 1.761800
scroll(100,480000): 1.781913
rownum(100,480000): 15.158601

offset(100,999900): 3.670898
scroll(100,999900): 3.664517
rownum(100,999900): 14.581068
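Because the timing function prints the total for 10 runs, dividing by 10 gives seconds per query. A quick calculation on the figures above (nothing re-measured) shows how the gap between the methods changes with the offset:

```python
# Totals in seconds for 10 runs, copied from the results above, keyed by offset (limit = 100).
results = {
    "offset": {100: 0.016359, 480000: 1.761800, 999900: 3.670898},
    "scroll": {100: 0.018393, 480000: 1.781913, 999900: 3.664517},
    "rownum": {100: 15.535614, 480000: 15.158601, 999900: 14.581068},
}

for offset in (100, 480000, 999900):
    # Per-query time = total / 10 runs.
    per_query = {name: times[offset] / 10 for name, times in results.items()}
    slowdown = per_query["rownum"] / per_query["offset"]
    print(f"offset={offset}: "
          + ", ".join(f"{n}={t:.4f}s" for n, t in per_query.items())
          + f", rownum/offset={slowdown:.1f}x")
```

On the first page rownum is roughly 950 times slower than offset; on the last page the gap shrinks to about 4 times, because offset gets slower while rownum stays around 1.5 s per query.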

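The test script uses SQLAlchemy to create the table and 1,000,000 rows of test data. It then runs each SELECT statement on a psycopg2 cursor and fetches the results using each of the three methods.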

from sqlalchemy import *

metadata = MetaData()
engine = create_engine('postgresql://scott:tiger@localhost/test', echo=True)

t1 = Table('t1', metadata,
    Column('id', Integer, primary_key=True),
    Column('d1', String(50)),
    Column('d2', String(50)),
    Column('d3', String(50)),
    Column('d4', String(50)),
    Column('d5', String(50))
)

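# Inspector.has_table() replaces engine.has_table() (SQLAlchemy 1.4+).
from sqlalchemy import inspect

if not inspect(engine).has_table('t1'):
    # engine.begin() commits the table and the inserts when the block exits.
    with engine.begin() as conn:
        t1.create(conn)

        # 100 batches of 10,000 rows = 1,000,000 rows
        for i in range(100):
            conn.execute(t1.insert(), [
                {
                    'd%d' % col: "data data data %d %d" % (col, (i * 10000) + j)
                    for col in range(1, 6)
                } for j in range(1, 10001)
            ])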

import time

def timeit(fn, count, *args):
    now = time.time()
    for i in range(count):
        fn(*args)
    total = time.time() - now
    # Prints the total time for `count` runs, not the per-run average.
    print("%s(%s): %f" % (fn.__name__, ",".join(repr(x) for x in args), total))

# this is a raw psycopg2 connection.
conn = engine.raw_connection()

def offset(limit, offset):
    cursor = conn.cursor()
    cursor.execute("select * from t1 order by id limit %d offset %d" % (limit, offset))
    cursor.fetchall()
    cursor.close()

def rownum(limit, offset):
    cursor = conn.cursor()
    # row_number() starts at 1, so OFFSET n corresponds to rownum > n.
    cursor.execute("select * from (select *, "
                    "row_number() over (order by id asc) as rownum from t1) as foo "
                    "where rownum>%d and rownum<=%d" % (offset, limit + offset))
    cursor.fetchall()
    cursor.close()

def scroll(limit, offset):
    cursor = conn.cursor('foo')
    cursor.execute("select * from t1 order by id")
    cursor.scroll(offset)
    cursor.fetchmany(limit)
    cursor.close()

print()

timeit(offset, 10, 100, 100)
timeit(scroll, 10, 100, 100)
timeit(rownum, 10, 100, 100)

print()

timeit(offset, 10, 100, 480000)
timeit(scroll, 10, 100, 480000)
timeit(rownum, 10, 100, 480000)

print()

timeit(offset, 10, 100, 999900)
timeit(scroll, 10, 100, 999900)
timeit(rownum, 10, 100, 999900)

Answer 1 (score: 4)

Use a CURSOR for large result sets; it will be much faster. For small result sets the LIMIT OFFSET construct works fine, but it has its limits.
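The idea behind cursor-based paging is to run the query once and pull successive pages from the same open cursor, instead of re-running it with a growing OFFSET. The sketch below uses sqlite3 so it runs without a server (an assumption); with psycopg2 the equivalent is a named, server-side cursor, as in the scroll() function of the first answer. The helper iter_pages is hypothetical.

```python
import sqlite3


def iter_pages(conn, query, page_size):
    """Hypothetical helper: yield successive pages from one open cursor."""
    cursor = conn.cursor()
    try:
        cursor.execute(query)
        while True:
            page = cursor.fetchmany(page_size)
            if not page:
                break
            yield page
    finally:
        cursor.close()


# SQLite in memory stands in for PostgreSQL; t1 mirrors the test table's name only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (id INTEGER PRIMARY KEY, d1 TEXT)")
conn.executemany("INSERT INTO t1 (d1) VALUES (?)", [(f"data {i}",) for i in range(1, 1001)])

pages = list(iter_pages(conn, "SELECT id, d1 FROM t1 ORDER BY id", 100))
print(len(pages), pages[-1][-1])  # 10 (1000, 'data 1000')
```

In PostgreSQL a server-side cursor normally lives only as long as its transaction (unless declared WITH HOLD), so this pattern suits batch processing within one session better than stateless page-per-request web pagination.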

ROW_NUMBER is a nice feature, but it is not suited to pagination: it ends up forcing a sequential scan, so performance degrades.
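A toy Python model of why the ROW_NUMBER() query costs the same everywhere: the subquery numbers every row, and only afterwards does the outer WHERE keep the requested window. This is an illustrative assumption, not PostgreSQL's actual plan, and rownum_page is a hypothetical name; it is consistent with the flat rownum timings in the first answer.

```python
def rownum_page(n_rows, limit, offset):
    """Toy model: number all rows first, then filter to the requested window."""
    rows_numbered = 0
    page = []
    for rownum, row_id in enumerate(range(1, n_rows + 1), start=1):
        rows_numbered += 1  # every row is numbered, whatever the offset
        if offset < rownum <= offset + limit:
            page.append(row_id)
    return page, rows_numbered


for offset in (100, 480000, 999900):
    page, touched = rownum_page(1000000, 100, offset)
    print(offset, page[0], touched)
# prints:
# 100 101 1000000
# 480000 480001 1000000
# 999900 999901 1000000
```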