Question

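I'm running into a very frustrating error that pops up whenever I hit one particular API endpoint. For context, I'm working on a Flask application that uses SQLAlchemy and stores its data in a PostgreSQL database configured to allow 1000 connections.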

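One of the ways users can query that data is through the /timeseries endpoint. The data is returned as JSON, which is assembled from the ResultProxies returned by querying the database.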

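My hope was that multithreading would make the method called by the /timeseries view controller run faster, because our original setup takes too long to respond to queries that return large volumes of data.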

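I've read many other posts about this same issue being caused by sessions not being cleaned up properly, but I feel I've got that covered. Is there anything glaringly wrong with the code I've written?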

The app is deployed with AWS Elastic Beanstalk.

@classmethod
def timeseries_all(cls, table_names, agg_unit, start, end, geom=None):
    """
    For each candidate dataset, query the matching timeseries and push datasets with nonempty
    timeseries into a list to convert to JSON and display.

    :param table_names: list of tables to generate timetables for
    :param agg_unit: a unit of time to divide up the data by (day, week, month, year)
    :param start: starting date to limit query
    :param end: ending date to limit query
    :param geom: geometric constraints of the query

    :returns: timeseries list to display
    """

    threads = []
    timeseries_dicts = []

    # set up engine for use with threading
    psql_db = create_engine(DATABASE_CONN, pool_size=10, max_overflow=-1, pool_timeout=100)
    scoped_sessionmaker = scoped_session(sessionmaker(bind=psql_db, autoflush=True, autocommit=True))

    def fetch_timeseries(t_name):
        _session = scoped_sessionmaker()
        # retrieve MetaTable object to call timeseries from
        table = MetaTable.get_by_dataset_name(t_name)
        # retrieve ResultProxy from executing timeseries selection
        rp = _session.execute(table.timeseries(agg_unit, start, end, geom))

        # empty results will just have a header
        if rp.rowcount > 0:

            timeseries = {
                'dataset_name': t_name,
                'items': [],
                'count': 0
            }

            for row in rp.fetchall():
                timeseries['items'].append({'count': row.count, 'datetime': row.time_bucket.date()})
                timeseries['count'] += row.count

            # load to outer storage
            timeseries_dicts.append(timeseries)

        # clean up session
        rp.close()
        scoped_sessionmaker.remove()

    # create a new thread for every table to query
    for name in table_names:
        thread = threading.Thread(target=fetch_timeseries, args=(name, ))
        threads.append(thread)

    # start all threads
    for thread in threads:
        thread.start()

    # wait for all threads to finish
    for thread in threads:
        thread.join()

    # release all connections associated with this engine
    psql_db.dispose()

    return timeseries_dicts

Answer 1

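I think you're going about this in a bit of a roundabout way. Here are some suggestions for getting the most out of your Postgres connections (I have used this configuration in production).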

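1. I would use the Flask-SQLAlchemy extension to handle the connections to your Postgres instance. If you look at the SQLAlchemy docs, you will see that the author highly recommends using it to manage the database connection lifecycle rather than rolling your own.
2. A more performant way to handle a high volume of requests is to put your Flask application behind a WSGI server such as gunicorn or uwsgi. These servers can spawn multiple instances of your application. When someone hits your endpoint, the requests are load balanced across those instances, each with its own connections.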
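
As a rough illustration of the second point, a gunicorn configuration file is plain Python. The values below are assumptions chosen for illustration, not settings from the original post.

```python
# gunicorn.conf.py (illustrative values only)

# Number of worker processes. Each worker imports the Flask app and
# therefore owns its own SQLAlchemy engine and connection pool.
workers = 5

# Address the server listens on (assumed; adjust for your deployment).
bind = "0.0.0.0:8000"

# Seconds a worker may stay silent before it is killed and restarted.
timeout = 60
```

This could be started with something like `gunicorn -c gunicorn.conf.py app:app`, where `app:app` is a hypothetical module and Flask object name.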

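So, for example, if you have uwsgi set up to run 5 processes, you would be able to handle 50 db connections at once (5 app processes x 10 pooled connections each).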
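
The arithmetic can be sketched as a small helper. This is only a back-of-the-envelope bound, assuming every process creates exactly one engine backed by SQLAlchemy's default `QueuePool`; the function name `max_db_connections` is made up for this example. The defaults used here (`pool_size=5`, `max_overflow=10`) are SQLAlchemy's own and match the numbers in the error message.

```python
import math


def max_db_connections(processes, pool_size=5, max_overflow=10):
    """Upper bound on simultaneous connections across all worker processes.

    Each process holds its own QueuePool, which allows up to
    pool_size + max_overflow connections. max_overflow=-1 means no limit.
    """
    if max_overflow < 0:
        return math.inf
    return processes * (pool_size + max_overflow)


# 5 processes, 10 pooled connections each, no overflow
print(max_db_connections(5, pool_size=10, max_overflow=0))   # 50
# 5 processes with SQLAlchemy's default pool settings
print(max_db_connections(5))                                 # 75
# the question's engine: max_overflow=-1 removes the cap entirely
print(max_db_connections(1, pool_size=10, max_overflow=-1))  # inf
```

Whatever bound comes out should stay comfortably below the connection limit configured on the Postgres server, which is 1000 in the question.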

TimeoutError: QueuePool limit of size 5 overflow 10 reached, connection timed out, timeout 30
