SQLAlchemy导致内存泄漏

时间:2020-10-23 17:19:17

标签: python flask memory-leaks sqlalchemy

应用启动内存

Partition of a set of 249162 objects. Total size = 28889880 bytes.
 Index  Count   %     Size   % Cumulative  % Referrers by Kind (class / dict of class)
     0  77463  31  5917583  20   5917583  20 types.CodeType
     1  30042  12  3774404  13   9691987  34 function
     2  51799  21  3070789  11  12762776  44 tuple
     3  15106   6  2061017   7  14823793  51 dict of type
     4   5040   2  1928939   7  16752732  58 function, tuple
     5   6627   3  1459448   5  18212180  63 type
     6   5227   2  1346136   5  19558316  68 dict of module
     7  16466   7  1026538   4  20584854  71 dict (no owner)
     8    734   0   685897   2  21270751  74 dict of module, tuple
     9    420   0   626760   2  21897511  76 function, module

100次后续调用(与SQLAlchemy交互)后的应用内存

Partition of a set of 628910 objects. Total size = 107982928 bytes.
 Index  Count   %     Size   % Cumulative  % Referrers by Kind (class / dict of class)
     0  23373   4 27673632  26  27673632  26 sqlalchemy.sql.schema.Column
     1 141175  22 20904408  19  48578040  45 dict of sqlalchemy.sql.schema.Column
     2  78401  12  5984371   6  54562411  51 types.CodeType
     3  34133   5  4239726   4  58802137  54 function
     4  64371  10  3661978   3  62464115  58 tuple
     5  20034   3  2971710   3  65435825  61 dict of sqlalchemy.sql.schema.Table
     6  13356   2  2297232   2  67733057  63 sqlalchemy.sql.base.ColumnCollection
     7  15924   3  2133374   2  69866431  65 dict of type
     8   5095   1  1946855   2  71813286  67 function, tuple
     9   8714   1  1793696   2  73606982  68 type

帮助功能,可通过rcs检测内存使用情况

def heap_results():
    from guppy import hpy
    hp = hpy()
    h = hp.heap()
    return Response(response=str(h.bytype),
                    status=200,
                    mimetype='application/json')

SQLAlchemy的实现非常简单。使用db.Model,我们将为ORM创建一个类,并将表拆分为ORM类的子函数。

我们正gc.collect()正在将最终答复返回给用户。我们还使用db.session.flush()db.session.expunge_all()db.session.close()

我们尝试删除db.session.*命令和gc.collect()。没什么改变。

这是我们的应用程序内存使用情况的时序图,正在重新启动应用程序时,您会看到内存上限重置为稳定状态: enter image description here

用于模拟HAProxy的代码。

def reconnect():
    hostnames = [Settings.SECRETS.get('PATRONI_HOST_C', ''), Settings.SECRETS.get('PATRONI_HOST_E', '')]

    try:
        master_node = HAProxy(hostnames=hostnames)

    except (ValueError, TypeError, BaseException) as e:
        # send an alert here though, use the informant!
        raise e

    else:
        if master_node in ['None', None]:
            raise ValueError("Failed to determined which server is acting as the master node")

    my_app.config['SQLALCHEMY_DATABASE_URI'] = "postgresql://{}:{}@{}/{}".format(Settings.SECRETS['PATRONI_USER'],
                                                                                 Settings.SECRETS['PATRONI_PASSWORD'],
                                                                                 master_node,
                                                                                 Settings.SECRETS['PATRONI_DB'])

    my_app.config['SQLALCHEMY_ENGINE_OPTIONS'] = {
    'pool_recycle': 1800
    }

    new_db = SQLAlchemy(my_app)
    new_db_orm = DBORM(new_db)
    return new_db_orm

DBORM(经过修改以隐藏全部功能)是什么样的:

class DBORM(object):
    def __init__(self, database):
        self.database = database
        self.owner_model = self.owner_model()

    def create_owner_model(self):
        db = self.database

        class OwnerModel(db.Model):
            __tablename__ = "owners"
            owner_id = db.Column(UUID(as_uuid=True), unique=True,
                                      nullable=False, primary_key=True)
  
            client_owner = db.Column(db.String(255), unique=False, nullable=False)
            admin_owner = db.Column(db.String(255), unique=False, nullable=False)

            @staticmethod
            def owner_validation(owner_model, owner=None):
                if owner is not None:
                    owner_model = OwnerModel.get_owner_by_id(owner_id=owner_id,
                                                                  return_as_model=True)
                    if owner_model is not None:
                        client_owner = owner_model.client_owner
                        admin_owner = owner_model.admin_owner

                        if client_owner is None and admin_owner is None:
                            return False

                        elif client_owner.lower() == owner.lower():
                            return True

                        elif admin_owner.lower() == owner.lower():
                            return True

                        else:
                            return False

                    else:
                        return None

                else:
                    return None

通过API使用OwnerModel的示例

@api.route('/owners/{owner_id}')
def my_function(owner_id):
   try:
      dborm = reconnect()

   except (AttributeError, KeyError, ValueError, BaseException) as e:
      logger.error(f'Unable to get an owner model.')
      logger.info(f'Garbage collector, collected: {gc.collect()}')
      return Response(response=Exception.database_reconnect_failure(),
                      status=503,
                      mimetype='application/json')

   else:
      response = dborm.get_owner_by_id(owner_id=owner_id)

   logger.info(f'Garbage collector, collected: {gc.collect()}')
   return Response(response=json.dumps(response),
                   status=200,
                   mimetype='application/json')

1 个答案:

答案 0 :(得分:0)

SQLAlchemy MetaData 保存对 Table 对象的引用,Declarative 基类还具有用于查找的内部注册表,例如用作 relationship() 延迟评估参数中的上下文。当您重复创建模型类的新版本时,这也会创建所需的元数据,例如 Table,如果保留引用,您可能会消耗越来越多的内存。在我看来,Column 对象支配着您的内存使用量这一事实支持这一点。

您的目标应该是在应用程序的生命周期内只创建一次模型及其元数据。您只需要能够动态更改连接参数。 1.3 以下的 SQLAlchemy 版本为此提供了 creatorEngine 参数,1.4 版引入了 DialectEvents.do_connect() 事件挂钩 for even finer control

使用 creator

import psycopg2

db = SQLAlchemy()
dborm = DBORM(db)


def initialize(app):
    """
    Setup `db` configuration and initialize the application. Call this once and
    once only, before your application starts using the database.

    The `creator` creates a new connection every time the connection pool
    requires one, due to all connections being in use, or old ones having been
    recycled, etc.
    """
    # Placeholder that lets the Engine know which dialect it will be speaking
    app.config['SQLALCHEMY_DATABASE_URI'] = "postgresql+psycopg2://"
    app.config['SQLALCHEMY_ENGINE_OPTIONS'] = {
        'pool_recycle': 1800,
        'creator': lambda: psycopg2.connect(
            dbname=Settings.SECRETS['PATRONI_DB'],
            user=Settings.SECRETS['PATRONI_USER'],
            password=Settings.SECRETS['PATRONI_PASSWORD'],
            host=HAProxy([
                Settings.SECRETS.get('PATRONI_HOST_C', ''),
                Settings.SECRETS.get('PATRONI_HOST_E', ''),
            ]))
    }

    db.init_app(app)


class OwnerModel(db.Model):
    __tablename__ = "owners"
    ...

请注意,您需要更改 DBORM 以使用全局 db 和模型类,并且您的控制器不再调用 reconnect()(不存在),而只需使用dbdborm 以及类。

相关问题