session.rollback deletes data previously queried from the database

Time: 2016-03-23 21:59:14

Tags: python transactions sqlalchemy rollback

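Question

Calling SQLAlchemy's session.rollback() after hitting a database integrity error (as SQLAlchemy requires) causes all session objects to be discarded. This includes objects created by earlier selects.

Looking at the generated SQL, our queried data is caught in an implicit transaction that does not finish until we call session.close.

Simplified SQL example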

Begin (implicit) transaction
SELECT Data
# Should be end transaction here
# Should Start new Transaction
INSERT New Data
# hit integrity error
rollback

Problem section

try:
    self.session.add(insertable['cluster'])
    self.session.commit()
except IntegrityError:
    print('Prevented from inserting duplicate to cluster table')
    self.session.rollback()
# insertable is now an empty object and can't be used without repopulating

This means that in the worst case, where we hit an integrity error on every insert, we need to re-select the row IDs for every insert.
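To make the effect of rollback() concrete, here is a minimal sketch against an in-memory SQLite database, assuming SQLAlchemy 1.4 or later. The Gather and Cluster classes are simplified stand-ins, not the real models module. As the SQLAlchemy session documentation describes it, rows loaded by an earlier query are expired (and reloaded on the next attribute access) rather than deleted, while pending objects that never reached the database are expunged from the session.

```python
from sqlalchemy import Column, Integer, String, create_engine, inspect
from sqlalchemy.exc import IntegrityError
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()


class Gather(Base):  # simplified stand-in for models.Gather
    __tablename__ = "gather"
    gather_id = Column(Integer, primary_key=True)
    path = Column(String)


class Cluster(Base):  # simplified stand-in for models.Cluster
    __tablename__ = "cluster"
    id = Column(Integer, primary_key=True)
    guid = Column(String, unique=True)


engine = create_engine("sqlite://")  # in-memory database, illustration only
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

# Seed one gather row and one cluster row.
session.add_all([Gather(gather_id=1, path="/mnt/logs"), Cluster(guid="abc")])
session.commit()

gather = session.query(Gather).filter_by(path="/mnt/logs").first()
duplicate = Cluster(guid="abc")  # violates the unique constraint on guid

try:
    session.add(duplicate)
    session.commit()
except IntegrityError:
    session.rollback()

# The previously loaded row is expired: its attributes are unloaded ...
path_unloaded_after_rollback = "path" in inspect(gather).unloaded
# ... but the object itself is still attached to the session.
gather_still_in_session = gather in session
# The pending object that failed to insert has been expunged.
duplicate_still_in_session = duplicate in session
# Touching an attribute emits a fresh SELECT and repopulates the object.
reloaded_path = gather.path
```

If this matches what is happening in the real code, the earlier selects are not lost for good, but every object pays one extra SELECT after each rollback.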

Calling session.commit after the select

def find_entry(self, inpath):
    inpath = inpath.rstrip('/')
    inpath = '/'.join(inpath.split('/')[:-1])
    entry = self.session.query(
        models.Gather).filter_by(path=inpath).first()
    self.session.commit()
    return entry

But this does not actually commit the session transaction, so when we roll back we still lose the data.
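One possible explanation, offered as a hedged sketch rather than a diagnosis: with the default expire_on_commit=True, commit() does end the transaction, but it also expires every loaded object, so the next attribute access opens a new implicit transaction and reloads the row. The example below assumes SQLAlchemy 1.4 or later, uses a simplified stand-in Gather model, and compares the default session with one created with expire_on_commit=False. The helper name path_unloaded_after_commit is hypothetical.

```python
from sqlalchemy import Column, Integer, String, create_engine, inspect
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()


class Gather(Base):  # simplified stand-in for models.Gather
    __tablename__ = "gather"
    gather_id = Column(Integer, primary_key=True)
    path = Column(String)


engine = create_engine("sqlite://")  # in-memory database, illustration only
Base.metadata.create_all(engine)

DefaultSession = sessionmaker(bind=engine)  # expire_on_commit=True
KeepSession = sessionmaker(bind=engine, expire_on_commit=False)

# Seed one row to query against.
seed = DefaultSession()
seed.add(Gather(gather_id=1, path="/mnt/logs"))
seed.commit()
seed.close()


def path_unloaded_after_commit(session_factory):
    """Load a row, commit, and report whether its 'path' was expired."""
    session = session_factory()
    entry = session.query(Gather).filter_by(path="/mnt/logs").first()
    session.commit()
    unloaded = "path" in inspect(entry).unloaded
    session.close()
    return unloaded


expired_by_default = path_unloaded_after_commit(DefaultSession)
expired_when_disabled = path_unloaded_after_commit(KeepSession)
```

Note that expire_on_commit=False only changes what commit() does. A later rollback() still expires loaded objects.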

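Reasoning and other effects

For performance reasons, we have a set of code that selects data from the database, fetching row IDs from two tables in that DB, and then uses the selected row IDs as foreign keys on insert.

This is mainly so that we don't have to run a query for every insert.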
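If only the row IDs are needed as foreign keys, one option is to copy them into plain Python values up front, since a dict of integers is not tied to session state. The following is a minimal sketch under that assumption, using SQLAlchemy 1.4 or later and simplified stand-in models. The gather_ids name and the Cluster.gather_id column layout are illustrative, not taken from the real schema.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.exc import IntegrityError
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()


class Gather(Base):  # simplified stand-in for models.Gather
    __tablename__ = "gather"
    gather_id = Column(Integer, primary_key=True)
    path = Column(String)


class Cluster(Base):  # simplified stand-in for models.Cluster
    __tablename__ = "cluster"
    id = Column(Integer, primary_key=True)
    guid = Column(String, unique=True)
    gather_id = Column(Integer)


engine = create_engine("sqlite://")  # in-memory database, illustration only
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

session.add_all([
    Gather(gather_id=1, path="/mnt/logs/a"),
    Gather(gather_id=2, path="/mnt/logs/b"),
    Cluster(guid="abc"),
])
session.commit()

# Query only the two columns and keep them as plain values.
gather_ids = {
    path: gather_id
    for path, gather_id in session.query(Gather.path, Gather.gather_id)
}

# A failed insert followed by rollback() does not touch the dict.
try:
    session.add(Cluster(guid="abc"))  # duplicate guid
    session.commit()
except IntegrityError:
    session.rollback()

# The cached ID is still usable as a foreign key value.
session.add(Cluster(guid="new", gather_id=gather_ids["/mnt/logs/a"]))
session.commit()

stored_gather_id = (
    session.query(Cluster.gather_id).filter(Cluster.guid == "new").scalar()
)
```

The trade-off is that the cached IDs can go stale if another process changes the gather table in the meantime.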

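The problem is that if we hit a constraint or integrity error on insert, we have to call session.rollback. We have found that this session.rollback kills our earlier queries, even though they should logically be in separate transactions.

Besides the data we selected, if we have successfully inserted objects whose IDs we want to reference, those are also dropped after session.rollback.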
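One way to keep a failed insert from affecting earlier work is to wrap each insert in a SAVEPOINT with session.begin_nested(), so that only the savepoint is rolled back. This is a sketch under assumptions, not a tested fix for the code in this question: it assumes SQLAlchemy 1.4 or later and a backend with SAVEPOINT support, the models are simplified stand-ins, and insert_ignoring_duplicates is a hypothetical helper. The two event listeners are the workaround the SQLAlchemy SQLite dialect documentation gives for the sqlite3 driver and are only needed for this in-memory demo.

```python
from sqlalchemy import Column, Integer, String, create_engine, event, inspect
from sqlalchemy.exc import IntegrityError
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()


class Gather(Base):  # simplified stand-in for models.Gather
    __tablename__ = "gather"
    gather_id = Column(Integer, primary_key=True)
    path = Column(String)


class Cluster(Base):  # simplified stand-in for models.Cluster
    __tablename__ = "cluster"
    id = Column(Integer, primary_key=True)
    guid = Column(String, unique=True)


engine = create_engine("sqlite://")  # in-memory database, illustration only


# SQLite-only workaround so that SAVEPOINT works with the sqlite3 driver.
@event.listens_for(engine, "connect")
def _disable_driver_begin(dbapi_connection, connection_record):
    dbapi_connection.isolation_level = None  # stop the driver emitting BEGIN


@event.listens_for(engine, "begin")
def _emit_begin(conn):
    conn.exec_driver_sql("BEGIN")  # emit BEGIN ourselves instead


Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()
session.add_all([Gather(gather_id=1, path="/mnt/logs"), Cluster(guid="abc")])
session.commit()


def insert_ignoring_duplicates(session, obj):
    """Try to INSERT obj inside a SAVEPOINT; return True on success."""
    try:
        with session.begin_nested():
            session.add(obj)
            session.flush()  # emit the INSERT while the savepoint is open
        return True
    except IntegrityError:
        # Only the savepoint was rolled back; the outer transaction lives on.
        return False


gather = session.query(Gather).filter_by(path="/mnt/logs").first()
inserted_new = insert_ignoring_duplicates(session, Cluster(guid="new"))
inserted_dup = insert_ignoring_duplicates(session, Cluster(guid="abc"))

# The row loaded before the failed insert has not been expired.
path_still_loaded = "path" not in inspect(gather).unloaded

session.commit()  # commits the outer transaction, including the good insert
guids = sorted(guid for (guid,) in session.query(Cluster.guid))
```

Each insert costs an extra SAVEPOINT round trip, so whether this beats re-selecting after a full rollback depends on how often duplicates occur.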

Full code

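from sqlalchemy.exc import IntegrityError
from sqlalchemy.orm import sessionmaker

# `models` is the project's own module of mapped classes (not shown here).

class DBInserter: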

    def __init__(self):
        # connection info here
        Session = sessionmaker(bind=self.engine)
        self.session = Session()

    def __del__(self):
        self.session.close()
        self.engine.dispose()

    def find_entry(self, inpath):
        inpath = inpath.rstrip('/')
        inpath = '/'.join(inpath.split('/')[:-1])
        entry = self.session.query(
            models.Gather).filter_by(path=inpath).first()
        return entry

    def build_insertable(self, jsonin, gather, lnn):
        """
        Build sqlalchemy object that is ready to be inserted into the db
        """
        object_dict = {}

        cluster = models.Cluster(
            guid = jsonin.get('guid'),
            )

        gather_id = None
        if gather:
            gather_id = gather.gather_id
            cluster.gather_id = gather_id
            gather.cluster_guid = jsonin.get('guid')
            gather.cluster_name = jsonin.get('name')
            object_dict['gather'] = gather

        node_gather = models.NodeGather(
            gather_id = gather_id,
            lnn = lnn,
            checksum = jsonin.get('checksum'),
            checksum_valid = jsonin.get('checksum_valid'),
            compliance = jsonin.get('compliance'),
            encoding = jsonin.get('encoding'),
            joinmode = jsonin.get('joinmode'),
            master = jsonin.get('master'),
            maxid = jsonin.get('maxid'),
            timezone = jsonin.get('timezone'),
            )

        object_dict['cluster'] = cluster
        object_dict['node_gather'] = node_gather
        return object_dict

    def insert(self, insertable):
        """
        insert prepared sqlalchemy object into the db
        """    
        # Not doing batch inserts until we get single case to work properly.
        try:
            self.session.add(insertable['cluster'])
            self.session.commit()
        except IntegrityError:
            print('Prevented from inserting duplicate to cluster table')
            self.session.rollback()

        if insertable.get('gather'):
            try:
                self.session.add(insertable.get('gather'))
                self.session.commit()
            except IntegrityError:
                print('Prevented from inserting duplicate to gather table')
                self.session.rollback()

    def __call__(self, jsonin, path):
        path = path.rstrip('/')
        lnn = int(path.split('/')[-1].split('-')[-1])
        out = self.build_insertable(jsonin, self.find_entry(path), lnn)
        return out

SQL

2016-03-08 01:38:52,438 INFO sqlalchemy.engine.base.Engine SELECT gather.gather_id AS gather_gather_id, gather.cluster_guid AS gather_cluster_guid, gather.path AS gather_path, gather
.cluster_name AS gather_cluster_name, gather.gather_date AS gather_gather_date, gather.unfurl_start AS gather_unfurl_start, gather.unfurl_end AS gather_unfurl_end, gather.upload_date
AS gather_upload_date, gather.source_lnn AS gather_source_lnn, gather.last_full AS gather_last_full, gather.path_exists AS gather_path_exists, gather.type AS gather_type 
FROM gather 
WHERE gather.path = %(path_1)s 
LIMIT %(param_1)s
2016-03-08 01:38:52,438 INFO sqlalchemy.engine.base.Engine {'param_1': 1, 'path_1': '/mnt/logs/REALPAGE/2015-12-14-005'}
2016-03-08 01:38:52,440 INFO sqlalchemy.engine.base.Engine COMMIT
2016-03-08 01:38:52,441 INFO sqlalchemy.engine.base.Engine BEGIN (implicit)
2016-03-08 01:38:52,441 INFO sqlalchemy.engine.base.Engine SELECT gather.gather_id AS gather_gather_id, gather.cluster_guid AS gather_cluster_guid, gather.path AS gather_path, gather
.cluster_name AS gather_cluster_name, gather.gather_date AS gather_gather_date, gather.unfurl_start AS gather_unfurl_start, gather.unfurl_end AS gather_unfurl_end, gather.upload_date
AS gather_upload_date, gather.source_lnn AS gather_source_lnn, gather.last_full AS gather_last_full, gather.path_exists AS gather_path_exists, gather.type AS gather_type 
FROM gather 
WHERE gather.gather_id = %(param_1)s
2016-03-08 01:38:52,441 INFO sqlalchemy.engine.base.Engine {'param_1': 'd284c3f7983f94bac95e024038820f05475feddb2f24aec2cb52d42c343194dd'}
2016-03-08 01:38:52,444 INFO sqlalchemy.engine.base.Engine INSERT INTO cluster (guid, site_id) VALUES (%(guid)s, %(site_id)s)
2016-03-08 01:38:52,444 INFO sqlalchemy.engine.base.Engine {'guid': '00074309e06ace7817523b06c7cbf76f7c08', 'site_id': None}
2016-03-08 01:38:52,445 INFO sqlalchemy.engine.base.Engine ROLLBACK

Note that there is no commit between the select and the rollback.

HALP.

0 answers:

No answers