Question

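The App Engine datastore, of course, has downtime. However, I'd like to have a "fail-safe" put which is more robust in the face of datastore errors (see the motivation below). The task queue seems like the obvious place to defer writes when the datastore is unavailable. I don't know of any other solutions (other than shipping the data off to a third party via urlfetch).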

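Motivation: I have an entity which really needs to be put in the datastore; simply showing an error message to the user won't do. For example, some side effect may already have taken place which can't easily be undone (perhaps an interaction with a third-party site).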

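I've come up with a simple wrapper which (I think) provides a reasonable "fail-safe" put (see below). Do you see any problems with it, or have an idea for a more robust implementation? (Note: thanks to the suggestions posted in the answers by Nick Johnson and Saxon Druce, this post has been edited with some improvements to the code.)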

import logging
from google.appengine.api.labs.taskqueue import taskqueue
from google.appengine.datastore import entity_pb
from google.appengine.ext import db
from google.appengine.runtime.apiproxy_errors import CapabilityDisabledError

def put_failsafe(e, db_put_deadline=20, retry_countdown=60, queue_name='default'):
    """Tries to e.put().  On success, 1 is returned.  If this raises a db.Error
    or CapabilityDisabledError, then a task will be enqueued to try to put the
    entity (the task will execute after retry_countdown seconds) and 2 will be
    returned.  If the task cannot be enqueued, then 0 will be returned.  Thus a
    falsey value is only returned on complete failure.

    Note that since the taskqueue payloads are limited to 10kB, if the protobuf
    representing e is larger than 10kB then the put will be unable to be
    deferred to the taskqueue.

    If a put is deferred to the taskqueue, then it won't necessarily be
    completed as soon as the datastore is back up.  Thus it is possible that
    a deferred e.put() will occur *after* other, later puts (including ones
    for which 1 was returned).

    Ensure e's model is imported in the code which defines the task which tries
    to re-put e (so that e can be deserialized).
    """
    try:
        e.put(rpc=db.create_rpc(deadline=db_put_deadline))
        return 1
    except (db.Error, CapabilityDisabledError), ex1:
        try:
            taskqueue.add(queue_name=queue_name,
                          countdown=retry_countdown,
                          url='/task/retry_put',
                          payload=db.model_to_protobuf(e).Encode())
            logging.info('failed to put to db now, but deferred put to the taskqueue e=%s ex=%s' % (e, ex1))
            return 2
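        except (taskqueue.Error, CapabilityDisabledError), ex2:
            # Both the direct put and the deferral failed; record why.
            logging.error('failed to put to db and failed to defer the put e=%s ex1=%s ex2=%s' % (e, ex1, ex2))
            return 0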

Request handler for the task:

from google.appengine.datastore import entity_pb  # needed for EntityProto below
from google.appengine.ext import db, webapp

# IMPORTANT: This task deserializes entity protobufs.  To ensure that this is
#            successful, you must import any db.Model that may need to be
#            deserialized here (otherwise this task may raise a KindError).

class RetryPut(webapp.RequestHandler):
    def post(self):
        e = db.model_from_protobuf(entity_pb.EntityProto(self.request.body))
        e.put() # failure will raise an exception => the task to be retried

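It's tempting to use this every time, but I think it might sometimes be more confusing to the user if I tell them that their changes will appear later (and keep showing them the old data until the datastore is back up and the deferred puts execute).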
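
To make the trade-off above concrete, here is a minimal sketch of how a caller might turn the three return values of put_failsafe (1, 2, 0) into a message for the user. The constant names, the function name and the message wording are all hypothetical, not part of the code above.

```python
# Return values documented in put_failsafe's docstring.
# The constant names are hypothetical; only the numbers come from the post.
PUT_FAILED, PUT_OK, PUT_DEFERRED = 0, 1, 2


def message_for_put_status(status):
    """Map a put_failsafe return value to a user-facing message (illustrative)."""
    if status == PUT_OK:
        return 'Your changes have been saved.'
    if status == PUT_DEFERRED:
        # The write is sitting on the task queue, so old data may still show.
        return ('The datastore is temporarily unavailable. Your changes '
                'will appear later; until then you may see the old data.')
    # Neither the put nor the deferral succeeded.
    return 'Your changes could not be saved. Please try again.'
```

A handler could then call something like message_for_put_status(put_failsafe(e)) and decide per use case whether the "deferred" message is acceptable or more confusing than a plain error.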

Answer 1

Your approach is reasonable, but has several caveats:

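By default, a put operation will retry until it runs out of time. Since you have a backup strategy, you may want to give up sooner, in which case you should supply an rpc parameter to the put method call, specifying a custom deadline.
There's no need to set an explicit countdown: the task queue will retry failing operations for you at increasing intervals.
You don't need to use pickle: Protocol Buffers have a natural string encoding which is much more efficient. See this post for a demonstration of how to use it.
As Saxon points out, task queue payloads are limited to 10 kilobytes, so you may have trouble with large entities.
Most importantly, this changes the datastore consistency model from "strongly consistent" to "eventually consistent". That is, the put you enqueued to the task queue could be applied at any time in the future, overwriting any changes that were made in the interim. Any number of race conditions are possible, which essentially renders transactions useless whenever there are puts pending on the task queue.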
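
One possible way to limit the damage from the last caveat (my own illustration, not something proposed in the question) is to record a version number when the put is deferred and to skip the deferred write if the entity has changed since. The sketch below uses a hypothetical in-memory store instead of the real datastore, and every name in it is made up. On App Engine the check and the write would presumably have to happen inside a transaction to be safe.

```python
class InMemoryStore(object):
    """Hypothetical stand-in for the datastore: key -> (version, value)."""

    def __init__(self):
        self._rows = {}

    def get_version(self, key):
        # Version 0 means "never written".
        return self._rows.get(key, (0, None))[0]

    def get_value(self, key):
        return self._rows.get(key, (0, None))[1]

    def put(self, key, value):
        # Every successful write bumps the version by one.
        version = self.get_version(key) + 1
        self._rows[key] = (version, value)
        return version


def apply_deferred_put(store, key, value, version_when_enqueued):
    """Apply a deferred write only if nothing newer was written meanwhile.

    Returns True if the write was applied, False if it was skipped as stale.
    """
    if store.get_version(key) != version_when_enqueued:
        return False
    store.put(key, value)
    return True
```

With this guard a stale task is dropped instead of overwriting newer data. Whether silently dropping the write is acceptable depends on the application; merging or alerting may be better in some cases.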

Answer 2

One potential issue is that tasks are limited to 10kb of data, so this won't work if your entity is larger than that once pickled.
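
A minimal sketch of guarding against this before enqueueing: check the size of the serialized payload first. The 10 kB figure is the limit quoted in this thread, and reading it as 10 * 1024 bytes is an assumption. The constant and function names are hypothetical.

```python
# Limit quoted in this thread; treating "10kb" as 10 * 1024 bytes is an assumption.
MAX_TASK_PAYLOAD_BYTES = 10 * 1024


def fits_in_task_payload(payload, limit=MAX_TASK_PAYLOAD_BYTES):
    """Return True if the serialized entity is small enough to be deferred."""
    return len(payload) <= limit
```

The wrapper could call this on the encoded entity and return 0 straight away (or fall back to something else) when the payload is too large, rather than relying on the enqueue call to fail.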

Fail-safe datastore updates on App Engine
