Python in-memory cache with time to live

Asked: 2015-08-02 11:12:59

Tags: python python-2.7 caching

I have multiple threads running the same process that need to be able to notify each other that something should not be worked on for the next n seconds. It's not the end of the world if they do, though.

My goal is to be able to pass a string and a TTL to the cache, and to be able to fetch all the strings that are in the cache as a list. The cache can live in memory, and the TTL will never be more than 20 seconds.

Does anyone have any suggestions for how this can be accomplished?

9 Answers:

Answer 0 (score: 18)

You can use the expiringdict module:

The core of the library is the ExpiringDict class, which is an ordered dictionary with auto-expiring values for caching purposes.

The description doesn't mention multithreading, so to be on the safe side, use a Lock.
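To see the shape of that without installing anything, here is a minimal stdlib-only sketch of a lock-guarded TTL store matching the question's requirements (this is not ExpiringDict itself; the class and method names are made up for illustration):

```python
import threading
import time


class TTLStringCache:
    """A tiny thread-safe store of strings with per-entry expiry (illustrative sketch)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._expiry = {}  # string -> absolute expiry timestamp

    def add(self, string, ttl=20):
        # Guard every access with a lock, as suggested above.
        with self._lock:
            self._expiry[string] = time.time() + ttl

    def values(self):
        """Return all non-expired strings as a list, pruning stale ones."""
        now = time.time()
        with self._lock:
            self._expiry = {s: t for s, t in self._expiry.items() if t > now}
            return list(self._expiry)


cache = TTLStringCache()
cache.add("job-42", ttl=0.2)
print(cache.values())   # ['job-42']
time.sleep(0.3)
print(cache.values())   # []
```

ExpiringDict wrapped in a Lock gives you the same behavior without maintaining this code yourself.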

Answer 1 (score: 8)

The OP is using Python 2.7, but if you're on Python 3, the ExpiringDict mentioned in the accepted answer is currently out of date. The last commit to the GitHub repo was June 17, 2017, and it doesn't work with Python 3.5.

There is an open issue about it.

There is also a more recently maintained project, cachetools (latest commit June 14, 2018):

pip install cachetools

from cachetools import TTLCache

cache = TTLCache(maxsize=10, ttl=360)
cache['apple'] = 'top dog'
...
>>> cache['apple']
'top dog'
... after 360 seconds...
>>> cache['apple']
KeyError exception thrown

ttl is the time to live, in seconds.

Answer 2 (score: 4)

An expiring in-memory cache for general purposes is commonly implemented not through a dictionary but through a function or method decorator, with the cache dictionary managed behind the scenes. As such, this answer somewhat complements the answer by User, which uses a dictionary rather than a decorator.

The ttl_cache decorator in cachetools==3.1.0 works a lot like functools.lru_cache, but with a time to live:

import cachetools.func

@cachetools.func.ttl_cache(maxsize=128, ttl=10 * 60)
def example_function(key):
    return get_expensively_computed_value(key)


class ExampleClass:
    EXP = 2

    @classmethod
    @cachetools.func.ttl_cache()
    def example_classmethod(cls, i):
        return i * cls.EXP

    @staticmethod
    @cachetools.func.ttl_cache()
    def example_staticmethod(i):
        return i * 3

Answer 3 (score: 3)

I absolutely love @iutinvg's idea; I just wanted to take it a little further: decouple it from having to know to pass the ttl, and make it a decorator so you never have to think about it. If you have django and py3, and don't feel like pip installing any dependencies, try this out.

import time
from django.utils.functional import lazy
from functools import lru_cache, partial, update_wrapper


def lru_cache_time(seconds, maxsize=None):
    """
    Adds time aware caching to lru_cache
    """
    def wrapper(func):
        # Lazy function that makes sure the lru_cache() invalidate after X secs
        ttl_hash = lazy(lambda: round(time.time() / seconds), int)()

        @lru_cache(maxsize)
        def time_aware(__ttl, *args, **kwargs):
            """
            Main wrapper; note that the first argument, ttl, is not passed down.
            This is because no function should need to know that it is here.
            """
            def wrapping(*args, **kwargs):
                return func(*args, **kwargs)
            return wrapping(*args, **kwargs)
        return update_wrapper(partial(time_aware, ttl_hash), func)
    return wrapper


@lru_cache_time(seconds=10)
def meaning_of_life():
    """
    This message should show up if you call help().
    """
    print('this better only show up once!')
    return 42


@lru_cache_time(seconds=10)
def multiply(a, b):
    """
    This message should show up if you call help().
    """
    print('this better only show up once!')
    return a * b


# This is a test; it prints a `.` every second. There should be 10s
# between each "this better only show up once!" (times two, because of the two functions).
for _ in range(20):
    meaning_of_life()
    multiply(50, 99991)
    print('.')
    time.sleep(1)

Answer 4 (score: 2)

Something like this?

from time import time, sleep
import itertools
from threading import Thread, RLock
import signal


class CacheEntry():
  def __init__(self, string, ttl=20):
    self.string = string
    self.expires_at = time() + ttl
    self._expired = False

  def expired(self):
    if self._expired is False:
      return (self.expires_at < time())
    else:
      return self._expired

class CacheList():
  def __init__(self):
    self.entries = []
    self.lock = RLock()

  def add_entry(self, string, ttl=20):
    with self.lock:
        self.entries.append(CacheEntry(string, ttl))

  def read_entries(self):
    with self.lock:
        self.entries = list(itertools.dropwhile(lambda x:x.expired(), self.entries))
        return self.entries

def read_entries(name, slp, cachelist):
  while True:
    print("{}: {}".format(name, ",".join(map(lambda x: x.string, cachelist.read_entries()))))
    sleep(slp)

def add_entries(name, ttl, cachelist):
  s = 'A'
  while True:
    cachelist.add_entry(s, ttl)
    print("Added ({}): {}".format(name, s))
    sleep(1)
    s += 'A'



if __name__ == "__main__":
  signal.signal(signal.SIGINT, signal.SIG_DFL)

  cl = CacheList()
  print_threads = []
  print_threads.append(Thread(None, read_entries, args=('t1', 1, cl)))
  # print_threads.append(Thread(None, read_entries, args=('t2', 2, cl)))
  # print_threads.append(Thread(None, read_entries, args=('t3', 3, cl)))

  adder_thread = Thread(None, add_entries, args=('a1', 2, cl))
  adder_thread.start()

  for t in print_threads:
    t.start()

  for t in print_threads:
    t.join()

  adder_thread.join()

Answer 5 (score: 2)

If you don't want to use any third-party libraries, you can add one more parameter to your expensive function: ttl_hash=None. This new parameter is a so-called "time sensitive hash", and its only purpose is to affect lru_cache.

例如:

from functools import lru_cache
import time


@lru_cache()
def my_expensive_function(a, b, ttl_hash=None):
    return a + b  # horrible CPU load...

def get_ttl_hash(seconds=3600):
    """Return the same value within `seconds` time period"""
    return round(time.time() / seconds)

# somewhere in your code...
res = my_expensive_function(2, 2, ttl_hash=get_ttl_hash())
# cache will be updated once in an hour
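To see the effect without waiting an hour, you can drive ttl_hash by hand and count real executions (the calls counter below is only for demonstration, not part of the technique):

```python
from functools import lru_cache

calls = {"n": 0}


@lru_cache()
def expensive(a, b, ttl_hash=None):
    calls["n"] += 1  # count real executions
    return a + b


expensive(2, 2, ttl_hash=1)  # computed
expensive(2, 2, ttl_hash=1)  # same hash -> served from the cache
expensive(2, 2, ttl_hash=2)  # hash changed -> recomputed
print(calls["n"])  # 2
```

With get_ttl_hash(), the hash only changes once per period, so repeated calls inside that period hit the cache.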

Answer 6 (score: 2)

I know this is a little old, but for those interested in having no third-party dependencies, this is a minor wrapper around the builtin functools.lru_cache (I noticed Javier's similar answer after writing this, but figured I'd post it anyway since this doesn't require Django):

import functools
import time


def time_cache(max_age, maxsize=128, typed=False):
    """Least-recently-used cache decorator with time-based cache invalidation.

    Args:
        max_age: Time to live for cached results (in seconds).
        maxsize: Maximum cache size (see `functools.lru_cache`).
        typed: Cache on distinct input types (see `functools.lru_cache`).
    """
    def _decorator(fn):
        @functools.lru_cache(maxsize=maxsize, typed=typed)
        def _new(*args, __time_salt, **kwargs):
            return fn(*args, **kwargs)

        @functools.wraps(fn)
        def _wrapped(*args, **kwargs):
            return _new(*args, **kwargs, __time_salt=int(time.time() / max_age))

        return _wrapped

    return _decorator

And its usage:

@time_cache(10)
def expensive(a: int):
    """An expensive function."""
    time.sleep(1 + a)


print("Starting...")
expensive(1)
print("Again...")
expensive(1)
print("Done")

NB: this uses time.time and comes with all of its caveats. You may want to use time.monotonic instead, if it's available/appropriate.

Answer 7 (score: 1)

If you want to avoid third-party packages, you can add a custom timed_lru_cache decorator, built on top of the lru_cache decorator.

The one below defaults to a lifetime of 20 seconds and a max size of 128. Note that the entire cache expires after 20 seconds, not individual items.

from datetime import datetime, timedelta
from functools import lru_cache, wraps


def timed_lru_cache(seconds: int = 20, maxsize: int = 128):
    def wrapper_cache(func):
        func = lru_cache(maxsize=maxsize)(func)
        func.lifetime = timedelta(seconds=seconds)
        func.expiration = datetime.utcnow() + func.lifetime

        @wraps(func)
        def wrapped_func(*args, **kwargs):
            if datetime.utcnow() >= func.expiration:
                func.cache_clear()
                func.expiration = datetime.utcnow() + func.lifetime

            return func(*args, **kwargs)

        return wrapped_func

    return wrapper_cache

Then, just add @timed_lru_cache() above your function and you're good to go (note the parentheses: since the decorator takes optional parameters, it must be called):

@timed_lru_cache()
def my_function():
  # code goes here...

Answer 8 (score: 0)

You can also use dictttl, which has MutableMapping, OrderedDict, and defaultDict(list) variants.

Initialize an ordinary dictionary with a TTL of 30 seconds per key:

data = {'a': 1, 'b': 2}
dict_ttl = DictTTL(30, data)

OrderedDict

data = {'a': 1, 'b': 2}
dict_ttl = OrderedDictTTL(30, data)

defaultDict(list)

dict_ttl = DefaultDictTTL(30)
data = {'a': [10, 20], 'b': [1, 2]}
[dict_ttl.append_values(k, v) for k, v in data.items()]