Stores the results of expensive function calls and returns the cached result when the same inputs occur again

rtmigo/filememo_py

File-based memoization decorator. Caches the results of expensive function calls. Retains the cached results between program restarts.

CI tests are done in Python 3.8, 3.9 and 3.10 on macOS, Ubuntu and Windows.


The function can be expensive because it is slow, or uses a lot of system resources, or literally makes a request to a paid API.

The memoize decorator returns the cached result when the same function is called with the same arguments. Thus, the function is expensive only the first time and inexpensive thereafter.

For example, the simplest cache for downloaded data can be set up like this:

import requests
from filememo import memoize

@memoize
def downloaded(url):
    return requests.get(url)

downloaded("http://example.net/aaa")  # downloads data
downloaded("http://example.net/bbb")  # downloads data
downloaded("http://example.net/aaa")  # gets data from cache   

Data is saved to the file system using pickledir. Even after the program restarts, the cached results remain in place.

# gets data from cache after restart
downloaded("http://example.net/aaa")     
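To make the idea concrete, here is a minimal sketch of file-based memoization using only the standard library. It is not how filememo or pickledir are implemented; the names file_memoize_sketch and slow_square are hypothetical, and the one-pickle-file-per-call layout is an assumption chosen for brevity.

```python
import functools
import hashlib
import pickle
import tempfile
from pathlib import Path


def file_memoize_sketch(cache_dir):
    """Hypothetical decorator: one pickle file per (function, arguments) pair."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Derive a file name from the function name and its arguments.
            key = pickle.dumps((func.__qualname__, args, kwargs))
            path = cache_dir / (hashlib.sha256(key).hexdigest() + ".pkl")
            if path.exists():
                # Cache hit: read the stored result instead of recomputing.
                return pickle.loads(path.read_bytes())
            result = func(*args, **kwargs)
            path.write_bytes(pickle.dumps(result))
            return result
        return wrapper
    return decorator


calls = []  # records every real computation


@file_memoize_sketch(tempfile.mkdtemp())
def slow_square(x):
    calls.append(x)
    return x * x


first = slow_square(4)   # computed and written to disk
second = slow_square(4)  # read back from the file
```

Because the results live in files rather than in process memory, a later run pointed at the same directory would find them again.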

Install

$ pip3 install filememo

Use

from filememo import memoize

@memoize
def long_running_function(a, b, c):
    return compute()

# the following line actually computes the value only
# when the program runs for the first time. On subsequent 
# runs, the value is read from the file
x = long_running_function(1, 2, 3)

Function arguments

The results depend on both the function and its arguments. All results are cached separately.

@memoize
def that_function(a, b, c):
    return compute(a, b, c)

@memoize
def other_function(a, b):
    return compute(a, b)

# the following calls will cache three different values 
y1 = that_function(1, 2, 3)  
y2 = that_function(30, 20, 40)
y3 = other_function(1, 2)

# how the arguments are passed (positionally or by keyword) also matters,
# as does their order. Therefore, the following calls are cached as three
# different ones
y4 = other_function(1, b=2)
y5 = other_function(a=1, b=2)
y6 = other_function(b=2, a=1)
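The following sketch illustrates why such calls can end up as separate cache entries. The key format is an assumption made for illustration (the helper call_key_sketch is hypothetical), not the key that filememo actually builds.

```python
def call_key_sketch(func_name, args, kwargs):
    # Hypothetical key: positional values, then keyword pairs in the order given.
    return (func_name, tuple(args), tuple(kwargs.items()))


k4 = call_key_sketch("other_function", (1,), {"b": 2})
k5 = call_key_sketch("other_function", (), {"a": 1, "b": 2})
k6 = call_key_sketch("other_function", (), {"b": 2, "a": 1})

# Count how many distinct keys the three call styles produce.
distinct = len({k4, k5, k6})
```

If a single cache entry is wanted, the simplest approach is to call the function the same way everywhere.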

Cache directory

If dir_path is not specified, the cached data is stored in the directory returned by tempfile.gettempdir(). However, a cache stored there may well not survive a reboot. It is also possible that the system has no usable temporary directory, in which case the current working directory is used instead.

To better control the situation, you can set a specific directory for storing caches.

@memoize(dir_path='/var/tmp/myfuncs')
def function(a, b):
    return a+b
    
# it's ok if different functions share the same directory    
@memoize(dir_path='/var/tmp/myfuncs')
def other_func():
    return compute()
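To see where the default cache would land on a given machine, and to build a location that is more likely to persist, something like the following can help. The ~/.cache/myfuncs path is only a hypothetical choice, not a directory that filememo uses by itself.

```python
import os
import tempfile

# The system temporary directory; Python falls back to the current
# working directory when no usable temporary directory is found.
default_dir = tempfile.gettempdir()

# A per-user location that normally survives reboots (hypothetical choice).
persistent_dir = os.path.join(os.path.expanduser("~"), ".cache", "myfuncs")

print(default_dir)
print(persistent_dir)
```

A path like persistent_dir could then be passed as dir_path.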

Expiration date

The max_age argument sets two conditions at once:

  • if the result is not yet in the cache, it is added now and will live in the cache no longer than max_age. After that it will be automatically deleted
  • if the result is already in the cache, it is used only if its age is less than max_age. Otherwise, the function is run again, and the result is replaced with a new one

import datetime

@memoize(max_age=datetime.timedelta(minutes=5))
def function(a, b):
    return compute()
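The age rule can be sketched as a plain comparison. The helper is_usable is hypothetical and only mirrors the condition described above; it is not taken from the library.

```python
import datetime


def is_usable(created, now, max_age):
    """Hypothetical check: a cached entry is usable only while younger than max_age."""
    return (now - created) < max_age


max_age = datetime.timedelta(minutes=5)
created = datetime.datetime(2024, 1, 1, 12, 0, 0)

# Four minutes after creation the entry is still usable, after six it is not.
fresh = is_usable(created, created + datetime.timedelta(minutes=4), max_age)
stale = is_usable(created, created + datetime.timedelta(minutes=6), max_age)
```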

Data version

When you specify version, all results with different versions are considered outdated.

Say you have the following function:

@memoize(version=1)
def function(a, b):
    return a + b

You changed your mind, and now the function should return the product of numbers instead of the sum. But the cache already contains the previous results with the sums. In this case, you can just change version. Previous results will not be returned.

@memoize(version=2)
def function(a, b):
    return a * b

Note that every version other than the current one is treated as outdated, regardless of whether its value is greater or smaller. If you used version=10 and then switch to version=9, then 9 is considered current, and 10 is obsolete.
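In other words, the comparison is an equality test rather than an ordering. A minimal sketch, with the hypothetical helper is_current:

```python
def is_current(stored_version, current_version):
    # Only an exact match counts; "newer" or "older" makes no difference.
    return stored_version == current_version


checks = [
    is_current(10, 9),  # stored with version=10, code now says version=9
    is_current(9, 9),   # stored with the current version
    is_current(1, 2),   # stored with an earlier version
]
```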

Exceptions

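If the decorated function raises an exception, the error is considered permanent. The exception is stored in the cache and will be raised again on every subsequent call with the same arguments. The caller receives a FunctionException, and the original error is available as its inner attribute.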

from filememo import memoize, FunctionException

@memoize
def divide(a, b):
    return a / b

try:
    # trying to run the function for the first time
    divide(1, 0)
except FunctionException as e:
    print(f"Error: {e.inner}")      

try:
    # not actually running again, getting error from cache
    divide(1, 0)
except FunctionException as e:
    print(f"Cached error: {e.inner}")      
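The behaviour can be imitated with a small in-memory sketch. WrappedErrorSketch and memoize_errors_sketch are hypothetical stand-ins written for this illustration; they are not the library's classes and keep nothing on disk.

```python
class WrappedErrorSketch(Exception):
    """Hypothetical wrapper that carries the original error as .inner."""

    def __init__(self, inner):
        super().__init__(str(inner))
        self.inner = inner


def memoize_errors_sketch(func):
    cache = {}  # in-memory only, to keep the sketch short

    def wrapper(*args):
        if args not in cache:
            try:
                cache[args] = ("ok", func(*args))
            except Exception as exc:
                # Remember the failure just like a normal result.
                cache[args] = ("error", exc)
        kind, value = cache[args]
        if kind == "error":
            raise WrappedErrorSketch(value)
        return value

    return wrapper


runs = []  # records every real execution of divide


@memoize_errors_sketch
def divide(a, b):
    runs.append((a, b))
    return a / b


messages = []
for _ in range(2):
    try:
        divide(1, 0)
    except WrappedErrorSketch as e:
        messages.append(type(e.inner).__name__)
```

The function body runs once; the second call re-raises the remembered error without executing it again.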

Passing exceptions_max_age=None prevents exceptions from being cached. Each error is then treated as a one-time error.

import time
from filememo import memoize, FunctionException

@memoize(exceptions_max_age=None)
def download(url):
    return http_get(url)
    
while True:
    try:
        download('http://sample.net/path')
        break
    except FunctionException:
        time.sleep(1)
        # will retry        

You can also set the expiration time for cached exceptions. It may differ from the caching time of the data itself.

import datetime

# keep downloaded data for a day, remember connection errors for 5 minutes
@memoize(max_age=datetime.timedelta(days=1),
         exceptions_max_age=datetime.timedelta(minutes=5))
def download(url):
    return http_get(url)

In-memory caching

Each call to a function decorated with @memoize results in I/O operations. If your absolute priority is performance, then even reading from the disk cache can be considered expensive. Although filememo does not attempt to cache the read data in memory, this functionality is easy to achieve by combining decorators.

from functools import lru_cache
from filememo import memoize

@lru_cache
@memoize
def too_expensive():
    return compute()

In this example, the filememo disk cache will be used to store the results between program runs, while the functools RAM cache will store the results between function calls.

If the data is already in the disk cache and the program has just started, then calling too_expensive() for the first time will read the result from disk. Further calls to too_expensive() will return the result from memory.
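The layering effect can be observed with the standard library alone. In this sketch read_from_disk_sketch is a hypothetical stand-in for a disk-cached function, so that each simulated disk read can be counted.

```python
from functools import lru_cache

disk_reads = []  # one entry per simulated disk access


def read_from_disk_sketch():
    # Stand-in for a disk-cached function; every call costs one "read".
    disk_reads.append(1)
    return 42


@lru_cache(maxsize=None)
def too_expensive():
    return read_from_disk_sketch()


# Three calls, but only the first one reaches the simulated disk layer.
values = [too_expensive() for _ in range(3)]
```

Keep in mind that lru_cache requires hashable arguments and does not cache raised exceptions, so a failing call goes through to the inner layer every time.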