
3.1.3 Memoize - Reference Documentation

Authors: The Whole GPars Gang

Version: 1.0.0

3.1.3 Memoize

The memoize function enables caching of a function's return values. Repeated calls to the memoized function with the same argument values will, instead of invoking the calculation encoded in the original function, retrieve the result from an internal transparent cache. Provided the calculation is considerably slower than a cache lookup, this allows users to trade off memory for performance. Check out the example, where we attempt to scan multiple websites for particular content:

The memoize functionality of GPars was contributed to Groovy in version 1.8, and if you run on Groovy 1.8 or later, it is recommended to use the Groovy functionality. Memoize in GPars is almost identical, except that it searches the memoize caches concurrently using the surrounding thread pool and so may give performance benefits in some scenarios.
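
For comparison, on Groovy 1.8 or later the same caching is available directly on Closure, with no thread pool required; a minimal sketch:

Closure download = {url -> url.toURL().text.toUpperCase()}
Closure cachingDownload = download.memoize()    // plain Groovy memoize, no GParsPool needed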

The GPars memoize functionality has been renamed to avoid future conflicts with the memoize functionality in Groovy. GPars now prefixes the method names with the letter g, such as gmemoize().

Examples of use

import groovyx.gpars.GParsPool

GParsPool.withPool {
    def urls = ['http://www.dzone.com', 'http://www.theserverside.com', 'http://www.infoq.com']
    Closure download = {url ->
        println "Downloading $url"
        url.toURL().text.toUpperCase()
    }
    Closure cachingDownload = download.gmemoize()

    println 'Groovy sites today: ' + urls.findAllParallel {url -> cachingDownload(url).contains('GROOVY')}
    println 'Grails sites today: ' + urls.findAllParallel {url -> cachingDownload(url).contains('GRAILS')}
    println 'Griffon sites today: ' + urls.findAllParallel {url -> cachingDownload(url).contains('GRIFFON')}
    println 'Gradle sites today: ' + urls.findAllParallel {url -> cachingDownload(url).contains('GRADLE')}
    println 'Concurrency sites today: ' + urls.findAllParallel {url -> cachingDownload(url).contains('CONCURRENCY')}
    println 'GPars sites today: ' + urls.findAllParallel {url -> cachingDownload(url).contains('GPARS')}
}

Notice that closures are enhanced inside GParsPool.withPool() blocks with a gmemoize() function, which returns a new closure wrapping the original closure with a cache. In the example we call the cachingDownload function in several places in the code, yet each unique url gets downloaded only once - the first time it is needed. The values are then cached and available for subsequent calls - and also to all threads, no matter which thread originally came first with a download request for the particular url and had to handle the actual calculation/download.

So, to wrap up, memoize shields a function with a cache of past return values. However, memoize can do even more. In some algorithms, adding a little memory may have a dramatic impact on the computational complexity of the calculation. Let's look at the classical example of Fibonacci numbers.

Fibonacci example

A purely functional, recursive implementation that closely follows the definition of Fibonacci numbers has exponential time complexity:

Closure fib = {n -> n > 1 ? call(n - 1) + call(n - 2) : n}

Try calling the fib function with numbers around 30 and you'll see how slow it is.
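
A quick way to observe the exponential blow-up; the timing code is just an illustrative sketch:

def start = System.currentTimeMillis()
println fib(30)                                          // triggers roughly 2^n recursive calls
println "Took ${System.currentTimeMillis() - start} ms"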

Now with a little twist and an added memoize cache, the algorithm magically turns into a linearly complex one:

Closure fib
fib = {n -> n > 1 ? fib(n - 1) + fib(n - 2) : n}.gmemoize()

The extra memory we added cuts off all but one recursive branch of the calculation. All subsequent calls to the same fib function will also benefit from the cached values.

Also, see below how the memoizeAtMost variant can reduce memory consumption in our example, yet preserve the linear complexity of the algorithm.

Available variants

memoize

The basic variant, which keeps values in the internal cache for the whole lifetime of the memoized function. Provides the best performance characteristics of all the variants.

memoizeAtMost

Allows the user to set a hard limit on the number of cached items. Once the limit has been reached, each subsequently added value will evict the least recently used value from the cache, following the LRU (Least Recently Used) strategy.
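
To make the LRU eviction concrete, here is a minimal sketch; the square closure and the calls list are illustrative, not part of the GPars API, and the comments assume the eviction behavior described above:

import groovyx.gpars.GParsPool

GParsPool.withPool {
    def calls = []
    Closure square = {n -> calls << n; n * n}.gmemoizeAtMost(2)

    square(1)   // computed and cached
    square(2)   // computed and cached
    square(1)   // served from the cache, making 1 the most recently used entry
    square(3)   // computed; evicts 2, the least recently used entry
    square(2)   // computed again, since it was evicted

    assert calls == [1, 2, 3, 2]
}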

So for our Fibonacci number example, we could safely reduce the cache size to two items:

Closure fib
fib = {n -> n > 1 ? fib(n - 1) + fib(n - 2) : n}.gmemoizeAtMost(2)

Setting an upper limit on the cache size may have two purposes:

  1. Keep the memory footprint of the cache within defined boundaries
  2. Preserve the desired performance characteristics of the function. Overly large caches may take longer to retrieve a cached value than it would take to calculate the result directly.

memoizeAtLeast

Allows unlimited growth of the internal cache until the JVM's garbage collector decides to step in and evict the SoftReferences used by our implementation. The single parameter value passed to the memoizeAtLeast() method specifies the minimum number of cached items that should be protected from gc eviction. The cache will never shrink below the specified number of entries. It protects the most recently used items from eviction, following the LRU (Least Recently Used) strategy.
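
For instance, the Fibonacci closure from earlier could protect the most recently used results from gc eviction while leaving the rest softly reachable; the value 100 below is an arbitrary illustrative choice:

Closure fib
fib = {n -> n > 1 ? fib(n - 1) + fib(n - 2) : n}.gmemoizeAtLeast(100)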

memoizeBetween

Combines memoizeAtLeast and memoizeAtMost, allowing the cache to grow and shrink within the range between the two parameter values, depending on available memory and gc activity. The cache size will never exceed the upper limit, preserving the desired performance characteristics of the cache.
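
A sketch applying it to the Fibonacci closure; the bounds 100 and 1000 are arbitrary illustrative values:

Closure fib
fib = {n -> n > 1 ? fib(n - 1) + fib(n - 2) : n}.gmemoizeBetween(100, 1000)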