npstreams: streaming NumPy functions

npstreams is an open-source Python package for streaming NumPy array operations. The goal is to provide tested, (almost) drop-in replacements for NumPy functions (where possible) that operate on streams of arrays instead of dense arrays.

npstreams also provides some utilities for parallelization. These parallelization generators can be combined with the streaming functions to drastically improve performance in some cases.
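As a rough illustration of combining parallel work with a streaming reduction, here is a minimal sketch that uses only the standard library and NumPy. It does not use npstreams' own parallelization utilities. The names `preprocess` and `streamed_sum` are hypothetical helpers introduced for this example, and a thread pool stands in for whatever parallel backend is actually used. Note that `Executor.map` submits all of its work up front, so only the reduction side is truly streamed here.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def preprocess(arr):
    # Hypothetical per-array transform: subtract the array's own mean.
    return arr - arr.mean()


def streamed_sum(stream):
    # Fold the stream one array at a time, keeping a single accumulator
    # instead of stacking every array into one dense block.
    total = None
    for arr in stream:
        total = arr.copy() if total is None else total + arr
    return total


# Small synthetic arrays standing in for a real data source.
arrays = [np.arange(6, dtype=float).reshape(2, 3) + i for i in range(8)]

with ThreadPoolExecutor(max_workers=4) as pool:
    # Executor.map yields results in input order, so the reduction sees
    # the same sequence as the serial version below.
    parallel_total = streamed_sum(pool.map(preprocess, arrays))

serial_total = streamed_sum(map(preprocess, arrays))
```

Whether the parallel version is actually faster depends on how expensive the per-array transform is compared with the overhead of dispatching work.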

The code presented herein has been in use at some point by the Siwick research group.

Example

Consider the following snippet to combine 50 images from an iterable source:

import numpy as np

# `source` is any iterable yielding 2048 x 2048 arrays
images = np.empty(shape=(2048, 2048, 50))
for index, im in enumerate(source):
    images[:, :, index] = im

avg = np.average(images, axis=2)

If the source iterable provided 10000 images, the above routine would not work on most machines: a 2048 x 2048 x 10000 array of 64-bit floats occupies roughly 335 GB of memory. Moreover, what if we want to transform the images one by one before averaging them? What about looking at the average while it is being computed? Let’s look at an example:

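import numpy as np
from npstreams import iaverage
# scipy.misc.imread was removed in SciPy 1.2; imageio provides a replacement
from imageio.v3 import imread

# `list_of_filenames` is any iterable of image file paths
stream = map(imread, list_of_filenames)
averaged = iaverage(stream)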

At this point, the generators map() and iaverage() are ‘wired’ but will not compute anything until a value is requested. We can watch the average evolve:

import matplotlib.pyplot as plt

for avg in averaged:
    plt.imshow(avg)
    plt.show()

Instead of stepping through every intermediate result, we can use last() to get only the final average:

from npstreams import last

total = last(averaged) # average of the entire stream. See also npstreams.average
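
To make the idea behind the example concrete, the following is a minimal sketch of a streaming average written in plain NumPy. It is not npstreams' implementation. `running_average` and `final_item` are hypothetical helpers that only mimic the behaviour described above for iaverage() and last(), and the synthetic `frames` generator stands in for images read from disk.

```python
import numpy as np


def running_average(stream):
    """Yield the average of all arrays seen so far (hypothetical helper)."""
    total = None
    for count, arr in enumerate(stream, start=1):
        arr = np.asarray(arr, dtype=float)
        if total is None:
            total = np.zeros_like(arr)
        total += arr
        # A new array is produced at every step, so earlier results
        # are not modified by later iterations.
        yield total / count


def final_item(iterable):
    """Consume an iterable and return its last item (hypothetical helper)."""
    sentinel = object()
    item = sentinel
    for item in iterable:
        pass
    if item is sentinel:
        raise ValueError("cannot take the last item of an empty iterable")
    return item


# Synthetic stream: five 4 x 4 frames filled with 0.0, 1.0, ..., 4.0.
frames = (np.full((4, 4), fill_value=float(i)) for i in range(5))

# Nothing is computed until final_item() starts pulling values.
result = final_item(running_average(frames))
```

Only one accumulator and one incoming frame are held in memory at a time, which is why this approach scales to streams that would not fit in a single dense array.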

Benchmark

npstreams provides a function for benchmarking common use cases.

To run the benchmark with default parameters, from the interpreter:

from npstreams import benchmark
benchmark()

From a command-line terminal:

python -m npstreams.benchmarks

The results will be printed to the screen.

General Documentation

Authors

  • Laurent P. René de Cotret