npstreams: streaming NumPy functions
npstreams is an open-source Python package for streaming NumPy array operations.
The goal is to provide tested, (almost) drop-in replacements for NumPy functions (where possible)
that operate on streams of arrays instead of dense arrays.
npstreams also provides some utilities for parallelization. These parallelization
generators can be combined with the streaming functions to drastically improve performance
in some cases.
The code presented herein has been in use at some point by the Siwick research group.
Example
Consider the following snippet to combine 50 images from an iterable source:
import numpy as np
images = np.empty(shape=(2048, 2048, 50))
for index, im in enumerate(source):
    images[:, :, index] = im
avg = np.average(images, axis=2)
If the source iterable provided 10000 images, the above routine would
not work on most machines: the dense float64 array alone would need about 335 GB
of memory. Moreover, what if we want to transform the images
one by one before averaging them? What about looking at the average while it
is being computed? Let's look at an example:
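import numpy as np
from npstreams import iaverage
# Note: scipy.misc.imread was removed in SciPy 1.2; imageio.imread is the
# replacement suggested by SciPy's deprecation notice.
from scipy.misc import imread
stream = map(imread, list_of_filenames)
averaged = iaverage(stream)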
At this point, the generators map() and iaverage() are 'wired' together
but will not compute anything until a result is requested. We can watch the average evolve:
import matplotlib.pyplot as plt
for avg in averaged:
    plt.imshow(avg)
    plt.show()
We can also use last() to get at the final average:
from npstreams import last
total = last(averaged) # average of the entire stream. See also npstreams.average
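To make the streaming idea concrete, here is a minimal sketch of how a running average over a stream could be written with plain NumPy. It is not npstreams' implementation: the names running_average and final_item are hypothetical stand-ins for the roles played by iaverage() and last(), and the small random frames are made-up test data. The point is only that a single accumulator array is held in memory, whatever the length of the stream.

```python
import numpy as np


def running_average(stream):
    """Yield the average of all arrays seen so far, one result per input array.

    Hypothetical helper for illustration; only one accumulator is kept in memory.
    """
    total = None
    for count, arr in enumerate(stream, start=1):
        arr = np.asarray(arr, dtype=float)
        if total is None:
            total = np.zeros_like(arr)
        total += arr
        yield total / count


def final_item(iterable):
    """Consume an iterable and return its last item (hypothetical helper)."""
    item = None
    for item in iterable:
        pass
    return item


# Made-up data: ten small random "images" delivered one at a time
rng = np.random.default_rng(0)
frames = [rng.random((4, 4)) for _ in range(10)]

# Nothing is computed until final_item() starts pulling from the generator
result = final_item(running_average(iter(frames)))
```

Because running_average is a generator, it can be chained after map() in the same way as the example above, and intermediate averages can be inspected by iterating over it directly.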
Benchmark
npstreams provides a function for benchmarking common use cases.
To run the benchmark with default parameters, from the interpreter:
from npstreams import benchmark
benchmark()
From a command-line terminal:
python -m npstreams.benchmarks
The results will be printed to the screen.