Jul 26, 2012 · There is a second, possibly surprising improvement, provided by set_openblas_num_threads(int). When I use “export OPENBLAS_NUM_THREADS=x”, implicit parallelization employs 4 threads, for all x >= 4, on my 4 dual-core system. However with set_openblas_num_threads, I can utilize all 8.

Karen everett nationalityNumpy is a great software package for doing numeric computations in Python. If set up right (i.e. using OpenBLAS), I've found it can be even faster than MATLAB's highly optimized ATLAS backend. Unfortunately, if installed naïvely using pip, it can be very slow.

The Python implementations of matrixstatistics and matrix_multiply use NumPy v1.14.0 and OpenBLAS v0.2.20 functions; the rest are pure Python implementations. Raw benchmark numbers in CSV format are available here and the benchmark source code for each language can be found in the perf. files listed here .