In this post, we look into speeding up Theano on Mac by letting it use multiple CPU threads. The basic idea is to enable OpenMP so that Theano can run operations in parallel on multiple cores.

For instructions on how to install Theano on Mac, please refer to the official documentation.

When attempting to tune Theano per the “Multi cores support in Theano” guide, we quickly run into a problem on Mac: Theano warns that the compiler cannot build OpenMP code and disables OpenMP everywhere.

$ python theano/misc/elemwise_openmp_speedup.py
/Users/z/Library/Python/2.7/lib/python/site-packages/theano/tensor/opt.py:4684: UserWarning: Your g++ compiler fails to compile OpenMP code. We know this happen with some version of the EPD mingw compiler and LLVM compiler on Mac OS X. We disable openmp everywhere in Theano. To remove this warning set the theano flags `openmp` to False.
  no_recycling=[])

This is because the g++ that Xcode provides is actually a front end to Apple's clang, which ships without OpenMP support. As such, we have to resort to using GCC.
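To confirm the diagnosis before installing anything, one option is to compile a tiny OpenMP program with the compiler Theano would use. The sketch below assumes a POSIX shell; the helper name check_openmp and the test program are hypothetical illustrations, not part of Theano.

```shell
# Hypothetical helper (not part of Theano): print the compiler's version line
# and test whether it can build and run a minimal OpenMP program.
# Usage: check_openmp [compiler]   (defaults to g++)
check_openmp() {
    cxx="${1:-g++}"

    # With Xcode's toolchain, this line typically reports Apple LLVM/clang.
    "$cxx" --version 2>/dev/null | head -n 1

    tmpdir=$(mktemp -d "${TMPDIR:-/tmp}/omp_check.XXXXXX") || return 2
    cat > "$tmpdir/omp_test.cpp" <<'EOF'
#include <omp.h>
#include <cstdio>

int main() {
    // Compiling or linking this fails if the compiler lacks OpenMP support.
    std::printf("omp_get_max_threads() = %d\n", omp_get_max_threads());
    return 0;
}
EOF

    # -fopenmp enables OpenMP in GCC; Xcode's clang rejects the flag.
    if "$cxx" -fopenmp "$tmpdir/omp_test.cpp" -o "$tmpdir/omp_test" 2>/dev/null; then
        echo "OpenMP: supported by $cxx"
        "$tmpdir/omp_test"
        rc=0
    else
        echo "OpenMP: NOT supported by $cxx"
        rc=1
    fi
    rm -rf "$tmpdir"
    return "$rc"
}
```

For example, running check_openmp g++ on a stock Xcode setup should report that OpenMP is not supported, while check_openmp /usr/local/bin/g++ can be used to test a specific GCC binary once one is installed.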

One (painful) option is to build and install GCC with Homebrew:

$ brew install gcc
# ... and wait for eternity ...

Fortunately, the HPC on Mac OS X project offers precompiled GCC binaries.

$ wget http://downloads.sourceforge.net/project/hpc/hpc/gcc/gcc-5.2-bin.tar.gz
$ tar -xz -C / -f gcc-5.2-bin.tar.gz
# This extracts GCC into /usr/local/ (prefix tar with sudo if /usr/local is not writable)
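Since Theano's cxx flag can name the compiler by its bare name (g++), it matters which g++ the shell finds first. On a stock OS X setup, /etc/paths lists /usr/local/bin before /usr/bin, so the extracted GCC should take precedence over Xcode's g++. A small sketch to confirm this, assuming the tarball placed the binary at /usr/local/bin/g++ (the helper name which_gxx is hypothetical):

```shell
# Hypothetical helper: report which g++ the shell resolves and whether it is
# the expected binary (default: /usr/local/bin/g++, an assumed install path).
which_gxx() {
    expected="${1:-/usr/local/bin/g++}"
    resolved=$(command -v g++ 2>/dev/null) || resolved=""

    if [ -z "$resolved" ]; then
        echo "g++ not found in PATH"
        return 1
    fi

    echo "g++ resolves to: $resolved"
    if [ "$resolved" = "$expected" ]; then
        echo "OK: matches $expected"
        return 0
    fi

    # Otherwise another g++ (e.g. Xcode's clang front end) would be used.
    echo "WARNING: expected $expected"
    echo "Put $(dirname "$expected") earlier in PATH or use cxx=$expected"
    return 1
}
```

If the check fails, passing the full path in the flag (for example cxx=/usr/local/bin/g++) sidesteps the PATH ordering entirely.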

Now that we have GCC with OpenMP support, pointing Theano at it through the cxx flag makes the earlier command work:

$ THEANO_FLAGS="cxx=g++" python theano/misc/elemwise_openmp_speedup.py
Fast op time without openmp 0.000201s with openmp 0.000162s speedup 1.24
Slow op time without openmp 0.002742s with openmp 0.000902s speedup 3.04
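How much speedup OpenMP delivers also depends on how many threads it uses, which OpenMP runtimes read from the standard OMP_NUM_THREADS environment variable. A minimal sketch for comparing a few thread counts, assuming a POSIX shell (the helper name sweep_threads and the counts 1, 2 and 4 are illustrative choices):

```shell
# Hypothetical helper: run the given command once per OpenMP thread count.
# The counts 1 2 4 are only an example; adjust them to your core count.
sweep_threads() {
    for n in 1 2 4; do
        echo "== OMP_NUM_THREADS=$n"
        # OMP_NUM_THREADS is set only in the environment of this one run.
        OMP_NUM_THREADS="$n" "$@" || return $?
    done
}
```

For example, after export THEANO_FLAGS="cxx=g++", running sweep_threads python theano/misc/elemwise_openmp_speedup.py prints the script's timings once per thread count, and the loop stops early if a run fails.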

Happy tuning!