When I implemented memory alignment and SSE support in Auryn a few years back, I got a blast out of the almost 4x speed increase of all my simulations using vectorized neuron models. With heavy load from synaptic plasticity computations, I still saw about a 2-fold speedup. Lured by the promise of another such performance jump, I now finally got around to implementing and testing AVX support in Auryn. To my surprise, on the standard benchmarks I run, I did not get any speedup over SSE. I was surprised, but it seems others have found the same. Since the state vectors of a neural network simulation are typically too large to reside entirely in L1 cache, the real bottleneck seems to be memory bandwidth. Euler integration of a group of integrate-and-fire neurons mostly fetches a chunk of a state vector from memory, performs a multiplication and an addition on it (_mm_mul_ps and _mm_add_ps in SSE intrinsics), and writes it back to memory. Since this access pattern is common to most neural simulators, it unfortunately seems as if only little (and, as it turns out, nothing in the case of Auryn) can be gained from AVX.
Somehow, not having to implement AVX is a relief, because it spares me having to deal with the code-compliance issues of still supporting SSE on machines that cannot handle AVX. A speedup of two would have been nice for Auryn, though, but it seems clear that it's not in the cards. Oh well. There is no such thing as a free lunch.
I found this resource interesting:
https://indico.cern.ch/event/327306/con ... undrum.pdf
And here are the results from my test:

- avx_test.png