These modules are written in C and are taken from software developed by the Harvard-Smithsonian Center for Astrophysics. A public repository containing only the drop-in transform implementations will hopefully be maintained and documented.
In the header Grana/Source/Smithsonians_Discrete_Hilbert_Fourier_Hartley_transform/am_sysdep.h you can override the default processor data cache size to further improve FFT and FHT performance. As the authors write:
For optimum performance, the default processor data cache size settings here can be overridden by target-specific definitions supplied at compile time.
Line-by-line and CIA computations are blocked to fit in L1 cache. If the cache size is set to 0, cache blocking is turned off.
For FFTs and FHTs, the L1 cache size setting controls the point at which the computation switches over from recursive to iterative.
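The two ideas in the quoted text, a compile-time override of the cache size and a cache-size cutoff between recursion and iteration, can be sketched roughly as follows. This is only an illustration and not the code in am_sysdep.h: the macro name EXAMPLE_L1_CACHE_BYTES, its 32 KiB default, and the function names are all hypothetical, and a Walsh-Hadamard transform stands in for the FFT/FHT because its butterfly is short enough to show in full.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical macro name, NOT the one used in am_sysdep.h.
 * The #ifndef guard lets a -D flag on the compiler command line win. */
#ifndef EXAMPLE_L1_CACHE_BYTES
#define EXAMPLE_L1_CACHE_BYTES 32768 /* assumed default: 32 KiB */
#endif

/* Iterative in-place Walsh-Hadamard transform (stand-in for an FFT/FHT).
 * n must be a power of two. */
static void wht_iterative(double *x, size_t n)
{
    for (size_t h = 1; h < n; h *= 2)
        for (size_t i = 0; i < n; i += 2 * h)
            for (size_t j = i; j < i + h; j++) {
                double a = x[j], b = x[j + h];
                x[j] = a + b;
                x[j + h] = a - b;
            }
}

/* Recursive driver: halve the problem until a sub-array fits in the
 * configured cache size, then finish that sub-array iteratively. */
static void wht(double *x, size_t n)
{
    assert(n != 0 && (n & (n - 1)) == 0); /* power of two only */

    if (n < 2 || n * sizeof(double) <= (size_t)EXAMPLE_L1_CACHE_BYTES) {
        wht_iterative(x, n);
        return;
    }

    size_t half = n / 2;
    /* One butterfly stage across the two halves... */
    for (size_t j = 0; j < half; j++) {
        double a = x[j], b = x[j + half];
        x[j] = a + b;
        x[j + half] = a - b;
    }
    /* ...then each half is an independent, smaller transform. */
    wht(x, half);
    wht(x + half, half);
}
```

With the assumed default, arrays of up to 4096 doubles go straight to the iterative loop, and larger ones are split recursively first. Compiling with something like `cc -DEXAMPLE_L1_CACHE_BYTES=65536 ...` would move that cutoff without editing the source, which is the kind of target-specific override the authors describe.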