Sorry, I thought I'd responded to this already.
How about a pointer to an all real implementation that does in place
computation. The intermediate results would require storage if the
temporary imaginary parts for correct results, wouldn't they?
No, this is not correct.
First, I never called it an "all real" FFT. An FFT specialized for
real inputs still has complex outputs, it is just that half of these
complex outputs are redundant because they are complex conjugates.
Therefore, you don't need to store them. Similar redundancies apply
to all of the intermediate results, which is why you can perform the
whole FFT of real data in-place with only O(1) storage beyond that
required for the inputs (overwritten by the outputs).
In particular, there are three common strategies. First, one well-
known trick is that an FFT of N real inputs, for even N, can be
expressed as a complex-input FFT of length N/2, plus O(N) operations
on the output (see e.g. Numerical Recipes). This can be done in-
place, and has the advantage of simplicity if you already have a
complex-input FFT, but is slightly suboptimal from an arithmetic
perspective. Second, one can re-express the FFT in terms of the
discrete Hartley transform, but this also turns out to be sub-optimal
from an arithmetic perspective. Third, the minimal arithmetic counts
are achieved by another strategy: directly start with an ordinary FFT
algorithm of length N, but then prune all of the redundant
computations from the algorithm. This can also be done in-place; a
well known description of this method was published by Sorensen in
1987.
There are many, many freely available FFT implementations that
implement one (or more than one) of these strategies. The first
strategy is found all over the place, from textbook code like
Numerical Recipes' to more optimized programs like the one by Ooura
(google it). The second strategy, based on the DHT, can also be
easily found (e.g. google for a code by Ron Mayer). The third
strategy is exemplified, for example, by Sorensen's original code from
the 1980s (
http://faculty.prairiestate.edu/skifowit/fft/sfftcf.txt).
All of the codes I just mentioned work in place, if I remember
correctly. FFTW (
www.fftw.org) implements all three strategies [*],
and can work both in-place and out-of-place.
Regards,
Steven G. Johnson
[*] The reason FFTW implements all three strategies is as follows.
The (first) N/2 complex-FFT method, which is generalized to N/2r in
FFTW 3.2, is useful in FFTW because it allows us to exploit our SIMD
optimizations more easily. The (third) pruned complex-FFT method is
the only one that works well for odd N, and is also advantageous on
machines without SIMD instructions. The (second) DHT-based method is
useful for large prime N, because there are efficient DHT algorithms
for prime N analogous to corresponding prime-N complex-FFT algorithms.