Maker Pro

FFT algorithm

I have a question about the FFT. I want to implement the Cooley-Tukey FFT
algorithm, but I must know how much RAM I need. I have N samples (24-bit), so I
need 3xN bytes for input and 3xN bytes for output; how much more do I need?
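(A rough back-of-the-envelope sketch, added for illustration and not part of the
original question: it assumes a plain in-place radix-2 complex FFT, with the
24-bit samples widened to 32-bit words and a precomputed half-length twiddle
table. The length 4096 is only an example.)

#include <stdint.h>
#include <stdio.h>

/* Rough RAM estimate for a straightforward in-place radix-2 complex FFT.
 * Assumptions: 24-bit samples are widened to 32-bit words, the transform is
 * done in place (input and output share one buffer), and a table of N/2
 * complex twiddle factors is precomputed. */
int main(void)
{
    uint32_t N = 4096;                                     /* example length       */
    size_t data = (size_t)N * 2 * sizeof(int32_t);         /* re + im, in place    */
    size_t twid = (size_t)(N / 2) * 2 * sizeof(int32_t);   /* W_N^k table, re + im */
    printf("data buffer  : %zu bytes\n", data);
    printf("twiddle table: %zu bytes\n", twid);
    printf("total approx.: %zu bytes\n", data + twid);
    return 0;
}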

You'll have to do some googling, but there is an FFT library for Linux.
No need to reinvent the wheel.

I saw your other post regarding THD. You need to beware of edge
effects if the source is not synchronous to the sampling AND the data
at the edges are not continuous. You need the sample clock and sine
source to be synchronous, else the energy gets "smeared" between bins.
The edge effects are hard to explain, but visualize the end of the
sample stream being wrapped around and touching the beginning of the
stream. If there is a hiccup here, it shows up as an artifact. This
is why most analysis is done by windowing the data. To be honest, you
really need to read a basic DSP book like Oppenheim and Schafer.
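(Added for illustration, a minimal C sketch of what "windowing the data" means
in practice, assuming a Hann window, which is just one common choice; the
function name is made up for the example.)

#include <math.h>
#include <stddef.h>

/* Multiply a block of samples by a Hann window, in place, before the FFT.
 * Both ends of the block are tapered toward zero, so the point where the
 * end "wraps around and touches the beginning" no longer shows a step,
 * at the price of smearing each tone over a few neighbouring bins. */
static void apply_hann(double *x, size_t n)
{
    const double pi = 3.14159265358979323846;
    for (size_t i = 0; i < n; i++) {
        double w = 0.5 * (1.0 - cos(2.0 * pi * (double)i / (double)(n - 1)));
        x[i] *= w;
    }
}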
 
P

pgw

Jan 1, 1970
0
You'll have to do some googling, but there is an FFT library for Linux.
No need to reinvent the wheel.
I saw your other post regarding THD.

No, that was not me.
You need to beware of edge
effects if the source is not synchronous to the sampling AND the data
at the edges are not continuous. You need the sample clock and sine
source to be synchronous else the energy gets "smeared" between bins.

But I can't synchronize the sampling to all of the harmonics. I think this
approach is not sufficient. Another way is to capture many periods of the signal.
The edge effects are hard to explain, but visualize the end of the
sample stream being wrapped around and touching the beginning of the
stream. If there is a hiccup here, it shows up as an artifact.

I know the theory of the discrete Fourier transform. I have used the FFT many
times but never implemented it.
Now I'm designing hardware, and that is why I asked about the required
memory.
 
R

Robert Baer

Jan 1, 1970
0
pgw said:
On Sat, 24 Mar 2007 20:49:25 -0500, Jamie wrote:

Yes, that is true, it prevents aliasing. But I think I need more than 2
samples per period.
If you want reasonable resolution and accuracy, look for more than 10
samples per period.
 
No, that was not me.

But I can't synchronize the sampling to all of the harmonics. I think this
approach is not sufficient. Another way is to capture many periods of the signal.

Trust me. If you can synchronize the source, the analysis is greatly
simplified. This is done in ADC characterization. I believe B&K makes a
source that uses a 10 MHz reference.
I know the theory of the discrete Fourier transform. I have used the FFT many
times but never implemented it.
Now I'm designing hardware, and that is why I asked about the required
memory.

I really suggest programming your algorithms as a software exercise
before you attempt to do this in hardware. If the source is not
synchronized, you will need to understand windowing. You probably
should research the Blackman-Harris window and the "flat top" window.
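(Added for illustration: the 4-term Blackman-Harris window mentioned above can
be generated from its commonly published coefficients; a minimal sketch, with
the function name made up for the example.)

#include <math.h>
#include <stddef.h>

/* Fill w[0..n-1] with the classic 4-term Blackman-Harris window.  The
 * coefficients are the standard published values; the window trades a
 * wider main lobe for very low sidelobes, which helps when the source
 * cannot be synchronized to the sample clock. */
static void blackman_harris(double *w, size_t n)
{
    const double a0 = 0.35875, a1 = 0.48829, a2 = 0.14128, a3 = 0.01168;
    const double pi = 3.14159265358979323846;
    for (size_t i = 0; i < n; i++) {
        double t = 2.0 * pi * (double)i / (double)(n - 1);
        w[i] = a0 - a1 * cos(t) + a2 * cos(2.0 * t) - a3 * cos(3.0 * t);
    }
}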
 
J

joseph2k

Jan 1, 1970
0
pgw said:
I think that the number of samples determines the resolution, and the time
interval (sampling frequency) determines the band.
Not at all, the total sample length, in time, determines the lowest
resolvable frequency in absolute terms. The ratio of sample rate to sample
period determines the resolution. BTW 400 ns is a very short time to
accumulate 2.5 megasamples. Is that the (effective??) sample rate or the
total sample acquisition time?
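(Worked example, added for illustration, using the figures that come up in this
thread: a 400 ns sample period is 2.5 MS/s, and 2.5 million samples then span
1 s, giving a 1 Hz bin spacing and a 1.25 MHz Nyquist limit.)

#include <stdio.h>

/* Bin spacing and Nyquist limit for a 400 ns sample period and a
 * 2.5-megasample record. */
int main(void)
{
    double fs = 1.0 / 400e-9;   /* sample rate: 2.5e6 samples/s */
    double n  = 2.5e6;          /* samples in the record        */
    printf("record length     : %g s\n", n / fs);     /* 1 s       */
    printf("bin spacing (fs/N): %g Hz\n", fs / n);    /* 1 Hz      */
    printf("Nyquist (fs/2)    : %g Hz\n", fs / 2.0);  /* 1.25e6 Hz */
    return 0;
}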
 
J

joseph2k

Jan 1, 1970
0
MooseFET said:
By the time you get through an FFT, you lose bits. You should usually
figure on dropping one bit for every power of two in the length. This
means that at 24 bits, your results will be poor. You may want to add
some LSB space to the values to help prevent this.
What in the heck are you getting on about?
 
J

joseph2k

Jan 1, 1970
0
Robert said:
Eight megs is nothing; I have done FFTs with millions of digits for the
input sample, to do multi-million-digit multiply, divide, etc.
I presume you are talking megasamples only, not including megadigits,
otherwise the CPU time required would run well past the current model's heat
death of the universe.
 
J

joseph2k

Jan 1, 1970
0
Jamie said:
for streaming, I do sample overlapping.
I use 25% of the last block as part of the current block, etc.
As you are undoubtedly aware, that is a separate issue.
 
J

joseph2k

Jan 1, 1970
0
You'll have to do some googling, but there is an FFT library for Linux.
No need to reinvent the wheel.

I saw your other post regarding THD. You need to beware of edge
effects if the source is not synchronous to the sampling AND the data
at the edges are not continuous. You need the sample clock and sine
source to be synchronous, else the energy gets "smeared" between bins.
The edge effects are hard to explain, but visualize the end of the
sample stream being wrapped around and touching the beginning of the
stream. If there is a hiccup here, it shows up as an artifact. This
is why most analysis is done by windowing the data. To be honest, you
really need to read a basic DSP book like Oppenheim and Schafer.
Either you did not fully understand what it includes/explains, or you cannot
express yourself clearly. Perhaps try the book by E. Oran Brigham; you do
have to be able to follow the math, though.
 
M

MooseFET

Jan 1, 1970
0
I presume you are talking megasamples only, not including megadigits,
otherwise the CPU time required would run well past the current model's heat
death of the universe.

It could be the other way around. An 8 point FFT with million digit
numbers wouldn't take nearly that long.

BTW: In real life, you don't need divides inside the butterfly loop.
For a smallish collection of fixed lengths you can store most or all
of sin() and cos() in tables.
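(Added for illustration, not MooseFET's code: a minimal sketch of the table
idea, in floating point for clarity. The twiddle factors are computed once;
inside the butterfly there are no divides and no trig calls, just a complex
multiply, an add and a subtract.)

#include <math.h>
#include <stdlib.h>

typedef struct { double re, im; } cpx;

/* Precompute the N/2 twiddle factors W_N^k = exp(-j*2*pi*k/N). */
static cpx *make_twiddles(size_t n)
{
    const double pi = 3.14159265358979323846;
    cpx *w = malloc((n / 2) * sizeof *w);
    for (size_t k = 0; k < n / 2; k++) {
        w[k].re =  cos(2.0 * pi * (double)k / (double)n);
        w[k].im = -sin(2.0 * pi * (double)k / (double)n);
    }
    return w;
}

/* One decimation-in-time butterfly: (a, b) -> (a + W*b, a - W*b). */
static void butterfly(cpx *a, cpx *b, cpx w)
{
    cpx t = { w.re * b->re - w.im * b->im,
              w.re * b->im + w.im * b->re };
    b->re = a->re - t.re;  b->im = a->im - t.im;
    a->re = a->re + t.re;  a->im = a->im + t.im;
}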
 
P

pgw

Jan 1, 1970
0
joseph2k said:
Not at all, the total sample length, in time, determines the lowest
resolvable frequency in absolute terms. The ratio of sample rate to sample
period determines the resolution.

That's exactly what I meant.
BTW 400 ns is a very short time to
accumulate 2.5 megasamples. Is that the (effective??) sample rate or the
total sample acquisition time?

Of course, 400 ns is the sampling period (my mistake), and that gives a 1 s
acquisition time.
 
By the time you get through an FFT, you lose bits. You should usually
figure on dropping one bit for every power of two in the length. This
means that at 24 bits, your results will be poor. You may want to add
some LSB space to the values to help prevent this.

This is too pessimistic. In fixed-point arithmetic, the error grows
as O(sqrt(N)), corresponding to one bit of accuracy lost for every
factor of 4 in size. In floating-point arithmetic, the error grows as
O(log N) at worst and O(sqrt(log N)) on average. (All assuming the
FFT is implemented carefully, of course.)
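(Added for illustration, one possible reading of the "add some LSB space"
suggestion, not code from the thread: sign-extend each 24-bit sample into a
32-bit word and shift it up by a few guard bits before a fixed-point FFT, so
roundoff inside the butterflies lands below the original LSB. The choice of
4 guard bits is arbitrary.)

#include <stdint.h>
#include <stddef.h>

#define GUARD_BITS 4    /* arbitrary illustrative choice */

/* Sign-extend a raw 24-bit sample (held in the low 24 bits of raw24)
 * and scale it up so that rounding errors accumulate in the low guard
 * bits instead of eating into the original 24-bit resolution. */
static int32_t widen24(uint32_t raw24)
{
    int32_t s = (int32_t)(raw24 << 8) >> 8;   /* sign-extend bit 23 */
    return s * (1 << GUARD_BITS);             /* add LSB headroom   */
}

static void widen_block(const uint32_t *in, int32_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = widen24(in[i]);
}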
 
R

Robert Baer

Jan 1, 1970
0
joseph2k said:
Robert Baer wrote:



I presume you are talking megasample only, not including megadigit,
otherwise the cpu time required would take well past the current model heat
death of the universe.
Sorry; you may be correct with a schoolboy multiply O(N*N), but for
an FFT-based multiply it is closer to O(N*log(N)).
 
R

Robert Baer

Jan 1, 1970
0
MooseFET said:
It could be the other way around. An 8 point FFT with million digit
numbers wouldn't take nearly that long.

BTW: In real life, you don't need divides inside the butterfly loop.
For a smallish collection of fixed lengths you can store most or all
of sin() and cos() in tables.
It turns out that storing sin() and cos() in tables (first, and then
looking up what one needs in the butterfly) is faster than using the FPU
to calculate them; never mind that the Pentium has an FSINCOS instruction
that returns both...
 
M

MooseFET

Jan 1, 1970
0
This is too pessimistic. In fixed-point arithmetic, the error grows
as O(sqrt(N)), corresponding to one bit of accuracy lost for every
factor of 4 in size. In floating-point arithmetic, the error grows as
O(log N) at worst and O(sqrt(log N)) on average. (All assuming the
FFT is implemented carefully, of course.)


The rule I gave is pessimistic, but I believe realistically so. This
is based on my observations of integer implementations. The errors in
the sin() and cos() functions were likely part of it. To keep the
code small, some inaccuracy due to the small table size may have
been accepted.
 
J

joseph2k

Jan 1, 1970
0
On Mar 24, 8:22 pm, Jamie

No, if you have purely real inputs (and hence half of the outputs are
redundant by conjugate symmetry), there are numerous methods to save a
factor of two in memory (and time) over the complex transform. These
methods can also operate in-place.

So, the answer to the original poster is that he/she needs storage for
the N samples (overwritten by the N outputs), plus O(1) additional
memory, at minimum. Of course, various algorithms may require more
memory in exchange for better speed, accuracy, and/or simplicity.

Regards,
Steven G. Johnson
How about a pointer to an all real implementation that does in-place
computation? The intermediate results would require storage for the
temporary imaginary parts to get correct results, wouldn't they?
 
S

Steven G. Johnson

Jan 1, 1970
0
Sorry, I thought I'd responded to this already.

How about a pointer to an all real implementation that does in-place
computation? The intermediate results would require storage for the
temporary imaginary parts to get correct results, wouldn't they?

No, this is not correct.

First, I never called it an "all real" FFT. An FFT specialized for
real inputs still has complex outputs; it is just that half of these
complex outputs are redundant because they are complex conjugates.
Therefore, you don't need to store them. Similar redundancies apply
to all of the intermediate results, which is why you can perform the
whole FFT of real data in-place with only O(1) storage beyond that
required for the inputs (overwritten by the outputs).

In particular, there are three common strategies. First, one well-
known trick is that an FFT of N real inputs, for even N, can be
expressed as a complex-input FFT of length N/2, plus O(N) operations
on the output (see e.g. Numerical Recipes). This can be done in-
place, and has the advantage of simplicity if you already have a
complex-input FFT, but is slightly suboptimal from an arithmetic
perspective. Second, one can re-express the FFT in terms of the
discrete Hartley transform, but this also turns out to be sub-optimal
from an arithmetic perspective. Third, the minimal arithmetic counts
are achieved by another strategy: directly start with an ordinary FFT
algorithm of length N, but then prune all of the redundant
computations from the algorithm. This can also be done in-place; a
well known description of this method was published by Sorensen in
1987.

There are many, many freely available FFT implementations that
implement one (or more than one) of these strategies. The first
strategy is found all over the place, from textbook code like
Numerical Recipes' to more optimized programs like the one by Ooura
(google it). The second strategy, based on the DHT, can also be
easily found (e.g. google for a code by Ron Mayer). The third
strategy is exemplified, for example, by Sorensen's original code from
the 1980s (http://faculty.prairiestate.edu/skifowit/fft/sfftcf.txt).
All of the codes I just mentioned work in place, if I remember
correctly. FFTW (www.fftw.org) implements all three strategies [*],
and can work both in-place and out-of-place.

Regards,
Steven G. Johnson

[*] The reason FFTW implements all three strategies is as follows.
The (first) N/2 complex-FFT method, which is generalized to N/2r in
FFTW 3.2, is useful in FFTW because it allows us to exploit our SIMD
optimizations more easily. The (third) pruned complex-FFT method is
the only one that works well for odd N, and is also advantageous on
machines without SIMD instructions. The (second) DHT-based method is
useful for large prime N, because there are efficient DHT algorithms
for prime N analogous to corresponding prime-N complex-FFT algorithms.
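(Added for illustration: since FFTW is named above, here is a minimal in-place
real-to-complex transform using its documented r2c interface, showing the
"N samples plus O(1) extra" point in practice. The length 1024 and the test
data are arbitrary.)

#include <fftw3.h>
#include <stdio.h>

int main(void)
{
    const int n = 1024;
    /* For an in-place r2c transform FFTW requires the real array to be
     * padded to 2*(n/2+1) elements; the n/2+1 complex outputs overwrite
     * the n real inputs in the same buffer. */
    double *buf = fftw_malloc(sizeof(double) * 2 * (n / 2 + 1));
    fftw_complex *out = (fftw_complex *)buf;

    fftw_plan p = fftw_plan_dft_r2c_1d(n, buf, out, FFTW_ESTIMATE);

    for (int i = 0; i < n; i++)            /* fill with arbitrary test data */
        buf[i] = (i % 16) - 7.5;

    fftw_execute(p);                       /* buf now holds n/2+1 complex bins */
    printf("DC bin: %g %+gi\n", out[0][0], out[0][1]);

    fftw_destroy_plan(p);
    fftw_free(buf);
    return 0;
}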
 
J

joseph2k

Jan 1, 1970
0
Steven said:
Sorry, I thought I'd responded to this already.
<snip>
That will teach me not to ignore my copy of CAlgo. Nice post. Thanx.
 
M

MooseFET

Jan 1, 1970
0
I have a question about the FFT. I want to implement the Cooley-Tukey FFT
algorithm, but I must know how much RAM I need. I have N samples (24-bit), so I
need 3xN bytes for input and 3xN bytes for output; how much more do I need?

Elsewhere I got the impression that you intend to use the FFT for a
THD calculation. If this is the case, I have a few observations that
may make your life easier.

(1)
The harmonics in any signal you are likely to measure are much smaller
than the fundamental. They also tend to decrease with frequency. If
you gather one extra point and take the first difference, you will be
doing an FFT on data where the harmonics are boosted in amplitude, so
the resolution issues should be smaller. You can easily scale the
results to correct for the change you have made (see the sketch after
point (4) below).

(2)
Before you window the data, there may be a slow change in the offset
voltage putting a ramp into the data. If you don't do as suggested in
(1), you may want to fit and subtract a straight line or low-order curve
before you do the FFT.

(3)
If the frequency of the sampling and the frequency of the data are not
tightly coupled, you won't know exactly the frequency of the signal.
You can "old style" fit a sin() and cos() to the data. You can tune
the frequency until it matches as well as can be. When you strip out
the fundamental, the upper bits of the numbers will almost certainly
be zeros. You may be able to reduce the bits you work with down to
the natural size of the CPU you have.

(4)
You can do (1) and (3) with a sort of self-tuning IIR filter. This
would mean that you only need to store the differences, which would
lower the storage needs.
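(Added for illustration, a minimal sketch of point (1), not MooseFET's code:
the first difference y[n] = x[n+1] - x[n] boosts the harmonics relative to the
fundamental; the difference filter's gain at bin k is 2*sin(pi*k/N), so the
magnitude spectrum can be divided by that factor afterwards. Bin 0 has no
valid correction.)

#include <math.h>
#include <stddef.h>

/* Take the first difference of n+1 input samples, producing n outputs. */
static void first_difference(const double *x, double *y, size_t n)
{
    for (size_t i = 0; i < n; i++)
        y[i] = x[i + 1] - x[i];
}

/* Undo the difference filter's gain, bin by bin, on the magnitude
 * spectrum of the n-point FFT of y[]. */
static void undo_difference_gain(double *mag, size_t n)
{
    const double pi = 3.14159265358979323846;
    for (size_t k = 1; k < n / 2; k++)
        mag[k] /= 2.0 * sin(pi * (double)k / (double)n);
}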
 