furnace/extern/fftw/doc/FAQ/fftw-faq.ascii

            FFTW FREQUENTLY ASKED QUESTIONS WITH ANSWERS
                            14 Sep 2021
			     Matteo Frigo
			   Steven G. Johnson
 			    <fftw@fftw.org>

This is the list of Frequently Asked Questions about FFTW, a collection of
fast C routines for computing the Discrete Fourier Transform in one or
more dimensions.

===============================================================================

Index

 Section 1.  Introduction and General Information
 Q1.1        What is FFTW?
 Q1.2        How do I obtain FFTW?
 Q1.3        Is FFTW free software?
 Q1.4        What is this about non-free licenses?
 Q1.5        In the West? I thought MIT was in the East?

 Section 2.  Installing FFTW
 Q2.1        Which systems does FFTW run on?
 Q2.2        Does FFTW run on Windows?
 Q2.3        My compiler has trouble with FFTW.
 Q2.4        FFTW does not compile on Solaris, complaining about const.
 Q2.5        What's the difference between --enable-3dnow and --enable-k7?
 Q2.6        What's the difference between the fma and the non-fma versions?
 Q2.7        Which language is FFTW written in?
 Q2.8        Can I call FFTW from Fortran?
 Q2.9        Can I call FFTW from C++?
 Q2.10       Why isn't FFTW written in Fortran/C++?
 Q2.11       How do I compile FFTW to run in single precision?
 Q2.12       --enable-k7 does not work on x86-64

 Section 3.  Using FFTW
 Q3.1        Why not support the FFTW 2 interface in FFTW 3?
 Q3.2        Why do FFTW 3 plans encapsulate the input/output arrays and not ju
 Q3.3        FFTW seems really slow.
 Q3.4        FFTW slows down after repeated calls.
 Q3.5        An FFTW routine is crashing when I call it.
 Q3.6        My Fortran program crashes when calling FFTW.
 Q3.7        FFTW gives results different from my old FFT.
 Q3.8        FFTW gives different results between runs
 Q3.9        Can I save FFTW's plans?
 Q3.10       Why does your inverse transform return a scaled result?
 Q3.11       How can I make FFTW put the origin (zero frequency) at the center 
 Q3.12       How do I FFT an image/audio file in *foobar* format?
 Q3.13       My program does not link (on Unix).
 Q3.14       I included your header, but linking still fails.
 Q3.15       My program crashes, complaining about stack space.
 Q3.16       FFTW seems to have a memory leak.
 Q3.17       The output of FFTW's transform is all zeros.
 Q3.18       How do I call FFTW from the Microsoft language du jour?
 Q3.19       Can I compute only a subset of the DFT outputs?
 Q3.20       Can I use FFTW's routines for in-place and out-of-place matrix tra

 Section 4.  Internals of FFTW
 Q4.1        How does FFTW work?
 Q4.2        Why is FFTW so fast?

 Section 5.  Known bugs
 Q5.1        FFTW 1.1 crashes in rfftwnd on Linux.
 Q5.2        The MPI transforms in FFTW 1.2 give incorrect results/leak memory.
 Q5.3        The test programs in FFTW 1.2.1 fail when I change FFTW to use sin
 Q5.4        The test program in FFTW 1.2.1 fails for n > 46340.
 Q5.5        The threaded code fails on Linux Redhat 5.0
 Q5.6        FFTW 2.0's rfftwnd fails for rank > 1 transforms with a final dime
 Q5.7        FFTW 2.0's complex transforms give the wrong results with prime fa
 Q5.8        FFTW 2.1.1's MPI test programs crash with MPICH.
 Q5.9        FFTW 2.1.2's multi-threaded transforms don't work on AIX.
 Q5.10       FFTW 2.1.2's complex transforms give incorrect results for large p
 Q5.11       FFTW 2.1.3's multi-threaded transforms don't give any speedup on S
 Q5.12       FFTW 2.1.3 crashes on AIX.

===============================================================================

Section 1.  Introduction and General Information

 Q1.1        What is FFTW?
 Q1.2        How do I obtain FFTW?
 Q1.3        Is FFTW free software?
 Q1.4        What is this about non-free licenses?
 Q1.5        In the West? I thought MIT was in the East?

-------------------------------------------------------------------------------

Question 1.1.  What is FFTW?

FFTW is a free collection of fast C routines for computing the Discrete
Fourier Transform in one or more dimensions.  It includes complex, real,
symmetric, and parallel transforms, and can handle arbitrary array sizes
efficiently.  FFTW is typically faster than other publically-available FFT
implementations, and is even competitive with vendor-tuned libraries.
(See our web page for extensive benchmarks.)  To achieve this performance,
FFTW uses novel code-generation and runtime self-optimization techniques
(along with many other tricks).

-------------------------------------------------------------------------------

Question 1.2.  How do I obtain FFTW?

FFTW can be found at the FFTW web page.  You can also retrieve it from
ftp.fftw.org in /pub/fftw.

-------------------------------------------------------------------------------

Question 1.3.  Is FFTW free software?

Starting with version 1.3, FFTW is Free Software in the technical sense
defined by the Free Software Foundation (see Categories of Free and
Non-Free Software), and is distributed under the terms of the GNU General
Public License.  Previous versions of FFTW were distributed without fee
for noncommercial use, but were not technically ``free.''

Non-free licenses for FFTW are also available that permit different terms
of use than the GPL.

-------------------------------------------------------------------------------

Question 1.4.  What is this about non-free licenses?

The non-free licenses are for companies that wish to use FFTW in their
products but are unwilling to release their software under the GPL (which
would require them to release source code and allow free redistribution).
Such users can purchase an unlimited-use license from MIT.  Contact us for
more details.

We could instead have released FFTW under the LGPL, or even disallowed
non-Free usage.  Suffice it to say, however, that MIT owns the copyright
to FFTW and they only let us GPL it because we convinced them that it
would neither affect their licensing revenue nor irritate existing
licensees.

-------------------------------------------------------------------------------

Question 1.5.  In the West? I thought MIT was in the East?

Not to an Italian.  You could say that we're a Spaghetti Western (with
apologies to Sergio Leone).

===============================================================================

Section 2.  Installing FFTW

 Q2.1        Which systems does FFTW run on?
 Q2.2        Does FFTW run on Windows?
 Q2.3        My compiler has trouble with FFTW.
 Q2.4        FFTW does not compile on Solaris, complaining about const.
 Q2.5        What's the difference between --enable-3dnow and --enable-k7?
 Q2.6        What's the difference between the fma and the non-fma versions?
 Q2.7        Which language is FFTW written in?
 Q2.8        Can I call FFTW from Fortran?
 Q2.9        Can I call FFTW from C++?
 Q2.10       Why isn't FFTW written in Fortran/C++?
 Q2.11       How do I compile FFTW to run in single precision?
 Q2.12       --enable-k7 does not work on x86-64

-------------------------------------------------------------------------------

Question 2.1.  Which systems does FFTW run on?

FFTW is written in ANSI C, and should work on any system with a decent C
compiler.  (See also Q2.2 `Does FFTW run on Windows?', Q2.3 `My compiler
has trouble with FFTW.'.) FFTW can also take advantage of certain
hardware-specific features, such as cycle counters and SIMD instructions,
but this is optional.

-------------------------------------------------------------------------------

Question 2.2.  Does FFTW run on Windows?

Yes, many people have reported successfully using FFTW on Windows with
various compilers.  FFTW was not developed on Windows, but the source code
is essentially straight ANSI C.  See also the FFTW Windows installation
notes, Q2.3 `My compiler has trouble with FFTW.', and Q3.18 `How do I call
FFTW from the Microsoft language du jour?'.

-------------------------------------------------------------------------------

Question 2.3.  My compiler has trouble with FFTW.

Complain fiercely to the vendor of the compiler.

We have successfully used gcc 3.2.x on x86 and PPC, a recent Compaq C
compiler for Alpha, version 6 of IBM's xlc compiler for AIX, Intel's icc
versions 5-7, and Sun WorkShop cc version 6.

FFTW is likely to push compilers to their limits, however, and several
compiler bugs have been exposed by FFTW.  A partial list follows.

gcc 2.95.x for Solaris/SPARC produces incorrect code for the test program
(workaround: recompile the libbench2 directory with -O2).

NetBSD/macppc 1.6 comes with a gcc version that also miscompiles the test
program. (Please report a workaround if you know one.)

gcc 3.2.3 for ARM reportedly crashes during compilation.  This bug is
reportedly fixed in later versions of gcc.

Versions 8.0 and 8.1 of Intel's icc falsely claim to be gcc, so you should
specify CC="icc -no-gcc"; this is automatic in FFTW 3.1.  icc-8.0.066
reportely produces incorrect code for FFTW 2.1.5, but is fixed in version
8.1.  icc-7.1 compiler build 20030402Z appears to produce incorrect
dependencies, causing the compilation to fail.  icc-7.1 build 20030307Z
appears to work fine.  (Use icc -V to check which build you have.)  As of
2003/04/18, build 20030402Z appears not to be available any longer on
Intel's website, whereas the older build 20030307Z is available.

ranlib of GNU binutils 2.9.1 on Irix has been observed to corrupt the FFTW
libraries, causing a link failure when FFTW is compiled.  Since ranlib is
completely superfluous on Irix, we suggest deleting it from your system
and replacing it with a symbolic link to /bin/echo.

If support for SIMD instructions is enabled in FFTW, further compiler
problems may appear:

gcc 3.4.[0123] for x86 produces incorrect SSE2 code for FFTW when -O2 (the
best choice for FFTW) is used, causing FFTW to crash (make check crashes).
This bug is fixed in gcc 3.4.4.  On x86_64 (amd64/em64t), gcc 3.4.4
reportedly still has a similar problem, but this is fixed as of gcc 3.4.6.

gcc-3.2 for x86 produces incorrect SIMD code if -O3 is used.  The same
compiler produces incorrect SIMD code if no optimization is used, too.
When using gcc-3.2, it is a good idea not to change the default CFLAGS
selected by the configure script.

Some 3.0.x and 3.1.x versions of gcc on x86 may crash.  gcc so-called 2.96
shipping with RedHat 7.3 crashes when compiling SIMD code.  In both cases,
please upgrade to gcc-3.2 or later.

Intel's icc 6.0 misaligns SSE constants, but FFTW has a workaround. icc
8.x fails to compile FFTW 3.0.x because it falsely claims to be gcc; we
believe this to be a bug in icc, but FFTW 3.1 has a workaround.

Visual C++ 2003 reportedly produces incorrect code for SSE/SSE2 when
compiling FFTW.  This bug was reportedly fixed in VC++ 2005;
alternatively, you could switch to the Intel compiler. VC++ 6.0 also
reportedly produces incorrect code for the file reodft11e-r2hc-odd.c
unless optimizations are disabled for that file.

gcc 2.95 on MacOS X miscompiles AltiVec code (fixed in later versions).
gcc 3.2.x miscompiles AltiVec permutations, but FFTW has a workaround.
gcc 4.0.1 on MacOS for Intel crashes when compiling FFTW; a workaround is
to compile one file without optimization: cd kernel; make CFLAGS=" "
trig.lo.

gcc 4.1.1 reportedly crashes when compiling FFTW for MIPS; the workaround
is to compile the file it crashes on (t2_64.c) with a lower optimization
level.

gcc versions 4.1.2 to 4.2.0 for x86 reportedly miscompile FFTW 3.1's test
program, causing make check to crash (gcc bug #26528).  The bug was
reportedly fixed in gcc version 4.2.1 and later.  A workaround is to
compile libbench2/verify-lib.c without optimization.

-------------------------------------------------------------------------------

Question 2.4.  FFTW does not compile on Solaris, complaining about const.

We know that at least on Solaris 2.5.x with Sun's compilers 4.2 you might
get error messages from make such as

"./fftw.h", line 88: warning: const is a keyword in ANSI C

This is the case when the configure script reports that const does not
work:

checking for working const... (cached) no

You should be aware that Solaris comes with two compilers, namely,
/opt/SUNWspro/SC4.2/bin/cc and /usr/ucb/cc.  The latter compiler is
non-ANSI.  Indeed, it is a perverse shell script that calls the real
compiler in non-ANSI mode.  In order to compile FFTW, change your path so
that the right cc is used.

To know whether your compiler is the right one,  type cc -V.  If the
compiler prints ``ucbcc'', as in

ucbcc: WorkShop Compilers 4.2 30 Oct 1996 C 4.2

then the compiler is wrong.  The right message is something like

cc: WorkShop Compilers 4.2 30 Oct 1996 C 4.2

-------------------------------------------------------------------------------

Question 2.5.  What's the difference between --enable-3dnow and --enable-k7?

--enable-k7 enables 3DNow! instructions on K7 processors (AMD Athlon and
its variants).  K7 support is provided by assembly routines generated by a
special purpose compiler.  As of fftw-3.2, --enable-k7 is no longer
supported.

--enable-3dnow enables generic 3DNow! support using gcc builtin functions.
This works on earlier AMD processors, but it is not as fast as our special
assembly routines.  As of fftw-3.1, --enable-3dnow is no longer supported.

-------------------------------------------------------------------------------

Question 2.6.  What's the difference between the fma and the non-fma versions?

The fma version tries to exploit the fused multiply-add instructions
implemented in many processors such as PowerPC, ia-64, and MIPS.  The two
FFTW packages are otherwise identical.  In FFTW 3.1, the fma and non-fma
versions were merged together into a single package, and the configure
script attempts to automatically guess which version to use.

The FFTW 3.1 configure script enables fma by default on PowerPC, Itanium,
and PA-RISC, and disables it otherwise.  You can force one or the other by
using the --enable-fma or --disable-fma flag for configure.

Definitely use fma if you have a PowerPC-based system with gcc (or IBM
xlc).  This includes all GNU/Linux systems for PowerPC and the older
PowerPC-based MacOS systems.  Also use it on PA-RISC and Itanium with the
HP/UX compiler.

Definitely do not use the fma version if you have an ia-32 processor
(Intel, AMD, MacOS on Intel, etcetera).

For other architectures/compilers, the situation is not so clear.  For
example, ia-64 has the fma instruction, but gcc-3.2 appears not to exploit
it correctly.  Other compilers may do the right thing, but we have not
tried them.  Please send us your feedback so that we can update this FAQ
entry.

-------------------------------------------------------------------------------

Question 2.7.  Which language is FFTW written in?

FFTW is written in ANSI C.  Most of the code, however, was automatically
generated by a program called genfft, written in the Objective Caml
dialect of ML.  You do not need to know ML or to have an Objective Caml
compiler in order to use FFTW.

genfft is provided with the FFTW sources, which means that you can play
with the code generator if you want.  In this case, you need a working
Objective Caml system.  Objective Caml is available from the Caml web
page.

-------------------------------------------------------------------------------

Question 2.8.  Can I call FFTW from Fortran?

Yes, FFTW (versions 1.3 and higher) contains a Fortran-callable interface,
documented in the FFTW manual.

By default, FFTW configures its Fortran interface to work with the first
compiler it finds, e.g. g77.  To configure for a different, incompatible
Fortran compiler foobar, use ./configure F77=foobar when installing FFTW.
(In the case of g77, however, FFTW 3.x also includes an extra set of
Fortran-callable routines with one less underscore at the end of
identifiers, which should cover most other Fortran compilers on Linux at
least.)

-------------------------------------------------------------------------------

Question 2.9.  Can I call FFTW from C++?

Most definitely.  FFTW should compile and/or link under any C++ compiler.
Moreover, it is likely that the C++ <complex> template class is
bit-compatible with FFTW's complex-number format (see the FFTW manual for
more details).

-------------------------------------------------------------------------------

Question 2.10.  Why isn't FFTW written in Fortran/C++?

Because we don't like those languages, and neither approaches the
portability of C.

-------------------------------------------------------------------------------

Question 2.11.  How do I compile FFTW to run in single precision?

On a Unix system: configure --enable-float.  On a non-Unix system: edit
config.h to #define the symbol FFTW_SINGLE (for FFTW 3.x).  In both cases,
you must then recompile FFTW.  In FFTW 3, all FFTW identifiers will then
begin with fftwf_ instead of fftw_.

-------------------------------------------------------------------------------

Question 2.12.  --enable-k7 does not work on x86-64

Support for --enable-k7 was discontinued in fftw-3.2.

The fftw-3.1 release supports --enable-k7.  This option only works on
32-bit x86 machines that implement 3DNow!, including the AMD Athlon and
the AMD Opteron in 32-bit mode.  --enable-k7 does not work on AMD Opteron
in 64-bit mode.  Use --enable-sse for x86-64 machines.

FFTW supports 3DNow! by means of assembly code generated by a
special-purpose compiler.  It is hard to produce assembly code that works
in both 32-bit and 64-bit mode.

===============================================================================

Section 3.  Using FFTW

 Q3.1        Why not support the FFTW 2 interface in FFTW 3?
 Q3.2        Why do FFTW 3 plans encapsulate the input/output arrays and not ju
 Q3.3        FFTW seems really slow.
 Q3.4        FFTW slows down after repeated calls.
 Q3.5        An FFTW routine is crashing when I call it.
 Q3.6        My Fortran program crashes when calling FFTW.
 Q3.7        FFTW gives results different from my old FFT.
 Q3.8        FFTW gives different results between runs
 Q3.9        Can I save FFTW's plans?
 Q3.10       Why does your inverse transform return a scaled result?
 Q3.11       How can I make FFTW put the origin (zero frequency) at the center 
 Q3.12       How do I FFT an image/audio file in *foobar* format?
 Q3.13       My program does not link (on Unix).
 Q3.14       I included your header, but linking still fails.
 Q3.15       My program crashes, complaining about stack space.
 Q3.16       FFTW seems to have a memory leak.
 Q3.17       The output of FFTW's transform is all zeros.
 Q3.18       How do I call FFTW from the Microsoft language du jour?
 Q3.19       Can I compute only a subset of the DFT outputs?
 Q3.20       Can I use FFTW's routines for in-place and out-of-place matrix tra

-------------------------------------------------------------------------------

Question 3.1.  Why not support the FFTW 2 interface in FFTW 3?

FFTW 3 has semantics incompatible with earlier versions: its plans can
only be used for a given stride, multiplicity, and other characteristics
of the input and output arrays; these stronger semantics are necessary for
performance reasons.  Thus, it is impossible to efficiently emulate the
older interface (whose plans can be used for any transform of the same
size).  We believe that it should be possible to upgrade most programs
without any difficulty, however.

-------------------------------------------------------------------------------

Question 3.2.  Why do FFTW 3 plans encapsulate the input/output arrays and not just the algorithm?

There are several reasons:

* It was important for performance reasons that the plan be specific to
  array characteristics like the stride (and alignment, for SIMD), and
  requiring that the user maintain these invariants is error prone.
* In most high-performance applications, as far as we can tell, you are
  usually transforming the same array over and over, so FFTW's semantics
  should not be a burden.
* If you need to transform another array of the same size, creating a new
  plan once the first exists is a cheap operation.
* If you need to transform many arrays of the same size at once, you
  should really use the plan_many routines in FFTW's "advanced" interface.
* If the abovementioned array characteristics are the same, you are
  willing to pay close attention to the documentation, and you really need
  to, we provide a "new-array execution" interface to apply a plan to a
  new array.

-------------------------------------------------------------------------------

Question 3.3.  FFTW seems really slow.

You are probably recreating the plan before every transform, rather than
creating it once and reusing it for all transforms of the same size.  FFTW
is designed to be used in the following way:

* First, you create a plan.  This will take several seconds.
* Then, you reuse the plan many times to perform FFTs.  These are fast.

If you don't need to compute many transforms and the time for the planner
is significant, you have two options.  First, you can use the
FFTW_ESTIMATE option in the planner, which uses heuristics instead of
runtime measurements and produces a good plan in a short time.  Second,
you can use the wisdom feature to precompute the plan; see Q3.9 `Can I
save FFTW's plans?'

-------------------------------------------------------------------------------

Question 3.4.  FFTW slows down after repeated calls.

Probably, NaNs or similar are creeping into your data, and the slowdown is
due to the resulting floating-point exceptions.  For example, be aware
that repeatedly FFTing the same array is a diverging process (because FFTW
computes the unnormalized transform).

-------------------------------------------------------------------------------

Question 3.5.  An FFTW routine is crashing when I call it.

Did the FFTW test programs pass (make check, or cd tests; make bigcheck if
you want to be paranoid)?  If so, you almost certainly have a bug in your
own code.  For example, you could be passing invalid arguments (such as
wrongly-sized arrays) to FFTW, or you could simply have memory corruption
elsewhere in your program that causes random crashes later on.  Please
don't complain to us unless you can come up with a minimal self-contained
program (preferably under 30 lines) that illustrates the problem.

-------------------------------------------------------------------------------

Question 3.6.  My Fortran program crashes when calling FFTW.

As described in the manual, on 64-bit machines you must store the plans in
variables large enough to hold a pointer, for example integer*8.  We
recommend using integer*8 on 32-bit machines as well, to simplify porting.

-------------------------------------------------------------------------------

Question 3.7.  FFTW gives results different from my old FFT.

People follow many different conventions for the DFT, and you should be
sure to know the ones that we use (described in the FFTW manual).  In
particular, you should be aware that the FFTW_FORWARD/FFTW_BACKWARD
directions correspond to signs of -1/+1 in the exponent of the DFT
definition.  (*Numerical Recipes* uses the opposite convention.)

You should also know that we compute an unnormalized transform.  In
contrast, Matlab is an example of program that computes a normalized
transform.  See Q3.10 `Why does your inverse transform return a scaled
result?'.

Finally, note that floating-point arithmetic is not exact, so different
FFT algorithms will give slightly different results (on the order of the
numerical accuracy; typically a fractional difference of 1e-15 or so in
double precision).

-------------------------------------------------------------------------------

Question 3.8.  FFTW gives different results between runs

If you use FFTW_MEASURE or FFTW_PATIENT mode, then the algorithm FFTW
employs is not deterministic: it depends on runtime performance
measurements.  This will cause the results to vary slightly from run to
run.  However, the differences should be slight, on the order of the
floating-point precision, and therefore should have no practical impact on
most applications.

If you use saved plans (wisdom) or FFTW_ESTIMATE mode, however, then the
algorithm is deterministic and the results should be identical between
runs.

-------------------------------------------------------------------------------

Question 3.9.  Can I save FFTW's plans?

Yes. Starting with version 1.2, FFTW provides the wisdom mechanism for
saving plans; see the FFTW manual.

-------------------------------------------------------------------------------

Question 3.10.  Why does your inverse transform return a scaled result?

Computing the forward transform followed by the backward transform (or
vice versa) yields the original array scaled by the size of the array.
(For multi-dimensional transforms, the size of the array is the product of
the dimensions.)  We could, instead, have chosen a normalization that
would have returned the unscaled array. Or, to accomodate the many
conventions in this matter, the transform routines could have accepted a
"scale factor" parameter. We did not do this, however, for two reasons.
First, we didn't want to sacrifice performance in the common case where
the scale factor is 1. Second, in real applications the FFT is followed or
preceded by some computation on the data, into which the scale factor can
typically be absorbed at little or no cost.

-------------------------------------------------------------------------------

Question 3.11.  How can I make FFTW put the origin (zero frequency) at the center of its output?

For human viewing of a spectrum, it is often convenient to put the origin
in frequency space at the center of the output array, rather than in the
zero-th element (the default in FFTW).  If all of the dimensions of your
array are even, you can accomplish this by simply multiplying each element
of the input array by (-1)^(i + j + ...), where i, j, etcetera are the
indices of the element.  (This trick is a general property of the DFT, and
is not specific to FFTW.)

-------------------------------------------------------------------------------

Question 3.12.  How do I FFT an image/audio file in *foobar* format?

FFTW performs an FFT on an array of floating-point values.  You can
certainly use it to compute the transform of an image or audio stream, but
you are responsible for figuring out your data format and converting it to
the form FFTW requires.

-------------------------------------------------------------------------------

Question 3.13.  My program does not link (on Unix).

The libraries must be listed in the correct order (-lfftw3 -lm for FFTW
3.x) and *after* your program sources/objects.  (The general rule is that
if *A* uses *B*, then *A* must be listed before *B* in the link command.).

-------------------------------------------------------------------------------

Question 3.14.  I included your header, but linking still fails.

You're a C++ programmer, aren't you?  You have to compile the FFTW library
and link it into your program, not just #include <fftw3.h>.  (Yes, this is
really a FAQ.)

-------------------------------------------------------------------------------

Question 3.15.  My program crashes, complaining about stack space.

You cannot declare large arrays with automatic storage (e.g. via
fftw_complex array[N]); you should use fftw_malloc (or equivalent) to
allocate the arrays you want to transform if they are larger than a few
hundred elements.

-------------------------------------------------------------------------------

Question 3.16.  FFTW seems to have a memory leak.

After you create a plan, FFTW caches the information required to quickly
recreate the plan.  (See Q3.9 `Can I save FFTW's plans?') It also
maintains a small amount of other persistent memory.  You can deallocate
all of FFTW's internally allocated memory, if you wish, by calling
fftw_cleanup(), as documented in the manual.

-------------------------------------------------------------------------------

Question 3.17.  The output of FFTW's transform is all zeros.

You should initialize your input array *after* creating the plan, unless
you use FFTW_ESTIMATE: planning with FFTW_MEASURE or FFTW_PATIENT
overwrites the input/output arrays, as described in the manual.

-------------------------------------------------------------------------------

Question 3.18.  How do I call FFTW from the Microsoft language du jour?

Please *do not* ask us Windows-specific questions.  We do not use Windows.
We know nothing about Visual Basic, Visual C++, or .NET.  Please find the
appropriate Usenet discussion group and ask your question there.  See also
Q2.2 `Does FFTW run on Windows?'.

-------------------------------------------------------------------------------

Question 3.19.  Can I compute only a subset of the DFT outputs?

In general, no, an FFT intrinsically computes all outputs from all inputs.
In principle, there is something called a *pruned FFT* that can do what
you want, but to compute K outputs out of N the complexity is in general
O(N log K) instead of O(N log N), thus saving only a small additive factor
in the log.  (The same argument holds if you instead have only K nonzero
inputs.)

There are some specific cases in which you can get the O(N log K)
performance benefits easily, however, by combining a few ordinary FFTs.
In particular, the case where you want the first K outputs, where K
divides N, can be handled by performing N/K transforms of size K and then
summing the outputs multiplied by appropriate phase factors.  For more
details, see pruned FFTs with FFTW.

There are also some algorithms that compute pruned transforms
*approximately*, but they are beyond the scope of this FAQ.

-------------------------------------------------------------------------------

Question 3.20.  Can I use FFTW's routines for in-place and out-of-place matrix transposition?

You can use the FFTW guru interface to create a rank-0 transform of vector
rank 2 where the vector strides are transposed.  (A rank-0 transform is
equivalent to a 1D transform of size 1, which.  just copies the input into
the output.)  Specifying the same location for the input and output makes
the transpose in-place.

For double-valued data stored in row-major format, plan creation looks
like this:

fftw_plan plan_transpose(int rows, int cols, double *in, double *out)
{
    const unsigned flags = FFTW_ESTIMATE; /* other flags are possible */
    fftw_iodim howmany_dims[2];

    howmany_dims[0].n  = rows;
    howmany_dims[0].is = cols;
    howmany_dims[0].os = 1;

    howmany_dims[1].n  = cols;
    howmany_dims[1].is = 1;
    howmany_dims[1].os = rows;

    return fftw_plan_guru_r2r(/*rank=*/ 0, /*dims=*/ NULL,
                              /*howmany_rank=*/ 2, howmany_dims,
                              in, out, /*kind=*/ NULL, flags);
}
(This entry was written by Rhys Ulerich.)

===============================================================================

Section 4.  Internals of FFTW

 Q4.1        How does FFTW work?
 Q4.2        Why is FFTW so fast?

-------------------------------------------------------------------------------

Question 4.1.  How does FFTW work?

The innovation (if it can be so called) in FFTW consists in having a
variety of composable *solvers*, representing different FFT algorithms and
implementation strategies, whose combination into a particular *plan* for
a given size can be determined at runtime according to the characteristics
of your machine/compiler.  This peculiar software architecture allows FFTW
to adapt itself to almost any machine.

For more details (albeit somewhat outdated), see the paper "FFTW: An
Adaptive Software Architecture for the FFT", by M. Frigo and S. G.
Johnson, *Proc. ICASSP* 3, 1381 (1998), also available at the FFTW web
page.

-------------------------------------------------------------------------------

Question 4.2.  Why is FFTW so fast?

This is a complex question, and there is no simple answer.  In fact, the
authors do not fully know the answer, either.  In addition to many small
performance hacks throughout FFTW, there are three general reasons for
FFTW's speed.

* 	FFTW uses a variety of FFT algorithms and implementation styles that
  can be arbitrarily composed to adapt itself to a machine.  See Q4.1 `How
  does FFTW work?'.
* 	FFTW uses a code generator to produce highly-optimized routines for
  computing small transforms.
* 	FFTW uses explicit divide-and-conquer to take advantage of the memory
  hierarchy.

For more details (albeit somewhat outdated), see the paper "FFTW: An
Adaptive Software Architecture for the FFT", by M. Frigo and S. G.
Johnson, *Proc. ICASSP* 3, 1381 (1998), available along with other
references at the FFTW web page.

===============================================================================

Section 5.  Known bugs

 Q5.1        FFTW 1.1 crashes in rfftwnd on Linux.
 Q5.2        The MPI transforms in FFTW 1.2 give incorrect results/leak memory.
 Q5.3        The test programs in FFTW 1.2.1 fail when I change FFTW to use sin
 Q5.4        The test program in FFTW 1.2.1 fails for n > 46340.
 Q5.5        The threaded code fails on Linux Redhat 5.0
 Q5.6        FFTW 2.0's rfftwnd fails for rank > 1 transforms with a final dime
 Q5.7        FFTW 2.0's complex transforms give the wrong results with prime fa
 Q5.8        FFTW 2.1.1's MPI test programs crash with MPICH.
 Q5.9        FFTW 2.1.2's multi-threaded transforms don't work on AIX.
 Q5.10       FFTW 2.1.2's complex transforms give incorrect results for large p
 Q5.11       FFTW 2.1.3's multi-threaded transforms don't give any speedup on S
 Q5.12       FFTW 2.1.3 crashes on AIX.

-------------------------------------------------------------------------------

Question 5.1.  FFTW 1.1 crashes in rfftwnd on Linux.

This bug was fixed in FFTW 1.2.  There was a bug in rfftwnd causing an
incorrect amount of memory to be allocated.  The bug showed up in Linux
with libc-5.3.12 (and nowhere else that we know of).

-------------------------------------------------------------------------------

Question 5.2.  The MPI transforms in FFTW 1.2 give incorrect results/leak memory.

These bugs were corrected in FFTW 1.2.1.  The MPI transforms (really, just
the transpose routines) in FFTW 1.2 had bugs that could cause errors in
some situations.

-------------------------------------------------------------------------------

Question 5.3.  The test programs in FFTW 1.2.1 fail when I change FFTW to use single precision.

This bug was fixed in FFTW 1.3.  (Older versions of FFTW did work in
single precision, but the test programs didn't--the error tolerances in
the tests were set for double precision.)

-------------------------------------------------------------------------------

Question 5.4.  The test program in FFTW 1.2.1 fails for n > 46340.

This bug was fixed in FFTW 1.3.  FFTW 1.2.1 produced the right answer, but
the test program was wrong.  For large n, n*n in the naive transform that
we used for comparison overflows 32 bit integer precision, breaking the
test.

-------------------------------------------------------------------------------

Question 5.5.  The threaded code fails on Linux Redhat 5.0

We had problems with glibc-2.0.5.  The code should work with glibc-2.0.7.

-------------------------------------------------------------------------------

Question 5.6.  FFTW 2.0's rfftwnd fails for rank > 1 transforms with a final dimension >= 65536.

This bug was fixed in FFTW 2.0.1.  (There was a 32-bit integer overflow
due to a poorly-parenthesized expression.)

-------------------------------------------------------------------------------

Question 5.7.  FFTW 2.0's complex transforms give the wrong results with prime factors 17 to 97.

There was a bug in the complex transforms that could cause incorrect
results under (hopefully rare) circumstances for lengths with
intermediate-size prime factors (17-97).  This bug was fixed in FFTW
2.1.1.

-------------------------------------------------------------------------------

Question 5.8.  FFTW 2.1.1's MPI test programs crash with MPICH.

This bug was fixed in FFTW 2.1.2.  The 2.1/2.1.1 MPI test programs crashed
when using the MPICH implementation of MPI with the ch_p4 device (TCP/IP);
the transforms themselves worked fine.

-------------------------------------------------------------------------------

Question 5.9.  FFTW 2.1.2's multi-threaded transforms don't work on AIX.

This bug was fixed in FFTW 2.1.3.  The multi-threaded transforms in
previous versions didn't work with AIX's pthreads implementation, which
idiosyncratically creates threads in detached (non-joinable) mode by
default.

-------------------------------------------------------------------------------

Question 5.10.  FFTW 2.1.2's complex transforms give incorrect results for large prime sizes.

This bug was fixed in FFTW 2.1.3.  FFTW's complex-transform algorithm for
prime sizes (in versions 2.0 to 2.1.2) had an integer overflow problem
that caused incorrect results for many primes greater than 32768 (on
32-bit machines).  (Sizes without large prime factors are not affected.)

-------------------------------------------------------------------------------

Question 5.11.  FFTW 2.1.3's multi-threaded transforms don't give any speedup on Solaris.

This bug was fixed in FFTW 2.1.4.  (By default, Solaris creates threads
that do not parallelize over multiple processors, so one has to request
the proper behavior specifically.)

-------------------------------------------------------------------------------

Question 5.12.  FFTW 2.1.3 crashes on AIX.

The FFTW 2.1.3 configure script picked incorrect compiler flags for the
xlc compiler on newer IBM processors.  This is fixed in FFTW 2.1.4.