@node Installation and Customization, Acknowledgments, Upgrading from FFTW version 2, Top
@chapter Installation and Customization
@cindex installation

This chapter describes the installation and customization of FFTW, the
latest version of which may be downloaded from
@uref{http://www.fftw.org, the FFTW home page}.

In principle, FFTW should work on any system with an ANSI C compiler
(@code{gcc} is fine).  However, planner time is drastically reduced if
FFTW can exploit a hardware cycle counter; FFTW comes with cycle-counter
support for all modern general-purpose CPUs, but you may need to add a
couple of lines of code if your compiler is not yet supported
(@pxref{Cycle Counters}).  (On Unix, there will be a warning at the end
of the @code{configure} output if no cycle counter is found.)
@cindex cycle counter
@cindex compiler
@cindex portability


Installation of FFTW is simplest if you have a Unix or a GNU system,
such as GNU/Linux, and we describe this case in the first section below,
including the use of special configuration options to e.g. install
different precisions or exploit optimizations for particular
architectures (e.g. SIMD).  Compilation on non-Unix systems is a more
manual process, but we outline the procedure in the second section.  It
is also likely that pre-compiled binaries will be available for popular
systems.

Finally, we describe how you can customize FFTW for particular needs by
generating @emph{codelets} for fast transforms of sizes not supported
efficiently by the standard FFTW distribution.
@cindex codelet

@menu
* Installation on Unix::
* Installation on non-Unix systems::
* Cycle Counters::
* Generating your own code::
@end menu

@c ------------------------------------------------------------

@node Installation on Unix, Installation on non-Unix systems, Installation and Customization, Installation and Customization
@section Installation on Unix

FFTW comes with a @code{configure} program in the GNU style.
Installation can be as simple as:
@fpindex configure

@example
./configure
make
make install
@end example

This will build the uniprocessor complex and real transform libraries
along with the test programs.  (We recommend that you use GNU
@code{make} if it is available; on some systems it is called
@code{gmake}.)  The ``@code{make install}'' command installs the FFTW
libraries in standard places, and typically requires root
privileges (unless you specify a different install directory with the
@code{--prefix} flag to @code{configure}).  You can also type
``@code{make check}'' to put the FFTW test programs through their paces.
If you have problems during configuration or compilation, you may want
to run ``@code{make distclean}'' before trying again; this ensures that
you don't have any stale files left over from previous compilation
attempts.
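
For instance, a build that avoids root privileges might look like the
following sketch.  The install directory @code{$HOME/fftw} is only a
placeholder of our choosing; any directory you can write to will do.

@example
# Run from the top of the unpacked FFTW source tree.
./configure --prefix="$HOME/fftw"
make
make check      # optional: exercises the test programs
make install    # installs under $HOME/fftw, so root is not needed
@end example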

The @code{configure} script chooses the @code{gcc} compiler by default,
if it is available; you can select some other compiler with:
@example
./configure CC="@r{@i{<the name of your C compiler>}}"
@end example

The @code{configure} script knows good @code{CFLAGS} (C compiler flags)
@cindex compiler flags
for a few systems.  If your system is not known, the @code{configure}
script will print out a warning.  In this case, you should re-configure
FFTW with the command
@example
./configure CFLAGS="@r{@i{<write your CFLAGS here>}}"
@end example
and then compile as usual.  If you do find an optimal set of
@code{CFLAGS} for your system, please let us know what they are (along
with the output of @code{config.guess}) so that we can include them in
future releases.

@code{configure} supports all the standard flags defined by the GNU
Coding Standards; see the @code{INSTALL} file in FFTW or
@uref{http://www.gnu.org/prep/standards/html_node/index.html, the GNU web page}.
Note especially @code{--help} to list all flags and
@code{--enable-shared} to create shared, rather than static, libraries.
@code{configure} also accepts a few FFTW-specific flags, particularly:

@itemize @bullet

@item
@cindex precision
@code{--enable-float}: Produces a single-precision version of FFTW
(@code{float}) instead of the default double-precision (@code{double}).
@xref{Precision}.

@item
@cindex precision
@code{--enable-long-double}: Produces a long-double precision version of
FFTW (@code{long double}) instead of the default double-precision
(@code{double}).  The @code{configure} script will halt with an error
message if @code{long double} is the same size as @code{double} on your
machine/compiler.  @xref{Precision}.

@item
@cindex precision
@code{--enable-quad-precision}: Produces a quadruple-precision version
of FFTW using the nonstandard @code{__float128} type provided by
@code{gcc} 4.6 or later on x86, x86-64, and Itanium architectures,
instead of the default double-precision (@code{double}).  The
@code{configure} script will halt with an error message if the
compiler is not @code{gcc} version 4.6 or later or if @code{gcc}'s
@code{libquadmath} library is not installed.  @xref{Precision}.

@item
@cindex threads
@code{--enable-threads}: Enables compilation and installation of the
FFTW threads library (@pxref{Multi-threaded FFTW}), which provides a
simple interface to parallel transforms for SMP systems.  By default,
the threads routines are not compiled.

@item
@code{--enable-openmp}: Like @code{--enable-threads}, but using OpenMP
compiler directives in order to induce parallelism rather than
spawning its own threads directly, and installing an @samp{fftw3_omp} library
rather than an @samp{fftw3_threads} library (@pxref{Multi-threaded
FFTW}).  You can use both @code{--enable-openmp} and @code{--enable-threads}
since they compile/install libraries with different names.  By default,
the OpenMP routines are not compiled.

@item
@code{--with-combined-threads}: By default, if @code{--enable-threads}
is used, the threads support is compiled into a separate library that
must be linked in addition to the main FFTW library.  This is so that
users of the serial library do not need to link the system threads
libraries.  If @code{--with-combined-threads} is specified, however,
then no separate threads library is created, and threads are included
in the main FFTW library.  This is mainly useful under Windows, where
no system threads library is required and inter-library dependencies
are problematic.

@item
@cindex MPI
@code{--enable-mpi}: Enables compilation and installation of the FFTW
MPI library (@pxref{Distributed-memory FFTW with MPI}), which provides
parallel transforms for distributed-memory systems with MPI.  (By
default, the MPI routines are not compiled.)  @xref{FFTW MPI
Installation}.

@item
@cindex Fortran-callable wrappers
@code{--disable-fortran}: Disables inclusion of legacy-Fortran
wrapper routines (@pxref{Calling FFTW from Legacy Fortran}) in the standard
FFTW libraries.  These wrapper routines increase the library size by
only a negligible amount, so they are included by default as long as
the @code{configure} script finds a Fortran compiler on your system.
(To specify a particular Fortran compiler @i{foo}, pass
@code{F77=}@i{foo} to @code{configure}.)

@item
@code{--with-g77-wrappers}: By default, when Fortran wrappers are
included, the wrappers employ the linking conventions of the Fortran
compiler detected by the @code{configure} script.  If this compiler is
GNU @code{g77}, however, then @emph{two} versions of the wrappers are
included: one with @code{g77}'s idiosyncratic convention of appending
two underscores to identifiers, and one with the more common
convention of appending only a single underscore.  This way, the same
FFTW library will work with both @code{g77} and other Fortran
compilers, such as GNU @code{gfortran}.  However, the converse is not
true: if you configure with a different compiler, then the
@code{g77}-compatible wrappers are not included.  By specifying
@code{--with-g77-wrappers}, the @code{g77}-compatible wrappers are
included in addition to wrappers for whatever Fortran compiler
@code{configure} finds.
@fpindex g77

@item
@code{--with-slow-timer}: Disables the use of hardware cycle counters,
and falls back on @code{gettimeofday} or @code{clock}.  This greatly
increases planning time, and should generally not be used (unless you don't
have a cycle counter but still really want an optimized plan regardless
of the time).  @xref{Cycle Counters}.

@item
@code{--enable-sse} (single precision),
@code{--enable-sse2} (single, double),
@code{--enable-avx} (single, double),
@code{--enable-avx2} (single, double),
@code{--enable-avx512} (single, double),
@code{--enable-avx-128-fma},
@code{--enable-kcvi} (single),
@code{--enable-altivec} (single),
@code{--enable-vsx} (single, double),
@code{--enable-neon} (single, double on aarch64),
@code{--enable-generic-simd128},
and
@code{--enable-generic-simd256}:

Enable various SIMD instruction sets.  You need a compiler that supports
the given SIMD extensions, but FFTW will try to detect at runtime
whether the CPU supports these extensions.  That is, you can compile
with @code{--enable-avx} and the code will still run on a CPU without AVX
support.

@itemize @minus
@item
These options require a compiler supporting SIMD extensions, and
compiler support is always a bit flaky: see the FFTW FAQ for a list of
compiler versions that have problems compiling FFTW.
@item
Because of the large variety of ARM processors and ABIs, FFTW
does not attempt to guess the correct @code{gcc} flags for generating
NEON code.  In general, you will have to provide them on the command line.
This command line is known to have worked at least once:
@example
./configure --with-slow-timer --host=arm-linux-gnueabi \
  --enable-single --enable-neon \
  "CC=arm-linux-gnueabi-gcc -march=armv7-a -mfloat-abi=softfp"
@end example
@end itemize

@end itemize
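
As a sketch of how these flags combine, the following sequence builds
and installs the default double-precision library and then a
single-precision one, each as a shared library with threads support.
Each precision needs its own @code{configure} run; we assume here that
both are built in the same source tree, hence the
@code{make distclean} in between.  The particular flag combination is
only an illustration.

@example
# Double precision (the default).
./configure --enable-shared --enable-threads
make
make install

# Remove the previous build, then build single precision.
make distclean
./configure --enable-shared --enable-threads --enable-float
make
make install
@end example

The two builds install libraries under different names
(@samp{fftw3} and @samp{fftw3f}), so they can coexist in the same
install directory.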

@cindex compiler
To force @code{configure} to use a particular C compiler @i{foo}
(instead of the default, usually @code{gcc}), pass @code{CC=}@i{foo} to the 
@code{configure} script; you may also need to set the flags via the variable
@code{CFLAGS} as described above.
@cindex compiler flags

@c ------------------------------------------------------------
@node Installation on non-Unix systems, Cycle Counters, Installation on Unix, Installation and Customization
@section Installation on non-Unix systems

It should be relatively straightforward to compile FFTW even on non-Unix
systems lacking the niceties of a @code{configure} script.  Basically,
you need to edit the @code{config.h} header (copy it from
@code{config.h.in}) to @code{#define} the various options and compiler
characteristics, and then compile all the @samp{.c} files in the
relevant directories.  

The @code{config.h} header contains about 100 options to set, each one
initially an @code{#undef}, each documented with a comment, and most of
them fairly obvious.  For most of the options, you should simply
@code{#define} them to @code{1} if they are applicable, although a few
options require a particular value (e.g. @code{SIZEOF_LONG_LONG} should
be defined to the size of the @code{long long} type, in bytes, or zero
if it is not supported).  We will likely post some sample
@code{config.h} files for various operating systems and compilers for
you to use (at least as a starting point).  Please let us know if you
have to hand-create a configuration file (and/or a pre-compiled binary)
that you want to share.

To create the FFTW library, you will then need to compile all of the
@samp{.c} files in the @code{kernel}, @code{dft}, @code{dft/scalar},
@code{dft/scalar/codelets}, @code{rdft}, @code{rdft/scalar},
@code{rdft/scalar/r2cf}, @code{rdft/scalar/r2cb},
@code{rdft/scalar/r2r}, @code{reodft}, and @code{api} directories.
If you are compiling with SIMD support (e.g. you defined
@code{HAVE_SSE2} in @code{config.h}), then you also need to compile
the @code{.c} files in the @code{simd-support},
@code{@{dft,rdft@}/simd}, and @code{@{dft,rdft@}/simd/*} directories.

Once these files are all compiled, link them into a library, or a shared
library, or directly into your program.

To compile the FFTW test program, additionally compile the code in the
@code{libbench2/} directory, and link it into a library.  Then compile
the code in the @code{tests/} directory and link it to the
@code{libbench2} and FFTW libraries.  To compile the @code{fftw-wisdom}
(command-line) tool (@pxref{Wisdom Utilities}), compile
@code{tools/fftw-wisdom.c} and link it to the @code{libbench2} and FFTW
libraries.

@c ------------------------------------------------------------
@node Cycle Counters, Generating your own code, Installation on non-Unix systems, Installation and Customization
@section Cycle Counters
@cindex cycle counter

FFTW's planner actually executes and times different possible FFT
algorithms in order to pick the fastest plan for a given @math{n}.  In
order to do this in as short a time as possible, however, the timer must
have a very high resolution, and to accomplish this we employ the
hardware @dfn{cycle counters} that are available on most CPUs.
Currently, FFTW supports the cycle counters on x86, PowerPC/POWER, Alpha,
UltraSPARC (SPARC v9), IA64, PA-RISC, and MIPS processors.

@cindex compiler
Access to the cycle counters, unfortunately, is a compiler and/or
operating-system dependent task, often requiring inline assembly
language, and it may be that your compiler is not supported.  If you are
@emph{not} supported, FFTW will by default fall back on its estimator
(effectively using @code{FFTW_ESTIMATE} for all plans).
@ctindex FFTW_ESTIMATE

You can add support by editing the file @code{kernel/cycle.h}; normally,
this will involve adapting one of the examples already present in order
to use the inline-assembler syntax for your C compiler, and will only
require a couple of lines of code.  Anyone adding support for a new
system to @code{cycle.h} is encouraged to email us at @email{fftw@@fftw.org}.

If a cycle counter is not available on your system (e.g. some embedded
processor), and you don't want to use estimated plans, as a last resort
you can use the @code{--with-slow-timer} option to @code{configure} (on
Unix) or @code{#define WITH_SLOW_TIMER} in @code{config.h} (elsewhere).
This will use the much lower-resolution @code{gettimeofday} function, or even
@code{clock} if the former is unavailable, and planning will be
extremely slow.
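
On Unix, one way to check for the warning and fall back on the slow
timer is sketched below.  The log file name @code{configure.log} is
arbitrary, and the search string is an assumption about the wording of
the warning, so inspect the end of the output yourself if nothing
matches.

@example
# Keep a copy of the configure output and search it.
./configure 2>&1 | tee configure.log
grep -i "cycle counter" configure.log

# Only if no cycle counter was found and estimated plans are not
# acceptable: reconfigure with the low-resolution timer.
./configure --with-slow-timer
make
@end example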

@c ------------------------------------------------------------
@node Generating your own code,  , Cycle Counters, Installation and Customization
@section Generating your own code
@cindex code generator

The directory @code{genfft} contains the programs that were used to
generate FFTW's ``codelets,'' which are hard-coded transforms of small
sizes.
@cindex codelet
We do not expect casual users to employ the generator, which is a rather
sophisticated program that generates directed acyclic graphs of FFT
algorithms and performs algebraic simplifications on them.  It was
written in Objective Caml, a dialect of ML, which is available at
@uref{http://caml.inria.fr/ocaml/index.en.html}.
@cindex Caml


If you have Objective Caml installed (along with recent versions of
GNU @code{autoconf}, @code{automake}, and @code{libtool}), then you
can change the set of codelets that are generated or play with the
generation options.  The set of generated codelets is specified by the
@code{@{dft,rdft@}/@{scalar,simd@}/*/Makefile.am} files.  For example, you can add
efficient REDFT codelets of small sizes by modifying
@code{rdft/scalar/r2r/Makefile.am}.
@cindex REDFT
After you modify any @code{Makefile.am} files, you can type @code{sh
bootstrap.sh} in the top-level directory followed by @code{make} to
re-generate the files.
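
The whole cycle might look like the following sketch.  The version
checks are merely a convenient way to confirm that the tools are
installed; the name of the libtool helper varies between systems
(we assume @code{libtoolize} here).

@example
# Confirm that the required tools are present.
ocaml -version
autoconf --version
automake --version
libtoolize --version

# After editing one or more Makefile.am files:
sh bootstrap.sh
make
@end example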

We do not provide more details about the code-generation process, since
we do not expect that most users will need to generate their own code.
However, feel free to contact us at @email{fftw@@fftw.org} if
you are interested in the subject.

@cindex monadic programming
You might find it interesting to learn Caml and/or some modern
programming techniques that we used in the generator (including monadic
programming), especially if you heard the rumor that Java and
object-oriented programming are the latest advancement in the field.
The internal operation of the codelet generator is described in the
paper, ``A Fast Fourier Transform Compiler,'' by M. Frigo, which is
available from the @uref{http://www.fftw.org,FFTW home page} and also
appeared in the @cite{Proceedings of the 1999 ACM SIGPLAN Conference on
Programming Language Design and Implementation (PLDI)}.