mirror of
https://github.com/tildearrow/furnace.git
synced 2024-11-24 13:35:11 +00:00
362 lines
16 KiB
Text
362 lines
16 KiB
Text
|
@node Installation and Customization, Acknowledgments, Upgrading from FFTW version 2, Top
|
||
|
@chapter Installation and Customization
|
||
|
@cindex installation
|
||
|
|
||
|
This chapter describes the installation and customization of FFTW, the
|
||
|
latest version of which may be downloaded from
|
||
|
@uref{http://www.fftw.org, the FFTW home page}.
|
||
|
|
||
|
In principle, FFTW should work on any system with an ANSI C compiler
|
||
|
(@code{gcc} is fine). However, planner time is drastically reduced if
|
||
|
FFTW can exploit a hardware cycle counter; FFTW comes with cycle-counter
|
||
|
support for all modern general-purpose CPUs, but you may need to add a
|
||
|
couple of lines of code if your compiler is not yet supported
|
||
|
(@pxref{Cycle Counters}). (On Unix, there will be a warning at the end
|
||
|
of the @code{configure} output if no cycle counter is found.)
|
||
|
@cindex cycle counter
|
||
|
@cindex compiler
|
||
|
@cindex portability
|
||
|
|
||
|
|
||
|
Installation of FFTW is simplest if you have a Unix or a GNU system,
|
||
|
such as GNU/Linux, and we describe this case in the first section below,
|
||
|
including the use of special configuration options to e.g. install
|
||
|
different precisions or exploit optimizations for particular
|
||
|
architectures (e.g. SIMD). Compilation on non-Unix systems is a more
|
||
|
manual process, but we outline the procedure in the second section. It
|
||
|
is also likely that pre-compiled binaries will be available for popular
|
||
|
systems.
|
||
|
|
||
|
Finally, we describe how you can customize FFTW for particular needs by
|
||
|
generating @emph{codelets} for fast transforms of sizes not supported
|
||
|
efficiently by the standard FFTW distribution.
|
||
|
@cindex codelet
|
||
|
|
||
|
@menu
|
||
|
* Installation on Unix::
|
||
|
* Installation on non-Unix systems::
|
||
|
* Cycle Counters::
|
||
|
* Generating your own code::
|
||
|
@end menu
|
||
|
|
||
|
@c ------------------------------------------------------------
|
||
|
|
||
|
@node Installation on Unix, Installation on non-Unix systems, Installation and Customization, Installation and Customization
|
||
|
@section Installation on Unix
|
||
|
|
||
|
FFTW comes with a @code{configure} program in the GNU style.
|
||
|
Installation can be as simple as:
|
||
|
@fpindex configure
|
||
|
|
||
|
@example
|
||
|
./configure
|
||
|
make
|
||
|
make install
|
||
|
@end example
|
||
|
|
||
|
This will build the uniprocessor complex and real transform libraries
|
||
|
along with the test programs. (We recommend that you use GNU
|
||
|
@code{make} if it is available; on some systems it is called
|
||
|
@code{gmake}.) The ``@code{make install}'' command installs the fftw
|
||
|
and rfftw libraries in standard places, and typically requires root
|
||
|
privileges (unless you specify a different install directory with the
|
||
|
@code{--prefix} flag to @code{configure}). You can also type
|
||
|
``@code{make check}'' to put the FFTW test programs through their paces.
|
||
|
If you have problems during configuration or compilation, you may want
|
||
|
to run ``@code{make distclean}'' before trying again; this ensures that
|
||
|
you don't have any stale files left over from previous compilation
|
||
|
attempts.
|
||
|
|
||
|
The @code{configure} script chooses the @code{gcc} compiler by default,
|
||
|
if it is available; you can select some other compiler with:
|
||
|
@example
|
||
|
./configure CC="@r{@i{<the name of your C compiler>}}"
|
||
|
@end example
|
||
|
|
||
|
The @code{configure} script knows good @code{CFLAGS} (C compiler flags)
|
||
|
@cindex compiler flags
|
||
|
for a few systems. If your system is not known, the @code{configure}
|
||
|
script will print out a warning. In this case, you should re-configure
|
||
|
FFTW with the command
|
||
|
@example
|
||
|
./configure CFLAGS="@r{@i{<write your CFLAGS here>}}"
|
||
|
@end example
|
||
|
and then compile as usual. If you do find an optimal set of
|
||
|
@code{CFLAGS} for your system, please let us know what they are (along
|
||
|
with the output of @code{config.guess}) so that we can include them in
|
||
|
future releases.
|
||
|
|
||
|
@code{configure} supports all the standard flags defined by the GNU
|
||
|
Coding Standards; see the @code{INSTALL} file in FFTW or
|
||
|
@uref{http://www.gnu.org/prep/standards/html_node/index.html, the GNU web page}.
|
||
|
Note especially @code{--help} to list all flags and
|
||
|
@code{--enable-shared} to create shared, rather than static, libraries.
|
||
|
@code{configure} also accepts a few FFTW-specific flags, particularly:
|
||
|
|
||
|
@itemize @bullet
|
||
|
|
||
|
@item
|
||
|
@cindex precision
|
||
|
@code{--enable-float}: Produces a single-precision version of FFTW
|
||
|
(@code{float}) instead of the default double-precision (@code{double}).
|
||
|
@xref{Precision}.
|
||
|
|
||
|
@item
|
||
|
@cindex precision
|
||
|
@code{--enable-long-double}: Produces a long-double precision version of
|
||
|
FFTW (@code{long double}) instead of the default double-precision
|
||
|
(@code{double}). The @code{configure} script will halt with an error
|
||
|
message if @code{long double} is the same size as @code{double} on your
|
||
|
machine/compiler. @xref{Precision}.
|
||
|
|
||
|
@item
|
||
|
@cindex precision
|
||
|
@code{--enable-quad-precision}: Produces a quadruple-precision version
|
||
|
of FFTW using the nonstandard @code{__float128} type provided by
|
||
|
@code{gcc} 4.6 or later on x86, x86-64, and Itanium architectures,
|
||
|
instead of the default double-precision (@code{double}). The
|
||
|
@code{configure} script will halt with an error message if the
|
||
|
compiler is not @code{gcc} version 4.6 or later or if @code{gcc}'s
|
||
|
@code{libquadmath} library is not installed. @xref{Precision}.
|
||
|
|
||
|
@item
|
||
|
@cindex threads
|
||
|
@code{--enable-threads}: Enables compilation and installation of the
|
||
|
FFTW threads library (@pxref{Multi-threaded FFTW}), which provides a
|
||
|
simple interface to parallel transforms for SMP systems. By default,
|
||
|
the threads routines are not compiled.
|
||
|
|
||
|
@item
|
||
|
@code{--enable-openmp}: Like @code{--enable-threads}, but using OpenMP
|
||
|
compiler directives in order to induce parallelism rather than
|
||
|
spawning its own threads directly, and installing an @samp{fftw3_omp} library
|
||
|
rather than an @samp{fftw3_threads} library (@pxref{Multi-threaded
|
||
|
FFTW}). You can use both @code{--enable-openmp} and @code{--enable-threads}
|
||
|
since they compile/install libraries with different names. By default,
|
||
|
the OpenMP routines are not compiled.
|
||
|
|
||
|
@item
|
||
|
@code{--with-combined-threads}: By default, if @code{--enable-threads}
|
||
|
is used, the threads support is compiled into a separate library that
|
||
|
must be linked in addition to the main FFTW library. This is so that
|
||
|
users of the serial library do not need to link the system threads
|
||
|
libraries. If @code{--with-combined-threads} is specified, however,
|
||
|
then no separate threads library is created, and threads are included
|
||
|
in the main FFTW library. This is mainly useful under Windows, where
|
||
|
no system threads library is required and inter-library dependencies
|
||
|
are problematic.
|
||
|
|
||
|
@item
|
||
|
@cindex MPI
|
||
|
@code{--enable-mpi}: Enables compilation and installation of the FFTW
|
||
|
MPI library (@pxref{Distributed-memory FFTW with MPI}), which provides
|
||
|
parallel transforms for distributed-memory systems with MPI. (By
|
||
|
default, the MPI routines are not compiled.) @xref{FFTW MPI
|
||
|
Installation}.
|
||
|
|
||
|
@item
|
||
|
@cindex Fortran-callable wrappers
|
||
|
@code{--disable-fortran}: Disables inclusion of legacy-Fortran
|
||
|
wrapper routines (@pxref{Calling FFTW from Legacy Fortran}) in the standard
|
||
|
FFTW libraries. These wrapper routines increase the library size by
|
||
|
only a negligible amount, so they are included by default as long as
|
||
|
the @code{configure} script finds a Fortran compiler on your system.
|
||
|
(To specify a particular Fortran compiler @i{foo}, pass
|
||
|
@code{F77=}@i{foo} to @code{configure}.)
|
||
|
|
||
|
@item
|
||
|
@code{--with-g77-wrappers}: By default, when Fortran wrappers are
|
||
|
included, the wrappers employ the linking conventions of the Fortran
|
||
|
compiler detected by the @code{configure} script. If this compiler is
|
||
|
GNU @code{g77}, however, then @emph{two} versions of the wrappers are
|
||
|
included: one with @code{g77}'s idiosyncratic convention of appending
|
||
|
two underscores to identifiers, and one with the more common
|
||
|
convention of appending only a single underscore. This way, the same
|
||
|
FFTW library will work with both @code{g77} and other Fortran
|
||
|
compilers, such as GNU @code{gfortran}. However, the converse is not
|
||
|
true: if you configure with a different compiler, then the
|
||
|
@code{g77}-compatible wrappers are not included. By specifying
|
||
|
@code{--with-g77-wrappers}, the @code{g77}-compatible wrappers are
|
||
|
included in addition to wrappers for whatever Fortran compiler
|
||
|
@code{configure} finds.
|
||
|
@fpindex g77
|
||
|
|
||
|
@item
|
||
|
@code{--with-slow-timer}: Disables the use of hardware cycle counters,
|
||
|
and falls back on @code{gettimeofday} or @code{clock}. This greatly
|
||
|
worsens performance, and should generally not be used (unless you don't
|
||
|
have a cycle counter but still really want an optimized plan regardless
|
||
|
of the time). @xref{Cycle Counters}.
|
||
|
|
||
|
@item
|
||
|
@code{--enable-sse} (single precision),
|
||
|
@code{--enable-sse2} (single, double),
|
||
|
@code{--enable-avx} (single, double),
|
||
|
@code{--enable-avx2} (single, double),
|
||
|
@code{--enable-avx512} (single, double),
|
||
|
@code{--enable-avx-128-fma},
|
||
|
@code{--enable-kcvi} (single),
|
||
|
@code{--enable-altivec} (single),
|
||
|
@code{--enable-vsx} (single, double),
|
||
|
@code{--enable-neon} (single, double on aarch64),
|
||
|
@code{--enable-generic-simd128},
|
||
|
and
|
||
|
@code{--enable-generic-simd256}:
|
||
|
|
||
|
Enable various SIMD instruction sets. You need compiler that supports
|
||
|
the given SIMD extensions, but FFTW will try to detect at runtime
|
||
|
whether the CPU supports these extensions. That is, you can compile
|
||
|
with@code{--enable-avx} and the code will still run on a CPU without AVX
|
||
|
support.
|
||
|
|
||
|
@itemize @minus
|
||
|
@item
|
||
|
These options require a compiler supporting SIMD extensions, and
|
||
|
compiler support is always a bit flaky: see the FFTW FAQ for a list of
|
||
|
compiler versions that have problems compiling FFTW.
|
||
|
@item
|
||
|
Because of the large variety of ARM processors and ABIs, FFTW
|
||
|
does not attempt to guess the correct @code{gcc} flags for generating
|
||
|
NEON code. In general, you will have to provide them on the command line.
|
||
|
This command line is known to have worked at least once:
|
||
|
@example
|
||
|
./configure --with-slow-timer --host=arm-linux-gnueabi \
|
||
|
--enable-single --enable-neon \
|
||
|
"CC=arm-linux-gnueabi-gcc -march=armv7-a -mfloat-abi=softfp"
|
||
|
@end example
|
||
|
@end itemize
|
||
|
|
||
|
@end itemize
|
||
|
|
||
|
@cindex compiler
|
||
|
To force @code{configure} to use a particular C compiler @i{foo}
|
||
|
(instead of the default, usually @code{gcc}), pass @code{CC=}@i{foo} to the
|
||
|
@code{configure} script; you may also need to set the flags via the variable
|
||
|
@code{CFLAGS} as described above.
|
||
|
@cindex compiler flags
|
||
|
|
||
|
@c ------------------------------------------------------------
|
||
|
@node Installation on non-Unix systems, Cycle Counters, Installation on Unix, Installation and Customization
|
||
|
@section Installation on non-Unix systems
|
||
|
|
||
|
It should be relatively straightforward to compile FFTW even on non-Unix
|
||
|
systems lacking the niceties of a @code{configure} script. Basically,
|
||
|
you need to edit the @code{config.h} header (copy it from
|
||
|
@code{config.h.in}) to @code{#define} the various options and compiler
|
||
|
characteristics, and then compile all the @samp{.c} files in the
|
||
|
relevant directories.
|
||
|
|
||
|
The @code{config.h} header contains about 100 options to set, each one
|
||
|
initially an @code{#undef}, each documented with a comment, and most of
|
||
|
them fairly obvious. For most of the options, you should simply
|
||
|
@code{#define} them to @code{1} if they are applicable, although a few
|
||
|
options require a particular value (e.g. @code{SIZEOF_LONG_LONG} should
|
||
|
be defined to the size of the @code{long long} type, in bytes, or zero
|
||
|
if it is not supported). We will likely post some sample
|
||
|
@code{config.h} files for various operating systems and compilers for
|
||
|
you to use (at least as a starting point). Please let us know if you
|
||
|
have to hand-create a configuration file (and/or a pre-compiled binary)
|
||
|
that you want to share.
|
||
|
|
||
|
To create the FFTW library, you will then need to compile all of the
|
||
|
@samp{.c} files in the @code{kernel}, @code{dft}, @code{dft/scalar},
|
||
|
@code{dft/scalar/codelets}, @code{rdft}, @code{rdft/scalar},
|
||
|
@code{rdft/scalar/r2cf}, @code{rdft/scalar/r2cb},
|
||
|
@code{rdft/scalar/r2r}, @code{reodft}, and @code{api} directories.
|
||
|
If you are compiling with SIMD support (e.g. you defined
|
||
|
@code{HAVE_SSE2} in @code{config.h}), then you also need to compile
|
||
|
the @code{.c} files in the @code{simd-support},
|
||
|
@code{@{dft,rdft@}/simd}, @code{@{dft,rdft@}/simd/*} directories.
|
||
|
|
||
|
Once these files are all compiled, link them into a library, or a shared
|
||
|
library, or directly into your program.
|
||
|
|
||
|
To compile the FFTW test program, additionally compile the code in the
|
||
|
@code{libbench2/} directory, and link it into a library. Then compile
|
||
|
the code in the @code{tests/} directory and link it to the
|
||
|
@code{libbench2} and FFTW libraries. To compile the @code{fftw-wisdom}
|
||
|
(command-line) tool (@pxref{Wisdom Utilities}), compile
|
||
|
@code{tools/fftw-wisdom.c} and link it to the @code{libbench2} and FFTW
|
||
|
libraries
|
||
|
|
||
|
@c ------------------------------------------------------------
|
||
|
@node Cycle Counters, Generating your own code, Installation on non-Unix systems, Installation and Customization
|
||
|
@section Cycle Counters
|
||
|
@cindex cycle counter
|
||
|
|
||
|
FFTW's planner actually executes and times different possible FFT
|
||
|
algorithms in order to pick the fastest plan for a given @math{n}. In
|
||
|
order to do this in as short a time as possible, however, the timer must
|
||
|
have a very high resolution, and to accomplish this we employ the
|
||
|
hardware @dfn{cycle counters} that are available on most CPUs.
|
||
|
Currently, FFTW supports the cycle counters on x86, PowerPC/POWER, Alpha,
|
||
|
UltraSPARC (SPARC v9), IA64, PA-RISC, and MIPS processors.
|
||
|
|
||
|
@cindex compiler
|
||
|
Access to the cycle counters, unfortunately, is a compiler and/or
|
||
|
operating-system dependent task, often requiring inline assembly
|
||
|
language, and it may be that your compiler is not supported. If you are
|
||
|
@emph{not} supported, FFTW will by default fall back on its estimator
|
||
|
(effectively using @code{FFTW_ESTIMATE} for all plans).
|
||
|
@ctindex FFTW_ESTIMATE
|
||
|
|
||
|
You can add support by editing the file @code{kernel/cycle.h}; normally,
|
||
|
this will involve adapting one of the examples already present in order
|
||
|
to use the inline-assembler syntax for your C compiler, and will only
|
||
|
require a couple of lines of code. Anyone adding support for a new
|
||
|
system to @code{cycle.h} is encouraged to email us at @email{fftw@@fftw.org}.
|
||
|
|
||
|
If a cycle counter is not available on your system (e.g. some embedded
|
||
|
processor), and you don't want to use estimated plans, as a last resort
|
||
|
you can use the @code{--with-slow-timer} option to @code{configure} (on
|
||
|
Unix) or @code{#define WITH_SLOW_TIMER} in @code{config.h} (elsewhere).
|
||
|
This will use the much lower-resolution @code{gettimeofday} function, or even
|
||
|
@code{clock} if the former is unavailable, and planning will be
|
||
|
extremely slow.
|
||
|
|
||
|
@c ------------------------------------------------------------
|
||
|
@node Generating your own code, , Cycle Counters, Installation and Customization
|
||
|
@section Generating your own code
|
||
|
@cindex code generator
|
||
|
|
||
|
The directory @code{genfft} contains the programs that were used to
|
||
|
generate FFTW's ``codelets,'' which are hard-coded transforms of small
|
||
|
sizes.
|
||
|
@cindex codelet
|
||
|
We do not expect casual users to employ the generator, which is a rather
|
||
|
sophisticated program that generates directed acyclic graphs of FFT
|
||
|
algorithms and performs algebraic simplifications on them. It was
|
||
|
written in Objective Caml, a dialect of ML, which is available at
|
||
|
@uref{http://caml.inria.fr/ocaml/index.en.html}.
|
||
|
@cindex Caml
|
||
|
|
||
|
|
||
|
If you have Objective Caml installed (along with recent versions of
|
||
|
GNU @code{autoconf}, @code{automake}, and @code{libtool}), then you
|
||
|
can change the set of codelets that are generated or play with the
|
||
|
generation options. The set of generated codelets is specified by the
|
||
|
@code{@{dft,rdft@}/@{codelets,simd@}/*/Makefile.am} files. For example, you can add
|
||
|
efficient REDFT codelets of small sizes by modifying
|
||
|
@code{rdft/codelets/r2r/Makefile.am}.
|
||
|
@cindex REDFT
|
||
|
After you modify any @code{Makefile.am} files, you can type @code{sh
|
||
|
bootstrap.sh} in the top-level directory followed by @code{make} to
|
||
|
re-generate the files.
|
||
|
|
||
|
We do not provide more details about the code-generation process, since
|
||
|
we do not expect that most users will need to generate their own code.
|
||
|
However, feel free to contact us at @email{fftw@@fftw.org} if
|
||
|
you are interested in the subject.
|
||
|
|
||
|
@cindex monadic programming
|
||
|
You might find it interesting to learn Caml and/or some modern
|
||
|
programming techniques that we used in the generator (including monadic
|
||
|
programming), especially if you heard the rumor that Java and
|
||
|
object-oriented programming are the latest advancement in the field.
|
||
|
The internal operation of the codelet generator is described in the
|
||
|
paper, ``A Fast Fourier Transform Compiler,'' by M. Frigo, which is
|
||
|
available from the @uref{http://www.fftw.org,FFTW home page} and also
|
||
|
appeared in the @cite{Proceedings of the 1999 ACM SIGPLAN Conference on
|
||
|
Programming Language Design and Implementation (PLDI)}.
|
||
|
|