These ‘-m’ options are defined for the x86 family of computers.
-march=cpu-type
The choices for cpu-type are:
native
-march=native enables all instruction subsets supported by the local machine (hence the result might not run on different machines). Using -mtune=native produces code optimized for the local machine under the constraints of the selected instruction set.
i386
i486
i586
pentium
lakemont
pentium-mmx
pentiumpro
i686
When used with -march, the Pentium Pro instruction set is used, so the code runs on all i686 family chips. When used with -mtune, it has the same meaning as ‘generic’.
pentium2
pentium3
pentium3m
pentium-m
pentium4
pentium4m
prescott
nocona
core2
nehalem
westmere
sandybridge
ivybridge
haswell
broadwell
skylake
bonnell
silvermont
knl
skylake-avx512
k6
k6-2
k6-3
athlon
athlon-tbird
athlon-4
athlon-xp
athlon-mp
k8
opteron
athlon64
athlon-fx
k8-sse3
opteron-sse3
athlon64-sse3
amdfam10
barcelona
bdver1
bdver2
bdver3
bdver4
znver1
btver1
btver2
winchip-c6
winchip2
c3
c3-2
geode
-mtune=cpu-type
-mtune=pentium4 generates code that is tuned for Pentium 4 but still runs on i686 machines. The choices for cpu-type are the same as for -march. In addition, -mtune supports two extra choices for cpu-type:
generic
If you know the CPU on which your code will run, then you should use the corresponding -mtune or -march option instead of -mtune=generic. But, if you do not know exactly what CPU users of your application will have, then you should use this option. As new processors are deployed in the marketplace, the behavior of this option will change. Therefore, if you upgrade to a newer version of GCC, code generation controlled by this option will change to reflect the processors that are most common at the time that version of GCC is released.
There is no -march=generic option because -march indicates the instruction set the compiler can use, and there is no generic instruction set applicable to all processors. In contrast, -mtune indicates the processor (or, in this case, collection of processors) for which the code is optimized.
intel
If you know the CPU on which your code will run, then you should use the corresponding -mtune or -march option instead of -mtune=intel. But, if you want your application to perform well on both Haswell and Silvermont, then you should use this option. As new Intel processors are deployed in the marketplace, the behavior of this option will change. Therefore, if you upgrade to a newer version of GCC, code generation controlled by this option will change to reflect the most current Intel processors at the time that version of GCC is released.
There is no -march=intel option because -march indicates the instruction set the compiler can use, and there is no common instruction set applicable to all processors. In contrast, -mtune indicates the processor (or, in this case, collection of processors) for which the code is optimized.
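The distinction between -march (which instructions may be used) and -mtune (how code is scheduled) can be observed from inside a program. The sketch below relies on the predefined macros `__SSE2__`, `__AVX__` and `__AVX2__`, which GCC defines when the corresponding instruction sets are enabled; the function names `enabled_vector_isa` and `describe_build` are hypothetical helpers introduced for illustration only.

```c
#include <assert.h>
#include <stdio.h>

/* Returns a label for the widest vector ISA the compiler was allowed to use.
   These macros follow the enabled instruction set (-march, -mavx2, ...);
   changing only -mtune is not expected to alter them. */
const char *enabled_vector_isa(void)
{
#if defined(__AVX2__)
    return "avx2";
#elif defined(__AVX__)
    return "avx";
#elif defined(__SSE2__)
    return "sse2";
#else
    return "baseline";
#endif
}

/* Formats a one-line build report into buf; returns snprintf's result. */
int describe_build(char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "vector ISA allowed at compile time: %s",
                    enabled_vector_isa());
}
```

Building the same file once with -march=haswell and once with only -mtune=haswell, then comparing the reported label, is one way to see that only the former changes the available instruction set.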
-mcpu=cpu-type
A deprecated synonym for -mtune.
-mfpmath=unit
387
See -ffloat-store for a more detailed description. This is the default choice for x86-32 targets.
sse
For the x86-32 compiler, you must use -march=cpu-type, -msse or -msse2 switches to enable SSE extensions and make this option effective. For the x86-64 compiler, these extensions are enabled by default.
The resulting code should be considerably faster in the majority of cases and avoid the numerical instability problems of 387 code, but may break some existing code that expects temporaries to be 80 bits.
This is the default choice for the x86-64 compiler.
sse,387
sse+387
both
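The remark that SSE code may break programs expecting 80-bit temporaries can be checked at compile time. The sketch below reads the standard C99 macro `FLT_EVAL_METHOD` from `<float.h>`; GCC typically reports 2 for 387 arithmetic and 0 for SSE arithmetic, but treat those specific values as an expectation to verify on your target. The helper names are hypothetical.

```c
#include <assert.h>
#include <float.h>

/* FLT_EVAL_METHOD (C99): 0 = evaluate in the operand's own type,
   1 = evaluate float as double, 2 = evaluate everything as long double,
   -1 = indeterminate. */
int fp_eval_method(void)
{
    return (int)FLT_EVAL_METHOD;
}

/* Returns 1 when float/double temporaries may carry more precision than
   their declared type, which is the situation the 387 unit creates. */
int temporaries_may_be_wider(void)
{
    return fp_eval_method() != 0;
}
```

Code that depends on extended-precision intermediates could test `temporaries_may_be_wider()` (or the macro directly) rather than assuming a particular -mfpmath setting.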
-masm=dialect
Output assembly instructions using the selected dialect. Also affects which dialect is used for basic asm (see Basic Asm) and extended asm (see Extended Asm). Supported choices (in dialect order) are ‘att’ or ‘intel’. The default is ‘att’. Darwin does not support ‘intel’.
-mieee-fp
-mno-ieee-fp
-msoft-float
Warning: the requisite libraries are not part of GCC. Normally the facilities of the machine's usual C compiler are used, but this can't be done directly in cross-compilation. You must make your own arrangements to provide suitable library functions for cross-compilation.
On machines where a function returns floating-point results in the 80387 register stack, some floating-point opcodes may be emitted even if -msoft-float is used.
-mno-fp-ret-in-387
The usual calling convention has functions return values of types float and double in an FPU register, even if there is no FPU. The idea is that the operating system should emulate an FPU.
The option -mno-fp-ret-in-387 causes such values to be returned in ordinary CPU registers instead.
-mno-fancy-math-387
Some 387 emulators do not support the sin, cos and sqrt instructions for the 387. Specify this option to avoid generating those instructions. This option is the default on OpenBSD and NetBSD. This option is overridden when -march indicates that the target CPU always has an FPU and so the instruction does not need emulation. These instructions are not generated unless you also use the -funsafe-math-optimizations switch.
-malign-double
-mno-align-double
Control whether GCC aligns double, long double, and long long variables on a two-word boundary or a one-word boundary. Aligning double variables on a two-word boundary produces code that runs somewhat faster on a Pentium at the expense of more memory. On x86-64, -malign-double is enabled by default.
Warning: if you use the -malign-double switch, structures containing the above types are aligned differently than the published application binary interface specifications for the x86-32 and are not binary compatible with structures in code compiled without that switch.
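The binary-compatibility warning comes down to member offsets. The sketch below measures where a double lands inside a small structure; `struct sample` and the two helper functions are hypothetical names. On an x86-32 build one would expect the offset to differ between the default and -malign-double, while on x86-64 the two-word alignment is already in effect.

```c
#include <assert.h>
#include <stddef.h>

/* A char followed by a double: the padding inserted after 'tag' depends on
   the alignment the compiler gives to double inside structures. */
struct sample {
    char tag;
    double value;
};

/* Offset of the double member, in bytes. */
size_t double_member_offset(void)
{
    return offsetof(struct sample, value);
}

/* Total size of the structure, including trailing padding. */
size_t sample_size(void)
{
    return sizeof(struct sample);
}
```

Two object files that disagree on `double_member_offset()` cannot safely exchange `struct sample` values, which is the incompatibility the warning describes.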
-m96bit-long-double
-m128bit-long-double
These switches control the size of the long double type. The x86-32 application binary interface specifies the size to be 96 bits, so -m96bit-long-double is the default in 32-bit mode. Modern architectures (Pentium and newer) prefer long double to be aligned to an 8- or 16-byte boundary. In arrays or structures conforming to the ABI, this is not possible. So specifying -m128bit-long-double aligns long double to a 16-byte boundary by padding the long double with an additional 32-bit zero.
In the x86-64 compiler, -m128bit-long-double is the default choice as its ABI specifies that long double is aligned on a 16-byte boundary.
Notice that neither of these options enables any extra precision over the x87 standard of 80 bits for a long double.
Warning: if you override the default value for your target ABI, this changes the size of structures and arrays containing long double variables, as well as modifying the function calling convention for functions taking long double. Hence they are not binary-compatible with code compiled without that switch.
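The point that these switches change storage size but not precision can be made concrete by separating the two quantities. This is a minimal sketch using only standard macros from `<float.h>` and `<limits.h>`; the helper names are hypothetical.

```c
#include <assert.h>
#include <float.h>
#include <limits.h>
#include <stddef.h>

/* Storage occupied by one long double, in bits (padding included). */
size_t long_double_storage_bits(void)
{
    return sizeof(long double) * CHAR_BIT;
}

/* Number of significand bits actually carried by long double. */
int long_double_significand_bits(void)
{
    return LDBL_MANT_DIG;
}
```

Comparing the two results across builds with -m96bit-long-double and -m128bit-long-double should show the storage figure changing while the significand width stays the same.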
-mlong-double-64
-mlong-double-80
-mlong-double-128
These switches control the size of the long double type. A size of 64 bits makes the long double type equivalent to the double type. This is the default for the 32-bit Bionic C library. A size of 128 bits makes the long double type equivalent to the __float128 type. This is the default for the 64-bit Bionic C library.
Warning: if you override the default value for your target ABI, this changes the size of structures and arrays containing long double variables, as well as modifying the function calling convention for functions taking long double. Hence they are not binary-compatible with code compiled without that switch.
-malign-data=type
Control how GCC aligns variables. Supported values for type are ‘compat’, which uses an increased alignment value compatible with GCC 4.8 and earlier; ‘abi’, which uses the alignment value specified by the psABI; and ‘cacheline’, which uses an increased alignment value to match the cache line size. ‘compat’ is the default.
-mlarge-data-threshold=threshold
When -mcmodel=medium is specified, data objects larger than threshold are placed in the large data section. This value must be the same across all objects linked into the binary, and defaults to 65535.
-mrtd
Use a different function-calling convention, in which functions that take a fixed number of arguments return with the ret num instruction, which pops their arguments while returning. This saves one instruction in the caller since there is no need to pop the arguments there. You can specify that an individual function is called with this calling sequence with the function attribute stdcall. You can also override the -mrtd option by using the function attribute cdecl. See Function Attributes.
Warning: this calling convention is incompatible with the one normally used on Unix, so you cannot use it if you need to call libraries compiled with the Unix compiler.
Also, you must provide function prototypes for all functions that take variable numbers of arguments (including printf); otherwise incorrect code is generated for calls to those functions.
In addition, seriously incorrect code results if you call a function with too many arguments. (Normally, extra arguments are harmlessly ignored.)
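The per-function attributes mentioned above can be sketched as follows. The macros `CALLEE_POPS` and `CALLER_POPS` and both function names are hypothetical; the attributes are applied only on 32-bit x86 builds, since that is where these conventions are expected to matter, and they expand to nothing elsewhere.

```c
#include <assert.h>

#if defined(__GNUC__) && defined(__i386__)
#define CALLEE_POPS __attribute__((stdcall))  /* callee pops, as under -mrtd */
#define CALLER_POPS __attribute__((cdecl))    /* caller pops, overriding -mrtd */
#else
#define CALLEE_POPS
#define CALLER_POPS
#endif

/* Prototypes are visible before any call, so the compiler always knows
   which convention applies at each call site. */
CALLEE_POPS int add_callee_pops(int a, int b);
CALLER_POPS int add_caller_pops(int a, int b);

CALLEE_POPS int add_callee_pops(int a, int b)
{
    return a + b;
}

CALLER_POPS int add_caller_pops(int a, int b)
{
    return a + b;
}
```

Marking individual functions this way keeps the rest of the program on the platform's usual convention, which avoids the library incompatibility described in the warnings.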
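-mregparm=num
Control how many registers are used to pass integer arguments. You can control this behavior for a specific function by using the function attribute regparm. See Function Attributes. Warning: if you use this switch, and num is nonzero, then you must build all modules with the same value, including any libraries. This includes the system libraries and startup modules.
-msseregparm
Use SSE register passing conventions for float and double arguments and return values. You can control this behavior for a specific function by using the function attribute sseregparm. See Function Attributes. Warning: if you use this switch then you must build all modules with the same value, including any libraries. This includes the system libraries and startup modules.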
-mvect8-ret-in-mem
-mpc32
-mpc64
-mpc80
When -mpc32 is specified, the significands of results of floating-point operations are rounded to 24 bits (single precision); -mpc64 rounds the significands of results of floating-point operations to 53 bits (double precision); and -mpc80 rounds the significands of results of floating-point operations to 64 bits (extended double precision), which is the default. When this option is used, floating-point operations in higher precisions are not available to the programmer without setting the FPU control word explicitly.
Setting the rounding of floating-point operations to less than the default 80 bits can speed some programs by 2% or more. Note that some mathematical libraries assume that extended-precision (80-bit) floating-point operations are enabled by default; routines in such libraries could suffer significant loss of accuracy, typically through so-called “catastrophic cancellation”, when this option is used to set the precision to less than extended precision.
-mstackrealign
The -mstackrealign option generates an alternate prologue and epilogue that realigns the run-time stack if necessary. This supports mixing legacy code that keeps 4-byte stack alignment with modern code that keeps 16-byte stack alignment for SSE compatibility. See also the attribute force_align_arg_pointer, applicable to individual functions.
-mpreferred-stack-boundary=num
If -mpreferred-stack-boundary is not specified, the default is 4 (16 bytes or 128 bits).
Warning: When generating code for the x86-64 architecture with SSE extensions disabled, -mpreferred-stack-boundary=3 can be used to keep the stack boundary aligned to an 8-byte boundary. Since the x86-64 ABI requires 16-byte stack alignment, this is ABI-incompatible and intended to be used in controlled environments where stack space is an important limitation. This option leads to wrong code when functions compiled with 16-byte stack alignment (such as functions from a standard library) are called with a misaligned stack. In this case, SSE instructions may lead to misaligned memory access traps. In addition, variable arguments are handled incorrectly for 16-byte aligned objects (including x87 long double and __int128), leading to wrong results. You must build all modules with -mpreferred-stack-boundary=3, including any libraries. This includes the system libraries and startup modules.
-mincoming-stack-boundary=num
If -mincoming-stack-boundary is not specified, the one specified by -mpreferred-stack-boundary is used.
On Pentium and Pentium Pro, double and long double values should be aligned to an 8-byte boundary (see -malign-double) or suffer significant run time performance penalties. On Pentium III, the Streaming SIMD Extension (SSE) data type __m128 may not work properly if it is not 16-byte aligned.
To ensure proper alignment of these values on the stack, the stack boundary must be as aligned as that required by any value stored on the stack. Further, every function must be generated such that it keeps the stack aligned. Thus calling a function compiled with a higher preferred stack boundary from a function compiled with a lower preferred stack boundary most likely misaligns the stack. It is recommended that libraries that use callbacks always use the default setting.
This extra alignment does consume extra stack space, and generally increases code size. Code that is sensitive to stack space usage, such as embedded systems and operating system kernels, may want to reduce the preferred alignment to -mpreferred-stack-boundary=2.
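The num values used by the stack-boundary options are exponents of two: the default of 4 corresponds to 16 bytes, 3 to 8 bytes, and 2 to 4 bytes. The arithmetic, and its effect on the space a frame occupies, can be sketched as below. Both function names are hypothetical, and the rounding helper only illustrates the space cost; it is not a description of how GCC lays out frames.

```c
#include <assert.h>

/* Alignment in bytes implied by a stack-boundary value: 2 raised to num. */
unsigned stack_boundary_bytes(unsigned num)
{
    assert(num < 32);  /* keep the shift well defined */
    return 1u << num;
}

/* Rounds a frame size up to the next multiple of the boundary, showing why
   a larger boundary can consume more stack space. */
unsigned round_up_to_boundary(unsigned size, unsigned num)
{
    unsigned align = stack_boundary_bytes(num);
    return (size + align - 1) & ~(align - 1);
}
```

For a hypothetical 20-byte frame, the default boundary rounds the requirement up to 32 bytes, whereas a boundary of 2 leaves it at 20 bytes.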
-mmmx
-msse
-msse2
-msse3
-mssse3
-msse4
-msse4a
-msse4.1
-msse4.2
-mavx
-mavx2
-mavx512f
-mavx512pf
-mavx512er
-mavx512cd
-mavx512vl
-mavx512bw
-mavx512dq
-mavx512ifma
-mavx512vbmi
-msha
-maes
-mpclmul
-mclflushopt
-mfsgsbase
-mrdrnd
-mf16c
-mfma
-mfma4
-mprefetchwt1
-mxop
-mlwp
-m3dnow
-mpopcnt
-mabm
-mbmi
-mbmi2
-mlzcnt
-mfxsr
-mxsave
-mxsaveopt
-mxsavec
-mxsaves
-mrtm
-mtbm
-mmpx
-mmwaitx
-mclzero
-mpku
Each has a corresponding -mno- option to disable use of these instructions. These extensions are also available as built-in functions: see x86 Built-in Functions, for details of the functions enabled and disabled by these switches.
To generate SSE/SSE2 instructions automatically from floating-point code (as opposed to 387 instructions), see -mfpmath=sse.
GCC suppresses SSEx instructions when -mavx is used. Instead, it generates new AVX instructions or AVX equivalents for all SSEx instructions when needed.
These options enable GCC to use these extended instructions in generated code, even without -mfpmath=sse. Applications that perform run-time CPU detection must compile separate files for each supported architecture, using the appropriate flags. In particular, the file containing the CPU detection code should be compiled without these options.
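A run-time detection layer of the kind described might look like the sketch below. It uses GCC's `__builtin_cpu_init` and `__builtin_cpu_supports` built-ins on x86 and falls back to "not supported" elsewhere. All function names are hypothetical, and `sum_avx2_build` is only a stand-in for a routine that would live in a separate file compiled with -mavx2; here it reuses the generic loop so the example stays in one file.

```c
#include <assert.h>

typedef int (*sum_fn)(const int *, int);

/* Portable routine; belongs in a file compiled without ISA extension flags. */
static int sum_generic(const int *v, int n)
{
    int s = 0;
    for (int i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Stand-in (assumption) for a routine from a file built with -mavx2. */
static int sum_avx2_build(const int *v, int n)
{
    return sum_generic(v, n);
}

/* Detection code: compile this file without -mavx2 so that it can run on
   any machine the program supports. */
int cpu_has_avx2(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") ? 1 : 0;
#else
    return 0;
#endif
}

/* Picks the implementation once, based on the running CPU. */
sum_fn select_sum(void)
{
    return cpu_has_avx2() ? sum_avx2_build : sum_generic;
}

/* Sums a small fixed array through whichever routine was selected. */
int demo_sum(void)
{
    static const int values[] = {1, 2, 3, 4};
    return select_sum()(values, 4);
}
```

Keeping `cpu_has_avx2` and `select_sum` in a translation unit built with baseline flags avoids the detection code itself depending on instructions the CPU may lack.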
-mdump-tune-features
-mtune-ctrl=feature-list
See also -mdump-tune-features. When specified, the feature is turned on if it is not preceded with ‘^’; otherwise, it is turned off. -mtune-ctrl=feature-list is intended to be used by GCC developers. Using it may lead to code paths not covered by testing and can potentially result in compiler ICEs or runtime errors.
-mno-default
See also -mdump-tune-features.
-mcld
This option instructs GCC to emit a cld instruction in the prologue of functions that use string instructions. String instructions depend on the DF flag to select between autoincrement or autodecrement mode. While the ABI specifies the DF flag to be cleared on function entry, some operating systems violate this specification by not clearing the DF flag in their exception dispatchers. The exception handler can be invoked with the DF flag set, which leads to wrong direction mode when string instructions are used. This option can be enabled by default on 32-bit x86 targets by configuring GCC with the --enable-cld configure option. Generation of cld instructions can be suppressed with the -mno-cld compiler option in this case.
-mvzeroupper
This option instructs GCC to emit a vzeroupper instruction before a transfer of control flow out of the function to minimize the AVX to SSE transition penalty as well as remove unnecessary zeroupper intrinsics.
-mprefer-avx128
-mcx16
This option enables GCC to generate CMPXCHG16B instructions. CMPXCHG16B allows for atomic operations on 128-bit double quadword (or oword) data types. This is useful for high-resolution counters that can be updated by multiple processors (or cores). This instruction is generated as part of atomic built-in functions: see __sync Builtins or __atomic Builtins for details.
-msahf
This option enables generation of SAHF instructions in 64-bit code. Early Intel Pentium 4 CPUs with Intel 64 support, prior to the introduction of Pentium 4 G1 step in December 2005, lacked the LAHF and SAHF instructions which are supported by AMD64. These are load and store instructions, respectively, for certain status flags. In 64-bit mode, the SAHF instruction is used to optimize fmod, drem, and remainder built-in functions; see Other Builtins for details.
-mmovbe
This option enables use of the movbe instruction to implement __builtin_bswap32 and __builtin_bswap64.
-mcrc32
This option enables built-in functions __builtin_ia32_crc32qi, __builtin_ia32_crc32hi, __builtin_ia32_crc32si and __builtin_ia32_crc32di to generate the crc32 machine instruction.
-mrecip
This option enables use of RCPSS and RSQRTSS instructions (and their vectorized variants RCPPS and RSQRTPS) with an additional Newton-Raphson step to increase precision instead of DIVSS and SQRTSS (and their vectorized variants) for single-precision floating-point arguments. These instructions are generated only when -funsafe-math-optimizations is enabled together with -ffinite-math-only and -fno-trapping-math. Note that while the throughput of the sequence is higher than the throughput of the non-reciprocal instruction, the precision of the sequence can be decreased by up to 2 ulp (i.e. the inverse of 1.0 equals 0.99999994).
Note that GCC implements 1.0f/sqrtf(x) in terms of RSQRTSS (or RSQRTPS) already with -ffast-math (or the above option combination), and doesn't need -mrecip.
Also note that GCC emits the above sequence with additional Newton-Raphson step for vectorized single-float division and vectorized sqrtf(x) already with -ffast-math (or the above option combination), and doesn't need -mrecip.
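The "additional Newton-Raphson step" refers to refining a low-precision hardware estimate. The sketch below shows the textbook iteration formulas for a reciprocal and a reciprocal square root; it is not the exact instruction sequence GCC emits. The starting estimate `x0` is passed in as a parameter, standing in for the output of RCPSS or RSQRTSS, and both function names are hypothetical.

```c
#include <assert.h>

/* One Newton-Raphson step for 1/a, starting from estimate x0:
   x1 = x0 * (2 - a * x0). */
float refine_reciprocal(float a, float x0)
{
    return x0 * (2.0f - a * x0);
}

/* One Newton-Raphson step for 1/sqrt(a), starting from estimate x0:
   x1 = x0 * (1.5 - 0.5 * a * x0 * x0). */
float refine_rsqrt(float a, float x0)
{
    return x0 * (1.5f - 0.5f * a * x0 * x0);
}
```

Starting from 0.24 as an estimate of 1/4, one step moves the value much closer to 0.25; each step roughly doubles the number of correct bits, which is why a single step after the hardware estimate gets close to, but not always exactly, the correctly rounded result.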
-mrecip=opt
opt is a comma-separated list of options, which may be preceded by a ‘!’ to invert the option:
all
default
Equivalent to -mrecip.
none
Equivalent to -mno-recip.
div
vec-div
sqrt
vec-sqrt
So, for example, -mrecip=all,!sqrt enables all of the reciprocal approximations, except for square root.
-mveclibabi=type
Supported values for type are ‘svml’ for the Intel short vector math library and ‘acml’ for the AMD math core library. To use this option, both -ftree-vectorize and -funsafe-math-optimizations have to be enabled, and an SVML or ACML ABI-compatible library must be specified at link time.
GCC currently emits calls to vmldExp2, vmldLn2, vmldLog102, vmldPow2, vmldTanh2, vmldTan2, vmldAtan2, vmldAtanh2, vmldCbrt2, vmldSinh2, vmldSin2, vmldAsinh2, vmldAsin2, vmldCosh2, vmldCos2, vmldAcosh2, vmldAcos2, vmlsExp4, vmlsLn4, vmlsLog104, vmlsPow4, vmlsTanh4, vmlsTan4, vmlsAtan4, vmlsAtanh4, vmlsCbrt4, vmlsSinh4, vmlsSin4, vmlsAsinh4, vmlsAsin4, vmlsCosh4, vmlsCos4, vmlsAcosh4 and vmlsAcos4 for the corresponding function type when -mveclibabi=svml is used, and __vrd2_sin, __vrd2_cos, __vrd2_exp, __vrd2_log, __vrd2_log2, __vrd2_log10, __vrs4_sinf, __vrs4_cosf, __vrs4_expf, __vrs4_logf, __vrs4_log2f, __vrs4_log10f and __vrs4_powf for the corresponding function type when -mveclibabi=acml is used.
-mabi=name
Permissible values are ‘sysv’ for the ABI used on GNU/Linux and other systems, and ‘ms’ for the Microsoft ABI. The default is to use the Microsoft ABI when targeting Microsoft Windows and the SysV ABI on all other systems. You can control this behavior for specific functions by using the function attributes ms_abi and sysv_abi. See Function Attributes.
-mtls-dialect=type
Generate code to access thread-local storage using the ‘gnu’ or ‘gnu2’ conventions. ‘gnu’ is the conservative default; ‘gnu2’ is more efficient, but it may add compile- and run-time requirements that cannot be satisfied on all systems.
-mpush-args
-mno-push-args
-maccumulate-outgoing-args
This switch implies -mno-push-args.
-mthreads
Programs that rely on thread-safe exception handling must compile and link all code with the -mthreads option. When compiling, -mthreads defines -D_MT; when linking, it links in a special thread helper library -lmingwthrd which cleans up per-thread exception-handling data.
-mms-bitfields
-mno-ms-bitfields
If packed is used on a structure, or if bit-fields are used, it may be that the Microsoft ABI lays out the structure differently than the way GCC normally does. Particularly when moving packed data between functions compiled with GCC and the native Microsoft compiler (either via function call or as data in a file), it may be necessary to access either format.
This option is enabled by default for Microsoft Windows targets. This behavior can also be controlled locally by use of variable or type attributes. For more information, see x86 Variable Attributes and x86 Type Attributes.
The Microsoft structure layout algorithm is fairly simple with the exception of the bit-field packing. The padding and alignment of members of structures and whether a bit-field can straddle a storage-unit boundary are determined by these rules:
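Every data object has an alignment requirement. The alignment requirement for all data except structures, unions, and arrays is either the size of the object or the current packing size (specified with either the aligned attribute or the pack pragma), whichever is less. For structures, unions, and arrays, the alignment requirement is the largest alignment requirement of its members. Every object is allocated an offset so that: offset % alignment_requirement == 0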
MSVC interprets zero-length bit-fields in the following ways:
For example:
struct {
    unsigned long bf_1 : 12;
    unsigned long : 0;
    unsigned long bf_2 : 12;
} t1;
The size of t1 is 8 bytes with the zero-length bit-field. If the zero-length bit-field were removed, t1's size would be 4 bytes.
If a zero-length bit-field is inserted after a bit-field, foo, and the alignment of the zero-length bit-field is greater than the member that follows it, bar, bar is aligned as the type of the zero-length bit-field. For example:
struct {
    char foo : 4;
    short : 0;
    char bar;
} t2;
struct {
    char foo : 4;
    short : 0;
    double bar;
} t3;
For t2, bar is placed at offset 2, rather than offset 1. Accordingly, the size of t2 is 4. For t3, the zero-length bit-field does not affect the alignment of bar or, as a result, the size of the structure.
Taking this into account, it is important to note the following:
t2 has a size of 4 bytes, since the zero-length bit-field follows a normal bit-field, and is of type short.
struct {
    char foo : 6;
    long : 0;
} t4;
Here, t4 takes up 4 bytes.
struct {
    char foo;
    long : 0;
    char bar;
} t5;
Here, t5 takes up 2 bytes.
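The effect of a zero-length bit-field, as in the t1 example, can be measured directly with sizeof. The sketch below declares the structure with and without the zero-length member; the struct tags and helper names are hypothetical. The byte counts quoted in the text assume a 4-byte unsigned long and the Microsoft layout, so on other targets (for example 64-bit GNU/Linux, where unsigned long is 8 bytes) the absolute sizes are expected to differ.

```c
#include <assert.h>
#include <stddef.h>

/* Same members as t1: the zero-length bit-field prevents bf_1 and bf_2
   from being coalesced into one allocation unit. */
struct with_zero_width {
    unsigned long bf_1 : 12;
    unsigned long : 0;
    unsigned long bf_2 : 12;
};

/* Without the separator, both 12-bit fields fit in one allocation unit. */
struct without_zero_width {
    unsigned long bf_1 : 12;
    unsigned long bf_2 : 12;
};

size_t size_with_zero_width(void)
{
    return sizeof(struct with_zero_width);
}

size_t size_without_zero_width(void)
{
    return sizeof(struct without_zero_width);
}
```

Printing both sizes for builds with and without -mms-bitfields is a quick way to check whether a given structure is laid out identically under the two algorithms.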
-mno-align-stringops
-minline-all-stringops
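This enables more inlining and increases code size, but may improve performance of code that depends on fast memcpy, strlen, and memset for short lengths.
-minline-stringops-dynamically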
-mstringop-strategy=alg
rep_byte
rep_4byte
rep_8byte
Expand using i386 rep prefix of the specified size.
byte_loop
loop
unrolled_loop
libcall
-mmemcpy-strategy=strategy
Override the internal decision heuristic to decide if __builtin_memcpy should be inlined and what inline algorithm to use when the expected size of the copy operation is known. strategy is a comma-separated list of alg:max_size:dest_align triplets. alg is specified in -mstringop-strategy; max_size specifies the maximum byte size for which inline algorithm alg is allowed. For the last triplet, the max_size must be -1. The max_size of the triplets in the list must be specified in increasing order. The minimal byte size for alg is 0 for the first triplet and max_size + 1 of the preceding range.
-mmemset-strategy=strategy
The option is similar to -mmemcpy-strategy= except that it is to control __builtin_memset expansion.
-momit-leaf-frame-pointer
The option -fomit-leaf-frame-pointer removes the frame pointer for leaf functions, which might make debugging harder.
-mtls-direct-seg-refs
-mno-tls-direct-seg-refs
Controls whether TLS variables may be accessed with offsets from the TLS segment register (%gs for 32-bit, %fs for 64-bit), or whether the thread base pointer must be added. Whether or not this is valid depends on the operating system, and whether it maps the segment to cover the entire TLS area. For systems that use the GNU C Library, the default is on.
-msse2avx
-mno-sse2avx
The option -mavx turns this on by default.
-mfentry
-mno-fentry
If profiling is active (-pg), put the profiling counter call before the prologue. Note: On x86 architectures the attribute ms_hook_prologue isn't possible at the moment for -mfentry and -pg.
-mrecord-mcount
-mno-record-mcount
If profiling is active (-pg), generate a __mcount_loc section that contains pointers to each profiling call. This is useful for automatically patching calls in and out.
-mnop-mcount
-mno-nop-mcount
If profiling is active (-pg), generate the calls to the profiling functions as nops. This is useful when they should be patched in later dynamically. This is likely only useful together with -mrecord-mcount.
-mskip-rax-setup
-mno-skip-rax-setup
-mskip-rax-setup can be used to skip setting up the RAX register when there are no variable arguments passed in vector registers.
Warning: Since the RAX register is used to avoid unnecessarily saving vector registers on the stack when passing variable arguments, the impact of this option is that callees may waste some stack space, misbehave, or jump to a random location. GCC 4.4 or newer does not have those issues, regardless of the RAX register value.
-m8bit-idiv
-mno-8bit-idiv
-mavx256-split-unaligned-load
-mavx256-split-unaligned-store
-mstack-protector-guard=guard
Supported locations are ‘global’ for global canary or ‘tls’ for per-thread canary in the TLS block (the default). This option has effect only when -fstack-protector or -fstack-protector-all is specified.
-mmitigate-rop
These ‘-m’ switches are supported in addition to the above on x86-64 processors in 64-bit environments.
-m32
-m64
-mx32
-m16
-miamcu
The -m32 option sets int, long, and pointer types to 32 bits, and generates code that runs on any i386 system. The -m64 option sets int to 32 bits and long and pointer types to 64 bits, and generates code for the x86-64 architecture. For Darwin only, the -m64 option also turns off the -fno-pic and -mdynamic-no-pic options.
The -mx32 option sets int, long, and pointer types to 32 bits, and generates code for the x86-64 architecture.
The -m16 option is the same as -m32, except that it outputs the .code16gcc assembly directive at the beginning of the assembly output so that the binary can run in 16-bit mode.
The -miamcu option generates code which conforms to Intel MCU psABI. It requires the -m32 option to be turned on.
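The type widths selected by -m32, -m64 and -mx32 can be confirmed from within a program. This is a minimal sketch using only sizeof and `CHAR_BIT`; the function names and the labels returned by `data_model` are hypothetical. Note that -m32 and -mx32 give the same widths for these three types, so this check alone cannot tell them apart.

```c
#include <assert.h>
#include <limits.h>

int int_bits(void)
{
    return (int)(sizeof(int) * CHAR_BIT);
}

int long_bits(void)
{
    return (int)(sizeof(long) * CHAR_BIT);
}

int pointer_bits(void)
{
    return (int)(sizeof(void *) * CHAR_BIT);
}

/* Classifies the build by the widths of long and pointers. */
const char *data_model(void)
{
    if (long_bits() == 64 && pointer_bits() == 64)
        return "64-bit long and pointers (as with -m64)";
    if (long_bits() == 32 && pointer_bits() == 32)
        return "32-bit long and pointers (as with -m32 or -mx32)";
    return "other";
}
```

A build script could call `data_model()` in a small test program to verify that the intended option actually took effect for the toolchain in use.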
-mno-red-zone
The flag -mno-red-zone disables this red zone.
-mcmodel=small
-mcmodel=kernel
-mcmodel=medium
Symbols with sizes larger than -mlarge-data-threshold are put into large data or BSS sections and can be located above 2GB. Programs can be statically or dynamically linked.
-mcmodel=large
-maddress-mode=long
-maddress-mode=short
© Free Software Foundation
Licensed under the GNU Free Documentation License, Version 1.3.
https://gcc.gnu.org/onlinedocs/gcc-6.3.0/gcc/x86-Options.html