This is gccint.info, produced by makeinfo version 7.1 from gccint.texi.

Copyright © 1988-2024 Free Software Foundation, Inc.

 Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with the
Invariant Sections being "Funding Free Software", the Front-Cover Texts
being (a) (see below), and with the Back-Cover Texts being (b) (see
below).  A copy of the license is included in the section entitled "GNU
Free Documentation License".

 (a) The FSF's Front-Cover Text is:

 A GNU Manual

 (b) The FSF's Back-Cover Text is:

 You have freedom to copy and modify this GNU Manual, like GNU software.
Copies published by the Free Software Foundation raise funds for GNU
development.
INFO-DIR-SECTION Software development
START-INFO-DIR-ENTRY
* gccint: (gccint).            Internals of the GNU Compiler Collection.
END-INFO-DIR-ENTRY

 This file documents the internals of the GNU compilers.


File: gccint.info,  Node: Calls,  Next: RTL SSA,  Prev: Insns,  Up: RTL

14.20 RTL Representation of Function-Call Insns
===============================================

Insns that call subroutines have the RTL expression code ‘call_insn’.
These insns must satisfy special rules, and their bodies must use a
special RTL expression code, ‘call’.

 A ‘call’ expression has two operands, as follows:

     (call (mem:FM ADDR) NBYTES)

Here NBYTES is an operand that represents the number of bytes of
argument data being passed to the subroutine, FM is a machine mode
(which must equal the definition of the ‘FUNCTION_MODE’ macro in the
machine description) and ADDR represents the address of the subroutine.

 For a subroutine that returns no value, the ‘call’ expression as shown
above is the entire body of the insn, except that the insn might also
contain ‘use’ or ‘clobber’ expressions.

 For a subroutine that returns a value whose mode is not ‘BLKmode’, the
value is returned in a hard register.  If this register's number is R,
then the body of the call insn looks like this:

     (set (reg:M R)
          (call (mem:FM ADDR) NBYTES))

This RTL expression makes it clear (to the optimizer passes) that the
appropriate register receives a useful value in this insn.

 When a subroutine returns a ‘BLKmode’ value, it is handled by passing
to the subroutine the address of a place to store the value.  So the
call insn itself does not "return" any value, and it has the same RTL
form as a call that returns nothing.

 On some machines, the call instruction itself clobbers some register,
for example to contain the return address.  ‘call_insn’ insns on these
machines should have a body which is a ‘parallel’ that contains both the
‘call’ expression and ‘clobber’ expressions that indicate which
registers are destroyed.  Similarly, if the call instruction requires
some register other than the stack pointer that is not explicitly
mentioned in its RTL, a ‘use’ subexpression should mention that
register.

 Functions that are called are assumed to modify all registers listed in
the configuration macro ‘CALL_USED_REGISTERS’ (*note Register Basics::)
and, with the exception of ‘const’ functions and library calls, to
modify all of memory.

 Insns containing just ‘use’ expressions directly precede the
‘call_insn’ insn to indicate which registers contain inputs to the
function.  Similarly, if registers other than those in
‘CALL_USED_REGISTERS’ are clobbered by the called function, insns
containing a single ‘clobber’ follow immediately after the call to
indicate which registers.


File: gccint.info,  Node: RTL SSA,  Next: Sharing,  Prev: Calls,  Up: RTL

14.21 On-the-Side SSA Form for RTL
==================================

The patterns of an individual RTL instruction describe which registers
are inputs to that instruction and which registers are outputs from that
instruction.  However, it is often useful to know where the definition
of a register input comes from and where the result of a register output
is used.  One way of obtaining this information is to use the RTL SSA
form, which provides a Static Single Assignment representation of the
RTL instructions.

 The RTL SSA code is located in the ‘rtl-ssa’ subdirectory of the GCC
source tree.  This section only gives a brief overview of it; please see
the comments in the source code for more details.

* Menu:

* Using RTL SSA::             What a pass needs to do to use the RTL SSA form
* RTL SSA Instructions::      How instructions are represented and organized
* RTL SSA Basic Blocks::      How instructions are grouped into blocks
* RTL SSA Resources::         How registers and memory are represented
* RTL SSA Accesses::          How register and memory accesses are represented
* RTL SSA Phi Nodes::         How multiple sources are combined into one
* RTL SSA Access Lists::      How accesses are chained together
* Changing RTL Instructions:: How to use the RTL SSA framework to change insns


File: gccint.info,  Node: Using RTL SSA,  Next: RTL SSA Instructions,  Up: RTL SSA

14.21.1 Using RTL SSA in a pass
-------------------------------

A pass that wants to use the RTL SSA form should start with the
following:

     #define INCLUDE_ALGORITHM
     #define INCLUDE_FUNCTIONAL
     #include "config.h"
     #include "system.h"
     #include "coretypes.h"
     #include "backend.h"
     #include "rtl.h"
     #include "df.h"
     #include "rtl-ssa.h"

 All the RTL SSA code is contained in the ‘rtl_ssa’ namespace, so most
passes will then want to do:

     using namespace rtl_ssa;

 However, this is purely a matter of taste, and the examples in the rest
of this section do not require it.

 The RTL SSA representation is an optional on-the-side feature that
applies on top of the normal RTL instructions.  It is currently local to
individual RTL passes and is not maintained across passes.

 However, in order to allow the RTL SSA information to be preserved
across passes in future, ‘crtl->ssa’ points to the current function's
SSA form (if any).  Passes that want to use the RTL SSA form should
first do:

     crtl->ssa = new rtl_ssa::function_info (FN);

 where FN is the function that the pass is processing.  (Passes that are
‘using namespace rtl_ssa’ do not need the ‘rtl_ssa::’.)

 Once the pass has finished with the SSA form, it should do the
following:

     free_dominance_info (CDI_DOMINATORS);
     if (crtl->ssa->perform_pending_updates ())
       cleanup_cfg (0);

     delete crtl->ssa;
     crtl->ssa = nullptr;

 The ‘free_dominance_info’ call is necessary because dominance
information is not currently maintained between RTL passes.  The next
two lines commit any changes to the RTL instructions that were queued
for later; see the comment above the declaration of
‘perform_pending_updates’ for details.  The final two lines discard the
RTL SSA form and free the associated memory.


File: gccint.info,  Node: RTL SSA Instructions,  Next: RTL SSA Basic Blocks,  Prev: Using RTL SSA,  Up: RTL SSA

14.21.2 RTL SSA Instructions
----------------------------

RTL SSA instructions are represented by an ‘rtl_ssa::insn_info’.  These
instructions are chained together in a single list that follows a
reverse postorder (RPO) traversal of the function.  This means that if
any path through the function can execute an instruction I1 and then
later execute an instruction I2 for the first time, I1 appears before I2
in the list(1).

 Two RTL SSA instructions can be compared to find which instruction
occurs earlier than the other in the RPO.  One way to do this is to use
the C++ comparison operators, such as:

     *INSN1 < *INSN2

 Another way is to use the ‘compare_with’ function:

     INSN1->compare_with (INSN2)

 This expression is greater than zero if INSN1 comes after INSN2 in the
RPO, less than zero if INSN1 comes before INSN2 in the RPO, or zero if
INSN1 and INSN2 are the same.  This order is maintained even if
instructions are added to the function or moved around.

 The main purpose of ‘rtl_ssa::insn_info’ is to hold SSA information
about an instruction.  However, it also caches certain properties of the
instruction, such as whether it is an inline assembly instruction,
whether it has volatile accesses, and so on.
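
 The ordering rules above can be illustrated with a small self-contained
sketch.  It is not GCC code: ‘toy_cfg’, ‘toy_reverse_postorder’ and
‘toy_insn’ are hypothetical names, and the sketch simply assumes that
each instruction carries an integer position in the RPO list.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy CFG (hypothetical): succs[b] lists the successors of block b,
// and block 0 is the entry block.
using toy_cfg = std::vector<std::vector<int>>;

// Depth-first walk that appends each block after all of its successors.
static void toy_postorder (const toy_cfg &succs, int block,
                           std::vector<bool> &seen, std::vector<int> &out)
{
  seen[block] = true;
  for (int succ : succs[block])
    if (!seen[succ])
      toy_postorder (succs, succ, seen, out);
  out.push_back (block);
}

// Return the blocks reachable from block 0 in reverse postorder.
std::vector<int> toy_reverse_postorder (const toy_cfg &succs)
{
  assert (!succs.empty ());
  std::vector<bool> seen (succs.size (), false);
  std::vector<int> order;
  toy_postorder (succs, 0, seen, order);
  std::reverse (order.begin (), order.end ());
  return order;
}

// Toy stand-in for an instruction: only its position in the RPO list.
struct toy_insn
{
  int point;

  // Negative, zero or positive, following the contract described above.
  int compare_with (const toy_insn *other) const
  { return point - other->point; }

  bool operator< (const toy_insn &other) const
  { return point < other.point; }
};
```

 In this model, ‘a.compare_with (&b)’ is negative exactly when ‘a < b’
is true, which mirrors the relationship between the two comparison
styles described above.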

   ---------- Footnotes ----------

   (1) Note that this order is different from the order of the
underlying RTL instructions, which follow machine code order instead.


File: gccint.info,  Node: RTL SSA Basic Blocks,  Next: RTL SSA Resources,  Prev: RTL SSA Instructions,  Up: RTL SSA

14.21.3 RTL SSA Basic Blocks
----------------------------

RTL SSA instructions (*note RTL SSA Instructions::) are organized into
basic blocks, with each block being represented by an
‘rtl_ssa::bb_info’.  There is a one-to-one mapping between these
‘rtl_ssa::bb_info’ structures and the underlying CFG ‘basic_block’
structures (*note Basic Blocks::).

 If a CFG basic block BB contains an RTL instruction INSN, the RTL SSA
representation of BB also contains an RTL SSA representation of INSN(1).
Within RTL SSA, these instructions are referred to as "real"
instructions.  These real instructions fall into two groups: debug
instructions and nondebug instructions.  Only nondebug instructions
should affect code generation decisions.

 In addition, each RTL SSA basic block has two "artificial"
instructions: a "head" instruction that comes before all the real
instructions and an "end" instruction that comes after all real
instructions.  These instructions exist to represent things that are
conceptually defined or used at the start and end of a basic block.  The
instructions always exist, even if they do not currently do anything.

 Like instructions, these blocks are chained together in a reverse
postorder.  This list includes the entry block (which always comes
first) and the exit block (which always comes last).

 RTL SSA basic blocks are chained together into "extended basic blocks"
(EBBs), represented by an ‘rtl_ssa::ebb_info’.  Extended basic blocks
contain one or more basic blocks.  They have the property that if a
block BBY comes immediately after a block BBX in an EBB, then BBY can
only be reached from BBX; in other words, BBX is the sole predecessor of
BBY.
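
 The sole-predecessor property can be written as a simple check.  The
following is a sketch over a toy CFG description, not the code that GCC
uses to form EBBs; ‘toy_preds’ and ‘toy_is_valid_ebb’ are hypothetical
names introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy CFG description (hypothetical): preds[b] lists the predecessors
// of block b.
using toy_preds = std::vector<std::vector<int>>;

// Return true if BLOCKS could form one extended basic block under the
// rule quoted above: every block after the first has the previous block
// in the sequence as its sole predecessor.
bool toy_is_valid_ebb (const toy_preds &preds,
                       const std::vector<int> &blocks)
{
  assert (!blocks.empty ());
  for (std::size_t i = 1; i < blocks.size (); ++i)
    {
      const std::vector<int> &p = preds[blocks[i]];
      if (p.size () != 1 || p[0] != blocks[i - 1])
        return false;
    }
  return true;
}
```

 A block with two or more predecessors can therefore only appear at the
start of an EBB, which is why phi nodes are only needed there.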

 Each extended basic block starts with an artificial "phi node"
instruction.  This instruction defines all phi nodes for the EBB (*note
RTL SSA Phi Nodes::).  (Individual blocks in an EBB do not need phi
nodes because their live values can only come from one source.)

 The contents of a function are therefore represented using a four-level
hierarchy:

   • functions (‘rtl_ssa::function_info’), which contain ...

   • extended basic blocks (‘rtl_ssa::ebb_info’), which contain ...

   • basic blocks (‘rtl_ssa::bb_info’), which contain ...

   • instructions (‘rtl_ssa::insn_info’)

 In dumps, a basic block is identified as ‘bbN’, where N is the index of
the associated CFG ‘basic_block’ structure.  An EBB is in turn
identified by the index of its first block.  For example, an EBB that
contains ‘bb10’, ‘bb5’, ‘bb6’ and ‘bb9’ is identified as EBB10.

   ---------- Footnotes ----------

   (1) Note that this excludes non-instruction things like ‘note’s and
‘barrier’s that also appear in the chain of RTL instructions.


File: gccint.info,  Node: RTL SSA Resources,  Next: RTL SSA Accesses,  Prev: RTL SSA Basic Blocks,  Up: RTL SSA

14.21.4 RTL SSA Resources
-------------------------

The RTL SSA form tracks two types of "resource": registers and memory.
Each hard and pseudo register is a separate resource.  Memory is a
single unified resource, like it is in GIMPLE (*note GIMPLE::).

 Each resource has a unique identifier.  The unique identifier for a
register is simply its register number.  The unique identifier for
memory is a special register number called ‘MEM_REGNO’.

 Since resource numbers so closely match register numbers, it is
sometimes convenient to refer to them simply as register numbers, or
"regnos" for short.  However, the RTL SSA form also provides an
abstraction of resources in the form of ‘rtl_ssa::resource_info’.  This
is a lightweight class that records both the regno of a resource and the
‘machine_mode’ that the resource has (*note Machine Modes::).  It has
functions for testing whether a resource is a register or memory.  In
principle it could be extended to other kinds of resource in future.
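
 As a rough illustration of this abstraction, a resource can be
modelled as a (mode, regno) pair with a reserved number for memory.
The sketch below is not the real ‘rtl_ssa::resource_info’:
‘toy_resource’, ‘toy_mode’ and ‘TOY_MEM_REGNO’ are hypothetical, and the
value chosen for the memory number is an assumption.

```cpp
#include <cassert>

// Hypothetical stand-ins.  GCC defines its own MEM_REGNO and
// machine_mode; the value and the enumerators here are assumptions.
constexpr unsigned int TOY_MEM_REGNO = ~0U;
enum class toy_mode { BLK, SI, DI };

// Lightweight (mode, regno) pair in the spirit of the class described
// above.
struct toy_resource
{
  toy_mode mode;
  unsigned int regno;

  // Memory is the single resource that uses the reserved number.
  bool is_mem () const { return regno == TOY_MEM_REGNO; }

  // Every other number identifies a hard or pseudo register.
  bool is_reg () const { return regno != TOY_MEM_REGNO; }
};
```

 Because the type is a plain pair of values, it can be passed around by
value and compared cheaply, which fits the "lightweight" description
above.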


File: gccint.info,  Node: RTL SSA Accesses,  Next: RTL SSA Phi Nodes,  Prev: RTL SSA Resources,  Up: RTL SSA

14.21.5 RTL SSA Register and Memory Accesses
--------------------------------------------

In the RTL SSA form, most reads or writes of a resource are represented
as a ‘rtl_ssa::access_info’(1).  These ‘rtl_ssa::access_info’s are
organized into the following class hierarchy:

     rtl_ssa::access_info
       |
       +-- rtl_ssa::use_info
       |
       +-- rtl_ssa::def_info
             |
             +-- rtl_ssa::clobber_info
             |
             +-- rtl_ssa::set_info
                   |
                   +-- rtl_ssa::phi_info

 A ‘rtl_ssa::use_info’ represents a read or use of a resource and a
‘rtl_ssa::def_info’ represents a write or definition of a resource.  As
in the main RTL representation, there are two basic types of definition:
clobbers and sets.  The difference is that a clobber leaves the register
with an unspecified value that cannot be used or relied on by later
instructions, while a set leaves the register with a known value that
later instructions could use if they wanted to.  A
‘rtl_ssa::clobber_info’ represents a clobber and a ‘rtl_ssa::set_info’
represents a set.

 Each ‘rtl_ssa::use_info’ records which single ‘rtl_ssa::set_info’
provides the value of the resource; this is null if the resource is
completely undefined at the point of use.  Each ‘rtl_ssa::set_info’ in
turn records all the ‘rtl_ssa::use_info’s that use its value.

 If a value of a resource can come from multiple sources, a
‘rtl_ssa::phi_info’ brings those multiple sources together into a single
definition (*note RTL SSA Phi Nodes::).
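
 The relationships between uses, sets and clobbers can be sketched with
a toy version of the hierarchy.  This is only an illustration under
simplifying assumptions: the ‘toy_’ classes are hypothetical, phi nodes
are omitted, and the real classes store their links quite differently.

```cpp
#include <cassert>
#include <vector>

struct toy_use;

// Base of the toy hierarchy: every access names one resource.
struct toy_access
{
  explicit toy_access (unsigned int regno) : regno (regno) {}
  virtual ~toy_access () = default;
  unsigned int regno;
};

// A write or definition of a resource.
struct toy_def : toy_access
{
  explicit toy_def (unsigned int regno) : toy_access (regno) {}
};

// A clobber leaves no usable value, so it never has uses.
struct toy_clobber : toy_def
{
  explicit toy_clobber (unsigned int regno) : toy_def (regno) {}
};

// A set records every use that reads the value it provides.
struct toy_set : toy_def
{
  explicit toy_set (unsigned int regno) : toy_def (regno) {}
  std::vector<toy_use *> uses;
};

// A use records the single set that provides its value, or null if the
// resource is undefined at the point of use.
struct toy_use : toy_access
{
  toy_use (unsigned int regno, toy_set *def)
    : toy_access (regno), def (def)
  {
    if (def)
      {
        assert (def->regno == regno);
        def->uses.push_back (this);
      }
  }
  toy_set *def;
};
```

 Only ‘toy_set’ has a list of uses, so the type system of the sketch
already enforces that a clobber cannot provide a value to a later use.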

   ---------- Footnotes ----------

   (1) The exceptions are call clobbers, which are generally represented
separately.  See the comment above ‘rtl_ssa::insn_info’ for details.


File: gccint.info,  Node: RTL SSA Phi Nodes,  Next: RTL SSA Access Lists,  Prev: RTL SSA Accesses,  Up: RTL SSA

14.21.6 RTL SSA Phi Nodes
-------------------------

If a resource is live on entry to an extended basic block and if the
resource's value can come from multiple sources, the extended basic
block has a "phi node" that collects together these multiple sources.
The phi node conceptually has one input for each incoming edge of the
extended basic block, with the input specifying the value of the
resource on that edge.  For example, suppose a function contains the
following RTL:

     ;; Basic block bb3
     ...
     (set (reg:SI R1) (const_int 0))  ;; A
     (set (pc) (label_ref bb5))

     ;; Basic block bb4
     ...
     (set (reg:SI R1) (const_int 1))  ;; B
     ;; Fall through

     ;; Basic block bb5
     ;; preds: bb3, bb4
     ;; live in: R1 ...
     (code_label bb5)
     ...
     (set (reg:SI R2)
          (plus:SI (reg:SI R1) ...))  ;; C

 The value of R1 on entry to block 5 can come from either A or B.  The
extended basic block that contains block 5 would therefore have a phi
node with two inputs: the first input would have the value of R1 defined
by A and the second input would have the value of R1 defined by B.  This
phi node would then provide the value of R1 for C (assuming that R1 does
not change again between the start of block 5 and C).

 Since RTL is not a "native" SSA representation, these phi nodes simply
collect together definitions that already exist.  Each input to a phi
node for a resource R is itself a definition of resource R (or is null
if the resource is completely undefined for a particular incoming edge).
This is in contrast to a native SSA representation like GIMPLE, where
the phi inputs can be arbitrary expressions.  As a result, RTL SSA phi
nodes never involve "hidden" moves: all moves are instead explicit.

 Phi nodes are represented as a ‘rtl_ssa::phi_info’.  Each input to a
phi node is represented as an ‘rtl_ssa::use_info’.
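
 The bb3/bb4/bb5 example above can be modelled with a toy phi whose
inputs point at existing definitions rather than at expressions.  The
names ‘toy_def’ and ‘toy_phi’ are hypothetical, and the sketch stores
plain pointers where the real representation uses ‘rtl_ssa::use_info’
objects.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy definition of a resource; LABEL names the defining instruction.
struct toy_def
{
  unsigned int regno;
  std::string label;
};

// Toy phi: itself a definition of a resource, with one input per
// incoming edge.  Each input is an existing definition of the same
// resource, or null if the resource is undefined on that edge.
struct toy_phi
{
  toy_def result;
  std::vector<const toy_def *> inputs;

  void add_input (const toy_def *def)
  {
    assert (!def || def->regno == result.regno);
    inputs.push_back (def);
  }
};
```

 For the example above, the phi for R1 would receive A's definition as
its first input and B's definition as its second; no move is implied by
either input.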


File: gccint.info,  Node: RTL SSA Access Lists,  Next: Changing RTL Instructions,  Prev: RTL SSA Phi Nodes,  Up: RTL SSA

14.21.7 RTL SSA Access Lists
----------------------------

All the definitions of a resource are chained together in reverse
postorder.  In general, this list can contain an arbitrary mix of both
sets (‘rtl_ssa::set_info’) and clobbers (‘rtl_ssa::clobber_info’).
However, it is often useful to skip over all intervening clobbers of a
resource in order to find the next set.  The list is constructed in such
a way that this can be done in amortized constant time.

 All uses (‘rtl_ssa::use_info’) of a given set are also chained together
into a list.  This list of uses is divided into three parts:

  1. uses by "real" nondebug instructions (*note real RTL SSA insns::)

  2. uses by real debug instructions

  3. uses by phi nodes (*note RTL SSA Phi Nodes::)

 The first and second parts individually follow reverse postorder.  The
third part has no particular order.

 The last use by a real nondebug instruction always comes earlier in the
reverse postorder than the next definition of the resource (if any).
This means that the accesses follow a linear sequence of the form:

   • first definition of resource R

        • first use by a real nondebug instruction of the first
          definition of resource R

        • ...

        • last use by a real nondebug instruction of the first
          definition of resource R

   • second definition of resource R

        • first use by a real nondebug instruction of the second
          definition of resource R

        • ...

        • last use by a real nondebug instruction of the second
          definition of resource R

   • ...

   • last definition of resource R

        • first use by a real nondebug instruction of the last
          definition of resource R

        • ...

        • last use by a real nondebug instruction of the last definition
          of resource R

 (Note that clobbers never have uses; only sets do.)
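
 The linear-sequence property can be expressed as a check over RPO
positions.  The sketch below is a toy model, not part of RTL SSA:
‘toy_def’ and ‘toy_is_linear’ are hypothetical, each instruction is
assumed to have at most one use of the resource, and a use is assumed to
happen before a definition made by the same instruction.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy definition: the RPO position of the defining instruction plus the
// RPO positions of the real nondebug instructions that use it.
struct toy_def
{
  int point;
  std::vector<int> nondebug_use_points;
};

// Check the linear-sequence property for the definitions of one
// resource, listed in reverse postorder.
bool toy_is_linear (const std::vector<toy_def> &defs)
{
  for (std::size_t i = 0; i < defs.size (); ++i)
    {
      // Uses must follow their definition and be in increasing order.
      int last = defs[i].point;
      for (int use : defs[i].nondebug_use_points)
        {
          if (use <= last)
            return false;
          last = use;
        }

      // The next definition must come after this one and must not come
      // before the last use of this one.
      if (i + 1 < defs.size ()
          && (defs[i + 1].point <= defs[i].point
              || defs[i + 1].point < last))
        return false;
    }
  return true;
}
```

 A definition whose value is still used after a later definition of the
same resource fails this check.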

 This linear view is easy to achieve when there is only a single
definition of a resource, which is commonly true for pseudo registers.
However, things are more complex if code has a structure like the
following:

     // ebb2, bb2
     R = VA;        // A
     if (...)
       {
         // ebb2, bb3
         use1 (R);  // B
         ...
         R = VC;    // C
       }
     else
       {
         // ebb4, bb4
         use2 (R);  // D
       }

 The list of accesses would begin as follows:

   • definition of R by A

        • use of A's definition of R by B

   • definition of R by C

 The next access to R is in D, but the value of R that D uses comes from
A rather than C.

 This is resolved by adding a phi node for ‘ebb4’.  All inputs to this
phi node have the same value, which in the example above is A's
definition of R.  In other circumstances, it would not be necessary to
create a phi node when all inputs are equal, so these phi nodes are
referred to as "degenerate" phi nodes.

 The full list of accesses to R is therefore:

   • definition of R by A

        • use of A's definition of R by B

   • definition of R by C

   • definition of R by ebb4's phi instruction, with the input coming
     from A

        • use of ebb4's phi definition of R by D

 Note that A's definition is also used by ebb4's phi node, but this use
belongs to the third part of the use list described above and so does
not form part of the linear sequence.

 It is possible to "look through" any degenerate phi to the ultimate
definition using the function ‘look_through_degenerate_phi’.  Note that
the input to a degenerate phi is never itself provided by a degenerate
phi.
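
 The idea of looking through a degenerate phi can be sketched as
follows.  This is a toy model rather than the real
‘look_through_degenerate_phi’: ‘toy_set’, ‘toy_is_degenerate’ and
‘toy_look_through_degenerate_phi’ are hypothetical names, and the toy
identifies a phi with a simple flag.

```cpp
#include <cassert>
#include <vector>

// Toy set: a phi if IS_PHI is true, in which case INPUTS holds one
// definition per incoming edge.
struct toy_set
{
  bool is_phi = false;
  std::vector<const toy_set *> inputs;
};

// A toy phi is degenerate when all of its inputs are the same
// definition.
bool toy_is_degenerate (const toy_set *set)
{
  assert (set);
  if (!set->is_phi || set->inputs.empty ())
    return false;
  for (const toy_set *input : set->inputs)
    if (input != set->inputs[0])
      return false;
  return true;
}

// Return the definition behind a degenerate phi, or SET itself
// otherwise.  One step is enough because, as stated above, the input to
// a degenerate phi is never itself provided by a degenerate phi.
const toy_set *toy_look_through_degenerate_phi (const toy_set *set)
{
  return toy_is_degenerate (set) ? set->inputs[0] : set;
}
```

 Applied to the example above, looking through ebb4's phi for R would
give A's definition, while a phi with genuinely different inputs is
returned unchanged.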

 At present, the SSA form takes this principle one step further and
guarantees that, for any given resource RES, one of the following is
true:

   • The resource has a single definition DEF, which is not a phi node.
     Excluding uses of undefined registers, all uses of RES by real
     nondebug instructions use the value provided by DEF.

   • Excluding uses of undefined registers, all uses of RES use values
     provided by definitions that occur earlier in the same extended
     basic block.  These definitions might come from phi nodes or from
     real instructions.


File: gccint.info,  Node: Changing RTL Instructions,  Prev: RTL SSA Access Lists,  Up: RTL SSA

14.21.8 Using the RTL SSA framework to change instructions
----------------------------------------------------------

There are various routines that help to change a single RTL instruction
or a group of RTL instructions while keeping the RTL SSA form
up-to-date.  This section first describes the process for changing a
single instruction, then goes on to describe the differences when
changing multiple instructions.

* Menu:

* Changing One RTL SSA Instruction::
* Changing Multiple RTL SSA Instructions::


File: gccint.info,  Node: Changing One RTL SSA Instruction,  Next: Changing Multiple RTL SSA Instructions,  Up: Changing RTL Instructions

14.21.8.1 Changing One RTL SSA Instruction
..........................................

Before making a change, passes should first use a statement like the
following:

     auto attempt = crtl->ssa->new_change_attempt ();

 Here, ‘attempt’ is an RAII object that should remain in scope for the
entire change attempt.  It automatically frees temporary memory related
to the changes when it goes out of scope.

 Next, the pass should create an ‘rtl_ssa::insn_change’ object for the
instruction that it wants to change.  This object specifies several
things:

   • what the instruction's new list of uses should be (‘new_uses’).  By
     default this is the same as the instruction's current list of uses.

   • what the instruction's new list of definitions should be
     (‘new_defs’).  By default this is the same as the instruction's
     current list of definitions.

   • where the instruction should be located (‘move_range’).  This is a
     range of instructions after which the instruction could be placed,
     represented as an ‘rtl_ssa::insn_range’.  By default the
     instruction must remain at its current position.

 If a pass was attempting to change all these properties of an
instruction ‘insn’, it might do something like this:

     rtl_ssa::insn_change change (insn);
     change.new_defs = ...;
     change.new_uses = ...;
     change.move_range = ...;

 This ‘rtl_ssa::insn_change’ only describes something that the pass
_might_ do; at this stage, nothing has actually changed.

 As noted above, the default ‘move_range’ requires the instruction to
remain where it is.  At the other extreme, it is possible to allow the
instruction to move anywhere within its extended basic block, provided
that all the new uses and definitions can be performed at the new
location.  The way to do this is:

     change.move_range = insn->ebb ()->insn_range ();

 In either case, the next step is to make sure that the move range is
consistent with the new uses and definitions.  The way to do this is:

     if (!rtl_ssa::restrict_movement (change))
       return false;

 This function tries to limit ‘move_range’ to a range of instructions at
which ‘new_uses’ and ‘new_defs’ can be correctly performed.  It returns
true on success or false if no suitable location exists.

 The pass should also tentatively change the pattern of the instruction
to whatever form the pass wants the instruction to have.  This should
use the facilities provided by ‘recog.cc’.  For example:

     rtl_insn *rtl = insn->rtl ();
     insn_change_watermark watermark;
     validate_change (rtl, &PATTERN (rtl), new_pat, 1);

 will tentatively replace ‘insn’'s pattern with ‘new_pat’.

 These changes and the construction of the ‘rtl_ssa::insn_change’ can
happen in either order or be interleaved.

 After the tentative changes to the instruction are complete, the pass
should check whether the new pattern matches a target instruction or
satisfies the requirements of an inline asm:

     if (!rtl_ssa::recog (attempt, change))
       return false;

 This step might change the instruction pattern further in order to make
it match.  It might also add new definitions or restrict the range of
the move.  For example, if the new pattern did not match in its original
form, but could be made to match by adding a clobber of the flags
register, ‘rtl_ssa::recog’ will check whether the flags register is free
at an appropriate point.  If so, it will add a clobber of the flags
register to ‘new_defs’ and restrict ‘move_range’ to the locations at
which the flags register can be safely clobbered.

 Even if the proposed new instruction is valid according to
‘rtl_ssa::recog’, the change might not be worthwhile.  For example, when
optimizing for speed, the new instruction might turn out to be slower
than the original one.  When optimizing for size, the new instruction
might turn out to be bigger than the original one.

 Passes should check for this case using ‘change_is_worthwhile’.  For
example:

     if (!rtl_ssa::change_is_worthwhile (change))
       return false;

 If the change passes this test too then the pass can perform the change
using:

     confirm_change_group ();
     crtl->ssa->change_insn (change);

 Putting all this together, the change has the following form:

     auto attempt = crtl->ssa->new_change_attempt ();

     rtl_ssa::insn_change change (insn);
     change.new_defs = ...;
     change.new_uses = ...;
     change.move_range = ...;

     if (!rtl_ssa::restrict_movement (change))
       return false;

     insn_change_watermark watermark;
     // Use validate_change etc. to change INSN's pattern.
     ...
     if (!rtl_ssa::recog (attempt, change)
         || !rtl_ssa::change_is_worthwhile (change))
       return false;

     confirm_change_group ();
     crtl->ssa->change_insn (change);
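
 The "tentative change, then confirm or roll back" pattern used above
can be illustrated in isolation.  The following sketch does not use the
‘recog.cc’ interfaces; ‘toy_change_group’, ‘toy_watermark’ and
‘toy_try_change’ are hypothetical, and the assumption that an
unconfirmed change is undone when the watermark goes out of scope is a
modelling choice made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy change group: remembers where each tentative store went and what
// the old value was, so that unconfirmed stores can be rolled back.
class toy_change_group
{
public:
  void tentative_store (std::string *loc, std::string new_value)
  {
    m_pending.push_back ({loc, *loc});
    *loc = std::move (new_value);
  }

  std::size_t num_pending () const { return m_pending.size (); }

  // Undo every pending store made after the first NUM ones.
  void cancel_to (std::size_t num)
  {
    while (m_pending.size () > num)
      {
        *m_pending.back ().loc = m_pending.back ().old_value;
        m_pending.pop_back ();
      }
  }

  // Make all pending stores permanent.
  void confirm () { m_pending.clear (); }

private:
  struct entry { std::string *loc; std::string old_value; };
  std::vector<entry> m_pending;
};

// Toy watermark: on scope exit, roll back anything still pending that
// was added after the watermark was created.
class toy_watermark
{
public:
  explicit toy_watermark (toy_change_group &group)
    : m_group (group), m_start (group.num_pending ()) {}
  ~toy_watermark () { m_group.cancel_to (m_start); }

private:
  toy_change_group &m_group;
  std::size_t m_start;
};

// Replace PATTERN with NEW_PATTERN if ACCEPTABLE; otherwise leave
// PATTERN as it was.
bool toy_try_change (toy_change_group &group, std::string &pattern,
                     const std::string &new_pattern, bool acceptable)
{
  toy_watermark watermark (group);
  group.tentative_store (&pattern, new_pattern);
  if (!acceptable)
    return false;  // The watermark's destructor restores PATTERN.
  group.confirm ();
  return true;
}
```

 The early ‘return false’ paths in the sequence above rely on the same
idea: nothing needs to be undone by hand when a check fails.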


File: gccint.info,  Node: Changing Multiple RTL SSA Instructions,  Prev: Changing One RTL SSA Instruction,  Up: Changing RTL Instructions

14.21.8.2 Changing Multiple RTL SSA Instructions
................................................

The process for changing multiple instructions is similar to the process
for changing single instructions (*note Changing One RTL SSA
Instruction::).  The pass should again start the change attempt with:

     auto attempt = crtl->ssa->new_change_attempt ();

 and keep ‘attempt’ in scope for the duration of the change attempt.  It
should then construct an ‘rtl_ssa::insn_change’ for each change that it
wants to make.

 After this, it should combine the changes into a sequence of
‘rtl_ssa::insn_change’ pointers.  This sequence must be in reverse
postorder; the instructions will remain strictly in the order that the
sequence specifies.

 For example, if a pass is changing exactly two instructions, it might
do:

     rtl_ssa::insn_change *changes[] = { &change1, &change2 };

 where ‘change1’'s instruction must come before ‘change2’'s.
Alternatively, if the pass is changing a variable number of
instructions, it might build up the sequence in a
‘vec<rtl_ssa::insn_change *>’.

 By default, ‘rtl_ssa::restrict_movement’ assumes that all instructions
other than the one passed to it will remain in their current positions
and will retain their current uses and definitions.  When changing
multiple instructions, it is usually more effective to ignore the other
instructions that are changing; the sequencing described above already
ensures that the changing instructions remain in the correct order with
respect to each other.  The way to ignore them is:

     if (!rtl_ssa::restrict_movement_ignoring (change, insn_is_changing (changes)))
       return false;

 Similarly, when ‘rtl_ssa::recog’ is detecting whether a register can be
clobbered, it by default assumes that all other instructions will remain
in their current positions and retain their current form.  It is again
more effective to ignore changing instructions (which might, for
example, no longer need to clobber the flags register).  The way to do
this is:

     if (!rtl_ssa::recog_ignoring (attempt, change, insn_is_changing (changes)))
       return false;

 When changing multiple instructions, the important question is usually
not whether each individual change is worthwhile, but whether the
changes as a whole are worthwhile.  The way to test this is:

     if (!rtl_ssa::changes_are_worthwhile (changes))
       return false;

 The process for changing single instructions makes sure that one
‘rtl_ssa::insn_change’ in isolation is valid.  But when changing
multiple instructions, it is also necessary to test whether the sequence
as a whole is valid.  For example, it might be impossible to satisfy all
of the ‘move_range’s at once.

 Therefore, once the pass has a sequence of changes that are
individually correct, it should use:

     if (!crtl->ssa->verify_insn_changes (changes))
       return false;

 to check whether the sequence as a whole is valid.  If all checks pass,
the final step is:

     confirm_change_group ();
     crtl->ssa->change_insns (changes);

 Putting all this together, the process for a two-instruction change is:

     auto attempt = crtl->ssa->new_change_attempt ();

     rtl_ssa::insn_change change1 (insn1);
     change1.new_defs = ...;
     change1.new_uses = ...;
     change1.move_range = ...;

     rtl_ssa::insn_change change2 (insn2);
     change2.new_defs = ...;
     change2.new_uses = ...;
     change2.move_range = ...;

     rtl_ssa::insn_change *changes[] = { &change1, &change2 };

     auto is_changing = insn_is_changing (changes);
     if (!rtl_ssa::restrict_movement_ignoring (change1, is_changing)
         || !rtl_ssa::restrict_movement_ignoring (change2, is_changing))
       return false;

     insn_change_watermark watermark;
     // Use validate_change etc. to change INSN1's and INSN2's patterns.
     ...
     if (!rtl_ssa::recog_ignoring (attempt, change1, is_changing)
         || !rtl_ssa::recog_ignoring (attempt, change2, is_changing)
         || !rtl_ssa::changes_are_worthwhile (changes)
         || !crtl->ssa->verify_insn_changes (changes))
       return false;

     confirm_change_group ();
     crtl->ssa->change_insns (changes);
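
 Two of the requirements above, that the sequence is in reverse
postorder and that a predicate can identify the instructions being
changed, can be sketched with toy types.  This is not the real
‘insn_is_changing’: ‘toy_insn’, ‘toy_change’, ‘toy_in_rpo_order’ and
‘toy_insn_is_changing’ are hypothetical, and each toy instruction is
assumed to carry an integer RPO position.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Toy instruction and change records; POINT is the RPO position.
struct toy_insn { int point; };
struct toy_change { const toy_insn *insn; };

// Return true if CHANGES is in reverse postorder.
bool toy_in_rpo_order (const std::vector<toy_change *> &changes)
{
  return std::is_sorted (changes.begin (), changes.end (),
                         [] (const toy_change *a, const toy_change *b)
                         { return a->insn->point < b->insn->point; });
}

// Return a predicate that is true for any instruction in CHANGES.  The
// predicate keeps its own copy of the sequence.
std::function<bool (const toy_insn *)>
toy_insn_is_changing (std::vector<toy_change *> changes)
{
  return [changes] (const toy_insn *insn)
    {
      assert (insn);
      return std::any_of (changes.begin (), changes.end (),
                          [insn] (const toy_change *change)
                          { return change->insn == insn; });
    };
}
```

 A pass could use a check like ‘toy_in_rpo_order’ as a sanity test
before committing to a sequence, and a predicate like the one returned
by ‘toy_insn_is_changing’ to decide which instructions to ignore.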


File: gccint.info,  Node: Sharing,  Next: Reading RTL,  Prev: RTL SSA,  Up: RTL

14.22 Structure Sharing Assumptions
===================================

The compiler assumes that certain kinds of RTL expressions are unique;
there do not exist two distinct objects representing the same value.  In
other cases, it makes an opposite assumption: that no RTL expression
object of a certain kind appears in more than one place in the
containing structure.

 These assumptions refer to a single function; except for the RTL
objects that describe global variables and external functions, and a few
standard objects such as small integer constants, no RTL objects are
common to two functions.

   • Each pseudo-register has only a single ‘reg’ object to represent
     it, and therefore only a single machine mode.

   • For any symbolic label, there is only one ‘symbol_ref’ object
     referring to it.

   • All ‘const_int’ expressions with equal values are shared.

   • All ‘const_poly_int’ expressions with equal modes and values are
     shared.

   • There is only one ‘pc’ expression.

   • There is only one ‘const_double’ expression with value 0 for each
     floating point mode.  Likewise for values 1 and 2.

   • There is only one ‘const_vector’ expression with value 0 for each
     vector mode, be it an integer or a double constant vector.

   • No ‘label_ref’ or ‘scratch’ appears in more than one place in the
     RTL structure; in other words, it is safe to do a tree-walk of all
     the insns in the function and assume that each time a ‘label_ref’
     or ‘scratch’ is seen it is distinct from all others that are seen.

   • Only one ‘mem’ object is normally created for each static variable
     or stack slot, so these objects are frequently shared in all the
     places they appear.  However, separate but equal objects for these
     variables are occasionally made.

   • When a single ‘asm’ statement has multiple output operands, a
     distinct ‘asm_operands’ expression is made for each output operand.
     However, these all share the vector which contains the sequence of
     input operands.  This sharing is used later on to test whether two
     ‘asm_operands’ expressions come from the same statement, so all
     optimizations must carefully preserve the sharing if they copy the
     vector at all.

   • No RTL object appears in more than one place in the RTL structure
     except as described above.  Many passes of the compiler rely on
     this by assuming that they can modify RTL objects in place without
     unwanted side-effects on other insns.

   • During initial RTL generation, shared structure is freely
     introduced.  After all the RTL for a function has been generated,
     all shared structure is copied by ‘unshare_all_rtl’ in
     ‘emit-rtl.cc’, after which the above rules are guaranteed to be
     followed.

   • During the combiner pass, shared structure within an insn can exist
     temporarily.  However, the shared structure is copied before the
     combiner is finished with the insn.  This is done by calling
     ‘copy_rtx_if_shared’, which is a subroutine of ‘unshare_all_rtl’.


File: gccint.info,  Node: Reading RTL,  Prev: Sharing,  Up: RTL

14.23 Reading RTL
=================

To read an RTL object from a file, call ‘read_rtx’.  It takes one
argument, a stdio stream, and returns a single RTL object.  This routine
is defined in ‘read-rtl.cc’.  It is not available in the compiler
itself, only the various programs that generate the compiler back end
from the machine description.

 People frequently have the idea of using RTL stored as text in a file
as an interface between a language front end and the bulk of GCC.  This
idea is not feasible.

 GCC was designed to use RTL internally only.  Correct RTL for a given
program is very dependent on the particular target machine.  And the RTL
does not contain all the information about the program.

 The proper way to interface GCC to a new language front end is with the
"tree" data structure, described in the files ‘tree.h’ and ‘tree.def’.
The documentation for this structure (*note GENERIC::) is incomplete.


File: gccint.info,  Node: Control Flow,  Next: Loop Analysis and Representation,  Prev: RTL,  Up: Top

15 Control Flow Graph
*********************

A control flow graph (CFG) is a data structure built on top of the
intermediate code representation (the RTL or ‘GIMPLE’ instruction
stream) abstracting the control flow behavior of a function that is
being compiled.  The CFG is a directed graph where the vertices
represent basic blocks and edges represent possible transfer of control
flow from one basic block to another.  The data structures used to
represent the control flow graph are defined in ‘basic-block.h’.

 In GCC, the representation of control flow is maintained throughout the
compilation process, from constructing the CFG early in ‘pass_build_cfg’
to ‘pass_free_cfg’ (see ‘passes.def’).  The CFG takes various different
modes and may undergo extensive manipulations, but the graph is always
valid between its construction and its release.  This way, transfer of
information such as data flow, a measured profile, or the loop tree, can
be propagated through the passes pipeline, and even from ‘GIMPLE’ to
‘RTL’.

 Often the CFG is better viewed as an integral part of the instruction
chain than as a structure built on top of it.  Updating the compiler's
intermediate representation for instructions cannot be easily done
without proper maintenance of the CFG simultaneously.

* Menu:

* Basic Blocks::           The definition and representation of basic blocks.
* Edges::                  Types of edges and their representation.
* Profile information::    Representation of frequencies and probabilities.
* Maintaining the CFG::    Keeping the control flow graph up to date.
* Liveness information::   Using and maintaining liveness information.


File: gccint.info,  Node: Basic Blocks,  Next: Edges,  Up: Control Flow

15.1 Basic Blocks
=================

A basic block is a straight-line sequence of code with only one entry
point and only one exit.  In GCC, basic blocks are represented using the
‘basic_block’ data type.

 Special basic blocks represent possible entry and exit points of a
function.  These blocks are called ‘ENTRY_BLOCK_PTR’ and
‘EXIT_BLOCK_PTR’.  These blocks do not contain any code.

 The ‘BASIC_BLOCK’ array contains all basic blocks in an unspecified
order.  Each ‘basic_block’ structure has a field that holds a unique
integer identifier ‘index’ that is the index of the block in the
‘BASIC_BLOCK’ array.  The total number of basic blocks in the function
is ‘n_basic_blocks’.  Both the basic block indices and the total number
of basic blocks may vary during the compilation process, as passes
reorder, create, duplicate, and destroy basic blocks.  The index for any
block should never be greater than ‘last_basic_block’.  The indices 0
and 1 are special codes reserved for ‘ENTRY_BLOCK’ and ‘EXIT_BLOCK’, the
indices of ‘ENTRY_BLOCK_PTR’ and ‘EXIT_BLOCK_PTR’.

 Two pointer members of the ‘basic_block’ structure are the pointers
‘next_bb’ and ‘prev_bb’.  These are used to keep a doubly linked chain of
basic blocks in the same order as the underlying instruction stream.
The chain of basic blocks is updated transparently by the provided API
for manipulating the CFG.  The macro ‘FOR_EACH_BB’ can be used to visit
all the basic blocks in lexicographical order, except ‘ENTRY_BLOCK’ and
‘EXIT_BLOCK’.  The macro ‘FOR_ALL_BB’ also visits all basic blocks in
lexicographical order, including ‘ENTRY_BLOCK’ and ‘EXIT_BLOCK’.

 The functions ‘post_order_compute’ and ‘inverted_post_order_compute’
can be used to compute topological orders of the CFG. The orders are
stored as vectors of basic block indices.  The ‘BASIC_BLOCK’ array can
be used to iterate each basic block by index.  Dominator traversals are
also possible using ‘walk_dominator_tree’.  Given two basic blocks A and
B, block A dominates block B if A is _always_ executed before B.
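
 As a rough illustration of what a post-order over the CFG looks like,
the following self-contained sketch computes one for a toy graph of
block indices.  It does not use GCC's data structures: ‘toy_cfg’,
‘toy_visit’ and ‘toy_post_order’ are hypothetical names, and the real
‘post_order_compute’ has a different interface.

```cpp
#include <cassert>
#include <vector>

// Toy CFG: succs[i] lists the indices of the successors of block i.
// This is an illustrative stand-in, not GCC's basic_block type.
using toy_cfg = std::vector<std::vector<int>>;

// Depth-first walk that appends a block only after all of its
// not-yet-visited successors have been appended.
static void
toy_visit (const toy_cfg &cfg, int bb, std::vector<bool> &seen,
           std::vector<int> &order)
{
  seen[bb] = true;
  for (int succ : cfg[bb])
    if (!seen[succ])
      toy_visit (cfg, succ, seen, order);
  order.push_back (bb);
}

// Return the blocks reachable from ENTRY in post-order.
std::vector<int>
toy_post_order (const toy_cfg &cfg, int entry)
{
  std::vector<bool> seen (cfg.size (), false);
  std::vector<int> order;
  toy_visit (cfg, entry, seen, order);
  return order;
}
```

 Reversing the returned vector gives a reverse post-order, in which
every block of an acyclic graph appears before its successors.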

 Each ‘basic_block’ also contains pointers to the first instruction (the
“head”) and the last instruction (the “tail” or “end”) of the
instruction stream contained in a basic block.  In fact, since the
‘basic_block’ data type is used to represent blocks in both major
intermediate representations of GCC (‘GIMPLE’ and RTL), there are
pointers to the head and end of a basic block for both representations,
stored in intermediate representation specific data in the ‘il’ field of
‘struct basic_block_def’.

 For RTL, these pointers are ‘BB_HEAD’ and ‘BB_END’.

 In the RTL representation of a function, the instruction stream
contains not only the "real" instructions, but also “notes” or “insn
notes” (to distinguish them from “reg notes”).  Any function that moves
or duplicates the basic blocks needs to take care of updating these
notes.  Many of these notes expect that the instruction stream consists
of linear regions, so updating can sometimes be tedious.  All types of
insn notes are defined in ‘insn-notes.def’.

 In the RTL function representation, the instructions contained in a
basic block always follow a ‘NOTE_INSN_BASIC_BLOCK’, but zero or more
‘CODE_LABEL’ nodes can precede the block note.  A basic block ends with
a control flow instruction or with the last instruction before the next
‘CODE_LABEL’ or ‘NOTE_INSN_BASIC_BLOCK’.  By definition, a ‘CODE_LABEL’
cannot appear in the middle of the instruction stream of a basic block.

 In addition to notes, the jump table vectors are also represented as
"pseudo-instructions" inside the insn stream.  These vectors never
appear in the basic block and should always be placed just after the
table jump instructions referencing them.  After removing the table-jump
it is often difficult to eliminate the code computing the address and
referencing the vector, so cleaning up these vectors is postponed until
after liveness analysis.  Thus the jump table vectors may appear in the
insn stream unreferenced and without any purpose.  Before any edge is
made “fall-thru”, the existence of such a construct in the way needs to
be checked by calling the ‘can_fallthru’ function.

 For the ‘GIMPLE’ representation, the PHI nodes and statements contained
in a basic block are in a ‘gimple_seq’ pointed to by the basic block
intermediate language specific pointers.  Abstract containers and
iterators are used to access the PHI nodes and statements in a basic
block.  These iterators are called “GIMPLE statement iterators” (GSIs).
Grep for ‘^gsi’ in the various ‘gimple-*’ and ‘tree-*’ files.  There is
a ‘gimple_stmt_iterator’ type for iterating over all kinds of statement,
and a ‘gphi_iterator’ subclass for iterating over PHI nodes.  The
following snippet will pretty-print all PHI nodes and statements of the
current function in the GIMPLE representation.

     basic_block bb;

     FOR_EACH_BB (bb)
       {
        gphi_iterator pi;
        gimple_stmt_iterator si;

        for (pi = gsi_start_phis (bb); !gsi_end_p (pi); gsi_next (&pi))
          {
            gphi *phi = pi.phi ();
            print_gimple_stmt (dump_file, phi, 0, TDF_SLIM);
          }
        for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si))
          {
            gimple *stmt = gsi_stmt (si);
            print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
          }
       }


File: gccint.info,  Node: Edges,  Next: Profile information,  Prev: Basic Blocks,  Up: Control Flow

15.2 Edges
==========

Edges represent possible control flow transfers from the end of some
basic block A to the head of another basic block B.  We say that A is a
predecessor of B, and B is a successor of A.  Edges are represented in
GCC with the ‘edge’ data type.  Each ‘edge’ acts as a link between two
basic blocks: The ‘src’ member of an edge points to the predecessor
basic block of the ‘dest’ basic block.  The members ‘preds’ and ‘succs’
of the ‘basic_block’ data type point to type-safe vectors of edges to
the predecessors and successors of the block.

 When walking the edges in an edge vector, “edge iterators” should be
used.  Edge iterators are constructed using the ‘edge_iterator’ data
structure and several methods are available to operate on them:

‘ei_start’
     This function initializes an ‘edge_iterator’ that points to the
     first edge in a vector of edges.

‘ei_last’
     This function initializes an ‘edge_iterator’ that points to the
     last edge in a vector of edges.

‘ei_end_p’
     This predicate is ‘true’ if an ‘edge_iterator’ has moved past the
     last edge in an edge vector, i.e. it is at the end of the sequence.

‘ei_one_before_end_p’
     This predicate is ‘true’ if an ‘edge_iterator’ points to the last
     edge in an edge vector, i.e. it is one step before the end.

‘ei_next’
     This function takes a pointer to an ‘edge_iterator’ and makes it
     point to the next edge in the sequence.

‘ei_prev’
     This function takes a pointer to an ‘edge_iterator’ and makes it
     point to the previous edge in the sequence.

‘ei_edge’
     This function returns the ‘edge’ currently pointed to by an
     ‘edge_iterator’.

‘ei_safe_edge’
     This function returns the ‘edge’ currently pointed to by an
     ‘edge_iterator’, but returns ‘NULL’ if the iterator is pointing at
     the end of the sequence.  This function has been provided for
     existing code that makes the assumption that a ‘NULL’ edge
     indicates the end of the sequence.

 The convenience macro ‘FOR_EACH_EDGE’ can be used to visit all of the
edges in a sequence of predecessor or successor edges.  It must not be
used when an element might be removed during the traversal, otherwise
elements will be missed.  Here is an example of how to use the macro:

     edge e;
     edge_iterator ei;

     FOR_EACH_EDGE (e, ei, bb->succs)
       {
          if (e->flags & EDGE_FALLTHRU)
            break;
       }
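
 The reason elements can be missed is that removing an edge moves
another element into the slot the iterator currently designates.  The
sketch below shows the effect on a plain ‘std::vector’; ‘toy_edge’,
‘toy_remove_dead’ and ‘toy_remove_dead_buggy’ are hypothetical names
used only for illustration and are not part of GCC.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for an edge; not GCC's edge type.
struct toy_edge
{
  int dest;
  bool dead;
};

// Remove every dead edge.  The index only advances when nothing was
// removed, because erasing moves the next element into slot I.
void
toy_remove_dead (std::vector<toy_edge> &edges)
{
  for (std::size_t i = 0; i < edges.size (); )
    {
      if (edges[i].dead)
        edges.erase (edges.begin () + i);
      else
        ++i;
    }
}

// Incorrect variant for comparison: always advancing skips the
// element that follows each removal.
void
toy_remove_dead_buggy (std::vector<toy_edge> &edges)
{
  for (std::size_t i = 0; i < edges.size (); ++i)
    if (edges[i].dead)
      edges.erase (edges.begin () + i);
}
```

 With two adjacent dead edges, the second variant leaves one of them
behind, which is the kind of omission the warning above refers to.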

 There are various reasons why control flow may transfer from one block
to another.  One possibility is that some instruction, for example a
‘CODE_LABEL’, in a linearized instruction stream just always starts a
new basic block.  In this case a “fall-thru” edge links the basic block
to the first following basic block.  But there are several other reasons
why edges may be created.  The ‘flags’ field of the ‘edge’ data type is
used to store information about the type of edge we are dealing with.
Each edge is of one of the following types:

_jump_
     No type flags are set for edges corresponding to jump instructions.
     These edges are used for unconditional or conditional jumps and in
     RTL also for table jumps.  They are the easiest to manipulate as
     they may be freely redirected when the flow graph is not in SSA
     form.

_fall-thru_
     Fall-thru edges are present in the case where the basic block may
     continue execution to the following one without branching.  These
     edges have the ‘EDGE_FALLTHRU’ flag set.  Unlike other types of
     edges, these edges must come into the basic block immediately
     following in the instruction stream.  The function
     ‘force_nonfallthru’ is available to insert an unconditional jump in
     the case that redirection is needed.  Note that this may require
     creation of a new basic block.

_exception handling_
     Exception handling edges represent possible control transfers from
     a trapping instruction to an exception handler.  The definition of
     "trapping" varies.  In C++, only function calls can throw, but in
     Ada, exceptions such as division by zero or segmentation fault are
     defined, and thus each instruction that may throw this kind of
     exception needs to be handled as a control flow instruction.
     Exception edges have the ‘EDGE_ABNORMAL’ and ‘EDGE_EH’ flags set.

     When updating the instruction stream it is easy to change a possibly
     trapping instruction to a non-trapping one, by simply removing the
     exception edge.  The opposite conversion is difficult, but should
     not happen anyway.  The edges can be eliminated via a call to
     ‘purge_dead_edges’.

     In the RTL representation, the destination of an exception edge is
     specified by a ‘REG_EH_REGION’ note attached to the insn.  In the
     case of a trapping call the ‘EDGE_ABNORMAL_CALL’ flag is set too.
     In the ‘GIMPLE’ representation, this extra flag is not set.

     In the RTL representation, the predicate ‘may_trap_p’ may be used
     to check whether an instruction may still trap or not.  For the tree
     representation, the ‘tree_could_trap_p’ predicate is available, but
     this predicate only checks for possible memory traps, as in
     dereferencing an invalid pointer location.

_sibling calls_
     Sibling calls or tail calls terminate the function in a
     non-standard way and thus an edge to the exit must be present.
     ‘EDGE_SIBCALL’ and ‘EDGE_ABNORMAL’ are set in such a case.  These
     edges only exist in the RTL representation.

_computed jumps_
     Computed jumps contain edges to all labels in the function
     referenced from the code.  All those edges have the ‘EDGE_ABNORMAL’
     flag set.  The edges used to represent computed jumps often cause
     compile time performance problems, since functions consisting of
     many taken labels and many computed jumps may have _very_ dense
     flow graphs, so these edges need to be handled with special care.
     During the earlier stages of the compilation process, GCC tries to
     avoid such dense flow graphs by factoring computed jumps.  For
     example, given the following series of jumps,

            goto *x;
            [ ... ]

            goto *x;
            [ ... ]

            goto *x;
            [ ... ]

     factoring the computed jumps results in the following code sequence
     which has a much simpler flow graph:

            goto y;
            [ ... ]

            goto y;
            [ ... ]

            goto y;
            [ ... ]

          y:
            goto *x;

     However, the classic problem with this transformation is that it
     has a runtime cost in the resulting code: an extra jump.
     Therefore, the computed jumps are un-factored in the later passes
     of the compiler (in the pass called
     ‘pass_duplicate_computed_gotos’).  Be aware of that when you work
     on passes in that area.  There have been numerous examples already
     where the compile time for code with unfactored computed jumps
     caused some serious headaches.

_nonlocal goto handlers_
     GCC allows nested functions to return into the caller using a
     ‘goto’ to a label passed as an argument to the callee.  The labels
     passed to nested functions contain special code to clean up after
     the function call.  Such sections of code are referred to as
     "nonlocal goto receivers".  If a function contains such nonlocal
     goto receivers, an edge from the call to the label is created with
     the ‘EDGE_ABNORMAL’ and ‘EDGE_ABNORMAL_CALL’ flags set.

_function entry points_
     By definition, execution of a function starts at basic block 0, so
     there is always an edge from the ‘ENTRY_BLOCK_PTR’ to basic block
     0.  There is no ‘GIMPLE’ representation for alternate entry points
     at this moment.  In RTL, alternate entry points are specified by
     ‘CODE_LABEL’ with ‘LABEL_ALTERNATE_NAME’ defined.  This feature is
     currently used for multiple entry point prologues and is limited to
     post-reload passes only.  This can be used by back-ends to emit
     alternate prologues for functions called from different contexts.
     In the future, full support for multiple entry functions defined
     by Fortran 90 needs to be implemented.

_function exits_
     In the pre-reload representation a function terminates after the
     last instruction in the insn chain and no explicit return
     instructions are used.  This corresponds to the fall-thru edge into
     the exit block.  After reload, optimal RTL epilogues are used that use
     explicit (conditional) return instructions that are represented by
     edges with no flags set.
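
 To put rough numbers on the density problem described for computed
jumps above, the following sketch counts edges before and after
factoring.  It assumes the simplest possible model, in which every
computed jump gets an edge to every address-taken label; the function
names are hypothetical and nothing here is GCC code.

```cpp
#include <cassert>

// Abnormal edges when each of JUMPS computed gotos has an edge to
// each of LABELS address-taken labels.
constexpr unsigned long
toy_unfactored_edges (unsigned long jumps, unsigned long labels)
{
  return jumps * labels;
}

// After factoring: one ordinary jump edge per original goto into the
// shared block, plus one abnormal edge from that block to each label.
constexpr unsigned long
toy_factored_edges (unsigned long jumps, unsigned long labels)
{
  return jumps + labels;
}
```

 Under this model, 100 computed jumps and 100 labels give 10000 edges
unfactored but only 200 once factored.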


File: gccint.info,  Node: Profile information,  Next: Maintaining the CFG,  Prev: Edges,  Up: Control Flow

15.3 Profile information
========================

In many cases a compiler must make a choice whether to trade speed in
one part of code for speed in another, or to trade code size for code
speed.  In such cases it is useful to know information about how often
some given block will be executed.  That is the purpose of maintaining
a profile within the flow graph.  GCC can handle profile information
obtained through “profile feedback”, but it can also estimate branch
probabilities based on statistics and heuristics.

 The feedback based profile is produced by compiling the program with
instrumentation, executing it on a train run and reading the numbers of
executions of basic blocks and edges back to the compiler while
re-compiling the program to produce the final executable.  This method
provides very accurate information about where a program spends most of
its time on the train run.  Whether it matches the average run of course
depends on the choice of train data set, but several studies have shown
that the behavior of a program usually changes just marginally over
different data sets.

 When profile feedback is not available, the compiler may be asked to
attempt to predict the behavior of each branch in the program using a
set of heuristics (see ‘predict.def’ for details) and compute estimated
frequencies of each basic block by propagating the probabilities over
the graph.

 Each ‘basic_block’ contains two integer fields to represent profile
information: ‘frequency’ and ‘count’.  The ‘frequency’ is an estimate
of how often the basic block is executed within a function.  It is
represented as an integer scaled in the range from 0 to ‘BB_FREQ_BASE’.
The most frequently executed basic block in the function is initially
set to ‘BB_FREQ_BASE’ and the rest of the frequencies are scaled
accordingly.  During optimization, the frequency of the most frequent
basic block can either decrease (for instance by loop unrolling) or grow
(for instance by cross-jumping optimization), so scaling sometimes has
to be performed multiple times.

 The ‘count’ contains hard-counted numbers of executions measured during
training runs and is nonzero only when profile feedback is available.
This value is represented as the host's widest integer (typically a 64
bit integer) of the special type ‘gcov_type’.

 Most optimization passes can use only the frequency information of a
basic block, but a few passes may want to know hard execution counts.
The frequencies should always match the counts after scaling; however,
during updating of the profile information numerical error may
accumulate into quite large errors.

 Each edge also contains a branch probability field: an integer in the
range from 0 to ‘REG_BR_PROB_BASE’.  It represents the probability of
passing control from the end of the ‘src’ basic block to the ‘dest’
basic block, i.e. the probability that control will flow along this
edge.  The ‘EDGE_FREQUENCY’ macro is available to compute how frequently
a given edge is taken.  There is a ‘count’ field for each edge as well,
representing the same information as for a basic block.

 The basic block frequencies are not represented in the instruction
stream, but in the RTL representation the edge frequencies are
represented for conditional jumps (via the ‘REG_BR_PROB’ macro) since
they are used when instructions are output to the assembly file and the
flow graph is no longer maintained.

 The probability that control flow arrives via a given edge to its
destination basic block is called “reverse probability” and is not
directly represented, but it may be easily computed from frequencies of
basic blocks.
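
 The computation can be sketched as follows.  This is a toy model, not
GCC code: the types, the function names and the scale value 10000 are
assumptions made for the example, and the rounding shown is only one
possible choice.

```cpp
#include <cassert>

// Assumed probability scale for this sketch; GCC defines its own.
constexpr int TOY_BR_PROB_BASE = 10000;

// Toy stand-ins for a basic block and an edge.
struct toy_block
{
  int frequency;
};

struct toy_edge
{
  const toy_block *src;
  const toy_block *dest;
  int probability;   // scaled to TOY_BR_PROB_BASE
};

// How often the edge is taken: the source frequency scaled by the
// branch probability, rounded to nearest.
int
toy_edge_frequency (const toy_edge &e)
{
  return (e.src->frequency * e.probability + TOY_BR_PROB_BASE / 2)
         / TOY_BR_PROB_BASE;
}

// Reverse probability, scaled to TOY_BR_PROB_BASE: the share of the
// destination's frequency that arrives along this edge.
int
toy_reverse_probability (const toy_edge &e)
{
  if (e.dest->frequency == 0)
    return 0;
  return static_cast<long long> (toy_edge_frequency (e))
         * TOY_BR_PROB_BASE / e.dest->frequency;
}
```

 For a join block of frequency 8000 reached unconditionally from blocks
of frequency 6000 and 2000, the two reverse probabilities come out as
7500 and 2500 on this scale.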

 Updating profile information is a delicate task that can unfortunately
not be easily integrated with the CFG manipulation API.  Many of the
functions and hooks to modify the CFG, such as
‘redirect_edge_and_branch’, do not have enough information to easily
update the profile, so updating it is in the majority of cases left up
to the caller.  It is difficult to uncover bugs in the profile updating
code, because they manifest themselves only by producing worse code, and
checking profile consistency is not possible because of numeric error
accumulation.  Hence special attention needs to be given to this issue
in each pass that modifies the CFG.

 It is important to point out that ‘REG_BR_PROB_BASE’ and ‘BB_FREQ_BASE’
are both set low enough that the square of any frequency or probability
in the flow graph can be computed without overflow.  By contrast, it is
not possible even to square the ‘count’ field, as modern CPUs are fast
enough to execute 2^32 operations quickly.
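
 The arithmetic behind this remark can be checked with a small sketch.
The scale value 10000 and the names ‘TOY_BASE’ and
‘toy_square_fits_int64’ are assumptions made for illustration; they are
not taken from GCC.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Assumed value for this sketch; it stands in for both scale factors.
constexpr std::int32_t TOY_BASE = 10000;

// The square of a scaled frequency or probability fits in 32 bits.
static_assert (static_cast<std::int64_t> (TOY_BASE) * TOY_BASE
               <= std::numeric_limits<std::int32_t>::max (),
               "square of the scale fits in a 32-bit integer");

// True if V * V can be computed in a signed 64-bit integer.
// Negative inputs are conservatively rejected.
constexpr bool
toy_square_fits_int64 (std::int64_t v)
{
  return v == 0
         || (v > 0 && v <= std::numeric_limits<std::int64_t>::max () / v);
}
```

 A count of 2^32 already fails the check, since its square is 2^64.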


File: gccint.info,  Node: Maintaining the CFG,  Next: Liveness information,  Prev: Profile information,  Up: Control Flow

15.4 Maintaining the CFG
========================

An important task of each compiler pass is to keep both the control flow
graph and all profile information up-to-date.  Reconstruction of the
control flow graph after each pass is not an option, since it may be
very expensive and lost profile information cannot be reconstructed at
all.

 GCC has two major intermediate representations, and both use the
‘basic_block’ and ‘edge’ data types to represent control flow.  Both
representations share as much of the CFG maintenance code as possible.
For each representation, a set of “hooks” is defined so that each
representation can provide its own implementation of CFG manipulation
routines when necessary.  These hooks are defined in ‘cfghooks.h’.
There are hooks for almost all common CFG manipulations, including block
splitting and merging, edge redirection and creating and deleting basic
blocks.  These hooks should provide everything you need to maintain and
manipulate the CFG in both the RTL and ‘GIMPLE’ representation.

 At the moment, the basic block boundaries are maintained transparently
when modifying instructions, so there rarely is a need to move them
manually (such as in case someone wants to output an instruction outside
a basic block explicitly).

 In the RTL representation, each instruction has a ‘BLOCK_FOR_INSN’
value that is a pointer to the basic block that contains the
instruction.  In the ‘GIMPLE’ representation, the function ‘gimple_bb’
returns a pointer to the basic block containing the queried statement.

 When changes need to be applied to a function in its ‘GIMPLE’
representation, “GIMPLE statement iterators” should be used.  These
iterators provide an integrated abstraction of the flow graph and the
instruction stream.  Block statement iterators are constructed using the
‘gimple_stmt_iterator’ data structure and several modifiers are
available, including the following:

‘gsi_start’
     This function initializes a ‘gimple_stmt_iterator’ that points to
     the first non-empty statement in a basic block.

‘gsi_last’
     This function initializes a ‘gimple_stmt_iterator’ that points to
     the last statement in a basic block.

‘gsi_end_p’
     This predicate is ‘true’ if a ‘gimple_stmt_iterator’ represents the
     end of a basic block.

‘gsi_next’
     This function takes a ‘gimple_stmt_iterator’ and makes it point to
     its successor.

‘gsi_prev’
     This function takes a ‘gimple_stmt_iterator’ and makes it point to
     its predecessor.

‘gsi_insert_after’
     This function inserts a statement after the ‘gimple_stmt_iterator’
     passed in.  The final parameter determines whether the statement
     iterator is updated to point to the newly inserted statement, or
     left pointing to the original statement.

‘gsi_insert_before’
     This function inserts a statement before the ‘gimple_stmt_iterator’
     passed in.  The final parameter determines whether the statement
     iterator is updated to point to the newly inserted statement, or
     left pointing to the original statement.

‘gsi_remove’
     This function removes the statement that the ‘gimple_stmt_iterator’
     passed in points to and rechains the remaining statements in a
     basic block, if any.

 In the RTL representation, the macros ‘BB_HEAD’ and ‘BB_END’ may be
used to get the head and end ‘rtx’ of a basic block.  No abstract
iterators are defined for traversing the insn chain, but you can just
use ‘NEXT_INSN’ and ‘PREV_INSN’ instead.  *Note Insns::.

 Usually a code manipulating pass simplifies the instruction stream and
the flow of control, possibly eliminating some edges.  This may for
example happen when a conditional jump is replaced with an unconditional
jump.  Updating of edges is not transparent and each optimization pass
is required to do so manually.  However, only a few cases occur in
practice.  The pass may call ‘purge_dead_edges’ on a given basic block
to remove superfluous edges, if any.

 Another common scenario is redirection of branch instructions, but this
is best modeled as redirection of edges in the control flow graph and
thus use of ‘redirect_edge_and_branch’ is preferred over more low level
functions, such as ‘redirect_jump’ that operate on RTL chain only.  The
CFG hooks defined in ‘cfghooks.h’ should provide the complete API
required for manipulating and maintaining the CFG.

 It is also possible that a pass has to insert a control flow instruction
into the middle of a basic block, thus creating an entry point in the
middle of the basic block, which is impossible by definition: The block
must be split to make sure it only has one entry point, i.e. the head of
the basic block.  The CFG hook ‘split_block’ may be used when an
instruction in the middle of a basic block has to become the target of a
jump or branch instruction.

 For a global optimizer, a common operation is to split edges in the
flow graph and insert instructions on them.  In the RTL representation,
this can be easily done using the ‘insert_insn_on_edge’ function that
emits an instruction "on the edge", caching it for a later
‘commit_edge_insertions’ call that will take care of moving the inserted
instructions off the edge into the instruction stream contained in a
basic block.  This includes the creation of new basic blocks where
needed.  In the ‘GIMPLE’ representation, the equivalent functions are
‘gsi_insert_on_edge’ which inserts a statement on an edge, and
‘gsi_commit_edge_inserts’ which flushes the pending statements to the
actual instruction stream.
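
 The two-phase scheme (queue on the edge, then commit) can be modeled
with a short sketch.  Everything below is hypothetical: the ‘toy_’ types
and functions are not GCC's, and the placement policy is a simplified
assumption (start of the destination if it has a single predecessor,
else end of the source if it has a single successor, else a new block
splitting the edge).  GCC's own routines handle many more cases.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct toy_block
{
  std::vector<std::string> insns;
};

struct toy_edge
{
  int src, dest;
  std::vector<std::string> pending;   // insns queued "on the edge"
};

struct toy_cfg
{
  std::vector<toy_block> blocks;
  std::vector<toy_edge> edges;

  int count_preds (int bb) const
  {
    int n = 0;
    for (const toy_edge &e : edges)
      n += (e.dest == bb);
    return n;
  }

  int count_succs (int bb) const
  {
    int n = 0;
    for (const toy_edge &e : edges)
      n += (e.src == bb);
    return n;
  }
};

// Queue INSN on edge E; nothing is placed in a block yet.
void
toy_insert_on_edge (toy_edge &e, const std::string &insn)
{
  e.pending.push_back (insn);
}

// Move all queued insns into blocks, splitting edges where needed.
void
toy_commit_edge_insertions (toy_cfg &cfg)
{
  // Only walk the edges that existed on entry; splitting appends more.
  const std::size_t n = cfg.edges.size ();
  for (std::size_t i = 0; i < n; ++i)
    {
      if (cfg.edges[i].pending.empty ())
        continue;
      const std::vector<std::string> insns = cfg.edges[i].pending;
      cfg.edges[i].pending.clear ();
      const int src = cfg.edges[i].src;
      const int dest = cfg.edges[i].dest;

      if (cfg.count_preds (dest) == 1)
        {
          std::vector<std::string> &v = cfg.blocks[dest].insns;
          v.insert (v.begin (), insns.begin (), insns.end ());
        }
      else if (cfg.count_succs (src) == 1)
        {
          std::vector<std::string> &v = cfg.blocks[src].insns;
          v.insert (v.end (), insns.begin (), insns.end ());
        }
      else
        {
          // Neither end is safe: create a new block on the edge.
          const int mid = static_cast<int> (cfg.blocks.size ());
          cfg.blocks.push_back (toy_block{insns});
          cfg.edges[i].dest = mid;
          cfg.edges.push_back (toy_edge{mid, dest, {}});
        }
    }
}
```

 Deferring the placement lets a pass queue insertions on many edges
without the block structure changing underneath it while it is still
analyzing the graph.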

 While debugging the optimization pass, the ‘verify_flow_info’ function
may be useful to find bugs in the control flow graph updating code.


File: gccint.info,  Node: Liveness information,  Prev: Maintaining the CFG,  Up: Control Flow

15.5 Liveness information
=========================

Liveness information is useful to determine whether some register is
"live" at a given point in the program, i.e. that it contains a value that may
be used at a later point in the program.  This information is used, for
instance, during register allocation, as the pseudo registers only need
to be assigned to a unique hard register or to a stack slot if they are
live.  The hard registers and stack slots may be freely reused for other
values when a register is dead.

 Liveness information is available in the back end starting with
‘pass_df_initialize’ and ending with ‘pass_df_finish’.  Three flavors of
live analysis are available: With ‘LR’, it is possible to determine at
any point ‘P’ in the function if the register may be used on some path
from ‘P’ to the end of the function.  With ‘UR’, it is possible to
determine if there is a path from the beginning of the function to ‘P’
that defines the variable.  ‘LIVE’ is the intersection of the ‘LR’ and
‘UR’ and a variable is live at ‘P’ if there is both an assignment that
reaches it from the beginning of the function and a use that can be
reached on some path from ‘P’ to the end of the function.
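
 The relationship between the three problems can be sketched on a toy
CFG.  The code below is an illustration under simplifying assumptions
(block-level ‘def’ and ‘use’ sets, no clobbers, a naive fixed-point
loop); the ‘toy_’ names are hypothetical and this is not how the
dataflow framework in GCC is implemented.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t TOY_NREGS = 8;
using toy_regset = std::bitset<TOY_NREGS>;

struct toy_block
{
  std::vector<int> preds, succs;
  toy_regset def;   // registers assigned in the block
  toy_regset use;   // registers read before any assignment in the block
  toy_regset lr_in, lr_out, ur_in, ur_out;
};

// Iterate both problems until nothing changes.
void
toy_analyze (std::vector<toy_block> &bbs)
{
  bool changed = true;
  while (changed)
    {
      changed = false;
      for (toy_block &bb : bbs)
        {
          // LR is a backward problem: a register is in LR if it may
          // be used on some path from here to the end.
          toy_regset lr_out;
          for (int s : bb.succs)
            lr_out |= bbs[s].lr_in;
          toy_regset lr_in = bb.use | (lr_out & ~bb.def);

          // UR is a forward problem: a register is in UR if some
          // path from the start to here assigns it.
          toy_regset ur_in;
          for (int p : bb.preds)
            ur_in |= bbs[p].ur_out;
          toy_regset ur_out = ur_in | bb.def;

          if (lr_in != bb.lr_in || lr_out != bb.lr_out
              || ur_in != bb.ur_in || ur_out != bb.ur_out)
            changed = true;
          bb.lr_in = lr_in;
          bb.lr_out = lr_out;
          bb.ur_in = ur_in;
          bb.ur_out = ur_out;
        }
    }
}

// LIVE at block entry is the intersection of LR and UR.
toy_regset
toy_live_in (const toy_block &bb)
{
  return bb.lr_in & bb.ur_in;
}
```

 A register that is read but never assigned on any incoming path shows
up in LR but not in LIVE, which is one way the intersection differs
from LR alone.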

 In general ‘LIVE’ is the most useful of the three.  The macros
‘DF_[LR,UR,LIVE]_[IN,OUT]’ can be used to access this information.  The
macros take a basic block number and return a bitmap that is indexed by
the register number.  This information is only guaranteed to be up to
date after calls are made to ‘df_analyze’.  See the file ‘df-core.cc’
for details on using the dataflow.

 The liveness information is stored partly in the RTL instruction stream
and partly in the flow graph.  Local information is stored in the
instruction stream: Each instruction may contain ‘REG_DEAD’ notes
representing that the value of a given register is no longer needed, or
‘REG_UNUSED’ notes representing that the value computed by the
instruction is never used.  The second is useful for instructions
computing multiple values at once.


File: gccint.info,  Node: Loop Analysis and Representation,  Next: Machine Desc,  Prev: Control Flow,  Up: Top

16 Analysis and Representation of Loops
***************************************

GCC provides extensive infrastructure for working with natural loops,
i.e., strongly connected components of the CFG with only one entry
block.  This chapter describes the representation of loops in GCC, both
on GIMPLE and in RTL, as well as the interfaces to loop-related analyses
(induction variable analysis and number of iterations analysis).

* Menu:

* Loop representation::         Representation and analysis of loops.
* Loop querying::               Getting information about loops.
* Loop manipulation::           Loop manipulation functions.
* LCSSA::                       Loop-closed SSA form.
* Scalar evolutions::           Induction variables on GIMPLE.
* loop-iv::                     Induction variables on RTL.
* Number of iterations::        Number of iterations analysis.
* Dependency analysis::         Data dependency analysis.


File: gccint.info,  Node: Loop representation,  Next: Loop querying,  Up: Loop Analysis and Representation

16.1 Loop representation
========================

This section describes the representation of loops in GCC, and functions
that can be used to build, modify and analyze this representation.  Most
of the interfaces and data structures are declared in ‘cfgloop.h’.  Loop
structures are analyzed, and this information is disposed of or updated,
at the discretion of individual passes.  Still, most of the generic CFG
manipulation routines are aware of loop structures and try to keep them
up-to-date.  By this means an increasing part of the compilation
pipeline is set up to maintain loop structure across passes to allow
attaching meta information to individual loops for consumption by later
passes.

 In general, a natural loop has one entry block (header) and possibly
several back edges (latches) leading to the header from the inside of
the loop.  Loops with several latches may appear if several loops share
a single header, or if there is a branching in the middle of the loop.
The representation of loops in GCC however allows only loops with a
single latch.  During loop analysis, headers of such loops are split and
forwarder blocks are created in order to disambiguate their structures.
A heuristic based on profile information and the structure of the
induction variables in the loops is used to determine whether the
latches correspond to sub-loops or to control flow in a single loop.
This means that the analysis sometimes changes the CFG, and if you run
it in the middle of an optimization pass, you must be able to deal with
the new blocks.  You may avoid CFG changes by passing the
‘LOOPS_MAY_HAVE_MULTIPLE_LATCHES’ flag to the loop discovery.  Note,
however, that most other loop manipulation functions will not work
correctly for loops with multiple latch edges (the functions that only
query membership of blocks to loops and subloop relationships, or
enumerate and test loop exits, can be expected to work).

 The body of a loop is the set of blocks that are dominated by its
header and reachable from its latch against the direction of edges in
the CFG.  The loops are organized in a containment hierarchy (tree) such
that all the loops immediately contained inside loop L are the children
of L in the tree.  This tree is represented by the ‘struct loops’
structure.  The
root of this tree is a fake loop that contains all blocks in the
function.  Each of the loops is represented in a ‘struct loop’
structure.  Each loop is assigned an index (‘num’ field of the ‘struct
loop’ structure), and the pointer to the loop is stored in the
corresponding field of the ‘larray’ vector in the loops structure.  The
indices do not have to be contiguous; there may be empty (‘NULL’)
entries in the ‘larray’ created by deleting loops.  Also, there is no
guarantee on the relative order of a loop and its subloops in the
numbering.  The index of a loop never changes.
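
 The definition of a loop body can be sketched outside of GCC as a
backward walk from the latch that stops at the header.  This is only an
illustration: ‘toy_loop_body’ and ‘toy_example_preds’ are hypothetical
names, blocks are plain integers, and the sketch assumes that the header
dominates the latch (so the walk cannot leave the loop).  It does not
use the ‘cfgloop.h’ interfaces.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// preds[b] lists the predecessors of block b.  Assumes the header
// dominates the latch, so the backward walk cannot escape the loop.
std::set<int> toy_loop_body(const std::vector<std::vector<int>> &preds,
                            int header, int latch) {
  const int n = static_cast<int>(preds.size());
  assert(header >= 0 && header < n && latch >= 0 && latch < n);
  std::set<int> body = {header};  // the walk stops at the header
  std::vector<int> work;
  if (body.insert(latch).second) work.push_back(latch);
  while (!work.empty()) {
    const int b = work.back();
    work.pop_back();
    // Follow edges against their direction.
    for (int p : preds[b])
      if (body.insert(p).second) work.push_back(p);
  }
  return body;
}

// Edges: 0->1, 1->2, 2->3, 3->1 (back edge), 2->4 (exit).
std::vector<std::vector<int>> toy_example_preds() {
  return {{}, {0, 3}, {1}, {2}, {2}};
}
```

 For the example CFG with header 1 and latch 3, the computed body is the
set of blocks 1, 2 and 3; the entry block 0 and the exit block 4 are not
part of the loop.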

 The entries of the ‘larray’ field should not be accessed directly.  The
function ‘get_loop’ returns the loop description for a loop with the
given index.  The ‘number_of_loops’ function returns the number of loops
in the function.  To traverse all loops, use a range-based for loop with
an instance of class ‘loops_list’.  The ‘flags’ argument passed to the
constructor of class ‘loops_list’ determines the direction of traversal
and the set of loops visited.  Each loop is guaranteed to be visited
exactly once, regardless of changes to the loop tree, and loops may be
removed during the traversal.  Newly created loops are never traversed;
if they need to be visited, this must be done separately after their
creation.

 Each basic block contains the reference to the innermost loop it
belongs to (‘loop_father’).  For this reason, it is only possible to
have one ‘struct loops’ structure initialized at the same time for each
CFG.  The global variable ‘current_loops’ contains the ‘struct loops’
structure.  Many of the loop manipulation functions assume that
dominance information is up-to-date.

 The loops are analyzed through the ‘loop_optimizer_init’ function.  The
argument of this function is a set of flags represented in an integer
bitmask.  These flags specify what other properties of the loop
structures should be calculated/enforced and preserved later:

   • ‘LOOPS_MAY_HAVE_MULTIPLE_LATCHES’: If this flag is set, no changes
     to CFG will be performed in the loop analysis, in particular, loops
     with multiple latch edges will not be disambiguated.  If a loop has
     multiple latches, its latch block is set to ‘NULL’.  Most of the
     loop manipulation functions will not work for loops in this shape.
     No other flags that require CFG changes can be passed to
     ‘loop_optimizer_init’.
   • ‘LOOPS_HAVE_PREHEADERS’: Forwarder blocks are created in such a way
     that each loop has only one entry edge, and additionally, the
     source block of this entry edge has only one successor.  This
     creates a natural place where the code can be moved out of the
     loop, and ensures that the entry edge of the loop leads from its
     immediate super-loop.
   • ‘LOOPS_HAVE_SIMPLE_LATCHES’: Forwarder blocks are created to force
     the latch block of each loop to have only one successor.  This
     ensures that the latch of the loop does not belong to any of its
     sub-loops, and makes manipulation with the loops significantly
     easier.  Most of the loop manipulation functions assume that the
     loops are in this shape.  Note that with this flag, the "normal"
     loop without any control flow inside and with one exit consists of
     two basic blocks.
   • ‘LOOPS_HAVE_MARKED_IRREDUCIBLE_REGIONS’: Basic blocks and edges in
     the strongly connected components that are not natural loops (have
     more than one entry block) are marked with ‘BB_IRREDUCIBLE_LOOP’
     and ‘EDGE_IRREDUCIBLE_LOOP’ flags.  The flag is not set for blocks
     and edges that belong to natural loops that are in such an
     irreducible region (but it is set for the entry and exit edges of
     such a loop, if they lead to/from this region).
   • ‘LOOPS_HAVE_RECORDED_EXITS’: The lists of exits are recorded and
     updated for each loop.  This makes some functions (e.g.,
     ‘get_loop_exit_edges’) more efficient.  Some functions (e.g.,
     ‘single_exit’) can be used only if the lists of exits are recorded.

 These properties may also be computed/enforced later, using functions
‘create_preheaders’, ‘force_single_succ_latches’,
‘mark_irreducible_loops’ and ‘record_loop_exits’.  The properties can be
queried using ‘loops_state_satisfies_p’.
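
 The flag handling can be pictured as an ordinary bitmask.  The sketch
below is a stand-alone model, not GCC code: the ‘TOY_’ enumerators, their
numeric values, ‘ToyLoopsState’, ‘toy_state_set’ and
‘toy_state_satisfies_p’ are all hypothetical, and the real enumerators
and query function are the ones declared in ‘cfgloop.h’.

```cpp
#include <cassert>

// Hypothetical flag values for this sketch; the real enumerators live in
// cfgloop.h and their numeric values are not assumed here.
enum ToyLoopFlags : unsigned {
  TOY_HAVE_PREHEADERS = 1u << 0,
  TOY_HAVE_SIMPLE_LATCHES = 1u << 1,
  TOY_HAVE_MARKED_IRREDUCIBLE_REGIONS = 1u << 2,
  TOY_HAVE_RECORDED_EXITS = 1u << 3,
  TOY_MAY_HAVE_MULTIPLE_LATCHES = 1u << 4
};

struct ToyLoopsState {
  unsigned state = 0;  // properties currently known to hold
};

// True only if every requested property is currently recorded.
bool toy_state_satisfies_p(const ToyLoopsState &s, unsigned flags) {
  return (s.state & flags) == flags;
}

// Records that the given properties have been computed/enforced.
void toy_state_set(ToyLoopsState &s, unsigned flags) {
  assert(flags != 0);
  s.state |= flags;
}
```

 A query for several properties at once succeeds only when all of them
are recorded, which is why a property enforced later has to be added to
the state before a pass may rely on it.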

 The memory occupied by the loops structures should be freed with the
‘loop_optimizer_finalize’ function.  When loop structures are set up to
be preserved across passes, this function reduces the information to be
kept up-to-date to a minimum (only ‘LOOPS_MAY_HAVE_MULTIPLE_LATCHES’
set).

 The CFG manipulation functions in general do not update loop
structures.  Specialized versions that additionally do so are provided
for the most common tasks.  On GIMPLE, the ‘cleanup_tree_cfg_loop’
function can be used to clean up the CFG while updating the loops
structures if ‘current_loops’ is set.

 At the moment loop structure is preserved from the start of GIMPLE loop
optimizations until the end of RTL loop optimizations.  During this time
a loop can be tracked by its ‘struct loop’ and number.


File: gccint.info,  Node: Loop querying,  Next: Loop manipulation,  Prev: Loop representation,  Up: Loop Analysis and Representation

16.2 Loop querying
==================

The functions to query the information about loops are declared in
‘cfgloop.h’.  Some of the information can be taken directly from the
structures.  The ‘loop_father’ field of each basic block contains the
innermost loop to which the block belongs.  The most useful fields of
the loop structure (that are kept up-to-date at all times) are:

   • ‘header’, ‘latch’: Header and latch basic blocks of the loop.
   • ‘num_nodes’: Number of basic blocks in the loop (including the
     basic blocks of the sub-loops).
   • ‘outer’, ‘inner’, ‘next’: The super-loop, the first sub-loop, and
     the sibling of the loop in the loops tree.

 There are other fields in the loop structures, many of them used only
by some of the passes, or not updated during CFG changes; in general,
they should not be accessed directly.

 The most important functions to query loop structures are:

   • ‘loop_depth’: The depth of the loop in the loops tree, i.e., the
     number of super-loops of the loop.
   • ‘flow_loops_dump’: Dumps the information about loops to a file.
   • ‘verify_loop_structure’: Checks consistency of the loop structures.
   • ‘loop_latch_edge’: Returns the latch edge of a loop.
   • ‘loop_preheader_edge’: If loops have preheaders, returns the
     preheader edge of a loop.
   • ‘flow_loop_nested_p’: Tests whether loop is a sub-loop of another
     loop.
   • ‘flow_bb_inside_loop_p’: Tests whether a basic block belongs to a
     loop (including its sub-loops).
   • ‘find_common_loop’: Finds the common super-loop of two loops.
   • ‘superloop_at_depth’: Returns the super-loop of a loop with the
     given depth.
   • ‘tree_num_loop_insns’, ‘num_loop_insns’: Estimates the number of
     insns in the loop, on GIMPLE and on RTL.
   • ‘loop_exit_edge_p’: Tests whether edge is an exit from a loop.
   • ‘mark_loop_exit_edges’: Marks all exit edges of all loops with
     ‘EDGE_LOOP_EXIT’ flag.
   • ‘get_loop_body’, ‘get_loop_body_in_dom_order’,
     ‘get_loop_body_in_bfs_order’: Enumerates the basic blocks in the
     loop in depth-first search order in reversed CFG, ordered by
     dominance relation, and breadth-first search order, respectively.
   • ‘single_exit’: Returns the single exit edge of the loop, or ‘NULL’
     if the loop has more than one exit.  You can only use this function
     if ‘LOOPS_HAVE_RECORDED_EXITS’ is used.
   • ‘get_loop_exit_edges’: Enumerates the exit edges of a loop.
   • ‘just_once_each_iteration_p’: Returns true if the basic block is
     executed exactly once during each iteration of a loop (that is, it
     does not belong to a sub-loop, and it dominates the latch of the
     loop).
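
 The tree-shaped queries above can be modeled with three links per
loop.  The following sketch is a toy model, not the ‘struct loop’ from
‘cfgloop.h’: ‘ToyLoop’, ‘toy_add_child’, ‘toy_loop_depth’,
‘toy_nested_p’ and ‘toy_find_common_loop’ are hypothetical names that
only mirror the roles of the ‘outer’, ‘inner’ and ‘next’ fields and of
‘loop_depth’, ‘flow_loop_nested_p’ and ‘find_common_loop’.  Treating
nesting as strict (a loop is not nested in itself) is a choice made for
this sketch.

```cpp
#include <cassert>

struct ToyLoop {
  int num;
  ToyLoop *outer;  // super-loop, null for the root
  ToyLoop *inner;  // first sub-loop
  ToyLoop *next;   // next sibling
  explicit ToyLoop(int n)
      : num(n), outer(nullptr), inner(nullptr), next(nullptr) {}
};

// Links CHILD as the first sub-loop of PARENT.
void toy_add_child(ToyLoop &parent, ToyLoop &child) {
  assert(child.outer == nullptr && &parent != &child);
  child.outer = &parent;
  child.next = parent.inner;
  parent.inner = &child;
}

// Number of super-loops of LOOP; the root has depth 0.
int toy_loop_depth(const ToyLoop *loop) {
  int depth = 0;
  for (loop = loop->outer; loop != nullptr; loop = loop->outer) ++depth;
  return depth;
}

// True if LOOP is a (strict) sub-loop of OUTER.
bool toy_nested_p(const ToyLoop *outer, const ToyLoop *loop) {
  for (loop = loop->outer; loop != nullptr; loop = loop->outer)
    if (loop == outer) return true;
  return false;
}

// Innermost loop containing both A and B (both must be in one tree).
const ToyLoop *toy_find_common_loop(const ToyLoop *a, const ToyLoop *b) {
  int da = toy_loop_depth(a), db = toy_loop_depth(b);
  for (; da > db; --da) a = a->outer;
  for (; db > da; --db) b = b->outer;
  while (a != b) {
    a = a->outer;
    b = b->outer;
  }
  return a;
}
```

 Walking to equal depth first and then climbing in lockstep keeps the
common-ancestor search linear in the depth of the tree.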


File: gccint.info,  Node: Loop manipulation,  Next: LCSSA,  Prev: Loop querying,  Up: Loop Analysis and Representation

16.3 Loop manipulation
======================

The loops tree can be manipulated using the following functions:

   • ‘flow_loop_tree_node_add’: Adds a node to the tree.
   • ‘flow_loop_tree_node_remove’: Removes a node from the tree.
   • ‘add_bb_to_loop’: Adds a basic block to a loop.
   • ‘remove_bb_from_loops’: Removes a basic block from loops.

 Most low-level CFG functions update loops automatically.  The following
functions handle some more complicated cases of CFG manipulations:

   • ‘remove_path’: Removes an edge and all blocks it dominates.
   • ‘split_loop_exit_edge’: Splits exit edge of the loop, ensuring that
     PHI node arguments remain in the loop (this ensures that
     loop-closed SSA form is preserved).  Only useful on GIMPLE.

 Finally, there are some higher-level loop transformations implemented.
While some of them are written so that they should work on non-innermost
loops, they are mostly untested in that case, and at the moment, they
are only reliable for the innermost loops:

   • ‘create_iv’: Creates a new induction variable.  Only works on
     GIMPLE.  ‘standard_iv_increment_position’ can be used to find a
     suitable place for the iv increment.
   • ‘duplicate_loop_body_to_header_edge’,
     ‘tree_duplicate_loop_body_to_header_edge’: These functions (on RTL
     and on GIMPLE) duplicate the body of the loop prescribed number of
     times on one of the edges entering loop header, thus performing
     either loop unrolling or loop peeling.  ‘can_duplicate_loop_p’
     (‘can_unroll_loop_p’ on GIMPLE) must be true for the duplicated
     loop.
   • ‘loop_version’: This function creates a copy of a loop, and a
     branch before them that selects one of them depending on the
     prescribed condition.  This is useful for optimizations that need
     to verify some assumptions in runtime (one of the copies of the
     loop is usually left unchanged, while the other one is transformed
     in some way).
   • ‘tree_unroll_loop’: Unrolls the loop, including peeling the extra
     iterations to make the number of iterations divisible by unroll
     factor, updating the exit condition, and removing the exits that
     now cannot be taken.  Works only on GIMPLE.


File: gccint.info,  Node: LCSSA,  Next: Scalar evolutions,  Prev: Loop manipulation,  Up: Loop Analysis and Representation

16.4 Loop-closed SSA form
=========================

Throughout the loop optimizations on tree level, one extra condition is
enforced on the SSA form: No SSA name is used outside of the loop in
which it is defined.  The SSA form satisfying this condition is called
"loop-closed SSA form" - LCSSA.  To enforce LCSSA, PHI nodes must be
created at the exits of the loops for the SSA names that are used
outside of them.  Only the real operands (not virtual SSA names) are
held in LCSSA, in order to save memory.

 There are various benefits of LCSSA:

   • Many optimizations (value range analysis, final value replacement)
     are interested in the values that are defined in the loop and used
     outside of it, i.e., exactly those for which we create new PHI
     nodes.
   • In induction variable analysis, it is not necessary to specify the
     loop in which the analysis should be performed - the scalar
     evolution analysis always returns the results with respect to the
     loop in which the SSA name is defined.
   • It makes updating of SSA form during loop transformations simpler.
     Without LCSSA, operations like loop unrolling may force creation of
     PHI nodes arbitrarily far from the loop, while in LCSSA, the SSA
     form can be updated locally.  However, since we only keep real
     operands in LCSSA, we cannot use this advantage (we could have
     local updating of real operands, but it is not much more efficient
     than to use generic SSA form updating for it as well; the amount of
     changes to SSA is the same).

 However, it also means LCSSA must be updated.  This is usually
straightforward, unless you create a new value in a loop and use it
outside, or unless you manipulate loop exit edges (functions are
provided to make these manipulations simple).
‘rewrite_into_loop_closed_ssa’ is used to rewrite SSA form to LCSSA, and
‘verify_loop_closed_ssa’ to check that the invariant of LCSSA is
preserved.
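
 The LCSSA condition itself is easy to state as a check over uses.  The
sketch below is a stand-alone model and not the implementation of
‘verify_loop_closed_ssa’: ‘ToyUse’, ‘toy_loop_contains’ and
‘toy_lcssa_violations’ are hypothetical names, loops and SSA names are
plain indices, and the sketch assumes that an argument of a PHI node at
a loop exit is recorded as a use in the loop of the corresponding
predecessor block.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// parent[l] is the loop immediately enclosing loop l; loop 0 is the
// root covering the whole function, with parent[0] == -1.
bool toy_loop_contains(const std::vector<int> &parent, int outer, int l) {
  for (; l != -1; l = parent[l])
    if (l == outer) return true;
  return false;
}

struct ToyUse {
  int name;  // index of the SSA name being used
  int loop;  // innermost loop containing the use
};

// Returns the indices of uses that break the loop-closed condition,
// i.e. uses located outside of the loop that defines the name.
std::vector<std::size_t> toy_lcssa_violations(
    const std::vector<int> &parent, const std::vector<int> &def_loop,
    const std::vector<ToyUse> &uses) {
  std::vector<std::size_t> bad;
  for (std::size_t i = 0; i < uses.size(); ++i) {
    const ToyUse &u = uses[i];
    assert(u.name >= 0 && static_cast<std::size_t>(u.name) < def_loop.size());
    if (!toy_loop_contains(parent, def_loop[u.name], u.loop)) bad.push_back(i);
  }
  return bad;
}
```

 A use inside a sub-loop of the defining loop is fine; only uses in an
enclosing or unrelated loop are reported, and those are the places where
a PHI node at the loop exit would be needed.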


File: gccint.info,  Node: Scalar evolutions,  Next: loop-iv,  Prev: LCSSA,  Up: Loop Analysis and Representation

16.5 Scalar evolutions
======================

Scalar evolutions (SCEV) are used to represent results of induction
variable analysis on GIMPLE.  They enable us to represent variables with
complicated behavior in a simple and consistent way (we only use it to
express values of polynomial induction variables, but it is possible to
extend it).  The interfaces to SCEV analysis are declared in
‘tree-scalar-evolution.h’.  To use scalar evolutions analysis,
‘scev_initialize’ must be used.  To stop using SCEV, ‘scev_finalize’
should be used.  SCEV analysis caches results in order to save time and
memory.  This cache however is made invalid by most of the loop
transformations, including removal of code.  If such a transformation is
performed, ‘scev_reset’ must be called to clean the caches.

 Given an SSA name, its behavior in loops can be analyzed using the
‘analyze_scalar_evolution’ function.  The returned SCEV however does not
have to be fully analyzed and it may contain references to other SSA
names defined in the loop.  To resolve these (potentially recursive)
references, ‘instantiate_parameters’ or ‘resolve_mixers’ functions must
be used.  ‘instantiate_parameters’ is useful when you use the results of
SCEV only for some analysis, and when you work with a whole nest of
loops at once.  It will try replacing all SSA names by their SCEV in all
loops, including the super-loops of the current loop, thus providing
complete information about the behavior of the variable in the loop
nest.  ‘resolve_mixers’ is useful if you work with only one loop at a
time, and if you possibly need to create code based on the value of the
induction variable.  It will only resolve the SSA names defined in the
current loop, leaving the SSA names defined outside unchanged, even if
their evolution in the outer loops is known.

 The SCEV is a normal tree expression, except for the fact that it may
contain several special tree nodes.  One of them is ‘SCEV_NOT_KNOWN’,
used for SSA names whose value cannot be expressed.  The other one is
‘POLYNOMIAL_CHREC’.  Polynomial chrec has three arguments - base, step
and loop (both base and step may contain further polynomial chrecs).
Type of the expression and of base and step must be the same.  A
variable has evolution ‘POLYNOMIAL_CHREC(base, step, loop)’ if it is (in
the specified loop) equivalent to ‘x_1’ in the following example

     while (...)
       {
         x_1 = phi (base, x_2);
         x_2 = x_1 + step;
       }

 Note that this includes the language restrictions on the operations.
For example, if we compile C code and ‘x’ has signed type, then the
overflow in addition would cause undefined behavior, and we may assume
that this does not happen.  Hence, the value with this SCEV cannot
overflow (which restricts the number of iterations of such a loop).
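
 The meaning of a polynomial chrec can be checked numerically.  The
sketch below is a toy model, not the tree representation used by GCC:
‘ToyAffineChrec’, ‘ToyQuadChrec’, ‘toy_apply’ and ‘toy_simulate_chrec’
are hypothetical names, values are plain integers, and overflow is
assumed not to occur.  The second structure models the case where the
step is itself an affine chrec in the same loop.

```cpp
#include <cassert>

// {base, +, step}: affine evolution in one loop.
struct ToyAffineChrec {
  long long base, step;
};

// {base, +, {step0, +, step1}}: the step is itself an affine chrec
// in the same loop.
struct ToyQuadChrec {
  long long base, step0, step1;
};

// Value of the variable at the start of iteration i (i counts from 0).
long long toy_apply(const ToyAffineChrec &c, long long i) {
  assert(i >= 0);
  return c.base + i * c.step;
}

long long toy_apply(const ToyQuadChrec &c, long long i) {
  assert(i >= 0);
  // base + step0 * C(i,1) + step1 * C(i,2)
  return c.base + i * c.step0 + (i * (i - 1) / 2) * c.step1;
}

// Runs the loop from the text i times and returns x_1 at the start of
// iteration i, updating the step after each iteration.
long long toy_simulate_chrec(const ToyQuadChrec &c, long long i) {
  long long x_1 = c.base, step = c.step0;
  for (long long k = 0; k < i; ++k) {
    const long long x_2 = x_1 + step;
    x_1 = x_2;        // the phi picks x_2 on the back edge
    step += c.step1;  // evolution of the step itself
  }
  return x_1;
}
```

 With ‘step1’ equal to zero the simulation reduces to the affine loop
shown above, so the closed form and the loop agree for both structures.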

 In many cases, one wants to restrict the attention just to affine
induction variables.  The extra expressive power of SCEV is then not
useful, and may complicate the optimizations.  In this case, the
‘simple_iv’ function may be used to analyze a value - the result is a
loop-invariant base and step.


File: gccint.info,  Node: loop-iv,  Next: Number of iterations,  Prev: Scalar evolutions,  Up: Loop Analysis and Representation

16.6 IV analysis on RTL
=======================

The induction variable analysis on RTL is simple and only allows
analysis of affine induction variables, and only in one loop at once.
The interface is declared in ‘cfgloop.h’.  Before analyzing induction
variables in a loop L, the ‘iv_analysis_loop_init’ function must be
called on L.  After the analysis (possibly calling
‘iv_analysis_loop_init’ for several loops) is finished,
‘iv_analysis_done’ should be called.  The following functions
can be used to access the results of the analysis:

   • ‘iv_analyze’: Analyzes a single register used in the given insn.
     If no use of the register in this insn is found, the following
     insns are scanned, so that this function can be called on the insn
     returned by ‘get_condition’.
   • ‘iv_analyze_result’: Analyzes result of the assignment in the given
     insn.
   • ‘iv_analyze_expr’: Analyzes a more complicated expression.  All its
     operands are analyzed by ‘iv_analyze’, and hence they must be used
     in the specified insn or one of the following insns.

 The description of the induction variable is provided in ‘struct
rtx_iv’.  In order to handle subregs, the representation is a bit
complicated; if the value of the ‘extend’ field is not ‘UNKNOWN’, the
value of the induction variable in the i-th iteration is

     delta + mult * extend_{extend_mode} (subreg_{mode} (base + i * step)),

 with the following exception: if ‘first_special’ is true, then the
value in the first iteration (when ‘i’ is zero) is ‘delta + mult *
base’.  However, if ‘extend’ is equal to ‘UNKNOWN’, then ‘first_special’
must be false, ‘delta’ 0, ‘mult’ 1 and the value in the i-th iteration
is

     subreg_{mode} (base + i * step)

 The function ‘get_iv_value’ can be used to perform these calculations.
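
 The two formulas can be evaluated on plain integers to see the effect
of the subreg and of the extension.  The sketch below is a toy model,
not ‘struct rtx_iv’ or ‘get_iv_value’: ‘ToyRtxIv’, ‘ToyExtend’,
‘toy_subreg’, ‘toy_extend’ and ‘toy_iv_value’ are hypothetical names,
the inner mode is described only by its width in bits (at most 32 in
this sketch), and the extended mode is assumed to be 64 bits wide.

```cpp
#include <cassert>
#include <cstdint>

enum class ToyExtend { Unknown, Sign, Zero };

// Toy counterpart of the fields described above; extend_mode is assumed
// to be 64 bits wide and the inner mode at most 32 bits wide.
struct ToyRtxIv {
  std::int64_t base, step;
  unsigned mode_bits;
  ToyExtend extend;
  std::int64_t delta, mult;
  bool first_special;
};

// subreg_{mode}: keep only the low mode_bits bits.
std::uint64_t toy_subreg(std::int64_t v, unsigned bits) {
  assert(bits >= 1 && bits <= 32);
  return static_cast<std::uint64_t>(v) & ((std::uint64_t{1} << bits) - 1);
}

// extend_{extend_mode}: widen the low bits as signed or unsigned.
std::int64_t toy_extend(std::uint64_t low, unsigned bits, ToyExtend e) {
  const std::uint64_t sign_bit = std::uint64_t{1} << (bits - 1);
  if (e == ToyExtend::Sign && (low & sign_bit) != 0)
    return static_cast<std::int64_t>(low) -
           static_cast<std::int64_t>(sign_bit << 1);
  return static_cast<std::int64_t>(low);
}

// Value of the induction variable in the i-th iteration.
std::int64_t toy_iv_value(const ToyRtxIv &iv, std::int64_t i) {
  assert(i >= 0);
  if (iv.extend == ToyExtend::Unknown) {
    // Without an extension the other fields must be trivial.
    assert(!iv.first_special && iv.delta == 0 && iv.mult == 1);
    return static_cast<std::int64_t>(
        toy_subreg(iv.base + i * iv.step, iv.mode_bits));
  }
  if (iv.first_special && i == 0) return iv.delta + iv.mult * iv.base;
  const std::uint64_t low = toy_subreg(iv.base + i * iv.step, iv.mode_bits);
  return iv.delta + iv.mult * toy_extend(low, iv.mode_bits, iv.extend);
}
```

 With an 8-bit inner mode, ‘base’ 250 and ‘step’ 4, the inner value
wraps around after two iterations, and the sign- and zero-extended
variants differ as long as the top bit of the 8-bit value is set.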


File: gccint.info,  Node: Number of iterations,  Next: Dependency analysis,  Prev: loop-iv,  Up: Loop Analysis and Representation

16.7 Number of iterations analysis
==================================

Both on GIMPLE and on RTL, there are functions available to determine
the number of iterations of a loop, with a similar interface.  The
number of iterations of a loop in GCC is defined as the number of
executions of the loop latch.  In many cases, it is not possible to
determine the number of iterations unconditionally - the determined
number is correct only if some assumptions are satisfied.  The analysis
tries to verify these conditions using the information contained in the
program; if it fails, the conditions are returned together with the
result.  The following information and conditions are provided by the
analysis:

   • ‘assumptions’: If this condition is false, the rest of the
     information is invalid.
   • ‘noloop_assumptions’ on RTL, ‘may_be_zero’ on GIMPLE: If this
     condition is true, the loop exits in the first iteration.
   • ‘infinite’: If this condition is true, the loop is infinite.  This
     condition is only available on RTL.  On GIMPLE, conditions for
     finiteness of the loop are included in ‘assumptions’.
   • ‘niter_expr’ on RTL, ‘niter’ on GIMPLE: The expression that gives
     number of iterations.  The number of iterations is defined as the
     number of executions of the loop latch.

 Both on GIMPLE and on RTL, it is necessary for the induction variable
analysis framework to be initialized (SCEV on GIMPLE, loop-iv on RTL).
On GIMPLE, the results are stored in a ‘struct tree_niter_desc’
structure.  The number of iterations before the loop is exited through a
given exit can be determined using the ‘number_of_iterations_exit’
function.  On RTL, the results are returned in a ‘struct niter_desc’
structure.  The corresponding function is named ‘check_simple_exit’.
There are also functions that pass through all the exits of a loop and
try to find one with an easy to determine number of iterations -
‘find_loop_niter’ on GIMPLE and ‘find_simple_exit’ on RTL.  Finally,
there are functions that provide the same information, but additionally
cache it, so that repeated queries for the number of iterations are not
so costly - ‘number_of_latch_executions’ on GIMPLE and
‘get_simple_loop_desc’ on RTL.

 Note that some of these functions may behave slightly differently than
others - some of them return only the expression for the number of
iterations, and fail if there are some assumptions.  The function
‘number_of_latch_executions’ works only for single-exit loops.  The
function ‘number_of_cond_exit_executions’ can be used to determine
number of executions of the exit condition of a single-exit loop (i.e.,
the ‘number_of_latch_executions’ increased by one).
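
 The difference between latch executions and executions of the exit
condition can be seen on a simple counted loop.  The sketch below is a
toy model, not the GCC analysis: ‘ToyNiterDesc’, ‘toy_niter’,
‘toy_latch_executions’, ‘toy_cond_exit_executions’ and
‘toy_simulate_loop’ are hypothetical names, and the loop is assumed to
have the fixed shape ‘for (i = lo; i < hi; i++)’ with no overflow.

```cpp
#include <cassert>

// Toy descriptor for the loop "for (i = lo; i < hi; i++)", with step 1.
struct ToyNiterDesc {
  bool may_be_zero;  // the loop exits in the first iteration
  long long niter;   // meaningful only when may_be_zero is false
};

ToyNiterDesc toy_niter(long long lo, long long hi) {
  return {hi <= lo, hi - lo};
}

// Number of executions of the loop latch.
long long toy_latch_executions(const ToyNiterDesc &d) {
  if (d.may_be_zero) return 0;
  assert(d.niter > 0);
  return d.niter;
}

// Number of executions of the exit condition of this single-exit loop.
long long toy_cond_exit_executions(const ToyNiterDesc &d) {
  return toy_latch_executions(d) + 1;
}

struct ToyCounts {
  long long latch, exit_tests;
};

// Runs the loop and counts both events directly.
ToyCounts toy_simulate_loop(long long lo, long long hi) {
  ToyCounts c = {0, 0};
  for (long long i = lo;;) {
    ++c.exit_tests;  // header: evaluate the exit condition
    if (!(i < hi)) break;
    ++i;             // latch: increment and jump back to the header
    ++c.latch;
  }
  return c;
}
```

 The exit condition is evaluated once more than the latch runs, because
the final evaluation is the one that leaves the loop.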

 On GIMPLE, the constraint flags below affect the semantics of some APIs
of the number of iterations analyzer:

   • ‘LOOP_C_INFINITE’: If this constraint flag is set, the loop is
     known to be infinite.  APIs like ‘number_of_iterations_exit’ can
     return false directly without doing any analysis.
   • ‘LOOP_C_FINITE’: If this constraint flag is set, the loop is known
     to be finite; in other words, the loop's number of iterations can
     be computed with ‘assumptions’ being true.

 Generally, the constraint flags are set/cleared by consumers, which are
loop optimizers.  It is also the consumers' responsibility to set/clear
constraints correctly.  Failing to do that might result in hard to track
down bugs in scev/niter consumers.  One typical use case is the
vectorizer: it drives the number of iterations analyzer by setting
‘LOOP_C_FINITE’ and vectorizes a possibly infinite loop by versioning
the loop with the analysis result.  In return, constraints set by
consumers can also help the number of iterations analyzer in following
optimizers.  For example, ‘niter’ of a loop versioned under
‘assumptions’ is valid unconditionally.

 Other constraints may be added in the future, for example, a constraint
indicating that a loop's latch must roll, so that ‘may_be_zero’ would be
false unconditionally.


File: gccint.info,  Node: Dependency analysis,  Prev: Number of iterations,  Up: Loop Analysis and Representation

16.8 Data Dependency Analysis
=============================

The code for the data dependence analysis can be found in
‘tree-data-ref.cc’ and its interface and data structures are described
in ‘tree-data-ref.h’.  The function that computes the data dependences
for all the array and pointer references for a given loop is
‘compute_data_dependences_for_loop’.  This function is currently used by
the linear loop transform and the vectorization passes.  Before calling
this function, one has to allocate two vectors: a first vector will
contain the set of data references that are contained in the analyzed
loop body, and the second vector will contain the dependence relations
between the data references.  Thus if the vector of data references is
of size ‘n’, the vector containing the dependence relations will contain
‘n*n’ elements.  However if the analyzed loop contains side effects,
such as calls that potentially can interfere with the data references in
the current analyzed loop, the analysis stops while scanning the loop
body for data references, and inserts a single ‘chrec_dont_know’ in the
dependence relation array.

 The data references are discovered in a particular order during the
scanning of the loop body: the loop body is analyzed in execution order,
and the data references of each statement are pushed at the end of the
data reference array.  Two data references syntactically occur in the
program in the same order as in the array of data references.  This
syntactic order is important in some classical data dependence tests,
and mapping this order to the elements of this array avoids costly
queries to the loop body representation.

 Three types of data references are currently handled: ARRAY_REF,
INDIRECT_REF and COMPONENT_REF.  The data structure for the data
reference is ‘data_reference’, where ‘data_reference_p’ is a name of a
pointer to the data reference structure.  The structure contains the
following elements:

   • ‘base_object_info’: Provides information about the base object of
     the data reference and its access functions.  These access
     functions represent the evolution of the data reference in the loop
     relative to its base, in keeping with the classical meaning of the
     data reference access function for the support of arrays.  For
     example, for a reference ‘a.b[i][j]’, the base object is ‘a.b’ and
     the access functions, one for each array subscript, are: ‘{i_init,
     +, i_step}_1, {j_init, +, j_step}_2’.

   • ‘first_location_in_loop’: Provides information about the first
     location accessed by the data reference in the loop and about the
     access function used to represent evolution relative to this
     location.  This data is used to support pointers, and is not used
     for arrays (for which we have base objects).  Pointer accesses are
     represented as a one-dimensional access that starts from the first
     location accessed in the loop.  For example:

                for1 i
                   for2 j
                    *((int *)p + i + j) = a[i][j];

     The access function of the pointer access is ‘{0, +, 4B}_for2’
     relative to ‘p + i’.  The access functions of the array are
     ‘{i_init, +, i_step}_for1’ and ‘{j_init, +, j_step}_for2’ relative
     to ‘a’.

     Usually, the object the pointer refers to is either unknown, or we
     cannot prove that the access is confined to the boundaries of a
     certain object.

     Two data references can be compared only if at least one of these
     two representations has all its fields filled for both data
     references.

     The current strategy for data dependence tests is as follows: If
     both ‘a’ and ‘b’ are represented as arrays, compare ‘a.base_object’
     and ‘b.base_object’; if they are equal, apply dependence tests (use
     access functions based on base_objects).  Else if both ‘a’ and ‘b’
     are represented as pointers, compare ‘a.first_location’ and
     ‘b.first_location’; if they are equal, apply dependence tests (use
     access functions based on first location).  However, if ‘a’ and ‘b’
     are represented differently, only try to prove that the bases are
     definitely different.

   • Aliasing information.
   • Alignment information.

 The structure describing the relation between two data references is
‘data_dependence_relation’ and the shorter name for a pointer to such a
structure is ‘ddr_p’.  This structure contains:

   • a pointer to each data reference,
   • a tree node ‘are_dependent’ that is set to ‘chrec_known’ if the
     analysis has proved that there is no dependence between these two
     data references, to ‘chrec_dont_know’ if the analysis was not able
     to determine any useful result and a dependence could potentially
     exist between these data references, and to ‘NULL_TREE’ if there
     exists a dependence relation between the data references, in which
     case the description of this dependence relation is given in the
     ‘subscripts’, ‘dir_vects’, and ‘dist_vects’ arrays,
   • a boolean that determines whether the dependence relation can be
     represented by a classical distance vector,
   • an array ‘subscripts’ that contains a description of each subscript
     of the data references.  Given two array accesses a subscript is
     the tuple composed of the access functions for a given dimension.
     For example, given ‘A[f1][f2][f3]’ and ‘B[g1][g2][g3]’, there are
     three subscripts: ‘(f1, g1), (f2, g2), (f3, g3)’.
   • two arrays ‘dir_vects’ and ‘dist_vects’ that contain classical
     representations of the data dependences under the form of direction
     and distance dependence vectors,
   • an array of loops ‘loop_nest’ that contains the loops to which the
     distance and direction vectors refer.
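
 For a single loop and a single subscript, a classical distance can be
derived directly from the two access functions.  The sketch below is a
toy model, not the code in ‘tree-data-ref.cc’: ‘ToyDependence’,
‘toy_distance’ and ‘toy_brute_force’ are hypothetical names, both
accesses are assumed to have step 1, and the sign convention (read
iteration minus write iteration) is a choice made for this sketch.

```cpp
#include <cassert>
#include <set>

// Toy loop: for (i = 0; i < n; i++) { a[i + cw] = ...; ... = a[i + cr]; }
// The single subscript is the pair ({cw, +, 1}, {cr, +, 1}).
struct ToyDependence {
  bool dependent;      // some write and some read touch the same element
  long long distance;  // read iteration minus write iteration
};

ToyDependence toy_distance(long long cw, long long cr, long long n) {
  assert(n >= 0);
  // i_w + cw == i_r + cr  implies  i_r - i_w == cw - cr.
  const long long d = cw - cr;
  const bool in_range = (d < 0 ? -d : d) < n;
  return {in_range, in_range ? d : 0};
}

// Enumerates all iteration pairs and collects the distances that occur.
std::set<long long> toy_brute_force(long long cw, long long cr, long long n) {
  std::set<long long> distances;
  for (long long iw = 0; iw < n; ++iw)
    for (long long ir = 0; ir < n; ++ir)
      if (iw + cw == ir + cr) distances.insert(ir - iw);
  return distances;
}
```

 The brute-force enumeration either finds no conflicting pair or finds
exactly one distance, which matches the closed form; when the constant
offset is at least as large as the trip count, the two references never
touch the same element within the loop.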

 Several functions for pretty printing the information extracted by the
data dependence analysis are available: ‘dump_ddrs’ prints the details
of a data dependence relations array with maximum verbosity,
‘dump_dist_dir_vectors’ prints only the classical distance and direction
vectors for a data dependence relations array, and
‘dump_data_references’ prints the details of the data references
contained in a data reference array.


File: gccint.info,  Node: Machine Desc,  Next: Target Macros,  Prev: Loop Analysis and Representation,  Up: Top

17 Machine Descriptions
***********************

A machine description has two parts: a file of instruction patterns
(‘.md’ file) and a C header file of macro definitions.

 The ‘.md’ file for a target machine contains a pattern for each
instruction that the target machine supports (or at least each
instruction that is worth telling the compiler about).  It may also
contain comments.  A semicolon causes the rest of the line to be a
comment, unless the semicolon is inside a quoted string.

 See the next chapter for information on the C header file.

* Menu:

* Overview::            How the machine description is used.
* Patterns::            How to write instruction patterns.
* Example::             An explained example of a ‘define_insn’ pattern.
* RTL Template::        The RTL template defines what insns match a pattern.
* Output Template::     The output template says how to make assembler code
                        from such an insn.
* Output Statement::    For more generality, write C code to output
                        the assembler code.
* Compact Syntax::      Compact syntax for writing machine descriptors.
* Predicates::          Controlling what kinds of operands can be used
                        for an insn.
* Constraints::         Fine-tuning operand selection.
* Standard Names::      Names mark patterns to use for code generation.
* Pattern Ordering::    When the order of patterns makes a difference.
* Dependent Patterns::  Having one pattern may make you need another.
* Jump Patterns::       Special considerations for patterns for jump insns.
* Looping Patterns::    How to define patterns for special looping insns.
* Insn Canonicalizations::Canonicalization of Instructions
* Expander Definitions::Generating a sequence of several RTL insns
                        for a standard operation.
* Insn Splitting::      Splitting Instructions into Multiple Instructions.
* Including Patterns::  Including Patterns in Machine Descriptions.
* Peephole Definitions::Defining machine-specific peephole optimizations.
* Insn Attributes::     Specifying the value of attributes for generated insns.
* Conditional Execution::Generating ‘define_insn’ patterns for
                         predication.
* Define Subst::        Generating ‘define_insn’ and ‘define_expand’
                        patterns from other patterns.
* Constant Definitions::Defining symbolic constants that can be used in the
                        md file.
* Iterators::           Using iterators to generate patterns from a template.


File: gccint.info,  Node: Overview,  Next: Patterns,  Up: Machine Desc

17.1 Overview of How the Machine Description is Used
====================================================

There are three main conversions that happen in the compiler:

  1. The front end reads the source code and builds a parse tree.

  2. The parse tree is used to generate an RTL insn list based on named
     instruction patterns.

  3. The insn list is matched against the RTL templates to produce
     assembler code.

 For the generate pass, only the names of the insns matter, from either
a named ‘define_insn’ or a ‘define_expand’.  The compiler will choose
the pattern with the right name and apply the operands according to the
documentation later in this chapter, without regard for the RTL template
or operand constraints.  Note that the names the compiler looks for are
hard-coded in the compiler--it will ignore unnamed patterns and patterns
with names it doesn't know about, but if you don't provide a named
pattern it needs, it will abort.

 If a ‘define_insn’ is used, the template given is inserted into the
insn list.  If a ‘define_expand’ is used, one of three things happens,
based on the condition logic.  The condition logic may manually create
new insns for the insn list, say via ‘emit_insn()’, and invoke ‘DONE’.
For certain named patterns, it may invoke ‘FAIL’ to tell the compiler to
use an alternate way of performing that task.  If it invokes neither
‘DONE’ nor ‘FAIL’, the template given in the pattern is inserted, as if
the ‘define_expand’ were a ‘define_insn’.

 Once the insn list is generated, various optimization passes convert,
replace, and rearrange the insns in the insn list.  This is where the
‘define_split’ and ‘define_peephole’ patterns get used, for example.

 Finally, the insn list's RTL is matched up with the RTL templates in
the ‘define_insn’ patterns, and those patterns are used to emit the
final assembly code.  For this purpose, each named ‘define_insn’ acts
like it's unnamed, since the names are ignored.


File: gccint.info,  Node: Patterns,  Next: Example,  Prev: Overview,  Up: Machine Desc

17.2 Everything about Instruction Patterns
==========================================

A ‘define_insn’ expression is used to define instruction patterns to
which insns may be matched.  A ‘define_insn’ expression contains an
incomplete RTL expression, with pieces to be filled in later, operand
constraints that restrict how the pieces can be filled in, and an output
template or C code to generate the assembler output.

 A ‘define_insn’ is an RTL expression containing four or five operands:

  1. An optional name N.  When a name is present, the compiler
     automatically generates a C++ function ‘gen_N’ that takes the
     operands of the instruction as arguments and returns the
     instruction's rtx pattern.  The compiler also assigns the
     instruction a unique code ‘CODE_FOR_N’, with all such codes
     belonging to an enum called ‘insn_code’.

     These names serve one of two purposes.  The first is to indicate
     that the instruction performs a certain standard job for the
     RTL-generation pass of the compiler, such as a move, an addition,
     or a conditional jump.  The second is to help the target generate
     certain target-specific operations, such as when implementing
     target-specific intrinsic functions.

     It is better to prefix target-specific names with the name of the
     target, to avoid any clash with current or future standard names.

     The absence of a name is indicated by writing an empty string where
     the name should go.  Nameless instruction patterns are never used
     for generating RTL code, but they may permit several simpler insns
     to be combined later on.

     For the purpose of debugging the compiler, you may also specify a
     name beginning with the ‘*’ character.  Such a name is used only
     for identifying the instruction in RTL dumps; it is equivalent to
     having a nameless pattern for all other purposes.  Names beginning
     with the ‘*’ character are not required to be unique.

     The name may also have the form ‘@N’.  This has the same effect as
     a name ‘N’, but in addition tells the compiler to generate further
     helper functions; see *note Parameterized Names:: for details.

  2. The “RTL template”: This is a vector of incomplete RTL expressions
     which describe the semantics of the instruction (*note RTL
     Template::).  It is incomplete because it may contain
     ‘match_operand’, ‘match_operator’, and ‘match_dup’ expressions that
     stand for operands of the instruction.

     If the vector has multiple elements, the RTL template is treated as
     a ‘parallel’ expression.

  3. The condition: This is a string which contains a C expression.
     When the compiler attempts to match RTL against a pattern, the
     condition is evaluated.  If the condition evaluates to ‘true’, the
     match is permitted.  The condition may be an empty string, which is
     treated as always ‘true’.

     For a named pattern, the condition may not depend on the data in
     the insn being matched, but only the target-machine-type flags.
     The compiler needs to test these conditions during initialization
     in order to learn exactly which named instructions are available in
     a particular run.

     For nameless patterns, the condition is applied only when matching
     an individual insn, and only after the insn has matched the
     pattern's recognition template.  The insn's operands may be found
     in the vector ‘operands’.

     An instruction condition cannot become more restrictive as
     compilation progresses.  If the condition accepts a particular RTL
     instruction at one stage of compilation, it must continue to accept
     that instruction until the final pass.  For example,
     ‘!reload_completed’ and ‘can_create_pseudo_p ()’ are both invalid
     instruction conditions, because they are true during the earlier
     RTL passes and false during the later ones.  For the same reason,
     if a condition accepts an instruction before register allocation,
     it cannot later try to control register allocation by excluding
     certain register or value combinations.

     Although a condition cannot become more restrictive as compilation
     progresses, the condition for a nameless pattern _can_ become more
     permissive.  For example, a nameless instruction can require
     ‘reload_completed’ to be true, in which case it only matches after
     register allocation.

  4. The “output template” or “output statement”: This is either a
     string, or a fragment of C code which returns a string.

     When simple substitution isn't general enough, you can specify a
     piece of C code to compute the output.  *Note Output Statement::.

  5. The “insn attributes”: This is an optional vector containing the
     values of attributes for insns matching this pattern (*note Insn
     Attributes::).
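
 The rule in item 3, that an instruction condition may become more
permissive but never more restrictive, can be checked mechanically in a
toy model.  The following C sketch is not GCC code: the type
‘struct phase_flags’, the two-entry list of phases and all function
names are hypothetical, introduced only to illustrate why
‘!reload_completed’ and ‘can_create_pseudo_p ()’ are rejected while
‘reload_completed’ is acceptable for a nameless pattern.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the two flags named in the text.  */
struct phase_flags
{
  bool reload_completed;   /* false before register allocation */
  bool can_create_pseudo;  /* true before register allocation */
};

/* Phases in compilation order: before and after register allocation.  */
static const struct phase_flags phases[] = {
  { false, true },
  { true, false },
};

typedef bool (*insn_condition) (const struct phase_flags *);

/* Return true if COND never goes from accepting to rejecting as the
   phases advance, i.e. if it never becomes more restrictive.  */
static bool
condition_is_valid (insn_condition cond)
{
  bool accepted = false;
  for (size_t i = 0; i < sizeof phases / sizeof phases[0]; i++)
    {
      bool now = cond (&phases[i]);
      if (accepted && !now)
        return false;
      accepted = accepted || now;
    }
  return true;
}

/* Models the condition "!reload_completed".  */
static bool
cond_not_reload_completed (const struct phase_flags *f)
{
  return !f->reload_completed;
}

/* Models the condition "reload_completed".  */
static bool
cond_reload_completed (const struct phase_flags *f)
{
  return f->reload_completed;
}

/* Models the condition "can_create_pseudo_p ()".  */
static bool
cond_can_create_pseudo (const struct phase_flags *f)
{
  return f->can_create_pseudo;
}
```

 In this model ‘condition_is_valid (cond_reload_completed)’ holds,
while the other two conditions fail the check because each accepts an
insn in the first phase and rejects it in the second.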


File: gccint.info,  Node: Example,  Next: RTL Template,  Prev: Patterns,  Up: Machine Desc

17.3 Example of ‘define_insn’
=============================

Here is an example of an instruction pattern, taken from the machine
description for the 68000/68020.

     (define_insn "tstsi"
       [(set (cc0)
             (match_operand:SI 0 "general_operand" "rm"))]
       ""
       "*
     {
       if (TARGET_68020 || ! ADDRESS_REG_P (operands[0]))
         return \"tstl %0\";
       return \"cmpl #0,%0\";
     }")

This can also be written using braced strings:

     (define_insn "tstsi"
       [(set (cc0)
             (match_operand:SI 0 "general_operand" "rm"))]
       ""
     {
       if (TARGET_68020 || ! ADDRESS_REG_P (operands[0]))
         return "tstl %0";
       return "cmpl #0,%0";
     })

 This describes an instruction which sets the condition codes based on
the value of a general operand.  It has no condition, so any insn with
an RTL description of the form shown may be matched to this pattern.
The name ‘tstsi’ means "test a ‘SImode’ value" and tells the RTL
generation pass that, when it is necessary to test such a value, an insn
to do so can be constructed using this pattern.

 The output control string is a piece of C code which chooses which
output template to return based on the kind of operand and the specific
type of CPU for which code is being generated.

 ‘"rm"’ is an operand constraint.  Its meaning is explained below.


File: gccint.info,  Node: RTL Template,  Next: Output Template,  Prev: Example,  Up: Machine Desc

17.4 RTL Template
=================

The RTL template is used to define which insns match the particular
pattern and how to find their operands.  For named patterns, the RTL
template also says how to construct an insn from specified operands.

 Construction involves substituting specified operands into a copy of
the template.  Matching involves determining the values that serve as
the operands in the insn being matched.  Both of these activities are
controlled by special expression types that direct matching and
substitution of the operands.

‘(match_operand:M N PREDICATE CONSTRAINT)’
     This expression is a placeholder for operand number N of the insn.
     When constructing an insn, operand number N will be substituted at
     this point.  When matching an insn, whatever appears at this
     position in the insn will be taken as operand number N; but it must
     satisfy PREDICATE or this instruction pattern will not match at
     all.

     Operand numbers must be chosen consecutively counting from zero in
     each instruction pattern.  There may be only one ‘match_operand’
     expression in the pattern for each operand number.  Usually
     operands are numbered in the order of appearance in ‘match_operand’
     expressions.  In the case of a ‘define_expand’, any operand numbers
     used only in ‘match_dup’ expressions have higher values than all
     other operand numbers.

     PREDICATE is a string that is the name of a function that accepts
     two arguments, an expression and a machine mode.  *Note
     Predicates::.  During matching, the function will be called with
     the putative operand as the expression and M as the mode argument
     (if M is not specified, ‘VOIDmode’ will be used, which normally
     causes PREDICATE to accept any mode).  If it returns zero, this
     instruction pattern fails to match.  PREDICATE may be an empty
     string; then it means no test is to be done on the operand, so
     anything which occurs in this position is valid.

     Most of the time, PREDICATE will reject modes other than M--but not
     always.  For example, the predicate ‘address_operand’ uses M as the
     mode of memory ref that the address should be valid for.  Many
     predicates accept ‘const_int’ nodes even though their mode is
     ‘VOIDmode’.

     CONSTRAINT controls reloading and the choice of the best register
     class to use for a value, as explained later (*note Constraints::).
     If the constraint would be an empty string, it can be omitted.

     People are often unclear on the difference between the constraint
     and the predicate.  The predicate helps decide whether a given insn
     matches the pattern.  The constraint plays no role in this
     decision; instead, it controls various decisions in the case of an
     insn which does match.

‘(match_scratch:M N CONSTRAINT)’
     This expression is also a placeholder for operand number N and
     indicates that operand must be a ‘scratch’ or ‘reg’ expression.

     When matching patterns, this is equivalent to

          (match_operand:M N "scratch_operand" CONSTRAINT)

     but, when generating RTL, it produces a ‘(scratch:M)’ expression.

     If the last few expressions in a ‘parallel’ are ‘clobber’
     expressions whose operands are either a hard register or
     ‘match_scratch’, the combiner can add or delete them when
     necessary.  *Note Side Effects::.

‘(match_dup N)’
     This expression is also a placeholder for operand number N.  It is
     used when the operand needs to appear more than once in the insn.

     In construction, ‘match_dup’ acts just like ‘match_operand’: the
     operand is substituted into the insn being constructed.  But in
     matching, ‘match_dup’ behaves differently.  It assumes that operand
     number N has already been determined by a ‘match_operand’ appearing
     earlier in the recognition template, and it matches only an
     identical-looking expression.

     Note that ‘match_dup’ should not be used to tell the compiler that
     a particular register is being used for two operands (example:
     ‘add’ that adds one register to another; the second register is
     both an input operand and the output operand).  Use a matching
     constraint (*note Simple Constraints::) for those.  ‘match_dup’ is
     for the cases where one operand is used in two places in the
     template, such as an instruction that computes both a quotient and
     a remainder, where the opcode takes two input operands but the RTL
     template has to refer to each of those twice; once for the quotient
     pattern and once for the remainder pattern.

‘(match_operator:M N PREDICATE [OPERANDS...])’
     This pattern is a kind of placeholder for a variable RTL expression
     code.

     When constructing an insn, it stands for an RTL expression whose
     expression code is taken from that of operand N, and whose operands
     are constructed from the patterns OPERANDS.

     When matching an expression, it matches an expression if the
     function PREDICATE returns nonzero on that expression _and_ the
     patterns OPERANDS match the operands of the expression.

     Suppose that the function ‘commutative_operator’ is defined as
     follows, to match any expression whose operator is one of the
     commutative arithmetic operators of RTL and whose mode is MODE:

          int
          commutative_operator (rtx x, machine_mode mode)
          {
            enum rtx_code code = GET_CODE (x);
            if (GET_MODE (x) != mode)
              return 0;
            return (GET_RTX_CLASS (code) == RTX_COMM_ARITH
                    || code == EQ || code == NE);
          }

     Then the following pattern will match any RTL expression consisting
     of a commutative operator applied to two general operands:

          (match_operator:SI 3 "commutative_operator"
            [(match_operand:SI 1 "general_operand" "g")
             (match_operand:SI 2 "general_operand" "g")])

     Here the vector ‘[OPERANDS...]’ contains two patterns because the
     expressions to be matched all contain two operands.

     When this pattern does match, the two operands of the commutative
     operator are recorded as operands 1 and 2 of the insn.  (This is
     done by the two instances of ‘match_operand’.)  Operand 3 of the
     insn will be the entire commutative expression: use ‘GET_CODE
     (operands[3])’ to see which commutative operator was used.

     The machine mode M of ‘match_operator’ works like that of
     ‘match_operand’: it is passed as the second argument to the
     predicate function, and that function is solely responsible for
     deciding whether the expression to be matched "has" that mode.

     When constructing an insn, argument 3 of the gen-function will
     specify the operation (i.e. the expression code) for the expression
     to be made.  It should be an RTL expression, whose expression code
     is copied into a new expression whose operands are arguments 1 and
     2 of the gen-function.  The subexpressions of argument 3 are not
     used; only its expression code matters.

     When ‘match_operator’ is used in a pattern for matching an insn, it
     is usually best if the operand number of the ‘match_operator’ is
     higher than that of the actual operands of the insn.  This improves
     register allocation because the register allocator often looks at
     operands 1 and 2 of insns to see if it can do register tying.

     There is no way to specify constraints in ‘match_operator’.  The
     operand of the insn which corresponds to the ‘match_operator’ never
     has any constraints because it is never reloaded as a whole.
     However, if parts of its OPERANDS are matched by ‘match_operand’
     patterns, those parts may have constraints of their own.

‘(match_op_dup:M N [OPERANDS...])’
     Like ‘match_dup’, except that it applies to operators instead of
     operands.  When constructing an insn, operand number N will be
     substituted at this point.  But in matching, ‘match_op_dup’ behaves
     differently.  It assumes that operand number N has already been
     determined by a ‘match_operator’ appearing earlier in the
     recognition template, and it matches only an identical-looking
     expression.

‘(match_parallel N PREDICATE [SUBPAT...])’
     This pattern is a placeholder for an insn that consists of a
     ‘parallel’ expression with a variable number of elements.  This
     expression should only appear at the top level of an insn pattern.

     When constructing an insn, operand number N will be substituted at
     this point.  When matching an insn, it matches if the body of the
     insn is a ‘parallel’ expression with at least as many elements as
     the vector of SUBPAT expressions in the ‘match_parallel’, if each
     SUBPAT matches the corresponding element of the ‘parallel’, _and_
     the function PREDICATE returns nonzero on the ‘parallel’ that is
     the body of the insn.  It is the responsibility of the predicate to
     validate elements of the ‘parallel’ beyond those listed in the
     ‘match_parallel’.

     A typical use of ‘match_parallel’ is to match load and store
     multiple expressions, which can contain a variable number of
     elements in a ‘parallel’.  For example,

          (define_insn ""
            [(match_parallel 0 "load_multiple_operation"
               [(set (match_operand:SI 1 "gpc_reg_operand" "=r")
                     (match_operand:SI 2 "memory_operand" "m"))
                (use (reg:SI 179))
                (clobber (reg:SI 179))])]
            ""
            "loadm 0,0,%1,%2")

     This example comes from ‘a29k.md’.  The function
     ‘load_multiple_operation’ is defined in ‘a29k.c’ and checks that
     subsequent elements in the ‘parallel’ are the same as the ‘set’ in
     the pattern, except that they are referencing subsequent registers
     and memory locations.

     An insn that matches this pattern might look like:

          (parallel
           [(set (reg:SI 20) (mem:SI (reg:SI 100)))
            (use (reg:SI 179))
            (clobber (reg:SI 179))
            (set (reg:SI 21)
                 (mem:SI (plus:SI (reg:SI 100)
                                  (const_int 4))))
            (set (reg:SI 22)
                 (mem:SI (plus:SI (reg:SI 100)
                                  (const_int 8))))])

‘(match_par_dup N [SUBPAT...])’
     Like ‘match_op_dup’, but for ‘match_parallel’ instead of
     ‘match_operator’.
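
 The difference between ‘match_operand’ and ‘match_dup’ during matching
can be seen in a small stand-alone model.  The C sketch below does not
use GCC's ‘rtx’ type or its recognizer; the types ‘toy_expr’ and
‘toy_tmpl’ and every function in it are hypothetical.  It only assumes
what is described above: ‘match_operand’ applies a predicate and
records the operand, and ‘match_dup’ accepts only an expression that
looks identical to the operand recorded earlier.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy expression: a code plus up to two sub-expressions.  Leaves carry
   a register number or constant in VALUE.  */
enum toy_code { TOY_REG, TOY_CONST, TOY_PLUS };

struct toy_expr
{
  enum toy_code code;
  int value;
  const struct toy_expr *op[2];
};

enum tmpl_kind { TMPL_FIXED, TMPL_MATCH_OPERAND, TMPL_MATCH_DUP };

struct toy_tmpl
{
  enum tmpl_kind kind;
  enum toy_code code;                          /* for TMPL_FIXED */
  int opno;                                    /* for the match_* kinds */
  bool (*predicate) (const struct toy_expr *); /* NULL accepts anything */
  const struct toy_tmpl *op[2];                /* for TMPL_FIXED */
};

/* Structural equality, standing in for "identical-looking".  */
static bool
toy_equal (const struct toy_expr *a, const struct toy_expr *b)
{
  if (a == NULL || b == NULL)
    return a == b;
  if (a->code != b->code || a->value != b->value)
    return false;
  return toy_equal (a->op[0], b->op[0]) && toy_equal (a->op[1], b->op[1]);
}

/* Match X against T, recording operands in OPERANDS as they are found.  */
static bool
toy_match (const struct toy_tmpl *t, const struct toy_expr *x,
           const struct toy_expr **operands)
{
  if (t == NULL || x == NULL)
    return t == NULL && x == NULL;
  switch (t->kind)
    {
    case TMPL_MATCH_OPERAND:
      if (t->predicate && !t->predicate (x))
        return false;
      operands[t->opno] = x;
      return true;
    case TMPL_MATCH_DUP:
      /* Fails if the operand has not been recorded yet.  */
      return toy_equal (operands[t->opno], x);
    case TMPL_FIXED:
      return (t->code == x->code
              && toy_match (t->op[0], x->op[0], operands)
              && toy_match (t->op[1], x->op[1], operands));
    }
  return false;
}

static bool
toy_reg_p (const struct toy_expr *x)
{
  return x->code == TOY_REG;
}

/* Usage sketch: match (plus (reg 1) (reg SECOND_REG)) against
   (plus (match_operand 0 "toy_reg_p") (match_dup 0)).  */
static bool
toy_match_demo (int second_reg)
{
  struct toy_expr a = { TOY_REG, 1, { NULL, NULL } };
  struct toy_expr b = { TOY_REG, second_reg, { NULL, NULL } };
  struct toy_expr sum = { TOY_PLUS, 0, { &a, &b } };
  struct toy_tmpl m0
    = { TMPL_MATCH_OPERAND, TOY_REG, 0, toy_reg_p, { NULL, NULL } };
  struct toy_tmpl d0 = { TMPL_MATCH_DUP, TOY_REG, 0, NULL, { NULL, NULL } };
  struct toy_tmpl plus = { TMPL_FIXED, TOY_PLUS, 0, NULL, { &m0, &d0 } };
  const struct toy_expr *operands[1] = { NULL };
  return toy_match (&plus, &sum, operands);
}
```

 Here ‘toy_match_demo (1)’ succeeds because both addends are register
1, and ‘toy_match_demo (2)’ fails at the ‘match_dup’.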


File: gccint.info,  Node: Output Template,  Next: Output Statement,  Prev: RTL Template,  Up: Machine Desc

17.5 Output Templates and Operand Substitution
==============================================

The “output template” is a string which specifies how to output the
assembler code for an instruction pattern.  Most of the template is a
fixed string which is output literally.  The character ‘%’ is used to
specify where to substitute an operand; it can also be used to identify
places where different variants of the assembler require different
syntax.

 In the simplest case, a ‘%’ followed by a digit N says to output
operand N at that point in the string.

 ‘%’ followed by a letter and a digit says to output an operand in an
alternate fashion.  Four letters have standard, built-in meanings
described below.  The machine description macro ‘PRINT_OPERAND’ can
define additional letters with nonstandard meanings.

 ‘%cDIGIT’ can be used to substitute an operand that is a constant value
without the syntax that normally indicates an immediate operand.

 ‘%nDIGIT’ is like ‘%cDIGIT’ except that the value of the constant is
negated before printing.

 ‘%aDIGIT’ can be used to substitute an operand as if it were a memory
reference, with the actual operand treated as the address.  This may be
useful when outputting a "load address" instruction, because often the
assembler syntax for such an instruction requires you to write the
operand as if it were a memory reference.

 ‘%lDIGIT’ is used to substitute a ‘label_ref’ into a jump instruction.

 ‘%=’ outputs a number which is unique to each instruction in the entire
compilation.  This is useful for making local labels to be referred to
more than once in a single template that generates multiple assembler
instructions.

 ‘%’ followed by a punctuation character specifies a substitution that
does not use an operand.  Only one case is standard: ‘%%’ outputs a ‘%’
into the assembler code.  Other nonstandard cases can be defined in the
‘PRINT_OPERAND’ macro.  You must also define which punctuation
characters are valid with the ‘PRINT_OPERAND_PUNCT_VALID_P’ macro.
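
 The basic substitutions can be sketched as a small string expander.
This is an illustration only, not the routine GCC uses: the function
‘toy_expand_template’, its parameters and the idea of passing operands
as already-printed strings are assumptions made for the example.  It
models just ‘%N’ for a single digit N, ‘%%’ and ‘%=’; the letter forms
and anything defined by ‘PRINT_OPERAND’ are treated as errors.

```c
#include <stdio.h>
#include <string.h>

/* Expand TMPL into OUT (capacity SIZE).  OPERANDS holds the printed
   text of each operand and INSN_ID stands in for the number that "%="
   produces.  Returns 0 on success, -1 on overflow or on a '%' sequence
   that this sketch does not model.  */
static int
toy_expand_template (char *out, size_t size, const char *tmpl,
                     const char *const *operands, int n_operands,
                     int insn_id)
{
  size_t used = 0;

  if (size == 0)
    return -1;
  out[0] = '\0';
  for (const char *p = tmpl; *p; p++)
    {
      char piece[32];
      const char *text;

      if (*p != '%')
        {
          /* Ordinary characters are output literally.  */
          piece[0] = *p;
          piece[1] = '\0';
          text = piece;
        }
      else
        {
          p++;
          if (*p == '%')
            text = "%";
          else if (*p == '=')
            {
              snprintf (piece, sizeof piece, "%d", insn_id);
              text = piece;
            }
          else if (*p >= '0' && *p <= '9' && *p - '0' < n_operands)
            text = operands[*p - '0'];
          else
            return -1;
        }

      size_t len = strlen (text);
      if (used + len + 1 > size)
        return -1;
      memcpy (out + used, text, len + 1);
      used += len;
    }
  return 0;
}

/* Usage sketch with two hypothetical operands and insn number 7.
   Returns the expansion in a static buffer, or NULL on failure.  */
static const char *
toy_expand_demo (const char *tmpl)
{
  static char buf[128];
  static const char *const operands[] = { "d0", "d1" };
  return toy_expand_template (buf, sizeof buf, tmpl, operands, 2, 7) == 0
         ? buf : NULL;
}
```

 With these assumptions the template ‘addl %1,%0’ expands to
‘addl d1,d0’, and a reference to a missing operand such as ‘%2’ is
reported as an error rather than printed.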

 The template may generate multiple assembler instructions.  Write the
text for the instructions, with ‘\;’ between them.

 When the RTL contains two operands which are required by constraint to
match each other, the output template must refer only to the
lower-numbered operand.  Matching operands are not always identical, and
the rest of the compiler arranges to put the proper RTL expression for
printing into the lower-numbered operand.

 One use of nonstandard letters or punctuation following ‘%’ is to
distinguish between different assembler languages for the same machine;
for example, Motorola syntax versus MIT syntax for the 68000.  Motorola
syntax requires periods in most opcode names, while MIT syntax does not.
For example, the opcode ‘movel’ in MIT syntax is ‘move.l’ in Motorola
syntax.  The same file of patterns is used for both kinds of output
syntax, but the character sequence ‘%.’ is used in each place where
Motorola syntax wants a period.  The ‘PRINT_OPERAND’ macro for Motorola
syntax defines the sequence to output a period; the macro for MIT syntax
defines it to do nothing.

 As a special case, a template consisting of the single character ‘#’
instructs the compiler to first split the insn, and then output the
resulting instructions separately.  This helps eliminate redundancy in
the output templates.  If you have a ‘define_insn’ that needs to emit
multiple assembler instructions, and there is a matching ‘define_split’
already defined, then you can simply use ‘#’ as the output template
instead of writing an output template that emits the multiple assembler
instructions.

 Note that ‘#’ only has an effect while generating assembly code; it
does not affect whether a split occurs earlier.  An associated
‘define_split’ must exist and it must be suitable for use after register
allocation.

 If the macro ‘ASSEMBLER_DIALECT’ is defined, you can use constructs of
the form ‘{option0|option1|option2}’ in the templates.  These describe
multiple variants of assembler language syntax.  *Note Instruction
Output::.
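
 The effect of a ‘{option0|option1|option2}’ construct can be sketched
as follows.  The function ‘toy_select_dialect’ is hypothetical and is
not how GCC implements dialect selection; it assumes that the dialect
is a zero-based index, that groups do not nest, and that a group with
too few alternatives contributes nothing.

```c
#include <stddef.h>

/* Copy TMPL into OUT (capacity SIZE), keeping from every {a|b|...}
   group only the alternative whose index is DIALECT.  Returns 0 on
   success, -1 on overflow or on an unterminated group.  */
static int
toy_select_dialect (char *out, size_t size, const char *tmpl, int dialect)
{
  size_t used = 0;
  int in_group = 0;  /* nonzero while between '{' and '}' */
  int alt = 0;       /* index of the alternative being scanned */

  if (size == 0)
    return -1;
  for (const char *p = tmpl; *p; p++)
    {
      if (!in_group && *p == '{')
        {
          in_group = 1;
          alt = 0;
          continue;
        }
      if (in_group && *p == '|')
        {
          alt++;
          continue;
        }
      if (in_group && *p == '}')
        {
          in_group = 0;
          continue;
        }
      if (in_group && alt != dialect)
        continue;               /* text of another dialect: skip it */
      if (used + 1 >= size)
        return -1;
      out[used++] = *p;
    }
  out[used] = '\0';
  return in_group ? -1 : 0;
}

/* Usage sketch: returns the result in a static buffer, or NULL.  */
static const char *
toy_dialect_demo (const char *tmpl, int dialect)
{
  static char buf[128];
  return toy_select_dialect (buf, sizeof buf, tmpl, dialect) == 0
         ? buf : NULL;
}
```

 For the made-up template ‘{movl|mov.l} %1,%0’, dialect 0 yields
‘movl %1,%0’ and dialect 1 yields ‘mov.l %1,%0’; the operand
references are left for the later ‘%’ substitution.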


File: gccint.info,  Node: Output Statement,  Next: Compact Syntax,  Prev: Output Template,  Up: Machine Desc

17.6 C Statements for Assembler Output
======================================

Often a single fixed template string cannot produce correct and
efficient assembler code for all the cases that are recognized by a
single instruction pattern.  For example, the opcodes may depend on the
kinds of operands; or some unfortunate combinations of operands may
require extra machine instructions.

 If the output control string starts with a ‘@’, then it is actually a
series of templates, each on a separate line.  (Blank lines and leading
spaces and tabs are ignored.)  The templates correspond to the pattern's
constraint alternatives (*note Multi-Alternative::).  For example, if a
target machine has a two-address add instruction ‘addr’ to add into a
register and another ‘addm’ to add a register to memory, you might write
this pattern:

     (define_insn "addsi3"
       [(set (match_operand:SI 0 "general_operand" "=r,m")
             (plus:SI (match_operand:SI 1 "general_operand" "0,0")
                      (match_operand:SI 2 "general_operand" "g,r")))]
       ""
       "@
        addr %2,%0
        addm %2,%0")

 If the output control string starts with a ‘*’, then it is not an
output template but rather a piece of C program that should compute a
template.  It should execute a ‘return’ statement to return the
template-string you want.  Most such templates use C string literals,
which require doublequote characters to delimit them.  To include these
doublequote characters in the string, prefix each one with ‘\’.

 If the output control string is written as a brace block instead of a
double-quoted string, it is automatically assumed to be C code.  In that
case, it is not necessary to put in a leading asterisk, or to escape the
doublequotes surrounding C string literals.

 The operands may be found in the array ‘operands’, whose C data type is
‘rtx []’.

 It is very common to select different ways of generating assembler code
based on whether an immediate operand is within a certain range.  Be
careful when doing this, because the result of ‘INTVAL’ is an integer on
the host machine.  If the host machine has more bits in an ‘int’ than
the target machine has in the mode in which the constant will be used,
then some of the bits you get from ‘INTVAL’ will be superfluous.  For
proper results, you must carefully disregard the values of those bits.
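
 One way to disregard the superfluous bits is to reduce the host value
to the width of the target mode before testing its range.  The sketch
below is self-contained C and deliberately avoids GCC's own types and
macros; the function names are hypothetical, the width is passed
explicitly in bits, and a two's complement host is assumed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Keep only the low WIDTH bits of VALUE and sign-extend them back to a
   host integer.  WIDTH is expected to be between 1 and 64.  */
static int64_t
toy_truncate_to_width (int64_t value, unsigned width)
{
  if (width == 0)
    return 0;
  if (width >= 64)
    return value;

  uint64_t mask = (UINT64_C (1) << width) - 1;
  uint64_t sign = UINT64_C (1) << (width - 1);
  uint64_t low = (uint64_t) value & mask;

  /* (low ^ sign) - sign copies the sign bit into the upper bits.  */
  return (int64_t) ((low ^ sign) - sign);
}

/* True if VALUE, as seen by a WIDTH-bit target mode, lies in [LO, HI].  */
static bool
toy_target_value_in_range (int64_t value, unsigned width,
                           int64_t lo, int64_t hi)
{
  int64_t v = toy_truncate_to_width (value, width);
  return lo <= v && v <= hi;
}
```

 For example, the host value ‘0x1FFFF’ used in a 16-bit mode is -1
once the extra bits are dropped, so a range test such as "fits in a
signed 8-bit immediate" should be applied to -1 and not to 131071.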

 It is possible to output an assembler instruction and then go on to
output or compute more of them, using the subroutine ‘output_asm_insn’.
This receives two arguments: a template-string and a vector of operands.
The vector may be ‘operands’, or it may be another array of ‘rtx’ that
you declare locally and initialize yourself.

 When an insn pattern has multiple alternatives in its constraints,
often the appearance of the assembler code is determined mostly by which
alternative was matched.  When this is so, the C code can test the
variable ‘which_alternative’, which is the ordinal number of the
alternative that was actually satisfied (0 for the first, 1 for the
second alternative, etc.).

 For example, suppose there are two opcodes for storing zero, ‘clrreg’
for registers and ‘clrmem’ for memory locations.  Here is how a pattern
could use ‘which_alternative’ to choose between them:

     (define_insn ""
       [(set (match_operand:SI 0 "general_operand" "=r,m")
             (const_int 0))]
       ""
       {
       return (which_alternative == 0
               ? "clrreg %0" : "clrmem %0");
       })

 The example above, where the assembler code to generate was _solely_
determined by the alternative, could also have been specified as
follows, having the output control string start with a ‘@’:

     (define_insn ""
       [(set (match_operand:SI 0 "general_operand" "=r,m")
             (const_int 0))]
       ""
       "@
        clrreg %0
        clrmem %0")
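
 How a ‘@’ control string relates to ‘which_alternative’ can be
modelled in a few lines of C.  The function ‘toy_pick_alternative’ is
hypothetical and is not taken from GCC; it assumes only what is stated
above, namely that each template sits on its own line and that blank
lines and leading spaces and tabs are ignored.  Lines that begin with
‘*’ are not treated specially.

```c
#include <stddef.h>
#include <string.h>

/* Copy into OUT (capacity SIZE) the template for alternative WHICH
   from the '@'-style control string TMPL.  Returns 0 on success, -1 if
   TMPL does not start with '@', has too few lines, or does not fit.  */
static int
toy_pick_alternative (char *out, size_t size, const char *tmpl, int which)
{
  if (size == 0 || tmpl[0] != '@')
    return -1;

  const char *p = tmpl + 1;
  int index = 0;
  while (*p)
    {
      /* Skip blank lines and leading spaces and tabs.  */
      while (*p == ' ' || *p == '\t' || *p == '\n')
        p++;
      if (*p == '\0')
        break;

      size_t len = strcspn (p, "\n");
      if (index == which)
        {
          if (len + 1 > size)
            return -1;
          memcpy (out, p, len);
          out[len] = '\0';
          return 0;
        }
      index++;
      p += len;
    }
  return -1;
}

/* Usage sketch: returns the chosen template in a static buffer, or
   NULL on failure.  */
static const char *
toy_alternative_demo (const char *tmpl, int which)
{
  static char buf[128];
  return toy_pick_alternative (buf, sizeof buf, tmpl, which) == 0
         ? buf : NULL;
}
```

 Applied to the control string of the pattern above, alternative 0
selects ‘clrreg %0’ and alternative 1 selects ‘clrmem %0’.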

 If you just need a little bit of C code in one (or a few) alternatives,
you can use ‘*’ inside of a ‘@’ multi-alternative template:

     (define_insn ""
       [(set (match_operand:SI 0 "general_operand" "=r,<,m")
             (const_int 0))]
       ""
       "@
        clrreg %0
        * return stack_mem_p (operands[0]) ? \"push 0\" : \"clrmem %0\";
        clrmem %0")


File: gccint.info,  Node: Compact Syntax,  Next: Predicates,  Prev: Output Statement,  Up: Machine Desc

17.7 Compact Syntax
===================

When a ‘define_insn’ or ‘define_insn_and_split’ has multiple
alternatives it may be beneficial to use the compact syntax when
specifying alternatives.

 This syntax puts the constraints and attributes on the same horizontal
line as the instruction assembly template.

 As an example

     (define_insn_and_split ""
       [(set (match_operand:SI 0 "nonimmediate_operand" "=r,k,r,r,r,r")
             (match_operand:SI 1 "aarch64_mov_operand"  " r,r,k,M,n,Usv"))]
       ""
       "@
        mov\\t%w0, %w1
        mov\\t%w0, %w1
        mov\\t%w0, %w1
        mov\\t%w0, %1
        #
        * return aarch64_output_sve_cnt_immediate (\"cnt\", \"%x0\", operands[1]);"
       "&& true"
        [(const_int 0)]
       {
          aarch64_expand_mov_immediate (operands[0], operands[1]);
          DONE;
       }
       [(set_attr "type" "mov_reg,mov_reg,mov_reg,mov_imm,mov_imm,mov_imm")
        (set_attr "arch"   "*,*,*,*,*,sve")
        (set_attr "length" "4,4,4,4,*,  4")
     ]
     )

 can be better expressed as:

     (define_insn_and_split ""
       [(set (match_operand:SI 0 "nonimmediate_operand")
             (match_operand:SI 1 "aarch64_mov_operand"))]
       ""
       {@ [cons: =0, 1; attrs: type, arch, length]
          [r , r  ; mov_reg  , *   , 4] mov\t%w0, %w1
          [k , r  ; mov_reg  , *   , 4] ^
          [r , k  ; mov_reg  , *   , 4] ^
          [r , M  ; mov_imm  , *   , 4] mov\t%w0, %1
          [r , n  ; mov_imm  , *   , *] #
          [r , Usv; mov_imm  , sve , 4] << aarch64_output_sve_cnt_immediate ("cnt", "%x0", operands[1]);
       }
       "&& true"
       [(const_int 0)]
       {
         aarch64_expand_mov_immediate (operands[0], operands[1]);
         DONE;
       }
     )

 The syntax rules are as follows:
   • Templates must start with ‘{@’ to use the new syntax.

   • ‘{@’ is followed by a layout in square brackets which is ‘cons:’
     followed by a comma-separated list of
     ‘match_operand’/‘match_scratch’ operand numbers, then a semicolon,
     followed by the same for attributes (‘attrs:’).  Operand modifiers
     like ‘=’ and ‘+’ can be placed before an operand number.  Both
     sections are optional (so you can use only ‘cons’, or only ‘attrs’,
     or both), and ‘cons’ must come before ‘attrs’ if present.

   • Each alternative begins with any amount of whitespace.

   • Following the whitespace is a comma-separated list of "constraints"
     and/or "attributes" within brackets ‘[]’, with sections separated
     by a semicolon.

   • To reuse the asm line of the previous alternative, write the symbol
     ‘^’.  This reduces copying and pasting between alternatives and
     the number of lines to update when the output changes.

   • When using C functions for output, the idiom ‘* return FUNCTION;’
     can be replaced with the shorthand ‘<< FUNCTION;’.

   • Following the closing ‘]’ is any amount of whitespace, and then the
     actual asm output.

   • Spaces are allowed in the list (they will simply be removed).

   • All constraint alternatives should be specified.  For example, a
     list of three blank alternatives should be written ‘[,,]’ rather
     than ‘[]’.

   • All attribute alternatives should be non-empty, with ‘*’
     representing the default attribute value.  For example, a list of
     three default attribute values should be written ‘[*,*,*]’ rather
     than ‘[]’.

   • Within an ‘{@’ block both multi-line and single-line C comments
     are allowed, but when used outside of a C block they must be the
     only non-whitespace content on the line.

   • Within an ‘{@’ block, any iterators that do not get expanded will
     result in an error.  If for some reason it is required to have ‘<’
     or ‘>’ in the output then these must be escaped using ‘\’.

   • It is possible to use the ‘attrs’ list to specify some attributes
     and to use the normal ‘set_attr’ syntax to specify other
     attributes.  There must not be any overlap between the two lists.

     In other words, the following is valid:
          (define_insn_and_split ""
            [(set (match_operand:SI 0 "nonimmediate_operand")
          	(match_operand:SI 1 "aarch64_mov_operand"))]
            ""
            {@ [cons: 0, 1; attrs: type, arch, length]}
            ...
            [(set_attr "foo" "mov_imm")]
          )

     but this is not valid:
          (define_insn_and_split ""
            [(set (match_operand:SI 0 "nonimmediate_operand")
          	(match_operand:SI 1 "aarch64_mov_operand"))]
            ""
            {@ [cons: 0, 1; attrs: type, arch, length]}
            ...
            [(set_attr "arch" "bar")
             (set_attr "foo" "mov_imm")]
          )

     because it specifies ‘arch’ twice.


File: gccint.info,  Node: Predicates,  Next: Constraints,  Prev: Compact Syntax,  Up: Machine Desc

17.8 Predicates
===============

A predicate determines whether a ‘match_operand’ or ‘match_operator’
expression matches, and therefore whether the surrounding instruction
pattern will be used for that combination of operands.  GCC has a number
of machine-independent predicates, and you can define machine-specific
predicates as needed.  By convention, predicates used with
‘match_operand’ have names that end in ‘_operand’, and those used with
‘match_operator’ have names that end in ‘_operator’.

 All predicates are boolean functions (in the mathematical sense) of two
arguments: the RTL expression that is being considered at that position
in the instruction pattern, and the machine mode that the
‘match_operand’ or ‘match_operator’ specifies.  In this section, the
first argument is called OP and the second argument MODE.  Predicates
can be called from C as ordinary two-argument functions; this can be
useful in output templates or other machine-specific code.

 Operand predicates can allow operands that are not actually acceptable
to the hardware, as long as the constraints give reload the ability to
fix them up (*note Constraints::).  However, GCC will usually generate
better code if the predicates specify the requirements of the machine
instructions as closely as possible.  Reload cannot fix up operands that
must be constants ("immediate operands"); you must use a predicate that
allows only constants, or else enforce the requirement in the extra
condition.

 Most predicates handle their MODE argument in a uniform manner.  If
MODE is ‘VOIDmode’ (unspecified), then OP can have any mode.  If MODE is
anything else, then OP must have the same mode, unless OP is a
‘CONST_INT’ or integer ‘CONST_DOUBLE’.  These RTL expressions always
have ‘VOIDmode’, so it would be counterproductive to check that their
mode matches.  Instead, predicates that accept ‘CONST_INT’ and/or
integer ‘CONST_DOUBLE’ check that the value stored in the constant will
fit in the requested mode.

 Predicates with this behavior are called “normal”.  ‘genrecog’ can
optimize the instruction recognizer based on knowledge of how normal
predicates treat modes.  It can also diagnose certain kinds of common
errors in the use of normal predicates; for instance, it is almost
always an error to use a normal predicate without specifying a mode.
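
 The mode handling of normal predicates described above can be sketched
in a few lines of C.  This is only a toy model: the ‘toy_’ names, the
three-mode enumeration and the operand structure are hypothetical and
are not GCC's own types, and the sketch assumes that "fits" means
representable as a sign-extended integer of the mode's width.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for machine modes, for this sketch only.  */
enum toy_mode { TOY_VOID, TOY_QI, TOY_HI, TOY_SI };

/* A toy operand.  Integer constants carry TOY_VOID as their mode,
   mirroring the fact that CONST_INT always has VOIDmode.  */
struct toy_op
{
  bool is_const_int;
  enum toy_mode mode;
  int64_t value;   /* Meaningful only when is_const_int is true.  */
};

/* Width in bits of each toy mode (0 for TOY_VOID).  */
static int
toy_mode_bits (enum toy_mode mode)
{
  switch (mode)
    {
    case TOY_QI: return 8;
    case TOY_HI: return 16;
    case TOY_SI: return 32;
    default: return 0;
    }
}

/* Return true if VALUE is representable as a sign-extended integer
   of BITS bits (an assumption of this sketch).  */
static bool
toy_fits (int64_t value, int bits)
{
  if (bits <= 0 || bits >= 64)
    return false;
  int64_t lo = -((int64_t) 1 << (bits - 1));
  int64_t hi = ((int64_t) 1 << (bits - 1)) - 1;
  return value >= lo && value <= hi;
}

/* The mode test a "normal" predicate applies to OP for MODE.  */
static bool
toy_normal_mode_ok (const struct toy_op *op, enum toy_mode mode)
{
  if (mode == TOY_VOID)
    return true;                  /* Unspecified: OP may have any mode.  */
  if (op->is_const_int)
    return toy_fits (op->value, toy_mode_bits (mode));
  return op->mode == mode;        /* Otherwise the modes must agree.  */
}
```

 In this model a register-like operand in ‘TOY_SI’ is rejected when
‘TOY_HI’ is requested, while the constant 200 is rejected for ‘TOY_QI’
but accepted for ‘TOY_HI’.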

 Predicates that do something different with their MODE argument are
called “special”.  The generic predicates ‘address_operand’ and
‘pmode_register_operand’ are special predicates.  ‘genrecog’ does not do
any optimizations or diagnosis when special predicates are used.

* Menu:

* Machine-Independent Predicates::  Predicates available to all back ends.
* Defining Predicates::             How to write machine-specific predicate
                                    functions.


File: gccint.info,  Node: Machine-Independent Predicates,  Next: Defining Predicates,  Up: Predicates

17.8.1 Machine-Independent Predicates
-------------------------------------

These are the generic predicates available to all back ends.  They are
defined in ‘recog.cc’.  The first category of predicates allows only
constant, or “immediate”, operands.

 -- Function: immediate_operand
     This predicate allows any sort of constant that fits in MODE.  It
     is an appropriate choice for instructions that take operands that
     must be constant.

 -- Function: const_int_operand
     This predicate allows any ‘CONST_INT’ expression that fits in MODE.
     It is an appropriate choice for an immediate operand that does not
     allow a symbol or label.

 -- Function: const_double_operand
     This predicate accepts any ‘CONST_DOUBLE’ expression that has
     exactly MODE.  If MODE is ‘VOIDmode’, it will also accept
     ‘CONST_INT’.  It is intended for immediate floating point
     constants.

The second category of predicates allows only some kind of machine
register.

 -- Function: register_operand
     This predicate allows any ‘REG’ or ‘SUBREG’ expression that is
     valid for MODE.  It is often suitable for arithmetic instruction
     operands on a RISC machine.

 -- Function: pmode_register_operand
     This is a slight variant on ‘register_operand’ which works around a
     limitation in the machine-description reader.

          (match_operand N "pmode_register_operand" CONSTRAINT)

     means exactly what

          (match_operand:P N "register_operand" CONSTRAINT)

     would mean, if the machine-description reader accepted ‘:P’ mode
     suffixes.  Unfortunately, it cannot, because ‘Pmode’ is an alias
     for some other mode, and might vary with machine-specific options.
     *Note Misc::.

 -- Function: scratch_operand
     This predicate allows hard registers and ‘SCRATCH’ expressions, but
     not pseudo-registers.  It is used internally by ‘match_scratch’; it
     should not be used directly.

The third category of predicates allows only some kind of memory
reference.

 -- Function: memory_operand
     This predicate allows any valid reference to a quantity of mode
     MODE in memory, as determined by the weak form of
     ‘GO_IF_LEGITIMATE_ADDRESS’ (*note Addressing Modes::).

 -- Function: address_operand
     This predicate is a little unusual; it allows any operand that is a
     valid expression for the _address_ of a quantity of mode MODE,
     again determined by the weak form of ‘GO_IF_LEGITIMATE_ADDRESS’.
     To first order, if ‘(mem:MODE (EXP))’ is acceptable to
     ‘memory_operand’, then EXP is acceptable to ‘address_operand’.
     Note that EXP does not necessarily have the mode MODE.

 -- Function: indirect_operand
     This is a stricter form of ‘memory_operand’ which allows only
     memory references with a ‘general_operand’ as the address
     expression.  New uses of this predicate are discouraged, because
     ‘general_operand’ is very permissive, so it's hard to tell what an
     ‘indirect_operand’ does or does not allow.  If a target has
     different requirements for memory operands for different
     instructions, it is better to define target-specific predicates
     which enforce the hardware's requirements explicitly.

 -- Function: push_operand
     This predicate allows a memory reference suitable for pushing a
     value onto the stack.  This will be a ‘MEM’ which refers to
     ‘stack_pointer_rtx’, with a side effect in its address expression
     (*note Incdec::); which one is determined by the ‘STACK_PUSH_CODE’
     macro (*note Frame Layout::).

 -- Function: pop_operand
     This predicate allows a memory reference suitable for popping a
     value off the stack.  Again, this will be a ‘MEM’ referring to
     ‘stack_pointer_rtx’, with a side effect in its address expression.
     However, this time ‘STACK_POP_CODE’ is expected.

The fourth category of predicates allows some combination of the above
operands.

 -- Function: nonmemory_operand
     This predicate allows any immediate or register operand valid for
     MODE.

 -- Function: nonimmediate_operand
     This predicate allows any register or memory operand valid for
     MODE.

 -- Function: general_operand
     This predicate allows any immediate, register, or memory operand
     valid for MODE.

Finally, there are two generic operator predicates.

 -- Function: comparison_operator
     This predicate matches any expression which performs an arithmetic
     comparison in MODE; that is, ‘COMPARISON_P’ is true for the
     expression code.

 -- Function: ordered_comparison_operator
     This predicate matches any expression which performs an arithmetic
     comparison in MODE and whose expression code is valid for integer
     modes; that is, the expression code will be one of ‘eq’, ‘ne’,
     ‘lt’, ‘ltu’, ‘le’, ‘leu’, ‘gt’, ‘gtu’, ‘ge’, ‘geu’.


File: gccint.info,  Node: Defining Predicates,  Prev: Machine-Independent Predicates,  Up: Predicates

17.8.2 Defining Machine-Specific Predicates
-------------------------------------------

Many machines have requirements for their operands that cannot be
expressed precisely using the generic predicates.  You can define
additional predicates using ‘define_predicate’ and
‘define_special_predicate’ expressions.  These expressions have three
operands:

   • The name of the predicate, as it will be referred to in
     ‘match_operand’ or ‘match_operator’ expressions.

   • An RTL expression which evaluates to true if the predicate allows
     the operand OP, false if it does not.  This expression can only use
     the following RTL codes:

     ‘MATCH_OPERAND’
          When written inside a predicate expression, a ‘MATCH_OPERAND’
          expression evaluates to true if the predicate it names would
          allow OP.  The operand number and constraint are ignored.  Due
          to limitations in ‘genrecog’, you can only refer to generic
          predicates and predicates that have already been defined.

     ‘MATCH_CODE’
          This expression evaluates to true if OP or a specified
          subexpression of OP has one of a given list of RTX codes.

          The first operand of this expression is a string constant
          containing a comma-separated list of RTX code names (in lower
          case).  These are the codes for which the ‘MATCH_CODE’ will be
          true.

          The second operand is a string constant which indicates what
          subexpression of OP to examine.  If it is absent or the empty
          string, OP itself is examined.  Otherwise, the string constant
          must be a sequence of digits and/or lowercase letters.  Each
          character indicates a subexpression to extract from the
          current expression; for the first character this is OP, for
          the second and subsequent characters it is the result of the
          previous character.  A digit N extracts ‘XEXP (E, N)’; a
          letter L extracts ‘XVECEXP (E, 0, N)’ where N is the
          alphabetic ordinal of L (0 for 'a', 1 for 'b', and so on).
          The ‘MATCH_CODE’ then examines the RTX code of the
          subexpression extracted by the complete string.  It is not
          possible to extract components of an ‘rtvec’ that is not at
          position 0 within its RTX object.

     ‘MATCH_TEST’
          This expression has one operand, a string constant containing
          a C expression.  The predicate's arguments, OP and MODE, are
          available with those names in the C expression.  The
          ‘MATCH_TEST’ evaluates to true if the C expression evaluates
          to a nonzero value.  ‘MATCH_TEST’ expressions must not have
          side effects.

     ‘AND’
     ‘IOR’
     ‘NOT’
     ‘IF_THEN_ELSE’
          The basic ‘MATCH_’ expressions can be combined using these
          logical operators, which have the semantics of the C operators
          ‘&&’, ‘||’, ‘!’, and ‘? :’ respectively.  As in Common Lisp,
          you may give an ‘AND’ or ‘IOR’ expression an arbitrary number
          of arguments; this has exactly the same effect as writing a
          chain of two-argument ‘AND’ or ‘IOR’ expressions.

   • An optional block of C code, which should execute ‘return true’ if
     the predicate is found to match and ‘return false’ if it does not.
     It must not have any side effects.  The predicate arguments, OP and
     MODE, are available with those names.

     If a code block is present in a predicate definition, then the RTL
     expression must evaluate to true _and_ the code block must execute
     ‘return true’ for the predicate to allow the operand.  The RTL
     expression is evaluated first; do not re-check anything in the code
     block that was checked in the RTL expression.
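
 The path string accepted as the second operand of ‘MATCH_CODE’, as
described in the list above, can be illustrated with a small decoder.
This is a sketch only: the ‘toy_’ names are hypothetical, the code does
not touch real RTL, and it assumes a character set in which the
lowercase letters are contiguous (as in ASCII).

```c
#include <assert.h>

/* What one character of a MATCH_CODE path asks for.  */
enum toy_step_kind { TOY_BAD, TOY_XEXP, TOY_XVECEXP };

struct toy_step
{
  enum toy_step_kind kind;
  int index;   /* N in XEXP (E, N) or in XVECEXP (E, 0, N).  */
};

/* Decode a single path character C.  A digit selects XEXP (E, N);
   a lowercase letter selects XVECEXP (E, 0, N) with N being the
   letter's ordinal (0 for 'a').  Anything else is TOY_BAD.  */
static struct toy_step
toy_decode_char (char c)
{
  struct toy_step step = { TOY_BAD, -1 };
  if (c >= '0' && c <= '9')
    {
      step.kind = TOY_XEXP;
      step.index = c - '0';
    }
  else if (c >= 'a' && c <= 'z')
    {
      step.kind = TOY_XVECEXP;
      step.index = c - 'a';   /* Assumes contiguous letters.  */
    }
  return step;
}

/* Return the number of extraction steps in PATH, or -1 if PATH
   contains a character that is neither a digit nor a lowercase
   letter.  The empty string has zero steps: OP itself is examined.  */
static int
toy_path_length (const char *path)
{
  int n = 0;
  for (; *path != '\0'; path++, n++)
    if (toy_decode_char (*path).kind == TOY_BAD)
      return -1;
  return n;
}
```

 Under this model the path ‘"0a"’ has two steps: first ‘XEXP (OP, 0)’,
then ‘XVECEXP (E, 0, 0)’ applied to that result.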

 The program ‘genrecog’ scans ‘define_predicate’ and
‘define_special_predicate’ expressions to determine which RTX codes are
possibly allowed.  You should always make this explicit in the RTL
predicate expression, using ‘MATCH_OPERAND’ and ‘MATCH_CODE’.

 Here is an example of a simple predicate definition, from the IA64
machine description:

     ;; True if OP is a ‘SYMBOL_REF’ which refers to the sdata section.
     (define_predicate "small_addr_symbolic_operand"
       (and (match_code "symbol_ref")
            (match_test "SYMBOL_REF_SMALL_ADDR_P (op)")))

And here is another, showing the use of the C block.

     ;; True if OP is a register operand that is (or could be) a GR reg.
     (define_predicate "gr_register_operand"
       (match_operand 0 "register_operand")
     {
       unsigned int regno;
       if (GET_CODE (op) == SUBREG)
         op = SUBREG_REG (op);

       regno = REGNO (op);
       return (regno >= FIRST_PSEUDO_REGISTER || GENERAL_REGNO_P (regno));
     })

 Predicates written with ‘define_predicate’ automatically include a test
that MODE is ‘VOIDmode’, or OP has the same mode as MODE, or OP is a
‘CONST_INT’ or ‘CONST_DOUBLE’.  They do _not_ check specifically for
integer ‘CONST_DOUBLE’, nor do they test that the value of either kind
of constant fits in the requested mode.  This is because target-specific
predicates that take constants usually have to do more stringent value
checks anyway.  If you need the exact same treatment of ‘CONST_INT’ or
‘CONST_DOUBLE’ that the generic predicates provide, use a
‘MATCH_OPERAND’ subexpression to call ‘const_int_operand’,
‘const_double_operand’, or ‘immediate_operand’.

 Predicates written with ‘define_special_predicate’ do not get any
automatic mode checks, and are treated as having special mode handling
by ‘genrecog’.

 The program ‘genpreds’ is responsible for generating code to test
predicates.  It also writes a header file containing function
declarations for all machine-specific predicates.  It is not necessary
to declare these predicates in ‘CPU-protos.h’.


File: gccint.info,  Node: Constraints,  Next: Standard Names,  Prev: Predicates,  Up: Machine Desc

17.9 Operand Constraints
========================

Each ‘match_operand’ in an instruction pattern can specify constraints
for the operands allowed.  The constraints allow you to fine-tune
matching within the set of operands allowed by the predicate.

 Constraints can say whether an operand may be in a register, and which
kinds of register; whether the operand can be a memory reference, and
which kinds of address; whether the operand may be an immediate
constant, and which possible values it may have.  Constraints can also
require two operands to match.  Side-effects aren't allowed in operands
of inline ‘asm’, unless ‘<’ or ‘>’ constraints are used, because there
is no guarantee that the side effects will happen exactly once in an
instruction that can update the addressing register.

* Menu:

* Simple Constraints::  Basic use of constraints.
* Multi-Alternative::   When an insn has two alternative constraint-patterns.
* Class Preferences::   Constraints guide which hard register to put things in.
* Modifiers::           More precise control over effects of constraints.
* Machine Constraints:: Existing constraints for some particular machines.
* Disable Insn Alternatives:: Disable insn alternatives using attributes.
* Define Constraints::  How to define machine-specific constraints.
* C Constraint Interface:: How to test constraints from C code.


File: gccint.info,  Node: Simple Constraints,  Next: Multi-Alternative,  Up: Constraints

17.9.1 Simple Constraints
-------------------------

The simplest kind of constraint is a string full of letters, each of
which describes one kind of operand that is permitted.  Here are the
letters that are allowed:

whitespace
     Whitespace characters are ignored and can be inserted at any
     position except the first.  This enables each alternative for
     different operands to be visually aligned in the machine
     description even if they have a different number of constraints
     and modifiers.

‘m’
     A memory operand is allowed, with any kind of address that the
     machine supports in general.  Note that the letter used for the
     general memory constraint can be re-defined by a back end using the
     ‘TARGET_MEM_CONSTRAINT’ macro.

‘o’
     A memory operand is allowed, but only if the address is
     “offsettable”.  This means that a small integer (actually, the
     width in bytes of the operand, as determined by its machine mode)
     may be added to the address and the result is also a valid memory
     address.

     For example, an address which is constant is offsettable; so is an
     address that is the sum of a register and a constant (as long as a
     slightly larger constant is also within the range of
     address-offsets supported by the machine); but an autoincrement or
     autodecrement address is not offsettable.  More complicated
     indirect/indexed addresses may or may not be offsettable depending
     on the other addressing modes that the machine supports.

     Note that in an output operand which can be matched by another
     operand, the constraint letter ‘o’ is valid only when accompanied
     by both ‘<’ (if the target machine has predecrement addressing) and
     ‘>’ (if the target machine has preincrement addressing).

‘V’
     A memory operand that is not offsettable.  In other words, anything
     that would fit the ‘m’ constraint but not the ‘o’ constraint.

‘<’
     A memory operand with autodecrement addressing (either predecrement
     or postdecrement) is allowed.  In inline ‘asm’ this constraint is
     only allowed if the operand is used exactly once in an instruction
     that can handle the side effects.  It is not valid to leave an
     operand with ‘<’ in its constraint string out of the inline ‘asm’
     pattern altogether, or to use it in multiple instructions, because
     the side effects would then not be performed or would be performed
     more than once.  Furthermore, on some targets an operand with ‘<’
     in its constraint string must be accompanied by a special
     instruction suffix, such as ‘%U0’ on PowerPC or ‘%P0’ on IA-64.

‘>’
     A memory operand with autoincrement addressing (either preincrement
     or postincrement) is allowed.  In inline ‘asm’ the same
     restrictions as for ‘<’ apply.

‘r’
     A register operand is allowed provided that it is in a general
     register.

‘i’
     An immediate integer operand (one with constant value) is allowed.
     This includes symbolic constants whose values will be known only at
     assembly time or later.

‘n’
     An immediate integer operand with a known numeric value is allowed.
     Many systems cannot support assembly-time constants for operands
     less than a word wide.  Constraints for these operands should use
     ‘n’ rather than ‘i’.

‘I’, ‘J’, ‘K’, ... ‘P’
     Other letters in the range ‘I’ through ‘P’ may be defined in a
     machine-dependent fashion to permit immediate integer operands with
     explicit integer values in specified ranges.  For example, on the
     68000, ‘I’ is defined to stand for the range of values 1 to 8.
     This is the range permitted as a shift count in the shift
     instructions.

‘E’
     An immediate floating operand (expression code ‘const_double’) is
     allowed, but only if the target floating point format is the same
     as that of the host machine (on which the compiler is running).

‘F’
     An immediate floating operand (expression code ‘const_double’ or
     ‘const_vector’) is allowed.

‘G’, ‘H’
     ‘G’ and ‘H’ may be defined in a machine-dependent fashion to permit
     immediate floating operands in particular ranges of values.

‘s’
     An immediate integer operand whose value is not an explicit integer
     is allowed.

     This might appear strange; if an insn allows a constant operand
     with a value not known at compile time, it certainly must allow any
     known value.  So why use ‘s’ instead of ‘i’?  Sometimes it allows
     better code to be generated.

     For example, on the 68000 in a fullword instruction it is possible
     to use an immediate operand; but if the immediate value is between
     −128 and 127, better code results from loading the value into a
     register and using the register.  This is because the load into the
     register can be done with a ‘moveq’ instruction.  We arrange for
     this to happen by defining the letter ‘K’ to mean "any integer
     outside the range −128 to 127", and then specifying ‘Ks’ in the
     operand constraints.

‘g’
     Any register, memory or immediate integer operand is allowed,
     except for registers that are not general registers.

‘X’
     Any operand whatsoever is allowed, even if it does not satisfy
     ‘general_operand’.  This is normally used in the constraint of a
     ‘match_scratch’ when certain alternatives will not actually require
     a scratch register.

‘0’, ‘1’, ‘2’, ... ‘9’
     An operand that matches the specified operand number is allowed.
     If a digit is used together with letters within the same
     alternative, the digit should come last.

     This number is allowed to be more than a single digit.  If multiple
     digits are encountered consecutively, they are interpreted as a
     single decimal integer.  There is scant chance for ambiguity, since
     to date it has never been desirable that ‘10’ be interpreted as
     matching either operand 1 _or_ operand 0.  Should this be desired,
     one can use multiple alternatives instead.

     This is called a “matching constraint” and what it really means is
     that the assembler has only a single operand that fills two roles
     considered separate in the RTL insn.  For example, an add insn has
     two input operands and one output operand in the RTL, but on most
     CISC machines an add instruction really has only two operands, one
     of them an input-output operand:

          addl #35,r12

     Matching constraints are used in these circumstances.  More
     precisely, the two operands that match must include one input-only
     operand and one output-only operand.  Moreover, the digit must be a
     smaller number than the number of the operand that uses it in the
     constraint.

     For operands to match in a particular case usually means that they
     are identical-looking RTL expressions.  But in a few special cases
     specific kinds of dissimilarity are allowed.  For example, ‘*x’ as
     an input operand will match ‘*x++’ as an output operand.  For
     proper results in such cases, the output template should always use
     the output-operand's number when printing the operand.

‘p’
     An operand that is a valid memory address is allowed.  This is for
     "load address" and "push address" instructions.

     ‘p’ in the constraint must be accompanied by ‘address_operand’ as
     the predicate in the ‘match_operand’.  This predicate interprets
     the mode specified in the ‘match_operand’ as the mode of the memory
     reference for which the address would be valid.

OTHER-LETTERS
     Other letters can be defined in machine-dependent fashion to stand
     for particular classes of registers or other arbitrary operand
     types.  ‘d’, ‘a’ and ‘f’ are defined on the 68000/68020 to stand
     for data, address and floating point registers.
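
 The rule given in the table above for matching constraints, that
consecutive digits form a single decimal operand number, can be
sketched with the standard C function ‘strtoul’.  The helper name
‘toy_matching_operand’ is hypothetical and exists only for this
illustration; it is not presented as GCC's own implementation.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* If the constraint text at P begins with a digit, return the operand
   number formed by all the consecutive digits; otherwise return -1.
   If ENDP is not null, store through it a pointer to the first
   character after the digits (or P itself when there are none).  */
static long
toy_matching_operand (const char *p, const char **endp)
{
  char *end;
  unsigned long n;

  /* Check for a digit first: strtoul alone would also accept leading
     whitespace and a sign, which are not part of this rule.  */
  if (!isdigit ((unsigned char) *p))
    {
      if (endp)
        *endp = p;
      return -1;
    }
  n = strtoul (p, &end, 10);
  if (endp)
    *endp = end;
  return (long) n;
}
```

 With this helper the text ‘10’ yields operand 10, not operand 1
followed by operand 0, which is the reading the table describes.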

 In order to have valid assembler code, each operand must satisfy its
constraint.  But a failure to do so does not prevent the pattern from
applying to an insn.  Instead, it directs the compiler to modify the
code so that the constraint will be satisfied.  Usually this is done by
copying an operand into a register.

 Contrast, therefore, the two instruction patterns that follow:

     (define_insn ""
       [(set (match_operand:SI 0 "general_operand" "=r")
             (plus:SI (match_dup 0)
                      (match_operand:SI 1 "general_operand" "r")))]
       ""
       "...")

which has two operands, one of which must appear in two places, and

     (define_insn ""
       [(set (match_operand:SI 0 "general_operand" "=r")
             (plus:SI (match_operand:SI 1 "general_operand" "0")
                      (match_operand:SI 2 "general_operand" "r")))]
       ""
       "...")

which has three operands, two of which are required by a constraint to
be identical.  If we are considering an insn of the form

     (insn N PREV NEXT
       (set (reg:SI 3)
            (plus:SI (reg:SI 6) (reg:SI 109)))
       ...)

the first pattern would not apply at all, because this insn does not
contain two identical subexpressions in the right place.  The pattern
would say, "That does not look like an add instruction; try other
patterns".  The second pattern would say, "Yes, that's an add
instruction, but there is something wrong with it".  It would direct the
reload pass of the compiler to generate additional insns to make the
constraint true.  The results might look like this:

     (insn N2 PREV N
       (set (reg:SI 3) (reg:SI 6))
       ...)

     (insn N N2 NEXT
       (set (reg:SI 3)
            (plus:SI (reg:SI 3) (reg:SI 109)))
       ...)

 It is up to you to make sure that each operand, in each pattern, has
constraints that can handle any RTL expression that could be present for
that operand.  (When multiple alternatives are in use, each pattern
must, for each possible combination of operand expressions, have at
least one alternative which can handle that combination of operands.)
The constraints don't need to _allow_ any possible operand--when this is
the case, they do not constrain--but they must at least point the way to
reloading any possible operand so that it will fit.

   • If the constraint accepts whatever operands the predicate permits,
     there is no problem: reloading is never necessary for this operand.

     For example, an operand whose constraints permit everything except
     registers is safe provided its predicate rejects registers.

     An operand whose predicate accepts only constant values is safe
     provided its constraints include the letter ‘i’.  If any possible
     constant value is accepted, then nothing less than ‘i’ will do; if
     the predicate is more selective, then the constraints may also be
     more selective.

   • Any operand expression can be reloaded by copying it into a
     register.  So if an operand's constraints allow some kind of
     register, it is certain to be safe.  It need not permit all classes
     of registers; the compiler knows how to copy a register into
     another register of the proper class in order to make an
     instruction valid.

   • A nonoffsettable memory reference can be reloaded by copying the
     address into a register.  So if the constraint uses the letter ‘o’,
     all memory references are taken care of.

   • A constant operand can be reloaded by allocating space in memory to
     hold it as preinitialized data.  Then the memory reference can be
     used in place of the constant.  So if the constraint uses the
     letters ‘o’ or ‘m’, constant operands are not a problem.

   • If the constraint permits a constant and a pseudo register used in
     an insn was not allocated to a hard register and is equivalent to a
     constant, the register will be replaced with the constant.  If the
     predicate does not permit a constant and the insn is re-recognized
     for some reason, the compiler will crash.  Thus the predicate must
     always recognize any objects allowed by the constraint.

 If the operand's predicate can recognize registers, but the constraint
does not permit them, it can make the compiler crash.  When this operand
happens to be a register, the reload pass will be stymied, because it
does not know how to copy a register temporarily into memory.

 If the predicate accepts a unary operator, the constraint applies to
the operand of that operator.  For example, the MIPS processor at ISA
level 3 supports an instruction which adds two registers in ‘SImode’ to
produce a ‘DImode’ result, but only if the registers are correctly sign
extended.  The predicate for the input operands accepts a ‘sign_extend’
of an ‘SImode’ register.  Write the constraint to indicate the type of
register that is required for the operand of the ‘sign_extend’.


File: gccint.info,  Node: Multi-Alternative,  Next: Class Preferences,  Prev: Simple Constraints,  Up: Constraints

17.9.2 Multiple Alternative Constraints
---------------------------------------

Sometimes a single instruction has multiple alternative sets of possible
operands.  For example, on the 68000, a logical-or instruction can
combine a register or an immediate value into memory, or it can combine
any kind of operand into a register; but it cannot combine one memory
location into another.

 These constraints are represented as multiple alternatives.  An
alternative can be described by a series of letters for each operand.
The overall constraint for an operand is made from the letters for this
operand from the first alternative, a comma, the letters for this
operand from the second alternative, a comma, and so on until the last
alternative.  All operands for a single instruction must have the same
number of alternatives.  Here is how it is done for fullword logical-or
on the 68000:

     (define_insn "iorsi3"
       [(set (match_operand:SI 0 "general_operand" "=m,d")
             (ior:SI (match_operand:SI 1 "general_operand" "%0,0")
                     (match_operand:SI 2 "general_operand" "dKs,dmKs")))]
       ...)

 The first alternative has ‘m’ (memory) for operand 0, ‘0’ for operand 1
(meaning it must match operand 0), and ‘dKs’ for operand 2.  The second
alternative has ‘d’ (data register) for operand 0, ‘0’ for operand 1,
and ‘dmKs’ for operand 2.  The ‘=’ and ‘%’ in the constraints apply to
all the alternatives; their meaning is explained in a later section
(*note Modifiers::).
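
 The way an operand's overall constraint string is built from
comma-separated alternatives can be illustrated by splitting it back
apart.  The following is a sketch only; both helper names are
hypothetical.  It does no whitespace stripping, and modifiers such as
‘=’ and ‘%’ simply stay attached to the text of the first alternative.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the number of comma-separated alternatives in CONSTRAINT.  */
static int
toy_count_alternatives (const char *constraint)
{
  int n = 1;
  for (; *constraint != '\0'; constraint++)
    if (*constraint == ',')
      n++;
  return n;
}

/* Copy alternative number WHICH (counting from 0) of CONSTRAINT into
   BUF, which has room for SIZE bytes, and return BUF.  Return NULL if
   there is no such alternative or if it does not fit in BUF.  */
static const char *
toy_get_alternative (const char *constraint, int which,
                     char *buf, size_t size)
{
  const char *start = constraint;
  const char *end;
  size_t len;

  if (which < 0)
    return NULL;
  while (which > 0)
    {
      start = strchr (start, ',');
      if (start == NULL)
        return NULL;
      start++;          /* Skip the comma itself.  */
      which--;
    }
  end = strchr (start, ',');
  len = end ? (size_t) (end - start) : strlen (start);
  if (len >= size)
    return NULL;
  memcpy (buf, start, len);
  buf[len] = '\0';
  return buf;
}
```

 Applied to operand 2 of the ‘iorsi3’ pattern above, alternative 0 is
‘dKs’ and alternative 1 is ‘dmKs’.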

 If all the operands fit any one alternative, the instruction is valid.
Otherwise, for each alternative, the compiler counts how many
instructions must be added to copy the operands so that that alternative
applies.  The alternative requiring the least copying is chosen.  If two
alternatives need the same amount of copying, the one that comes first
is chosen.  These choices can be altered with the ‘?’ and ‘!’
characters:

‘?’
     Disparage slightly the alternative that the ‘?’ appears in, as a
     choice when no alternative applies exactly.  The compiler regards
     this alternative as one unit more costly for each ‘?’ that appears
     in it.

‘!’
     Disparage severely the alternative that the ‘!’ appears in.  This
     alternative can still be used if it fits without reloading, but if
     reloading is needed, some other alternative will be used.

‘^’
     This constraint is analogous to ‘?’ but it disparages slightly the
     alternative only if the operand with the ‘^’ needs a reload.

‘$’
     This constraint is analogous to ‘!’ but it disparages severely the
     alternative only if the operand with the ‘$’ needs a reload.
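
 The selection rule and the four disparagement characters can be modeled
roughly as follows.  This is a simplified sketch, not GCC's actual cost
computation: it assumes one copy per operand that needs reloading, and
it stands in for "disparage severely" with an arbitrary large penalty
(‘SEVERE’).  All names are hypothetical.

```python
# Simplified model of choosing among alternatives; not GCC's real algorithm.

SEVERE = 1000  # arbitrary large penalty standing in for "disparage severely"


def alternative_cost(operands):
    """operands: list of (constraint_text, needs_reload) for one alternative."""
    reloads = sum(1 for _, needs_reload in operands if needs_reload)
    if reloads == 0:
        return 0  # the alternative applies exactly; disparagement is moot
    cost = reloads  # assumption: one copy per reloaded operand
    for text, needs_reload in operands:
        cost += text.count("?")      # one unit per '?' in the alternative
        if "!" in text:
            cost += SEVERE           # '!' applies to the whole alternative
        if needs_reload:
            cost += text.count("^")  # '^' counts only for a reloaded operand
            if "$" in text:
                cost += SEVERE       # '$' likewise
    return cost


def choose_alternative(alternatives):
    """Return the index of the cheapest alternative; the first one wins ties."""
    costs = [alternative_cost(operands) for operands in alternatives]
    return costs.index(min(costs))


# Both alternatives need one reload, but the first carries a '?'.
print(choose_alternative([[("?r", True), ("r", False)],
                          [("m", True), ("r", False)]]))
```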

 When an insn pattern has multiple alternatives in its constraints,
often the appearance of the assembler code is determined mostly by which
alternative was matched.  When this is so, the C code for writing the
assembler code can use the variable ‘which_alternative’, which is the
ordinal number of the alternative that was actually satisfied (0 for the
first, 1 for the second alternative, etc.).  *Note Output Statement::.


File: gccint.info,  Node: Class Preferences,  Next: Modifiers,  Prev: Multi-Alternative,  Up: Constraints

17.9.3 Register Class Preferences
---------------------------------

The operand constraints have another function: they enable the compiler
to decide which kind of hardware register a pseudo register is best
allocated to.  The compiler examines the constraints that apply to the
insns that use the pseudo register, looking for the machine-dependent
letters such as ‘d’ and ‘a’ that specify classes of registers.  The
pseudo register is put in whichever class gets the most "votes".  The
constraint letters ‘g’ and ‘r’ also vote: they vote in favor of a
general register.  The machine description says which registers are
considered general.
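
 The voting idea can be illustrated with a small Python sketch.  The
letter-to-class table and the function name are assumptions made for the
example (loosely modeled on the 68000 letters mentioned above); the real
allocator weighs considerably more information than a plain count.

```python
# Toy model of register class "votes"; the mapping below is hypothetical.
from collections import Counter

LETTER_CLASS = {
    "d": "DATA_REGS",
    "a": "ADDR_REGS",
    "r": "GENERAL_REGS",
    "g": "GENERAL_REGS",
}


def preferred_class(constraint_strings):
    """Count one vote per class letter seen in the insns using a pseudo."""
    votes = Counter()
    for text in constraint_strings:
        for letter in text:
            if letter in LETTER_CLASS:
                votes[LETTER_CLASS[letter]] += 1
    if not votes:
        return None  # no letter expressed a preference
    return votes.most_common(1)[0][0]


# Constraints taken from three hypothetical insns that use the same pseudo.
print(preferred_class(["d", "d,a", "dm"]))
```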

 Of course, on some machines all registers are equivalent, and no
register classes are defined.  Then none of this complexity is relevant.


File: gccint.info,  Node: Modifiers,  Next: Machine Constraints,  Prev: Class Preferences,  Up: Constraints

17.9.4 Constraint Modifier Characters
-------------------------------------

Here are constraint modifier characters.

‘=’
     Means that this operand is written to by this instruction: the
     previous value is discarded and replaced by new data.

‘+’
     Means that this operand is both read and written by the
     instruction.

     When the compiler fixes up the operands to satisfy the constraints,
     it needs to know which operands are read by the instruction and
     which are written by it.  ‘=’ identifies an operand which is only
     written; ‘+’ identifies an operand that is both read and written;
     all other operands are assumed to only be read.

     If you specify ‘=’ or ‘+’ in a constraint, you put it in the first
     character of the constraint string.

‘&’
     Means (in a particular alternative) that this operand is an
     “earlyclobber” operand, which is written before the instruction is
     finished using the input operands.  Therefore, this operand may not
     lie in a register that is read by the instruction or as part of any
     memory address.

     ‘&’ applies only to the alternative in which it is written.  In
     constraints with multiple alternatives, sometimes one alternative
     requires ‘&’ while others do not.  See, for example, the ‘movdf’
     insn of the 68000.

     An operand which is read by the instruction can be tied to an
     earlyclobber operand if its only use as an input occurs before the
     early result is written.  Adding alternatives of this form often
     allows GCC to produce better code when only some of the read
     operands can be affected by the earlyclobber.  See, for example,
     the ‘mulsi3’ insn of the ARM.

     Furthermore, if the “earlyclobber” operand is also a read/write
     operand, then that operand is written only after it's used.

     ‘&’ does not obviate the need to write ‘=’ or ‘+’.  As
     “earlyclobber” operands are always written, a read-only
     “earlyclobber” operand is ill-formed and will be rejected by the
     compiler.

‘%’
     Declares the instruction to be commutative for this operand and the
     following operand.  This means that the compiler may interchange
     the two operands if that is the cheapest way to make all operands
     fit the constraints.  ‘%’ applies to all alternatives and must
     appear as the first character in the constraint.  Only read-only
     operands can use ‘%’.

     This is often used in patterns for addition instructions that
     really have only two operands: the result must go in one of the
     arguments.  Here for example, is how the 68000 halfword-add
     instruction is defined:

          (define_insn "addhi3"
            [(set (match_operand:HI 0 "general_operand" "=m,r")
                  (plus:HI (match_operand:HI 1 "general_operand" "%0,0")
                           (match_operand:HI 2 "general_operand" "di,g")))]
            ...)

     GCC can only handle one commutative pair in an asm; if you use
     more, the compiler may fail.  Note that you need not use the
     modifier if the two alternatives are strictly identical; this would
     only waste time in the reload pass.  The modifier is not
     operational after register allocation, so the result of
     ‘define_peephole2’ and ‘define_split’s performed after reload
     cannot rely on ‘%’ to make the intended insn match.

‘#’
     Says that all following characters, up to the next comma, are to be
     ignored as a constraint.  They are significant only for choosing
     register preferences.

‘*’
     Says that the following character should be ignored when choosing
     register preferences.  ‘*’ has no effect on the meaning of the
     constraint as a constraint, and no effect on reloading.  For LRA
     ‘*’ additionally disparages slightly the alternative if the
     following character matches the operand.

     Here is an example: the 68000 has an instruction to sign-extend a
     halfword in a data register, and can also sign-extend a value by
     copying it into an address register.  While either kind of register
     is acceptable, the constraints on an address-register destination
     are less strict, so it is best if register allocation makes an
     address register its goal.  Therefore, ‘*’ is used so that the ‘d’
     constraint letter (for data register) is ignored when computing
     register preferences.

          (define_insn "extendhisi2"
            [(set (match_operand:SI 0 "general_operand" "=*d,a")
                  (sign_extend:SI
                   (match_operand:HI 1 "general_operand" "0,g")))]
            ...)
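
 The different roles of ‘#’ and ‘*’ can be made concrete with a short
Python sketch that separates the letters used for matching from the
letters used for register preferences.  The function names are
hypothetical and the model ignores the other modifiers (for example
‘&’); it only mirrors the descriptions given above.

```python
# Sketch of how '#' and '*' split a constraint into two views.
# All function names are hypothetical; this is not GCC code.

def split_operand(constraint):
    """Drop leading '=', '+' and '%', then split into alternatives."""
    return constraint.lstrip("=+%").split(",")


def matching_letters(alternative):
    """Letters that act as constraints: '*' has no effect here, and
    everything from '#' to the end of the alternative is ignored."""
    return alternative.split("#", 1)[0].replace("*", "")


def preference_letters(alternative):
    """Letters that count for register preferences: the character after
    each '*' is skipped, while letters after '#' still count."""
    kept = []
    skip_next = False
    for ch in alternative:
        if skip_next:
            skip_next = False
        elif ch == "*":
            skip_next = True
        elif ch != "#":
            kept.append(ch)
    return "".join(kept)


# Operand 0 of the "extendhisi2" pattern above.
for alternative in split_operand("=*d,a"):
    print(repr(matching_letters(alternative)),
          repr(preference_letters(alternative)))
```

 For ‘"=*d,a"’ the ‘d’ still matches as a constraint in the first
alternative but contributes nothing to the preference, which is why only
the address-register letter influences allocation in that example.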


File: gccint.info,  Node: Machine Constraints,  Next: Disable Insn Alternatives,  Prev: Modifiers,  Up: Constraints

17.9.5 Constraints for Particular Machines
------------------------------------------

Whenever possible, you should use the general-purpose constraint letters
in ‘asm’ arguments, since they will convey meaning more readily to
people reading your code.  Failing that, use the constraint letters that
usually have very similar meanings across architectures.  The most
commonly used constraints are ‘m’ and ‘r’ (for memory and
general-purpose registers respectively; *note Simple Constraints::), and
‘I’, usually the letter indicating the most common immediate-constant
format.

 Each architecture defines additional constraints.  These constraints
are used by the compiler itself for instruction generation, as well as
for ‘asm’ statements; therefore, some of the constraints are not
particularly useful for ‘asm’.  Here is a summary of some of the
machine-dependent constraints available on some particular machines; it
includes both constraints that are useful for ‘asm’ and constraints that
aren't.  The compiler source file mentioned in the table heading for
each architecture is the definitive reference for the meanings of that
architecture's constraints.

_AArch64 family--‘config/aarch64/constraints.md’_
     ‘k’
          The stack pointer register (‘SP’)

     ‘w’
          Floating point register, Advanced SIMD vector register or SVE
          vector register

     ‘x’
          Like ‘w’, but restricted to registers 0 to 15 inclusive.

     ‘y’
          Like ‘w’, but restricted to registers 0 to 7 inclusive.

     ‘Upl’
          One of the low eight SVE predicate registers (‘P0’ to ‘P7’)

     ‘Upa’
          Any of the SVE predicate registers (‘P0’ to ‘P15’)

     ‘I’
          Integer constant that is valid as an immediate operand in an
          ‘ADD’ instruction

     ‘J’
          Integer constant that is valid as an immediate operand in a
          ‘SUB’ instruction (once negated)

     ‘K’
          Integer constant that can be used with a 32-bit logical
          instruction

     ‘L’
          Integer constant that can be used with a 64-bit logical
          instruction

     ‘M’
          Integer constant that is valid as an immediate operand in a
          32-bit ‘MOV’ pseudo instruction.  The ‘MOV’ may be assembled
          to one of several different machine instructions depending on
          the value

     ‘N’
          Integer constant that is valid as an immediate operand in a
          64-bit ‘MOV’ pseudo instruction

     ‘S’
          An absolute symbolic address or a label reference

     ‘Y’
          Floating point constant zero

     ‘Z’
          Integer constant zero

     ‘Ush’
          The high part (bits 12 and upwards) of the pc-relative address
          of a symbol within 4GB of the instruction

     ‘Q’
          A memory address which uses a single base register with no
          offset

     ‘Ump’
          A memory address suitable for a load/store pair instruction in
          SI, DI, SF and DF modes
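
     As an illustration of ‘I’ and ‘J’, the following Python sketch
     assumes the usual A64 ‘ADD’/‘SUB’ immediate encoding: a 12-bit
     unsigned value, optionally shifted left by 12 bits.  That encoding
     detail and the function names are assumptions of the example;
     ‘config/aarch64/constraints.md’ remains the definitive reference.

```python
# Sketch of the A64 ADD/SUB immediate test; names are hypothetical.

def fits_add_immediate(value):
    """True for a 12-bit unsigned immediate, optionally shifted left by 12."""
    return value == (value & 0xFFF) or value == (value & (0xFFF << 12))


def fits_sub_immediate(value):
    """'J'-style check: the constant is valid once negated."""
    return fits_add_immediate(-value)


for candidate in (4095, 4096, 4097, -1):
    print(candidate, fits_add_immediate(candidate), fits_sub_immediate(candidate))
```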

_AMD GCN --‘config/gcn/constraints.md’_
     ‘I’
          Immediate integer in the range −16 to 64

     ‘J’
          Immediate 16-bit signed integer

     ‘Kf’
          Immediate constant −1

     ‘L’
          Immediate 15-bit unsigned integer

     ‘A’
          Immediate constant that can be inlined in an instruction
          encoding: integer −16..64, or float 0.0, +/−0.5, +/−1.0,
          +/−2.0, +/−4.0, 1.0/(2.0*PI)

     ‘B’
          Immediate 32-bit signed integer that can be attached to an
          instruction encoding

     ‘C’
          Immediate 32-bit integer in range −16..4294967295 (i.e.
          32-bit unsigned integer or ‘A’ constraint)

     ‘DA’
          Immediate 64-bit constant that can be split into two ‘A’
          constants

     ‘DB’
          Immediate 64-bit constant that can be split into two ‘B’
          constants

     ‘U’
          Any ‘unspec’

     ‘Y’
          Any ‘symbol_ref’ or ‘label_ref’

     ‘v’
          VGPR register

     ‘a’
          Accelerator VGPR register (CDNA1 onwards)

     ‘Sg’
          SGPR register

     ‘SD’
          SGPR registers valid for instruction destinations, including
          VCC, M0 and EXEC

     ‘SS’
          SGPR registers valid for instruction sources, including VCC,
          M0, EXEC and SCC

     ‘Sm’
          SGPR registers valid as a source for scalar memory
          instructions (excludes M0 and EXEC)

     ‘Sv’
          SGPR registers valid as a source or destination for vector
          instructions (excludes EXEC)

     ‘ca’
          All condition registers: SCC, VCCZ, EXECZ

     ‘cs’
          Scalar condition register: SCC

     ‘cV’
          Vector condition register: VCC, VCC_LO, VCC_HI

     ‘e’
          EXEC register (EXEC_LO and EXEC_HI)

     ‘RB’
          Memory operand with address space suitable for ‘buffer_*’
          instructions

     ‘RF’
          Memory operand with address space suitable for ‘flat_*’
          instructions

     ‘RS’
          Memory operand with address space suitable for ‘s_*’
          instructions

     ‘RL’
          Memory operand with address space suitable for ‘ds_*’ LDS
          instructions

     ‘RG’
          Memory operand with address space suitable for ‘ds_*’ GDS
          instructions

     ‘RD’
          Memory operand with address space suitable for any ‘ds_*’
          instructions

     ‘RM’
          Memory operand with address space suitable for ‘global_*’
          instructions

_ARC --‘config/arc/constraints.md’_
     ‘q’
          Registers usable in ARCompact 16-bit instructions: ‘r0’-‘r3’,
          ‘r12’-‘r15’.  This constraint can only match when the ‘-mq’
          option is in effect.

     ‘e’
          Registers usable as base-regs of memory addresses in ARCompact
          16-bit memory instructions: ‘r0’-‘r3’, ‘r12’-‘r15’, ‘sp’.
          This constraint can only match when the ‘-mq’ option is in
          effect.

     ‘D’
          ARC FPX (dpfp) 64-bit registers.  ‘D0’, ‘D1’.

     ‘I’
          A signed 12-bit integer constant.

     ‘Cal’
          Constant for arithmetic/logical operations.  This might be any
          constant that can be put into a long immediate by the assembler
          or linker without involving a PIC relocation.

     ‘K’
          A 3-bit unsigned integer constant.

     ‘L’
          A 6-bit unsigned integer constant.

     ‘CnL’
          One's complement of a 6-bit unsigned integer constant.

     ‘CmL’
          Two's complement of a 6-bit unsigned integer constant.

     ‘M’
          A 5-bit unsigned integer constant.

     ‘O’
          A 7-bit unsigned integer constant.

     ‘P’
          An 8-bit unsigned integer constant.

     ‘H’
          Any const_double value.

_ARM family--‘config/arm/constraints.md’_

     ‘h’
          In Thumb state, the core registers ‘r8’-‘r15’.

     ‘k’
          The stack pointer register.

     ‘l’
          In Thumb state, the core registers ‘r0’-‘r7’.  In ARM state
          this is an alias for the ‘r’ constraint.

     ‘t’
          VFP floating-point registers ‘s0’-‘s31’.  Used for 32 bit
          values.

     ‘w’
          VFP floating-point registers ‘d0’-‘d31’ and the appropriate
          subset ‘d0’-‘d15’ based on command line options.  Used for 64
          bit values only.  Not valid for Thumb1.

     ‘y’
          The iWMMX co-processor registers.

     ‘z’
          The iWMMX GR registers.

     ‘G’
          The floating-point constant 0.0

     ‘I’
          Integer that is valid as an immediate operand in a data
          processing instruction.  That is, an integer in the range 0 to
          255 rotated by a multiple of 2

     ‘J’
          Integer in the range −4095 to 4095

     ‘K’
          Integer that satisfies constraint ‘I’ when inverted (ones
          complement)

     ‘L’
          Integer that satisfies constraint ‘I’ when negated (twos
          complement)

     ‘M’
          Integer in the range 0 to 32

     ‘Q’
          A memory reference where the exact address is in a single
          register (‘m’ is preferable for ‘asm’ statements)

     ‘R’
          An item in the constant pool

     ‘S’
          A symbol in the text segment of the current file

     ‘Uv’
          A memory reference suitable for VFP load/store insns
          (reg+constant offset)

     ‘Uy’
          A memory reference suitable for iWMMXt load/store
          instructions.

     ‘Uq’
          A memory reference suitable for the ARMv4 ldrsb instruction.
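
     The description of ‘I’ (an integer in the range 0 to 255 rotated by
     a multiple of 2) can be checked mechanically.  The Python sketch
     below assumes a 32-bit word and the ARM-state encoding; the
     function names are hypothetical.  ‘K’ and ‘L’ are then expressed in
     terms of ‘I’, as in the table above.

```python
# Sketch of the ARM-state "8-bit value rotated by an even amount" test.
# Assumes 32-bit words; names are hypothetical, not taken from GCC.

MASK32 = 0xFFFFFFFF


def rotate_left32(value, amount):
    """Rotate a 32-bit pattern left by amount bits."""
    amount %= 32
    return ((value << amount) | (value >> (32 - amount))) & MASK32


def satisfies_I(value):
    """True if some even rotation brings the pattern into the range 0..255."""
    pattern = value & MASK32
    return any(rotate_left32(pattern, rot) <= 0xFF for rot in range(0, 32, 2))


def satisfies_K(value):
    """'K': satisfies 'I' when inverted (ones complement)."""
    return satisfies_I(~value)


def satisfies_L(value):
    """'L': satisfies 'I' when negated (twos complement)."""
    return satisfies_I(-value)


for candidate in (0xFF, 0x100, 0x101, 0x102, 0x104, 0xFF000000):
    print(hex(candidate), satisfies_I(candidate))
```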

_AVR family--‘config/avr/constraints.md’_
     ‘l’
          Registers from r0 to r15

     ‘a’
          Registers from r16 to r23

     ‘d’
          Registers from r16 to r31

     ‘w’
          Registers from r24 to r31.  These registers can be used in
          ‘adiw’ command

     ‘e’
          Pointer register (r26-r31)

     ‘b’
          Base pointer register (r28-r31)

     ‘q’
          Stack pointer register (SPH:SPL)

     ‘t’
          Temporary register r0

     ‘x’
          Register pair X (r27:r26)

     ‘y’
          Register pair Y (r29:r28)

     ‘z’
          Register pair Z (r31:r30)

     ‘I’
          Constant greater than −1, less than 64

     ‘J’
          Constant greater than −64, less than 1

     ‘K’
          Constant integer 2

     ‘L’
          Constant integer 0

     ‘M’
          Constant that fits in 8 bits

     ‘N’
          Constant integer −1

     ‘O’
          Constant integer 8, 16, or 24

     ‘P’
          Constant integer 1

     ‘G’
          A floating point constant 0.0

     ‘Q’
          A memory address based on Y or Z pointer with displacement.

_Blackfin family--‘config/bfin/constraints.md’_
     ‘a’
          P register

     ‘d’
          D register

     ‘z’
          A call clobbered P register.

     ‘qN’
          A single register.  If N is in the range 0 to 7, the
          corresponding D register.  If it is ‘A’, then the register P0.

     ‘D’
          Even-numbered D register

     ‘W’
          Odd-numbered D register

     ‘e’
          Accumulator register.

     ‘A’
          Even-numbered accumulator register.

     ‘B’
          Odd-numbered accumulator register.

     ‘b’
          I register

     ‘v’
          B register

     ‘f’
          M register

     ‘c’
          Registers used for circular buffering, i.e. I, B, or L
          registers.

     ‘C’
          The CC register.

     ‘t’
          LT0 or LT1.

     ‘k’
          LC0 or LC1.

     ‘u’
          LB0 or LB1.

     ‘x’
          Any D, P, B, M, I or L register.

     ‘y’
          Additional registers typically used only in prologues and
          epilogues: RETS, RETN, RETI, RETX, RETE, ASTAT, SEQSTAT and
          USP.

     ‘w’
          Any register except accumulators or CC.

     ‘Ksh’
          Signed 16 bit integer (in the range −32768 to 32767)

     ‘Kuh’
          Unsigned 16 bit integer (in the range 0 to 65535)

     ‘Ks7’
          Signed 7 bit integer (in the range −64 to 63)

     ‘Ku7’
          Unsigned 7 bit integer (in the range 0 to 127)

     ‘Ku5’
          Unsigned 5 bit integer (in the range 0 to 31)

     ‘Ks4’
          Signed 4 bit integer (in the range −8 to 7)

     ‘Ks3’
          Signed 3 bit integer (in the range −4 to 3)

     ‘Ku3’
          Unsigned 3 bit integer (in the range 0 to 7)

     ‘PN’
          Constant N, where N is a single-digit constant in the range 0
          to 4.

     ‘PA’
          An integer equal to one of the MACFLAG_XXX constants that is
          suitable for use with either accumulator.

     ‘PB’
          An integer equal to one of the MACFLAG_XXX constants that is
          suitable for use only with accumulator A1.

     ‘M1’
          Constant 255.

     ‘M2’
          Constant 65535.

     ‘J’
          An integer constant with exactly a single bit set.

     ‘L’
          An integer constant with all bits set except exactly one.

     ‘H’
     ‘Q’
          Any SYMBOL_REF.
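
     The ranges quoted for the ‘Ks’N and ‘Ku’N constraints follow from
     the bit width alone, and ‘J’ and ‘L’ are simple bit-pattern tests.
     The Python sketch below shows the arithmetic; the function names
     and the 32-bit width used for the bit tests are assumptions of the
     example.

```python
# Sketch of the arithmetic behind n-bit ranges and single-bit tests.
# Names and the default 32-bit width are assumptions, not GCC definitions.

def signed_range(bits):
    """Inclusive range of a twos-complement integer of the given width."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def unsigned_range(bits):
    """Inclusive range of an unsigned integer of the given width."""
    return 0, (1 << bits) - 1


def has_single_bit_set(value, width=32):
    """'J'-style test: exactly one bit set within width bits."""
    pattern = value & ((1 << width) - 1)
    return pattern != 0 and (pattern & (pattern - 1)) == 0


def has_single_bit_clear(value, width=32):
    """'L'-style test: all bits set except exactly one."""
    return has_single_bit_set(~value, width)


for bits in (16, 7, 4):
    print(bits, signed_range(bits), unsigned_range(bits))
```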

_C-SKY--‘config/csky/constraints.md’_

     ‘a’
          The mini registers r0 - r7.

     ‘b’
          The low registers r0 - r15.

     ‘c’
          C register.

     ‘y’
          HI and LO registers.

     ‘l’
          LO register.

     ‘h’
          HI register.

     ‘v’
          Vector registers.

     ‘z’
          Stack pointer register (SP).

     ‘Q’
          A memory address which uses a base register with a short
          offset or with an index register with its scale.

     ‘W’
          A memory address which uses a base register with an index
          register with its scale.

     The C-SKY back end supports a large set of additional constraints
     that are only useful for instruction selection or splitting rather
     than inline asm, such as constraints representing constant integer
     ranges accepted by particular instruction encodings.  Refer to the
     source code for details.

_Epiphany--‘config/epiphany/constraints.md’_
     ‘U16’
          An unsigned 16-bit constant.

     ‘K’
          An unsigned 5-bit constant.

     ‘L’
          A signed 11-bit constant.

     ‘Cm1’
          A signed 11-bit constant added to −1.  Can only match when the
          ‘-m1reg-REG’ option is active.

     ‘Cl1’
          Left-shift of −1, i.e., a bit mask with a block of leading
          ones, the rest being a block of trailing zeroes.  Can only
          match when the ‘-m1reg-REG’ option is active.

     ‘Cr1’
          Right-shift of −1, i.e., a bit mask with a trailing block of
          ones, the rest being zeroes.  Or to put it another way, one
          less than a power of two.  Can only match when the
          ‘-m1reg-REG’ option is active.

     ‘Cal’
          Constant for arithmetic/logical operations.  This is like ‘i’,
          except that for position independent code, no symbols /
          expressions needing relocations are allowed.

     ‘Csy’
          Symbolic constant for call/jump instruction.

     ‘Rcs’
          The register class usable in short insns.  This is a register
          class constraint, and can thus drive register allocation.
          This constraint won't match unless ‘-mprefer-short-insn-regs’
          is in effect.

     ‘Rsc’
          The register class of registers that can be used to hold a
          sibcall call address.  I.e., a caller-saved register.

     ‘Rct’
          Core control register class.

     ‘Rgs’
          The register group usable in short insns.  This constraint
          does not use a register class, so that it only passively
          matches suitable registers, and doesn't drive register
          allocation.

     ‘Car’
          Constant suitable for the addsi3_r pattern.  This is a valid
          offset for byte, halfword, or word addressing.

     ‘Rra’
          Matches the return address if it can be replaced with the link
          register.

     ‘Rcc’
          Matches the integer condition code register.

     ‘Sra’
          Matches the return address if it is in a stack slot.

     ‘Cfm’
          Matches control register values to switch fp mode, which are
          encapsulated in ‘UNSPEC_FP_MODE’.
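
     The bit patterns described for ‘Cl1’ and ‘Cr1’ can be recognized
     with the usual power-of-two trick.  This Python sketch assumes
     32-bit words and hypothetical function names.  It also accepts the
     edge patterns 0 and −1, which the real constraints may or may not
     allow.

```python
# Sketch of mask tests for "shift of -1" patterns; 32-bit words assumed.
# Names are hypothetical; edge cases 0 and -1 are accepted by this model.

MASK32 = 0xFFFFFFFF


def is_right_shift_of_minus_one(value):
    """Trailing block of ones with zeros above: one less than a power of two."""
    pattern = value & MASK32
    return (pattern & (pattern + 1)) == 0


def is_left_shift_of_minus_one(value):
    """Leading block of ones with zeros below: the complement of the above."""
    return is_right_shift_of_minus_one(~value)


for candidate in (0xFF, 0xF0, 0xFFFF0000, 0x0FF0):
    print(hex(candidate),
          is_right_shift_of_minus_one(candidate),
          is_left_shift_of_minus_one(candidate))
```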

_FRV--‘config/frv/frv.h’_
     ‘a’
          Register in the class ‘ACC_REGS’ (‘acc0’ to ‘acc7’).

     ‘b’
          Register in the class ‘EVEN_ACC_REGS’ (‘acc0’ to ‘acc7’).

     ‘c’
          Register in the class ‘CC_REGS’ (‘fcc0’ to ‘fcc3’ and ‘icc0’
          to ‘icc3’).

     ‘d’
          Register in the class ‘GPR_REGS’ (‘gr0’ to ‘gr63’).

     ‘e’
          Register in the class ‘EVEN_REGS’ (‘gr0’ to ‘gr63’).  Odd
          registers are excluded not in the class but through the use of
          a machine mode larger than 4 bytes.

     ‘f’
          Register in the class ‘FPR_REGS’ (‘fr0’ to ‘fr63’).

     ‘h’
          Register in the class ‘FEVEN_REGS’ (‘fr0’ to ‘fr63’).  Odd
          registers are excluded not in the class but through the use of
          a machine mode larger than 4 bytes.

     ‘l’
          Register in the class ‘LR_REG’ (the ‘lr’ register).

     ‘q’
          Register in the class ‘QUAD_REGS’ (‘gr2’ to ‘gr63’).  Register
          numbers not divisible by 4 are excluded not in the class but
          through the use of a machine mode larger than 8 bytes.

     ‘t’
          Register in the class ‘ICC_REGS’ (‘icc0’ to ‘icc3’).

     ‘u’
          Register in the class ‘FCC_REGS’ (‘fcc0’ to ‘fcc3’).

     ‘v’
          Register in the class ‘ICR_REGS’ (‘cc4’ to ‘cc7’).

     ‘w’
          Register in the class ‘FCR_REGS’ (‘cc0’ to ‘cc3’).

     ‘x’
          Register in the class ‘QUAD_FPR_REGS’ (‘fr0’ to ‘fr63’).
          Register numbers not divisible by 4 are excluded not in the
          class but through the use of a machine mode larger than 8
          bytes.

     ‘z’
          Register in the class ‘SPR_REGS’ (‘lcr’ and ‘lr’).

     ‘A’
          Register in the class ‘QUAD_ACC_REGS’ (‘acc0’ to ‘acc7’).

     ‘B’
          Register in the class ‘ACCG_REGS’ (‘accg0’ to ‘accg7’).

     ‘C’
          Register in the class ‘CR_REGS’ (‘cc0’ to ‘cc7’).

     ‘G’
          Floating point constant zero

     ‘I’
          6-bit signed integer constant

     ‘J’
          10-bit signed integer constant

     ‘L’
          16-bit signed integer constant

     ‘M’
          16-bit unsigned integer constant

     ‘N’
          12-bit signed integer constant that is negative--i.e. in the
          range of −2048 to −1

     ‘O’
          Constant zero

     ‘P’
          12-bit signed integer constant that is greater than zero--i.e.
          in the range of 1 to 2047.

_FT32--‘config/ft32/constraints.md’_
     ‘A’
          An absolute address

     ‘B’
          An offset address

     ‘W’
          A register indirect memory operand

     ‘e’
          An offset address.

     ‘f’
          An offset address.

     ‘O’
          The constant zero or one

     ‘I’
          A 16-bit signed constant (−32768 ... 32767)

     ‘w’
          A bitfield mask suitable for bext or bins

     ‘x’
          An inverted bitfield mask suitable for bext or bins

     ‘L’
          A 16-bit unsigned constant, multiple of 4 (0 ... 65532)

     ‘S’
          A 20-bit signed constant (−524288 ... 524287)

     ‘b’
          A constant for a bitfield width (1 ... 16)

     ‘KA’
          A 10-bit signed constant (−512 ... 511)

_Hewlett-Packard PA-RISC--‘config/pa/pa.h’_
     ‘a’
          General register 1

     ‘f’
          Floating point register

     ‘q’
          Shift amount register

     ‘x’
          Floating point register (deprecated)

     ‘y’
          Upper floating point register (32-bit), floating point
          register (64-bit)

     ‘Z’
          Any register

     ‘I’
          Signed 11-bit integer constant

     ‘J’
          Signed 14-bit integer constant

     ‘K’
          Integer constant that can be deposited with a ‘zdepi’
          instruction

     ‘L’
          Signed 5-bit integer constant

     ‘M’
          Integer constant 0

     ‘N’
          Integer constant that can be loaded with a ‘ldil’ instruction

     ‘O’
          Integer constant whose value plus one is a power of 2

     ‘P’
          Integer constant that can be used for ‘and’ operations in
          ‘depi’ and ‘extru’ instructions

     ‘S’
          Integer constant 31

     ‘U’
          Integer constant 63

     ‘G’
          Floating-point constant 0.0

     ‘A’
          A ‘lo_sum’ data-linkage-table memory operand

     ‘Q’
          A memory operand that can be used as the destination operand
          of an integer store instruction

     ‘R’
          A scaled or unscaled indexed memory operand

     ‘T’
          A memory operand for floating-point loads and stores

     ‘W’
          A register indirect memory operand

_Intel IA-64--‘config/ia64/ia64.h’_
     ‘a’
          General register ‘r0’ to ‘r3’ for ‘addl’ instruction

     ‘b’
          Branch register

     ‘c’
          Predicate register (‘c’ as in "conditional")

     ‘d’
          Application register residing in M-unit

     ‘e’
          Application register residing in I-unit

     ‘f’
          Floating-point register

     ‘m’
          Memory operand.  If used together with ‘<’ or ‘>’, the operand
          can have postincrement and postdecrement which require
          printing with ‘%Pn’ on IA-64.

     ‘G’
          Floating-point constant 0.0 or 1.0

     ‘I’
          14-bit signed integer constant

     ‘J’
          22-bit signed integer constant

     ‘K’
          8-bit signed integer constant for logical instructions

     ‘L’
          8-bit adjusted signed integer constant for compare pseudo-ops

     ‘M’
          6-bit unsigned integer constant for shift counts

     ‘N’
          9-bit signed integer constant for load and store
          postincrements

     ‘O’
          The constant zero

     ‘P’
          0 or −1 for ‘dep’ instruction

     ‘Q’
          Non-volatile memory for floating-point loads and stores

     ‘R’
          Integer constant in the range 1 to 4 for ‘shladd’ instruction

     ‘S’
          Memory operand except postincrement and postdecrement.  This
          is now roughly the same as ‘m’ when not used together with ‘<’
          or ‘>’.

_M32C--‘config/m32c/m32c.cc’_
     ‘Rsp’
     ‘Rfb’
     ‘Rsb’
          ‘$sp’, ‘$fb’, ‘$sb’.

     ‘Rcr’
          Any control register, when they're 16 bits wide (nothing if
          control registers are 24 bits wide)

     ‘Rcl’
          Any control register, when they're 24 bits wide.

     ‘R0w’
     ‘R1w’
     ‘R2w’
     ‘R3w’
          $r0, $r1, $r2, $r3.

     ‘R02’
          $r0 or $r2, or $r2r0 for 32 bit values.

     ‘R13’
          $r1 or $r3, or $r3r1 for 32 bit values.

     ‘Rdi’
          A register that can hold a 64 bit value.

     ‘Rhl’
          $r0 or $r1 (registers with addressable high/low bytes)

     ‘R23’
          $r2 or $r3

     ‘Raa’
          Address registers

     ‘Raw’
          Address registers when they're 16 bits wide.

     ‘Ral’
          Address registers when they're 24 bits wide.

     ‘Rqi’
          Registers that can hold QI values.

     ‘Rad’
          Registers that can be used with displacements ($a0, $a1, $sb).

     ‘Rsi’
          Registers that can hold 32 bit values.

     ‘Rhi’
          Registers that can hold 16 bit values.

     ‘Rhc’
          Registers that can hold 16 bit values, including all control
          registers.

     ‘Rra’
          $r0 through R1, plus $a0 and $a1.

     ‘Rfl’
          The flags register.

     ‘Rmm’
          The memory-based pseudo-registers $mem0 through $mem15.

     ‘Rpi’
          Registers that can hold pointers (16 bit registers for r8c,
          m16c; 24 bit registers for m32cm, m32c).

     ‘Rpa’
          Matches multiple registers in a PARALLEL to form a larger
          register.  Used to match function return values.

     ‘Is3’
          −8 ... 7

     ‘IS1’
          −128 ... 127

     ‘IS2’
          −32768 ... 32767

     ‘IU2’
          0 ... 65535

     ‘In4’
          −8 ... −1 or 1 ... 8

     ‘In5’
          −16 ... −1 or 1 ... 16

     ‘In6’
          −32 ... −1 or 1 ... 32

     ‘IM2’
          −65536 ... −1

     ‘Ilb’
          An 8 bit value with exactly one bit set.

     ‘Ilw’
          A 16 bit value with exactly one bit set.

     ‘Sd’
          The common src/dest memory addressing modes.

     ‘Sa’
          Memory addressed using $a0 or $a1.

     ‘Si’
          Memory addressed with immediate addresses.

     ‘Ss’
          Memory addressed using the stack pointer ($sp).

     ‘Sf’
          Memory addressed using the frame base register ($fb).

     ‘Sb’
          Memory addressed using the small base register ($sb).

     ‘S1’
          $r1h

_LoongArch--‘config/loongarch/constraints.md’_
     ‘f’
          A floating-point or vector register (if available).

     ‘k’
          A memory operand whose address is formed by a base register
          and (optionally scaled) index register.

     ‘l’
          A signed 16-bit constant.

     ‘m’
          A memory operand whose address is formed by a base register
          and offset that is suitable for use in instructions with the
          same addressing mode as ‘st.w’ and ‘ld.w’.

     ‘I’
          A signed 12-bit constant (for arithmetic instructions).

     ‘K’
          An unsigned 12-bit constant (for logic instructions).

     ‘M’
          A constant that cannot be loaded using ‘lui’, ‘addiu’ or
          ‘ori’.

     ‘N’
          A constant in the range −65535 to −1 (inclusive).

     ‘O’
          A signed 15-bit constant.

     ‘P’
          A constant in the range 1 to 65535 (inclusive).

     ‘R’
          An address that can be used in a non-macro load or store.

     ‘ZB’
          An address that is held in a general-purpose register.  The
          offset is zero.

     ‘ZC’
          A memory operand whose address is formed by a base register
          and offset that is suitable for use in instructions with the
          same addressing mode as ‘ll.w’ and ‘sc.w’.

_MicroBlaze--‘config/microblaze/constraints.md’_
     ‘d’
          A general register (‘r0’ to ‘r31’).

     ‘z’
          A status register (‘rmsr’, ‘$fcc1’ to ‘$fcc7’).

_MIPS--‘config/mips/constraints.md’_
     ‘d’
          A general-purpose register.  This is equivalent to ‘r’ unless
          generating MIPS16 code, in which case the MIPS16 register set
          is used.

     ‘f’
          A floating-point register (if available).

     ‘h’
          Formerly the ‘hi’ register.  This constraint is no longer
          supported.

     ‘l’
          The ‘lo’ register.  Use this register to store values that are
          no bigger than a word.

     ‘x’
          The concatenated ‘hi’ and ‘lo’ registers.  Use this register
          to store doubleword values.

     ‘c’
          A register suitable for use in an indirect jump.  This will
          always be ‘$25’ for ‘-mabicalls’.

     ‘v’
          Register ‘$3’.  Do not use this constraint in new code; it is
          retained only for compatibility with glibc.

     ‘y’
          Equivalent to ‘r’; retained for backwards compatibility.

     ‘z’
          A floating-point condition code register.

     ‘I’
          A signed 16-bit constant (for arithmetic instructions).

     ‘J’
          Integer zero.

     ‘K’
          An unsigned 16-bit constant (for logic instructions).

     ‘L’
          A signed 32-bit constant in which the lower 16 bits are zero.
          Such constants can be loaded using ‘lui’.

     ‘M’
          A constant that cannot be loaded using ‘lui’, ‘addiu’ or
          ‘ori’.

     ‘N’
          A constant in the range −65535 to −1 (inclusive).

     ‘O’
          A signed 15-bit constant.

     ‘P’
          A constant in the range 1 to 65535 (inclusive).

     ‘G’
          Floating-point zero.

     ‘R’
          An address that can be used in a non-macro load or store.

     ‘ZC’
          A memory operand whose address is formed by a base register
          and offset that is suitable for use in instructions with the
          same addressing mode as ‘ll’ and ‘sc’.

     ‘ZD’
          An address suitable for a ‘prefetch’ instruction, or for any
          other instruction with the same addressing mode as ‘prefetch’.
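
     To make the MIPS integer constraints concrete, the sketch below
     restates the documented ranges for ‘I’, ‘K’, ‘L’, ‘N’, ‘O’ and ‘P’
     as C predicates.  The ‘mips_*’ names are hypothetical helpers
     written for this illustration; they are not GCC functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* 'I': signed 16-bit constant.  */
bool mips_I(int64_t v) { return v >= -32768 && v <= 32767; }

/* 'K': unsigned 16-bit constant.  */
bool mips_K(int64_t v) { return v >= 0 && v <= 65535; }

/* 'L': signed 32-bit constant whose lower 16 bits are zero.  */
bool mips_L(int64_t v)
{
    return v >= INT32_MIN && v <= INT32_MAX
           && ((uint64_t)v & 0xffff) == 0;
}

/* 'N': constant in the range -65535 to -1 (inclusive).  */
bool mips_N(int64_t v) { return v >= -65535 && v <= -1; }

/* 'O': signed 15-bit constant.  */
bool mips_O(int64_t v) { return v >= -16384 && v <= 16383; }

/* 'P': constant in the range 1 to 65535 (inclusive).  */
bool mips_P(int64_t v) { return v >= 1 && v <= 65535; }
```

     With these definitions the value 0x12340000 satisfies ‘L’ but not
     ‘I’, ‘K’ or ‘P’, and zero satisfies ‘I’, ‘K’, ‘L’ and ‘O’ but
     neither ‘N’ nor ‘P’.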

_Motorola 680x0--‘config/m68k/constraints.md’_
     ‘a’
          Address register

     ‘d’
          Data register

     ‘f’
          68881 floating-point register, if available

     ‘I’
          Integer in the range 1 to 8

     ‘J’
          16-bit signed number

     ‘K’
          Signed number whose magnitude is greater than 0x80

     ‘L’
          Integer in the range −8 to −1

     ‘M’
          Signed number whose magnitude is greater than 0x100

     ‘N’
          Range 24 to 31, rotatert:SI 8 to 1 expressed as rotate

     ‘O’
          16 (for rotate using swap)

     ‘P’
          Range 8 to 15, rotatert:HI 8 to 1 expressed as rotate

     ‘R’
          Numbers that mov3q can handle

     ‘G’
          Floating point constant that is not a 68881 constant

     ‘S’
          Operands that satisfy 'm' when -mpcrel is in effect

     ‘T’
          Operands that satisfy 's' when -mpcrel is not in effect

     ‘Q’
          Address register indirect addressing mode

     ‘U’
          Register offset addressing

     ‘W’
          const_call_operand

     ‘Cs’
          symbol_ref or const

     ‘Ci’
          const_int

     ‘C0’
          const_int 0

     ‘Cj’
          Range of signed numbers that don't fit in 16 bits

     ‘Cmvq’
          Integers valid for mvq

     ‘Capsw’
          Integers valid for a moveq followed by a swap

     ‘Cmvz’
          Integers valid for mvz

     ‘Cmvs’
          Integers valid for mvs

     ‘Ap’
          push_operand

     ‘Ac’
          Non-register operands allowed in clr

_Moxie--‘config/moxie/constraints.md’_
     ‘A’
          An absolute address

     ‘B’
          An offset address

     ‘W’
          A register indirect memory operand

     ‘I’
          A constant in the range of 0 to 255.

     ‘N’
          A constant in the range of −255 to 0.

_MSP430--‘config/msp430/constraints.md’_

     ‘R12’
          Register R12.

     ‘R13’
          Register R13.

     ‘K’
          Integer constant 1.

     ‘L’
          Integer constant in the range -2^{20} ... 2^{19}.

     ‘M’
          Integer constant in the range 1 to 4.

     ‘Ya’
          Memory references which do not require an extended MOVX
          instruction.

     ‘Yl’
          Memory reference, labels only.

     ‘Ys’
          Memory reference, stack only.

_NDS32--‘config/nds32/constraints.md’_
     ‘w’
          LOW register class $r0 to $r7 constraint for V3/V3M ISA.
     ‘l’
          LOW register class $r0 to $r7.
     ‘d’
          MIDDLE register class $r0 to $r11, $r16 to $r19.
     ‘h’
          HIGH register class $r12 to $r14, $r20 to $r31.
     ‘t’
          Temporary assist register $ta (i.e. $r15).
     ‘k’
          Stack register $sp.
     ‘Iu03’
          Unsigned immediate 3-bit value.
     ‘In03’
          Negative immediate 3-bit value in the range −7 to 0.
     ‘Iu04’
          Unsigned immediate 4-bit value.
     ‘Is05’
          Signed immediate 5-bit value.
     ‘Iu05’
          Unsigned immediate 5-bit value.
     ‘In05’
          Negative immediate 5-bit value in the range −31 to 0.
     ‘Ip05’
          Unsigned immediate 5-bit value for movpi45 instruction with
          range 16-47.
     ‘Iu06’
          Unsigned immediate 6-bit value constraint for addri36.sp
          instruction.
     ‘Iu08’
          Unsigned immediate 8-bit value.
     ‘Iu09’
          Unsigned immediate 9-bit value.
     ‘Is10’
          Signed immediate 10-bit value.
     ‘Is11’
          Signed immediate 11-bit value.
     ‘Is15’
          Signed immediate 15-bit value.
     ‘Iu15’
          Unsigned immediate 15-bit value.
     ‘Ic15’
          A constant which is not in the range of imm15u but ok for bclr
          instruction.
     ‘Ie15’
          A constant which is not in the range of imm15u but ok for bset
          instruction.
     ‘It15’
          A constant which is not in the range of imm15u but ok for btgl
          instruction.
     ‘Ii15’
          A constant whose complement value is in the range of imm15u
          and ok for bitci instruction.
     ‘Is16’
          Signed immediate 16-bit value.
     ‘Is17’
          Signed immediate 17-bit value.
     ‘Is19’
          Signed immediate 19-bit value.
     ‘Is20’
          Signed immediate 20-bit value.
     ‘Ihig’
          The immediate value that can be simply set high 20-bit.
     ‘Izeb’
          The immediate value 0xff.
     ‘Izeh’
          The immediate value 0xffff.
     ‘Ixls’
          The immediate value 0x01.
     ‘Ix11’
          The immediate value 0x7ff.
     ‘Ibms’
          The immediate value with power of 2.
     ‘Ifex’
          The immediate value with power of 2 minus 1.
     ‘U33’
          Memory constraint for 333 format.
     ‘U45’
          Memory constraint for 45 format.
     ‘U37’
          Memory constraint for 37 format.

_Nios II family--‘config/nios2/constraints.md’_

     ‘I’
          Integer that is valid as an immediate operand in an
          instruction taking a signed 16-bit number.  Range −32768 to
          32767.

     ‘J’
          Integer that is valid as an immediate operand in an
          instruction taking an unsigned 16-bit number.  Range 0 to
          65535.

     ‘K’
          Integer that is valid as an immediate operand in an
          instruction taking only the upper 16-bits of a 32-bit number.
          Range 32-bit numbers with the lower 16-bits being 0.

     ‘L’
          Integer that is valid as an immediate operand for a shift
          instruction.  Range 0 to 31.

     ‘M’
          Integer that is valid as an immediate operand for only the
          value 0.  Can be used in conjunction with the format modifier
          ‘z’ to use ‘r0’ instead of ‘0’ in the assembly output.

     ‘N’
          Integer that is valid as an immediate operand for a custom
          instruction opcode.  Range 0 to 255.

     ‘P’
          An immediate operand for R2 andchi/andci instructions.

     ‘S’
          Matches immediates which are addresses in the small data
          section and therefore can be added to ‘gp’ as a 16-bit
          immediate to re-create their 32-bit value.

     ‘U’
          Matches constants suitable as an operand for the rdprs and
          cache instructions.

     ‘v’
          A memory operand suitable for Nios II R2 load/store exclusive
          instructions.

     ‘w’
          A memory operand suitable for load/store IO and cache
          instructions.

     ‘T’
          A ‘const’ wrapped ‘UNSPEC’ expression, representing a
          supported PIC or TLS relocation.

_OpenRISC--‘config/or1k/constraints.md’_
     ‘I’
          Integer that is valid as an immediate operand in an
          instruction taking a signed 16-bit number.  Range −32768 to
          32767.

     ‘K’
          Integer that is valid as an immediate operand in an
          instruction taking an unsigned 16-bit number.  Range 0 to
          65535.

     ‘M’
          Signed 16-bit constant shifted left 16 bits.  (Used with
          ‘l.movhi’)

     ‘O’
          Zero

     ‘c’
          Register usable for sibcalls.

_PDP-11--‘config/pdp11/constraints.md’_
     ‘a’
          Floating point registers AC0 through AC3.  These can be loaded
          from/to memory with a single instruction.

     ‘d’
          Odd numbered general registers (R1, R3, R5).  These are used
          for 16-bit multiply operations.

     ‘D’
          A memory reference that is encoded within the opcode, but not
          auto-increment or auto-decrement.

     ‘f’
          Any of the floating point registers (AC0 through AC5).

     ‘G’
          Floating point constant 0.

     ‘h’
          Floating point registers AC4 and AC5.  These cannot be loaded
          from/to memory with a single instruction.

     ‘I’
          An integer constant that fits in 16 bits.

     ‘J’
          An integer constant whose low order 16 bits are zero.

     ‘K’
          An integer constant that does not meet the constraints for
          codes ‘I’ or ‘J’.

     ‘L’
          The integer constant 1.

     ‘M’
          The integer constant −1.

     ‘N’
          The integer constant 0.

     ‘O’
          Integer constants 0 through 3; shifts by these amounts are
          handled as multiple single-bit shifts rather than a single
          variable-length shift.

     ‘Q’
          A memory reference which requires an additional word (address
          or offset) after the opcode.

     ‘R’
          A memory reference that is encoded within the opcode.

_PowerPC and IBM RS6000--‘config/rs6000/constraints.md’_
     ‘r’
          A general purpose register (GPR), ‘r0’...‘r31’.

     ‘b’
          A base register.  Like ‘r’, but ‘r0’ is not allowed, so
          ‘r1’...‘r31’.

     ‘f’
          A floating point register (FPR), ‘f0’...‘f31’.

     ‘d’
          A floating point register.  This is the same as ‘f’ nowadays;
          historically ‘f’ was for single-precision and ‘d’ was for
          double-precision floating point.

     ‘v’
          An Altivec vector register (VR), ‘v0’...‘v31’.

     ‘wa’
          A VSX register (VSR), ‘vs0’...‘vs63’.  This is either an FPR
          (‘vs0’...‘vs31’ are ‘f0’...‘f31’) or a VR (‘vs32’...‘vs63’ are
          ‘v0’...‘v31’).

          When using ‘wa’, you should use the ‘%x’ output modifier, so
          that the correct register number is printed.  For example:

               asm ("xvadddp %x0,%x1,%x2"
                    : "=wa" (v1)
                    : "wa" (v2), "wa" (v3));

          You should not use ‘%x’ for ‘v’ operands:

               asm ("xsaddqp %0,%1,%2"
                    : "=v" (v1)
                    : "v" (v2), "v" (v3));

     ‘h’
          A special register (‘vrsave’, ‘ctr’, or ‘lr’).

     ‘c’
          The count register, ‘ctr’.

     ‘l’
          The link register, ‘lr’.

     ‘x’
          Condition register field 0, ‘cr0’.

     ‘y’
          Any condition register field, ‘cr0’...‘cr7’.

     ‘z’
          The carry bit, ‘XER[CA]’.

     ‘we’
          Like ‘wa’, if ‘-mpower9-vector’ and ‘-m64’ are used;
          otherwise, ‘NO_REGS’.

     ‘wn’
          No register (‘NO_REGS’).

     ‘wr’
          Like ‘r’, if ‘-mpowerpc64’ is used; otherwise, ‘NO_REGS’.

     ‘wx’
          Like ‘d’, if ‘-mpowerpc-gfxopt’ is used; otherwise, ‘NO_REGS’.

     ‘wA’
          Like ‘b’, if ‘-mpowerpc64’ is used; otherwise, ‘NO_REGS’.

     ‘wB’
          Signed 5-bit constant integer that can be loaded into an
          Altivec register.

     ‘wE’
          Vector constant that can be loaded with the XXSPLTIB
          instruction.

     ‘wF’
          Memory operand suitable for power8 GPR load fusion.

     ‘wL’
          Int constant that is the element number mfvsrld accesses in a
          vector.

     ‘wM’
          Match vector constant with all 1's if the XXLORC instruction
          is available.

     ‘wO’
          Memory operand suitable for the ISA 3.0 vector d-form
          instructions.

     ‘wQ’
          Memory operand suitable for the load/store quad instructions.

     ‘wS’
          Vector constant that can be loaded with XXSPLTIB & sign
          extension.

     ‘wY’
          A memory operand for a DS-form instruction.

     ‘wZ’
          An indexed or indirect memory operand, ignoring the bottom 4
          bits.

     ‘I’
          A signed 16-bit constant.

     ‘J’
          An unsigned 16-bit constant shifted left 16 bits (use ‘L’
          instead for ‘SImode’ constants).

     ‘K’
          An unsigned 16-bit constant.

     ‘L’
          A signed 16-bit constant shifted left 16 bits.

     ‘M’
          An integer constant greater than 31.

     ‘N’
          An exact power of 2.

     ‘O’
          The integer constant zero.

     ‘P’
          A constant whose negation is a signed 16-bit constant.

     ‘eI’
          A signed 34-bit integer constant if prefixed instructions are
          supported.

     ‘eP’
          A scalar floating point constant or a vector constant that can
          be loaded to a VSX register with one prefixed instruction.

     ‘eQ’
          An IEEE 128-bit constant that can be loaded into a VSX
          register with the ‘lxvkq’ instruction.

     ‘G’
          A floating point constant that can be loaded into a register
          with one instruction per word.

     ‘H’
          A floating point constant that can be loaded into a register
          using three instructions.

     ‘m’
          A memory operand.  Normally, ‘m’ does not allow addresses that
          update the base register.  If the ‘<’ or ‘>’ constraint is
          also used, such addresses are allowed.  On PowerPC targets it
          is then only safe to use ‘m<>’ in an ‘asm’ statement if that
          ‘asm’ statement accesses the operand exactly once.  The ‘asm’
          statement must also use ‘%U<OPNO>’ as a placeholder for the
          "update" flag in the corresponding load or store instruction.
          For example:

               asm ("st%U0 %1,%0" : "=m<>" (mem) : "r" (val));

          is correct but:

               asm ("st %1,%0" : "=m<>" (mem) : "r" (val));

          is not.

     ‘es’
          A "stable" memory operand; that is, one which does not include
          any automodification of the base register.  This used to be
          useful when ‘m’ allowed automodification of the base register,
          but as those are now only allowed when ‘<’ or ‘>’ is used,
          ‘es’ is basically the same as ‘m’ without ‘<’ and ‘>’.

     ‘Q’
          A memory operand addressed by just a base register.

     ‘Y’
          A memory operand for a DQ-form instruction.

     ‘Z’
          A memory operand accessed with indexed or indirect addressing.

     ‘R’
          An AIX TOC entry.

     ‘a’
          An indexed or indirect address.

     ‘U’
          A V.4 small data reference.

     ‘W’
          A vector constant that does not require memory.

     ‘j’
          The zero vector constant.
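
     The register numbering described under ‘wa’ (‘vs0’...‘vs31’
     overlap ‘f0’...‘f31’, and ‘vs32’...‘vs63’ overlap ‘v0’...‘v31’) can
     be written down as a small C mapping.  This is a sketch of the
     numbering only; ‘struct vsr_alias’, ‘vsr_to_alias’ and ‘vr_to_vsr’
     are hypothetical names, not GCC interfaces.

```c
/* Which FPR or VR a VSX register overlaps, following the 'wa' entry:
   vs0...vs31 are f0...f31 and vs32...vs63 are v0...v31.  */
struct vsr_alias {
    char file;       /* 'f' for an FPR, 'v' for a VR, 0 if out of range */
    unsigned index;  /* register number within that file */
};

struct vsr_alias vsr_to_alias(unsigned vsr)
{
    struct vsr_alias a = { 0, 0 };
    if (vsr < 32) {
        a.file = 'f';
        a.index = vsr;
    } else if (vsr < 64) {
        a.file = 'v';
        a.index = vsr - 32;
    }
    return a;
}

/* Inverse direction for the vector registers: the VSX register number
   that corresponds to VR number VR (0...31).  */
unsigned vr_to_vsr(unsigned vr)
{
    return vr + 32;
}
```

     Under this mapping ‘v2’ is VSX register 34, which suggests why a
     ‘wa’ operand needs the ‘%x’ modifier: a plain ‘%0’ would not
     necessarily print the VSX number.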

_PRU--‘config/pru/constraints.md’_
     ‘I’
          An unsigned 8-bit integer constant.

     ‘J’
          An unsigned 16-bit integer constant.

     ‘L’
          An unsigned 5-bit integer constant (for shift counts).

     ‘T’
          A text segment (program memory) constant label.

     ‘Z’
          Integer constant zero.

_RL78--‘config/rl78/constraints.md’_

     ‘Int3’
          An integer constant in the range 1 ... 7.
     ‘Int8’
          An integer constant in the range 0 ... 255.
     ‘J’
          An integer constant in the range −255 ... 0
     ‘K’
          The integer constant 1.
     ‘L’
          The integer constant -1.
     ‘M’
          The integer constant 0.
     ‘N’
          The integer constant 2.
     ‘O’
          The integer constant -2.
     ‘P’
          An integer constant in the range 1 ... 15.
     ‘Qbi’
          The built-in compare types: eq, ne, gtu, ltu, geu, and leu.
     ‘Qsc’
          The synthetic compare types: gt, lt, ge, and le.
     ‘Wab’
          A memory reference with an absolute address.
     ‘Wbc’
          A memory reference using ‘BC’ as a base register, with an
          optional offset.
     ‘Wca’
          A memory reference using ‘AX’, ‘BC’, ‘DE’, or ‘HL’ for the
          address, for calls.
     ‘Wcv’
          A memory reference using any 16-bit register pair for the
          address, for calls.
     ‘Wd2’
          A memory reference using ‘DE’ as a base register, with an
          optional offset.
     ‘Wde’
          A memory reference using ‘DE’ as a base register, without any
          offset.
     ‘Wfr’
          Any memory reference to an address in the far address space.
     ‘Wh1’
          A memory reference using ‘HL’ as a base register, with an
          optional one-byte offset.
     ‘Whb’
          A memory reference using ‘HL’ as a base register, with ‘B’ or
          ‘C’ as the index register.
     ‘Whl’
          A memory reference using ‘HL’ as a base register, without any
          offset.
     ‘Ws1’
          A memory reference using ‘SP’ as a base register, with an
          optional one-byte offset.
     ‘Y’
          Any memory reference to an address in the near address space.
     ‘A’
          The ‘AX’ register.
     ‘B’
          The ‘BC’ register.
     ‘D’
          The ‘DE’ register.
     ‘R’
          ‘A’ through ‘L’ registers.
     ‘S’
          The ‘SP’ register.
     ‘T’
          The ‘HL’ register.
     ‘Z08W’
          The 16-bit ‘R8’ register.
     ‘Z10W’
          The 16-bit ‘R10’ register.
     ‘Zint’
          The registers reserved for interrupts (‘R24’ to ‘R31’).
     ‘a’
          The ‘A’ register.
     ‘b’
          The ‘B’ register.
     ‘c’
          The ‘C’ register.
     ‘d’
          The ‘D’ register.
     ‘e’
          The ‘E’ register.
     ‘h’
          The ‘H’ register.
     ‘l’
          The ‘L’ register.
     ‘v’
          The virtual registers.
     ‘w’
          The ‘PSW’ register.
     ‘x’
          The ‘X’ register.

_RISC-V--‘config/riscv/constraints.md’_

     ‘f’
          A floating-point register (if available).

     ‘I’
          An I-type 12-bit signed immediate.

     ‘J’
          Integer zero.

     ‘K’
          A 5-bit unsigned immediate for CSR access instructions.

     ‘A’
          An address that is held in a general-purpose register.

     ‘S’
          A constraint that matches an absolute symbolic address.

     ‘vr’
          A vector register (if available).

     ‘vd’
          A vector register, excluding v0 (if available).

     ‘vm’
          A vector register, only v0 (if available).

_RX--‘config/rx/constraints.md’_
     ‘Q’
          An address which does not involve register indirect addressing
          or pre/post increment/decrement addressing.

     ‘Symbol’
          A symbol reference.

     ‘Int08’
          A constant in the range −256 to 255, inclusive.

     ‘Sint08’
          A constant in the range −128 to 127, inclusive.

     ‘Sint16’
          A constant in the range −32768 to 32767, inclusive.

     ‘Sint24’
          A constant in the range −8388608 to 8388607, inclusive.

     ‘Uint04’
          A constant in the range 0 to 15, inclusive.
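
     Because the RX integer constraints are all simple inclusive
     ranges, they lend themselves to a table-driven check.  The
     following sketch copies the ranges from the list above;
     ‘struct rx_range’, ‘rx_ranges’ and ‘rx_in_range’ are hypothetical
     names introduced only for this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One documented constraint name with its inclusive bounds.  */
struct rx_range {
    const char *name;
    int64_t lo, hi;
};

static const struct rx_range rx_ranges[] = {
    { "Int08",  -256,     255 },
    { "Sint08", -128,     127 },
    { "Sint16", -32768,   32767 },
    { "Sint24", -8388608, 8388607 },
    { "Uint04", 0,        15 },
};

/* True if NAME is one of the range constraints above and V lies
   inside its range; false for any other name.  */
bool rx_in_range(const char *name, int64_t v)
{
    for (size_t i = 0; i < sizeof rx_ranges / sizeof rx_ranges[0]; i++)
        if (strcmp(rx_ranges[i].name, name) == 0)
            return v >= rx_ranges[i].lo && v <= rx_ranges[i].hi;
    return false;
}
```

     Note that, as documented, ‘Int08’ accepts −256 to 255 and is
     therefore wider than ‘Sint08’.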

_S/390 and zSeries--‘config/s390/s390.h’_
     ‘a’
          Address register (general purpose register except r0)

     ‘c’
          Condition code register

     ‘d’
          Data register (arbitrary general purpose register)

     ‘f’
          Floating-point register

     ‘I’
          Unsigned 8-bit constant (0-255)

     ‘J’
          Unsigned 12-bit constant (0-4095)

     ‘K’
          Signed 16-bit constant (−32768-32767)

     ‘L’
          Value appropriate as displacement.
          ‘(0..4095)’
               for short displacement
          ‘(−524288..524287)’
               for long displacement

     ‘M’
          Constant integer with a value of 0x7fffffff.

     ‘N’
          Multiple letter constraint followed by 4 parameter letters.
          ‘0..9:’
               number of the part counting from most to least
               significant
          ‘H,Q:’
               mode of the part
          ‘D,S,H:’
               mode of the containing operand
          ‘0,F:’
               value of the other parts (F--all bits set)
          The constraint matches if the specified part of a constant has
          a value different from its other parts.

     ‘Q’
          Memory reference without index register and with short
          displacement.

     ‘R’
          Memory reference with index register and short displacement.

     ‘S’
          Memory reference without index register but with long
          displacement.

     ‘T’
          Memory reference with index register and long displacement.

     ‘U’
          Pointer with short displacement.

     ‘W’
          Pointer with long displacement.

     ‘Y’
          Shift count operand.
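
     The four parameter letters of ‘N’ are easier to follow with a
     worked model.  The sketch below is one reading of the description
     above, assuming that the mode letters abbreviate the usual GCC
     integer modes (‘Q’ = 8 bits, ‘H’ = 16, ‘S’ = 32, ‘D’ = 64) and that
     "other parts" must all equal the fill value.  ‘mode_bits’ and
     ‘s390_n_matches’ are hypothetical names, not the functions GCC
     itself uses.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Width in bits assumed for a mode letter; 0 for an unknown letter.  */
static unsigned mode_bits(char letter)
{
    switch (letter) {
    case 'Q': return 8;
    case 'H': return 16;
    case 'S': return 32;
    case 'D': return 64;
    default:  return 0;
    }
}

/* PARAMS holds the four letters that follow 'N', for example "0HD0".
   Returns true if, within VALUE viewed as an operand of the given
   mode, the selected part differs from the fill value while every
   other part equals it.  Bits above the operand width are ignored.  */
bool s390_n_matches(const char *params, uint64_t value)
{
    if (strlen(params) != 4 || params[0] < '0' || params[0] > '9')
        return false;
    if (params[1] != 'H' && params[1] != 'Q')
        return false;
    if (params[2] != 'D' && params[2] != 'S' && params[2] != 'H')
        return false;
    if (params[3] != '0' && params[3] != 'F')
        return false;

    unsigned part = (unsigned)(params[0] - '0');
    unsigned part_bits = mode_bits(params[1]);
    unsigned op_bits = mode_bits(params[2]);
    if (part_bits > op_bits)
        return false;
    unsigned n_parts = op_bits / part_bits;
    if (part >= n_parts)
        return false;

    uint64_t mask = (UINT64_C(1) << part_bits) - 1;  /* 8 or 16 bits */
    uint64_t fill = (params[3] == 'F') ? mask : 0;

    for (unsigned i = 0; i < n_parts; i++) {
        /* Part 0 is the most significant part of the operand.  */
        unsigned shift = (n_parts - 1 - i) * part_bits;
        uint64_t chunk = (value >> shift) & mask;
        bool differs = (chunk != fill);
        if (i == part ? !differs : differs)
            return false;
    }
    return true;
}
```

     Under these assumptions ‘s390_n_matches ("3HD0", 0x1234)’ is true,
     since only the least significant halfword of the doubleword is
     nonzero, whereas ‘s390_n_matches ("3HD0", 0x12340000)’ is false
     because the nonzero halfword is part 2.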

_SPARC--‘config/sparc/sparc.h’_
     ‘f’
          Floating-point register on the SPARC-V8 architecture and lower
          floating-point register on the SPARC-V9 architecture.

     ‘e’
          Floating-point register.  It is equivalent to ‘f’ on the
          SPARC-V8 architecture and contains both lower and upper
          floating-point registers on the SPARC-V9 architecture.

     ‘c’
          Floating-point condition code register.

     ‘d’
          Lower floating-point register.  It is only valid on the
          SPARC-V9 architecture when the Visual Instruction Set is
          available.

     ‘b’
          Floating-point register.  It is only valid on the SPARC-V9
          architecture when the Visual Instruction Set is available.

     ‘h’
          64-bit global or out register for the SPARC-V8+ architecture.

     ‘C’
          The constant all-ones, for floating-point.

     ‘A’
          Signed 5-bit constant

     ‘D’
          A vector constant

     ‘I’
          Signed 13-bit constant

     ‘J’
          Zero

     ‘K’
          32-bit constant with the low 12 bits clear (a constant that
          can be loaded with the ‘sethi’ instruction)

     ‘L’
          A constant in the range supported by ‘movcc’ instructions
          (11-bit signed immediate)

     ‘M’
          A constant in the range supported by ‘movrcc’ instructions
          (10-bit signed immediate)

     ‘N’
          Same as ‘K’, except that it verifies that bits that are not in
          the lower 32-bit range are all zero.  Must be used instead of
          ‘K’ for modes wider than ‘SImode’

     ‘O’
          The constant 4096

     ‘G’
          Floating-point zero

     ‘H’
          Signed 13-bit constant, sign-extended to 32 or 64 bits

     ‘P’
          The constant -1

     ‘Q’
          Floating-point constant whose integral representation can be
          moved into an integer register using a single sethi
          instruction

     ‘R’
          Floating-point constant whose integral representation can be
          moved into an integer register using a single mov instruction

     ‘S’
          Floating-point constant whose integral representation can be
          moved into an integer register using a high/lo_sum instruction
          sequence

     ‘T’
          Memory address aligned to an 8-byte boundary

     ‘U’
          Even register

     ‘W’
          Memory address for ‘e’ constraint registers

     ‘w’
          Memory address with only a base register

     ‘Y’
          Vector zero

_TI C6X family--‘config/c6x/constraints.md’_
     ‘a’
          Register file A (A0-A31).

     ‘b’
          Register file B (B0-B31).

     ‘A’
          Predicate registers in register file A (A0-A2 on C64X and
          higher, A1 and A2 otherwise).

     ‘B’
          Predicate registers in register file B (B0-B2).

     ‘C’
          A call-used register in register file B (B0-B9, B16-B31).

     ‘Da’
          Register file A, excluding predicate registers (A3-A31, plus
          A0 if not C64X or higher).

     ‘Db’
          Register file B, excluding predicate registers (B3-B31).

     ‘Iu4’
          Integer constant in the range 0 ... 15.

     ‘Iu5’
          Integer constant in the range 0 ... 31.

     ‘In5’
          Integer constant in the range −31 ... 0.

     ‘Is5’
          Integer constant in the range −16 ... 15.

     ‘I5x’
          Integer constant that can be the operand of an ADDA or a SUBA
          insn.

     ‘IuB’
          Integer constant in the range 0 ... 65535.

     ‘IsB’
          Integer constant in the range −32768 ... 32767.

     ‘IsC’
          Integer constant in the range -2^{20} ... 2^{20} - 1.

     ‘Jc’
          Integer constant that is a valid mask for the clr instruction.

     ‘Js’
          Integer constant that is a valid mask for the set instruction.

     ‘Q’
          Memory location with A base register.

     ‘R’
          Memory location with B base register.

     ‘S0’
          On C64x+ targets, a GP-relative small data reference.

     ‘S1’
          Any kind of ‘SYMBOL_REF’, for use in a call address.

     ‘Si’
          Any kind of immediate operand, unless it matches the S0
          constraint.

     ‘T’
          Memory location with B base register, but not using a long
          offset.

     ‘W’
          A memory operand with an address that cannot be used in an
          unaligned access.

     ‘Z’
          Register B14 (aka DP).

_Visium--‘config/visium/constraints.md’_
     ‘b’
          EAM register ‘mdb’

     ‘c’
          EAM register ‘mdc’

     ‘f’
          Floating point register

     ‘k’
          Register for sibcall optimization

     ‘l’
          General register, but not ‘r29’, ‘r30’ and ‘r31’

     ‘t’
          Register ‘r1’

     ‘u’
          Register ‘r2’

     ‘v’
          Register ‘r3’

     ‘G’
          Floating-point constant 0.0

     ‘J’
          Integer constant in the range 0 .. 65535 (16-bit immediate)

     ‘K’
          Integer constant in the range 1 .. 31 (5-bit immediate)

     ‘L’
          Integer constant in the range −65535 .. −1 (16-bit negative
          immediate)

     ‘M’
          Integer constant −1

     ‘O’
          Integer constant 0

     ‘P’
          Integer constant 32

_x86 family--‘config/i386/constraints.md’_
     ‘R’
          Legacy register--the eight integer registers available on all
          i386 processors (‘a’, ‘b’, ‘c’, ‘d’, ‘si’, ‘di’, ‘bp’, ‘sp’).

     ‘q’
          Any register accessible as ‘Rl’.  In 32-bit mode, ‘a’, ‘b’,
          ‘c’, and ‘d’; in 64-bit mode, any integer register.

     ‘Q’
          Any register accessible as ‘Rh’: ‘a’, ‘b’, ‘c’, and ‘d’.

     ‘l’
          Any register that can be used as the index in a base+index
          memory access: that is, any general register except the stack
          pointer.

     ‘a’
          The ‘a’ register.

     ‘b’
          The ‘b’ register.

     ‘c’
          The ‘c’ register.

     ‘d’
          The ‘d’ register.

     ‘S’
          The ‘si’ register.

     ‘D’
          The ‘di’ register.

     ‘A’
          The ‘a’ and ‘d’ registers.  This class is used for
          instructions that return double word results in the ‘ax:dx’
          register pair.  Single word values will be allocated either in
          ‘ax’ or ‘dx’.  For example on i386 the following implements
          ‘rdtsc’:

               unsigned long long rdtsc (void)
               {
                 unsigned long long tick;
                 __asm__ __volatile__("rdtsc":"=A"(tick));
                 return tick;
               }

          This is not correct on x86-64 as it would allocate tick in
          either ‘ax’ or ‘dx’.  You have to use the following variant
          instead:

               unsigned long long rdtsc (void)
               {
                 unsigned int tickl, tickh;
                 __asm__ __volatile__("rdtsc":"=a"(tickl),"=d"(tickh));
                 return ((unsigned long long)tickh << 32)|tickl;
               }

     ‘U’
          The call-clobbered integer registers.

     ‘f’
          Any 80387 floating-point (stack) register.

     ‘t’
          Top of 80387 floating-point stack (‘%st(0)’).

     ‘u’
          Second from top of 80387 floating-point stack (‘%st(1)’).

     ‘Yk’
          Any mask register that can be used as a predicate, i.e.
          ‘k1-k7’.

     ‘k’
          Any mask register.

     ‘y’
          Any MMX register.

     ‘x’
          Any SSE register.

     ‘v’
          Any EVEX encodable SSE register (‘%xmm0-%xmm31’).

     ‘w’
          Any bound register.

     ‘Yz’
          First SSE register (‘%xmm0’).

     ‘Yi’
          Any SSE register, when SSE2 and inter-unit moves are enabled.

     ‘Yj’
          Any SSE register, when SSE2 and inter-unit moves from vector
          registers are enabled.

     ‘Ym’
          Any MMX register, when inter-unit moves are enabled.

     ‘Yn’
          Any MMX register, when inter-unit moves from vector registers
          are enabled.

     ‘Yp’
          Any integer register when ‘TARGET_PARTIAL_REG_STALL’ is
          disabled.

     ‘Ya’
          Any integer register when zero extensions with ‘AND’ are
          disabled.

     ‘Yb’
          Any register that can be used as the GOT base when calling
          ‘___tls_get_addr’: that is, any general register except ‘a’
          and ‘sp’ registers, for ‘-fno-plt’ if linker supports it.
          Otherwise, ‘b’ register.

     ‘Yf’
          Any x87 register when 80387 floating-point arithmetic is
          enabled.

     ‘Yr’
          Lower SSE register when avoiding REX prefix and all SSE
          registers otherwise.

     ‘Yv’
          For AVX512VL, any EVEX-encodable SSE register
          (‘%xmm0-%xmm31’), otherwise any SSE register.

     ‘Yh’
          Any EVEX-encodable SSE register whose number is a multiple of
          four.

     ‘Bf’
          Flags register operand.

     ‘Bg’
          GOT memory operand.

     ‘Bm’
          Vector memory operand.

     ‘Bc’
          Constant memory operand.

     ‘Bn’
          Memory operand without REX prefix.

     ‘Bs’
          Sibcall memory operand.

     ‘Bw’
          Call memory operand.

     ‘Bz’
          Constant call address operand.

     ‘BC’
          SSE constant -1 operand.

     ‘I’
          Integer constant in the range 0 ... 31, for 32-bit shifts.

     ‘J’
          Integer constant in the range 0 ... 63, for 64-bit shifts.

     ‘K’
          Signed 8-bit integer constant.

     ‘L’
          ‘0xFF’ or ‘0xFFFF’, for andsi as a zero-extending move.

     ‘M’
          0, 1, 2, or 3 (shifts for the ‘lea’ instruction).

     ‘N’
          Unsigned 8-bit integer constant (for ‘in’ and ‘out’
          instructions).

     ‘O’
          Integer constant in the range 0 ... 127, for 128-bit shifts.

     ‘G’
          Standard 80387 floating point constant.

     ‘C’
          SSE constant zero operand.

     ‘e’
          32-bit signed integer constant, or a symbolic reference known
          to fit that range (for immediate operands in sign-extending
          x86-64 instructions).

     ‘We’
          32-bit signed integer constant, or a symbolic reference known
          to fit that range (for sign-extending conversion operations
          that require non-‘VOIDmode’ immediate operands).

     ‘Wz’
          32-bit unsigned integer constant, or a symbolic reference
          known to fit that range (for zero-extending conversion
          operations that require non-‘VOIDmode’ immediate operands).

     ‘Wd’
          128-bit integer constant where both the high and low 64-bit
          word satisfy the ‘e’ constraint.

     ‘Ws’
          A symbolic reference or label reference.  You can use the ‘%p’
          modifier to print the raw symbol.

     ‘Z’
          32-bit unsigned integer constant, or a symbolic reference
          known to fit that range (for immediate operands in
          zero-extending x86-64 instructions).

     ‘Tv’
          VSIB address operand.

     ‘Ts’
          Address operand without segment register.

_Xstormy16--‘config/stormy16/stormy16.h’_
     ‘a’
          Register r0.

     ‘b’
          Register r1.

     ‘c’
          Register r2.

     ‘d’
          Register r8.

     ‘e’
          Registers r0 through r7.

     ‘t’
          Registers r0 and r1.

     ‘y’
          The carry register.

     ‘z’
          Registers r8 and r9.

     ‘I’
          A constant between 0 and 3 inclusive.

     ‘J’
          A constant that has exactly one bit set.

     ‘K’
          A constant that has exactly one bit clear.

     ‘L’
          A constant between 0 and 255 inclusive.

     ‘M’
          A constant between −255 and 0 inclusive.

     ‘N’
          A constant between −3 and 0 inclusive.

     ‘O’
          A constant between 1 and 4 inclusive.

     ‘P’
          A constant between −4 and −1 inclusive.

     ‘Q’
          A memory reference that is a stack push.

     ‘R’
          A memory reference that is a stack pop.

     ‘S’
          A memory reference that refers to a constant address of known
          value.

     ‘T’
          The register indicated by Rx (not implemented yet).

     ‘U’
          A constant that is not between 2 and 15 inclusive.

     ‘Z’
          The constant 0.
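
     The constant constraints in the Xstormy16 table above are simple
     range or bit tests.  The following is a minimal sketch of those
     tests as plain C functions.  The ‘xs16_*’ names are hypothetical and
     follow only the wording of the table; this is not the port's actual
     constraint code.  ‘K’ is left out because "exactly one bit clear"
     depends on the width the value is checked in, which the table does
     not state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical models of the Xstormy16 constant constraints, written
   from the table's wording only.  */

bool xs16_in_range(long v, long lo, long hi) { return lo <= v && v <= hi; }

bool xs16_I(long v) { return xs16_in_range(v, 0, 3); }      /* 0 .. 3 */
bool xs16_L(long v) { return xs16_in_range(v, 0, 255); }    /* 0 .. 255 */
bool xs16_M(long v) { return xs16_in_range(v, -255, 0); }   /* -255 .. 0 */
bool xs16_N(long v) { return xs16_in_range(v, -3, 0); }     /* -3 .. 0 */
bool xs16_O(long v) { return xs16_in_range(v, 1, 4); }      /* 1 .. 4 */
bool xs16_P(long v) { return xs16_in_range(v, -4, -1); }    /* -4 .. -1 */
bool xs16_U(long v) { return !xs16_in_range(v, 2, 15); }    /* not 2 .. 15 */
bool xs16_Z(long v) { return v == 0; }                      /* exactly 0 */

/* 'J': exactly one bit set.  Clearing the lowest set bit of a nonzero
   value leaves zero only when that bit was the sole bit set.  */
bool xs16_J(unsigned long v) { return v != 0 && (v & (v - 1)) == 0; }
```

     For instance, ‘xs16_J (0x8000)’ holds while ‘xs16_J (6)’ does not,
     because 6 has two bits set.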

_Xtensa--‘config/xtensa/constraints.md’_
     ‘a’
          General-purpose 32-bit register

     ‘b’
          One-bit boolean register

     ‘A’
          MAC16 40-bit accumulator register

     ‘I’
          Signed 12-bit integer constant, for use in MOVI instructions

     ‘J’
          Signed 8-bit integer constant, for use in ADDI instructions

     ‘K’
          Integer constant valid for BccI instructions

     ‘L’
          Unsigned constant valid for BccUI instructions


File: gccint.info,  Node: Disable Insn Alternatives,  Next: Define Constraints,  Prev: Machine Constraints,  Up: Constraints

17.9.6 Disable insn alternatives using the ‘enabled’ attribute
--------------------------------------------------------------

There are three insn attributes that may be used to selectively disable
instruction alternatives:

‘enabled’
     Says whether an alternative is available on the current subtarget.

‘preferred_for_size’
     Says whether an enabled alternative should be used in code that is
     optimized for size.

‘preferred_for_speed’
     Says whether an enabled alternative should be used in code that is
     optimized for speed.

 All these attributes should use ‘(const_int 1)’ to allow an alternative
or ‘(const_int 0)’ to disallow it.  The attributes must be a static
property of the subtarget; they cannot for example depend on the current
operands, on the current optimization level, on the location of the insn
within the body of a loop, on whether register allocation has finished,
or on the current compiler pass.

 The ‘enabled’ attribute is a correctness property.  It tells GCC to act
as though the disabled alternatives were never defined in the first
place.  This is useful when adding new instructions to an existing
pattern in cases where the new instructions are only available for
certain CPU architecture levels (typically mapped to the ‘-march=’
command-line option).

 In contrast, the ‘preferred_for_size’ and ‘preferred_for_speed’
attributes are strong optimization hints rather than correctness
properties.  ‘preferred_for_size’ tells GCC which alternatives to
consider when adding or modifying an instruction that GCC wants to
optimize for size.  ‘preferred_for_speed’ does the same thing for speed.
Note that things like code motion can lead to cases where code optimized
for size uses alternatives that are not preferred for size, and
similarly for speed.

 Although ‘define_insn’s can in principle specify the ‘enabled’
attribute directly, it is often clearer to have subsidiary attributes
for each architectural feature of interest.  The ‘define_insn’s can then
use these subsidiary attributes to say which alternatives require which
features.  The example below does this for ‘cpu_facility’.

 For example, the following two patterns could easily be merged using
the ‘enabled’ attribute:


     (define_insn "*movdi_old"
       [(set (match_operand:DI 0 "register_operand" "=d")
             (match_operand:DI 1 "register_operand" " d"))]
       "!TARGET_NEW"
       "lgr %0,%1")

     (define_insn "*movdi_new"
       [(set (match_operand:DI 0 "register_operand" "=d,f,d")
             (match_operand:DI 1 "register_operand" " d,d,f"))]
       "TARGET_NEW"
       "@
        lgr  %0,%1
        ldgr %0,%1
        lgdr %0,%1")


 to:


     (define_insn "*movdi_combined"
       [(set (match_operand:DI 0 "register_operand" "=d,f,d")
             (match_operand:DI 1 "register_operand" " d,d,f"))]
       ""
       "@
        lgr  %0,%1
        ldgr %0,%1
        lgdr %0,%1"
       [(set_attr "cpu_facility" "*,new,new")])


 with the ‘enabled’ attribute defined like this:


     (define_attr "cpu_facility" "standard,new" (const_string "standard"))

     (define_attr "enabled" ""
       (cond [(eq_attr "cpu_facility" "standard") (const_int 1)
              (and (eq_attr "cpu_facility" "new")
                   (ne (symbol_ref "TARGET_NEW") (const_int 0)))
              (const_int 1)]
             (const_int 0)))
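
 The effect of the ‘cond’ expression above can be modelled in ordinary
C.  This is only an illustrative sketch: the enum, the array and the
function names are hypothetical, and the real evaluation is done by code
that GCC generates from the machine description.  The ‘*’ entry in
‘set_attr’ selects the attribute's default value, which is ‘standard’
here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical C model of the "cpu_facility" attribute values.  */
enum cpu_facility { FACILITY_STANDARD, FACILITY_NEW };

/* Mirrors the cond in the "enabled" attribute: standard alternatives are
   always enabled, "new" alternatives only when TARGET_NEW is nonzero.  */
bool alternative_enabled(enum cpu_facility facility, bool target_new)
{
  if (facility == FACILITY_STANDARD)
    return true;
  if (facility == FACILITY_NEW && target_new)
    return true;
  return false;
}

/* "*,new,new" from *movdi_combined; "*" means the default, "standard".  */
static const enum cpu_facility movdi_combined_facility[3] = {
  FACILITY_STANDARD, FACILITY_NEW, FACILITY_NEW
};

/* Counts how many of the three alternatives remain visible.  */
int count_enabled_alternatives(bool target_new)
{
  int n = 0;
  for (int alt = 0; alt < 3; alt++)
    if (alternative_enabled(movdi_combined_facility[alt], target_new))
      n++;
  return n;
}
```

 With ‘TARGET_NEW’ off, only the ‘lgr’ alternative remains, which matches
‘*movdi_old’; with it on, all three remain, which matches ‘*movdi_new’.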



File: gccint.info,  Node: Define Constraints,  Next: C Constraint Interface,  Prev: Disable Insn Alternatives,  Up: Constraints

17.9.7 Defining Machine-Specific Constraints
--------------------------------------------

Machine-specific constraints fall into two categories: register and
non-register constraints.  Within the latter category, constraints which
allow subsets of all possible memory or address operands should be
specially marked, to give ‘reload’ more information.

 Machine-specific constraints can be given names of arbitrary length,
but they must be entirely composed of letters, digits, underscores
(‘_’), and angle brackets (‘< >’).  Like C identifiers, they must begin
with a letter or underscore.

 In order to avoid ambiguity in operand constraint strings, no
constraint can have a name that begins with any other constraint's name.
For example, if ‘x’ is defined as a constraint name, ‘xy’ may not be,
and vice versa.  As a consequence of this rule, no constraint may begin
with one of the generic constraint letters: ‘E F V X g i m n o p r s’.
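
 The naming rules above can be checked mechanically.  The sketch below
is a hypothetical stand-alone checker, not part of GCC; the function
names are made up for illustration.  It tests the permitted characters,
the prefix rule between two names, and the reserved generic letters.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Letters, digits, '_', '<' and '>' only; must start with a letter or '_'.  */
bool valid_constraint_name(const char *name)
{
  unsigned char first = (unsigned char) name[0];
  if (!(isalpha(first) || first == '_'))
    return false;
  for (const char *p = name; *p; p++) {
    unsigned char c = (unsigned char) *p;
    if (!(isalnum(c) || c == '_' || c == '<' || c == '>'))
      return false;
  }
  return true;
}

/* True if A is a prefix of B (equal names count as a prefix).  */
bool is_name_prefix(const char *a, const char *b)
{
  return strncmp(a, b, strlen(a)) == 0;
}

/* Two names conflict if either one begins with the other.  */
bool names_conflict(const char *a, const char *b)
{
  return is_name_prefix(a, b) || is_name_prefix(b, a);
}

/* The generic single-letter constraints listed in the text.  */
bool begins_with_generic_letter(const char *name)
{
  return name[0] != '\0' && strchr("EFVXgimnoprs", name[0]) != NULL;
}
```

 For example, ‘names_conflict ("x", "xy")’ is true in either argument
order, while ‘Yv’ and ‘Yh’ can coexist because neither begins with the
other.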

 Register constraints correspond directly to register classes.  *Note
Register Classes::.  There is thus not much flexibility in their
definitions.

 -- MD Expression: define_register_constraint name regclass docstring
          [filter]
     All arguments are string constants.  NAME is the name of the
     constraint, as it will appear in ‘match_operand’ expressions.  If
     NAME is a multi-letter constraint its length shall be the same for
     all constraints starting with the same letter.  REGCLASS can be
     either the name of the corresponding register class (*note Register
     Classes::), or a C expression which evaluates to the appropriate
     register class.  If it is an expression, it must have no side
     effects, and it cannot look at the operand.  The usual use of
     expressions is to map some register constraints to ‘NO_REGS’ when
     the register class is not available on a given subarchitecture.

     If an operand occupies multiple hard registers, the constraint
     requires all of those registers to belong to REGCLASS.  For
     example, if REGCLASS is ‘GENERAL_REGS’ and ‘GENERAL_REGS’ contains
     registers ‘r0’ to ‘r15’, the constraint does not allow ‘r15’ to be
     used for modes that occupy more than one register.

     The choice of register is also constrained by
     ‘TARGET_HARD_REGNO_MODE_OK’.  For example, if
     ‘TARGET_HARD_REGNO_MODE_OK’ disallows ‘(reg:DI r1)’, that
     requirement applies to all constraints whose classes include ‘r1’.

     However, it is sometimes useful to impose extra operand-specific
     requirements on the register number.  For example, a target might
     not want to prevent _all_ odd-even pairs from holding ‘DImode’
     values, but it might still need to prevent specific operands from
     having an odd-numbered register.  The optional FILTER argument
     exists for such cases.  When given, FILTER is a C++ expression that
     evaluates to true if ‘regno’ is a valid register for the operand.
     If an operand occupies multiple registers, the condition applies
     only to the first register.

     For example:

          (define_register_constraint "e" "GENERAL_REGS" "..." "regno % 2 == 0")

     defines a constraint that requires an even-numbered general
     register.

     Filter conditions that impose an alignment are encouraged to test
     the alignment of ‘regno’ itself, as in the example, rather than
     calculate an offset relative to the start of the class.  If it is
     sometimes necessary for a register of class C to be aligned to N,
     the first register in C should itself be divisible by N.

     DOCSTRING is a sentence documenting the meaning of the constraint.
     Docstrings are explained further below.
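
     As a rough model of how the class requirement and FILTER combine,
     the sketch below treats ‘GENERAL_REGS’ as registers 0 to 15 and
     applies the ‘regno % 2 == 0’ filter from the example.  All names
     are hypothetical, and the model ignores
     ‘TARGET_HARD_REGNO_MODE_OK’ entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical register class: r0 .. r15.  */
static const unsigned last_general_reg = 15;

bool in_general_regs(unsigned regno) { return regno <= last_general_reg; }

/* Class requirement: every hard register the operand occupies, starting
   at REGNO and spanning NREGS registers, must belong to the class.  */
bool class_allows(unsigned regno, unsigned nregs)
{
  assert(nregs >= 1);
  for (unsigned i = 0; i < nregs; i++)
    if (!in_general_regs(regno + i))
      return false;
  return true;
}

/* The filter from the example; it looks only at the first register.  */
bool even_filter(unsigned regno) { return regno % 2 == 0; }

/* Model of the "e" constraint: class requirement plus filter.  */
bool constraint_e_allows(unsigned regno, unsigned nregs)
{
  return class_allows(regno, nregs) && even_filter(regno);
}
```

     In this model register 15 is acceptable to the class for a
     one-register mode but not for a two-register mode, and the filtered
     constraint accepts the pair starting at 14 but not the one starting
     at 13.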

 Non-register constraints are more like predicates: the constraint
definition gives a boolean expression which indicates whether the
constraint matches.

 -- MD Expression: define_constraint name docstring exp
     The NAME and DOCSTRING arguments are the same as for
     ‘define_register_constraint’, but note that the docstring comes
     immediately after the name for these expressions.  EXP is an RTL
     expression, obeying the same rules as the RTL expressions in
     predicate definitions.  *Note Defining Predicates::, for details.
     If it evaluates true, the constraint matches; if it evaluates
     false, it doesn't.  Constraint expressions should indicate which
     RTL codes they might match, just like predicate expressions.

     ‘match_test’ C expressions have access to the following variables:

     OP
          The RTL object defining the operand.
     MODE
          The machine mode of OP.
     IVAL
          ‘INTVAL (OP)’, if OP is a ‘const_int’.
     HVAL
          ‘CONST_DOUBLE_HIGH (OP)’, if OP is an integer ‘const_double’.
     LVAL
          ‘CONST_DOUBLE_LOW (OP)’, if OP is an integer ‘const_double’.
     RVAL
          ‘CONST_DOUBLE_REAL_VALUE (OP)’, if OP is a floating-point
          ‘const_double’.

     The *VAL variables should only be used once another piece of the
     expression has verified that OP is the appropriate kind of RTL
     object.

 Most non-register constraints should be defined with
‘define_constraint’.  The remaining two definition expressions are only
appropriate for constraints that should be handled specially by ‘reload’
if they fail to match.

 -- MD Expression: define_memory_constraint name docstring exp
     Use this expression for constraints that match a subset of all
     memory operands: that is, ‘reload’ can make them match by
     converting the operand to the form ‘(mem (reg X))’, where X is a
     base register (from the register class specified by
     ‘BASE_REG_CLASS’, *note Register Classes::).

     For example, on the S/390, some instructions do not accept
     arbitrary memory references, but only those that do not make use of
     an index register.  The constraint letter ‘Q’ is defined to
     represent a memory address of this type.  If ‘Q’ is defined with
     ‘define_memory_constraint’, a ‘Q’ constraint can handle any memory
     operand, because ‘reload’ knows it can simply copy the memory
     address into a base register if required.  This is analogous to the
     way an ‘o’ constraint can handle any memory operand.

     The syntax and semantics are otherwise identical to
     ‘define_constraint’.

 -- MD Expression: define_special_memory_constraint name docstring exp
     Use this expression for constraints that match a subset of all
     memory operands when ‘reload’ cannot make them match by reloading
     the address as described for ‘define_memory_constraint’, or when
     such an address reload is undesirable from a performance point of
     view.

     For example, ‘define_special_memory_constraint’ can be useful if
     specifically aligned memory is necessary or desirable for some insn
     operand.

     The syntax and semantics are otherwise identical to
     ‘define_memory_constraint’.

 -- MD Expression: define_relaxed_memory_constraint name docstring exp
     The test expression in a ‘define_memory_constraint’ can assume that
     ‘TARGET_LEGITIMATE_ADDRESS_P’ holds for the address inside a ‘mem’
     rtx and so it does not need to test this condition itself.  In
     other words, a ‘define_memory_constraint’ test of the form:

          (match_code "mem")

     is enough to test whether an rtx is a ‘mem’ _and_ whether its
     address satisfies ‘TARGET_MEM_CONSTRAINT’ (which is usually ‘'m'’).
     Thus the conditions imposed by a ‘define_memory_constraint’ always
     apply on top of the conditions imposed by ‘TARGET_MEM_CONSTRAINT’.

     However, it is sometimes useful to define memory constraints that
     allow addresses beyond those accepted by
     ‘TARGET_LEGITIMATE_ADDRESS_P’.  ‘define_relaxed_memory_constraint’
     exists for this case.  The test expression in a
     ‘define_relaxed_memory_constraint’ is applied with no
     preconditions, so that the expression can determine "from scratch"
     exactly which addresses are valid and which are not.

     The syntax and semantics are otherwise identical to
     ‘define_memory_constraint’.

 -- MD Expression: define_address_constraint name docstring exp
     Use this expression for constraints that match a subset of all
     address operands: that is, ‘reload’ can make the constraint match
     by converting the operand to the form ‘(reg X)’, again with X a
     base register.

     Constraints defined with ‘define_address_constraint’ can only be
     used with the ‘address_operand’ predicate, or machine-specific
     predicates that work the same way.  They are treated analogously to
     the generic ‘p’ constraint.

     The syntax and semantics are otherwise identical to
     ‘define_constraint’.

 For historical reasons, names beginning with the letters ‘G H’ are
reserved for constraints that match only ‘const_double’s, and names
beginning with the letters ‘I J K L M N O P’ are reserved for
constraints that match only ‘const_int’s.  This may change in the
future.  For the time being, constraints with these names must be
written in a stylized form, so that ‘genpreds’ can tell you did it
correctly:

     (define_constraint "[GHIJKLMNOP]..."
       "DOC..."
       (and (match_code "const_int")  ; ‘const_double’ for G/H
            CONDITION...))            ; usually a ‘match_test’

 It is fine to use names beginning with other letters for constraints
that match ‘const_double’s or ‘const_int’s.

 Each docstring in a constraint definition should be one or more
complete sentences, marked up in Texinfo format.  _They are currently
unused._  In the future they will be copied into the GCC manual, in
*note Machine Constraints::, replacing the hand-maintained tables
currently found in that section.  Also, in the future the compiler may
use this to give more helpful diagnostics when poor choice of ‘asm’
constraints causes a reload failure.

 If you put the pseudo-Texinfo directive ‘@internal’ at the beginning of
a docstring, then (in the future) it will appear only in the internals
manual's version of the machine-specific constraint tables.  Use this
for constraints that should not appear in ‘asm’ statements.


File: gccint.info,  Node: C Constraint Interface,  Prev: Define Constraints,  Up: Constraints

17.9.8 Testing constraints from C
---------------------------------

It is occasionally useful to test a constraint from C code rather than
implicitly via the constraint string in a ‘match_operand’.  The
generated file ‘tm_p.h’ declares a few interfaces for working with
constraints.  At present these are defined for all constraints except
‘g’ (which is equivalent to ‘general_operand’).

 Some valid constraint names are not valid C identifiers, so there is a
mangling scheme for referring to them from C.  Constraint names that do
not contain angle brackets or underscores are left unchanged.
Underscores are doubled, each ‘<’ is replaced with ‘_l’, and each ‘>’
with ‘_g’.  Here are some examples:

     *Original* *Mangled*
     x          x
     P42x       P42x
     P4_x       P4__x
     P4>x       P4_gx
     P4>>       P4_g_g
     P4_g>      P4__g_g
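
 The mangling scheme is a character-by-character substitution.  The
following sketch reproduces it in stand-alone C; the function names are
hypothetical, and GCC's own generator programs perform the real
mangling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Writes the mangled form of NAME into OUT, which has room for CAP bytes
   including the terminating NUL.  Returns 0 on success, -1 if OUT is too
   small.  */
int mangle_constraint_name(const char *name, char *out, size_t cap)
{
  size_t n = 0;
  if (cap == 0)
    return -1;
  for (; *name; name++) {
    char single[2] = { *name, '\0' };
    const char *rep;
    switch (*name) {
    case '_': rep = "__"; break;   /* underscores are doubled */
    case '<': rep = "_l"; break;   /* '<' becomes "_l" */
    case '>': rep = "_g"; break;   /* '>' becomes "_g" */
    default:  rep = single; break; /* everything else is unchanged */
    }
    size_t len = strlen(rep);
    if (n + len + 1 > cap)
      return -1;
    memcpy(out + n, rep, len);
    n += len;
  }
  out[n] = '\0';
  return 0;
}

/* Convenience wrapper: does NAME mangle to EXPECTED?  */
bool mangles_to(const char *name, const char *expected)
{
  char buf[64];
  return mangle_constraint_name(name, buf, sizeof buf) == 0
         && strcmp(buf, expected) == 0;
}
```

 Applied to the table above, ‘mangles_to ("P4_g>", "P4__g_g")’ holds: the
underscore is doubled first and the trailing ‘>’ then becomes ‘_g’.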

 Throughout this section, the variable C is either a constraint in the
abstract sense, or a constant from ‘enum constraint_num’; the variable M
is a mangled constraint name (usually as part of a larger identifier).

 -- Enum: constraint_num
     For each constraint except ‘g’, there is a corresponding
     enumeration constant: ‘CONSTRAINT_’ plus the mangled name of the
     constraint.  Functions that take an ‘enum constraint_num’ as an
     argument expect one of these constants.

 -- Function: inline bool satisfies_constraint_M (rtx EXP)
     For each non-register constraint M except ‘g’, there is one of
     these functions; it returns ‘true’ if EXP satisfies the constraint.
     These functions are only visible if ‘rtl.h’ was included before
     ‘tm_p.h’.

 -- Function: bool constraint_satisfied_p (rtx EXP, enum constraint_num
          C)
     Like the ‘satisfies_constraint_M’ functions, but the constraint to
     test is given as an argument, C.  If C specifies a register
     constraint, this function will always return ‘false’.

 -- Function: enum reg_class reg_class_for_constraint (enum
          constraint_num C)
     Returns the register class associated with C.  If C is not a
     register constraint, or those registers are not available for the
     currently selected subtarget, returns ‘NO_REGS’.

 Here is an example use of ‘satisfies_constraint_M’.  In peephole
optimizations (*note Peephole Definitions::), operand constraint strings
are ignored, so if there are relevant constraints, they must be tested
in the C condition.  In the example, the optimization is applied if
operand 2 does _not_ satisfy the ‘K’ constraint.  (This is a simplified
version of a peephole definition from the i386 machine description.)

     (define_peephole2
       [(match_scratch:SI 3 "r")
        (set (match_operand:SI 0 "register_operand" "")
             (mult:SI (match_operand:SI 1 "memory_operand" "")
                      (match_operand:SI 2 "immediate_operand" "")))]

       "!satisfies_constraint_K (operands[2])"

       [(set (match_dup 3) (match_dup 1))
        (set (match_dup 0) (mult:SI (match_dup 3) (match_dup 2)))]

       "")

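 The C condition in that peephole can be pictured with a small model.
The x86 constraint table describes ‘K’ as a signed 8-bit integer
constant, so the sketch below stands in for ‘satisfies_constraint_K’
with a plain range test on the constant's value.  Both function names
are hypothetical; the real function takes an ‘rtx’ and is generated by
GCC.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for satisfies_constraint_K, operating on the
   integer value rather than on an rtx: signed 8-bit range.  */
bool model_satisfies_constraint_K(long long ival)
{
  return ival >= -128 && ival <= 127;
}

/* The peephole's condition: apply only when the multiplier does NOT
   fit the K constraint.  */
bool model_peephole_applies(long long multiplier)
{
  return !model_satisfies_constraint_K(multiplier);
}
```

 Under this model a multiplier of 1000 triggers the peephole, while 127
and -128 do not.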

File: gccint.info,  Node: Standard Names,  Next: Pattern Ordering,  Prev: Constraints,  Up: Machine Desc

17.10 Standard Pattern Names For Generation
===========================================

Here is a table of the instruction names that are meaningful in the RTL
generation pass of the compiler.  Giving one of these names to an
instruction pattern tells the RTL generation pass that it can use the
pattern to accomplish a certain task.

‘movM’
     Here M stands for a two-letter machine mode name, in lowercase.
     This instruction pattern moves data with that machine mode from
     operand 1 to operand 0.  For example, ‘movsi’ moves full-word data.

     If operand 0 is a ‘subreg’ with mode M of a register whose own mode
     is wider than M, the effect of this instruction is to store the
     specified value in the part of the register that corresponds to
     mode M.  Bits outside of M, but which are within the same target
     word as the ‘subreg’ are undefined.  Bits which are outside the
     target word are left unchanged.

     This class of patterns is special in several ways.  First of all,
     each of these names up to and including full word size _must_ be
     defined, because there is no other way to copy a datum from one
     place to another.  If there are patterns accepting operands in
     larger modes, ‘movM’ must be defined for integer modes of those
     sizes.

     Second, these patterns are not used solely in the RTL generation
     pass.  Even the reload pass can generate move insns to copy values
     from stack slots into temporary registers.  When it does so, one of
     the operands is a hard register and the other is an operand that
     might need to be reloaded into a register.

     Therefore, when given such a pair of operands, the pattern must
     generate RTL which needs no reloading and needs no temporary
     registers--no registers other than the operands.  For example, if
     you support the pattern with a ‘define_expand’, then in such a case
     the ‘define_expand’ mustn't call ‘force_reg’ or any other such
     function which might generate new pseudo registers.

     This requirement exists even for subword modes on a RISC machine
     where fetching those modes from memory normally requires several
     insns and some temporary registers.

     During reload a memory reference with an invalid address may be
     passed as an operand.  Such an address will be replaced with a
     valid address later in the reload pass.  In this case, nothing may
     be done with the address except to use it as it stands.  If it is
     copied, it will not be replaced with a valid address.  No attempt
     should be made to make such an address into a valid address and no
     routine (such as ‘change_address’) that will do so may be called.
     Note that ‘general_operand’ will fail when applied to such an
     address.

     The global variable ‘reload_in_progress’ (which must be explicitly
     declared if required) can be used to determine whether such special
     handling is required.

     The variety of operands that have reloads depends on the rest of
     the machine description, but typically on a RISC machine these can
     only be pseudo registers that did not get hard registers, while on
     other machines explicit memory references will get optional
     reloads.

     If a scratch register is required to move an object to or from
     memory, it can be allocated using ‘gen_reg_rtx’ prior to life
     analysis.

     If there are cases which need scratch registers during or after
     reload, you must provide an appropriate secondary_reload target
     hook.

     The macro ‘can_create_pseudo_p’ can be used to determine whether it
     is safe to create new pseudo registers.  If it evaluates to zero,
     then it is unsafe to call ‘gen_reg_rtx’ to allocate a new pseudo.

     The constraints on a ‘movM’ must permit moving any hard register to
     any other hard register provided that ‘TARGET_HARD_REGNO_MODE_OK’
     permits mode M in both registers and ‘TARGET_REGISTER_MOVE_COST’
     applied to their classes returns a value of 2.

     It is obligatory to support floating point ‘movM’ instructions into
     and out of any registers that can hold fixed point values, because
     unions and structures (which have modes ‘SImode’ or ‘DImode’) can
     be in those registers and they may have floating point members.

     There may also be a need to support fixed point ‘movM’ instructions
     in and out of floating point registers.  The original reason for
     this requirement is no longer known, and it may no longer apply.
     If ‘TARGET_HARD_REGNO_MODE_OK’ rejects fixed point values in
     floating point registers, then the constraints of the fixed point
     ‘movM’ instructions must be designed to avoid ever trying to reload
     into a floating point register.

‘reload_inM’
‘reload_outM’
     These named patterns have been obsoleted by the target hook
     ‘secondary_reload’.

     Like ‘movM’, but used when a scratch register is required to move
     between operand 0 and operand 1.  Operand 2 describes the scratch
     register.  See the discussion of the ‘SECONDARY_RELOAD_CLASS’ macro
     in *note Register Classes::.

     There are special restrictions on the form of the ‘match_operand’s
     used in these patterns.  First, only the predicate for the reload
     operand is examined, i.e., ‘reload_in’ examines operand 1, but not
     the predicates for operand 0 or 2.  Second, there may be only one
     alternative in the constraints.  Third, only a single register
     class letter may be used for the constraint; subsequent constraint
     letters are ignored.  As a special exception, an empty constraint
     string matches the ‘ALL_REGS’ register class.  This may relieve
     ports of the burden of defining an ‘ALL_REGS’ constraint letter
     just for these patterns.

‘movstrictM’
     Like ‘movM’ except that if operand 0 is a ‘subreg’ with mode M of a
     register whose natural mode is wider, the ‘movstrictM’ instruction
     is guaranteed not to alter any of the register except the part
     which belongs to mode M.

‘movmisalignM’
     This variant of a move pattern is designed to load or store a value
     from a memory address that is not naturally aligned for its mode.
     For a store, the memory will be in operand 0; for a load, the
     memory will be in operand 1.  The other operand is guaranteed not
     to be a memory, so that it's easy to tell whether this is a load or
     store.

     This pattern is used by the autovectorizer, and when expanding a
     ‘MISALIGNED_INDIRECT_REF’ expression.

‘load_multiple’
     Load several consecutive memory locations into consecutive
     registers.  Operand 0 is the first of the consecutive registers,
     operand 1 is the first memory location, and operand 2 is a
     constant: the number of consecutive registers.

     Define this only if the target machine really has such an
     instruction; do not define this if the most efficient way of
     loading consecutive registers from memory is to do them one at a
     time.

     On some machines, there are restrictions as to which consecutive
     registers can be loaded from memory, such as particular starting or
     ending register numbers or only a range of valid counts.  For those
     machines, use a ‘define_expand’ (*note Expander Definitions::) and
     make the pattern fail if the restrictions are not met.

     Write the generated insn as a ‘parallel’ with elements being a
     ‘set’ of one register from the appropriate memory location (you may
     also need ‘use’ or ‘clobber’ elements).  Use a ‘match_parallel’
     (*note RTL Template::) to recognize the insn.  See ‘rs6000.md’ for
     examples of the use of this insn pattern.

‘store_multiple’
     Similar to ‘load_multiple’, but store several consecutive registers
     into consecutive memory locations.  Operand 0 is the first of the
     consecutive memory locations, operand 1 is the first register, and
     operand 2 is a constant: the number of consecutive registers.

‘vec_load_lanesMN’
     Perform an interleaved load of several vectors from memory operand
     1 into register operand 0.  Both operands have mode M.  The
     register operand is viewed as holding consecutive vectors of mode
     N, while the memory operand is a flat array that contains the same
     number of elements.  The operation is equivalent to:

          int c = GET_MODE_SIZE (M) / GET_MODE_SIZE (N);
          for (j = 0; j < GET_MODE_NUNITS (N); j++)
            for (i = 0; i < c; i++)
              operand0[i][j] = operand1[j * c + i];

     For example, ‘vec_load_lanestiv4hi’ loads 8 16-bit values from
     memory into a register of mode ‘TI’.  The register contains two
     consecutive vectors of mode ‘V4HI’.

     This pattern can only be used if:
          TARGET_ARRAY_MODE_SUPPORTED_P (N, C)
     is true.  GCC assumes that, if a target supports this kind of
     instruction for some mode N, it also supports unaligned loads for
     vectors of mode N.

     This pattern is not allowed to ‘FAIL’.
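
     The ‘vec_load_lanestiv4hi’ case can be simulated directly.  In the
     sketch below, ‘short’ stands in for a 16-bit ‘HI’ element, C is 2
     and ‘GET_MODE_NUNITS (N)’ is 4; the function and constant names are
     hypothetical and the code only models the data movement.

```c
#include <assert.h>
#include <stddef.h>

enum { LANES_C = 2, LANES_NUNITS = 4 };  /* TI holds two V4HI vectors */

/* De-interleave MEM into LANES_C vectors of LANES_NUNITS elements.  */
void load_lanes(short dst[LANES_C][LANES_NUNITS], const short *mem)
{
  for (size_t j = 0; j < LANES_NUNITS; j++)
    for (size_t i = 0; i < LANES_C; i++)
      dst[i][j] = mem[j * LANES_C + i];
}

/* Loads the flat array 0..7 and returns element J of vector I.  */
short load_lanes_demo(size_t i, size_t j)
{
  static const short mem[LANES_C * LANES_NUNITS] = {0, 1, 2, 3, 4, 5, 6, 7};
  short regs[LANES_C][LANES_NUNITS];
  assert(i < LANES_C && j < LANES_NUNITS);
  load_lanes(regs, mem);
  return regs[i][j];
}
```

     With memory holding 0 to 7, vector 0 receives the even-indexed
     values 0, 2, 4, 6 and vector 1 the odd-indexed values 1, 3, 5, 7.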

‘vec_mask_load_lanesMN’
     Like ‘vec_load_lanesMN’, but takes an additional mask operand
     (operand 2) that specifies which elements of the destination
     vectors should be loaded.  Other elements of the destination
     vectors are set to zero.  The operation is equivalent to:

          int c = GET_MODE_SIZE (M) / GET_MODE_SIZE (N);
          for (j = 0; j < GET_MODE_NUNITS (N); j++)
            if (operand2[j])
              for (i = 0; i < c; i++)
                operand0[i][j] = operand1[j * c + i];
            else
              for (i = 0; i < c; i++)
                operand0[i][j] = 0;

     This pattern is not allowed to ‘FAIL’.
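
     A small simulation of the masked form, under the same assumptions
     as a two-vector, four-element layout with 16-bit elements.  The
     names are hypothetical.  Note that the model never reads memory for
     a masked-off index; it only writes zeros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { MLANES_C = 2, MLANES_NUNITS = 4 };

/* Masked de-interleaving load: inactive element positions become zero.  */
void mask_load_lanes(short dst[MLANES_C][MLANES_NUNITS], const short *mem,
                     const bool mask[MLANES_NUNITS])
{
  for (size_t j = 0; j < MLANES_NUNITS; j++)
    if (mask[j])
      for (size_t i = 0; i < MLANES_C; i++)
        dst[i][j] = mem[j * MLANES_C + i];
    else
      for (size_t i = 0; i < MLANES_C; i++)
        dst[i][j] = 0;
}

/* Loads 10..17 under mask {1,0,1,0}; returns element J of vector I.  */
short mask_load_lanes_demo(size_t i, size_t j)
{
  static const short mem[MLANES_C * MLANES_NUNITS] =
    {10, 11, 12, 13, 14, 15, 16, 17};
  static const bool mask[MLANES_NUNITS] = {true, false, true, false};
  short regs[MLANES_C][MLANES_NUNITS];
  assert(i < MLANES_C && j < MLANES_NUNITS);
  mask_load_lanes(regs, mem, mask);
  return regs[i][j];
}
```

     Here element positions 1 and 3 are zero in both destination
     vectors, while positions 0 and 2 hold the loaded values.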

‘vec_mask_len_load_lanesMN’
     Like ‘vec_load_lanesMN’, but takes an additional mask operand
     (operand 2), a length operand (operand 3), and a bias operand
     (operand 4) that together specify which elements of the destination
     vectors should be loaded.  Other elements of the destination
     vectors are undefined.  The operation is equivalent to:

          int c = GET_MODE_SIZE (M) / GET_MODE_SIZE (N);
          for (j = 0; j < operand3 + operand4; j++)
            if (operand2[j])
              for (i = 0; i < c; i++)
                operand0[i][j] = operand1[j * c + i];

     This pattern is not allowed to ‘FAIL’.

‘vec_store_lanesMN’
     Equivalent to ‘vec_load_lanesMN’, with the memory and register
     operands reversed.  That is, the instruction is equivalent to:

          int c = GET_MODE_SIZE (M) / GET_MODE_SIZE (N);
          for (j = 0; j < GET_MODE_NUNITS (N); j++)
            for (i = 0; i < c; i++)
              operand0[j * c + i] = operand1[i][j];

     for a memory operand 0 and register operand 1.

     This pattern is not allowed to ‘FAIL’.
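
     The store direction can be simulated the same way.  This sketch
     again assumes two source vectors of four 16-bit elements and uses
     hypothetical names; it interleaves the vectors back into a flat
     array.

```c
#include <assert.h>
#include <stddef.h>

enum { SLANES_C = 2, SLANES_NUNITS = 4 };

/* Interleave SLANES_C vectors of SLANES_NUNITS elements into MEM.  */
void store_lanes(short *mem, const short src[SLANES_C][SLANES_NUNITS])
{
  for (size_t j = 0; j < SLANES_NUNITS; j++)
    for (size_t i = 0; i < SLANES_C; i++)
      mem[j * SLANES_C + i] = src[i][j];
}

/* Stores {0,2,4,6} and {1,3,5,7}; returns flat array element K.  */
short store_lanes_demo(size_t k)
{
  static const short regs[SLANES_C][SLANES_NUNITS] = {
    {0, 2, 4, 6},
    {1, 3, 5, 7}
  };
  short mem[SLANES_C * SLANES_NUNITS];
  assert(k < SLANES_C * SLANES_NUNITS);
  store_lanes(mem, regs);
  return mem[k];
}
```

     The two source vectors interleave into the flat sequence 0 to 7,
     which is the inverse of the de-interleaving done by the load form.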

‘vec_mask_store_lanesMN’
     Like ‘vec_store_lanesMN’, but takes an additional mask operand
     (operand 2) that specifies which elements of the source vectors
     should be stored.  The operation is equivalent to:

          int c = GET_MODE_SIZE (M) / GET_MODE_SIZE (N);
          for (j = 0; j < GET_MODE_NUNITS (N); j++)
            if (operand2[j])
              for (i = 0; i < c; i++)
                operand0[j * c + i] = operand1[i][j];

     This pattern is not allowed to ‘FAIL’.

‘vec_mask_len_store_lanesMN’
     Like ‘vec_store_lanesMN’, but takes an additional mask operand
     (operand 2), a length operand (operand 3), and a bias operand
     (operand 4) that together specify which elements of the source
     vectors should be stored.  The operation is equivalent to:

          int c = GET_MODE_SIZE (M) / GET_MODE_SIZE (N);
          for (j = 0; j < operand3 + operand4; j++)
            if (operand2[j])
              for (i = 0; i < c; i++)
                operand0[j * c + i] = operand1[i][j];

     This pattern is not allowed to ‘FAIL’.

‘gather_loadMN’
     Load several separate memory locations into a vector of mode M.
     Operand 1 is a scalar base address and operand 2 is a vector of
     mode N containing offsets from that base.  Operand 0 is a
     destination vector with the same number of elements as N.  For each
     element index I:

        • extend the offset element I to address width, using zero
          extension if operand 3 is 1 and sign extension if operand 3 is
          zero;
        • multiply the extended offset by operand 4;
        • add the result to the base; and
        • load the value at that address into element I of operand 0.

     The value of operand 3 does not matter if the offsets are already
     address width.
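
     As a rough illustration, the steps above can be modeled in scalar C
     as follows.  This is only a sketch: the function name, the choice
     of 32-bit elements and 32-bit offsets, and the explicit ‘nunits’
     parameter are assumptions made for the example, not part of the
     pattern definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of gather_load for 32-bit elements and
   32-bit offsets.  BASE, OFFSETS, IS_UNSIGNED and SCALE stand in for
   operands 1 to 4; DEST stands in for operand 0.  */
void
gather_load_model (int32_t *dest, const unsigned char *base,
                   const uint32_t *offsets, int is_unsigned,
                   int scale, size_t nunits)
{
  assert (scale > 0);
  for (size_t i = 0; i < nunits; i++)
    {
      /* Extend offset element I to address width: zero extension if
         IS_UNSIGNED is 1, sign extension if it is 0.  */
      intptr_t ext = is_unsigned
                     ? (intptr_t) offsets[i]
                     : (intptr_t) (int32_t) offsets[i];
      /* Scale the offset, add it to the base and load element I.  */
      memcpy (&dest[i], base + ext * scale, sizeof dest[i]);
    }
}
```

     With sign extension selected, an offset element of -2 and a scale
     of 4 addresses the element 8 bytes before the base.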

‘mask_gather_loadMN’
     Like ‘gather_loadMN’, but takes an extra mask operand as operand 5.
     Bit I of the mask is set if element I of the result should be
     loaded from memory and clear if element I of the result should be
     set to zero.

‘mask_len_gather_loadMN’
     Like ‘gather_loadMN’, but takes an extra mask operand (operand 5),
     a len operand (operand 6) as well as a bias operand (operand 7).
     Similar to ‘mask_len_loadMN’, the instruction loads at most
     (operand 6 + operand 7) elements from memory.  Bit I of the mask is
     set if element I of the result should be loaded from memory and
     clear if element I of the result should be undefined.  Mask
     elements I with I >= (operand 6 + operand 7) are ignored.

‘scatter_storeMN’
     Store a vector of mode M into several distinct memory locations.
     Operand 0 is a scalar base address and operand 1 is a vector of
     mode N containing offsets from that base.  Operand 4 is the vector
     of values that should be stored, which has the same number of
     elements as N.  For each element index I:

        • extend the offset element I to address width, using zero
          extension if operand 2 is 1 and sign extension if operand 2 is
          zero;
        • multiply the extended offset by operand 3;
        • add the result to the base; and
        • store element I of operand 4 to that address.

     The value of operand 2 does not matter if the offsets are already
     address width.
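
     The store direction can be sketched in the same way.  Again, the
     function name, the 32-bit element and offset widths and the
     ‘nunits’ parameter are assumptions chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of scatter_store for 32-bit elements and
   32-bit offsets.  BASE, OFFSETS, IS_UNSIGNED, SCALE and VALUES stand
   in for operands 0 to 4.  */
void
scatter_store_model (unsigned char *base, const uint32_t *offsets,
                     int is_unsigned, int scale,
                     const int32_t *values, size_t nunits)
{
  assert (scale > 0);
  for (size_t i = 0; i < nunits; i++)
    {
      /* Extend offset element I to address width.  */
      intptr_t ext = is_unsigned
                     ? (intptr_t) offsets[i]
                     : (intptr_t) (int32_t) offsets[i];
      /* Scale the offset, add it to the base and store element I.  */
      memcpy (base + ext * scale, &values[i], sizeof values[i]);
    }
}
```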

‘mask_scatter_storeMN’
     Like ‘scatter_storeMN’, but takes an extra mask operand as operand
     5.  Bit I of the mask is set if element I of operand 4 should be
     stored to memory.

‘mask_len_scatter_storeMN’
     Like ‘scatter_storeMN’, but takes an extra mask operand (operand
     5), a len operand (operand 6) as well as a bias operand (operand
     7).  The instruction stores at most (operand 6 + operand 7)
     elements of (operand 4) to memory.  Bit I of the mask is set if
     element I of (operand 4) should be stored.  Mask elements I with I
     >= (operand 6 + operand 7) are ignored.

‘vec_setM’
     Set the given field in the vector value.  Operand 0 is the vector
     to modify, operand 1 is the new value of the field and operand 2
     specifies the field index.

     This pattern is not allowed to ‘FAIL’.

‘vec_extractMN’
     Extract the given field from the vector value.  Operand 1 is the
     vector, operand 2 specifies the field index and operand 0 is the
     place to store the value into.  N is the mode of the field or
     vector of fields that should be extracted; it should be either the
     element mode of the vector mode M, or a vector mode with the same
     element mode and a smaller number of elements.  If N is a vector
     mode, the index is counted in multiples of mode N.

     This pattern is not allowed to ‘FAIL’.

‘vec_initMN’
     Initialize the vector to the given values.  Operand 0 is the vector
     to initialize and operand 1 is a parallel containing the values for
     the individual fields.  N is the mode of the elements; it should be
     either the element mode of the vector mode M, or a vector mode with
     the same element mode and a smaller number of elements.

‘vec_duplicateM’
     Initialize vector output operand 0 so that each element has the
     value given by scalar input operand 1.  The vector has mode M and
     the scalar has the mode appropriate for one element of M.

     This pattern only handles duplicates of non-constant inputs.
     Constant vectors go through the ‘movM’ pattern instead.

     This pattern is not allowed to ‘FAIL’.

‘vec_seriesM’
     Initialize vector output operand 0 so that element I is equal to
     operand 1 plus I times operand 2.  In other words, create a linear
     series whose base value is operand 1 and whose step is operand 2.

     The vector output has mode M and the scalar inputs have the mode
     appropriate for one element of M.  This pattern is not used for
     floating-point vectors, in order to avoid having to specify the
     rounding behavior for I > 1.

     This pattern is not allowed to ‘FAIL’.
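
     A minimal scalar sketch of the series, assuming 32-bit integer
     elements (the function name and the ‘nunits’ parameter are
     hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of vec_series for 32-bit integer elements.
   Unsigned arithmetic is used so that wraparound is well defined.  */
void
vec_series_model (uint32_t *op0, uint32_t base, uint32_t step,
                  size_t nunits)
{
  assert (op0 != NULL);
  for (size_t i = 0; i < nunits; i++)
    op0[i] = base + (uint32_t) i * step;
}
```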

‘while_ultMN’
     Set operand 0 to a mask that is true while incrementing operand 1
     gives a value that is less than operand 2, for a vector length up
     to operand 3.  Operand 0 has mode N and operands 1 and 2 are scalar
     integers of mode M.  Operand 3 should be omitted when N is a vector
     mode, and a ‘CONST_INT’ otherwise.  The operation for vector modes
     is equivalent to:

          operand0[0] = operand1 < operand2;
          for (i = 1; i < GET_MODE_NUNITS (N); i++)
            operand0[i] = operand0[i - 1] && (operand1 + i < operand2);

     And for non-vector modes the operation is equivalent to:

          operand0[0] = operand1 < operand2;
          for (i = 1; i < operand3; i++)
            operand0[i] = operand0[i - 1] && (operand1 + i < operand2);

‘select_vlM’
     Set operand 0 to the number of scalar iterations that should be
     handled by one iteration of a vector loop.  Operand 1 is the total
     number of scalar iterations that the loop needs to process and
     operand 2 is a maximum bound on the result (also known as the
     maximum "vectorization factor").

     The maximum value of operand 0 is given by:
          operand0 = MIN (operand1, operand2)
     However, targets might choose a lower value than this, based on
     target-specific criteria.  Each iteration of the vector loop might
     therefore process a different number of scalar iterations, which in
     turn means that induction variables will have a variable step.
     Because of this, it is generally not useful to define this
     instruction if it will always calculate the maximum value.

     This optab is only useful on targets that implement ‘len_load_M’
     and/or ‘len_store_M’.
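
     The following sketch shows the shape of a loop driven by such a
     length.  It is an illustration only: ‘select_vl_model’ simply
     returns the maximum value, which a real target would not
     necessarily do, and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for select_vl that returns
   MIN (remaining, vf).  A real target may return a lower value.  */
size_t
select_vl_model (size_t remaining, size_t vf)
{
  return remaining < vf ? remaining : vf;
}

/* Add 1 to each of the N elements of A, processing up to VF elements
   per vector iteration.  Returns the number of vector iterations.  */
size_t
add_one_loop (int *a, size_t n, size_t vf)
{
  assert (vf > 0);
  size_t iters = 0;
  for (size_t done = 0; done < n; iters++)
    {
      size_t vl = select_vl_model (n - done, vf);
      /* Models a length-controlled load, add and store of VL
         elements.  */
      for (size_t i = 0; i < vl; i++)
        a[done + i] += 1;
      /* The induction variable advances by VL, not by a fixed step.  */
      done += vl;
    }
  return iters;
}
```

     With 10 scalar iterations and a bound of 4, this model runs three
     vector iterations that handle 4, 4 and 2 elements.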

‘check_raw_ptrsM’
     Check whether, given two pointers A and B and a length LEN, a write
     of LEN bytes at A followed by a read of LEN bytes at B can be split
     into interleaved byte accesses ‘A[0], B[0], A[1], B[1], ...’
     without affecting the dependencies between the bytes.  Set operand
     0 to true if the split is possible and false otherwise.

     Operands 1, 2 and 3 provide the values of A, B and LEN
     respectively.  Operand 4 is a constant integer that provides the
     known common alignment of A and B.  All inputs have mode M.

     This split is possible if:

          A == B || A + LEN <= B || B + LEN <= A

     You should only define this pattern if the target has a way of
     accelerating the test without having to do the individual
     comparisons.

‘check_war_ptrsM’
     Like ‘check_raw_ptrsM’, but with the read and write swapped round.
     The split is possible in this case if:

          B <= A || A + LEN <= B
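
     For reference, the two conditions can be written out as plain C
     predicates.  The function names are hypothetical and the sketch
     assumes that neither A + LEN nor B + LEN wraps around.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Write of LEN bytes at A followed by a read of LEN bytes at B.  */
bool
check_raw_ptrs_model (uintptr_t a, uintptr_t b, uintptr_t len)
{
  /* Assumption: the address ranges do not wrap.  */
  assert (a + len >= a && b + len >= b);
  return a == b || a + len <= b || b + len <= a;
}

/* Read of LEN bytes at A followed by a write of LEN bytes at B.  */
bool
check_war_ptrs_model (uintptr_t a, uintptr_t b, uintptr_t len)
{
  assert (a + len >= a);
  return b <= a || a + len <= b;
}
```

     These are the individual comparisons that a target-specific
     instruction would be expected to replace.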

‘vec_cmpMN’
     Output a vector comparison.  Operand 0 of mode N is the destination
     for the predicate in operand 1, which is a signed vector comparison
     with operands of mode M in operands 2 and 3.  The predicate is
     computed by element-wise evaluation of the vector comparison with a
     truth value of all-ones and a false value of all-zeros.

‘vec_cmpuMN’
     Similar to ‘vec_cmpMN’ but perform unsigned vector comparison.

‘vec_cmpeqMN’
     Similar to ‘vec_cmpMN’ but perform equality or non-equality vector
     comparison only.  If ‘vec_cmpMN’ or ‘vec_cmpuMN’ instruction
     pattern is supported, it will be preferred over ‘vec_cmpeqMN’, so
     there is no need to define this instruction pattern if the others
     are supported.

‘vcondMN’
     Output a conditional vector move.  Operand 0 is the destination to
     receive a combination of operand 1 and operand 2, which are of mode
     M, dependent on the outcome of the predicate in operand 3 which is
     a signed vector comparison with operands of mode N in operands 4
     and 5.  The modes M and N should have the same size.  Operand 0
     will be set to the value OP1 & MSK | OP2 & ~MSK where MSK is
     computed by element-wise evaluation of the vector comparison with a
     truth value of all-ones and a false value of all-zeros.
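
     As a sketch, the selection can be modeled per element as follows.
     The example assumes 32-bit elements and fixes the comparison to
     "less than"; the function name and ‘nunits’ parameter are
     hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of vcond for 32-bit elements, with the predicate
   in operand 3 assumed to be a signed "less than" comparison of
   operands 4 and 5.  */
void
vcond_lt_model (uint32_t *op0, const uint32_t *op1, const uint32_t *op2,
                const int32_t *op4, const int32_t *op5, size_t nunits)
{
  assert (op0 != NULL);
  for (size_t i = 0; i < nunits; i++)
    {
      /* All-ones for true, all-zeros for false.  */
      uint32_t msk = op4[i] < op5[i] ? UINT32_MAX : 0;
      op0[i] = (op1[i] & msk) | (op2[i] & ~msk);
    }
}
```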

‘vconduMN’
     Similar to ‘vcondMN’ but performs unsigned vector comparison.

‘vcondeqMN’
     Similar to ‘vcondMN’ but performs equality or non-equality vector
     comparison only.  If ‘vcondMN’ or ‘vconduMN’ instruction pattern is
     supported, it will be preferred over ‘vcondeqMN’, so there is no
     need to define this instruction pattern if the others are
     supported.

‘vcond_mask_MN’
     Similar to ‘vcondMN’ but operand 3 holds a pre-computed result of
     vector comparison.

‘vcond_mask_len_MN’
     Set each element of operand 0 to the corresponding element of
     operand 2 or operand 3.  Choose operand 2 if both the element index
     is less than operand 4 plus operand 5 and the corresponding element
     of operand 1 is nonzero:

          for (i = 0; i < GET_MODE_NUNITS (M); i++)
            op0[i] = i < op4 + op5 && op1[i] ? op2[i] : op3[i];

     Operands 0, 2 and 3 have mode M.  Operand 1 has mode N.  Operands 4
     and 5 have a target-dependent scalar integer mode.

‘maskloadMN’
     Perform a masked load of vector from memory operand 1 of mode M
     into register operand 0.  Mask is provided in register operand 2 of
     mode N.

     This pattern is not allowed to ‘FAIL’.

‘maskstoreMN’
     Perform a masked store of vector from register operand 1 of mode M
     into memory operand 0.  Mask is provided in register operand 2 of
     mode N.

     This pattern is not allowed to ‘FAIL’.

‘len_load_M’
     Load (operand 2 + operand 3) elements from memory operand 1 into
     vector register operand 0, setting the other elements of operand 0
     to undefined values.  Operands 0 and 1 have mode M, which must be a
     vector mode.  Operand 2 has whichever integer mode the target
     prefers.  Operand 3 conceptually has mode ‘QI’.

     Operand 2 can be a variable or a constant amount.  Operand 3
     specifies a constant bias: it is either a constant 0 or a constant
     -1.  The predicate on operand 3 must only accept the bias values
     that the target actually supports.  GCC handles a bias of 0 more
     efficiently than a bias of -1.

     If (operand 2 + operand 3) exceeds the number of elements in mode
     M, the behavior is undefined.

     If the target prefers the length to be measured in bytes rather
     than elements, it should only implement this pattern for vectors of
     ‘QI’ elements.

     This pattern is not allowed to ‘FAIL’.
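
     The effect of the length and bias operands can be sketched as
     follows.  The model assumes a mode with eight 32-bit elements and
     leaves the remaining elements of operand 0 untouched, which is just
     one of the outcomes that "undefined" permits; all names are
     hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUNITS 8  /* Hypothetical number of elements in mode M.  */

/* Hypothetical model of len_load_M.  OP0 is the destination register,
   OP1 the memory source, LEN operand 2 and BIAS operand 3.  */
void
len_load_model (int32_t *op0, const int32_t *op1, int len, int bias)
{
  /* The bias is either 0 or -1.  */
  assert (bias == 0 || bias == -1);
  /* Exceeding the number of elements would be undefined behavior.  */
  assert (len + bias >= 0 && len + bias <= NUNITS);
  for (int i = 0; i < len + bias; i++)
    op0[i] = op1[i];
}
```

     For example, a length of 4 with a bias of -1 loads three elements.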

‘len_store_M’
     Store (operand 2 + operand 3) vector elements from vector register
     operand 1 into memory operand 0, leaving the other elements of
     operand 0 unchanged.  Operands 0 and 1 have mode M, which must be a
     vector mode.  Operand 2 has whichever integer mode the target
     prefers.  Operand 3 conceptually has mode ‘QI’.

     Operand 2 can be a variable or a constant amount.  Operand 3
     specifies a constant bias: it is either a constant 0 or a constant
     -1.  The predicate on operand 3 must only accept the bias values
     that the target actually supports.  GCC handles a bias of 0 more
     efficiently than a bias of -1.

     If (operand 2 + operand 3) exceeds the number of elements in mode
     M, the behavior is undefined.

     If the target prefers the length to be measured in bytes rather
     than elements, it should only implement this pattern for vectors of
     ‘QI’ elements.

     This pattern is not allowed to ‘FAIL’.

‘mask_len_loadMN’
     Perform a masked load from the memory location pointed to by
     operand 1 into register operand 0.  (operand 3 + operand 4)
     elements are loaded from memory and other elements in operand 0 are
     set to undefined values.  This is a combination of len_load and
     maskload.  Operands 0 and 1 have mode M, which must be a vector
     mode.  Operand 3 has whichever integer mode the target prefers.  A
     mask is specified in operand 2 which must be of type N.  The mask
     has lower precedence than the length and is itself subject to
     length masking, i.e.  only mask indices < (operand 3 + operand 4)
     are used.  Operand 4 conceptually has mode ‘QI’.

     Operand 3 can be a variable or a constant amount.  Operand 4
     specifies a constant bias: it is either a constant 0 or a constant
     -1.  The predicate on operand 4 must only accept the bias values
     that the target actually supports.  GCC handles a bias of 0 more
     efficiently than a bias of -1.

     If (operand 3 + operand 4) exceeds the number of elements in mode
     M, the behavior is undefined.

     If the target prefers the length to be measured in bytes rather
     than elements, it should only implement this pattern for vectors of
     ‘QI’ elements.

     This pattern is not allowed to ‘FAIL’.

‘mask_len_storeMN’
     Perform a masked store from vector register operand 1 into memory
     operand 0.  (operand 3 + operand 4) elements are stored to memory,
     leaving the other elements of operand 0 unchanged.  This is a
     combination of len_store and maskstore.  Operands 0 and 1 have mode
     M, which must be a vector mode.  Operand 3 has whichever integer
     mode the target prefers.  A mask is specified in operand 2 which
     must be of type N.  The mask has lower precedence than the length
     and is itself subject to length masking, i.e.  only mask indices <
     (operand 3 + operand 4) are used.  Operand 4 conceptually has mode
     ‘QI’.

     Operand 3 can be a variable or a constant amount.  Operand 4
     specifies a constant bias: it is either a constant 0 or a constant
     -1.  The predicate on operand 4 must only accept the bias values
     that the target actually supports.  GCC handles a bias of 0 more
     efficiently than a bias of -1.

     If (operand 3 + operand 4) exceeds the number of elements in mode
     M, the behavior is undefined.

     If the target prefers the length to be measured in bytes rather
     than elements, it should only implement this pattern for vectors of
     ‘QI’ elements.

     This pattern is not allowed to ‘FAIL’.

‘vec_permM’
     Output a (variable) vector permutation.  Operand 0 is the
     destination to receive elements from operand 1 and operand 2, which
     are of mode M.  Operand 3 is the “selector”.  It is an integral
     mode vector of the same width and number of elements as mode M.

     The input elements are numbered from 0 in operand 1 through 2*N-1
     in operand 2.  The elements of the selector must be computed modulo
     2*N.  Note that if ‘rtx_equal_p(operand1, operand2)’, this can be
     implemented with just operand 1 and selector elements modulo N.

     In order to make things easy for a number of targets, if there is
     no ‘vec_perm’ pattern for mode M, but there is for mode Q where Q
     is a vector of ‘QImode’ of the same width as M, the middle-end will
     lower the mode M ‘VEC_PERM_EXPR’ to mode Q.

     See also ‘TARGET_VECTORIZE_VEC_PERM_CONST’, which performs the
     analogous operation for constant selectors.
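
     The element numbering and the modulo rule can be illustrated with a
     scalar model.  The sketch assumes byte elements; the function name
     and the explicit element count ‘n’ are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of vec_perm for vectors of N byte elements.  OP0
   must not overlap the inputs in this model.  */
void
vec_perm_model (uint8_t *op0, const uint8_t *op1, const uint8_t *op2,
                const uint8_t *sel, size_t n)
{
  assert (n > 0);
  for (size_t i = 0; i < n; i++)
    {
      /* Selector elements are interpreted modulo 2*N.  */
      size_t k = sel[i] % (2 * n);
      /* Indices 0..N-1 select from operand 1, N..2*N-1 from
         operand 2.  */
      op0[i] = k < n ? op1[k] : op2[k - n];
    }
}
```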

‘pushM1’
     Output a push instruction.  Operand 0 is the value to push.  Used
     only when ‘PUSH_ROUNDING’ is defined.  For historical reasons, this
     pattern may be missing, in which case a ‘mov’ expander is used
     instead, with a ‘MEM’ expression forming the push operation.  The
     ‘mov’ expander method is deprecated.

‘addM3’
     Add operand 2 and operand 1, storing the result in operand 0.  All
     operands must have mode M.  This can be used even on two-address
     machines, by means of constraints requiring operands 1 and 0 to be
     the same location.

‘ssaddM3’, ‘usaddM3’
‘subM3’, ‘sssubM3’, ‘ussubM3’
‘mulM3’, ‘ssmulM3’, ‘usmulM3’
‘divM3’, ‘ssdivM3’
‘udivM3’, ‘usdivM3’
‘modM3’, ‘umodM3’
‘uminM3’, ‘umaxM3’
‘andM3’, ‘iorM3’, ‘xorM3’
     Similar, for other arithmetic operations.

‘addvM4’
     Like ‘addM3’ but takes a ‘code_label’ as operand 3 and emits code
     to jump to it if signed overflow occurs during the addition.  This
     pattern is used to implement the built-in functions performing
     signed integer addition with overflow checking.

‘subvM4’, ‘mulvM4’
     Similar, for other signed arithmetic operations.

‘uaddvM4’
     Like ‘addvM4’ but for unsigned addition.  That is to say, the
     operation is the same as signed addition but the jump is taken only
     on unsigned overflow.

‘usubvM4’, ‘umulvM4’
     Similar, for other unsigned arithmetic operations.

‘uaddcM5’
     Adds unsigned operands 2, 3 and 4 (where the last operand is
     guaranteed to have only values 0 or 1) together, sets operand 0 to
     the result of the addition of the 3 operands and sets operand 1 to
     1 iff there was overflow on the unsigned additions, and to 0
     otherwise.  So, it is an addition with carry in (operand 4) and
     carry out (operand 1).  All operands have the same mode.

‘usubcM5’
     Similarly to ‘uaddcM5’, except subtracts unsigned operands 3 and 4
     from operand 2 instead of adding them.  So, it is a subtraction
     with carry/borrow in (operand 4) and carry/borrow out (operand 1).
     All operands have the same mode.
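
     Both carry patterns can be modeled in portable C.  The sketch
     assumes a 32-bit mode; the function names and the use of a pointer
     for operand 1 are choices made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of uaddcM5.  Returns operand 0 and stores the
   carry out (operand 1) in *CARRY_OUT.  */
uint32_t
uaddc_model (uint32_t *carry_out, uint32_t op2, uint32_t op3,
             uint32_t carry_in)
{
  assert (carry_in <= 1);
  uint32_t partial = op2 + op3;
  uint32_t sum = partial + carry_in;
  /* At most one of the two additions can wrap.  */
  *carry_out = (partial < op2) | (sum < partial);
  return sum;
}

/* Hypothetical model of usubcM5.  Returns operand 0 and stores the
   borrow out (operand 1) in *BORROW_OUT.  */
uint32_t
usubc_model (uint32_t *borrow_out, uint32_t op2, uint32_t op3,
             uint32_t borrow_in)
{
  assert (borrow_in <= 1);
  uint32_t partial = op2 - op3;
  uint32_t diff = partial - borrow_in;
  /* At most one of the two subtractions can wrap.  */
  *borrow_out = (op2 < op3) | (partial < borrow_in);
  return diff;
}
```

     Chaining the carry out of one call into the carry in of the next
     gives multi-word addition or subtraction.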

‘addptrM3’
     Like ‘addM3’ but is guaranteed to only be used for address
     calculations.  The expanded code is not allowed to clobber the
     condition code.  It only needs to be defined if ‘addM3’ sets the
     condition code.  If the adds used for address calculations and
     normal adds are not compatible, the target must expand a distinct
     pattern (e.g. using an unspec).  LRA uses the pattern to emit
     address calculations.  ‘addM3’ is used if ‘addptrM3’ is not
     defined.

‘fmaM4’
     Multiply operand 2 and operand 1, then add operand 3, storing the
     result in operand 0 without doing an intermediate rounding step.
     All operands must have mode M.  This pattern is used to implement
     the ‘fma’, ‘fmaf’, and ‘fmal’ builtin functions from the ISO C99
     standard.

‘fmsM4’
     Like ‘fmaM4’, except that operand 3 is subtracted from the product
     instead of added to it.  This is represented in the rtl as

          (fma:M OP1 OP2 (neg:M OP3))

‘fnmaM4’
     Like ‘fmaM4’ except that the intermediate product is negated before
     being added to operand 3.  This is represented in the rtl as

          (fma:M (neg:M OP1) OP2 OP3)

‘fnmsM4’
     Like ‘fmsM4’ except that the intermediate product is negated before
     subtracting operand 3.  This is represented in the rtl as

          (fma:M (neg:M OP1) OP2 (neg:M OP3))

‘sminM3’, ‘smaxM3’
     Signed minimum and maximum operations.  When used with floating
     point, if both operands are zeros, or if either operand is ‘NaN’,
     then it is unspecified which of the two operands is returned as the
     result.

‘fminM3’, ‘fmaxM3’
     IEEE-conformant minimum and maximum operations.  If one operand is
     a quiet ‘NaN’, then the other operand is returned.  If both
     operands are quiet ‘NaN’, then a quiet ‘NaN’ is returned.  In the
     case when gcc supports signaling ‘NaN’ (-fsignaling-nans) an
     invalid floating point exception is raised and a quiet ‘NaN’ is
     returned.

     All operands have mode M, which is a scalar or vector
     floating-point mode.  These patterns are not allowed to ‘FAIL’.

‘reduc_smin_scal_M’, ‘reduc_smax_scal_M’
     Find the signed minimum/maximum of the elements of a vector.  The
     vector is operand 1, and operand 0 is the scalar result, with mode
     equal to the mode of the elements of the input vector.

‘reduc_umin_scal_M’, ‘reduc_umax_scal_M’
     Find the unsigned minimum/maximum of the elements of a vector.  The
     vector is operand 1, and operand 0 is the scalar result, with mode
     equal to the mode of the elements of the input vector.

‘reduc_fmin_scal_M’, ‘reduc_fmax_scal_M’
     Find the floating-point minimum/maximum of the elements of a
     vector, using the same rules as ‘fminM3’ and ‘fmaxM3’.  Operand 1
     is a vector of mode M and operand 0 is the scalar result, which has
     mode ‘GET_MODE_INNER (M)’.

‘reduc_plus_scal_M’
     Compute the sum of the elements of a vector.  The vector is operand
     1, and operand 0 is the scalar result, with mode equal to the mode
     of the elements of the input vector.

‘reduc_and_scal_M’
‘reduc_ior_scal_M’
‘reduc_xor_scal_M’
     Compute the bitwise ‘AND’/‘IOR’/‘XOR’ reduction of the elements of
     a vector of mode M.  Operand 1 is the vector input and operand 0 is
     the scalar result.  The mode of the scalar result is the same as
     one element of M.

‘extract_last_M’
     Find the last set bit in mask operand 1 and extract the associated
     element of vector operand 2.  Store the result in scalar operand 0.
     Operand 2 has vector mode M while operand 0 has the mode
     appropriate for one element of M.  Operand 1 has the usual mask
     mode for vectors of mode M; see ‘TARGET_VECTORIZE_GET_MASK_MODE’.

‘fold_extract_last_M’
     If any bits of mask operand 2 are set, find the last set bit,
     extract the associated element from vector operand 3, and store the
     result in operand 0.  Store operand 1 in operand 0 otherwise.
     Operand 3 has mode M and operands 0 and 1 have the mode appropriate
     for one element of M.  Operand 2 has the usual mask mode for
     vectors of mode M; see ‘TARGET_VECTORIZE_GET_MASK_MODE’.
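
     A scalar sketch of this behavior, assuming 32-bit elements and a
     mask represented as an array of ‘bool’ (both assumptions made for
     the example, as is the function name):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of fold_extract_last_M.  OP1 is the fallback
   scalar, OP2 the mask and OP3 the vector; the return value models
   operand 0.  */
int32_t
fold_extract_last_model (int32_t op1, const bool *op2,
                         const int32_t *op3, size_t nunits)
{
  assert (op2 != NULL && op3 != NULL);
  int32_t op0 = op1;  /* Used when no mask bit is set.  */
  for (size_t i = 0; i < nunits; i++)
    if (op2[i])
      op0 = op3[i];  /* Later set bits override earlier ones.  */
  return op0;
}
```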

‘len_fold_extract_last_M’
     Like ‘fold_extract_last_M’, but takes an extra length operand as
     operand 4 and an extra bias operand as operand 5.  The extracted
     element must have an index I that satisfies I < len (operand 4) +
     bias (operand 5).

‘fold_left_plus_M’
     Take scalar operand 1 and successively add each element from vector
     operand 2.  Store the result in scalar operand 0.  The vector has
     mode M and the scalars have the mode appropriate for one element of
     M.  The operation is strictly in-order: there is no reassociation.

‘mask_fold_left_plus_M’
     Like ‘fold_left_plus_M’, but takes an additional mask operand
     (operand 3) that specifies which elements of the source vector
     should be added.

‘mask_len_fold_left_plus_M’
     Like ‘fold_left_plus_M’, but takes an additional mask operand
     (operand 3), len operand (operand 4) and bias operand (operand 5).
     It performs the following operations strictly in-order (no
     reassociation), where LEN is operand 4 and BIAS is operand 5:

          operand0 = operand1;
          for (i = 0; i < LEN + BIAS; i++)
            if (operand3[i])
              operand0 += operand2[i];

‘sdot_prodM’

     Compute the sum of the products of two signed elements.  Operand 1
     and operand 2 are of the same mode.  Their product, which is of a
     wider mode, is computed and added to operand 3.  Operand 3 is of a
     mode equal or wider than the mode of the product.  The result is
     placed in operand 0, which is of the same mode as operand 3.  M is
     the mode of operand 1 and operand 2.

     Semantically the expressions perform the multiplication in the
     following signs

          sdot<signed op0, signed op1, signed op2, signed op3> ==
             op0 = sign-ext (op1) * sign-ext (op2) + op3
          ...

‘udot_prodM’

     Compute the sum of the products of two unsigned elements.  Operand
     1 and operand 2 are of the same mode.  Their product, which is of a
     wider mode, is computed and added to operand 3.  Operand 3 is of a
     mode equal or wider than the mode of the product.  The result is
     placed in operand 0, which is of the same mode as operand 3.  M is
     the mode of operand 1 and operand 2.

     Semantically the expressions perform the multiplication in the
     following signs

          udot<unsigned op0, unsigned op1, unsigned op2, unsigned op3> ==
             op0 = zero-ext (op1) * zero-ext (op2) + op3
          ...

‘usdot_prodM’
     Compute the sum of the products of elements of different signs.
     Operand 1 must be unsigned and operand 2 signed.  Their product,
     which is of a wider mode, is computed and added to operand 3.
     Operand 3 is of a mode equal or wider than the mode of the product.
     The result is placed in operand 0, which is of the same mode as
     operand 3.  M is the mode of operand 1 and operand 2.

     Semantically the expressions perform the multiplication in the
     following signs

          usdot<signed op0, unsigned op1, signed op2, signed op3> ==
             op0 = ((signed-conv) zero-ext (op1)) * sign-ext (op2) + op3
          ...
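
     The mixed-sign extension can be illustrated in scalar C.  This is a
     simplified sketch: it assumes 8-bit inputs and a 32-bit
     accumulator, and it folds all products into a single accumulator
     rather than modeling the lanes of a vector accumulator.  The
     function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of usdot_prod: OP1 elements are unsigned,
   OP2 elements are signed, OP3 is the incoming accumulator.  */
int32_t
usdot_prod_model (const uint8_t *op1, const int8_t *op2, int32_t op3,
                  size_t n)
{
  assert (op1 != NULL && op2 != NULL);
  int32_t acc = op3;
  for (size_t i = 0; i < n; i++)
    /* Zero-extend op1, sign-extend op2, then multiply as signed.  */
    acc += (int32_t) op1[i] * (int32_t) op2[i];
  return acc;
}
```

     Note that an input byte of 255 in operand 1 contributes as +255,
     not as -1.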

‘ssadM’
‘usadM’
     Compute the sum of absolute differences of two signed/unsigned
     elements.  Operand 1 and operand 2 are of the same mode.  Their
     absolute difference, which is of a wider mode, is computed and
     added to operand 3.  Operand 3 is of a mode equal or wider than the
     mode of the absolute difference.  The result is placed in operand
     0, which is of the same mode as operand 3.  M is the mode of
     operand 1 and operand 2.

‘widen_ssumM3’
‘widen_usumM3’
     Operands 0 and 2 are of the same mode, which is wider than the mode
     of operand 1.  Add operand 1 to operand 2 and place the widened
     result in operand 0.  (This is used to express accumulation of
     elements into an accumulator of a wider mode.)  M is the mode of
     operand 1.

‘smulhsM3’
‘umulhsM3’
     Signed/unsigned multiply high with scale.  This is equivalent to
     the C code:
          narrow op0, op1, op2;
          ...
          op0 = (narrow) (((wide) op1 * (wide) op2) >> (N / 2 - 1));
     where the sign of ‘narrow’ determines whether this is a signed or
     unsigned operation, and N is the size of ‘wide’ in bits.  M is the
     mode for all 3 operands (narrow).  The wide mode is not specified
     and is defined to fit the whole multiply.

‘smulhrsM3’
‘umulhrsM3’
     Signed/unsigned multiply high with round and scale.  This is
     equivalent to the C code:
          narrow op0, op1, op2;
          ...
          op0 = (narrow) (((((wide) op1 * (wide) op2) >> (N / 2 - 2)) + 1) >> 1);
     where the sign of ‘narrow’ determines whether this is a signed or
     unsigned operation, and N is the size of ‘wide’ in bits.  M is the
     mode for all 3 operands (narrow).  The wide mode is not specified
     and is defined to fit the whole multiply.
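
     As a concrete instance of the two formulas, the sketch below fixes
     ‘narrow’ to a signed 16-bit type and ‘wide’ to a signed 32-bit
     type, so N is 32.  The function names are hypothetical, and the
     code assumes that right shifts of negative values are arithmetic,
     as they are with GCC.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of smulhsM3 with a 16-bit narrow type.  */
int16_t
smulhs_model (int16_t op1, int16_t op2)
{
  /* Assumption: signed right shift is arithmetic.  */
  assert ((-1 >> 1) == -1);
  int32_t prod = (int32_t) op1 * (int32_t) op2;
  return (int16_t) (prod >> (32 / 2 - 1));
}

/* Hypothetical model of smulhrsM3 with a 16-bit narrow type.  */
int16_t
smulhrs_model (int16_t op1, int16_t op2)
{
  assert ((-1 >> 1) == -1);
  int32_t prod = (int32_t) op1 * (int32_t) op2;
  /* Keep one extra bit, add 1 to round, then drop the extra bit.  */
  return (int16_t) (((prod >> (32 / 2 - 2)) + 1) >> 1);
}
```

     The two differ only in rounding: for inputs 3 and 8192 the scaled
     product is 0.75, which the first form truncates to 0 and the second
     rounds to 1.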

‘sdiv_pow2M3’
     Signed division by power-of-2 immediate.  Equivalent to:
          signed op0, op1;
          ...
          op0 = op1 / (1 << imm);
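
     Because C division truncates toward zero, the operation is not a
     plain arithmetic right shift for negative inputs.  One common way
     to get the same result with a shift is sketched below; the function
     name is hypothetical and the code assumes a 32-bit mode and
     arithmetic right shifts of negative values.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical shift-based model of sdiv_pow2M3 for a 32-bit mode.  */
int32_t
sdiv_pow2_model (int32_t op1, unsigned imm)
{
  assert (imm < 31);
  /* Add 2^imm - 1 to negative inputs so that the arithmetic shift
     rounds toward zero, like C division.  */
  int32_t bias = op1 < 0 ? (int32_t) ((1u << imm) - 1) : 0;
  return (op1 + bias) >> imm;
}
```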

‘vec_shl_insert_M’
     Shift the elements in vector input operand 1 left one element (i.e.
     away from element 0) and fill the vacated element 0 with the scalar
     in operand 2.  Store the result in vector output operand 0.
     Operands 0 and 1 have mode M and operand 2 has the mode appropriate
     for one element of M.
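
     A scalar sketch of the shift-and-insert, assuming 32-bit elements
     (the function name and the ‘nunits’ parameter are hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of vec_shl_insert_M for 32-bit elements.  */
void
vec_shl_insert_model (int32_t *op0, const int32_t *op1, int32_t op2,
                      size_t nunits)
{
  assert (nunits > 0);
  /* Iterate downwards so that the model also works when OP0 and OP1
     are the same array.  */
  for (size_t i = nunits - 1; i > 0; i--)
    op0[i] = op1[i - 1];
  /* Fill the vacated element 0 with the scalar.  */
  op0[0] = op2;
}
```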

‘vec_shl_M’
     Whole vector left shift in bits, i.e. away from element 0.  Operand
     1 is a vector to be shifted.  Operand 2 is an integer shift amount
     in bits.  Operand 0 is where the resulting shifted vector is
     stored.  The output and input vectors should have the same modes.

‘vec_shr_M’
     Whole vector right shift in bits, i.e. towards element 0.  Operand
     1 is a vector to be shifted.  Operand 2 is an integer shift amount
     in bits.  Operand 0 is where the resulting shifted vector is
     stored.  The output and input vectors should have the same modes.

‘vec_pack_trunc_M’
     Narrow (demote) and merge the elements of two vectors.  Operands 1
     and 2 are vectors of the same mode having N integral or floating
     point elements of size S.  Operand 0 is the resulting vector in
     which 2*N elements of size S/2 are concatenated after narrowing
     them down using truncation.

‘vec_pack_sbool_trunc_M’
     Narrow and merge the elements of two vectors.  Operands 1 and 2 are
     vectors of the same type having N boolean elements.  Operand 0 is
     the resulting vector in which 2*N elements are concatenated.  The
     last operand (operand 3) is the number of elements in the output
     vector 2*N as a ‘CONST_INT’.  This instruction pattern is used when
     all the vector input and output operands have the same scalar mode
     M and thus using ‘vec_pack_trunc_M’ would be ambiguous.

‘vec_pack_ssat_M’, ‘vec_pack_usat_M’
     Narrow (demote) and merge the elements of two vectors.  Operands 1
     and 2 are vectors of the same mode having N integral elements of
     size S.  Operand 0 is the resulting vector in which the elements of
     the two input vectors are concatenated after narrowing them down
     using signed/unsigned saturating arithmetic.
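
     The signed variant can be sketched as follows for 16-bit inputs
     narrowed to 8 bits.  The function names are hypothetical, and
     placing the elements of operand 1 before those of operand 2 in the
     result is an assumption of this model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Narrow X to 8 bits using signed saturation.  */
static int8_t
ssat8 (int16_t x)
{
  if (x > INT8_MAX)
    return INT8_MAX;
  if (x < INT8_MIN)
    return INT8_MIN;
  return (int8_t) x;
}

/* Hypothetical model of vec_pack_ssat_M: OP1 and OP2 each have N
   16-bit elements and OP0 receives 2*N 8-bit elements.  */
void
vec_pack_ssat_model (int8_t *op0, const int16_t *op1,
                     const int16_t *op2, size_t n)
{
  assert (op0 != NULL);
  for (size_t i = 0; i < n; i++)
    {
      op0[i] = ssat8 (op1[i]);
      op0[n + i] = ssat8 (op2[i]);
    }
}
```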

‘vec_pack_sfix_trunc_M’, ‘vec_pack_ufix_trunc_M’
     Narrow, convert to signed/unsigned integral type and merge the
     elements of two vectors.  Operands 1 and 2 are vectors of the same
     mode having N floating point elements of size S.  Operand 0 is the
     resulting vector in which 2*N elements of size S/2 are
     concatenated.

‘vec_packs_float_M’, ‘vec_packu_float_M’
     Narrow, convert to floating point type and merge the elements of
     two vectors.  Operands 1 and 2 are vectors of the same mode having
     N signed/unsigned integral elements of size S.  Operand 0 is the
     resulting vector in which 2*N elements of size S/2 are
     concatenated.

‘vec_unpacks_hi_M’, ‘vec_unpacks_lo_M’
     Extract and widen (promote) the high/low part of a vector of signed
     integral or floating point elements.  The input vector (operand 1)
     has N elements of size S.  Widen (promote) the high/low elements of
     the vector using signed or floating point extension and place the
     resulting N/2 values of size 2*S in the output vector (operand 0).

‘vec_unpacku_hi_M’, ‘vec_unpacku_lo_M’
     Extract and widen (promote) the high/low part of a vector of
     unsigned integral elements.  The input vector (operand 1) has N
     elements of size S.  Widen (promote) the high/low elements of the
     vector using zero extension and place the resulting N/2 values of
     size 2*S in the output vector (operand 0).

‘vec_unpacks_sbool_hi_M’, ‘vec_unpacks_sbool_lo_M’
     Extract the high/low part of a vector of boolean elements that have
     scalar mode M.  The input vector (operand 1) has N elements, the
     output vector (operand 0) has N/2 elements.  The last operand
     (operand 2) is the number of elements of the input vector N as a
     ‘CONST_INT’.  These patterns are used if both the input and output
     vectors have the same scalar mode M and thus using
     ‘vec_unpacks_hi_M’ or ‘vec_unpacks_lo_M’ would be ambiguous.

‘vec_unpacks_float_hi_M’, ‘vec_unpacks_float_lo_M’
‘vec_unpacku_float_hi_M’, ‘vec_unpacku_float_lo_M’
     Extract, convert to floating point type and widen the high/low part
     of a vector of signed/unsigned integral elements.  The input vector
     (operand 1) has N elements of size S.  Convert the high/low
     elements of the vector using floating point conversion and place
     the resulting N/2 values of size 2*S in the output vector (operand
     0).

‘vec_unpack_sfix_trunc_hi_M’
‘vec_unpack_sfix_trunc_lo_M’
‘vec_unpack_ufix_trunc_hi_M’
‘vec_unpack_ufix_trunc_lo_M’
     Extract, convert to signed/unsigned integer type and widen the
     high/low part of a vector of floating point elements.  The input
     vector (operand 1) has N elements of size S.  Convert the high/low
     elements of the vector to integers and place the resulting N/2
     values of size 2*S in the output vector (operand 0).

‘vec_widen_umult_hi_M’, ‘vec_widen_umult_lo_M’
‘vec_widen_smult_hi_M’, ‘vec_widen_smult_lo_M’
‘vec_widen_umult_even_M’, ‘vec_widen_umult_odd_M’
‘vec_widen_smult_even_M’, ‘vec_widen_smult_odd_M’
     Signed/Unsigned widening multiplication.  The two inputs (operands
     1 and 2) are vectors with N signed/unsigned elements of size S.
     Multiply the high/low or even/odd elements of the two vectors, and
     put the N/2 products of size 2*S in the output vector (operand 0).
     A target should not implement the even/odd pattern pair if it is
     less efficient than the lo/hi pair.
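
     The even/odd variants can be pictured with the scalar C sketch
     below, which multiplies only the even or only the odd lanes and
     produces N/2 double-width products.  The function name and the
     ‘odd’ selector are hypothetical; a real pattern fixes the choice
     in its name.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of an unsigned widening multiply on the
   even (ODD == 0) or odd (ODD != 0) lanes of two N-element vectors.  */
void
widen_umult_even_odd (const uint16_t *a, const uint16_t *b, size_t n,
                      int odd, uint32_t *out)
{
  for (size_t i = 0; i < n / 2; i++)
    {
      size_t lane = 2 * i + (odd ? 1 : 0);
      /* Widen before multiplying so the product cannot wrap.  */
      out[i] = (uint32_t) a[lane] * (uint32_t) b[lane];
    }
}
```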

‘vec_widen_ushiftl_hi_M’, ‘vec_widen_ushiftl_lo_M’
‘vec_widen_sshiftl_hi_M’, ‘vec_widen_sshiftl_lo_M’
     Signed/Unsigned widening shift left.  The first input (operand 1)
     is a vector with N signed/unsigned elements of size S.  Operand 2
     is a constant.  Shift the high/low elements of operand 1, and put
     the N/2 results of size 2*S in the output vector (operand 0).

‘vec_widen_uaddl_hi_M’, ‘vec_widen_uaddl_lo_M’
‘vec_widen_saddl_hi_M’, ‘vec_widen_saddl_lo_M’
     Signed/Unsigned widening add long.  Operands 1 and 2 are vectors
     with N signed/unsigned elements of size S.  Add the high/low
     elements of operands 1 and 2 together, widen the resulting
     elements and put the N/2 results of size 2*S in the output vector
     (operand 0).

‘vec_widen_usubl_hi_M’, ‘vec_widen_usubl_lo_M’
‘vec_widen_ssubl_hi_M’, ‘vec_widen_ssubl_lo_M’
     Signed/Unsigned widening subtract long.  Operands 1 and 2 are
     vectors with N signed/unsigned elements of size S.  Subtract the
     high/low elements of operand 2 from those of operand 1 and widen
     the resulting elements.  Put the N/2 results of size 2*S in the
     output vector (operand 0).

‘vec_widen_uabd_hi_M’, ‘vec_widen_uabd_lo_M’
‘vec_widen_uabd_odd_M’, ‘vec_widen_uabd_even_M’
‘vec_widen_sabd_hi_M’, ‘vec_widen_sabd_lo_M’
‘vec_widen_sabd_odd_M’, ‘vec_widen_sabd_even_M’
     Signed/Unsigned widening absolute difference.  Operands 1 and 2 are
     vectors with N signed/unsigned elements of size S.  Find the
     absolute difference between operands 1 and 2 and widen the
     resulting elements.  Put the N/2 results of size 2*S in the output
     vector (operand 0).

‘vec_addsubM3’
     Alternating subtract, add with even lanes doing subtract and odd
     lanes doing addition.  Operands 1 and 2 and the output operand are
     vectors with mode M.
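
     A minimal scalar sketch of the lane behavior, assuming
     single-precision elements and a hypothetical function name:

```c
#include <stddef.h>

/* Hypothetical scalar model: even lanes subtract, odd lanes add.  */
void
addsub_sf (const float *a, const float *b, size_t n, float *out)
{
  for (size_t i = 0; i < n; i++)
    out[i] = (i % 2 == 0) ? a[i] - b[i] : a[i] + b[i];
}
```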

‘vec_fmaddsubM4’
     Alternating multiply subtract, add with even lanes doing subtract
     and odd lanes doing addition of the third operand to the
     multiplication result of the first two operands.  Operands 1, 2 and
     3 and the output operand are vectors with mode M.

‘vec_fmsubaddM4’
     Alternating multiply add, subtract with even lanes adding the
     third operand to, and odd lanes subtracting it from, the
     multiplication result of the first two operands.  Operands 1, 2 and
     3 and the output operand are vectors with mode M.

     These instructions are not allowed to ‘FAIL’.

‘mulhisi3’
     Multiply operands 1 and 2, which have mode ‘HImode’, and store a
     ‘SImode’ product in operand 0.

‘mulqihi3’, ‘mulsidi3’
     Similar widening-multiplication instructions of other widths.

‘umulqihi3’, ‘umulhisi3’, ‘umulsidi3’
     Similar widening-multiplication instructions that do unsigned
     multiplication.

‘usmulqihi3’, ‘usmulhisi3’, ‘usmulsidi3’
     Similar widening-multiplication instructions that interpret the
     first operand as unsigned and the second operand as signed, then do
     a signed multiplication.
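
     The three families differ only in how the narrow operands are
     extended before the multiplication.  The C sketch below models
     the ‘HImode’ to ‘SImode’ case; the function names are
     hypothetical.

```c
#include <stdint.h>

/* Signed x signed: both operands are sign-extended.  */
int32_t
mul_hi_si (int16_t a, int16_t b)
{
  return (int32_t) a * (int32_t) b;
}

/* Unsigned x unsigned: both operands are zero-extended.  */
uint32_t
umul_hi_si (uint16_t a, uint16_t b)
{
  return (uint32_t) a * (uint32_t) b;
}

/* Unsigned x signed: operand 1 is zero-extended, operand 2 is
   sign-extended, and the multiplication itself is signed.  */
int32_t
usmul_hi_si (uint16_t a, int16_t b)
{
  return (int32_t) a * (int32_t) b;
}
```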

‘smulM3_highpart’
     Perform a signed multiplication of operands 1 and 2, which have
     mode M, and store the most significant half of the product in
     operand 0.  The least significant half of the product is discarded.
     This may be represented in RTL using a ‘smul_highpart’ RTX
     expression.

‘umulM3_highpart’
     Similar, but the multiplication is unsigned.  This may be
     represented in RTL using an ‘umul_highpart’ RTX expression.
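
     For M = ‘SImode’, the high-part patterns can be modelled in C by
     forming the full 64-bit product and keeping its upper half.  The
     function names are hypothetical, and the signed version assumes
     that ‘>>’ on a negative value is an arithmetic shift, as GCC
     implements it.

```c
#include <stdint.h>

/* Most significant 32 bits of the signed 64-bit product.  Relies on
   >> being an arithmetic shift for negative values (assumption).  */
int32_t
smul_highpart_si (int32_t a, int32_t b)
{
  return (int32_t) (((int64_t) a * (int64_t) b) >> 32);
}

/* Most significant 32 bits of the unsigned 64-bit product.  */
uint32_t
umul_highpart_si (uint32_t a, uint32_t b)
{
  return (uint32_t) (((uint64_t) a * (uint64_t) b) >> 32);
}
```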

‘maddMN4’
     Sign-extend operands 1 and 2 to mode N, multiply them, add operand
     3, and store the result in operand 0.  Operands 1 and 2 have mode M
     and operands 0 and 3 have mode N.  Both modes must be integer or
     fixed-point modes and N must be twice the size of M.

     In other words, ‘maddMN4’ is like ‘mulMN3’ except that it also adds
     operand 3.

     These instructions are not allowed to ‘FAIL’.

‘umaddMN4’
     Like ‘maddMN4’, but zero-extend the multiplication operands instead
     of sign-extending them.
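
     The difference between the signed and unsigned forms shows up
     when a multiplication operand has its top bit set.  A sketch for
     M = ‘HImode’ and N = ‘SImode’, with hypothetical function names:

```c
#include <stdint.h>

/* Model of the signed multiply-accumulate: the narrow operands are
   sign-extended before the multiplication.  */
int32_t
madd_hi_si (int16_t op1, int16_t op2, int32_t op3)
{
  return (int32_t) op1 * (int32_t) op2 + op3;
}

/* Model of the unsigned form: the narrow operands are zero-extended.  */
uint32_t
umadd_hi_si (uint16_t op1, uint16_t op2, uint32_t op3)
{
  return (uint32_t) op1 * (uint32_t) op2 + op3;
}
```

     With the same 16-bit pattern ‘0xFFFF’ as operand 1, the signed
     form multiplies by -1 while the unsigned form multiplies by
     65535.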

‘ssmaddMN4’
     Like ‘maddMN4’, but all involved operations must be
     signed-saturating.

‘usmaddMN4’
     Like ‘umaddMN4’, but all involved operations must be
     unsigned-saturating.

‘msubMN4’
     Sign-extend operands 1 and 2 to mode N, multiply them, subtract the
     product from operand 3, and store the result in operand 0.
     Operands 1 and 2 have mode M and operands 0 and 3 have mode N.
     Both modes must be integer or fixed-point modes and N must be
     twice the size of M.

     In other words, ‘msubMN4’ is like ‘mulMN3’ except that it also
     subtracts the result from operand 3.

     These instructions are not allowed to ‘FAIL’.

‘umsubMN4’
     Like ‘msubMN4’, but zero-extend the multiplication operands instead
     of sign-extending them.

‘ssmsubMN4’
     Like ‘msubMN4’, but all involved operations must be
     signed-saturating.

‘usmsubMN4’
     Like ‘umsubMN4’, but all involved operations must be
     unsigned-saturating.

‘divmodM4’
     Signed division that produces both a quotient and a remainder.
     Operand 1 is divided by operand 2 to produce a quotient stored in
     operand 0 and a remainder stored in operand 3.

     For machines with an instruction that produces both a quotient and
     a remainder, provide a pattern for ‘divmodM4’ but do not provide
     patterns for ‘divM3’ and ‘modM3’.  This allows optimization in the
     relatively common case when both the quotient and remainder are
     computed.

     If an instruction that just produces a quotient or just a remainder
     exists and is more efficient than the instruction that produces
     both, write the output routine of ‘divmodM4’ to call
     ‘find_reg_note’ and look for a ‘REG_UNUSED’ note on the quotient or
     remainder and generate the appropriate instruction.
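
     The operand roles can be sketched in C as follows.  The function
     name is hypothetical; the sketch only illustrates that one
     operation yields both results and says nothing about how a
     target emits it.

```c
/* Hypothetical model of a combined divide for M = SImode.  OP1 and
   OP2 are the inputs; the quotient (operand 0) and the remainder
   (operand 3) are both produced.  C's / and % truncate towards
   zero.  The caller must ensure OP2 != 0 and avoid INT_MIN / -1.  */
void
divmod_si (int op1, int op2, int *quotient, int *remainder)
{
  *quotient = op1 / op2;
  *remainder = op1 % op2;
}
```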

‘udivmodM4’
     Similar, but does unsigned division.

‘ashlM3’, ‘ssashlM3’, ‘usashlM3’
     Arithmetic-shift operand 1 left by a number of bits specified by
     operand 2, and store the result in operand 0.  Here M is the mode
     of operand 0 and operand 1; operand 2's mode is specified by the
     instruction pattern, and the compiler will convert the operand to
     that mode before generating the instruction.  The shift or rotate
     expander or instruction pattern should explicitly specify the mode
     of operand 2; it should never be ‘VOIDmode’.  The meaning of
     out-of-range shift counts can optionally be specified by
     ‘TARGET_SHIFT_TRUNCATION_MASK’.  *Note
     TARGET_SHIFT_TRUNCATION_MASK::.  Operand 2 is always a scalar type.

‘ashrM3’, ‘lshrM3’, ‘rotlM3’, ‘rotrM3’
     Other shift and rotate instructions, analogous to the ‘ashlM3’
     instructions.  Operand 2 is always a scalar type.
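
     As an illustration of the rotate operations for a 32-bit mode,
     the following C sketch uses hypothetical function names.
     Reducing the count modulo 32 is a choice made for this sketch so
     that no C shift is out of range; it is not a statement about any
     target's treatment of out-of-range counts.

```c
#include <stdint.h>

/* Rotate X left by COUNT bits (count reduced modulo 32).  */
uint32_t
rotl_si (uint32_t x, unsigned int count)
{
  count &= 31;
  return count ? (x << count) | (x >> (32 - count)) : x;
}

/* Rotate X right by COUNT bits (count reduced modulo 32).  */
uint32_t
rotr_si (uint32_t x, unsigned int count)
{
  count &= 31;
  return count ? (x >> count) | (x << (32 - count)) : x;
}
```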

‘vashlM3’, ‘vashrM3’, ‘vlshrM3’, ‘vrotlM3’, ‘vrotrM3’
     Vector shift and rotate instructions that take vectors as operand 2
     instead of a scalar type.

‘uabdM’, ‘sabdM’
     Signed and unsigned absolute difference instructions.  These
     instructions compute the difference between operands 1 and 2 and
     store its absolute value in operand 0.  A C code equivalent would
     be:
          op0 = op1 > op2 ? op1 - op2 : op2 - op1;

‘avgM3_floor’
‘uavgM3_floor’
     Signed and unsigned average instructions.  These instructions add
     operands 1 and 2 without truncation, divide the result by 2, round
     towards -Inf, and store the result in operand 0.  This is
     equivalent to the C code:
          narrow op0, op1, op2;
          ...
          op0 = (narrow) (((wide) op1 + (wide) op2) >> 1);
     where the sign of ‘narrow’ determines whether this is a signed or
     unsigned operation.

‘avgM3_ceil’
‘uavgM3_ceil’
     Like ‘avgM3_floor’ and ‘uavgM3_floor’, but round towards +Inf.
     This is equivalent to the C code:
          narrow op0, op1, op2;
          ...
          op0 = (narrow) (((wide) op1 + (wide) op2 + 1) >> 1);

‘bswapM2’
     Reverse the order of bytes of operand 1 and store the result in
     operand 0.
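
     For a 32-bit mode, the byte reversal can be written in portable C
     as below; the function name is hypothetical.

```c
#include <stdint.h>

/* Reverse the order of the four bytes of X.  */
uint32_t
bswap_si (uint32_t x)
{
  return ((x & 0x000000FFu) << 24)
         | ((x & 0x0000FF00u) << 8)
         | ((x & 0x00FF0000u) >> 8)
         | ((x & 0xFF000000u) >> 24);
}
```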

‘negM2’, ‘ssnegM2’, ‘usnegM2’
     Negate operand 1 and store the result in operand 0.

‘negvM3’
     Like ‘negM2’ but takes a ‘code_label’ as operand 2 and emits code
     to jump to it if signed overflow occurs during the negation.

‘absM2’
     Store the absolute value of operand 1 into operand 0.

‘sqrtM2’
     Store the square root of operand 1 into operand 0.  Both operands
     have mode M, which is a scalar or vector floating-point mode.

     This pattern is not allowed to ‘FAIL’.

‘rsqrtM2’
     Store the reciprocal of the square root of operand 1 into operand
     0.  Both operands have mode M, which is a scalar or vector
     floating-point mode.

     On most architectures this pattern is only approximate, so either
     its C condition or the ‘TARGET_OPTAB_SUPPORTED_P’ hook should check
     for the appropriate math flags.  (Using the C condition is more
     direct, but using ‘TARGET_OPTAB_SUPPORTED_P’ can be useful if a
     target-specific built-in also uses the ‘rsqrtM2’ pattern.)

     This pattern is not allowed to ‘FAIL’.

‘fmodM3’
     Store the remainder of dividing operand 1 by operand 2 into operand
     0, where the quotient is rounded towards zero to an integer.  All
     operands have mode M, which is a scalar or vector floating-point
     mode.

     This pattern is not allowed to ‘FAIL’.

‘remainderM3’
     Store the remainder of dividing operand 1 by operand 2 into operand
     0, where the quotient is rounded to the nearest integer.  All
     operands have mode M, which is a scalar or vector floating-point
     mode.

     This pattern is not allowed to ‘FAIL’.

‘scalbM3’
     Raise ‘FLT_RADIX’ to the power of operand 2, multiply it by operand
     1, and store the result in operand 0.  All operands have mode M,
     which is a scalar or vector floating-point mode.

     This pattern is not allowed to ‘FAIL’.

‘ldexpM3’
     Raise 2 to the power of operand 2, multiply it by operand 1, and
     store the result in operand 0.  Operands 0 and 1 have mode M, which
     is a scalar or vector floating-point mode.  Operand 2's mode has
     the same number of elements as M and each element is wide enough to
     store an ‘int’.  The integers are signed.

     This pattern is not allowed to ‘FAIL’.

‘cosM2’
     Store the cosine of operand 1 into operand 0.  Both operands have
     mode M, which is a scalar or vector floating-point mode.

     This pattern is not allowed to ‘FAIL’.

‘sinM2’
     Store the sine of operand 1 into operand 0.  Both operands have
     mode M, which is a scalar or vector floating-point mode.

     This pattern is not allowed to ‘FAIL’.

‘sincosM3’
     Store the cosine of operand 2 into operand 0 and the sine of
     operand 2 into operand 1.  All operands have mode M, which is a
     scalar or vector floating-point mode.

     Targets that can calculate the sine and cosine simultaneously can
     implement this pattern as opposed to implementing individual
     ‘sinM2’ and ‘cosM2’ patterns.  The ‘sin’ and ‘cos’ built-in
     functions will then be expanded to the ‘sincosM3’ pattern, with one
     of the output values left unused.

‘tanM2’
     Store the tangent of operand 1 into operand 0.  Both operands have
     mode M, which is a scalar or vector floating-point mode.

     This pattern is not allowed to ‘FAIL’.

‘asinM2’
     Store the arc sine of operand 1 into operand 0.  Both operands have
     mode M, which is a scalar or vector floating-point mode.

     This pattern is not allowed to ‘FAIL’.

‘acosM2’
     Store the arc cosine of operand 1 into operand 0.  Both operands
     have mode M, which is a scalar or vector floating-point mode.

     This pattern is not allowed to ‘FAIL’.

‘atanM2’
     Store the arc tangent of operand 1 into operand 0.  Both operands
     have mode M, which is a scalar or vector floating-point mode.

     This pattern is not allowed to ‘FAIL’.

‘fegetroundM’
     Store the current machine floating-point rounding mode into operand
     0.  Operand 0 has mode M, which is scalar.  This pattern is used to
     implement the ‘fegetround’ function from the ISO C99 standard.

‘feclearexceptM’
‘feraiseexceptM’
     Clears or raises the supported machine floating-point exceptions
     represented by the bits in operand 1.  Error status is stored as a
     nonzero value in operand 0.  Both operands have mode M, which is a
     scalar.  These patterns are used to implement the ‘feclearexcept’
     and ‘feraiseexcept’ functions from the ISO C99 standard.

‘expM2’
     Raise e (the base of natural logarithms) to the power of operand 1
     and store the result in operand 0.  Both operands have mode M,
     which is a scalar or vector floating-point mode.

     This pattern is not allowed to ‘FAIL’.

‘expm1M2’
     Raise e (the base of natural logarithms) to the power of operand 1,
     subtract 1, and store the result in operand 0.  Both operands have
     mode M, which is a scalar or vector floating-point mode.

     For inputs close to zero, the pattern is expected to be more
     accurate than a separate ‘expM2’ and ‘subM3’ would be.

     This pattern is not allowed to ‘FAIL’.

‘exp10M2’
     Raise 10 to the power of operand 1 and store the result in operand
     0.  Both operands have mode M, which is a scalar or vector
     floating-point mode.

     This pattern is not allowed to ‘FAIL’.

‘exp2M2’
     Raise 2 to the power of operand 1 and store the result in operand
     0.  Both operands have mode M, which is a scalar or vector
     floating-point mode.

     This pattern is not allowed to ‘FAIL’.

‘logM2’
     Store the natural logarithm of operand 1 into operand 0.  Both
     operands have mode M, which is a scalar or vector floating-point
     mode.

     This pattern is not allowed to ‘FAIL’.

‘log1pM2’
     Add 1 to operand 1, compute the natural logarithm, and store the
     result in operand 0.  Both operands have mode M, which is a scalar
     or vector floating-point mode.

     For inputs close to zero, the pattern is expected to be more
     accurate than a separate ‘addM3’ and ‘logM2’ would be.

     This pattern is not allowed to ‘FAIL’.

‘log10M2’
     Store the base-10 logarithm of operand 1 into operand 0.  Both
     operands have mode M, which is a scalar or vector floating-point
     mode.

     This pattern is not allowed to ‘FAIL’.

‘log2M2’
     Store the base-2 logarithm of operand 1 into operand 0.  Both
     operands have mode M, which is a scalar or vector floating-point
     mode.

     This pattern is not allowed to ‘FAIL’.

‘logbM2’
     Store the base-‘FLT_RADIX’ logarithm of operand 1 into operand 0.
     Both operands have mode M, which is a scalar or vector
     floating-point mode.

     This pattern is not allowed to ‘FAIL’.

‘signbitM2’
     Store the sign bit of floating-point operand 1 in operand 0.  M is
     either a scalar or vector mode.  When it is a scalar, operand 1 has
     mode M but operand 0 must have mode ‘SImode’.  When M is a vector,
     operand 1 has mode M and operand 0's mode should be a vector
     integer mode which has the same number of elements and the same
     size as mode M.

     This pattern is not allowed to ‘FAIL’.

‘significandM2’
     Store the significand of floating-point operand 1 in operand 0.
     Both operands have mode M, which is a scalar or vector
     floating-point mode.

     This pattern is not allowed to ‘FAIL’.

‘powM3’
     Store the value of operand 1 raised to the exponent operand 2 into
     operand 0.  All operands have mode M, which is a scalar or vector
     floating-point mode.

     This pattern is not allowed to ‘FAIL’.

‘atan2M3’
     Store the arc tangent (inverse tangent) of operand 1 divided by
     operand 2 into operand 0, using the signs of both arguments to
     determine the quadrant of the result.  All operands have mode M,
     which is a scalar or vector floating-point mode.

     This pattern is not allowed to ‘FAIL’.

‘floorM2’
     Store the largest integral value not greater than operand 1 in
     operand 0.  Both operands have mode M, which is a scalar or vector
     floating-point mode.  If ‘-ffp-int-builtin-inexact’ is in effect,
     the "inexact" exception may be raised for noninteger operands;
     otherwise, it may not.

     This pattern is not allowed to ‘FAIL’.

‘btruncM2’
     Round operand 1 to an integer, towards zero, and store the result
     in operand 0.  Both operands have mode M, which is a scalar or
     vector floating-point mode.  If ‘-ffp-int-builtin-inexact’ is in
     effect, the "inexact" exception may be raised for noninteger
     operands; otherwise, it may not.

     This pattern is not allowed to ‘FAIL’.

‘roundM2’
     Round operand 1 to the nearest integer, rounding away from zero in
     the event of a tie, and store the result in operand 0.  Both
     operands have mode M, which is a scalar or vector floating-point
     mode.  If ‘-ffp-int-builtin-inexact’ is in effect, the "inexact"
     exception may be raised for noninteger operands; otherwise, it may
     not.

     This pattern is not allowed to ‘FAIL’.

‘ceilM2’
     Store the smallest integral value not less than operand 1 in
     operand 0.  Both operands have mode M, which is a scalar or vector
     floating-point mode.  If ‘-ffp-int-builtin-inexact’ is in effect,
     the "inexact" exception may be raised for noninteger operands;
     otherwise, it may not.

     This pattern is not allowed to ‘FAIL’.

‘nearbyintM2’
     Round operand 1 to an integer, using the current rounding mode, and
     store the result in operand 0.  Do not raise an inexact condition
     when the result is different from the argument.  Both operands have
     mode M, which is a scalar or vector floating-point mode.

     This pattern is not allowed to ‘FAIL’.

‘rintM2’
     Round operand 1 to an integer, using the current rounding mode, and
     store the result in operand 0.  Raise an inexact condition when the
     result is different from the argument.  Both operands have mode M,
     which is a scalar or vector floating-point mode.

     This pattern is not allowed to ‘FAIL’.

‘lrintMN2’
     Convert operand 1 (valid for floating point mode M) to fixed point
     mode N as a signed number according to the current rounding mode
     and store in operand 0 (which has mode N).

‘lroundMN2’
     Convert operand 1 (valid for floating point mode M) to fixed point
     mode N as a signed number rounding to nearest and away from zero
     and store in operand 0 (which has mode N).

‘lfloorMN2’
     Convert operand 1 (valid for floating point mode M) to fixed point
     mode N as a signed number rounding down and store in operand 0
     (which has mode N).

‘lceilMN2’
     Convert operand 1 (valid for floating point mode M) to fixed point
     mode N as a signed number rounding up and store in operand 0 (which
     has mode N).

‘copysignM3’
     Store a value with the magnitude of operand 1 and the sign of
     operand 2 into operand 0.  All operands have mode M, which is a
     scalar or vector floating-point mode.

     This pattern is not allowed to ‘FAIL’.

‘xorsignM3’
     Equivalent to ‘op0 = op1 * copysign (1.0, op2)’: store a value with
     the magnitude of operand 1 and a sign that is the exclusive OR of
     the signs of operands 1 and 2 into operand 0.  All operands have
     mode M, which is a scalar or vector floating-point mode.

     This pattern is not allowed to ‘FAIL’.
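
     Because ‘op1 * copysign (1.0, op2)’ multiplies operand 1 by +1.0
     or -1.0, the sign of the result is the exclusive OR of the two
     input signs.  The bit-level C sketch below contrasts this with
     ‘copysignM3’.  It assumes that ‘float’ is a 32-bit IEEE 754
     value with the sign in bit 31; all function names are
     hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a float as its 32-bit pattern (assumes IEEE 754).  */
static uint32_t
float_bits (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);
  return u;
}

/* Reinterpret a 32-bit pattern as a float.  */
static float
bits_float (uint32_t u)
{
  float f;
  memcpy (&f, &u, sizeof f);
  return f;
}

/* copysign: magnitude of OP1, sign of OP2.  */
float
copysign_sf (float op1, float op2)
{
  return bits_float ((float_bits (op1) & 0x7FFFFFFFu)
                     | (float_bits (op2) & 0x80000000u));
}

/* xorsign: OP1 with its sign flipped when OP2 is negative.  */
float
xorsign_sf (float op1, float op2)
{
  return bits_float (float_bits (op1)
                     ^ (float_bits (op2) & 0x80000000u));
}
```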

‘issignalingM2’
     Set operand 0 to 1 if operand 1 is a signaling NaN and to 0
     otherwise.

‘cadd90M3’
     Perform vector add and subtract on even/odd number pairs.  The
     operation being matched is semantically described as

            for (int i = 0; i < N; i += 2)
              {
                c[i] = a[i] - b[i+1];
                c[i+1] = a[i+1] + b[i];
              }

     This operation is semantically equivalent to performing a vector
     addition of complex numbers in operand 1 with operand 2 rotated by
     90 degrees in the Argand plane and storing the result in operand
     0.

     In GCC lane ordering the real part of the number must be in the
     even lanes with the imaginary part in the odd lanes.

     The operation is only supported for vector modes M.

     This pattern is not allowed to ‘FAIL’.
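
     A self-contained version of the loop above, written as a sketch
     with a hypothetical function name and single-precision lanes,
     makes the rotation explicit: multiplying ‘b_re + i*b_im’ by ‘i’
     gives ‘-b_im + i*b_re’.

```c
#include <stddef.h>

/* Hypothetical scalar model: A, B and C hold N floats laid out as
   interleaved (real, imaginary) pairs; N is assumed to be even.  */
void
cadd90_sf (const float *a, const float *b, float *c, size_t n)
{
  for (size_t i = 0; i + 1 < n; i += 2)
    {
      /* b rotated by 90 degrees is (-b_im, b_re).  */
      c[i] = a[i] - b[i + 1];
      c[i + 1] = a[i + 1] + b[i];
    }
}
```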

‘cadd270M3’
     Perform vector add and subtract on even/odd number pairs.  The
     operation being matched is semantically described as

            for (int i = 0; i < N; i += 2)
              {
                c[i] = a[i] + b[i+1];
                c[i+1] = a[i+1] - b[i];
              }

     This operation is semantically equivalent to performing a vector
     addition of complex numbers in operand 1 with operand 2 rotated by
     270 degrees in the Argand plane and storing the result in operand
     0.

     In GCC lane ordering the real part of the number must be in the
     even lanes with the imaginary part in the odd lanes.

     The operation is only supported for vector modes M.

     This pattern is not allowed to ‘FAIL’.

‘cmlaM4’
     Perform a vector multiply and accumulate that is semantically the
     same as a multiply and accumulate of complex numbers.

            complex TYPE op0[N];
            complex TYPE op1[N];
            complex TYPE op2[N];
            complex TYPE op3[N];
            for (int i = 0; i < N; i += 1)
              {
                op0[i] = op1[i] * op2[i] + op3[i];
              }

     In GCC lane ordering the real part of the number must be in the
     even lanes with the imaginary part in the odd lanes.

     The operation is only supported for vector modes M.

     This pattern is not allowed to ‘FAIL’.

‘cmla_conjM4’
     Perform a vector multiply by conjugate and accumulate that is
     semantically the same as a multiply and accumulate of complex
     numbers where the second multiply argument is conjugated.

            complex TYPE op0[N];
            complex TYPE op1[N];
            complex TYPE op2[N];
            complex TYPE op3[N];
            for (int i = 0; i < N; i += 1)
              {
                op0[i] = op1[i] * conj (op2[i]) + op3[i];
              }

     In GCC lane ordering the real part of the number must be in the
     even lanes with the imaginary part in the odd lanes.

     The operation is only supported for vector modes M.

     This pattern is not allowed to ‘FAIL’.
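
     Expanded onto interleaved real and imaginary lanes, the complex
     multiply-accumulate and its conjugated form can be sketched as
     follows.  The function name, the ‘conj’ selector and the use of
     single-precision lanes are assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical scalar model on interleaved (real, imaginary) lanes.
   N is the number of float lanes and is assumed to be even.  When
   CONJ is nonzero the second multiplicand is conjugated.  */
void
cmla_sf (const float *op1, const float *op2, const float *op3,
         float *op0, size_t n, int conj)
{
  for (size_t i = 0; i + 1 < n; i += 2)
    {
      float br = op2[i];
      float bi = conj ? -op2[i + 1] : op2[i + 1];
      /* (ar + i*ai) * (br + i*bi) + (cr + i*ci)  */
      op0[i] = op1[i] * br - op1[i + 1] * bi + op3[i];
      op0[i + 1] = op1[i] * bi + op1[i + 1] * br + op3[i + 1];
    }
}
```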

‘cmlsM4’
     Perform a vector multiply and subtract that is semantically the
     same as a multiply and subtract of complex numbers.

            complex TYPE op0[N];
            complex TYPE op1[N];
            complex TYPE op2[N];
            complex TYPE op3[N];
            for (int i = 0; i < N; i += 1)
              {
                op0[i] = op1[i] * op2[i] - op3[i];
              }

     In GCC lane ordering the real part of the number must be in the
     even lanes with the imaginary part in the odd lanes.

     The operation is only supported for vector modes M.

     This pattern is not allowed to ‘FAIL’.

‘cmls_conjM4’
     Perform a vector multiply by conjugate and subtract that is
     semantically the same as a multiply and subtract of complex numbers
     where the second multiply argument is conjugated.

            complex TYPE op0[N];
            complex TYPE op1[N];
            complex TYPE op2[N];
            complex TYPE op3[N];
            for (int i = 0; i < N; i += 1)
              {
                op0[i] = op1[i] * conj (op2[i]) - op3[i];
              }

     In GCC lane ordering the real part of the number must be in the
     even lanes with the imaginary part in the odd lanes.

     The operation is only supported for vector modes M.

     This pattern is not allowed to ‘FAIL’.

‘cmulM4’
     Perform a vector multiply that is semantically the same as multiply
     of complex numbers.

            complex TYPE op0[N];
            complex TYPE op1[N];
            complex TYPE op2[N];
            for (int i = 0; i < N; i += 1)
              {
                op0[i] = op1[i] * op2[i];
              }

     In GCC lane ordering the real part of the number must be in the
     even lanes with the imaginary part in the odd lanes.

     The operation is only supported for vector modes M.

     This pattern is not allowed to ‘FAIL’.

‘cmul_conjM4’
     Perform a vector multiply by conjugate that is semantically the
     same as a multiply of complex numbers where the second multiply
     argument is conjugated.

            complex TYPE op0[N];
            complex TYPE op1[N];
            complex TYPE op2[N];
            for (int i = 0; i < N; i += 1)
              {
                op0[i] = op1[i] * conj (op2[i]);
              }

     In GCC lane ordering the real part of the number must be in the
     even lanes with the imaginary part in the odd lanes.

     The operation is only supported for vector modes M.

     This pattern is not allowed to ‘FAIL’.

‘ffsM2’
     Store into operand 0 one plus the index of the least significant
     1-bit of operand 1.  If operand 1 is zero, store zero.

     M is either a scalar or vector integer mode.  When it is a scalar,
     operand 1 has mode M but operand 0 can have whatever scalar integer
     mode is suitable for the target.  The compiler will insert
     conversion instructions as necessary (typically to convert the
     result to the same width as ‘int’).  When M is a vector, both
     operands must have mode M.

     This pattern is not allowed to ‘FAIL’.
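
     A straightforward C model for a 32-bit scalar operand, using a
     hypothetical function name:

```c
#include <stdint.h>

/* One plus the index of the least significant 1-bit of X, or zero
   when X is zero.  */
int
ffs_si (uint32_t x)
{
  if (x == 0)
    return 0;
  int index = 1;
  while ((x & 1u) == 0)
    {
      x >>= 1;
      index++;
    }
  return index;
}
```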

‘clrsbM2’
     Count leading redundant sign bits.  Store into operand 0 the number
     of redundant sign bits in operand 1, starting at the most
     significant bit position.  A redundant sign bit is defined as any
     sign bit after the first.  As such, this count will be one less
     than the count of leading sign bits.

     M is either a scalar or vector integer mode.  When it is a scalar,
     operand 1 has mode M but operand 0 can have whatever scalar integer
     mode is suitable for the target.  The compiler will insert
     conversion instructions as necessary (typically to convert the
     result to the same width as ‘int’).  When M is a vector, both
     operands must have mode M.

     This pattern is not allowed to ‘FAIL’.
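
     The definition can be checked against a small C model for a
     32-bit scalar operand.  The function name is hypothetical; the
     loop counts how many bits directly below the sign bit are copies
     of it.

```c
#include <stdint.h>

/* Number of bits below the sign bit that repeat the sign bit.  */
int
clrsb_si (int32_t x)
{
  uint32_t bits = (uint32_t) x;
  uint32_t sign = bits >> 31;
  int count = 0;
  for (int i = 30; i >= 0; i--)
    {
      if (((bits >> i) & 1u) != sign)
        break;
      count++;
    }
  return count;
}
```

     Both 0 and -1 consist entirely of sign bits, so each gives 31
     for a 32-bit operand.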

‘clzM2’
     Store into operand 0 the number of leading 0-bits in operand 1,
     starting at the most significant bit position.  If operand 1 is 0,
     the ‘CLZ_DEFINED_VALUE_AT_ZERO’ (*note Misc::) macro defines if the
     result is undefined or has a useful value.

     M is either a scalar or vector integer mode.  When it is a scalar,
     operand 1 has mode M but operand 0 can have whatever scalar integer
     mode is suitable for the target.  The compiler will insert
     conversion instructions as necessary (typically to convert the
     result to the same width as ‘int’).  When M is a vector, both
     operands must have mode M.

     This pattern is not allowed to ‘FAIL’.

‘ctzM2’
     Store into operand 0 the number of trailing 0-bits in operand 1,
     starting at the least significant bit position.  If operand 1 is 0,
     the ‘CTZ_DEFINED_VALUE_AT_ZERO’ (*note Misc::) macro defines if the
     result is undefined or has a useful value.

     M is either a scalar or vector integer mode.  When it is a scalar,
     operand 1 has mode M but operand 0 can have whatever scalar integer
     mode is suitable for the target.  The compiler will insert
     conversion instructions as necessary (typically to convert the
     result to the same width as ‘int’).  When M is a vector, both
     operands must have mode M.

     This pattern is not allowed to ‘FAIL’.
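
     The following C sketch models the leading-zero and trailing-zero
     counts for a 32-bit scalar operand.  The function names are
     hypothetical, and the ‘value_at_zero’ parameter is an
     illustrative stand-in for whatever result the target defines, or
     leaves undefined, for a zero input.

```c
#include <stdint.h>

/* Count leading 0-bits of X; VALUE_AT_ZERO is returned for X == 0
   (hypothetical parameter, see above).  */
int
clz_si (uint32_t x, int value_at_zero)
{
  if (x == 0)
    return value_at_zero;
  int count = 0;
  while ((x & 0x80000000u) == 0)
    {
      x <<= 1;
      count++;
    }
  return count;
}

/* Count trailing 0-bits of X; VALUE_AT_ZERO is returned for X == 0.  */
int
ctz_si (uint32_t x, int value_at_zero)
{
  if (x == 0)
    return value_at_zero;
  int count = 0;
  while ((x & 1u) == 0)
    {
      x >>= 1;
      count++;
    }
  return count;
}
```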

‘popcountM2’
     Store into operand 0 the number of 1-bits in operand 1.

     M is either a scalar or vector integer mode.  When it is a scalar,
     operand 1 has mode M but operand 0 can have whatever scalar integer
     mode is suitable for the target.  The compiler will insert
     conversion instructions as necessary (typically to convert the
     result to the same width as ‘int’).  When M is a vector, both
     operands must have mode M.

     This pattern is not allowed to ‘FAIL’.

‘parityM2’
     Store into operand 0 the parity of operand 1, i.e. the number of
     1-bits in operand 1 modulo 2.

     M is either a scalar or vector integer mode.  When it is a scalar,
     operand 1 has mode M but operand 0 can have whatever scalar integer
     mode is suitable for the target.  The compiler will insert
     conversion instructions as necessary (typically to convert the
     result to the same width as ‘int’).  When M is a vector, both
     operands must have mode M.

     This pattern is not allowed to ‘FAIL’.
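
     Population count and parity are closely related, as the C sketch
     below shows for a 32-bit scalar operand.  The function names are
     hypothetical.

```c
#include <stdint.h>

/* Number of 1-bits in X.  */
int
popcount_si (uint32_t x)
{
  int count = 0;
  while (x != 0)
    {
      x &= x - 1;   /* clear the lowest set bit */
      count++;
    }
  return count;
}

/* Number of 1-bits in X modulo 2.  */
int
parity_si (uint32_t x)
{
  return popcount_si (x) & 1;
}
```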

‘one_cmplM2’
     Store the bitwise-complement of operand 1 into operand 0.

‘cpymemM’
     Block copy instruction.  The destination and source blocks of
     memory are the first two operands, and both are ‘mem:BLK’s with an
     address in mode ‘Pmode’.

     The number of bytes to copy is the third operand, in mode M.
     Usually, you specify ‘Pmode’ for M.  However, if you can generate
     better code knowing the range of valid lengths is smaller than
     those representable in a full Pmode pointer, you should provide a
     pattern with a mode corresponding to the range of values you can
     handle efficiently (e.g., ‘QImode’ for values in the range 0-127;
     note we avoid numbers that appear negative) and also a pattern with
     ‘Pmode’.

     The fourth operand is the known shared alignment of the source and
     destination, in the form of a ‘const_int’ rtx.  Thus, if the
     compiler knows that both source and destination are word-aligned,
     it may provide the value 4 for this operand.

     Optional operands 5 and 6 specify expected alignment and size of
     block respectively.  Unlike the alignment in operand 4, the
     expected alignment is only a hint: the blocks are not required to
     be aligned according to it in all cases.  This expected alignment
     is also in bytes, just like operand 4.  Expected size, when
     unknown, is set to ‘(const_int -1)’.

     Descriptions of multiple ‘cpymemM’ patterns can only be beneficial
     if the patterns for smaller modes have fewer restrictions on their
     first, second and fourth operands.  Note that the mode M in
     ‘cpymemM’ does not impose any restriction on the mode of
     individually copied data units in the block.

     The ‘cpymemM’ patterns need not give special consideration to the
     possibility that the source and destination strings might overlap.
     An exception is the case where source and destination are equal;
     this case needs to be handled correctly.  These patterns are used
     to do inline expansion of ‘__builtin_memcpy’.

‘movmemM’
     Block move instruction.  The destination and source blocks of
     memory are the first two operands, and both are ‘mem:BLK’s with an
     address in mode ‘Pmode’.

     The number of bytes to copy is the third operand, in mode M.
     Usually, you specify ‘Pmode’ for M.  However, if you can generate
     better code knowing the range of valid lengths is smaller than
     those representable in a full Pmode pointer, you should provide a
     pattern with a mode corresponding to the range of values you can
     handle efficiently (e.g., ‘QImode’ for values in the range 0-127;
     note we avoid numbers that appear negative) and also a pattern with
     ‘Pmode’.

     The fourth operand is the known shared alignment of the source and
     destination, in the form of a ‘const_int’ rtx.  Thus, if the
     compiler knows that both source and destination are word-aligned,
     it may provide the value 4 for this operand.

     Optional operands 5 and 6 specify the expected alignment and size
     of the block, respectively.  Unlike the alignment in operand 4,
     the expected alignment is only a hint: the blocks are not required
     to be aligned to it in all cases.  The expected alignment is also
     in bytes, just like operand 4.  The expected size, when unknown,
     is set to ‘(const_int -1)’.

     Descriptions of multiple ‘movmemM’ patterns can only be beneficial
     if the patterns for smaller modes have fewer restrictions on their
     first, second and fourth operands.  Note that the mode M in
     ‘movmemM’ does not impose any restriction on the mode of
     individually copied data units in the block.

     The ‘movmemM’ patterns must correctly handle the case where the
     source and destination strings overlap.  These patterns are used to
     do inline expansion of ‘__builtin_memmove’.
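
     As an illustration only, the overlap requirement can be modelled
     in C as below.  ‘model_movmem’ is a hypothetical name, not a GCC
     function, and the sketch assumes that both blocks lie within one
     array so that comparing the two pointers is meaningful.

```c
#include <stddef.h>

/* Hypothetical reference model, not GCC code: copy LEN bytes from
   SRC to DST and stay correct when the two blocks overlap.  */
void
model_movmem (unsigned char *dst, const unsigned char *src, size_t len)
{
  if (dst < src)
    {
      /* Destination starts below source: copy in ascending order.  */
      for (size_t i = 0; i < len; i++)
        dst[i] = src[i];
    }
  else if (dst > src)
    {
      /* Destination starts above source: copy in descending order so
         that each source byte is read before it is overwritten.  */
      for (size_t i = len; i > 0; i--)
        dst[i - 1] = src[i - 1];
    }
  /* dst == src: nothing needs to be copied.  */
}
```

     An ascending copy alone, which is enough for ‘cpymemM’, would
     corrupt the data whenever the destination starts inside the
     source block.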

‘movstr’
     String copy instruction, with ‘stpcpy’ semantics.  Operand 0 is an
     output operand in mode ‘Pmode’.  The addresses of the destination
     and source strings are operands 1 and 2, and both are ‘mem:BLK’s
     with addresses in mode ‘Pmode’.  The execution of the expansion of
     this pattern should store in operand 0 the address in which the
     ‘NUL’ terminator was stored in the destination string.

     This pattern also has several optional operands that are the same
     as in ‘setmem’.
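
     The ‘stpcpy’-style result can be sketched in C as follows.  This
     is only a model of the value stored in operand 0;
     ‘model_movstr’ is a hypothetical name, not part of GCC.

```c
/* Hypothetical model of movstr: copy the string at SRC to DST and
   return the address at which the NUL terminator was stored.  */
char *
model_movstr (char *dst, const char *src)
{
  while ((*dst = *src) != '\0')
    {
      dst++;
      src++;
    }
  return dst;
}
```

     For a three-character source string the returned address is the
     destination address plus three.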

‘setmemM’
     Block set instruction.  The destination string is the first
     operand, given as a ‘mem:BLK’ whose address is in mode ‘Pmode’.
     The number of bytes to set is the second operand, in mode M.  The
     value to initialize the memory with is the third operand.  Targets
     that only support the clearing of memory should reject any value
     that is not the constant 0.  See ‘cpymemM’ for a discussion of the
     choice of mode.

     The fourth operand is the known alignment of the destination, in
     the form of a ‘const_int’ rtx.  Thus, if the compiler knows that
     the destination is word-aligned, it may provide the value 4 for
     this operand.

     Optional operands 5 and 6 specify the expected alignment and size
     of the block, respectively.  Unlike the alignment in operand 4,
     the expected alignment is only a hint: the block is not required
     to be aligned to it in all cases.  The expected alignment is also
     in bytes, just like operand 4.  The expected size, when unknown,
     is set to ‘(const_int -1)’.  Operand 7 is the minimal size of the
     block and operand 8 is the maximal size of the block (‘NULL’ if it
     cannot be represented as a ‘CONST_INT’).  Operand 9 is the
     probable maximal size (i.e., it cannot be relied on for
     correctness, but it can be used to choose a suitable code sequence
     for a given size).

     The use for multiple ‘setmemM’ is as for ‘cpymemM’.

‘cmpstrnM’
     String compare instruction, with five operands.  Operand 0 is the
     output; it has mode M.  The remaining four operands are like the
     operands of ‘cpymemM’.  The two memory blocks specified are
     compared byte by byte in lexicographic order starting at the
     beginning of each string.  The instruction is not allowed to
     prefetch more than one byte at a time since either string may end
     in the first byte and reading past that may access an invalid page
     or segment and cause a fault.  The comparison terminates early if
     the fetched bytes are different or if they are equal to zero.  The
     effect of the instruction is to store a value in operand 0 whose
     sign indicates the result of the comparison.

‘cmpstrM’
     String compare instruction, without known maximum length.  Operand
     0 is the output; it has mode M.  The second and third operands are
     the blocks of memory to be compared; both are ‘mem:BLK’ with an
     address in mode ‘Pmode’.

     The fourth operand is the known shared alignment of the source and
     destination, in the form of a ‘const_int’ rtx.  Thus, if the
     compiler knows that both source and destination are word-aligned,
     it may provide the value 4 for this operand.

     The two memory blocks specified are compared byte by byte in
     lexicographic order starting at the beginning of each string.  The
     instruction is not allowed to prefetch more than one byte at a time
     since either string may end in the first byte and reading past that
     may access an invalid page or segment and cause a fault.  The
     comparison will terminate when the fetched bytes are different or
     if they are equal to zero.  The effect of the instruction is to
     store a value in operand 0 whose sign indicates the result of the
     comparison.

‘cmpmemM’
     Block compare instruction, with five operands like the operands of
     ‘cmpstrnM’.  The two memory blocks specified are compared byte by
     byte in lexicographic order starting at the beginning of each
     block.  Unlike ‘cmpstrM’ the instruction can prefetch any bytes in
     the two memory blocks.  Also unlike ‘cmpstrM’ the comparison will
     not stop if both bytes are zero.  The effect of the instruction is
     to store a value in operand 0 whose sign indicates the result of
     the comparison.
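
     The difference between the string and block comparisons can be
     sketched in C.  The function names are hypothetical, and treating
     the bytes as ‘unsigned char’ is an assumption borrowed from the
     C library's ‘memcmp’; only the sign of the result is meaningful.

```c
#include <stddef.h>

/* Hypothetical model of cmpstrnM: stop at the first difference, at
   a zero byte, or after N bytes, whichever comes first.  */
int
model_cmpstrn (const unsigned char *a, const unsigned char *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      if (a[i] != b[i])
        return a[i] < b[i] ? -1 : 1;
      if (a[i] == 0)
        return 0;   /* Both strings end here.  */
    }
  return 0;
}

/* Hypothetical model of cmpmemM: zero bytes do not end the
   comparison, so all N bytes take part.  */
int
model_cmpmem (const unsigned char *a, const unsigned char *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (a[i] != b[i])
      return a[i] < b[i] ? -1 : 1;
  return 0;
}
```

     Two blocks that differ only after an embedded zero byte compare
     equal under the first model and unequal under the second.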

‘strlenM’
     Compute the length of a string, with four operands.  Operand 0 is
     the result (of mode M), operand 1 is a ‘mem’ referring to the first
     character of the string, operand 2 is the character to search for
     (normally zero), and operand 3 is a constant describing the known
     alignment of the beginning of the string.

‘rawmemchrM’
     Scan memory referred to by operand 1 for the first occurrence of
     operand 2.  Operand 1 is a ‘mem’ and operand 2 a ‘const_int’ of
     mode M.  Operand 0 is the result, i.e., a pointer to the first
     occurrence of operand 2 in the memory block given by operand 1.

‘floatMN2’
     Convert signed integer operand 1 (valid for fixed point mode M) to
     floating point mode N and store in operand 0 (which has mode N).

‘floatunsMN2’
     Convert unsigned integer operand 1 (valid for fixed point mode M)
     to floating point mode N and store in operand 0 (which has mode N).

‘fixMN2’
     Convert operand 1 (valid for floating point mode M) to fixed point
     mode N as a signed number and store in operand 0 (which has mode
     N).  This instruction's result is defined only when the value of
     operand 1 is an integer.

     If the machine description defines this pattern, it also needs to
     define the ‘ftrunc’ pattern.

‘fixunsMN2’
     Convert operand 1 (valid for floating point mode M) to fixed point
     mode N as an unsigned number and store in operand 0 (which has mode
     N).  This instruction's result is defined only when the value of
     operand 1 is an integer.

‘ftruncM2’
     Convert operand 1 (valid for floating point mode M) to an integer
     value, still represented in floating point mode M, and store it in
     operand 0 (valid for floating point mode M).

‘fix_truncMN2’
     Like ‘fixMN2’ but works for any floating point value of mode M by
     converting the value to an integer.

‘fixuns_truncMN2’
     Like ‘fixunsMN2’ but works for any floating point value of mode M
     by converting the value to an integer.

‘truncMN2’
     Truncate operand 1 (valid for mode M) to mode N and store in
     operand 0 (which has mode N).  Both modes must be fixed point or
     both floating point.

‘extendMN2’
     Sign-extend operand 1 (valid for mode M) to mode N and store in
     operand 0 (which has mode N).  Both modes must be fixed point or
     both floating point.

‘zero_extendMN2’
     Zero-extend operand 1 (valid for mode M) to mode N and store in
     operand 0 (which has mode N).  Both modes must be fixed point.

‘fractMN2’
     Convert operand 1 of mode M to mode N and store in operand 0 (which
     has mode N).  Mode M and mode N could be fixed-point to
     fixed-point, signed integer to fixed-point, fixed-point to signed
     integer, floating-point to fixed-point, or fixed-point to
     floating-point.  When overflows or underflows happen, the results
     are undefined.

‘satfractMN2’
     Convert operand 1 of mode M to mode N and store in operand 0 (which
     has mode N).  Mode M and mode N could be fixed-point to
     fixed-point, signed integer to fixed-point, or floating-point to
     fixed-point.  When overflows or underflows happen, the instruction
     saturates the results to the maximum or the minimum.
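
     A rough C sketch of the saturating behaviour, for the
     floating-point to fixed-point case.  The 16-bit format with 15
     fractional bits, the truncating rounding, the result for NaN and
     the name ‘model_satfract_q15’ are all assumptions made for this
     illustration; the pattern itself does not prescribe them.

```c
#include <stdint.h>

/* Hypothetical model: convert X to a signed 16-bit fixed-point
   value with 15 fractional bits, saturating instead of
   overflowing.  */
int16_t
model_satfract_q15 (double x)
{
  double scaled = x * 32768.0;

  if (scaled != scaled)
    return 0;                  /* NaN: arbitrary choice for the sketch.  */
  if (scaled >= 32767.0)
    return INT16_MAX;          /* Saturate to the maximum.  */
  if (scaled <= -32768.0)
    return INT16_MIN;          /* Saturate to the minimum.  */
  return (int16_t) scaled;     /* In range: truncate toward zero.  */
}
```

     The non-saturating ‘fractMN2’ pattern leaves the out-of-range
     cases undefined instead.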

‘fractunsMN2’
     Convert operand 1 of mode M to mode N and store in operand 0 (which
     has mode N).  Mode M and mode N could be unsigned integer to
     fixed-point, or fixed-point to unsigned integer.  When overflows or
     underflows happen, the results are undefined.

‘satfractunsMN2’
     Convert unsigned integer operand 1 of mode M to fixed-point mode N
     and store in operand 0 (which has mode N).  When overflows or
     underflows happen, the instruction saturates the results to the
     maximum or the minimum.

‘extvM’
     Extract a bit-field from register operand 1, sign-extend it, and
     store it in operand 0.  Operand 2 specifies the width of the field
     in bits and operand 3 the starting bit, which counts from the most
     significant bit if ‘BITS_BIG_ENDIAN’ is true and from the least
     significant bit otherwise.

     Operands 0 and 1 both have mode M.  Operands 2 and 3 have a
     target-specific mode.
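
     The extraction can be modelled in C as below.  The sketch assumes
     a 32-bit mode M, a width between 1 and 32, and a field that lies
     entirely within the register; the function names and the
     ‘bits_big_endian’ parameter (standing in for ‘BITS_BIG_ENDIAN’)
     are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of extzvM for a 32-bit mode: return the
   WIDTH-bit field of SRC that starts at bit START, zero-extended.  */
uint32_t
model_extzv (uint32_t src, unsigned width, unsigned start,
             int bits_big_endian)
{
  /* Translate START into the position of the field's least
     significant bit, counted from the least significant end.  */
  unsigned lsb = bits_big_endian ? 32 - start - width : start;
  uint32_t mask = width == 32 ? UINT32_MAX : (UINT32_C (1) << width) - 1;

  return (src >> lsb) & mask;
}

/* Hypothetical model of extvM: the same field, sign-extended.  */
int32_t
model_extv (uint32_t src, unsigned width, unsigned start,
            int bits_big_endian)
{
  uint32_t field = model_extzv (src, width, start, bits_big_endian);
  uint32_t sign = UINT32_C (1) << (width - 1);

  /* Flip the field's top bit and subtract it again; done in 64-bit
     arithmetic so that no step overflows.  */
  return (int32_t) ((int64_t) (field ^ sign) - (int64_t) sign);
}
```

     The same four bits are addressed as start bit 4 when counting
     from the least significant bit and as start bit 24 when counting
     from the most significant bit.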

‘extvmisalignM’
     Extract a bit-field from memory operand 1, sign extend it, and
     store it in operand 0.  Operand 2 specifies the width in bits and
     operand 3 the starting bit.  The starting bit is always somewhere
     in the first byte of operand 1; it counts from the most significant
     bit if ‘BITS_BIG_ENDIAN’ is true and from the least significant bit
     otherwise.

     Operand 0 has mode M while operand 1 has ‘BLK’ mode.  Operands 2
     and 3 have a target-specific mode.

     The instruction must not read beyond the last byte of the
     bit-field.

‘extzvM’
     Like ‘extvM’ except that the bit-field value is zero-extended.

‘extzvmisalignM’
     Like ‘extvmisalignM’ except that the bit-field value is
     zero-extended.

‘insvM’
     Insert operand 3 into a bit-field of register operand 0.  Operand 1
     specifies the width of the field in bits and operand 2 the starting
     bit, which counts from the most significant bit if
     ‘BITS_BIG_ENDIAN’ is true and from the least significant bit
     otherwise.

     Operands 0 and 3 both have mode M.  Operands 1 and 2 have a
     target-specific mode.
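
     A C model of the insertion, under the same kind of assumptions: a
     32-bit mode M, a width between 1 and 32 and a field inside the
     register.  ‘model_insv’ and ‘bits_big_endian’ are hypothetical
     names, and discarding the excess high bits of the inserted value
     is an assumption of the sketch.

```c
#include <stdint.h>

/* Hypothetical model of insvM for a 32-bit mode: return DEST with
   the WIDTH-bit field starting at bit START replaced by the low
   WIDTH bits of VALUE.  */
uint32_t
model_insv (uint32_t dest, unsigned width, unsigned start,
            uint32_t value, int bits_big_endian)
{
  /* Position of the field's least significant bit, counted from the
     least significant end of the register.  */
  unsigned lsb = bits_big_endian ? 32 - start - width : start;
  uint32_t mask = width == 32 ? UINT32_MAX : (UINT32_C (1) << width) - 1;

  /* Clear the field, then merge in the new bits.  */
  return (dest & ~(mask << lsb)) | ((value & mask) << lsb);
}
```

     Bits outside the field are left unchanged.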

‘insvmisalignM’
     Insert operand 3 into a bit-field of memory operand 0.  Operand 1
     specifies the width of the field in bits and operand 2 the starting
     bit.  The starting bit is always somewhere in the first byte of
     operand 0; it counts from the most significant bit if
     ‘BITS_BIG_ENDIAN’ is true and from the least significant bit
     otherwise.

     Operand 3 has mode M while operand 0 has ‘BLK’ mode.  Operands 1
     and 2 have a target-specific mode.

     The instruction must not read or write beyond the last byte of the
     bit-field.

‘extv’
     Extract a bit-field from operand 1 (a register or memory operand),
     where operand 2 specifies the width in bits and operand 3 the
     starting bit, and store it in operand 0.  Operand 0 must have mode
     ‘word_mode’.  Operand 1 may have mode ‘byte_mode’ or ‘word_mode’;
     often ‘word_mode’ is allowed only for registers.  Operands 2 and 3
     must be valid for ‘word_mode’.

     The RTL generation pass generates this instruction only with
     constants for operands 2 and 3 and the constant is never zero for
     operand 2.

     The bit-field value is sign-extended to a full word integer before
     it is stored in operand 0.

     This pattern is deprecated; please use ‘extvM’ and ‘extvmisalignM’
     instead.

‘extzv’
     Like ‘extv’ except that the bit-field value is zero-extended.

     This pattern is deprecated; please use ‘extzvM’ and
     ‘extzvmisalignM’ instead.

‘insv’
     Store operand 3 (which must be valid for ‘word_mode’) into a
     bit-field in operand 0, where operand 1 specifies the width in bits
     and operand 2 the starting bit.  Operand 0 may have mode
     ‘byte_mode’ or ‘word_mode’; often ‘word_mode’ is allowed only for
     registers.  Operands 1 and 2 must be valid for ‘word_mode’.

     The RTL generation pass generates this instruction only with
     constants for operands 1 and 2 and the constant is never zero for
     operand 1.

     This pattern is deprecated; please use ‘insvM’ and ‘insvmisalignM’
     instead.

‘movMODEcc’
     Conditionally move operand 2 or operand 3 into operand 0 according
     to the comparison in operand 1.  If the comparison is true, operand
     2 is moved into operand 0, otherwise operand 3 is moved.

     The mode of the operands being compared need not be the same as the
     operands being moved.  Some machines, sparc64 for example, have
     instructions that conditionally move an integer value based on the
     floating point condition codes and vice versa.

     If the machine does not have conditional move instructions, do not
     define these patterns.

‘addMODEcc’
     Similar to ‘movMODEcc’ but for conditional addition.  Conditionally
     move operand 2 or (operand 2 + operand 3) into operand 0 according
     to the comparison in operand 1.  If the comparison is false,
     operand 2 is moved into operand 0, otherwise (operand 2 + operand
     3) is moved.
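
     The two patterns select differently, which a small C model makes
     explicit.  The function names are hypothetical and ‘cond’ stands
     for the already-evaluated comparison in operand 1.

```c
/* Hypothetical model of movMODEcc: a true comparison selects
   operand 2, a false one selects operand 3.  */
long
model_movcc (int cond, long op2, long op3)
{
  return cond ? op2 : op3;
}

/* Hypothetical model of addMODEcc: a false comparison selects
   operand 2, a true one selects the sum.  */
long
model_addcc (int cond, long op2, long op3)
{
  return cond ? op2 + op3 : op2;
}
```

     Note that operand 2 is the value kept when the ‘addMODEcc’
     comparison is false, whereas for ‘movMODEcc’ it is the value
     chosen when the comparison is true.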

‘cond_negMODE’
‘cond_one_cmplMODE’
     When operand 1 is true, perform an operation on operand 2 and
     store the result in operand 0, otherwise store operand 3 in operand
     0.  The operation works elementwise if the operands are vectors.

     The scalar case is equivalent to:

          op0 = op1 ? OP op2 : op3;

     while the vector case is equivalent to:

          for (i = 0; i < GET_MODE_NUNITS (M); i++)
            op0[i] = op1[i] ? OP op2[i] : op3[i];

     where, for example, OP is ‘~’ for ‘cond_one_cmplMODE’.

     When defined for floating-point modes, the contents of ‘op2[i]’ are
     not interpreted if ‘op1[i]’ is false, just like they would not be
     in a normal C ‘?:’ condition.

     Operands 0, 2, and 3 all have mode M.  Operand 1 is a scalar
     integer if M is scalar, otherwise it has the mode returned by
     ‘TARGET_VECTORIZE_GET_MASK_MODE’.

     ‘cond_OPMODE’ generally corresponds to a conditional form of
     ‘OPMODE2’.

‘cond_addMODE’
‘cond_subMODE’
‘cond_mulMODE’
‘cond_divMODE’
‘cond_udivMODE’
‘cond_modMODE’
‘cond_umodMODE’
‘cond_andMODE’
‘cond_iorMODE’
‘cond_xorMODE’
‘cond_sminMODE’
‘cond_smaxMODE’
‘cond_uminMODE’
‘cond_umaxMODE’
‘cond_copysignMODE’
‘cond_fminMODE’
‘cond_fmaxMODE’
‘cond_ashlMODE’
‘cond_ashrMODE’
‘cond_lshrMODE’
     When operand 1 is true, perform an operation on operands 2 and 3
     and store the result in operand 0, otherwise store operand 4 in
     operand 0.  The operation works elementwise if the operands are
     vectors.

     The scalar case is equivalent to:

          op0 = op1 ? op2 OP op3 : op4;

     while the vector case is equivalent to:

          for (i = 0; i < GET_MODE_NUNITS (M); i++)
            op0[i] = op1[i] ? op2[i] OP op3[i] : op4[i];

     where, for example, OP is ‘+’ for ‘cond_addMODE’.

     When defined for floating-point modes, the contents of ‘op3[i]’ are
     not interpreted if ‘op1[i]’ is false, just like they would not be
     in a normal C ‘?:’ condition.

     Operands 0, 2, 3 and 4 all have mode M.  Operand 1 is a scalar
     integer if M is scalar, otherwise it has the mode returned by
     ‘TARGET_VECTORIZE_GET_MASK_MODE’.

     ‘cond_OPMODE’ generally corresponds to a conditional form of
     ‘OPMODE3’.  As an exception, the vector forms of shifts correspond
     to patterns like ‘vashlMODE3’ rather than patterns like
     ‘ashlMODE3’.

     ‘cond_copysignMODE’ is only defined for floating point modes.
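
     The vector loop above can be written as a runnable C sketch, here
     for a division.  The function name, the use of one byte per mask
     element and the explicit element count are assumptions of the
     sketch.  As in a C ‘?:’ expression, the division is not evaluated
     for inactive elements, so a zero divisor there is harmless in
     this model; the text above makes the corresponding statement only
     for floating-point modes.

```c
#include <stddef.h>

/* Hypothetical model of a conditional vector division: active
   elements receive op2 / op3, inactive elements receive op4.  */
void
model_cond_div (int *op0, const unsigned char *op1, const int *op2,
                const int *op3, const int *op4, size_t nunits)
{
  for (size_t i = 0; i < nunits; i++)
    op0[i] = op1[i] ? op2[i] / op3[i] : op4[i];
}
```

     Operand 4 therefore supplies the fallback value for every
     element whose mask bit is clear.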

‘cond_fmaMODE’
‘cond_fmsMODE’
‘cond_fnmaMODE’
‘cond_fnmsMODE’
     Like ‘cond_addMODE’, except that the conditional operation takes
     three operands rather than two.  For example, the vector form of
     ‘cond_fmaMODE’ is equivalent to:

          for (i = 0; i < GET_MODE_NUNITS (M); i++)
            op0[i] = op1[i] ? fma (op2[i], op3[i], op4[i]) : op5[i];

‘cond_len_negMODE’
‘cond_len_one_cmplMODE’
     When operand 1 is true and element index < operand 4 + operand 5,
     perform an operation on operand 2 and store the result in operand
     0, otherwise store operand 3 in operand 0.  The operation only
     works when the operands are vectors.

          for (i = 0; i < GET_MODE_NUNITS (M); i++)
            op0[i] = (i < ops[4] + ops[5] && op1[i]
                      ? OP op2[i]
                      : op3[i]);

     where, for example, OP is ‘~’ for ‘cond_len_one_cmplMODE’.

     When defined for floating-point modes, the contents of ‘op2[i]’ are
     not interpreted if ‘op1[i]’ is false, just like they would not be
     in a normal C ‘?:’ condition.

     Operands 0, 2, and 3 all have mode M.  Operand 1 is a scalar
     integer if M is scalar, otherwise it has the mode returned by
     ‘TARGET_VECTORIZE_GET_MASK_MODE’.  Operand 4 has whichever integer
     mode the target prefers.

     ‘cond_len_OPMODE’ generally corresponds to a conditional form of
     ‘OPMODE2’.

‘cond_len_addMODE’
‘cond_len_subMODE’
‘cond_len_mulMODE’
‘cond_len_divMODE’
‘cond_len_udivMODE’
‘cond_len_modMODE’
‘cond_len_umodMODE’
‘cond_len_andMODE’
‘cond_len_iorMODE’
‘cond_len_xorMODE’
‘cond_len_sminMODE’
‘cond_len_smaxMODE’
‘cond_len_uminMODE’
‘cond_len_umaxMODE’
‘cond_len_copysignMODE’
‘cond_len_fminMODE’
‘cond_len_fmaxMODE’
‘cond_len_ashlMODE’
‘cond_len_ashrMODE’
‘cond_len_lshrMODE’
     When operand 1 is true and element index < operand 5 + operand 6,
     perform an operation on operands 2 and 3 and store the result in
     operand 0, otherwise store operand 4 in operand 0.  The operation
     only works when the operands are vectors.

          for (i = 0; i < GET_MODE_NUNITS (M); i++)
            op0[i] = (i < ops[5] + ops[6] && op1[i]
                      ? op2[i] OP op3[i]
                      : op4[i]);

     where, for example, OP is ‘+’ for ‘cond_len_addMODE’.

     When defined for floating-point modes, the contents of ‘op3[i]’ are
     not interpreted if ‘op1[i]’ is false, just like they would not be
     in a normal C ‘?:’ condition.

     Operands 0, 2, 3 and 4 all have mode M.  Operand 1 is a scalar
     integer if M is scalar, otherwise it has the mode returned by
     ‘TARGET_VECTORIZE_GET_MASK_MODE’.  Operand 5 has whichever integer
     mode the target prefers.

     ‘cond_len_OPMODE’ generally corresponds to a conditional form of
     ‘OPMODE3’.  As an exception, the vector forms of shifts correspond
     to patterns like ‘vashlMODE3’ rather than patterns like
     ‘ashlMODE3’.

     ‘cond_len_copysignMODE’ is only defined for floating point modes.
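
     A runnable C version of the loop above, for addition.  The
     function name, the byte-per-element mask and the ‘long’ type
     chosen for operands 5 and 6 are assumptions of this sketch.

```c
#include <stddef.h>

/* Hypothetical model of a length-controlled conditional addition:
   element I is active only if I < op5 + op6 and its mask element
   is set; inactive elements receive op4.  */
void
model_cond_len_add (int *op0, const unsigned char *op1, const int *op2,
                    const int *op3, const int *op4, long op5, long op6,
                    size_t nunits)
{
  for (size_t i = 0; i < nunits; i++)
    op0[i] = ((long) i < op5 + op6 && op1[i]
              ? op2[i] + op3[i]
              : op4[i]);
}
```

     Both conditions must hold: an element beyond the length limit
     takes the fallback value even if its mask element is set.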

‘cond_len_fmaMODE’
‘cond_len_fmsMODE’
‘cond_len_fnmaMODE’
‘cond_len_fnmsMODE’
     Like ‘cond_len_addMODE’, except that the conditional operation
     takes three operands rather than two.  For example, the vector
     form of ‘cond_len_fmaMODE’ is equivalent to:

          for (i = 0; i < GET_MODE_NUNITS (M); i++)
            op0[i] = (i < ops[6] + ops[7] && op1[i]
                      ? fma (op2[i], op3[i], op4[i])
                      : op5[i]);

‘negMODEcc’
     Similar to ‘movMODEcc’ but for conditional negation.  Conditionally
     move the negation of operand 2 or the unchanged operand 3 into
     operand 0 according to the comparison in operand 1.  If the
     comparison is true, the negation of operand 2 is moved into operand
     0, otherwise operand 3 is moved.

‘notMODEcc’
     Similar to ‘negMODEcc’ but for conditional complement.
     Conditionally move the bitwise complement of operand 2 or the
     unchanged operand 3 into operand 0 according to the comparison in
     operand 1.  If the comparison is true, the complement of operand 2
     is moved into operand 0, otherwise operand 3 is moved.

‘cstoreMODE4’
     Store zero or nonzero in operand 0 according to whether a
     comparison is true.  Operand 1 is a comparison operator.  Operand 2
     and operand 3 are the first and second operand of the comparison,
     respectively.  You specify the mode that operand 0 must have when
     you write the ‘match_operand’ expression.  The compiler
     automatically sees which mode you have used and supplies an operand
     of that mode.

     The value stored for a true condition must have 1 as its low bit,
     or else must be negative.  Otherwise the instruction is not
     suitable and you should omit it from the machine description.  You
     describe to the compiler exactly which value is stored by defining
     the macro ‘STORE_FLAG_VALUE’ (*note Misc::).  If a description
     cannot be found that can be used for all the possible comparison
     operators, you should pick one and use a ‘define_expand’ to map all
     results onto the one you chose.

     These operations may ‘FAIL’, but should do so only in relatively
     uncommon cases; if they would ‘FAIL’ for common cases involving
     integer comparisons, it is best to restrict the predicates to not
     allow these operands.  Likewise if a given comparison operator will
     always fail, independent of the operands (for floating-point modes,
     the ‘ordered_comparison_operator’ predicate is often useful in this
     case).

     If this pattern is omitted, the compiler will generate a
     conditional branch--for example, it may copy a constant one to the
     target and branch around an assignment of zero to the target--or
     a libcall.  If the predicate for operand 1 only rejects some
     operators, it will also try reordering the operands and/or
     inverting the result value (e.g. by an exclusive OR). These
     possibilities could be cheaper or equivalent to the instructions
     used for the ‘cstoreMODE4’ pattern followed by those required to
     convert a positive result from ‘STORE_FLAG_VALUE’ to 1; in this
     case, you can and should make operand 1's predicate reject some
     operators in the ‘cstoreMODE4’ pattern, or remove the pattern
     altogether from the machine description.
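
     The conversion step mentioned above can be pictured with a small
     C sketch.  It is not the code the compiler emits;
     ‘model_normalize_flag’ is a hypothetical name, and ‘raw’ stands
     for the value left in operand 0, which is either zero or
     ‘STORE_FLAG_VALUE’.

```c
/* Hypothetical model: turn the raw result of a store-flag
   instruction into 0 or 1.  */
int
model_normalize_flag (int raw, int store_flag_value)
{
  if (store_flag_value == 1)
    return raw;        /* Already 0 or 1: no extra work.  */
  if (store_flag_value == -1)
    return -raw;       /* One extra negation.  */
  return raw != 0;     /* General case: an extra comparison.  */
}
```

     When ‘STORE_FLAG_VALUE’ is not 1, the extra operation counts
     towards the cost of using the pattern.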

‘tbranch_OPMODE3’
     Conditional branch instruction combined with a bit test-and-compare
     instruction.  Operand 0 is the operand of the comparison.  Operand
     1 is the bit position of operand 0 to test.  Operand 2 is the
     ‘code_label’ to jump to.  OP is one of EQ or NE.
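
     The branch condition can be sketched in C.  The sketch assumes
     that the tested bit belongs to operand 0 and that bit positions
     count from the least significant bit; ‘model_tbranch_taken’ and
     its ‘is_ne’ flag are hypothetical.

```c
/* Hypothetical model: return nonzero if the branch is taken.  IS_NE
   selects the NE form (branch if the bit is set); otherwise the EQ
   form is modelled (branch if the bit is clear).  */
int
model_tbranch_taken (unsigned long op0, unsigned bitpos, int is_ne)
{
  int bit = (int) ((op0 >> bitpos) & 1);

  return is_ne ? bit != 0 : bit == 0;
}
```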

‘cbranchMODE4’
     Conditional branch instruction combined with a compare instruction.
     Operand 0 is a comparison operator.  Operand 1 and operand 2 are
     the first and second operands of the comparison, respectively.
     Operand 3 is the ‘code_label’ to jump to.

‘jump’
     A jump inside a function; an unconditional branch.  Operand 0 is
     the ‘code_label’ to jump to.  This pattern name is mandatory on all
     machines.

‘call’
     Subroutine call instruction returning no value.  Operand 0 is the
     function to call; operand 1 is the number of bytes of arguments
     pushed as a ‘const_int’.  Operand 2 is the result of calling the
     target hook ‘TARGET_FUNCTION_ARG’ with the second argument ‘arg’
     yielding true for ‘arg.end_marker_p ()’, in a call after all
     parameters have been passed to that hook.  By default this is the
     first register beyond those used for arguments in the call, or
     ‘NULL’ if all the argument-registers are used in the call.

     On most machines, operand 2 is not actually stored into the RTL
     pattern.  It is supplied for the sake of some RISC machines which
     need to put this information into the assembler code; they can put
     it in the RTL instead of operand 1.

     Operand 0 should be a ‘mem’ RTX whose address is the address of the
     function.  Note, however, that this address can be a ‘symbol_ref’
     expression even if it would not be a legitimate memory address on
     the target machine.  If it is also not a valid argument for a call
     instruction, the pattern for this operation should be a
     ‘define_expand’ (*note Expander Definitions::) that places the
     address into a register and uses that register in the call
     instruction.

‘call_value’
     Subroutine call instruction returning a value.  Operand 0 is the
     hard register in which the value is returned.  There are three more
     operands, the same as the three operands of the ‘call’ instruction
     (but with numbers increased by one).

     Subroutines that return ‘BLKmode’ objects use the ‘call’ insn.

‘call_pop’, ‘call_value_pop’
     Similar to ‘call’ and ‘call_value’, except used if defined and if
     ‘RETURN_POPS_ARGS’ is nonzero.  They should emit a ‘parallel’ that
     contains both the function call and a ‘set’ to indicate the
     adjustment made to the frame pointer.

     For machines where ‘RETURN_POPS_ARGS’ can be nonzero, the use of
     these patterns increases the number of functions for which the
     frame pointer can be eliminated, if desired.

‘untyped_call’
     Subroutine call instruction returning a value of any type.  Operand
     0 is the function to call; operand 1 is a memory location where the
     result of calling the function is to be stored; operand 2 is a
     ‘parallel’ expression where each element is a ‘set’ expression that
     indicates the saving of a function return value into the result
     block.

     This instruction pattern should be defined to support
     ‘__builtin_apply’ on machines where special instructions are needed
     to call a subroutine with arbitrary arguments or to save the value
     returned.  This instruction pattern is required on machines that
     have multiple registers that can hold a return value (i.e.
     ‘FUNCTION_VALUE_REGNO_P’ is true for more than one register).

‘return’
     Subroutine return instruction.  This instruction pattern name
     should be defined only if a single instruction can do all the work
     of returning from a function.

     Like the ‘movM’ patterns, this pattern is also used after the RTL
     generation phase.  In this case it is to support machines where
     multiple instructions are usually needed to return from a function,
     but some class of functions only requires one instruction to
     implement a return.  Normally, the applicable functions are those
     which do not need to save any registers or allocate stack space.

     It is valid for this pattern to expand to an instruction using
     ‘simple_return’ if no epilogue is required.

‘simple_return’
     Subroutine return instruction.  This instruction pattern name
     should be defined only if a single instruction can do all the work
     of returning from a function on a path where no epilogue is
     required.  This pattern is very similar to the ‘return’ instruction
     pattern, but it is emitted only by the shrink-wrapping optimization
     on paths where the function prologue has not been executed, and a
     function return should occur without any of the effects of the
     epilogue.  Additional uses may be introduced on paths where both
     the prologue and the epilogue have executed.

     For such machines, the condition specified in this pattern should
     only be true when ‘reload_completed’ is nonzero and the function's
     epilogue would only be a single instruction.  For machines with
     register windows, the routine ‘leaf_function_p’ may be used to
     determine if a register window push is required.

     Machines that have conditional return instructions should define
     patterns such as

          (define_insn ""
            [(set (pc)
                  (if_then_else (match_operator
                                   0 "comparison_operator"
                                   [(reg:CC CC_REG) (const_int 0)])
                                (return)
                                (pc)))]
            "CONDITION"
            "...")

     where CONDITION would normally be the same condition specified on
     the named ‘return’ pattern.

‘untyped_return’
     Untyped subroutine return instruction.  This instruction pattern
     should be defined to support ‘__builtin_return’ on machines where
     special instructions are needed to return a value of any type.

     Operand 0 is a memory location where the result of calling a
     function with ‘__builtin_apply’ is stored; operand 1 is a
     ‘parallel’ expression where each element is a ‘set’ expression that
     indicates the restoring of a function return value from the result
     block.

‘nop’
     No-op instruction.  This instruction pattern name should always be
     defined to output a no-op in assembler code.  ‘(const_int 0)’ will
     do as an RTL pattern.

‘indirect_jump’
     An instruction to jump to an address which is operand zero.  This
     pattern name is mandatory on all machines.

‘casesi’
     Instruction to jump through a dispatch table, including bounds
     checking.  This instruction takes five operands:

       1. The index to dispatch on, which has mode ‘SImode’.

       2. The lower bound for indices in the table, an integer constant.

       3. The total range of indices in the table--the largest index
          minus the smallest one (both inclusive).

       4. A label that precedes the table itself.

       5. A label to jump to if the index has a value outside the
          bounds.

     The table is an ‘addr_vec’ or ‘addr_diff_vec’ inside of a
     ‘jump_table_data’.  The number of elements in the table is one plus
     the difference between the upper bound and the lower bound.
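
     The bounds check can be sketched in C.  This is a model of the
     dispatch decision, not generated code: ‘model_casesi’ is a
     hypothetical name and the return value -1 stands for a jump to
     the out-of-bounds label.

```c
#include <stdint.h>

/* Hypothetical model of casesi: return the slot of the dispatch
   table to jump through, or -1 if INDEX is outside the bounds.  */
int64_t
model_casesi (int32_t index, int32_t lower_bound, uint32_t range)
{
  /* Subtracting in unsigned arithmetic makes an index below the
     lower bound wrap to a large value, so a single comparison
     rejects indices on either side of the table.  */
  uint32_t offset = (uint32_t) index - (uint32_t) lower_bound;

  if (offset > range)
    return -1;         /* Jump to the out-of-bounds label.  */
  return offset;       /* Jump through table[offset].  */
}
```

     With a lower bound of 3 and a range of 4, the table has five
     entries and covers the indices 3 to 7.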

‘tablejump’
     Instruction to jump to a variable address.  This is a low-level
     capability which can be used to implement a dispatch table when
     there is no ‘casesi’ pattern.

     This pattern requires two operands: the address or offset, and a
     label which should immediately precede the jump table.  If the
     macro ‘CASE_VECTOR_PC_RELATIVE’ evaluates to a nonzero value then
     the first operand is an offset which counts from the address of the
     table; otherwise, it is an absolute address to jump to.  In either
     case, the first operand has mode ‘Pmode’.

     The ‘tablejump’ insn is always the last insn before the jump table
     it uses.  Its assembler code normally has no need to use the second
     operand, but you should incorporate it in the RTL pattern so that
     the jump optimizer will not delete the table as unreachable code.

‘doloop_end’
     Conditional branch instruction that decrements a register and jumps
     if the register is nonzero.  Operand 0 is the register to decrement
     and test; operand 1 is the label to jump to if the register is
     nonzero.  *Note Looping Patterns::.

     This optional instruction pattern should be defined for machines
     with low-overhead looping instructions as the loop optimizer will
     try to modify suitable loops to utilize it.  The target hook
     ‘TARGET_CAN_USE_DOLOOP_P’ controls the conditions under which
     low-overhead loops can be used.
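
     In C terms, a loop using this pattern behaves like the
     ‘do’-‘while’ loop below.  ‘model_doloop’ is a hypothetical name,
     and the sketch assumes the counter is at least 1 on entry.

```c
/* Hypothetical model of a low-overhead loop: run the body, then
   decrement COUNT and branch back while it is nonzero.  Returns
   the number of times the body ran.  COUNT must be at least 1.  */
unsigned long
model_doloop (unsigned long count)
{
  unsigned long iterations = 0;

  do
    {
      iterations++;    /* Stands in for the loop body.  */
      count--;         /* doloop_end: decrement the register ...  */
    }
  while (count != 0);  /* ... and jump if it is nonzero.  */

  return iterations;
}
```

     The body thus runs exactly as many times as the initial value of
     the counter.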

‘doloop_begin’
     Companion instruction to ‘doloop_end’ required for machines that
     need to perform some initialization, such as loading a special
     counter register.  Operand 1 is the associated ‘doloop_end’ pattern
     and operand 0 is the register that it decrements.

     If initialization insns do not always need to be emitted, use a
     ‘define_expand’ (*note Expander Definitions::) and make it fail.

‘canonicalize_funcptr_for_compare’
     Canonicalize the function pointer in operand 1 and store the result
     into operand 0.

     Operand 0 is always a ‘reg’ and has mode ‘Pmode’; operand 1 may be
     a ‘reg’, ‘mem’, ‘symbol_ref’, ‘const_int’, etc., and also has mode
     ‘Pmode’.

     Canonicalization of a function pointer usually involves computing
     the address of the function which would be called if the function
     pointer were used in an indirect call.

     Only define this pattern if function pointers on the target machine
     can have different values but still call the same function when
     used in an indirect call.

‘save_stack_block’
‘save_stack_function’
‘save_stack_nonlocal’
‘restore_stack_block’
‘restore_stack_function’
‘restore_stack_nonlocal’
     Most machines save and restore the stack pointer by copying it to
     or from an object of mode ‘Pmode’.  Do not define these patterns on
     such machines.

     Some machines require special handling for stack pointer saves and
     restores.  On those machines, define the patterns corresponding to
     the non-standard cases by using a ‘define_expand’ (*note Expander
     Definitions::) that produces the required insns.  The three types
     of saves and restores are:

       1. ‘save_stack_block’ saves the stack pointer at the start of a
          block that allocates a variable-sized object, and
          ‘restore_stack_block’ restores the stack pointer when the
          block is exited.

       2. ‘save_stack_function’ and ‘restore_stack_function’ do a
          similar job for the outermost block of a function and are used
          when the function allocates variable-sized objects or calls
          ‘alloca’.  Only the epilogue uses the restored stack pointer,
          allowing a simpler save or restore sequence on some machines.

       3. ‘save_stack_nonlocal’ is used in functions that contain labels
          branched to by nested functions.  It saves the stack pointer
          in such a way that the inner function can use
          ‘restore_stack_nonlocal’ to restore the stack pointer.  The
          compiler generates code to restore the frame and argument
          pointer registers, but some machines require saving and
          restoring additional data such as register window information
          or stack backchains.  Place insns in these patterns to save
          and restore any such required data.

     When saving the stack pointer, operand 0 is the save area and
     operand 1 is the stack pointer.  The mode used to allocate the save
     area defaults to ‘Pmode’ but you can override that choice by
     defining the ‘STACK_SAVEAREA_MODE’ macro (*note Storage Layout::).
     You must specify an integral mode, or ‘VOIDmode’ if no save area is
     needed for a particular type of save (either because no save is
     needed or because a machine-specific save area can be used).
     Operand 0 is the stack pointer and operand 1 is the save area for
     restore operations.  If ‘save_stack_block’ is defined, operand 0
     must not be ‘VOIDmode’ since these saves can be arbitrarily nested.

     A save area is a ‘mem’ that is at a constant offset from
     ‘virtual_stack_vars_rtx’ when the stack pointer is saved for use by
     nonlocal gotos and a ‘reg’ in the other two cases.
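
     For orientation, the C sketch below shows source code of the kind
     that gives rise to a block-level save and restore: a block that
     allocates a variable-sized object.  The function name
     ‘sum_last_elements’ is hypothetical; the point is only that the
     stack pointer is saved on entry to the inner block and restored on
     exit, so stack use need not grow with the number of iterations.

```c
#include <stddef.h>

/* Hypothetical example.  Each iteration enters a block that allocates
   a variable-sized array, so the stack pointer is saved at the start
   of the block and restored when the block is exited.  */
size_t sum_last_elements(size_t rounds, size_t n)
{
  size_t total = 0;

  for (size_t r = 0; r < rounds; r++)
    {
      unsigned char scratch[n + 1];   /* variable-sized object */

      for (size_t i = 0; i <= n; i++)
        scratch[i] = (unsigned char) i;
      total += scratch[n];
    }
  return total;
}
```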

‘allocate_stack’
     Subtract (or add if ‘STACK_GROWS_DOWNWARD’ is undefined) operand 1
     from the stack pointer to create space for dynamically allocated
     data.

     Store the resultant pointer to this space into operand 0.  If you
     are allocating space from the main stack, do this by emitting a
     move insn to copy ‘virtual_stack_dynamic_rtx’ to operand 0.  If you
     are allocating the space elsewhere, generate code to copy the
     location of the space to operand 0.  In the latter case, you must
     ensure this space gets freed when the corresponding space on the
     main stack is freed.

     Do not define this pattern if all that must be done is the
     subtraction.  Some machines require other operations such as stack
     probes or maintaining the back chain.  Define this pattern to emit
     those operations in addition to updating the stack pointer.

‘check_stack’
     If stack checking (*note Stack Checking::) cannot be done on your
     system by probing the stack, define this pattern to perform the
     needed check and signal an error if the stack has overflowed.  The
     single operand is the address in the stack farthest from the
     current stack pointer that you need to validate.  Normally, on
     platforms where this pattern is needed, you would obtain the stack
     limit from a global or thread-specific variable or register.

‘probe_stack_address’
     If stack checking (*note Stack Checking::) can be done on your
     system by probing the stack but without the need to actually access
     it, define this pattern and signal an error if the stack has
     overflowed.  The single operand is the memory address in the stack
     that needs to be probed.

‘probe_stack’
     If stack checking (*note Stack Checking::) can be done on your
     system by probing the stack but doing it with a "store zero"
     instruction is not valid or optimal, define this pattern to do the
     probing differently and signal an error if the stack has
     overflowed.  The single operand is the memory reference in the
     stack that needs to be probed.

‘nonlocal_goto’
     Emit code to generate a non-local goto, e.g., a jump from one
     function to a label in an outer function.  This pattern has four
     arguments, each representing a value to be used in the jump.  The
     first argument is to be loaded into the frame pointer, the second
     is the address to branch to (code to dispatch to the actual label),
     the third is the address of a location where the stack is saved,
     and the last is the address of the label, to be placed in the
     location for the incoming static chain.

     On most machines you need not define this pattern, since GCC will
     already generate the correct code, which is to load the frame
     pointer and static chain, restore the stack (using the
     ‘restore_stack_nonlocal’ pattern, if defined), and jump indirectly
     to the dispatcher.  You need only define this pattern if this code
     will not work on your machine.

‘nonlocal_goto_receiver’
     This pattern, if defined, contains code needed at the target of a
     nonlocal goto after the code already generated by GCC.  You will
     not normally need to define this pattern.  A typical reason why you
     might need this pattern is if some value, such as a pointer to a
     global table, must be restored when the frame pointer is restored.
     Note that a nonlocal goto only occurs within a unit-of-translation,
     so a global table pointer that is shared by all functions of a
     given module need not be restored.  There are no arguments.

‘exception_receiver’
     This pattern, if defined, contains code needed at the site of an
     exception handler that isn't needed at the site of a nonlocal goto.
     You will not normally need to define this pattern.  A typical
     reason why you might need this pattern is if some value, such as a
     pointer to a global table, must be restored after control flow is
     branched to the handler of an exception.  There are no arguments.

‘builtin_setjmp_setup’
     This pattern, if defined, contains additional code needed to
     initialize the ‘jmp_buf’.  You will not normally need to define
     this pattern.  A typical reason why you might need this pattern is
     if some value, such as a pointer to a global table, must be
     restored, although it is preferable to recalculate the pointer
     value if possible (given the address of a label, for instance).
     The single argument is a pointer to the ‘jmp_buf’.
     Note that the buffer is five words long and that the first three
     are normally used by the generic mechanism.

‘builtin_setjmp_receiver’
     This pattern, if defined, contains code needed at the site of a
     built-in setjmp that isn't needed at the site of a nonlocal goto.
     You will not normally need to define this pattern.  A typical
     reason why you might need this pattern is if some value, such as a
     pointer to a global table, must be restored.  It takes one
     argument, which is the label to which builtin_longjmp transferred
     control; this pattern may be emitted at a small offset from that
     label.

‘builtin_longjmp’
     This pattern, if defined, performs the entire action of the
     longjmp.  You will not normally need to define this pattern unless
     you also define ‘builtin_setjmp_setup’.  The single argument is a
     pointer to the ‘jmp_buf’.
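
     The C sketch below shows how the built-in setjmp and longjmp
     functions that these patterns support are typically used from
     source code.  It assumes the five-word buffer mentioned above,
     declared here as an array of five ‘void *’; the names
     ‘jump_buffer’, ‘fail_with_jump’ and ‘run_with_recovery’ are
     hypothetical.

```c
/* Five-word buffer used by the built-in setjmp/longjmp mechanism.  */
static void *jump_buffer[5];

/* Hypothetical helper.  __builtin_longjmp must not appear in the same
   function as the matching __builtin_setjmp, so it lives here and is
   kept out of line.  The second argument must be the constant 1.  */
__attribute__((noinline)) static void fail_with_jump(void)
{
  __builtin_longjmp(jump_buffer, 1);
}

/* Returns 0 on the normal path and -1 when control came back through
   __builtin_longjmp.  */
int run_with_recovery(int should_fail)
{
  if (__builtin_setjmp(jump_buffer) != 0)
    return -1;
  if (should_fail)
    fail_with_jump();
  return 0;
}
```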

‘eh_return’
     This pattern, if defined, affects the way ‘__builtin_eh_return’,
     and thence the call frame exception handling library routines, are
     built.  It is intended to handle non-trivial actions needed along
     the abnormal return path.

     The address of the exception handler to which the function should
     return is passed as the operand to this pattern.  It will normally
     need to be copied by the pattern to some special register or memory
     location.  If the pattern needs to determine the location of the
     target call frame in order to do so, it may use
     ‘EH_RETURN_STACKADJ_RTX’, if defined; it will have already been
     assigned.

     If this pattern is not defined, the default action will be to
     simply copy the return address to ‘EH_RETURN_HANDLER_RTX’.  Either
     that macro or this pattern needs to be defined if call frame
     exception handling is to be used.

‘prologue’
     This pattern, if defined, emits RTL for entry to a function.  The
     function entry is responsible for setting up the stack frame,
     initializing the frame pointer register, saving callee saved
     registers, etc.

     Using a prologue pattern is generally preferred over defining
     ‘TARGET_ASM_FUNCTION_PROLOGUE’ to emit assembly code for the
     prologue.

     The ‘prologue’ pattern is particularly useful for targets which
     perform instruction scheduling.

‘window_save’
     This pattern, if defined, emits RTL for a register window save.  It
     should be defined if the target machine has register windows but
     the window events are decoupled from calls to subroutines.  The
     canonical example is the SPARC architecture.

‘epilogue’
     This pattern emits RTL for exit from a function.  The function exit
     is responsible for deallocating the stack frame, restoring callee
     saved registers and emitting the return instruction.

     Using an epilogue pattern is generally preferred over defining
     ‘TARGET_ASM_FUNCTION_EPILOGUE’ to emit assembly code for the
     epilogue.

     The ‘epilogue’ pattern is particularly useful for targets which
     perform instruction scheduling or which have delay slots for their
     return instruction.

‘sibcall_epilogue’
     This pattern, if defined, emits RTL for exit from a function
     without the final branch back to the calling function.  This
     pattern will be emitted before any sibling call (aka tail call)
     sites.

     The ‘sibcall_epilogue’ pattern must not clobber any arguments used
     for parameter passing or any stack slots for arguments passed to
     the current function.

‘trap’
     This pattern, if defined, signals an error, typically by causing
     some kind of signal to be raised.
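
     At the source level a trap is usually requested with
     ‘__builtin_trap’.  The sketch below uses a hypothetical helper,
     ‘checked_divide’, that traps on a zero divisor; on targets that
     provide a conditional trap instruction, a test followed by a trap
     like this one may be a candidate for it.

```c
/* Hypothetical helper: divides, trapping on a zero divisor instead of
   performing the division.  */
int checked_divide(int numerator, int divisor)
{
  if (divisor == 0)
    __builtin_trap();   /* abnormal termination; does not return */
  return numerator / divisor;
}
```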

‘ctrapMM4’
     Conditional trap instruction.  Operand 0 is a piece of RTL which
     performs a comparison, and operands 1 and 2 are the arms of the
     comparison.  Operand 3 is the trap code, an integer.

     A typical ‘ctrap’ pattern looks like

          (define_insn "ctrapsi4"
            [(trap_if (match_operator 0 "trap_operator"
                       [(match_operand 1 "register_operand")
                        (match_operand 2 "immediate_operand")])
                      (match_operand 3 "const_int_operand" "i"))]
            ""
            "...")

‘prefetch’
     This pattern, if defined, emits code for a non-faulting data
     prefetch instruction.  Operand 0 is the address of the memory to
     prefetch.  Operand 1 is a constant 1 if the prefetch is preparing
     for a write to the memory address, or a constant 0 otherwise.
     Operand 2 is the expected degree of temporal locality of the data
     and is a value between 0 and 3, inclusive; 0 means that the data
     has no temporal locality, so it need not be left in the cache after
     the access; 3 means that the data has a high degree of temporal
     locality and should be left in all levels of cache possible; 1 and
     2 mean, respectively, a low or moderate degree of temporal
     locality.

     Targets that do not support write prefetches or locality hints can
     ignore the values of operands 1 and 2.
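
     The three operands correspond to the three arguments of
     ‘__builtin_prefetch’.  In the C sketch below, the function name
     ‘sum_with_prefetch’ and the look-ahead distance of 8 elements are
     assumptions chosen only for illustration.

```c
#include <stddef.h>

/* Hypothetical example: sums an array while prefetching ahead.
   The arguments of __builtin_prefetch are the address, 0 for a read
   (1 would mean a write), and 3 for a high degree of temporal
   locality.  */
long sum_with_prefetch(const long *values, size_t count)
{
  long total = 0;

  for (size_t i = 0; i < count; i++)
    {
      if (i + 8 < count)
        __builtin_prefetch(&values[i + 8], 0, 3);
      total += values[i];
    }
  return total;
}
```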

‘blockage’
     This pattern defines a pseudo insn that prevents the instruction
     scheduler and other passes from moving instructions and using
     register equivalences across the boundary defined by the blockage
     insn.  This needs to be an UNSPEC_VOLATILE pattern or a volatile
     ASM.

‘memory_blockage’
     This pattern, if defined, represents a compiler memory barrier, and
     will be placed at points across which RTL passes may not propagate
     memory accesses.  This instruction needs to read and write volatile
     BLKmode memory.  It does not need to generate any machine
     instruction.  If this pattern is not defined, the compiler falls
     back to emitting an instruction corresponding to ‘asm volatile (""
     ::: "memory")’.
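
     The fallback mentioned above can be written directly in C.  The
     sketch below wraps it in a hypothetical helper,
     ‘compiler_barrier’; note that it constrains only the compiler and
     emits no hardware fence.

```c
static int shared_data;
static int shared_ready;

/* Compiler-only memory barrier: emits no machine instruction.  */
static inline void compiler_barrier(void)
{
  __asm__ __volatile__ ("" ::: "memory");
}

/* Hypothetical example: the compiler may not move the store to
   shared_data past the barrier, nor the store to shared_ready before
   it.  */
void publish_value(int value)
{
  shared_data = value;
  compiler_barrier();
  shared_ready = 1;
}

/* Returns the published value, or -1 if nothing was published yet.  */
int read_published(void)
{
  int ready = shared_ready;

  compiler_barrier();
  return ready ? shared_data : -1;
}
```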

‘memory_barrier’
     If the target memory model is not fully synchronous, then this
     pattern should be defined to an instruction that orders both loads
     and stores before the instruction with respect to loads and stores
     after the instruction.  This pattern has no operands.

‘speculation_barrier’
     If the target can support speculative execution, then this pattern
     should be defined to an instruction that will block subsequent
     execution until any prior speculation conditions have been resolved.
     The pattern must also ensure that the compiler cannot move memory
     operations past the barrier, so it needs to be an UNSPEC_VOLATILE
     pattern.  The pattern has no operands.

     If this pattern is not defined then the default expansion of
     ‘__builtin_speculation_safe_value’ will emit a warning.  You can
     suppress this warning by defining this pattern with a final
     condition of ‘0’ (zero), which tells the compiler that a
     speculation barrier is not needed for this target.

‘sync_compare_and_swapMODE’
     This pattern, if defined, emits code for an atomic compare-and-swap
     operation.  Operand 1 is the memory on which the atomic operation
     is performed.  Operand 2 is the "old" value to be compared against
     the current contents of the memory location.  Operand 3 is the
     "new" value to store in the memory if the compare succeeds.
     Operand 0 is the result of the operation; it should contain the
     contents of the memory before the operation.  If the compare
     succeeds, this should obviously be a copy of operand 2.

     This pattern must show that both operand 0 and operand 1 are
     modified.

     This pattern must issue any memory barrier instructions such that
     all memory operations before the atomic operation occur before the
     atomic operation and all memory operations after the atomic
     operation occur after the atomic operation.

     For targets where the success or failure of the compare-and-swap
     operation is available via the status flags, it is possible to
     avoid a separate compare operation and issue the subsequent branch
     or store-flag operation immediately after the compare-and-swap.  To
     this end, GCC will look for a ‘MODE_CC’ set in the output of
     ‘sync_compare_and_swapMODE’; if the machine description includes
     such a set, the target should also define special ‘cbranchcc4’
     and/or ‘cstorecc4’ instructions.  GCC will then be able to take the
     destination of the ‘MODE_CC’ set and pass it to the ‘cbranchcc4’ or
     ‘cstorecc4’ pattern as the first operand of the comparison (the
     second will be ‘(const_int 0)’).

     For targets where the operating system may provide support for this
     operation via library calls, the ‘sync_compare_and_swap_optab’ may
     be initialized to a function with the same interface as the
     ‘__sync_val_compare_and_swap_N’ built-in.  If the entire set of
     __SYNC builtins is supported via library calls, the target can
     initialize all of the optabs at once with ‘init_sync_libfuncs’.
     For the purposes of C++11 ‘std::atomic::is_lock_free’, it is
     assumed that these library calls do _not_ use any kind of
     interruptible locking.
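
     From C, this operation is reached through the
     ‘__sync_val_compare_and_swap’ built-in, which returns the previous
     contents of the memory.  The helper name ‘claim_slot’ and the
     convention that 0 means "unowned" are assumptions of this sketch.

```c
/* Hypothetical example: claims *slot for owner_id if it is currently
   0.  The built-in returns the contents of *slot before the
   operation, so a return value of 0 means the swap took place.  */
int claim_slot(int *slot, int owner_id)
{
  int previous = __sync_val_compare_and_swap(slot, 0, owner_id);

  return previous == 0;
}
```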

‘sync_addMODE’, ‘sync_subMODE’
‘sync_iorMODE’, ‘sync_andMODE’
‘sync_xorMODE’, ‘sync_nandMODE’
     These patterns emit code for an atomic operation on memory.
     Operand 0 is the memory on which the atomic operation is performed.
     Operand 1 is the second operand to the binary operator.

     This pattern must issue any memory barrier instructions such that
     all memory operations before the atomic operation occur before the
     atomic operation and all memory operations after the atomic
     operation occur after the atomic operation.

     If these patterns are not defined, the operation will be
     constructed from a compare-and-swap operation, if defined.

‘sync_old_addMODE’, ‘sync_old_subMODE’
‘sync_old_iorMODE’, ‘sync_old_andMODE’
‘sync_old_xorMODE’, ‘sync_old_nandMODE’
     These patterns emit code for an atomic operation on memory, and
     return the value that the memory contained before the operation.
     Operand 0 is the result value, operand 1 is the memory on which the
     atomic operation is performed, and operand 2 is the second operand
     to the binary operator.

     This pattern must issue any memory barrier instructions such that
     all memory operations before the atomic operation occur before the
     atomic operation and all memory operations after the atomic
     operation occur after the atomic operation.

     If these patterns are not defined, the operation will be
     constructed from a compare-and-swap operation, if defined.

‘sync_new_addMODE’, ‘sync_new_subMODE’
‘sync_new_iorMODE’, ‘sync_new_andMODE’
‘sync_new_xorMODE’, ‘sync_new_nandMODE’
     These patterns are like their ‘sync_old_OP’ counterparts, except
     that they return the value that exists in the memory location after
     the operation, rather than before the operation.
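
     The difference between the "old" and "new" forms is visible in the
     corresponding C built-ins.  The wrapper names in this sketch are
     hypothetical.

```c
/* Returns the value *counter held before the addition ("old" form).  */
int fetch_then_add(int *counter, int amount)
{
  return __sync_fetch_and_add(counter, amount);
}

/* Returns the value *counter holds after the addition ("new" form).  */
int add_then_fetch(int *counter, int amount)
{
  return __sync_add_and_fetch(counter, amount);
}
```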

‘sync_lock_test_and_setMODE’
     This pattern takes two forms, based on the capabilities of the
     target.  In either case, operand 0 is the result of the operation,
     operand 1 is the memory on which the atomic operation is performed,
     and operand 2 is the value to set in the lock.

     In the ideal case, this operation is an atomic exchange operation,
     in which the previous value in the memory operand is copied into the
     result operand, and the value operand is stored in the memory
     operand.

     For less capable targets, any value operand that is not the
     constant 1 should be rejected with ‘FAIL’.  In this case the target
     may use an atomic test-and-set bit operation.  The result operand
     should contain 1 if the bit was previously set and 0 if the bit was
     previously clear.  The true contents of the memory operand are
     implementation defined.

     This pattern must issue any memory barrier instructions such that
     the pattern as a whole acts as an acquire barrier, that is all
     memory operations after the pattern do not occur until the lock is
     acquired.

     If this pattern is not defined, the operation will be constructed
     from a compare-and-swap operation, if defined.

‘sync_lock_releaseMODE’
     This pattern, if defined, releases a lock set by
     ‘sync_lock_test_and_setMODE’.  Operand 0 is the memory that
     contains the lock; operand 1 is the value to store in the lock.

     If the target doesn't implement full semantics for
     ‘sync_lock_test_and_setMODE’, any value operand which is not the
     constant 0 should be rejected with ‘FAIL’, and the true contents of
     the memory operand are implementation defined.

     This pattern must issue any memory barrier instructions such that
     the pattern as a whole acts as a release barrier, that is the lock
     is released only after all previous memory operations have
     completed.

     If this pattern is not defined, then a ‘memory_barrier’ pattern
     will be emitted, followed by a store of the value to the memory
     operand.
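
     A minimal lock built on the matching C built-ins might look like
     the sketch below.  It stores only the constant 1 when acquiring,
     which also suits the less capable targets described above; the
     function names are hypothetical.

```c
/* Hypothetical example: tries once to take the lock.  The built-in
   returns the previous contents, so 0 means the lock was free and is
   now held by the caller.  Acts as an acquire barrier.  */
int lock_try_acquire(int *lock)
{
  return __sync_lock_test_and_set(lock, 1) == 0;
}

/* Releases the lock by storing 0.  Acts as a release barrier.  */
void lock_release(int *lock)
{
  __sync_lock_release(lock);
}
```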

‘atomic_compare_and_swapMODE’
     This pattern, if defined, emits code for an atomic compare-and-swap
     operation with memory model semantics.  Operand 2 is the memory on
     which the atomic operation is performed.  Operand 0 is an output
     operand which is set to true or false based on whether the
     operation succeeded.  Operand 1 is an output operand which is set
     to the contents of the memory before the operation was attempted.
     Operand 3 is the value that is expected to be in memory.  Operand 4
     is the value to put in memory if the expected value is found there.
     Operand 5 is set to 1 if this compare and swap is to be treated as
     a weak operation.  Operand 6 is the memory model to be used if the
     operation is a success.  Operand 7 is the memory model to be used
     if the operation fails.

     If memory referred to in operand 2 contains the value in operand 3,
     then operand 4 is stored in memory pointed to by operand 2 and
     fencing based on the memory model in operand 6 is issued.

     If memory referred to in operand 2 does not contain the value in
     operand 3, then fencing based on the memory model in operand 7 is
     issued.

     If a target does not support weak compare-and-swap operations, or
     the port elects not to implement weak operations, the argument in
     operand 5 can be ignored.  Note a strong implementation must be
     provided.

     If this pattern is not provided, the ‘__atomic_compare_exchange’
     built-in functions will utilize the legacy ‘sync_compare_and_swap’
     pattern with an ‘__ATOMIC_SEQ_CST’ memory model.
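
     The operands map closely onto the arguments of
     ‘__atomic_compare_exchange_n’.  In the C sketch below, the helper
     name ‘try_update’ and the choice of memory models are assumptions
     made for illustration.

```c
#include <stdbool.h>

/* Hypothetical example of a strong compare-and-swap.  On return,
   *observed holds the contents of *target before the operation: on
   failure the built-in writes the current value into `expected', and
   on success `expected' already equals the old value.  */
bool try_update(int *target, int expected, int desired, int *observed)
{
  bool succeeded
    = __atomic_compare_exchange_n(target, &expected, desired,
                                  false,             /* strong, not weak */
                                  __ATOMIC_ACQ_REL,  /* model on success */
                                  __ATOMIC_ACQUIRE); /* model on failure */

  *observed = expected;
  return succeeded;
}
```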

‘atomic_loadMODE’
     This pattern implements an atomic load operation with memory model
     semantics.  Operand 1 is the memory address being loaded from.
     Operand 0 is the result of the load.  Operand 2 is the memory model
     to be used for the load operation.

     If not present, the ‘__atomic_load’ built-in function will either
     resort to a normal load with memory barriers, or a compare-and-swap
     operation if a normal load would not be atomic.

‘atomic_storeMODE’
     This pattern implements an atomic store operation with memory model
     semantics.  Operand 0 is the memory address being stored to.
     Operand 1 is the value to be written.  Operand 2 is the memory
     model to be used for the operation.

     If not present, the ‘__atomic_store’ built-in function will attempt
     to perform a normal store and surround it with any required memory
     fences.  If the store would not be atomic, then an
     ‘__atomic_exchange’ is attempted with the result being ignored.
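
     At the source level these two patterns correspond to the
     ‘__atomic_load_n’ and ‘__atomic_store_n’ built-ins.  The wrapper
     names and the acquire/release memory models in this sketch are
     illustrative assumptions.

```c
/* Hypothetical wrappers around the atomic store and load built-ins.  */
void counter_store(long *slot, long value)
{
  __atomic_store_n(slot, value, __ATOMIC_RELEASE);
}

long counter_load(long *slot)
{
  return __atomic_load_n(slot, __ATOMIC_ACQUIRE);
}
```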

‘atomic_exchangeMODE’
     This pattern implements an atomic exchange operation with memory
     model semantics.  Operand 1 is the memory location the operation is
     performed on.  Operand 0 is an output operand which is set to the
     original value contained in the memory pointed to by operand 1.
     Operand 2 is the value to be stored.  Operand 3 is the memory model
     to be used.

     If this pattern is not present, the built-in function
     ‘__atomic_exchange’ will attempt to perform the operation with a
     compare and swap loop.

‘atomic_addMODE’, ‘atomic_subMODE’
‘atomic_orMODE’, ‘atomic_andMODE’
‘atomic_xorMODE’, ‘atomic_nandMODE’
     These patterns emit code for an atomic operation on memory with
     memory model semantics.  Operand 0 is the memory on which the
     atomic operation is performed.  Operand 1 is the second operand to
     the binary operator.  Operand 2 is the memory model to be used by
     the operation.

     If these patterns are not defined, attempts will be made to use
     legacy ‘sync’ patterns, or equivalent patterns which return a
     result.  If none of these are available a compare-and-swap loop
     will be used.

‘atomic_fetch_addMODE’, ‘atomic_fetch_subMODE’
‘atomic_fetch_orMODE’, ‘atomic_fetch_andMODE’
‘atomic_fetch_xorMODE’, ‘atomic_fetch_nandMODE’
     These patterns emit code for an atomic operation on memory with
     memory model semantics, and return the original value.  Operand 0
     is an output operand which contains the value of the memory
     location before the operation was performed.  Operand 1 is the
     memory on which the atomic operation is performed.  Operand 2 is
     the second operand to the binary operator.  Operand 3 is the memory
     model to be used by the operation.

     If these patterns are not defined, attempts will be made to use
     legacy ‘sync’ patterns.  If none of these are available a
     compare-and-swap loop will be used.

‘atomic_add_fetchMODE’, ‘atomic_sub_fetchMODE’
‘atomic_or_fetchMODE’, ‘atomic_and_fetchMODE’
‘atomic_xor_fetchMODE’, ‘atomic_nand_fetchMODE’
     These patterns emit code for an atomic operation on memory with
     memory model semantics and return the result after the operation is
     performed.  Operand 0 is an output operand which contains the value
     after the operation.  Operand 1 is the memory on which the atomic
     operation is performed.  Operand 2 is the second operand to the
     binary operator.  Operand 3 is the memory model to be used by the
     operation.

     If these patterns are not defined, attempts will be made to use
     legacy ‘sync’ patterns, or equivalent patterns which return the
     result before the operation followed by the arithmetic operation
     required to produce the result.  If none of these are available a
     compare-and-swap loop will be used.
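
     The second fallback, taking the value before the operation and
     then redoing the arithmetic, can be sketched in C as follows.  The
     function names are hypothetical; the ‘nand’ case assumes the
     ‘~(old & mask)’ semantics of the ‘__atomic_fetch_nand’ built-in.

```c
/* Value after an atomic add, derived from the value before it.  */
unsigned int add_fetch_from_fetch_add(unsigned int *target,
                                      unsigned int amount)
{
  return __atomic_fetch_add(target, amount, __ATOMIC_SEQ_CST) + amount;
}

/* Value after an atomic nand, derived from the value before it.  */
unsigned int nand_fetch_from_fetch_nand(unsigned int *target,
                                        unsigned int mask)
{
  unsigned int old = __atomic_fetch_nand(target, mask, __ATOMIC_SEQ_CST);

  return ~(old & mask);
}
```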

‘atomic_test_and_set’
     This pattern emits code for ‘__builtin_atomic_test_and_set’.
     Operand 0 is an output operand which is set to true if the previous
     contents of the byte were "set", and false otherwise.
     Operand 1 is the ‘QImode’ memory to be modified.  Operand 2 is the
     memory model to be used.

     The specific value that defines "set" is implementation defined,
     and is normally based on what is performed by the native atomic
     test and set instruction.
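
     In C, a byte-sized flag is normally driven with the
     ‘__atomic_test_and_set’ and ‘__atomic_clear’ built-ins, as in the
     sketch below.  The function names and memory models are
     assumptions; because the value meaning "set" is implementation
     defined, the flag is only ever cleared, never compared with a
     particular value.

```c
#include <stdbool.h>

/* Hypothetical example: returns true if the caller changed the flag
   from clear to set, false if it was already set.  */
bool flag_acquire(char *flag)
{
  return !__atomic_test_and_set(flag, __ATOMIC_ACQUIRE);
}

/* Puts the flag back into the clear state.  */
void flag_release(char *flag)
{
  __atomic_clear(flag, __ATOMIC_RELEASE);
}
```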

‘atomic_bit_test_and_setMODE’
‘atomic_bit_test_and_complementMODE’
‘atomic_bit_test_and_resetMODE’
     These patterns emit code for an atomic bitwise operation on memory
     with memory model semantics, and return the original value of the
     specified bit.  Operand 0 is an output operand which contains the
     value of the specified bit from the memory location before the
     operation was performed.  Operand 1 is the memory on which the
     atomic operation is performed.  Operand 2 is the bit within the
     operand, starting with the least significant bit.  Operand 3 is the
     memory model to be used by the operation.  Operand 4 is a flag: it
     is ‘const1_rtx’ if operand 0 should contain the original value of
     the specified bit in the least significant bit of the operand, and
     ‘const0_rtx’ if the bit should be in its original position in the
     operand.  ‘atomic_bit_test_and_setMODE’ atomically sets the
     specified bit after remembering its original value,
     ‘atomic_bit_test_and_complementMODE’ inverts the specified bit and
     ‘atomic_bit_test_and_resetMODE’ clears the specified bit.

     If these patterns are not defined, attempts will be made to use
     ‘atomic_fetch_orMODE’, ‘atomic_fetch_xorMODE’ or
     ‘atomic_fetch_andMODE’ instruction patterns, or their ‘sync’
     counterparts.  If none of these are available a compare-and-swap
     loop will be used.
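
     The source idioms that these patterns are meant to match combine
     an atomic fetch operation with a single-bit mask.  The C sketch
     below shows one such idiom per pattern; the function names are
     hypothetical, and each returns the original state of the bit.

```c
#include <stdbool.h>

/* Atomically sets the bit and reports whether it was set before.  */
bool test_and_set_bit(unsigned int *word, unsigned int bit)
{
  unsigned int mask = 1u << bit;

  return (__atomic_fetch_or(word, mask, __ATOMIC_SEQ_CST) & mask) != 0;
}

/* Atomically inverts the bit and reports whether it was set before.  */
bool test_and_complement_bit(unsigned int *word, unsigned int bit)
{
  unsigned int mask = 1u << bit;

  return (__atomic_fetch_xor(word, mask, __ATOMIC_SEQ_CST) & mask) != 0;
}

/* Atomically clears the bit and reports whether it was set before.  */
bool test_and_reset_bit(unsigned int *word, unsigned int bit)
{
  unsigned int mask = 1u << bit;

  return (__atomic_fetch_and(word, ~mask, __ATOMIC_SEQ_CST) & mask) != 0;
}
```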

‘atomic_add_fetch_cmp_0MODE’
‘atomic_sub_fetch_cmp_0MODE’
‘atomic_and_fetch_cmp_0MODE’
‘atomic_or_fetch_cmp_0MODE’
‘atomic_xor_fetch_cmp_0MODE’
     These patterns emit code for an atomic operation on memory with
     memory model semantics if the fetch result is used only in a
     comparison against zero.  Operand 0 is an output operand which
     contains a boolean result of comparison of the value after the
     operation against zero.  Operand 1 is the memory on which the
     atomic operation is performed.  Operand 2 is the second operand to
     the binary operator.  Operand 3 is the memory model to be used by
     the operation.  Operand 4 is an integer holding the comparison
     code, one of ‘EQ’, ‘NE’, ‘LT’, ‘GT’, ‘LE’ or ‘GE’.

     If these patterns are not defined, attempts will be made to use a
     separate atomic operation and fetch pattern followed by a
     comparison of the result against zero.
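
     A typical source construct whose result is used only in a
     comparison against zero is a reference-count decrement.  The
     helper name ‘release_reference’ in this C sketch is hypothetical.

```c
#include <stdbool.h>

/* Hypothetical example: drops one reference and reports whether the
   count reached zero.  The fetched value is used only in the
   comparison against zero.  */
bool release_reference(int *refcount)
{
  return __atomic_sub_fetch(refcount, 1, __ATOMIC_ACQ_REL) == 0;
}
```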

‘mem_thread_fence’
     This pattern emits code required to implement a thread fence with
     memory model semantics.  Operand 0 is the memory model to be used.

     For the ‘__ATOMIC_RELAXED’ model no instructions need to be issued
     and this expansion is not invoked.

     The compiler always emits a compiler memory barrier regardless of
     what expanding this pattern produced.

     If this pattern is not defined, the compiler falls back to
     expanding the ‘memory_barrier’ pattern, then to emitting a
     ‘__sync_synchronize’ library call, and finally to just placing a
     
     compiler memory barrier.
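
     In C, a thread fence is requested with ‘__atomic_thread_fence’.
     The sketch below pairs a release fence with an acquire fence
     around a relaxed flag; all variable and function names are
     hypothetical.

```c
static int fence_payload;
static int fence_flag;

/* Writer: the release fence orders the payload store before the
   relaxed store that raises the flag.  */
void fenced_publish(int value)
{
  fence_payload = value;
  __atomic_thread_fence(__ATOMIC_RELEASE);
  __atomic_store_n(&fence_flag, 1, __ATOMIC_RELAXED);
}

/* Reader: returns 1 and fills *out once the flag has been seen.  The
   acquire fence orders the flag load before the payload load.  */
int fenced_consume(int *out)
{
  if (!__atomic_load_n(&fence_flag, __ATOMIC_RELAXED))
    return 0;
  __atomic_thread_fence(__ATOMIC_ACQUIRE);
  *out = fence_payload;
  return 1;
}
```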

‘get_thread_pointerMODE’
‘set_thread_pointerMODE’
     These patterns emit code that reads/sets the TLS thread pointer.
     Currently, these are only needed if the target needs to support the
     ‘__builtin_thread_pointer’ and ‘__builtin_set_thread_pointer’
     builtins.

     The get/set patterns have a single output/input operand
     respectively, with MODE intended to be ‘Pmode’.

‘stack_protect_combined_set’
     This pattern, if defined, moves a ‘ptr_mode’ value from an address
     whose declaration RTX is given in operand 1 to the memory in
     operand 0 without leaving the value in a register afterward.  If
     several instructions are needed by the target to perform the
     operation (e.g., to load the address from a GOT entry, then load the
     ‘ptr_mode’ value and finally store it), it is the backend's
     responsibility to ensure no intermediate result gets spilled.  This
     is to avoid leaking the value some place that an attacker might use
     to rewrite the stack guard slot after having clobbered it.

     If this pattern is not defined, then the address declaration is
     expanded first in the standard way and a ‘stack_protect_set’
     pattern is then generated to move the value from that address to
     the address in operand 0.

‘stack_protect_set’
     This pattern, if defined, moves a ‘ptr_mode’ value from the valid
     memory location in operand 1 to the memory in operand 0 without
     leaving the value in a register afterward.  This is to avoid
     leaking the value some place that an attacker might use to rewrite
     the stack guard slot after having clobbered it.

     Note: on targets where the addressing modes do not allow loading
     directly from the stack guard address, the address is first
     expanded in the standard way, which could cause some spills.

     If this pattern is not defined, then a plain move pattern is
     generated.

‘stack_protect_combined_test’
     This pattern, if defined, compares a ‘ptr_mode’ value from an
     address whose declaration RTX is given in operand 1 with the memory
     in operand 0 without leaving the value in a register afterward and
     branches to operand 2 if the values were equal.  If several
     instructions are needed by the target to perform the operation
     (e.g., to load the address from a GOT entry, then load the
     ‘ptr_mode’ value and finally compare it), it is the backend's
     responsibility to ensure no intermediate result gets spilled.  This
     is to avoid leaking the value some place that an attacker might use
     to rewrite the stack guard slot after having clobbered it.

     If this pattern is not defined, then the address declaration is
     expanded first in the standard way and a ‘stack_protect_test’
     pattern is then generated to compare the value from that address to
     the value at the memory in operand 0.

‘stack_protect_test’
     This pattern, if defined, compares a ‘ptr_mode’ value from the
     valid memory location in operand 1 with the memory in operand 0
     without leaving the value in a register afterward and branches to
     operand 2 if the values were equal.

     If this pattern is not defined, then a plain compare pattern and
     conditional branch pattern is used.

‘clear_cache’
     This pattern, if defined, flushes the instruction cache for a
     region of memory.  The region is bounded by the ‘Pmode’ pointers
     in operand 0 (inclusive) and operand 1 (exclusive).

     If this pattern is not defined, a call to the library function
     ‘__clear_cache’ is used.
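
     From C, the flush is requested with ‘__builtin___clear_cache’,
     which takes the same inclusive start and exclusive end.  The
     helper below is a hypothetical sketch; a real user would copy
     instructions into executable memory, which is not shown here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical example: copies bytes into a buffer and then flushes
   the instruction cache for [destination, destination + length).  */
void copy_code_bytes(unsigned char *destination,
                     const unsigned char *source, size_t length)
{
  memcpy(destination, source, length);
  __builtin___clear_cache((char *) destination,
                          (char *) destination + length);
}
```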

‘spaceshipM3’
     Initialize output operand 0, which has an integer mode, to -1, 0, 1
     or 2 if operand 1, which has mode M, compares less than, equal to,
     greater than, or unordered with operand 2, respectively.  M should
     be a scalar floating point mode.

     This pattern is not allowed to ‘FAIL’.
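
     The result encoding can be written out as a plain C reference
     function.  This sketch assumes ‘double’ operands, and the name
     ‘spaceship_reference’ is hypothetical; it only restates the
     semantics and is not how a target would implement the pattern.

```c
/* Reference semantics: -1 for less than, 0 for equal, 1 for greater
   than, and 2 when the operands are unordered (at least one is a
   NaN).  */
int spaceship_reference(double left, double right)
{
  if (left < right)
    return -1;
  if (left == right)
    return 0;
  if (left > right)
    return 1;
  return 2;
}
```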