This is gccint.info, produced by makeinfo version 7.1 from gccint.texi. Copyright © 1988-2024 Free Software Foundation, Inc. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with the Invariant Sections being "Funding Free Software", the Front-Cover Texts being (a) (see below), and with the Back-Cover Texts being (b) (see below). A copy of the license is included in the section entitled "GNU Free Documentation License". (a) The FSF's Front-Cover Text is: A GNU Manual (b) The FSF's Back-Cover Text is: You have freedom to copy and modify this GNU Manual, like GNU software. Copies published by the Free Software Foundation raise funds for GNU development. INFO-DIR-SECTION Software development START-INFO-DIR-ENTRY * gccint: (gccint). Internals of the GNU Compiler Collection. END-INFO-DIR-ENTRY This file documents the internals of the GNU compilers.  File: gccint.info, Node: Calls, Next: RTL SSA, Prev: Insns, Up: RTL 14.20 RTL Representation of Function-Call Insns =============================================== Insns that call subroutines have the RTL expression code ‘call_insn’. 
These insns must satisfy special rules, and their bodies must use a special RTL expression code, ‘call’. A ‘call’ expression has two operands, as follows: (call (mem:FM ADDR) NBYTES) Here NBYTES is an operand that represents the number of bytes of argument data being passed to the subroutine, FM is a machine mode (which must equal the definition of the ‘FUNCTION_MODE’ macro in the machine description) and ADDR represents the address of the subroutine. For a subroutine that returns no value, the ‘call’ expression as shown above is the entire body of the insn, except that the insn might also contain ‘use’ or ‘clobber’ expressions. For a subroutine that returns a value whose mode is not ‘BLKmode’, the value is returned in a hard register. If this register's number is R, then the body of the call insn looks like this: (set (reg:M R) (call (mem:FM ADDR) NBYTES)) This RTL expression makes it clear (to the optimizer passes) that the appropriate register receives a useful value in this insn. When a subroutine returns a ‘BLKmode’ value, it is handled by passing to the subroutine the address of a place to store the value. So the call insn itself does not "return" any value, and it has the same RTL form as a call that returns nothing. On some machines, the call instruction itself clobbers some register, for example to contain the return address. ‘call_insn’ insns on these machines should have a body which is a ‘parallel’ that contains both the ‘call’ expression and ‘clobber’ expressions that indicate which registers are destroyed. Similarly, if the call instruction requires some register other than the stack pointer that is not explicitly mentioned in its RTL, a ‘use’ subexpression should mention that register. Functions that are called are assumed to modify all registers listed in the configuration macro ‘CALL_USED_REGISTERS’ (*note Register Basics::) and, with the exception of ‘const’ functions and library calls, to modify all of memory. 
Insns containing just ‘use’ expressions directly precede the ‘call_insn’ insn to indicate which registers contain inputs to the function. Similarly, if registers other than those in ‘CALL_USED_REGISTERS’ are clobbered by the called function, insns containing a single ‘clobber’ follow immediately after the call to indicate which registers.  File: gccint.info, Node: RTL SSA, Next: Sharing, Prev: Calls, Up: RTL 14.21 On-the-Side SSA Form for RTL ================================== The patterns of an individual RTL instruction describe which registers are inputs to that instruction and which registers are outputs from that instruction. However, it is often useful to know where the definition of a register input comes from and where the result of a register output is used. One way of obtaining this information is to use the RTL SSA form, which provides a Static Single Assignment representation of the RTL instructions. The RTL SSA code is located in the ‘rtl-ssa’ subdirectory of the GCC source tree. This section only gives a brief overview of it; please see the comments in the source code for more details. 
* Menu: * Using RTL SSA:: What a pass needs to do to use the RTL SSA form * RTL SSA Instructions:: How instructions are represented and organized * RTL SSA Basic Blocks:: How instructions are grouped into blocks * RTL SSA Resources:: How registers and memory are represented * RTL SSA Accesses:: How register and memory accesses are represented * RTL SSA Phi Nodes:: How multiple sources are combined into one * RTL SSA Access Lists:: How accesses are chained together * Changing RTL Instructions:: How to use the RTL SSA framework to change insns  File: gccint.info, Node: Using RTL SSA, Next: RTL SSA Instructions, Up: RTL SSA 14.21.1 Using RTL SSA in a pass ------------------------------- A pass that wants to use the RTL SSA form should start with the following:

     #define INCLUDE_ALGORITHM
     #define INCLUDE_FUNCTIONAL
     #include "config.h"
     #include "system.h"
     #include "coretypes.h"
     #include "backend.h"
     #include "rtl.h"
     #include "df.h"
     #include "rtl-ssa.h"

All the RTL SSA code is contained in the ‘rtl_ssa’ namespace, so most passes will then want to do: using namespace rtl_ssa; However, this is purely a matter of taste, and the examples in the rest of this section do not require it. The RTL SSA representation is an optional on-the-side feature that applies on top of the normal RTL instructions. It is currently local to individual RTL passes and is not maintained across passes. However, in order to allow the RTL SSA information to be preserved across passes in future, ‘crtl->ssa’ points to the current function's SSA form (if any). Passes that want to use the RTL SSA form should first do: crtl->ssa = new rtl_ssa::function_info (FN); where FN is the function that the pass is processing. (Passes that are ‘using namespace rtl_ssa’ do not need the ‘rtl_ssa::’.) 
Once the pass has finished with the SSA form, it should do the following: free_dominance_info (CDI_DOMINATORS); if (crtl->ssa->perform_pending_updates ()) cleanup_cfg (0); delete crtl->ssa; crtl->ssa = nullptr; The ‘free_dominance_info’ call is necessary because dominance information is not currently maintained between RTL passes. The next two lines commit any changes to the RTL instructions that were queued for later; see the comment above the declaration of ‘perform_pending_updates’ for details. The final two lines discard the RTL SSA form and free the associated memory.  File: gccint.info, Node: RTL SSA Instructions, Next: RTL SSA Basic Blocks, Prev: Using RTL SSA, Up: RTL SSA 14.21.2 RTL SSA Instructions ---------------------------- RTL SSA instructions are represented by an ‘rtl_ssa::insn_info’. These instructions are chained together in a single list that follows a reverse postorder (RPO) traversal of the function. This means that if any path through the function can execute an instruction I1 and then later execute an instruction I2 for the first time, I1 appears before I2 in the list(1). Two RTL SSA instructions can be compared to find which instruction occurs earlier than the other in the RPO. One way to do this is to use the C++ comparison operators, such as: *INSN1 < *INSN2 Another way is to use the ‘compare_with’ function: INSN1->compare_with (INSN2) This expression is greater than zero if INSN1 comes after INSN2 in the RPO, less than zero if INSN1 comes before INSN2 in the RPO, or zero if INSN1 and INSN2 are the same. This order is maintained even if instructions are added to the function or moved around. The main purpose of ‘rtl_ssa::insn_info’ is to hold SSA information about an instruction. However, it also caches certain properties of the instruction, such as whether it is an inline assembly instruction, whether it has volatile accesses, and so on. 
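The ordering contract described above can be illustrated with a small standalone model. This is only a sketch and not GCC code: ‘toy_insn’ and its ‘point’ field are hypothetical stand-ins, and the assumption that each instruction carries a number that increases along the RPO is made purely for illustration. Only the sign convention of ‘compare_with’ and the meaning of ‘<’ are taken from the text.

```cpp
#include <cassert>

// Hypothetical stand-in for rtl_ssa::insn_info.  Assumption: each
// instruction carries a number that increases along the reverse postorder.
struct toy_insn
{
  int point;

  // Sign convention from the text: negative if *this comes before OTHER
  // in the RPO, positive if it comes after, zero if they are the same.
  int compare_with (const toy_insn *other) const
  {
    return (point > other->point) - (point < other->point);
  }

  // True if *this occurs earlier than OTHER in the RPO.
  bool operator< (const toy_insn &other) const
  {
    return compare_with (&other) < 0;
  }
};
```

With two such objects ‘a’ and ‘b’, where ‘a’ has the smaller ‘point’, ‘a < b’ holds and ‘a.compare_with (&b)’ is negative.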
---------- Footnotes ---------- (1) Note that this order is different from the order of the underlying RTL instructions, which follow machine code order instead.  File: gccint.info, Node: RTL SSA Basic Blocks, Next: RTL SSA Resources, Prev: RTL SSA Instructions, Up: RTL SSA 14.21.3 RTL SSA Basic Blocks ---------------------------- RTL SSA instructions (*note RTL SSA Instructions::) are organized into basic blocks, with each block being represented by an ‘rtl_ssa::bb_info’. There is a one-to-one mapping between these ‘rtl_ssa::bb_info’ structures and the underlying CFG ‘basic_block’ structures (*note Basic Blocks::). If a CFG basic block BB contains an RTL instruction INSN, the RTL SSA representation of BB also contains an RTL SSA representation of INSN(1). Within RTL SSA, these instructions are referred to as "real" instructions. These real instructions fall into two groups: debug instructions and nondebug instructions. Only nondebug instructions should affect code generation decisions. In addition, each RTL SSA basic block has two "artificial" instructions: a "head" instruction that comes before all the real instructions and an "end" instruction that comes after all real instructions. These instructions exist to represent things that are conceptually defined or used at the start and end of a basic block. The instructions always exist, even if they do not currently do anything. Like instructions, these blocks are chained together in a reverse postorder. This list includes the entry block (which always comes first) and the exit block (which always comes last). RTL SSA basic blocks are chained together into "extended basic blocks" (EBBs), represented by an ‘rtl_ssa::ebb_info’. Extended basic blocks contain one or more basic blocks. They have the property that if a block BBY comes immediately after a block BBX in an EBB, then BBY can only be reached from BBX; in other words, BBX is the sole predecessor of BBY. 
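The sole-predecessor property can be made concrete with a toy partitioning routine. The sketch below is not how GCC chooses its EBBs; ‘toy_block’ and ‘group_into_ebbs’ are hypothetical names, and the greedy strategy is an assumption made for illustration. It only demonstrates the invariant stated above: a block joins the current chain only if the previous block in the chain is its one and only predecessor.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy CFG block: an index plus the indices of its predecessors.
struct toy_block
{
  int index;
  std::vector<int> preds;
};

// Greedily partition BLOCKS (assumed to be listed in the desired order)
// into chains in which every block after the first has the previous block
// in the chain as its sole predecessor.
std::vector<std::vector<int>>
group_into_ebbs (const std::vector<toy_block> &blocks)
{
  std::vector<std::vector<int>> ebbs;
  for (const toy_block &bb : blocks)
    {
      bool extends_current
        = (!ebbs.empty ()
           && bb.preds.size () == 1
           && bb.preds[0] == ebbs.back ().back ());
      // Otherwise the block starts a new chain.
      if (!extends_current)
        ebbs.emplace_back ();
      ebbs.back ().push_back (bb.index);
    }
  return ebbs;
}
```

For a diamond in which bb2 branches to bb3 and bb4, and both rejoin at bb5, this groups bb2 with bb3 and leaves bb4 and bb5 as the heads of their own chains, because bb4's predecessor is not the block that precedes it and bb5 has two predecessors.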
Each extended basic block starts with an artificial "phi node" instruction. This instruction defines all phi nodes for the EBB (*note RTL SSA Phi Nodes::). (Individual blocks in an EBB do not need phi nodes because their live values can only come from one source.) The contents of a function are therefore represented using a four-level hierarchy: • functions (‘rtl_ssa::function_info’), which contain ... • extended basic blocks (‘rtl_ssa::ebb_info’), which contain ... • basic blocks (‘rtl_ssa::bb_info’), which contain ... • instructions (‘rtl_ssa::insn_info’) In dumps, a basic block is identified as ‘bbN’, where N is the index of the associated CFG ‘basic_block’ structure. An EBB is in turn identified by the index of its first block. For example, an EBB that contains ‘bb10’, ‘bb5’, ‘bb6’ and ‘bb9’ is identified as EBB10. ---------- Footnotes ---------- (1) Note that this excludes non-instruction things like ‘note’s and ‘barrier’s that also appear in the chain of RTL instructions.  File: gccint.info, Node: RTL SSA Resources, Next: RTL SSA Accesses, Prev: RTL SSA Basic Blocks, Up: RTL SSA 14.21.4 RTL SSA Resources ------------------------- The RTL SSA form tracks two types of "resource": registers and memory. Each hard and pseudo register is a separate resource. Memory is a single unified resource, like it is in GIMPLE (*note GIMPLE::). Each resource has a unique identifier. The unique identifier for a register is simply its register number. The unique identifier for memory is a special register number called ‘MEM_REGNO’. Since resource numbers so closely match register numbers, it is sometimes convenient to refer to them simply as register numbers, or "regnos" for short. However, the RTL SSA form also provides an abstraction of resources in the form of ‘rtl_ssa::resource_info’. This is a lightweight class that records both the regno of a resource and the ‘machine_mode’ that the resource has (*note Machine Modes::). 
It has functions for testing whether a resource is a register or memory. In principle it could be extended to other kinds of resource in future.  File: gccint.info, Node: RTL SSA Accesses, Next: RTL SSA Phi Nodes, Prev: RTL SSA Resources, Up: RTL SSA 14.21.5 RTL SSA Register and Memory Accesses -------------------------------------------- In the RTL SSA form, most reads or writes of a resource are represented as a ‘rtl_ssa::access_info’(1). These ‘rtl_ssa::access_info’s are organized into the following class hierarchy:

     rtl_ssa::access_info
       |
       +-- rtl_ssa::use_info
       |
       +-- rtl_ssa::def_info
             |
             +-- rtl_ssa::clobber_info
             |
             +-- rtl_ssa::set_info
                   |
                   +-- rtl_ssa::phi_info

A ‘rtl_ssa::use_info’ represents a read or use of a resource and a ‘rtl_ssa::def_info’ represents a write or definition of a resource. As in the main RTL representation, there are two basic types of definition: clobbers and sets. The difference is that a clobber leaves the register with an unspecified value that cannot be used or relied on by later instructions, while a set leaves the register with a known value that later instructions could use if they wanted to. A ‘rtl_ssa::clobber_info’ represents a clobber and a ‘rtl_ssa::set_info’ represents a set. Each ‘rtl_ssa::use_info’ records which single ‘rtl_ssa::set_info’ provides the value of the resource; this is null if the resource is completely undefined at the point of use. Each ‘rtl_ssa::set_info’ in turn records all the ‘rtl_ssa::use_info’s that use its value. If a value of a resource can come from multiple sources, a ‘rtl_ssa::phi_info’ brings those multiple sources together into a single definition (*note RTL SSA Phi Nodes::). ---------- Footnotes ---------- (1) The exceptions are call clobbers, which are generally represented separately. See the comment above ‘rtl_ssa::insn_info’ for details.  
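The two-way link between uses and sets described in this node can be pictured with a minimal standalone model. The names ‘toy_set_info’, ‘toy_use_info’ and ‘toy_link’ are hypothetical and do not correspond to the real member functions of the RTL SSA classes; the sketch only shows the shape of the relationship, under the assumption that a plain pointer and a vector are enough to model it.

```cpp
#include <cassert>
#include <vector>

struct toy_use_info;

// Hypothetical analogue of rtl_ssa::set_info: remembers every use of
// the value that it provides.
struct toy_set_info
{
  std::vector<toy_use_info *> uses;
};

// Hypothetical analogue of rtl_ssa::use_info: remembers the single set
// that provides its value, or null if the resource is undefined here.
struct toy_use_info
{
  toy_set_info *def = nullptr;
};

// Record that USE reads the value provided by SET, keeping both
// directions of the link consistent.
void
toy_link (toy_use_info &use, toy_set_info &set)
{
  assert (use.def == nullptr);
  use.def = &set;
  set.uses.push_back (&use);
}
```

A use that is never linked keeps a null ‘def’, which corresponds to the "completely undefined" case mentioned above.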
File: gccint.info, Node: RTL SSA Phi Nodes, Next: RTL SSA Access Lists, Prev: RTL SSA Accesses, Up: RTL SSA 14.21.6 RTL SSA Phi Nodes ------------------------- If a resource is live on entry to an extended basic block and if the resource's value can come from multiple sources, the extended basic block has a "phi node" that collects together these multiple sources. The phi node conceptually has one input for each incoming edge of the extended basic block, with the input specifying the value of the resource on that edge. For example, suppose a function contains the following RTL:

     ;; Basic block bb3
     ...
     (set (reg:SI R1) (const_int 0))  ;; A
     (set (pc) (label_ref bb5))

     ;; Basic block bb4
     ...
     (set (reg:SI R1) (const_int 1))  ;; B
     ;; Fall through

     ;; Basic block bb5
     ;;   preds: bb3, bb4
     ;;   live in: R1 ...
     (code_label bb5)
     ...
     (set (reg:SI R2)
          (plus:SI (reg:SI R1) ...))  ;; C

The value of R1 on entry to block 5 can come from either A or B. The extended basic block that contains block 5 would therefore have a phi node with two inputs: the first input would have the value of R1 defined by A and the second input would have the value of R1 defined by B. This phi node would then provide the value of R1 for C (assuming that R1 does not change again between the start of block 5 and C). Since RTL is not a "native" SSA representation, these phi nodes simply collect together definitions that already exist. Each input to a phi node for a resource R is itself a definition of resource R (or is null if the resource is completely undefined for a particular incoming edge). This is in contrast to a native SSA representation like GIMPLE, where the phi inputs can be arbitrary expressions. As a result, RTL SSA phi nodes never involve "hidden" moves: all moves are instead explicit. Phi nodes are represented as a ‘rtl_ssa::phi_info’. Each input to a phi node is represented as an ‘rtl_ssa::use_info’.  
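The bb3/bb4/bb5 example in this node can be modelled with a few lines of standalone C++. This is a sketch under stated assumptions, not the RTL SSA API: ‘toy_def’, ‘toy_phi’ and ‘make_phi’ are hypothetical names, and a real phi input is an ‘rtl_ssa::use_info’ rather than a bare pointer. The model only captures the rule that each input is an existing definition of the same resource, or null.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of a definition: the resource it defines and a label.
struct toy_def
{
  int regno;
  std::string label;
};

// Hypothetical model of a phi node: one input per incoming edge, each
// input being an existing definition of the same resource, or null if
// the resource is undefined on that edge.
struct toy_phi
{
  int regno;
  std::vector<const toy_def *> inputs;
};

// Build a phi for REGNO from the definitions that reach each incoming edge.
toy_phi
make_phi (int regno, const std::vector<const toy_def *> &reaching)
{
  toy_phi phi{regno, {}};
  for (const toy_def *def : reaching)
    {
      // Inputs only collect definitions of the same resource.
      assert (!def || def->regno == regno);
      phi.inputs.push_back (def);
    }
  return phi;
}
```

For the example above, the phi for R1 at the start of the EBB containing bb5 would be built from the definitions A and B, in the order of the incoming edges.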
File: gccint.info, Node: RTL SSA Access Lists, Next: Changing RTL Instructions, Prev: RTL SSA Phi Nodes, Up: RTL SSA 14.21.7 RTL SSA Access Lists ---------------------------- All the definitions of a resource are chained together in reverse postorder. In general, this list can contain an arbitrary mix of both sets (‘rtl_ssa::set_info’) and clobbers (‘rtl_ssa::clobber_info’). However, it is often useful to skip over all intervening clobbers of a resource in order to find the next set. The list is constructed in such a way that this can be done in amortized constant time. All uses (‘rtl_ssa::use_info’) of a given set are also chained together into a list. This list of uses is divided into three parts: 1. uses by "real" nondebug instructions (*note real RTL SSA insns::) 2. uses by real debug instructions 3. uses by phi nodes (*note RTL SSA Phi Nodes::) The first and second parts individually follow reverse postorder. The third part has no particular order. The last use by a real nondebug instruction always comes earlier in the reverse postorder than the next definition of the resource (if any). This means that the accesses follow a linear sequence of the form: • first definition of resource R • first use by a real nondebug instruction of the first definition of resource R • ... • last use by a real nondebug instruction of the first definition of resource R • second definition of resource R • first use by a real nondebug instruction of the second definition of resource R • ... • last use by a real nondebug instruction of the second definition of resource R • ... • last definition of resource R • first use by a real nondebug instruction of the last definition of resource R • ... • last use by a real nondebug instruction of the last definition of resource R (Note that clobbers never have uses; only sets do.) This linear view is easy to achieve when there is only a single definition of a resource, which is commonly true for pseudo registers. 
However, things are more complex if code has a structure like the following:

     // ebb2, bb2
     R = VA;                    // A
     if (...)
       {
         // ebb2, bb3
         use1 (R);              // B
         ...
         R = VC;                // C
       }
     else
       {
         // ebb4, bb4
         use2 (R);              // D
       }

The list of accesses would begin as follows: • definition of R by A • use of A's definition of R by B • definition of R by C The next access to R is in D, but the value of R that D uses comes from A rather than C. This is resolved by adding a phi node for ‘ebb4’. All inputs to this phi node have the same value, which in the example above is A's definition of R. In other circumstances, it would not be necessary to create a phi node when all inputs are equal, so these phi nodes are referred to as "degenerate" phi nodes. The full list of accesses to R is therefore: • definition of R by A • use of A's definition of R by B • definition of R by C • definition of R by ebb4's phi instruction, with the input coming from A • use of ebb4's phi definition of R by D Note that A's definition is also used by ebb4's phi node, but this use belongs to the third part of the use list described above and so does not form part of the linear sequence. It is possible to "look through" any degenerate phi to the ultimate definition using the function ‘look_through_degenerate_phi’. Note that the input to a degenerate phi is never itself provided by a degenerate phi. At present, the SSA form takes this principle one step further and guarantees that, for any given resource RES, one of the following is true: • The resource has a single definition DEF, which is not a phi node. Excluding uses of undefined registers, all uses of RES by real nondebug instructions use the value provided by DEF. • Excluding uses of undefined registers, all uses of RES use values provided by definitions that occur earlier in the same extended basic block. These definitions might come from phi nodes or from real instructions.  
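The idea of looking through a degenerate phi can be sketched in isolation. The code below is a toy model, not the real ‘look_through_degenerate_phi’: ‘toy_def_node’ and ‘toy_look_through’ are hypothetical names, and representing a definition as a bare list of input pointers is an assumption made for illustration. Because the input to a degenerate phi is never itself provided by a degenerate phi, a single step is enough.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of a definition.  A phi has one or more inputs;
// any other definition has none.
struct toy_def_node
{
  std::vector<const toy_def_node *> inputs;

  bool is_phi () const { return !inputs.empty (); }

  // A phi is degenerate if all of its inputs are the same definition.
  bool is_degenerate_phi () const
  {
    if (!is_phi ())
      return false;
    for (const toy_def_node *input : inputs)
      if (input != inputs[0])
        return false;
    return true;
  }
};

// Toy analogue of look_through_degenerate_phi: return the definition that
// ultimately provides DEF's value.  One step suffices because the input
// to a degenerate phi is never itself a degenerate phi.
const toy_def_node *
toy_look_through (const toy_def_node *def)
{
  if (def->is_degenerate_phi ())
    return def->inputs[0];
  return def;
}
```

In the example above, applying this to ebb4's phi definition of R would yield A's definition, while a phi that merges A and C would be returned unchanged.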
File: gccint.info, Node: Changing RTL Instructions, Prev: RTL SSA Access Lists, Up: RTL SSA 14.21.8 Using the RTL SSA framework to change instructions ---------------------------------------------------------- There are various routines that help to change a single RTL instruction or a group of RTL instructions while keeping the RTL SSA form up-to-date. This section first describes the process for changing a single instruction, then goes on to describe the differences when changing multiple instructions. * Menu: * Changing One RTL SSA Instruction:: * Changing Multiple RTL SSA Instructions::  File: gccint.info, Node: Changing One RTL SSA Instruction, Next: Changing Multiple RTL SSA Instructions, Up: Changing RTL Instructions 14.21.8.1 Changing One RTL SSA Instruction .......................................... Before making a change, passes should first use a statement like the following: auto attempt = crtl->ssa->new_change_attempt (); Here, ‘attempt’ is an RAII object that should remain in scope for the entire change attempt. It automatically frees temporary memory related to the changes when it goes out of scope. Next, the pass should create an ‘rtl_ssa::insn_change’ object for the instruction that it wants to change. This object specifies several things: • what the instruction's new list of uses should be (‘new_uses’). By default this is the same as the instruction's current list of uses. • what the instruction's new list of definitions should be (‘new_defs’). By default this is the same as the instruction's current list of definitions. • where the instruction should be located (‘move_range’). This is a range of instructions after which the instruction could be placed, represented as an ‘rtl_ssa::insn_range’. By default the instruction must remain at its current position. 
If a pass were attempting to change all these properties of an instruction ‘insn’, it might do something like this: rtl_ssa::insn_change change (insn); change.new_defs = ...; change.new_uses = ...; change.move_range = ...; This ‘rtl_ssa::insn_change’ only describes something that the pass _might_ do; at this stage, nothing has actually changed. As noted above, the default ‘move_range’ requires the instruction to remain where it is. At the other extreme, it is possible to allow the instruction to move anywhere within its extended basic block, provided that all the new uses and definitions can be performed at the new location. The way to do this is: change.move_range = insn->ebb ()->insn_range (); In either case, the next step is to make sure that the move range is consistent with the new uses and definitions. The way to do this is: if (!rtl_ssa::restrict_movement (change)) return false; This function tries to limit ‘move_range’ to a range of instructions at which ‘new_uses’ and ‘new_defs’ can be correctly performed. It returns true on success or false if no suitable location exists. The pass should also tentatively change the pattern of the instruction to whatever form the pass wants the instruction to have. This should use the facilities provided by ‘recog.cc’. For example: rtl_insn *rtl = insn->rtl (); insn_change_watermark watermark; validate_change (rtl, &PATTERN (rtl), new_pat, 1); will tentatively replace ‘insn’'s pattern with ‘new_pat’. These changes and the construction of the ‘rtl_ssa::insn_change’ can happen in either order or be interleaved. After the tentative changes to the instruction are complete, the pass should check whether the new pattern matches a target instruction or satisfies the requirements of an inline asm: if (!rtl_ssa::recog (attempt, change)) return false; This step might change the instruction pattern further in order to make it match. It might also add new definitions or restrict the range of the move. 
For example, if the new pattern did not match in its original form, but could be made to match by adding a clobber of the flags register, ‘rtl_ssa::recog’ will check whether the flags register is free at an appropriate point. If so, it will add a clobber of the flags register to ‘new_defs’ and restrict ‘move_range’ to the locations at which the flags register can be safely clobbered. Even if the proposed new instruction is valid according to ‘rtl_ssa::recog’, the change might not be worthwhile. For example, when optimizing for speed, the new instruction might turn out to be slower than the original one. When optimizing for size, the new instruction might turn out to be bigger than the original one. Passes should check for this case using ‘change_is_worthwhile’. For example: if (!rtl_ssa::change_is_worthwhile (change)) return false; If the change passes this test too then the pass can perform the change using: confirm_change_group (); crtl->ssa->change_insn (change); Putting all this together, the change has the following form: auto attempt = crtl->ssa->new_change_attempt (); rtl_ssa::insn_change change (insn); change.new_defs = ...; change.new_uses = ...; change.move_range = ...; if (!rtl_ssa::restrict_movement (change)) return false; insn_change_watermark watermark; // Use validate_change etc. to change INSN's pattern. ... if (!rtl_ssa::recog (attempt, change) || !rtl_ssa::change_is_worthwhile (change)) return false; confirm_change_group (); crtl->ssa->change_insn (change);  File: gccint.info, Node: Changing Multiple RTL SSA Instructions, Prev: Changing One RTL SSA Instruction, Up: Changing RTL Instructions 14.21.8.2 Changing Multiple RTL SSA Instructions ................................................ The process for changing multiple instructions is similar to the process for changing single instructions (*note Changing One RTL SSA Instruction::). 
The pass should again start the change attempt with: auto attempt = crtl->ssa->new_change_attempt (); and keep ‘attempt’ in scope for the duration of the change attempt. It should then construct an ‘rtl_ssa::insn_change’ for each change that it wants to make. After this, it should combine the changes into a sequence of ‘rtl_ssa::insn_change’ pointers. This sequence must be in reverse postorder; the instructions will remain strictly in the order that the sequence specifies. For example, if a pass is changing exactly two instructions, it might do: rtl_ssa::insn_change *changes[] = { &change1, &change2 }; where ‘change1’'s instruction must come before ‘change2’'s. Alternatively, if the pass is changing a variable number of instructions, it might build up the sequence in a ‘vec’. By default, ‘rtl_ssa::restrict_movement’ assumes that all instructions other than the one passed to it will remain in their current positions and will retain their current uses and definitions. When changing multiple instructions, it is usually more effective to ignore the other instructions that are changing. The sequencing described above ensures that the changing instructions remain in the correct order with respect to each other. The way to do this is: if (!rtl_ssa::restrict_movement_ignoring (change, insn_is_changing (changes))) return false; Similarly, when ‘rtl_ssa::recog’ is detecting whether a register can be clobbered, it by default assumes that all other instructions will remain in their current positions and retain their current form. It is again more effective to ignore changing instructions (which might, for example, no longer need to clobber the flags register). The way to do this is: if (!rtl_ssa::recog_ignoring (attempt, change, insn_is_changing (changes))) return false; When changing multiple instructions, the important question is usually not whether each individual change is worthwhile, but whether the changes as a whole are worthwhile. 
The way to test this is: if (!rtl_ssa::changes_are_worthwhile (changes)) return false; The process for changing single instructions makes sure that one ‘rtl_ssa::insn_change’ in isolation is valid. But when changing multiple instructions, it is also necessary to test whether the sequence as a whole is valid. For example, it might be impossible to satisfy all of the ‘move_range’s at once. Therefore, once the pass has a sequence of changes that are individually correct, it should use: if (!crtl->ssa->verify_insn_changes (changes)) return false; to check whether the sequence as a whole is valid. If all checks pass, the final step is: confirm_change_group (); crtl->ssa->change_insns (changes); Putting all this together, the process for a two-instruction change is: auto attempt = crtl->ssa->new_change_attempt (); rtl_ssa::insn_change change1 (insn1); change1.new_defs = ...; change1.new_uses = ...; change1.move_range = ...; rtl_ssa::insn_change change2 (insn2); change2.new_defs = ...; change2.new_uses = ...; change2.move_range = ...; rtl_ssa::insn_change *changes[] = { &change1, &change2 }; auto is_changing = insn_is_changing (changes); if (!rtl_ssa::restrict_movement_ignoring (change1, is_changing) || !rtl_ssa::restrict_movement_ignoring (change2, is_changing)) return false; insn_change_watermark watermark; // Use validate_change etc. to change INSN1's and INSN2's patterns. ... if (!rtl_ssa::recog_ignoring (attempt, change1, is_changing) || !rtl_ssa::recog_ignoring (attempt, change2, is_changing) || !rtl_ssa::changes_are_worthwhile (changes) || !crtl->ssa->verify_insn_changes (changes)) return false; confirm_change_group (); crtl->ssa->change_insns (changes);  File: gccint.info, Node: Sharing, Next: Reading RTL, Prev: RTL SSA, Up: RTL 14.22 Structure Sharing Assumptions =================================== The compiler assumes that certain kinds of RTL expressions are unique; there do not exist two distinct objects representing the same value. 
In other cases, it makes an opposite assumption: that no RTL expression object of a certain kind appears in more than one place in the containing structure. These assumptions refer to a single function; except for the RTL objects that describe global variables and external functions, and a few standard objects such as small integer constants, no RTL objects are common to two functions. • Each pseudo-register has only a single ‘reg’ object to represent it, and therefore only a single machine mode. • For any symbolic label, there is only one ‘symbol_ref’ object referring to it. • All ‘const_int’ expressions with equal values are shared. • All ‘const_poly_int’ expressions with equal modes and values are shared. • There is only one ‘pc’ expression. • There is only one ‘const_double’ expression with value 0 for each floating point mode. Likewise for values 1 and 2. • There is only one ‘const_vector’ expression with value 0 for each vector mode, be it an integer or a double constant vector. • No ‘label_ref’ or ‘scratch’ appears in more than one place in the RTL structure; in other words, it is safe to do a tree-walk of all the insns in the function and assume that each time a ‘label_ref’ or ‘scratch’ is seen it is distinct from all others that are seen. • Only one ‘mem’ object is normally created for each static variable or stack slot, so these objects are frequently shared in all the places they appear. However, separate but equal objects for these variables are occasionally made. • When a single ‘asm’ statement has multiple output operands, a distinct ‘asm_operands’ expression is made for each output operand. However, these all share the vector which contains the sequence of input operands. This sharing is used later on to test whether two ‘asm_operands’ expressions come from the same statement, so all optimizations must carefully preserve the sharing if they copy the vector at all. 
• No RTL object appears in more than one place in the RTL structure except as described above. Many passes of the compiler rely on this by assuming that they can modify RTL objects in place without unwanted side-effects on other insns. • During initial RTL generation, shared structure is freely introduced. After all the RTL for a function has been generated, all shared structure is copied by ‘unshare_all_rtl’ in ‘emit-rtl.cc’, after which the above rules are guaranteed to be followed. • During the combiner pass, shared structure within an insn can exist temporarily. However, the shared structure is copied before the combiner is finished with the insn. This is done by calling ‘copy_rtx_if_shared’, which is a subroutine of ‘unshare_all_rtl’.  File: gccint.info, Node: Reading RTL, Prev: Sharing, Up: RTL 14.23 Reading RTL ================= To read an RTL object from a file, call ‘read_rtx’. It takes one argument, a stdio stream, and returns a single RTL object. This routine is defined in ‘read-rtl.cc’. It is not available in the compiler itself, only the various programs that generate the compiler back end from the machine description. People frequently have the idea of using RTL stored as text in a file as an interface between a language front end and the bulk of GCC. This idea is not feasible. GCC was designed to use RTL internally only. Correct RTL for a given program is very dependent on the particular target machine. And the RTL does not contain all the information about the program. The proper way to interface GCC to a new language front end is with the "tree" data structure, described in the files ‘tree.h’ and ‘tree.def’. The documentation for this structure (*note GENERIC::) is incomplete.  
File: gccint.info, Node: Control Flow, Next: Loop Analysis and Representation, Prev: RTL, Up: Top

15 Control Flow Graph
*********************

A control flow graph (CFG) is a data structure built on top of the intermediate code representation (the RTL or ‘GIMPLE’ instruction stream) abstracting the control flow behavior of a function that is being compiled. The CFG is a directed graph where the vertices represent basic blocks and edges represent possible transfer of control flow from one basic block to another. The data structures used to represent the control flow graph are defined in ‘basic-block.h’.

In GCC, the representation of control flow is maintained throughout the compilation process, from constructing the CFG early in ‘pass_build_cfg’ to ‘pass_free_cfg’ (see ‘passes.def’). The CFG takes various forms and may undergo extensive manipulations, but the graph is always valid between its construction and its release. This way, information such as data flow, a measured profile, or the loop tree can be propagated through the pass pipeline, and even from ‘GIMPLE’ to ‘RTL’.

Often the CFG is better viewed as an integral part of the instruction chain than as a structure built on top of it. Updating the compiler's intermediate representation for instructions cannot easily be done without maintaining the CFG at the same time.

* Menu:

* Basic Blocks::           The definition and representation of basic blocks.
* Edges::                  Types of edges and their representation.
* Profile information::    Representation of frequencies and probabilities.
* Maintaining the CFG::    Keeping the control flow graph up to date.
* Liveness information::   Using and maintaining liveness information.

File: gccint.info, Node: Basic Blocks, Next: Edges, Up: Control Flow

15.1 Basic Blocks
=================

A basic block is a straight-line sequence of code with only one entry point and only one exit. In GCC, basic blocks are represented using the ‘basic_block’ data type.
Special basic blocks represent possible entry and exit points of a function. These blocks are called ‘ENTRY_BLOCK_PTR’ and ‘EXIT_BLOCK_PTR’. These blocks do not contain any code. The ‘BASIC_BLOCK’ array contains all basic blocks in an unspecified order. Each ‘basic_block’ structure has a field that holds a unique integer identifier ‘index’ that is the index of the block in the ‘BASIC_BLOCK’ array. The total number of basic blocks in the function is ‘n_basic_blocks’. Both the basic block indices and the total number of basic blocks may vary during the compilation process, as passes reorder, create, duplicate, and destroy basic blocks. The index for any block should never be greater than ‘last_basic_block’. The indices 0 and 1 are special codes reserved for ‘ENTRY_BLOCK’ and ‘EXIT_BLOCK’, the indices of ‘ENTRY_BLOCK_PTR’ and ‘EXIT_BLOCK_PTR’. Two pointer members of the ‘basic_block’ structure are the pointers ‘next_bb’ and ‘prev_bb’. These are used to keep doubly linked chain of basic blocks in the same order as the underlying instruction stream. The chain of basic blocks is updated transparently by the provided API for manipulating the CFG. The macro ‘FOR_EACH_BB’ can be used to visit all the basic blocks in lexicographical order, except ‘ENTRY_BLOCK’ and ‘EXIT_BLOCK’. The macro ‘FOR_ALL_BB’ also visits all basic blocks in lexicographical order, including ‘ENTRY_BLOCK’ and ‘EXIT_BLOCK’. The functions ‘post_order_compute’ and ‘inverted_post_order_compute’ can be used to compute topological orders of the CFG. The orders are stored as vectors of basic block indices. The ‘BASIC_BLOCK’ array can be used to iterate each basic block by index. Dominator traversals are also possible using ‘walk_dominator_tree’. Given two basic blocks A and B, block A dominates block B if A is _always_ executed before B. 
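As an illustration of the dominance relation defined above, the following self-contained sketch computes dominator sets for a toy CFG by iterating to a fixed point. It is not GCC code: the ‘PredList’ type and the ‘compute_dominators’ function are hypothetical names invented for this example, block 0 is simply assumed to be the entry, and every block is assumed to be reachable from it. Inside GCC, dominance queries go through the compiler's own dominator infrastructure rather than anything like this.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy CFG: preds[b] lists the indices of the predecessors of b.
using PredList = std::vector<std::vector<int>>;

// Returns dom, where dom[b][a] is true when block a dominates block b.
// Assumes every block is reachable from the entry block.
std::vector<std::vector<bool>> compute_dominators(const PredList &preds,
                                                  int entry)
{
  const std::size_t n = preds.size();
  assert(entry >= 0 && static_cast<std::size_t>(entry) < n);

  // Start from "every block dominates every block", except that the
  // entry block is dominated only by itself.
  std::vector<std::vector<bool>> dom(n, std::vector<bool>(n, true));
  dom[entry].assign(n, false);
  dom[entry][entry] = true;

  bool changed = true;
  while (changed) {
    changed = false;
    for (std::size_t b = 0; b < n; ++b) {
      if (static_cast<int>(b) == entry)
        continue;
      // A block is dominated by whatever dominates all of its predecessors...
      std::vector<bool> next(n, true);
      for (int p : preds[b])
        for (std::size_t a = 0; a < n; ++a)
          next[a] = next[a] && dom[p][a];
      // ...and by itself.
      next[b] = true;
      if (next != dom[b]) {
        dom[b] = next;
        changed = true;
      }
    }
  }
  return dom;
}
```

For a diamond-shaped CFG in which block 0 branches to blocks 1 and 2, and both of those continue to block 3, the result says that block 0 dominates block 3, while neither block 1 nor block 2 does, because block 3 can be reached without passing through either of them.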
Each ‘basic_block’ also contains pointers to the first instruction (the “head”) and the last instruction (the “tail”) or “end” of the instruction stream contained in a basic block. In fact, since the ‘basic_block’ data type is used to represent blocks in both major intermediate representations of GCC (‘GIMPLE’ and RTL), there are pointers to the head and end of a basic block for both representations, stored in intermediate representation specific data in the ‘il’ field of ‘struct basic_block_def’. For RTL, these pointers are ‘BB_HEAD’ and ‘BB_END’. In the RTL representation of a function, the instruction stream contains not only the "real" instructions, but also “notes” or “insn notes” (to distinguish them from “reg notes”). Any function that moves or duplicates the basic blocks needs to take care of updating of these notes. Many of these notes expect that the instruction stream consists of linear regions, so updating can sometimes be tedious. All types of insn notes are defined in ‘insn-notes.def’. In the RTL function representation, the instructions contained in a basic block always follow a ‘NOTE_INSN_BASIC_BLOCK’, but zero or more ‘CODE_LABEL’ nodes can precede the block note. A basic block ends with a control flow instruction or with the last instruction before the next ‘CODE_LABEL’ or ‘NOTE_INSN_BASIC_BLOCK’. By definition, a ‘CODE_LABEL’ cannot appear in the middle of the instruction stream of a basic block. In addition to notes, the jump table vectors are also represented as "pseudo-instructions" inside the insn stream. These vectors never appear in the basic block and should always be placed just after the table jump instructions referencing them. After removing the table-jump it is often difficult to eliminate the code computing the address and referencing the vector, so cleaning up these vectors is postponed until after liveness analysis. Thus the jump table vectors may appear in the insn stream unreferenced and without any purpose. 
Before any edge is made “fall-thru”, the ‘can_fallthru’ function must be called to check that no such construct is in the way.

For the ‘GIMPLE’ representation, the PHI nodes and statements contained in a basic block are in a ‘gimple_seq’ pointed to by the basic block's intermediate-language-specific pointers. Abstract containers and iterators are used to access the PHI nodes and statements in a basic block. These iterators are called “GIMPLE statement iterators” (GSIs). Grep for ‘^gsi’ in the various ‘gimple-*’ and ‘tree-*’ files. There is a ‘gimple_stmt_iterator’ type for iterating over all kinds of statement, and a ‘gphi_iterator’ subclass for iterating over PHI nodes. The following snippet will pretty-print all the PHI nodes and statements of the current function in the GIMPLE representation.

     basic_block bb;

     FOR_EACH_BB (bb)
       {
         gphi_iterator pi;
         gimple_stmt_iterator si;

         /* First the PHI nodes of the block...  */
         for (pi = gsi_start_phis (bb); !gsi_end_p (pi); gsi_next (&pi))
           {
             gphi *phi = pi.phi ();
             print_gimple_stmt (dump_file, phi, 0, TDF_SLIM);
           }
         /* ...then its ordinary statements.  */
         for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si))
           {
             gimple *stmt = gsi_stmt (si);
             print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
           }
       }

File: gccint.info, Node: Edges, Next: Profile information, Prev: Basic Blocks, Up: Control Flow

15.2 Edges
==========

Edges represent possible control flow transfers from the end of some basic block A to the head of another basic block B. We say that A is a predecessor of B, and B is a successor of A. Edges are represented in GCC with the ‘edge’ data type. Each ‘edge’ acts as a link between two basic blocks: the ‘src’ member of an edge points to the predecessor basic block, and the ‘dest’ member points to the successor basic block. The members ‘preds’ and ‘succs’ of the ‘basic_block’ data type point to type-safe vectors of edges to the predecessors and successors of the block.

When walking the edges in an edge vector, “edge iterators” should be used.
Edge iterators are constructed using the ‘edge_iterator’ data structure and several methods are available to operate on them:

‘ei_start’
     This function initializes an ‘edge_iterator’ that points to the first edge in a vector of edges.

‘ei_last’
     This function initializes an ‘edge_iterator’ that points to the last edge in a vector of edges.

‘ei_end_p’
     This predicate is ‘true’ if an ‘edge_iterator’ has reached the end of the sequence, i.e. it is positioned past the last edge in an edge vector.

‘ei_one_before_end_p’
     This predicate is ‘true’ if an ‘edge_iterator’ is one position before the end of the sequence, i.e. it points to the last edge in an edge vector.

‘ei_next’
     This function takes a pointer to an ‘edge_iterator’ and makes it point to the next edge in the sequence.

‘ei_prev’
     This function takes a pointer to an ‘edge_iterator’ and makes it point to the previous edge in the sequence.

‘ei_edge’
     This function returns the ‘edge’ currently pointed to by an ‘edge_iterator’.

‘ei_safe_edge’
     This function returns the ‘edge’ currently pointed to by an ‘edge_iterator’, but returns ‘NULL’ if the iterator is pointing at the end of the sequence. This function has been provided for existing code that makes the assumption that a ‘NULL’ edge indicates the end of the sequence.

The convenience macro ‘FOR_EACH_EDGE’ can be used to visit all of the edges in a sequence of predecessor or successor edges. It must not be used when an element might be removed during the traversal, otherwise elements will be missed. Here is an example of how to use the macro:

     edge e;
     edge_iterator ei;

     /* Stop at the fall-thru successor of BB, if there is one.  */
     FOR_EACH_EDGE (e, ei, bb->succs)
       {
         if (e->flags & EDGE_FALLTHRU)
           break;
       }

There are various reasons why control flow may transfer from one block to another. One possibility is that some instruction, for example a ‘CODE_LABEL’, in a linearized instruction stream always starts a new basic block. In this case a “fall-thru” edge links the basic block to the first following basic block. But there are several other reasons why edges may be created.
The ‘flags’ field of the ‘edge’ data type is used to store information about the type of edge we are dealing with. Each edge is of one of the following types:

_jump_
     No type flags are set for edges corresponding to jump instructions. These edges are used for unconditional or conditional jumps and in RTL also for table jumps. They are the easiest to manipulate as they may be freely redirected when the flow graph is not in SSA form.

_fall-thru_
     Fall-thru edges are present when the basic block may continue execution into the following one without branching. These edges have the ‘EDGE_FALLTHRU’ flag set. Unlike other types of edges, these edges must lead to the basic block immediately following in the instruction stream. The function ‘force_nonfallthru’ is available to insert an unconditional jump when redirection is needed. Note that this may require creation of a new basic block.

_exception handling_
     Exception handling edges represent possible control transfers from a trapping instruction to an exception handler. The definition of "trapping" varies. In C++, only function calls can throw, but in Ada exceptions such as division by zero or segmentation fault are defined, and thus each instruction that may throw this kind of exception needs to be handled as a control flow instruction. Exception edges have the ‘EDGE_ABNORMAL’ and ‘EDGE_EH’ flags set.

     When updating the instruction stream it is easy to change a possibly trapping instruction into a non-trapping one, by simply removing the exception edge. The opposite conversion is difficult, but should not happen anyway. The edges can be eliminated by calling ‘purge_dead_edges’.

     In the RTL representation, the destination of an exception edge is specified by the ‘REG_EH_REGION’ note attached to the insn. In the case of a trapping call the ‘EDGE_ABNORMAL_CALL’ flag is set too. In the ‘GIMPLE’ representation, this extra flag is not set.
     In the RTL representation, the predicate ‘may_trap_p’ may be used to check whether an instruction may still trap. For the tree representation, the ‘tree_could_trap_p’ predicate is available, but this predicate only checks for possible memory traps, as in dereferencing an invalid pointer location.

_sibling calls_
     Sibling calls or tail calls terminate the function in a non-standard way and thus an edge to the exit must be present. ‘EDGE_SIBCALL’ and ‘EDGE_ABNORMAL’ are set in such a case. These edges only exist in the RTL representation.

_computed jumps_
     Computed jumps contain edges to all labels in the function referenced from the code. All those edges have the ‘EDGE_ABNORMAL’ flag set. The edges used to represent computed jumps often cause compile time performance problems, since functions consisting of many taken labels and many computed jumps may have _very_ dense flow graphs, so these edges need to be handled with special care. During the earlier stages of the compilation process, GCC tries to avoid such dense flow graphs by factoring computed jumps. For example, given the following series of jumps,

            goto *x;
            [ ... ]

            goto *x;
            [ ... ]

            goto *x;
            [ ... ]

     factoring the computed jumps results in the following code sequence which has a much simpler flow graph:

            goto y;
            [ ... ]

            goto y;
            [ ... ]

            goto y;
            [ ... ]

          y:
            goto *x;

     However, the classic problem with this transformation is that it has a runtime cost in the resulting code: an extra jump. Therefore, the computed jumps are un-factored in the later passes of the compiler (in the pass called ‘pass_duplicate_computed_gotos’). Be aware of that when you work on passes in that area. There have been numerous examples already where the compile time for code with unfactored computed jumps caused some serious headaches.

_nonlocal goto handlers_
     GCC allows nested functions to return into the caller using a ‘goto’ to a label passed as an argument to the callee.
     The labels passed to nested functions contain special code to clean up after the function call. Such sections of code are referred to as "nonlocal goto receivers". If a function contains such nonlocal goto receivers, an edge from the call to the label is created with the ‘EDGE_ABNORMAL’ and ‘EDGE_ABNORMAL_CALL’ flags set.

_function entry points_
     By definition, execution of a function starts at basic block 0, so there is always an edge from the ‘ENTRY_BLOCK_PTR’ to basic block 0. There is no ‘GIMPLE’ representation for alternate entry points at this moment. In RTL, alternate entry points are specified by ‘CODE_LABEL’ with ‘LABEL_ALTERNATE_NAME’ defined. This feature is currently used for multiple entry point prologues and is limited to post-reload passes only. This can be used by back-ends to emit alternate prologues for functions called from different contexts. In the future, full support for multiple entry functions defined by Fortran 90 needs to be implemented.

_function exits_
     In the pre-reload representation a function terminates after the last instruction in the insn chain and no explicit return instructions are used. This corresponds to the fall-thru edge into the exit block. After reload, optimal RTL epilogues are used that use explicit (conditional) return instructions that are represented by edges with no flags set.

File: gccint.info, Node: Profile information, Next: Maintaining the CFG, Prev: Edges, Up: Control Flow

15.3 Profile information
========================

In many cases a compiler must make a choice whether to trade speed in one part of code for speed in another, or to trade code size for code speed. In such cases it is useful to know how often a given block will be executed. That is the purpose of maintaining a profile within the flow graph. GCC can handle profile information obtained through “profile feedback”, but it can also estimate branch probabilities based on statistics and heuristics.
The feedback based profile is produced by compiling the program with instrumentation, executing it on a train run and reading the numbers of executions of basic blocks and edges back to the compiler while re-compiling the program to produce the final executable. This method provides very accurate information about where a program spends most of its time on the train run. Whether it matches the average run of course depends on the choice of train data set, but several studies have shown that the behavior of a program usually changes just marginally over different data sets.

When profile feedback is not available, the compiler may be asked to attempt to predict the behavior of each branch in the program using a set of heuristics (see ‘predict.def’ for details) and compute estimated frequencies of each basic block by propagating the probabilities over the graph.

Each ‘basic_block’ contains two integer fields to represent profile information: ‘frequency’ and ‘count’. The ‘frequency’ is an estimate of how often the basic block is executed within a function. It is represented as an integer scaled in the range from 0 to ‘BB_FREQ_BASE’. The most frequently executed basic block in the function is initially set to ‘BB_FREQ_BASE’ and the rest of the frequencies are scaled accordingly. During optimization, the frequency of the most frequent basic block can either decrease (for instance by loop unrolling) or grow (for instance by cross-jumping optimization), so scaling sometimes has to be performed multiple times.

The ‘count’ contains hard-counted numbers of executions measured during training runs and is nonzero only when profile feedback is available. This value is represented as the host's widest integer (typically a 64 bit integer) of the special type ‘gcov_type’.

Most optimization passes can use only the frequency information of a basic block, but a few passes may want to know hard execution counts.
The frequencies should always match the counts after scaling; however, as the profile information is updated, numerical error may accumulate into quite large errors.

Each edge also contains a branch probability field: an integer in the range from 0 to ‘REG_BR_PROB_BASE’. It represents the probability of passing control from the end of the ‘src’ basic block to the ‘dest’ basic block, i.e. the probability that control will flow along this edge. The ‘EDGE_FREQUENCY’ macro is available to compute how frequently a given edge is taken. There is a ‘count’ field for each edge as well, representing the same information as for a basic block.

The basic block frequencies are not represented in the instruction stream, but in the RTL representation the edge frequencies are represented for conditional jumps (via the ‘REG_BR_PROB’ note) since they are used when instructions are output to the assembly file and the flow graph is no longer maintained.

The probability that control flow arrives via a given edge at its destination basic block is called “reverse probability” and is not directly represented, but it may be easily computed from the frequencies of basic blocks.

Updating profile information is a delicate task that unfortunately cannot be easily integrated with the CFG manipulation API. Many of the functions and hooks to modify the CFG, such as ‘redirect_edge_and_branch’, do not have enough information to easily update the profile, so updating it is in the majority of cases left up to the caller. It is difficult to uncover bugs in the profile updating code, because they manifest themselves only by producing worse code, and checking profile consistency is not possible because of numeric error accumulation. Hence special attention needs to be given to this issue in each pass that modifies the CFG.
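The fixed-point arithmetic described in this section can be sketched in a few lines. The code below is only an illustration under stated assumptions: the constants ‘kFreqBase’ and ‘kProbBase’ stand in for ‘BB_FREQ_BASE’ and ‘REG_BR_PROB_BASE’ with an assumed value of 10000, the function names are hypothetical, and rounding to nearest is a choice made for this example rather than a statement about what GCC does.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative fixed-point bases.  They stand in for BB_FREQ_BASE and
// REG_BR_PROB_BASE; the value 10000 is an assumption made for this sketch.
constexpr std::int64_t kFreqBase = 10000;
constexpr std::int64_t kProbBase = 10000;

// Scale a measured execution count so that the hottest block gets kFreqBase.
// Assumes count is small enough that count * kFreqBase does not overflow.
int frequency_from_count(std::int64_t count, std::int64_t max_count)
{
  assert(max_count > 0 && count >= 0 && count <= max_count);
  return static_cast<int>((count * kFreqBase + max_count / 2) / max_count);
}

// Estimated frequency of an edge: the frequency of its source block
// multiplied by the probability that control leaves along this edge.
int edge_frequency(int src_frequency, int probability)
{
  const std::int64_t scaled =
      static_cast<std::int64_t>(src_frequency) * probability;
  return static_cast<int>((scaled + kProbBase / 2) / kProbBase);
}

// Reverse probability: the share of the destination block's executions
// that arrive through an edge with the given frequency.
int reverse_probability(int edge_freq, int dest_frequency)
{
  if (dest_frequency == 0)
    return 0;  // A block that never executes has no meaningful share.
  const std::int64_t scaled = static_cast<std::int64_t>(edge_freq) * kProbBase;
  return static_cast<int>((scaled + dest_frequency / 2) / dest_frequency);
}
```

With these definitions, an edge leaving a block of frequency 8000 with probability 2500 (that is, 25%) has frequency 2000. If its destination block has frequency 4000, the reverse probability of the edge is 5000, meaning that half of the executions of the destination arrive through it.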
It is important to point out that ‘REG_BR_PROB_BASE’ and ‘BB_FREQ_BASE’ are both set low enough that it is possible to compute the second power of any frequency or probability in the flow graph. The same is not true of the ‘count’ field: it is not possible even to square it, because modern CPUs are fast enough to execute 2^32 operations quickly, so counts of that size can occur.

File: gccint.info, Node: Maintaining the CFG, Next: Liveness information, Prev: Profile information, Up: Control Flow

15.4 Maintaining the CFG
========================

An important task of each compiler pass is to keep both the control flow graph and all profile information up-to-date. Reconstruction of the control flow graph after each pass is not an option, since it may be very expensive and lost profile information cannot be reconstructed at all.

GCC has two major intermediate representations, and both use the ‘basic_block’ and ‘edge’ data types to represent control flow. Both representations share as much of the CFG maintenance code as possible. For each representation, a set of “hooks” is defined so that each representation can provide its own implementation of CFG manipulation routines when necessary. These hooks are defined in ‘cfghooks.h’. There are hooks for almost all common CFG manipulations, including block splitting and merging, edge redirection, and creating and deleting basic blocks. These hooks should provide everything you need to maintain and manipulate the CFG in both the RTL and ‘GIMPLE’ representation.

At the moment, the basic block boundaries are maintained transparently when modifying instructions, so there is rarely a need to move them manually (for example, when an instruction has to be output explicitly outside any basic block).

In the RTL representation, each instruction has a ‘BLOCK_FOR_INSN’ value that is a pointer to the basic block that contains the instruction. In the ‘GIMPLE’ representation, the function ‘gimple_bb’ returns a pointer to the basic block containing the queried statement.
When changes need to be applied to a function in its ‘GIMPLE’ representation, “GIMPLE statement iterators” should be used. These iterators provide an integrated abstraction of the flow graph and the instruction stream. GIMPLE statement iterators are constructed using the ‘gimple_stmt_iterator’ data structure and several modifiers are available, including the following:

‘gsi_start’
     This function initializes a ‘gimple_stmt_iterator’ that points to the first non-empty statement in a basic block.

‘gsi_last’
     This function initializes a ‘gimple_stmt_iterator’ that points to the last statement in a basic block.

‘gsi_end_p’
     This predicate is ‘true’ if a ‘gimple_stmt_iterator’ represents the end of a basic block.

‘gsi_next’
     This function takes a ‘gimple_stmt_iterator’ and makes it point to its successor.

‘gsi_prev’
     This function takes a ‘gimple_stmt_iterator’ and makes it point to its predecessor.

‘gsi_insert_after’
     This function inserts a statement after the position of the ‘gimple_stmt_iterator’ passed in. The final parameter determines whether the statement iterator is updated to point to the newly inserted statement, or left pointing to the original statement.

‘gsi_insert_before’
     This function inserts a statement before the position of the ‘gimple_stmt_iterator’ passed in. The final parameter determines whether the statement iterator is updated to point to the newly inserted statement, or left pointing to the original statement.

‘gsi_remove’
     This function removes the statement pointed to by the ‘gimple_stmt_iterator’ passed in and rechains the remaining statements in a basic block, if any.

In the RTL representation, the macros ‘BB_HEAD’ and ‘BB_END’ may be used to get the head and end ‘rtx’ of a basic block. No abstract iterators are defined for traversing the insn chain, but you can just use ‘NEXT_INSN’ and ‘PREV_INSN’ instead. *Note Insns::.

Usually a code-manipulating pass simplifies the instruction stream and the flow of control, possibly eliminating some edges.
This may for example happen when a conditional jump is replaced with an unconditional jump. Updating of edges is not transparent and each optimization pass is required to do so manually. However, only a few cases occur in practice. The pass may call ‘purge_dead_edges’ on a given basic block to remove superfluous edges, if any.

Another common scenario is redirection of branch instructions, but this is best modeled as redirection of edges in the control flow graph and thus use of ‘redirect_edge_and_branch’ is preferred over lower-level functions, such as ‘redirect_jump’, that operate on the RTL chain only. The CFG hooks defined in ‘cfghooks.h’ should provide the complete API required for manipulating and maintaining the CFG.

It is also possible that a pass has to insert a control flow instruction into the middle of a basic block, thus creating an entry point in the middle of the basic block, which is impossible by definition: the block must be split to make sure it only has one entry point, i.e. the head of the basic block. The CFG hook ‘split_block’ may be used when an instruction in the middle of a basic block has to become the target of a jump or branch instruction.

For a global optimizer, a common operation is to split edges in the flow graph and insert instructions on them. In the RTL representation, this can be easily done using the ‘insert_insn_on_edge’ function that emits an instruction "on the edge", caching it for a later ‘commit_edge_insertions’ call that will take care of moving the inserted instructions off the edge into the instruction stream contained in a basic block. This includes the creation of new basic blocks where needed. In the ‘GIMPLE’ representation, the equivalent functions are ‘gsi_insert_on_edge’, which inserts a statement on an edge, and ‘gsi_commit_edge_inserts’, which flushes the pending statements to the actual instruction stream.
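The two-step pattern just described, queueing code on an edge and committing it later, can be modeled on a toy CFG as follows. This is a sketch only: ‘ToyBlock’, ‘ToyCfg’, ‘queue_on_edge’ and ‘commit_insertions’ are hypothetical names invented for the example, instructions are plain strings, and every edge with pending code is split by a fresh block, which is a deliberate simplification.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy basic block.
struct ToyBlock {
  std::vector<std::string> insns;  // instruction stream of the block
  std::vector<int> succs;          // indices of successor blocks
};

struct ToyCfg {
  std::vector<ToyBlock> blocks;
  // Instructions queued "on" an edge, keyed by (source, destination).
  std::map<std::pair<int, int>, std::vector<std::string>> pending;
};

// Queue an instruction on the edge src -> dest without touching the CFG yet.
void queue_on_edge(ToyCfg &cfg, int src, int dest, const std::string &insn)
{
  cfg.pending[{src, dest}].push_back(insn);
}

// Flush all queued instructions.  For simplicity, every edge with pending
// instructions is split by a fresh block that holds them; a production
// implementation would only create a block where one is really needed.
void commit_insertions(ToyCfg &cfg)
{
  for (const auto &entry : cfg.pending) {
    const int src = entry.first.first;
    const int dest = entry.first.second;
    assert(src >= 0 && static_cast<std::size_t>(src) < cfg.blocks.size());

    // The new block carries the queued instructions and falls into dest.
    const int fresh = static_cast<int>(cfg.blocks.size());
    cfg.blocks.push_back(ToyBlock{entry.second, {dest}});

    // Redirect the edge src -> dest so that it now reaches the new block.
    std::vector<int> &succs = cfg.blocks[src].succs;
    std::replace(succs.begin(), succs.end(), dest, fresh);
  }
  cfg.pending.clear();
}
```

Nothing in the graph changes while instructions are being queued, so a pass can keep iterating over blocks and edges safely. Only the commit step adds blocks and redirects edges, which mirrors the reason for deferring the work in the first place.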
While debugging the optimization pass, the ‘verify_flow_info’ function may be useful to find bugs in the control flow graph updating code.  File: gccint.info, Node: Liveness information, Prev: Maintaining the CFG, Up: Control Flow 15.5 Liveness information ========================= Liveness information is useful to determine whether some register is "live" at given point of program, i.e. that it contains a value that may be used at a later point in the program. This information is used, for instance, during register allocation, as the pseudo registers only need to be assigned to a unique hard register or to a stack slot if they are live. The hard registers and stack slots may be freely reused for other values when a register is dead. Liveness information is available in the back end starting with ‘pass_df_initialize’ and ending with ‘pass_df_finish’. Three flavors of live analysis are available: With ‘LR’, it is possible to determine at any point ‘P’ in the function if the register may be used on some path from ‘P’ to the end of the function. With ‘UR’, it is possible to determine if there is a path from the beginning of the function to ‘P’ that defines the variable. ‘LIVE’ is the intersection of the ‘LR’ and ‘UR’ and a variable is live at ‘P’ if there is both an assignment that reaches it from the beginning of the function and a use that can be reached on some path from ‘P’ to the end of the function. In general ‘LIVE’ is the most useful of the three. The macros ‘DF_[LR,UR,LIVE]_[IN,OUT]’ can be used to access this information. The macros take a basic block number and return a bitmap that is indexed by the register number. This information is only guaranteed to be up to date after calls are made to ‘df_analyze’. See the file ‘df-core.cc’ for details on using the dataflow. The liveness information is stored partly in the RTL instruction stream and partly in the flow graph. 
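The ‘LR’ problem described above is a classic backward dataflow problem, and a minimal version of it can be sketched on a toy CFG. The code below is not the GCC dataflow framework: ‘LiveBlock’, ‘LiveSets’ and ‘compute_liveness’ are hypothetical names, registers are modeled as bit positions in a 32-bit mask, and each block is summarized by the registers it reads before writing (‘use’) and the registers it writes (‘def’).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy block: registers are bit positions in a 32-bit mask.
struct LiveBlock {
  std::uint32_t use;       // registers read before any write in the block
  std::uint32_t def;       // registers written in the block
  std::vector<int> succs;  // indices of successor blocks
};

struct LiveSets {
  std::vector<std::uint32_t> in;   // registers live on entry to each block
  std::vector<std::uint32_t> out;  // registers live on exit from each block
};

// Iterate the backward equations
//   out[b] = union of in[s] over all successors s of b
//   in[b]  = use[b] | (out[b] & ~def[b])
// until nothing changes.
LiveSets compute_liveness(const std::vector<LiveBlock> &blocks)
{
  const std::size_t n = blocks.size();
  LiveSets live{std::vector<std::uint32_t>(n, 0),
                std::vector<std::uint32_t>(n, 0)};

  bool changed = true;
  while (changed) {
    changed = false;
    // Visiting blocks in reverse index order usually converges faster
    // for a backward problem, but any order reaches the same result.
    for (std::size_t i = n; i-- > 0;) {
      std::uint32_t out = 0;
      for (int s : blocks[i].succs) {
        assert(s >= 0 && static_cast<std::size_t>(s) < n);
        out |= live.in[s];
      }
      const std::uint32_t in = blocks[i].use | (out & ~blocks[i].def);
      if (in != live.in[i] || out != live.out[i]) {
        live.in[i] = in;
        live.out[i] = out;
        changed = true;
      }
    }
  }
  return live;
}
```

In a straight-line chain of three blocks where the first defines register 0, the second uses register 0 and defines register 1, and the third uses register 1, register 0 is live only between the first two blocks and register 1 only between the last two.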
Local information is stored in the instruction stream: Each instruction may contain ‘REG_DEAD’ notes representing that the value of a given register is no longer needed, or ‘REG_UNUSED’ notes representing that the value computed by the instruction is never used. The second is useful for instructions computing multiple values at once.  File: gccint.info, Node: Loop Analysis and Representation, Next: Machine Desc, Prev: Control Flow, Up: Top 16 Analysis and Representation of Loops *************************************** GCC provides extensive infrastructure for work with natural loops, i.e., strongly connected components of CFG with only one entry block. This chapter describes representation of loops in GCC, both on GIMPLE and in RTL, as well as the interfaces to loop-related analyses (induction variable analysis and number of iterations analysis). * Menu: * Loop representation:: Representation and analysis of loops. * Loop querying:: Getting information about loops. * Loop manipulation:: Loop manipulation functions. * LCSSA:: Loop-closed SSA form. * Scalar evolutions:: Induction variables on GIMPLE. * loop-iv:: Induction variables on RTL. * Number of iterations:: Number of iterations analysis. * Dependency analysis:: Data dependency analysis.  File: gccint.info, Node: Loop representation, Next: Loop querying, Up: Loop Analysis and Representation 16.1 Loop representation ======================== This chapter describes the representation of loops in GCC, and functions that can be used to build, modify and analyze this representation. Most of the interfaces and data structures are declared in ‘cfgloop.h’. Loop structures are analyzed and this information disposed or updated at the discretion of individual passes. Still most of the generic CFG manipulation routines are aware of loop structures and try to keep them up-to-date. 
By this means an increasing part of the compilation pipeline is set up to maintain loop structure across passes, which allows meta information to be attached to individual loops for consumption by later passes.

In general, a natural loop has one entry block (header) and possibly several back edges (latches) leading to the header from the inside of the loop. Loops with several latches may appear if several loops share a single header, or if there is a branching in the middle of the loop. The representation of loops in GCC however allows only loops with a single latch. During loop analysis, headers of such loops are split and forwarder blocks are created in order to disambiguate their structures. A heuristic based on profile information and the structure of the induction variables in the loops is used to determine whether the latches correspond to sub-loops or to control flow in a single loop. This means that the analysis sometimes changes the CFG, and if you run it in the middle of an optimization pass, you must be able to deal with the new blocks. You may avoid CFG changes by passing the ‘LOOPS_MAY_HAVE_MULTIPLE_LATCHES’ flag to the loop discovery; note however that most other loop manipulation functions will not work correctly for loops with multiple latch edges (the functions that only query membership of blocks to loops and subloop relationships, or enumerate and test loop exits, can be expected to work).

The body of a loop is the set of blocks that are dominated by its header and reachable from its latch against the direction of the edges in the CFG. The loops are organized in a containment hierarchy (tree) such that all the loops immediately contained inside loop L are the children of L in the tree. This tree is represented by the ‘struct loops’ structure. The root of this tree is a fake loop that contains all blocks in the function. Each of the loops is represented in a ‘struct loop’ structure.
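The definition of a loop body given above translates directly into a backward walk from the latch that stops at the header. The sketch below illustrates this on a toy CFG described by predecessor lists. The function name ‘loop_body’ is hypothetical, and the code assumes that the header dominates the latch, so that the backward walk cannot leave the region dominated by the header.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// preds[b] lists the predecessors of block b in a hypothetical toy CFG.
// Returns a membership vector: result[b] is true when b is in the loop.
// Assumes that the header dominates the latch.
std::vector<bool> loop_body(const std::vector<std::vector<int>> &preds,
                            int header, int latch)
{
  const std::size_t n = preds.size();
  assert(header >= 0 && static_cast<std::size_t>(header) < n);
  assert(latch >= 0 && static_cast<std::size_t>(latch) < n);

  std::vector<bool> in_loop(n, false);
  std::vector<int> worklist;

  // Marking the header first makes the backward walk stop there.
  in_loop[header] = true;
  if (!in_loop[latch]) {
    in_loop[latch] = true;
    worklist.push_back(latch);
  }

  // Walk the edges against their direction, starting from the latch.
  while (!worklist.empty()) {
    const int block = worklist.back();
    worklist.pop_back();
    for (int pred : preds[block]) {
      if (!in_loop[pred]) {
        in_loop[pred] = true;
        worklist.push_back(pred);
      }
    }
  }
  return in_loop;
}
```

For a function whose blocks form the chain 0 -> 1 -> 2 -> 3 with a back edge 3 -> 1 and an exit edge 2 -> 4, calling the function with header 1 and latch 3 marks blocks 1, 2 and 3 as the loop body and leaves blocks 0 and 4 outside.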
Each loop is assigned an index (‘num’ field of the ‘struct loop’ structure), and the pointer to the loop is stored in the corresponding field of the ‘larray’ vector in the loops structure. The indices do not have to be contiguous; there may be empty (‘NULL’) entries in the ‘larray’ created by deleting loops. Also, there is no guarantee on the relative order of a loop and its subloops in the numbering. The index of a loop never changes.

The entries of the ‘larray’ field should not be accessed directly. The function ‘get_loop’ returns the loop description for a loop with the given index. The ‘number_of_loops’ function returns the number of loops in the function. To traverse all loops, use a range-based for loop with an instance of class ‘loops_list’. The ‘flags’ argument passed to the constructor of class ‘loops_list’ is used to determine the direction of traversal and the set of loops visited. Each loop is guaranteed to be visited exactly once, regardless of the changes to the loop tree, and the loops may be removed during the traversal. Newly created loops are never traversed; if they need to be visited, this must be done separately after their creation.

Each basic block contains the reference to the innermost loop it belongs to (‘loop_father’). For this reason, it is only possible to have one ‘struct loops’ structure initialized at the same time for each CFG. The global variable ‘current_loops’ contains the ‘struct loops’ structure. Many of the loop manipulation functions assume that dominance information is up-to-date.

The loops are analyzed through the ‘loop_optimizer_init’ function. The argument of this function is a set of flags represented in an integer bitmask. These flags specify what other properties of the loop structures should be calculated/enforced and preserved later:

   • ‘LOOPS_MAY_HAVE_MULTIPLE_LATCHES’: If this flag is set, no changes to the CFG will be performed in the loop analysis; in particular, loops with multiple latch edges will not be disambiguated.
If a loop has multiple latches, its latch block is set to NULL. Most of the loop manipulation functions will not work for loops in this shape. No other flags that require CFG changes can be passed to loop_optimizer_init. • ‘LOOPS_HAVE_PREHEADERS’: Forwarder blocks are created in such a way that each loop has only one entry edge, and additionally, the source block of this entry edge has only one successor. This creates a natural place where the code can be moved out of the loop, and ensures that the entry edge of the loop leads from its immediate super-loop. • ‘LOOPS_HAVE_SIMPLE_LATCHES’: Forwarder blocks are created to force the latch block of each loop to have only one successor. This ensures that the latch of the loop does not belong to any of its sub-loops, and makes manipulation with the loops significantly easier. Most of the loop manipulation functions assume that the loops are in this shape. Note that with this flag, the "normal" loop without any control flow inside and with one exit consists of two basic blocks. • ‘LOOPS_HAVE_MARKED_IRREDUCIBLE_REGIONS’: Basic blocks and edges in the strongly connected components that are not natural loops (have more than one entry block) are marked with ‘BB_IRREDUCIBLE_LOOP’ and ‘EDGE_IRREDUCIBLE_LOOP’ flags. The flag is not set for blocks and edges that belong to natural loops that are in such an irreducible region (but it is set for the entry and exit edges of such a loop, if they lead to/from this region). • ‘LOOPS_HAVE_RECORDED_EXITS’: The lists of exits are recorded and updated for each loop. This makes some functions (e.g., ‘get_loop_exit_edges’) more efficient. Some functions (e.g., ‘single_exit’) can be used only if the lists of exits are recorded. These properties may also be computed/enforced later, using functions ‘create_preheaders’, ‘force_single_succ_latches’, ‘mark_irreducible_loops’ and ‘record_loop_exits’. The properties can be queried using ‘loops_state_satisfies_p’. 
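How such property flags combine in a bitmask and how a query over them behaves can be sketched in plain C. This is a minimal model, not GCC code: the bit values, ‘struct toy_loops’ and both ‘toy_’ functions are assumptions made for illustration, and only the flag names are taken from the list above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bit values; the real definitions are in GCC's sources.  */
enum toy_loops_state
{
  TOY_LOOPS_HAVE_PREHEADERS = 1 << 0,
  TOY_LOOPS_HAVE_SIMPLE_LATCHES = 1 << 1,
  TOY_LOOPS_HAVE_MARKED_IRREDUCIBLE_REGIONS = 1 << 2,
  TOY_LOOPS_HAVE_RECORDED_EXITS = 1 << 3,
  TOY_LOOPS_MAY_HAVE_MULTIPLE_LATCHES = 1 << 4
};

/* Stand-in for the state remembered after loop analysis.  */
struct toy_loops
{
  unsigned state;
};

/* True only if every property in FLAGS currently holds.  */
bool
toy_loops_state_satisfies_p (const struct toy_loops *loops, unsigned flags)
{
  assert (loops != NULL);
  return (loops->state & flags) == flags;
}

/* Record that the properties in FLAGS have been established later on.  */
void
toy_loops_state_set (struct toy_loops *loops, unsigned flags)
{
  assert (loops != NULL);
  loops->state |= flags;
}
```

A pass written against this model would check the properties it relies on before using a function that requires them, for example recorded exits before asking for a single exit.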
The memory occupied by the loops structures should be freed with the ‘loop_optimizer_finalize’ function. When loop structures are set up to be preserved across passes, this function reduces the information to be kept up-to-date to a minimum (only ‘LOOPS_MAY_HAVE_MULTIPLE_LATCHES’ set).

The CFG manipulation functions in general do not update loop structures. Specialized versions that additionally do so are provided for the most common tasks. On GIMPLE, the ‘cleanup_tree_cfg_loop’ function can be used to clean up the CFG while updating the loops structures if ‘current_loops’ is set.

At the moment loop structure is preserved from the start of GIMPLE loop optimizations until the end of RTL loop optimizations. During this time a loop can be tracked by its ‘struct loop’ and number.

File: gccint.info, Node: Loop querying, Next: Loop manipulation, Prev: Loop representation, Up: Loop Analysis and Representation

16.2 Loop querying
==================

The functions to query the information about loops are declared in ‘cfgloop.h’. Some of the information can be taken directly from the structures. The ‘loop_father’ field of each basic block contains the innermost loop to which the block belongs. The most useful fields of the loop structure (that are kept up-to-date at all times) are:

   • ‘header’, ‘latch’: Header and latch basic blocks of the loop.
   • ‘num_nodes’: Number of basic blocks in the loop (including the basic blocks of the sub-loops).
   • ‘outer’, ‘inner’, ‘next’: The super-loop, the first sub-loop, and the sibling of the loop in the loops tree.

There are other fields in the loop structures, many of them used only by some of the passes, or not updated during CFG changes; in general, they should not be accessed directly.

The most important functions to query loop structures are:

   • ‘loop_depth’: The depth of the loop in the loops tree, i.e., the number of super-loops of the loop.
   • ‘flow_loops_dump’: Dumps the information about loops to a file.
   • ‘verify_loop_structure’: Checks consistency of the loop structures.
   • ‘loop_latch_edge’: Returns the latch edge of a loop.
   • ‘loop_preheader_edge’: If loops have preheaders, returns the preheader edge of a loop.
   • ‘flow_loop_nested_p’: Tests whether loop is a sub-loop of another loop.
   • ‘flow_bb_inside_loop_p’: Tests whether a basic block belongs to a loop (including its sub-loops).
   • ‘find_common_loop’: Finds the common super-loop of two loops.
   • ‘superloop_at_depth’: Returns the super-loop of a loop with the given depth.
   • ‘tree_num_loop_insns’, ‘num_loop_insns’: Estimates the number of insns in the loop, on GIMPLE and on RTL.
   • ‘loop_exit_edge_p’: Tests whether edge is an exit from a loop.
   • ‘mark_loop_exit_edges’: Marks all exit edges of all loops with ‘EDGE_LOOP_EXIT’ flag.
   • ‘get_loop_body’, ‘get_loop_body_in_dom_order’, ‘get_loop_body_in_bfs_order’: Enumerates the basic blocks in the loop in depth-first search order in reversed CFG, ordered by dominance relation, and breadth-first search order, respectively.
   • ‘single_exit’: Returns the single exit edge of the loop, or ‘NULL’ if the loop has more than one exit. You can only use this function if ‘LOOPS_HAVE_RECORDED_EXITS’ is used.
   • ‘get_loop_exit_edges’: Enumerates the exit edges of a loop.
   • ‘just_once_each_iteration_p’: Returns true if the basic block is executed exactly once during each iteration of a loop (that is, it does not belong to a sub-loop, and it dominates the latch of the loop).

File: gccint.info, Node: Loop manipulation, Next: LCSSA, Prev: Loop querying, Up: Loop Analysis and Representation

16.3 Loop manipulation
======================

The loops tree can be manipulated using the following functions:

   • ‘flow_loop_tree_node_add’: Adds a node to the tree.
   • ‘flow_loop_tree_node_remove’: Removes a node from the tree.
   • ‘add_bb_to_loop’: Adds a basic block to a loop.
   • ‘remove_bb_from_loops’: Removes a basic block from loops.

Most low-level CFG functions update loops automatically.
The following functions handle some more complicated cases of CFG manipulations: • ‘remove_path’: Removes an edge and all blocks it dominates. • ‘split_loop_exit_edge’: Splits exit edge of the loop, ensuring that PHI node arguments remain in the loop (this ensures that loop-closed SSA form is preserved). Only useful on GIMPLE. Finally, there are some higher-level loop transformations implemented. While some of them are written so that they should work on non-innermost loops, they are mostly untested in that case, and at the moment, they are only reliable for the innermost loops: • ‘create_iv’: Creates a new induction variable. Only works on GIMPLE. ‘standard_iv_increment_position’ can be used to find a suitable place for the iv increment. • ‘duplicate_loop_body_to_header_edge’, ‘tree_duplicate_loop_body_to_header_edge’: These functions (on RTL and on GIMPLE) duplicate the body of the loop prescribed number of times on one of the edges entering loop header, thus performing either loop unrolling or loop peeling. ‘can_duplicate_loop_p’ (‘can_unroll_loop_p’ on GIMPLE) must be true for the duplicated loop. • ‘loop_version’: This function creates a copy of a loop, and a branch before them that selects one of them depending on the prescribed condition. This is useful for optimizations that need to verify some assumptions in runtime (one of the copies of the loop is usually left unchanged, while the other one is transformed in some way). • ‘tree_unroll_loop’: Unrolls the loop, including peeling the extra iterations to make the number of iterations divisible by unroll factor, updating the exit condition, and removing the exits that now cannot be taken. Works only on GIMPLE.  
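The idea of unrolling combined with peeling the extra iterations can be illustrated at the source level. The sketch below is a hand-written analogy with an unroll factor of 4, using hypothetical ‘toy_’ names; it does not show what ‘tree_unroll_loop’ actually emits, and running the leftover iterations before the unrolled body is a choice of this sketch only.

```c
#include <assert.h>
#include <stddef.h>

/* Reference version: one element per iteration.  */
long
toy_sum_rolled (const int *a, size_t n)
{
  assert (a != NULL || n == 0);
  long sum = 0;
  for (size_t i = 0; i < n; i++)
    sum += a[i];
  return sum;
}

/* Same computation, unrolled by a factor of 4.  */
long
toy_sum_unrolled4 (const int *a, size_t n)
{
  assert (a != NULL || n == 0);
  long sum = 0;
  size_t i = 0;
  size_t extra = n % 4;

  /* Peeled iterations: execute the n % 4 leftover iterations first, so
     that the number of remaining iterations is divisible by 4.  */
  for (; i < extra; i++)
    sum += a[i];

  /* Unrolled body: four copies of the original body per exit test.  */
  for (; i < n; i += 4)
    {
      sum += a[i];
      sum += a[i + 1];
      sum += a[i + 2];
      sum += a[i + 3];
    }
  return sum;
}
```

Both functions return the same result for every length, which is the property an unrolling transformation has to preserve.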
File: gccint.info, Node: LCSSA, Next: Scalar evolutions, Prev: Loop manipulation, Up: Loop Analysis and Representation

16.4 Loop-closed SSA form
=========================

Throughout the loop optimizations on tree level, one extra condition is enforced on the SSA form: No SSA name is used outside of the loop in which it is defined. The SSA form satisfying this condition is called "loop-closed SSA form" - LCSSA. To enforce LCSSA, PHI nodes must be created at the exits of the loops for the SSA names that are used outside of them. Only the real operands (not virtual SSA names) are held in LCSSA, in order to save memory.

There are various benefits of LCSSA:

   • Many optimizations (value range analysis, final value replacement) are interested in the values that are defined in the loop and used outside of it, i.e., exactly those for which we create new PHI nodes.
   • In induction variable analysis, it is not necessary to specify the loop in which the analysis should be performed - the scalar evolution analysis always returns the results with respect to the loop in which the SSA name is defined.
   • It makes updating of SSA form during loop transformations simpler. Without LCSSA, operations like loop unrolling may force creation of PHI nodes arbitrarily far from the loop, while in LCSSA, the SSA form can be updated locally. However, since we only keep real operands in LCSSA, we cannot use this advantage (we could have local updating of real operands, but it is not much more efficient than to use generic SSA form updating for it as well; the amount of changes to SSA is the same).

However, it also means LCSSA must be updated. This is usually straightforward, unless you create a new value in a loop and use it outside, or unless you manipulate loop exit edges (functions are provided to make these manipulations simple). ‘rewrite_into_loop_closed_ssa’ is used to rewrite SSA form to LCSSA, and ‘verify_loop_closed_ssa’ to check that the invariant of LCSSA is preserved.
File: gccint.info, Node: Scalar evolutions, Next: loop-iv, Prev: LCSSA, Up: Loop Analysis and Representation 16.5 Scalar evolutions ====================== Scalar evolutions (SCEV) are used to represent results of induction variable analysis on GIMPLE. They enable us to represent variables with complicated behavior in a simple and consistent way (we only use it to express values of polynomial induction variables, but it is possible to extend it). The interfaces to SCEV analysis are declared in ‘tree-scalar-evolution.h’. To use scalar evolutions analysis, ‘scev_initialize’ must be used. To stop using SCEV, ‘scev_finalize’ should be used. SCEV analysis caches results in order to save time and memory. This cache however is made invalid by most of the loop transformations, including removal of code. If such a transformation is performed, ‘scev_reset’ must be called to clean the caches. Given an SSA name, its behavior in loops can be analyzed using the ‘analyze_scalar_evolution’ function. The returned SCEV however does not have to be fully analyzed and it may contain references to other SSA names defined in the loop. To resolve these (potentially recursive) references, ‘instantiate_parameters’ or ‘resolve_mixers’ functions must be used. ‘instantiate_parameters’ is useful when you use the results of SCEV only for some analysis, and when you work with whole nest of loops at once. It will try replacing all SSA names by their SCEV in all loops, including the super-loops of the current loop, thus providing a complete information about the behavior of the variable in the loop nest. ‘resolve_mixers’ is useful if you work with only one loop at a time, and if you possibly need to create code based on the value of the induction variable. It will only resolve the SSA names defined in the current loop, leaving the SSA names defined outside unchanged, even if their evolution in the outer loops is known. 
The SCEV is a normal tree expression, except for the fact that it may contain several special tree nodes. One of them is ‘SCEV_NOT_KNOWN’, used for SSA names whose value cannot be expressed. The other one is ‘POLYNOMIAL_CHREC’. Polynomial chrec has three arguments - base, step and loop (both base and step may contain further polynomial chrecs). Type of the expression and of base and step must be the same. A variable has evolution ‘POLYNOMIAL_CHREC(base, step, loop)’ if it is (in the specified loop) equivalent to ‘x_1’ in the following example while (...) { x_1 = phi (base, x_2); x_2 = x_1 + step; } Note that this includes the language restrictions on the operations. For example, if we compile C code and ‘x’ has signed type, then the overflow in addition would cause undefined behavior, and we may assume that this does not happen. Hence, the value with this SCEV cannot overflow (which restricts the number of iterations of such a loop). In many cases, one wants to restrict the attention just to affine induction variables. In this case, the extra expressive power of SCEV is not useful, and may complicate the optimizations. In this case, ‘simple_iv’ function may be used to analyze a value - the result is a loop-invariant base and step.  File: gccint.info, Node: loop-iv, Next: Number of iterations, Prev: Scalar evolutions, Up: Loop Analysis and Representation 16.6 IV analysis on RTL ======================= The induction variable on RTL is simple and only allows analysis of affine induction variables, and only in one loop at once. The interface is declared in ‘cfgloop.h’. Before analyzing induction variables in a loop L, ‘iv_analysis_loop_init’ function must be called on L. After the analysis (possibly calling ‘iv_analysis_loop_init’ for several loops) is finished, ‘iv_analysis_done’ should be called. The following functions can be used to access the results of the analysis: • ‘iv_analyze’: Analyzes a single register used in the given insn. 
If no use of the register in this insn is found, the following insns are scanned, so that this function can be called on the insn returned by get_condition. • ‘iv_analyze_result’: Analyzes result of the assignment in the given insn. • ‘iv_analyze_expr’: Analyzes a more complicated expression. All its operands are analyzed by ‘iv_analyze’, and hence they must be used in the specified insn or one of the following insns. The description of the induction variable is provided in ‘struct rtx_iv’. In order to handle subregs, the representation is a bit complicated; if the value of the ‘extend’ field is not ‘UNKNOWN’, the value of the induction variable in the i-th iteration is delta + mult * extend_{extend_mode} (subreg_{mode} (base + i * step)), with the following exception: if ‘first_special’ is true, then the value in the first iteration (when ‘i’ is zero) is ‘delta + mult * base’. However, if ‘extend’ is equal to ‘UNKNOWN’, then ‘first_special’ must be false, ‘delta’ 0, ‘mult’ 1 and the value in the i-th iteration is subreg_{mode} (base + i * step) The function ‘get_iv_value’ can be used to perform these calculations.  File: gccint.info, Node: Number of iterations, Next: Dependency analysis, Prev: loop-iv, Up: Loop Analysis and Representation 16.7 Number of iterations analysis ================================== Both on GIMPLE and on RTL, there are functions available to determine the number of iterations of a loop, with a similar interface. The number of iterations of a loop in GCC is defined as the number of executions of the loop latch. In many cases, it is not possible to determine the number of iterations unconditionally - the determined number is correct only if some assumptions are satisfied. The analysis tries to verify these conditions using the information contained in the program; if it fails, the conditions are returned together with the result. 
The following information and conditions are provided by the analysis:

   • ‘assumptions’: If this condition is false, the rest of the information is invalid.
   • ‘noloop_assumptions’ on RTL, ‘may_be_zero’ on GIMPLE: If this condition is true, the loop exits in the first iteration.
   • ‘infinite’: If this condition is true, the loop is infinite. This condition is only available on RTL. On GIMPLE, conditions for finiteness of the loop are included in ‘assumptions’.
   • ‘niter_expr’ on RTL, ‘niter’ on GIMPLE: The expression that gives the number of iterations. The number of iterations is defined as the number of executions of the loop latch.

Both on GIMPLE and on RTL, it is necessary for the induction variable analysis framework to be initialized (SCEV on GIMPLE, loop-iv on RTL). On GIMPLE, the results are stored to the ‘struct tree_niter_desc’ structure. The number of iterations before the loop is exited through a given exit can be determined using the ‘number_of_iterations_exit’ function. On RTL, the results are returned in the ‘struct niter_desc’ structure. The corresponding function is named ‘check_simple_exit’. There are also functions that pass through all the exits of a loop and try to find one with an easy to determine number of iterations - ‘find_loop_niter’ on GIMPLE and ‘find_simple_exit’ on RTL. Finally, there are functions that provide the same information, but additionally cache it, so that repeated queries for the number of iterations are not so costly - ‘number_of_latch_executions’ on GIMPLE and ‘get_simple_loop_desc’ on RTL.

Note that some of these functions may behave slightly differently than others - some of them return only the expression for the number of iterations, and fail if there are some assumptions. The function ‘number_of_latch_executions’ works only for single-exit loops. The function ‘number_of_cond_exit_executions’ can be used to determine the number of executions of the exit condition of a single-exit loop (i.e., the ‘number_of_latch_executions’ increased by one).
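The relation between latch executions and executions of the exit condition can be checked on a hand-instrumented loop. This is a sketch only: ‘struct toy_counts’ and ‘toy_count_executions’ are hypothetical names, and the loop shape (exit test in the header, one exit) is an assumption chosen to keep the example small.

```c
#include <assert.h>

/* Counters for one run of the instrumented loop.  */
struct toy_counts
{
  unsigned long cond_execs;   /* executions of the exit condition */
  unsigned long latch_execs;  /* executions of the latch (back edge) */
};

/* Instrumented form of "for (i = 0; i < n; i++) body;".  */
struct toy_counts
toy_count_executions (unsigned long n)
{
  struct toy_counts c = { 0, 0 };
  unsigned long i = 0;
  for (;;)
    {
      c.cond_execs++;        /* the exit condition is tested in the header */
      if (!(i < n))
        break;               /* the single exit of the loop */
      /* ... the loop body would go here ... */
      i++;
      c.latch_execs++;       /* control returns to the header */
    }
  /* For a single-exit loop of this shape the condition runs once more
     than the latch.  */
  assert (c.cond_execs == c.latch_execs + 1);
  return c;
}
```

With ‘n’ equal to zero the loop exits in the first iteration: the latch is never executed and the exit condition is executed once.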
On GIMPLE, the following constraint flags affect the semantics of some APIs of the number of iterations analyzer:

   • ‘LOOP_C_INFINITE’: If this constraint flag is set, the loop is known to be infinite. APIs like ‘number_of_iterations_exit’ can return false directly without doing any analysis.
   • ‘LOOP_C_FINITE’: If this constraint flag is set, the loop is known to be finite; in other words, the loop's number of iterations can be computed with ‘assumptions’ being true.

Generally, the constraint flags are set/cleared by consumers, which are loop optimizers. It is also the consumers' responsibility to set/clear constraints correctly. Failing to do that might result in hard to track down bugs in scev/niter consumers. One typical use case is the vectorizer: it drives the number of iterations analyzer by setting ‘LOOP_C_FINITE’ and vectorizes a possibly infinite loop by versioning the loop with the analysis result. In return, constraints set by consumers can also help the number of iterations analyzer in following optimizers. For example, ‘niter’ of a loop versioned under ‘assumptions’ is valid unconditionally.

Other constraints may be added in the future, for example, a constraint indicating that a loop's latch must roll, thus ‘may_be_zero’ would be false unconditionally.

File: gccint.info, Node: Dependency analysis, Prev: Number of iterations, Up: Loop Analysis and Representation

16.8 Data Dependency Analysis
=============================

The code for the data dependence analysis can be found in ‘tree-data-ref.cc’ and its interface and data structures are described in ‘tree-data-ref.h’. The function that computes the data dependences for all the array and pointer references for a given loop is ‘compute_data_dependences_for_loop’. This function is currently used by the linear loop transform and the vectorization passes.
Before calling this function, one has to allocate two vectors: a first vector will contain the set of data references that are contained in the analyzed loop body, and the second vector will contain the dependence relations between the data references. Thus if the vector of data references is of size ‘n’, the vector containing the dependence relations will contain ‘n*n’ elements. However if the analyzed loop contains side effects, such as calls that potentially can interfere with the data references in the current analyzed loop, the analysis stops while scanning the loop body for data references, and inserts a single ‘chrec_dont_know’ in the dependence relation array.

The data references are discovered in a particular order during the scanning of the loop body: the loop body is analyzed in execution order, and the data references of each statement are pushed at the end of the data reference array. Two data references syntactically occur in the program in the same order as in the array of data references. This syntactic order is important in some classical data dependence tests, and mapping this order to the elements of this array avoids costly queries to the loop body representation.

Three types of data references are currently handled: ARRAY_REF, INDIRECT_REF and COMPONENT_REF. The data structure for the data reference is ‘data_reference’, where ‘data_reference_p’ is a name of a pointer to the data reference structure. The structure contains the following elements:

   • ‘base_object_info’: Provides information about the base object of the data reference and its access functions. These access functions represent the evolution of the data reference in the loop relative to its base, in keeping with the classical meaning of the data reference access function for the support of arrays. For example, for a reference ‘a.b[i][j]’, the base object is ‘a.b’ and the access functions, one for each array subscript, are: ‘{i_init, +, i_step}_1, {j_init, +, j_step}_2’.
   • ‘first_location_in_loop’: Provides information about the first location accessed by the data reference in the loop and about the access function used to represent evolution relative to this location. This data is used to support pointers, and is not used for arrays (for which we have base objects). Pointer accesses are represented as a one-dimensional access that starts from the first location accessed in the loop. For example:

          for1 i
             for2 j
              *((int *)p + i + j) = a[i][j];

     The access function of the pointer access is ‘{0, +, 4B}_for2’ relative to ‘p + i’. The access functions of the array are ‘{i_init, +, i_step}_for1’ and ‘{j_init, +, j_step}_for2’ relative to ‘a’.

     Usually, the object the pointer refers to is either unknown, or we cannot prove that the access is confined to the boundaries of a certain object.

     Two data references can be compared only if at least one of these two representations has all its fields filled for both data references.

     The current strategy for data dependence tests is as follows: If both ‘a’ and ‘b’ are represented as arrays, compare ‘a.base_object’ and ‘b.base_object’; if they are equal, apply dependence tests (use access functions based on base_objects). Else if both ‘a’ and ‘b’ are represented as pointers, compare ‘a.first_location’ and ‘b.first_location’; if they are equal, apply dependence tests (use access functions based on first location). However, if ‘a’ and ‘b’ are represented differently, only try to prove that the bases are definitely different.

   • Aliasing information.
   • Alignment information.

The structure describing the relation between two data references is ‘data_dependence_relation’ and the shorter name for a pointer to such a structure is ‘ddr_p’.
This structure contains:

   • a pointer to each data reference,
   • a tree node ‘are_dependent’ that is set to ‘chrec_known’ if the analysis has proved that there is no dependence between these two data references, to ‘chrec_dont_know’ if the analysis was not able to determine any useful result and potentially there could exist a dependence between these data references, and to ‘NULL_TREE’ if there exists a dependence relation between the data references, in which case the description of this dependence relation is given in the ‘subscripts’, ‘dir_vects’, and ‘dist_vects’ arrays,
   • a boolean that determines whether the dependence relation can be represented by a classical distance vector,
   • an array ‘subscripts’ that contains a description of each subscript of the data references. Given two array accesses, a subscript is the tuple composed of the access functions for a given dimension. For example, given ‘A[f1][f2][f3]’ and ‘B[g1][g2][g3]’, there are three subscripts: ‘(f1, g1), (f2, g2), (f3, g3)’.
   • two arrays ‘dir_vects’ and ‘dist_vects’ that contain classical representations of the data dependences under the form of direction and distance dependence vectors,
   • an array of loops ‘loop_nest’ that contains the loops to which the distance and direction vectors refer.

Several functions for pretty printing the information extracted by the data dependence analysis are available: ‘dump_ddrs’ prints with a maximum verbosity the details of a data dependence relations array, ‘dump_dist_dir_vectors’ prints only the classical distance and direction vectors for a data dependence relations array, and ‘dump_data_references’ prints the details of the data references contained in a data reference array.
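To make the notions of an independent pair and of a classical distance concrete, here is a small textbook-style test for one subscript whose two access functions have the same step. It is a sketch under stated assumptions: ‘struct toy_ddr’ and ‘toy_analyze_subscript’ are hypothetical names, the test is the classical equal-step check and is not claimed to be what ‘tree-data-ref.cc’ implements, and the loop is assumed to run for iterations ‘0’ to ‘niter - 1’.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy result for one pair of accesses in one loop.  */
struct toy_ddr
{
  bool dependent;  /* false plays the role of a proven independence */
  long distance;   /* iteration distance, meaningful only if dependent */
};

/* Compare the accesses A[base_a + step * i] and A[base_b + step * i]
   for 0 <= i < niter.  They touch the same element in iterations i1
   and i2 exactly when i2 - i1 == (base_a - base_b) / step.  */
struct toy_ddr
toy_analyze_subscript (long base_a, long base_b, long step, long niter)
{
  struct toy_ddr ddr = { false, 0 };
  assert (step != 0 && niter >= 0);

  long diff = base_a - base_b;
  if (diff % step != 0)
    return ddr;              /* no integer solution: independent */

  long dist = diff / step;
  if (labs (dist) >= niter)
    return ddr;              /* solution lies outside the iteration space */

  ddr.dependent = true;
  ddr.distance = dist;
  return ddr;
}
```

For a loop nest, one such distance per loop would be collected into a distance vector; this sketch handles a single loop only.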
File: gccint.info, Node: Machine Desc, Next: Target Macros, Prev: Loop Analysis and Representation, Up: Top 17 Machine Descriptions *********************** A machine description has two parts: a file of instruction patterns (‘.md’ file) and a C header file of macro definitions. The ‘.md’ file for a target machine contains a pattern for each instruction that the target machine supports (or at least each instruction that is worth telling the compiler about). It may also contain comments. A semicolon causes the rest of the line to be a comment, unless the semicolon is inside a quoted string. See the next chapter for information on the C header file. * Menu: * Overview:: How the machine description is used. * Patterns:: How to write instruction patterns. * Example:: An explained example of a ‘define_insn’ pattern. * RTL Template:: The RTL template defines what insns match a pattern. * Output Template:: The output template says how to make assembler code from such an insn. * Output Statement:: For more generality, write C code to output the assembler code. * Compact Syntax:: Compact syntax for writing machine descriptors. * Predicates:: Controlling what kinds of operands can be used for an insn. * Constraints:: Fine-tuning operand selection. * Standard Names:: Names mark patterns to use for code generation. * Pattern Ordering:: When the order of patterns makes a difference. * Dependent Patterns:: Having one pattern may make you need another. * Jump Patterns:: Special considerations for patterns for jump insns. * Looping Patterns:: How to define patterns for special looping insns. * Insn Canonicalizations::Canonicalization of Instructions * Expander Definitions::Generating a sequence of several RTL insns for a standard operation. * Insn Splitting:: Splitting Instructions into Multiple Instructions. * Including Patterns:: Including Patterns in Machine Descriptions. * Peephole Definitions::Defining machine-specific peephole optimizations. 
* Insn Attributes:: Specifying the value of attributes for generated insns. * Conditional Execution::Generating ‘define_insn’ patterns for predication. * Define Subst:: Generating ‘define_insn’ and ‘define_expand’ patterns from other patterns. * Constant Definitions::Defining symbolic constants that can be used in the md file. * Iterators:: Using iterators to generate patterns from a template.  File: gccint.info, Node: Overview, Next: Patterns, Up: Machine Desc 17.1 Overview of How the Machine Description is Used ==================================================== There are three main conversions that happen in the compiler: 1. The front end reads the source code and builds a parse tree. 2. The parse tree is used to generate an RTL insn list based on named instruction patterns. 3. The insn list is matched against the RTL templates to produce assembler code. For the generate pass, only the names of the insns matter, from either a named ‘define_insn’ or a ‘define_expand’. The compiler will choose the pattern with the right name and apply the operands according to the documentation later in this chapter, without regard for the RTL template or operand constraints. Note that the names the compiler looks for are hard-coded in the compiler--it will ignore unnamed patterns and patterns with names it doesn't know about, but if you don't provide a named pattern it needs, it will abort. If a ‘define_insn’ is used, the template given is inserted into the insn list. If a ‘define_expand’ is used, one of three things happens, based on the condition logic. The condition logic may manually create new insns for the insn list, say via ‘emit_insn()’, and invoke ‘DONE’. For certain named patterns, it may invoke ‘FAIL’ to tell the compiler to use an alternate way of performing that task. If it invokes neither ‘DONE’ nor ‘FAIL’, the template given in the pattern is inserted, as if the ‘define_expand’ were a ‘define_insn’. 
Once the insn list is generated, various optimization passes convert, replace, and rearrange the insns in the insn list. This is where the ‘define_split’ and ‘define_peephole’ patterns get used, for example.

Finally, the insn list's RTL is matched up with the RTL templates in the ‘define_insn’ patterns, and those patterns are used to emit the final assembly code. For this purpose, each named ‘define_insn’ acts like it's unnamed, since the names are ignored.

File: gccint.info, Node: Patterns, Next: Example, Prev: Overview, Up: Machine Desc

17.2 Everything about Instruction Patterns
==========================================

A ‘define_insn’ expression is used to define instruction patterns to which insns may be matched. A ‘define_insn’ expression contains an incomplete RTL expression, with pieces to be filled in later, operand constraints that restrict how the pieces can be filled in, and an output template or C code to generate the assembler output.

A ‘define_insn’ is an RTL expression containing four or five operands:

  1. An optional name N. When a name is present, the compiler automatically generates a C++ function ‘gen_N’ that takes the operands of the instruction as arguments and returns the instruction's rtx pattern. The compiler also assigns the instruction a unique code ‘CODE_FOR_N’, with all such codes belonging to an enum called ‘insn_code’.

     These names serve one of two purposes. The first is to indicate that the instruction performs a certain standard job for the RTL-generation pass of the compiler, such as a move, an addition, or a conditional jump. The second is to help the target generate certain target-specific operations, such as when implementing target-specific intrinsic functions.

     It is better to prefix target-specific names with the name of the target, to avoid any clash with current or future standard names.

     The absence of a name is indicated by writing an empty string where the name should go.
Nameless instruction patterns are never used for generating RTL code, but they may permit several simpler insns to be combined later on. For the purpose of debugging the compiler, you may also specify a name beginning with the ‘*’ character. Such a name is used only for identifying the instruction in RTL dumps; it is equivalent to having a nameless pattern for all other purposes. Names beginning with the ‘*’ character are not required to be unique. The name may also have the form ‘@N’. This has the same effect as a name ‘N’, but in addition tells the compiler to generate further helper functions; see *note Parameterized Names:: for details. 2. The “RTL template”: This is a vector of incomplete RTL expressions which describe the semantics of the instruction (*note RTL Template::). It is incomplete because it may contain ‘match_operand’, ‘match_operator’, and ‘match_dup’ expressions that stand for operands of the instruction. If the vector has multiple elements, the RTL template is treated as a ‘parallel’ expression. 3. The condition: This is a string which contains a C expression. When the compiler attempts to match RTL against a pattern, the condition is evaluated. If the condition evaluates to ‘true’, the match is permitted. The condition may be an empty string, which is treated as always ‘true’. For a named pattern, the condition may not depend on the data in the insn being matched, but only the target-machine-type flags. The compiler needs to test these conditions during initialization in order to learn exactly which named instructions are available in a particular run. For nameless patterns, the condition is applied only when matching an individual insn, and only after the insn has matched the pattern's recognition template. The insn's operands may be found in the vector ‘operands’. An instruction condition cannot become more restrictive as compilation progresses. 
If the condition accepts a particular RTL instruction at one stage of compilation, it must continue to accept that instruction until the final pass. For example, ‘!reload_completed’ and ‘can_create_pseudo_p ()’ are both invalid instruction conditions, because they are true during the earlier RTL passes and false during the later ones. For the same reason, if a condition accepts an instruction before register allocation, it cannot later try to control register allocation by excluding certain register or value combinations. Although a condition cannot become more restrictive as compilation progresses, the condition for a nameless pattern _can_ become more permissive. For example, a nameless instruction can require ‘reload_completed’ to be true, in which case it only matches after register allocation. 4. The “output template” or “output statement”: This is either a string, or a fragment of C code which returns a string. When simple substitution isn't general enough, you can specify a piece of C code to compute the output. *Note Output Statement::. 5. The “insn attributes”: This is an optional vector containing the values of attributes for insns matching this pattern (*note Insn Attributes::).  File: gccint.info, Node: Example, Next: RTL Template, Prev: Patterns, Up: Machine Desc 17.3 Example of ‘define_insn’ ============================= Here is an example of an instruction pattern, taken from the machine description for the 68000/68020. (define_insn "tstsi" [(set (cc0) (match_operand:SI 0 "general_operand" "rm"))] "" "* { if (TARGET_68020 || ! ADDRESS_REG_P (operands[0])) return \"tstl %0\"; return \"cmpl #0,%0\"; }") This can also be written using braced strings: (define_insn "tstsi" [(set (cc0) (match_operand:SI 0 "general_operand" "rm"))] "" { if (TARGET_68020 || ! ADDRESS_REG_P (operands[0])) return "tstl %0"; return "cmpl #0,%0"; }) This describes an instruction which sets the condition codes based on the value of a general operand. 
It has no condition, so any insn with an RTL description of the form shown may be matched to this pattern. The name ‘tstsi’ means "test a ‘SImode’ value" and tells the RTL generation pass that, when it is necessary to test such a value, an insn to do so can be constructed using this pattern. The output control string is a piece of C code which chooses which output template to return based on the kind of operand and the specific type of CPU for which code is being generated. ‘"rm"’ is an operand constraint. Its meaning is explained below.  File: gccint.info, Node: RTL Template, Next: Output Template, Prev: Example, Up: Machine Desc 17.4 RTL Template ================= The RTL template is used to define which insns match the particular pattern and how to find their operands. For named patterns, the RTL template also says how to construct an insn from specified operands. Construction involves substituting specified operands into a copy of the template. Matching involves determining the values that serve as the operands in the insn being matched. Both of these activities are controlled by special expression types that direct matching and substitution of the operands. ‘(match_operand:M N PREDICATE CONSTRAINT)’ This expression is a placeholder for operand number N of the insn. When constructing an insn, operand number N will be substituted at this point. When matching an insn, whatever appears at this position in the insn will be taken as operand number N; but it must satisfy PREDICATE or this instruction pattern will not match at all. Operand numbers must be chosen consecutively counting from zero in each instruction pattern. There may be only one ‘match_operand’ expression in the pattern for each operand number. Usually operands are numbered in the order of appearance in ‘match_operand’ expressions. In the case of a ‘define_expand’, any operand numbers used only in ‘match_dup’ expressions have higher values than all other operand numbers. 
PREDICATE is a string that is the name of a function that accepts two arguments, an expression and a machine mode. *Note Predicates::. During matching, the function will be called with the putative operand as the expression and M as the mode argument (if M is not specified, ‘VOIDmode’ will be used, which normally causes PREDICATE to accept any mode). If it returns zero, this instruction pattern fails to match. PREDICATE may be an empty string; then it means no test is to be done on the operand, so anything which occurs in this position is valid. Most of the time, PREDICATE will reject modes other than M--but not always. For example, the predicate ‘address_operand’ uses M as the mode of memory ref that the address should be valid for. Many predicates accept ‘const_int’ nodes even though their mode is ‘VOIDmode’. CONSTRAINT controls reloading and the choice of the best register class to use for a value, as explained later (*note Constraints::). If the constraint would be an empty string, it can be omitted. People are often unclear on the difference between the constraint and the predicate. The predicate helps decide whether a given insn matches the pattern. The constraint plays no role in this decision; instead, it controls various decisions in the case of an insn which does match. ‘(match_scratch:M N CONSTRAINT)’ This expression is also a placeholder for operand number N and indicates that operand must be a ‘scratch’ or ‘reg’ expression. When matching patterns, this is equivalent to (match_operand:M N "scratch_operand" CONSTRAINT) but, when generating RTL, it produces a (‘scratch’:M) expression. If the last few expressions in a ‘parallel’ are ‘clobber’ expressions whose operands are either a hard register or ‘match_scratch’, the combiner can add or delete them when necessary. *Note Side Effects::. ‘(match_dup N)’ This expression is also a placeholder for operand number N. It is used when the operand needs to appear more than once in the insn. 
In construction, ‘match_dup’ acts just like ‘match_operand’: the operand is substituted into the insn being constructed. But in matching, ‘match_dup’ behaves differently. It assumes that operand number N has already been determined by a ‘match_operand’ appearing earlier in the recognition template, and it matches only an identical-looking expression. Note that ‘match_dup’ should not be used to tell the compiler that a particular register is being used for two operands (example: ‘add’ that adds one register to another; the second register is both an input operand and the output operand). Use a matching constraint (*note Simple Constraints::) for those. ‘match_dup’ is for the cases where one operand is used in two places in the template, such as an instruction that computes both a quotient and a remainder, where the opcode takes two input operands but the RTL template has to refer to each of those twice; once for the quotient pattern and once for the remainder pattern. ‘(match_operator:M N PREDICATE [OPERANDS...])’ This pattern is a kind of placeholder for a variable RTL expression code. When constructing an insn, it stands for an RTL expression whose expression code is taken from that of operand N, and whose operands are constructed from the patterns OPERANDS. When matching an expression, it matches an expression if the function PREDICATE returns nonzero on that expression _and_ the patterns OPERANDS match the operands of the expression. 
Suppose that the function ‘commutative_operator’ is defined as follows, to match any expression whose operator is one of the commutative arithmetic operators of RTL and whose mode is MODE: int commutative_operator (rtx x, machine_mode mode) { enum rtx_code code = GET_CODE (x); if (GET_MODE (x) != mode) return 0; return (GET_RTX_CLASS (code) == RTX_COMM_ARITH || code == EQ || code == NE); } Then the following pattern will match any RTL expression consisting of a commutative operator applied to two general operands: (match_operator:SI 3 "commutative_operator" [(match_operand:SI 1 "general_operand" "g") (match_operand:SI 2 "general_operand" "g")]) Here the vector ‘[OPERANDS...]’ contains two patterns because the expressions to be matched all contain two operands. When this pattern does match, the two operands of the commutative operator are recorded as operands 1 and 2 of the insn. (This is done by the two instances of ‘match_operand’.) Operand 3 of the insn will be the entire commutative expression: use ‘GET_CODE (operands[3])’ to see which commutative operator was used. The machine mode M of ‘match_operator’ works like that of ‘match_operand’: it is passed as the second argument to the predicate function, and that function is solely responsible for deciding whether the expression to be matched "has" that mode. When constructing an insn, argument 3 of the gen-function will specify the operation (i.e. the expression code) for the expression to be made. It should be an RTL expression, whose expression code is copied into a new expression whose operands are arguments 1 and 2 of the gen-function. The subexpressions of argument 3 are not used; only its expression code matters. When ‘match_operator’ is used in a pattern for matching an insn, it is usually best if the operand number of the ‘match_operator’ is higher than that of the actual operands of the insn.
This improves register allocation because the register allocator often looks at operands 1 and 2 of insns to see if it can do register tying. There is no way to specify constraints in ‘match_operator’. The operand of the insn which corresponds to the ‘match_operator’ never has any constraints because it is never reloaded as a whole. However, if parts of its OPERANDS are matched by ‘match_operand’ patterns, those parts may have constraints of their own. ‘(match_op_dup:M N[OPERANDS...])’ Like ‘match_dup’, except that it applies to operators instead of operands. When constructing an insn, operand number N will be substituted at this point. But in matching, ‘match_op_dup’ behaves differently. It assumes that operand number N has already been determined by a ‘match_operator’ appearing earlier in the recognition template, and it matches only an identical-looking expression. ‘(match_parallel N PREDICATE [SUBPAT...])’ This pattern is a placeholder for an insn that consists of a ‘parallel’ expression with a variable number of elements. This expression should only appear at the top level of an insn pattern. When constructing an insn, operand number N will be substituted at this point. When matching an insn, it matches if the body of the insn is a ‘parallel’ expression with at least as many elements as the vector of SUBPAT expressions in the ‘match_parallel’, if each SUBPAT matches the corresponding element of the ‘parallel’, _and_ the function PREDICATE returns nonzero on the ‘parallel’ that is the body of the insn. It is the responsibility of the predicate to validate elements of the ‘parallel’ beyond those listed in the ‘match_parallel’. A typical use of ‘match_parallel’ is to match load and store multiple expressions, which can contain a variable number of elements in a ‘parallel’. 
For example, (define_insn "" [(match_parallel 0 "load_multiple_operation" [(set (match_operand:SI 1 "gpc_reg_operand" "=r") (match_operand:SI 2 "memory_operand" "m")) (use (reg:SI 179)) (clobber (reg:SI 179))])] "" "loadm 0,0,%1,%2") This example comes from ‘a29k.md’. The function ‘load_multiple_operation’ is defined in ‘a29k.c’ and checks that subsequent elements in the ‘parallel’ are the same as the ‘set’ in the pattern, except that they are referencing subsequent registers and memory locations. An insn that matches this pattern might look like: (parallel [(set (reg:SI 20) (mem:SI (reg:SI 100))) (use (reg:SI 179)) (clobber (reg:SI 179)) (set (reg:SI 21) (mem:SI (plus:SI (reg:SI 100) (const_int 4)))) (set (reg:SI 22) (mem:SI (plus:SI (reg:SI 100) (const_int 8))))]) ‘(match_par_dup N [SUBPAT...])’ Like ‘match_op_dup’, but for ‘match_parallel’ instead of ‘match_operator’.  File: gccint.info, Node: Output Template, Next: Output Statement, Prev: RTL Template, Up: Machine Desc 17.5 Output Templates and Operand Substitution ============================================== The “output template” is a string which specifies how to output the assembler code for an instruction pattern. Most of the template is a fixed string which is output literally. The character ‘%’ is used to specify where to substitute an operand; it can also be used to identify places where different variants of the assembler require different syntax. In the simplest case, a ‘%’ followed by a digit N says to output operand N at that point in the string. ‘%’ followed by a letter and a digit says to output an operand in an alternate fashion. Four letters have standard, built-in meanings described below. The machine description macro ‘PRINT_OPERAND’ can define additional letters with nonstandard meanings. ‘%cDIGIT’ can be used to substitute an operand that is a constant value without the syntax that normally indicates an immediate operand. 
‘%nDIGIT’ is like ‘%cDIGIT’ except that the value of the constant is negated before printing. ‘%aDIGIT’ can be used to substitute an operand as if it were a memory reference, with the actual operand treated as the address. This may be useful when outputting a "load address" instruction, because often the assembler syntax for such an instruction requires you to write the operand as if it were a memory reference. ‘%lDIGIT’ is used to substitute a ‘label_ref’ into a jump instruction. ‘%=’ outputs a number which is unique to each instruction in the entire compilation. This is useful for making local labels to be referred to more than once in a single template that generates multiple assembler instructions. ‘%’ followed by a punctuation character specifies a substitution that does not use an operand. Only one case is standard: ‘%%’ outputs a ‘%’ into the assembler code. Other nonstandard cases can be defined in the ‘PRINT_OPERAND’ macro. You must also define which punctuation characters are valid with the ‘PRINT_OPERAND_PUNCT_VALID_P’ macro. The template may generate multiple assembler instructions. Write the text for the instructions, with ‘\;’ between them. When the RTL contains two operands which are required by constraint to match each other, the output template must refer only to the lower-numbered operand. Matching operands are not always identical, and the rest of the compiler arranges to put the proper RTL expression for printing into the lower-numbered operand. One use of nonstandard letters or punctuation following ‘%’ is to distinguish between different assembler languages for the same machine; for example, Motorola syntax versus MIT syntax for the 68000. Motorola syntax requires periods in most opcode names, while MIT syntax does not. For example, the opcode ‘movel’ in MIT syntax is ‘move.l’ in Motorola syntax. 
The same file of patterns is used for both kinds of output syntax, but the character sequence ‘%.’ is used in each place where Motorola syntax wants a period. The ‘PRINT_OPERAND’ macro for Motorola syntax defines the sequence to output a period; the macro for MIT syntax defines it to do nothing. As a special case, a template consisting of the single character ‘#’ instructs the compiler to first split the insn, and then output the resulting instructions separately. This helps eliminate redundancy in the output templates. If you have a ‘define_insn’ that needs to emit multiple assembler instructions, and there is a matching ‘define_split’ already defined, then you can simply use ‘#’ as the output template instead of writing an output template that emits the multiple assembler instructions. Note that ‘#’ only has an effect while generating assembly code; it does not affect whether a split occurs earlier. An associated ‘define_split’ must exist and it must be suitable for use after register allocation. If the macro ‘ASSEMBLER_DIALECT’ is defined, you can use constructs of the form ‘{option0|option1|option2}’ in the templates. These describe multiple variants of assembler language syntax. *Note Instruction Output::.  File: gccint.info, Node: Output Statement, Next: Compact Syntax, Prev: Output Template, Up: Machine Desc 17.6 C Statements for Assembler Output ====================================== Often a single fixed template string cannot produce correct and efficient assembler code for all the cases that are recognized by a single instruction pattern. For example, the opcodes may depend on the kinds of operands; or some unfortunate combinations of operands may require extra machine instructions. If the output control string starts with a ‘@’, then it is actually a series of templates, each on a separate line. (Blank lines and leading spaces and tabs are ignored.) The templates correspond to the pattern's constraint alternatives (*note Multi-Alternative::).
For example, if a target machine has a two-address add instruction ‘addr’ to add into a register and another ‘addm’ to add a register to memory, you might write this pattern: (define_insn "addsi3" [(set (match_operand:SI 0 "general_operand" "=r,m") (plus:SI (match_operand:SI 1 "general_operand" "0,0") (match_operand:SI 2 "general_operand" "g,r")))] "" "@ addr %2,%0 addm %2,%0") If the output control string starts with a ‘*’, then it is not an output template but rather a piece of C program that should compute a template. It should execute a ‘return’ statement to return the template-string you want. Most such templates use C string literals, which require doublequote characters to delimit them. To include these doublequote characters in the string, prefix each one with ‘\’. If the output control string is written as a brace block instead of a double-quoted string, it is automatically assumed to be C code. In that case, it is not necessary to put in a leading asterisk, or to escape the doublequotes surrounding C string literals. The operands may be found in the array ‘operands’, whose C data type is ‘rtx []’. It is very common to select different ways of generating assembler code based on whether an immediate operand is within a certain range. Be careful when doing this, because the result of ‘INTVAL’ is an integer on the host machine. If the host machine has more bits in an ‘int’ than the target machine has in the mode in which the constant will be used, then some of the bits you get from ‘INTVAL’ will be superfluous. For proper results, you must carefully disregard the values of those bits. It is possible to output an assembler instruction and then go on to output or compute more of them, using the subroutine ‘output_asm_insn’. This receives two arguments: a template-string and a vector of operands. The vector may be ‘operands’, or it may be another array of ‘rtx’ that you declare locally and initialize yourself. 
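The caution above about ‘INTVAL’ can be made concrete with a small stand-alone sketch. It is only an illustration under stated assumptions: ‘sext_to_width’ and ‘pick_add_template’ are hypothetical helpers, not GCC functions, the opcodes ‘addshort’ and ‘addlong’ are made up, and a plain ‘int64_t’ stands in for the host integer that ‘INTVAL’ would return. The idea is to discard the host bits that lie outside the target mode before doing any range test:

```c
#include <stdint.h>

/* Hypothetical helper for illustration; it is not a GCC function.
   Keep only the low BITS bits of VALUE (1 <= BITS <= 64) and
   sign-extend the result, discarding any superfluous host bits.  */
static int64_t
sext_to_width (int64_t value, unsigned bits)
{
  if (bits >= 64)
    return value;
  uint64_t sign = (uint64_t) 1 << (bits - 1);
  uint64_t low = (uint64_t) value & ((sign << 1) - 1);
  /* XOR with the sign bit, then subtract it: the usual
     sign-extension idiom.  */
  return (int64_t) ((low ^ sign) - sign);
}

/* Choose between two made-up opcodes: "addshort" takes a signed
   4-bit immediate, "addlong" takes anything.  INTVAL_RESULT stands
   in for the host integer that INTVAL would return, and MODE_BITS
   for the width of the mode in which the constant is used.  */
static const char *
pick_add_template (int64_t intval_result, unsigned mode_bits)
{
  int64_t v = sext_to_width (intval_result, mode_bits);
  return (v >= -8 && v <= 7) ? "addshort %1,%0" : "addlong %1,%0";
}
```

With these definitions, ‘pick_add_template (0xFFFF, 16)’ treats the constant as -1 and selects the short form, whereas the same bits used in a 32-bit mode denote 65535 and select the long form. A range test applied directly to the untruncated host value would give the same answer in both cases, which is the mistake the paragraph above warns about.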
When an insn pattern has multiple alternatives in its constraints, often the appearance of the assembler code is determined mostly by which alternative was matched. When this is so, the C code can test the variable ‘which_alternative’, which is the ordinal number of the alternative that was actually satisfied (0 for the first, 1 for the second alternative, etc.). For example, suppose there are two opcodes for storing zero, ‘clrreg’ for registers and ‘clrmem’ for memory locations. Here is how a pattern could use ‘which_alternative’ to choose between them: (define_insn "" [(set (match_operand:SI 0 "general_operand" "=r,m") (const_int 0))] "" { return (which_alternative == 0 ? "clrreg %0" : "clrmem %0"); }) The example above, where the assembler code to generate was _solely_ determined by the alternative, could also have been specified as follows, having the output control string start with a ‘@’: (define_insn "" [(set (match_operand:SI 0 "general_operand" "=r,m") (const_int 0))] "" "@ clrreg %0 clrmem %0") If you just need a little bit of C code in one (or a few) alternatives, you can use ‘*’ inside of a ‘@’ multi-alternative template: (define_insn "" [(set (match_operand:SI 0 "general_operand" "=r,<,m") (const_int 0))] "" "@ clrreg %0 * return stack_mem_p (operands[0]) ? \"push 0\" : \"clrmem %0\"; clrmem %0")  File: gccint.info, Node: Compact Syntax, Next: Predicates, Prev: Output Statement, Up: Machine Desc 17.7 Compact Syntax =================== When a ‘define_insn’ or ‘define_insn_and_split’ has multiple alternatives it may be beneficial to use the compact syntax when specifying alternatives. This syntax puts the constraints and attributes on the same horizontal line as the instruction assembly template. 
As an example (define_insn_and_split "" [(set (match_operand:SI 0 "nonimmediate_operand" "=r,k,r,r,r,r") (match_operand:SI 1 "aarch64_mov_operand" " r,r,k,M,n,Usv"))] "" "@ mov\\t%w0, %w1 mov\\t%w0, %w1 mov\\t%w0, %w1 mov\\t%w0, %1 # * return aarch64_output_sve_cnt_immediate (\"cnt\", \"%x0\", operands[1]);" "&& true" [(const_int 0)] { aarch64_expand_mov_immediate (operands[0], operands[1]); DONE; } [(set_attr "type" "mov_reg,mov_reg,mov_reg,mov_imm,mov_imm,mov_imm") (set_attr "arch" "*,*,*,*,*,sve") (set_attr "length" "4,4,4,4,*, 4") ] ) can be better expressed as: (define_insn_and_split "" [(set (match_operand:SI 0 "nonimmediate_operand") (match_operand:SI 1 "aarch64_mov_operand"))] "" {@ [cons: =0, 1; attrs: type, arch, length] [r , r ; mov_reg , * , 4] mov\t%w0, %w1 [k , r ; mov_reg , * , 4] ^ [r , k ; mov_reg , * , 4] ^ [r , M ; mov_imm , * , 4] mov\t%w0, %1 [r , n ; mov_imm , * , *] # [r , Usv; mov_imm , sve , 4] << aarch64_output_sve_cnt_immediate ("cnt", "%x0", operands[1]); } "&& true" [(const_int 0)] { aarch64_expand_mov_immediate (operands[0], operands[1]); DONE; } ) The syntax rules are as follows: • Templates must start with ‘{@’ to use the new syntax. • ‘{@’ is followed by a layout in square brackets which is ‘cons:’ followed by a comma-separated list of ‘match_operand’/‘match_scratch’ operand numbers, then a semicolon, followed by the same for attributes (‘attrs:’). Operand modifiers like ‘=’ and ‘+’ can be placed before an operand number. Both sections are optional (so you can use only ‘cons’, or only ‘attrs’, or both), and ‘cons’ must come before ‘attrs’ if present. • Each alternative begins with any amount of whitespace. • Following the whitespace is a comma-separated list of "constraints" and/or "attributes" within brackets ‘[]’, with sections separated by a semicolon. • Should you want to copy the previous asm line, the symbol ‘^’ can be used. This reduces copy-pasting between alternatives and the number of lines to update on changes.
• When using C functions for output, the idiom ‘* return FUNCTION;’ can be replaced with the shorthand ‘<< FUNCTION;’. • Following the closing ‘]’ is any amount of whitespace, and then the actual asm output. • Spaces are allowed in the list (they will simply be removed). • All constraint alternatives should be specified. For example, a list of three blank alternatives should be written ‘[,,]’ rather than ‘[]’. • All attribute alternatives should be non-empty, with ‘*’ representing the default attribute value. For example, a list of three default attribute values should be written ‘[*,*,*]’ rather than ‘[]’. • Within an ‘{@’ block both multiline and singleline C comments are allowed, but when used outside of a C block they must be the only non-whitespace blocks on the line. • Within an ‘{@’ block, any iterators that do not get expanded will result in an error. If for some reason it is required to have ‘<’ or ‘>’ in the output then these must be escaped using ‘\’. • It is possible to use the ‘attrs’ list to specify some attributes and to use the normal ‘set_attr’ syntax to specify other attributes. There must not be any overlap between the two lists. In other words, the following is valid: (define_insn_and_split "" [(set (match_operand:SI 0 "nonimmediate_operand") (match_operand:SI 1 "aarch64_mov_operand"))] "" {@ [cons: 0, 1; attrs: type, arch, length]} ... [(set_attr "foo" "mov_imm")] ) but this is not valid: (define_insn_and_split "" [(set (match_operand:SI 0 "nonimmediate_operand") (match_operand:SI 1 "aarch64_mov_operand"))] "" {@ [cons: 0, 1; attrs: type, arch, length]} ... [(set_attr "arch" "bar") (set_attr "foo" "mov_imm")] ) because it specifies ‘arch’ twice.  
File: gccint.info, Node: Predicates, Next: Constraints, Prev: Compact Syntax, Up: Machine Desc 17.8 Predicates =============== A predicate determines whether a ‘match_operand’ or ‘match_operator’ expression matches, and therefore whether the surrounding instruction pattern will be used for that combination of operands. GCC has a number of machine-independent predicates, and you can define machine-specific predicates as needed. By convention, predicates used with ‘match_operand’ have names that end in ‘_operand’, and those used with ‘match_operator’ have names that end in ‘_operator’. All predicates are boolean functions (in the mathematical sense) of two arguments: the RTL expression that is being considered at that position in the instruction pattern, and the machine mode that the ‘match_operand’ or ‘match_operator’ specifies. In this section, the first argument is called OP and the second argument MODE. Predicates can be called from C as ordinary two-argument functions; this can be useful in output templates or other machine-specific code. Operand predicates can allow operands that are not actually acceptable to the hardware, as long as the constraints give reload the ability to fix them up (*note Constraints::). However, GCC will usually generate better code if the predicates specify the requirements of the machine instructions as closely as possible. Reload cannot fix up operands that must be constants ("immediate operands"); you must use a predicate that allows only constants, or else enforce the requirement in the extra condition. Most predicates handle their MODE argument in a uniform manner. If MODE is ‘VOIDmode’ (unspecified), then OP can have any mode. If MODE is anything else, then OP must have the same mode, unless OP is a ‘CONST_INT’ or integer ‘CONST_DOUBLE’. These RTL expressions always have ‘VOIDmode’, so it would be counterproductive to check that their mode matches. 
Instead, predicates that accept ‘CONST_INT’ and/or integer ‘CONST_DOUBLE’ check that the value stored in the constant will fit in the requested mode. Predicates with this behavior are called “normal”. ‘genrecog’ can optimize the instruction recognizer based on knowledge of how normal predicates treat modes. It can also diagnose certain kinds of common errors in the use of normal predicates; for instance, it is almost always an error to use a normal predicate without specifying a mode. Predicates that do something different with their MODE argument are called “special”. The generic predicates ‘address_operand’ and ‘pmode_register_operand’ are special predicates. ‘genrecog’ does not do any optimizations or diagnosis when special predicates are used. * Menu: * Machine-Independent Predicates:: Predicates available to all back ends. * Defining Predicates:: How to write machine-specific predicate functions.  File: gccint.info, Node: Machine-Independent Predicates, Next: Defining Predicates, Up: Predicates 17.8.1 Machine-Independent Predicates ------------------------------------- These are the generic predicates available to all back ends. They are defined in ‘recog.cc’. The first category of predicates allow only constant, or “immediate”, operands. -- Function: immediate_operand This predicate allows any sort of constant that fits in MODE. It is an appropriate choice for instructions that take operands that must be constant. -- Function: const_int_operand This predicate allows any ‘CONST_INT’ expression that fits in MODE. It is an appropriate choice for an immediate operand that does not allow a symbol or label. -- Function: const_double_operand This predicate accepts any ‘CONST_DOUBLE’ expression that has exactly MODE. If MODE is ‘VOIDmode’, it will also accept ‘CONST_INT’. It is intended for immediate floating point constants. The second category of predicates allow only some kind of machine register. 
-- Function: register_operand This predicate allows any ‘REG’ or ‘SUBREG’ expression that is valid for MODE. It is often suitable for arithmetic instruction operands on a RISC machine. -- Function: pmode_register_operand This is a slight variant on ‘register_operand’ which works around a limitation in the machine-description reader. (match_operand N "pmode_register_operand" CONSTRAINT) means exactly what (match_operand:P N "register_operand" CONSTRAINT) would mean, if the machine-description reader accepted ‘:P’ mode suffixes. Unfortunately, it cannot, because ‘Pmode’ is an alias for some other mode, and might vary with machine-specific options. *Note Misc::. -- Function: scratch_operand This predicate allows hard registers and ‘SCRATCH’ expressions, but not pseudo-registers. It is used internally by ‘match_scratch’; it should not be used directly. The third category of predicates allow only some kind of memory reference. -- Function: memory_operand This predicate allows any valid reference to a quantity of mode MODE in memory, as determined by the weak form of ‘GO_IF_LEGITIMATE_ADDRESS’ (*note Addressing Modes::). -- Function: address_operand This predicate is a little unusual; it allows any operand that is a valid expression for the _address_ of a quantity of mode MODE, again determined by the weak form of ‘GO_IF_LEGITIMATE_ADDRESS’. To first order, if ‘(mem:MODE (EXP))’ is acceptable to ‘memory_operand’, then EXP is acceptable to ‘address_operand’. Note that EXP does not necessarily have the mode MODE. -- Function: indirect_operand This is a stricter form of ‘memory_operand’ which allows only memory references with a ‘general_operand’ as the address expression. New uses of this predicate are discouraged, because ‘general_operand’ is very permissive, so it's hard to tell what an ‘indirect_operand’ does or does not allow. 
If a target has different requirements for memory operands for different instructions, it is better to define target-specific predicates which enforce the hardware's requirements explicitly. -- Function: push_operand This predicate allows a memory reference suitable for pushing a value onto the stack. This will be a ‘MEM’ which refers to ‘stack_pointer_rtx’, with a side effect in its address expression (*note Incdec::); which one is determined by the ‘STACK_PUSH_CODE’ macro (*note Frame Layout::). -- Function: pop_operand This predicate allows a memory reference suitable for popping a value off the stack. Again, this will be a ‘MEM’ referring to ‘stack_pointer_rtx’, with a side effect in its address expression. However, this time ‘STACK_POP_CODE’ is expected. The fourth category of predicates allow some combination of the above operands. -- Function: nonmemory_operand This predicate allows any immediate or register operand valid for MODE. -- Function: nonimmediate_operand This predicate allows any register or memory operand valid for MODE. -- Function: general_operand This predicate allows any immediate, register, or memory operand valid for MODE. Finally, there are two generic operator predicates. -- Function: comparison_operator This predicate matches any expression which performs an arithmetic comparison in MODE; that is, ‘COMPARISON_P’ is true for the expression code. -- Function: ordered_comparison_operator This predicate matches any expression which performs an arithmetic comparison in MODE and whose expression code is valid for integer modes; that is, the expression code will be one of ‘eq’, ‘ne’, ‘lt’, ‘ltu’, ‘le’, ‘leu’, ‘gt’, ‘gtu’, ‘ge’, ‘geu’.  File: gccint.info, Node: Defining Predicates, Prev: Machine-Independent Predicates, Up: Predicates 17.8.2 Defining Machine-Specific Predicates ------------------------------------------- Many machines have requirements for their operands that cannot be expressed precisely using the generic predicates. 
You can define additional predicates using ‘define_predicate’ and ‘define_special_predicate’ expressions. These expressions have three operands: • The name of the predicate, as it will be referred to in ‘match_operand’ or ‘match_operator’ expressions. • An RTL expression which evaluates to true if the predicate allows the operand OP, false if it does not. This expression can only use the following RTL codes: ‘MATCH_OPERAND’ When written inside a predicate expression, a ‘MATCH_OPERAND’ expression evaluates to true if the predicate it names would allow OP. The operand number and constraint are ignored. Due to limitations in ‘genrecog’, you can only refer to generic predicates and predicates that have already been defined. ‘MATCH_CODE’ This expression evaluates to true if OP or a specified subexpression of OP has one of a given list of RTX codes. The first operand of this expression is a string constant containing a comma-separated list of RTX code names (in lower case). These are the codes for which the ‘MATCH_CODE’ will be true. The second operand is a string constant which indicates what subexpression of OP to examine. If it is absent or the empty string, OP itself is examined. Otherwise, the string constant must be a sequence of digits and/or lowercase letters. Each character indicates a subexpression to extract from the current expression; for the first character this is OP, for the second and subsequent characters it is the result of the previous character. A digit N extracts ‘XEXP (E, N)’; a letter L extracts ‘XVECEXP (E, 0, N)’ where N is the alphabetic ordinal of L (0 for 'a', 1 for 'b', and so on). The ‘MATCH_CODE’ then examines the RTX code of the subexpression extracted by the complete string. It is not possible to extract components of an ‘rtvec’ that is not at position 0 within its RTX object. ‘MATCH_TEST’ This expression has one operand, a string constant containing a C expression. 
The predicate's arguments, OP and MODE, are available with those names in the C expression. The ‘MATCH_TEST’ evaluates to true if the C expression evaluates to a nonzero value. ‘MATCH_TEST’ expressions must not have side effects. ‘AND’ ‘IOR’ ‘NOT’ ‘IF_THEN_ELSE’ The basic ‘MATCH_’ expressions can be combined using these logical operators, which have the semantics of the C operators ‘&&’, ‘||’, ‘!’, and ‘? :’ respectively. As in Common Lisp, you may give an ‘AND’ or ‘IOR’ expression an arbitrary number of arguments; this has exactly the same effect as writing a chain of two-argument ‘AND’ or ‘IOR’ expressions. • An optional block of C code, which should execute ‘return true’ if the predicate is found to match and ‘return false’ if it does not. It must not have any side effects. The predicate arguments, OP and MODE, are available with those names. If a code block is present in a predicate definition, then the RTL expression must evaluate to true _and_ the code block must execute ‘return true’ for the predicate to allow the operand. The RTL expression is evaluated first; do not re-check anything in the code block that was checked in the RTL expression. The program ‘genrecog’ scans ‘define_predicate’ and ‘define_special_predicate’ expressions to determine which RTX codes are possibly allowed. You should always make this explicit in the RTL predicate expression, using ‘MATCH_OPERAND’ and ‘MATCH_CODE’. Here is an example of a simple predicate definition, from the IA64 machine description: ;; True if OP is a ‘SYMBOL_REF’ which refers to the sdata section. (define_predicate "small_addr_symbolic_operand" (and (match_code "symbol_ref") (match_test "SYMBOL_REF_SMALL_ADDR_P (op)"))) And here is another, showing the use of the C block. ;; True if OP is a register operand that is (or could be) a GR reg. 
(define_predicate "gr_register_operand" (match_operand 0 "register_operand") { unsigned int regno; if (GET_CODE (op) == SUBREG) op = SUBREG_REG (op); regno = REGNO (op); return (regno >= FIRST_PSEUDO_REGISTER || GENERAL_REGNO_P (regno)); }) Predicates written with ‘define_predicate’ automatically include a test that MODE is ‘VOIDmode’, or OP has the same mode as MODE, or OP is a ‘CONST_INT’ or ‘CONST_DOUBLE’. They do _not_ check specifically for integer ‘CONST_DOUBLE’, nor do they test that the value of either kind of constant fits in the requested mode. This is because target-specific predicates that take constants usually have to do more stringent value checks anyway. If you need the exact same treatment of ‘CONST_INT’ or ‘CONST_DOUBLE’ that the generic predicates provide, use a ‘MATCH_OPERAND’ subexpression to call ‘const_int_operand’, ‘const_double_operand’, or ‘immediate_operand’. Predicates written with ‘define_special_predicate’ do not get any automatic mode checks, and are treated as having special mode handling by ‘genrecog’. The program ‘genpreds’ is responsible for generating code to test predicates. It also writes a header file containing function declarations for all machine-specific predicates. It is not necessary to declare these predicates in ‘CPU-protos.h’.  File: gccint.info, Node: Constraints, Next: Standard Names, Prev: Predicates, Up: Machine Desc 17.9 Operand Constraints ======================== Each ‘match_operand’ in an instruction pattern can specify constraints for the operands allowed. The constraints allow you to fine-tune matching within the set of operands allowed by the predicate. Constraints can say whether an operand may be in a register, and which kinds of register; whether the operand can be a memory reference, and which kinds of address; whether the operand may be an immediate constant, and which possible values it may have. Constraints can also require two operands to match. 
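As a rough illustration of these roles, the following GNU C sketch uses constraints in inline ‘asm’ with an empty template, so that only the operand placement requested by the constraints has any effect. It is not taken from GCC itself, and the function names ‘tie_to_output’ and ‘through_memory’ are hypothetical. It assumes a compiler that accepts GNU extended ‘asm’.

```c
#include <assert.h>

/* Hypothetical helper: pass IN through an empty asm template.
   "=r" says operand 0 is written and must live in a general register.
   "0"  says operand 1 must occupy the same location as operand 0,
   so the (empty) instruction sees one operand filling two roles.  */
static int tie_to_output(int in)
{
    int out;
    __asm__ ("" : "=r" (out) : "0" (in));
    return out;
}

/* Hypothetical helper: "+m" says the operand is both read and
   written and must be a memory reference, so V is kept in memory
   across the (empty) instruction.  */
static int through_memory(int v)
{
    __asm__ volatile ("" : "+m" (v));
    return v;
}
```

Because the template is empty, ‘tie_to_output (42)’ simply yields 42 and ‘through_memory (7)’ yields 7; the point of the sketch is only that the compiler must arrange the operands as the constraint strings demand.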
Side effects aren't allowed in operands of inline ‘asm’, unless ‘<’ or ‘>’ constraints are used, because there is no guarantee that the side effects will happen exactly once in an instruction that can update the addressing register. * Menu: * Simple Constraints:: Basic use of constraints. * Multi-Alternative:: When an insn has two alternative constraint-patterns. * Class Preferences:: Constraints guide which hard register to put things in. * Modifiers:: More precise control over effects of constraints. * Machine Constraints:: Existing constraints for some particular machines. * Disable Insn Alternatives:: Disable insn alternatives using attributes. * Define Constraints:: How to define machine-specific constraints. * C Constraint Interface:: How to test constraints from C code.  File: gccint.info, Node: Simple Constraints, Next: Multi-Alternative, Up: Constraints 17.9.1 Simple Constraints ------------------------- The simplest kind of constraint is a string full of letters, each of which describes one kind of operand that is permitted. Here are the letters that are allowed: whitespace Whitespace characters are ignored and can be inserted at any position except the first. This enables each alternative for different operands to be visually aligned in the machine description even if they have a different number of constraints and modifiers. ‘m’ A memory operand is allowed, with any kind of address that the machine supports in general. Note that the letter used for the general memory constraint can be re-defined by a back end using the ‘TARGET_MEM_CONSTRAINT’ macro. ‘o’ A memory operand is allowed, but only if the address is “offsettable”. This means that a small integer (actually, the width in bytes of the operand, as determined by its machine mode) may be added to the address and the result is also a valid memory address.
For example, an address which is constant is offsettable; so is an address that is the sum of a register and a constant (as long as a slightly larger constant is also within the range of address-offsets supported by the machine); but an autoincrement or autodecrement address is not offsettable. More complicated indirect/indexed addresses may or may not be offsettable depending on the other addressing modes that the machine supports. Note that in an output operand which can be matched by another operand, the constraint letter ‘o’ is valid only when accompanied by both ‘<’ (if the target machine has predecrement addressing) and ‘>’ (if the target machine has preincrement addressing). ‘V’ A memory operand that is not offsettable. In other words, anything that would fit the ‘m’ constraint but not the ‘o’ constraint. ‘<’ A memory operand with autodecrement addressing (either predecrement or postdecrement) is allowed. In inline ‘asm’ this constraint is only allowed if the operand is used exactly once in an instruction that can handle the side effects. Not using an operand with ‘<’ in constraint string in the inline ‘asm’ pattern at all or using it in multiple instructions isn't valid, because the side effects wouldn't be performed or would be performed more than once. Furthermore, on some targets the operand with ‘<’ in constraint string must be accompanied by special instruction suffixes like ‘%U0’ instruction suffix on PowerPC or ‘%P0’ on IA-64. ‘>’ A memory operand with autoincrement addressing (either preincrement or postincrement) is allowed. In inline ‘asm’ the same restrictions as for ‘<’ apply. ‘r’ A register operand is allowed provided that it is in a general register. ‘i’ An immediate integer operand (one with constant value) is allowed. This includes symbolic constants whose values will be known only at assembly time or later. ‘n’ An immediate integer operand with a known numeric value is allowed. 
Many systems cannot support assembly-time constants for operands less than a word wide. Constraints for these operands should use ‘n’ rather than ‘i’. ‘I’, ‘J’, ‘K’, ... ‘P’ Other letters in the range ‘I’ through ‘P’ may be defined in a machine-dependent fashion to permit immediate integer operands with explicit integer values in specified ranges. For example, on the 68000, ‘I’ is defined to stand for the range of values 1 to 8. This is the range permitted as a shift count in the shift instructions. ‘E’ An immediate floating operand (expression code ‘const_double’) is allowed, but only if the target floating point format is the same as that of the host machine (on which the compiler is running). ‘F’ An immediate floating operand (expression code ‘const_double’ or ‘const_vector’) is allowed. ‘G’, ‘H’ ‘G’ and ‘H’ may be defined in a machine-dependent fashion to permit immediate floating operands in particular ranges of values. ‘s’ An immediate integer operand whose value is not an explicit integer is allowed. This might appear strange; if an insn allows a constant operand with a value not known at compile time, it certainly must allow any known value. So why use ‘s’ instead of ‘i’? Sometimes it allows better code to be generated. For example, on the 68000 in a fullword instruction it is possible to use an immediate operand; but if the immediate value is between −128 and 127, better code results from loading the value into a register and using the register. This is because the load into the register can be done with a ‘moveq’ instruction. We arrange for this to happen by defining the letter ‘K’ to mean "any integer outside the range −128 to 127", and then specifying ‘Ks’ in the operand constraints. ‘g’ Any register, memory or immediate integer operand is allowed, except for registers that are not general registers. ‘X’ Any operand whatsoever is allowed, even if it does not satisfy ‘general_operand’. 
This is normally used in the constraint of a ‘match_scratch’ when certain alternatives will not actually require a scratch register. ‘0’, ‘1’, ‘2’, ... ‘9’ An operand that matches the specified operand number is allowed. If a digit is used together with letters within the same alternative, the digit should come last. This number is allowed to be more than a single digit. If multiple digits are encountered consecutively, they are interpreted as a single decimal integer. There is scant chance for ambiguity, since to-date it has never been desirable that ‘10’ be interpreted as matching either operand 1 _or_ operand 0. Should this be desired, one can use multiple alternatives instead. This is called a “matching constraint” and what it really means is that the assembler has only a single operand that fills two roles considered separate in the RTL insn. For example, an add insn has two input operands and one output operand in the RTL, but on most CISC machines an add instruction really has only two operands, one of them an input-output operand: addl #35,r12 Matching constraints are used in these circumstances. More precisely, the two operands that match must include one input-only operand and one output-only operand. Moreover, the digit must be a smaller number than the number of the operand that uses it in the constraint. For operands to match in a particular case usually means that they are identical-looking RTL expressions. But in a few special cases specific kinds of dissimilarity are allowed. For example, ‘*x’ as an input operand will match ‘*x++’ as an output operand. For proper results in such cases, the output template should always use the output-operand's number when printing the operand. ‘p’ An operand that is a valid memory address is allowed. This is for "load address" and "push address" instructions. ‘p’ in the constraint must be accompanied by ‘address_operand’ as the predicate in the ‘match_operand’. 
This predicate interprets the mode specified in the ‘match_operand’ as the mode of the memory reference for which the address would be valid. OTHER-LETTERS Other letters can be defined in machine-dependent fashion to stand for particular classes of registers or other arbitrary operand types. ‘d’, ‘a’ and ‘f’ are defined on the 68000/68020 to stand for data, address and floating point registers. In order to have valid assembler code, each operand must satisfy its constraint. But a failure to do so does not prevent the pattern from applying to an insn. Instead, it directs the compiler to modify the code so that the constraint will be satisfied. Usually this is done by copying an operand into a register. Contrast, therefore, the two instruction patterns that follow: (define_insn "" [(set (match_operand:SI 0 "general_operand" "=r") (plus:SI (match_dup 0) (match_operand:SI 1 "general_operand" "r")))] "" "...") which has two operands, one of which must appear in two places, and (define_insn "" [(set (match_operand:SI 0 "general_operand" "=r") (plus:SI (match_operand:SI 1 "general_operand" "0") (match_operand:SI 2 "general_operand" "r")))] "" "...") which has three operands, two of which are required by a constraint to be identical. If we are considering an insn of the form (insn N PREV NEXT (set (reg:SI 3) (plus:SI (reg:SI 6) (reg:SI 109))) ...) the first pattern would not apply at all, because this insn does not contain two identical subexpressions in the right place. The pattern would say, "That does not look like an add instruction; try other patterns". The second pattern would say, "Yes, that's an add instruction, but there is something wrong with it". It would direct the reload pass of the compiler to generate additional insns to make the constraint true. The results might look like this: (insn N2 PREV N (set (reg:SI 3) (reg:SI 6)) ...) (insn N N2 NEXT (set (reg:SI 3) (plus:SI (reg:SI 3) (reg:SI 109))) ...) 
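The fix-up shown above can be modeled with a small C sketch. This is a toy model under stated assumptions, not the reload pass itself: ‘struct toy_insn’ and ‘toy_reload_matching’ are hypothetical names, registers are plain integers, and the only strategy modeled is copying the input operand into the output register.

```c
#include <assert.h>

/* Hypothetical toy insn.  When is_copy is zero it stands for
   (set (reg dst) (plus (reg src1) (reg src2))); when nonzero it
   stands for (set (reg dst) (reg src1)) and src2 is unused.  */
struct toy_insn {
    int is_copy;
    int dst, src1, src2;
};

/* Make ADD satisfy a matching constraint that ties src1 to dst.
   Writes one or two insns to OUT and returns how many were written,
   or -1 when the simple copy strategy would destroy src2.  */
static int toy_reload_matching(struct toy_insn add, struct toy_insn out[2])
{
    assert(!add.is_copy);
    if (add.src1 == add.dst) {
        /* Constraint already satisfied: nothing to add.  */
        out[0] = add;
        return 1;
    }
    if (add.src2 == add.dst)
        return -1;              /* copying into dst would clobber src2 */

    /* Emit (set (reg dst) (reg src1)) ahead of the add ...  */
    out[0].is_copy = 1;
    out[0].dst = add.dst;
    out[0].src1 = add.src1;
    out[0].src2 = -1;
    /* ... then rewrite the add so operand 1 is identical to operand 0.  */
    out[1] = add;
    out[1].src1 = add.dst;
    return 2;
}
```

Given the insn from the text, with ‘dst’ 3, ‘src1’ 6 and ‘src2’ 109, the sketch produces a copy of register 6 into register 3 followed by an add of registers 3 and 109 into register 3, which mirrors the two insns N2 and N shown above.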
It is up to you to make sure that each operand, in each pattern, has constraints that can handle any RTL expression that could be present for that operand. (When multiple alternatives are in use, each pattern must, for each possible combination of operand expressions, have at least one alternative which can handle that combination of operands.) The constraints don't need to _allow_ any possible operand--when this is the case, they do not constrain--but they must at least point the way to reloading any possible operand so that it will fit. • If the constraint accepts whatever operands the predicate permits, there is no problem: reloading is never necessary for this operand. For example, an operand whose constraints permit everything except registers is safe provided its predicate rejects registers. An operand whose predicate accepts only constant values is safe provided its constraints include the letter ‘i’. If any possible constant value is accepted, then nothing less than ‘i’ will do; if the predicate is more selective, then the constraints may also be more selective. • Any operand expression can be reloaded by copying it into a register. So if an operand's constraints allow some kind of register, it is certain to be safe. It need not permit all classes of registers; the compiler knows how to copy a register into another register of the proper class in order to make an instruction valid. • A nonoffsettable memory reference can be reloaded by copying the address into a register. So if the constraint uses the letter ‘o’, all memory references are taken care of. • A constant operand can be reloaded by allocating space in memory to hold it as preinitialized data. Then the memory reference can be used in place of the constant. So if the constraint uses the letters ‘o’ or ‘m’, constant operands are not a problem. 
• If the constraint permits a constant and a pseudo register used in an insn was not allocated to a hard register and is equivalent to a constant, the register will be replaced with the constant. If the predicate does not permit a constant and the insn is re-recognized for some reason, the compiler will crash. Thus the predicate must always recognize any objects allowed by the constraint. If the operand's predicate can recognize registers, but the constraint does not permit them, it can make the compiler crash. When this operand happens to be a register, the reload pass will be stymied, because it does not know how to copy a register temporarily into memory. If the predicate accepts a unary operator, the constraint applies to the operand of that operator. For example, the MIPS processor at ISA level 3 supports an instruction which adds two registers in ‘SImode’ to produce a ‘DImode’ result, but only if the registers are correctly sign extended. The predicate for the input operands accepts a ‘sign_extend’ of an ‘SImode’ register. Write the constraint to indicate the type of register that is required for the operand of the ‘sign_extend’.  File: gccint.info, Node: Multi-Alternative, Next: Class Preferences, Prev: Simple Constraints, Up: Constraints 17.9.2 Multiple Alternative Constraints --------------------------------------- Sometimes a single instruction has multiple alternative sets of possible operands. For example, on the 68000, a logical-or instruction can combine a register or an immediate value into memory, or it can combine any kind of operand into a register; but it cannot combine one memory location into another. These constraints are represented as multiple alternatives. An alternative can be described by a series of letters for each operand. The overall constraint for an operand is made from the letters for this operand from the first alternative, a comma, the letters for this operand from the second alternative, a comma, and so on until the last alternative.
All operands for a single instruction must have the same number of alternatives. Here is how it is done for fullword logical-or on the 68000: (define_insn "iorsi3" [(set (match_operand:SI 0 "general_operand" "=m,d") (ior:SI (match_operand:SI 1 "general_operand" "%0,0") (match_operand:SI 2 "general_operand" "dKs,dmKs")))] ...) The first alternative has ‘m’ (memory) for operand 0, ‘0’ for operand 1 (meaning it must match operand 0), and ‘dKs’ for operand 2. The second alternative has ‘d’ (data register) for operand 0, ‘0’ for operand 1, and ‘dmKs’ for operand 2. The ‘=’ and ‘%’ in the constraints apply to all the alternatives; their meaning is explained in a later section (*note Modifiers::). If all the operands fit any one alternative, the instruction is valid. Otherwise, for each alternative, the compiler counts how many instructions must be added to copy the operands so that that alternative applies. The alternative requiring the least copying is chosen. If two alternatives need the same amount of copying, the one that comes first is chosen. These choices can be altered with the ‘?’ and ‘!’ characters: ‘?’ Disparage slightly the alternative that the ‘?’ appears in, as a choice when no alternative applies exactly. The compiler regards this alternative as one unit more costly for each ‘?’ that appears in it. ‘!’ Disparage severely the alternative that the ‘!’ appears in. This alternative can still be used if it fits without reloading, but if reloading is needed, some other alternative will be used. ‘^’ This constraint is analogous to ‘?’ but it disparages slightly the alternative only if the operand with the ‘^’ needs a reload. ‘$’ This constraint is analogous to ‘!’ but it disparages severely the alternative only if the operand with the ‘$’ needs a reload. When an insn pattern has multiple alternatives in its constraints, often the appearance of the assembler code is determined mostly by which alternative was matched. 
When this is so, the C code for writing the assembler code can use the variable ‘which_alternative’, which is the ordinal number of the alternative that was actually satisfied (0 for the first, 1 for the second alternative, etc.). *Note Output Statement::.  File: gccint.info, Node: Class Preferences, Next: Modifiers, Prev: Multi-Alternative, Up: Constraints 17.9.3 Register Class Preferences --------------------------------- The operand constraints have another function: they enable the compiler to decide which kind of hardware register a pseudo register is best allocated to. The compiler examines the constraints that apply to the insns that use the pseudo register, looking for the machine-dependent letters such as ‘d’ and ‘a’ that specify classes of registers. The pseudo register is put in whichever class gets the most "votes". The constraint letters ‘g’ and ‘r’ also vote: they vote in favor of a general register. The machine description says which registers are considered general. Of course, on some machines all registers are equivalent, and no register classes are defined. Then none of this complexity is relevant.  File: gccint.info, Node: Modifiers, Next: Machine Constraints, Prev: Class Preferences, Up: Constraints 17.9.4 Constraint Modifier Characters ------------------------------------- Here are constraint modifier characters. ‘=’ Means that this operand is written to by this instruction: the previous value is discarded and replaced by new data. ‘+’ Means that this operand is both read and written by the instruction. When the compiler fixes up the operands to satisfy the constraints, it needs to know which operands are read by the instruction and which are written by it. ‘=’ identifies an operand which is only written; ‘+’ identifies an operand that is both read and written; all other operands are assumed to only be read. If you specify ‘=’ or ‘+’ in a constraint, you put it in the first character of the constraint string. 
‘&’ Means (in a particular alternative) that this operand is an “earlyclobber” operand, which is written before the instruction is finished using the input operands. Therefore, this operand may not lie in a register that is read by the instruction or as part of any memory address. ‘&’ applies only to the alternative in which it is written. In constraints with multiple alternatives, sometimes one alternative requires ‘&’ while others do not. See, for example, the ‘movdf’ insn of the 68000. An operand which is read by the instruction can be tied to an earlyclobber operand if its only use as an input occurs before the early result is written. Adding alternatives of this form often allows GCC to produce better code when only some of the read operands can be affected by the earlyclobber. See, for example, the ‘mulsi3’ insn of the ARM. Furthermore, if the “earlyclobber” operand is also a read/write operand, then that operand is written only after it's used. ‘&’ does not obviate the need to write ‘=’ or ‘+’. As “earlyclobber” operands are always written, a read-only “earlyclobber” operand is ill-formed and will be rejected by the compiler. ‘%’ Declares the instruction to be commutative for this operand and the following operand. This means that the compiler may interchange the two operands if that is the cheapest way to make all operands fit the constraints. ‘%’ applies to all alternatives and must appear as the first character in the constraint. Only read-only operands can use ‘%’. This is often used in patterns for addition instructions that really have only two operands: the result must go in one of the arguments. Here for example, is how the 68000 halfword-add instruction is defined: (define_insn "addhi3" [(set (match_operand:HI 0 "general_operand" "=m,r") (plus:HI (match_operand:HI 1 "general_operand" "%0,0") (match_operand:HI 2 "general_operand" "di,g")))] ...) GCC can only handle one commutative pair in an asm; if you use more, the compiler may fail. 
Note that you need not use the modifier if the two alternatives are strictly identical; this would only waste time in the reload pass. The modifier is not operational after register allocation, so the result of ‘define_peephole2’ and ‘define_split’s performed after reload cannot rely on ‘%’ to make the intended insn match. ‘#’ Says that all following characters, up to the next comma, are to be ignored as a constraint. They are significant only for choosing register preferences. ‘*’ Says that the following character should be ignored when choosing register preferences. ‘*’ has no effect on the meaning of the constraint as a constraint, and no effect on reloading. For LRA ‘*’ additionally disparages slightly the alternative if the following character matches the operand. Here is an example: the 68000 has an instruction to sign-extend a halfword in a data register, and can also sign-extend a value by copying it into an address register. While either kind of register is acceptable, the constraints on an address-register destination are less strict, so it is best if register allocation makes an address register its goal. Therefore, ‘*’ is used so that the ‘d’ constraint letter (for data register) is ignored when computing register preferences. (define_insn "extendhisi2" [(set (match_operand:SI 0 "general_operand" "=*d,a") (sign_extend:SI (match_operand:HI 1 "general_operand" "0,g")))] ...)  File: gccint.info, Node: Machine Constraints, Next: Disable Insn Alternatives, Prev: Modifiers, Up: Constraints 17.9.5 Constraints for Particular Machines ------------------------------------------ Whenever possible, you should use the general-purpose constraint letters in ‘asm’ arguments, since they will convey meaning more readily to people reading your code. Failing that, use the constraint letters that usually have very similar meanings across architectures. 
The most commonly used constraints are ‘m’ and ‘r’ (for memory and general-purpose registers respectively; *note Simple Constraints::), and ‘I’, usually the letter indicating the most common immediate-constant format. Each architecture defines additional constraints. These constraints are used by the compiler itself for instruction generation, as well as for ‘asm’ statements; therefore, some of the constraints are not particularly useful for ‘asm’. Here is a summary of some of the machine-dependent constraints available on some particular machines; it includes both constraints that are useful for ‘asm’ and constraints that aren't. The compiler source file mentioned in the table heading for each architecture is the definitive reference for the meanings of that architecture's constraints. _AArch64 family--‘config/aarch64/constraints.md’_ ‘k’ The stack pointer register (‘SP’) ‘w’ Floating point register, Advanced SIMD vector register or SVE vector register ‘x’ Like ‘w’, but restricted to registers 0 to 15 inclusive. ‘y’ Like ‘w’, but restricted to registers 0 to 7 inclusive. ‘Upl’ One of the low eight SVE predicate registers (‘P0’ to ‘P7’) ‘Upa’ Any of the SVE predicate registers (‘P0’ to ‘P15’) ‘I’ Integer constant that is valid as an immediate operand in an ‘ADD’ instruction ‘J’ Integer constant that is valid as an immediate operand in a ‘SUB’ instruction (once negated) ‘K’ Integer constant that can be used with a 32-bit logical instruction ‘L’ Integer constant that can be used with a 64-bit logical instruction ‘M’ Integer constant that is valid as an immediate operand in a 32-bit ‘MOV’ pseudo instruction. 
The ‘MOV’ may be assembled to one of several different machine instructions depending on the value ‘N’ Integer constant that is valid as an immediate operand in a 64-bit ‘MOV’ pseudo instruction ‘S’ An absolute symbolic address or a label reference ‘Y’ Floating point constant zero ‘Z’ Integer constant zero ‘Ush’ The high part (bits 12 and upwards) of the pc-relative address of a symbol within 4GB of the instruction ‘Q’ A memory address which uses a single base register with no offset ‘Ump’ A memory address suitable for a load/store pair instruction in SI, DI, SF and DF modes _AMD GCN --‘config/gcn/constraints.md’_ ‘I’ Immediate integer in the range −16 to 64 ‘J’ Immediate 16-bit signed integer ‘Kf’ Immediate constant −1 ‘L’ Immediate 15-bit unsigned integer ‘A’ Immediate constant that can be inlined in an instruction encoding: integer −16..64, or float 0.0, +/−0.5, +/−1.0, +/−2.0, +/−4.0, 1.0/(2.0*PI) ‘B’ Immediate 32-bit signed integer that can be attached to an instruction encoding ‘C’ Immediate 32-bit integer in range −16..4294967295 (i.e. 
32-bit unsigned integer or ‘A’ constraint) ‘DA’ Immediate 64-bit constant that can be split into two ‘A’ constants ‘DB’ Immediate 64-bit constant that can be split into two ‘B’ constants ‘U’ Any ‘unspec’ ‘Y’ Any ‘symbol_ref’ or ‘label_ref’ ‘v’ VGPR register ‘a’ Accelerator VGPR register (CDNA1 onwards) ‘Sg’ SGPR register ‘SD’ SGPR registers valid for instruction destinations, including VCC, M0 and EXEC ‘SS’ SGPR registers valid for instruction sources, including VCC, M0, EXEC and SCC ‘Sm’ SGPR registers valid as a source for scalar memory instructions (excludes M0 and EXEC) ‘Sv’ SGPR registers valid as a source or destination for vector instructions (excludes EXEC) ‘ca’ All condition registers: SCC, VCCZ, EXECZ ‘cs’ Scalar condition register: SCC ‘cV’ Vector condition register: VCC, VCC_LO, VCC_HI ‘e’ EXEC register (EXEC_LO and EXEC_HI) ‘RB’ Memory operand with address space suitable for ‘buffer_*’ instructions ‘RF’ Memory operand with address space suitable for ‘flat_*’ instructions ‘RS’ Memory operand with address space suitable for ‘s_*’ instructions ‘RL’ Memory operand with address space suitable for ‘ds_*’ LDS instructions ‘RG’ Memory operand with address space suitable for ‘ds_*’ GDS instructions ‘RD’ Memory operand with address space suitable for any ‘ds_*’ instructions ‘RM’ Memory operand with address space suitable for ‘global_*’ instructions _ARC --‘config/arc/constraints.md’_ ‘q’ Registers usable in ARCompact 16-bit instructions: ‘r0’-‘r3’, ‘r12’-‘r15’. This constraint can only match when the ‘-mq’ option is in effect. ‘e’ Registers usable as base-regs of memory addresses in ARCompact 16-bit memory instructions: ‘r0’-‘r3’, ‘r12’-‘r15’, ‘sp’. This constraint can only match when the ‘-mq’ option is in effect. ‘D’ ARC FPX (dpfp) 64-bit registers. ‘D0’, ‘D1’. ‘I’ A signed 12-bit integer constant. ‘Cal’ constant for arithmetic/logical operations. 
This might be any constant that can be put into a long immediate by the assembler or linker without involving a PIC relocation. ‘K’ A 3-bit unsigned integer constant. ‘L’ A 6-bit unsigned integer constant. ‘CnL’ One's complement of a 6-bit unsigned integer constant. ‘CmL’ Two's complement of a 6-bit unsigned integer constant. ‘M’ A 5-bit unsigned integer constant. ‘O’ A 7-bit unsigned integer constant. ‘P’ An 8-bit unsigned integer constant. ‘H’ Any const_double value. _ARM family--‘config/arm/constraints.md’_ ‘h’ In Thumb state, the core registers ‘r8’-‘r15’. ‘k’ The stack pointer register. ‘l’ In Thumb state, the core registers ‘r0’-‘r7’. In ARM state this is an alias for the ‘r’ constraint. ‘t’ VFP floating-point registers ‘s0’-‘s31’. Used for 32 bit values. ‘w’ VFP floating-point registers ‘d0’-‘d31’ and the appropriate subset ‘d0’-‘d15’ based on command line options. Used for 64 bit values only. Not valid for Thumb1. ‘y’ The iWMMX co-processor registers. ‘z’ The iWMMX GR registers. ‘G’ The floating-point constant 0.0 ‘I’ Integer that is valid as an immediate operand in a data processing instruction. That is, an integer in the range 0 to 255 rotated by a multiple of 2 ‘J’ Integer in the range −4095 to 4095 ‘K’ Integer that satisfies constraint ‘I’ when inverted (ones complement) ‘L’ Integer that satisfies constraint ‘I’ when negated (twos complement) ‘M’ Integer in the range 0 to 32 ‘Q’ A memory reference where the exact address is in a single register ('‘m’' is preferable for ‘asm’ statements) ‘R’ An item in the constant pool ‘S’ A symbol in the text segment of the current file ‘Uv’ A memory reference suitable for VFP load/store insns (reg+constant offset) ‘Uy’ A memory reference suitable for iWMMXt load/store instructions. ‘Uq’ A memory reference suitable for the ARMv4 ldrsb instruction. _AVR family--‘config/avr/constraints.md’_ ‘l’ Registers from r0 to r15 ‘a’ Registers from r16 to r23 ‘d’ Registers from r16 to r31 ‘w’ Registers from r24 to r31.
These registers can be used in the ‘adiw’ command ‘e’ Pointer register (r26-r31) ‘b’ Base pointer register (r28-r31) ‘q’ Stack pointer register (SPH:SPL) ‘t’ Temporary register r0 ‘x’ Register pair X (r27:r26) ‘y’ Register pair Y (r29:r28) ‘z’ Register pair Z (r31:r30) ‘I’ Constant greater than −1, less than 64 ‘J’ Constant greater than −64, less than 1 ‘K’ Constant integer 2 ‘L’ Constant integer 0 ‘M’ Constant that fits in 8 bits ‘N’ Constant integer −1 ‘O’ Constant integer 8, 16, or 24 ‘P’ Constant integer 1 ‘G’ A floating point constant 0.0 ‘Q’ A memory address based on Y or Z pointer with displacement. _Blackfin family--‘config/bfin/constraints.md’_ ‘a’ P register ‘d’ D register ‘z’ A call clobbered P register. ‘qN’ A single register. If N is in the range 0 to 7, the corresponding D register. If it is ‘A’, then the register P0. ‘D’ Even-numbered D register ‘W’ Odd-numbered D register ‘e’ Accumulator register. ‘A’ Even-numbered accumulator register. ‘B’ Odd-numbered accumulator register. ‘b’ I register ‘v’ B register ‘f’ M register ‘c’ Registers used for circular buffering, i.e. I, B, or L registers. ‘C’ The CC register. ‘t’ LT0 or LT1. ‘k’ LC0 or LC1. ‘u’ LB0 or LB1. ‘x’ Any D, P, B, M, I or L register. ‘y’ Additional registers typically used only in prologues and epilogues: RETS, RETN, RETI, RETX, RETE, ASTAT, SEQSTAT and USP. ‘w’ Any register except accumulators or CC. ‘Ksh’ Signed 16 bit integer (in the range −32768 to 32767) ‘Kuh’ Unsigned 16 bit integer (in the range 0 to 65535) ‘Ks7’ Signed 7 bit integer (in the range −64 to 63) ‘Ku7’ Unsigned 7 bit integer (in the range 0 to 127) ‘Ku5’ Unsigned 5 bit integer (in the range 0 to 31) ‘Ks4’ Signed 4 bit integer (in the range −8 to 7) ‘Ks3’ Signed 3 bit integer (in the range −4 to 3) ‘Ku3’ Unsigned 3 bit integer (in the range 0 to 7) ‘PN’ Constant N, where N is a single-digit constant in the range 0 to 4. ‘PA’ An integer equal to one of the MACFLAG_XXX constants that is suitable for use with either accumulator.
‘PB’ An integer equal to one of the MACFLAG_XXX constants that is suitable for use only with accumulator A1. ‘M1’ Constant 255. ‘M2’ Constant 65535. ‘J’ An integer constant with exactly a single bit set. ‘L’ An integer constant with all bits set except exactly one. ‘H’ ‘Q’ Any SYMBOL_REF. _C-SKY--‘config/csky/constraints.md’_ ‘a’ The mini registers r0 - r7. ‘b’ The low registers r0 - r15. ‘c’ C register. ‘y’ HI and LO registers. ‘l’ LO register. ‘h’ HI register. ‘v’ Vector registers. ‘z’ Stack pointer register (SP). ‘Q’ A memory address which uses a base register with a short offset or with an index register with its scale. ‘W’ A memory address which uses a base register with an index register with its scale. The C-SKY back end supports a large set of additional constraints that are only useful for instruction selection or splitting rather than inline asm, such as constraints representing constant integer ranges accepted by particular instruction encodings. Refer to the source code for details. _Epiphany--‘config/epiphany/constraints.md’_ ‘U16’ An unsigned 16-bit constant. ‘K’ An unsigned 5-bit constant. ‘L’ A signed 11-bit constant. ‘Cm1’ A signed 11-bit constant added to −1. Can only match when the ‘-m1reg-REG’ option is active. ‘Cl1’ Left-shift of −1, i.e., a bit mask with a block of leading ones, the rest being a block of trailing zeroes. Can only match when the ‘-m1reg-REG’ option is active. ‘Cr1’ Right-shift of −1, i.e., a bit mask with a trailing block of ones, the rest being zeroes. Or to put it another way, one less than a power of two. Can only match when the ‘-m1reg-REG’ option is active. ‘Cal’ Constant for arithmetic/logical operations. This is like ‘i’, except that for position independent code, no symbols / expressions needing relocations are allowed. ‘Csy’ Symbolic constant for call/jump instruction. ‘Rcs’ The register class usable in short insns. This is a register class constraint, and can thus drive register allocation.
This constraint won't match unless ‘-mprefer-short-insn-regs’ is in effect. ‘Rsc’ The register class of registers that can be used to hold a sibcall call address. I.e., a caller-saved register. ‘Rct’ Core control register class. ‘Rgs’ The register group usable in short insns. This constraint does not use a register class, so that it only passively matches suitable registers, and doesn't drive register allocation. ‘Car’ Constant suitable for the addsi3_r pattern. This is a valid offset for byte, halfword, or word addressing. ‘Rra’ Matches the return address if it can be replaced with the link register. ‘Rcc’ Matches the integer condition code register. ‘Sra’ Matches the return address if it is in a stack slot. ‘Cfm’ Matches control register values to switch fp mode, which are encapsulated in ‘UNSPEC_FP_MODE’. _FRV--‘config/frv/frv.h’_ ‘a’ Register in the class ‘ACC_REGS’ (‘acc0’ to ‘acc7’). ‘b’ Register in the class ‘EVEN_ACC_REGS’ (‘acc0’ to ‘acc7’). ‘c’ Register in the class ‘CC_REGS’ (‘fcc0’ to ‘fcc3’ and ‘icc0’ to ‘icc3’). ‘d’ Register in the class ‘GPR_REGS’ (‘gr0’ to ‘gr63’). ‘e’ Register in the class ‘EVEN_REGS’ (‘gr0’ to ‘gr63’). Odd registers are excluded not in the class but through the use of a machine mode larger than 4 bytes. ‘f’ Register in the class ‘FPR_REGS’ (‘fr0’ to ‘fr63’). ‘h’ Register in the class ‘FEVEN_REGS’ (‘fr0’ to ‘fr63’). Odd registers are excluded not in the class but through the use of a machine mode larger than 4 bytes. ‘l’ Register in the class ‘LR_REG’ (the ‘lr’ register). ‘q’ Register in the class ‘QUAD_REGS’ (‘gr2’ to ‘gr63’). Register numbers not divisible by 4 are excluded not in the class but through the use of a machine mode larger than 8 bytes. ‘t’ Register in the class ‘ICC_REGS’ (‘icc0’ to ‘icc3’). ‘u’ Register in the class ‘FCC_REGS’ (‘fcc0’ to ‘fcc3’). ‘v’ Register in the class ‘ICR_REGS’ (‘cc4’ to ‘cc7’). ‘w’ Register in the class ‘FCR_REGS’ (‘cc0’ to ‘cc3’). ‘x’ Register in the class ‘QUAD_FPR_REGS’ (‘fr0’ to ‘fr63’).
Register numbers not divisible by 4 are excluded not in the class but through the use of a machine mode larger than 8 bytes. ‘z’ Register in the class ‘SPR_REGS’ (‘lcr’ and ‘lr’). ‘A’ Register in the class ‘QUAD_ACC_REGS’ (‘acc0’ to ‘acc7’). ‘B’ Register in the class ‘ACCG_REGS’ (‘accg0’ to ‘accg7’). ‘C’ Register in the class ‘CR_REGS’ (‘cc0’ to ‘cc7’). ‘G’ Floating point constant zero ‘I’ 6-bit signed integer constant ‘J’ 10-bit signed integer constant ‘L’ 16-bit signed integer constant ‘M’ 16-bit unsigned integer constant ‘N’ 12-bit signed integer constant that is negative--i.e. in the range of −2048 to −1 ‘O’ Constant zero ‘P’ 12-bit signed integer constant that is greater than zero--i.e. in the range of 1 to 2047. _FT32--‘config/ft32/constraints.md’_ ‘A’ An absolute address ‘B’ An offset address ‘W’ A register indirect memory operand ‘e’ An offset address. ‘f’ An offset address. ‘O’ The constant zero or one ‘I’ A 16-bit signed constant (−32768 ... 32767) ‘w’ A bitfield mask suitable for bext or bins ‘x’ An inverted bitfield mask suitable for bext or bins ‘L’ A 16-bit unsigned constant, multiple of 4 (0 ... 65532) ‘S’ A 20-bit signed constant (−524288 ... 524287) ‘b’ A constant for a bitfield width (1 ... 16) ‘KA’ A 10-bit signed constant (−512 ... 
511) _Hewlett-Packard PA-RISC--‘config/pa/pa.h’_ ‘a’ General register 1 ‘f’ Floating point register ‘q’ Shift amount register ‘x’ Floating point register (deprecated) ‘y’ Upper floating point register (32-bit), floating point register (64-bit) ‘Z’ Any register ‘I’ Signed 11-bit integer constant ‘J’ Signed 14-bit integer constant ‘K’ Integer constant that can be deposited with a ‘zdepi’ instruction ‘L’ Signed 5-bit integer constant ‘M’ Integer constant 0 ‘N’ Integer constant that can be loaded with a ‘ldil’ instruction ‘O’ Integer constant whose value plus one is a power of 2 ‘P’ Integer constant that can be used for ‘and’ operations in ‘depi’ and ‘extru’ instructions ‘S’ Integer constant 31 ‘U’ Integer constant 63 ‘G’ Floating-point constant 0.0 ‘A’ A ‘lo_sum’ data-linkage-table memory operand ‘Q’ A memory operand that can be used as the destination operand of an integer store instruction ‘R’ A scaled or unscaled indexed memory operand ‘T’ A memory operand for floating-point loads and stores ‘W’ A register indirect memory operand _Intel IA-64--‘config/ia64/ia64.h’_ ‘a’ General register ‘r0’ to ‘r3’ for ‘addl’ instruction ‘b’ Branch register ‘c’ Predicate register (‘c’ as in "conditional") ‘d’ Application register residing in M-unit ‘e’ Application register residing in I-unit ‘f’ Floating-point register ‘m’ Memory operand. If used together with ‘<’ or ‘>’, the operand can have postincrement and postdecrement which require printing with ‘%Pn’ on IA-64. 
‘G’ Floating-point constant 0.0 or 1.0 ‘I’ 14-bit signed integer constant ‘J’ 22-bit signed integer constant ‘K’ 8-bit signed integer constant for logical instructions ‘L’ 8-bit adjusted signed integer constant for compare pseudo-ops ‘M’ 6-bit unsigned integer constant for shift counts ‘N’ 9-bit signed integer constant for load and store postincrements ‘O’ The constant zero ‘P’ 0 or −1 for ‘dep’ instruction ‘Q’ Non-volatile memory for floating-point loads and stores ‘R’ Integer constant in the range 1 to 4 for ‘shladd’ instruction ‘S’ Memory operand except postincrement and postdecrement. This is now roughly the same as ‘m’ when not used together with ‘<’ or ‘>’. _M32C--‘config/m32c/m32c.cc’_ ‘Rsp’ ‘Rfb’ ‘Rsb’ ‘$sp’, ‘$fb’, ‘$sb’. ‘Rcr’ Any control register, when they're 16 bits wide (nothing if control registers are 24 bits wide) ‘Rcl’ Any control register, when they're 24 bits wide. ‘R0w’ ‘R1w’ ‘R2w’ ‘R3w’ $r0, $r1, $r2, $r3. ‘R02’ $r0 or $r2, or $r2r0 for 32 bit values. ‘R13’ $r1 or $r3, or $r3r1 for 32 bit values. ‘Rdi’ A register that can hold a 64 bit value. ‘Rhl’ $r0 or $r1 (registers with addressable high/low bytes) ‘R23’ $r2 or $r3 ‘Raa’ Address registers ‘Raw’ Address registers when they're 16 bits wide. ‘Ral’ Address registers when they're 24 bits wide. ‘Rqi’ Registers that can hold QI values. ‘Rad’ Registers that can be used with displacements ($a0, $a1, $sb). ‘Rsi’ Registers that can hold 32 bit values. ‘Rhi’ Registers that can hold 16 bit values. ‘Rhc’ Registers that can hold 16 bit values, including all control registers. ‘Rra’ $r0 through R1, plus $a0 and $a1. ‘Rfl’ The flags register. ‘Rmm’ The memory-based pseudo-registers $mem0 through $mem15. ‘Rpi’ Registers that can hold pointers (16 bit registers for r8c, m16c; 24 bit registers for m32cm, m32c). ‘Rpa’ Matches multiple registers in a PARALLEL to form a larger register. Used to match function return values. ‘Is3’ −8 ... 7 ‘IS1’ −128 ... 127 ‘IS2’ −32768 ... 32767 ‘IU2’ 0 ... 65535 ‘In4’ −8 ...
−1 or 1 ... 8 ‘In5’ −16 ... −1 or 1 ... 16 ‘In6’ −32 ... −1 or 1 ... 32 ‘IM2’ −65536 ... −1 ‘Ilb’ An 8 bit value with exactly one bit set. ‘Ilw’ A 16 bit value with exactly one bit set. ‘Sd’ The common src/dest memory addressing modes. ‘Sa’ Memory addressed using $a0 or $a1. ‘Si’ Memory addressed with immediate addresses. ‘Ss’ Memory addressed using the stack pointer ($sp). ‘Sf’ Memory addressed using the frame base register ($fb). ‘Sb’ Memory addressed using the small base register ($sb). ‘S1’ $r1h _LoongArch--‘config/loongarch/constraints.md’_ ‘f’ A floating-point or vector register (if available). ‘k’ A memory operand whose address is formed by a base register and (optionally scaled) index register. ‘l’ A signed 16-bit constant. ‘m’ A memory operand whose address is formed by a base register and offset that is suitable for use in instructions with the same addressing mode as ‘st.w’ and ‘ld.w’. ‘I’ A signed 12-bit constant (for arithmetic instructions). ‘K’ An unsigned 12-bit constant (for logic instructions). ‘M’ A constant that cannot be loaded using ‘lui’, ‘addiu’ or ‘ori’. ‘N’ A constant in the range −65535 to −1 (inclusive). ‘O’ A signed 15-bit constant. ‘P’ A constant in the range 1 to 65535 (inclusive). ‘R’ An address that can be used in a non-macro load or store. ‘ZB’ An address that is held in a general-purpose register. The offset is zero. ‘ZC’ A memory operand whose address is formed by a base register and offset that is suitable for use in instructions with the same addressing mode as ‘ll.w’ and ‘sc.w’. _MicroBlaze--‘config/microblaze/constraints.md’_ ‘d’ A general register (‘r0’ to ‘r31’). ‘z’ A status register (‘rmsr’, ‘$fcc1’ to ‘$fcc7’). _MIPS--‘config/mips/constraints.md’_ ‘d’ A general-purpose register. This is equivalent to ‘r’ unless generating MIPS16 code, in which case the MIPS16 register set is used. ‘f’ A floating-point register (if available). ‘h’ Formerly the ‘hi’ register. This constraint is no longer supported. ‘l’ The ‘lo’ register.
Use this register to store values that are no bigger than a word. ‘x’ The concatenated ‘hi’ and ‘lo’ registers. Use this register to store doubleword values. ‘c’ A register suitable for use in an indirect jump. This will always be ‘$25’ for ‘-mabicalls’. ‘v’ Register ‘$3’. Do not use this constraint in new code; it is retained only for compatibility with glibc. ‘y’ Equivalent to ‘r’; retained for backwards compatibility. ‘z’ A floating-point condition code register. ‘I’ A signed 16-bit constant (for arithmetic instructions). ‘J’ Integer zero. ‘K’ An unsigned 16-bit constant (for logic instructions). ‘L’ A signed 32-bit constant in which the lower 16 bits are zero. Such constants can be loaded using ‘lui’. ‘M’ A constant that cannot be loaded using ‘lui’, ‘addiu’ or ‘ori’. ‘N’ A constant in the range −65535 to −1 (inclusive). ‘O’ A signed 15-bit constant. ‘P’ A constant in the range 1 to 65535 (inclusive). ‘G’ Floating-point zero. ‘R’ An address that can be used in a non-macro load or store. ‘ZC’ A memory operand whose address is formed by a base register and offset that is suitable for use in instructions with the same addressing mode as ‘ll’ and ‘sc’. ‘ZD’ An address suitable for a ‘prefetch’ instruction, or for any other instruction with the same addressing mode as ‘prefetch’. 
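As a hedged illustration of the MIPS constraints listed above, the following minimal sketch uses ‘d’ for general-purpose register operands and ‘I’ for a signed 16-bit immediate. The function names ‘add_100’ and ‘check_add_100’ are hypothetical and exist only for this example; the non-MIPS branch is a plain C fallback, assumed here so that the sketch also builds on other targets.

```c
#include <assert.h>

/* Hypothetical helper: add the constant 100 to X.
   On MIPS targets the addition is done with inline asm; the "I"
   constraint requires the immediate to be a signed 16-bit constant,
   which 100 satisfies, and "d" requests a general-purpose register
   (the MIPS16 register set when generating MIPS16 code).  */
static int
add_100 (int x)
{
#if defined (__mips__)
  int result;
  __asm__ ("addiu %0,%1,%2" : "=d" (result) : "d" (x), "I" (100));
  return result;
#else
  /* Assumed fallback for non-MIPS targets: plain C arithmetic.  */
  return x + 100;
#endif
}

/* Hypothetical self-check for the helper above.  */
static void
check_add_100 (void)
{
  assert (add_100 (1) == 101);
  assert (add_100 (-100) == 0);
}
```

A constant outside the signed 16-bit range would not satisfy ‘I’, so the compiler would reject the operand instead of emitting an instruction that cannot be encoded.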
_Motorola 680x0--‘config/m68k/constraints.md’_ ‘a’ Address register ‘d’ Data register ‘f’ 68881 floating-point register, if available ‘I’ Integer in the range 1 to 8 ‘J’ 16-bit signed number ‘K’ Signed number whose magnitude is greater than 0x80 ‘L’ Integer in the range −8 to −1 ‘M’ Signed number whose magnitude is greater than 0x100 ‘N’ Range 24 to 31, rotatert:SI 8 to 1 expressed as rotate ‘O’ 16 (for rotate using swap) ‘P’ Range 8 to 15, rotatert:HI 8 to 1 expressed as rotate ‘R’ Numbers that mov3q can handle ‘G’ Floating point constant that is not a 68881 constant ‘S’ Operands that satisfy 'm' when -mpcrel is in effect ‘T’ Operands that satisfy 's' when -mpcrel is not in effect ‘Q’ Address register indirect addressing mode ‘U’ Register offset addressing ‘W’ const_call_operand ‘Cs’ symbol_ref or const ‘Ci’ const_int ‘C0’ const_int 0 ‘Cj’ Range of signed numbers that don't fit in 16 bits ‘Cmvq’ Integers valid for mvq ‘Capsw’ Integers valid for a moveq followed by a swap ‘Cmvz’ Integers valid for mvz ‘Cmvs’ Integers valid for mvs ‘Ap’ push_operand ‘Ac’ Non-register operands allowed in clr _Moxie--‘config/moxie/constraints.md’_ ‘A’ An absolute address ‘B’ An offset address ‘W’ A register indirect memory operand ‘I’ A constant in the range of 0 to 255. ‘N’ A constant in the range of 0 to −255. _MSP430--‘config/msp430/constraints.md’_ ‘R12’ Register R12. ‘R13’ Register R13. ‘K’ Integer constant 1. ‘L’ Integer constant in the range −2^20 to 2^19. ‘M’ Integer constant 1-4. ‘Ya’ Memory references which do not require an extended MOVX instruction. ‘Yl’ Memory reference, labels only. ‘Ys’ Memory reference, stack only. _NDS32--‘config/nds32/constraints.md’_ ‘w’ LOW register class $r0 to $r7 constraint for V3/V3M ISA. ‘l’ LOW register class $r0 to $r7. ‘d’ MIDDLE register class $r0 to $r11, $r16 to $r19. ‘h’ HIGH register class $r12 to $r14, $r20 to $r31. ‘t’ Temporary assist register $ta (i.e. $r15). ‘k’ Stack register $sp. ‘Iu03’ Unsigned immediate 3-bit value.
‘In03’ Negative immediate 3-bit value in the range of −7 to 0. ‘Iu04’ Unsigned immediate 4-bit value. ‘Is05’ Signed immediate 5-bit value. ‘Iu05’ Unsigned immediate 5-bit value. ‘In05’ Negative immediate 5-bit value in the range of −31 to 0. ‘Ip05’ Unsigned immediate 5-bit value for movpi45 instruction with range 16 to 47. ‘Iu06’ Unsigned immediate 6-bit value constraint for addri36.sp instruction. ‘Iu08’ Unsigned immediate 8-bit value. ‘Iu09’ Unsigned immediate 9-bit value. ‘Is10’ Signed immediate 10-bit value. ‘Is11’ Signed immediate 11-bit value. ‘Is15’ Signed immediate 15-bit value. ‘Iu15’ Unsigned immediate 15-bit value. ‘Ic15’ A constant which is not in the range of imm15u but ok for bclr instruction. ‘Ie15’ A constant which is not in the range of imm15u but ok for bset instruction. ‘It15’ A constant which is not in the range of imm15u but ok for btgl instruction. ‘Ii15’ A constant whose complement value is in the range of imm15u and ok for bitci instruction. ‘Is16’ Signed immediate 16-bit value. ‘Is17’ Signed immediate 17-bit value. ‘Is19’ Signed immediate 19-bit value. ‘Is20’ Signed immediate 20-bit value. ‘Ihig’ The immediate value that can be simply set high 20-bit. ‘Izeb’ The immediate value 0xff. ‘Izeh’ The immediate value 0xffff. ‘Ixls’ The immediate value 0x01. ‘Ix11’ The immediate value 0x7ff. ‘Ibms’ The immediate value with power of 2. ‘Ifex’ The immediate value with power of 2 minus 1. ‘U33’ Memory constraint for 333 format. ‘U45’ Memory constraint for 45 format. ‘U37’ Memory constraint for 37 format. _Nios II family--‘config/nios2/constraints.md’_ ‘I’ Integer that is valid as an immediate operand in an instruction taking a signed 16-bit number. Range −32768 to 32767. ‘J’ Integer that is valid as an immediate operand in an instruction taking an unsigned 16-bit number. Range 0 to 65535. ‘K’ Integer that is valid as an immediate operand in an instruction taking only the upper 16-bits of a 32-bit number. Range 32-bit numbers with the lower 16-bits being 0.
‘L’ Integer that is valid as an immediate operand for a shift instruction. Range 0 to 31. ‘M’ Integer that is valid as an immediate operand for only the value 0. Can be used in conjunction with the format modifier ‘z’ to use ‘r0’ instead of ‘0’ in the assembly output. ‘N’ Integer that is valid as an immediate operand for a custom instruction opcode. Range 0 to 255. ‘P’ An immediate operand for R2 andchi/andci instructions. ‘S’ Matches immediates which are addresses in the small data section and therefore can be added to ‘gp’ as a 16-bit immediate to re-create their 32-bit value. ‘U’ Matches constants suitable as an operand for the rdprs and cache instructions. ‘v’ A memory operand suitable for Nios II R2 load/store exclusive instructions. ‘w’ A memory operand suitable for load/store IO and cache instructions. ‘T’ A ‘const’ wrapped ‘UNSPEC’ expression, representing a supported PIC or TLS relocation. _OpenRISC--‘config/or1k/constraints.md’_ ‘I’ Integer that is valid as an immediate operand in an instruction taking a signed 16-bit number. Range −32768 to 32767. ‘K’ Integer that is valid as an immediate operand in an instruction taking an unsigned 16-bit number. Range 0 to 65535. ‘M’ Signed 16-bit constant shifted left 16 bits. (Used with ‘l.movhi’) ‘O’ Zero ‘c’ Register usable for sibcalls. _PDP-11--‘config/pdp11/constraints.md’_ ‘a’ Floating point registers AC0 through AC3. These can be loaded from/to memory with a single instruction. ‘d’ Odd numbered general registers (R1, R3, R5). These are used for 16-bit multiply operations. ‘D’ A memory reference that is encoded within the opcode, but not auto-increment or auto-decrement. ‘f’ Any of the floating point registers (AC0 through AC5). ‘G’ Floating point constant 0. ‘h’ Floating point registers AC4 and AC5. These cannot be loaded from/to memory with a single instruction. ‘I’ An integer constant that fits in 16 bits. ‘J’ An integer constant whose low order 16 bits are zero. 
‘K’ An integer constant that does not meet the constraints for codes ‘I’ or ‘J’. ‘L’ The integer constant 1. ‘M’ The integer constant −1. ‘N’ The integer constant 0. ‘O’ Integer constants 0 through 3; shifts by these amounts are handled as multiple single-bit shifts rather than a single variable-length shift. ‘Q’ A memory reference which requires an additional word (address or offset) after the opcode. ‘R’ A memory reference that is encoded within the opcode. _PowerPC and IBM RS6000--‘config/rs6000/constraints.md’_ ‘r’ A general purpose register (GPR), ‘r0’...‘r31’. ‘b’ A base register. Like ‘r’, but ‘r0’ is not allowed, so ‘r1’...‘r31’. ‘f’ A floating point register (FPR), ‘f0’...‘f31’. ‘d’ A floating point register. This is the same as ‘f’ nowadays; historically ‘f’ was for single-precision and ‘d’ was for double-precision floating point. ‘v’ An Altivec vector register (VR), ‘v0’...‘v31’. ‘wa’ A VSX register (VSR), ‘vs0’...‘vs63’. This is either an FPR (‘vs0’...‘vs31’ are ‘f0’...‘f31’) or a VR (‘vs32’...‘vs63’ are ‘v0’...‘v31’). When using ‘wa’, you should use the ‘%x’ output modifier, so that the correct register number is printed. For example: asm ("xvadddp %x0,%x1,%x2" : "=wa" (v1) : "wa" (v2), "wa" (v3)); You should not use ‘%x’ for ‘v’ operands: asm ("xsaddqp %0,%1,%2" : "=v" (v1) : "v" (v2), "v" (v3)); ‘h’ A special register (‘vrsave’, ‘ctr’, or ‘lr’). ‘c’ The count register, ‘ctr’. ‘l’ The link register, ‘lr’. ‘x’ Condition register field 0, ‘cr0’. ‘y’ Any condition register field, ‘cr0’...‘cr7’. ‘z’ The carry bit, ‘XER[CA]’. ‘we’ Like ‘wa’, if ‘-mpower9-vector’ and ‘-m64’ are used; otherwise, ‘NO_REGS’. ‘wn’ No register (‘NO_REGS’). ‘wr’ Like ‘r’, if ‘-mpowerpc64’ is used; otherwise, ‘NO_REGS’. ‘wx’ Like ‘d’, if ‘-mpowerpc-gfxopt’ is used; otherwise, ‘NO_REGS’. ‘wA’ Like ‘b’, if ‘-mpowerpc64’ is used; otherwise, ‘NO_REGS’. ‘wB’ Signed 5-bit constant integer that can be loaded into an Altivec register. 
‘wE’ Vector constant that can be loaded with the XXSPLTIB instruction. ‘wF’ Memory operand suitable for power8 GPR load fusion. ‘wL’ Int constant that is the element number mfvsrld accesses in a vector. ‘wM’ Match vector constant with all 1's if the XXLORC instruction is available. ‘wO’ Memory operand suitable for the ISA 3.0 vector d-form instructions. ‘wQ’ Memory operand suitable for the load/store quad instructions. ‘wS’ Vector constant that can be loaded with XXSPLTIB & sign extension. ‘wY’ A memory operand for a DS-form instruction. ‘wZ’ An indexed or indirect memory operand, ignoring the bottom 4 bits. ‘I’ A signed 16-bit constant. ‘J’ An unsigned 16-bit constant shifted left 16 bits (use ‘L’ instead for ‘SImode’ constants). ‘K’ An unsigned 16-bit constant. ‘L’ A signed 16-bit constant shifted left 16 bits. ‘M’ An integer constant greater than 31. ‘N’ An exact power of 2. ‘O’ The integer constant zero. ‘P’ A constant whose negation is a signed 16-bit constant. ‘eI’ A signed 34-bit integer constant if prefixed instructions are supported. ‘eP’ A scalar floating point constant or a vector constant that can be loaded to a VSX register with one prefixed instruction. ‘eQ’ An IEEE 128-bit constant that can be loaded into a VSX register with the ‘lxvkq’ instruction. ‘G’ A floating point constant that can be loaded into a register with one instruction per word. ‘H’ A floating point constant that can be loaded into a register using three instructions. ‘m’ A memory operand. Normally, ‘m’ does not allow addresses that update the base register. If the ‘<’ or ‘>’ constraint is also used, they are allowed and therefore on PowerPC targets in that case it is only safe to use ‘m<>’ in an ‘asm’ statement if that ‘asm’ statement accesses the operand exactly once. The ‘asm’ statement must also use ‘%U’ as a placeholder for the "update" flag in the corresponding load or store instruction. 
For example: asm ("st%U0 %1,%0" : "=m<>" (mem) : "r" (val)); is correct but: asm ("st %1,%0" : "=m<>" (mem) : "r" (val)); is not. ‘es’ A "stable" memory operand; that is, one which does not include any automodification of the base register. This used to be useful when ‘m’ allowed automodification of the base register, but as those are now only allowed when ‘<’ or ‘>’ is used, ‘es’ is basically the same as ‘m’ without ‘<’ and ‘>’. ‘Q’ A memory operand addressed by just a base register. ‘Y’ A memory operand for a DQ-form instruction. ‘Z’ A memory operand accessed with indexed or indirect addressing. ‘R’ An AIX TOC entry. ‘a’ An indexed or indirect address. ‘U’ A V.4 small data reference. ‘W’ A vector constant that does not require memory. ‘j’ The zero vector constant. _PRU--‘config/pru/constraints.md’_ ‘I’ An unsigned 8-bit integer constant. ‘J’ An unsigned 16-bit integer constant. ‘L’ An unsigned 5-bit integer constant (for shift counts). ‘T’ A text segment (program memory) constant label. ‘Z’ Integer constant zero. _RL78--‘config/rl78/constraints.md’_ ‘Int3’ An integer constant in the range 1 ... 7. ‘Int8’ An integer constant in the range 0 ... 255. ‘J’ An integer constant in the range −255 ... 0 ‘K’ The integer constant 1. ‘L’ The integer constant −1. ‘M’ The integer constant 0. ‘N’ The integer constant 2. ‘O’ The integer constant −2. ‘P’ An integer constant in the range 1 ... 15. ‘Qbi’ The built-in compare types: eq, ne, gtu, ltu, geu, and leu. ‘Qsc’ The synthetic compare types: gt, lt, ge, and le. ‘Wab’ A memory reference with an absolute address. ‘Wbc’ A memory reference using ‘BC’ as a base register, with an optional offset. ‘Wca’ A memory reference using ‘AX’, ‘BC’, ‘DE’, or ‘HL’ for the address, for calls. ‘Wcv’ A memory reference using any 16-bit register pair for the address, for calls. ‘Wd2’ A memory reference using ‘DE’ as a base register, with an optional offset. ‘Wde’ A memory reference using ‘DE’ as a base register, without any offset.
‘Wfr’ Any memory reference to an address in the far address space. ‘Wh1’ A memory reference using ‘HL’ as a base register, with an optional one-byte offset. ‘Whb’ A memory reference using ‘HL’ as a base register, with ‘B’ or ‘C’ as the index register. ‘Whl’ A memory reference using ‘HL’ as a base register, without any offset. ‘Ws1’ A memory reference using ‘SP’ as a base register, with an optional one-byte offset. ‘Y’ Any memory reference to an address in the near address space. ‘A’ The ‘AX’ register. ‘B’ The ‘BC’ register. ‘D’ The ‘DE’ register. ‘R’ ‘A’ through ‘L’ registers. ‘S’ The ‘SP’ register. ‘T’ The ‘HL’ register. ‘Z08W’ The 16-bit ‘R8’ register. ‘Z10W’ The 16-bit ‘R10’ register. ‘Zint’ The registers reserved for interrupts (‘R24’ to ‘R31’). ‘a’ The ‘A’ register. ‘b’ The ‘B’ register. ‘c’ The ‘C’ register. ‘d’ The ‘D’ register. ‘e’ The ‘E’ register. ‘h’ The ‘H’ register. ‘l’ The ‘L’ register. ‘v’ The virtual registers. ‘w’ The ‘PSW’ register. ‘x’ The ‘X’ register. _RISC-V--‘config/riscv/constraints.md’_ ‘f’ A floating-point register (if available). ‘I’ An I-type 12-bit signed immediate. ‘J’ Integer zero. ‘K’ A 5-bit unsigned immediate for CSR access instructions. ‘A’ An address that is held in a general-purpose register. ‘S’ A constraint that matches an absolute symbolic address. ‘vr’ A vector register (if available). ‘vd’ A vector register, excluding v0 (if available). ‘vm’ A vector register, only v0 (if available). _RX--‘config/rx/constraints.md’_ ‘Q’ An address which does not involve register indirect addressing or pre/post increment/decrement addressing. ‘Symbol’ A symbol reference. ‘Int08’ A constant in the range −256 to 255, inclusive. ‘Sint08’ A constant in the range −128 to 127, inclusive. ‘Sint16’ A constant in the range −32768 to 32767, inclusive. ‘Sint24’ A constant in the range −8388608 to 8388607, inclusive. ‘Uint04’ A constant in the range 0 to 15, inclusive.
_S/390 and zSeries--‘config/s390/s390.h’_ ‘a’ Address register (general purpose register except r0) ‘c’ Condition code register ‘d’ Data register (arbitrary general purpose register) ‘f’ Floating-point register ‘I’ Unsigned 8-bit constant (0-255) ‘J’ Unsigned 12-bit constant (0-4095) ‘K’ Signed 16-bit constant (−32768-32767) ‘L’ Value appropriate as displacement. ‘(0..4095)’ for short displacement ‘(−524288..524287)’ for long displacement ‘M’ Constant integer with a value of 0x7fffffff. ‘N’ Multiple letter constraint followed by 4 parameter letters. ‘0..9:’ number of the part counting from most to least significant ‘H,Q:’ mode of the part ‘D,S,H:’ mode of the containing operand ‘0,F:’ value of the other parts (F--all bits set) The constraint matches if the specified part of a constant has a value different from its other parts. ‘Q’ Memory reference without index register and with short displacement. ‘R’ Memory reference with index register and short displacement. ‘S’ Memory reference without index register but with long displacement. ‘T’ Memory reference with index register and long displacement. ‘U’ Pointer with short displacement. ‘W’ Pointer with long displacement. ‘Y’ Shift count operand. _SPARC--‘config/sparc/sparc.h’_ ‘f’ Floating-point register on the SPARC-V8 architecture and lower floating-point register on the SPARC-V9 architecture. ‘e’ Floating-point register. It is equivalent to ‘f’ on the SPARC-V8 architecture and contains both lower and upper floating-point registers on the SPARC-V9 architecture. ‘c’ Floating-point condition code register. ‘d’ Lower floating-point register. It is only valid on the SPARC-V9 architecture when the Visual Instruction Set is available. ‘b’ Floating-point register. It is only valid on the SPARC-V9 architecture when the Visual Instruction Set is available. ‘h’ 64-bit global or out register for the SPARC-V8+ architecture. ‘C’ The constant all-ones, for floating-point. 
‘A’ Signed 5-bit constant ‘D’ A vector constant ‘I’ Signed 13-bit constant ‘J’ Zero ‘K’ 32-bit constant with the low 12 bits clear (a constant that can be loaded with the ‘sethi’ instruction) ‘L’ A constant in the range supported by ‘movcc’ instructions (11-bit signed immediate) ‘M’ A constant in the range supported by ‘movrcc’ instructions (10-bit signed immediate) ‘N’ Same as ‘K’, except that it verifies that bits that are not in the lower 32-bit range are all zero. Must be used instead of ‘K’ for modes wider than ‘SImode’ ‘O’ The constant 4096 ‘G’ Floating-point zero ‘H’ Signed 13-bit constant, sign-extended to 32 or 64 bits ‘P’ The constant -1 ‘Q’ Floating-point constant whose integral representation can be moved into an integer register using a single sethi instruction ‘R’ Floating-point constant whose integral representation can be moved into an integer register using a single mov instruction ‘S’ Floating-point constant whose integral representation can be moved into an integer register using a high/lo_sum instruction sequence ‘T’ Memory address aligned to an 8-byte boundary ‘U’ Even register ‘W’ Memory address for ‘e’ constraint registers ‘w’ Memory address with only a base register ‘Y’ Vector zero _TI C6X family--‘config/c6x/constraints.md’_ ‘a’ Register file A (A0-A31). ‘b’ Register file B (B0-B31). ‘A’ Predicate registers in register file A (A0-A2 on C64X and higher, A1 and A2 otherwise). ‘B’ Predicate registers in register file B (B0-B2). ‘C’ A call-used register in register file B (B0-B9, B16-B31). ‘Da’ Register file A, excluding predicate registers (A3-A31, plus A0 if not C64X or higher). ‘Db’ Register file B, excluding predicate registers (B3-B31). ‘Iu4’ Integer constant in the range 0 ... 15. ‘Iu5’ Integer constant in the range 0 ... 31. ‘In5’ Integer constant in the range −31 ... 0. ‘Is5’ Integer constant in the range −16 ... 15. ‘I5x’ Integer constant that can be the operand of an ADDA or a SUBA insn. ‘IuB’ Integer constant in the range 0 ... 
65535. ‘IsB’ Integer constant in the range −32768 ... 32767. ‘IsC’ Integer constant in the range -2^{20} ... 2^{20} - 1. ‘Jc’ Integer constant that is a valid mask for the clr instruction. ‘Js’ Integer constant that is a valid mask for the set instruction. ‘Q’ Memory location with A base register. ‘R’ Memory location with B base register. ‘S0’ On C64x+ targets, a GP-relative small data reference. ‘S1’ Any kind of ‘SYMBOL_REF’, for use in a call address. ‘Si’ Any kind of immediate operand, unless it matches the S0 constraint. ‘T’ Memory location with B base register, but not using a long offset. ‘W’ A memory operand with an address that cannot be used in an unaligned access. ‘Z’ Register B14 (aka DP). _Visium--‘config/visium/constraints.md’_ ‘b’ EAM register ‘mdb’ ‘c’ EAM register ‘mdc’ ‘f’ Floating point register ‘k’ Register for sibcall optimization ‘l’ General register, but not ‘r29’, ‘r30’ and ‘r31’ ‘t’ Register ‘r1’ ‘u’ Register ‘r2’ ‘v’ Register ‘r3’ ‘G’ Floating-point constant 0.0 ‘J’ Integer constant in the range 0 .. 65535 (16-bit immediate) ‘K’ Integer constant in the range 1 .. 31 (5-bit immediate) ‘L’ Integer constant in the range −65535 .. −1 (16-bit negative immediate) ‘M’ Integer constant −1 ‘O’ Integer constant 0 ‘P’ Integer constant 32 _x86 family--‘config/i386/constraints.md’_ ‘R’ Legacy register--the eight integer registers available on all i386 processors (‘a’, ‘b’, ‘c’, ‘d’, ‘si’, ‘di’, ‘bp’, ‘sp’). ‘q’ Any register accessible as ‘Rl’. In 32-bit mode, ‘a’, ‘b’, ‘c’, and ‘d’; in 64-bit mode, any integer register. ‘Q’ Any register accessible as ‘Rh’: ‘a’, ‘b’, ‘c’, and ‘d’. ‘l’ Any register that can be used as the index in a base+index memory access: that is, any general register except the stack pointer. ‘a’ The ‘a’ register. ‘b’ The ‘b’ register. ‘c’ The ‘c’ register. ‘d’ The ‘d’ register. ‘S’ The ‘si’ register. ‘D’ The ‘di’ register. ‘A’ The ‘a’ and ‘d’ registers. 
This class is used for instructions that return double word results in the ‘ax:dx’ register pair. Single word values will be allocated either in ‘ax’ or ‘dx’. For example on i386 the following implements ‘rdtsc’: unsigned long long rdtsc (void) { unsigned long long tick; __asm__ __volatile__("rdtsc":"=A"(tick)); return tick; } This is not correct on x86-64 as it would allocate tick in either ‘ax’ or ‘dx’. You have to use the following variant instead: unsigned long long rdtsc (void) { unsigned int tickl, tickh; __asm__ __volatile__("rdtsc":"=a"(tickl),"=d"(tickh)); return ((unsigned long long)tickh << 32)|tickl; } ‘U’ The call-clobbered integer registers. ‘f’ Any 80387 floating-point (stack) register. ‘t’ Top of 80387 floating-point stack (‘%st(0)’). ‘u’ Second from top of 80387 floating-point stack (‘%st(1)’). ‘Yk’ Any mask register that can be used as a predicate, i.e. ‘k1-k7’. ‘k’ Any mask register. ‘y’ Any MMX register. ‘x’ Any SSE register. ‘v’ Any EVEX encodable SSE register (‘%xmm0-%xmm31’). ‘w’ Any bound register. ‘Yz’ First SSE register (‘%xmm0’). ‘Yi’ Any SSE register, when SSE2 and inter-unit moves are enabled. ‘Yj’ Any SSE register, when SSE2 and inter-unit moves from vector registers are enabled. ‘Ym’ Any MMX register, when inter-unit moves are enabled. ‘Yn’ Any MMX register, when inter-unit moves from vector registers are enabled. ‘Yp’ Any integer register when ‘TARGET_PARTIAL_REG_STALL’ is disabled. ‘Ya’ Any integer register when zero extensions with ‘AND’ are disabled. ‘Yb’ Any register that can be used as the GOT base when calling ‘___tls_get_addr’: that is, any general register except ‘a’ and ‘sp’ registers, for ‘-fno-plt’ if linker supports it. Otherwise, ‘b’ register. ‘Yf’ Any x87 register when 80387 floating-point arithmetic is enabled. ‘Yr’ Lower SSE register when avoiding REX prefix and all SSE registers otherwise. ‘Yv’ For AVX512VL, any EVEX-encodable SSE register (‘%xmm0-%xmm31’), otherwise any SSE register. 
‘Yh’ Any EVEX-encodable SSE register, that has number factor of four. ‘Bf’ Flags register operand. ‘Bg’ GOT memory operand. ‘Bm’ Vector memory operand. ‘Bc’ Constant memory operand. ‘Bn’ Memory operand without REX prefix. ‘Bs’ Sibcall memory operand. ‘Bw’ Call memory operand. ‘Bz’ Constant call address operand. ‘BC’ SSE constant -1 operand. ‘I’ Integer constant in the range 0 ... 31, for 32-bit shifts. ‘J’ Integer constant in the range 0 ... 63, for 64-bit shifts. ‘K’ Signed 8-bit integer constant. ‘L’ ‘0xFF’ or ‘0xFFFF’, for andsi as a zero-extending move. ‘M’ 0, 1, 2, or 3 (shifts for the ‘lea’ instruction). ‘N’ Unsigned 8-bit integer constant (for ‘in’ and ‘out’ instructions). ‘O’ Integer constant in the range 0 ... 127, for 128-bit shifts. ‘G’ Standard 80387 floating point constant. ‘C’ SSE constant zero operand. ‘e’ 32-bit signed integer constant, or a symbolic reference known to fit that range (for immediate operands in sign-extending x86-64 instructions). ‘We’ 32-bit signed integer constant, or a symbolic reference known to fit that range (for sign-extending conversion operations that require non-‘VOIDmode’ immediate operands). ‘Wz’ 32-bit unsigned integer constant, or a symbolic reference known to fit that range (for zero-extending conversion operations that require non-‘VOIDmode’ immediate operands). ‘Wd’ 128-bit integer constant where both the high and low 64-bit word satisfy the ‘e’ constraint. ‘Ws’ A symbolic reference or label reference. You can use the ‘%p’ modifier to print the raw symbol. ‘Z’ 32-bit unsigned integer constant, or a symbolic reference known to fit that range (for immediate operands in zero-extending x86-64 instructions). ‘Tv’ VSIB address operand. ‘Ts’ Address operand without segment register. _Xstormy16--‘config/stormy16/stormy16.h’_ ‘a’ Register r0. ‘b’ Register r1. ‘c’ Register r2. ‘d’ Register r8. ‘e’ Registers r0 through r7. ‘t’ Registers r0 and r1. ‘y’ The carry register. ‘z’ Registers r8 and r9. 
‘I’ A constant between 0 and 3 inclusive. ‘J’ A constant that has exactly one bit set. ‘K’ A constant that has exactly one bit clear. ‘L’ A constant between 0 and 255 inclusive. ‘M’ A constant between −255 and 0 inclusive. ‘N’ A constant between −3 and 0 inclusive. ‘O’ A constant between 1 and 4 inclusive. ‘P’ A constant between −4 and −1 inclusive. ‘Q’ A memory reference that is a stack push. ‘R’ A memory reference that is a stack pop. ‘S’ A memory reference that refers to a constant address of known value. ‘T’ The register indicated by Rx (not implemented yet). ‘U’ A constant that is not between 2 and 15 inclusive. ‘Z’ The constant 0. _Xtensa--‘config/xtensa/constraints.md’_ ‘a’ General-purpose 32-bit register ‘b’ One-bit boolean register ‘A’ MAC16 40-bit accumulator register ‘I’ Signed 12-bit integer constant, for use in MOVI instructions ‘J’ Signed 8-bit integer constant, for use in ADDI instructions ‘K’ Integer constant valid for BccI instructions ‘L’ Unsigned constant valid for BccUI instructions  File: gccint.info, Node: Disable Insn Alternatives, Next: Define Constraints, Prev: Machine Constraints, Up: Constraints 17.9.6 Disable insn alternatives using the ‘enabled’ attribute -------------------------------------------------------------- There are three insn attributes that may be used to selectively disable instruction alternatives: ‘enabled’ Says whether an alternative is available on the current subtarget. ‘preferred_for_size’ Says whether an enabled alternative should be used in code that is optimized for size. ‘preferred_for_speed’ Says whether an enabled alternative should be used in code that is optimized for speed. All these attributes should use ‘(const_int 1)’ to allow an alternative or ‘(const_int 0)’ to disallow it. 
The attributes must be a static property of the subtarget; they cannot, for example, depend on the current operands, on the current optimization level, on the location of the insn within the body of a loop, on whether register allocation has finished, or on the current compiler pass. The ‘enabled’ attribute is a correctness property. It tells GCC to act as though the disabled alternatives were never defined in the first place. This is useful when adding new instructions to an existing pattern in cases where the new instructions are only available for certain CPU architecture levels (typically mapped to the ‘-march=’ command-line option). In contrast, the ‘preferred_for_size’ and ‘preferred_for_speed’ attributes are strong optimization hints rather than correctness properties. ‘preferred_for_size’ tells GCC which alternatives to consider when adding or modifying an instruction that GCC wants to optimize for size. ‘preferred_for_speed’ does the same thing for speed. Note that things like code motion can lead to cases where code optimized for size uses alternatives that are not preferred for size, and similarly for speed. Although ‘define_insn’s can in principle specify the ‘enabled’ attribute directly, it is often clearer to have subsidiary attributes for each architectural feature of interest. The ‘define_insn’s can then use these subsidiary attributes to say which alternatives require which features. The example below does this for ‘cpu_facility’. E.g.
the following two patterns could easily be merged using the ‘enabled’ attribute: (define_insn "*movdi_old" [(set (match_operand:DI 0 "register_operand" "=d") (match_operand:DI 1 "register_operand" " d"))] "!TARGET_NEW" "lgr %0,%1") (define_insn "*movdi_new" [(set (match_operand:DI 0 "register_operand" "=d,f,d") (match_operand:DI 1 "register_operand" " d,d,f"))] "TARGET_NEW" "@ lgr %0,%1 ldgr %0,%1 lgdr %0,%1") to: (define_insn "*movdi_combined" [(set (match_operand:DI 0 "register_operand" "=d,f,d") (match_operand:DI 1 "register_operand" " d,d,f"))] "" "@ lgr %0,%1 ldgr %0,%1 lgdr %0,%1" [(set_attr "cpu_facility" "*,new,new")]) with the ‘enabled’ attribute defined like this: (define_attr "cpu_facility" "standard,new" (const_string "standard")) (define_attr "enabled" "" (cond [(eq_attr "cpu_facility" "standard") (const_int 1) (and (eq_attr "cpu_facility" "new") (ne (symbol_ref "TARGET_NEW") (const_int 0))) (const_int 1)] (const_int 0)))  File: gccint.info, Node: Define Constraints, Next: C Constraint Interface, Prev: Disable Insn Alternatives, Up: Constraints 17.9.7 Defining Machine-Specific Constraints -------------------------------------------- Machine-specific constraints fall into two categories: register and non-register constraints. Within the latter category, constraints which allow subsets of all possible memory or address operands should be specially marked, to give ‘reload’ more information. Machine-specific constraints can be given names of arbitrary length, but they must be entirely composed of letters, digits, underscores (‘_’), and angle brackets (‘< >’). Like C identifiers, they must begin with a letter or underscore. In order to avoid ambiguity in operand constraint strings, no constraint can have a name that begins with any other constraint's name. For example, if ‘x’ is defined as a constraint name, ‘xy’ may not be, and vice versa. 
As a consequence of this rule, no constraint may begin with one of the generic constraint letters: ‘E F V X g i m n o p r s’. Register constraints correspond directly to register classes. *Note Register Classes::. There is thus not much flexibility in their definitions. -- MD Expression: define_register_constraint name regclass docstring [filter] All arguments are string constants. NAME is the name of the constraint, as it will appear in ‘match_operand’ expressions. If NAME is a multi-letter constraint its length shall be the same for all constraints starting with the same letter. REGCLASS can be either the name of the corresponding register class (*note Register Classes::), or a C expression which evaluates to the appropriate register class. If it is an expression, it must have no side effects, and it cannot look at the operand. The usual use of expressions is to map some register constraints to ‘NO_REGS’ when the register class is not available on a given subarchitecture. If an operand occupies multiple hard registers, the constraint requires all of those registers to belong to REGCLASS. For example, if REGCLASS is ‘GENERAL_REGS’ and ‘GENERAL_REGS’ contains registers ‘r0’ to ‘r15’, the constraint does not allow R15 to be used for modes that occupy more than one register. The choice of register is also constrained by ‘TARGET_HARD_REGNO_MODE_OK’. For example, if ‘TARGET_HARD_REGNO_MODE_OK’ disallows ‘(reg:DI r1)’, that requirement applies to all constraints whose classes include ‘r1’. However, it is sometimes useful to impose extra operand-specific requirements on the register number. For example, a target might not want to prevent _all_ odd-even pairs from holding ‘DImode’ values, but it might still need to prevent specific operands from having an odd-numbered register. The optional FILTER argument exists for such cases. When given, FILTER is a C++ expression that evaluates to true if ‘regno’ is a valid register for the operand. 
If an operand occupies multiple registers, the condition applies only to the first register. For example: (define_register_constraint "e" "GENERAL_REGS" "..." "regno % 2 == 0") defines a constraint that requires an even-numbered general register. Filter conditions that impose an alignment are encouraged to test the alignment of ‘regno’ itself, as in the example, rather than calculate an offset relative to the start of the class. If it is sometimes necessary for a register of class C to be aligned to N, the first register in C should itself be divisible by N. DOCSTRING is a sentence documenting the meaning of the constraint. Docstrings are explained further below. Non-register constraints are more like predicates: the constraint definition gives a boolean expression which indicates whether the constraint matches. -- MD Expression: define_constraint name docstring exp The NAME and DOCSTRING arguments are the same as for ‘define_register_constraint’, but note that the docstring comes immediately after the name for these expressions. EXP is an RTL expression, obeying the same rules as the RTL expressions in predicate definitions. *Note Defining Predicates::, for details. If it evaluates true, the constraint matches; if it evaluates false, it doesn't. Constraint expressions should indicate which RTL codes they might match, just like predicate expressions. ‘match_test’ C expressions have access to the following variables: OP The RTL object defining the operand. MODE The machine mode of OP. IVAL ‘INTVAL (OP)’, if OP is a ‘const_int’. HVAL ‘CONST_DOUBLE_HIGH (OP)’, if OP is an integer ‘const_double’. LVAL ‘CONST_DOUBLE_LOW (OP)’, if OP is an integer ‘const_double’. RVAL ‘CONST_DOUBLE_REAL_VALUE (OP)’, if OP is a floating-point ‘const_double’. The *VAL variables should only be used once another piece of the expression has verified that OP is the appropriate kind of RTL object. Most non-register constraints should be defined with ‘define_constraint’.
The remaining definition expressions are only appropriate for constraints that should be handled specially by ‘reload’ if they fail to match. -- MD Expression: define_memory_constraint name docstring exp Use this expression for constraints that match a subset of all memory operands: that is, ‘reload’ can make them match by converting the operand to the form ‘(mem (reg X))’, where X is a base register (from the register class specified by ‘BASE_REG_CLASS’, *note Register Classes::). For example, on the S/390, some instructions do not accept arbitrary memory references, but only those that do not make use of an index register. The constraint letter ‘Q’ is defined to represent a memory address of this type. If ‘Q’ is defined with ‘define_memory_constraint’, a ‘Q’ constraint can handle any memory operand, because ‘reload’ knows it can simply copy the memory address into a base register if required. This is analogous to the way an ‘o’ constraint can handle any memory operand. The syntax and semantics are otherwise identical to ‘define_constraint’. -- MD Expression: define_special_memory_constraint name docstring exp Use this expression for constraints that match a subset of all memory operands when ‘reload’ cannot make them match by reloading the address as described for ‘define_memory_constraint’, or when such an address reload is undesirable from a performance point of view. For example, ‘define_special_memory_constraint’ can be useful if specifically aligned memory is necessary or desirable for some insn operand. The syntax and semantics are otherwise identical to ‘define_memory_constraint’. -- MD Expression: define_relaxed_memory_constraint name docstring exp The test expression in a ‘define_memory_constraint’ can assume that ‘TARGET_LEGITIMATE_ADDRESS_P’ holds for the address inside a ‘mem’ rtx and so it does not need to test this condition itself.
In other words, a ‘define_memory_constraint’ test of the form: (match_test "mem") is enough to test whether an rtx is a ‘mem’ _and_ whether its address satisfies ‘TARGET_MEM_CONSTRAINT’ (which is usually ‘'m'’). Thus the conditions imposed by a ‘define_memory_constraint’ always apply on top of the conditions imposed by ‘TARGET_MEM_CONSTRAINT’. However, it is sometimes useful to define memory constraints that allow addresses beyond those accepted by ‘TARGET_LEGITIMATE_ADDRESS_P’. ‘define_relaxed_memory_constraint’ exists for this case. The test expression in a ‘define_relaxed_memory_constraint’ is applied with no preconditions, so that the expression can determine "from scratch" exactly which addresses are valid and which are not. The syntax and semantics are otherwise identical to ‘define_memory_constraint’. -- MD Expression: define_address_constraint name docstring exp Use this expression for constraints that match a subset of all address operands: that is, ‘reload’ can make the constraint match by converting the operand to the form ‘(reg X)’, again with X a base register. Constraints defined with ‘define_address_constraint’ can only be used with the ‘address_operand’ predicate, or machine-specific predicates that work the same way. They are treated analogously to the generic ‘p’ constraint. The syntax and semantics are otherwise identical to ‘define_constraint’. For historical reasons, names beginning with the letters ‘G H’ are reserved for constraints that match only ‘const_double’s, and names beginning with the letters ‘I J K L M N O P’ are reserved for constraints that match only ‘const_int’s. This may change in the future. For the time being, constraints with these names must be written in a stylized form, so that ‘genpreds’ can tell you did it correctly: (define_constraint "[GHIJKLMNOP]..." "DOC..." 
(and (match_code "const_int") ; ‘const_double’ for G/H CONDITION...)) ; usually a ‘match_test’ It is fine to use names beginning with other letters for constraints that match ‘const_double’s or ‘const_int’s. Each docstring in a constraint definition should be one or more complete sentences, marked up in Texinfo format. _They are currently unused._ In the future they will be copied into the GCC manual, in *note Machine Constraints::, replacing the hand-maintained tables currently found in that section. Also, in the future the compiler may use this to give more helpful diagnostics when poor choice of ‘asm’ constraints causes a reload failure. If you put the pseudo-Texinfo directive ‘@internal’ at the beginning of a docstring, then (in the future) it will appear only in the internals manual's version of the machine-specific constraint tables. Use this for constraints that should not appear in ‘asm’ statements.  File: gccint.info, Node: C Constraint Interface, Prev: Define Constraints, Up: Constraints 17.9.8 Testing constraints from C --------------------------------- It is occasionally useful to test a constraint from C code rather than implicitly via the constraint string in a ‘match_operand’. The generated file ‘tm_p.h’ declares a few interfaces for working with constraints. At present these are defined for all constraints except ‘g’ (which is equivalent to ‘general_operand’). Some valid constraint names are not valid C identifiers, so there is a mangling scheme for referring to them from C. Constraint names that do not contain angle brackets or underscores are left unchanged. Underscores are doubled, each ‘<’ is replaced with ‘_l’, and each ‘>’ with ‘_g’. 
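The mangling rule just described can be sketched in a few lines of C. This is only an illustration of the rule as stated in the text; the function name ‘mangle_constraint_name’ and its buffer-based interface are assumptions made for the example and are not part of GCC.

```c
#include <stddef.h>

/* Hypothetical helper (not a GCC function): write the mangled form of
   the constraint name NAME into OUT, which has room for OUT_SIZE bytes
   including the terminating NUL.  Return 0 on success, or -1 if OUT is
   too small.  */
int
mangle_constraint_name (const char *name, char *out, size_t out_size)
{
  size_t n = 0;

  if (out_size == 0)
    return -1;

  for (const char *p = name; *p != '\0'; p++)
    {
      char single[2] = { *p, '\0' };
      const char *repl;

      switch (*p)
        {
        case '_': repl = "__"; break;   /* underscores are doubled */
        case '<': repl = "_l"; break;   /* '<' becomes "_l" */
        case '>': repl = "_g"; break;   /* '>' becomes "_g" */
        default:  repl = single; break; /* everything else is unchanged */
        }

      for (const char *r = repl; *r != '\0'; r++)
        {
          /* Keep one byte free for the terminating NUL.  */
          if (n + 1 >= out_size)
            return -1;
          out[n++] = *r;
        }
    }

  out[n] = '\0';
  return 0;
}
```

Because underscores are doubled before ‘_l’ and ‘_g’ are introduced, two different constraint names never produce the same mangled name under this rule.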
Here are some examples:

     *Original*   *Mangled*
     x            x
     P42x         P42x
     P4_x         P4__x
     P4>x         P4_gx
     P4>>         P4_g_g
     P4_g>        P4__g_g

Throughout this section, the variable C is either a constraint in the abstract sense, or a constant from ‘enum constraint_num’; the variable M is a mangled constraint name (usually as part of a larger identifier). -- Enum: constraint_num For each constraint except ‘g’, there is a corresponding enumeration constant: ‘CONSTRAINT_’ plus the mangled name of the constraint. Functions that take an ‘enum constraint_num’ as an argument expect one of these constants. -- Function: inline bool satisfies_constraint_M (rtx EXP) For each non-register constraint M except ‘g’, there is one of these functions; it returns ‘true’ if EXP satisfies the constraint. These functions are only visible if ‘rtl.h’ was included before ‘tm_p.h’. -- Function: bool constraint_satisfied_p (rtx EXP, enum constraint_num C) Like the ‘satisfies_constraint_M’ functions, but the constraint to test is given as an argument, C. If C specifies a register constraint, this function will always return ‘false’. -- Function: enum reg_class reg_class_for_constraint (enum constraint_num C) Returns the register class associated with C. If C is not a register constraint, or those registers are not available for the currently selected subtarget, returns ‘NO_REGS’. Here is an example use of ‘satisfies_constraint_M’. In peephole optimizations (*note Peephole Definitions::), operand constraint strings are ignored, so if there are relevant constraints, they must be tested in the C condition. In the example, the optimization is applied if operand 2 does _not_ satisfy the ‘K’ constraint. (This is a simplified version of a peephole definition from the i386 machine description.)
(define_peephole2 [(match_scratch:SI 3 "r") (set (match_operand:SI 0 "register_operand" "") (mult:SI (match_operand:SI 1 "memory_operand" "") (match_operand:SI 2 "immediate_operand" "")))] "!satisfies_constraint_K (operands[2])" [(set (match_dup 3) (match_dup 1)) (set (match_dup 0) (mult:SI (match_dup 3) (match_dup 2)))] "")  File: gccint.info, Node: Standard Names, Next: Pattern Ordering, Prev: Constraints, Up: Machine Desc 17.10 Standard Pattern Names For Generation =========================================== Here is a table of the instruction names that are meaningful in the RTL generation pass of the compiler. Giving one of these names to an instruction pattern tells the RTL generation pass that it can use the pattern to accomplish a certain task. ‘movM’ Here M stands for a two-letter machine mode name, in lowercase. This instruction pattern moves data with that machine mode from operand 1 to operand 0. For example, ‘movsi’ moves full-word data. If operand 0 is a ‘subreg’ with mode M of a register whose own mode is wider than M, the effect of this instruction is to store the specified value in the part of the register that corresponds to mode M. Bits outside of M, but which are within the same target word as the ‘subreg’ are undefined. Bits which are outside the target word are left unchanged. This class of patterns is special in several ways. First of all, each of these names up to and including full word size _must_ be defined, because there is no other way to copy a datum from one place to another. If there are patterns accepting operands in larger modes, ‘movM’ must be defined for integer modes of those sizes. Second, these patterns are not used solely in the RTL generation pass. Even the reload pass can generate move insns to copy values from stack slots into temporary registers. When it does so, one of the operands is a hard register and the other is an operand that can need to be reloaded into a register. 
Therefore, when given such a pair of operands, the pattern must generate RTL which needs no reloading and needs no temporary registers--no registers other than the operands. For example, if you support the pattern with a ‘define_expand’, then in such a case the ‘define_expand’ mustn't call ‘force_reg’ or any other such function which might generate new pseudo registers. This requirement exists even for subword modes on a RISC machine where fetching those modes from memory normally requires several insns and some temporary registers. During reload a memory reference with an invalid address may be passed as an operand. Such an address will be replaced with a valid address later in the reload pass. In this case, nothing may be done with the address except to use it as it stands. If it is copied, it will not be replaced with a valid address. No attempt should be made to make such an address into a valid address and no routine (such as ‘change_address’) that will do so may be called. Note that ‘general_operand’ will fail when applied to such an address. The global variable ‘reload_in_progress’ (which must be explicitly declared if required) can be used to determine whether such special handling is required. The variety of operands that have reloads depends on the rest of the machine description, but typically on a RISC machine these can only be pseudo registers that did not get hard registers, while on other machines explicit memory references will get optional reloads. If a scratch register is required to move an object to or from memory, it can be allocated using ‘gen_reg_rtx’ prior to life analysis. If there are cases which need scratch registers during or after reload, you must provide an appropriate secondary_reload target hook. The macro ‘can_create_pseudo_p’ can be used to determine whether it is safe to create new pseudo registers. If it evaluates to zero, then it is unsafe to call ‘gen_reg_rtx’ to allocate a new pseudo.
The constraints on a ‘movM’ must permit moving any hard register to any other hard register provided that ‘TARGET_HARD_REGNO_MODE_OK’ permits mode M in both registers and ‘TARGET_REGISTER_MOVE_COST’ applied to their classes returns a value of 2. It is obligatory to support floating point ‘movM’ instructions into and out of any registers that can hold fixed point values, because unions and structures (which have modes ‘SImode’ or ‘DImode’) can be in those registers and they may have floating point members. There may also be a need to support fixed point ‘movM’ instructions in and out of floating point registers. Unfortunately, I have forgotten why this was so, and I don't know whether it is still true. If ‘TARGET_HARD_REGNO_MODE_OK’ rejects fixed point values in floating point registers, then the constraints of the fixed point ‘movM’ instructions must be designed to avoid ever trying to reload into a floating point register. ‘reload_inM’ ‘reload_outM’ These named patterns have been obsoleted by the target hook ‘secondary_reload’. Like ‘movM’, but used when a scratch register is required to move between operand 0 and operand 1. Operand 2 describes the scratch register. See the discussion of the ‘SECONDARY_RELOAD_CLASS’ macro in *note Register Classes::. There are special restrictions on the form of the ‘match_operand’s used in these patterns. First, only the predicate for the reload operand is examined, i.e., ‘reload_in’ examines operand 1, but not the predicates for operand 0 or 2. Second, there may be only one alternative in the constraints. Third, only a single register class letter may be used for the constraint; subsequent constraint letters are ignored. As a special exception, an empty constraint string matches the ‘ALL_REGS’ register class. This may relieve ports of the burden of defining an ‘ALL_REGS’ constraint letter just for these patterns. 
‘movstrictM’ Like ‘movM’ except that if operand 0 is a ‘subreg’ with mode M of a register whose natural mode is wider, the ‘movstrictM’ instruction is guaranteed not to alter any of the register except the part which belongs to mode M. ‘movmisalignM’ This variant of a move pattern is designed to load or store a value from a memory address that is not naturally aligned for its mode. For a store, the memory will be in operand 0; for a load, the memory will be in operand 1. The other operand is guaranteed not to be a memory, so that it's easy to tell whether this is a load or store. This pattern is used by the autovectorizer, and when expanding a ‘MISALIGNED_INDIRECT_REF’ expression. ‘load_multiple’ Load several consecutive memory locations into consecutive registers. Operand 0 is the first of the consecutive registers, operand 1 is the first memory location, and operand 2 is a constant: the number of consecutive registers. Define this only if the target machine really has such an instruction; do not define this if the most efficient way of loading consecutive registers from memory is to do them one at a time. On some machines, there are restrictions as to which consecutive registers can be stored into memory, such as particular starting or ending register numbers or only a range of valid counts. For those machines, use a ‘define_expand’ (*note Expander Definitions::) and make the pattern fail if the restrictions are not met. Write the generated insn as a ‘parallel’ with elements being a ‘set’ of one register from the appropriate memory location (you may also need ‘use’ or ‘clobber’ elements). Use a ‘match_parallel’ (*note RTL Template::) to recognize the insn. See ‘rs6000.md’ for examples of the use of this insn pattern. ‘store_multiple’ Similar to ‘load_multiple’, but store several consecutive registers into consecutive memory locations. 
Operand 0 is the first of the consecutive memory locations, operand 1 is the first register, and operand 2 is a constant: the number of consecutive registers. ‘vec_load_lanesMN’ Perform an interleaved load of several vectors from memory operand 1 into register operand 0. Both operands have mode M. The register operand is viewed as holding consecutive vectors of mode N, while the memory operand is a flat array that contains the same number of elements. The operation is equivalent to: int c = GET_MODE_SIZE (M) / GET_MODE_SIZE (N); for (j = 0; j < GET_MODE_NUNITS (N); j++) for (i = 0; i < c; i++) operand0[i][j] = operand1[j * c + i]; For example, ‘vec_load_lanestiv4hi’ loads 8 16-bit values from memory into a register of mode ‘TI’. The register contains two consecutive vectors of mode ‘V4HI’. This pattern can only be used if: TARGET_ARRAY_MODE_SUPPORTED_P (N, C) is true. GCC assumes that, if a target supports this kind of instruction for some mode N, it also supports unaligned loads for vectors of mode N. This pattern is not allowed to ‘FAIL’. ‘vec_mask_load_lanesMN’ Like ‘vec_load_lanesMN’, but takes an additional mask operand (operand 2) that specifies which elements of the destination vectors should be loaded. Other elements of the destination vectors are set to zero. The operation is equivalent to: int c = GET_MODE_SIZE (M) / GET_MODE_SIZE (N); for (j = 0; j < GET_MODE_NUNITS (N); j++) if (operand2[j]) for (i = 0; i < c; i++) operand0[i][j] = operand1[j * c + i]; else for (i = 0; i < c; i++) operand0[i][j] = 0; This pattern is not allowed to ‘FAIL’. ‘vec_mask_len_load_lanesMN’ Like ‘vec_load_lanesMN’, but takes an additional mask operand (operand 2), length operand (operand 3) as well as bias operand (operand 4) that specifies which elements of the destination vectors should be loaded. Other elements of the destination vectors are undefined. 
The operation is equivalent to: int c = GET_MODE_SIZE (M) / GET_MODE_SIZE (N); for (j = 0; j < operand3 + operand4; j++) if (operand2[j]) for (i = 0; i < c; i++) operand0[i][j] = operand1[j * c + i]; This pattern is not allowed to ‘FAIL’. ‘vec_store_lanesMN’ Equivalent to ‘vec_load_lanesMN’, with the memory and register operands reversed. That is, the instruction is equivalent to: int c = GET_MODE_SIZE (M) / GET_MODE_SIZE (N); for (j = 0; j < GET_MODE_NUNITS (N); j++) for (i = 0; i < c; i++) operand0[j * c + i] = operand1[i][j]; for a memory operand 0 and register operand 1. This pattern is not allowed to ‘FAIL’. ‘vec_mask_store_lanesMN’ Like ‘vec_store_lanesMN’, but takes an additional mask operand (operand 2) that specifies which elements of the source vectors should be stored. The operation is equivalent to: int c = GET_MODE_SIZE (M) / GET_MODE_SIZE (N); for (j = 0; j < GET_MODE_NUNITS (N); j++) if (operand2[j]) for (i = 0; i < c; i++) operand0[j * c + i] = operand1[i][j]; This pattern is not allowed to ‘FAIL’. ‘vec_mask_len_store_lanesMN’ Like ‘vec_store_lanesMN’, but takes an additional mask operand (operand 2), length operand (operand 3) as well as bias operand (operand 4) that specifies which elements of the source vectors should be stored. The operation is equivalent to: int c = GET_MODE_SIZE (M) / GET_MODE_SIZE (N); for (j = 0; j < operand3 + operand4; j++) if (operand2[j]) for (i = 0; i < c; i++) operand0[j * c + i] = operand1[i][j]; This pattern is not allowed to ‘FAIL’. ‘gather_loadMN’ Load several separate memory locations into a vector of mode M. Operand 1 is a scalar base address and operand 2 is a vector of mode N containing offsets from that base. Operand 0 is a destination vector with the same number of elements as N. 
For each element index I: • extend the offset element I to address width, using zero extension if operand 3 is 1 and sign extension if operand 3 is zero; • multiply the extended offset by operand 4; • add the result to the base; and • load the value at that address into element I of operand 0. The value of operand 3 does not matter if the offsets are already address width. ‘mask_gather_loadMN’ Like ‘gather_loadMN’, but takes an extra mask operand as operand 5. Bit I of the mask is set if element I of the result should be loaded from memory and clear if element I of the result should be set to zero. ‘mask_len_gather_loadMN’ Like ‘gather_loadMN’, but takes an extra mask operand (operand 5), a len operand (operand 6) as well as a bias operand (operand 7). Similar to mask_len_load, the instruction loads at most (operand 6 + operand 7) elements from memory. Bit I of the mask is set if element I of the result should be loaded from memory and clear if element I of the result should be undefined. Mask elements I with I > (operand 6 + operand 7) are ignored. ‘scatter_storeMN’ Store a vector of mode M into several distinct memory locations. Operand 0 is a scalar base address and operand 1 is a vector of mode N containing offsets from that base. Operand 4 is the vector of values that should be stored, which has the same number of elements as N. For each element index I: • extend the offset element I to address width, using zero extension if operand 2 is 1 and sign extension if operand 2 is zero; • multiply the extended offset by operand 3; • add the result to the base; and • store element I of operand 4 to that address. The value of operand 2 does not matter if the offsets are already address width. ‘mask_scatter_storeMN’ Like ‘scatter_storeMN’, but takes an extra mask operand as operand 5. Bit I of the mask is set if element I of the result should be stored to memory. 
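As an informal illustration, the per-element steps listed for ‘gather_loadMN’ and ‘scatter_storeMN’ can be modeled in scalar C. This is a sketch under stated assumptions, not GCC code: the function names, the fixed 16-bit offset elements and 32-bit data elements, and the representation of the mask as one byte per element are all choices made for the example. Passing a null mask models the unmasked patterns.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Extend one 16-bit offset element to a wider signed type.  ZERO_EXTEND
   plays the role of the extension operand: 1 selects zero extension,
   0 selects sign extension.  */
static int64_t
extend_offset (int16_t off, int zero_extend)
{
  return zero_extend ? (int64_t) (uint16_t) off : (int64_t) off;
}

/* Hypothetical scalar model of gather_load / mask_gather_load.
   MASK may be NULL (no masking); otherwise element I is loaded only if
   MASK[I] is nonzero, and is set to zero if MASK[I] is zero.  */
void
model_gather_load (int32_t *dest, const unsigned char *base,
                   const int16_t *offsets, int zero_extend, int scale,
                   const unsigned char *mask, size_t nelems)
{
  for (size_t i = 0; i < nelems; i++)
    {
      if (mask != NULL && !mask[i])
        {
          dest[i] = 0;
          continue;
        }
      /* Extend the offset, multiply by the scale, add to the base.  */
      int64_t byte_off = extend_offset (offsets[i], zero_extend) * scale;
      memcpy (&dest[i], base + byte_off, sizeof dest[i]);
    }
}

/* Hypothetical scalar model of scatter_store / mask_scatter_store.
   Elements whose mask byte is zero are not stored.  */
void
model_scatter_store (unsigned char *base, const int16_t *offsets,
                     int zero_extend, int scale, const int32_t *src,
                     const unsigned char *mask, size_t nelems)
{
  for (size_t i = 0; i < nelems; i++)
    {
      if (mask != NULL && !mask[i])
        continue;
      int64_t byte_off = extend_offset (offsets[i], zero_extend) * scale;
      memcpy (base + byte_off, &src[i], sizeof src[i]);
    }
}
```

The model makes the role of the extension operand visible: a 16-bit offset of -1 addresses the element just before the base under sign extension, but byte 65535 times the scale past the base under zero extension.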
‘mask_len_scatter_storeMN’ Like ‘scatter_storeMN’, but takes an extra mask operand (operand 5), a len operand (operand 6) as well as a bias operand (operand 7). The instruction stores at most (operand 6 + operand 7) elements of (operand 4) to memory. Bit I of the mask is set if element I of (operand 4) should be stored. Mask elements I with I > (operand 6 + operand 7) are ignored. ‘vec_setM’ Set the given field in the vector value. Operand 0 is the vector to modify, operand 1 is the new value of the field and operand 2 specifies the field index. This pattern is not allowed to ‘FAIL’. ‘vec_extractMN’ Extract the given field from the vector value. Operand 1 is the vector, operand 2 specifies the field index and operand 0 is the place to store the value into. The N mode is the mode of the field or vector of fields that should be extracted; it should be either the element mode of the vector mode M, or a vector mode with the same element mode and a smaller number of elements. If N is a vector mode, the index is counted in multiples of mode N. This pattern is not allowed to ‘FAIL’. ‘vec_initMN’ Initialize the vector to the given values. Operand 0 is the vector to initialize and operand 1 is a parallel containing values for the individual fields. The N mode is the mode of the elements; it should be either the element mode of the vector mode M, or a vector mode with the same element mode and a smaller number of elements. ‘vec_duplicateM’ Initialize vector output operand 0 so that each element has the value given by scalar input operand 1. The vector has mode M and the scalar has the mode appropriate for one element of M. This pattern only handles duplicates of non-constant inputs. Constant vectors go through the ‘movM’ pattern instead. This pattern is not allowed to ‘FAIL’. ‘vec_seriesM’ Initialize vector output operand 0 so that element I is equal to operand 1 plus I times operand 2. In other words, create a linear series whose base value is operand 1 and whose step is operand 2.
The vector output has mode M and the scalar inputs have the mode appropriate for one element of M. This pattern is not used for floating-point vectors, in order to avoid having to specify the rounding behavior for I > 1. This pattern is not allowed to ‘FAIL’. ‘while_ultMN’ Set operand 0 to a mask that is true while incrementing operand 1 gives a value that is less than operand 2, for a vector length up to operand 3. Operand 0 has mode N and operands 1 and 2 are scalar integers of mode M. Operand 3 should be omitted when N is a vector mode, and a ‘CONST_INT’ otherwise. The operation for vector modes is equivalent to: operand0[0] = operand1 < operand2; for (i = 1; i < GET_MODE_NUNITS (N); i++) operand0[i] = operand0[i - 1] && (operand1 + i < operand2); And for non-vector modes the operation is equivalent to: operand0[0] = operand1 < operand2; for (i = 1; i < operand3; i++) operand0[i] = operand0[i - 1] && (operand1 + i < operand2); ‘select_vlM’ Set operand 0 to the number of scalar iterations that should be handled by one iteration of a vector loop. Operand 1 is the total number of scalar iterations that the loop needs to process and operand 2 is a maximum bound on the result (also known as the maximum "vectorization factor"). The maximum value of operand 0 is given by: operand0 = MIN (operand1, operand2) However, targets might choose a lower value than this, based on target-specific criteria. Each iteration of the vector loop might therefore process a different number of scalar iterations, which in turn means that induction variables will have a variable step. Because of this, it is generally not useful to define this instruction if it will always calculate the maximum value. This optab is only useful on targets that implement ‘len_load_M’ and/or ‘len_store_M’. 
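The effect of ‘select_vlM’ on loop structure can be sketched in scalar C. The sketch below models only the documented upper bound, MIN (operand1, operand2); a real target may return a smaller value. The function names are hypothetical, and the inner loop merely stands in for a length-controlled vector load and store.

```c
#include <assert.h>
#include <stddef.h>

/* Upper bound on the result of select_vl: operand 1 is the number of
   scalar iterations still to do, operand 2 the maximum vectorization
   factor.  Hypothetical model, not a GCC interface.  */
static size_t
model_select_vl (size_t remaining, size_t max_vf)
{
  return remaining < max_vf ? remaining : max_vf;
}

/* Add 1 to each of the N elements of DATA, handling at most MAX_VF
   elements per "vector" iteration.  Returns the number of vector
   iterations executed.  */
static size_t
model_vector_loop (int *data, size_t n, size_t max_vf)
{
  assert (max_vf > 0);
  size_t iterations = 0;
  for (size_t done = 0; done < n; iterations++)
    {
      size_t vl = model_select_vl (n - done, max_vf);
      for (size_t i = 0; i < vl; i++)  /* Stands in for len_load/len_store.  */
        data[done + i] += 1;
      done += vl;                      /* The induction step varies with VL.  */
    }
  return iterations;
}
```

With ten elements and a maximum vectorization factor of four, this model processes 4, 4 and 2 elements in three iterations, which shows why induction variables need a variable step.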
‘check_raw_ptrsM’ Check whether, given two pointers A and B and a length LEN, a write of LEN bytes at A followed by a read of LEN bytes at B can be split into interleaved byte accesses ‘A[0], B[0], A[1], B[1], ...’ without affecting the dependencies between the bytes. Set operand 0 to true if the split is possible and false otherwise. Operands 1, 2 and 3 provide the values of A, B and LEN respectively. Operand 4 is a constant integer that provides the known common alignment of A and B. All inputs have mode M. This split is possible if: A == B || A + LEN <= B || B + LEN <= A You should only define this pattern if the target has a way of accelerating the test without having to do the individual comparisons. ‘check_war_ptrsM’ Like ‘check_raw_ptrsM’, but with the read and write swapped round. The split is possible in this case if: B <= A || A + LEN <= B ‘vec_cmpMN’ Output a vector comparison. Operand 0 of mode N is the destination for predicate in operand 1 which is a signed vector comparison with operands of mode M in operands 2 and 3. Predicate is computed by element-wise evaluation of the vector comparison with a truth value of all-ones and a false value of all-zeros. ‘vec_cmpuMN’ Similar to ‘vec_cmpMN’ but perform unsigned vector comparison. ‘vec_cmpeqMN’ Similar to ‘vec_cmpMN’ but perform equality or non-equality vector comparison only. If ‘vec_cmpMN’ or ‘vec_cmpuMN’ instruction pattern is supported, it will be preferred over ‘vec_cmpeqMN’, so there is no need to define this instruction pattern if the others are supported. ‘vcondMN’ Output a conditional vector move. Operand 0 is the destination to receive a combination of operand 1 and operand 2, which are of mode M, dependent on the outcome of the predicate in operand 3 which is a signed vector comparison with operands of mode N in operands 4 and 5. The modes M and N should have the same size. 
Operand 0 will be set to the value (OP1 & MSK) | (OP2 & ~MSK) where MSK is computed by element-wise evaluation of the vector comparison with a truth value of all-ones and a false value of all-zeros. ‘vconduMN’ Similar to ‘vcondMN’ but performs unsigned vector comparison. ‘vcondeqMN’ Similar to ‘vcondMN’ but performs equality or non-equality vector comparison only. If ‘vcondMN’ or ‘vconduMN’ instruction pattern is supported, it will be preferred over ‘vcondeqMN’, so there is no need to define this instruction pattern if the others are supported. ‘vcond_mask_MN’ Similar to ‘vcondMN’ but operand 3 holds a pre-computed result of vector comparison. ‘vcond_mask_len_MN’ Set each element of operand 0 to the corresponding element of operand 2 or operand 3. Choose operand 2 if both the element index is less than operand 4 plus operand 5 and the corresponding element of operand 1 is nonzero: for (i = 0; i < GET_MODE_NUNITS (M); i++) op0[i] = i < op4 + op5 && op1[i] ? op2[i] : op3[i]; Operands 0, 2 and 3 have mode M. Operand 1 has mode N. Operands 4 and 5 have a target-dependent scalar integer mode. ‘maskloadMN’ Perform a masked load of a vector from memory operand 1 of mode M into register operand 0. The mask is provided in register operand 2 of mode N. This pattern is not allowed to ‘FAIL’. ‘maskstoreMN’ Perform a masked store of a vector from register operand 1 of mode M into memory operand 0. The mask is provided in register operand 2 of mode N. This pattern is not allowed to ‘FAIL’. ‘len_load_M’ Load (operand 2 + operand 3) elements from memory operand 1 into vector register operand 0, setting the other elements of operand 0 to undefined values. Operands 0 and 1 have mode M, which must be a vector mode. Operand 2 has whichever integer mode the target prefers. Operand 3 conceptually has mode ‘QI’. Operand 2 can be a variable or a constant amount. Operand 3 specifies a constant bias: it is either a constant 0 or a constant -1.
The predicate on operand 3 must only accept the bias values that the target actually supports. GCC handles a bias of 0 more efficiently than a bias of -1. If (operand 2 + operand 3) exceeds the number of elements in mode M, the behavior is undefined. If the target prefers the length to be measured in bytes rather than elements, it should only implement this pattern for vectors of ‘QI’ elements. This pattern is not allowed to ‘FAIL’. ‘len_store_M’ Store (operand 2 + operand 3) vector elements from vector register operand 1 into memory operand 0, leaving the other elements of operand 0 unchanged. Operands 0 and 1 have mode M, which must be a vector mode. Operand 2 has whichever integer mode the target prefers. Operand 3 conceptually has mode ‘QI’. Operand 2 can be a variable or a constant amount. Operand 3 specifies a constant bias: it is either a constant 0 or a constant -1. The predicate on operand 3 must only accept the bias values that the target actually supports. GCC handles a bias of 0 more efficiently than a bias of -1. If (operand 2 + operand 3) exceeds the number of elements in mode M, the behavior is undefined. If the target prefers the length to be measured in bytes rather than elements, it should only implement this pattern for vectors of ‘QI’ elements. This pattern is not allowed to ‘FAIL’. ‘mask_len_loadMN’ Perform a masked load from the memory location pointed to by operand 1 into register operand 0. (operand 3 + operand 4) elements are loaded from memory and other elements in operand 0 are set to undefined values. This is a combination of len_load and maskload. Operands 0 and 1 have mode M, which must be a vector mode. Operand 3 has whichever integer mode the target prefers. A mask is specified in operand 2 which must be of type N. The mask has lower precedence than the length and is itself subject to length masking, i.e. only mask indices < (operand 3 + operand 4) are used. Operand 4 conceptually has mode ‘QI’. 
Operand 3 can be a variable or a constant amount. Operand 4 specifies a constant bias: it is either a constant 0 or a constant -1. The predicate on operand 4 must only accept the bias values that the target actually supports. GCC handles a bias of 0 more efficiently than a bias of -1. If (operand 3 + operand 4) exceeds the number of elements in mode M, the behavior is undefined. If the target prefers the length to be measured in bytes rather than elements, it should only implement this pattern for vectors of ‘QI’ elements. This pattern is not allowed to ‘FAIL’. ‘mask_len_storeMN’ Perform a masked store from vector register operand 1 into memory operand 0. (operand 3 + operand 4) elements are stored to memory, leaving the other elements of operand 0 unchanged. This is a combination of len_store and maskstore. Operands 0 and 1 have mode M, which must be a vector mode. Operand 3 has whichever integer mode the target prefers. A mask is specified in operand 2 which must be of type N. The mask has lower precedence than the length and is itself subject to length masking, i.e. only mask indices < (operand 3 + operand 4) are used. Operand 4 conceptually has mode ‘QI’. Operand 3 can be a variable or a constant amount. Operand 4 specifies a constant bias: it is either a constant 0 or a constant -1. The predicate on operand 4 must only accept the bias values that the target actually supports. GCC handles a bias of 0 more efficiently than a bias of -1. If (operand 3 + operand 4) exceeds the number of elements in mode M, the behavior is undefined. If the target prefers the length to be measured in bytes rather than elements, it should only implement this pattern for vectors of ‘QI’ elements. This pattern is not allowed to ‘FAIL’. ‘vec_permM’ Output a (variable) vector permutation. Operand 0 is the destination to receive elements from operand 1 and operand 2, which are of mode M. Operand 3 is the “selector”.
It is an integral mode vector of the same width and number of elements as mode M. The input elements are numbered from 0 in operand 1 through 2*N-1 in operand 2, where N is the number of elements in mode M. The elements of the selector must be computed modulo 2*N. Note that if ‘rtx_equal_p(operand1, operand2)’, this can be implemented with just operand 1 and selector elements modulo N. In order to make things easy for a number of targets, if there is no ‘vec_perm’ pattern for mode M, but there is for mode Q where Q is a vector of ‘QImode’ of the same width as M, the middle-end will lower the mode M ‘VEC_PERM_EXPR’ to mode Q. See also ‘TARGET_VECTORIZE_VEC_PERM_CONST’, which performs the analogous operation for constant selectors. ‘pushM1’ Output a push instruction. Operand 0 is the value to push. Used only when ‘PUSH_ROUNDING’ is defined. For historical reasons, this pattern may be missing, in which case a ‘mov’ expander is used instead, with a ‘MEM’ expression forming the push operation. The ‘mov’ expander method is deprecated. ‘addM3’ Add operand 2 and operand 1, storing the result in operand 0. All operands must have mode M. This can be used even on two-address machines, by means of constraints requiring operands 1 and 0 to be the same location. ‘ssaddM3’, ‘usaddM3’ ‘subM3’, ‘sssubM3’, ‘ussubM3’ ‘mulM3’, ‘ssmulM3’, ‘usmulM3’ ‘divM3’, ‘ssdivM3’ ‘udivM3’, ‘usdivM3’ ‘modM3’, ‘umodM3’ ‘uminM3’, ‘umaxM3’ ‘andM3’, ‘iorM3’, ‘xorM3’ Similar, for other arithmetic operations. ‘addvM4’ Like ‘addM3’ but takes a ‘code_label’ as operand 3 and emits code to jump to it if signed overflow occurs during the addition. This pattern is used to implement the built-in functions performing signed integer addition with overflow checking. ‘subvM4’, ‘mulvM4’ Similar, for other signed arithmetic operations. ‘uaddvM4’ Like ‘addvM4’ but for unsigned addition. That is to say, the operation is the same as signed addition but the jump is taken only on unsigned overflow.
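At the source level, the distinction between ‘addvM4’ and ‘uaddvM4’ can be observed with the overflow-checking built-in functions ‘__builtin_sadd_overflow’ and ‘__builtin_uadd_overflow’. The wrapper names below are hypothetical, and whether a given target expands these built-ins through the patterns above depends on its machine description; the sketch only shows that the same bit patterns can overflow in one signedness and not in the other.

```c
#include <limits.h>
#include <stdbool.h>

/* Return true if A + B overflows as a signed addition; the wrapped
   result is stored in *SUM either way.  */
static bool
signed_add_overflows (int a, int b, int *sum)
{
  return __builtin_sadd_overflow (a, b, sum);
}

/* Return true if A + B overflows as an unsigned addition; the wrapped
   result is stored in *SUM either way.  */
static bool
unsigned_add_overflows (unsigned a, unsigned b, unsigned *sum)
{
  return __builtin_uadd_overflow (a, b, sum);
}
```

Adding 1 to the all-ones bit pattern is not an overflow when the value is read as the signed integer -1, but it is an overflow when the value is read as ‘UINT_MAX’, even though the stored sum is zero in both cases.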
‘usubvM4’, ‘umulvM4’ Similar, for other unsigned arithmetic operations. ‘uaddcM5’ Adds unsigned operands 2, 3 and 4 (where the last operand is guaranteed to have only values 0 or 1) together, sets operand 0 to the result of the addition of the 3 operands and sets operand 1 to 1 iff there was overflow on the unsigned additions, and to 0 otherwise. So, it is an addition with carry in (operand 4) and carry out (operand 1). All operands have the same mode. ‘usubcM5’ Similar to ‘uaddcM5’, except that it subtracts unsigned operands 3 and 4 from operand 2 instead of adding them. So, it is a subtraction with carry/borrow in (operand 4) and carry/borrow out (operand 1). All operands have the same mode. ‘addptrM3’ Like ‘addM3’ but is guaranteed to only be used for address calculations. The expanded code is not allowed to clobber the condition code. It only needs to be defined if ‘addM3’ sets the condition code. If adds used for address calculations and normal adds are not compatible, it is required to expand a distinct pattern (e.g. using an unspec). The pattern is used by LRA to emit address calculations. ‘addM3’ is used if ‘addptrM3’ is not defined. ‘fmaM4’ Multiply operand 2 and operand 1, then add operand 3, storing the result in operand 0 without doing an intermediate rounding step. All operands must have mode M. This pattern is used to implement the ‘fma’, ‘fmaf’, and ‘fmal’ builtin functions from the ISO C99 standard. ‘fmsM4’ Like ‘fmaM4’, except that operand 3 is subtracted from the product instead of added to the product. This is represented in the RTL as (fma:M OP1 OP2 (neg:M OP3)) ‘fnmaM4’ Like ‘fmaM4’ except that the intermediate product is negated before being added to operand 3. This is represented in the RTL as (fma:M (neg:M OP1) OP2 OP3) ‘fnmsM4’ Like ‘fmsM4’ except that the intermediate product is negated before subtracting operand 3. This is represented in the RTL as (fma:M (neg:M OP1) OP2 (neg:M OP3)) ‘sminM3’, ‘smaxM3’ Signed minimum and maximum operations.
When used with floating point, if both operands are zeros, or if either operand is ‘NaN’, then it is unspecified which of the two operands is returned as the result. ‘fminM3’, ‘fmaxM3’ IEEE-conformant minimum and maximum operations. If one operand is a quiet ‘NaN’, then the other operand is returned. If both operands are quiet ‘NaN’, then a quiet ‘NaN’ is returned. In the case when gcc supports signaling ‘NaN’ (-fsignaling-nans) an invalid floating point exception is raised and a quiet ‘NaN’ is returned. All operands have mode M, which is a scalar or vector floating-point mode. These patterns are not allowed to ‘FAIL’. ‘reduc_smin_scal_M’, ‘reduc_smax_scal_M’ Find the signed minimum/maximum of the elements of a vector. The vector is operand 1, and operand 0 is the scalar result, with mode equal to the mode of the elements of the input vector. ‘reduc_umin_scal_M’, ‘reduc_umax_scal_M’ Find the unsigned minimum/maximum of the elements of a vector. The vector is operand 1, and operand 0 is the scalar result, with mode equal to the mode of the elements of the input vector. ‘reduc_fmin_scal_M’, ‘reduc_fmax_scal_M’ Find the floating-point minimum/maximum of the elements of a vector, using the same rules as ‘fminM3’ and ‘fmaxM3’. Operand 1 is a vector of mode M and operand 0 is the scalar result, which has mode ‘GET_MODE_INNER (M)’. ‘reduc_plus_scal_M’ Compute the sum of the elements of a vector. The vector is operand 1, and operand 0 is the scalar result, with mode equal to the mode of the elements of the input vector. ‘reduc_and_scal_M’ ‘reduc_ior_scal_M’ ‘reduc_xor_scal_M’ Compute the bitwise ‘AND’/‘IOR’/‘XOR’ reduction of the elements of a vector of mode M. Operand 1 is the vector input and operand 0 is the scalar result. The mode of the scalar result is the same as one element of M. ‘extract_last_M’ Find the last set bit in mask operand 1 and extract the associated element of vector operand 2. Store the result in scalar operand 0. 
Operand 2 has vector mode M while operand 0 has the mode appropriate for one element of M. Operand 1 has the usual mask mode for vectors of mode M; see ‘TARGET_VECTORIZE_GET_MASK_MODE’. ‘fold_extract_last_M’ If any bits of mask operand 2 are set, find the last set bit, extract the associated element from vector operand 3, and store the result in operand 0. Store operand 1 in operand 0 otherwise. Operand 3 has mode M and operands 0 and 1 have the mode appropriate for one element of M. Operand 2 has the usual mask mode for vectors of mode M; see ‘TARGET_VECTORIZE_GET_MASK_MODE’. ‘len_fold_extract_last_M’ Like ‘fold_extract_last_M’, but takes an extra length operand as operand 4 and an extra bias operand as operand 5. The extracted element must have an index i such that i < len (operand 4) + bias (operand 5). ‘fold_left_plus_M’ Take scalar operand 1 and successively add each element from vector operand 2. Store the result in scalar operand 0. The vector has mode M and the scalars have the mode appropriate for one element of M. The operation is strictly in-order: there is no reassociation. ‘mask_fold_left_plus_M’ Like ‘fold_left_plus_M’, but takes an additional mask operand (operand 3) that specifies which elements of the source vector should be added. ‘mask_len_fold_left_plus_M’ Like ‘fold_left_plus_M’, but takes an additional mask operand (operand 3), len operand (operand 4) and bias operand (operand 5), and performs the following operations strictly in-order (no reassociation): operand0 = operand1; for (i = 0; i < LEN + BIAS; i++) if (operand3[i]) operand0 += operand2[i]; ‘sdot_prodM’ Compute the sum of the products of two signed elements. Operand 1 and operand 2 are of the same mode. Their product, which is of a wider mode, is computed and added to operand 3. Operand 3 is of a mode equal to or wider than the mode of the product. The result is placed in operand 0, which is of the same mode as operand 3. M is the mode of operand 1 and operand 2.
Semantically, the multiplication operands are extended as follows: sdot: op0 = sign-ext (op1) * sign-ext (op2) + op3 ... ‘udot_prodM’ Compute the sum of the products of two unsigned elements. Operand 1 and operand 2 are of the same mode. Their product, which is of a wider mode, is computed and added to operand 3. Operand 3 is of a mode equal to or wider than the mode of the product. The result is placed in operand 0, which is of the same mode as operand 3. M is the mode of operand 1 and operand 2. Semantically, the multiplication operands are extended as follows: udot: op0 = zero-ext (op1) * zero-ext (op2) + op3 ... ‘usdot_prodM’ Compute the sum of the products of elements of different signs. Operand 1 must be unsigned and operand 2 signed. Their product, which is of a wider mode, is computed and added to operand 3. Operand 3 is of a mode equal to or wider than the mode of the product. The result is placed in operand 0, which is of the same mode as operand 3. M is the mode of operand 1 and operand 2. Semantically, the multiplication operands are extended as follows: usdot: op0 = ((signed-conv) zero-ext (op1)) * sign-ext (op2) + op3 ... ‘ssadM’ ‘usadM’ Compute the sum of absolute differences of two signed/unsigned elements. Operand 1 and operand 2 are of the same mode. Their absolute difference, which is of a wider mode, is computed and added to operand 3. Operand 3 is of a mode equal to or wider than the mode of the absolute difference. The result is placed in operand 0, which is of the same mode as operand 3. M is the mode of operand 1 and operand 2. ‘widen_ssumM3’ ‘widen_usumM3’ Operands 0 and 2 are of the same mode, which is wider than the mode of operand 1. Add operand 1 to operand 2 and place the widened result in operand 0. (This is used to express the accumulation of elements into an accumulator of a wider mode.) M is the mode of operand 1. ‘smulhsM3’ ‘umulhsM3’ Signed/unsigned multiply high with scale.
This is equivalent to the C code: narrow op0, op1, op2; ... op0 = (narrow) (((wide) op1 * (wide) op2) >> (N / 2 - 1)); where the sign of ‘narrow’ determines whether this is a signed or unsigned operation, and N is the size of ‘wide’ in bits. M is the mode for all 3 operands (narrow). The wide mode is not specified and is defined to fit the whole multiply. ‘smulhrsM3’ ‘umulhrsM3’ Signed/unsigned multiply high with round and scale. This is equivalent to the C code: narrow op0, op1, op2; ... op0 = (narrow) (((((wide) op1 * (wide) op2) >> (N / 2 - 2)) + 1) >> 1); where the sign of ‘narrow’ determines whether this is a signed or unsigned operation, and N is the size of ‘wide’ in bits. M is the mode for all 3 operands (narrow). The wide mode is not specified and is defined to fit the whole multiply. ‘sdiv_pow2M3’ Signed division by power-of-2 immediate. Equivalent to: signed op0, op1; ... op0 = op1 / (1 << imm); ‘vec_shl_insert_M’ Shift the elements in vector input operand 1 left one element (i.e. away from element 0) and fill the vacated element 0 with the scalar in operand 2. Store the result in vector output operand 0. Operands 0 and 1 have mode M and operand 2 has the mode appropriate for one element of M. ‘vec_shl_M’ Whole vector left shift in bits, i.e. away from element 0. Operand 1 is a vector to be shifted. Operand 2 is an integer shift amount in bits. Operand 0 is where the resulting shifted vector is stored. The output and input vectors should have the same modes. ‘vec_shr_M’ Whole vector right shift in bits, i.e. towards element 0. Operand 1 is a vector to be shifted. Operand 2 is an integer shift amount in bits. Operand 0 is where the resulting shifted vector is stored. The output and input vectors should have the same modes. ‘vec_pack_trunc_M’ Narrow (demote) and merge the elements of two vectors. Operands 1 and 2 are vectors of the same mode having N integral or floating point elements of size S.
Operand 0 is the resulting vector in which 2*N elements of size S/2 are concatenated after narrowing them down using truncation. ‘vec_pack_sbool_trunc_M’ Narrow and merge the elements of two vectors. Operands 1 and 2 are vectors of the same type having N boolean elements. Operand 0 is the resulting vector in which 2*N elements are concatenated. The last operand (operand 3) is the number of elements in the output vector 2*N as a ‘CONST_INT’. This instruction pattern is used when all the vector input and output operands have the same scalar mode M and thus using ‘vec_pack_trunc_M’ would be ambiguous. ‘vec_pack_ssat_M’, ‘vec_pack_usat_M’ Narrow (demote) and merge the elements of two vectors. Operands 1 and 2 are vectors of the same mode having N integral elements of size S. Operand 0 is the resulting vector in which the elements of the two input vectors are concatenated after narrowing them down using signed/unsigned saturating arithmetic. ‘vec_pack_sfix_trunc_M’, ‘vec_pack_ufix_trunc_M’ Narrow, convert to signed/unsigned integral type and merge the elements of two vectors. Operands 1 and 2 are vectors of the same mode having N floating point elements of size S. Operand 0 is the resulting vector in which 2*N elements of size S/2 are concatenated. ‘vec_packs_float_M’, ‘vec_packu_float_M’ Narrow, convert to floating point type and merge the elements of two vectors. Operands 1 and 2 are vectors of the same mode having N signed/unsigned integral elements of size S. Operand 0 is the resulting vector in which 2*N elements of size S/2 are concatenated. ‘vec_unpacks_hi_M’, ‘vec_unpacks_lo_M’ Extract and widen (promote) the high/low part of a vector of signed integral or floating point elements. The input vector (operand 1) has N elements of size S. Widen (promote) the high/low elements of the vector using signed or floating point extension and place the resulting N/2 values of size 2*S in the output vector (operand 0). 
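The element counts and sizes in the pack and unpack descriptions can be checked against a scalar model. The sketch below fixes S at 32 bits for ‘vec_pack_trunc_M’ and 16 bits for ‘vec_unpacks_hi_M’/‘vec_unpacks_lo_M’. It assumes that "lo" means the lower-numbered half and that operand 1 of the pack comes first in the result; on a real target both choices depend on endianness. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Model of vec_unpacks_lo/hi: sign-extend one half of the N int16_t
   elements of IN into N/2 int32_t elements of OUT.  Assumption: the
   low half is the lower-numbered half.  */
static void
model_vec_unpacks (int32_t *out, const int16_t *in, size_t n, int high)
{
  assert (n % 2 == 0);
  const int16_t *half = in + (high ? n / 2 : 0);
  for (size_t i = 0; i < n / 2; i++)
    out[i] = (int32_t) half[i];  /* Sign extension from S to 2*S bits.  */
}

/* Model of vec_pack_trunc: truncate the N int32_t elements of A and
   of B to 16 bits and concatenate them into 2*N elements of OUT.  */
static void
model_vec_pack_trunc (int16_t *out, const int32_t *a, const int32_t *b,
                      size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      uint16_t lo_bits = (uint16_t) a[i];  /* Keep the low 16 bits.  */
      uint16_t hi_bits = (uint16_t) b[i];
      memcpy (&out[i], &lo_bits, sizeof lo_bits);
      memcpy (&out[n + i], &hi_bits, sizeof hi_bits);
    }
}
```

Under these assumptions, unpacking both halves of a vector and packing the two results again reproduces the original vector.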
‘vec_unpacku_hi_M’, ‘vec_unpacku_lo_M’ Extract and widen (promote) the high/low part of a vector of unsigned integral elements. The input vector (operand 1) has N elements of size S. Widen (promote) the high/low elements of the vector using zero extension and place the resulting N/2 values of size 2*S in the output vector (operand 0). ‘vec_unpacks_sbool_hi_M’, ‘vec_unpacks_sbool_lo_M’ Extract the high/low part of a vector of boolean elements that have scalar mode M. The input vector (operand 1) has N elements, the output vector (operand 0) has N/2 elements. The last operand (operand 2) is the number of elements of the input vector N as a ‘CONST_INT’. These patterns are used if both the input and output vectors have the same scalar mode M and thus using ‘vec_unpacks_hi_M’ or ‘vec_unpacks_lo_M’ would be ambiguous. ‘vec_unpacks_float_hi_M’, ‘vec_unpacks_float_lo_M’ ‘vec_unpacku_float_hi_M’, ‘vec_unpacku_float_lo_M’ Extract, convert to floating point type and widen the high/low part of a vector of signed/unsigned integral elements. The input vector (operand 1) has N elements of size S. Convert the high/low elements of the vector using floating point conversion and place the resulting N/2 values of size 2*S in the output vector (operand 0). ‘vec_unpack_sfix_trunc_hi_M’, ‘vec_unpack_sfix_trunc_lo_M’ ‘vec_unpack_ufix_trunc_hi_M’ ‘vec_unpack_ufix_trunc_lo_M’ Extract, convert to signed/unsigned integer type and widen the high/low part of a vector of floating point elements. The input vector (operand 1) has N elements of size S. Convert the high/low elements of the vector to integers and place the resulting N/2 values of size 2*S in the output vector (operand 0). ‘vec_widen_umult_hi_M’, ‘vec_widen_umult_lo_M’ ‘vec_widen_smult_hi_M’, ‘vec_widen_smult_lo_M’ ‘vec_widen_umult_even_M’, ‘vec_widen_umult_odd_M’ ‘vec_widen_smult_even_M’, ‘vec_widen_smult_odd_M’ Signed/Unsigned widening multiplication. 
The two inputs (operands 1 and 2) are vectors with N signed/unsigned elements of size S. Multiply the high/low or even/odd elements of the two vectors, and put the N/2 products of size 2*S in the output vector (operand 0). A target shouldn't implement the even/odd pattern pair if it is less efficient than the lo/hi pair. ‘vec_widen_ushiftl_hi_M’, ‘vec_widen_ushiftl_lo_M’ ‘vec_widen_sshiftl_hi_M’, ‘vec_widen_sshiftl_lo_M’ Signed/Unsigned widening shift left. The first input (operand 1) is a vector with N signed/unsigned elements of size S. Operand 2 is a constant. Shift the high/low elements of operand 1, and put the N/2 results of size 2*S in the output vector (operand 0). ‘vec_widen_uaddl_hi_M’, ‘vec_widen_uaddl_lo_M’ ‘vec_widen_saddl_hi_M’, ‘vec_widen_saddl_lo_M’ Signed/Unsigned widening add long. Operands 1 and 2 are vectors with N signed/unsigned elements of size S. Add the high/low elements of 1 and 2 together, widen the resulting elements and put the N/2 results of size 2*S in the output vector (operand 0). ‘vec_widen_usubl_hi_M’, ‘vec_widen_usubl_lo_M’ ‘vec_widen_ssubl_hi_M’, ‘vec_widen_ssubl_lo_M’ Signed/Unsigned widening subtract long. Operands 1 and 2 are vectors with N signed/unsigned elements of size S. Subtract the high/low elements of 2 from 1 and widen the resulting elements. Put the N/2 results of size 2*S in the output vector (operand 0). ‘vec_widen_uabd_hi_M’, ‘vec_widen_uabd_lo_M’ ‘vec_widen_uabd_odd_M’, ‘vec_widen_uabd_even_M’ ‘vec_widen_sabd_hi_M’, ‘vec_widen_sabd_lo_M’ ‘vec_widen_sabd_odd_M’, ‘vec_widen_sabd_even_M’ Signed/Unsigned widening absolute difference. Operands 1 and 2 are vectors with N signed/unsigned elements of size S. Find the absolute difference between operands 1 and 2 and widen the resulting elements. Put the N/2 results of size 2*S in the output vector (operand 0). ‘vec_addsubM3’ Alternating subtract, add with even lanes doing subtract and odd lanes doing addition. Operands 1 and 2 and the output operand are vectors with mode M.
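The lane assignment of ‘vec_addsubM3’ is easy to get backwards, so a scalar model may help. The sketch assumes lanes are numbered from 0, so that lane 0 is an even lane, and uses ‘float’ elements; the function name is hypothetical.

```c
#include <stddef.h>

/* Model of vec_addsub: even lanes compute A - B, odd lanes compute
   A + B.  OUT, A and B each have N elements.  */
static void
model_vec_addsub (float *out, const float *a, const float *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = (i % 2 == 0) ? a[i] - b[i] : a[i] + b[i];
}
```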
‘vec_fmaddsubM4’ Alternating multiply subtract, add with even lanes doing subtract and odd lanes doing addition of the third operand to the multiplication result of the first two operands. Operands 1, 2 and 3 and the output operand are vectors with mode M. ‘vec_fmsubaddM4’ Alternating multiply add, subtract with even lanes doing addition of the third operand to, and odd lanes doing subtraction of the third operand from, the multiplication result of the first two operands. Operands 1, 2 and 3 and the output operand are vectors with mode M. These instructions are not allowed to ‘FAIL’. ‘mulhisi3’ Multiply operands 1 and 2, which have mode ‘HImode’, and store a ‘SImode’ product in operand 0. ‘mulqihi3’, ‘mulsidi3’ Similar widening-multiplication instructions of other widths. ‘umulqihi3’, ‘umulhisi3’, ‘umulsidi3’ Similar widening-multiplication instructions that do unsigned multiplication. ‘usmulqihi3’, ‘usmulhisi3’, ‘usmulsidi3’ Similar widening-multiplication instructions that interpret the first operand as unsigned and the second operand as signed, then do a signed multiplication. ‘smulM3_highpart’ Perform a signed multiplication of operands 1 and 2, which have mode M, and store the most significant half of the product in operand 0. The least significant half of the product is discarded. This may be represented in RTL using a ‘smul_highpart’ RTX expression. ‘umulM3_highpart’ Similar, but the multiplication is unsigned. This may be represented in RTL using a ‘umul_highpart’ RTX expression. ‘maddMN4’ Sign-extend operands 1 and 2 to mode N, multiply them, add operand 3, and store the result in operand 0. Operands 1 and 2 have mode M and operands 0 and 3 have mode N. Both modes must be integer or fixed-point modes and N must be twice the size of M. In other words, ‘maddMN4’ is like ‘mulMN3’ except that it also adds operand 3. These instructions are not allowed to ‘FAIL’. ‘umaddMN4’ Like ‘maddMN4’, but zero-extend the multiplication operands instead of sign-extending them.
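The difference between ‘maddMN4’ and ‘umaddMN4’ lies only in how the narrow operands are extended before the multiplication. A minimal sketch, assuming M is a 16-bit integer mode and N a 32-bit integer mode, with hypothetical function names:

```c
#include <stdint.h>

/* Model of madd: sign-extend both 16-bit operands to 32 bits,
   multiply, then add operand 3.  The addition is done in unsigned
   arithmetic so that it wraps instead of overflowing.  */
static int32_t
model_madd (int16_t op1, int16_t op2, int32_t op3)
{
  int32_t product = (int32_t) op1 * (int32_t) op2;
  return (int32_t) ((uint32_t) product + (uint32_t) op3);
}

/* Model of umadd: the same, but the operands are zero-extended.  */
static uint32_t
model_umadd (uint16_t op1, uint16_t op2, uint32_t op3)
{
  return (uint32_t) op1 * (uint32_t) op2 + op3;
}
```

The 16-bit pattern 0xFFFE is -2 when sign-extended and 65534 when zero-extended, so the two models give different results for the same input bits.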
‘ssmaddMN4’ Like ‘maddMN4’, but all involved operations must be signed-saturating. ‘usmaddMN4’ Like ‘umaddMN4’, but all involved operations must be unsigned-saturating. ‘msubMN4’ Sign-extend operands 1 and 2 to mode N, multiply them, subtract the product from operand 3, and store the result in operand 0. Operands 1 and 2 have mode M and operands 0 and 3 have mode N. Both modes must be integer or fixed-point modes and N must be twice the size of M. In other words, ‘msubMN4’ is like ‘mulMN3’ except that it also subtracts the result from operand 3. These instructions are not allowed to ‘FAIL’. ‘umsubMN4’ Like ‘msubMN4’, but zero-extend the multiplication operands instead of sign-extending them. ‘ssmsubMN4’ Like ‘msubMN4’, but all involved operations must be signed-saturating. ‘usmsubMN4’ Like ‘umsubMN4’, but all involved operations must be unsigned-saturating. ‘divmodM4’ Signed division that produces both a quotient and a remainder. Operand 1 is divided by operand 2 to produce a quotient stored in operand 0 and a remainder stored in operand 3. For machines with an instruction that produces both a quotient and a remainder, provide a pattern for ‘divmodM4’ but do not provide patterns for ‘divM3’ and ‘modM3’. This allows optimization in the relatively common case when both the quotient and remainder are computed. If an instruction that just produces a quotient or just a remainder exists and is more efficient than the instruction that produces both, write the output routine of ‘divmodM4’ to call ‘find_reg_note’ and look for a ‘REG_UNUSED’ note on the quotient or remainder and generate the appropriate instruction. ‘udivmodM4’ Similar, but does unsigned division. ‘ashlM3’, ‘ssashlM3’, ‘usashlM3’ Arithmetic-shift operand 1 left by a number of bits specified by operand 2, and store the result in operand 0.
Here M is the mode of operand 0 and operand 1; operand 2's mode is specified by the instruction pattern, and the compiler will convert the operand to that mode before generating the instruction. The shift or rotate expander or instruction pattern should explicitly specify the mode of operand 2; it should never be ‘VOIDmode’. The meaning of out-of-range shift counts can optionally be specified by ‘TARGET_SHIFT_TRUNCATION_MASK’. *Note TARGET_SHIFT_TRUNCATION_MASK::. Operand 2 is always a scalar type. ‘ashrM3’, ‘lshrM3’, ‘rotlM3’, ‘rotrM3’ Other shift and rotate instructions, analogous to the ‘ashlM3’ instructions. Operand 2 is always a scalar type. ‘vashlM3’, ‘vashrM3’, ‘vlshrM3’, ‘vrotlM3’, ‘vrotrM3’ Vector shift and rotate instructions that take vectors as operand 2 instead of a scalar type. ‘uabdM’, ‘sabdM’ Unsigned and signed absolute difference instructions. These instructions find the difference between operands 1 and 2, then return the absolute value. A C code equivalent would be: op0 = op1 > op2 ? op1 - op2 : op2 - op1; ‘avgM3_floor’ ‘uavgM3_floor’ Signed and unsigned average instructions. These instructions add operands 1 and 2 without truncation, divide the result by 2, round towards -Inf, and store the result in operand 0. This is equivalent to the C code: narrow op0, op1, op2; ... op0 = (narrow) (((wide) op1 + (wide) op2) >> 1); where the sign of ‘narrow’ determines whether this is a signed or unsigned operation. ‘avgM3_ceil’ ‘uavgM3_ceil’ Like ‘avgM3_floor’ and ‘uavgM3_floor’, but round towards +Inf. This is equivalent to the C code: narrow op0, op1, op2; ... op0 = (narrow) (((wide) op1 + (wide) op2 + 1) >> 1); ‘bswapM2’ Reverse the order of bytes of operand 1 and store the result in operand 0. ‘negM2’, ‘ssnegM2’, ‘usnegM2’ Negate operand 1 and store the result in operand 0. ‘negvM3’ Like ‘negM2’ but takes a ‘code_label’ as operand 2 and emits code to jump to it if signed overflow occurs during the negation.
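The C equivalents given above for ‘uabdM’, ‘uavgM3_floor’ and ‘uavgM3_ceil’ can be made concrete by choosing types. The sketch below assumes ‘narrow’ is ‘uint8_t’ and ‘wide’ is ‘uint16_t’; the function names are hypothetical. Only the unsigned variants are shown, because the signed forms rely on ‘>>’ being an arithmetic shift for negative values, which ISO C leaves implementation-defined.

```c
#include <stdint.h>

/* Unsigned average, rounding towards -Inf.  The sum is formed in the
   wide type so that the carry out of 8 bits is not lost.  */
static uint8_t
model_uavg_floor (uint8_t op1, uint8_t op2)
{
  return (uint8_t) (((uint16_t) op1 + (uint16_t) op2) >> 1);
}

/* Unsigned average, rounding towards +Inf.  */
static uint8_t
model_uavg_ceil (uint8_t op1, uint8_t op2)
{
  return (uint8_t) (((uint16_t) op1 + (uint16_t) op2 + 1) >> 1);
}

/* Unsigned absolute difference.  */
static uint8_t
model_uabd (uint8_t op1, uint8_t op2)
{
  return (uint8_t) (op1 > op2 ? op1 - op2 : op2 - op1);
}
```

For operands 200 and 101 the sum 301 does not fit in eight bits, yet the floor and ceiling averages (150 and 151) do, which is why the addition must be done without truncation.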
‘absM2’ Store the absolute value of operand 1 into operand 0. ‘sqrtM2’ Store the square root of operand 1 into operand 0. Both operands have mode M, which is a scalar or vector floating-point mode. This pattern is not allowed to ‘FAIL’. ‘rsqrtM2’ Store the reciprocal of the square root of operand 1 into operand 0. Both operands have mode M, which is a scalar or vector floating-point mode. On most architectures this pattern is only approximate, so either its C condition or the ‘TARGET_OPTAB_SUPPORTED_P’ hook should check for the appropriate math flags. (Using the C condition is more direct, but using ‘TARGET_OPTAB_SUPPORTED_P’ can be useful if a target-specific built-in also uses the ‘rsqrtM2’ pattern.) This pattern is not allowed to ‘FAIL’. ‘fmodM3’ Store the remainder of dividing operand 1 by operand 2 into operand 0, rounded towards zero to an integer. All operands have mode M, which is a scalar or vector floating-point mode. This pattern is not allowed to ‘FAIL’. ‘remainderM3’ Store the remainder of dividing operand 1 by operand 2 into operand 0, rounded to the nearest integer. All operands have mode M, which is a scalar or vector floating-point mode. This pattern is not allowed to ‘FAIL’. ‘scalbM3’ Raise ‘FLT_RADIX’ to the power of operand 2, multiply it by operand 1, and store the result in operand 0. All operands have mode M, which is a scalar or vector floating-point mode. This pattern is not allowed to ‘FAIL’. ‘ldexpM3’ Raise 2 to the power of operand 2, multiply it by operand 1, and store the result in operand 0. Operands 0 and 1 have mode M, which is a scalar or vector floating-point mode. Operand 2's mode has the same number of elements as M and each element is wide enough to store an ‘int’. The integers are signed. This pattern is not allowed to ‘FAIL’. ‘cosM2’ Store the cosine of operand 1 into operand 0. Both operands have mode M, which is a scalar or vector floating-point mode. This pattern is not allowed to ‘FAIL’. 
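The difference between ‘fmodM3’ and ‘remainderM3’ above lies in how the quotient is rounded before the remainder is formed. The sketch below is a simplified model, not the library algorithm: the function names are hypothetical, and it is only meant for small inputs whose quotient is exactly representable and fits in a ‘long long’. It assumes the ISO C behavior that ‘remainder’ rounds the quotient to nearest with ties going to the even integer.

```c
/* Hypothetical model of fmodM3: the quotient is truncated towards zero,
   so the result has the sign of x.  */
double model_fmod (double x, double y)
{
  long long q = (long long) (x / y);   /* conversion truncates towards zero */
  return x - (double) q * y;
}

/* Round t to the nearest integer, ties to even (assumed ISO C
   `remainder' behavior).  */
static long long round_half_even (double t)
{
  long long q = (long long) t;         /* truncated quotient */
  double frac = t - (double) q;        /* lies in (-1, 1), same sign as t */
  if (frac > 0.5 || (frac == 0.5 && (q & 1)))
    q += 1;
  else if (frac < -0.5 || (frac == -0.5 && (q & 1)))
    q -= 1;
  return q;
}

/* Hypothetical model of remainderM3: the quotient is rounded to the
   nearest integer, so the result may be negative for positive x.  */
double model_remainder (double x, double y)
{
  return x - (double) round_half_even (x / y) * y;
}
```

With operands 5.5 and 2.0, the truncating form yields 1.5 while the round-to-nearest form yields -0.5.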
‘sinM2’ Store the sine of operand 1 into operand 0. Both operands have mode M, which is a scalar or vector floating-point mode. This pattern is not allowed to ‘FAIL’. ‘sincosM3’ Store the cosine of operand 2 into operand 0 and the sine of operand 2 into operand 1. All operands have mode M, which is a scalar or vector floating-point mode. Targets that can calculate the sine and cosine simultaneously can implement this pattern as opposed to implementing individual ‘sinM2’ and ‘cosM2’ patterns. The ‘sin’ and ‘cos’ built-in functions will then be expanded to the ‘sincosM3’ pattern, with one of the output values left unused. ‘tanM2’ Store the tangent of operand 1 into operand 0. Both operands have mode M, which is a scalar or vector floating-point mode. This pattern is not allowed to ‘FAIL’. ‘asinM2’ Store the arc sine of operand 1 into operand 0. Both operands have mode M, which is a scalar or vector floating-point mode. This pattern is not allowed to ‘FAIL’. ‘acosM2’ Store the arc cosine of operand 1 into operand 0. Both operands have mode M, which is a scalar or vector floating-point mode. This pattern is not allowed to ‘FAIL’. ‘atanM2’ Store the arc tangent of operand 1 into operand 0. Both operands have mode M, which is a scalar or vector floating-point mode. This pattern is not allowed to ‘FAIL’. ‘fegetroundM’ Store the current machine floating-point rounding mode into operand 0. Operand 0 has mode M, which is scalar. This pattern is used to implement the ‘fegetround’ function from the ISO C99 standard. ‘feclearexceptM’ ‘feraiseexceptM’ Clears or raises the supported machine floating-point exceptions represented by the bits in operand 1. Error status is stored as nonzero value in operand 0. Both operands have mode M, which is a scalar. These patterns are used to implement the ‘feclearexcept’ and ‘feraiseexcept’ functions from the ISO C99 standard. ‘expM2’ Raise e (the base of natural logarithms) to the power of operand 1 and store the result in operand 0. 
Both operands have mode M, which is a scalar or vector floating-point mode. This pattern is not allowed to ‘FAIL’. ‘expm1M2’ Raise e (the base of natural logarithms) to the power of operand 1, subtract 1, and store the result in operand 0. Both operands have mode M, which is a scalar or vector floating-point mode. For inputs close to zero, the pattern is expected to be more accurate than a separate ‘expM2’ and ‘subM3’ would be. This pattern is not allowed to ‘FAIL’. ‘exp10M2’ Raise 10 to the power of operand 1 and store the result in operand 0. Both operands have mode M, which is a scalar or vector floating-point mode. This pattern is not allowed to ‘FAIL’. ‘exp2M2’ Raise 2 to the power of operand 1 and store the result in operand 0. Both operands have mode M, which is a scalar or vector floating-point mode. This pattern is not allowed to ‘FAIL’. ‘logM2’ Store the natural logarithm of operand 1 into operand 0. Both operands have mode M, which is a scalar or vector floating-point mode. This pattern is not allowed to ‘FAIL’. ‘log1pM2’ Add 1 to operand 1, compute the natural logarithm, and store the result in operand 0. Both operands have mode M, which is a scalar or vector floating-point mode. For inputs close to zero, the pattern is expected to be more accurate than a separate ‘addM3’ and ‘logM2’ would be. This pattern is not allowed to ‘FAIL’. ‘log10M2’ Store the base-10 logarithm of operand 1 into operand 0. Both operands have mode M, which is a scalar or vector floating-point mode. This pattern is not allowed to ‘FAIL’. ‘log2M2’ Store the base-2 logarithm of operand 1 into operand 0. Both operands have mode M, which is a scalar or vector floating-point mode. This pattern is not allowed to ‘FAIL’. ‘logbM2’ Store the base-‘FLT_RADIX’ logarithm of operand 1 into operand 0. Both operands have mode M, which is a scalar or vector floating-point mode. This pattern is not allowed to ‘FAIL’. ‘signbitM2’ Store the sign bit of floating-point operand 1 in operand 0. 
M is either a scalar or vector mode. When it is a scalar, operand 1 has mode M but operand 0 must have mode ‘SImode’. When M is a vector, operand 1 has mode M, and operand 0's mode should be a vector integer mode that has the same number of elements and the same size as mode M. This pattern is not allowed to ‘FAIL’. ‘significandM2’ Store the significand of floating-point operand 1 in operand 0. Both operands have mode M, which is a scalar or vector floating-point mode. This pattern is not allowed to ‘FAIL’. ‘powM3’ Store the value of operand 1 raised to the exponent operand 2 into operand 0. All operands have mode M, which is a scalar or vector floating-point mode. This pattern is not allowed to ‘FAIL’. ‘atan2M3’ Store the arc tangent (inverse tangent) of operand 1 divided by operand 2 into operand 0, using the signs of both arguments to determine the quadrant of the result. All operands have mode M, which is a scalar or vector floating-point mode. This pattern is not allowed to ‘FAIL’. ‘floorM2’ Store the largest integral value not greater than operand 1 in operand 0. Both operands have mode M, which is a scalar or vector floating-point mode. If ‘-ffp-int-builtin-inexact’ is in effect, the "inexact" exception may be raised for noninteger operands; otherwise, it may not. This pattern is not allowed to ‘FAIL’. ‘btruncM2’ Round operand 1 to an integer, towards zero, and store the result in operand 0. Both operands have mode M, which is a scalar or vector floating-point mode. If ‘-ffp-int-builtin-inexact’ is in effect, the "inexact" exception may be raised for noninteger operands; otherwise, it may not. This pattern is not allowed to ‘FAIL’. ‘roundM2’ Round operand 1 to the nearest integer, rounding away from zero in the event of a tie, and store the result in operand 0. Both operands have mode M, which is a scalar or vector floating-point mode.
If ‘-ffp-int-builtin-inexact’ is in effect, the "inexact" exception may be raised for noninteger operands; otherwise, it may not. This pattern is not allowed to ‘FAIL’. ‘ceilM2’ Store the smallest integral value not less than operand 1 in operand 0. Both operands have mode M, which is a scalar or vector floating-point mode. If ‘-ffp-int-builtin-inexact’ is in effect, the "inexact" exception may be raised for noninteger operands; otherwise, it may not. This pattern is not allowed to ‘FAIL’. ‘nearbyintM2’ Round operand 1 to an integer, using the current rounding mode, and store the result in operand 0. Do not raise an inexact condition when the result is different from the argument. Both operands have mode M, which is a scalar or vector floating-point mode. This pattern is not allowed to ‘FAIL’. ‘rintM2’ Round operand 1 to an integer, using the current rounding mode, and store the result in operand 0. Raise an inexact condition when the result is different from the argument. Both operands have mode M, which is a scalar or vector floating-point mode. This pattern is not allowed to ‘FAIL’. ‘lrintMN2’ Convert operand 1 (valid for floating point mode M) to fixed point mode N as a signed number according to the current rounding mode and store in operand 0 (which has mode N). ‘lroundMN2’ Convert operand 1 (valid for floating point mode M) to fixed point mode N as a signed number rounding to nearest and away from zero and store in operand 0 (which has mode N). ‘lfloorMN2’ Convert operand 1 (valid for floating point mode M) to fixed point mode N as a signed number rounding down and store in operand 0 (which has mode N). ‘lceilMN2’ Convert operand 1 (valid for floating point mode M) to fixed point mode N as a signed number rounding up and store in operand 0 (which has mode N). ‘copysignM3’ Store a value with the magnitude of operand 1 and the sign of operand 2 into operand 0. All operands have mode M, which is a scalar or vector floating-point mode. 
This pattern is not allowed to ‘FAIL’. ‘xorsignM3’ Equivalent to ‘op0 = op1 * copysign (1.0, op2)’: store a value with the magnitude of operand 1 and the sign of operand 2 into operand 0. All operands have mode M, which is a scalar or vector floating-point mode. This pattern is not allowed to ‘FAIL’. ‘issignalingM2’ Set operand 0 to 1 if operand 1 is a signaling NaN and to 0 otherwise. ‘cadd90M3’ Perform vector add and subtract on even/odd number pairs. The operation being matched is semantically described as for (int i = 0; i < N; i += 2) { c[i] = a[i] - b[i+1]; c[i+1] = a[i+1] + b[i]; } This operation is semantically equivalent to performing a vector addition of complex numbers in operand 1 with operand 2 rotated by 90 degrees around the argand plane and storing the result in operand 0. In GCC lane ordering the real part of the number must be in the even lanes with the imaginary part in the odd lanes. The operation is only supported for vector modes M. This pattern is not allowed to ‘FAIL’. ‘cadd270M3’ Perform vector add and subtract on even/odd number pairs. The operation being matched is semantically described as for (int i = 0; i < N; i += 2) { c[i] = a[i] + b[i+1]; c[i+1] = a[i+1] - b[i]; } This operation is semantically equivalent to performing a vector addition of complex numbers in operand 1 with operand 2 rotated by 270 degrees around the argand plane and storing the result in operand 0. In GCC lane ordering the real part of the number must be in the even lanes with the imaginary part in the odd lanes. The operation is only supported for vector modes M. This pattern is not allowed to ‘FAIL’. ‘cmlaM4’ Perform a vector multiply and accumulate that is semantically the same as a multiply and accumulate of complex numbers. 
complex TYPE op0[N]; complex TYPE op1[N]; complex TYPE op2[N]; complex TYPE op3[N]; for (int i = 0; i < N; i += 1) { op0[i] = op1[i] * op2[i] + op3[i]; } In GCC lane ordering the real part of the number must be in the even lanes with the imaginary part in the odd lanes. The operation is only supported for vector modes M. This pattern is not allowed to ‘FAIL’. ‘cmla_conjM4’ Perform a vector multiply by conjugate and accumulate that is semantically the same as a multiply and accumulate of complex numbers where the second multiply argument is conjugated. complex TYPE op0[N]; complex TYPE op1[N]; complex TYPE op2[N]; complex TYPE op3[N]; for (int i = 0; i < N; i += 1) { op0[i] = op1[i] * conj (op2[i]) + op3[i]; } In GCC lane ordering the real part of the number must be in the even lanes with the imaginary part in the odd lanes. The operation is only supported for vector modes M. This pattern is not allowed to ‘FAIL’. ‘cmlsM4’ Perform a vector multiply and subtract that is semantically the same as a multiply and subtract of complex numbers. complex TYPE op0[N]; complex TYPE op1[N]; complex TYPE op2[N]; complex TYPE op3[N]; for (int i = 0; i < N; i += 1) { op0[i] = op1[i] * op2[i] - op3[i]; } In GCC lane ordering the real part of the number must be in the even lanes with the imaginary part in the odd lanes. The operation is only supported for vector modes M. This pattern is not allowed to ‘FAIL’. ‘cmls_conjM4’ Perform a vector multiply by conjugate and subtract that is semantically the same as a multiply and subtract of complex numbers where the second multiply argument is conjugated. complex TYPE op0[N]; complex TYPE op1[N]; complex TYPE op2[N]; complex TYPE op3[N]; for (int i = 0; i < N; i += 1) { op0[i] = op1[i] * conj (op2[i]) - op3[i]; } In GCC lane ordering the real part of the number must be in the even lanes with the imaginary part in the odd lanes. The operation is only supported for vector modes M. This pattern is not allowed to ‘FAIL’.
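The lane ordering used by the ‘caddM3’ and ‘cmlaM4’ families above (real parts in even lanes, imaginary parts in odd lanes) can be made concrete by expanding the complex arithmetic over an interleaved ‘float’ array. This is a scalar sketch only; the function names are hypothetical, N counts lanes rather than complex numbers, and N is assumed to be even.

```c
/* Hypothetical scalar model of cadd90M3 on interleaved (re, im) lanes:
   c = a + i*b, i.e. b is rotated by 90 degrees before the addition.  */
void cadd90_lanes (float *c, const float *a, const float *b, int n)
{
  for (int i = 0; i < n; i += 2)
    {
      c[i] = a[i] - b[i + 1];       /* real part */
      c[i + 1] = a[i + 1] + b[i];   /* imaginary part */
    }
}

/* Hypothetical scalar model of cmlaM4: op0 = op1 * op2 + op3, with the
   complex product written out per lane pair.  */
void cmla_lanes (float *op0, const float *op1, const float *op2,
                 const float *op3, int n)
{
  for (int i = 0; i < n; i += 2)
    {
      float re = op1[i] * op2[i] - op1[i + 1] * op2[i + 1];
      float im = op1[i] * op2[i + 1] + op1[i + 1] * op2[i];
      op0[i] = re + op3[i];
      op0[i + 1] = im + op3[i + 1];
    }
}

/* Hypothetical scalar model of cmla_conjM4: op0 = op1 * conj (op2) + op3.
   Conjugation flips the sign of the imaginary part of op2.  */
void cmla_conj_lanes (float *op0, const float *op1, const float *op2,
                      const float *op3, int n)
{
  for (int i = 0; i < n; i += 2)
    {
      float re = op1[i] * op2[i] + op1[i + 1] * op2[i + 1];
      float im = op1[i + 1] * op2[i] - op1[i] * op2[i + 1];
      op0[i] = re + op3[i];
      op0[i + 1] = im + op3[i + 1];
    }
}
```

With a = 1+2i and b = 3+4i stored as {1, 2} and {3, 4}, the 90-degree add produces {-3, 5}, the multiply-accumulate with a zero accumulator produces {-5, 10}, and the conjugated form produces {11, 2}.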
‘cmulM4’ Perform a vector multiply that is semantically the same as multiply of complex numbers. complex TYPE op0[N]; complex TYPE op1[N]; complex TYPE op2[N]; for (int i = 0; i < N; i += 1) { op0[i] = op1[i] * op2[i]; } In GCC lane ordering the real part of the number must be in the even lanes with the imaginary part in the odd lanes. The operation is only supported for vector modes M. This pattern is not allowed to ‘FAIL’. ‘cmul_conjM4’ Perform a vector multiply by conjugate that is semantically the same as a multiply of complex numbers where the second multiply argument is conjugated. complex TYPE op0[N]; complex TYPE op1[N]; complex TYPE op2[N]; for (int i = 0; i < N; i += 1) { op0[i] = op1[i] * conj (op2[i]); } In GCC lane ordering the real part of the number must be in the even lanes with the imaginary part in the odd lanes. The operation is only supported for vector modes M. This pattern is not allowed to ‘FAIL’. ‘ffsM2’ Store into operand 0 one plus the index of the least significant 1-bit of operand 1. If operand 1 is zero, store zero. M is either a scalar or vector integer mode. When it is a scalar, operand 1 has mode M but operand 0 can have whatever scalar integer mode is suitable for the target. The compiler will insert conversion instructions as necessary (typically to convert the result to the same width as ‘int’). When M is a vector, both operands must have mode M. This pattern is not allowed to ‘FAIL’. ‘clrsbM2’ Count leading redundant sign bits. Store into operand 0 the number of redundant sign bits in operand 1, starting at the most significant bit position. A redundant sign bit is defined as any sign bit after the first. As such, this count will be one less than the count of leading sign bits. M is either a scalar or vector integer mode. When it is a scalar, operand 1 has mode M but operand 0 can have whatever scalar integer mode is suitable for the target.
The compiler will insert conversion instructions as necessary (typically to convert the result to the same width as ‘int’). When M is a vector, both operands must have mode M. This pattern is not allowed to ‘FAIL’. ‘clzM2’ Store into operand 0 the number of leading 0-bits in operand 1, starting at the most significant bit position. If operand 1 is 0, the ‘CLZ_DEFINED_VALUE_AT_ZERO’ (*note Misc::) macro defines if the result is undefined or has a useful value. M is either a scalar or vector integer mode. When it is a scalar, operand 1 has mode M but operand 0 can have whatever scalar integer mode is suitable for the target. The compiler will insert conversion instructions as necessary (typically to convert the result to the same width as ‘int’). When M is a vector, both operands must have mode M. This pattern is not allowed to ‘FAIL’. ‘ctzM2’ Store into operand 0 the number of trailing 0-bits in operand 1, starting at the least significant bit position. If operand 1 is 0, the ‘CTZ_DEFINED_VALUE_AT_ZERO’ (*note Misc::) macro defines if the result is undefined or has a useful value. M is either a scalar or vector integer mode. When it is a scalar, operand 1 has mode M but operand 0 can have whatever scalar integer mode is suitable for the target. The compiler will insert conversion instructions as necessary (typically to convert the result to the same width as ‘int’). When M is a vector, both operands must have mode M. This pattern is not allowed to ‘FAIL’. ‘popcountM2’ Store into operand 0 the number of 1-bits in operand 1. M is either a scalar or vector integer mode. When it is a scalar, operand 1 has mode M but operand 0 can have whatever scalar integer mode is suitable for the target. The compiler will insert conversion instructions as necessary (typically to convert the result to the same width as ‘int’). When M is a vector, both operands must have mode M. This pattern is not allowed to ‘FAIL’. ‘parityM2’ Store into operand 0 the parity of operand 1, i.e. 
the number of 1-bits in operand 1 modulo 2. M is either a scalar or vector integer mode. When it is a scalar, operand 1 has mode M but operand 0 can have whatever scalar integer mode is suitable for the target. The compiler will insert conversion instructions as necessary (typically to convert the result to the same width as ‘int’). When M is a vector, both operands must have mode M. This pattern is not allowed to ‘FAIL’. ‘one_cmplM2’ Store the bitwise-complement of operand 1 into operand 0. ‘cpymemM’ Block copy instruction. The destination and source blocks of memory are the first two operands, and both are ‘mem:BLK’s with an address in mode ‘Pmode’. The number of bytes to copy is the third operand, in mode M. Usually, you specify ‘Pmode’ for M. However, if you can generate better code knowing the range of valid lengths is smaller than those representable in a full Pmode pointer, you should provide a pattern with a mode corresponding to the range of values you can handle efficiently (e.g., ‘QImode’ for values in the range 0-127; note we avoid numbers that appear negative) and also a pattern with ‘Pmode’. The fourth operand is the known shared alignment of the source and destination, in the form of a ‘const_int’ rtx. Thus, if the compiler knows that both source and destination are word-aligned, it may provide the value 4 for this operand. Optional operands 5 and 6 specify expected alignment and size of block respectively. The expected alignment differs from alignment in operand 4 in a way that the blocks are not required to be aligned according to it in all cases. This expected alignment is also in bytes, just like operand 4. Expected size, when unknown, is set to ‘(const_int -1)’. Descriptions of multiple ‘cpymemM’ patterns can only be beneficial if the patterns for smaller modes have fewer restrictions on their first, second and fourth operands. 
Note that the mode M in ‘cpymemM’ does not impose any restriction on the mode of individually copied data units in the block. The ‘cpymemM’ patterns need not give special consideration to the possibility that the source and destination strings might overlap. An exception is the case where source and destination are equal; this case must be handled correctly. These patterns are used to do inline expansion of ‘__builtin_memcpy’. ‘movmemM’ Block move instruction. The destination and source blocks of memory are the first two operands, and both are ‘mem:BLK’s with an address in mode ‘Pmode’. The number of bytes to copy is the third operand, in mode M. Usually, you specify ‘Pmode’ for M. However, if you can generate better code knowing the range of valid lengths is smaller than those representable in a full Pmode pointer, you should provide a pattern with a mode corresponding to the range of values you can handle efficiently (e.g., ‘QImode’ for values in the range 0-127; note we avoid numbers that appear negative) and also a pattern with ‘Pmode’. The fourth operand is the known shared alignment of the source and destination, in the form of a ‘const_int’ rtx. Thus, if the compiler knows that both source and destination are word-aligned, it may provide the value 4 for this operand. Optional operands 5 and 6 specify expected alignment and size of block respectively. The expected alignment differs from alignment in operand 4 in a way that the blocks are not required to be aligned according to it in all cases. This expected alignment is also in bytes, just like operand 4. Expected size, when unknown, is set to ‘(const_int -1)’. Descriptions of multiple ‘movmemM’ patterns can only be beneficial if the patterns for smaller modes have fewer restrictions on their first, second and fourth operands. Note that the mode M in ‘movmemM’ does not impose any restriction on the mode of individually copied data units in the block.
The ‘movmemM’ patterns must correctly handle the case where the source and destination strings overlap. These patterns are used to do inline expansion of ‘__builtin_memmove’. ‘movstr’ String copy instruction, with ‘stpcpy’ semantics. Operand 0 is an output operand in mode ‘Pmode’. The addresses of the destination and source strings are operands 1 and 2, and both are ‘mem:BLK’s with addresses in mode ‘Pmode’. The execution of the expansion of this pattern should store in operand 0 the address in which the ‘NUL’ terminator was stored in the destination string. This pattern also has several optional operands that are the same as in ‘setmem’. ‘setmemM’ Block set instruction. The destination string is the first operand, given as a ‘mem:BLK’ whose address is in mode ‘Pmode’. The number of bytes to set is the second operand, in mode M. The value to initialize the memory with is the third operand. Targets that only support the clearing of memory should reject any value that is not the constant 0. See ‘cpymemM’ for a discussion of the choice of mode. The fourth operand is the known alignment of the destination, in the form of a ‘const_int’ rtx. Thus, if the compiler knows that the destination is word-aligned, it may provide the value 4 for this operand. Optional operands 5 and 6 specify expected alignment and size of block respectively. The expected alignment differs from alignment in operand 4 in a way that the blocks are not required to be aligned according to it in all cases. This expected alignment is also in bytes, just like operand 4. Expected size, when unknown, is set to ‘(const_int -1)’. Operand 7 is the minimal size of the block and operand 8 is the maximal size of the block (NULL if it cannot be represented as CONST_INT). Operand 9 is the probable maximal size (i.e. we cannot rely on it for correctness, but it can be used for choosing a suitable code sequence for a given size). The use for multiple ‘setmemM’ is as for ‘cpymemM’.
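The differing overlap requirements of ‘cpymemM’ and ‘movmemM’ described above can be illustrated with two byte-wise copy loops. This is only a sketch with hypothetical function names, not how any target expands these patterns: a plain forward copy is enough when the blocks are disjoint or identical, while a move has to pick the copy direction from the relative position of the blocks. Addresses are compared through ‘uintptr_t’, which assumes a flat address space.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a cpymemM expansion: a forward byte copy.
   It is correct for disjoint blocks and for dst == src, which are the
   only cases the pattern has to handle.  */
void copy_forward (unsigned char *dst, const unsigned char *src, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = src[i];
}

/* Hypothetical model of a movmemM expansion: copy backwards when the
   destination starts above the source, so bytes that are still needed
   are never overwritten before they are read.  */
void move_overlap_safe (unsigned char *dst, const unsigned char *src,
                        size_t n)
{
  if ((uintptr_t) dst <= (uintptr_t) src)
    copy_forward (dst, src, n);
  else
    for (size_t i = n; i > 0; i--)
      dst[i - 1] = src[i - 1];
}
```

Shifting the first five bytes of {1, 2, 3, 4, 5, 6} up by one position with the overlap-safe routine gives {1, 1, 2, 3, 4, 5}; a forward copy of the same overlapping blocks would instead smear the first byte across the buffer.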
‘cmpstrnM’ String compare instruction, with five operands. Operand 0 is the output; it has mode M. The remaining four operands are like the operands of ‘cpymemM’. The two memory blocks specified are compared byte by byte in lexicographic order starting at the beginning of each string. The instruction is not allowed to prefetch more than one byte at a time since either string may end in the first byte and reading past that may access an invalid page or segment and cause a fault. The comparison terminates early if the fetched bytes are different or if they are equal to zero. The effect of the instruction is to store a value in operand 0 whose sign indicates the result of the comparison. ‘cmpstrM’ String compare instruction, without known maximum length. Operand 0 is the output; it has mode M. The second and third operands are the blocks of memory to be compared; both are ‘mem:BLK’ with an address in mode ‘Pmode’. The fourth operand is the known shared alignment of the source and destination, in the form of a ‘const_int’ rtx. Thus, if the compiler knows that both source and destination are word-aligned, it may provide the value 4 for this operand. The two memory blocks specified are compared byte by byte in lexicographic order starting at the beginning of each string. The instruction is not allowed to prefetch more than one byte at a time since either string may end in the first byte and reading past that may access an invalid page or segment and cause a fault. The comparison terminates when the fetched bytes are different or when they are equal to zero. The effect of the instruction is to store a value in operand 0 whose sign indicates the result of the comparison. ‘cmpmemM’ Block compare instruction, with five operands like the operands of ‘cmpstrM’. The two memory blocks specified are compared byte by byte in lexicographic order starting at the beginning of each block. Unlike ‘cmpstrM’ the instruction can prefetch any bytes in the two memory blocks.
Also unlike ‘cmpstrM’ the comparison will not stop if both bytes are zero. The effect of the instruction is to store a value in operand 0 whose sign indicates the result of the comparison. ‘strlenM’ Compute the length of a string, with four operands. Operand 0 is the result (of mode M), operand 1 is a ‘mem’ referring to the first character of the string, operand 2 is the character to search for (normally zero), and operand 3 is a constant describing the known alignment of the beginning of the string. ‘rawmemchrM’ Scan memory referred to by operand 1 for the first occurrence of operand 2. Operand 1 is a ‘mem’ and operand 2 a ‘const_int’ of mode M. Operand 0 is the result, i.e., a pointer to the first occurrence of operand 2 in the memory block given by operand 1. ‘floatMN2’ Convert signed integer operand 1 (valid for fixed point mode M) to floating point mode N and store in operand 0 (which has mode N). ‘floatunsMN2’ Convert unsigned integer operand 1 (valid for fixed point mode M) to floating point mode N and store in operand 0 (which has mode N). ‘fixMN2’ Convert operand 1 (valid for floating point mode M) to fixed point mode N as a signed number and store in operand 0 (which has mode N). This instruction's result is defined only when the value of operand 1 is an integer. If the machine description defines this pattern, it also needs to define the ‘ftrunc’ pattern. ‘fixunsMN2’ Convert operand 1 (valid for floating point mode M) to fixed point mode N as an unsigned number and store in operand 0 (which has mode N). This instruction's result is defined only when the value of operand 1 is an integer. ‘ftruncM2’ Convert operand 1 (valid for floating point mode M) to an integer value, still represented in floating point mode M, and store it in operand 0 (valid for floating point mode M). ‘fix_truncMN2’ Like ‘fixMN2’ but works for any floating point value of mode M by converting the value to an integer.
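Returning to the comparison patterns described earlier, the practical difference between ‘cmpstrnM’ and ‘cmpmemM’ is whether a zero byte ends the comparison. The following sketch, with hypothetical function names, models both byte by byte; only the sign of the result is meaningful, as in the pattern descriptions.

```c
#include <stddef.h>

/* Hypothetical model of cmpstrnM: compare at most n bytes, stopping at
   the first difference or at a NUL byte common to both strings.  */
int cmpstrn_model (const unsigned char *a, const unsigned char *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      if (a[i] != b[i])
        return a[i] < b[i] ? -1 : 1;
      if (a[i] == 0)
        return 0;               /* both strings ended here */
    }
  return 0;
}

/* Hypothetical model of cmpmemM: compare exactly n bytes; zero bytes
   are ordinary data and do not stop the comparison.  */
int cmpmem_model (const unsigned char *a, const unsigned char *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (a[i] != b[i])
      return a[i] < b[i] ? -1 : 1;
  return 0;
}
```

Two three-byte blocks that agree up to and including an embedded NUL but differ in the last byte compare equal under the string model and unequal under the block model.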
‘fixuns_truncMN2’ Like ‘fixunsMN2’ but works for any floating point value of mode M by converting the value to an integer. ‘truncMN2’ Truncate operand 1 (valid for mode M) to mode N and store in operand 0 (which has mode N). Both modes must be fixed point or both floating point. ‘extendMN2’ Sign-extend operand 1 (valid for mode M) to mode N and store in operand 0 (which has mode N). Both modes must be fixed point or both floating point. ‘zero_extendMN2’ Zero-extend operand 1 (valid for mode M) to mode N and store in operand 0 (which has mode N). Both modes must be fixed point. ‘fractMN2’ Convert operand 1 of mode M to mode N and store in operand 0 (which has mode N). Mode M and mode N could be fixed-point to fixed-point, signed integer to fixed-point, fixed-point to signed integer, floating-point to fixed-point, or fixed-point to floating-point. When overflows or underflows happen, the results are undefined. ‘satfractMN2’ Convert operand 1 of mode M to mode N and store in operand 0 (which has mode N). Mode M and mode N could be fixed-point to fixed-point, signed integer to fixed-point, or floating-point to fixed-point. When overflows or underflows happen, the instruction saturates the results to the maximum or the minimum. ‘fractunsMN2’ Convert operand 1 of mode M to mode N and store in operand 0 (which has mode N). Mode M and mode N could be unsigned integer to fixed-point, or fixed-point to unsigned integer. When overflows or underflows happen, the results are undefined. ‘satfractunsMN2’ Convert unsigned integer operand 1 of mode M to fixed-point mode N and store in operand 0 (which has mode N). When overflows or underflows happen, the instruction saturates the results to the maximum or the minimum. ‘extvM’ Extract a bit-field from register operand 1, sign-extend it, and store it in operand 0. 
Operand 2 specifies the width of the field in bits and operand 3 the starting bit, which counts from the most significant bit if ‘BITS_BIG_ENDIAN’ is true and from the least significant bit otherwise. Operands 0 and 1 both have mode M. Operands 2 and 3 have a target-specific mode. ‘extvmisalignM’ Extract a bit-field from memory operand 1, sign extend it, and store it in operand 0. Operand 2 specifies the width in bits and operand 3 the starting bit. The starting bit is always somewhere in the first byte of operand 1; it counts from the most significant bit if ‘BITS_BIG_ENDIAN’ is true and from the least significant bit otherwise. Operand 0 has mode M while operand 1 has ‘BLK’ mode. Operands 2 and 3 have a target-specific mode. The instruction must not read beyond the last byte of the bit-field. ‘extzvM’ Like ‘extvM’ except that the bit-field value is zero-extended. ‘extzvmisalignM’ Like ‘extvmisalignM’ except that the bit-field value is zero-extended. ‘insvM’ Insert operand 3 into a bit-field of register operand 0. Operand 1 specifies the width of the field in bits and operand 2 the starting bit, which counts from the most significant bit if ‘BITS_BIG_ENDIAN’ is true and from the least significant bit otherwise. Operands 0 and 3 both have mode M. Operands 1 and 2 have a target-specific mode. ‘insvmisalignM’ Insert operand 3 into a bit-field of memory operand 0. Operand 1 specifies the width of the field in bits and operand 2 the starting bit. The starting bit is always somewhere in the first byte of operand 0; it counts from the most significant bit if ‘BITS_BIG_ENDIAN’ is true and from the least significant bit otherwise. Operand 3 has mode M while operand 0 has ‘BLK’ mode. Operands 1 and 2 have a target-specific mode. The instruction must not read or write beyond the last byte of the bit-field. 
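The register forms ‘extvM’, ‘extzvM’ and ‘insvM’ described above can be modeled on a 32-bit value as follows. This is a sketch under stated assumptions: the function names are hypothetical, bits are numbered from the least significant bit (the ‘BITS_BIG_ENDIAN’ false case), the width is between 1 and 32, and the field lies entirely within the word. The parameter order mirrors the operand order of the patterns.

```c
#include <stdint.h>

/* Mask with the low `width' bits set; width is assumed to be 1..32.  */
static uint32_t low_mask (unsigned width)
{
  return width == 32 ? UINT32_MAX : (UINT32_C (1) << width) - 1;
}

/* Hypothetical model of extzvM: extract and zero-extend a bit-field.  */
uint32_t extzv32 (uint32_t x, unsigned width, unsigned start)
{
  return (x >> start) & low_mask (width);
}

/* Hypothetical model of extvM: extract and sign-extend a bit-field.
   Flipping the field's top bit and subtracting it back out performs
   the sign extension; the arithmetic is done in 64 bits so that no
   step overflows.  */
int32_t extv32 (uint32_t x, unsigned width, unsigned start)
{
  uint32_t field = extzv32 (x, width, start);
  uint32_t sign = UINT32_C (1) << (width - 1);
  return (int32_t) ((int64_t) (field ^ sign) - (int64_t) sign);
}

/* Hypothetical model of insvM: return dest with the selected bit-field
   replaced by the low bits of value.  */
uint32_t insv32 (uint32_t dest, unsigned width, unsigned start,
                 uint32_t value)
{
  uint32_t mask = low_mask (width) << start;
  return (dest & ~mask) | ((value << start) & mask);
}
```

For the word 0x00000F50, the four-bit field starting at bit 8 reads as 15 when zero-extended and as -1 when sign-extended; inserting 3 into the same field of 0xFFFFFFFF gives 0xFFFFF3FF.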
‘extv’ Extract a bit-field from operand 1 (a register or memory operand), where operand 2 specifies the width in bits and operand 3 the starting bit, and store it in operand 0. Operand 0 must have mode ‘word_mode’. Operand 1 may have mode ‘byte_mode’ or ‘word_mode’; often ‘word_mode’ is allowed only for registers. Operands 2 and 3 must be valid for ‘word_mode’. The RTL generation pass generates this instruction only with constants for operands 2 and 3 and the constant is never zero for operand 2. The bit-field value is sign-extended to a full word integer before it is stored in operand 0. This pattern is deprecated; please use ‘extvM’ and ‘extvmisalignM’ instead. ‘extzv’ Like ‘extv’ except that the bit-field value is zero-extended. This pattern is deprecated; please use ‘extzvM’ and ‘extzvmisalignM’ instead. ‘insv’ Store operand 3 (which must be valid for ‘word_mode’) into a bit-field in operand 0, where operand 1 specifies the width in bits and operand 2 the starting bit. Operand 0 may have mode ‘byte_mode’ or ‘word_mode’; often ‘word_mode’ is allowed only for registers. Operands 1 and 2 must be valid for ‘word_mode’. The RTL generation pass generates this instruction only with constants for operands 1 and 2 and the constant is never zero for operand 1. This pattern is deprecated; please use ‘insvM’ and ‘insvmisalignM’ instead. ‘movMODEcc’ Conditionally move operand 2 or operand 3 into operand 0 according to the comparison in operand 1. If the comparison is true, operand 2 is moved into operand 0, otherwise operand 3 is moved. The mode of the operands being compared need not be the same as the operands being moved. Some machines, sparc64 for example, have instructions that conditionally move an integer value based on the floating point condition codes and vice versa. If the machine does not have conditional move instructions, do not define these patterns. ‘addMODEcc’ Similar to ‘movMODEcc’ but for conditional addition. 
Conditionally move operand 2 or (operand 2 + operand 3) into operand 0 according to the comparison in operand 1. If the comparison is false, operand 2 is moved into operand 0, otherwise (operand 2 + operand 3) is moved. ‘cond_negMODE’ ‘cond_one_cmplMODE’ When operand 1 is true, perform an operation on operand 2 and store the result in operand 0, otherwise store operand 3 in operand 0. The operation works elementwise if the operands are vectors. The scalar case is equivalent to: op0 = op1 ? OP op2 : op3; while the vector case is equivalent to: for (i = 0; i < GET_MODE_NUNITS (M); i++) op0[i] = op1[i] ? OP op2[i] : op3[i]; where, for example, OP is ‘~’ for ‘cond_one_cmplMODE’. When defined for floating-point modes, the contents of ‘op2[i]’ are not interpreted if ‘op1[i]’ is false, just as they would not be in a normal C ‘?:’ condition. Operands 0, 2, and 3 all have mode M. Operand 1 is a scalar integer if M is scalar, otherwise it has the mode returned by ‘TARGET_VECTORIZE_GET_MASK_MODE’. ‘cond_OPMODE’ generally corresponds to a conditional form of ‘OPMODE2’. ‘cond_addMODE’ ‘cond_subMODE’ ‘cond_mulMODE’ ‘cond_divMODE’ ‘cond_udivMODE’ ‘cond_modMODE’ ‘cond_umodMODE’ ‘cond_andMODE’ ‘cond_iorMODE’ ‘cond_xorMODE’ ‘cond_sminMODE’ ‘cond_smaxMODE’ ‘cond_uminMODE’ ‘cond_umaxMODE’ ‘cond_copysignMODE’ ‘cond_fminMODE’ ‘cond_fmaxMODE’ ‘cond_ashlMODE’ ‘cond_ashrMODE’ ‘cond_lshrMODE’ When operand 1 is true, perform an operation on operands 2 and 3 and store the result in operand 0, otherwise store operand 4 in operand 0. The operation works elementwise if the operands are vectors. The scalar case is equivalent to: op0 = op1 ? op2 OP op3 : op4; while the vector case is equivalent to: for (i = 0; i < GET_MODE_NUNITS (M); i++) op0[i] = op1[i] ? op2[i] OP op3[i] : op4[i]; where, for example, OP is ‘+’ for ‘cond_addMODE’. When defined for floating-point modes, the contents of ‘op3[i]’ are not interpreted if ‘op1[i]’ is false, just as they would not be in a normal C ‘?:’ condition.
Operands 0, 2, 3 and 4 all have mode M. Operand 1 is a scalar integer if M is scalar, otherwise it has the mode returned by ‘TARGET_VECTORIZE_GET_MASK_MODE’. ‘cond_OPMODE’ generally corresponds to a conditional form of ‘OPMODE3’. As an exception, the vector forms of shifts correspond to patterns like ‘vashlMODE3’ rather than patterns like ‘ashlMODE3’. ‘cond_copysignMODE’ is only defined for floating-point modes. ‘cond_fmaMODE’ ‘cond_fmsMODE’ ‘cond_fnmaMODE’ ‘cond_fnmsMODE’ Like ‘cond_addMODE’, except that the conditional operation takes three operands rather than two. For example, the vector form of ‘cond_fmaMODE’ is equivalent to: for (i = 0; i < GET_MODE_NUNITS (M); i++) op0[i] = op1[i] ? fma (op2[i], op3[i], op4[i]) : op5[i]; ‘cond_len_negMODE’ ‘cond_len_one_cmplMODE’ When operand 1 is true and element index < operand 4 + operand 5, perform an operation on operand 2 and store the result in operand 0, otherwise store operand 3 in operand 0. The operation is defined only for vector operands. for (i = 0; i < GET_MODE_NUNITS (M); i++) op0[i] = (i < ops[4] + ops[5] && op1[i] ? OP op2[i] : op3[i]); where, for example, OP is ‘~’ for ‘cond_len_one_cmplMODE’. When defined for floating-point modes, the contents of ‘op2[i]’ are not interpreted if ‘op1[i]’ is false, just as they would not be in a normal C ‘?:’ condition. Operands 0, 2, and 3 all have mode M. Operand 1 is a scalar integer if M is scalar, otherwise it has the mode returned by ‘TARGET_VECTORIZE_GET_MASK_MODE’. Operand 4 has whichever integer mode the target prefers. ‘cond_len_OPMODE’ generally corresponds to a conditional form of ‘OPMODE2’.
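The length-limited selection described above can be modelled in plain C. The sketch below is only an illustration of the documented loop for a ‘cond_len_one_cmplMODE’-style operation on 32-bit lanes; the function names, the use of ‘bool’ for the mask, and the names ‘len’ and ‘bias’ for the two trailing operands are assumptions made for the example, not GCC interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One lane of the operation: lane I is active only when its mask element
   is true and I lies below LEN + BIAS.  Inactive lanes take the fallback
   value OP3 and never look at OP2.  */
uint32_t
model_cond_len_one_cmpl_lane (size_t i, bool mask, uint32_t op2,
                              uint32_t op3, long len, long bias)
{
  assert (len + bias >= 0);
  bool active = mask && (long) i < len + bias;
  return active ? ~op2 : op3;
}

/* Whole-vector form, mirroring the loop over GET_MODE_NUNITS (M) lanes.  */
void
model_cond_len_one_cmpl (uint32_t *op0, const bool *op1,
                         const uint32_t *op2, const uint32_t *op3,
                         size_t nunits, long len, long bias)
{
  for (size_t i = 0; i < nunits; i++)
    op0[i] = model_cond_len_one_cmpl_lane (i, op1[i], op2[i], op3[i],
                                           len, bias);
}
```

With ‘len’ equal to 2 and ‘bias’ equal to 0, only lanes 0 and 1 can be active; every later lane receives the fallback value regardless of its mask element.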
‘cond_len_addMODE’ ‘cond_len_subMODE’ ‘cond_len_mulMODE’ ‘cond_len_divMODE’ ‘cond_len_udivMODE’ ‘cond_len_modMODE’ ‘cond_len_umodMODE’ ‘cond_len_andMODE’ ‘cond_len_iorMODE’ ‘cond_len_xorMODE’ ‘cond_len_sminMODE’ ‘cond_len_smaxMODE’ ‘cond_len_uminMODE’ ‘cond_len_umaxMODE’ ‘cond_len_copysignMODE’ ‘cond_len_fminMODE’ ‘cond_len_fmaxMODE’ ‘cond_len_ashlMODE’ ‘cond_len_ashrMODE’ ‘cond_len_lshrMODE’ When operand 1 is true and element index < operand 5 + operand 6, perform an operation on operands 2 and 3 and store the result in operand 0, otherwise store operand 4 in operand 0. The operation is defined only for vector operands. for (i = 0; i < GET_MODE_NUNITS (M); i++) op0[i] = (i < ops[5] + ops[6] && op1[i] ? op2[i] OP op3[i] : op4[i]); where, for example, OP is ‘+’ for ‘cond_len_addMODE’. When defined for floating-point modes, the contents of ‘op3[i]’ are not interpreted if ‘op1[i]’ is false, just as they would not be in a normal C ‘?:’ condition. Operands 0, 2, 3 and 4 all have mode M. Operand 1 is a scalar integer if M is scalar, otherwise it has the mode returned by ‘TARGET_VECTORIZE_GET_MASK_MODE’. Operand 5 has whichever integer mode the target prefers. ‘cond_len_OPMODE’ generally corresponds to a conditional form of ‘OPMODE3’. As an exception, the vector forms of shifts correspond to patterns like ‘vashlMODE3’ rather than patterns like ‘ashlMODE3’. ‘cond_len_copysignMODE’ is only defined for floating-point modes. ‘cond_len_fmaMODE’ ‘cond_len_fmsMODE’ ‘cond_len_fnmaMODE’ ‘cond_len_fnmsMODE’ Like ‘cond_len_addMODE’, except that the conditional operation takes three operands rather than two. For example, the vector form of ‘cond_len_fmaMODE’ is equivalent to: for (i = 0; i < GET_MODE_NUNITS (M); i++) op0[i] = (i < ops[6] + ops[7] && op1[i] ? fma (op2[i], op3[i], op4[i]) : op5[i]); ‘negMODEcc’ Similar to ‘movMODEcc’ but for conditional negation.
Conditionally move the negation of operand 2 or the unchanged operand 3 into operand 0 according to the comparison in operand 1. If the comparison is true, the negation of operand 2 is moved into operand 0, otherwise operand 3 is moved. ‘notMODEcc’ Similar to ‘negMODEcc’ but for conditional complement. Conditionally move the bitwise complement of operand 2 or the unchanged operand 3 into operand 0 according to the comparison in operand 1. If the comparison is true, the complement of operand 2 is moved into operand 0, otherwise operand 3 is moved. ‘cstoreMODE4’ Store zero or nonzero in operand 0 according to whether a comparison is true. Operand 1 is a comparison operator. Operand 2 and operand 3 are the first and second operand of the comparison, respectively. You specify the mode that operand 0 must have when you write the ‘match_operand’ expression. The compiler automatically sees which mode you have used and supplies an operand of that mode. The value stored for a true condition must have 1 as its low bit, or else must be negative. Otherwise the instruction is not suitable and you should omit it from the machine description. You describe to the compiler exactly which value is stored by defining the macro ‘STORE_FLAG_VALUE’ (*note Misc::). If a description cannot be found that can be used for all the possible comparison operators, you should pick one and use a ‘define_expand’ to map all results onto the one you chose. These operations may ‘FAIL’, but should do so only in relatively uncommon cases; if they would ‘FAIL’ for common cases involving integer comparisons, it is best to restrict the predicates to not allow these operands. Likewise if a given comparison operator will always fail, independent of the operands (for floating-point modes, the ‘ordered_comparison_operator’ predicate is often useful in this case). 
If this pattern is omitted, the compiler will generate a conditional branch--for example, it may copy a constant one to the target and branch around an assignment of zero to the target--or a libcall. If the predicate for operand 1 only rejects some operators, it will also try reordering the operands and/or inverting the result value (e.g. by an exclusive OR). These possibilities could be cheaper or equivalent to the instructions used for the ‘cstoreMODE4’ pattern followed by those required to convert a positive result from ‘STORE_FLAG_VALUE’ to 1; in this case, you can and should make operand 1's predicate reject some operators in the ‘cstoreMODE4’ pattern, or remove the pattern altogether from the machine description. ‘tbranch_OPMODE3’ Conditional branch instruction combined with a bit test-and-compare instruction. Operand 0 is the operand of the comparison. Operand 1 is the bit position of operand 0 to test. Operand 2 is the ‘code_label’ to jump to. OP is one of EQ or NE. ‘cbranchMODE4’ Conditional branch instruction combined with a compare instruction. Operand 0 is a comparison operator. Operand 1 and operand 2 are the first and second operands of the comparison, respectively. Operand 3 is the ‘code_label’ to jump to. ‘jump’ A jump inside a function; an unconditional branch. Operand 0 is the ‘code_label’ to jump to. This pattern name is mandatory on all machines. ‘call’ Subroutine call instruction returning no value. Operand 0 is the function to call; operand 1 is the number of bytes of arguments pushed as a ‘const_int’. Operand 2 is the result of calling the target hook ‘TARGET_FUNCTION_ARG’ with the second argument ‘arg’ yielding true for ‘arg.end_marker_p ()’, in a call after all parameters have been passed to that hook. By default this is the first register beyond those used for arguments in the call, or ‘NULL’ if all the argument-registers are used in the call. On most machines, operand 2 is not actually stored into the RTL pattern.
It is supplied for the sake of some RISC machines which need to put this information into the assembler code; they can put it in the RTL instead of operand 1. Operand 0 should be a ‘mem’ RTX whose address is the address of the function. Note, however, that this address can be a ‘symbol_ref’ expression even if it would not be a legitimate memory address on the target machine. If it is also not a valid argument for a call instruction, the pattern for this operation should be a ‘define_expand’ (*note Expander Definitions::) that places the address into a register and uses that register in the call instruction. ‘call_value’ Subroutine call instruction returning a value. Operand 0 is the hard register in which the value is returned. There are three more operands, the same as the three operands of the ‘call’ instruction (but with numbers increased by one). Subroutines that return ‘BLKmode’ objects use the ‘call’ insn. ‘call_pop’, ‘call_value_pop’ Similar to ‘call’ and ‘call_value’, except used if defined and if ‘RETURN_POPS_ARGS’ is nonzero. They should emit a ‘parallel’ that contains both the function call and a ‘set’ to indicate the adjustment made to the frame pointer. For machines where ‘RETURN_POPS_ARGS’ can be nonzero, the use of these patterns increases the number of functions for which the frame pointer can be eliminated, if desired. ‘untyped_call’ Subroutine call instruction returning a value of any type. Operand 0 is the function to call; operand 1 is a memory location where the result of calling the function is to be stored; operand 2 is a ‘parallel’ expression where each element is a ‘set’ expression that indicates the saving of a function return value into the result block. This instruction pattern should be defined to support ‘__builtin_apply’ on machines where special instructions are needed to call a subroutine with arbitrary arguments or to save the value returned. 
This instruction pattern is required on machines that have multiple registers that can hold a return value (i.e. ‘FUNCTION_VALUE_REGNO_P’ is true for more than one register). ‘return’ Subroutine return instruction. This instruction pattern name should be defined only if a single instruction can do all the work of returning from a function. Like the ‘movM’ patterns, this pattern is also used after the RTL generation phase. In this case it is to support machines where multiple instructions are usually needed to return from a function, but some class of functions only requires one instruction to implement a return. Normally, the applicable functions are those which do not need to save any registers or allocate stack space. It is valid for this pattern to expand to an instruction using ‘simple_return’ if no epilogue is required. ‘simple_return’ Subroutine return instruction. This instruction pattern name should be defined only if a single instruction can do all the work of returning from a function on a path where no epilogue is required. This pattern is very similar to the ‘return’ instruction pattern, but it is emitted only by the shrink-wrapping optimization on paths where the function prologue has not been executed, and a function return should occur without any of the effects of the epilogue. Additional uses may be introduced on paths where both the prologue and the epilogue have executed. For such machines, the condition specified in this pattern should only be true when ‘reload_completed’ is nonzero and the function's epilogue would only be a single instruction. For machines with register windows, the routine ‘leaf_function_p’ may be used to determine if a register window push is required. 
Machines that have conditional return instructions should define patterns such as (define_insn "" [(set (pc) (if_then_else (match_operator 0 "comparison_operator" [(reg:CC CC_REG) (const_int 0)]) (return) (pc)))] "CONDITION" "...") where CONDITION would normally be the same condition specified on the named ‘return’ pattern. ‘untyped_return’ Untyped subroutine return instruction. This instruction pattern should be defined to support ‘__builtin_return’ on machines where special instructions are needed to return a value of any type. Operand 0 is a memory location where the result of calling a function with ‘__builtin_apply’ is stored; operand 1 is a ‘parallel’ expression where each element is a ‘set’ expression that indicates the restoring of a function return value from the result block. ‘nop’ No-op instruction. This instruction pattern name should always be defined to output a no-op in assembler code. ‘(const_int 0)’ will do as an RTL pattern. ‘indirect_jump’ An instruction to jump to an address which is operand zero. This pattern name is mandatory on all machines. ‘casesi’ Instruction to jump through a dispatch table, including bounds checking. This instruction takes five operands: 1. The index to dispatch on, which has mode ‘SImode’. 2. The lower bound for indices in the table, an integer constant. 3. The total range of indices in the table--the largest index minus the smallest one (both inclusive). 4. A label that precedes the table itself. 5. A label to jump to if the index has a value outside the bounds. The table is an ‘addr_vec’ or ‘addr_diff_vec’ inside of a ‘jump_table_data’. The number of elements in the table is one plus the difference between the upper bound and the lower bound. ‘tablejump’ Instruction to jump to a variable address. This is a low-level capability which can be used to implement a dispatch table when there is no ‘casesi’ pattern. 
This pattern requires two operands: the address or offset, and a label which should immediately precede the jump table. If the macro ‘CASE_VECTOR_PC_RELATIVE’ evaluates to a nonzero value then the first operand is an offset which counts from the address of the table; otherwise, it is an absolute address to jump to. In either case, the first operand has mode ‘Pmode’. The ‘tablejump’ insn is always the last insn before the jump table it uses. Its assembler code normally has no need to use the second operand, but you should incorporate it in the RTL pattern so that the jump optimizer will not delete the table as unreachable code. ‘doloop_end’ Conditional branch instruction that decrements a register and jumps if the register is nonzero. Operand 0 is the register to decrement and test; operand 1 is the label to jump to if the register is nonzero. *Note Looping Patterns::. This optional instruction pattern should be defined for machines with low-overhead looping instructions as the loop optimizer will try to modify suitable loops to utilize it. The target hook ‘TARGET_CAN_USE_DOLOOP_P’ controls the conditions under which low-overhead loops can be used. ‘doloop_begin’ Companion instruction to ‘doloop_end’ required for machines that need to perform some initialization, such as loading a special counter register. Operand 1 is the associated ‘doloop_end’ pattern and operand 0 is the register that it decrements. If initialization insns do not always need to be emitted, use a ‘define_expand’ (*note Expander Definitions::) and make it fail. ‘canonicalize_funcptr_for_compare’ Canonicalize the function pointer in operand 1 and store the result into operand 0. Operand 0 is always a ‘reg’ and has mode ‘Pmode’; operand 1 may be a ‘reg’, ‘mem’, ‘symbol_ref’, ‘const_int’, etc and also has mode ‘Pmode’. Canonicalization of a function pointer usually involves computing the address of the function which would be called if the function pointer were used in an indirect call. 
Only define this pattern if function pointers on the target machine can have different values but still call the same function when used in an indirect call. ‘save_stack_block’ ‘save_stack_function’ ‘save_stack_nonlocal’ ‘restore_stack_block’ ‘restore_stack_function’ ‘restore_stack_nonlocal’ Most machines save and restore the stack pointer by copying it to or from an object of mode ‘Pmode’. Do not define these patterns on such machines. Some machines require special handling for stack pointer saves and restores. On those machines, define the patterns corresponding to the non-standard cases by using a ‘define_expand’ (*note Expander Definitions::) that produces the required insns. The three types of saves and restores are: 1. ‘save_stack_block’ saves the stack pointer at the start of a block that allocates a variable-sized object, and ‘restore_stack_block’ restores the stack pointer when the block is exited. 2. ‘save_stack_function’ and ‘restore_stack_function’ do a similar job for the outermost block of a function and are used when the function allocates variable-sized objects or calls ‘alloca’. Only the epilogue uses the restored stack pointer, allowing a simpler save or restore sequence on some machines. 3. ‘save_stack_nonlocal’ is used in functions that contain labels branched to by nested functions. It saves the stack pointer in such a way that the inner function can use ‘restore_stack_nonlocal’ to restore the stack pointer. The compiler generates code to restore the frame and argument pointer registers, but some machines require saving and restoring additional data such as register window information or stack backchains. Place insns in these patterns to save and restore any such required data. When saving the stack pointer, operand 0 is the save area and operand 1 is the stack pointer. The mode used to allocate the save area defaults to ‘Pmode’ but you can override that choice by defining the ‘STACK_SAVEAREA_MODE’ macro (*note Storage Layout::). 
You must specify an integral mode, or ‘VOIDmode’ if no save area is needed for a particular type of save (either because no save is needed or because a machine-specific save area can be used). Operand 0 is the stack pointer and operand 1 is the save area for restore operations. If ‘save_stack_block’ is defined, operand 0 must not be ‘VOIDmode’ since these saves can be arbitrarily nested. A save area is a ‘mem’ that is at a constant offset from ‘virtual_stack_vars_rtx’ when the stack pointer is saved for use by nonlocal gotos and a ‘reg’ in the other two cases. ‘allocate_stack’ Subtract (or add if ‘STACK_GROWS_DOWNWARD’ is undefined) operand 1 from the stack pointer to create space for dynamically allocated data. Store the resultant pointer to this space into operand 0. If you are allocating space from the main stack, do this by emitting a move insn to copy ‘virtual_stack_dynamic_rtx’ to operand 0. If you are allocating the space elsewhere, generate code to copy the location of the space to operand 0. In the latter case, you must ensure this space gets freed when the corresponding space on the main stack is free. Do not define this pattern if all that must be done is the subtraction. Some machines require other operations such as stack probes or maintaining the back chain. Define this pattern to emit those operations in addition to updating the stack pointer. ‘check_stack’ If stack checking (*note Stack Checking::) cannot be done on your system by probing the stack, define this pattern to perform the needed check and signal an error if the stack has overflowed. The single operand is the address in the stack farthest from the current stack pointer that you need to validate. Normally, on platforms where this pattern is needed, you would obtain the stack limit from a global or thread-specific variable or register. 
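The comparison that a ‘check_stack’ expansion has to perform can be pictured with a small C model. This is a sketch only: it assumes a stack that grows downward and a nonzero limit obtained from some hypothetical global or thread-specific variable, and the function name is invented for the example. A real pattern would emit target insns and signal the error itself rather than return a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Return true if FARTHEST_ADDR, the address farthest from the current
   stack pointer that must be valid, lies beyond STACK_LIMIT.  The stack
   is assumed to grow downward, so "beyond" means numerically below.  */
bool
model_stack_overflowed (uintptr_t farthest_addr, uintptr_t stack_limit)
{
  /* A zero limit would mean the hypothetical limit variable was never
     initialised.  */
  assert (stack_limit != 0);
  return farthest_addr < stack_limit;
}
```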
‘probe_stack_address’ If stack checking (*note Stack Checking::) can be done on your system by probing the stack but without the need to actually access it, define this pattern and signal an error if the stack has overflowed. The single operand is the memory address in the stack that needs to be probed. ‘probe_stack’ If stack checking (*note Stack Checking::) can be done on your system by probing the stack but doing it with a "store zero" instruction is not valid or optimal, define this pattern to do the probing differently and signal an error if the stack has overflowed. The single operand is the memory reference in the stack that needs to be probed. ‘nonlocal_goto’ Emit code to generate a non-local goto, e.g., a jump from one function to a label in an outer function. This pattern has four arguments, each representing a value to be used in the jump. The first argument is to be loaded into the frame pointer, the second is the address to branch to (code to dispatch to the actual label), the third is the address of a location where the stack is saved, and the last is the address of the label, to be placed in the location for the incoming static chain. On most machines you need not define this pattern, since GCC will already generate the correct code, which is to load the frame pointer and static chain, restore the stack (using the ‘restore_stack_nonlocal’ pattern, if defined), and jump indirectly to the dispatcher. You need only define this pattern if this code will not work on your machine. ‘nonlocal_goto_receiver’ This pattern, if defined, contains code needed at the target of a nonlocal goto after the code already generated by GCC. You will not normally need to define this pattern. A typical reason why you might need this pattern is if some value, such as a pointer to a global table, must be restored when the frame pointer is restored. 
Note that a nonlocal goto only occurs within a unit-of-translation, so a global table pointer that is shared by all functions of a given module need not be restored. There are no arguments. ‘exception_receiver’ This pattern, if defined, contains code needed at the site of an exception handler that isn't needed at the site of a nonlocal goto. You will not normally need to define this pattern. A typical reason why you might need this pattern is if some value, such as a pointer to a global table, must be restored after control flow is branched to the handler of an exception. There are no arguments. ‘builtin_setjmp_setup’ This pattern, if defined, contains additional code needed to initialize the ‘jmp_buf’. You will not normally need to define this pattern. A typical reason why you might need this pattern is if some value, such as a pointer to a global table, must be restored. Though it is preferred that the pointer value be recalculated if possible (given the address of a label for instance). The single argument is a pointer to the ‘jmp_buf’. Note that the buffer is five words long and that the first three are normally used by the generic mechanism. ‘builtin_setjmp_receiver’ This pattern, if defined, contains code needed at the site of a built-in setjmp that isn't needed at the site of a nonlocal goto. You will not normally need to define this pattern. A typical reason why you might need this pattern is if some value, such as a pointer to a global table, must be restored. It takes one argument, which is the label to which builtin_longjmp transferred control; this pattern may be emitted at a small offset from that label. ‘builtin_longjmp’ This pattern, if defined, performs the entire action of the longjmp. You will not normally need to define this pattern unless you also define ‘builtin_setjmp_setup’. The single argument is a pointer to the ‘jmp_buf’. 
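For orientation, the user-level built-ins that these patterns serve can be exercised as follows. This is a minimal sketch that assumes a compiler and target providing GCC's ‘__builtin_setjmp’ and ‘__builtin_longjmp’; the function and variable names are hypothetical. The buffer is declared as five words, matching the size mentioned above, and the jump is made from a separate function because the two built-ins may not be used in the same function.

```c
#include <assert.h>

/* Five-word buffer used by the built-in setjmp/longjmp pair.  */
void *jump_buffer[5];
static_assert (sizeof jump_buffer / sizeof jump_buffer[0] == 5,
               "the buffer is five words long");

/* Transfer control back to the matching __builtin_setjmp.  The second
   argument of __builtin_longjmp must be the constant 1.  */
__attribute__ ((noinline)) void
jump_back (void)
{
  __builtin_longjmp (jump_buffer, 1);
}

/* Return 1 if control came back through the long jump, -1 otherwise.  */
int
run_builtin_setjmp_demo (void)
{
  if (__builtin_setjmp (jump_buffer) == 0)
    {
      jump_back ();
      return -1;  /* Not reached: jump_back does not return.  */
    }
  return 1;
}
```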
‘eh_return’ This pattern, if defined, affects the way ‘__builtin_eh_return’, and thence the call frame exception handling library routines, are built. It is intended to handle non-trivial actions needed along the abnormal return path. The address of the exception handler to which the function should return is passed as an operand to this pattern. It will normally need to be copied by the pattern to some special register or memory location. If the pattern needs to determine the location of the target call frame in order to do so, it may use ‘EH_RETURN_STACKADJ_RTX’, if defined; it will have already been assigned. If this pattern is not defined, the default action will be to simply copy the return address to ‘EH_RETURN_HANDLER_RTX’. Either that macro or this pattern needs to be defined if call frame exception handling is to be used. ‘prologue’ This pattern, if defined, emits RTL for entry to a function. The function entry is responsible for setting up the stack frame, initializing the frame pointer register, saving callee-saved registers, etc. Using a prologue pattern is generally preferred over defining ‘TARGET_ASM_FUNCTION_PROLOGUE’ to emit assembly code for the prologue. The ‘prologue’ pattern is particularly useful for targets which perform instruction scheduling. ‘window_save’ This pattern, if defined, emits RTL for a register window save. It should be defined if the target machine has register windows but the window events are decoupled from calls to subroutines. The canonical example is the SPARC architecture. ‘epilogue’ This pattern emits RTL for exit from a function. The function exit is responsible for deallocating the stack frame, restoring callee-saved registers and emitting the return instruction. Using an epilogue pattern is generally preferred over defining ‘TARGET_ASM_FUNCTION_EPILOGUE’ to emit assembly code for the epilogue.
The ‘epilogue’ pattern is particularly useful for targets which perform instruction scheduling or which have delay slots for their return instruction. ‘sibcall_epilogue’ This pattern, if defined, emits RTL for exit from a function without the final branch back to the calling function. This pattern will be emitted before any sibling call (aka tail call) sites. The ‘sibcall_epilogue’ pattern must not clobber any arguments used for parameter passing or any stack slots for arguments passed to the current function. ‘trap’ This pattern, if defined, signals an error, typically by causing some kind of signal to be raised. ‘ctrapMM4’ Conditional trap instruction. Operand 0 is a piece of RTL which performs a comparison, and operands 1 and 2 are the arms of the comparison. Operand 3 is the trap code, an integer. A typical ‘ctrap’ pattern looks like (define_insn "ctrapsi4" [(trap_if (match_operator 0 "trap_operator" [(match_operand 1 "register_operand") (match_operand 2 "immediate_operand")]) (match_operand 3 "const_int_operand" "i"))] "" "...") ‘prefetch’ This pattern, if defined, emits code for a non-faulting data prefetch instruction. Operand 0 is the address of the memory to prefetch. Operand 1 is a constant 1 if the prefetch is preparing for a write to the memory address, or a constant 0 otherwise. Operand 2 is the expected degree of temporal locality of the data and is a value between 0 and 3, inclusive; 0 means that the data has no temporal locality, so it need not be left in the cache after the access; 3 means that the data has a high degree of temporal locality and should be left in all levels of cache possible; 1 and 2 mean, respectively, a low or moderate degree of temporal locality. Targets that do not support write prefetches or locality hints can ignore the values of operands 1 and 2. 
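At the source level, the three operands correspond to the arguments of GCC's ‘__builtin_prefetch’ built-in. The following sketch assumes a compiler that provides that built-in; the function name and the prefetch distance are arbitrary choices made for the example. Whether any prefetch instruction is actually emitted depends on the target.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical look-ahead distance, in elements.  */
enum { PREFETCH_DISTANCE = 4 };

/* Sum N integers, hinting that elements a few iterations ahead will be
   read (second argument 0) and are worth keeping in all cache levels
   (third argument 3).  Both hints must be compile-time constants.  */
long
sum_with_prefetch (const int *data, size_t n)
{
  assert (data != NULL || n == 0);
  long total = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (i + PREFETCH_DISTANCE < n)
        __builtin_prefetch (&data[i + PREFETCH_DISTANCE], 0, 3);
      total += data[i];
    }
  return total;
}
```

Because the prefetch is only a hint and does not fault, removing the ‘__builtin_prefetch’ call leaves the result of the function unchanged.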
‘blockage’ This pattern defines a pseudo insn that prevents the instruction scheduler and other passes from moving instructions and using register equivalences across the boundary defined by the blockage insn. This needs to be an UNSPEC_VOLATILE pattern or a volatile ASM. ‘memory_blockage’ This pattern, if defined, represents a compiler memory barrier, and will be placed at points across which RTL passes may not propagate memory accesses. This instruction needs to read and write volatile BLKmode memory. It does not need to generate any machine instruction. If this pattern is not defined, the compiler falls back to emitting an instruction corresponding to ‘asm volatile ("" ::: "memory")’. ‘memory_barrier’ If the target memory model is not fully synchronous, then this pattern should be defined to an instruction that orders both loads and stores before the instruction with respect to loads and stores after the instruction. This pattern has no operands. ‘speculation_barrier’ If the target can support speculative execution, then this pattern should be defined to an instruction that will block subsequent execution until any prior speculation conditions have been resolved. The pattern must also ensure that the compiler cannot move memory operations past the barrier, so it needs to be an UNSPEC_VOLATILE pattern. The pattern has no operands. If this pattern is not defined then the default expansion of ‘__builtin_speculation_safe_value’ will emit a warning. You can suppress this warning by defining this pattern with a final condition of ‘0’ (zero), which tells the compiler that a speculation barrier is not needed for this target. ‘sync_compare_and_swapMODE’ This pattern, if defined, emits code for an atomic compare-and-swap operation. Operand 1 is the memory on which the atomic operation is performed. Operand 2 is the "old" value to be compared against the current contents of the memory location. Operand 3 is the "new" value to store in the memory if the compare succeeds.
Operand 0 is the result of the operation; it should contain the contents of the memory before the operation. If the compare succeeds, this should be a copy of operand 2. This pattern must show that both operand 0 and operand 1 are modified. This pattern must issue any memory barrier instructions such that all memory operations before the atomic operation occur before the atomic operation and all memory operations after the atomic operation occur after the atomic operation. For targets where the success or failure of the compare-and-swap operation is available via the status flags, it is possible to avoid a separate compare operation and issue the subsequent branch or store-flag operation immediately after the compare-and-swap. To this end, GCC will look for a ‘MODE_CC’ set in the output of ‘sync_compare_and_swapMODE’; if the machine description includes such a set, the target should also define special ‘cbranchcc4’ and/or ‘cstorecc4’ instructions. GCC will then be able to take the destination of the ‘MODE_CC’ set and pass it to the ‘cbranchcc4’ or ‘cstorecc4’ pattern as the first operand of the comparison (the second will be ‘(const_int 0)’). For targets where the operating system may provide support for this operation via library calls, the ‘sync_compare_and_swap_optab’ may be initialized to a function with the same interface as the ‘__sync_val_compare_and_swap_N’ built-in. If the entire set of __SYNC builtins is supported via library calls, the target can initialize all of the optabs at once with ‘init_sync_libfuncs’. For the purposes of C++11 ‘std::atomic::is_lock_free’, it is assumed that these library calls do _not_ use any kind of interruptible locking. ‘sync_addMODE’, ‘sync_subMODE’ ‘sync_iorMODE’, ‘sync_andMODE’ ‘sync_xorMODE’, ‘sync_nandMODE’ These patterns emit code for an atomic operation on memory. Operand 0 is the memory on which the atomic operation is performed. Operand 1 is the second operand to the binary operator.
     This pattern must issue any memory barrier instructions such that all memory operations before the atomic operation occur before the atomic operation and all memory operations after the atomic operation occur after the atomic operation.

     If these patterns are not defined, the operation will be constructed from a compare-and-swap operation, if defined.

‘sync_old_addMODE’, ‘sync_old_subMODE’
‘sync_old_iorMODE’, ‘sync_old_andMODE’
‘sync_old_xorMODE’, ‘sync_old_nandMODE’
     These patterns emit code for an atomic operation on memory, and return the value that the memory contained before the operation. Operand 0 is the result value, operand 1 is the memory on which the atomic operation is performed, and operand 2 is the second operand to the binary operator.

     This pattern must issue any memory barrier instructions such that all memory operations before the atomic operation occur before the atomic operation and all memory operations after the atomic operation occur after the atomic operation.

     If these patterns are not defined, the operation will be constructed from a compare-and-swap operation, if defined.

‘sync_new_addMODE’, ‘sync_new_subMODE’
‘sync_new_iorMODE’, ‘sync_new_andMODE’
‘sync_new_xorMODE’, ‘sync_new_nandMODE’
     These patterns are like their ‘sync_old_OP’ counterparts, except that they return the value that exists in the memory location after the operation, rather than before the operation.

‘sync_lock_test_and_setMODE’
     This pattern takes two forms, based on the capabilities of the target. In either case, operand 0 is the result of the operation, operand 1 is the memory on which the atomic operation is performed, and operand 2 is the value to set in the lock.

     In the ideal case, this operation is an atomic exchange operation, in which the previous value in memory operand is copied into the result operand, and the value operand is stored in the memory operand.

     For less capable targets, any value operand that is not the constant 1 should be rejected with ‘FAIL’.
     In this case the target may use an atomic test-and-set bit operation. The result operand should contain 1 if the bit was previously set and 0 if the bit was previously clear. The true contents of the memory operand are implementation defined.

     This pattern must issue any memory barrier instructions such that the pattern as a whole acts as an acquire barrier, that is, all memory operations after the pattern do not occur until the lock is acquired.

     If this pattern is not defined, the operation will be constructed from a compare-and-swap operation, if defined.

‘sync_lock_releaseMODE’
     This pattern, if defined, releases a lock set by ‘sync_lock_test_and_setMODE’. Operand 0 is the memory that contains the lock; operand 1 is the value to store in the lock.

     If the target doesn't implement full semantics for ‘sync_lock_test_and_setMODE’, any value operand which is not the constant 0 should be rejected with ‘FAIL’, and the true contents of the memory operand are implementation defined.

     This pattern must issue any memory barrier instructions such that the pattern as a whole acts as a release barrier, that is, the lock is released only after all previous memory operations have completed.

     If this pattern is not defined, then a ‘memory_barrier’ pattern will be emitted, followed by a store of the value to the memory operand.

‘atomic_compare_and_swapMODE’
     This pattern, if defined, emits code for an atomic compare-and-swap operation with memory model semantics. Operand 2 is the memory on which the atomic operation is performed. Operand 0 is an output operand which is set to true or false based on whether the operation succeeded. Operand 1 is an output operand which is set to the contents of the memory before the operation was attempted. Operand 3 is the value that is expected to be in memory. Operand 4 is the value to put in memory if the expected value is found there. Operand 5 is set to 1 if this compare and swap is to be treated as a weak operation.
     Operand 6 is the memory model to be used if the operation is a success. Operand 7 is the memory model to be used if the operation fails.

     If memory referred to in operand 2 contains the value in operand 3, then operand 4 is stored in memory pointed to by operand 2 and fencing based on the memory model in operand 6 is issued.

     If memory referred to in operand 2 does not contain the value in operand 3, then fencing based on the memory model in operand 7 is issued.

     If a target does not support weak compare-and-swap operations, or the port elects not to implement weak operations, the argument in operand 5 can be ignored. Note a strong implementation must be provided.

     If this pattern is not provided, the ‘__atomic_compare_exchange’ built-in functions will utilize the legacy ‘sync_compare_and_swap’ pattern with an ‘__ATOMIC_SEQ_CST’ memory model.

‘atomic_loadMODE’
     This pattern implements an atomic load operation with memory model semantics. Operand 1 is the memory address being loaded from. Operand 0 is the result of the load. Operand 2 is the memory model to be used for the load operation.

     If not present, the ‘__atomic_load’ built-in function will either resort to a normal load with memory barriers, or a compare-and-swap operation if a normal load would not be atomic.

‘atomic_storeMODE’
     This pattern implements an atomic store operation with memory model semantics. Operand 0 is the memory address being stored to. Operand 1 is the value to be written. Operand 2 is the memory model to be used for the operation.

     If not present, the ‘__atomic_store’ built-in function will attempt to perform a normal store and surround it with any required memory fences. If the store would not be atomic, then an ‘__atomic_exchange’ is attempted with the result being ignored.

‘atomic_exchangeMODE’
     This pattern implements an atomic exchange operation with memory model semantics. Operand 1 is the memory location the operation is performed on.
     Operand 0 is an output operand which is set to the original value contained in the memory pointed to by operand 1. Operand 2 is the value to be stored. Operand 3 is the memory model to be used.

     If this pattern is not present, the built-in function ‘__atomic_exchange’ will attempt to perform the operation with a compare and swap loop.

‘atomic_addMODE’, ‘atomic_subMODE’
‘atomic_orMODE’, ‘atomic_andMODE’
‘atomic_xorMODE’, ‘atomic_nandMODE’
     These patterns emit code for an atomic operation on memory with memory model semantics. Operand 0 is the memory on which the atomic operation is performed. Operand 1 is the second operand to the binary operator. Operand 2 is the memory model to be used by the operation.

     If these patterns are not defined, attempts will be made to use legacy ‘sync’ patterns, or equivalent patterns which return a result. If none of these are available a compare-and-swap loop will be used.

‘atomic_fetch_addMODE’, ‘atomic_fetch_subMODE’
‘atomic_fetch_orMODE’, ‘atomic_fetch_andMODE’
‘atomic_fetch_xorMODE’, ‘atomic_fetch_nandMODE’
     These patterns emit code for an atomic operation on memory with memory model semantics, and return the original value. Operand 0 is an output operand which contains the value of the memory location before the operation was performed. Operand 1 is the memory on which the atomic operation is performed. Operand 2 is the second operand to the binary operator. Operand 3 is the memory model to be used by the operation.

     If these patterns are not defined, attempts will be made to use legacy ‘sync’ patterns. If none of these are available a compare-and-swap loop will be used.

‘atomic_add_fetchMODE’, ‘atomic_sub_fetchMODE’
‘atomic_or_fetchMODE’, ‘atomic_and_fetchMODE’
‘atomic_xor_fetchMODE’, ‘atomic_nand_fetchMODE’
     These patterns emit code for an atomic operation on memory with memory model semantics and return the result after the operation is performed. Operand 0 is an output operand which contains the value after the operation.
     Operand 1 is the memory on which the atomic operation is performed. Operand 2 is the second operand to the binary operator. Operand 3 is the memory model to be used by the operation.

     If these patterns are not defined, attempts will be made to use legacy ‘sync’ patterns, or equivalent patterns which return the result before the operation followed by the arithmetic operation required to produce the result. If none of these are available a compare-and-swap loop will be used.

‘atomic_test_and_set’
     This pattern emits code for ‘__builtin_atomic_test_and_set’. Operand 0 is an output operand which is set to true if the previous contents of the byte was "set", and false otherwise. Operand 1 is the ‘QImode’ memory to be modified. Operand 2 is the memory model to be used.

     The specific value that defines "set" is implementation defined, and is normally based on what is performed by the native atomic test and set instruction.

‘atomic_bit_test_and_setMODE’
‘atomic_bit_test_and_complementMODE’
‘atomic_bit_test_and_resetMODE’
     These patterns emit code for an atomic bitwise operation on memory with memory model semantics, and return the original value of the specified bit. Operand 0 is an output operand which contains the value of the specified bit from the memory location before the operation was performed. Operand 1 is the memory on which the atomic operation is performed. Operand 2 is the bit within the operand, starting with least significant bit. Operand 3 is the memory model to be used by the operation. Operand 4 is a flag - it is ‘const1_rtx’ if operand 0 should contain the original value of the specified bit in the least significant bit of the operand, and ‘const0_rtx’ if the bit should be in its original position in the operand. ‘atomic_bit_test_and_setMODE’ atomically sets the specified bit after remembering its original value, ‘atomic_bit_test_and_complementMODE’ inverts the specified bit and ‘atomic_bit_test_and_resetMODE’ clears the specified bit.
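     As an informal illustration of the two result forms selected by operand 4, the following C sketch models the three bit operations with the ‘__atomic_fetch_or’, ‘__atomic_fetch_xor’ and ‘__atomic_fetch_and’ built-ins. The helper names are hypothetical and are not part of GCC, and the sketch assumes a bit number smaller than the width of ‘unsigned’; whether a particular GCC version and target actually expands code of this shape through these patterns is not guaranteed.

```c
#include <assert.h>

/* Hypothetical helpers, not GCC functions.  BIT must be less than the
   number of bits in unsigned.  */

/* Set the bit; return the old bit in its original position
   (the form corresponding to operand 4 being const0_rtx).  */
static inline unsigned bit_test_and_set_in_place(unsigned *word, unsigned bit)
{
    unsigned mask = 1u << bit;
    return __atomic_fetch_or(word, mask, __ATOMIC_SEQ_CST) & mask;
}

/* Set the bit; return the old bit in the least significant bit
   (the form corresponding to operand 4 being const1_rtx).  */
static inline unsigned bit_test_and_set_lsb(unsigned *word, unsigned bit)
{
    unsigned mask = 1u << bit;
    return (__atomic_fetch_or(word, mask, __ATOMIC_SEQ_CST) >> bit) & 1u;
}

/* Invert the bit; return the old bit in the least significant bit.  */
static inline unsigned bit_test_and_complement_lsb(unsigned *word, unsigned bit)
{
    unsigned mask = 1u << bit;
    return (__atomic_fetch_xor(word, mask, __ATOMIC_SEQ_CST) >> bit) & 1u;
}

/* Clear the bit; return the old bit in the least significant bit.  */
static inline unsigned bit_test_and_reset_lsb(unsigned *word, unsigned bit)
{
    unsigned mask = 1u << bit;
    return (__atomic_fetch_and(word, ~mask, __ATOMIC_SEQ_CST) >> bit) & 1u;
}

/* Usage sketch: walk bit 3 of a word through set, reset and complement.  */
static inline void bit_ops_demo(void)
{
    unsigned w = 0;
    assert(bit_test_and_set_in_place(&w, 3) == 0);   /* bit was clear */
    assert(w == 8u);
    assert(bit_test_and_set_in_place(&w, 3) == 8u);  /* old bit, in place */
    assert(bit_test_and_set_lsb(&w, 3) == 1u);       /* old bit, in the LSB */
    assert(bit_test_and_reset_lsb(&w, 3) == 1u);
    assert(w == 0);
    assert(bit_test_and_complement_lsb(&w, 3) == 0);
    assert(w == 8u);
}
```

     The in-place form is convenient when the result is only tested against zero, while the least-significant-bit form yields a 0 or 1 value directly.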
     If these patterns are not defined, attempts will be made to use ‘atomic_fetch_orMODE’, ‘atomic_fetch_xorMODE’ or ‘atomic_fetch_andMODE’ instruction patterns, or their ‘sync’ counterparts. If none of these are available a compare-and-swap loop will be used.

‘atomic_add_fetch_cmp_0MODE’
‘atomic_sub_fetch_cmp_0MODE’
‘atomic_and_fetch_cmp_0MODE’
‘atomic_or_fetch_cmp_0MODE’
‘atomic_xor_fetch_cmp_0MODE’
     These patterns emit code for an atomic operation on memory with memory model semantics if the fetch result is used only in a comparison against zero. Operand 0 is an output operand which contains a boolean result of comparison of the value after the operation against zero. Operand 1 is the memory on which the atomic operation is performed. Operand 2 is the second operand to the binary operator. Operand 3 is the memory model to be used by the operation. Operand 4 is an integer holding the comparison code, one of ‘EQ’, ‘NE’, ‘LT’, ‘GT’, ‘LE’ or ‘GE’.

     If these patterns are not defined, attempts will be made to use a separate atomic operation and fetch pattern followed by a comparison of the result against zero.

‘mem_thread_fence’
     This pattern emits code required to implement a thread fence with memory model semantics. Operand 0 is the memory model to be used.

     For the ‘__ATOMIC_RELAXED’ model no instructions need to be issued and this expansion is not invoked.

     The compiler always emits a compiler memory barrier regardless of what expanding this pattern produced.

     If this pattern is not defined, the compiler falls back to expanding the ‘memory_barrier’ pattern, then to emitting a ‘__sync_synchronize’ library call, and finally to just placing a compiler memory barrier.

‘get_thread_pointerMODE’
‘set_thread_pointerMODE’
     These patterns emit code that reads/sets the TLS thread pointer. Currently, these are only needed if the target needs to support the ‘__builtin_thread_pointer’ and ‘__builtin_set_thread_pointer’ builtins.
     The get/set patterns have a single output/input operand respectively, with MODE intended to be ‘Pmode’.

‘stack_protect_combined_set’
     This pattern, if defined, moves a ‘ptr_mode’ value from an address whose declaration RTX is given in operand 1 to the memory in operand 0 without leaving the value in a register afterward. If several instructions are needed by the target to perform the operation (e.g. to load the address from a GOT entry then load the ‘ptr_mode’ value and finally store it), it is the backend's responsibility to ensure no intermediate result gets spilled. This is to avoid leaking the value some place that an attacker might use to rewrite the stack guard slot after having clobbered it.

     If this pattern is not defined, then the address declaration is expanded first in the standard way and a ‘stack_protect_set’ pattern is then generated to move the value from that address to the address in operand 0.

‘stack_protect_set’
     This pattern, if defined, moves a ‘ptr_mode’ value from the valid memory location in operand 1 to the memory in operand 0 without leaving the value in a register afterward. This is to avoid leaking the value some place that an attacker might use to rewrite the stack guard slot after having clobbered it.

     Note: on targets where the addressing modes do not allow loading directly from the stack guard address, the address is expanded in a standard way first, which could cause some spills.

     If this pattern is not defined, then a plain move pattern is generated.

‘stack_protect_combined_test’
     This pattern, if defined, compares a ‘ptr_mode’ value from an address whose declaration RTX is given in operand 1 with the memory in operand 0 without leaving the value in a register afterward and branches to operand 2 if the values were equal. If several instructions are needed by the target to perform the operation (e.g.
     to load the address from a GOT entry then load the ‘ptr_mode’ value and finally store it), it is the backend's responsibility to ensure no intermediate result gets spilled. This is to avoid leaking the value some place that an attacker might use to rewrite the stack guard slot after having clobbered it.

     If this pattern is not defined, then the address declaration is expanded first in the standard way and a ‘stack_protect_test’ pattern is then generated to compare the value from that address to the value at the memory in operand 0.

‘stack_protect_test’
     This pattern, if defined, compares a ‘ptr_mode’ value from the valid memory location in operand 1 with the memory in operand 0 without leaving the value in a register afterward and branches to operand 2 if the values were equal.

     If this pattern is not defined, then a plain compare pattern and conditional branch pattern are used.

‘clear_cache’
     This pattern, if defined, flushes the instruction cache for a region of memory. The region is bounded by the Pmode pointers in operand 0 inclusive and operand 1 exclusive.

     If this pattern is not defined, a call to the library function ‘__clear_cache’ is used.

‘spaceshipM3’
     Initialize output operand 0 with mode of integer type to -1, 0, 1 or 2 if operand 1 with mode M compares less than operand 2, equal to operand 2, greater than operand 2 or is unordered with operand 2. M should be a scalar floating point mode.

     This pattern is not allowed to ‘FAIL’.
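
     The four result values of ‘spaceshipM3’ can be modeled in C as follows. This is only a reference sketch of the semantics stated above for ‘double’ operands; the function name ‘spaceship_ref’ is hypothetical and is not part of GCC.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical reference model, not a GCC function: returns -1, 0, 1
   or 2 when A is less than, equal to, greater than, or unordered
   with B, respectively.  */
static inline int spaceship_ref(double a, double b)
{
    if (isunordered(a, b))   /* true when either operand is a NaN */
        return 2;
    if (a < b)
        return -1;
    if (a > b)
        return 1;
    return 0;
}

/* Usage sketch: one call for each of the four possible results.  */
static inline void spaceship_demo(void)
{
    assert(spaceship_ref(1.0, 2.0) == -1);
    assert(spaceship_ref(2.0, 2.0) == 0);
    assert(spaceship_ref(3.0, 2.0) == 1);
    assert(spaceship_ref(NAN, 2.0) == 2);
}
```

     Checking for the unordered case first keeps the remaining comparisons free of NaN operands.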