Abstract:
Fusing immediate value, write-based instructions in instruction processing circuits, and related processor systems, methods, and computer-readable media are disclosed. In one embodiment, a first instruction indicating an operation writing an immediate value to a register is detected by an instruction processing circuit. The circuit also detects at least one subsequent instruction indicating an operation that overwrites at least one first portion of the register while maintaining a value of a second portion of the register. The at least one subsequent instruction is converted (or replaced) with a fused instruction(s), which indicates an operation writing the at least one first portion and the second portion of the register. In this manner, conversion of multiple instructions for generating a constant into the fused instruction(s) removes the potential for a read-after-write hazard and associated consequences caused by dependencies between certain instructions, while reducing a number of clock cycles required to process the instructions.
Abstract:
A method includes selectively coupling a first address line of a plurality of address lines and a second address line of the plurality of address lines to a first element bank of a plurality of element banks of a vector register file according to a selection pattern. The method also includes accessing data stored within the first element bank that is selectively addressed by the first address line via a single read port.
Abstract:
Issuing instructions to execution pipelines based on register-associated preferences and related instruction processing circuits, systems, methods, and computer-readable media are disclosed. In one embodiment, an instruction is detected in an instruction stream. Upon determining that the instruction specifies at least one source register, an execution pipeline preference(s) is determined based on at least one pipeline indicator associated with the at least one source register in a pipeline issuance table, and the instruction is issued to an execution pipeline based on the execution pipeline preference(s). Upon a determination that the instruction specifies at least one target register, at least one pipeline indicator associated with the at least one target register in the pipeline issuance table is updated based on the execution pipeline to which the instruction is issued. In this manner, optimal forwarding of instructions may be facilitated, thus improving processor performance.
Abstract:
A method of determining whether a particular pixel of an image is a feature includes receiving data corresponding to a plurality of pixels (from the image) surrounding the particular pixel. The method further includes determining a set of comparison results, each corresponding to one of the plurality of pixels and indicating a result of comparing an attribute value corresponding to one of the plurality of pixels to a comparison value (based on a particular attribute value of the particular pixel and a threshold value). The method further includes performing a processor-executable instruction that, when executed by a processor, causes the processor to identify a subset of the set of comparison results that indicate the particular pixel is the feature. The identified subset may be a consecutive order of pixels of the plurality of pixels.
Abstract:
Systems and methods for allocation of cache lines in a shared partitioned cache (104) of a multi-threaded processor (102). A memory management unit (110) is configured to determine attributes associated with an address for a cache entry associated with a processing thread (T0) to be allocated in the cache. A configuration register (CP 300_0) is configured to store cache allocation information based on the determined attributes. A partitioning register (DP 310) is configured to store partitioning information for partitioning the cache into two or more portions (Main/Aux in FIG. 3). The cache entry is allocated into one of the portions of the cache based on the configuration register and the partitioning register.
Abstract:
A set of even interpolated sub-pixels is formed based on a pixel window and a tap coefficient register having a tap coefficient set, the pixel window is shifted and, applying the tap coefficient register a set of odd interpolated pixels is formed. The set of even interpolated sub-pixels and the set of odd interpolated sub-pixels are accumulated, repeatedly, until a termination condition is let. In the accumulating, the tap coefficient register is updated with another tap coefficient set, the pixel window is shifted, and the even interpolated pixels are incremented, the pixel window is then shifted again and the odd interpolated pixels are incremented.
Abstract:
Systems and methods for generating pulse clocks with programmable edges and pulse widths configured for varying requirements of different memory access operations. A pulse clock generation circuit (100) includes a selective delay logic (102) to provide a programmable rising edge delay of the pulse clock (114), a selective pulse width widening logic (110) to provide a programmable pulse width of the pulse clock, and a built-in level shifter for shifting a voltage level of the pulse clock. A rising edge delay for a read operation is programmed to correspond to an expected read array access delay, and the pulse width for a write operation is programmed to be wider than the pulse width for a read operation.
Abstract:
Provided are a floating-point adder and methods for implementing a floating-point adder with operand shifting based on a predicted exponent difference when performing an effective subtraction on normal or subnormal numbers. In an aspect, two least significant bits (LSBs) of a first floating-point operand's exponent are compared with two LSBs of a second floating-point operand's exponent to estimate a difference between the two exponents. A first shift of up to one of the first and the second operands is performed, based on the estimated difference. A prospective result is then produced by subtracting the first operand and the second operand. Contemporaneously, one of the first operand's exponent and the second operand's exponent is subtracted from the other of the first operand's exponent and the second operand's exponent to determine if the exponents actually differ by one or less. If the first operand's exponent and the second operand's exponent differ by one or less, the prospective result is provided as the raw difference of the operands.
Abstract:
Systems and methods for prefetching cache lines into a cache coupled to a processor. A hardware prefetcher is configured to recognize a memory access instruction as an autoincrement-address (AIA) memory access instruction, infer a stride value from an increment field of the AIA instruction, and prefetch lines into the cache based on the stride value. Additionally or alternatively, the hardware prefetcher is configured to recognize that prefetched cache lines are part of a hardware loop, determine a maximum loop count of the hardware loop, and a remaining loop count as a difference between the maximum loop count and a number of loop iterations that have been completed, select a number of cache lines to prefetch, and truncate an actual number of cache lines to prefetch to be less than or equal to the remaining loop count, when the remaining loop count is less than the selected number of cache lines.
Abstract:
A processor reset control circuit is configured to automatically capture a prereset value of processor information stored in one or more hardware registers, as part of a reset operation state machine and prior to changing the processor information to its architecturally required post reset value. Such pre-reset processor information includes, for example one or more pre-reset values of the processor program counter (PC) and one or more pre-reset values of an operating-state mode register, both of which may be captured in one or more pre-reset capture storage devices which are then made available for evaluation purposes. Such pre-reset capture storage devices store pre-reset information in response to the reset and maintain the stored pre-reset information until another reset occurs.