摘要:
A method and apparatus for determining a byte select vector for a crossbar shifter include processing that begins by storing data in a first set of byte locations and in a second set of byte locations. Typically, a data operand is written into the first and a shift value is written into the second set of byte locations. The processing continues by obtaining a shift amount value for the data. The processing then continues by determining, for each byte multiplexor of a set of byte multiplexors associated with a corresponding output byte, whether a wrapped condition will occur based on the shift amount for the data. When the wrap condition occurs, a wrap shift amount is determined based on a mode of shifting. The processing then continues by generating a byte select vector for the set of byte multiplexors based on the wrap shift amount and the shift amount. The byte select vector includes a first nibble that is associated with a first one of the byte multiplexors and a second nipple that is associated with a second one of the byte multiplexors.
摘要:
A method of operation within an integrated-circuit processing device having an enhanced combined-arithmetic capability. In response to an instruction indicating a combined arithmetic operation, the processor generates a dot-product of first and second operands, adds the dot-product to an accumulated value, and then outputs the sum of the accumulated value and the dot-product.
摘要:
Method of developing a model of a circuit design including the steps of generating four different path-tracing runs, creating four arcs from the four different path-tracing runs, and combining the four arcs into two separate models. Also, a method of adjusting timing of a clock signal provided to a first block and a second block where data signals travel via a first path from the first block to the second block and data signals travel via a second path from the second block to the first block and the time for the data signals to travel the first path is greater than the time for the data signals to travel the second path. The clock signal provided to the second block relative to the clock signal provided to the first block is delayed by an amount that is a function of the difference between the time for the data signals to travel the first path and the time for the data signals to travel the second path.
摘要:
A method and apparatus for transposing bits include processing that begins by receiving a multiple bit input. The multiple bit input may be received from memory for executing a read operation from a processing device or for a write operation to memory. The processing continues by determining whether a transposed bit function is enabled. When the transposed bit function is enabled, a set of tri-state transposed drivers are enabled to couple out bit lines to the multiple bit input in a transposed fashion. In addition, a set of tri-state non-transposed drivers are disabled such that they are not coupled to the output bit lines. When the transposed bit function is not enabled, the non-transposed drivers are enabled and the tri-state transposed drivers are disabled such that the multiple bit input, when coupled to the output bit lines, is not transposed.
摘要:
Disclosed are methods and systems for dynamically determining data-transfer paths. The data-transfer pats are determined in response to an instruction that facilitates data transfer among execution lanes in an integrated-circuit processing device operable to execute operations in parallel.
摘要:
A method of operation within an integrated-circuit processing device having a plurality of execution lanes. Upon receiving an instruction to exchange data between the execution lanes, respective requests from the execution lanes are examined to determine a set of the execution lanes that may send data to one or more others of the execution lanes during a first interval. Each execution lane within the set of the execution lanes is signaled to indicate that the execution lane may send data to the one or others of the execution lanes.
摘要:
A method and apparatus for arithmetic shifting includes processing that begins by receiving a decoded instruction in a cycle of a pipeline process. Also during this cycle of the pipeline process, shift information and a data operand are determined based on the decoded instruction. In a subsequent cycle of the pipeline process, a data output is generated from the data operand based on the shift information using a crossbar shifting function.