摘要:
Determining an effective address of a memory with a three-operand add operation in single execution cycle of a multithreaded processor that can access both segmented memory and non-segmented memory. During that cycle, the processor determines whether a memory segment base is zero. If the segment base is zero, the processor can access a memory location at the effective address without adding the segment base. If the segment base is not zero, such as when executing legacy code, the processor consumes another cycle to add the segment base to the effective address. Similarly, the processor consumes another cycle if the effective address or the linear address is misaligned. An integer execution unit that performs the three-operand add using a carry-save adder coupled to a carry look-ahead adder. If the segment base is not zero, the effective address is fed back through the integer execution unit to add the segment base.
摘要:
An apparatus and method to support pipelining of variable-latency instructions in a multithreaded processor. In one embodiment, a processor may include instruction fetch logic configured to issue a first and a second instruction from different ones of a plurality of threads during successive cycles. The processor may also include first and second execution units respectively configured to execute shorter-latency and longer-latency instructions and to respectively write shorter-latency or longer-latency instruction results to a result write port during a first or second writeback stage. The first writeback stage may occur a fewer number of cycles after instruction issue than the second writeback stage. The instruction fetch logic may be further configured to guarantee result write port access by the second execution unit during the second writeback stage by preventing the shorter-latency instruction from issuing during a cycle for which the first writeback stage collides with the second writeback stage.
摘要:
A processor including instruction support for implementing hash algorithms may issue, for execution, programmer-selectable hash instructions from a defined instruction set architecture (ISA). The processor may include a cryptographic unit that may receive instructions for execution. The instructions include hash instructions defined within the ISA. In addition, the hash instructions may be executable by the cryptographic unit to implement a hash that is compliant with one or more respective hash algorithm specifications. In response to receiving a particular hash instruction defined within the ISA, the cryptographic unit may retrieve a set of input data blocks from a predetermined set of architectural registers of the processor, and generate a hash value of the set of input data blocks according to a hash algorithm that corresponds to the particular hash instruction.
摘要:
A processor including instruction support for implementing large-operand multiplication may issue, for execution, programmer-selectable instructions from a defined instruction set architecture (ISA). The processor may include an instruction execution unit comprising a hardware multiplier datapath circuit, where the hardware multiplier datapath circuit is configured to multiply operands having a maximum number of bits M. In response to receiving a single instance of a large-operand multiplication instruction defined within the ISA, wherein at least one of the operands of the large-operand multiplication instruction includes more than the maximum number of bits M, the instruction execution unit is configured to multiply operands of the large-operand multiplication instruction within the hardware multiplier datapath circuit to determine a result of the large-operand multiplication instruction without execution of programmer-selected instructions within the ISA other than the large-operand multiplication instruction.
摘要:
A processor including instruction support for implementing hash algorithms may issue, for execution, programmer-selectable hash instructions from a defined instruction set architecture (ISA). The processor may include a cryptographic unit that may receive instructions for execution. The instructions include hash instructions defined within the ISA. In addition, the hash instructions may be executable by the cryptographic unit to implement a hash that is compliant with one or more respective hash algorithm specifications. In response to receiving a particular hash instruction defined within the ISA, the cryptographic unit may retrieve a set of input data blocks from a predetermined set of architectural registers of the processor, and generate a hash value of the set of input data blocks according to a hash algorithm that corresponds to the particular hash instruction.
摘要:
In one embodiment, a processor is configured to execute a window swap instruction. The processor comprises a register file (that comprises a plurality of registers) and first and second execution units coupled to the register file. A first pipeline associated with the first execution unit has a first number of pipeline stages, and a second pipeline associated with the second execution unit has a second number of pipeline stages. The first execution unit is configured to change the current register window from the first register window to the second register window in the register file in response to the instruction. The second execution unit is configured to perform an operation defined by the instruction and write the result to the register file. The second number of pipeline stages exceeds the first number, whereby the second register window is established in the register file prior to writing the result.
摘要:
In one embodiment, a multithreaded processor includes a multithreaded instruction source that may provide a plurality of instructions each corresponding to a respective one of a plurality of threads. The multithreaded processor also includes a pick unit coupled to the multithreaded instruction source. The pick unit may select in a given cycle, a first divide instruction corresponding to one thread of the plurality of threads and a second divide instruction corresponding to another thread of the plurality of threads based upon a thread selection algorithm. Further, the multithreaded processor includes a storage coupled to a functional unit including a divider configured to execute the first divide instruction and the second divide instruction. The storage may store one of the first and the second divide instructions during execution of the other of the first and the second divide instructions.
摘要:
A processor including instruction support for implementing large-operand multiplication may issue, for execution, programmer-selectable instructions from a defined instruction set architecture (ISA). The processor may include an instruction execution unit comprising a hardware multiplier datapath circuit, where the hardware multiplier datapath circuit is configured to multiply operands having a maximum number of bits M. In response to receiving a single instance of a large-operand multiplication instruction defined within the ISA, wherein at least one of the operands of the large-operand multiplication instruction includes more than the maximum number of bits M, the instruction execution unit is configured to multiply operands of the large-operand multiplication instruction within the hardware multiplier datapath circuit to determine a result of the large-operand multiplication instruction without execution of programmer-selected instructions within the ISA other than the large-operand multiplication instruction.
摘要:
A method and apparatus for controlling power consumption in a multi-threaded processor. In one embodiment, the processor includes at least one logic unit for processing instructions. The logic unit includes a plurality of positions, wherein each of the plurality of positions corresponds to at least one instruction thread. Clock signals may be provided to the logic unit via a clock gating unit. The clock gating unit is configured to inhibit a clock signal from being provided to a corresponding one of the thread positions when no instruction thread is active for that position. The inhibiting of the clock signal for an inactive thread position may reduce power consumption by the processor.
摘要:
A processor includes an instruction fetch unit configured to issue instructions for execution, where the instructions are selected from a number of threads, where each given instruction has a corresponding thread identifier, and where at least some of the instructions specify operand(s) via register identifiers. A register file stores operands usable by the instructions, and may include several banks, each corresponding to a register identifiers and including several entries corresponding to the several threads, wherein the entries are configured to store data values. In response to receiving a request to read a particular register identifier for a given thread identifier, the register file may be configured to decode the given thread identifier to retrieve entries from the banks that correspond to the given thread identifier. The register file may further select, from among the retrieved entries, a data value corresponding to the particular register identifier to be output.