Data dependency collapsing hardware apparatus
    1.
    Reissue patent
    Status: expired

    Publication No.: USRE35311E

    Publication date: 1996-08-06

    Application No.: US292606

    Filing date: 1994-08-18

    Abstract: A multi-function ALU (arithmetic/logic unit) for use in digital data processing facilitates the execution of instructions in parallel, thereby enhancing processor performance. The proposed apparatus reduces the instruction execution latency that results from data dependency hazards in a pipelined machine. This latency reduction is accomplished by collapsing the interlocks due to these hazards. The proposed apparatus achieves performance improvement while maintaining compatibility with previous implementations designed using an identical architecture.
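
As a rough behavioral illustration of what "collapsing an interlock" means (not the patented circuit), the sketch below models a dependent instruction pair. The names `Instr`, `is_dependent`, `execute_sequential` and `execute_collapsed`, the 32-bit register width, and the small operation set are all assumptions made for the example. The collapsed version computes both results directly from the original register values, which is what a multi-function ALU would have to do to finish the pair in one pass.

```python
from dataclasses import dataclass
import operator

MASK = 0xFFFFFFFF  # assumption: 32-bit registers

# Hypothetical operation set for the illustration.
OPS = {
    "add": operator.add,
    "sub": operator.sub,
    "and": operator.and_,
    "or": operator.or_,
    "xor": operator.xor,
}


@dataclass(frozen=True)
class Instr:
    op: str
    dst: str
    src1: str
    src2: str


def is_dependent(first: Instr, second: Instr) -> bool:
    """True if `second` reads the register that `first` writes (a data dependency)."""
    return first.dst in (second.src1, second.src2)


def step(regs: dict, ins: Instr) -> dict:
    """Execute one instruction and return the updated register file."""
    out = dict(regs)
    out[ins.dst] = OPS[ins.op](regs[ins.src1], regs[ins.src2]) & MASK
    return out


def execute_sequential(regs: dict, first: Instr, second: Instr) -> dict:
    """Reference behavior: the second instruction waits for the first."""
    return step(step(regs, first), second)


def execute_collapsed(regs: dict, first: Instr, second: Instr) -> dict:
    """Compute both results from the ORIGINAL register values in one pass."""
    r1 = OPS[first.op](regs[first.src1], regs[first.src2]) & MASK

    def operand(name: str) -> int:
        # A dependent operand is replaced by the first instruction's expression,
        # so the second result is a function of up to three original operands.
        return r1 if name == first.dst else regs[name]

    r2 = OPS[second.op](operand(second.src1), operand(second.src2)) & MASK
    out = dict(regs)
    out[first.dst] = r1
    out[second.dst] = r2
    return out
```

For any pair of instructions, `execute_collapsed` is expected to return the same register file as `execute_sequential`; the difference in hardware would be that no interlock cycle is spent waiting for the first result.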

    Status predictor for combined shifter-rotate/merge unit
    4.
    Granted patent
    Status: expired

    Publication No.: US5590348A

    Publication date: 1996-12-31

    Application No.: US920962

    Filing date: 1992-07-28

    Abstract: Generation of functional status followed by the use of the status to control the sequencing of microinstructions is a well-known critical path in processor designs. The delay associated with the path is exacerbated in superscalar machines by the additional statuses that are produced by multiple functional units from which the appropriate status must be selected for controlling the sequencing of microinstructions. This is especially true in horizontally microcoded machines. The adverse effects on the delay can be reduced by using a staged multiplexor design. For the staged multiplexor to be useful, all functional unit status should be produced as early as possible. In this invention, a status predictor is described that allows the status associated with the shifter to be generated directly from the inputs to the shifter. As a result, the status is available early in the pipeline cycle in which the shift is actually performed and made available to the multiplexor producing the controls for microinstruction sequencing. In addition, the invention allows the early generation of all shifter status used to set condition codes. The predictor has been implemented in an ESA/390 processor implementation where it was instrumental in achieving the desired cycle time.
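
To make the idea of generating shifter status "directly from the inputs" concrete, here is a minimal sketch for one status only, "result is zero", and for plain logical shifts. It is an illustration under assumptions, not the predictor described in the patent: the 32-bit width and the function names `surviving_bits_mask`, `predict_zero` and `shift` are hypothetical, and rotate/merge operations and the other condition-code statuses are not modeled.

```python
WIDTH = 32                 # assumption: 32-bit shifter
MASK = (1 << WIDTH) - 1


def surviving_bits_mask(amount: int, left: bool) -> int:
    """Mask of INPUT bit positions that are still present after the shift.

    In hardware this mask would come from decoding the shift amount,
    in parallel with the shifter rather than after it.
    """
    if amount >= WIDTH:
        return 0
    if left:
        return MASK >> amount          # the top `amount` bits are shifted out
    return (MASK << amount) & MASK     # the bottom `amount` bits are shifted out


def predict_zero(x: int, amount: int, left: bool) -> bool:
    """Predict the 'result is zero' status without looking at the shifted result."""
    return (x & surviving_bits_mask(amount, left)) == 0


def shift(x: int, amount: int, left: bool) -> int:
    """Reference logical shift, used only to check the prediction."""
    if amount >= WIDTH:
        return 0
    return ((x << amount) & MASK) if left else (x >> amount)
```

The prediction depends only on the operand and the shift amount, so it can be evaluated alongside the shift itself instead of after it.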

    Early scalable instruction set machine alu status prediction apparatus
    5.
    Granted patent
    Status: expired

    Publication No.: US5359718A

    Publication date: 1994-10-25

    Application No.: US677692

    Filing date: 1991-03-29

    IPC classification: G06F7/575 G06F7/38 G06F7/50

    CPC classification: G06F7/575 G06F7/4991

    Abstract: An apparatus implementing an algorithm for generating carries due to the second instruction of an interlocked instruction pair when executing all combinations of logical as well as arithmetic instruction pairs is developed. The algorithm is then applied to three interlock collapsing ALU means implementations that have been proposed. The critical path for calculating the carries is first presented. Next the expression for generating these carries is used to derive a fast implementation for generating overflow which is implemented in the apparatus. The resulting ALU status determination apparatus includes a three-to-one ALU means for executing plural instructions which can predict the status of three-to-one ALU operations related to the presence/absence of carries incorporated in the three-to-one ALU designed to execute a second instruction of a pair of instructions in parallel and whether or not the second instruction of the pair is independent or dependent on the result of the operation of the first instruction. Additionally, an implementation scheme for predicting a result equal to zero is developed for the three-to-one ALU operations.
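
One way to picture "predicting a result equal to zero" for a three-operand addition is sketched below. It uses a textbook carry-free zero test on the two outputs of a carry-save adder; this is a swapped-in standard technique, not the scheme derived in the patent, and it covers only the add-add case. The function names and the `width` parameter are assumptions for the example.

```python
def carry_save(a: int, b: int, c: int, width: int) -> tuple[int, int]:
    """3-to-2 carry-save adder: a + b + c == s + cy (mod 2**width)."""
    mask = (1 << width) - 1
    s = (a ^ b ^ c) & mask
    cy = (((a & b) | (a & c) | (b & c)) << 1) & mask
    return s, cy


def two_input_sum_is_zero(x: int, y: int, width: int) -> bool:
    """True iff (x + y) mod 2**width == 0, without propagating carries.

    If every sum bit is zero, the carry into bit i must equal x[i] ^ y[i],
    and the carry out of bit i is then x[i] | y[i]. Checking that relation
    for all bits needs only per-bit logic plus a wide comparison.
    """
    mask = (1 << width) - 1
    return ((x ^ y) & mask) == (((x | y) << 1) & mask)


def three_input_sum_is_zero(a: int, b: int, c: int, width: int = 32) -> bool:
    """Zero status of a + b + c (mod 2**width), predicted from the CSA outputs."""
    s, cy = carry_save(a, b, c, width)
    return two_input_sum_is_zero(s, cy, width)
```

Because neither step waits on a carry chain, the zero status can be available at about the same time as, or earlier than, the sum itself.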

    High performance array multiplier using four-to-two composite counters
    6.
    Granted patent
    Status: expired

    Publication No.: US5303176A

    Publication date: 1994-04-12

    Application No.: US916937

    Filing date: 1992-07-20

    IPC classification: G06F7/50 G06F7/52 G06F7/60

    Abstract: An apparatus for the reduction of partial products of a multiplier combines attributes of pre-addition and the regularity found in array multipliers by employing improved four-to-two composite counter cells. This composite counter cell, the basic block for reducing the partial products, is itself comprised of two new four-to-two counters. One of the four-to-two counters is used to perform pre-addition of the partial products while the second counter is used to perform addition between the sum produced by the counter performing the pre-addition and the outputs from the second counter of a cell in a previous stage of the addition. The regularity of array multiplication schemes is preserved and interconnections required by the mechanism span no more than two columns of the matrix.
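
For readers unfamiliar with four-to-two counters, the sketch below models the conventional textbook cell built from two full adders, plus a word-level version that reduces four partial products to two. It is not the "new" counter or the composite cell claimed in the patent; the names `full_adder`, `four_to_two` and `reduce_four_to_two` are assumptions for the example.

```python
def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """One-bit full adder: returns (sum, carry) with a + b + c == sum + 2*carry."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def four_to_two(x1: int, x2: int, x3: int, x4: int, cin: int) -> tuple[int, int, int]:
    """Conventional 4:2 cell made of two chained full adders.

    Returns (sum, carry, cout) such that
        x1 + x2 + x3 + x4 + cin == sum + 2 * (carry + cout).
    cout depends only on x1..x3, never on cin, so there is no ripple
    along a row of such cells.
    """
    s1, cout = full_adder(x1, x2, x3)
    s, carry = full_adder(s1, x4, cin)
    return s, carry, cout


def reduce_four_to_two(a: int, b: int, c: int, d: int) -> tuple[int, int]:
    """Word-level equivalent: four non-negative addends in, two out, same total."""
    s1 = a ^ b ^ c
    c1 = ((a & b) | (a & c) | (b & c)) << 1
    s = s1 ^ d ^ c1
    c2 = ((s1 & d) | (s1 & c1) | (d & c1)) << 1
    return s, c2
```

A single carry-propagate addition of the two returned words then yields the sum of all four inputs, which is why such cells are used to shrink a column of partial products before the final adder.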

    High performance interlock collapsing SCISM ALU apparatus
    7.
    Granted patent
    Status: expired

    Publication No.: US5299319A

    Publication date: 1994-03-29

    Application No.: US677079

    Filing date: 1991-03-29

    Abstract: Three high performance implementations for an interlock collapsing ALU are presented as alternative embodiments. The critical path delay of each embodiment provides reduction in delay. For one of the implementations the delay is shown to be the same number of stages as that required by a three-to-one adder assuming a commonly available bookset. The delay for the other two implementations is comparable to the three-to-one adder. In addition, trade-offs for the design complexity of implementation alternatives are set out. The embodiments achieve minimum delays without a prohibitive increase in hardware.

    3-1 Arithmetic logic unit for simultaneous execution of an independent or dependent add/logic instruction pair
    9.
    Granted patent
    Status: expired

    Publication No.: US5426743A

    Publication date: 1995-06-20

    Application No.: US186224

    Filing date: 1994-01-24

    Abstract: A high speed three-to-one data dependency collapsing ALU can be used to support multiple issue of instructions. The computing apparatus supports multiple issue of instructions and is useful in CISC, superscalar, superscalar RISC, and similar computer designs. The concept of the ALU is presented along with a detailed description of a design. The apparatus allows the execution of any combination of two independent or dependent arithmetic or logical instructions in a single machine cycle. The 3-1 collapsing ALU structure has a 3-2 carry save adder (CSA); and a 2-1 control arithmetic logic unit (CALU) coupled for an input from the carry save adder; and a first pre-adder logic block coupled with an output to the control arithmetic logic unit; and a control generator; and a second controlled logic block coupled to receive an input from said control generator and having its output coupled to said control arithmetic logic unit. Instructions have an add/logical combinatorial operation which combines all four of the combinations: add-add, add-logical, logical-add, and logical-logical functions; and wherein two or more disassociated ALU operations are specified by a single interlock collapsing ALU which responds to the parallel issuance of a plurality of separate instructions, including RISC type instructions, each of which specifies ALU operations, and the computing apparatus executes the instructions in parallel in a single machine cycle.
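
The arithmetic core of the structure described above, a 3-2 carry-save adder feeding a 2-1 adder, can be sketched as follows for the add-add and add-subtract cases. This is a behavioral model under assumptions: the 32-bit width and the function names are hypothetical, and the pre-adder logic block, control generator, and logical operations named in the abstract are not modeled.

```python
MASK = 0xFFFFFFFF  # assumption: 32-bit datapath


def carry_save_add(a: int, b: int, c: int) -> tuple[int, int]:
    """3-2 carry-save adder: a + b + c == s + cy (mod 2**32), no carry chain."""
    s = (a ^ b ^ c) & MASK
    cy = (((a & b) | (a & c) | (b & c)) << 1) & MASK
    return s, cy


def three_to_one_add(a: int, b: int, c: int) -> int:
    """a + b + c (mod 2**32): one CSA level followed by one 2-1 addition."""
    s, cy = carry_save_add(a, b, c)
    return (s + cy) & MASK


def three_to_one_add_sub(a: int, b: int, c: int) -> int:
    """a + b - c (mod 2**32), using the two's-complement identity -c == ~c + 1.

    The extra +1 plays the role of a carry-in to the 2-1 adder.
    """
    s, cy = carry_save_add(a, b, ~c & MASK)
    return (s + cy + 1) & MASK
```

With this model, the dependent pair `r3 = r1 + r2` followed by `r5 = r3 + r4` needs no wait for `r3`: the second result is `three_to_one_add(r1, r2, r4)`, computed from the original operands while a conventional adder produces `r3`.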

    Apparatus for predicting overlapped storage operands for move character
    10.
    Granted patent
    Status: expired

    Publication No.: US5488707A

    Publication date: 1996-01-30

    Application No.: US920941

    Filing date: 1992-07-28

    CPC classification: G06F9/30032

    Abstract: An apparatus is presented and proved for detecting storage operand overlap for instructions having identical overlap detection requirements as the move character (MVC) instruction. The apparatus is applicable to all Enterprise Systems Architecture (ESA)/390 addressing modes encompassing access register addressing for either 24 bit or 31 bit addressing. S/370 addressing in 24 bit and 31 bit modes is also supported by the proposed apparatus and treated as a special case of access register addressing. In addition, the apparatus is extended to support other addressing modes with an example provided to include a 64 bit addressing mode. A fast parallel implementation of the apparatus is also presented. The apparatus results in a one-cycle savings for all invocations of the MVC instruction, which comprises approximately 2% of the dynamic instruction stream of a representative instruction mix. The one-cycle savings results in a 21 percent improvement in the performance of the execution of the MVC instruction for the frequent case (84%) when the operand length is less than or equal to eight bytes and a 9 percent improvement in performance for the less frequent case (16%) in which the operand length is greater than eight bytes.
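
The overlap condition itself can be stated compactly. The sketch below assumes that the overlap of interest is MVC's destructive overlap, where a destination byte is written and then later read as a source byte during the left-to-right move, and that addresses wrap at the addressing-mode width. It is a behavioral model only, not the fast parallel implementation described in the patent; the function names are hypothetical, and access-register (address-space) comparison is not modeled.

```python
def destructive_overlap(dest: int, src: int, length_code: int, addr_bits: int = 31) -> bool:
    """True if a byte-by-byte, left-to-right move would read a byte it already wrote.

    length_code is the operand length minus one, so length_code + 1 bytes
    are moved. Addresses wrap modulo 2**addr_bits (24 or 31 in the abstract).
    Assumes length_code < 2**addr_bits.
    """
    mask = (1 << addr_bits) - 1
    diff = (dest - src) & mask          # wrapped distance from source to destination
    return 1 <= diff <= length_code


def destructive_overlap_bruteforce(dest: int, src: int, length_code: int, addr_bits: int) -> bool:
    """Reference check: does destination byte i coincide with a later source byte j?"""
    mask = (1 << addr_bits) - 1
    return any(
        ((dest + i) & mask) == ((src + j) & mask)
        for i in range(length_code + 1)
        for j in range(i + 1, length_code + 1)
    )
```

The closed form needs one wrapped subtraction and a range comparison on operand addresses and length, all of which are available before any storage is accessed.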
