Gather cache architecture
    121.
    Granted patent
    Status: in force

    Publication No.: US08688962B2

    Publication date: 2014-04-01

    Application No.: US13078380

    Filing date: 2011-04-01

    IPC classification: G06F9/30

    CPC classification: G06F12/0815 G06F12/0804

    Abstract: Apparatuses and methods to perform gather instructions are presented. In one embodiment, an apparatus comprises a gather logic module which includes a gather logic unit to identify locality of data elements in response to a gather instruction. The apparatus includes memory comprising a plurality of memory rows, including a memory row associated with the gather instruction. The apparatus further includes a memory structure to store data element addresses accessed in response to the gather instruction.
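The locality idea in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware: the 64-byte row size, the 4-byte element scale, the dictionary used as memory, and the names `group_by_row` and `gather` are all hypothetical.

```python
from collections import defaultdict

ROW_BYTES = 64  # assumed memory-row size; the abstract gives no figure


def group_by_row(base, indices, scale=4, row_bytes=ROW_BYTES):
    """Group gather element addresses by the memory row they fall in."""
    rows = defaultdict(list)
    for lane, idx in enumerate(indices):
        addr = base + idx * scale
        rows[addr // row_bytes].append((lane, addr))
    return dict(rows)


def gather(memory, base, indices, scale=4, row_bytes=ROW_BYTES):
    """Emulate a gather and count one row access per distinct row touched."""
    groups = group_by_row(base, indices, scale, row_bytes)
    result = [None] * len(indices)
    for elems in groups.values():
        # All elements in this group share one row, so one access serves them.
        for lane, addr in elems:
            result[lane] = memory[addr]
    return result, len(groups)


# Toy memory: every 4-byte-aligned address holds ten times its address.
memory = {addr: addr * 10 for addr in range(0, 256, 4)}
values, row_accesses = gather(memory, 0, [0, 1, 16, 17, 2])
```

Here five elements fall into two rows, so the model counts two row accesses instead of five element accesses.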


    MULTI-ELEMENT INSTRUCTION WITH DIFFERENT READ AND WRITE MASKS
    122.
    Patent application
    Status: in force

    Publication No.: US20130339678A1

    Publication date: 2013-12-19

    Application No.: US13997998

    Filing date: 2011-12-23

    IPC classification: G06F9/30

    Abstract: A method is described that includes reading a first read mask from a first register. The method also includes reading a first vector operand from a second register or memory location. The method also includes applying the read mask against the first vector operand to produce a set of elements for operation. The method also includes performing an operation on the set of elements. The method also includes creating an output vector by producing multiple instances of the operation's result. The method also includes reading a first write mask from a third register, the first write mask being different from the first read mask. The method also includes applying the write mask against the output vector to create a resultant vector. The method also includes writing the resultant vector to a destination register.
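The sequence of steps in this abstract can be sketched in a few lines. This is a minimal illustration, assuming the operation is a sum reduction and that lanes cleared in the write mask keep the destination's previous value (merging behaviour); the abstract specifies neither, and the name `masked_reduce_broadcast` is hypothetical.

```python
from functools import reduce
import operator


def masked_reduce_broadcast(src, read_mask, write_mask, dest, op=operator.add):
    """Apply a read mask, reduce, broadcast the result, then apply a write mask."""
    # Step 1: the read mask selects which source elements take part.
    selected = [x for x, m in zip(src, read_mask) if m]
    if not selected:
        raise ValueError("read mask selects no elements")
    # Step 2: perform the operation on the selected set (assumed: a reduction).
    result = reduce(op, selected)
    # Step 3: build the output vector from multiple instances of the result.
    output = [result] * len(dest)
    # Step 4: the write mask decides which lanes of the destination are updated
    # (assumed: unselected lanes keep their old value).
    return [o if m else d for o, m, d in zip(output, write_mask, dest)]


resultant = masked_reduce_broadcast(
    src=[1, 2, 3, 4],
    read_mask=[1, 0, 1, 0],
    write_mask=[0, 1, 1, 0],
    dest=[9, 9, 9, 9],
)
```

With these inputs the read mask selects 1 and 3, their sum 4 is broadcast, and the write mask stores it only in the two middle lanes.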


    Trace reuse
    125.
    Patent application
    Status: under examination (published)

    Publication No.: US20060036834A1

    Publication date: 2006-02-16

    Application No.: US10917582

    Filing date: 2004-08-13

    IPC classification: G06F9/30

    CPC classification: G06F9/3808 G06F9/325

    Abstract: A trace management architecture to enable the reuse of uops within one or more repeated traces. More particularly, embodiments of the invention relate to a technique to prevent multiple accesses to various functional units within a trace management architecture by reusing traces or sequences of traces that are repeated during a period of operation of the microprocessor. This avoids performance gaps due to multiple trace cache accesses and increases the rate at which uops can be executed within a processor.


    Distribution of architectural state information in a processor across multiple pipeline stages
    126.
    Patent application
    Status: under examination (published)

    Publication No.: US20050033942A1

    Publication date: 2005-02-10

    Application No.: US10637417

    Filing date: 2003-08-08

    IPC classification: G06F9/30 G06F9/38

    Abstract: Methods and apparatuses for distributing architectural state information in a processor across multiple pipeline stages are described. An architectural value of a register is represented by a historical value added to an update value which is maintained in a non-final pipeline stage. When an instruction requires the architectural value, a calculation is made and that value is inserted into the pipeline for processing. Recovery of both pre- and post-execution architectural state information is made possible by storing both the update value and the operation to take place on that value for each decoded instruction.
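The "historical value plus update value" representation can be modelled as follows. This is a rough sketch, assuming each decoded instruction's operation is a simple additive delta (as with stack-pointer adjustments); the class `DistributedRegister` and its method names are hypothetical and do not come from the application.

```python
class DistributedRegister:
    """Toy model: architectural value = historical value + pending update."""

    def __init__(self, historical):
        self.historical = historical  # value held at the final pipeline stage
        self.update = 0               # pending update kept in an earlier stage
        self.log = []                 # per instruction: (update before, delta)

    def decode(self, delta):
        """Record the update value and the operation for a decoded instruction."""
        self.log.append((self.update, delta))
        self.update += delta

    def architectural_value(self):
        """Compute the value on demand, as when an instruction requires it."""
        return self.historical + self.update

    def recover(self, n, post=False):
        """Restore the state before (or, with post=True, after) instruction n."""
        update_before, delta = self.log[n]
        self.update = update_before + (delta if post else 0)
        del self.log[n + (1 if post else 0):]


sp = DistributedRegister(1000)
for delta in (-8, -8, 16):   # e.g. push, push, stack-pointer adjustment
    sp.decode(delta)
value_after_all = sp.architectural_value()
sp.recover(1)                # roll back to just before the second instruction
value_after_recovery = sp.architectural_value()
```

Because the log keeps both the update value and the delta for each instruction, either the pre-execution or the post-execution state of any logged instruction can be rebuilt.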


Method and apparatus for merging binary translated basic blocks of instructions
    127.
    Granted patent
    Status: expired

    Publication No.: US6105124A

    Publication date: 2000-08-15

    Application No.: US672100

    Filing date: 1996-06-27

    Abstract: A method for merging binary translated basic blocks of instructions. The method is for use in a computer system having in a memory a first set of instructions including blocks of instructions, and a translator for translating instructions executable on a source instruction set architecture into instructions executable on a target instruction set architecture. The method includes a first step of determining, by the translator, an order of execution from a first block of instructions to a second block of instructions. A second step of the method includes generating, by the translator, a hyperblock of instructions representing the first and second blocks of instructions translated and placed adjacent in a memory location in the order of execution.
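The two steps in this abstract can be sketched with a toy translator. This is an illustration under assumptions: the successor map, the one-to-one instruction table, and the names `translate` and `build_hyperblock` are hypothetical, and a real translator would determine execution order from control-flow analysis or profiling.

```python
def translate(block, table):
    """Translate source-ISA instructions to target-ISA ones via a lookup table."""
    return [table[insn] for insn in block]


def build_hyperblock(blocks, successor, entry, table):
    """Follow the execution order from `entry` and lay translated blocks out
    adjacently, in that order, as one hyperblock."""
    order = []
    label = entry
    # Step 1: determine the order of execution from block to block.
    while label is not None and label not in order:
        order.append(label)
        label = successor.get(label)
    # Step 2: emit the translated blocks adjacently in that order.
    hyperblock = []
    for label in order:
        hyperblock.extend(translate(blocks[label], table))
    return order, hyperblock


# Hypothetical source blocks and a one-to-one instruction mapping.
blocks = {"B1": ["ld", "add"], "B2": ["st"], "B3": ["nop"]}
successor = {"B1": "B2"}          # B1 is observed to fall through to B2
table = {"ld": "LOAD", "add": "ADD", "st": "STORE", "nop": "NOP"}
order, hyperblock = build_hyperblock(blocks, successor, "B1", table)
```

Placing B2's translation directly after B1's removes the need for a jump between them in the translated code.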


    Performance throttling to reduce IC power consumption
    128.
    Granted patent
    Status: expired

    Publication No.: US5719800A

    Publication date: 1998-02-17

    Application No.: US497853

    Filing date: 1995-06-30

    IPC classification: G06F1/32

    Abstract: The power consumed within an integrated circuit (IC) is reduced without substantial impact on its performance for typical applications by throttling the performance of particular functional units within the IC. Artificial worst-case power consumption is reduced by throttling down the activity levels of long-duration sequences of high-power operations. The recent utilization levels of particular functional units within an IC are monitored, for example by computing each functional unit's average duty cycle over its recent operating history. If this activity level is greater than a threshold, then the functional unit is operated in a reduced-power mode. The threshold value is set high enough to allow short bursts of high utilization to occur without impacting performance. The invention allows an integrated circuit to dynamically make the tradeoff between high-speed operation and low-power operation, by throttling back performance of localized functional units when their utilization exceeds a sustainable level. This dynamic power/speed tradeoff can also be optimized across multiple functional units within an IC or among multiple ICs within a system, and it can be altered by providing software control over throttling parameters.
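The monitoring rule in this abstract (average duty cycle over recent history compared against a threshold) can be sketched as below. This is a simplified model with assumed values: the window length, the threshold, the idle initial history, and the class name `ThrottledUnit` are all hypothetical.

```python
from collections import deque


class ThrottledUnit:
    """Toy model of one functional unit with duty-cycle-based throttling."""

    def __init__(self, window=4, threshold=0.5):
        # Start with an all-idle history so a short burst is not throttled.
        self.history = deque([0] * window, maxlen=window)
        self.threshold = threshold  # a software-adjustable throttling parameter

    def duty_cycle(self):
        """Average utilization over the recent operating history."""
        return sum(self.history) / len(self.history)

    def tick(self, busy):
        """Record one cycle of activity and return the mode for the next cycle."""
        self.history.append(1 if busy else 0)
        return "reduced" if self.duty_cycle() > self.threshold else "full"


unit = ThrottledUnit(window=4, threshold=0.5)
activity = [True, True, True, True, False, False]
modes = [unit.tick(busy) for busy in activity]
```

In this run the first two busy cycles pass at full speed, the sustained burst then triggers reduced-power mode, and the unit returns to full speed once the average falls back to the threshold.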
