Performing predecode-time optimized instructions in conjunction with predecode time optimized instruction sequence caching
    3.
    Granted invention patent
    Performing predecode-time optimized instructions in conjunction with predecode time optimized instruction sequence caching (in force)

    Publication No.: US09354888B2

    Publication date: 2016-05-31

    Application No.: US13432357

    Filing date: 2012-03-28

    IPC classification: G06F9/38 G06F9/30

    Abstract: A method for performing predecode-time optimized instructions in conjunction with predecode-time optimized instruction sequence caching. The method includes receiving a first instruction of an instruction sequence and a second instruction of the instruction sequence and determining whether the first instruction and the second instruction can be optimized. In response to determining that the first instruction and the second instruction can be optimized, the method includes performing a pre-decode optimization on the instruction sequence and generating a new second instruction, wherein the new second instruction is not dependent on a target operand of the first instruction, and storing a pre-decoded first instruction and a pre-decoded new second instruction in an instruction cache. In response to determining that the first instruction and the second instruction cannot be optimized, the method includes storing the pre-decoded first instruction and a pre-decoded second instruction in the instruction cache.
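
The flow in this abstract can be illustrated with a minimal Python sketch. The instruction names (`addis`, `addi`, `addi_wide`), the `Insn` record, and the rule in `can_optimize` are all assumptions chosen for illustration; the abstract does not say which instruction pairs qualify or how the cache is organized.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Insn:
    op: str                # hypothetical mnemonic
    target: str            # register written by the instruction
    source: Optional[str]  # register read by the instruction, if any
    imm: int               # immediate operand


def can_optimize(first: Insn, second: Insn) -> bool:
    # Assumed rule: an "addis" whose result is consumed by the following "addi".
    return (
        first.op == "addis"
        and second.op == "addi"
        and second.source == first.target
        and first.source != first.target
    )


def predecode_pair(first: Insn, second: Insn, icache: List[Insn]) -> None:
    """Store the pre-decoded pair, rewriting the second instruction if possible."""
    if can_optimize(first, second):
        # The new second instruction reads the first instruction's source,
        # so it no longer depends on the first instruction's target operand.
        second = Insn("addi_wide", second.target, first.source,
                      (first.imm << 16) + second.imm)
    icache.extend([first, second])


icache: List[Insn] = []
# Optimizable pair: r4 = r2 + (0x12 << 16); r5 = r4 + 0x34
predecode_pair(Insn("addis", "r4", "r2", 0x12), Insn("addi", "r5", "r4", 0x34), icache)
# Pair that the assumed rule does not cover: stored unchanged
predecode_pair(Insn("addi", "r6", "r1", 1), Insn("addi", "r7", "r6", 2), icache)
```

In this sketch the first instruction is still stored, because its target register may be read later; only the dependency between the two instructions is removed.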

    Using register last use infomation to perform decode-time computer instruction optimization
    4.
    Granted invention patent
    Using register last use infomation to perform decode-time computer instruction optimization (in force)

    Publication No.: US09286072B2

    Publication date: 2016-03-15

    Application No.: US13251486

    Filing date: 2011-10-03

    IPC classification: G06F9/30 G06F9/38

    Abstract: Two computer machine instructions are fetched for execution but are replaced by a single optimized internal instruction to be executed. A temporary register used by the two instructions is identified as a last-use register, that is, a register whose value is not to be accessed by later instructions, and the single optimized instruction does not include the last-use register.
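
A small Python sketch of the idea, under stated assumptions: the `Insn` record, the mnemonics, the `last_use` field, and the specific fusion rule are hypothetical, since the abstract only says that two instructions sharing a last-use temporary register are replaced by one internal instruction.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Insn:
    op: str                                  # hypothetical mnemonic
    target: str                              # register written
    sources: Tuple[str, ...]                 # registers read
    imm: int = 0                             # immediate operand
    last_use: FrozenSet[str] = frozenset()   # registers no later instruction reads


def decode_pair(first: Insn, second: Insn) -> List[Insn]:
    """Return one fused internal instruction when the temporary is last-use."""
    temp = first.target
    fusible = (
        first.op == "addis"
        and second.op == "addi"
        and second.sources == (temp,)
        and temp in second.last_use      # the temporary's value is dead afterwards
        and temp not in first.sources
        and second.target != temp
    )
    if fusible:
        # The fused instruction neither reads nor writes the temporary register.
        return [Insn("addi_wide", second.target, first.sources,
                     (first.imm << 16) + second.imm)]
    return [first, second]


first = Insn("addis", "r4", ("r2",), 0x12)
fused = decode_pair(first, Insn("addi", "r5", ("r4",), 0x34, frozenset({"r4"})))
unfused = decode_pair(first, Insn("addi", "r5", ("r4",), 0x34))
```

Without the last-use marking, both instructions must be kept because a later instruction might still read `r4`; with it, the write to `r4` can be dropped entirely.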

    Using Register Last Use Infomation to Perform Decode-Time Computer Instruction Optimization
    5.
    Invention patent application
    Using Register Last Use Infomation to Perform Decode-Time Computer Instruction Optimization (in force)

    Publication No.: US20130086368A1

    Publication date: 2013-04-04

    Application No.: US13251486

    Filing date: 2011-10-03

    IPC classification: G06F9/318

    Abstract: Two computer machine instructions are fetched for execution but are replaced by a single optimized internal instruction to be executed. A temporary register used by the two instructions is identified as a last-use register, that is, a register whose value is not to be accessed by later instructions, and the single optimized instruction does not include the last-use register.

    Exploiting an Architected List-Use Operand Indication in a Computer System Operand Resource Pool
    6.
    Invention patent application
    Exploiting an Architected List-Use Operand Indication in a Computer System Operand Resource Pool (in force)

    Publication No.: US20130086365A1

    Publication date: 2013-04-04

    Application No.: US13251519

    Filing date: 2011-10-03

    IPC classification: G06F9/30

    Abstract: A pool of available physical registers is provided for architected registers. Operations are performed that activate and deactivate selected architected registers, such that the deactivated architected registers need not retain values and their physical registers can be deallocated to the pool. Deallocation of a physical register is performed after a last use by a designated last-use instruction, wherein the last-use information is provided either by the last-use instruction or by a prefix instruction. Reads of deallocated architected registers return an architected default value.
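
The pool behavior described here can be sketched in Python as follows. The class name `RegisterPool`, the `last_use` flag on `read`, and the default value of 0 are assumptions for illustration; the abstract does not fix the default value or the interface.

```python
from typing import Dict, List


class RegisterPool:
    DEFAULT_VALUE = 0  # assumed architected default for deallocated registers

    def __init__(self, num_physical: int) -> None:
        self.free: List[int] = list(range(num_physical))  # available physical registers
        self.values: List[int] = [0] * num_physical
        self.mapping: Dict[str, int] = {}                  # active architected -> physical

    def write(self, arch: str, value: int) -> None:
        """Writing activates the architected register, allocating from the pool."""
        if arch not in self.mapping:
            if not self.free:
                raise RuntimeError("no free physical register")
            self.mapping[arch] = self.free.pop()
        self.values[self.mapping[arch]] = value

    def read(self, arch: str, last_use: bool = False) -> int:
        """Read a register; a last-use read returns its physical register to the pool."""
        phys = self.mapping.get(arch)
        if phys is None:
            return self.DEFAULT_VALUE  # deallocated register reads as the default
        value = self.values[phys]
        if last_use:
            del self.mapping[arch]
            self.free.append(phys)
        return value


pool = RegisterPool(num_physical=2)
pool.write("r1", 7)
last_value = pool.read("r1", last_use=True)   # last use: physical register is freed
value_after = pool.read("r1")                 # deallocated: default value
```

Here the last-use indication is passed by the caller, standing in for either the last-use instruction itself or a prefix instruction.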

    Vector Loads from Scattered Memory Locations
    7.
    Invention patent application
    Vector Loads from Scattered Memory Locations (under examination, published)

    Publication No.: US20120060016A1

    Publication date: 2012-03-08

    Application No.: US12876432

    Filing date: 2010-09-07

    IPC classification: G06F15/76 G06F9/02

    Abstract: Mechanisms for performing a scattered load operation are provided. With these mechanisms, a gather instruction is received in a logic unit of a processor, the gather instruction specifying a plurality of addresses in a memory from which data is to be loaded into a target vector register of the processor. A plurality of separate load instructions for loading the data from the plurality of addresses in the memory is automatically generated within the logic unit. The separate load instructions are sent from the logic unit to one or more load/store units of the processor. The data corresponding to the plurality of addresses is gathered in a buffer of the processor. The logic unit then writes the data stored in the buffer to the target vector register.
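
As a rough software model of these steps, the Python sketch below expands one gather into separate loads, collects the results in a buffer, and then writes the buffer to the target register. The function names and the dictionary-based memory are assumptions; real hardware would issue the loads to load/store units in parallel.

```python
from typing import Dict, List, Sequence, Tuple


def expand_gather(addresses: Sequence[int]) -> List[Tuple[str, int, int]]:
    """Logic unit: generate one load per address, tagged with its buffer slot."""
    return [("load", slot, addr) for slot, addr in enumerate(addresses)]


def execute_gather(memory: Dict[int, int], addresses: Sequence[int]) -> Tuple[int, ...]:
    loads = expand_gather(addresses)
    buffer: List[int] = [0] * len(loads)
    for _, slot, addr in loads:
        # Load/store unit: perform each separate load into its buffer slot.
        buffer[slot] = memory[addr]
    # Logic unit: write the gathered buffer to the target vector register.
    return tuple(buffer)


memory = {0x100: 11, 0x2A8: 22, 0x010: 33, 0x7F0: 44}
vector_register = execute_gather(memory, [0x7F0, 0x010, 0x100, 0x2A8])
```

Tagging each load with a slot index lets the loads complete in any order while the vector elements still end up in the order given by the gather instruction.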

    METHOD AND APPARATUS FOR EFFICIENT INTER-THREAD SYNCHRONIZATION FOR HELPER THREADS
    8.
    Invention patent application
    METHOD AND APPARATUS FOR EFFICIENT INTER-THREAD SYNCHRONIZATION FOR HELPER THREADS (in force)

    Publication No.: US20110296421A1

    Publication date: 2011-12-01

    Application No.: US12787810

    Filing date: 2010-05-26

    IPC classification: G06F9/52

    Abstract: In a multiprocessing computer system having a plurality of hardware threads that share a memory location, a monitor bit per hardware thread may be allocated in the memory location, each of the allocated monitor bits corresponding to one of the plurality of hardware threads. A condition bit may be allocated for each of the plurality of hardware threads, the condition bit being allocated in each context of the plurality of hardware threads. In response to detecting the memory location being accessed, it is determined whether a monitor bit corresponding to a hardware thread in the memory location is set. In response to determining that the monitor bit corresponding to a hardware thread is set in the memory location, a condition bit corresponding to a thread accessing the memory location is set in the hardware thread's context.
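
One possible reading of this mechanism is sketched below in Python: each thread that has set its monitor bit on a location gets its condition bit set when the location is accessed. The class and function names are hypothetical, and the abstract leaves open details such as which kinds of access count and when bits are cleared.

```python
from typing import List


class MonitoredLocation:
    """A shared memory location with one monitor bit per hardware thread."""

    def __init__(self, num_threads: int) -> None:
        self.value = 0
        self.monitor: List[bool] = [False] * num_threads


class ThreadContext:
    """Per-hardware-thread context holding the condition bit."""

    def __init__(self) -> None:
        self.condition = False


def store(location: MonitoredLocation, contexts: List[ThreadContext], value: int) -> None:
    """Access the location and notify every thread that is monitoring it."""
    location.value = value
    for thread_id, monitored in enumerate(location.monitor):
        if monitored:
            contexts[thread_id].condition = True


contexts = [ThreadContext() for _ in range(3)]
flag = MonitoredLocation(num_threads=3)
flag.monitor[1] = True          # helper thread 1 asks to be told about accesses
store(flag, contexts, 99)       # another thread writes the shared location
```

A helper thread can then test its own condition bit instead of repeatedly re-reading the shared location.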

    Low complexity speculative multithreading system based on unmodified microprocessor core
    9.
    Granted invention patent
    Low complexity speculative multithreading system based on unmodified microprocessor core (in force)

    Publication No.: US07836260B2

    Publication date: 2010-11-16

    Application No.: US12147914

    Filing date: 2008-06-27

    IPC classification: G06F12/00

    Abstract: A system, method and computer program product for supporting thread-level speculative execution in a computing environment having multiple processing units adapted for concurrent execution of threads in speculative and non-speculative modes. Each processing unit includes a cache memory hierarchy of caches operatively connected therewith. The apparatus includes an additional cache level local to each processing unit for use only in a thread-level speculation mode, each additional cache storing speculative results and status associated with its processor when handling speculative threads. The additional local cache levels at the processing units are interconnected so that speculative values and control data may be forwarded between parallel executing threads. A control implementation is provided that enables speculative coherence between speculative threads executing in the computing environment.
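
The following Python sketch models the extra per-unit speculative cache level in a very simplified way. It assumes that processing units are ordered by thread age (unit 0 runs the oldest thread), that values are forwarded only from older to younger threads, and that a commit copies speculative results to memory; none of these policies are stated in the abstract, and all names are hypothetical.

```python
from typing import Dict, List


class SpeculativeCache:
    """Additional cache level local to one processing unit."""

    def __init__(self) -> None:
        self.lines: Dict[int, int] = {}  # speculative results: address -> value
        self.speculative = False         # status: holds uncommitted speculative state


def spec_write(caches: List[SpeculativeCache], unit: int, addr: int, value: int) -> None:
    """Buffer a speculative store locally instead of updating memory."""
    caches[unit].speculative = True
    caches[unit].lines[addr] = value


def spec_read(caches: List[SpeculativeCache], unit: int, addr: int,
              memory: Dict[int, int]) -> int:
    """Look in the local cache, then in older units' caches, then in memory."""
    for older in range(unit, -1, -1):
        if addr in caches[older].lines:
            return caches[older].lines[addr]  # value forwarded between threads
    return memory.get(addr, 0)


def commit(caches: List[SpeculativeCache], unit: int, memory: Dict[int, int]) -> None:
    """Make a thread's speculative results architecturally visible."""
    memory.update(caches[unit].lines)
    caches[unit].lines.clear()
    caches[unit].speculative = False


memory = {0x10: 1}
caches = [SpeculativeCache() for _ in range(3)]
spec_write(caches, 0, 0x10, 42)
forwarded = spec_read(caches, 2, 0x10, memory)   # unit 2 sees unit 0's value
before_commit = memory[0x10]                     # memory is still unchanged
commit(caches, 0, memory)
```

Because speculative stores stay in the extra cache level, a misspeculated thread could be discarded by clearing its `lines` without touching memory.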

    Method and apparatus for efficient performance monitoring of a large number of simultaneous events
    10.
    Granted invention patent
    Method and apparatus for efficient performance monitoring of a large number of simultaneous events (lapsed)

    Publication No.: US07461383B2

    Publication date: 2008-12-02

    Application No.: US11507307

    Filing date: 2006-08-21

    CPC classification: G06F11/348 Y02D10/34

    Abstract: A system for monitoring a large number of simultaneous events implements a hybrid counter array device having a first counter portion comprising counter devices, each counter device receiving signals representing occurrences of events from an event source and providing a first count value corresponding to the lower order bits of the hybrid counter array. A second counter portion comprises a memory array device having addressable memory locations in correspondence with the counter devices, each addressable memory location storing a second count value representing the higher order bits. A control device monitors each of the counter devices and initiates updating of the corresponding second count value stored at the corresponding addressable memory location. The system includes interrupt pre-indication for providing a fast interrupt trigger to a processor device when a count value related to an event equals a threshold value. A data transfer sub-system additionally enables one or more of: read access or write access to both the count values in the first and second counter portions over a narrow bus, the read/write access being for purposes of initializing and determining the status of the count values for a monitored event type in response to a processor device request.
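
The split between fast low-order counters and a memory array for high-order bits can be sketched in Python as below. The class name, the `low_bits` width, and the per-counter `thresholds` table are assumptions, and the threshold check here is a simplification of the interrupt pre-indication mentioned in the abstract.

```python
from typing import Dict, List


class HybridCounterArray:
    def __init__(self, num_counters: int, low_bits: int) -> None:
        self.low_bits = low_bits
        self.low: List[int] = [0] * num_counters    # fast counter devices (low-order bits)
        self.high: List[int] = [0] * num_counters   # memory array (high-order bits)
        self.thresholds: Dict[int, int] = {}        # counter index -> threshold value
        self.interrupts: List[int] = []             # counters that triggered an interrupt

    def event(self, index: int) -> None:
        """Count one event; on low-counter overflow, update the memory array."""
        self.low[index] += 1
        if self.low[index] == 1 << self.low_bits:
            self.low[index] = 0
            self.high[index] += 1   # control device updates the second count value
        if self.thresholds.get(index) == self.read(index):
            self.interrupts.append(index)

    def read(self, index: int) -> int:
        """Combine both portions into the full count value."""
        return (self.high[index] << self.low_bits) | self.low[index]


counters = HybridCounterArray(num_counters=2, low_bits=2)
counters.thresholds[0] = 5
for _ in range(5):
    counters.event(0)
total = counters.read(0)
```

Only the small low-order counters need to be fast; the memory array is touched once per overflow, which is what keeps a large number of counters cheap.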
