Vector mask driven clock gating for power efficiency of a processor

    Publication No.: US10133577B2

    Publication date: 2018-11-20

    Application No.: US13997791

    Filing date: 2012-12-19

    Abstract: A processor includes an instruction schedule and dispatch (schedule/dispatch) unit to receive a single instruction multiple data (SIMD) instruction to perform an operation on multiple data elements stored in a storage location indicated by a first source operand. The instruction schedule/dispatch unit is to determine a first of the data elements that will not be operated to generate a result written to a destination operand based on a second source operand. The processor further includes multiple processing elements coupled to the instruction schedule/dispatch unit to process the data elements of the SIMD instruction in a vector manner, and a power management unit coupled to the instruction schedule/dispatch unit to reduce power consumption of a first of the processing elements configured to process the first data element.
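
The masking behavior in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware: the function name `dispatch_masked`, the list-based lanes, and the choice that a masked-off lane keeps its old destination value are all hypothetical. The mask plays the role of the second source operand, and each lane index returned in `gated` stands for a processing element whose power could be reduced.

```python
from typing import Callable, List, Sequence, Tuple


def dispatch_masked(
    op: Callable[[int], int],
    src: Sequence[int],
    dest: Sequence[int],
    mask: Sequence[int],
) -> Tuple[List[int], List[int]]:
    """Apply `op` per lane where the mask bit is set.

    Returns the new destination values and the lanes that did no work.
    Assumption: a masked-off lane leaves its destination element unchanged.
    """
    if not (len(src) == len(dest) == len(mask)):
        raise ValueError("src, dest and mask must have the same lane count")

    result = list(dest)
    gated = []  # lanes whose processing element could be clock gated
    for lane, (value, bit) in enumerate(zip(src, mask)):
        if bit:
            result[lane] = op(value)
        else:
            gated.append(lane)
    return result, gated
```

For example, calling `dispatch_masked(lambda x: x * 2, [1, 2, 3, 4], [0, 0, 0, 0], [1, 0, 1, 0])` computes only lanes 0 and 2 and reports lanes 1 and 3 as candidates for gating.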

    Apparatus and method for a hybrid latency-throughput processor
    12.
    Granted invention patent (in force)

    Publication No.: US09417873B2

    Publication date: 2016-08-16

    Application No.: US13730055

    Filing date: 2012-12-28

    Abstract: An apparatus and method are described for executing both latency-optimized execution logic and throughput-optimized execution logic on a processing device. For example, a processor according to one embodiment comprises: latency-optimized execution logic to execute a first type of program code; throughput-optimized execution logic to execute a second type of program code, wherein the first type of program code and the second type of program code are designed for the same instruction set architecture; logic to identify the first type of program code and the second type of program code within a process and to distribute the first type of program code for execution on the latency-optimized execution logic and the second type of program code for execution on the throughput-optimized execution logic.

    Apparatus and method for a hybrid latency-throughput processor

    Publication No.: US10255077B2

    Publication date: 2019-04-09

    Application No.: US15226875

    Filing date: 2016-08-02

    Abstract: An apparatus and method are described for executing both latency-optimized execution logic and throughput-optimized execution logic on a processing device. For example, a processor according to one embodiment comprises: latency-optimized execution logic to execute a first type of program code; throughput-optimized execution logic to execute a second type of program code, wherein the first type of program code and the second type of program code are designed for the same instruction set architecture; logic to identify the first type of program code and the second type of program code within a process and to distribute the first type of program code for execution on the latency-optimized execution logic and the second type of program code for execution on the throughput-optimized execution logic.

    Processing core having shared front end unit

    Publication No.: US10140129B2

    Publication date: 2018-11-27

    Application No.: US13730719

    Filing date: 2012-12-28

    Abstract: A processor having one or more processing cores is described. Each of the one or more processing cores has front end logic circuitry and a plurality of processing units. The front end logic circuitry is to fetch respective instructions of threads and decode the instructions into respective micro-code and input operand and resultant addresses of the instructions. Each of the plurality of processing units is to be assigned at least one of the threads, is coupled to said front end unit, and has a respective buffer to receive and store microcode of its assigned at least one of the threads. Each of the plurality of processing units also comprises: i) at least one set of functional units corresponding to a complete instruction set offered by the processor, the at least one set of functional units to execute its respective processing unit's received microcode; ii) registers coupled to the at least one set of functional units to store operands and resultants of the received microcode; iii) data fetch circuitry to fetch input operands for the at least one functional units' execution of the received microcode.

    Apparatus and method for low-latency invocation of accelerators

    Publication No.: US10095521B2

    Publication date: 2018-10-09

    Application No.: US15145748

    Filing date: 2016-05-03

    Abstract: An apparatus and method are described for providing low-latency invocation of accelerators. For example, a processor according to one embodiment comprises execution logic to execute a plurality of instructions including an accelerator invocation instruction to invoke one or more accelerator commands. The accelerator invocation instruction stores command data specifying the command within a command register. One or more accelerators read the command data from the command register and responsively attempt to execute the command identified by the command data. Upon a switch from a first context to a second context, an accelerator context save/restore pointer identifies a region within system memory where the accelerator is to save its state and later the accelerator context save/restore pointer aids in restoring its state upon returning to the first context.
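
The invocation flow in this abstract can be sketched as a toy software model. Everything named here is hypothetical and chosen only for illustration: the `Accelerator` class, the dictionary used as the command register, the dictionary used as system memory, and the `invoke` helper. The abstract does not say how an accelerator decides whether it can run a command, so the set of supported commands is an assumption.

```python
class Accelerator:
    """Hypothetical accelerator that reads a shared command register."""

    def __init__(self, supported_commands):
        self.supported = set(supported_commands)
        self.state = {"completed": []}

    def try_execute(self, command_register):
        """Read the command data and attempt to execute it."""
        command = command_register.get("command")
        if command not in self.supported:
            return False
        self.state["completed"].append(command)
        return True

    def save_state(self, system_memory, pointer):
        """Save state to the region named by the save/restore pointer."""
        system_memory[pointer] = {"completed": list(self.state["completed"])}

    def restore_state(self, system_memory, pointer):
        """Restore state from the region named by the save/restore pointer."""
        self.state = {"completed": list(system_memory[pointer]["completed"])}


def invoke(command_register, command, accelerators):
    """Model of the invocation instruction: store the command, then let
    every accelerator read the register and attempt the command."""
    command_register["command"] = command
    return [acc.try_execute(command_register) for acc in accelerators]
```

In this model a context switch corresponds to calling `save_state` with a pointer before the second context runs, and `restore_state` with the same pointer when the first context resumes.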

    Apparatus and Method for a Hybrid Latency-Throughput Processor
    16.
    Invention application (under examination, published)

    Publication No.: US20160342419A1

    Publication date: 2016-11-24

    Application No.: US15226875

    Filing date: 2016-08-02

    Abstract: An apparatus and method are described for executing both latency-optimized execution logic and throughput-optimized execution logic on a processing device. For example, a processor according to one embodiment comprises: latency-optimized execution logic to execute a first type of program code; throughput-optimized execution logic to execute a second type of program code, wherein the first type of program code and the second type of program code are designed for the same instruction set architecture; logic to identify the first type of program code and the second type of program code within a process and to distribute the first type of program code for execution on the latency-optimized execution logic and the second type of program code for execution on the throughput-optimized execution logic.

    Apparatus and method for memory-mapped register caching
    17.
    Granted invention patent (in force)

    Publication No.: US09189398B2

    Publication date: 2015-11-17

    Application No.: US13730030

    Filing date: 2012-12-28

    CPC classification number: G06F12/0802 G06F12/0875 G06F12/0897 Y02D10/13

    Abstract: A processor is described comprising: an architectural register file implemented as a combination of a register file cache and an architectural register region within a level 1 (L1) data cache, and a data location table (DLT) to store data indicating a location of each architectural register within the register file cache and/or the architectural register region within the L1 data cache.
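
The role of the data location table can be shown with a minimal software model. This is a sketch under assumptions, not the claimed design: the class name `RegisterFile`, the `"RFC"` and `"L1"` location tags, and the first-in-first-out eviction policy are all hypothetical, since the abstract only states that the DLT records where each architectural register currently lives.

```python
from collections import OrderedDict


class RegisterFile:
    """Architectural registers split between a small register file cache
    (RFC) and a backing region in the L1 data cache, tracked by a DLT."""

    def __init__(self, num_regs, cache_slots):
        if cache_slots < 1:
            raise ValueError("the register file cache needs at least one slot")
        self.l1_region = [0] * num_regs   # backing copy of every register
        self.cache = OrderedDict()        # reg -> value, oldest entry first
        self.cache_slots = cache_slots
        self.dlt = ["L1"] * num_regs      # current location of each register

    def _fill(self, reg):
        """Bring `reg` into the RFC, evicting the oldest entry if full."""
        if len(self.cache) >= self.cache_slots:
            victim, value = self.cache.popitem(last=False)  # assumed FIFO policy
            self.l1_region[victim] = value
            self.dlt[victim] = "L1"
        self.cache[reg] = self.l1_region[reg]
        self.dlt[reg] = "RFC"

    def read(self, reg):
        if self.dlt[reg] == "L1":
            self._fill(reg)
        return self.cache[reg]

    def write(self, reg, value):
        if self.dlt[reg] == "L1":
            self._fill(reg)
        self.cache[reg] = value
```

Every access consults `dlt` first, so the model never searches both structures; a register evicted from the cache is written back to its slot in the L1 region and its DLT entry is updated to match.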

    VECTOR MASK DRIVEN CLOCK GATING FOR POWER EFFICIENCY OF A PROCESSOR
    18.
    Invention application (under examination, published)

    Publication No.: US20150220345A1

    Publication date: 2015-08-06

    Application No.: US13997791

    Filing date: 2012-12-19

    Abstract: A processor includes an instruction schedule and dispatch (schedule/dispatch) unit to receive a single instruction multiple data (SIMD) instruction to perform an operation on multiple data elements stored in a storage location indicated by a first source operand. The instruction schedule/dispatch unit is to determine a first of the data elements that will not be operated to generate a result written to a destination operand based on a second source operand. The processor further includes multiple processing elements coupled to the instruction schedule/dispatch unit to process the data elements of the SIMD instruction in a vector manner, and a power management unit coupled to the instruction schedule/dispatch unit to reduce power consumption of a first of the processing elements configured to process the first data element.
