Systems and methods for reconfiguring cache memory
    1.
    Invention grant
    Systems and methods for reconfiguring cache memory (In force)

    Publication number: US09547593B2

    Publication date: 2017-01-17

    Application number: US13036321

    Filing date: 2011-02-28

    Applicant: Thang M. Tran

    Inventor: Thang M. Tran

    Abstract: A microprocessor system is disclosed that includes a first data cache that is shared by a first group of one or more program threads in a multi-thread mode and used by one program thread in a single-thread mode. A second data cache is shared by a second group of one or more program threads in the multi-thread mode and is used as a victim cache for the first data cache in the single-thread mode.

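    As a rough illustration of the single-thread arrangement described in this abstract, the sketch below lets a second cache catch lines evicted from the first, i.e. act as its victim cache. The class names, capacities, and replacement policy are invented for the example and are not taken from the patent.

```python
class SimpleCache:
    """Tiny fully-associative cache with LRU replacement (illustrative only)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = {}          # address -> data
        self.order = []          # LRU order, oldest first

    def lookup(self, addr):
        if addr in self.lines:
            self.order.remove(addr)
            self.order.append(addr)
            return self.lines[addr]
        return None

    def insert(self, addr, data):
        """Insert a line; return the evicted (addr, data) pair, if any."""
        evicted = None
        if len(self.lines) >= self.capacity:
            old = self.order.pop(0)
            evicted = (old, self.lines.pop(old))
        self.lines[addr] = data
        self.order.append(addr)
        return evicted


class SingleThreadCacheHierarchy:
    """In single-thread mode the second data cache holds victims of the first."""
    def __init__(self, memory):
        self.l1 = SimpleCache(capacity=4)      # first data cache
        self.victim = SimpleCache(capacity=4)  # second data cache, used as victim cache
        self.memory = memory

    def load(self, addr):
        data = self.l1.lookup(addr)
        if data is not None:
            return data                        # hit in the first data cache
        data = self.victim.lookup(addr)
        if data is None:
            data = self.memory[addr]           # miss everywhere: fetch from memory
        evicted = self.l1.insert(addr, data)   # refill the first data cache
        if evicted:
            self.victim.insert(*evicted)       # displaced line becomes a victim
        return data


memory = {a: a * 10 + 7 for a in range(32)}
h = SingleThreadCacheHierarchy(memory)
for a in [0, 1, 2, 3, 4]:        # fills the first cache; address 0 is evicted
    h.load(a)
print(0 in h.victim.lines)       # True: the victim cache caught the evicted line
print(h.load(0))                 # 7, recovered from the victim cache rather than memory
```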

    Apparatus and method for memory copy at a processor
    2.
    Invention grant
    Apparatus and method for memory copy at a processor (In force)

    Publication number: US09524162B2

    Publication date: 2016-12-20

    Application number: US13455800

    Filing date: 2012-04-25

    Abstract: A processor uses a dedicated buffer to reduce the amount of time needed to execute memory copy operations. For each load instruction associated with the memory copy operation, the processor copies the load data from memory to the dedicated buffer. For each store operation associated with the memory copy operation, the processor retrieves the store data from the dedicated buffer and transfers it to memory. The dedicated buffer is separate from a register file and caches of the processor, so that each load operation associated with a memory copy operation does not have to wait for data to be loaded from memory to the register file. Similarly, each store operation associated with a memory copy operation does not have to wait for data to be transferred from the register file to memory.

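    A minimal sketch of the copy path described above, assuming a word-addressed toy memory: load data is staged in a small dedicated buffer and store operations drain it, so the copy never passes through a simulated register file. The function and parameter names are invented for illustration.

```python
def memory_copy(memory, src, dst, length, buffer_size=8):
    """Copy `length` words through a small dedicated staging buffer."""
    copied = 0
    while copied < length:
        chunk = min(buffer_size, length - copied)
        # Load side: each load writes its data into the dedicated buffer.
        dedicated_buffer = [memory[src + copied + i] for i in range(chunk)]
        # Store side: each store reads its data from the dedicated buffer.
        for i in range(chunk):
            memory[dst + copied + i] = dedicated_buffer[i]
        copied += chunk


mem = {i: i for i in range(64)}            # word-addressed toy memory
memory_copy(mem, src=0, dst=32, length=16)
print([mem[32 + i] for i in range(16)])    # [0, 1, ..., 15]
```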

    Data processing system operable in single and multi-thread modes and having multiple caches and method of operation
    3.
    Invention grant
    Data processing system operable in single and multi-thread modes and having multiple caches and method of operation (In force)

    Publication number: US09424190B2

    Publication date: 2016-08-23

    Application number: US13213387

    Filing date: 2011-08-19

    Applicant: Thang M. Tran

    Inventor: Thang M. Tran

    Abstract: Systems and methods are disclosed for a computer system that includes a first load/store execution unit 210a, a first level 1 (L1) data cache unit 216a coupled to the first load/store execution unit, a second load/store execution unit 210b, and a second L1 data cache unit 216b coupled to the second load/store execution unit. When executing a single thread of instructions, some instructions are directed to the first load/store execution unit and other instructions are directed to the second load/store execution unit.

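    One way to picture the single-thread mode described here is to steer each memory instruction to one of two load/store unit and L1 data cache pairs. The sketch below uses an assumed steering rule (the low address bit); the class names and the rule are illustrative, not taken from the patent.

```python
class LoadStoreUnit:
    """One load/store execution unit with its own L1 data cache (toy model)."""
    def __init__(self, name, memory):
        self.name = name
        self.memory = memory
        self.l1 = {}                            # address -> data

    def load(self, addr):
        if addr not in self.l1:
            self.l1[addr] = self.memory[addr]   # fill on miss
        return self.l1[addr]

    def store(self, addr, data):
        self.l1[addr] = data
        self.memory[addr] = data                # write-through, for simplicity


memory = {a: 0 for a in range(16)}
units = [LoadStoreUnit("LSU0", memory), LoadStoreUnit("LSU1", memory)]

def issue(op, addr, data=None):
    """Direct some instructions to the first unit and others to the second."""
    unit = units[addr & 1]                      # assumed rule: low address bit picks the unit
    return unit.load(addr) if op == "load" else unit.store(addr, data)

issue("store", 2, data=42)
issue("store", 3, data=7)
print(issue("load", 2), issue("load", 3))      # 42 7, served by different cache units
```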

    TECHNIQUES FOR REDUCING PROCESSOR POWER CONSUMPTION THROUGH DYNAMIC PROCESSOR RESOURCE ALLOCATION
    4.
    Invention application
    TECHNIQUES FOR REDUCING PROCESSOR POWER CONSUMPTION THROUGH DYNAMIC PROCESSOR RESOURCE ALLOCATION (In force)

    Publication number: US20140025967A1

    Publication date: 2014-01-23

    Application number: US13551220

    Filing date: 2012-07-17

    Applicant: Thang M. Tran

    Inventor: Thang M. Tran

    CPC classification number: G06F1/3206

    Abstract: A technique for performing power management for configurable processor resources of a processor determines whether to increase, decrease, or maintain resource units for each of the configurable processor resources based on the utilization of each resource. A total weighted power number for the processor is substantially maintained while resource units for each configurable processor resource whose utilization is above a first level are increased and resource units for each configurable processor resource whose utilization is below a second level are decreased. The total weighted power number corresponds to a sum of weighted power numbers for the configurable processor resources.

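    The rebalancing rule can be read as: shrink under-utilized resources, grow over-utilized ones, and keep the weighted-power sum roughly constant. The sketch below is a minimal interpretation with made-up thresholds, weights, and resource names.

```python
def rebalance(resources, high=0.75, low=0.25):
    """Grow busy resources and shrink idle ones while roughly preserving the
    total weighted power number (sum of units * per-unit weight).

    `resources` maps a name to {"units", "weight", "utilization"}; the
    thresholds and weights here are illustrative assumptions.
    """
    total_before = sum(r["units"] * r["weight"] for r in resources.values())
    freed = 0.0
    for r in resources.values():
        if r["utilization"] < low and r["units"] > 1:
            r["units"] -= 1                 # shrink an under-utilized resource
            freed += r["weight"]
    for r in sorted(resources.values(), key=lambda r: -r["utilization"]):
        if r["utilization"] > high and freed >= r["weight"]:
            r["units"] += 1                 # grow an over-utilized resource
            freed -= r["weight"]
    total_after = sum(r["units"] * r["weight"] for r in resources.values())
    return total_before, total_after


config = {
    "reorder_buffer": {"units": 4, "weight": 2.0, "utilization": 0.90},
    "load_queue":     {"units": 4, "weight": 1.0, "utilization": 0.10},
    "store_queue":    {"units": 4, "weight": 1.0, "utilization": 0.15},
}
print(rebalance(config))   # (16.0, 16.0): the total weighted power number is maintained
```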

    SYSTEMS AND METHODS FOR REDUCING BRANCH MISPREDICTION PENALTY
    5.
    Invention application
    SYSTEMS AND METHODS FOR REDUCING BRANCH MISPREDICTION PENALTY (In force)

    Publication number: US20130198490A1

    Publication date: 2013-08-01

    Application number: US13362720

    Filing date: 2012-01-31

    CPC classification number: G06F9/3851 G06F9/3804 G06F9/381

    Abstract: In a processing system capable of single- and multi-thread execution, a branch prediction unit can be configured to detect hard-to-predict branches and loop instructions. In a dual-threading (simultaneous multi-threading) configuration, one instruction queue (IQ) is used for each thread and instructions are alternately sent from each IQ to the decode units. In single-thread mode, the second IQ can be used to store the “not predicted path” of a hard-to-predict branch or the “fall-through” path of a loop. On a misprediction, the misprediction penalty is reduced by getting the instructions from the IQ instead of the instruction cache.

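    In single-thread mode the spare instruction queue holds the path the predictor did not take, so a misprediction can be repaired from that queue instead of refetching from the instruction cache. The toy model below illustrates the idea; the class and method names are invented.

```python
class DualIQFrontEnd:
    """Single-thread use of two instruction queues (toy model).

    IQ0 holds the predicted path; IQ1 pre-buffers the not-predicted path of a
    hard-to-predict branch so a misprediction avoids an instruction-cache refetch.
    """
    def __init__(self):
        self.iq0 = []   # predicted path
        self.iq1 = []   # not-predicted path

    def fetch(self, predicted_path, not_predicted_path):
        self.iq0 = list(predicted_path)
        self.iq1 = list(not_predicted_path)

    def resolve_branch(self, mispredicted):
        if mispredicted:
            # Recover instructions from IQ1 instead of the instruction cache.
            self.iq0, self.iq1 = self.iq1, []
        return self.iq0


front_end = DualIQFrontEnd()
front_end.fetch(predicted_path=["add", "sub"], not_predicted_path=["mul", "div"])
print(front_end.resolve_branch(mispredicted=True))   # ['mul', 'div'], served from IQ1
```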

    DATA PROCESSING SYSTEM WITH LATENCY TOLERANCE EXECUTION
    6.
    Invention application
    DATA PROCESSING SYSTEM WITH LATENCY TOLERANCE EXECUTION (In force)

    Publication number: US20120303936A1

    Publication date: 2012-11-29

    Application number: US13419531

    Filing date: 2012-03-14

    Abstract: In a processor having an instruction unit, a decode/issue unit, and execution queues configured to provide instructions to correspondingly different types of execution units, a method comprises maintaining a duplicate free list for the execution queues. The duplicate free list includes a plurality of duplicate dependent instruction indicators that indicate when a duplicate instruction for a dependent instruction is stored in at least one of the execution queues. One of the duplicate dependent instruction indicators is assigned to an execution queue for a dependent instruction. The dependent instruction is executed only when that duplicate dependent instruction indicator is reset.

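    One reading of the mechanism is a scoreboard of duplicate-pending indicators: a dependent instruction held in an execution queue may execute only once the indicator assigned to it has been reset. The sketch below is a loose interpretation with invented names, not the patent's actual structures.

```python
class DuplicateFreeList:
    """Toy scoreboard of duplicate-dependent-instruction indicators."""
    def __init__(self, size):
        self.indicators = [False] * size    # False = reset (no duplicate pending)

    def assign(self, idx):
        self.indicators[idx] = True         # a duplicate sits in some execution queue

    def reset(self, idx):
        self.indicators[idx] = False        # the duplicate has drained

    def can_execute(self, idx):
        # The dependent instruction executes only when its indicator is reset.
        return not self.indicators[idx]


dfl = DuplicateFreeList(size=4)
dfl.assign(2)                 # dependent instruction 2 has a duplicate outstanding
print(dfl.can_execute(2))     # False: it must wait
dfl.reset(2)
print(dfl.can_execute(2))     # True: it may now execute
```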

    Microprocessor with independent SIMD loop buffer
    7.
    Invention grant
    Microprocessor with independent SIMD loop buffer (In force)

    Publication number: US07330964B2

    Publication date: 2008-02-12

    Application number: US11273493

    Filing date: 2005-11-14

    Abstract: An apparatus comprising detection logic configured to detect a loop among a set of instructions, the loop comprising one or more instructions of a first type of instruction and a second type of instruction and a co-processor configured to execute the loop detected by the detection logic, the co-processor comprising an instruction queue. The apparatus further comprises fetch logic configured to fetch instructions; decode logic configured to determine instruction type; a processor configured to execute the loop detected by the detection logic, wherein the loop comprises one or more instructions of the first type of instruction, and an execution unit configured to execute the loop detected by the detection logic.

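    The division of labor can be pictured as: detect a loop in the instruction stream, then hand its SIMD instructions to the co-processor's instruction queue while the processor keeps the remaining instructions. The sketch below is a simplified, assumed model; the loop-detection rule and the instruction mnemonics are placeholders.

```python
def detect_loop(instructions):
    """Rough loop detection: treat a backward branch as the loop end (assumption)."""
    for i, inst in enumerate(instructions):
        if inst.startswith("branch_back"):
            return instructions[: i + 1]    # body of the detected loop
    return []


def split_loop(loop_body):
    """Send SIMD instructions to the co-processor queue, the rest to the processor."""
    coprocessor_queue = [inst for inst in loop_body if inst.startswith("simd")]
    processor_part = [inst for inst in loop_body if not inst.startswith("simd")]
    return processor_part, coprocessor_queue


program = ["load r1", "simd_add v0, v1", "simd_mul v2, v0", "add r1, 4", "branch_back"]
loop_body = detect_loop(program)
processor_part, simd_queue = split_loop(loop_body)
print(simd_queue)   # ['simd_add v0, v1', 'simd_mul v2, v0']
```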

    Data address prediction structure and a method for operating the same
    8.
    Invention grant
    Data address prediction structure and a method for operating the same (Expired)

    Publication number: US06604190B1

    Publication date: 2003-08-05

    Application number: US08473504

    Filing date: 1995-06-07

    Applicant: Thang M. Tran

    Inventor: Thang M. Tran

    Abstract: A data address prediction structure for a superscalar microprocessor is provided. The data address prediction structure predicts a data address that a group of instructions is going to access while that group of instructions is being fetched from the instruction cache. The data bytes associated with the predicted address are placed in a relatively small, fast buffer. The decode stages of instruction processing pipelines in the microprocessor access the buffer with addresses generated from the instructions, and if the associated data bytes are found in the buffer they are conveyed to the reservation station associated with the requesting decode stage. Therefore, the implicit memory read associated with an instruction is performed prior to the instruction arriving in a functional unit. The functional unit is occupied by the instruction for fewer clock cycles, since it need not perform the implicit memory operation. Instead, the functional unit performs the explicit operation indicated by the instruction.

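    The flow described above is: while a fetch group is read from the instruction cache, predict the data address it will access, pull that data into a small fast buffer, and let the decode stage satisfy the access from the buffer. Below is a minimal sketch of that flow using an assumed stride predictor; all names are illustrative.

```python
class DataAddressPredictor:
    """Predict the next data address per fetch group with a simple stride table."""
    def __init__(self):
        self.last_addr = {}     # fetch PC -> last data address
        self.stride = {}        # fetch PC -> observed stride

    def predict(self, fetch_pc):
        if fetch_pc in self.last_addr:
            return self.last_addr[fetch_pc] + self.stride.get(fetch_pc, 0)
        return None

    def update(self, fetch_pc, actual_addr):
        if fetch_pc in self.last_addr:
            self.stride[fetch_pc] = actual_addr - self.last_addr[fetch_pc]
        self.last_addr[fetch_pc] = actual_addr


memory = {a: a * 3 for a in range(64)}
predictor = DataAddressPredictor()
buffer = {}                     # the small, fast buffer holding predicted data

for iteration in range(4):
    fetch_pc = 0x100
    actual_addr = 8 + 4 * iteration            # the load walks memory with stride 4
    predicted = predictor.predict(fetch_pc)
    if predicted is not None:
        buffer[predicted] = memory[predicted]  # fill the buffer while fetching
    hit = actual_addr in buffer                # decode stage probes the buffer
    data = buffer.get(actual_addr, memory[actual_addr])
    predictor.update(fetch_pc, actual_addr)
    print(iteration, "buffer hit" if hit else "buffer miss", data)
# After two training iterations, the generated addresses hit in the buffer.
```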

Instruction cache configured to provide instructions to a microprocessor having a clock cycle time less than a cache access time of said instruction cache
    9.
    Invention grant
    Instruction cache configured to provide instructions to a microprocessor having a clock cycle time less than a cache access time of said instruction cache (Expired)

    Publication number: US6167510A

    Publication date: 2000-12-26

    Application number: US65346

    Filing date: 1998-04-23

    Applicant: Thang M. Tran

    Inventor: Thang M. Tran

    Abstract: An apparatus including a banked instruction cache and a branch prediction unit is provided. The banked instruction cache allows multiple instruction fetch addresses (comprising consecutive instruction blocks from the predicted instruction stream being executed by the microprocessor) to be fetched concurrently. The instruction cache provides an instruction block corresponding to one of the multiple fetch addresses to the instruction processing pipeline of the microprocessor during each consecutive clock cycle, while additional instruction fetch addresses from the predicted instruction stream are fetched. Preferably, the instruction cache includes at least a number of banks equal to the number of clock cycles consumed by an instruction cache access. In this manner, instructions may be provided during each consecutive clock cycle even though instruction cache access time is greater than the clock cycle time of the microprocessor.

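    The key relationship is that with at least as many banks as the access takes clock cycles, one instruction block can complete every cycle even though each access spans multiple cycles. The toy timing model below illustrates this under assumed parameters (a 2-cycle access and 2 banks).

```python
def banked_fetch_schedule(num_fetches, access_cycles=2, num_banks=2):
    """Model overlapping instruction-cache accesses across banks (toy timing).

    A new fetch tries to start every cycle and stalls if its bank is still busy
    with an earlier access.  With num_banks >= access_cycles, one instruction
    block completes every cycle despite the multi-cycle access time.
    """
    bank_free_at = [0] * num_banks
    completions = []
    next_issue = 0
    for fetch in range(num_fetches):
        bank = fetch % num_banks
        start = max(next_issue, bank_free_at[bank])   # stall if the bank is busy
        bank_free_at[bank] = start + access_cycles
        completions.append(start + access_cycles)
        next_issue = start + 1
    return completions


print(banked_fetch_schedule(5, access_cycles=2, num_banks=2))  # [2, 3, 4, 5, 6]
print(banked_fetch_schedule(5, access_cycles=2, num_banks=1))  # [2, 4, 6, 8, 10]
```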

    Apparatus for generating a valid mask
    10.
    Invention grant
    Apparatus for generating a valid mask (Expired)

    Publication number: US6148393A

    Publication date: 2000-11-14

    Application number: US041316

    Filing date: 1998-03-12

    Abstract: A valid mask generator comprising a series of mask generation blocks. Each block generates a predetermined number of valid mask bits given a predetermined number of start pointer bits and end bits, wherein said predetermined number of valid mask bits generated by each block is less than the total number of bits in the valid mask. The series of mask generation blocks may be connected in series, wherein each block outputs a carry-out signal, and wherein each block receives the carry-out signal from the node before it as a carry-in signal. A method for generating a valid mask from a start pointer and a plurality of end bits is also contemplated.

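    A valid mask marks the positions from the start pointer up to and including the first set end bit as valid, and it is built block by block with a carry passed between blocks. The sketch below is one assumed realization; the block size, bit ordering, and function name are illustrative.

```python
def valid_mask(start, end_bits, block_size=4):
    """Build a valid mask in blocks of `block_size` bits.

    Bits from `start` up to and including the first set end bit at or after
    `start` are marked valid.  Each block receives a carry-in saying whether
    the valid region is still open, and produces a carry-out for the next block.
    """
    n = len(end_bits)
    mask = [0] * n
    carry_in = False                      # is the valid region open entering this block?
    for base in range(0, n, block_size):
        carry = carry_in
        for i in range(base, min(base + block_size, n)):
            if i == start:
                carry = True              # the valid region opens at the start pointer
            mask[i] = 1 if carry else 0
            if end_bits[i] and carry:
                carry = False             # the valid region closes after an end bit
        carry_in = carry                  # carry-out of this block feeds the next one
    return mask


end_bits = [0, 0, 0, 0, 0, 1, 0, 1]              # end bits at positions 5 and 7
print(valid_mask(start=2, end_bits=end_bits))    # [0, 0, 1, 1, 1, 1, 0, 0]
```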
