Data memory unit and method for storing data into a lockable cache in
one clock cycle by previewing the tag array
    51.
    Granted Patent
    Data memory unit and method for storing data into a lockable cache in one clock cycle by previewing the tag array (Expired)

    Publication Number: US5761712A

    Publication Date: 1998-06-02

    Application Number: US850290

    Filing Date: 1997-05-05

    Abstract: A data memory unit having a load/store unit and a data cache is provided which allows store instructions that are part of a load-op-store instruction to be executed with one access to a data cache. The load/store unit is configured with a load/store buffer having a checked bit and a way field for each buffer storage location. For load-op-store instructions, the checked bit associated with the store portion of the instruction is set when the load portion of the instruction accesses and hits the data cache. Also, the way field associated with the store portion is set to the way of the data cache in which the load portion hits. The data cache is configured with a locking mechanism for each cache line stored in the data cache. When the load portion of a load-op-store instruction is executed, the associated line is locked such that the line will remain in the data cache until a store instruction executes. In this way, the store portion of the load-op-store instruction is guaranteed to hit the data cache. The store may then store its data into the data cache without first performing a read cycle to determine if the store address hits the data cache.
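
    The one-access store flow can be sketched behaviorally. This is an illustrative Python model, not code from the patent: the names (LoadStoreEntry, DataCache, load_op_store), the 2-way/4-set geometry, and the address-to-index split are all assumptions made for the example. The load portion performs the only tag check, records the hit way, and locks the line; the store portion then writes directly into that way with no read cycle.

```python
class LoadStoreEntry:
    """One load/store buffer slot with the checked bit and way field."""
    def __init__(self, addr, data=None):
        self.addr = addr
        self.data = data
        self.checked = False   # set when the load portion hits the cache
        self.way = None        # way of the data cache the load hit in

class DataCache:
    """2-way cache with a per-line lock bit; index = addr % num_sets."""
    def __init__(self, num_sets=4):
        self.num_sets = num_sets
        self.ways = [dict(), dict()]   # set index -> (tag, data, locked)

    def load(self, addr):
        idx, tag = addr % self.num_sets, addr // self.num_sets
        for w, way in enumerate(self.ways):
            line = way.get(idx)
            if line and line[0] == tag:
                way[idx] = (tag, line[1], True)   # lock: line stays until the store
                return w, line[1]                 # hit: report the way
        return None, None

    def store_checked(self, entry):
        """Store with checked bit set: write directly, no tag-read cycle."""
        idx = entry.addr % self.num_sets
        tag, _, _ = self.ways[entry.way][idx]
        self.ways[entry.way][idx] = (tag, entry.data, False)  # write and unlock

def load_op_store(cache, addr, op):
    entry = LoadStoreEntry(addr)
    entry.way, value = cache.load(addr)   # load portion: the only tag check
    entry.checked = entry.way is not None
    entry.data = op(value)
    if entry.checked:
        cache.store_checked(entry)        # store portion: single write access
    return entry.data
```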


    Space efficient checkpoint facility and technique for processor with integrally indexed register mapping and free-list arrays
    52.

    Publication Number: US09672044B2

    Publication Date: 2017-06-06

    Application Number: US13564490

    Filing Date: 2012-08-01

    Applicant: Thang M. Tran

    Inventor: Thang M. Tran

    Abstract: A processor may efficiently implement register renaming and checkpoint repair even in instruction set architectures with large numbers of wide (bit-width) registers by (i) renaming all destination operand register targets, (ii) implementing free list and architectural-to-physical mapping table as a combined array storage with unitary (or common) read, write and checkpoint pointer indexing and (iii) storing checkpoints as snapshots of the mapping table, rather than of actual register contents. In this way, uniformity (and timing simplicity) of the decode pipeline may be accentuated and architectural-to-physical mappings (or allocable mappings) may be efficiently shuttled between free-list, reorder buffer and mapping table stores in correspondence with instruction dispatch and completion as well as checkpoint creation, retirement and restoration.
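
    The key idea, that a checkpoint is a snapshot of the mapping table rather than of register contents, can be pictured with a small Python sketch. This is illustrative and not taken from the patent: RenameUnit, the list-based free list, and the snapshot-and-diff restore are assumptions of the example; the patent's combined array with common pointer indexing is approximated here by two plain lists.

```python
class RenameUnit:
    """Mapping table + free list; a checkpoint is a snapshot of the
    mapping table, never of physical register contents."""
    def __init__(self, num_arch, num_phys):
        self.mapping = list(range(num_arch))          # arch reg -> phys reg
        self.free = list(range(num_arch, num_phys))   # unallocated phys regs
        self.checkpoints = []                         # stack of map snapshots

    def rename_dest(self, arch_reg):
        """Rename a destination operand: allocate a fresh physical register."""
        old = self.mapping[arch_reg]
        new = self.free.pop(0)
        self.mapping[arch_reg] = new
        return old, new        # the old mapping is reclaimed at completion

    def checkpoint(self):
        self.checkpoints.append(list(self.mapping))   # O(num_arch) snapshot

    def restore(self):
        """Roll back to the last checkpoint, returning the physical
        registers allocated since then to the free list."""
        snap = self.checkpoints.pop()
        for a, phys in enumerate(self.mapping):
            if phys != snap[a]:
                self.free.append(phys)
        self.mapping = snap
```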

    Register renaming scheme with checkpoint repair in a processing device
    53.
    Granted Patent
    Register renaming scheme with checkpoint repair in a processing device (In Force)

    Publication Number: US09170818B2

    Publication Date: 2015-10-27

    Application Number: US13094110

    Filing Date: 2011-04-26

    Applicant: Thang M. Tran

    Inventor: Thang M. Tran

    CPC classification number: G06F9/384 G06F9/3838 G06F9/3851 G06F9/3863

    Abstract: A data processing device maintains register map information that maps accesses to architectural registers, as identified by instructions being executed, to physical registers of the data processing device. In response to determining that an instruction, such as a speculatively-executing conditional branch, indicates a checkpoint, the data processing device stores the register map information for subsequent retrieval depending on the resolution of the instruction. In addition, in response to the checkpoint indication the data processing device generates new register map information such that accesses to the architectural registers are mapped to different physical registers. The data processing device maintains a list, referred to as a free register list, of physical registers available to be mapped to an architectural register.
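
    The branch-driven flow, checkpoint on a speculative branch, rename past it, then either discard or restore the saved map when the branch resolves, can be sketched as follows. This is an illustrative Python model, not the patent's implementation; BranchCheckpointer, the per-branch tag, and the dict-based map are assumptions of the example.

```python
class BranchCheckpointer:
    """Per-branch checkpoint of the register map; the snapshot is
    restored only if the speculative branch resolves as mispredicted."""
    def __init__(self, num_arch, num_phys):
        self.map = {a: a for a in range(num_arch)}   # arch -> phys
        self.free = list(range(num_arch, num_phys))  # free register list
        self.saved = {}                              # branch tag -> snapshot

    def on_branch(self, tag):
        self.saved[tag] = dict(self.map)   # checkpoint indication

    def rename(self, arch_reg):
        """Map the architectural register to a different physical register."""
        self.map[arch_reg] = self.free.pop(0)
        return self.map[arch_reg]

    def resolve(self, tag, mispredicted):
        snap = self.saved.pop(tag)
        if mispredicted:
            for a, p in self.map.items():   # reclaim wrong-path registers
                if p != snap[a]:
                    self.free.append(p)
            self.map = snap                 # checkpoint repair
```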


    Method and Apparatus for Dynamic Resource Partition in Simultaneous Multi-Thread Microprocessor
    54.
    Patent Application
    Method and Apparatus for Dynamic Resource Partition in Simultaneous Multi-Thread Microprocessor (In Force)

    Publication Number: US20150100965A1

    Publication Date: 2015-04-09

    Application Number: US14046438

    Filing Date: 2013-10-04

    Applicant: Thang M. Tran

    Inventor: Thang M. Tran

    Abstract: A method includes, in one implementation, receiving a first set of instructions of a first thread, receiving a second set of instructions of a second thread, and allocating queues to the instructions from the first and second sets. While the first and second threads are being processed simultaneously, a changeable number of queues can be allocated to the first thread based on factors such as the first and/or second thread's requirements or priorities, while maintaining a minimum specified number of queues allocated to the first and/or second thread. When needed, one thread may be stalled so that at least the minimum number of queues remains reserved for the other thread while attempting to satisfy thread-priority requests or queue-requirement requests.
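
    The partition policy, give a thread what it demands, clamped so the other thread always keeps its reserved minimum, reduces to a small calculation. This sketch is illustrative, not from the patent; allocate_queues and its parameter names are assumptions of the example.

```python
def allocate_queues(total, min_t0, min_t1, t0_demand):
    """Split `total` queues between two simultaneously-running threads.

    Thread 0 receives its demanded count, clamped so that thread 1 always
    keeps at least `min_t1` queues and thread 0 at least `min_t0`.
    Returns (queues for thread 0, queues for thread 1).
    """
    assert min_t0 + min_t1 <= total, "minimums must fit in the total"
    t0 = max(min_t0, min(t0_demand, total - min_t1))
    return t0, total - t0
```

    A demand that would starve the other thread is clamped rather than granted; in hardware the over-demanding thread would be stalled until queues drain back to it.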


    SPACE EFFICIENT CHECKPOINT FACILITY AND TECHNIQUE FOR PROCESSOR WITH INTEGRALLY INDEXED REGISTER MAPPING AND FREE-LIST ARRAYS
    55.
    Patent Application
    SPACE EFFICIENT CHECKPOINT FACILITY AND TECHNIQUE FOR PROCESSOR WITH INTEGRALLY INDEXED REGISTER MAPPING AND FREE-LIST ARRAYS (In Force)

    Publication Number: US20140040595A1

    Publication Date: 2014-02-06

    Application Number: US13564490

    Filing Date: 2012-08-01

    Applicant: Thang M. Tran

    Inventor: Thang M. Tran

    Abstract: A processor may efficiently implement register renaming and checkpoint repair even in instruction set architectures with large numbers of wide (bit-width) registers by (i) renaming all destination operand register targets, (ii) implementing free list and architectural-to-physical mapping table as a combined array storage with unitary (or common) read, write and checkpoint pointer indexing and (iii) storing checkpoints as snapshots of the mapping table, rather than of actual register contents. In this way, uniformity (and timing simplicity) of the decode pipeline may be accentuated and architectural-to-physical mappings (or allocable mappings) may be efficiently shuttled between free-list, reorder buffer and mapping table stores in correspondence with instruction dispatch and completion as well as checkpoint creation, retirement and restoration.


    DATA PROCESSING SYSTEM OPERABLE IN SINGLE AND MULTI-THREAD MODES AND HAVING MULTIPLE CACHES AND METHOD OF OPERATION
    56.
    Patent Application
    DATA PROCESSING SYSTEM OPERABLE IN SINGLE AND MULTI-THREAD MODES AND HAVING MULTIPLE CACHES AND METHOD OF OPERATION (In Force)

    Publication Number: US20130212585A1

    Publication Date: 2013-08-15

    Application Number: US13370420

    Filing Date: 2012-02-10

    Applicant: Thang M. Tran

    Inventor: Thang M. Tran

    Abstract: In some embodiments, a data processing system includes a processing unit, a first load/store unit LSU and a second LSU configured to operate independently of the first LSU in single and multi-thread modes. A first store buffer is coupled to the first and second LSUs, and a second store buffer is coupled to the first and second LSUs. The first store buffer is used to execute a first thread in multi-thread mode. The second store buffer is used to execute a second thread in multi-thread mode. The first and second store buffers are used when executing a single thread in single thread mode.
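
    The buffer-ownership rule, one store buffer per thread in multi-thread mode, both buffers serving the single thread in single-thread mode, can be sketched as follows. This Python model is illustrative and not from the patent; DualStoreBufferCore and the address-parity interleave used in single-thread mode are assumptions of the example.

```python
class DualStoreBufferCore:
    """Two store buffers shared by two load/store units.

    In multi-thread mode each thread owns one buffer; in single-thread
    mode the lone thread uses both buffers.
    """
    def __init__(self):
        self.buffers = ([], [])   # store buffer 0, store buffer 1

    def issue_store(self, thread, addr, data, multi_thread):
        if multi_thread:
            buf = self.buffers[thread]     # buffer dedicated to this thread
        else:
            buf = self.buffers[addr % 2]   # single thread spreads over both
        buf.append((addr, data))
```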


    DATA PROCESSING SYSTEM OPERABLE IN SINGLE AND MULTI-THREAD MODES AND HAVING MULTIPLE CACHES AND METHOD OF OPERATION
    57.
    Patent Application
    DATA PROCESSING SYSTEM OPERABLE IN SINGLE AND MULTI-THREAD MODES AND HAVING MULTIPLE CACHES AND METHOD OF OPERATION (In Force)

    Publication Number: US20130046936A1

    Publication Date: 2013-02-21

    Application Number: US13213387

    Filing Date: 2011-08-19

    Applicant: Thang M. Tran

    Inventor: Thang M. Tran

    Abstract: Systems and methods are disclosed for a computer system that includes a first load/store execution unit 210a, a first Level 1 L1 data cache unit 216a coupled to the first load/store execution unit, a second load/store execution unit 210b, and a second L1 data cache unit 216b coupled to the second load/store execution unit. Some instructions are directed to the first load/store execution unit and other instructions are directed to the second load/store execution unit when executing a single thread of instructions.
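
    Splitting a single thread's memory instructions across two load/store units, each with its own L1 data cache, can be pictured with a small model. This is an illustrative sketch, not the patent's steering logic: DualCacheCore and the address-parity routing policy are assumptions chosen only to make the example concrete.

```python
class DualCacheCore:
    """Two load/store execution units, each paired with its own L1 data
    cache; in single-thread mode instructions are split between them."""
    def __init__(self):
        self.l1_caches = ({}, {})   # one dict-modeled L1 cache per unit

    def execute(self, op, addr, data=None):
        unit = addr % 2             # illustrative steering: address parity
        cache = self.l1_caches[unit]
        if op == 'store':
            cache[addr] = data
            return unit, None
        return unit, cache.get(addr)   # load: value or None on miss
```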


    Branch selectors associated with byte ranges within an instruction cache for rapidly identifying branch predictions
    58.
    Granted Patent
    Branch selectors associated with byte ranges within an instruction cache for rapidly identifying branch predictions (In Force)

    Publication Number: US06279107B1

    Publication Date: 2001-08-21

    Application Number: US09654843

    Filing Date: 2000-09-02

    Applicant: Thang M. Tran

    Inventor: Thang M. Tran

    CPC classification number: G06F9/30054 G06F9/3806 G06F9/3844

    Abstract: A branch prediction unit stores a set of branch selectors corresponding to each of a group of contiguous instruction bytes stored in an instruction cache. Each branch selector identifies the branch prediction to be selected if a fetch address corresponding to that branch selector is presented. In order to minimize the number of branch selectors stored for a group of contiguous instruction bytes, the group is divided into multiple byte ranges. The largest byte range may include a number of bytes comprising the shortest branch instruction in the instruction set (exclusive of the return instruction). For example, the shortest branch instruction may be two bytes in one embodiment. Therefore, the largest byte range is two bytes in the example. Since the branch selectors as a group change value (i.e. indicate a different branch instruction) only at the end byte of a predicted-taken branch instruction, fewer branch selectors may be stored than the number of bytes within the group.
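
    One way to see why fewer selectors than bytes suffice: selector values only change after the end byte of a predicted-taken branch, so one selector per two-byte range loses no information when the shortest branch is two bytes. The sketch below is an interpretation built for illustration, not the patent's encoding; build_selectors, the selector value 0 for "sequential", and value n for "n-th taken branch" are assumptions of the example.

```python
def build_selectors(group_size, byte_range, taken_branch_ends):
    """One selector per byte range of a contiguous instruction-byte group.

    A fetch address landing in a range selects the first predicted-taken
    branch whose end byte is at or after that range's start (selector n
    for the n-th branch), or 0 for the sequential prediction.
    """
    selectors = []
    for start in range(0, group_size, byte_range):
        sel = 0   # default: sequential prediction
        for n, end in enumerate(sorted(taken_branch_ends), 1):
            if start <= end:      # selector changes only past a branch end
                sel = n
                break
        selectors.append(sel)
    return selectors
```

    For an 8-byte group with two-byte ranges, only four selectors are stored, half the count a per-byte scheme would need.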


    Reorder buffer configured to allocate storage for instruction results corresponding to predefined maximum number of concurrently receivable instructions independent of a number of instructions received
    59.
    Granted Patent
    Reorder buffer configured to allocate storage for instruction results corresponding to predefined maximum number of concurrently receivable instructions independent of a number of instructions received (In Force)

    Publication Number: US06237082B1

    Publication Date: 2001-05-22

    Application Number: US09643591

    Filing Date: 2000-08-22

    Abstract: A reorder buffer is configured into multiple lines of storage, wherein a line of storage includes sufficient storage for instruction results regarding a predefined maximum number of concurrently dispatchable instructions. A line of storage is allocated whenever one or more instructions are dispatched. A microprocessor employing the reorder buffer is also configured with fixed, symmetrical issue positions. The symmetrical nature of the issue positions may increase the average number of instructions to be concurrently dispatched and executed by the microprocessor. The average number of unused locations within the line decreases as the average number of concurrently dispatched instructions increases. One particular implementation of the reorder buffer includes a future file. The future file comprises a storage location corresponding to each register within the microprocessor. The reorder buffer tag (or instruction result, if the instruction has executed) of the last instruction in program order to update the register is stored in the future file. The reorder buffer provides the value (either reorder buffer tag or instruction result) stored in the storage location corresponding to a register when the register is used as a source operand for another instruction. Another advantage of the future file for microprocessors which allow access and update to portions of registers is that narrow-to-wide dependencies are resolved upon completion of the instruction which updates the narrower register.
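
    The line-oriented allocation and the future file can be sketched together. This Python model is illustrative, not the patent's design: ReorderBuffer, the ('tag', line, slot) encoding, and dispatch returning the line number are assumptions of the example. Note that a full line of slots is allocated even when fewer instructions dispatch, which is exactly the behavior the abstract describes.

```python
class ReorderBuffer:
    """Line-oriented ROB: each dispatch allocates one full line of
    `line_width` result slots, plus a future file of per-register tags."""
    def __init__(self, line_width, num_regs):
        self.line_width = line_width
        self.lines = []                         # each line: list of slots
        self.future_file = [None] * num_regs    # reg -> latest tag (or None)

    def dispatch(self, dest_regs):
        """Dispatch up to line_width instructions; allocate a whole line."""
        assert len(dest_regs) <= self.line_width
        line_no = len(self.lines)
        self.lines.append([None] * self.line_width)   # full line, always
        for slot, reg in enumerate(dest_regs):
            # future file records the last in-program-order writer's tag
            self.future_file[reg] = ('tag', line_no, slot)
        return line_no

    def read_operand(self, reg):
        """Source-operand lookup: tag of the pending writer, else None,
        meaning the value lives in the register file."""
        return self.future_file[reg]
```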


    Dependency table for reducing dependency checking hardware
    60.
    Granted Patent
    Dependency table for reducing dependency checking hardware (In Force)

    Publication Number: US06209084B1

    Publication Date: 2001-03-27

    Application Number: US09566216

    Filing Date: 2000-05-05

    Abstract: A dependency table stores a reorder buffer tag for each register. When operand fetch is performed for a set of concurrently decoded instructions, dependency checking is performed including checking for dependencies between the set of concurrently decoded instructions as well as accessing the dependency table to select the reorder buffer tag stored therein. Either the reorder buffer tag of one of the concurrently decoded instructions, the reorder buffer tag stored in the dependency table, the instruction result corresponding to the stored reorder buffer tag, or the value from the register itself is forwarded as the source operand for the instruction. The dependency table stores the width of the register being updated. Prior to forwarding the reorder buffer tag stored within the dependency table, the width stored therein is compared to the width of the source operand being requested. If a narrow-to-wide dependency is detected the instruction is stalled until the instruction indicated in the dependency table retires.
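
    The table lookup, including the width comparison that catches narrow-to-wide dependencies, can be sketched as follows. This is an illustrative Python model, not the patent's hardware; DependencyTable and the string-tagged return values ('forward_tag', 'stall', 'register_file') are assumptions of the example.

```python
class DependencyTable:
    """Per-register entry: (reorder-buffer tag, width of pending update)."""
    def __init__(self, num_regs):
        self.entries = [None] * num_regs   # None = value is in the register file

    def record_update(self, reg, rob_tag, width):
        """An instruction updating `reg` stores its tag and update width."""
        self.entries[reg] = (rob_tag, width)

    def fetch_operand(self, reg, needed_width):
        """Operand fetch: forward the stored tag unless the pending update
        is narrower than the width requested (narrow-to-wide dependency)."""
        entry = self.entries[reg]
        if entry is None:
            return ('register_file', reg)
        tag, width = entry
        if width < needed_width:
            return ('stall', tag)   # wait until the tagged instruction retires
        return ('forward_tag', tag)
```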

