System and method for instruction memory storage and processing based on backwards branch control information
    1.
    Granted invention patent
    Legal status: Lapsed

    Publication No.: US07130963B2

    Publication date: 2006-10-31

    Application No.: US10620734

    Filing date: 2003-07-16

    IPC classification: G06F12/00

    CPC classification: G06F9/381

    Abstract: A system for instruction memory storage and processing in a computing device having a processor, the system is based on backwards branch control information and comprises a dynamic loop buffer (DLB) which is a tagless array of data organized as a direct-mapped structure; a DLB controller having a primary memory unit partitioned into a plurality of banks for controlling the state of the instruction memory system and accepting a program counter address as an input, the DLB controller outputs distinct signals. The system further comprises an address register located in the memory of the computing device, it is a staging register for the program counter address and an instruction fetch process that takes two cycles of the processor clock; and a bank select unit for serving as a program counter address decoder to accept the program counter address and to output a bank enable signal for selecting a bank in a primary memory unit, and a decoded address for access within the selected bank.
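
The bank select unit described in this abstract can be pictured with a minimal Python sketch. The bank count, bank depth, and instruction width below are illustrative assumptions, and `bank_select` is a hypothetical name; the abstract gives none of these details.

```python
# All three sizes are illustrative assumptions, not values from the patent.
NUM_BANKS = 4
WORDS_PER_BANK = 256
WORD_BYTES = 4


def bank_select(pc: int) -> tuple[int, int]:
    """Decode a program counter address.

    Returns (one-hot bank enable signal, decoded address within that bank).
    """
    # Drop the byte offset, then wrap into the range covered by all banks.
    word_index = (pc // WORD_BYTES) % (NUM_BANKS * WORDS_PER_BANK)
    bank = word_index // WORDS_PER_BANK
    offset = word_index % WORDS_PER_BANK
    return 1 << bank, offset
```

Only the enabled bank would be accessed for a given fetch; how the address bits are actually split between bank and offset is a design choice the abstract does not specify.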

    Method and apparatus for history-based movement of shared-data in coherent cache memories of a multiprocessor system using push prefetching
    2.
    Granted invention patent
    Legal status: In force

    Publication No.: US06711651B1

    Publication date: 2004-03-23

    Application No.: US09655642

    Filing date: 2000-09-05

    IPC classification: G06F13/00

    Abstract: A method and apparatus are provided for moving at least one of instructions and operand data throughout a plurality of caches included in a multiprocessor computer system, wherein each of the plurality of caches is included in one of a plurality of processing nodes of the system so as to provide history-based movement of shared-data in coherent cache memories. A plurality of entries are stored in a consume after produce (CAP) table attached to each of the plurality of caches. Each of the entries is associated with a plurality of storage elements in one of the plurality of caches and includes information of prior usage of the plurality of storage elements by each of the plurality of processing nodes. Upon a miss by a processing node to a cache included therein, any storage elements that caused the miss are transferred to the cache from one of main memory and another cache. An entry is created in the table that is associated with the storage elements that caused the miss. A push prefetching engine may be used to create the entry.
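
A rough Python sketch of a consume after produce (CAP) table follows. The abstract only says that an entry is created on a miss and that entries record prior usage per node; the idea that a producer later pushes a line to the recorded consumers is an assumption drawn from the title, and `CapTable`, `record_miss`, and `push_targets` are hypothetical names.

```python
from collections import defaultdict


class CapTable:
    """Toy usage-history table: line address -> nodes that missed on it."""

    def __init__(self):
        self.consumers = defaultdict(set)

    def record_miss(self, node: int, line: int) -> None:
        # Per the abstract, a miss creates (or updates) the entry for the line.
        self.consumers[line].add(node)

    def push_targets(self, producer: int, line: int) -> list[int]:
        # Assumption: a push goes to every recorded consumer except the producer.
        return sorted(self.consumers.get(line, set()) - {producer})
```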

    Method and apparatus for memory prefetching based on intra-page usage history
    3.
    Granted invention patent
    Legal status: In force

    Publication No.: US06678795B1

    Publication date: 2004-01-13

    Application No.: US09639263

    Filing date: 2000-08-15

    IPC classification: G06F12/08

    CPC classification: G06F12/0862 G06F2212/6024

    Abstract: There is provided a method for fetching at least one of instructions and operand data from a second memory into a first memory of a computer system having at least one processor. The method includes the step of storing a plurality of entries in a table associated with the first memory. Each entry is associated with a memory page that includes a plurality of storage elements in the second memory, and includes information of prior access by the at least one processor to each of the plurality of storage elements. Upon a miss to the first memory from the at least one processor based upon a request, the table is searched for a given entry associated with a given page that includes a target of the request. If the given entry is found, then at least one prefetch request is generated to fetch at least one storage element included in the given page from the second memory to the first memory, based upon given information comprised in the given entry.
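
The table lookup in this abstract can be sketched in Python as a per-page bitmap of previously accessed lines. The page and line sizes, the bitmap encoding, and the names `PageHistoryTable`, `record_access`, and `prefetches_on_miss` are all assumptions made for illustration; the abstract only requires that each entry hold per-element prior-access information.

```python
PAGE_BYTES = 4096  # assumed page size
LINE_BYTES = 128   # assumed storage-element (line) size


class PageHistoryTable:
    def __init__(self):
        self.entries = {}  # page number -> bitmap of lines accessed before

    def record_access(self, addr: int) -> None:
        page, offset = divmod(addr, PAGE_BYTES)
        bit = 1 << (offset // LINE_BYTES)
        self.entries[page] = self.entries.get(page, 0) | bit

    def prefetches_on_miss(self, addr: int) -> list[int]:
        """On a miss, return addresses of other lines in the page used before."""
        page, offset = divmod(addr, PAGE_BYTES)
        bitmap = self.entries.get(page)
        if bitmap is None:
            return []  # no entry for this page: no prefetch requests
        target = offset // LINE_BYTES
        return [page * PAGE_BYTES + line * LINE_BYTES
                for line in range(PAGE_BYTES // LINE_BYTES)
                if (bitmap >> line) & 1 and line != target]
```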

    Method and apparatus for reducing logic activity in a microprocessor using reduced bit width slices that are enabled or disabled depending on operation width
    4.
    Granted invention patent
    Legal status: Lapsed

    Publication No.: US06948051B2

    Publication date: 2005-09-20

    Application No.: US09855241

    Filing date: 2001-05-15

    Abstract: A method and apparatus for reducing logic activity in a microprocessor which examines every instruction before it is executed and determines in advance the minimum appropriate datapath width (in byte or half-word quantities) necessary to accurately execute the operation. Achieving this requires two major enhancements to a traditional microprocessor pipeline. First, extra logic (potentially an extra pipeline stage for determining an operation's effective bit width—the WD width detection logic) is introduced between the Decode and Execution stages. Second, the traditional Execution stage architecture (including a register file RF and the arithmetic logical unit ALU), instead of being organized as one continuous 32-bit unit, is organized as a collection of multiple slices, where a slice can be of an 8-bit (a byte) or a 16-bit (double byte) granularity. Each slice in this case can operate independently of each other slice, and includes portion of the register file, functional unit and cache memory. Concatenating a multiple number of these slices together creates a required full width processor.
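
As a software illustration of width detection, the Python sketch below counts how many 8-bit or 16-bit slices a signed 32-bit value occupies, and how many an addition would need. Treating operands as two's-complement values and sizing an add by its operands and result are assumptions; `to_signed`, `slices_needed`, and `slices_for_add` are hypothetical names, and the patent's WD logic is hardware whose rules the abstract does not spell out.

```python
WORD_BITS = 32


def to_signed(value: int) -> int:
    """Interpret the low 32 bits of value as a two's-complement integer."""
    value &= (1 << WORD_BITS) - 1
    return value - (1 << WORD_BITS) if value >> (WORD_BITS - 1) else value


def slices_needed(value: int, slice_bits: int = 8) -> int:
    """Smallest number of slices whose sign extension reproduces value."""
    value = to_signed(value)
    for n in range(1, WORD_BITS // slice_bits):
        bits = n * slice_bits
        if -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
            return n
    return WORD_BITS // slice_bits  # full width


def slices_for_add(a: int, b: int, slice_bits: int = 8) -> int:
    """Slices to enable so that both operands and the sum are represented."""
    return max(slices_needed(x, slice_bits) for x in (a, b, a + b))
```

For example, adding 100 and 100 needs two byte slices because the sum, 200, no longer fits in a signed byte.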

    PACKED LOAD/STORE WITH GATHER/SCATTER
    5.
    Invention patent application
    Legal status: Under examination (published)

    Publication No.: US20140040599A1

    Publication date: 2014-02-06

    Application No.: US13569363

    Filing date: 2012-08-08

    IPC classification: G06F9/30 G06F9/312

    CPC classification: G06F9/30043 G06F9/30036

    Abstract: Embodiments relate to packed loading and storing of data. An aspect includes a system for packed loading and storing of distributed data. The system includes memory and a processing element configured to communicate with the memory. The processing element is configured to perform a method including fetching and decoding an instruction for execution by the processing element. A plurality of individually addressable data elements is gathered from non-contiguous locations in the memory which are narrower than a nominal width of register file elements in the processing element based on the instruction. The processing element packs and loads the data elements into register file elements of a register file entry based on the instruction, such that at least two of the data elements gathered from the non-contiguous locations in the memory are packed and loaded into a single register file element of the register file entry.
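
A packed gather load of this kind can be modeled in a few lines of Python. The 2-byte data elements, 8-byte register file elements, little-endian byte order, and the name `packed_gather` are assumptions for illustration; the abstract fixes none of them.

```python
def packed_gather(memory: bytes, addresses: list[int],
                  elem_bytes: int = 2, reg_elem_bytes: int = 8) -> list[int]:
    """Gather narrow elements from scattered addresses and pack them.

    Each returned integer models one register file element holding
    reg_elem_bytes // elem_bytes gathered data elements.
    """
    per_reg = reg_elem_bytes // elem_bytes
    regs = []
    for i in range(0, len(addresses), per_reg):
        packed = 0
        for lane, addr in enumerate(addresses[i:i + per_reg]):
            value = int.from_bytes(memory[addr:addr + elem_bytes], "little")
            packed |= value << (8 * elem_bytes * lane)  # place into its lane
        regs.append(packed)
    return regs
```

With these defaults, four half-words from four unrelated addresses end up in one 64-bit register file element, which is the "at least two data elements per register file element" property the abstract describes.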

    PACKED LOAD/STORE WITH GATHER/SCATTER
    6.
    Invention patent application
    Legal status: In force

    Publication No.: US20140040596A1

    Publication date: 2014-02-06

    Application No.: US13566141

    Filing date: 2012-08-03

    IPC classification: G06F9/30 G06F9/312

    CPC classification: G06F9/30043 G06F9/30036

    Abstract: Embodiments relate to packed loading and storing of data. An aspect includes a method for packed loading and storing of data distributed in a system that includes memory and a processing element. The method includes fetching and decoding an instruction for execution by the processing element. The processing element gathers a plurality of individually addressable data elements from non-contiguous locations in the memory which are narrower than a nominal width of register file elements in the processing element based on the instruction. The data elements are packed and loaded into register file elements of a register file entry by the processing element based on the instruction, such that at least two of the data elements gathered from the non-contiguous locations in the memory are packed and loaded into a single register file element of the register file entry.

    Cache line replacement techniques allowing choice of LFU or MFU cache line replacement

    Publication No.: US07870341B2

    Publication date: 2011-01-11

    Application No.: US12130245

    Filing date: 2008-05-30

    IPC classification: G06F12/12

    Abstract: Methods and apparatus allowing a choice of Least Frequently Used (LFU) or Most Frequently Used (MFU) cache line replacement are disclosed. The methods and apparatus determine new state information for at least two given cache lines of a number of cache lines in a cache, the new state information based at least in part on prior state information for the at least two given cache lines. Additionally, when an access miss occurs in one of the at least two given lines, the methods and apparatus (1) select either LFU or MFU replacement criteria, and (2) replace one of the at least two given cache lines based on the new state information and the selected replacement criteria. Additionally, a cache for replacing MFU cache lines is disclosed. The cache additionally comprises MFU circuitry (1) adapted to produce new state information for the at least two given cache lines in response to an access to one of the at least two given cache lines, and (2) when a cache miss occurs in one of the at least two given cache lines, adapted to determine, based on the new state information, which of the at least two given cache lines is the most frequently used cache line.
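
The selectable LFU/MFU policy can be sketched in Python with one access counter per cache line standing in for the "state information". Plain counters, filling empty ways first, and the names `CacheSet` and `access` are assumptions; the abstract does not say how the state is encoded.

```python
class CacheSet:
    """One cache set whose victim is chosen by LFU or MFU on each miss."""

    def __init__(self, ways: int):
        self.tags = [None] * ways
        self.counts = [0] * ways  # assumed state: access count per line

    def access(self, tag, policy: str = "LFU") -> bool:
        """Return True on a hit, False on a miss (after replacement)."""
        if tag in self.tags:
            self.counts[self.tags.index(tag)] += 1  # update state on access
            return True
        if None in self.tags:
            victim = self.tags.index(None)  # assumption: fill empty ways first
        else:
            pick = min if policy == "LFU" else max
            victim = pick(range(len(self.tags)), key=self.counts.__getitem__)
        self.tags[victim] = tag
        self.counts[victim] = 1
        return False
```

Passing `policy="MFU"` on a miss evicts the line with the highest count, which is the less common choice the patent title highlights.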

    CACHE LINE REPLACEMENT TECHNIQUES ALLOWING CHOICE OF LFU OR MFU CACHE LINE REPLACEMENT

    Publication No.: US20090182951A1

    Publication date: 2009-07-16

    Application No.: US12130245

    Filing date: 2008-05-30

    IPC classification: G06F12/08 G06F12/00

    Abstract: Methods and apparatus allowing a choice of Least Frequently Used (LFU) or Most Frequently Used (MFU) cache line replacement are disclosed. The methods and apparatus determine new state information for at least two given cache lines of a number of cache lines in a cache, the new state information based at least in part on prior state information for the at least two given cache lines. Additionally, when an access miss occurs in one of the at least two given lines, the methods and apparatus (1) select either LFU or MFU replacement criteria, and (2) replace one of the at least two given cache lines based on the new state information and the selected replacement criteria. Additionally, a cache for replacing MFU cache lines is disclosed. The cache additionally comprises MFU circuitry (1) adapted to produce new state information for the at least two given cache lines in response to an access to one of the at least two given cache lines, and (2) when a cache miss occurs in one of the at least two given cache lines, adapted to determine, based on the new state information, which of the at least two given cache lines is the most frequently used cache line.

    Selective bypassing of a multi-port register file
    10.
    Granted invention patent
    Legal status: Lapsed

    Publication No.: US07051186B2

    Publication date: 2006-05-23

    Application No.: US10230492

    Filing date: 2002-08-29

    IPC classification: G06F15/82 G06F9/305

    CPC classification: G06F9/3826 G06F9/30109

    Abstract: A multi-port register file may be selectively bypassed such that any element in a result vector is bypassed to the same index of an input vector of a succeeding operation when the element is requested in the succeeding operation in the same index as it was generated. Alternatively, the results to be placed in a register file may be bypassed to a succeeding operation when the N elements that dynamically compose a vector are requested as inputs to the next operation exactly in the same order as they were generated. That is, for the purposes of bypassing, the N vector elements are treated as a single entity. Similar rules apply for the write-through path.
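
The two bypass conditions in this abstract reduce to simple comparisons, sketched here in Python. Representing each vector by a list of register names, one per element index, and the function names `per_element_bypass` and `whole_vector_bypass` are assumptions for illustration.

```python
def per_element_bypass(prev_dest: list[str], next_src: list[str]) -> list[bool]:
    """First scheme: element i is bypassed only if the next operation
    requests it at the same index i where it was generated."""
    return [d == s for d, s in zip(prev_dest, next_src)]


def whole_vector_bypass(prev_dest: list[str], next_src: list[str]) -> bool:
    """Alternative scheme: bypass only if all N elements are requested
    in exactly the order they were generated (vector as a single entity)."""
    return list(prev_dest) == list(next_src)
```

Note that a result requested at a different index than the one it was generated in (for instance `r2` produced at index 1 but read at index 3) is not bypassed under either rule and would come from the register file instead.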
