Memory accelerator with two instruction set fetch path to prefetch second set while executing first set of number of instructions in access delay to instruction cycle ratio
    1.
    Granted invention patent
    Legal status: in force

    Publication number: US07290119B2

    Publication date: 2007-10-30

    Application number: US10923284

    Filing date: 2004-08-20

    IPC classification: G06F9/28

    Abstract: A memory accelerator module buffers program instructions and/or data for high-speed access using a deterministic access protocol. The program memory is logically partitioned into ‘stripes’, or ‘cyclically sequential’ partitions, and the memory accelerator module includes a latch that is associated with each partition. When a particular partition is accessed, it is loaded into its corresponding latch, and the instructions in the next sequential partition are automatically pre-fetched into their corresponding latch. In this manner, the performance of a sequential-access process will have a known response, because the pre-fetched instructions from the next partition will be in the latch when the program sequences to these instructions. Previously accessed blocks remain in their corresponding latches until the pre-fetch process ‘cycles around’ and overwrites the contents of each sequentially accessed latch. In this manner, the performance of a loop process, with regard to memory access, will be determined solely by the size of the loop. If the loop is below a given size, it will be executable without overwriting existing latches, and therefore will not incur memory access delays as it repeatedly executes instructions contained within the latches. If the loop is above a given size, it will overwrite existing latches containing portions of the loop, and will therefore require the latches to be reloaded on each iteration. Because the pre-fetch is automatic and determined solely by the currently accessed instruction, the complexity and overhead associated with this memory acceleration are minimal.
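    The title of this entry ties the size of each prefetched set of instructions to the ratio of memory access delay to instruction cycle time. The following Python sketch is a simplified timing model built on assumed conditions:

    - one instruction executes per cycle;
    - there is a single memory port;
    - the prefetch of the next line is issued as soon as the current line starts executing.

    The function names and the 50 ns / 20 ns figures are illustrative and are not taken from the patent. Under these assumptions, straight-line code finds each prefetched line already latched once a line holds at least ceil(access delay / instruction cycle) instructions.

```python
def min_line_size(access_ns: int, cycle_ns: int) -> int:
    """Instructions per line needed to hide one memory access:
    ceil(access delay / instruction cycle), in integer arithmetic."""
    return -(-access_ns // cycle_ns)


def sequential_stall_ns(num_lines: int, line_size: int,
                        access_ns: int, cycle_ns: int) -> int:
    """Total stall time for straight-line code spanning `num_lines` lines.

    Assumed model (not the patent's circuit): one instruction per `cycle_ns`,
    a single memory port, and the prefetch of line i+1 issued when
    execution of line i begins.
    """
    exec_ns = line_size * cycle_ns      # time to execute one latched line
    start = access_ns                   # line 0 is a demand fetch issued at t = 0
    stall = access_ns                   # the core waits for that first fetch
    for _ in range(1, num_lines):
        ready_next = start + access_ns  # when the prefetched next line arrives
        done_current = start + exec_ns  # when the core finishes the current line
        stall += max(0, ready_next - done_current)
        start = max(ready_next, done_current)
    return stall


if __name__ == "__main__":
    n = min_line_size(50, 20)
    print(n, sequential_stall_ns(10, n, 50, 20), sequential_stall_ns(10, 2, 50, 20))
```

    With a 50 ns access and a 20 ns cycle, `min_line_size` returns 3. Over 10 lines, a 3-instruction line leaves only the initial 50 ns stall. A 2-instruction line adds 10 ns of stall for every subsequent line, for 140 ns in total.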

    Memory accelerator for ARM processor pre-fetching multiple instructions from cyclically sequential memory partitions
    2.
    Granted invention patent
    Legal status: in force

    Publication number: US06799264B2

    Publication date: 2004-09-28

    Application number: US09788691

    Filing date: 2001-02-20

    IPC classification: G06F9/06

    Abstract: A memory accelerator module buffers program instructions and/or data for high-speed access using a deterministic access protocol. The program memory is logically partitioned into ‘stripes’, or ‘cyclically sequential’ partitions, and the memory accelerator module includes a latch that is associated with each partition. When a particular partition is accessed, it is loaded into its corresponding latch, and the instructions in the next sequential partition are automatically pre-fetched into their corresponding latch. In this manner, the performance of a sequential-access process will have a known response, because the pre-fetched instructions from the next partition will be in the latch when the program sequences to these instructions. Previously accessed blocks remain in their corresponding latches until the pre-fetch process ‘cycles around’ and overwrites the contents of each sequentially accessed latch. In this manner, the performance of a loop process, with regard to memory access, will be determined solely by the size of the loop. If the loop is below a given size, it will be executable without overwriting existing latches, and therefore will not incur memory access delays as it repeatedly executes instructions contained within the latches. If the loop is above a given size, it will overwrite existing latches containing portions of the loop, and will therefore require the latches to be reloaded on each iteration. Because the pre-fetch is automatic and determined solely by the currently accessed instruction, the complexity and overhead associated with this memory acceleration are minimal.
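    A toy Python model can illustrate how loop size alone decides whether a loop stays resident in the latches. It rests on the following assumptions:

    - there are K ≥ 2 latches, each holding one line of N instructions;
    - a one-line-ahead prefetch is skipped when the next line is already latched;
    - the loop starts on a line boundary.

    The class and function names are hypothetical. Under these assumptions, a loop of at most K-1 lines runs with no memory traffic after its first pass. A loop of K or more lines reloads latches on every iteration.

```python
class StripeAccelerator:
    """Toy model of cyclically sequential latches (assumes num_latches >= 2).

    Line `l` (addresses l*N .. l*N + N - 1) can only be held in latch l % K.
    """

    def __init__(self, num_latches: int, line_size: int) -> None:
        self.k = num_latches
        self.n = line_size
        self.latch = [None] * num_latches   # line number currently held per latch
        self.demand_misses = 0              # fetches that found their line absent
        self.line_loads = 0                 # demand loads plus prefetches

    def _load(self, line: int) -> None:
        self.latch[line % self.k] = line    # overwrites whatever that latch held
        self.line_loads += 1

    def fetch(self, addr: int) -> None:
        line = addr // self.n
        if self.latch[line % self.k] != line:
            self.demand_misses += 1         # the core would stall here
            self._load(line)
        nxt = line + 1                      # automatic prefetch of the next stripe
        if self.latch[nxt % self.k] != nxt:
            self._load(nxt)


def loads_per_iteration(loop_lines: int, num_latches: int = 4,
                        line_size: int = 4, iterations: int = 10) -> float:
    """Average line loads per iteration of a loop starting at address 0,
    measured after one warm-up pass."""
    acc = StripeAccelerator(num_latches, line_size)
    body = range(loop_lines * line_size)
    for addr in body:                       # first pass fills the latches
        acc.fetch(addr)
    before = acc.line_loads
    for _ in range(iterations):
        for addr in body:
            acc.fetch(addr)
    return (acc.line_loads - before) / iterations


if __name__ == "__main__":
    for lines in range(1, 6):
        print(lines, loads_per_iteration(lines))
```

    With the defaults K = 4 and N = 4, this sketch reports the following loads per iteration:

    - 0.0 for loops of 1 to 3 lines;
    - 2.0 for a 4-line loop (one demand reload plus one prefetch);
    - 4.0 for a 5-line loop.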

    Cyclically sequential memory prefetch
    3.
    Granted invention patent
    Legal status: in force

    Publication number: US06643755B2

    Publication date: 2003-11-04

    Application number: US09788692

    Filing date: 2001-02-20

    IPC classification: G06F12/00

    Abstract: A memory access architecture and technique employs multiple independent buffers that are configured to store items from memory sequentially. The memory is logically partitioned, and each independent buffer is associated with a corresponding memory partition. The partitioning is cyclically sequential, based on the total number of buffers, K, and the size of the buffers, N. The first N memory locations are allocated to the first partition, the next N memory locations to the second partition, and so on until the Kth partition. The next N memory locations after the Kth partition are allocated to the first partition, the next N locations to the second partition, and so on. When an item is accessed from memory, the buffer corresponding to the item's memory location is loaded from memory, and a prefetch of the next sequential partition commences to load the next buffer. During program execution, the ‘steady state’ of the buffer contents corresponds to a buffer containing the current instruction, one or more buffers containing instructions immediately following the current instruction, and one or more buffers containing instructions immediately preceding the current instruction. This steady-state condition is particularly well suited for executing program loops, continuous sequences of program instructions, and other common program structures. The parameters K and N are selected to accommodate typically sized program loops.
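    The allocation rule in this abstract reduces to integer arithmetic on the address: N consecutive locations go to each partition, wrapping back to the first partition after the Kth. The following Python sketch does two things:

    - it computes the partition for an address;
    - it lists the address range each buffer holds in steady state, assuming straight-line execution and a prefetch of only the next partition.

    The function names and the K = N = 4 values are illustrative assumptions. Addresses here count items (instructions), not bytes.

```python
def partition_of(addr: int, n: int, k: int) -> int:
    """Partition (buffer index) that item address `addr` is allocated to:
    N consecutive items per partition, cycling through K partitions."""
    return (addr // n) % k


def steady_state(current_addr: int, n: int, k: int) -> dict:
    """Address range held by each buffer once straight-line execution reaches
    `current_addr`, assuming a one-group-ahead prefetch and enough earlier
    sequential execution to have filled the remaining buffers."""
    if k < 2:
        raise ValueError("model needs at least two buffers")
    cur = current_addr // n                    # group holding the current item
    if cur < k - 2:
        raise ValueError("not enough earlier groups to reach steady state")
    held = range(cur - (k - 2), cur + 2)       # k-2 preceding, current, next
    return {g % k: range(g * n, (g + 1) * n) for g in held}


if __name__ == "__main__":
    print([partition_of(a, 4, 4) for a in range(0, 40, 4)])
    for buf, addrs in sorted(steady_state(41, 4, 4).items()):
        print(buf, addrs)
```

    For K = N = 4, addresses 0, 4, 8, ... map to partitions 0, 1, 2, 3, 0, 1, and so on. With execution at address 41, the buffers hold the following in this model:

    - buffer 2 holds the current instructions (40 to 43);
    - buffer 3 holds the prefetched ones (44 to 47);
    - buffers 0 and 1 hold the two preceding groups (32 to 39).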
