摘要:
A memory accelerator module buffers program instructions and/or data for high speed access using a deterministic access protocol. The program memory is logically partitioned into ‘stripes’, or ‘cyclically sequential’ partitions, and the memory accelerator module includes a latch that is associated with each partition. When a particular partition is accessed, it is loaded into its corresponding latch, and the instructions in the next sequential partition are automatically pre-fetched into their corresponding latch. In this manner, the performance of a sequential-access process will have a known response, because the pre-fetched instructions from the next partition will be in the latch when the program sequences to these instructions. Previously accessed blocks remain in their corresponding latches until the pre-fetch process ‘cycles around’ and overwrites the contents of each sequentially-accessed latch. In this manner, the performance of a loop process, with regard to memory access, will be determined based solely on the size of the loop. If the loop is below a given size, it will be executable without overwriting existing latches, and therefore will not incur memory access delays as it repeatedly executes instructions contained within the latches. If the loop is above a given size, it will overwrite existing latches containing portions of the loop, and therefore require subsequent re-loadings of the latch with each loop. Because the pre-fetch is automatic, and determined solely on the currently accessed instruction, the complexity and overhead associated with this memory acceleration is minimal.
摘要:
A memory accelerator module buffers program instructions and/or data for high speed access using a deterministic access protocol. The program memory is logically partitioned into ‘stripes’, or ‘cyclically sequential’ partitions, and the memory accelerator module includes a latch that is associated with each partition. When a particular partition is accessed, it is loaded into its corresponding latch, and the instructions in the next sequential partition are automatically pre-fetched into their corresponding latch. In this manner, the performance of a sequential-access process will have a known response, because the pre-fetched instructions from the next partition will be in the latch when the program sequences to these instructions. Previously accessed blocks remain in their corresponding latches until the pre-fetch process ‘cycles around’ and overwrites the contents of each sequentially-accessed latch. In this manner, the performance of a loop process, with regard to memory access, will be determined based solely on the size of the loop. If the loop is below a given size, it will be executable without overwriting existing latches, and therefore will not incur memory access delays as it repeatedly executes instructions contained within the latches. If the loop is above a given size, it will overwrite existing latches containing portions of the loop, and therefore require subsequent re-loadings of the latch with each loop. Because the pre-fetch is automatic, and determined solely on the currently accessed instruction, the complexity and overhead associated with this memory acceleration is minimal.
摘要:
A memory access architecture and technique employs multiple independent buffers that are configured to store items from memory sequentially. The memory is logically partitioned, and each independent buffer is associated with a corresponding memory partition. The partitioning is cyclically sequential, based on the total number of buffers, K, and the size of the buffers, N. The first N memory locations are allocated to the first partition; the next N memory locations to the second partition; and so on until the Kth partition. The next N memory locations, after the Kth partition, are allocated to the first partition; the next N locations are allocated to the second partition; and so on. When an item is accessed from memory, the buffer corresponding to the item's memory location is loaded from memory, and a prefetch of the next sequential partition commences to load the next buffer. During program execution, the ‘steady state’ of the buffer contents corresponds to a buffer containing the current instruction, one or more buffers containing instructions immediately following the current instruction, and one or more buffers containing instructions immediately preceding the current instruction. This steady state condition is particularly well suited for executing program loops, or a continuous sequence of program instructions, and other common program structures. The parameters K and N are selected to accommodate typically sized program loops.