摘要:
A method and apparatus for enhancing/extending a serial point-to-point interconnect architecture, such as Peripheral Component Interconnect Express (PCIe) is herein described. Temporal and locality caching hints and prefetching hints are provided to improve system wide caching and prefetching. Message codes for atomic operations to arbitrate ownership between system devices/resources are included to allow efficient access/ownership of shared data. Loose transaction ordering provided for while maintaining corresponding transaction priority to memory locations to ensure data integrity and efficient memory access. Active power sub-states and setting thereof is included to allow for more efficient power management. And, caching of device local memory in a host address space, as well as caching of system memory in a device local memory address space is provided for to improve bandwidth and latency for memory accesses.
摘要:
A method and apparatus for enhancing/extending a serial point-to-point interconnect architecture, such as Peripheral Component Interconnect Express (PCIe) is herein described. Temporal and locality caching hints and prefetching hints are provided to improve system wide caching and prefetching. Message codes for atomic operations to arbitrate ownership between system devices/resources are included to allow efficient access/ownership of shared data. Loose transaction ordering provided for while maintaining corresponding transaction priority to memory locations to ensure data integrity and efficient memory access. Active power sub-states and setting thereof is included to allow for more efficient power management. And, caching of device local memory in a host address space, as well as caching of system memory in a device local memory address space is provided for to improve bandwidth and latency for memory accesses.
摘要:
A method of and apparatus for associating units of data with threads of a multi-threaded processor for processing, and enabling each thread to perform processing for at least two of the data units during a thread execution period. The thread execution period is divided among phases, and each of the data units processed by a thread is processed by a different one of the phases.
摘要:
A method and apparatus for enhancing/extending a serial point-to-point interconnect architecture, such as Peripheral Component Interconnect Express (PCIe) is herein described. Temporal and locality caching hints and prefetching hints are provided to improve system wide caching and prefetching. Message codes for atomic operations to arbitrate ownership between system devices/resources are included to allow efficient access/ownership of shared data. Loose transaction ordering provided for while maintaining corresponding transaction priority to memory locations to ensure data integrity and efficient memory access. Active power sub-states and setting thereof is included to allow for more efficient power management. And, caching of device local memory in a host address space, as well as caching of system memory in a device local memory address space is provided for to improve bandwidth and latency for memory accesses.
摘要:
Techniques that may be utilized in a multiprocessor system are described. In one embodiment, one or more signals are generated to indicate that a breakpoint instruction is executed by one of the plurality of processors in the multiprocessor system.
摘要:
A scalable, two-stage rotating priority arbiter with re-circulation and bounded latency for use in multi-threaded, multi-processing devices. An apparatus implementing the two-stage arbiter includes a plurality of masters configured in a plurality of clusters, a plurality of targets, and an chassis interconnect that may be controlled to selectively connects a given master to a given target. The chassis interconnect includes multiple sets of bus lines connected between the plurality of clusters and the plurality of targets forming a cross-bar interconnect, including sets of bus lines corresponding to a command bus. A two-stage arbitration scheme is employed to arbitrate access to the command bus. The first arbitration stage is used to arbitrate between target requests issued by masters in a given cluster. The second arbitration stage is used to arbitrate between winning first-stage target requests. One embodiment of the arbitration scheme employs a rotating priority arbitration scheme at the first stage. Another embodiment employs a complementary rotating priority arbitration scheme at the second stage.
摘要:
Method and apparatus to support expansion of compute engine code space by sharing adjacent control stores using interleaved addressing schemes. Instructions corresponding to an original instruction thread are partitioned into multiple interleaved sequences that are stored in respective control stores. During thread execution, instructions are retrieved from the control stores in a repeated order based on the interleaving scheme. For example, in one embodiment two compute engines share two control stores. Thus, instructions for a given thread are sequentially loaded from the control stores in an alternating manner. In another embodiment, four control stores are shared by four compute engines. In this case, the instructions in a thread are interleave using four stores, and each store is accessed every fourth instruction in the code sequence. Schemes are also provided for handling branching operations to maintain synchronized access to the control stores.
摘要:
A method, apparatus, and system for implementing off-chip cache memory in dual-use static random access memory (SRAM) memory for network processors. An off-chip SRAM memory store is partitioned into a resizable cache region and general-purpose use region (i.e., conventional SRAM use). The cache region is used to store cached data corresponding to portions of data contained in a second off-chip memory store, such as a dynamic RAM (DRAM) memory store or an alternative type of memory store, such as a Rambus DRAM (RDRAM) memory store. An on-chip cache management controller is integrated on the network processor. Various cache management schemes are disclosed, including hardware-based cache tag arrays, memory-based cache tag arrays, content-addressable memory (CAM)-based cache management, and memory address-to-cache line lookup schemes. Under one scheme, multiple network processors are enabled to access shared SRAM and shared DRAM, wherein a portion of the shared SRAM is used as a cache for the shared DRAM.
摘要:
The disclosure includes a description of a content addressable memory (CAM) that includes at least one tag input, at least one output, and at least one random access memory. The CAM includes circuitry to perform multiple read operations of the at least one random access memory with different ones of the read operations specifying an address being based on different subsets of tag bits. Based on the multiple read operations, the CAM generates at least one signal via the at least one output.