摘要:
A context forwarding bus efficiently communicates control and data between processing elements in a processor unit having a plurality of processing elements. Control and data information is transferred over a first bus from processing element to processing element.
摘要:
A method according to one embodiment may include performing one or more fetch operations to retrieve one or more instructions from a program memory; scheduling a write instruction to write data from at least one data register into the program memory; and stealing one or more cycles from one or more of the fetch operations to write the data in the at least one data register into the program memory. Of course, many alternatives, variations, and modifications are possible without departing from this embodiment.
摘要:
Systems and methods are disclosed for aligning data in memory access and other applications. In one embodiment a system is provided that includes a memory unit, a shifter, and control logic operable to route data from the memory unit to the shifter and to send an indication to the shifter of an amount by which the data is to be shifted. In one embodiment, the control logic provides support for speculative execution. The control logic may also permit multiplexing of big endian and little endian data alignment operations, and/or multiplexing of data alignment operations with non-data alignment operations. In one embodiment, the memory unit, shifter, and control logic are integrated within a processing unit, such as a microengine in a network processor.
摘要:
In general, in one aspect, the disclosure describes storing identification of one or more memory buckets associated with different, respective, queued write commands, and, based on the stored identification, determining whether at least one bucket associated with a read command is included in one or more buckets associated with at least one queued write command.
摘要:
Method and apparatus to enable slower memory, such as dynamic random access memory (DRAM)-based memory, to support low-latency access using vertical caching. Related function metadata used for packet-processing functions, including metering and flow statistics, is stored in an external DRAM-based store. In one embodiment, the DRAM comprises double data-rate (DDR) DRAM. A network processor architecture is disclosed including a DDR assist with data cache coupled to a DRAM controller. The architecture further includes multiple compute engines used to execute various packet-processing functions. One such function is a DDR assist function that is used to pre-fetch a set of function metadata for a current packet and store the function metadata in the data cache. Subsequently, one or more packet-processing functions may operate on the function metadata by accessing it from the cache. After the functions are completed, the function metadata are written back to the DRAM-based store. The scheme provides similar performance to SRAM-based schemes, but uses much cheaper DRAM-type memory.
摘要:
A method and apparatus forenhancing/extending a serial point-to-point interconnect architecture, such as Peripheral Component Interconnect Express (PCIe) is herein described. Temporal and locality caching hints and prefetching hints are provided to improve system wide caching and prefetching. Message codes for atomic operations to arbitrate ownership between system devices/resources are included to allow efficient access/ownership of shared data. Loose transaction ordering provided for while maintaining corresponding transaction priority to memory locations to ensure data integrity and efficient memory access. Active power sub-states and setting thereof is included to allow for more efficient power management. And, caching of device local memory in a host address space, as well as caching of system memory in a device local memory address space is provided for to improve bandwidth and latency for memory accesses.
摘要:
A scalable, high-performance interconnect scheme for a multi-threaded, multi-processing system-on-a-chip network processor unit. An apparatus implementing the technique includes a plurality of masters configured in a plurality of clusters, a plurality of targets, and a chassis interconnect that may be controlled to selectively connects a given master to a given target. In one embodiment, the chassis interconnect comprises a plurality of sets of bus lines connected between the plurality of clusters and the plurality of targets forming a cross-bar interconnect, including sets of bus lines corresponding to a command bus, a pull data bus for target writes, and a push data bus for target reads. Multiplexer circuitry for each of the command bus, pull data bus, and push data bus is employed to selectively connect a given cluster to a given target to enable commands and data to be passed between the given cluster and the given target.