Abstract:
A method and apparatus for enhancing/extending a serial point-to-point interconnect architecture, such as Peripheral Component Interconnect Express (PCIe), are described herein. Temporal and locality caching hints and prefetching hints are provided to improve system-wide caching and prefetching. Message codes for atomic operations to arbitrate ownership between system devices/resources are included to allow efficient access to and ownership of shared data. Loose transaction ordering is provided for while maintaining corresponding transaction priority to memory locations to ensure data integrity and efficient memory access. Active power sub-states, and the setting thereof, are included to allow for more efficient power management. In addition, caching of device local memory in a host address space, as well as caching of system memory in a device local memory address space, is provided for to improve bandwidth and latency for memory accesses.
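To make the hint and ordering machinery above concrete, the following C sketch models a request descriptor that carries a caching/prefetch hint, an atomic-operation message code, and a relaxed-ordering flag. This is a minimal sketch: all field names, widths, and encodings are illustrative assumptions, not the actual PCIe TLP format.

    /* Hypothetical sketch only: field names and encodings are illustrative
     * assumptions, not the PCIe specification's actual TLP layout. */
    #include <stdint.h>

    enum cache_hint {            /* assumed temporal/locality hint encodings */
        HINT_NO_TEMPORAL = 0x0,  /* use once, do not cache */
        HINT_TEMPORAL    = 0x1,  /* likely reused soon, cache near requester */
        HINT_PREFETCH    = 0x2   /* prefetch the following lines */
    };

    enum atomic_msg {            /* assumed message codes for atomic ownership ops */
        ATOMIC_FETCH_ADD = 0x40,
        ATOMIC_SWAP      = 0x41,
        ATOMIC_CAS       = 0x42
    };

    struct tlp_request {         /* simplified transaction-layer request */
        uint64_t address;
        uint32_t length;         /* in dwords */
        uint8_t  hint;           /* enum cache_hint */
        uint8_t  msg_code;       /* enum atomic_msg, 0 if not atomic */
        uint8_t  relaxed_order;  /* 1 = loose ordering permitted for this request */
    };

    /* Build a read request that hints the completer to cache the line
     * close to the requesting device. */
    static struct tlp_request make_hinted_read(uint64_t addr, uint32_t dwords)
    {
        struct tlp_request req = {0};
        req.address = addr;
        req.length = dwords;
        req.hint = HINT_TEMPORAL;
        req.relaxed_order = 1;   /* ordering relaxed; priority to the target
                                    memory location is still preserved */
        return req;
    }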
Abstract:
An apparatus and method are described for providing low-latency invocation of accelerators. For example, a processor according to one embodiment comprises: a command register for storing command data identifying a command to be executed; a result register to store a result of the command or data indicating a reason why the command could not be executed; execution logic to execute a plurality of instructions including an accelerator invocation instruction to invoke one or more accelerator commands; and one or more accelerators to read the command data from the command register and responsively attempt to execute the command identified by the command data.
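The command-register/result-register flow can be modeled in software as follows. The register layout, the opcode field, the reason codes, and the xinvoke() stand-in for the accelerator invocation instruction are all assumptions made for illustration, not real ISA or MMIO definitions.

    /* Minimal sketch of the command/result register flow described above.
     * Register addresses, bit layouts, and the xinvoke() wrapper are
     * hypothetical; they are not real ISA or MMIO definitions. */
    #include <stdint.h>
    #include <stdio.h>

    #define RESULT_OK          0u
    #define RESULT_NO_ACCEL    1u   /* assumed "reason" code: no accelerator present */
    #define RESULT_BAD_COMMAND 2u   /* assumed "reason" code: command not recognized */

    static volatile uint64_t command_reg;  /* stands in for the command register */
    static volatile uint64_t result_reg;   /* stands in for the result register   */

    /* Stand-in for the accelerator invocation instruction: the accelerator
     * reads the command data and either executes it or reports why it cannot. */
    static void xinvoke(void)
    {
        uint64_t cmd = command_reg;
        if ((cmd >> 56) != 0x2A)           /* assumed opcode field in bits 63:56 */
            result_reg = RESULT_BAD_COMMAND;
        else
            result_reg = RESULT_OK;        /* accelerator executed the command */
    }

    int main(void)
    {
        command_reg = ((uint64_t)0x2A << 56) | 0x1000; /* opcode + operand pointer */
        xinvoke();
        if (result_reg != RESULT_OK)
            printf("command rejected, reason code %llu\n",
                   (unsigned long long)result_reg);
        else
            printf("command accepted\n");
        return 0;
    }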
Abstract:
Method, apparatus, and systems employing dictionary-based high-bandwidth lossless compression. A pair of dictionaries, having entries that are synchronized and encoded to support compression and decompression operations, is implemented via logic at a compressor and a decompressor. The compressor and decompressor logic operate in a cooperative manner, including implementing the same dictionary update schemes, resulting in the data in the respective dictionaries being synchronized. The dictionaries are also configured with replaceable entries, and replacement policies are implemented based on matching bytes of data within sets of data being transferred over the link. Various schemes are disclosed for entry replacement, as well as a delayed dictionary update technique. The techniques support line-speed compression and decompression using parallel operations, resulting in substantially no latency overhead.
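A minimal C sketch of the synchronized-dictionary idea is shown below: the compressor and decompressor apply the same index function and the same replacement policy, so their dictionaries stay in lock-step. The entry count, token format, and hashing are assumptions made for illustration and do not reflect the actual encodings.

    /* Toy sketch of synchronized compressor/decompressor dictionaries.
     * The entry count, hashing, and token format are assumptions made for
     * illustration; the patented scheme's actual encodings are not shown. */
    #include <stdint.h>
    #include <string.h>

    #define DICT_ENTRIES 256
    #define WORD_BYTES   4

    struct dict { uint8_t entry[DICT_ENTRIES][WORD_BYTES]; };

    static unsigned dict_index(const uint8_t *word)
    {
        /* Both sides use the same index function so updates stay in lock-step. */
        return (word[0] ^ word[1] ^ word[2] ^ word[3]) % DICT_ENTRIES;
    }

    /* Compressor: emit either a 2-byte match token or a literal, and apply
     * the same replacement policy the decompressor will apply. */
    static unsigned compress_word(struct dict *d, const uint8_t *word, uint8_t *out)
    {
        unsigned i = dict_index(word);
        if (memcmp(d->entry[i], word, WORD_BYTES) == 0) {
            out[0] = 0x80; out[1] = (uint8_t)i;           /* match token */
            return 2;
        }
        memcpy(d->entry[i], word, WORD_BYTES);            /* replace entry */
        out[0] = 0x00; memcpy(out + 1, word, WORD_BYTES); /* literal token */
        return 1 + WORD_BYTES;
    }

    /* Decompressor: mirror the same update so the dictionaries never diverge. */
    static void decompress_token(struct dict *d, const uint8_t *in, uint8_t *word)
    {
        if (in[0] == 0x80) {
            memcpy(word, d->entry[in[1]], WORD_BYTES);
        } else {
            memcpy(word, in + 1, WORD_BYTES);
            memcpy(d->entry[dict_index(word)], word, WORD_BYTES);
        }
    }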
Abstract:
An apparatus and method are described for retrieving elements from a linked structure. For example, one embodiment of an apparatus comprises: a decode unit to decode a first instruction, the first instruction to utilize a current address value, an end address value, and an offset value; and an execution unit to execute the first instruction, the execution unit to compare the current address value with the end address value and to perform no additional operation with respect to the first instruction if the current address value is equal to the end address value; and, if the current address value is not equal to the end address value, to add the offset value to the current address value to identify a next address pointer within an element structure and to set the current address value equal to the next address pointer.
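One plausible software model of the instruction's semantics, assuming the "next address pointer" is loaded from memory at (current address + offset), is sketched below in C; the function and type names are illustrative.

    /* Software model of the instruction's semantics as described above:
     * compare the current address with the end address, and if they differ,
     * load the pointer stored at (current + offset). Names are illustrative. */
    #include <stdint.h>
    #include <stddef.h>
    #include <string.h>
    #include <stdio.h>

    struct node { uint64_t payload; struct node *next; };

    /* Returns the updated "current address value"; unchanged when current == end. */
    static uintptr_t next_element(uintptr_t current, uintptr_t end, size_t offset)
    {
        if (current == end)
            return current;                       /* no additional operation */
        uintptr_t next_ptr;                       /* next address pointer within the element */
        memcpy(&next_ptr, (const void *)(current + offset), sizeof(next_ptr));
        return next_ptr;                          /* current := next address pointer */
    }

    int main(void)
    {
        struct node c = { 3, NULL }, b = { 2, &c }, a = { 1, &b };
        uintptr_t cur = (uintptr_t)&a, end = (uintptr_t)NULL;
        size_t off = offsetof(struct node, next);
        while (cur != end) {
            printf("%llu\n", (unsigned long long)((struct node *)cur)->payload);
            cur = next_element(cur, end, off);
        }
        return 0;
    }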
Abstract:
An apparatus and method are described for providing low-latency invocation of accelerators. For example, a system according to one embodiment comprises: a processor including a plurality of simultaneous multithreading (SMT) cores and at least one shared cache circuit to be shared among two or more of the SMT cores, at least one of the SMT cores including at least one level 2 (L2) cache circuit to store both instructions and data and communicatively coupled to an instruction cache circuit and a data cache circuit; a communication interconnect circuit including a peripheral component interconnect express (PCIe) circuit to communicatively couple one or more of the SMT cores to an accelerator device; and a memory access circuit to identify an accelerator context save/restore region in a memory responsive to a context save/restore value, the accelerator context save/restore region to share an accelerator context state.
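The accelerator context save/restore region might be pictured as follows; the register-like ctx_save_restore_value, the struct layout, and the save/restore helpers are purely hypothetical stand-ins for whatever the hardware actually exposes.

    /* Sketch only: models the idea of a context save/restore value that points
     * at a memory region where accelerator context state is saved and restored.
     * The value's interpretation, layout, and field names are assumptions. */
    #include <stdint.h>
    #include <string.h>

    struct accel_context {         /* assumed layout of the shared context state */
        uint64_t control;
        uint64_t progress_counter;
        uint8_t  scratch[256];
    };

    /* The "context save/restore value": in this model, simply the base address
     * of the region, programmed by system software before any save/restore.
     * A real design might pack enable bits or a size field as well. */
    static uintptr_t ctx_save_restore_value;

    static void accel_save_context(const struct accel_context *live)
    {
        struct accel_context *region = (struct accel_context *)ctx_save_restore_value;
        memcpy(region, live, sizeof(*region));   /* accelerator state -> memory */
    }

    static void accel_restore_context(struct accel_context *live)
    {
        const struct accel_context *region =
            (const struct accel_context *)ctx_save_restore_value;
        memcpy(live, region, sizeof(*live));     /* memory -> accelerator state */
    }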
Abstract:
A system on a chip (SoC) is provided including processing cores and a root complex. Transaction requests are communicated between a root port of the root complex and a device, the root port including electrical idle (EI) exit detect circuitry and a reference clock source. The root port supports a first link state, in which the reference clock source and EI exit detect circuitry of the root port are disabled but a common mode voltage is maintained, and a second link state, in which the reference clock source and EI exit detect circuitry are disabled and the common mode voltage is not maintained. The root port transitions to the first link state based on a service latency requirement of the device being less than a threshold, and to the second link state based on the service latency requirement being greater than or equal to the threshold.
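A rough sketch of the state-selection logic, under the assumption that the two link states loosely mirror PCIe L1 substates (common mode kept vs. dropped), could look like this in C; the state names and threshold plumbing are assumptions.

    /* Illustrative decision logic for the two link states described above.
     * The state names L1_KEEP_CM / L1_DROP_CM and the threshold plumbing are
     * assumptions; they loosely mirror PCIe L1 substates (L1.1 / L1.2). */
    #include <stdint.h>
    #include <stdbool.h>

    enum link_state {
        L1_KEEP_CM,   /* refclk and EI exit detect off, common mode kept    */
        L1_DROP_CM    /* refclk and EI exit detect off, common mode dropped */
    };

    struct root_port {
        uint32_t latency_threshold_us;   /* programmed per platform */
        bool refclk_enabled;
        bool ei_exit_detect_enabled;
        bool common_mode_maintained;
    };

    static enum link_state pick_idle_state(const struct root_port *rp,
                                           uint32_t device_service_latency_us)
    {
        /* Shorter required service latency -> keep common mode for a faster exit. */
        return (device_service_latency_us < rp->latency_threshold_us)
                   ? L1_KEEP_CM
                   : L1_DROP_CM;
    }

    static void enter_idle_state(struct root_port *rp, enum link_state s)
    {
        rp->refclk_enabled = false;
        rp->ei_exit_detect_enabled = false;
        rp->common_mode_maintained = (s == L1_KEEP_CM);
    }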
Abstract:
A method and apparatus to reduce idle link power in a platform are described. In one embodiment of the invention, the host and its coupled endpoint(s) in the platform each have a low-power idle link state that allows disabling of the high-speed link circuitry in both the host and its coupled endpoint(s). This allows the platform to reduce its idle power, as both the host and its coupled endpoint(s) are able to turn off their high-speed link circuitry in one embodiment of the invention.
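The idea that both ends of the link enter a low-power idle link state and switch off their high-speed circuitry can be sketched as below; the structures and the entry sequence are illustrative only.

    /* Sketch of the idea: when the link is idle, both the host and the endpoint
     * enter a low-power idle link state and turn off their high-speed circuitry.
     * The structures and the entry sequence shown here are illustrative only. */
    #include <stdbool.h>
    #include <stdint.h>

    struct link_end {
        bool high_speed_rx_on;
        bool high_speed_tx_on;
        bool in_idle_link_state;
    };

    static void enter_idle_link_state(struct link_end *end)
    {
        end->high_speed_tx_on = false;   /* high-speed link circuitry disabled */
        end->high_speed_rx_on = false;
        end->in_idle_link_state = true;
    }

    /* Both sides of the link transition together once no traffic is pending. */
    static void platform_go_idle(struct link_end *host, struct link_end *endpoint,
                                 uint32_t pending_transactions)
    {
        if (pending_transactions == 0) {
            enter_idle_link_state(endpoint);  /* endpoint enters its idle link state */
            enter_idle_link_state(host);      /* host follows, reducing idle power   */
        }
    }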