摘要:
A CPU module includes a host element configured to perform a high-level host-related task, and one or more data-generating processing elements configured to perform a data-generating task associated with the high-level host-related task. Each data-generating processing element includes logic configured to receive input data, and logic configured to process the input data to produce output data. The amount of output data is greater than an amount of input data, and the ratio of the amount of input data to the amount of output data defines a decompression ratio. In one implementation, the high-level host-related task performed by the host element pertains to a high-level graphics processing task, and the data-generating task pertains to the generation of geometry data (such as triangle vertices) for use within the high-level graphics processing task. The CPU module can transfer the output data to a GPU module via at least one locked set of a cache memory. The GPU retrieves the output data from the locked set, and periodically forwards a tail pointer to a cacheable location within the data-generating elements that informs the data-generating elements of its progress in retrieving the output data
摘要:
A CPU module includes a host element configured to perform a high-level host-related task, and one or more data-generating processing elements configured to perform a data-generating task associated with the high-level host-related task. Each data-generating processing element includes logic configured to receive input data, and logic configured to process the input data to produce output data.
摘要:
Methods and apparatus that may be utilized to maintain coherency of data accessed by both a processor and a remote device are provided. Various mechanisms, such as a remote cache directory, castout buffer, and/or outstanding transaction buffer may be utilized by the remote device to track the state of processor cache lines that may hold data targeted by requests initiated by the remote device. Based on the content of these mechanisms, requests targeting data that is not in the processor cache may be routed directly to memory, thus reducing overall latency.
摘要:
Methods and apparatus for reducing the amount of latency involved when accessing, by a remote device, data residing in a cache of a processor are provided. For some embodiments, virtual channels may be utilized to conduct request/response transactions between the remote device and processor that satisfy a set of associated coherency rules.
摘要:
Methods and apparatus that may be utilized to maintain coherency of data accessed by both a processor and a remote device are provided. Various mechanisms, such as a remote cache directory, castout buffer, and/or outstanding transaction buffer may be utilized by the remote device to track the state of processor cache lines that may hold data targeted by requests initiated by the remote device. Based on the content of these mechanisms, requests targeting data that is not in the processor cache may be routed directly to memory, thus reducing overall latency.
摘要:
Methods and apparatus are provided that may be utilized to maintain a copy of a processor cache directory on a remote device that may access data residing in a cache of the processor. Enhanced bus transactions containing cache coherency information used to maintain the remote cache directory may be automatically generated when the processor allocates or de-allocates cache lines. Rather than query the processor cache directory prior to each memory access to determine if the processor cache contains an updated copy of requested data, the remote device may query its remote copy.
摘要:
Methods, apparatus, and systems for quickly accessing data residing in a cache of one processor, by another processor, while avoiding lengthy accesses to main memory are provided. A portion of the cache may be placed in a lock set mode by the processor in which it resides. While in the lock set mode, this portion of the cache may be accessed directly by another processor without lengthy “backing” writes of the accessed data to main memory.
摘要:
Methods and apparatus that may be utilized to maintain coherency of data accessed by both a processor and a remote device are provided. The remote device may include coherency logic, referred to herein as a snoop filter, designed to filter memory access requests that do not require bus commands to be sent to the processor. The snoop filter may filter requests based on a remote cache directory designed to mirror the processor cache directory, such that only those requests that target cache lines indicated to be valid in the processor cache result in snoop commands sent to the processor. Other requests (targeting data that is not cached in the processor) may be routed directly to memory without the latency conventionally associated with snoop requests.
摘要:
A bus bridge between a high speed computer processor bus and a high speed output bus. The preferred embodiment is a bus bridge between a GPUL bus for a GPUL PowerPC microprocessor from International Business Machines Corporation (IBM) and an output high speed interface (MPI). Another preferred embodiment is a bus bridge in a bus transceiver on a multi-chip module.
摘要:
A high speed computer processor system has a high speed interface for a graphics processor. A preferred embodiment combines a PowerPC microprocessor called the Giga-Processor Ultralite (GPUL) 110 from International Business Machines Corporation (IBM) with a high speed interface on a multi-chip module.