Abstract:
Methods and apparatus for providing additional storage, in the form of a hardware-assisted stack, usable by software running in an environment with limited resources. As an example, the hardware-assisted stack may provide VBIOS code with additional stack space that is accessible within the code's limited allocated address space.
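A minimal C sketch of how firmware such as VBIOS code might use a hardware-assisted stack, assuming a hypothetical pair of memory-mapped registers: writing STACK_DATA pushes a value, reading it pops one, and STACK_STATUS reports full/empty. The register names, offsets, and flag bits are illustrative assumptions, not the patent's actual interface.

    #include <stdint.h>

    #define HW_STACK_BASE  0xF000u               /* hypothetical MMIO base   */
    #define STACK_DATA     (HW_STACK_BASE + 0u)  /* write pushes, read pops  */
    #define STACK_STATUS   (HW_STACK_BASE + 4u)  /* full/empty flags         */
    #define STATUS_FULL    (1u << 0)
    #define STATUS_EMPTY   (1u << 1)

    static inline uint32_t mmio_read(uint32_t addr)
    {
        return *(volatile uint32_t *)(uintptr_t)addr;
    }

    static inline void mmio_write(uint32_t addr, uint32_t val)
    {
        *(volatile uint32_t *)(uintptr_t)addr = val;
    }

    /* Push a word onto the hardware-assisted stack; -1 if it is full. */
    int hw_stack_push(uint32_t value)
    {
        if (mmio_read(STACK_STATUS) & STATUS_FULL)
            return -1;
        mmio_write(STACK_DATA, value);
        return 0;
    }

    /* Pop a word from the hardware-assisted stack; -1 if it is empty. */
    int hw_stack_pop(uint32_t *value)
    {
        if (mmio_read(STACK_STATUS) & STATUS_EMPTY)
            return -1;
        *value = mmio_read(STACK_DATA);
        return 0;
    }

Because the stack lives behind two fixed register addresses, the calling code needs no extra address space beyond those two locations, which is the point of the technique.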
Abstract:
Deadlocks are avoided by marking read requests issued by a parallel processor to system memory as “special.” Read completions associated with read requests marked as special are routed on virtual channel 1 of the PCIe bus. Data returning on virtual channel 1 cannot become stalled by write requests in virtual channel 0, thus avoiding a potential deadlock.
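A minimal C sketch of the marking step, assuming a hypothetical read-request descriptor in which a traffic-class field selects the virtual channel; on real PCIe hardware the mapping of a traffic class to VC1 is configured through the VC capability registers, which this sketch does not model.

    #include <stdint.h>
    #include <stdbool.h>

    struct pcie_read_req {
        uint64_t addr;      /* system-memory address to read            */
        uint32_t len;       /* transfer length in bytes                 */
        uint8_t  tc;        /* traffic class; TC1 assumed mapped to VC1 */
        bool     special;   /* completion must not stall behind writes  */
    };

    enum { TC_DEFAULT = 0, TC_SPECIAL = 1 };

    /* Build a read request, marking it "special" so its completion is
     * routed on virtual channel 1 and cannot deadlock behind write
     * traffic queued on virtual channel 0. */
    struct pcie_read_req make_read(uint64_t addr, uint32_t len, bool special)
    {
        struct pcie_read_req req = {
            addr, len, special ? TC_SPECIAL : TC_DEFAULT, special
        };
        return req;
    }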
Abstract:
Software can freeze portions of a pipeline operation in a processor by asserting a predetermined freeze register in the processor. The processor halts operations relating to portions of its common pipeline in response to the asserted freeze register. Processor resources that operate downstream from the common pipeline continue to process any scheduled instructions. The processor is prevented from initiating any context switch in which a processor resource is allocated to a different channel. The processor stops supplying additional data to downstream resources and ensures that the interface to downstream resources is clear of previously sent data. The processor prevents its state machines from making additional requests. When processing has reached a stable state, the processor asserts an acknowledgement indication in response to the freeze assertion. Software is then free to manipulate states and registers within the processor. Clearing the freeze register allows processing to resume.
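A minimal C sketch of the freeze handshake from the software side, assuming hypothetical FREEZE and FREEZE_ACK register word offsets behind a memory-mapped base; the actual register layout is device specific.

    #include <stdint.h>

    #define MMIO_BASE      ((volatile uint32_t *)0xF0000000u) /* hypothetical */
    #define REG_FREEZE     0x40u  /* hypothetical: write 1 to assert freeze   */
    #define REG_FREEZE_ACK 0x41u  /* hypothetical: reads 1 once state stable  */

    void freeze_pipeline_and_edit_state(void)
    {
        MMIO_BASE[REG_FREEZE] = 1;             /* assert the freeze register */
        while (MMIO_BASE[REG_FREEZE_ACK] == 0)
            ;                                  /* processor drains in-flight
                                                  data and quiesces its state
                                                  machines before acking     */

        /* ... software may now safely read and manipulate internal
           states and registers within the processor ... */

        MMIO_BASE[REG_FREEZE] = 0;             /* clear freeze; resume       */
    }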
Abstract:
Methods, apparatuses, and systems are presented for performing asynchronous communications using an asynchronous interface to send signals between a source device and a plurality of client devices, where the source device and the client devices are part of a processing unit capable of performing graphics operations and the source device is coupled to the client devices through the asynchronous interface. The asynchronous interface includes at least one request signal, at least one address signal, at least one acknowledge signal, and at least one data signal, and operates in accordance with at least one programmable timing characteristic associated with the source device.
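A minimal C sketch of one request/acknowledge transaction on such an interface, with the four signal classes modeled as volatile struct fields and the programmable timing characteristic modeled as a hold time; the struct layout and the crude delay loop are illustrative assumptions, not the patent's signaling scheme.

    #include <stdint.h>
    #include <stdbool.h>

    struct async_if {
        volatile bool     req;      /* request signal, driven by source   */
        volatile bool     ack;      /* acknowledge signal, from client    */
        volatile uint32_t addr;     /* address signal                     */
        volatile uint32_t data;     /* data signal                        */
        uint32_t          hold_ns;  /* programmable timing characteristic */
    };

    static void wait_ns(uint32_t ns)  /* crude stand-in for a calibrated delay */
    {
        for (volatile uint32_t i = 0; i < ns; ++i)
            ;
    }

    /* Source-side write: drive address/data, raise req, wait for ack,
     * honor the programmed hold time, then complete the handshake. */
    void async_write(struct async_if *bus, uint32_t addr, uint32_t data)
    {
        bus->addr = addr;
        bus->data = data;
        bus->req  = true;
        while (!bus->ack)
            ;                        /* client latches when ready */
        wait_ns(bus->hold_ns);       /* programmable hold time    */
        bus->req = false;
        while (bus->ack)
            ;                        /* handshake fully retires   */
    }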
Abstract:
A processor having multiple independent engines can concurrently support a number of independent processes or operation contexts. The processor can independently schedule instructions for execution by the engines. The processor can independently switch the operation context that an engine supports. The processor can maintain the integrity of the operations performed and data processed by each engine during a context switch by controlling the manner in which the engine transitions from one operation context to the next. The processor can wait for the engine to complete processing of pipelined instructions of a first context before switching to another context, or the processor can halt the operation of the engine in the midst of one or more instructions to allow the engine to execute instructions corresponding to another context. The processor can affirmatively verify completion of tasks for a specific operation context.
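A minimal C sketch contrasting the two switch policies the abstract describes: draining the engine's pipelined instructions before switching, or halting the engine mid-instruction. The engine handle and its API are hypothetical stand-ins for the hardware's actual control mechanism.

    #include <stdbool.h>

    enum switch_policy { SWITCH_WAIT_FOR_IDLE, SWITCH_HALT };

    struct engine;                    /* opaque per-engine handle (assumed) */
    extern bool engine_idle(struct engine *e);
    extern void engine_halt(struct engine *e);
    extern void engine_save_context(struct engine *e, int ctx);
    extern void engine_load_context(struct engine *e, int ctx);

    void switch_context(struct engine *e, int old_ctx, int new_ctx,
                        enum switch_policy policy)
    {
        if (policy == SWITCH_WAIT_FOR_IDLE) {
            while (!engine_idle(e))   /* let pipelined instructions of */
                ;                     /* the first context complete    */
        } else {
            engine_halt(e);           /* stop in the midst of work     */
        }
        engine_save_context(e, old_ctx);  /* preserve state integrity  */
        engine_load_context(e, new_ctx);  /* engine now serves new ctx */
    }

The save/load pair after either policy is what lets the processor affirmatively verify that the first context's tasks are complete before the engine runs instructions for the next one.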
Abstract:
Circuits, methods, and apparatus that allow the elimination of a frame buffer connected directly to a graphics processing unit. The graphics processing unit includes an on-chip memory. Following system power-up or reset, the GPU initially renders comparatively low-resolution images to the on-chip memory for display. Afterward, the GPU renders images, which are typically higher resolution, and stores them in a system memory, apart from the graphics processing unit. The on-chip memory, which is no longer needed for image storage, instead stores address information, referred to as page tables, identifying the location of data stored by the GPU in the separate system memory.
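A minimal C sketch of the address-translation role the on-chip memory takes on once it is no longer needed as a framebuffer: it holds page-table entries mapping GPU virtual pages to locations in system memory. The entry layout, table size, and 4 KiB page size are illustrative assumptions.

    #include <stdint.h>

    #define PAGE_SHIFT 12u                     /* assume 4 KiB pages */
    #define PAGE_MASK  ((1u << PAGE_SHIFT) - 1u)
    #define PT_ENTRIES 1024u

    struct page_table_entry {
        uint64_t sys_phys_page;  /* page frame in separate system memory */
        uint32_t valid;
    };

    /* Page tables live in the on-chip memory after boot-time rendering. */
    static struct page_table_entry on_chip_page_table[PT_ENTRIES];

    /* Translate a GPU virtual address to a system-memory physical
     * address; returns 0 for an unmapped page in this simplified model,
     * which indexes the table directly rather than walking levels. */
    uint64_t gpu_translate(uint64_t vaddr)
    {
        uint64_t vpn = vaddr >> PAGE_SHIFT;
        struct page_table_entry *pte = &on_chip_page_table[vpn % PT_ENTRIES];
        if (!pte->valid)
            return 0;
        return (pte->sys_phys_page << PAGE_SHIFT) | (vaddr & PAGE_MASK);
    }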
Abstract:
Circuits, methods, and apparatus that reduce or eliminate system memory accesses to retrieve address translation information. In one example, these accesses are reduced or eliminated by pre-populating a graphics TLB with entries that are used to translate virtual addresses used by a GPU to physical addresses used by a system memory. Translation information is maintained by locking or restricting entries in the graphics TLB that are needed for display access. This may be done by limiting access to certain locations in the graphics TLB, by storing flags or other identifying information in the graphics TLB, or by other appropriate methods. In another example, memory space is allocated by a system BIOS for a GPU, which stores a base address and address range. Virtual addresses in the address range are translated by adding them to the base address.
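A minimal C sketch of both schemes from the abstract, assuming an illustrative TLB layout: entries carrying a lock flag are skipped during victim selection so display-critical translations are never evicted, and a BIOS-allocated contiguous region is translated by adding the virtual address to a stored base. All names and sizes here are assumptions.

    #include <stdint.h>
    #include <stdbool.h>

    #define TLB_ENTRIES 64u

    struct tlb_entry {
        uint64_t vpage;    /* GPU virtual page number            */
        uint64_t ppage;    /* system-memory physical page number */
        bool     valid;
        bool     locked;   /* needed for display; never evicted  */
    };

    static struct tlb_entry gtlb[TLB_ENTRIES];

    /* Pre-populate an entry at boot and optionally lock it, so display
     * scan-out never misses and falls back to a system-memory walk. */
    void gtlb_prefill(unsigned idx, uint64_t vpage, uint64_t ppage, bool lock)
    {
        gtlb[idx] = (struct tlb_entry){ vpage, ppage, true, lock };
    }

    /* Pick a replacement victim, skipping locked entries. */
    int gtlb_pick_victim(void)
    {
        for (unsigned i = 0; i < TLB_ENTRIES; ++i)
            if (!gtlb[i].locked)
                return (int)i;
        return -1;         /* all entries locked: caller must not evict */
    }

    /* Alternative scheme: translate within a BIOS-allocated contiguous
     * region by adding the virtual address to the stored base, after a
     * range check; 0 denotes out-of-range in this simplified model. */
    static uint64_t carve_base, carve_range;   /* programmed at boot */

    uint64_t translate_by_base(uint64_t vaddr)
    {
        return (vaddr < carve_range) ? carve_base + vaddr : 0;
    }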
Abstract:
Techniques are disclosed for peer-to-peer data transfers where a source device receives a request to read data words from a target device. The source device creates first and second read commands for reading a first portion and a second portion of a plurality of data words from the target device, respectively. The source device transmits the first read command to the target device and, before a first read operation associated with the first read command is complete, transmits the second read command to the target device. The first and second portions of the plurality of data words are stored in a first and a second portion of a buffer memory, respectively. Advantageously, an arbitrary number of read operations may be in progress at a given time without using multiple peer-to-peer memory buffers. Performance for large data block transfers is improved without consuming peer-to-peer memory buffers needed by other peer GPUs.
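A minimal C sketch of the pipelining pattern: the second read command is issued before the first read operation completes, and each command targets its own portion of a single buffer. The p2p_read_async()/p2p_wait() helpers and the opaque operation handle are hypothetical.

    #include <stdint.h>
    #include <stddef.h>

    struct p2p_op;  /* opaque handle for an in-flight read (assumed) */
    extern struct p2p_op *p2p_read_async(int target, uint64_t src_addr,
                                         void *dst, size_t len);
    extern void p2p_wait(struct p2p_op *op);

    /* Read `len` bytes from a peer device into `buf`, overlapping two
     * read operations over the two portions of one buffer memory. */
    void p2p_read_pipelined(int target, uint64_t src_addr,
                            void *buf, size_t len)
    {
        size_t   half = len / 2;
        uint8_t *dst  = buf;

        /* First read command covers the first portion of the data words. */
        struct p2p_op *op1 = p2p_read_async(target, src_addr, dst, half);

        /* Second command is transmitted before the first completes. */
        struct p2p_op *op2 = p2p_read_async(target, src_addr + half,
                                            dst + half, len - half);
        p2p_wait(op1);
        p2p_wait(op2);
    }

The same pattern extends to more than two in-flight reads over more buffer portions, which is why an arbitrary number of operations can progress without extra peer-to-peer buffers.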
Abstract:
Non-contiguous or tiled payload data are efficiently transferred between peers over a fabric. Specifically, a client transfers a byte enable message to a peer device via a mailbox mechanism, where the byte enable message specifies which bytes of the payload data being transferred via the data packet are to be written to the frame buffer on the peer device and which bytes are not to be written. The client then transfers the non-contiguous or tiled payload data to the peer device. Upon receiving the payload data, the peer device writes bytes from the payload data into the target frame buffer for only those bytes enabled via the byte enable message. One advantage of the present invention is that non-contiguous or tiled data are transferred over a fabric with improved efficiency.
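A minimal C sketch of the receiving side: the peer writes into its frame buffer only the payload bytes whose bit is set in the byte-enable message previously delivered via the mailbox. The one-bit-per-byte message layout is an illustrative assumption.

    #include <stdint.h>
    #include <stddef.h>

    /* One bit per payload byte: 1 = write this byte, 0 = skip it. */
    struct byte_enable_msg {
        const uint8_t *mask;  /* ceil(len / 8) bytes of enable bits */
        size_t         len;   /* payload length in bytes            */
    };

    /* Apply an incoming payload to the frame buffer, honoring the
     * byte-enable message received earlier through the mailbox. */
    void apply_payload(uint8_t *frame_buf, const uint8_t *payload,
                       const struct byte_enable_msg *be)
    {
        for (size_t i = 0; i < be->len; ++i) {
            if (be->mask[i / 8] & (1u << (i % 8)))
                frame_buf[i] = payload[i];  /* enabled byte: write */
            /* disabled bytes leave frame-buffer contents intact   */
        }
    }

Sending one dense payload plus a compact mask avoids splitting a tiled transfer into many small write packets, which is where the efficiency gain comes from.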