Memory latency-aware GPU architecture

    Publication number: US12067642B2

    Publication date: 2024-08-20

    Application number: US17030024

    Filing date: 2020-09-23

    Abstract: One or more processing units, such as a graphics processing unit (GPU), execute an application. A resource manager selectively allocates a first memory portion or a second memory portion to the processing units based on memory access characteristics. The first memory portion has a first latency that is lower than a second latency of the second memory portion. In some cases, the memory access characteristics indicate a latency sensitivity. In some cases, hints included in corresponding program code are used to determine the memory access characteristics. The memory access characteristics can also be determined by monitoring memory access requests, measuring a cache miss rate or a row buffer miss rate for the monitored memory access requests, and determining the memory access characteristics based on the cache miss rate or the row buffer miss rate.
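The allocation policy in the abstract can be sketched as follows. This is a minimal illustrative model, not the patented implementation: the class name, the pool sizes, and the miss-rate threshold are all assumptions, and it treats a high cache miss rate as the signal of latency sensitivity.

```python
# Illustrative sketch of latency-aware allocation. All names and numbers
# (ResourceManager, MISS_RATE_THRESHOLD, pool sizes) are hypothetical.

MISS_RATE_THRESHOLD = 0.2  # assumed cutoff for "latency sensitive"

class ResourceManager:
    def __init__(self):
        self.low_latency_free = 4    # pages in the lower-latency first portion
        self.high_latency_free = 64  # pages in the higher-latency second portion

    def classify(self, accesses, cache_misses, hint=None):
        """Derive memory access characteristics from a program hint,
        or from a measured cache miss rate when no hint is given."""
        if hint is not None:
            return hint  # hint embedded in program code takes priority
        miss_rate = cache_misses / max(accesses, 1)
        return "latency_sensitive" if miss_rate > MISS_RATE_THRESHOLD else "latency_tolerant"

    def allocate(self, accesses, cache_misses, hint=None):
        """Place latency-sensitive allocations in the low-latency portion
        while space remains; everything else goes to the second portion."""
        kind = self.classify(accesses, cache_misses, hint)
        if kind == "latency_sensitive" and self.low_latency_free > 0:
            self.low_latency_free -= 1
            return "low_latency"
        self.high_latency_free -= 1
        return "high_latency"
```

A row buffer miss rate could replace the cache miss rate in `classify` with the same structure.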

    Zero value memory compression
    Invention grant

    Publication number: US12066944B2

    Publication date: 2024-08-20

    Application number: US16723780

    Filing date: 2019-12-20

    Abstract: A coherency management device receives requests to read data from or write data to an address in a main memory. On a write, if the data includes zero data, an entry corresponding to the memory address is created in a cache directory if it does not already exist, is set to an invalid state, and indicates that the data includes zero data. The zero data is not written to main memory or a cache. On a read, the cache directory is checked for an entry corresponding to the memory address. If the entry exists in the cache directory, is invalid, and includes an indication that data corresponding to the memory address includes zero data, the coherency management device returns zero data in response to the request without fetching the data from main memory or a cache.
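The zero-data read/write flow described above can be modeled compactly. This is a hypothetical sketch: the class and field names are illustrative, and a Python dict stands in for the cache directory and main memory.

```python
class CoherencyDirectory:
    """Illustrative model of the zero-data directory behavior: zero writes
    create an invalid directory entry flagged as zero data, and zero reads
    are satisfied from the directory without touching memory or caches."""

    def __init__(self):
        self.entries = {}      # addr -> {"state": ..., "zero": bool}
        self.main_memory = {}  # backing store, used only for non-zero data

    def write(self, addr, data):
        if all(b == 0 for b in data):
            # Record zero data in the directory only; skip memory and caches.
            self.entries[addr] = {"state": "invalid", "zero": True}
        else:
            self.entries.pop(addr, None)
            self.main_memory[addr] = data

    def read(self, addr, size):
        entry = self.entries.get(addr)
        if entry and entry["state"] == "invalid" and entry["zero"]:
            return bytes(size)  # synthesize zeros without a memory fetch
        return self.main_memory[addr]
```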

    Processor with multiple fetch and decode pipelines

    Publication number: US12039337B2

    Publication date: 2024-07-16

    Application number: US17032494

    Filing date: 2020-09-25

    CPC classification number: G06F9/3804 G06F9/30058 G06F9/3822 G06F9/3867

    Abstract: A processor employs a plurality of fetch and decode pipelines by dividing an instruction stream into instruction blocks with identified boundaries. The processor includes a branch predictor that generates branch predictions. Each branch prediction corresponds to a branch instruction and includes a prediction that the corresponding branch is to be taken or not taken. In addition, each branch prediction identifies both an end of the current branch prediction window and the start of another branch prediction window. Using these known boundaries, the processor provides different sequential fetch streams to different ones of the plurality of fetch and decode pipelines, which concurrently process the instructions of the different fetch streams, thereby improving overall instruction throughput at the processor.
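The stream-splitting idea can be sketched in software. This is an illustrative model under stated assumptions: integer indices stand in for instructions, the sorted boundary indices stand in for predicted branch-prediction-window edges, and streams are handed to pipelines round-robin.

```python
def split_fetch_streams(instructions, boundaries):
    """Split a linear instruction stream into blocks at the boundaries
    identified by branch predictions (indices are illustrative)."""
    streams, start = [], 0
    for end in sorted(boundaries):
        streams.append(instructions[start:end])
        start = end
    streams.append(instructions[start:])
    return [s for s in streams if s]

def dispatch(streams, n_pipes=2):
    """Assign successive fetch streams to fetch/decode pipelines
    round-robin so they can be processed concurrently."""
    pipes = [[] for _ in range(n_pipes)]
    for i, stream in enumerate(streams):
        pipes[i % n_pipes].append(stream)
    return pipes
```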

    A/D bit storage, processing, and modes

    Publication number: US12038847B2

    Publication date: 2024-07-16

    Application number: US17952933

    Filing date: 2022-09-26

    Inventor: William A. Moyes

    CPC classification number: G06F12/1009 G06F12/0811

    Abstract: A/D bit storage, processing, and mode management techniques through use of a dense A/D bit representation are described. In one example, a memory management unit employs an A/D bit representation generation module to generate the dense A/D bit representation. In an implementation, the A/D bit representation is stored adjacent to existing page table structures of the multilevel page table hierarchy. In another example, the memory management unit supports use of modes as part of A/D bit storage.
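One plausible reading of a "dense A/D bit representation" is a packed bitmap of accessed/dirty bits held apart from the page-table entries themselves. The sketch below is a hypothetical illustration of that idea (two bits per page in a bytearray), not the patent's actual format.

```python
class DenseADBits:
    """Illustrative dense A/D bitmap: two bits per page (accessed, dirty)
    packed into a bytearray, conceptually stored adjacent to the page
    tables rather than inside each page-table entry."""

    def __init__(self, num_pages):
        self.bits = bytearray((num_pages * 2 + 7) // 8)

    def _set(self, bit_index):
        self.bits[bit_index // 8] |= 1 << (bit_index % 8)

    def _get(self, bit_index):
        return (self.bits[bit_index // 8] >> (bit_index % 8)) & 1

    def mark_accessed(self, page):
        self._set(page * 2)

    def mark_dirty(self, page):
        self._set(page * 2)       # a write implies an access
        self._set(page * 2 + 1)

    def accessed(self, page):
        return bool(self._get(page * 2))

    def dirty(self, page):
        return bool(self._get(page * 2 + 1))
```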

    Dead surface invalidation
    Invention grant

    Publication number: US12033239B2

    Publication date: 2024-07-09

    Application number: US17563950

    Filing date: 2021-12-28

    CPC classification number: G06T1/60 G06F12/0891 G06T1/20 G06F2212/455

    Abstract: Systems, apparatuses, and methods for performing dead surface invalidation are disclosed. An application sends draw call commands to a graphics processing unit (GPU) via a driver, with the draw call commands rendering to surfaces. After it is determined that a given surface will no longer be accessed by subsequent draw calls, the application sends a surface invalidation command for the given surface to a command processor of the GPU. After the command processor receives the surface invalidation command, the command processor waits for a shader engine to send a draw call completion message for a last draw call to access the given surface. Once the command processor receives the draw call completion message, the command processor sends a surface invalidation command to a cache to invalidate cache lines for the given surface to free up space in the cache for other data.
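The wait-then-invalidate handshake in the abstract can be sketched as follows. This is an illustrative model with assumed names; a dict mapping cache lines to surface IDs stands in for the GPU cache, and draw-call IDs stand in for completion messages from the shader engine.

```python
class CommandProcessor:
    """Illustrative model of dead-surface invalidation: an invalidation
    request is held until the last draw call touching the surface
    completes, then the surface's cache lines are dropped."""

    def __init__(self, cache):
        self.cache = cache       # cache line -> surface id (hypothetical shape)
        self.pending = {}        # surface id -> id of last draw call using it
        self.completed = set()   # draw calls reported complete by the shader engine

    def queue_invalidate(self, surface, last_draw_call):
        self.pending[surface] = last_draw_call
        self._try_invalidate(surface)

    def on_draw_call_complete(self, draw_call):
        self.completed.add(draw_call)
        for surface in list(self.pending):
            self._try_invalidate(surface)

    def _try_invalidate(self, surface):
        if self.pending.get(surface) in self.completed:
            # Free cache space by dropping every line of the dead surface.
            for line in [l for l, s in self.cache.items() if s == surface]:
                del self.cache[line]
            del self.pending[surface]
```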

    Partial sorting for coherency recovery

    Publication number: US12032967B2

    Publication date: 2024-07-09

    Application number: US17845938

    Filing date: 2022-06-21

    CPC classification number: G06F9/3887 G06F9/3012 G06F9/4881 G06F9/5016

    Abstract: Devices and methods for partial sorting for coherence recovery are provided. The partial sorting is efficiently executed by utilizing existing hardware along the memory path (e.g., memory local to the compute unit). The devices include an accelerated processing device which comprises memory and a processor. The processor is, for example, a compute unit of a GPU which comprises a plurality of SIMD units and is configured to determine, for data entries each comprising a plurality of bits, a number of occurrences of different types of the data entries by storing the number of occurrences in one or more portions of the memory local to the processor, sort the data entries based on the determined number of occurrences stored in the one or more portions of the memory local to the processor and execute the sorted data entries.
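The count-then-sort step can be illustrated with a counting-based partial sort. This is a sequential sketch of the idea only: the histogram list stands in for the memory local to the compute unit, and the low bits of each entry stand in for the entry "type"; the real device would build the histogram across SIMD lanes.

```python
def partial_sort_by_count(entries, key_bits=2):
    """Counting-based partial sort: tally occurrences of each entry type
    in local-memory-like storage, prefix-sum the counts into output
    offsets, then scatter entries so equal types become contiguous."""
    num_types = 1 << key_bits
    mask = num_types - 1
    histogram = [0] * num_types  # stands in for compute-unit local memory
    for e in entries:
        histogram[e & mask] += 1

    # Prefix-sum the counts into starting offsets for each type.
    offsets, total = [], 0
    for count in histogram:
        offsets.append(total)
        total += count

    out = [None] * len(entries)
    for e in entries:
        k = e & mask
        out[offsets[k]] = e
        offsets[k] += 1
    return out
```

Grouping equal types contiguously is what recovers coherence: SIMD lanes that then process adjacent entries take the same branch paths.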
