MATRIX OPERATION OPTIMIZATION MECHANISM

    Publication number: US20240427842A1

    Publication date: 2024-12-26

    Application number: US18674212

    Application date: 2024-05-24

    Abstract: An apparatus to facilitate machine learning matrix processing is disclosed. The apparatus comprises a memory to store matrix data; one or more processors to execute an instruction to examine a message descriptor included in the instruction to determine a type of matrix layout manipulation operation that is to be executed, examine a message header included in the instruction having a plurality of parameters that define a two-dimensional (2D) memory surface that is to be retrieved, and retrieve one or more blocks of the matrix data from the memory based on the plurality of parameters; and a register file including a plurality of registers, wherein the one or more blocks of the matrix data are stored within a first set of the plurality of registers.
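
    The flow in this abstract (read the descriptor for the layout operation, read the header for the 2D surface parameters, then fetch a block into registers) can be illustrated with a small software model. This is only a sketch under stated assumptions: the names `LayoutOp`, `MessageHeader`, and `load_block`, the specific header fields, the row-major surface, and the choice of transpose as the example layout operation are all hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from enum import Enum


class LayoutOp(Enum):
    # Hypothetical operation types; the abstract does not enumerate them.
    NONE = 0
    TRANSPOSE = 1


@dataclass
class MessageHeader:
    # Hypothetical parameters describing a 2D surface and one block inside it.
    surface_width: int   # elements per row of the surface
    surface_height: int  # number of rows in the surface
    block_x: int         # column of the block's top-left element
    block_y: int         # row of the block's top-left element
    block_width: int
    block_height: int


def load_block(memory, header, op):
    """Read one block from a row-major surface and return it as register rows."""
    if (header.block_x + header.block_width > header.surface_width
            or header.block_y + header.block_height > header.surface_height):
        raise ValueError("block lies outside the 2D surface")
    rows = []
    for r in range(header.block_y, header.block_y + header.block_height):
        start = r * header.surface_width + header.block_x
        rows.append(memory[start:start + header.block_width])
    if op is LayoutOp.TRANSPOSE:
        # Apply the layout manipulation while the data moves into registers.
        rows = [list(col) for col in zip(*rows)]
    return rows


memory = list(range(16))                  # a 4x4 surface holding 0..15
header = MessageHeader(4, 4, 1, 1, 2, 2)  # 2x2 block at column 1, row 1
registers = load_block(memory, header, LayoutOp.NONE)
transposed = load_block(memory, header, LayoutOp.TRANSPOSE)
```

    In this model the caller never computes addresses itself; it only describes the surface and the block, which mirrors the role the abstract assigns to the message header.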

    CROSS-THREAD REGISTER SHARING FOR MATRIX MULTIPLICATION COMPUTE

    Publication number: US20240168807A1

    Publication date: 2024-05-23

    Application number: US18056949

    Application date: 2022-11-18

    CPC classification number: G06F9/5027 G06F9/48 G06F9/522 G06F15/8046

    Abstract: An apparatus to facilitate cross-thread register sharing for matrix multiplication compute is disclosed. The apparatus includes matrix acceleration hardware comprising a plurality of data processing units, wherein the respective plurality of data processing units are to: receive a decoded instruction for a first thread having a first register space, wherein the decoded instruction is for a matrix multiplication operation and comprises an indication to utilize a second register space of a second thread for an operand of the decoded instruction for the first thread; access the second register space of the second thread to obtain data for the operand of the decoded instruction; and perform the matrix multiplication operation for the first thread using the data for the operand from the second register space of the second thread.
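
    The idea of an instruction that names another thread's register space for one operand can be modeled in a few lines. The sketch below is an illustration only: `Thread`, `MatmulInstruction`, the `b_source_thread` field, and the dictionary-based register file are hypothetical stand-ins, not the encoding or hardware described in the patent.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Thread:
    # Each thread owns a private register space (modeled as name -> matrix).
    registers: dict = field(default_factory=dict)


@dataclass
class MatmulInstruction:
    a_reg: str
    b_reg: str
    dst_reg: str
    # Hypothetical indication: read operand B from this thread's registers.
    # None means operand B comes from the issuing thread's own registers.
    b_source_thread: Optional[int] = None


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def execute(threads, thread_id, instr):
    """Run the instruction for thread_id, honoring the cross-thread indication."""
    own = threads[thread_id].registers
    if instr.b_source_thread is None:
        b_space = own
    else:
        b_space = threads[instr.b_source_thread].registers
    own[instr.dst_reg] = matmul(own[instr.a_reg], b_space[instr.b_reg])


threads = [
    Thread({"r0": [[1, 2], [3, 4]]}),  # thread 0 holds operand A
    Thread({"r1": [[5, 6], [7, 8]]}),  # thread 1 holds operand B
]
execute(threads, 0, MatmulInstruction("r0", "r1", "r2", b_source_thread=1))
result = threads[0].registers["r2"]
```

    Thread 0 never copies operand B into its own register space; it reads it in place from thread 1, which is the sharing behavior the abstract describes.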

    DETERMINISTIC BROADCASTING FROM SHARED MEMORY

    Publication number: US20240111534A1

    Publication date: 2024-04-04

    Application number: US17957486

    Application date: 2022-09-30

    CPC classification number: G06F9/30047 G06F9/3009 G06F9/542

    Abstract: Embodiments described herein provide a technique to enable a broadcast load from an L1 cache or shared local memory to register files associated with hardware threads of a graphics core. One embodiment provides a graphics processor comprising a cache memory and a graphics core coupled with the cache memory. The graphics core includes a plurality of hardware threads and memory access circuitry to facilitate access to memory by the plurality of hardware threads. The graphics core is configurable to process a plurality of load requests from the plurality of hardware threads, detect duplicate load requests within the plurality of load requests, perform a single read from the cache memory in response to the duplicate load requests, and transmit data associated with the duplicate load requests to the requesting hardware threads.
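
    The duplicate detection and single-read behavior can be sketched as a grouping step in software. The following is a toy model, not the patented circuitry: `CountingCache`, `process_loads`, and the `(thread_id, address)` request format are hypothetical, and the model assumes each thread issues at most one request per batch.

```python
from collections import defaultdict


class CountingCache:
    """Stand-in for the cache memory that counts how many reads it serves."""

    def __init__(self, data):
        self.data = data
        self.reads = 0

    def read(self, address):
        self.reads += 1
        return self.data[address]


def process_loads(cache, requests):
    """Serve a batch of (thread_id, address) loads with one read per address."""
    # Detect duplicates by grouping the requesting threads under each address.
    by_address = defaultdict(list)
    for thread_id, address in requests:
        by_address[address].append(thread_id)

    delivered = {}
    for address, thread_ids in by_address.items():
        value = cache.read(address)      # single read for all duplicates
        for thread_id in thread_ids:     # broadcast to every requester
            delivered[thread_id] = value
    return delivered


cache = CountingCache({0x10: "A", 0x20: "B"})
delivered = process_loads(cache, [(0, 0x10), (1, 0x10), (2, 0x20), (3, 0x10)])
```

    Four requests arrive, but only two distinct addresses are read; the three threads asking for the same address all receive the value from one read.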

    TEMPORAL MOTION VECTOR PREDICTION CONTROL IN VIDEO CODING

    Publication number: US20200068216A1

    Publication date: 2020-02-27

    Application number: US16666275

    Application date: 2019-10-28

    Abstract: Temporal motion vector prediction control is described in video coding. In one example, a method includes receiving a plurality of frames representing encoded video, parsing an uncompressed header for each frame, determining whether a temporal motion vector prediction command is included within the parsed uncompressed header of a first frame, selecting a reference frame from a reference list of frames, retrieving motion vector information from the selected reference frame, performing temporal motion vector prediction on the first frame corresponding to the parsed uncompressed header if a temporal motion vector prediction command is included within the parsed header to form a motion predicted frame, applying a loop filter to the motion predicted frame, and rendering the frame as decoded video.
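
    The control decision in this abstract (use temporal motion vector prediction only when the frame's uncompressed header carries the command) can be shown with a deliberately simplified model. Everything named here is an assumption for illustration: `Frame`, the boolean `use_tmvp` flag, `temporal_predictors`, and the policy of selecting the most recent entry in the reference list are hypothetical, and loop filtering and rendering are left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

MotionVector = Tuple[int, int]


@dataclass
class Frame:
    # Stand-in for the command parsed from the frame's uncompressed header.
    use_tmvp: bool
    # One motion vector per block, in block order.
    motion_vectors: List[MotionVector]


def temporal_predictors(frame: Frame,
                        reference_list: List[Frame]) -> Optional[List[MotionVector]]:
    """Return per-block temporal predictors, or None when TMVP is not requested."""
    if not frame.use_tmvp or not reference_list:
        return None
    # Assumed selection policy: take the most recent reference frame.
    reference = reference_list[-1]
    # Use the co-located block's vector from the reference as the predictor.
    return list(reference.motion_vectors[:len(frame.motion_vectors)])


reference = Frame(use_tmvp=False, motion_vectors=[(1, -2), (0, 3)])
frame_on = Frame(use_tmvp=True, motion_vectors=[(0, 0), (0, 0)])
frame_off = Frame(use_tmvp=False, motion_vectors=[(0, 0), (0, 0)])

predictors_on = temporal_predictors(frame_on, [reference])
predictors_off = temporal_predictors(frame_off, [reference])
```

    The only point of the model is the gating: the reference frame's motion information is consulted when the header flag is set and ignored otherwise.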
