APPROACH FOR ENFORCING ORDERING BETWEEN MEMORY-CENTRIC AND CORE-CENTRIC MEMORY OPERATIONS

    公开(公告)号:US20220317926A1

    公开(公告)日:2022-10-06

    申请号:US17219446

    申请日:2021-03-31

    Abstract: Ordering between memory-centric memory operations, referred to hereinafter as “MC-Mem-Ops,” and core-centric memory operations, referred to hereinafter as “CC-Mem-Ops,” is enforced using inter-centric fences, referred to hereinafter as an “IC-fences.” IC-fences are implemented by an ordering primitive or ordering instruction, that cause a memory controller, a cache controller, etc., to enforce ordering of MC-Mem-Ops and CC-Mem-Ops throughout the memory pipeline and at the memory controller by not reordering MC-Mem-Ops (or sometimes CC-Mem-Ops) that arrive before the IC-fence to after the IC-fence. Processing of an IC-fence also causes the memory controller to issue an ordering acknowledgment to the thread that issued the IC-fence instruction. IC-fences are tracked at the core and designated as complete when the ordering acknowledgment is received. Embodiments include a completion level-specific cache flush operation which, when used with an IC-fence, provides proper ordering between cached CC-Mem-Ops and MC-Mem-ops with reduced data transfer and completion times.

    DATA FABRIC CLOCK SWITCHING
    383.
    发明申请

    公开(公告)号:US20220317755A1

    公开(公告)日:2022-10-06

    申请号:US17219407

    申请日:2021-03-31

    Abstract: A memory controller couples to a data fabric clock domain, and to a physical layer interface circuit PHY clock domain. A first interface circuit adapts transfers between the data fabric clock domain (FCLK) and the memory controllers clock domain, and a second interface circuit couples the memory controller to the PHY clock domain. A power controller responds to a power state change request by sending commands to the second interface circuit to change parameters of a memory system and to update a set of timing parameters of the memory controller according to a selected power state of a plurality of power states. The power controller further responds to a request to synchronize with a new frequency on the FCLK domain by changing a set of timing parameters of the clock interface circuit without changing the set of timing parameters of the memory system or the selected power state.

    Distributed user mode processing
    384.
    发明授权

    公开(公告)号:US11461137B2

    公开(公告)日:2022-10-04

    申请号:US16721456

    申请日:2019-12-19

    Abstract: A first processing unit such as a graphics processing unit (GPU) pipelines that execute commands and a scheduler to schedule one or more first commands for execution by one or more of the pipelines. The one or more first commands are received from a user mode driver in a second processing unit such as a central processing unit (CPU). The scheduler schedules one or more second commands for execution in response to completing execution of the one or more first commands and without notifying the second processing unit. In some cases, the first processing unit includes a direct memory access (DMA) engine that writes blocks of information from the first processing unit to a memory. The one or more second commands program the DMA engine to write a block of information including results generated by executing the one or more first commands.

    SYNCHRONIZATION FREE CROSS PASS BINNING THROUGH SUBPASS INTERLEAVING

    公开(公告)号:US20220309729A1

    公开(公告)日:2022-09-29

    申请号:US17565394

    申请日:2021-12-29

    Abstract: A method of tiled rendering is provided which comprises dividing a frame to be rendered, into a plurality of tiles, receiving commands to execute a plurality of subpasses of the tiles and interleaving execution of same subpasses of multiple tiles of the frame. Interleaving execution of same subpasses of multiple tiles comprises executing a previously ordered first subpass of a second tile between execution of the previously ordered first subpass of a first tile and execution of a subsequently ordered second subpass of the first tile. The interleaving is performed, for example, by executing the plurality of subpasses in an order different from the order in which the commands to execute the plurality of subpasses are stored and issued. Alternatively, interleaving is performed by executing one or more subpasses as skip operations such that the plurality of subpasses are executed in the same order.

    DYNAMICALLY RECONFIGURABLE REGISTER FILE

    公开(公告)号:US20220309606A1

    公开(公告)日:2022-09-29

    申请号:US17214762

    申请日:2021-03-26

    Abstract: Techniques for managing register allocation are provided. The techniques include detecting a first request to allocate first registers for a first wavefront; first determining, based on allocation information, that allocating the first registers to the first wavefront would result in a condition in which a deadlock is possible; in response to the first determining, refraining from allocating the first registers to the first wavefront; detecting a second request to allocate second registers for a second wavefront; second determining, based on the allocation information, that allocating the second registers to the second wavefront would result in a condition in which deadlock is not possible; and in response to the second determining, allocating the second registers to the second wavefront.

    Multi-class multi-label classification using clustered singular decision trees for hardware adaptation

    公开(公告)号:US11455252B2

    公开(公告)日:2022-09-27

    申请号:US16454027

    申请日:2019-06-26

    Abstract: Techniques for generating a model for predicting when different hybrid prefetcher configurations should be used are disclosed. Techniques for using the model to predict when different hybrid prefetcher configurations should be used are also disclosed. The techniques for generating the model include obtaining a set of input data, and generating trees based on the training data. Each tree is associated with a different hybrid prefetcher configuration and the trees output certainty scores for the associated hybrid prefetcher configuration based on hardware feature measurements. To decide on a hybrid prefetcher configuration to use, a prefetcher traverses multiple trees to obtain certainty scores for different hybrid prefetcher configurations and identifies a hybrid prefetcher configuration to used based on a comparison of the certainty scores.

    Dynamic instances semantics
    389.
    发明授权

    公开(公告)号:US11455153B2

    公开(公告)日:2022-09-27

    申请号:US16544796

    申请日:2019-08-19

    Inventor: Nicolai Haehnle

    Abstract: A computing system includes a processor and a memory storing instructions for a compiler that, when executed by the processor, cause the processor to generate a control flow graph of program source code by receiving the program source code in the compiler, in the compiler, generating a structure point representation based on the received program source code by inserting into the program source code a set of structure points including an anchor structure point and a join structure point associated with the anchor structure point, and based on the structure point representation, generating the control flow graph including a plurality of blocks each representing a portion of the program source code. In the control flow graph, a block A between the anchor structure point and the join structure point post-dominates each of the one or more divergent branches between the anchor structure point and the join structure point.

    Secure computer vision processing
    390.
    发明授权

    公开(公告)号:US11443051B2

    公开(公告)日:2022-09-13

    申请号:US16228349

    申请日:2018-12-20

    Abstract: A computer vision processor in an image cluster defines a fenced memory region (FMR) that controls access to image data stored in a first portion of a trusted memory region (TMR). The computer vision processor receives FMR requests from an application implemented in a processing cluster. The FMR requests are to access the image data in the first portion of the TMR. The computer vision processor selectively allows the requesting application to access the image data. In some cases, the computer vision processor acquires the image data and stores the image data in the first portion of the TMR, such as buffers in the TMR. A data fabric selectively permits the image processing application to access the data stored in the TMR based on whether the image cluster has opened or closed the FMR for the portion of the TMR.

Patent Agency Ranking