NETWORK COLLECTIVE OFFLOADING COST MANAGEMENT
    Publication Type: Invention Publication

    Publication Number: US20240205093A1

    Publication Date: 2024-06-20

    Application Number: US18540783

    Filing Date: 2023-12-14

    Inventor: Josiah I. Clark

    CPC classification number: H04L41/12 H04L41/0823

    Abstract: The disclosed device includes a collective engine that can select a communication cost model from multiple communication cost models for a collective operation and configure a topology of a collective network for performing the collective operation using the selected communication cost model. Various other methods, systems, and computer-readable media are also disclosed.
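The patent does not disclose its cost models, so the sketch below substitutes a standard alpha-beta (latency/bandwidth) comparison of two illustrative collective algorithms; the function names and formulas are assumptions, shown only to make "select a communication cost model from multiple communication cost models" concrete.

```python
import math

def ring_allreduce_cost(n_nodes, msg_bytes, alpha=1e-6, beta=1e-9):
    """Alpha-beta cost of a ring allreduce: 2(n-1) steps, each moving
    msg_bytes / n_nodes bytes (alpha = per-step latency, beta = s/byte)."""
    steps = 2 * (n_nodes - 1)
    return steps * (alpha + (msg_bytes / n_nodes) * beta)

def tree_allreduce_cost(n_nodes, msg_bytes, alpha=1e-6, beta=1e-9):
    """Alpha-beta cost of a binary-tree allreduce: 2*ceil(log2 n) steps,
    each moving the full message."""
    steps = 2 * math.ceil(math.log2(n_nodes))
    return steps * (alpha + msg_bytes * beta)

def select_cost_model(n_nodes, msg_bytes):
    """Pick the cheaper model for this collective, as a collective engine
    choosing among multiple communication cost models might."""
    models = {
        "ring": ring_allreduce_cost(n_nodes, msg_bytes),
        "tree": tree_allreduce_cost(n_nodes, msg_bytes),
    }
    return min(models.items(), key=lambda kv: kv[1])
```

Under these assumed constants, small messages favor the latency-bound tree and large messages favor the bandwidth-bound ring, which is the kind of trade-off a cost-model selector would arbitrate.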

    NETWORK COLLECTIVE OFFLOADING ROUTING MANAGEMENT

    Publication Number: US20240205092A1

    Publication Date: 2024-06-20

    Application Number: US18540779

    Filing Date: 2023-12-14

    Inventor: Josiah I. Clark

    CPC classification number: H04L41/12

    Abstract: The disclosed device includes a collective engine that can receive state information from nodes of a collective network. The collective engine can use the state information to initialize a topology of appropriate data routes between the nodes for the collective operation. Various other methods, systems, and computer-readable media are also disclosed.
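Initializing "a topology of appropriate data routes between the nodes" from reported state information can be sketched as a shortest-path next-hop computation over per-node link state; the BFS formulation below is an illustrative assumption, not the patent's method.

```python
from collections import deque

def init_routes(links):
    """links: dict node -> set of neighbor nodes, as reported in each
    node's state information. Returns next-hop tables:
    routes[src][dst] = first hop on a shortest path from src to dst."""
    routes = {}
    for src in links:
        next_hop = {}
        visited = {src}
        queue = deque()
        for nb in links[src]:          # seed BFS with src's direct links
            visited.add(nb)
            next_hop[nb] = nb
            queue.append(nb)
        while queue:
            node = queue.popleft()
            for nb in links[node]:
                if nb not in visited:  # inherit the first hop used to reach node
                    visited.add(nb)
                    next_hop[nb] = next_hop[node]
                    queue.append(nb)
        routes[src] = next_hop
    return routes
```

For a line topology A-B-C, for example, A routes to C via B.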

    COHERENT BLOCK READ FULFILLMENT
    Publication Type: Invention Publication

    Publication Number: US20240202144A1

    Publication Date: 2024-06-20

    Application Number: US18410554

    Filing Date: 2024-01-11

    Abstract: A coherent memory fabric includes a plurality of coherent master controllers and a coherent slave controller. The plurality of coherent master controllers each include a response data buffer. The coherent slave controller is coupled to the plurality of coherent master controllers. The coherent slave controller, responsive to determining a selected coherent block read command is guaranteed to have only one data response, sends a target request globally ordered message to the selected coherent master controller and transmits responsive data. The selected coherent master controller, responsive to receiving the target request globally ordered message, blocks any coherent probes to an address associated with the selected coherent block read command until receipt of the responsive data is acknowledged by a requesting client.
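The master-side probe-blocking behavior in this abstract can be modeled as a small state machine: after the target request globally ordered message arrives, probes to the read's address are held until the requesting client acknowledges the data. The message name follows the abstract; the class and its data structures are illustrative assumptions.

```python
class CoherentMaster:
    """Toy model of a coherent master controller's probe blocking."""
    def __init__(self):
        self.blocked = set()   # addresses with an outstanding block read
        self.deferred = []     # probes held back while an address is blocked

    def on_target_request_globally_ordered(self, addr):
        # Block coherent probes to this address until the client acks.
        self.blocked.add(addr)

    def on_probe(self, addr):
        if addr in self.blocked:
            self.deferred.append(addr)   # hold the probe, do not service it
            return "deferred"
        return "serviced"

    def on_client_ack(self, addr):
        # Requesting client acknowledged the responsive data:
        # unblock the address and release any held probes for it.
        self.blocked.discard(addr)
        released = [a for a in self.deferred if a == addr]
        self.deferred = [a for a in self.deferred if a != addr]
        return released
```

Probes to unrelated addresses are serviced normally throughout, which is what makes the single-data-response guarantee useful: only the one in-flight address is shielded.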

    Inclusion of Dedicated Accelerators in Graph Nodes

    Publication Number: US20240202003A1

    Publication Date: 2024-06-13

    Application Number: US18066115

    Filing Date: 2022-12-14

    CPC classification number: G06F9/3867 G06F9/4881

    Abstract: Systems, apparatuses, and methods for implementing hierarchical scheduling in a fixed-function graphics pipeline are disclosed. In various implementations, a processor includes a pipeline comprising a plurality of fixed-function units and a scheduler. Responsive to a first mode of operation, the scheduler is configured to schedule a first operation for execution by one or more fixed-function units of the pipeline by scheduling the first operation with a first unit of the pipeline. Responsive to a second mode of operation, the scheduler schedules a second operation for execution by a selected fixed-function unit of the pipeline by scheduling the second operation directly to the selected fixed-function unit, independent of the sequential arrangement of the one or more fixed-function units in the pipeline.
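The two scheduling modes can be sketched minimally, with each fixed-function unit modeled as a callable; this toy dispatch function is an assumption for illustration, not the patented scheduler.

```python
def schedule(pipeline, op, mode, target=None):
    """Sketch of the two modes in the abstract.
    mode 1: submit at the head of the pipeline; the operation flows
            through every fixed-function unit in sequence.
    mode 2: dispatch directly to the selected unit (index `target`),
            independent of the pipeline's sequential arrangement."""
    if mode == 1:
        for unit in pipeline:
            op = unit(op)
        return op
    if mode == 2:
        return pipeline[target](op)
    raise ValueError("unknown mode")
```

With units `[+1, *2]`, mode 1 applies both in order while mode 2 can invoke the second unit alone.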

    Cross GPU scheduling of dependent processes

    Publication Number: US12014442B2

    Publication Date: 2024-06-18

    Application Number: US16721450

    Filing Date: 2019-12-19

    CPC classification number: G06T1/20 G06F9/3838 G06F9/4881

    Abstract: A primary processing unit includes queues configured to store commands prior to execution in corresponding pipelines. The primary processing unit also includes a first table configured to store entries indicating dependencies between commands that are to be executed on different ones of a plurality of processing units that include the primary processing unit and one or more secondary processing units. The primary processing unit also includes a scheduler configured to release commands in response to resolution of the dependencies. In some cases, a first one of the secondary processing units schedules the first command for execution in response to resolution of a dependency on a second command executing in a second one of the secondary processing units. The second one of the secondary processing units notifies the primary processing unit in response to completing execution of the second command.
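The dependency table and release logic described here can be sketched as follows; the class name and layout are illustrative assumptions, showing only the core rule that a command is released once every command it depends on has completed.

```python
class DependencyScheduler:
    """Toy model of the primary unit's dependency table and scheduler."""
    def __init__(self):
        self.deps = {}       # command -> set of unresolved prerequisites
        self.released = []   # commands released to their queues, in order

    def add_command(self, cmd, prerequisites=()):
        pending = set(prerequisites)
        if pending:
            self.deps[cmd] = pending     # record a dependency-table entry
        else:
            self.released.append(cmd)    # nothing to wait on: release now

    def notify_complete(self, done):
        # A secondary processing unit reports that `done` finished
        # executing; resolve that dependency everywhere it appears.
        for cmd, pending in list(self.deps.items()):
            pending.discard(done)
            if not pending:
                del self.deps[cmd]
                self.released.append(cmd)
```

The completion notification from a secondary unit is what drives release, mirroring the abstract's cross-unit notify-then-schedule flow.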

    EFFICIENT SPATIOTEMPORAL RESAMPLING USING PROBABILITY DENSITY FUNCTION SIMILARITY

    Publication Number: US20240193847A1

    Publication Date: 2024-06-13

    Application Number: US18076496

    Filing Date: 2022-12-07

    Inventor: Yusuke Tokuyoshi

    CPC classification number: G06T15/06 G06T3/40

    Abstract: A processor shares path tracing data across sampling locations to amortize computations across space and time. The processor maps a group of sampling locations of a frame that are adjacent to each other to a reservoir. Each reservoir is associated with a ray that intersects subsets of path space such as a pixel. The processor resamples the reservoirs based on a similarity of probability density functions (PDFs) between pixels to select a set of samples mapped to the reservoir. The processor then performs resampling of the selected set of samples to obtain a representative light sample to determine a value for each pixel and renders the frame based on the values of the pixels.
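The PDF-similarity gate on reservoir reuse can be sketched with weighted reservoir sampling; the similarity metric, threshold, and reservoir layout below are assumptions for illustration and are not taken from the patent.

```python
import random

def pdf_similarity(p, q, eps=1e-9):
    """Symmetric ratio in [0, 1]; 1 means identical target PDF values."""
    return min(p, q) / (max(p, q) + eps)

class Reservoir:
    """Weighted reservoir holding one representative light sample."""
    def __init__(self):
        self.sample, self.w_sum, self.count = None, 0.0, 0

    def update(self, sample, weight, rng=random):
        self.w_sum += weight
        self.count += 1
        # Keep the candidate with probability weight / total weight.
        if rng.random() < weight / self.w_sum:
            self.sample = sample

def resample_neighbors(center, neighbors, threshold=0.5):
    """Merge neighbor reservoirs into the center's only when their PDFs
    are similar enough; dissimilar neighbors are skipped to avoid bias.
    neighbors: iterable of (reservoir, pdf_at_center, pdf_at_neighbor)."""
    merged = Reservoir()
    merged.update(center.sample, center.w_sum)
    for nb, p_center, p_nb in neighbors:
        if pdf_similarity(p_center, p_nb) >= threshold:
            merged.update(nb.sample, nb.w_sum)
    return merged
```

Gating the merge on PDF similarity is what lets samples be shared across adjacent pixels and frames without pulling in samples from incompatible distributions.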

    CONFIGURABLE MULTIPLE-DIE GRAPHICS PROCESSING UNIT

    Publication Number: US20240193844A1

    Publication Date: 2024-06-13

    Application Number: US18077424

    Filing Date: 2022-12-08

    CPC classification number: G06T15/005 G06F9/3802

    Abstract: A graphics processing unit (GPU) of a processing system is partitioned into multiple dies (referred to as GPU chiplets) that are configurable to collectively function and interface with an application as a single GPU in a first mode and as multiple GPUs in a second mode. By dividing the GPU into multiple GPU chiplets, the processing system flexibly and cost-effectively configures an amount of active GPU physical resources based on an operating mode. In addition, a configurable number of GPU chiplets are assembled into a single GPU, such that multiple different GPUs having different numbers of GPU chiplets can be assembled using a small number of tape-outs and a multiple-die GPU can be constructed out of GPU chiplets that implement varying generations of technology.

    MULTIPLE PROCESSES SHARING GPU MEMORY OBJECTS
    Publication Type: Invention Publication

    Publication Number: US20240193016A1

    Publication Date: 2024-06-13

    Application Number: US18064170

    Filing Date: 2022-12-09

    CPC classification number: G06F9/544 G06F12/023

    Abstract: An apparatus and method for efficiently executing multiple processes by reducing an amount of memory usage of the processes. In various implementations, a computing system includes a first processor and a second processor that support parallel data applications stored on a remote server that provides cloud computing services to multiple users. The first processor creates multiple processes, referred to as “instances” in parallel computing platforms, for a particular application as users request to execute the application. When the first processor detects a function call of the application within a particular instance, the first processor searches for shareable data objects to be used by the second processor when executing the first instance of the function call, and frees data storage allocated to data objects that are already shared by one or more instances. Therefore, an amount of memory allocated for the multiple instances of the application is reduced.
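The search-and-share step can be sketched as a content-addressed pool: when an instance's function call needs a data object, look it up by content hash and reuse an already-shared copy so the private copy can be freed. The class name, digest choice, and refcounting scheme are illustrative assumptions, not the patent's mechanism.

```python
import hashlib

class SharedObjectPool:
    """Toy pool of shareable data objects, keyed by content digest."""
    def __init__(self):
        self.pool = {}        # content digest -> the one shared object
        self.refcount = {}    # content digest -> number of sharing instances

    def acquire(self, data: bytes) -> bytes:
        key = hashlib.sha256(data).hexdigest()
        if key not in self.pool:
            self.pool[key] = data          # first instance publishes it
        self.refcount[key] = self.refcount.get(key, 0) + 1
        return self.pool[key]              # callers drop their private copy

    def release(self, data: bytes):
        key = hashlib.sha256(data).hexdigest()
        self.refcount[key] -= 1
        if self.refcount[key] == 0:        # last sharing instance is gone:
            del self.pool[key]             # free the shared storage
            del self.refcount[key]
```

N instances holding identical objects then cost one stored copy instead of N, which is the memory reduction the abstract describes.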

    Mechanism for reducing coherence directory controller overhead for near-memory compute elements

    Publication Number: US12008378B2

    Publication Date: 2024-06-11

    Application Number: US18132879

    Filing Date: 2023-04-10

    Abstract: A parallel processing (PP) level coherence directory, also referred to as a Processing In-Memory Probe Filter (PimPF), is added to a coherence directory controller. When the coherence directory controller receives a broadcast PIM command from a host, or a PIM command that is directed to multiple memory banks in parallel, the PimPF accelerates processing of the PIM command by maintaining a directory for cache coherence that is separate from existing system level directories in the coherence directory controller. The PimPF maintains a directory according to address signatures that define the memory addresses affected by a broadcast PIM command. Two implementations are described: a lightweight implementation that accelerates PIM loads into registers, and a heavyweight implementation that accelerates both PIM loads into registers and PIM stores into memory.
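An address signature for a broadcast PIM command can be sketched as the set of addresses the command touches across all banks; the bank/row address split and constants below are illustrative assumptions, since the patent does not specify its signature encoding here.

```python
class PimProbeFilter:
    """Toy PIM probe filter (PimPF): expand a broadcast command's address
    signature and intersect it with the set of dirty cached lines."""
    N_BANKS = 8     # assumed number of banks targeted by a broadcast
    ROW_BITS = 10   # assumed low bits encoding the row within a bank

    def signature(self, row):
        # Addresses touched by a broadcast PIM command to `row`:
        # the same row index in every bank, in parallel.
        return {(bank << self.ROW_BITS) | row for bank in range(self.N_BANKS)}

    def dirty_matches(self, cached_dirty, row):
        # Only these cached lines need writeback/invalidation before the
        # PIM command may safely operate near memory.
        return cached_dirty & self.signature(row)
```

Filtering against the signature is what lets the directory skip system-level lookups for lines the broadcast command cannot touch.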
