DYNAMIC CONTROL OF WORK SCHEDULING
    51.
    Published patent application

    Publication No.: US20240220315A1

    Publication date: 2024-07-04

    Application No.: US18091443

    Filing date: 2022-12-30

    IPC classification: G06F9/48 G06F9/52

    CPC classification: G06F9/4881 G06F9/52

    Abstract: A processing system includes a scheduling mechanism for producing data for fine-grained reordering of workgroups of a kernel to produce blocks of data, such as for communication across devices to enable overlapping of a producer computation with an all-reduce communication across the network. This scheduling mechanism enables a first parallel processor to schedule and execute a set of workgroups of a producer operation to generate data for transmission to a second parallel processor in a desired traffic pattern. At the same time, the second parallel processor schedules and executes a different set of workgroups of the producer operation to generate data for transmission in a desired traffic pattern to a third parallel processor or back to the first parallel processor.
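
The reordering idea can be illustrated with a minimal sketch. It assumes a ring traffic pattern in which, at step s, device r sends chunk (r - s) mod N to its neighbor; the abstract does not name a specific pattern, and `workgroup_order`, `num_devices`, and `workgroups_per_chunk` are hypothetical names. Each device runs the workgroups for the chunk it must send first, so communication of early chunks can overlap with computation of later ones.

```python
from typing import List


def workgroup_order(rank: int, num_devices: int, workgroups_per_chunk: int) -> List[int]:
    """Order workgroup IDs so chunks are produced in the order this device sends them.

    Assumption (not from the abstract): output chunk c is written by the
    contiguous workgroups [c * workgroups_per_chunk, (c + 1) * workgroups_per_chunk).
    """
    order: List[int] = []
    for step in range(num_devices):
        chunk = (rank - step) % num_devices  # chunk this device sends at this step
        start = chunk * workgroups_per_chunk
        order.extend(range(start, start + workgroups_per_chunk))
    return order


# Two devices of a four-device ring start on different chunks.
order_rank0 = workgroup_order(rank=0, num_devices=4, workgroups_per_chunk=2)
order_rank1 = workgroup_order(rank=1, num_devices=4, workgroups_per_chunk=2)
```

Here `order_rank0` is `[0, 1, 6, 7, 4, 5, 2, 3]` and `order_rank1` is `[2, 3, 0, 1, 6, 7, 4, 5]`: every device executes all workgroups, but each begins with a different set.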

    DATA DEPENDENCY-AWARE SCHEDULING
    52.
    Published patent application

    Publication No.: US20240220314A1

    Publication date: 2024-07-04

    Application No.: US18091441

    Filing date: 2022-12-30

    Inventor: Harris Gasparakis

    IPC classification: G06F9/48 G06F9/52

    CPC classification: G06F9/4881 G06F9/522

    Abstract: A processing system flexibly schedules workgroups across kernels based on data dependencies between workgroups to enhance processing efficiency. The workgroups are partitioned into subsets based on the data dependencies, and workgroups of a first subset that produces data are scheduled to execute immediately before workgroups of a second subset that consumes the data generated by the first subset. Thus, the processing system does not execute one kernel at a time, but instead schedules workgroups across kernels based on data dependencies across kernels. By limiting the sizes of the subsets to the amount of data that can be stored at local caches, the processing system increases the probability that data to be consumed by workgroups of a subset will be resident in a local cache and will not require a memory access.
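
A minimal sketch of this interleaving follows. It assumes a one-to-one dependency (consumer workgroup i reads only what producer workgroup i wrote) and a fixed output size per workgroup; both are simplifications, and `interleave_schedule` and its parameters are hypothetical names rather than anything defined in the application.

```python
from typing import List, Tuple


def interleave_schedule(
    num_workgroups: int, bytes_per_workgroup: int, cache_bytes: int
) -> List[Tuple[str, int]]:
    """Return an execution order of (kernel, workgroup_id) pairs.

    Subsets are sized so the data produced by one subset fits in the cache,
    and each producer subset runs immediately before its consumer subset.
    """
    if bytes_per_workgroup <= 0 or cache_bytes < bytes_per_workgroup:
        raise ValueError("cache must hold the output of at least one workgroup")
    subset_size = cache_bytes // bytes_per_workgroup
    order: List[Tuple[str, int]] = []
    for start in range(0, num_workgroups, subset_size):
        ids = range(start, min(start + subset_size, num_workgroups))
        order.extend(("producer", i) for i in ids)  # generate the data
        order.extend(("consumer", i) for i in ids)  # consume it while still cached
    return order


schedule = interleave_schedule(num_workgroups=6, bytes_per_workgroup=1024, cache_bytes=2048)
```

With a cache that holds two workgroups' output, the schedule starts producer 0, producer 1, consumer 0, consumer 1, and then repeats for the next subset, instead of running all six producer workgroups before any consumer.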

    Lookup table optimization for high speed transmit feed-forward equalization link

    Publication No.: US12028190B1

    Publication date: 2024-07-02

    Application No.: US18086960

    Filing date: 2022-12-22

    IPC classification: H04L25/03 H04L25/49

    Abstract: A driver circuit includes a feed-forward equalization (FFE) circuit. The FFE circuit receives a plurality of pulse-amplitude modulation (PAM) symbol values to be transmitted at one of multiple PAM levels. The FFE circuit includes a first partial lookup table, one or more additional partial lookup tables, and an adder circuit. The first partial lookup table contains partial finite impulse-response (FIR) values and is indexed based on a current PAM symbol value, a precursor PAM symbol value, and a postcursor PAM symbol value. The one or more additional partial lookup tables each contain partial FIR values and are indexed based on a respective additional one or more of the PAM symbol values. The adder circuit adds results of lookups from the first partial lookup table and the additional partial lookup tables to produce an output value.
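
The split into partial tables can be checked with a small worked example. The sketch below assumes PAM4 signaling, a four-tap FIR, and integer tap weights; the weights, the tap count, and all names (`TAPS`, `LUT_MAIN`, `LUT_EXTRA`, `ffe_output`) are illustrative assumptions, not values from the patent. Because an FIR output is a sum of per-tap products, a table over three symbols plus a table over the remaining symbol gives the same result as one table over all four.

```python
from itertools import product

PAM4_LEVELS = (-3, -1, 1, 3)  # symbol value 0..3 mapped to a signed amplitude
# Hypothetical integer tap weights, chosen only for illustration.
TAPS = {"pre": -2, "main": 20, "post1": -5, "post2": -1}

# First partial table: indexed by (precursor, current, first postcursor).
LUT_MAIN = {
    (pre, cur, post1): TAPS["pre"] * PAM4_LEVELS[pre]
    + TAPS["main"] * PAM4_LEVELS[cur]
    + TAPS["post1"] * PAM4_LEVELS[post1]
    for pre, cur, post1 in product(range(4), repeat=3)
}
# Additional partial table: indexed by one further symbol (second postcursor).
LUT_EXTRA = {post2: TAPS["post2"] * PAM4_LEVELS[post2] for post2 in range(4)}


def ffe_output(pre: int, cur: int, post1: int, post2: int) -> int:
    """Adder stage: sum the results of the two partial lookups."""
    return LUT_MAIN[(pre, cur, post1)] + LUT_EXTRA[post2]


def fir_direct(pre: int, cur: int, post1: int, post2: int) -> int:
    """Reference: evaluate the four-tap FIR directly."""
    symbols = {"pre": pre, "main": cur, "post1": post1, "post2": post2}
    return sum(TAPS[name] * PAM4_LEVELS[value] for name, value in symbols.items())
```

Under these assumptions the two partial tables hold 64 + 4 = 68 entries, whereas a single table indexed by all four symbols would need 4^4 = 256.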

    ACCELERATING RELAXED REMOTE ATOMICS ON MULTIPLE WRITER OPERATIONS

    Publication No.: US20240211134A1

    Publication date: 2024-06-27

    Application No.: US18087964

    Filing date: 2022-12-23

    IPC classification: G06F3/06

    Abstract: A memory controller includes an arbiter, a vector arithmetic logic unit (VALU), a read buffer and a write buffer both coupled to the VALU, and an atomic memory operation scheduler. The VALU performs scattered atomic memory operations on arrays of data elements responsive to selected memory access commands. The atomic memory operation scheduler is for scheduling atomic memory operations at the VALU; identifying a plurality of scattered atomic memory operations with commutative and associative properties, the plurality of scattered atomic memory operations on at least one element of an array of data elements associated with an address; and commanding the VALU to perform the plurality of scattered atomic memory operations.
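
Why commutativity and associativity matter can be shown in software. The sketch below models pending atomic adds as (element index, value) pairs and groups them per element before applying them once; this pre-combining step is an assumption made for illustration (the abstract only says the scheduler identifies such operations and commands the VALU to perform them), and `combine_atomic_adds` and `apply_combined` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def combine_atomic_adds(ops: List[Tuple[int, int]]) -> Dict[int, int]:
    """Group pending atomic adds by target element and sum their operands.

    This is valid only because addition is commutative and associative:
    the result does not depend on the order or grouping of the operations.
    """
    combined: Dict[int, int] = defaultdict(int)
    for index, value in ops:
        combined[index] += value
    return dict(combined)


def apply_combined(array: List[int], combined: Dict[int, int]) -> None:
    """Apply one read-modify-write per touched element."""
    for index, delta in combined.items():
        array[index] += delta


pending = [(2, 5), (0, 1), (2, -3), (2, 4), (0, 1)]
data = [10, 10, 10, 10]
apply_combined(data, combine_atomic_adds(pending))
```

Five pending operations collapse into two updates, and `data` ends as `[12, 10, 16, 10]`, the same result as applying the adds one at a time in any order.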

    BUFFER DISPLAY DATA IN A CHIPLET ARCHITECTURE

    Publication No.: US20240211023A1

    Publication date: 2024-06-27

    Application No.: US18146811

    Filing date: 2022-12-27

    Abstract: An apparatus and method for efficiently managing power consumption among multiple, replicated functional blocks of an integrated circuit. An integrated circuit includes multiple, replicated functional blocks that use separate power domains. Data of a given type is stored in an interleaved manner among at least two of the multiple functional blocks. In one implementation, a prior static allocation determines that only a subset of the functional blocks store the data of the given type. In another implementation, each of the functional blocks stores the data of the given type, and when an idle state has occurred, data of the given type is moved between the multiple functional blocks until one or more functional blocks no longer store data of the given type. When a transition to the idle state has occurred, the functional blocks that do not store the data of the given type are transitioned to a sleep state.

    TECHNIQUE FOR GENERATING A BOUNDING VOLUME HIERARCHY

    Publication No.: US20240203036A1

    Publication date: 2024-06-20

    Application No.: US18083298

    Filing date: 2022-12-16

    IPC classification: G06T15/08 G06T15/10

    Abstract: A technique for building a bounding volume hierarchy is disclosed. The technique includes subdividing a candidate box node based on a resolution to generate a plurality of cells of the candidate box node; identifying a plurality of nodes of a triangle set collection that fit within the cells; generating a plurality of candidate splits based on the plurality of nodes; selecting a candidate split based on a selection criterion to obtain a selected candidate split; and generating child box nodes for a box node of a bounding volume hierarchy under construction, based on the selected candidate split.
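
The listed steps can be sketched in one dimension. The example below treats boxes as intervals on a single axis, places candidate splits on cell boundaries, and uses a cost of extent times node count per side as the selection criterion; the abstract names no specific criterion, so that cost, the one-dimensional simplification, and the names `choose_split` and `bounds` are all assumptions.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (low, high) along one axis


def bounds(nodes: List[Interval]) -> Interval:
    """Smallest interval enclosing all given nodes."""
    return (min(n[0] for n in nodes), max(n[1] for n in nodes))


def choose_split(
    box: Interval, nodes: List[Interval], resolution: int
) -> Tuple[Interval, Interval]:
    """Return the bounds of the two child boxes for the lowest-cost split."""
    lo, hi = box
    cell_width = (hi - lo) / resolution
    # Keep only nodes that fit entirely inside one cell of the subdivided box.
    binned = []
    for node in nodes:
        first = min(int((node[0] - lo) / cell_width), resolution - 1)
        last = min(int((node[1] - lo) / cell_width), resolution - 1)
        if first == last:
            binned.append((first, node))
    best = None
    for k in range(1, resolution):  # candidate split at each cell boundary
        left = [n for c, n in binned if c < k]
        right = [n for c, n in binned if c >= k]
        if not left or not right:
            continue
        lb, rb = bounds(left), bounds(right)
        # Assumed criterion: extent weighted by node count, summed over sides.
        cost = (lb[1] - lb[0]) * len(left) + (rb[1] - rb[0]) * len(right)
        if best is None or cost < best[0]:
            best = (cost, lb, rb)
    if best is None:
        raise ValueError("no candidate split separates the nodes")
    return best[1], best[2]


nodes = [(0.5, 1.0), (1.2, 1.8), (4.5, 5.5), (6.1, 6.4), (6.5, 7.5)]
left_box, right_box = choose_split(box=(0.0, 8.0), nodes=nodes, resolution=4)
```

For this input the chosen split separates the two leftmost nodes from the other three, giving child boxes `(0.5, 1.8)` and `(4.5, 7.5)`.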