Smart threading in matrix multiplication
    Invention Publication

    Publication No.: US20240320293A1

    Publication Date: 2024-09-26

    Application No.: US18125454

    Filing Date: 2023-03-23

    CPC classification number: G06F17/16 G06F9/4881

    Abstract: Techniques are described in which an estimated optimal thread quantity for matrix multiplication is determined and implemented based on dimensions of the input matrices being multiplied and one or more kernel parameters that vary based on processor architecture. An efficient factorization of the estimated optimal thread quantity is based on the number of blocks along a first dimension of a first input matrix A and the number of blocks along a dimension n of a second input matrix B, both of which are derived from the kernel parameters. In certain embodiments, a command processor of a parallel processor determines an estimated optimal thread quantity for performing a matrix multiplication command responsive to receiving the matrix multiplication command, and then schedules that estimated optimal thread quantity of kernel threads to execute the matrix multiplication command in parallel.
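    Illustrative sketch (not part of the patent): the Python below models one way the estimate and its factorization could work. BLOCK_M, BLOCK_N, and MAX_THREADS are assumed placeholder kernel parameters that would vary by processor architecture; the actual parameters and scheduling limits are not specified in the abstract.

        import math

        # Assumed per-architecture kernel parameters (placeholders,
        # not values from the patent).
        BLOCK_M = 128      # rows of matrix A covered by one kernel tile
        BLOCK_N = 128      # columns of matrix B covered by one kernel tile
        MAX_THREADS = 256  # assumed upper bound on schedulable threads

        def estimate_thread_quantity(m: int, n: int) -> tuple[int, int]:
            """Estimate a thread quantity for an (m x k) @ (k x n)
            multiply and factorize it along the two output dimensions."""
            blocks_m = math.ceil(m / BLOCK_M)  # blocks along dimension m of A
            blocks_n = math.ceil(n / BLOCK_N)  # blocks along dimension n of B
            target = min(blocks_m * blocks_n, MAX_THREADS)
            # Choose the factorization of `target` whose aspect ratio
            # best matches the block grid, so work divides evenly.
            best, best_score = (1, target), float("inf")
            for threads_m in range(1, target + 1):
                if target % threads_m:
                    continue
                threads_n = target // threads_m
                score = abs(threads_m / threads_n - blocks_m / blocks_n)
                if score < best_score:
                    best, best_score = (threads_m, threads_n), score
            return best

        print(estimate_thread_quantity(4096, 2048))  # -> (16, 16) under these assumptions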

    Noise mitigation in single-ended links

    Publication No.: US12101135B2

    Publication Date: 2024-09-24

    Application No.: US18243243

    Filing Date: 2023-09-07

    CPC classification number: H04B3/56 G05F1/46 H04B3/06

    Abstract: An integrated circuit includes a first terminal for receiving a data signal, a second terminal for receiving an external reference voltage, a receiver, and a reference voltage generation circuit. The receiver is powered by a power supply voltage with respect to ground and has a first input coupled to the first terminal, a second input for receiving a shared reference voltage, and an output for providing a data input signal. The reference voltage generation circuit is coupled to the second terminal and receives the power supply voltage. The reference voltage generation circuit is operable to form the shared reference voltage by mixing noise from the power supply voltage and noise from the second terminal.
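    Toy numeric model (assumptions throughout, not circuit values from the patent): mixing the same noise sources into the shared reference that also corrupt the data line makes the noise common mode at the receiver, so it cancels in the data-minus-reference comparison. The mixing weight ALPHA and all voltages below are invented for illustration.

        import random

        VREF_NOMINAL = 0.6  # volts, assumed nominal shared reference
        ALPHA = 0.5         # assumed supply-noise mixing weight

        def shared_reference(vdd_noise: float, ext_ref_noise: float) -> float:
            # Mix noise from the power supply and from the external
            # reference terminal into the shared reference voltage.
            return VREF_NOMINAL + ALPHA * vdd_noise + (1 - ALPHA) * ext_ref_noise

        def receiver(data: float, vref: float) -> int:
            # Single-ended receiver: compare the data input to the reference.
            return 1 if data > vref else 0

        random.seed(0)
        for _ in range(5):
            vdd_noise = random.uniform(-0.05, 0.05)
            ext_noise = random.uniform(-0.05, 0.05)
            # A transmitted "1" at 0.8 V, corrupted by the same coupled noise.
            data = 0.8 + ALPHA * vdd_noise + (1 - ALPHA) * ext_noise
            # The noise terms cancel in (data - vref): always resolves to 1.
            print(receiver(data, shared_reference(vdd_noise, ext_noise)))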

    Address mapping-aware tasking mechanism

    Publication No.: US12099866B2

    Publication Date: 2024-09-24

    Application No.: US17135381

    Filing Date: 2020-12-28

    Abstract: An Address Mapping-Aware Tasking (AMAT) mechanism manages compute task data and issues compute tasks on behalf of threads that created the compute task data. The AMAT mechanism stores compute task data generated by host threads in a set of partitions, where each partition is designated for a particular memory module. The AMAT mechanism maintains address mapping data that maps address information to partitions. Threads push compute task data to the AMAT mechanism instead of generating and issuing their own compute tasks. The AMAT mechanism uses address information included in the compute task data and the address mapping data to determine partitions in which to store the compute task data. The AMAT mechanism then issues compute tasks to be executed near the corresponding memory modules (i.e., in PIM execution units or NUMA compute nodes) based upon the compute task data stored in the partitions.
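    A minimal sketch of the mechanism under assumed details (the real address mapping and partition layout are implementation-specific): addresses interleave across memory modules at cache-line granularity, threads push task data, and the AMAT object stores it in the partition for the owning module before issuing the tasks near that module, e.g. to a processing-in-memory (PIM) execution unit or NUMA node.

        from collections import defaultdict

        NUM_MODULES = 4  # assumed number of memory modules
        LINE_BYTES = 64  # assumed interleave granularity

        def module_for_address(addr: int) -> int:
            # Assumed address mapping: cache-line interleave across modules.
            return (addr // LINE_BYTES) % NUM_MODULES

        class AMAT:
            def __init__(self):
                # One partition of pending compute-task data per memory module.
                self.partitions = defaultdict(list)

            def push(self, addr: int, payload) -> None:
                # Threads push task data; the address mapping data picks
                # the partition instead of the thread issuing its own task.
                self.partitions[module_for_address(addr)].append((addr, payload))

            def issue(self) -> None:
                # Issue each partition's tasks near its memory module.
                for module, tasks in sorted(self.partitions.items()):
                    print(f"module {module}: issuing {len(tasks)} task(s) near memory")
                self.partitions.clear()

        amat = AMAT()
        for addr in (0x0000, 0x0040, 0x0080, 0x1000, 0x1040):
            amat.push(addr, "increment")
        amat.issue()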

    FPGA-based programmable data analysis and compression front end for GPU

    Publication No.: US12099789B2

    Publication Date: 2024-09-24

    Application No.: US17118442

    Filing Date: 2020-12-10

    CPC classification number: G06F30/331 G06F9/3877 G06F30/34

    Abstract: Methods, devices, and systems for information communication. Information transmitted from a host to a graphics processing unit (GPU) is received by information analysis circuitry of a field-programmable gate array (FPGA). A pattern in the information is determined by the information analysis circuitry. A predicted information pattern is determined, by the information analysis circuitry, based on the information. An indication of the predicted information pattern is transmitted to the host. Responsive to a signal from the host based on the predicted information pattern, the FPGA is reprogrammed to implement decompression circuitry based on the predicted information pattern. In some implementations, the information includes a plurality of packets. In some implementations, the predicted information pattern includes a pattern in a plurality of packets. In some implementations, the predicted information pattern includes a zero data pattern.
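    A hedged sketch of the flow, with invented packet and threshold details: the analysis side watches host-to-GPU traffic, predicts a pattern (here, the zero data pattern the abstract mentions), reports it to the host, and on the host's signal switches to a matching decompression path; reprogramming the FPGA is modeled as swapping in a function.

        ZERO_FRACTION_THRESHOLD = 0.75  # assumed detection threshold

        def analyze(packets: list[bytes]) -> str | None:
            # Analysis circuitry: look for a dominant zero-byte pattern.
            zeros = sum(b == 0 for p in packets for b in p)
            total = sum(len(p) for p in packets)
            if total and zeros / total >= ZERO_FRACTION_THRESHOLD:
                return "zero-data"
            return None

        def zero_run_decompress(stream: list[tuple[int, int]]) -> bytes:
            # Assumed compressed form: (byte value, run length) pairs.
            return b"".join(bytes([value]) * count for value, count in stream)

        packets = [bytes(64), bytes(64), b"\x01\x02" + bytes(62)]
        pattern = analyze(packets)
        print("predicted pattern:", pattern)     # indication sent to the host

        if pattern == "zero-data":               # host signals back...
            decompress = zero_run_decompress     # ...front end is "reprogrammed"
            print(decompress([(0, 4), (7, 2)]))  # b'\x00\x00\x00\x00\x07\x07'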

    Tag and data configuration for fine-grained cache memory

    Publication No.: US12099723B2

    Publication Date: 2024-09-24

    Application No.: US17956614

    Filing Date: 2022-09-29

    CPC classification number: G06F3/0613 G06F3/0659 G06F3/0679

    Abstract: A method is provided for operating a memory having a plurality of banks accessible in parallel, each bank including a plurality of grains accessible in parallel. The method includes: based on a memory access request that specifies a memory address, identifying a set that stores data for the memory access request, wherein the set is spread across multiple grains of the plurality of grains; and performing operations to satisfy the memory access request, using entries of the set stored across the multiple grains of the plurality of grains.
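    A minimal sketch of the addressing idea, with assumed field widths and placement (none of these constants come from the patent): an address resolves to a bank and a set, and the set's entries are spread across GRAINS_PER_SET grains so they can be read in parallel.

        BANKS = 8           # banks accessible in parallel (assumed)
        GRAINS = 4          # grains per bank, accessible in parallel (assumed)
        SETS = 1024         # sets per bank (assumed)
        GRAINS_PER_SET = 2  # assumed: each set's entries span this many grains

        def locate(addr: int) -> tuple[int, int, list[int]]:
            """Map an address to (bank, set, grains holding the set's entries)."""
            bank = addr % BANKS
            set_index = (addr // BANKS) % SETS
            # Assumed placement: the set occupies consecutive grains,
            # starting at a grain chosen from the set index.
            first_grain = set_index % GRAINS
            grains = [(first_grain + i) % GRAINS for i in range(GRAINS_PER_SET)]
            return bank, set_index, grains

        bank, set_index, grains = locate(0xBEEF)
        # The request is satisfied by reading those grains in parallel.
        print(f"bank {bank}, set {set_index}, entries spread across grains {grains}")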
