FPGA-BASED PROGRAMMABLE DATA ANALYSIS AND COMPRESSION FRONT END FOR GPU

    公开(公告)号:US20220188493A1

    公开(公告)日:2022-06-16

    申请号:US17118442

    申请日:2020-12-10

    Abstract: Methods, devices, and systems for information communication. Information transmitted from a host to a graphics processing unit (GPU) is received by information analysis circuitry of a field-programmable gate array (FPGA). A pattern in the information is determined by the information analysis circuitry. A predicted information pattern is determined, by the information analysis circuitry, based on the information. An indication of the predicted information pattern is transmitted to the host. Responsive to a signal from the host based on the predicted information pattern, the FPGA is reprogrammed to implement decompression circuitry based on the predicted information pattern. In some implementations, the information includes a plurality of packets. In some implementations, the predicted information pattern includes a pattern in a plurality of packets. In some implementations, the predicted information pattern includes a zero data pattern.

    MEMORY REQUEST PRIORITY ASSIGNMENT TECHNIQUES FOR PARALLEL PROCESSORS

    公开(公告)号:US20210173796A1

    公开(公告)日:2021-06-10

    申请号:US16706421

    申请日:2019-12-06

    Abstract: Systems, apparatuses, and methods for implementing memory request priority assignment techniques for parallel processors are disclosed. A system includes at least a parallel processor coupled to a memory subsystem, where the parallel processor includes at least a plurality of compute units for executing wavefronts in lock-step. The parallel processor assigns priorities to memory requests of wavefronts on a per-work-item basis by indexing into a first priority vector, with the index generated based on lane-specific information. If a given event is detected, a second priority vector is generated by applying a given priority promotion vector to the first priority vector. Then, for subsequent wavefronts, memory requests are assigned priorities by indexing into the second priority vector with lane-specific information. The use of priority vectors to assign priorities to memory requests helps to reduce the memory divergence problem experienced by different work-items of a wavefront.

    Systems and methods for programmed branch predictors

    公开(公告)号:US12217059B1

    公开(公告)日:2025-02-04

    申请号:US18193177

    申请日:2023-03-30

    Abstract: The disclosed device a controller that sets an iteration counter for a loop based on an iteration value read from a loop iteration instruction for the loop. The controller also updates the iteration counter based on a number of times a loop heading instruction for the loop is decoded. When the iteration counter reaches an end value, the controller selects a not taken identifier for the loop to be fetched, to avoid a branch misprediction. Various other methods, systems, and computer-readable media are also disclosed.

    Look-ahead teleportation for reliable computation in multi-SIMD quantum processor

    公开(公告)号:US12079634B2

    公开(公告)日:2024-09-03

    申请号:US16794124

    申请日:2020-02-18

    CPC classification number: G06F9/3887 G06F8/41 G06N10/00

    Abstract: A technique for processing qubits in a quantum computing device is provided. The technique includes determining that, in a first cycle, a first quantum processing region is to perform a first quantum operation that does not use a qubit that is stored in the first quantum processing region, identifying a second quantum processing region that is to perform a second quantum operation at a second cycle that is later than the first cycle, wherein the second quantum operation uses the qubit, determining that between the first cycle and the second cycle, no quantum operations are performed in the second quantum processing region, and moving the qubit from the first quantum processing region to the second quantum processing region.

    LOOK-AHEAD TELEPORTATION FOR RELIABLE COMPUTATION IN MULTI-SIMD QUANTUM PROCESSOR

    公开(公告)号:US20210255871A1

    公开(公告)日:2021-08-19

    申请号:US16794124

    申请日:2020-02-18

    Abstract: A technique for processing qubits in a quantum computing device is provided. The technique includes determining that, in a first cycle, a first quantum processing region is to perform a first quantum operation that does not use a qubit that is stored in the first quantum processing region, identifying a second quantum processing region that is to perform a second quantum operation at a second cycle that is later than the first cycle, wherein the second quantum operation uses the qubit, determining that between the first cycle and the second cycle, no quantum operations are performed in the second quantum processing region, and moving the qubit from the first quantum processing region to the second quantum processing region.

    MECHANISMS TO IMPROVE DATA LOCALITY FOR DISTRIBUTED GPUS

    公开(公告)号:US20180115496A1

    公开(公告)日:2018-04-26

    申请号:US15331002

    申请日:2016-10-21

    Abstract: Systems, apparatuses, and methods for implementing mechanisms to improve data locality for distributed processing units are disclosed. A system includes a plurality of distributed processing units (e.g., GPUs) and memory devices. Each processing unit is coupled to one or more local memory devices. The system determines how to partition a workload into a plurality of workgroups based on maximizing data locality and data sharing. The system determines which subset of the plurality of workgroups to dispatch to each processing unit of the plurality of processing units based on maximizing local memory accesses and minimizing remote memory accesses. The system also determines how to partition data buffer(s) based on data sharing patterns of the workgroups. The system maps to each processing unit a separate portion of the data buffer(s) so as to maximize local memory accesses and minimize remote memory accesses.

    LARGE NUMBER INTEGER ADDITION USING VECTOR ACCUMULATION

    公开(公告)号:US20240319964A1

    公开(公告)日:2024-09-26

    申请号:US18126107

    申请日:2023-03-24

    CPC classification number: G06F7/503

    Abstract: A processor includes one or more processor cores configured to perform accumulate top (ACCT) and accumulate bottom (ACCB) instructions. To perform such instructions, at least one processor core of the processor includes an ACCT data path that adds a first portion of a block of data to a first lane of a set of lanes of a top accumulator and adds a carry-out bit to a second lane of the set of lanes of the top accumulator. Further, the at least one processor core includes an ACCB data path that adds a second portion of the block of data to a first lane of a set of lanes of a bottom accumulator and adds a carry-out bit to a second lane of the set of lanes of the bottom accumulator.

    Near-memory determination of registers

    公开(公告)号:US11966328B2

    公开(公告)日:2024-04-23

    申请号:US17126977

    申请日:2020-12-18

    CPC classification number: G06F12/06 G06F2212/1041

    Abstract: A memory module includes register selection logic to select alternate local source and/or destination registers to process PIM commands. The register selection logic uses an address-based register selection approach to select an alternate local source and/or destination register based upon address data specified by a PIM command and a split address maintained by a memory module. The register selection logic may alternatively use a register data-based approach to select an alternate local source and/or destination register based upon data stored in one or more local registers. A PIM-enabled memory module configured with the register selection logic described herein is capable of selecting an alternate local source and/or destination register to process PIM commands at or near the PIM execution unit where the PIM commands are executed.

    BIGNUM ADDITION AND/OR SUBTRACTION WITH CARRY PROPAGATION

    公开(公告)号:US20240111489A1

    公开(公告)日:2024-04-04

    申请号:US17955634

    申请日:2022-09-29

    CPC classification number: G06F7/4981 G06F7/506

    Abstract: A processing unit includes a plurality of adders and a plurality of carry bit generation circuits. The plurality of adders add first and second X bit binary portion values of a first Y bit binary value and a second Y bit binary value. Y is a multiple of X. The plurality of adders further generate first carry bits. The plurality of carry bit generation circuits is coupled to the plurality of adders, respectively, and receive the first carry bits. The plurality of carry bit generation circuits generate second carry bits based on the first carry bits. The plurality of adders use the second carry bits to add the first and second X bit binary portions of the first and second Y bit binary values, respectively.

Patent Agency Ranking