Systems and methods for programmed branch predictors

    Publication No.: US12217059B1

    Publication Date: 2025-02-04

    Application No.: US18193177

    Application Date: 2023-03-30

    Abstract: The disclosed device includes a controller that sets an iteration counter for a loop based on an iteration value read from a loop iteration instruction for the loop. The controller also updates the iteration counter based on the number of times a loop heading instruction for the loop is decoded. When the iteration counter reaches an end value, the controller selects a not-taken identifier for the loop to be fetched, avoiding a branch misprediction. Various other methods, systems, and computer-readable media are also disclosed.
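The counting mechanism in the abstract can be illustrated with a small sketch. This is a hypothetical model, not the patented implementation; the class and method names are invented for illustration.

```python
# Hypothetical sketch of the loop-termination predictor described in the
# abstract. All names here are illustrative assumptions, not from the patent.

class LoopPredictor:
    """Counts decoded loop-heading instructions against a programmed
    iteration count and predicts not-taken on the final iteration."""

    def __init__(self, iteration_value: int):
        # Iteration counter set from the loop iteration instruction.
        self.counter = iteration_value

    def on_loop_heading_decoded(self) -> str:
        """Called each time the loop heading instruction is decoded;
        returns the predicted direction of the loop's backward branch."""
        self.counter -= 1
        # Once the counter reaches its end value, select the not-taken
        # path so the loop exit does not mispredict.
        return "not_taken" if self.counter <= 0 else "taken"

predictor = LoopPredictor(iteration_value=3)
history = [predictor.on_loop_heading_decoded() for _ in range(3)]
print(history)  # ['taken', 'taken', 'not_taken']
```

A conventional predictor trained on "taken" history would mispredict the final iteration; programming the trip count lets the last branch be predicted correctly.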

    COMBINED SPARSE AND BLOCK FLOATING ARITHMETIC

    Publication No.: US20240201948A1

    Publication Date: 2024-06-20

    Application No.: US18083273

    Application Date: 2022-12-16

    Inventor: Gabriel H. Loh

    CPC classification number: G06F7/483

    Abstract: A processing device for encoding floating point numbers includes memory configured to store data comprising the floating point numbers, and circuitry. The circuitry is configured to, for a set of the floating point numbers, identify which of the floating point numbers represent a zero value and which represent a non-zero value, convert the floating point numbers which represent a non-zero value into a block floating point format value, and generate an encoded sparse block floating point format value. The circuitry is also configured to decode floating point numbers. For an encoded block floating point format value, the circuitry converts the encoded block floating point format value to a set of non-zero floating point numbers based on a sparsity mask previously generated to encode the encoded block floating point format value, and generates a non-sparse set of floating point values.
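The encode/decode round trip can be sketched as follows. This is a minimal software model under assumed parameters (8-bit mantissas, a shared exponent taken from the largest magnitude); the function names and format details are illustrative, not the patent's encoding.

```python
import math

def encode_sparse_bfp(values, mantissa_bits=8):
    """Encode a block as (sparsity mask, shared exponent, quantized
    mantissas for the non-zero elements only). Hypothetical format."""
    mask = [v != 0.0 for v in values]
    nonzero = [v for v in values if v != 0.0]
    if not nonzero:
        return mask, 0, []
    # Shared block exponent chosen from the largest magnitude element.
    shared_exp = max(math.frexp(v)[1] for v in nonzero)
    scale = 2 ** (mantissa_bits - 1)
    mantissas = [round(v / 2 ** shared_exp * scale) for v in nonzero]
    return mask, shared_exp, mantissas

def decode_sparse_bfp(mask, shared_exp, mantissas, mantissa_bits=8):
    """Reconstruct the dense block, re-inserting zeros per the mask."""
    scale = 2 ** (mantissa_bits - 1)
    it = iter(mantissas)
    return [next(it) / scale * 2 ** shared_exp if m else 0.0 for m in mask]

block = [0.0, 1.5, 0.0, -0.75]
encoded = encode_sparse_bfp(block)
decoded = decode_sparse_bfp(*encoded)
print(decoded)  # [0.0, 1.5, 0.0, -0.75]
```

The storage win is the combination of the two techniques: zeros cost one mask bit each rather than a full mantissa, and the remaining elements share a single exponent.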

    CACHE FOR STORING REGIONS OF DATA

    Publication No.: US20200183848A1

    Publication Date: 2020-06-11

    Application No.: US16214363

    Application Date: 2018-12-10

    Inventor: Gabriel H. Loh

    Abstract: Systems, apparatuses, and methods for efficiently performing memory accesses in a computing system are disclosed. A computing system includes one or more clients, a communication fabric, and a last-level cache implemented with low-latency, high-bandwidth memory. The cache controller for the last-level cache determines a range of addresses corresponding to a first region of system memory with a copy of data stored in a second region of the last-level cache. The cache controller sends a selected memory access request to system memory when it determines the request address of the memory access request is not within the range of addresses. The cache controller services the selected memory access request by accessing data from the last-level cache when it determines the request address is within the range of addresses.
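The routing decision described above reduces to a range check, which a short sketch can make concrete. This is an assumed simplification (one contiguous region, base-plus-size bounds); the class name and interface are invented for illustration.

```python
class RegionCacheController:
    """Hypothetical model of the region-tracking cache controller:
    routes a memory access either to the last-level cache or to
    system memory based on a tracked range of copied addresses."""

    def __init__(self, region_base: int, region_size: int):
        # Range of system-memory addresses whose data has been
        # copied into a region of the last-level cache.
        self.base = region_base
        self.limit = region_base + region_size

    def route(self, request_address: int) -> str:
        # Within the tracked range: service from the last-level cache.
        if self.base <= request_address < self.limit:
            return "last_level_cache"
        # Otherwise: forward the request to system memory.
        return "system_memory"

ctrl = RegionCacheController(region_base=0x1000, region_size=0x1000)
print(ctrl.route(0x1800))  # last_level_cache
print(ctrl.route(0x3000))  # system_memory
```

Tracking a whole region with two bounds, rather than tagging individual lines, keeps the hit/miss decision cheap for a very large cache.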

    MECHANISMS TO IMPROVE DATA LOCALITY FOR DISTRIBUTED GPUS

    Publication No.: US20180115496A1

    Publication Date: 2018-04-26

    Application No.: US15331002

    Application Date: 2016-10-21

    Abstract: Systems, apparatuses, and methods for implementing mechanisms to improve data locality for distributed processing units are disclosed. A system includes a plurality of distributed processing units (e.g., GPUs) and memory devices. Each processing unit is coupled to one or more local memory devices. The system determines how to partition a workload into a plurality of workgroups based on maximizing data locality and data sharing. The system determines which subset of the plurality of workgroups to dispatch to each processing unit of the plurality of processing units based on maximizing local memory accesses and minimizing remote memory accesses. The system also determines how to partition data buffer(s) based on data sharing patterns of the workgroups. The system maps to each processing unit a separate portion of the data buffer(s) so as to maximize local memory accesses and minimize remote memory accesses.
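One way to picture the dispatch step is as an affinity vote: each workgroup goes to the processing unit whose local partition of the data buffer it touches most. The sketch below is a toy heuristic under assumed conditions (a buffer block-partitioned into equal page ranges across GPUs); it is not the patent's partitioning algorithm, and all names are illustrative.

```python
def assign_workgroups(workgroup_pages, num_gpus, pages_per_gpu):
    """Assign each workgroup to the GPU whose local memory holds the
    most pages that the workgroup accesses, maximizing local accesses.
    Assumes the data buffer is block-partitioned across GPUs."""
    assignment = {}
    for wg, pages in workgroup_pages.items():
        # Count how many of this workgroup's accesses land in each
        # GPU's local partition of the buffer.
        local_counts = [0] * num_gpus
        for page in pages:
            local_counts[(page // pages_per_gpu) % num_gpus] += 1
        # Dispatch to the GPU with the highest local-access count.
        assignment[wg] = max(range(num_gpus), key=lambda g: local_counts[g])
    return assignment

# Toy example: 2 GPUs, each locally holding 4 consecutive buffer pages.
wgs = {"wg0": [0, 1, 2, 5], "wg1": [4, 5, 6, 1]}
print(assign_workgroups(wgs, num_gpus=2, pages_per_gpu=4))
# {'wg0': 0, 'wg1': 1}
```

In the toy example, wg0 mostly touches GPU 0's pages and wg1 mostly touches GPU 1's, so each is dispatched where its accesses stay local and remote traffic over the inter-GPU links is minimized.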
