DYNAMIC SETUP AND HOLD TIMES ADJUSTMENT FOR MEMORIES

    Publication No.: US20230112432A1

    Publication date: 2023-04-13

    Application No.: US17564747

    Filing date: 2021-12-29

    Abstract: A system and method for efficiently capturing data by sequential circuits across multiple operating conditions are described. In various implementations, an integrated circuit includes multiple signal arrival adjusters both at its I/O boundaries and across its die. The signal arrival adjuster includes two internal timing paths, each with a respective latency. The signal arrival adjuster receives an input signal, and generates an output signal from a selected one of the first timing path and the second timing path. The signal arrival adjuster sends the output signal to a sequential circuit. The sequential circuit uses the output signal as one of an input data signal and an input clock signal. The selection between the two timing paths within the signal arrival adjuster aids in satisfying the setup and hold time requirements of the sequential circuit.
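
The path selection described in this abstract can be illustrated with a simplified timing model. The sketch below is not taken from the patent: the function names, the nanosecond units, and the single-cycle setup/hold window are all assumptions made for illustration. It picks the first internal path whose added latency places the signal inside the window allowed by the setup and hold times.

```python
from typing import Optional, Sequence


def meets_timing(arrival_ns: float, path_latency_ns: float,
                 period_ns: float, setup_ns: float, hold_ns: float) -> bool:
    """Check a simplified single-cycle window (an assumed model).

    The signal must change no earlier than `hold_ns` after the previous
    clock edge and no later than `setup_ns` before the next one.
    """
    t = arrival_ns + path_latency_ns
    return hold_ns <= t <= period_ns - setup_ns


def select_path(arrival_ns: float, path_latencies_ns: Sequence[float],
                period_ns: float, setup_ns: float,
                hold_ns: float) -> Optional[int]:
    """Return the index of the first path that meets timing, or None."""
    for index, latency in enumerate(path_latencies_ns):
        if meets_timing(arrival_ns, latency, period_ns, setup_ns, hold_ns):
            return index
    return None


# Hypothetical adjuster with a short path (0.0 ns) and a long path (0.10 ns).
paths = (0.0, 0.10)
early = select_path(0.02, paths, period_ns=1.0, setup_ns=0.10, hold_ns=0.05)
late = select_path(0.85, paths, period_ns=1.0, setup_ns=0.10, hold_ns=0.05)
```

In this model an early-arriving signal violates hold time on the short path and is routed through the long path, while a late-arriving signal uses the short path to preserve setup margin.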

    Low power and low latency GPU coprocessor for persistent computing

    Publication No.: US11625807B2

    Publication date: 2023-04-11

    Application No.: US17181300

    Filing date: 2021-02-22

    Abstract: Systems, apparatuses, and methods for implementing a graphics processing unit (GPU) coprocessor are disclosed. The GPU coprocessor includes a SIMD unit with the ability to self-schedule sub-wave procedures based on input data flow events. A host processor sends messages targeting the GPU coprocessor to a queue. In response to detecting a first message in the queue, the GPU coprocessor schedules a first sub-task for execution. The GPU coprocessor includes an inter-lane crossbar and intra-lane biased indexing mechanism for a vector general purpose register (VGPR) file. The VGPR file is split into two files. The first VGPR file is a larger register file with one read port and one write port. The second VGPR file is a smaller register file with multiple read ports and one write port. The second VGPR introduces the ability to co-issue more than one instruction per clock cycle.

    Preserving memory ordering between offloaded instructions and non-offloaded instructions

    Publication No.: US11625249B2

    Publication date: 2023-04-11

    Application No.: US17137140

    Filing date: 2020-12-29

    Abstract: Preserving memory ordering between offloaded instructions and non-offloaded instructions is disclosed. An offload instruction for an operation to be offloaded is processed and a lock is placed on a memory address associated with the offload instruction. In response to completing a cache operation targeting the memory address, the lock on the memory address is removed. For multithreaded applications, upon determining that a plurality of processor cores have each begun executing a sequence of offload instructions, the execution of non-offload instructions that are younger than any of the offload instructions is restricted. In response to determining that each processor core has completed executing its sequence of offload instructions, the restriction is removed. The remote device may be, for example, a processing-in-memory device or an accelerator coupled to a memory.
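
The address-locking step in this abstract can be pictured as a small lock table. The sketch below is illustrative only: the class name, the method names, and the rule that other accesses wait while an address is locked are assumptions, since the abstract states only that a lock is placed when an offload instruction is processed and removed when a cache operation targeting that address completes.

```python
class AddressLockTable:
    """Toy model of per-address locks for offloaded operations (assumed design)."""

    def __init__(self) -> None:
        self._locked = set()

    def process_offload(self, address: int) -> None:
        # Processing an offload instruction places a lock on its address.
        self._locked.add(address)

    def complete_cache_operation(self, address: int) -> None:
        # Completing the cache operation for the address removes the lock.
        self._locked.discard(address)

    def may_access(self, address: int) -> bool:
        # Assumption: non-offloaded accesses wait while the address is locked.
        return address not in self._locked


table = AddressLockTable()
table.process_offload(0x1000)
blocked = not table.may_access(0x1000)
table.complete_cache_operation(0x1000)
released = table.may_access(0x1000)
```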

    Dropout for accelerated deep learning in heterogeneous architectures

    Publication No.: US11620525B2

    Publication date: 2023-04-04

    Application No.: US16141648

    Filing date: 2018-09-25

    Inventor: Abhinav Vishnu

    Abstract: A heterogeneous processing system includes at least one central processing unit (CPU) core and at least one graphics processing unit (GPU) core. The heterogeneous processing system is configured to compute an activation for each one of a plurality of neurons for a first network layer of a neural network. The heterogeneous processing system randomly drops a first subset of the plurality of neurons for the first network layer and keeps a second subset of the plurality of neurons for the first network layer. Activation for each one of the second subset of the plurality of neurons is forwarded to the CPU core and coalesced to generate a set of coalesced activation sub-matrices.
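
The drop-and-coalesce step in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the patented method: the function name is hypothetical, activations are assumed to be precomputed with one row per neuron, and the kept rows are packed into a single dense matrix rather than a set of sub-matrices.

```python
import random
from typing import List, Tuple


def dropout_and_coalesce(activations: List[List[float]], drop_prob: float,
                         rng: random.Random) -> Tuple[List[int], List[List[float]]]:
    """Randomly drop neurons (rows) and pack the kept rows contiguously.

    Returns the indices of the kept neurons together with the coalesced
    matrix, so later layers can map results back to the original neurons.
    """
    kept_indices = [i for i in range(len(activations))
                    if rng.random() >= drop_prob]
    coalesced = [activations[i] for i in kept_indices]
    return kept_indices, coalesced


# Four neurons, two samples each; the values are made up for illustration.
layer = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
kept, dense = dropout_and_coalesce(layer, drop_prob=0.5, rng=random.Random(0))
```

Packing the kept rows contiguously means downstream work touches only the surviving neurons instead of multiplying by zeroed rows.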

    CROSS FIELD EFFECT TRANSISTOR (XFET) ARCHITECTURE PROCESS

    Publication No.: US20230102901A1

    Publication date: 2023-03-30

    Application No.: US17489221

    Filing date: 2021-09-29

    Abstract: A system and method for creating layout for standard cells are described. In various implementations, a standard cell uses Cross field effect transistors (FETs) that include vertically stacked gate all around (GAA) transistors with conducting channels oriented in an orthogonal direction between them. The direction of current flow of the top GAA transistor is orthogonal to the direction of current flow of the bottom GAA transistor. The channels of the vertically stacked transistors use opposite doping polarities. The orthogonal orientation allows both the top and bottom GAA transistors to have the maximum mobility for their respective carriers based on their orientation. The Cross FETs utilize a single metal layer and a single via layer for connections between the top and bottom GAA transistors.

    STACKED COMMAND QUEUE
    Type: Patent application

    Publication No.: US20230102680A1

    Publication date: 2023-03-30

    Application No.: US17491058

    Filing date: 2021-09-30

    Abstract: A memory controller includes a command queue with multiple entry stacks, each with a plurality of entries holding memory access commands, one or more parameter indicators each holding a respective characteristic common to the plurality of entries, and a head indicator designating a current entry for arbitration. An arbiter has a single command input for each entry stack. A command queue loader circuit receives incoming memory access commands and loads entries of respective entry stacks with memory access commands having the respective characteristic of each of the one or more parameter indicators in common.
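
The queue organization in this abstract can be modeled as a list of stacks that each share one characteristic. The sketch below is a software analogy with hypothetical names (`EntryStack`, `StackedCommandQueue`, `load`, `issue`); the abstract does not specify which characteristics are shared, so a single opaque `parameter` value stands in for them (for example, a row address).

```python
from dataclasses import dataclass, field
from typing import Hashable, List, Optional


@dataclass
class EntryStack:
    """Entries share `parameter`; `head` marks the entry offered to the arbiter."""
    parameter: Hashable
    entries: List[str] = field(default_factory=list)
    head: int = 0

    def current(self) -> Optional[str]:
        return self.entries[self.head] if self.head < len(self.entries) else None


class StackedCommandQueue:
    def __init__(self, num_stacks: int, stack_depth: int) -> None:
        self.num_stacks = num_stacks
        self.stack_depth = stack_depth
        self.stacks: List[EntryStack] = []

    def load(self, command: str, parameter: Hashable) -> bool:
        """Add to a stack with a matching parameter, else open a new stack."""
        for stack in self.stacks:
            if stack.parameter == parameter and len(stack.entries) < self.stack_depth:
                stack.entries.append(command)
                return True
        if len(self.stacks) < self.num_stacks:
            self.stacks.append(EntryStack(parameter, [command]))
            return True
        return False  # no room for this command

    def arbiter_inputs(self) -> List[Optional[str]]:
        """The arbiter sees exactly one candidate command per entry stack."""
        return [stack.current() for stack in self.stacks]

    def issue(self, stack_index: int) -> Optional[str]:
        """Issue a stack's head command and advance its head indicator."""
        stack = self.stacks[stack_index]
        command = stack.current()
        if command is not None:
            stack.head += 1
        if stack.current() is None:
            self.stacks.pop(stack_index)  # retire an exhausted stack
        return command


queue = StackedCommandQueue(num_stacks=2, stack_depth=2)
queue.load("RD A", "row0")
queue.load("RD B", "row0")
queue.load("WR C", "row1")
candidates = queue.arbiter_inputs()
```

Because the arbiter compares one command per stack rather than one per entry, the number of arbiter inputs grows with the stack count and not with the total queue depth.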

    DECOMPOSING MATRICES FOR PROCESSING AT A PROCESSOR-IN-MEMORY

    Publication No.: US20230102296A1

    Publication date: 2023-03-30

    Application No.: US17490037

    Filing date: 2021-09-30

    Abstract: A processing unit decomposes a matrix for partial processing at a processor-in-memory (PIM) device. The processing unit receives a matrix to be used as an operand in an arithmetic operation (e.g., a matrix multiplication operation). In response, the processing unit decomposes the matrix into two component matrices: a sparse component matrix and a dense component matrix. The processing unit itself performs the arithmetic operation with the dense component matrix, but sends the sparse component matrix to the PIM device for execution of the arithmetic operation. The processing unit thereby offloads at least some of the processing overhead to the PIM device, improving overall efficiency of the processing system.
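
The decomposition in this abstract relies on the identity A·x = S·x + D·x when A = S + D. The sketch below illustrates that identity only. The row-wise split and the `max_nonzeros_per_row` threshold are assumptions chosen for illustration, since the abstract does not say how the sparse and dense components are selected, and both products are computed locally here rather than on a PIM device.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def decompose(matrix: Matrix, max_nonzeros_per_row: int) -> Tuple[Matrix, Matrix]:
    """Split a matrix into sparse and dense components that sum to it.

    Assumed rule: rows with at most `max_nonzeros_per_row` nonzero values
    go to the sparse component; all other rows go to the dense component.
    """
    sparse: Matrix = []
    dense: Matrix = []
    for row in matrix:
        zeros = [0] * len(row)
        nonzeros = sum(1 for value in row if value != 0)
        if nonzeros <= max_nonzeros_per_row:
            sparse.append(list(row))
            dense.append(zeros)
        else:
            sparse.append(zeros)
            dense.append(list(row))
    return sparse, dense


def matvec(matrix: Matrix, vector: List[float]) -> List[float]:
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


a = [[1, 0, 0], [1, 2, 3], [0, 0, 4]]
x = [1, 1, 1]
s, d = decompose(a, max_nonzeros_per_row=1)
# The two partial products, computed separately, recombine into A·x.
combined = [p + q for p, q in zip(matvec(s, x), matvec(d, x))]
```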

    GLASS CORE PACKAGE SUBSTRATES
    Type: Patent application

    Publication No.: US20230102183A1

    Publication date: 2023-03-30

    Application No.: US17489182

    Filing date: 2021-09-29

    Abstract: Apparatuses, systems, and methods for efficiently generating a package substrate are described. A semiconductor fabrication process (or process) fabricates each of a first glass package substrate and a second glass package substrate with a redistribution layer on a single side of a respective glass wafer. The process flips the second glass package substrate upside down and connects the glass wafers of the first and second glass package substrates together using a wafer bonding technique. In some implementations, the process uses copper-based wafer bonding. The resulting bonding between the two glass wafers contains no air gap, no underfill, and no solder bumps. Afterward, the side of the first glass package substrate opposite the glass wafer is connected to at least one integrated circuit. Additionally, the side of the second glass package substrate opposite the glass wafer is connected to a component on the motherboard through pads on the motherboard.
