Preserving memory ordering between offloaded instructions and non-offloaded instructions

    Publication No.: US11625249B2

    Publication Date: 2023-04-11

    Application No.: US17137140

    Filing Date: 2020-12-29

    Abstract: Preserving memory ordering between offloaded instructions and non-offloaded instructions is disclosed. An offload instruction for an operation to be offloaded is processed and a lock is placed on a memory address associated with the offload instruction. In response to completing a cache operation targeting the memory address, the lock on the memory address is removed. For multithreaded applications, upon determining that a plurality of processor cores have each begun executing a sequence of offload instructions, the execution of non-offload instructions that are younger than any of the offload instructions is restricted. In response to determining that each processor core has completed executing its sequence of offload instructions, the restriction is removed. The remote device may be, for example, a processing-in-memory device or an accelerator coupled to a memory.
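The per-address locking scheme described in the abstract can be sketched as follows. This is a simplified software illustration of the idea, not the patented hardware implementation; the class and method names are invented for this example.

```python
class OffloadMemoryTracker:
    """Sketch of the abstract's scheme: a memory address associated with
    an offload instruction is locked when the instruction is processed,
    and unlocked when the cache operation targeting that address
    (e.g. a flush or invalidation) completes."""

    def __init__(self):
        self.locked_addresses = set()

    def issue_offload(self, address):
        # Processing an offload instruction places a lock on its address.
        self.locked_addresses.add(address)

    def cache_operation_complete(self, address):
        # Completing the cache operation removes the lock.
        self.locked_addresses.discard(address)

    def may_execute_non_offload(self, address):
        # A younger non-offload access to a locked address must wait,
        # preserving ordering between offloaded and non-offloaded work.
        return address not in self.locked_addresses
```

A non-offload load or store would poll `may_execute_non_offload` (in hardware, stall in the load/store queue) until the lock clears.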

    Dropout for accelerated deep learning in heterogeneous architectures

    Publication No.: US11620525B2

    Publication Date: 2023-04-04

    Application No.: US16141648

    Filing Date: 2018-09-25

    Inventor: Abhinav Vishnu

    Abstract: A heterogeneous processing system includes at least one central processing unit (CPU) core and at least one graphics processing unit (GPU) core. The heterogeneous processing system is configured to compute an activation for each one of a plurality of neurons for a first network layer of a neural network. The heterogeneous processing system randomly drops a first subset of the plurality of neurons for the first network layer and keeps a second subset of the plurality of neurons for the first network layer. The activation for each neuron in the second subset is forwarded to the CPU core and coalesced to generate a set of coalesced activation sub-matrices.
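The drop-and-coalesce step can be sketched in plain Python. This is an illustrative assumption about how the coalescing might look, not the patented method; the function name, the keep probability, and the use of a flat list in place of a sub-matrix are all invented for the example.

```python
import random

def dropout_and_coalesce(activations, keep_prob=0.5, rng=None):
    """Randomly drop a subset of neurons and keep the rest, then
    coalesce the surviving activations into a contiguous structure so
    downstream layers operate only on kept neurons."""
    rng = rng or random.Random(0)  # fixed seed for a reproducible sketch
    kept_indices = [i for i in range(len(activations))
                    if rng.random() < keep_prob]
    # Coalesce: gather only the kept activations (a stand-in for
    # building the dense activation sub-matrices of the abstract).
    coalesced = [activations[i] for i in kept_indices]
    return kept_indices, coalesced
```

At inference time, dropout would be disabled (all neurons kept); the sketch covers only the training-time path the abstract describes.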

    CROSS FIELD EFFECT TRANSISTOR (XFET) ARCHITECTURE PROCESS

    Publication No.: US20230102901A1

    Publication Date: 2023-03-30

    Application No.: US17489221

    Filing Date: 2021-09-29

    Abstract: A system and method for creating layouts for standard cells are described. In various implementations, a standard cell uses cross field effect transistors (XFETs) that include vertically stacked gate-all-around (GAA) transistors with conducting channels oriented orthogonally to one another. The direction of current flow of the top GAA transistor is orthogonal to the direction of current flow of the bottom GAA transistor. The channels of the vertically stacked transistors use opposite doping polarities. The orthogonal orientation allows both the top and bottom GAA transistors to achieve the maximum mobility for their respective carriers based on their orientation. The XFETs utilize a single metal layer and a single via layer for connections between the top and bottom GAA transistors.

    STACKED COMMAND QUEUE
    Invention Application

    Publication No.: US20230102680A1

    Publication Date: 2023-03-30

    Application No.: US17491058

    Filing Date: 2021-09-30

    Abstract: A memory controller includes a command queue with multiple entry stacks, each with a plurality of entries holding memory access commands, one or more parameter indicators each holding a respective characteristic common to the plurality of entries, and a head indicator designating a current entry for arbitration. An arbiter has a single command input for each entry stack. A command queue loader circuit receives incoming memory access commands and loads entries of respective entry stacks with memory access commands having the respective characteristic of each of the one or more parameter indicators in common.
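The structure the abstract describes can be sketched as a small data model. This is a behavioral illustration only, not the claimed circuit; the choice of DRAM bank as the shared parameter, and all class and field names, are assumptions made for the example.

```python
from collections import deque

class EntryStack:
    """One entry stack: a plurality of entries that all share a
    parameter indicator (here, the target bank), plus a head indicator
    designating the current entry for arbitration."""
    def __init__(self, bank):
        self.bank = bank        # parameter common to every entry
        self.entries = deque()  # commands sharing that parameter

    def head(self):
        # The single entry this stack presents to the arbiter.
        return self.entries[0] if self.entries else None

class StackedCommandQueue:
    """Sketch of the command queue: a loader routes each incoming
    command to the stack whose shared parameter it matches, and the
    arbiter sees one command input per stack."""
    def __init__(self):
        self.stacks = {}

    def load(self, command):
        bank = command["bank"]
        self.stacks.setdefault(bank, EntryStack(bank)).entries.append(command)

    def arbiter_inputs(self):
        return [s.head() for s in self.stacks.values()]
```

Grouping same-parameter commands this way shrinks the arbiter: it compares one candidate per stack rather than every queued command.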

    DECOMPOSING MATRICES FOR PROCESSING AT A PROCESSOR-IN-MEMORY

    Publication No.: US20230102296A1

    Publication Date: 2023-03-30

    Application No.: US17490037

    Filing Date: 2021-09-30

    Abstract: A processing unit decomposes a matrix for partial processing at a processor-in-memory (PIM) device. The processing unit receives a matrix to be used as an operand in an arithmetic operation (e.g., a matrix multiplication operation). In response, the processing unit decomposes the matrix into two component matrices: a sparse component matrix and a dense component matrix. The processing unit itself performs the arithmetic operation with the dense component matrix, but sends the sparse component matrix to the PIM device for execution of the arithmetic operation. The processing unit thereby offloads at least some of the processing overhead to the PIM device, improving overall efficiency of the processing system.
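One way to picture the decomposition is a split A = S + D, where S is the sparse component sent to the PIM device and D is the dense component kept on the processing unit. The sketch below uses a row-wise zero-fraction heuristic; the abstract does not specify the splitting criterion, so the threshold and the row granularity are illustrative assumptions.

```python
def decompose(matrix, threshold=0.5):
    """Split a matrix into a sparse component (rows that are mostly
    zero, to be offloaded to the PIM device) and a dense component
    (the remaining rows, processed locally). The two components sum
    back to the original matrix."""
    rows, cols = len(matrix), len(matrix[0])
    sparse = [[0.0] * cols for _ in range(rows)]
    dense = [[0.0] * cols for _ in range(rows)]
    for r, row in enumerate(matrix):
        zero_fraction = sum(1 for v in row if v == 0) / cols
        # Mostly-zero rows go to the sparse component for the PIM device.
        target = sparse if zero_fraction >= threshold else dense
        target[r] = list(row)
    return sparse, dense
```

Because S + D equals the original operand, the partial products computed on the PIM device and the processing unit can simply be added to recover the full result of the matrix multiplication.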

    GLASS CORE PACKAGE SUBSTRATES
    Invention Application

    Publication No.: US20230102183A1

    Publication Date: 2023-03-30

    Application No.: US17489182

    Filing Date: 2021-09-29

    Abstract: Apparatuses, systems and methods for efficiently generating a package substrate. A semiconductor fabrication process (or process) fabricates each of a first glass package substrate and a second glass package substrate with a redistribution layer on a single side of a respective glass wafer. The process flips the second glass package substrate upside down and connects the glass wafers of the first and second glass package substrates together using a wafer bonding technique. In some implementations, the process uses copper-based wafer bonding. The resulting bonding between the two glass wafers contains no air gap, no underfill, and no solder bumps. Afterward, the side of the first glass package substrate opposite the glass wafer is connected to at least one integrated circuit. Additionally, the side of the second glass package substrate opposite the glass wafer is connected to a component on the motherboard through pads on the motherboard.

    DETERMINISTIC MIXED LATENCY CACHE
    Invention Application

    Publication No.: US20230101038A1

    Publication Date: 2023-03-30

    Application No.: US17489741

    Filing Date: 2021-09-29

    Abstract: A method and processing device for accessing data is provided. The processing device comprises a cache and a processor. The cache comprises a first data section having a first cache hit latency and a second data section having a second cache hit latency that is different from the first cache hit latency of the first data section. The processor is configured to request access to data in memory, the data corresponding to a memory address which includes an identifier that identifies the first data section of the cache. The processor is also configured to load the requested data, determined to be located in the first data section of the cache, according to the first cache hit latency of the first data section of the cache.
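The key idea in the abstract, that an identifier embedded in the memory address selects the data section and therefore fixes the hit latency deterministically, can be sketched as a bit test. The bit position and the latency values below are illustrative assumptions, not values from the patent.

```python
# Hypothetical parameters: which address bit identifies the section,
# and the hit latency of each data section, in cycles.
SECTION_BIT = 12
FAST_LATENCY_CYCLES = 2   # first data section
SLOW_LATENCY_CYCLES = 5   # second data section

def hit_latency(address):
    """Return the deterministic cache-hit latency for an address:
    the identifier bit selects the data section, and each section
    has a fixed, known hit latency."""
    in_fast_section = (address >> SECTION_BIT) & 1 == 0
    return FAST_LATENCY_CYCLES if in_fast_section else SLOW_LATENCY_CYCLES
```

Because the latency is known from the address alone, a scheduler can issue dependent operations at the right cycle instead of waiting for a variable-latency response.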

    SOCKET ACTUATION MECHANISM FOR PACKAGE INSERTION AND PACKAGE-SOCKET ALIGNMENT

    Publication No.: US20230100491A1

    Publication Date: 2023-03-30

    Application No.: US17487929

    Filing Date: 2021-09-28

    Abstract: A socket actuation mechanism for package insertion and package-socket alignment, including: a socket frame comprising a plurality of first hinge portions; a carrier frame comprising: a center portion comprising one or more package interlocks; and a tab extending from a first end of the carrier frame, the tab comprising a second hinge portion couplable with the plurality of first hinge portions to form a hinge coupling the carrier frame to the socket frame.
