Method And Apparatus For Quantization And Dequantization Of Neural Network Input And Output Data Using Processing-In-Memory

    Publication Number: US20250006232A1

    Publication Date: 2025-01-02

    Application Number: US18346110

    Filing Date: 2023-06-30

    Abstract: An apparatus and method for creating less computationally intensive nodes for a neural network. An integrated circuit includes a host processor and multiple memory channels, each with multiple memory array banks. Each of the memory array banks includes components of a processing-in-memory (PIM) accelerator and a scatter and gather circuit used to dynamically perform quantization operations and dequantization operations that offload these operations from the host processor. The host processor executes a data model that represents a neural network. The memory array banks store a single copy of a particular data value in a single precision. Therefore, the memory array banks avoid storing replications of the same data value with different precisions to be used by a neural network node. The memory array banks dynamically perform quantization operations and dequantization operations on one or more of the weight values, input data values, and activation output values of the neural network.
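The abstract does not specify the quantization scheme the PIM accelerator applies; a common choice is affine (scale and zero-point) quantization to uint8, which lets the memory banks keep one low-precision copy of each value and reconstruct an approximation on demand. A minimal sketch of that assumed scheme (all names here are illustrative, not from the patent):

```python
def quantize(x, scale, zero_point, qmin=0, qmax=255):
    """Affine-quantize a float to an integer code in the uint8 range."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))

def dequantize(q, scale, zero_point):
    """Recover an approximate float from the integer code."""
    return (q - zero_point) * scale

# One uint8 copy is stored; full-precision values are reconstructed
# near memory when a network layer needs them.
scale, zero_point = 0.05, 128
weights = [-1.0, 0.0, 0.7]
codes = [quantize(w, scale, zero_point) for w in weights]
approx = [dequantize(c, scale, zero_point) for c in codes]
```

Dequantization introduces at most one quantization step of error per value, which is why a single stored copy can serve nodes that would otherwise each keep their own precision-specific replica.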

    Accessing a Cache Based on an Address Translation Buffer Result

    Publication Number: US20240193097A1

    Publication Date: 2024-06-13

    Application Number: US18064155

    Filing Date: 2022-12-09

    CPC classification number: G06F12/1045 G06F12/0897

    Abstract: Address translation is performed to translate a virtual address targeted by a memory request (e.g., a load or memory request for data or an instruction) to a physical address. This translation is performed using an address translation buffer, e.g., a translation lookaside buffer (TLB). One or more actions are taken to reduce data access latencies for memory requests in the event of a TLB miss where the virtual address to physical address translation is not in the TLB. Examples of actions that are performed in various implementations in response to a TLB miss include bypassing level 1 (L1) and level 2 (L2) caches in the memory system, and speculatively sending the memory request to the L2 cache while checking whether the memory request is satisfied by the L1 cache.
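The second action the abstract names, speculatively sending the request to L2 while the L1 check is still in flight, can be modeled in a few lines. This is a toy software model under simplifying assumptions (dictionaries for the caches, an identity-mapped page walk); the function and variable names are illustrative:

```python
def page_walk(vaddr):
    """Stand-in for the page-table walk; identity translation for the model."""
    return vaddr

def handle_request(vaddr, tlb, l1, l2):
    """Toy model of 'speculative L2 send on TLB miss'.

    tlb maps virtual to physical addresses; l1 and l2 map physical
    addresses to data. Returns (data, level_that_served_it).
    """
    if vaddr in tlb:                      # TLB hit: normal sequential path
        paddr = tlb[vaddr]
        if paddr in l1:
            return l1[paddr], "L1"
        return l2.get(paddr), "L2"
    # TLB miss: resolve the translation, then send the request to L2
    # speculatively, in parallel with (modeled here as before) the L1 check.
    paddr = page_walk(vaddr)
    speculative = l2.get(paddr)           # issued without waiting for L1
    if paddr in l1:                       # L1 satisfied it: drop the speculation
        return l1[paddr], "L1"
    return speculative, "L2-speculative"
```

The payoff is latency hiding: when the post-miss access would miss L1 anyway, the L2 lookup has already started instead of waiting for the L1 result.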

    Method and apparatus for virtualizing the micro-op cache

    Publication Number: US11586441B2

    Publication Date: 2023-02-21

    Application Number: US17125730

    Filing Date: 2020-12-17

    Abstract: Systems, apparatuses, and methods for virtualizing a micro-operation cache are disclosed. A processor includes at least a micro-operation cache, a conventional cache subsystem, a decode unit, and control logic. The decode unit decodes instructions into micro-operations which are then stored in the micro-operation cache. The micro-operation cache has limited capacity for storing micro-operations. When new micro-operations are decoded from pending instructions, existing micro-operations are evicted from the micro-operation cache to make room for the new micro-operations. Rather than being discarded, micro-operations evicted from the micro-operation cache are stored in the conventional cache subsystem. This prevents the original instruction from having to be decoded again on subsequent executions. When the control logic determines that micro-operations for one or more fetched instructions are stored in either the micro-operation cache or the conventional cache subsystem, the control logic causes the decode unit to transition to a reduced-power state.
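The core idea, spilling evicted micro-operations into the conventional cache subsystem rather than discarding them, resembles a victim-cache arrangement. A minimal sketch assuming an LRU micro-op cache and a dictionary standing in for the conventional cache subsystem (class and field names are illustrative):

```python
from collections import OrderedDict

class VirtualizedUopCache:
    def __init__(self, capacity, backing):
        self.capacity = capacity
        self.uop_cache = OrderedDict()   # pc -> decoded micro-ops, LRU order
        self.backing = backing           # stand-in for the conventional caches

    def insert(self, pc, uops):
        if pc in self.uop_cache:
            self.uop_cache.move_to_end(pc)
        self.uop_cache[pc] = uops
        if len(self.uop_cache) > self.capacity:
            # Spill the LRU victim into the conventional cache subsystem
            # instead of discarding it.
            victim_pc, victim_uops = self.uop_cache.popitem(last=False)
            self.backing[victim_pc] = victim_uops

    def lookup(self, pc):
        """Return cached micro-ops, or None if the decoder must run."""
        if pc in self.uop_cache:
            self.uop_cache.move_to_end(pc)
            return self.uop_cache[pc]
        if pc in self.backing:           # hit in the conventional cache: refill
            uops = self.backing.pop(pc)
            self.insert(pc, uops)
            return uops
        return None
```

A `lookup` that returns non-None from either structure is what lets the control logic keep the decode unit in its reduced-power state.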

    Speculative DRAM request enabling and disabling

    Publication Number: US12189953B2

    Publication Date: 2025-01-07

    Application Number: US17956417

    Filing Date: 2022-09-29

    Abstract: Methods, devices, and systems for retrieving information based on cache miss prediction. It is predicted, based on a history of cache misses at a private cache, that a cache lookup for the information will miss a shared victim cache. A speculative memory request is enabled based on the prediction that the cache lookup for the information will miss the shared victim cache. The information is fetched based on the enabled speculative memory request.
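The abstract only says the prediction comes from "a history of cache misses"; one conventional way to realize such a history-based enable/disable decision is a saturating counter, sketched below. The counter mechanism and all names are assumptions for illustration, not details from the patent:

```python
class SpeculationGate:
    """Enable or disable speculative DRAM requests from miss history.

    A saturating counter tracks recent victim-cache outcomes: repeated
    misses push it up and enable speculation; hits push it back down.
    """
    def __init__(self, threshold=2, maximum=3):
        self.counter = 0
        self.threshold = threshold
        self.maximum = maximum

    def should_speculate(self):
        """True when history predicts the victim-cache lookup will miss."""
        return self.counter >= self.threshold

    def record(self, victim_cache_missed):
        """Update the history with the actual outcome of a lookup."""
        if victim_cache_missed:
            self.counter = min(self.maximum, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)
```

Gating speculation this way avoids wasting DRAM bandwidth on speculative fetches during phases when the victim cache is actually absorbing the misses.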

    Tag and data configuration for fine-grained cache memory

    Publication Number: US12099723B2

    Publication Date: 2024-09-24

    Application Number: US17956614

    Filing Date: 2022-09-29

    CPC classification number: G06F3/0613 G06F3/0659 G06F3/0679

    Abstract: A method for operating a memory having a plurality of banks accessible in parallel, each bank including a plurality of grains accessible in parallel is provided. The method includes: based on a memory access request that specifies a memory address, identifying a set that stores data for the memory access request, wherein the set is spread across multiple grains of the plurality of grains; and performing operations to satisfy the memory access request, using entries of the set stored across the multiple grains of the plurality of grains.
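Spreading a set's entries across multiple grains means each way of the set can be looked up in parallel on independently accessible grains. A toy address-mapping sketch under assumed layout parameters (the bank/set/way arithmetic here is illustrative, not taken from the patent):

```python
def locate_set(addr, num_banks, grains_per_bank, sets_per_grain, ways):
    """Map an address to a set whose entries span multiple grains.

    Returns (bank, set_index, entries) where entries is a list of
    (grain, way) pairs; placing each way in a different grain lets the
    ways of one set be accessed in parallel.
    """
    bank = addr % num_banks
    set_index = (addr // num_banks) % sets_per_grain
    # Distribute the set's ways round-robin across the bank's grains.
    entries = [(way % grains_per_bank, way) for way in range(ways)]
    return bank, set_index, entries
```

When the way count does not exceed the grain count, every entry of the set lands in a distinct grain, which is the property that makes a single memory access request serviceable by parallel grain reads.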

    SPECULATIVE DRAM REQUEST ENABLING AND DISABLING

    Publication Number: US20240111420A1

    Publication Date: 2024-04-04

    Application Number: US17956417

    Filing Date: 2022-09-29

    CPC classification number: G06F3/0611 G06F3/0653 G06F3/0673

    Abstract: Methods, devices, and systems for retrieving information based on cache miss prediction. It is predicted, based on a history of cache misses at a private cache, that a cache lookup for the information will miss a shared victim cache. A speculative memory request is enabled based on the prediction that the cache lookup for the information will miss the shared victim cache. The information is fetched based on the enabled speculative memory request.
