Processor for performing dynamic programming according to an instruction, and a method for configuring a processor for dynamic programming via an instruction

    Publication number: US11726757B2

    Publication date: 2023-08-15

    Application number: US16811068

    Application date: 2020-03-06

    CPC classification number: G06F8/451 G06F9/5066 G06T1/20

    Abstract: The disclosure provides processors that are configured to perform dynamic programming according to an instruction, a method for configuring a processor for dynamic programming according to an instruction, and a method of computing a modified Smith-Waterman algorithm employing an instruction for configuring a parallel processing unit. In one example, the method for configuring includes: (1) receiving, by execution cores of the processor, an instruction that directs the execution cores to compute a set of recurrence equations employing a matrix, (2) configuring the execution cores, according to the set of recurrence equations, to compute states for elements of the matrix, and (3) storing the computed states for current elements of the matrix in registers of the execution cores, wherein the computed states are determined based on the set of recurrence equations and input data.
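
    A minimal Python sketch of the kind of recurrence equations the abstract refers to: an affine-gap Smith-Waterman variant written as a scalar loop. The scoring parameters and function names are illustrative assumptions, not the patent's parallel, instruction-driven formulation.

        # Scalar reference for Smith-Waterman-style recurrences (affine gaps).
        # H: best local-alignment score ending at (i, j); E/F: best scores
        # ending in a gap. These are the per-element "states" the abstract's
        # execution cores would compute in parallel; this loop is sequential.
        def smith_waterman(a: str, b: str, match=2, mismatch=-1,
                           gap_open=2, gap_extend=1) -> int:
            n, m = len(a), len(b)
            H = [[0] * (m + 1) for _ in range(n + 1)]
            E = [[0] * (m + 1) for _ in range(n + 1)]
            F = [[0] * (m + 1) for _ in range(n + 1)]
            best = 0
            for i in range(1, n + 1):
                for j in range(1, m + 1):
                    E[i][j] = max(H[i][j-1] - gap_open, E[i][j-1] - gap_extend)
                    F[i][j] = max(H[i-1][j] - gap_open, F[i-1][j] - gap_extend)
                    s = match if a[i-1] == b[j-1] else mismatch
                    H[i][j] = max(0, E[i][j], F[i][j], H[i-1][j-1] + s)
                    best = max(best, H[i][j])
            return best

        print(smith_waterman("GATTACA", "GCATGCU"))  # small smoke test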

    OPTIMALLY CLIPPED TENSORS AND VECTORS
    Invention publication

    Publication number: US20230237308A1

    Publication date: 2023-07-27

    Application number: US17814957

    Application date: 2022-07-26

    CPC classification number: G06N3/04 G06N3/08

    Abstract: Quantizing tensors and vectors processed within a neural network reduces power consumption and may accelerate processing. Quantization reduces the number of bits used to represent a value, and decreasing the number of bits can decrease the accuracy of computations that use the value. Ideally, quantization is performed without reducing accuracy. Quantization-aware training (QAT) is performed by dynamically quantizing tensors (weights and activations) using optimal clipping scalars, “optimal” in that the mean squared error (MSE) of the quantized operation is minimized; the clipping scalars define the degree or amount of quantization for the various tensors of the operation. Conventional techniques that quantize tensors during training suffer from high amounts of noise (error). Other techniques compute the clipping scalars offline through a brute-force search to provide high accuracy. In contrast, the optimal clipping scalars can be computed online and provide the same accuracy as clipping scalars computed offline.
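
    A Python sketch of computing an MSE-optimal clipping scalar online. The fixed-point recursion below follows the published OCTAV formulation for B-bit quantization; it is an assumption here, not a confirmed restatement of this filing, and all names are illustrative.

        import numpy as np

        def optimal_clip(x: np.ndarray, bits: int, iters: int = 30) -> float:
            # Assumed OCTAV fixed-point iteration: s converges to the clipping
            # scalar minimizing quantization mean squared error,
            #   s <- E[|x|; |x| > s] / ((4**-bits / 3) * P(|x| <= s) + P(|x| > s))
            mag = np.abs(x).ravel()
            s = mag.mean()                       # any positive start works
            for _ in range(iters):
                over = mag > s
                num = mag[over].sum()
                den = (4.0 ** -bits / 3.0) * np.count_nonzero(~over) \
                      + np.count_nonzero(over)
                s = num / den
            return float(s)

        # Example: 4-bit clipping scalar for a Gaussian weight tensor.
        w = np.random.default_rng(0).normal(size=100_000)
        print(optimal_clip(w, bits=4))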

    MAPPING LOGICAL AND PHYSICAL PROCESSORS AND LOGICAL AND PHYSICAL MEMORY

    Publication number: US20230237011A1

    Publication date: 2023-07-27

    Application number: US17581734

    Application date: 2022-01-21

    CPC classification number: G06F15/80 G06F12/0646 G06F2212/7201

    Abstract: A mapping may be made between an array of physical processors and an array of functional logical processors. A mapping may also be made between logical memory channels (associated with the logical processors) and functional physical memory channels (associated with the physical processors). These mappings may be stored within one or more tables, which may then be used to bypass faulty processors and memory channels when implementing memory accesses, while optimizing locality (e.g., by minimizing the distance between memory channels and processors).
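
    A minimal Python sketch of one such table: logical processor i is mapped to the i-th functional physical processor, so faulty units are bypassed and neighboring logical processors stay physically adjacent. The names and the simple in-order policy are illustrative assumptions, not the patent's mapping.

        def build_remap_table(num_physical: int, faulty: set) -> list:
            # Logical index -> physical index, skipping faulty units.
            return [p for p in range(num_physical) if p not in faulty]

        proc_table = build_remap_table(8, faulty={2, 5})

        def physical_processor(logical_id: int) -> int:
            # Every access consults the table before touching hardware.
            return proc_table[logical_id]

        print(proc_table)              # [0, 1, 3, 4, 6, 7]
        print(physical_processor(2))   # logical 2 lands on physical 3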

    DRAM WITH SEGMENTED PAGE CONFIGURATION

    Publication number: US20170154667A1

    Publication date: 2017-06-01

    Application number: US15430393

    Application date: 2017-02-10

    Abstract: This description is directed to a dynamic random access memory (DRAM) array having a plurality of rows and a plurality of columns. The array further includes a plurality of cells, each of which is associated with one of the columns and one of the rows. Each cell includes a capacitor that is selectively coupled to a bit line of its associated column so as to share charge with the bit line when the cell is selected. A segmented word line circuit for each row is controllable to cause selection of only a portion of the cells in the row.

    SRAM voltage assist
    Invention grant, in force

    Publication number: US09460776B2

    Publication date: 2016-10-04

    Application number: US13748499

    Application date: 2013-01-23

    CPC classification number: G11C11/4125 G11C11/404 G11C11/419

    Abstract: The disclosure provides for an SRAM array having a plurality of wordlines and a plurality of bitlines, referred to generally as SRAM lines. The array has a plurality of cells, each cell being defined by an intersection between one of the wordlines and one of the bitlines. The SRAM array further includes voltage boost circuitry operatively coupled with the cells, the voltage boost circuitry being configured to provide an amount of voltage boost that is based on an address of a cell to be accessed and/or to provide this voltage boost on an SRAM line via capacitive charge coupling.

    CURRENT PARKING RESPONSE TO TRANSIENT LOAD DEMANDS
    Invention application, in force

    Publication number: US20140097813A1

    Publication date: 2014-04-10

    Application number: US13647202

    Application date: 2012-10-08

    Abstract: Embodiments are disclosed relating to an electric power conversion device and methods for controlling the operation thereof. One disclosed embodiment provides an electric power conversion device comprising a first current control mechanism coupled to an electric power source and an upstream end of an inductor, where the first current control mechanism is operable to control inductor current. The electric power conversion device further comprises a second current control mechanism coupled between the downstream end of the inductor and a load, where the second current control mechanism is operable to control how much of the inductor current is delivered to the load.

    Inference accelerator using logarithmic-based arithmetic

    Publication number: US12141225B2

    Publication date: 2024-11-12

    Application number: US16750823

    Application date: 2020-01-23

    Abstract: Neural networks, in many cases, include convolution layers that are configured to perform many convolution operations that require multiplication and addition operations. Compared with performing multiplication on integer, fixed-point, or floating-point format values, performing multiplication on logarithmic format values is straightforward and energy efficient as the exponents are simply added. However, performing addition on logarithmic format values is more complex. Conventionally, addition is performed by converting the logarithmic format values to integers, computing the sum, and then converting the sum back into the logarithmic format. Instead, logarithmic format values may be added by decomposing the exponents into separate quotient and remainder components, sorting the quotient components based on the remainder components, summing the sorted quotient components using an asynchronous accumulator to produce partial sums, and multiplying the partial sums by the remainder components to produce a sum. The sum may then be converted back into the logarithmic format.
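
    A Python sketch of the addition scheme the abstract describes, under the assumption that values are stored as log2 exponents with F fractional bits, v = 2**(e / 2**F). Writing e = q * 2**F + r splits each value into an integer power 2**q (a shift) and one of 2**F remainder factors 2**(r / 2**F); terms are grouped by remainder, the 2**q parts are accumulated per group, and each partial sum is scaled by its remainder factor only once at the end. The encoding and names are illustrative.

        from collections import defaultdict
        import math

        F = 2                                # fractional exponent bits

        def log_encode(v: float) -> int:
            return round(math.log2(v) * (1 << F))

        def log_add(exponents: list) -> float:
            partial = defaultdict(int)       # remainder -> sum of 2**q terms
            for e in exponents:
                q, r = divmod(e, 1 << F)
                partial[r] += 2 ** q         # integer shift-and-add
            # One multiply per remainder bucket (at most 2**F buckets).
            return sum(s * 2 ** (r / (1 << F)) for r, s in partial.items())

        vals = [1.0, 2.0, 4.0, 5.656854]     # 5.656854 ~= 2**2.5
        print(log_add([log_encode(v) for v in vals]))  # ~12.657 = sum(vals)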

    Application partitioning for locality in a stacked memory system

    Publication number: US12099453B2

    Publication date: 2024-09-24

    Application number: US17709031

    Application date: 2022-03-30

    Abstract: Embodiments of the present disclosure relate to application partitioning for locality in a stacked memory system. In an embodiment, one or more memory dies are stacked on the processor die. The processor die includes multiple processing tiles, and each memory die includes multiple memory tiles. Vertically aligned memory tiles are directly coupled to, and comprise the local memory block for, a corresponding processing tile. An application program that operates on dense multi-dimensional arrays (matrices) may partition the dense arrays into sub-arrays associated with program tiles. The data for each sub-array is stored in a local memory block, and the processing tile corresponding to that local memory block executes the associated program tile to process the sub-array data.
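
    A minimal Python sketch (assumed names, not the patent's API) of partitioning a dense matrix into per-tile sub-arrays, one for each processing tile's local memory block.

        import numpy as np

        def partition(matrix: np.ndarray, grid: tuple) -> dict:
            # Split into grid[0] x grid[1] sub-arrays keyed by tile id (i, j).
            rows, cols = grid
            tiles = {}
            for i, row_block in enumerate(np.array_split(matrix, rows, axis=0)):
                for j, tile in enumerate(np.array_split(row_block, cols, axis=1)):
                    tiles[(i, j)] = tile
            return tiles

        a = np.arange(64).reshape(8, 8)
        subarrays = partition(a, (2, 2))     # four tiles, one 4x4 block each
        print(subarrays[(0, 1)])             # sub-array for processing tile (0, 1)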

    Asynchronous accumulator using logarithmic-based arithmetic

    Publication number: US12033060B2

    Publication date: 2024-07-09

    Application number: US16750917

    Application date: 2020-01-23

    Abstract: Neural networks, in many cases, include convolution layers that are configured to perform many convolution operations that require multiplication and addition operations. Compared with performing multiplication on integer, fixed-point, or floating-point format values, performing multiplication on logarithmic format values is straightforward and energy efficient as the exponents are simply added. However, performing addition on logarithmic format values is more complex. Conventionally, addition is performed by converting the logarithmic format values to integers, computing the sum, and then converting the sum back into the logarithmic format. Instead, logarithmic format values may be added by decomposing the exponents into separate quotient and remainder components, sorting the quotient components based on the remainder components, summing the sorted quotient components using an asynchronous accumulator to produce partial sums, and multiplying the partial sums by the remainder components to produce a sum. The sum may then be converted back into the logarithmic format.

    NEURAL NETWORK ACCELERATOR USING LOGARITHMIC-BASED ARITHMETIC

    Publication number: US20240112007A1

    Publication date: 2024-04-04

    Application number: US18537570

    Application date: 2023-12-12

    CPC classification number: G06N3/063 G06F7/4833 G06F17/16

    Abstract: Neural networks, in many cases, include convolution layers that are configured to perform many convolution operations that require multiplication and addition operations. Compared with performing multiplication on integer, fixed-point, or floating-point format values, performing multiplication on logarithmic format values is straightforward and energy efficient as the exponents are simply added. However, performing addition on logarithmic format values is more complex. Conventionally, addition is performed by converting the logarithmic format values to integers, computing the sum, and then converting the sum back into the logarithmic format. Instead, logarithmic format values may be added by decomposing the exponents into separate quotient and remainder components, sorting the quotient components based on the remainder components, summing the sorted quotient components to produce partial sums, and multiplying the partial sums by the remainder components to produce a sum. The sum may then be converted back into the logarithmic format.
