Memory access for multiple circuit components

    Publication No.: US10768856B1

    Publication Date: 2020-09-08

    Application No.: US15919167

    Filing Date: 2018-03-12

    Abstract: Disclosed herein are techniques for performing memory access. In one embodiment, an integrated circuit may include a memory device, a first port to receive first data elements from a memory access circuit within a first time period, and a second port to transmit second data elements to the memory access circuit within a second time period. The memory access circuit may receive the first data elements from the memory device within a third time period shorter than the first time period and transmit, via the first port, the received first data elements to a first processing circuit sequentially within the first time period. The memory access circuit may receive, via the second port, the second data elements from a second processing circuit sequentially within the second time period, and store the received second data elements in the memory device within a fourth time period shorter than the second time period.
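    The abstract describes a buffering pattern: a short, wide burst between the memory device and the memory access circuit, paired with a longer element-by-element stream on the port side. The sketch below is a software model of that behavior only; the class and method names (MemoryAccessModel, read_burst_then_stream, collect_stream_then_write) are illustrative assumptions, not identifiers from the patent.

        class MemoryAccessModel:
            """Software model of the buffering behavior, not the patented circuit."""

            def __init__(self, memory):
                self.memory = memory  # backing store indexed by address

            def read_burst_then_stream(self, base_addr, count):
                # Fast, wide read from the memory device (the shorter "third time period").
                burst = [self.memory[base_addr + i] for i in range(count)]
                # Slow, sequential drain to the processing circuit (the longer "first time period").
                for element in burst:
                    yield element

            def collect_stream_then_write(self, base_addr, stream):
                # Slow, sequential fill from the processing circuit (the "second time period").
                buffered = list(stream)
                # Fast, wide commit to the memory device (the shorter "fourth time period").
                for i, element in enumerate(buffered):
                    self.memory[base_addr + i] = element

        mem = list(range(32))
        circuit = MemoryAccessModel(mem)
        print(list(circuit.read_burst_then_stream(base_addr=4, count=8)))  # [4, 5, ..., 11]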

    Filtering control plane decision requests for forwarding network packets

    Publication No.: US10587514B1

    Publication Date: 2020-03-10

    Application No.: US14977468

    Filing Date: 2015-12-21

    Abstract: Packet processing pipelines may implement filtering of control plane decisions. When network packets are received, various types of decision-making and processing are performed. In order to complete processing for a network packet, some decisions may need to be determined by a control plane for the packet processing pipeline, such as a general processor. Requests for control plane decisions for received network packets may be filtered prior to sending the requests to the control plane, based on whether the same control plane decisions have been requested for previously received network packets. For control plane decisions with outstanding requests, an additional request for the network packet may be blocked, whereas a request may be allowed for control plane decisions with no outstanding requests.
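    The filtering logic amounts to deduplicating requests against the set of decisions that are already pending. A minimal sketch follows, assuming a per-flow key identifies the needed decision; the class name DecisionRequestFilter and its methods are hypothetical stand-ins for the pipeline hardware.

        class DecisionRequestFilter:
            def __init__(self, send_to_control_plane):
                self.outstanding = set()                # decisions already requested upstream
                self.send_to_control_plane = send_to_control_plane

            def on_packet_needs_decision(self, flow_key, packet):
                if flow_key in self.outstanding:
                    return "blocked"                    # same decision already requested
                self.outstanding.add(flow_key)
                self.send_to_control_plane(flow_key, packet)
                return "forwarded"

            def on_decision_received(self, flow_key):
                self.outstanding.discard(flow_key)      # later packets hit the installed result

        filt = DecisionRequestFilter(lambda key, pkt: None)
        print(filt.on_packet_needs_decision("10.0.0.1->10.0.0.2", b"pkt1"))  # forwarded
        print(filt.on_packet_needs_decision("10.0.0.1->10.0.0.2", b"pkt2"))  # blocked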

    Configurable vector compute engine
    Granted Patent

    Publication No.: US12242853B1

    Publication Date: 2025-03-04

    Application No.: US17937335

    Filing Date: 2022-09-30

    Abstract: A compute channel having a compute pipeline of compute stages can be configured using a configuration pipeline with a control table and a datapath table. The control table stores control entries corresponding to respective microoperations, and each control entry includes control information for the compute channel. A datapath table stores datapath configuration entries corresponding to respective microoperations, and each datapath configuration entry has a datapath configuration that includes computational circuit block configurations to configure respective computational circuit blocks in the compute pipeline of the compute channel. Control logic can issue a microoperation to the compute channel by configuring the compute channel according to the control information of the microoperation obtained from the control table, and by inputting the datapath configuration of the microoperation obtained from the datapath table into the configuration pipeline of the compute channel.
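    The mechanism is a pair of lookups keyed by microoperation: one table supplies channel-level control, the other a per-stage datapath configuration. The Python sketch below illustrates that indexing under assumed table contents; the field names and the example microoperation are hypothetical.

        # uop_id -> control information for the whole compute channel (fields illustrative)
        CONTROL_TABLE = {
            7: {"active_stages": 2, "bypass_remaining": True},
        }

        # uop_id -> one configuration per computational circuit block in the compute pipeline
        DATAPATH_TABLE = {
            7: [{"op": "multiply"}, {"op": "add"}, {"op": "nop"}],
        }

        def issue_microop(uop_id):
            """Combine both table entries into the configuration pushed down the channel."""
            control = CONTROL_TABLE[uop_id]
            datapath = DATAPATH_TABLE[uop_id]
            return {"control": control, "datapath": datapath}

        print(issue_microop(7))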

    Multiple accumulate busses in a systolic array

    Publication No.: US20220350775A1

    Publication Date: 2022-11-03

    Application No.: US17659642

    Filing Date: 2022-04-18

    Abstract: Systems and methods are provided to enable parallelized multiply-accumulate operations in a systolic array. Each column of the systolic array can include multiple busses enabling independent transmission of input partial sums along the respective bus. Each processing element of a given columnar bus can receive an input partial sum from a prior element of the given columnar bus and perform arithmetic operations on the input partial sum. Each processing element can generate an output partial sum based on the arithmetic operations and provide the output partial sum to a next processing element of the given columnar bus, without the output partial sum being processed by an intervening processing element of the column that uses a different columnar bus. Use of columnar busses can enable parallelization to increase speed or to tolerate increased latency at individual processing elements.

    Multiple accumulate busses in a systolic array

    Publication No.: US11308027B1

    Publication Date: 2022-04-19

    Application No.: US16915795

    Filing Date: 2020-06-29

    Abstract: Systems and methods are provided to enable parallelized multiply-accumulate operations in a systolic array. Each column of the systolic array can include multiple busses enabling independent transmission of input partial sums along the respective bus. Each processing element of a given columnar bus can receive an input partial sum from a prior element of the given columnar bus and perform arithmetic operations on the input partial sum. Each processing element can generate an output partial sum based on the arithmetic operations and provide the output partial sum to a next processing element of the given columnar bus, without the output partial sum being processed by an intervening processing element of the column that uses a different columnar bus. Use of columnar busses can enable parallelization to increase speed or to tolerate increased latency at individual processing elements.
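    The two "Multiple accumulate busses" publications above describe the same interleaving idea: with two columnar busses, a processing element forwards its partial sum to the element two rows down, skipping the neighbor that drives the other bus. The sketch below models only that accumulation pattern for one column; it is a simplified assumption-based illustration, not the hardware design.

        def column_accumulate(weights, activations, num_busses=2):
            """Return one partial sum per columnar bus for a single column of the array."""
            partial_sums = [0] * num_busses
            for i, (w, x) in enumerate(zip(weights, activations)):
                bus = i % num_busses              # PE i drives bus (i mod num_busses)
                partial_sums[bus] += w * x        # multiply-accumulate stays on that bus
            return partial_sums                   # bus results can be combined downstream

        # Even-indexed and odd-indexed processing elements accumulate in parallel streams.
        print(column_accumulate([1, 2, 3, 4], [10, 20, 30, 40]))  # [1*10 + 3*30, 2*20 + 4*40]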

    Registers for restricted memory
    Granted Patent

    Publication No.: US11294599B1

    Publication Date: 2022-04-05

    Application No.: US16891438

    Filing Date: 2020-06-03

    Abstract: Provided are integrated circuits and methods for operating integrated circuits. An integrated circuit can include a plurality of memory banks and an execution engine including a set of execution components. Each execution component can be associated with a respective memory bank and can read from and write to the respective memory bank. The integrated circuit can further include a set of registers each associated with a respective memory bank from the plurality of memory banks. The integrated circuit can further be operable to load to or store from the set of registers in parallel, and load to or store from the set of registers serially. A parallel operation followed by a serial operation enables data to be moved from many memory banks into one memory bank. A serial operation followed by a parallel operation enables data to be moved from one memory bank into many memory banks.
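    The parallel-then-serial pairing is what moves data between one bank and many banks. The helpers below sketch that movement with plain Python lists standing in for memory banks and registers; the function names and indexing scheme are illustrative assumptions.

        def gather_to_one_bank(banks, src_index, dst_bank, dst_base):
            # Parallel step: every register loads one element from its own bank.
            registers = [bank[src_index] for bank in banks]
            # Serial step: the registers store one after another into a single bank.
            for offset, value in enumerate(registers):
                banks[dst_bank][dst_base + offset] = value

        def scatter_from_one_bank(banks, src_bank, src_base, dst_index):
            # Serial step: the registers load consecutive elements from one bank.
            registers = [banks[src_bank][src_base + i] for i in range(len(banks))]
            # Parallel step: every register stores into its own bank.
            for bank, value in zip(banks, registers):
                bank[dst_index] = value

        banks = [[10 * b + w for w in range(8)] for b in range(4)]   # 4 banks, 8 words each
        gather_to_one_bank(banks, src_index=2, dst_bank=0, dst_base=4)
        print(banks[0][4:8])   # word 2 of every bank, now contiguous in bank 0: [2, 12, 22, 32]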

    Place and route aware data pipelining

    Publication No.: US10990408B1

    Publication Date: 2021-04-27

    Application No.: US16582573

    Filing Date: 2019-09-25

    Abstract: Methods for place-and-route aware data pipelining for an integrated circuit device are provided. In large integrated circuits, the physical distance a data signal must travel between a signal source in a master circuit block partition and a signal destination in a servant circuit block partition can exceed the distance the signal can travel in a single clock cycle. To maintain timing requirements of the integrated circuit, a longest physical distance and signal delay for a datapath between master and servant circuit block partitions can be determined and pipelining registers added. Datapaths of master circuit block partitions further away from the servant circuit block can have more pipelining registers added within the master circuit block than datapaths of master circuit block partitions that are closer to the servant circuit block.
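    The register count per datapath follows from comparing path delay against the clock period. The arithmetic below is a common rule of thumb under stated assumptions (uniform delay per hop, a single clock domain); it is not necessarily the exact calculation the patent claims.

        import math

        def pipeline_registers_needed(path_delay_ns, clock_period_ns, register_delay_ns=0.0):
            """Registers to insert so that no single hop exceeds one clock period."""
            if path_delay_ns <= clock_period_ns:
                return 0
            usable = clock_period_ns - register_delay_ns   # timing budget left per hop
            return math.ceil(path_delay_ns / usable) - 1

        # A master partition farther from the servant block needs more registers in its datapath.
        print(pipeline_registers_needed(3.2, 1.0))   # -> 3
        print(pipeline_registers_needed(1.5, 1.0))   # -> 1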

    Accelerated quantized multiply-and-add operations

    Publication No.: US10983754B2

    Publication Date: 2021-04-20

    Application No.: US16891010

    Filing Date: 2020-06-02

    Abstract: Disclosed herein are techniques for accelerating convolution operations or other matrix multiplications in applications such as neural network. In one example, an apparatus comprises a first circuit, a second circuit, and a third circuit. The first circuit is configured to: receive first values in a first format, the first values being generated from one or more asymmetric quantization operations of second values in a second format, and generate difference values based on subtracting a third value from each of the first values, the third value representing a zero value in the first format. The second circuit is configured to generate a sum of products in the first format using the difference values. The third circuit is configured to convert the sum of products from the first format to the second format based on scaling the sum of products with a scaling factor.
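    Numerically, the three circuits implement the standard asymmetric-quantization identity: subtract each operand's zero point, accumulate integer products, then rescale the sum. The sketch below uses the usual scale/zero-point naming conventions, which are assumptions rather than identifiers from the patent.

        def quantized_dot(a_q, b_q, a_zero, b_zero, a_scale, b_scale):
            # Difference values: remove the asymmetric offset so real 0.0 maps to integer 0.
            a_diff = [a - a_zero for a in a_q]
            b_diff = [b - b_zero for b in b_q]
            # Sum of products computed entirely in the quantized (integer) format.
            acc = sum(x * y for x, y in zip(a_diff, b_diff))
            # Convert back to the original format with a single scaling factor.
            return acc * (a_scale * b_scale)

        # Inputs quantized with scale 0.1 and zero point 128, i.e. real values (1.0, -1.0) and (2.0, -2.0).
        print(quantized_dot([138, 118], [148, 108], 128, 128, 0.1, 0.1))  # 1.0*2.0 + (-1.0)*(-2.0) ≈ 4.0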

    Parametric mathematical function approximation in integrated circuits

    Publication No.: US10733498B1

    Publication Date: 2020-08-04

    Application No.: US16215405

    Filing Date: 2018-12-10

    Abstract: Methods and systems for supporting parametric function computations in hardware circuits are proposed. In one example, a system comprises a hardware mapping table, a control circuit, and arithmetic circuits. The control circuit is configured to: in a first mode of operation, forward a set of parameters of a non-parametric function associated with an input value from the hardware mapping table to the arithmetic circuits to compute a first approximation of the non-parametric function at the input value; and in a second mode of operation, based on information indicating whether the input value is in a first input range or in a second input range from the hardware mapping table, forward a first parameter or a second parameter of a parametric function to the arithmetic circuits to compute, respectively, a second approximation or a third approximation of the parametric function at the input value.
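    The two modes differ in what the hardware mapping table supplies: per-segment parameters of a fixed approximation in the first mode, and a range-dependent choice between function parameters in the second. The sketch below is a hypothetical illustration; the segment table, the leaky-ReLU-style parametric function, and all names are assumptions, not the circuit's contents.

        # Mode 1: table of (segment_start, slope, intercept) for a piecewise-linear approximation.
        NONPARAM_SEGMENTS = [
            (-4.0, 0.02, 0.10),
            ( 0.0, 0.25, 0.50),
            ( 4.0, 0.02, 0.90),
        ]

        def approx_nonparametric(x):
            slope, intercept = NONPARAM_SEGMENTS[0][1], NONPARAM_SEGMENTS[0][2]
            for start, s, b in NONPARAM_SEGMENTS:
                if x >= start:                      # pick the last segment whose start <= x
                    slope, intercept = s, b
            return slope * x + intercept

        def approx_parametric(x, alpha_negative, alpha_positive):
            # Mode 2: the table only decides which parameter applies to x's input range.
            alpha = alpha_negative if x < 0 else alpha_positive
            return alpha * x

        print(approx_nonparametric(1.5))            # piecewise-linear lookup and evaluate
        print(approx_parametric(-2.0, 0.01, 1.0))   # leaky-ReLU-style parametric evaluation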
