Configurable function approximation based on switching mapping table content

    Publication No.: US11423313B1

    Publication date: 2022-08-23

    Application No.: US16218082

    Filing date: 2018-12-12

    Abstract: Methods and systems for performing hardware approximation of a function are provided. In one example, a system comprises a controller, configurable arithmetic circuits, and a mapping table. The mapping table stores a first set of function parameters in a first mode of operation and stores a second set of function parameters in a second mode of operation. Depending on the mode of operation, the controller may configure the arithmetic circuits to compute a first approximation result of a function at an input value based on the first set of function parameters, or to compute a second approximation result of the function at the input value based on the second set of function parameters and to perform post-processing, such as quantization, of the second approximation result.
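
    The two modes described in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions: the abstract does not say what the parameter sets contain, so the model assumes the first set holds a base value and a slope per bucket (linear interpolation) and the second set holds a base value only, followed by quantization. The function tanh, the bucket count, and all names are hypothetical.

```python
import math

def build_table(func, lo, hi, buckets, with_slope):
    """Build a mapping table of per-bucket parameters (x0, y0, slope).

    Assumption: with_slope=True models the first parameter set,
    with_slope=False models the second parameter set (slope fixed to 0).
    """
    width = (hi - lo) / buckets
    table = []
    for i in range(buckets):
        x0 = lo + i * width
        y0 = func(x0)
        slope = (func(x0 + width) - y0) / width if with_slope else 0.0
        table.append((x0, y0, slope))
    return table, width

def approximate(x, table, lo, width, quantize_step=None):
    """Look up the bucket for x and evaluate y0 + slope * (x - x0).

    If quantize_step is given, the result is rounded to a multiple of it,
    standing in for the post-processing step of the second mode.
    """
    idx = min(max(int((x - lo) / width), 0), len(table) - 1)
    x0, y0, slope = table[idx]
    y = y0 + slope * (x - x0)
    if quantize_step is not None:
        y = round(y / quantize_step) * quantize_step
    return y

# Mode 1: table holds base values and slopes, no post-processing.
table1, width = build_table(math.tanh, -4.0, 4.0, 64, with_slope=True)
y1 = approximate(0.3, table1, -4.0, width)

# Mode 2: table content is switched to base values only, then quantized.
table2, _ = build_table(math.tanh, -4.0, 4.0, 64, with_slope=False)
y2 = approximate(0.3, table2, -4.0, width, quantize_step=1 / 16)
```

    Switching modes here only swaps the table content and toggles the quantization step; the lookup and arithmetic path stays the same, which is the property the abstract emphasizes.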

    Data synchronization operation at distributed computing system

    Publication No.: US11409685B1

    Publication date: 2022-08-09

    Application No.: US17031653

    Filing date: 2020-09-24

    Abstract: In one example, a method comprises: receiving, by a hardware data processor and from a network adapter, a transfer complete message indicating that the network adapter has initiated a transfer of data received from a network to the hardware data processor, the transfer being performed over an interconnect coupled between the hardware data processor and the network adapter; based on receiving the transfer complete message, performing, by the hardware data processor, a flush operation to fetch any remaining portion of the data buffered in the interconnect to a local memory of the hardware data processor; based on determining that the flush operation is complete, storing, by the hardware data processor, the transfer complete message at the local memory; and based on determining that the transfer complete message is stored at the local memory, starting a computation operation on the data at the hardware data processor or performing an error handling operation.
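
    The ordering in this abstract (flush, then store the message, then compute or handle an error) can be modeled in a few lines. This is a hedged sketch, not the patented implementation: the class, its fields, and in particular the byte-count check that decides between computation and error handling are assumptions, since the abstract does not state what triggers the error path.

```python
from collections import deque

class HardwareDataProcessor:
    """Toy model of the receive-side ordering (all names hypothetical)."""

    def __init__(self, expected_bytes):
        self.expected_bytes = expected_bytes   # assumed completeness criterion
        self.interconnect_buffer = deque()     # data still buffered in the interconnect
        self.local_memory = bytearray()        # data that has already landed
        self.stored_messages = []              # messages stored in local memory

    def on_transfer_complete(self, message):
        # Step 1: flush, pulling any remaining buffered data into local memory.
        while self.interconnect_buffer:
            self.local_memory += self.interconnect_buffer.popleft()
        # Step 2: only after the flush completes, store the message.
        self.stored_messages.append(message)
        # Step 3: only after the message is stored, compute or handle an error.
        if len(self.local_memory) == self.expected_bytes:
            return self.compute()
        return self.handle_error()

    def compute(self):
        # Placeholder computation over the received bytes.
        return sum(self.local_memory)

    def handle_error(self):
        # Placeholder error handling.
        return None

# Two bytes have landed, two are still in flight when the message arrives.
proc = HardwareDataProcessor(expected_bytes=4)
proc.local_memory += b"\x01\x02"
proc.interconnect_buffer.append(b"\x03\x04")
result = proc.on_transfer_complete("transfer complete")
```

    Because the flush runs before the message is stored, the computation never starts on data that is still sitting in the interconnect.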

    Dilated convolution using systolic array

    Publication No.: US11379555B2

    Publication date: 2022-07-05

    Application No.: US16457503

    Filing date: 2019-06-28

    Abstract: In one example, a non-transitory computer readable medium stores instructions that, when executed by one or more hardware processors, cause the one or more hardware processors to: load a first weight data element of an array of weight data elements from a memory into a systolic array; select a subset of input data elements from the memory into the systolic array to perform first computations of a dilated convolution operation, the subset being selected based on a rate of the dilated convolution operation and coordinates of the weight data element within the array of weight data elements; and control the systolic array to perform the first computations based on the first weight data element and the subset to generate first output data elements of an output data array. An example of a compiler that generates the instructions is also provided.
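
    A software analogue of the per-weight scheme may help: for each weight element, the input subset is chosen from the dilation rate and the weight's coordinates, and the products are accumulated into the output array. The sketch below assumes a single channel, stride 1, and no padding; the function names are hypothetical and the nested loops stand in for the systolic array.

```python
def select_subset(inp, r, s, rate, out_h, out_w):
    """Input elements paired with the weight at coordinates (r, s).

    The subset starts at offset (r * rate, s * rate) and has the same
    shape as the output array (stride 1, no padding assumed).
    """
    return [[inp[i + r * rate][j + s * rate] for j in range(out_w)]
            for i in range(out_h)]

def dilated_conv2d(inp, weights, rate):
    """Dilated convolution computed one weight element at a time."""
    in_h, in_w = len(inp), len(inp[0])
    k_h, k_w = len(weights), len(weights[0])
    out_h = in_h - (k_h - 1) * rate
    out_w = in_w - (k_w - 1) * rate
    out = [[0] * out_w for _ in range(out_h)]
    for r in range(k_h):
        for s in range(k_w):
            w = weights[r][s]  # "load" one weight element
            subset = select_subset(inp, r, s, rate, out_h, out_w)
            for i in range(out_h):
                for j in range(out_w):
                    out[i][j] += w * subset[i][j]  # accumulate partial sums
    return out

# 5x5 input holding 0..24, 2x2 weights, dilation rate 2 -> 3x3 output.
inp = [[i * 5 + j for j in range(5)] for i in range(5)]
out = dilated_conv2d(inp, [[1, 2], [3, 4]], rate=2)
```

    Selecting the subset per weight means the zero-filled gaps of a dilated kernel never have to be materialized or multiplied.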

    DATA-TYPE-AWARE CLOCK-GATING
    Type: Patent application

    Publication No.: US20220188073A1

    Publication date: 2022-06-16

    Application No.: US17247475

    Filing date: 2020-12-11

    Abstract: To reduce power consumption, data bits or a portion of a data register that is not expected to toggle frequently can be grouped together, and be clock-gated independently from the rest of the data register. The grouping of the data bits can be determined based on the data types of the workload being operated on. For a data register configured to store a numeric value that supports multiple data types, the portion of the data register being clock-gated may store a group of data bits that are unused for one or more data types of the multiple data types supported by the data register. The portion of the data register being clock-gated can also be a group of data bits that remain unchanged or have a constant value for numeric values within a certain numeric range that is frequently operated on.
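
    The idea can be modeled behaviorally as a register split into two bit groups with separate clock enables. The sketch assumes a 32-bit register that holds either an FP32 value or a BF16 value in its upper 16 bits; that layout, the type names, and the class itself are assumptions for illustration, not details from the abstract.

```python
class GatedRegister:
    """32-bit register whose lower 16 bits are clock-gated independently."""

    # Assumption: bits used by each supported data type.
    USED_BITS = {"fp32": 0xFFFFFFFF, "bf16": 0xFFFF0000}
    HIGH_GROUP = 0xFFFF0000
    LOW_GROUP = 0x0000FFFF

    def __init__(self):
        self.value = 0
        self.high_group_clock_events = 0
        self.low_group_clock_events = 0

    def clock(self, new_value, dtype):
        """Apply one clock edge; the low group is gated off for narrow types."""
        low_enabled = bool(self.USED_BITS[dtype] & self.LOW_GROUP)
        self.high_group_clock_events += 1
        high = new_value & self.HIGH_GROUP
        if low_enabled:
            self.low_group_clock_events += 1
            low = new_value & self.LOW_GROUP
        else:
            low = self.value & self.LOW_GROUP  # gated: bits hold their old value
        self.value = high | low

reg = GatedRegister()
reg.clock(0x12345678, "fp32")  # both groups clocked
reg.clock(0xABCD9999, "bf16")  # low group gated, keeps 0x5678
```

    Counting clock events per group gives a rough proxy for the dynamic power saved when the workload is dominated by the narrower data type.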

    Control plane operation at distributed computing system

    Publication No.: US11354258B1

    Publication date: 2022-06-07

    Application No.: US17038623

    Filing date: 2020-09-30

    Abstract: In one example, an apparatus comprises: a first local memory, a computation engine configured to generate local data and to store the local data at the first local memory, and a controller. The apparatus is coupled with a host processor and a second device via an interconnect, the second device comprising a second local memory, the host processor hosting an application. The controller is configured to: receive, from the second device, a first message indicating that first data is stored in the second local memory; based on the first message: fetch the first data from the second local memory via the interconnect; control the computation engine to perform a computation operation on the first data to generate second data to support the application hosted by the host processor; and transmit, to the second device, a second message indicating that the second data is stored in the first local memory.

    Powering-down or rebooting a device in a system fabric

    Publication No.: US11321179B1

    Publication date: 2022-05-03

    Application No.: US17001145

    Filing date: 2020-08-24

    Abstract: A circuit at an interface between a device and an interconnect fabric is configured to track outstanding transactions associated with the device and ensure the completion of the outstanding transactions before rebooting or powering down the device. In some embodiments, the circuit is also configurable to provide appropriate responses when the device is powered down or is being rebooted such that other devices in the system can still operate even without knowing that the device is inactive and would not hang because no response is received from the device.
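
    The behavior of such an interface circuit can be sketched as a small state machine. The model below is an assumption-laden illustration: the three states, the default response value, and the choice to answer incoming requests with the default response while draining are all hypothetical, since the abstract does not specify them.

```python
DEFAULT_RESPONSE = {"status": "device_unavailable"}  # hypothetical response

class InterfaceGuard:
    """Tracks outstanding transactions between a device and the fabric."""

    def __init__(self):
        self.outstanding = set()
        self.state = "active"  # "active", "draining", or "inactive"

    def issue(self, txn_id):
        """Device sends a new transaction into the fabric."""
        if self.state != "active":
            raise RuntimeError("device is shutting down; no new transactions")
        self.outstanding.add(txn_id)

    def complete(self, txn_id):
        """Fabric returns the response for an outstanding transaction."""
        self.outstanding.discard(txn_id)
        if self.state == "draining" and not self.outstanding:
            self.state = "inactive"

    def request_power_down(self):
        """Return True only once it is safe to power down or reboot."""
        self.state = "draining" if self.outstanding else "inactive"
        return self.state == "inactive"

    def handle_incoming(self, request, device_handler):
        """Answer on the device's behalf unless it is active (assumption)."""
        if self.state != "active":
            return DEFAULT_RESPONSE
        return device_handler(request)

guard = InterfaceGuard()
guard.issue(1)
guard.issue(2)
safe_now = guard.request_power_down()  # False: two transactions pending
guard.complete(1)
guard.complete(2)                      # last completion makes the device inactive
reply = guard.handle_incoming("read", lambda req: {"status": "ok"})
```

    Because the guard always returns some response, a requester that is unaware of the power-down still makes progress instead of waiting indefinitely.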

    In-band de-duplication
    Type: Granted patent

    Publication No.: US11157452B2

    Publication date: 2021-10-26

    Application No.: US15590898

    Filing date: 2017-05-09

    Abstract: A method for in-band de-duplication may include: receiving, by a hardware accelerator, a received packet of a first sequence of packets that conveys a first data chunk; and applying a data chunk hash calculation process on the received packet while taking into account a hash calculation result obtained when applying the data chunk hash calculation process on a last packet of the first sequence that preceded the received packet; wherein the calculating of the first data chunk hash value is initiated before a completion of a reception of the entire first data chunk by the hardware accelerator.
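
    The incremental hashing described here maps naturally onto a streaming hash API. The sketch below uses SHA-256 from Python's standard hashlib purely as a stand-in, since the abstract does not name a hash function; the class and helper names are hypothetical.

```python
import hashlib

class ChunkHasher:
    """Hashes a data chunk packet by packet as the packets arrive."""

    def __init__(self):
        self._state = hashlib.sha256()  # running state carried between packets
        self.packets_seen = 0

    def on_packet(self, payload: bytes):
        """Fold one packet into the state left by the preceding packet."""
        self._state.update(payload)
        self.packets_seen += 1

    def finish(self) -> str:
        """Return the chunk hash once the last packet has been processed."""
        return self._state.hexdigest()

def is_duplicate(digest, seen_digests):
    """Check the digest against a set of known chunks, recording new ones."""
    if digest in seen_digests:
        return True
    seen_digests.add(digest)
    return False

packets = [b"first part, ", b"second part, ", b"last part"]
hasher = ChunkHasher()
for packet in packets:
    hasher.on_packet(packet)  # hashing starts before the chunk is complete
digest = hasher.finish()

seen = set()
first_time = is_duplicate(digest, seen)   # False: chunk not seen before
second_time = is_duplicate(digest, seen)  # True: same digest again
```

    The incremental digest equals the digest of the concatenated chunk, so no extra pass over the data is needed after the final packet arrives.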

    EFFICIENT UTILIZATION OF PROCESSING ELEMENT ARRAY

    Publication No.: US20210158132A1

    Publication date: 2021-05-27

    Application No.: US16698461

    Filing date: 2019-11-27

    Abstract: A computer-implemented method includes receiving a neural network model for implementation using a processing element array, where the neural network model includes a convolution operation on a set of input feature maps and a set of filters. The method also includes determining, based on the neural network model, that the convolution operation utilizes less than a threshold number of rows in the processing element array for applying a set of filter elements to the set of input feature maps, where the set of filter elements includes one filter element in each filter of the set of filters. The method further includes generating, for the convolution operation and based on the neural network model, a first instruction and a second instruction for execution by respective rows in the processing element array, where the first instruction and the second instruction use different filter elements of a filter in the set of filters.

    Reducing computations for data including padding

    Publication No.: US10990650B1

    Publication date: 2021-04-27

    Application No.: US15933339

    Filing date: 2018-03-22

    Abstract: Systems and methods are provided to eliminate multiplication operations with zero padding data for convolution computations. A multiplication matrix is generated from an input feature map matrix with padding by adjusting coordinates and dimensions of the input feature map matrix to exclude padding data. The multiplication matrix is used to perform matrix multiplications with respective weight values which results in fewer computations as compared to matrix multiplications which include the zero padding data.
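
    One way to picture the coordinate adjustment is to compute, for each weight, the range of output positions whose matching input element lies inside the unpadded feature map, and to multiply only over that range. The sketch below assumes a single channel, stride 1, and symmetric zero padding; the function name and the multiplication counter are illustrative additions.

```python
def conv2d_skip_padding(inp, weights, pad):
    """2D convolution with zero padding that never multiplies by padding.

    Returns the output array and the number of multiplications performed.
    """
    in_h, in_w = len(inp), len(inp[0])
    k_h, k_w = len(weights), len(weights[0])
    out_h = in_h + 2 * pad - k_h + 1
    out_w = in_w + 2 * pad - k_w + 1
    out = [[0] * out_w for _ in range(out_h)]
    mults = 0
    for r in range(k_h):
        for s in range(k_w):
            # Output rows/columns whose input coordinate (i + r - pad,
            # j + s - pad) falls inside the unpadded input.
            i_lo, i_hi = max(0, pad - r), min(out_h, in_h + pad - r)
            j_lo, j_hi = max(0, pad - s), min(out_w, in_w + pad - s)
            for i in range(i_lo, i_hi):
                for j in range(j_lo, j_hi):
                    out[i][j] += weights[r][s] * inp[i + r - pad][j + s - pad]
                    mults += 1
    return out, mults

ones = [[1] * 3 for _ in range(3)]
out, mults = conv2d_skip_padding(ones, ones, pad=1)
# A naive version over the padded 5x5 input would do 3*3*3*3 = 81 multiplications.
```

    For this 3x3 example the adjusted ranges need 49 multiplications instead of 81, and the output matches what the padded computation would produce.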
