PROCESSING FOR MULTIPLE INPUT DATA SETS
    Invention Application

    Publication Number: US20190294968A1

    Publication Date: 2019-09-26

    Application Number: US15933201

    Application Date: 2018-03-22

    Abstract: Disclosed herein are techniques for performing multi-layer neural network processing for multiple contexts. In one embodiment, a computing engine is set in a first configuration to implement a second layer of a neural network and to process first data related to a first context to generate first context second layer output. The computing engine can be switched from the first configuration to a second configuration to implement a first layer of the neural network. The computing engine can be used to process second data related to a second context to generate second context first layer output. The computing engine can be set to a third configuration to implement a third layer of the neural network to process the first context second layer output and the second context first layer output to generate a first processing result of the first context and a second processing result of the second context.
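
    The sequence in this abstract can be pictured with a small software model. The sketch below is only illustrative (the ComputeEngine class, its configure/run methods, and the multiply-by-layer stand-in computation are assumptions, not the patented hardware); it shows one reconfigurable engine being time-shared between two contexts in the order the abstract describes.

```python
# Minimal sketch, not the patented hardware: one reconfigurable compute
# engine time-shared between two inference contexts.  configure() stands in
# for reloading a layer's weights/instructions; run() stands in for the
# layer computation itself (here just a multiply, purely for illustration).
class ComputeEngine:
    def __init__(self):
        self.layer = None

    def configure(self, layer):
        self.layer = layer                    # e.g. load this layer's weights

    def run(self, data):
        return [x * (self.layer + 1) for x in data]

engine = ComputeEngine()
ctx1 = [1.0, 2.0]    # first context, assumed to have finished layer 1 already
ctx2 = [3.0, 4.0]    # second context, just arriving

engine.configure(layer=1)     # second layer (0-indexed)
ctx1_l2 = engine.run(ctx1)    # first-context second-layer output

engine.configure(layer=0)     # switch back to the first layer
ctx2_l1 = engine.run(ctx2)    # second-context first-layer output

engine.configure(layer=2)     # third layer serves both contexts
result1 = engine.run(ctx1_l2)
result2 = engine.run(ctx2_l1)
```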

    Using encryption within a computing system

    Publication Number: US10423541B1

    Publication Date: 2019-09-24

    Application Number: US15388472

    Application Date: 2016-12-22

    Abstract: The following description is directed to the use of encryption by a computing system. In one example, a method can include determining whether information associated with a logical address is stored unencrypted within an on-chip memory of an integrated circuit or whether the information associated with the logical address is stored encrypted within an off-chip memory external to the integrated circuit. When the information is not stored unencrypted within the on-chip memory and is stored encrypted within the off-chip memory: a page associated with the logical address can be retrieved from the off-chip memory containing the encrypted information; the retrieved page can be decrypted to generate unencrypted information; and the unencrypted information can be stored in a frame of the on-chip memory.
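
    A rough software model of the decision flow is sketched below; the page size, the XOR placeholder standing in for real decryption, and the dictionaries modeling on-chip and off-chip memory are all illustrative assumptions, not the described integrated circuit.

```python
# Rough sketch of the decision flow only; PAGE_SIZE, the XOR "cipher", and
# the dictionaries standing in for on-chip and off-chip memory are
# illustrative assumptions.
PAGE_SIZE = 4096
KEY = 0x5A

on_chip = {}    # page number -> plaintext bytes (frames of on-chip memory)
off_chip = {}   # page number -> encrypted bytes (external, off-chip memory)

def decrypt(ciphertext: bytes) -> bytes:
    return bytes(b ^ KEY for b in ciphertext)    # placeholder, not a real cipher

def read_byte(logical_addr: int) -> int:
    page, offset = divmod(logical_addr, PAGE_SIZE)
    if page not in on_chip:                      # not stored unencrypted on chip
        encrypted_page = off_chip[page]          # fetch the encrypted page
        on_chip[page] = decrypt(encrypted_page)  # decrypt into an on-chip frame
    return on_chip[page][offset]

# Example: place one encrypted page off-chip, then read from it twice; the
# second read hits the decrypted copy already held on chip.
plain = bytes(range(256)) * (PAGE_SIZE // 256)
off_chip[0] = bytes(b ^ KEY for b in plain)
assert read_byte(5) == plain[5] and read_byte(5) == plain[5]
```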

    Error generation using a computer add-in card

    Publication Number: US10261880B1

    Publication Date: 2019-04-16

    Application Number: US15384026

    Application Date: 2016-12-19

    Abstract: A smart add-in card can be leveraged to perform testing on a host server computer. The add-in card can include an embedded processor and memory. Tests can be downloaded to the add-in card to test a communication bus between the host server computer (motherboard) and the add-in card. In a particular example, a PCIe communication bus couples the motherboard to the add-in card and the tests can inject errors on the PCIe communication bus. The tests can be developed to test errors that are typically difficult to test without the use of special hardware. However, the smart add-in card can be a simple Network Interface Card (NIC) that resides on the host server computer during normal operation and is used for communication other than error testing. By using the NIC as a testing device, repeatable and reliable testing can be obtained.
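
    The workflow implied here can be summarized with an entirely hypothetical sketch: the SmartNic class, its methods, and the error names below are invented for illustration and are not a real NIC or PCIe error-injection API.

```python
# Entirely hypothetical sketch: download a test to the card's embedded
# processor, let it inject a bus error, then check how the host reacted.
class SmartNic:
    """Stand-in for the add-in card's management interface."""

    def __init__(self):
        self.loaded_tests = []

    def download_test(self, name):
        self.loaded_tests.append(name)    # the embedded CPU would store/run this

    def run_tests(self):
        # On real hardware this would corrupt PCIe transactions (for example
        # a poisoned TLP or a completion timeout); here it just reports what
        # was requested.
        return list(self.loaded_tests)

def host_error_log(injected):
    # Placeholder for reading the host's error logs after each injection.
    return {error: True for error in injected}

nic = SmartNic()
nic.download_test("poisoned_tlp")
nic.download_test("completion_timeout")
print(host_error_log(nic.run_tests()))
```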

    Controlled bandwidth expansion in compressed disaggregated storage systems

    Publication Number: US10218576B1

    Publication Date: 2019-02-26

    Application Number: US16110270

    Application Date: 2018-08-23

    Abstract: Technologies for performing controlled bandwidth expansion are described. For example, a storage server can receive a request from a client to read compressed data. The storage server can obtain individual storage units of the compressed data. The storage server can also obtain a compressed size and an uncompressed size for each of the storage units. The storage server can generate network packet content comprising the storage units and associated padding such that the size of the padding for a given storage unit is based on the uncompressed and compressed sizes of that storage unit. The storage server can send the network packet content to the client in one or more network packets. The client can receive the network packets, discard the padding, and decompress the compressed data from the storage units.
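
    The padding rule can be illustrated with a short sketch. It assumes zlib-style compression and a simple length-prefixed unit header (both assumptions, not the patented packet format); each unit is padded out to its uncompressed size, so the wire traffic is sized as if the data were uncompressed while the client still decompresses only the compressed bytes.

```python
# Minimal sketch: pad each compressed storage unit to its uncompressed size,
# assuming compression actually shrank each unit.
import zlib

def server_build_payload(storage_units):
    """storage_units: list of (compressed_bytes, uncompressed_size) pairs."""
    payload = bytearray()
    for compressed, uncompressed_size in storage_units:
        padding = b"\x00" * (uncompressed_size - len(compressed))
        payload += len(compressed).to_bytes(4, "big")     # compressed size
        payload += uncompressed_size.to_bytes(4, "big")   # uncompressed size
        payload += compressed + padding
    return bytes(payload)

def client_read_payload(payload):
    data, offset = bytearray(), 0
    while offset < len(payload):
        csize = int.from_bytes(payload[offset:offset + 4], "big")
        usize = int.from_bytes(payload[offset + 4:offset + 8], "big")
        unit = payload[offset + 8:offset + 8 + csize]     # padding is discarded
        data += zlib.decompress(unit)
        offset += 8 + usize                               # step over padding too
    return bytes(data)

original = b"disaggregated storage " * 100
units = [(zlib.compress(original), len(original))]
assert client_read_payload(server_build_payload(units)) == original
```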

    Checksum circuit
    Invention Grant

    Publication Number: US09983851B1

    Publication Date: 2018-05-29

    Application Number: US15274844

    Application Date: 2016-09-23

    CPC classification number: G06F7/727

    Abstract: A hardware circuit computes a checksum using a technique such as the Adler-32 checksum algorithm. The hardware circuit may include one or more serially-connected chains of adders followed by a modulus circuit. The modulus circuit produces a modulus value modulo N, where N is not an integer power of 2. In some examples, N is 65,521. In some examples, the modulus circuit may produce a modulus value modulo 2^16 and then correct that value to modulo N. In other examples, the modulus circuit may include selection logic that selects an appropriate integer multiple of 65,521 to determine the modulo 65,521 result directly.
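
    A small software model of this style of checksum (not the hardware circuit) is shown below, together with one way a wide sum can be reduced modulo 65,521 without a divider by exploiting 2^16 = 65536 ≡ 15 (mod 65,521); the cross-check against zlib.adler32 is only a sanity test.

```python
# Software model of an Adler-32 style checksum: two running sums reduced
# modulo 65,521, the largest prime below 2^16.
import zlib

MOD = 65521    # N in the abstract

def mod65521(x: int) -> int:
    while x >= 1 << 16:
        x = (x >> 16) * 15 + (x & 0xFFFF)   # fold the high half into the low half
    return x if x < MOD else x - MOD        # final correction step

def adler32(data: bytes, value: int = 1) -> int:
    a, b = value & 0xFFFF, (value >> 16) & 0xFFFF
    for byte in data:
        a = mod65521(a + byte)              # first adder chain + modulus circuit
        b = mod65521(b + a)                 # second adder chain + modulus circuit
    return (b << 16) | a

msg = b"checksum circuit"
assert adler32(msg) == zlib.adler32(msg)    # sanity check against a known result
```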

    Configurable sponge function engine
    Invention Grant

    Publication Number: US09880960B1

    Publication Date: 2018-01-30

    Application Number: US14869775

    Application Date: 2015-09-29

    CPC classification number: G06F13/4027 G06F13/4221

    Abstract: A configurable sponge function engine. The configurable engine includes a state register having bitrate and capacity sections, each having a variable size, where a sum of the bitrate and capacity sizes is fixed. A controller generates a bitrate size indication. A configurable message processor receives an input message from an input bus, receives the size indication, fragments the input message into fragmented blocks of a size specified by the size indication, and converts the blocks to a bus width of the bitrate and capacity sizes. An iterative calculator receives the blocks, performs iterative processing operations on the blocks, and stores a result of each operation in the state register overwriting a previous register value. An output adaptor receives a value stored in the state register after the block corresponding to the end of the input message is processed and outputs the register value converted to have an output bus width.
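
    A generic sponge construction with a configurable bitrate can be sketched in a few lines; the toy permutation and the plain zero padding below are placeholders (real designs such as SHA-3 use the Keccak permutation and a pad10*1 rule), and the 200-byte state width is only chosen to mirror a fixed bitrate-plus-capacity sum of 1600 bits.

```python
# Generic sponge-construction sketch with a configurable bitrate.
STATE_BYTES = 200                         # bitrate + capacity is fixed

def toy_permutation(state: bytearray) -> bytearray:
    out = bytearray(STATE_BYTES)          # placeholder for the iterative calculator
    for i, b in enumerate(state):
        out[(i * 7 + 1) % STATE_BYTES] = (b * 31 + i) & 0xFF
    return out

def sponge_hash(message: bytes, bitrate: int, out_len: int) -> bytes:
    assert 0 < bitrate < STATE_BYTES      # capacity = STATE_BYTES - bitrate
    state = bytearray(STATE_BYTES)
    padded = message + b"\x00" * (-len(message) % bitrate)   # fragment into blocks
    for i in range(0, len(padded), bitrate):                  # absorb phase
        for j, byte in enumerate(padded[i:i + bitrate]):
            state[j] ^= byte
        state = toy_permutation(state)
    out = bytearray()
    while len(out) < out_len:                                 # squeeze phase
        out += state[:bitrate]
        state = toy_permutation(state)
    return bytes(out[:out_len])

digest = sponge_hash(b"configurable sponge", bitrate=136, out_len=32)
```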

    Throughput increase for compute engine

    Publication Number: US12260214B1

    Publication Date: 2025-03-25

    Application Number: US17937332

    Application Date: 2022-09-30

    Abstract: A compute channel can have multiple computational circuit blocks coupled in series to form a pipeline. The compute channel can perform a computation on an input tensor to generate an output tensor based on an instruction. When the computation does not require all of the computational circuit blocks, the throughput of the compute channel can be increased by splitting the data elements of the input tensor into multiple input data streams. The multiple input data streams are provided to respective subsets of one or more computational circuit blocks in the pipeline using bypass circuitry of the computational circuit blocks, and the computation can be performed on the multiple input data streams in the respective subsets of one or more computational circuit blocks to generate multiple output data streams corresponding to the output tensor.
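
    The splitting idea can be modeled in software as below; the four identical add-one blocks and the interleaved split are illustrative assumptions, not the hardware design. The point is that when each element needs only one block, four streams can move through a four-block pipeline at once.

```python
# Conceptual model only: a pipeline of block functions; streams are routed to
# disjoint subsets of blocks and the remaining blocks are bypassed.
def run_split(pipeline, tensor, blocks_per_stream):
    num_streams = len(pipeline) // blocks_per_stream
    streams = [tensor[i::num_streams] for i in range(num_streams)]  # split input
    merged = [None] * len(tensor)
    for s, stream in enumerate(streams):
        subset = pipeline[s * blocks_per_stream:(s + 1) * blocks_per_stream]
        for block in subset:                 # blocks outside the subset are bypassed
            stream = [block(x) for x in stream]
        merged[s::num_streams] = stream      # re-interleave into output order
    return merged

pipeline = [lambda x: x + 1 for _ in range(4)]          # four identical blocks
print(run_split(pipeline, list(range(8)), blocks_per_stream=1))
# -> [1, 2, 3, 4, 5, 6, 7, 8]: four streams processed by one block each
```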

    Sparse machine learning acceleration

    Publication Number: US12254398B2

    Publication Date: 2025-03-18

    Application Number: US17301271

    Application Date: 2021-03-30

    Abstract: To reduce the storage size of weight tensors and speed up loading of weight tensors from system memory, a compression technique can be employed to remove zero values from a weight tensor before storing the weight tensor in system memory. A sparsity threshold can be enforced to achieve a compression ratio target by forcing small weight values to zero during training. When the weight tensor is loaded from system memory, a direct memory access (DMA) engine with an in-line decompression unit can decompress the weight tensor on-the-fly. By performing the decompression in the DMA engine, expansion of the weight values back to the original weight tensor size can be carried out in parallel while other neural network computations are being performed by the processing unit.
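
    A sketch of the compression scheme as described (zero removal plus a position mask, with an expand step standing in for the DMA engine's in-line decompression) might look like the following; the threshold value and the NumPy boolean-mask representation are illustrative assumptions.

```python
# Illustrative NumPy sketch: drop zero-valued weights, keep a boolean mask of
# their positions, and scatter them back on "decompression".
import numpy as np

def apply_sparsity_threshold(weights, threshold):
    # During training, small weights are forced to zero so the tensor reaches
    # the desired compression ratio.
    return np.where(np.abs(weights) < threshold, 0.0, weights)

def compress(weights):
    mask = weights != 0.0
    return weights[mask], mask          # non-zero values plus their positions

def expand(values, mask):
    out = np.zeros(mask.shape, dtype=values.dtype)
    out[mask] = values                  # scatter values back to original layout
    return out

w = apply_sparsity_threshold(np.array([0.8, 0.01, -0.5, 0.02, 0.0, 0.9]), 0.05)
values, mask = compress(w)
assert np.array_equal(expand(values, mask), w)
```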
