Bistable-element for random number generation

    Publication No.: US10187044B1

    Publication Date: 2019-01-22

    Application No.: US15694165

    Application Date: 2017-09-01

    Abstract: A bistable cell includes a pair of inverters and multiple pairs of cross-coupled tristate buffers. Each pair of tristate buffers can be individually selected to implement an entropy harvesting state for the bistable cell. Each of the tristate buffers generally has lower strength than the inverters but the inverter-to-buffer strength ratio can be configured through selective use of one or more of the tristate buffer pairs. The resulting entropy harvesting state behavior can be varied based on the inverter-to-buffer strength ratio in terms of greater randomness of the output bits or decreased power consumption.

    Reduced-latency packet ciphering
    Invention Grant

    Publication No.: US09960908B1

    Publication Date: 2018-05-01

    Application No.: US14869673

    Application Date: 2015-09-29

    CPC classification number: H04L9/0637 H04L9/065 H04L9/0838

    Abstract: A hardware cipher module to cipher a packet. The cipher module includes a key scheduling engine and a ciphering engine. The key scheduling engine is configured to receive a compact key, iteratively generate a set of round keys, including a first round key, based on the compact key, and determine, based upon a cipher mode indication and a type of ciphering, whether to generate a key-scheduling-done indication after the first round key is generated and before all of the set of round keys are generated, or to generate the key-scheduling-done indication after all of the set of round keys are generated. The ciphering engine is configured to begin to cipher the packet with one of the set of round keys upon receiving the key-scheduling-done indication.
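    The early-versus-late done signal described above can be sketched in software. This is a minimal model, not the patented hardware: the round-key derivation (hash chaining) and the `early_done` flag standing in for the cipher-mode decision are both placeholders.

    ```python
    import hashlib

    def schedule_round_keys(compact_key: bytes, rounds: int, early_done: bool):
        """Iteratively derive round keys from a compact key.

        Yields (round_key, done) pairs; `done` first becomes True after the
        first round key when `early_done` is set (modes that consume keys in
        generation order), otherwise only after the full set is generated.
        """
        key = compact_key
        for i in range(rounds):
            # Placeholder derivation, not a real cipher's key schedule.
            key = hashlib.sha256(key + bytes([i])).digest()[:16]
            done = early_done or (i == rounds - 1)
            yield key, done

    # The ciphering engine may start as soon as `done` first turns True,
    # which reduces latency when the mode permits it.
    keys = list(schedule_round_keys(b"compact", rounds=10, early_done=True))
    first_done_at = next(i for i, (_, d) in enumerate(keys) if d)
    ```

    With `early_done=False` the done indication would only appear with the final round key, forcing the ciphering engine to wait for the full schedule.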

    Distributive training with multicast

    Publication No.: US12189569B1

    Publication Date: 2025-01-07

    Application No.: US17449300

    Application Date: 2021-09-29

    Inventors: Kun Xu; Ron Diamant

    Abstract: Techniques for distributing data associated with the weight values of a neural network model are described. The techniques can include performing computations associated with the neural network model in a neural network accelerator to generate data associated with weights of the neural network model. A multicast request packet is then generated to distribute the data. The multicast request packet may contain the data associated with the weights, and an address in a multicast address range of a peripheral bus multicast switch. The multicast request packet is sent to a port of the peripheral bus multicast switch, and in response, the peripheral bus multicast switch generates multiple packets containing the data from the multicast request packet and forwards them to multiple peripheral bus ports corresponding to other processing nodes of the system.
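    The fan-out behavior of the multicast switch can be modeled in a few lines. This is a toy sketch under stated assumptions: the class name, the address range constants, and the unicast fallback routing are all hypothetical, not taken from the patent.

    ```python
    class MulticastSwitch:
        """Toy model of a peripheral bus multicast switch.

        A request whose address falls in the multicast range is replicated
        into one packet per downstream port; any other address is delivered
        to a single port.
        """
        MCAST_BASE, MCAST_SIZE = 0xF000_0000, 0x1000   # hypothetical range

        def __init__(self, ports):
            self.ports = ports  # port id -> list collecting delivered packets

        def send(self, addr, payload):
            if self.MCAST_BASE <= addr < self.MCAST_BASE + self.MCAST_SIZE:
                # One multicast request fans out into N packets.
                for buf in self.ports.values():
                    buf.append(payload)
            else:
                # Hypothetical unicast fallback: route by address.
                self.ports[addr % len(self.ports)].append(payload)

    ports = {i: [] for i in range(4)}     # four peripheral bus ports
    sw = MulticastSwitch(ports)
    sw.send(0xF000_0010, b"weights-update")   # delivered to all four ports
    ```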

    Color selection schemes for storage allocation

    Publication No.: US12182549B1

    Publication Date: 2024-12-31

    Application No.: US18230988

    Application Date: 2023-08-07

    Abstract: A compiler-implemented technique for performing a storage allocation is described. Computer code to be converted into machine instructions for execution on an integrated circuit device is received. The integrated circuit device includes a memory having a set of memory locations. Based on the computer code, a set of values that are to be stored on the integrated circuit device are determined. An interference graph that includes the set of values and a set of interferences is constructed. While traversing the interference graph, a set of memory location assignments is generated by assigning the set of values to the set of memory locations in accordance with one or more color selection schemes.
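    Interference-graph coloring of this kind can be sketched briefly. The selection scheme below is plain lowest-free-color; the patent's color selection schemes may differ, and the value names are made up for illustration.

    ```python
    def color_storage(values, interferes):
        """Greedy graph coloring for storage allocation (minimal sketch).

        `values` is an ordered list of value names; `interferes` maps each
        value to the set of values live at the same time. Each color stands
        for one memory location; interfering values never share a color.
        """
        color = {}
        for v in values:                  # traverse the interference graph
            taken = {color[n] for n in interferes.get(v, ()) if n in color}
            c = 0
            while c in taken:             # pick the lowest color unused by
                c += 1                    # any already-colored neighbor
            color[v] = c
        return color

    # "a" interferes with both "b" and "c"; "b" and "c" can share a location.
    g = {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}}
    assignment = color_storage(["a", "b", "c"], g)
    ```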

    Scheduling for locality of reference to memory

    Publication No.: US12131188B1

    Publication Date: 2024-10-29

    Application No.: US18192081

    Application Date: 2023-03-29

    CPC classification number: G06F9/4881 G06F8/43 G06F8/433 G06N3/063

    Abstract: A technique for scheduling instructions includes obtaining a set of instructions that operate on memory objects, and determining the dependencies of the memory objects. The memory objects are then sorted into a sequence of memory objects based on the dependencies of the memory objects, and the set of instructions are scheduled into a sequence of instructions according to the sequence of memory objects. Sorting memory objects allows instructions that operate on the same memory object to be kept together. This helps minimize spilling conditions because intervening instructions that do not operate on the same memory object can be avoided.
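    The sort-then-schedule idea can be illustrated with a small sketch: memory objects are topologically ordered by their dependencies, and instructions are then grouped by object. The instruction and object names are hypothetical; this is not the compiler's actual pass.

    ```python
    from graphlib import TopologicalSorter

    def schedule_by_object(instrs, deps):
        """Keep instructions that touch the same memory object together.

        `instrs` is a list of (instruction, memory_object) pairs; `deps`
        maps each memory object to the objects it depends on. Objects are
        topologically sorted, then instructions are emitted object by
        object (the sort is stable, so per-object order is preserved).
        """
        order = list(TopologicalSorter(deps).static_order())
        rank = {obj: i for i, obj in enumerate(order)}
        return sorted(instrs, key=lambda io: rank[io[1]])

    # Object "B" depends on "A"; interleaved instructions get regrouped.
    instrs = [("load t1", "B"), ("store t0", "A"), ("add t2", "B"), ("init", "A")]
    deps = {"B": {"A"}, "A": set()}
    ordered = schedule_by_object(instrs, deps)
    ```

    Grouping this way avoids intervening instructions on other objects, which is what keeps live ranges short and spills rare.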

    Matrix transpose hardware acceleration

    Publication No.: US12125124B1

    Publication Date: 2024-10-22

    Application No.: US18118251

    Application Date: 2023-03-07

    Inventors: Kun Xu; Ron Diamant

    Abstract: In one example, an apparatus comprises: a buffer memory; and a memory access circuit configured to: fetch, from a first memory, a set of first groups of data elements of a first matrix, each first group of data elements being stored at consecutive memory addresses at the first memory; based on a first configuration, store the set of first groups of data elements at consecutive memory addresses or at non-consecutive memory addresses at the buffer memory; based on a second configuration that defines a memory address offset, fetch a set of second groups of the data elements from the buffer memory, each second group of the data elements being stored at consecutive memory addresses of the buffer memory, each second group being separated by the memory address offset in the buffer memory; and store each fetched second group at consecutive addresses of a destination memory to form a second matrix.
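    The two-configuration access pattern can be sketched in software with flat lists standing in for memories. This is a behavioral model only, assuming row-major layout; the hardware detail (buffer sizing, burst widths) is omitted.

    ```python
    def transpose_via_buffer(matrix, rows, cols):
        """Transpose a row-major matrix through a buffer memory (sketch).

        The scatter pass writes each fetched row into the buffer at
        non-consecutive addresses separated by `rows` (first configuration);
        reading the buffer back at consecutive addresses, in groups offset
        by `rows` (second configuration), then yields the transpose.
        """
        buf = [None] * (rows * cols)
        for r in range(rows):             # row r is a group of consecutive
            for c in range(cols):         # source addresses
                buf[c * rows + r] = matrix[r * cols + c]
        # buf now holds the transpose in row-major (cols x rows) layout.
        return buf

    m = [1, 2, 3,
         4, 5, 6]                         # 2x3 matrix, row-major
    t = transpose_via_buffer(m, rows=2, cols=3)
    ```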

    Neural network training under memory restraint

    Publication No.: US12106222B2

    Publication Date: 2024-10-01

    Application No.: US18112036

    Application Date: 2023-02-21

    CPC classification number: G06N3/084 G06N3/04

    Abstract: Methods and systems for training a neural network are provided. In one example, an apparatus comprises a memory that stores instructions; and a hardware processor configured to execute the instructions to: control a neural network processor to perform a loss gradient operation to generate data gradients; after the loss gradient operation completes, control the neural network processor to perform a forward propagation operation to generate intermediate outputs; control the neural network processor to perform a backward propagation operation based on the data gradients and the intermediate outputs to generate weight gradients; receive the weight gradients from the neural network processor; and update weights of a neural network based on the weight gradients.
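    The operation ordering above, recomputing the forward pass after the loss gradient instead of holding its intermediate outputs in memory, can be sketched with scalar stand-ins for tensors. The tiny model and learning rate are illustrative assumptions, not the patent's.

    ```python
    def train_step(x, target, w):
        """Memory-constrained training order (sketch, scalar 'tensors')."""
        def forward(x, w):
            h = x * w              # intermediate output
            y = h * h
            return h, y

        _, y = forward(x, w)       # original forward; intermediates discarded
        dy = 2.0 * (y - target)    # loss gradient for L = (y - target)^2
        h, _ = forward(x, w)       # recompute intermediates for backward
        dh = dy * 2.0 * h          # backward through y = h^2
        dw = dh * x                # weight gradient through h = x * w
        return w - 0.01 * dw       # weight update

    w_new = train_step(x=1.0, target=4.0, w=1.0)
    ```

    The trade is compute for memory: the forward pass runs twice, but its intermediate outputs never have to persist across the loss-gradient computation.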

    Reconfigurable neural network processing based on subgraph recognition

    Publication No.: US12045611B1

    Publication Date: 2024-07-23

    Application No.: US18231024

    Application Date: 2023-08-07

    Abstract: In one example, a method comprises: receiving input codes, wherein the input codes represent a computational dataflow graph; traversing the computational dataflow graph to identify single-entry-single-exit (SESE) subgraphs of the computational dataflow graph, wherein each SESE subgraph has a sequence of nodes comprising a root node and a child node and representing a sequence of element-wise operators, wherein the root node receives a single input tensor, and wherein the child node outputs a single output tensor; determining a merged operator for each SESE subgraph; and generating executable instructions for the computational dataflow graph to be executed by a hardware accelerator having a first execution unit and a second execution unit, wherein the executable instructions comprise first executable instructions for the merged operators targeted at the first execution unit, and second executable instructions for other operators of the computational dataflow graph targeted at the second execution unit.
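    The merged-operator step can be illustrated with a short sketch: a single-entry-single-exit chain of element-wise operators is fused into one function that makes a single pass over the tensor. The operator names and list-of-floats tensor representation are hypothetical.

    ```python
    import math

    ELEMENTWISE = {"add1": lambda t: [v + 1 for v in t],
                   "relu": lambda t: [max(v, 0.0) for v in t],
                   "exp":  lambda t: [math.exp(v) for v in t]}

    def merge_sese_chain(ops):
        """Fuse an SESE chain of element-wise operators (sketch).

        `ops` is the node sequence from root to child. The merged operator
        applies them in order over one input tensor, so the whole chain can
        be dispatched as a single instruction to one execution unit.
        """
        fns = [ELEMENTWISE[o] for o in ops]
        def merged(tensor):
            for fn in fns:
                tensor = fn(tensor)
            return tensor
        return merged

    op = merge_sese_chain(["add1", "relu"])
    out = op([-2.0, 0.5])
    ```

    Operators outside any recognized SESE subgraph would still be lowered individually, matching the abstract's split between first and second executable instructions.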
