Scalable Switch Capacitor Computation Cores for Accurate and Efficient Deep Learning Inference

    Publication Number: US20240176584A1

    Publication Date: 2024-05-30

    Application Number: US18071230

    Application Date: 2022-11-29

    CPC classification number: G06F7/523

    Abstract: An apparatus comprising: a first plurality of inputs representing an activation input vector; a second plurality of inputs representing a weight input vector; an analog multiplier-and-accumulator to generate a first analog voltage representing a first multiply-and-accumulate result for the first inputs and the second inputs; a voltage multiplier that takes the first analog voltage and produces a second analog voltage representing a second multiply-and-accumulate result by multiplying the first analog voltage by at least one scaling factor; an analog-to-digital converter configured to convert the second analog voltage into a digital signal using a limited-precision operation during a neural network inference operation; and a hardware controller or a software controller configured to determine the at least one scaling factor based on the first multiply-and-accumulate result.
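
    The abstract describes scaling an analog multiply-and-accumulate (MAC) voltage so that a limited-precision ADC captures it accurately. Below is a minimal Python sketch of that idea only, not the patented circuit: the MAC is computed numerically as a stand-in for the analog array, and the power-of-two scaling rule, 4-bit ADC model, and reference voltage are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize(v, bits, v_ref=1.0):
    """Model an ADC with `bits` of precision over the range [-v_ref, v_ref]."""
    step = 2.0 * v_ref / (2 ** bits)
    return float(np.clip(np.round(v / step) * step, -v_ref, v_ref))

activations = rng.uniform(-1.0, 1.0, size=64)
weights = rng.uniform(-1.0, 1.0, size=64)

# "First analog voltage": the raw multiply-and-accumulate result,
# computed digitally here as a stand-in for the analog array.
mac = float(np.dot(activations, weights))

# Hypothetical controller: pick a power-of-two scaling factor so the
# "second analog voltage" lands inside the ADC's full-scale range.
scale = 2.0 ** -np.ceil(np.log2(max(abs(mac), 1e-9)))
scaled = mac * scale

digital = quantize(scaled, bits=4)   # limited-precision conversion
recovered = digital / scale          # undo the scaling in the digital domain

print(f"true MAC: {mac:.4f}  recovered after 4-bit ADC: {recovered:.4f}")
```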

    Computational efficiency in symbolic sequence analytics using random sequence embeddings

    Publication Number: US11227231B2

    Publication Date: 2022-01-18

    Application Number: US15972108

    Application Date: 2018-05-04

    Abstract: A method and system of analyzing a symbolic sequence is provided. Metadata of a symbolic sequence is received from a computing device of an owner. A set of R random sequences is generated based on the received metadata and sent to the computing device of the owner of the symbolic sequence for computation of a feature matrix based on the set of R random sequences and the symbolic sequence. The feature matrix is received from the computing device of the owner. Upon determining that an inner product of the feature matrix is below a threshold accuracy, the iterative process returns to generating R random sequences. Upon determining that the inner product of the feature matrix is at or above the threshold accuracy, the feature matrix is categorized based on machine learning. The categorized feature matrix is sent to be displayed on a user interface of the computing device of the owner.
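
    As a rough illustration of the loop described in the abstract, the sketch below embeds symbolic sequences against R randomly generated reference sequences and grows R until the inner products of the feature matrix stabilize. The edit-distance features, four-letter alphabet, tolerance, and cap on R are assumptions made for this example, not details taken from the patent.

```python
import random
import numpy as np

ALPHABET = "ACGT"

def edit_distance(a, b):
    """Levenshtein distance via a rolling dynamic-programming row."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return dp[-1]

def random_sequences(r, length, rng):
    return ["".join(rng.choice(ALPHABET) for _ in range(length)) for _ in range(r)]

def feature_vector(seq, refs):
    """Feature: distance from the owner's sequence to each random reference."""
    return np.array([edit_distance(seq, ref) for ref in refs], dtype=float)

rng = random.Random(0)
sequences = ["ACGTACGT", "ACGTTCGT", "TTTTACGT"]   # stand-in symbolic sequences

prev_gram = None
for r in (8, 16, 32, 64, 128, 256, 512, 1024):
    refs = random_sequences(r, length=8, rng=rng)
    feats = np.stack([feature_vector(s, refs) for s in sequences]) / np.sqrt(r)
    gram = feats @ feats.T                         # inner products of the feature matrix
    if prev_gram is not None and np.abs(gram - prev_gram).max() < 1.0:
        break                                      # estimate stabilized: accept this R
    prev_gram = gram

print("R used:", r)
print(np.round(gram, 2))
```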

    MIXED PRECISION CAPABLE HARDWARE FOR TUNING A MACHINE LEARNING MODEL

    Publication Number: US20210064372A1

    Publication Date: 2021-03-04

    Application Number: US16558536

    Application Date: 2019-09-03

    Abstract: An apparatus includes a memory and a processor coupled to the memory. The processor includes first and second sets of arithmetic units having first and second precision for floating-point computations, the second precision being lower than the first precision. The processor is configured to obtain a machine learning model trained in the first precision, to utilize the second set of arithmetic units to perform inference on input data, to utilize the first set of arithmetic units to generate feedback for updating parameters of the second set of arithmetic units based on the inference performed on the input data by the second set of arithmetic units, to tune parameters of the second set of arithmetic units based at least in part on the feedback generated by the first set of arithmetic units, and to utilize the second set of arithmetic units with the tuned parameters to generate inference results.
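
    The sketch below mimics the tuning loop in the abstract under simplifying assumptions: float16 stands in for the low-precision arithmetic units, float32 for the high-precision units, and the tunable parameter is reduced to a per-output correction term. It is an illustration of the feedback idea, not the claimed hardware.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(16, 16)).astype(np.float32)    # model trained in high precision
x = rng.normal(size=16).astype(np.float32)

def low_precision_matmul(w, v):
    """Second set of arithmetic units: float16 computation."""
    return (w.astype(np.float16) @ v.astype(np.float16)).astype(np.float32)

correction = np.zeros(16, dtype=np.float32)         # the tunable parameter
for _ in range(10):
    y_lo = low_precision_matmul(W, x) + correction  # inference on the low-precision path
    y_hi = W @ x                                    # high-precision path generates feedback
    correction += 0.5 * (y_hi - y_lo)               # tune from the feedback

y_final = low_precision_matmul(W, x) + correction   # inference with tuned parameters
print("max error after tuning:", float(np.abs(y_final - W @ x).max()))
```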

    ASYNCHRONOUS GRADIENT WEIGHT COMPRESSION

    Publication Number: US20200175422A1

    Publication Date: 2020-06-04

    Application Number: US16204770

    Application Date: 2018-11-29

    Abstract: Systems, computer-implemented methods, and computer program products to facilitate gradient weight compression are provided. According to an embodiment, a system can comprise a memory that stores computer executable components and a processor that executes the computer executable components stored in the memory. The computer executable components can comprise a pointer component that can identify one or more compressed gradient weights not present in a first concatenated compressed gradient weight. The computer executable components can further comprise a compression component that can compute a second concatenated compressed gradient weight based on the one or more compressed gradient weights to update a weight of a learning entity of a machine learning system.
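
    To make the pointer and compression components concrete, here is an illustrative Python sketch under assumed details: compression is top-k sparsification, a concatenated compressed gradient is modeled as a dictionary keyed by worker, and the pointer step simply finds which workers' compressed gradients are missing from the first concatenation so a second one can be built and applied.

```python
import numpy as np

def compress(grad, k=2):
    """Toy compression: keep the k largest-magnitude entries as index -> value."""
    idx = np.argsort(np.abs(grad))[-k:]
    return {int(i): float(grad[i]) for i in idx}

worker_grads = {
    "w0": np.array([0.1, -2.0, 0.3, 0.05]),
    "w1": np.array([1.5, 0.2, -0.1, 0.9]),
    "w2": np.array([0.0, 0.4, -1.2, 0.6]),
}
compressed = {name: compress(g) for name, g in worker_grads.items()}

# First concatenated compressed gradient arrived without w2 (asynchronous arrival).
first_concat = {name: compressed[name] for name in ("w0", "w1")}

# Pointer component: identify compressed gradients not present in the first concatenation.
missing = [name for name in compressed if name not in first_concat]

# Compression component: build a second concatenation from the missing gradients
# and use it to update the learner's weights.
second_concat = {name: compressed[name] for name in missing}
weights = np.zeros(4)
for grads in second_concat.values():
    for i, v in grads.items():
        weights[i] -= 0.01 * v                      # simple SGD-style update

print("missing:", missing)
print("updated weights:", weights)
```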

    True random generator (TRNG) in ML accelerators for NN dropout and initialization

    Publication Number: US10579583B2

    Publication Date: 2020-03-03

    Application Number: US15232177

    Application Date: 2016-08-09

    Abstract: A random number signal generator used for performing dropout or weight initialization for a node in a neural network. The random number signal generator includes a transistor which generates a random noise signal. The transistor includes a substrate, source and drain regions formed in the substrate, a first insulating layer formed over a channel of the transistor, a first trapping layer formed over the first insulating layer, a second insulating layer formed over the first trapping layer, and a second trapping layer formed over the second insulating layer. One or more traps in the first and second trapping layers are configured to capture or release one or more carriers flowing through the channel region. The random noise signal is generated as a function of one or more carriers being captured or released by the one or more traps.
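
    A software analogue of this behaviour, offered only as a conceptual sketch, is random telegraph noise: a two-state capture/release process whose samples can be turned into random bits for a dropout mask. The transition probabilities, thresholding, and mask size below are illustrative assumptions, not a model of the disclosed device.

```python
import numpy as np

rng = np.random.default_rng(42)

def telegraph_noise(n, p_capture=0.3, p_release=0.3):
    """Two-state process: each step a trap may capture or release a carrier."""
    state, samples = 0, np.empty(n)
    for i in range(n):
        if state == 0 and rng.random() < p_capture:
            state = 1                               # carrier captured
        elif state == 1 and rng.random() < p_release:
            state = 0                               # carrier released
        samples[i] = state + 0.05 * rng.normal()    # small additive noise
    return samples

noise = telegraph_noise(1000)
bits = (noise[::2] > noise[1::2]).astype(int)       # compare sample pairs to get bits

# Use the bitstream as a dropout mask for an 8x8 layer (1 = keep the unit).
mask = bits[:64].reshape(8, 8)
print("fraction of units kept:", mask.mean())
```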

    HETEROGENEOUS RUNAHEAD CORE FOR DATA ANALYTICS

    Publication Number: US20170344485A1

    Publication Date: 2017-11-30

    Application Number: US15164551

    Application Date: 2016-05-25

    Abstract: Techniques that facilitate heterogeneous runahead processing for a processor core are provided. In one example, a first core performs a first execution of a first sequence of instructions, where the first core is communicatively coupled to a first cache memory. A second core performs a second execution of at least a portion of the first sequence of instructions and makes a first determination, concurrent with the first execution, that data associated with the first sequence of instructions fails to be stored in the first cache memory. The first core executes a second sequence of instructions based on a second determination that the second core is performing the second execution of at least a portion of the first sequence of instructions.
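
    The toy sketch below illustrates the runahead idea in Python under assumed data structures: a second core scans ahead through the first core's instruction stream, detects loads that would miss in the shared cache and warms it, while the first core proceeds with a second sequence of instructions. It is a conceptual illustration, not the disclosed microarchitecture.

```python
from collections import deque

CACHE_SIZE = 4
cache = deque(maxlen=CACHE_SIZE)                 # stand-in for the first cache memory

first_sequence = [("load", a) for a in (1, 5, 9, 5, 13, 1)]
second_sequence = [("compute", n) for n in range(3)]

def runahead(instructions):
    """Second core: find loads that would miss in the cache and prefetch them."""
    for op, addr in instructions:
        if op == "load" and addr not in cache:
            cache.append(addr)                   # prefetch into the shared cache
            print(f"runahead core prefetched address {addr}")

def execute(instructions):
    """First core: execute instructions, hitting in the cache runahead warmed."""
    for op, arg in instructions:
        if op == "load":
            print(f"load {arg}: {'hit' if arg in cache else 'miss'}")
        else:
            print(f"compute step {arg}")

runahead(first_sequence)     # second core runs ahead over the first sequence...
execute(second_sequence)     # ...while the first core works on a second sequence,
execute(first_sequence)      # then replays the first sequence with a warm cache
```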
