MICRO-SERVICE FRAMEWORK DERIVED FROM THIRD-PARTY APPS

    Publication Number: US20180321996A1

    Publication Date: 2018-11-08

    Application Number: US15587311

    Filing Date: 2017-05-04

    IPC Classification: G06F9/54 H04L29/08 G06F9/44

    Abstract: Computer systems and methods for generating and interacting with a micro-service framework are provided. A micro-service corresponds to one or more deep link/API calls that carry out some particular function. A static analysis of an app is conducted from one or more starting sources of the app to identify one or more valid and feasible execution paths, as well as the corresponding input parameters within the app. Each valid execution path with its corresponding input parameters represents a "deep link" or "API" for that app. The information regarding the deep link is collected and stored as a micro-service in a micro-service catalog. A micro-service framework is implemented that receives a micro-service request (i.e., a request that the micro-service be carried out on behalf of a computer user) from a UX client and executes that micro-service request via execution of the deep link.
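
    As a rough illustration of the catalog-and-dispatch idea, the following Python sketch registers a deep link as a catalog entry and resolves a micro-service request against it. The MicroService class, CATALOG, and handle_request are hypothetical names for exposition, not the patent's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class MicroService:
    """Hypothetical catalog entry: one validated execution path through a
    third-party app plus the input parameters it requires."""
    name: str
    deep_link: str                          # e.g. an app URI template
    input_params: list = field(default_factory=list)

CATALOG: dict = {}                          # micro-service catalog

def register(service: MicroService) -> None:
    CATALOG[service.name] = service

def handle_request(name: str, **params) -> str:
    """Resolve a micro-service request from a UX client into the deep-link
    call that carries it out on the user's behalf."""
    service = CATALOG[name]
    missing = [p for p in service.input_params if p not in params]
    if missing:
        raise ValueError(f"missing input parameters: {missing}")
    return service.deep_link.format(**params)

register(MicroService("book_ride", "rideapp://book?from={src}&to={dst}",
                      ["src", "dst"]))
print(handle_request("book_ride", src="Home", dst="Airport"))
```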

    ANALOG MAC AWARE DNN IMPROVEMENT

    Publication Number: US20230185352A1

    Publication Date: 2023-06-15

    Application Number: US17551875

    Filing Date: 2021-12-15

    Abstract: Methods, systems and computer program products are provided for improving performance (e.g., reducing power consumption) of a hardware accelerator (e.g., a neural processor) comprising hybrid or analog multiply-and-accumulate (MAC) processing elements (PEs). Selective variation of the precision of an array of MAC PEs may reduce power consumption of a neural processor. Power may be conserved by dynamically controlling the precision of the analog-to-digital converter (ADC) output bits for one or more MAC PEs. Dynamic control of ADC output bit precision may be based on precision information determined during training and/or post-training (e.g., quantization) of an artificial intelligence (AI) neural network (NN) model implemented by the neural processor. Precision information may include a range of dynamic precision for each of a plurality of nodes of a computation graph for the AI NN model.
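
    A minimal sketch of the inference-side idea, assuming a per-node precision table produced during training: the ADC model below simply rounds its analog accumulation to the number of output levels that node needs. PRECISION_TABLE, quantize_adc, and mac_node are illustrative names, not from the patent.

```python
import numpy as np

# Hypothetical per-node table produced during training or post-training
# quantization: node id -> ADC output bits that node actually needs.
PRECISION_TABLE = {"conv1": 8, "conv2": 5, "fc": 4}

def quantize_adc(analog_acc: np.ndarray, bits: int) -> np.ndarray:
    """Model an ADC that emits only `bits` output bits: fewer bits mean
    coarser quantization levels and, on real hardware, lower power."""
    levels = 2 ** bits
    lo, hi = analog_acc.min(), analog_acc.max()
    step = (hi - lo) / (levels - 1)
    if step == 0.0:
        return analog_acc
    return np.round((analog_acc - lo) / step) * step + lo

def mac_node(node_id: str, activations: np.ndarray, weights: np.ndarray):
    # The analog MAC array accumulates, then the ADC digitizes at the
    # precision recorded for this node.
    return quantize_adc(activations @ weights, PRECISION_TABLE[node_id])
```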

    SUBSAMPLING TRAINING DATA DURING ARTIFICIAL NEURAL NETWORK TRAINING

    Publication Number: US20200302273A1

    Publication Date: 2020-09-24

    Application Number: US16359663

    Filing Date: 2019-03-20

    IPC Classification: G06N3/04 G06F17/16 G06K9/62

    Abstract: Perplexity scores are computed for training data samples during ANN training. Perplexity scores can be computed as a divergence between data defining a class associated with a current training data sample and a probability vector generated by the ANN model. Perplexity scores can alternatively be computed by learning a probability density function ("PDF") fitting activation maps generated by an ANN model during training. A perplexity score can then be computed for a current training data sample by computing a probability for the current training data sample based on the PDF. If the perplexity score for a training data sample is lower than a threshold, the training data sample is removed from the training data set so that it will not be utilized for training during subsequent epochs. Training of the ANN model continues following the removal of training data samples from the training data set.
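
    A small sketch of the first scoring variant, assuming the divergence is taken as the cross-entropy between the one-hot class label and the model's probability vector; function names and the threshold value are illustrative.

```python
import numpy as np

def perplexity_scores(probs: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Score each sample by the divergence between its class label and the
    model's probability vector; with one-hot labels this reduces to the
    cross-entropy of the true class. Low scores mean the model already
    predicts the sample well."""
    return -np.log(probs[np.arange(len(labels)), labels] + 1e-12)

def subsample(dataset: list, probs: np.ndarray, labels: np.ndarray,
              threshold: float = 0.05) -> list:
    keep = perplexity_scores(probs, labels) >= threshold
    # Samples scoring below the threshold are dropped for later epochs.
    return [sample for sample, k in zip(dataset, keep) if k]
```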

    DATA-AWARE MODEL PRUNING FOR NEURAL NETWORKS

    Publication Number: US20220383123A1

    Publication Date: 2022-12-01

    Application Number: US17334613

    Filing Date: 2021-05-28

    IPC Classification: G06N3/08 G06N3/04

    Abstract: Embodiments of the present disclosure include systems and methods for performing data-aware model pruning for neural networks. During a training phase, a neural network is trained with a first set of data. During a validation phase, inference with the neural network is performed using a second set of data that causes the neural network to generate a first set of outputs at a layer in the neural network. During the validation phase, a plurality of mean values and a plurality of variance values are calculated based on the first set of outputs. A plurality of entropy values are calculated based on the plurality of mean values and the plurality of variance values. A second set of outputs are pruned based on the plurality of entropy values. The second set of outputs are generated by the layer of the neural network using a third set of data.
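
    The per-channel statistics step might look like the following sketch, which fits a Gaussian to each channel's validation-time outputs and uses the closed-form differential entropy 0.5 * ln(2*pi*e*var); the keep-ratio policy and all names are assumptions for illustration.

```python
import numpy as np

def channel_entropies(outputs: np.ndarray) -> np.ndarray:
    """outputs: (num_samples, num_channels) activations collected at one
    layer during validation. Fit a Gaussian per channel and return its
    differential entropy, 0.5 * ln(2*pi*e*var)."""
    var = outputs.var(axis=0) + 1e-12
    return 0.5 * np.log(2.0 * np.pi * np.e * var)

def prune_mask(outputs: np.ndarray, keep_ratio: float = 0.75) -> np.ndarray:
    """Keep the highest-entropy channels; channels whose outputs barely
    vary across the validation data carry little information and are
    pruned from outputs generated on later data."""
    ent = channel_entropies(outputs)
    k = int(len(ent) * keep_ratio)
    keep = np.zeros(len(ent), dtype=bool)
    keep[np.argsort(ent)[len(ent) - k:]] = True
    return keep
```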

    HIERARCHICAL AND SHARED EXPONENT FLOATING POINT DATA TYPES

    Publication Number: US20220253281A1

    Publication Date: 2022-08-11

    Application Number: US17361263

    Filing Date: 2021-06-28

    IPC Classification: G06F7/499 G06F7/48 G06F7/44

    Abstract: Embodiments of the present disclosure include systems and methods for providing hierarchical and shared exponent floating point data types. First and second shared exponent values are determined based on exponent values of a plurality of floating point values. A third shared exponent value is determined based on the first shared exponent value and the second shared exponent value. First and second difference values are determined based on the first shared exponent value, the second shared exponent value, and the third shared exponent value. Sign values and mantissa values are determined for the plurality of floating point values. The sign value and the mantissa value for each floating point value in the plurality of floating point values, the third shared exponent value, the first difference value, and the second difference value are stored in a data structure for a shared exponent floating point data type.
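
    A toy packing routine in this spirit, assuming each sub-block's shared exponent is the maximum per-value exponent; the function and field names are invented for illustration.

```python
import numpy as np

def pack_hierarchical(block_a: np.ndarray, block_b: np.ndarray) -> dict:
    """Pack two sub-blocks into a hierarchical shared-exponent layout:
    one outer shared exponent plus a small per-sub-block difference,
    instead of a full exponent field per value."""
    _, ea = np.frexp(block_a)               # per-value binary exponents
    _, eb = np.frexp(block_b)
    e1, e2 = int(ea.max()), int(eb.max())   # first and second shared exponents
    e3 = max(e1, e2)                        # third (outer) shared exponent
    def sign_mant(block, e_shared):
        return np.sign(block), np.abs(block) / np.exp2(e_shared)
    return {
        "shared_exp": e3,                   # stored once for both sub-blocks
        "diffs": (e3 - e1, e3 - e2),        # small difference values
        "a": sign_mant(block_a, e1),        # per-value signs and mantissas
        "b": sign_mant(block_b, e2),
    }

packed = pack_hierarchical(np.array([1.5, -0.25]), np.array([40.0, 3.0]))
```

    A value from the first sub-block decodes as sign * mantissa * 2**(shared_exp - diffs[0]), so each sub-block stores only a few difference bits rather than a full exponent field.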

    BLOCK FLOATING POINT COMPUTATIONS USING REDUCED BIT-WIDTH VECTORS

    Publication Number: US20190339937A1

    Publication Date: 2019-11-07

    Application Number: US15971904

    Filing Date: 2018-05-04

    IPC Classification: G06F7/483 G06F17/16

    Abstract: A system for block floating point computation in a neural network receives a block floating point number comprising a mantissa portion. A bit-width of the block floating point number is reduced by decomposing the block floating point number into a plurality of numbers each having a mantissa portion with a bit-width that is smaller than a bit-width of the mantissa portion of the block floating point number. One or more dot product operations are performed separately on each of the plurality of numbers to obtain individual results, which are summed to generate a final dot product value. The final dot product value is used to implement the neural network. The reduced bit-width computations allow higher-precision mathematical operations to be performed on lower-precision processors with improved accuracy.
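
    With the block's shared exponent factored out (it cancels into a single final scaling), the mantissa decomposition reduces to integer arithmetic. The sketch below splits unsigned integer mantissas into high and low halves and recombines four narrow dot products; the names and the 4-bit split are illustrative assumptions.

```python
import numpy as np

def split_mantissas(m: np.ndarray, low_bits: int = 4):
    """Split unsigned integer mantissas into high and low parts so each
    dot product runs at the narrower bit-width."""
    return m >> low_bits, m & ((1 << low_bits) - 1)

def dot_reduced(m_a: np.ndarray, m_b: np.ndarray, low_bits: int = 4):
    a_hi, a_lo = split_mantissas(m_a, low_bits)
    b_hi, b_lo = split_mantissas(m_b, low_bits)
    # Four narrow dot products, recombined by shifts; the result equals
    # the full-width dot product of the original mantissas.
    return ((a_hi @ b_hi) << (2 * low_bits)) \
         + (((a_hi @ b_lo) + (a_lo @ b_hi)) << low_bits) \
         + (a_lo @ b_lo)

m_a = np.array([200, 17, 93], dtype=np.int64)
m_b = np.array([45, 120, 7], dtype=np.int64)
assert dot_reduced(m_a, m_b) == m_a @ m_b   # 11691 both ways
```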

    ANALOG MAC AWARE DNN IMPROVEMENT

    Publication Number: US20240134439A1

    Publication Date: 2024-04-25

    Application Number: US18400749

    Filing Date: 2023-12-29

    Abstract: Methods, systems and computer program products are provided for improving performance (e.g., reducing power consumption) of a hardware accelerator (e.g., a neural processor) comprising hybrid or analog multiply-and-accumulate (MAC) processing elements (PEs). Selective variation of the precision of an array of MAC PEs may reduce power consumption of a neural processor. Power may be conserved by dynamically controlling the precision of the analog-to-digital converter (ADC) output bits for one or more MAC PEs. Dynamic control of ADC output bit precision may be based on precision information determined during training and/or post-training (e.g., quantization) of an artificial intelligence (AI) neural network (NN) model implemented by the neural processor. Precision information may include a range of dynamic precision for each of a plurality of nodes of a computation graph for the AI NN model.
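
    Since this continuation shares the earlier abstract, a complementary sketch of the training/post-training side may be more useful here: estimating, per computation-graph node, how many ADC output bits its observed dynamic range requires. The quantile heuristic and all names are assumptions, not the patent's method.

```python
import numpy as np

def precision_range(acc_samples: np.ndarray, coverage: float = 0.999) -> int:
    """Estimate how many ADC output bits a node needs to represent
    `coverage` of its observed accumulator magnitudes at unit resolution."""
    hi = np.quantile(np.abs(acc_samples), coverage)
    return int(np.ceil(np.log2(max(hi, 1.0) + 1)))

# One precision entry per node of the model's computation graph.
rng = np.random.default_rng(0)
precision_info = {
    "conv1": precision_range(rng.normal(scale=300.0, size=10_000)),
    "fc":    precision_range(rng.normal(scale=20.0,  size=10_000)),
}
print(precision_info)   # e.g. {'conv1': 10, 'fc': 7}
```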

    NEURAL NETWORK PROCESSING WITH CHAINED INSTRUCTIONS

    Publication Number: US20220391209A1

    Publication Date: 2022-12-08

    Application Number: US17883283

    Filing Date: 2022-08-08

    IPC Classification: G06F9/30 G06F9/38

    Abstract: Hardware and methods for neural network processing are provided. A method is provided in a hardware node including a pipeline having a matrix vector unit (MVU), a first multifunction unit connected to receive an input from the matrix vector unit, a second multifunction unit connected to receive an output from the first multifunction unit, and a third multifunction unit connected to receive an output from the second multifunction unit. The method includes performing, using the MVU, a first type of instruction that can only be performed by the MVU to generate a first result. The method further includes performing a second type of instruction that can only be performed by one of the multifunction units to generate a second result and, without storing either of the two results in a global register, passing the second result to the second multifunction unit and the third multifunction unit.
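
    The chaining can be pictured as below: each unit's result is handed straight to the next unit rather than written to a global register file. This is a behavioral toy model of the instruction flow, not the hardware node's design; the function names are invented.

```python
import numpy as np

def mvu(matrix, vector):
    return matrix @ vector                 # matrix-vector instruction (MVU only)

def mfu(x, op):
    return op(x)                           # e.g. add, multiply, tanh (MFU only)

def run_chain(matrix, vector):
    r1 = mvu(matrix, vector)               # first result, stays in the pipeline
    r2 = mfu(r1, lambda x: x + 1.0)        # first multifunction unit
    r3 = mfu(r2, np.tanh)                  # second MFU receives r2 directly
    return mfu(r3, lambda x: x * 2.0)      # third MFU receives r3 directly

print(run_chain(np.eye(3), np.ones(3)))
```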

    TURBO TRAINING FOR DEEP NEURAL NETWORKS

    Publication Number: US20220383092A1

    Publication Date: 2022-12-01

    Application Number: US17330395

    Filing Date: 2021-05-25

    IPC Classification: G06N3/08 G06N3/063

    Abstract: Embodiments of the present disclosure include systems and methods for reducing the computational cost associated with training a neural network model. A neural network model is received and a neural network training process is executed in which the neural network model is trained according to a first fidelity during a first training phase. As a result of a determination that training of the neural network model during the first training phase satisfies one or more criteria, the neural network model is trained at a second fidelity during a second training phase, the second fidelity being a higher fidelity than the first fidelity.
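
    One plausible reading, sketched in PyTorch: "fidelity" as numeric precision, with bfloat16 autocast standing in for the low-fidelity phase and a loss-plateau test as the switching criterion. These concrete choices, and the function name, are assumptions for illustration.

```python
import torch
import torch.nn as nn

def turbo_train(model, data, epochs=20, switch_tol=0.01):
    """Train at low fidelity (bfloat16 autocast here) until the per-epoch
    loss improvement drops below switch_tol, then continue the remaining
    epochs at full fp32 fidelity."""
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.MSELoss()
    prev, low_fidelity = float("inf"), True
    for _ in range(epochs):
        for x, y in data:                   # data: iterable of (x, y) batches
            opt.zero_grad()
            with torch.autocast("cpu", torch.bfloat16, enabled=low_fidelity):
                loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()
        if low_fidelity and prev - loss.item() < switch_tol:
            low_fidelity = False            # criterion met: raise fidelity
        prev = loss.item()
```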

    REDUCING OPERATIONS FOR TRAINING NEURAL NETWORKS

    Publication Number: US20220366236A1

    Publication Date: 2022-11-17

    Application Number: US17322194

    Filing Date: 2021-05-17

    IPC Classification: G06N3/08 G06N3/04

    Abstract: Embodiments of the present disclosure include systems and methods for reducing operations for training neural networks. A plurality of training data selected from a training data set is used as a plurality of inputs for training a neural network. The neural network includes a plurality of weights. A plurality of loss values are determined based on outputs generated by the neural network and expected output data of the plurality of training data. A subset of the plurality of loss values is determined. An average loss value is determined based on the subset of the plurality of loss values. A set of gradients is calculated based on the average loss value and the plurality of weights in the neural network. The plurality of weights in the neural network are adjusted based on the set of gradients.
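
    A compact PyTorch sketch of the loop, assuming the subset is the top-k largest per-sample losses (the abstract leaves the selection rule open); the function name and subset rule are illustrative.

```python
import torch
import torch.nn as nn

def train_step(model, opt, x, y, keep: int) -> float:
    """Back-propagate only through the `keep` largest per-sample losses,
    so gradient work is done for a subset of the batch rather than all
    of it."""
    losses = nn.functional.cross_entropy(model(x), y, reduction="none")
    subset, _ = torch.topk(losses, keep)    # subset of the loss values
    avg = subset.mean()                     # average loss over the subset
    opt.zero_grad()
    avg.backward()                          # gradients from the subset only
    opt.step()
    return avg.item()
```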