OPTIMIZING LOW PRECISION AND SPARSITY INFERENCE WITHOUT RETRAINING

    Publication number: US20240211762A1

    Publication date: 2024-06-27

    Application number: US18146828

    Application date: 2022-12-27

    CPC classification number: G06N3/082

    Abstract: An apparatus and method for efficiently creating less computationally intensive nodes for a neural network. In various implementations, a computing system includes a processor and a memory with circuitry that stores multiple input data values to process during inference of a trained neural network. The processor determines, during inference, which node input values, node intermediate values, and node output values of the trained neural network to represent in a respective one of multiple available floating-point formats with less precision. No retraining is performed; rather, the updates to the representations occur during inference. The processor uses selection criteria to reduce the amount of computation involved in updating the representations during inference while maintaining accuracy above an accuracy threshold. To do so, the processor uses the selection criteria to reduce the number of layers, the number of nodes within a layer, and the number of weight values per node to inspect.
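The general idea described in the abstract can be sketched in code: at inference time, cast a layer's weights to a candidate lower-precision format, check the resulting error against a threshold, and inspect only a sampled subset of the weights to keep the check cheap. This is a minimal illustrative sketch, not the patented method; the format set, the sampling heuristic, and the error proxy are all assumptions introduced here.

```python
import numpy as np

# Hypothetical sketch: pick a lower-precision float format for a layer's
# weights during inference, without retraining. The format names, the
# sampling-based selection criterion, and the error threshold are
# illustrative assumptions, not the patent's actual selection criteria.

FORMATS = {"fp32": np.float32, "fp16": np.float16}

def cast_error(weights, dtype):
    """Mean absolute error introduced by casting `weights` to `dtype`
    and back to float32."""
    roundtrip = weights.astype(dtype).astype(np.float32)
    return float(np.mean(np.abs(weights - roundtrip)))

def select_format(weights, error_threshold=1e-3, sample_frac=0.25):
    """Return the name of the least precise format whose sampled cast
    error stays below `error_threshold`. Only a fraction of the weight
    values is inspected, mirroring the idea of reducing how many values
    are examined when updating representations."""
    rng = np.random.default_rng(0)
    n = max(1, int(len(weights) * sample_frac))
    sample = rng.choice(weights, size=n, replace=False)
    # Try formats from least to most precise; fp32 is the fallback.
    for name in ("fp16", "fp32"):
        if name == "fp32" or cast_error(sample, FORMATS[name]) <= error_threshold:
            return name
```

A caller might apply `select_format` per layer, skipping layers (or nodes within a layer) that a coarser criterion has already ruled out, so that only a reduced subset of the network is ever inspected.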
