Concurrent optimization of machine learning model performance

    Publication No.: US12182676B2

    Publication Date: 2024-12-31

    Application No.: US18539022

    Filing Date: 2023-12-13

    Abstract: Certain aspects of the present disclosure provide techniques for concurrently performing inferences using a machine learning model and optimizing parameters used in executing the machine learning model. An example method generally includes receiving a request to perform inferences on a data set using the machine learning model and performance metric targets for performance of the inferences. At least a first inference is performed on the data set using the machine learning model to meet a latency specified for generation of the first inference from receipt of the request. While performing the at least the first inference, operational parameters resulting in inference performance approaching the performance metric targets are identified based on the machine learning model and operational properties of the computing device. The identified operational parameters are applied to performance of subsequent inferences using the machine learning model.
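
The flow described in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names (`run_inference`, `estimate_latency`, `search_params`, `serve`), the parameter keys (`batch_size`, `threads`), the `throughput` device property, and the toy cost model are all hypothetical and chosen only for illustration. The sketch serves the first inference right away with default parameters while a background thread searches for parameters whose estimated latency comes closest to the target, then applies the result to later inferences.

```python
from concurrent.futures import ThreadPoolExecutor


def run_inference(batch, params):
    # Stand-in for a model forward pass (hypothetical): doubles each input.
    # `params` is accepted but unused in this toy model.
    return [x * 2 for x in batch]


def estimate_latency(params, device):
    # Hypothetical cost model: latency grows with batch size and
    # shrinks with thread count and device throughput.
    return params["batch_size"] / (params["threads"] * device["throughput"])


def search_params(candidates, device, target_latency):
    # Pick the candidate whose estimated latency is closest to the target.
    return min(
        candidates,
        key=lambda p: abs(estimate_latency(p, device) - target_latency),
    )


def serve(batches, default_params, candidates, device, target_latency):
    results, used = [], []
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Start the parameter search in the background ...
        future = pool.submit(search_params, candidates, device, target_latency)
        # ... and answer the first request immediately with the defaults.
        results.append(run_inference(batches[0], default_params))
        used.append(default_params)
        tuned = future.result()
    # Subsequent inferences use the identified parameters.
    for batch in batches[1:]:
        results.append(run_inference(batch, tuned))
        used.append(tuned)
    return results, used
```

The single background worker keeps the example easy to follow. A real system would presumably measure latency rather than estimate it from a closed-form model, and the abstract does not say which search strategy is used.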

    Optimizing machine learning model performance

    Publication No.: US11556798B2

    Publication Date: 2023-01-17

    Application No.: US16905541

    Filing Date: 2020-06-18

    Inventor: Meghal Varia

    Abstract: Certain aspects of the present disclosure provide techniques for receiving data defining a neural network; analyzing the data to determine a depth-first cut point for a depth-first traversal portion of an overall network traversal; performing depth-first traversal for the depth-first portion of the overall network traversal; and performing layer-based traversal for a layer-based portion of the overall network traversal.
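
The hybrid traversal in this abstract can be sketched in Python under some loud assumptions. The abstract does not say how the cut point is chosen, so `choose_cut_point` below uses a hypothetical heuristic: the depth-first portion ends at the first layer whose output fits a memory budget. The layers are modeled as plain functions on lists, and the depth-first layers are assumed to be elementwise so that tiles can be processed independently. All names here are illustrative, not taken from the patent.

```python
def choose_cut_point(activation_sizes, memory_budget):
    # Hypothetical heuristic: layers [0, cut) run depth-first, where
    # cut - 1 is the first layer whose output fits in the budget.
    for i, size in enumerate(activation_sizes):
        if size <= memory_budget:
            return i + 1
    return len(activation_sizes)


def run_hybrid(layers, data, cut, tile_size):
    # Depth-first portion: push each tile through layers[:cut]
    # before touching the next tile.
    out = []
    for start in range(0, len(data), tile_size):
        tile = data[start:start + tile_size]
        for layer in layers[:cut]:
            tile = layer(tile)
        out.extend(tile)
    # Layer-based portion: run each remaining layer over the full tensor.
    for layer in layers[cut:]:
        out = layer(out)
    return out


# Toy network: two elementwise layers followed by a reduction.
layers = [
    lambda xs: [x + 1 for x in xs],
    lambda xs: [x * 2 for x in xs],
    lambda xs: [sum(xs)],
]
cut = choose_cut_point(activation_sizes=[1000, 400, 100], memory_budget=500)
result = run_hybrid(layers, [1, 2, 3, 4], cut, tile_size=2)
```

With elementwise layers in the depth-first portion, the hybrid traversal produces the same result as running every layer over the full input in turn. Layers with spatial overlap, such as convolutions, would need tiles that overlap at their borders, which this sketch omits.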
