SYSTEM AND METHOD FOR MANAGING PERFORMANCE OF A COMPUTING DEVICE HAVING DISSIMILAR MEMORY TYPES
    Invention Application
    SYSTEM AND METHOD FOR MANAGING PERFORMANCE OF A COMPUTING DEVICE HAVING DISSIMILAR MEMORY TYPES (Granted)

    Publication number: US20140164689A1

    Publication date: 2014-06-12

    Application number: US13726537

    Filing date: 2012-12-24

    CPC classification number: G06F12/0607 G06F13/1647 G06F13/1694

    Abstract: Systems and methods are provided for managing performance of a computing device having dissimilar memory types. An exemplary embodiment comprises a method for interleaving dissimilar memory devices. The method involves determining an interleave bandwidth ratio comprising a ratio of bandwidths for two or more dissimilar memory devices. The dissimilar memory devices are interleaved according to the interleave bandwidth ratio. Memory address requests are distributed from one or more processing units to the dissimilar memory devices according to the interleave bandwidth ratio.

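The interleaving described in the abstract can be sketched in a few lines. This is an illustrative assumption, not the patent's implementation: the function names, the 2:1 bandwidth ratio, and the 4 KB block size are all hypothetical, chosen only to show how a bandwidth ratio turns into a repeating device pattern for distributing address requests.

```python
from math import gcd

def build_interleave_pattern(bw_fast, bw_slow):
    """Return a repeating device pattern whose composition matches the
    interleave bandwidth ratio, e.g. 2:1 -> ['fast', 'fast', 'slow']."""
    g = gcd(bw_fast, bw_slow)
    return ['fast'] * (bw_fast // g) + ['slow'] * (bw_slow // g)

def route_request(address, pattern, block_size=4096):
    """Map a memory address request to a device by its block index
    within the repeating pattern."""
    block = address // block_size
    return pattern[block % len(pattern)]

# With a 2:1 bandwidth ratio, two of every three blocks go to the
# faster device, so traffic is distributed in proportion to bandwidth.
pattern = build_interleave_pattern(2, 1)
```

Routing consecutive blocks then cycles fast, fast, slow, fast, fast, slow, which keeps each device loaded in proportion to the bandwidth it can serve.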

    Concurrent optimization of machine learning model performance

    Publication number: US12182676B2

    Publication date: 2024-12-31

    Application number: US18539022

    Filing date: 2023-12-13

    Abstract: Certain aspects of the present disclosure provide techniques for concurrently performing inferences using a machine learning model and optimizing parameters used in executing the machine learning model. An example method generally includes receiving a request to perform inferences on a data set using the machine learning model and performance metric targets for performance of the inferences. At least a first inference is performed on the data set using the machine learning model to meet a latency specified for generation of the first inference from receipt of the request. While performing at least the first inference, operational parameters resulting in inference performance approaching the performance metric targets are identified based on the machine learning model and operational properties of the computing device. The identified operational parameters are applied to performance of subsequent inferences using the machine learning model.
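The concurrency the abstract describes can be sketched as a serving object that answers the first inference with safe default parameters while a background thread searches for parameters that approach a performance target. Everything here is a hypothetical illustration: the `ConcurrentOptimizer` class, the batch-size parameter, and the throughput measure are assumptions, not details from the patent.

```python
import threading

class ConcurrentOptimizer:
    """Serve inferences immediately with default parameters while a
    background search identifies parameters approaching a target."""

    def __init__(self, target_throughput):
        self.target = target_throughput
        self.params = {'batch_size': 1}   # safe default for the first inference
        self._lock = threading.Lock()

    def optimize(self, candidate_batch_sizes, measure):
        """Background search: keep the candidate whose measured
        throughput comes closest to the target."""
        best, best_gap = self.params['batch_size'], float('inf')
        for b in candidate_batch_sizes:
            gap = abs(self.target - measure(b))
            if gap < best_gap:
                best, best_gap = b, gap
        with self._lock:
            self.params['batch_size'] = best

    def current_params(self):
        """Parameters for the next inference: whatever the search has
        settled on so far."""
        with self._lock:
            return dict(self.params)
```

In use, the first inference reads `current_params()` before `optimize` finishes and gets the defaults; subsequent inferences pick up the identified parameters once the background thread has updated them.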

    Adaptive quantization for execution of machine learning models

    Publication number: US11861467B2

    Publication date: 2024-01-02

    Application number: US16810123

    Filing date: 2020-03-05

    CPC classification number: G06N20/00 G06F11/3466 G06N5/04

    Abstract: Certain aspects of the present disclosure provide techniques for adaptively executing machine learning models on a computing device. An example method generally includes receiving weight information for a machine learning model to be executed on a computing device. The received weight information is reduced into quantized weight information having a reduced bit size relative to the received weight information. First inferences are performed using the machine learning model and the received weight information, and second inferences are performed using the machine learning model and the quantized weight information. Results of the first and second inferences are compared; when it is determined that results of the second inferences are within a threshold performance level of results of the first inferences, one or more subsequent inferences are performed using the machine learning model and the quantized weight information.
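The flow in the abstract, quantize, run both weight sets, and adopt the quantized weights only if results stay within a threshold, can be sketched as below. This is a minimal illustration under stated assumptions: the uniform symmetric quantizer, the dot-product stand-in for a model, and the function names are hypothetical, not the patent's method.

```python
def quantize(weights, bits=8):
    """Uniform symmetric quantization of float weights to a reduced
    bit size, returned de-quantized so the same model code can use them."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(w) for w in weights) / qmax
    return [round(w / scale) * scale for w in weights]

def inference(weights, x):
    """Stand-in model: a dot product of the input with the weights."""
    return sum(w * xi for w, xi in zip(weights, x))

def select_weights(weights, sample, threshold=1e-2):
    """Perform first inferences with received weights and second
    inferences with quantized weights; adopt the quantized weights for
    subsequent inferences only if the results are within the threshold."""
    qweights = quantize(weights)
    first = inference(weights, sample)
    second = inference(qweights, sample)
    return qweights if abs(first - second) <= threshold else weights
```

The design point is that the comparison happens on real inference results, so the reduced-precision weights are adopted only when the accuracy cost is demonstrably small.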

    Concurrent optimization of machine learning model performance

    Publication number: US11907810B2

    Publication date: 2024-02-20

    Application number: US16515711

    Filing date: 2019-07-18

    CPC classification number: G06N20/00 G06F11/3466 G06N5/04

    Abstract: Certain aspects of the present disclosure provide techniques for concurrently performing inferences using a machine learning model and optimizing parameters used in executing the machine learning model. An example method generally includes receiving a request to perform inferences on a data set using the machine learning model and performance metric targets for performance of the inferences. At least a first inference is performed on the data set using the machine learning model to meet a latency specified for generation of the first inference from receipt of the request. While performing at least the first inference, operational parameters resulting in inference performance approaching the performance metric targets are identified based on the machine learning model and operational properties of the computing device. The identified operational parameters are applied to performance of subsequent inferences using the machine learning model.
