METHOD AND APPARATUS FOR ACCELERATING DEEP LEARNING INFERENCE BASED ON HW-AWARE SPARSITY PATTERN

    Publication No.: US20250045586A1

    Publication date: 2025-02-06

    Application No.: US18717275

    Filing date: 2022-03-04

    Abstract: The application provides a method and apparatus for accelerating deep learning inference based on a hardware-aware (HW-aware) sparsity pattern. The method may include determining a hardware-aware sparsity pattern based on a register width specified by an instruction set architecture (ISA) of a hardware unit that implements a deep neural network (DNN) for deep learning inference, the sparsity pattern specifying a block size and a sparsity ratio for block-wise sparsification of a weight matrix of an operator in the DNN; performing the block-wise sparsification of the weight matrix based on the sparsity pattern during a training process of the DNN, to obtain a sparse weight matrix; compressing the sparse weight matrix into a concentrated weight matrix by removing all-zero blocks from the sparse weight matrix; and generating a mask that indicates an index of each row of non-zero blocks in the sparse weight matrix, to enable extraction of the corresponding elements from an activation matrix of the operator.
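The steps named in this abstract (pick a block size, sparsify block-wise, compress, build a mask) can be illustrated with a small pure-Python sketch. It is not the patented implementation, and several points are assumptions made only for illustration: the rule that one block row fills one vector register (`block_width_from_register`), the choice of which blocks to keep (largest summed magnitude per block column), and a one-shot pass over a fixed matrix rather than sparsification during training. All function names are hypothetical.

```python
def block_width_from_register(register_bits, element_bits):
    # Hypothetical rule: one block row fills exactly one vector register.
    return register_bits // element_bits


def sparsify_blockwise(weights, block_rows, block_cols, sparsity_ratio):
    """Zero the lowest-magnitude blocks in every block column.

    Returns (sparse, mask); mask[j] lists, in ascending order, the
    block-row indices of the blocks kept in block column j.
    """
    n_rows, n_cols = len(weights), len(weights[0])
    n_block_rows, n_block_cols = n_rows // block_rows, n_cols // block_cols
    keep = n_block_rows - int(n_block_rows * sparsity_ratio)
    sparse = [[0.0] * n_cols for _ in range(n_rows)]
    mask = []
    for j in range(n_block_cols):
        def magnitude(i):
            # Assumed ranking criterion: sum of absolute values in block (i, j).
            return sum(abs(weights[r][c])
                       for r in range(i * block_rows, (i + 1) * block_rows)
                       for c in range(j * block_cols, (j + 1) * block_cols))
        ranked = sorted(range(n_block_rows), key=magnitude, reverse=True)
        kept = sorted(ranked[:keep])
        mask.append(kept)
        for i in kept:
            for r in range(i * block_rows, (i + 1) * block_rows):
                for c in range(j * block_cols, (j + 1) * block_cols):
                    sparse[r][c] = weights[r][c]
    return sparse, mask


def compress(sparse, mask, block_rows, block_cols):
    """Remove all-zero blocks by stacking the kept blocks of each block column."""
    keep = len(mask[0])
    dense = [[0.0] * len(sparse[0]) for _ in range(keep * block_rows)]
    for j, kept in enumerate(mask):
        for slot, i in enumerate(kept):
            for dr in range(block_rows):
                for c in range(j * block_cols, (j + 1) * block_cols):
                    dense[slot * block_rows + dr][c] = sparse[i * block_rows + dr][c]
    return dense


def matmul_compressed(activations, dense, mask, block_rows, block_cols):
    """Multiply activations by the compressed weights, using the mask to
    pick the activation columns that match each kept block."""
    out = [[0.0] * len(dense[0]) for _ in activations]
    for j, kept in enumerate(mask):
        for slot, i in enumerate(kept):
            for dr in range(block_rows):
                k = i * block_rows + dr  # activation column selected via the mask
                for m, row in enumerate(activations):
                    for c in range(j * block_cols, (j + 1) * block_cols):
                        out[m][c] += row[k] * dense[slot * block_rows + dr][c]
    return out


def matmul(a, b):
    """Plain dense matrix product, used only as a reference."""
    return [[sum(a[m][k] * b[k][n] for k in range(len(b)))
             for n in range(len(b[0]))] for m in range(len(a))]
```

In this sketch the compressed product matches the dense product against the sparsified weights, while the inner loops touch only the kept blocks. A real kernel would replace the innermost loop with vector instructions sized to the register width.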

    APPARATUS AND METHOD FOR BATCH REBALANCE IN DISTRIBUTED DATA PARALLEL DNN TRAINING

    Publication No.: US20240281667A1

    Publication date: 2024-08-22

    Application No.: US18571151

    Filing date: 2021-10-18

    CPC classification number: G06N3/098

    Abstract: Provided herein are apparatus and methods for batch rebalance in distributed data parallel DNN training. An apparatus includes interface circuitry; and processor circuitry coupled with the interface circuitry, wherein the processor circuitry is to: obtain sorted samples of a mini batch via the interface circuitry, wherein the sorted samples are in ascending or descending order based on a volume of each of the samples; and assign the sorted samples to each of a plurality of local batches one by one, in an order from a first local batch to a last local batch of the plurality of local batches and then from the last local batch back to the first local batch, until all of the sorted samples are assigned. Other embodiments may also be disclosed and claimed.
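The back-and-forth assignment order described in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the claimed apparatus: the function name `rebalance`, the use of a plain number as each sample's "volume", and the choice of descending order are assumptions made for the example.

```python
def rebalance(sample_volumes, num_local_batches):
    """Assign samples to local batches in a back-and-forth order.

    Samples are sorted by volume (descending here, by assumption), then
    dealt first-to-last, then last-to-first, until none remain.
    Returns one list of sample indices per local batch.
    """
    order = sorted(range(len(sample_volumes)),
                   key=lambda i: sample_volumes[i], reverse=True)
    batches = [[] for _ in range(num_local_batches)]
    forward = list(range(num_local_batches))
    for start in range(0, len(order), num_local_batches):
        chunk = order[start:start + num_local_batches]
        # Even sweeps run first -> last, odd sweeps run last -> first.
        sweep = forward if (start // num_local_batches) % 2 == 0 else forward[::-1]
        for batch_index, sample_index in zip(sweep, chunk):
            batches[batch_index].append(sample_index)
    return batches
```

Reversing direction on every sweep pairs the largest remaining sample with the smallest one in the same local batch, which tends to even out the total volume each worker processes.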

    TECHNOLOGIES FOR DEVICE INDEPENDENT AUTOMATED APPLICATION TESTING

    Publication No.: US20180173614A1

    Publication date: 2018-06-21

    Application No.: US15576491

    Filing date: 2015-06-26

    Abstract: Technologies for device-independent application testing include a host computing device and one or more test computing devices. The host computing device records user interface events generated by an application of the test computing device and video data indicative of the display interface of the application. The host computing device detects user interface objects in the video data that correspond to user interface events using a computer vision algorithm, which may include image feature detection or optical character recognition. The host computing device generates an object-based test script that identifies the user interface object and a user interaction. The host computing device may identify the user interface object in the display interface of an application executed by a different test computing device using the computer vision algorithm. The host computing device performs the specified user interaction on the detected user interface object. Other embodiments are described and claimed.
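The idea of an object-based test step, which is resolved to a screen coordinate only at replay time, can be illustrated with a toy Python sketch. It makes strong simplifying assumptions: screens are small grids of integers, and exact pixel matching stands in for the image feature detection or optical character recognition mentioned in the abstract. The names `locate` and `replay` and the step dictionary layout are hypothetical.

```python
def locate(screen, template):
    """Return (row, col) of the first exact match of template in screen, or None."""
    th, tw = len(template), len(template[0])
    for r in range(len(screen) - th + 1):
        for c in range(len(screen[0]) - tw + 1):
            if all(screen[r + dr][c:c + tw] == template[dr] for dr in range(th)):
                return r, c
    return None


def replay(step, screen):
    """Resolve an object-based step to a concrete coordinate on this screen."""
    found = locate(screen, step["object"])
    if found is None:
        raise LookupError("user interface object not found on this screen")
    r, c = found
    th, tw = len(step["object"]), len(step["object"][0])
    # The interaction targets the center of the detected object.
    return step["action"], (r + th // 2, c + tw // 2)
```

Because the step stores the object's appearance rather than a recorded coordinate, the same step can be replayed on a device with a different resolution or layout, as long as the object can still be detected there.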
