-
1.
Publication No.: US20210350210A1
Publication date: 2021-11-11
Application No.: US17041336
Filing date: 2018-07-30
Applicant: INTEL CORPORATION
Inventor: Jiong GONG, Haihao SHEN, Xiao Dong LIN, Xiaoli LIU
Abstract: A method and apparatus for keeping statistical inference accuracy with 8-bit Winograd convolution. A calibration dataset and a pretrained CNN comprising 32-bit floating point weight values may be sampled to generate an input activation tensor and a weight tensor. A transformed input activation tensor may be generated by multiplying the input activation tensor and an input matrix. A transformed weight tensor may be generated by multiplying the weight tensor and a weight matrix. A scale factor may be computed for each transformed tensor. An 8-bit CNN model including the scale factors may be generated.
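The calibration flow in this abstract can be illustrated with a minimal sketch. It assumes the standard Winograd F(2x2, 3x3) transform matrices and a symmetric scale of 127 divided by the largest absolute value; the abstract does not say which tile size or scale formula the application uses, so both are assumptions, and all function names are hypothetical.

```python
# Sketch only: Winograd-domain scale factors for 8-bit quantization.
# Assumptions (not from the abstract): F(2x2, 3x3) tiles, scale = 127 / max|x|.

# Standard F(2x2, 3x3) input-transform matrix B^T and weight-transform matrix G.
BT = [[1, 0, -1, 0],
      [0, 1, 1, 0],
      [0, -1, 1, 0],
      [0, 1, 0, -1]]
G = [[1.0, 0.0, 0.0],
     [0.5, 0.5, 0.5],
     [0.5, -0.5, 0.5],
     [0.0, 0.0, 1.0]]


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(r) for r in zip(*m)]


def transform_input(tile):
    """Transform a 4x4 input activation tile: B^T d B."""
    return matmul(matmul(BT, tile), transpose(BT))


def transform_weight(kernel):
    """Transform a 3x3 weight kernel: G g G^T."""
    return matmul(matmul(G, kernel), transpose(G))


def scale_factor(tensor, qmax=127):
    """Scale that maps the largest magnitude in the tensor onto qmax."""
    peak = max(abs(v) for row in tensor for v in row)
    return qmax / peak if peak else 1.0


def quantize(tensor, scale, qmax=127):
    """Round to integers and clamp to the symmetric 8-bit range."""
    return [[max(-qmax, min(qmax, round(v * scale))) for v in row] for row in tensor]
```

Computing the scale after the transform, rather than before it, matters because the transforms change the value range: a tile of ones has a transformed peak of 4, so a scale derived from the untransformed tile would overflow the 8-bit range.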
-
2.
Publication No.: US20250045586A1
Publication date: 2025-02-06
Application No.: US18717275
Filing date: 2022-03-04
Applicant: Intel Corporation
Inventor: Hengyu MENG, Jiong GONG, Xudong LIU, Haihao SHEN
IPC: G06N3/082
Abstract: The application provides a method and apparatus for accelerating deep learning inference based on a HW-aware sparsity pattern. The method may include determining a hardware-aware sparsity pattern based on a register width specified by an ISA of a hardware unit for implementing the DNN for deep learning inference, the sparsity pattern specifying a block size and a sparsity ratio for block-wise sparsification of a weight matrix of an operator in the DNN; performing the block-wise sparsification for the weight matrix based on the sparsity pattern to obtain a sparse weight matrix, during a training process of the DNN; compressing the sparse weight matrix into a concentrated weight matrix by removing all-zero blocks from the sparse weight matrix; and generating a mask to indicate an index of each row of non-zero blocks in the sparse weight matrix to enable extraction of corresponding elements from an activation matrix of the operator.
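The sparsify, compress, and mask steps named in this abstract can be sketched as follows. This is a simplified illustration, not the claimed method: it prunes whole column blocks by L1 norm, takes the block size as a plain parameter instead of deriving it from an ISA register width, and assumes the column count is a multiple of the block size. All function names are hypothetical.

```python
# Sketch only: block-wise sparsification, compression, and mask-driven gather.
# Assumptions (not from the abstract): column blocks, L1-norm pruning criterion.

def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def sparsify_blocks(w, block, ratio):
    """Zero the lowest-magnitude column blocks of w.

    Returns the sparse matrix and the ids of the blocks that were kept.
    """
    n_blocks = len(w[0]) // block
    norms = [
        sum(abs(row[c]) for row in w for c in range(b * block, (b + 1) * block))
        for b in range(n_blocks)
    ]
    n_drop = int(n_blocks * ratio)
    drop = set(sorted(range(n_blocks), key=lambda b: norms[b])[:n_drop])
    sparse = [[0 if (c // block) in drop else v for c, v in enumerate(row)] for row in w]
    kept = [b for b in range(n_blocks) if b not in drop]
    return sparse, kept


def compress(sparse, kept, block):
    """Remove the all-zero blocks; the returned mask lists the surviving column indices."""
    mask = [b * block + i for b in kept for i in range(block)]
    dense = [[row[c] for c in mask] for row in sparse]
    return dense, mask


def gather_rows(x, mask):
    """Extract the activation rows that line up with the surviving weight columns."""
    return [x[i] for i in mask]
```

With this layout, `matmul(dense, gather_rows(x, mask))` gives the same result as `matmul(sparse, x)` while skipping the pruned blocks entirely, which is the point of keeping a mask alongside the concentrated matrix.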
-
3.
Publication No.: US20240281667A1
Publication date: 2024-08-22
Application No.: US18571151
Filing date: 2021-10-18
Applicant: Intel Corporation
Inventor: Guokai MA, Jiong GONG, Hongzhen LIU
IPC: G06N3/098
CPC classification number: G06N3/098
Abstract: Provided herein are apparatus and methods for batch rebalance in distributed data parallel DNN training. An apparatus includes interface circuitry; and processor circuitry coupled with the interface circuitry, wherein the processor circuitry is to: obtain sorted samples of a mini-batch via the interface circuitry, wherein the sorted samples are in ascending or descending order based on a volume of each of the samples; and assign the sorted samples to each of a plurality of local batches one by one in an order from a first local batch to a last local batch of the plurality of local batches and then from the last local batch to the first local batch until all of the sorted samples are assigned. Other embodiments may also be disclosed and claimed.
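The back-and-forth assignment order described in this abstract can be sketched in a few lines. The function name and the use of a numeric "volume" per sample (for example, a sequence length) are illustrative assumptions; the abstract does not define how volume is measured.

```python
# Sketch only: assign sorted samples to local batches first-to-last,
# then last-to-first, repeating until every sample is placed.

def rebalance(sample_volumes, num_local_batches, descending=True):
    """Return one list of sample indices per local batch."""
    order = sorted(
        range(len(sample_volumes)),
        key=lambda i: sample_volumes[i],
        reverse=descending,
    )
    batches = [[] for _ in range(num_local_batches)]
    for pos, idx in enumerate(order):
        sweep, offset = divmod(pos, num_local_batches)
        # Even sweeps run first-to-last, odd sweeps run last-to-first.
        target = offset if sweep % 2 == 0 else num_local_batches - 1 - offset
        batches[target].append(idx)
    return batches
```

For volumes `[8, 7, 6, 5, 4, 3, 2, 1]` and two local batches, each batch ends up with a total volume of 18, whereas a plain round-robin over the same sorted order would give 20 and 16.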
-
4.
Publication No.: US20180173614A1
Publication date: 2018-06-21
Application No.: US15576491
Filing date: 2015-06-26
Applicant: INTEL CORPORATION
Inventor: Jiong GONG, Yun WANG, Haihao SHEN
IPC: G06F11/36
CPC classification number: G06F11/3664, G06F3/0304, G06F3/0481, G06F3/04817, G06F11/3688
Abstract: Technologies for device-independent application testing include a host computing device and one or more test computing devices. The host computing device records user interface events generated by an application of the test computing device and video data indicative of the display interface of the application. The host computing device detects user interface objects in the video data that correspond to user interface events using a computer vision algorithm, which may include image feature detection or optical character recognition. The host computing device generates an object-based test script that identifies the user interface object and a user interaction. The host computing device may identify the user interface object in the display interface of an application executed by a different test computing device using the computer vision algorithm. The host computing device performs the specified user interaction on the detected user interface object. Other embodiments are described and claimed.
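The idea of an object-based test script, one that stores what a UI object looks like instead of where it sits on screen, can be sketched as below. Exact pixel-patch matching is swapped in here for the image feature detection or optical character recognition mentioned in the abstract, purely to keep the example short; the `ScriptStep` type and both function names are hypothetical.

```python
# Sketch only: replay a recorded interaction by locating the UI object
# in a new frame. Exact template matching stands in for the computer
# vision algorithm; frames are nested lists of pixel values.
from dataclasses import dataclass


@dataclass
class ScriptStep:
    template: list  # pixel patch of the UI object captured while recording
    action: str     # user interaction to perform, e.g. "tap"


def locate(frame, template):
    """Return the (x, y) centre of the first exact match, or None."""
    th, tw = len(template), len(template[0])
    for y in range(len(frame) - th + 1):
        for x in range(len(frame[0]) - tw + 1):
            if all(frame[y + dy][x:x + tw] == template[dy] for dy in range(th)):
                return (x + tw // 2, y + th // 2)
    return None


def replay(step, frame):
    """Resolve the step against a frame and return (action, coordinates)."""
    centre = locate(frame, step.template)
    if centre is None:
        raise LookupError("UI object not found in frame")
    return (step.action, centre)
```

Because the step carries the object's appearance and not its coordinates, the same step resolves to different positions on devices whose layouts differ.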