-
Publication No.: US11029745B2
Publication Date: 2021-06-08
Application No.: US16184934
Application Date: 2018-11-08
Applicant: QUALCOMM Incorporated
Inventor: Kyle Ernewein, Jason Edward Podaima, Francisco Perez, John Daniels, Alex Miler, Jeffrey Gemar, Rexford Alan Hill, Haoping Xu
IPC: G06F1/32, G06F1/324, G06F1/3228
Abstract: Systems and methods are disclosed for controlling instantaneous current changes in parallel processors that have arrays of parallel computing elements, such as neural processors. An exemplary method comprises monitoring the array of computing elements and determining a transition from a first activity level of the array to a second activity level of the array, such as an idle-to-active or active-to-idle transition. Once a transition is determined, the array is selectively controlled to minimize the instantaneous current change caused by the transition from the first activity level to the second activity level.
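The abstract does not say how the array is "selectively controlled". One plausible reading, sketched here under stated assumptions, is to stage the transition: enable or disable computing elements in bounded increments so that no single step changes the array current by more than a chosen limit. The model assumes every active element draws the same current; `classify_transition`, `staged_activation_schedule`, and `max_current_step` are hypothetical names, not taken from the patent.

```python
from typing import List


def classify_transition(first_active: int, second_active: int) -> str:
    """Label the change between two activity levels (counts of active elements)."""
    if first_active == 0 and second_active > 0:
        return "idle-to-active"
    if first_active > 0 and second_active == 0:
        return "active-to-idle"
    if first_active == second_active:
        return "none"
    return "active-to-active"


def staged_activation_schedule(first_active: int, second_active: int,
                               max_step: int) -> List[int]:
    """Return the active-element count at each step, moving from first_active
    to second_active by at most max_step elements per step."""
    if max_step <= 0:
        raise ValueError("max_step must be positive")
    schedule = [first_active]
    while schedule[-1] != second_active:
        prev = schedule[-1]
        if second_active > prev:
            schedule.append(min(prev + max_step, second_active))
        else:
            schedule.append(max(prev - max_step, second_active))
    return schedule


def max_current_step(schedule: List[int], amps_per_element: float) -> float:
    """Largest per-step current change, assuming identical per-element current."""
    deltas = [abs(b - a) for a, b in zip(schedule, schedule[1:])]
    return max(deltas, default=0) * amps_per_element


if __name__ == "__main__":
    levels = staged_activation_schedule(0, 64, 16)
    print(classify_transition(0, 64), levels, max_current_step(levels, 0.5))
```

The trade-off in this sketch is latency: a smaller `max_step` lowers the per-step current change but lengthens the transition.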
-
Publication No.: US20170083997A1
Publication Date: 2017-03-23
Application No.: US14857303
Application Date: 2015-09-17
Applicant: QUALCOMM Incorporated
Inventor: Andrew Evan Gruber, Rexford Alan Hill, Shambhoo Khandelwal
IPC: G06T1/60
CPC classification number: G06T1/60, G06F12/0207, G06F2212/401, G06T1/20, G06T11/40, G06T2210/08
Abstract: A computing device may allocate a plurality of blocks in the memory, wherein each of the plurality of blocks is of a uniform fixed size in the memory. The computing device may further store a plurality of bandwidth-compressed graphics data into the respective plurality of blocks in the memory, wherein one or more of the plurality of bandwidth-compressed graphics data each has a size that is smaller than the fixed size. The computing device may further store data associated with the plurality of bandwidth-compressed graphics data into unused space of one or more of the plurality of blocks that contains the respective one or more of the plurality of bandwidth-compressed graphics data.
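The block layout can be sketched in a few lines. Here zlib stands in for whatever bandwidth-compression scheme the graphics hardware actually uses, and the "associated data" is modeled as a fixed-size 8-byte record written into the tail of the same block; `BLOCK_SIZE`, `SIDE_SIZE`, `store_blocks`, and `read_block` are hypothetical names chosen for illustration.

```python
import zlib
from typing import List, Tuple

BLOCK_SIZE = 256  # hypothetical fixed block size, in bytes
SIDE_SIZE = 8     # hypothetical fixed size of the associated side data


def store_blocks(tiles: List[bytes],
                 side_data: List[bytes]) -> Tuple[bytearray, List[bool]]:
    """Compress each tile into its own fixed-size block and, when the compressed
    payload leaves enough room, place the tile's side data in the block's tail."""
    memory = bytearray(BLOCK_SIZE * len(tiles))  # one uniform block per tile
    placed = []
    for i, (tile, extra) in enumerate(zip(tiles, side_data)):
        if len(extra) != SIDE_SIZE:
            raise ValueError("side data must be exactly SIDE_SIZE bytes")
        payload = zlib.compress(tile)  # stand-in for bandwidth compression
        if len(payload) > BLOCK_SIZE:
            raise ValueError(f"tile {i} does not fit in one block")
        base = i * BLOCK_SIZE
        memory[base:base + len(payload)] = payload
        # Reuse otherwise-unused space at the end of the block.
        if len(payload) + SIDE_SIZE <= BLOCK_SIZE:
            memory[base + BLOCK_SIZE - SIDE_SIZE:base + BLOCK_SIZE] = extra
            placed.append(True)
        else:
            placed.append(False)
    return memory, placed


def read_block(memory: bytearray, index: int) -> Tuple[bytes, bytes]:
    """Return (decompressed tile, side data) for the block at `index`.
    Only meaningful for blocks whose side data was placed."""
    block = bytes(memory[index * BLOCK_SIZE:(index + 1) * BLOCK_SIZE])
    # decompressobj stops at the end of the zlib stream and leaves the
    # padding and side data in .unused_data instead of raising.
    tile = zlib.decompressobj().decompress(block)
    return tile, block[-SIDE_SIZE:]
```

Because every block keeps the same size, block `i` still starts at `i * BLOCK_SIZE`, so its side data can be found without a separate lookup table.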
-
Publication No.: US12131130B2
Publication Date: 2024-10-29
Application No.: US18105159
Application Date: 2023-02-02
Applicant: QUALCOMM Incorporated
Inventor: Rexford Alan Hill, Aaron Douglass Lamb, Michael Goldfarb, Amin Ansari, Christopher Lott
CPC classification number: G06F7/5443, G06F5/06, G06N3/063
Abstract: A method of exploiting activation sparsity in deep neural networks is described. The method includes retrieving an activation tensor and a weight tensor where the activation tensor is a sparse activation tensor. The method also includes generating a compressed activation tensor comprising non-zero activations of the activation tensor, where the compressed activation tensor has fewer columns than the activation tensor. The method further includes processing the compressed activation tensor and the weight tensor to generate an output tensor.
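A minimal sketch of one way to build such a compressed tensor, assuming a 2-D activation matrix multiplied by a weight matrix: each row's non-zero activations are packed to the left together with their original column indices, the packed width is the largest per-row non-zero count, and the indices select which weight rows participate. The patent may use a different packing; `compress_activations`, `sparse_matmul`, and `dense_matmul` are hypothetical names.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def compress_activations(act: Matrix) -> Tuple[Matrix, List[List[int]]]:
    """Pack each row's non-zero activations to the left.
    Returns (values, indices); the packed width is the maximum number of
    non-zeros in any row, so it is smaller than the original width when
    every row has at least one zero."""
    nonzeros = [[(j, v) for j, v in enumerate(row) if v != 0] for row in act]
    width = max((len(r) for r in nonzeros), default=0)
    values, indices = [], []
    for r in nonzeros:
        pad = width - len(r)
        values.append([v for _, v in r] + [0] * pad)
        indices.append([j for j, _ in r] + [0] * pad)  # padding has value 0
    return values, indices


def sparse_matmul(values: Matrix, indices: List[List[int]], weight: Matrix) -> Matrix:
    """Multiply the compressed activations by the weight matrix, touching
    only the weight rows selected by the stored column indices."""
    n_out = len(weight[0]) if weight else 0
    out = []
    for vals, idxs in zip(values, indices):
        acc = [0] * n_out
        for v, k in zip(vals, idxs):
            if v == 0:  # skip padding slots
                continue
            for c in range(n_out):
                acc[c] += v * weight[k][c]
        out.append(acc)
    return out


def dense_matmul(act: Matrix, weight: Matrix) -> Matrix:
    """Reference dense product for comparison."""
    return [[sum(a * weight[k][c] for k, a in enumerate(row))
             for c in range(len(weight[0]))] for row in act]
```

For example, a 2 x 4 activation matrix with at most two non-zeros per row compresses to a 2 x 2 value matrix, and `sparse_matmul` on it gives the same result as `dense_matmul` on the original.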
-
Publication No.: US11010313B2
Publication Date: 2021-05-18
Application No.: US16556094
Application Date: 2019-08-29
Applicant: QUALCOMM Incorporated
Inventor: Colin Beaton Verrilli, Natarajan Vaidhyanathan, Rexford Alan Hill
Abstract: A method, apparatus, and system for a machine learning acceleration architecture are presented. An apparatus includes a plurality of processing elements, each including a tightly-coupled memory (TCM), and a memory system coupled to the processing elements. A global synchronization manager is coupled to the plurality of processing elements and to the memory system. The processing elements do not implement a coherency protocol with respect to the memory system. The processing elements implement direct memory access with respect to the memory system, and the global synchronization manager is configured to synchronize operations of the plurality of processing elements through the TCMs.
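The division of labor can be illustrated with threads standing in for processing elements. In this software analogy, each element moves data between the shared memory system and its own private list (the "TCM") only through explicit DMA-style copies, never reading shared data in place, and a barrier plays the role of the global synchronization manager between phases. It is a sketch under stated assumptions, not the hardware design; `GlobalSyncManager`, `ProcessingElement`, and `run_two_phase` are hypothetical.

```python
import threading
from typing import List


class GlobalSyncManager:
    """Hypothetical stand-in: every PE must arrive before any PE continues."""

    def __init__(self, num_pes: int) -> None:
        self._barrier = threading.Barrier(num_pes)

    def sync(self) -> None:
        self._barrier.wait()


class ProcessingElement:
    def __init__(self, pe_id: int, tcm_size: int) -> None:
        self.pe_id = pe_id
        self.tcm = [0] * tcm_size  # private "tightly-coupled memory"

    def dma_in(self, memory: List[int], offset: int) -> None:
        """Explicit copy from shared memory into the TCM (no coherency)."""
        self.tcm[:] = memory[offset:offset + len(self.tcm)]

    def dma_out(self, memory: List[int], offset: int) -> None:
        """Explicit copy from the TCM back to shared memory."""
        memory[offset:offset + len(self.tcm)] = self.tcm


def run_two_phase(memory: List[int], num_pes: int) -> List[int]:
    """Phase 1: each PE doubles its own chunk. Phase 2: each PE sums the chunk
    its neighbor wrote, which is only safe after the global sync."""
    if len(memory) % num_pes:
        raise ValueError("memory length must be divisible by num_pes")
    chunk = len(memory) // num_pes
    sync = GlobalSyncManager(num_pes)
    results = [0] * num_pes

    def worker(pe: ProcessingElement) -> None:
        base = pe.pe_id * chunk
        pe.dma_in(memory, base)
        pe.tcm[:] = [2 * x for x in pe.tcm]
        pe.dma_out(memory, base)
        sync.sync()  # all phase-1 writes are complete past this point
        neighbor = ((pe.pe_id + 1) % num_pes) * chunk
        pe.dma_in(memory, neighbor)
        results[pe.pe_id] = sum(pe.tcm)

    threads = [threading.Thread(target=worker, args=(ProcessingElement(i, chunk),))
               for i in range(num_pes)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Without the `sync.sync()` call, a PE could read its neighbor's chunk before it was doubled; the barrier supplies the ordering that a coherency protocol would otherwise be relied on for.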
-
Publication No.: US11669747B2
Publication Date: 2023-06-06
Application No.: US16667821
Application Date: 2019-10-29
Applicant: QUALCOMM Incorporated
Inventor: Rexford Alan Hill, Eric Wayne Mahurin, Aaron Douglass Lamb, Albert Danysh, Erich Plondke, David Hoyle
CPC classification number: G06N5/01, G06F17/17, G06N3/048, G06N3/08, H03M7/24, H03K19/20, H03K19/21
Abstract: A method of constraining data represented in a deep neural network is described. The method includes determining an initial shifting specified to convert a fixed-point input value to a floating-point output value. The method also includes determining an additional shifting specified to constrain a dynamic range during conversion of the fixed-point input value to the floating-point output value. The method further includes performing the initial shifting and the additional shifting together to form a dynamic-range-constrained, normalized floating-point output value.
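The combined-shift idea can be sketched for a simple (sign, mantissa, exponent) format. The assumptions are: the input is a signed integer with `frac_bits` fractional bits, the mantissa is normalized to `mant_bits` bits, and "constraining the dynamic range" means clamping the exponent to [`min_exp`, `max_exp`], with underflow handled by an extra right shift and overflow by saturation. The function names and parameters are hypothetical, and the patent's exact rules may differ.

```python
from typing import Tuple


def fixed_to_constrained_float(x: int, frac_bits: int, mant_bits: int = 8,
                               min_exp: int = -16,
                               max_exp: int = 15) -> Tuple[int, int, int]:
    """Convert fixed-point x (value = x * 2**-frac_bits) into (sign, mantissa,
    exponent) with value ~= (-1)**sign * mantissa * 2**exponent and the
    exponent constrained to [min_exp, max_exp]. Right shifts truncate."""
    if x == 0:
        return 0, 0, min_exp
    sign = 1 if x < 0 else 0
    mag = abs(x)
    # Initial shift: align the magnitude so the mantissa has mant_bits bits.
    initial = mant_bits - mag.bit_length()
    exponent = -frac_bits - initial
    # Additional shift: if the exponent underflows the allowed range, shift the
    # mantissa further right (denormalize) so the exponent sits at min_exp.
    additional = 0
    if exponent < min_exp:
        additional = exponent - min_exp  # negative: extra right shift
        exponent = min_exp
    total = initial + additional  # both shifts applied as one operation
    mantissa = mag << total if total >= 0 else mag >> -total
    if exponent > max_exp:  # overflow: saturate to the largest magnitude
        mantissa, exponent = (1 << mant_bits) - 1, max_exp
    return sign, mantissa, exponent


def to_float(sign: int, mantissa: int, exponent: int) -> float:
    """Decode the (sign, mantissa, exponent) triple back to a Python float."""
    return (-1) ** sign * mantissa * 2.0 ** exponent
```

For instance, 1.5 stored as `x=3, frac_bits=1` becomes `(0, 192, -7)`, and `192 * 2**-7 == 1.5`. Folding both amounts into `total` means the mantissa is shifted once rather than normalized and then re-shifted.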
-
Publication No.: US11487998B2
Publication Date: 2022-11-01
Application No.: US16443695
Application Date: 2019-06-17
Applicant: QUALCOMM Incorporated
Inventor: Rexford Alan Hill, Sruthikesh Surineni, Adrienne Milner, Vito Bica
Abstract: In one embodiment, a depth-first deep convolutional network (DCN) has a first convolutional layer, which has a first first-layer kernel and is adapted to convolve a first input, and a second convolutional layer, which has a first second-layer kernel and is adapted to convolve a second-layer input. A method for the DCN includes initiating convolution, in the first convolutional layer, of the first input tensor with the first first-layer kernel to generate a value strip for the second input tensor and, prior to completion of the convolution in the first convolutional layer, initiating convolution, in the second convolutional layer, of the second input with the first second-layer kernel to generate a value strip for a third layer.
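The depth-first schedule can be imitated with chained Python generators: each layer emits an output row (one "value strip") as soon as it has buffered enough input rows for its kernel, so the second layer starts consuming strips while the first layer is still running. Treating rows as strips, using "valid" cross-correlation over a single channel, and the `conv_rows` name are all assumptions for illustration; the event log records which layer produced each strip.

```python
from collections import deque
from typing import Iterable, Iterator, List

Row = List[float]
Kernel = List[List[float]]


def conv_rows(rows: Iterable[Row], kernel: Kernel,
              name: str, log: List[str]) -> Iterator[Row]:
    """Streaming 'valid' 2-D cross-correlation over a single channel.
    Consumes input rows one at a time and yields each output row as soon as
    the last kh input rows are buffered."""
    kh, kw = len(kernel), len(kernel[0])
    window = deque(maxlen=kh)  # sliding buffer of the most recent kh rows
    for row in rows:
        window.append(row)
        if len(window) == kh:
            strip = [sum(kernel[i][j] * window[i][c + j]
                         for i in range(kh) for j in range(kw))
                     for c in range(len(row) - kw + 1)]
            log.append(name)  # record which layer produced this strip
            yield strip


if __name__ == "__main__":
    image = [[1] * 5 for _ in range(6)]
    k1 = [[1] * 3 for _ in range(3)]
    k2 = [[1] * 2 for _ in range(2)]
    log: List[str] = []
    layer2 = conv_rows(conv_rows(image, k1, "L1", log), k2, "L2", log)
    print(list(layer2))  # [[36, 36], [36, 36], [36, 36]]
    print(log)           # ['L1', 'L1', 'L2', 'L1', 'L2', 'L1', 'L2']
```

The log shows the first layer-2 strip appearing after only two layer-1 strips, before layer 1 has finished, and each layer buffers only `kh` rows instead of a full intermediate tensor.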