-
Publication number: US20190303156A1
Publication date: 2019-10-03
Application number: US15942344
Filing date: 2018-03-30
Applicant: QUALCOMM Incorporated
Inventor: Amrit PANDA, Francisco PEREZ, Karamvir CHATHA
Abstract: An apparatus for hardware acceleration for use in operating a computational network is configured for determining that a loop structure including one or more loops is to be executed by a first processor. Each of the one or more loops includes a set of operations. The loop structure may be configured as a nested loop, a cascaded loop, or a combination of the two. A second processor may be configured to decouple overhead operations of the loop structure from compute operations of the loop structure. The apparatus accelerates processing of the loop structure by processing the overhead operations on the second processor at the same time as, and separately from, the compute operations, based on the configuration to operate the computational network.
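The decoupling described in this abstract can be illustrated with a minimal software sketch. This is only an analogy under stated assumptions, not the patented hardware: a thread stands in for the second processor and handles loop counters and address arithmetic, while the main thread performs only the compute operation. The names `overhead_worker`, `compute_worker`, and `run_decoupled`, the queue depth of 8, and the sum-of-squares workload are all hypothetical.

```python
from queue import Queue
from threading import Thread


def overhead_worker(rows, cols, out_q):
    """Stand-in for the second processor: loop counters and address math only."""
    for i in range(rows):            # outer loop of a nested loop structure
        for j in range(cols):        # inner loop
            out_q.put(i * cols + j)  # flat address of element (i, j)
    out_q.put(None)                  # sentinel: the loop structure is finished


def compute_worker(data, in_q):
    """Stand-in for the first processor: consumes addresses, does the arithmetic."""
    total = 0
    while True:
        addr = in_q.get()
        if addr is None:
            return total
        total += data[addr] * data[addr]  # hypothetical compute op: sum of squares


def run_decoupled(data, rows, cols):
    """Run overhead and compute concurrently, linked by a small bounded queue."""
    q = Queue(maxsize=8)  # assumed depth; lets the overhead side run ahead
    producer = Thread(target=overhead_worker, args=(rows, cols, q))
    producer.start()
    result = compute_worker(data, q)
    producer.join()
    return result
```

For a 2 x 3 array stored flat as `[1, 2, 3, 4, 5, 6]`, `run_decoupled(data, 2, 3)` returns the sum of squares of all six elements.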
-
Publication number: US20180164866A1
Publication date: 2018-06-14
Application number: US15377858
Filing date: 2016-12-13
Applicant: QUALCOMM Incorporated
Inventor: Yatish Girish TURAKHIA, Javid JAFFARI, Amrit PANDA, Karamvir CHATHA
CPC classification number: G06F1/3206, G06N3/02, G06N3/0454, G06N3/063
Abstract: A method, a computer-readable medium, and an apparatus for reducing power consumption of a neural network are provided. The apparatus may retrieve, from a tag storage, at least one tag value of a first tag value for a weight in the neural network or a second tag value for an activation in the neural network. The first tag value may indicate whether the weight is zero and the second tag value may indicate whether the activation is zero. The weight and the activation are to be loaded to a multiplier of a multiplier-accumulator unit as a pair of operands. The apparatus may determine whether the at least one tag value indicates a zero value. The apparatus may disable loading the weight and the activation to the multiplier when the at least one tag value indicates a zero value. The apparatus may disable updating of zero-value activations.
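A minimal software sketch of the tag check described above, assuming a one-bit tag per value where 1 means "zero". The functions `build_tags` and `gated_mac` are hypothetical names, and the `loads` counter merely stands in for the operand loads that the hardware would suppress to save power.

```python
def build_tags(values):
    """One tag per value: 1 if the value is zero, else 0 (assumed encoding)."""
    return [1 if v == 0 else 0 for v in values]


def gated_mac(weights, activations):
    """Multiply-accumulate that skips any pair whose tag marks a zero operand.

    Returns (accumulated sum, number of operand pairs actually loaded).
    """
    w_tags = build_tags(weights)      # plays the role of the tag storage
    a_tags = build_tags(activations)
    acc = 0
    loads = 0
    for w, a, w_tag, a_tag in zip(weights, activations, w_tags, a_tags):
        if w_tag or a_tag:
            continue                  # a zero operand: do not load the multiplier
        loads += 1
        acc += w * a
    return acc, loads
```

With weights `[2, 0, 3, 4]` and activations `[5, 6, 0, 1]`, only the first and last pairs reach the multiplier, so the call returns `(14, 2)`. Skipping a pair never changes the sum, because the skipped product would have been zero.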
-
Publication number: US20210279635A1
Publication date: 2021-09-09
Application number: US16810123
Filing date: 2020-03-05
Applicant: QUALCOMM Incorporated
Inventor: Serag GADELRAB, Karamvir CHATHA, Ofer ROSENBERG
Abstract: Certain aspects of the present disclosure provide techniques for adaptively executing machine learning models on a computing device. An example method generally includes receiving weight information for a machine learning model to be executed on a computing device. The received weight information is reduced into quantized weight information having a reduced bit size relative to the received weight information. First inferences are performed using the machine learning model and the received weight information, and second inferences are performed using the machine learning model and the quantized weight information. Results of the first and second inferences are compared, and it is determined that results of the second inferences are within a threshold performance level of results of the first inferences. Based on the determination, one or more subsequent inferences are performed using the machine learning model and the quantized weight information.
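The compare-then-switch flow above can be sketched as follows. Several things here are assumptions made for illustration: the "model" is a single dot product, quantization is symmetric with one scale factor, and the "threshold performance level" is taken to be a maximum absolute difference between outputs. The names `quantize`, `infer`, and `choose_weights` are hypothetical.

```python
def quantize(weights, bits=8):
    """Symmetric quantization to signed integers; returns (ints, scale)."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs else 1.0
    return [round(w / scale) for w in weights], scale


def infer(weights, x, scale=1.0):
    """Toy model: a scaled dot product of the weights and the input."""
    return scale * sum(w * xi for w, xi in zip(weights, x))


def choose_weights(weights, samples, threshold, bits=8):
    """Return quantized weights if they track the originals on every sample.

    Falls back to the received weights (with scale 1.0) otherwise.
    """
    q_weights, scale = quantize(weights, bits)
    for x in samples:
        first = infer(weights, x)             # inference with received weights
        second = infer(q_weights, x, scale)   # inference with quantized weights
        if abs(first - second) > threshold:
            return weights, 1.0
    return q_weights, scale
```

Subsequent inferences would then call `infer` with whichever `(weights, scale)` pair `choose_weights` returned. A real deployment would more likely compare task-level accuracy over a validation set than raw output differences.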
-
Publication number: US20200089497A1
Publication date: 2020-03-19
Application number: US16134945
Filing date: 2018-09-18
Applicant: QUALCOMM Incorporated
Inventor: Rakesh KOMURAVELLI, Amin ANSARI, Ramesh Chandra CHAUHAN, Karamvir CHATHA
Abstract: Systems and methods for minimizing control variance overhead in a dataflow processor include receiving a generating instruction specifying at least an acknowledge predicate based on a first number, a second number, and a first value, wherein a true branch comprises the first number of consumer instructions of the generating instruction based on the first value, used as a first predicate, being true; and a false branch comprises the second number of consumer instructions of the generating instruction based on the first value, used as the first predicate, being false. The acknowledge predicate is evaluated to be a selected number, which is the first number if the first value is true, or the second number if the first value is false. The generating instruction is fired upon the selected number of acknowledge arcs being received from the true branch or the false branch.
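The acknowledge-predicate selection can be modeled in a few lines. This is a rough sketch under assumptions, not the dataflow processor's actual firing logic: the class `GeneratingInstruction` and its methods `produce` and `acknowledge` are hypothetical names, and acknowledge arcs are modeled as plain method calls.

```python
def select_ack_count(first_value, n_true, n_false):
    """Evaluate the acknowledge predicate: pick the count for the taken branch."""
    return n_true if first_value else n_false


class GeneratingInstruction:
    """Tracks how many acknowledge arcs must arrive before the next firing."""

    def __init__(self, n_true, n_false):
        self.n_true = n_true      # consumer instructions on the true branch
        self.n_false = n_false    # consumer instructions on the false branch
        self.expected = None
        self.received = 0

    def produce(self, first_value):
        """Produce a value and fix the number of acknowledges to wait for."""
        self.expected = select_ack_count(first_value, self.n_true, self.n_false)
        self.received = 0

    def acknowledge(self):
        """Record one acknowledge arc; return True once the instruction may fire."""
        self.received += 1
        return self.received == self.expected
```

With three consumers on the true branch and one on the false branch, a true predicate makes the instruction wait for three acknowledges, while a false predicate lets it fire after one. The instruction never waits on consumers in the branch that was not taken.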
-
Publication number: US20180189056A1
Publication date: 2018-07-05
Application number: US15393670
Filing date: 2016-12-29
Applicant: QUALCOMM Incorporated
Inventor: Yatish Girish TURAKHIA, Javid JAFFARI, Amrit PANDA, Karamvir CHATHA
IPC: G06F9/30
CPC classification number: G06F9/3001, G06N3/0454, G06N3/063
Abstract: A method, a computer-readable medium, and an apparatus for a sparse neural network are provided. The apparatus may include a hardware accelerator. The apparatus may determine, for each pair of operands to be processed by a MAR unit, whether both operands of the pair are non-zero. The apparatus may prevent a pair of operands to be processed by the MAR unit from being loaded to a multiplier of the MAR unit when an operand of the pair of operands is zero. The apparatus may place the pair of operands into one of a plurality of queues when both operands of the pair of operands are non-zero.
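The filtering and queueing step above can be sketched in software. The abstract does not say how a queue is chosen, so the round-robin assignment below is an assumption, as are the names `route_pairs` and `drain` and the default of two queues.

```python
from collections import deque


def route_pairs(weights, activations, num_queues=2):
    """Drop pairs containing a zero operand; queue the rest round-robin."""
    queues = [deque() for _ in range(num_queues)]
    kept = 0
    for w, a in zip(weights, activations):
        if w == 0 or a == 0:
            continue                        # never loaded to the multiplier
        queues[kept % num_queues].append((w, a))
        kept += 1
    return queues


def drain(queues):
    """Multiply and accumulate every queued pair."""
    return sum(w * a for q in queues for (w, a) in q)
```

For weights `[1, 0, 2, 3]` and activations `[4, 5, 0, 6]`, the pairs `(1, 4)` and `(3, 6)` land in separate queues and `drain` returns 22, the same value a dense dot product would give.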
-
Publication number: US20180060278A1
Publication date: 2018-03-01
Application number: US15255015
Filing date: 2016-09-01
Applicant: QUALCOMM Incorporated
Inventor: Dexu LIN, Edward LIAO, Somdeb MAJUMDAR, Aaron LAMB, Karamvir CHATHA
IPC: G06F17/17
CPC classification number: G06F17/17, G06F7/544, G06F2207/5354
Abstract: Computing a non-linear function ƒ(x) in hardware or embedded systems can be complex and resource intensive. In one or more aspects of the disclosure, a method, a computer-readable medium, and an apparatus are provided for computing a non-linear function ƒ(x) accurately and efficiently in hardware using look-up tables (LUTs) and interpolation or extrapolation. The apparatus may be a processor. The processor computes a non-linear function ƒ(x) for an input variable x, where ƒ(x)=g(y(x),z(x)). The processor determines an integer n by determining a position of a most significant bit (MSB) of an input variable x. In addition, the processor determines a value for y(x) based on a first look-up table and the determined integer n. Also, the processor determines a value for z(x) based on n and the input variable x, and based on a second look-up table. Further, the processor computes ƒ(x) based on the determined values for y(x) and z(x).
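As one worked instance of the scheme above, the sketch below takes ƒ(x) = 1/√x for positive integers, with g(y, z) = y · z. The choice of function, the table sizes, and the names `LUT1`, `LUT2`, and `rsqrt_lut` are assumptions for illustration; the abstract does not name a specific function. Writing x = 2^n · m with m in [1, 2), the first table gives y = 2^(-n/2) from the MSB position n, and the second table gives z ≈ 1/√m by linear interpolation.

```python
import math

FRAC_BITS = 4                 # assumed: 16 interpolation segments over [1, 2)
SEGMENTS = 1 << FRAC_BITS
MAX_BITS = 32                 # assumed input width

# First look-up table, indexed by MSB position n: y = 2^(-n/2).
LUT1 = [2.0 ** (-n / 2) for n in range(MAX_BITS)]
# Second look-up table: 1/sqrt(m) sampled at the segment boundaries of [1, 2].
LUT2 = [1.0 / math.sqrt(1.0 + k / SEGMENTS) for k in range(SEGMENTS + 1)]


def rsqrt_lut(x):
    """Approximate 1/sqrt(x) for an integer 0 < x < 2**MAX_BITS."""
    if not 0 < x < (1 << MAX_BITS):
        raise ValueError("x out of range")
    n = x.bit_length() - 1            # position of the most significant bit
    y = LUT1[n]                       # y(x) from the first table and n
    m = x / (1 << n)                  # mantissa in [1, 2), derived from n and x
    pos = (m - 1.0) * SEGMENTS
    k = int(pos)                      # segment index into the second table
    t = pos - k                       # fractional position within the segment
    z = LUT2[k] + t * (LUT2[k + 1] - LUT2[k])   # z(x) by linear interpolation
    return y * z                      # g(y, z) = y * z
```

With 16 segments, the relative error of this sketch stays below about 0.1% across the input range. Doubling `SEGMENTS` cuts the interpolation error by roughly a factor of four, at the cost of a larger second table.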