-
1.
Publication No.: US20180300617A1
Publication Date: 2018-10-18
Application No.: US15953388
Filing Date: 2018-04-13
Inventors: Chad Balling McBRIDE, Amol Ashok AMBARDEKAR, Kent D. CEDOLA, Boris BOBROV, George PETRE, Larry Marvin WALL
Abstract: An exemplary artificial intelligence/machine learning hardware computing environment having an exemplary DNN module cooperating with one or more memory components can perform data sharing and distribution as well as reuse of buffer data to reduce the number of memory component reads/writes, thereby enhancing overall hardware performance and reducing power consumption. Illustratively, data from a cooperating memory component is read according to a selected operation of the exemplary hardware and written to a corresponding other memory component for use by one or more processing elements (e.g., neurons). The data is read in such a manner as to optimize the engagement of the one or more processing elements for each processing cycle as well as to reuse data previously stored in the one or more cooperating memory components. Operatively, the written data is copied to a shadow memory buffer prior to being consumed by the processing elements.
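
A minimal Python sketch of the shadow-buffer scheme this abstract describes; the class and method names (`ShadowedBuffer`, `commit`) are illustrative assumptions, not taken from the patent:

```python
class ShadowedBuffer:
    """Primary buffer snapshotted to a shadow copy before use, so the
    primary can be refilled for the next cycle while processing
    elements consume stable data from the shadow."""

    def __init__(self, size):
        self.primary = [0] * size
        self.shadow = [0] * size

    def write(self, data):
        # One read from the source memory fills the primary buffer.
        self.primary[:len(data)] = data

    def commit(self):
        # Copy to the shadow buffer prior to consumption by neurons.
        self.shadow = list(self.primary)

    def read(self, i):
        # Neurons read the shadow copy; reused data needs no re-fetch.
        return self.shadow[i]

buf = ShadowedBuffer(4)
buf.write([1, 2, 3, 4])
buf.commit()                            # snapshot for the current cycle
buf.write([5, 6, 7, 8])                 # prefetch next cycle's data
print([buf.read(i) for i in range(4)])  # -> [1, 2, 3, 4]
```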
-
2.
Publication No.: US20180300616A1
Publication Date: 2018-10-18
Application No.: US15953330
Filing Date: 2018-04-13
Inventors: Amol Ashok AMBARDEKAR, Boris BOBROV, Chad Balling McBRIDE, George PETRE, Kent D. CEDOLA, Larry Marvin WALL
Abstract: A deep neural network (DNN) module is disclosed that can dynamically partition neuron workload to reduce power consumption. The DNN module includes neurons and a group partitioner and scheduler unit. The group partitioner and scheduler unit divides a workload for the neurons into partitions in order to maximize the number of neurons that can simultaneously process the workload. The group partitioner and scheduler unit then assigns a group of neurons to each of the partitions. The groups of neurons in the DNN module process the workload in their assigned partition to generate a partial output value. The neurons in each group can then sum their partial output values to generate a final output value for the workload. The neurons can be powered down once the groups of neurons have completed processing their assigned workload to reduce power consumption.
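
A toy sketch of the partition-and-sum idea, assuming the workload is a dot product split evenly across groups; the `partition` helper and the group count are illustrative, not the patent's scheduler:

```python
def partition(workload, num_groups):
    """Divide the workload into num_groups nearly equal partitions so
    as many neurons as possible are busy at once."""
    k, m = divmod(len(workload), num_groups)
    return [workload[i*k + min(i, m):(i+1)*k + min(i+1, m)]
            for i in range(num_groups)]

inputs  = [1, 2, 3, 4, 5, 6, 7, 8]
weights = [8, 7, 6, 5, 4, 3, 2, 1]

groups_in = partition(list(zip(inputs, weights)), num_groups=4)

# Each group of neurons produces a partial output value...
partials = [sum(x * w for x, w in g) for g in groups_in]
# ...and the partials are summed into the final output for the workload;
# groups can then be powered down once their partition is done.
final = sum(partials)
print(partials, final)  # [22, 38, 38, 22] 120
```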
-
3.
Publication No.: US20210232904A1
Publication Date: 2021-07-29
Application No.: US17232074
Filing Date: 2021-04-15
Inventors: Amol Ashok AMBARDEKAR, Aleksandar TOMIC, Chad Balling McBRIDE, George PETRE, Kent D. CEDOLA, Larry Marvin Wall, Boris BOBROV
IPC Classes: G06N3/063, G06N3/04, G06F12/0862, G06F9/46, G06F1/324, G06F3/06, G06F9/38, G06F12/08, G06F12/10, G06F15/80, G06F17/15, G06N3/06, G06N3/08, G06N3/10, H03M7/30, H04L12/715, H04L29/08, G06F9/30, G06F13/16, G06F1/3234, G06F12/02, G06F13/28
Abstract: The performance of a neural network (NN) and/or deep neural network (DNN) can be limited by the number of operations being performed as well as by the memory data management of the NN/DNN. Using vector quantization of neuron weight values, the processing of data by neurons can be optimized in terms of the number of operations as well as memory utilization to enhance the overall performance of a NN/DNN. Operatively, one or more contiguous segments of weight values can be converted into one or more vectors of arbitrary length, and each of the one or more vectors can be assigned an index. The generated indexes can be stored in an exemplary vector quantization lookup table and retrieved on the fly at run time by exemplary fast weight lookup hardware, as part of an inline de-quantization operation within an exemplary data processing function of the NN, to obtain the needed one or more neuron weight values.
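
A hedged sketch of how such a vector quantization table might be built in software; the function name and the deduplication detail are assumptions, not the patent's hardware design:

```python
def build_vq_table(weights, vec_len):
    """Convert contiguous weight segments into fixed-length vectors,
    assign each distinct vector an index, and store the indexes."""
    table, indexes = [], []   # codebook and per-segment indexes
    seen = {}
    for i in range(0, len(weights), vec_len):
        vec = tuple(weights[i:i + vec_len])
        if vec not in seen:   # deduplicate repeated segments
            seen[vec] = len(table)
            table.append(vec)
        indexes.append(seen[vec])
    return table, indexes

weights = [0.5, 0.25, 0.5, 0.25, 1.0, -1.0]
table, idx = build_vq_table(weights, vec_len=2)
print(table)  # [(0.5, 0.25), (1.0, -1.0)]
print(idx)    # [0, 0, 1] -- indexes stored instead of raw weights
```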
-
4.
Publication No.: US20190187771A1
Publication Date: 2019-06-20
Application No.: US15847785
Filing Date: 2017-12-19
Inventors: Amol Ashok AMBARDEKAR, Chad Balling MCBRIDE, George PETRE, Kent D. CEDOLA, Larry Marvin WALL
CPC Classes: G06F1/3243, G06F1/3275, G06N3/04, G06N3/0454, G06N3/063
Abstract: Techniques to provide for improved (i.e., reduced) power consumption in an exemplary neural network (NN) and/or Deep Neural Network (DNN) environment using data management. Improved power consumption in the NN/DNN may be achieved by reducing the number of bit flips needed to process operands associated with one or more storages. Reducing the number of bit flips associated with the NN/DNN may be achieved by multiplying an operand associated with a first storage with a plurality of individual operands associated with a plurality of kernels of the NN/DNN. The operand associated with the first storage may be neuron input data and the plurality of individual operands associated with the second storage may be weight values for multiplication with the neuron input data. The plurality of kernels may be arranged or sorted and subsequently processed in a manner that improves power consumption in the NN/DNN.
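
A toy model of why ordering kernel weights can reduce bit flips; the 8-bit operand width and the plain `sorted` ordering are illustrative choices, not the patent's sorting method:

```python
def bit_flips(seq):
    """Total Hamming distance between consecutive 8-bit operands,
    a proxy for the toggling power of the datapath."""
    return sum(bin((a ^ b) & 0xFF).count("1")
               for a, b in zip(seq, seq[1:]))

kernel_weights = [0x0F, 0xF0, 0x0E, 0xF1, 0x0D, 0xF2]
print(bit_flips(kernel_weights))          # unsorted order: 37 flips
print(bit_flips(sorted(kernel_weights)))  # sorted order:   14 flips
```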
-
5.
Publication No.: US20180300614A1
Publication Date: 2018-10-18
Application No.: US15951106
Filing Date: 2018-04-11
Inventors: Amol Ashok AMBARDEKAR, Kent D. CEDOLA, Larry Marvin WALL, Boris BOBROV, George PETRE, Chad Balling McBRIDE
Abstract: A deep neural network (DNN) processor is configured to execute descriptors in layer descriptor lists. The descriptors define instructions for performing a pass of a DNN by the DNN processor. Several types of descriptors can be utilized: memory-to-memory move (M2M) descriptors; operation descriptors; host communication descriptors; configuration descriptors; branch descriptors; and synchronization descriptors. A DMA engine uses M2M descriptors to perform multi-dimensional strided DMA operations. Operation descriptors define the type of operation to be performed by neurons in the DNN processor and the activation function to be used by the neurons. M2M descriptors are buffered separately from operation descriptors and can be executed as soon as possible, subject to explicitly set dependencies. As a result, latency can be reduced and, consequently, the neurons can complete their processing faster. The DNN module can then be powered down earlier than it otherwise would have been, thereby saving power.
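
A minimal software model of descriptor issue, assuming a `Descriptor` record with explicitly set dependencies; because M2M moves are buffered separately, they can issue as soon as their dependencies complete, ahead of queued operation descriptors:

```python
from dataclasses import dataclass, field

@dataclass
class Descriptor:
    kind: str                         # "M2M" or "OP" (other kinds omitted)
    name: str
    deps: set = field(default_factory=set)

done = set()

def run(descs):
    pending = list(descs)
    while pending:
        # Issue every descriptor whose dependencies have completed;
        # M2M moves go first so they start as early as possible.
        ready = [d for d in pending if d.deps <= done]
        for d in sorted(ready, key=lambda d: d.kind != "M2M"):
            print(f"execute {d.kind} {d.name}")
            done.add(d.name)
            pending.remove(d)

run([Descriptor("M2M", "load-weights"),
     Descriptor("M2M", "load-inputs"),
     Descriptor("OP", "conv1", deps={"load-weights", "load-inputs"})])
```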
-
6.
Publication No.: US20200233820A1
Publication Date: 2020-07-23
Application No.: US16843800
Filing Date: 2020-04-08
Inventors: Chad Balling McBRIDE, Timothy Hume HEIL, Amol Ashok AMBARDEKAR, George PETRE, Kent D. CEDOLA, Larry Marvin WALL, Boris BOBROV
IPC Classes: G06F13/16, G06N3/04, G06N3/063, G06F12/0862, G06F9/46, G06F1/324, G06F3/06, G06F9/38, G06F12/08, G06F12/10, G06F15/80, G06F17/15, G06N3/06, G06N3/08, G06N3/10, H03M7/30, H04L12/715, H04L29/08, G06F1/3234, G06F12/02, G06F13/28
Abstract: An exemplary computing environment having a DNN module can maintain one or more bandwidth throttling mechanisms. Illustratively, a first throttling mechanism can specify the number of cycles to wait between transactions on a cooperating fabric component (e.g., a data bus). Illustratively, a second throttling mechanism can be a transaction count limiter that operatively sets a threshold on the number of transactions to be processed during a given transaction sequence and limits the number of transactions, such as multiple transactions in flight, so that it does not exceed the set threshold. In an illustrative operation, by applying these two exemplary calculated throttling parameters, the average bandwidth usage and the peak bandwidth usage can be limited. Operatively, with this fabric bandwidth control, the processing units of the DNN are optimized to process data across each transaction cycle, resulting in enhanced processing and lower power consumption.
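
A sketch of the two throttling knobs in software terms; the class and parameter names (`gap_cycles`, `max_in_flight`) are mine, not the patent's:

```python
class FabricThrottle:
    def __init__(self, gap_cycles, max_in_flight):
        self.gap_cycles = gap_cycles        # cycles to wait between transactions
        self.max_in_flight = max_in_flight  # transaction count threshold
        self.in_flight = 0
        self.cycle = 0
        self.next_issue = 0

    def tick(self):
        self.cycle += 1

    def try_issue(self):
        """Issue a transaction only if the gap has elapsed and the
        in-flight count is under the threshold; together the two knobs
        bound average and peak bandwidth usage."""
        if self.cycle >= self.next_issue and self.in_flight < self.max_in_flight:
            self.in_flight += 1
            self.next_issue = self.cycle + self.gap_cycles
            return True
        return False

    def complete(self):
        self.in_flight -= 1

t = FabricThrottle(gap_cycles=4, max_in_flight=2)
issued = []
for _ in range(16):
    if t.try_issue():
        issued.append(t.cycle)
    t.tick()
print(issued)  # [0, 4] -- then blocked until a completion frees a slot
```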
-
7.
Publication No.: US20180300607A1
Publication Date: 2018-10-18
Application No.: US15813952
Filing Date: 2017-11-15
Inventors: George PETRE, Chad Balling McBRIDE, Amol Ashok AMBARDEKAR, Kent D. CEDOLA, Larry Marvin WALL, Boris BOBROV
Abstract: The performance of a neural network (NN) and/or deep neural network (DNN) can be limited by the number of operations being performed as well as by the management of data among the various memory components of the NN/DNN. By inserting selected padding in the input data to align the input data in memory, data reads/writes can be optimized for processing by the NN/DNN, thereby enhancing the overall performance of a NN/DNN. Operatively, an operations controller/iterator can generate one or more instructions that insert the selected padding into the data. The data padding can be calculated using various characteristics of the input data and the NN/DNN, as well as characteristics of the cooperating memory components. Padding on the output data can be utilized to support data alignment at the memory components and the cooperating processing units of the NN/DNN.
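
A minimal sketch of alignment padding on input rows; the `pad_row` helper and the alignment width are illustrative assumptions, not the patent's controller logic:

```python
def pad_row(row, align, pad_value=0):
    """Append pad_value elements until the row length is a multiple of
    the alignment width (e.g., the memory line size in elements), so
    each row starts on an aligned boundary and reads/writes are full
    aligned bursts."""
    pad = (align - len(row) % align) % align
    return row + [pad_value] * pad

row = [1, 2, 3, 4, 5]          # 5 elements, alignment of 4
print(pad_row(row, align=4))   # [1, 2, 3, 4, 5, 0, 0, 0]
```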
-
8.
Publication No.: US20180300605A1
Publication Date: 2018-10-18
Application No.: US15953195
Filing Date: 2018-04-13
Inventors: Amol Ashok AMBARDEKAR, Chad Balling McBRIDE, George PETRE, Larry Marvin WALL, Kent D. CEDOLA, Boris BOBROV
Abstract: A deep neural network ("DNN") module can determine whether processing of certain values in an input buffer or a weight buffer by neurons can be skipped. For example, the DNN module might determine whether neurons can skip the processing of values in entire columns of a neuron buffer. Processing of these values might be skipped if an entire column of an input buffer or a weight buffer is zeros, for example. The DNN module can also determine whether processing of single values in rows of the input buffer or the weight buffer can be skipped (e.g., if the values are zero). Neurons that complete their processing early as a result of skipping operations can assist other neurons with their processing. A combination operation can be performed following the completion of processing that transfers the results of the processing operations performed by a neuron to their correct owner.
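
An illustrative software rendering of the column- and value-level zero skipping described above; the buffer shapes and helper name are assumptions:

```python
def dot_with_skipping(inputs, weights):
    cols = range(len(inputs[0]))
    # Skip entire columns whose input or weight values are all zero.
    live = [c for c in cols
            if any(r[c] for r in inputs) and any(r[c] for r in weights)]
    out = []
    for x_row, w_row in zip(inputs, weights):
        # Also skip single zero values within the remaining columns.
        out.append(sum(x_row[c] * w_row[c] for c in live
                       if x_row[c] and w_row[c]))
    return out

inputs  = [[1, 0, 3], [2, 0, 0]]   # middle column is all zeros
weights = [[4, 0, 5], [6, 0, 7]]
print(dot_with_skipping(inputs, weights))  # [19, 12]
```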
-
9.
Publication No.: US20180300603A1
Publication Date: 2018-10-18
Application No.: US15881519
Filing Date: 2018-01-26
Inventors: Amol Ashok AMBARDEKAR, Aleksandar TOMIC, Chad Balling McBRIDE, George PETRE, Kent D. CEDOLA, Larry Marvin Wall, Boris BOBROV
IPC Classes: G06N3/04
Abstract: The performance of a neural network (NN) and/or deep neural network (DNN) can be limited by the number of operations being performed as well as by the memory data management of the NN/DNN. Using vector quantization of neuron weight values, the processing of data by neurons can be optimized in terms of the number of operations as well as memory utilization to enhance the overall performance of a NN/DNN. Operatively, one or more contiguous segments of weight values can be converted into one or more vectors of arbitrary length, and each of the one or more vectors can be assigned an index. The generated indexes can be stored in an exemplary vector quantization lookup table and retrieved on the fly at run time by exemplary fast weight lookup hardware, as part of an inline de-quantization operation within an exemplary data processing function of the NN, to obtain the needed one or more neuron weight values.
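
This abstract matches the related publication in item 3. As a complement to the table-construction sketch there, here is the run-time half, an inline de-quantization lookup; the function is a software stand-in for the patent's fast weight lookup hardware:

```python
def dequantize(indexes, table):
    """Expand stored indexes back into neuron weight values on the fly."""
    weights = []
    for i in indexes:
        weights.extend(table[i])   # one table read yields a whole vector
    return weights

table = [(0.5, 0.25), (1.0, -1.0)]
indexes = [0, 0, 1]
print(dequantize(indexes, table))  # [0.5, 0.25, 0.5, 0.25, 1.0, -1.0]
```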
-
10.
Publication No.: US20180300602A1
Publication Date: 2018-10-18
Application No.: US15786514
Filing Date: 2017-10-17
Inventors: George PETRE, Chad Balling McBRIDE, Amol Ashok AMBARDEKAR, Kent D. CEDOLA, Larry Marvin WALL, Boris BOBROV
Abstract: The performance of a neural network (NN) and/or deep neural network (DNN) can be limited by the number of operations being performed as well as by the management of data among the various memory components of the NN/DNN. Using a directed line buffer that operatively inserts one or more shifting bits in data blocks to be processed, data reads/writes to the line buffer can be optimized for processing by the NN/DNN, thereby enhancing the overall performance of a NN/DNN. Operatively, an operations controller and/or iterator can generate one or more instructions having the calculated shifting bit(s) for communication to the line buffer. Illustratively, the shifting bit(s) can be calculated using various characteristics of the input data as well as of the NN/DNN, inclusive of the data dimensions. The line buffer can read data for processing, insert the shifting bits, and write the data into the line buffer for subsequent processing by cooperating processing unit(s).
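
A sketch of the directed line buffer idea at element granularity; computing a shift from the row width and inserting fill at the end of each row is an illustrative analogy for the patent's shifting bits, not its exact bit-level mechanism:

```python
def compute_shift(row_width, align):
    """Shift (in elements) that aligns each row to the buffer width."""
    return (align - row_width % align) % align

def fill_line_buffer(rows, align, fill=0):
    shift = compute_shift(len(rows[0]), align)
    buffer = []
    for row in rows:
        buffer.extend(row + [fill] * shift)  # insert the shifting elements
    return buffer

rows = [[1, 2, 3], [4, 5, 6]]
print(fill_line_buffer(rows, align=4))  # [1, 2, 3, 0, 4, 5, 6, 0]
```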
-