-
1.
Publication No.: US11256976B2
Publication Date: 2022-02-22
Application No.: US15719351
Application Date: 2017-09-28
Inventors: Kent D. Cedola, Larry Marvin Wall, Boris Bobrov, George Petre, Chad Balling McBride, Amol Ashok Ambardekar
IPC Classes: G06N3/063, G06N3/04, G06F12/0862, G06F9/46, G06F1/324, G06F3/06, G06F9/38, G06F12/08, G06F12/10, G06F15/80, G06F17/15, G06N3/06, G06N3/08, G06N3/10, H03M7/30, H04L45/00, H04L67/02, H04L67/1001, G06F9/30, G06F13/16, G06F1/3234, G06F12/02, G06F13/28, H03M7/46, H04L45/50
Abstract: Optimized memory usage and management are crucial to the overall performance of a neural network (NN) or deep neural network (DNN) computing environment. Using various characteristics of the input data's dimensions, an apportionment sequence is calculated for the input data to be processed by the NN or DNN that optimizes the use of the local and external memory components. The apportionment sequence can describe how to parcel the input data (and its associated processing parameters, e.g., processing weights) into one or more portions, as well as how such portions of input data (and their associated processing parameters) are passed between the local memory, external memory, and processing unit components of the NN or DNN. Additionally, the apportionment sequence can include instructions to store generated output data in the local and/or external memory components so as to optimize their efficient use.
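To make the apportionment idea concrete, here is a minimal Python sketch that splits an input (while reserving room for its weights) into portions sized to a local memory budget, staging the remainder in external memory. All names (Portion, plan_apportionment) and the first-portion-goes-local policy are illustrative assumptions, not details from the patent.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Portion:
    offset: int      # start index into the flattened input
    size: int        # number of elements in this portion
    location: str    # "local" or "external" memory target

def plan_apportionment(input_len: int, weight_len: int,
                       local_capacity: int) -> List[Portion]:
    """Split an input (plus its weights) into portions sized to fit
    local memory; overflow is staged in external memory. A toy
    stand-in for the apportionment sequence described above."""
    per_portion = max(1, local_capacity - weight_len)  # reserve room for weights
    portions, offset = [], 0
    while offset < input_len:
        size = min(per_portion, input_len - offset)
        # The first portion goes to local memory; the rest wait in external
        # memory and are streamed in as earlier portions complete.
        location = "local" if offset == 0 else "external"
        portions.append(Portion(offset, size, location))
        offset += size
    return portions

# Example: 10,000 input elements, 512 weight values, 4 KB of local memory.
for p in plan_apportionment(10_000, 512, 4_096):
    print(p)
```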
-
2.
Publication No.: US11176448B2
Publication Date: 2021-11-16
Application No.: US16843800
Application Date: 2020-04-08
Inventors: Chad Balling McBride, Timothy Hume Heil, Amol Ashok Ambardekar, George Petre, Kent D. Cedola, Larry Marvin Wall, Boris Bobrov
IPC Classes: G06F1/32, G06F9/46, G06F17/15, G06N3/04, G06N3/06, G06N3/08, G06N3/063, G06F12/0862, G06F1/324, G06F3/06, G06F9/38, G06F12/08, G06F12/10, G06F15/80, G06N3/10, H03M7/30, H04L12/715, H04L29/08, G06F9/30, G06F13/16, G06F1/3234, G06F12/02, G06F13/28, H03M7/46, H04L12/723
Abstract: An exemplary computing environment having a DNN module can maintain one or more bandwidth throttling mechanisms. Illustratively, a first throttling mechanism can specify the number of cycles to wait between transactions on a cooperating fabric component (e.g., a data bus). Illustratively, a second throttling mechanism can be a transaction count limiter that operatively sets a threshold on the number of transactions to be processed during a given transaction sequence and limits the number of transactions, such as multiple transactions in flight, so as not to exceed the set threshold. In an illustrative operation, in executing these two exemplary calculated throttling parameters, both the average bandwidth usage and the peak bandwidth usage can be limited. Operatively, with this fabric bandwidth control, the processing units of the DNN are optimized to process data across each transaction cycle, resulting in enhanced processing and lower power consumption.
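A minimal sketch of the two throttling mechanisms, assuming a simple cycle-counting model: one knob enforces a minimum cycle gap between transaction issues, the other caps the number of transactions in flight. The class and method names are illustrative, not from the patent.

```python
class FabricThrottle:
    """Toy model of the two throttling mechanisms described above:
    a minimum cycle gap between transactions and a cap on the number
    of transactions in flight."""

    def __init__(self, wait_cycles: int, max_in_flight: int):
        self.wait_cycles = wait_cycles        # cycles to wait between issues
        self.max_in_flight = max_in_flight    # transaction count threshold
        self.last_issue_cycle = -wait_cycles  # allow an issue at cycle 0
        self.in_flight = 0

    def can_issue(self, cycle: int) -> bool:
        gap_ok = cycle - self.last_issue_cycle >= self.wait_cycles
        count_ok = self.in_flight < self.max_in_flight
        return gap_ok and count_ok

    def issue(self, cycle: int) -> None:
        assert self.can_issue(cycle)
        self.last_issue_cycle = cycle
        self.in_flight += 1

    def complete(self) -> None:
        self.in_flight -= 1

# Example: at most 4 transactions in flight, 8 cycles between issues.
t = FabricThrottle(wait_cycles=8, max_in_flight=4)
issued = []
for cycle in range(64):
    if t.can_issue(cycle):
        t.issue(cycle)
        issued.append(cycle)
print(issued)  # [0, 8, 16, 24] -- then the in-flight cap blocks further issues
```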
-
3.
Publication No.: US20180300603A1
Publication Date: 2018-10-18
Application No.: US15881519
Application Date: 2018-01-26
Inventors: Amol Ashok AMBARDEKAR, Aleksandar TOMIC, Chad Balling McBRIDE, George PETRE, Kent D. CEDOLA, Larry Marvin Wall, Boris BOBROV
IPC Classes: G06N3/04
Abstract: The performance of a neural network (NN) and/or deep neural network (DNN) can be limited by the number of operations being performed as well as by the memory data management of the NN/DNN. Using vector quantization of neuron weight values, the processing of data by neurons can be optimized to reduce the number of operations as well as memory utilization, enhancing the overall performance of the NN/DNN. Operatively, one or more contiguous segments of weight values can be converted into one or more vectors of arbitrary length, and each of the one or more vectors can be assigned an index. The generated indexes can be stored in an exemplary vector quantization lookup table and retrieved by exemplary fast weight lookup hardware at run time, on the fly, as part of an inline de-quantization operation within an exemplary data processing function of the NN, to obtain the needed neuron weight values.
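As a rough illustration of the quantization step, the sketch below splits a flat weight array into contiguous vectors, deduplicates them into a codebook, and stores each vector as a small index. Exact-match deduplication stands in for whatever clustering the real quantizer uses; all names are illustrative.

```python
import numpy as np

def build_vq_table(weights: np.ndarray, vec_len: int):
    """Split a flat weight array into contiguous vectors of length vec_len,
    deduplicate them into a codebook, and return (codebook, indexes) so the
    weights can be stored as small indexes into a lookup table."""
    assert len(weights) % vec_len == 0
    vectors = weights.reshape(-1, vec_len)
    codebook, indexes, seen = [], [], {}
    for v in vectors:
        key = v.tobytes()
        if key not in seen:          # assign a fresh index to unseen vectors
            seen[key] = len(codebook)
            codebook.append(v)
        indexes.append(seen[key])
    return np.array(codebook), np.array(indexes, dtype=np.uint16)

weights = np.array([1., 2., 1., 2., 3., 4., 1., 2.], dtype=np.float32)
codebook, idx = build_vq_table(weights, vec_len=2)
print(codebook)  # [[1. 2.] [3. 4.]] -- two unique vectors
print(idx)       # [0 0 1 0] -- four vectors stored as four small indexes
```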
-
4.
Publication No.: US11604972B2
Publication Date: 2023-03-14
Application No.: US16457828
Application Date: 2019-06-28
Inventors: Amol A Ambardekar, Boris Bobrov, Kent D. Cedola, Chad Balling McBride, George Petre, Larry Marvin Wall
Abstract: Neural processing elements are configured with a hardware AND gate that performs a logical AND operation between a sign extend signal and the most significant bit (“MSB”) of an operand. The state of the sign extend signal can be based upon the type of the deep neural network (“DNN”) layer that generated the operand. If the sign extend signal is logical FALSE, no sign extension is performed. If the sign extend signal is logical TRUE, a concatenator concatenates the output of the hardware AND gate and the operand, thereby extending the operand from an N-bit unsigned binary value to an (N+1)-bit signed binary value. The neural processing element can also include another hardware AND gate and another concatenator for processing another operand similarly. The outputs of the concatenators for both operands are provided to a hardware binary multiplier.
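The AND-gate-plus-concatenator behavior can be modeled in a few lines of Python: the extra bit is (sign_extend AND MSB), placed above the N-bit operand to form an N+1-bit value. The function names are illustrative; the bit-level behavior follows the description above.

```python
def extend_operand(operand: int, n_bits: int, sign_extend: bool) -> int:
    """Model of the extension described above: the extra bit is
    (sign_extend AND msb), concatenated above the N-bit operand to form
    an N+1-bit value. With sign_extend False the operand stays unsigned."""
    msb = (operand >> (n_bits - 1)) & 1
    extra_bit = int(sign_extend) & msb          # the hardware AND gate
    extended = (extra_bit << n_bits) | operand  # the concatenator
    return extended

def as_signed(value: int, n_bits: int) -> int:
    """Interpret an n_bits-wide binary value as two's complement."""
    return value - (1 << n_bits) if value >> (n_bits - 1) else value

# An 8-bit operand 0b1111_1011 (251 unsigned, -5 signed):
op = 0b1111_1011
u9 = extend_operand(op, 8, sign_extend=False)  # layer produced unsigned data
s9 = extend_operand(op, 8, sign_extend=True)   # layer produced signed data
print(as_signed(u9, 9))  # 251: zero-extended, value preserved as unsigned
print(as_signed(s9, 9))  # -5: sign-extended, value preserved as signed
```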
-
5.
Publication No.: US11494237B2
Publication Date: 2022-11-08
Application No.: US16454026
Application Date: 2019-06-26
Inventors: Chad Balling McBride, Amol A. Ambardekar, Boris Bobrov, Kent D. Cedola, George Petre, Larry Marvin Wall
Abstract: A computing system includes processor cores for executing applications that utilize functionality provided by a deep neural network (“DNN”) processor. One of the cores operates as a resource and power management (“RPM”) processor core. When the RPM processor core receives a request to execute a DNN workload, it divides the DNN workload into workload fragments. The RPM processor core then determines whether a workload fragment is to be statically or dynamically allocated to a DNN processor. Once the RPM processor core has selected a DNN processor, it enqueues the workload fragment on a queue maintained by the selected DNN processor. The DNN processor dequeues workload fragments from its queue for execution. Once execution of a workload fragment has completed, the DNN processor generates an interrupt indicating that execution of the workload fragment has completed. The RPM processor core can then notify the processor core that originally requested execution of the workload fragment.
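A toy sketch of the submission flow, assuming round-robin placement stands in for the RPM core's static/dynamic allocation decision and a callback stands in for the completion interrupt. All names are illustrative.

```python
from queue import Queue

class DNNProcessor:
    """Toy DNN processor with a work queue, per the flow described above."""
    def __init__(self, name: str):
        self.name, self.queue = name, Queue()

    def run_one(self) -> None:
        frag, on_done = self.queue.get()          # dequeue a workload fragment
        on_done(f"{self.name} completed {frag}")  # stands in for the interrupt

def rpm_submit(workload: str, processors, notify) -> None:
    """Split a workload into fragments and enqueue each on a processor,
    chosen round-robin (an illustrative stand-in for the RPM core's
    static/dynamic allocation decision)."""
    fragments = [f"{workload}-frag{i}" for i in range(2 * len(processors))]
    for i, frag in enumerate(fragments):
        processors[i % len(processors)].queue.put((frag, notify))

dnns = [DNNProcessor("dnn0"), DNNProcessor("dnn1")]
rpm_submit("resnet-pass", dnns, notify=print)
for _ in range(2):
    for d in dnns:
        d.run_one()  # prints one completion notice per fragment
```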
-
6.
Publication No.: US11100390B2
Publication Date: 2021-08-24
Application No.: US15950550
Application Date: 2018-04-11
Inventors: Chad Balling McBride, Amol Ashok Ambardekar, Kent D. Cedola, George Petre, Larry Marvin Wall, Boris Bobrov
IPC Classes: G06N3/063, G06N3/04, G06N3/06, G06F9/30, G06F9/38, G06F12/0862, G06F9/46, G06F1/324, G06F3/06, G06F12/08, G06F12/10, G06F15/80, G06F17/15, G06N3/08, G06N3/10, H03M7/30, H04L12/715, H04L29/08, G06F13/16, G06F1/3234, G06F12/02, G06F13/28, H03M7/46, H04L12/723
Abstract: A deep neural network (DNN) processor is configured to execute layer descriptors in layer descriptor lists. The descriptors define instructions for performing a forward pass of a DNN by the DNN processor. The layer descriptors can also be utilized to manage the flow of descriptors through the DNN module. For example, layer descriptors can define dependencies upon other descriptors. Descriptors defining a dependency will not execute until the descriptors upon which they are dependent have completed. Layer descriptors can also define a “fence,” or barrier, function that can be used to prevent the processing of upstream layer descriptors until the processing of all downstream layer descriptors is complete. The fence bit guarantees that there are no other layer descriptors in the DNN processing pipeline before the layer descriptor that has the fence bit asserted is processed.
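A sequential Python model of the dependency and fence semantics may help: a descriptor runs only once its dependencies have completed, and a fenced descriptor additionally waits until every earlier descriptor has drained. This in-order software model is an illustrative simplification of the hardware pipeline.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class LayerDescriptor:
    name: str
    depends_on: List[str] = field(default_factory=list)
    fence: bool = False  # wait for the pipeline to drain before running

def execute(descriptors: List[LayerDescriptor]) -> None:
    """Run descriptors, honoring dependencies and fences (see lead-in)."""
    completed = set()
    remaining = set(range(len(descriptors)))
    while remaining:
        runnable = None
        for i in sorted(remaining):
            d = descriptors[i]
            deps_met = all(dep in completed for dep in d.depends_on)
            # A fenced descriptor waits until all earlier ones have run.
            fence_met = not d.fence or all(j not in remaining for j in range(i))
            if deps_met and fence_met:
                runnable = i
                break
        if runnable is None:
            raise RuntimeError("unsatisfiable dependency or fence")
        print("run", descriptors[runnable].name)
        completed.add(descriptors[runnable].name)
        remaining.remove(runnable)

execute([LayerDescriptor("conv1"),
         LayerDescriptor("conv2", depends_on=["conv1"]),
         LayerDescriptor("fc", fence=True)])  # fc waits for the pipeline to drain
```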
-
7.
Publication No.: US10748346B2
Publication Date: 2020-08-18
Application No.: US16003985
Application Date: 2018-06-08
IPC Classes: G06F17/27, G06T19/20, G06F16/583, G06F40/205, G06T19/00
Abstract: Systems and methods are disclosed for permitting the use of a natural language expression to specify object (or asset) locations in a virtual three-dimensional (3D) environment. By rapidly identifying and solving constraints for 3D object placement and orientation, consumers of synthetics services may more efficiently generate experiments for use in development of artificial intelligence (AI) algorithms and sensor platforms. Parsing descriptive location specifications, sampling the volumetric space, and solving pose constraints for location and orientation can produce large numbers of designated coordinates for object locations in virtual environments with reduced demands on user involvement. Converting from location designations that are natural to humans, such as “standing on the floor one meter from a wall, facing the center of the room,” to a six-dimensional (6D) pose specification (including 3D location and orientation) can alleviate the need for a manual drag/drop/reorient procedure for placement of objects in a synthetic environment.
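As a toy illustration of the sampling-and-constraint-solving step, the sketch below rejection-samples floor positions, keeps those at roughly the requested wall distance, and derives a yaw facing the room center. The rectangular room, the tolerance, and the function names are all assumptions for illustration, not details from the patent.

```python
import math
import random

def sample_poses(room_w: float, room_d: float, wall_dist: float,
                 n_samples: int = 1000):
    """Rejection-sample candidate poses for a phrase like "standing on the
    floor one meter from a wall, facing the center of the room"."""
    poses = []
    for _ in range(n_samples):
        x, y = random.uniform(0, room_w), random.uniform(0, room_d)
        nearest_wall = min(x, room_w - x, y, room_d - y)
        if abs(nearest_wall - wall_dist) < 0.05:              # distance constraint
            yaw = math.atan2(room_d / 2 - y, room_w / 2 - x)  # face the center
            # 6D pose: x, y, z plus roll, pitch, yaw (roll/pitch fixed at 0,
            # z = 0 for "standing on the floor").
            poses.append((x, y, 0.0, 0.0, 0.0, yaw))
    return poses

random.seed(0)
candidates = sample_poses(room_w=5.0, room_d=4.0, wall_dist=1.0)
print(len(candidates), "candidate poses; first:", candidates[0])
```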
-
8.
Publication No.: US11182667B2
Publication Date: 2021-11-23
Application No.: US15813952
Application Date: 2017-11-15
Inventors: George Petre, Chad Balling McBride, Amol Ashok Ambardekar, Kent D. Cedola, Larry Marvin Wall, Boris Bobrov
IPC Classes: G06N3/06, G06N3/10, G06N3/04, G06F9/38, G06N3/063, G06F12/0862, G06F9/46, G06F1/324, G06F3/06, G06F12/08, G06F12/10, G06F15/80, G06F17/15, G06N3/08, H03M7/30, H04L12/715, H04L29/08, G06F9/30, G06F13/16, G06F1/3234, G06F12/02, G06F13/28, H03M7/46, H04L12/723
Abstract: The performance of a neural network (NN) and/or deep neural network (DNN) can be limited by the number of operations being performed as well as by the management of data among the various memory components of the NN/DNN. By inserting selected padding into the input data to align it in memory, data reads and writes can be optimized for processing by the NN/DNN, thereby enhancing its overall performance. Operatively, an operations controller/iterator can generate one or more instructions that insert the selected padding into the data. The padding can be calculated using various characteristics of the input data and of the NN/DNN, as well as characteristics of the cooperating memory components. Padding on the output data can be utilized to support data alignment at the memory components and the cooperating processing units of the NN/DNN.
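A minimal sketch of alignment padding, assuming a 64-byte memory line and row-wise 2-D data; neither detail comes from the patent, which computes the padding from the characteristics described above.

```python
import numpy as np

def pad_rows_for_alignment(data: np.ndarray, line_bytes: int = 64) -> np.ndarray:
    """Pad each row of a 2-D input so every row starts on a memory-line
    boundary (a simplified version of the selected-padding idea above)."""
    row_bytes = data.shape[1] * data.itemsize
    pad_elems = (-row_bytes % line_bytes) // data.itemsize
    # Append pad_elems zero columns so the row stride is a multiple of line_bytes.
    return np.pad(data, ((0, 0), (0, pad_elems)))

x = np.ones((4, 13), dtype=np.float32)  # 13 * 4 = 52 bytes per row, misaligned
xp = pad_rows_for_alignment(x)
print(xp.shape)  # (4, 16): rows padded to 64 bytes for aligned reads/writes
```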
-
9.
Publication No.: US20210232904A1
Publication Date: 2021-07-29
Application No.: US17232074
Application Date: 2021-04-15
Inventors: Amol Ashok AMBARDEKAR, Aleksandar TOMIC, Chad Balling McBRIDE, George PETRE, Kent D. CEDOLA, Larry Marvin Wall, Boris BOBROV
IPC Classes: G06N3/063, G06N3/04, G06F12/0862, G06F9/46, G06F1/324, G06F3/06, G06F9/38, G06F12/08, G06F12/10, G06F15/80, G06F17/15, G06N3/06, G06N3/08, G06N3/10, H03M7/30, H04L12/715, H04L29/08, G06F9/30, G06F13/16, G06F1/3234, G06F12/02, G06F13/28
Abstract: The performance of a neural network (NN) and/or deep neural network (DNN) can be limited by the number of operations being performed as well as by the memory data management of the NN/DNN. Using vector quantization of neuron weight values, the processing of data by neurons can be optimized to reduce the number of operations as well as memory utilization, enhancing the overall performance of the NN/DNN. Operatively, one or more contiguous segments of weight values can be converted into one or more vectors of arbitrary length, and each of the one or more vectors can be assigned an index. The generated indexes can be stored in an exemplary vector quantization lookup table and retrieved by exemplary fast weight lookup hardware at run time, on the fly, as part of an inline de-quantization operation within an exemplary data processing function of the NN, to obtain the needed neuron weight values.
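Complementing the quantization sketch under the earlier entry with the same abstract (US20180300603A1), this fragment models the run-time side: each stored index is replaced on the fly by its vector from the lookup table, with NumPy fancy indexing standing in for the fast weight lookup hardware.

```python
import numpy as np

def dequantize(indexes: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Inline de-quantization as described above: each stored index is
    replaced, on the fly, by its weight vector from the lookup table."""
    return codebook[indexes].reshape(-1)

codebook = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)
indexes = np.array([0, 0, 1, 0], dtype=np.uint16)
print(dequantize(indexes, codebook))  # [1. 2. 1. 2. 3. 4. 1. 2.]
```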
-
10.
Publication No.: US10996739B2
Publication Date: 2021-05-04
Application No.: US15847785
Application Date: 2017-12-19
Inventors: Amol Ashok Ambardekar, Chad Balling McBride, George Petre, Kent D. Cedola, Larry Marvin Wall
Abstract: Techniques are disclosed to provide improved (i.e., reduced) power consumption in an exemplary neural network (NN) and/or deep neural network (DNN) environment using data management. Improved power consumption in the NN/DNN may be achieved by reducing the number of bit flips needed to process operands associated with one or more storages. Reducing the number of bit flips associated with the NN/DNN may be achieved by multiplying an operand associated with a first storage with a plurality of individual operands associated with a second storage comprising a plurality of kernels of the NN/DNN. The operand associated with the first storage may be neuron input data, and the plurality of individual operands associated with the second storage may be weight values for multiplication with the neuron input data. The plurality of kernels may be arranged or sorted and subsequently processed in a manner that improves power consumption in the NN/DNN.
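A small sketch of the arrange-and-sort idea: count the bit flips (Hamming distance) between consecutive weight operands, then greedily order the weights to reduce the total. The greedy nearest-neighbor heuristic is an illustrative assumption; the abstract says only that the kernels are arranged or sorted.

```python
def bit_flips(a: int, b: int) -> int:
    """Number of bit positions that toggle when the multiplier's weight
    operand changes from a to b (the Hamming distance)."""
    return bin(a ^ b).count("1")

def total_flips(order):
    return sum(bit_flips(x, y) for x, y in zip(order, order[1:]))

def sort_weights(weights):
    """Greedy nearest-neighbor ordering: always pick the next weight with
    the fewest bit flips from the current one."""
    remaining = list(weights)
    order = [remaining.pop(0)]
    while remaining:
        nxt = min(remaining, key=lambda w: bit_flips(order[-1], w))
        remaining.remove(nxt)
        order.append(nxt)
    return order

weights = [0b1010_1010, 0b0101_0101, 0b1010_1011, 0b0101_0100]
print(total_flips(weights))               # 23 flips in the given order
print(total_flips(sort_weights(weights))) # 9 flips after sorting
```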