-
Publication number: US11983566B2
Publication date: 2024-05-14
Application number: US17374361
Filing date: 2021-07-13
Applicant: NVIDIA Corporation
Inventor: Yilin Zhang , Geng Chen , Yan Zhou , Qifei Fan , Prashant Gaikwad
CPC classification number: G06F9/5027 , G06F9/4881 , G06N3/063
Abstract: Apparatuses, systems, and techniques for scheduling deep learning tasks in hardware are described. One accelerator circuit includes multiple fixed-function circuits, each of which processes a different layer type of a neural network. A scheduler circuit receives state information associated with a respective layer being processed by a respective fixed-function circuit and dependency information that indicates a layer dependency condition for that layer. Using the state information and the dependency information, the scheduler circuit determines that the layer dependency condition is satisfied and enables the respective fixed-function circuit to process the current layer.
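The dependency-condition check the abstract describes can be sketched in software: a layer runs on its engine only once every layer it depends on has completed. This is a minimal illustrative model; the `Layer` class, `schedule` function, and engine names are assumptions for the sketch, not taken from the patent:

```python
# Sketch of dependency-tracked layer scheduling across fixed-function
# engines. Names (Layer, schedule) are illustrative, not from the patent.
from collections import deque

class Layer:
    def __init__(self, name, engine, deps):
        self.name = name          # layer identifier
        self.engine = engine      # which fixed-function circuit runs it
        self.deps = set(deps)     # layers that must complete first

def schedule(layers):
    """Return an execution order in which each layer is enabled only
    after its dependency condition is satisfied."""
    done, order = set(), []
    pending = deque(layers)
    while pending:
        layer = pending.popleft()
        if layer.deps <= done:        # dependency condition satisfied
            order.append((layer.engine, layer.name))
            done.add(layer.name)
        else:
            pending.append(layer)     # re-check after state updates
    return order
```

In hardware this check is event-driven rather than a polling loop, but the invariant is the same: an engine is enabled for a layer only when the scheduler has observed completion state for every dependency.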
-
Publication number: US10997492B2
Publication date: 2021-05-04
Application number: US15838273
Filing date: 2017-12-11
Applicant: NVIDIA Corporation
Inventor: Szymon Migacz , Hao Wu , Dilip Sequeira , Ujval Kapasi , Maxim Milakov , Slawomir Kierat , Zacky Zhou , Yilin Zhang , Alex Fit-Florea
Abstract: Aspects of the present invention are directed to computer-implemented techniques for performing data compression and conversion between data formats of varying degrees of precision, and more particularly for improving the inferencing (application) of artificial neural networks using a reduced-precision (e.g., INT8) data format. Embodiments of the present invention generate candidate conversions of data output, then employ a relative measure of quality to identify the candidate conversion with the greatest accuracy (i.e., the least divergence from the original higher-precision values). The representation can then be used during inference to perform computations on the resulting output data.
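The "candidate conversions scored by divergence" idea can be sketched as threshold calibration: try a grid of saturation thresholds, round-trip the data through an INT8-style quantization at each, and keep the threshold whose quantized distribution diverges least (here, by KL divergence) from the original. The candidate grid, bin counts, and smoothing are illustrative choices for the sketch, not parameters from the patent:

```python
# Sketch of divergence-based calibration for reduced-precision (INT8)
# conversion. All constants here are illustrative assumptions.
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """Relative entropy between two (unnormalized) histograms."""
    p = p / p.sum()
    q = q / q.sum()
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / np.maximum(q[mask], eps))))

def calibrate_threshold(activations, num_candidates=32, bins=64):
    """Try candidate saturation thresholds; return the one whose INT8
    round-trip distribution diverges least from the original."""
    x = np.abs(activations).astype(np.float64)
    hi = x.max()
    ref_hist, _ = np.histogram(x, bins=bins, range=(0.0, hi))
    best_t, best_div = hi, float("inf")
    for t in np.linspace(hi / num_candidates, hi, num_candidates):
        scale = t / 127.0
        q = np.clip(np.round(x / scale), 0, 127) * scale  # INT8 round trip
        q_hist, _ = np.histogram(q, bins=bins, range=(0.0, hi))
        div = kl_divergence(ref_hist + 1.0, q_hist + 1.0)  # +1 smoothing
        if div < best_div:
            best_t, best_div = t, div
    return best_t
```

A threshold below the absolute maximum trades saturation of rare outliers for finer resolution over the bulk of the distribution, which is exactly the accuracy trade-off the relative quality measure arbitrates.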
-
Publication number: US11726857B2
Publication date: 2023-08-15
Application number: US17374592
Filing date: 2021-07-13
Applicant: NVIDIA Corporation
Inventor: Yilin Zhang , Shangang Zhang , Yan Zhou , Qifei Fan
CPC classification number: G06F11/079 , G06F7/5443 , G06F11/0724 , G06F11/0751 , G06N3/065
Abstract: Apparatuses, systems, and techniques to detect faults in processing pipelines are described. One accelerator circuit includes a fixed-function circuit that performs an operation corresponding to a layer of a neural network. The fixed-function circuit includes a set of homogeneous processing units and a fault scanner circuit. The fault scanner circuit includes an additional homogeneous processing unit to scan each processing unit of the set for functional faults in a sequence.
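The scan scheme described above can be modeled simply: one spare unit, identical to the homogeneous units, shadows each unit in turn and flags a mismatch as a fault. This is a behavioral sketch only; the `FaultScanner` class, the `mac` reference operation, and the per-cycle cursor are assumptions for illustration, not structures from the patent:

```python
# Sketch of in-sequence fault scanning with one additional homogeneous
# unit. Names and the MAC example operation are illustrative assumptions.

def mac(a, b, acc=0):
    """Reference multiply-accumulate: the operation every unit performs."""
    return acc + a * b

class FaultScanner:
    def __init__(self, units):
        self.units = units   # list of callables with mac's signature
        self.cursor = 0      # which unit the spare shadows this cycle

    def step(self, a, b):
        """One cycle: all units compute; the spare recomputes the scanned
        unit's result and reports a functional fault on mismatch."""
        outputs = [u(a, b) for u in self.units]
        expected = mac(a, b)                  # spare (known-good) unit
        scanned = self.cursor
        faulty = outputs[scanned] != expected
        self.cursor = (self.cursor + 1) % len(self.units)  # next in sequence
        return outputs, scanned, faulty
```

Because the spare is homogeneous with the scanned units, the scan adds one unit of area rather than duplicating the whole set, at the cost of detection latency of one full sweep.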
-
Publication number: US20220374298A1
Publication date: 2022-11-24
Application number: US17374592
Filing date: 2021-07-13
Applicant: NVIDIA Corporation
Inventor: Yilin Zhang , Shangang Zhang , Yan Zhou , Qifei Fan
Abstract: Apparatuses, systems, and techniques to detect faults in processing pipelines are described. One accelerator circuit includes a fixed-function circuit that performs an operation corresponding to a layer of a neural network. The fixed-function circuit includes a set of homogeneous processing units and a fault scanner circuit. The fault scanner circuit includes an additional homogeneous processing unit to scan each processing unit of the set for functional faults in a sequence.
-
Publication number: US20180211152A1
Publication date: 2018-07-26
Application number: US15838273
Filing date: 2017-12-11
Applicant: NVIDIA Corporation
Inventor: Szymon Migacz , Hao Wu , Dilip Sequeira , Ujval Kapasi , Maxim Milakov , Slawomir Kierat , Zacky Zhou , Yilin Zhang , Alex Fit-Florea
CPC classification number: G06N3/04 , G06N3/0454 , G06N3/08 , G06N7/00
Abstract: Aspects of the present invention are directed to computer-implemented techniques for performing data compression and conversion between data formats of varying degrees of precision, and more particularly for improving the inferencing (application) of artificial neural networks using a reduced-precision (e.g., INT8) data format. Embodiments of the present invention generate candidate conversions of data output, then employ a relative measure of quality to identify the candidate conversion with the greatest accuracy (i.e., the least divergence from the original higher-precision values). The representation can then be used during inference to perform computations on the resulting output data.
-
Publication number: US20220382592A1
Publication date: 2022-12-01
Application number: US17374361
Filing date: 2021-07-13
Applicant: NVIDIA Corporation
Inventor: Yilin Zhang , Geng Chen , Yan Zhou , Qifei Fan , Prashant Gaikwad
Abstract: Apparatuses, systems, and techniques for scheduling deep learning tasks in hardware are described. One accelerator circuit includes multiple fixed-function circuits, each of which processes a different layer type of a neural network. A scheduler circuit receives state information associated with a respective layer being processed by a respective fixed-function circuit and dependency information that indicates a layer dependency condition for that layer. Using the state information and the dependency information, the scheduler circuit determines that the layer dependency condition is satisfied and enables the respective fixed-function circuit to process the current layer.
-
Publication number: US20210256348A1
Publication date: 2021-08-19
Application number: US17306171
Filing date: 2021-05-03
Applicant: NVIDIA Corporation
Inventor: Szymon Migacz , Hao Wu , Dilip Sequeira , Ujval Kapasi , Maxim Milakov , Slawomir Kierat , Zacky Zhou , Yilin Zhang , Alex Fit-Florea
Abstract: Aspects of the present invention are directed to computer-implemented techniques for performing data compression and conversion between data formats of varying degrees of precision, and more particularly for improving the inferencing (application) of artificial neural networks using a reduced-precision (e.g., INT8) data format. Embodiments of the present invention generate candidate conversions of data output, then employ a relative measure of quality to identify the candidate conversion with the greatest accuracy (i.e., the least divergence from the original higher-precision values). The representation can then be used during inference to perform computations on the resulting output data.
-
Publication number: US20220413752A1
Publication date: 2022-12-29
Application number: US17446257
Filing date: 2021-08-27
Applicant: NVIDIA Corporation
Inventor: Yilin Zhang , Yan Zhou , Qifei Fan
Abstract: Techniques for providing an overlap data buffer to store portions of tiles between passes of chained layers of a neural network are described. One accelerator circuit includes one or more processing units to execute instructions corresponding to the chained layers in multiple passes. In a first pass, the processing unit(s) receives a first input tile of an input feature map from a primary buffer and performs a first operation on the first input tile to obtain a first output tile. The processing unit stores the first output tile in the primary buffer and identifies a portion of the first output tile as corresponding to overlap data between tiles of the input feature map. The processing unit stores that portion in a secondary buffer. In a second pass, the processing unit retrieves the portion from the secondary buffer, avoiding both re-fetching the overlapping data and recomputing it.
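The overlap-reuse idea can be sketched in one dimension: a 3-tap filter needs two samples of halo from the previous tile, so each pass stashes its trailing samples in a secondary buffer and the next pass prepends them instead of re-fetching or recomputing. The tile size, halo width, and running-sum filter are illustrative assumptions, not parameters from the patent:

```python
# Sketch of tiled processing with a secondary overlap buffer.
# A 1-D 3-tap "convolution" stands in for the chained-layer operation.

def conv3(window):
    """3-tap filter output for one position (valid mode)."""
    return window[0] + window[1] + window[2]

def process_in_tiles(x, tile_size=4, halo=2):
    """Valid-mode 3-tap filter over x, tile by tile. The trailing `halo`
    samples of each tile go to a secondary buffer and are prepended to
    the next tile, so overlap data is never fetched or computed twice."""
    secondary = []          # overlap buffer carried between passes
    out = []
    for start in range(0, len(x), tile_size):
        tile = secondary + x[start:start + tile_size]  # reuse + primary fetch
        out += [conv3(tile[i:i + 3]) for i in range(len(tile) - 2)]
        secondary = tile[-halo:]   # stash overlap for the next pass
    return out

def conv_reference(x):
    """Untiled reference: same filter over the whole input at once."""
    return [conv3(x[i:i + 3]) for i in range(len(x) - 2)]
```

The tiled result matches the untiled reference exactly; the secondary buffer only changes where the overlap data lives between passes, not what is computed.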
-