-
公开(公告)号:US20240152756A1
公开(公告)日:2024-05-09
申请号:US18548805
申请日:2022-03-25
Applicant: Intel Corporation
Inventor: Barath Lakshmanan , Ashish B. Datta , Craig D. Sperry , David J. Austin , Caleb Mark McMillan , Neha Purushothaman , Rita H. Wouhaybi
IPC: G06N3/08
CPC classification number: G06N3/08
Abstract: In one embodiment, a method of training an autoencoder neural network includes determining autoencoder design parameters for the autoencoder neural network, including an input image size for an input image, a compression ratio for compression of the input image into a latent vector, and a latent vector size for the latent vector. The input image size is determined based on a resolution of training images and a size of target features to be detected. The compression ratio is determined based on entropy of the training images. The latent vector size is determined based on the compression ratio. The method further includes training the autoencoder neural network based on the autoencoder design parameters and the training dataset, and then saving the trained autoencoder neural network on a storage device.
-
公开(公告)号:US20210279571A1
公开(公告)日:2021-09-09
申请号:US17177632
申请日:2021-02-17
Applicant: Intel Corporation
Inventor: Narayan Srinivasa , Joydeep Ray , Nicolas C. Galoppo Von Borries , Ben J. Ashbaugh , Prasoonkumar Surti , Feng Chen , Barath Lakshmanan , Elmoustapha Ould-Ahmed-Vall , Liwei Ma , Linda L. Hurd , Abhishek R. Appu , John C. Weast , Sara S. Baghsorkhi , Justin E. Gottschlich , Chandrasekaran Sakthivel , Farshad Akhbari , Dukhwan Kim , Altug Koker , Nadathur Rajagopalan Satish
Abstract: An apparatus to facilitate optimization of a neural network (NN) is disclosed. The apparatus includes optimization logic to define a NN topology having one or more macro layers, adjust the one or more macro layers to adapt to input and output components of the NN and train the NN based on the one or more macro layers.
-
3.
公开(公告)号:US10409614B2
公开(公告)日:2019-09-10
申请号:US15494773
申请日:2017-04-24
Applicant: Intel Corporation
Inventor: Elmoustapha Ould-Ahmed-Vall , Barath Lakshmanan , Tatiana Shpeisman , Joydeep Ray , Ping T. Tang , Michael Strickland , Xiaoming Chen , Anbang Yao , Ben J. Ashbaugh , Linda L. Hurd , Liwei Ma
IPC: G06F9/38 , G06N3/00 , G06F15/80 , G06F9/50 , G06N3/08 , G06N3/063 , G06N3/04 , G06N20/00 , G06F9/30 , G06T1/20 , G06F13/42 , G06F13/40
Abstract: One embodiment provides for a compute apparatus to perform machine learning operations, the compute apparatus comprising instruction decode logic to decode a single instruction including multiple operands into a single decoded instruction, the multiple operands having differing precisions and a general-purpose graphics compute unit including a first logic unit and a second logic unit, the general-purpose graphics compute unit to execute the single decoded instruction, wherein to execute the single decoded instruction includes to perform a first instruction operation on a first set of operands of the multiple operands at a first precision and a simultaneously perform second instruction operation on a second set of operands of the multiple operands at a second precision.
-
公开(公告)号:US10242423B2
公开(公告)日:2019-03-26
申请号:US15789565
申请日:2017-10-20
Applicant: Intel Corporation
Inventor: Elmoustapha Ould-Ahmed-Vall , Sara S. Baghsorkhi , Anbang Yao , Kevin Nealis , Xiaoming Chen , Altug Koker , Abhishek R. Appu , John C. Weast , Mike B. Macpherson , Dukhwan Kim , Linda L. Hurd , Ben J. Ashbaugh , Barath Lakshmanan , Liwei Ma , Joydeep Ray , Ping T. Tang , Michael S. Strickland
Abstract: One embodiment provides an accelerator module comprising a memory stack including multiple memory dies; a graphics processing unit (GPU) coupled with the memory stack via one or more memory controllers, the GPU including a plurality of multiprocessors having a single instruction, multiple thread (SIMT) architecture, the multiprocessors to execute at least one single instruction; the at least one single instruction to cause at least a portion of the GPU to perform a floating-point operation on input having differing precisions; and the floating-point operation is a two-dimensional matrix multiply and accumulate operation.
-
公开(公告)号:US20180299841A1
公开(公告)日:2018-10-18
申请号:US15489142
申请日:2017-04-17
Applicant: Intel Corporation
Inventor: Abhishek R. Appu , Altug Koker , Linda L. Hurd , Dukhwan Kim , Mike B. MacPherson , John C. Weast , Justin E. Gottschlich , Jingyi Jin , Barath Lakshmanan , Chandrasekaran Sakthivel , Michael S. Strickland , Joydeep Ray , Kamal Sinha , Prasoonkumar Surti , Balaji Vembu , Ping T. Tang , Anbang Yao , Tatiana Shpeisman , Xiaoming Chen , Vasanth Ranganathan , Sanjeev S. Jahagirdar
Abstract: Methods and apparatus relating to autonomous vehicle neural network optimization techniques are described. In an embodiment, the difference between a first training dataset to be used for a neural network and a second training dataset to be used for the neural network is detected. The second training dataset is authenticated in response to the detection of the difference. The neural network is used to assist in an autonomous vehicle/driving. Other embodiments are also disclosed and claimed.
-
公开(公告)号:US20180293490A1
公开(公告)日:2018-10-11
申请号:US15482793
申请日:2017-04-09
Applicant: Intel Corporation
Inventor: Liwei Ma , Nadathur Rajagopalan Satish , Jeremy Bottleson , Farshad Akhbari , Eriko Nurvitadhi , Chandrasekaran Sakthivel , Barath Lakshmanan , Jingyi Jin , Justin E. Gottschlich , Michael Strickland
CPC classification number: G06N3/0445 , G06F9/5038 , G06F2209/5021 , G06N3/0454 , G06N3/063 , G06N3/084
Abstract: An apparatus to facilitate workload scheduling is disclosed. The apparatus includes one or more clients, one or more processing units to processes workloads received from the one or more clients, including hardware resources and scheduling logic to schedule direct access of the hardware resources to the one or more clients to process the workloads.
-
公开(公告)号:US12148063B2
公开(公告)日:2024-11-19
申请号:US17960611
申请日:2022-10-05
Applicant: Intel Corporation
Inventor: Elmoustapha Ould-Ahmed-Vall , Sara S. Baghsorkhi , Anbang Yao , Kevin Nealis , Xiaoming Chen , Altug Koker , Abhishek R. Appu , John C. Weast , Mike B. Macpherson , Dukhwan Kim , Linda L. Hurd , Ben J. Ashbaugh , Barath Lakshmanan , Liwei Ma , Joydeep Ray , Ping T. Tang , Michael S. Strickland
IPC: G06T1/20 , G06F7/483 , G06F9/30 , G06F9/38 , G06F9/50 , G06N3/044 , G06N3/045 , G06N3/063 , G06N3/084 , G06N20/00 , G06T1/60 , G06F3/14 , G06T15/00
Abstract: One embodiment provides a multi-chip module accelerator usable to execute tensor data processing operations a multi-chip module. The multi-chip module may include a memory stack including multiple memory dies and parallel processor circuitry communicatively coupled to the memory stack. The parallel processor circuitry may include multiprocessor cores to execute matrix multiplication and accumulate operations. The matrix multiplication and accumulate operations may include floating-point operations that are configurable to include two-dimensional matrix multiply and accumulate operations involving inputs that have differing floating-point precisions. The floating-point operations may include a first operation at a first precision and a second operation at a second precision. The first operation may include a multiply having at least one 16-bit floating-point input and the second operation may include an accumulate having a 32-bit floating-point input.
-
公开(公告)号:US20240256825A1
公开(公告)日:2024-08-01
申请号:US18435528
申请日:2024-02-07
Applicant: Intel Corporation
Inventor: Liwei Ma , Elmoustapha Ould-Ahmed-Vall , Barath Lakshmanan , Ben J. Ashbaugh , Jingyi Jin , Jeremy Bottleson , Mike B. Macpherson , Kevin Nealis , Dhawal Srivastava , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Anbang Yao , Tatiana Shpeisman , Altug Koker , Abhishek R. Appu
Abstract: A library of machine learning primitives is provided to optimize a machine learning model to improve the efficiency of inference operations. In one embodiment a trained convolutional neural network (CNN) model is processed into a trained CNN model via pruning, convolution window optimization, and quantization.
-
公开(公告)号:US20220245753A1
公开(公告)日:2022-08-04
申请号:US17720804
申请日:2022-04-14
Applicant: Intel Corporation
Inventor: Elmoustapha Ould-Ahmed-Vall , Sara S. Baghsorkhi , Anbang Yao , Kevin Nealis , Xiaoming Chen , Altug Koker , Abhishek R. Appu , John C. Weast , Mike B. Macpherson , Dukhwan Kim , Linda L. Hurd , Ben J. Ashbaugh , Barath Lakshmanan , Liwei Ma , Joydeep Ray , Ping T. Tang , Michael S. Strickland
IPC: G06T1/20 , G06F7/483 , G06N3/08 , G06F9/30 , G06N3/04 , G06N3/063 , G06F9/50 , G06F9/38 , G06N20/00
Abstract: Embodiments described herein provide a graphics processor that can perform a variety of mixed and multiple precision instructions and operations. One embodiment provides a streaming multiprocessor that can concurrently execute multiple thread groups, wherein the streaming multiprocessor includes a single instruction, multiple thread (SIMT) architecture and the streaming multiprocessor is to execute multiple threads for each of multiple instructions. The streaming multiprocessor can perform concurrent integer and floating-point operations and includes a mixed precision core to perform operations at multiple or mixed precisions and dynamic ranges.
-
公开(公告)号:US11315007B2
公开(公告)日:2022-04-26
申请号:US16918220
申请日:2020-07-01
Applicant: Intel Corporation
Inventor: Liwei Ma , Nadathur Rajagopalan Satish , Jeremy Bottleson , Farshad Akhbari , Eriko Nurvitadhi , Chandrasekaran Sakthivel , Barath Lakshmanan , Jingyi Jin , Justin E. Gottschlich , Michael Strikland
Abstract: An apparatus to facilitate workload scheduling is disclosed. The apparatus includes one or more clients, one or more processing units to processes workloads received from the one or more clients, including hardware resources and scheduling logic to schedule direct access of the hardware resources to the one or more clients to process the workloads.
-
-
-
-
-
-
-
-
-