-
公开(公告)号:US11557064B2
公开(公告)日:2023-01-17
申请号:US16750819
申请日:2020-01-23
Applicant: Intel Corporation
Inventor: Joydeep Ray , Ben Ashbaugh , Prasoonkumar Surti , Pradeep Ramani , Rama Harihara , Jerin C. Justin , Jing Huang , Xiaoming Cui , Timothy B. Costa , Ting Gong , Elmoustapha Ould-ahmed-vall , Kumar Balasubramanian , Anil Thomas , Oguz H. Elibol , Jayaram Bobba , Guozhong Zhuang , Bhavani Subramanian , Gokce Keskin , Chandrasekaran Sakthivel , Rajesh Poornachandran
Abstract: Embodiments are generally directed to compression in machine learning and deep learning processing. An embodiment of an apparatus for compression of untyped data includes a graphical processing unit (GPU) including a data compression pipeline, the data compression pipeline including a data port coupled with one or more shader cores, wherein the data port is to allow transfer of untyped data without format conversion, and a 3D compression/decompression unit to provide for compression of untyped data to be stored to a memory subsystem and decompression of untyped data from the memory subsystem.
-
公开(公告)号:US11003619B2
公开(公告)日:2021-05-11
申请号:US16283795
申请日:2019-02-24
Applicant: INTEL CORPORATION
Inventor: Srinivasan Narayanamoorthy , Jayaram Bobba , Ankit More
Abstract: The present disclosure is directed to systems and methods for decomposing systolic array circuitry to provide a plurality of N×N systolic sub-array circuits, apportioning a first tensor or array into a plurality of N×M first input arrays, and apportioning a second tensor or array into a plurality of M×N second input arrays. Systolic array control circuitry transfers corresponding ones of the first input arrays and second input arrays to a respective one of the plurality of N×N systolic sub-array circuits. As the elements included in the first input array and the elements included in the second input array are transferred to the systolic sub-array, the systolic sub-array performs one or more mathematical operations using the first and the second input arrays. The systems and methods beneficially improve the usage of the systolic array circuitry thereby advantageously reducing the number of clock cycles needed to perform a given number of calculations.
-
公开(公告)号:US20200258263A1
公开(公告)日:2020-08-13
申请号:US16750819
申请日:2020-01-23
Applicant: Intel Corporation
Inventor: Joydeep Ray , Ben Ashbaugh , Prasoonkumar Surti , Pradeep Ramani , Rama Harihara , Jerin C. Justin , Jing Huang , Xiaoming Cui , Timothy B. Costa , Ting Gong , Elmoustapha Ould-ahmed-vall , Kumar Balasubramanian , Anil Thomas , Oguz H. Elibol , Jayaram Bobba , Guozhong Zhuang , Bhavani Subramanian , Gokce Keskin , Chandrasekaran Sakthivel , Rajesh Poornachandran
Abstract: Embodiments are generally directed to compression in machine learning and deep learning processing. An embodiment of an apparatus for compression of untyped data includes a graphical processing unit (GPU) including a data compression pipeline, the data compression pipeline including a data port coupled with one or more shader cores, wherein the data port is to allow transfer of untyped data without format conversion, and a 3D compression/decompression unit to provide for compression of untyped data to be stored to a memory subsystem and decompression of untyped data from the memory subsystem.
-
公开(公告)号:US10740152B2
公开(公告)日:2020-08-11
申请号:US15370634
申请日:2016-12-06
Applicant: Intel Corporation
Inventor: Jayaram Bobba , Niranjan K. Soundararajan
IPC: G06F9/50
Abstract: Technologies for dynamic acceleration of general-purpose code include a computing device having a general-purpose processor core and one or more hardware accelerators. The computing device identifies an acceleration candidate in an application that is targeted to the processor core. The acceleration candidate may be a long-running computation of the application. The computing device translates the acceleration candidate into a translated executable targeted to the hardware accelerator. The computing device determines whether to offload execution of the acceleration candidate and, if so, executes the translated executable with the hardware accelerator. The computing device may translate the acceleration candidate into multiple translated executables, each targeted to a different hardware accelerator. The computing device may select among the translated executables in response to determining to offload execution. The hardware accelerators may include, for example, a processor graphics, an image signal processor, or a field-programmable gate array. Other embodiments are described and claimed.
-
公开(公告)号:US20190206090A1
公开(公告)日:2019-07-04
申请号:US15859408
申请日:2017-12-30
Applicant: Intel Corporation
Inventor: Joydeep Ray , Ben Ashbaugh , Prasoonkumar Surti , Pradeep Ramani , Rama Harihara , Jerin C. Justin , Jing Huang , Xiaoming Cui , Timothy B. Costa , Ting Gong , Elmoustapha Ould-Ahmed-Vall , Kumar Balasubramanian , Anil Thomas , Oguz H. Elibol , Jayaram Bobba , Guozhong Zhuang , Bhavani Subramanian , Gokce Keskin , Chandrasekaran Sakthivel , Rajesh Poornachandran
CPC classification number: G06T9/002 , G06F12/023 , G06F2212/302 , G06F2212/401 , G06T15/005
Abstract: Embodiments are generally directed to compression in machine learning and deep learning processing. An embodiment of an apparatus for compression of untyped data includes a graphical processing unit (GPU) including a data compression pipeline, the data compression pipeline including a data port coupled with one or more shader cores, wherein the data port is to allow transfer of untyped data without format conversion, and a 3D compression/decompression unit to provide for compression of untyped data to be stored to a memory subsystem and decompression of untyped data from the memory subsystem.
-
公开(公告)号:US09880842B2
公开(公告)日:2018-01-30
申请号:US13834049
申请日:2013-03-15
Applicant: Intel Corporation
Inventor: Jayaram Bobba , Ruchira Sasanka , Jeffrey J. Cook , Abhinav Das , Arvind Krishnaswamy , David J. Sager , Jason M. Agron
CPC classification number: G06F9/3005 , G06F8/433 , G06F11/0715 , G06F11/0721 , G06F11/076 , G06F11/3466
Abstract: A mechanism for tracking the control flow of instructions in an application and performing one or more optimizations of a processing device, based on the control flow of the instructions in the application, is disclosed. Control flow data is generated to indicate the control flow of blocks of instructions in the application. The control flow data may include annotations that indicate whether optimizations may be performed for different blocks of instructions. The control flow data may also be used to track the execution of the instructions to determine whether an instruction in a block of instructions is assigned to a thread, a process, and/or an execution core of a processor, and to determine whether errors have occurred during the execution of the instructions.
-
公开(公告)号:US09170789B2
公开(公告)日:2015-10-27
申请号:US13997140
申请日:2013-03-05
Applicant: INTEL CORPORATION
Inventor: Ruchira Sasanka , Jeffrey J. Cook , Abhinav Das , Jayaram Bobba , Michael R. Greenfield , Suresh Srinivas
Abstract: Embodiments of computer-implemented methods, systems, computing devices, and computer-readable media (transitory and non-transitory) are described herein for analyzing execution of a plurality of executable instructions and, based on the analysis, providing an indication of a benefit to be obtained by vectorization of at least a subset of the plurality of executable instructions. In various embodiments, the analysis may include identification of the subset of the plurality of executable instructions suitable for conversion to one or more single-instruction multiple-data (“SIMD”) instructions.
Abstract translation: 本文描述了计算机实现的方法,系统,计算设备和计算机可读介质(暂时性和非暂时性)的实施例,用于分析多个可执行指令的执行,并且基于该分析,提供对 可以通过对多个可执行指令的至少一个子集进行向量化来获得。 在各种实施例中,分析可以包括识别适合于转换成一个或多个单指令多数据(“SIMD”)指令的多个可执行指令的子集。
-
-
-
-
-
-