-
公开(公告)号:US10761757B2
公开(公告)日:2020-09-01
申请号:US16024812
申请日:2018-06-30
Applicant: INTEL CORPORATION
Inventor: Krishnakumar Nair , Andrew Yang , Michael Rotzin , Nitin Garegrat , Tom Schebye , Tony Werner
Abstract: An apparatus and method for a converting tensor data. For example, one embodiment of a method comprises: fetching source tensor blocks of a source tensor data structure, each source tensor block comprising a plurality of source tensor data elements having a first numeric representation, wherein the source tensor data structure comprises a predefined structural arrangement of source tensor blocks; converting the one or more source tensor blocks into one or more destination tensor blocks comprising a plurality of destination tensor data elements having a second numeric representation different from the first numeric representation, wherein the sets of one or more source tensor blocks are converted to one or more corresponding destination tensor blocks in a specified order based on the first and second numeric representations; and storing each individual destination tensor block in a designated memory region to maintain coherency with the predefined structural arrangement of the source tensor blocks.
-
公开(公告)号:US11567555B2
公开(公告)日:2023-01-31
申请号:US16557657
申请日:2019-08-30
Applicant: Intel Corporation
Inventor: Jason Seung-Min Kim , Sundar Ramani , Yogesh Bansal , Nitin N. Garegrat , Olivia K. Wu , Mayank Kaushik , Mrinal Iyer , Tom Schebye , Andrew Yang
Abstract: Embodiments include an apparatus comprising an execution unit coupled to a memory, a microcode controller, and a hardware controller. The microcode controller is to identify a global power and performance hint in an instruction stream that includes first and second instruction phases to be executed in parallel, identify a local hint based on synchronization dependence in the first instruction phase, and use the first local hint to balance power consumption between the execution unit and the memory during parallel executions of the first and second instruction phases. The hardware controller is to use the global hint to determine an appropriate voltage level of a compute voltage and a frequency of a compute clock signal for the execution unit during the parallel executions of the first and second instruction phases. The first local hint includes a processing rate for the first instruction phase or an indication of the processing rate.
-
3.
公开(公告)号:US20190042094A1
公开(公告)日:2019-02-07
申请号:US16024812
申请日:2018-06-30
Applicant: INTEL CORPORATION
Inventor: Krishnakumar Nair , Andrew Yang , Michael Rotzn , Nitin Garegrat , Tom Schebye , Tony Werner
Abstract: An apparatus and method for a converting tensor data. For example, one embodiment of a method comprises: fetching source tensor blocks of a source tensor data structure, each source tensor block comprising a plurality of source tensor data elements having a first numeric representation, wherein the source tensor data structure comprises a predefined structural arrangement of source tensor blocks; converting the one or more source tensor blocks into one or more destination tensor blocks comprising a plurality of destination tensor data elements having a second numeric representation different from the first numeric representation, wherein the sets of one or more source tensor blocks are converted to one or more corresponding destination tensor blocks in a specified order based on the first and second numeric representations; and storing each individual destination tensor block in a designated memory region to maintain coherency with the predefined structural arrangement of the source tensor blocks.
-
-