-
61.
公开(公告)号:US20220215117A1
公开(公告)日:2022-07-07
申请号:US17677958
申请日:2022-02-22
Applicant: Intel Corporation
Inventor: ELMOUSTAPHA OULD-AHMED-VALL , ROBERT VALENTINE , JESUS CORBAL , BRET L. TOLL , MARK J. CHARNEY
Abstract: An apparatus is described that includes an execution unit to execute a first instruction and a second instruction. The execution unit includes input register space to store a first data structure to be replicated when executing the first instruction and to store a second data structure to be replicated when executing the second instruction. The first and second data structures are both packed data structures. Data values of the first packed data structure are twice as large as data values of the second packed data structure. The execution unit also includes replication logic circuitry to replicate the first data structure when executing the first instruction to create a first replication data structure, and, to replicate the second data structure when executing the second data instruction to create a second replication data structure. The execution unit also includes masking logic circuitry to mask the first replication data structure at a first granularity and mask the second replication data structure at a second granularity. The second granularity is twice as fine as the first granularity.
-
公开(公告)号:US20220171624A1
公开(公告)日:2022-06-02
申请号:US17672504
申请日:2022-02-15
Applicant: Intel Corporation
Inventor: VENKATESWARA MADDURI , ELMOUSTAPHA OULD-AHMED-VALL , JESUS CORBAL , MARK CHARNEY , ROBERT VALENTINE , BINWEI YANG
IPC: G06F9/30
Abstract: An apparatus and method for multiplying packed real and imaginary components of complex numbers are described. A processor embodiment includes: a decoder to decode a first instruction to generate a decoded instruction; a first source register to store a first plurality of packed real and imaginary data elements; a second source register to store a second plurality of packed real and imaginary data elements; and execution circuitry to execute the decoded instruction. The execution circuitry includes: multiplier circuitry to select real and imaginary data elements in the first source register and second source, multiply each selected imaginary data element in the first source register with a selected real data element in the second source register, and multiply each selected real data element in the first source register with a selected imaginary data element in the second source register to generate a plurality of imaginary products; adder circuitry to add a first subset of the plurality of imaginary products and subtract a second subset of the plurality of imaginary products to generate a first temporary result, and to add a third subset of the plurality of imaginary products and subtract a fourth subset of the plurality of imaginary products to generate a second temporary result; and accumulation circuitry to combine the first temporary result with first data from a destination register to generate a first final result, combine the second temporary result with second data from the destination register to generate a second final result, and store the first final result and second final result back in the destination register.
-
公开(公告)号:US20220122215A1
公开(公告)日:2022-04-21
申请号:US17428216
申请日:2020-03-14
Applicant: Intel Corporation
Inventor: JOYDEEP RAY , SELVAKUMAR PANNEER , SAURABH TANGRI , BEN ASHBAUGH , SCOTT JANUS , ABHISHEK APPU , VARGHESE GEORGE , RAVISHANKAR IYER , NILESH JAIN , PATTABHIRAMAN K , ALTUG KOKER , MIKE MACPHERSON , JOSH MASTRONARDE , ELMOUSTAPHA OULD-AHMED-VALL , JAYAKRISHNA P. S , ERIC SAMSON
IPC: G06T1/60 , G06F12/06 , G06F12/1009 , G06T1/20 , G06F12/0875 , G06F9/38
Abstract: Embodiments described herein include software, firmware, and hardware that provides techniques to enable deterministic scheduling across multiple general-purpose graphics processing units. One embodiment provides a multi-GPU architecture with uniform latency. One embodiment provides techniques to distribute memory output based on memory chip thermals. One embodiment provides techniques to enable thermally aware workload scheduling. One embodiment provides techniques to enable end to end contracts for workload scheduling on multiple GPUs.
-
公开(公告)号:US20220066931A1
公开(公告)日:2022-03-03
申请号:US17310540
申请日:2020-03-14
Applicant: INTEL CORPORATION
Inventor: JOYDEEP RAY , NIRANJAN COORAY , SUBRAMANIAM MAIYURAN , ALTUG KOKER , PRASOONKUMAR SURTI , VARGHESE GEORGE , VALENTIN ANDREI , ABHISHEK APPU , GUADALUPE GARCIA , PATTABHIRAMAN K , SUNGYE KIM , SANJAY KUMAR , PRATIK MAROLIA , ELMOUSTAPHA OULD-AHMED-VALL , VASANTH RANGANATHAN , WILLIAM SADLER , LAKSHMINARAYANAN STRIRAMASSARMA
IPC: G06F12/0802
Abstract: Embodiments described herein provide techniques to enable the dynamic reconfiguration of memory on a general-purpose graphics processing unit. One embodiment described herein enables dynamic reconfiguration of cache memory bank assignments based on hardware statistics. One embodiment enables for virtual memory address translation using mixed four kilobyte and sixty-four kilobyte pages within the same page table hierarchy and under the same page directory. One embodiment provides for a graphics processor and associated heterogenous processing system having near and far regions of the same level of a cache hierarchy.
-
65.
公开(公告)号:US20210182067A1
公开(公告)日:2021-06-17
申请号:US16714656
申请日:2019-12-13
Applicant: Intel Corporation
Inventor: MOHAMED ELMALAKI , ELMOUSTAPHA OULD-AHMED-VALL
IPC: G06F9/30
Abstract: Systems, methods, and apparatuses relating to instructions to multiply floating-point values of about one are described. In one embodiment, a hardware processor includes a decoder to decode a single instruction into a decoded single instruction, the single instruction having a first field that identifies a first floating-point number, a second field that identifies a second floating-point number, and a third field that indicates an about one threshold; and an execution circuit to execute the decoded single instruction to: cause a first comparison of an exponent of the first floating-point number to the about one threshold, cause a second comparison of an exponent of the second floating-point number to the about one threshold, provide as a resultant of the single instruction a value of the first floating-point number one when both the first comparison indicates the exponent of the first floating-point number does not exceed the about one threshold and the second comparison indicates the exponent of the second floating-point number does not exceed the about one threshold, provide as the resultant of the single instruction the second floating-point number when the first comparison indicates the exponent of the first floating-point number does not exceed the about one threshold, and provide as the resultant of the single instruction a product of a multiplication of the first floating-point number and the second floating-point number when the first comparison indicates the exponent of the first floating-point number exceeds the about one threshold or and the second comparison indicates the exponent of the second floating-point number exceeds the about one threshold.
-
66.
公开(公告)号:US20210004227A1
公开(公告)日:2021-01-07
申请号:US17027230
申请日:2020-09-21
Applicant: Intel Corporation
Inventor: VENKATESWARA MADDURI , ELMOUSTAPHA OULD-AHMED-VALL , JESUS CORBAL , MARK CHARNEY , ROBERT VALENTINE , BINWEI YANG
Abstract: An apparatus and method for performing dual concurrent multiplications of packed data elements. For example one embodiment of a processor comprises: a decoder to decode a first instruction to generate a decoded instruction; a first source register to store a first plurality of packed byte data elements; a second source register to store a second plurality of packed byte data elements; execution circuitry to execute the decoded instruction, the execution circuitry comprising: multiplier circuitry to concurrently multiply each of the packed byte data elements of the first plurality with a corresponding packed byte data element of the second plurality to generate a plurality of products; adder circuitry to add specified sets of the products to generate temporary results for each set of products; zero-extension or sign-extension circuitry to zero-extend or sign-extend the temporary result for each set to generate an extended temporary result for each set; accumulation circuitry to combine each of the extended temporary results with a selected packed data value stored in a third source register to generate a plurality of final results; and a destination register to store the plurality of final results as a plurality of packed data elements in specified data element positions.
-
公开(公告)号:US20200293488A1
公开(公告)日:2020-09-17
申请号:US16354782
申请日:2019-03-15
Applicant: Intel Corporation
Inventor: JOYDEEP RAY , ARAVINDH ANANTARAMAN , ABHISHEK R. APPU , ALTUG KOKER , ELMOUSTAPHA OULD-AHMED-VALL , VALENTIN ANDREI , SUBRAMANIAM MAIYURAN , NICOLAS GALAPPO VON BORRIES , VARGHESE GEORGE , MIKE MACPHERSON , BEN ASHBAUGH , MURALI RAMADOSS , VIKRANTH VEMULAPALLI , WILLIAM SADLER , JONATHAN PEARCE , SUNGYE KIM
Abstract: Methods and apparatus relating to scalar core integration in a graphics processor. In an example, an apparatus comprises a processor to receive a set of workload instructions for a graphics workload from a host complex, determine a first subset of operations in the set of operations that is suitable for execution by a scalar processor complex of the graphics processing device and a second subset of operations in the set of operations that is suitable for execution by a vector processor complex of the graphics processing device, assign the first subset of operations to the scalar processor complex for execution to generate a first set of outputs, assign the second subset of operations to the vector processor complex for execution to generate a second set of outputs. Other embodiments are also disclosed and claimed.
-
公开(公告)号:US20190196822A1
公开(公告)日:2019-06-27
申请号:US15851145
申请日:2017-12-21
Applicant: Intel Corporation
Inventor: ELMOUSTAPHA OULD-AHMED-VALL , ROBERT VALENTINE , MARK CHARNEY , VENKATESWARA MADDURI
IPC: G06F9/30
CPC classification number: G06F9/30032 , G06F9/3001 , G06F9/30036 , G06F9/30098 , G06F9/30145
Abstract: An apparatus and method for performing left-shifting operations on packed quadword data. For example, one embodiment of a processor comprises: a decoder to decode a left-shift instruction to generate a decoded left-shift instruction; a first source register to store a plurality of packed quadword data elements, each of the packed quadword data elements including a sign bit; execution circuitry to execute the decoded left-shift instruction, the execution circuitry comprising shift circuitry with sign preservation logic to left-shift first and second packed quadword data elements from first and second packed quadword data element locations, respectively, in the first source register by an amount specified in an immediate value or in a control value in a second source register, the left-shifting to generate first and second left-shifted quadwords, the shift circuitry to write zeroes into bit positions exposed by the left-shifting of the packed quadword data elements; the sign preservation logic to maintain a copy of the sign bit while the shift circuitry performs the left-shift operations; the execution circuitry to cause selection of 16 most significant bits of the first and second left-shifted quadwords, including the sign bit, to be written to 16 least significant bit regions of first and second quadword data element locations, respectively, of a destination register, writing the sign bit to the most significant bit position of each 16 least significant bit region.
-
公开(公告)号:US20190196812A1
公开(公告)日:2019-06-27
申请号:US15850412
申请日:2017-12-21
Applicant: Intel Corporation
Inventor: ELMOUSTAPHA OULD-AHMED-VALL , ROBERT VALENTINE , MARK CHARNEY , JESUS CORBAL , VENKATESWARA MADDURI
IPC: G06F9/30
Abstract: An apparatus and method for performing signed multiplication of packed signed/unsigned doublewords and accumulation with a quadword. For example, one embodiment of a processor comprises: a first source register to store a first plurality of packed doubleword data elements; a second source register to store a second plurality of packed doubleword data elements; a third source register to store a plurality of packed quadword data elements; execution circuitry to execute the decoded instruction, the execution circuitry comprising: multiplier circuitry to multiply first and second packed doubleword data elements from the first source register with third and fourth packed doubleword data elements from the second source register, respectively, to generate first and second temporary quadword products, the multiplier circuitry to select the first, second, third, and fourth doubleword data elements based on the opcode of the instruction; accumulation circuitry to combine the first temporary quadword product with a first packed quadword value read from the third source register to generate a first accumulated quadword result and to combine the second temporary quadword product with a second packed quadword value read from the third source register to generate a second accumulated quadword result; a destination register or the third source register to store the first accumulated quadword result in a first quadword data element position and to store the second accumulated quadword result in a second quadword data element position.
-
公开(公告)号:US20190129721A1
公开(公告)日:2019-05-02
申请号:US16233955
申请日:2018-12-27
Applicant: Intel Corporation
Inventor: MIKHAIL PLOTNIKOV , ANDREY NARAIKIN , ELMOUSTAPHA OULD-AHMED-VALL
Abstract: In an embodiment, the present invention is directed to a processor including a decode logic to receive a multi-dimensional loop counter update instruction and to decode the multi-dimensional loop counter update instruction into at least one decoded instruction, and an execution logic to execute the at least one decoded instruction to update at least one loop counter value of a first operand associated with the multi-dimensional loop counter update instruction by a first amount. Methods to collapse loops using such instructions are also disclosed. Other embodiments are described and claimed.
-
-
-
-
-
-
-
-
-