-
Publication No.: US20200293456A1
Publication date: 2020-09-17
Application No.: US16354859
Filing date: 2019-03-15
Applicant: Intel Corporation
Inventors: MURALI RAMADOSS, VIKRANTH VEMULAPALLI, NIRAN COORAY, WILLIAM B. SADLER, JONATHAN D. PEARCE, MARIAN ALIN PETRE, BEN ASHBAUGH, ELMOUSTAPHA OULD-AHMED-VALL, NICOLAS GALOPPO VON BORRIES, ALTUG KOKER, ARAVINDH ANANTARAMAN, SUBRAMANIAM MAIYURAN, VARGHESE GEORGE, SUNGYE KIM, ANDREI VALENTIN
IPC classification: G06F12/1009, G06N20/00
Abstract: Methods and apparatus relating to predictive page fault handling. In an example, an apparatus comprises a processor to receive a virtual address that triggered a page fault for a compute process, check a virtual memory space for a virtual memory allocation for the compute process that triggered the page fault and manage the page fault according to one of a first protocol in response to a determination that the virtual address that triggered the page fault is a last page in the virtual memory allocation for the compute process, or a second protocol in response to a determination that the virtual address that triggered the page fault is not a last page in the virtual memory allocation for the compute process. Other embodiments are also disclosed and claimed.
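The last-page test in this abstract can be illustrated with a minimal sketch. The abstract does not say what the two protocols do, so the function only reports which one would be selected. The 4 KiB page size, the page-aligned allocation base, and all names are assumptions made for illustration.

```python
PAGE_SIZE = 4096  # assumed page size; the abstract does not specify one


def select_protocol(fault_addr, alloc_base, alloc_size, page_size=PAGE_SIZE):
    """Return "first" if the fault hit the last page of the allocation, else "second".

    Hypothetical helper: assumes alloc_base is page-aligned and alloc_size > 0.
    """
    if not (alloc_base <= fault_addr < alloc_base + alloc_size):
        raise ValueError("fault address is outside the allocation")
    fault_page = (fault_addr - alloc_base) // page_size
    last_page = (alloc_size - 1) // page_size
    return "first" if fault_page == last_page else "second"
```

For a four-page allocation, a fault anywhere in the fourth page selects the first protocol and a fault in any earlier page selects the second.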
-
Publication No.: US20200097291A1
Publication date: 2020-03-26
Application No.: US16140196
Filing date: 2018-09-24
Applicant: Intel Corporation
Inventors: CHRISTOPHER J. HUGHES, BRET TOLL, ALEXANDER HEINECKE, DAN BAUM, ELMOUSTAPHA OULD-AHMED-VALL, RAANAN SADE, ROBERT VALENTINE, MARK CHARNEY
Abstract: An apparatus and method for tile-based gather and scatter operations. For example, one embodiment of a processor comprises: a destination tile register to store a 2-D arrangement of data elements; a first source tile register to store indices associated with the data elements; instruction fetch circuitry to fetch a tile gather instruction comprising operands identifying the first source tile register and the destination tile register; a decoder to decode the tile gather instruction; and execution circuitry to determine a plurality of system memory addresses based on the indices from the first source tile register and to load the data elements from the system memory addresses to the destination tile register.
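The gather step described in this abstract can be modeled in a few lines. This is a sketch only: memory is a flat Python list, and the `base` and `scale` parameters are assumed address components that the abstract does not mention.

```python
def tile_gather(memory, base, index_tile, scale=1):
    """Build a 2-D destination tile by loading memory[base + index * scale]
    for every index in the 2-D source index tile.

    `base` and `scale` are hypothetical addressing parameters.
    """
    return [[memory[base + idx * scale] for idx in row] for row in index_tile]
```

Each destination element sits at the same row and column as the index that selected it, which keeps the 2-D arrangement of the source tile.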
-
Publication No.: US20200097290A1
Publication date: 2020-03-26
Application No.: US16560223
Filing date: 2019-09-04
Applicant: Intel Corporation
Inventors: JESUS CORBAL SAN ADRIAN, ELMOUSTAPHA OULD-AHMED-VALL, ROBERT VALENTINE, MARK J. CHARNEY, MILIND B. GIRKAR, BRET L. TOLL, ROGER ESPASA, GUILLEM SOLE, JAIRO BALART, BRIAN HICKMANN
IPC classification: G06F9/30, G06F15/80, G06F16/901
Abstract: An apparatus and method for performing a vector permute. For example, one embodiment of a processor comprises: a source vector register to store a plurality of source data elements; a destination vector register to store a plurality of destination data elements; a control vector register to store a plurality of control data elements, each control data element corresponding to one of the destination data elements and including an N bit value indicating whether a source data element is to be copied to the corresponding destination data element; vector permute logic to compare the N bit value of each control data element to an N bit portion of an immediate to determine whether to copy a source data element to the corresponding destination data element, wherein if the N bit values match, then the vector permute logic is to identify a source data element using an index value included in the control data element and to responsively copy the source data element to the corresponding destination data element in the destination vector register.
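A minimal sketch of the conditional permute described in this abstract follows. The bit layout is an assumption: each control element is taken to hold the N-bit match value in its low bits and the source index in the bits above, and the low N bits of the immediate are used for the comparison. The abstract fixes neither choice.

```python
def vector_permute(src, dst, control, imm, n_bits=2):
    """Copy src[index] into dst[i] only where the control element's N-bit
    value equals the selected N-bit portion of the immediate.

    Assumed layout (hypothetical): control = (index << n_bits) | match_value.
    """
    mask = (1 << n_bits) - 1
    wanted = imm & mask  # assumed: the low N bits of the immediate
    out = list(dst)
    for i, ctrl in enumerate(control):
        if (ctrl & mask) == wanted:
            out[i] = src[ctrl >> n_bits]
    return out
```

Destination elements whose control value does not match keep their previous contents.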
-
Publication No.: US20190318550A1
Publication date: 2019-10-17
Application No.: US16383849
Filing date: 2019-04-15
Applicant: Intel Corporation
Inventors: BARATH LAKSHMANAN, LINDA L. HURD, BEN J. ASHBAUGH, ELMOUSTAPHA OULD-AHMED-VALL, LIWEI MA, JINGYI JIN, JUSTIN E. GOTTSCHLICH, CHANDRASEKARAN SAKTHIVEL, MICHAEL S. STRICKLAND, BRIAN T. LEWIS, LINDSEY KUPER, ALTUG KOKER, ABHISHEK R. APPU, PRASOONKUMAR SURTI, JOYDEEP RAY, BALAJI VEMBU, JAVIER S. TUREK, NAILA FAROOQUI
IPC classification: G07C5/00, G06F9/50, H04W28/08, B60W30/00, H04L29/08, G01C21/34, G05D1/00, G08G1/01, G06N20/00
Abstract: One embodiment provides for a computing device within an autonomous vehicle, the compute device comprising a wireless network device to enable a wireless data connection with an autonomous vehicle network, a set of multiple processors including a general-purpose processor and a general-purpose graphics processor, the set of multiple processors to execute a compute manager to manage execution of compute workloads associated with the autonomous vehicle, the compute workload associated with autonomous operations of the autonomous vehicle, and offload logic configured to execute on the set of multiple processors, the offload logic to determine to offload one or more of the compute workloads to one or more autonomous vehicles within range of the wireless network device.
-
Publication No.: US20190196828A1
Publication date: 2019-06-27
Application No.: US15850248
Filing date: 2017-12-21
Applicant: Intel Corporation
Inventors: VENKATESWARA MADDURI, CARL MURRAY, ELMOUSTAPHA OULD-AHMED-VALL, MARK CHARNEY, ROBERT VALENTINE, JESUS CORBAL, MILIND GIRKAR, BRET TOLL
CPC classification: G06F9/30145, G06F9/30101, G06F17/16
Abstract: An apparatus and method for performing signed fractional multiplication of packed data elements. For example, one embodiment of a processor comprises: a decoder to decode an instruction; a first source register to store a first plurality of packed signed word data elements; a second source register to store a second plurality of packed signed word data elements; a control register to store a rounding control value to indicate a rounding mode; execution circuitry to execute the decoded instruction, the execution circuitry comprising: multiplier circuitry to concurrently multiply each of the packed signed word data elements of the first plurality with a corresponding packed signed word data element of the second plurality to generate a plurality of signed doubleword products; conversion circuitry to convert the plurality of signed doubleword products to a plurality of fractional signed words, the conversion circuitry including rounding circuitry to round the signed doubleword products in accordance with the rounding mode indicated by the rounding control value to generate the plurality of fractional signed words; and a destination register to store the plurality of fractional signed words as packed signed word fractional data elements in specified data element positions within the destination register.
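The multiply, round, and convert sequence in this abstract resembles ordinary fixed-point multiplication, sketched below. The Q15 interpretation of the 16-bit words, the two rounding mode names, and the wraparound on overflow are all assumptions; the abstract only says that a rounding control value selects a rounding mode.

```python
def to_signed16(x):
    """Wrap an integer to a signed 16-bit value (assumed overflow behaviour)."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def mul_frac_q15(a_words, b_words, rounding="nearest"):
    """Multiply packed signed words lane by lane and convert each signed
    doubleword product back to a fractional signed word.

    Hypothetical modes: "nearest" adds half an LSB before the shift,
    "truncate" drops the low 15 bits (rounds toward negative infinity).
    """
    out = []
    for a, b in zip(a_words, b_words):
        product = a * b  # fits in a signed doubleword for 16-bit inputs
        if rounding == "nearest":
            product += 1 << 14
        elif rounding != "truncate":
            raise ValueError("unknown rounding mode")
        out.append(to_signed16(product >> 15))
    return out
```

In Q15, 16384 represents 0.5, so multiplying 16384 by 16384 yields 8192, which represents 0.25.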
-
Publication No.: US20190196819A1
Publication date: 2019-06-27
Application No.: US15850716
Filing date: 2017-12-21
Applicant: Intel Corporation
IPC classification: G06F9/30
CPC classification: G06F9/30032, G06F9/3001, G06F9/30036, G06F9/30098, G06F9/30145
Abstract: An apparatus and method for performing right-shifting operations on packed quadword data. For example, one embodiment of a processor comprises: a decoder to decode a left-shift instruction to generate a decoded left-shift instruction; a first source register to store a plurality of packed quadword data elements, each of the packed quadword data elements including a sign bit; execution circuitry to execute the decoded left-shift instruction, the execution circuitry comprising shift circuitry with sign preservation logic to left-shift first and second packed quadword data elements from first and second packed quadword data element locations, respectively, in the first source register by an amount specified in an immediate value or in a control value in a second source register, the left-shifting to generate first and second left-shifted quadwords, the shift circuitry to write zeroes into bit positions exposed by the left-shifting of the packed quadword data elements; the sign preservation logic to maintain a copy of the sign bit while the shift circuitry performs the left-shift operations; the execution circuitry to cause selection of 32 most significant bits of the first and second left-shifted quadwords, including the sign bit, to be written to 32 least significant bit regions of first and second quadword data element locations, respectively, of a destination register, writing the sign bit to the most significant bit position of each 32 least significant bit region.
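The opening sentence of this abstract mentions right-shifting, while the embodiment it goes on to describe is a left shift; the sketch below follows the embodiment. It models one reading of the text: shift each quadword left with zero fill, take the upper 32 bits, and force the top bit of that doubleword to the original sign bit. Leaving the upper half of each destination quadword at zero is an assumption, as are all names.

```python
MASK64 = (1 << 64) - 1


def shift_left_extract_dword(qword, shift):
    """Left-shift a 64-bit value and return a 32-bit result whose bit 31 is
    the original sign bit and whose low 31 bits come from the upper half of
    the shifted quadword."""
    qword &= MASK64
    sign = qword >> 63                   # copy of the sign bit, kept aside
    shifted = (qword << shift) & MASK64  # zeroes enter from the right
    upper = shifted >> 32                # 32 most significant bits
    return (sign << 31) | (upper & 0x7FFFFFFF)


def packed_shift_left_extract(src_qwords, shift):
    """Apply the operation to every quadword; each result occupies the low
    32 bits of its destination quadword (upper 32 bits assumed zero)."""
    return [shift_left_extract_dword(q, shift) for q in src_qwords]
```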
-
Publication No.: US20190196813A1
Publication date: 2019-06-27
Application No.: US15850499
Filing date: 2017-12-21
Applicant: Intel Corporation
Inventors: VENKATESWARA MADDURI, ELMOUSTAPHA OULD-AHMED-VALL, ROBERT VALENTINE, MARK CHARNEY, JESUS CORBAL
IPC classification: G06F9/30
Abstract: An apparatus and method for performing multiplication, summation, negation, sign extension, and accumulation with packed bytes. For example, one embodiment of a processor comprises: a decoder to decode an instruction to generate a decoded instruction, the instruction including an opcode, and a plurality of operands identifying a plurality of packed data source registers and a packed data destination register; a first source register to store a first plurality of packed signed bytes; a second source register to store a second plurality of packed signed bytes; execution circuitry to execute the decoded instruction, the execution circuitry comprising: multiplier circuitry to multiply each packed signed byte from the first source register with a corresponding packed signed byte from the second source register to generate a plurality of temporary products, adder circuitry to add a plurality of sets of the temporary products to generate a plurality of temporary sums; negation and extension circuitry to negate and extend each of the temporary sums to doubleword sums; and accumulation circuitry to add each of the doubleword sums to a doubleword from a third source register to generate final doubleword results; and a packed data destination register to store the final doubleword results in specified data element locations.
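The data flow in this abstract (multiply, sum in sets, negate, extend, accumulate) can be sketched as follows. Grouping four byte products per doubleword lane and wrapping the result to 32 bits are assumptions; the abstract does not state the set size or the overflow behaviour.

```python
def to_signed32(x):
    """Wrap an integer to a signed 32-bit value (assumed overflow behaviour)."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x & 0x80000000 else x


def mul_sum_negate_accumulate(a_bytes, b_bytes, acc_dwords, group=4):
    """For each doubleword lane: multiply `group` pairs of signed bytes, sum
    the products, negate the sum, and add it to the accumulator doubleword.

    `group` is a hypothetical set size.
    """
    results = []
    for lane, acc in enumerate(acc_dwords):
        lo = lane * group
        products = [a * b for a, b in zip(a_bytes[lo:lo + group], b_bytes[lo:lo + group])]
        negated_sum = -sum(products)  # Python ints are unbounded, so extension is implicit
        results.append(to_signed32(acc + negated_sum))
    return results
```

The net effect per lane is the accumulator minus a four-element dot product.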
-
Publication No.: US20190196787A1
Publication date: 2019-06-27
Application No.: US15850682
Filing date: 2017-12-21
Applicant: Intel Corporation
Inventors: VENKATESWARA MADDURI, ELMOUSTAPHA OULD-AHMED-VALL, MARK CHARNEY, ROBERT VALENTINE, JESUS CORBAL
CPC classification: G06F7/5095, G06F9/30101, G06F9/30145
Abstract: An apparatus and method for performing sum of absolute differences with accumulation. For example, one embodiment of a processor comprises: a decoder to decode an instruction to generate a decoded instruction; a first source register to store a first plurality of packed bytes; a second source register to store a second plurality of packed bytes; execution circuitry to execute the decoded instruction, the execution circuitry comprising: adder circuitry to determine a difference between each byte in the first source register and a corresponding byte in the second source register, absolute value circuitry to determine an absolute value of each difference, the adder circuitry to add pairs of the absolute values to generate a plurality of temporary results, and extension circuitry to extend the temporary results to temporary words; and accumulator circuitry to add each temporary word to a word from a third source register to generate a plurality of accumulated words; and a destination register to store the accumulated words as packed words.
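A minimal sketch of the sum-of-absolute-differences flow in this abstract: byte differences, absolute values, pairwise sums, then accumulation into words. Treating the bytes as unsigned, pairing adjacent bytes, and wrapping the accumulated word to 16 bits are assumptions not stated in the abstract.

```python
def sad_accumulate(a_bytes, b_bytes, acc_words):
    """For each word lane: add the absolute differences of one adjacent byte
    pair, then add that sum to the accumulator word from the third source.

    Assumes unsigned byte inputs and 16-bit wraparound (hypothetical).
    """
    abs_diffs = [abs(a - b) for a, b in zip(a_bytes, b_bytes)]
    results = []
    for lane, acc in enumerate(acc_words):
        pair_sum = abs_diffs[2 * lane] + abs_diffs[2 * lane + 1]
        results.append((acc + pair_sum) & 0xFFFF)
    return results
```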
-
Publication No.: US20190146800A1
Publication date: 2019-05-16
Application No.: US16227645
Filing date: 2018-12-20
Applicant: Intel Corporation
Inventors: ELMOUSTAPHA OULD-AHMED-VALL, BARATH LAKSHMANAN, TATIANA SHPEISMAN, JOYDEEP RAY, PING T. TANG, MICHAEL STRICKLAND, XIAOMING CHEN, ANBANG YAO, BEN J. ASHBAUGH, LINDA L. HURD, LIWEI MA
IPC classification: G06F9/38, G06N20/00, G06F15/80, G06F13/42, G06F9/30, G06F13/40, G06T1/20, G06N3/00, G06F9/50
Abstract: One embodiment provides for a general-purpose graphics processing unit comprising a streaming multiprocessor having a single instruction, multiple thread (SIMT) architecture including hardware multithreading. The streaming multiprocessor comprises multiple processing blocks including multiple processing cores. The processing cores include independent integer and floating-point data paths that are configurable to concurrently execute multiple independent instructions. A memory is coupled with the multiple processing blocks.
-
Publication No.: US20190102190A1
Publication date: 2019-04-04
Application No.: US15721145
Filing date: 2017-09-29
Applicant: Intel Corporation
Inventors: VENKATESWARA MADDURI, ELMOUSTAPHA OULD-AHMED-VALL, MARK CHARNEY, ROBERT VALENTINE, JESUS CORBAL, BINWEI YANG
IPC classification: G06F9/30
Abstract: An apparatus and method for performing a transform on complex data. For example, one embodiment of a processor comprises: multiplier circuitry to multiply packed real N-bit data elements in the first source register with packed real M-bit data elements in the second source register and to multiply packed imaginary N-bit data elements in the first source register with packed imaginary M-bit data elements in the second source register to generate at least four real products, adder circuitry to subtract a first selected real product from a second selected real product to generate a first temporary result and to subtract a third selected real product from a fourth selected real product to generate a second temporary result, the adder circuitry to add the first temporary result to a first packed N-bit data element from the third source register to generate a first pre-scaled result, to subtract the first temporary result from the first packed N-bit data element to generate a second pre-scaled result, to add the second temporary result to a second packed N-bit data element from the third source register to generate a third pre-scaled result, and to subtract the second temporary result from the second packed N-bit data element to generate a fourth pre-scaled result; scaling circuitry to scale the first, second, third and fourth pre-scaled results to a specified bit width to generate first, second, third, and fourth final results; and a destination register to store the first, second, third, and fourth final results in specified data element positions.
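The arithmetic in this abstract has the shape of a butterfly step, and one possible reading is sketched below. Several choices are assumptions: each of the first two source registers is taken to hold two complex elements, the "selected" products are paired as real-times-real minus imaginary-times-imaginary for each element, and scaling is modeled as an arithmetic right shift.

```python
def complex_butterfly(src1, src2, src3, shift=1):
    """src1 and src2 each hold two (real, imag) integer pairs; src3 holds two
    integers. Returns the four scaled results in the order of the abstract.

    The product selection and shift-based scaling are hypothetical.
    """
    (ar0, ai0), (ar1, ai1) = src1
    (br0, bi0), (br1, bi1) = src2
    c0, c1 = src3
    # Four real products: imag*imag and real*real for each complex element.
    p1, p2 = ai0 * bi0, ar0 * br0
    p3, p4 = ai1 * bi1, ar1 * br1
    t1 = p2 - p1  # first temporary result
    t2 = p4 - p3  # second temporary result
    pre_scaled = [c0 + t1, c0 - t1, c1 + t2, c1 - t2]
    return [v >> shift for v in pre_scaled]
```

With this pairing, each temporary result equals the real part of the corresponding complex product.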