Patent search ap:("Intel Corporation") AND inv:"MARK CHARNEY" Page 2

11.

Invention application
APPARATUS AND METHOD FOR VECTOR MULTIPLY OF SIGNED WORDS, ROUNDING, AND SATURATION (under examination, published)

Publication (announcement) No.: US20190196828A1

Publication (announcement) date: 2019-06-27

Application No.: US15850248

Application date: 2017-12-21

Applicant: Intel Corporation

Inventor: VENKATESWARA MADDURI, CARL MURRAY, ELMOUSTAPHA OULD-AHMED-VALL, MARK CHARNEY, ROBERT VALENTINE, JESUS CORBAL, MILIND GIRKAR, BRET TOLL

IPC: G06F9/30, G06F17/16

CPC classification number: G06F9/30145, G06F9/30101, G06F17/16

Abstract: An apparatus and method for performing signed fractional multiplication of packed data elements. For example, one embodiment of a processor comprises: a decoder to decode an instruction; a first source register to store a first plurality of packed signed word data elements; a second source register to store a second plurality of packed signed word data elements; a control register to store a rounding control value to indicate a rounding mode; execution circuitry to execute the decoded instruction, the execution circuitry comprising: multiplier circuitry to concurrently multiply each of the packed signed word data elements of the first plurality with a corresponding packed signed word data element of the second plurality to generate a plurality of signed doubleword products; conversion circuitry to convert the plurality of signed doubleword products to a plurality of fractional signed words, the conversion circuitry including rounding circuitry to round the signed doubleword products in accordance with the rounding mode indicated by the rounding control value to generate the plurality of fractional signed words; and a destination register to store the plurality of fractional signed words as packed signed word fractional data elements in specified data element positions within the destination register.
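
Illustrative sketch (not taken from the patent text): the multiply, round, and saturate sequence in the abstract above can be modeled lane by lane in Python. The Q15 fractional format, the two rounding-mode names, and the function name mul_q15_round_sat are assumptions made for illustration only; the abstract does not specify them.

```python
INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1

def mul_q15_round_sat(src1, src2, rounding="nearest"):
    """Lane-wise fractional multiply of two lists of signed 16-bit words.

    Assumes Q15 operands (hypothetical choice); 'rounding' stands in for
    the rounding control value mentioned in the abstract.
    """
    if len(src1) != len(src2):
        raise ValueError("both sources must hold the same number of lanes")
    out = []
    for a, b in zip(src1, src2):
        product = a * b                    # signed doubleword (Q30) product
        if rounding == "nearest":
            product += 1 << 14             # add half of one Q15 unit
        elif rounding != "truncate":
            raise ValueError(f"unknown rounding mode: {rounding}")
        word = product >> 15               # arithmetic shift back to Q15
        out.append(max(INT16_MIN, min(INT16_MAX, word)))  # saturate to 16 bits
    return out
```

In this model, saturation only triggers for -32768 multiplied by -32768, whose exact Q15 result (+1.0) does not fit in a signed word.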

12.

Invention application
APPARATUS AND METHOD FOR LEFT-SHIFTING PACKED QUADWORDS AND EXTRACTING PACKED DOUBLEWORDS (under examination, published)

Publication (announcement) No.: US20190196819A1

Publication (announcement) date: 2019-06-27

Application No.: US15850716

Application date: 2017-12-21

Applicant: Intel Corporation

Inventor: VENKATESWARA MADDURI, ELMOUSTAPHA OULD-AHMED-VALL, ROBERT VALENTINE, MARK CHARNEY

IPC: G06F9/30

CPC classification number: G06F9/30032, G06F9/3001, G06F9/30036, G06F9/30098, G06F9/30145

Abstract: An apparatus and method for performing left-shifting operations on packed quadword data. For example, one embodiment of a processor comprises: a decoder to decode a left-shift instruction to generate a decoded left-shift instruction; a first source register to store a plurality of packed quadword data elements, each of the packed quadword data elements including a sign bit; execution circuitry to execute the decoded left-shift instruction, the execution circuitry comprising shift circuitry with sign preservation logic to left-shift first and second packed quadword data elements from first and second packed quadword data element locations, respectively, in the first source register by an amount specified in an immediate value or in a control value in a second source register, the left-shifting to generate first and second left-shifted quadwords, the shift circuitry to write zeroes into bit positions exposed by the left-shifting of the packed quadword data elements; the sign preservation logic to maintain a copy of the sign bit while the shift circuitry performs the left-shift operations; the execution circuitry to cause selection of 32 most significant bits of the first and second left-shifted quadwords, including the sign bit, to be written to 32 least significant bit regions of first and second quadword data element locations, respectively, of a destination register, writing the sign bit to the most significant bit position of each 32 least significant bit region.
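
Illustrative sketch (not taken from the patent text): one possible reading of the shift-and-extract behavior described above, written in Python. Each quadword is modeled as a 64-bit pattern. The function names, the 0 to 63 shift range, and leaving the upper 32 bits of each destination lane as zero are assumptions; the abstract does not state them.

```python
MASK64 = (1 << 64) - 1

def shl_extract_high_dword(lane, shift):
    """Left-shift one 64-bit lane and return its upper 32 bits, sign preserved."""
    if not 0 <= shift < 64:
        raise ValueError("shift amount assumed to be in 0..63")
    lane &= MASK64
    sign = lane >> 63                      # copy of the original sign bit
    shifted = (lane << shift) & MASK64     # zeroes fill the exposed low bits
    high = shifted >> 32                   # 32 most significant bits
    # Put the preserved sign bit into the top bit of the 32-bit result.
    return (high & 0x7FFFFFFF) | (sign << 31)

def packed_shl_extract(src, shift):
    """Apply the lane operation to every quadword in a packed source."""
    return [shl_extract_high_dword(lane, shift) for lane in src]
```

Each returned value occupies the low 32 bits of its destination quadword location in this model.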

13.

Invention application
APPARATUS AND METHOD FOR MULTIPLYING, SUMMING, AND ACCUMULATING SETS OF PACKED BYTES (under examination, published)

Publication (announcement) No.: US20190196813A1

Publication (announcement) date: 2019-06-27

Application No.: US15850499

Application date: 2017-12-21

Applicant: Intel Corporation

Inventor: VENKATESWARA MADDURI, ELMOUSTAPHA OULD-AHMED-VALL, ROBERT VALENTINE, MARK CHARNEY, JESUS CORBAL

IPC: G06F9/30

Abstract: An apparatus and method for performing multiplication, summation, negation, sign extension, and accumulation with packed bytes. For example, one embodiment of a processor comprises: a decoder to decode an instruction to generate a decoded instruction, the instruction including an opcode, and a plurality of operands identifying a plurality of packed data source registers and a packed data destination register; a first source register to store a first plurality of packed signed bytes; a second source register to store a second plurality of packed signed bytes; execution circuitry to execute the decoded instruction, the execution circuitry comprising: multiplier circuitry to multiply each packed signed byte from the first source register with a corresponding packed signed byte from the second source register to generate a plurality of temporary products; adder circuitry to add a plurality of sets of the temporary products to generate a plurality of temporary sums; negation and extension circuitry to negate and extend each of the temporary sums to doubleword sums; and accumulation circuitry to add each of the doubleword sums to a doubleword from a third source register to generate final doubleword results; and a packed data destination register to store the final doubleword results in specified data element locations.
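
Illustrative sketch (not taken from the patent text): the multiply, sum, negate, extend, and accumulate steps above, modeled in Python. Grouping four adjacent byte products per doubleword and wrapping the result to 32 bits are assumptions, as are the names to_int32 and neg_dot_accumulate.

```python
def to_int32(x):
    """Wrap an integer to a signed 32-bit value (assumed overflow behavior)."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x

def neg_dot_accumulate(src1, src2, acc, group=4):
    """For each doubleword lane: acc - sum(products of 'group' signed bytes)."""
    if len(src1) != len(src2) or len(src1) != group * len(acc):
        raise ValueError("lane counts do not match")
    results = []
    for i, base in enumerate(acc):
        lo = i * group
        # Temporary products of corresponding signed bytes.
        products = [a * b for a, b in zip(src1[lo:lo + group], src2[lo:lo + group])]
        temp_sum = sum(products)           # temporary sum for this set
        negated = -temp_sum                # negate (Python ints need no widening)
        results.append(to_int32(base + negated))  # accumulate into the doubleword
    return results
```

With src1 = [1, 2, 3, 4, -1, -2, -3, -4], src2 = [1, 1, 1, 1, 2, 2, 2, 2] and acc = [100, 0], the two lanes evaluate to 90 and 20.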

14.

Invention application
APPARATUS AND METHOD FOR RIGHT SHIFTING PACKED QUADWORDS AND EXTRACTING PACKED DOUBLEWORDS (under examination, published)

Publication (announcement) No.: US20190196787A1

Publication (announcement) date: 2019-06-27

Application No.: US15850682

Application date: 2017-12-21

Applicant: Intel Corporation

Inventor: VENKATESWARA MADDURI, ELMOUSTAPHA OULD-AHMED-VALL, MARK CHARNEY, ROBERT VALENTINE, JESUS CORBAL

IPC: G06F7/509, G06F9/30

CPC classification number: G06F7/5095, G06F9/30101, G06F9/30145

Abstract: An apparatus and method for performing sum of absolute differences with accumulation. For example, one embodiment of a processor comprises: a decoder to decode an instruction to generate a decoded instruction; a first source register to store a first plurality of packed bytes; a second source register to store a second plurality of packed bytes; execution circuitry to execute the decoded instruction, the execution circuitry comprising: adder circuitry to determine a difference between each byte in the first source register and a corresponding byte in the second source register, absolute value circuitry to determine an absolute value of each difference, the adder circuitry to add pairs of the absolute values to generate a plurality of temporary results, and extension circuitry to extend the temporary results to temporary words; and accumulator circuitry to add each temporary word to a word from a third source register to generate a plurality of accumulated words; and a destination register to store the accumulated words as packed words.
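
Illustrative sketch (not taken from the patent text): the sum-of-absolute-differences flow in the abstract above, in Python. Treating the bytes as unsigned, pairing adjacent bytes, and wrapping the accumulated words to 16 bits are assumptions; the function name is hypothetical.

```python
def sad_pairs_accumulate(src1, src2, acc):
    """Accumulate pairwise sums of |src1[i] - src2[i]| into 16-bit words."""
    if len(src1) != len(src2) or len(src1) != 2 * len(acc):
        raise ValueError("expected two bytes per accumulator word")
    # Difference of corresponding bytes, then absolute value.
    abs_diffs = [abs(a - b) for a, b in zip(src1, src2)]
    out = []
    for i, word in enumerate(acc):
        pair_sum = abs_diffs[2 * i] + abs_diffs[2 * i + 1]  # temporary result
        out.append((word + pair_sum) & 0xFFFF)              # assumed wraparound
    return out
```

A saturating accumulator would be an equally plausible reading; the abstract does not say which applies.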

15.

Invention application
APPARATUS AND METHOD FOR PERFORMING MULTIPLICATION WITH ADDITION-SUBTRACTION OF REAL COMPONENT (under examination, published)

Publication (announcement) No.: US20190102190A1

Publication (announcement) date: 2019-04-04

Application No.: US15721145

Application date: 2017-09-29

Applicant: Intel Corporation

Inventor: VENKATESWARA MADDURI, ELMOUSTAPHA OULD-AHMED-VALL, MARK CHARNEY, ROBERT VALENTINE, JESUS CORBAL, BINWEI YANG

IPC: G06F9/30

Abstract: An apparatus and method for performing a transform on complex data. For example, one embodiment of a processor comprises: multiplier circuitry to multiply packed real N-bit data elements in the first source register with packed real M-bit data elements in the second source register and to multiply packed imaginary N-bit data elements in the first source register with packed imaginary M-bit data elements in the second source register to generate at least four real products, adder circuitry to subtract a first selected real product from a second selected real product to generate a first temporary result and to subtract a third selected real product from a fourth selected real product to generate a second temporary result, the adder circuitry to add the first temporary result to a first packed N-bit data element from the third source register to generate a first pre-scaled result, to subtract the first temporary result from the first packed N-bit data element to generate a second pre-scaled result, to add the second temporary result to a second packed N-bit data element from the third source register to generate a third pre-scaled result, and to subtract the second temporary result from the second packed N-bit data element to generate a fourth pre-scaled result; scaling circuitry to scale the first, second, third and fourth pre-scaled results to a specified bit width to generate first, second, third, and fourth final results; and a destination register to store the first, second, third, and fourth final results in specified data element positions.
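
Illustrative sketch (not taken from the patent text): one way to read the arithmetic above is as the real part of a butterfly step, shown here in Python. Which products are the "first" through "fourth" selected products, the use of an arithmetic right shift as the scaling step, and the function name butterfly_real are all assumptions. The third-source values are assumed to be already aligned to the scale of the products.

```python
def butterfly_real(x, w, c, shift=0):
    """x, w: two (real, imag) pairs each; c: two real values from the third source."""
    if len(x) != 2 or len(w) != 2 or len(c) != 2:
        raise ValueError("this sketch handles exactly two complex pairs")
    temps = []
    for (xr, xi), (wr, wi) in zip(x, w):
        real_product = xr * wr             # real x real
        imag_product = xi * wi             # imaginary x imaginary
        temps.append(real_product - imag_product)  # temporary result
    pre_scaled = [
        c[0] + temps[0],                   # first pre-scaled result
        c[0] - temps[0],                   # second pre-scaled result
        c[1] + temps[1],                   # third pre-scaled result
        c[1] - temps[1],                   # fourth pre-scaled result
    ]
    # Scaling modeled as an arithmetic right shift (assumption).
    return [p >> shift for p in pre_scaled]
```

The add and subtract pair on each temporary result is what gives the two outputs of a radix-2 butterfly from a single set of multiplications.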

16.

Invention application
APPARATUS AND METHOD FOR SCALING PRE-SCALED RESULTS OF COMPLEX MULTIPLY-ACCUMULATE OPERATIONS ON PACKED REAL AND IMAGINARY DATA ELEMENTS (granted)

Publication (announcement) No.: US20220326946A1

Publication (announcement) date: 2022-10-13

Application No.: US17589428

Application date: 2022-01-31

Applicant: Intel Corporation

Inventor: VENKATESWARA MADDURI, ELMOUSTAPHA OULD-AHMED-VALL, MARK CHARNEY, ROBERT VALENTINE, JESUS CORBAL, BINWEI YANG

IPC: G06F9/30, G06F7/544, G06F17/14, G06F7/48

Abstract: An apparatus and method for performing a transform on complex data. For example, one embodiment of a processor comprises: multiplier circuitry to multiply packed real N-bit data elements in the first source register with packed real M-bit data elements in the second source register and to multiply packed imaginary N-bit data elements in the first source register with packed imaginary M-bit data elements in the second source register to generate at least four real products, adder circuitry to subtract a first selected real product from a second selected real product to generate a first temporary result and to subtract a third selected real product from a fourth selected real product to generate a second temporary result, the adder circuitry to add the first temporary result to a first packed N-bit data element from the third source register to generate a first pre-scaled result, to subtract the first temporary result from the first packed N-bit data element to generate a second pre-scaled result, to add the second temporary result to a second packed N-bit data element from the third source register to generate a third pre-scaled result, and to subtract the second temporary result from the second packed N-bit data element to generate a fourth pre-scaled result; scaling circuitry to scale the first, second, third and fourth pre-scaled results to a specified bit width to generate first, second, third, and fourth final results; and a destination register to store the first, second, third, and fourth final results in specified data element positions.

17.

Invention application
APPARATUS AND METHOD FOR COMPLEX BY COMPLEX CONJUGATE MULTIPLICATION (granted)

Publication (announcement) No.: US20220171624A1

Publication (announcement) date: 2022-06-02

Application No.: US17672504

Application date: 2022-02-15

Applicant: Intel Corporation

Inventor: VENKATESWARA MADDURI, ELMOUSTAPHA OULD-AHMED-VALL, JESUS CORBAL, MARK CHARNEY, ROBERT VALENTINE, BINWEI YANG

IPC: G06F9/30

Abstract: An apparatus and method for multiplying packed real and imaginary components of complex numbers are described. A processor embodiment includes: a decoder to decode a first instruction to generate a decoded instruction; a first source register to store a first plurality of packed real and imaginary data elements; a second source register to store a second plurality of packed real and imaginary data elements; and execution circuitry to execute the decoded instruction. The execution circuitry includes: multiplier circuitry to select real and imaginary data elements in the first source register and second source register, multiply each selected imaginary data element in the first source register with a selected real data element in the second source register, and multiply each selected real data element in the first source register with a selected imaginary data element in the second source register to generate a plurality of imaginary products; adder circuitry to add a first subset of the plurality of imaginary products and subtract a second subset of the plurality of imaginary products to generate a first temporary result, and to add a third subset of the plurality of imaginary products and subtract a fourth subset of the plurality of imaginary products to generate a second temporary result; and accumulation circuitry to combine the first temporary result with first data from a destination register to generate a first final result, combine the second temporary result with second data from the destination register to generate a second final result, and store the first final result and second final result back in the destination register.
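
Illustrative sketch (not taken from the patent text): for a = ar + i*ai and b = br + i*bi, the imaginary part of a times the conjugate of b is ai*br - ar*bi, which matches the cross products named in the abstract above. The Python model below assumes one complex pair per result, treats "combine" as addition, and does not model overflow; the function name is hypothetical.

```python
def conj_mul_imag_accumulate(src1, src2, dest):
    """dest[k] + Im(src1[k] * conj(src2[k])) for lists of (real, imag) pairs."""
    if not (len(src1) == len(src2) == len(dest)):
        raise ValueError("sources and destination must have equal lane counts")
    out = []
    for (ar, ai), (br, bi), d in zip(src1, src2, dest):
        added = ai * br                    # imaginary x real product (added)
        subtracted = ar * bi               # real x imaginary product (subtracted)
        temp = added - subtracted          # temporary result
        out.append(d + temp)               # accumulate with destination data
    return out
```

As a check, (1 + 2i) times the conjugate of (5 + 6i) is 17 + 4i, and the sketch contributes 4 to the first lane.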

18.

Invention application
APPARATUS AND METHOD FOR PERFORMING DUAL SIGNED AND UNSIGNED MULTIPLICATION OF PACKED DATA ELEMENTS (granted)

Publication (announcement) No.: US20210004227A1

Publication (announcement) date: 2021-01-07

Application No.: US17027230

Application date: 2020-09-21

Applicant: Intel Corporation

Inventor: VENKATESWARA MADDURI, ELMOUSTAPHA OULD-AHMED-VALL, JESUS CORBAL, MARK CHARNEY, ROBERT VALENTINE, BINWEI YANG

IPC: G06F9/30, G06F7/00

Abstract: An apparatus and method for performing dual concurrent multiplications of packed data elements. For example, one embodiment of a processor comprises: a decoder to decode a first instruction to generate a decoded instruction; a first source register to store a first plurality of packed byte data elements; a second source register to store a second plurality of packed byte data elements; execution circuitry to execute the decoded instruction, the execution circuitry comprising: multiplier circuitry to concurrently multiply each of the packed byte data elements of the first plurality with a corresponding packed byte data element of the second plurality to generate a plurality of products; adder circuitry to add specified sets of the products to generate temporary results for each set of products; zero-extension or sign-extension circuitry to zero-extend or sign-extend the temporary result for each set to generate an extended temporary result for each set; accumulation circuitry to combine each of the extended temporary results with a selected packed data value stored in a third source register to generate a plurality of final results; and a destination register to store the plurality of final results as a plurality of packed data elements in specified data element positions.

19.

Invention application
APPARATUS AND METHOD FOR VECTOR MULTIPLY AND ACCUMULATE OF PACKED WORDS (under examination, published)

Publication (announcement) No.: US20190227797A1

Publication (announcement) date: 2019-07-25

Application No.: US15879420

Application date: 2018-01-24

Applicant: Intel Corporation

Inventor: ALEXANDER HEINECKE, DIPANKAR DAS, ROBERT VALENTINE, MARK CHARNEY

IPC: G06F9/30, G06F9/38

Abstract: An apparatus and method for performing multiply-accumulate operations. For example, one embodiment of a processor comprises: a decoder to decode instructions; a first source register to store a first plurality of packed words; a second source register to store a second plurality of packed words; a third source register to store a plurality of packed quadwords; execution circuitry to execute a first instruction, the execution circuitry comprising: extension circuitry to sign-extend or zero-extend the first and second plurality of packed words to generate a first and second plurality of doublewords corresponding to the first and second plurality of packed words; multiplier circuitry to multiply each of the first plurality of doublewords with a corresponding one of the second plurality of doublewords to generate a plurality of temporary products; adder circuitry to add at least a first set of the temporary products to generate a first temporary sum; accumulation circuitry to combine the first temporary sum with a first packed quadword value from a first quadword location in the third source register to generate a first accumulated quadword result; a destination register to store the first accumulated quadword result in the first quadword location.
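
Illustrative sketch (not taken from the patent text): the extend, multiply, add, and accumulate steps above, modeled in Python on raw 16-bit patterns. Summing four word products per quadword lane and wrapping the result to 64 bits are assumptions, as are the names extend_word and mac_words_to_qword.

```python
MASK64 = (1 << 64) - 1

def extend_word(w, signed):
    """Sign-extend or zero-extend a 16-bit pattern to a Python int."""
    w &= 0xFFFF
    return w - 0x10000 if signed and w & 0x8000 else w

def mac_words_to_qword(src1, src2, acc, signed=True, group=4):
    """For each quadword lane: acc + sum of 'group' extended word products."""
    if len(src1) != len(src2) or len(src1) != group * len(acc):
        raise ValueError("lane counts do not match")
    results = []
    for i, base in enumerate(acc):
        lo = i * group
        # Extend each word, then multiply corresponding elements.
        products = [
            extend_word(a, signed) * extend_word(b, signed)
            for a, b in zip(src1[lo:lo + group], src2[lo:lo + group])
        ]
        total = (base + sum(products)) & MASK64   # assumed 64-bit wraparound
        if signed and total >> 63:
            total -= 1 << 64                      # reinterpret as signed
        results.append(total)
    return results
```

The same bit pattern 0xFFFF contributes -1 when sign-extended and 65535 when zero-extended, which is why the extension choice has to be made before the multiply.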

20.

Invention application
APPARATUS AND METHOD FOR SHIFTING PACKED QUADWORDS AND EXTRACTING PACKED WORDS (under examination, published)

Publication (announcement) No.: US20190196822A1

Publication (announcement) date: 2019-06-27

Application No.: US15851145

Application date: 2017-12-21

Applicant: Intel Corporation

Inventor: ELMOUSTAPHA OULD-AHMED-VALL, ROBERT VALENTINE, MARK CHARNEY, VENKATESWARA MADDURI

IPC: G06F9/30

CPC classification number: G06F9/30032, G06F9/3001, G06F9/30036, G06F9/30098, G06F9/30145

Abstract: An apparatus and method for performing left-shifting operations on packed quadword data. For example, one embodiment of a processor comprises: a decoder to decode a left-shift instruction to generate a decoded left-shift instruction; a first source register to store a plurality of packed quadword data elements, each of the packed quadword data elements including a sign bit; execution circuitry to execute the decoded left-shift instruction, the execution circuitry comprising shift circuitry with sign preservation logic to left-shift first and second packed quadword data elements from first and second packed quadword data element locations, respectively, in the first source register by an amount specified in an immediate value or in a control value in a second source register, the left-shifting to generate first and second left-shifted quadwords, the shift circuitry to write zeroes into bit positions exposed by the left-shifting of the packed quadword data elements; the sign preservation logic to maintain a copy of the sign bit while the shift circuitry performs the left-shift operations; the execution circuitry to cause selection of 16 most significant bits of the first and second left-shifted quadwords, including the sign bit, to be written to 16 least significant bit regions of first and second quadword data element locations, respectively, of a destination register, writing the sign bit to the most significant bit position of each 16 least significant bit region.
