-
Publication No.: US20190102191A1
Publication Date: 2019-04-04
Application No.: US15721313
Filing Date: 2017-09-29
Applicant: Intel Corporation
Inventor: Venkateswara MADDURI , Elmoustapha OULD-AHMED-VALL , Robert VALENTINE , Jesus CORBAL , Mark CHARNEY
IPC: G06F9/30
Abstract: Embodiments of systems, apparatuses, and methods for dual complex number by complex conjugate multiplication in a processor are described. For example, execution circuitry executes a decoded instruction to: multiplex data values from a plurality of packed data element positions in the first and second packed data source operands to at least one multiplier circuit, the first and second packed data source operands including a plurality of pairs of complex numbers, each pair of complex numbers including data values at shared packed data element positions in the first and second packed data source operands; calculate a real part and an imaginary part of the product of a first complex number and the complex conjugate of a second complex number; and store the real result to a first packed data element position in the destination operand and the imaginary result to a second packed data element position in the destination operand.
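The arithmetic described above can be sketched as a minimal Python model. It assumes each source holds its complex numbers as interleaved real/imaginary values (`[re0, im0, re1, im1, ...]`) and ignores element width, rounding, and saturation, none of which the abstract specifies; the function name `conj_mul_packed` is hypothetical. For a = a_re + i·a_im and b = b_re + i·b_im, the product a·conj(b) is (a_re·b_re + a_im·b_im) + i·(a_im·b_re - a_re·b_im).

```python
def conj_mul_packed(src1, src2):
    """Hypothetical model: for each complex pair, compute src1 * conj(src2).

    src1 and src2 are flat lists laid out as [re0, im0, re1, im1, ...]
    (an assumed layout); the result uses the same interleaved layout.
    """
    if len(src1) != len(src2) or len(src1) % 2:
        raise ValueError("sources must have equal, even length")
    dst = []
    for i in range(0, len(src1), 2):
        a_re, a_im = src1[i], src1[i + 1]  # first complex number
        b_re, b_im = src2[i], src2[i + 1]  # second complex number (to be conjugated)
        dst.append(a_re * b_re + a_im * b_im)  # real part -> even position
        dst.append(a_im * b_re - a_re * b_im)  # imaginary part -> odd position
    return dst
```

For example, (1 + 2i)·conj(3 + 4i) = (1 + 2i)(3 - 4i) = 11 + 2i, so `conj_mul_packed([1, 2], [3, 4])` returns `[11, 2]`.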
-
Publication No.: US20190102181A1
Publication Date: 2019-04-04
Application No.: US15721361
Filing Date: 2017-09-29
Applicant: Intel Corporation
Inventor: Venkateswara MADDURI , Elmoustapha OULD-AHMED-VALL , Robert VALENTINE , Mark CHARNEY
IPC: G06F9/30
Abstract: An apparatus and method for performing left-shifting operations on packed quadword data. For example, one embodiment of a processor comprises: a decoder to decode a left-shift instruction to generate a decoded left-shift instruction; a first source register to store a plurality of packed quadword data elements; execution circuitry to execute the decoded left-shift instruction, the execution circuitry comprising shift circuitry to left-shift at least first and second packed quadword data elements from first and second packed quadword data element locations, respectively, in the first source register by an amount specified in an immediate value or in a control value in a second source register, to generate first and second left-shifted quadwords; the execution circuitry to cause selection of a specified set of most significant bits of the first and second left-shifted quadwords to be written to least significant bit regions of first and second quadword data element locations, respectively, of a destination register; and the destination register to store the specified set of the most significant bits of the first and second left-shifted quadwords.
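A minimal Python model of the shift-and-select step follows. It assumes unsigned 64-bit quadwords, a shift result truncated to 64 bits, and zeroed destination bits above the selected field; the abstract pins down none of these. The function `shl_select_msbs` and its `nbits` parameter are hypothetical; `nbits=16` corresponds to the fixed 16-bit variant in US20190102177A1.

```python
MASK64 = (1 << 64) - 1


def shl_select_msbs(src, shift, nbits=16):
    """Hypothetical model: left-shift each quadword, then keep its top nbits.

    Assumptions: elements are unsigned 64-bit values, the shifted value is
    truncated to 64 bits, and the selected bits land in the low end of the
    destination element with all higher bits cleared.
    """
    if not 0 <= shift < 64:
        raise ValueError("shift must be in 0..63")
    if not 0 < nbits <= 64:
        raise ValueError("nbits must be in 1..64")
    out = []
    for q in src:
        shifted = ((q & MASK64) << shift) & MASK64  # left-shift within a quadword
        out.append(shifted >> (64 - nbits))         # move the top nbits to the LSBs
    return out
```

With a shift of 4, `0x0123456789ABCDEF` becomes `0x123456789ABCDEF0`, so the 16 selected bits are `0x1234`.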
-
Publication No.: US20190102177A1
Publication Date: 2019-04-04
Application No.: US15721382
Filing Date: 2017-09-29
Applicant: Intel Corporation
Inventor: Venkateswara MADDURI , Elmoustapha OULD-AHMED-VALL , Robert VALENTINE , Mark CHARNEY
IPC: G06F9/30
Abstract: An apparatus and method for performing left-shifting operations on packed quadword data. For example, one embodiment of a processor comprises: a decoder to decode a left-shift instruction to generate a decoded left-shift instruction; a first source register to store a plurality of packed quadword data elements; execution circuitry to execute the decoded left-shift instruction, the execution circuitry comprising shift circuitry to left-shift at least first and second packed quadword data elements from first and second packed quadword data element locations, respectively, in the first source register by an amount specified in an immediate value or in a control value in a second source register, to generate first and second left-shifted quadwords; the execution circuitry to cause selection of the 16 most significant bits of the first and second left-shifted quadwords to be written to the 16 least significant bit regions of first and second quadword data element locations, respectively, of a destination register; and the destination register to store the 16 most significant bits of the first and second left-shifted quadwords.
-
Publication No.: US20230325241A1
Publication Date: 2023-10-12
Application No.: US18043259
Filing Date: 2020-09-26
Applicant: Intel Corporation
Inventor: Andrew J. HERDRICH , Yen-Cheng LIU , Venkateswara MADDURI , Krishnakumar K. GANAPATHY , Edwin VERPLANKE , Christopher GIANOS , Hanna ALAM , Joseph NUZMAN , Larisa NOVAKOVSKY
IPC: G06F9/50
CPC classification number: G06F9/5016 , G06F2209/504
Abstract: Embodiments for allocating shared resources are disclosed. In an embodiment, an apparatus includes a core and a hardware rate selector. The hardware rate selector is to, in response to a first indication that demand for memory bandwidth from the core has reached a threshold value, determine a delay value to be used to limit allocation of memory bandwidth to the core. The hardware rate selector includes a controller having a first counter to count a second indication of demand for memory bandwidth from the first core and a second counter to count expirations of time windows. The first indication is based on a difference between the first counter value and the second counter value.
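The counter-based controller can be illustrated with a small Python model. This is only a sketch under stated assumptions: the class `RateSelector`, its parameters, and the policy of treating the counter difference like a leaky-bucket fill level (demand adds, window expirations drain) and stepping the delay up or down once per window are all hypothetical, not the patented design.

```python
class RateSelector:
    """Hypothetical model of a memory-bandwidth rate selector.

    Assumptions (not from the abstract): the difference between the two
    counters acts like a leaky-bucket fill level, the delay is re-evaluated
    once per expired time window, and it moves by a fixed step within
    [0, max_delay].
    """

    def __init__(self, threshold, step=1, max_delay=15):
        self.threshold = threshold
        self.step = step
        self.max_delay = max_delay
        self.demand_count = 0  # first counter: demand indications from the core
        self.window_count = 0  # second counter: expired time windows
        self.delay = 0         # delay value used to limit the core's bandwidth

    def record_demand(self, events=1):
        """Count indications of memory-bandwidth demand from the core."""
        self.demand_count += events

    def window_expired(self):
        """Count one window expiration and return the updated delay value."""
        self.window_count += 1
        # Threshold indication: based on the difference between the counters.
        if self.demand_count - self.window_count >= self.threshold:
            self.delay = min(self.delay + self.step, self.max_delay)
        else:
            self.delay = max(self.delay - self.step, 0)
        return self.delay
```

With `threshold=3` and five demand events, the first two window expirations raise the delay to 1 and then 2; as further windows drain the difference below the threshold, the delay steps back down to 0.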
-
Publication No.: US20220129273A1
Publication Date: 2022-04-28
Application No.: US17518235
Filing Date: 2021-11-03
Applicant: Intel Corporation
Inventor: ElMoustapha OULD-AHMED-VALL , Robert VALENTINE , Mark CHARNEY , Jesus CORBAL , Venkateswara MADDURI
Abstract: An apparatus and method for performing signed multiplication of packed signed doublewords and accumulation with a signed quadword. For example, one exemplary processor comprises three registers and execution circuitry. The execution circuitry is to multiply first and second packed signed doubleword data elements from the first register with third and fourth packed signed doubleword data elements from the second register, respectively, to generate first and second temporary products. It is also to select first, second, third, and fourth signed doubleword data elements. It is also to combine the first temporary product with a first packed signed quadword value read from the third register to generate a first accumulated result, and to combine the second temporary product with a second packed signed quadword value read from the third register to generate a second accumulated result. The third register is to store the results.
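A minimal Python model of the multiply-accumulate described above, assuming two lanes, two's-complement wrap-around of the 64-bit accumulation (the abstract does not say whether results wrap or saturate), and the hypothetical names `to_signed` and `dmul_accumulate`. The product of two signed 32-bit values always fits in 64 bits, so only the final addition can overflow.

```python
def to_signed(value, bits):
    """Reinterpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def dmul_accumulate(src1, src2, acc):
    """Hypothetical model: [acc0 + a0*b0, acc1 + a1*b1].

    src1 and src2 hold signed 32-bit doublewords; acc holds signed 64-bit
    quadwords. Accumulation wraps to 64 bits (an assumption).
    """
    out = []
    for a, b, q in zip(src1, src2, acc):
        product = to_signed(a, 32) * to_signed(b, 32)           # temporary product
        out.append(to_signed(to_signed(q, 64) + product, 64))  # accumulate, wrap
    return out
```

For instance, `dmul_accumulate([3, -4], [5, 6], [10, 100])` returns `[25, 76]`.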
-
Publication No.: US20220309005A1
Publication Date: 2022-09-29
Application No.: US17214851
Filing Date: 2021-03-27
Applicant: Intel Corporation
Inventor: Vedvyas SHANBHOGUE , Krishnakumar GANAPATHY , Venkateswara MADDURI , James ALLEN , James COLEMAN , Stephen ROBINSON
IPC: G06F12/0897 , G06F3/06
Abstract: Techniques for controlling bandwidth in a core are described. An exemplary core includes a memory bandwidth monitor per thread, local to the core, each thread's local bandwidth monitor to at least allocate bandwidth for memory requests originating from the thread according to a class of service level stored in a field of a quality of service (QoS) model-specific register (MSR), the class of service level being pointed to by a class of service field in a platform quality of service MSR; and execution resources to support execution of at least one thread of the core.
-
Publication No.: US20210357215A1
Publication Date: 2021-11-18
Application No.: US17380930
Filing Date: 2021-07-20
Applicant: Intel Corporation
Inventor: Venkateswara MADDURI , Elmoustapha OULD-AHMED-VALL , Mark CHARNEY , Robert VALENTINE , Jesus CORBAL
IPC: G06F9/30
Abstract: An apparatus and method for performing dual concurrent multiplications, subtraction/addition, and accumulation of packed data elements. For example, one embodiment of a processor comprises: a decoder to decode an instruction to generate a decoded instruction; a first source register to store first and second packed data elements; a second source register to store third and fourth packed data elements; execution circuitry to execute the decoded instruction, the execution circuitry comprising: multiplier circuitry to multiply the first and third packed data elements to generate a first temporary product and to concurrently multiply the second and fourth packed data elements to generate a second temporary product, the first through fourth packed data elements all being of a first width; circuitry to negate the first temporary product to generate a negated first product; adder circuitry to add the negated first product to a first accumulated packed data element from a third source register to generate a first result, the first result being of a second width that is at least twice the first width; the adder circuitry to concurrently add the second temporary product to a second accumulated packed data element to generate a second result of the second width; the first and second results to be stored in specified first and second data element positions within a destination register.
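The data flow above reduces to r0 = acc0 - a0·b0 and r1 = acc1 + a1·b1. The Python sketch below models it with signed 16-bit inputs and 32-bit accumulators as example widths (the abstract only requires the second width to be at least twice the first) and assumes results wrap rather than saturate; `to_signed` and `mul_negate_accumulate` are hypothetical names.

```python
def to_signed(value, bits):
    """Reinterpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mul_negate_accumulate(src1, src2, acc, in_bits=16, acc_bits=32):
    """Hypothetical model: returns [acc0 - a0*b0, acc1 + a1*b1].

    src1 = (a0, a1) and src2 = (b0, b1) hold signed in_bits-wide elements;
    acc = (c0, c1) holds signed acc_bits-wide accumulators. Results wrap to
    acc_bits (an assumption; the abstract does not say).
    """
    if acc_bits < 2 * in_bits:
        raise ValueError("accumulator must be at least twice the input width")
    (a0, a1), (b0, b1), (c0, c1) = src1, src2, acc
    p0 = to_signed(a0, in_bits) * to_signed(b0, in_bits)  # first temporary product
    p1 = to_signed(a1, in_bits) * to_signed(b1, in_bits)  # second (concurrent in hardware)
    r0 = to_signed(to_signed(c0, acc_bits) - p0, acc_bits)  # add the negated first product
    r1 = to_signed(to_signed(c1, acc_bits) + p1, acc_bits)  # add the second product
    return [r0, r1]
```

The negate-then-add on one lane and plain add on the other is the same pattern as the real and imaginary halves of a complex multiply-accumulate.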
-
Publication No.: US20230098724A1
Publication Date: 2023-03-30
Application No.: US17485374
Filing Date: 2021-09-25
Applicant: Intel Corporation
Inventor: Vedvyas SHANBHOGUE , Robert VALENTINE , Mark CHARNEY , Venkateswara MADDURI
IPC: G06F9/30
Abstract: Techniques for copying a subset of status flags from a control and status register to a flags register in response to an instruction are described. An exemplary instruction includes a field for an opcode, the opcode to indicate execution circuitry is to copy from a first register a saturation flag value, an overflow value, and a carry value to a second register into one or more instructions of a different instruction set.
-
Publication No.: US20220413861A1
Publication Date: 2022-12-29
Application No.: US17359522
Filing Date: 2021-06-26
Applicant: Intel Corporation
Inventor: Venkateswara MADDURI , Cristina ANDERSON , Robert VALENTINE , Mark CHARNEY , Vedvyas SHANBHOGUE
IPC: G06F9/30
Abstract: Techniques for matrix multiplication are described. In some examples, a single instruction is used that has a format with a field for an opcode, one or more fields to indicate a location of a source/destination operand, one or more fields to indicate a location of a first source operand, and one or more fields to indicate a location of a second source operand. The opcode indicates that execution circuitry is to: multiply values from corresponding data elements of the first and second sources; add a first subset of the multiplied values to a first value from the source/destination operand and store the sum in a first data element position of the source/destination operand; and add a second subset of the multiplied values to a second value from the source/destination operand and store the sum in a second data element position of the source/destination operand.
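The multiply-then-split-accumulate step can be sketched in Python. The abstract does not say how the products are divided, so this minimal model assumes the first subset is the first half of the products and the second subset is the second half, and it ignores element width, rounding, and saturation; `dot_accumulate_pairs` is a hypothetical name.

```python
def dot_accumulate_pairs(dst, src1, src2):
    """Hypothetical model of the multiply-and-accumulate step.

    Assumptions: the "first subset" of products is the first half and the
    "second subset" is the second half; no width, rounding, or saturation.
    dst is the source/destination operand; only positions 0 and 1 change.
    """
    if len(src1) != len(src2) or len(src1) % 2:
        raise ValueError("sources must have equal, even length")
    products = [a * b for a, b in zip(src1, src2)]  # corresponding data elements
    half = len(products) // 2
    out = list(dst)
    out[0] += sum(products[:half])  # first data element position
    out[1] += sum(products[half:])  # second data element position
    return out
```

With `dst = [1, 2]`, `src1 = [1, 2, 3, 4]`, and `src2 = [5, 6, 7, 8]`, the products are `[5, 12, 21, 32]`, giving `[1 + 17, 2 + 53] = [18, 55]`.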
-
Publication No.: US20220129268A1
Publication Date: 2022-04-28
Application No.: US17518336
Filing Date: 2021-11-03
Applicant: Intel Corporation
Inventor: Venkateswara MADDURI , ElMoustapha OULD-AHMED-VALL , Robert VALENTINE , Mark CHARNEY
IPC: G06F9/30
Abstract: An apparatus and method for performing right-shifting operations on packed quadword data. For example, one embodiment of a processor comprises a decoder to decode a right-shift instruction, a first source register to store a plurality of packed quadword data elements, and execution circuitry to execute the decoded right-shift instruction. The execution circuitry comprises shift circuitry with sign preservation logic to right-shift first and second packed quadword data elements in the first source register by an amount specified in an immediate value or in a control value in a second source register, the right-shifting to generate first and second right-shifted quadwords, the sign preservation logic to shift in the sign bit. The execution circuitry is to cause selection of 16 most significant bits of the first and second right-shifted quadwords to be written to 16 least significant bit regions of first and second quadword data element locations of a destination register.
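A minimal Python model of the sign-preserving right shift with 16-bit selection follows. It assumes each quadword is interpreted as a signed 64-bit value and that destination bits above the selected field are zeroed; the abstract does not state how the upper bits are filled. The function name `sar_select_msbs` is hypothetical. Python's `>>` on a negative `int` is an arithmetic shift, so it shifts in copies of the sign bit as the abstract describes.

```python
MASK64 = (1 << 64) - 1


def sar_select_msbs(src, shift, nbits=16):
    """Hypothetical model: arithmetic right shift, then keep the top nbits.

    Assumptions: elements are signed 64-bit quadwords and the selected bits
    are written to the low end of the destination element, upper bits zero.
    """
    if not 0 <= shift < 64:
        raise ValueError("shift must be in 0..63")
    out = []
    for q in src:
        q &= MASK64
        signed = q - (1 << 64) if q >> 63 else q   # interpret as int64
        shifted = (signed >> shift) & MASK64       # sign bit is shifted in
        out.append(shifted >> (64 - nbits))        # top nbits moved to the LSBs
    return out
```

For a negative input such as `0x8000000000000000` shifted by 4, the sign bits fill the top, giving `0xF800000000000000` and a selected field of `0xF800`; a positive input such as `0x0123456789ABCDEF` yields `0x0012`.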
-