-
Publication number: US20170177355A1
Publication date: 2017-06-22
Application number: US14975380
Filing date: 2015-12-18
Applicant: Intel Corporation
Inventor: Elmoustapha Ould-Ahmed-Vall, Suleyman Sair, Joonmoo Huh
CPC classification number: G06F9/30036, G06F9/30, G06F9/30032, G06F9/30043, G06F9/30101, G06F9/3016, G06F9/3455, G06F12/0875, G06F2212/452
Abstract: A processor includes a core to execute an instruction and logic to determine that the instruction will require strided data converted from source data in memory. The strided data is to include corresponding indexed elements from structures in the source data to be loaded into a final register to be used to execute the instruction. The core also includes logic to load source data into a plurality of preliminary vector registers to align a defined element of one of the preliminary vector registers in a position that corresponds to a required position in the final register for execution. The core includes logic to apply permute instructions to contents of the preliminary vector registers to cause corresponding indexed elements from the structures to be loaded into respective source vector registers.
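The permute-based conversion described in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented logic: registers are modeled as Python lists, the lane count (`LANES = 4`), the structure size (`STRIDE = 3`, for example x, y, z fields), and the helper names `load_vectors`, `permute`, and `deinterleave` are all hypothetical. The model lets one permute read from any number of source registers, which real permute instructions generally do not, and it does not model the alignment step the abstract mentions.

```python
LANES = 4   # assumed number of elements per vector register
STRIDE = 3  # assumed number of elements per structure, e.g. (x, y, z)

def load_vectors(memory, count):
    """Load `count` consecutive LANES-wide preliminary registers from memory."""
    return [memory[i * LANES:(i + 1) * LANES] for i in range(count)]

def permute(sources, index):
    """Simplified permute: each index entry selects (register, lane)."""
    return [sources[reg][lane] for reg, lane in index]

def deinterleave(memory):
    """Gather element `field` of every structure into its own register."""
    prelim = load_vectors(memory, STRIDE)
    result = []
    for field in range(STRIDE):
        # Flat position of this field in structure s, split into
        # (register number, lane number) with divmod.
        index = [divmod(s * STRIDE + field, LANES) for s in range(LANES)]
        result.append(permute(prelim, index))
    return result
```

With memory holding four three-element structures back to back, `deinterleave` returns one list per field, each holding that field from all four structures in structure order.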
-
Publication number: US10152321B2
Publication date: 2018-12-11
Application number: US14974729
Filing date: 2015-12-18
Applicant: Intel Corporation
Inventor: Elmoustapha Ould-Ahmed-Vall, Suleyman Sair, Joonmoo Huh
IPC: G06F9/30
Abstract: A processor includes a core to execute an instruction and logic to determine that the instruction will require strided data converted from source data in memory. The strided data is to include corresponding indexed elements from structures in the source data to be loaded into a same register to be used to execute the instruction. The core also includes logic to load source data into preliminary vector registers. The source data is to be unaligned as resident in the vector registers. The core includes logic to apply blend instructions to contents of the preliminary vector registers to cause corresponding indexed elements from the plurality of structures to be loaded into respective interim vector registers, and to apply further blend instructions to contents of the interim vector registers to cause additional indexed elements from the structures to be loaded into respective source vector registers.
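The two-stage blend idea in this abstract can be sketched in software. A blend only selects, lane by lane, between two registers and never moves an element to a different lane. The sketch below is an illustration under assumptions, not the patented logic: registers are Python lists, `LANES = 4` and `STRIDE = 3` are assumed sizes, and `blend` and `gather_field_with_blends` are hypothetical helper names. It relies on the stride and lane count sharing no common factor, so that the wanted elements of one field occupy distinct lanes across the three preliminary registers.

```python
LANES = 4   # assumed number of elements per vector register
STRIDE = 3  # assumed number of elements per structure

def blend(a, b, mask):
    """Lane-wise select: take b[i] where mask[i] is true, otherwise a[i]."""
    return [b[i] if mask[i] else a[i] for i in range(LANES)]

def gather_field_with_blends(regs, field):
    """Collect one field of every structure using two successive blends."""
    # owner[lane] = index of the register holding this field in that lane.
    owner = [None] * LANES
    for s in range(LANES):
        reg, lane = divmod(s * STRIDE + field, LANES)
        owner[lane] = reg
    # First blend: merge registers 0 and 1 into an interim register.
    interim = blend(regs[0], regs[1], [o == 1 for o in owner])
    # Second blend: merge the interim register with register 2.
    return blend(interim, regs[2], [o == 2 for o in owner])
```

The resulting register holds every element of the requested field, but in a rotated lane order rather than structure order, because blends keep each element in its original lane. Any reordering would need an additional step that this sketch does not model.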
-
Publication number: US20170177345A1
Publication date: 2017-06-22
Application number: US14975390
Filing date: 2015-12-18
Applicant: Intel Corporation
Inventor: Elmoustapha Ould-Ahmed-Vall, Suleyman Sair, Joonmoo Huh
IPC: G06F9/30
CPC classification number: G06F9/30029, G06F9/30032, G06F9/30036, G06F9/30043, G06F9/30101, G06F9/3016, G06F9/3455
Abstract: A processor includes a core to execute an instruction and logic to determine that the instruction will require strided data converted from source data in memory. The strided data is to include corresponding indexed elements from a plurality of structures in the source data to be loaded into a same register to be used to execute the instruction. The core also includes logic to load source data into a plurality of preliminary vector registers with a first indexed layout of elements and a second indexed layout of elements. A plurality of the preliminary vector registers are to be loaded with the first indexed layout of elements. A common register of the preliminary vector registers is to be loaded with the second indexed layout of elements. The core also includes logic to apply permute instructions to contents of the preliminary vector registers to cause corresponding indexed elements from the plurality of structures to be loaded into respective source vector registers.
-
Publication number: US20170177344A1
Publication date: 2017-06-22
Application number: US14974729
Filing date: 2015-12-18
Applicant: Intel Corporation
Inventor: Elmoustapha Ould-Ahmed-Vall, Suleyman Sair, Joonmoo Huh
IPC: G06F9/30
CPC classification number: G06F9/30029, G06F9/30, G06F9/30036, G06F9/30101, G06F9/3016
Abstract: A processor includes a core to execute an instruction and logic to determine that the instruction will require strided data converted from source data in memory. The strided data is to include corresponding indexed elements from structures in the source data to be loaded into a same register to be used to execute the instruction. The core also includes logic to load source data into preliminary vector registers. The source data is to be unaligned as resident in the vector registers. The core includes logic to apply blend instructions to contents of the preliminary vector registers to cause corresponding indexed elements from the plurality of structures to be loaded into respective interim vector registers, and to apply further blend instructions to contents of the interim vector registers to cause additional indexed elements from the structures to be loaded into respective source vector registers.
-