-
Publication number: US20200073662A1
Publication date: 2020-03-05
Application number: US16537460
Filing date: 2019-08-09
Applicant: Advanced Micro Devices, Inc.
Inventor: Anupama Rajesh Rasale
Abstract: A computing system includes a processing unit and a memory storing instructions that, when executed by the processor, cause the processor to receive program source code in a compiler, identify in the program source code a set of operations for vectorizing, where each operation in the set of operations specifies a set of one or more operands, in response to identifying the set of operations, vectorize the set of operations by, based on the number of operations in the set of operations and a total number of lanes in a first vector register, generating a mask indicating a first unmasked lane and a first masked lane in the first vector register, based on the mask, generating a set of one or more instructions for loading into the first unmasked lane a first operand of a first operation of the set of operations, and loading the first operand into the first masked lane.
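The masking step in this abstract can be illustrated with a small simulation. This is a minimal sketch, not the patented implementation: the function names (`make_mask`, `pack_operands`, `masked_add`) are hypothetical, vector registers are modeled as Python lists, and the choice to keep only unmasked lanes after the operation is an assumption. The padding rule (copying the first operand into the masked lanes) follows the last clause of the abstract.

```python
def make_mask(num_ops, num_lanes):
    """Return one flag per lane: True = unmasked (real operation), False = masked (padding)."""
    if not 0 < num_ops <= num_lanes:
        raise ValueError("need 1..num_lanes operations")
    return [lane < num_ops for lane in range(num_lanes)]


def pack_operands(operands, num_lanes):
    """Fill unmasked lanes with the real operands; pad masked lanes with the first operand."""
    mask = make_mask(len(operands), num_lanes)
    lanes = [operands[i] if active else operands[0] for i, active in enumerate(mask)]
    return mask, lanes


def masked_add(a_ops, b_ops, num_lanes):
    """Add lane by lane across the full register, then keep only unmasked results."""
    mask, a = pack_operands(a_ops, num_lanes)
    _, b = pack_operands(b_ops, num_lanes)
    full = [x + y for x, y in zip(a, b)]  # padded lanes compute a harmless duplicate
    return [value for value, active in zip(full, mask) if active]
```

With three operations and a four-lane register, `pack_operands([1, 2, 3], 4)` yields the mask `[True, True, True, False]` and the lanes `[1, 2, 3, 1]`, so the padded lane holds valid data and its result is simply discarded.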
-
Publication number: US20180349140A1
Publication date: 2018-12-06
Application number: US15607883
Filing date: 2017-05-30
Applicant: Advanced Micro Devices, Inc.
Inventor: Anupama Rajesh Rasale
CPC classification number: G06F9/30058, G06F8/443, G06F8/445, G06F8/452, G06F9/3802, G06F9/3844, G06F9/3861
Abstract: Systems, apparatuses, and methods for implementing an IF2FOR transformation are disclosed. In one embodiment, a first group of instructions include an IF-statement and one or more control dependent instructions. The first group of instructions are transformed into a second group of instructions if the first group of instructions meet one or more criteria. In one embodiment, the criteria includes the (1) IF-statement being part of a loop and (2) the control dependent instructions not having any inter-loop iteration dependency. The second group of instructions are executable to (1) store results of the IF-statement condition for a first number of iterations and (2) execute the control dependent instructions for a second number of iterations when the IF-statement condition evaluates to true.
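The transformation described in this abstract can be sketched as two equivalent loops. This is an illustrative reading only: the function names, the `threshold` condition, the doubling body, and the `chunk` size are all assumptions chosen for the example, and the patent does not prescribe them. The first inner pass stores the IF-condition results for a block of iterations; the second pass runs the control-dependent work only for the iterations where the condition was true.

```python
def original_loop(data, threshold):
    """Baseline: an IF-statement inside a loop, with no cross-iteration dependency."""
    out = list(data)
    for i in range(len(data)):
        if data[i] > threshold:       # data-dependent branch on every iteration
            out[i] = data[i] * 2      # control-dependent instruction
    return out


def if2for_loop(data, threshold, chunk=4):
    """Sketch of the transformed form: record taken iterations, then run them in a FOR loop."""
    out = list(data)
    n = len(data)
    for start in range(0, n, chunk):
        end = min(start + chunk, n)
        # (1) Store the results of the IF condition for this block of iterations.
        taken = [i for i in range(start, end) if data[i] > threshold]
        # (2) Execute the control-dependent work once per iteration that evaluated to true.
        for i in taken:
            out[i] = data[i] * 2
    return out
```

Both functions return the same list for any input, which is only valid because the body of the IF has no dependency on other iterations, matching the second criterion in the abstract.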
-
Publication number: US11789734B2
Publication date: 2023-10-17
Application number: US16537460
Filing date: 2019-08-09
Applicant: Advanced Micro Devices, Inc.
Inventor: Anupama Rajesh Rasale
CPC classification number: G06F9/30036, G06F9/30018, G06F9/30043, G06F9/30058, G06F9/30098, G06F9/3887
Abstract: A computing system includes a processing unit and a memory storing instructions that, when executed by the processor, cause the processor to receive program source code in a compiler, identify in the program source code a set of operations for vectorizing, where each operation in the set of operations specifies a set of one or more operands, in response to identifying the set of operations, vectorize the set of operations by, based on the number of operations in the set of operations and a total number of lanes in a first vector register, generating a mask indicating a first unmasked lane and a first masked lane in the first vector register, based on the mask, generating a set of one or more instructions for loading into the first unmasked lane a first operand of a first operation of the set of operations, and loading the first operand into the first masked lane.
-
Publication number: US10235173B2
Publication date: 2019-03-19
Application number: US15607883
Filing date: 2017-05-30
Applicant: Advanced Micro Devices, Inc.
Inventor: Anupama Rajesh Rasale
Abstract: Systems, apparatuses, and methods for implementing an IF2FOR transformation are disclosed. In one embodiment, a first group of instructions include an IF-statement and one or more control dependent instructions. The first group of instructions are transformed into a second group of instructions if the first group of instructions meet one or more criteria. In one embodiment, the criteria includes the (1) IF-statement being part of a loop and (2) the control dependent instructions not having any inter-loop iteration dependency. The second group of instructions are executable to (1) store results of the IF-statement condition for a first number of iterations and (2) execute the control dependent instructions for a second number of iterations when the IF-statement condition evaluates to true.
-
Publication number: US11262989B2
Publication date: 2022-03-01
Application number: US16663107
Filing date: 2019-10-24
Applicant: Advanced Micro Devices, Inc.
Inventor: Abhilash Bhandari, Venugopal Raghavan, Mohammad Asghar Ahmad Shahid, Anupama Rajesh Rasale
Abstract: A computing system includes a compatibility graph builder to generate a compatibility graph based on a dependency graph representing program source code, where the compatibility graph indicates compatibility relationships between operations represented in the dependency graph, a clique generator coupled with the compatibility graph builder to generate a set of candidate vector packings based on the compatibility relationships indicated in the compatibility graph, a set cover generator coupled with the clique generator to select a subset of vector packings from the set of candidate vector packings, and a vector code generator coupled with the set cover generator to generate the vector code based on the selected subset of vector packings.
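The pipeline named in this abstract (compatibility graph, clique generation, set cover) can be sketched on a toy example. Everything below is an assumption made for illustration: the compatibility rule (same opcode and no dependency path in either direction), the brute-force clique enumeration, and the greedy selection of disjoint packings are stand-ins, since the abstract does not state which criteria or algorithms are used. All function names are hypothetical.

```python
from itertools import combinations


def reachable(deps, src):
    """Operations that depend, directly or transitively, on src. deps holds (producer, consumer) pairs."""
    seen, stack = set(), [src]
    while stack:
        node = stack.pop()
        for producer, consumer in deps:
            if producer == node and consumer not in seen:
                seen.add(consumer)
                stack.append(consumer)
    return seen


def build_compatibility(ops, deps):
    """Assumed rule: two operations are compatible if they share an opcode and are independent."""
    reach = {name: reachable(deps, name) for name in ops}
    edges = set()
    for a, b in combinations(sorted(ops), 2):
        independent = b not in reach[a] and a not in reach[b]
        if ops[a] == ops[b] and independent:
            edges.add(frozenset((a, b)))
    return edges


def candidate_packings(ops, edges, width):
    """Brute-force every clique of size 2..width; each clique is a candidate vector packing."""
    packs = []
    for size in range(2, width + 1):
        for group in combinations(sorted(ops), size):
            if all(frozenset(pair) in edges for pair in combinations(group, 2)):
                packs.append(frozenset(group))
    return packs


def greedy_cover(ops, packs):
    """Greedy stand-in for set cover: repeatedly take the largest packing that fits."""
    uncovered, chosen = set(ops), []
    while True:
        fitting = [p for p in packs if p <= uncovered]
        if not fitting:
            return chosen, uncovered  # leftovers stay scalar
        best = max(fitting, key=len)
        chosen.append(best)
        uncovered -= best
```

For `ops = {"a0": "add", "a1": "add", "a2": "add", "m0": "mul", "m1": "mul"}` with one dependency `("a0", "a2")`, the sketch packs `a0` with `a1` and `m0` with `m1`, and leaves `a2` scalar. Brute-force enumeration is exponential in the vector width, so it is only suitable for small illustrations like this one.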
-
Publication number: US20210042099A1
Publication date: 2021-02-11
Application number: US16663107
Filing date: 2019-10-24
Applicant: Advanced Micro Devices, Inc.
Inventor: Abhilash Bhandari, Venugopal Raghavan, Mohammad Asghar Ahmad Shahid, Anupama Rajesh Rasale
Abstract: A computing system includes a compatibility graph builder to generate a compatibility graph based on a dependency graph representing program source code, where the compatibility graph indicates compatibility relationships between operations represented in the dependency graph, a clique generator coupled with the compatibility graph builder to generate a set of candidate vector packings based on the compatibility relationships indicated in the compatibility graph, a set cover generator coupled with the clique generator to select a subset of vector packings from the set of candidate vector packings, and a vector code generator coupled with the set cover generator to generate the vector code based on the selected subset of vector packings.
-
Publication number: US10353708B2
Publication date: 2019-07-16
Application number: US15273916
Filing date: 2016-09-23
Applicant: Advanced Micro Devices, Inc.
Inventor: Anupama Rajesh Rasale, Dibyendu Das, Ashutosh Nema, Md Asghar Ahmad Shahid, Prathiba Kumar
Abstract: Systems, apparatuses, and methods for utilizing efficient vectorization techniques for operands in non-sequential memory locations are disclosed. A system includes a vector processing unit (VPU) and one or more memory devices. In response to determining that a plurality of vector operands are stored in non-sequential memory locations, the VPU performs a plurality of vector load operations to load the plurality of vector operands into a plurality of vector registers. Next, the VPU performs a shuffle operation to consolidate the plurality of vector operands from the plurality of vector registers into a single vector register. Then, the VPU performs a vector operation on the vector operands stored in the single vector register. The VPU can also perform a vector store operation by permuting and storing a plurality of vector operands in appropriate locations within multiple vector registers and then storing the vector registers to locations in memory using a mask.
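The load, shuffle, operate, and masked-store sequence in this abstract can be modeled with plain lists. This is a simplified sketch under stated assumptions: memory and registers are Python lists, each operand gets its own aligned vector load, and all function names are hypothetical. It is not a description of how the VPU or any real instruction set implements these steps.

```python
def vector_load(memory, base, width):
    """Load one contiguous vector register starting at base."""
    return memory[base:base + width]


def gather_by_shuffle(memory, addresses, width):
    """Load the register that contains each operand, then shuffle the operands into one register."""
    regs, picks = [], []
    for addr in addresses:
        base = (addr // width) * width            # aligned start of the containing vector
        regs.append(vector_load(memory, base, width))
        picks.append((len(regs) - 1, addr - base))  # (source register, lane) for the shuffle
    return [regs[reg][lane] for reg, lane in picks]


def masked_store(memory, base, reg, mask):
    """Write only the lanes whose mask bit is set; other memory locations are untouched."""
    for lane, (value, keep) in enumerate(zip(reg, mask)):
        if keep:
            memory[base + lane] = value


def scatter_by_permute(memory, addresses, values, width):
    """Place each result in its target lane of a register, then store that register under a mask."""
    for addr, value in zip(addresses, values):
        base = (addr // width) * width
        lane = addr - base
        reg = [0] * width
        reg[lane] = value
        masked_store(memory, base, reg, [i == lane for i in range(width)])
```

A usage example: with `memory = list(range(100, 132))`, `width = 4`, and addresses `[1, 9, 18, 27]`, `gather_by_shuffle` returns `[101, 109, 118, 127]`. A single vector operation can then process all four values, and `scatter_by_permute` writes the results back without disturbing neighboring elements.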
-
Publication number: US20180088948A1
Publication date: 2018-03-29
Application number: US15273916
Filing date: 2016-09-23
Applicant: Advanced Micro Devices, Inc.
Inventor: Anupama Rajesh Rasale, Dibyendu Das, Ashutosh Nema, Md Asghar Ahmad Shahid, Prathiba Kumar
CPC classification number: G06F9/30036, G06F9/30032, G06F9/30043, G06F9/3455, G06F15/8007, G06F15/8053
Abstract: Systems, apparatuses, and methods for utilizing efficient vectorization techniques for operands in non-sequential memory locations are disclosed. A system includes a vector processing unit (VPU) and one or more memory devices. In response to determining that a plurality of vector operands are stored in non-sequential memory locations, the VPU performs a plurality of vector load operations to load the plurality of vector operands into a plurality of vector registers. Next, the VPU performs a shuffle operation to consolidate the plurality of vector operands from the plurality of vector registers into a single vector register. Then, the VPU performs a vector operation on the vector operands stored in the single vector register. The VPU can also perform a vector store operation by permuting and storing a plurality of vector operands in appropriate locations within multiple vector registers and then storing the vector registers to locations in memory using a mask.