Patent search: ap:("Advanced Micro Devices, Inc.") AND inv:"Anupama Rajesh Rasale"

1.

Patent application
PADDED VECTORIZATION WITH COMPILE TIME KNOWN MASKS (Pending, published)

Publication number: US20200073662A1

Publication date: 2020-03-05

Application number: US16537460

Application date: 2019-08-09

Applicant: Advanced Micro Devices, Inc.

Inventor: Anupama Rajesh Rasale

IPC: G06F9/30, G06F9/38

Abstract: A computing system includes a processing unit and a memory storing instructions that, when executed by the processor, cause the processor to receive program source code in a compiler, identify in the program source code a set of operations for vectorizing, where each operation in the set of operations specifies a set of one or more operands, in response to identifying the set of operations, vectorize the set of operations by, based on the number of operations in the set of operations and a total number of lanes in a first vector register, generating a mask indicating a first unmasked lane and a first masked lane in the first vector register, based on the mask, generating a set of one or more instructions for loading into the first unmasked lane a first operand of a first operation of the set of operations, and loading the first operand into the first masked lane.
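
The abstract describes filling the unused lanes of a partially occupied vector register with a copy of a real operand instead of leaving them undefined. Below is a minimal Python sketch of that idea, with a list standing in for a vector register. The function names, the choice of an element-wise add, and the assumption that padding with the first operand keeps masked lanes well-defined are all illustrative. None of them come from the patent.

```python
from typing import List, Sequence


def compile_time_mask(num_ops: int, num_lanes: int) -> List[bool]:
    """One flag per lane: True = unmasked (holds a real operation), False = masked padding."""
    if not 0 < num_ops <= num_lanes:
        raise ValueError("expected 0 < num_ops <= num_lanes")
    # Both counts are known statically, so the mask is a constant the compiler can emit directly.
    return [lane < num_ops for lane in range(num_lanes)]


def padded_load(operands: Sequence[float], mask: Sequence[bool]) -> List[float]:
    """Fill unmasked lanes with their operands and every masked lane with a copy of operands[0]."""
    return [operands[lane] if active else operands[0] for lane, active in enumerate(mask)]


def padded_vector_add(a: Sequence[float], b: Sequence[float], num_lanes: int) -> List[float]:
    """Vectorize len(a) independent scalar adds a[i] + b[i] into one num_lanes-wide add."""
    if len(a) != len(b):
        raise ValueError("operand lists must have the same length")
    mask = compile_time_mask(len(a), num_lanes)
    va = padded_load(a, mask)
    vb = padded_load(b, mask)
    # The add runs on every lane; padded lanes compute a[0] + b[0], a harmless duplicate.
    vsum = [x + y for x, y in zip(va, vb)]
    # Only unmasked lanes are written back, mimicking a masked store.
    return [v for v, active in zip(vsum, mask) if active]
```

For three adds on a four-lane register, the mask is [True, True, True, False]. The first padded register then holds the three real operands plus a copy of the first one in the last lane.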

2.

Patent application
PROGRAM CODE OPTIMIZATION FOR REDUCING BRANCH MISPREDICTIONS (Pending, published)

Publication number: US20180349140A1

Publication date: 2018-12-06

Application number: US15607883

Application date: 2017-05-30

Applicant: Advanced Micro Devices, Inc.

Inventor: Anupama Rajesh Rasale

IPC: G06F9/30, G06F9/38

CPC classification number: G06F9/30058, G06F8/443, G06F8/445, G06F8/452, G06F9/3802, G06F9/3844, G06F9/3861

Abstract: Systems, apparatuses, and methods for implementing an IF2FOR transformation are disclosed. In one embodiment, a first group of instructions include an IF-statement and one or more control dependent instructions. The first group of instructions are transformed into a second group of instructions if the first group of instructions meet one or more criteria. In one embodiment, the criteria includes the (1) IF-statement being part of a loop and (2) the control dependent instructions not having any inter-loop iteration dependency. The second group of instructions are executable to (1) store results of the IF-statement condition for a first number of iterations and (2) execute the control dependent instructions for a second number of iterations when the IF-statement condition evaluates to true.
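
Read plainly, the IF2FOR transformation splits one loop into two. The first loop records where the IF condition holds. The second loop runs the guarded body only for those iterations, which removes the hard-to-predict branch from the loop body. The Python sketch below assumes a loop body with no cross-iteration dependency, which is the criterion named in the abstract. The function names, the threshold test, and the branch-free index compaction in phase 1 are illustrative assumptions, not the patented code. Python will not show any branch-prediction benefit. The sketch only demonstrates that the rewritten loops compute the same result.

```python
from typing import List, Sequence


def original_loop(xs: Sequence[int], threshold: int) -> List[int]:
    """Baseline: a data-dependent IF inside the loop guards the body."""
    out = [0] * len(xs)
    for i, x in enumerate(xs):
        if x > threshold:  # outcome varies with the data, so it may mispredict often
            out[i] = x * 2
    return out


def if2for_loop(xs: Sequence[int], threshold: int) -> List[int]:
    """Hypothetical IF2FOR-style rewrite of original_loop."""
    out = [0] * len(xs)

    # Phase 1 (all len(xs) iterations): store the outcome of the IF condition
    # as a compacted list of iteration indices. The slot is always written and
    # the count advances by 0 or 1, so no branch depends on the condition.
    taken = [0] * len(xs)
    count = 0
    for i, x in enumerate(xs):
        taken[count] = i
        count += int(x > threshold)

    # Phase 2 (count iterations): run the control-dependent body only where the
    # condition was true; the loop-exit test is the only branch left.
    for k in range(count):
        i = taken[k]
        out[i] = xs[i] * 2
    return out
```

The rewrite is legal only because each iteration's body is independent. If an iteration read a value written by an earlier guarded iteration, deferring all bodies to phase 2 could change the result.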

3.

Granted patent
Padded vectorization with compile time known masks (In force)

Publication number: US11789734B2

Publication date: 2023-10-17

Application number: US16537460

Application date: 2019-08-09

Applicant: Advanced Micro Devices, Inc.

Inventor: Anupama Rajesh Rasale

IPC: G06F9/30, G06F9/38

CPC classification number: G06F9/30036, G06F9/30018, G06F9/30043, G06F9/30058, G06F9/30098, G06F9/3887

Abstract: A computing system includes a processing unit and a memory storing instructions that, when executed by the processor, cause the processor to receive program source code in a compiler, identify in the program source code a set of operations for vectorizing, where each operation in the set of operations specifies a set of one or more operands, in response to identifying the set of operations, vectorize the set of operations by, based on the number of operations in the set of operations and a total number of lanes in a first vector register, generating a mask indicating a first unmasked lane and a first masked lane in the first vector register, based on the mask, generating a set of one or more instructions for loading into the first unmasked lane a first operand of a first operation of the set of operations, and loading the first operand into the first masked lane.

4.

Granted patent
Program code optimization for reducing branch mispredictions (In force)

Publication number: US10235173B2

Publication date: 2019-03-19

Application number: US15607883

Application date: 2017-05-30

Applicant: Advanced Micro Devices, Inc.

Inventor: Anupama Rajesh Rasale

IPC: G06F8/41, G06F9/30, G06F9/38

Abstract: Systems, apparatuses, and methods for implementing an IF2FOR transformation are disclosed. In one embodiment, a first group of instructions include an IF-statement and one or more control dependent instructions. The first group of instructions are transformed into a second group of instructions if the first group of instructions meet one or more criteria. In one embodiment, the criteria includes the (1) IF-statement being part of a loop and (2) the control dependent instructions not having any inter-loop iteration dependency. The second group of instructions are executable to (1) store results of the IF-statement condition for a first number of iterations and (2) execute the control dependent instructions for a second number of iterations when the IF-statement condition evaluates to true.

5.

Granted patent
Automatic generation of efficient vector code with low overhead in a time-efficient manner independent of vector width (In force)

Publication number: US11262989B2

Publication date: 2022-03-01

Application number: US16663107

Application date: 2019-10-24

Applicant: Advanced Micro Devices, Inc.

Inventors: Abhilash Bhandari, Venugopal Raghavan, Mohammad Asghar Ahmad Shahid, Anupama Rajesh Rasale

IPC: G06F9/44, G06F8/41, G06F8/30, G06F9/30

Abstract: A computing system includes a compatibility graph builder to generate a compatibility graph based on a dependency graph representing program source code, where the compatibility graph indicates compatibility relationships between operations represented in the dependency graph, a clique generator coupled with the compatibility graph builder to generate a set of candidate vector packings based on the compatibility relationships indicated in the compatibility graph, a set cover generator coupled with the clique generator to select a subset of vector packings from the set of candidate vector packings, and a vector code generator coupled with the set cover generator to generate the vector code based on the selected subset of vector packings.
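
The abstract describes a pipeline: a compatibility graph built from the dependency graph, cliques as candidate packings, and set cover to choose among them. That pipeline can be sketched in Python for a toy basic block. Several choices here are assumptions made for clarity: the compatibility rule (same opcode, no dependency in either direction), the brute-force clique enumeration, and the greedy disjoint selection. The patent's actual criteria and algorithms may differ, and the final vector code generation step is omitted.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Graph = Dict[str, Set[str]]


def build_compatibility_graph(opcode: Dict[str, str], deps: Set[Tuple[str, str]]) -> Graph:
    """Edge a-b if the ops share an opcode and neither depends on the other.

    Assumes `deps` already holds the transitive closure of the dependency graph
    as (producer, consumer) pairs.
    """
    graph: Graph = {op: set() for op in opcode}
    for a, b in combinations(sorted(opcode), 2):
        if opcode[a] == opcode[b] and (a, b) not in deps and (b, a) not in deps:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def candidate_packings(graph: Graph, width: int) -> List[FrozenSet[str]]:
    """Every clique of 2..width mutually compatible ops is a candidate vector packing.

    Brute-force enumeration, exponential in general; adequate only for tiny examples.
    """
    nodes = sorted(graph)
    cands: List[FrozenSet[str]] = []
    for k in range(2, width + 1):
        for combo in combinations(nodes, k):
            if all(b in graph[a] for a, b in combinations(combo, 2)):
                cands.append(frozenset(combo))
    return cands


def select_packings(
    ops: Iterable[str], cands: List[FrozenSet[str]]
) -> Tuple[List[FrozenSet[str]], Set[str]]:
    """Greedy set-cover heuristic restricted to disjoint packings.

    Repeatedly picks the largest candidate made only of still-uncovered ops
    (ties broken by sorted op names); ops left over stay scalar.
    """
    uncovered = set(ops)
    chosen: List[FrozenSet[str]] = []
    while True:
        usable = [c for c in cands if c <= uncovered]
        if not usable:
            return chosen, uncovered
        best = min(usable, key=lambda c: (-len(c), sorted(c)))
        chosen.append(best)
        uncovered -= best
```

Take four adds `a1`..`a4` and two multiplies `m1`, `m2`, where `a3` consumes the result of `a1`, on a four-lane target. This sketch packs {a1, a2, a4} and {m1, m2} and leaves `a3` scalar, because the dependency rules out any packing that contains both `a1` and `a3`.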

6.

Patent application
AUTOMATIC GENERATION OF EFFICIENT VECTOR CODE WITH LOW OVERHEAD IN A TIME-EFFICIENT MANNER INDEPENDENT OF VECTOR WIDTH (In force)

Publication number: US20210042099A1

Publication date: 2021-02-11

Application number: US16663107

Application date: 2019-10-24

Applicant: Advanced Micro Devices, Inc.

Inventors: Abhilash Bhandari, Venugopal Raghavan, Mohammad Asghar Ahmad Shahid, Anupama Rajesh Rasale

IPC: G06F8/41, G06F8/30, G06F9/30

Abstract: A computing system includes a compatibility graph builder to generate a compatibility graph based on a dependency graph representing program source code, where the compatibility graph indicates compatibility relationships between operations represented in the dependency graph, a clique generator coupled with the compatibility graph builder to generate a set of candidate vector packings based on the compatibility relationships indicated in the compatibility graph, a set cover generator coupled with the clique generator to select a subset of vector packings from the set of candidate vector packings, and a vector code generator coupled with the set cover generator to generate the vector code based on the selected subset of vector packings.

7.

Granted patent
Strided loading of non-sequential memory locations by skipping memory locations between consecutive loads (In force)

Publication number: US10353708B2

Publication date: 2019-07-16

Application number: US15273916

Application date: 2016-09-23

Applicant: Advanced Micro Devices, Inc.

Inventors: Anupama Rajesh Rasale, Dibyendu Das, Ashutosh Nema, Md Asghar Ahmad Shahid, Prathiba Kumar

IPC: G06F9/30, G06F15/80, G06F9/345

Abstract: Systems, apparatuses, and methods for utilizing efficient vectorization techniques for operands in non-sequential memory locations are disclosed. A system includes a vector processing unit (VPU) and one or more memory devices. In response to determining that a plurality of vector operands are stored in non-sequential memory locations, the VPU performs a plurality of vector load operations to load the plurality of vector operands into a plurality of vector registers. Next, the VPU performs a shuffle operation to consolidate the plurality of vector operands from the plurality of vector registers into a single vector register. Then, the VPU performs a vector operation on the vector operands stored in the single vector register. The VPU can also perform a vector store operation by permuting and storing a plurality of vector operands in appropriate locations within multiple vector registers and then storing the vector registers to locations in memory using a mask.
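
The abstract describes two paths. The load path uses several contiguous vector loads, then one shuffle to gather the wanted elements into a single register. The store path permutes values into position and writes them back with a mask. Both can be modeled with Python lists standing in for vector registers. This is a minimal sketch assuming a constant stride and a vector width known in advance. `vector_load`, `shuffle`, `strided_load`, and `strided_store` are hypothetical helpers, not a real ISA or intrinsic API.

```python
from typing import List, MutableSequence, Sequence


def vector_load(mem: Sequence[int], base: int, width: int) -> List[int]:
    """Contiguous vector load of `width` elements starting at mem[base]."""
    return list(mem[base:base + width])


def shuffle(regs: Sequence[Sequence[int]], indices: Sequence[int]) -> List[int]:
    """Multi-source shuffle: index i selects element i of the concatenated registers."""
    flat = [x for reg in regs for x in reg]
    return [flat[i] for i in indices]


def strided_load(mem: Sequence[int], base: int, stride: int, width: int) -> List[int]:
    """Gather mem[base + k*stride] for k in range(width) using `stride` contiguous
    loads followed by one shuffle. Assumes every element up to
    mem[base + stride*width - 1] is safe to read."""
    regs = [vector_load(mem, base + r * width, width) for r in range(stride)]
    return shuffle(regs, [k * stride for k in range(width)])


def strided_store(mem: MutableSequence[int], base: int, stride: int, values: Sequence[int]) -> None:
    """Scatter values[k] to mem[base + k*stride]: permute into `stride` registers,
    then masked-store each register so the skipped slots keep their old contents."""
    width = len(values)
    regs = [[0] * width for _ in range(stride)]
    masks = [[False] * width for _ in range(stride)]
    for k, v in enumerate(values):
        r, lane = divmod(k * stride, width)  # target register and lane for element k
        regs[r][lane] = v
        masks[r][lane] = True
    for r in range(stride):
        for lane in range(width):
            if masks[r][lane]:  # masked store: write enabled lanes only
                mem[base + r * width + lane] = regs[r][lane]
```

For stride 2 and width 4, two loads cover `mem[0:8]` and the shuffle keeps indices 0, 2, 4 and 6. The store path writes those four slots back and leaves the odd slots untouched because their lanes are masked off.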

8.

Patent application
EFFICIENT VECTORIZATION TECHNIQUES FOR OPERANDS IN NON-SEQUENTIAL MEMORY LOCATIONS (Pending, published)

Publication number: US20180088948A1

Publication date: 2018-03-29

Application number: US15273916

Application date: 2016-09-23

Applicant: Advanced Micro Devices, Inc.

Inventors: Anupama Rajesh Rasale, Dibyendu Das, Ashutosh Nema, Md Asghar Ahmad Shahid, Prathiba Kumar

IPC: G06F9/30, G06F15/80

CPC classification number: G06F9/30036, G06F9/30032, G06F9/30043, G06F9/3455, G06F15/8007, G06F15/8053

Abstract: Systems, apparatuses, and methods for utilizing efficient vectorization techniques for operands in non-sequential memory locations are disclosed. A system includes a vector processing unit (VPU) and one or more memory devices. In response to determining that a plurality of vector operands are stored in non-sequential memory locations, the VPU performs a plurality of vector load operations to load the plurality of vector operands into a plurality of vector registers. Next, the VPU performs a shuffle operation to consolidate the plurality of vector operands from the plurality of vector registers into a single vector register. Then, the VPU performs a vector operation on the vector operands stored in the single vector register. The VPU can also perform a vector store operation by permuting and storing a plurality of vector operands in appropriate locations within multiple vector registers and then storing the vector registers to locations in memory using a mask.
