Patent search ap:("Apple Inc.") AND inv:"Ajay Simha Modugala" Page 1

1.

Granted invention patent
Completion signaling techniques in distributed processor (Active)

Publication (announcement) number: US11250538B2

Publication (announcement) date: 2022-02-15

Application number: US16812724

Application date: 2020-03-09

Applicant: Apple Inc.

Inventor: Andrew M. Havlir, Ajay Simha Modugala

IPC: G06T1/20, G06F9/48, G06F9/50, G06T15/00

Abstract: Techniques are disclosed relating to tracking compute workgroup completions in a distributed processor. In some embodiments, an apparatus includes a plurality of shader processors configured to perform operations for compute workgroups included in compute kernels, a master workload parser circuit, a plurality of distributed workload parser circuits, and a communications fabric connected to the plurality of distributed workload parser circuits and the master workload parser circuit. In some embodiments, a distributed workload parser circuit is configured to maintain, for each of a set of the shader processors, a data structure that specifies a count of workgroup completions for one or more kernels processed by the shader processor, determine, for the set of shader processors based on counts of workgroup completions for a first kernel, an aggregate count of completions to report for the first kernel, send the aggregate count to the master workload parser circuit over the communications fabric, and adjust the data structures to reflect counts included in the aggregate count.
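
The counting scheme in this abstract can be sketched in software. The following is a minimal Python model, not the patented circuit: the class names `DistributedParser` and `MasterParser`, the method names, and the direct `receive` call standing in for a message over the communications fabric are all assumptions made for illustration.

```python
from collections import defaultdict


class MasterParser:
    """Hypothetical model of the master workload parser."""

    def __init__(self, expected):
        self.expected = dict(expected)  # kernel id -> total workgroups
        self.done = defaultdict(int)    # kernel id -> completions reported

    def receive(self, kernel_id, count):
        # Stands in for an aggregate count arriving over the fabric.
        self.done[kernel_id] += count

    def kernel_complete(self, kernel_id):
        return self.done[kernel_id] >= self.expected[kernel_id]


class DistributedParser:
    """Hypothetical model of one distributed workload parser."""

    def __init__(self, shader_ids):
        # One table per shader processor: kernel id -> unreported completions.
        self.counts = {sid: defaultdict(int) for sid in shader_ids}

    def record_completion(self, shader_id, kernel_id):
        self.counts[shader_id][kernel_id] += 1

    def report(self, kernel_id, master):
        # Aggregate over every shader processor this parser tracks.
        total = sum(table[kernel_id] for table in self.counts.values())
        if total == 0:
            return 0
        master.receive(kernel_id, total)
        # Adjust the tables so reported completions are not counted twice.
        for table in self.counts.values():
            table[kernel_id] = 0
        return total


master = MasterParser({"k0": 4})
parser_a = DistributedParser(["sp0", "sp1"])
parser_b = DistributedParser(["sp2"])
for sp in ("sp0", "sp0", "sp1"):
    parser_a.record_completion(sp, "k0")
parser_b.record_completion("sp2", "k0")
sent_a = parser_a.report("k0", master)  # one message covers three completions
sent_b = parser_b.report("k0", master)
```

In this model, a single aggregate message replaces one message per workgroup completion, which is the traffic saving the abstract appears to be aiming at.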

2.

Invention application
Affinity-based Graphics Scheduling (Active)

Publication (announcement) number: US20230047481A1

Publication (announcement) date: 2023-02-16

Application number: US17399784

Application date: 2021-08-11

Applicant: Apple Inc.

Inventor: Andrew M. Havlir, Ajay Simha Modugala, Benjamin Bowman, Yunjun Zhang

IPC: G06F9/50, G06F9/48

Abstract: Techniques are disclosed relating to affinity-based scheduling of graphics work. In disclosed embodiments, first and second groups of graphics processor sub-units may share respective first and second caches. Distribution circuitry may receive a software-specified set of graphics work and a software-indicated mapping of portions of the set of graphics work to groups of graphics processor sub-units. The distribution circuitry may assign subsets of the set of graphics work based on the mapping. This may improve cache efficiency, in some embodiments, by allowing graphics work that accesses the same memory areas to be assigned to the same group of sub-units that share a cache.
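
As a rough illustration of the mapping-driven assignment described above, the Python sketch below routes each work item to the group named by a software-supplied mapping. The function name `distribute`, the data shapes, and the round-robin choice of sub-unit inside a group are assumptions; the abstract does not say how work is spread within a group.

```python
def distribute(work_items, affinity_map, groups):
    """Assign work items to sub-units according to a software-indicated mapping.

    work_items:   list of (item_id, portion) pairs
    affinity_map: portion -> group name (the software-indicated mapping)
    groups:       group name -> list of sub-unit ids that share one cache
    """
    assignments = {unit: [] for units in groups.values() for unit in units}
    next_slot = {name: 0 for name in groups}
    for item_id, portion in work_items:
        group = affinity_map[portion]
        units = groups[group]
        # Round-robin inside the group (an assumption of this sketch).
        unit = units[next_slot[group] % len(units)]
        next_slot[group] += 1
        assignments[unit].append(item_id)
    return assignments


groups = {"g0": ["u0", "u1"], "g1": ["u2", "u3"]}
affinity_map = {"top": "g0", "bottom": "g1"}
work = [(0, "top"), (1, "bottom"), (2, "top"), (3, "top"), (4, "bottom")]
result = distribute(work, affinity_map, groups)
```

Here every item tagged `"top"` lands on a sub-unit in `g0`, so items touching the same memory region stay behind the same shared cache.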

3.

Invention application
Compute Kernel Parsing with Limits in one or more Dimensions (Active)

Publication (announcement) number: US20220083377A1

Publication (announcement) date: 2022-03-17

Application number: US17018913

Application date: 2020-09-11

Applicant: Apple Inc.

Inventor: Andrew M. Havlir, Ajay Simha Modugala, Karl D. Mann

IPC: G06F9/50, G06T1/20

Abstract: Techniques are disclosed relating to dispatching compute work from a compute stream. In some embodiments, a graphics processor executes instructions of compute kernels. Workload parser circuitry may determine, for distribution to the graphics processor circuitry, a set of workgroups from a compute kernel that includes workgroups organized in multiple dimensions, including a first number of workgroups in a first dimension and a second number of workgroups in a second dimension. This may include determining multiple sub-kernels for the compute kernel, wherein a first sub-kernel includes, in the first dimension, a limited number of workgroups that is smaller than the first number of workgroups. The parser circuitry may iterate through workgroups in both the first and second dimensions to generate the set of workgroups, proceeding through the first sub-kernel before iterating through any of the other sub-kernels. Disclosed techniques may provide desirable shapes for batches of workgroups.
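
The iteration order described above can be sketched as a generator. This is a two-dimensional Python illustration only: the name `iterate_workgroups` and the parameter `limit_x` are hypothetical, and the limit is applied to the first dimension alone, whereas the title allows limits in one or more dimensions.

```python
def iterate_workgroups(num_x, num_y, limit_x):
    """Yield (x, y) workgroup coordinates, one sub-kernel at a time.

    The kernel is num_x by num_y workgroups. Each sub-kernel spans at most
    limit_x workgroups in the first dimension and all of the second.
    """
    if limit_x <= 0:
        raise ValueError("limit_x must be positive")
    for x_start in range(0, num_x, limit_x):
        x_end = min(x_start + limit_x, num_x)
        # Finish this sub-kernel in both dimensions before moving on.
        for y in range(num_y):
            for x in range(x_start, x_end):
                yield (x, y)


order = list(iterate_workgroups(num_x=4, num_y=2, limit_x=2))
```

With a 4 by 2 kernel and a limit of 2, the first four workgroups emitted form a 2 by 2 block rather than a 4 by 1 row, which is one way to read "desirable shapes for batches of workgroups".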

4.

Published invention application
Compute Kernel Parsing with Limits in one or more Dimensions (Under examination, published)

Publication (announcement) number: US20240345892A1

Publication (announcement) date: 2024-10-17

Application number: US18673959

Application date: 2024-05-24

Applicant: Apple Inc.

Inventor: Andrew M. Havlir, Ajay Simha Modugala, Karl D. Mann

IPC: G06F9/50, G06T1/20

CPC classification number: G06F9/505, G06T1/20

Abstract: Techniques are disclosed relating to dispatching compute work from a compute stream. In some embodiments, a graphics processor executes instructions of compute kernels. Workload parser circuitry may determine, for distribution to the graphics processor circuitry, a set of workgroups from a compute kernel that includes workgroups organized in multiple dimensions, including a first number of workgroups in a first dimension and a second number of workgroups in a second dimension. This may include determining multiple sub-kernels for the compute kernel, wherein a first sub-kernel includes, in the first dimension, a limited number of workgroups that is smaller than the first number of workgroups. The parser circuitry may iterate through workgroups in both the first and second dimensions to generate the set of workgroups, proceeding through the first sub-kernel before iterating through any of the other sub-kernels. Disclosed techniques may provide desirable shapes for batches of workgroups.

5.

Granted invention patent
Compute kernel parsing with limits in one or more dimensions with iterating through workgroups in the one or more dimensions for execution (Active)

Publication (announcement) number: US12020075B2

Publication (announcement) date: 2024-06-25

Application number: US17018913

Application date: 2020-09-11

Applicant: Apple Inc.

Inventor: Andrew M. Havlir, Ajay Simha Modugala, Karl D. Mann

IPC: G06F9/50, G06T1/20

CPC classification number: G06F9/505, G06T1/20

Abstract: Techniques are disclosed relating to dispatching compute work from a compute stream. In some embodiments, a graphics processor executes instructions of compute kernels. Workload parser circuitry may determine, for distribution to the graphics processor circuitry, a set of workgroups from a compute kernel that includes workgroups organized in multiple dimensions, including a first number of workgroups in a first dimension and a second number of workgroups in a second dimension. This may include determining multiple sub-kernels for the compute kernel, wherein a first sub-kernel includes, in the first dimension, a limited number of workgroups that is smaller than the first number of workgroups. The parser circuitry may iterate through workgroups in both the first and second dimensions to generate the set of workgroups, proceeding through the first sub-kernel before iterating through any of the other sub-kernels. Disclosed techniques may provide desirable shapes for batches of workgroups.

6.

Invention application
Completion Signaling Techniques in Distributed Processor (Active)

Publication (announcement) number: US20210279832A1

Publication (announcement) date: 2021-09-09

Application number: US16812724

Application date: 2020-03-09

Applicant: Apple Inc.

Inventor: Andrew M. Havlir, Ajay Simha Modugala

IPC: G06T1/20, G06T15/00, G06F9/50, G06F9/48

Abstract: Techniques are disclosed relating to tracking compute workgroup completions in a distributed processor. In some embodiments, an apparatus includes a plurality of shader processors configured to perform operations for compute workgroups included in compute kernels, a master workload parser circuit, a plurality of distributed workload parser circuits, and a communications fabric connected to the plurality of distributed workload parser circuits and the master workload parser circuit. In some embodiments, a distributed workload parser circuit is configured to maintain, for each of a set of the shader processors, a data structure that specifies a count of workgroup completions for one or more kernels processed by the shader processor, determine, for the set of shader processors based on counts of workgroup completions for a first kernel, an aggregate count of completions to report for the first kernel, send the aggregate count to the master workload parser circuit over the communications fabric, and adjust the data structures to reflect counts included in the aggregate count.
