Abstract:
Techniques are disclosed relating to per-pipeline control for an operand cache. In some embodiments, an apparatus includes a register file and multiple execution pipelines. In some embodiments, the apparatus also includes an operand cache that includes multiple entries that each include multiple portions that are each configured to store an operand for a corresponding execution pipeline. In some embodiments, the operand cache is configured, during operation of the apparatus, to store data in only a subset of the portions of an entry. In some embodiments, the apparatus is configured to store, for each entry in the operand cache, a per-entry validity value that indicates whether the entry is valid and per-portion state information that indicates whether data for each portion is valid and whether data for each portion is modified relative to data in a corresponding entry in the register file.
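A minimal C sketch of the described entry layout, assuming four execution pipelines and 32-bit operands (both hypothetical constants); the per-entry validity value and per-portion valid/dirty state follow the abstract directly:

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_PIPES 4  /* hypothetical number of execution pipelines */

/* One operand cache entry: a portion per pipeline, a per-entry
 * validity value, and per-portion valid/dirty state. */
typedef struct {
    uint32_t operand[NUM_PIPES]; /* one operand per corresponding pipeline */
    uint8_t  portion_valid;      /* bit i set: portion i holds valid data */
    uint8_t  portion_dirty;      /* bit i set: portion i is modified relative
                                    to the corresponding register file entry */
    bool     entry_valid;        /* per-entry validity value */
} oc_entry;

/* Store data in only a subset of the portions: here, a single portion. */
static void oc_write_portion(oc_entry *e, unsigned pipe, uint32_t value)
{
    e->operand[pipe]  = value;
    e->portion_valid |= (uint8_t)(1u << pipe);
    e->portion_dirty |= (uint8_t)(1u << pipe); /* needs write back */
    e->entry_valid    = true;
}
```

The per-portion dirty bits let the write back of an entry touch only the register file words that actually changed.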
Abstract:
Techniques are disclosed relating to predication. In one embodiment, a graphics processing unit is disclosed that includes a first set of architecturally-defined registers configured to store predication information. The graphics processing unit further includes a second set of registers configured to mirror the first set of registers and an execution pipeline configured to discontinue execution of an instruction sequence based on predication information in the second set of registers. In one embodiment, the second set of registers includes one or more registers proximal to an output of the execution pipeline. In some embodiments, the execution pipeline writes back a predicate value determined for a predicate writer to the second set of registers. The first set of architecturally-defined registers is then updated with the predicate value written back to the second set of registers. In some embodiments, the execution pipeline discontinues execution of the instruction sequence without stalling.
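A minimal C sketch of the two predicate register sets, with the register count and function names as illustrative assumptions; the writer updates the mirror set at the pipeline output, and the architectural set is then updated from it:

```c
#include <stdbool.h>

#define NUM_PREDS 8  /* hypothetical number of predicate registers */

typedef struct {
    bool arch[NUM_PREDS];   /* architecturally-defined predicate registers */
    bool mirror[NUM_PREDS]; /* mirror set near the execution pipeline output */
} pred_regs;

/* A predicate writer's result is written back to the mirror set;
 * the architectural set is then updated with that value. */
static void pred_writeback(pred_regs *p, unsigned r, bool value)
{
    p->mirror[r] = value;
    p->arch[r]   = p->mirror[r];
}

/* The pipeline consults the mirror set, so it can discontinue a predicated
 * instruction sequence without stalling on the architectural set. */
static bool discontinue_sequence(const pred_regs *p, unsigned r, bool run_if)
{
    return p->mirror[r] != run_if;
}
```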
Abstract:
An apparatus includes an operand cache for storing operands from a register file for use by execution circuitry. In some embodiments, eviction priority for the operand cache is based on the status of entries (e.g., whether dirty or clean) and the retention priority of entries. In some embodiments, entries are flushed differently based on their retention priority (e.g., low-priority entries may be pre-emptively flushed). In some embodiments, timing for cache clean operations is specified on a per-instruction basis. Disclosed techniques may spread out write backs in time, facilitate cache clean operations, facilitate thread switching, extend the time operands are available in an operand cache, and/or improve the use of compiler hints, in some embodiments.
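One plausible reading of the eviction ordering, sketched in C; the priority encoding and the preference for clean entries at equal retention priority are assumptions for illustration:

```c
#include <stdbool.h>

#define OC_ENTRIES 16  /* hypothetical operand cache size */

typedef struct {
    bool valid;
    bool dirty;              /* modified relative to the register file */
    unsigned retention_prio; /* 0 = lowest retention priority (evict first) */
} oc_state;

/* Pick an eviction victim: lowest retention priority first, and among
 * equals prefer clean entries, which need no write back. */
static int pick_victim(const oc_state e[OC_ENTRIES])
{
    int best = -1;
    for (int i = 0; i < OC_ENTRIES; i++) {
        if (!e[i].valid)
            return i;  /* free entry: no eviction needed */
        if (best < 0 ||
            e[i].retention_prio < e[best].retention_prio ||
            (e[i].retention_prio == e[best].retention_prio &&
             !e[i].dirty && e[best].dirty))
            best = i;
    }
    return best;
}
```

Under this ordering, pre-emptively flushing low-priority dirty entries turns them into cheap clean victims, which spreads write backs out in time.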
Abstract:
Techniques are disclosed relating to implementation of gradient-type graphics instructions. In one embodiment, an apparatus includes first and second execution pipelines and a register file. In this embodiment, the register file is coupled to the first and second execution pipelines and configured to store operands for the first and second execution pipelines. In this embodiment, the apparatus is configured to determine that a graphics instruction imposes a dependency between the first and second pipelines. In response to this determination, the apparatus is configured to read a plurality of operands from the register file, including an operand assigned to the second execution pipeline, and to select the operand assigned to the second execution pipeline as an input operand for the first execution pipeline. The apparatus may be configured such that operands assigned to the second execution pipeline are accessible by the first execution pipeline only via the register file and not from other locations.
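A minimal C sketch of the operand selection for a gradient-type instruction; the two-pipeline setup and the subtraction used to form the gradient are illustrative assumptions:

```c
#include <stdio.h>

/* Both pipelines' operands are read from the register file; when a
 * gradient-type instruction imposes a cross-pipeline dependency, the
 * operand assigned to the second pipeline is selected as an input for
 * the first pipeline (reachable only via the register file). */
static float pipe0_input(const float rf_read[2], int is_gradient)
{
    return is_gradient ? rf_read[1] : rf_read[0];
}

int main(void)
{
    float rf_read[2] = { 1.0f, 4.0f }; /* operands read for pipes 0 and 1 */
    /* e.g., a dFdx-style gradient: neighbor's value minus own value */
    float grad = pipe0_input(rf_read, 1) - rf_read[0];
    printf("gradient = %f\n", grad);
    return 0;
}
```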
Abstract:
Disclosed techniques relate to work distribution in graphics processors. In some embodiments, an apparatus includes circuitry that implements a plurality of logical slots and a set of graphics processor sub-units that each implement multiple distributed hardware slots. The circuitry may determine different distribution rules for first and second sets of graphics work and map logical slots to distributed hardware slots based on the distribution rules. In various embodiments, disclosed techniques may advantageously distribute work efficiently across distributed shader processors for graphics kicks of various sizes.
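A minimal C sketch of per-kick distribution rules mapping logical slots to distributed hardware slots; the two rules and the size threshold are assumptions about one way such a policy could look:

```c
#define NUM_SUBUNITS 4  /* hypothetical number of graphics processor sub-units */

typedef enum {
    RULE_SINGLE_SUBUNIT, /* small kick: keep it on one sub-unit */
    RULE_ALL_SUBUNITS    /* large kick: spread across all sub-units */
} dist_rule;

/* Determine a distribution rule for a set of graphics work (a kick). */
static dist_rule pick_rule(unsigned kick_size, unsigned threshold)
{
    return kick_size < threshold ? RULE_SINGLE_SUBUNIT : RULE_ALL_SUBUNITS;
}

/* Map one logical slot to distributed hardware slots per the rule;
 * returns how many sub-units receive a distributed slot for it. */
static unsigned map_logical_slot(dist_rule rule)
{
    return rule == RULE_SINGLE_SUBUNIT ? 1u : (unsigned)NUM_SUBUNITS;
}
```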
Abstract:
Techniques are disclosed relating to dynamically adjusting buffering for distributing compute work in a graphics processor. In some embodiments, the graphics processor includes shader circuitry configured to process compute work from a compute kernel, multiple distributed workload parser circuits configured to send compute work to the shader circuitry, primary workload parser circuitry configured to send, via a communications fabric, compute work from the compute kernel to the distributed workload parser circuits, and buffer circuitry configured to buffer compute work received by one or more of the distributed workload parser circuits from the primary workload parser circuitry. In some embodiments, the graphics processor is configured to dynamically adjust a limit on the number of entries used in the buffer circuitry based on information indicating complexity of the compute kernel. This may advantageously maintain launch rates while reducing or avoiding workload imbalances, in some embodiments.
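A minimal C sketch of the dynamic buffer limit; the inverse relation between kernel complexity and the entry limit is an assumption about one way such an adjustment could work:

```c
/* Dynamically adjust the limit on buffer entries used by a distributed
 * workload parser. Complex kernels (long-running workgroups) get a lower
 * limit so work is not queued too far ahead of the shaders, which could
 * otherwise cause workload imbalance; simple kernels get a higher limit
 * to sustain launch rates. */
static unsigned buffer_entry_limit(unsigned max_entries, unsigned complexity)
{
    unsigned limit = max_entries / (complexity + 1u);
    return limit ? limit : 1u; /* always allow at least one entry */
}
```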
Abstract:
Techniques are disclosed relating to tracking compute workgroup completions in a distributed processor. In some embodiments, an apparatus includes a plurality of shader processors configured to perform operations for compute workgroups included in compute kernels, a master workload parser circuit, a plurality of distributed workload parser circuits, and a communications fabric connected to the plurality of distributed workload parser circuits and the master workload parser circuit. In some embodiments, a distributed workload parser circuit is configured to maintain, for each of a set of the shader processors, a data structure that specifies a count of workgroup completions for one or more kernels processed by the shader processor, determine, for the set of shader processors based on counts of workgroup completions for a first kernel, an aggregate count of completions to report for the first kernel, send the aggregate count to the master workload parser circuit over the communications fabric, and adjust the data structures to reflect counts included in the aggregate count.
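A minimal C sketch of the per-shader completion counts and the aggregate report, with the shader count as a hypothetical parameter:

```c
#include <stdint.h>

#define NUM_SHADERS 4  /* hypothetical shaders per distributed parser */

/* Per-kernel data structure: a workgroup completion count per shader. */
typedef struct {
    uint32_t completions[NUM_SHADERS];
} kernel_counts;

/* Aggregate completions for one kernel, adjust the per-shader counts to
 * reflect what was included in the report, and return the count to send
 * to the master workload parser over the communications fabric. */
static uint32_t report_kernel(kernel_counts *k)
{
    uint32_t aggregate = 0;
    for (unsigned s = 0; s < NUM_SHADERS; s++) {
        aggregate += k->completions[s];
        k->completions[s] = 0; /* these completions are now reported */
    }
    return aggregate;
}
```

Batching completions into one aggregate message per kernel keeps fabric traffic proportional to reports rather than to individual workgroups.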
Abstract:
Techniques are disclosed relating to fetching items from a compute command stream that includes compute kernels. In some embodiments, stream fetch circuitry sequentially pre-fetches items from the stream and stores them in a buffer. In some embodiments, fetch parse circuitry iterates through items in the buffer using a fetch parse pointer to detect indirect-data-access items and/or redirect items in the buffer. The fetch parse circuitry may send detected indirect data accesses to indirect-fetch circuitry, which may buffer requests. In some embodiments, execute parse circuitry iterates through items in the buffer using an execute parse pointer (e.g., which may trail the fetch parse pointer) and outputs both item data from the buffer and indirect-fetch results from indirect-fetch circuitry for execution. In various embodiments, the disclosed techniques may reduce fetch latency for compute kernels.
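A minimal C sketch of the two parse pointers over the prefetch buffer; the item encoding, buffer depth, and callback for the indirect-fetch circuitry are illustrative assumptions:

```c
#include <stdbool.h>

#define BUF_SIZE 64  /* hypothetical prefetch buffer depth */

typedef struct {
    bool indirect;    /* item requires an indirect data access */
    unsigned payload; /* address or descriptor for the access */
} stream_item;

typedef struct {
    stream_item buf[BUF_SIZE];
    unsigned fetch_parse;   /* runs ahead, detecting indirect items */
    unsigned execute_parse; /* trails, emitting items for execution */
    unsigned filled;        /* entries written by stream fetch circuitry */
} stream_state;

/* Fetch-parse pass: advance ahead of the execute parse pointer, queueing
 * indirect accesses early so results are ready when execution arrives. */
static void fetch_parse_step(stream_state *s, void (*queue_indirect)(unsigned))
{
    while (s->fetch_parse < s->filled) {
        stream_item *it = &s->buf[s->fetch_parse % BUF_SIZE];
        if (it->indirect)
            queue_indirect(it->payload); /* to indirect-fetch circuitry */
        s->fetch_parse++;
    }
}
```

Letting the fetch parse pointer run ahead hides indirect-fetch latency behind the items the execute parse pointer has yet to reach.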