Patent search ap:("Apple Inc.") AND inv:"Andrew M. Havlir" Page 4

31.

发明申请
APPARATUS IMPLEMENTING INSTRUCTIONS THAT IMPOSE PIPELINE INTERDEPENDENCIES 有权
Title translation: 设备实施管道接口的说明

公开(公告)号：US20150009223A1

公开(公告)日：2015-01-08

申请号：US13935299

申请日：2013-07-03

Applicant: Apple Inc.

Inventor： Andrew M. Havlir , Terence M. Potter

IPC: G06T1/20

CPC classification number: G06T1/20 , G06F9/3838 , G06F9/3851 , G06F9/3867 , G06F9/3885

Abstract: Techniques are disclosed relating to implementation of gradient-type graphics instructions. In one embodiment, an apparatus includes first and second execution pipelines and a register file. In this embodiment, the register file is coupled to the first and second execution pipelines and configured to store operands for the first and second execution pipelines. In this embodiment, the apparatus is configured to determine that a graphics instruction imposes a dependency between the first and second pipeline. In response to this determination, the apparatus is configured to read a plurality of operands from the register file including an operand assigned to the second execution pipeline and to select the operand assigned to the second execution pipeline as an input operand for the first execution pipeline. The apparatus may be configured such that operands assigned to the second execution pipeline are accessible by the first execution pipeline only via the register file and not from other locations.

Abstract translation: 公开了与梯度型图形指令的实现有关的技术。在一个实施例中，装置包括第一和第二执行流水线和寄存器文件。在该实施例中，寄存器文件耦合到第一和第二执行流水线并且被配置为存储用于第一和第二执行流水线的操作数。在该实施例中，该装置被配置为确定图形指令施加第一和第二流水线之间的依赖关系。响应于该确定，该装置被配置为从寄存器文件读取包括分配给第二执行流水线的操作数的多个操作数，并且将分配给第二执行流水线的操作数作为第一执行流水线的输入操作数进行选择。该装置可以被配置为使得分配给第二执行流水线的操作数仅由第一执行流水线仅通过寄存器文件而不是来自其他位置。

32.

发明公开
Compute Kernel Parsing with Limits in one or more Dimensions 审中-公开

公开(公告)号：US20240345892A1

公开(公告)日：2024-10-17

申请号：US18673959

申请日：2024-05-24

Applicant: Apple Inc.

Inventor： Andrew M. Havlir , Ajay Simha Modugala , Karl D. Mann

IPC: G06F9/50 , G06T1/20

CPC classification number: G06F9/505 , G06T1/20

Abstract: Techniques are disclosed relating to dispatching compute work from a compute stream. In some embodiments, a graphics processor executes instructions of compute kernels. Workload parser circuitry may determine, for distribution to the graphics processor circuitry, a set of workgroups from a compute kernel that includes workgroups organized in multiple dimensions, including a first number of workgroups in a first dimension and a second number of workgroups in a second dimension. This may include determining multiple sub-kernels for the compute kernel, wherein a first sub-kernel includes, in the first dimension, a limited number of workgroups that is smaller than the first number of workgroups. The parser circuitry may iterate through workgroups in both the first and second dimensions to generate the set of workgroups, proceeding through the first sub-kernel before iterating through any of the other sub-kernels. Disclosed techniques may provide desirable shapes for batches of workgroups.

33.

发明授权
Compute kernel parsing with limits in one or more dimensions with iterating through workgroups in the one or more dimensions for execution 有权

公开(公告)号：US12020075B2

公开(公告)日：2024-06-25

申请号：US17018913

申请日：2020-09-11

Applicant: Apple Inc.

Inventor： Andrew M. Havlir , Ajay Simha Modugala , Karl D. Mann

IPC: G06F9/50 , G06T1/20

CPC classification number: G06F9/505 , G06T1/20

Abstract: Techniques are disclosed relating to dispatching compute work from a compute stream. In some embodiments, a graphics processor executes instructions of compute kernels. Workload parser circuitry may determine, for distribution to the graphics processor circuitry, a set of workgroups from a compute kernel that includes workgroups organized in multiple dimensions, including a first number of workgroups in a first dimension and a second number of workgroups in a second dimension. This may include determining multiple sub-kernels for the compute kernel, wherein a first sub-kernel includes, in the first dimension, a limited number of workgroups that is smaller than the first number of workgroups. The parser circuitry may iterate through workgroups in both the first and second dimensions to generate the set of workgroups, proceeding through the first sub-kernel before iterating through any of the other sub-kernels. Disclosed techniques may provide desirable shapes for batches of workgroups.

34.

发明授权
Low latency fetch circuitry for compute kernels 有权

公开(公告)号：US11256510B2

公开(公告)日：2022-02-22

申请号：US17065761

申请日：2020-10-08

Applicant: Apple Inc.

Inventor： Andrew M. Havlir , Jeffrey T. Brady

IPC: G06F9/38 , G06T1/20

Abstract: Techniques are disclosed relating to fetching items from a compute command stream that includes compute kernels. In some embodiments, stream fetch circuitry sequentially pre-fetches items from the stream and stores them in a buffer. In some embodiments, fetch parse circuitry iterate through items in the buffer using a fetch parse pointer to detect indirect-data-access items and/or redirect items in the buffer. The fetch parse circuitry may send detected indirect data accesses to indirect-fetch circuitry, which may buffer requests. In some embodiments, execute parse circuitry iterates through items in the buffer using an execute parse pointer (e.g., which may trail the fetch parse pointer) and outputs both item data from the buffer and indirect-fetch results from indirect-fetch circuitry for execution. In various embodiments, the disclosed techniques may reduce fetch latency for compute kernels.

35.

发明申请
Completion Signaling Techniques in Distributed Processor 有权

公开(公告)号：US20210279832A1

公开(公告)日：2021-09-09

申请号：US16812724

申请日：2020-03-09

Applicant: Apple Inc.

Inventor： Andrew M. Havlir , Ajay Simha Modugala

IPC: G06T1/20 , G06T15/00 , G06F9/50 , G06F9/48

Abstract: Techniques are disclosed relating to tracking compute workgroup completions in a distributed processor. In some embodiments, an apparatus includes a plurality of shader processors configured to perform operations for compute workgroups included in compute kernels, a master workload parser circuit, a plurality of distributed workload parser circuits, and a communications fabric connected to the plurality of distributed workload parser circuits and the master workload parser circuit. In some embodiments, a distributed workload parser circuit is configured to maintain, for each of a set of the shader processors, a data structure that specifies a count of workgroup completions for one or more kernels processed by the shader processor, determine, for the set of shader processors based on counts of workgroup completions for a first kernel, an aggregate count of completions to report for the first kernel, send the aggregate count to the master workload parser circuit over the communications fabric, and adjust the data structures to reflect counts included in the aggregate count.

36.

发明授权
Techniques for context switching using distributed compute workload parsers 有权

公开(公告)号：US10901777B1

公开(公告)日：2021-01-26

申请号：US16143432

申请日：2018-09-26

Applicant: Apple Inc.

Inventor： Andrew M. Havlir , Jeffrey T. Brady

IPC: G06F9/48 , G06T15/80

Abstract: Techniques are disclosed relating to context switching using distributed compute workload parsers. In some embodiments, an apparatus includes a plurality of shader units configured to perform operations for compute workgroups included in compute kernels, a plurality of distributed workload parser circuits each configured to dispatch workgroups to a respective set of the shader units, a communications fabric, and a master workload parser circuit configured to communicate with the distributed workload parser circuits via the communications fabric. In some embodiments, the master workload parser circuit maintains a first set of master state information that does not change for a compute kernel based on operations by the shader units and a second set of master state information that may be changed by operations specified by the kernel. In some embodiments, the master workload parser circuit performs a multi-phase state storage process in communications with the distributed workload parser circuits.

37.

发明授权
Re-using graphics vertex identifiers for primitive blocks across states 有权

公开(公告)号：US10269091B1

公开(公告)日：2019-04-23

申请号：US15809687

申请日：2017-11-10

Applicant: Apple Inc.

Inventor： Michael A. Mang , Andrew M. Havlir

IPC: G06T1/20 , G06T15/00 , G06T1/60 , G06T7/49

Abstract: Techniques are disclosed relating to storage techniques for storing primitive information with vertex re-use. In some embodiments, graphics circuitry aggregates primitive information (including vertex data) for multiple primitives into a primitive block data structure. This may include storing only a single instance of a vertex for multiple primitives that share the vertex. The graphics circuitry may switch between primitive blocks, with one being active and the others non-active. For non-active primitive blocks, the graphics circuitry may track whether vertex identifiers have been used for a new vertex, which may prevent vertex re-use. If an identifier is not used for a new vertex, however, a vertex may be re-used across deactivation and reactivation of a primitive block.

38.

发明申请
Clause Chaining for Clause-Based Instruction Execution 审中-公开

公开(公告)号：US20180067748A1

公开(公告)日：2018-03-08

申请号：US15257386

申请日：2016-09-06

Applicant: Apple Inc.

Inventor： Andrew M. Havlir , Brian K. Reynolds , Liang Xia , Terence M. Potter

IPC: G06F9/38

CPC classification number: G06F9/3867 , G06F9/3851 , G06F9/3887

Abstract: Techniques are disclosed relating to clause-based execution of program instructions, which may be single-instruction multiple data (SIMD) computer instructions. In some embodiments, an apparatus includes execution circuitry configured to receive clauses of instructions and SIMD groups of input data to be operated on by the clauses. In some embodiments, the apparatus further includes one or more storage elements configured to store state information for clauses processed by the execution circuitry. In some embodiments, the apparatus further includes scheduling circuitry configured to send instructions of a first clause and corresponding input data for execution by the execution circuitry and indicate, prior to sending instruction and input data of a second clause to the execution circuitry for execution, whether the second clause and a first clause are assigned to operate on groups of input data corresponding to the same instruction stream. In some embodiments, the apparatus is configured to determine, based on the indication, whether to maintain as valid, for use by the second clause, stored state information for the first clause.

39.

发明申请
UNIFIED INTEGER AND FLOATING-POINT COMPARE CIRCUITRY 有权

公开(公告)号：US20170357506A1

公开(公告)日：2017-12-14

申请号：US15180725

申请日：2016-06-13

Applicant: Apple Inc.

Inventor： Liang-Kai Wang , Terence M. Potter , Andrew M. Havlir

IPC: G06F9/30

CPC classification number: G06F9/30021 , G06F9/3001 , G06F9/30083

Abstract: Techniques are disclosed relating to comparison circuitry. In some embodiments, compare circuitry is configured to generate comparison results for sets of inputs in both one or more integer formats and one or more floating-point formats. In some embodiments, the compare circuitry includes padding circuitry configured to add one or more bits to each of first and second input values to generate first and second padded values. In some embodiments, the compare circuitry also includes integer subtraction circuitry configured to subtract the first padded value from the second padded value to generate a subtraction result. In some embodiments, the compare circuitry includes output logic configured to generate the comparison result based on the subtraction result. In various embodiments, using at least a portion of the same circuitry (e.g., the subtractor) for both integer and floating-point comparisons may reduce processor area.

40.

发明申请
Floating-Point Multiply-Add with Down-Conversion 审中-公开

公开(公告)号：US20170293470A1

公开(公告)日：2017-10-12

申请号：US15092401

申请日：2016-04-06

Applicant: Apple Inc.

Inventor： Liang-Kai Wang , Terence M. Potter , Andrew M. Havlir , Yu Sun , Nicolas X. Pena , Xiao-Long Wu , Christopher A. Burns

IPC: G06F7/483

CPC classification number: G06F7/483 , G06F7/5443

Abstract: Techniques are disclosed relating to floating-point operations with down-conversion. In some embodiments, a floating-point unit is configured to perform fused multiply-addition operations based on first and second different instruction types. In some embodiments, the first instruction type specifies result in the first floating-point format and the second instruction type specifies fused multiply addition of input operands in the first floating-point format to generate a result in a second, lower-precision floating-point format. For example, the first format may be a 32-bit format and the second format may be a 16-bit format. In some embodiments, the floating-point unit includes rounding circuitry, exponent circuitry, and/or increment circuitry configured to generate signals for the second instruction type in the same pipeline stage as for the first instruction type. In some embodiments, disclosed techniques may reduce the number of pipeline stages included in the floating-point circuitry.

Search Results

Country/Region

Patent validity

Application date

Publication (announcement) day

applicant

The country/region where the applicant is located

Inventor

IPC

IPC Department

IPC class

IPC subclass

IPC group

IPC team

Appearance classification