-
Publication Number: US10698687B1
Publication Date: 2020-06-30
Application Number: US15695263
Filing Date: 2017-09-05
Applicant: Apple Inc.
Inventor: Dimitri Tan , Jeffrey T. Brady , Terence M. Potter , Jeffrey M. Broton , Frank W. Liljeros
Abstract: An example system includes a plurality of execution units, a shared resource, and an allocation control circuit. Each execution unit may generate a resource allocation request that includes a resource allocation size. The allocation control circuit may select a particular resource allocation request from the plurality of resource allocation requests, and determine an availability, based on an allocation register, of contiguous resource blocks within the shared resource. In response to determining that a number of the contiguous resource blocks satisfies a requested allocation size, the allocation control circuit may select an address corresponding to a particular resource block of the one or more contiguous resource blocks, and allocate the resource blocks to a corresponding execution unit. In response to a beginning of a second system clock cycle, the allocation control circuit may also update the allocation register based on the selected address and the requested allocation size.
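A minimal software sketch of the allocation step this abstract describes, with the allocation register modeled as a free/used bitmap; the class name, block count, and bitmap representation are assumptions made for illustration, not details taken from the patent.

```cpp
#include <bitset>
#include <cstddef>
#include <optional>

// Hypothetical model of the described allocation step: an allocation
// register (one bit per resource block) is scanned for a run of free
// contiguous blocks large enough for the requested allocation size.
constexpr std::size_t kNumBlocks = 64;  // assumed shared-resource size

class AllocationControlModel {
 public:
  // Returns the starting block index of a free contiguous run of
  // `request_size` blocks and marks those blocks allocated, or
  // std::nullopt if no sufficient run is available.
  std::optional<std::size_t> Allocate(std::size_t request_size) {
    std::size_t run_start = 0, run_len = 0;
    for (std::size_t i = 0; i < kNumBlocks; ++i) {
      if (!allocation_register_.test(i)) {
        if (run_len == 0) run_start = i;
        if (++run_len == request_size) {
          // Update the allocation register based on the selected
          // address (run_start) and the requested allocation size.
          for (std::size_t b = run_start; b < run_start + request_size; ++b)
            allocation_register_.set(b);
          return run_start;
        }
      } else {
        run_len = 0;
      }
    }
    return std::nullopt;  // not enough contiguous free blocks
  }

 private:
  std::bitset<kNumBlocks> allocation_register_;  // 1 = block in use
};
```

In hardware the scan and the register update would typically be split across clock cycles as the abstract suggests; the loop above only makes the selection rule explicit.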
-
Publication Number: US20190042312A1
Publication Date: 2019-02-07
Application Number: US15669445
Filing Date: 2017-08-04
Applicant: Apple Inc.
Inventor: Mark D. Earl , Dimitri Tan , Christopher L. Spencer , Jeffrey T. Brady , Ralph C. Taylor , Terence M. Potter
IPC: G06F9/50
Abstract: In various embodiments, a resource allocation management circuit may allocate a plurality of different types of hardware resources (e.g., different types of registers) to a plurality of threads. The different types of hardware resources may correspond to a plurality of hardware resource allocation circuits. The resource allocation management circuit may track allocation of the hardware resources to the threads using state identification values of the threads. In response to determining that fewer than a respective requested number of one or more types of the hardware resources are available, the resource allocation management circuit may identify one or more threads for deallocation. As a result, the hardware resource allocation system may allocate hardware resources to threads more efficiently (e.g., may deallocate hardware resources allocated to fewer threads), as compared to a hardware resource allocation system that does not track allocation of hardware resources to threads using state identification values.
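A rough sketch of the bookkeeping this abstract describes, keyed by a thread's state identification value; the resource types, capacities, and the deallocation-candidate selection rule are illustrative assumptions rather than details from the patent.

```cpp
#include <array>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical per-thread bookkeeping keyed by a state identification
// value, covering several hardware resource types.
enum ResourceType { kGPR = 0, kSpecialReg = 1, kLocalMem = 2, kNumTypes = 3 };

struct ResourceManagerModel {
  std::array<int, kNumTypes> free_count{256, 64, 1024};  // assumed capacities
  // state ID -> per-type allocation counts tracked for that thread
  std::unordered_map<uint32_t, std::array<int, kNumTypes>> allocations;

  // Try to allocate `request[t]` units of each type for thread `state_id`.
  // On a shortage, report the state IDs of threads whose tracked
  // allocations could be released (selection rule is illustrative).
  bool Allocate(uint32_t state_id, const std::array<int, kNumTypes>& request,
                std::vector<uint32_t>* deallocation_candidates) {
    for (int t = 0; t < kNumTypes; ++t) {
      if (request[t] > free_count[t]) {
        // Fewer resources available than requested: identify threads
        // holding this resource type as candidates for deallocation.
        for (const auto& [id, held] : allocations)
          if (held[t] > 0) deallocation_candidates->push_back(id);
        return false;
      }
    }
    auto& held = allocations[state_id];
    for (int t = 0; t < kNumTypes; ++t) {
      free_count[t] -= request[t];
      held[t] += request[t];
    }
    return true;
  }
};
```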
-
Publication Number: US11360780B2
Publication Date: 2022-06-14
Application Number: US16749618
Filing Date: 2020-01-22
Applicant: Apple Inc.
Inventor: Benjiman L. Goodman , Terence M. Potter , Anjana Rajendran , Jeffrey T. Brady , Brian K. Reynolds , Jeffrey A. Lohman
IPC: G06F9/38
Abstract: Techniques are disclosed relating to context switching in a SIMD processor. In some embodiments, an apparatus includes pipeline circuitry configured to execute graphics instructions included in threads of a group of single-instruction multiple-data (SIMD) threads in a thread group. In some embodiments, context switch circuitry is configured to atomically: save, for the SIMD group, a program counter and information that indicates whether threads in the SIMD group are active using one or more context switch registers, set all threads to an active state for the SIMD group, and branch to handler code for the SIMD group. In some embodiments, the pipeline circuitry is configured to execute the handler code to save context information for the SIMD group and subsequently execute threads of another thread group. Disclosed techniques may allow instruction-level context switching even when some SIMD threads are non-active.
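A small sketch of the atomic context-switch entry sequence described here, modeled in software for a 32-wide SIMD group; the register layout, handler address, and all names are assumptions made for illustration.

```cpp
#include <cstdint>

// Hypothetical model of the atomic context-switch entry sequence for a
// 32-wide SIMD group.
struct SimdGroupState {
  uint64_t pc = 0;           // program counter for the SIMD group
  uint32_t active_mask = 0;  // one bit per thread: 1 = active
};

struct ContextSwitchRegisters {
  uint64_t saved_pc = 0;
  uint32_t saved_active_mask = 0;
};

constexpr uint64_t kHandlerEntryPc = 0x1000;  // assumed handler address

// Performed atomically by the context switch circuitry: save the PC and
// the per-thread active bits to the context switch registers, force all
// threads active so the handler code runs on every lane, then branch to
// the handler, which saves the remaining context information.
void EnterContextSwitch(SimdGroupState& group, ContextSwitchRegisters& regs) {
  regs.saved_pc = group.pc;
  regs.saved_active_mask = group.active_mask;
  group.active_mask = 0xFFFFFFFFu;  // set all 32 threads active
  group.pc = kHandlerEntryPc;       // branch to handler code
}
```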
-
Publication Number: US20210224072A1
Publication Date: 2021-07-22
Application Number: US16749618
Filing Date: 2020-01-22
Applicant: Apple Inc.
Inventor: Benjiman L. Goodman , Terence M. Potter , Anjana Rajendran , Jeffrey T. Brady , Brian K. Reynolds , Jeffrey A. Lohman
IPC: G06F9/38
Abstract: Techniques are disclosed relating to context switching in a SIMD processor. In some embodiments, an apparatus includes pipeline circuitry configured to execute graphics instructions included in threads of a group of single-instruction multiple-data (SIMD) threads in a thread group. In some embodiments, context switch circuitry is configured to atomically: save, for the SIMD group, a program counter and information that indicates whether threads in the SIMD group are active using one or more context switch registers, set all threads to an active state for the SIMD group, and branch to handler code for the SIMD group. In some embodiments, the pipeline circuitry is configured to execute the handler code to save context information for the SIMD group and subsequently execute threads of another thread group. Disclosed techniques may allow instruction-level context switching even when some SIMD threads are non-active.
-
Publication Number: US10990445B2
Publication Date: 2021-04-27
Application Number: US15669445
Filing Date: 2017-08-04
Applicant: Apple Inc.
Inventor: Mark D. Earl , Dimitri Tan , Christopher L. Spencer , Jeffrey T. Brady , Ralph C. Taylor , Terence M. Potter
Abstract: In various embodiments, a resource allocation management circuit may allocate a plurality of different types of hardware resources (e.g., different types of registers) to a plurality of threads. The different types of hardware resources may correspond to a plurality of hardware resource allocation circuits. The resource allocation management circuit may track allocation of the hardware resources to the threads using state identification values of the threads. In response to determining that fewer than a respective requested number of one or more types of the hardware resources are available, the resource allocation management circuit may identify one or more threads for deallocation. As a result, the hardware resource allocation system may allocate hardware resources to threads more efficiently (e.g., may deallocate hardware resources allocated to fewer threads), as compared to a hardware resource allocation system that does not track allocation of hardware resources to threads using state identification values.
-
Publication Number: US20200097293A1
Publication Date: 2020-03-26
Application Number: US16143416
Filing Date: 2018-09-26
Applicant: Apple Inc.
Inventor: Andrew M. Havlir , Jeffrey T. Brady
Abstract: Techniques are disclosed relating to fetching items from a compute command stream that includes compute kernels. In some embodiments, stream fetch circuitry sequentially pre-fetches items from the stream and stores them in a buffer. In some embodiments, fetch parse circuitry iterates through items in the buffer using a fetch parse pointer to detect indirect-data-access items and/or redirect items in the buffer. The fetch parse circuitry may send detected indirect data accesses to indirect-fetch circuitry, which may buffer requests. In some embodiments, execute parse circuitry iterates through items in the buffer using an execute parse pointer (e.g., which may trail the fetch parse pointer) and outputs both item data from the buffer and indirect-fetch results from indirect-fetch circuitry for execution. In various embodiments, the disclosed techniques may reduce fetch latency for compute kernels.
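A compact sketch of the two-pointer scheme described here; the buffer layout, item kinds, and the indirect-fetch stand-in are all assumed for illustration. The fetch parse pointer queues indirect accesses early so the trailing execute parse pointer can emit their results in order.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical model: a fetch parse pointer runs ahead of an execute
// parse pointer over a buffer of pre-fetched stream items.
enum class ItemKind { kDirect, kIndirectAccess, kRedirect };

struct StreamItem {
  ItemKind kind;
  uint64_t payload;  // inline data, or an address for indirect items
};

struct StreamFetcherModel {
  std::vector<StreamItem> buffer;         // items pre-fetched from the stream
  std::size_t fetch_parse = 0;            // leads: detects indirect items
  std::size_t execute_parse = 0;          // trails: emits items for execution
  std::deque<uint64_t> indirect_results;  // buffered indirect-fetch results

  // Fetch-parse step: scan one item and issue indirect accesses early so
  // their data can be ready when the execute parse reaches them.
  void FetchParseStep() {
    if (fetch_parse >= buffer.size()) return;
    const StreamItem& item = buffer[fetch_parse++];
    if (item.kind == ItemKind::kIndirectAccess)
      indirect_results.push_back(LoadIndirect(item.payload));
  }

  // Execute-parse step: emit the next item's data, substituting the
  // buffered indirect-fetch result where the item was an indirect access.
  bool ExecuteParseStep(uint64_t* out) {
    if (execute_parse >= fetch_parse) return false;  // must trail fetch parse
    const StreamItem& item = buffer[execute_parse++];
    if (item.kind == ItemKind::kIndirectAccess) {
      *out = indirect_results.front();
      indirect_results.pop_front();
    } else {
      *out = item.payload;
    }
    return true;
  }

  // Stand-in for the memory access performed by indirect-fetch circuitry.
  static uint64_t LoadIndirect(uint64_t address) { return address ^ 0xA5A5; }
};
```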
-
Publication Number: US10475152B1
Publication Date: 2019-11-12
Application Number: US15896831
Filing Date: 2018-02-14
Applicant: Apple Inc.
Inventor: Andrew M. Havlir , Jeffrey T. Brady
IPC: G06T1/20 , G06F12/0891 , G06F9/38
Abstract: Techniques are disclosed relating to managing dependencies in a compute control stream that specifies operations to be performed on a programmable shader (e.g., of a graphics unit). In some embodiments, the compute control stream includes commands and kernels. In some embodiments, dependency circuitry is configured to maintain dependencies such that younger kernels are allowed to execute ahead of a type of cache-related command (e.g., a command that signals a cache flush and/or invalidate). Disclosed circuitry may include separate buffers for commands and kernels, command dependency circuitry, and kernel dependency circuitry. In various embodiments, the disclosed architecture may improve performance in a highly scalable manner.
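A minimal sketch of the ordering rule described here, assuming separate command and kernel buffers and a simplified command taxonomy; it models only the rule the abstract states, namely that kernels may bypass a pending cache flush/invalidate command but not other command types.

```cpp
#include <deque>
#include <string>

// Hypothetical model: commands and kernels are held in separate buffers,
// and younger kernels are allowed to issue ahead of a pending
// cache flush/invalidate command.
enum class CommandKind { kCacheFlushInvalidate, kBarrier, kOther };

struct ControlStreamModel {
  std::deque<CommandKind> command_buffer;
  std::deque<std::string> kernel_buffer;  // kernel identifiers, illustrative

  // Returns true if the oldest pending kernel may issue now. A kernel may
  // execute ahead of a cache-related command at the head of the command
  // buffer, but must wait behind other command kinds (e.g., barriers).
  bool KernelMayIssue() const {
    if (kernel_buffer.empty()) return false;
    if (command_buffer.empty()) return true;
    return command_buffer.front() == CommandKind::kCacheFlushInvalidate;
  }
};
```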
-
Publication Number: US10467724B1
Publication Date: 2019-11-05
Application Number: US15896923
Filing Date: 2018-02-14
Applicant: Apple Inc.
Inventor: Andrew M. Havlir , Jeffrey T. Brady
Abstract: Techniques are disclosed relating to dispatching compute work from a compute stream. In some embodiments, workgroup batch circuitry is configured to select (e.g., in a single clock cycle) multiple workgroups to be distributed to different shader circuitry. In some embodiments, iterator circuitry is configured to determine next positions in different dimensions at least partially in parallel. For example, in some embodiments, first circuitry is configured to determine a next position in a first dimension and an increment amount for a second dimension. In some embodiments, second circuitry is configured to determine at least partially in parallel with the determination of the next position in the first dimension, next positions in the second dimension for multiple possible increment amounts in the second dimension. In some embodiments, this may facilitate a configurable number of workgroups per batch and may increase performance, e.g., by increasing the overall number of workgroups dispatched per clock cycle.
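A rough sketch of the parallel next-position computation described here, for a 3D workgroup grid; the X-major iteration order, the bound on Y increments, and all names are assumptions. Hardware would evaluate the candidate positions concurrently, while the sketch computes them before selecting, to mirror that structure sequentially.

```cpp
#include <array>
#include <cstddef>

// Hypothetical model of advancing the workgroup iterator by one batch.
struct GridPos { std::size_t x = 0, y = 0, z = 0; };

constexpr std::size_t kMaxYIncrement = 4;  // assumed bound on Y increments

// Advances `pos` by `batch` workgroups within a grid of the given extents.
// "Second circuitry" precomputes next-Y candidates for every possible Y
// increment; "first circuitry" computes the next X position and the Y
// increment it implies, which then selects among the candidates.
GridPos NextBatchPosition(GridPos pos, std::size_t batch,
                          std::size_t dim_x, std::size_t dim_y) {
  // Candidate next-Y positions (and carries into Z) per possible increment.
  std::array<std::size_t, kMaxYIncrement + 1> y_candidates{};
  std::array<std::size_t, kMaxYIncrement + 1> z_carry{};
  for (std::size_t inc = 0; inc <= kMaxYIncrement; ++inc) {
    y_candidates[inc] = (pos.y + inc) % dim_y;
    z_carry[inc] = (pos.y + inc) / dim_y;
  }
  // Next X position and the increment amount for the second dimension.
  std::size_t x_total = pos.x + batch;
  std::size_t next_x = x_total % dim_x;
  std::size_t y_increment = x_total / dim_x;  // assumed <= kMaxYIncrement
  // Select the matching precomputed Y (and Z carry) result.
  return GridPos{next_x, y_candidates[y_increment],
                 pos.z + z_carry[y_increment]};
}
```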