-
Publication No.: US20240095176A1
Publication Date: 2024-03-21
Application No.: US18054388
Filing Date: 2022-11-10
Applicant: Apple Inc.
Inventor: Benjiman L. Goodman, Yoong Chert Foo, Karl D. Mann, Terence M. Potter, Frank W. Liljeros, Jeffrey T. Brady
IPC: G06F12/0891, G06F12/0811
CPC classification number: G06F12/0891, G06F12/0811, G06F2212/6042
Abstract: Techniques are disclosed relating to thread preemption in the context of memory-backed registers. In some embodiments, a memory hierarchy includes one or more cache levels and one or more memory circuits. Execution circuitry may operate on operands in architectural registers to execute instructions of threads, where data for the architectural registers is stored and backed by the memory hierarchy. Control circuitry may, in response to a context switch indication for a given thread: flush and invalidate a set of architectural register data from a first cache level and store memory page information (e.g., a page catalog base address) associated with the set of architectural register data.
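The preemption step in this abstract can be pictured with a small software model. The following is a minimal sketch, assuming a dictionary stands in for the first cache level, another stands in for the lower levels of the hierarchy, and the page catalog base address is the only saved page information. The names `RegisterBackingModel`, `SavedContext`, and `context_switch` are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class SavedContext:
    thread_id: int
    page_catalog_base: int  # memory page information kept for the thread


@dataclass
class RegisterBackingModel:
    # First cache level: address -> (value, dirty flag).
    l1: dict = field(default_factory=dict)
    # Stand-in for the lower cache levels and memory circuits.
    lower: dict = field(default_factory=dict)
    saved: dict = field(default_factory=dict)

    def write_register(self, addr, value):
        # Register writes land in the first cache level and mark the line dirty.
        self.l1[addr] = (value, True)

    def context_switch(self, thread_id, reg_addrs, page_catalog_base):
        for addr in reg_addrs:
            entry = self.l1.pop(addr, None)  # pop() invalidates the L1 copy
            if entry is not None and entry[1]:
                self.lower[addr] = entry[0]  # flush dirty data downward
        # Keep the page information needed to locate the registers on resume.
        self.saved[thread_id] = SavedContext(thread_id, page_catalog_base)


model = RegisterBackingModel()
model.write_register(0x100, 7)
model.context_switch(thread_id=1, reg_addrs=[0x100], page_catalog_base=0x8000)
print(model.lower, model.saved[1].page_catalog_base)
```

Because the register contents already live in the memory hierarchy, the model only has to push dirty lines down and remember where the pages are; it never copies the whole register file.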
-
Publication No.: US20210026638A1
Publication Date: 2021-01-28
Application No.: US17065761
Filing Date: 2020-10-08
Applicant: Apple Inc.
Inventor: Andrew M. Havlir, Jeffrey T. Brady
Abstract: Techniques are disclosed relating to fetching items from a compute command stream that includes compute kernels. In some embodiments, stream fetch circuitry sequentially pre-fetches items from the stream and stores them in a buffer. In some embodiments, fetch parse circuitry iterates through items in the buffer using a fetch parse pointer to detect indirect-data-access items and/or redirect items in the buffer. The fetch parse circuitry may send detected indirect data accesses to indirect-fetch circuitry, which may buffer requests. In some embodiments, execute parse circuitry iterates through items in the buffer using an execute parse pointer (e.g., which may trail the fetch parse pointer) and outputs both item data from the buffer and indirect-fetch results from indirect-fetch circuitry for execution. In various embodiments, the disclosed techniques may reduce fetch latency for compute kernels.
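The two-pointer arrangement described here can be sketched in software. This is a minimal model, assuming a fixed lookahead distance between the fetch parse pointer and the trailing execute parse pointer, and assuming only two item kinds; redirect items are omitted. `Item`, `process_stream`, and `lookahead` are hypothetical names, not from the patent.

```python
from dataclasses import dataclass


@dataclass
class Item:
    kind: str     # hypothetical kinds: "kernel" or "indirect"
    payload: int  # kernel id, or the address an indirect item reads


def process_stream(items, memory, lookahead=2):
    """Two pointers over one buffer: fetch-parse runs ahead of execute-parse."""
    indirect_results = {}  # stands in for the indirect-fetch request buffer
    fetch_ptr = 0
    executed = []
    for exec_ptr, item in enumerate(items):
        # Advance the fetch-parse pointer up to `lookahead` items ahead.
        while fetch_ptr < len(items) and fetch_ptr <= exec_ptr + lookahead:
            ahead = items[fetch_ptr]
            if ahead.kind == "indirect":
                # Issue the indirect fetch early so its latency overlaps execution.
                indirect_results[fetch_ptr] = memory[ahead.payload]
            fetch_ptr += 1
        # The trailing execute-parse pointer emits the item plus any fetched data.
        executed.append((item.payload, indirect_results.pop(exec_ptr, None)))
    return executed


stream = [Item("kernel", 1), Item("indirect", 0x10), Item("kernel", 2)]
print(process_stream(stream, memory={0x10: 42}))
```

The latency benefit comes from the gap between the pointers: by the time the execute pointer reaches an indirect item, its fetch was issued several items earlier.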
-
Publication No.: US20210248006A1
Publication Date: 2021-08-12
Application No.: US17240406
Filing Date: 2021-04-26
Applicant: Apple Inc.
Inventor: Mark D. Earl, Dimitri Tan, Christopher L. Spencer, Jeffrey T. Brady, Ralph C. Taylor, Terence M. Potter
IPC: G06F9/50
Abstract: In various embodiments, a resource allocation management circuit may allocate a plurality of different types of hardware resources (e.g., different types of registers) to a plurality of threads. The different types of hardware resources may correspond to a plurality of hardware resource allocation circuits. The resource allocation management circuit may track allocation of the hardware resources to the threads using state identification values of the threads. In response to determining that fewer than a respective requested number of one or more types of the hardware resources are available, the resource allocation management circuit may identify one or more threads for deallocation. As a result, the hardware resource allocation system may allocate hardware resources to threads more efficiently (e.g., may deallocate hardware resources allocated to fewer threads), as compared to a hardware resource allocation system that does not track allocation of hardware resources to threads using state identification values.
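A rough sketch of tracking allocations by state identification value rather than by thread follows. It assumes that several threads may share one state ID and that the victim-selection rule (prefer state IDs that hold a scarce resource type and are shared by the fewest threads) is a hypothetical policy. The abstract does not specify one. `ResourceAllocationManager` and its methods are illustrative names only.

```python
from collections import defaultdict


class ResourceAllocationManager:
    """Tracks allocations per state ID instead of per thread."""

    def __init__(self, capacity):
        self.free = dict(capacity)         # resource type -> free units
        self.held = defaultdict(dict)      # state ID -> {resource type: units}
        self.threads = defaultdict(set)    # state ID -> thread IDs sharing it

    def allocate(self, thread_id, state_id, request):
        """Return [] on success, else candidate state IDs to deallocate."""
        short = [t for t, n in request.items() if self.free.get(t, 0) < n]
        if short:
            return self.pick_victims(short)
        for t, n in request.items():
            self.free[t] -= n
            self.held[state_id][t] = self.held[state_id].get(t, 0) + n
        self.threads[state_id].add(thread_id)
        return []

    def pick_victims(self, short_types):
        # Hypothetical policy: among state IDs holding a scarce type,
        # prefer those shared by the fewest threads.
        holders = [s for s, h in self.held.items()
                   if any(t in h for t in short_types)]
        return sorted(holders, key=lambda s: len(self.threads[s]))

    def deallocate(self, state_id):
        # Release everything held under one state ID in a single step.
        for t, n in self.held.pop(state_id, {}).items():
            self.free[t] += n
        self.threads.pop(state_id, None)


mgr = ResourceAllocationManager({"gpr": 8, "shared": 4})
mgr.allocate(1, "A", {"gpr": 6})
mgr.allocate(2, "A", {"gpr": 1})
mgr.allocate(3, "B", {"shared": 4})
victims = mgr.allocate(4, "C", {"gpr": 2, "shared": 1})
print(victims)
```

Keying the bookkeeping on state IDs means a single `deallocate` call frees everything a group of related threads holds across all resource types.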
-
Publication No.: US20200098160A1
Publication Date: 2020-03-26
Application No.: US16143412
Filing Date: 2018-09-26
Applicant: Apple Inc.
Inventor: Andrew M. Havlir, Benjamin Bowman, Jeffrey T. Brady
Abstract: Techniques are disclosed relating to distributing work from compute kernels using a distributed hierarchical parser architecture. In some embodiments, an apparatus includes a plurality of shader units configured to perform operations for compute workgroups included in compute kernels processed by the apparatus, a plurality of distributed workload parser circuits, and a communications fabric connected to the plurality of distributed workload parser circuits and a master workload parser circuit. In some embodiments, the master workload parser circuit is configured to iteratively determine a next position in multiple dimensions for a next batch of workgroups from the kernel and send batch information to the distributed workload parser circuits via the communications fabric to assign the batch of workgroups. In some embodiments, the distributed parsers maintain coordinate information for the kernel and update the coordinate information in response to the batch information, even when the distributed parsers are not assigned to execute the batch.
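The broadcast-and-track scheme in this abstract can be modeled as follows. This is a minimal sketch, assuming a 3-D grid walked in x-major order, a fixed batch size, and round-robin assignment of batches to distributed parsers. Round-robin is an assumption, since the abstract does not say how the target is chosen. `BatchInfo`, `advance`, `MasterParser`, and `DistributedParser` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class BatchInfo:
    start: tuple  # (x, y, z) of the first workgroup in the batch
    count: int    # number of workgroups in the batch
    target: int   # index of the distributed parser assigned the batch


def advance(pos, count, dims):
    """Advance a 3-D workgroup coordinate by `count` in x-major order."""
    x, y, z = pos
    dx, dy, _ = dims
    linear = x + dx * (y + dy * z) + count
    return (linear % dx, (linear // dx) % dy, linear // (dx * dy))


class DistributedParser:
    def __init__(self, idx, dims):
        self.idx, self.dims = idx, dims
        self.pos = (0, 0, 0)  # local copy of the kernel's coordinate state
        self.assigned = []

    def on_batch(self, batch):
        if batch.target == self.idx:
            self.assigned.append((batch.start, batch.count))
        # Every parser updates its coordinates, assigned or not.
        self.pos = advance(self.pos, batch.count, self.dims)


class MasterParser:
    def __init__(self, dims, parsers, batch_size):
        self.dims, self.parsers, self.batch_size = dims, parsers, batch_size
        self.pos = (0, 0, 0)

    def dispatch(self):
        total = self.dims[0] * self.dims[1] * self.dims[2]
        issued, turn = 0, 0
        while issued < total:
            count = min(self.batch_size, total - issued)
            batch = BatchInfo(self.pos, count, turn % len(self.parsers))
            for p in self.parsers:  # broadcast over the communications fabric
                p.on_batch(batch)
            self.pos = advance(self.pos, count, self.dims)
            issued += count
            turn += 1


parsers = [DistributedParser(i, (4, 2, 1)) for i in range(2)]
master = MasterParser((4, 2, 1), parsers, batch_size=3)
master.dispatch()
print([p.assigned for p in parsers])
```

Since every distributed parser sees every batch, each one's `pos` stays in step with the master's, so batch messages only need a start and a count rather than a full coordinate recomputation.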
-
Publication No.: US10593094B1
Publication Date: 2020-03-17
Application No.: US16143412
Filing Date: 2018-09-26
Applicant: Apple Inc.
Inventor: Andrew M. Havlir, Benjamin Bowman, Jeffrey T. Brady
Abstract: Techniques are disclosed relating to distributing work from compute kernels using a distributed hierarchical parser architecture. In some embodiments, an apparatus includes a plurality of shader units configured to perform operations for compute workgroups included in compute kernels processed by the apparatus, a plurality of distributed workload parser circuits, and a communications fabric connected to the plurality of distributed workload parser circuits and a master workload parser circuit. In some embodiments, the master workload parser circuit is configured to iteratively determine a next position in multiple dimensions for a next batch of workgroups from the kernel and send batch information to the distributed workload parser circuits via the communications fabric to assign the batch of workgroups. In some embodiments, the distributed parsers maintain coordinate information for the kernel and update the coordinate information in response to the batch information, even when the distributed parsers are not assigned to execute the batch.
-
Publication No.: US11256510B2
Publication Date: 2022-02-22
Application No.: US17065761
Filing Date: 2020-10-08
Applicant: Apple Inc.
Inventor: Andrew M. Havlir, Jeffrey T. Brady
Abstract: Techniques are disclosed relating to fetching items from a compute command stream that includes compute kernels. In some embodiments, stream fetch circuitry sequentially pre-fetches items from the stream and stores them in a buffer. In some embodiments, fetch parse circuitry iterates through items in the buffer using a fetch parse pointer to detect indirect-data-access items and/or redirect items in the buffer. The fetch parse circuitry may send detected indirect data accesses to indirect-fetch circuitry, which may buffer requests. In some embodiments, execute parse circuitry iterates through items in the buffer using an execute parse pointer (e.g., which may trail the fetch parse pointer) and outputs both item data from the buffer and indirect-fetch results from indirect-fetch circuitry for execution. In various embodiments, the disclosed techniques may reduce fetch latency for compute kernels.
-
Publication No.: US11023162B2
Publication Date: 2021-06-01
Application No.: US16548784
Filing Date: 2019-08-22
Applicant: Apple Inc.
Inventor: Jeffrey T. Brady, Sindhuja Sethuraman, Frank W. Liljeros, Adil M. Sadik
IPC: G06F12/0871, G06F3/06
Abstract: Techniques are disclosed relating to caches that support transient storage fields for cache entries. In some embodiments, cache circuitry includes a set of multiple cache entries that each include a tag field and a data field. In some embodiments, transient storage circuitry includes a transient storage field for each of the multiple cache entries. In some embodiments, cache control circuitry stores received first data in the data field of a cache entry and stores received transient data in a corresponding transient storage field. In response to an eviction determination for the cache entry, however, the cache control circuitry may write the first data but not the transient data to a backing memory for the cache circuitry. In various embodiments, disclosed techniques may allow caching additional data that is transient without increasing bandwidth to the backing memory.
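The eviction behavior described above, where data is written back but transient data is not, can be shown with a toy cache. This is a minimal sketch, assuming a direct-mapped organization and using the full address as the tag for simplicity. `CacheEntry` and `TransientCache` are hypothetical names, not structures from the patent.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class CacheEntry:
    tag: int
    data: Any
    transient: Any = None  # lives only while the entry is resident


class TransientCache:
    def __init__(self, num_entries, backing):
        self.entries = [None] * num_entries  # direct-mapped, for simplicity
        self.backing = backing               # stand-in for the backing memory

    def store(self, addr, data, transient=None):
        idx = addr % len(self.entries)
        old = self.entries[idx]
        if old is not None and old.tag != addr:
            self.evict(idx)
        self.entries[idx] = CacheEntry(addr, data, transient)

    def evict(self, idx):
        entry = self.entries[idx]
        # Only the data field is written back; the transient field is dropped,
        # so it consumes no backing-memory bandwidth.
        self.backing[entry.tag] = entry.data
        self.entries[idx] = None


backing = {}
cache = TransientCache(4, backing)
cache.store(1, "a", transient="meta")
cache.store(5, "b")  # maps to the same entry, evicting address 1
print(backing)
```

The transient field effectively enlarges each entry without enlarging write-back traffic, at the cost of that information being lost whenever the entry is evicted.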
-
Publication No.: US20210055883A1
Publication Date: 2021-02-25
Application No.: US16548784
Filing Date: 2019-08-22
Applicant: Apple Inc.
Inventor: Jeffrey T. Brady, Sindhuja Sethuraman, Frank W. Liljeros, Adil M. Sadik
IPC: G06F3/06, G06F12/0871
Abstract: Techniques are disclosed relating to caches that support transient storage fields for cache entries. In some embodiments, cache circuitry includes a set of multiple cache entries that each include a tag field and a data field. In some embodiments, transient storage circuitry includes a transient storage field for each of the multiple cache entries. In some embodiments, cache control circuitry stores received first data in the data field of a cache entry and stores received transient data in a corresponding transient storage field. In response to an eviction determination for the cache entry, however, the cache control circuitry may write the first data but not the transient data to a backing memory for the cache circuitry. In various embodiments, disclosed techniques may allow caching additional data that is transient without increasing bandwidth to the backing memory.
-
公开(公告)号:US10901777B1
公开(公告)日:2021-01-26
申请号:US16143432
申请日:2018-09-26
Applicant: Apple Inc.
Inventor: Andrew M. Havlir , Jeffrey T. Brady
Abstract: Techniques are disclosed relating to context switching using distributed compute workload parsers. In some embodiments, an apparatus includes a plurality of shader units configured to perform operations for compute workgroups included in compute kernels, a plurality of distributed workload parser circuits each configured to dispatch workgroups to a respective set of the shader units, a communications fabric, and a master workload parser circuit configured to communicate with the distributed workload parser circuits via the communications fabric. In some embodiments, the master workload parser circuit maintains a first set of master state information that does not change for a compute kernel based on operations by the shader units and a second set of master state information that may be changed by operations specified by the kernel. In some embodiments, the master workload parser circuit performs a multi-phase state storage process in communications with the distributed workload parser circuits.
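The abstract names a multi-phase state storage process but does not spell out the phases. The following is only a hypothetical illustration of the idea, assuming three phases: halt and acknowledge, collect distributed state, then store both sets of master state. It also assumes the state sets are plain dictionaries. `MasterParser`, `DistributedParser`, `stop`, and `save_context` are illustrative names, and the phase boundaries are not taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class MasterParser:
    # First set: fixed for the kernel regardless of shader-unit operations.
    static_state: dict
    # Second set: may be changed by operations the kernel specifies.
    dynamic_state: dict


@dataclass
class DistributedParser:
    idx: int
    local_state: dict = field(default_factory=dict)
    dispatching: bool = True

    def stop(self):
        self.dispatching = False  # stop launching new workgroups
        return self.idx           # acknowledgment sent back over the fabric


def save_context(master, parsers):
    """Hypothetical three-phase save; phase boundaries are assumptions."""
    # Phase 1: halt every distributed parser and wait for all acknowledgments.
    acks = {p.stop() for p in parsers}
    if acks != {p.idx for p in parsers}:
        raise RuntimeError("a distributed parser did not acknowledge")
    # Phase 2: collect distributed state only after dispatch has stopped.
    distributed = {p.idx: dict(p.local_state) for p in parsers}
    # Phase 3: store both sets of master state alongside it.
    return {
        "static": dict(master.static_state),
        "dynamic": dict(master.dynamic_state),
        "distributed": distributed,
    }


master = MasterParser({"grid": (64, 1, 1)}, {"next_batch": 12})
parsers = [DistributedParser(i, {"next_wg": 10 + i}) for i in range(2)]
snapshot = save_context(master, parsers)
print(snapshot)
```

Ordering the phases this way ensures that no distributed parser is still dispatching work while its state is being captured, so the snapshot is consistent.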
-
Publication No.: US12182926B1
Publication Date: 2024-12-31
Application No.: US18055111
Filing Date: 2022-11-14
Applicant: Apple Inc.
Inventor: Jeffrey T. Brady, Jason D. Carroll, Michael A. Mang, Ralph C. Taylor
Abstract: Techniques are disclosed relating to using an initial version of an object shader to determine a child count and distribute geometry work based on the child count. In some embodiments, graphics shader circuitry is configured to execute shader programs including object shaders and mesh shaders. Vertex control circuitry is configured to, for a given object shader: launch an initial version of the given object shader to determine a number of meshlets to be generated by the given object shader (e.g., where the initial version of the given object shader does not commit side effects to architectural state of the apparatus) and select shader circuitry to execute a complete version of the given object shader based on the determined number of meshlets.
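The two-pass flow in this abstract can be sketched as follows. The initial version only counts meshlets, and the complete version commits results on a unit chosen from that count. This is a minimal model, assuming a toy object shader that emits one meshlet per 64 triangles and a least-loaded placement policy. Both assumptions are hypothetical, as are the names `object_shader` and `distribute`.

```python
import math


def object_shader(obj, commit):
    """Hypothetical object shader: one meshlet per 64 triangles."""
    count = math.ceil(obj["triangles"] / 64)
    if commit:
        # Only the complete version writes results (its side effects).
        obj.setdefault("meshlets", []).extend(range(count))
    return count


def distribute(objects, num_units):
    loads = [0] * num_units
    placement = {}
    for name, obj in objects.items():
        # Initial version: learn the meshlet count without committing anything.
        n = object_shader(obj, commit=False)
        # Hypothetical policy: pick the least-loaded shader unit.
        unit = min(range(num_units), key=lambda u: loads[u])
        loads[unit] += n
        placement[name] = unit
        # Complete version, notionally run on the selected unit.
        object_shader(obj, commit=True)
    return placement, loads


objects = {"a": {"triangles": 200}, "b": {"triangles": 64}, "c": {"triangles": 10}}
print(distribute(objects, num_units=2))
```

Running the side-effect-free pass first lets the scheduler balance geometry work by its actual child count instead of guessing from the object alone.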