-
Publication No.: US20200065104A1
Publication Date: 2020-02-27
Application No.: US16112614
Filing Date: 2018-08-24
Applicant: Apple Inc.
Inventor: Robert D. Kenney, Terence M. Potter, Andrew M. Havlir, Sivayya V. Ayinala
Abstract: Techniques are disclosed relating to controlling an operand cache in a pipelined fashion. An operand cache may cache operands fetched from the register file or generated by previous instructions to improve performance and/or reduce power consumption. In some embodiments, instructions are pipelined and separate tag information is maintained to indicate allocation of an operand cache entry and ownership of the operand cache entry. In some embodiments, this may allow an operand to remain in the operand cache (and potentially be retrieved or modified) during an interval between allocation of the entry for another operand and ownership of the entry by the other operand. This may improve operand cache efficiency by, for example, allowing the entry to be used while the other operand is being retrieved from the register file.
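To make the allocation/ownership split concrete, here is a minimal Python sketch. It assumes a fully associative cache and uses hypothetical names (`PipelinedOperandCache`, `fill`, `allocate`, `take_ownership`) that do not come from the patent. Lookups match on the ownership tag, so after an entry is allocated to a new register, the old operand can still be read or modified until the new operand arrives and takes ownership. Writing back a modified old value to the register file is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    owner: Optional[int] = None         # register whose value the entry holds now
    allocated_to: Optional[int] = None  # register the entry is reserved for
    value: Optional[int] = None


class PipelinedOperandCache:
    def __init__(self, num_entries: int) -> None:
        self.entries: List[Entry] = [Entry() for _ in range(num_entries)]

    def _find(self, reg: int) -> Optional[Entry]:
        # Hits are matched on the ownership tag, not the allocation tag.
        for entry in self.entries:
            if entry.owner == reg:
                return entry
        return None

    def fill(self, index: int, reg: int, value: int) -> None:
        # Entry allocated to and owned by the same register.
        self.entries[index] = Entry(owner=reg, allocated_to=reg, value=value)

    def lookup(self, reg: int) -> Optional[int]:
        entry = self._find(reg)
        return entry.value if entry is not None else None

    def write(self, reg: int, value: int) -> bool:
        # The old operand may still be modified during the interval.
        entry = self._find(reg)
        if entry is None:
            return False
        entry.value = value
        return True

    def allocate(self, index: int, reg: int) -> None:
        # Early pipeline stage: reserve the entry; the old owner stays readable.
        self.entries[index].allocated_to = reg

    def take_ownership(self, index: int, value: int) -> None:
        # Later stage: the register-file read for the new operand completes.
        entry = self.entries[index]
        entry.owner, entry.value = entry.allocated_to, value
```

In this model, an instruction that reads register 3 after `allocate(0, 7)` but before `take_ownership(0, ...)` still hits in the cache, which is the efficiency gain the abstract describes.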
-
Publication No.: US10445852B2
Publication Date: 2019-10-15
Application No.: US15388804
Filing Date: 2016-12-22
Applicant: Apple Inc.
Inventor: Terence M. Potter, Robert Kenney, Aaftab A. Munshi, Justin A. Hensley, Richard W. Schreyer
Abstract: Techniques are disclosed relating to a hardware-supported flexible data structure for graphics processing. In some embodiments, dimensions of the data structure are configurable in an X direction, a Y direction, a number of samples per pixel, and an amount of data per sample. In some embodiments, these attributes are configurable using hardware registers. In some embodiments, the data structure is persistent across a tile being processed such that local memory context is accessible to both rendering threads of a render pass and mid-render compute threads.
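The four configurable dimensions determine the structure's size and how a pixel sample is addressed. The sketch below assumes a row-major, sample-interleaved layout and a hypothetical `TileBufferConfig` type standing in for the hardware registers. The abstract does not specify the actual register format or memory ordering, and this model ignores the persistence across render and compute threads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TileBufferConfig:
    # Hypothetical stand-ins for the configuration registers.
    width: int              # X dimension, in pixels
    height: int             # Y dimension, in pixels
    samples_per_pixel: int
    bytes_per_sample: int

    def size_bytes(self) -> int:
        return self.width * self.height * self.samples_per_pixel * self.bytes_per_sample

    def offset(self, x: int, y: int, sample: int) -> int:
        # Assumed layout: rows of pixels, each pixel's samples stored contiguously.
        if not (0 <= x < self.width and 0 <= y < self.height
                and 0 <= sample < self.samples_per_pixel):
            raise IndexError("coordinate outside the configured structure")
        pixel = y * self.width + x
        return (pixel * self.samples_per_pixel + sample) * self.bytes_per_sample
```

For a 32x32 tile with 4 samples per pixel and 8 bytes per sample, the structure occupies 32 KiB. Changing any one field resizes it without requiring a different shader-visible layout rule.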
-
Publication No.: US10353711B2
Publication Date: 2019-07-16
Application No.: US15257386
Filing Date: 2016-09-06
Applicant: Apple Inc.
Inventor: Andrew M. Havlir, Brian K. Reynolds, Liang Xia, Terence M. Potter
Abstract: Techniques are disclosed relating to clause-based execution of program instructions, which may be single-instruction multiple-data (SIMD) computer instructions. In some embodiments, an apparatus includes execution circuitry configured to receive clauses of instructions and SIMD groups of input data to be operated on by the clauses. In some embodiments, the apparatus further includes one or more storage elements configured to store state information for clauses processed by the execution circuitry. In some embodiments, the apparatus further includes scheduling circuitry configured to send instructions of a first clause and corresponding input data for execution by the execution circuitry and to indicate, prior to sending instructions and input data of a second clause to the execution circuitry for execution, whether the second clause and the first clause are assigned to operate on groups of input data corresponding to the same instruction stream. In some embodiments, the apparatus is configured to determine, based on the indication, whether to maintain as valid, for use by the second clause, stored state information for the first clause.
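The keep-or-invalidate decision can be sketched as follows. This is a simplified model, assuming each clause is tagged with a hypothetical `stream_id` and that "state" is a dictionary of temporaries; it is not the patent's circuitry. Before each clause is sent, the scheduler indicates whether it shares the previous clause's instruction stream, and stored state survives only when it does.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ExecutionUnit:
    # Hypothetical stored state, e.g. temporaries a clause leaves for the next one.
    stored_state: Dict[str, int] = field(default_factory=dict)

    def run_clause(self, same_stream: bool, writes: Dict[str, int]) -> Dict[str, int]:
        if not same_stream:
            # Clause belongs to a different instruction stream: invalidate old state.
            self.stored_state.clear()
        reusable = dict(self.stored_state)  # state this clause may consume
        self.stored_state.update(writes)
        return reusable


def issue(unit: ExecutionUnit,
          clauses: List[Tuple[int, Dict[str, int]]]) -> List[Dict[str, int]]:
    """Send (stream_id, writes) clauses in order; before each one, indicate
    whether it shares the previous clause's instruction stream."""
    previous: Optional[int] = None
    reused = []
    for stream_id, writes in clauses:
        same_stream = previous == stream_id
        reused.append(unit.run_clause(same_stream, writes))
        previous = stream_id
    return reused
```

Running clauses on streams `0, 0, 1` gives the second clause access to the first clause's temporaries, while the third clause starts with empty state.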
-
Publication No.: US10324726B1
Publication Date: 2019-06-18
Application No.: US15429982
Filing Date: 2017-02-10
Applicant: Apple Inc.
Inventor: Michael A. Geary, Brian K. Reynolds, Terence M. Potter
IPC: G06F9/30, G06F9/38, G06F12/0897, G06F12/0875
Abstract: Techniques are disclosed relating to scheduling graphics instructions for execution on different types of execution units based on characteristics of decoded and cached graphics instructions. In some embodiments, a graphics unit includes multiple different types of execution units that are configured to execute different types of instructions (e.g., different units for datapath, sample, load/store, etc.). In some embodiments, the graphics unit stores decoded instructions in an instruction cache in at least one cache level, along with information specifying characteristics of the instructions. The characteristics may be stored at clause granularity and may indicate the type of instructions in each clause (e.g., which type of execution unit is configured to execute the instructions). In some embodiments, scheduling circuitry is configured to access the stored information and, based on it, select instructions from the instruction cache to send to ones of the plurality of execution units.
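A rough sketch of scheduling on stored clause characteristics, assuming a hypothetical `CachedClause` record that pairs decoded instructions with the execution-unit type they need. Because the type is stored with the cached clause, the scheduler can pick work for each free unit without re-decoding. The one-clause-per-unit, oldest-first policy here is an illustrative assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Set


class UnitType(Enum):
    DATAPATH = "datapath"
    SAMPLE = "sample"
    LOAD_STORE = "load_store"


@dataclass(frozen=True)
class CachedClause:
    name: str       # stands in for the decoded instructions
    unit: UnitType  # per-clause characteristic stored alongside them


def select(cache: List[CachedClause], free_units: Set[UnitType]) -> Dict[UnitType, CachedClause]:
    """Pick, in cache order, at most one clause for each free execution unit
    type, using only the stored characteristic."""
    picked: Dict[UnitType, CachedClause] = {}
    for clause in cache:
        if clause.unit in free_units and clause.unit not in picked:
            picked[clause.unit] = clause
    return picked
```

With the datapath and sample units free, the oldest datapath clause and the oldest sample clause are chosen, and load/store clauses wait.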
-
Publication No.: US20180349146A1
Publication Date: 2018-12-06
Application No.: US15615412
Filing Date: 2017-06-06
Applicant: Apple Inc.
Inventor: Tatsuya Iwamoto, Kutty Banerjee, Benjiman L. Goodman, Terence M. Potter
Abstract: In general, techniques are disclosed for tracking and allocating graphics processor hardware over specified periods of time. More particularly, hardware sensors may be used to determine the utilization of graphics processor hardware after each of a number of specified intervals (referred to as "sample intervals"). The utilization values so captured may be combined after a first number of sample intervals (the combined interval referred to as an "epoch interval") and used to determine a normalized utilization of the graphics processor's hardware resources. Normalized epoch utilization values are adjusted to account for resources used by concurrently executing processes. In some embodiments, a lower-priority process that obtains and fails to release resources that should be allocated to one or more higher-priority processes may be detected, paused, and its hardware resources given to the higher-priority processes.
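The sample-to-epoch aggregation and hog detection might be sketched as below. The normalization rule (scaling a sample down when concurrent processes together report more than the whole resource) and the per-process `allotment` are assumptions made for illustration; the abstract does not state the actual formula or policy.

```python
from typing import Dict, List


def epoch_utilization(samples: List[Dict[str, float]]) -> Dict[str, float]:
    """Combine per-sample-interval readings (process -> fraction of the GPU
    resource observed in use) into normalized per-process epoch utilization."""
    if not samples:
        return {}
    totals: Dict[str, float] = {}
    for sample in samples:
        busy = sum(sample.values())
        # Assumed normalization: if concurrently running processes together
        # report more than the whole resource, scale them to sum to 1.0.
        scale = 1.0 / busy if busy > 1.0 else 1.0
        for proc, used in sample.items():
            totals[proc] = totals.get(proc, 0.0) + used * scale
    return {proc: total / len(samples) for proc, total in totals.items()}


def find_hogs(util: Dict[str, float], priority: Dict[str, int],
              allotment: Dict[str, float]) -> List[str]:
    """Processes over their (hypothetical) allotment while some higher-priority
    process is below its own; candidates to pause and reclaim resources from."""
    hogs = []
    for proc, used in util.items():
        starved_higher = any(
            priority[other] > priority[proc] and util.get(other, 0.0) < share
            for other, share in allotment.items()
        )
        if used > allotment[proc] and starved_higher:
            hogs.append(proc)
    return sorted(hogs)
```

Here, a background process holding roughly 78% of the resource while a higher-priority UI process gets about 23% of a 50% allotment would be flagged for pausing.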
-
Publication No.: US20180182058A1
Publication Date: 2018-06-28
Application No.: US15388804
Filing Date: 2016-12-22
Applicant: Apple Inc.
Inventor: Terence M. Potter, Robert Kenney, Aaftab A. Munshi, Justin A. Hensley, Richard W. Schreyer
Abstract: Techniques are disclosed relating to a hardware-supported flexible data structure for graphics processing. In some embodiments, dimensions of the data structure are configurable in an X direction, a Y direction, a number of samples per pixel, and an amount of data per sample. In some embodiments, these attributes are configurable using hardware registers. In some embodiments, the data structure is persistent across a tile being processed such that local memory context is accessible to both rendering threads of a render pass and mid-render compute threads.
-
Publication No.: US20180089090A1
Publication Date: 2018-03-29
Application No.: US15274098
Filing Date: 2016-09-23
Applicant: Apple Inc.
Inventor: Andrew M. Havlir, Terence M. Potter
IPC: G06F12/0875, G06F9/30, G06F9/38
CPC classification number: G06F12/0875, G06F9/30072, G06F9/383, G06F12/0888, G06F2212/452
Abstract: In some embodiments, a system includes an execution unit, a register file, an operand cache, and a predication control circuit. Operands identified by an instruction may be stored in the operand cache. One or more entries of the operand cache that store the operands may be marked as dirty. The predication control circuit may identify an instruction as having an unresolved predication state. Subsequent to initiating execution of the instruction, the predication control circuit may receive results of the unresolved conditional instruction or instructions on which that predication state depends. In response to the results indicating the instruction has a known-to-execute predication state, the predication control circuit may initiate writing results of executing the instruction to the operand cache. In response to the results indicating the instruction has a known-not-to-execute predication state, the predication control circuit may prevent the results of executing the instruction from being written to the operand cache.
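A small model of gating operand-cache writes on predication, assuming a hypothetical `PredicationControl` class in which a single outstanding condition governs all held results (a simplification). The instruction executes speculatively; only its write to the operand cache waits for the predicate, and a known-not-to-execute result never reaches the cache or its dirty set.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Set, Tuple


class Pred(Enum):
    UNRESOLVED = 0
    KNOWN_TO_EXECUTE = 1
    KNOWN_NOT_TO_EXECUTE = 2


@dataclass
class PredicationControl:
    operand_cache: Dict[int, int] = field(default_factory=dict)  # reg -> value
    dirty: Set[int] = field(default_factory=set)  # entries newer than the register file
    held: List[Tuple[int, int]] = field(default_factory=list)  # (dest reg, result)

    def complete(self, dest: int, result: int, state: Pred) -> None:
        # The instruction has already executed; only the operand-cache write is gated.
        if state is Pred.UNRESOLVED:
            self.held.append((dest, result))
        elif state is Pred.KNOWN_TO_EXECUTE:
            self._write(dest, result)
        # KNOWN_NOT_TO_EXECUTE: the result is simply dropped.

    def resolve(self, executes: bool) -> None:
        # Simplification: one outstanding condition governs every held result.
        if executes:
            for dest, result in self.held:
                self._write(dest, result)
        self.held.clear()

    def _write(self, dest: int, result: int) -> None:
        self.operand_cache[dest] = result
        self.dirty.add(dest)
```

If the condition resolves false, the cached value of the destination register and its dirty status are left exactly as they were before the instruction ran.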
-
Publication No.: US09652233B2
Publication Date: 2017-05-16
Application No.: US13971782
Filing Date: 2013-08-20
Applicant: Apple Inc.
Inventor: Terence M. Potter, Timothy A. Olson, James S. Blomgren, Andrew M. Havlir, Michael Geary
IPC: G06F12/00, G06F13/00, G06F13/28, G06F9/30, G06F9/38, G06T1/60, G06F12/0875, G06F12/0862
CPC classification number: G06F9/30043, G06F9/38, G06F12/0862, G06F12/0875, G06F2212/452, G06T1/60, Y02D10/13
Abstract: Instructions may require one or more operands to be executed, which may be provided from a register file. In the context of a GPU, however, a register file may be a relatively large structure, and reading from the register file may be energy- and/or time-intensive. An operand cache may be used to store a subset of operands, and may use less power and have quicker access times than the register file. Hint values may be used in some embodiments to suggest that a particular operand should be stored in the operand cache (so that it is available for current or future use). In one embodiment, a hint value indicates that an operand should be cached whenever possible. Hint values may be determined by software, such as a compiler, in some embodiments. One or more criteria may be used to determine hint values, such as how soon in the future or how frequently an operand will be used again.
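Both halves of the hint scheme can be sketched briefly: a compiler-side pass that sets a hint when a register is reused soon, and a small LRU operand cache that only keeps hinted values. The reuse-distance criterion, the LRU policy, and all names (`compute_hints`, `HintedOperandCache`) are illustrative assumptions rather than the patented design.

```python
from collections import OrderedDict
from typing import Dict, List


def compute_hints(uses: List[int], max_distance: int) -> List[bool]:
    """Compiler-side sketch: hint 'cache this' when the same register is read
    again within the next `max_distance` operand reads."""
    return [reg in uses[i + 1:i + 1 + max_distance] for i, reg in enumerate(uses)]


class HintedOperandCache:
    """Small LRU operand cache in front of the register file; the hint decides
    whether a value read from the register file is kept."""

    def __init__(self, capacity: int, register_file: Dict[int, int]) -> None:
        self.capacity = capacity
        self.register_file = register_file
        self.entries: "OrderedDict[int, int]" = OrderedDict()
        self.register_file_reads = 0

    def read(self, reg: int, cache_hint: bool) -> int:
        if reg in self.entries:
            self.entries.move_to_end(reg)  # hit: refresh LRU position
            return self.entries[reg]
        self.register_file_reads += 1      # the slower, higher-energy access
        value = self.register_file[reg]
        if cache_hint and self.capacity > 0:
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)  # evict least recently used
            self.entries[reg] = value
        return value
```

For the operand sequence `r1, r2, r1, r3, r1` with a reuse window of 2, only `r1` is hinted, so the register file is read three times instead of five and `r2`/`r3` never displace `r1`.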
-
Publication No.: US09508112B2
Publication Date: 2016-11-29
Application No.: US13956299
Filing Date: 2013-07-31
Applicant: Apple Inc.
Inventor: Andrew M. Havlir, James S. Blomgren, Terence M. Potter
CPC classification number: G06T1/20, G06F9/3012, G06F9/30138, G06F9/3826, G06F9/3851, G06F9/3867, G06F9/3873, G06T1/60
Abstract: Techniques are disclosed relating to a multithreaded execution pipeline. In some embodiments, an apparatus is configured to assign to an execution pipeline a number of threads that is an integer multiple of the minimum number of cycles that an execution unit is configured to use to generate an execution result from a given set of input operands. In one embodiment, the apparatus is configured to require strict ordering of the threads. In one embodiment, the apparatus is configured so that the same thread accesses (e.g., reads and writes) a register file in a given cycle. In one embodiment, the apparatus is configured so that the same thread does not write back an operand and a result to an operand cache in a given cycle.
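Why a thread count tied to the execution latency helps can be shown with a tiny simulation. This sketch assumes one instruction issued per cycle in strict round-robin thread order and that each instruction depends on its thread's previous result; the function names are hypothetical. It models only that dependency, not the register-file or operand-cache port rules in the abstract.

```python
def thread_count(latency: int, multiple: int = 1) -> int:
    """Threads assigned to the pipeline: an integer multiple of the execution
    unit's minimum result latency, in cycles."""
    if latency < 1 or multiple < 1:
        raise ValueError("latency and multiple must be positive")
    return latency * multiple


def round_robin_is_hazard_free(num_threads: int, latency: int, cycles: int) -> bool:
    """Issue one instruction per cycle in strict thread order and check that each
    thread's previous result is ready before its next instruction issues."""
    ready_at = [0] * num_threads
    for cycle in range(cycles):
        t = cycle % num_threads  # strict ordering of threads
        if cycle < ready_at[t]:
            return False         # would need a stall or a forwarding path
        ready_at[t] = cycle + latency
    return True
```

With a 4-cycle latency, 4 or 8 threads never issue an instruction before the previous result from the same thread is ready, while 3 threads would require a stall.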
-
Publication No.: US09264066B2
Publication Date: 2016-02-16
Application No.: US13954936
Filing Date: 2013-07-30
Applicant: Apple Inc.
Inventor: James S. Blomgren, Terence M. Potter
CPC classification number: H03M7/24, G06F9/30025
Abstract: Techniques are disclosed relating to type conversion using a floating-point unit. In one embodiment, to convert a floating-point value to a normalized integer format, a floating-point unit is configured to perform an operation to generate a result having a significand portion and an exponent portion, where the operation includes multiplying the floating-point value by a constant. In one embodiment, the apparatus is further configured to add a value to the exponent portion of the result and to set the rounding mode to round-to-nearest. The constant may be the greatest value less than one that can be represented using the particular number of unsigned bits. The value added to the initial exponent may be equal to the number of unsigned bits of the normalized integer format. The apparatus may perform this conversion in response to a pack instruction.
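The conversion steps can be checked numerically. For an n-bit format, the greatest value below one with n fraction bits is (2^n - 1)/2^n, and adding n to the exponent multiplies by 2^n, so the result is round(f * (2^n - 1)), the usual unsigned-normalized encoding. The Python sketch below uses `math.frexp`/`math.ldexp` to mirror the significand/exponent steps. The saturation of out-of-range inputs and the function name are assumptions, and double-precision arithmetic stands in for the hardware FPU.

```python
import math


def float_to_unorm(f: float, bits: int) -> int:
    """Convert f in [0.0, 1.0] to an unsigned normalized integer of `bits` bits
    via multiply-by-constant, exponent adjustment, and round-to-nearest."""
    f = min(max(f, 0.0), 1.0)                          # assumed saturation step
    constant = (2**bits - 1) / 2**bits                 # greatest value < 1 with `bits` fraction bits
    product = f * constant
    significand, exponent = math.frexp(product)        # split the result into its two portions
    scaled = math.ldexp(significand, exponent + bits)  # add `bits` to the exponent
    return round(scaled)                               # Python rounds half to even
```

For 8 bits, 1.0 maps to 255, 0.0 to 0, and 0.5 to 127.5 before rounding, which round-to-nearest-even turns into 128.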
-