-
Publication No.: US09459869B2
Publication Date: 2016-10-04
Application No.: US13971800
Filing Date: 2013-08-20
Applicant: Apple Inc.
Inventor: Timothy A. Olson, Terence M. Potter, James S. Blomgren, Andrew M. Havlir
CPC classification number: G06F9/30043, G06F9/38, G06F12/0862, G06F12/0875, G06F2212/452, G06T1/20, Y02D10/13
Abstract: Instructions may require one or more operands to be executed, which may be provided from a register file. In the context of a GPU, however, a register file may be a relatively large structure, and reading from the register file may be energy and/or time intensive. An operand cache may store a subset of operands, and may use less power and have quicker access times than the register file. In some embodiments, intelligent operand prefetching may speed execution by reducing memory bank conflicts (e.g., conflicts within a register file containing multiple memory banks). An unused operand slot for another instruction (e.g., an instruction that does not require a maximum number of source operands allowed by an instruction set architecture) may be used to prefetch an operand for another instruction in one embodiment. Prefetched operands may be stored in an operand cache, and prefetching may occur based on software-provided information.
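The prefetching idea in this abstract can be illustrated with a small cycle-count model. This is a minimal sketch under stated assumptions, not the patented design: the bank count, the 3-source-operand limit, the bank mapping (register mod bank count), the unbounded operand cache and the `prefetch` hint field are all hypothetical. An instruction that uses fewer than the maximum number of source slots spends a free slot reading an operand that a later instruction will need, so that later instruction avoids a register-file bank conflict.

```python
from dataclasses import dataclass, field

MAX_SRC = 3    # hypothetical ISA limit on source operands per instruction
NUM_BANKS = 4  # hypothetical number of register-file banks


@dataclass
class Instr:
    srcs: list[int]                                    # source registers
    prefetch: list[int] = field(default_factory=list)  # hypothetical software-provided hints


def bank(reg: int) -> int:
    # Assumed interleaving: register r lives in bank r mod NUM_BANKS.
    return reg % NUM_BANKS


def read_cycles(regs: list[int]) -> int:
    # Each bank serves one read per cycle, so the busiest bank sets the cost.
    per_bank: dict[int, int] = {}
    for r in regs:
        per_bank[bank(r)] = per_bank.get(bank(r), 0) + 1
    return max(per_bank.values(), default=0)


def run(program: list[Instr], use_prefetch: bool = True) -> int:
    cache: set[int] = set()  # operand cache (unbounded here, for simplicity)
    cycles = 0
    for ins in program:
        # Sources already in the operand cache skip the register file.
        rf_reads = list(dict.fromkeys(r for r in ins.srcs if r not in cache))
        prefetched = []
        if use_prefetch:
            free_slots = MAX_SRC - len(ins.srcs)  # unused operand slots
            busy = {bank(r) for r in rf_reads}
            for r in ins.prefetch:
                # Fill a free slot only if it adds no new bank conflict.
                if free_slots > 0 and r not in cache and bank(r) not in busy:
                    prefetched.append(r)
                    busy.add(bank(r))
                    free_slots -= 1
        cycles += max(1, read_cycles(rf_reads + prefetched))
        cache.update(prefetched)
    return cycles


# r0 and r4 share bank 0; the first instruction uses one slot and prefetches r4.
prog = [Instr(srcs=[1], prefetch=[4]), Instr(srcs=[0, 4])]
print(run(prog), run(prog, use_prefetch=False))  # 2 3
```

Without prefetching, the second instruction needs two cycles to read r0 and r4 from the same bank; with the hint, r4 is already cached and the program finishes one cycle sooner.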
-
Publication No.: US20150058572A1
Publication Date: 2015-02-26
Application No.: US13971800
Filing Date: 2013-08-20
Applicant: Apple Inc.
Inventor: Timothy A. Olson, Terence M. Potter, James S. Blomgren, Andrew M. Havlir
CPC classification number: G06F9/30043, G06F9/38, G06F12/0862, G06F12/0875, G06F2212/452, G06T1/20, Y02D10/13
Abstract: Instructions may require one or more operands to be executed, which may be provided from a register file. In the context of a GPU, however, a register file may be a relatively large structure, and reading from the register file may be energy and/or time intensive. An operand cache may store a subset of operands, and may use less power and have quicker access times than the register file. In some embodiments, intelligent operand prefetching may speed execution by reducing memory bank conflicts (e.g., conflicts within a register file containing multiple memory banks). An unused operand slot for another instruction (e.g., an instruction that does not require a maximum number of source operands allowed by an instruction set architecture) may be used to prefetch an operand for another instruction in one embodiment. Prefetched operands may be stored in an operand cache, and prefetching may occur based on software-provided information.
-
Publication No.: US09378146B2
Publication Date: 2016-06-28
Application No.: US13971811
Filing Date: 2013-08-20
Applicant: Apple Inc.
Inventor: James S. Blomgren, Terence M. Potter, Timothy A. Olson, Andrew M. Havlir
CPC classification number: G06F12/0875, G06F9/30043, G06F9/30138, G06F9/30145, G06F9/30185
Abstract: Instructions may require one or more operands to be executed, which may be provided from a register file. In the context of a GPU, however, a register file may be a relatively large structure, and reading from a register file may be energy and/or time intensive. An operand cache may be used to store a subset of operands, and may use less power and have quicker access times than the register file. Selectors (e.g., multiplexers) may be used to read operands from the operand cache. Power savings may be achieved in some embodiments by activating only a subset of the selectors, which may be done by activators (e.g., flip-flops). Operands may also be concurrently provided to two or more locations via forwarding, which may be accomplished via a source selection unit in some embodiments. Operand forwarding may also reduce power and/or speed execution in one or more embodiments.
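The power argument in this abstract can be made concrete by counting selector activations. This is a minimal sketch under stated assumptions, not the patented circuit: the 3-slot limit, the `Instr` record and the rule that only the immediately preceding result is forwarded are hypothetical. Only the selectors for slots an instruction actually uses are enabled (standing in for the activator flip-flops), and a slot fed by forwarding bypasses its operand-cache selector entirely; forwarding one result into two slots of the next instruction models delivering an operand to more than one location at once.

```python
from dataclasses import dataclass

NUM_SLOTS = 3  # hypothetical number of source-operand selectors per instruction


@dataclass
class Instr:
    dst: int         # destination register
    srcs: list[int]  # source registers, at most NUM_SLOTS


def selector_activity(program: list[Instr]) -> tuple[int, int, int]:
    """Return (gated_activations, forwarded_operands, ungated_activations)."""
    gated = forwarded = 0
    prev_dst = None
    for ins in program:
        # Only slots that hold a source operand get their selector enabled;
        # unused slots leave their multiplexer idle.
        for reg in ins.srcs:
            if reg == prev_dst:
                # The source selection unit takes the previous result directly,
                # so this slot's operand-cache selector stays off.
                forwarded += 1
            else:
                gated += 1
        prev_dst = ins.dst
    # Baseline: every selector switches for every instruction.
    ungated = NUM_SLOTS * len(program)
    return gated, forwarded, ungated


prog = [Instr(5, [1, 2]), Instr(6, [5, 5]), Instr(7, [3])]
print(selector_activity(prog))  # (3, 2, 9)
```

Here three selector activations replace the nine an always-on design would spend, and the result in r5 reaches both source slots of the second instruction without any operand-cache read.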
-
Publication No.: US20150058571A1
Publication Date: 2015-02-26
Application No.: US13971782
Filing Date: 2013-08-20
Applicant: Apple Inc.
Inventor: Terence M. Potter, Timothy A. Olson, James S. Blomgren, Andrew M. Havlir, Michael Geary
CPC classification number: G06F9/30043, G06F9/38, G06F12/0862, G06F12/0875, G06F2212/452, G06T1/60, Y02D10/13
Abstract: Instructions may require one or more operands to be executed, which may be provided from a register file. In the context of a GPU, however, a register file may be a relatively large structure, and reading from the register file may be energy and/or time intensive. An operand cache may be used to store a subset of operands, and may use less power and have quicker access times than the register file. Hint values may be used in some embodiments to suggest that a particular operand should be stored in the operand cache (so that it is available for current or future use). In one embodiment, a hint value indicates that an operand should be cached whenever possible. Hint values may be determined by software, such as a compiler, in some embodiments. One or more criteria may be used to determine hint values, such as how soon in the future or how frequently an operand will be used again.
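The two criteria named in this abstract (how soon an operand is reused and how often) can be turned into hint values by a simple compiler-style pass. This is a minimal sketch under stated assumptions, not the patented compiler: the hint labels `"always"`, `"cache"` and `"none"`, the `near` and `freq` thresholds, and the list-of-source-registers program format are hypothetical. `"always"` plays the role of the "cache whenever possible" hint.

```python
def cache_hints(program: list[list[int]], near: int = 2, freq: int = 3) -> list[dict[int, str]]:
    """Assign a hypothetical cache hint to every source operand.

    program: one list of source registers per instruction.
    "always": the register is read at least `freq` more times later.
    "cache":  its next read is at most `near` instructions away.
    "none":   neither criterion holds.
    """
    # Record every instruction index at which each register is read.
    uses: dict[int, list[int]] = {}
    for i, srcs in enumerate(program):
        for r in srcs:
            uses.setdefault(r, []).append(i)

    hints = []
    for i, srcs in enumerate(program):
        h = {}
        for r in srcs:
            later = [j for j in uses[r] if j > i]  # future reads, in order
            if len(later) >= freq:
                h[r] = "always"          # frequently reused
            elif later and later[0] - i <= near:
                h[r] = "cache"           # reused soon
            else:
                h[r] = "none"
        hints.append(h)
    return hints


prog = [[1, 2], [1, 3], [2, 1], [4], [1]]
for i, h in enumerate(cache_hints(prog)):
    print(i, h)
```

In this example r1 is read four times, so its first read is marked `"always"`; later reads of r1 and the second read distance of r2 fall within `near`, so they get `"cache"`, and last uses get `"none"` because nothing downstream would benefit.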
-
Publication No.: US09652233B2
Publication Date: 2017-05-16
Application No.: US13971782
Filing Date: 2013-08-20
Applicant: Apple Inc.
Inventor: Terence M. Potter, Timothy A. Olson, James S. Blomgren, Andrew M. Havlir, Michael Geary
IPC: G06F12/00, G06F13/00, G06F13/28, G06F9/30, G06F9/38, G06T1/60, G06F12/0875, G06F12/0862
CPC classification number: G06F9/30043, G06F9/38, G06F12/0862, G06F12/0875, G06F2212/452, G06T1/60, Y02D10/13
Abstract: Instructions may require one or more operands to be executed, which may be provided from a register file. In the context of a GPU, however, a register file may be a relatively large structure, and reading from the register file may be energy and/or time intensive. An operand cache may be used to store a subset of operands, and may use less power and have quicker access times than the register file. Hint values may be used in some embodiments to suggest that a particular operand should be stored in the operand cache (so that it is available for current or future use). In one embodiment, a hint value indicates that an operand should be cached whenever possible. Hint values may be determined by software, such as a compiler, in some embodiments. One or more criteria may be used to determine hint values, such as how soon in the future or how frequently an operand will be used again.
-
Publication No.: US20150058573A1
Publication Date: 2015-02-26
Application No.: US13971811
Filing Date: 2013-08-20
Applicant: Apple Inc.
Inventor: James S. Blomgren, Terence M. Potter, Timothy A. Olson, Andrew M. Havlir
CPC classification number: G06F12/0875, G06F9/30043, G06F9/30138, G06F9/30145, G06F9/30185
Abstract: Instructions may require one or more operands to be executed, which may be provided from a register file. In the context of a GPU, however, a register file may be a relatively large structure, and reading from the register file may be energy and/or time intensive. An operand cache may be used to store a subset of operands, and may use less power and have quicker access times than the register file. Selectors (e.g., multiplexers) may be used to read operands from the operand cache. Power savings may be achieved in some embodiments by activating only a subset of the selectors, which may be done by activators (e.g., flip-flops). Operands may also be concurrently provided to two or more locations via forwarding, which may be accomplished via a source selection unit in some embodiments. Operand forwarding may also reduce power and/or speed execution in one or more embodiments.
-