-
Publication No.: US11256518B2
Publication Date: 2022-02-22
Application No.: US16597625
Filing Date: 2019-10-09
Applicant: Apple Inc.
Inventors: Liang-Kai Wang, Robert D. Kenney, Terence M. Potter, Vinod Reddy Nalamalapu, Sivayya V. Ayinala
Abstract: Techniques are disclosed relating to sharing operands among SIMD threads for a larger arithmetic operation. In some embodiments, a set of multiple hardware pipelines is configured to execute single-instruction multiple-data (SIMD) instructions for multiple threads in parallel, where ones of the hardware pipelines include execution circuitry configured to perform floating-point operations using one or more pipeline stages of the pipeline and first routing circuitry configured to select, from among thread-specific operands stored for the hardware pipeline and from one or more other pipelines in the set, a first input operand for an operation by the execution circuitry. In some embodiments, a device is configured to perform a mathematical operation on source input data structures stored across thread-specific storage for the set of hardware pipelines, by executing multiple SIMD floating-point operations using the execution circuitry and the first routing circuitry. This may improve performance and reduce power consumption for matrix multiply and reduction operations, for example.
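The abstract describes routing circuitry that lets a pipeline's execution unit take an input operand from another pipeline's thread-specific storage, so a SIMD group can cooperatively compute a matrix multiply. A minimal Python sketch of that idea follows; it is an illustrative model only, not the patented circuit, and the broadcast-style sharing scheme, the group width `N`, and the function name `simd_matmul` are assumptions for illustration.

```python
# Illustrative model of operand sharing across SIMD pipelines.
# Each "pipeline" (thread) keeps its operands in private storage;
# a routing step lets every pipeline read one input from pipeline k,
# so N threads jointly compute an N x N matrix multiply with
# multiply-accumulate operations.

N = 4  # hypothetical SIMD group width

def simd_matmul(A, B):
    """Thread t owns row t of A and row t of B in thread-specific storage.
    C[t][j] = sum_k A[t][k] * B[k][j]; at step k, row B[k] is supplied to
    every pipeline by the routing network (a broadcast-style share)."""
    C = [[0.0] * N for _ in range(N)]
    for k in range(N):
        # routing circuitry: all pipelines select their second operand
        # from pipeline k's thread-specific storage
        routed = B[k]
        for t in range(N):          # pipelines execute in lockstep
            a = A[t][k]             # local thread-specific operand
            for j in range(N):
                C[t][j] += a * routed[j]  # multiply-accumulate stage
    return C
```

Because each step reuses one shared row for all threads, no thread has to re-fetch the other threads' data from a register file, which is the performance and power benefit the abstract points to for matrix multiply and reduction operations.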
-
Publication No.: US20210109761A1
Publication Date: 2021-04-15
Application No.: US16597625
Filing Date: 2019-10-09
Applicant: Apple Inc.
Inventors: Liang-Kai Wang, Robert D. Kenney, Terence M. Potter, Vinod Reddy Nalamalapu, Sivayya V. Ayinala
Abstract: Techniques are disclosed relating to sharing operands among SIMD threads for a larger arithmetic operation. In some embodiments, a set of multiple hardware pipelines is configured to execute single-instruction multiple-data (SIMD) instructions for multiple threads in parallel, where ones of the hardware pipelines include execution circuitry configured to perform floating-point operations using one or more pipeline stages of the pipeline and first routing circuitry configured to select, from among thread-specific operands stored for the hardware pipeline and from one or more other pipelines in the set, a first input operand for an operation by the execution circuitry. In some embodiments, a device is configured to perform a mathematical operation on source input data structures stored across thread-specific storage for the set of hardware pipelines, by executing multiple SIMD floating-point operations using the execution circuitry and the first routing circuitry. This may improve performance and reduce power consumption for matrix multiply and reduction operations, for example.
-
Publication No.: US10678548B2
Publication Date: 2020-06-09
Application No.: US16112614
Filing Date: 2018-08-24
Applicant: Apple Inc.
Inventors: Robert D. Kenney, Terence M. Potter, Andrew M. Havlir, Sivayya V. Ayinala
Abstract: Techniques are disclosed relating to controlling an operand cache in a pipelined fashion. An operand cache may cache operands fetched from the register file or generated by previous instructions to improve performance and/or reduce power consumption. In some embodiments, instructions are pipelined and separate tag information is maintained to indicate allocation of an operand cache entry and ownership of the operand cache entry. In some embodiments, this may allow an operand to remain in the operand cache (and potentially be retrieved or modified) during an interval between allocation of the entry for another operand and ownership of the entry by the other operand. This may improve operand cache efficiency by allowing the entry to be used while retrieving the other operand from the register file, for example.
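The key idea in this abstract is splitting one tag into two: one tag records which operand an entry has been *allocated* to, and a separate tag records which operand's data is actually *resident* (owned). A minimal Python sketch of that split follows; it is an illustrative model only, not Apple's hardware design, and the class and method names are assumptions for illustration.

```python
# Illustrative model of an operand-cache entry with separate
# allocation and ownership tags. Between allocate() and
# take_ownership(), the previous owner's value is still readable,
# which is the window the abstract says improves cache efficiency.

class OperandCacheEntry:
    def __init__(self):
        self.alloc = None   # operand this entry is promised to
        self.owner = None   # operand whose data currently occupies it
        self.value = None

    def allocate(self, operand):
        # Early pipeline stage: claim the entry for `operand`
        # without evicting the current owner's data yet.
        self.alloc = operand

    def take_ownership(self, operand, value):
        # Later pipeline stage: the operand's data arrives
        # (e.g. from the register file) and replaces the contents.
        assert self.alloc == operand
        self.owner = operand
        self.value = value

    def read(self, operand):
        # The old owner still hits during the allocate-to-ownership
        # interval; the newly allocated operand does not hit yet.
        return self.value if self.owner == operand else None
```

For example, after `allocate("r1")` on an entry owned by `"r0"`, a read of `"r0"` still returns its cached value until `take_ownership("r1", ...)` completes, so the entry stays useful while the register-file fetch for `"r1"` is in flight.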
-
Publication No.: US20200065104A1
Publication Date: 2020-02-27
Application No.: US16112614
Filing Date: 2018-08-24
Applicant: Apple Inc.
Inventors: Robert D. Kenney, Terence M. Potter, Andrew M. Havlir, Sivayya V. Ayinala
Abstract: Techniques are disclosed relating to controlling an operand cache in a pipelined fashion. An operand cache may cache operands fetched from the register file or generated by previous instructions to improve performance and/or reduce power consumption. In some embodiments, instructions are pipelined and separate tag information is maintained to indicate allocation of an operand cache entry and ownership of the operand cache entry. In some embodiments, this may allow an operand to remain in the operand cache (and potentially be retrieved or modified) during an interval between allocation of the entry for another operand and ownership of the entry by the other operand. This may improve operand cache efficiency by allowing the entry to be used while retrieving the other operand from the register file, for example.