-
Publication No.: US20140195737A1
Publication Date: 2014-07-10
Application No.: US13734444
Filing Date: 2013-01-04
Applicant: APPLE INC.
Inventor: Brian P. Lilly , Gerard R. Williams, III
IPC: G06F12/08
CPC classification number: G06F12/0891 , G06F12/0802 , G06F12/0811 , G06F12/0864 , G06F2212/1028 , Y02D10/13
Abstract: Techniques are disclosed related to flushing one or more data caches. In one embodiment an apparatus includes a processing element, a first cache associated with the processing element, and a circuit configured to copy modified data from the first cache to a second cache in response to determining an activity level of the processing element. In this embodiment, the apparatus is configured to alter a power state of the first cache after the circuit copies the modified data. The first cache may be at a lower level in a memory hierarchy relative to the second cache. In one embodiment, the circuit is also configured to copy data from the second cache to a third cache or a memory after a particular time interval. In some embodiments, the circuit is configured to copy data while one or more pipeline elements of the apparatus are in a low-power state.
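The flush sequence this abstract describes can be sketched in a few lines of Python. The `Cache` class, the activity threshold, and the dirty-bit representation below are illustrative assumptions for the sketch, not details from the patent.

```python
# Sketch: when a processing element's activity falls below a threshold,
# modified ("dirty") lines are copied from a lower-level cache to the next
# level, after which the lower-level cache's power state can be altered.

class Cache:
    def __init__(self, name):
        self.name = name
        self.lines = {}        # address -> (data, dirty_flag)
        self.powered = True

    def write(self, addr, data):
        self.lines[addr] = (data, True)   # locally modified line

def flush_on_low_activity(l1, l2, activity, threshold=0.1):
    """Copy dirty lines from l1 to l2 and power down l1, but only
    when the measured activity level is below the threshold."""
    if activity >= threshold:
        return False
    for addr, (data, dirty) in l1.lines.items():
        if dirty:
            l2.lines[addr] = (data, True)  # line is now dirty in l2 instead
    l1.lines.clear()
    l1.powered = False                     # alter the power state of l1
    return True
```

The same pattern would repeat one level up: after a longer interval, dirty data in `l2` could be copied onward to a third cache or memory.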
-
Publication No.: US20240160268A1
Publication Date: 2024-05-16
Application No.: US18523819
Filing Date: 2023-11-29
Applicant: Apple Inc.
Inventor: Joseph T. DiBene, II , Inder M. Sodhi , Keith Cox , Gerard R. Williams, III
IPC: G06F1/3206 , G06F1/3203 , G06F1/3296
CPC classification number: G06F1/3206 , G06F1/3203 , G06F1/3296
Abstract: In an embodiment, a system includes multiple power management mechanisms operating in different time domains (e.g., with different bandwidths) and control circuitry that is configured to coordinate operation of the mechanisms. If one mechanism is adding energy to the system, for example, the control circuitry may inform another mechanism that the energy is coming so that the other mechanism may not take as drastic an action as it would if no energy were coming. If a light workload is detected by circuitry near the load, and there is plenty of energy in the system, the control circuitry may cause the power management unit (PMU) to generate less energy or even temporarily turn off. A variety of mechanisms for the coordinated, coherent use of power are described.
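The coordination idea can be sketched as a slow mechanism (the PMU loop) announcing inbound energy so that the fast load-side mechanism reacts less drastically to a supply droop. The thresholds and action names below are illustrative assumptions, not values from the patent.

```python
# Sketch of cross-time-domain coordination between two power mechanisms.

class Coordinator:
    """Control circuitry shared by the mechanisms."""
    def __init__(self):
        self.energy_incoming = False

    def pmu_announce_energy(self):
        # The slow mechanism signals that it is already adding energy.
        self.energy_incoming = True

def fast_loop_response(droop_mv, coord):
    """The fast mechanism picks a response to a measured supply droop."""
    if droop_mv < 20:
        return "none"
    if coord.energy_incoming:
        return "mild_throttle"   # energy is coming; act less drastically
    return "hard_throttle"       # no energy inbound; act immediately
```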
-
Publication No.: US11868192B2
Publication Date: 2024-01-09
Application No.: US17528380
Filing Date: 2021-11-17
Applicant: Apple Inc.
Inventor: Joseph T. DiBene, II , Inder M. Sodhi , Keith Cox , Gerard R. Williams, III
IPC: G06F1/3206 , G06F1/3203 , G06F1/3296
CPC classification number: G06F1/3206 , G06F1/3203 , G06F1/3296
Abstract: In an embodiment, a system includes multiple power management mechanisms operating in different time domains (e.g., with different bandwidths) and control circuitry that is configured to coordinate operation of the mechanisms. If one mechanism is adding energy to the system, for example, the control circuitry may inform another mechanism that the energy is coming so that the other mechanism may not take as drastic an action as it would if no energy were coming. If a light workload is detected by circuitry near the load, and there is plenty of energy in the system, the control circuitry may cause the power management unit (PMU) to generate less energy or even temporarily turn off. A variety of mechanisms for the coordinated, coherent use of power are described.
-
Publication No.: US10831488B1
Publication Date: 2020-11-10
Application No.: US16105783
Filing Date: 2018-08-20
Applicant: Apple Inc.
Inventor: Eric Bainville , Jeffry E. Gonion , Ali Sazegari , Gerard R. Williams, III , Andrew J. Beaumont-Smith
Abstract: In an embodiment, a computation engine may offload work from a processor (e.g. a CPU) and efficiently perform computations such as those used in LSTM and other workloads at high performance. In an embodiment, the computation engine may perform computations on input vectors from input memories in the computation engine, and may accumulate results in an output memory within the computation engine. The input memories may be loaded with initial vector data from memory, incurring the memory latency that may be associated with reading the operands. Compute instructions may be performed on the operands, generating results in an output memory. One or more extract instructions may be supported to move data from the output memory to the input memory, permitting additional computation on the data in the output memory without moving the results to main memory.
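The input/output memory flow in this abstract can be modeled with a small sketch: compute instructions accumulate into an output memory, and an extract instruction moves those results back into an input memory so further computation avoids a round trip to main memory. The class layout and elementwise multiply-accumulate below are illustrative assumptions.

```python
# Sketch of a computation engine with input memories X/Y and an output
# (accumulator) memory Z, plus an extract that moves Z back into X.

class ComputeEngine:
    def __init__(self, size):
        self.x = [0.0] * size   # input memory X
        self.y = [0.0] * size   # input memory Y
        self.z = [0.0] * size   # output memory Z (accumulates results)

    def load_x(self, values):   # load initial vector data from memory
        self.x = list(values)

    def load_y(self, values):
        self.y = list(values)

    def mac(self):
        # Elementwise multiply-accumulate into Z.
        self.z = [z + a * b for z, a, b in zip(self.z, self.x, self.y)]

    def extract_to_x(self):
        # Move results from the output memory to an input memory,
        # permitting further computation without touching main memory.
        self.x = list(self.z)
```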
-
公开(公告)号:US20200348934A1
公开(公告)日:2020-11-05
申请号:US16928752
申请日:2020-07-14
Applicant: Apple Inc.
Inventor: Eric Bainville , Jeffry E. Gonion , Ali Sazegari, PhD , Gerard R. Williams, III
Abstract: In an embodiment, a computation engine is configured to perform vector multiplications, producing either vector results or outer product (matrix) results. The instructions provided to the computation engine specify a matrix mode or a vector mode for the instructions. The computation engine performs the specified operation. The computation engine may perform numerous computations in parallel, in an embodiment. In an embodiment, the instructions may also specify an offset within the input memories, providing additional flexibility in the location of operands. More particularly, the computation engine may be configured to perform numerous multiplication operations in parallel and to accumulate results in a result memory, performing multiply-accumulate operations for each matrix/vector element in the targeted locations of the output memory.
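A single multiply instruction that can produce either a vector result or an outer-product (matrix) result, with operand offsets, can be sketched as below. The function signature and offset encoding are illustrative assumptions, not the patent's instruction format.

```python
# Sketch: one multiply operation with a mode select (vector vs. matrix)
# and offsets into the input memories.

def compute(mode, x, y, x_off=0, y_off=0, n=None):
    """mode="vector": elementwise products; mode="matrix": outer product.
    x_off/y_off select where the operands start in the input memories."""
    xs = x[x_off:x_off + n] if n else x[x_off:]
    ys = y[y_off:y_off + n] if n else y[y_off:]
    if mode == "vector":
        return [a * b for a, b in zip(xs, ys)]
    if mode == "matrix":               # outer product of xs and ys
        return [[a * b for b in ys] for a in xs]
    raise ValueError(f"unknown mode: {mode}")
```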
-
Publication No.: US20190212796A1
Publication Date: 2019-07-11
Application No.: US16360194
Filing Date: 2019-03-21
Applicant: Apple Inc.
Inventor: Joseph T. DiBene, II , Inder M. Sodhi , Gerard R. Williams, III
IPC: G06F1/26 , G06F1/324 , G06F1/3287 , G06F1/3296
CPC classification number: G06F1/263 , G06F1/26 , G06F1/324 , G06F1/3287 , G06F1/3296
Abstract: In an embodiment, a system may support a “coast mode” in which the power management unit (PMU) that supplies the supply voltage to an integrated circuit is disabled temporarily for certain modes of the integrated circuit. The integrated circuit may continue to operate, consuming the energy stored in capacitance in and/or around the integrated circuit. When coast mode is initiated, a time interval for coasting may be determined. When the time interval expires, the PMU may re-enable the power supply voltage.
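The coast-mode sequence (compute an interval, disable the PMU, re-enable on expiry) can be sketched as follows. The units, the guard margin, and the controller interface are illustrative assumptions, not values from the patent.

```python
# Sketch of "coast mode": the supply (PMU) is disabled for a computed
# interval while the IC runs off energy stored in local capacitance,
# then re-enabled when the interval expires.

def coast_interval_us(stored_energy_uj, load_power_w, guard=0.5):
    """How long the IC can coast on stored charge; the guard factor keeps
    a margin so the supply voltage never droops below spec (uJ/W = us)."""
    return stored_energy_uj * guard / load_power_w

class CoastController:
    def __init__(self):
        self.pmu_enabled = True
        self.deadline_us = None

    def enter_coast(self, now_us, stored_energy_uj, load_power_w):
        # Disable the PMU and schedule when it must come back.
        self.pmu_enabled = False
        self.deadline_us = now_us + coast_interval_us(
            stored_energy_uj, load_power_w)

    def tick(self, now_us):
        # Called periodically; re-enables the supply once the interval expires.
        if not self.pmu_enabled and now_us >= self.deadline_us:
            self.pmu_enabled = True
```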
-
Publication No.: US10331558B2
Publication Date: 2019-06-25
Application No.: US15663115
Filing Date: 2017-07-28
Applicant: Apple Inc.
Inventor: Ali Sazegari , Charles E. Tucker , Jeffry E. Gonion , Gerard R. Williams, III , Chris Cheng-Chieh Lee
Abstract: Systems, apparatuses, and methods for efficiently moving data for storage and processing. A compression unit within a processor includes multiple hardware lanes, selects two or more input words to compress, and assigns them to two or more of the multiple hardware lanes. As each assigned input word is processed, each word is compared to an entry of a plurality of entries of a table. If it is determined that each of the assigned input words indexes the same entry of the table, the hardware lane with the oldest input word generates a single read request for the table entry and the hardware lane with the youngest input word generates a single write request for updating the table entry upon completing compression. Each hardware lane generates a compressed packet based on its assigned input word.
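The lane-assignment and access-coalescing rule can be sketched as below: when several lanes index the same table entry, only the oldest lane issues the read and only the youngest issues the write-back. Integer words and hash-based indexing are illustrative assumptions, not the patent's actual scheme.

```python
# Sketch: assign input words to lanes, compute each word's table index,
# and coalesce table accesses when lanes collide on the same entry.

def assign_and_coalesce(words, table_size=16):
    # Lane order corresponds to word age: lane 0 holds the oldest word.
    lanes = [{"word": w, "index": hash(w) % table_size} for w in words]
    by_index = {}
    for lane_id, lane in enumerate(lanes):
        by_index.setdefault(lane["index"], []).append(lane_id)
    ops = []
    for idx, lane_ids in by_index.items():
        ops.append(("read", idx, lane_ids[0]))    # oldest lane reads
        ops.append(("write", idx, lane_ids[-1]))  # youngest lane writes back
    return ops
```

With a 16-entry table, words 5 and 21 collide on entry 5, so lane 0 performs the single read and lane 1 the single write.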
-
Publication No.: US20180217659A1
Publication Date: 2018-08-02
Application No.: US15935274
Filing Date: 2018-03-26
Applicant: Apple Inc.
CPC classification number: G06F1/3293 , G06F1/3275 , G06F1/3287 , G06F9/5044 , G06F9/5094 , G06F2009/45579 , Y02D10/122 , Y02D10/171
Abstract: In an embodiment, an integrated circuit may include one or more processors. Each processor may include multiple processor cores, and each core has a different design/implementation and performance level. For example, a core may be implemented for high performance, and another core may be implemented at a lower maximum performance, but may be optimized for efficiency. Additionally, in some embodiments, some features of the instruction set architecture implemented by the processor may be implemented in only one of the cores that make up the processor. If such a feature is invoked by a code sequence while a different core is active, the processor may swap cores to the core that implements the feature. Alternatively, an exception may be taken and an exception handler may be executed to identify the feature and activate the corresponding core.
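The feature-triggered core swap can be sketched as a lookup: if the active core lacks the ISA feature an instruction needs, control moves to a core that implements it. The core names, feature table, and function signature are illustrative assumptions.

```python
# Sketch: swap to the core that implements a required ISA feature.
# (The patent's alternative path, taking an exception and letting a
# handler activate the core, would reach the same end state.)

CORE_FEATURES = {
    "performance": {"vector"},   # only this core implements "vector" ops
    "efficiency": set(),
}

def execute(insn, active_core, feature_map):
    """Return the core that ends up active after executing insn."""
    needed = feature_map.get(insn)   # feature this insn requires, if any
    if needed and needed not in CORE_FEATURES[active_core]:
        for core, feats in CORE_FEATURES.items():
            if needed in feats:
                active_core = core   # swap cores, then execute there
                break
    return active_core
```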
-
Publication No.: US09898071B2
Publication Date: 2018-02-20
Application No.: US14548872
Filing Date: 2014-11-20
Applicant: Apple Inc.
Inventor: David J. Williamson , Gerard R. Williams, III
CPC classification number: G06F1/3293 , G06F1/3206 , G06F1/3234 , G06F1/3287 , G06F1/3296 , G06F9/461 , Y02B70/123 , Y02D10/172 , Y02D10/30
Abstract: In an embodiment, an integrated circuit may include one or more processors. Each processor may include multiple processor cores, and each core has a different design/implementation and performance level. For example, a core may be implemented for high performance, but may have higher minimum voltage at which it operates correctly. Another core may be implemented at a lower maximum performance, but may be optimized for efficiency and may operate correctly at a lower minimum voltage. The processor may support multiple processor states (PStates). Each PState may specify an operating point and may be mapped to one of the processor cores. During operation, one of the cores is active: the core to which the current PState is mapped. If a new PState is selected and is mapped to a different core, the processor may automatically context switch the processor state to the newly-selected core and may begin execution on that core.
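The PState-to-core mapping can be sketched with a small table: selecting a PState mapped to a different core triggers an automatic context switch to that core. The PState values and core names below are illustrative assumptions, not the patent's operating points.

```python
# Sketch: each PState maps an operating point to one of the cores;
# selecting a PState on the "other" core context-switches to it.

PSTATES = {
    0: {"freq_mhz": 400,  "core": "efficiency"},
    1: {"freq_mhz": 1000, "core": "efficiency"},
    2: {"freq_mhz": 2400, "core": "performance"},
}

class Processor:
    def __init__(self):
        self.active_core = "efficiency"
        self.switches = 0

    def set_pstate(self, pstate):
        target = PSTATES[pstate]["core"]
        if target != self.active_core:
            self.context_switch(target)
        return self.active_core

    def context_switch(self, target):
        # Transfer the architected processor state to the selected core
        # and resume execution there.
        self.active_core = target
        self.switches += 1
```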
-
Publication No.: US09626185B2
Publication Date: 2017-04-18
Application No.: US13774093
Filing Date: 2013-02-22
Applicant: Apple Inc.
Inventor: Shyam Sundar , Ian D. Kountanis , Conrado Blasco-Allue , Gerard R. Williams, III , Wei-Han Lien , Ramesh B. Gunna
CPC classification number: G06F9/30054 , G06F9/30181 , G06F9/382 , G06F9/3842 , G06F9/3844
Abstract: Various techniques are disclosed for processing and pre-decoding branches within an IT instruction block. Instructions are fetched and cached in an instruction cache, and pre-decode bits are generated to indicate the presence of an IT instruction and the likely boundaries of the IT instruction block. If an unconditional branch is detected within the likely boundaries of an IT instruction block, the unconditional branch is treated as if it were a conditional branch. The unconditional branch is sent to the branch direction predictor and the predictor generates a branch direction prediction for the unconditional branch.
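The dispatch rule at the heart of this abstract reduces to a small decision function: an unconditional branch inside the likely boundaries of an IT block is routed to the direction predictor as if it were conditional. The function and label names are illustrative assumptions.

```python
# Sketch of the branch-handling rule for IT instruction blocks.

def classify_branch(in_it_block, is_unconditional):
    """Decide how to handle a fetched branch given the pre-decode bits."""
    if is_unconditional and in_it_block:
        # Inside an IT block the branch is predicated, so treat it as
        # conditional and let the direction predictor decide.
        return "predict_direction"
    if is_unconditional:
        return "always_taken"
    return "predict_direction"
```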
-