Shared learning table for load value prediction and load address prediction

    Publication number: US12067398B1

    Publication date: 2024-08-20

    Application number: US17661491

    Application date: 2022-04-29

    Applicant: Apple Inc.

    CPC classification number: G06F9/3842 G06F9/383 G06F9/3832

    Abstract: Techniques are disclosed relating to load value prediction. In some embodiments, a processor includes learning table circuitry that is shared for both address and value prediction. Loads may be trained for value prediction when they are eligible for both value and address prediction. Entries in the learning table may be promoted to an address prediction table or a load value prediction table for prediction, e.g., when they reach a threshold confidence level in the training table. In some embodiments, the learning table stores a hash of a predicted load value and control circuitry uses a probing load to retrieve the actual predicted load value for the value prediction table.
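
    For illustration only, here is a minimal C++ sketch of the shared training idea, assuming a simple saturating confidence counter, a toy value hash, and an invented promotion threshold; none of the names or constants are taken from the patent.

        // Minimal sketch of a shared learning table for load value/address
        // prediction; thresholds, hash, and field names are illustrative.
        #include <cstdint>
        #include <iostream>
        #include <unordered_map>

        struct LearningEntry {
            uint32_t value_hash   = 0;   // hash of the last observed load value
            uint64_t last_address = 0;   // last observed load address
            int      confidence   = 0;   // saturating confidence counter
        };

        constexpr int kPromoteThreshold = 3;   // hypothetical promotion point

        enum class Promotion { None, ValuePredictor, AddressPredictor };

        // Train on one dynamic load; report which predictor table (if any)
        // the entry would be promoted to once its confidence saturates.
        Promotion train(std::unordered_map<uint64_t, LearningEntry>& table,
                        uint64_t pc, uint64_t address, uint64_t value) {
            LearningEntry& e = table[pc];
            uint32_t h = static_cast<uint32_t>(value ^ (value >> 32));  // toy hash

            bool value_stable   = (e.value_hash == h);
            bool address_stable = (e.last_address == address);
            e.confidence = (value_stable || address_stable) ? e.confidence + 1 : 0;
            e.value_hash   = h;
            e.last_address = address;

            if (e.confidence < kPromoteThreshold) return Promotion::None;
            // Prefer value prediction when the value is stable; the real design
            // would use a probing load here to recover the full value from the
            // stored hash before filling the value prediction table.
            return value_stable ? Promotion::ValuePredictor
                                : Promotion::AddressPredictor;
        }

        int main() {
            std::unordered_map<uint64_t, LearningEntry> table;
            for (int i = 0; i < 5; ++i)
                if (train(table, 0x400123, 0x1000, 42) == Promotion::ValuePredictor)
                    std::cout << "promoted to value prediction table\n";
        }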

    Load/store ordering violation management

    Publication number: US10983801B2

    Publication date: 2021-04-20

    Application number: US16562675

    Application date: 2019-09-06

    Applicant: Apple Inc.

    Abstract: A processor includes a load/store unit that includes one or more load pipelines and one or more store pipelines. Load operations may be issued into the load pipelines out of order with respect to older store operations. If a load operation is executed out of order with an older store operation that writes one or more bytes read by the load operation, and if the store operation is issued shortly after the load operation such that the load operation is still in the load pipeline when the store operation is issued, some cases of flushing may be converted to replays by detecting the ordering violation while the load operation is still in the load pipeline.
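
    A rough C++ sketch of the flush-versus-replay choice described above, assuming the only state that matters is whether the younger load is still in the load pipeline when the conflicting older store issues; the type and field names are invented.

        #include <iostream>

        enum class Recovery { None, Replay, Flush };

        struct LoadStatus {
            bool overlaps_older_store;    // store writes bytes the load already read
            bool still_in_load_pipeline;  // load has not yet left the load pipeline
        };

        // Decide the recovery action when an older store issues shortly
        // after a younger, overlapping load.
        Recovery on_store_issue(const LoadStatus& load) {
            if (!load.overlaps_older_store)
                return Recovery::None;             // no ordering violation
            // Catching the violation while the load is still in flight allows
            // the cheaper replay; otherwise younger work must be flushed.
            return load.still_in_load_pipeline ? Recovery::Replay : Recovery::Flush;
        }

        int main() {
            std::cout << (on_store_issue({true, true})  == Recovery::Replay) << '\n';  // 1
            std::cout << (on_store_issue({true, false}) == Recovery::Flush)  << '\n';  // 1
        }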

    Load/store dependency predictor optimization for replayed loads

    Publication number: US10437595B1

    Publication date: 2019-10-08

    Application number: US15070435

    Application date: 2016-03-15

    Applicant: Apple Inc.

    Abstract: Systems, apparatuses, and methods for optimizing a load-store dependency predictor (LSDP). When a younger load instruction is issued before an older store instruction and the younger load is dependent on the older store, the LSDP is trained on this ordering violation. A replay/flush indicator is stored in a corresponding entry in the LSDP to indicate whether the ordering violation resulted in a flush or replay. On subsequent executions, a dependency may be enforced for the load-store pair if a confidence counter is above a threshold, with the threshold varying based on the status of the replay/flush indicator. If a given load matches on multiple entries in the LSDP, and if at least one of the entries has a flush indicator, then the given load may be marked as a multimatch case and forced to wait to issue until all older stores have issued.
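
    As a loose illustration, a C++ sketch of the lookup decision, assuming per-entry confidence counters, a replay/flush bit, and invented thresholds (a lower threshold when the earlier violation caused a flush, since flushes are costlier).

        #include <cstdint>
        #include <vector>

        struct LSDPEntry {
            uint64_t load_pc;
            uint64_t store_pc;
            int      confidence;
            bool     caused_flush;   // true: the violation flushed; false: it replayed
        };

        // Assumed policy: enforce earlier (lower threshold) when the prior
        // violation was a flush.
        constexpr int kFlushThreshold  = 1;
        constexpr int kReplayThreshold = 3;

        struct Decision {
            bool enforce_dependency;   // wait for the matching older store
            bool multimatch;           // wait for all older stores to issue
        };

        Decision lookup(const std::vector<LSDPEntry>& lsdp, uint64_t load_pc) {
            int matches = 0;
            bool any_flush = false, enforce = false;
            for (const auto& e : lsdp) {
                if (e.load_pc != load_pc) continue;
                ++matches;
                any_flush |= e.caused_flush;
                int threshold = e.caused_flush ? kFlushThreshold : kReplayThreshold;
                enforce |= (e.confidence >= threshold);
            }
            // Multiple matching entries with at least one flush indicator:
            // be conservative and hold the load behind every older store.
            return {enforce, matches > 1 && any_flush};
        }

        int main() {
            std::vector<LSDPEntry> lsdp = {{0x40, 0x80, 2, true}};
            return lookup(lsdp, 0x40).enforce_dependency ? 0 : 1;  // enforced
        }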

    Content-directed prefetch circuit with quality filtering

    Publication number: US09886385B1

    Publication date: 2018-02-06

    Application number: US15247421

    Application date: 2016-08-25

    Applicant: Apple Inc.

    Abstract: In a content-directed prefetcher, a pointer detection circuit identifies a given memory pointer candidate within a data cache line fill from a lower level cache (LLC), where the LLC is at a lower level of a memory hierarchy relative to the data cache. A pointer filter circuit initiates a prefetch request to the LLC for the candidate, dependent on determining that a given counter in a quality factor (QF) table satisfies a QF counter threshold value. The QF table is indexed dependent upon a program counter address and relative cache line offset of the candidate. Upon initiation of the prefetch request, the given counter is updated to reflect a prefetch cost. In response to determining that a subsequent data cache line fill arriving from the LLC corresponds to the prefetch request for the given memory pointer candidate, a particular counter of the QF table may be updated to reflect a successful prefetch credit.
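
    A hypothetical C++ sketch of the quality-factor filter, assuming a cost is charged when a prefetch is issued and a credit is added when a later fill shows the prefetch was useful; the counter constants and indexing hash are invented.

        #include <array>
        #include <cstddef>
        #include <cstdint>

        constexpr int kTableSize       = 256;
        constexpr int kCostPerPrefetch = 1;   // charged when a prefetch is sent
        constexpr int kCreditPerHit    = 4;   // paid back when the prefetch is used
        constexpr int kIssueThreshold  = 0;   // issue only while the counter is non-negative

        std::array<int, kTableSize> qf_table{};   // counters, all start at zero

        // Index by the load PC and the cache-line-relative offset of the candidate.
        std::size_t qf_index(uint64_t pc, unsigned offset) {
            return static_cast<std::size_t>((pc ^ offset) % kTableSize);
        }

        // Decide whether to prefetch a pointer candidate found in a fill.
        bool maybe_prefetch(uint64_t pc, unsigned offset) {
            int& counter = qf_table[qf_index(pc, offset)];
            if (counter < kIssueThreshold) return false;   // filtered: low quality
            counter -= kCostPerPrefetch;                   // charge the cost up front
            return true;
        }

        // Called when a subsequent fill corresponds to the earlier prefetch.
        void on_prefetch_useful(uint64_t pc, unsigned offset) {
            qf_table[qf_index(pc, offset)] += kCreditPerHit;
        }

        int main() {
            bool sent = maybe_prefetch(0x400500, 3);       // counter 0: issue, pay cost
            if (sent) on_prefetch_useful(0x400500, 3);     // later fill matches: credit
            return sent ? 0 : 1;
        }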

    Decoupling Atomicity from Operation Size

    Publication number: US20240248844A1

    Publication date: 2024-07-25

    Application number: US18587289

    Application date: 2024-02-26

    Applicant: Apple Inc.

    Abstract: In an embodiment, a processor implements a different atomicity size (for memory consistency order) than the operation size. More particularly, the processor may implement a smaller atomicity size than the operation size. For example, for multiple register loads, the atomicity size may be the register size. In another example, the vector element size may be the atomicity size for vector load instructions. In yet another example, multiple contiguous vector elements, but fewer than all the vector elements in a vector register, may be the atomicity size for vector load instructions.
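
    A small C++ model of the idea, assuming the vector element is the atomicity granule: each 32-bit element access is single-copy atomic on its own, while the wide load as a whole is not.

        #include <array>
        #include <atomic>
        #include <cstdint>
        #include <iostream>

        constexpr int kElems = 4;

        // Memory modeled as atomic 32-bit elements: every element read is atomic,
        // but the 128-bit "vector load" built from them is not one atomic access.
        std::array<std::atomic<uint32_t>, kElems> memory;

        std::array<uint32_t, kElems> vector_load_element_atomic() {
            std::array<uint32_t, kElems> result{};
            for (int i = 0; i < kElems; ++i)
                result[i] = memory[i].load(std::memory_order_relaxed);
            return result;
        }

        int main() {
            for (int i = 0; i < kElems; ++i) memory[i] = static_cast<uint32_t>(i);
            for (uint32_t v : vector_load_element_atomic()) std::cout << v << ' ';
            std::cout << '\n';   // prints: 0 1 2 3
        }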

    Load-store unit with banked queue

    Publication number: US10133571B1

    Publication date: 2018-11-20

    Application number: US15171369

    Application date: 2016-06-02

    Applicant: Apple Inc.

    Abstract: A load-store unit having one or more banked queues is disclosed. In one embodiment, a load-store unit includes at least one queue that is subdivided into multiple banks. Although divided into multiple banks, the queue logically appears to software as a single queue. A first bank of the queue includes a first plurality of entries, with the second bank of the queue having a second plurality of entries, wherein each of the entries is arranged to store memory instructions. Each of the banks is associated with corresponding logic circuitry that controls one or more pointers for that bank. The pointer information may be exchanged between the logic circuits associated with the banks. Based on the pointer information that is exchanged, each bank may output (e.g., for retirement) one entry per cycle.
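
    A simplified C++ sketch of a two-bank queue that still behaves as one in-order queue, assuming round-robin allocation and a shared retire pointer standing in for the exchanged per-bank pointer information; the structure and names are illustrative.

        #include <array>
        #include <cstdint>
        #include <deque>
        #include <iostream>
        #include <string>
        #include <vector>

        struct BankedQueue {
            std::array<std::deque<std::string>, 2> banks;  // per-bank storage
            uint64_t alloc_seq  = 0;   // next sequence number to allocate
            uint64_t retire_seq = 0;   // next sequence number to retire

            // Allocation alternates banks, so consecutive entries land in
            // different banks while the sequence number preserves program order.
            void allocate(const std::string& op) {
                banks[alloc_seq % 2].push_back(op);
                ++alloc_seq;
            }

            // One "cycle": each bank may output at most one entry, and the
            // shared retire pointer keeps the outputs in program order.
            std::vector<std::string> retire_cycle() {
                std::vector<std::string> out;
                for (int i = 0; i < 2 && !banks[retire_seq % 2].empty(); ++i) {
                    auto& bank = banks[retire_seq % 2];
                    out.push_back(bank.front());
                    bank.pop_front();
                    ++retire_seq;
                }
                return out;
            }
        };

        int main() {
            BankedQueue q;
            q.allocate("load A");
            q.allocate("store B");
            q.allocate("load C");
            for (auto out = q.retire_cycle(); !out.empty(); out = q.retire_cycle())
                for (const auto& op : out) std::cout << op << '\n';   // program order
        }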

    Prefetch circuit for a processor with pointer optimization

    Publication number: US09971694B1

    Publication date: 2018-05-15

    Application number: US14748833

    Application date: 2015-06-24

    Applicant: Apple Inc.

    CPC classification number: G06F12/0862 G06F9/383 G06F2212/602 G06F2212/6028

    Abstract: In an embodiment, a processor may implement an access map-pattern match (AMPM)-based prefetch circuit with features designed to improve prefetching accuracy and/or reduce power consumption. In an embodiment, the prefetch circuit may be configured to detect that pointer reads are occurring (e.g., “pointer chasing”). The prefetch circuit may be configured to increase the frequency at which prefetch requests are generated for an access map in which pointer read activity is detected, compared to the frequency at which the prefetch requests would be generated if pointer read activity were not detected. In an embodiment, the prefetch circuit may also detect access maps that are store-only, and may reduce the frequency of prefetches for store-only access maps as compared to the frequency for load-only or load/store maps.
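
    An illustrative C++ sketch of the rate adjustment, assuming each access map carries a pointer-read flag and a store-only flag; the degree values are invented.

        // Adjust how aggressively to prefetch for one access map.
        struct AccessMap {
            bool pointer_reads_detected = false;  // "pointer chasing" observed
            bool store_only             = false;  // only stores seen in this map
        };

        // Number of prefetch requests to generate for this map (illustrative).
        int prefetch_degree(const AccessMap& m) {
            int degree = 2;                             // baseline
            if (m.pointer_reads_detected) degree *= 2;  // prefetch more often
            if (m.store_only)             degree /= 2;  // stores rarely need prefetch
            return degree;
        }

        int main() {
            AccessMap chasing{true, false};
            AccessMap stores{false, true};
            return prefetch_degree(chasing) > prefetch_degree(stores) ? 0 : 1;
        }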

    Register file circuit design process

    Publication number: US09824171B2

    Publication date: 2017-11-21

    Application number: US14820223

    Application date: 2015-08-06

    Applicant: Apple Inc.

    CPC classification number: G06F17/505 G06F17/5068

    Abstract: In some embodiments, a register file circuit design process includes instructing an automated integrated circuit design program to generate a register file circuit design, including providing a cell circuit design and instructing the automated integrated circuit design program to generate a selection design, a pre-decode design, and a data gating design. The cell circuit design describes a plurality of selection circuits that have a particular arrangement. The selection design describes a plurality of replica circuits that include respective pluralities of selection circuits having the particular arrangement. The pre-decode design describes a pre-decode circuit configured to identify a plurality of entries identified by a portion of a write instruction. The data gating design describes data gating circuits configured, in response to the pre-decode circuit not identifying respective entries, to disable data inputs to respective write selection circuits connected to the respective entries.

    Register file circuit design process

    Publication number: US20170039299A1

    Publication date: 2017-02-09

    Application number: US14820223

    Application date: 2015-08-06

    Applicant: Apple Inc.

    CPC classification number: G06F17/505 G06F17/5068

    Abstract: In some embodiments, a register file circuit design process includes instructing an automated integrated circuit design program to generate a register file circuit design, including providing a cell circuit design and instructing the automated integrated circuit design program to generate a selection design, a pre-decode design, and a data gating design. The cell circuit design describes a plurality of selection circuits that have a particular arrangement. The selection design describes a plurality of replica circuits that include respective pluralities of selection circuits having the particular arrangement. The pre-decode design describes a pre-decode circuit configured to identify a plurality of entries identified by a portion of a write instruction. The data gating design describes data gating circuits configured, in response to the pre-decode circuit not identifying respective entries, to disable data inputs to respective write selection circuits connected to the respective entries.

    Concurrent store and load operations

    Publication number: US09448936B2

    Publication date: 2016-09-20

    Application number: US14154122

    Application date: 2014-01-13

    Applicant: Apple Inc.

    CPC classification number: G06F12/0815 G06F12/0844 G06F12/0891

    Abstract: Systems, processors, and methods for efficiently handling concurrent store and load operations within a processor. A processor comprises a load-store unit (LSU) with a banked level-one (L1) data cache. When a store operation is ready to write data to the L1 data cache, the store operation will skip the write to any banks that have a conflict with a concurrent load operation. A partial write of the store operation will be performed to those banks of the L1 data cache that do not have a conflict with a concurrent load operation. For every attempt to write the store operation, a corresponding store mask will be updated to indicate which portions of the store operation were successfully written to the L1 data cache.
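
    A C++ sketch of the partial-write policy, assuming an eight-bank L1 data cache and a per-store mask of banks already written; the bank count and names are illustrative.

        #include <bitset>
        #include <iostream>

        constexpr int kBanks = 8;

        struct PendingStore {
            std::bitset<kBanks> target_banks;   // banks the store must write
            std::bitset<kBanks> written_banks;  // banks successfully written so far
        };

        // One write attempt: banks claimed by a concurrent load this cycle are
        // skipped; everything else is written and recorded in the store mask.
        void attempt_write(PendingStore& st, std::bitset<kBanks> load_banks) {
            std::bitset<kBanks> writable =
                st.target_banks & ~st.written_banks & ~load_banks;
            st.written_banks |= writable;   // update the store mask
        }

        bool complete(const PendingStore& st) {
            return st.written_banks == st.target_banks;
        }

        int main() {
            PendingStore st{std::bitset<kBanks>("00001111"), {}};
            attempt_write(st, std::bitset<kBanks>("00000011"));  // load holds banks 0-1
            attempt_write(st, std::bitset<kBanks>("00000000"));  // retry without conflict
            std::cout << complete(st) << '\n';                    // prints 1
        }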
