Patent search: ap:("Samsung Electronics Co., Ltd.") AND inv:"Karthik Sundaram"

1.

Granted invention patent
Memory load to load fusing (in force)

Publication number: US10372452B2

Publication date: 2019-08-06

Application number: US15615811

Application date: 2017-06-06

Applicant: Samsung Electronics Co., Ltd.

Inventor: Paul E. Kitchin, Rama S. Gopal, Karthik Sundaram

IPC: G06F9/30, G06F9/35, G06F12/0875, G06F9/38

Abstract: A system and a method to cascade execution of instructions in a load-store unit (LSU) of a central processing unit (CPU) to reduce latency associated with the instructions. First data stored in a cache is read by the LSU in response to a first memory load instruction of two immediately consecutive memory load instructions. Alignment, sign extension and/or endian operations are performed on the first data read from the cache in response to the first memory load instruction, and, in parallel, a memory-load address-forwarded result is selected based on a corrected alignment of the first data read in response to the first memory load instruction to provide a next address for a second of the two immediately consecutive memory load instructions. Second data stored in the cache is read by the LSU in response to the second memory load instruction based on the selected memory-load address-forwarded result.

2.

Invention application
ADDRESS RE-ORDERING MECHANISM FOR EFFICIENT PRE-FETCH TRAINING IN AN OUT-OF-ORDER PROCESSOR (under examination, published)

Publication number: US20170116128A1

Publication date: 2017-04-27

Application number: US15401515

Application date: 2017-01-09

Applicant: Samsung Electronics Co., Ltd.

Inventor: Karthik Sundaram, Arun Radhakrishnan

IPC: G06F12/0862, G06F12/0875

CPC classification number: G06F12/0862, G06F9/30043, G06F9/3802, G06F9/383, G06F9/3836, G06F12/0875, G06F12/1027, G06F2212/452, G06F2212/6028, G06F2212/657, Y02D10/13

Abstract: A computing system includes: an instruction dispatch module configured to receive a program instruction; and an address reordering module, coupled to the instruction dispatch module, configured to filter the program instruction when the program instruction is a hit in a cache-line in a prefetch filter. The computer system further includes: an instruction dispatch module configured to receive a program instruction; an address reordering module, coupled to the instruction dispatch module, configured to: allocate a tag in a tag module for the program instruction in a program order, allocate a virtual address in a virtual address module for the program instruction out of order relative to the program order, and insert a pointer associated with the tag to link the tag to the virtual address.

3.

Granted invention patent
Efficient fill-buffer data forwarding supporting high frequencies (in force)

Publication number: US09418018B2

Publication date: 2016-08-16

Application number: US14337211

Application date: 2014-07-21

Applicant: Samsung Electronics Co., Ltd.

Inventor: Karthik Sundaram, Rama Gopal, Murali Chinnakonda

IPC: G06F12/10, G06F12/08, G06F9/38

CPC classification number: G06F12/1045, G06F9/3861, G06F12/0811, G06F12/0859, G06F12/1009, G06F2212/1024, G06F2212/507

Abstract: A Fill Buffer (FB) based data forwarding scheme that stores a combination of Virtual Address (VA), TLB (Translation Look-aside Buffer) entry# or an indication of a location of a Page Table Entry (PTE) in the TLB, and a TLB page size information in the FB and uses these values to expedite FB forwarding. Load (Ld) operations send their non-translated VA for an early comparison against the VA entries in the FB, and are then further qualified with the TLB entry# to determine a “hit.” This hit determination is fast and enables FB forwarding at higher frequencies without waiting for a comparison of Physical Addresses (PA) to conclude in the FB. A safety mechanism may detect a false hit in the FB and generate a late load cancel indication to cancel the earlier-started FB forwarding by ignoring the data obtained as a result of the Ld execution. The Ld is then re-executed later and tries to complete successfully with the correct data.
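
The early-hit and late-cancel steps described in this abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the patented design: the names `FillBufferEntry`, `early_hit`, `late_cancel`, and `forward`, the 64-byte line size, and the omission of the TLB page-size field are all hypothetical choices made for brevity.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

LINE_BYTES = 64  # assumed cache-line size (not stated in the abstract)


@dataclass
class FillBufferEntry:
    va_line: int    # virtual line address stored in the fill buffer
    tlb_entry: int  # TLB entry number recorded with the fill
    pa_line: int    # physical line address, known only later in the pipeline
    data: bytes     # line data available for forwarding


def early_hit(entry: FillBufferEntry, load_va: int, load_tlb_entry: int) -> bool:
    """Fast check: compare the untranslated VA, then qualify with the TLB entry number."""
    return entry.va_line == load_va // LINE_BYTES and entry.tlb_entry == load_tlb_entry


def late_cancel(entry: FillBufferEntry, load_pa: int) -> bool:
    """Safety check: a physical-address mismatch means the early hit was false."""
    return entry.pa_line != load_pa // LINE_BYTES


def forward(entry: FillBufferEntry, load_va: int, load_tlb_entry: int,
            load_pa: int) -> Tuple[Optional[bytes], bool]:
    """Return (forwarded data or None, whether the load must be re-executed)."""
    if not early_hit(entry, load_va, load_tlb_entry):
        return None, False  # no forwarding was started
    if late_cancel(entry, load_pa):
        return None, True   # ignore the forwarded data and replay the load
    return entry.data, False
```

In this sketch the forwarded data is only discarded after the fact, which mirrors the abstract's point that forwarding starts without waiting for the physical-address comparison.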

4.

Granted invention patent
High-frequency and low-power L1 cache and associated access technique (in force)

Publication number: US11048637B2

Publication date: 2021-06-29

Application number: US16547557

Application date: 2019-08-21

Applicant: Samsung Electronics Co., Ltd.

Inventor: Karthik Sundaram

IPC: G06F12/0864, G06F12/1027, G06F9/30

Abstract: A high-frequency and low-power L1 cache and associated access technique. The method may include inspecting a virtual address of an L1 data cache load instruction, and indexing into a row and a column of a way predictor table using metadata and a virtual address associated with the load instruction. The method may include matching information stored at the row and the column of the way predictor table to a location of a cache line. The method may include predicting the location of the cache line within the L1 data cache based on the information match. A hierarchy of way predictor tables may be used, with higher level way predictor tables refreshing smaller lower level way predictor tables. The way predictor tables may be trained to make better predictions over time. Only selected circuit macros need to be enabled based on the predictions, thereby saving power.
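
A minimal sketch of a single way predictor table, assuming a direct row/column indexing scheme, may help make the abstract concrete. The class `WayPredictor`, its table dimensions, the XOR-based column hash, and the partial-tag format are hypothetical; the abstract does not specify how the row, the column, or the stored information are derived, and the hierarchy of tables is not modeled here.

```python
from typing import List, Optional, Tuple


class WayPredictor:
    """Toy way predictor: each table cell holds (tag, way) for one cache line."""

    def __init__(self, rows: int = 64, cols: int = 4, line_bytes: int = 64) -> None:
        self.rows = rows
        self.cols = cols
        self.line_bytes = line_bytes
        self.table: List[List[Optional[Tuple[int, int]]]] = [
            [None] * cols for _ in range(rows)
        ]

    def _index(self, vaddr: int, metadata: int) -> Tuple[int, int, int]:
        line = vaddr // self.line_bytes
        row = line % self.rows
        tag = line // self.rows
        col = (tag ^ metadata) % self.cols  # assumed hash of tag bits and metadata
        return row, col, tag

    def predict(self, vaddr: int, metadata: int = 0) -> Optional[int]:
        """Return the predicted way, or None when the stored information does not match."""
        row, col, tag = self._index(vaddr, metadata)
        entry = self.table[row][col]
        if entry is not None and entry[0] == tag:
            return entry[1]
        return None

    def train(self, vaddr: int, way: int, metadata: int = 0) -> None:
        """Record the way in which the line was actually found."""
        row, col, tag = self._index(vaddr, metadata)
        self.table[row][col] = (tag, way)
```

When `predict` returns a way, only the circuitry for that way would need to be enabled; a `None` result corresponds to falling back to a full lookup and then calling `train`.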

5.

Granted invention patent
Memory load and arithmetic load unit (ALU) fusing (in force)

Publication number: US10275217B2

Publication date: 2019-04-30

Application number: US15612963

Application date: 2017-06-02

Applicant: Samsung Electronics Co., Ltd.

Inventor: Rama S. Gopal, Paul E. Kitchin, Karthik Sundaram

IPC: G06F7/38, G06F7/485, G06F7/50

Abstract: According to one general aspect, a load unit may include a load circuit configured to load at least one piece of data from a memory. The load unit may include an alignment circuit configured to align the data to generate an aligned data. The load unit may also include a mathematical operation execution circuit configured to generate a resultant of a predetermined mathematical operation with the at least one piece of data as an operand. The load unit is configured to, if an active instruction is associated with the predetermined mathematical operation, bypass the alignment circuit and input the piece of data directly to the mathematical operation execution circuit.

6.

Granted invention patent
Address re-ordering mechanism for efficient pre-fetch training in an out-of-order processor (in force)

Publication number: US09542323B2

Publication date: 2017-01-10

Application number: US14498878

Application date: 2014-09-26

Applicant: Samsung Electronics Co., Ltd.

Inventor: Karthik Sundaram, Arun Radhakrishnan

IPC: G06F12/00, G06F12/08, G06F9/38

CPC classification number: G06F12/0862, G06F9/30043, G06F9/3802, G06F9/383, G06F9/3836, G06F12/0875, G06F12/1027, G06F2212/452, G06F2212/6028, G06F2212/657, Y02D10/13

Abstract: A computing system includes: an instruction dispatch module configured to receive a program instruction; an address reordering module, coupled to the instruction dispatch module, configured to filter the program instruction when the program instruction is a hit in a cache-line in a prefetch filter. The computer system further includes: an instruction dispatch module configured to receive a program instruction; an address reordering module, coupled to the instruction dispatch module, configured to: allocate a tag in a tag module for the program instruction in a program order, allocate a virtual address in a virtual address module for the program instruction out of order relative to the program order, and insert a pointer associated with the tag to link the tag to the virtual address.
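
The tag, virtual-address, and pointer structures named in this abstract can be illustrated with a small sketch. It assumes tags are allocated at dispatch in program order and addresses arrive whenever each instruction executes; the class `AddressReorderBuffer` and its method names are hypothetical, and the prefetch filter is not modeled.

```python
from typing import Dict, List


class AddressReorderBuffer:
    """Toy model: restore program order for addresses produced out of order."""

    def __init__(self) -> None:
        self.tags: List[int] = []          # tag module, filled in program order
        self.va_slots: List[int] = []      # virtual address module, filled out of order
        self.pointer: Dict[int, int] = {}  # tag -> slot index in va_slots
        self.next_to_drain = 0             # oldest tag not yet sent to the prefetcher

    def allocate_tag(self, tag: int) -> None:
        """Called at dispatch, in program order."""
        self.tags.append(tag)

    def record_address(self, tag: int, vaddr: int) -> None:
        """Called at execute, in any order; links the tag to its virtual address."""
        self.pointer[tag] = len(self.va_slots)
        self.va_slots.append(vaddr)

    def drain_in_order(self) -> List[int]:
        """Return addresses in program order, stopping at the first tag with no address yet."""
        out: List[int] = []
        while self.next_to_drain < len(self.tags):
            tag = self.tags[self.next_to_drain]
            if tag not in self.pointer:
                break
            out.append(self.va_slots[self.pointer[tag]])
            self.next_to_drain += 1
        return out
```

Feeding the output of `drain_in_order` to a prefetcher lets it train on the program-order address stream even though the loads executed out of order.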

7.

Invention application
COMPUTING SYSTEM WITH STRIDE PREFETCH MECHANISM AND METHOD OF OPERATION THEREOF (under examination, published)

Publication number: US20160054997A1

Publication date: 2016-02-25

Application number: US14832547

Application date: 2015-08-21

Applicant: Samsung Electronics Co., Ltd.

Inventor: Arun Radhakrishnan, Karthik Sundaram, Brian Grayson

IPC: G06F9/30, G06F9/345, G06F12/08

Abstract: A computing system includes: an instruction dispatch module configured to receive an address stream; a prefetch module, coupled to the instruction dispatch module, configured to: train to concurrently detect a single-stride pattern or a multi-stride pattern from the address stream, speculatively fetch a program data based on the single-stride pattern or the multi-stride pattern, and continue to train for the single-stride pattern with a larger value for a stride count or for the multi-stride pattern.
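
As an illustration only, the sketch below shows one simple way to detect a single-stride or multi-stride pattern in an address stream and to generate speculative prefetch addresses from it. The function names `detect_pattern` and `prefetch_addresses` and the `max_period` parameter are hypothetical; the abstract does not describe the detection algorithm at this level of detail, and the continued-training step is not modeled.

```python
from typing import List, Optional


def detect_pattern(addresses: List[int], max_period: int = 4) -> Optional[List[int]]:
    """Return the shortest address-delta pattern that repeated twice in a row, else None.

    A pattern of length 1 is a single stride; longer patterns are multi-stride.
    """
    deltas = [b - a for a, b in zip(addresses, addresses[1:])]
    for period in range(1, max_period + 1):
        if len(deltas) < 2 * period:
            break  # not enough history for this or any longer period
        tail = deltas[-period:]
        prev = deltas[-2 * period:-period]
        if tail == prev and any(tail):  # ignore an all-zero "pattern"
            return tail
    return None


def prefetch_addresses(last_address: int, pattern: List[int], count: int) -> List[int]:
    """Extend the stream by `count` addresses, cycling through the detected pattern."""
    out: List[int] = []
    addr = last_address
    for i in range(count):
        addr += pattern[i % len(pattern)]
        out.append(addr)
    return out
```

For example, the stream 100, 108, 116, 180, 188, 196, 260 has deltas 8, 8, 64 repeated twice, so the detected pattern is `[8, 8, 64]` and the next speculative addresses continue from 260 with that cycle.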

8.

Granted invention patent
Memory load to load fusing (in force)

Publication number: US10956155B2

Publication date: 2021-03-23

Application number: US16421463

Application date: 2019-05-23

Applicant: Samsung Electronics Co., Ltd.

Inventor: Paul E. Kitchin, Rama S. Gopal, Karthik Sundaram

IPC: G06F9/30, G06F9/38, G06F9/35, G06F12/0875

Abstract: A system and a method to cascade execution of instructions in a load-store unit (LSU) of a central processing unit (CPU) to reduce latency associated with the instructions. First data stored in a cache is read by the LSU in response to a first memory load instruction of two immediately consecutive memory load instructions. Alignment, sign extension and/or endian operations are performed on the first data read from the cache in response to the first memory load instruction, and, in parallel, a memory-load address-forwarded result is selected based on a corrected alignment of the first data read in response to the first memory load instruction to provide a next address for a second of the two immediately consecutive memory load instructions. Second data stored in the cache is read by the LSU in response to the second memory load instruction based on the selected memory-load address-forwarded result.

9.

Granted invention patent
Address re-ordering mechanism for efficient pre-fetch training in an out-of order processor (in force)

Publication number: US10031851B2

Publication date: 2018-07-24

Application number: US15401515

Application date: 2017-01-09

Applicant: Samsung Electronics Co., Ltd.

Inventor: Karthik Sundaram, Arun Radhakrishnan

IPC: G06F12/00, G06F12/0862, G06F12/0875, G06F12/1027

Abstract: A computing system includes: an instruction dispatch module configured to receive a program instruction; and an address reordering module, coupled to the instruction dispatch module, configured to filter the program instruction when the program instruction is a hit in a cache-line in a prefetch filter. The computer system further includes: an instruction dispatch module configured to receive a program instruction; an address reordering module, coupled to the instruction dispatch module, configured to: allocate a tag in a tag module for the program instruction in a program order, allocate a virtual address in a virtual address module for the program instruction out of order relative to the program order, and insert a pointer associated with the tag to link the tag to the virtual address.

10.

Granted invention patent
Pre-fetch chaining (in force)

Publication number: US09569361B2

Publication date: 2017-02-14

Application number: US14325343

Application date: 2014-07-07

Applicant: Samsung Electronics Co., Ltd.

Inventor: Arun Radhakrishnan, Kevin Lepak, Rama Gopal, Murali Chinnakonda, Karthik Sundaram, Brian Grayson

IPC: G06F12/08

CPC classification number: G06F12/0862, G06F12/10, G06F2212/6022

Abstract: According to one general aspect, an apparatus may include a cache pre-fetcher, and a pre-fetch scheduler. The cache pre-fetcher may be configured to predict, based at least in part upon a virtual address, data to be retrieved from a memory system. The pre-fetch scheduler may be configured to convert the virtual address of the data to a physical address of the data, and request the data from one of a plurality of levels of the memory system. The memory system may include a plurality of levels, each level of the memory system configured to store data.
