-
Publication No.: US10379855B2
Publication Date: 2019-08-13
Application No.: US15283259
Filing Date: 2016-09-30
Applicant: Intel Corporation
Inventor: William C. Hasenplaugh , Chris J. Newburn , Simon C. Steely, Jr. , Samantika S. Sury
IPC: G06F9/312 , G06F12/00 , G06F9/30 , G06F12/1045 , G06F12/0886 , G06F12/0897 , G06F12/126 , G06F12/1027
Abstract: A processor of an aspect includes a plurality of packed data registers, and a decode unit to decode an instruction. The instruction is to indicate a packed data register of the plurality of packed data registers that is to store source packed memory address information. The source packed memory address information is to include a plurality of memory address information data elements. An execution unit is coupled with the decode unit and the plurality of packed data registers. The execution unit, in response to the instruction, is to load a plurality of data elements from a plurality of memory addresses that each correspond to a different one of the plurality of memory address information data elements, and to store the plurality of loaded data elements in a destination storage location. The destination storage location does not include a register of the plurality of packed data registers.
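The gather-style load described above can be illustrated with a small software model. This is a minimal sketch, not the patented hardware: it assumes each memory address information element is an absolute byte address (the patent's wording would also cover indices plus a base), 8-byte elements, and a memory buffer as the destination storage location. The names `gather_to_memory`, `ELEMENT_SIZE`, and `dest_base` are hypothetical.

```python
from typing import Dict, List

ELEMENT_SIZE = 8  # assumed element width in bytes (hypothetical)


def gather_to_memory(memory: Dict[int, int],
                     packed_addresses: List[int],
                     dest_base: int) -> None:
    """Load one data element per address element of the source packed
    register and store the loaded elements contiguously in memory at
    dest_base, rather than in another packed data register."""
    # One load per memory address information data element.
    loaded = [memory[addr] for addr in packed_addresses]
    # The destination storage location is memory, not a packed data register.
    for i, value in enumerate(loaded):
        memory[dest_base + i * ELEMENT_SIZE] = value


# Usage: three scattered elements gathered into a contiguous buffer at 0x1000.
mem = {0x40: 7, 0x200: 11, 0x88: 13}
gather_to_memory(mem, [0x200, 0x40, 0x88], 0x1000)
print([mem[0x1000 + i * ELEMENT_SIZE] for i in range(3)])  # [11, 7, 13]
```

Keeping the results out of the packed register file is the point the abstract stresses; in this model that simply means the results land in `memory`.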
-
Publication No.: US20190243761A1
Publication Date: 2019-08-08
Application No.: US16382092
Filing Date: 2019-04-11
Applicant: Intel Corporation
Inventor: Doddaballapur N. Jayasimha , Samantika S. Sury , Christopher J. Hughes , Jonas Svennebring , Yen-Cheng Liu , Stephen R. Van Doren , David A. Koufaty
IPC: G06F12/0815 , G06F12/0808 , G06F9/30
CPC classification number: G06F12/0815 , G06F9/30047 , G06F12/0808 , G06F12/082 , G06F12/0824 , G06F12/0831 , G06F2212/1008 , G06F2212/1021 , G06F2212/608
Abstract: Disclosed embodiments relate to remote atomic operations (RAO) in multi-socket systems. In one example, a method, performed by a cache control circuit of a requester socket, includes: receiving the RAO instruction from the requester CPU core, determining a home agent in a home socket for the addressed cache line, providing a request for ownership (RFO) of the addressed cache line to the home agent, waiting for the home agent to either invalidate and retrieve a latest copy of the addressed cache line from a cache, or to fetch the addressed cache line from memory, receiving an acknowledgement and the addressed cache line, executing the RAO instruction on the received cache line atomically, subsequently receiving multiple local RAO instructions to the addressed cache line from one or more requester CPU cores, and executing the multiple local RAO instructions on the received cache line independently of the home agent.
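The key behavior in this method is that only the first RAO to a line needs a request for ownership (RFO) to the home agent; later local RAOs run at the requester socket. The following is a minimal single-threaded sketch of that flow, assuming one integer per cache line and a home agent that always fetches from memory. The classes `HomeAgent` and `RequesterCacheControl` are hypothetical illustrations, not an actual coherence protocol implementation.

```python
import operator
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class HomeAgent:
    memory: Dict[int, int]  # line address -> value (one word per line, simplified)
    rfo_count: int = 0      # number of RFOs received, to show how few are needed

    def request_for_ownership(self, line: int) -> int:
        # Simplified: always fetch from memory. A real home agent might instead
        # invalidate another cache's copy and forward the latest data.
        self.rfo_count += 1
        return self.memory.get(line, 0)


@dataclass
class RequesterCacheControl:
    home: HomeAgent
    owned: Dict[int, int] = field(default_factory=dict)  # lines held with ownership

    def execute_rao(self, line: int, op: Callable[[int, int], int], operand: int) -> int:
        if line not in self.owned:
            # First RAO to this line: send an RFO to the home agent and wait
            # for the acknowledgement plus the line data.
            self.owned[line] = self.home.request_for_ownership(line)
        # This and later local RAOs to the line execute here, independently of
        # the home agent (atomic trivially, since this model is single-threaded).
        self.owned[line] = op(self.owned[line], operand)
        return self.owned[line]


home = HomeAgent(memory={0x100: 5})
ctrl = RequesterCacheControl(home)
for _ in range(4):
    ctrl.execute_rao(0x100, operator.add, 1)
print(ctrl.owned[0x100], home.rfo_count)  # 9 1
```

Four atomic adds cost a single RFO in this model, which is the traffic saving the abstract implies for repeated local RAOs to one line.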
-
Publication No.: US10310978B2
Publication Date: 2019-06-04
Application No.: US15721499
Filing Date: 2017-09-29
Applicant: Intel Corporation
Inventor: Robert G. Blankenship , Samantika S. Sury
IPC: G06F13/00 , G06F12/0815 , G06F12/0811 , G06F12/0875
Abstract: An apparatus and method for multi-level cache request tracking. For example, one embodiment of a processor comprises: one or more cores to execute instructions and process data; a memory subsystem comprising a system memory and a multi-level cache hierarchy; a primary tracker to store a first entry associated with a memory request to transfer a cache line from the system memory or a first cache within the cache hierarchy to a second cache; primary tracker allocation circuitry to allocate and deallocate entries within the primary tracker; a secondary tracker to store a second entry associated with the memory request; secondary tracker allocation circuitry to allocate and deallocate entries within the secondary tracker; the primary tracker allocation circuitry to deallocate the first entry in response to a first indication that one or more cache coherence requirements associated with the cache line have been resolved, the secondary tracker allocation circuitry to deallocate the second entry in response to a second indication related to transmission of the cache line to the second cache.
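The two-tracker scheme lets the primary entry be freed as soon as coherence is resolved, while the secondary entry lives until the line reaches the second cache. A minimal sketch, assuming fixed-capacity trackers keyed by a request id; `Tracker`, `RequestTracking`, and the two callback names are hypothetical and only model the allocate/deallocate rules stated in the abstract.

```python
from typing import Dict


class Tracker:
    """Fixed-capacity table of in-flight request entries (simplified)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: Dict[int, int] = {}  # request id -> cache line address

    def allocate(self, req_id: int, line: int) -> bool:
        if len(self.entries) >= self.capacity:
            return False  # no free entry; the request must wait
        self.entries[req_id] = line
        return True

    def deallocate(self, req_id: int) -> None:
        self.entries.pop(req_id, None)


class RequestTracking:
    def __init__(self, primary_cap: int, secondary_cap: int) -> None:
        self.primary = Tracker(primary_cap)
        self.secondary = Tracker(secondary_cap)

    def issue(self, req_id: int, line: int) -> bool:
        # Both trackers hold an entry while the request is in flight.
        if not self.primary.allocate(req_id, line):
            return False
        if not self.secondary.allocate(req_id, line):
            self.primary.deallocate(req_id)
            return False
        return True

    def on_coherence_resolved(self, req_id: int) -> None:
        # First indication: coherence requirements resolved, free the primary entry.
        self.primary.deallocate(req_id)

    def on_line_delivered(self, req_id: int) -> None:
        # Second indication: line transmitted to the second cache, free the secondary entry.
        self.secondary.deallocate(req_id)


t = RequestTracking(primary_cap=1, secondary_cap=4)
first_ok = t.issue(1, 0x40)
blocked = not t.issue(2, 0x80)  # primary tracker is full
t.on_coherence_resolved(1)
second_ok = t.issue(2, 0x80)    # succeeds before request 1's line is delivered
```

The design choice this illustrates: a small, scarce primary tracker is recycled early, while a cheaper secondary tracker absorbs the tail latency of the data transfer.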
-
Publication No.: US20180095756A1
Publication Date: 2018-04-05
Application No.: US15283259
Filing Date: 2016-09-30
Applicant: Intel Corporation
Inventor: William C. Hasenplaugh , Chris J. Newburn , Simon C. Steely, JR. , Samantika S. Sury
IPC: G06F9/30 , G06F12/1045
CPC classification number: G06F9/30032 , G06F9/30036 , G06F9/3004 , G06F9/30043 , G06F9/3013 , G06F9/3016 , G06F12/0886 , G06F12/0897 , G06F12/1027 , G06F12/1054 , G06F12/126 , G06F2212/1024 , G06F2212/1028 , G06F2212/681
Abstract: A processor of an aspect includes a plurality of packed data registers, and a decode unit to decode an instruction. The instruction is to indicate a packed data register of the plurality of packed data registers that is to store source packed memory address information. The source packed memory address information is to include a plurality of memory address information data elements. An execution unit is coupled with the decode unit and the plurality of packed data registers. The execution unit, in response to the instruction, is to load a plurality of data elements from a plurality of memory addresses that each correspond to a different one of the plurality of memory address information data elements, and to store the plurality of loaded data elements in a destination storage location. The destination storage location does not include a register of the plurality of packed data registers.
-
Publication No.: US20170010974A1
Publication Date: 2017-01-12
Application No.: US15275630
Filing Date: 2016-09-26
Applicant: Intel Corporation
Inventor: Simon Steely, JR. , Samantika S. Sury , William C. Hasenplaugh
IPC: G06F12/0891 , G06F12/128 , G06F12/1009 , G06F12/1045
CPC classification number: G06F12/0891 , G06F12/1009 , G06F12/1027 , G06F12/1054 , G06F12/126 , G06F12/128 , G06F2212/621 , G06F2212/65 , G06F2212/683
Abstract: Method and apparatus to efficiently manage data in caches. Data in caches may be managed based on priorities assigned to the data. Data may be requested by a process using a virtual address of the data. The requested data may be assigned a priority by a component in a computer system called an address range priority assigner (ARP). The ARP may assign a particular priority to the requested data if the virtual address of the requested data is within a particular range of virtual addresses. The particular priority assigned may be high priority and the particular range of virtual addresses may be smaller than a cache's capacity.
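The address range priority assigner (ARP) can be modeled as a range lookup on the virtual address. A minimal sketch follows; the priority encoding, the `PriorityRange` and `AddressRangePriorityAssigner` names, and the eviction policy in `choose_victim` (evict the least recently used normal-priority line first) are all assumptions, since the abstract does not say how priorities influence cache management. This sketch also chooses to enforce the "smaller than the cache's capacity" condition, which the abstract presents only as a possibility.

```python
from dataclasses import dataclass
from typing import List, Tuple

HIGH, NORMAL = 1, 0  # hypothetical priority encoding


@dataclass(frozen=True)
class PriorityRange:
    start: int     # inclusive virtual address
    end: int       # exclusive virtual address
    priority: int


class AddressRangePriorityAssigner:
    def __init__(self, ranges: List[PriorityRange], cache_capacity: int) -> None:
        for r in ranges:
            # Assumption of this sketch: a high-priority range must fit in the cache.
            if r.priority == HIGH and (r.end - r.start) >= cache_capacity:
                raise ValueError("high-priority range should be smaller than the cache")
        self.ranges = ranges

    def assign(self, vaddr: int) -> int:
        # Requested data gets a range's priority if its virtual address falls inside it.
        for r in self.ranges:
            if r.start <= vaddr < r.end:
                return r.priority
        return NORMAL


def choose_victim(cache_set: List[Tuple[int, int]]) -> int:
    """cache_set: (vaddr, priority) pairs ordered from LRU to MRU.
    Assumed policy: evict the least recently used normal-priority line."""
    for i, (_, prio) in enumerate(cache_set):
        if prio == NORMAL:
            return i
    return 0  # every line is high priority: fall back to plain LRU


arp = AddressRangePriorityAssigner([PriorityRange(0x1000, 0x2000, HIGH)],
                                   cache_capacity=32 * 1024)
victim = choose_victim([(0x1000, arp.assign(0x1000)), (0x3000, arp.assign(0x3000))])
print(victim)  # 1: the normal-priority line is evicted even though it is more recent
```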
-
Publication No.: US12204478B2
Publication Date: 2025-01-21
Application No.: US17206961
Filing Date: 2021-03-19
Applicant: Intel Corporation
Inventor: Swapna Raj , Samantika S. Sury , Kermin Chofleming , Simon C. Steely, Jr.
IPC: G06F13/40 , G06F12/0815 , G06F13/16
Abstract: Examples include techniques for near data acceleration for a multi-core architecture. A near data processor included in a memory controller of a processor may access data maintained in a memory device coupled with the near data processor via one or more memory channels responsive to a work request to execute a kernel, an application or a loop routine using the accessed data to generate values. The near data processor provides an indication to the requestor of the work request that values have been generated.
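The work-request flow (requester submits a kernel, the near data processor reads operands over its memory channels, runs the kernel, then signals completion) can be sketched as below. Everything concrete here is an assumption for illustration: the `WorkRequest` and `NearDataProcessor` names, the word-interleaved channel layout, and the boolean `done` flag standing in for the completion indication.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class WorkRequest:
    kernel: Callable[[List[int]], List[int]]  # computation to run near memory
    start: int                                 # first logical word in the memory device
    count: int                                 # number of words to process
    done: bool = False                         # completion indication for the requester
    result: List[int] = field(default_factory=list)


class NearDataProcessor:
    """Runs inside the memory controller and reads operands over its memory channels."""

    def __init__(self, channels: List[List[int]]) -> None:
        self.channels = channels  # simplified: each channel is a list of words

    def _read(self, index: int) -> int:
        # Hypothetical interleaving: consecutive words alternate across channels.
        ch = index % len(self.channels)
        return self.channels[ch][index // len(self.channels)]

    def execute(self, req: WorkRequest) -> None:
        data = [self._read(i) for i in range(req.start, req.start + req.count)]
        req.result = req.kernel(data)
        req.done = True  # tell the requester that the values have been generated


ndp = NearDataProcessor(channels=[[0, 2, 4, 6], [1, 3, 5, 7]])  # logical words 0..7
req = WorkRequest(kernel=lambda xs: [x * x for x in xs], start=2, count=3)
ndp.execute(req)
print(req.done, req.result)  # True [4, 9, 16]
```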
-
Publication No.: US11989135B2
Publication Date: 2024-05-21
Application No.: US16786815
Filing Date: 2020-02-10
Applicant: Intel Corporation
Inventor: Farah E. Fargo , Mitchell Diamond , David Keppel , Samantika S. Sury , Binh Pham , Shobha Vissapragada
IPC: G06F12/10 , G06F12/1027
CPC classification number: G06F12/1027 , G06F2212/657
Abstract: Examples described herein relate to a computing system supporting custom page-sized ranges, so that an application can map contiguous memory regions instead of many smaller pages. An application can request a custom range size. An operating system can allocate a contiguous physical memory region to a virtual address range by specifying custom range sizes that are larger or smaller than the normal general page sizes. Virtual-to-physical address translation can occur using address range circuitry and a translation lookaside buffer in parallel. The address range circuitry can determine whether a custom entry is available to identify a physical address translation for the virtual address. In some examples, physical address translation can be performed by transforming the virtual address.
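A custom range entry can translate any address inside it by adding a fixed offset, which is one way to read "transforming the virtual address". The sketch below assumes a 4 KiB base page size, a dictionary as the TLB, and a policy of preferring a range hit when both lookups succeed; `RangeEntry` and `Translator` are hypothetical names, and the two lookups are computed sequentially to stand in for a parallel hardware probe.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

PAGE_SIZE = 4096  # assumed base page size


@dataclass(frozen=True)
class RangeEntry:
    vbase: int  # start of the custom virtual range
    size: int   # custom range size in bytes (not limited to standard page sizes)
    pbase: int  # start of the contiguous physical region


class Translator:
    def __init__(self, ranges: List[RangeEntry], tlb: Dict[int, int]) -> None:
        self.ranges = ranges
        self.tlb = tlb  # virtual page number -> physical page number

    def _range_lookup(self, vaddr: int) -> Optional[int]:
        for r in self.ranges:
            if r.vbase <= vaddr < r.vbase + r.size:
                # Translate by transforming the address: add a fixed offset.
                return r.pbase + (vaddr - r.vbase)
        return None

    def _tlb_lookup(self, vaddr: int) -> Optional[int]:
        ppn = self.tlb.get(vaddr // PAGE_SIZE)
        return None if ppn is None else ppn * PAGE_SIZE + vaddr % PAGE_SIZE

    def translate(self, vaddr: int) -> Optional[int]:
        # Hardware would probe both structures in parallel; here both results
        # are computed and a custom range hit is preferred.
        range_pa = self._range_lookup(vaddr)
        tlb_pa = self._tlb_lookup(vaddr)
        return range_pa if range_pa is not None else tlb_pa  # None: page walk needed


# A 3 MiB custom range: larger than a 4 KiB page, not a standard 2 MiB or 1 GiB size.
tr = Translator([RangeEntry(vbase=0x100000, size=0x300000, pbase=0x800000)],
                tlb={0x5: 0x42})
print(hex(tr.translate(0x101234)), hex(tr.translate(0x5ABC)))  # 0x801234 0x42abc
```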
-
Publication No.: US11500636B2
Publication Date: 2022-11-15
Application No.: US16799619
Filing Date: 2020-02-24
Applicant: Intel Corporation
Inventor: Christopher J. Hughes , Joseph Nuzman , Jonas Svennebring , Doddaballapur N. Jayasimha , Samantika S. Sury , David A. Koufaty , Niall D. McDonnell , Yen-Cheng Liu , Stephen R. Van Doren , Stephen J. Robinson
IPC: G06F9/30 , G06F12/0875
Abstract: Disclosed embodiments relate to spatial and temporal merging of remote atomic operations. In one example, a system includes an RAO instruction queue stored in a memory and having entries grouped by destination cache line, each entry to enqueue an RAO instruction including an opcode, a destination identifier, and source data, optimization circuitry to receive an incoming RAO instruction, scan the RAO instruction queue to detect a matching enqueued RAO instruction identifying a same destination cache line as the incoming RAO instruction, the optimization circuitry further to, responsive to no matching enqueued RAO instruction being detected, enqueue the incoming RAO instruction; and, responsive to a matching enqueued RAO instruction being detected, determine whether the incoming and matching RAO instructions have a same opcode to non-overlapping cache line elements, and, if so, spatially combine the incoming and matching RAO instructions by enqueuing both RAO instructions in a same group of cache line queue entries at different offsets.
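The spatial-combining rule in this abstract (same destination cache line, same opcode, non-overlapping elements) can be expressed as a queue keyed by line. A minimal sketch, assuming 64-byte lines and byte-granular element ranges; `Rao` and `RaoQueue` are hypothetical names. Temporal merging, named in the title but not detailed in the abstract, is not modeled: non-combinable RAOs go to a separate list.

```python
from dataclasses import dataclass
from typing import Dict, List

LINE_SIZE = 64  # assumed cache line size in bytes


@dataclass
class Rao:
    opcode: str  # e.g. "ADD"
    dest: int    # destination byte address
    size: int    # bytes of the line this RAO updates
    data: int    # source operand


class RaoQueue:
    def __init__(self) -> None:
        # Entries grouped by destination cache line.
        self.groups: Dict[int, List[Rao]] = {}
        # RAOs this sketch cannot combine (temporal merging is not modeled).
        self.unmerged: List[Rao] = []

    def enqueue(self, rao: Rao) -> str:
        line = rao.dest // LINE_SIZE
        group = self.groups.get(line)
        if group is None:
            self.groups[line] = [rao]  # no matching line: start a new group
            return "new"
        lo = rao.dest % LINE_SIZE
        hi = lo + rao.size
        same_opcode = all(r.opcode == rao.opcode for r in group)
        non_overlapping = all(
            hi <= r.dest % LINE_SIZE or r.dest % LINE_SIZE + r.size <= lo
            for r in group
        )
        if same_opcode and non_overlapping:
            group.append(rao)  # spatial combine: same group, different offset
            return "spatial"
        self.unmerged.append(rao)
        return "separate"


q = RaoQueue()
outcomes = [q.enqueue(Rao("ADD", 0x1000, 8, 1)),  # new group for line 0x40
            q.enqueue(Rao("ADD", 0x1008, 8, 2)),  # same line, disjoint bytes
            q.enqueue(Rao("ADD", 0x1004, 4, 3)),  # overlaps the first RAO
            q.enqueue(Rao("MIN", 0x1020, 8, 4))]  # different opcode
print(outcomes)  # ['new', 'spatial', 'separate', 'separate']
```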
-
Publication No.: US20220206945A1
Publication Date: 2022-06-30
Application No.: US17134254
Filing Date: 2020-12-25
Applicant: Intel Corporation
Inventor: Carl J. Beckmann , Samantika S. Sury , Christopher J. Hughes , Lingxiang Xiang , Rahul Agrawal
IPC: G06F12/0811 , G06F12/0817 , G06F12/0862 , G06F12/084
Abstract: Disclosed embodiments relate to atomic memory operations. In one example, an apparatus includes multiple processor cores, a cache hierarchy, a local execution unit, a remote execution unit, and an adaptive remote atomic operation unit. The cache hierarchy includes a local cache at a first level and a shared cache at a second level. The local execution unit is to perform an atomic operation at the first level if the local cache is storing a cache line including data for the atomic operation. The remote execution unit is to perform the atomic operation at the second level. The adaptive remote atomic operation unit is to determine whether to perform the atomic operation at the first level or at the second level and whether to copy the cache line from the shared cache to the local cache.
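The abstract leaves the adaptive decision policy open. The sketch below uses a simple reuse counter as a stand-in heuristic: the first atomics to a line that is not in the local cache run remotely at the shared cache, and once a line has missed `reuse_threshold` times it is copied into the local cache so later atomics run locally. The `AdaptiveRao` class, the threshold, and the move-on-copy simplification are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Tuple


class AdaptiveRao:
    def __init__(self, reuse_threshold: int = 2) -> None:
        self.local_cache: Dict[int, int] = {}                # line -> value in the core's L1
        self.shared_cache: Dict[int, int] = defaultdict(int)  # line -> value in the shared L2
        self.misses: Dict[int, int] = defaultdict(int)        # atomics that missed L1, per line
        self.reuse_threshold = reuse_threshold                # hypothetical tuning knob

    def atomic_add(self, line: int, delta: int) -> Tuple[str, bool]:
        """Return (level where the atomic executed, whether the line was copied)."""
        if line in self.local_cache:
            # Local execution unit: the first-level cache already holds the line.
            self.local_cache[line] += delta
            return ("local", False)
        self.misses[line] += 1
        if self.misses[line] >= self.reuse_threshold:
            # Repeated use: copy the line into L1 (ownership moves, simplified)
            # and execute there.
            self.local_cache[line] = self.shared_cache.pop(line, 0) + delta
            return ("local", True)
        # Low reuse so far: the remote execution unit updates the shared cache.
        self.shared_cache[line] += delta
        return ("remote", False)


a = AdaptiveRao(reuse_threshold=2)
results = [a.atomic_add(7, 5) for _ in range(3)]
print(results, a.local_cache[7])  # [('remote', False), ('local', True), ('local', False)] 15
```

A real adaptive unit would likely also weigh contention from other cores before pulling a line into a private cache; this sketch tracks only one core's reuse.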
-
Publication No.: US11138112B2
Publication Date: 2021-10-05
Application No.: US16382092
Filing Date: 2019-04-11
Applicant: Intel Corporation
Inventor: Doddaballapur N. Jayasimha , Samantika S. Sury , Christopher J. Hughes , Jonas Svennebring , Yen-Cheng Liu , Stephen R. Van Doren , David A. Koufaty
IPC: G06F12/0831 , G06F12/0815 , G06F12/0808 , G06F9/30 , G06F12/0817
Abstract: Disclosed embodiments relate to remote atomic operations (RAO) in multi-socket systems. In one example, a method, performed by a cache control circuit of a requester socket, includes: receiving the RAO instruction from the requester CPU core, determining a home agent in a home socket for the addressed cache line, providing a request for ownership (RFO) of the addressed cache line to the home agent, waiting for the home agent to either invalidate and retrieve a latest copy of the addressed cache line from a cache, or to fetch the addressed cache line from memory, receiving an acknowledgement and the addressed cache line, executing the RAO instruction on the received cache line atomically, subsequently receiving multiple local RAO instructions to the addressed cache line from one or more requester CPU cores, and executing the multiple local RAO instructions on the received cache line independently of the home agent.