-
1.
Publication No.: US12131182B2
Publication Date: 2024-10-29
Application No.: US16362281
Filing Date: 2019-03-22
Inventors: Zhenjiang Wang , Jianjun Li , Liang Chen , Kun Ling , Delin Li , Chen Sun
CPC Classes: G06F9/4881 , G06F9/30007 , G06F9/3004 , G06F9/30076 , G06F9/345 , G06F9/5016 , G06N3/063
Abstract: Systems and methods of data processing are provided. The method comprises receiving input data to be processed by a series of operations, identifying a first operation from the series of operations, selecting at least one second operation from the series of operations to be grouped with the first operation based at least in part on the amount of input data and output data of the grouped operations and the capacity of a memory unit, and processing a portion of the input data of the grouped operations. The efficiency of the series of operations can be improved by ensuring that the input data and output data of any operation are both stored in the memory unit.
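A minimal sketch of the grouping criterion described above, in Python. The greedy strategy, the per-operation size tuples, and the capacity value are all illustrative assumptions, not taken from the patent: a group is extended only while the group's input plus the candidate operation's output still fit in the memory unit.

```python
# Hypothetical sketch: greedily group consecutive operations so that the
# input data and output data of each group both fit in a memory unit of
# fixed capacity. Sizes and capacity are arbitrary example numbers.

def group_operations(io_sizes, capacity):
    """io_sizes: list of (input_size, output_size) per operation, in order.
    Returns a list of groups, each a list of operation indices."""
    groups, current = [], []
    for idx, (in_size, out_size) in enumerate(io_sizes):
        if current:
            group_in = io_sizes[current[0]][0]     # input of the whole group
            if group_in + out_size <= capacity:
                current.append(idx)                # fuse with the running group
                continue
            groups.append(current)                 # group is full: close it
        current = [idx]
    if current:
        groups.append(current)
    return groups
```

With operations sized `[(4,4), (4,4), (4,2), (2,8)]` and capacity 8, the first three operations fuse into one group and the last, whose output would overflow, starts a new one.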
-
2.
Publication No.: US20240345842A1
Publication Date: 2024-10-17
Application No.: US18754455
Filing Date: 2024-06-26
CPC Classes: G06F9/383 , G06F9/30036 , G06F9/3004 , G06F9/30043
Abstract: This disclosure is directed to the problem of parallelizing random read access within a reasonably sized block of data for a vector SIMD processor. The invention sets up plural parallel look up tables, moves data from main memory into each parallel look up table, and then employs a look up table read instruction to simultaneously move data from each parallel look up table into a corresponding part of a vector destination register. This enables data processing by vector single instruction multiple data (SIMD) operations. This vector destination register load can be repeated if the tables store additional data to be used. New data can be loaded into the original tables if appropriate. A level one memory is preferably partitioned as part data cache and part directly addressable memory. The look up tables are stored in the directly addressable memory.
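An illustrative Python model of the parallel-table gather: one table per SIMD lane, each loaded from "main memory", then a single look-up-table read pulls one element per table into the matching slot of a vector register. The lane count, table size, and memory contents are invented for the example.

```python
# Conceptual model only: real hardware does this in one instruction against
# directly addressable level-one memory; lists stand in for tables here.

def load_tables(main_memory, num_lanes, table_size):
    """Copy one table's worth of data from main memory into each lane's table."""
    return [main_memory[i * table_size:(i + 1) * table_size]
            for i in range(num_lanes)]

def lut_read(tables, indices):
    """One 'look up table read': lane i reads tables[i][indices[i]],
    filling the corresponding part of a vector destination register."""
    return [table[idx] for table, idx in zip(tables, indices)]

memory = list(range(100, 116))            # stand-in for main memory
tables = load_tables(memory, num_lanes=4, table_size=4)
vector_register = lut_read(tables, [3, 0, 2, 1])
```

Each lane performs its own random access, yet the result arrives as one vector ready for SIMD processing.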
-
3.
Publication No.: US12093210B2
Publication Date: 2024-09-17
Application No.: US17430574
Filing Date: 2020-03-14
Applicant: Intel Corporation
Inventors: Abhishek R. Appu , Altug Koker , Aravindh Anantaraman , Elmoustapha Ould-Ahmed-Vall , Joydeep Ray , Mike Macpherson , Valentin Andrei , Nicolas Galoppo Von Borries , Varghese George , Subramaniam Maiyuran , Vasanth Ranganathan , Jayakrishna P S , K Pattabhiraman , Sudhakar Kamma
IPC Classes: G06F15/78 , G06F7/544 , G06F7/575 , G06F7/58 , G06F9/30 , G06F9/38 , G06F9/50 , G06F12/02 , G06F12/06 , G06F12/0802 , G06F12/0804 , G06F12/0811 , G06F12/0862 , G06F12/0866 , G06F12/0871 , G06F12/0875 , G06F12/0882 , G06F12/0888 , G06F12/0891 , G06F12/0893 , G06F12/0895 , G06F12/0897 , G06F12/1009 , G06F12/128 , G06F15/80 , G06F17/16 , G06F17/18 , G06T1/20 , G06T1/60 , H03M7/46 , G06N3/08 , G06T15/06
CPC Classes: G06F15/7839 , G06F7/5443 , G06F7/575 , G06F7/588 , G06F9/3001 , G06F9/30014 , G06F9/30036 , G06F9/3004 , G06F9/30043 , G06F9/30047 , G06F9/30065 , G06F9/30079 , G06F9/3887 , G06F9/5011 , G06F9/5077 , G06F12/0215 , G06F12/0238 , G06F12/0246 , G06F12/0607 , G06F12/0802 , G06F12/0804 , G06F12/0811 , G06F12/0862 , G06F12/0866 , G06F12/0871 , G06F12/0875 , G06F12/0882 , G06F12/0888 , G06F12/0891 , G06F12/0893 , G06F12/0895 , G06F12/0897 , G06F12/1009 , G06F12/128 , G06F15/8046 , G06F17/16 , G06F17/18 , G06T1/20 , G06T1/60 , H03M7/46 , G06F9/3802 , G06F9/3818 , G06F9/3867 , G06F2212/1008 , G06F2212/1021 , G06F2212/1044 , G06F2212/302 , G06F2212/401 , G06F2212/455 , G06F2212/60 , G06N3/08 , G06T15/06
Abstract: Methods and apparatus relating to techniques for data compression. In an example, an apparatus comprises a processor to receive a data compression instruction for a memory segment, and, in response to the data compression instruction, compress a sequence of identical memory values in response to a determination that the sequence of identical memory values has a length that exceeds a threshold. Other embodiments are also disclosed and claimed.
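A hedged Python sketch of the claimed behavior: only runs of identical values longer than a threshold are replaced by a compact token, while shorter runs pass through unchanged. The `('run', value, count)` token format is an assumption for illustration, not the patented encoding.

```python
# Run-length compression gated by a threshold, as the abstract describes:
# short runs are not worth a token, so they are emitted verbatim.

def compress_runs(values, threshold):
    out, i = [], 0
    while i < len(values):
        j = i
        while j < len(values) and values[j] == values[i]:
            j += 1                                # extend the run of identical values
        run = j - i
        if run > threshold:
            out.append(('run', values[i], run))   # one token replaces the run
        else:
            out.extend(values[i:j])               # too short: keep as-is
        i = j
    return out
```

For example, with a threshold of 3, four zeros collapse into one token while a pair of sevens is left alone.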
-
4.
Publication No.: US12093208B2
Publication Date: 2024-09-17
Application No.: US17862222
Filing Date: 2022-07-11
Applicant: NVIDIA Corporation
Inventors: Ryan Olson , Michael Demoret , Bartley Richardson
IPC Classes: G06F15/173 , G06F9/30 , H04L67/025 , H04L67/1097
CPC Classes: G06F15/17331 , G06F9/3004 , H04L67/025 , H04L67/1097
Abstract: Technologies for enabling remote direct memory access (RDMA) transport of serialized objects in streaming pipelines are described. In one method, a first computing device that stores a serialized object in a first memory generates a remote descriptor associated with the serialized object. The remote descriptor uniquely identifies the location of the serialized object and carries a reference count token. The first computing device sends the remote descriptor to a second computing device in the data center over a network fabric. The second computing device uses the remote descriptor to obtain the object's contiguous block from the first memory for storage in a second memory associated with the second computing device. The value of the reference count token can be updated upon receiving a message from the second computing device, and the remote descriptor can be released responsive to the value of the reference count token satisfying a threshold value.
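A conceptual sketch of the remote-descriptor lifecycle, not NVIDIA's implementation: the descriptor names the object's location and carries a reference-count token, and the owner releases the backing memory once the count reaches a threshold. The field names and the threshold of zero are assumptions.

```python
# Toy model: each consumer that finishes with the object returns a token;
# when the outstanding token count hits the threshold, the owner may release
# the serialized object's memory.

class RemoteDescriptor:
    def __init__(self, node, address, length, tokens):
        self.node, self.address, self.length = node, address, length
        self.tokens = tokens          # reference-count tokens outstanding
        self.released = False

    def on_token_returned(self):
        """Called when a consumer reports it has finished with the object."""
        self.tokens -= 1
        if self.tokens <= 0:          # threshold reached: safe to release
            self.released = True

desc = RemoteDescriptor(node="host-a", address=0x1000, length=4096, tokens=2)
desc.on_token_returned()              # first consumer done
desc.on_token_returned()              # second consumer done: release
```

Because only the small descriptor travels through the pipeline, the serialized object itself moves at most once, directly over the network fabric.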
-
5.
Publication No.: US20240303078A1
Publication Date: 2024-09-12
Application No.: US18181307
Filing Date: 2023-03-09
IPC Classes: G06F9/30
CPC Classes: G06F9/3004 , G06F9/5016
Abstract: The disclosure provides systems and methods for improving the bandwidth and latency of data requests executed in disaggregated memory by leveraging usage indicators (also referred to as usage values), such as the “freshness” of data operators and the processing “gravity” of near-memory compute functions. Examples of the systems and methods disclosed herein generate data operators comprising near-memory compute functions offloaded proximate to disaggregated memory nodes; assign a usage value to each data operator based on at least one of (i) a freshness indicator for each data operator and (ii) a gravity indicator for each near-memory compute function; and allocate data operations to the data operators based on the usage value.
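A toy Python model of the allocation idea: each near-memory data operator gets a usage value combining its freshness and gravity indicators, and an incoming operation goes to the operator with the highest usage value. The simple-sum weighting and the operator names are assumptions; the abstract does not fix a formula.

```python
# Hypothetical allocator: rank data operators by usage value and route the
# operation to the best-ranked one.

def usage_value(freshness, gravity):
    """Combine the two indicators; a plain sum is an illustrative choice."""
    return freshness + gravity

def allocate(operation, operators):
    """operators: list of (name, freshness, gravity). Returns the chosen name."""
    best = max(operators, key=lambda op: usage_value(op[1], op[2]))
    return best[0]

chosen = allocate("scan", [("op-a", 0.9, 0.2),
                           ("op-b", 0.4, 0.8),
                           ("op-c", 0.3, 0.3)])
```

Here "op-b" wins despite lower freshness because its compute function's gravity pulls the operation toward it.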
-
6.
Publication No.: US12067395B2
Publication Date: 2024-08-20
Application No.: US18098068
Filing Date: 2023-01-17
Applicant: Tenstorrent Inc.
IPC Classes: G06F9/30
CPC Classes: G06F9/30145 , G06F9/3004 , G06F9/30134
Abstract: Methods and systems relating to improved processing architectures with pre-staged instructions are disclosed herein. A disclosed processor includes a memory, at least one functional processing unit, a bus, a set of instruction registers configured to be loaded, using the bus, with a set of pre-staged instructions from the memory, and a logic circuit configured to provide the set of pre-staged instructions from the set of instruction registers to the at least one functional processing unit in response to receiving an instruction from the memory.
-
7.
Publication No.: US20240264975A1
Publication Date: 2024-08-08
Application No.: US18619382
Filing Date: 2024-03-28
Inventors: Yuan Li , Jianbin Zhu
IPC Classes: G06F15/80 , G06F9/30 , G06F9/34 , G06F9/38 , G06F9/445 , G06F12/0815 , G06F13/16 , G06F15/78
CPC Classes: G06F15/8023 , G06F9/3001 , G06F9/3004 , G06F9/3009 , G06F9/30098 , G06F9/34 , G06F9/3808 , G06F9/3867 , G06F9/3885 , G06F9/44505 , G06F12/0815 , G06F13/1673 , G06F15/7821 , G06F15/7867 , G06F15/7871 , G06F15/7875 , G06F15/7878 , G06F15/7885 , G06F15/7889 , G06F15/8046 , G06F15/8061 , G06F15/8069 , G06F15/8092 , G06F2212/1021 , Y02D10/00
Abstract: Processors, systems and methods are provided for thread level parallel processing. A processor may comprise a plurality of processing elements (PEs), each of which may comprise a configuration buffer; a sequencer coupled to the configuration buffer of each of the plurality of PEs and configured to distribute one or more PE configurations to the plurality of PEs; and a gasket memory coupled to the plurality of PEs and configured to store at least one PE execution result to be used by at least one of the plurality of PEs during a next PE configuration.
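A toy model of the described organization, assuming invented "add 1" and "times 2" configurations: a sequencer hands one configuration at a time to the processing elements, and a gasket memory carries execution results from one configuration into the next.

```python
# Conceptual sketch only: real PEs run in parallel hardware; a list
# comprehension stands in for the PE array here.

class GasketMemory:
    """Buffer between successive PE configurations."""
    def __init__(self):
        self.results = None

def run_configuration(config, inputs):
    """Each processing element applies the distributed configuration."""
    return [config(x) for x in inputs]

gasket = GasketMemory()
# Sequencer step 1: distribute the first configuration to all PEs.
gasket.results = run_configuration(lambda x: x + 1, [1, 2, 3, 4])
# Sequencer step 2: the next configuration consumes results from the gasket.
final = run_configuration(lambda x: x * 2, gasket.results)
```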
-
8.
Publication No.: US20240211256A1
Publication Date: 2024-06-27
Application No.: US18601006
Filing Date: 2024-03-11
CPC Classes: G06F9/3004 , G06F7/575 , G06F9/3001 , G06F9/3856
Abstract: An apparatus that manages multi-process execution in a processing-in-memory (“PIM”) device includes a gatekeeper configured to: receive an identification of one or more registered PIM processes; receive, from a process, a memory request that includes a PIM command; if the requesting process is a registered PIM process and another registered PIM process is active on the PIM device, perform a context switch of PIM state between the registered PIM processes; and issue the PIM command of the requesting process to the PIM device.
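A hedged sketch of the gatekeeper logic in Python, with PIM state handling greatly simplified: track registered processes, context-switch when a different registered process issues a command, then forward the command to the device.

```python
# Illustrative gatekeeper: commands from unregistered processes are refused,
# and a switch of PIM state is counted whenever the active process changes.

class Gatekeeper:
    def __init__(self):
        self.registered = set()
        self.active = None
        self.issued = []              # commands forwarded to the PIM device
        self.context_switches = 0

    def register(self, pid):
        self.registered.add(pid)

    def handle_request(self, pid, pim_command):
        if pid not in self.registered:
            return False                      # not a registered PIM process
        if self.active is not None and self.active != pid:
            self.context_switches += 1        # swap PIM state between processes
        self.active = pid
        self.issued.append((pid, pim_command))
        return True

gk = Gatekeeper()
gk.register(1)
gk.register(2)
gk.handle_request(1, "pim_add")
gk.handle_request(2, "pim_mul")      # different process: one context switch
```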
-
9.
Publication No.: US20240211150A1
Publication Date: 2024-06-27
Application No.: US18408670
Filing Date: 2024-01-10
Applicant: Gaea LLC
Inventors: Joshua Johnson , Curt Bruner , Jeffrey Reh , Christopher Squires , Brian Wilson
CPC Classes: G06F3/0632 , G06F3/0604 , G06F3/0658 , G06F3/0659 , G06F3/0665 , G06F3/0673 , G06F3/0674 , G06F3/0676 , G06F3/0679 , G06F3/068 , G06F9/06 , G06F9/3004 , G06F3/061 , G06F3/0619
Abstract: An apparatus comprises a storage device and a device controller operatively coupled with the storage device. The device controller comprises a memory that stores an application. The application stored in the memory comprises instructions. When executed, the instructions direct the device controller to receive a storage request comprising content. The device controller retrieves from the memory a storage device policy that indicates a set of storage locations on the storage device. The device controller selects one of the storage locations on the storage device based on the storage device policy. The device controller stores the content on the storage device at the selected storage location. The device controller records, in the memory, storage information for the content that indicates the selected location.
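An illustrative Python sketch of the controller flow: retrieve a policy naming candidate storage locations, select one, store the content there, and record where it went. The round-robin selection rule and the location names are assumptions for the example; the abstract leaves the policy open.

```python
# Hypothetical device controller: dictionaries stand in for the storage
# device and the controller's record-keeping memory.

class DeviceController:
    def __init__(self, policy_locations):
        self.policy = list(policy_locations)   # allowed locations, in order
        self.device = {}                       # location -> stored content
        self.records = {}                      # content id -> chosen location
        self._next = 0

    def store(self, content_id, content):
        """Handle one storage request: select, store, record."""
        location = self.policy[self._next % len(self.policy)]
        self._next += 1
        self.device[location] = content        # write at the selected location
        self.records[content_id] = location    # record for later retrieval
        return location

ctrl = DeviceController(["zone-a", "zone-b"])
loc1 = ctrl.store("obj1", b"hello")
loc2 = ctrl.store("obj2", b"world")
```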
-
10.
Publication No.: US12020031B2
Publication Date: 2024-06-25
Application No.: US17334901
Filing Date: 2021-05-31
Applicant: Intel Corporation
Inventors: Michael Mishaeli , Jason W. Brandt , Gilbert Neiger , Asit K. Mallick , Rajesh M. Sankaran , Raghunandan Makaram , Benjamin C. Chaffin , James B. Crossland , H. Peter Anvin
CPC Classes: G06F9/3009 , G06F9/3004 , G06F9/30076 , G06F9/3851 , G06F9/485 , G06F13/4068
Abstract: A processor of an aspect includes a decode unit to decode a user-level suspend thread instruction that is to indicate a first alternate state. The processor also includes an execution unit coupled with the decode unit. The execution unit is to perform the instruction at a user privilege level. The execution unit, in response to the instruction, is to: (a) suspend execution of a user-level thread from which the instruction is to have been received; (b) transition a logical processor, on which the user-level thread was to have been running, to the indicated first alternate state; and (c) resume the execution of the user-level thread, when the logical processor is in the indicated first alternate state, with a latency that is less than half the latency with which execution of a thread can be resumed when the logical processor is in a halt processor power state.
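A very loose behavioral model of those semantics, with invented latency numbers: a user-level thread suspends into a requested alternate state and resumes from it in less than half the time needed to resume from a deep halt power state. The state names echo the light-weight C0 sub-states; the units are arbitrary.

```python
# Toy latency table only; real resume latencies are hardware-specific.

RESUME_LATENCY = {"C0.1": 1, "C0.2": 5, "halt": 100}  # arbitrary units

def suspend_and_resume(alternate_state):
    """Model the resume latency of a thread suspended into `alternate_state`."""
    return RESUME_LATENCY[alternate_state]

fast = suspend_and_resume("C0.1")    # light-weight alternate state
slow = RESUME_LATENCY["halt"]        # deep halt power state, for contrast
```

The claimed relation is simply that the alternate-state resume is more than twice as fast as a halt-state resume.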
-