-
Publication No.: US09390010B2
Publication Date: 2016-07-12
Application No.: US13715526
Filing Date: 2012-12-14
Applicant: Intel Corporation
Inventor: Ahmad Samih, Ren Wang, Christian Maciocco, Sameh Gobriel, Tsung-Yuan Tai
IPC: G06F12/08
CPC classification number: G06F12/0804, G06F12/0888, Y02D10/13
Abstract: The present disclosure provides techniques for cache management. A data block may be received from an IO interface. After receiving the data block, the occupancy level of a cache memory may be determined. The data block may be directed to main memory if the occupancy level exceeds a threshold, or to the cache memory if the occupancy level is below the threshold.
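The placement policy this abstract describes can be sketched in a few lines. This is an illustrative model, not the patented implementation; the class and attribute names are hypothetical, and the 75% default threshold is an assumption for the example.

```python
# Hypothetical sketch of the occupancy-based placement policy: an incoming
# I/O data block is admitted into the cache only while cache occupancy is
# below a configurable threshold; otherwise it goes to main memory.

class CacheManager:
    def __init__(self, capacity, threshold=0.75):
        self.capacity = capacity      # total cache slots (illustrative unit)
        self.threshold = threshold    # occupancy fraction cutoff (assumed)
        self.cache = []               # blocks currently held in cache
        self.main_memory = []         # blocks directed to main memory

    def occupancy(self):
        return len(self.cache) / self.capacity

    def place(self, block):
        # Direct the block to main memory when the cache is too full,
        # otherwise admit it into the cache.
        if self.occupancy() >= self.threshold:
            self.main_memory.append(block)
            return "main_memory"
        self.cache.append(block)
        return "cache"
```

With a 4-slot cache and a 0.5 threshold, the third block spills to main memory because occupancy has reached the cutoff.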
-
Publication No.: US08898499B2
Publication Date: 2014-11-25
Application No.: US13947479
Filing Date: 2013-07-22
Applicant: Intel Corporation
Inventor: Ren Wang, Christian Maciocco, Sanjay Bakshi, Tsung-Yuan Charles Tai
IPC: G06F1/32
CPC classification number: G06F1/3287, G06F1/3203, G06F1/329, G06F9/4418, Y02D10/24, Y02D50/20
Abstract: The present invention relates to platform power management.
-
Publication No.: US12197601B2
Publication Date: 2025-01-14
Application No.: US17560193
Filing Date: 2021-12-22
Applicant: Intel Corporation
Inventor: Ren Wang, Sameh Gobriel, Somnath Paul, Yipeng Wang, Priya Autee, Abhirupa Layek, Shaman Narayana, Edwin Verplanke, Mrittika Ganguli, Jr-Shian Tsai, Anton Sorokin, Suvadeep Banerjee, Abhijit Davare, Desmond Kirkpatrick, Rajesh M. Sankaran, Jaykant B. Timbadiya, Sriram Kabisthalam Muthukumar, Narayan Ranganathan, Nalini Murari, Brinda Ganesh, Nilesh Jain
Abstract: Examples described herein relate to offload circuitry comprising one or more compute engines that are configurable to perform a workload offloaded from a process executed by a processor, based on a descriptor particular to the workload. In some examples, the offload circuitry is configurable to perform the workload, among multiple different workloads. In some examples, the multiple different workloads include one or more of: data transformation (DT) for data format conversion, Locality Sensitive Hashing (LSH) for neural network (NN) similarity search, sparse general matrix-matrix multiplication (SpGEMM) acceleration of hash-based sparse matrix multiplication, data encode, data decode, or embedding lookup.
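The descriptor-driven dispatch the abstract describes can be illustrated with a small table of engines keyed by workload type. This is a software analogy only; the engine names, descriptor fields, and workload payloads below are all hypothetical, standing in for hardware compute engines.

```python
# Illustrative sketch: a per-workload descriptor names the workload, and
# the offload logic routes it to the matching compute engine.

def data_transform(payload):
    # Stand-in for data format conversion (DT): strings -> floats.
    return [float(x) for x in payload]

def embedding_lookup(payload, table):
    # Stand-in for embedding lookup: gather rows by index.
    return [table[i] for i in payload]

ENGINES = {
    "DT": lambda desc: data_transform(desc["payload"]),
    "EMBED": lambda desc: embedding_lookup(desc["payload"], desc["table"]),
}

def offload(descriptor):
    # The descriptor selects the engine; the engine runs the workload.
    engine = ENGINES[descriptor["workload"]]
    return engine(descriptor)
```

Each supported workload gets its own descriptor format, which is why the abstract stresses that the descriptor is "particular to the workload."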
-
Publication No.: US11570123B2
Publication Date: 2023-01-31
Application No.: US17067564
Filing Date: 2020-10-09
Applicant: Intel Corporation
Inventor: Ren Wang, Tsung-Yuan C. Tai, Jr-Shian Tsai
IPC: H04L67/145, H04L47/74, H04L47/722, H04N1/333, H04L1/18, H04L47/30, H04L12/12, H04L67/125, H04L1/00, H04W28/08, H04L67/59, H04L27/26, H04W28/02
Abstract: In an embodiment, an apparatus is provided that may include circuitry to generate, at least in part, and/or receive, at least in part, at least one request that at least one network node generate, at least in part, information. The information may be to permit selection, at least in part, of (1) at least one power consumption state of the at least one network node, and (2) at least one time period. The at least one time period may be to elapse, after receipt by at least one other network node of at least one packet, prior to requesting at least one change in the at least one power consumption state. The at least one packet may be to be transmitted to the at least one network node. Of course, many alternatives, modifications, and variations are possible without departing from this embodiment.
-
Publication No.: US11513957B2
Publication Date: 2022-11-29
Application No.: US17027248
Filing Date: 2020-09-21
Applicant: Intel Corporation
Inventor: Ren Wang, Andrew J. Herdrich, Yen-cheng Liu, Herbert H. Hum, Jong Soo Park, Christopher J. Hughes, Namakkal N. Venkatesan, Adrian C. Moga, Aamer Jaleel, Zeshan A. Chishti, Mesut A. Ergin, Jr-shian Tsai, Alexander W. Min, Tsung-yuan C. Tai, Christian Maciocco, Rajesh Sankaran
IPC: G06F12/0842, G06F12/0893, G06F12/109, G06F12/0813, G06F12/0831, G06F9/455
Abstract: Methods and apparatus implementing hardware/software co-optimization to improve performance and energy for inter-VM communication for NFVs and other producer-consumer workloads. The apparatus includes multi-core processors with multi-level cache hierarchies, including L1 and L2 caches for each core and a shared last-level cache (LLC). One or more machine-level instructions are provided for proactively demoting cachelines from lower cache levels to higher cache levels, including demoting cachelines from L1/L2 caches to an LLC. Techniques are also provided for implementing hardware/software co-optimization in multi-socket NUMA architecture systems, wherein cachelines may be selectively demoted and pushed to an LLC in a remote socket. In addition, techniques are disclosed for implementing early snooping in multi-socket systems to reduce latency when accessing cachelines on remote sockets.
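Why demotion helps can be shown with a toy model of the hierarchy. This is not real hardware behavior and the relative costs are invented for illustration; it only captures the idea that a producer pushing a written line out of its private L2 into the shared LLC spares the consumer core a costly cross-core snoop.

```python
# Toy model (costs are illustrative, not measured) of proactively demoting
# a cache line from a core-private L2 into the shared LLC, so a consumer
# on another core can read it from the LLC instead of snooping the
# producer's private cache.

class CacheHierarchy:
    def __init__(self):
        self.l2 = {0: set(), 1: set()}   # per-core private L2 contents
        self.llc = set()                 # shared last-level cache

    def write(self, core, line):
        # Producer writes; the line lands in its private L2.
        self.l2[core].add(line)

    def demote(self, core, line):
        # Proactive demotion: push the line from private L2 to the LLC.
        if line in self.l2[core]:
            self.l2[core].discard(line)
            self.llc.add(line)

    def read_cost(self, core, line):
        # Rough relative latencies: L2 hit < LLC hit < cross-core snoop.
        if line in self.l2[core]:
            return 1
        if line in self.llc:
            return 3
        return 10   # line sits in another core's private cache
```

On real hardware, the demotion step corresponds to a machine-level instruction such as the one the abstract describes; here a method call stands in for it.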
-
Publication No.: US11362968B2
Publication Date: 2022-06-14
Application No.: US15640258
Filing Date: 2017-06-30
Applicant: Intel Corporation
Inventor: Ren Wang, Mia Primorac, Tsung-Yuan C. Tai, Saikrishna Edupuganti, John J. Browne
IPC: H04L12/861, H04L49/90, H04L47/36, H04L49/9005
Abstract: Technologies for dynamically managing a batch size of packets include a network device. The network device is to receive, into a queue, packets from a remote node to be processed by the network device, determine a throughput provided by the network device while the packets are processed, determine whether the determined throughput satisfies a predefined condition, and adjust a batch size of packets in response to a determination that the determined throughput satisfies a predefined condition. The batch size is indicative of a threshold number of queued packets required to be present in the queue before the queued packets in the queue can be processed by the network device.
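The feedback loop the abstract describes can be sketched as a single adjustment function. The condition, step size, and bounds below are assumptions for illustration; the patent leaves the predefined condition abstract.

```python
# Hedged sketch of dynamic batch sizing: grow the batch size while
# measured throughput keeps improving, shrink it when throughput
# regresses, and leave it alone otherwise. Step and bounds are invented.

def adjust_batch_size(batch_size, throughput, prev_throughput,
                      step=8, lo=1, hi=256):
    if throughput > prev_throughput:
        # Condition satisfied: larger batches are paying off.
        batch_size = min(batch_size + step, hi)
    elif throughput < prev_throughput:
        # Throughput regressed: back off toward smaller batches.
        batch_size = max(batch_size - step, lo)
    return batch_size
```

The returned value is the new threshold number of queued packets that must accumulate before the queue is processed.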
-
Publication No.: US11082515B2
Publication Date: 2021-08-03
Application No.: US14866891
Filing Date: 2015-09-26
Applicant: Intel Corporation
Inventor: Dinesh Kumar, Nrupal R. Jani, Ren Wang, Christian Maciocco, Sanjeev Jain
IPC: H04L29/08, H04L12/825, H04L12/931, H04L12/725
Abstract: Technologies for offloading data object replication and service function chain management include a switch communicatively coupled to one or more computing nodes capable of executing virtual machines and storing data objects. The switch is configured to determine metadata of a service function chain and transmit a network packet to a service function of the chain, executed by one or more of the computing nodes, for processing. The switch is further configured to receive feedback from the service function, update the metadata based on the feedback, and transmit the network packet to the next service function of the chain. Additionally or alternatively, the switch is configured to identify a plurality of computing nodes (i.e., storage nodes) at which to store a received data object, replicate the data object based on the number of storage nodes, and transmit the received data object and each replicated data object to different corresponding storage nodes. Other embodiments are described and claimed.
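The chain-management half of this abstract reduces to a simple loop. This sketch is an assumption-laden software analogy: each service function is modeled as a callable returning the processed packet plus feedback, and the function names are hypothetical.

```python
# Illustrative sketch of service function chain traversal: the switch
# forwards the packet through each service function in order, folding
# each function's feedback into the chain metadata.

def traverse_chain(packet, chain, metadata):
    for service_function in chain:
        packet, feedback = service_function(packet)
        metadata.update(feedback)   # refine metadata from SF feedback
    return packet, metadata
```

In the patent the switch does this forwarding and metadata bookkeeping in hardware; the loop above only shows the ordering and the feedback flow.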
-
Publication No.: US10983585B2
Publication Date: 2021-04-20
Application No.: US15195485
Filing Date: 2016-06-28
Applicant: Intel Corporation
Inventor: Ren Wang, Christian Maciocco, Sanjay Bakshi, Tsung-Yuan Charles Tai
IPC: G06F1/3287, G06F1/329, G06F1/3203, G06F9/4401, G06F1/3209, G06F1/3215
Abstract: The present invention relates to platform power management.
-
Publication No.: US10789176B2
Publication Date: 2020-09-29
Application No.: US16059147
Filing Date: 2018-08-09
Applicant: Intel Corporation
Inventor: Ren Wang, Yipeng Wang, Tsung-Yuan Tai, Cristian Florin Dumitrescu, Xiangyang Guo
IPC: G06F12/123, G06F12/126, G06F12/128, G06F12/0864, G06F12/0891, G06F9/30, G06F12/0871
Abstract: Technologies for least recently used (LRU) cache replacement include a computing device with a processor with vector instruction support. The computing device retrieves a bucket of an associative cache from memory that includes multiple entries arranged from front to back. The bucket may be a 256-bit array including eight 32-bit entries. For lookups, a matching entry is located at a position in the bucket. The computing device executes a vector permutation processor instruction that moves the matching entry to the front of the bucket while preserving the order of other entries of the bucket. For insertion, an inserted entry is written at the back of the bucket. The computing device executes a vector permutation processor instruction that moves the inserted entry to the front of the bucket while preserving the order of other entries. The permuted bucket is stored to the memory. Other embodiments are described and claimed.
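The bucket update this abstract describes — move the touched entry to the front while keeping every other entry in order — is exactly what a single vector permute achieves. The sketch below models it with plain list slicing; the real scheme does the same permutation in one instruction on a 256-bit bucket of eight 32-bit entries.

```python
# Illustrative model of an LRU bucket ordered most- to least-recently-used.
# Plain list operations stand in for the single vector permutation the
# patent uses; the order-preserving move-to-front is the point.

def lookup(bucket, key):
    # Hit: move the matching entry to the front, preserving the
    # relative order of all other entries.
    pos = bucket.index(key)
    return [bucket[pos]] + bucket[:pos] + bucket[pos + 1:]

def insert(bucket, key):
    # Insertion: the new entry overwrites the LRU entry at the back,
    # then the same permutation moves it to the front.
    bucket = bucket[:-1] + [key]
    return lookup(bucket, key)
```

Because the front is always the most recently used and the back the least, eviction is simply "overwrite the back," with no per-entry timestamps to maintain.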
-
Publication No.: US10719442B2
Publication Date: 2020-07-21
Application No.: US16126907
Filing Date: 2018-09-10
Applicant: Intel Corporation
Inventor: Ren Wang, Raanan Sade, Yipeng Wang, Tsung-Yuan Tai, Sameh Gobriel
IPC: G06F12/00, G06F12/0811, G06F9/38, G06F16/18
Abstract: An apparatus and method for prioritizing transactional memory regions. For example, one embodiment of a processor comprises: a plurality of cores to execute threads comprising sequences of instructions, at least some of the instructions specifying a transactional memory region; a cache of each core to store a plurality of cache lines; transactional memory circuitry of each core to manage execution of the transactional memory (TM) regions based on priorities associated with each of the TM regions; and wherein the transactional memory circuitry, upon detecting a conflict between a first TM region having a first priority value and a second TM region having a second priority value, is to determine which of the first TM region or the second TM region is permitted to continue executing and which is to be aborted based, at least in part, on the first and second priority values.
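The arbitration rule in this abstract can be stated in a few lines. This sketch is an assumption: the patent says the decision is based "at least in part" on the priority values, and the tie-breaking below (the first region wins on equal priority) is invented for the example.

```python
# Hedged sketch of priority-based transactional memory conflict
# resolution: when two TM regions conflict, the one with the higher
# priority value continues and the other is aborted. Equal-priority
# tie-breaking here is an assumption, not from the patent.

def resolve_conflict(region_a, region_b):
    # Each region is a (name, priority) pair.
    name_a, prio_a = region_a
    name_b, prio_b = region_b
    if prio_a >= prio_b:
        return {"continue": name_a, "abort": name_b}
    return {"continue": name_b, "abort": name_a}
```

In hardware this decision is made by the per-core transactional memory circuitry at conflict-detection time; the aborted region would then roll back and retry.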
-