-
Publication No.: US12182018B2
Publication date: 2024-12-31
Application No.: US17133615
Filing date: 2020-12-23
Applicant: Intel Corporation
Inventor: Jayesh Gaur , Adarsh Chauhan , Vinodh Gopal , Vedvyas Shanbhogue , Sreenivas Subramoney , Wajdi Feghali
IPC: G06F12/0811 , G06F9/38 , G06F12/0862 , G06F12/0895
Abstract: Methods and apparatus relating to an instruction and/or micro-architecture support for decompression on core are described. In an embodiment, decode circuitry decodes a decompression instruction into a first micro operation and a second micro operation. The first micro operation causes one or more load operations to fetch data into one or more cachelines of a cache of a processor core. Decompression Engine (DE) circuitry decompresses the fetched data from the one or more cachelines of the cache of the processor core in response to the second micro operation. Other embodiments are also disclosed and claimed.
-
Publication No.: US12130738B2
Publication date: 2024-10-29
Application No.: US17130632
Filing date: 2020-12-22
Applicant: Intel Corporation
Inventor: Vedvyas Shanbhogue , Jayesh Gaur , Wajdi K. Feghali , Vinodh Gopal , Utkarsh Kakaiya
IPC: G06F12/0802 , H03M7/30
CPC classification number: G06F12/0802 , H03M7/60 , G06F2212/401 , G06F2212/60
Abstract: An embodiment of an integrated circuit may comprise, coupled to a core, a hardware decompression accelerator, a compressed cache, a processor communicatively coupled to the hardware decompression accelerator and the compressed cache, and memory communicatively coupled to the processor, wherein the memory stores microcode instructions which, when executed by the processor, cause the processor to store a first address to a decompression work descriptor, retrieve, from the decompression work descriptor at the first address and in response to an indication of a page fault, a second address where a compressed page is stored in the compressed cache, and send instructions to the hardware decompression accelerator to decompress the compressed page at the second address. Other embodiments are disclosed and claimed.
-
Publication No.: US12028094B2
Publication date: 2024-07-02
Application No.: US17133622
Filing date: 2020-12-23
Applicant: Intel Corporation
Inventor: Jayesh Gaur , Adarsh Chauhan , Vinodh Gopal , Vedvyas Shanbhogue , Sreenivas Subramoney , Wajdi Feghali
CPC classification number: H03M7/6029 , G06F9/3877 , G06F9/541
Abstract: Methods and apparatus relating to an Application Programming Interface (API) for fine grained low latency decompression within a processor core are described. In an embodiment, a decompression Application Programming Interface (API) receives an input handle to a data object. The data object includes compressed data and metadata. Decompression Engine (DE) circuitry decompresses the compressed data to generate uncompressed data. The DE circuitry decompresses the compressed data in response to invocation of a decompression instruction by the decompression API. The metadata comprises a first operand to indicate a location of the compressed data, a second operand to indicate a size of the compressed data, a third operand to indicate a location to which decompressed data by the DE circuitry is to be stored, and a fourth operand to indicate a size of the decompressed data. Other embodiments are also disclosed and claimed.
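Illustration (not part of the patent record): the four metadata operands in this abstract can be modeled in software. The sketch below is a minimal one that uses Python's zlib as a stand-in for the DE circuitry. The names `DataObject`, `make_object`, and `decompress_object` are hypothetical, as is the idea of placing both locations in one shared buffer.

```python
import zlib
from dataclasses import dataclass


@dataclass
class DataObject:
    """Hypothetical handle: a shared buffer plus the four metadata operands."""
    memory: bytearray   # assumed backing memory that both locations index into
    src_offset: int     # operand 1: location of the compressed data
    src_size: int       # operand 2: size of the compressed data
    dst_offset: int     # operand 3: location where decompressed data is stored
    dst_size: int       # operand 4: size of the decompressed data


def make_object(payload: bytes) -> DataObject:
    """Lay out the compressed payload followed by an empty output area."""
    compressed = zlib.compress(payload)
    memory = bytearray(len(compressed) + len(payload))
    memory[:len(compressed)] = compressed
    return DataObject(memory, 0, len(compressed), len(compressed), len(payload))


def decompress_object(obj: DataObject) -> bytes:
    """Software stand-in for the decompression instruction (zlib, not hardware)."""
    end = obj.src_offset + obj.src_size
    plain = zlib.decompress(bytes(obj.memory[obj.src_offset:end]))
    if len(plain) != obj.dst_size:
        raise ValueError("decompressed size does not match the fourth operand")
    # Store the result at the location named by the third operand.
    obj.memory[obj.dst_offset:obj.dst_offset + obj.dst_size] = plain
    return plain
```

A caller would build an object with `make_object(data)` and pass the handle to `decompress_object`, which reads only what the metadata describes.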
-
Publication No.: US20230195456A1
Publication date: 2023-06-22
Application No.: US17558978
Filing date: 2021-12-22
Applicant: Intel Corporation
Inventor: Sufiyan Syed , Roger Gramunt , Jayesh Gaur , Priyank Deshpande
CPC classification number: G06F9/28 , G06F9/223 , G06F9/4806 , G06F9/5027 , G06F2209/5014
Abstract: In one embodiment, an apparatus includes: a plurality of execution circuits to execute micro-operations (μops), where a subset of the plurality of execution circuits is capable of executing a fused μop; a fusion circuit coupled to at least the subset of the plurality of execution circuits, wherein the fusion circuit is to fuse at least some pairs of producer-consumer μops into fused μops; and a fusion throttle circuit coupled to the fusion circuit, wherein the fusion throttle circuit is to prevent a first μop from being fused with another μop based at least in part on historical information associated with the first μop. Other embodiments are described and claimed.
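Illustration (not part of the patent record): one way to picture "historical information associated with the first μop" is a small saturating counter per μop address. The abstract does not say what the history contains, so the counter, the threshold values, the `fusion_was_harmful` feedback signal, and the class name `FusionThrottle` below are all assumptions.

```python
class FusionThrottle:
    """Hypothetical history table: one saturating counter per uop address."""

    def __init__(self, threshold: int = 2, max_count: int = 3):
        self.threshold = threshold   # assumed: block fusion at or above this count
        self.max_count = max_count   # assumed: counter saturates here
        self.history = {}            # uop address -> counter value

    def record(self, uop_pc: int, fusion_was_harmful: bool) -> None:
        """Update the history for a uop after an (assumed) feedback event."""
        count = self.history.get(uop_pc, 0)
        if fusion_was_harmful:
            count = min(count + 1, self.max_count)
        else:
            count = max(count - 1, 0)
        self.history[uop_pc] = count

    def may_fuse(self, uop_pc: int) -> bool:
        """Allow fusion unless the counter has reached the threshold."""
        return self.history.get(uop_pc, 0) < self.threshold
```

In this sketch a fusion circuit would consult `may_fuse` before pairing a producer μop with its consumer, and call `record` once the outcome is known.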
-
Publication No.: US11645078B2
Publication date: 2023-05-09
Application No.: US16729349
Filing date: 2019-12-28
Applicant: Intel Corporation
Inventor: Adarsh Chauhan , Franck Sala , Jayesh Gaur , Zeev Sperber , Lihu Rappoport , Adi Yoaz , Sreenivas Subramoney
CPC classification number: G06F9/3806 , G06F9/30058 , G06F9/30145
Abstract: Systems, methods, and apparatuses relating to hardware for auto-predication of critical branches are described. In one embodiment, a processor core includes a decoder to decode instructions into decoded instructions, an execution unit to execute the decoded instructions, a branch predictor circuit to predict a future outcome of a branch instruction, and a branch predication manager circuit to disable use of the predicted future outcome for a conditional critical branch comprising the branch instruction.
-
Publication No.: US20220197659A1
Publication date: 2022-06-23
Application No.: US17133622
Filing date: 2020-12-23
Applicant: Intel Corporation
Inventor: Jayesh Gaur , Adarsh Chauhan , Vinodh Gopal , Vedvyas Shanbhogue , Sreenivas Subramoney , Wajdi Feghali
Abstract: Methods and apparatus relating to an Application Programming Interface (API) for fine grained low latency decompression within a processor core are described. In an embodiment, a decompression Application Programming Interface (API) receives an input handle to a data object. The data object includes compressed data and metadata. Decompression Engine (DE) circuitry decompresses the compressed data to generate uncompressed data. The DE circuitry decompresses the compressed data in response to invocation of a decompression instruction by the decompression API. The metadata comprises a first operand to indicate a location of the compressed data, a second operand to indicate a size of the compressed data, a third operand to indicate a location to which decompressed data by the DE circuitry is to be stored, and a fourth operand to indicate a size of the decompressed data. Other embodiments are also disclosed and claimed.
-
Publication No.: US11256599B2
Publication date: 2022-02-22
Application No.: US17128291
Filing date: 2020-12-21
Applicant: Intel Corporation
Inventor: Adarsh Chauhan , Jayesh Gaur , Franck Sala , Lihu Rappoport , Zeev Sperber , Adi Yoaz , Sreenivas Subramoney
Abstract: A processor comprises a microarchitectural feature and dynamic tuning unit (DTU) circuitry. The processor executes a program for first and second execution windows with the microarchitectural feature disabled and enabled, respectively. The DTU circuitry automatically determines whether the processor achieved worse performance in the second execution window. In response to determining that the processor achieved worse performance in the second execution window, the DTU circuitry updates a usefulness state for a selected address of the program to denote worse performance. In response to multiple consecutive determinations that the processor achieved worse performance with the microarchitectural feature enabled, the DTU circuitry automatically updates the usefulness state to denote a confirmed bad state. In response to the usefulness state denoting the confirmed bad state, the DTU circuitry automatically disables the microarchitectural feature for the selected address for execution windows after the second execution window. Other embodiments are described and claimed.
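Illustration (not part of the patent record): the usefulness-state updates in this abstract read like a small per-address state machine. The sketch below is one possible software model. The state names, the `confirm_after` count, the use of a higher-is-better performance number, and the reset to a neutral state after a good window are assumptions; the abstract only says "multiple consecutive determinations".

```python
from enum import Enum


class Usefulness(Enum):
    NEUTRAL = 0        # assumed initial state
    WORSE = 1          # feature looked worse in the latest comparison
    CONFIRMED_BAD = 2  # feature stays disabled for this address


class DynamicTuningUnit:
    """Hypothetical model of the DTU bookkeeping for selected addresses."""

    def __init__(self, confirm_after: int = 3):
        self.confirm_after = confirm_after  # assumed number of consecutive bad results
        self.state = {}    # address -> Usefulness
        self.streak = {}   # address -> consecutive "worse" determinations

    def observe(self, address: int, perf_disabled: float, perf_enabled: float) -> Usefulness:
        """Compare a window with the feature disabled to one with it enabled."""
        if self.state.get(address) is Usefulness.CONFIRMED_BAD:
            return Usefulness.CONFIRMED_BAD  # sticky in this sketch
        if perf_enabled < perf_disabled:     # higher number means better performance
            streak = self.streak.get(address, 0) + 1
            self.streak[address] = streak
            confirmed = streak >= self.confirm_after
            new_state = Usefulness.CONFIRMED_BAD if confirmed else Usefulness.WORSE
        else:
            self.streak[address] = 0         # assumed: a good window resets the streak
            new_state = Usefulness.NEUTRAL
        self.state[address] = new_state
        return new_state

    def feature_enabled(self, address: int) -> bool:
        """The feature is disabled only once the state is confirmed bad."""
        return self.state.get(address) is not Usefulness.CONFIRMED_BAD
```

Keeping the streak separate from the state means a single bad window marks the address as worse without disabling the feature, which matches the two-step wording of the abstract.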
-
Publication No.: US11188467B2
Publication date: 2021-11-30
Application No.: US15717939
Filing date: 2017-09-28
Applicant: Intel Corporation
Inventor: Israel Diamand , Alaa R. Alameldeen , Sreenivas Subramoney , Supratik Majumder , Srinivas Santosh Kumar Madugula , Jayesh Gaur , Zvika Greenfield , Anant V. Nori
IPC: G06F12/00 , G06F12/0846 , G06F12/0811 , G06F12/128 , G06F12/121 , G06F12/0886 , G06F12/08
Abstract: A method is described. The method includes receiving a read or write request for a cache line. The method includes directing the request to a set of logical super lines based on the cache line's system memory address. The method includes associating the request with a cache line of the set of logical super lines. The method includes, if the request is a write request: compressing the cache line to form a compressed cache line, breaking the cache line down into smaller data units and storing the smaller data units into a memory side cache. The method includes, if the request is a read request: reading smaller data units of the compressed cache line from the memory side cache and decompressing the cache line.
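Illustration (not part of the patent record): the write and read paths in this abstract can be sketched as compress, split into smaller units, store, then gather and decompress. The sketch below uses zlib as a stand-in compressor and an assumed 16-byte unit size. It leaves out the mapping of requests to logical super lines, and the class name `MemorySideCache` is hypothetical.

```python
import zlib

UNIT_SIZE = 16  # assumed size of one smaller data unit, in bytes


class MemorySideCache:
    """Hypothetical store of compressed cache lines split into small units."""

    def __init__(self):
        self.units = {}       # (line_address, unit_index) -> bytes
        self.unit_count = {}  # line_address -> number of stored units

    def write(self, line_address: int, line: bytes) -> int:
        """Compress the line, break it into units, and store them."""
        # Drop units left over from an earlier write of the same line.
        for idx in range(self.unit_count.get(line_address, 0)):
            del self.units[(line_address, idx)]
        compressed = zlib.compress(line)
        chunks = [compressed[i:i + UNIT_SIZE]
                  for i in range(0, len(compressed), UNIT_SIZE)]
        for idx, chunk in enumerate(chunks):
            self.units[(line_address, idx)] = chunk
        self.unit_count[line_address] = len(chunks)
        return len(chunks)

    def read(self, line_address: int) -> bytes:
        """Gather the units of the compressed line and decompress them."""
        count = self.unit_count[line_address]
        compressed = b"".join(self.units[(line_address, idx)]
                              for idx in range(count))
        return zlib.decompress(compressed)
```

A highly compressible line occupies fewer units than an incompressible one, which is the space saving the scheme relies on.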
-
Publication No.: US10776270B2
Publication date: 2020-09-15
Application No.: US16222788
Filing date: 2018-12-17
Applicant: Intel Corporation
Inventor: Jayesh Gaur , Ayan Mandal , Anant V. Nori , Sreenivas Subramoney
IPC: G06F12/0811 , G06F12/0804 , G06F12/084 , G06F12/0888 , G06F11/34
Abstract: A memory-efficient last level cache (LLC) architecture is described. A processor implementing a LLC architecture may include a processor core, a last level cache (LLC) operatively coupled to the processor core, and a cache controller operatively coupled to the LLC. The cache controller is to monitor a bandwidth demand of a channel between the processor core and a dynamic random-access memory (DRAM) device associated with the LLC. The cache controller is further to perform a first defined number of consecutive reads from the DRAM device when the bandwidth demand exceeds a first threshold value and perform a first defined number of consecutive writes of modified lines from the LLC to the DRAM device when the bandwidth demand exceeds the first threshold value.
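Illustration (not part of the patent record): the threshold rule in this abstract can be pictured as a scheduler that groups DRAM reads and writebacks into runs when bandwidth demand is high. In the sketch below, the function name `schedule`, the `batch` parameter, and the behavior at or below the threshold (issue reads only and leave writebacks pending) are assumptions, since the abstract describes only the above-threshold case.

```python
from collections import deque


def schedule(reads, dirty_lines, demand, threshold, batch):
    """Return an ordered list of ("R", addr) and ("W", addr) DRAM operations."""
    pending_reads = deque(reads)
    pending_writes = deque(dirty_lines)
    ops = []
    if demand <= threshold:
        # Assumed low-demand policy: service reads, defer writebacks.
        ops.extend(("R", addr) for addr in pending_reads)
        return ops
    while pending_reads or pending_writes:
        # A run of up to `batch` consecutive reads from DRAM ...
        for _ in range(min(batch, len(pending_reads))):
            ops.append(("R", pending_reads.popleft()))
        # ... followed by a run of up to `batch` consecutive writes
        # of modified lines from the LLC to DRAM.
        for _ in range(min(batch, len(pending_writes))):
            ops.append(("W", pending_writes.popleft()))
    return ops
```

Grouping operations by direction reduces how often the channel switches between reading and writing, which is a common motivation for batching of this kind.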
-
Publication No.: US20200169383A1
Publication date: 2020-05-28
Application No.: US16776467
Filing date: 2020-01-29
Applicant: Intel Corporation
Inventor: David M. Durham , Michael LeMay , Michael E. Kounavis , Santosh Ghosh , Sergej Deutsch , Anant Vithal Nori , Jayesh Gaur , Sreenivas Subramoney , Karanvir S. Grewal
IPC: H04L9/06 , G06F12/1027 , G06F9/30
Abstract: A processor comprises a first register to store an encoded pointer to a memory location. First context information is stored in first bits of the encoded pointer and a slice of a linear address of the memory location is stored in second bits of the encoded pointer. The processor also includes circuitry to execute a memory access instruction to obtain a physical address of the memory location, access encrypted data at the memory location, derive a first tweak based at least in part on the encoded pointer, and generate a keystream based on the first tweak and a key. The circuitry is to further execute the memory access instruction to store state information associated with memory access instruction in a first buffer, and to decrypt the encrypted data based on the keystream. The keystream is to be generated at least partly in parallel with accessing the encrypted data.
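Illustration (not part of the patent record): the data path in this abstract (derive a tweak from the encoded pointer, generate a keystream from the tweak and a key, combine it with the encrypted data) can be sketched as a simple XOR scheme. The sketch below uses SHAKE-256 from Python's hashlib as a stand-in keystream generator and uses the raw 64-bit pointer value as the tweak. Both choices are assumptions for illustration and not the cipher or tweak format of the patent, and the function names are hypothetical.

```python
import hashlib


def derive_tweak(encoded_pointer: int) -> bytes:
    """Assumed tweak: the 64-bit encoded pointer serialized as bytes."""
    return encoded_pointer.to_bytes(8, "little")


def generate_keystream(tweak: bytes, key: bytes, length: int) -> bytes:
    """Stand-in keystream generator built on SHAKE-256 (illustration only)."""
    return hashlib.shake_256(key + tweak).digest(length)


def xor_with_keystream(data: bytes, encoded_pointer: int, key: bytes) -> bytes:
    """Encrypt or decrypt: XOR with the keystream is its own inverse."""
    keystream = generate_keystream(derive_tweak(encoded_pointer), key, len(data))
    return bytes(a ^ b for a, b in zip(data, keystream))
```

Because the keystream depends only on the pointer and the key, not on the data, it can be computed while the memory access is still in flight, which is the overlap the last sentence of the abstract describes.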