-
Publication number: US09753889B2
Publication date: 2017-09-05
Application number: US14881111
Application date: 2015-10-12
Applicant: Intel Corporation
Inventor: Zeev Sperber , Robert Valentine , Guy Patkin , Stanislav Shwartsman , Shlomo Raikin , Igor Yanover , Gal Ofir
CPC classification number: G06F15/8007 , G06F9/30018 , G06F9/30036 , G06F9/30043 , G06F9/30145 , G06F9/345 , G06F9/3887
Abstract: Methods and apparatus are disclosed for using an index array and finite state machine for scatter/gather operations. Embodiments of the apparatus may comprise: decode logic to decode a scatter/gather instruction and generate a set of micro-operations, and an index array to hold a set of indices and a corresponding set of mask elements. A finite state machine facilitates the gather operation. Address generation logic generates an address from an index of the set of indices for at least each of the corresponding mask elements having a first value. An address is accessed to load a corresponding data element if the mask element had the first value. The data element is written at an in-register position in a destination vector register according to a respective in-register position of the index. Values of corresponding mask elements are changed from the first value to a second value responsive to completion of their respective loads.
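The masked gather described in this abstract can be modeled in software. The following is a minimal sketch, not the patented hardware: the names `gather` and `fault_at`, the use of 1 and 0 as the first and second mask values, and the simulated interruption are all assumptions made for illustration.

```python
def gather(memory, base, indices, mask, dest, fault_at=None):
    """Software model of a masked gather.

    mask[i] == 1 (assumed "first value") means element i still needs loading;
    it is set to 0 (assumed "second value") once that load completes.
    fault_at is a hypothetical address that interrupts the operation.
    """
    for pos, (index, pending) in enumerate(zip(indices, mask)):
        if pending != 1:
            continue                    # masked off or already loaded
        address = base + index          # address generation from the index
        if address == fault_at:
            return False                # interrupted; mask records remaining work
        dest[pos] = memory[address]     # write at the index's in-register position
        mask[pos] = 0                   # mark this element's load as complete
    return True
```

Because completed elements clear their mask bits, calling `gather` again with the same `mask` and `dest` after an interruption only loads the elements that are still pending.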
-
Publication number: US20170192934A1
Publication date: 2017-07-06
Application number: US14616323
Application date: 2015-02-06
Applicant: Intel Corporation
Inventor: Zeev Sperber , Robert Valentine , Guy Patkin , Stanislav Shwartsman , Shlomo Raikin , Igor Yanover , Gal Ofir
CPC classification number: G06F15/8007 , G06F9/30018 , G06F9/30036 , G06F9/30043 , G06F9/30145 , G06F9/345 , G06F9/3887
Abstract: Methods and apparatus are disclosed for using an index array and finite state machine for scatter/gather operations. Embodiments of the apparatus may comprise: decode logic to decode a scatter/gather instruction and generate a set of micro-operations, and an index array to hold a set of indices and a corresponding set of mask elements. A finite state machine facilitates the gather operation. Address generation logic generates an address from an index of the set of indices for at least each of the corresponding mask elements having a first value. An address is accessed to load a corresponding data element if the mask element had the first value. The data element is written at an in-register position in a destination vector register according to a respective in-register position of the index. Values of corresponding mask elements are changed from the first value to a second value responsive to completion of their respective loads.
-
Publication number: US20150074373A1
Publication date: 2015-03-12
Application number: US13977727
Application date: 2012-06-02
Applicant: INTEL CORPORATION
Inventor: Zeev Sperber , Robert Valentine , Shlomo Raikin , Stanislav Shwartsman , Gal Ofir , Igor Yanover , Guy Patkin , Levy Ofer
CPC classification number: G06F15/7839 , G06F9/30018 , G06F9/30036 , G06F9/30043 , G06F9/30145 , G06F9/345 , G06F9/3808 , G06F9/383
Abstract: Methods and apparatus are disclosed for using an index array and finite state machine for scatter/gather operations. Embodiments of the apparatus may comprise: decode logic to decode scatter/gather instructions and generate micro-operations. An index array holds a set of indices and a corresponding set of mask elements. A finite state machine facilitates the scatter operation. Address generation logic generates an address from an index of the set of indices for at least each of the corresponding mask elements having a first value. Storage is allocated in a buffer for each of the set of addresses being generated. Data elements corresponding to the set of addresses being generated are copied to the buffer. Addresses from the set are accessed to store data elements if a corresponding mask element has said first value, and each such mask element is changed to a second value responsive to completion of its respective store.
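The two-phase scatter in this abstract (stage data in a buffer, then perform the stores) can be modeled as follows. This is a minimal sketch under stated assumptions: the function name `scatter`, the list-based buffer, and the use of 1 and 0 as the first and second mask values are illustrative choices, not details from the patent.

```python
def scatter(memory, base, indices, mask, source):
    """Software model of a masked scatter with a staging buffer.

    mask[i] == 1 (assumed "first value") means element i must be stored;
    it is set to 0 (assumed "second value") once that store completes.
    Returns the number of stores performed.
    """
    # Phase 1: generate addresses and copy the data elements into the buffer.
    buffer = []
    for pos, (index, pending) in enumerate(zip(indices, mask)):
        if pending == 1:
            buffer.append((pos, base + index, source[pos]))

    # Phase 2: drain the buffer to memory, clearing each mask element
    # as its store completes.
    for pos, address, value in buffer:
        memory[address] = value
        mask[pos] = 0
    return len(buffer)
```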
-
Publication number: US11915000B2
Publication date: 2024-02-27
Application number: US18160600
Application date: 2023-01-27
Applicant: Intel Corporation
Inventor: Ahmad Yasin , Raanan Sade , Liron Zur , Igor Yanover , Joseph Nuzman
CPC classification number: G06F9/30145 , G06F9/30098 , G06F9/544 , G06F9/546 , G06F11/3037 , G06F11/348
Abstract: Systems, methods, and apparatuses relating to circuitry to precisely monitor memory store accesses are described. In one embodiment, a system includes a memory, a hardware processor core comprising a decoder to decode an instruction into a decoded instruction, an execution circuit to execute the decoded instruction to produce a resultant, a store buffer, and a retirement circuit to retire the instruction when a store request for the resultant from the execution circuit is queued into the store buffer for storage into the memory, and a performance monitoring circuit to mark the retired instruction for monitoring of post-retirement performance information between being queued in the store buffer and being stored in the memory, enable a store fence after the retired instruction to be inserted that causes previous store requests to complete within the memory, and on detection of completion of the store request for the instruction in the memory, store the post-retirement performance information in storage of the performance monitoring circuit.
-
Publication number: US11693785B2
Publication date: 2023-07-04
Application number: US16728527
Application date: 2019-12-27
Applicant: Intel Corporation
Inventor: Ron Gabor , Enrico Perla , Raanan Sade , Igor Yanover , Tomer Stark , Joseph Nuzman
IPC: G06F12/00 , G06F12/0895 , G06F12/1081 , G06F12/1009 , G06F12/0811 , G06F12/14 , G06F9/30 , G06F11/30
CPC classification number: G06F12/0895 , G06F9/30043 , G06F9/30101 , G06F11/3037 , G06F12/0811 , G06F12/1009 , G06F12/1081 , G06F12/1441 , G06F12/1466 , G06F2212/7207
Abstract: An apparatus and method for tagged memory management. For example, one embodiment of a processor comprises: execution circuitry to execute instructions and process data, at least one instruction to generate a system memory access request having a first address pointer; and address translation circuitry to determine whether to translate the first address pointer with or without metadata processing, wherein if the first address pointer is to be translated with metadata processing, the address translation circuitry to: perform a lookup in a memory metadata table to identify a memory metadata value, determine a pointer metadata value associated with the first address pointer, and compare the memory metadata value with the pointer metadata value, the comparison to generate a validation of the memory access request or a fault condition, wherein if the comparison results in a validation of the memory access request, then accessing a set of one or more address translation tables to translate the first address pointer to a first physical address and to return the first physical address responsive to the memory access request.
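The check-then-translate flow in this abstract can be sketched in software. The following is only an illustration: the pointer layout (tag in the top 8 bits), the 16-byte tag granule, the 4 KiB page size, the dictionary-based tables, and the names `translate` and `TagFault` are all assumptions, not details taken from the patent.

```python
TAG_SHIFT = 56      # assumption: pointer metadata lives in the top 8 bits
GRANULE = 16        # assumption: one memory metadata value per 16 bytes
PAGE_SIZE = 4096    # assumption: 4 KiB pages, single-level page table


class TagFault(Exception):
    """Raised when pointer metadata and memory metadata disagree."""


def translate(pointer, metadata_table, page_table, with_metadata=True):
    address = pointer & ((1 << TAG_SHIFT) - 1)   # strip the metadata bits
    if with_metadata:
        pointer_tag = pointer >> TAG_SHIFT                   # pointer metadata value
        memory_tag = metadata_table.get(address // GRANULE)  # memory metadata lookup
        if memory_tag != pointer_tag:
            raise TagFault(hex(pointer))                     # fault condition
    # Validation passed (or was skipped): walk the translation table.
    frame = page_table[address // PAGE_SIZE]
    return frame * PAGE_SIZE + address % PAGE_SIZE
```

In this model, a pointer whose tag matches the stored tag for its granule translates normally, while a mismatched tag raises `TagFault` before any page-table access.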
-
Publication number: US11544062B2
Publication date: 2023-01-03
Application number: US16833596
Application date: 2020-03-28
Applicant: Intel Corporation
Inventor: Raanan Sade , Igor Yanover , Stanislav Shwartsman , Muhammad Taher , David Zysman , Liron Zur , Yiftach Gilad
Abstract: An apparatus and method for pairing store operations. For example, one embodiment of a processor comprises: a grouping eligibility checker to evaluate a plurality of store instructions based on a set of grouping rules to determine whether two or more of the plurality of store instructions are eligible for grouping; and a dispatcher to simultaneously dispatch a first group of store instructions of the plurality of store instructions determined to be eligible for grouping by the grouping eligibility checker.
-
Publication number: US11288069B2
Publication date: 2022-03-29
Application number: US16487755
Application date: 2017-07-01
Applicant: Intel Corporation
Inventor: Robert Valentine , Menachem Adelman , Elmoustapha Ould-Ahmed-Vall , Bret L. Toll , Milind B. Girkar , Zeev Sperber , Mark J. Charney , Rinat Rappoport , Jesus Corbal , Stanislav Shwartsman , Igor Yanover , Alexander F. Heinecke , Barukh Ziv , Dan Baum , Yuri Gebil
Abstract: Embodiments detailed herein relate to matrix operations. In particular, the loading of a matrix (tile) from memory. For example, support for a loading instruction is described in at least a form of decode circuitry to decode an instruction having fields for an opcode, a source matrix operand identifier, and destination memory information, and execution circuitry to execute the decoded instruction to store each data element of configured rows of the identified source matrix operand to memory based on the destination memory information.
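The execution step described here, storing each element of the configured rows of a tile to memory, can be sketched as below. This is a rough model only: it assumes the destination memory information consists of a base address and a row stride, and the names `tile_store`, `base`, and `stride` are hypothetical.

```python
def tile_store(memory, tile, configured_rows, base, stride):
    """Copy the first `configured_rows` rows of `tile` into flat `memory`.

    Assumption: row r is written starting at base + r * stride.
    Rows beyond `configured_rows` are not stored.
    """
    for row in range(configured_rows):
        start = base + row * stride
        memory[start:start + len(tile[row])] = tile[row]
    return memory
```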
-
Publication number: US20210271305A1
Publication date: 2021-09-02
Application number: US17183518
Application date: 2021-02-24
Applicant: Intel Corporation
Inventor: Alexander Gendler , Igor Yanover , Gavri Berger , Edo Hachamo , Elkana Korem , Hanan Shomroni , Daniela Kaufman , Lev Makovsky , Haim Granot
IPC: G06F1/324 , G06F1/3206
Abstract: In an embodiment, a processor includes processing cores to execute instructions; and throttling logic. The throttling logic is to: determine an average capacitance score for execution events in a sliding window; perform frequency throttling when the average capacitance score exceeds a throttling threshold; determine a count of frequency throttling instances; and in response to a determination that the count of frequency throttling instances exceeds a maximum throttling value, increase the throttling threshold and concurrently reduce a baseline frequency. Other embodiments are described and claimed.
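The throttling policy in this abstract can be sketched as a small state machine. The sketch below is an illustration under assumptions: the class name `Throttler`, the step sizes, the MHz units, and resetting the count after an adjustment are hypothetical choices that the abstract does not specify.

```python
from collections import deque


class Throttler:
    def __init__(self, window, threshold, max_throttles, baseline_mhz,
                 threshold_step=1.0, frequency_step_mhz=100):
        self.scores = deque(maxlen=window)   # sliding window of capacitance scores
        self.threshold = threshold
        self.max_throttles = max_throttles
        self.baseline_mhz = baseline_mhz
        self.threshold_step = threshold_step          # assumed adjustment size
        self.frequency_step_mhz = frequency_step_mhz  # assumed adjustment size
        self.throttle_count = 0

    def observe(self, score):
        """Record one execution event; return True if throttling occurs."""
        self.scores.append(score)
        average = sum(self.scores) / len(self.scores)
        throttled = average > self.threshold
        if throttled:
            self.throttle_count += 1
            if self.throttle_count > self.max_throttles:
                # Too many throttling instances: raise the threshold and
                # lower the baseline frequency together.
                self.threshold += self.threshold_step
                self.baseline_mhz -= self.frequency_step_mhz
                self.throttle_count = 0   # assumption: restart the count
        return throttled
```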
-
Publication number: US11086623B2
Publication date: 2021-08-10
Application number: US16487787
Application date: 2017-07-01
Applicant: Intel Corporation
Inventor: Robert Valentine , Zeev Sperber , Mark J. Charney , Bret L. Toll , Rinat Rappoport , Stanislav Shwartsman , Dan Baum , Igor Yanover , Elmoustapha Ould-Ahmed-Vall , Menachem Adelman , Jesus Corbal , Yuri Gebil , Simon Rubanovich
Abstract: Embodiments detailed herein relate to matrix operations. In particular, matrix (tile) multiply accumulate and negated matrix (tile) multiply accumulate are discussed. For example, in some embodiments decode circuitry to decode an instruction having fields for an opcode, an identifier for a first source matrix operand, an identifier of a second source matrix operand, and an identifier for a source/destination matrix operand; and execution circuitry to execute the decoded instruction to multiply the identified first source matrix operand by the identified second source matrix operand, add a result of the multiplication to the identified source/destination matrix operand, and store a result of the addition in the identified source/destination matrix operand and zero unconfigured columns of identified source/destination matrix operand are detailed.
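The tile multiply-accumulate described here amounts to C = C + A × B, with unconfigured columns of the source/destination tile zeroed. The following is a minimal software sketch: the name `tile_multiply_accumulate`, the nested-list tiles, and the `configured_cols` parameter are assumptions for illustration.

```python
def tile_multiply_accumulate(c, a, b, configured_cols):
    """In place: c += a @ b for configured columns; zero the remaining columns.

    a is rows x inner, b is inner x (at least configured_cols),
    c is rows x (any width >= configured_cols).
    """
    inner = len(b)
    for i in range(len(a)):
        for j in range(len(c[i])):
            if j < configured_cols:
                c[i][j] += sum(a[i][k] * b[k][j] for k in range(inner))
            else:
                c[i][j] = 0   # zero unconfigured columns
    return c
```

The negated variant mentioned in the abstract would presumably subtract the product instead of adding it, though the abstract does not spell out its exact semantics.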
-
Publication number: US10942738B2
Publication date: 2021-03-09
Application number: US16368973
Application date: 2019-03-29
Applicant: Intel Corporation
Inventor: Zeev Sperber , Amit Gradstein , Simon Rubanovich , Igor Yanover , Gavri Berger , Eyal Hadas , Saeed Kharouf , Ron Schneider , Sagi Meller , Jose Yallouz
Abstract: The present disclosure is directed to systems and methods for performing one or more operations on a two dimensional tile register using an accelerator that includes a tiled matrix multiplication unit (TMU). The processor circuitry includes reservation station (RS) circuitry to communicatively couple the processor circuitry to the TMU. The RS circuitry coordinates the operations performed by the TMU. TMU dispatch queue (TDQ) circuitry in the TMU maintains the operations received from the RS circuitry in the order that the operations are received from the RS circuitry. Since the duration of each operation is not known prior to execution by the TMU, the RS circuitry maintains shadow dispatch queue (RS-TDQ) circuitry that mirrors the operations in the TDQ circuitry. Communication between the RS circuitry and the TMU provides the RS circuitry with notification of successfully executed operations and allows the RS circuitry to cancel operations where the operations are associated with branch mispredictions and/or non-retired speculatively executed instructions.
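The relationship between the TDQ and its shadow copy in the RS can be modeled with two in-order queues. This is a simplified sketch, not the disclosed circuitry: the class and method names are hypothetical, operations are represented as plain strings, and cancellation is modeled as removing the named operations from both queues.

```python
from collections import deque


class TileUnit:
    """Stands in for the TMU; its queue plays the role of the TDQ."""

    def __init__(self):
        self.tdq = deque()

    def enqueue(self, op):
        self.tdq.append(op)           # keep operations in arrival order

    def execute_next(self):
        return self.tdq.popleft()     # oldest operation finishes first

    def cancel(self, ops):
        self.tdq = deque(op for op in self.tdq if op not in ops)


class ReservationStation:
    """Stands in for the RS; `shadow` plays the role of the RS-TDQ."""

    def __init__(self, tmu):
        self.tmu = tmu
        self.shadow = deque()

    def dispatch(self, op):
        self.shadow.append(op)        # mirror the entry before sending it
        self.tmu.enqueue(op)

    def notify_completed(self, op):
        if not self.shadow or self.shadow[0] != op:
            raise RuntimeError("completion does not match oldest shadow entry")
        self.shadow.popleft()

    def cancel(self, ops):
        # e.g. operations on a mispredicted branch path
        self.shadow = deque(op for op in self.shadow if op not in ops)
        self.tmu.cancel(ops)
```

Because the duration of each operation is unknown in advance, the RS in this model learns about progress only through `notify_completed`, and the shadow queue tells it which operations are still outstanding when a cancellation is needed.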
-