Low energy consumption mantissa multiplication for floating point multiply-add operations

    Publication No.: US10402168B2

    Publication Date: 2019-09-03

    Application No.: US15283295

    Filing Date: 2016-10-01

    Applicant: Intel Corporation

    IPC Classification: G06F7/487 G06F7/544

    Abstract: A floating point multiply-add unit has inputs coupled to receive a floating point multiplier data element, a floating point multiplicand data element, and a floating point addend data element. The multiply-add unit includes a mantissa multiplier to multiply a mantissa of the multiplier data element by a mantissa of the multiplicand data element to calculate a mantissa product. The mantissa multiplier includes a most significant bit portion to calculate the most significant bits of the mantissa product, and a least significant bit portion, having a plurality of different possible sizes, to calculate the least significant bits of the mantissa product. Energy consumption reduction logic selectively reduces energy consumption of the least significant bit portion, but not the most significant bit portion, so that the least significant bit portion does not calculate the least significant bits of the mantissa product.
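
    The energy saving comes from not computing the low partial-product columns at all: only the most significant bits of the product are needed to form a correctly rounded result. A minimal Python sketch of that idea, assuming a simple shift-and-add multiplier model in which each partial-product bit a_i·b_j feeds column i+j (the function name and bit-level model are illustrative, not the patented circuit):

```python
def truncated_mantissa_mul(a: int, b: int, m: int, lsb_cols: int) -> int:
    """Multiply two m-bit mantissas, but skip (model: power-gate) the
    partial-product columns that only feed the lowest `lsb_cols` bits
    of the 2m-bit product.

    Each partial-product bit a_i * b_j lands in column i + j; columns
    below `lsb_cols` are simply never computed.
    """
    acc = 0
    for i in range(m):
        if not (a >> i) & 1:
            continue
        for j in range(m):
            if (b >> j) & 1 and (i + j) >= lsb_cols:
                acc += 1 << (i + j)
    return acc

# With no gated columns the exact product is produced.
assert truncated_mantissa_mul(0b1011, 0b1101, 4, 0) == 0b1011 * 0b1101
# Gating the three lowest columns of a 4x4 multiply of all-ones mantissas
# drops only the weight of those columns (225 - 17 = 208).
assert truncated_mantissa_mul(0b1111, 0b1111, 4, 3) == 208
```

    Because the dropped columns carry only low-order weight, the most significant bits of the mantissa product, the ones that determine the rounded result, are unaffected or affected by at most one unit in the last kept place.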

    Layered super-reticle computing: architectures and methods

    Publication No.: US10963022B2

    Publication Date: 2021-03-30

    Application No.: US16862263

    Filing Date: 2020-04-29

    Applicant: Intel Corporation

    Abstract: Embodiments herein may present an integrated circuit or a computing system having an integrated circuit, where the integrated circuit includes a physical network layer, a physical computing layer, and a physical memory layer, each having a set of dies, and a die including multiple tiles. The physical network layer further includes one or more signal pathways dynamically configurable between multiple pre-defined interconnect topologies for the multiple tiles, where each topology of the multiple pre-defined interconnect topologies corresponds to a communication pattern related to a workload. At least one tile in the physical computing layer is further arranged to move data to another tile in the physical computing layer or a storage cell of the physical memory layer through the one or more signal pathways in the physical network layer. Other embodiments may be described and/or claimed.
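
    The key idea is that the network layer is not fixed: it can be switched among a small set of pre-defined topologies, each matched to a workload's communication pattern. A toy software model of that selection step, assuming a bitmask-free adjacency-list representation (topology names and the pattern-to-topology mapping are illustrative assumptions, not the patented hardware):

```python
def ring(n: int) -> dict:
    """Each tile talks to its two neighbours - suits nearest-neighbour traffic."""
    return {t: [(t - 1) % n, (t + 1) % n] for t in range(n)}

def star(n: int) -> dict:
    """All tiles talk to tile 0 - suits reduction/broadcast traffic."""
    return {t: [0] if t else list(range(1, n)) for t in range(n)}

# Hypothetical mapping from a workload's communication pattern to the
# pre-defined interconnect topology that serves it best.
TOPOLOGY_FOR_PATTERN = {"nearest_neighbour": ring, "all_reduce": star}

def configure(pattern: str, n_tiles: int) -> dict:
    """Select the signal-pathway configuration matching a workload's pattern."""
    return TOPOLOGY_FOR_PATTERN[pattern](n_tiles)

assert configure("nearest_neighbour", 4)[0] == [3, 1]
assert configure("all_reduce", 4)[2] == [0]
```

    In hardware the same effect would be achieved by reprogramming switches on the signal pathways rather than rebuilding adjacency lists, but the selection logic is analogous.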

    Processors, methods, and systems with a configurable spatial accelerator

    Publication No.: US10416999B2

    Publication Date: 2019-09-17

    Application No.: US15396395

    Filing Date: 2016-12-30

    Applicant: Intel Corporation

    IPC Classification: G06F9/30 G06F15/82 G06F13/42

    Abstract: Systems, methods, and apparatuses relating to a configurable spatial accelerator are described. In one embodiment, a processor includes a core with a decoder to decode an instruction into a decoded instruction and an execution unit to execute the decoded instruction to perform a first operation; a plurality of processing elements; and an interconnect network between the plurality of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, wherein the dataflow graph is to be overlaid onto the interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements, and the plurality of processing elements are to perform a second operation when a respective incoming operand set arrives at each of the dataflow operators of the plurality of processing elements.
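
    The execution model described here is the classic dataflow firing rule: an operator does nothing until its complete operand set has arrived, then fires. A minimal software sketch of that rule (the class and its representation are illustrative assumptions, not the accelerator's actual microarchitecture):

```python
class DataflowNode:
    """One node of a dataflow graph, mapped onto a processing element."""

    def __init__(self, name, op, n_inputs):
        self.name, self.op, self.n_inputs = name, op, n_inputs
        self.operands = []

    def deliver(self, value):
        """Deliver one incoming operand; fire only when the set is complete."""
        self.operands.append(value)
        if len(self.operands) == self.n_inputs:
            result = self.op(*self.operands)
            self.operands = []   # ready for the next operand set
            return result        # fired
        return None              # still waiting for operands

add = DataflowNode("add", lambda a, b: a + b, n_inputs=2)
assert add.deliver(3) is None    # one operand: not ready yet
assert add.deliver(4) == 7       # operand set complete: operator fires
```

    Overlaying the graph onto the fabric amounts to assigning each such node to a processing element and routing its operand edges through the interconnect network.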

    Synchronization logic for memory requests

    Publication No.: US10146690B2

    Publication Date: 2018-12-04

    Application No.: US15180351

    Filing Date: 2016-06-13

    Applicant: Intel Corporation

    IPC Classification: G06F12/0831

    Abstract: In an embodiment, a processor includes a plurality of cores and synchronization logic. The synchronization logic includes circuitry to: receive a first memory request and a second memory request; determine whether the second memory request is in contention with the first memory request; and in response to a determination that the second memory request is in contention with the first memory request, process the second memory request using a non-blocking cache coherence protocol. Other embodiments are described and claimed.
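
    The decision the circuitry makes can be modeled in a few lines: detect contention between two requests and, only then, route the second request down a non-blocking path instead of stalling it. A sketch under the assumption that "contention" means the two requests target the same cache line (the 64-byte line size and request shape are illustrative, not taken from the patent):

```python
LINE_BYTES = 64  # assumed cache-line size

def same_line(addr_a: int, addr_b: int) -> bool:
    """Two addresses contend if they fall in the same cache line."""
    return addr_a // LINE_BYTES == addr_b // LINE_BYTES

def route(first_addr: int, second_addr: int) -> str:
    """Pick the coherence path for the second request."""
    if same_line(first_addr, second_addr):
        return "non-blocking protocol"   # contention: avoid stalling
    return "default protocol"

assert route(0x1000, 0x1008) == "non-blocking protocol"  # same 64B line
assert route(0x1000, 0x2000) == "default protocol"
```

    The point of the non-blocking path is that a contending request makes forward progress through the coherence protocol rather than waiting for the first request to drain.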

    Multicast tree-based data distribution in distributed shared cache

    Publication No.: US09734069B2

    Publication Date: 2017-08-15

    Application No.: US14567026

    Filing Date: 2014-12-11

    Applicant: Intel Corporation

    Abstract: Systems and methods for multicast tree-based data distribution in a distributed shared cache. An example processing system comprises: a plurality of processing cores, each processing core communicatively coupled to a cache; a tag directory associated with caches of the plurality of processing cores; a shared cache associated with the tag directory; and processing logic configured, responsive to receiving an invalidate request with respect to a certain cache entry, to: allocate, within the shared cache, a shared cache entry corresponding to the certain cache entry; transmit, to at least one of a tag directory or a processing core that last accessed the certain cache entry, an update read request with respect to the certain cache entry; and, responsive to receiving an update of the certain cache entry, broadcast the update to at least one of one or more tag directories or one or more processing cores identified by a tag corresponding to the certain cache entry.
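
    The final broadcast step relies on the tag recording which cores (or tag directories) share the entry, so the update is multicast only to those sharers rather than to every core. A toy sketch assuming the tag encodes sharers as a bitmask (the encoding and function names are illustrative assumptions):

```python
def sharers(tag_bits: int) -> list:
    """Decode a sharer bitmask into the list of core ids to notify."""
    return [i for i in range(tag_bits.bit_length()) if (tag_bits >> i) & 1]

def broadcast_update(tag_bits: int, value, cores: dict) -> list:
    """Multicast the updated cache-entry value to every sharer in the tag."""
    notified = sharers(tag_bits)
    for core_id in notified:
        cores[core_id] = value   # deliver the update to that core's cache
    return notified

cores = {}
assert broadcast_update(0b1011, "v2", cores) == [0, 1, 3]
assert cores == {0: "v2", 1: "v2", 3: "v2"}
```

    In the patented scheme the delivery itself follows a multicast tree through the tag-directory hierarchy instead of a flat loop, which keeps fan-out bounded at each level.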

    Processors, methods, and systems with a configurable spatial accelerator

    Publication No.: US10558575B2

    Publication Date: 2020-02-11

    Application No.: US15396402

    Filing Date: 2016-12-30

    Applicant: Intel Corporation

    Abstract: Systems, methods, and apparatuses relating to a configurable spatial accelerator are described. In one embodiment, a processor includes a core with a decoder to decode an instruction into a decoded instruction and an execution unit to execute the decoded instruction to perform a first operation; a plurality of processing elements; and an interconnect network between the plurality of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, wherein the dataflow graph is to be overlaid onto the interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements, and the plurality of processing elements is to perform a second operation when an incoming operand set arrives at the plurality of processing elements.