Multicast tree-based data distribution in distributed shared cache

    Publication No.: US09734069B2

    Publication date: 2017-08-15

    Application No.: US14567026

    Filing date: 2014-12-11

    Abstract: Systems and methods for multicast tree-based data distribution in a distributed shared cache. An example processing system comprises: a plurality of processing cores, each processing core communicatively coupled to a cache; a tag directory associated with caches of the plurality of processing cores; a shared cache associated with the tag directory; a processing logic configured, responsive to receiving an invalidate request with respect to a certain cache entry, to: allocate, within the shared cache, a shared cache entry corresponding to the certain cache entry; transmit, to at least one of: a tag directory or a processing core that last accessed the certain entry, an update read request with respect to the certain cache entry; and responsive to receiving an update of the certain cache entry, broadcast the update to at least one of: one or more tag directories or one or more processing cores identified by a tag corresponding to the certain cache entry.
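
The three steps named in the abstract (allocate a shared cache entry, request the update from the last accessor, broadcast the update to the holders identified by the tag) can be sketched as a small software model. This is only an illustration under stated assumptions: `Core`, `TagEntry`, `SharedCacheModel`, and `on_invalidate` are hypothetical names, the model is synchronous and single-threaded, and the broadcast is a plain loop rather than the multicast tree the title refers to.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Set


@dataclass
class Core:
    """Hypothetical processing core with a private cache (address -> value)."""
    core_id: int
    cache: Dict[int, int] = field(default_factory=dict)


@dataclass
class TagEntry:
    """Hypothetical tag: which cores hold the line and which core touched it last."""
    sharers: Set[int]
    last_accessor: int


class SharedCacheModel:
    """Toy model of the invalidate -> update read -> broadcast sequence."""

    def __init__(self, cores: Iterable[Core], tags: Dict[int, TagEntry]) -> None:
        self.cores = {core.core_id: core for core in cores}
        self.tags = tags
        self.shared: Dict[int, Optional[int]] = {}

    def on_invalidate(self, addr: int) -> None:
        tag = self.tags[addr]
        # 1. Allocate a shared cache entry for the line (no data yet).
        self.shared[addr] = None
        # 2. "Update read request": fetch the current value from the last accessor.
        value = self.cores[tag.last_accessor].cache[addr]
        self.shared[addr] = value
        # 3. Broadcast the update to every core identified by the tag.
        for core_id in tag.sharers:
            self.cores[core_id].cache[addr] = value
```

In this sketch a core that is not listed in the tag never receives the update, which is the point of consulting the tag before broadcasting.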

    LOW ENERGY CONSUMPTION MANTISSA MULTIPLICATION FOR FLOATING POINT MULTIPLY-ADD OPERATIONS

    Publication No.: US20180095728A1

    Publication date: 2018-04-05

    Application No.: US15283295

    Filing date: 2016-10-01

    CPC classification number: G06F7/5443 G06F7/4876

    Abstract: A floating point multiply-add unit having inputs coupled to receive a floating point multiplier data element, a floating point multiplicand data element, and a floating point addend data element. The multiply-add unit including a mantissa multiplier to multiply a mantissa of the multiplier data element and a mantissa of the multiplicand data element to calculate a mantissa product. The mantissa multiplier including a most significant bit portion to calculate most significant bits of the mantissa product, and a least significant bit portion to calculate least significant bits of the mantissa product. The mantissa multiplier has a plurality of different possible sizes of the least significant bit portion. Energy consumption reduction logic to selectively reduce energy consumption of the least significant bit portion, but not the most significant bit portion, to cause the least significant bit portion to not calculate the least significant bits of the mantissa product.
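
The effect of not calculating the least significant bits of the mantissa product can be illustrated with a software model of a truncated multiplier. This is a sketch under assumptions, not the patented circuit: the function name `mantissa_product`, the 24-bit default width, and the choice to model gating by dropping the low columns of every partial product are all hypothetical.

```python
def mantissa_product(a: int, b: int, width: int = 24, gated_lsb_bits: int = 0) -> int:
    """Multiply two `width`-bit mantissas, skipping the low `gated_lsb_bits` columns.

    Columns below `gated_lsb_bits` are never summed, so carries that would
    have come out of them are lost. With gated_lsb_bits=0 the result is exact.
    """
    limit = 1 << width
    if not (0 <= a < limit and 0 <= b < limit):
        raise ValueError("mantissa does not fit in the given width")
    if not 0 <= gated_lsb_bits <= 2 * width:
        raise ValueError("gated_lsb_bits out of range")

    keep_mask = ~((1 << gated_lsb_bits) - 1)  # clears the gated low columns
    product = 0
    for i in range(width):
        if (b >> i) & 1:
            # One partial product per set bit of b; only the kept columns are added.
            product += (a << i) & keep_mask
    return product
```

Each partial product loses less than `2**gated_lsb_bits`, so in this model the result is never larger than the exact product and the error is bounded by `width * 2**gated_lsb_bits`. Varying `gated_lsb_bits` loosely corresponds to the "plurality of different possible sizes" of the least significant bit portion.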

    HARDWARE APPARATUSES AND METHODS TO CONTROL CACHE LINE COHERENCY
    Type: Patent application
    Legal status: In force

    Publication No.: US20160092354A1

    Publication date: 2016-03-31

    Application No.: US14498946

    Filing date: 2014-09-26

    Abstract: Methods and apparatuses to control cache line coherency are described. A processor may include a first core having a cache to store a cache line, a second core to send a request for the cache line from the first core, moving logic to cause a move of the cache line between the first core and a memory and to update a tag directory of the move, and cache line coherency logic to create a chain home in the tag directory from the request to cause the cache line to be sent from the tag directory to the second core. A method to control cache line coherency may include creating a chain home in a tag directory from a request for a cache line in a first processor core from a second processor core to cause the cache line to be sent from the tag directory to the second processor core.

    Scalable multi-layer 2D-mesh routers
    Type: Granted patent
    Legal status: In force

    Publication No.: US09294419B2

    Publication date: 2016-03-22

    Application No.: US13927523

    Filing date: 2013-06-26

    Abstract: Architectures, apparatus and systems employing scalable multi-layer 2D-mesh routers. A 2D router mesh comprises bi-direction pairs of linked paths coupled between pairs of IO interfaces and configured in a plurality of rows and columns forming a 2D mesh. Router nodes are located at the intersections of the rows and columns, and are configured to forward data units between IO inputs and outputs coupled to the mesh at its edges through use of shortest path routes defined by agents at the IO interfaces. Multiple instances of the 2D meshes may be employed to support bandwidth scaling of the router architecture. One implementation of a multi-layer 2D mesh is built using a standard tile that is tessellated to form a 2D array of standard tiles, with each 2D mesh layer offset and overlaid relative to the other 2D mesh layers. IO interfaces are then coupled to the multi-layer 2D mesh via muxes/demuxes and/or crossbar interconnects.
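
A shortest-path route through a 2D mesh, as an agent at an IO interface might define it, can be sketched with dimension-order (XY) routing. The abstract does not say which shortest path is chosen, so the XY ordering, the `(row, column)` node addressing, and the function name `xy_route` are assumptions made for illustration.

```python
from typing import List, Tuple

Node = Tuple[int, int]  # (row, column) of a router node in the mesh


def xy_route(src: Node, dst: Node) -> List[Node]:
    """Return the nodes visited from src to dst, inclusive.

    The route first travels along the row (changing the column), then along
    the column (changing the row). The hop count equals the Manhattan
    distance, so the route is a shortest path in a mesh without faults.
    """
    row, col = src
    path = [src]

    step = 1 if dst[1] > col else -1
    while col != dst[1]:
        col += step
        path.append((row, col))

    step = 1 if dst[0] > row else -1
    while row != dst[0]:
        row += step
        path.append((row, col))

    return path
```

For example, `xy_route((0, 0), (2, 3))` visits six nodes in five hops. In a multi-layer arrangement, one such route could be computed per mesh layer, with the layer selected at the mux/demux stage; that selection is not modeled here.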

    Low energy consumption mantissa multiplication for floating point multiply-add operations

    Publication No.: US10402168B2

    Publication date: 2019-09-03

    Application No.: US15283295

    Filing date: 2016-10-01

    Abstract: A floating point multiply-add unit having inputs coupled to receive a floating point multiplier data element, a floating point multiplicand data element, and a floating point addend data element. The multiply-add unit including a mantissa multiplier to multiply a mantissa of the multiplier data element and a mantissa of the multiplicand data element to calculate a mantissa product. The mantissa multiplier including a most significant bit portion to calculate most significant bits of the mantissa product, and a least significant bit portion to calculate least significant bits of the mantissa product. The mantissa multiplier has a plurality of different possible sizes of the least significant bit portion. Energy consumption reduction logic to selectively reduce energy consumption of the least significant bit portion, but not the most significant bit portion, to cause the least significant bit portion to not calculate the least significant bits of the mantissa product.

    Hardware compilation and/or translation with fault detection and roll back functionality
    Type: Granted patent
    Legal status: In force

    Publication No.: US09317263B2

    Publication date: 2016-04-19

    Application No.: US14513402

    Filing date: 2014-10-14

    Abstract: Hardware compilation and/or translation with fault detection and roll back functionality are disclosed. Compilation and/or translation logic receives programs encoded in one language, and encodes the programs into a second language including instructions to support processor features not encoded into the original language encoding of the programs. In one embodiment, an execution unit executes instructions of the second language including an operation-check instruction to perform a first operation and record the first operation result for a comparison, and an operation-test instruction to perform a second operation and a fault detection operation by comparing the second operation result to the recorded first operation result. In some embodiments, an execution unit executes instructions of the second language including commit instructions to record execution checkpoint states of registers mapped to architectural registers, and roll-back instructions to restore the registers mapped to architectural registers to previously recorded execution checkpoint states.
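
The interplay of the operation-check, operation-test, commit, and roll-back instructions can be sketched as a small software model. This is an illustration under assumptions, not the patented instruction set: the class `CheckedExecutor` and its method names are hypothetical, registers are modeled as a dictionary, and only a single checkpoint is kept.

```python
from typing import Any, Callable, Dict, Tuple


class CheckedExecutor:
    """Toy model of redundant execution with checkpoint and roll-back."""

    def __init__(self, registers: Dict[str, Any]) -> None:
        self.registers = dict(registers)
        self._checkpoint = dict(registers)
        self._recorded: Any = None

    def commit(self) -> None:
        """Record the current register state as the execution checkpoint."""
        self._checkpoint = dict(self.registers)

    def roll_back(self) -> None:
        """Restore the registers to the last recorded checkpoint."""
        self.registers = dict(self._checkpoint)

    def op_check(self, op: Callable[..., Any], *args: Any) -> Any:
        """Perform the first operation and record its result for comparison."""
        self._recorded = op(*args)
        return self._recorded

    def op_test(self, op: Callable[..., Any], *args: Any) -> Tuple[Any, bool]:
        """Perform the second operation; report whether it matches the record."""
        result = op(*args)
        return result, result == self._recorded
```

A typical use would call `commit()`, run `op_check` and then `op_test` on the same operation, and call `roll_back()` when `op_test` reports a mismatch, so that execution can resume from the last known-good register state.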
