-
Publication No.: US09342334B2
Publication date: 2016-05-17
Application No.: US13530793
Filing date: 2012-06-22
CPC classification: G06F9/4552, G06F11/3644, G06F17/5022, G06F2217/86
Abstract: A system and method for simulating new instructions without compiler support for the new instructions. A simulator detects a given region in code generated by a compiler. The given region may be a candidate for vectorization or may be a region already vectorized. In response to the detection, the simulator suspends execution of a time-based simulation. The simulator then serially executes the region for at least two iterations using a functional-based simulation and using instructions with operands which correspond to P or fewer lanes of single-instruction-multiple-data (SIMD) execution. The value P is a maximum number of lanes of SIMD execution supported by the compiler. The simulator stores checkpoint state during the serial execution. In response to determining no inter-iteration memory dependencies exist, the simulator returns to the time-based simulation and resumes execution using N-wide vector instructions.
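The dependency check described in the abstract above could be sketched roughly as follows. This is a minimal illustration, not the patented simulator: `IterationTrace`, `has_inter_iteration_dependency`, and `choose_mode` are hypothetical names, and falling back to a `"serial"` mode when a dependency is found is an assumption (the abstract only states what happens when no dependency exists).

```python
from dataclasses import dataclass, field


@dataclass
class IterationTrace:
    """Addresses touched by one serially executed iteration (hypothetical record)."""
    reads: set = field(default_factory=set)
    writes: set = field(default_factory=set)


def has_inter_iteration_dependency(traces):
    """Return True if two different iterations conflict on a memory address.

    A conflict is a write in one iteration to an address that another
    iteration reads or writes (RAW, WAR, or WAW across iterations).
    """
    for i, earlier in enumerate(traces):
        for later in traces[i + 1:]:
            if earlier.writes & (later.reads | later.writes):
                return True
            if earlier.reads & later.writes:
                return True
    return False


def choose_mode(traces):
    """Decide how simulation continues after the functional pass (assumed policy)."""
    if len(traces) < 2:
        raise ValueError("need at least two iterations to compare")
    if has_inter_iteration_dependency(traces):
        return "serial"          # assumption: stay conservative
    return "n_wide_vector"       # resume timing simulation with N-wide vectors


# a[i] = b[i] + 1: every iteration touches its own addresses.
independent = [IterationTrace({0x100 + 4 * i}, {0x200 + 4 * i}) for i in range(4)]
# a[i] = a[i-1] + 1: iteration i reads what iteration i-1 wrote.
dependent = [IterationTrace({0x200 + 4 * (i - 1)}, {0x200 + 4 * i}) for i in range(1, 5)]
```

Comparing every pair of iterations is quadratic in the number of traced iterations, which seems acceptable for a sketch because the abstract only requires at least two iterations.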
-
Publication No.: US20130346058A1
Publication date: 2013-12-26
Application No.: US13530793
Filing date: 2012-06-22
IPC classification: G06F9/45
CPC classification: G06F9/4552, G06F11/3644, G06F17/5022, G06F2217/86
Abstract: A system and method for simulating new instructions without compiler support for the new instructions. A simulator detects a given region in code generated by a compiler. The given region may be a candidate for vectorization or may be a region already vectorized. In response to the detection, the simulator suspends execution of a time-based simulation. The simulator then serially executes the region for at least two iterations using a functional-based simulation and using instructions with operands which correspond to P or fewer lanes of single-instruction-multiple-data (SIMD) execution. The value P is a maximum number of lanes of SIMD execution supported by the compiler. The simulator stores checkpoint state during the serial execution. In response to determining no inter-iteration memory dependencies exist, the simulator returns to the time-based simulation and resumes execution using N-wide vector instructions.
-
3.
Publication No.: US20130007373A1
Publication date: 2013-01-03
Application No.: US13173441
Filing date: 2011-06-30
IPC classification: G06F12/12
CPC classification: G06F12/126, G06F2212/502
Abstract: A method, apparatus, and system for replacing at least one cache region selected from a plurality of cache regions, wherein each of the regions is composed of a plurality of blocks, is disclosed. The method includes applying a first algorithm to the plurality of cache regions to limit the number of potential candidate regions to a preset value, wherein the first algorithm assesses the ability of a region to be replaced based on properties of the plurality of blocks associated with that region; and designating at least one of the limited potential candidate regions as a victim based on region-level information associated with each of the limited potential candidate regions.
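The two-stage selection in the abstract above might look like the following sketch. The specific criteria are assumptions chosen for illustration: the abstract does not say which block properties or which region-level information are used, so this example uses a per-block dirty count for the first stage and a least-recently-used timestamp for the second. `Region` and `select_victim` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    block_dirty: list   # per-block dirty flags (assumed block property)
    last_access: int    # region-level timestamp (assumed region-level info)


def select_victim(regions, preset_limit):
    """Pick a victim region in two stages.

    Stage 1 limits the candidates to `preset_limit` regions using block
    properties (here: fewest dirty blocks, i.e. cheapest to write back).
    Stage 2 designates the victim among those candidates using region-level
    information (here: the least recently accessed region).
    """
    if preset_limit < 1 or not regions:
        raise ValueError("need at least one region and a positive limit")
    candidates = sorted(regions, key=lambda r: sum(r.block_dirty))[:preset_limit]
    return min(candidates, key=lambda r: r.last_access)


regions = [
    Region("A", [True, True, True, False], last_access=1),
    Region("B", [False, False, False, False], last_access=50),
    Region("C", [True, False, False, False], last_access=20),
    Region("D", [False, False, False, False], last_access=30),
]
victim = select_victim(regions, preset_limit=2)
```

With a limit of 2, region A is never considered even though it is the oldest, because the first stage filters it out on block properties before region-level information is consulted.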
-
Publication No.: US20110314255A1
Publication date: 2011-12-22
Application No.: US12817945
Filing date: 2010-06-17
CPC classification: G06F15/17337
Abstract: A processor and method for broadcasting data among a plurality of processing cores are disclosed. The processor includes a plurality of processing cores connected by point-to-point connections. A first of the processing cores includes a router that includes at least an allocation unit and an output port. The allocation unit is configured to determine that respective input buffers on at least two others of the processing cores are available to receive given data. The output port is usable by the router to send the given data across one of the point-to-point connections. The router is configured to send the given data contingent on determining that the respective input buffers are available. Furthermore, the processor is configured to deliver the data to the at least two other processing cores in response to the first processing core sending the data once across the point-to-point connection.
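The send-once broadcast in the abstract above can be modeled in software along these lines. This is a behavioral sketch under stated assumptions, not the hardware design: `Core`, `Router`, `free_slots`, and `link_sends` are hypothetical names, and buffer availability is reduced to a simple slot counter.

```python
class Core:
    """A destination core with a bounded input buffer (assumed model)."""

    def __init__(self, name, buffer_slots):
        self.name = name
        self.free_slots = buffer_slots
        self.received = []


class Router:
    """Router on the sending core: allocation check plus one output port."""

    def __init__(self):
        self.link_sends = 0  # how many times data crossed the output link

    def can_allocate(self, destinations):
        # Allocation unit: every destination must have buffer space,
        # and a broadcast needs at least two destinations.
        return len(destinations) >= 2 and all(c.free_slots > 0 for c in destinations)

    def broadcast(self, data, destinations):
        # The send is contingent on the allocation check succeeding.
        if not self.can_allocate(destinations):
            return False
        self.link_sends += 1  # data is sent across the link once
        for core in destinations:
            core.free_slots -= 1
            core.received.append(data)
        return True


a, b = Core("a", buffer_slots=1), Core("b", buffer_slots=1)
router = Router()
first = router.broadcast("x", [a, b])    # both buffers free: delivered
second = router.broadcast("y", [a, b])   # buffers now full: held back
```

Checking all destination buffers before sending means the single transmission never has to be dropped or retried at one destination while succeeding at another.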
-
Publication No.: US09015448B2
Publication date: 2015-04-21
Application No.: US12817945
Filing date: 2010-06-17
IPC classification: G06F15/00, G06F15/76, G06F15/173
CPC classification: G06F15/17337
Abstract: A processor and method for broadcasting data among a plurality of processing cores are disclosed. The processor includes a plurality of processing cores connected by point-to-point connections. A first of the processing cores includes a router that includes at least an allocation unit and an output port. The allocation unit is configured to determine that respective input buffers on at least two others of the processing cores are available to receive given data. The output port is usable by the router to send the given data across one of the point-to-point connections. The router is configured to send the given data contingent on determining that the respective input buffers are available. Furthermore, the processor is configured to deliver the data to the at least two other processing cores in response to the first processing core sending the data once across the point-to-point connection.
-
6.
Publication No.: US20130097385A1
Publication date: 2013-04-18
Application No.: US13275538
Filing date: 2011-10-18
IPC classification: G06F12/08
CPC classification: G06F12/0817, G06F12/0813, G06F12/084
Abstract: A system and method of providing directory cache coherence are disclosed. The system and method may include tracking the coherence state of at least one cache block contained within a region using a global directory, providing at least one region level sharing information about the at least one cache block in the global directory, and providing at least one block level sharing information about the at least one cache block in the global directory. The tracking of the provided at least one region level sharing information and the provided at least one block level sharing information may organize the coherence state of the at least one cache block and the region.
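One way to picture a directory that keeps both region-level and block-level sharing information, as the abstract above describes, is the sketch below. The data layout is an assumption: the abstract does not specify the encoding, the region size, or how the two levels are consulted. `GlobalDirectory`, `BLOCKS_PER_REGION`, and `invalidation_targets` are hypothetical names.

```python
from collections import defaultdict

BLOCKS_PER_REGION = 4  # assumed region size; not given in the abstract


class GlobalDirectory:
    """Hypothetical directory keeping sharer sets at two granularities."""

    def __init__(self):
        self.region_sharers = defaultdict(set)  # region -> nodes caching any block
        self.block_sharers = defaultdict(set)   # block  -> nodes caching that block

    @staticmethod
    def region_of(block):
        return block // BLOCKS_PER_REGION

    def record_fill(self, node, block):
        """Note that `node` now caches `block` at both levels."""
        self.block_sharers[block].add(node)
        self.region_sharers[self.region_of(block)].add(node)

    def record_evict(self, node, block):
        """Drop `node` from the block, and from the region if it holds nothing else."""
        self.block_sharers[block].discard(node)
        region = self.region_of(block)
        first = region * BLOCKS_PER_REGION
        blocks = range(first, first + BLOCKS_PER_REGION)
        if not any(node in self.block_sharers.get(b, ()) for b in blocks):
            self.region_sharers[region].discard(node)

    def invalidation_targets(self, writer, block):
        """Nodes that must be invalidated before `writer` may write `block`."""
        region = self.region_of(block)
        # Region-level filter: if nobody else shares the region, skip the block lookup.
        if self.region_sharers.get(region, set()) <= {writer}:
            return set()
        return self.block_sharers.get(block, set()) - {writer}
```

In this sketch the region-level set acts as a coarse filter and the block-level set gives the precise invalidation list; whether the patent uses the two levels this way is not stated in the abstract.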
-
Publication No.: US20130073811A1
Publication date: 2013-03-21
Application No.: US13234855
Filing date: 2011-09-16
IPC classification: G06F12/08
CPC classification: G06F12/0817, Y02D10/13
Abstract: A system and method for region privatization in a directory-based cache coherence system are disclosed. The system and method include receiving a request from a requesting node for at least one block in a region, allocating a new entry for the region based on the request for the block, requesting that the memory controller send the data for the region to the requesting node, receiving a subsequent request for a block within the region, determining that any blocks of the region that are cached are also cached at the requesting node, and privatizing the region at the requesting node.
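The privatization decision in the abstract above could be sketched as below. Two points are assumptions made for this illustration: the condition is interpreted as "every cached block of the region is held only by the requesting node", and a region loses its private status as soon as another node requests one of its blocks. `RegionDirectory` and its fields are hypothetical names, and the memory-controller data transfer is omitted.

```python
class RegionDirectory:
    """Hypothetical directory tracking which node, if any, owns a region privately."""

    def __init__(self):
        # region -> {"holders": {block: set of nodes}, "private_to": node or None}
        self.entries = {}

    def request(self, node, region, block):
        """Handle a block request and return the node the region is private to."""
        entry = self.entries.get(region)
        if entry is None:
            # First request for the region: allocate a new entry.
            entry = {"holders": {}, "private_to": None}
            self.entries[region] = entry
            entry["holders"].setdefault(block, set()).add(node)
            return None
        # Subsequent request: record it, then test the privatization condition.
        entry["holders"].setdefault(block, set()).add(node)
        only_requester = all(nodes == {node} for nodes in entry["holders"].values())
        entry["private_to"] = node if only_requester else None
        return entry["private_to"]


directory = RegionDirectory()
r1 = directory.request(0, "R", block=1)   # allocates the entry
r2 = directory.request(0, "R", block=2)   # node 0 holds everything cached
r3 = directory.request(1, "R", block=3)   # node 1 joins: no longer private
```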
-
8.
Publication No.: US09170948B2
Publication date: 2015-10-27
Application No.: US13726146
Filing date: 2012-12-23
CPC classification: G06F12/0828, G11C5/025, G11C8/12, G11C29/12, H01L25/0655, H01L25/0657, H01L25/18, H01L2224/16225, H01L2225/06541, H01L2225/06565, H01L2924/1461, H01L2924/15311, Y02D10/13, H01L2924/00
Abstract: A die-stacked memory device implements an integrated coherency manager to offload cache coherency protocol operations for the devices of a processing system. The die-stacked memory device includes a set of one or more stacked memory dies and a set of one or more logic dies. The one or more logic dies implement hardware logic providing a memory interface and the coherency manager. The memory interface operates to perform memory accesses in response to memory access requests from the coherency manager and the one or more external devices. The coherency manager comprises logic to perform coherency operations for shared data stored at the stacked memory dies. Due to the integration of the logic dies and the memory dies, the coherency manager can access shared data stored in the memory dies and perform related coherency operations with higher bandwidth and lower latency and power consumption compared to the external devices.
-
Publication No.: US09135185B2
Publication date: 2015-09-15
Application No.: US13726143
Filing date: 2012-12-23
Applicant: Gabriel H. Loh, Bradford M. Beckmann, James M. O'Connor, Michael Ignatowski, Michael J. Schulte, Lisa R. Hsu, Nuwan S. Jayasena
Inventor: Gabriel H. Loh, Bradford M. Beckmann, James M. O'Connor, Michael Ignatowski, Michael J. Schulte, Lisa R. Hsu, Nuwan S. Jayasena
CPC classification: G06F12/1027, H01L25/18, H01L2224/16225, H01L2225/06565, H01L2924/0002, H01L2924/15311, H01L2924/00
Abstract: A die-stacked memory device incorporates a data translation controller at one or more logic dies of the device to provide data translation services for data to be stored at, or retrieved from, the die-stacked memory device. The data translation operations implemented by the data translation controller can include compression/decompression operations, encryption/decryption operations, format translations, wear-leveling translations, data ordering operations, and the like. Due to the tight integration of the logic dies and the memory dies, the data translation controller can perform data translation operations with higher bandwidth and lower latency and power consumption compared to operations performed by devices external to the die-stacked memory device.
-
Publication No.: US08621131B2
Publication date: 2013-12-31
Application No.: US13221465
Filing date: 2011-08-30
IPC classification: G06F13/00
CPC classification: G06F13/4265, G06F2213/0038
Abstract: Various methods, computer-readable mediums, articles of manufacture and systems are disclosed. In one aspect, a method is provided that includes generating a packet with a first semiconductor chip. The packet is destined to transit a first substrate and be received by a node of a second semiconductor chip. The packet includes a packet header and packet body. The packet header includes an identification of a first exit point from the first substrate and an identification of the node. The packet is sent to the first substrate and eventually to the node of the second semiconductor chip.
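The header layout in the abstract above, an exit-point identifier plus a destination node identifier followed by the body, can be illustrated with a small encoder and decoder. The field widths and byte order are assumptions for this sketch (a 1-byte exit point and a 2-byte node ID, big-endian); the abstract does not specify them, and `build_packet` and `parse_packet` are hypothetical names.

```python
import struct

# Assumed header layout: 1-byte exit point ID, 2-byte node ID, big-endian.
HEADER = struct.Struct(">BH")


def build_packet(exit_point, node_id, body):
    """Prefix the body with a header naming the substrate exit point and target node."""
    return HEADER.pack(exit_point, node_id) + body


def parse_packet(packet):
    """Split a packet into (exit_point, node_id, body)."""
    exit_point, node_id = HEADER.unpack_from(packet)
    return exit_point, node_id, packet[HEADER.size:]


packet = build_packet(exit_point=2, node_id=513, body=b"hi")
fields = parse_packet(packet)
```

Carrying the exit point in the header lets the substrate route the packet without knowing the internal node layout of the destination chip, which then only needs the node ID.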