Mechanisms to bound the presence of cache blocks with specific properties in caches
    Invention grant (in force)

    Publication number: US09251069B2

    Publication date: 2016-02-02

    Application number: US14055869

    Filing date: 2013-10-16

    Abstract: A system and method for efficiently limiting storage space for data with particular properties in a cache memory. A computing system includes a cache array and a corresponding cache controller. The cache array includes multiple banks, wherein a first bank is powered down. In response to a write request to a second bank for data indicated to be stored in the powered-down first bank, the cache controller determines a respective bypass condition for the data. If the bypass condition exceeds a threshold, the cache controller invalidates any copy of the data stored in the second bank. If the bypass condition does not exceed the threshold, the cache controller stores the data with a clean state in the second bank. In both cases, the cache controller writes the data to lower-level memory.

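    The patent publishes no code, but the bank-bypass decision can be pictured with a minimal C sketch. Every name here (bypass_score, BYPASS_THRESHOLD, handle_powered_down_write) is hypothetical, and the controller is reduced to a single cache line:

        #include <stdint.h>
        #include <stdio.h>

        #define BYPASS_THRESHOLD 4   /* hypothetical tuning point, not from the patent */

        typedef enum { INVALID, CLEAN } line_state_t;

        typedef struct {
            line_state_t state;
            uint64_t     tag;
        } cache_line_t;

        /* Handle a write whose home (first) bank is powered down: either keep a
         * clean copy in the stand-in (second) bank or bypass the cache entirely.
         * Either way, the data is written through to lower-level memory. */
        void handle_powered_down_write(cache_line_t *line, uint64_t tag, int bypass_score) {
            if (bypass_score > BYPASS_THRESHOLD) {
                /* Bypass: invalidate any copy held in the second bank. */
                if (line->tag == tag)
                    line->state = INVALID;
            } else {
                /* Cache a clean copy; lower-level memory remains the owner. */
                line->tag   = tag;
                line->state = CLEAN;
            }
            printf("write-through of tag %#llx to lower-level memory\n",
                   (unsigned long long)tag);
        }

        int main(void) {
            cache_line_t line = { INVALID, 0 };
            handle_powered_down_write(&line, 0xabc, 2);  /* below threshold: cached clean */
            handle_powered_down_write(&line, 0xabc, 7);  /* above threshold: invalidated  */
            return 0;
        }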

    Die-stacked memory device with reconfigurable logic
    Invention grant (in force)

    Publication number: US08922243B2

    Publication date: 2014-12-30

    Application number: US13726145

    Filing date: 2012-12-23

    Abstract: A die-stacked memory device incorporates a reconfigurable logic device to provide implementation flexibility in performing various data manipulation operations and other memory operations that use data stored in the die-stacked memory device or that produce data to be stored in it. One or more configuration files, each representing a corresponding logic configuration for the reconfigurable logic device, can be stored in a configuration store at the die-stacked memory device, and a configuration controller can program the reconfigurable logic fabric using a selected one of the configuration files. Because the logic dies and the memory dies are integrated in the same stack, the reconfigurable logic device can perform data manipulation operations with higher bandwidth and lower latency and power consumption than devices external to the die-stacked memory device.

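    As a rough illustration of the configuration-store flow described above, the C sketch below models selecting a stored configuration file and handing it to the configuration controller. The types and names (config_file_t, program_fabric, load_configuration) are invented; the actual device programs a hardware fabric, not a function call:

        #include <stddef.h>
        #include <stdio.h>
        #include <string.h>

        /* Hypothetical model of the on-device configuration store: each entry is
         * a bitstream implementing one logic configuration for the fabric. */
        typedef struct {
            const char          *name;
            const unsigned char *bitstream;
            size_t               length;
        } config_file_t;

        /* Stand-in for the configuration controller's programming step. */
        static void program_fabric(const config_file_t *cfg) {
            printf("programming fabric with '%s' (%zu bytes)\n", cfg->name, cfg->length);
        }

        /* Select a stored configuration by name and program the fabric with it. */
        int load_configuration(const config_file_t *store, size_t n, const char *name) {
            for (size_t i = 0; i < n; i++) {
                if (strcmp(store[i].name, name) == 0) {
                    program_fabric(&store[i]);
                    return 0;
                }
            }
            return -1;  /* no such configuration in the store */
        }

        int main(void) {
            static const unsigned char bits[] = { 0xde, 0xad, 0xbe, 0xef };
            config_file_t store[] = { { "fft-offload", bits, sizeof bits } };
            return load_configuration(store, 1, "fft-offload");
        }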

    REDUCING COLD TLB MISSES IN A HETEROGENEOUS COMPUTING SYSTEM
    Invention application (under examination, published)

    Publication number: US20140101405A1

    Publication date: 2014-04-10

    Application number: US13645685

    Filing date: 2012-10-05

    Abstract: Methods and apparatuses are provided for avoiding cold translation lookaside buffer (TLB) misses in a computer system. A typical system is configured as a heterogeneous computing system having at least one central processing unit (CPU) and one or more graphics processing units (GPUs) that share a common memory address space. Each processing unit (CPU and GPU) has an independent TLB. When offloading a task from a particular CPU to a particular GPU, translation information is sent along with the task assignment. The translation information allows the GPU to load the address translation data into its TLB prior to executing the task. Preloading the GPU's TLB reduces or avoids the cold TLB misses that would otherwise occur.

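    A minimal C sketch of the TLB-preload step, with an invented translation_t record standing in for the translation information shipped with the task; the entry count and field names are illustrative only:

        #include <stdint.h>
        #include <stdio.h>

        #define TLB_ENTRIES 8   /* hypothetical GPU TLB size */

        typedef struct { uint64_t vpn, pfn; int valid; } tlb_entry_t;
        typedef struct { tlb_entry_t e[TLB_ENTRIES]; } tlb_t;

        /* Translation information shipped with the offloaded task: the virtual
         * pages the task will touch first, with their physical translations. */
        typedef struct { uint64_t vpn, pfn; } translation_t;

        /* Preload the GPU's TLB before the task runs, so that its first memory
         * accesses do not take cold misses. */
        void preload_tlb(tlb_t *gpu_tlb, const translation_t *info, int n) {
            for (int i = 0; i < n && i < TLB_ENTRIES; i++) {
                gpu_tlb->e[i].vpn   = info[i].vpn;
                gpu_tlb->e[i].pfn   = info[i].pfn;
                gpu_tlb->e[i].valid = 1;
            }
        }

        int main(void) {
            tlb_t tlb = {0};
            translation_t info[] = { { 0x10, 0x8a }, { 0x11, 0x91 } };
            preload_tlb(&tlb, info, 2);   /* done as part of the task offload */
            printf("entry 0: vpn %#llx -> pfn %#llx\n",
                   (unsigned long long)tlb.e[0].vpn, (unsigned long long)tlb.e[0].pfn);
            return 0;
        }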

    Fused Data Generation and Associated Communication

    Publication number: US20240201990A1

    Publication date: 2024-06-20

    Application number: US18190620

    Filing date: 2023-03-27

    CPC classification number: G06F9/30036 G06F9/3834

    Abstract: Fused data generation and associated communication techniques are described. In an implementation, a system includes a processing system having a plurality of processors. A data generation and communication tracking module is configured to track programmatically defined data generation and the associated communication performed by the plurality of processors. A targeted communication module is configured to trigger targeted communication of data between the plurality of processors based on the tracked data generation and associated communication.
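
    One way to picture the tracking and targeted-communication modules is the C sketch below, in which each tracked record names a producer, a consumer, and a data range. All structure and naming here is conjectural, since the abstract specifies no data layout:

        #include <stdio.h>

        /* Conjectural record of programmatically defined data generation: a
         * producer wrote a byte range that a known consumer will read next. */
        typedef struct { int producer, consumer; long base, len; } gen_record_t;

        /* Stand-in for the targeted-communication step: push only the tracked
         * range from producer to consumer instead of broadcasting everything. */
        static void send_range(const gen_record_t *r) {
            printf("push [%ld, %ld) from processor %d to processor %d\n",
                   r->base, r->base + r->len, r->producer, r->consumer);
        }

        /* Walk the tracked records and trigger the targeted communication. */
        void flush_tracked(const gen_record_t *log, int n) {
            for (int i = 0; i < n; i++)
                send_range(&log[i]);
        }

        int main(void) {
            gen_record_t log[] = { { 0, 2, 0, 4096 }, { 1, 3, 4096, 4096 } };
            flush_tracked(log, 2);
            return 0;
        }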

    Interconnect architecture for three-dimensional processing systems

    Publication number: US10984838B2

    Publication date: 2021-04-20

    Application number: US14944099

    Filing date: 2015-11-17

    Abstract: A processing system includes a plurality of processor cores formed in a first layer of an integrated circuit device and a plurality of partitions of memory formed in one or more second layers of the integrated circuit device. The one or more second layers are deployed in a stacked configuration with the first layer. Each partition is associated with the subset of the processor cores whose footprints overlap that partition. The processing system also includes first memory paths between the processor cores and their corresponding subsets of partitions, and second memory paths between the processor cores and the partitions.
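
    The two kinds of memory paths can be illustrated with a toy routing function in C, assuming one partition stacked directly above each core; the path descriptions are mine, not the patent's:

        #include <stdio.h>

        /* Toy model: partition i sits directly above core i, so an access from a
         * core to its own partition takes the short vertical (first) path, and
         * anything else takes the lateral (second) path. */
        const char *route(int core, int partition) {
            return (core == partition) ? "first path (vertical, low latency)"
                                       : "second path (lateral interconnect)";
        }

        int main(void) {
            printf("core 1 -> partition 1: %s\n", route(1, 1));
            printf("core 1 -> partition 3: %s\n", route(1, 3));
            return 0;
        }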

    Memory system with region-specific memory access scheduling

    Publication number: US10956044B2

    Publication date: 2021-03-23

    Application number: US14891523

    Filing date: 2013-05-16

    Abstract: An integrated circuit device includes a memory controller coupleable to a memory. The memory controller schedules memory accesses to regions of the memory based on memory timing parameters specific to those regions. A method includes receiving a memory access request at a memory device. The method further includes accessing, from a timing data store of the memory device, data representing a memory timing parameter specific to the region of memory cell circuitry targeted by the memory access request. The method also includes scheduling, at the memory controller, the memory access request based on the data.
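
    A simplified C sketch of region-specific scheduling: the timing data store is modeled as a small table of per-region parameters, and the issue-delay model is deliberately crude. All values and field names are illustrative:

        #include <stdint.h>
        #include <stdio.h>

        #define REGIONS 4

        /* Hypothetical per-region entry in the timing data store; values are
         * illustrative, in memory-clock cycles. */
        typedef struct { uint8_t tRCD, tRP, tRAS; } region_timing_t;

        static const region_timing_t timing_store[REGIONS] = {
            { 13, 13, 32 }, { 11, 12, 30 }, { 14, 14, 34 }, { 12, 13, 31 },
        };

        /* Compute the earliest issue cycle for a request by looking up the timing
         * of the region it targets, rather than applying one worst-case value. */
        uint64_t schedule(uint64_t now, int region) {
            const region_timing_t *t = &timing_store[region];
            return now + t->tRP + t->tRCD;   /* deliberately crude delay model */
        }

        int main(void) {
            printf("request to region 1 issues at cycle %llu\n",
                   (unsigned long long)schedule(100, 1));
            return 0;
        }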

    COMPILER-ASSISTED INTER-SIMD-GROUP REGISTER SHARING

    Publication number: US20180275991A1

    Publication date: 2018-09-27

    Application number: US15935399

    Filing date: 2018-03-26

    Abstract: Systems, apparatuses, and methods for efficiently sharing registers among threads are disclosed. A system includes at least a processor, control logic, and a register file with a plurality of registers. The processor assigns a base set of registers to each thread of a plurality of threads executing on the processor. When a given thread needs more than the base set of registers to execute a given phase of program code, it executes an acquire instruction to gain exclusive access to an extended set of registers from a shared register pool. When the thread no longer needs the additional registers, it executes a release instruction to return the extended set to the shared register pool for other threads to use. In one implementation, the compiler inserts the acquire and release instructions into the program code based on a register liveness analysis performed during compilation.
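
    The acquire/release protocol lends itself to a short C sketch. Here the shared register pool is a single counter and the compiler-inserted instructions become function calls; the pool size and register counts are invented:

        #include <stdio.h>

        #define POOL_REGS 64   /* invented shared-pool size */

        static int pool_free = POOL_REGS;

        /* Compiler-inserted 'acquire': succeeds only if the extended register set
         * can be granted exclusively to the calling thread (hardware would stall
         * the thread instead of failing). */
        int acquire_regs(int n) {
            if (pool_free < n)
                return -1;
            pool_free -= n;
            return 0;
        }

        /* Compiler-inserted 'release': return the extended set to the pool. */
        void release_regs(int n) {
            pool_free += n;
        }

        int main(void) {
            /* A high-register-pressure phase, bracketed by acquire/release where
             * the liveness analysis would place them. */
            if (acquire_regs(16) == 0) {
                /* ... phase of program code using base + extended registers ... */
                release_regs(16);
            }
            printf("free registers in pool: %d\n", pool_free);
            return 0;
        }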

    Data distribution among multiple managed memories

    Publication number: US09875195B2

    Publication date: 2018-01-23

    Application number: US14459958

    Filing date: 2014-08-14

    CPC classification number: G06F13/1657 G06F13/1647

    Abstract: A system and method are disclosed for managing memory interleaving patterns in a system with multiple memory devices. The system includes a processor configured to access multiple memory devices. The method includes receiving a first plurality of data blocks, and then storing the first plurality of data blocks using an interleaving pattern in which successive blocks of the first plurality of data blocks are stored in each of the memory devices. The method also includes receiving a second plurality of data blocks, and then storing successive blocks of the second plurality of data blocks in a first memory device of the multiple memory devices.
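
    The two placement policies can be sketched in a few lines of C, assuming a hypothetical four-device system: the first stream of blocks is interleaved round-robin, while the second stream is pinned to a single device:

        #include <stdio.h>

        #define DEVICES 4   /* hypothetical number of managed memory devices */

        /* Interleaved placement: successive blocks of the first stream rotate
         * round-robin across all memory devices. */
        int place_interleaved(int block) { return block % DEVICES; }

        /* Non-interleaved placement: every block of the second stream lands in
         * one chosen device (device 0 here, purely as an example). */
        int place_single(int block) { (void)block; return 0; }

        int main(void) {
            for (int b = 0; b < 4; b++)
                printf("stream 1 block %d -> device %d\n", b, place_interleaved(b));
            for (int b = 0; b < 4; b++)
                printf("stream 2 block %d -> device %d\n", b, place_single(b));
            return 0;
        }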
