HIERARCHICAL REGISTER FILE AT A GRAPHICS PROCESSING UNIT

    Publication No.: US20170278213A1

    Publication date: 2017-09-28

    Application No.: US15079543

    Filing date: 2016-03-24

    Abstract: A processor employs a hierarchical register file for a graphics processing unit (GPU). A top level of the hierarchical register file is stored at a local memory of the GPU (e.g., a memory on the same integrated circuit die as the GPU). Lower levels of the hierarchical register file are stored at a different, larger memory, such as a remote memory located on a different die than the GPU. A register file control module monitors the status of in-flight wavefronts at the GPU, and in particular whether each in-flight wavefront is active, predicted to become active, or inactive. The register file control module places execution data for active and predicted-active wavefronts in the top level of the hierarchical register file and places execution data for inactive wavefronts at lower levels of the hierarchical register file.
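
The placement rule in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `Status` and `place_wavefronts`, the `top_capacity` parameter, and the spill-to-lower-level behavior when the top level is full are all assumptions made for illustration.

```python
from enum import Enum


class Status(Enum):
    ACTIVE = "active"
    PREDICTED_ACTIVE = "predicted_active"
    INACTIVE = "inactive"


def place_wavefronts(statuses, top_capacity):
    """Map each wavefront id to "top" or "lower".

    statuses: dict of wavefront id -> Status.
    top_capacity: hypothetical number of wavefronts the top level can hold.
    """
    placement = {}
    used = 0
    # Active wavefronts are placed first, then predicted-active ones.
    for wanted in (Status.ACTIVE, Status.PREDICTED_ACTIVE):
        for wf_id, status in statuses.items():
            if status is not wanted:
                continue
            if used < top_capacity:
                placement[wf_id] = "top"
                used += 1
            else:
                # Assumption: spill to a lower level when the top level is full.
                placement[wf_id] = "lower"
    # Inactive wavefronts always go to a lower level.
    for wf_id, status in statuses.items():
        if status is Status.INACTIVE:
            placement[wf_id] = "lower"
    return placement
```

Calling `place_wavefronts` again whenever a wavefront changes status would model the control module moving register data between levels.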

    MEMORY OPERATION ENCRYPTION
    Patent application

    Publication No.: US20170201503A1

    Publication date: 2017-07-13

    Application No.: US14993455

    Filing date: 2016-01-12

    Abstract: A processing system includes a processing module having a first interface coupleable to an interconnect. The first interface includes a first cryptologic engine to encrypt a representation of store data of a store operation and a memory address using a first key and a first feedback-based cryptologic process to generate first encrypted data and an encrypted memory address. A memory module includes a second interface coupled to the interconnect. The second interface includes a second cryptologic engine to decrypt the first encrypted data and the encrypted memory address using a second key and a second feedback-based cryptologic process to generate a copy of the representation of the store data and a copy of the memory address. The second interface further is to store the copy of the representation of the store data to a memory location of the memory core based on the copy of the memory address.
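
As a rough illustration of a feedback-based process applied to a store operation, the Python sketch below chains each ciphertext block into the keystream for the next block, in the spirit of cipher-feedback modes. It is a toy, not the patented scheme and not secure: the SHA-256 keystream, the 16-byte block size, the shared symmetric key, the address-before-data layout, and every function name are assumptions.

```python
import hashlib

BLOCK = 16  # hypothetical block size in bytes


def _keystream(key: bytes, feedback: bytes) -> bytes:
    # Toy keystream derived from the key and the previous ciphertext block.
    return hashlib.sha256(key + feedback).digest()[:BLOCK]


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    out = bytearray()
    feedback = iv
    for i in range(0, len(plaintext), BLOCK):
        cipher_block = _xor(plaintext[i:i + BLOCK], _keystream(key, feedback))
        out += cipher_block
        feedback = cipher_block  # ciphertext feeds into the next block
    return bytes(out)


def decrypt(key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    out = bytearray()
    feedback = iv
    for i in range(0, len(ciphertext), BLOCK):
        cipher_block = ciphertext[i:i + BLOCK]
        out += _xor(cipher_block, _keystream(key, feedback))
        feedback = cipher_block
    return bytes(out)


def send_store(key: bytes, iv: bytes, address: int, data: bytes) -> bytes:
    # Processor-side interface: encrypt the address together with the data.
    return encrypt(key, iv, address.to_bytes(8, "big") + data)


def receive_store(key: bytes, iv: bytes, message: bytes, memory: dict) -> None:
    # Memory-side interface: decrypt, then store at the recovered address.
    plain = decrypt(key, iv, message)
    memory[int.from_bytes(plain[:8], "big")] = plain[8:]
```

Only the encrypted message crosses the interconnect in this model; the address and data appear in the clear only inside the two interfaces.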

    METHOD AND SYSTEMS OF CONTROLLING MEMORY-TO-MEMORY COPY OPERATIONS

    Publication No.: US20170123670A1

    Publication date: 2017-05-04

    Application No.: US14924881

    Filing date: 2015-10-28

    Abstract: A memory-to-memory copy operation control system includes a processor configured to receive an instruction to perform a memory-to-memory copy operation and a memory module network in communication with the processor. The memory module network has a plurality of memory modules that include a proximal memory module in direct communication with the processor and one or more additional memory modules in communication with the processor via the proximal memory module. The system also includes a memory controller in communication with the processor and the network of memory modules. The processor is configured to issue a first command causing data to be copied from a first memory module to a second memory module without sending the data to the processor or the memory controller.
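
The difference between a direct module-to-module copy and a copy routed through the processor can be shown with a small Python model. This is only a sketch under assumptions: the classes `MemoryModule` and `Network`, the byte counter, and the `copy_via_processor` baseline are hypothetical and exist purely for contrast.

```python
class MemoryModule:
    def __init__(self, name, size):
        self.name = name
        self.data = bytearray(size)


class Network:
    """Hypothetical memory module network addressed by module name."""

    def __init__(self, modules):
        self.modules = {m.name: m for m in modules}
        self.processor_data_bytes = 0  # data bytes that reached the processor

    def _check(self, module, offset, length):
        if offset < 0 or length < 0 or offset + length > len(module.data):
            raise ValueError("copy range out of bounds")

    def copy_direct(self, src, src_off, dst, dst_off, length):
        # Only the command comes from the processor; data moves between modules.
        source, target = self.modules[src], self.modules[dst]
        self._check(source, src_off, length)
        self._check(target, dst_off, length)
        target.data[dst_off:dst_off + length] = source.data[src_off:src_off + length]

    def copy_via_processor(self, src, src_off, dst, dst_off, length):
        # Baseline for contrast: data is read into the processor, then written.
        source, target = self.modules[src], self.modules[dst]
        self._check(source, src_off, length)
        self._check(target, dst_off, length)
        staged = bytes(source.data[src_off:src_off + length])
        self.processor_data_bytes += length
        target.data[dst_off:dst_off + length] = staged
```

Both paths leave the destination with the same contents; only the baseline increases `processor_data_bytes`.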

    Processing device with address translation probing and methods
    Granted patent (legal status: in force)

    Publication No.: US08984255B2

    Publication date: 2015-03-17

    Application No.: US13723379

    Filing date: 2012-12-21

    Abstract: A data processing device is provided that employs multiple translation look-aside buffers (TLBs) associated with respective processors that are configured to store selected address translations of a page table of a memory shared by the processors. The processing device is configured such that when an address translation is requested by a processor and is not found in the TLB associated with that processor, another TLB is probed for the requested address translation. The probe across to the other TLB may occur in advance of a walk of the page table for the requested address or alternatively a walk can be initiated concurrently with the probe. Where the probe successfully finds the requested address translation, the page table walk can be avoided or discontinued.
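
The probe-before-walk variant described in this abstract can be sketched in Python as follows. It is a simplified model, not the patented design: the `TLB` class, the `translate` function, the `stats` counters, and the choice to fill the local TLB after a successful probe are assumptions, and the concurrent probe-and-walk variant is not modeled.

```python
class TLB:
    def __init__(self):
        self.entries = {}  # virtual page -> physical frame

    def lookup(self, vpage):
        return self.entries.get(vpage)

    def fill(self, vpage, frame):
        self.entries[vpage] = frame


def translate(vpage, local_tlb, peer_tlbs, page_table, stats):
    """Translate a virtual page, probing peer TLBs before a page table walk."""
    frame = local_tlb.lookup(vpage)
    if frame is not None:
        stats["local_hits"] += 1
        return frame
    # Local miss: probe the TLBs of the other processors first.
    for peer in peer_tlbs:
        frame = peer.lookup(vpage)
        if frame is not None:
            stats["probe_hits"] += 1
            local_tlb.fill(vpage, frame)  # assumption: cache the probed entry
            return frame
    # No peer holds the translation: fall back to the page table walk.
    stats["page_walks"] += 1
    frame = page_table[vpage]  # KeyError here stands in for a page fault
    local_tlb.fill(vpage, frame)
    return frame
```

A successful probe returns before `page_walks` is incremented, which is the case where the abstract says the walk can be avoided.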

    Address mapping-aware tasking mechanism

    Publication No.: US12099866B2

    Publication date: 2024-09-24

    Application No.: US17135381

    Filing date: 2020-12-28

    Abstract: An Address Mapping-Aware Tasking (AMAT) mechanism manages compute task data and issues compute tasks on behalf of threads that created the compute task data. The AMAT mechanism stores compute task data generated by host threads in a set of partitions, where each partition is designated for a particular memory module. The AMAT mechanism maintains address mapping data that maps address information to partitions. Threads push compute task data to the AMAT mechanism instead of generating and issuing their own compute tasks. The AMAT mechanism uses address information included in the compute task data and the address mapping data to determine partitions in which to store the compute task data. The AMAT mechanism then issues compute tasks to be executed near the corresponding memory modules (i.e., in PIM execution units or NUMA compute nodes) based upon the compute task data stored in the partitions.
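
A minimal Python sketch of the push, partition, and issue flow is given below. It is illustrative only: the class name `AMAT`, the interleaved address mapping in `partition_for`, the `interleave_bytes` parameter, and the tuple format of issued tasks are assumptions, since the abstract does not specify the mapping.

```python
from collections import defaultdict


class AMAT:
    def __init__(self, num_modules, interleave_bytes):
        self.num_modules = num_modules
        self.interleave_bytes = interleave_bytes
        self.partitions = defaultdict(list)  # module index -> pending task data

    def partition_for(self, address):
        # Hypothetical address mapping: interleave addresses across modules.
        return (address // self.interleave_bytes) % self.num_modules

    def push(self, address, task_data):
        # Threads push task data instead of issuing compute tasks themselves.
        self.partitions[self.partition_for(address)].append((address, task_data))

    def issue(self):
        # Issue pending tasks grouped by the memory module they target.
        issued = []
        for module in sorted(self.partitions):
            for address, task_data in self.partitions[module]:
                issued.append((module, address, task_data))
        self.partitions.clear()
        return issued
```

Grouping by partition means consecutive issued tasks target the same memory module, which is where near-memory execution units would run them.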

    DYNAMIC CONTROL OF WORK SCHEDULING
    Published patent application

    Publication No.: US20240220315A1

    Publication date: 2024-07-04

    Application No.: US18091443

    Filing date: 2022-12-30

    CPC classification number: G06F9/4881 G06F9/52

    Abstract: A processing system includes a scheduling mechanism for producing data for fine-grained reordering of workgroups of a kernel to produce blocks of data, such as for communication across devices to enable overlapping of a producer computation with an all-reduce communication across the network. This scheduling mechanism enables a first parallel processor to schedule and execute a set of workgroups of a producer operation to generate data for transmission to a second parallel processor in a desired traffic pattern. At the same time, the second parallel processor schedules and executes a different set of workgroups of the producer operation to generate data for transmission in a desired traffic pattern to a third parallel processor or back to the first parallel processor.
