WRITE COMBINING CACHE MICROARCHITECTURE FOR SYNCHRONIZATION EVENTS
    21.
    Invention application (granted)

    Publication No.: US20150046652A1

    Publication date: 2015-02-12

    Application No.: US13961561

    Filing date: 2013-08-07

    CPC classification number: G06F12/0815 G06F12/0811 G06F12/128 Y02D10/13

    Abstract: A method, computer program product, and system are described that enforce a release consistency with special accesses sequentially consistent (RCsc) memory model and execute release synchronization instructions, such as a StRel event, without tracking an outstanding store event through the memory hierarchy, while efficiently using bandwidth resources. Also described is the decoupling of a store event from the ordering of that store event with respect to the RCsc memory model. The description further includes a set of hierarchical read/write combining buffers that coalesce stores from different parts of the system. In addition, a pool component maintains a partial order over received store events and release synchronization events, avoiding content addressable memory (CAM) structures, full cache flushes, and direct write-throughs to memory. The approach improves the performance of both global and local synchronization events, since a store event may not need to reach main memory to complete.
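
    As a rough illustration of the idea (a minimal sketch, not the patented design), the C++ below coalesces stores per cache line in a write-combining buffer and uses a per-epoch pending counter as a stand-in for the pool component: a StRel closes the current epoch, and the release completes once all earlier epochs have drained, with no CAM lookup, full flush, or per-store tracking. All names here (WriteCombiningBuffer, st_rel, kLineMask) are illustrative assumptions.

```cpp
#include <cstdint>
#include <iostream>
#include <map>
#include <vector>

struct Store { uint64_t addr; uint32_t data; uint64_t epoch; };

class WriteCombiningBuffer {
    static constexpr uint64_t kLineMask = ~uint64_t{63};  // 64-byte lines
    std::map<uint64_t, std::vector<Store>> lines_;        // coalesced stores
    std::map<uint64_t, size_t> pending_;                  // stores per epoch
    uint64_t epoch_ = 0;                                  // advanced by StRel

public:
    void store(uint64_t addr, uint32_t data) {
        // Coalesce into the line's entry; only a per-epoch count is kept,
        // not each store's progress through the hierarchy.
        lines_[addr & kLineMask].push_back({addr, data, epoch_});
        ++pending_[epoch_];
    }

    // StRel: returns the epoch the release must wait for; later stores
    // land in a new epoch and are unordered with respect to this release.
    uint64_t st_rel() { return epoch_++; }

    // The release completes once all stores from epochs <= release_epoch
    // have drained -- no CAM search, full flush, or write-through needed.
    bool release_complete(uint64_t release_epoch) const {
        for (const auto& [e, n] : pending_)
            if (e <= release_epoch && n != 0) return false;
        return true;
    }

    // Drain one coalesced line to the next cache level (simulated).
    void drain_one() {
        if (lines_.empty()) return;
        auto it = lines_.begin();
        for (const Store& s : it->second) --pending_.at(s.epoch);
        lines_.erase(it);
    }
};

int main() {
    WriteCombiningBuffer wcb;
    wcb.store(0x1000, 1);
    wcb.store(0x1004, 2);         // coalesces into the same 64-byte line
    uint64_t rel = wcb.st_rel();  // release fence after the two stores
    wcb.store(0x2000, 3);         // later store, not ordered by the release
    std::cout << "release done? " << wcb.release_complete(rel) << "\n";  // 0
    wcb.drain_one();              // line 0x1000 drains; epoch 0 empties
    std::cout << "release done? " << wcb.release_complete(rel) << "\n";  // 1
}
```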


Selecting a Resource from a Set of Resources for Performing an Operation
    22.
    Invention application (granted)

    Publication No.: US20140223445A1

    Publication date: 2014-08-07

    Application No.: US13761985

    Filing date: 2013-02-07

    CPC classification number: G06F9/5016 G06F9/5011 G06F12/0875 G06F2212/45

    Abstract: The described embodiments comprise a selection mechanism that selects a resource for performing an operation from a set of resources in a computing device. In some embodiments, the selection mechanism performs a lookup in a table selected from a set of tables to identify a resource from the set of resources. When the identified resource is not available for performing the operation, the selection mechanism identifies the next resource in the table and selects it for performing the operation once it is available, repeating until a resource is selected.
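
    A minimal sketch of the selection loop as the abstract describes it, with the table-of-tables and the availability state reduced to plain vectors; the names and the hash-style initial lookup are assumptions:

```cpp
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <vector>

// One table per requester class, say; each table orders resource IDs.
using Table = std::vector<size_t>;

size_t select_resource(const std::vector<Table>& tables, size_t table_idx,
                       size_t key, const std::vector<bool>& available) {
    const Table& t = tables.at(table_idx);  // table selected from the set
    size_t pos = key % t.size();            // initial lookup (assumed hash)
    // If the identified resource is busy, identify the next resource in
    // the table and select it once available, until one is selected.
    for (size_t tries = 0; tries < t.size(); ++tries) {
        size_t r = t[(pos + tries) % t.size()];
        if (available[r]) return r;
    }
    return SIZE_MAX;  // no resource currently available
}

int main() {
    std::vector<Table> tables = {{0, 1, 2, 3}, {3, 2, 1, 0}};
    std::vector<bool> available = {false, true, true, false};
    std::cout << "selected resource: "
              << select_resource(tables, 0, /*key=*/0, available) << "\n"; // 1
}
```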


PROCESSING DEVICE WITH ADDRESS TRANSLATION PROBING AND METHODS
    23.
    Invention application (granted)

    Publication No.: US20140181460A1

    Publication date: 2014-06-26

    Application No.: US13723379

    Filing date: 2012-12-21

    Abstract: A data processing device is provided that employs multiple translation look-aside buffers (TLBs) associated with respective processors, each configured to store selected address translations from a page table of a memory shared by the processors. The processing device is configured such that when an address translation is requested by a processor and is not found in the TLB associated with that processor, another TLB is probed for the requested translation. The probe of the other TLB may be issued in advance of a walk of the page table for the requested address, or alternatively a walk can be initiated concurrently with the probe. Where the probe successfully finds the requested address translation, the page table walk can be avoided or discontinued.
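
    The following C++ sketch shows the probe-before-walk flow under assumed data structures (a TLB as a hash map, a shared page table); the concurrent-walk variant mentioned in the abstract is noted in a comment rather than modeled:

```cpp
#include <cstdint>
#include <iostream>
#include <optional>
#include <unordered_map>

using VPage = uint64_t;
using PFrame = uint64_t;

struct TLB {
    std::unordered_map<VPage, PFrame> entries;
    std::optional<PFrame> lookup(VPage v) const {
        auto it = entries.find(v);
        if (it == entries.end()) return std::nullopt;
        return it->second;
    }
};

// Shared page table; walking it is the slow path we want to avoid.
std::unordered_map<VPage, PFrame> page_table = {{0x10, 0xAA}, {0x20, 0xBB}};

PFrame translate(TLB& mine, const TLB& peer, VPage v) {
    if (auto p = mine.lookup(v)) return *p;  // local TLB hit
    if (auto p = peer.lookup(v)) {           // probe the peer processor's TLB
        mine.entries[v] = *p;                // fill locally; walk avoided
        return *p;
    }
    // Probe missed: walk the page table. (A real device could start the
    // walk concurrently with the probe and discontinue it on a probe hit.)
    PFrame p = page_table.at(v);
    mine.entries[v] = p;
    return p;
}

int main() {
    TLB cpu0, cpu1;
    cpu1.entries[0x10] = 0xAA;               // peer already holds the entry
    std::cout << std::hex << translate(cpu0, cpu1, 0x10) << "\n"; // aa, no walk
}
```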


QUALITY OF SERVICE SUPPORT USING STACKED MEMORY DEVICE WITH LOGIC DIE
    24.
    Invention application (granted)

    Publication No.: US20140181428A1

    Publication date: 2014-06-26

    Application No.: US13726144

    Filing date: 2012-12-23

    Abstract: A die-stacked memory device implements an integrated QoS manager to provide centralized QoS functionality in furtherance of one or more specified QoS objectives for the sharing of the memory resources by other components of the processing system. The die-stacked memory device includes a set of one or more stacked memory dies and one or more logic dies. The logic dies implement hardware logic for a memory controller and the QoS manager. The memory controller is coupleable to one or more devices external to the set of one or more stacked memory dies and operates to service memory access requests from the one or more external devices. The QoS manager comprises logic to perform operations in furtherance of one or more QoS objectives, which may be specified by a user, by an operating system, hypervisor, job management software, or other application being executed, or specified via hardcoded logic or firmware.
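
    As one hedged interpretation of the QoS manager's scheduling role, the sketch below services queued memory requests by per-device priority with age as a tiebreaker; the Request fields and the scoring rule are assumptions, not the patent's logic:

```cpp
#include <cstdint>
#include <iostream>
#include <queue>
#include <vector>

struct Request {
    int device;        // external device that issued the request
    int priority;      // derived from the QoS objective for that device
    uint64_t arrival;  // used for aging, to avoid starvation
    uint64_t addr;
};

struct QoSCompare {
    // Higher priority first; among equals, the older request first.
    bool operator()(const Request& a, const Request& b) const {
        if (a.priority != b.priority) return a.priority < b.priority;
        return a.arrival > b.arrival;
    }
};

int main() {
    // The QoS manager on the logic die picks the next request to hand to
    // the memory controller according to the configured objectives.
    std::priority_queue<Request, std::vector<Request>, QoSCompare> q;
    q.push({/*device=*/0, /*priority=*/1, /*arrival=*/0, 0x100});
    q.push({1, 3, 1, 0x200});  // latency-sensitive device, higher objective
    q.push({0, 1, 2, 0x300});
    while (!q.empty()) {
        Request r = q.top(); q.pop();
        std::cout << "serve device " << r.device << " addr 0x"
                  << std::hex << r.addr << std::dec << "\n";
    }
}
```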


REDUCING COLD TLB MISSES IN A HETEROGENEOUS COMPUTING SYSTEM
    25.
    Invention application (pending, published)

    Publication No.: US20140101405A1

    Publication date: 2014-04-10

    Application No.: US13645685

    Filing date: 2012-10-05

    Abstract: Methods and apparatuses are provided for avoiding cold translation lookaside buffer (TLB) misses in a computer system. A typical system is configured as a heterogeneous computing system having at least one central processing unit (CPU) and one or more graphics processing units (GPUs) that share a common memory address space. Each processing unit (CPU and GPU) has an independent TLB. When offloading a task from a particular CPU to a particular GPU, translation information is sent along with the task assignment. The translation information allows the GPU to load the address translation data into the TLB associated with the one or more GPUs prior to executing the task. Preloading the TLB of the GPUs reduces or avoids the cold TLB misses that would otherwise occur.
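
    A minimal sketch of the offload flow under assumed types: the CPU resolves the task's working-set pages once and ships the translations with the task assignment, and the GPU preloads its TLB before running, so the subsequent accesses incur no cold misses:

```cpp
#include <cstdint>
#include <iostream>
#include <unordered_map>
#include <vector>

using VPage = uint64_t;
using PFrame = uint64_t;

struct Task {
    std::vector<VPage> working_set;                      // pages it will touch
    std::vector<std::pair<VPage, PFrame>> translations;  // sent with the task
};

struct GpuTLB {
    std::unordered_map<VPage, PFrame> entries;
    int cold_misses = 0;
    PFrame access(VPage v, const std::unordered_map<VPage, PFrame>& pt) {
        auto it = entries.find(v);
        if (it != entries.end()) return it->second;
        ++cold_misses;                                   // would stall the GPU
        return entries[v] = pt.at(v);
    }
};

int main() {
    std::unordered_map<VPage, PFrame> page_table = {{1, 0xA}, {2, 0xB}};
    Task t{{1, 2}, {}};
    // CPU side: look up the task's pages once, attach the translations.
    for (VPage v : t.working_set)
        t.translations.push_back({v, page_table.at(v)});
    // GPU side: preload the TLB from the task assignment, then execute.
    GpuTLB tlb;
    for (auto& [v, p] : t.translations) tlb.entries[v] = p;
    for (VPage v : t.working_set) tlb.access(v, page_table);
    std::cout << "cold misses: " << tlb.cold_misses << "\n";  // 0 after preload
}
```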


    SYSTEM PERFORMANCE MANAGEMENT USING PRIORITIZED COMPUTE UNITS

    Publication No.: US20220114097A1

    Publication date: 2022-04-14

    Application No.: US17556348

    Filing date: 2021-12-20

    Abstract: Methods, devices, and systems for managing performance of a processor having multiple compute units. An effective number of the multiple compute units to designate as having priority may be determined. On a condition that the effective number is nonzero, that number of the compute units may each be designated as a priority compute unit. Priority compute units may have access to a shared cache, whereas non-priority compute units may not. Workgroups may be preferentially dispatched to priority compute units, and memory access requests from priority compute units may be served ahead of requests from non-priority compute units.
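
    The sketch below illustrates one way such a policy could look (the effective number, the per-CU capacity, and the dispatch order are all assumptions): the first kEffective compute units are marked priority, only they are treated as shared-cache clients, and workgroups fall back to non-priority units only when the priority units are full:

```cpp
#include <iostream>
#include <vector>

struct ComputeUnit {
    int id;
    bool priority;       // priority CUs may allocate in the shared cache
    int workgroups = 0;
};

int main() {
    const int kEffective = 2;  // effective number of priority CUs (assumed)
    const int kSlots = 2;      // per-CU workgroup capacity (assumed)
    std::vector<ComputeUnit> cus;
    for (int i = 0; i < 4; ++i) cus.push_back({i, i < kEffective});

    // Dispatcher: try priority CUs first, then the rest.
    auto pick = [&]() -> ComputeUnit* {
        for (int pass = 0; pass < 2; ++pass)
            for (auto& cu : cus)
                if (cu.priority == (pass == 0) && cu.workgroups < kSlots)
                    return &cu;
        return nullptr;  // everything is full
    };
    for (int wg = 0; wg < 6; ++wg)
        if (ComputeUnit* cu = pick()) ++cu->workgroups;

    for (const auto& cu : cus)
        std::cout << "CU" << cu.id
                  << " shared-cache=" << (cu.priority ? "yes" : "no")
                  << " workgroups=" << cu.workgroups << "\n";
}
```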

    SYNCHRONIZATION MECHANISM FOR WORKGROUPS
    29.
    Invention application

    Publication No.: US20200379820A1

    Publication date: 2020-12-03

    Application No.: US16425881

    Filing date: 2019-05-29

    Abstract: A technique for synchronizing workgroups is provided. Multiple workgroups execute a wait instruction that specifies a condition variable and a condition. A workgroup scheduler stops execution of a workgroup that executes a wait instruction, and an advanced controller begins monitoring the condition variable. In response to the advanced controller detecting that the condition is met, the workgroup scheduler determines whether there is a high contention scenario. High contention arises when the wait instruction is part of a mutual exclusion synchronization primitive, and is detected by observing a low number of updates to the condition variable before the condition is met. In a high contention scenario, the workgroup scheduler wakes up one workgroup and schedules another workgroup to be woken up at a time in the future. In a non-contention scenario, more than one workgroup can be woken up at the same time.
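
    A minimal sketch of the contention heuristic under assumed bookkeeping: updates to the condition variable are counted while workgroups wait, and a count at or below an illustrative threshold is read as a mutex-style (high contention) pattern, so only one waiter is woken immediately:

```cpp
#include <deque>
#include <iostream>

struct ConditionVar {
    int value = 0;
    int updates_while_waiting = 0;
    std::deque<int> waiters;  // workgroup IDs stopped on this variable
};

void wait(ConditionVar& cv, int wg) { cv.waiters.push_back(wg); }

void update(ConditionVar& cv, int v) {
    cv.value = v;
    ++cv.updates_while_waiting;  // advanced controller tracks updates
}

// Called when the advanced controller sees the awaited condition hold.
void on_condition_met(ConditionVar& cv) {
    const int kLowUpdateThreshold = 2;  // assumed heuristic cutoff
    bool high_contention = cv.updates_while_waiting <= kLowUpdateThreshold;
    if (high_contention) {
        // Wake one workgroup now; the rest are scheduled for future wake-ups.
        std::cout << "wake WG" << cv.waiters.front() << " now, defer "
                  << cv.waiters.size() - 1 << " waiter(s)\n";
    } else {
        for (int wg : cv.waiters) std::cout << "wake WG" << wg << "\n";
    }
    cv.waiters.clear();
}

int main() {
    ConditionVar cv;
    wait(cv, 0); wait(cv, 1); wait(cv, 2);  // three workgroups block
    update(cv, 1);                          // single release-style update
    on_condition_met(cv);                   // few updates => high contention
}
```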

    Monitor support on accelerated processing device

    Publication No.: US10558418B2

    Publication date: 2020-02-11

    Application No.: US15661843

    Filing date: 2017-07-27

    Abstract: A technique for implementing synchronization monitors on an accelerated processing device (“APD”) is provided. Work on an APD includes workgroups that include one or more wavefronts. All wavefronts of a workgroup execute on a single compute unit. A monitor is a synchronization construct that allows workgroups to stall until a particular condition is met. Responsive to all wavefronts of a workgroup executing a wait instruction, the monitor coordinator records the workgroup in an “entry queue.” The workgroup begins saving its state to a general APD memory and, when such saving is complete, the monitor coordinator moves the workgroup to a “condition queue.” When the condition specified by the wait instruction is met, the monitor coordinator moves the workgroup to a “ready queue,” and, when sufficient resources are available on a compute unit, the APD schedules the ready workgroup for execution on a compute unit.
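
    The queue structure from the abstract can be sketched directly; the transition triggers below are simplified (for example, save completions are assumed to arrive in entry order), and all identifiers are illustrative:

```cpp
#include <deque>
#include <iostream>

struct MonitorCoordinator {
    std::deque<int> entry, condition, ready;  // workgroup IDs

    // All wavefronts of the workgroup executed the wait instruction.
    void on_all_wavefronts_waited(int wg) { entry.push_back(wg); }

    // The workgroup's state finished saving to general APD memory.
    void on_state_saved(int wg) {
        entry.pop_front();  // assumed: completions arrive in entry order
        condition.push_back(wg);
    }

    // The condition specified by the wait instruction is now met.
    void on_condition_met() {
        while (!condition.empty()) {
            ready.push_back(condition.front());
            condition.pop_front();
        }
    }

    // A compute unit has sufficient resources: reschedule a ready workgroup.
    int schedule_if_resources(bool cu_has_room) {
        if (!cu_has_room || ready.empty()) return -1;
        int wg = ready.front();
        ready.pop_front();
        return wg;
    }
};

int main() {
    MonitorCoordinator mc;
    mc.on_all_wavefronts_waited(7);
    mc.on_state_saved(7);
    mc.on_condition_met();
    std::cout << "scheduled WG" << mc.schedule_if_resources(true) << "\n"; // 7
}
```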
