METHOD AND SYSTEM FOR PARTIAL WAVEFRONT MERGER

    公开(公告)号:US20200019530A1

    公开(公告)日:2020-01-16

    申请号:US16042592

    申请日:2018-07-23

    Abstract: A method and system for partial wavefront merger is described. Vector processing machines employ the partial wavefront merger to merge partial wavefronts into one or more wavefronts. The system includes a partial wavefront manager and unified registers. The partial wavefront manager detects wavefronts in different single-instruction-multiple-data (“SIMD”) units which contain inactive work items and active work items (hereinafter referred to as “partial wavefronts”), moves the partial wavefronts into one or more SIMD unit(s) and merges the partial wavefronts into one or more wavefront(s). The unified register allows each active work item in the one or more merged wavefront(s) to access the previously allocated registers in the originating SIMD units. Consequently, the contents of the unified registers do not have to be copied to the SIMD unit(s) executing the one or merged wavefront(s).

    HIGH-SPEED SELECTIVE CACHE INVALIDATES AND WRITE-BACKS ON GPUS

    公开(公告)号:US20180181488A1

    公开(公告)日:2018-06-28

    申请号:US15390080

    申请日:2016-12-23

    Abstract: Techniques for performing cache invalidates and write-backs in an accelerated processing device (e.g., a graphics processing device that renders three-dimensional graphics) are disclosed. The techniques involve receiving requests from a “master” (e.g., the central processing unit). The techniques involve invalidating virtual-to-physical address translations in an address translation request. The techniques include splitting up the requests based on whether the requests target virtually or physically tagged caches. Addresses for the portions of a request that target physically tagged caches are translated using invalidated virtual-to-physical address translations for speed. The split up request is processed to generate micro-transactions for individual caches targeted by the request. Micro-transactions for physically and virtually tagged caches are processed in parallel. Once all micro-transactions for a request have been processed, the unit that made the request is notified.

    Shader writes to compressed resources

    公开(公告)号:US10535178B2

    公开(公告)日:2020-01-14

    申请号:US15389075

    申请日:2016-12-22

    Abstract: Systems, apparatuses, and methods for performing shader writes to compressed surfaces are disclosed. In one embodiment, a processor includes at least a memory and one or more shader units. In one embodiment, a shader unit of the processor is configured to receive a write request targeted to a compressed surface. The shader unit is configured to identify a first block of the compressed surface targeted by the write request. Responsive to determining the data of the write request targets less than the entirety of the first block, the first shader unit reads the first block from the cache and decompress the first block. Next, the first shader unit merges the data of the write request with the decompressed first block. Then, the shader unit compresses the merged data and writes the merged data to the cache.

    Flexible shader export design in multiple computing cores

    公开(公告)号:US10606740B2

    公开(公告)日:2020-03-31

    申请号:US15607118

    申请日:2017-05-26

    Abstract: Systems, apparatuses, and methods for generating flexibly addressed memory requests are disclosed. In one embodiment, a system includes a processor, control unit, and memory subsystem. The processor launches a plurality of threads on a plurality of compute units, wherein each thread generates memory requests without specifying target memory addresses. The threads executing on the plurality of compute units convey a plurality of memory requests to the control unit. The control unit generates target memory addresses for the plurality of received memory requests. In one embodiment, the memory requests are write requests, and the control unit interleaves write requests from the plurality of threads into a single output buffer stored in the memory subsystem. The control unit can be located in a cache, in a memory controller, or in another location within the system.

    DATA DRIVEN SCHEDULER ON MULTIPLE COMPUTING CORES

    公开(公告)号:US20170185451A1

    公开(公告)日:2017-06-29

    申请号:US14981257

    申请日:2015-12-28

    CPC classification number: G06F9/5016 G06F9/5027 G06F9/5066

    Abstract: Methods, devices, and systems for data driven scheduling of a plurality of computing cores of a processor. A plurality of threads may be executed on the plurality of computing cores, according to a default schedule. The plurality of threads may be analyzed, based on the execution, to determine correlations among the plurality of threads. A data driven schedule may be generated based on the correlations. The plurality of threads may be executed on the plurality of computing cores according to the data driven schedule.

    INPUT/OUTPUT MEMORY MAP UNIT AND NORTHBRIDGE
    8.
    发明申请
    INPUT/OUTPUT MEMORY MAP UNIT AND NORTHBRIDGE 有权
    输入/输出存储器映射单元和北桥

    公开(公告)号:US20150120978A1

    公开(公告)日:2015-04-30

    申请号:US14523705

    申请日:2014-10-24

    CPC classification number: G06F12/1009 G06F12/1045 G06F12/12 G06F2212/684

    Abstract: The present invention provides for page table access and dirty bit management in hardware via a new atomic test[0] and OR and Mask. The present invention also provides for a gasket that enables ACE to CCI translations. This gasket further provides request translation between ACE and CCI, deadlock avoidance for victim and probe collision, ARM barrier handling, and power management interactions. The present invention also provides a solution for ARM victim/probe collision handling which deadlocks the unified northbridge. These solutions includes a dedicated writeback virtual channel, probes for IO requests using 4-hop protocol, and a WrBack Reorder Ability in MCT where victims update older requests with data as they pass the requests.

    Abstract translation: 本发明通过新的原子测试[0]和OR和Mask来提供在硬件中的页表访问和脏位管理。 本发明还提供了一种使ACE能够进行CCI翻译的垫圈。 该垫片进一步提供了ACE和CCI之间的请求转换,针对受害者和探针冲突的死锁避免,ARM屏障处理和电源管理交互。 本发明还提供了一种用于ARM受害者/探测器碰撞处理的解决方案,其使统一的北桥陷入僵局。 这些解决方案包括一个专用的回写虚拟通道,使用4跳协议的IO请求的探测器和MCT中的WrBack重新排序能力,其中受害者通过数据通过请求时更新旧的请求。

    Method and apparatus for translation lookaside buffer with multiple compressed encodings

    公开(公告)号:US10540290B2

    公开(公告)日:2020-01-21

    申请号:US15139902

    申请日:2016-04-27

    Abstract: Methods and apparatus obtain one or more system page table entries that represent virtual system (e.g., memory) page to physical system page translations. A number of the obtained system page table entries that can be encoded in each of a plurality of translation lookaside buffer (TLB) entry encoding formats are determined. The method and apparatus may select one of the TLB entry encoding formats that encode a number of the obtained system page table entries. The method and apparatus may encode a number of obtained system page table entries in the TLB entry encoding format selected into a compressed encoding format TLB entry. The method and apparatus may associate the compressed encoding format TLB entry with an encoding format indication of the encoding format selected. The method and apparatus may decode a compressed encoding format TLB entry based on a determined TLB entry encoding format.

Patent Agency Ranking