VARIABLE DISPATCH WALK FOR SUCCESSIVE CACHE ACCESSES

    Publication No.: US20230195626A1

    Publication Date: 2023-06-22

    Application No.: US17558008

    Filing Date: 2021-12-21

    CPC classification number: G06F12/0806 G06F12/10 G06F2212/1016

    Abstract: A processing system is configured to translate a first cache access pattern of a dispatch of work items to a cache access pattern that facilitates consumption of data stored at a cache of a parallel processing unit by a subsequent access before the data is evicted to a more remote level of the memory hierarchy. For consecutive cache accesses having read-after-read data locality, in some embodiments the processing system translates the first cache access pattern to a space-filling curve. In some embodiments, for consecutive accesses having read-after-write data locality, the processing system translates a first typewriter cache access pattern that proceeds in ascending order for a first access to a reverse typewriter cache access pattern that proceeds in descending order for a subsequent cache access. By translating the cache access pattern based on data locality, the processing system increases the hit rate of the cache.
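The effect described in this abstract can be illustrated with a small simulation. The sketch below is not taken from the patent: the tile grid, the LRU cache model, and the function names (`typewriter`, `reverse_typewriter`, `z_order`, `count_second_pass_hits`) are assumptions made for illustration, and Z-order is used as only one example of a space-filling curve.

```python
from collections import OrderedDict


def typewriter(width, height):
    """Row-major walk: left to right, top to bottom (ascending order)."""
    return [(x, y) for y in range(height) for x in range(width)]


def reverse_typewriter(width, height):
    """The same tiles visited in descending order."""
    return list(reversed(typewriter(width, height)))


def z_order(width, height):
    """Z-order (Morton) walk, one example of a space-filling curve."""
    bits = max(width, height).bit_length()

    def interleave(x, y):
        code = 0
        for bit in range(bits):
            code |= ((x >> bit) & 1) << (2 * bit)
            code |= ((y >> bit) & 1) << (2 * bit + 1)
        return code

    return sorted(typewriter(width, height), key=lambda p: interleave(*p))


def count_second_pass_hits(first_pass, second_pass, capacity):
    """Count hits in the second pass of a hypothetical LRU cache of tiles."""
    cache = OrderedDict()

    def touch(tile):
        hit = tile in cache
        if hit:
            cache.move_to_end(tile)
        else:
            cache[tile] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used tile
        return hit

    for tile in first_pass:
        touch(tile)
    return sum(touch(tile) for tile in second_pass)
```

In this toy model, with a 4x4 grid and room for four tiles, repeating the ascending walk for the second access finds nothing in the cache, while the descending walk starts with the four tiles written last and hits on all of them.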

    APPROACH FOR PERFORMING EFFICIENT MEMORY OPERATIONS USING NEAR-MEMORY COMPUTE ELEMENTS

    Publication No.: US20230195618A1

    Publication Date: 2023-06-22

    Application No.: US17557568

    Filing Date: 2021-12-21

    CPC classification number: G06F12/06

    Abstract: Near-memory compute elements perform memory operations and temporarily store at least a portion of address information for the memory operations in local storage. A broadcast memory command is then issued to the near-memory compute elements that causes the near-memory compute elements to perform a subsequent memory operation using their respective address information stored in the local storage. This allows a single broadcast memory command to be used to perform memory operations across multiple memory elements, such as DRAM banks, using bank-specific address information. In one implementation, the approach is used to process workloads with irregular updates to memory while consuming less command bus bandwidth than conventional approaches. Implementations include using conditional flags to selectively designate address information in local storage that is to be processed with the broadcast memory command.
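A minimal software model of the broadcast idea in this abstract follows. It is a sketch under stated assumptions, not the patented design: the class `NearMemoryElement`, its methods, and the `broadcast` helper are hypothetical names, each bank is modeled as a plain list, and the returned command count simply stands in for command bus traffic.

```python
class NearMemoryElement:
    """Hypothetical model of one near-memory compute element and its bank."""

    def __init__(self, size):
        self.bank = [0] * size
        self.saved_addr = None  # local storage for address information
        self.flag = False       # conditional flag: take part in the broadcast?

    def read_and_latch(self, addr, flag=True):
        """Perform a read and keep its address for a later broadcast command."""
        self.saved_addr = addr
        self.flag = flag
        return self.bank[addr]

    def on_broadcast(self, update):
        """Apply the broadcast operation at the locally stored address."""
        if self.flag and self.saved_addr is not None:
            self.bank[self.saved_addr] = update(self.bank[self.saved_addr])


def broadcast(elements, update):
    """Deliver one command to every element; returns the bus commands used."""
    for element in elements:
        element.on_broadcast(update)
    return 1
```

Each element first latches a different address during an ordinary read. A single call to `broadcast` then updates a different location in every flagged bank, which is the bandwidth saving the abstract describes for irregular updates.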

    HARDWARE ACCELERATED DYNAMIC WORK CREATION ON A GRAPHICS PROCESSING UNIT

    Publication No.: US20230185607A1

    Publication Date: 2023-06-15

    Application No.: US17993490

    Filing Date: 2022-11-23

    CPC classification number: G06F9/4881 G06F9/542 G06F9/545 G06F9/546 G06F9/3877

    Abstract: A processor core is configured to execute a parent task that is described by a data structure stored in a memory. A coprocessor is configured to dispatch a child task to the at least one processor core in response to the coprocessor receiving a request from the parent task concurrently with the parent task executing on the at least one processor core. In some cases, the parent task registers the child task in a task pool and the child task is a future task that is configured to monitor a completion object and enqueue another task associated with the future task in response to detecting the completion object. The future task is configured to self-enqueue by adding a continuation future task to a continuation queue for subsequent execution in response to the future task failing to detect the completion object.
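The self-enqueueing behavior of a future task can be sketched in a few lines. Everything below is an illustrative assumption rather than the patent's implementation: `FutureTask`, `poll_round`, the set of signaled completion objects, and the two queues are hypothetical stand-ins for the hardware task pool and coprocessor.

```python
from collections import deque


class FutureTask:
    """Hypothetical future task that waits on a named completion object."""

    def __init__(self, completion_object, next_task):
        self.completion_object = completion_object
        self.next_task = next_task

    def run(self, signaled, ready_queue, continuation_queue):
        if self.completion_object in signaled:
            # Completion detected: enqueue the task associated with this future.
            ready_queue.append(self.next_task)
        else:
            # Not detected yet: self-enqueue a continuation for a later check.
            continuation_queue.append(
                FutureTask(self.completion_object, self.next_task)
            )


def poll_round(signaled, ready_queue, continuation_queue):
    """Run every future currently waiting in the continuation queue once."""
    for _ in range(len(continuation_queue)):
        future = continuation_queue.popleft()
        future.run(signaled, ready_queue, continuation_queue)
```

A future that does not see its completion object reappears in the continuation queue after each round. Once the object is signaled, the next round moves the dependent task to the ready queue and the future is not re-enqueued.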

    Dual vector arithmetic logic unit
    Type: Granted invention patent

    Publication No.: US11675568B2

    Publication Date: 2023-06-13

    Application No.: US17121354

    Filing Date: 2020-12-14

    CPC classification number: G06F7/57 G06F9/3867 G06F17/16 G06T1/20 G06F15/8015

    Abstract: A processing system executes wavefronts at multiple arithmetic logic unit (ALU) pipelines of a single instruction multiple data (SIMD) unit in a single execution cycle. The ALU pipelines each include a number of ALUs that execute instructions on wavefront operands that are collected from vector general-purpose register (VGPR) banks at a cache and output results of the instructions executed on the wavefronts at a buffer. By storing wavefronts supplied by the VGPR banks at the cache, a greater number of wavefronts can be made available to the SIMD unit without increasing the VGPR bandwidth, enabling multiple ALU pipelines to execute instructions during a single execution cycle.

    READ CLOCK START AND STOP FOR SYNCHRONOUS MEMORIES

    Publication No.: US20230176608A1

    Publication Date: 2023-06-08

    Application No.: US17850299

    Filing Date: 2022-06-27

    CPC classification number: G06F1/08 G06F1/10

    Abstract: A memory includes a read clock state machine and a read clock driver circuit. The read clock state machine has a first input for receiving a read command signal, a second input for receiving a read clock mode signal, and an output for providing a drive enable signal. The read clock driver circuit has an output for providing a read clock signal in response to a clock signal when the drive enable signal is active. When the read clock mode signal indicates a read-only mode, the read clock state machine starts toggling the read clock signal during a read preamble period before a data transmission of a first read command, and continues toggling the read clock signal for at least a read postamble period following the data transmission of the first read command.

    Write bank group mask during arbitration

    Publication No.: US11669274B2

    Publication Date: 2023-06-06

    Application No.: US17218676

    Filing Date: 2021-03-31

    Abstract: A memory controller includes an arbiter for selecting memory requests from a command queue for transmission to a dynamic random access memory (DRAM). The arbiter includes a bank group tracking circuit that tracks bank group numbers of three or more prior write requests selected by the arbiter. The arbiter also includes a selection circuit that selects requests to be issued from the command queue, and prevents selection of write requests and associated activate commands to the tracked bank group numbers unless no other write request is eligible in the command queue. The bank group tracking circuit indicates that a prior write request and the associated activate commands are eligible to be issued after a number of clock cycles has passed corresponding to a minimum write-to-write timing period for a bank group of the prior write request.
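The masking policy in this abstract can be approximated with a short simulation. This is a sketch under assumptions, not the claimed circuit: `BankGroupTracker`, `select_write`, the tracking depth of three, and the four-cycle window are illustrative choices, the command queue is a list of bank group numbers, and real DRAM timing checks are not modeled.

```python
from collections import deque


class BankGroupTracker:
    """Hypothetical tracker of the bank groups of recently selected writes."""

    def __init__(self, depth=3, write_to_write_cycles=4):
        self.recent = deque(maxlen=depth)  # entries: [bank_group, cycles_left]
        self.window = write_to_write_cycles

    def record(self, bank_group):
        self.recent.append([bank_group, self.window])

    def masked(self):
        return {bank_group for bank_group, _ in self.recent}

    def tick(self):
        """Advance one clock cycle and drop entries whose window has elapsed."""
        for entry in self.recent:
            entry[1] -= 1
        self.recent = deque(
            (e for e in self.recent if e[1] > 0), maxlen=self.recent.maxlen
        )


def select_write(queue, tracker):
    """Prefer the oldest write to an untracked bank group; else the oldest."""
    if not queue:
        return None
    masked = tracker.masked()
    index = next((i for i, bg in enumerate(queue) if bg not in masked), 0)
    bank_group = queue.pop(index)
    tracker.record(bank_group)
    return bank_group
```

Calling `select_write` and then `tracker.tick()` once per cycle on a queue of bank groups `[0, 0, 1, 0, 2]` spreads the first three writes across bank groups 0, 1, and 2. It falls back to the masked bank group 0 only when no other write remains.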
