DYNAMIC CACHE PREFETCHING BASED ON POWER GATING AND PREFETCHING POLICIES
    13.
    Invention Application
    DYNAMIC CACHE PREFETCHING BASED ON POWER GATING AND PREFETCHING POLICIES (Pending - Published)

    Publication No.: US20160034023A1

    Publication Date: 2016-02-04

    Application No.: US14448096

    Filing Date: 2014-07-31

    Abstract: A system may determine that a processor has powered up. The system may determine a first prefetching policy based on determining that the processor has powered up. The system may fetch information, from a main memory and for storage by a cache associated with the processor, using the first prefetching policy. The system may determine, after fetching information using the first prefetching policy, to apply a second prefetching policy that is different than the first prefetching policy. The system may fetch information, from the main memory and for storage by the cache, using the second prefetching policy.
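The power-up-triggered policy switch described in the abstract can be modeled in a few lines. The sketch below is purely illustrative and not the patented implementation: the class, policy names, and warm-up heuristic are all assumptions chosen to show one plausible flow (aggressive prefetching while the cache is cold after power-up, then a switch to a different, conservative policy).

```python
# Illustrative model (hypothetical names throughout): a cache that applies a
# first prefetching policy right after processor power-up, then determines to
# apply a second, different policy once a warm-up window has elapsed.

POLICIES = {
    "aggressive": 4,    # prefetch 4 lines ahead while the cache is cold
    "conservative": 1,  # prefetch 1 line ahead once the cache has warmed up
}

class Cache:
    def __init__(self, warmup_accesses=8):
        self.lines = set()
        self.policy = "aggressive"        # first policy, chosen at power-up
        self.accesses = 0
        self.warmup_accesses = warmup_accesses

    def access(self, addr):
        """Fetch addr plus policy-dependent lookahead lines from main memory."""
        self.accesses += 1
        depth = POLICIES[self.policy]
        for a in range(addr, addr + depth + 1):
            self.lines.add(a)             # lines stored in the cache
        # After the warm-up window, switch to the second, different policy.
        if self.accesses >= self.warmup_accesses and self.policy == "aggressive":
            self.policy = "conservative"

cache = Cache(warmup_accesses=2)
cache.access(100)   # aggressive: fetches lines 100..104
cache.access(200)   # aggressive, then the policy switches
cache.access(300)   # conservative: fetches lines 300..301
```

The trigger here is an access count; the patent leaves the switching criterion open, so any signal (timer, miss rate, power state) could drive the same transition.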


    ADVANCED HARDWARE SCHEDULING USING UNMAPPED-QUEUE DOORBELLS

    Publication No.: US20240330046A1

    Publication Date: 2024-10-03

    Application No.: US18374837

    Filing Date: 2023-09-29

    CPC classification number: G06F9/4881 G06F9/5038 G06F2209/5021

    Abstract: A processing device includes a hardware scheduler, an unmapped queue unit, a command processor, and a plurality of compute units. Responsive to a queue doorbell being an unmapped-queue doorbell, the unmapped queue unit is configured to transmit a signal to the hardware scheduler indicating that work has been placed into a queue currently unmapped to a hardware queue of the processing device. The hardware scheduler is configured to map the queue to a hardware queue of a plurality of hardware queues at the processing device in response to the signal. The command processor is configured to dispatch the work associated with the mapped queue to one or more compute units of the plurality of compute units.
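The doorbell-driven flow above (signal on an unmapped-queue doorbell, map, then dispatch) can be sketched as a small model. All names here (`HardwareScheduler`, `ring_doorbell`) are invented for illustration and do not reflect an actual device API.

```python
# Hedged sketch of the unmapped-queue doorbell flow: ringing a doorbell for a
# software queue with no hardware queue first signals the scheduler to map it,
# after which the command-processor role dispatches the queued work.

class HardwareScheduler:
    def __init__(self, num_hw_queues=2):
        self.hw_queues = [None] * num_hw_queues   # hardware queue slots
        self.dispatched = []                      # work sent to compute units

    def ring_doorbell(self, queue):
        if queue not in self.hw_queues:
            # Unmapped-queue doorbell: signal prompts the scheduler to map.
            self._map(queue)
        # Command processor dispatches work from the now-mapped queue.
        self.dispatched.extend(queue["work"])
        queue["work"].clear()

    def _map(self, queue):
        slot = self.hw_queues.index(None)  # assumes a free slot is available
        self.hw_queues[slot] = queue

sched = HardwareScheduler()
q = {"name": "q0", "work": ["kernelA", "kernelB"]}
sched.ring_doorbell(q)   # maps q0, then dispatches its work
```

In real hardware the mapping step may require evicting another queue when no slot is free; the sketch assumes a free slot to keep the core idea visible.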

    Compiler-initiated tile replacement to enable hardware acceleration resources

    Publication No.: US11853734B2

    Publication Date: 2023-12-26

    Application No.: US17740828

    Filing Date: 2022-05-10

    CPC classification number: G06F8/4435 G06F17/16

    Abstract: A processing system includes a compiler that automatically identifies sequences of instructions of tileable source code that can be replaced with tensor operations. The compiler generates enhanced code that replaces the identified sequences of instructions with tensor operations that invoke a special-purpose hardware accelerator. By automatically replacing instructions with tensor operations that invoke the special-purpose hardware accelerator, the compiler makes the performance improvements achievable through the special-purpose hardware accelerator available to programmers using high-level programming languages.
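A toy pass can illustrate the replacement idea: scan a linear instruction stream for a known tileable sequence and substitute a single tensor operation. The IR encoding, the pattern, and the `tensor_matmul` op are all invented for this sketch and are not the compiler's actual representation.

```python
# Illustrative compiler pass (hypothetical IR): replace each occurrence of a
# tileable multiply-accumulate sequence with one tensor op that would invoke
# a special-purpose hardware accelerator.

MATMUL_PATTERN = ["load_a", "load_b", "mul", "add", "store_c"]

def replace_tiles(ir):
    """Return a new IR with each MATMUL_PATTERN collapsed to 'tensor_matmul'."""
    out, i, n = [], 0, len(MATMUL_PATTERN)
    while i < len(ir):
        if ir[i:i + n] == MATMUL_PATTERN:
            out.append("tensor_matmul")   # dispatches to the accelerator
            i += n
        else:
            out.append(ir[i])             # non-tileable ops pass through
            i += 1
    return out

ir = ["load_a", "load_b", "mul", "add", "store_c", "branch"]
print(replace_tiles(ir))   # ['tensor_matmul', 'branch']
```

The benefit stated in the abstract follows directly: the programmer writes ordinary high-level loops, and the pass, not the programmer, decides when the accelerator is invoked.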

    Compiler-initiated tile replacement to enable hardware acceleration resources

    Publication No.: US11347486B2

    Publication Date: 2022-05-31

    Application No.: US16832275

    Filing Date: 2020-03-27

    Abstract: A processing system includes a compiler that automatically identifies sequences of instructions of tileable source code that can be replaced with tensor operations. The compiler generates enhanced code that replaces the identified sequences of instructions with tensor operations that invoke a special-purpose hardware accelerator. By automatically replacing instructions with tensor operations that invoke the special-purpose hardware accelerator, the compiler makes the performance improvements achievable through the special-purpose hardware accelerator available to programmers using high-level programming languages.

Randomly branching using hardware watchpoints
    19.
    Granted Invention Patent
    Randomly branching using hardware watchpoints (In Force)

    Publication No.: US09483379B2

    Publication Date: 2016-11-01

    Application No.: US14054356

    Filing Date: 2013-10-15

    Abstract: A system and method for efficiently performing program instrumentation. A processor processes instructions stored in a memory. The processor allocates a memory region for the purpose of creating "random branches" in the computer code utilizing existing memory access instructions. When the processor processes a given instruction, the processor accesses a first location in the memory region and may determine that a condition is satisfied. In response, the processor generates an interrupt, and the corresponding interrupt handler may transfer control flow from the computer program to instrumentation code. The condition may be that a pointer storing an address within the memory region equals a given address after the pointer is updated. Alternatively, the condition may be that an updated data value, stored in the location pointed to by the given address, equals a threshold value.
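The pointer-equals-address condition above can be modeled in software. Real hardware watchpoints use debug registers and fire asynchronously; this sketch, with invented names throughout, only shows the control-flow shape: ordinary memory accesses advance a pointer, and when the updated pointer equals the watched address, a callback stands in for the interrupt handler that branches into instrumentation code.

```python
# Software model of the watchpoint-driven "random branch" (illustrative only):
# each access touches the allocated memory region and advances a pointer; a
# match against the watched address simulates the interrupt into
# instrumentation code.

class WatchpointRegion:
    def __init__(self, size, watched_index, on_trigger):
        self.region = [0] * size
        self.ptr = 0
        self.watched_index = watched_index
        self.on_trigger = on_trigger   # stands in for the interrupt handler

    def access(self):
        """One memory-access instruction: touch the region, advance the pointer."""
        self.region[self.ptr] += 1
        self.ptr = (self.ptr + 1) % len(self.region)
        # Condition: the updated pointer equals the watched (given) address.
        if self.ptr == self.watched_index:
            self.on_trigger()          # "random branch" into instrumentation

hits = []
wp = WatchpointRegion(size=4, watched_index=0, on_trigger=lambda: hits.append(1))
for _ in range(8):
    wp.access()
print(len(hits))   # 2: the pointer wraps back to index 0 twice in 8 accesses
```

Because the accesses are ordinary loads and stores already present in the code, the instrumentation points fall where the program's own access pattern lands, which is what makes the branches appear "random".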


HETEROGENEOUS FUNCTION UNIT DISPATCH IN A GRAPHICS PROCESSING UNIT
    20.
    Invention Application
    HETEROGENEOUS FUNCTION UNIT DISPATCH IN A GRAPHICS PROCESSING UNIT (Pending - Published)

    Publication No.: US20160085551A1

    Publication Date: 2016-03-24

    Application No.: US14490213

    Filing Date: 2014-09-18

    CPC classification number: G06F9/3887 G06F9/3851

    Abstract: A compute unit configured to execute multiple threads in parallel is presented. The compute unit includes one or more single instruction multiple data (SIMD) units and a fetch and decode logic. The SIMD units have differing numbers of arithmetic logic units (ALUs), such that each SIMD unit can execute a different number of threads. The fetch and decode logic is in communication with each of the SIMD units, and is configured to assign the threads to the SIMD units for execution based on such differing numbers of ALUs.
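One way the fetch-and-decode assignment could work is a width-matching heuristic: send a thread group to the narrowest SIMD unit that can hold it, and fall back to the widest unit for large groups. The widths and the heuristic below are assumptions made for this sketch, not the patented dispatch logic.

```python
# Hedged sketch of heterogeneous SIMD dispatch: units have differing numbers
# of ALUs, and the fetch/decode stage assigns threads based on those counts.

SIMD_WIDTHS = [16, 8, 4]   # ALUs per SIMD unit, one entry per unit

def assign(num_threads):
    """Pick the narrowest SIMD unit wide enough for the thread group,
    falling back to the widest unit for oversized groups."""
    for width in sorted(SIMD_WIDTHS):
        if num_threads <= width:
            return width
    return max(SIMD_WIDTHS)   # large groups run on the widest unit over cycles

print(assign(3))    # 4  -> a small group fills the 4-ALU unit efficiently
print(assign(10))   # 16 -> needs the widest unit to run in one pass
print(assign(64))   # 16 -> oversized groups iterate on the widest unit
```

The payoff of matching group size to ALU count is utilization: a 3-thread group on a 16-ALU unit would idle 13 lanes, while the 4-ALU unit idles only one.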

