DECOMPOSING MATRICES FOR PROCESSING AT A PROCESSOR-IN-MEMORY

    Publication number: US20230102296A1

    Publication date: 2023-03-30

    Application number: US17490037

    Application date: 2021-09-30

    Abstract: A processing unit decomposes a matrix for partial processing at a processor-in-memory (PIM) device. The processing unit receives a matrix to be used as an operand in an arithmetic operation (e.g., a matrix multiplication operation). In response, the processing unit decomposes the matrix into two component matrices: a sparse component matrix and a dense component matrix. The processing unit itself performs the arithmetic operation with the dense component matrix, but sends the sparse component matrix to the PIM device for execution of the arithmetic operation. The processing unit thereby offloads at least some of the processing overhead to the PIM device, improving overall efficiency of the processing system.
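The decomposition described above can be sketched with NumPy. The magnitude threshold used to split entries between the sparse and dense components is a hypothetical choice for illustration; the abstract does not specify the split criterion:

```python
import numpy as np

def decompose(A, threshold=2.0):
    """Split A into two components such that A = dense + sparse exactly.
    Entries above the (hypothetical) magnitude threshold go into the
    sparse component, which would be offloaded to the PIM device; the
    remaining entries form the dense component kept on the host."""
    mask = np.abs(A) > threshold
    sparse = np.where(mask, A, 0.0)   # few large entries, mostly zeros
    dense = np.where(mask, 0.0, A)    # the bulk of the matrix
    return dense, sparse

rng = np.random.default_rng(0)
A = rng.normal(size=(32, 32))
x = rng.normal(size=32)

dense, sparse = decompose(A)
# The host multiplies the dense component; the PIM device would handle
# the sparse one. Summing the partial products recovers the full result.
y = dense @ x + sparse @ x
assert np.allclose(y, A @ x)
```

Because the two components sum to the original matrix, distributivity guarantees the partial products can be combined without loss of precision beyond ordinary floating-point rounding.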

    Techniques for improving operand caching

    Publication number: US11436016B2

    Publication date: 2022-09-06

    Application number: US16703833

    Application date: 2019-12-04

    Abstract: A technique is provided for determining whether a register value should be written to an operand cache, or whether it should remain in and not be evicted from the operand cache. The technique includes executing an instruction that accesses an operand comprising the register value, performing one or both of a lookahead technique and a prediction technique to make that determination, and updating the operand cache based on the result.
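As a rough model of the lookahead side of this decision, the sketch below caches a destination register only if it is read again within a small window before being overwritten. The policy, window size, and instruction encoding are all invented for illustration, not the patented mechanism itself:

```python
def should_cache(dest_reg, upcoming, window=4):
    """upcoming: list of (dest, srcs) pairs for the instructions that
    follow the one writing dest_reg. Returns True when the value will
    be reused soon enough to be worth keeping in the operand cache."""
    for dest, srcs in upcoming[:window]:
        if dest_reg in srcs:
            return True    # read before overwrite: cache it
        if dest == dest_reg:
            return False   # overwritten first: caching buys nothing
    return False           # no use inside the lookahead window

program = [
    ("r1", ("r2", "r3")),   # r1 = r2 op r3
    ("r4", ("r1", "r5")),   # reads r1 -> r1 is worth caching
    ("r1", ("r6",)),        # overwrites r1
]
assert should_cache("r1", program[1:]) is True
assert should_cache("r2", program[1:]) is False
```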

    FINE-GRAINED CONDITIONAL DISPATCHING

    Publication number: US20220091880A1

    Publication date: 2022-03-24

    Application number: US17031424

    Application date: 2020-09-24

    Abstract: Techniques for executing workgroups are provided. The techniques include executing, for a first workgroup of a first kernel dispatch, a workgroup dependency instruction that includes an indication to prioritize execution of a second workgroup of a second kernel dispatch, and in response to the workgroup dependency instruction, dispatching the second workgroup of the second kernel dispatch prior to dispatching a third workgroup of the second kernel dispatch, wherein no workgroup dependency instruction including an indication to prioritize execution of the third workgroup has been executed.
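A toy model of this scheduling behavior, with queue structure and method names invented for illustration: executing the workgroup dependency instruction moves the named workgroup of the other dispatch to the front of its queue, so it is dispatched ahead of workgroups that nothing depends on:

```python
from collections import deque

class Dispatcher:
    """Per-dispatch FIFO queues of workgroups; prioritize() stands in
    for the effect of an executed workgroup dependency instruction."""
    def __init__(self):
        self.queues = {}

    def enqueue(self, dispatch, wg):
        self.queues.setdefault(dispatch, deque()).append(wg)

    def prioritize(self, dispatch, wg):
        # Move the named workgroup of the other kernel dispatch to the
        # front of its queue so it is dispatched next.
        q = self.queues[dispatch]
        q.remove(wg)
        q.appendleft(wg)

    def dispatch_next(self, dispatch):
        return self.queues[dispatch].popleft()

d = Dispatcher()
for wg in ("b0", "b1", "b2"):
    d.enqueue("kernel_B", wg)
d.prioritize("kernel_B", "b2")   # a workgroup of kernel A depends on b2
assert d.dispatch_next("kernel_B") == "b2"   # b2 jumps ahead of b0, b1
```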

    System performance management using prioritized compute units

    Publication number: US11204871B2

    Publication date: 2021-12-21

    Application number: US14755401

    Application date: 2015-06-30

    Abstract: Methods, devices, and systems for managing performance of a processor having multiple compute units. An effective number of the multiple compute units may be determined to designate as having priority. On a condition that the effective number is nonzero, the effective number of the multiple compute units may each be designated as a priority compute unit. Priority compute units may have access to a shared cache whereas non-priority compute units may not. Workgroups may be preferentially dispatched to priority compute units. Memory access requests from priority compute units may be served ahead of requests from non-priority compute units.
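The designation and preferential-dispatch steps can be sketched as follows. Which compute units are chosen as the priority set is a hypothetical policy (the abstract only requires that an effective number be designated); here it is simply the first N:

```python
def designate_priority(num_cus, effective):
    """Designate 'effective' compute units as priority units.
    Hypothetical policy: pick the first N unit ids."""
    return set(range(min(effective, num_cus)))

def pick_cu(free_cus, priority):
    """Preferential dispatch: take a free priority CU when one exists,
    otherwise fall back to any free CU (None if all are busy)."""
    free_priority = sorted(free_cus & priority)
    if free_priority:
        return free_priority[0]
    return min(free_cus) if free_cus else None

priority = designate_priority(num_cus=8, effective=2)
assert priority == {0, 1}
assert pick_cu({1, 3, 5}, priority) == 1   # free priority CU preferred
assert pick_cu({3, 5}, priority) == 3      # none free: fall back
```

The same priority set would gate shared-cache access and order memory requests, with priority units served first.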

    Dynamic wavefront creation for processing units using a hybrid compactor

    Publication number: US09898287B2

    Publication date: 2018-02-20

    Application number: US14682971

    Application date: 2015-04-09

    Abstract: A method, a non-transitory computer readable medium, and a processor are presented for repacking dynamic wavefronts, each comprising multiple threads, during program code execution on a processing unit. If a branch instruction is detected, a determination is made whether all wavefronts following the same control path in the program code have reached a compaction point, which is the branch instruction. If no branch instruction is detected in executing the program code, a determination is made whether all wavefronts following the same control path have reached a reconvergence point: the beginning of a program code segment to be executed by both the taken and the not-taken branch of a previous branch instruction. If all wavefronts following the same control path have reached the branch instruction or the reconvergence point, the dynamic wavefronts are repacked with all threads that follow that control path.
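The repacking step at a compaction or reconvergence point can be illustrated as below; the wavefront size and the data layout are assumptions made for the sketch (real hardware typically uses 32 or 64 lanes):

```python
WAVEFRONT_SIZE = 4   # illustrative only

def repack(threads_by_path):
    """Given thread ids grouped by the control path they followed,
    form new wavefronts so that each wavefront contains only threads
    on the same path (i.e. no divergent lanes within a wavefront)."""
    wavefronts = []
    for path, tids in threads_by_path.items():
        for i in range(0, len(tids), WAVEFRONT_SIZE):
            wavefronts.append((path, tids[i:i + WAVEFRONT_SIZE]))
    return wavefronts

packed = repack({"taken": [0, 1, 2, 3, 4], "not_taken": [5, 6]})
assert packed == [("taken", [0, 1, 2, 3]),
                  ("taken", [4]),
                  ("not_taken", [5, 6])]
```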

    SELECTING A RESOURCE FROM A SET OF RESOURCES FOR PERFORMING AN OPERATION
    Invention application (in force)

    Publication number: US20160062803A1

    Publication date: 2016-03-03

    Application number: US14935056

    Application date: 2015-11-06

    CPC classification number: G06F9/5016 G06F9/5011 G06F12/0875 G06F2212/45

    Abstract: The described embodiments comprise a selection mechanism that selects a resource from a set of resources in a computing device for performing an operation. In some embodiments, the selection mechanism performs a lookup in a table selected from a set of tables to identify a resource from the set of resources. When the resource is not available for performing the operation and until another resource is selected for performing the operation, the selection mechanism identifies a next resource in the table and selects the next resource for performing the operation when the next resource is available for performing the operation.
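The table walk the abstract describes might look like the following; wrapping around on reaching the end of the table, and giving up after one full pass, are assumptions of the sketch:

```python
def select_resource(table, start, available):
    """Look up the entry at 'start'; if that resource is not available,
    keep moving to the next entry in the table (wrapping around) until
    an available resource is found, or return None after a full pass."""
    n = len(table)
    for step in range(n):
        resource = table[(start + step) % n]
        if available(resource):
            return resource
    return None

table = ["res_a", "res_b", "res_c"]
busy = {"res_a", "res_b"}
assert select_resource(table, 0, lambda r: r not in busy) == "res_c"
assert select_resource(table, 0, lambda r: False) is None
```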


    Selecting a resource from a set of resources for performing an operation
    Invention grant (in force)

    Publication number: US09183055B2

    Publication date: 2015-11-10

    Application number: US13761985

    Application date: 2013-02-07

    CPC classification number: G06F9/5016 G06F9/5011 G06F12/0875 G06F2212/45

    Abstract: The described embodiments comprise a selection mechanism that selects a resource from a set of resources in a computing device for performing an operation. In some embodiments, the selection mechanism is configured to perform a lookup in a table selected from a set of tables to identify a resource from the set of resources. When the identified resource is not available for performing the operation and until a resource is selected for performing the operation, the selection mechanism is configured to identify a next resource in the table and select the next resource for performing the operation when the next resource is available for performing the operation.


    Multi-core processing device with invalidation cache tags and methods
    Invention grant (in force)

    Publication number: US09003130B2

    Publication date: 2015-04-07

    Application number: US13719730

    Application date: 2012-12-19

    CPC classification number: G06F12/0864 G06F12/0815

    Abstract: A data processing device is provided that facilitates cache coherence policies. In one embodiment, a data processing device utilizes invalidation tags in connection with a cache that is associated with a processing engine. In some embodiments, the cache is configured to store a plurality of cache entries where each cache entry includes a cache line configured to store data and a corresponding cache tag configured to store address information associated with data stored in the cache line. Such address information includes invalidation flags with respect to addresses stored in the cache tags. Each cache tag is associated with an invalidation tag configured to store information related to invalidation commands of addresses stored in the cache tag. In such embodiment, the cache is configured to set invalidation flags of cache tags based upon information stored in respective invalidation tags.
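A minimal model of the two-step flow: an invalidation command is first recorded in the invalidation tag, and the cache later sets the flag in the corresponding cache tag from that stored information. Class and field names here are invented for illustration:

```python
class CacheEntry:
    def __init__(self, addr, data):
        self.addr = addr            # cache tag: address of the cached line
        self.data = data            # cache line contents
        self.invalid = False        # invalidation flag kept with the tag
        self.inval_pending = set()  # invalidation tag: pending commands

def record_invalidation(entries, addr):
    """Step 1: note the invalidation command in the invalidation tag."""
    for e in entries:
        if e.addr == addr:
            e.inval_pending.add(addr)

def apply_invalidations(entries):
    """Step 2: set each cache tag's invalidation flag based on the
    information stored in its invalidation tag."""
    for e in entries:
        if e.addr in e.inval_pending:
            e.invalid = True
            e.inval_pending.clear()

entries = [CacheEntry(0x10, "stale"), CacheEntry(0x20, "fresh")]
record_invalidation(entries, 0x10)
apply_invalidations(entries)
assert entries[0].invalid and not entries[1].invalid
```

Decoupling the two steps lets invalidation commands be absorbed cheaply and applied to the tags in bulk later, which is one way such a design can keep coherence traffic off the critical path.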

