Method and apparatus for peer-to-peer messaging in heterogeneous machine clusters

    Publication number: US11429462B2

    Publication date: 2022-08-30

    Application number: US16887643

    Filing date: 2020-05-29

    Inventor: Shuai Che

    Abstract: Various computing network messaging techniques and apparatus are disclosed. In one aspect, a method of computing is provided that includes executing a first thread and a second thread. A message is sent from the first thread to the second thread. The message includes a domain descriptor that identifies a first location of the first thread and a second location of the second thread.

    Flexible framework to support memory synchronization operations

    Publication number: US10198261B2

    Publication date: 2019-02-05

    Application number: US15096205

    Filing date: 2016-04-11

    Abstract: A method of performing memory synchronization operations is provided that includes receiving, at a programmable cache controller in communication with one or more caches, an instruction in a first language to perform a memory synchronization operation of synchronizing a plurality of instruction sequences executing on a processor, mapping the received instruction in the first language to one or more selected cache operations in a second language executable by the cache controller and executing the one or more cache operations to perform the memory synchronization operation. The method further comprises receiving a second mapping that provides mapping instructions to map the received instruction to one or more other cache operations, mapping the received instruction to one or more other cache operations and executing the one or more other cache operations to perform the memory synchronization operation.
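
The mapping step in this abstract can be pictured as a replaceable lookup table. The following is a minimal sketch, not the patented design: the class `ProgrammableCacheController`, the instruction name `"release"`, and the cache operations `flush_l1` and `flush_l2` are all hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, List

Log = List[str]
CacheOp = Callable[[Log], None]


# Hypothetical cache operations (the "second language" the controller executes).
def flush_l1(log: Log) -> None:
    log.append("flush_l1")


def flush_l2(log: Log) -> None:
    log.append("flush_l2")


class ProgrammableCacheController:
    """Maps a synchronization instruction to a sequence of cache operations."""

    def __init__(self, mapping: Dict[str, List[CacheOp]]) -> None:
        self.mapping = dict(mapping)
        self.log: Log = []

    def load_mapping(self, mapping: Dict[str, List[CacheOp]]) -> None:
        # A second mapping replaces the first without changing the instruction set.
        self.mapping = dict(mapping)

    def execute(self, instruction: str) -> None:
        # Look up the instruction and run each mapped cache operation in order.
        for op in self.mapping[instruction]:
            op(self.log)


ctrl = ProgrammableCacheController({"release": [flush_l1]})
ctrl.execute("release")                       # first mapping
ctrl.load_mapping({"release": [flush_l1, flush_l2]})
ctrl.execute("release")                       # same instruction, other operations
```

The same `"release"` instruction runs one cache operation under the first mapping and two under the second, which is the flexibility the abstract describes.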

    Offloading Execution of an Application by a Network Connected Device

    Publication number: US20170353397A1

    Publication date: 2017-12-07

    Application number: US15174624

    Filing date: 2016-06-06

    Inventor: Shuai Che

    CPC classification number: H04L67/10

    Abstract: A client device detects one or more servers to which an application can be offloaded. The client device receives information from the servers regarding their graphics processing unit (GPU) compute resources. The client device selects one of the servers to offload the application based on such factors as the GPU compute resources, other performance metrics, power, and bandwidth/latency/quality of the communication channel between the server and the client device. The client device sends host code and a GPU computation kernel in intermediate language format to the server. The server compiles the host code and GPU kernel code into suitable machine instruction set architecture code for execution on CPU(s) and GPU(s) of the server. Once the application execution is complete, the server returns the results of the execution to the client device.

    SYSTEMS AND METHODS OF SUPPORTING PARALLEL PROCESSOR MESSAGE-BASED COMMUNICATIONS

    Publication number: US20170289078A1

    Publication date: 2017-10-05

    Application number: US15084101

    Filing date: 2016-03-29

    Inventor: Shuai Che

    Abstract: A method of message-based communication is provided which includes executing, on one or more accelerated processing units, a plurality of groups of work items, receiving a first message from a first group of work items of the plurality of groups of work items executing on the one or more accelerated processing units and storing the first message at a first segment of memory allocated to a second group of work items of the plurality of groups of work items executing on the accelerated processing unit.
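
One way to picture storing a message in a memory segment allocated to the receiving group is a flat buffer divided into fixed-size per-group segments. This is a sketch under assumptions: the class `GroupMailboxes`, the fixed segment size, and the drop-when-full policy are illustrative and do not come from the abstract.

```python
from typing import Any, List, Tuple


class GroupMailboxes:
    """One flat buffer, split into a fixed-size segment per work group."""

    def __init__(self, num_groups: int, segment_size: int) -> None:
        self.segment_size = segment_size
        self.memory: List[Any] = [None] * (num_groups * segment_size)
        self.count = [0] * num_groups  # messages currently held per group

    def send(self, src_group: int, dst_group: int, payload: Any) -> bool:
        # Store the message in the segment allocated to the destination group.
        if self.count[dst_group] == self.segment_size:
            return False  # assumed policy: reject when the segment is full
        base = dst_group * self.segment_size
        self.memory[base + self.count[dst_group]] = (src_group, payload)
        self.count[dst_group] += 1
        return True

    def receive_all(self, group: int) -> List[Tuple[int, Any]]:
        # Drain the group's own segment in arrival order.
        base = group * self.segment_size
        messages = self.memory[base:base + self.count[group]]
        self.count[group] = 0
        return messages


boxes = GroupMailboxes(num_groups=4, segment_size=2)
sent = [boxes.send(0, 1, "a"), boxes.send(2, 1, "b"), boxes.send(3, 1, "c")]
inbox = boxes.receive_all(1)
```

On real hardware the append would need an atomic counter update, since several work groups may send concurrently; the sequential sketch omits that.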

    GENERATING A SCHEDULE OF INSTRUCTIONS BASED ON A PROCESSOR MEMORY TREE

    Type: Invention application

    Status: Under examination (published)

    Publication number: US20160239278A1

    Publication date: 2016-08-18

    Application number: US14623180

    Filing date: 2015-02-16

    Inventor: Shuai Che

    CPC classification number: G06F8/4441

    Abstract: A processor employs a memory tree and a code generation and scheduling framework (CGSF) to generate instructions to access data at memory modules associated with the processor. The memory tree is a data structure having a plurality of nodes, with each node corresponding to a different memory module, memory cluster, or other portion of memory. The CGSF employs the memory tree to expose the memory hierarchy of the processor to a computer programmer. The computer programmer can employ compiler directives to identify nodes of the memory tree and to establish data ordering and manipulation formats for each node. Based on the directives and the memory tree, the CGSF generates schedules of instructions that, when executed at the processor, enforce the data ordering and manipulation formats.
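
The memory tree itself can be sketched as an ordinary tree whose nodes carry optional per-node directives. This illustrates only the data structure and directive lookup, not the CGSF's schedule generation; the class `MemoryNode`, the `layout` field, and values such as `"row_major"` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class MemoryNode:
    """A node standing for a memory module, cluster, or other portion of memory."""
    name: str
    layout: Optional[str] = None  # hypothetical data-ordering directive
    children: List["MemoryNode"] = field(default_factory=list)

    def find(self, name: str) -> Optional["MemoryNode"]:
        # Depth-first search for the node a directive refers to.
        if self.name == name:
            return self
        for child in self.children:
            hit = child.find(name)
            if hit is not None:
                return hit
        return None


def annotated_nodes(root: MemoryNode) -> Iterator[Tuple[str, str]]:
    """Yield (name, layout) for every node that carries a directive."""
    if root.layout is not None:
        yield (root.name, root.layout)
    for child in root.children:
        yield from annotated_nodes(child)


tree = MemoryNode("system", children=[
    MemoryNode("cluster0", children=[MemoryNode("module0"), MemoryNode("module1")]),
    MemoryNode("cluster1"),
])
# Directives identify nodes by name and attach a format to each.
tree.find("module1").layout = "column_major"
tree.find("cluster1").layout = "row_major"
directives = list(annotated_nodes(tree))
```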

    METHOD AND SYSTEM FOR BLOCK SCHEDULING CONTROL IN A PROCESSOR BY REMAPPING

    Type: Invention application

    Status: Granted

    Publication number: US20160117206A1

    Publication date: 2016-04-28

    Application number: US14523682

    Filing date: 2014-10-24

    CPC classification number: G06F9/547 G06F9/4881 G06T1/20 G06T2200/28

    Abstract: A method and a system for block scheduling are disclosed. The method includes retrieving an original block ID, determining a corresponding new block ID from a mapping, executing a new block corresponding to the new block ID, and repeating the retrieving, determining, and executing for each original block ID. The system includes a program memory configured to store multi-block computer programs, an identifier memory configured to store block identifiers (IDs), management hardware configured to retrieve an original block ID from the program memory, scheduling hardware configured to receive the original block ID from the management hardware and determine a new block ID corresponding to the original block ID using a stored mapping, and processing hardware configured to receive the new block ID from the scheduling hardware and execute a new block corresponding to the new block ID.
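
The retrieve, remap, execute loop in the method can be sketched in a few lines. This is a software illustration only: the function `run_remapped`, the dictionary-based mapping, and the reversal example are assumptions, whereas the abstract describes dedicated hardware.

```python
from typing import Callable, Dict, List


def run_remapped(original_ids: List[int],
                 mapping: Dict[int, int],
                 execute_block: Callable[[int], None]) -> List[int]:
    """Run blocks in dispatch order, substituting each original ID via the mapping."""
    executed = []
    for original_id in original_ids:   # retrieve an original block ID
        new_id = mapping[original_id]  # determine the corresponding new block ID
        execute_block(new_id)          # execute the block with the new ID
        executed.append(new_id)
    return executed


# Hypothetical mapping that reverses the execution order of four blocks.
mapping = {0: 3, 1: 2, 2: 1, 3: 0}
trace: List[int] = []
order = run_remapped([0, 1, 2, 3], mapping, trace.append)
```

Because the mapping is data rather than code, the execution order can be changed without modifying the program being scheduled.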

    Systems and methods of supporting parallel processor message-based communications

    Publication number: US10681125B2

    Publication date: 2020-06-09

    Application number: US15084101

    Filing date: 2016-03-29

    Inventor: Shuai Che

    Abstract: A method of message-based communication is provided which includes executing, on one or more accelerated processing units, a plurality of groups of work items, receiving a first message from a first group of work items of the plurality of groups of work items executing on the one or more accelerated processing units and storing the first message at a first segment of memory allocated to a second group of work items of the plurality of groups of work items executing on the accelerated processing unit.

    Two-phase hybrid vertex classification

    Publication number: US10134355B2

    Publication date: 2018-11-20

    Application number: US14720293

    Filing date: 2015-05-22

    Inventor: Shuai Che

    Abstract: A processor performs vertex coloring for a graph based at least in part on the degree of each vertex of the graph and based at least in part with another coloring approach, such as comparison of random values assigned to the vertices. For each vertex in the graph, a processor determines whether the degree of the vertex is a local maximum; that is, whether the degree of the vertex is greater than the degree of each of its connected vertices. Each vertex having a local-maximum degree is assigned a specified or randomly selected color, and is then omitted from future iterations of the coloring process. After a stop criterion is met, the processor assigns random values to the remaining uncolored vertices and assigns colors based on comparisons of the random values.
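
The two phases can be sketched sequentially as follows. Several details are assumptions not stated in the abstract: degrees are counted among still-uncolored vertices, each round uses one fresh color, the stop criterion is "no local-maximum vertex remains", and the random values in the second phase are made distinct so that every round makes progress. The graph is assumed to be undirected with no self-loops.

```python
import random
from typing import Dict, Set


def two_phase_coloring(adj: Dict[int, Set[int]], seed: int = 0) -> Dict[int, int]:
    """Color an undirected graph given as symmetric adjacency sets."""
    rng = random.Random(seed)
    color: Dict[int, int] = {}
    uncolored = set(adj)
    next_color = 0

    # Phase 1: color vertices whose degree is a strict local maximum.
    while uncolored:
        deg = {v: len(adj[v] & uncolored) for v in uncolored}
        winners = {v for v in uncolored
                   if all(deg[v] > deg[u] for u in adj[v] & uncolored)}
        if not winners:  # assumed stop criterion
            break
        for v in winners:  # winners are pairwise non-adjacent, so one color is safe
            color[v] = next_color
        uncolored -= winners
        next_color += 1

    # Phase 2: compare distinct random values among the remaining vertices.
    order = sorted(uncolored)
    priority = dict(zip(order, rng.sample(range(len(order)), len(order))))
    while uncolored:
        winners = {v for v in uncolored
                   if all(priority[v] > priority[u] for u in adj[v] & uncolored)}
        for v in winners:
            color[v] = next_color
        uncolored -= winners
        next_color += 1
    return color


# A star (center 0) plus a 4-cycle; the cycle has no strict degree maximum,
# so it is left for the random-value phase.
graph = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0},
         4: {5, 7}, 5: {4, 6}, 6: {5, 7}, 7: {6, 4}}
colors = two_phase_coloring(graph)
```

Two adjacent vertices cannot both have a strictly greater degree (or random value) than the other, so the vertices colored in any single round never share an edge.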
