Patent search ap:("Advanced Micro Devices Page Inc.") AND inv:"Shuai Che"

1.

发明申请
FLEXIBLE FRAMEWORK TO SUPPORT MEMORY SYNCHRONIZATION OPERATIONS 审中-公开

公开(公告)号：US20170293487A1

公开(公告)日：2017-10-12

申请号：US15096205

申请日：2016-04-11

Applicant: Advanced Micro Devices, Inc.

Inventor： Shuai Che , Marc S. Orr , Bradford M. Beckmann

IPC: G06F9/30 , G06F12/08

CPC classification number: G06F9/30087 , G06F12/0802 , G06F12/0811 , G06F12/0815 , G06F12/0837 , G06F12/0875 , G06F12/0897 , G06F12/10 , G06F2212/1016 , G06F2212/452

Abstract: A method of performing memory synchronization operations is provided that includes receiving, at a programmable cache controller in communication with one or more caches, an instruction in a first language to perform a memory synchronization operation of synchronizing a plurality of instruction sequences executing on a processor, mapping the received instruction in the first language to one or more selected cache operations in a second language executable by the cache controller and executing the one or more cache operations to perform the memory synchronization operation. The method further comprises receiving a second mapping that provides mapping instructions to map the received instruction to one or more other cache operations, mapping the received instruction to one or more other cache operations and executing the one or more other cache operations to perform the memory synchronization operation.

2.

发明申请
SYSTEM AND METHOD FOR REPURPOSING DEAD CACHE BLOCKS 有权
Title translation: 用于修复死卡块的系统和方法

公开(公告)号：US20160085677A1

公开(公告)日：2016-03-24

申请号：US14491296

申请日：2014-09-19

Applicant: Advanced Micro Devices, Inc.

Inventor： Gabriel H. Loh , Derek R. Hower , Shuai Che

IPC: G06F12/08

CPC classification number: G06F12/0815 , G06F12/0864 , G06F12/0891 , Y02D10/13

Abstract: A processing system having a multilevel cache hierarchy employs techniques for repurposing dead cache blocks so as to use otherwise wasted space in a cache hierarchy employing a write-back scheme. For a cache line containing invalid data with a valid tag, the valid tag is maintained for cache coherence purposes or otherwise, resulting in a valid tag for a dead cache block. A cache controller repurposes the dead cache block by storing any of a variety of new data at the dead cache block, while storing the new tag in a tag entry of a dead block tag way with an identifier indicating the location of the new data.

Abstract translation: 具有多级高速缓存层级的处理系统采用用于重新利用死缓存块的技术，以便在采用回写方案的高速缓存层级中使用另外浪费的空间。对于包含具有有效标签的无效数据的高速缓存行，维护有效标记用于高速缓存一致目的或其他方式，导致死缓存块的有效标签。高速缓存控制器通过将死缓存块中的各种新数据中的任何一个存储在死区缓存块中，同时将新标记存储在具有指示新数据的位置的标识符的死区标记方式的标签条目中来重新使用死区高速缓存块。

3.

发明申请
METHOD AND APPARATUS FOR PEER-TO-PEER MESSAGING IN HETEROGENEOUS MACHINE CLUSTERS 审中-公开

公开(公告)号：US20200293387A1

公开(公告)日：2020-09-17

申请号：US16887643

申请日：2020-05-29

Applicant: Advanced Micro Devices, Inc.

Inventor： Shuai Che

IPC: G06F9/54

Abstract: Various computing network messaging techniques and apparatus are disclosed. In one aspect, a method of computing is provided that includes executing a first thread and a second thread. A message is sent from the first thread to the second thread. The message includes a domain descriptor that identifies a first location of the first thread and a second location of the second thread.

4.

发明申请
METHOD AND SYSTEM FOR HARDWARE MAPPING INFERENCE PIPELINES 审中-公开

公开(公告)号：US20190318229A1

公开(公告)日：2019-10-17

申请号：US15952131

申请日：2018-04-12

Applicant: Advanced Micro Devices, Inc.

Inventor： Shuai Che

IPC: G06N3/063 , G06N3/04

Abstract: Methods and systems for hardware mapping inference pipelines in deep neural network (DNN) systems. Each layer of the inference pipeline is mapped to a queue, which in turn is associated with one or more processing elements. Each queue has multiple elements, where an element represents the task to be completed for a given input. Each input is associated with a queue packet which identifies, for example, a type of DNN layer, which DNN layer to use, a next DNN layer to use and a data pointer. A queue packet is written into the element of a queue, and the processing elements read the element and process the input based on the information in the queue packet. The processing element then writes another queue packet to another queue based on the processed queue packet. Multiple inputs can be processed in parallel and on-the-fly using the queues independent of layer starting points.

5.

发明授权
Message aggregation, combining and compression for efficient data communications in GPU-based clusters 有权

公开(公告)号：US10320695B2

公开(公告)日：2019-06-11

申请号：US15165953

申请日：2016-05-26

Applicant: Advanced Micro Devices, Inc.

Inventor： Steven K. Reinhardt , Marc S. Orr , Bradford M. Beckmann , Shuai Che , David A. Wood

IPC: G06F15/173 , H04L12/805 , H04L12/811

Abstract: A system and method for efficient management of network traffic management of highly data parallel computing. A processing node includes one or more processors capable of generating network messages. A network interface is used to receive and send network messages across a network. The processing node reduces at least one of a number or a storage size of the original network messages into one or more new network messages. The new network messages are sent to the network interface to send across the network.

6.

发明申请
TRANSMISSION OF LARGE MESSAGES IN COMPUTER SYSTEMS 审中-公开

公开(公告)号：US20180349215A1

公开(公告)日：2018-12-06

申请号：US15614498

申请日：2017-06-05

Applicant: Advanced Micro Devices, Inc.

Inventor： Shuai Che

IPC: G06F9/54

Abstract: Techniques for managing message transmission in a large networked computer system that includes multiple individual networked computing systems are disclosed. Message passing among the computing systems include a sending computing device transmitting a message to a receiver computing device and a receiver computing device consuming that message. A build-up of data stored in a buffer at the receiver can reduce performance. In order to reduce the potential performance degradation associated with large amounts of “waiting” data in the buffer, a sending computer system first determines whether the receiver computer system is ready to receive a message and does not transmit the message if the receiver computer system is not ready. To determine whether the receiver computer system is ready to receive a message, the receiver computer system, at the request of the sending computer system, checks a counting filter that stores indications of whether particular messages are ready.

7.

发明申请
MEMORY HIERARCHY-AWARE PROCESSING 审中-公开

公开(公告)号：US20180307603A1

公开(公告)日：2018-10-25

申请号：US15497162

申请日：2017-04-25

Applicant: Advanced Micro Devices, Inc.

Inventor： Shuai Che

IPC: G06F12/0811 , G06F9/50 , G06F12/0846

CPC classification number: G06F12/0811 , G06F9/5083 , G06F12/0848 , G06F2212/00

Abstract: Improvements to traditional schemes for storing data for processing tasks and for executing those processing tasks are disclosed. A set of data for which processing tasks are to be executed is processed through a hierarchy to distribute the data through various elements of a computer system. Levels of the hierarchy represent different types of memory or storage elements. Higher levels represent coarser portions of memory or storage elements and lower levels represent finer portions of memory or storage elements. Data proceeds through the hierarchy as “tasks” at different levels. Tasks at non-leaf nodes comprise tasks to subdivide data for storage in the finer granularity memories or storage units associated with a lower hierarchy level. Tasks at leaf nodes comprise processing work, such as a portion of a calculation. Two techniques for organizing the tasks in the hierarchy presented herein include a queue-based technique and a graph-based technique.

8.

发明申请
METHOD AND APPARATUS FOR MASKING AND TRANSMITTING DATA 审中-公开

公开(公告)号：US20180081818A1

公开(公告)日：2018-03-22

申请号：US15268974

申请日：2016-09-19

Applicant: Advanced Micro Devices, Inc.

Inventor： Shuai Che , Jieming Yin

IPC: G06F12/0897

CPC classification number: G06F12/0897 , G06F2212/1024 , G06F2212/60

Abstract: A method and apparatus for transmitting data includes determining whether to apply a mask to a cache line that includes a first type of data and a second type of data for transmission based upon a first criteria. The second type of data is filtered from the cache line, and the first type of data along with an identifier of the applied mask is transmitted. The first type of data and the identifier is received, and the second type of data is combined with the first type of data to recreate the cache line based upon the received identifier.

9.

发明授权
Mechanisms to save user/kernel copy for cross device communications 有权
Title translation: 保存用于交叉设备通信的用户/内核副本的机制

公开(公告)号：US09436395B2

公开(公告)日：2016-09-06

申请号：US14213640

申请日：2014-03-14

Applicant: Advanced Micro Devices, Inc.

Inventor： Blake A. Hechtman , Shuai Che

IPC: G06F9/26 , G06F3/06 , G06F12/10 , G06F9/52 , G06F9/54 , G06F12/14

CPC classification number: G06F3/0619 , G06F3/0637 , G06F3/065 , G06F3/067 , G06F9/52 , G06F9/54 , G06F12/1081 , G06F12/1491 , G06F2212/1024 , G06F2212/657

Abstract: Central processing units (CPUs) in computing systems manage graphics processing units (GPUs), network processors, security co-processors, and other data heavy devices as buffered peripherals using device drivers. Unfortunately, as a result of large and latency-sensitive data transfers between CPUs and these external devices, and memory partitioned into kernel-access and user-access spaces, these schemes to manage peripherals may introduce latency and memory use inefficiencies. Proposed are schemes to reduce latency and redundant memory copies using virtual to physical page remapping while maintaining user/kernel level access abstractions.

Abstract translation: 计算系统中的中央处理单元（CPU）使用设备驱动程序来管理图形处理单元（GPU），网络处理器，安全协处理器和其他数据重型设备作为缓冲外设。不幸的是，由于CPU和这些外部设备之间的大型和延迟敏感的数据传输，以及分区为内核访问和用户访问空间的内存，这些管理外设的方案可能会导致延迟和内存使用效率低下。提出的方案是在维护用户/内核级访问抽象的同时，使用虚拟到物理页面重映射来减少延迟和冗余内存副本。

10.

发明授权
Method and system for block scheduling control in a processor by remapping 有权
Title translation: 通过重映射处理器中块调度控制的方法和系统

公开(公告)号：US09430304B2

公开(公告)日：2016-08-30

申请号：US14523682

申请日：2014-10-24

Applicant: Advanced Micro Devices, Inc.

Inventor： Shuai Che , Derek R. Hower

IPC: G06F9/46 , G06F12/00 , G06F13/00 , G06F13/28 , G06F9/54 , G06T1/20

CPC classification number: G06F9/547 , G06F9/4881 , G06T1/20 , G06T2200/28

Abstract: A method and a system for block scheduling are disclosed. The method includes retrieving an original block ID, determining a corresponding new block ID from a mapping, executing a new block corresponding to the new block ID, and repeating the retrieving, determining, and executing for each original block ID. The system includes a program memory configured to store multi-block computer programs, an identifier memory configured to store block identifiers (ID's), management hardware configured to retrieve an original block ID from the program memory, scheduling hardware configured to receive the original block ID from the management hardware and determine a new block ID corresponding to the original block ID using a stored mapping, and processing hardware configured to receive the new block ID from the scheduling hardware and execute a new block corresponding to the new block ID.

Abstract translation: 公开了一种用于块调度的方法和系统。该方法包括检索原始块ID，从映射确定相应的新块ID，执行与新块ID相对应的新块，并重复检索，确定和执行每个原始块ID。该系统包括被配置为存储多块计算机程序的程序存储器，被配置为存储块标识符（ID）的标识符存储器，被配置为从程序存储器检索原始块ID的管理硬件，被配置为接收原始块ID的调度硬件使用存储的映射来确定与原始块ID相对应的新块ID，以及配置为从调度硬件接收新块ID并执行与新块ID相对应的新块的处理硬件。

Search Results

Country/Region

Patent validity

Application date

Publication (announcement) day

applicant

The country/region where the applicant is located

Inventor

IPC

IPC Department

IPC class

IPC subclass

IPC group

IPC team

Appearance classification