SYSTEM AND METHOD FOR REPURPOSING DEAD CACHE BLOCKS
    Invention Application (In Force)

    Publication No.: US20160085677A1

    Publication Date: 2016-03-24

    Application No.: US14491296

    Filing Date: 2014-09-19

    CPC classification number: G06F12/0815 G06F12/0864 G06F12/0891 Y02D10/13

    Abstract: A processing system having a multilevel cache hierarchy employs techniques for repurposing dead cache blocks so as to use otherwise wasted space in a cache hierarchy employing a write-back scheme. For a cache line containing invalid data with a valid tag, the valid tag is maintained for cache coherence purposes or otherwise, resulting in a valid tag for a dead cache block. A cache controller repurposes the dead cache block by storing any of a variety of new data at the dead cache block, while storing the new tag in a tag entry of a dead block tag way with an identifier indicating the location of the new data.
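
    The scheme above can be illustrated with a minimal sketch (all class and method names are assumptions, not taken from the patent): a line whose tag stays valid for coherence but whose data is invalid is a "dead block", and a separate dead-block tag way records the new tag together with an identifier locating the repurposed data.

```python
# Minimal sketch of repurposing a dead cache block: a line whose tag is
# still valid (kept for coherence) but whose data is invalid can hold
# unrelated new data, with the new tag kept in a separate "dead block
# tag way" alongside an identifier locating that data.

class CacheLine:
    def __init__(self, tag):
        self.tag = tag          # valid tag, kept for coherence
        self.data_valid = True  # cleared when the block "dies"
        self.data = None

class DeadBlockCache:
    def __init__(self, num_ways):
        self.ways = [CacheLine(tag=w) for w in range(num_ways)]
        # Dead-block tag way: new tag -> identifier (way index) of the data
        self.dead_tags = {}

    def invalidate(self, way):
        """Invalidate the data but keep the tag valid (coherence)."""
        self.ways[way].data_valid = False

    def repurpose(self, new_tag, new_data):
        """Store new data in the first dead block, recording its location."""
        for i, line in enumerate(self.ways):
            if not line.data_valid:
                line.data = new_data
                self.dead_tags[new_tag] = i  # identifier = way index
                return i
        return None  # no dead block available

    def lookup_repurposed(self, new_tag):
        i = self.dead_tags.get(new_tag)
        return None if i is None else self.ways[i].data

cache = DeadBlockCache(num_ways=4)
cache.invalidate(2)          # block 2 dies: valid tag, invalid data
way = cache.repurpose(new_tag=0x1A, new_data=b"victim line")
assert way == 2
assert cache.lookup_repurposed(0x1A) == b"victim line"
```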


    METHOD AND APPARATUS FOR PEER-TO-PEER MESSAGING IN HETEROGENEOUS MACHINE CLUSTERS

    Publication No.: US20200293387A1

    Publication Date: 2020-09-17

    Application No.: US16887643

    Filing Date: 2020-05-29

    Inventor: Shuai Che

    Abstract: Various computing network messaging techniques and apparatus are disclosed. In one aspect, a method of computing is provided that includes executing a first thread and a second thread. A message is sent from the first thread to the second thread. The message includes a domain descriptor that identifies a first location of the first thread and a second location of the second thread.
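
    A toy sketch of the descriptor idea (field names and the mailbox routing are assumptions for illustration): the message carries a domain descriptor naming where the sending and receiving threads live, which in a heterogeneous cluster could name a node and device.

```python
# Hypothetical sketch: a message carries a "domain descriptor" naming
# the locations of the sending and receiving threads, and delivery
# routes on the destination location.

from dataclasses import dataclass

@dataclass(frozen=True)
class DomainDescriptor:
    src_location: str   # location of the first (sending) thread
    dst_location: str   # location of the second (receiving) thread

@dataclass
class Message:
    descriptor: DomainDescriptor
    payload: bytes

def send(mailboxes, msg):
    """Route the message to the receiver's mailbox by its location."""
    mailboxes.setdefault(msg.descriptor.dst_location, []).append(msg)

mailboxes = {}
desc = DomainDescriptor(src_location="node0/cpu", dst_location="node1/gpu0")
send(mailboxes, Message(desc, b"hello"))
received = mailboxes["node1/gpu0"][0]
assert received.descriptor.src_location == "node0/cpu"
```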

    METHOD AND SYSTEM FOR HARDWARE MAPPING INFERENCE PIPELINES

    Publication No.: US20190318229A1

    Publication Date: 2019-10-17

    Application No.: US15952131

    Filing Date: 2018-04-12

    Inventor: Shuai Che

    Abstract: Methods and systems for hardware mapping inference pipelines in deep neural network (DNN) systems. Each layer of the inference pipeline is mapped to a queue, which in turn is associated with one or more processing elements. Each queue has multiple elements, where an element represents the task to be completed for a given input. Each input is associated with a queue packet which identifies, for example, a type of DNN layer, which DNN layer to use, a next DNN layer to use and a data pointer. A queue packet is written into the element of a queue, and the processing elements read the element and process the input based on the information in the queue packet. The processing element then writes another queue packet to another queue based on the processed queue packet. Multiple inputs can be processed in parallel and on-the-fly using the queues independent of layer starting points.
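
    The queue-per-layer flow can be sketched as follows (packet fields and the toy "layers" are assumptions; the patent targets DNN layers on hardware processing elements): a queue packet names the layer type, the layer to run, the next layer, and a data pointer, and each processing element pops a packet, runs the layer, and pushes a new packet to the next layer's queue.

```python
# Sketch of the queue-per-layer inference pipeline: packets carry a
# layer type, the layer to use, the next layer, and a data pointer.
# Multiple inputs can sit at different layers at the same time.

from collections import deque

def make_packet(layer_type, layer_id, next_layer, data):
    return {"type": layer_type, "layer": layer_id,
            "next": next_layer, "data": data}

# Toy two-layer "network": each layer is just a transform on the data.
layers = {0: lambda x: x * 2, 1: lambda x: x + 1}
queues = {0: deque(), 1: deque(), "out": deque()}

def processing_element(layer_id):
    """Consume one packet from this layer's queue, emit to the next."""
    pkt = queues[layer_id].popleft()
    result = layers[layer_id](pkt["data"])
    if pkt["next"] == "out":
        queues["out"].append(make_packet("done", None, None, result))
    else:
        queues[pkt["next"]].append(
            make_packet("compute", pkt["next"], "out", result))

# Two inputs flow through independently of each other's layer position.
for x in (3, 10):
    queues[0].append(make_packet("compute", 0, 1, x))
processing_element(0); processing_element(0)
processing_element(1); processing_element(1)
results = [p["data"] for p in queues["out"]]
assert results == [7, 21]   # (3*2)+1 and (10*2)+1
```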

    TRANSMISSION OF LARGE MESSAGES IN COMPUTER SYSTEMS

    Publication No.: US20180349215A1

    Publication Date: 2018-12-06

    Application No.: US15614498

    Filing Date: 2017-06-05

    Inventor: Shuai Che

    Abstract: Techniques for managing message transmission in a large networked computer system that includes multiple individual networked computing systems are disclosed. Message passing among the computing systems includes a sending computing device transmitting a message to a receiver computing device and the receiver computing device consuming that message. A build-up of data stored in a buffer at the receiver can reduce performance. In order to reduce the potential performance degradation associated with large amounts of “waiting” data in the buffer, a sending computer system first determines whether the receiver computer system is ready to receive a message and does not transmit the message if the receiver computer system is not ready. To determine whether the receiver computer system is ready to receive a message, the receiver computer system, at the request of the sending computer system, checks a counting filter that stores indications of whether particular messages are ready.
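
    One plausible reading of the readiness check, sketched below under the assumption that the counting filter is a counting Bloom filter (the patent does not fix the structure): the receiver marks message IDs it is ready for, and the sender's query succeeds only while the mark is present.

```python
# Sketch of a readiness check via a counting Bloom filter: the sender
# asks whether a message ID is marked ready before transmitting, which
# avoids buffer build-up at the receiver.

import hashlib

class CountingFilter:
    def __init__(self, size=64, hashes=3):
        self.counts = [0] * size
        self.size, self.hashes = size, hashes

    def _slots(self, key):
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:4], "big") % self.size

    def add(self, key):      # receiver marks a message it is ready for
        for s in self._slots(key):
            self.counts[s] += 1

    def remove(self, key):   # clear the mark once the message is consumed
        for s in self._slots(key):
            self.counts[s] -= 1

    def ready(self, key):    # sender-side check (can rarely false-positive)
        return all(self.counts[s] > 0 for s in self._slots(key))

f = CountingFilter()
assert not f.ready("msg-42")   # receiver not ready: do not transmit
f.add("msg-42")
assert f.ready("msg-42")       # safe to send now
f.remove("msg-42")
assert not f.ready("msg-42")
```

The counting (rather than plain bit) filter matters here: removal by decrement lets the same slots serve many message IDs over time.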

    MEMORY HIERARCHY-AWARE PROCESSING
    Invention Application

    Publication No.: US20180307603A1

    Publication Date: 2018-10-25

    Application No.: US15497162

    Filing Date: 2017-04-25

    Inventor: Shuai Che

    CPC classification number: G06F12/0811 G06F9/5083 G06F12/0848 G06F2212/00

    Abstract: Improvements to traditional schemes for storing data for processing tasks and for executing those processing tasks are disclosed. A set of data for which processing tasks are to be executed is processed through a hierarchy to distribute the data through various elements of a computer system. Levels of the hierarchy represent different types of memory or storage elements. Higher levels represent coarser portions of memory or storage elements and lower levels represent finer portions of memory or storage elements. Data proceeds through the hierarchy as “tasks” at different levels. Tasks at non-leaf nodes comprise tasks to subdivide data for storage in the finer granularity memories or storage units associated with a lower hierarchy level. Tasks at leaf nodes comprise processing work, such as a portion of a calculation. Two techniques for organizing the tasks in the hierarchy presented herein include a queue-based technique and a graph-based technique.
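
    The queue-based technique can be sketched as follows (the level names and splitting rule are illustrative assumptions): a task at a non-leaf level subdivides its chunk for the finer memories of the level below, while a leaf-level task performs a portion of the computation.

```python
# Sketch of hierarchy-aware processing, queue-based variant: one queue
# per hierarchy level; non-leaf tasks subdivide data downward, leaf
# tasks do the actual work on their portion.

from collections import deque

NUM_LEVELS = 3                    # e.g. DRAM -> L2 -> L1 (illustrative)
queues = [deque() for _ in range(NUM_LEVELS)]
results = []

def run_task(level, data):
    if level == NUM_LEVELS - 1:    # leaf: perform a piece of the work
        results.append(sum(data))
    else:                          # non-leaf: subdivide for the level below
        mid = len(data) // 2
        for chunk in (data[:mid], data[mid:]):
            if chunk:
                queues[level + 1].append(chunk)

queues[0].append(list(range(8)))   # the whole data set enters at the top
for level in range(NUM_LEVELS):
    while queues[level]:
        run_task(level, queues[level].popleft())

assert len(results) == 4                  # four leaf-sized portions
assert sum(results) == sum(range(8))      # every element processed once
```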

    METHOD AND APPARATUS FOR MASKING AND TRANSMITTING DATA

    Publication No.: US20180081818A1

    Publication Date: 2018-03-22

    Application No.: US15268974

    Filing Date: 2016-09-19

    CPC classification number: G06F12/0897 G06F2212/1024 G06F2212/60

    Abstract: A method and apparatus for transmitting data includes determining, based upon a first criterion, whether to apply a mask to a cache line that includes a first type of data and a second type of data for transmission. The second type of data is filtered from the cache line, and the first type of data is transmitted along with an identifier of the applied mask. The first type of data and the identifier are received, and the second type of data is combined with the first type of data, based upon the received identifier, to recreate the cache line.
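
    A small sketch of the idea, under the assumption (not stated in the abstract) that the "second type" of data is a predictable filler such as zero bytes: the sender strips the filler, transmits only the useful bytes plus a mask identifier, and the receiver reinserts the filler at the masked positions.

```python
# Sketch of mask-and-transmit: strip predictable filler bytes (assumed
# to be the "second type" of data), send the useful bytes plus a mask,
# and recreate the original cache line at the receiver.

def make_mask(line):
    """One flag per byte: True where the byte is useful (first type)."""
    return tuple(b != 0 for b in line)

def transmit(line):
    mask = make_mask(line)
    payload = bytes(b for b, keep in zip(line, mask) if keep)
    return payload, mask          # the mask serves as the identifier

def recreate(payload, mask):
    it = iter(payload)
    return bytes(next(it) if keep else 0 for keep in mask)

line = bytes([7, 0, 0, 42, 0, 9, 0, 0])
payload, mask = transmit(line)
assert payload == bytes([7, 42, 9])       # filler filtered before sending
assert recreate(payload, mask) == line    # receiver rebuilds the line
```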

    Mechanisms to save user/kernel copy for cross device communications
    Granted Patent (In Force)

    Publication No.: US09436395B2

    Publication Date: 2016-09-06

    Application No.: US14213640

    Filing Date: 2014-03-14

    Abstract: Central processing units (CPUs) in computing systems manage graphics processing units (GPUs), network processors, security co-processors, and other data-heavy devices as buffered peripherals using device drivers. Unfortunately, as a result of large, latency-sensitive data transfers between CPUs and these external devices, and of memory partitioned into kernel-access and user-access spaces, these schemes for managing peripherals may introduce latency and memory-use inefficiencies. Proposed are schemes to reduce latency and redundant memory copies using virtual-to-physical page remapping while maintaining user/kernel-level access abstractions.
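
    A purely conceptual sketch of the contrast (page tables and frames are simulated dictionaries; nothing here is the patent's actual mechanism): instead of copying a user buffer into a kernel buffer, the kernel-space virtual page is remapped onto the physical frame that already holds the data.

```python
# Conceptual sketch of avoiding a user->kernel buffer copy: rather than
# duplicating the data into a new frame, remap the kernel virtual page
# onto the physical frame already holding the user data.

physical_pages = {0: b"user payload"}   # physical frame with the data

user_page_table = {"u_vaddr": 0}        # user virtual page -> frame 0
kernel_page_table = {}

def copy_based_share():
    """Traditional path: allocate a new frame and copy the bytes."""
    physical_pages[1] = physical_pages[0]   # a full data copy
    kernel_page_table["k_vaddr"] = 1

def remap_based_share():
    """Remap path: kernel virtual page aliases the same physical frame."""
    kernel_page_table["k_vaddr"] = user_page_table["u_vaddr"]

remap_based_share()
# Both address spaces now see the same frame, with no copy made.
assert kernel_page_table["k_vaddr"] == user_page_table["u_vaddr"]
assert physical_pages[kernel_page_table["k_vaddr"]] == b"user payload"
```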


    Method and system for block scheduling control in a processor by remapping
    Granted Patent (In Force)

    Publication No.: US09430304B2

    Publication Date: 2016-08-30

    Application No.: US14523682

    Filing Date: 2014-10-24

    CPC classification number: G06F9/547 G06F9/4881 G06T1/20 G06T2200/28

    Abstract: A method and a system for block scheduling are disclosed. The method includes retrieving an original block ID, determining a corresponding new block ID from a mapping, executing a new block corresponding to the new block ID, and repeating the retrieving, determining, and executing for each original block ID. The system includes a program memory configured to store multi-block computer programs, an identifier memory configured to store block identifiers (IDs), management hardware configured to retrieve an original block ID from the program memory, scheduling hardware configured to receive the original block ID from the management hardware and determine a new block ID corresponding to the original block ID using a stored mapping, and processing hardware configured to receive the new block ID from the scheduling hardware and execute a new block corresponding to the new block ID.
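
    The retrieve/remap/execute loop can be sketched in a few lines (block bodies here are plain functions; the real system targets hardware-scheduled blocks such as GPU thread blocks, and the mapping contents are illustrative):

```python
# Sketch of block scheduling by remapping: each original block ID is
# translated through a stored mapping, and the block that actually runs
# is the one named by the new ID.

trace = []
blocks = {0: lambda: trace.append("A"),
          1: lambda: trace.append("B"),
          2: lambda: trace.append("C")}

# Stored mapping: the scheduler's control over which block runs when.
remap = {0: 2, 1: 0, 2: 1}

for original_id in (0, 1, 2):      # IDs as retrieved from program memory
    new_id = remap[original_id]    # scheduling hardware's translation
    blocks[new_id]()               # execute the remapped block

assert trace == ["C", "A", "B"]
```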

