Patent search: ap:("Wei Liu" OR "Youfeng Wu" OR "Christopher B. Wilkerson" OR "Herbert H. Hum") AND inv:"Youfeng Wu" (page 1)

1.

Patent application
Methods And Apparatuses For Efficient Load Processing Using Buffers (status: in force)

Publication number: US20110154002A1

Publication date: 2011-06-23

Application number: US12640707

Filing date: 2009-12-17

Applicants: Wei Liu, Youfeng Wu, Christopher B. Wilkerson, Herbert H. Hum

Inventors: Wei Liu, Youfeng Wu, Christopher B. Wilkerson, Herbert H. Hum

IPC classification: G06F9/38

CPC classification: G06F12/0888, G06F8/4442, G06F9/30043, G06F9/3826, G06F9/383, Y02D10/13

Abstract: Various embodiments of the invention concern methods and apparatuses for power and time efficient load handling. A compiler may identify producer loads, consumer reuse loads, consumer forwarded loads, and producer/consumer hybrid loads. Based on this identification, performance of the load may be efficiently directed to a load value buffer, store buffer, data cache, or elsewhere. Consequently, accesses to cache are reduced, through direct loading from load value buffers and store buffers, thereby efficiently processing the loads.
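
The abstract names four load categories without defining them. The sketch below shows one possible reading, under stated assumptions: a "producer" load fetches from the cache and its value is reused later; a "consumer reuse" load reads a value left by an earlier load (the load value buffer); a "consumer forwarded" load reads a value left by an earlier store (the store buffer); a "hybrid" load both consumes a buffered value and feeds a later load. The `MemOp` type, the `classify_loads` function, and the "plain" label are hypothetical names for illustration and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemOp:
    kind: str  # "load" or "store"
    addr: int  # address accessed (assumed exact, no aliasing analysis)


def classify_loads(ops):
    """Label each load in a straight-line sequence of memory operations.

    Returns a list of (index, label) pairs, one per load.
    The category definitions are assumptions, not the patent's claims.
    """
    last_access = {}  # addr -> "load" or "store" (most recent access kind)
    last_load = {}    # addr -> index of the most recent load of addr
    source = {}       # load index -> "lvb", "store_buffer", or "cache"
    reused = set()    # indices of loads whose value a later load picks up

    for i, op in enumerate(ops):
        if op.kind == "store":
            last_access[op.addr] = "store"
            last_load.pop(op.addr, None)
            continue
        prev = last_access.get(op.addr)
        if prev == "store":
            source[i] = "store_buffer"
        elif prev == "load":
            source[i] = "lvb"
            reused.add(last_load[op.addr])
        else:
            source[i] = "cache"
        last_access[op.addr] = "load"
        last_load[op.addr] = i

    labels = []
    for i in sorted(source):
        consumes = source[i] != "cache"
        produces = i in reused
        if consumes and produces:
            label = "hybrid"
        elif source[i] == "lvb":
            label = "consumer_reuse"
        elif source[i] == "store_buffer":
            label = "consumer_forwarded"
        elif produces:
            label = "producer"
        else:
            label = "plain"  # ordinary cache load, value never reused
        labels.append((i, label))
    return labels


A, B, C = 0x10, 0x20, 0x30
trace = [MemOp("load", A), MemOp("load", A), MemOp("load", A),
         MemOp("store", B), MemOp("load", B), MemOp("load", C)]
result = classify_loads(trace)
```

In this trace only the first load of `A` and the load of `C` would need the data cache; the other loads could be served from a buffer, which is the saving the abstract describes.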

2.

Granted patent
Methods and apparatuses for efficient load processing using buffers (status: in force)

Publication number: US08452946B2

Publication date: 2013-05-28

Application number: US12640707

Filing date: 2009-12-17

Applicants: Wei Liu, Youfeng Wu, Christopher B. Wilkerson, Herbert H. Hum

Inventors: Wei Liu, Youfeng Wu, Christopher B. Wilkerson, Herbert H. Hum

IPC classification: G06F9/30, G06F9/40, G06F15/00

CPC classification: G06F12/0888, G06F8/4442, G06F9/30043, G06F9/3826, G06F9/383, Y02D10/13

Abstract: Various embodiments of the invention concern methods and apparatuses for power and time efficient load handling. A compiler may identify producer loads, consumer reuse loads, consumer forwarded loads, and producer/consumer hybrid loads. Based on this identification, performance of the load may be efficiently directed to a load value buffer, store buffer, data cache, or elsewhere. Consequently, accesses to cache are reduced, through direct loading from load value buffers and store buffers, thereby efficiently processing the loads.

3.

Patent application
Software constructed strands for execution on a multi-core architecture (status: in force)

Publication number: US20090077360A1

Publication date: 2009-03-19

Application number: US11901644

Filing date: 2007-09-18

Applicants: Wei Liu, Lixin Su, Youfeng Wu, Herbert Hum

Inventors: Wei Liu, Lixin Su, Youfeng Wu, Herbert Hum

IPC classification: G06F9/44, G06F9/38

CPC classification: G06F8/433

Abstract: In one embodiment, the present invention includes a software-controlled method of forming instruction strands. The software may include instructions to obtain code of a superblock including a plurality of basic blocks, build a dependency directed acyclic graph (DAG) for the code, sort nodes coupled by edges of the dependency DAG into a topological order, form strands from the nodes based on hardware constraints, rule constraints, and scheduling constraints, and generate executable code for the strands and store the executable code in a storage. Other embodiments are described and claimed.
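
The pipeline in the abstract (dependency DAG, topological sort, strand formation under constraints) can be sketched roughly as follows. This is a minimal illustration, not the claimed method: the topological sort uses Kahn's algorithm, strands are formed greedily by chaining a node onto the strand that ends at one of its predecessors, and the only constraint modeled is a hypothetical maximum strand length `max_len` standing in for the hardware, rule, and scheduling constraints. The function names are assumptions.

```python
from collections import deque


def topological_order(nodes, edges):
    """Kahn's algorithm; edges are (producer, consumer) pairs."""
    succs = {n: [] for n in nodes}
    indeg = {n: 0 for n in nodes}
    for src, dst in edges:
        succs[src].append(dst)
        indeg[dst] += 1
    ready = deque(n for n in nodes if indeg[n] == 0)
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for m in succs[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                ready.append(m)
    if len(order) != len(nodes):
        raise ValueError("dependency graph has a cycle")
    return order


def form_strands(nodes, edges, max_len):
    """Greedily chain each node onto a strand ending at one of its predecessors.

    max_len is a hypothetical stand-in for the constraints in the abstract.
    """
    preds = {n: [] for n in nodes}
    for src, dst in edges:
        preds[dst].append(src)
    strands = []   # each strand is a list of nodes in execution order
    tail_of = {}   # node -> index of the strand whose last element it is
    for n in topological_order(nodes, edges):
        for p in preds[n]:
            idx = tail_of.get(p)
            if idx is not None and len(strands[idx]) < max_len:
                strands[idx].append(n)
                del tail_of[p]
                tail_of[n] = idx
                break
        else:
            # No predecessor strand can be extended: start a new strand.
            strands.append([n])
            tail_of[n] = len(strands) - 1
    return strands


nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("b", "c"), ("a", "d"), ("d", "e")]
strands = form_strands(nodes, edges, max_len=3)
```

Processing nodes in topological order guarantees that every predecessor has already been placed, so each strand is a dependence chain that respects the DAG.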

4.

Granted patent
Software constructed strands for execution on a multi-core architecture (status: in force)

Publication number: US08789031B2

Publication date: 2014-07-22

Application number: US11901644

Filing date: 2007-09-18

Applicants: Wei Liu, Lixin Su, Youfeng Wu, Herbert Hum

Inventors: Wei Liu, Lixin Su, Youfeng Wu, Herbert Hum

IPC classification: G06F9/45

CPC classification: G06F8/433

Abstract: In one embodiment, the present invention includes a software-controlled method of forming instruction strands. The software may include instructions to obtain code of a superblock including a plurality of basic blocks, build a dependency directed acyclic graph (DAG) for the code, sort nodes coupled by edges of the dependency DAG into a topological order, form strands from the nodes based on hardware constraints, rule constraints, and scheduling constraints, and generate executable code for the strands and store the executable code in a storage. Other embodiments are described and claimed.

5.

Patent application
DYNAMIC DATA SYNCHRONIZATION IN THREAD-LEVEL SPECULATION (status: under examination, published)

Publication number: US20110320781A1

Publication date: 2011-12-29

Application number: US12826287

Filing date: 2010-06-29

Applicants: Wei Liu, Youfeng Wu

Inventors: Wei Liu, Youfeng Wu

IPC classification: G06F9/312, G06F12/02, G06F12/08

CPC classification: G06F9/3834, G06F9/3004, G06F9/30087, G06F9/3851, G06F9/52

Abstract: In one embodiment, the present invention introduces a speculation engine to parallelize serial instructions by creating separate threads from the serial instructions and inserting processor instructions to set a synchronization bit before a dependence source and to clear the synchronization bit after a dependence source, where the synchronization bit is designed to stall a dependence sink from a thread running on a separate core. Other embodiments are described and claimed.
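
The set/clear protocol in the abstract can be imitated in software. The sketch below is only an analogy, assuming Python threads in place of cores and a `threading.Event` in place of the hardware synchronization bit; the `SyncBit` class and `run_speculative_pair` function are hypothetical names and do not describe the patent's processor instructions.

```python
import threading


class SyncBit:
    """Software stand-in for a synchronization bit (an assumption, not hardware)."""

    def __init__(self):
        self._clear = threading.Event()
        self._clear.set()  # bit starts cleared: nothing to wait for

    def set(self):
        # Inserted before the dependence source.
        self._clear.clear()

    def clear(self):
        # Inserted after the dependence source.
        self._clear.set()

    def wait_until_clear(self):
        # Executed by the dependence sink; blocks while the bit is set.
        self._clear.wait()


def run_speculative_pair():
    shared = {"x": 0}
    seen = []
    bit = SyncBit()

    def sink_thread():
        bit.wait_until_clear()    # stall while the bit is set
        seen.append(shared["x"])  # dependence sink: read of x

    bit.set()                     # set before the sink can start running
    t = threading.Thread(target=sink_thread)
    t.start()
    shared["x"] = 42              # dependence source: write of x
    bit.clear()                   # release the stalled sink
    t.join()
    return seen


observed = run_speculative_pair()
```

Because the bit is set before the second thread starts and cleared only after the write, the sink can never observe the stale value of `x`.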

6.

Patent application
DYNAMIC OPTIMIZATION FOR CONDITIONAL COMMIT (status: under examination, published)

Publication number: US20120079245A1

Publication date: 2012-03-29

Application number: US12890638

Filing date: 2010-09-25

Applicants: Cheng Wang, Edson Borin, Youfeng Wu, Shiliang Hu, Wei Liu, Mauricio Breternitz, Jr.

Inventors: Cheng Wang, Edson Borin, Youfeng Wu, Shiliang Hu, Wei Liu, Mauricio Breternitz, Jr.

IPC classification: G06F9/312, G06F9/38, G06F9/30

CPC classification: G06F9/3842, G06F8/52, G06F9/3004, G06F9/30072, G06F9/30087, G06F9/30116, G06F9/3857

Abstract: An apparatus and method are described herein for conditionally committing and/or speculatively checkpointing transactions, which potentially results in dynamic resizing of transactions. During dynamic optimization of binary code, transactions are inserted to provide memory ordering safeguards, which enables a dynamic optimizer to more aggressively optimize code. The conditional commit enables efficient execution of the dynamic optimization code while attempting to prevent transactions from running out of hardware resources, and the speculative checkpoints enable quick and efficient recovery upon abort of a transaction. Processor hardware is adapted to support dynamic resizing of the transactions, such as including decoders that recognize a conditional commit instruction, a speculative checkpoint instruction, or both. Processor hardware is further adapted to perform operations to support conditional commit or speculative checkpointing in response to decoding such instructions.

7.

Patent application
APPARATUS, METHOD, AND SYSTEM FOR IMPROVING POWER, PERFORMANCE EFFICIENCY BY COUPLING A FIRST CORE TYPE WITH A SECOND CORE TYPE (status: under examination, published)

Publication number: US20110320766A1

Publication date: 2011-12-29

Application number: US12826107

Filing date: 2010-06-29

Applicants: Youfeng Wu, Shiliang Hu, Edson Borin, Cheng C. Wang, Mauricio Breternitz, Jr., Wei Liu

Inventors: Youfeng Wu, Shiliang Hu, Edson Borin, Cheng C. Wang, Mauricio Breternitz, Jr., Wei Liu

IPC classification: G06F9/30, G06F15/76

CPC classification: G06F9/30076, G06F9/30174, G06F9/3879, G06F9/4893, Y02D10/24

Abstract: An apparatus and method are described herein for coupling a processor core of a first type with a co-designed core of a second type. Execution of program code on the first core is monitored and hot sections of the program code are identified. Those hot sections are optimized for execution on the co-designed core, such that upon subsequently encountering those hot sections, the optimized hot sections are executed on the co-designed core. When the co-designed core is executing optimized hot code, the first processor core may be in a low-power state to save power or executing other code in parallel. Furthermore, multiple threads of cold code may be pipelined on the first core, while multiple threads of hot code are pipelined on the co-designed core to achieve maximum performance.
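
The monitor-then-dispatch flow in the abstract can be pictured with a small counter-based model. This is a sketch under assumptions: "hot" is taken to mean "executed at least `threshold` times on the first core", which the abstract does not specify, and the `HotCodeDispatcher` class and its threshold value are hypothetical.

```python
from collections import Counter

HOT_THRESHOLD = 3  # hypothetical value; the abstract gives no threshold


class HotCodeDispatcher:
    """Counts executions per code section and reroutes hot ones."""

    def __init__(self, threshold=HOT_THRESHOLD):
        self.threshold = threshold
        self.counts = Counter()  # executions observed on the first core
        self.optimized = set()   # sections already optimized for the co-designed core

    def execute(self, section_id):
        """Return which core runs the section on this encounter."""
        if section_id in self.optimized:
            return "co-designed core"
        self.counts[section_id] += 1
        if self.counts[section_id] >= self.threshold:
            # Stand-in for optimizing the section; later encounters are rerouted.
            self.optimized.add(section_id)
        return "first core"


dispatcher = HotCodeDispatcher()
trace = ["loop", "init", "loop", "loop", "loop", "loop"]
placement = [dispatcher.execute(s) for s in trace]
```

The section still runs on the first core on the encounter that makes it hot; only subsequent encounters go to the co-designed core, matching the "upon subsequently encountering" wording.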

8.

Patent application
TECHNOLOGIES FOR LOW-LEVEL COMPOSABLE HIGH PERFORMANCE COMPUTING LIBRARIES (status: in force)

Publication number: US20160188305A1

Publication date: 2016-06-30

Application number: US14583657

Filing date: 2014-12-27

Applicants: Hongbo Rong, Peng Tu, Tatiana Shpeisman, Hai Liu, Todd A. Anderson, Youfeng Wu, Arthur N. Glew, Paul M. Petersen, Victor W. Lee, P.G. Lowney, Arch D. Robinson, Cheng Wang

Inventors: Hongbo Rong, Peng Tu, Tatiana Shpeisman, Hai Liu, Todd A. Anderson, Youfeng Wu, Arthur N. Glew, Paul M. Petersen, Victor W. Lee, P.G. Lowney, Arch D. Robinson, Cheng Wang

IPC classification: G06F9/45, G06F9/44

CPC classification: G06F8/43, G06F8/31, G06F8/443, G06F8/4441, G06F8/453, G06F8/49, G06F8/54

Abstract: Technologies for generating composable library functions include a first computing device that includes a library compiler configured to compile a composable library and a second computing device that includes an application compiler configured to compose library functions of the composable library based on a plurality of abstractions written at different levels of abstraction. For example, the abstractions may include an algorithm abstraction at a high level, a blocked-algorithm abstraction at medium level, and a region-based code abstraction at a low level. Other embodiments are described and claimed herein.

9.

Patent application
SOFTWARE REPLAYER FOR TRANSACTIONAL MEMORY PROGRAMS (status: in force)

Publication number: US20150277968A1

Publication date: 2015-10-01

Application number: US14226312

Filing date: 2014-03-26

Applicants: Justin E. Gottschlich, Gilles A. Pokam, Shiliang Hu, Rolf Kassa, Youfeng Wu, Irina Calciu

Inventors: Justin E. Gottschlich, Gilles A. Pokam, Shiliang Hu, Rolf Kassa, Youfeng Wu, Irina Calciu

IPC classification: G06F9/46, G06F11/36, G11C7/10

CPC classification: G06F9/467, G06F11/34, G06F11/362, G06F11/3648

Abstract: A system is disclosed that includes a processor and a dynamic random access memory (DRAM). The processor includes a hybrid transactional memory (HyTM) that includes hardware transactional memory (HTM), and a program debugger to replay a program that includes an HTM instruction and that has been executed using the HyTM. The program debugger includes a software emulator that is to replay the HTM instruction by emulation of the HTM. Other embodiments are disclosed and claimed.

10.

Patent application
INSTRUCTION AND LOGIC TO EFFICIENTLY MONITOR LOOP TRIP COUNT (status: in force)

Publication number: US20140208085A1

Publication date: 2014-07-24

Application number: US13996861

Filing date: 2012-03-30

Applicants: Jaewoong Chung, Hyunchul Park, Hongbo Rong, Cheng Wang, Youfeng Wu

Inventors: Jaewoong Chung, Hyunchul Park, Hongbo Rong, Cheng Wang, Youfeng Wu

IPC classification: G06F9/32

CPC classification: G06F9/325, G06F8/443, G06F9/30072, G06F9/3842, G06F9/3857, G06F11/3409, G06F11/348, G06F2201/88

Abstract: Logic and instruction to efficiently monitor loop trip count. Loop trip count information of a loop may be stored in a dedicated hardware buffer. Average loop trip count of the loop may be calculated based on the stored loop trip count information. Based on the average trip count, loop optimizations may be applied or removed from the loop. The stored loop trip count information may include an identifier identifying the loop, a total loop trip count of the loop, and an exit count of the loop.
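
A software model of the bookkeeping described in the abstract might look like the sketch below. It keeps the three stored fields (loop identifier, total trip count, exit count) and assumes the average is the total trip count divided by the exit count, which is a natural reading but is not stated explicitly. The buffer capacity, the `min_avg_trips` threshold, and all class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LoopRecord:
    loop_id: int
    total_trips: int = 0  # iterations summed over all completed executions
    exits: int = 0        # number of times the loop has exited


class TripCountMonitor:
    """Dictionary-based stand-in for the dedicated hardware buffer."""

    def __init__(self, capacity=4):
        self.capacity = capacity  # hypothetical number of buffer entries
        self.records = {}

    def record_exit(self, loop_id, trips):
        """Record one loop execution that ran `trips` iterations before exiting."""
        rec = self.records.get(loop_id)
        if rec is None:
            if len(self.records) >= self.capacity:
                return False  # buffer full; this sketch simply drops the sample
            rec = self.records[loop_id] = LoopRecord(loop_id)
        rec.total_trips += trips
        rec.exits += 1
        return True

    def average(self, loop_id):
        rec = self.records.get(loop_id)
        if rec is None or rec.exits == 0:
            return None
        return rec.total_trips / rec.exits  # assumed definition of the average


def should_optimize(monitor, loop_id, min_avg_trips=8.0):
    """Apply the optimization only to loops with a high average trip count."""
    avg = monitor.average(loop_id)
    return avg is not None and avg >= min_avg_trips


monitor = TripCountMonitor()
monitor.record_exit(1, 10)
monitor.record_exit(1, 20)
monitor.record_exit(2, 2)
```

Storing a running total and an exit count keeps each entry at a fixed size, so the average can be derived on demand without retaining every sample.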
