Patent search ap:"YOUFENG WU" Page 3

21.

Granted invention patent
Run-ahead program execution with value prediction (status: lapsed)

Publication number: US07188234B2

Publication date: 2007-03-06

Application number: US10017793

Application date: 2001-12-12

Applicant: Youfeng Wu, Tin-Fook Ngai

Inventor: Youfeng Wu, Tin-Fook Ngai

IPC: G06F9/312

CPC classification number: G06F9/383, G06F9/3832, G06F9/3842, G06F9/3861

Abstract: A data processing apparatus, a computer, an article including a machine-accessible medium, and a method of processing data are disclosed. The data processing apparatus may include a pair of pipelines sharing an instruction cache, data cache, and a branch predictor with the second pipeline running ahead of the first pipeline using a data value prediction module. The pipelines may be included in one or more processors and coupled to a memory to form a computer. The method includes executing a plurality of instructions using the pipeline pair, such that when a cache miss is encountered by the second pipeline during execution of a LOAD instruction, the data value prediction module supplies a predicted load value in lieu of a cached value, enabling continued execution of the plurality of instructions by the second pipeline without waiting for the return of the cached value.

22.

Granted invention patent
Cache mechanism (status: lapsed)

Publication number: US07120749B2

Publication date: 2006-10-10

Application number: US10803452

Application date: 2004-03-18

Applicant: Ryan Rakvic, Youfeng Wu, Bryan Black, John Shen

Inventor: Ryan Rakvic, Youfeng Wu, Bryan Black, John Shen

IPC: G06F12/12

CPC classification number: G06F12/0848, G06F12/0888

Abstract: According to one embodiment a system is disclosed. The system includes a central processing unit (CPU), a first cache memory coupled to the CPU to store only data for vital loads that are to be immediately processed at the CPU, a second cache memory coupled to the CPU to store data for semi-vital loads to be processed at the CPU, and a third cache memory coupled to the CPU, the first cache memory and the second cache memory to store non-vital loads to be processed at the CPU.

23.

Granted invention patent
Software set-value profiling and code reuse (status: in force)

Publication number: US07100155B1

Publication date: 2006-08-29

Application number: US09522510

Application date: 2000-03-10

Applicant: Youfeng Wu

Inventor: Youfeng Wu

IPC: G06F9/45, G06F9/44

CPC classification number: G06F8/443

Abstract: An apparatus and method for profiling candidate reuse regions and candidate load instructions aids in the selection of computation reuse regions and computation reuse instructions with good reuse qualities. Registers holding input values for candidate reuse regions are sampled periodically when the candidate reuse region is encountered. The register contents are combined into set-values. When a relatively small number of set-values account for a large percentage of occurrences, the candidate reuse region may be a good computation reuse region. Load instructions are profiled for the location accessed and the value loaded. The location and value are combined into location-values. The relative occurrence frequency of location-values can be used to evaluate load instructions as candidate instructions for reuse.
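
As a rough illustration of the set-value idea in this abstract, the sketch below counts how often each combination of sampled input-register values occurs and reports the fraction of samples covered by the most frequent combinations. The function name `top_coverage`, the tuple representation of a set-value, and the sample data are assumptions made for illustration; none of them come from the patent.

```python
from collections import Counter

def top_coverage(samples, k):
    """Return the fraction of samples covered by the k most frequent set-values.

    Each sample is a tuple of input-register values captured when the
    candidate region was entered (a hypothetical representation).
    """
    if not samples:
        return 0.0
    counts = Counter(samples)
    covered = sum(n for _, n in counts.most_common(k))
    return covered / len(samples)

# Hypothetical samples of two input registers for one candidate region.
samples = [(1, 2)] * 6 + [(3, 4)] * 3 + [(5, 6)]
print(top_coverage(samples, 2))  # 0.9
```

A high coverage for a small `k` corresponds to the abstract's case where a few set-values account for most occurrences, which would mark the region as a plausible reuse candidate.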

24.

Granted invention patent
Early exit transformations for software pipelining (status: in force)

Publication number: US06571385B1

Publication date: 2003-05-27

Application number: US09273947

Application date: 1999-03-22

Applicant: Kalyan Muthukumar, Dong-Yuan Chen, Youfeng Wu, Daniel M. Lavery

Inventor: Kalyan Muthukumar, Dong-Yuan Chen, Youfeng Wu, Daniel M. Lavery

IPC: G06F9/44

CPC classification number: G06F9/325, G06F8/4452, G06F9/30072, G06F9/30094

Abstract: The invention is directed to the transformation of software loops having early exit conditions, thereby allowing the loops to be more effectively converted to a single basic block for software pipelining. The invention assigns a predicate register for each early exit condition of the software loop. The predicate registers are set when the corresponding early exit condition is satisfied. In this manner, when the loop terminates the predicate registers can be examined to indicate which early exit conditions were satisfied. The invention produces loops having a lower recurrence II and resource II than conventional techniques.
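
The effect of giving each early exit its own predicate can be pictured with a small high-level analogy. In the sketch below, two Boolean flags stand in for predicate registers, the loop has a single exit test, and the flags are examined after the loop to see which condition fired. The function `find_first`, its two exit conditions, and the return values are hypothetical; the patent describes a compiler transformation for predicated hardware, not this Python code.

```python
def find_first(values, target, limit):
    """Scan `values`, stopping at the first element equal to `target`
    or greater than `limit`.

    Each early exit sets its own flag (a stand-in for a predicate
    register); the flags are inspected once the loop has terminated.
    """
    p_found = False  # predicate for exit 1: element equals target
    p_over = False   # predicate for exit 2: element exceeds limit
    i = 0
    while i < len(values) and not (p_found or p_over):
        v = values[i]
        p_found = v == target
        p_over = v > limit
        i += 1
    if p_found:
        return ("found", i - 1)
    if p_over:
        return ("over_limit", i - 1)
    return ("exhausted", i)

print(find_first([1, 5, 9], 5, 8))  # ('found', 1)
```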

25.

Granted invention patent
Optimizing compiler with static prediction of branch probability, branch frequency and function frequency (status: lapsed)

Publication number: US5655122A

Publication date: 1997-08-05

Application number: US417219

Application date: 1995-04-05

Applicant: Youfeng Wu

Inventor: Youfeng Wu

IPC: G06F9/45

CPC classification number: G06F8/445, G06F8/4441

Abstract: A compiler and method for optimizing a program based on branch probabilities, branch frequencies and function frequencies. A number of algorithms executed by the compiler determine statically from the program code the probabilities that branches within the program are taken and how often the branches are taken. With this information, the compiler arranges the object code in memory to improve execution of the program. The frequency of functions within the code may be determined from the branch probability and branch frequency information. The compiler uses the function frequency information to arrange the functions in a desirable order, such as storing function pairs with the highest global call frequencies on the same memory page. This minimizes the number of calls to functions that are stored on disk and thus improves the speed of execution of the program.

26.

Invention patent application
METHOD AND APPARATUS FOR SPECULATIVE VECTORIZATION (status: under examination, published)

Publication number: US20180018177A1

Publication date: 2018-01-18

Application number: US15653403

Application date: 2017-07-18

Applicant: NALINI VASUDEVAN, CHENG WANG, YOUFENG WU, ALBERT HARTONO, SARA S. BAGHSORKHI

Inventor: NALINI VASUDEVAN, CHENG WANG, YOUFENG WU, ALBERT HARTONO, SARA S. BAGHSORKHI

IPC: G06F9/38, G06F9/30, G06F15/80

CPC classification number: G06F9/3842, G06F9/30032, G06F9/30036, G06F9/3004, G06F9/30043, G06F9/3013, G06F9/30174, G06F9/3824, G06F9/3834, G06F9/3838, G06F9/384, G06F15/8053

Abstract: An apparatus and method for speculative vectorization. For example, one embodiment of a processor comprises: a queue comprising a set of locations for storing addresses associated with vectorized memory access instructions; and execution logic to execute a first vectorized memory access instruction to access the queue and to compare a new address associated with the first vectorized memory access instruction with existing addresses stored within a specified range of locations within the queue to detect whether a conflict exists, the existing addresses having been previously stored responsive to one or more prior vectorized memory access instructions.

27.

Granted invention patent
Energy/performance with optimal communication in dynamic parallelization of single threaded programs (status: in force)

Publication number: US09715376B2

Publication date: 2017-07-25

Application number: US12344721

Application date: 2008-12-29

Applicant: Cheng Wang, Youfeng Wu

Inventor: Cheng Wang, Youfeng Wu

IPC: G06F9/44, G06F9/45

CPC classification number: G06F8/443, G06F8/41, G06F8/445, G06F8/457

Abstract: A method and apparatus for optimizing parallelized single threaded programs is herein described. Code regions, such as dependency chains, are replicated utilizing any known method, such as dynamic code replication. A flow network associated with a replicated code region is built and a minimum cut algorithm is applied to determine duplicated nodes, which may include a single instruction or a group of instructions, to be removed. The dependency of removed nodes is fulfilled with inserted communication to ensure proper data consistency of the original single-threaded program. As a result, both performance and power consumption are optimized for parallel code sections through removal of expensive workload nodes and replacement with communication between other replicated code regions to be executed in parallel.

28.

Invention patent application
FAST APPROXIMATE CONFLICT DETECTION (status: in force)

Publication number: US20160188392A1

Publication date: 2016-06-30

Application number: US14582430

Application date: 2014-12-24

Applicant: SARA S. BAGHSORKHI, ALBERT HARTONO, YOUFENG WU, CHENG WANG

Inventor: SARA S. BAGHSORKHI, ALBERT HARTONO, YOUFENG WU, CHENG WANG

IPC: G06F11/07

CPC classification number: G06F9/50, G06F9/30, G06F9/3834, G06F9/3838

Abstract: The present disclosure is directed to fast approximate conflict detection. A device may comprise, for example, a memory, a processor and a fast conflict detection module (FCDM) to cause the processor to perform fast conflict detection. The FCDM may cause the processor to read a first and second vector from memory, and to then generate summaries based on the first and second vectors. The summaries may be, for example, shortened versions of write and read addresses in the first and second vectors. The FCDM may then cause the processor to distribute the summaries into first and second summary vectors, and may then determine potential conflicts between the first and second vectors by comparing the first and second summary vectors. The summaries may be distributed into the first and second summary vectors in a manner allowing all of the summaries to be compared to each other in one vector comparison transaction.
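
To make the notion of address summaries concrete, here is a minimal scalar sketch. It assumes a summary is simply the low bits of an address and uses a Python set in place of the single vector comparison the abstract describes. The names `summarize` and `may_conflict` and the default of 8 bits are illustrative assumptions, not details from the application.

```python
def summarize(addresses, bits=8):
    """Shorten each address to its low `bits` bits (an assumed summary function)."""
    mask = (1 << bits) - 1
    return [a & mask for a in addresses]

def may_conflict(write_addrs, read_addrs, bits=8):
    """Return True if any write summary equals any read summary.

    Equal addresses always share a summary, so a real conflict is never
    missed; distinct addresses may collide, so True can be a false positive.
    """
    writes = set(summarize(write_addrs, bits))
    return any(s in writes for s in summarize(read_addrs, bits))

print(may_conflict([0x1000, 0x2004], [0x3008, 0x2004]))  # True: 0x2004 appears in both
print(may_conflict([0x1000], [0x1104]))                  # False: summaries 0x00 and 0x04 differ
print(may_conflict([0x1000], [0x2000]))                  # True: false positive, both summarize to 0x00
```

The third call shows why the detection is only approximate: a reported conflict still has to be confirmed against the full addresses, while a negative result can be trusted.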

29.

Granted invention patent
Bi-directional copying of register content into shadow registers (status: in force)

Publication number: US09292221B2

Publication date: 2016-03-22

Application number: US13995943

Application date: 2011-09-29

Applicant: Cheng Wang, Youfeng Wu, Jaewoong Chung

Inventor: Cheng Wang, Youfeng Wu, Jaewoong Chung

IPC: G06F3/06, G06F9/30, G06F9/38

CPC classification number: G06F3/065, G06F9/30116, G06F9/30123, G06F9/3863

Abstract: Embodiments of the present disclosure describe a processor, which may include copy circuitry coupled to a shadow register file and a control register. The copy circuitry may be configured to copy content from a range of a number of registers to a shadow range of the shadow register file in a forward or backward direction. The forward or backward direction may be based at least in part on a value stored in the control register.

30.

Invention patent application
SOFTWARE PIPELINING AT RUNTIME (status: in force)

Publication number: US20140298306A1

Publication date: 2014-10-02

Application number: US13853430

Application date: 2013-03-29

Applicant: Hongbo Rong, Hyunchul Park, Youfeng Wu

Inventor: Hongbo Rong, Hyunchul Park, Youfeng Wu

IPC: G06F9/45

CPC classification number: G06F8/4452, G06F8/433

Abstract: Apparatuses and methods may provide for determining a level of performance for processing one or more loops by a dynamic compiler and executing code optimizations to generate a pipelined schedule for the one or more loops that achieves the determined level of performance within a prescribed time period. In one example, a dependence graph may be established for the one or more loops, and each dependence graph may be partitioned into stages based on the level of performance.
