Abstract:
Providing variable interpretation of usefulness indicators for memory tables in processor-based systems is disclosed. In one aspect, a memory system comprises a memory table providing multiple memory table entries, each including a usefulness indicator. A memory controller of the memory system comprises a global polarity indicator representing how the usefulness indicator for each memory table entry is interpreted and updated by the memory controller. If the global polarity indicator is set, the memory controller interprets a value of each usefulness indicator as directly corresponding to the usefulness of the corresponding memory table entry. Conversely, if the global polarity indicator is not set, the polarity is reversed such that the memory controller interprets the usefulness indicator value as inversely corresponding to the usefulness of the corresponding memory table entry. In this manner, the interpretation and updating of usefulness indicators by the memory controller can be varied using the global polarity indicator.
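As an illustration only, the following C++ sketch models the polarity-dependent interpretation and update of a usefulness counter. The type and member names (UsefulnessTable, is_useful, reward, penalize, global_polarity) and the 2-bit counter width are assumptions made for the sketch, not details taken from the disclosure.

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Hypothetical entry: a 2-bit saturating usefulness counter (values 0..3).
    struct TableEntry {
        uint8_t usefulness = 0;
    };

    struct UsefulnessTable {
        std::vector<TableEntry> entries;
        bool global_polarity = true;  // set: counter directly tracks usefulness
                                      // clear: counter inversely tracks usefulness

        explicit UsefulnessTable(std::size_t n) : entries(n) {}

        // Interpretation of the counter depends on the global polarity indicator.
        bool is_useful(std::size_t i) const {
            uint8_t v = entries[i].usefulness;
            return global_polarity ? (v >= 2) : (v < 2);
        }

        // Updates are likewise steered by the polarity: a reward always moves the
        // counter toward the value that the current polarity reads as "useful".
        void reward(std::size_t i) {
            uint8_t& v = entries[i].usefulness;
            if (global_polarity) { if (v < 3) ++v; }
            else                 { if (v > 0) --v; }
        }
        void penalize(std::size_t i) {
            uint8_t& v = entries[i].usefulness;
            if (global_polarity) { if (v > 0) --v; }
            else                 { if (v < 3) ++v; }
        }
    };

Flipping global_polarity reinterprets every counter in the table at once, without rewriting any entry.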
Abstract:
Performing distributed branch prediction using fused processor cores in processor-based systems is disclosed. In one aspect, a distributed branch predictor is provided as a plurality of processor cores supporting core fusion. Each processor core is configured to receive a program identifier from another of the processor cores (or from itself), generate a subsequent predicted program identifier, and forward the predicted program identifier (and, optionally, a global history indicator) to the appropriate processor core responsible for handling the next prediction. The processor core also fetches a header and/or one or more instructions for the received program identifier, and sends the header and/or the one or more instructions to the appropriate processor core for execution. The processor core also determines the processor core that will handle execution of the predicted program identifier, and sends that information to the processor core that received the predicted program identifier as an instruction window tracker.
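A minimal C++ sketch of the forwarding flow under stated assumptions: the predictor itself is a placeholder, and core selection (owner_of, pick_execution_core) is simple modulo arithmetic chosen purely for illustration; none of these names or policies come from the disclosure.

    #include <cstdint>
    #include <vector>

    // Hypothetical fused group: each core owns a slice of the distributed
    // predictor, and core selection is a simple function of the identifier.
    struct Prediction {
        uint64_t program_id;      // predicted next program identifier
        uint64_t global_history;  // optionally forwarded with the prediction
    };

    class FusedCore {
    public:
        explicit FusedCore(unsigned num_cores) : num_cores_(num_cores) {}

        // Receive a program identifier from another core (or from this core).
        void receive(uint64_t program_id, uint64_t history,
                     std::vector<FusedCore>& group) {
            // 1. Generate the next predicted program identifier (placeholder).
            Prediction next{predict(program_id, history), (history << 1) | 1u};

            // 2. Decide which core is responsible for the next prediction, and
            //    which core will execute the predicted block, then forward both.
            unsigned next_owner = owner_of(next.program_id);
            unsigned exec_core  = pick_execution_core(next.program_id);
            group[next_owner].track(next, exec_core);

            // 3. Fetching the header/instructions for 'program_id' and sending
            //    them to the execution core is elided in this sketch.
        }

        // Acting as the instruction window tracker for the forwarded prediction.
        void track(const Prediction& /*p*/, unsigned /*exec_core*/) {}

    private:
        uint64_t predict(uint64_t pid, uint64_t hist) const {
            return pid + 4 * ((hist & 1u) + 1);   // stand-in for a real predictor
        }
        unsigned owner_of(uint64_t pid) const {
            return static_cast<unsigned>(pid % num_cores_);
        }
        unsigned pick_execution_core(uint64_t pid) const {
            return static_cast<unsigned>((pid >> 4) % num_cores_);
        }

        unsigned num_cores_;
    };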
Abstract:
Systems and methods relate to a hierarchical register file system including a level 1 physical register file (L1 PRF) and a backing physical register file (PRF). A subset of the outputs of instructions executed in an instruction pipeline of a processor that are deemed to have a high likelihood of use by one or more future instructions is identified. This subset of instruction outputs is stored in the L1 PRF, while all instruction outputs are stored in the backing PRF.
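A minimal C++ sketch of the filtered write policy; the class and member names (HierarchicalPrf, likely_used_soon) and the use of a hash map as a stand-in for the small L1 PRF are illustrative only.

    #include <cstdint>
    #include <unordered_map>
    #include <vector>

    // Hypothetical sketch: every result goes to the backing PRF; only results
    // predicted to be needed soon are also installed in the small L1 PRF.
    class HierarchicalPrf {
    public:
        explicit HierarchicalPrf(std::size_t num_regs) : backing_(num_regs, 0) {}

        void write(unsigned preg, uint64_t value, bool likely_used_soon) {
            backing_[preg] = value;                  // all outputs land here
            if (likely_used_soon)                    // filtered subset lands here
                l1_[preg] = value;
        }

        uint64_t read(unsigned preg) const {
            auto it = l1_.find(preg);
            if (it != l1_.end()) return it->second;  // fast path: L1 PRF hit
            return backing_[preg];                   // slower path: backing PRF
        }

    private:
        std::vector<uint64_t> backing_;              // full physical register file
        std::unordered_map<unsigned, uint64_t> l1_;  // stand-in for the L1 PRF
    };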
Abstract:
Physical register scrubbing in computer microprocessors. Most instructions in a computer program produce some output value that is destined for one or more architected registers. These architected destination registers are renamed, in the processor pipeline, to physical registers in order to improve performance by exposing more instruction level parallelism to the processor. In one aspect, a method comprises identifying, in a reorder buffer, a first instruction and a second instruction, without intervening potential pipeline flushers, that write to the same architected destination register, in order to free the physical register corresponding to the older of the two instructions.
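The following C++ sketch illustrates one way such a reorder buffer scan could look. The entry layout, the scrub function, and the rule of forgetting prior writers at a potential flusher are assumptions made for illustration; a real implementation would also have to guarantee that the younger writer commits before the older physical register is reused.

    #include <cstddef>
    #include <unordered_map>
    #include <vector>

    // Hypothetical ROB entry: which architected register it writes, which physical
    // register holds that result, and whether the instruction could flush the pipe.
    struct RobEntry {
        int  arch_dest;          // -1 if the instruction writes no register
        int  phys_dest;
        bool potential_flusher;  // e.g. a branch or an instruction that may fault
    };

    // Walk the ROB oldest-to-youngest; when two writers of the same architected
    // register appear with no potential flusher in between, the older writer's
    // physical register can be scrubbed (freed) early.
    std::vector<int> scrub(const std::vector<RobEntry>& rob) {
        std::vector<int> freed;
        std::unordered_map<int, std::size_t> last_writer;  // arch reg -> latest writer index
        for (std::size_t i = 0; i < rob.size(); ++i) {
            if (rob[i].potential_flusher) {
                // A flush could restore older mappings, so forget prior writers.
                last_writer.clear();
            }
            if (rob[i].arch_dest < 0) continue;
            auto it = last_writer.find(rob[i].arch_dest);
            if (it != last_writer.end())
                freed.push_back(rob[it->second].phys_dest);  // older mapping is dead
            last_writer[rob[i].arch_dest] = i;
        }
        return freed;
    }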
Abstract:
Caching instruction block header data in block architecture processor-based systems is disclosed. In one aspect, a computer processor device, based on a block architecture, provides an instruction block header cache dedicated to caching instruction block header data. Upon a subsequent fetch of an instruction block, cached instruction block header data may be retrieved from the instruction block header cache (if present) and used to optimize processing of the instruction block. In some aspects, the instruction block header data may include a microarchitectural block header (MBH) generated upon the first decoding of the instruction block by an MBH generation circuit. The MBH may contain static or dynamic information about the instructions within the instruction block. As non-limiting examples, the information may include data relating to register reads and writes, load and store operations, branch information, predicate information, special instructions, and/or serial execution preferences.
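A minimal C++ sketch, assuming a simple address-indexed cache; the header fields and the names MicroarchBlockHeader and BlockHeaderCache are invented for the sketch and need not match the disclosed MBH format.

    #include <cstdint>
    #include <optional>
    #include <unordered_map>

    // Hypothetical microarchitectural block header: static/dynamic facts about the
    // instructions in one block, filled in the first time the block is decoded.
    struct MicroarchBlockHeader {
        uint32_t register_reads  = 0;
        uint32_t register_writes = 0;
        uint8_t  load_count      = 0;
        uint8_t  store_count     = 0;
        bool     has_branch      = false;
        bool     has_predicates  = false;
        bool     prefers_serial  = false;
    };

    // Cache keyed by the block's address; a later fetch of the same block can pull
    // the header and skip regenerating this information.
    class BlockHeaderCache {
    public:
        std::optional<MicroarchBlockHeader> lookup(uint64_t block_addr) const {
            auto it = cache_.find(block_addr);
            if (it == cache_.end()) return std::nullopt;
            return it->second;
        }
        void install(uint64_t block_addr, const MicroarchBlockHeader& mbh) {
            cache_[block_addr] = mbh;   // produced by the MBH generation circuit
        }
    private:
        std::unordered_map<uint64_t, MicroarchBlockHeader> cache_;
    };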
Abstract:
Reduced logic level operation folding of context history in a history register in a prediction system for a processor-based system is disclosed. The prediction system includes a prediction circuit employing reduced operation folding of the history register for indexing a prediction table containing prediction values used to process a consumer instruction when its value has not yet been resolved. To avoid having to perform successive logic folding operations to produce a folded context history of a resultant reduced bit width, a reduced logic level folding operation of the resultant reduced bit width is employed. This reduced logic level folding operation uses the current folded context history, formed from the previous contents of the history register, as the basis for determining the new folded context history. In this manner, logic folding of the history register is faster and operates with reduced power consumption as a result of fewer logic operations.
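This incremental update is in the spirit of the folded-history bookkeeping used in TAGE-style predictors. The C++ sketch below assumes illustrative names and widths (FoldedHistory, hist_len, folded_len); the exact wiring in the disclosure may differ.

    #include <cstdint>

    // Hypothetical sketch of incremental ("reduced logic level") history folding:
    // instead of XOR-ing every chunk of the full history each cycle, the new folded
    // value is derived from the previous folded value, the bit shifted into the
    // history register, and the bit shifted out of it.
    struct FoldedHistory {
        uint32_t folded = 0;   // current folded context history
        unsigned hist_len;     // length of the underlying history register
        unsigned folded_len;   // resultant reduced bit width (< 32 in this sketch)

        FoldedHistory(unsigned h, unsigned f) : hist_len(h), folded_len(f) {}

        // new_bit: bit shifted into the history register this cycle
        // old_bit: bit falling off the end of the history register this cycle
        void update(unsigned new_bit, unsigned old_bit) {
            folded = (folded << 1) | (new_bit & 1u);              // mix in newest bit
            folded ^= (old_bit & 1u) << (hist_len % folded_len);  // cancel aged-out bit
            folded ^= folded >> folded_len;                       // wrap the overflow bit
            folded &= (1u << folded_len) - 1u;                    // keep the reduced width
        }
    };

Because only the newly shifted-in bit and the aged-out bit are combined with the previous folded value, the per-cycle work is a constant handful of XORs instead of a fold over the entire history register.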
Abstract:
The disclosure generally relates to dynamic clock and voltage scaling (DCVS) based on program phase. For example, during each program phase, a first hardware counter may count each cycle in which a dispatch stall occurs and the oldest instruction in the load queue is a last-level cache miss, a second hardware counter may count total cycles, and a third hardware counter may count committed instructions. Accordingly, a software/firmware mechanism may read the various hardware counters once the committed-instruction counter reaches a threshold value and divide the value of the first hardware counter by the value of the second hardware counter to measure the stall fraction during the current program execution phase. The measured stall fraction can then be used to predict the stall fraction in the next program execution phase, such that optimal voltage and frequency settings can be applied in the next phase based on the predicted stall fraction.
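A minimal C++ sketch of the counter arithmetic and a possible mapping to operating points. The threshold values and the last-value prediction policy (assuming the next phase behaves like the current one) are assumptions made for the sketch, not details taken from the disclosure.

    #include <cstdint>

    // Hypothetical counters sampled by software/firmware at the end of a phase
    // (the phase boundary is the committed-instruction counter hitting a threshold).
    struct PhaseCounters {
        uint64_t llc_miss_stall_cycles;  // cycles with a dispatch stall while the
                                         // oldest load is a last-level cache miss
        uint64_t total_cycles;
        uint64_t committed_instructions;
    };

    // Measured stall fraction for the phase that just ended.
    double stall_fraction(const PhaseCounters& c) {
        return c.total_cycles ? static_cast<double>(c.llc_miss_stall_cycles) /
                                static_cast<double>(c.total_cycles)
                              : 0.0;
    }

    // Illustrative policy only: map the predicted stall fraction for the next
    // phase to a voltage/frequency operating point (thresholds are made up).
    enum class VfPoint { kHighPerf, kNominal, kLowPower };

    VfPoint pick_operating_point(double predicted_stall_fraction) {
        if (predicted_stall_fraction > 0.5) return VfPoint::kLowPower;  // memory bound
        if (predicted_stall_fraction > 0.2) return VfPoint::kNominal;
        return VfPoint::kHighPerf;                                      // compute bound
    }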
Abstract:
Storing narrow produced values for instruction operands directly in a register map in an out-of-order processor (OoP) is provided. An OoP is provided that includes an instruction processing system. The instruction processing system includes a number of instruction processing stages configured to pipeline the processing and execution of instructions according to a dataflow execution model. The instruction processing system also includes a register map table (RMT) configured to store address pointers mapping logical registers to physical registers in a physical register file (PRF) for storing produced data for use by consumer instructions without overwriting logical registers for later-executed, out-of-order instructions. In certain aspects, the instruction processing system is configured to write back (i.e., store) narrow values produced by executed instructions directly into the RMT, as opposed to writing the narrow produced values into the PRF in a write back stage.
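A minimal C++ sketch, assuming a single flag per RMT entry and a 16-bit narrow-width cutoff; the names RmtEntry, kNarrowMax, write_back, and read_source are invented for illustration.

    #include <cstdint>

    // Hypothetical RMT entry: for a narrow result the value itself is written into
    // the map entry at write-back instead of into the PRF; wide results keep the
    // usual pointer into the PRF.
    struct RmtEntry {
        bool     holds_value;   // true: 'payload' is the narrow value itself
        uint32_t payload;       // narrow value, or physical register number
    };

    constexpr uint64_t kNarrowMax = 0xFFFFu;   // illustrative narrow-width cutoff

    // Called at write-back for the logical destination register's RMT entry.
    void write_back(RmtEntry& e, uint64_t value, uint32_t preg,
                    uint64_t* prf /* physical register file */) {
        if (value <= kNarrowMax) {
            e.holds_value = true;              // store the produced value in the RMT
            e.payload = static_cast<uint32_t>(value);
        } else {
            e.holds_value = false;             // fall back to the normal PRF path
            e.payload = preg;
            prf[preg] = value;
        }
    }

    // A consumer reads its source either directly from the RMT or through the PRF.
    uint64_t read_source(const RmtEntry& e, const uint64_t* prf) {
        return e.holds_value ? e.payload : prf[e.payload];
    }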
Abstract:
Systems and methods are directed to instruction execution in a computer system having an out-of-order instruction picker, which is typically used in computing systems capable of executing multiple instructions in parallel. Such systems are typically block based, with multiple instructions grouped into execution units such as Reservation Station (RSV) Arrays. If an event such as an exception or page fault occurs, the block may have to be swapped out, that is, removed from execution, until the event clears. When the event clears, the block is typically brought back to be executed, but is usually assigned a different RSV Array and re-executed from the beginning of the block. Tagging instructions that may cause such events, and then untagging them by resetting the tag once they have executed, can eliminate much of this unnecessary re-execution of instructions.
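A minimal C++ sketch of the tag-and-untag idea under stated assumptions; RsvSlot, on_executed, and slots_to_replay are illustrative names, and the replay rule is deliberately simplified (it ignores data dependences between replayed and already-completed instructions).

    #include <cstddef>
    #include <vector>

    // Hypothetical RSV Array slot: each instruction carries a tag marking it as a
    // potential event source (exception, page fault, ...). The tag is reset once
    // the instruction has executed without raising the event.
    struct RsvSlot {
        bool may_cause_event = false;  // set at insert time for risky instructions
        bool executed        = false;
    };

    // After the instruction executes cleanly, clear its tag so that a later
    // re-activation of the block does not need to re-execute it.
    void on_executed(RsvSlot& s) {
        s.executed = true;
        s.may_cause_event = false;
    }

    // When the block returns after the event clears (possibly in a different RSV
    // Array), only slots that are still tagged or not yet executed are replayed.
    std::vector<std::size_t> slots_to_replay(const std::vector<RsvSlot>& block) {
        std::vector<std::size_t> replay;
        for (std::size_t i = 0; i < block.size(); ++i)
            if (!block[i].executed || block[i].may_cause_event)
                replay.push_back(i);
        return replay;
    }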
Abstract:
The clock frequency of a processor is reduced in response to a dispatch stall due to a cache miss. In an embodiment, the processor clock frequency is reduced for a load instruction that causes a last level cache miss, provided that the load instruction is the oldest load instruction and the number of consecutive processor cycles in which there is a dispatch stall exceeds a threshold, and provided that the total number of processor cycles since the last level cache miss does not exceed some specified number.
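A minimal C++ sketch of the decision, with made-up threshold values and state names (StallState, kStallThreshold, kMissCycleLimit); the disclosure leaves the actual threshold and cycle limit unspecified.

    #include <cstdint>

    // Hypothetical per-cycle check deciding whether to drop the clock frequency.
    struct StallState {
        bool     dispatch_stalled;
        bool     oldest_load_is_llc_miss;  // the oldest load missed the last-level cache
        uint32_t consecutive_stall_cycles;
        uint32_t cycles_since_llc_miss;
    };

    constexpr uint32_t kStallThreshold = 32;    // consecutive stall cycles required
    constexpr uint32_t kMissCycleLimit = 2048;  // stop reducing after this many cycles

    bool should_reduce_frequency(const StallState& s) {
        return s.dispatch_stalled &&
               s.oldest_load_is_llc_miss &&
               s.consecutive_stall_cycles > kStallThreshold &&
               s.cycles_since_llc_miss <= kMissCycleLimit;
    }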