Methods and apparatus to insert profiling instructions into a graphics processing unit kernel

    Publication No.: US11775304B2

    Publication date: 2023-10-03

    Application No.: US17359114

    Filing date: 2021-06-25

    Applicant: Intel Corporation

    Abstract: Embodiments are disclosed for inserting profiling instructions into graphics processing unit (GPU) kernels. An example apparatus includes instructions, and at least one processor to execute the instructions to determine whether a GPU supports modification of entry point addresses, detect a first entry point address and a second entry point address of an original GPU kernel, create a corresponding instrumented GPU kernel from the original GPU kernel based on the determination by inserting at least one of first profiling initialization instructions or first jump instructions at the first entry point address of the instrumented GPU kernel, inserting at least one of second profiling initialization instructions or second jump instructions at the second entry point address of the instrumented GPU kernel, and inserting profiling measurement instructions into the instrumented GPU kernel.
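
The choice between inline initialization and jump instructions can be illustrated with a toy model. The sketch below is one plausible reading of the abstract, not the patented implementation: the `Kernel` class, the `instrument` function, and the `PROF_INIT`, `PROF_MEASURE`, and `JMP` mnemonics are all hypothetical, and placing the measurement instruction next to the initialization is an assumption made only to keep the example short.

```python
from dataclasses import dataclass


@dataclass
class Kernel:
    instructions: list   # toy instruction stream, one string per slot
    entry_points: list   # slot indices at which execution may begin


def instrument(kernel, can_modify_entry_points):
    """Return an instrumented copy of a toy kernel (hypothetical model)."""
    entries = sorted(kernel.entry_points)
    if can_modify_entry_points:
        # Entry addresses may move: place the profiling code inline and
        # record where each entry point ends up.
        out, new_entries = [], []
        for idx, ins in enumerate(kernel.instructions):
            if idx in entries:
                new_entries.append(len(out))
                out.append(f"PROF_INIT {entries.index(idx)}")
                out.append("PROF_MEASURE")
            out.append(ins)
        return Kernel(out, new_entries)

    # Entry addresses are fixed: overwrite each entry slot with a jump to a
    # block appended at the end, which initializes profiling, runs the
    # displaced instruction, and jumps back. (A real kernel would end with
    # a terminating instruction before these appended blocks.)
    out = list(kernel.instructions)
    for n, idx in enumerate(entries):
        target = len(out)
        out[idx] = f"JMP {target}"
        out += [f"PROF_INIT {n}", "PROF_MEASURE",
                kernel.instructions[idx], f"JMP {idx + 1}"]
    return Kernel(out, entries)


original = Kernel(["A", "B", "C", "D"], [0, 2])
inline = instrument(original, can_modify_entry_points=True)
patched = instrument(original, can_modify_entry_points=False)
```

In this model `inline.entry_points` shifts to account for the inserted instructions, while `patched.entry_points` stays identical to the original kernel's.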

    Persistent relocatable reset vector for processor

    Publication No.: US09959120B2

    Publication date: 2018-05-01

    Application No.: US13750013

    Filing date: 2013-01-25

    Applicant: Apple Inc.

    IPC classification: G06F1/32 G06F9/32 G06F9/30

    CPC classification: G06F9/322 G06F9/30076

    Abstract: In an embodiment, an integrated circuit includes at least one processor. The processor may include a reset vector base address register configured to store a reset vector address for the processor. Responsive to a reset, the processor may be configured to capture a reset vector address on an input, updating the reset vector base address register. Upon release from reset, the processor may initiate instruction execution at the reset vector address. The integrated circuit may further include a logic circuit that is coupled to provide the reset vector address. The logic circuit may include a register that is programmable with the reset vector address. More particularly, in an embodiment, the register may be programmable via a write operation issued by the processor (e.g., a memory-mapped write operation). Accordingly, the reset vector address may be programmable in the integrated circuit, and may be changed from time to time.
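
The reset sequence described in the abstract can be sketched as a small behavioral model. This is an illustration only: the class names `ResetVectorLogic` and `ResetProcessor`, their methods, and the example addresses are hypothetical and do not come from the patent.

```python
class ResetVectorLogic:
    """Logic circuit holding the programmable reset vector (toy model)."""

    def __init__(self, default_vector):
        self.register = default_vector

    def mmio_write(self, value):
        # Stands in for a memory-mapped write issued by the processor.
        self.register = value


class ResetProcessor:
    def __init__(self, logic):
        self.logic = logic
        self.rvbar = None      # reset vector base address register
        self.pc = None
        self.in_reset = False

    def program_reset_vector(self, address):
        self.logic.mmio_write(address)

    def assert_reset(self):
        # Responsive to reset, capture the vector presented on the input.
        self.in_reset = True
        self.rvbar = self.logic.register

    def release_reset(self):
        # Upon release, execution begins at the captured address.
        self.in_reset = False
        self.pc = self.rvbar


logic = ResetVectorLogic(default_vector=0x0)
cpu = ResetProcessor(logic)
cpu.assert_reset()
cpu.release_reset()
first_boot_pc = cpu.pc

cpu.program_reset_vector(0x8000_0000)   # takes effect at the next reset
rvbar_before_next_reset = cpu.rvbar
cpu.assert_reset()
cpu.release_reset()
second_boot_pc = cpu.pc
```

In this model the write changes only the logic circuit's register, so the processor's own register keeps its old value until the next reset captures the new one.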

    GUEST INSTRUCTION TO NATIVE INSTRUCTION RANGE BASED MAPPING USING A CONVERSION LOOK ASIDE BUFFER OF A PROCESSOR
    Type: Invention patent application

    Status: Pending (published)

    Publication No.: US20170068541A1

    Publication date: 2017-03-09

    Application No.: US15354679

    Filing date: 2016-11-17

    Inventor: Mohammad Abdallah

    IPC classification: G06F9/30 G06F12/0875 G06F9/38

    Abstract: A method for translating instructions for a processor. The method includes accessing a plurality of guest instructions that comprise multiple guest branch instructions, and assembling the plurality of guest instructions into a guest instruction block. The guest instruction block is converted into a corresponding native conversion block. A mapping of the guest instruction block to corresponding native conversion block is stored in a conversion look aside buffer. Upon a subsequent request for a guest instruction, the conversion look aside buffer is indexed to determine whether a hit occurred, wherein the mapping indicates whether the guest instruction has a corresponding converted native instruction in the native cache. The converted native instruction is forwarded for execution in response to the hit.
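
The flow in the abstract (assemble a guest block, convert it, store the mapping, then look it up on later requests) can be sketched as follows. This is a minimal software model under stated assumptions, not the hardware structure the patent describes: the function and class names are hypothetical, conversion is assumed to be one native instruction per guest instruction, a block is assumed to close after two branches, and eviction is simple first-in, first-out.

```python
from collections import OrderedDict


def assemble_guest_block(guest_code, start, max_branches=2):
    """Collect guest instructions until max_branches branches are included."""
    block, branches = [], 0
    for ins in guest_code[start:]:
        block.append(ins)
        if ins.startswith("br"):
            branches += 1
            if branches == max_branches:
                break
    return block


def convert_block(guest_block):
    # Placeholder conversion: one native instruction per guest instruction.
    return [f"native:{ins}" for ins in guest_block]


class ConversionLookAsideBuffer:
    def __init__(self, capacity=8):
        self.capacity = capacity
        self.entries = OrderedDict()   # guest start -> (length, native block)

    def insert(self, start, length, native_block):
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)   # evict the oldest mapping
        self.entries[start] = (length, native_block)

    def lookup(self, guest_addr):
        # Range-based match: any address inside a mapped guest block hits.
        for start, (length, native_block) in self.entries.items():
            if start <= guest_addr < start + length:
                return native_block[guest_addr - start:]
        return None


def fetch(clb, guest_code, guest_addr):
    """Return (native instructions, hit flag) for a guest address."""
    native = clb.lookup(guest_addr)
    if native is not None:
        return native, True
    block = assemble_guest_block(guest_code, guest_addr)
    native = convert_block(block)
    clb.insert(guest_addr, len(block), native)
    return native, False


guest_code = ["add", "br L1", "sub", "br L2", "mul", "br L3"]
clb = ConversionLookAsideBuffer()
first = fetch(clb, guest_code, 0)    # miss: block is converted and cached
second = fetch(clb, guest_code, 0)   # hit on the cached mapping
inside = fetch(clb, guest_code, 2)   # hit on an address inside the block
```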

    GUEST TO NATIVE BLOCK ADDRESS MAPPINGS AND MANAGEMENT OF NATIVE CODE STORAGE
    Type: Invention patent application

    Status: Pending (published)

    Publication No.: US20160321077A1

    Publication date: 2016-11-03

    Application No.: US15208404

    Filing date: 2016-07-12

    Inventor: Mohammad Abdallah

    Abstract: A method for managing mappings of storage on a code cache for a processor. The method includes storing a plurality of guest address to native address mappings as entries in a conversion look aside buffer, wherein the entries indicate guest addresses that have corresponding converted native addresses stored within a code cache memory, and receiving a subsequent request for a guest address at the conversion look aside buffer. The conversion look aside buffer is indexed to determine whether there exists an entry that corresponds to the index, wherein the index comprises a tag and an offset that is used to identify the entry that corresponds to the index. Upon a hit on the tag, the corresponding entry is accessed to retrieve a pointer to the code cache memory corresponding block of converted native instructions. The corresponding block of converted native instructions is fetched from the code cache memory for execution.
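
The tag-and-offset lookup in the abstract can be sketched as below. This is a toy model under stated assumptions: the 4-bit offset width, the `split_index` helper, and the `CodeCache` and `GuestToNativeMap` classes are hypothetical, and a Python list index stands in for the pointer into code cache memory.

```python
OFFSET_BITS = 4   # assumed split; the abstract does not give a width


def split_index(guest_addr):
    """Split a guest address into (tag, offset)."""
    return guest_addr >> OFFSET_BITS, guest_addr & ((1 << OFFSET_BITS) - 1)


class CodeCache:
    """Storage for blocks of converted native instructions."""

    def __init__(self):
        self.blocks = []

    def store(self, native_block):
        self.blocks.append(native_block)
        return len(self.blocks) - 1      # the "pointer" is a list index here

    def fetch(self, pointer):
        return self.blocks[pointer]


class GuestToNativeMap:
    def __init__(self, code_cache):
        self.code_cache = code_cache
        self.entries = {}                # tag -> {offset: pointer}

    def add_mapping(self, guest_addr, native_block):
        tag, offset = split_index(guest_addr)
        pointer = self.code_cache.store(native_block)
        self.entries.setdefault(tag, {})[offset] = pointer

    def lookup(self, guest_addr):
        tag, offset = split_index(guest_addr)
        by_offset = self.entries.get(tag)
        if by_offset is None or offset not in by_offset:
            return None                  # miss: no converted block yet
        return self.code_cache.fetch(by_offset[offset])


mapping = GuestToNativeMap(CodeCache())
mapping.add_mapping(0x1234, ["n0", "n1"])
hit = mapping.lookup(0x1234)
tag_hit_offset_miss = mapping.lookup(0x1235)
tag_miss = mapping.lookup(0x2234)
```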

    COMPUTER PROCESSOR WITH REGISTER DIRECT BRANCHES AND EMPLOYING AN INSTRUCTION PRELOAD STRUCTURE
    Type: Invention patent application

    Status: Granted

    Publication No.: US20160314071A1

    Publication date: 2016-10-27

    Application No.: US15087269

    Filing date: 2016-03-31

    IPC classification: G06F12/08 G06F12/10

    Abstract: A computer processor with register direct branches and employing an instruction preload structure is disclosed. The computer processor may include a hierarchy of memories comprising a first memory organized in a structure having one or more entries for one or more addresses corresponding to one or more instructions. The one or more entries of the one or more addresses may have a starting address. The structure may have one or more locations for storing the one or more instructions. The computer processor may include one or more registers to which one or more corresponding instruction addresses are writable. The computer processor may include processing logic. In response to the processing logic writing the one or more instruction addresses to the one or more registers, the processing logic may pre-fetch the one or more instructions of a linear sequence of instructions from a first memory level of the hierarchy of memories into a second memory level of the hierarchy of memories beginning at the starting address. At least one address of the one or more addresses may be the contents of a register of the one or more registers.
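
The core idea, that writing a branch target into a register triggers a prefetch of the linear instruction sequence starting at that address, can be sketched with a toy model. Everything named here is hypothetical: the `PreloadingProcessor` class, its methods, the register count, and the fixed preload length of four instructions are assumptions for illustration, not details from the abstract.

```python
class PreloadingProcessor:
    def __init__(self, memory, num_target_regs=4, preload_len=4):
        self.memory = memory             # first memory level: addr -> instr
        self.preload = {}                # second memory level (preload store)
        self.target_regs = [None] * num_target_regs
        self.preload_len = preload_len

    def write_target_register(self, reg, address):
        """Writing a target address also prefetches the sequence behind it."""
        self.target_regs[reg] = address
        for addr in range(address, address + self.preload_len):
            if addr in self.memory:
                self.preload[addr] = self.memory[addr]

    def branch_register_direct(self, reg):
        """Branch to the address in reg; report whether it was preloaded."""
        address = self.target_regs[reg]
        if address in self.preload:
            return self.preload[address], True
        return self.memory[address], False


memory = {addr: f"ins{addr}" for addr in range(16)}
cpu = PreloadingProcessor(memory)
cpu.write_target_register(0, 8)
preloaded_addresses = sorted(cpu.preload)
result = cpu.branch_register_direct(0)
```

Because the target is known as soon as the register is written, the branch in this model finds its first instruction already in the second memory level.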
