RANDOMLY BRANCHING USING HARDWARE WATCHPOINTS
    21.
    Invention application (status: in force)

    Publication No.: US20150106602A1

    Publication date: 2015-04-16

    Application No.: US14054356

    Filing date: 2013-10-15

    Abstract: A system and method for efficiently performing program instrumentation. A processor processes instructions stored in a memory. The processor allocates a memory region for the purpose of creating “random branches” in the computer code utilizing existing memory access instructions. When the processor processes a given instruction, the processor both accesses a first location in the memory region and may determine that a condition is satisfied. In response, the processor generates an interrupt. The corresponding interrupt handler may transfer control flow from the computer program to instrumentation code. The condition may be that a pointer, which stores an address pointing to locations within the memory region, equals a given address after the pointer is updated. Alternatively, the condition may be that an updated data value stored in the location pointed to by the given address equals a threshold value.

    Dynamically adapting mechanism for translation lookaside buffer shootdowns

    Publication No.: US10552339B2

    Publication date: 2020-02-04

    Application No.: US16005882

    Filing date: 2018-06-12

    Abstract: An operating system (OS) of a processing system having a plurality of processor cores determines a cost associated with different mechanisms for performing a translation lookaside buffer (TLB) shootdown in response to, for example, a virtual address being remapped to a new physical address, and selects a TLB shootdown mechanism to purge outdated or invalid address translations from the TLB based on the determined cost. In some embodiments, the OS selects an inter-processor interrupt (IPI) as the TLB shootdown mechanism if the cost associated with sending an IPI is less than a threshold cost. In some embodiments, the OS compares the cost of using an IPI as the TLB shootdown mechanism versus the cost of sending a hardware broadcast to all processor cores of the processing system as the shootdown mechanism and selects the shootdown mechanism having the lower cost.
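
    The two selection rules in this abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the cost model (IPI cost growing linearly with the number of target cores), the names `ShootdownCosts`, `estimate_costs`, `select_by_threshold`, and `select_cheapest`, and the fallback to a hardware broadcast when the threshold test fails are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ShootdownCosts:
    ipi: float        # estimated cost of using inter-processor interrupts
    broadcast: float  # estimated cost of a hardware broadcast to all cores


def estimate_costs(target_cores, per_ipi_cost, broadcast_cost):
    """Hypothetical cost model: one IPI per core that may cache the mapping."""
    return ShootdownCosts(ipi=target_cores * per_ipi_cost,
                          broadcast=broadcast_cost)


def select_by_threshold(costs, threshold):
    """Pick IPI when its cost is below a threshold cost.

    Falling back to a broadcast otherwise is an assumption of this sketch.
    """
    return "ipi" if costs.ipi < threshold else "broadcast"


def select_cheapest(costs):
    """Compare both mechanisms and pick the one with the lower cost."""
    return "ipi" if costs.ipi < costs.broadcast else "broadcast"


if __name__ == "__main__":
    few = estimate_costs(target_cores=2, per_ipi_cost=10.0, broadcast_cost=50.0)
    many = estimate_costs(target_cores=8, per_ipi_cost=10.0, broadcast_cost=50.0)
    print(select_cheapest(few), select_cheapest(many))
```

    With these made-up numbers, a remap that affects two cores favors IPIs, while one that affects eight cores favors the broadcast.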

    Detecting buffer overflows in general-purpose GPU applications

    Publication No.: US10067710B2

    Publication date: 2018-09-04

    Application No.: US15360518

    Filing date: 2016-11-23

    Abstract: A processing apparatus is provided that includes a plurality of memory regions each corresponding to a memory address and configured to store data associated with the corresponding memory address. The processing apparatus also includes an accelerated processing device in communication with the memory regions and configured to determine a request to allocate an initial memory buffer comprising a number of contiguous memory regions, create a new memory buffer comprising one or more additional memory regions adjacent to the contiguous memory regions of the initial memory buffer, assign one or more values to the one or more additional memory regions and detect a change to the one or more values at the one or more additional memory regions.
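
    The padding-and-check idea in this abstract resembles a canary region placed after a buffer. The sketch below simulates it on the host with a `bytearray`; it does not run on a GPU, and the canary value, the canary length, and the function names `allocate_with_canary` and `overflow_detected` are assumptions chosen for illustration.

```python
CANARY = 0xA5    # assumed fill value for the extra region
CANARY_LEN = 8   # assumed number of extra bytes appended to each buffer


def allocate_with_canary(size):
    """Return `size` usable bytes followed by CANARY_LEN canary bytes."""
    return bytearray(size) + bytearray([CANARY] * CANARY_LEN)


def overflow_detected(buf, size):
    """Report whether any byte of the canary region has changed."""
    return any(b != CANARY for b in buf[size:size + CANARY_LEN])


if __name__ == "__main__":
    buf = allocate_with_canary(16)
    buf[15] = 0xFF                     # in-bounds write: canary untouched
    print(overflow_detected(buf, 16))  # False
    buf[16] = 0x00                     # write one byte past the end
    print(overflow_detected(buf, 16))  # True
```

    A check of this kind only catches writes that land in the padded region and change its contents; a write that skips past the padding, or that happens to store the canary value, goes unnoticed.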

    INSTRUCTION CONTEXT SWITCHING
    27.
    Invention application (status: in force)

    Publication No.: US20160371082A1

    Publication date: 2016-12-22

    Application No.: US14746601

    Filing date: 2015-06-22

    CPC classification number: G06F9/461 G06F9/3013 G06F9/3851

    Abstract: A processing device includes a first memory that includes a context buffer. The processing device also includes a processor core to execute threads based on context information stored in registers of the processor core and a memory controller to selectively move a subset of the context information between the context buffer and the registers based on one or more latencies of the threads.

    EFFICIENT SPARSE MATRIX-VECTOR MULTIPLICATION ON PARALLEL PROCESSORS
    28.
    Invention application (status: in force)

    Publication No.: US20160140084A1

    Publication date: 2016-05-19

    Application No.: US14542003

    Filing date: 2014-11-14

    CPC classification number: G06F17/16

    Abstract: A method of multiplication of a sparse matrix and a vector to obtain a new vector and a system for implementing the method are claimed. Embodiments of the method are intended to optimize the performance of sparse matrix-vector multiplication in highly parallel processors, such as GPUs. The sparse matrix is stored in compressed sparse row (CSR) format.
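
    For readers unfamiliar with the CSR format mentioned in this abstract, the sketch below shows a plain sequential sparse matrix-vector product over the three standard CSR arrays. It is a reference baseline only and does not reproduce the parallel optimization the patent claims; the function name `csr_spmv` and the sample matrix are illustrative.

```python
def csr_spmv(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR format.

    row_ptr[i]:row_ptr[i + 1] is the slice of `values` and `col_idx`
    that holds the nonzero entries of row i.
    """
    y = []
    for row in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


if __name__ == "__main__":
    # A = [[1, 0, 2],
    #      [0, 3, 0],
    #      [4, 0, 5]]
    row_ptr = [0, 2, 3, 5]
    col_idx = [0, 2, 1, 0, 2]
    values = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(csr_spmv(row_ptr, col_idx, values, [1.0, 2.0, 3.0]))  # [7.0, 6.0, 19.0]
```

    Each row's dot product is independent of the others, which is what makes the outer loop a natural unit of parallel work on a GPU.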

    RANDOMLY BRANCHING USING PERFORMANCE COUNTERS
    29.
    Invention application (status: in force)

    Publication No.: US20150106604A1

    Publication date: 2015-04-16

    Application No.: US14054345

    Filing date: 2013-10-15

    Abstract: A system and method for efficiently performing program instrumentation. A processor processes instructions stored in a memory. When the processor processes a given instruction of a given instruction type, the processor updates a corresponding performance counter. When the performance counter reaches a threshold, the processor generates an interrupt and compares a location of the given instruction with stored locations in a given list. If a match is not found, then the processor processes an instruction following the given instruction in the computer program without processing intermediate instrumentation code. If a match is found, then the processor processes instrumentation code. Regardless of whether or not the instrumentation code is processed, when control flow returns to the computer program, the corresponding performance counter is initialized with a random value.
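
    The control flow described in this abstract can be simulated in software as below. This is a sketch under stated assumptions, not the patented hardware mechanism: the counter is modeled as counting down to a threshold of zero, the instruction trace is a list of hypothetical `(pc, is_counted_type)` pairs, and the names `run_with_sampling`, `instrumented_pcs`, and `max_interval` are invented for the example.

```python
import random


def run_with_sampling(trace, instrumented_pcs, max_interval, rng):
    """Return the program locations at which instrumentation code ran.

    trace: iterable of (pc, is_counted_type) pairs.
    instrumented_pcs: set of locations that have instrumentation attached.
    """
    hits = []
    counter = rng.randint(1, max_interval)  # random initial value
    for pc, counted in trace:
        if not counted:
            continue                # only the given instruction type counts
        counter -= 1                # update the performance counter
        if counter == 0:            # threshold reached: simulated interrupt
            if pc in instrumented_pcs:
                hits.append(pc)     # match found: run instrumentation code
            # Match or not, re-arm the counter with a fresh random value.
            counter = rng.randint(1, max_interval)
    return hits


if __name__ == "__main__":
    trace = [(pc, True) for pc in range(100)]
    evens = {pc for pc in range(100) if pc % 2 == 0}
    print(run_with_sampling(trace, evens, 5, random.Random(0)))
```

    Because the interval between interrupts is random, the instrumented locations are sampled rather than hit on every execution, which keeps the overhead well below that of instrumenting every visit.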

    PREFETCHING USING A DIRECT MEMORY ACCESS ENGINE

    Publication No.: US20250156329A1

    Publication date: 2025-05-15

    Application No.: US18388940

    Filing date: 2023-11-13

    Abstract: A processing system includes one or more DMA engines that load data from memory or another cache location without storing the data after loading it. As the data propagates past caches located between the memory or other cache location that stores the requested data (“intermediate caches”), the data is selectively copied to the intermediate caches based on a cache replacement policy. Rather than the DMA engine manually storing the data into the intermediate caches, the cache replacement policies of the intermediate caches determine whether the data is copied into each respective cache and a replacement priority of the data. By bypassing storing the data, the DMA engine effectuates prefetching to the intermediate caches without expending unnecessary bandwidth or searching for a memory location to store the data, thus reducing latency and saving energy.
