Abstract:
A system and method for efficiently performing program instrumentation. A processor processes instructions stored in a memory. The processor allocates a memory region for the purpose of creating “random branches” in the computer code utilizing existing memory access instructions. When the processor processes a given instruction, the processor accesses a first location in the memory region and may determine that a condition is satisfied. In response, the processor generates an interrupt, and the corresponding interrupt handler may transfer control flow from the computer program to instrumentation code. The condition may be that a pointer storing an address pointing to locations within the memory region equals a given address after the pointer is updated. Alternatively, the condition may be that an updated data value stored in the location pointed to by the given address equals a threshold value.
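The mechanism above lives in processor hardware; the following C++ sketch is only a software analogue, with the region size, trigger address, and touch() helper invented for illustration. It shows the pointer-equals-address form of the condition: every instrumented access advances a cursor through the allocated region, and reaching the trigger address stands in for the interrupt that diverts control to instrumentation code.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Software analogue of the hardware mechanism (all names and sizes are
// illustrative): a cursor walks a dedicated memory region on each
// instrumented access, and reaching a chosen trigger address stands in
// for the interrupt that diverts control to instrumentation code.
constexpr std::size_t kRegionWords = 1024;
static std::uint64_t region[kRegionWords];
static std::uint64_t* cursor = region;
static std::uint64_t* const trigger = region + 257;  // arbitrary trigger

static void instrumentation_handler(const char* site) {
    std::printf("instrumentation invoked from %s\n", site);
}

// Stand-in for an existing memory access instruction that also advances
// the cursor; in the claimed hardware this costs no extra instruction.
static inline void touch(const char* site) {
    ++*cursor;                          // access a location in the region
    if (++cursor == trigger) {          // condition: pointer equals given address
        instrumentation_handler(site);  // models the interrupt handler
        cursor = region;                // reset for the next interval
    }
}

int main() {
    for (int i = 0; i < 1000; ++i)
        touch("main loop");             // instrumented program point
}
```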
Abstract:
A disclosed technique includes executing, for a first wavefront, a barrier arrival notification instruction, for a first barrier, indicating arrival at a first barrier point; performing, for the first wavefront, work prior to the first barrier point; executing, for the first wavefront, a barrier check instruction; and executing, for the first wavefront, a control flow path selected based on a result of the barrier check instruction.
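The abstract describes a split barrier for GPU wavefronts. As a rough host-side analogue, C++20's std::barrier separates arrival from the check in the same way; the sketch below illustrates the pattern and is not the claimed GPU instructions. arrive() plays the role of the barrier arrival notification, the work between the two calls is the pre-barrier-point work, and wait() models the barrier check that blocks only if other participants have not yet arrived.

```cpp
#include <barrier>
#include <cstdio>
#include <thread>
#include <vector>

int main() {
    constexpr int kThreads = 4;  // threads stand in for wavefronts
    std::barrier sync(kThreads);
    std::vector<std::thread> workers;
    for (int i = 0; i < kThreads; ++i) {
        workers.emplace_back([&sync, i] {
            // Barrier arrival notification: signal arrival at the barrier point.
            auto token = sync.arrive();
            // Work prior to the barrier point that needs no other participant.
            std::printf("worker %d: independent work\n", i);
            // Barrier check: blocks only if others have not yet arrived,
            // standing in for the branch on the barrier check result.
            sync.wait(std::move(token));
            std::printf("worker %d: past the barrier\n", i);
        });
    }
    for (auto& t : workers) t.join();
}
```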
Abstract:
A system including a stack of two or more layers of volatile memory, such as the layers of a 3D stacked DRAM, places data in the stack based on temperature or refresh rate. When a temperature or refresh-rate threshold is exceeded, data are moved from a first region to a second region in the stack, the second region having a lower temperature than the first region, a lower refresh rate than the first region, or both.
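A minimal sketch of the placement policy follows, with layer temperatures, refresh rates, and the threshold all invented for illustration (the abstract does not specify them): when the source layer exceeds a temperature threshold, a destination layer with a lower temperature or lower refresh rate is chosen, mirroring the criteria for the second region.

```cpp
#include <cstdio>

// Illustrative only: IDs, temperatures, refresh rates, and the threshold
// below are assumptions made to render the policy concrete.
struct Layer {
    int id;
    double temperature_c;    // current layer temperature
    double refresh_rate_hz;  // current refresh rate
};

// Pick a layer whose temperature or refresh rate is lower than the
// source's, mirroring the abstract's criteria for the second region.
const Layer* pick_destination(const Layer* layers, int n, const Layer& src) {
    const Layer* best = nullptr;
    for (int i = 0; i < n; ++i) {
        const Layer& l = layers[i];
        if (l.id == src.id) continue;
        if (l.temperature_c < src.temperature_c ||
            l.refresh_rate_hz < src.refresh_rate_hz) {
            if (!best || l.temperature_c < best->temperature_c)
                best = &l;  // prefer the coolest qualifying layer
        }
    }
    return best;  // nullptr if no cooler or slower-refresh layer exists
}

int main() {
    Layer stack[] = {{0, 85.0, 64.0}, {1, 70.0, 32.0}, {2, 60.0, 16.0}};
    const double kTempThreshold = 80.0;  // assumed policy threshold
    if (stack[0].temperature_c > kTempThreshold)
        if (const Layer* dst = pick_destination(stack, 3, stack[0]))
            std::printf("move data: layer %d -> layer %d\n",
                        stack[0].id, dst->id);
}
```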
Abstract:
An operating system (OS) of a processing system having a plurality of processor cores determines a cost associated with different mechanisms for performing a translation lookaside buffer (TLB) shootdown, triggered, for example, when a virtual address is remapped to a new physical address. Based on the determined cost, the OS selects a TLB shootdown mechanism to purge outdated or invalid address translations from the TLB. In some embodiments, the OS selects an inter-processor interrupt (IPI) as the TLB shootdown mechanism if the cost associated with sending an IPI is less than a threshold cost. In some embodiments, the OS compares the cost of using an IPI as the TLB shootdown mechanism versus the cost of sending a hardware broadcast to all processor cores of the processing system, and selects the mechanism having the lower cost.
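A conceptual sketch of the selection logic follows. The linear IPI cost and flat broadcast cost are assumptions chosen only to make the comparison concrete; they are not the patented cost model.

```cpp
#include <cstdio>

enum class Mechanism { IPI, HardwareBroadcast };

// Assumed cost model, purely for illustration: IPI cost scales with the
// number of cores that must be notified; a hardware broadcast costs
// roughly the same regardless of how many cores it reaches.
double ipi_cost(int cores_to_notify) { return 1.0 * cores_to_notify; }
double broadcast_cost(int /*total_cores*/) { return 6.0; }

Mechanism select_shootdown(int cores_to_notify, int total_cores) {
    return ipi_cost(cores_to_notify) < broadcast_cost(total_cores)
               ? Mechanism::IPI
               : Mechanism::HardwareBroadcast;
}

int main() {
    for (int n : {2, 8, 32}) {
        Mechanism m = select_shootdown(n, 64);
        std::printf("%2d cores to notify -> %s\n", n,
                    m == Mechanism::IPI ? "IPI" : "hardware broadcast");
    }
}
```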
Abstract:
A processing apparatus is provided that includes a plurality of memory regions, each corresponding to a memory address and configured to store data associated with the corresponding memory address. The processing apparatus also includes an accelerated processing device in communication with the memory regions and configured to: determine a request to allocate an initial memory buffer comprising a number of contiguous memory regions; create a new memory buffer comprising one or more additional memory regions adjacent to the contiguous memory regions of the initial memory buffer; assign one or more values to the one or more additional memory regions; and detect a change to the one or more values at the one or more additional memory regions.
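This is the familiar guard-region (canary) pattern for detecting out-of-bounds writes. The host-side C++ sketch below illustrates it under assumed parameters (sentinel value, guard size); the patent itself places the mechanism in an accelerated processing device rather than host code.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <vector>

constexpr std::uint8_t kGuardValue = 0xA5;  // assumed sentinel pattern
constexpr std::size_t kGuardBytes = 64;     // assumed guard-region size

struct GuardedBuffer {
    std::vector<std::uint8_t> storage;  // layout: [guard | payload | guard]
    std::size_t payload;
    std::uint8_t* data() { return storage.data() + kGuardBytes; }
};

// Allocate the requested buffer plus adjacent guard regions and fill
// the guards with a known value.
GuardedBuffer allocate_guarded(std::size_t payload_bytes) {
    GuardedBuffer b{
        std::vector<std::uint8_t>(payload_bytes + 2 * kGuardBytes, 0),
        payload_bytes};
    std::memset(b.storage.data(), kGuardValue, kGuardBytes);
    std::memset(b.data() + payload_bytes, kGuardValue, kGuardBytes);
    return b;
}

// Detect a change to the guard values, i.e., an out-of-bounds write.
bool guards_violated(GuardedBuffer& b) {
    for (std::size_t i = 0; i < kGuardBytes; ++i)
        if (b.storage[i] != kGuardValue ||
            b.data()[b.payload + i] != kGuardValue)
            return true;
    return false;
}

int main() {
    GuardedBuffer buf = allocate_guarded(256);
    buf.data()[256] = 0;  // deliberate one-byte overflow into the guard
    std::printf("violation detected: %s\n",
                guards_violated(buf) ? "yes" : "no");
}
```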
Abstract:
A method of multiplication of a sparse matrix and a vector to obtain a new vector and a system for implementing the method are claimed. Embodiments of the method are intended to optimize the performance of sparse matrix-vector multiplication in highly parallel processors, such as GPUs. The sparse matrix is stored in compressed sparse row (CSR) format.
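For reference, the CSR format and the baseline sequential matrix-vector product it supports look as follows; the parallel GPU mapping claimed by the patent is not shown.

```cpp
#include <cstdio>
#include <vector>

// Compressed sparse row (CSR) storage: row_ptr[r]..row_ptr[r+1] index
// the nonzeros of row r within col_idx and values.
struct CsrMatrix {
    int rows;
    std::vector<int> row_ptr;    // size rows + 1
    std::vector<int> col_idx;    // column index of each nonzero
    std::vector<double> values;  // value of each nonzero
};

// y = A * x over CSR, the operation the patent accelerates.
std::vector<double> spmv(const CsrMatrix& a, const std::vector<double>& x) {
    std::vector<double> y(a.rows, 0.0);
    for (int r = 0; r < a.rows; ++r)
        for (int k = a.row_ptr[r]; k < a.row_ptr[r + 1]; ++k)
            y[r] += a.values[k] * x[a.col_idx[k]];
    return y;
}

int main() {
    // 3x3 matrix [[4,0,1],[0,2,0],[3,0,5]] in CSR form.
    CsrMatrix a{3, {0, 2, 3, 5}, {0, 2, 1, 0, 2}, {4, 1, 2, 3, 5}};
    std::vector<double> y = spmv(a, {1.0, 1.0, 1.0});
    std::printf("%g %g %g\n", y[0], y[1], y[2]);  // prints: 5 2 8
}
```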
Abstract:
A processing device includes a first memory that includes a context buffer. The processing device also includes a processor core to execute threads based on context information stored in registers of the processor core and a memory controller to selectively move a subset of the context information between the context buffer and the registers based on one or more latencies of the threads.
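The sketch below illustrates one plausible reading of the policy, with the latency threshold and bookkeeping fields invented for illustration: context information of long-latency threads is moved out to the context buffer, and moved back into registers once the latency clears.

```cpp
#include <cstdio>
#include <vector>

// All fields and the threshold are assumptions; the abstract does not
// specify how thread latencies are estimated.
struct ThreadContext {
    int id;
    long expected_latency_cycles;  // e.g., from a pending memory request
    bool in_registers;             // true if context occupies core registers
};

constexpr long kSpillThreshold = 400;  // assumed latency cutoff, in cycles

// Spill contexts of long-latency threads to the context buffer and
// restore them to registers once the long-latency event has passed.
void manage_contexts(std::vector<ThreadContext>& threads) {
    for (ThreadContext& t : threads) {
        if (t.in_registers && t.expected_latency_cycles > kSpillThreshold) {
            t.in_registers = false;  // registers -> context buffer
            std::printf("thread %d: context moved to buffer\n", t.id);
        } else if (!t.in_registers &&
                   t.expected_latency_cycles <= kSpillThreshold) {
            t.in_registers = true;   // context buffer -> registers
            std::printf("thread %d: context restored to registers\n", t.id);
        }
    }
}

int main() {
    std::vector<ThreadContext> threads{{0, 50, true}, {1, 900, true}};
    manage_contexts(threads);  // thread 1 spills; thread 0 stays resident
}
```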
Abstract:
A system and method for efficiently performing program instrumentation. A processor processes instructions stored in a memory. When the processor processes a given instruction of a given instruction type, the processor updates a corresponding performance counter. When the performance counter reaches a threshold, the processor generates an interrupt and compares a location of the given instruction with stored locations in a given list. If a match is not found, then the processor processes an instruction following the given instruction in the computer program without processing intermediate instrumentation code. If a match is found, then the processor processes instrumentation code. Regardless of whether or not the instrumentation code is processed, when control flow returns to the computer program, the corresponding performance counter is initialized with a random value.
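A software model of the described flow is sketched below; the threshold, site addresses, and RNG seed are illustrative, and in the claimed design the counter, interrupt, and reset are hardware events rather than function calls.

```cpp
#include <cstdint>
#include <cstdio>
#include <random>
#include <unordered_set>

// Illustrative stand-ins for the hardware state described in the abstract.
static std::mt19937_64 rng{12345};
constexpr std::uint64_t kThreshold = 100;
static std::uint64_t counter = 0;
static const std::unordered_set<std::uint64_t> instrumented_sites{
    0x400a10, 0x400b34};

static void instrumentation(std::uint64_t addr) {
    std::printf("instrumenting instruction at 0x%llx\n",
                static_cast<unsigned long long>(addr));
}

// Called once per instruction of the monitored type.
static void on_instruction(std::uint64_t addr) {
    if (++counter < kThreshold) return;  // counter below threshold: no interrupt
    if (instrumented_sites.count(addr))  // compare location against stored list
        instrumentation(addr);           // match: run instrumentation code
    counter = rng() % kThreshold;        // either way, random re-initialization
}

int main() {
    for (int i = 0; i < 1000; ++i)
        on_instruction(i % 2 ? 0x400a10 : 0x400c00);
}
```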
Abstract:
A processing system includes one or more DMA engines that load data from memory or another cache location without storing the data after loading it. As the data propagates past the caches located between the requester and the memory or other cache location that stores the requested data (“intermediate caches”), the data is selectively copied into those caches based on each cache's replacement policy. Rather than the DMA engine explicitly storing the data into the intermediate caches, the replacement policy of each intermediate cache determines whether the data is copied into that cache and at what replacement priority. Because the DMA engine bypasses storing the data itself, it effectuates prefetching into the intermediate caches without expending unnecessary bandwidth or searching for a memory location to store the data, thus reducing latency and saving energy.
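The sketch below models the idea in C++ with invented data structures: the DMA engine streams addresses past a list of intermediate caches, and each cache's own (placeholder) replacement policy decides whether to keep a copy and at what priority.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

struct CacheLine {
    unsigned long addr;
    int priority;
};

// Placeholder cache with an invented replacement policy: keep a passing
// line only while there is spare capacity, and insert it at low priority.
struct Cache {
    const char* name;
    std::vector<CacheLine> lines;
    std::size_t capacity = 4;
    bool should_copy(unsigned long) const { return lines.size() < capacity; }
    int insertion_priority(unsigned long) const { return 1; }  // low priority
};

// The DMA engine loads without storing; each intermediate cache the data
// passes applies its own policy to the data in flight.
void dma_stream(unsigned long addr, std::vector<Cache>& intermediates) {
    for (Cache& c : intermediates) {
        if (c.should_copy(addr)) {
            c.lines.push_back({addr, c.insertion_priority(addr)});
            std::printf("%s kept line 0x%lx\n", c.name, addr);
        }
    }
}

int main() {
    std::vector<Cache> caches{{"L3"}, {"L2"}};
    for (unsigned long a = 0; a < 6; ++a)
        dma_stream(0x1000 + 64 * a, caches);  // six-line DMA transfer
}
```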