-
Publication number: US20170185451A1
Publication date: 2017-06-29
Application number: US14981257
Filing date: 2015-12-28
Applicant: Advanced Micro Devices, Inc., ATI Technologies ULC
Inventor: Jimshed Mirza, YunPeng Zhu
IPC: G06F9/50
CPC classification number: G06F9/5016, G06F9/5027, G06F9/5066
Abstract: Methods, devices, and systems for data driven scheduling of a plurality of computing cores of a processor. A plurality of threads may be executed on the plurality of computing cores, according to a default schedule. The plurality of threads may be analyzed, based on the execution, to determine correlations among the plurality of threads. A data driven schedule may be generated based on the correlations. The plurality of threads may be executed on the plurality of computing cores according to the data driven schedule.
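The flow in this abstract (run under a default schedule, measure correlations, build a new schedule) can be illustrated with a minimal sketch. It assumes, purely for illustration, that the correlation between two threads is the number of memory regions both touched during the default run, and that the data driven schedule greedily co-locates the most correlated pairs. The function names and the greedy pairing are hypothetical and are not taken from the patent.

```python
from itertools import combinations

def correlations(accesses):
    """accesses: dict thread_id -> set of memory regions touched under the default schedule."""
    return {
        (a, b): len(accesses[a] & accesses[b])
        for a, b in combinations(sorted(accesses), 2)
    }

def data_driven_schedule(accesses, num_cores):
    """Group the most strongly correlated thread pairs, then deal groups to cores round-robin."""
    corr = correlations(accesses)
    unplaced = set(accesses)
    groups = []
    # Visit pairs from highest to lowest correlation (ties broken by thread id).
    for (a, b), score in sorted(corr.items(), key=lambda kv: (-kv[1], kv[0])):
        if score > 0 and a in unplaced and b in unplaced:
            groups.append([a, b])
            unplaced -= {a, b}
    # Threads with no correlated partner are scheduled on their own.
    groups.extend([t] for t in sorted(unplaced))
    schedule = {core: [] for core in range(num_cores)}
    for i, group in enumerate(groups):
        schedule[i % num_cores].extend(group)
    return schedule

# Hypothetical access trace gathered while running the default schedule.
trace = {"t0": {1, 2, 3}, "t1": {7, 8}, "t2": {2, 3, 4}, "t3": {8, 9}}
schedule = data_driven_schedule(trace, num_cores=2)
```

With this trace, threads that share regions end up on the same core, which is the kind of locality a correlation-based schedule might aim for.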
-
Publication number: US20150120978A1
Publication date: 2015-04-30
Application number: US14523705
Filing date: 2014-10-24
Applicant: Advanced Micro Devices, Inc., ATI Technologies ULC
Inventor: Vydhyanathan Kalyanasundharam, Philip Ng, Maggie Chan, Vincent Cueva, Liang Chen, Anthony Asaro, Jimshed Mirza, Greggory D. Donley, Bryan Broussard, Benjamin Tsien, Yaniv Adiri
CPC classification number: G06F12/1009, G06F12/1045, G06F12/12, G06F2212/684
Abstract: The present invention provides for page table access and dirty bit management in hardware via a new atomic test[0] and OR and Mask. The present invention also provides for a gasket that enables ACE to CCI translations. This gasket further provides request translation between ACE and CCI, deadlock avoidance for victim and probe collision, ARM barrier handling, and power management interactions. The present invention also provides a solution for ARM victim/probe collision handling which deadlocks the unified northbridge. These solutions include a dedicated writeback virtual channel, probes for IO requests using a 4-hop protocol, and a WrBack Reorder Ability in MCT where victims update older requests with data as they pass the requests.
-
Publication number: US20150100818A1
Publication date: 2015-04-09
Application number: US14045701
Filing date: 2013-10-03
Applicant: ATI Technologies ULC, Advanced Micro Devices, Inc.
Inventor: Andrew Kegel, Jimshed Mirza, Paul Blinzer, Philip Ng
IPC: G06F11/07
CPC classification number: G06F13/00, G06F11/073, G06F11/0745, G06F11/0793, G06F12/00, G06F12/10, G06F12/1009, G06F12/1081, G06F13/385, Y02D10/14, Y02D10/151
Abstract: A system and method of managing requests from peripherals in a computer system are provided. In the system and method, an input/output memory management unit (IOMMU) receives a peripheral page request (PPR) from a peripheral. In response to a determination that a criterion regarding an available capacity of a PPR log is satisfied, a completion message is sent to the peripheral indicating that the PPR is complete and the PPR is discarded without queuing the PPR in the PPR log.
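The PPR handling described in this abstract can be sketched as follows. The abstract does not say what the capacity criterion is, so this sketch assumes it is "free slots at or below a reserve threshold"; the class name, the `reserve` parameter, and the returned status strings are all hypothetical.

```python
from collections import deque

class PprLog:
    """Toy model of a PPR log with a capacity-based discard rule (assumed criterion)."""

    def __init__(self, capacity, reserve=0):
        self.capacity = capacity
        self.reserve = reserve      # hypothetical threshold on free slots
        self.entries = deque()

    def handle(self, ppr):
        free = self.capacity - len(self.entries)
        if free <= self.reserve:
            # Criterion satisfied: report completion to the peripheral
            # and drop the request without queuing it.
            return "completed-discarded"
        self.entries.append(ppr)
        return "queued"

log = PprLog(capacity=2)
results = [log.handle(request_id) for request_id in range(3)]
```

In this model the third request arrives when the log is full, so it is acknowledged as complete and discarded instead of being queued.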
-
Publication number: US10540290B2
Publication date: 2020-01-21
Application number: US15139902
Filing date: 2016-04-27
Applicant: ATI Technologies ULC, Advanced Micro Devices, Inc.
Inventor: Gabriel H Loh, Jimshed Mirza
IPC: G06F12/00, G06F12/1027, G06F12/1009
Abstract: Methods and apparatus obtain one or more system page table entries that represent virtual system (e.g., memory) page to physical system page translations. A number of the obtained system page table entries that can be encoded in each of a plurality of translation lookaside buffer (TLB) entry encoding formats are determined. The method and apparatus may select one of the TLB entry encoding formats that encode a number of the obtained system page table entries. The method and apparatus may encode a number of obtained system page table entries in the TLB entry encoding format selected into a compressed encoding format TLB entry. The method and apparatus may associate the compressed encoding format TLB entry with an encoding format indication of the encoding format selected. The method and apparatus may decode a compressed encoding format TLB entry based on a determined TLB entry encoding format.
-
Publication number: US10540280B2
Publication date: 2020-01-21
Application number: US15390080
Filing date: 2016-12-23
Applicant: Advanced Micro Devices, Inc., ATI Technologies ULC
Inventor: Mark Fowler, Jimshed Mirza, Anthony Asaro
IPC: G06F12/1009, G06T1/20, G06F12/0804, G06F12/0891
Abstract: Techniques for performing cache invalidates and write-backs in an accelerated processing device (e.g., a graphics processing device that renders three-dimensional graphics) are disclosed. The techniques involve receiving requests from a “master” (e.g., the central processing unit). The techniques involve invalidating virtual-to-physical address translations in an address translation request. The techniques include splitting up the requests based on whether the requests target virtually or physically tagged caches. Addresses for the portions of a request that target physically tagged caches are translated using invalidated virtual-to-physical address translations for speed. The split up request is processed to generate micro-transactions for individual caches targeted by the request. Micro-transactions for physically and virtually tagged caches are processed in parallel. Once all micro-transactions for a request have been processed, the unit that made the request is notified.
-
Publication number: US20180308216A1
Publication date: 2018-10-25
Application number: US15496637
Filing date: 2017-04-25
Applicant: ATI Technologies ULC
Inventor: Jimshed Mirza, Al Hasanur Rahman, Sergey Korobkov, Houman Namiranian
IPC: G06T1/60, G06T1/20, G06F13/24, G06F12/1009, G06F12/121
CPC classification number: G06T1/60, G06F12/1009, G06F12/121, G06F13/24
Abstract: Systems, apparatuses, and methods for tracking page reuse and migrating pages are disclosed. In one embodiment, a system includes one or more processors, a memory access monitor, and multiple memory regions. The memory access monitor tracks accesses to memory pages in a system memory during a programmable interval. If the number of accesses to a given page is greater than a programmable threshold during the programmable interval, then the memory access monitor generates an interrupt for software to migrate the given page from the system memory to a local memory. If the number of accesses to the given page is less than or equal to the programmable threshold during the programmable interval, then the given page remains in the system memory. After the programmable interval, the memory access monitor starts tracking the number of accesses to a new page in a subsequent interval.
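The threshold rule in this abstract can be modeled with a short sketch. It simplifies the hardware by counting accesses for every page seen in an interval, and it represents the interrupt as a returned list of pages to migrate; both choices, along with the class and method names, are assumptions made for illustration.

```python
from collections import Counter

class MemoryAccessMonitor:
    """Counts page accesses per interval and flags pages above a programmable threshold."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.counts = Counter()

    def record_access(self, page):
        self.counts[page] += 1

    def end_interval(self):
        # Pages accessed more than `threshold` times would trigger a migration
        # interrupt; pages at or below the threshold stay in system memory.
        to_migrate = sorted(p for p, n in self.counts.items() if n > self.threshold)
        self.counts.clear()  # the next interval starts with fresh counters
        return to_migrate

monitor = MemoryAccessMonitor(threshold=2)
for page in [0x10, 0x10, 0x10, 0x20, 0x20]:
    monitor.record_access(page)
hot_pages = monitor.end_interval()
```

Here page `0x10` exceeds the threshold and is reported, while `0x20` (exactly two accesses) is not.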
-
Publication number: US20180246820A1
Publication date: 2018-08-30
Application number: US15442402
Filing date: 2017-02-24
Applicant: ATI Technologies ULC
Inventor: Jimshed Mirza, Qian Ma
IPC: G06F13/16
Abstract: A system and method for maintaining information of pending operations are described. A buffer uses multiple linked lists implementing a single logical queue for a single requestor. The buffer maintains multiple head pointers and multiple tail pointers for the single requestor. Data entries of the single logical queue are stored in an alternating pattern among the multiple linked lists. During the allocation of buffer entries, the tail pointers are selected in the same alternating manner, and during the deallocation of buffer entries, the multiple head pointers are selected in the same manner.
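The alternating linked-list scheme in this abstract can be sketched in software. The sketch below keeps one buffer, `ways` head pointers, and `ways` tail pointers; allocation rotates through the tails and deallocation rotates through the heads in the same order, so the lists together behave as one FIFO. The class name, the free-list handling, and the default of two lists are illustrative assumptions.

```python
class InterleavedQueue:
    """Single logical FIFO stored as `ways` linked lists inside one buffer."""

    def __init__(self, size, ways=2):
        self.data = [None] * size
        self.next = [None] * size        # per-entry next pointer
        self.free = list(range(size))    # unallocated buffer slots
        self.heads = [None] * ways
        self.tails = [None] * ways
        self.push_way = 0                # list that receives the next allocation
        self.pop_way = 0                 # list that supplies the next deallocation

    def push(self, value):
        if not self.free:
            raise OverflowError("buffer full")
        slot = self.free.pop()
        self.data[slot], self.next[slot] = value, None
        way = self.push_way
        if self.tails[way] is None:
            self.heads[way] = slot
        else:
            self.next[self.tails[way]] = slot
        self.tails[way] = slot
        self.push_way = (way + 1) % len(self.tails)

    def pop(self):
        way = self.pop_way
        slot = self.heads[way]
        if slot is None:
            raise IndexError("queue empty")
        self.heads[way] = self.next[slot]
        if self.heads[way] is None:
            self.tails[way] = None
        self.free.append(slot)
        self.pop_way = (way + 1) % len(self.heads)
        return self.data[slot]

queue = InterleavedQueue(size=4)
for item in "abcd":
    queue.push(item)
drained = [queue.pop() for _ in range(4)]
```

Because consecutive entries live in different lists, following one list's next pointer skips over the entry in between, which is one plausible reason a design would interleave lists this way.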
-
Publication number: US20180232316A1
Publication date: 2018-08-16
Application number: US15433560
Filing date: 2017-02-15
Applicant: ATI Technologies ULC
Inventor: Jimshed Mirza, Anthony Chan, Edwin Chi Yeung Pang
IPC: G06F12/1027, G06F12/1009
CPC classification number: G06F12/1027, G06F12/1009, G06F2212/657, G06F2212/68
Abstract: Systems, apparatuses, and methods for selecting default page sizes in a variable page size translation lookaside buffer (TLB) are disclosed. In one embodiment, a system includes at least one processor, a memory subsystem, and a first TLB. The first TLB is configured to allocate a first entry for a first request responsive to detecting a miss for the first request in the first TLB. Prior to determining a page size targeted by the first request, the first TLB specifies, in the first entry, that the first request targets a page of a first page size. Responsive to determining that the first request actually targets a second page size, the first TLB reissues the first request with an indication that the first request targets the second page size. On the reissue, the first TLB allocates a second entry and specifies the second page size for the first request.
-
Publication number: US20180182155A1
Publication date: 2018-06-28
Application number: US15389075
Filing date: 2016-12-22
Applicant: Advanced Micro Devices, Inc., ATI Technologies ULC
Inventor: Jimshed Mirza, Christopher J. Brennan, Anthony Chan, Leon Lai
IPC: G06T15/00, G06T15/04, G06T15/80, G06F12/0875
Abstract: Systems, apparatuses, and methods for performing shader writes to compressed surfaces are disclosed. In one embodiment, a processor includes at least a memory and one or more shader units. In one embodiment, a shader unit of the processor is configured to receive a write request targeted to a compressed surface. The shader unit is configured to identify a first block of the compressed surface targeted by the write request. Responsive to determining the data of the write request targets less than the entirety of the first block, the shader unit reads the first block from the cache and decompresses the first block. Next, the shader unit merges the data of the write request with the decompressed first block. Then, the shader unit compresses the merged data and writes the merged data to the cache.
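The partial-write path in this abstract is a read, decompress, merge, recompress sequence. The sketch below models it with `zlib` standing in for the surface compression scheme and a dictionary standing in for the cache; the block size, the function name, and the direct-overwrite path for full-block writes are assumptions that the abstract does not specify.

```python
import zlib

BLOCK_SIZE = 16  # hypothetical block size in bytes

def write_to_compressed_block(cache, block_id, offset, data):
    """Write `data` at `offset` into a block that is stored compressed in `cache`."""
    if offset < 0 or offset + len(data) > BLOCK_SIZE:
        raise ValueError("write does not fit inside the block")
    if offset == 0 and len(data) == BLOCK_SIZE:
        # Assumed fast path: a full-block write needs no read or merge.
        cache[block_id] = zlib.compress(data)
        return
    # Partial write: read and decompress, merge the new bytes, recompress, write back.
    block = bytearray(zlib.decompress(cache[block_id]))
    block[offset:offset + len(data)] = data
    cache[block_id] = zlib.compress(bytes(block))

cache = {0: zlib.compress(bytes(BLOCK_SIZE))}   # block 0 starts as all zeros
write_to_compressed_block(cache, 0, 4, b"\xff\xff")
merged = zlib.decompress(cache[0])
```

After the call, `merged` holds the original zero bytes with the two written bytes placed at offset 4.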
-
Publication number: US12236529B2
Publication date: 2025-02-25
Application number: US17562653
Filing date: 2021-12-27
Applicant: Advanced Micro Devices, Inc., ATI Technologies ULC
Inventor: Christopher J. Brennan, Randy Wayne Ramsey, Nishank Pathak, Ricky Wai Yeung Iu, Jimshed Mirza, Anthony Chan
Abstract: Systems, apparatuses, and methods for implementing a discard engine in a graphics pipeline are disclosed. A system includes a graphics pipeline with a geometry engine launching shaders that generate attribute data for vertices of each primitive of a set of primitives. The attribute data is consumed by pixel shaders, with each pixel shader generating a deallocation message when the pixel shader no longer needs the attribute data. A discard engine gathers deallocations from multiple pixel shaders and determines when the attribute data is no longer needed. Once a block of attributes has been consumed by all potential pixel shader consumers, the discard engine deallocates the given block of attributes. The discard engine sends a discard command to the caches so that the attribute data can be invalidated and not written back to memory.
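The bookkeeping described in this abstract resembles reference counting per attribute block. The sketch below assumes the number of potential pixel shader consumers is known when a block is allocated and records discard commands in a list instead of sending them to real caches; the class and method names are hypothetical.

```python
class DiscardEngine:
    """Tracks outstanding consumers per attribute block and issues discards at zero."""

    def __init__(self):
        self.pending = {}     # block id -> consumers that have not yet deallocated
        self.discarded = []   # discard commands that would be sent to the caches

    def allocate(self, block_id, consumers):
        self.pending[block_id] = consumers

    def deallocate(self, block_id):
        # Called once per deallocation message from a pixel shader.
        self.pending[block_id] -= 1
        if self.pending[block_id] == 0:
            del self.pending[block_id]
            # All consumers are done: the data can be invalidated, not written back.
            self.discarded.append(block_id)

engine = DiscardEngine()
engine.allocate("block0", consumers=2)
engine.deallocate("block0")
after_first = list(engine.discarded)
engine.deallocate("block0")
after_second = list(engine.discarded)
```

The discard is issued only after the second deallocation message, when no consumer still needs the block.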