Patent search ap:("INTEL CORPORATION") AND inv:"Xinmin Tian" Page 3

21.

发明授权
Systems and methods for cache optimization 有权

公开(公告)号：US12182035B2

公开(公告)日：2024-12-31

申请号：US17428529

申请日：2020-03-14

Applicant: Intel Corporation

Inventor： Altug Koker , Joydeep Ray , Elmoustapha Ould-Ahmed-Vall , Abhishek Appu , Aravindh Anantaraman , Valentin Andrei , Durgaprasad Bilagi , Varghese George , Brent Insko , Sanjeev Jahagirdar , Scott Janus , Pattabhiraman K , SungYe Kim , Subramaniam Maiyuran , Vasanth Ranganathan , Lakshminarayanan Striramassarma , Xinmin Tian

IPC: G06F12/00 , G06F12/0875 , G06F12/0891 , G06F12/123 , G06T1/60

Abstract: Systems and methods for improving cache efficiency and utilization are disclosed. In one embodiment, a graphics processor includes processing resources to perform graphics operations and a cache controller of a cache memory that is coupled to the processing resources. The cache controller is configured to set an initial aging policy using an aging field based on age of cache lines within the cache memory and to determine whether a hint or an instruction to indicate a level of aging has been received.

22.

发明公开
HARDWARE ENHANCEMENTS FOR DOUBLE PRECISION SYSTOLIC SUPPORT 审中-公开

公开(公告)号：US20240111826A1

公开(公告)日：2024-04-04

申请号：US17937252

申请日：2022-09-30

Applicant: Intel Corporation

Inventor： Jiasheng Chen , Kevin Hurd , Changwon Rhee , Jorge Parra , Fangwen Fu , Theo Drane , William Zorn , Peter Caday , Gregory Henry , Guei-Yuan Lueh , Farzad Chehrazi , Amit Karande , Turbo Majumder , Xinmin Tian , Milind Girkar , Hong Jiang

IPC: G06F17/16 , G06F7/544 , G06T1/20

CPC classification number: G06F17/16 , G06F7/5443 , G06T1/20

Abstract: An apparatus to facilitate hardware enhancements for double precision systolic support is disclosed. The apparatus includes matrix acceleration hardware having double-precision (DP) matrix multiplication circuitry including a multiplier circuits to multiply pairs of input source operands in a DP floating-point format; adders to receive multiplier outputs from the multiplier circuits and accumulate the multiplier outputs in a high precision intermediate format; an accumulator circuit to accumulate adder outputs from the adders with at least one of a third global source operand on a first pass of the DP matrix multiplication circuitry or an intermediate result from the first pass on a second pass of the DP matrix multiplication circuitry, wherein the accumulator circuit to generate an accumulator output in the high precision intermediate format; and a down conversion and rounding circuit to down convert and round an output of the second pass as final result in the DP floating-point format.

23.

发明授权
Graphics processing unit processing and caching improvements 有权

公开(公告)号：US11861761B2

公开(公告)日：2024-01-02

申请号：US17095590

申请日：2020-11-11

Applicant: Intel Corporation

Inventor： Subramaniam Maiyuran , Durgaprasad Bilagi , Joydeep Ray , Scott Janus , Sanjeev Jahagirdar , Brent Insko , Lidong Xu , Abhishek R. Appu , James Holland , Vasanth Ranganathan , Nikos Kaburlasos , Altug Koker , Xinmin Tian , Guei-Yuan Lueh , Changliang Wang

IPC: G06T1/60 , G06T1/20 , G06F12/0802 , G06N5/04

CPC classification number: G06T1/60 , G06F12/0802 , G06N5/04 , G06T1/20 , G06F2212/251

Abstract: Embodiments described herein are generally directed to improvements relating to power, latency, bandwidth and/or performance issues relating to GPU processing/caching. According to one embodiment, a system includes a producer intellectual property (IP) (e.g., a media IP), a compute core (e.g., a GPU or an AI-specific core of the GPU), a streaming buffer logically interposed between the producer IP and the compute core. The producer IP is operable to consume data from memory and output results to the streaming buffer. The compute core is operable to perform AI inference processing based on data consumed from the streaming buffer and output AI inference processing results to the memory.

24.

发明公开
GRAPHICS PROCESSING UNIT PROCESSING AND CACHING IMPROVEMENTS 审中-公开

公开(公告)号：US20230260075A1

公开(公告)日：2023-08-17

申请号：US18305904

申请日：2023-04-24

Applicant: Intel Corporation

Inventor： Subramaniam Maiyuran , Durgaprasad Bilagi , Joydeep Ray , Scott Janus , Sanjeev Jahagirdar , Brent Insko , Lidong Xu , Abhishek R. Appu , James Holland , Vasanth Ranganathan , Nikos Kaburlasos , Altug Koker , Xinmin Tian , Guei-Yuan Lueh , Changliang Wang

IPC: G06T1/60 , G06T1/20 , G06F12/0802 , G06N5/04

CPC classification number: G06T1/60 , G06T1/20 , G06F12/0802 , G06N5/04 , G06F2212/251

Abstract: Embodiments described herein are generally directed to improvements relating to power, latency, bandwidth and/or performance issues relating to GPU processing/caching. According to one embodiment, a state of multiple intellectual property (IP) cores that have access to a common cache via a central fabric is observed. Responsive to the observed state being indicative of performance of a standalone workload by a first IP core of the multiple IP cores, the common cache is treated as a local cache of the first IP core by powering off the central fabric and causing the first IP core to access the common cache via a low power access path between the first IP core and the common cache that is outside of the central fabric.

25.

发明申请
PERFORMING GLOBAL MEMORY ATOMICS IN A PRIVATE CACHE OF A SUB-CORE OF A GRAPHICS PROCESSING UNIT 有权

公开(公告)号：US20230028666A1

公开(公告)日：2023-01-26

申请号：US17379121

申请日：2021-07-19

Applicant: Intel Corporation

Inventor： Joydeep Ray , Prathamesh Raghunath Shinde , Yue Qi , Abhishek R. Appu , Xinmin Tian , Vasanth Ranganathan , Ben J. Ashbaugh

IPC: G06F9/30

Abstract: Embodiments are directed to systems and methods for performing global memory atomics in a private cache of a sub-core of a GPU. An embodiment of a GPU includes multiple sub-cores each including a load/store pipeline. The load/store pipeline is operable to receive information specifying an atomic operation to be performed within a primary data cache of the load/store pipeline. The load/store pipeline is also operable to read data to be modified by the atomic operation into the primary data cache from a memory hierarchy shared by the multiple sub-cores. The load/store pipeline is further operable to produce an atomic result of the atomic operation by modifying the data within the primary data cache based on the atomic operation.

26.

发明授权
Mechanism for instruction set based thread execution on a plurality of instruction sequencers 有权

公开(公告)号：US10452403B2

公开(公告)日：2019-10-22

申请号：US14866875

申请日：2015-09-26

Applicant: Intel Corporation

Inventor： Hong Wang , John P. Shen , Edward T. Grochowski , Richard A. Hankins , Gautham N. Chinya , Bryant E. Bigbee , Shivnandan D. Kaushik , Xiang Chris Zou , Per Hammarlund , Scott Dion Rodgers , Xinmin Tian , Anil Aggawal , Prashant Sethi , Baiju V. Patel , James P Held

IPC: G06F9/455 , G06F9/38 , G06F9/30 , G06F9/48

Abstract: In an embodiment, a method is provided. The method includes managing user-level threads on a first instruction sequencer in response to executing user-level instructions on a second instruction sequencer that is under control of an application level program. A first user-level thread is run on the second instruction sequencer and contains one or more user level instructions. A first user level instruction has at least 1) a field that makes reference to one or more instruction sequencers or 2) implicitly references with a pointer to code that specifically addresses one or more instruction sequencers when the code is executed.

27.

发明申请
METHODS, SYSTEMS AND APPARATUS TO IMPROVE FPGA PIPELINE EMULATION EFFICIENCY ON CPUs 审中-公开

公开(公告)号：US20190005175A1

公开(公告)日：2019-01-03

申请号：US15636265

申请日：2017-06-28

Applicant: Intel Corporation

Inventor： Xinmin Tian , Geoff Lowney

IPC: G06F17/50

Abstract: Methods, apparatus, systems and articles of manufacture are disclosed to improve FPGA pipeline emulation efficiency on CPUs. An example disclosed apparatus includes a loop detector to identify a register shift loop in field programmable gate array (FPGA) code, an unroller to shift and store pipeline stages in the register shift loop to a temporary unroll array, an intermediate canceller to cancel out intermediate load and store values of the temporary unroll array to retain last shifted values of the pipeline stages, and a propagator to improve emulation efficiency of the FPGA code by generating a scalar loop of the retained last shifted values for a vectorization input.

28.

发明授权
Programmable event driven yield mechanism which may activate other threads 有权

公开(公告)号：US09910796B2

公开(公告)日：2018-03-06

申请号：US13844343

申请日：2013-03-15

Applicant: Intel Corporation

Inventor： Hong Wang , Per Hammarlund , Xiang Zou , John P. Shen , Xinmin Tian , Milind Girkar , Perry H. Wang , Piyush N. Desai

IPC: G06F9/38 , G06F13/24 , G06F9/30 , G06F9/48 , G06F11/34

CPC classification number: G06F13/24 , G06F9/3005 , G06F9/3009 , G06F9/30145 , G06F9/3851 , G06F9/4843 , G06F11/3024 , G06F11/348 , G06F12/0875 , G06F2201/86 , G06F2201/88 , G06F2201/885 , G06F2212/452

Abstract: Method, apparatus, and program means for a programmable event driven yield mechanism that may activate other threads. In one embodiment, an apparatus includes execution resources to execute a plurality of instructions and a monitor to detect a condition indicating a low level of progress. The monitor can disrupt processing of a program by transferring to a handler in response to detecting the condition indicating a low level of progress. In another embodiment, thread switch logic may be coupled to a plurality of event monitors which monitor events within the multithreading execution logic. The thread switch logic switches threads based at least partially on a programmable condition of one or more of the performance monitors.

Search Results

Country/Region

Patent validity

Application date

Publication (announcement) day

applicant

The country/region where the applicant is located

Inventor

IPC

IPC Department

IPC class

IPC subclass

IPC group

IPC team

Appearance classification