-
Publication No.: US20230359496A1
Publication Date: 2023-11-09
Application No.: US17589689
Filing Date: 2022-01-31
Applicant: Intel Corporation
Inventor: PAWEL MAJEWSKI , PRASOONKUMAR SURTI , KARTHIK VAIDYANATHAN , JOSHUA BARCZAK , VASANTH RANGANATHAN , VIKRANTH VEMULAPALLI
CPC classification number: G06F9/5027 , G06F9/4881 , G06F9/54
Abstract: Apparatus and method for stack access throttling for synchronous ray tracing. For example, one embodiment of an apparatus comprises: ray tracing acceleration hardware to manage active ray tracing stack allocations to ensure that a size of the active ray tracing stack allocations remains within a threshold; and an execution unit to execute a thread to explicitly request a new ray tracing stack allocation from the ray tracing acceleration hardware, the ray tracing acceleration hardware to permit the new ray tracing stack allocation if the size of the active ray tracing stack allocations will remain within the threshold after permitting the new ray tracing stack allocation.
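The throttling policy described in this abstract can be illustrated with a small software sketch: a new ray tracing stack allocation is granted only while the running total of active allocations stays within a fixed threshold. This is a minimal model of the hardware behavior, not Intel's implementation; the class and method names (StackThrottler, try_allocate, release) are invented for illustration.

```cpp
#include <cstddef>
#include <iostream>
#include <mutex>

class StackThrottler {
public:
    explicit StackThrottler(std::size_t threshold_bytes)
        : threshold_(threshold_bytes), active_(0) {}

    // Grant the allocation only if the active total would remain within the threshold.
    bool try_allocate(std::size_t bytes) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (active_ + bytes > threshold_) return false;  // requester must wait or retry
        active_ += bytes;
        return true;
    }

    // Called when a thread finishes traversal and frees its stack.
    void release(std::size_t bytes) {
        std::lock_guard<std::mutex> lock(mutex_);
        active_ -= bytes;
    }

private:
    std::size_t threshold_;
    std::size_t active_;
    std::mutex mutex_;
};

int main() {
    StackThrottler throttler(4096);               // e.g. 4 KiB budget for active stacks
    bool granted = throttler.try_allocate(1024);  // explicit request from a thread
    std::cout << "allocation granted: " << granted << '\n';
    throttler.release(1024);
}
```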
-
Publication No.: US20190332869A1
Publication Date: 2019-10-31
Application No.: US16379176
Filing Date: 2019-04-09
Applicant: Intel Corporation
Inventor: MAYURESH M. VARERKAR , BARNAN DAS , NARAYAN BISWAL , STANLEY J. BARAN , GOKCEN CILINGIR , NILESH V. SHAH , ARCHIE SHARMA , SHERINE ABDELHAK , SACHIN GODSE , FARSHAD AKHBARI , NARAYAN SRINIVASA , ALTUG KOKER , NADATHUR RAJAGOPALAN SATISH , DUKHWAN KIM , FENG CHEN , ABHISHEK R. APPU , JOYDEEP RAY , PING T. TANG , MICHAEL S. STRICKLAND , XIAOMING CHEN , ANBANG YAO , TATIANA SHPEISMAN , VASANTH RANGANATHAN , SANJEEV JAHAGIRDAR
Abstract: A mechanism is described for facilitating person tracking and data security in machine learning at autonomous machines. A method of embodiments, as described herein, includes detecting, by a camera associated with one or more trackers, a person within a physical vicinity, where detecting includes capturing one or more images of the person. The method may further include tracking, by the one or more trackers, the person based on the one or more images of the person, where tracking includes collecting tracking data relating to the person. The method may further include selecting a tracker of the one or more trackers as a preferred tracker based on the tracking data.
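As a rough illustration of the final selection step, the sketch below scores each tracker by its accumulated tracking data and picks the best one as the preferred tracker. The scoring metric (average per-frame confidence) is an assumption; the abstract does not specify a particular criterion.

```cpp
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

struct Tracker {
    std::string id;
    std::vector<double> confidences;  // per-frame tracking confidence for the person

    double average_confidence() const {
        if (confidences.empty()) return 0.0;
        double sum = 0.0;
        for (double c : confidences) sum += c;
        return sum / confidences.size();
    }
};

// Return the tracker whose accumulated tracking data scores best.
const Tracker& select_preferred(const std::vector<Tracker>& trackers) {
    return *std::max_element(trackers.begin(), trackers.end(),
        [](const Tracker& a, const Tracker& b) {
            return a.average_confidence() < b.average_confidence();
        });
}

int main() {
    std::vector<Tracker> trackers = {
        {"cam0", {0.72, 0.80, 0.78}},
        {"cam1", {0.90, 0.88, 0.91}},
    };
    std::cout << "preferred tracker: " << select_preferred(trackers).id << '\n';
}
```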
-
Publication No.: US20210150770A1
Publication Date: 2021-05-20
Application No.: US17095544
Filing Date: 2020-11-11
Applicant: Intel Corporation
Inventor: ABHISHEK R. APPU , PRASOONKUMAR SURTI , JILL BOYCE , SUBRAMANIAM MAIYURAN , MICHAEL APODACA , ADAM T. LAKE , JAMES HOLLAND , VASANTH RANGANATHAN , ALTUG KOKER , LIDONG XU , NIKOS KABURLASOS
Abstract: Embodiments described herein provide for an instruction and associated logic to enable a processing resource including a tensor accelerator to perform optimized computation of sparse submatrix operations. One embodiment provides hardware logic to apply a numerical transform to matrix data to increase the sparsity of the data. Increasing the sparsity may result in a higher compression ratio when the matrix data is compressed.
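The sparsity-increasing transform can be pictured with a small sketch: a reversible row-wise delta encoding turns runs of repeated values into zeros, raising the zero count that a later compression stage can exploit. The choice of delta encoding is illustrative only; the abstract does not name a specific transform.

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

using Matrix = std::vector<std::vector<int>>;

// Row-wise delta transform: keep the first element of each row and store
// differences after it. It is exactly invertible via a prefix sum.
Matrix delta_transform(const Matrix& m) {
    Matrix out = m;
    for (auto& row : out)
        for (std::size_t j = row.size(); j-- > 1; )
            row[j] -= row[j - 1];
    return out;
}

double sparsity(const Matrix& m) {
    std::size_t zeros = 0, total = 0;
    for (const auto& row : m)
        for (int v : row) { zeros += (v == 0); ++total; }
    return total ? static_cast<double>(zeros) / total : 0.0;
}

int main() {
    Matrix m = {{5, 5, 5, 7}, {3, 3, 4, 4}};  // repeated values, few zeros
    std::cout << "sparsity before: " << sparsity(m) << '\n';
    std::cout << "sparsity after:  " << sparsity(delta_transform(m)) << '\n';
}
```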
-
Publication No.: US20200218539A1
Publication Date: 2020-07-09
Application No.: US16243663
Filing Date: 2019-01-09
Applicant: Intel Corporation
Inventor: JAMES VALERIO , VASANTH RANGANATHAN , JOYDEEP RAY , PRADEEP RAMANI
Abstract: A graphics processing device comprises a set of compute units to execute multiple threads of a workload, a cache coupled with the set of compute units, and a prefetcher to prefetch instructions associated with the workload. The prefetcher is configured to use a thread dispatch command that is used to dispatch threads to execute a kernel to prefetch instructions, parameters, and/or constants that will be used during execution of the kernel. Prefetch operations for the kernel can then occur concurrently with thread dispatch operations.
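A software analogy of this prefetch scheme: the same thread dispatch command that launches a kernel's threads also triggers a prefetch of that kernel's instructions, parameters, and constants, so the two operations overlap. The names below (DispatchCommand, prefetch_kernel, dispatch_threads) are invented for illustration and are not a real driver API.

```cpp
#include <future>
#include <iostream>
#include <string>

struct DispatchCommand {
    std::string kernel_name;
    int thread_count;
};

void prefetch_kernel(const DispatchCommand& cmd) {
    // Placeholder for fetching the kernel's instructions, parameters and
    // constants into the cache ahead of execution.
    std::cout << "prefetching code/constants for " << cmd.kernel_name << '\n';
}

void dispatch_threads(const DispatchCommand& cmd) {
    std::cout << "dispatching " << cmd.thread_count
              << " threads of " << cmd.kernel_name << '\n';
}

int main() {
    DispatchCommand cmd{"my_kernel", 256};
    // The same dispatch command drives both operations; the prefetch runs
    // concurrently with thread dispatch rather than after it.
    auto prefetch = std::async(std::launch::async, prefetch_kernel, cmd);
    dispatch_threads(cmd);
    prefetch.wait();
}
```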
-
Publication No.: US20220083339A1
Publication Date: 2022-03-17
Application No.: US17509726
Filing Date: 2021-10-25
Applicant: Intel Corporation
Inventor: JAMES VALERIO , VASANTH RANGANATHAN , JOYDEEP RAY , PRADEEP RAMANI
Abstract: A graphics processing device comprises a set of compute units to execute multiple threads of a workload, a cache coupled with the set of compute units, and a prefetcher to prefetch instructions associated with the workload. The prefetcher is configured to use a thread dispatch command that is used to dispatch threads to execute a kernel to prefetch instructions, parameters, and/or constants that will be used during execution of the kernel. Prefetch operations for the kernel can then occur concurrently with thread dispatch operations.
-
Publication No.: US20210191868A1
Publication Date: 2021-06-24
Application No.: US16724813
Filing Date: 2019-12-23
Applicant: Intel Corporation
Inventor: JOYDEEP RAY , VASANTH RANGANATHAN , BEN ASHBAUGH , JAMES VALERIO
IPC: G06F12/0846 , G06F12/0837 , G06F12/084 , G06F9/50 , G06F9/38 , G06F9/30
Abstract: An apparatus to facilitate partitioning of local memory is disclosed. The apparatus includes a plurality of execution units to execute a plurality of execution threads, a memory coupled to share access between the plurality of execution units, and partitioning hardware to partition the memory to be used as a cache and as shared local memory (SLM), wherein the partitioning hardware partitions the memory based on a quantity of the plurality of execution threads that are active on the execution units.
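One way to picture the partitioning rule is a small sizing function: reserve SLM in proportion to the number of active execution threads and leave the remainder of the shared memory as cache. The fixed bytes-per-thread rule is an assumption for illustration; the abstract only states that the split depends on the count of active threads.

```cpp
#include <algorithm>
#include <cstddef>
#include <iostream>

struct Partition {
    std::size_t slm_bytes;
    std::size_t cache_bytes;
};

Partition partition_memory(std::size_t total_bytes,
                           std::size_t active_threads,
                           std::size_t slm_bytes_per_thread) {
    // Scale the SLM share with the number of active threads; the rest is cache.
    std::size_t slm = std::min(total_bytes, active_threads * slm_bytes_per_thread);
    return {slm, total_bytes - slm};
}

int main() {
    constexpr std::size_t kTotal = 128 * 1024;  // 128 KiB of shared on-chip memory
    Partition p = partition_memory(kTotal, /*active_threads=*/16,
                                   /*slm_bytes_per_thread=*/2048);
    std::cout << "SLM: " << p.slm_bytes << " bytes, cache: "
              << p.cache_bytes << " bytes\n";
}
```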
-
Publication No.: US20250077232A1
Publication Date: 2025-03-06
Application No.: US18882364
Filing Date: 2024-09-11
Applicant: Intel Corporation
Inventor: JAMES VALERIO , VASANTH RANGANATHAN , JOYDEEP RAY , PRADEEP RAMANI
Abstract: A graphics processing device is provided that includes a set of compute units to execute a workload, a cache coupled with the set of compute units, and circuitry coupled with the cache and the set of compute units. The circuitry is configured to, in response to a cache miss for a read from a first cache, broadcast an event within the graphics processing device to identify data associated with the cache miss, receive the event at a second compute unit in the set of compute units, and prefetch the data identified by the event into a second cache that is local to the second compute unit before an attempt by a second thread to read the instruction or data.
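The broadcast-on-miss behavior can be sketched in software as an event fanned out to peer compute units, each of which pulls the identified data into its own local cache ahead of demand. The types and names below (MissEvent, ComputeUnit, on_miss_event) are illustrative stand-ins for the hardware, not an actual interface.

```cpp
#include <cstdint>
#include <iostream>
#include <set>
#include <vector>

struct MissEvent {
    std::uint64_t address;  // identifies the data associated with the cache miss
};

class ComputeUnit {
public:
    explicit ComputeUnit(int id) : id_(id) {}

    // Called when this unit receives a broadcast miss event from a peer.
    void on_miss_event(const MissEvent& e) {
        if (!local_cache_.count(e.address)) {
            local_cache_.insert(e.address);  // prefetch into the local cache
            std::cout << "CU" << id_ << " prefetched 0x" << std::hex
                      << e.address << std::dec << '\n';
        }
    }

    bool read(std::uint64_t address) { return local_cache_.count(address) > 0; }

private:
    int id_;
    std::set<std::uint64_t> local_cache_;
};

int main() {
    std::vector<ComputeUnit> units = {ComputeUnit(0), ComputeUnit(1)};
    // CU0 misses on a read; the miss is broadcast so CU1 can prefetch.
    MissEvent miss{0x1000};
    for (auto& cu : units) cu.on_miss_event(miss);
    std::cout << "CU1 hit on later read: " << units[1].read(0x1000) << '\n';
}
```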
-
Publication No.: US20200210246A1
Publication Date: 2020-07-02
Application No.: US16696848
Filing Date: 2019-11-26
Applicant: Intel Corporation
Inventor: PRASOONKUMAR SURTI , DAVID COWPERTHWAITE , ABHISHEK R. APPU , JOYDEEP RAY , VASANTH RANGANATHAN , ALTUG KOKER , BALAJI VEMBU
IPC: G06F9/50
Abstract: A mechanism is described for facilitating localized load-balancing for processors in computing devices. A method of embodiments, as described herein, includes facilitating hosting, at a processor of a computing device, a local load-balancing mechanism. The method may further include monitoring balancing of loads at the processor and serving as a local scheduler to maintain de-centralized load-balancing at the processor and between the processor and one or more other processors.
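A minimal sketch of the decentralized idea: each processor hosts its own scheduler and only sheds work to a peer when its local load crosses a threshold, so no central load balancer is consulted. The threshold rule and the least-loaded-peer choice are assumptions made for illustration.

```cpp
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

struct Processor {
    std::string name;
    int queued_tasks = 0;
};

class LocalScheduler {
public:
    LocalScheduler(Processor& self, std::vector<Processor*> peers, int threshold)
        : self_(self), peers_(std::move(peers)), threshold_(threshold) {}

    // Decide locally where the next task runs; no central scheduler is involved.
    void submit_task() {
        if (self_.queued_tasks < threshold_ || peers_.empty()) {
            ++self_.queued_tasks;
            return;
        }
        Processor* target = *std::min_element(peers_.begin(), peers_.end(),
            [](const Processor* a, const Processor* b) {
                return a->queued_tasks < b->queued_tasks;
            });
        ++target->queued_tasks;  // offload to the least-loaded peer
    }

private:
    Processor& self_;
    std::vector<Processor*> peers_;
    int threshold_;
};

int main() {
    Processor cpu{"cpu"}, gpu{"gpu"};
    LocalScheduler scheduler(cpu, {&gpu}, /*threshold=*/2);
    for (int i = 0; i < 5; ++i) scheduler.submit_task();
    std::cout << cpu.name << ": " << cpu.queued_tasks << ", "
              << gpu.name << ": " << gpu.queued_tasks << '\n';
}
```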
-
Publication No.: US20230401064A1
Publication Date: 2023-12-14
Application No.: US18347964
Filing Date: 2023-07-06
Applicant: Intel Corporation
Inventor: JAMES VALERIO , VASANTH RANGANATHAN , JOYDEEP RAY , PRADEEP RAMANI
CPC classification number: G06F9/3802 , G06T1/20 , G06F13/28
Abstract: A graphics processing device is provided that includes a set of compute units to execute a workload, a cache coupled with the set of compute units, and circuitry coupled with the cache and the set of compute units. The circuitry is configured to, in response to a cache miss for a read from a first cache, broadcast an event within the graphics processing device to identify data associated with the cache miss, receive the event at a second compute unit in the set of compute units, and prefetch the data identified by the event into a second cache that is local to the second compute unit before an attempt by a second thread to read the instruction or data.
-
Publication No.: US20230377209A1
Publication Date: 2023-11-23
Application No.: US18322194
Filing Date: 2023-05-23
Applicant: Intel Corporation
Inventor: ABHISHEK R. APPU , PRASOONKUMAR SURTI , JILL BOYCE , SUBRAMANIAM MAIYURAN , MICHAEL APODACA , ADAM T. LAKE , JAMES HOLLAND , VASANTH RANGANATHAN , ALTUG KOKER , LIDONG XU , NIKOS KABURLASOS
CPC classification number: G06T9/002 , G06T9/007 , G06T15/005 , G06T9/008 , G06N3/045
Abstract: Embodiments described herein provide for an instruction and associated logic to enable a processing resource including a tensor accelerator to perform optimized computation of sparse submatrix operations. One embodiment provides a parallel processor comprising a processing cluster coupled with a cache memory. The processing cluster includes a plurality of multiprocessors coupled with a data interconnect, where a multiprocessor of the plurality of multiprocessors includes a tensor core configured to load tensor data and metadata associated with the tensor data from the cache memory, wherein the metadata indicates a first numerical transform applied to the tensor data, perform an inverse transform of the first numerical transform, perform a tensor operation on the tensor data after the inverse transform is performed, and write output of the tensor operation to a memory coupled with the processing cluster.
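The consumption path described here can be sketched as: load a block of tensor data together with metadata naming the transform that was applied, undo that transform, then run the tensor operation on the recovered values. The delta/prefix-sum pair and the dot-product stand-in for the tensor operation are illustrative assumptions, not the hardware's actual formats.

```cpp
#include <cstddef>
#include <iostream>
#include <numeric>
#include <vector>

enum class Transform { None, Delta };

struct TensorBlock {
    std::vector<int> data;
    Transform applied;  // metadata stored alongside the tensor data
};

// Inverse of a delta (difference) transform: a running prefix sum.
void inverse_transform(TensorBlock& block) {
    if (block.applied == Transform::Delta) {
        for (std::size_t i = 1; i < block.data.size(); ++i)
            block.data[i] += block.data[i - 1];
        block.applied = Transform::None;
    }
}

int main() {
    // {5, 0, 0, 2} is the delta-encoded form of {5, 5, 5, 7}.
    TensorBlock block{{5, 0, 0, 2}, Transform::Delta};
    std::vector<int> weights{1, 2, 3, 4};

    inverse_transform(block);  // recover the original values first
    int result = std::inner_product(block.data.begin(), block.data.end(),
                                    weights.begin(), 0);
    std::cout << "tensor op result: " << result << '\n';  // 5 + 10 + 15 + 28 = 58
}
```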